How to parse an XML file from a URL in a GCP Jupyter notebook

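import xml.etree.ElementTree as ET
import urllib.request

def to_seconds(timestamp):
    # Convert an "HH:MM:SS.s" string (e.g. "00:02:50.6") into seconds.
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def parsefile(path):
    # Download the XML document from the URL and parse it.
    response = urllib.request.urlopen(path).read()
    tree = ET.fromstring(response)

    events = tree.findall("event")
    if not events:
        return None  # the file has no <event> element

    # Use the first event's start time and duration.
    starttime = to_seconds(events[0].findtext("starttime"))
    duration = to_seconds(events[0].findtext("duration"))
    endtime = starttime + duration
    return (starttime + endtime) / 2  # midpoint of the event, in seconds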

Sample XML file (excerpt):

<?xml version='1.0' encoding='utf-8'?>
<annotation>
    <folder>datefight</folder>
    <filename>194-6_cam01_datefight02_place02_night_spring.mp4</filename>
    <source>
        <database>NIA2019 Database v1</database>
        <annotation>NIA2019</annotation>
    </source>
    <size>
        <width>3840</width>
        <height>2160</height>
        <depth>3</depth>
    </size>
    <header>
        <duration>00:05:15.6</duration>
        <fps>30</fps>
        <frames>9468</frames>
        <inout>IN</inout>
        <location>PLACE02</location>
        <season>SPRING</season>
        <weather>SUNNY</weather>
        <time>NIGHT</time>
        <population>3</population>
        <character>M20,M20,F20</character>
    </header>
    <event>
        <eventname>datefight</eventname>
        <starttime>00:02:50.6</starttime>
        <duration>00:01:34.9</duration>
    </event>
    <object>
        <objectname>Person_1</objectname>

The code above parses an XML file that contains the start time and duration of an action, and computes the midpoint between the action's start time and end time.
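
To check the arithmetic without a network call, here is a minimal sketch that runs the same midpoint calculation on a trimmed copy of the sample XML. The names `SAMPLE_XML`, `to_seconds`, and `event_midpoint` are hypothetical helpers introduced for this illustration, and it assumes timestamps always use the `HH:MM:SS.s` format seen in the sample.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample annotation, keeping only the <event> element.
SAMPLE_XML = """<annotation>
    <event>
        <eventname>datefight</eventname>
        <starttime>00:02:50.6</starttime>
        <duration>00:01:34.9</duration>
    </event>
</annotation>"""

def to_seconds(timestamp):
    """Convert an "HH:MM:SS.s" string into seconds (hypothetical helper)."""
    hours, minutes, seconds = timestamp.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

def event_midpoint(xml_text):
    """Return the midpoint (in seconds) of the first <event>, or None if absent."""
    root = ET.fromstring(xml_text)
    event = root.find("event")
    if event is None:
        return None
    start = to_seconds(event.findtext("starttime"))
    end = start + to_seconds(event.findtext("duration"))
    return (start + end) / 2

midpoint = event_midpoint(SAMPLE_XML)
print(round(midpoint, 2))
```

For the sample event, the start is 170.6 s and the end is 170.6 + 94.9 = 265.5 s, so the midpoint comes out at about 218.05 s, roughly 3 minutes 38 seconds into the video.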