A simple RSS parser using Python's elementtree

Dev Scribbles / Linux Scribbles, 2009. 1. 3. 19:40

When parsing XML in Python, the elementtree library is the usual choice. Even someone like me who doesn't know much about XML can easily access nodes and add new ones.
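
As a rough illustration of accessing and adding nodes, here is a minimal sketch. It assumes the standard-library `xml.etree.ElementTree` module (the same API, bundled with Python since 2.5) rather than the standalone elementtree package, and `SAMPLE_XML` is a made-up document used only for this example.

```python
# Assumption: the standard-library copy of ElementTree is used here
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this example
SAMPLE_XML = "<library><book><title>First</title></book></library>"

root = ET.fromstring(SAMPLE_XML)

# Access: find() returns the first element matching the path, or None
first_title = root.find('book/title').text

# Add: SubElement() creates a child element and attaches it to its parent
new_book = ET.SubElement(root, 'book')
ET.SubElement(new_book, 'title').text = 'Second'

# findall() returns every matching element in document order
titles = [t.text for t in root.findall('book/title')]
print(titles)
```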

I used it to put together a simple RSS parser. A very simple one... lol

# -*- coding: utf-8 -*-
import socket
import sys  # needed for sys.exit() in get_rss
from urllib2 import Request, urlopen
import elementtree.ElementTree as ET

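class Rss(object):
    def __init__(self):
        self.id = 0
        self.link = ''
        self.title = ''
        self.description = ''
        # created per instance; a class-level list would be shared by every Rss object
        self.item_list = []

class RssItem(object):
    def __init__(self):
        self.id = 0
        self.title = ''
        self.link = ''
        self.description = ''
        self.pub_date = ''
        self.site_id = 0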

def get_rss(rss_url):
    rss = Rss()
    req = Request(rss_url)
    rss_content = str()
    response = None

    try:
        timeout = 3
        socket.setdefaulttimeout(timeout)
        response = urlopen(req)
    except IOError, e:
        if hasattr(e, 'reason'):
            print 'We failed to reach a server.'
            print 'Reason: ', e.reason
        elif hasattr(e, 'code'):
            print 'The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code
        sys.exit(1)  # exit with a non-zero status to signal failure

    try:
        rss_content = response.read()
        tree = ET.fromstring(rss_content)
        channel = tree.find('channel')  # look up <channel> by name instead of by position
        rss.title = channel.find('title').text.strip()
        rss.link = channel.find('link').text.strip()
        rss.description = channel.find('description').text.strip()
        items = channel.findall('item')

        for item in items:
            rss_item = RssItem()
            rss_item.title = item.find('title').text.strip()
            rss_item.link = item.find('link').text.strip()
            rss_item.description = item.find('description').text.strip()
            rss_item.pub_date = item.find('pubDate').text.strip()
            rss.item_list.append(rss_item)
    except Exception, e:
        print e

    return rss

if __name__ == '__main__':
    site = get_rss("http://no99.tistory.com/rss")
    print site.title
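
The parsing half of get_rss can also be tried without a network connection. The sketch below is not the code from this post: `parse_rss` and `SAMPLE_RSS` are hypothetical names made up for the example, and it assumes the standard-library `xml.etree.ElementTree` module. It uses `findtext()` with a default value, so an item that lacks a tag (the sample item has no `<description>`) yields an empty string instead of raising an error.

```python
# Assumption: the standard-library copy of ElementTree is used here
import xml.etree.ElementTree as ET

# Hypothetical feed, made up for this example
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title> Sample blog </title>
    <link>http://example.com/</link>
    <description>Just a sample</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <pubDate>Sat, 03 Jan 2009 19:40:00 +0900</pubDate>
    </item>
  </channel>
</rss>"""

def parse_rss(rss_content):
    """Return (channel title, list of item dicts) from an RSS 2.0 string."""
    channel = ET.fromstring(rss_content).find('channel')
    title = channel.findtext('title', '').strip()
    items = []
    for item in channel.findall('item'):
        items.append({
            'title': item.findtext('title', '').strip(),
            'link': item.findtext('link', '').strip(),
            # missing in the sample item, so this falls back to ''
            'description': item.findtext('description', '').strip(),
            'pub_date': item.findtext('pubDate', '').strip(),
        })
    return title, items

feed_title, feed_items = parse_rss(SAMPLE_RSS)
print(feed_title)
print(feed_items[0]['pub_date'])
```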

References
(To be honest, "copied from" would be more accurate than "referred to" ^^)
urllib2 - http://www.voidspace.org.uk/python/articles/urllib2.shtml
elementtree - http://effbot.org/zone/element-index.htm
