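In Python, the elementtree library is the usual choice for parsing XML. Even someone like me, who doesn't know much about XML, can easily access and add nodes with it.
I used it to put together a simple RSS parser. A very simple one... heh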
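To make "access and add nodes" concrete, here is a minimal sketch. It assumes Python 3, where the same API ships in the standard library as xml.etree.ElementTree, and the sample feed is made up for illustration only.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written feed, used only for illustration.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE)

# Access: find() returns the first matching child element (or None).
channel = root.find("channel")
feed_title = channel.find("title").text

# Add: SubElement() creates a new child and attaches it to its parent.
item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "First post"

# findall() returns every direct child with the given tag.
item_titles = [i.find("title").text for i in channel.findall("item")]
print(feed_title, item_titles)
```

find(), findall() and SubElement() cover most of what the parser in this post needs.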
References
(To be honest, "copied from" is more accurate than "referred to" ^^)
urllib2 - http://www.voidspace.org.uk/python/articles/urllib2.shtml
elementtree - http://effbot.org/zone/element-index.htm
# -*- coding: utf-8 -*-
import socket
import sys
from urllib2 import Request, urlopen
# On Python 2.5+, xml.etree.ElementTree from the standard library works as well.
import elementtree.ElementTree as ET


class Rss(object):
    """A feed (channel) and the items it contains."""
    def __init__(self):
        self.id = 0
        self.link = ''
        self.title = ''
        self.description = ''
        # Created per instance so that separate feeds do not share one list.
        self.item_list = []


class RssItem(object):
    """A single <item> entry of a feed."""
    def __init__(self):
        self.id = 0
        self.title = ''
        self.link = ''
        self.description = ''
        self.pub_date = ''
        self.site_id = 0


def get_rss(rss_url):
    rss = Rss()
    req = Request(rss_url)
    try:
        # Give up if the server does not answer within 3 seconds.
        socket.setdefaulttimeout(3)
        response = urlopen(req)
    except IOError, e:
        if hasattr(e, 'reason'):
            print 'We failed to reach a server.'
            print 'Reason: ', e.reason
        elif hasattr(e, 'code'):
            print 'The server couldn\'t fulfill the request.'
            print 'Error code: ', e.code
        # Without a response there is nothing to parse, so stop here.
        sys.exit(1)
    try:
        rss_content = response.read()
        tree = ET.fromstring(rss_content)
        # The root is <rss>; its first child is <channel>.
        channel = tree[0]
        rss.title = channel.find('title').text.strip()
        rss.link = channel.find('link').text.strip()
        rss.description = channel.find('description').text.strip()
        items = channel.findall('item')
        for item in items:
            rss_item = RssItem()
            rss_item.title = item.find('title').text.strip()
            rss_item.link = item.find('link').text.strip()
            rss_item.description = item.find('description').text.strip()
            rss_item.pub_date = item.find('pubDate').text.strip()
            rss.item_list.append(rss_item)
    except Exception, e:
        # A missing or empty element ends up here (e.g. .text is None).
        print e
    return rss


if __name__ == '__main__':
    site = get_rss("http://no99.tistory.com/rss")
    print site.title
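The code above is Python 2 (urllib2, print statements). For anyone on Python 3, roughly the same parser might look like the sketch below. It is not the code from this post: parse_rss, fetch_rss and text_of are hypothetical names, plain dicts stand in for the Rss and RssItem classes, and missing or empty elements come back as empty strings instead of raising an error.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen


def text_of(parent, tag):
    """Return the stripped text of a child element, or '' if missing or empty."""
    return (parent.findtext(tag) or "").strip()


def parse_rss(xml_text):
    """Parse RSS 2.0 markup into a dict; raises ValueError without a <channel>."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("no <channel> element found")
    return {
        "title": text_of(channel, "title"),
        "link": text_of(channel, "link"),
        "description": text_of(channel, "description"),
        "items": [
            {
                "title": text_of(item, "title"),
                "link": text_of(item, "link"),
                "description": text_of(item, "description"),
                "pub_date": text_of(item, "pubDate"),
            }
            for item in channel.findall("item")
        ],
    }


def fetch_rss(url, timeout=3):
    """Download a feed and parse it; network errors propagate to the caller."""
    with urlopen(url, timeout=timeout) as response:
        return parse_rss(response.read())
```

Keeping the parsing separate from the download means parse_rss can be tried on a string without touching the network, and fetch_rss("http://no99.tistory.com/rss")["title"] would correspond to the site.title printed above.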