Fetching a document with the URL Fetch API is simple, especially if one trusts that the source is up and will respond within the GAE time limits:
from google.appengine.api import urlfetch
from xml.dom import minidom

def parse(url):
    r = urlfetch.fetch(url)
    if r.status_code == 200:
        return minidom.parseString(r.content)
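The parse function above takes success on faith. For the less trusting, here is a minimal sketch of a more defensive variant. The name parse_safely, the fetch parameter and the 10 second default deadline are assumptions made for illustration, not part of the original snippet; the fetch parameter exists only so the function can be tried outside App Engine.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

try:
    from google.appengine.api import urlfetch  # only importable on App Engine
except ImportError:
    urlfetch = None

# Assumption: on App Engine a failed or timed-out fetch surfaces as
# urlfetch.DownloadError; elsewhere IOError stands in for it.
FETCH_ERRORS = (urlfetch.DownloadError,) if urlfetch else (IOError,)


def parse_safely(url, fetch=None, deadline=10):
    """Return a minidom Document, or None if the fetch or the parse fails."""
    if fetch is None:
        fetch = urlfetch.fetch
    try:
        r = fetch(url, deadline=deadline)
    except FETCH_ERRORS:
        return None  # source unreachable or too slow
    if r.status_code != 200:
        return None  # source answered, but not with the document
    try:
        return minidom.parseString(r.content)
    except ExpatError:
        return None  # body was not well-formed XML
```

Callers only need to check the result for None before touching the DOM, so the call site stays simple.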
Accessing the resulting DOM with minidom is just as simple. Here the source is an Atom feed:
import time

dom = parse(URL)
for entry in dom.getElementsByTagName('entry'):
    try:
        published = entry.getElementsByTagName('published')[0].firstChild.data
        published = time.strftime('%a, %d %b', time.strptime(published, '%Y-%m-%dT%H:%M:%SZ'))
    except (IndexError, ValueError):
        # No <published> element, or a timestamp in an unexpected format.
        pass
…
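To try the loop above without a live feed, here is a minimal sketch that runs it against an inline Atom snippet. The sample XML and the helper name published_dates are made up for illustration; the date handling is the same as in the loop, with an extra AttributeError case for an empty published element.

```python
import time
from xml.dom import minidom

# Hypothetical sample feed, made up for illustration.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title><published>2010-03-05T12:30:00Z</published></entry>
  <entry><title>No date</title></entry>
  <entry><title>Bad date</title><published>yesterday</published></entry>
</feed>"""


def published_dates(dom):
    """Return one formatted date per entry, or None where it is missing or malformed."""
    dates = []
    for entry in dom.getElementsByTagName('entry'):
        try:
            node = entry.getElementsByTagName('published')[0]
            raw = node.firstChild.data
            parsed = time.strptime(raw, '%Y-%m-%dT%H:%M:%SZ')
            dates.append(time.strftime('%a, %d %b', parsed))
        except (IndexError, AttributeError, ValueError):
            # IndexError: no <published> element; AttributeError: empty element;
            # ValueError: timestamp not in the expected format.
            dates.append(None)
    return dates


dates = published_dates(minidom.parseString(SAMPLE))
```

In an English locale the first entry comes out as Fri, 05 Mar, and the two entries without a usable date yield None.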
An earlier post has a C++ snippet that creates a DB XML container (or opens it if it already exists) and puts a document read from stdin into that container. The same snippet in Python is pretty much identical: