It uses a regular expression to extract the thread title and compares it with the previous one, saved in a text file, to determine whether it's new. Here is the code:
URL = 'http://forum.xda-developers.com/showthread.php?t=1355660'
SAVE_FILE = 'ics'
REG_EXP = '<title>.+?\] '

import urllib2, re, time, webbrowser

#### Load page
print "Loading page..."
html_content = urllib2.urlopen(URL).read()

#### Search for version info
try:
    match = re.findall(REG_EXP, html_content)[0]
    match = re.findall('\[.+', match)[0]  # remove <title>
except IndexError:
    match = "ERROR! Can't match regular expression: " + REG_EXP

### Load SaveFile
try:
    old = open(SAVE_FILE).readline()
except IOError:
    old = 'file not found'

if old == match:
    ## No new version
    print "Old Version:", match
    time.sleep(5)  # wait and close
else:
    ## Update found!!
    open(SAVE_FILE, 'w').write(match)
    raw_input("UPDATE! \t" + match + "\n\nPress any key to open url.")  # print and pause
    webbrowser.open_new_tab(URL)
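The script above is Python 2 and fetches the page as it runs, so it is hard to try out offline. As a rough sketch of the same two steps (pull the bracketed part out of the title, then compare it with what was saved), here is a Python 3 version split into two small functions. The names `extract_tag` and `is_new_title` are made up for this illustration, and it returns `None` instead of an error string when nothing matches.

```python
import re
from pathlib import Path

# Same idea as REG_EXP above: everything after <title> up to the first "] ".
TITLE_TAG_RE = re.compile(r"<title>(.+?\]) ", re.DOTALL)


def extract_tag(html):
    """Return the bracketed prefix of the page title, or None if absent."""
    m = TITLE_TAG_RE.search(html)
    if m is None:
        return None
    title_part = m.group(1)
    start = title_part.find("[")
    if start == -1:
        return None
    return title_part[start:]


def is_new_title(current, save_file):
    """Compare current with the saved value; save it and return True if it changed."""
    path = Path(save_file)
    try:
        old = path.read_text(encoding="utf-8").strip()
    except FileNotFoundError:
        old = None  # first run: nothing saved yet
    if old == current:
        return False
    path.write_text(current, encoding="utf-8")
    return True
```

To use it, pass the downloaded HTML to `extract_tag` and, if the result is not `None`, hand it to `is_new_title` together with the save-file name (`'ics'` in the script above).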
This is a more recent version that is simpler and can check multiple URLs. It only prints each topic title, though, without checking whether it's new.
URLs = [
    'http://forum.xda-developers.com/showthread.php?t=1766550',
    'http://forum.xda-developers.com/showthread.php?t=1355660',
]

import urllib2

for URL in URLs:
    #### Load page
    #print "Loading page: " + URL
    html = urllib2.urlopen(URL).read()
    begin = html.find('<title>')
    end = html.find('- xda', begin)
    title = html[begin+len('<title>'):end].strip()
    print title
    #print '\n'

raw_input("Press any key")
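One thing to watch with the `find` approach: `str.find` returns -1 when the marker is missing, and the slice then silently produces garbage instead of failing. A minimal Python 3 sketch of the same slicing with that case handled might look like this. The function name `extract_title` and the fallback to `</title>` are my own additions for illustration, not part of the script above.

```python
def extract_title(html, open_tag="<title>", suffix="- xda"):
    """Return the text between open_tag and suffix, or None if it can't be found."""
    begin = html.find(open_tag)
    if begin == -1:
        return None  # no <title> at all
    begin += len(open_tag)
    end = html.find(suffix, begin)
    if end == -1:
        # Assumed fallback: stop at the closing tag if the site suffix is missing.
        end = html.find("</title>", begin)
    if end == -1:
        return None
    return html[begin:end].strip()
```

Inside the loop, this would replace the three lines that compute `begin`, `end` and `title`, with a check for `None` before printing.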