RSS Feed Parsing Day 3 of 3
Day 54 of 100
One of the best things about the Python programming language is the vibrant community of developers writing packages that are easy to add to your projects and save you lots of time. A great example of this is the Universal Feed Parser package for easily working with common feeds, such as RSS and Atom. Newsfeed standards are well documented and the XML file structure is consistent, so writing your own parser would not be too difficult, but it would take time. Like many other people, I want to do something with the feed data and not have to worry about processing the file - just give me the data.
Universal Feed Parser does this - it takes an XML file of data from a wide range of supported newsfeed protocols, or it can directly process a URL to a supported newsfeed. For my application, I wanted to pull down the RSS feed for my blog and create a list of articles along with the link to each article. In the future, I want to run this application on a regular basis and send myself (and possibly a mailing list) the title of each blog post and a link to it. You can see how this package is perfect for what I need: I want easy access to the data, but I don't want to spend all day figuring out how to parse the file.
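To see just how little code that takes, here is a minimal sketch of parsing a feed directly from its URL (using my blog's feed address; feedparser handles the download itself):

import feedparser

# Parse the feed straight from the URL - no intermediate file needed
feed = feedparser.parse("https://covrebo.com/feeds/all.atom.xml")
print(feed.feed.title)  # the feed's own title element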
The application consists of two files: one script to save the data to a file and one to process the data. I could certainly pull the data directly with feedparser each time I ran the script, but it is best practice to only pull the data when you need it or when it changes. With this structure, I can pull the data once, save it to a file, and then work with the data.
The first script uses the requests package to get the RSS feed data and save it to a file.
pull_xml.py

import requests

URL = "https://covrebo.com/feeds/all.atom.xml"

def main():
    # Download the raw feed and write the bytes to a local file
    r = requests.get(URL)
    with open("covrebo.xml", "wb") as f:
        f.write(r.content)

if __name__ == '__main__':
    main()
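The script above always downloads the full feed. To follow the "only pull when it changes" idea more strictly, the request could be made conditional. This is just a sketch, not part of the original script: it assumes the server sends an ETag header and honors If-None-Match (many feed servers do), and the covrebo.etag filename is my own invention.

import os
import requests

URL = "https://covrebo.com/feeds/all.atom.xml"
ETAG_FILE = "covrebo.etag"  # hypothetical file holding the last seen ETag

def fetch_if_changed():
    headers = {}
    if os.path.exists(ETAG_FILE):
        with open(ETAG_FILE) as f:
            # Ask the server to skip the body if the feed is unchanged
            headers["If-None-Match"] = f.read().strip()
    r = requests.get(URL, headers=headers)
    if r.status_code == 304:
        print("Feed unchanged - nothing to do")
        return
    with open("covrebo.xml", "wb") as f:
        f.write(r.content)
    if "ETag" in r.headers:
        with open(ETAG_FILE, "w") as f:
            f.write(r.headers["ETag"])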
The second script opens the file of feed data that was created by the pull_xml.py script and uses feedparser to process and access the data. The feedparser.parse(FEED_FILE) call takes the data from the file and saves it as a feedparser object in the feed variable. From there, you can iterate over the object and access all of the attributes of each entry in the feed. A full list of available attributes can be found by inspecting the XML file, or a list of common elements can be found in the feedparser documentation for common RSS elements and common Atom elements. For this script, I loop over the feed entries and print a list of article titles and a link to each article.
parser.py

import feedparser

FEED_FILE = "covrebo.xml"

def main():
    # Parse the saved feed file into a feedparser object
    feed = feedparser.parse(FEED_FILE)
    # Print each article title along with its link
    for entry in feed.entries:
        print(f"{entry.title}: {entry.link}")

if __name__ == '__main__':
    main()
Example output:
Webscraping HTML Tables with Beautiful Soup 4: https://covrebo.com/webscraping-html-tables-with-beautiful-soup-4.html
HTTP Services and Searching JSON: https://covrebo.com/http-services-and-searching-json.html
Personal Data Backup Strategy 2 of 2: https://covrebo.com/personal-data-backup-strategy-2-of-2.html
...
Day 2 of 100 - Datetime and Timedelta Day 2 of 3: https://covrebo.com/day-2-of-100-datetime-and-timedelta-day-2-of-3.html
Day 1 of 100 - Datetime and Timedelta Day 1 of 3: https://covrebo.com/day-1-of-100-datetime-and-timedelta-day-1-of-3.html
Day 0 of 100 - Introduction and Orientation: https://covrebo.com/day-0-of-100-introduction-and-orientation.html
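Title and link are only two of the available attributes. As a sketch of pulling a few more common elements - published and summary appear in many feeds but not all, so this uses dictionary-style .get() lookups with fallbacks rather than attribute access:

import feedparser

FEED_FILE = "covrebo.xml"

feed = feedparser.parse(FEED_FILE)
for entry in feed.entries:
    # .get() returns a fallback instead of raising when an element is missing
    published = entry.get("published", "no publish date")
    print(f"{entry.title} ({published}): {entry.link}")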
From here, there are lots of different ways to go. First, I plan to make the script more universal by prompting the user for the feed URL and filename. In the scripts above, the feed URL and filename are hard coded, which is fine for my purposes, but prompting for user input will make it easier to use the script for more than just this blog and printing a list of article titles and links. Second, despite the apparent simplicity of the newsfeed protocols, they are incredibly powerful for tracking news and events. For example, I could save the results of the feed to a file, query the feed on a regular basis, and have the script send me an email each time a new article is posted; a rough sketch of that idea follows below. I am an avid podcast listener, and most podcast subscription links are RSS feeds, so I could check the feed and get an email each time a new episode is posted. I am also a user of RSS feeds for news, and I could write my own RSS reader using this package. One of my favorite features of the Python language is the vast library of packages that, just like feedparser, allow me to build awesome things without having to write all of the background functions - just let me work with the data.
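Here is that new-article sketch: the script remembers which links it has already reported and surfaces only the new ones. The seen_links.txt filename is a placeholder of my own, and the print call stands in for sending an email.

import os
import feedparser

FEED_FILE = "covrebo.xml"
SEEN_FILE = "seen_links.txt"  # placeholder record of links already reported

def check_for_new_entries():
    # Load the links that have already been reported
    seen = set()
    if os.path.exists(SEEN_FILE):
        with open(SEEN_FILE) as f:
            seen = {line.strip() for line in f}
    feed = feedparser.parse(FEED_FILE)
    new_entries = [e for e in feed.entries if e.link not in seen]
    for entry in new_entries:
        # A real version might send an email here instead of printing
        print(f"New article: {entry.title}: {entry.link}")
    # Record the new links so they are not reported again
    with open(SEEN_FILE, "a") as f:
        for entry in new_entries:
            f.write(entry.link + "\n")

if __name__ == '__main__':
    check_for_new_entries()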
Resources
Talk Python Training 100 Days of Code Course
Universal Feed Parser