Python Project to Parse Web Pages Such as Yahoo or
Google: created triggers when certain phrases were found
within web pages
We built a Python program to monitor and filter news
feeds over the internet.
- We built a program to monitor news feeds over the
internet.
- It filtered the news, alerting the user when it
noticed a news story that matched the user's
interests.
- Python concepts: implementing classes and their
attributes, understanding class methods, understanding
inheritance, telling the difference between a class and
an instance of that class, and using libraries as
black boxes.
- The purpose of the project was as follows: many web
sites have content that is updated on an unpredictable
schedule. News sites such as Google News are a good
example of this. One tedious way to keep track of this
changing content is to load a web site in your
browser and periodically hit the refresh
button.
- This process can be streamlined and automated by
connecting to the web site's RSS feed, using an RSS feed
reader instead of a web browser.
- An RSS (Really Simple Syndication) reader will
periodically collect and draw your attention to updated
content.
- An RSS feed consists of (periodically changing) data
stored in an XML-format file residing on a web
server.
- For this project we did not need to know XML or how to
access these files over the network. We used a special
Python module to deal with these lower-level
details.
- The higher-level details of the structure of the
Google News RSS feed were the focus of this problem
set.
- If we loaded the URL into a browser, we would see the
browser's interpretation of the XML code generated by
the feed.
- When you connect to the Google News RSS feed, you
receive a list of items.
- Each entry in this list represents a single news
item.
- In a Google News feed, every entry has the following
fields: guid (a globally unique identifier for the news
story), title, subject, summary, and link.
- The goal was to create an application that aggregates
several RSS feeds from various sources and acts on
all of them in exactly the same way, all in one
place.
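
The pieces described above (a story object with the five fields, phrase
triggers, and filtering over aggregated feeds) could be sketched roughly as
follows. This is a minimal illustration, not the project's actual code: the
names NewsStory, Trigger, PhraseTrigger, TitleTrigger, SummaryTrigger,
parse_feed, and filter_stories are hypothetical, the sample feed is made up,
and the standard-library xml.etree.ElementTree stands in for the unnamed
"special Python module" the project used. The mapping of the RSS category
and description tags to subject and summary is also an assumption.

```python
import re
import xml.etree.ElementTree as ET


class NewsStory:
    """One feed entry, holding the five fields listed above."""

    def __init__(self, guid, title, subject, summary, link):
        self.guid = guid
        self.title = title
        self.subject = subject
        self.summary = summary
        self.link = link


class Trigger:
    """Base class: subclasses decide whether a story should raise an alert."""

    def evaluate(self, story):
        raise NotImplementedError


class PhraseTrigger(Trigger):
    """Fires when a phrase appears as whole words in one field of a story."""

    field = None  # name of the NewsStory attribute to search; set by subclasses

    def __init__(self, phrase):
        # Case-insensitive, whole-word match on the given phrase.
        self.pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)

    def evaluate(self, story):
        return bool(self.pattern.search(getattr(story, self.field) or ""))


class TitleTrigger(PhraseTrigger):
    field = "title"


class SummaryTrigger(PhraseTrigger):
    field = "summary"


def parse_feed(xml_text):
    """Turn RSS 2.0 text into NewsStory objects (assumed tag mapping)."""
    stories = []
    for item in ET.fromstring(xml_text).iter("item"):
        stories.append(NewsStory(
            guid=item.findtext("guid", default=""),
            title=item.findtext("title", default=""),
            subject=item.findtext("category", default=""),
            summary=item.findtext("description", default=""),
            link=item.findtext("link", default=""),
        ))
    return stories


def filter_stories(stories, triggers):
    """Keep each story that fires at least one trigger, skipping repeated guids."""
    seen, matched = set(), []
    for story in stories:
        if story.guid not in seen and any(t.evaluate(story) for t in triggers):
            seen.add(story.guid)
            matched.append(story)
    return matched


# Made-up feed text so the sketch runs without network access.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><guid>a1</guid><title>Python 3 released</title><category>Tech</category>
<description>A new version ships.</description><link>http://example.com/a1</link></item>
<item><guid>b2</guid><title>Weather update</title><category>Local</category>
<description>Rain expected this weekend.</description><link>http://example.com/b2</link></item>
</channel></rss>"""

stories = parse_feed(SAMPLE_FEED)
alerts = filter_stories(stories, [TitleTrigger("python")])
print([s.title for s in alerts])  # ['Python 3 released']
```

Stories from several feeds can be concatenated before filtering, for example
filter_stories(parse_feed(feed_a) + parse_feed(feed_b), triggers), so every
source is handled by the same triggers. Because each trigger is a class
instance with a common evaluate method, new kinds of triggers can be added
by subclassing without changing the filtering code.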