November 15, 2016
SSIX is currently integrating a new type of source into the platform that captures information from generic news feeds (also called web feeds). This will allow the platform to ingest data from external websites that expose public feeds, such as those provided by Google News, Yahoo Finance and other news aggregators or publishers, which give access to their latest published news and articles.
Two main news feed formats are in common use: RSS and Atom. Let's look at the main differences between these two standards.
RSS has always been the better known and more widely used format, and its latest known version is 2.0. RSS 2.0 was released in 2002 under copyright by Harvard University, and its specification was last updated in 2009 (http://www.rssboard.org/rss-specification). Atom, on the other hand, was published by the IETF in 2005 as RFC 4287 (https://tools.ietf.org/html/rfc4287).
One of the major innovations introduced by Atom is the ability to provide an abstract of the entry in a separate tag (summary), distinct from the full content. Atom also introduced the type attribute, which specifies the kind of data a tag contains (for example plain text or HTML).
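To make these two features concrete, the following minimal sketch parses a made-up Atom entry with Python's standard `xml.etree.ElementTree` module and reads the summary, the content and their `type` attributes. The entry text and the `describe_entry` helper are illustrative assumptions and are not taken from the SSIX platform.

```python
import xml.etree.ElementTree as ET

# Namespace that RFC 4287 defines for Atom elements.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up entry, for illustration only.
ENTRY_XML = """
<entry xmlns="http://www.w3.org/2005/Atom">
  <title>Sample entry</title>
  <summary type="text">A short abstract.</summary>
  <content type="html">&lt;p&gt;The full &lt;b&gt;body&lt;/b&gt;.&lt;/p&gt;</content>
</entry>
"""


def describe_entry(xml_text):
    """Return (text, type) pairs for the summary and content of one entry."""
    entry = ET.fromstring(xml_text)
    result = {}
    for name in ("summary", "content"):
        element = entry.find(f"atom:{name}", ATOM_NS)
        if element is not None:
            # RFC 4287 treats a missing type attribute as "text".
            result[name] = (element.text, element.get("type", "text"))
    return result


info = describe_entry(ENTRY_XML)
print(info["summary"])
print(info["content"])
```

Because the `type` attribute travels with each element, a consumer can decide whether to display the value as plain text or to render it as HTML.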
Even though it is less widely used than RSS, Atom defines structures more precisely and, more importantly, has been recognized as an IETF standard.
The following table shows the main RSS elements and their equivalents in Atom:
| RSS | Atom |
|---|---|
| description | summary and/or content |
| lastBuildDate (in channel) | updated |
| managingEditor | author or contributor |
| pubDate | published (subelement of entry) |
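The table above can be turned into a small lookup for normalizing RSS data to Atom names. The sketch below is one possible approach, not the SSIX implementation: the `RSS_TO_ATOM` mapping, the `to_atom_names` helper and the sample item are assumptions made for illustration. Where the table offers two Atom equivalents, the first one is picked arbitrarily. RSS dates use the RFC 822 style while Atom uses RFC 3339, so date values are converted as well.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# RSS element name -> Atom element name, following the table above.
RSS_TO_ATOM = {
    "description": "summary",     # could also be "content"
    "lastBuildDate": "updated",
    "managingEditor": "author",   # could also be "contributor"
    "pubDate": "published",
}

# RSS elements whose values are RFC 822 style dates.
DATE_ELEMENTS = {"lastBuildDate", "pubDate"}


def to_atom_names(rss_element):
    """Map the children of an RSS element to Atom names, keeping their text."""
    fields = {}
    for child in rss_element:
        value = child.text
        if child.tag in DATE_ELEMENTS and value:
            # Convert e.g. "Tue, 15 Nov 2016 10:00:00 GMT" to an ISO 8601 string.
            value = parsedate_to_datetime(value).isoformat()
        fields[RSS_TO_ATOM.get(child.tag, child.tag)] = value
    return fields


# Made-up item, for illustration only.
ITEM_XML = """
<item>
  <title>Hello</title>
  <description>Short text</description>
  <pubDate>Tue, 15 Nov 2016 10:00:00 GMT</pubDate>
</item>
"""

atom_fields = to_atom_names(ET.fromstring(ITEM_XML))
print(atom_fields)
```

Elements that have no entry in the mapping, such as `title`, keep their name because both formats use it.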
The following is an excerpt from an example RSS feed:
<description>an example feed</description>
<title>1 &lt; 2</title>
<description>1 &lt; 2, 3 &lt; 4. In HTML, &lt;b&gt; starts a bold phrase and you start a link with &lt;a href=</description>
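In the XML source of a feed, markup characters inside an element have to be written as entities (`&lt;` for `<`, `&gt;` for `>`), and the parser decodes them again. The sketch below shows that round trip with Python's standard library. The sample text is shortened from the example above, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw_text = "1 < 2, 3 < 4. In HTML, <b> starts a bold phrase"

# escape() replaces &, < and > with entities so the text is safe inside an element.
escaped = escape(raw_text)
item_xml = f"<item><description>{escaped}</description></item>"

# Parsing decodes the entities again and returns the original text.
parsed = ET.fromstring(item_xml).findtext("description")

print(escaped)
print(parsed == raw_text)
```

After decoding, an RSS consumer still cannot tell from the element alone whether the text is plain text or HTML to be rendered, which is the ambiguity that Atom's type attribute removes.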
Here is an excerpt from an example Atom feed:
<?xml version="1.0" encoding="utf-8"?>
<title>Example Feed</title>
<subtitle>Subtitle text here</subtitle>
<title>Atom-Powered Robots Run Amok</title>
<summary>Summary text.</summary>
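A component that ingests both formats first has to tell them apart. One simple way, sketched below, is to look at the root element: an RSS 2.0 document starts with `rss`, while an Atom document starts with `feed` in the Atom namespace. The `detect_feed_type` helper and the sample documents are assumptions for illustration and do not describe how SSIX does it.

```python
import xml.etree.ElementTree as ET

# ElementTree writes namespaced tags as "{namespace}localname".
ATOM_FEED_TAG = "{http://www.w3.org/2005/Atom}feed"


def detect_feed_type(xml_bytes):
    """Return "rss", "atom" or "unknown" based on the document's root element."""
    root = ET.fromstring(xml_bytes)
    if root.tag == "rss":
        return "rss"
    if root.tag == ATOM_FEED_TAG:
        return "atom"
    return "unknown"


# Made-up minimal documents, for illustration only.
rss_doc = b'<rss version="2.0"><channel><title>Example</title></channel></rss>'
atom_doc = (
    b'<?xml version="1.0" encoding="utf-8"?>'
    b'<feed xmlns="http://www.w3.org/2005/Atom"><title>Example Feed</title></feed>'
)

print(detect_feed_type(rss_doc))
print(detect_feed_type(atom_doc))
```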