Simple Semantic Resolution - RSS 2.0 Module

Specification Version 1.0 (DRAFT)

This version : 7th May 2003

Latest version : http://purl.org/stuff/ssr

Danny Ayers, 2003. Comments appreciated: email blog

license/copyright will be 'primarily public domain'

Abstract

This specification defines the Simple Semantic Resolution (SSR) Module for the RSS 2.0 syndication format. The purpose of SSR is to provide a mechanism by which the semantics of an RSS 2.0 document can be unambiguously resolved to an RDF model. This is done by declaring the RSS 2.0 file to be an RDF representation and providing a mapping between the RSS 2.0 syntax and the RDF model. The mapping is declared using an XSLT stylesheet that gives an RSS 1.0-based representation; this RDF/XML serialization anchors the document to the RDF model. The XSLT stylesheet serves as part of a module's specification; processing with it will not be needed in most circumstances.

Put simply: an element is added to the source RSS 2.0 document which declares "this is RDF, and here is the mapping". This has absolutely no effect on the interpretation of the document as RSS 2.0 within the bounds of that specification, but enables the contents of RSS 2.0 files to be considered first class material for the Semantic Web.

This specification is intended to act as a bridge between RSS 2.0 and RSS 1.0 on a technical level. It is hoped that readers of this document will consider the technical aspects above any political concerns. No general criticism of either format is intended here.

Module Developers : see also SSR-Enabling an RSS 2.0 Module

1. Background
- 1.1 The Problem
- 1.2 A Solution
2. Who has to do What, and Why
3. Example Usage
4. Definitions
5. Further Development
- 5.1 SSR and Modules
- 5.2 Potential Problems
6. Acknowledgements
7. Changes

1. Background

1.1 The Problem

RSS 2.0 is an easy format to generate, and has clearly defined semantics within the core application domain of simple syndication. It also offers an extension mechanism through the use of XML namespaces. Languages defined in other namespaces may be directly inserted into an RSS 2.0 document. However, the manner in which such a language (combined with core RSS 2.0) should be interpreted by an RSS 2.0 agent is not formally defined within the RSS 2.0 specification, and it falls to the author of the language/module to fully describe the interpretation.

RSS 1.0, on the other hand, is perceived as being difficult to generate, but has far more widely defined semantics thanks to its foundation as a Resource Description Framework (RDF) language. By using the framework, a module author is spared a great deal of effort in defining how their extension should be interpreted, as the basic data relationships are defined already in RDF.

1.2 A Solution

It has been suggested many times that the standard RDF/XML serialization format is inelegant and cumbersome. But the RDF community is fairly united in the view that the important part of RDF is the model, not the serialization. Other representations are available - Notation 3, N-Triples and the node and arc graph visualization are all in common use.

Both RSS 1.0 and RSS 2.0 describe the same basic data structures (channels, items and so on) and within the RSS domain they effectively share the same data model. That this model can be described in terms of the RDF model is evinced by RSS 1.0. The RDF model in general is very powerful, but the RSS 1.0 (RDF/XML) format is considered difficult.

The solution proposed here is the obvious one : take the simple syntax of RSS 2.0 and combine it with the powerful model of RDF.

There are two requirements for this. The first is a way of mapping the RSS 2.0 syntax to the RDF model, the second is a means of declaring in an RSS 2.0 file that such a mapping is available.

Of significance is the fact that for most practical purposes the mapping is only needed in one direction : RSS 2.0 to RDF. Generally speaking, generation of RSS 2.0 from an application that is aware of the RDF model should be a relatively simple operation, as the target model expressed in XML could be described as a 'semantic downcast' from an RDF representation.

The mapping required can be expressed as an XSLT transform that converts RSS 2.0 into RSS 1.0. The use of XSLT to convert arbitrary XML to RDF is a relatively common practice, but this specification extends the idea by demanding an explicit assertion of the mapping within the source document. This is not unlike the children's game "Simon Says", where commands are ignored unless qualified by the key phrase - "touch your nose" (do nothing) "Simon says 'touch your nose'" (touch your nose). This analogy provides an alternate title for the specification : Simon Says: "RDF".

Processing can be applied directly to the data in an SSR-enabled feed without the application needing to use either XSLT or RDF/XML. The difference is that SSR can remove any ambiguity from the feed, so it is possible to be sure that an application is behaving as intended.

2. Who has to do What, and Why

One of the aims of SSR is to minimize the effort required, and maximize the benefit gained, for everyone involved in the syndication process.

2.1 End User

The end user should notice no difference; all being well, they smile more.

2.2 Content Provider

The feed producer has to ensure that the modules they are using are appropriately supported - that there is a reference map available somewhere. If their material is particularly complicated, they may have to provide the XSLT themselves. Apart from this it's just a matter of adding a single element to the feed.

If you anticipate a considerable amount of use of the richer content, then you may be advised to consider running the XSLT locally and supplying the transformed data in a separate feed.
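As a rough illustration of "adding a single element to the feed", the following Python sketch inserts an <ssr:rdf> declaration into an existing RSS 2.0 document using only the standard library. The function name add_ssr_declaration is hypothetical and is not defined by this specification; the sketch also assumes the feed is small enough to parse in memory.

```python
import xml.etree.ElementTree as ET

SSR_NS = "http://purl.org/stuff/ssr"  # namespace given in section 4.1


def add_ssr_declaration(rss_xml: str, transform_uri: str) -> str:
    """Return the feed with <ssr:rdf> added as the first child of <rss>.

    Hypothetical helper for illustration; not part of the SSR specification.
    """
    ET.register_namespace("ssr", SSR_NS)  # serialize with the recommended prefix
    root = ET.fromstring(rss_xml)
    if root.tag != "rss":
        raise ValueError("expected an RSS 2.0 document with an <rss> root")
    tag = f"{{{SSR_NS}}}rdf"
    if root.find(tag) is None:  # leave an existing declaration untouched
        declaration = ET.Element(tag, {"transform": transform_uri})
        declaration.tail = "\n"
        root.insert(0, declaration)
    return ET.tostring(root, encoding="unicode")
```

Because the element lives in its own namespace, the rest of the feed is left exactly as it was.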

2.3 End-Use Application Developer

A consumer of RSS 2.0 can consume the data exactly as before, ignoring the additional element (the element is from another namespace). If the application can use RDF, then either stylesheet processing or other means can be used to extract the required information from the feed. An RSS 1.0 source can be synthesized from the RSS 2.0 feed.

Anyone building an application from scratch can save a lot of work by supporting SSR. Because the semantics contained in an SSR-enabled RSS feed are expressed in terms of the RDF model, using an interface to that model allows processing within the application to be decoupled. Typically this would involve using either an RDF API or making a simple custom representation of RDF artifacts (resources, properties etc.) within the application. RSS 1.0 data can be parsed into the RDF model directly. RSS 2.0 plus a set of known modules could also be treated in this fashion, though a more general solution would be XSLT pre-processing. Once the data is loaded into the RDF representation in the application, a common set of processing can then be applied to display or otherwise process the data.
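A minimal sketch of the "simple custom representation" approach, in Python with the standard library only: core RSS 2.0 is read straight into (subject, predicate, object) triples without XSLT. The mapping shown is an assumption modelled on the sample output in section 3.2 (title, description, and the sequence of items); it is not the reference stylesheet's full behaviour. The names rss2_to_triples and Literal, the blank node label _:items, and the fallback from <link> to <guid> are all hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSS1 = "http://purl.org/rss/1.0/"
DC = "http://purl.org/dc/elements/1.1/"


class Literal(str):
    """Marks an object as literal text rather than a resource URI."""


Triple = Tuple[str, str, str]


def _text_properties(element: ET.Element, subject: str) -> List[Triple]:
    """Map <title> and <description> children to dc:title and dc:description."""
    found: List[Triple] = []
    for name in ("title", "description"):
        value = element.findtext(name)
        if value is not None:
            found.append((subject, DC + name, Literal(value)))
    return found


def rss2_to_triples(rss_xml: str) -> List[Triple]:
    """Read core RSS 2.0 into triples (assumed mapping, see lead-in)."""
    channel = ET.fromstring(rss_xml).find("channel")
    if channel is None:
        raise ValueError("no <channel> element found")
    channel_uri = channel.findtext("link")
    if channel_uri is None:
        raise ValueError("channel has no <link> to identify it")
    seq = "_:items"  # blank node standing for the rdf:Seq of items
    triples: List[Triple] = [
        (channel_uri, RDF + "type", RSS1 + "channel"),
        (channel_uri, RSS1 + "items", seq),
        (seq, RDF + "type", RDF + "Seq"),
    ]
    triples += _text_properties(channel, channel_uri)
    position = 0
    for item in channel.findall("item"):
        # Assumption: identify an item by its <link>, falling back to <guid>.
        item_uri = item.findtext("link") or item.findtext("guid")
        if not item_uri:
            continue  # nothing to identify the item by
        position += 1
        triples.append((seq, f"{RDF}_{position}", item_uri))
        triples.append((item_uri, RDF + "type", RSS1 + "item"))
        triples += _text_properties(item, item_uri)
    return triples
```

Once a feed is held as triples like these, display and other processing can be written against the triples alone, regardless of which syntax the feed arrived in.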

2.4 Module Author

A module author (RSS 1.0 or 2.0) that wishes to take advantage of SSR should provide an XSLT stylesheet with their module, identified at a permanent address. The XSLT should be suitable for converting an RSS 2.0-compatible XML representation of information expressed using their module into an RDF/XML representation. The RDF/XML will be an RDF model that matches the semantics they intend in the RSS 2.0 syntax. A detailed description of the full procedure is provided in SSR-Enabling an RSS 2.0 Module.

2.5 Semantic Web Developer

One way of taking most advantage of SSR is to have your parser recognise the <ssr:rdf> element, and pump any RSS 2.0 containing this through an XSLT engine. Caching copies of the stylesheets you're interested in is probably a good idea.
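The caching suggestion might be sketched as follows in Python. The class StylesheetCache and the function fetch_over_http are hypothetical names, and the sketch assumes a cache that never expires, which may be too naive for real use. Applying the cached stylesheet is deliberately left out, since that needs an XSLT processor and the Python standard library does not include one.

```python
import urllib.request
from typing import Callable, Dict


def fetch_over_http(uri: str) -> bytes:
    """Retrieve a stylesheet from its address (default fetcher)."""
    with urllib.request.urlopen(uri) as response:
        return response.read()


class StylesheetCache:
    """Keeps one local copy of each stylesheet, keyed by its URI.

    Hypothetical helper for illustration; entries are never refreshed.
    """

    def __init__(self, fetch: Callable[[str], bytes] = fetch_over_http) -> None:
        self._fetch = fetch
        self._cache: Dict[str, bytes] = {}

    def get(self, uri: str) -> bytes:
        """Return the stylesheet, fetching it only on first request."""
        if uri not in self._cache:
            self._cache[uri] = self._fetch(uri)
        return self._cache[uri]
```

Passing the fetcher in as a parameter keeps the cache testable without network access and lets an application substitute its own retrieval policy.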

However SSR simply disambiguates RSS 2.0 so that it can be interpreted using the RDF model. If SSR is used, then an inference engine could reason with the information supplied in an RSS 2.0 feed without going anywhere near XSLT or RDF/XML.

If you are working with an RDF Schema that may be used in conjunction with the RSS 1.0 vocabulary, then you should consider making your vocabulary available for use in RSS 2.0 by following the steps described in 2.4 above.

3. Example Usage

3.1 RSS 2.0 Source

This is a lot easier than the verbiage might suggest. Here is an example :

<?xml version="1.0"?>
<rss version="2.0"
     xmlns:ssr="http://purl.org/stuff/ssr">

  <ssr:rdf transform="http://w3future.com/weblog/gems/rss2rdf.xsl" />

  <channel>
    <title>A Sample Feed</title>
    <link>http://www.example.org/</link>
    <description>For demonstration purposes</description>
    <item>
      <title>A Simple Item</title>
      <link>http://www.example.org/something.html</link>
      <guid>http://www.example.org/something.html</guid>
      <pubDate>Tue, 08 Apr 2003 10:28:59 GMT</pubDate>
      <description>Here is the descriptive text.</description>
    </item>
  </channel>
</rss>

The added line says :

this is RDF data
to interpret this data you can :
- apply the stylesheet at this address
- interpret the result as RDF/XML

But you don't have to apply the stylesheet at this address. It isn't necessary for an XSLT processor to be involved in day-to-day running at all. That is entirely up to the consumer of the feed.

A consumer that understands the RDF model can interpret the feed as RDF. This may or may not involve the use of XSLT.

The benefit is that the existing RSS 2.0 channel can now transparently act as a conduit for semantically rich data, and be backed by the formal specifications of the RDF framework.

3.2 Transformed Source

An interpreter that chooses to apply the XSLT transformation to the source above will obtain the following :

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:r="http://backend.userland.com/rss2"
         xmlns:content="http://purl.org/rss/1.0/modules/content/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">

  <channel rdf:about="http://www.example.org/">
    <dc:title>A Sample Feed</dc:title>
    <dc:description>For demonstration purposes</dc:description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://www.example.org/something.html"/>
      </rdf:Seq>
    </items>
  </channel>

  <item rdf:about="http://www.example.org/something.html">
    <dc:title>A Simple Item</dc:title>
    <dc:date>2003-04-08T10:28:59</dc:date>
    <dc:description>Here is the descriptive text.</dc:description>
    <content:encoded><![CDATA[Here is the descriptive text.]]></content:encoded>
  </item>

</rdf:RDF>

output.rdf

3.3 RDF Model

3.3.1 Graph Visualization

The RDF that the original RSS 2.0 represents can then be used in any RDF system, or visualized like this :

[apologies, I can't get a decent scaled image at the moment - using Isaviz]

Larger version (from W3C RDF Validator)

SVG Version

3.3.2 Triples of the Data Model in N-Triples Format (Sub, Pred, Obj)

<http://www.example.org/> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/rss/1.0/channel> .
<http://www.example.org/> <http://purl.org/dc/elements/1.1/title> "A Sample Feed" .
<http://www.example.org/> <http://purl.org/dc/elements/1.1/description> "For demonstration purposes" .
_:jARP67745 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq> .
_:jARP67745 <http://www.w3.org/1999/02/22-rdf-syntax-ns#_1> <http://www.example.org/something.html> .
<http://www.example.org/> <http://purl.org/rss/1.0/items> _:jARP67745 .
<http://www.example.org/something.html> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/rss/1.0/item> .
<http://www.example.org/something.html> <http://purl.org/dc/elements/1.1/title> "A Simple Item" .
<http://www.example.org/something.html> <http://purl.org/dc/elements/1.1/date> "2003-04-08T10:28:59" .
<http://www.example.org/something.html> <http://purl.org/dc/elements/1.1/description> "Here is the descriptive text." .
<http://www.example.org/something.html> <http://purl.org/rss/1.0/modules/content/encoded> "Here is the descriptive text." .

(from W3C RDF Validator)
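For applications that hold the model as simple triples, a listing in this style can be produced with a few lines of Python. This is only a sketch: the Literal marker class and the to_ntriples function are hypothetical, blank nodes are assumed to be strings starting with "_:", and non-ASCII characters, language tags and datatypes are not handled.

```python
from typing import Iterable, Tuple


class Literal(str):
    """Marks an object as literal text rather than a resource URI."""


def _term(value: str) -> str:
    """Format one subject, predicate or object in N-Triples syntax."""
    if isinstance(value, Literal):
        escaped = (
            value.replace("\\", "\\\\")
            .replace('"', '\\"')
            .replace("\n", "\\n")
            .replace("\r", "\\r")
        )
        return f'"{escaped}"'
    if value.startswith("_:"):
        return value  # blank node label, written as-is
    return f"<{value}>"  # anything else is treated as a URI reference


def to_ntriples(triples: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize triples one per line, each terminated by ' .'"""
    return "".join(f"{_term(s)} {_term(p)} {_term(o)} .\n" for s, p, o in triples)
```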

4. Definitions

4.1 Namespace

The namespace for Simple Semantics Resolution is:

http://purl.org/stuff/ssr

the recommended prefix is ssr, so the namespace declaration will read

xmlns:ssr="http://purl.org/stuff/ssr"

4.2 Element

There is only one XML element in the SSR language :

<ssr:rdf/>

When this element appears as a child of the top-level <rss> element of an RSS 2.0 document, it states that the contents of this document may be interpreted as RDF.

This element will usually be used in combination with the attribute described below, although alternate usage is suggested:

Inclusion of this element in an RSS 2.0 document without any attributes would be a hint to any application reading the data that an RDF interpretation is possible, but that the application should decide the mapping itself. In other words - take your best shot.

4.3 Attribute

There is only one XML attribute in the SSR language :

transform="uri-of-stylesheet"

This attribute is to be used in conjunction with the element above to state that the contents of this document may be interpreted as RDF, using the stylesheet identified by the attribute's value to produce RDF/XML. So the line:

<ssr:rdf transform="http://w3future.com/weblog/gems/rss2rdf.xsl" />

states:

"this feed, after processing by the identified XSLT will produce RDF/XML that may be interpreted as the RDF model of this feed"

There is no duty imposed on either the producer or consumer of the feed to use the identified stylesheet or even retrieve it.

However, such a stylesheet must be available somewhere.
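The three cases defined in this section (no declaration, a bare <ssr:rdf/>, and a declaration with a transform attribute) can be told apart with a short check. The Python sketch below uses only the standard library; the function name ssr_declaration and its return shape are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

SSR_RDF = "{http://purl.org/stuff/ssr}rdf"  # <ssr:rdf> in ElementTree notation


def ssr_declaration(rss_xml: str) -> Tuple[bool, Optional[str]]:
    """Report whether the feed declares itself as RDF, and with which mapping.

    Returns (False, None) when no declaration is present,
    (True, None) for a bare <ssr:rdf/> ("take your best shot"), and
    (True, uri) when a transform attribute names a stylesheet.
    """
    root = ET.fromstring(rss_xml)
    # find() with a plain tag looks at direct children only, matching the
    # rule that the element appears as a child of the top-level <rss>.
    element = root.find(SSR_RDF)
    if element is None:
        return (False, None)
    return (True, element.get("transform"))
```

An application that receives (True, uri) is still free to ignore the stylesheet and map the feed to RDF by other means.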

5. Further Development

5.1 SSR and Modules

Although the use of the SSR module does offer an important benefit to core RSS 2.0 data, the real gains are to be made when it is used in combination with other modules and namespaces.

It is anticipated that an RSS module author will provide a stylesheet that will convert RSS 2.0 documents including entities from their module into RSS 1.0. By mapping to RDF/XML in this way, the author is spared the trouble of detailing aspects like the relationship between their elements/attributes and the RSS 2.0 elements/attributes : all this will be made specific by applying the XSLT transformation and interpreting the results in terms of the RDF model.

A major bonus is that it will be possible to include existing RSS 1.0 modules and other RDF vocabularies in RSS 2.0 files without any loss or corruption of their semantics.

Multiple module (namespace) support may be problematic; that remains to be seen. But RSS 2.0 + module X is a start.

5.2 Potential Problems

The RSS 1.0 and RSS 2.0 specifications have drifted apart some distance from their common roots, so the mapping between the entities in each may not be precise. It is hoped that publication of this specification will help identify such mismatches.

There are likely to be related implementation issues (like the date format) but these may largely be resolvable within the XSLT (like the date format!).
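The date format issue can also be handled outside XSLT. As a hedged sketch in Python, the function below converts an RSS 2.0 pubDate (RFC 822 style) into the form shown for dc:date in section 3.2. The name pubdate_to_dc_date is hypothetical, and two assumptions are made: times with a zone are normalised to UTC, and the output omits a zone designator, as in the example.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def pubdate_to_dc_date(pub_date: str) -> str:
    """Convert an RSS 2.0 pubDate into the dc:date form used in section 3.2."""
    moment = parsedate_to_datetime(pub_date)
    if moment.tzinfo is not None:
        # Assumption: express zoned times in UTC before dropping the zone.
        moment = moment.astimezone(timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%S")


print(pubdate_to_dc_date("Tue, 08 Apr 2003 10:28:59 GMT"))  # 2003-04-08T10:28:59
```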

6. Acknowledgements

The particular XSLT stylesheet referred to in this document is Sjoerd Visscher's RSS to RDF converter. The RDF graph diagram was generated using IsaViz. Thanks to Uche Ogbuji for the inspiration. Many thanks to the folks on the RSS development mailing lists for positive and helpful feedback.

7. Changes

2003-05-07

Changes section added.
Name change: Simple Semantics Resolution.
Paragraph added to 2.3.
Paragraph added to 2.5.
Various minor changes.