Introduction to RDFa

This is a translation of the article Introduction to RDFa by Mark Birbeck, originally published in A List Apart on 23 June 2010. The translation is presented here with the consent of the publisher (A List Apart Magazine) and the author.

RDFa (“Resource Description Framework in attributes”) has been enjoying its moment of fame for a while. Google has started to process RDFa and Microformats as it indexes the web, using the data it collects to improve the display of search results with so-called “rich snippets”. Yahoo!, meanwhile, has been processing RDFa for a year. With these two search giants on the same trajectory, a new kind of web is closer than ever.

The web was designed to be used by human beings, and much of the rich and useful information it contains is inaccessible to machines. People can cope with all sorts of variations in layout, spelling, color, position and so on, and still extract the right meaning from the page. Machines, on the other hand, need some help.

A new kind of web, a semantic web, would be made of information marked up in such a way that software can easily understand it. Before considering how we might build such a web, let’s see what we might be able to do with it.

Improved search

Adding machine-readable data to a web page improves our ability to search. Imagine a news article that says something like “today the prime minister left for Australia”, referring to Britain’s prime minister, Gordon Brown. The article may not mention the prime minister by name, but it is easy to imagine this article appearing when someone searches for “Gordon Brown”.

If the article in question dates back to 1940, however, we would not want it to appear when a user searches for “Gordon Brown”, but when they search for “Winston Churchill”.

To achieve this with the same technique as in the Gordon Brown example (i.e. mapping one set of words to another), our search engine would need to know the start and end dates of the terms of office of every UK prime minister, and then cross-reference them with the publication date of the news article. This would not be entirely impossible, but what would happen if the article were a piece of fiction, or if it referred to the Australian prime minister? In these cases, a simple list of dates would not help us.

Indexing algorithms that try to deduce the necessary context from the text will certainly improve in the years to come, but additional markup that makes the information unambiguous can only make search more accurate.

Improved user interfaces

Yahoo! and Google have started using RDFa to improve the user experience by enhancing the way individual search results are displayed. Here is Google’s approach:

Rich snippet on Google

And here’s Yahoo!

A search result visually improved on Yahoo!

There is a commercial advantage in the search engine having a greater “understanding” of the contents of the pages it indexes: more relevant advertisements can be placed alongside the search results.

Now that we know why it would be desirable to add machine-readable data to our pages, we can ask ourselves how to do it.

HTML’s metadata features

You are certainly familiar with the metadata features supported by HTML. The most used are the meta and link elements. Some of you may know that the @rel attribute used with the link element can also be used with the a element. (In this context I use the term “HTML” to refer to the “HTML family of languages”, since what I say applies to both HTML and XHTML.)

We will consider these existing features first because they provide the conceptual foundations on which RDFa was conceived and implemented.

Using meta and link in HTML

The meta and link elements reside in the head section of a document. They allow us to provide information about that document. For example, I may want to specify that I created my document on May 9, 2009, that I am the author and that I grant other people the right to use the article as they see fit:

<pre class=”brush: php; html-script: true”>
<html>
<head>
<title>RDFa: Now everyone can have an API</title>
<meta name="author" content="Mark Birbeck" />
<meta name="created" content="2009-05-09" />
<link rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
</head>
.
.
.
</html>
</pre>

This example shows how HTML reserves a dedicated space for metadata, distinct from the text of the document itself. HTML uses the head section for metadata and the body section for the content of the page.

HTML also allows us to blur these two areas: we can place the @rel attribute on a clickable link while keeping the meaning it has on the link element.

Using @rel

Imagine that I want to allow visitors to my site to view my Creative Commons license. As things stand, the license information is hidden from readers because it is in the head section. This can be solved by adding an a element to the body of the document:

<pre class=”brush: php; html-script: true”>
<a href="http://creativecommons.org/licenses/by-sa/3.0/">
CC Attribution-ShareAlike</a>
</pre>

All this works and achieves our goals. First of all, we have machine-readable metadata in the head section that describes the relationship between the document and its license:

<pre class=”brush: php; html-script: true”>
<link rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
</pre>

Then, we have a link in the body that allows a human to click through and read the license:

<pre class=”brush: php; html-script: true”>
<a href="http://creativecommons.org/licenses/by-sa/3.0/">
CC Attribution-ShareAlike</a>
</pre>

But HTML also allows us to use the @rel attribute of the link element on an a element. In other words, it allows metadata that normally appears in the head section of a document to appear in the body.

With this powerful technique we can put both machine-readable metadata and a clickable link for humans in one place:

<pre class=”brush: php; html-script: true”>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">
CC Attribution-ShareAlike</a>
</pre>

This simple method of enriching markup with metadata is not often used on web pages, but it is the heart of RDFa. It brings us to the first key principle of RDFa.

Rule 1

The link and a elements imply that there is a relationship between the current document and another document; the @rel attribute allows us to provide a value that describes this relationship more precisely.

Do not forget: by using @rel on the a element we take advantage of an existing HTML feature, one that RDFa simply draws attention to.
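
To make Rule 1 concrete, here is a minimal Python sketch of how a program might read such relationships. It assumes only the standard library's html.parser module; the names RelCollector and find_relations and the sample page are invented for this illustration. It is not a conforming RDFa processor, just a demonstration that @rel can be read the same way on link and a elements.

```python
from html.parser import HTMLParser


class RelCollector(HTMLParser):
    """Collect (rel, href) pairs from link and a elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.relations = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # The same pair of attributes is read whether the element is in head or body.
        if tag in ("link", "a") and "rel" in attributes and "href" in attributes:
            self.relations.append((attributes["rel"], attributes["href"]))


def find_relations(html_text):
    """Return every (rel, href) pair found in the given markup."""
    collector = RelCollector()
    collector.feed(html_text)
    return collector.relations


# Sample page: the license relationship lives on a clickable link in the body.
SAMPLE = """
<html>
<head><title>Example</title></head>
<body>
<a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">
CC Attribution-ShareAlike</a>
</body>
</html>
"""

for rel, href in find_relations(SAMPLE):
    print(rel, href)
```

Moving the same rel and href pair onto a link element in the head would give the program exactly the same result, which is the point of the rule.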

Apply different licenses to images

The previous example provides license information about the page that contains it. But what should we do when a page contains multiple items, each of which has its own license? It does not take long to imagine scenarios where this could happen: just think of a page of search results on Flickr, YouTube or SlideShare.

RDFa takes the simple idea behind @rel (expressing a relationship between two things) and builds on it, allowing the attribute to be used together with the src attribute of the img element.

So, for example, imagine a page of results on Flickr:

<pre class=”brush: php; html-script: true”>
<img src="image1.png" />
<img src="image2.png" />
</pre>

Let’s say that the first image is distributed under a Creative Commons Attribution-ShareAlike license, while the second uses an Attribution-Noncommercial-No Derivative Works license. How would we mark that up?

If you thought it would be enough to put the @rel attribute on the img tag, you guessed right. To express two different licenses, one for each image, we simply do this:

<pre class=”brush: php; html-script: true”>
<img src="image1.png"
rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
<img src="image2.png"
rel="license" href="http://creativecommons.org/licenses/by-nc-nd/3.0/" />
</pre>

In the example above you can see a key principle in action: RDFa builds on the metadata features already present in HTML. This makes life simpler for anyone who wants to approach RDFa starting from concepts they already know from HTML.

Rule 2

The @rel and @href attributes are no longer limited to the a and link elements, but can also be used on img tags to indicate a relationship between the image and another item.
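
As a rough illustration of Rule 2, the following Python sketch reads each img element as a small statement of the form (image, relationship, target). It uses only the standard library; ImageRelationCollector, image_relations and the sample markup are assumptions made for this example, and the code mirrors the reading described in this article rather than the full RDFa processing rules.

```python
from html.parser import HTMLParser


class ImageRelationCollector(HTMLParser):
    """Collect (src, rel, href) triples from img elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # Only images that carry all three attributes describe a relationship.
        if tag == "img" and all(key in attributes for key in ("src", "rel", "href")):
            self.triples.append(
                (attributes["src"], attributes["rel"], attributes["href"])
            )


def image_relations(html_text):
    """Return one (image, relationship, target) triple per annotated image."""
    collector = ImageRelationCollector()
    collector.feed(html_text)
    return collector.triples


# Two images on the same page, each with its own license.
SAMPLE = """
<img src="image1.png"
rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/" />
<img src="image2.png"
rel="license" href="http://creativecommons.org/licenses/by-nc-nd/3.0/" />
"""

for image, relation, target in image_relations(SAMPLE):
    print(image, relation, target)
```

Each image is the starting point of its own statement, so two licenses can coexist on one page without ambiguity.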

Adding properties to the body

In our tour of HTML’s metadata features we also saw that we can add textual properties to the document:

<pre class=”brush: php; html-script: true”>
<meta name="author" content="Mark Birbeck" />
<meta name="created" content="2009-05-09" />
</pre>

This snippet tells us who created the document and when, but it can only be used in the head of the document. RDFa builds on this concept and allows it to be used in the body as well; @content is no longer confined to the meta tag, but can appear on any element.

Rule 3:

Normally, in an HTML document, properties are defined in the head section of the document, using @content on the meta tag. In HTML documents with RDFa, @content can be used to define properties on any element.

However, there is a slight change compared to the way @content is used in the head section. Since the @name attribute is already used for a different purpose in other parts of HTML, using it to represent the property name in the body could cause confusion. RDFa therefore provides a new attribute, called @property, which performs this function.

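Rule 4:

In the body of a document, the name of a property is given with the new @property attribute rather than with @name.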

Suppose that the publication date of our document and the name of its author are in the head section, and that the same information also appears in human-readable form in the body of the document:

<pre class=”brush: php; html-script: true”>
<html>
<head>
<title>RDFa: Now everyone can have an API</title>
<meta name="author" content="Mark Birbeck" />
<meta name="created" content="2009-05-09" />
</head>
<body>
<h1>RDFa: Now everyone can have an API</h1>
Author: <em>Mark Birbeck</em>
Created: <em>May 9th, 2009</em>
</body>
</html>
</pre>

With RDFa we can merge these two sets of information so that the metadata sits in the same place as the readable text:

<pre class=”brush: php; html-script: true”>
<html>
<head>
<title>RDFa: Now everyone can have an API</title>
</head>
<body>
<h1>RDFa: Now everyone can have an API</h1>
Author: <em property="author" content="Mark Birbeck">
Mark Birbeck</em>
Published: <em property="created" content="2009-05-09">
May 9th, 2009</em>
</body>
</html>
</pre>

We will see soon how this example can be improved. For now we just need to understand that metadata means the same thing whether it is written in the head or in the body; this is the equivalent, for properties, of the @rel technique that HTML already uses to establish relationships in the body of the document.

Use vocabularies

At this point, we need to make a small digression. We can get away with using @name="author" in the head of the document because, even though the “author” property is not defined in any specification, over the years people have come to expect to be able to use it. But RDFa allows (and requires) much more precision. When we use a term like “author” or “created”, we must indicate where that term comes from. If we do not, we have no way of knowing whether what you mean by “author” is the same thing that I mean.

This may not seem necessary. After all, how could anyone misread such an obvious term as “author”? But imagine that the term is “country” on a site about holidays: does it refer to the nation in which the holiday takes place, or does it say that the holiday takes place in the countryside rather than in the city? Many other words have different meanings in different contexts, and once you add the possibility of other languages, you will soon see that if we really want to get something out of our data, we need to be precise. That means indicating where each term comes from.

In RDFa we do this by stating that we want to use a certain collection of terms, called a vocabulary. It is easy: just specify the address of the vocabulary along with an abbreviated prefix that maps to it, like this:

<pre class=”brush: php; html-script: true”>
xmlns:dc="http://purl.org/dc/terms/"
</pre>

If you know something about XML, you will certainly recognize that it is the syntax used in the declaration of a namespace.

This example gives us access to the terms of the Dublin Core vocabulary through the prefix “dc”. Dublin Core includes many terms; the two that we will use in our example are “creator” and “created”. To use them we put the prefix in front of each one, like this:

<pre class=”brush: php; html-script: true”>
dc:creator
dc:created
</pre>

Now everything is clear: “dc:creator” is not the same thing as “xyz:creator”.
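
One way to picture what the prefix does is the following Python sketch, which expands a prefixed term into a full address by joining the vocabulary address and the term name. The function expand_term is a name made up for this example, and the address given for the “xyz” prefix is a placeholder, not a real vocabulary.

```python
def expand_term(term, prefixes):
    """Expand 'prefix:name' into a full address using a prefix mapping."""
    prefix, separator, name = term.partition(":")
    # A term with no prefix, or with an undeclared prefix, cannot be resolved.
    if not separator or prefix not in prefixes:
        raise KeyError(f"no mapping declared for term {term!r}")
    return prefixes[prefix] + name


PREFIXES = {
    "dc": "http://purl.org/dc/terms/",   # the mapping declared with xmlns:dc
    "xyz": "http://example.org/xyz/",    # hypothetical second vocabulary
}

print(expand_term("dc:creator", PREFIXES))
print(expand_term("xyz:creator", PREFIXES))
```

The two terms share the short name “creator”, yet they expand to different addresses, so software has no reason to confuse them.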

Note that the prefix mapping must be placed in the document above the location where it is used. In our example it could be declared on the body element or on the html element. The complete example looks like this:

<pre class=”brush: php; html-script: true”>
<html xmlns:dc="http://purl.org/dc/terms/">
<head>
<title>RDFa: Now everyone can have an API</title>
</head>
<body>
<h1>RDFa: Now everyone can have an API</h1>
Author: <em property="dc:creator" content="Mark Birbeck">
Mark Birbeck</em>

Published: <em property="dc:created" content="2009-05-09">
May 9th, 2009</em>

</body>
</html>
</pre>

There are many other vocabularies to choose from; I will list some in the second article of this series. Of course, nothing prevents you from creating your own vocabulary for use within your company, organization or interest group. But bear in mind one thing that often surprises people: there is no central organization that oversees this work. There are best practices to follow, but with power comes responsibility, so try to learn as much as possible before starting work on a new vocabulary.

Before returning to our example I should add one more thing about vocabularies. You may be wondering why @rel="license" has not had the same treatment as @property="author" and does not require a prefix. The answer is that HTML already has some predefined values for @rel, like “next” and “prev”; RDFa adds a few more. One of those added by RDFa is precisely “license”.

But once we step outside this list of values (for example, if we use the term “replaces” from the Dublin Core vocabulary or “knows” from the FOAF vocabulary), we have to use a prefix and a mapping, just as for @property.

For example, suppose our article not only has a Creative Commons license as seen above, but also replaces another document, a relationship that we can express with the Dublin Core term “replaces”. We express the two relationships like this:

<pre class=”brush: php; html-script: true”>
<html xmlns:dc="http://purl.org/dc/terms/">
<head>
<title>RDFa: Now everyone can have an API</title>
</head>
<body>
<h1>RDFa: Now everyone can have an API</h1>
Author: <em property="dc:creator" content="Mark Birbeck">
Mark Birbeck</em>

Created: <em property="dc:created" content="2009-05-09">
May 9th, 2009</em>

License: <a rel="license" href="http://creativecommons.org/licenses/by-sa/3.0/">
CC Attribution-ShareAlike</a>

Previous version: <a rel="dc:replaces" href="rdfa.0.8.html">
version 0.8</a>

</body>
</html>
</pre>

Well, now that we’re done with vocabularies we can go back to our main example.

Use inline text to set the value of a property

In the example above, the duplication of the text “Mark Birbeck”, once in the @content attribute and once in the body of the document, may have struck you as odd. If so, you are starting to think the RDFa way. We can in fact remove the @content value whenever the text in the body of the document already holds the value we want to use for the metadata:

<pre class=”brush: php; html-script: true”>
Author: <em property="dc:creator">Mark Birbeck</em>
</pre>

Rule 5:

If no @content attribute is present, the value of the property is set using the inline text of the element.

Although the @content technique is derived from HTML’s meta element, think of the previous example as the default way of setting a property. Providing a value in @content is a way to override the inline text when it does not say exactly what you want. It also gives authors greater freedom in the text the user will read, since they can be more precise in the embedded data. The date of publication illustrates this well; all the dates in these examples have the same meaning, yet they are presented differently to the user:

<pre class=”brush: php; html-script: true”>
<span property="dc:created" content="2009-05-14">May 14th, 2009</span>
<span property="dc:created" content="2009-05-14">May 14th</span>
<span property="dc:created" content="2009-05-14">14th May</span>
<span property="dc:created" content="2009-05-14">14/05/09</span>
<span property="dc:created" content="2009-05-14">tomorrow</span>
<span property="dc:created" content="2009-05-14">yesterday</span>
<span property="dc:created" content="2009-05-14">14 Mai, 2009</span>
<span property="dc:created" content="2009-05-14">14 maggio, 2009</span>
</pre>

Rule 6:

If the @content attribute is present, it overrides the element’s inline text when setting the property value.
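
Rules 5 and 6 together can be sketched in a few lines of Python. The example below uses only the standard library; PropertyCollector, read_properties and the sample markup are hypothetical names chosen for this illustration. It is deliberately simplified: it collapses whitespace in the inline text, ignores properties nested inside other properties, and does not cope with nested elements of the same name or with unclosed tags.

```python
from html.parser import HTMLParser


class PropertyCollector(HTMLParser):
    """Collect (property, value) pairs following Rules 5 and 6 (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._open = None  # (tag, property, content or None, text parts)

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "property" in attributes and self._open is None:
            self._open = (tag, attributes["property"], attributes.get("content"), [])

    def handle_data(self, data):
        # Gather inline text only while inside an element that names a property.
        if self._open is not None:
            self._open[3].append(data)

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            _, name, content, parts = self._open
            inline_text = " ".join("".join(parts).split())
            # Rule 6: @content wins when present. Rule 5: otherwise use inline text.
            value = content if content is not None else inline_text
            self.values.append((name, value))
            self._open = None


def read_properties(html_text):
    """Return the (property, value) pairs found in the given markup."""
    collector = PropertyCollector()
    collector.feed(html_text)
    return collector.values


SAMPLE = """
Author: <em property="dc:creator">Mark Birbeck</em>
Created: <span property="dc:created" content="2009-05-14">14th May</span>
"""

for name, value in read_properties(SAMPLE):
    print(name, "=", value)
```

The author's name is taken from the text the reader sees, while the date is taken from @content, so the visible wording can stay informal without making the data imprecise.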

We will see in the next part how to add properties to an image and how to add metadata to each item.
