Linked Data Deployment
Daniel Lewis OpenLink Software dlewis@openlinksw.com
The Web of Documents
The current ubiquitous web is a “Web of Documents” (aka “Document Web” or “Web 1.0”). On this web, documents (aka pages) are interconnected using embedded hyperlinks. When you click on a hyperlink, your browser takes you to a new web-based document at a specific location, which is identified by a Uniform Resource Locator (URL) and fetched using the HyperText Transfer Protocol (HTTP). These hyperlinks are embedded in the document with no machine-readable meaning, and sometimes without human-understandable meaning, because the hyperlink relationship is not a labelled path. For example, if there is a hyperlink from the XTech 2008 webpage to the W3C webpage, we cannot ask what that relationship is. This loosely interconnected web of documents could be called a “Universe of Discord”: because no hyperlink metadata is available, the web is quite chaotic and it can be particularly difficult to find what you are looking for.
The Web of Data
A growing awareness of the need for socially-aware web applications, coupled with the rising popularity of object-oriented web application development and a better understanding of semantics, is leading document web developers to recognise the importance of a data-centric web. This is the emerging “Web of Data” (or “Data Web”), which seeks to expose data not just inside a web page but also in a machine-friendly way.
Semantic Web formats and vocabularies are beginning to be used for the development of a “Web of Data”. Many Social Networking tools expose user profiles using the Friend of a Friend (FOAF) vocabulary, which describes a user and his/her contacts. FOAF and many other similar vocabularies are built using the Resource Description Framework (RDF), which is a framework for describing data objects and their relationships. However, a “Web of Data” isn’t a web without inbound and outbound links, and this is where the notion of Linked Data steps in.
Linked Data
What is Linked Data?
Linked Data is made up of interconnected data objects available through a web application. Each object is connected to other objects using labels which represent the relationship that the two objects have with each other. One such example is shown in Figure 1.
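A labelled link of this kind can be sketched as a subject, predicate, object triple. The short Python example below is an illustration only: both resource URIs are hypothetical placeholders under example.org, and `to_ntriples` is a helper invented for this sketch rather than part of any RDF library. Only the `owl:sameAs` property URI is a real, published identifier.

```python
# Illustrative sketch: the subject and object URIs are hypothetical placeholders.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # the label on the link


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialise one triple of URIs as a single N-Triples line (also valid N3)."""
    return f"<{subject}> <{predicate}> <{obj}> ."


triple = (
    "http://example.org/resource/Tim_BernersLee",  # hypothetical subject URI
    OWL_SAME_AS,
    "http://example.org/dblp/person/100007",       # hypothetical object URI
)
print(to_ntriples(*triple))
```

The predicate is what turns an anonymous hyperlink into a statement that a machine can interpret: here it says that the two URIs identify the same thing.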
timbernersleegraph.png
Figure 1. A simple Linked Data Graph representing that the Tim_BernersLee object is the same as the DBLP object 100007.
The above code is the RDF/N3 form for the triple described in Figure 1.
There are four rules that Tim Berners-Lee laid down in his “Design Issues Linked Data” report [BER06]:
•Identify things using valid Uniform Resource Identifiers (URIs)
•Use HTTP URIs so that people can look up those names
•Provide useful information when somebody uses the URI
•Include links to other URIs so that users can discover more things
Together, the above rules allow human users and machines to access and interpret data with minimal effort. However, we can increase the level of meaning by following some additional rules:
•Reuse publicly available ontologies and schemas wherever possible, and expose any that you have developed yourself.
•Exploit HTTP Server Rules to provide RDF when somebody asks for “application/rdf+xml”, and provide (X)HTML in other cases. This is known as content negotiation.
How do we deploy Linked Data?
There are a few different options for deploying Linked Data:
•Code your object relationship graphs using RDF and carefully follow the rules defined above. Then upload the graphs to the right location on the web, or upload them to a data server.
•Use a Data Server, such as OpenLink Virtuoso, and provide XHTML and RDF Views of data objects stored in the database.
However, two problems arise when deploying Linked Data:
•Data Access and Unambiguous Naming
•Data Reference and Ambiguous Association
Data Access and Unambiguous Naming
The Document Web does not separate identity from representation: the Uniform Resource Locator is also the Uniform Resource Identifier. The Data Web uses Uniform Resource Identifiers (URIs) to separate identity from representation, which means that an identified object does not have to exist at a physical location. In this case the Uniform Resource Identifier is not necessarily the Uniform Resource Locator, but could be a pointer to a pointer to an address.
Graph Spaces and Objects via a URI
As URIs are the key to a graph or object, it is very useful to use friendly address structures. There are technical documents available which discuss such friendliness [SAU08]. One such friendly URI is shown in Figure 2.
friendlyuri.png
Figure 2. An example graph hierarchy presented as a friendly URI.
Figure 2 shows a server with multiple containers (aka directories), each of which represents the graph at that level. These graphs contain objects, which are denoted in the URI after the hash fragment identifier.
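The structure described for Figure 2 can be made concrete with a small Python sketch that splits a hash-based URI into its graph URI, its container hierarchy and its object identifier. The example URI and the function name `split_friendly_uri` are assumptions made for illustration; the URI shown in the figure itself is not reproduced here.

```python
from urllib.parse import urldefrag, urlsplit


def split_friendly_uri(uri: str):
    """Return (graph_uri, containers, object_id) for a hash-based URI."""
    # urldefrag separates the part before '#' from the fragment after it.
    graph_uri, object_id = urldefrag(uri)
    # Each non-empty path segment is treated as one container (directory) level.
    path = urlsplit(graph_uri).path
    containers = [segment for segment in path.split("/") if segment]
    return graph_uri, containers, object_id


# Hypothetical friendly URI: two container levels, then the object after '#'.
graph, containers, obj = split_friendly_uri("http://example.org/people/staff#alice")
print(graph)       # the graph (document) URI
print(containers)  # the container hierarchy
print(obj)         # the object identifier
```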
Data Reference and Ambiguous Association
Data Reference is an issue because a URI can point to either a graph or a specific object. When a specific object is referenced, through either a hash-based URI or a slash-based URI, an object-level graph will need to be fetched. This is a challenge when using hash-based URIs, because everything after the hash is typically dealt with on the client side and is therefore never seen by the web server. In addition, an object should be described in RDF at the same location as the Document presenting that object. A Data Web Server can solve these issues using Content Negotiation and URL Rewriting.
Content Negotiation
Content negotiation is a process described in the HTTP specification [FIE99]. It allows a server to respond with the representation of a resource that best matches what the client requested. One advantage is that (X)HTML-based information can be tailored to different device types, for example by giving a different response depending on whether the user agent is a standard web browser or a mobile web browser.
StandardAndMobileNegotiation.png
Figure 3. Using content negotiation, a web browser can receive content suited to a standard screen view, and a mobile device can receive content suited to a mobile screen view, from the same dereferenceable URI.
However, content negotiation becomes a very powerful tool when there is a requirement to respond in either a Document Web format (e.g. XHTML) or a Semantic Web format (e.g. RDF/XML).
StandardAndSemanticNegotiation.png
Figure 4. Using content negotiation, a document web browser can receive (X)HTML and a Semantic Web browser can receive RDF from the same dereferenceable URI.
Here we see data with MIME type application/xhtml+xml (or text/html) being sent when requested by a Document Web Browser, and data with MIME type application/rdf+xml (or application/rdf+n3) being sent when requested by a Semantic Web Browser. Many of the technicalities and issues of this technique have been discussed at length [JAC04] [FIE05] [DBE08] [OLS08].
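The server-side decision can be sketched in a few lines of Python. This is a simplified illustration, not the behaviour of any particular server: the function names `parse_accept` and `negotiate` and the `SUPPORTED` list are assumptions, partial wildcards such as `text/*` are ignored, and an unmatched request falls back to HTML rather than returning a 406 response.

```python
# Representations this hypothetical server can produce.
SUPPORTED = ["application/rdf+xml", "application/xhtml+xml", "text/html"]


def parse_accept(header: str):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    entries = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # HTTP default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def negotiate(accept_header: str, default: str = "text/html") -> str:
    """Pick the MIME type to send back for a given Accept header."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in SUPPORTED:
            return media_type
        if media_type == "*/*":
            return default
    return default


print(negotiate("application/rdf+xml"))                                   # Semantic Web browser
print(negotiate("text/html,application/xhtml+xml,application/xml;q=0.9"))  # Document Web browser
```

The same URI is requested in both calls; only the Accept header differs, which is what lets one dereferenceable URI serve both kinds of client.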
URL Rewriting
The fragment identifier (the part of a URI after the #) is usually handled locally by the client and never reaches the server logic. A hash-based URI is therefore usually processed by the server without anything after its hash. For example, the Personal URI:
http://myopenlink.net/dataspace/person/danieljohnlewis#this
would appear as the graph URI:
http://myopenlink.net/dataspace/person/danieljohnlewis
The solution, as developed for OpenLink Virtuoso [OLS08], is to pre-process the URI using a URL rewriting pipeline, which is a set of rules to handle accept headers, response codes, response headers and rule processing. During this process regular expressions can be employed to handle the object identifier after the hash, which makes it possible to fetch the identified object via, for example, a SPARQL endpoint.
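The general shape of such a pipeline can be sketched in Python. This is not Virtuoso's rule syntax or API: the rule table, the `/sparql` endpoint path, the `about.html` target and the function name `rewrite` are all hypothetical, and the sketch assumes that every person object uses the `#this` fragment shown in the example above. Each rule pairs a regular expression over the request path with a pattern over the Accept header, and yields a response code and a Location header.

```python
import re
from urllib.parse import quote

PERSON_PATH = re.compile(r"^/dataspace/person/(?P<id>[^/#]+)$")

# Hypothetical rules: (path pattern, Accept pattern, status code, Location template).
RULES = [
    # RDF requested: redirect to a SPARQL DESCRIBE query for the object.
    (PERSON_PATH, re.compile(r"application/rdf\+xml"), 303,
     "/sparql?query={query}&format=application/rdf%2Bxml"),
    # Anything else: redirect to an (X)HTML page about the object.
    (PERSON_PATH, re.compile(r".*"), 303,
     "/dataspace/person/{id}/about.html"),
]


def rewrite(host: str, path: str, accept: str):
    """Return (status, location) from the first rule matching path and Accept."""
    for path_re, accept_re, status, template in RULES:
        match = path_re.match(path)
        if match and accept_re.search(accept):
            person = match.group("id")
            # The server never receives the fragment, so the object URI is
            # rebuilt from the path plus the assumed '#this' convention.
            describe = f"DESCRIBE <http://{host}/dataspace/person/{person}#this>"
            location = template.format(id=person, query=quote(describe, safe=""))
            return status, location
    return 404, None


print(rewrite("myopenlink.net", "/dataspace/person/danieljohnlewis", "application/rdf+xml"))
print(rewrite("myopenlink.net", "/dataspace/person/danieljohnlewis", "text/html"))
```

Rule order matters in this sketch: the more specific RDF rule is listed first, and the catch-all HTML rule acts as the fallback.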
Linked Data and Data Portability
Socially-aware web applications such as Social Networking Tools have to be sensitive to users' data: they need to be secure and to enforce privacy measures. However, users of such systems are starting to demand the rights to their data, which is why the DataPortability Project [DPP] was started:
•Users want to be able to reuse their data in various systems
•Users want online security and privacy
•Users want to be able to move their data to/from wherever, just as they move files and folders on their local system
One solution to these problems is the use of Linked Data. With Linked Data:
•Users are able to reuse their data in various systems using data by reference
•Security measures can be built using trust relationships embedded in object graphs
•Linked Data can be carried from one location to another and retain links to objects in other systems
Linked Data and Data Spaces
A Data Space is a container for object-centric information. This information is linked inside the Data Space and across into other Data Spaces. Data Spaces cannot exist without Linked Data: the idea of a Data Space is to turn a “Universe of Discord” into a “Universe of Discourse”, where objects have subjects and relationships to other objects in an orderly fashion.
OpenLink Data Spaces [ODS] implement the Data Space Philosophy using Semantic Web and Linked Data technology and are user-centric, so every object is placed within a user's Data Space. Figure 5 shows the Linked Data of OpenLink Data Spaces.
DataSpacesAndObjects.png
Figure 5. The Linked Data provided in OpenLink Data Spaces, defined using the Semantically Interlinked Online Communities (SIOC) and Friend of a Friend (FOAF) formats.
References
[BER06] T. Berners-Lee. (2006) “Design Issues - Linked Data”
[DBE08] D. Berrueta & J. Phipps. (2008) “Best Practice Recipes for Publishing RDF Vocabularies”
[DPP] DataPortability.org. “The DataPortability Project”
[FIE99] R. Fielding et al. (1999) “Hypertext Transfer Protocol -- HTTP/1.1”
[FIE05] R. Fielding. (2005) “HttpRange-14 Resolved”
[JAC04] I. Jacobs & N. Walsh. (2004) “Architecture of the World Wide Web, Volume One”
[ODS] OpenLink Software. “OpenLink Data Spaces Wiki page”
[OLS08] K. Idehen & C. Blakeley. (2008) “Deploying Linked Data”
[OLV] OpenLink Software. “Virtuoso Universal Server”
[SAU08] L. Sauermann & R. Cyganiak. (2008) “Cool URIs for the Semantic Web”