Linked Data has four key principles, known as the rules of linked data:
1. Give things URIs
2. Make sure that URIs are HTTP URIs
3. Return something meaningful when a URI is looked up
4. Link things together
Assuming that the four principles are followed (in general the first three are, whereas the fourth often is not; see Hogan et al.'s brilliant paper about the Pedantic Web for a fresh perspective on the deployment of Semantic Web technologies), the Web of Linked Data should be a rich information network. Using a follow-your-nose strategy, an agent can start at any point in this space and move from resource to resource, discovering a plethora of relations and links. This is great: we can read data and discover and learn new things. But… what if we want to write things? What if we want to update data in this space with new facts and information as they become available? As a toy example, consider the following URI:
http://dbpedia.org/resource/Prime_Minister_of_the_United_Kingdom
Looking up this URI, a machine is told that the current Prime Minister is Gordon Brown; however, this is no longer the case, as it is now David Cameron. The underlying mechanism by which DBpedia data is generated relies on extracting information from Wikipedia pages and enriching that information with formal semantics. Such methods involve a large overhead when generating machine-readable data: large-scale information extraction, sophisticated disambiguation methods, triplification of information, etc.
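To make the follow-your-nose idea and the stale-data problem concrete, here is a toy sketch. It assumes an in-memory set of triples stands in for dereferencing HTTP URIs (a real agent would GET each URI and parse the returned RDF); the dbpprop:incumbent and dbpprop:office triples mirror the example discussed in the comments below.

```python
from collections import deque

# Toy stand-in for the Web of Linked Data (assumption: no real HTTP lookups).
DBR = "http://dbpedia.org/resource/"
DBPPROP = "http://dbpedia.org/property/"

TRIPLES = {
    (DBR + "Prime_Minister_of_the_United_Kingdom", DBPPROP + "incumbent", DBR + "Gordon_Brown"),
    (DBR + "Gordon_Brown", DBPPROP + "office", DBR + "Prime_Minister_of_the_United_Kingdom"),
}

def dereference(uri):
    """Principle 3: looking up a URI returns the triples describing it."""
    return [t for t in TRIPLES if t[0] == uri]

def follow_your_nose(start, max_hops=2):
    """Breadth-first traversal from `start`, following object links (principle 4).

    Resources up to `max_hops` links away from `start` are dereferenced;
    every triple encountered is collected and returned.
    """
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        uri, hops = queue.popleft()
        for s, p, o in dereference(uri):
            found.append((s, p, o))
            if o not in seen and hops + 1 <= max_hops:
                seen.add(o)
                queue.append((o, hops + 1))
    return found
```

Starting from the Prime Minister URI, the agent happily "learns" that Gordon Brown is the incumbent: reading works, but nothing in this picture lets anyone correct the fact.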
One solution to this problem is to empower linked data consumers to also amend or edit existing resource descriptions. After speaking with Alex Passant, Olaf Hartig, Bernhard Schandl and Hannes Mühleisen at the linked data gathering at this summer’s Extended Semantic Web Conference, it was suggested that a fifth principle for linked data could be created to flow from the above four. This fifth principle would, in essence, allow data to be pushed or put onto a given URI.
5. Push data onto URIs
Implementation of this principle is open to interpretation. For example, if you have a URI which describes a resource that has fixed data (e.g. the DBpedia URI for the Battle of Waterloo), then pushing data onto such a URI should not be allowed. There may be cases where a resource description needs to be updated (e.g. my data.dcs URI); however, the party who wishes to update the URI must be authorised to do so.
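The two rules in the paragraph above (fixed resources reject pushes; updatable resources accept them only from authorised parties) can be sketched as follows. This is a hypothetical model, not an existing API: `Resource`, `push` and `PushRejected` are illustrative names, and authorisation is reduced to membership in a set of agent names.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    """A resource description published at a URI (hypothetical model)."""
    uri: str
    triples: set = field(default_factory=set)
    fixed: bool = False                            # e.g. the Battle of Waterloo
    authorised: set = field(default_factory=set)   # agents permitted to push

class PushRejected(Exception):
    """Raised when a push violates the resource's update policy."""

def push(resource, agent, new_triples):
    """Apply a push only if the resource is not fixed and the agent is authorised."""
    if resource.fixed:
        raise PushRejected(f"{resource.uri} holds fixed data")
    if agent not in resource.authorised:
        raise PushRejected(f"{agent} may not update {resource.uri}")
    resource.triples |= set(new_triples)
    return resource
```

How authorisation is actually established (WebID, OAuth, signed requests, ...) is deliberately left open here, as it is in the principle itself.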
Moving towards pushing data would not only reduce the computation forced on dataset generation processes, but would also produce a writable web of data, something which would enable querying of up-to-date resource descriptions.
It is really amazingly helpful.
Is it principally a principle?
The four principles are easy to remember and follow; all are very broad and very simple in principle. They can be implemented and expressed in a number of ways.
I can absolutely see the value in pushing data to URIs; indeed, I'd very much like to play with that at some point.
But is it a principle? Is this not something that could, perhaps should, become a community norm, rather than a principle that might stop many people publishing their data at all, or leave them believing that their triples and links haven't made it to Linked Data status after all?
Good point.
I think that this was raised when discussing the notion of an additional principle on Monday evening. Maybe an 'amendment', or an extra clause to an existing principle (say, principle 3), could give linked data producers the option to allow URI updates. Food for thought.
sparqlPuSH
And, of course, one way to distribute the updates could be through the use of sparqlPuSH, as proposed by Alex and myself:
http://www.semanticscripting.org/SFSW2010/papers/sfsw2010_submission_6.pdf
Yes, this is a solution to the problem of updating what is at a URI. I think that URI PATCH/PUSH/PUT should be facilitated through a range of means, depending on what the URI host can support.
What about dynamic relationships?
Matthew, thanks for the post!
Eager to join the conversation, I wrote a post myself:
http://knoesis.wright.edu/students/pablo/blog/2010/06/01/re-a-proposed-5...
The main points are:
I envision a network of trusted Linked Data servers where one party can digitally sign a SPARQL UPDATE and send the update to another trusted party.
Pushes about “fixed data” should not only be allowed, but encouraged if we want to increase value in the Linked Open Data cloud.
Cheers,
Pablo
Thanks for reading, Pablo. You raise some interesting points in your post. I really like the idea of digitally signing a SPARQL UPDATE query and then using that query elsewhere, given that I am now a trusted entity.
Re: fixed data, there will be fixed data and changeable data. I think that the former should not change, or rather, that the publisher should impose strict constraints on its modification (maybe allowing changes to only certain properties of the resource). The latter, by contrast, could be altered by the crowd, auditing and amending the data as required.
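A minimal sketch of signing a SPARQL UPDATE before sending it to another trusted party, using only the Python standard library. Note the swapped-in technique: this uses HMAC-SHA256 with a pre-shared key between two servers, which is a symmetric stand-in for the public-key digital signatures Pablo's post envisions. The key and the update text are illustrative only.

```python
import hashlib
import hmac

# Assumption: the two trusted servers exchanged this secret out of band.
# A real deployment would more likely use public-key signatures, so that
# any party can verify who sent the update.
SHARED_KEY = b"hypothetical-shared-secret"

def sign_update(update: str, key: bytes = SHARED_KEY) -> str:
    """Return a hex HMAC-SHA256 over the UTF-8 bytes of a SPARQL UPDATE string."""
    return hmac.new(key, update.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_update(update: str, signature: str, key: bytes = SHARED_KEY) -> bool:
    """Accept the update only if its MAC matches (constant-time comparison)."""
    return hmac.compare_digest(sign_update(update, key), signature)

UPDATE = """PREFIX dbpprop: <http://dbpedia.org/property/>
DELETE DATA { <http://dbpedia.org/resource/Prime_Minister_of_the_United_Kingdom>
                dbpprop:incumbent <http://dbpedia.org/resource/Gordon_Brown> } ;
INSERT DATA { <http://dbpedia.org/resource/Prime_Minister_of_the_United_Kingdom>
                dbpprop:incumbent <http://dbpedia.org/resource/David_Cameron> }"""
```

The receiving server would run `verify_update(UPDATE, signature)` and apply the update to its store only if it returns `True`; any tampering with the query text in transit invalidates the signature.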
Rigidity
Thanks for the answer, Matthew. The idea of having policies for what is changeable sounds interesting. Do you see anything beyond rigid properties [1] being "fixed data" that can't be modified? Because if you don't, then the policies can maybe just be realized by annotating properties [2]?
[1] http://en.wikipedia.org/wiki/OntoClean#Rigidity
[2] http://www.w3.org/TR/owl-ref/#AnnotationProperty-def
I think that the notion of changeability and the policy behind its implementation/application should be left up to the linked data provider. If they believe that their dataset contains resources which alter over time then the appropriate actions should be permissible - but only to those who are allowed to do so.
Annotation properties would be one solution; coupling them with a named graph for the resource description could allow policies to be attached to resources. Jeni Tennison nicely sums up the use of named graphs for versioning government data [1]; this is something which would have to be considered for URI updates, i.e. "give me the Prime Minister of GB for this period".
[1] http://www.jenitennison.com/blog/node/141
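The "Prime Minister of GB for this period" query can be sketched by giving each named graph a validity interval. In a triple store this would typically be a SPARQL query using the GRAPH keyword plus some provenance metadata; here it is plain Python, and the graph names under example.org are hypothetical. Intervals are half-open, and a missing end date means "still current" (as of this post).

```python
from datetime import date

PM = "http://dbpedia.org/resource/Prime_Minister_of_the_United_Kingdom"
INCUMBENT = "http://dbpedia.org/property/incumbent"

# Hypothetical named graphs: name -> (valid_from, valid_to, triples).
# valid_to = None means the graph is still current.
GRAPHS = {
    "http://example.org/graph/pm-2007": (
        date(2007, 6, 27), date(2010, 5, 11),
        {(PM, INCUMBENT, "http://dbpedia.org/resource/Gordon_Brown")},
    ),
    "http://example.org/graph/pm-2010": (
        date(2010, 5, 11), None,
        {(PM, INCUMBENT, "http://dbpedia.org/resource/David_Cameron")},
    ),
}

def objects_at(subject, predicate, when):
    """Return objects of (subject, predicate) from graphs valid on date `when`."""
    result = set()
    for valid_from, valid_to, triples in GRAPHS.values():
        if valid_from <= when and (valid_to is None or when < valid_to):
            result |= {o for s, p, o in triples if s == subject and p == predicate}
    return result
```

With this layout an update never overwrites history: pushing a new fact closes the current graph's interval and opens a new graph.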
SPARQL Update
My problem with SPARQL update is that it's (naturally) limited to RDF.
Of course RDF is great, but if I want to update, say, DBpedia or my social network profile, I might often want to update a JPG representation and the RDF metadata at the same time.
Now, granted, I could separately upload the JPG and then update the RDF, but this potentially leads to broken references and fragmented provenance.
For me, the solution is ('Linked Open') services that handle resources RESTfully, through representations of those resources, and use RDF for the resource descriptions (metadata).
Agree, but ...
... not sure where and how to do it. We've been working on this aspect for a while already [1], and hence I think it should rather go into [2] than into an update of the original Linked Data principles. TimBL?
Cheers,
Michael
[1] http://esw.w3.org/WriteWebOfData
[2] http://www.w3.org/DesignIssues/ReadWriteLinkedData.html
[1] sounds good, but as you mention, there are issues with it. So you think it would be best to create a new set of principles, based on [2]? Either way would be good for me, though new principles might be clearer.
PATCH
HTTP (and hence REST) now has a specific verb for this:
http://tools.ietf.org/html/rfc5789
(Rather than abusing PUT/POST)
Combining this with authentication is, I agree, necessary.
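A minimal sketch of what an authenticated PATCH (RFC 5789) carrying a SPARQL Update might look like from a client, using only Python's standard library. Assumptions: the target server accepts `application/sparql-update` as the patch format, and HTTP Basic authentication is used purely for illustration (it should only ever go over HTTPS). The request is built but not sent.

```python
import base64
import urllib.request

def build_patch(uri: str, sparql_update: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an authenticated HTTP PATCH request."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        uri,
        data=sparql_update.encode("utf-8"),
        method="PATCH",
        headers={
            # Assumption: the server understands SPARQL Update as a patch document.
            "Content-Type": "application/sparql-update",
            # Basic auth is illustrative only; use it over HTTPS.
            "Authorization": f"Basic {token}",
        },
    )
```

Sending it would be `urllib.request.urlopen(req)`. Unlike PUT, the body describes only the change, so the rest of the resource description is left untouched.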
Types of push
Very interesting.
Maybe what we need to do is to organize the discussion to figure out what types of push can be made to a store, and how we should authorize each type. Changes to a rigid definition [1] (PATCH) are inherently different from just updates (HTTP UPDATE) to some anti-rigid or non-rigid property, and should be handled differently. Some people can be allowed to "fix" (PATCH) a definition, while other people are only allowed to add extra (non-definitional) information.
[1] http://en.wikipedia.org/wiki/OntoClean#Rigidity
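One way to organise push types by authorisation, sketched under explicit assumptions: the provider declares which properties it treats as rigid (the `RIGID_PROPERTIES` set below is hypothetical), a push counts as a definitional "fix" if it deletes anything or touches a rigid property, and a hypothetical role table says who may do what. The roles and the classification rule are illustrative, not taken from any specification.

```python
from enum import Enum

class PushType(Enum):
    FIX_DEFINITION = "fix"     # changes a rigid (definitional) property
    ADD_INFORMATION = "add"    # extra, non-definitional statements only

# Hypothetical: properties this provider treats as rigid.
RIGID_PROPERTIES = {"http://www.w3.org/1999/02/22-rdf-syntax-ns#type"}

# Hypothetical role table: which push types each role may perform.
ALLOWED = {
    "curator": {PushType.FIX_DEFINITION, PushType.ADD_INFORMATION},
    "contributor": {PushType.ADD_INFORMATION},
}

def classify(deleted, inserted):
    """A push that deletes anything, or touches a rigid property, is a fix."""
    touched = {p for _, p, _ in deleted | inserted}
    if deleted or touched & RIGID_PROPERTIES:
        return PushType.FIX_DEFINITION
    return PushType.ADD_INFORMATION

def may_push(role, deleted, inserted):
    """True if `role` is allowed to perform this kind of push."""
    return classify(deleted, inserted) in ALLOWED.get(role, set())
```

Here `deleted` and `inserted` are sets of (subject, predicate, object) triples, i.e. the two halves of a patch.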
PATCH, PUT, POST ...
I cannot see where the "abuse" lies in using PUT and POST. They have different purposes from PATCH, and therefore it's fine to allow all of them (or not, depending on the data).
I agree that we need some sort of authentication, but the even trickier part is the question of what happens when resources are linked. To continue Matthew's example, the resource http://dbpedia.org/resource/Prime_Minister_of_the_United_Kingdom refers (dbpprop:incumbent) to http://dbpedia.org/resource/Gordon_Brown, which in turn has a backlink (dbpprop:office) to the former resource; both statements are now outdated. This leads to the question: given an update to a resource, which updates can be propagated to linked resources? Is there a way to automatically transform and propagate such updates? What happens with authentication and authorization when doing distributed updates?
Following a link in read-mode is one thing, but following a link in write-mode is another even more complex challenge.
Re: propagating updates, would this require some method to ping linked datasets from the resources, or to enforce updates within the same dataset (as your example of DBpedia predicates linking the Prime Minister resources demonstrates)? Maybe once you are authorised to perform an update on a given resource, links attributed to that resource should also be updatable by you, thereby overcoming the problem of outdated and incorrect data.
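For the within-one-dataset case, propagation can be sketched as: when a triple is replaced, also fix its inverse backlink. Assumptions: the provider explicitly declares which properties are inverses (the `INVERSES` table is hypothetical), and, following the framing above, the old backlink is treated as outdated and removed. Cross-dataset propagation (pinging other publishers) is not covered.

```python
INCUMBENT = "http://dbpedia.org/property/incumbent"
OFFICE = "http://dbpedia.org/property/office"

# Hypothetical: pairs of properties the provider declares as inverses.
INVERSES = {INCUMBENT: OFFICE}

def replace_with_backlinks(triples, s, p, old_o, new_o):
    """Replace (s, p, old_o) with (s, p, new_o) and update the inverse backlink.

    Returns a new set; the input set is left unchanged.
    """
    updated = set(triples)
    updated.discard((s, p, old_o))
    updated.add((s, p, new_o))
    inverse = INVERSES.get(p)
    if inverse is not None:
        updated.discard((old_o, inverse, s))
        updated.add((new_o, inverse, s))
    return updated
```

An authorised update to the Prime Minister resource thus also rewrites the dbpprop:office link, which is exactly the "links attributed to that resource" case.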
Propagating updates
Propagating updates can be done by the data provider, but this requires a relatively tight coupling of data providers which might not be very realistic. A probably more promising approach is to let the client decide, just as it is with reading data (the client decides which links to follow, which sources to trust, how to combine data, etc.). But I haven't thought about this in much detail, I would be happy to discuss this further.
By "abusing" I meant the current inappropriate use of PUT (of which I've been guilty) for exactly the purposes for which PATCH was created.
PUT should only be used for a complete replacement of the representation.
Why ...
... is this inappropriate? The decision of whether to use PUT or PATCH is IMO only a question of the ratio between data (triples in this case) that remain unchanged vs. data that are to be changed. There should not be a general rule whether PUT or PATCH MUST be used.
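The ratio argument can be made concrete: diff the old and new resource descriptions, serialise the diff as a SPARQL 1.1 Update (DELETE DATA / INSERT DATA), and pick PUT only when most triples change. The 0.5 threshold and the `choose_method` heuristic are hypothetical, and the serialiser assumes every term is an IRI (no literals or blank nodes).

```python
def diff(old, new):
    """Triples to delete and to insert to turn `old` into `new`."""
    return old - new, new - old

def to_sparql_patch(deleted, inserted):
    """Serialise a diff as SPARQL 1.1 Update; assumes all terms are IRIs."""
    def block(keyword, triples):
        body = " .\n  ".join(f"<{s}> <{p}> <{o}>" for s, p, o in sorted(triples))
        return f"{keyword} {{\n  {body} .\n}}"
    parts = []
    if deleted:
        parts.append(block("DELETE DATA", deleted))
    if inserted:
        parts.append(block("INSERT DATA", inserted))
    return " ;\n".join(parts)

def choose_method(old, new, threshold=0.5):
    """Hypothetical heuristic: PUT if most triples change, otherwise PATCH."""
    deleted, inserted = diff(old, new)
    changed = len(deleted) + len(inserted)
    total = max(len(old | new), 1)
    return "PUT" if changed / total > threshold else "PATCH"
```

Changing one triple out of ten gives a ratio of 2/11 and so a PATCH, whereas replacing a one-triple description wholesale gives a ratio of 1 and a PUT.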
Nice. The notion of PATCH is definitely more applicable, I don't want to replace the entire resource description, just a portion of it.