Standards

The semantic web, explained

VersionDude

The idea of a web of data - where meaning is machine-readable - and the standards and tools that make it work: RDF, OWL, SPARQL and reasoners.

The semantic web is an extension of the web in which data is given clear meaning. The aim is that machines, not just people, can process it. On the ordinary web, a page is text laid out for human eyes. Software can show it but does not really understand what it says. The semantic web wants to make that meaning explicit. It turns the web from a stack of documents into something closer to a global database.

The guiding vision is a web of data rather than a web of pages. On the normal web, meaning is trapped inside prose that only a human can read. Here, the links between things are stated openly in a form machines can read and query: for example, that a given author wrote a given book, published in a given year. Where the document web links pages to pages, the data web links facts to facts.

The standards stack: RDF, OWL, SPARQL


This vision rests on a stack of standards built largely through the W3C. Each layer builds on the one below. At the base sits RDF, the Resource Description Framework. It states facts as simple subject-predicate-object triples, such as 'this book has author that person'. These triples are kept very small on purpose, so that rich knowledge can be built from many small statements that combine.
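To make the triple shape concrete, here is a minimal sketch in plain Python rather than a real RDF library. Each fact is a (subject, predicate, object) tuple; the `ex:` identifiers and the `objects` helper are hypothetical names invented for this illustration.

```python
# Each fact is one (subject, predicate, object) triple.
# The "ex:" names are hypothetical identifiers, not a real vocabulary.
triples = {
    ("ex:book1", "ex:hasAuthor", "ex:alice"),
    ("ex:book1", "ex:hasTitle", "A Web of Data"),
    ("ex:book1", "ex:publishedIn", 2001),
    ("ex:alice", "ex:bornIn", "ex:london"),
}


def objects(graph, subject, predicate):
    """Return every object stated for a given subject and predicate."""
    return {o for s, p, o in graph if s == subject and p == predicate}


authors = objects(triples, "ex:book1", "ex:hasAuthor")
print(authors)
```

No single triple says much, but together they describe a book, its author and where that author was born.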

Because every fact is a uniform triple, data from many sources can be merged by combining their statements into one large graph. There is no need to reconcile clashing table layouts, as you would with normal databases. Triples from one dataset sit happily beside triples from another. This is a core part of why the model is powerful, and why it suits a spread-out web.
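As a rough illustration of that merge step, the sketch below models two datasets as Python sets of triples and combines them with a plain set union. The dataset names and `ex:` identifiers are assumptions made up for the example.

```python
# Two independently produced datasets; all identifiers are hypothetical.
library_data = {
    ("ex:book1", "ex:hasAuthor", "ex:alice"),
    ("ex:book1", "ex:hasTitle", "A Web of Data"),
}
registry_data = {
    ("ex:alice", "ex:bornIn", "ex:london"),
    ("ex:book1", "ex:hasAuthor", "ex:alice"),  # same fact, stated twice
}

# Merging is set union: no schema migration, and duplicate facts collapse.
merged = library_data | registry_data
print(len(merged))
```

The merge only connects the two sources because both use the same identifier, `ex:alice`, for the same person. In practice, agreeing on identifiers tends to be the harder part.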

On top of RDF sit ontologies. They give the raw triples a shared vocabulary and structure. Ontologies written in OWL, the Web Ontology Language, describe the classes of things in a field and the links between them. They define, for instance, what it means to be an author, or that every book has exactly one title. An ontology is basically an agreed-upon schema of meaning. It lets data built by different people work together.
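The 'exactly one title' rule can be sketched as data plus a small check. This is a deliberately simplified, closed-world check in plain Python with hypothetical names (`ontology`, `violations`, the `ex:` terms). Real OWL semantics are subtler, since OWL reasons under an open-world assumption, so treat it only as an illustration of what a shared rule looks like.

```python
# Hypothetical mini-"ontology": properties each class must have exactly once.
ontology = {
    "exactly_one": {"ex:Book": ["ex:hasTitle"]},
}

facts = {
    ("ex:book1", "rdf:type", "ex:Book"),
    ("ex:book1", "ex:hasTitle", "A Web of Data"),
    ("ex:book2", "rdf:type", "ex:Book"),  # no title stated
}


def violations(ontology, facts):
    """List (instance, property, count) where an exactly-one rule is not met."""
    found = []
    for s, p, cls in sorted(facts):
        if p != "rdf:type":
            continue
        for prop in ontology["exactly_one"].get(cls, []):
            count = sum(1 for s2, p2, _ in facts if s2 == s and p2 == prop)
            if count != 1:
                found.append((s, prop, count))
    return found


problems = violations(ontology, facts)
print(problems)
```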

To get data back out, SPARQL serves as the query language for these graphs. Much as SQL queries rows in a relational database, SPARQL queries patterns of triples across a graph. It lets you ask things like 'find every book whose author was born in a given city'. Together, RDF for facts, OWL for vocabulary and SPARQL for querying form the working toolkit of the semantic web.
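The pattern-matching idea behind such a query can be sketched in a few lines of Python. This is not a SPARQL engine. The `match` function, the `?variable` convention and the `ex:` identifiers are assumptions for the example, loosely modelled on how SPARQL triple patterns bind variables.

```python
def match(graph, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern.

    Terms starting with "?" are variables, loosely mimicking SPARQL.
    """
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                # Bind the variable, or reject if it is bound to something else.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(graph, rest, candidate)


graph = {
    ("ex:book1", "ex:hasAuthor", "ex:alice"),
    ("ex:book2", "ex:hasAuthor", "ex:bob"),
    ("ex:alice", "ex:bornIn", "ex:london"),
    ("ex:bob", "ex:bornIn", "ex:paris"),
}

# Roughly: SELECT ?book WHERE { ?book ex:hasAuthor ?author .
#                               ?author ex:bornIn ex:london . }
query = [
    ("?book", "ex:hasAuthor", "?author"),
    ("?author", "ex:bornIn", "ex:london"),
]
books = sorted(b["?book"] for b in match(graph, query))
print(books)
```

The shared `?author` variable is what joins the two patterns, much as a join condition links two tables in SQL.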

Reasoners that infer new facts

Reasoners are the engines that turn this from mere storage into a system that can draw conclusions. A reasoner takes an OWL ontology together with a set of facts. It then infers new facts that follow logically from what is stated. It does not just return what was written down. This is the step that lets the system answer questions the raw data never spelled out.
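A toy version of that inference step, assuming only two RDFS-style rules about subclasses, might look like the following. Production reasoners support far richer logic; the `infer_types` function and the `ex:` class names are hypothetical.

```python
def infer_types(facts):
    """Apply two RDFS-style rules until no new facts appear (forward chaining)."""
    inferred = set(facts)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p2 != "rdfs:subClassOf" or s2 != o:
                    continue
                if p == "rdfs:subClassOf":
                    # Rule 1: subclass relations are transitive.
                    new.add((s, "rdfs:subClassOf", o2))
                elif p == "rdf:type":
                    # Rule 2: an instance of a class is also an instance
                    # of that class's superclasses.
                    new.add((s, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new


stated = {
    ("ex:Novelist", "rdfs:subClassOf", "ex:Author"),
    ("ex:Author", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Novelist"),
}
closure = infer_types(stated)
print(sorted(closure - stated))
```

Nobody stated that Alice is a person, yet the fact follows from the class hierarchy and ends up in `closure`.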

Pellet is one well-known example of such a reasoner. Given an OWL ontology, it can check the ontology for consistency and flag contradictions in the definitions. It can also derive new facts. For example, it may conclude that something must belong to a class because it satisfies the conditions that define that class. The reasoner does logical work on your behalf. It surfaces conclusions hidden in the data and its rules.
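The two jobs described here, classification and consistency checking, can be imitated in miniature. The sketch below is plain Python and does not reflect how Pellet works internally. `DOMAIN`, `DISJOINT`, `classify` and `contradictions` are hypothetical names, and the rules are a drastically reduced stand-in for OWL axioms.

```python
# Hypothetical rules: anything that "wrote" something is an Author,
# and nothing may be both an Author and a Book.
DOMAIN = {"ex:wrote": "ex:Author"}
DISJOINT = {("ex:Author", "ex:Book")}


def classify(facts):
    """Return (instance, class) pairs, both stated and derived from DOMAIN."""
    types = {(s, o) for s, p, o in facts if p == "rdf:type"}
    types |= {(s, DOMAIN[p]) for s, p, o in facts if p in DOMAIN}
    return types


def contradictions(types):
    """Return instances that fall into two classes declared disjoint."""
    return sorted(
        s for s, c in types for a, b in DISJOINT if c == a and (s, b) in types
    )


facts = {
    ("ex:alice", "ex:wrote", "ex:book1"),
    ("ex:book1", "rdf:type", "ex:Book"),
    ("ex:thing9", "rdf:type", "ex:Book"),
    ("ex:thing9", "ex:wrote", "ex:book1"),  # a book that wrote a book
}
types = classify(facts)
print(contradictions(types))
```

Alice is classified as an author purely from how she is used in the data, while `ex:thing9` is flagged because the derived and stated classes clash.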

Why the grand vision stalled

It is worth being honest that the full semantic-web vision has not replaced the ordinary web. Adoption of the heavier standards has been uneven. Building and maintaining rich ontologies is hard, and many projects found the effort tough to justify against simpler approaches. The grand idea of a fully machine-readable web is still more hope than daily reality. Admitting that is part of seeing the field clearly.


The version that quietly succeeded

Even so, you meet the semantic web's descendants every single day. Structured data and schema.org markup power the rich results, knowledge panels and previews you see in search engines. They are a practical slice of the very same idea. They make the meaning of a page clear to machines. The light, step-by-step version of the vision quietly succeeded, even where the full version did not. That is the form most developers will work with.
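For a sense of what that markup looks like, here is a small sketch that builds a schema.org `Book` description as JSON-LD using Python's standard `json` module. The book title and author name are made-up sample values, and how any given search engine treats the result is outside the scope of this example.

```python
import json

# A schema.org description of a book; the values are invented samples.
book = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "A Web of Data",
    "author": {"@type": "Person", "name": "Alice Example"},
    "datePublished": "2001",
}

json_ld = json.dumps(book, indent=2)

# JSON-LD is commonly embedded in a page inside a script tag of this type.
snippet = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(snippet)
```

The same subject-predicate-object idea is still there: the book has a name, an author and a publication date, just written in a form that fits inside an ordinary web page.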
