The semantic web, explained

  • VersionDude
  • Standards
  • 6 min read

The idea of a web of data, where meaning is machine-readable, and the standards and tools (RDF, OWL, SPARQL and reasoners) that make it work.

The semantic web is an extension of the web in which information is given well-defined meaning, so that machines — not just people — can process it. On the ordinary web, a page is text laid out for human eyes; software can display it but does not truly understand what it says. The semantic web's ambition is to make that meaning explicit, turning the web from a collection of documents into something closer to a global database.

The guiding vision is a web of data rather than a web of pages. Instead of meaning being trapped inside prose that only a human can interpret, the relationships between things — that a particular author wrote a particular book, published in a particular year — are stated explicitly in a form machines can read and query. Where the document web links pages to pages, the data web links facts to facts.


This vision rests on a stack of standards developed largely through the W3C, each layer building on the one below. At the base, RDF, the Resource Description Framework, expresses facts as simple subject-predicate-object triples, such as 'this book has author that person'. These triples are deliberately atomic, so that complex knowledge can be assembled from many small, combinable statements.
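To make the triple shape concrete, here is a minimal sketch in plain Python. It is not an RDF library: the `Triple` type, the `http://example.org/` prefix and every identifier under it are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace, made up for this illustration.
EX = "http://example.org/"

# Three small, independent statements about a book and its author.
facts = {
    Triple(EX + "book1", EX + "hasAuthor", EX + "person1"),
    Triple(EX + "book1", EX + "publishedIn", "1999"),
    Triple(EX + "person1", EX + "name", "A. Writer"),
}

# Everything stated about book1, collected from the individual triples.
about_book1 = {(t.predicate, t.obj) for t in facts if t.subject == EX + "book1"}
```

Each statement stands alone, so adding a new fact about the book means adding one more triple rather than changing a schema.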

Because every fact is a uniform triple, and because things are named with global identifiers (IRIs) rather than local keys, data from different sources can be merged by combining their statements into one larger graph. There is no table layout to reconcile as there would be with relational databases; triples from one dataset sit naturally beside triples from another. The sources do still need to agree on identifiers and vocabulary, or be mapped onto each other, for the merged facts to connect. This composability is a core part of why the model is powerful, and why it suits a decentralised web.
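As a rough illustration of that merge, the sketch below treats each dataset as a Python set of triples and combines them with set union. The two datasets and all identifiers are invented for the example, and the sketch assumes both sources already use the same identifiers for the same things. Real RDF merging also has to handle details this ignores, such as blank nodes.

```python
# Hypothetical namespace, made up for this illustration.
EX = "http://example.org/"

# Dataset from an imagined library catalogue.
library_data = {
    (EX + "book1", EX + "hasAuthor", EX + "person1"),
    (EX + "book1", EX + "title", "An Example Book"),
}

# Dataset from an imagined biography site.
biography_data = {
    (EX + "person1", EX + "bornIn", EX + "city1"),
    (EX + "book1", EX + "hasAuthor", EX + "person1"),  # also stated by the library
}

# Merging is set union: duplicate statements collapse, new ones are added.
merged = library_data | biography_data
```

The merged graph now connects the book to its author's birthplace, a link neither source contained on its own.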

On top of RDF sit ontologies, which give the raw triples shared vocabulary and structure. Ontologies written in OWL, the Web Ontology Language, describe the classes of things in a domain and the relationships between them — defining, for instance, what it means to be an author, or that every book has exactly one title. An ontology is essentially an agreed-upon schema of meaning that lets independently created data interoperate.


To get information back out, SPARQL serves as the query language for these graphs. Much as SQL queries rows in a relational database, SPARQL queries patterns of triples across a graph, letting you ask questions like 'find every book whose author was born in a given city'. Together, RDF for facts, OWL for vocabulary and SPARQL for querying form the practical toolkit of the semantic web.
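The pattern-matching idea behind such a query can be sketched in a few lines of plain Python. This is a toy matcher, not a SPARQL engine: the `match` function, the `?name` variable convention and the example data are all assumptions made for illustration. The comment shows roughly what the equivalent SPARQL would look like.

```python
# Hypothetical namespace and data, made up for this illustration.
EX = "http://example.org/"

graph = {
    (EX + "book1", EX + "hasAuthor", EX + "person1"),
    (EX + "book2", EX + "hasAuthor", EX + "person2"),
    (EX + "person1", EX + "bornIn", EX + "city1"),
    (EX + "person2", EX + "bornIn", EX + "city2"),
}


def is_var(term):
    """Pattern terms starting with '?' are variables."""
    return term.startswith("?")


def match(patterns, graph, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        new = dict(binding)
        ok = True
        for p_term, t_term in zip(first, triple):
            if is_var(p_term):
                # Bind the variable, or check it against its earlier binding.
                if new.setdefault(p_term, t_term) != t_term:
                    ok = False
                    break
            elif p_term != t_term:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)


# Roughly: SELECT ?book WHERE { ?book ex:hasAuthor ?author .
#                               ?author ex:bornIn ex:city1 . }
query = [
    ("?book", EX + "hasAuthor", "?author"),
    ("?author", EX + "bornIn", EX + "city1"),
]
books = sorted(b["?book"] for b in match(query, graph))
```

The shared `?author` variable is what joins the two patterns, much as a join condition links two tables in SQL.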

Reasoners are the engines that turn this from structured storage into a system that can draw its own conclusions. A reasoner takes an OWL ontology together with a set of facts and infers new facts that follow logically from what is stated, rather than only returning what was explicitly written down. This is the step that lets the system answer questions the raw data never spelled out directly.
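A very small sketch of this kind of inference, in plain Python, is shown below. It applies just two rules (subclass relationships are transitive, and an instance of a class is also an instance of its superclasses) until nothing new appears. Real OWL reasoners support far more than this. The `infer` function and the `ex:` names are hypothetical, and the prefixed strings stand in for full IRIs.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

# Hypothetical ontology and data, made up for this illustration.
stated = {
    ("ex:Novelist", SUBCLASS, "ex:Author"),
    ("ex:Author", SUBCLASS, "ex:Person"),
    ("ex:person1", TYPE, "ex:Novelist"),
}


def infer(triples):
    """Apply the two rules repeatedly until no new triple is produced."""
    closure = set(triples)
    while True:
        new = set()
        for (a, p1, b) in closure:
            for (c, p2, d) in closure:
                if b != c or p2 != SUBCLASS:
                    continue
                if p1 == SUBCLASS:
                    # A subClassOf B, B subClassOf D  =>  A subClassOf D
                    new.add((a, SUBCLASS, d))
                elif p1 == TYPE:
                    # x type B, B subClassOf D  =>  x type D
                    new.add((a, TYPE, d))
        if new <= closure:
            return closure
        closure |= new


# Facts that follow from the data but were never written down.
inferred = infer(stated) - stated
```

Nothing in the stated data says that `ex:person1` is a person, yet the sketch derives it, which is the sense in which a reasoner answers questions the raw data never spelled out.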

Pellet is one well-known example of such a reasoner. Given an OWL ontology, it can check the ontology for consistency — flagging contradictions in the definitions — and derive entailed facts, such as concluding that something must belong to a particular class because of the rules it satisfies. The reasoner effectively does logical work on your behalf, surfacing conclusions implicit in the data and its rules.
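To give a feel for what a consistency check involves, the sketch below looks for one simple kind of contradiction: an individual declared to belong to two classes that the ontology says are disjoint. It is a toy in plain Python and does not use or reflect Pellet's actual API. The `find_clashes` function and the `ex:` names are hypothetical, and a real reasoner detects many other kinds of inconsistency.

```python
TYPE = "rdf:type"
DISJOINT = "owl:disjointWith"


def find_clashes(triples):
    """Return (individual, class_a, class_b) for each disjointness violation."""
    # Collect the classes each individual is declared to belong to.
    types = {}
    for s, p, o in triples:
        if p == TYPE:
            types.setdefault(s, set()).add(o)

    clashes = []
    for a, p, b in triples:
        if p != DISJOINT:
            continue
        for individual, classes in types.items():
            if a in classes and b in classes:
                clashes.append((individual, a, b))
    return sorted(clashes)


# Hypothetical data: nothing can be both a book and a person.
data = {
    ("ex:Book", DISJOINT, "ex:Person"),
    ("ex:thing1", TYPE, "ex:Book"),
    ("ex:thing1", TYPE, "ex:Person"),
    ("ex:thing2", TYPE, "ex:Book"),
}
clashes = find_clashes(data)
```

Here `ex:thing1` is flagged because it is typed as both classes, while `ex:thing2` passes.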

It is worth being honest that the full semantic-web vision has not replaced the ordinary web, and adoption of the heavier standards has been uneven. Building and maintaining rich ontologies is demanding, and many projects found the effort hard to justify against simpler approaches. The grand idea of a fully machine-understandable web remains more aspiration than everyday reality, and acknowledging that is part of understanding the field accurately.

Even so, you encounter the semantic web's descendants every single day. Structured data and schema.org markup, which power the rich results, knowledge panels and previews you see in search engines, are a pragmatic slice of the very same idea — making the meaning of a page legible to machines. The lightweight, incremental version of the vision quietly succeeded, even where the maximal version did not, and that is the form most developers will work with.
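As an illustration of what that markup can look like, the sketch below builds a schema.org description of a book and serialises it as JSON-LD, one common encoding for structured data embedded in a page. The book details are invented. `Book`, `Person`, `name`, `author` and `datePublished` are schema.org terms. Which properties a given search engine actually uses is not covered here.

```python
import json

# Invented example data described with schema.org vocabulary.
book_markup = {
    "@context": "https://schema.org",
    "@type": "Book",
    "name": "An Example Book",
    "author": {"@type": "Person", "name": "A. Writer"},
    "datePublished": "1999",
}

json_ld = json.dumps(book_markup, indent=2)

# JSON-LD is typically embedded in a page inside a script element like this.
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
```

The same author-wrote-book relationship that RDF expresses as a triple appears here as a nested `author` object.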
