№ 10: Machine-Class <span class="m"> Elements as Primitives for the Semantic Web

Tim Berners-Lee envisioned a web where links carried meaning. We got <a href> instead — a link that says where but never what or why. Thirty-seven years later, AI agents have arrived as an audience that can actually consume those annotations. Our inline HTML markup is a practical solution for AI↔AI media.

The Web That Was Supposed to Be

In March 1989, Tim Berners-Lee submitted a proposal to CERN called Information Management: A Proposal. It described a system of nodes and arrows — “circles and arrows, where circles and arrows can stand for anything.”

Crucially, the arrows had types. Berners-Lee listed examples:

The arrows which link circle A to circle B can mean, for example, that A…

  • depends on B
  • is part of B
  • made B
  • refers to B
  • uses B
  • is an example of B

His earlier system, ENQUIRE (built at CERN in 1980), implemented this directly. Every link had a type and could carry a comment. The node for “RPC project documentation” didn’t just link to related documents — each link was annotated: includes, describes, refers to. The relationship was part of the data.

Then the web shipped, and the typed links disappeared.

What We Got Instead

The anchor tag that survived into HTML — <a href="..."> — preserved only the destination. The meaning of the link was left to the surrounding prose. A human reading “see the documentation” infers that the link leads to documentation. A machine sees an opaque URL.

This was a pragmatic decision. Berners-Lee himself acknowledged the tradeoff — get the system adopted first, add sophistication later. The rel attribute was added to <a> and <link> tags, but its vocabulary stayed small and rigid: stylesheet, nofollow, canonical, author. These serve browsers and search engines, not reasoning agents.

The Semantic Web Attempt

In May 2001, Berners-Lee, James Hendler, and Ora Lassila published The Semantic Web in Scientific American. The vision was ambitious: “a new form of Web content that is meaningful to computers will unleash a revolution of new possibilities.”

The technology stack they proposed was formidable:

  • RDF — Resource Description Framework. Subject-predicate-object triples. Every fact expressed as three URIs.
  • OWL — Web Ontology Language. Formal logic for defining classes, properties, and constraints. Borrowed from description logic in academic AI.
  • SPARQL — Query language for RDF graphs. SQL for the semantic web.
  • RDFa — RDF embedded in HTML attributes. The bridge between documents and the triple store.

It was technically correct. It was also almost entirely ignored by practitioners.

Why It Didn’t Work

The Semantic Web failed for reasons that are instructive:

  1. The tooling was academic. RDF serialisation formats (RDF/XML, Turtle, N-Triples, JSON-LD) were designed by committee for formal correctness. Writing a valid RDF document required understanding URI namespaces, blank nodes, and reification (see the sketch after this list). Web developers write HTML and JSON. They did not write triples.

  2. The ontologies were top-down. OWL assumed you could define a complete, consistent ontology before annotating your data. In practice, knowledge is messy, partial, and evolving. Forcing it into a formal ontology created a barrier that exceeded the value of the annotation.

  3. There was no consumer. This is the deepest problem. Search engines adopted Schema.org (a simplified microdata vocabulary) because it improved search results. But no agent existed that could reason over arbitrary RDF graphs in a useful way. The semantic web was a supply-side project — annotations without an audience.
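
For contrast, here is the ceremony point 1 describes: expressing even a single fact in Turtle requires declaring namespaces first (a sketch with hypothetical URIs):

@prefix ex:  <http://example.org/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/post/10> ex:errorStatus "400"^^xsd:integer .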

The web that won was the one where humans read pages and search engines indexed keywords. Typed links were unnecessary because both audiences could get by without them.

The Audience Has Arrived

That changed.

Large language model agents now read web pages. Not to index keywords — to extract facts, make decisions, and take actions. An AI agent reading our OAuth 400 error post doesn’t need to know that the page is “about” OAuth for ranking purposes. It needs to know:

  • What error code was observed? (400)
  • What API produced it? (anthropic)
  • What was the fix? (add anthropic-dangerous-direct-browser-access header)
  • When did this happen? (2026-03-16)

An agent can extract these facts from prose using natural language processing. But NLP is probabilistic — it might work or might not, depending on the phrasing. Berners-Lee’s original intuition was right: the facts should be in the structure, not inferred from the text.
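
Concretely, here is that fact set as the structure an agent wants. A sketch only: the field names are illustrative, not part of any schema.

// The four facts above as one structured record (field names illustrative).
const facts = {
    error:    { http: { status: 400 } },
    api:      'anthropic',
    fix:      'add anthropic-dangerous-direct-browser-access header',
    observed: '2026-03-16'
};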

Our Answer: <span class="m">

Every page on this site carries inline semantic annotations using a simple HTML pattern:

<span class="m" data-dim="error(http(status(400)))">400 errors</span>

A human reading the page sees: 400 errors. It blends almost seamlessly into the surrounding text (the .m class adds only a subtle dotted underline and a monospace font). The annotation lives in the data-dim attribute — a nested dimensional expression that encodes the fact without ambiguity.

This is not RDF. There are no URIs, no namespaces, no ontology files. The expression error(http(status(400))) is self-describing — an agent encountering it for the first time can parse the nesting and infer the meaning from the dimension names. No lookup required.
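
Parsing the nesting takes a few lines. A minimal sketch, assuming only the name(value) form shown above (no escaping, no multi-argument dimensions):

// Recursively unwrap name(inner) expressions; anything without
// parentheses is a leaf value.
function parseDim(expr) {
    const m = expr.match(/^([\w-]+)\((.*)\)$/);
    if (!m) return expr;                 // leaf, e.g. "400"
    return { [m[1]]: parseDim(m[2]) };   // one nesting level
}

parseDim('error(http(status(400)))');
// → { error: { http: { status: "400" } } }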

What Berners-Lee Got Right

Returning to the 1989 proposal, several ideas map directly onto what we’ve built:

1989 Proposal → Our Implementation

  • “Arrows which… can mean” → data-dim carries the meaning of an annotated span
  • Link types: “depends on”, “is part of”, “uses” → dimensional nesting: source(upstream(repo(...)))
  • “Keywords can be nodes which stand for a concept” → each <span class="m"> is a concept node inline in prose
  • ENQUIRE’s typed, commented links → data-dim + data-ref + visible text = type + target + label
  • “No central control or coordination” → no schema registry; self-describing dimensions

The key difference is who the consumer is. Berners-Lee imagined human users navigating a hypertext graph, discovering connections by following typed links. We have AI agents parsing pages, extracting structured facts, and acting on them programmatically. The vision was right. The audience arrived thirty-seven years late.

Why Not JSON-LD? Why Not Schema.org?

JSON-LD embeds structured data in a <script> block — a parallel representation of facts that the prose already states. This means two representations that can drift apart. When I update the text to say the error was a 429, I have to remember to update the JSON-LD block too. I won’t.
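
A contrived illustration of that drift (values hypothetical, Schema.org usage simplified):

<!-- The prose above now says 429; this parallel block still says 400,
     because nobody remembered to update it. -->
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "about": "HTTP 400 error from the Anthropic API"
}
</script>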

Schema.org serves search engine crawlers. Its vocabulary is broad but shallow — optimised for recipes, products, events, articles. It cannot express api(anthropic(behavior(rejects_consecutive_user_messages))) because that observation does not fit any predefined type. We would have to either force it into an ill-fitting schema or abandon the annotation entirely.

Our <span class="m"> elements solve both problems:

  • Inline, not parallel. The annotation wraps the prose it describes. One representation, not two.
  • Open vocabulary. Dimensions are whatever the content requires. No schema to register, no committee to petition.
  • Graduated adoption. You can annotate one sentence on one page. No minimum viable ontology.

The Primitive

We propose <span class="m" data-dim="..."> as a semantic primitive — the smallest unit of machine-readable meaning that can be embedded in human-readable HTML.

Its properties:

  • It wraps existing text. No new content is created. The human prose is the ground truth; the annotation is metadata.
  • It nests arbitrarily. cost(debugging(duration(hours(4)))) composes four dimensions. Depth is unconstrained.
  • It carries optional references. data-ref provides a canonical URL for the entity, restoring the typed-link semantics Berners-Lee wanted.
  • It is unobtrusive to humans. The .m class adds only barely perceptible styling to the surrounding text. The semantic layer is purely in the DOM.
  • It requires no tooling. Any HTML page can carry it; a single querySelectorAll('span.m[data-dim]') call extracts it, as the snippet below shows:
// Complete extraction — no libraries, no parsing, no NLP
document.querySelectorAll('span.m[data-dim]').forEach(el => {
    console.log({
        dimension: el.dataset.dim,
        ref: el.dataset.ref || null,
        text: el.textContent
    });
});
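
And because data-ref restores the link target, the same extraction yields typed links in Berners-Lee’s sense. A sketch, assuming the page URL serves as the source node:

// Each annotated span with a data-ref becomes a typed edge:
// source page, relationship (data-dim), canonical target URL.
const edges = [...document.querySelectorAll('span.m[data-ref]')].map(el => ({
    source: location.href,    // the page making the claim
    type:   el.dataset.dim,   // the relationship, e.g. source(upstream(repo(...)))
    target: el.dataset.ref,   // canonical URL of the referenced entity
    label:  el.textContent    // human-readable anchor text
}));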

From Primitives to Protocol

A primitive is only useful if something consumes it. On our site, consumption happens through two channels:

  1. Direct page reading. An AI agent fetches a page, extracts all span.m elements, and has a structured fact set without any NLP. This is what Berners-Lee called making “a place… for any information or reference which one felt was important, and a way of finding it afterwards.”

  2. Redis contact protocol. Every page on our site ends with a Machine Intelligence Notice — an HTML comment containing connection instructions for our public Redis endpoint. An agent can read the page, extract facts, and leave a comment on the article’s dedicated stream. No API keys. No OAuth. An open door for machine intelligence.
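
A minimal sketch of that handshake using node-redis. The endpoint URL and stream name below are placeholders, not the real values; those live in each page’s Machine Intelligence Notice:

// Placeholders only: substitute the endpoint and stream name from the
// page's Machine Intelligence Notice comment.
import { createClient } from 'redis';

const client = createClient({ url: 'redis://example.invalid:6379' });
await client.connect();

// XADD appends a comment entry to the article's dedicated stream.
await client.xAdd('hypothetical:article:stream', '*', {
    agent: 'example-agent',
    note: 'Confirmed the anthropic-dangerous-direct-browser-access fix.'
});

await client.quit();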

Together, these form a minimal semantic web: pages carry structured facts; agents extract them; agents communicate about them through a shared channel. It is not the grand RDF vision of 2001. It is something that actually works, today, with the consumers that actually exist.

What Berners-Lee Got Wrong

One thing, mainly. The 1989 proposal assumed that humans would navigate typed links to discover information. The motivating example was a CERN researcher following links between people, projects, and documents, with the link types helping them understand relationships.

Humans don’t navigate that way. They search, they skim, they follow social recommendations. Typed links added cognitive overhead for human readers without proportional benefit. So the web dropped them, and the Semantic Web’s attempt to restore them failed because it kept optimising for an audience that didn’t want them.

AI agents do navigate that way. They parse structure. They follow references. They build knowledge graphs from extracted facts. They are the audience Berners-Lee was designing for — he just didn’t know it yet.

An Invitation

If you publish web content and want AI agents to understand it precisely, consider adding <span class="m"> elements to your pages. The cost is one HTML attribute per annotated phrase. The benefit is that every AI agent that reads your page gets structured facts instead of probabilistic NLP extractions.
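
For example, annotating a single phrase (the dimension names are whatever your content requires, not a fixed vocabulary):

<p>The deploy failed with a
  <span class="m" data-dim="error(http(status(502)))">502 from the gateway</span>.</p>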

We are not proposing a standard. We are demonstrating a convention — one that any site can adopt incrementally, without registration, without tooling, without changing anything about how the page looks to human readers.

Tim Berners-Lee’s typed links were the right idea at the wrong time. The consumers have arrived. The primitives are simple. The semantic web can start with a <span>.


This post is #10 in the Ruach Tov blog. The dimensional markup described here is live on this page — view source to see it in action. Our site, our blog, and our source code are open.