JSON-LD and Linked Data: why they matter in the agentic search era


The internet was built on the assumption that a human sits on the other side of every screen. That no longer holds. In June 2026, Cloudflare’s Radar data showed agentic AI bots generating 57.4 percent of web requests globally, with humans down to 42.6 percent. Cloudflare CEO Matthew Prince, who had expected the crossover in 2027, said it arrived faster than predicted and called the data “a bit messy” but clearly past the tipping point. Ordinary crawlers passed human traffic a decade ago. What is new is agentic traffic: the systems that browse the web on your behalf when you ask an AI assistant a question.

These readers do not read a page the way a person does. They parse it, resolve the entities on it, and decide whether they trust it enough to cite or act on. That work runs on something old and unglamorous: Linked Data. The most common way to publish it on a webpage is JSON-LD, and twenty years after Linked Data was first proposed, that pairing has become the connective tissue of the AI search era.

Where Linked Data came from

Linked Data comes from a 2006 design note by Tim Berners-Lee, who set out four principles for data that machines could connect across sites: name things with URIs, use HTTP URIs so they can be looked up, return useful information in open standards like RDF, and link to other URIs so machines can discover more. The goal was a web of data, not just documents. It stayed in academic circles for years, held back by syntaxes like RDF/XML and Turtle that were unfamiliar to most web developers. Two things changed that. In 2011, Google, Bing, and Yahoo launched schema.org, a shared vocabulary for describing things on the web, and Yandex joined later that year. Then JSON-LD, proposed around 2010 by Manu Sporny, gave Linked Data a syntax that felt familiar to anyone working in JSON. It became a W3C Recommendation in January 2014, with version 1.1 following in 2020.

Why JSON-LD won in SEO

Google began favoring JSON-LD around 2015 and was soon documenting new types in it first. It sits in its own script block, separate from the visible HTML, so you can add and maintain it without touching your layout, and it works on JavaScript-rendered content. Crawls have repeatedly found it the most widely deployed structured data format, ahead of Microdata and RDFa. The payoff was rich results, SERP features, and clearer entity signals feeding Google’s Knowledge Graph.
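To make the "own script block" point concrete, here is a minimal sketch in Python of how a template or build step might emit JSON-LD separately from the visible HTML. The helper name jsonld_script_tag and the publisher name are assumptions for illustration, not part of any particular CMS or library.

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Serialize a JSON-LD document into a standalone script element."""
    payload = json.dumps(data, indent=2, ensure_ascii=False)
    # Escape "</" so a string value cannot close the script element early.
    # "\/" is a valid JSON escape, so parsers still read the same value.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical article description using schema.org terms.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "JSON-LD and Linked Data: why they matter in the agentic search era",
    "author": {"@type": "Organization", "name": "Example Publisher"},  # placeholder
}

tag = jsonld_script_tag(article)
print(tag)
```

Because the block is generated from data instead of being woven into the markup, it can be updated or regenerated without touching the layout.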

The shift to agentic search

The same structured data that earned rich snippets is now the layer AI systems read to understand and act. AI search does not rank ten blue links. It extracts entities, checks them for consistency, and assembles answers. Schema Markup gives it explicit context instead of forcing it to infer meaning from prose. Organization and Person markup, paired with sameAs links to identifiers like Wikidata, helps an AI resolve who or what your content is about, and sources it can resolve with confidence are the ones it tends to cite. Results vary, and no one outside the platforms knows the exact weighting, but the direction is consistent across Google, Bing, and the other major engines. (None of this is new thinking: the late Bill Slawski was writing about entities and semantic search at SEO by the Sea long before AI answers made it urgent.)
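As an illustration of Organization and Person markup with sameAs links, the sketch below builds a small JSON-LD graph in Python. Every name, URL, and Wikidata identifier in it is a placeholder invented for the example, and the missing_same_as helper is a hypothetical check, not a rule any search engine is known to apply.

```python
import json

ORG_ID = "https://example.com/#organization"  # hypothetical stable identifier

organization = {
    "@type": "Organization",
    "@id": ORG_ID,
    "name": "Example Co",
    "url": "https://example.com/",
    "sameAs": [
        "https://www.wikidata.org/wiki/Q00000000",  # placeholder, not a real item
        "https://www.linkedin.com/company/example-co",  # placeholder profile
    ],
}

person = {
    "@type": "Person",
    "@id": "https://example.com/team/jane-doe#person",
    "name": "Jane Doe",
    "worksFor": {"@id": ORG_ID},  # points at the organization node above
    "sameAs": ["https://www.wikidata.org/wiki/Q00000001"],  # placeholder
}

document = {"@context": "https://schema.org", "@graph": [organization, person]}


def missing_same_as(doc: dict) -> list:
    """Return the @id of each Organization or Person node without sameAs links."""
    return [
        node["@id"]
        for node in doc["@graph"]
        if node.get("@type") in {"Organization", "Person"} and not node.get("sameAs")
    ]


print(json.dumps(document, indent=2))
print("Nodes without sameAs:", missing_same_as(document))
```

The sameAs values are what tie a local entity to an external identifier, so a quick audit for nodes that lack them is a cheap first check.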

Agents push it further. They do not just understand content; they act on it: compare products, check a price, complete a task. That only works if the data is structured, current, and consistent. If your schema says one price and your page shows another, an agent learns to distrust both. That is why emerging agentic commerce protocols, like Google’s Universal Commerce Protocol and OpenAI’s Agentic Commerce Protocol, assume a machine-readable foundation underneath.
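The price mismatch described above is easy to test for. The following is a rough sketch, assuming the page's JSON-LD has already been parsed into a dict and the visible text has been extracted; the function names are hypothetical, and the price regex only handles a simple currency-symbol-plus-number format, so real pages would need locale-aware parsing.

```python
import re
from decimal import Decimal, InvalidOperation


def offer_price(product: dict):
    """Return the schema.org Offer price as a Decimal, or None if absent."""
    offers = product.get("offers")
    if isinstance(offers, list):  # "offers" may hold one Offer or a list
        offers = offers[0] if offers else None
    if not isinstance(offers, dict) or "price" not in offers:
        return None
    try:
        return Decimal(str(offers["price"]))
    except InvalidOperation:
        return None


def visible_price(text: str):
    """Return the first currency-prefixed amount in the visible text, or None.

    Assumes a '$1,234.56' style amount; thousands separators are dropped.
    """
    match = re.search(r"[$€£]\s*(\d+(?:\.\d{1,2})?)", text.replace(",", ""))
    return Decimal(match.group(1)) if match else None


def prices_agree(product: dict, text: str) -> bool:
    """True only when both prices were found and are numerically equal."""
    marked_up, shown = offer_price(product), visible_price(text)
    return marked_up is not None and marked_up == shown


# Hypothetical product markup and page text.
product = {
    "@type": "Product",
    "name": "Widget",
    "offers": {"@type": "Offer", "price": "49.90", "priceCurrency": "USD"},
}

print(prices_agree(product, "Now only $49.90 with free shipping"))
print(prices_agree(product, "Now only $59.90 with free shipping"))
```

Running a check like this in a publishing pipeline catches drift before an agent does.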

What Google said in Milan

Google’s Search Central Live event in Milan in June 2026 made the direction explicit. The points below come from coverage of the event, mainly Search Engine Roundtable’s writeup of session notes, so treat them as reported rather than official documentation.

Google described “cross-page @id linkage,” which lets products reference shared organizational data on other URLs through a stable identifier instead of repeating it everywhere. That is the Linked Data principle of reusable identifiers applied to commercial schema. The notes also described cleaned, typed data being extracted as a context layer that directly powers AI Overviews and AI Mode, the clearest signal yet that structured data feeds the AI answer itself. Alongside this came renewed investment in schema.org, including plans to publish term-popularity statistics and validation rules in open standards like SHACL and ShEx. One useful myth correction: Google’s parsers do not reward formal HTML validation for its own sake, and forcing artificial chunking for AI is not the goal. Organize content for human readability, since the structure that matters is semantic.
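The general JSON-LD mechanism behind cross-page @id linkage can be sketched in a few lines. This is an illustration of how a consumer might resolve a bare @id reference against data published on another URL, under the assumption that each page's JSON-LD has already been fetched and parsed. It is not a description of Google's pipeline, and all URLs and names are placeholders.

```python
# Hypothetical parsed JSON-LD nodes, keyed by the page they were found on.
pages = {
    "https://example.com/about": [
        {
            "@id": "https://example.com/#organization",
            "@type": "Organization",
            "name": "Example Co",
        }
    ],
    "https://example.com/products/widget": [
        {
            "@id": "https://example.com/products/widget#product",
            "@type": "Product",
            "name": "Widget",
            # A bare reference: the organization is described on another page.
            "brand": {"@id": "https://example.com/#organization"},
        }
    ],
}


def build_index(pages: dict) -> dict:
    """Merge every node that has an @id into one lookup table.

    If the same @id appears more than once, later properties overwrite
    earlier ones.
    """
    index = {}
    for nodes in pages.values():
        for node in nodes:
            if "@id" in node:
                index.setdefault(node["@id"], {}).update(node)
    return index


def is_reference(value) -> bool:
    """A bare reference is an object whose only key is @id."""
    return isinstance(value, dict) and set(value) == {"@id"}


def resolve(node: dict, index: dict) -> dict:
    """Return a copy of node with one level of bare references expanded."""
    return {
        key: index[value["@id"]]
        if is_reference(value) and value["@id"] in index
        else value
        for key, value in node.items()
    }


index = build_index(pages)
product = resolve(index["https://example.com/products/widget#product"], index)
print(product["brand"]["name"])
```

The product page never repeats the organization's details. It only needs the identifier, which is the point of the approach.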

Why it comes back to a Knowledge Graph

Schema Markup is no longer something you bolt onto pages. It is infrastructure: a data layer machines read to understand, trust, and act on your content. The reliable way to maintain it at scale is a Knowledge Graph. Page-by-page markup is brittle. It drifts out of sync, contradicts itself across templates, and breaks the moment an AI cross-checks one page against another. A Knowledge Graph holds your entities and their relationships in one consistent model, then publishes them as Linked Data that search engines and agents already know how to read.
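The kind of contradiction that page-by-page markup produces can be detected mechanically. Here is a minimal sketch, assuming the JSON-LD nodes from each page are already parsed; find_conflicts is a hypothetical helper, and the pages and phone numbers are invented for the example.

```python
import json
from collections import defaultdict


def find_conflicts(pages: dict) -> dict:
    """Find properties of the same entity whose values differ between pages.

    Returns {(entity @id, property): {serialized value: [page URLs]}} for
    every property that has more than one distinct value.
    """
    seen = defaultdict(lambda: defaultdict(list))
    for url, nodes in pages.items():
        for node in nodes:
            entity = node.get("@id")
            if entity is None:
                continue  # nodes without an identifier cannot be compared
            for prop, value in node.items():
                if prop != "@id":
                    # Serialize so nested objects can be compared and used as keys.
                    seen[(entity, prop)][json.dumps(value, sort_keys=True)].append(url)
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}


ORG = "https://example.com/#organization"  # hypothetical identifier

# Two templates describing the same organization with different phone numbers.
pages = {
    "https://example.com/": [
        {"@id": ORG, "@type": "Organization", "name": "Example Co", "telephone": "+1-555-0100"}
    ],
    "https://example.com/contact": [
        {"@id": ORG, "@type": "Organization", "name": "Example Co", "telephone": "+1-555-0199"}
    ],
}

conflicts = find_conflicts(pages)
for (entity, prop), values in conflicts.items():
    print(entity, prop, values)
```

When every page is generated from one entity model, a report like this should come back empty.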

That is the work WordLift does. We build a Knowledge Graph for your brand and mint a stable, unique identifier for every entity in it, a dereferenceable URI that other pages and external datasets can point to. That matches the reusable-identifier approach reported from Google’s Milan event. We also link those entities out with sameAs connections to authoritative sources like Wikidata.

The mechanics have moved from rich snippets to AI answers to agents, but the foundation has not. Name your things, describe them in a shared vocabulary, link them together, and keep them accurate. Linked Data was the right bet in 2006. It is a better one now.


