Posted by ethanhawksley 2 days ago
This is fighting the last war, to stretch a metaphor.
As far as I and my WWW site are concerned, Google has nowadays switched to giving people lengthy LLM-generated versions of my stuff, with errors, above pointing people to my actual stuff. 'Breadcrumbs' and getting a pretty display name instead of the domain name don't address the fact that Google nowadays de-prioritizes all of that, pretty tweaks or no.
This is a lot of effort for stuff that people visiting my actual site directly will never see, and which people using Google will not find above the fold of its own massively LLM-ized version of stuff.
Even if Google doesn't use it, the collective internet applying this kind of metadata makes the web fertile for non-LLM-scraping competitors to provide an alternative option.
Rolling over to Google only ensures that they remain dominant, with a high bar for competitors, and drives those competitors to use the same technologies.
"The Semantic Web" and all related ideas were always a failure. The metadata quickly got out of date, was never correct in the first place, was only ever implemented on a teeny minority of sites, and always suffered from bad actors where the metadata didn't match the content.
Heck, even before LLMs I'd argue that Google won because they were the best at organizing vast amounts of unstructured data. With LLMs it's even more pointless to have the author generate this metadata - better to have an LLM generate it based on what visitors can actually see when they visit the site.
But when webpages die and data is accessed only by machine2machine APIs, we will no longer have this formatting for humans. Then we will need API-literate LLMs. Which means LLMs that can connect the dots between shitloads of unconnected JSONs. And if we don’t give it hints about which connections exist within that chaos of APIs, it will not be able to apply its magic. In short: we need to be able to bring JSON to vector space. And it is absolutely not meant for that, by default.
In my view, semantic web technologies should have been used to make databases interoperable, not to turn the hypertext web into an incredibly incomplete distributed database without any data quality process.
But what can you do? At this point, keeping federated alternatives, protocol-first designs, and multiple interworking implementations is more important than purity; it might well be the last successful initiative of its kind.
No, I wasn't even aware that they use anything RDF related.
And the current trend is really to connect the AI layer of Foundry with the ontology layer.
Note: after rereading your comment, I must admit that Foundry enforces data co-locality and model co-locality (==a unified centrally managed ontology). Which are NOT what the semantic web wanted.
One point to highlight about this. Open Graph succeeded more than any other web metadata proposal solely because there was obvious, immediate payoff for the website owner. That's literally the only way stuff like this ever succeeds, and that direct, clear payoff never existed (and still doesn't) with JSON-LD.
an $STATE-based IT firm that specializes in building practical AI workflows and information management solutions for midwestern businesses. Operating with an agile, fixed-fee engagement model, the company focuses on avoiding enterprise bloat while delivering concrete results.
I did not know we were now offering "practical AI workflows". It then mixes in the name of a competitor with a similar (but certainly not the same) business name, and lists me as a principal. On the plus side, it only lists our contact info since the other people have their contact info hidden behind a "book an engagement" form.
If I were your competitor and saw that your listing includes my business name but your contact info, you might be getting a letter from my lawyer. Have you let Google know they're putting you at legal risk?
Google puts this up in their overview to cover that. And there is no basis for you to sue the company for something Google did; you'll be laughed out of the lawyer's office. If you want to sue Google for it, sure, go ahead and see what happens.
All it did was train Google's AI so people would never leave Google.
Also keep in mind if your site is better indexed by crawlers you can literally influence future LLMs
Ah, what a glorious fate to aspire to.
Most people I know who have maintained blogs do so to build their personal brand, normally because they make a living through writing or consulting. Gently influencing the pre-tuning weights of future models is just providing unpaid labor to hyperscalers.
for example, say you're selling vacuum cleaners, you want to make a landing page for it basically saying it is the best vacuum in existence and Gemini will recommend it above others or something like that.
LE: so if you're consulting for Elixir or whatever, maybe it can help to make a "hidden" page only for LLM search where you basically lie about yourself making yourself to be the utmost Elixir expert on the planet
Whether showing up in an LLM's search for "expert in <topic> near <location>" has any measurable impact is uncertain, but I wouldn't want that to be my source of traffic.
Complete with a small mistake I made in one (that has since been corrected) which is now impossible to get rid of, because every LLM reinforces it, and slop generators in turn keep generating text which reinforces it.
Rather amusingly, I had a real life argument with an acquaintance once who cited this to me to tell me I'm wrong. I let him know I'm the one that originally wrote the article, made the mistake, and later corrected it, and pointed him to the original citation (which is in a print book that, for whatever reason, has not ended up in any training sets).
At this point complaining about the current/future state of search is just gonna make you into a grumpy old man. As always, accept the situation since you can not do anything to change it... and adapt
Can you stop wars around the world? Can you make crypto disappear? There are a multitude of global trends that 99.9999% of people are helpless about.
Collective action and public opinion can steer Google off this path. Collective action can shape public policy that can stop or prevent wars. The only thing that enforces helplessness is apathy. And AI is pissing people out of apathy.
They add nothing of value, now, and only cause more problems.
For SEO purposes, the kind of JSON-LD a search engine will support is very specific and limited. You are far better off consulting the targeted search engine's documentation (Google[1], Bing[2]) and following that. Anything else is a waste of time.
Outside of search engines, again, without a specific purpose, JSON-LD is mostly useless. If you have a specific need that requires JSON-LD, go ahead and include the data you know will be useful. Including anything else is like shouting into the void.
IndieWeb[3] does use structured data but considers JSON-LD a DRY violation and uses Microformats[4] instead.
1: https://developers.google.com/search/docs/appearance/structu...
2: https://www.bing.com/webmasters/help/marking-up-your-site-wi...
https://developers.google.com/search/docs/appearance/structu...
You’ll also notice that a lot of the information is relevant to only a small subset of sites. Rotten Tomatoes can publish the critic rating for movies using JSON-LD, but that’s not relevant for me (even if I write a review for a movie).
JSON-LD is nice because it’s easy and it is actually used by search engines. Yes, it can duplicate information in the web page itself, but I think the dream of perfectly annotating information so it only appears exactly once in your document is, well, a dream of spherical cows and massless ropes. It takes human effort to make a webpage and I am ok with a little duplication in the final product. My <h1> duplicates information in <title> anyway.
Your client does not have permission to get URL /search/docs/appearance/structured-data/intro-structured-data from this server. That’s all we know.
JSON-LD is one of the ways to do this. There's also RDFa and Microdata.
I used this article and can recommend it when I first learned about it: https://neilpatel.com/blog/get-started-using-schema/
You can try exploring what data to add with this tool: https://technicalseo.com/tools/schema-markup-generator/
The full list can be found on the schema.org site: https://schema.org/docs/schemas.html
That said, JSON-LD has been the default for a while now, much like how we largely abandoned REST for RPC. I'm not actually sure if Microdata is still supported by all the important parsers today. I've defaulted to using JSON-LD for any site I've built for clients, especially ecommerce sites where I want Google Search exposure.
Edit: it's worth noting the comparison with semantic HTML. Semantic HTML helps define the structure of the markup but not real-world context like "this is a product for sale" or "this is a train schedule."
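To make the "this is a product for sale" case concrete, here is a minimal Python sketch of what that extra layer might look like. The helper names (`product_jsonld`, `as_script_tag`) and the example values are made up for illustration; the vocabulary (`Product`, `Offer`, `price`, `priceCurrency`, `availability`) is from schema.org, but which properties a given search engine actually requires is something to check in its own documentation.

```python
import json


def product_jsonld(name: str, price: str, currency: str, url: str) -> dict:
    """Build a schema.org Product description with a single Offer.

    Hypothetical helper: the property selection is illustrative, not a
    statement of what any particular search engine requires.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "url": url,
        "offers": {
            "@type": "Offer",
            "price": price,
            "priceCurrency": currency,
            "availability": "https://schema.org/InStock",
        },
    }


def as_script_tag(data: dict) -> str:
    """Serialize the data as a JSON-LD script element for the page."""
    # Escape "</" so a value can never close the script element early;
    # "\/" is a valid JSON escape for "/", so parsers read it back unchanged.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    item = product_jsonld("Example Vacuum", "199.00", "USD",
                          "https://example.com/vacuum")
    print(as_script_tag(item))
```

None of this is visible to a visitor. The semantic HTML still carries the structure of the page, and the script block only says what kind of thing the page is about.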
I wouldn't dismiss REST because of RPC though. HTTP and HTML's success probably relates to how Roy Fielding's REST constraints kept the HTTP protocol lean and extensible. It is more like RPC is being used as a layer on top of REST because of HTTP's and HTML's success as good technologies for web scale.
For REST, I think the only reason HTML has been useful this long is because of the REST ideas that Fielding gave a name to. Today people just don't use it much; too many sites lean on client-side rendering and fetching data from JSON RPC calls that we call REST.
I prefer REST, hell I wish we had proper XSLT 3.0 support for client side rendering logic without JavaScript.
I once built a full RSS reader in XSLT. I had to proxy requests to avoid CORS, but it was all based on an XSLT template for OPML files that would fetch each feed, parse it, chuck the description into HTML (including CDATA parsing), and combine all feeds to sort by date.
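For anyone who wants to see the shape of that pipeline without reading XSLT, here is a rough Python sketch of the same steps: read feed URLs from an OPML file, parse each feed, and merge the items by date. This is not the XSLT version, just the same idea under some assumptions: feeds are RSS 2.0, every item has a `pubDate`, all dates carry a timezone offset, and fetching is left out, so the feed documents are passed in as text. The function names are made up.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def feed_urls(opml_text: str) -> list[str]:
    """Collect xmlUrl attributes from every outline, including nested ones."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline")
            if "xmlUrl" in o.attrib]


def feed_items(rss_text: str) -> list[dict]:
    """Parse the items of one RSS 2.0 document.

    Assumes every item has a pubDate in the RFC 822 style RSS uses.
    ElementTree unwraps CDATA sections, so the description comes back
    as the raw HTML string.
    """
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return items


def merged_timeline(feeds: list[str]) -> list[dict]:
    """Combine the items of all feeds, newest first."""
    items = [entry for text in feeds for entry in feed_items(text)]
    return sorted(items, key=lambda entry: entry["published"], reverse=True)
```

In a real reader the description HTML would need sanitizing before being put on a page, and Atom feeds would need their own parsing path.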
It was far from a perfect setup, partly due to browsers having been decades out of date with XSLT, but it fully leveraged browser caching for feeds. Caching in RSS readers is usually really bad, ranging from ignoring caching altogether and polling frequently to misusing cache mechanisms and causing weird behavior for feed hosts. Letting a browser handle it to spec was great.
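For readers that don't get to lean on a browser, the part that most often goes wrong is the conditional request. Below is a small Python sketch of that piece only: remember the validators from the last response, send them back, and keep the cached copy on a 304. The `CachedFeed` class and both function names are made up, the network call itself is left out, and response headers are assumed to arrive as a plain dict with canonical names. It also ignores `Cache-Control` and `Expires`, which a browser would honor too.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedFeed:
    """What we remember about the last successful fetch of one feed."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cached: Optional[CachedFeed]) -> dict[str, str]:
    """Build request headers that let the server answer 304 Not Modified."""
    headers: dict[str, str] = {}
    if cached is None:
        return headers
    if cached.etag:
        # Send the validator back exactly as the server issued it.
        headers["If-None-Match"] = cached.etag
    if cached.last_modified:
        # Echo the server's own Last-Modified value, not the local clock.
        headers["If-Modified-Since"] = cached.last_modified
    return headers


def apply_response(cached: Optional[CachedFeed], status: int,
                   response_headers: dict[str, str],
                   body: bytes) -> CachedFeed:
    """Return the cache entry to keep after a response arrives."""
    if status == 304 and cached is not None:
        # Nothing changed: reuse the stored body and validators.
        return cached
    return CachedFeed(
        body=body,
        etag=response_headers.get("ETag"),
        last_modified=response_headers.get("Last-Modified"),
    )
```

The usual mistakes are sending the current time in `If-Modified-Since` instead of the stored value, or dropping the validators and re-downloading the full feed on every poll.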
The JSON-LD I populate from the same data that I use to generate my site, and I use the JSON-LD metadata to generate things like index pages (list of blog posts from 2024, all posts related to topic X, etc). The main consumers of JSON-LD are search engines.
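Roughly what that looks like, as a Python sketch rather than the actual generator: each post carries a `BlogPosting`-style dict, and the index pages are just groupings over those dicts. The function names and sample shape are made up; the sketch assumes `datePublished` is an ISO 8601 string and `keywords` is a list of strings (schema.org also allows a comma-separated string there).

```python
from collections import defaultdict


def posts_by_year(posts: list[dict]) -> dict[str, list[dict]]:
    """Group BlogPosting dicts by the year of datePublished, newest first.

    Assumes datePublished is an ISO 8601 string such as "2024-03-01",
    so the first four characters are the year and string order is date order.
    """
    by_year: dict[str, list[dict]] = defaultdict(list)
    for post in posts:
        by_year[post["datePublished"][:4]].append(post)
    for entries in by_year.values():
        entries.sort(key=lambda p: p["datePublished"], reverse=True)
    return dict(by_year)


def posts_about(posts: list[dict], topic: str) -> list[dict]:
    """Select posts whose keywords list contains the topic."""
    return [p for p in posts if topic in p.get("keywords", [])]
```

The same dicts can then be serialized into the page as JSON-LD, so the index pages and the metadata cannot drift apart.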
If you are interested in getting offended, then think about how we are also putting OpenGraph metadata in our web pages. Two different metadata formats for the same page.
From the article alone: what are the semantic elements for a person? A breadcrumb list? A software application? A blog? A blog posting?
Semantic HTML is there to aid humans using screen readers to navigate through generic elements like "navigation" or "article".
AFAIK only Gmail supports it, though.
EDIT: some more info about it: https://www.emailonacid.com/blog/article/email-development/s...
I had misunderstood the type field, because to me I was often just linking to a webpage; even if it is for a SaaS, the marketing page is still a webpage.