Posted by glth 8 hours ago

Why XML tags are so fundamental to Claude (glthr.com)
126 points | 82 comments | page 2
wooptoo 3 hours ago|
Amazing how an entire profession that until yesterday would pride itself on precision, clarity (in thought and in writing), efficiency, and formality, has now descended into complete quackery.
cyanydeez 3 hours ago|
Are you talking about the office of the president of the united states?

This vague posting is kind of dumb.

wooptoo 2 hours ago||
It's a simple observation. I'm not here to win internet points. I've never before seen so much cargo-culting and mystic belief among engineers.
imglorp 7 hours ago||
A very minor porcelain on some of the agent input UX could present this structure for you. Instead of a single chat window, have four: task, context, constraints, output format.

And while we're at it, instead of wall-of-text, I also feel like outputs could be structured at least into thinking and content, maybe other sections.
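For concreteness, a minimal sketch of what such a four-pane input might serialize to behind the scenes, using the section names from the comment above (the tag names are illustrative, not any official schema):

```xml
<task>Summarize the attached incident report.</task>
<context>The report covers a two-hour outage of the payments service.</context>
<constraints>Keep it under 200 words; do not speculate about root cause.</constraints>
<output_format>A short paragraph followed by a bulleted timeline.</output_format>
```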

TheJoeMan 8 hours ago||
That first image, “Structure Prompts with XML”, just screams AI-written. The bullet lists don’t line up, the numbering starts at (2), random bolding. Why would anyone trust hallucinated documentation for prompting? At least with AI-generated software documentation, the context is the code itself, being regurgitated into bulleted English. But for instructions on using the LLM itself, it seems pretty lazy to not hand-type the preferred usage and human-learned tips.
rafram 7 hours ago||
No, it’s two screenshots from Anthropic documentation, stitched together: https://platform.claude.com/docs/en/build-with-claude/prompt...

The post even links to that page, although there’s a typo in the link.

glth 7 hours ago|||
Author here: I have just fixed the typo. Thank you.

And yes, these are screenshots from Anthropic’s documentation.

dmd 7 hours ago|||
They're not even stitched together; there's just no padding between the two images.
Calavar 7 hours ago|||
It looks like a screenshot from the Claude desktop app, so I don't think the author is trying to disguise the AI origin of the material.
croes 6 hours ago|||
You just hallucinated the content is AI generated.
michaelcampbell 5 hours ago||
"This is AI" is the new "This is 'shopped, I can tell by the pixels."
tingletech 5 hours ago||
I can tell by the em dashes
doctorpangloss 4 hours ago||
There must be an OpenClaw YouTube video helping people post to Hacker News, or something, because the front page is overrun with AI slop like this article, which makes no sense anyway. The author literally has no idea what any of this stuff means.
ryanschneider 3 hours ago||
Wait am I in the minority talking to Claude in markdown? I just assumed everyone does that, or at least all developers. It seems to work really well.
cyanydeez 3 hours ago|
I do that in openwebui for code indents like ```
alansaber 5 hours ago||
Sounds like: 1. XML is the cleanest/best-quality training data (especially compared to PDF/HTML). 2. It follows that a user providing semantic tags in XML format can get the best training alignment (hence best results). Shame they haven't quantified this assertion here.
lsc4719 5 hours ago|
Makes sense
twoodfin 5 hours ago||
This isn’t surprising: XML’s core purpose was to simplify SGML for a wider breadth of applications on the web.

HTML also descended from SGML, and it’s hard to imagine a more deeply grooved structure in these models, given their training data.

So if you want to annotate text with semantics in a way models will understand…

tingletech 5 hours ago|
XML and HTML are SGMLs
ChrisSD 4 hours ago||
HTML diverged from SGML pretty early on. Various standards over the years have attempted to specify it as an application of SGML but in practice almost nobody properly conformed to those standards. HTML5 gave up the pretence entirely.
wolttam 8 hours ago||
Anthropic’s tool calling was exposed as XML tags at the beginning, before they introduced the JSON API. I expect they’re still templating those tool calls into XML before passing to the model’s context
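A hedged sketch of what a JSON tool call templated back into XML might look like in-context; the tag names here are a guess for illustration, not Anthropic's actual internal format:

```xml
<tool_call>
  <tool_name>get_weather</tool_name>
  <parameters>
    <location>Berlin</location>
    <unit>celsius</unit>
  </parameters>
</tool_call>
```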
pocketarc 7 hours ago||
Yeah like I remember prior to reasoning models, their guidance was to use <think> tags to give models space for reasoning prior to an answer (incidentally, also the reason I didn't quite understand the fuss with reasoning models at first). It's always been XML with Anthropic.
wolttam 7 hours ago||
Exactly the same story here. I still use a tool that just asks them to use <think> instead of enabling native reasoning support, which has worked well back to Sonnet 3.0 (their first model with 'native' reasoning support was Sonnet 3.7)
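For anyone who hasn't seen the pattern described in the two comments above, a sketch of the pre-reasoning-model prompt style (the exact wording is illustrative): you instruct the model to reason inside one tag pair and answer inside another, then strip the reasoning when displaying the result.

```xml
Before answering, think step by step inside <think> tags,
then give your final answer inside <answer> tags.

<think>
The user wants X; constraint Y rules out approach Z, so...
</think>
<answer>
...
</answer>
```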
scotty79 3 hours ago||
Can you sniff it out with Wireshark?
wolttam 1 hour ago||
They don't expose the raw context over the wire, it's all pre/post processed at their API endpoints.
ixxie 4 hours ago||
How about other frontier models, and smaller models?
Zebfross 7 hours ago||
I thought the goal was minimal instruction to let Claude determine the best way to solve the problem. Not adding this to my workflow anytime soon.
TheLNL 6 hours ago|
It is not for the end user, it is more for things like wrappers and automation scripts.

Nobody expects the end user to prompt the AI using a structured language like xml

CactusBlue 5 hours ago|
I think the main advantage of the XML here is that the model is expected to have a matching end tag that is balanced, which reduces the likelihood of malformed outputs.
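That property is also cheap to check on the consuming side. A minimal sketch, assuming you wrap the model's output fragment in a synthetic root so multiple top-level tags still parse; `is_well_formed` is a hypothetical helper, not part of any SDK:

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment: str) -> bool:
    """Return True if an XML fragment has balanced, properly nested tags."""
    try:
        # Wrap in a synthetic root so fragments with several
        # top-level tags still parse as a single document.
        ET.fromstring(f"<root>{fragment}</root>")
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<answer>42</answer>"))  # → True
print(is_well_formed("<answer>42"))           # → False (unclosed tag)
```

A failed check is a simple signal to re-prompt or retry rather than pass malformed output downstream.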