Posted by robin_reala 3/28/2025

Xee: A Modern XPath and XSLT Engine in Rust (blog.startifact.com)
381 points | 232 comments (page 2)
mattrighetti 3/28/2025|
I will definitely try this out!

I have a service that extracts <meta> tags in webpages and to do that I'm currently using (and need) three different dependencies: html5ever, markup5ever_rcdom, markup5ever. I don't like those to be honest, the documentation is quite bad and it was difficult to understand how I should have used the libraries to achieve such a simple task.

XPath on the other hand makes this extremely easy in comparison, I wonder how this will perform compared to my current solution.
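
To illustrate the kind of XPath-based `<meta>` extraction described above, here is a minimal sketch in Python using only the standard library. It is not Xee's API and not the commenter's service: the sample page, the `extract_meta` helper, and the choice to key on `name` or `property` are all assumptions made for illustration. It also assumes well-formed XHTML input, which real-world HTML often is not, and `xml.etree.ElementTree` supports only a small subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real HTML usually needs an HTML parser first).
PAGE = """<html>
  <head>
    <title>Example</title>
    <meta name="description" content="A sample page"/>
    <meta property="og:title" content="Sample"/>
    <meta charset="utf-8"/>
  </head>
  <body><p>Hello</p></body>
</html>"""


def extract_meta(markup):
    """Return {name-or-property: content} for every <meta> that has a content attribute."""
    root = ET.fromstring(markup)
    result = {}
    # ".//meta[@content]" selects all descendant <meta> elements carrying @content.
    for el in root.findall(".//meta[@content]"):
        key = el.get("name") or el.get("property")
        if key:
            result[key] = el.get("content")
    return result


meta = extract_meta(PAGE)
print(meta)
```

The single path expression replaces the manual tree walk; a full XPath engine would allow richer predicates than the subset used here.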

faassen 3/28/2025|
Thanks!

Unfortunately at this point there's no HTML parser frontend for Xee (and its underlying library Xot) yet (HTML 5 parser serialization is supported at least in code). It shouldn't be too hard to add at least HTML 5 support using something like html5ever.

mdaniel 3/29/2025||
I always hate it when license files have "yes, but" language in them, because if the license file differs in some non-obvious way, now I have to pay lawyers to interpret it.

https://github.com/Paligo/xee/blob/xee-v0.1.5/COPYRIGHT

And that goes double for when there is a separate LICENSE file in the repo https://github.com/Paligo/xee/blob/xee-v0.1.5/LICENSE-MIT

hpfr 3/29/2025|
Doesn’t look like “yes, but” language to me. Looks like the code is plain old MIT and the author is doing their due diligence with respect to vendored content in the repository subject to different licensing. Seems like they are being paid by a company to work on this, so it’s not surprising that they actually pay attention to copyright.

The fact that many project maintainers forget about vendored content and haphazardly slap the MIT license (or whatever) verbatim into a LICENSE file doesn’t actually give you a get-out-of-paying-lawyers-free card! If anything, Xee’s COPYRIGHT file gives me more confidence in my legal footing than an unadulterated LICENSE file would. It indicates the maintainer at least has a basic understanding of how copyright applies to their project.

tracnar 3/28/2025||
Nice! I tried using XQuery (superset of XPath 3) for a while through the BaseX implementation. It's pretty nice, but you have to face XML problems like namespaces, document order, attributes vs nodes, not knowing whether you will get 0, 1 or more nodes, etc. Something I wish was more readily available would be to run XPath against JSON, YAML, etc. It's a nicer language than, say, jq, but its ties to XML sometimes make it hard to transfer.

Another pain point with XML is the lack of inline schema, so the languages around it, like XPath, have to work with arbitrary structures, unlike, say, JSON, where you at least have basic primitives like map/dict, numbers, bool, etc.
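
The namespace and "0, 1 or more nodes" issues mentioned above can be sketched with Python's standard library. This is only an illustration under stated assumptions: the Atom-style sample document and the `a` prefix are made up for the example, and `xml.etree.ElementTree` implements a limited XPath subset rather than XQuery or XPath 3.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document using a default namespace.
DOC = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First</title></entry>
  <entry><title>Second</title></entry>
</feed>"""

root = ET.fromstring(DOC)
ns = {"a": "http://www.w3.org/2005/Atom"}  # the prefix "a" is our own choice

# Namespaces: an unqualified name does not match elements in a namespace.
unqualified = root.findall("entry")
qualified = root.findall("a:entry", ns)

# Cardinality: the same kind of query may yield zero, one, or many nodes.
missing = root.find("a:subtitle", ns)  # no such element, so None
titles = [e.findtext("a:title", namespaces=ns) for e in qualified]

print(len(unqualified), len(qualified), missing, titles)
```

Nothing in the document itself says whether `subtitle` is optional or `entry` repeats, so the caller has to handle every case.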

trympet 3/28/2025||
I recently had the pleasure of using XSLT after never having seen it before. I used it to transform a huge 130K line XML manifest with MAPI property metadata into C# source code. It was so simple, readable, and intuitive to use.
squiggleblaz 3/29/2025|
I learnt XSLT in university back in the early/mid part of the first decade of this century. I didn't much enjoy it. I've never used it, but all my career I've had to deal with terrible ad hoc templating languages. I recently had total freedom to choose what terrible ad hoc templating language to use, and I chose XSLT. I actually totally liked it, and it seemed to have everything I needed. In previous jobs, there were always tickets that amounted to "make a fork of the terrible ad hoc templating language and hack it until it does this", but I reckon XSLT could do everything and then some.
nickm12 3/29/2025||
This is fantastic to see! I've used XML off and on since it was the red hot tech of the early 2000s. I wouldn't choose it today for a green field project, but it's still around in so many places, so we definitely need a high-performance, high-quality library written in Rust for this.

This could become a great foundation for a typed, (mostly) etree-compatible Python library. I've used lxml for years and it's still my go-to, but there are lots of places where it could be modernized.

threecheese 3/28/2025||
This is great, I’ve been looking for performant and safe XML processing to replace IBM stuff (websphere/datapower) that we really only keep around for hw accelerated payload processing. At our scale, lxml and others + BYO gateway tech has a similar run cost even considering IBM licensing. I hate running their crap, which requires k8s at a version that’s some hair-thin slice above the minimum supported EKS version, it’s almost like they want us to live in 24/7 fear of being OOS.
1shooner 3/28/2025||
I miss the declarative purity of XSLT as an HTML templating layer. I'd love to know if there is a similar system for a more popular/current web stack.
o_pax 3/28/2025||
This is really good news, I am looking forward to trying it out! Is XQuery also planned as an additional frontend? By the way, there is also χrust, a Rust project working towards pretty similar goals (XPath 3.1, XQuery 3.1 and XSLT 3.0). At first glance, the architecture also seems quite similar, though it is not as far along. Have you had any contact with them?
ianand 3/28/2025||
Fun fact: A decade ago the designer of HAML and Sass created a modern alternative to XSLT. https://en.wikipedia.org/wiki/Tritium_(programming_language)
smitty1e 3/28/2025|
> XML is now niche technology, but it's a bigger niche than you might think, and it's not going to go away any time soon.

When you consider that .docx, .pptx, and .xlsx files are zipped XML archives, "niche" seems a misnomer.
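
The "zipped XML" point can be sketched in a few lines of Python. Rather than rely on a real file, this builds a tiny stand-in archive in memory: it contains only `word/document.xml`, so it is not a valid .docx that Word would open. The part name and the WordprocessingML namespace match what real .docx files use; the helper names and the sample text are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by word/document.xml in .docx files.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, heavily simplified document part.
DOCUMENT_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>XML </w:t></w:r><w:r><w:t>inside</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def make_stand_in_docx():
    """Build an in-memory zip holding only the main document part (not a complete .docx)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", DOCUMENT_XML)
    return buf.getvalue()


def paragraph_texts(docx_bytes):
    """Unzip the archive, parse the XML part, and join the text runs of each paragraph."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.findall(".//w:t", ns))
        for p in root.findall(".//w:p", ns)
    ]


texts = paragraph_texts(make_stand_in_docx())
print(texts)
```

Passing the bytes of an actual .docx file to `paragraph_texts` should work the same way for simple documents, since the format is an ordinary zip archive of XML parts.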

mdaniel 3/29/2025|
especially .xlsx which is some "hold my beer" for someone trying to encode a dataframe in .xml :-(
smitty1e 3/29/2025||
Openpyxl is a great library.