Posted by enriquelop 7 hours ago
*Edit*: Whoa! The French crew is here. At least 5 of us are quoting a variation of <https://www.legifrance.gouv.fr/> for versioning.
The idea: legislation is just patches on patches on patches. Git already solves this. Instead of reading "strike paragraph 3 and replace with...", you get an actual diff.
The repo is the product. Browse any law, git log to see its full reform history, git diff to see exactly what changed.
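For anyone who wants to script that browsing rather than type it by hand, here is a minimal sketch in Python. It only wraps the standard `git log` and `git diff` commands; the helper names and the file path in the usage note are hypothetical, not the repo's actual layout.

```python
import subprocess
from typing import List


def history_cmd(path: str) -> List[str]:
    """Build the git command that lists every commit touching one law file."""
    # --follow keeps tracking the file across renames.
    return ["git", "log", "--follow", "--date=short",
            "--format=%h %ad %s", "--", path]


def diff_cmd(old_rev: str, new_rev: str, path: str) -> List[str]:
    """Build the git command that shows what changed in one file between two revisions."""
    return ["git", "diff", old_rev, new_rev, "--", path]


def run(cmd: List[str], repo_dir: str) -> str:
    """Run a git command inside a local clone and return its output as text."""
    result = subprocess.run(cmd, cwd=repo_dir, capture_output=True,
                            text=True, check=True)
    return result.stdout
```

Usage would look like `print(run(history_cmd("some/law.md"), "/path/to/clone"))`, where both paths are placeholders for wherever you cloned the repo and whichever file you care about.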
Built the pipeline in ~4 hours with Claude Code. Source is BOE (Spain's official gazette) consolidated legislation API.
Exploring whether there's a business here — structured legislation API for legaltech/compliance, or just a useful open dataset. Curious what HN would build with this data.
Spain does not have a common law legal system like the US or the UK. It has a civil law system, where prior court judgements do not form strictly binding precedent. Prior judgements can be important, but case law is not really a thing.
Is it not the same in Spain at all?
So while this project does track laws, is there any facility to determine which laws from which bodies are relevant to a specific activity in a specific location?
No, cities don't have their own laws, but the autonomous communities do have some influence in some laws and regulations (not all), like the amount of income tax you have to pay and so on. But cities within the autonomous communities don't have their own laws.
I think local government in Spain has at least as much authority as it does in the UK, maybe more, but almost certainly less than it does in the US.
Regardless, cities do not have their own "local laws" in the way your comment made it seem. We have national laws, and minor differences in various autonomous communities, since they have some legislative power to control their own industry, commerce, education and some more stuff.
Corps and cities are very similarly structured. Each is chartered at the start, with corps governed by boards and C-suite types while cities have mayors and city council types. Both file paperwork to exist within the state. Both are subject to state laws, but are allowed to make up regulations specific to them as long as they are within the state's laws.
In the end, it's all just paperwork, at least in the US.
I suspect that this should be qualified by "in the US"
Have you considered embedding semantic hierarchical structure directly in the markdown? Something like https://github.com/wikibonsai/semtree ? It lets you build a navigable tree across markdown files using indented [[wikilinks]] as the organizational spine. Could be a natural fit for legislation that already has an inherent taxonomy (constitutional → organic → ordinary, or by subject area).
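To make the idea concrete, here is a rough sketch of reading an indented list of [[wikilinks]] into a tree. This is not semtree's API or its exact file format; the two-space indent, the optional bullet marker and the function name are all assumptions for illustration.

```python
import re
from typing import Any, Dict

# Assumed line shape: optional indentation, optional bullet, then one [[wikilink]].
LINK = re.compile(r"^(\s*)(?:[-*+]\s+)?\[\[([^\]]+)\]\]\s*$")


def parse_tree(text: str, indent: int = 2) -> Dict[str, Any]:
    """Turn an indented wikilink outline into nested {"name", "children"} dicts."""
    root: Dict[str, Any] = {"name": "root", "children": []}
    stack = [(-1, root)]  # (depth, node) pairs for the current branch
    for line in text.splitlines():
        match = LINK.match(line)
        if match is None:
            continue  # skip anything that is not a single wikilink line
        depth = len(match.group(1)) // indent
        node: Dict[str, Any] = {"name": match.group(2), "children": []}
        # Climb back up until the top of the stack is this node's parent.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((depth, node))
    return root
```

Feeding it an outline like constitutional → organic → ordinary, one level of indentation per step, gives a tree you can walk to build navigation across the markdown files.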
Ed: Never mind, I missed the "BOE (Spain's official gazette) consolidated legislation API" part. Sending jealous greetings from Germany. We just have a bunch of PDFs in Germany. And the private entity that has been publishing them for decades even claims copyright on them!
Also, in my experience (having built in this space before), regulations aren’t really the issue. Court rulings are, because there’s no open data for them in Spain. And the potential users for a paid product (legal professionals) already know the law; the key players (big law firms) have their own databases of annotated and verified court rulings and other documents.
I think the corollary that comes to mind is that reforms, with their git commits, are incrementally valuable if they refer to other parts of the legislation, previous commits, etc. to give more context as to the intent at the time of the law. So maybe there's a way to distill the legislative process into more PR and commit-oriented work—likely ex post as you did here, but perhaps in the future as part of an actual workflow.
And then maybe I'd pitch the idea to some technologically-inclined local government.
As to what can be done with the data, maybe one interesting step could be a graph-database regarding laws which reference other laws or the definitions that they depend on?
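A first pass at that graph does not even need a graph database. Here is a minimal sketch that pulls references like "Ley 39/2015" out of the text with a regex and builds an adjacency map. The regex only covers a handful of common Spanish instrument types and is an assumption about how citations are written, so expect it to miss plenty of real-world variants.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Longer instrument names come first so "Real Decreto-ley" is not cut to "Real Decreto".
REF = re.compile(
    r"\b(Ley Orgánica|Real Decreto Legislativo|Real Decreto-ley|Real Decreto|Ley)"
    r"\s+(\d+/\d{4})"
)


def extract_refs(text: str) -> Set[str]:
    """Return every 'Kind number/year' reference found in a law's text."""
    return {f"{kind} {number}" for kind, number in REF.findall(text)}


def build_graph(laws: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each law id to the set of other laws it cites."""
    return {law_id: extract_refs(text) - {law_id} for law_id, text in laws.items()}


def cited_by(graph: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the graph: which laws cite a given law?"""
    reverse: Dict[str, Set[str]] = defaultdict(set)
    for source, targets in graph.items():
        for target in targets:
            reverse[target].add(source)
    return dict(reverse)
```

The inverted map is probably the more interesting one for research: it tells you which laws would need a second look if a heavily cited one gets reformed.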
Just thinking how this could maybe be used for (automated) research / visualization on the evolution of (Spanish, in this case) law.
Looking at the commit dates (which seem to be derived from the original publication dates), the history seems quite sparse/incomplete(?). I mean, there have only been 26 commits since 2000.
Yeah, I think everyone is aware. It's just that the last couple dozen commits, to me, looked like commits had been created in chronological order, so that topological order == chronological order.
> I know GitHub prefers sorting with author over commit date, but don't know how topology is handled.
Commits are usually sorted topologically.
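If you want to check whether topological order and chronological order really coincide in a linear history, a small sketch like this would do it. It assumes you have already extracted (hash, author date) pairs in parent-first order; the hashes and dates in the usage note are made up.

```python
from datetime import date
from typing import List, Tuple

Commit = Tuple[str, date]  # (hash, author date)


def out_of_order(history: List[Commit]) -> List[Tuple[str, str]]:
    """Return (parent, child) hash pairs where the child is dated before its parent.

    `history` must be parent-first, i.e. the oldest ancestor comes first.
    An empty result means topological order == chronological order.
    """
    return [
        (parent[0], child[0])
        for parent, child in zip(history, history[1:])
        if child[1] < parent[1]
    ]
```

For example, `out_of_order([("a1", date(2001, 1, 1)), ("b2", date(2000, 1, 1))])` flags the pair `("a1", "b2")`, because the child commit carries an earlier date than its parent.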
Whenever a law is about to be changed/removed, run all the tests to make sure no regressions.
If the full complement of software development practices were applied to legislation and ordinances, we would be living in a very different world.
Is the parsing/uploading code shared somewhere else?
Definitely the kind of idea that would have been below my activation energy pre-Claude.
I think this approach should be standard. I have always wondered why the source of truth for these documents is not moved to a repo like git.
There really, really are.
The legal industry is well aware of that fact - and how many billable hours they stand to lose by making their work more efficient and understandable.
You know how tax prep companies spent over $90m 'lobbying' Congress to ensure that filing your taxes remains difficult and complicated [0]?
Well, lawyers know just as well or better how to butter their bread; and they will pull out every dirty trick they have to scupper attempts to make practising law more transparent or efficient in any way.
0 - https://www.opensecrets.org/news/2023/09/tax-prep-companies-...
For a while I thought about trying to write software that would turn the obscure natural-language diffs in written bills into a readable diff, showing the laws before and after with highlighted changes. But she said they just got the bills as paper printouts which weren't always even up-to-date, so it might not have helped much. Maybe now they're online. And LLMs might make the project easier.
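As a toy illustration of that idea, here is a sketch that applies one amendment instruction to a list of paragraphs and prints a unified diff. The instruction format ("strike paragraph N and replace with ...") is a simplified assumption; real bills phrase amendments in far messier ways, which is presumably where an LLM would come in.

```python
import difflib
import re
from typing import List

# Assumed, simplified instruction format; real amendment language varies a lot.
INSTRUCTION = re.compile(
    r'strike paragraph (\d+) and replace with "(.*)"', re.IGNORECASE
)


def apply_amendment(paragraphs: List[str], instruction: str) -> List[str]:
    """Return a copy of the law with one paragraph replaced as instructed."""
    match = INSTRUCTION.fullmatch(instruction.strip())
    if match is None:
        raise ValueError(f"unsupported instruction: {instruction!r}")
    index = int(match.group(1)) - 1  # paragraphs are numbered from 1
    if not 0 <= index < len(paragraphs):
        raise IndexError(f"no paragraph {index + 1}")
    amended = list(paragraphs)
    amended[index] = match.group(2)
    return amended


def readable_diff(before: List[str], after: List[str]) -> str:
    """Render the change as a unified diff, one paragraph per line."""
    lines = difflib.unified_diff(before, after, fromfile="before",
                                 tofile="after", lineterm="")
    return "\n".join(lines)
```

Calling `readable_diff(law, apply_amendment(law, instruction))` then shows the struck paragraph with a `-` prefix and its replacement with a `+` prefix, the same way a code diff would.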
Maryland just launched their regs on our platform:
https://regs.maryland.gov (https://github.com/maryland-dsd/law-xml-codified)
Feel free to reach out (email in bio) if you would like your community to publish their official laws on GitHub!
Out of curiosity, like what specifically?
Didn’t DOGE’s failure highlight that it actually wasn’t trivial? I’m skeptical at first glance but open to being proven wrong.
For example, there are thousands of divisions of government out there provisioning largely the same systems in duplicate. E.g. the very local government here has a web portal for the sports venue bookings like pools and tennis courts. They have a waste collection portal. Local tax portal.
Only recently has this been slightly standardized but even those efforts are purely regional. You might get 5 local councils in the city using one SaaS platform, another 5 using another SaaS platform, and another 5 rolling their own. For each function of local government.
Never mind the fact that a local government in France like this probably has very similar needs to one in Belgium or even the US.
And the worst part is they are terrible at procurement so even when they do consolidate, they're basically getting scammed.
I often think about starting a cost-plus-priced open core project to deal with these issues. Like we build common government functions, and sell it for cost plus 20% markup, with a licence that lets the gov run it themselves if we ever go bust. But then I think procurement is largely a grift game and it might not do well for that reason.
No shade on the author, they made a fun thing. I'm directing my cannons more towards the parent post idea that the world needs software developers for their rare genius to use their beautiful brains to solve problems in ways no actual participant in the system could have ever thought of.
The attitude that because you can prompt an LLM to write some Python you are also uniquely situated to solve the world's problems is how we built an entire generation of automated solutions worse than what we had before.
For others wondering, while most of the Franco-era laws were nuked in 1978, this does include lots of old laws (i.e. pre-20th century).
However, the source material starts with a squashed commit in 1960 :) So no changelog before that. The BOE source though is pretty phenomenal; they've scanned files going back to the 1600s so far.
But getting the entire country's law into git is already an impressive feat.
Not git, but Congress actually does have quite a bit of data digitized. A random example [0]: they even provide XML. The Congress data is going to give you all bills, many of which do not pass, so it's a different mission than this project.
[0] https://www.congress.gov/bill/118th-congress/house-bill/4818
Git isn't structured for collaborative commits, but a community-wide convention kind of "patches" support for it on top of the git message body, via "Co-Authored-By: name <name@example.com>", which IIRC most platforms support. The convention itself initially comes from Linux kernel development.
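Since the trailer is just text in the message body, reading it back out is easy. A minimal sketch, assuming the usual "Co-authored-by: Name <email>" shape with one trailer per line (the function name is mine, not part of git):

```python
import re
from typing import List, Tuple

# One trailer per line; matching is case-insensitive, so "Co-Authored-By" also works.
TRAILER = re.compile(
    r"^co-authored-by:[ \t]*(.+?)[ \t]*<([^<>]+)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def co_authors(message: str) -> List[Tuple[str, str]]:
    """Return (name, email) pairs for every co-author trailer in a commit message."""
    return TRAILER.findall(message)
```

For a law repo you could imagine using the same trick to credit the sponsors of a reform, though that is speculation on my part and not something the project does as far as I can tell.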
You can see how certain articles have the option to check "how that particular article was at each moment in time". That would be way harder to track, but it would be awesome if not only could you "go back in time and see what the law was" but also "how it's been evolving".
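The "what was the law on a given day" half is simple once you have dated versions of an article. A minimal sketch, assuming each version is stored with the date it took effect and the list is sorted by that date (the data shape and function name are hypothetical):

```python
from bisect import bisect_right
from datetime import date
from typing import List, Optional, Tuple

Version = Tuple[date, str]  # (date the text took effect, article text)


def version_in_force(versions: List[Version], on: date) -> Optional[str]:
    """Return the article text in force on a given day, or None if it predates all versions.

    `versions` must be sorted by effective date, oldest first.
    """
    effective_dates = [effective for effective, _ in versions]
    # Index of the first version that takes effect strictly after `on`.
    position = bisect_right(effective_dates, on)
    return versions[position - 1][1] if position else None
```

The "how it's been evolving" half is then just walking consecutive pairs in that list and diffing them.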