Spanish legislation as a Git repo

Posted by enriquelop 7 hours ago

Spanish legislation as a Git repo (github.com)

616 points | 187 comments

_ache_ 4 hours ago|

In France, our laws are not only versioned. They're formally proved too!

*Edit*: Whoa! The French crew is here. At least five of us are citing some variation of <https://www.legifrance.gouv.fr/> for versioning.

allan_s 4 hours ago|

I find it ironic to have it named "catalan(g)" on a post about Spanish law.

asveikau 3 hours ago||

Even better. The Catalan word for Catalan is català. So catala-lang.org fits that too.

enriquelop 7 hours ago||

I built a pipeline that converts all Spanish state legislation into version-controlled Markdown. Each law is a file, each reform is a real git commit with the historical date. 8,642 laws, 27,866 commits.
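A minimal sketch of how a reform could become a commit carrying its historical date, assuming the standard `GIT_AUTHOR_DATE` and `GIT_COMMITTER_DATE` environment variables are used. This is not the author's actual pipeline: the function name, the placeholder author identity, and the commit message are hypothetical, and the sketch only builds the command rather than running it.

```python
import os
from datetime import datetime, timezone


def dated_commit_command(message, reform_date, author="BOE <boe@example.invalid>"):
    """Build (argv, env) for a git commit stamped with a reform's date.

    reform_date must be timezone-aware. The author identity is a placeholder
    chosen for this illustration, not something taken from the repo.
    """
    # Git's internal date format: "<seconds since epoch> <utc offset>".
    stamp = f"{int(reform_date.timestamp())} +0000"
    env = dict(os.environ)
    env["GIT_AUTHOR_DATE"] = stamp      # date shown as the author date
    env["GIT_COMMITTER_DATE"] = stamp   # date shown as the commit date
    argv = ["git", "commit", "--all", f"--author={author}", "-m", message]
    return argv, env


argv, env = dated_commit_command(
    "Reform: hypothetical amendment to article 3",
    datetime(1995, 11, 23, tzinfo=timezone.utc),
)
print(argv)
print(env["GIT_AUTHOR_DATE"])
```

Passing these to `subprocess.run(argv, env=env, check=True)` inside a repository would create the commit. Dates before 1970 give a negative timestamp, which not every git version may accept, so very old laws would probably need special handling.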

The idea: legislation is just patches on patches on patches. Git already solves this. Instead of reading "strike paragraph 3 and replace with...", you get an actual diff.

The repo is the product. Browse any law, git log to see its full reform history, git diff to see exactly what changed.
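To make the "actual diff" idea concrete, here is a small sketch using Python's standard `difflib` on two versions of a law. The article text is invented for illustration and the file labels are hypothetical; `git diff` on the real repo would produce the same kind of unified output.

```python
import difflib

# Two made-up versions of the same article, before and after a reform.
before = [
    "Article 3.\n",
    "The fee is 10 euros.\n",
    "Payment is due in January.\n",
]
after = [
    "Article 3.\n",
    "The fee is 12 euros.\n",
    "Payment is due in January.\n",
]

# unified_diff yields header lines, hunk markers, and +/- change lines.
diff = list(
    difflib.unified_diff(
        before, after, fromfile="law.md (before)", tofile="law.md (after)"
    )
)
print("".join(diff))
```

The removed and added lines show exactly what the reform changed, which is what the amending text "strike paragraph 3 and replace with..." leaves the reader to work out by hand.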

Built the pipeline in ~4 hours with Claude Code. Source is BOE (Spain's official gazette) consolidated legislation API.

Exploring whether there's a business here — structured legislation API for legaltech/compliance, or just a useful open dataset. Curious what HN would build with this data.

artirdx 6 hours ago||

The intent of laws is often clarified in court through judgments. If you can overlay the judgments on top of the corresponding law, at the correct points in time, I think that will have value. It might, for example, show which laws were referenced the most and which needed to be clarified the most. It might give insights into which legal language constructs stood the test of time and which had to be repeatedly clarified.

da_chicken 6 hours ago|||

That's true, but it might not be as important here.

Spain does not have a common law legal system like the US or the UK. It has a civil law system in which prior court judgments do not form strictly binding precedent. Prior judgments can be important, but case law is not really a thing.

tephra 5 hours ago||

I wonder how true this is. We have the same system in Sweden: court judgments are not legally binding precedent for lower courts. But in practice lower courts will follow the rulings made by the high court.

Is it not the same in Spain at all?

philistine 4 hours ago|||

It's the same in Spain, which makes OP's proposal kind of useless. The big distinction between a civil and a common law system is in the fundamentals. A country's civil code is properly defined, while a common law system is based on previous cases you have to dig through to find the basics.

amszmidt 3 hours ago|||

Would be nice if someone did it with Sweden's laws too!

dotancohen 6 hours ago||||

Laws are often cascaded as well. Specifically in this case, Spain is subdivided into Comunidades Autonomas, each of which has its own elected parliament. And inside those are cities with their own local laws.

So while this project does track laws, is there any facility to determine which laws from which bodies are relevant to a specific activity in a specific location?

embedding-shape 5 hours ago|||

> And inside those are cities with their own local laws.

No, cities don't have their own laws, but the autonomous communities do have some influence in some laws and regulations (not all), like the amount of income tax you have to pay and so on. But cities within the autonomous communities don't have their own laws.

donalhunt 5 hours ago||

No by(e)-laws in Spain? Certainly a thing in the UK, Ireland and I believe US and Canada. Is that a common law thing?

Mordisquitos 5 hours ago|||

Local authorities in Spain do have the authority to enact their own law-ish regulations, which are called 'ordenanzas'. For example, if I remember correctly, motorbikes are allowed to park on the pavement by default in Barcelona unless a sign says otherwise, but it is forbidden in Madrid unless a sign explicitly allows it.

I think local government in Spain has at least as much authority as it does in the UK, maybe more, but almost certainly less than it does in the US.

embedding-shape 5 hours ago|||

"By-laws" is typically the name of the rules/"laws" inside of a company or organization, I'm not familiar with that word in the context of "nation-wide criminal/civil laws".

Regardless, cities do not have their own "local laws" in the way your comment made it seem. We have national laws, and minor differences in various autonomous communities, since they have some legislative power to control their own industry, commerce, education and some more stuff.

dylan604 2 hours ago|||

> inside of a company or organization,

Corps and cities are very similarly structured. Each is chartered at the start, with corps getting governed by boards and c-suite types while cities have mayors and city council types. Both file paperwork to exist within the state. Both are subject to state laws, but are allowed to make up regulations specific to them as long as they are within the state's laws.

In the end, it's all just paperwork, at least in the US

eep_social 3 hours ago||||

as an american I might call those “local ordinance” when they come from a smaller rulemaker like a town

ninalanyon 4 hours ago|||

> "By-laws" is typically the name of the rules/"laws" inside of a company

I suspect that this should be qualified by "in the US"

embedding-shape 4 hours ago||

No, I was talking about Spain, I have no idea how it works in the US. I thought mentioning "autonomous communities" was enough context to make it evident, but maybe it wasn't.

Mordisquitos 5 hours ago||||

I may be wrong, but I think autonomous community legislation is not published in the BOE itself (the Official State Gazette), but rather in each of their corresponding official gazettes (e.g. DOGC for Catalonia, BOCM for Madrid, BOA for Aragon, BOJA for Andalusia, etc.).

youknownothing 4 hours ago|||

yes: Comunidades Autonomas can only define laws as "permitted" by the central government under an Estatuto de Autonomia (Autonomy statute? not good with legal jargon), which is effectively a law of its own. So at the central level the law says "in this particular region, matters of education are dealt with regionally", and that's when regional laws apply. Same for local laws. In essence, all laws emanate from the central government, but the central government decides to delegate some areas; technically, it could always take them back.

pseingatl 5 hours ago||||

Rarely in a civil law jurisdiction, essential in common-law jurisdictions.

SOLAR_FIELDS 5 hours ago|||

Perhaps reference it in the commit trailer?

manunamz 31 minutes ago|||

Very cool project. How are you thinking about indexing and discoverability? Git gives you the change history, but navigating the corpus itself seems like the harder problem: Finding related laws, understanding hierarchical relationships between statutes...

Have you considered embedding semantic hierarchical structure directly in the markdown? Something like https://github.com/wikibonsai/semtree ? It lets you build a navigable tree across markdown files using indented [[wikilinks]] as the organizational spine. Could be a natural fit for legislation that already has an inherent taxonomy (constitutional → organic → ordinary, or by subject area).

Bewelge 6 hours ago|||

Oooh Can you elaborate a bit how the gazette is publishing them? Like what format did you have to parse. And how many documents were there in total? I tried doing the same for German laws 1-2 years ago but LLMs weren't smart enough yet. And the costs would've been at least a couple of thousand €.

Edit: Never mind, I missed the "BOE (Spain's official gazette) consolidated legislation API" part. Sending jealous greetings from Germany. We just have a bunch of PDFs in Germany. And the private entity that has been publishing them for decades even claims copyright on them!

sivann 4 hours ago||

Heh we have the exact same status in Greece. It’s sad the upstream is so sloppy.

rmonvfer 4 hours ago|||

I looked into this a while back and IIRC, the consolidated legislation doesn’t cover all legislation but only a handful.

Also, in my experience (having built in this space before), regulations aren’t really the issue. Court rulings are, because there’s no open data for them in Spain. And the potential users for a paid product (legal professionals) already know the law; the key players (big law firms) have their own databases of annotated and verified court rulings and other documents.

youknownothing 4 hours ago|||

This is brilliant. I had thought about this for a long while, you see laws that are just "go to law 132 and amend paragraph 4, then go to law 24 and amend paragraph 9". Basically "laws" are recorded as diffs, and then it's up to the reader to put up the final product in their heads. They should be doing it this way!

airstrike 4 hours ago|||

This is really cool. I've thought about it for a long time as well but never had the idea of just using git, which is equal parts genius and "obvious" in hindsight, as most great ideas are.

I think the corollary that comes to mind is that reforms, with their git commits, are incrementally valuable if they refer to other parts of the legislation, previous commits, etc. to give more context as to the intent at the time of the law. So maybe there's a way to distill the legislative process into more PR and commit-oriented work—likely ex post as you did here, but perhaps in the future as part of an actual workflow.

And then maybe I'd pitch the idea to some technologically-inclined local government.

daedric7 5 hours ago|||

Please! Can you make the same for Portugal? Laws here are a mess of reforms...

upcoming-sesame 5 hours ago||

is there a similar API for Portugal?

aerhardt 2 hours ago|||

I’ve had the idea of playing with our laws and trying to ask questions about their growing volume and complexity. This is timely and dope Enrique - mil gracias!

__MatrixMan__ 4 hours ago|||

It would be a good place to start if you wanted to hard fork the government.

Mordisquitos 5 hours ago|||

Congratulations, this is a brilliant resource. You have done one of those countless things which I often think about doing, but my utter lack of follow-through and other distractions make it a fantasy. I cannot wait to clone the repo and explore it.

As to what can be done with the data, maybe one interesting step could be a graph-database regarding laws which reference other laws or the definitions that they depend on?
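As a rough sketch of the reference-graph idea: BOE identifiers have the shape `BOE-A-2015-11430` (as in the boe.es URL quoted elsewhere in this thread), so a first pass could simply scan each law's text for that pattern. Whether the repo's Markdown files actually contain such identifiers is an assumption, and the sample IDs and texts below are made up.

```python
import re
from collections import defaultdict

# Pattern modelled on identifiers such as BOE-A-2015-11430.
BOE_ID = re.compile(r"BOE-A-\d{4}-\d+")


def reference_graph(laws):
    """Map each law ID to the sorted IDs of the other laws its text mentions."""
    return {
        law_id: sorted(set(BOE_ID.findall(text)) - {law_id})
        for law_id, text in laws.items()
    }


def cited_by(graph):
    """Invert the graph: for each law, which laws reference it."""
    reverse = defaultdict(list)
    for source, targets in graph.items():
        for target in targets:
            reverse[target].append(source)
    return dict(reverse)


# Hypothetical corpus: IDs and texts are invented for illustration.
laws = {
    "BOE-A-0000-1": "Base text with no references.",
    "BOE-A-0000-2": "Amends article 4 of BOE-A-0000-1.",
    "BOE-A-0000-3": "See BOE-A-0000-1 and BOE-A-0000-2.",
}
graph = reference_graph(laws)
print(graph)
print(cited_by(graph))
```

The resulting dictionaries are plain adjacency lists, so they could be loaded into a graph database or analysed directly, for example to find the most frequently referenced laws.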

zer00eyz 1 hour ago|||

Too bad that author and committer are single individuals and not lists. It would be good to see who wrote them and how the voting went as well.

7777777phil 7 hours ago||

cool idea, how far back (in time) do those 27k commits go?

Just thinking how this could maybe be used for (automated) research / visualization on the evolution of (Spanish, in this case) law

codethief 6 hours ago||

> how far back (in time) do those 27k commits go

Looking at the commit dates (which seem to be derived from the original publication dates) the history seems quite sparse/incomplete(?) I mean, there have only been 26 commits since 2000.

Meneth 6 hours ago||

It seems the commits aren't in proper date order. Here are some newer changes, placed before the latest commits: https://github.com/EnriqueLop/legalize-es/commits/master/?af...

forgotpwd16 4 hours ago||

It's related to commits actually having a parent-child structure (forming a graph) and timestamps (commit/author) being metadata. So commits 1->2->3->4 could be modified to have timestamps 1->3->2->4. I know GitHub prefers sorting with author over commit date, but don't know how topology is handled.
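The 1->2->3->4 example can be sketched with the standard library (Python 3.9+ for `graphlib`). The commit names and timestamps are invented; the point is only that ordering by parent links and ordering by date metadata can disagree.

```python
from graphlib import TopologicalSorter

# Hypothetical linear history: each commit maps to its list of parents.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c3"]}

# Author timestamps are just metadata; here c2 and c3 are swapped in time.
author_date = {"c1": 1, "c2": 3, "c3": 2, "c4": 4}

# Topological order follows the parent links, oldest ancestor first.
topo_order = list(TopologicalSorter(parents).static_order())

# Date order ignores the graph and sorts by the timestamp metadata alone.
date_order = sorted(parents, key=author_date.get)

print(topo_order)  # ['c1', 'c2', 'c3', 'c4']
print(date_order)  # ['c1', 'c3', 'c2', 'c4']
```

A history view that sorts by date would show c3 before c2 even though c3 is built on top of c2, which matches the kind of out-of-order listing described above.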

codethief 2 hours ago||

> It's related to commits actually having a parent-child structure (forming a graph) and timestamps (commit/author) being metadata.

Yeah, I think everyone is aware. It's just that the last couple dozen commits, to me, looked like commits had been created in chronological order, so that topological order == chronological order.

> I know GitHub prefers sorting with author over commit date, but don't know how topology is handled.

Commits are usually sorted topologically.

lcrisci 4 hours ago||

I love it. This is a step in the right direction toward a transparent database of existing laws, one you can consult with your AI or anything capable of reasoning about them and explaining the status quo of our national laws. I would love to see a similar setup for other countries.

wrxd 3 hours ago||

It would have been cool if the commit authors reflected the actual politicians responsible for the reforms. Find a law, run `git blame` and immediately know who’s responsible for it

deepsun 2 hours ago|

And even more useful would be unit-tests -- here is a loophole and here is the law preventing it.

Whenever a law is about to be changed/removed, run all the tests to make sure no regressions.

psychoslave 2 hours ago|||

Jurisdictional laws don't work that way though. It's more like a script for improvised theater. Everybody gets the same text, but no one gets the same performance twice.

ecocentrik 2 hours ago|||

Tests for correctness, self similarity, duplication of concerns, contradictory statutes, edge case detection, cruft or outdated laws that muddy the waters...

If the full complement of software development practices were applied to legislation and ordinances we would be living in a very different world.

dylan604 2 hours ago||

oh gawd, code is law is back. or is it law is code?

theptip 4 hours ago||

Nice! I was just implementing this for CA state bills.

Is the parsing/uploading code shared somewhere else?

Definitely the kind of idea that would have been below my activation energy pre-Claude.

I think this approach should be standard, I have always wondered why the source of truth for these documents is not moved to a repo like git.

dylan604 2 hours ago|

my first, cynical reaction would be to bet that government really doesn't want the people to know exactly what the laws are. a more generous reason would be that nobody in law is truly technical enough to understand it, let alone implement it.

j-bos 7 hours ago||

This is brilliant. I wish this were available for all legislations. There's so many inefficiencies that are trivially solved with existing tech frameworks.

Schmerika 6 hours ago||

> There's so many inefficiencies that are trivially solved with existing tech frameworks.

There really, really are.

The legal industry is well aware of that fact - and how many billable hours they stand to lose by making their work more efficient and understandable.

You know how tax prep companies spent over $90m 'lobbying' Congress to ensure that filing your taxes remains difficult and complicated [0]?

Well, lawyers know just as well or better how to butter their bread; and they will pull out every dirty trick they have to scupper attempts to make practising law more transparent or efficient in any way.

0 - https://www.opensecrets.org/news/2023/09/tax-prep-companies-...

DennisP 5 hours ago|||

It's not just the legal industry, it's the legislators. I used to be friends with a former state senator, who had a background in forensic accounting. She said they purposely made the bills harder to parse than necessary so it was hard to figure out what they were actually doing. Given enough time, people could do it but in practice there wasn't time before voting on the bill, and that was on purpose too. Of course some of it was to reward lobbyists or do other unpopular things, but she used to read bills from back to front because the back was where they put all the graft. An example I remember was $50K in taxpayer money going to a congressman's birthday party.

For a while I thought about trying to write software that would turn the obscure natural-language diffs in written bills into a readable diff, showing the laws before and after with highlighted changes. But she said they just got the bills as paper printouts which weren't always even up-to-date, so it might not have helped much. Maybe now they're online. And LLMs might make the project easier.

FuckButtons 3 hours ago||

Presumably, there must be some point in time where the bill is made public in some form before going to a vote. If you could get the right tool in the hands of a journalist to turn whatever obscure format it’s in into something legible by an ordinary person there’s probably value there.

newyankee 4 hours ago||||

and then we have people touting the Jevons paradox as the outcome of AI disruption, leading to more work. Before we create new work we need to figure out how to reduce people's incentive to unnecessarily complicate stuff, and to be honest the answer is never clean or easy

wolfi1 6 hours ago|||

unfortunately, laws are not everything. you need to know how to get around them. our country for example has the habit of creating a lex fugitiva, which means that some regulations can be changed in other, unrelated laws. good luck finding the correct regulation without a law degree

dgreisen 6 hours ago|||

Our nonprofit, Open Law Library, is working on this exact problem. It is definitely not trivial, but it is very doable. We partner directly with governments to help them implement so the git repos become the canonical record (rather than just an unofficial mirror).

Maryland just launched their regs on our platform:

https://regs.maryland.gov (https://github.com/maryland-dsd/law-xml-codified)

Feel free to reach out (email in bio) if you would like your community to publish their official laws on GitHub!

rafram 6 hours ago|||

Everyone in government knows what Track Changes is. The standard format of a piece of legislation in British-influenced systems is a diff. The tech field does not have secret knowledge that the rest of humanity lacks.

dinkumthinkum 3 hours ago||

I was thinking the same thing. I feel like people in software tech always think they have secret powers that experts in other fields can't imagine.

appstorelottery 7 hours ago|||

I couldn't agree more - this is fantastic work.

Esophagus4 6 hours ago|||

> so many inefficiencies that are trivially solved with existing tech frameworks.

Out of curiosity, like what specifically?

Didn’t DOGE’s failure highlight that it actually wasn’t trivial? I’m skeptical at first glance but open to being proven wrong.

bojan 6 hours ago|||

DOGE wasn't actually trying to make things more efficient. You can't count it as an honest attempt.

0x3f 3 hours ago||||

> Out of curiosity, like what specifically?

For example, there are thousands of divisions of government out there provisioning largely the same systems in duplicate. E.g. the very local government here has a web portal for the sports venue bookings like pools and tennis courts. They have a waste collection portal. Local tax portal.

Only recently has this been slightly standardized but even those efforts are purely regional. You might get 5 local councils in the city using one SaaS platform, another 5 using another SaaS platform, and another 5 rolling their own. For each function of local government.

Never mind the fact that a local government in France like this probably has very similar needs to one in Belgium or even the US.

And the worst part is they are terrible at procurement so even when they do consolidate, they're basically getting scammed.

I often think about starting a cost-plus-priced open core project to deal with these issues. Like we build common government functions, and sell it for cost plus 20% markup, with a licence that lets the gov run it themselves if we ever go bust. But then I think procurement is largely a grift game and it might not do well for that reason.

hirako2000 6 hours ago|||

DOGE made the token gain more in market cap than it saved in expenses. Despite having a master of blind layoffs at its head.

f1shy 6 hours ago|||

I would like to have a legal advisor based on that. At least for a first question, without paying a lawyer

idiotsecant 6 hours ago||

And, in the example of the stereotypical venture capital seeking techbro junk that has somehow infected the entire world, this project doesn't actually understand or solve any real world problems.

No shade on the author, they made a fun thing. I'm directing my cannons more towards the parent post idea that the world needs software developers for their rare genius to use their beautiful brains to solve problems in ways no actual participant in the system could have ever thought of.

The attitude that because you can prompt an LLM to write some Python you are also uniquely situated to solve the world's problems is how we built an entire generation of automated solutions worse than what we had before.

Quarrel 6 hours ago||

Great project.

For others wondering, while most of the Franco-era laws were nuked in 1978, this does include lots of old laws (ie pre-20th C).

However, the source material starts with a squashed commit in 1960 :) So no changelog before that. The BOE source though is pretty phenomenal, they've scanned files going back to the 1600s so far.

cyrusradfar 6 hours ago||

I think this is great. Only limit of git is I can't imagine "git blame" works. It would be nice to know who voted for and against each patch. Git isn't structured for collaborative commits.

hirako2000 6 hours ago||

That could actually be a git commit log with date, votes and other metadata.

But getting the entire country's law into git is already an impressive feat.

3eb7988a1663 3 hours ago||

If we are spitballing, I think there should be an actual file associated with the text so you can see the vote. A file makes it trivial to grep for "Senator X".

Not git, but Congress actually does have quite a bit of data digitized. A random example [0]: they even provide XML. The Congress data is going to give you all bills, many of which do not pass, so a different mission than this project.

[0] https://www.congress.gov/bill/118th-congress/house-bill/4818

embedding-shape 6 hours ago|||

> Git isn't structured for collaborative commits.

Git isn't structured for collaborative commits, but community-wide conventions kind of "patch" support for it on top of the git message body, via "Co-Authored-By: name <name@example.com>" trailers, which IIRC most platforms support, and the convention itself initially comes from Linux kernel development.
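A small sketch of how such trailers could be read back out of a commit message. The regex and function name are illustrative, the message is invented, and real tooling (for example `git interpret-trailers`) handles more edge cases than this does.

```python
import re

# Matches lines like "Co-authored-by: Name <email>", case-insensitively.
TRAILER = re.compile(
    r"^co-authored-by:[ \t]*(?P<name>.+?)[ \t]*<(?P<email>[^>]+)>[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def co_authors(message):
    """Return (name, email) pairs for every co-author trailer in a message."""
    return [(m["name"], m["email"]) for m in TRAILER.finditer(message)]


# Hypothetical commit message: the parties and addresses are made up.
message = """Reform of a hypothetical law

Co-authored-by: Party A <party-a@example.com>
Co-authored-by: Party B <party-b@example.com>
"""

print(co_authors(message))
```

Because the trailers live in the message body, any number of co-authors can be attached to one commit without changing git's single author and committer fields.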

1718627440 5 hours ago|||

You could have the parliament (meaning including the election cycle) as the main author and then the parties and votes as Co-Authors.

dinkumthinkum 3 hours ago|||

Is this really a problem we have now, though? This information is publicly available. If everyone here is so excited about LLMs then why would this even be needed? Anthropic can just give us the answer to every question. We don't need nerds that know what git is. :)

nuhuhh 6 hours ago||

Yeah you can, just squash commits on the PR where multiple people contributed. It will say it was a collaborative commit in history, showing all their avatars.

josalhor 5 hours ago||

Not only would it be cool for laws to have appropriate timestamps so we can "go back in time to how it was at a certain moment", but also to have proper git commit diffs of how laws change over time. See this: https://www.boe.es/buscar/act.php?id=BOE-A-2015-11430

You can see how certain articles have the option to check "how that particular article was at each moment in time". That would be way harder to track, but it would be awesome if not only could you "go back in time and see what the law was" but also "how it's been evolving".

jonhohle 5 hours ago||

Not only that, but authors and approvers could be used to track who created and voted for each change.

dan_linder 5 hours ago||

Then compare wording and structure with other bills proposed elsewhere to look for single sources trying to legalize an agenda or retry after earlier failed attempts.

upcoming-sesame 4 hours ago||

it would also be cool to know the "why" that went into the change

sigio 6 hours ago|

I did the same with a limited subset of Dutch laws a while back: https://github.com/sigio?tab=repositories&q=wetboek
