Posted by bertman 3/26/2025

Debian bookworm live images now reproducible (lwn.net)
766 points | 200 comments
jcmfernandes 3/26/2025|
Insane effort. This sounded like a pipe dream just a couple of years ago. Congrats to everyone involved, especially to those who drove the effort.
Joel_Mckay 3/26/2025|
The Debian group is admirable, and has positively changed the standards for OS design several times. Reminds me I should donate to their coffee fund around tax time =3
alfiedotwtf 3/27/2025|||
Exactly!

I’ve said it many times and I’ll repeat it here - Debian will be one of the few Linux distros we have right now that will still exist 100 years from now.

Yea, it’s not as modern in terms of versioning and risk compared to the likes of Arch, but that’s also a feature!

roenxi 3/27/2025|||
> Debian will be one of the few Linux distros we have right now, that will still exist 100 years from now.

It'd certainly be nice, but if you've ever seen an organisation unravel it can happen with startling speed. I think the naive estimate is if you pick something at random it is half-way through its lifespan; so there isn't much call yet to say Debian will make it to 100.

Y_Y 3/27/2025|||
> I think the naive estimate is if you pick something at random it is half-way through its lifespan; so there isn't much call yet to say Debian will make it to 100.

This doesn't strike me as a strong argument. That naive estimate (in whatever form[0]) is typically based on not knowing anything else about the process you're looking at. We have lots of information about Debian and similar projects, and you can update your estimate (in a Bayesian fashion) when you know this. Given that Ian Murdock started Debian 31 years ago I think more than 100 years is a very reasonable guess.

[0] see e.g. https://en.wikipedia.org/wiki/Lindy_effect

Joel_Mckay 3/27/2025|||
Arguably, there is already the continuous package deprecation process that often leads to unpopular projects getting culled in the next upgrade.

In a way, Flatpak/Snap/Docker were mitigations to support old programs on new systems, and old systems with updated software no longer compatible with the OS. Not an ideal solution, but a necessary one if folks also wanted to address the Windows/exe model of long-term-supported program versions.

If you work with unpopular oddball stuff, you notice packages cycle out of the repositories rather regularly. =3

OrderlyTiamat 3/27/2025|||
I appreciate the Lindy effect, but I'd be very cautious in applying it in just any domain. In particular in IT, where new projects continually spring up to dethrone others. Another 30 years for Debian seems reasonable, but I'd probably bet against another 100. A Metaculus question for the longevity of projects like Debian would be fascinating.
gorjusborg 3/27/2025|||
> In particular in IT, where new projects continually spring up to dethrone others.

The Lindy effect says nothing about popularity, which is how I translate your use of 'dethrone' here. It observes that something's duration of existence correlates with its chances for existence in the future.

rovr138 3/27/2025|||
> In particular in IT, where new projects continually spring up to dethrone others

Yet, it's lasted 31 years, which is a pretty insane amount of time in tech. And that's on top of being kept up to date, with good structure and really good contributions and advancements.

On the other hand, look at CentOS, Red Hat, and Oracle and their debacle. How much did they fragment that area?

And then we have Debian just chugging along.

Joel_Mckay 3/27/2025|||
Indeed, it was sad when they ended the FreeBSD-based Debian project (Debian GNU/kFreeBSD) due to a lack of interest.

I don't think traditional von Neumann architecture will even be around in 100 years, as energy demands drive more efficient designs for different classes of problems. =3

vbezhenar 3/27/2025||||
I feel safer using Arch than Debian. Debian adds so many of its own patches on top of the original software that the result hardly resembles the original. Arch almost always ships the original code. And I trust the original developers much more than Debian maintainers.
palata 3/27/2025|||
> And I trust the original developers much more than Debian maintainers.

Then that's a good reason not to use Debian indeed. Whichever distro you choose, you place your trust in its maintainers.

But that's also a feature: instead of trusting random code from the Internet, you can trust random code from the Internet that has been vetted by a group of maintainers you chose to trust. Which is a bit less random, I think?

Joel_Mckay 3/27/2025|||
Debian standardized the vetting process for maintainers and validation environments, so shenanigans can be attributed to individual signatures rather quickly.

If you ever want a laugh, one should read what Canonical puts the kids through for the role. One could get a job flying a plane with less paperwork...

Authenticated signed packaging is often a slow process, and some people do prefer rapid out-of-band pip/npm/cargo/go until something goes sideways... and no one knows who was responsible (or which machine/user is compromised.)

Not really random, but understandably slow given that the task of reaching a "stable" OS release involves hundreds of projects... =3

palata 3/27/2025||
Yeah I think that's what I was trying to say. With a distro, you get some kind of validation by maintainers. With unvetted package managers, you just get something from somewhere.
vbezhenar 3/27/2025|||
I don't trust any validation by the maintainers. There's too much code even in small projects. Big projects are oceans of code. Maintainers maintain too many packages to understand even a little bit of the changes. So, no, I don't trust it. It would require a specialized team of engineers for every single project to analyze changes in new versions. It just does not happen.

Best they can do is to follow the developer's instructions to build a binary artefact and upload it somewhere. Maybe codify those instructions into a (hopefully) repeatable script like PKGBUILD.

palata 3/27/2025||
> Best they can do is to follow the developer's instructions to build a binary artefact and upload it somewhere. Maybe codify those instructions into a (hopefully) repeatable script like PKGBUILD.

I don't understand; isn't this exactly what maintainers do? They write a recipe (be it a PKGBUILD or something else) that builds (maybe after applying a few patches) a package that they then distribute.

Whether you use Arch or Debian, you trust that the maintainers don't inject malware into the binaries they ship. And you trust that the maintainers trust the packages they distribute. Most likely you don't personally check the PKGBUILD and the upstream project.

vbezhenar 3/27/2025||
No, they alter and modify the software as they see fit.

Here's one of the recent examples: https://www.reddit.com/r/debian/comments/1cv30gu/debian_keep...

And that applies to a lot of packages. Sometimes it leads to frustrated users who come directly to frustrated developers who have no idea what they're talking about, because the developers did not intend the software to be patched and built this way. Sometimes this leads straight to vulnerabilities. Sometimes this leads to unstable software, for example when a maintainer "knows better" which libraries the software should link to.

yjftsjthsd-h 3/27/2025|||
> Here's one of the recent examples: https://www.reddit.com/r/debian/comments/1cv30gu/debian_keep...

They used an official build option to not ship a feature by default, and have another package that does enable all features. If that's your best example of

> Debian adds so much of their patches on top of original software, that the result is hardly resembles the original.

then I'm inclined to conclude that Debian is way more vanilla than I thought.

palata 3/27/2025||||
> No, they alter and modify the software as they see fit.

Well yeah, but you choose the maintainers that do it the way you prefer. In your case you say you like Arch better, because they "patch less" (if I understand your feeling).

Still they do exactly what you describe they should do: write a recipe, build it and ship a binary. You can even go with Gentoo if you want to build (and possibly patch) yourself, which I personally like.

> Here's one of the recent examples: [...]

Doesn't seem like it supports your point: the very first comment on that Reddit thread explains what they did: they split one package into two packages. Again, if you're not happy with the way the Debian maintainers do it, you can go with another distro. Doesn't change the fact that if you use a distro (as opposed to building your own from scratch), then you rely on maintainers.

Joel_Mckay 3/27/2025|||
In general, the apparent use-case and actual unintended impact on OS security must be clear. There is also always extreme suspicion regarding "security" widgets that touch the web browser, shell, or email programs. Normally, after something like CVE-2023-35866 is noted, a package maintainer may assume the project is a liability given the history.

If an application requires a 3-page BS explanation about how to use a footgun without self-inflicted pwning... it seems like bad design for a POSIX environment.

People that attempt an escalation of coercion with admins usually get a ban at minimum. Deception, threats, and abuse will not help in most cases if the maintainer is properly trained.

https://www.youtube.com/watch?v=lITBGjNEp08

Have a nice day, =3

freedomben 3/27/2025|||
I love Debian, but this is a genuine issue that many people don't know about. It also compounds if you're on Ubuntu, as sometimes Canonical adds their own patches too. If you're just using Debian as a base OS to serve your own software, it doesn't matter as much, but it still does somewhat. It's not unusual for Debian-specific patches to be applied by the package maintainers in order to fix build errors, mismatched dependencies, etc. Most of the time those patches are harmless, but sometimes they are not. There have been security vulnerabilities, for example, that only existed in the Debian package of a piece of software. No distro is perfect and I don't intend this as a criticism of Debian (as they have legitimate reasons for doing what they do), and no distro (not even Arch) ships everything without any patches, but in my years of experience I've bumped my head on this in Debian several times.
progval 3/27/2025||
> There have been security vulnerabilities for example that only existed in the Debian-based package of software.

Any examples more recent than CVE-2008-0166?

freedomben 3/27/2025||
Currently on mobile and going from memory, but I remember having to push out quick patches for something around 2020-ish or late 2010s? The tip of my tongue says it was a use-after-free vuln in a patch to openssl, but I can't remember with confidence. I'll see if I can find it once I get home.

Worth noting lest I give the wrong impression: I don't think security is a reason to avoid Debian. For me the hacked-up kernels and old packages have been much more the pain points, though I mostly stopped doing that work a few years ago. As a regular user (unless you're compiling lots of software yourself) it's a non-issue.

Joel_Mckay 3/27/2025||
In general, most responsibly reported CVEs allow several weeks for patch fixes to propagate into the ecosystems before public disclosure.

Once an OS is no longer actively supported, it will begin to accumulate known problems if the attack surface is large.

Thus, a legacy complex-monolith or Desktop host often rots quicker than a bag of avocados. =3

walrus01 3/27/2025|||
It's quite easy to run Debian unstable (sid) if you want a more risky approach to having the newest of every package.
cess11 3/27/2025||
Commonly these days you can also add specific repos for the things you want to be more on the edge. Then there are some tools one might install manually; at the moment I remember doing that with fzf.
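For example, a sketch of pulling one newer package from unstable while staying on stable (fzf just as the example, pin values illustrative):

    # add unstable at low priority so routine upgrades ignore it
    echo 'deb http://deb.debian.org/debian unstable main' \
        | sudo tee /etc/apt/sources.list.d/unstable.list
    printf 'Package: *\nPin: release a=unstable\nPin-Priority: 100\n' \
        | sudo tee /etc/apt/preferences.d/limit-unstable
    sudo apt update
    sudo apt install -t unstable fzf    # -t pulls just this one from unstable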
presbyterian 3/27/2025|||
Flatpak is also a great option for apps you might want to be more up-to-date than Debian provides in their package manager.
sgarland 3/27/2025|||
Yep, I do this for a few tools. Though apt-key deprecation still hasn’t been universally accepted, so that’s always a minor annoyance to deal with.
toasteros 3/27/2025|||
I always want to donate more to open source projects but as far as I know there aren't any I can get tax credits for in Canada. My budget is strapped just enough that I can't quite afford to donate for nothing.

Any Canadian residents here know of any tax credit eligible software projects to donate to?

Joel_Mckay 3/27/2025||
Depends on where you live, work, and invest. Still, I would recommend chatting with a local accountant to be sure whether a significant contribution to a donee qualifies as deductible. Note most large universities will be registered in both the US and Canada.

https://www.canada.ca/content/dam/cra-arc/formspubs/pub/p113...

Best regards, =3

imcritic 3/26/2025||
I don't get how someone achieves reproducibility of builds: what about file metadata like creation/modification timestamps? Do they forge them? Or are these data treated as not important enough (as in, 2 files with different metadata but identical contents should have the same checksum when hashed)?
jzb 3/26/2025||
Debian uses a tool called `strip-nondeterminism` to help with this in part: https://salsa.debian.org/reproducible-builds/strip-nondeterm...

There's lots of info on the Debian site about their reproducibility efforts, and there's a story from 2024's DebConf that may be of interest: https://lwn.net/Articles/985739/
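If memory serves, the command-line usage is simply (file name hypothetical):

    # rewrite known nondeterministic bits (e.g. zip/jar member timestamps) in place
    strip-nondeterminism build/output.jar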

frakkingcylons 3/27/2025||
I see this is written in Perl; is that the case with most Debian tooling?
lamby 3/27/2025|||
One of the authors of strip-nondeterminism here. The primary reason it's written in Perl is that, given strip-nondeterminism is used when building 99.9% of all Debian packages, using any other language would have essentially made that language's runtime a dependency for building all Debian packages. (Perl is already required by the build process, whilst Python is not.)
flkenosad 3/27/2025||
Question: is Perl the only runtime the Debian build process relies on?
yrro 3/28/2025||
Any packages with "Essential: yes" (run 'apt list ~E' to see them) are required on any Debian system. Additionally, the 'build-essential' package pulls in, via its dependencies, other packages that must be present to build Debian packages: https://packages.debian.org/sid/build-essential
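Both sets are easy to inspect from a shell:

    apt list ~E                         # everything marked Essential: yes
    apt-cache depends build-essential   # what the build meta-package drags in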
fooker 3/27/2025||||
It’s helpful to think of Perl as a superior bash, rather than a worse python, when it comes to scripting.
gjvc 3/27/2025|||
stealing this, thank you
nukem222 3/27/2025||||
Notably, they forgot to improve on readability and maintainability, both of which are markedly worse with perl.

Look I get people use the tools they use and perl is fine, i guess, it does its job, but if you use it you can safely expect to be mocked for prioritizing string operations or whatever perl offers over writing code anyone born after 1980 can read, let alone is willing to modify.

For such a social enterprise, open source orgs can be surprisingly daft when it comes to the social side of tool selection.

Would this tool be harder to write in python? Probably. Is it a smart idea to use it regardless? Absolutely. The aesthetics of perl are an absolute dumpster fire. Larry Wall deserves persecution for his crimes.

sgarland 3/27/2025||
Did you miss the post a few above yours, where an author of this tool explained why it’s written in Perl? Introducing a new language dependency for a build, especially of an OS, is not something you undertake lightly.
nukem222 3/27/2025||
Right. Good luck finding people who want to maintain that. It just seems incredibly short-sighted unless the current batch of maintainers intend to live forever.
sgarland 3/27/2025||
Counterpoint: if someone knows Perl, they are much more likely to have the requisite skills to be a maintainer for a distro. It’s self-selection.

Imagine the filtering required for potential maintainers if they rewrote the packaging to JS.

eviks 3/27/2025|||
How is it helpful to ignore a better alternative just because a worse one exists?
palata 3/27/2025|||
They precisely say they use it as a better alternative to bash. Obviously they don't think that Python is a better alternative here... or did I misunderstand the question?
eviks 3/27/2025||
It's not obvious to me that they think Python is worse than Perl, which makes the phrase even less sensible.
dizhn 3/27/2025|||
Weird wording, yes. I read it as "yes, Perl is better than bash" (I assume for tasks that need actual programming languages), "no, it's not worse than Python".
ben0x539 3/27/2025||
I'm not reading it as "it's not worse than python", I am reading it as "the choice was between bash and perl, python was not an option for reasons unrelated to its merits"
palata 3/27/2025|||
So you genuinely believe that they think Python is a better choice in this case, but still chose to go for Perl because they believe it's worse? How does that work?
eviks 3/27/2025||
It works by not mixing up two different people: the commenter and the implementer.

Also, it works trivially even in the case of the implementer - he might believe Python is better, but chose Perl because he likes it more.

fooker 3/28/2025|||
The same reason people write C++ instead of better^TM alternatives.

Pick the tool you already know and focus on solving the problem.

londons_explore 3/27/2025||||
Packaging and making build scripts is perhaps one of the most unrewarding tasks out there. As an open source project where most work is done for free, Debian can't afford to be prescriptive about which languages are used for this sort of task.
account42 3/27/2025||
Actually it can and it is. Build system dependencies, especially ones that apply to all packages, are something that concerns the distribution as a whole and not something where each developer can just add their favorite one.
johnisgood 3/27/2025||||
I checked the code. Perl is suitable for these kinds of tasks.
dannyobrien 3/27/2025||||
some, but not all. There's a bunch of historical code which means that Perl is in the base install, but modern tooling has a lot of Python too, as well as POSIX shell (not bash).
alfiedotwtf 3/27/2025||
Though a lot of the apt tooling was definitely written in Perl, the last time I had to deep-dive.
johnisgood 3/27/2025||
And a lot of OpenBSD-related stuff is written in Perl, too. I do not think it is a bad thing at all.
alfiedotwtf 3/27/2025||
I absolutely love Perl. I'm just so sad Python won because Google blessed it as a language and at the time everyone wanted to work for Google.

Perl always gets hate on HN, but I actually wonder, of those commenters, who has actually spent more than a single hour using Perl after they've read the Camel book.

Honest opinion: if you're going to be spending time in Linux in your career, then you should read the Camel book at least once. Then and only then should you get to have an opinion on Perl!

freedomben 3/27/2025||
I mostly agree with you, though I do think Perl is genuinely harder to read than many other languages. Perl was often my go-to for scripts before I learned Ruby (which has many glorious perl-isms in it even if most rubyists nowadays don't know or want to acknowledge that :-D ), and even looking back at some of my own code and knowing what it does, I have to read it a lot slower and more carefully than most other langs. Perl to me feels wonderfully optimized for writing, sometimes at the expense of reading. I love Perl's power and expressiveness, especially the string processing libs, and while I appreciate the flexibility in how many different ways there are to do things, it does mean that Perl code written by someone else with different approaches can sometimes be difficult to grok. For my own scripts I don't care about any of those issues and I often optimize for writing anyway, but there are plenty of applications where I would recommend against Perl, despite my affection for it.

And yes agree, people should read the camel book!

johnisgood 3/28/2025||
> there are plenty of applications where I would recommend against Perl

Yes, of course. I would not write any type of server in Perl; I would pick Go or Elixir or Erlang for such a use case.

jeltz 3/27/2025|||
Last time I checked a lot was also written in Python.
o11c 3/26/2025|||
Timestamps are the easiest part - you just set everything according to the chosen epoch.

The hard things involve things like unstable hash orderings, non-sorted filesystem listing, parallel execution, address-space randomization, ...
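The hash-ordering one is easy to see from a shell; CPython, for example, randomizes string hashes per process:

    python3 -c 'print(hash("a"))'                   # different value each run
    PYTHONHASHSEED=0 python3 -c 'print(hash("a"))'  # fixed seed, stable value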

koolba 3/26/2025||
ASLR shouldn’t be an issue unless you intend to capture the entire memory state of the application. It’s an intermediate representation in memory, not an output of any given step of a build.

Annoying edge cases come up for things like internal object serialization to sort things like JSON keys in config files.

kazinator 3/27/2025|||
ASLR means that the pointers from malloc (which may come from mmap) are not predictable.

Sometimes programs have hash tables which use object identity as key (i.e. pointer).

ASLR can cause corresponding objects in different runs of the program to have different pointers, and be ordered differently in an identity hash table.

A program producing some output which depends on this is not necessarily a bug, but becomes a reproducibility issue.

E.g. a compiler might output some object in which a symbol table is ordered by a pointer hash. The difference in order doesn't change the meaning/validity of the object file, but it is seen as the build not having reproduced exactly.

account42 3/27/2025||
That's just one example of nondeterminism in compilers though - in the end it's the responsibility of the compiler to provide options not to do that.
kazinator 3/27/2025||
Not for external causes like ASLR and memory allocators; those things should have their respective options for that.
account42 3/28/2025||
There is no guarantee that memory allocation is deterministic even without ASLR. If your program is supposed to be deterministic but its output depends on the memory addresses returned by the allocator then your program is buggy.
cperciva 3/26/2025||||
FreeBSD tripped over an issue recently where a C++ program (I think clang?) used a collection of pointers and output values in an order based on the pointers rather than the values they pointed to.

ASLR by itself shouldn't cause reproducibility issues, but it can certainly expose bugs.

ahartmetz 3/27/2025||
It is sometimes just fine to have a hash table with pointers as keys. It is by design an unordered collection, so you do not care about the order, only about finding entries.

Then at some point you happen to need all the entries, you iterate, and you get a random order. Which is not necessarily a problem unless you want reproducible builds, which is just a new requirement, not exposing a latent bug.

sodality2 3/26/2025|||
Let’s say a compiler is doing something in a multi-threaded manner - isn’t it possible that ASLR would affect the ordering of certain events which could change the compiled output? Sure you could just set threads to 1 but there’s probably some more edge cases in there I haven’t thought of.
zamadatix 3/26/2025||
I think you'd need the compiler to guarantee serialization order of such operations regardless if you used ASLR or not. Otherwise you're just hoping thread scheduling, core clocking, thread memory access, and many other things are the same between every system trying to do a reproducible build. Even setting threads to 1 may not solve that problem class if asynchronous functions/syscalls come into play.
purkka 3/26/2025|||
Generally, yes: https://reproducible-builds.org/docs/timestamps/

Since the build is reproducible, it should not matter when it was built. If you want to trace a build back to its source, there are much better ways than a timestamp.

ryandrake 3/26/2025||
C compilers offer __DATE__ and __TIME__ macros, which expand to string constants that describe the date and time that the preprocessor was invoked. Any code using these would have different strings each time it was built, and would need to be modified. I can't think of a good reason for them to be used in an actual production program, but for whatever reason, they exist.
mananaysiempre 3/26/2025|||
And that’s why GCC (among others) accepts SOURCE_DATE_EPOCH from the environment, and also has -Wdate-time. As for using __DATE__ or __TIME__ in code, I suspect that was more helpful in the age before ubiquitous source control and build IDs.
cperciva 3/26/2025||
Source control only helps you if everything is committed. If you're, say, working on changes to the FreeBSD boot loader, you're probably not committing those changes every time you test something but it's very useful to know "this is the version I built ten minutes ago" vs "I just booted yesterday's version because I forgot to install the new code after I built it".
jrockway 3/27/2025|||
Versions built into the code are nice. I think the correct answer is to commit before the build proper starts (automatically, without changing your HEAD ref) and put that in there. Then you can check version control for the date information, but if someone else happens to add the same bytes to the same base commit, they also have the same version that you do. (Similarly, you can always make the date "XXXXXXXXXXXXXXXXXXXXXX" or something, and just replace the bytes with the actual date after the build as you deploy it.)

What I actually did at $LAST_JOB for dev tooling was to build in <commit sha> + <git diff | sha256> which is probably not amazingly reproducible, but at least you can ask "is the code I have right now what's running" which is all I needed.

Finally, there is probably enough flexibility in most build systems to pick between "reuse a cache artifact even if it has the wrong stamping metadata", "don't add any real information", and "spend an extra 45 cpu minutes on each build because I want $time baked into a module included by every other source file". I have successfully done all 3 with Bazel, for example.
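A rough shell sketch of the commit-without-moving-HEAD idea from the first paragraph (the refs/compiles/... name is just a convention):

    # snapshot the working tree as a commit object, touching neither HEAD nor the index
    snapshot=$(git stash create)                # prints nothing if the tree is clean
    snapshot=${snapshot:-$(git rev-parse HEAD)}
    git update-ref "refs/compiles/$(git symbolic-ref --short HEAD)" "$snapshot"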

mananaysiempre 3/27/2025||||
> you're probably not committing those changes every time you test something

I’m not, but I really think I should be. As in, there should be a thing that saves the state of the tree every time I type `make`, without any thought on my part.

This is (assuming Git—or Mercurial, or another feature-equivalent VCS) not hard in theory: just take your tree’s current state and put it somewhere, like in a merge commit to refs/compiles/master if you’re on refs/heads/master, or in the reflog for a special “stash”-like “compiles” ref, or whatever you like.

The reason I’m not doing it already is that, as far as I can tell, Git makes it stupendously hard to take a dirty working tree and index, do some Git to them (as opposed to a second worktree using the same gitdir), then put things back exactly as they were. I mean, that’s what `git stash` is supposed to do, right?.. Except if you don’t have anything staged then (sometimes?..) after `git stash pop` everything goes staged; and if you’ve added new files with `git add -N` then `git stash` will either refuse to work, or succeed but in such a way that a later `git stash pop` will not mark these files staged (or that might be the behaviour for plain `git add` on new files?). Gods help you if you have dirty submodules, or a merge conflict you’ve fixed but forgot to actually commit.

My point is, this sounds like a problem somebody’s bound to have solved by now. Does anyone have any pointers? As things are now, I take a look at it every so often, then remember or rediscover the abovementioned awfulness and give up. (Similarly for making precommit hooks run against the correct tree state when not all changes are being committed.)

beecasthurlbow 3/27/2025||
An easy (ish) option here is to use autosquashing [1], which lets you create individual commits (saving your work - yay!) and then eventually clean em up into a single commit!

Eg

    git commit -am "Starting work on this important feature"
    
    # make some changes
    git add . && git commit --squash HEAD -m "I made a change"

Then once you’re all done, you can do an auto squash interactive rebase and combine them all into your original change commit.
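Which would be something like (base commit placeholder hypothetical):

    # fold the "squash!" commits back into the original change commit
    git rebase -i --autosquash "$BASE_COMMIT"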

You can also use `git reset --soft $BRANCH_OR_COMMITTISH` to go back to an earlier commit but leave all changes (except maybe new files? Sigh) staged.

You also might check out `git reflog` to find commits you might’ve orphaned.

[1] https://thoughtbot.com/blog/autosquashing-git-commits

lmm 3/27/2025||||
> If you're, say, working on changes to the FreeBSD boot loader, you're probably not committing those changes every time you test something

Whyever not? Does the FreeBSD boot loader not have a VCS or something?

steveklabnik 3/27/2025|||
A subtlety that may be lost: FreeBSD uses CVS, and so there isn't a way to commit locally while you're working, like with a DVCS.
cperciva 3/27/2025||
FreeBSD hasn't used CVS since 2008.
steveklabnik 3/27/2025||
Huh! So, before I posted this, I went to go double check, and found https://wiki.freebsd.org/VersionControl. What I missed was the (now obvious) banner saying

> The sections below are currently a historical reference covering FreeBSD's migration from CVS to Subversion.

My apologies! At the end of the day, the point still stands in that SVN isn't a DVCS and so you wouldn't want to be committing unfinished code though, correct?

(I suspect I got FreeBSD mixed up with OpenBSD in my head here, embarrassing.)

jraph 3/27/2025|||
You could still use git-svn, but yeah, as another commenter wrote, I don't think reproducible builds are that useful when debugging; it should be fine to have an actual timestamp in the binaries.
cperciva 3/28/2025|||
Well yes, but we've actually migrated to Git now. ;-)
steveklabnik 3/28/2025||
Welp! Egg on my face twice!
cperciva 3/27/2025|||
It's in the FreeBSD src tree. But we usually commit code once it's working...
lmm 3/29/2025||
Huh. If I was confident enough in a change to consider it worth doing an actual boot to test I'd certainly want to have it committed, to be able to track and go back to it. Even the broken parts of history are valuable IME.
chippiewill 3/27/2025||||
Which is fine, you don't need to use a reproducible build for local dev and can just use the real timestamp.
account42 3/27/2025|||
Nobody cares about reproducibility of local development builds so just limit your use of date/time to those and use a more appropriate build reference for release builds.
repiret 3/26/2025||||
> I can't think of a good reason for them

I work on a product whose user interface in one place says something like “Copyright 2004-2025”. The second year there is generated from __DATE__; that way nobody has to do anything to keep it up to date.

Arelius 3/26/2025||
I mean, you could do that, but it's sort of a lie; maybe something better would be using the date of the most recent commit, which would be both more accurate, as far as authorship goes, and actually deterministic.

Pipe something like this into your build system:

    date --date "$(git log HEAD --author-date-order --pretty=format:"%ad" --date=iso | head -n1)" +"%Y"
fmbb 3/26/2025||||
Toolchains for reproducible software likely let you set these values, or ensure they are 1970-01-01 00:00:00
mikepurvis 3/26/2025|||
Nix sets everything to the epoch, although I believe Debian's approach is to just use the date of the newest file in the dsc tarballs.
lamby 3/27/2025|||
Debian's approach is actually to use the date specified in the top entry in the debian/changelog file. That's more transparent and resilient than any mtime.
yjftsjthsd-h 3/26/2025|||
Nix can also set it to things other than 0; I think my favorite is to set it by the time of the commit from which you're building.
terinjokes 3/27/2025|||
Which is also used when the contents of a derivation will be included in a zip file. The Unix epoch is about a decade older than the zip epoch.
lamby 3/27/2025|||
Strangely enough, sometimes using the epoch can expose bugs in libraries (etc.) when running or building in a timezone west of Greenwich due to the negative time offset taking time "below" zero.
rtpg 3/27/2025|||
It's super nice to have timestamps as a quick way to know what program you're looking at.

Sticking it into --version output is helpful to know if, for example, the Python binary you're looking at is actually the one you just built rather than something shadowing that

izacus 3/27/2025||
The whole point of reproducible builds is that you don't need to rely on timestamps and similar information to know which binary you're looking at.
paulddraper 3/26/2025|||
> Do they forge them?

Yes. All archive entries, date source-code macros, and any other timestamps are set to a standardized date (in the past).

lamby 3/27/2025||
This is not quite right. At least in Debian, only files that are newer than some standardised date are clamped to that standardised date. This "clamping" preserves any metadata in older files.
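The clamping step can be sketched with GNU find and touch (paths hypothetical):

    # clamp only files newer than the reference date; older mtimes are preserved
    find build/ -newermt "@$SOURCE_DATE_EPOCH" \
        -exec touch --date="@$SOURCE_DATE_EPOCH" {} +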
echoangle 3/26/2025|||
Maybe dumb question but why would this change the reproducibility? If you clone a git repo, do you not get the meta data as it is stored in git? Or would the files have the modification date of the cloning?

I never actually checked that.

mathfailure 3/26/2025||
You clone the source from git, but then you use it to build some artifacts. The artifacts' build times may differ, yet with reproducible builds the artifacts should match.
echoangle 3/26/2025||
Right, but if you only clone and build, why would the files modification date be different compared to the version that was committed to git? Does just cloning a repo already lead to different file modification dates in my local copy?
hoten 3/26/2025||
Git does not store or restore file modification times.
codetrotter 3/26/2025|||
And the reason for that, in turn, is that if you are on one commit and check out an older commit, then restoring file modification times to what they were at the time of the older commit would cause build tools that look at file modification times to sometimes not pick up on all the changes.
echoangle 3/26/2025|||
Ah ok, that explains it.
HideousKojima 3/26/2025|||
Those aren't needed to generate a hash of a file. And that metadata isn't part of the file itself (or at least doesn't need to be), it's part of the filesystem or OS
imcritic 3/26/2025||
That's an acceptable answer for the simple case when you distribute just a file, but what if your distribution is something more complex, like an archive with some sub-archives? Metadata in the internal files will affect the checksum of the resulting archive.
londons_explore 3/27/2025|||
Finding and fixing cases like this are part of what the project has done...
exe34 3/26/2025|||
unless you fix them to a known epoch.
c0l0 3/26/2025|||
Yes.
TacticalCoder 3/26/2025||
> ... what about files metadata like creation/modification timestamps? Do they forge them?

The least difficult part of reproducible builds to solve, but yes.

The real question is: why, in the past, was an entire ecosystem created where non-determinism was the norm and everybody thought it was somehow ok?

Instead of asking: "how does one achieve reproducibility?" we may wonder: "why did people go out of their way to make sure something as simple as a timestamp would screw determinism?".

For that's the anti-security mindset we have to fight. And Debian did.

brohee 3/27/2025|||
TBH security is sometimes the source of the issues, as it often involves adding randomness. For example, replacing deterministic hashes with keyed hashes to protect from hash-flooding DoS led to deterministic output becoming nondeterministic (e.g. when displaying a hash table in its natural order).

Sorting had to be added to that kind of output.

BobbyTables2 3/27/2025|||
You’re forgetting that source control used to not be a mainstream practice…

Software was more artisanal in nature…

kroeckx 3/27/2025||
It's my understanding that this is about generating the .iso file from the .deb files, not about generating the .deb files from source. Generating .debs from source in a reproducible way is still a work in progress.
abdullahkhalids 3/26/2025||
Is the build infrastructure for Debian also reproducible? It seems like if someone wants to inject malware into Debian package binaries (without injecting it into the source), they have to target the build infrastructure (compilers, linkers, and whatever wrapper code is written around them).

Also, is someone else also compiling these images, so we have evidence that the Debian compiling servers were not compromised?

jzb 3/26/2025||
There's a page that includes reproducibility results for Debian here: https://tests.reproducible-builds.org/debian/bookworm/index_...

I think there's also a similar thing for the images, but I might be wrong and I definitely don't have the link handy at the moment.

There's lots of documentation about all of the things on Debian's site at the links in the brief. And LWN also had a story last year about Holger Levsen's talk on the topic from DebConf: https://lwn.net/Articles/985739/

goodpoint 3/27/2025|||
The whole point of reproducible builds is to ensure security even if buildbots are compromised.
layer8 3/26/2025|||
And what about the hardware on which the build runs? Is it reproducible? ;)
kragen 3/26/2025|||
Working on it! But in general the answer is that for most purposes it's good enough to show that many independently produced pieces of hardware can reproduce the same results.
abdullahkhalids 3/26/2025||||
You are joking, but solving this problem is probably amongst the most important ones we face in the information age we live in.

Every country in the world should have the capability of producing "good enough" hardware.

ratmice 3/26/2025||||
And who trusting trusted the original RepRap?
orblivion 3/27/2025||
The 50th generation builds a robot that murders you
TacticalCoder 3/26/2025||||
> And what about the hardware on which the build runs? Is it reproducible? ;)

"Fully Countering Trusting Trust through Diverse Double-Compiling (DDC) - Countering Trojan Horse attacks on Compilers"

https://dwheeler.com/trusting-trust/

If the build is reproducible inside VMs, then the build can be done on different architectures: say x86 and ARM. If we end up with the same live image, then we're talking something entirely different altogether: either both x86 and ARM are backdoored the same way or the attack is software. Or there's no backdoor (which is a possibility we have to fancy too).

nikisweeting 3/26/2025|||
well little johnny, when one hardware loves another hardware very much...
paulddraper 3/26/2025||
A la xz.

You must ultimately root trust in some set of binaries and any hardware that you use.

XorNot 3/26/2025||
For user space? No you can definitely do a stage 0 build which depends only on about 364 bytes of x86_64 binary (though ironically I haven't managed to get this to work for me yet).

The liability is EFI underneath that, and the Intel ring -1 stuff (which we should be mandating is open source).

paulddraper 3/27/2025||
> which depends only on about 364 bytes of x86_64 binary
jesboat 3/27/2025||
That's the point at which you say (reasonably accurately) that the 364-byte thing is written in machine code. It is small enough to manually translate between the binary and asm.
geocrasher 3/26/2025||
What is the significance of a reproducible build, and how is it different than a normal distribution?
csense 3/26/2025||
Reproducible: If Alice and Bob both download and compile the same source code, Alice's binary is byte-for-byte identical to Bob's binary.

Normal: Before Debian's initiative to handle this problem, most people didn't think hard about all the ways system-specific differences might wind up in binaries. For example: __DATE__ and __TIME__ macros in C, parallel builds finishing in different order, anything that produces a tar file (or zip etc.) usually by default asks the OS for the input files' modification time and puts that into the bytes of the tar file, filesystems may list files in a directory in different order and this may also get preserved in tar/zip files or other places...

Why it's important: With reproducible builds, anyone can check the official binaries of Debian match the source code. This means going forward, any bad actors who want to sneak backdoors or other malware into Debian will have to find a way to put it in the source code, where it will be easier for people to spot.
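Concretely, the check anyone can run looks something like this (names and URLs hypothetical):

    # rebuild from the published source and compare against the official binary
    git clone https://example.org/some-package.git && cd some-package
    ./build.sh                        # the project's pinned, reproducible recipe
    sha256sum out/some-package.deb    # must equal the hash of the official .deb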

sirsinsalot 3/27/2025|||
The important property that anyone can verify the untainted relationship between the binary and the source (provided we do the same for the toolchains, not relying on a blessed binary at any point) is only useful if people actually do verify outside the Debian sphere.

I hope they promote tools to enable easy verification on systems external to Debian build machines.

walrus01 3/26/2025||||
As the 'xz' backdoor was in the source code, and remained there for a while before anyone spotted it, this doesn't necessarily guarantee that backdoors/malware won't make their way into the source of a very-widely-redistributed project.
badsectoracula 3/26/2025|||
Source code availability doesn't mean that backdoors won't be put in place, it just makes them relatively easier to spot and remove. Reproducible builds mean that the people who look for backdoors, malware, etc. can focus on the source code instead of the binaries.
jkaplowitz 3/26/2025||||
Certainly true. But removing some attack vectors still helps security and trustworthiness. These are not all or nothing questions.
jeltz 3/27/2025|||
Only part of the backdoor was in the source code. It was split like that between the tarball and the code to hide it better. But, yes, with reproducible builds they could have put all of it in the source.
floxy 3/27/2025||||
> __DATE__ and __TIME__ macros in C

So how do those work in these Debian reproducible builds? Do they outlaw those directives? Or do they set those based on something other than the current date and time? Or something else?

progval 3/27/2025||
The toolchain (eg. compiler) reads the time from an environment variable if present, instead of the actual time. https://reproducible-builds.org/docs/source-date-epoch/
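A minimal demonstration with GCC, which has honored SOURCE_DATE_EPOCH since around GCC 7:

    cat > stamp.c <<'EOF'
    #include <stdio.h>
    int main(void) { puts(__DATE__ " " __TIME__); return 0; }
    EOF
    SOURCE_DATE_EPOCH=1000000000 gcc -o stamp stamp.c
    ./stamp    # should print "Sep  9 2001 01:46:40" on every rebuild, anywhere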
flkenosad 3/27/2025|||
Thank you for that fantastic explanation.
orblivion 3/27/2025|||
Open source means "you can see the code for what you run". Except... how do you know that your executables were actually built from that code? You either trust your distro, or you build it yourself, which can be a hassle.

Now that the build is reproducible, you don't need to trust your distro alone. It's always exactly the same binary, which means it'll have one correct sha256sum. You can have 10 other trusted entities build the same binary with the same code and publish a signature of that sha256sum, confirming they got the same thing. You can check all ten of those. The likelihood that 10 different entities are colluding to lie to you is a lot lower than just your distro lying to you.
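The publishing side is ordinary detached signing (file names hypothetical):

    # each independent rebuilder signs the hash it obtained
    sha256sum debian-live.iso > SHA256SUMS
    gpg --armor --detach-sign SHA256SUMS     # produces SHA256SUMS.asc
    gpg --verify SHA256SUMS.asc SHA256SUMS   # users check any or all of the 10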

jrockway 3/27/2025|||
Reproducible builds actually solve a lot of problems. (Whether these are real problems, who really knows, but people spend a lot of money to solve them.)

At my last job, some team spent forever making our software build in a special federal government build cluster for federal government customers. (Apparently a requirement for everything now? I didn't go to those meetings.) They couldn't just pull our Docker images from Docker Hub; the container had to be assembled on their infrastructure. Meanwhile, our builds were reproducible and required no external dependencies other than Bazel, so you could git checkout our release branch, "bazel build //oci" and verify that the sha256 of the containers is identical to what's on Docker Hub. No special infrastructure necessary. It even works across architectures and platforms, so while our CI machines were linux / x86_64, you can build on your darwin / aarch64 laptop and get the exact same bytes, every time.

In a world where everything is reproducible, you don't need special computers to do secure builds. You can just build on a bunch of normal computers and verify that they all generate the same bytes. That's neat!

(I'll also note that the government's requirements made no sense. The way the build ended up working was that our CI system built the binaries, and then the binaries were sent to the special cluster, and there a special Dockerfile assembled the binaries into the image that the customers would use. As far as I can tell, this offers no guarantee that the code we said was in the image was in the image, but it checked their checkbox. I don't see that stuff getting any better over the next 4 years, so...)

genpfault 3/26/2025|||
https://en.wikipedia.org/wiki/Reproducible_builds

https://wiki.debian.org/ReproducibleBuilds/About

b112 3/26/2025|||
It means you can build it yourself, and know that the source code you have is all there is.

It validates that publicly available downloads aren't different from what is claimed.

rstuart4133 3/26/2025||
It's a link in a chain that allows you to trust programs you run.

- At the start of the chain, developers write software they claim is secure. But very few people trust the word of just one developer.

- Over time other developers look at the code and also pronounce it secure. Once enough independent developers from different countries and backgrounds do this, people start to believe it really is secure. As a measure of security this isn't perfect, but it is verifiable and measurable in the sense that more is always better, so if you set the bar very high you can be very confident.

- Somebody takes that code, goes through a complex process to produce a binary, releases it, and pronounces it is secure because it is only based on code that you trust, because of the process above. You should not believe this. That somebody could have introduced malicious code and you would never know.

- Therefore, before reproducible builds, your only way to get a binary you knew was built from code you had some level of trust in was to build it yourself. But most people can't do that, so they have to trust Debian, Google, Apple, Microsoft or whoever that no backdoors have been added. Maybe people do place their faith in those companies, but it is misplaced. It's misplaced because countries like Australia have laws that allow them to compel such companies to silently introduce malicious code and distribute it to you. Australia's law is called the "Assistance and Access Bill (2018)". Countries don't introduce such laws for no reason. It's almost certain it is being used now.

- But now the build can be reproducible. That means many developers can obtain the same trusted source code from the source the original builder claimed he used, build the binary themselves, verify it is identical to the original, and so publicly validate the claim. Once enough independent developers from different countries and backgrounds do this, people start to believe it really was built from the trusted sources.

- Ergo reproducible builds allow everyone, as opposed to just software developers, to run binaries they can be very confident were built just from code that has some measurable and verifiable level of trustworthiness.

It's a remarkable achievement for other reasons too. Although the ideas behind reproducible builds are very simple, it turned out executing them was about as simple as other straightforward ideas like "let's put a man on the moon". It seems building something as complex as an entire OS reproducibly was beyond any company, or capitalism/socialism/communism, or a country. It's the product of something we've only seen arise in the last 40 years, open source, and it's been built by a bunch of idealistic volunteers who weren't paid to do it. To wit: it wasn't done by commercial organisations like RedHat or Ubuntu. It was done by Debian. That said, other similar efforts have since arisen, like F-Droid, but they aren't on this scale.

zozbot234 3/26/2025||
Nice, these live images could become the foundation for a Debian-based "immutable OS" workflow.
polynox 3/27/2025|
That is the goal of Vanilla OS! https://vanillaos.org/
moondev 3/26/2025||
Do these live images come ready with cloud-init? A cloud-init in-memory live iso seems perfect for immutable infrastructure "anywhere"
bravetraveler 3/27/2025|
Should be trivial to put in, if not. Install the package and maybe prepare some datasource hints while reproducing the image. Depends on where you'll be using it.

The trick will be in the details, as usual. User data that both does useful work... and plays nicely with immutability.

I suspect it would be more sensible to skip the gymnastics of trying to manicure something inherently resistant, and instead, lean in on reproducibility. Make it as you want it, skip the extra work.

Want another? Great - they're freely reproducible :)
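With live-build (the tool behind Debian's live images), baking that in might look roughly like this (datasource choice hypothetical):

    # add the package to the image build
    echo cloud-init > config/package-lists/cloud-init.list.chroot
    # hint the datasource so boot doesn't stall probing clouds that aren't there
    mkdir -p config/includes.chroot/etc/cloud/cloud.cfg.d
    echo 'datasource_list: [ NoCloud, None ]' \
        > config/includes.chroot/etc/cloud/cloud.cfg.d/90-datasource.cfg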

kragen 3/26/2025||
This is a huge milestone: https://lists.reproducible-builds.org/pipermail/rb-general/2...
Cort3z 3/26/2025||
I’m a noob to this subject. How can a build be non-reproducible? By that, I mean, what part of the build process could return non-deterministic output? Are people putting timestamps into the build and stuff like that?
r3trohack3r 3/26/2025||
File paths, timestamps, unstable ordering of inputs/outputs, locales, version info, variations in the build environment, etc.

This page has a good write-up:

https://reproducible-builds.org/docs/

jcranmer 3/26/2025||
Timestamps, timestamps, absolute paths (i.e., differences between building /src versus /home/Cort3z/source), timestamps, file inode numbering ("for file in directory" defaults to inode order rather than alphabetical order in many languages, and that means it's effectively pseudorandom), more timestamps, using random data in your build process (e.g., embedding a generated private key, or signing something), timestamps, and accidental nondeterminism within the compiler.

By far the most prevalent source of nondeterminism is timestamps, especially since timestamps crop up in file formats you don't expect (e.g., running gzip stuffs a timestamp in its output for who knows what reason). After that, it's the two big filesystem issues (absolute paths and directory iteration nondeterminism), and then it's basically a long tail of individual issues that affect but one or two packages.
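Two of the usual fixes, for reference (GNU tar/gzip flags):

    gzip -n file.txt               # -n: omit the name and timestamp from the header
    tar --sort=name --mtime='@0' --owner=0 --group=0 \
        -cf out.tar dir/           # stable member order and normalized metadata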

yupyupyups 3/26/2025|
This is amazing news. Well done!