Posted by nl 6 hours ago
Why even do a census if you're just going to synthesize random data as the last step?
It must therefore be maximally transparent. Do you want President Trump or Palantir to decide on the "noise infusion" algorithm?
Also, how would anyone know how accurate the "transparent" number is? If Trump or Thiel can fuck with the fuzzing, they can just as easily do so with the base data.
E.g. via some app that instructs respondents to enter a specific answer to a pseudorandomly chosen question.
Of course security would be another question.
Do. The American Community Survey (the randomly selected long-form questionnaire) is dangerously overinvasive.
SELECT a.province, COUNT(DISTINCT b.id_num) FROM registry a INNER JOIN national_id b ON a.nat_id_num = b.id_num WHERE a.timeframe = '2026-01-01' GROUP BY 1
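For anyone who wants to poke at the register-based idea, here is a minimal sketch that runs the query above against an in-memory SQLite database. The table and column names come from the query; the column types and every row are invented for illustration, and `timeframe` is assumed to be stored as an ISO date string. Note that the date is quoted here: left unquoted, `2026-01-01` would most likely be read as integer subtraction rather than as a date.

```python
import sqlite3

# Hypothetical toy schema: names follow the query in the comment above,
# the column types and all rows are made up for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE national_id (id_num INTEGER PRIMARY KEY);
    CREATE TABLE registry (
        nat_id_num INTEGER,
        province   TEXT,
        timeframe  TEXT
    );
    INSERT INTO national_id VALUES (1), (2), (3);
    INSERT INTO registry VALUES
        (1, 'North', '2026-01-01'),
        (2, 'North', '2026-01-01'),
        (2, 'North', '2026-01-01'),  -- duplicate registration, counted once
        (3, 'South', '2026-01-01'),
        (3, 'South', '2025-01-01');  -- older snapshot, filtered out
""")

QUERY = """
    SELECT a.province, COUNT(DISTINCT b.id_num)
    FROM registry a
    INNER JOIN national_id b ON a.nat_id_num = b.id_num
    WHERE a.timeframe = '2026-01-01'
    GROUP BY 1
    ORDER BY 1
"""

# Map each province to its number of distinct registered residents.
counts = dict(conn.execute(QUERY).fetchall())
print(counts)
```

The `COUNT(DISTINCT ...)` is what keeps a person who appears twice in the registry from being counted twice, which is the whole point of joining against a national ID table.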
I was a big fan of differential privacy, but now I think it might be doing more harm than good. I haven't seen a single case where it was applied successfully to a problem where it actually mattered. It also contributed strongly to discrediting and preventing a lot of work on other anonymization techniques, because the research community deemed it the only way to preserve privacy; showing up with enhancements to k-anonymity or any other noise mechanism not rooted in it was a sure way to get ridiculed and ignored. And it's just not a practical mechanism: even when it works for a single disclosure, you always end up having to blow up the privacy budget to a ridiculous amount in order to keep disclosing statistics, as otherwise, for almost all real-world data, you would run out of budget after a few publications.
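To make the budget complaint concrete, here is a rough sketch of the textbook Laplace mechanism for a counting query (sensitivity 1) combined with basic sequential composition, where the epsilons of successive releases simply add up. The class name, the total budget of 1.0 and the per-release epsilon of 0.25 are all made-up assumptions for illustration; real deployments use more refined accounting than this.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noise_std(epsilon):
    # Standard deviation of Laplace noise with scale 1/epsilon.
    return math.sqrt(2.0) / epsilon


class BudgetedCounter:
    """Hypothetical accountant: releases noisy counts until epsilon runs out."""

    def __init__(self, total_epsilon, seed=0):
        self.remaining = total_epsilon
        self.rng = random.Random(seed)

    def release(self, true_count, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        # Basic sequential composition: each release spends its epsilon.
        self.remaining -= epsilon
        # A counting query has sensitivity 1, so the noise scale is 1/epsilon.
        return true_count + laplace_noise(1.0 / epsilon, self.rng)


counter = BudgetedCounter(total_epsilon=1.0)
published = []
exhausted = False
for _ in range(5):  # try to publish the same statistic five times
    try:
        published.append(counter.release(true_count=1000, epsilon=0.25))
    except RuntimeError:
        exhausted = True

print(len(published), "releases, budget left:", counter.remaining)
print("noise std per release:", noise_std(0.25))
```

Under these assumptions the fifth publication is refused. The only ways out are to raise the total epsilon or to spend less per release, and the latter makes each number noisier, which is the trade-off described above.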
So, for me it's a technique that works in the areas where it doesn't really matter (publishing highly aggregated statistics that pose almost zero privacy risk even without differential privacy) and doesn't work in other areas where it would actually matter (publishing fine-grained data about individuals or small groups). There are some niche use cases but in my view the privacy community has really overblown the importance of differential privacy by portraying it as the only way to reliably anonymize data.
BTW the German census bureau has an interesting approach to anonymization which they have used for several decades already, and so far I haven't heard of any cases of successful de-anonymization of the data. Maybe the US bureau should have a look at that for their own needs.
As the article says, any time you want to enforce privacy, the data becomes somewhat less useful; there is just no way around that.
The point of rights is that we have them and that they should not be trampled upon when they become slightly inconvenient to someone in power.
1: https://pmc.ncbi.nlm.nih.gov/articles/PMC8494446/?utm_source...
They weren't prepared for data that was obviously noisy. The data has always been inherently inaccurate, and folks just chose to ignore that previously.
1: https://www.aeaweb.org/articles?id=10.1257%2Fpandp.20191107&...
2: https://www.science.org/doi/10.1126/sciadv.abk3283?utm_sourc...
3: https://www.nationalacademies.org/read/27150/chapter/14