Posted by e_daigle 14 hours ago
Obviously rate limiting is a separate and important issue in API management.
The thing about building secure systems is that there are a lot of edges to cover.
Also, the average Joe is not using a proxy to hide their device's IP address, so they leak their identity to the server anyway. Signal not keeping those logs helps.
Messaging apps cater to different needs; sometimes you only need content-privacy. It's not a secret that you're married to your partner and talk to them daily, but the topics of those conversations aren't public information.
When you need to hide who you are and who you talk to (say, a Russian dissident group, or sexual minorities in fundamentalist countries), you might want to use a Tor-exclusive messaging tool like Cwtch. But that comes with a near-unavoidable limitation: no offline messaging, meaning you'll have to keep a schedule of when to meet online.
Signal's centralized architecture has upsides and downsides, but what ultimately matters is (a) whether you're doing what you can within the architectural limits of the platform (strong privacy-by-design provides more features at the same security level), and (b) whether you're communicating the threat model to users so they can make an informed decision about whether the application fits their threat model.
That Signal did neither of those things implies that privacy was not their objective. Only secure communications was.
It's possible that the reason behind their anti-privacy stance is strategic: discouraging criminal use, which could be used as a vector of attack against them. That doesn't change the fact that Signal is demonstrably anti-privacy by design.
> privacy was not their objective. Only secure communications was.
> Signal is demonstrably anti-privacy by design.
But your second claim is uncharitable and misses Signal's historical context.
The value of a phone number for spam prevention has been mentioned, but that's not the original reason phone numbers were central to Signal. People forget that Signal, like Twitter, was initially designed around using SMS as the transport.
Signal began as an SMS client for Android that transparently applied encryption on top of SMS messages when communicating with other Signal users. They added servers and IP backhaul as it grew. Then it got an iOS app, where third-party SMS clients aren't allowed. The two clients coexisted awkwardly for years, with Signal iOS as a pure modern messenger and Signal Android as a hybrid SMS client. Finally they ripped out SMS support, and still later they added usernames and the ability to communicate without exposing phone numbers to the other party.
You can reasonably disdain still having to expose a phone number to Signal, but calling it "anti-privacy by design" elides the origins of that design. It took a lot of refactoring to get out from under the initial design, just like Twitter in transcending the 140-character limit.
> You can reasonably disdain still having to expose a phone number to Signal, but calling it "anti-privacy by design" elides the origins of that design.
They introduced usernames without removing the requirement for phone numbers.
I rest my case.
> Not a very good case made since you obviously didn’t read the parent discussion.
That isn't an argument. Do you have anything to back up your assertion?
Perfect privacy would mean not sending any messages at all, because you can never prove the message is going to the intended recipient. Any actual system is going to have tradeoffs; calling Signal anti-privacy is not serious, especially when you're suggesting cryptocurrency as a solution.
A ZKP system where you make a public record of your zero-knowledge proof sounds anti-privacy to me. Even if you're using something obfuscated like Monero, it's still public. I see where you're coming from, but I think I would prefer Signal just keep a database of all their users and promise to try and keep it safe rather than rely on something like Monero.
They have exactly that. They rely on TPMs for "privacy" which is not serious.
> Perfect privacy would mean not sending any messages at all
Not sending messages is incompatible with secure messaging which is the subject of the discussion...
> ZKP system where you make a public record of your zero-knowledge proof sounds anti-privacy to me.
A zero-knowledge proof provably contains zero information. Even if you use a type of ZKP vulnerable to a potential CRQC (cryptographically relevant quantum computer), the proof still contains zero information and can never be cracked to reveal anything (a CRQC could forge proofs, however).
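For intuition, here is a toy non-interactive Schnorr proof of knowledge in Python (Fiat-Shamir transform, deliberately tiny parameters). This is an illustrative sketch, not production crypto and not what Signal uses; the point is that the transcript (t, s) convinces a verifier that the prover knows x with y = g^x mod p while carrying no information about x:

    import hashlib, secrets

    p, q, g = 23, 11, 2   # toy group: g has prime order q modulo the safe prime p

    def challenge(y, t):
        # Fiat-Shamir: derive the challenge by hashing the public transcript
        return int(hashlib.sha256(f"{g}:{y}:{t}".encode()).hexdigest(), 16) % q

    def prove(x, y):
        r = secrets.randbelow(q)            # fresh randomness, never reused
        t = pow(g, r, p)                    # commitment
        s = (r + challenge(y, t) * x) % q   # response
        return t, s                         # (t, s) reveals nothing about x

    def verify(y, t, s):
        # check g^s == t * y^c (mod p)
        return pow(g, s, p) == (t * pow(y, challenge(y, t), p)) % p

    x = secrets.randbelow(q)   # prover's secret
    y = pow(g, x, p)           # public statement
    assert verify(y, *prove(x, y))

A quantum attacker who breaks the discrete log could forge such proofs, but the transcript itself still contains nothing to "crack", which is the parent's point.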
> especially when you're suggesting cryptocurrency as a solution
Would you elaborate on why cryptocurrencies are not a solution? Especially when combined with ZKPs to sever the connection between the payment and the account. Combined with ZKPs, they could even accept PayPal donations in exchange for private accounts.
Signal is just much smaller in terms of users so the potential value is lower.
So who's doing the computation? The spammer can't afford to run a 3-second key derivation per spam device? Or how long do you think a normal user will wait while you burn their battery before saying "Screw it, I'll just use WA"? Or is this something the server should be doing?
>Captcha
LLMs are getting quite good at getting around captchas.
>invite-code system
That works on lobste.rs, where everyone can talk together and recruit interesting people to join the public conversation. Try doing that with limited invites while building a useful local network of peers and relatives. "I'm sorry Adam, I'm out of invites. Can you invite my mom's step-cousin? My mom needs to talk to them."
>Signal's architects already knew that when they started designing it.
I think they really did, and they did what the industry had already established as the best practice for a hard problem.
The only reasonable alternatives would have been email with heavy temp-mail hardening, or looking at the opposite end of Zooko's triangle and having long, random, hard-to-enumerate usernames like Cwtch and other Tor-based messengers do. But even that doesn't remove the spam-list problem: any publicly listed address ends up on a list that gets spammed with contact requests or opening messages.
The user's device has to do the computation for it to be effective. How long does it normally take to sign up for a new messaging service like WhatsApp? Five minutes? You should burn the user's phone battery for about half that long: 150 seconds, 50 times longer than you were thinking. Plus another half-minute every time you add a new contact. Times two every time someone blocks you, capped at 150 seconds. Minus one second for each day you've been signed up. Or something like that.
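A toy sketch of that sliding cost schedule (the constants are just the ones suggested above, not anything any real messenger implements):

    def pow_seconds(event, times_blocked=0, account_age_days=0):
        # base cost: 150 s to sign up, 30 s to add a contact
        base = {"signup": 150, "add_contact": 30}[event]
        cost = base * (2 ** times_blocked)   # doubles each time someone blocks you
        cost = min(cost, 150)                # capped at the signup cost
        cost -= account_age_days             # minus one second per day signed up
        return max(cost, 0)

    # a day-old account that has been blocked twice pays nearly the full cap
    print(pow_seconds("add_contact", times_blocked=2, account_age_days=1))  # 119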
The value of signing up for Signal is much higher to a real user than it is to a spammer, so you just have to put the signup cost somewhere in the wide range in between.
LLMs didn't exist when Signal was designed, and Captchas still seem to be getting a lot of use today.
Invite codes worked fine for Gmail, and would work even better for any kind of closed messaging system like Signal; people who don't know any users of a particular messaging system almost never try to use it. The diameter of the world's social graph is maybe ten or twelve, so invite codes can cover the world's social graph with only small, transitory "out of invites" problems.
The "industry" had "established" that they "should" gather as much PII as possible in order to sell ads and get investments from In-Q-Tel.
If you actually do that you're going to crash a lot of cellphones and people will rightly blame your app for being badly coded.
Back in 2004, sure. Today, Gmail asks you for a phone number when signing up because of the spam problem.
Yeah, what could I possibly know about secure messaging.
>Plus another half-minute every time you add a new contact.
Can you point to an instant messaging app that makes you wait 30 seconds before talking to someone? Pretty niche, isn't it?
If you want proper uptake and accessibility for everyone, something like a Samsung A16 has to finish the work in those 150 seconds. A non-amateur spammer throws ten RTX 5090s at unlocking random accounts at 80× parallelism (capped by memory cost) and runs the same iteration count in far less than 150 seconds. 121.5 GFLOPS vs. 10 × 104.8 TFLOPS is a raw performance difference of roughly 8,600×. And each account is then free to spam at a decent pace for a long time before it gets flagged and removed.
The accounts are not generated at five minutes apiece by random sweatshop workers: https://www.youtube.com/watch?v=CHU4kWQY3E8 shows tap actions synced across sixty devices, and that's just to deal with human-facing captchas that need to show human-like randomness. Proof-of-work is not a captcha, so you can automate it. Signal's client is open source for a myriad of reasons, the most pressing of which is verifiable cryptographic implementations. So you can just patch your copy of the source to dump the challenge and forward it to the brute-force rig.
Either the enumeration itself has to be computationally infeasible, or it has to be seriously cost limited (one registration per 5 dollar prepaid SIM or whatever).
>Invite codes worked fine for Gmail
Yeah and back in ~2004 when Hotmail had 2MB of free storage, GMail's 1,000MB of free storage may have also "helped".
Scrypt is memory-hard precisely to defeat attacks like that, which reinforces my belief that you don't know what you're talking about. It doesn't matter how many FLOPS or integer MIPS you have.
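For concreteness, here is a minimal memory-hard registration puzzle sketched with Python's hashlib.scrypt (the parameters and difficulty are illustrative assumptions, not a tuned proposal). With n=2^15 and r=8, each evaluation needs about 32 MiB of RAM (128 * n * r bytes), so a GPU farm is throttled by memory capacity and bandwidth rather than FLOPS:

    import hashlib, os, secrets

    N, R, P = 2**15, 8, 1   # ~32 MiB per evaluation with these values

    def solve(server_nonce, difficulty_bits=8):
        # grind client nonces until the scrypt output clears the target;
        # tune N and difficulty_bits together for the intended solve time
        target = 1 << (256 - difficulty_bits)
        while True:
            nonce = secrets.token_bytes(8)
            digest = hashlib.scrypt(nonce, salt=server_nonce, n=N, r=R, p=P,
                                    maxmem=64 * 1024 * 1024, dklen=32)
            if int.from_bytes(digest, "big") < target:
                return nonce

    def verify(server_nonce, nonce, difficulty_bits=8):
        # the server checks one scrypt call, so verification stays cheap
        digest = hashlib.scrypt(nonce, salt=server_nonce, n=N, r=R, p=P,
                                maxmem=64 * 1024 * 1024, dklen=32)
        return int.from_bytes(digest, "big") < 1 << (256 - difficulty_bits)

    nonce = solve(server_nonce := os.urandom(16))
    assert verify(server_nonce, nonce)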
Waze was also invite-only, and G+ was initially invite-only. Did that model help or hurt them?
G+ didn't have that problem so much, but I don't remember it using invite codes.
But that would still put the CPM of the spam around US$2, which very few spammers can afford. Maybe mesothelioma lawyers and spearphishers.
You don't have to make spamming physically impossible, just unprofitable.
I thought the general belief (e.g., Laurie and Clayton's '“Proof-of-Work” Proves Not to Work') was that proof-of-work isn't very good anti-spam.
> or a Captcha
Aren't bots better at those than humans by now?
And making people do captchas in an instant messenger is a great way to make people not use that instant messenger.
> or an invite-code system like lobste.rs or early Gmail.
That's not a long-term option if you want to make something mainstream.
Bots may be better than humans at Captchas now, although I'm not certain of that, but they certainly weren't when Signal was designed.
I don't see why invite codes would be a problem for mainstream use.
Different system. The parent and GP are talking about proof-of-work being used directly for account creation. If a chat service required mining-levels of PoW (and hence any prospective new users to have an ASIC), it would not be very popular. Nor would it be very popular if it used a relative difficulty system and the spammers used dedicated servers while the legitimate users had to compete using only their phones.
I'm not saying you're wrong, but I have no idea what you're getting at, because the sentence sounds kind of absurd. As a result, I'm not sure if it addresses your point, but just to throw it out there: Bitcoin and anti-spam are different applications of proof of work. Anti-spam has to strike a compromise between being cheap for the user (who is often on relatively low-powered mobile hardware), and yet annoying enough to deter the spammer. It's not unreasonable to believe that such a compromise does not exist.
> Bots may be better than humans at Captchas now, although I'm not certain of that, but they certainly weren't when Signal was designed.
Fair point, but again, even in 2014, an instant messenger with captchas would have much more friction than every other messenger. And captchas aren't just bad because they introduce enough friction to drive away pretty much everybody: they also make users feel like they're being treated as potential criminals.
> I don't see why invite codes would be a problem for mainstream use.
Can you elaborate? Invite codes blocking access to the service itself "like lobste.rs" mean that no one can use your service unless they've been transitively blessed by you. That's obviously going to limit its reach...
I don't think a Captcha for signup would have been much friction. Certainly less than providing a phone number.
Why would someone want to use a closed messaging service like Signal unless they knew an existing user? I don't think that the requirement for that existing user to invite them would be a significant barrier. So I think it's not going to limit its reach.
Groups in messaging apps rarely contain more than 100 users. So invite codes can work well for messaging apps.
I should have deleted my account instead of just removing the app, because it turns out the difference between using Signal and using SMS is obscured on most phones, and when people thought they were texting me, they weren't. I was just out of contact for a long time as people kept sending me the wrong kind of messages. One could argue that protecting contact/identity is not a real goal of E2E encryption, but what I see is a "privacy oriented" service that's clearly way too interested in bootstrapping a user base with network effects, and that shouldn't be trusted.
Those people already had your contact info, probably.
Also, I think there is a setting in Signal to prevent that - and via the OS you can block Signal's access to your contacts, of course.
What leaked was that I was a signal user, and that the person on the other side was a signal user. The security implications are obvious, and by itself, that's already enough to get someone who really needs to care about privacy killed.
> Also, I think there is a setting in Signal to prevent that
False. It happened without my permission as soon as the app was installed, and there was no way to opt out. Maybe they've changed it since then, but the fact remains that they obviously cared more about network effects and user counts than user privacy.
Sigh. There's just no need for this kind of apologism. You could just admit that (a) it's bad behavior, (b) they did it on purpose, and (c) it's not possible to trust someone who does something like this. I'm aware they're a nonprofit, so I don't know why it's like this, but the answer is probably somewhere in the list of donors.
I understand the unease about the notifications, but there are hard tradeoffs between storing as little information as possible, remaining as decentralized as possible, and getting the same benefits as centralized systems like Facebook.
I'm really of the opinion that a messenger similar to Signal but more centralized in the fashion of WhatsApp or even Facebook Messenger should exist, but I also understand why Signal works the way it does.
Yeah, no. The whole "every perspective has some validity" thing doesn't really apply to most safety/security issues. The most charitable thing to say here is that the workflow is completely broken. Less charitable, but also valid, is pointing out that it's actively harmful, and deliberate. I would be really surprised if this had never caused serious consequences, whether a whistleblower was fired, an abused spouse was abused further, or an informant was killed. If you think you've got a "valid perspective" that prioritizes mere user discovery over user safety, then you should not be attempting work that's anywhere close to safety and security, full stop.
Seems like it was working as designed. If you don't want any app to get your contact info, don't share your contact info with anyone, ever; eventually they'll share it with some app.
Now that this crucial adoption feature has been removed, it makes zero sense for Signal to keep relying on phone numbers. With that feature gone, much of Signal's utility was lost anyway, and many in my groups returned to regular SMS. So the system is already compromised from that perspective. At least forks such as Session tried to solve this (too bad Session removed forward secrecy and became useless).
We know from subpoenas that Signal only holds the user's phone number, account creation timestamp, and last login timestamp. That's it.
It’s a compromise meant to propagate the network, and it has a high degree of utility to most users. There are also plenty of apps that are de-facto anonymous and private. Signal is de facto non-anonymous but private, though using a personally identifiable token is not a hard requirement and is trivial to avoid. (A phone number of some kind is needed once for registration only)
While slightly unrelated: I've been thinking about how we could fix this for truly secure, privacy-aware, non-commercial communication platforms like Matrix, by making it impossible to build such a mapping. The core idea is that you should be able to find a user by number only if you are in their contact list; strangers are not welcome. So every user who wishes to be discovered uploads hash(A, B) for every contact: a hash of the user's phone number (A) and the contact's phone number (B), with the pair sorted (swapped if B < A) so both sides compute the same value. Say user A uploads hashes h(A,B) and h(A,C). Now user B wishes to discover contacts and uploads h(A,B) and h(B,D). The server sees the matching hash and lets A and B discover each other without ever learning their numbers. (There's a code sketch after the lists below.)
The advantages:
- since we hash a pair of 9-digit numbers, the hash function's domain is larger and the hashes are harder to reverse (a hash of a single phone number is reversed easily)
- each user can decide who may discover them
Disadvantages:
- a patient attacker can hash A's number against all existing numbers and discover who A's contacts are, essentially extracting anyone's phone book via the discovery API. One protection would be to verify A's phone number before allowing discovery, but a government can probably intercept the SMS codes and pass verification anyway. Then again, the government can also see all phone calls, so it knows who is in whose phone book regardless.
- if the hash is reversed, you get pairs of phone numbers instead of just one number
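A minimal sketch of the pair-hash matching described above (sorting the pair makes both sides compute the identical hash; the ids and numbers are just for illustration):

    import hashlib
    from collections import defaultdict

    def pair_hash(a: str, b: str) -> str:
        lo, hi = sorted((a, b))          # canonical order, so h(A,B) == h(B,A)
        return hashlib.sha256(f"{lo}|{hi}".encode()).hexdigest()

    uploads = defaultdict(set)           # server side: pair hash -> uploader ids

    def upload(user_id, hashes):
        mutual = []
        for h in hashes:
            uploads[h].add(user_id)
            if len(uploads[h]) == 2:     # both parties uploaded it: mutual contacts
                mutual.append(tuple(sorted(uploads[h])))
        return mutual

    upload("A", [pair_hash("555000111", "555000222")])         # A lists B
    print(upload("B", [pair_hash("555000222", "555000111")]))  # B lists A -> match

On the domain-size point: a single 9-digit number is only ~2^30 possibilities, trivially swept, while an arbitrary pair is ~2^60. But as the first disadvantage notes, an attacker targeting a known A only needs to sweep the ~2^30 pairs containing A.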
Meanwhile, Matrix for now does support hashed contact lookup, although few clients implement it given the privacy considerations at https://spec.matrix.org/unstable/identity-service-api/#secur...
Especially the ability to run my own server will be priceless when something like Chat Control eventually makes it through. Signal can only comply or leave, but they'll never manage to kill all the Matrix servers out there.
The data Signal has is: 1) registration time for a given phone number, 2) knowledge of daily login (24hr resolution). That's it. That's the metadata.
They do not have information on who is communicating with whom, when messages are sent, whether messages are sent, how many, their size, or any of that. Importantly, they do not have an identity (your name) associated with the account, nor does one show for contacts (not even the phone number needs to be shared).
Signal is designed to be safe from Signal itself.
Yes, it sucks that there is a phone number connected to the account, but you can probably understand why authorities don't frequently send Signal data requests: the information isn't very useful. So even if you have a phone number associated with a government ID (not required in America), they can really only show that you have an account and potentially that the account is active.
Like the sibling comment says, there's always a trade-off. You can't have a system with no metadata, but you can have one that minimizes it. Signal needs to balance usability and minimize bots while maximizing privacy and security. Phone numbers are a barrier to entry for bots, preventing unlimited or trivial account generation. That has downsides but upsides too. One big upside is that if Signal gets compromised, there can be no reconstruction of the chat history or metadata. IMO, it's a good enough solution for 99.9% of people. If you need privacy and security against nation-state actors who are directly targeting you, then it's maybe not the best solution (at least not out of the box), but otherwise I can't see a situation where it is a problem.
FWIW, Signal does look to be moving away from phone numbers; they have usernames now. I'd expect it to take time to get away completely, considering they're a small team and need to migrate from the existing infrastructure. It's definitely not an easy task (and I think people frequently underestimate the difficulty of security, as quoted in the article lol. And as the op suggests: it's all edge cases).
What's wrong with account generation? Nothing. The problem is if those accounts start sending spam to random people. So we can make registration, or adding contacts, paid (in cryptocurrency), and the problem is gone.
The majority of the user base would be gone, too.
I had a hard enough time convincing my friend group to use Signal as is. If they had to pay (especially if it had to be via cryptocurrency) none of them would have ever even considered it.
Most people would not, though, and that's the issue.
> What's wrong with account generation?
Your comment *literally* explains one issue... What's right with it? Bulk-generated accounts (i.e. many inauthentic accounts controlled by a few people) are always used to send spam; there are no exceptions. The perpetrators should be in prison.
> Does Signal protect from the scheme when the government sends discovery requests for all existing phone numbers (< 1B) and gets a full mapping between user id and phone number?
Signal does have the phone numbers, as you say. Can they connect a number to a username?
> That doesn't answer the GP question:
It does. They asked:
>>> Does Signal protect from the scheme when the government sends discovery requests for all existing phone numbers (< 1B) and gets a full mapping between user id and phone number?
Which, yes, this does protect against. There is no mapping between a user id and a phone number. Go look at the reports: they only show that the phone number has a registered account, but they do not show what the user id is. Signal doesn't have that information to give.

> Can they connect a number to a username?
From Signal: "Usernames in Signal are protected using a custom Ristretto 25519 hashing algorithm and zero-knowledge proofs. Signal can't easily see or produce the username if given the phone number of a Signal account. Note that if provided with the plaintext of a username known to be in use, Signal can connect that username to the Signal account that the username is currently associated with. However, once a username has been changed or deleted, it can no longer be associated with a Signal account."
This is in the details on [0], right above the section "Set it, share it, change it". So Signal cannot use phone numbers to identify usernames, BUT Signal can use usernames to identify phone numbers IF AND ONLY IF that username is in active use. (Note that the username is not the Signal ID.)
If you are worried about this issue, I'd either disable usernames or continually rotate them. If the username is not connected to your account at the time the request is made, then no connection can be made by Signal. So this is pretty easy to thwart, though I wish Signal included a way to automate it (perhaps Molly has a way, or someone can add one?). Rotating either after every use or on a timer would almost guarantee this works, given that it takes time to get a search warrant and time for Signal to process it. You can see from the BigBrother link that Signal is not very quick to respond...
(I would be thrilled to learn that this changed, but it has been in place for many years and it's kinda hard to personally test)
Discoverability does default to "on", but there is an opportunity to disable it during registration, which prevents those notifications.
SimpleX was a decent option, but they're going down the crypto rabbit hole, and their project lead is... not someone who should be trusted by anyone in the crosshairs right now.
It's not hard to do, so if they're having difficulty with that, what other simple things are they having difficulty with? Why would anyone hinge their safety and well-being on the whims of such a person?
I say this as a person who bought into the initial concept, and who has used it myself.
The CEO vanished from the discussion (again) so my proposals to improve ease of use of Tor never reached them. You can catch up on the discussion at https://discuss.privacyguides.net/t/simplex-vs-cwtch-who-is-...
I liked the SimpleX concept, but I'd prefer its relay servers to be replaced by the Tor or I2P network.
And if it used the Signal protocol instead of an NIH one.
Actually, the only unique SimpleX feature I really like is that it uses separate ids for every connection and group.
Signal mostly.
>separate ids for every connection and group
The thing is, Akamai and Runonflux are the two companies hosting the entire public SimpleX infrastructure. If you're not using Tor and SimpleX onion services with your buddies, these two companies can perform end-to-end correlation attacks to spy on which IPs are conversing, and telcos know which IPs belong to which customers at any given time. Mandatory data-retention laws for assigned IPs aren't rare.
As long as IP leaks are possible, I'd rather also use Signal, where at least the rest is battle tested and state of the art.
My concern with Signal is they'll either comply or move out of the EU with the incoming Chat Control, and I'd rather have a fully decentralized messenger with as few leaks as possible.
Phreeli [https://www.phreeli.com/] allows you to get a cell number with just a ZIP code. They use zero-knowledge proofs (ZKPs) for payment tracking.
My Signal number is a Google Voice number that has nothing to do with any mobile phone. The Google account has advanced protection turned on so you can’t port it or get the SMSes without a hardware login token.
Do you have further reading on this?
To cut it short: they use Intel SGX to create a "trusted environment" (trusted by the app/user) in which they run contact discovery.
In that trusted environment they run algorithms similar to other messengers' (i.e. you still need to rate-limit queries, as it's possible to iterate over _all_ phone numbers that exist).
If working as intended, this is better than what the alternatives provide, as it protects phone numbers not just from third parties but also from the data center operator and, to some degree, even from Signal itself.
But it's not perfect. There are side-channel attacks against Intel SGX, and Signal could most likely sneak in ways to access things by changing the code; sure, people might notice, but it's still viable.
In the end, what matters is driving up the cost of attacks to a point where they aren't worth it (either not worth it in general, or because there are easier attack vectors, e.g. against your phone, which also give them what they want). Either way, it should not be usable for systematic mass surveillance of everyone, or even of subgroups like politicians and journalists.
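To make the rate-limiting point concrete, here is a plain-Python stand-in for the logic that would run inside the enclave (the cap and names are hypothetical; SGX itself obviously isn't shown). It intersects a client's contact list with the registered set while capping how much of the number space one client can sweep:

    import time
    from collections import defaultdict

    registered = {"15550001111", "15550003333"}   # hypothetical registered numbers
    history = defaultdict(list)                   # client id -> lookup timestamps

    DAILY_CAP = 50_000   # illustrative: far below the full phone-number space

    def discover(client_id, contacts):
        now = time.time()
        recent = [t for t in history[client_id] if now - t < 86_400]
        if len(recent) + len(contacts) > DAILY_CAP:
            raise PermissionError("rate limit exceeded")
        history[client_id] = recent + [now] * len(contacts)
        return registered & set(contacts)   # only the intersection leaves the enclave

    print(discover("alice", ["15550001111", "15550009999"]))  # {'15550001111'}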
Signal provides content-privacy by design with E2EE. Signal provides metadata-privacy by policy, i.e. they choose not to collect data or mine information from it. If you need metadata-privacy by design, you're better off with purpose-built tools like Cwtch, Ricochet Refresh, OnionShare, or perhaps Briar.
For this to actually protect you, though, users would have to:
- verify the attestation
- make sure it means the code they have published is the attested code
- make sure the published code does what it should
- and catch any divergence from this *fast enough* to not cause much damage
It's without question better than doing nothing, but it's fundamentally not a perfect solution. Then again, it's very unclear whether a perfect solution even exists; given the characteristics of phone numbers, I'd guess there isn't one.
This might be the fault of an opt-out serialization library (by default it serializes the whole object, and you have to manually opt fields out). A programmer adds a field, forgets the opt-out annotation, and voilà.
Or they're just using plain JS objects on the server and forgot to remove the key before using the object in a response.
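Both failure modes look roughly like this sketch (field and function names are hypothetical): the whole record gets serialized, so the server-side secret rides along in the response unless someone remembers to exclude it:

    import json

    def request_verification(phone):
        record = {
            "phone": phone,
            "status": "pending",
            "verification_code": "482913",  # server-side secret, sent out via SMS
        }
        # BUG: opt-out serialization of the whole record leaks the code
        return json.dumps(record)

    def request_verification_fixed(phone):
        # opt-in: build the response explicitly, so new fields never leak by default
        return json.dumps({"phone": phone, "status": "pending"})

    print(request_verification("15550001111"))        # the code is in the response
    print(request_verification_fixed("15550001111"))  # it isn't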
> The vulnerability they’re talking about was presented in a paper by researchers at the University of Vienna.
This vulnerability (mapping phone numbers to user ids via the rendezvous API) is old: it was exploited in Telegram in 2016 [1], allowing the Iranian government to build a phone book of 15M Telegram users. The paper also mentions that the vulnerability was known in 2012 and is still not fixed.
In a previous job, on my first audit of the code, I spotted such vulnerabilities pretty much everywhere.
Developers simply need to stop using these libraries.
There should never be a need to return a PIN to the client. You've already texted/emailed it to them. They are going to send it back to you. You check it against your temporary storage, verify or reject, and delete it immediately after.
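A minimal sketch of that flow (helper names are hypothetical): the PIN only ever travels out of band, and it is deleted the moment it's checked:

    import secrets, time

    pending = {}   # phone -> (pin, expiry); temporary server-side storage

    def start_verification(phone, send_sms):
        pin = f"{secrets.randbelow(10**6):06d}"
        pending[phone] = (pin, time.time() + 300)   # valid for five minutes
        send_sms(phone, pin)                        # out of band only
        return {"status": "pin_sent"}               # response carries no secret

    def check_verification(phone, submitted):
        pin, expiry = pending.pop(phone, (None, 0.0))   # delete immediately
        return (pin is not None and time.time() < expiry
                and secrets.compare_digest(pin, submitted))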
I keep seeing people try to explain away incompetence by blaming unaccountable things, i.e. the tool or the system. Exposed password? Must be the library; people really should stop using it. No, the library is not wrong; people should be better developers.
A peer-reviewed paper is full of AI slop? Must not be the reviewers' fault; the citations were there, they were just fake. What is going on?
"Yeah but it wasn't in the docker tutorial I skimmed so I have no idea what it means."
They are likely a bit of both, increasingly more so going forward.
- some checks are straightforward and it would be dumb to use AI for them
- some checks require AI
Lots of things are really simple. But you have to know about them first.
but after having seen people IRL accidentally overlook very basic things, I've thought for a few years now that using them is essential, even though they often suck (1).
(1): e.g. false positives, wrong severity classifications, wrong reasoning about why something is a problem, and generally not doing anything application-specific.
I mean, who would be so dumb as to accidentally expose some RCE-prone internal testing helper, only used for local integration tests, on their local network? (Turns out: anyone who uses docker/docker-compose with a port mapping that doesn't explicitly specify the interface, i.e. anyone following 99% of docker tutorials...) Or: there's no way you'd forget to set content security policies; I mean, it's a ticket in the initial project setup, or already done in the project template (but then a careless git conflict resolution removed them). Etc.
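For anyone who hasn't hit that docker footgun: a host port with no interface binds to 0.0.0.0, i.e. every interface. A docker-compose sketch (service name hypothetical):

    services:
      test-helper:
        image: internal/test-helper:dev
        ports:
          - "8080:8080"              # implicit 0.0.0.0: reachable from the LAN
          # - "127.0.0.1:8080:8080"  # loopback only: what you probably wanted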
Never heard of the wrench technique? It's always gonna work out great. Way cheaper and easier than "wizardy" too.
Ultimately, you're just buying time, generating tamper evidence in the moment, and putting a price-tag on what it takes to break in. There's no "perfectly secure", only "good enough" to the tune of "too much trouble to bother for X payout."
It may not be pertinent to the subject, but clearly I have found a kindred spirit in this author.
I'd only have 20 cents, which I guess is good. But I'm sure there's more I'm forgetting.
Related:
[1] https://news.ycombinator.com/item?id=44684373
I asked because both political parties have chapters at the national, regional, state, and local levels, so "GOP job board" on its face didn't make clear which organization was running it. Some parties cover rural counties of just a few thousand people.
The real lesson: assume every service will eventually leak something. Use unique passwords everywhere, enable 2FA, and rotate credentials after breaches.
The tedious part is the rotation. I've seen people skip it because manually changing 50+ passwords is brutal. Automation helps, but it needs to be done securely (local-only, zero-knowledge).
We consistently have data breaches in institutions we trust; it's converging to the point where it's literally just a data-harvesting op and everybody stops caring. People won't even bother joining class-action lawsuits anymore, because the awards enrich the lawyers while everybody else gets their twenty bucks in the mail after handing even more personal data to the law firm. It's like a loophole.
We now have legalized insider trading in the form of "prediction markets", and legalized money laundering and pump-and-dumps through crypto; all of these always lead to failures for the participants, disguised as wins.
Have they?