Posted by Bogdanp 8/30/2025
At one point in the late 1980s, Microsoft had a GREATER than 100% market share of the Macintosh spreadsheet market.
How is this possible?
Market share (for a given period) is the participant's sales in the market divided by total sales. It just so happened that Lotus had more returns than sales of their failed spreadsheet, Lotus Jazz. So Lotus had a negative market share, and Microsoft had more sales of Excel than total sales in the market, resulting in a greater than 100% market share.
I don't remember the exact numbers and I believe there was at least one other competitor in the study. But let's just say the numbers were:
Microsoft: 102% Lotus: -2%
In that case the Herfindahl–Hirschman Index would be 102^2 + (-2)^2 = 10404 + 4 = 10408.
So, in this pathological case it is possible for the HHI to exceed 10,000.
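The arithmetic above is easy to check with a short sketch. Note the 102/-2 split is the hypothetical from the comment, not the real figures:

```python
def hhi(shares):
    """Herfindahl-Hirschman Index: the sum of squared market shares.
    With shares expressed as percentages, a pure monopoly scores 10,000."""
    return sum(s ** 2 for s in shares)

# Hypothetical numbers from the anecdote: Lotus's returns exceeded its
# sales, giving it a negative share and pushing Microsoft above 100%.
print(hhi([102, -2]))  # 10408 -- above the nominal 10,000 ceiling
print(hhi([100]))      # 10000 -- a normal monopoly
```

With only non-negative shares summing to 100, the maximum is 10,000; a negative share forces some other participant above 100%, and squaring makes both terms positive, which is exactly how the pathological case exceeds the ceiling.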
Edited: Added (for a given period) above, for clarity.
I did, however, find this humorous anecdote:
> A Lotus executive later joked, "The first month we shipped 62,000 copies, and the following month we got 64,000 copies back. It was such a failure they sent us the bootlegged copies back."
That's not a thing. Market share has no time period. It's an instantaneous measure of the state of things right now: how many of your company's units are out in the wild right now being used, vs. how many total units (sold by any company) are out there in the wild right now being used.
That number can and will be different at different points of time, as people buy and return your products, and buy and return your competitors' products[0]. You can certainly say that your market share increased by 300% or by -40% over a given period of time, but your actual market share is always a number between 0% and 100%.
> Market share (for a given period) is the participant's sales in the market divided by total sales.
No, that's a company's share of sales as compared to the industry/product category as a whole. Not market share.
[0] You also should take into account people who throw your product in the trash (or for software, delete it) without returning it. Depending on context, you might even want to take into account people who put your product in a box in their basement and never use it again. Assuming you could actually divine those numbers, which of course you likely can't.
https://en.wikipedia.org/wiki/Market_share
There are multiple methods of measuring multiple (related) things. What you are describing sounds more like the share of the installed base, which only works for certain types of products. (i.e. it doesn't work for consumables like apples or electricity)
The squared sum of normalized shares proves to be very useful in a lot of contexts -- not just market share. Voting is one great example.
Exactly. That's the way accounting works. They did not know in the previous quarter that the product would be returned in the following quarter, so they end up having negative sales in the current quarter.
Yes it produces "garbage output", which I find amusing.
You should stop, reflect on this fact for a moment, then go pick up a goddamn book
Sure. As long as we keep in mind that "return" doesn't mean "reverted sale", but "reverted shipment to retail".
Despite the smaller total numbers in Mastodon, it's great to see that the ecosystem seems to be successfully avoiding centralization like we've seen in the AT-Proto ecosystem.
I suspect that the cost of running AT-Proto servers/relays is prohibitive for smaller players compared to a Mastodon server selectively syndicating with a few peers, but I say this with only a vague understanding of the internals of both of these ecosystems.
The expensive things in ATProto are the Relay (crawls/listens to PDSs to produce the firehose) and the AppView (keeps a DB of all posts/likes/etc to serve users' requests). Expensive at scale anyway; if you want your own small network for hosting non-Bluesky posts (like WhiteWind's longer character limit), the event volume will be manageable.
For a lot of stuff though ATProto is built in a way that you shouldn't have to host your own; you can implement your own algorithmic feed that reads from the Bluesky Relay's firehose, or your own frontend that still gets data from the Bluesky AppView.
ATProto isn't "built this way".
Twitter was also built in a way where you could implement your own stuff - and then Twitter took that away.
With Mastodon, there is one large instance (controlled by the non-profit Mastodon gGmbH). If they tried closing themselves off, their users would be losing access to the majority of people in the network. Plus, while non-profits aren't perfect, they don't have VC investors to answer to.
Bluesky could decide to stop publishing the firehose or restrict its APIs - just as Twitter did. Given that they control 99.55% of the network, they can close it off without worrying about their users losing access to anything. And Bluesky is a for-profit company that has raised around $30M in VC.
What you talk about isn't a feature of ATProto. It's a feature of being centralized and having a company willing to let you use their servers for free (at least for now). This was the case with Twitter for a long time. You could read the Twitter firehose and build your own apps and frontends getting data from the Twitter APIs - just as you can do from Bluesky today.
But unless there's a reason why shutting off the firehose/APIs would be bad for Bluesky, they can do that at any time. It might anger some users (as Reddit and Twitter both did), but they control the network, and network effects are powerful. Most Bluesky users would continue using it because they aren't there for some open protocol. They're on Bluesky because Twitter became a nazi bar. Until we see real decentralization with ATProto, it's just a centralized network like Twitter or Reddit which hasn't shut off its firehose and public API yet. Hopefully that won't happen, but it certainly could.
ATProto is built this way. "Closing off bluesky" would be an extraordinarily non-trivial process and would break basically everything. This is in large part why private data isn't a thing yet. The architecture is diametrically opposed to it.
> Bluesky could decide to stop publishing the firehose or restrict its APIs - just as Twitter did.
They could "technically" completely rearchitect their application frontend and backend to do this, but any effort to do so would be visible from miles away given the architecture, and warning klaxons would start ringing immediately.
> And Bluesky is a for-profit company that has raised around $30M in VC.
Bluesky PBLLC is a for-profit "public benefit corporation" for what it's worth and the way they are structured would open them up to legal consequences if they were to move diametrically opposed to the mission they were founded on. Not just because they are a PBLLC but because their initial investment funding was drawn up under an explicit contract that views moves against "the development of decentralised social media" as a violation of the terms of that contract.
I understand that Bluesky's conformance to ATProto is just a promise, but it's a better promise than you get from most websites. Also in the meantime, if you migrate to a self-hosted PDS, you can ensure that even if Bluesky restricts access to their Relay's firehose, 3rd party Relay servers can still pick up your posts and publish their own unrestricted firehose.
And what if, before they did that, they updated the PDS code so it blocked all relays except for their one?
I'm not asking what you would do. I'm asking what would happen because of what everyone does. I think the name "Bluesky" would refer to the fully centralized bsky.app, and 99.9% of users would never notice a difference. Users who had other PDSes would either quit (nobody noticing their departure) or sign up to bsky.app like everyone else. The events of Twitter show it's probably the latter - people bent over backwards to comply with Musk to keep their accounts.
That's not how it works. Appviews pick the relay they use, not the client/user. The relay is used for gossip into the appview (and other things).
More importantly, appviews never see the client/user directly. Appviews only talk to the PDS. Really most things other than the client ever only talk to the PDS or listen to the relay. The only thing that ever directly talks to the client is the PDS.
The way atproto services generally work is the client configures a series of XRPC requests with HTTP headers to determine what appview, labelers, etc to use and it issues that request to the PDS. The PDS then proxies that request to the appview or wherever and they respond back to the PDS which routes the response back to you.
So in a real sense your PDS is not just a data host, but also operates akin to an IRC bouncer.
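The "PDS as bouncer" flow described above can be sketched as a toy proxy. The class and method names here are illustrative stand-ins, not the real XRPC implementation (the real protocol uses an HTTP header to name the target service):

```python
class AppView:
    """Toy appview backend: answers queries over aggregated network data."""
    def handle(self, method, params):
        return {"method": method, "params": params, "served_by": "appview"}

class ToyPDS:
    """Toy PDS acting as the client's single point of contact: the client
    names a service, and the PDS forwards the request and routes the
    response back -- the backend never sees the client directly."""
    def __init__(self, services):
        self.services = services  # service id -> backend

    def xrpc(self, method, params, proxy_to):
        # Hypothetical stand-in for proxied XRPC routing.
        backend = self.services[proxy_to]
        return backend.handle(method, params)

pds = ToyPDS({"bsky-appview": AppView()})
resp = pds.xrpc("app.bsky.feed.getTimeline", {"limit": 10},
                proxy_to="bsky-appview")
print(resp["served_by"])  # appview -- reached only via the PDS
```

The design choice this illustrates: because every request funnels through the user's own PDS, swapping the appview (or labeler) is a client-side routing decision, not a migration.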
-----
> And what if, before they did that, they updated the PDS code so it blocked all relays except for their one?
PDS relay routing, etc. is mostly all handled manually via config files, so this isn't really a concern. And PDS code is probably the "easiest" part of the ecosystem to hack on, which is why there are like 6 different implementations, with the majority (like 4) maintaining near feature parity with the "bluesky PDS" software.
And importantly, the bluesky PDS is literally a sqlite DB, an OAuth implementation, some Go IPLD data structure manipulation code, and a Go XRPC router. It's fairly trivial to hack on as needed.
------
> I'm not asking what you would do. I'm asking what would happen because of what everyone does. I think the name "Bluesky" would refer to the fully centralized bsky.app [...]
Migration currently isn't perfect, but within ~6 months it should be ironed out by the community, at which point migrating from one PDS to another is just a matter of:
1. click button on new PDS to transfer/"create new account".
2. set your new email, password, and list your old/current handle.
3. get auth code via email (one from the new PDS and one from your DID provider)
4. input codes into migrator interface (for whichever migrator you are using)
5. log into your apps again.
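The steps above (minus the email/auth-code handshake) can be sketched as an in-memory simulation. Every name here is a toy stand-in, not the real XRPC API; the point is the shape of the flow, in which the client shuttles the data and the two PDSs never talk to each other:

```python
class PDS:
    """Toy personal data server: stores each user's repo keyed by DID."""
    def __init__(self):
        self.repos = {}    # did -> list of records
        self.active = {}   # did -> bool

    def export_repo(self, did):
        # In the real protocol this would be a CAR file + blobs.
        return list(self.repos.get(did, []))

    def import_repo(self, did, records):
        self.repos[did] = list(records)
        self.active[did] = True

    def deactivate(self, did):
        self.active[did] = False

class DIDDirectory:
    """Toy stand-in for the PLC directory: maps DID -> current PDS."""
    def __init__(self):
        self.pds_for = {}

def migrate(did, old_pds, new_pds, directory):
    # Steps 1-2: create the account on the new PDS and copy the data over.
    backup = old_pds.export_repo(did)
    new_pds.import_repo(did, backup)
    # Steps 3-4 (auth codes elided): repoint the DID at the new host.
    directory.pds_for[did] = new_pds
    # Final step: tell the old PDS you no longer live there.
    old_pds.deactivate(did)

old, new, plc = PDS(), PDS(), DIDDirectory()
old.import_repo("did:plc:alice", ["post-1", "post-2"])
migrate("did:plc:alice", old, new, plc)
print(new.export_repo("did:plc:alice"))  # ['post-1', 'post-2']
```

Because identity lives in the DID directory rather than on either PDS, followers resolve the DID and find the new host without doing anything.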
There are multiple large PDS operators working really, really hard to spin up operations (proper backups, failover, HA, etc.) so they can run reliably and avoid the "my mastodon instance imploded, guess everything is gone" issue. Open federation is only about a year old (plus change), so the community is only just now reaching the "mature third parties" stage.
Migration would be impossible because the bsky.app PDS wouldn't allow anyone to access the data except for the bsky.app relay.
Other appviews wouldn't display bsky.app data because both the PDS and relay would block them.
Secondly, while you're right that most people don't have copies of their repos, some copies of the entire network exist in the community, as well as copies of the PLC Directory. Within a couple weeks we would have community run versions of the AppView, PDSs, Relays, and PLC, and some seriously pissed off community members who would now want to do everything they can to take up the mantle temporarily. Soon Bluesky the app, as well as Instagram and Twitter, would be flooded with tutorials of how to recover your accounts and migrate to another PDS.
If your point is that we'd lose at least half of the users of Bluesky, then yea probably. But ATproto would be just fine, and if people want to get their data back they likely can, as long as they can stomach a little work. And that process is getting easier all the time.
"Convince a HN user that a corporation with the ability to cut you off from your feed sources for more profit won't do exactly that and just because they use something decentralised-ish today doesn't mean they have to keep using that thing if they don't want to" challenge (difficulty: impossible)
It's not just a technical restriction though. It's also a legal restriction. Bluesky PBLLC is a public benefit corporation and they are at least in moderate part beholden to their charter.
More importantly, their initial investment contract requires them to further the decentralisation of social media and exposes them to legal consequences should they deviate from that mission.
Those combined make it effectively impossible for them to lock in bluesky. Doing so would require technical changes that would be in no uncertain terms against the charter of the company and against the terms of the investment contracts that initially funded the company. It's a poison pill that would kill the company and destroy any "shareholder value" the moment they try to lock down the service or lock in the users.
You aren't going to see them try this type of brazen lock-in because it'd be explicitly harmful to every investor/VC and saddle the company in legal hell until it smoulders into ash.
Bluesky/atproto is still very early/very young but the general theme is to focus on the UX and ergonomics to make decentralisation viable without forcing the user to be technical enough to care about it.
i.e. The end goal is that people should be able to say "fuck bluesky", click a button, wait for a loading bar to finish, and continue using bluesky without using bluesky PBLLC stuff.
To be clear we are not at this point yet and bluesky PBLLC is not suddenly hostile yet either. Triage is in order of priorities so some more important UX aspects are being handled first.
My personal prediction is that this should be a solved problem within less than a year but of course we'll see.
> or if you get banned from the PLC directory
This is technically true (for did:plc users) but this isn't a thing that's happened in the past and in the event it happens in the future (or even is seriously discussed in the future), did:plc will be in a substantially different situation than it is currently and the community should be situated by then to handle a federated or coordinated PLC directory.
1. https://www.da.vidbuchanan.co.uk/blog/adversarial-pds-migrat...
Yes, that's as simple as an IP address check, since they own all of it. Could alternatively be a password (bearer token).
> And Bluesky wouldn't likely stop existing altogether,
In fact, the change would be noticed by almost nobody, because 99.9% of users would still see 99.9% of their following (including 100% of the ones who populate the default feed) and would have to go far out of their way to see any change.
> There would be time when we realize that's happening to get another AppView up and running. There are arguably enough resources in either the ATproto dev community or other funding sources that another AppView could pop up within a week or two, and be maintained by the community for months without issue. And Bluesky wouldn't likely stop existing altogether, so people would log into Bluesky, see the news (or hear it elsewhere), and see people talking about how to get access to your account.
All irrelevant since it wouldn't have the content and people go there for the content. Views go where the content is.
> some copies of the entire network exist in the community
All irrelevant since it wouldn't have the new content posted after the block and people go there for the new content. New views go where the new content is.
> and some seriously pissed off community members who would now want to do everything they can to take up the mantle temporarily
Yes, exactly like the people who were pissed off at Twitter so they started Mastodon. Made literally no difference to Twitter. Most of those people even remained on Twitter and posted as much on Twitter as on Mastodon. Because views go where the content is. Thinking anything else is wilful delusion.
> Soon Bluesky the app, as well as Instagram and Twitter, would be flooded with tutorials of how to recover your accounts
Why recover? If you're part of the 99.9%, nothing changed for you.
> and migrate to another PDS.
I have no doubt there would be a flood of tutorials of how to make sure that only 0.1% of former Bluesky users will ever see your tweets, but nobody will follow that tutorial because they don't want to make sure that only 0.1% of former Bluesky users will ever see their tweets.
> If your point is that we'd lose at least half of the users of Bluesky
0.1% would be lost.
> But ATproto would be just fine
but completely useless, since its main use case is getting Bluesky feeds and that would be switched off
> and if people want to get their data back they likely can
This is nobody's goal. People log into Bluesky to see tweets from the people and topics they follow, not to "get their data".
> So, you're making stuff up that obviously has no basis in reality here.
I cannot understand why you are claiming this. I'm basing this on the actual architecture and the way the parts interact. The design just isn't feasible to lock down. Doing so completely breaks the model, and it still leaks like a sieve if you try.
----------
> Migration would be impossible because the bsky.app PDS wouldn't allow anyone to access the data except for the bsky.app relay.
Nope. Migration is still fully possible. Migration doesn't happen via the relay or any PDS->PDS mechanism. Migration is done via the client. The client/user runs operations on the source PDS, the destination PDS, and the DID registry. All the data is transported between by the client.
Specifically, the way it works is you export/backup your information from your current PDS (in the form of a CAR file + blobs). Technically this step is optional: even if the PDS goes offline or becomes hostile, you can largely reconstruct this data from the network. Then you "create a new account" on the new PDS and upload the data you backed up/recovered onto the new PDS. Then you update your DID to point to the new PDS. And finally you deactivate the account on the original PDS (basically saying "I no longer store stuff here").
This is part of the reason why migration tooling is a bit bumpy. Your JS script or app has to do the entire process by itself rather than letting the backends handle it. However it does make them extraordinarily resistant to data loss and/or takeover.
----------
> other appviews wouldn't display bsky.app data because both the PDS and relay would block them.
Relays work via gossip. If you can see the relay at any point, you can gossip 100% of their contents to another relay.
In the event bluesky PBLLC locked down their appview and PDS, they'd still have to make the relay open or everything breaks. Feed providers need access to the firehose. Labelers/Moderation Services need access to the firehose. And so on.
Everything is built with an assumption of a public firehose and if you lock down the firehose, all you need is one person to listen to the locked down firehose to 100% replicate it and gossip onto any other relay.
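The "one listener is enough" point above reduces to a tiny loop: any subscriber with access to a firehose can re-emit every event onto an open relay. A minimal sketch with toy classes (the real firehose is a streaming protocol, not a list, but the replication logic is the same):

```python
class Relay:
    """Toy relay: an append-only event log that subscribers can read."""
    def __init__(self):
        self.events = []

    def publish(self, event):
        self.events.append(event)

    def stream(self, cursor=0):
        # Yield everything after the cursor, like a firehose backfill.
        return self.events[cursor:]

def gossip(source, mirror, cursor=0):
    """A single authorized listener replicates a locked-down firehose
    onto an open relay; everyone else just reads the mirror."""
    for event in source.stream(cursor):
        mirror.publish(event)
    return len(source.events)  # new cursor for the next round

locked = Relay()  # hypothetical locked-down firehose
for e in ["post:a", "like:b", "post:c"]:
    locked.publish(e)

open_relay = Relay()
gossip(locked, open_relay)
print(open_relay.events)  # ['post:a', 'like:b', 'post:c']
```

Since every event in the stream is signed by the author's key, the mirror's copy is just as verifiable as the original, which is why a single leak fully defeats the lockdown.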
$30/mo is $360/yr, which for most people is a prohibitively large sum of money. That would make Bluesky access more expensive than even the most expensive Netflix subscription; closer to the cost of a cellular plan.
For comparison: for my Mastodon account I pay $5/mo or $60/yr to a dedicated hosting provider. This puts it in the same ballpark as paying for a private email host or a VPN subscription.
It doesn't meaningfully make you "more independent" because all Relays are trivial (they're just dumb re-broadcasters of a stream) and it makes sense to use one run by somebody else — a company or a community that's pooling resources.
Say if Bluesky (the company) bans someone, that person could still have the keys to their data, but their feeds will no longer be "re-broadcast" by that company's servers - right?
If an app is concerned that a relay is censoring some user that they care about, the easiest solution is just to host their own relay. It's probably cheaper to operate than their app is. But if they really wanted to, they could listen to multiple relays to "cover the gap" or just manually listen to the event stream from specific users' PDSs directly whenever they notice censorship (effectively operating a partial relay in addition to listening to a full but censored one). But, again, in reality they'd just host their own relay and not bother complicating things.
The hardest problem of relays censoring content is to notice it happening, but once you notice you can easily verify it and switch to a different relay.
It's less like a cellular plan and more like building your own private cell tower just because you can.
I'd be happy to be wrong here though.
If you want to avoid the entire bandwidth of the firehose, you need something like jetstream (at least until something like sharded relays come around).
However, the relay gossip protocol is not as taxing as it used to be. Relay Sync 1.1 massively decreased overhead, and it allows relays to run "thin", i.e. carrying only a certain backlog of history rather than the full history of the network. So you can make a relay that only keeps 24 hours of history, and it'll perpetually stay under something like 100 GB of storage (I don't remember the exact amount, but storage size is pretty linear with backlog history).
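The "thin relay" idea is essentially backlog pruning: storage stays roughly linear in the backlog window rather than in network history. A minimal sketch with hypothetical names and a simulated clock:

```python
import time

class ThinRelay:
    """Toy non-archiving relay: keeps only events newer than
    `backlog_s` seconds, dropping older history on every publish."""
    def __init__(self, backlog_s):
        self.backlog_s = backlog_s
        self.events = []  # list of (timestamp, event)

    def publish(self, event, now=None):
        now = time.time() if now is None else now
        self.events.append((now, event))
        self._prune(now)

    def _prune(self, now):
        cutoff = now - self.backlog_s
        self.events = [(t, e) for t, e in self.events if t >= cutoff]

# A 24-hour backlog, fed events spaced more than a day apart.
relay = ThinRelay(backlog_s=24 * 3600)
relay.publish("old-post", now=0)
relay.publish("new-post", now=25 * 3600)  # old-post falls out of the window
print([e for _, e in relay.events])  # ['new-post']
```

An appview that needs deeper history would backfill directly from PDSs instead; the thin relay only promises the recent window.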
Then again, I will not deny that there's also the possibility that I am simply cheap! :-)
Instead, you're looking for hosting a PDS which you absolutely can do for $10/mo (or less)
I run a PDS on a OVH Cloud VPS for $5/mo for myself, some alts, and some bots
Bluesky's architecture was pretty much dictated by the premise that anyone needs to be able to see any post on the entire system, regardless of whether they have any connections with the author. That algorithmic entertainment-style feeds need to exist. You do need that firehose and other expensive infrastructure for that, there's no going around it.
The fediverse, on the other hand, entirely relies on people following each other. Each server only receives and stores data that is relevant to its users. ActivityPub works like an automated email list management system. You follow someone, they start sending you their updates and forwarding any updates from others that they consider relevant, like replies to their posts.
Exactly this (that people want it at least - I don't think that means it needs to exist). And I think there would be a lot less frustration in the discourse of ActivityPub vs. ATproto, if we could collectively agree that you can't get this in a decentralized system. In a dense network, the number of edges scales with the square of the number of nodes. It's just not feasible to have a network that is both dense and has a large number of nodes.
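The scaling claim above is just the complete-graph edge count, n(n-1)/2:

```python
def max_edges(n):
    # Number of pairwise connections in a fully dense network of n nodes.
    return n * (n - 1) // 2

for n in (100, 10_000, 1_000_000):
    print(n, max_edges(n))
# 100 nodes -> 4,950 edges; 10,000 -> ~50 million; 1,000,000 -> ~500 billion.
# Dense connectivity stops being feasible long before "everyone is connected".
```

Federated designs cope by keeping each server's subgraph sparse; a global any-to-any view reintroduces the quadratic term, which is what the centralized infrastructure is paying for.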
I think "I prioritize virality, recommendation engines and network density, thus accept giving control over the network to a centralized and profit-oriented entity" is an entirely reasonable tradeoff to make. I just don't understand why BlueSky users don't seem to accept that it's the tradeoff they are making.
One reason Bluesky is so successful is because it doesn't shove decentralisation into the user's face like Mastodon does. The vast majority of people don't know what decentralisation is and don't care to.
I think that far too much effort is put into decentralisation and not enough into good moderation on these platforms.
What I mean is I own my own domains but I can't use them on Mastodon without self hosting an entire Mastodon server for one user per domain. Yes there are other implementations of the protocol but none really solve this well in a cheap to run way.
Mastodon's missing feature is identity portability. A user with their own domain should be able to easily use a larger instance to host their identities and be able to migrate them to another instance.
Doing a search on Twitter searches Twitter, the whole thing. A search on Mastodon only knows about the servers you're connected to (unless you're searching for a specific user, in which case it'll micro-target their server to get their account info, but you have to know their name through some side-channel). Similarly, if you chance across a Mastodon post and want to follow that user, unless you happen to be on the same node as them, you have to enter your own node data to get redirected to do the follow, because of the domain-based nature of web security.
These aren't deal-breakers but we have the hard numbers from other web UX to know that every time you put a friction point like these in the flow, you immediately lose some x% of users. Relative to services that are centralized, these things will slow Mastodon adoption.
(This may not be the worst thing. There are other goals besides maximizing the adoption numbers.)
They can roll that back, or push the moderation angle harder, but they won't be able to do so without also acknowledging that East Asia produces substantially more content overall, and on average higher-quality content, that is incompatible with Western moderation. Those realities won't be popular either way.
ATProto's Stacked Moderation is an interesting approach to combine platform, community, and user level choices
https://bsky.social/about/blog/03-12-2024-stackable-moderati...
I think they do quite well considering the disparate resource levels, but some servers are effectively unmoderated while others are very comfortable; plenty are racist or other types of bigot friendly, but the infrastructure for server-level blocks is ad-hoc. Yet it still seems to work better than you'd guess.
Decentralization means whoever runs the server could be great, could just not be good at running a server, could be a religious fundamentalist, a literal cop, a literal communist, a literal nazi, etc. And all have different ideas of what needs moderating. There is no mechanism to enforce that "fediverse wide" other than ad-hoc efforts on top of the system.
It is perhaps also worth noting that the Fediverse architecture does nothing to prevent racists or bigots from being found in the "fediverse" (here referring to the collection of all servers using the protocol, not the protocol itself), and... that's pretty much as intended. Truth Social uses Mastodon as its backend; there is nothing the creators/maintainers of Mastodon could, or by design would, do to shut it off. The same architecture that makes it fundamentally impossible for Nazis to shut down a gay-friendly node makes it impossible for other people to shut down a Nazi node; there is merely the ability of each node to shield its users from the other.
That's a feature of the experiment, not a bug, and reasonable people have various opinions on that aspect of it.
Because, if it's purely about filtering out content not desired by users, it could be done nearly trivially at the edge, automatic and completely de-humanized, and the word as it's being used lately doesn't read that way to me.
Sounds like reddit 15 years ago
If I set up a get rich quick scheme based on overhyping and selling land, it doesn't matter whether I buy the land today or it's land I had for decades for a completely different reason. Either way I'm going for a quick buck now.
Like what?
Bluesky is a small team of like ~30 people, if they keep running lean they have at least a chance of a decent profit margin. But none of that will make anyone a multi-billionaire, so never mind.
For consumers, plenty of ads and plenty of tracking. For businesses, heavily-restricted user-to-server APIs and features gated behind subscriptions, think custom domains with Bsky hosting, multi-user post approvals, integrating DMs with customer support systems etc.
You can do all of that while still being fair to and interoperable with the rest of the ecosystem. As long as you don't want the convenience, features and UI of Gmail, you can still communicate with Gmail users from any other provider, and the same could be true about Bsky.
This isn't quite right. ATProto has a completely different "shape" so it's hard to make apples-to-apples comparison.
Roughly speaking, you can think of Mastodon as a bunch of little independently hosted copies of Twitter that "email" (loosely speaking) each other to propagate information that isn't on your server. So it's cheap to run a server for a bunch of friends but it's cut off from what's happening in the world. Your identity is tied to your server (that's your webapp), and when you want to follow someone on another server, your server essentially asks that other server to send stuff to yours. This means that by default your view of the network is extremely fragmented — replies, threads, like counts are all desynchronized and partial[1] depending on which server you're looking from and which information is being forwarded to it.
ATProto, on the other hand, is designed with a goal of actually being competitive with centralized services. This means that it's partitioned differently – it's not "many Twitters talking to each other" which is Mastodon's model. Instead, in ATProto, there is a separation of concerns: you have swappable hosting (your hosting is the source of truth for your data like posts, likes, follows, etc) and you have applications (which aggregate data from the former). This might remind you of traditional web: it's like every social media user posts JSON to "their own website" (i.e. hosting) while apps aggregate all that data, similar to how Google Reader might aggregate RSS. As a result, in ATProto, the default behavior is that everyone operates with a shared view of the world — you always see all replies, all comments, all likes are counted, etc. It's not partial by default.
With this difference in mind, "decentralizing" ATProto is sort of multidimensional. In Mastodon, the only primitive is an "instance" — i.e. an entire Twitter-like webapp you can host for your users. But in ATProto, there are multiple decentralized primitives:
- PDS (personal data hosting) is an application-agnostic data store. Bluesky's implementation is open source (it uses a sqlite database per user). There are also alternative implementations of the same protocol. Bluesky the company does operate the largest ones. However, running a PDS for yourself is extremely cheap (like maybe $1/mo?). It's basically just a structured KV JSON storage organized as a Merkle tree. A bit like Git hosting.
- AppViews are actual "application backends". Bluesky operates the bsky.app appview, i.e. what people know as the Bluesky app. Importantly, in ATProto, there is no reason for everyone to run their own AppView. You can run one (and it costs about $300/mo to run a Bluesky AppView ingesting all data currently on the network in real time, if you want to do that). Of course, if you were happy with the tradeoffs chosen by Mastodon (partial view of the network, you only see what your server's users follow), you could run one for a lot cheaper — so that's why I'm saying it's not apples-to-apples. ATProto makes it easy to have an actually cohesive experience on the network, but the costs are usually compared with the fragmented experience of Mastodon. ATProto can scale down to Mastodon-like UX (with Mastodon-like costs), but that's just not very appealing when you can have the real thing.
- Relays are things "in between" PDS's and AppViews. Essentially a Relay is just an optimization to avoid many-to-many connections between AppViews and PDS's. A Relay just rebroadcasts updates from all PDS's as a single stream (that AppViews can subscribe to). Running a Relay used to be expensive but it got a lot cheaper since "Sync 1.1" (when a change in protocol allowed Relays to be non-archiving). Now it costs about $30/mo to run a Relay.
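The PDS description above ("structured KV JSON storage organized as a Merkle tree") can be illustrated with a toy content-addressed store. This is not the real Merkle Search Tree or CID scheme (ATProto hashes DAG-CBOR, not JSON); it only shows why a single root hash lets anyone verify a repo regardless of who hosts it:

```python
import hashlib
import json

def toy_cid(obj):
    """Toy content identifier: hash of a canonical JSON encoding.
    (A hypothetical stand-in for real CIDs over DAG-CBOR.)"""
    data = json.dumps(obj, sort_keys=True).encode()
    return hashlib.sha256(data).hexdigest()[:16]

class ToyRepo:
    """Per-user KV store whose root hash commits to every record, so a
    repo served by any host can be checked against the signed root."""
    def __init__(self):
        self.records = {}

    def put(self, key, record):
        self.records[key] = record

    def root(self):
        # Hash of all (key -> record-hash) pairs: a flat stand-in for
        # the Merkle Search Tree a real PDS maintains.
        leaves = {k: toy_cid(v) for k, v in self.records.items()}
        return toy_cid(leaves)

repo = ToyRepo()
repo.put("app.bsky.feed.post/1", {"text": "hello"})
r1 = repo.root()
repo.put("app.bsky.feed.post/2", {"text": "world"})
print(repo.root() != r1)  # True: any change to the data changes the root
```

Two hosts holding the same records produce the same root, which is what makes the hosting swappable: the data, not the server, is the source of truth.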
So all in all, running PDSs and Relays is cheap. Running full AppViews is more expensive but there's simply no equivalent to that in the Mastodon world because Mastodon is always fragmented[1]. And running a partial AppView (comparable to Mastodon behavior) should be much, much cheaper — but also not very appealing so I don't know anyone who's actually doing that. (It would also require adding a bit of code to filter out the stuff you don't care about.)
[1] Mastodon is adding a workaround for this with on-demand fetching, see https://news.ycombinator.com/item?id=45078133 for my questions about that; in any case, this is limited by what you can do on-demand in a pull-based decentralized system.
A clarifying question: in the blog post [0] I found about zeppelin.social (which I think is a full AppView), the author said this:
"The cost to run this is about US $200/mo, primarily due to the 16 terabytes of storage it currently uses"
Last I heard the amount of storage was just a couple of terabytes so the growth seems to be very fast.
If and when the primary cost is the storage, IMO the crucial question is: what's the expected future cost of running community AppViews?
Because unless storage cost drops as fast as the BlueSky data grows (unlikely?), to me this architecture looks like it will very soon kick out smaller players and leave only BlueSky with enough money to keep the AppView running.
In any case, if you’re okay with a partial snapshot of the network (eg all posts during some window, or even more partial) then you can arbitrarily narrow that down. In Mastodon, having a “full” archive is downright impossible, which is why we’re not talking about the same thing with regard to Mastodon. Whereas ATProto makes it possible, with the cost being the floor of what you’d expect storing the data to cost. How could it be better?
They need to be stored, but do they technically have to be stored by just one AppView? I get that it's 100x easier to implement it like that, but I don't think a distributed search would've been technically impossible (although, granted, it would necessarily have had worse UX).
Choosing this feature and then implementing it like they did was a technical choice. Technical choices have consequences, and this one, I think, will prevent BlueSky from reaching any meaningful decentralization.
And saying "you can create an inferior UX with affordable costs" is not a real answer. Any meaningful decentralization IMO can only happen if it's affordable to create feature identical nodes. That can only happen if you refuse to implement features in ways that need centralization to scale.
On the contrary, ATProto adds flexibility here. There are community-run projects like https://constellation.microcosm.blue/ that let small application builders avoid that burden. Of course you don’t want to overwhelm those by building a massive app on top. But the point is that ATProto starts with an equivalent baseline to what you’d pay running a centralized service, and then gives you room to play with distribution of costs, potentially going all the way down to directly querying PDS’s on-demand, or something in between like community-maintained caches or even potential third-party app-agnostic aggregation services. Eg you could imagine AWS, Vercel or Cloudflare building “app platforms” in five years that let you cheaply query shared data.
As for creating “identical” nodes, I think you hit the nail on the head — that’s not what ATProto aims to do. The insight is that it’s not useful or feasible for everyone to run their own copy of Twitter. But that it’s possible for everyone with “proportional interest” to run a “proportionally complete” part, with some of the costs being amortizable and poolable across many users and apps (thanks to shared infrastructure) and always individually replaceable (to avoid lock-in). This is strictly better than centralized.
I'm not super up-to-date on Mastodon's/ActivityPub's workings, but aren't replies to a post pushed to the original poster's server? So wouldn't followers then be able to pull from that server at any time to get an always-up-to-date view of replies, at least theoretically? (With maybe posts from the last few seconds missing if the network's slow.)
(Asking because I've seen you claim that the architecture is inherently limited to never be able to achieve the "cohesive" experience.)
Imagine if, when you refreshed this HN page, only comment chains you’re already in would refresh timely. Yes, this would “work” to some extent, but it would clearly be a regression.
Additionally, going viral can overload your server due to this architecture. In ATProto this never happens for self-hosters (of a PDS) because the cost is amortized by the AppView. (Same as in centralized products where the cost is on the backend.)
(To be honest, I'm already surprised that Mastodon scaled as far as it did. I will say, if I had seen the state of the web's architecture 20 years ago today, I probably also would have claimed that it was inherently insecure and that there was no way to get it to be secure enough to scale to billions of users, so... I don't know, maybe people will keep finding duct tape solutions to make it work, worse-is-better-style.)
Reading through it, it just sounds like sharding/scaling for a centralized service that's meant to be owned and provided by a single entity.
Each of the pieces I've described (PDS, Relay, AppView) implement the protocol specified at https://atproto.com/. Anything that acts as an ATProto PDS can be used as an ATProto PDS, anything that acts as an ATProto Relay can be used as an ATProto Relay, and so on. I'm not sure I understand the question so pardon the tautology.
The structure allows federation by design — a Relay will index any PDS that asks to be indexed; an AppView can choose the Relay it wants to get the data from (or skip a Relay completely and index PDS's directly); anyone can make their own AppView for an existing or a new app. That's how there are multiple AppViews (both for Bluesky app and for other ATProto apps) ingesting data via multiple Relays from many PDS's. There aren't many independent operators of each piece (especially outside of PDS self-hosting) but nothing is privileging Bluesky's infra.
Additionally, Bluesky's reference implementations of each piece are open source. So people run them the same way you would usually run software -- by putting them on a computer and exposing them to the internet. To run a custom PDS, you can either use the Docker container provided by Bluesky (https://github.com/bluesky-social/pds) or implement your own (e.g. https://github.com/blacksky-algorithms/rsky). Ditto for other pieces.
>Reading through it, it just sounds like sharding/scaling for a centralized service that's meant to be owned and provided by a single entity.
You're right in that the goal is to make it on par with centralized services in terms of UX and performance/scaling. However, it is decentralized.
The picture at the end of this article might help: https://atproto.com/articles/atproto-for-distsys-engineers
When people "post", their posts go to their PDS's, which means that every AppView ingests data generated by every other AppView by default. There is no way to tell who's using which AppView — in fact, you can log into any AppView and your profile will be there with all your posts.
The only things they do other than feed hydration are track notifications, (optionally) provide a search engine, (optionally) provide a CDN, and (temporarily until E2EE rolls out) handle DMs.
So you can actually do things like the Red Dwarf [1] project which is a bluesky client without an appview. It's slower, you visibly notice request loading/pop-in, there's no notifications, and no search but it works with any other bluesky appview (since appviews are basically a lens into atproto rather than an independent service).
--------
If you wanted to run your own infrastructure, instead you'd probably want to run your own PDS. Running an appview has its benefits of course but the main way you "self host" is to run a PDS. That's fairly trivial and people have run them on all kinds of constrained hardware (including a literal jailbroken microwave if I remember correctly).
The AppView doesn't do that only for Bluesky data. It does it for any Personal Data Stores (user accounts with all their user data) that it knows about.
When you "interact" with users elsewhere, all you do is generate new records on your own PDS. You generate a "like" entry, or a reply, on your own PDS. It's your PDS, all your stuff goes there. The AppView sees that and indexes it, attaching that like or that reply to the post you're reacting to.
When you write a post, you save it to your PDS. Think of it like writing a blog. You're done, you hit submit, it shows up on a server somewhere. You can run your own server with your own data, or use someone else's. That's exactly how a PDS works; it is a storage server for your data.
The AppView is a way to index all the PDSes registered across the whole network. If your server is crawlable by the AppView, all your data shows up in the app. This is like if your blog is crawlable by Google, you show up in search results.
When you like a post, you commit a "like" record to your personal server. When the AppView displays likes, it looks at every indexed PDS and shows every like it can find for that post (simplifying a little for clarity). Each one of those likes might live on a different server, some of which are self-hosted.
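That aggregation step can be sketched in a few lines. This is a toy in-memory model with invented names (real ATProto records are lexicon-typed, e.g. likes reference posts by AT URI, and the AppView ingests them from the firehose rather than crawling):

```python
from collections import defaultdict

# Hypothetical model: each PDS is just that user's own record list.
pds_store = {
    "did:plc:alice": [{"type": "like", "subject": "at://bob/post/1"}],
    "did:plc:carol": [{"type": "like", "subject": "at://bob/post/1"},
                      {"type": "reply", "subject": "at://bob/post/1",
                       "text": "nice"}],
}

def index_likes(pdses):
    """AppView-style aggregation: scan every known PDS and group
    'like' records by the post URI they reference."""
    likes = defaultdict(list)
    for did, records in pdses.items():
        for rec in records:
            if rec["type"] == "like":
                likes[rec["subject"]].append(did)
    return dict(likes)

print(index_likes(pds_store))
# {'at://bob/post/1': ['did:plc:alice', 'did:plc:carol']}
```

Note that the likes never "belong" to the AppView — it only builds an index over records that each user keeps on their own server.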
Because you can run your own PDS, you can commit any data you want to it. You can even commit things that services may find distasteful. However, the AppView may refuse to serve this content to users; this is how content can be removed from the network and how users can be banned. The federation equivalent would be defederation, except it happens to singular accounts rather than entire instances.
If you disagree with the moderation policies run by Bluesky the company, that's when you can look into running an alternative AppView. This is similar to disagreeing with the admin of a particular Mastodon instance and moving to a different instance. Of course, as mentioned running an AppView is much more expensive, but that hasn't stopped folks from trying (I believe Blacksky is trying to run their own AppView that is fully independent of Bluesky).
To use an alternate AppView, you'd simply go to a different website. This website will index PDSes the same way that Bluesky does, but it may index them in a different way and include/exclude different content. The data is still there (nobody can reach into your PDS on your server and delete your data), but the AppView admins choose which content they wish to serve to people using their community, just as Mastodon admins choose who to federate with.
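The "moderation happens at serve time, not at storage time" idea can be shown with a tiny filter (invented names, purely illustrative): the records live on PDSes, and each AppView applies its own policy when deciding what to show.

```python
# Toy sketch: same underlying records, different AppView policies.
records = [
    {"did": "did:plc:alice", "text": "hello"},
    {"did": "did:plc:spammer", "text": "buy stuff"},
]

def serve(appview_blocklist, recs):
    """An AppView doesn't delete anything -- it just declines to
    show records from accounts it has 'defederated'."""
    return [r for r in recs if r["did"] not in appview_blocklist]

moderated_view = serve({"did:plc:spammer"}, records)  # hides the spammer
alternate_view = serve(set(), records)                # shows everything
```

Switching AppViews changes which filter you're looking through; the data underneath is untouched either way.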
In this sense, it is indeed truly federated. The primitives are simply different; it's more granular than Mastodon.
You can write your own content to your own server and let it get indexed by any number of AppViews; you completely control your personal data and nobody can reach in and delete that data randomly, as they don't own it - you do (at the cost of ~$1/month or a Raspberry Pi).
When you use the Bluesky service, you are seeing their view of the network and what they choose to index. You may disagree with this view, just as you may disagree with the admins of Mastodon.social etc. In that case, you can choose to use another AppView (such as deer.social or Blacksky) that adopts different policies. Since account information isn't stored on the AppView and it simply handles indexing and moderation, moving between AppViews is painless and no data needs to be transferred from one server to another - you simply use a different bookmark.
It could be that you disagree with all the current AppView admins. You can host your own, it's just expensive ($300/month, as mentioned). You can also tailor your AppView to index less content, which will of course limit the amount of data you consume and give you a partial view of the network, effectively defederating you from anything you do not wish to index.
But there's nothing stopping you from doing so!
But the fact that essentially no one is using it doesn’t suggest much avoidance of centralization. These factors are not independent. It’s pretty easy to avoid anything when your total user count is a rounding error compared to the alternatives.
It's important to note that this isn't "you have to be big in order to be able to filter spam". That's not true at all; decentralized anti-spam lists have been a thing for decades and the big sites don't have any significant advantage in filtering spam.
It's allegedly that big sites will mark small sites as spam even when they're not, which makes it hard to run a small mail server. And there is some of that -- they also have a perverse incentive to do it on purpose because it kills their smaller competitors.
But it's also somewhat overstated. If you have a reverse DNS entry pointing back at your mail server and have properly configured DKIM, it's not inherently the case that you're always going to be marked as spam. And it's not inherently the case that you won't just because you use one of the big services -- they have the same incentive to do that to each other, after all.
You’re saying the average admin is gonna have no issue setting up their MX records but “won’t even know” what a reverse DNS entry is?
Sure, a lot of people pick one center and stare at it. But there are plenty of centers and that situation is getting better recently. Some even integrate git with ATProto or ActivityPub.
There are loads of different tracker sites. Many private. If one goes bad, others pop up to replace it. This is decentralised - there is not one player that strangles the ecosystem.
Not to mention the mainline DHT. It's not impossible, or even very resource-hungry, to run a scraper/crawler/listener and be able to search via it, like bitmagnet (https://bitmagnet.io/), which has some fun pipe-dreams like federation of indexers and something like a decentralized private tracker.
Ofc, your bitcoin would still be "safe" and still be "yours" on the chain assuming you owned your wallets directly, but those guarantees would now lack meaning or real world consequences. At least until another link to the real world could be established.
> so you can find someone in another country to convert you out through another currency
This is not so easy for large amounts -- the whole reason to use a Coinbase is institutional trust -- and is certainly inconvenient. In practical terms, across the ecosystem of users, it would not be some small roadbump but a massive problem.
I don't mind, I still think it's a huge leap forward, but it's important to set realistic expectations.
> A foreigner visiting Oxford or Cambridge for the first time is shown a number of colleges, libraries, playing fields, museums, scientific departments and administrative offices. He then asks “But where is the University?” .....
"The fediverse" is not a thing. There are many separate sites and the collection of all of them in total may be called "the fediverse", but that is not a thing by itself. They don't even all share a protocol in common. You cannot join the fediverse any more than you can join the game industry or the startup scene. You have to join a specific server, game company or startup (or more than one). And while from the outside you might have heard a lot of "the game industry is cut-throat" or "startups work together to innovate technology", once you are inside one of them, you'll find that it's a lot more fragmented than it at first appeared, and totally incohesive, and although from a bird's eye view it looked like you and your competitor down the hall were both making VR happen, they won't work together with you.
(Never used the fediverse, so zero context here).
The only difference in visible replies is in the moderation choices of the server the post is viewed from.
This is a catch-22: fedi is more decentralized because of the low barrier to entry for running a node, but that barrier is only low because a node does not fetch every single message and piece of media.
Us small instances can't afford it.
In ATProto, there is no need to do this on-demand because the data is already there in the AppView. When you want to serve a page of replies, you read them from the database and serve them. There is no distributed fetching involved, no need to hit someone else's servers, no need to coalesce them or worry about limiting fetches, etc. This is why it works fine for threads without thousands of replies and hundreds of nesting levels. It can also be paginated on the server.
If you don't have this information on your server, how can you gracefully fetch thousands of replies from different servers and present a cohesive picture during a single request? I'm sure this PR makes an attempt at that, but I'm not sure it's a direct comparison, because Mastodon can't avoid doing this on-demand. If we're comparing, it would be good to list the tradeoffs of Mastodon's implementation (and how it scales to deep threads) more explicitly.
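For contrast, here is what "read them from the database and serve them" looks like once everything is already indexed locally — a toy cursor-paginated thread query (SQLite, invented schema; real AppViews use their own storage, but the read path has the same single-query shape):

```python
import sqlite3

# Toy indexed reply store: the AppView has already ingested every
# reply from the event stream, so serving a thread page is one
# local query -- no remote fetches at read time.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE replies (root TEXT, seq INTEGER, author TEXT, text TEXT)")
rows = [("at://post/1", i, f"user{i}", f"reply {i}") for i in range(1, 8)]
db.executemany("INSERT INTO replies VALUES (?, ?, ?, ?)", rows)

def thread_page(root, cursor=0, limit=3):
    """Cursor-based pagination over the local index."""
    cur = db.execute(
        "SELECT seq, author, text FROM replies "
        "WHERE root = ? AND seq > ? ORDER BY seq LIMIT ?",
        (root, cursor, limit))
    page = cur.fetchall()
    next_cursor = page[-1][0] if page else None
    return page, next_cursor

page1, c1 = thread_page("at://post/1")            # first 3 replies
page2, c2 = thread_page("at://post/1", cursor=c1) # next 3, one roundtrip
```

The pull-based alternative has to do this same work at request time against N remote servers, which is where the latency and freshness tradeoffs come from.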
There is also a section related to performance available at the link I posted. Third header, "Likely Concerns", second subheader, "DoS/Amplification".
I mean from the user's perspective: when I open a thread, I expect to instantly see the entire discussion happening across the entire network, with the paginated data coming back in a single roundtrip. Moreover, I expect every actor participating in said discussion (wherever their data is stored) to see the same discussion as I do, with the same level of being "filled in", and in real time (each reply should immediately appear for each participant). It should be indistinguishable from the UX of a centralized service, where things happen instantly and are presented deterministically and universally (setting aside that centralized services abandoned these ideals in favor of personalization).
With ATProto, this is clearly achieved (by reading already indexed information from the database). How can you achieve this expectation in an architecture where there's no single source of truth and you have to query different sources for different pieces on demand in a worker? (To clarify, I did read the linked PR. I'm asking you because it seems obviously unachievable to me, so I'm hoping you'll acknowledge this isn't a 1:1 comparison in terms of user experience.)
To give a concrete example: is this really saying that replies will only be refreshed once in fifteen minutes[1]? The user expectation from centralized services is at most a few seconds.
[1]: https://github.com/mastodon/mastodon/pull/32615/files#diff-6...
For realtime discussions (like this one), I don't think we can call it consistent if it takes multiple minutes for each back-and-forth reply to propagate across instances in the best case (and potentially longer through multiple hops?) because you'll see different things depending on where you're looking and at which point in time.
At least in my observation (I haven't pulled apart the protocol to know why), if you're in a conversation on Mastodon, it's real good about keeping you in it. The threading of posts seems to route them properly to the host servers the conversing accounts live on.
I hear your point that slower conversation can be better. That’s a product decision though. Would you intentionally slow down HN so that our comments don’t appear immediately? You could certainly justify it as a product decision but there’s a fine line between saying you should be able to make such decisions in your product, and your technology forcing you to make such decisions due to its inability to provide a distributed-but-global-and-realtime view of the network.
Auto-at-tagging doesn't scale to dozens and dozens of actively-engaged speakers, but neither does human attention, so that's not a problem that needs to be solved.
Seeing the existing convo in real time lets me decide which points to engage with and which have been explored, and to navigate between branches as they evolve in real time (some of which my friends participate in). I do earnestly navigate hundreds of times within an active thread — maybe it’s not your usage pattern but some of us do enjoy a realtime conversation with dozens of people (or at least observing one). There’s also something to the fact that I know others will observe the same consistent conversation state at the time I’m observing it.
You might not consider such an experience important to a product you’re designing, but you’re clearly taking a technological limitation and inventing a product justification for it. If Mastodon didn’t already have this peculiarity, you wouldn’t be discussing it, since replies appearing in realtime would just seem normal.
In either case, whether you see it as a problem to be solved or not, it is a meaningful difference in the experiences of Twitter, Bluesky, and Mastodon — with both Twitter and Bluesky delivering it.
It's definitely not as clean as a centralized solution though.
If the decentralized network allows for some kind of targeted broadcasting, it becomes attractive for spamming (e.g.: email).
If the decentralized network allows for concentration of responses on something, it becomes a potential tool for a DDoS attack (e.g.: DNS amplification).
So running a node should be somehow expensive, but the expense should be written off if the receivers of the message endorse it, either by a one-time action or automatically by subscribing. An initial credit would make it possible to establish an audience.
It looks like a perfect use case for a cryptocurrency of sorts %) But this means expensive coin generation, and distribution of the huge ledger across all nodes. That could be delegated to some specialized nodes, but here comes centralization again!
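The "expense written off on endorsement" idea above can be modeled without any blockchain at all — here's a toy ledger (all names and numbers invented) where sending costs a credit, subscriptions make sending free, and an endorsement refunds the cost, so spammers bleed credits while welcome senders break even:

```python
class CreditLedger:
    """Toy accounting for endorsement-refunded messaging.
    Purely illustrative -- not a real protocol."""
    SEND_COST = 1

    def __init__(self, initial=10):
        self.initial = initial   # initial credit to bootstrap an audience
        self.balance = {}
        self.subs = set()        # (receiver, sender) pairs

    def join(self, who):
        self.balance[who] = self.initial

    def subscribe(self, receiver, sender):
        # A subscription is a standing endorsement: sending is free.
        self.subs.add((receiver, sender))

    def send(self, sender, receiver):
        if (receiver, sender) in self.subs:
            return True
        if self.balance[sender] < self.SEND_COST:
            return False         # out of credit -- spam is throttled
        self.balance[sender] -= self.SEND_COST
        return True

    def endorse(self, receiver, sender):
        # One-time write-off: the receiver refunds the send cost.
        self.balance[sender] += self.SEND_COST
```

Whether the credits need to be a globally-consistent currency (with all the centralization pressure that brings) or could just be bilateral/local reputation is exactly the open question.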
I disagree that building economics into the protocol is the way forward. There's so much power and creativity that can be unlocked when the substrate is free.
Read-only sites, like personal blogs, are easy to self-host but vulnerable to DDoS attacks if targeted.
Writeable sites, like anything with a form or message board, are now too expensive to run because of spammers and hacking bots.
We're now all paying rent to Cloudflare or Substack or whatever because you can't just be an isolated island on the open web anymore.
I call that working in practice. The fact that any individual self-hosted site could not survive a DDOS does not negate the fact that, well, the internet actually does exist.
Like I said: if your site is read-only and you self-host, you're vulnerable to DDoS. You'll avoid it if you aren't very famous and aren't randomly targeted, but you're still vulnerable.
And if your site has any kinds of comments, forms, or "write" ability, then you absolutely can't self-host. You will be blown away by spam and hacking attempts.
The "tons" of people you're talking about don't exist. They're mostly on WordPress, Ghost, Substack, or something similar with custom domains, and the ones who aren't are still mostly hiding behind Cloudflare.
Whereas with Bluesky / AT Protocol, most folks are on Bluesky servers, yes. But there's a very strong credible-exit case where you can leave the Bluesky servers and just do your own thing. And follow whomever you want to follow.
Bluesky / AT Proto creates a trust mechanism beyond DNS, creates an identity that can be moved around between hosts or replicated outwards in a verifiable way. I dig ActivityPub, and have been a long-time http enjoyer, but it's not ideal imo for social media to need to be so coupled to such strongly DNS-based client-server systems.
NNTP is also great, but most people can't individually afford to mirror entire binary groups, and most ISPs no longer provide this, so most people just use commercial news feeds if they want binaries, or one of the free NNTP/Usenet providers if they only use text. People can certainly peer with some of the free providers [1], and probably should, to reduce the risk of people being censored. Much like IRC, people can create their own little private or semi-private linked NNTP servers to replicate a distributed thread-based forum of sorts.
[1] - https://www.eternal-september.org/index.php?showpage=peering
Too decentralized, and you can't find anything. Nobody uses it.
Too centralized, and censorship takes over. Nobody can speak freely.
You can think of the golden age of blogs and search as an example of both. Search engines formed a centralized hub, with blogs, forums, etc. forming the spokes. For a while that worked well, before it was degraded by spam and the consolidation of disparate forums etc. into a handful of major platforms (fueled partly by acquisitions).
Remember, "the fediverse" is a bit like saying "the internet". "Internet folks are against centralization." Are they?
In economics, a market needs several reasonably strong businesses to get price competition. An EU study indicated that the minimum number is about 4. Below 4, price competition seems to disappear and you have oligopoly, or, at 1, monopoly.
In areas where there's no inherent effect like distance to stop centralization, markets tend towards oligopoly. Look at the number of browsers, the number of big banks, the number of cellular phone companies, and so forth. They're all between 2 and 4. The stable state seems to be around 3 big players.
This probably applies to social networks. There's only so much attention available.
(It is, of course, fundamentally impossible to keep people from indexing a default-open network, but if one does it, one does not advertise doing it outside the service-supported mechanisms).
This hits scaling problems. USENET ran into that.
Fediverse is almost straight left, and it's already 690. Straight up would be 5000. This is non-linear scale presented linearly.
It's not impossible, but each distributed component would have to be at least a small data center.
I wouldn’t be surprised if Facebook tries to eventually capture that data with Threads.
Mastodon is social like a quiet pub. Twitter and Bluesky are social like a crowd at a concert.
Good analogy. When Twitter started, I took one look at people shouting "I had a delicious sandwich today" and "I just took an amazing dump" and wanted no part of it. When it later turned into a "clever" contest, I wanted even less.
My quiet little Mastodon community, plus some outsiders I have chosen, is the kind of "social" I want. If someone starts behaving like an influencer, they get muted.
1. How hard is it to censor the network?
2. How hard would it be for some major player to enshittify the network?
Furthermore, while the fediverse has a single axis for decentralization, BlueSky has 3: number of "big index servers", number of PDSs, number of domain names (how many people own their handle):
1. Increasing the number of PDSs doesn't make it harder to censor the network when everyone still uses the same big index node.
2. BlueSky's primary defense against enshittification is user account portability. I'd love to see metrics on how many users have their own domain names. Having many PDSs is also a good defense here because it reduces the impact of BlueSky (the company) shutting off the firehose, but I still think account portability is the primary defense here.
it would be high in my list of desirability because decentralization means agency to move freely.
i like the fact that if i find the city or state i live in to be boring or economically terrible, i can move.
i like it that if i don’t like the food or atmosphere of a restaurant, i can go to a different restaurant.
i like that if i think billy is a constant asshole, me and my friends can move to the next table over and leave billy behind.
social networks are absolutely no different, no matter how hard certain people try to convince us that online is different. it isn’t.
we should be soooo incredibly leery of anyone who tells us it’s a good thing to have no agency to eat different food or go party at a different bar.
the hair on the back of our neck should stand up every time someone tries to convince us to go to only a couple of places with the same set of rulers and tries to convince us this is somehow good for us.
how many times have you heard “you want to avoid echo chambers” followed by “therefore everyone should all be on the exact same set of websites with the same set of rulers” followed by “anything less than everyone on the same site is a failure”
it’s double speak: if you are not trapped here, you have an echo chamber.
freedom of movement, freedom of association, etc… are incredibly important to the future health of the internet.
I wonder if people would actually migrate or if they'd just get boiled.
Bluesky isn't decentralized, anyway, because of the PLC directory.