Posted by dcu 2 days ago
> Data stored in human-readable CSVs
The choice to not use a database when two near-perfect tiny candidates exist, and furthermore to choose the notorious CSV format for storing data, is absolutely mystifying. One can use their Wasm builds if platform-specific binaries offend.
- Super fast
- Can't hack me because those CSV files are stored elsewhere and only pulled on build
- Free, ultra fast, no latency. Every alternative I've tried is slower and eventually costs money.
- CSV files stored on GitHub/Vercel/Netlify/Cloudflare Pages can scale to millions of rows for free if divided properly
All these benefits also apply to SQLite, but SQLite is also typed, indexed, and works with tons of tools and libraries.
It can even be stored as a static file on any of the hosting options mentioned above. Even better, it can be served on a per-page basis via HTTP range requests, so the client downloads just the index and then queries for specific chunks of the database, further reducing the bandwidth required to serve it.
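A rough Go sketch of that serving pattern, assuming the database is just a data.db file sitting in a ./public directory (names made up): net/http already answers Range requests, so a browser-side SQLite reader (something like sql.js-httpvfs) can fetch only the pages it actually touches.

    package main

    import (
        "log"
        "net/http"
    )

    func main() {
        // Serve ./public (which contains data.db) as static files.
        // net/http handles Range requests, so a client-side SQLite
        // reader can download only the pages it actually queries.
        fs := http.FileServer(http.Dir("./public"))
        log.Fatal(http.ListenAndServe(":8080", fs))
    }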
Someone already chimed in for SQLite, so it's worth mentioning that Python is strongly typed, just dynamically typed. Everyone has seen TypeError; you'll get that even without hints. It becomes particularly obvious when using Cython, where the dynamic part is gone and you have to declare your types manually. Type hints are indeed hints, but for your IDE, or mypy, or you (for clarity).
It's a bit like saying C++ isn't typed because you can use "auto".
If I want to bother with a SQL database, I at least want the benefit of the physical layer compressing data to the declared types, and PostgreSQL scales down surprisingly well to lower-resource (by 2025 standards) environments.
- It's plain text
- It's super easy to diff
- It's a natural fit for saving it in a git repo
- It's searchable using standard tools (grep, etc.)
- It's easy to backup and restore
- You don't need to worry about it getting corrupt
- There are many tools designed to read it and produce all kinds of output
A few months ago I wrote my own CLI-driven, CSV-based income and expense tracker at
https://github.com/nickjj/plutus. It helps me do quarterly taxes in a few minutes, and I can get an in-depth look at my finances on demand with one command. My computer, built in 2014, can parse 100,000 CSV rows in 560ms, which is already 10x more items than I really have. I also spent close to zero effort trying to optimize the script for speed. It's a zero-dependency, single-file Python script using "human idiomatic" code.
Overall I'm very pleased with the decision to use a single CSV file instead of a database.
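(The plutus script itself is Python, but as a rough illustration of how little code "parse a CSV ledger" takes, here is a hypothetical Go equivalent using only the standard library; the file name and column layout are made up.)

    package main

    import (
        "encoding/csv"
        "fmt"
        "log"
        "os"
        "time"
    )

    func main() {
        start := time.Now()

        // Hypothetical ledger file with rows like: date,category,amount
        f, err := os.Open("ledger.csv")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        rows, err := csv.NewReader(f).ReadAll()
        if err != nil {
            log.Fatal(err)
        }

        fmt.Printf("parsed %d rows in %v\n", len(rows), time.Since(start))
    }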
So why wouldn't you just use a text format to persist a personal website a handful of people might use?
I created one of the SQLite drivers, but why would you bring in a dependency that might not be available in a decade unless you really need it? (SQLite will be there in 2035, but maybe not the current Go drivers)
Go binaries are statically linked; unless you expect the ELF/PE formats to not exist in 2035, your binary will still run just the same.
And if not, well, there will be an SQLite driver in 2035, and other than 5 lines of init code I don't interact with the SQLite driver but rather the SQL abstraction in Go's standard library.
And if it's such an issue, then directly target the SQLite C API, which will also still be there in 2035.
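For reference, the "5 lines of init code" really is roughly this much. A minimal sketch, assuming the pure-Go modernc.org/sqlite driver (which registers itself under the driver name "sqlite"):

    package main

    import (
        "database/sql"
        "log"

        _ "modernc.org/sqlite" // registers the "sqlite" driver
    )

    func main() {
        db, err := sql.Open("sqlite", "data.db")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Everything after this point is plain database/sql;
        // swapping drivers later only changes the import and driver name.
        if _, err := db.Exec(`CREATE TABLE IF NOT EXISTS users (name TEXT)`); err != nil {
            log.Fatal(err)
        }
    }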
It's great for an extra challenge. Or for writing good literature.
Until the data for a static website becomes large enough to make JSON parsing a bottleneck, where is the problem?
I know, it's not generally suitable for storing data that needs quick access to arbitrary pieces without parsing the whole file.
But if you use it at build time anyway (that's how I read the argument), it's pretty likely that you will never reach the bottleneck that makes you require any DBMS. Your site is static; you don't need to serve any database requests.
There is also huge overhead in powering static websites with a full-blown DBMS, in the worst case serving predictable requests without caching.
So many websites are powered by MySQL while essentially being static... and there are often unnecessarily complicated layers of caching to allow that.
But I'm not arguing against these layers per se (the end result is the same); it's just that, if your ecosystem is already built on JSON as data storage, it might be completely unnecessary to pull in another dependency.
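A small sketch of that build-time flow in Go, with a hypothetical posts.json and template: the JSON is parsed once when the site is generated, never at request time.

    package main

    import (
        "encoding/json"
        "fmt"
        "html/template"
        "log"
        "os"
    )

    type Post struct {
        Title string `json:"title"`
        Body  string `json:"body"`
    }

    func main() {
        raw, err := os.ReadFile("posts.json") // hypothetical data file
        if err != nil {
            log.Fatal(err)
        }
        var posts []Post
        if err := json.Unmarshal(raw, &posts); err != nil {
            log.Fatal(err)
        }

        tmpl := template.Must(template.New("page").Parse(
            `<h1>{{.Title}}</h1><p>{{.Body}}</p>`))

        // One HTML file per post; the web server only ever sees static files.
        for i, p := range posts {
            out, err := os.Create(fmt.Sprintf("post-%d.html", i))
            if err != nil {
                log.Fatal(err)
            }
            if err := tmpl.Execute(out, p); err != nil {
                log.Fatal(err)
            }
            out.Close()
        }
    }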
Not the same as restricting syntax within one programming language.
I'd also be hard-pressed to find any real reason to choose CSV over JSONL, for instance. Parsing is fast and utterly standard, it's predictable, and if your data is really simple, JSONL files will be super simple.
At its simplest, the difference between a CSV line and a JSON array is 4 characters.
Also one less dependency.
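For comparison, a minimal Go sketch reading the same (made-up) record from a CSV line and a JSONL line, both with the standard library:

    package main

    import (
        "encoding/csv"
        "encoding/json"
        "fmt"
        "log"
        "strings"
    )

    func main() {
        // The same record, once as CSV and once as a JSON array (one JSONL line).
        csvLine := `alice,42`
        jsonLine := `["alice",42]`

        rec, err := csv.NewReader(strings.NewReader(csvLine)).Read()
        if err != nil {
            log.Fatal(err)
        }

        var arr []any
        if err := json.Unmarshal([]byte(jsonLine), &arr); err != nil {
            log.Fatal(err)
        }

        fmt.Println(rec, arr) // [alice 42] [alice 42]
    }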
I have started writing web apps that simply store the user data as a file, and I am very pleased with this approach.
It works perfectly for Desktop and Android.
iOS does not allow real Chrome everywhere (only in Europe, I think), so I also offer to store the data in the "Origin Private File System", which all browsers support. Fortunately it has the same API, so implementing it was no additional work. The only downside is that it cannot put files in a user-selected directory. So in that mode, I support a backup via an old-fashioned download link.
This way, users do not have to put their data into the cloud. It all stays on their own device.
I never tried it, but from the descriptions I have read, Dropbox detects conflicting file saves (if you save on two devices while they are offline) and stores them as "conflicting copies". So the user can handle the conflict.
As a developer, you would do this in the application. "Hey, you are trying to save your data but the data on disk is newer than when you loaded it ... Here are the differences and your options for how to merge.".
You're suggesting an actual API-facilitated data sync via Dropbox? Sure, but at that point why? Unless the data also needs to be read by 3rd party applications, might as well host it myself.
A lot of what I've read about local-first apps included solving for data syncing for collaborative features. I had no idea it could be this simple if all you need is local persistence.
If there were something that formalized this a little more, developers could even make their apps in a... Bring Your Own Network... kinda way. Maybe there's already someone doing this?
Since the File System Access API lets web apps simply use the file system, I guess you could just write the file to a shared drive.
Technically probably not completely "without restriction". But for all practical purposes, it works just fine for me.
Could this allow accessing a local DB as well? I'd love something that lets an app talk directly to a DB that lives locally on my devices, and that syncs the DB across them - that way I still get my data on all of my devices, but it only ever stays on my devices.
Of course this would be relatively straightforward to do with native applications, but it would be great to be able to do it with web applications that run in the browser.
Btw, does Chrome sync local storage across devices when logged in?
Like IndexedDB? It's a browser API for an internal key-value storage database.
> Btw, does Chrome sync local storage across devices when logged in?
Syncing across devices still requires some amount of traffic through Google’s servers, if I’m not mistaken. Maybe you could cook something up with WebRTC, but I can’t imagine you could make something seamless.
No, but extensions have an API to a storage which syncs itself across logged-in devices. So potentially you can have a setup where you create a website and an extension and the extension reads the website's localStorage and copies it to `chrome.storage.sync`.
Sounds like an interesting idea actually.
I've been playing with Chrome extensions recently, and have made them talk directly to a local server with a DB. So using extensions, it's relatively easy to store data locally and potentially sync it across devices.
I like the idea of leveraging chrome.storage.sync though, I wonder what the limitations are
https://developer.mozilla.org/en-US/docs/Mozilla/Add-ons/Web...
says that there is a 100 KB limit, and a 512 key-value pair limit per extension.
Quite limiting, but if this pattern becomes popular I don't see why it can't be expanded to have the same limit as localStorage (5 MB).
Ambiguity in your storage format isn’t good in the long run… JSON lines can be trivially parsed anywhere without a second thought.
[1] wazero: https://wazero.io/ [2] modernc.org/sqlite: https://pkg.go.dev/modernc.org/sqlite
For the wazero-based driver, it's this package (I'm the author): https://github.com/ncruces/go-sqlite3
Also didn't expect that I would be talking to the author of wazero myself haha. I really admire your project.
modernc takes the SQLite amalgamation, runs it through the C preprocessor, then converts the result to a Go file using the ccgo compiler. Not many further details on how that works: https://www.reddit.com/r/golang/comments/1apreer/comment/kqa...
The Wasm version takes the same SQLite amalgamation and compiles it to portable Wasm using clang/wasi-sdk; the platform specific bits are implemented in Go.
I'm not sure I can say which one is more minimalist with a straight face. One consists of mechanically generated, platform-specific, 8 MB Go files. The other embeds a 1.5 MB Wasm blob and needs wazero (a big dependency on its own).
I've heard this complaint but have yet to have an issue deploying to Linux/macOS/Windows on ARM or x86 using cgo-backed libraries.
Maybe if you truly are targeting some niche platform, but then you likely have some other issues to contend with. Where are you deploying to?
Not 100% drop-in though. I’ve hit some snags around VFS support.
admin,1,salt,5V5R4SO4ZIFMXRZUL2EQMT2CJSREI7EMTK7AH2ND3T7BXIDLMNVQ====,"admin"
alice,1,salt,PXHQWNPTZCBORTO5ASIJYVVAINQLQKJSOAQ4UXIAKTR55BU4HGRQ====,
> Here we have user ID which is user name, version number (always 1), salt for password hashing, and the password itself (hashed with SHA-256 and encoded as Base32). The last column is a list of roles assigned to the user.
I haven't had to handle password hashing in like a decade (thanks SSO), but isn't fast hashing like SHA-256 bad for it? Bcrypt was the standard last I did it. Or is this just an example and not what is actually used in the code?
I also tried to make it a bit more flexible: to use `bcrypt` one can provide their own `pennybase.HashPasswd` function. To use SQLite one can implement the five methods of the `pennybase.DB` interface. It's not perfect, but at a code size of 700 lines it should be possible to customise any part of it without much cognitive difficulty.
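As a sketch of what such a swap might look like (the hashing function's signature below is assumed for illustration, not necessarily Pennybase's actual `HashPasswd` shape), a bcrypt-backed pair of helpers using golang.org/x/crypto/bcrypt:

    package main

    import (
        "fmt"
        "log"

        "golang.org/x/crypto/bcrypt"
    )

    // hashPasswd is a hypothetical stand-in for a user-provided hashing hook;
    // the real pennybase.HashPasswd signature may differ.
    func hashPasswd(password string) (string, error) {
        h, err := bcrypt.GenerateFromPassword([]byte(password), bcrypt.DefaultCost)
        return string(h), err
    }

    func checkPasswd(hash, password string) bool {
        return bcrypt.CompareHashAndPassword([]byte(hash), []byte(password)) == nil
    }

    func main() {
        h, err := hashPasswd("hunter2")
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(checkPasswd(h, "hunter2")) // true
    }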
A SQLite connection can be made with just a sqlite://data.db connection string.
However, using the stdlib abstraction adds a lot of performance overhead, although it'll still be competitive with CSV files.
I'm guessing the goal is that the file can be managed more easily with a text editor and some shell utils.
Fast hashing is only a concern if your database becomes compromised and your users are incapable of using unique passwords on different sites. The hashing taking forever is entirely about protecting users from themselves in the case of an offline attack scenario. You are burning your own CPU time on their behalf.
In an online attack context, it is trivial to prevent an attacker from cranking through billions of attempts per second and/or make the hashing operation appear to take a constant amount of time.
You don't need bcrypt or a PBKDF with API keys, as they are not passwords. They are high-entropy, unique, and long - unlike passwords.
How many CPU seconds should I burn for every user's login attempt to compensate for the remote possibility that someone steals the user database? Are we planning to have the database stolen?
Even if you spin for 30 minutes per attempt, someone with more hardware and determination than your enterprise could eventually crack every hash. How much money is it worth to play with a two-layer cake of unknowns?
Has anyone considered what the global carbon footprint is of bitcoin mining for passwords? How many tons of CO2 should be emitted for something that will probably never happen? This is like running the diesel generators 24/7/365 in anticipation of an outage because you couldn't be bothered to pay for a UPS.
I appreciate how it seems like we have a spectrum of similar options emerging now for simple backends, ranging from Pennybase to Trailbase to Pocketbase. I do hope one of them eventually implements Postgres as an alternative to SQLite, though.
Start cheap, gather market, then crank the costs after lock-in.
Even "open-source" is abused. First everything is open-source, and then reasons come out for why premium services will be closed source.
"cost"/"price" vocabulary clarification, should you ever want to read or write business plans, communicate with accountants, CFO's, etc.
"costs" are what companies pay for supplies/inputs that the company purchases.
"prices" are what those same companies offer to charge buyers for the products the company sells.
companies want to keep costs down, and companies benefit from high prices. (when you said "crank the costs", it thunks)
since people don't generally operate their lives as companies, it tends to seem like "costs" and "prices" are the same thing, but in addition to the above, "costs" to a company reflect actual expenditures in total, while "prices" are per-unit offers for sales that have not transacted yet.
"cost" is an accounting term, total revenues - total costs = total profits
"price" is a marketing term, $1 each, $10 for a dozen!
(of course this could be quibbled into incomprehensibility, which is another thing you should not do in "business communication"; always streamline communication to get to the takeaway as quickly as possible)
But in all seriousness, I may be going off on a tangent, but I don't think that anybody can monetize code of less than 1k LOC. Are there any cool examples anybody wants to share?
Maybe "simple" APIs would generally be the only thing that would be monetizable and still fall under 1k LOC. But still, I would love hearing more about this kind of thing.
Can you give me some examples? Also, I personally feel as if most of these are saturated, and I don't think I could earn a million with so little code. And maybe it's me, but personally I don't like touching most crypto since it's grift. The only one I'd like is Monero for privacy, but I doubt how much I need it anyway.
What kinds of apps are folks building with this? Are there any decently sized websites running on Pocketbase/Trailbase?
Now what I do like, though, is the second line of your post. What are the comparisons... Now IMO, the biggest thing is that this thing is genuinely really tiny (less than 1k LOC is wild), and maybe they really followed Occam's razor and just ditched SQL, even the simplest SQL (SQLite), altogether for the sweet CSV.
I never thought there would be a day when I would have to call SQLite the complex option, given how in every sense SQLite is about the simplest, most embeddable SQL database, or database in general. Maybe I am going off on a tangent, but I love SQLite and what Pocketbase does tbh. I think of SQLite + per-user DBs and I just get so happy thinking about this architecture tbh. I love SQLite.
I tried to experiment with an API similar to what the k8s API server offers: dynamic schemas for custom resources, a generated uniform REST API with well-defined RBAC rules, watch/real-time notifications, customisation of business logic with admission hooks, etc.
I also attempted to make it as small as possible. So yeah, I don't try to compete with Pocketbase and others, just trying to see what it takes to build a minimally viable backend with a similar architecture.
The choice of the "database" is dictated by the very same goals. I deliberately made it an interface; better databases exist and can be plugged in with few code changes. But for starters I went with what the Go stdlib offers, and CSV is easy enough to debug.
Given the setup, I'd guess it makes sense for little household-scale apps with a user list in the low tens of people.
NIH mostly.
A big part of why PocketBase is so good is because the project is aggressively constrained, both in features and contributions. I'd suggest people contribute to the ecosystem, which is big and growing.