Datasette Apps: Host custom HTML applications inside Datasette

Posted by lumpa 6 days ago

Datasette Apps: Host custom HTML applications inside Datasette (simonwillison.net)

156 points | 68 comments

fsuts 6 days ago|

To save anyone else wondering what Datasette is a search:

“Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.

Datasette is aimed at data journalists, museum curators, archivists, local governments, scientists, researchers and anyone else who has data that they wish to share with the world. It is part of a wider ecosystem of 44 tools and 154 plugins dedicated to making working with structured data as productive as possible.”

hbcondo714 5 days ago|

I’ve been using the Observable Framework[1] for this kind of work but it doesn’t appear to be actively developed anymore so will look into Datasette.

[1] https://github.com/observablehq/framework

lmeyerov 5 days ago||

Multiple projects seem to be converging on the same point. MotherDuck has been marketing "dives" since the beginning of the year (https://motherduck.com/blog/duck-dive-and-answer/), and on the Louie.ai team we have been iterating on different patterns for similar needs. I'm getting the feeling that the claim "SaaS apps as fixed UIs over databases are dead because of coding agents" really means that only the fixed dashboard pattern is dead, not SaaS, and that BYO UI is part of the new table stakes.

I'm curious where the pattern will go. My sense is there is a cathedral vs. bazaar split in approach here, where cathedrals are quite rigid app builders (think Framer/Wix), while bazaars focus a layer below for more flexibility but less integration.

rileyphone 5 days ago||

Absolutely, plus if you control the coding agent you can enforce certain guarantees and have it wrap your services with a custom sdk. I've been exploring this pattern in a couple of different domains where it's just a vite react app wrapped in an iframe with a JWT bridge giving auth, hosted on a separate domain.

skeeter2020 5 days ago||

At the enterprise level this feels a lot like Snowflake buying Streamlit to try to offer a similar experience and keep you in the Snowflake ecosystem burning credits.

anitil 6 days ago||

When I've needed something like this in the past I've spun up simple HTML pages and used the json endpoint that all datasette instances come with [0]. I like this new pattern much better, as it keeps your app and data in one place (I remember having some issue with this at the time, though I can't remember what the actual issue was)

So I imagine we could now load some data into SQLite, design some HTML that is also loaded into the db, and deploy. Although looking at the source, it seems like stored apps are expected to be managed by the plugin itself, but I'm sure there's a way around that.

[0] Eg from one of the examples - https://datasette.io/legislators/-/query.json?sql=select+*+f... . If you strip the '.json' you get the html view. For what it's worth there's also a '.csv' version.
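A minimal Python sketch of the URL pattern described above, for anyone calling the endpoint from a script. The `/-/query` path and the `.json` / `.csv` suffixes come from the example link; the helper name `datasette_query_url`, the host `example.com`, the database `mydb`, and the table `example_table` are placeholders for illustration, not part of Datasette.

```python
from urllib.parse import urlencode


def datasette_query_url(base_url: str, database: str, sql: str, fmt: str = "json") -> str:
    """Build a URL for an arbitrary SQL query against a Datasette database.

    fmt is "json" or "csv"; pass "" to get the HTML view instead.
    """
    suffix = f".{fmt}" if fmt else ""
    # urlencode percent-encodes the SQL and turns spaces into "+"
    query = urlencode({"sql": sql})
    return f"{base_url.rstrip('/')}/{database}/-/query{suffix}?{query}"


# Hypothetical host, database, and table names
print(datasette_query_url("https://example.com", "mydb", "select * from example_table"))
# → https://example.com/mydb/-/query.json?sql=select+%2A+from+example_table
```

Passing `fmt=""` gives the HTML view of the same query, which matches the "strip the '.json'" tip.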

simonw 6 days ago||

I'm going to think about how Datasette Apps can work with the apps themselves stored on a filesystem so they can be revision controlled using Git.

I have an idea for a way to edit them through Datasette and have them backed up to Git via a separate mechanism, but having them on disk would be a whole lot more convenient.

Filed an issue here: https://github.com/datasette/datasette-apps/issues/30

anitil 6 days ago|||

Interesting idea. I know there's the fsdir [0] table-valued function / module that allows loading from disk, so it should be possible to modify that or hard-code a base list of paths or something.

[0] https://sqlite.org/src/file/ext/misc/fileio.c, it allows you to read a directory recursively in the cli (`select * from fsdir("./");`)

Edit: It allows upwards traversals (`select * from fsdir("../../../../etc/passwd");`), so beware
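For illustration only, here is a rough Python sketch of the "hard-code a base path" idea using just the standard library's sqlite3. It is not fsdir and not how datasette-apps works; the function name `load_files` and the `files` table are hypothetical. It resolves the requested path and refuses anything that escapes the base directory, which is the traversal problem noted above.

```python
import sqlite3
from pathlib import Path


def load_files(conn: sqlite3.Connection, base_dir: str, relative_path: str = ".") -> int:
    """Copy files under base_dir/relative_path into a `files` table.

    Raises ValueError if relative_path resolves outside base_dir.
    Returns the number of files stored.
    """
    base = Path(base_dir).resolve()
    target = (base / relative_path).resolve()
    # Reject upward traversal such as "../../etc/passwd"
    if target != base and base not in target.parents:
        raise ValueError(f"{relative_path!r} escapes the base directory")
    if not target.is_dir():
        raise NotADirectoryError(str(target))

    conn.execute("CREATE TABLE IF NOT EXISTS files (name TEXT PRIMARY KEY, data BLOB)")
    count = 0
    for path in sorted(target.rglob("*")):
        # Skip directories, and skip symlinks that point outside the base
        if not path.is_file() or base not in path.resolve().parents:
            continue
        conn.execute(
            "INSERT OR REPLACE INTO files (name, data) VALUES (?, ?)",
            (path.relative_to(base).as_posix(), path.read_bytes()),
        )
        count += 1
    conn.commit()
    return count
```

Because the check runs on the resolved path, `load_files(conn, "apps", "../")` raises instead of reading the parent directory.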

simonw 6 days ago||

Wow, I didn't know about that one. SQLite never ceases to surprise.

I'm sticking with the Python bundled sqlite3 though so I'm not in a good place to take advantage of that one.

anitil 6 days ago||

It's probably out of scope for you, but I've used the 'vtfunc' module [0] for a similar purpose actually.

[0] https://github.com/coleifer/sqlite-vtfunc

ramses0 5 days ago|||

nee: couchapps

https://railsware.com/blog/couchdb-and-couchapp-part-1/amp/#...

https://couchapp.readthedocs.io/en/latest/couchapp/gettingst...

https://couchapp.readthedocs.io/en/latest/user/list-of-couch...

e12e 6 days ago|||

> it keeps your app and data in one place (I remember having some issue with this at the time, though I can't remember what the actual issue was)

CORS headers?

anitil 3 days ago||

I think you might be right, that rings a bell. There's an item on the front page about this right now, and I'm guilty as charged - I don't understand CORS at all

https://news.ycombinator.com/item?id=48614844

Talpur1 6 days ago|||

This seems like an attractive way of looking at it; however, I believe stripping the '.json' would not be suitable.

jacobgold 6 days ago||

It is pretty cool that we have browser features like this to rely on.

I remember writing code in the bad old days to parse HTML tags and allowlist specific attributes. Now browsers have a much better solution baked in.

But it still makes me a bit nervous. Seems like a very small bug could sneak in. This is a good example of where I would reach for Fable to double check the implementation and have a lot of extra tests.

(nit: would be nice if the chat box treated Enter and Shift+Enter the way these other companies have trained my brain, but maybe that is a deliberate choice.)

simonw 6 days ago|

In the three short days we had access to Fable I did have it run a review, and it spotted an issue for me to fix.

Thankfully GPT-5.5 is really strong on security stuff too. I wouldn't have dared build this without a whole lot of Opus/GPT-assisted prototyping and testing along the way.

euroderf 6 days ago||

I never understood why someone hasn't made a framework that makes it stupidly easy to fill an HTML page with SQLite database tables, with all the usual display controls, and with as much "liveness" as desired, and with a protocol (over HTTPS) to manage comms to a server-side instance. SQLite is robust, lightweight, bulletproof - a WASM build belongs on ALL the webpages !

joren- 6 days ago||

As mentioned below I have been building the 'read' side of this: a data publication platform. I wanted to avoid any server side components. The communication / write part and updating the server-side sqlite database would need running components on the server which I wanted to avoid.

The 'write' part would technically be very doable and not that different from other back-ends.

https://github.com/GhentCDH/Pihka

mstipetic 6 days ago|||

Did you have a look at https://evidence.dev

iLoveOncall 6 days ago|||

Because it's pretty much worthless.

You almost never need just a basic list of all the data in your table, even if you're able to filter and sort it. There's no moat there at all. People need serious BI tools, and that throws simplicity out of the window (PowerBI, QuickSight, etc.).

mpeg 5 days ago||

I disagree, a lot of the time people buy "serious BI tools" precisely because they think they need all that power and complexity.

In reality, what most people need is much simpler: a mini app with some curated datasets and simple filters, maybe some AI querying if we want to get fancy. There are some companies out there that work with big data, but for the rest of us small data is OK.

simonw 5 days ago|||

I think of Datasette as a "small data" platform, where small data is anything that would fit on my phone.

My phone has 1TB of storage.

mpeg 5 days ago||

I've used that with companies I consult for, everyone thinks they should do what Google does, so sometimes I'll drop them the "your whole company data fits in my phone/laptop" line to make them understand the (lack of) scale

uberex 5 days ago|||

duckdb -ui

mpeg 5 days ago||

Data engineers hate this one simple trick

dsego 6 days ago|||

Something like sync engines? I think there are a bunch nowadays.

https://syntax.fm/show/924/sync-engines-and-local-data

potatoman22 5 days ago|||

It's not specific to SQLite per se, but that's what most dashboard builders are.

uberex 6 days ago||

Like MS Access on web?

jumpkick 5 days ago||

Imagine if this were built into browsers and you only had to serve a SQLite file.

simonw 5 days ago||

I have a version of Datasette that runs entirely in the browser (using Pyodide and WebAssembly) and it's smaller than a lot of modern React homepages (12.35MB):

https://lite.datasette.io/

My more recent prototype shrinks that to 10.47 MB transferred: https://simonw.github.io/research/pyodide-asgi-browser/datas...

20after4 5 days ago||

From TFA:

> a Datasette-style backend to a self-contained HTML frontend is an astonishingly powerful combination.

1000% agree. And datasette is a terrific framework to build any kind of data exploration or visualization on.

This sounds like an awesome feature and a good excuse for me to dive back into playing with Datasette.

joren- 6 days ago||

Looks like a good addition to the Datasette ecosystem. I have been working on a similar idea with custom HTML around SQLite databases. By default a faceted search interface is generated, but by reusing the client-side data layer, custom apps are made easy.

The design keeps data and presentation together and even maps do not rely on external services.

I have called it Pihka: https://ghentcdh.github.io/Pihka/ https://github.com/GhentCDH/Pihka

pietz 6 days ago||

Hey Simon,

although I'm coming from a different starting point, it seems like some of our thoughts have aligned. I'm building https://caipi.ai/ as a workspace for agents to build simple data driven apps. The agent edits through MCP and the user gets an interactive app in the browser.

If you're interested in picking each other's brains around this topic, I'd be psyched to have a chat. gh:pietz.

est 5 days ago|

I didn't quite get the CSP part. Why use srcdoc and <meta http-equiv="Content-Security-Policy"> instead of a real server header? Static hosting?

simonw 5 days ago|

If you host iframe apps at a fixed URL like:

  /-/apps/iframe-content/timeline.html

You can protect it with CSP headers, but you can't also protect it with the sandbox="" attribute (should a user visit it directly)

If you want both sandbox= restrictions and CSP headers at the same time, the only way I've found that works across all major browsers is the iframe plus srcdoc="" with injected CSP meta headers pattern.

Note that a lot of sandbox implementations serve their iframe content from a separate domain, to ensure cookies and localStorage and other same origin things are robustly protected.

I can't do that easily for Datasette because it's open source software that people can run on their own laptops, so I didn't want to block people on "now register a domain/subdomain and set this up in DNS".
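The iframe plus srcdoc="" plus injected CSP meta pattern described above could look roughly like the Python sketch below. This is an assumption-laden illustration, not the datasette-apps implementation: the function name `sandboxed_iframe` and the policy string in `DEFAULT_CSP` are made up for the example, and the real plugin may use a different policy and different sandbox tokens.

```python
import html

# Illustrative policy only (an assumption): block network access by default,
# allow inline scripts and styles so a self-contained app can still run.
DEFAULT_CSP = "default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'"


def sandboxed_iframe(app_html: str, csp: str = DEFAULT_CSP) -> str:
    """Wrap untrusted app HTML in <iframe sandbox srcdoc> with a CSP meta tag."""
    # The meta tag goes first so the policy is in place before any app markup
    meta = (
        '<meta http-equiv="Content-Security-Policy" '
        f'content="{html.escape(csp, quote=True)}">'
    )
    document = meta + app_html
    # Escape the whole document again because it lives inside an attribute.
    # sandbox omits allow-same-origin, so the content does not share the
    # parent page's origin.
    return (
        '<iframe sandbox="allow-scripts" '
        f'srcdoc="{html.escape(document, quote=True)}"></iframe>'
    )
```

Since the content only ever exists inside the srcdoc attribute, there is no URL a user could open to run it outside the sandboxed iframe.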

cxr 5 days ago|||

CSP is optional and designed to be one part of a defense-in-depth strategy (to the extent that it was thoughtfully designed at all; it's an awful standard that should not have made it past the proposal stage). It's not a solution for sandboxing untrusted content and should not be relied upon that way. Treating it like one is a great demonstration of how some uses of CSP make people more vulnerable.

simonw 5 days ago||

Right, which is why I'm combining it with <iframe sandbox=""> - which really is designed to be used as a sandbox (if you can figure out the right way to implement it.)

cxr 4 days ago||

> <iframe sandbox=""> - which really is designed to be used as a sandbox

Not for untrusted content living on the same origin to prevent it from exercising any of the powers that it would ordinarily have to be able to access sensitive data. It's a misleading name and shouldn't have been chosen. There is no combination of CSP or the iframe sandbox attribute that can be relied upon for that purpose. This is a fundamental limitation of the way the specs were written.

(There needs to be a big warning about this on MDN, but moving from the old wiki to a wiki with GitHub for login to the GitHub-based pull request process really didn't help the there's-a-problem-on-this-page-but-limited-resources-to-make-things-better problem.)

simonw 4 days ago||

That's why I'm careful not to include allow-same-origin in the sandbox attribute - without that the iframe content is treated as a separate origin from the parent.

And I serve the content in srcdoc= to ensure there's no URL a user can visit which would directly execute the content outside of that iframe sandbox.

cxr 4 days ago||

> That's why I'm careful not to include allow-same-origin in the sandbox attribute

It doesn't matter. I just said there is no combination of CSP or the iframe sandbox attribute that can be relied upon here.

simonw 3 days ago||

If that's true then my project is fatally flawed and I need to stop distributing it.

I'm not convinced it's true - I've been thinking about this for months, and building experimental prototypes to help me get to the combination that I think makes sense.

Can you describe an exploit that the combination I'm using of iframe sandbox= srcdoc= with an injected meta CSP tag doesn't handle?

Would moving the untrusted content to be served from a separate domain entirely close the hole?

(In case it's not clear the iframe sandbox= is the bit that's doing most of the work here - the CSP stuff is there mainly to protect against malicious apps that deliberately exfiltrate stolen private data.)

cxr 3 days ago||

> Would moving the untrusted content to be served from a separate domain entirely close the hole?

Yes.
