Show HN: Kage – Shadow any website to a single binary for offline viewing

Posted by tamnd 5 hours ago

Show HN: Kage – Shadow any website to a single binary for offline viewing(github.com)

309 points | 64 comments

simonw 2 hours ago|

I was intrigued to see how the demo GIF in the README was generated: https://github.com/tamnd/kage/blob/01e75b87ecc893bbba7943c63...

Turns out it's using another project by the same author: https://github.com/tamnd/ascii-gif

The script used for the demo is at https://github.com/tamnd/kage/blob/01e75b87ecc893bbba7943c63... and has a comment showing how to run it:

  ascii-gif render docs/demo/kage.tape -o docs/static/demo.gif

Looks like it's an opinionated wrapper around https://github.com/charmbracelet/vhs

jubilanti 1 hour ago||

Have you heard the good news about the terminal savior asciinema? https://asciinema.org/

stavros 5 minutes ago|||

VHS is fantastic for scripting CLI video generation.

alterom 1 hour ago||

FYI, on other platforms (Windows/macOS), LiceCAP is a fantastic tool for recording the screen into compact GIFs, by the author of Winamp and the Reaper DAW:

https://www.cockos.com/licecap/

wolttam 4 hours ago||

One use I'd have for this is company wikis that you want to give folks easy offline access to (maybe the wiki has documentation that's useful at sites that don't have cellular coverage).

Cool!

It would be especially cool to have a version that didn't require the separate serving process - even though it's nifty you can package up a whole site as a single binary.

Maybe a single HTML entrypoint shim with a bit of javascript that could index into an archive (potentially embedded) of the site's content?

tamnd 4 hours ago|

Submitting this to Hacker News was the right call! Thanks for your idea. I will consider implementing that :)

Also, in my mind, I already have a script/program to convert HTML to Markdown, so it could actually store everything on disk as a folder of Markdown files, and then commit them to a Git repo.

mgiampapa 37 minutes ago||

I think the zim flow was perfect for offline use. I know I will be making use of it as soon as I can figure out how to pass Chrome the cookies so I can be signed into the site. Didn't see it in the page, but I haven't looked closely yet.

ninalanyon 2 hours ago||

> kage serve $HOME/data/kage/paulgraham.com

If the result is static why does it need a server? Isn't it possible to make it so that it can simply be opened by the browser? Like:

$ firefox $HOME/data/kage/paulgraham.com

Then the result would be usable on machines without kage installed.

doctoboggan 2 hours ago||

Usually JavaScript is blocked when you load pages that way.

dmazzoni 1 hour ago|||

Not all JavaScript, but a lot of APIs are restricted

embedding-shape 2 hours ago||||

Since when? You won't be able to make HTTP requests to localhost, as it'd be a different Origin, but I don't think any mainstream browser blocks JS outright when you use file:// to load and view HTML files.

rzzzt 1 hour ago||

Somewhere around 2019, each document loaded from file:// became its own origin in Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1500453 (I didn't check when this happened in Chromium)

Related WHATWG discussion: https://github.com/whatwg/html/issues/3099

pixelatedindex 2 hours ago||||

I thought all the JS was stripped?

recursive 1 hour ago|||

I am quite familiar with this and it is factually false

danielheath 1 hour ago||

JS modules don’t work on file:// URLs (classic JS does).

afavour 2 hours ago||

You’ll likely run into a ton of CORS issues doing that.

embedding-shape 2 hours ago||

I don't think so. There are no HTTP requests being made from JS, as it's stripped away, and all the other resources are pulled down (and I assume their references are made relative), so there really shouldn't be any issues because of CORS at all.

chfritz 4 minutes ago||

How is this different from using Puppeteer to load the page and save the DOM as HTML?

maxloh 4 hours ago||

I find SingleFile [0] to be a much more robust version of this.

It strips out all the JavaScript too, but also packs everything into a single HTML file that is easy to transfer. Binary assets (like web fonts and images) are packed as base64 strings.

They also offer a CLI powered by Puppeteer. [1]

[0]: https://github.com/gildas-lormeau/singlefile

[1]: https://github.com/gildas-lormeau/single-file-cli
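
For anyone curious what "packed as base64 strings" means in practice, here is a minimal sketch of inlining an asset as a data: URI. It assumes Node.js (Buffer is a Node global), and toDataUri and inlineImages are made-up helper names for illustration, not SingleFile's actual API. It also only handles double-quoted src attributes, which a real tool would not get away with.

```javascript
// Sketch only: hypothetical helpers, not SingleFile's implementation.

// Encode raw bytes as a data: URI with the given MIME type.
function toDataUri(bytes, mimeType) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

// Replace src="..." values that appear as keys in `assets` with data: URIs.
// `assets` maps a URL string to { bytes, mimeType }; unknown URLs are left as-is.
function inlineImages(html, assets) {
  return html.replace(/src="([^"]+)"/g, (match, url) => {
    if (!Object.hasOwn(assets, url)) return match;
    const { bytes, mimeType } = assets[url];
    return `src="${toDataUri(bytes, mimeType)}"`;
  });
}

const page = '<img src="logo.png"><img src="missing.png">';
const assets = {
  "logo.png": { bytes: Uint8Array.from([1, 2, 3]), mimeType: "image/png" },
};
console.log(inlineImages(page, assets));
```

The trade-off is size: base64 grows every asset by roughly a third, in exchange for a file with no external references.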

tamnd 4 hours ago||

It seems this repo only saves one web page?

What I'm implementing here is mirroring a whole website, with all its subpages, so you can browse it all offline. For example, all essays from paulgraham.com.

maxloh 3 hours ago|||

Oh, I see. In that case, feature-wise, it is actually a modern alternative to HTTrack.

I think the misunderstanding stems from the browser's "Save As" reference in the description. It is misleading. You use "Save As" to save a single page, not an entire website.

Also, the description lacks a clear explanation of the project's purpose. It would be helpful to include a sentence explaining that the program downloads an entire website, not just a single page.

sdevonoes 4 hours ago|||

[flagged]

sermah 4 hours ago||

Um. Whose website are you on right now?

ivangelion 3 hours ago||

Don't come here to laugh but always great when it happens anyways.

wamatt 2 hours ago|||

Love love love SingleFile too. The FF extension works pretty well for a clean save.

That said, Kage looks promising if OP can combine SingleFile reproduction quality with the HTTrack spidering approach. SPAs are kinda tricky to archive, and I do wonder how well Kage would handle them.

initramfs 1 hour ago||

I've seen that option in IE: .mhtml.

For some reason it displays better in IE, but I don't recall seeing this option in Chrome or Firefox recently.

tamnd 4 hours ago|||

And thanks for the link. Let me implement this single HTML feature, it looks nice to have!

maxloh 3 hours ago||

Yeah. An idea on top of that is to bundle an entire website into a single HTML page, with vendored JavaScript to enable client-side routing (all of the original pages' JS is still stripped out).

That way, the page is self-contained as it is, but requires no bundled binary code to serve the site. It is actually safer security-wise.

The vendored script can be as simple as this:

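  const site = {
    "path-1": "<!DOCTYPE html><html> ... </html>",
    "path-2": "<!DOCTYPE html><html> ... </html>",
    // More paths
  }

  function attachListeners() {
    for (const [path, html] of Object.entries(site)) {
      // A page may link to the same path several times, or not at all
      for (const a of document.querySelectorAll(`a[href="${path}"]`)) {
        a.onclick = (event) => {
          event.preventDefault() // stay in this file instead of navigating
          // outerHTML can't be set on the root element, so swap its contents
          document.documentElement.innerHTML = html
          attachListeners() // the new page's links need handlers too
        }
      }
    }
  }

  document.addEventListener("DOMContentLoaded", attachListeners)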
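
A rough sketch of the build side, i.e. how the `site` map in the snippet above might be generated and embedded into one HTML file. Everything here is an assumption for illustration: toInlineJson and buildBundle are hypothetical names, and nothing like this is known to exist in kage. The one detail worth getting right is escaping "<", since a saved page that contains "</script>" would otherwise end the inline script early.

```javascript
// Sketch only: hypothetical build step, not part of kage.

// Serialize a value for embedding inside an inline <script>.
// "\u003c" is a valid JSON/JS escape for "<", so the data round-trips unchanged.
function toInlineJson(value) {
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

// Pack a { path: html } map into a single self-contained HTML document.
// `entryPath` names the page to show first; the router script would read it.
function buildBundle(pages, entryPath) {
  return [
    "<!DOCTYPE html>",
    '<html><head><meta charset="utf-8"></head><body>',
    `<script>const site = ${toInlineJson(pages)}; const entry = ${toInlineJson(entryPath)};</script>`,
    "</body></html>",
  ].join("\n");
}

const pages = {
  "index.html": '<p>Home <a href="about.html">About</a></p>',
  "about.html": "<p>About</p><script>alert(1)</script>",
};
console.log(buildBundle(pages, "index.html"));
```

Storing pages as escaped JSON keeps the bundle a plain text file, at the cost of loading every page into memory up front, which may matter for large sites.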

arikrahman 2 hours ago|||

This is what I first thought and it's a very elegant solution, and not needlessly overcomplicated.

HelloUsername 4 hours ago|||

What's the difference from File -> Save As in any web browser on a computer?

nmstoker 4 hours ago|||

That's for a single page, this handles the whole site. Also the browser Save As options often work poorly.

dmazzoni 1 hour ago|||

Save As works fine for simple websites with static content.

Let's say you have a site that fetches content from a database. If you Save As, then at best you'll get a local copy of an HTML page with JS that loads the content from the same remote database. It might not work (since the local copy has a different origin), or if it does, it requires you to be online, which defeats half of the purpose.

What this project, and SingleFile, both do is save a snapshot of what the rendered page actually looks like at that moment in time. The scripts are stripped out so it runs locally and has no external dependencies.

telesilla 3 hours ago||

I've been using httrack (https://www.httrack.com) to download wikis to read on flights, which isn't perfect but is better than anything I'd found previously. I'll try this out, I'd be delighted to have good results. Thanks for the post.

throwaway219450 1 hour ago|

Specifically for wikis, is there a reason you wouldn't use Kiwix? For non "official" releases it's more complicated, but there are some services to generate the ZIM files. The desktop reader app is pretty good in my experience.

https://wiki.openzim.org/wiki/Build_your_ZIM_file

coffeecoders 1 hour ago||

I've accumulated a bunch of old website archives over the years. The funny thing is the ugly HTML dumps have been more useful than the "perfect" archive.

It's one of the reasons I've become a bigger fan of RSS over time. A feed from 10-ish years ago is often more usable today than a carefully preserved (application) website.

dimiprasakis 4 hours ago||

Neat project, I like the idea. One thing from a quick read: you launch Chrome with --no-sandbox. Is there a good reason for that? Security-wise it's probably not a good idea. If there is no reason, I'd suggest leaving the sandbox on!

In any case, cool stuff :)

gregwebs 4 hours ago||

This seems like it has the potential to create a lot of load on a site. Are there settings to control how fast it clones, or to skip images/videos? Is there a way to only get a subset of a website?
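
To make the ask concrete, here is a sketch of the kind of policy I have in mind: a same-origin check, a path prefix for fetching only a subset, a list of extensions to skip, and a fixed delay between requests. All of the names (policy, shouldFetch, crawl, fetchPage) and the example.com URLs are hypothetical, and none of these are existing kage options as far as I know.

```javascript
// Sketch only: a hypothetical crawl policy, not kage's configuration.
const policy = {
  origin: "https://example.com",
  pathPrefix: "/docs/", // only mirror this subtree
  skipExtensions: [".png", ".jpg", ".gif", ".mp4", ".webm"],
  delayMs: 1000, // pause between requests
};

// Decide whether a link found on a page should be fetched at all.
function shouldFetch(rawUrl, policy) {
  let url;
  try {
    url = new URL(rawUrl, policy.origin); // resolves relative links too
  } catch {
    return false; // unparseable link
  }
  if (url.origin !== policy.origin) return false;
  if (!url.pathname.startsWith(policy.pathPrefix)) return false;
  const path = url.pathname.toLowerCase();
  return !policy.skipExtensions.some((ext) => path.endsWith(ext));
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch allowed URLs one at a time, waiting after each request.
// `fetchPage` is supplied by the caller (for example, a wrapper around fetch).
async function crawl(urls, policy, fetchPage) {
  const fetched = [];
  for (const url of urls.filter((u) => shouldFetch(u, policy))) {
    await fetchPage(url);
    fetched.push(url);
    await sleep(policy.delayMs);
  }
  return fetched;
}
```

Fetching sequentially with a fixed delay is the simplest way to bound load; honoring robots.txt or a server's Retry-After header would be reasonable next steps.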

tamnd 4 hours ago||

Could you help create a new issue for that? I will do it later. It is already 1:00 AM my time, but I am happy that anyone is interested in it. : )

ares623 1 hour ago||

Just pretend you're an AI crawler, problem solved.

shinryuu 2 hours ago|

Reminds me of this: https://gwern.net/gwtar

Compared to that, is there anything kage does better?
