
Posted by david927 3/30/2025

Ask HN: What are you working on? (March 2025)

What are you working on? Any new ideas that you're thinking about?
390 points | 997 comments
enisberk 3/31/2025|
Finishing up my PhD thesis on low-resource audio classification for ecoacoustics. Our partners deployed 98 recorders in remote Arctic/sub-Arctic regions, collecting a massive (~19.5 years) dataset to monitor wildlife and human noise.

Labeled data is the bottleneck, so my work focuses on getting good results with less data. Key parts:

- Created EDANSA [1], the first public dataset of its kind from these areas, using an improved active learning method (ensemble disagreement) to efficiently find rare sounds.

- Explored other low-resource ML: transfer learning, data valuation (using Shapley values), cross-modal learning (using satellite weather data to train audio models), and testing the reasoning abilities of MLLMs on audio (spoiler: they struggle!).
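To make "ensemble disagreement" concrete, here is a minimal query-by-committee sketch. It assumes each ensemble member emits a hard label per unlabeled clip and uses vote entropy as the disagreement score. That is one common choice, not necessarily the exact scoring used for EDANSA, and the names `vote_entropy` and `select_for_labeling` are hypothetical.

```python
import math

def vote_entropy(votes):
    """Entropy of the committee's hard votes for one clip (0 means full agreement)."""
    n = len(votes)
    counts = {}
    for v in votes:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def select_for_labeling(predictions, k):
    """Return the k clips the ensemble disagrees on most.

    predictions: dict mapping clip_id -> list of labels, one per ensemble member.
    """
    ranked = sorted(predictions.items(), key=lambda kv: vote_entropy(kv[1]), reverse=True)
    return [clip_id for clip_id, _ in ranked[:k]]

preds = {
    "clip_a": ["bird", "bird", "bird"],       # unanimous: low labeling value
    "clip_b": ["bird", "plane", "silence"],   # three-way split: most informative
    "clip_c": ["bird", "bird", "plane"],
}
print(select_for_labeling(preds, 2))
```

Clips on which the members split are sent to annotators first, so labeling effort concentrates on rare or confusing sounds rather than on the abundant silence.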

  Happy to discuss any part!
[1]https://scholar.google.com/citations?user=AH-sLEkAAAAJ&hl=en
teleforce 4/1/2025||
Hi Enis, this seems like a very interesting project. My team and I are currently working on the non-stationarity of public physiological and earthquake seismic data, mainly using time-frequency distributions, and the results are very promising.

Just wondering if the raw data that you've mentioned are available publicly so we can test our techniques on them, or if they're only available through research collaborations. Either way, we're very much interested in the potential use of our techniques for polar research in the Arctic and/or Antarctica.

enisberk 4/1/2025||
Hi teleforce, thanks! Your project sounds very interesting as well.

That actually reminds me, at one point, a researcher suggested looking into geophone or fiber optic Distributed Acoustic Sensing (DAS) data that oil companies sometimes collect in Alaska, potentially for tracking animal movements or impacts, but I never got the chance to follow up. Connecting seismic activity data (like yours) with potential effects on animal vocalizations or behaviour observed in acoustic recordings would be an interesting research direction!

Regarding data access:

Our labeled dataset (EDANSA, focused on specific sound events) is public here: https://zenodo.org/records/6824272. We will be releasing an updated version with more samples soon.

We are also actively working on releasing the raw, continuous audio recordings. These will eventually be published via the Arctic Data Center (arcticdata.io). If you'd like, feel free to send me an email (address should be in my profile), and I can ping you when that happens.

Separately, we have an open-source model (with updates coming) trained on EDANSA for predicting various animal sounds and human-generated noise. Let me know if you'd ever be interested in discussing whether running that model on other types of non-stationary sound data you might have access to could be useful or yield interesting comparisons.

d_burfoot 3/31/2025|||
You should train a GPT on the raw data, and then figure out how to reuse the DNN for various other tasks you're interested in (e.g. one-shot learning, fine-tuning, etc). This data setting is exactly the situation that people faced in the NLP world before GPT. I would guess that some people from the frontier labs would be willing to help you, I doubt even your large dataset would cost very much for their massive GPU fleets to handle.
enisberk 3/31/2025||
Hi d_burfoot, really appreciate you bringing that up! The idea of pre-training a big foundation model on our raw data using self-supervised learning (SSL) methods (kind of like how GPT emerged in NLP) is definitely something we've considered and experimented with using transformer architectures.

The main hurdle we've hit is honestly the scale of relevant data needed to train such large models from scratch effectively. While our ~19.5-year dataset is massive for ecoacoustics, a significant portion of it is silence or ambient noise. This means the actual volume of distinct events or complex acoustic scenes is much lower compared to the densely packed information in the corpora typically used to train foundational speech or general audio models, making our effective dataset size smaller in that context.

We also tried leveraging existing pre-trained SSL models (like Wav2Vec 2.0, HuBERT for speech), but the domain gap is substantial. As you can imagine, raw ecoacoustic field recordings are characterized by significant non-stationary noise, overlapping sounds, sparse events we care about mixed with lots of quiet/noise, huge diversity, and variations from mics/weather.

This messes with the SSL pre-training tasks themselves. Predicting masked audio doesn't work as well when the surrounding context is just noise, and the data augmentations used in contrastive learning can sometimes accidentally remove the unique signatures of the animal calls we're trying to learn.

It's definitely an ongoing challenge in the field! People are trying different things, like initializing audio transformers with weights pre-trained on image models (ViT adapted for spectrograms) to give them a head start. Finding the best way forward for large models in these specialized, data-constrained domains is still key. Thanks again for the suggestion, it really hits on a core challenge!

mnky9800n 4/1/2025||
Do the recorders have overlapping detections?
enisberk 4/1/2025||
If you’re asking whether multiple recorders were active at the same time, then yes, we had recorders at 98 different locations over four years, primarily during the summer months. However, these locations were far apart, so no two recorders captured the same exact area.
mnky9800n 4/4/2025||
Oh the reason I ask is that multiple recorders that hear the same ambient noise can be stacked to produce signals that are otherwise unobservable in a single signal.
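A small simulation of the stacking idea: if N recorders capture the same weak signal, the traces are already time-aligned, and the noise at each recorder is independent, averaging them keeps the signal while the noise RMS shrinks roughly as 1/sqrt(N). Those assumptions are exactly what fails when recorders are far apart, as in the Arctic deployment. The function names and signal parameters below are illustrative only.

```python
import math
import random

def stack(traces):
    """Average time-aligned traces sample by sample."""
    n = len(traces)
    return [sum(samples) / n for samples in zip(*traces)]

def rms(xs):
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def residual_noise(n_recorders, length=1000, seed=0):
    """RMS of what is left after subtracting the true signal from the stack."""
    rng = random.Random(seed)
    # Weak 5-cycle sinusoid (amplitude 0.1) buried in unit-variance Gaussian noise.
    signal = [0.1 * math.sin(2 * math.pi * 5 * t / length) for t in range(length)]
    traces = [[s + rng.gauss(0.0, 1.0) for s in signal] for _ in range(n_recorders)]
    stacked = stack(traces)
    return rms([x - s for x, s in zip(stacked, signal)])

for n in (1, 16, 256):
    # The residual should shrink roughly in proportion to 1/sqrt(n).
    print(n, round(residual_noise(n), 3))
```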
frontierkodiak 3/31/2025|||
oh man that's awesome. I have been working for quite some time on big taxonomy/classification models for field research, espec for my old research area (pollination stuff). the #1 capability that I want to build is audio input modality, it would just be so useful in the field-- not only for low-resource (audio-only) field sensors, but also just as a supplemental modality for measuring activity out of the FoV of an image sensor.

but as you mention, labeled data is the bottleneck. eventually I'll be able to skirt around this by just capturing more video data myself and learning sound features from the video component, but I have a hard time imagining how I can get the global coverage that I have in visual datasets. I would give anything to trade half of my labeled image data for labeled audio data!

enisberk 3/31/2025||
Hi Caleb, thanks for the kind words and enthusiasm! You're absolutely right, audio provides that crucial omnidirectional coverage that can supplement fixed field-of-view sensors like cameras. We actually collect images too and have explored fusion approaches, though they definitely come with their own set of challenges, as you can imagine.

On the labeled audio data front: our Arctic dataset (EDANSA, linked in my original post) is open source. We've actually updated it with more samples since the initial release, and getting the new version out is on my to-do list.

Polli.ai looks fantastic! It's genuinely exciting to see more people tackling the ecological monitoring challenge with hardware/software solutions. While I know the startup path in this space can be tough financially, the work is incredibly important for understanding and protecting biodiversity. Keep up the great work!

thePhytochemist 3/31/2025|||
I'd love to turn my spectrogram tool into something more of a scientific tool for sound labelling and analysis. Do you use a spectrograph for your project?
enisberk 3/31/2025||
Hey thePhytochemist, cool tool! Yes, spectrograms are fundamental for us. Audacity is the classic for quick looks. For systematic analysis and ML inputs, it's mostly programmatic generation via libraries like torchaudio or librosa. Spectrograms are a common ML input, though other representations are being explored.

Enhancing frequenSee for scientific use (labelling/analysis) sounds like a good idea. But I am not sure what is missing from the current tooling. What functionalities were you thinking of adding?

999900000999 4/1/2025|||
How do I download the sounds, seems like a great resource for game developers and other artists
enisberk 4/1/2025||
Our labeled dataset (EDANSA, focused on specific sound events) is public here: https://zenodo.org/records/6824272. We will be releasing an updated version with more samples soon.

We are also actively working on releasing the raw, continuous audio recordings. These will eventually be published via the Arctic Data Center (arcticdata.io). If you'd like, feel free to send me an email (address should be in my profile), and I can ping you when that happens.

999900000999 4/1/2025||
How can I actually search for audio, I'll check in 6 months or so.

What's the licensing is it public domain

enisberk 4/1/2025||
You can search for the "Arctic Soundscapes Project 2019-2024". We are still working out the specifics of the licensing to meet our funding requirements, but it will be permissive.
ghoshbishakh 3/31/2025|||
Very interesting. All the best for your thesis. Mine is not nearly as interesting.
enisberk 3/31/2025||
Thanks! Appreciate it. Your work looks very interesting too, especially in the distributed systems space. Cheers!
taejavu 3/30/2025||
I've been putting together a no-nonsense free invoice generator, for people (like myself) that only occasionally send invoices. It's more-or-less a WYSIWYG editor, and the state is stored in the URL, so you don't have to worry about keeping track of where you stored your copy - if you've sent someone an email with the link, you've got a copy. This project was born out of the frustration of trying to generate an invoice on my phone on the go, I found all the existing solutions to be quite awful (forced signups to paid subscriptions, clunky interface etc).

Would love to hear any feedback the HN crowd has. I'm aware of a couple of alignment issues, will fix them up tonight. Also, yes, there will be a "generate PDF" button, for now if you want a pdf I'd suggest using the Print dialog to "Save as PDF".

https://bestfreeinvoice.com

vldszn 3/31/2025||
I built something similar for myself, but with a live PDF preview, support for downloading PDFs in multiple languages (English and Polish), VAT tax deductions, and multiple currencies.

https://easyinvoicepdf.com/

Also it’s open-source https://github.com/VladSez/easy-invoice-pdf

Would love to receive any feedback as well :)

taejavu 3/31/2025||
Nice work, this looks better than the established players in this space! You've got some customisation options in yours that I'd love to implement in mine as soon as time permits. Huge props for not forcing people to sign up for an account to use it!
vldszn 3/31/2025||
Thank you =)
6510 3/31/2025|||
Most important is for it to produce the right number consistently, without room for error. You may also need multiple sequences: if you sell cars and crates of bananas, you wouldn't want to mix those, but you would want to use a single tool.

Anything else can be corrected. It is important to be able to easily issue corrections and/or credit notes, as those seem to happen at the worst time. A credit note is usually the same as the invoice but with the amounts negated.
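A minimal sketch of the two points above: gapless numbering kept per series (cars vs. bananas), and a credit note that mirrors an invoice with every amount negated and references the original number. The class names, the `SERIES-0001` format, and the choice of a separate series for credit notes are all assumptions; numbering rules differ by jurisdiction.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Line:
    description: str
    amount: Decimal

@dataclass(frozen=True)
class Document:
    number: str
    lines: tuple
    credits: str = ""  # number of the invoice this credit note reverses, if any

    @property
    def total(self):
        return sum((line.amount for line in self.lines), Decimal("0"))

class Numbering:
    """Gapless, per-series counters (e.g. CARS-0001, FRUIT-0001)."""

    def __init__(self):
        self._next = {}

    def issue(self, series):
        n = self._next.get(series, 1)
        self._next[series] = n + 1
        return f"{series}-{n:04d}"

def credit_note(original, numbering, series):
    """Mirror the original invoice with every amount negated."""
    lines = tuple(Line(l.description, -l.amount) for l in original.lines)
    return Document(numbering.issue(series), lines, credits=original.number)

nums = Numbering()
invoice = Document(nums.issue("CARS"), (Line("Sedan", Decimal("20000.00")),))
cn = credit_note(invoice, nums, "CARS-CN")
print(invoice.number, cn.number, cn.total)
```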

It is also nice to tie the products into it so that you don't have to type it every time and get consistent naming. Same for an address book.

Havoc 3/31/2025|||
Think you’d need to make it country specific. Some countries have specific requirements around invoices and what needs to be on it
joseda-hg 3/31/2025||
If you could add custom templates, you could let the community add the country specific things
qwertox 3/31/2025||
I'm leaving this link here regarding the "EU e-invoicing standard CEN 16931"

https://ec.europa.eu/digital-building-blocks/sites/display/D...

I don't know how complicated or easy it would be to just create templates which satisfy this.

In Germany it's already required for B2B transactions [0]

  In principle, the following formats will comply with CEN 16931:
  ZUGFeRD: Hybrid format: Human-readable PDF/A-3 with an embedded XML file in the syntax "Cross-Industry Invoice" (CII)
  XRechnung: XML file in the syntax "Cross-Industry Invoice" (CII)
  XRechnung: XML file in the "Universal Business Language" (UBL) syntax
[0] https://www.bdo.global/en-gb/insights/tax/indirect-tax/germa...!
raphig 3/31/2025||
You can use AI to convert your PDF invoice into XRechnung to comply with EN 16931 should that occur, e.g., https://www.invoice-converter.com/en
colincooke 3/31/2025|||
Nice work! Would totally have used this when I was freelancing. Honestly love the serif'd fonts, would love to see everything serif'd tbh.

Also back when I had to do these (I used Wave) having a notes section was very useful to include a few things (i.e. I used to include conversion rates). Would probably be pretty easy.

taejavu 3/31/2025||
Thanks for the feedback, a notes section is a great idea!
6510 3/31/2025||
it might be required to name some law
ttw44 4/2/2025|||
I really like this, I'll maybe use it once I actually get some clients to respond ;)

I tried implementing my own on my own website, but between all my projects and work I keep pushing my own business website backward...

niraj-agarwal 3/31/2025|||
Useful and simple. The pattern could be applied to any templatized document we need to generate.
taejavu 3/31/2025||
Yeah it'd be cool to consider this approach for some other domains. Sometime soon I'll make it so you can change the word in the top-left from "INVOICE" to "QUOTE" or "RECEIPT". The nice thing about an invoice is you know there's going to be a relatively small amount of data, so storing the state in the URL is a plausible approach (even if it looks obscene to discerning users ).
nodesocket 3/31/2025|||
Interesting, I’ve been using https://simpleinvoices.io for a couple of years and really like it. Integrates with my Stripe and super easy to configure. Best of luck!
taejavu 3/31/2025|||
It looks nice at a glance, but there's just no way I can justify $15/mth for the few times a year I need to send an invoice. If https://bestfreeinvoice.com gains some amount of traction I'd love to extend the feature set to justify a paid tier, but the basic experience (i.e. what's currently live) will always be free and genuinely useful.
hboon 3/31/2025|||
I have been using simpleinvoices.io for a few years too. It's great. (Just checked, 10 this year).
atishaykumar 3/31/2025|||
Pls add support for changing currency and deductions (in India, 10% TDS is supposed to be deducted on the amount, i.e. the amount before tax is added).

Overall it’s a pretty good solution for occasional invoice generators.
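The deduction described above is a small piece of arithmetic: tax is added to the subtotal, while TDS is computed on the pre-tax subtotal and subtracted. In this sketch the 18% tax rate is only an illustrative assumption; actual GST and TDS rates depend on the service and the parties.

```python
from decimal import Decimal, ROUND_HALF_UP

def amount_payable(subtotal, tax_rate, tds_rate):
    """Return (tax, tds, payable), with TDS computed on the pre-tax subtotal."""
    cents = Decimal("0.01")
    tax = (subtotal * tax_rate).quantize(cents, rounding=ROUND_HALF_UP)
    tds = (subtotal * tds_rate).quantize(cents, rounding=ROUND_HALF_UP)
    return tax, tds, subtotal + tax - tds

tax, tds, payable = amount_payable(Decimal("100000.00"), Decimal("0.18"), Decimal("0.10"))
print(tax, tds, payable)
```

Using `Decimal` with explicit rounding avoids the floating-point drift that would otherwise show up on an invoice total.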

taejavu 3/31/2025||
Currency selection is on the roadmap but the need for deductions wasn't on my radar - thanks for the tip!
Aspos 3/30/2025|||
Awesome. Would suggest considering a built-in URL shortener.
taejavu 3/31/2025|||
I know the URLs are long compared to what people are used to. Part of the rationale for putting this out there in the current state is I'm curious if people in practise will share long URLs or not.

At some point I'd like to add shortlinks but at the moment everything is clientside, there's no persistence at all (beyond localStorage). I think that's a nice feature from a security perspective.

Cyphase 3/31/2025||||
Not too short or it's too easy to guess.

The app might be stateless right now (I haven't checked); if it is, adding a URL shortener will break that.

Aspos 3/31/2025||
How about zipping the state string or, even better, putting it into a KV storage thus exchanging it for a 10-char string.
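A sketch of the KV-backed shortener idea, with an in-memory dict standing in for a real key-value store (a hypothetical stand-in). A 10-character ID over a 64-character URL-safe alphabet gives 60 bits of entropy, which speaks to the "not too easy to guess" concern. Note that this would give up the current fully client-side, stateless design.

```python
import math
import secrets
import string

ALPHABET = string.ascii_letters + string.digits + "-_"  # 64 URL-safe characters

class ShortLinks:
    """In-memory stand-in for a key-value store mapping short IDs to state strings."""

    def __init__(self, id_length=10):
        self.id_length = id_length
        self._store = {}

    def bits_of_entropy(self):
        return self.id_length * math.log2(len(ALPHABET))

    def shorten(self, state):
        while True:
            key = "".join(secrets.choice(ALPHABET) for _ in range(self.id_length))
            if key not in self._store:  # retry on the (very unlikely) collision
                self._store[key] = state
                return key

    def resolve(self, key):
        return self._store.get(key)

links = ShortLinks()
key = links.shorten("N4IgLgng...")  # placeholder state string
print(key, links.bits_of_entropy())
```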
taejavu 3/31/2025||
Correct me if I'm wrong but zipping it would mean it would be encoded in a way that wouldn't be usable as a URL. The state is currently compressed and URL encoded via lz-string fwiw/in case you're curious.
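Compressed bytes are not URL-safe on their own, but a URL-safe text encoding applied afterwards makes them so, which is essentially what lz-string's URI-component mode provides. A Python analogue of that idea uses `zlib` plus URL-safe Base64. This swaps in a different technique: the output is not compatible with lz-string, and the state fields shown are hypothetical.

```python
import base64
import json
import zlib

def encode_state(state):
    """JSON -> zlib -> URL-safe Base64 without '=' padding."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    packed = zlib.compress(raw, 9)
    return base64.urlsafe_b64encode(packed).decode("ascii").rstrip("=")

def decode_state(token):
    """Reverse encode_state, restoring the stripped padding first."""
    padded = token + "=" * (-len(token) % 4)
    raw = zlib.decompress(base64.urlsafe_b64decode(padded))
    return json.loads(raw)

state = {"to": "ACME Ltd", "items": [["Consulting", 3, 150.0]]}
token = encode_state(state)
print("https://example.com/#" + token)
```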
nodesocket 3/31/2025|||
Ha, I was going to say is that url query string a sha256 or something.
taejavu 3/31/2025||
Yeah it's compressed and encoded with `lz-string` but I know it's hideous - I'm curious if that will prove a practical problem for adoption, even if a technical crowd isn't fond of the choice. Surprisingly most messaging apps you paste urls into these days don't show the whole thing anyway - when you paste it'll transform the link into a small preview card with just the domain portion of the URL visible.
LeonM 3/31/2025|||
Not to be negative about bestfreeinvoice, or any other invoicing tool, but it always seems like developers and early-stage entrepreneurs don't understand invoicing.

First major disconnect is that not every country uses invoices, but may use receipts instead. This is true for the USA for example, so many US devs (for example: Stripe in the early days) are not familiar with the concept of invoicing. Technically there is no difference between receipts and invoices, so if you're not familiar with the concept of invoicing, just read this post with s/invoice/receipt/ in mind.

The point about invoicing is to act as a non-mutable entry into the ledgers of both parties (seller and buyer). In most countries (especially EU) invoices are mandated by law for B2B transactions, and so is keeping accounts (aka bookkeeping). So for invoicing to be practical it needs to be tied to your books/accounts. Because of this, any business will use some bookkeeping/accounting software, which will have invoicing capabilities built-in. Invoicing as a standalone product doesn't make sense if you have to import it all into your ledgers later.
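To illustrate "a non-mutable entry into the ledgers of both parties": the same invoice produces mirrored double-entry postings in the seller's and the buyer's books, and each entry must balance. The account names and the 21% VAT rate are illustrative assumptions, not a chart of accounts from any particular system.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass(frozen=True)
class Posting:
    account: str
    debit: Decimal = Decimal("0")
    credit: Decimal = Decimal("0")

def journal_entry(*postings):
    """Return the postings as an immutable tuple, refusing unbalanced entries."""
    debits = sum((p.debit for p in postings), Decimal("0"))
    credits = sum((p.credit for p in postings), Decimal("0"))
    if debits != credits:
        raise ValueError(f"unbalanced entry: {debits} != {credits}")
    return tuple(postings)

def book_invoice(net, vat):
    """The same invoice, recorded from the seller's side and the buyer's side."""
    gross = net + vat
    seller = journal_entry(
        Posting("Accounts receivable", debit=gross),
        Posting("Revenue", credit=net),
        Posting("VAT payable", credit=vat),
    )
    buyer = journal_entry(
        Posting("Expenses", debit=net),
        Posting("VAT receivable", debit=vat),
        Posting("Accounts payable", credit=gross),
    )
    return seller, buyer

seller, buyer = book_invoice(Decimal("1000.00"), Decimal("210.00"))
```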

Then there is the 'design' trap, which many invoicing startups seem to fall for. Invoices are weird things. They are basically very, very inefficient artefacts from the past. An invoice is just a very little amount of transaction data exchanged between buyer and seller. In the days of physical bookkeeping (actual paper books) paper invoices made sense, but nowadays it is all done digitally. So the invoice is effectively a machine-2-machine interface, but for all sorts of legacy reasons we still wrap them in PDF with a fancy design that looks great for humans, but is effectively impossible for machines to read.

There are all sorts of attempts made to improve upon this situation (like OCR, and nowadays AI to extract data from PDF invoices). There are open structured data formats such as UBL to replace / augment PDF invoices, but due to all sorts of politics and lobbying the open standards have been doomed from the beginning. There is a lot of money made in accounting software, and they all rely on vendor lock-in. The major accounting software vendors have very strong incentives to keep us from adopting UBL et al, and most of the established accounting products suck, but you can't easily migrate, so you'll be stuck with them.

If you run or own a business, treat your books as an asset of your business, a very important asset for that matter. Books are kept in accounting software, which is typically part of a larger software suite which also features tax filing, HRM, asset management, invoicing, etc. In fancy business terms this is often called ERP. But think of ERP as just your central database, or your 'books'.

Choosing your accounting software is an important decision. Choose accounting software that allows exporting your data (very important!), that has an API (also very important), and preferably a web interface. It should be always available, so on-premise software is out. For entrepreneurs: choose your own accounting software; do not be tempted to hire an external bookkeeper that keeps the books in 'their' systems (accountant lock-in). Don't let an accountant recommend your software either, they get huge kickbacks from the software vendors (vendor lock-in). Every sale, whether through PoS, invoicing, or a payment integration like Stripe, should automatically be registered as a ledger entry in the books, preferably with an invoice document attached. Here you can see why an accountant who keeps your books in their systems won't work: you don't want to be stuck having to periodically send an email (or shoebox) filled with invoices for them to process. Your books should be owned by the business, should be automated (at least for the receivable side), and should always be up-to-date. You can then give an(y) accountant access to your books for them to do audits, tax filings, etc. For a business, the books are the central database of the business, and everything else revolves around them. Do not be tempted to write your own; instead, integrate with existing solutions while avoiding vendor lock-in as much as possible.

Integrating your business with the accounting software is an ever-ongoing part of your software development efforts, so do not underestimate it. Accepting payments is hard, and making sure they are properly registered in your books is equally hard. It takes _much_ more time than you'd think (most first-time entrepreneurs actually don't consider it at all). There are no silver bullets here.

nick88msn 3/31/2025|||
In Italy, many of these invoicing challenges have already been tackled through a nationwide standardized system.

Every invoice—whether B2B or even B2C (receipts included)—must be sent electronically using a government-defined XML format. This invoice includes predefined metadata and is digitally signed by the issuing party. Once ready, it gets submitted to the national tax agency’s centralized system, called the Sistema di Interscambio (SdI), which validates and registers it before forwarding it to the recipient.

This system essentially acts as a clearinghouse: it ensures all invoices go through the same format, are verifiably issued, and are automatically recorded on both ends. For consumers (B2C), the invoice still goes through the same pipeline and is made available in their personal portal on the IRS website, while the seller can still email a copy (PDF) for convenience.

This centralized and machine-readable approach has eliminated a lot of the fragmentation seen elsewhere. There’s no vendor lock-in, no OCR, and no AI needed to parse PDFs—just a signed XML file going through a common pipeline. It’s not perfect, but it shows how much smoother things can be when the rules (and formats) are defined at the infrastructure level.

LeonM 3/31/2025||
> Every invoice—whether B2B or even B2C (receipts included)—must be sent electronically using a government-defined XML format

So not a universal standard then. Imagine having to implement a different format for every country you do business with...

For the Netherlands there is a similar (but slightly different I believe) XML type format required if you want to do business with the government. Initially a company successfully lobbied to get their closed-specification version to be the mandated standard for government, to get the XML spec you had to become partner (I believe for €8k/year or something).

Luckily they are now performing a XKCD 927 and have defined a few new (this time open) standards, which they aim to consolidate into a new spec that complies with EN 16931. EN 16931 is the EU compliance standard for e-invoicing.

mappu 4/2/2025||
In New Zealand we are also phasing in eInvoicing using Peppol BIS 3.0, which complies with EN 16931.
conductr 3/31/2025||||
While all well and true. Some scale is assumed.

There’s plenty of need for basic use invoicing like this. Generate an invoice as a way to bill someone or serve as an estimate for work/project cost. Not everyone is at a place where it needs to be so formal and integrated into a complete solution that tracks the dollars from invoice to balance sheet to income statement, etc. It’s a lot especially for people that are just freelancing and need a similar probably infrequent way to send a bill. They probably are just tracking things in a spreadsheet and not even big enough to use quickbooks or anything else. It would be a poor use of time and over engineering to put that all in place and setup things that cost subscription dollars in perpetuity just to bill for a one off charge. Or even a handful of them.

When I think of people I pay, my lawn guy and my housekeeper both just text me how much I owe them. Then I Zelle them. They both have dozens of clients at least and I imagine they are doing it this way for them all. If I were a business, I may insist on getting an invoice to load the AP into an accounting flow on my end, but they wouldn't really want to change their system of doing things just to comply with my request. So, they may want something like this that basically converts the text message info into an official-looking invoice.

I feel the real problem is everyone is expecting this side-project type of thing to solve every edge case that exists in the world. Even the bigger guys like Stripe don't. That's the wrong take. They offer a solution; you have to evaluate if it fits your needs, and if not, use something else. If you're in a locale that mandates something completely different, use something else. This project is being very transparent about what it does and how it works, which should help you out if you have a requirements list to compare it to.

fauigerzigerk 3/31/2025||||
>First major disconnect is that not every country uses invoices, but may use receipts instead. This is true for the USA for example, so many US devs (for example: Stripe in the early days) are not familiar with the concept of invoicing. Technically there is no difference between receipts and invoices, so if you're not familiar with the concept of invoicing, just read this post with /s/invoice/receipt in mind.

I find this hard to believe. An invoice is a request for payment. A receipt is proof/confirmation of payment. Invoices sometimes double as receipts (or rather the other way around) when the payment is made immediately. But how can a country not have something that represents a formal request for payment by some future time?

I don't even understand this from an accounting perspective. What would accounts receivable and accounts payable even mean without this distinction? How would you date the respective journal entries if there is no distinction?
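Under accrual accounting, the distinction answers the dating question directly: the invoice date dates the entry that creates the receivable and the revenue, and the payment date (the receipt) dates the entry that clears the receivable. A minimal ledger sketch, with hypothetical account names and amounts:

```python
from datetime import date
from decimal import Decimal

# Each entry: (date, debit account, credit account, amount).
ledger = [
    (date(2025, 3, 1), "Accounts receivable", "Revenue", Decimal("500.00")),  # invoice issued
    (date(2025, 3, 31), "Bank", "Accounts receivable", Decimal("500.00")),    # payment received
]

def balance(account, as_of):
    """Debit-positive balance of an account at the end of a given date."""
    total = Decimal("0")
    for when, debit, credit, amount in ledger:
        if when > as_of:
            continue
        if debit == account:
            total += amount
        if credit == account:
            total -= amount
    return total

print(balance("Accounts receivable", date(2025, 3, 15)))  # outstanding between the two dates
```

Between the two dates the receivable is open; after payment it returns to zero, while the revenue stays booked on the invoice date.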

LeonM 3/31/2025||
> But how can a country not have something that represents a formal request for payment by some future time?

There are plenty countries where the vendor will charge the account of the customer, like a 'pull' mechanism. In many countries they'll use (or used) checks/cheques for that, or a different payment account like a credit card. The agreement for this would have been a contract. They may still use invoices for larger transactions, but they aren't always required by law.

I remember that in the old days, Google, Stripe, etc wouldn't send invoices, sometimes you'd get a minimal receipt message by email, but that was about it. This was particularly annoying for EU-based companies where there are minimal requirements for invoices and/or receipts.

Times have changed though. Most companies, including US-based, will now offer invoices that comply with most international regulations.

Except PayPal of course, for some reason they still seem to get away with not offering invoices. You'll have to download your monthly account overview in PDF from their merchant portal, and they just slapped the following text on it: "This statement may serve as a receipt for accounting and tax related purposes.".

linsomniac 3/31/2025||||
>choose your own accounting software, do not be tempted to hire an external bookkeeper that keeps the books in 'their' systems (accountant lock-in).

~30 years ago I worked at a very small business (3 employees) and they used and liked Quickbooks. The accountant convinced them to switch to some "better" system and for around 3 months they had no idea how much money they had, they just lost all visibility into the system because it didn't work in the way they expected. "If things didn't look right, we'd just go through every screen in the system and press Post." At the end of those 3 months they realized they had unexpectedly gotten into $70K in debt -- this was ~35 years ago when a house was around that much. They had to take a second mortgage on their house. Eventually, they figured out the accounting system, righted the ship, and paid back the second mortgage over a few years. Y2K really helped, with that giant bump in sales.

thomasstuttard 3/31/2025|||
What accounting software would you recommend for first-time entrepreneurs? Are there any open-source solutions that can be self-hosted and that integrate with existing solutions?

I am just starting my journey into entrepreneurship, and have yet to choose a bank or accounting software, and would appreciate guidance. I am based in the UK, and will only be conducting business in the UK to start off with.

laleck 3/31/2025|||
Not OP but there are a few open source options. GnuCash is friendlier for beginners due to the GUI. I like plain text accounting, specifically beancount.

As far as integrations, GnuCash lets you import from various formats like Quicken, while beancount has lots of plugins from the community, like importers for various banks. I don’t believe either offers invoicing, but you could integrate it yourself or just manually record.

IMO, the hardest part of keeping your own books is learning double entry accounting.

thomasstuttard 3/31/2025||
Thanks for the recommendation for GnuCash, will give that a look. What resources would you recommend for learning double entry accounting?
Nextgrid 3/31/2025|||
Starling Bank as the bank, and FreeAgent as the accounting software - it'll handle personal tax (self-assessment), corporation tax, VAT, and payroll. If you need an accountancy practice, I very much recommend Maslins - they'll provide FreeAgent access in that case as part of their fee.
thomasstuttard 3/31/2025||
Thanks for the recommendation, will take a look at Starling Bank and FreeAgent.
pcdoodle 3/31/2025||
Great friggin name!

Love that it dumps you right into the experience.

taejavu 3/31/2025||
Haha thank you, I was amazed the domain was available! And yeah jumping through hoops just to get to the invoice generator is something that frustrated me with existing alternatives, so dumping the user straight into it was one of the foremost decisions the design centered around.
andrethegiant 3/30/2025||
I'm working on pure.md[1], which lets your scripts, APIs, apps, agents, etc reliably access web content in markdown format. Simply prefix any URL with `pure.md/` and you get the unblocked markdown content of that webpage. It avoids bot detection and renders JavaScript-heavy websites, and can convert HTML, PDFs, images, and more into pure markdown.
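Based only on the "prefix any URL" description, a script-side call could look like the following sketch. It assumes no API key or extra headers are required and that the response body is UTF-8 markdown; the `User-Agent` value is arbitrary, and the function names are hypothetical.

```python
import urllib.request

PURE_MD = "https://pure.md/"

def pure_md_url(url):
    """Prefix a page URL so the request is routed through pure.md."""
    return PURE_MD + url

def fetch_markdown(url, timeout=30):
    """Fetch the markdown rendering of a page (performs a network request)."""
    req = urllib.request.Request(pure_md_url(url), headers={"User-Agent": "example-script"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")

print(pure_md_url("https://example.com/docs/page"))
```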

pure.md acts as a global caching layer between LLMs and web content. I like to think of it like a CDN for LLMs, similar to how Cloudinary is a CDN for images.
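The "CDN for LLMs" idea boils down to a read-through cache in front of an expensive fetch-and-convert step. Here is a generic TTL cache sketch to show the shape of that layer; it is not pure.md's implementation, and the `fetch` callable and TTL value are assumptions.

```python
import time

class TTLCache:
    """Read-through cache: serve the stored copy until it is older than ttl seconds."""

    def __init__(self, fetch, ttl, clock=time.monotonic):
        self.fetch = fetch    # expensive call, e.g. render a page and convert it to markdown
        self.ttl = ttl
        self.clock = clock
        self._entries = {}    # url -> (stored_at, content)

    def get(self, url):
        now = self.clock()
        hit = self._entries.get(url)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]
        content = self.fetch(url)
        self._entries[url] = (now, content)
        return content

fetched = []
cache = TTLCache(lambda u: fetched.append(u) or f"# markdown for {u}", ttl=60)
cache.get("https://example.com")
cache.get("https://example.com")  # served from cache, no second fetch
```

Injecting the clock keeps expiry testable without sleeping.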

[1] https://pure.md

shoebham 3/31/2025||
Love the recursion redirect at pure.md/pure.md
andrethegiant 3/31/2025||
You found the easter egg!
WillAdams 3/31/2025|||
It seems to miss URLs?

At: https://willadams.gitbook.io/design-into-3d/2d-drawing the links for:

https://mathcs.clarku.edu/~djoyce/java/elements/elements.htm...

https://mathcs.clarku.edu/~djoyce/java/elements/bookI/bookI....

https://mathcs.clarku.edu/~djoyce/java/elements/bookI/defI1....

are rendered as:

_Elements_ _:_ _Book I_ _:_ _Definition 1_

Maybe detect when a page is on gitbook or some other site where there is .md source on github or some other site and grab the original instead?

andrethegiant 3/31/2025||
By default, href values of <a> tags are removed, because they add significant token length without adding more context. Coming soon, you can specify a request header to set whether or not you want links removed from the response. Those underscores you mentioned are from the italics.
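The trade-off described here can be sketched with the standard library's `html.parser`: anchors are rendered either as bare text (fewer tokens) or as `[text](href)` when links are wanted. This illustrates the idea only; it is not pure.md's converter, and the request header that will toggle this behavior is not named here because it hasn't been published.

```python
from html.parser import HTMLParser

class AnchorAwareText(HTMLParser):
    """Extract text from HTML, rendering <a> either as plain text or as [text](href)."""

    def __init__(self, keep_links=False):
        super().__init__()
        self.keep_links = keep_links
        self.parts = []
        self._in_anchor = False
        self._href = None
        self._anchor_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self._href = dict(attrs).get("href")
            self._anchor_text = []

    def handle_endtag(self, tag):
        if tag == "a" and self._in_anchor:
            text = "".join(self._anchor_text)
            if self.keep_links and self._href:
                self.parts.append(f"[{text}]({self._href})")
            else:
                self.parts.append(text)  # drop the href to save tokens
            self._in_anchor = False

    def handle_data(self, data):
        (self._anchor_text if self._in_anchor else self.parts).append(data)

def html_to_text(html, keep_links=False):
    parser = AnchorAwareText(keep_links)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

html = 'See <a href="https://example.com/elements">Elements</a> now.'
print(html_to_text(html))
print(html_to_text(html, keep_links=True))
```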
metadat 4/1/2025|||
Cool project!

Recently discussed, too: https://news.ycombinator.com/item?id=43462894 (10 comments)

wild_egg 3/31/2025|||
Thanks for sharing. I was planning on building something like this in April after hitting too many issues with Jina and Tavily but it looks like you've already done the hard work!
andrethegiant 3/31/2025||
Thanks! Still a work in progress :-)
wanderingbit 3/31/2025|||
What a great idea, I will soon be a paying customer. This solves a problem of an app I'm using that I was hesitant to try to develop myself.
andrethegiant 3/31/2025||
Much appreciated!
hardlyfun 3/31/2025|||
Very nice! How did you manage to bypass sites with Cloudflare Turnstile set up?
udev4096 3/31/2025||
FlareSolverr, most probably.
erekp 3/31/2025|||
How exactly do you fall back to Common Crawl? Isn't the cost of even holding and querying Common Crawl insane?
andrethegiant 3/31/2025|||
With AWS Athena, you can query the contents of someone else’s public S3 bucket. You pay per read, but if you craft your query the right way then it’s very inexpensive. Each query I run only scans about 1MB of data.
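A minimal sketch of that pattern, assuming Common Crawl's columnar URL index is registered in Athena as `ccindex.ccindex` (the names used in Common Crawl's own Athena setup; yours may differ). Filtering on the `crawl` and `subset` partition columns plus an exact URL keeps the data scanned small. The returned `warc_filename`, `warc_record_offset` and `warc_record_length` locate the page inside a WARC file, which can then be fetched with an HTTP Range request. The crawl ID and helper names are illustrative, not the author's code.

```python
def build_index_query(url: str, crawl: str) -> str:
    """Athena SQL locating `url` in one crawl of the columnar index."""
    # Double any single quotes so the values are valid SQL string literals.
    safe_url = url.replace("'", "''")
    safe_crawl = crawl.replace("'", "''")
    return (
        "SELECT url, warc_filename, warc_record_offset, warc_record_length\n"
        'FROM "ccindex"."ccindex"\n'
        f"WHERE crawl = '{safe_crawl}'\n"  # partition column: prunes other crawls
        "  AND subset = 'warc'\n"  # partition column: skips robots.txt/diagnostics records
        f"  AND url = '{safe_url}'\n"
        "LIMIT 1"
    )


def warc_url(warc_filename: str) -> str:
    """Public HTTP location of a WARC file."""
    return "https://data.commoncrawl.org/" + warc_filename


def warc_range(offset: int, length: int) -> dict:
    """Headers to fetch a single WARC record (HTTP byte ranges are inclusive)."""
    return {"Range": f"bytes={offset}-{offset + length - 1}"}
```

The record returned by the Range request is a gzip member, so it can be decompressed on its own without downloading the rest of the WARC file.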
wfn 4/1/2025|||
Since I happened to be looking at this just now, here are some examples of how to query at a roughly cent-per-query cost level (just examples, but quite illustrative): https://commoncrawl.org/blog/index-to-warc-files-and-urls-in...
m0rde 3/31/2025|||
Is there an example we can see?
27theo 3/31/2025||
https://pure.md/https://news.ycombinator.com/item?id=4353323...
sharpshadow 4/1/2025|||
Works great on mobile thanks, helpful tool to bypass flaky websites, js and even some paywalls.
udev4096 3/31/2025||
[flagged]
NationOfJoe 3/31/2025||
I have no skin in the game, and honestly I'm wondering: how does this idea contribute to enshittifying the web further?

This idea just seems to provide the same content as visiting the site, in a different view, like reader mode?

hbsbsbsndk 3/31/2025|||
The service seems designed to bypass anti-scraping measures. If site owners don't want their content scraped by AI this is subverting their will.

It also obfuscates responsibility between the AI vendor and the scraping service. One can imagine unethical AI providers using a series of ephemeral "gateways" to access content while avoiding any legal or reputational harm.

elric 3/31/2025|||
I think the parent is referring to the goal of making the web more "LLM friendly".
megadragon9 3/30/2025||
I built a machine learning library [1] (with an API similar to PyTorch's) entirely from scratch, using only Python and NumPy. It was inspired by Andrej Karpathy's micrograd project [2]. I slowly added more functionality and evolved it into a fully functional ML library that can build and train everything from classical CNNs [3] to a toy GPT-2 [4].

I wanted to understand how models learn, literally bridging the gap between mathematical formulas and high-level API calls. I feel that, as a beginner in machine learning, it's important to strip away the abstractions and understand how these libraries work from the ground up before leveraging "high-level" libraries such as PyTorch and TensorFlow. Oh, I also wrote a blog post [5] on the journey.

[1] https://github.com/workofart/ml-by-hand

[2] https://github.com/karpathy/micrograd

[3] https://github.com/workofart/ml-by-hand/blob/main/examples/c...

[4] https://github.com/workofart/ml-by-hand/blob/main/examples/g...

[5] https://www.henrypan.com/blog/2025-02-06-ml-by-hand/
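For readers who haven't seen micrograd, here is a minimal scalar reverse-mode autodiff sketch in the same spirit. It is a generic illustration, not code from ml-by-hand (which works on NumPy arrays); the `Value` class simply mirrors micrograd's naming.

```python
class Value:
    """A scalar that records how it was computed so gradients can flow back."""

    def __init__(self, data, parents=(), backward=lambda: None):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = backward

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward():
            # d(a+b)/da = d(a+b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward():
            # d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = backward
        return out

    __radd__ = __add__
    __rmul__ = __mul__

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for p in v._parents:
                    visit(p)
                order.append(v)

        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()
```

With `a = Value(2.0)` and `b = Value(-3.0)`, computing `c = a * b + a` and calling `c.backward()` gives `a.grad == -2.0` (that is, b + 1) and `b.grad == 2.0` (that is, a). A tensor library generalizes the same idea to arrays and many more operations.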

999900000999 3/30/2025||
FOSS MTG inspired digital card game.

I love card games, but in digital card games the business model is beyond predatory. If you need a specific card, your only option is basically to buy a pack. Let’s say this costs about $3, give or take. But if it’s a specific rare card, you can open a dozen or so packs and still not get the card you want.

This can go on indefinitely, and apologists will claim you can just work around it by building a different deck. But the business model clearly wants you to drop $50 to $100 just to get a single card.
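To put numbers on "a dozen or so packs": if each pack independently contains the specific card with probability p, the number of packs needed follows a geometric distribution, so the expected count is 1/p and the chance of still missing the card after n packs is (1 - p)^n. A quick Python sketch; the p = 1/20 and $3-per-pack figures are assumptions for illustration, not any real game's drop rates.

```python
def expected_packs(p: float) -> float:
    """Mean number of packs opened until the first copy (geometric distribution)."""
    return 1.0 / p


def miss_probability(p: float, n: int) -> float:
    """Chance of not getting the card in n independent packs."""
    return (1.0 - p) ** n


def packs_for_confidence(p: float, confidence: float) -> int:
    """Smallest n with at least `confidence` chance of having pulled the card."""
    n = 0
    while 1.0 - miss_probability(p, n) < confidence:
        n += 1
    return n


# Assumed numbers: a 1-in-20 chance per pack, $3 per pack.
p, price = 1 / 20, 3.0
print(f"expected packs: {expected_packs(p):.0f}, expected cost: ${expected_packs(p) * price:.0f}")
print(f"still missing after 12 packs: {miss_probability(p, 12):.0%}")
print(f"packs for 90% certainty: {packs_for_confidence(p, 0.9)}")
```

Under those assumed numbers, the expected spend is about $60 for one card, there is roughly a 54% chance of still missing it after a dozen packs, and it takes 45 packs ($135) to be 90% sure.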

All for this to repeat every 3 months, when they introduce new mechanics that nerf the old cards or just rotate out the dream deck you spent $100+ to build.

I’m under no illusion that I’ll compete directly, but it’s a fun FOSS game you can spin up with friends. And since it’s all MIT-licensed, you can even fork it and sell it.

It also gives me an excuse to use Python: it's looking like Django on the backend and Godot for the game client. The actual game logic runs in Django, though, so you can always roll a different game client.

Eventually I’d like different devs to roll their own game clients in whatever framework they want.

Want to play from the CLI? Sure.
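Because the rules live in the Django server, a client is just anything that can speak HTTP. Here is a minimal Python sketch of a thin client; the server address, the `/api/games/<id>/actions/` path and the JSON payload shape are all hypothetical, since the project's actual endpoints aren't published yet.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # hypothetical self-hosted server


def build_action_request(game_id: int, action: dict) -> Request:
    """Build a POST for a (hypothetical) game-action endpoint."""
    body = json.dumps(action).encode("utf-8")
    return Request(
        f"{BASE_URL}/api/games/{game_id}/actions/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_action(game_id: int, action: dict) -> dict:
    """Send the action and return the server's JSON reply (needs a running server)."""
    with urlopen(build_action_request(game_id, action)) as resp:
        return json.load(resp)
```

Against the same hypothetical endpoint, the curl equivalent would be along the lines of `curl -X POST -H 'Content-Type: application/json' -d '{"type": "play_card", "card_id": 42}' http://localhost:8000/api/games/7/actions/`.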

sircastor 4/1/2025||
Many years ago, Decipher (who made the Star Trek and Star Wars TCGs) rolled out a web platform for playing their games. It had the same business model but none of the advantages of physical cards: you spent money on their platform to buy their digital cards, to play only there, and when you left, the cards just disappeared into the void.
tasuki 3/31/2025|||
Have you heard about Mindbug[0]? It's a recent MTG-inspired card game, co-created by one of the designers of MTG. It plays quickly and is full of interesting, consequential decisions.

[0]: https://boardgamegeek.com/boardgame/345584/mindbug-first-con...

sentrysapper 4/2/2025|||
I started building a MtG competitor inspired by Altered and Netrunner. As weird as it sounds, I started with some bash scripts to see how the meta would play out to make sure the card values/strategies were balanced.

I would love to compare game development notes if you're interested in discussing this sometime.

bhu8 3/31/2025||
I'm sold. How do I play?
999900000999 3/31/2025||
Wait 3 months for me to finish.

So far it's basically just a Django server. You're responsible for self-hosting (although I imagine I'll put up a demo server), and you can define your own cards.

You can play the game by literally just using curl calls if you want to be hardcore about it.

I *might* make a nice closed source game client one day with a bunch of cool effects, but that's not my focus right now.

maxwelljoslyn 3/30/2025||
My master's thesis[1] was half research, half dev project, exploring how we can continue to fully fuse traditional RPGs with computers. This goal is my life quest, my life's work.

I think virtual tabletops (VTTs) as they currently stand are barking up the wrong tree[2]. I want a computer-augmented RPG to allow the GM to do everything he does in the analog form of the game. On-the-fly addition of content to the game world, defining of new kinds of content, defining and editing rules, and many other things ... as well as the stuff VTTs do, of course. The closest we've gotten in the last 30 years is LambdaMOO and other MUDs.

The app I made for my thesis project was an experimental vertical slice of the kinds of functionality I want. The app I built after that, last year, is more practical and focused on the needs of my weekly game, which uses my custom system; I continue to develop it almost daily.

I'm itching to tackle the hardest problem of all, which is fully incorporating game rules in a not-totally-hardcoded way. I need rules to be first-class objects that can be programmatically reasoned about to do cool things like "use the Common Lisp condition system to present end user GMs with strategies for recovering from buggy rules." Inspirations include the Inform 7 rulebook system.
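A rough Python sketch of what "rules as first-class objects" could look like: each rule is data (a name, an applicability test, an effect) held in an ordered rulebook, loosely after Inform 7's rulebooks. Python has no condition system, so restarts are approximated here by catching the exception and asking a GM-supplied callback to pick a recovery strategy; in Common Lisp these would be offered as restarts without unwinding the stack. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

State = dict  # the game state is a plain dict in this sketch


@dataclass
class Rule:
    name: str
    applies: Callable[[State], bool]
    effect: Callable[[State], None]
    enabled: bool = True


@dataclass
class Rulebook:
    name: str
    rules: list = field(default_factory=list)

    def run(self, state: State, on_error: Callable[[Rule, Exception], str]) -> State:
        """Apply each enabled, applicable rule in order.

        When a rule raises, `on_error` (e.g. a prompt to the GM) returns one of
        "skip" (ignore it this time), "disable" (turn the rule off) or "abort".
        """
        for rule in self.rules:
            if not rule.enabled:
                continue
            try:
                if rule.applies(state):
                    rule.effect(state)
            except Exception as exc:
                choice = on_error(rule, exc)
                if choice == "disable":
                    rule.enabled = False
                elif choice == "abort":
                    raise
                # "skip": fall through to the next rule
        return state
```

Because rules are plain objects, the engine can also list, reorder, or inspect them at the table, which is the kind of programmatic reasoning hard-coded rules rule out. A real version would also need to roll back partial effects of a failed rule.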

[1] See my homepage, under Greatest Hits: https://www.mxjn.me

[2] Anything that requires physical equipment other than dice and a regular computer is also barking up the wrong tree. So no VR, no video-tracked physical miniatures, no custom-designed tabletop, no Microsoft Surface... Again, just my opinion.

EliasWatson 3/31/2025||
I'm working on something similar. I'm building a MUD with an LLM playing the role of GM. Currently it just controls NPCs, but I eventually want it to be able to modify the game rules in real time. My end goal is a world that hundreds of players can play in simultaneously, but has the freedom and flexibility of a TTRPG (while still remaining balanced and fair).
maxwelljoslyn 4/1/2025||
That's really cool, Elias. I keep seeing people try to put LLMs into the role of the GM. But I think you're doing something new and important by working to have the rules available to it.

Is your project available anywhere? Best of luck!

If you're interested, because I kept seeing "LLM as GM" projects, I got curious about how well it would work to have LLMs as players instead. So I made this:

https://github.com/maxwelljoslyn/gm-trainer

It's a training ground for GMs to practice things like spontaneous description, with 4 AI players that get fed what each other say so they act in a reasonably consistent manner. It's not perfect, but I've gotten some good use out of it.

tessierashpool 3/31/2025|||
How do you feel about TaleSpire? It allows pretty fast on-the-fly map-making as long as you’re not dealing with significant vertical distances, although it’s got very little in common with LambdaMOO. But MUDs generally seem to be MMORPG precursors at this point, unless there’s an underground community I’m unaware of.
maxwelljoslyn 4/1/2025||
I feel the same way about TaleSpire as I feel about pretty much every other VTT. They might do okay, even pretty well, at combat maps and/or character sheets, but what I want is the whole game world in the computer. Maps are just a fraction of what I need as a GM. I need data on economics and population numbers and power structures. And I need computation over all those things.

For instance, my game rules include an economic subsystem, which takes in the production of goods and services at hundreds of in-game cities, and computes prices for over a thousand player-purchaseable goods. The "second app" that I referred to above allows players to (among many other things) purchase stuff at the market nearest their current location and have those items go straight into their character sheet. If the "item" is actually an animal, a hired mercenary, etc. then a different subsystem generates a new NPC with the right statistics and attaches the player to it as owner/liege.
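A hypothetical Python sketch of the purchase flow described above: find the market nearest the character, charge the local price, then either add the item to the character sheet or, for animals and hirelings, create an NPC owned by the buyer. The data structures, the straight-line distance and the `kind` field are assumptions; the author's economic model for computing prices is not shown.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Market:
    name: str
    x: float
    y: float
    prices: dict  # item name -> price, as computed by the (not shown) economy subsystem


@dataclass
class Character:
    name: str
    x: float
    y: float
    coins: float
    inventory: list = field(default_factory=list)
    retinue: list = field(default_factory=list)  # NPCs this character owns or leads


CREATURE_KINDS = {"animal", "hireling"}  # purchases that become NPCs (assumed)


def nearest_market(pc: Character, markets: list) -> Market:
    return min(markets, key=lambda m: math.hypot(m.x - pc.x, m.y - pc.y))


def purchase(pc: Character, markets: list, item: str, kind: str) -> Market:
    market = nearest_market(pc, markets)
    price = market.prices[item]
    if price > pc.coins:
        raise ValueError(f"{pc.name} cannot afford {item} ({price})")
    pc.coins -= price
    if kind in CREATURE_KINDS:
        # Stand-in for the NPC generator: a new NPC record with the buyer as owner.
        pc.retinue.append({"name": item, "kind": kind, "owner": pc.name})
    else:
        pc.inventory.append(item)
    return market
```

Keeping purchases as ordinary function calls over the game's own data is what lets one action touch the economy, the character sheet and the NPC roster together, rather than bolting each onto a map-centric VTT.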

I could write an extension for a VTT that talks to my economic system over an API, and throws items up on screen, lets players purchase them, moves them into their character sheet using the right function calls in the VTT's extension library, etc. But every step of the way, I would be fighting to cram this subsystem into the VTT's conception that gameplay begins and ends with maps and char sheets.

petesergeant 3/31/2025||
I'm building LLM-based NPCs full-time for a text-based MMORPG. Recently I've been doing a lot of work on letting players progress through scenarios with them: the rules live in a class, the LLM translates the user's free-text intent for the rules engine, and then it writes back to the user with the results.
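One common way to build that bridge, sketched in Python under stated assumptions: the LLM is asked to return a structured intent (here JSON with an `action` and a `target`), the rules class validates it and decides the outcome deterministically, and the outcome is handed back for the LLM to narrate. The scenario class, its actions, and `fake_llm` (a stub standing in for a real model call) are hypothetical, not the poster's implementation.

```python
import json
from dataclasses import dataclass, field


@dataclass
class LockedDoorScenario:
    """Rules live in code; the LLM only translates between free text and intents."""
    has_key: bool = False
    door_open: bool = False
    log: list = field(default_factory=list)

    ALLOWED = {"take", "open", "talk"}  # class attribute, not a dataclass field

    def apply(self, intent: dict) -> str:
        action, target = intent.get("action"), intent.get("target")
        if action not in self.ALLOWED:
            return "unsupported_action"
        if action == "take" and target == "key":
            self.has_key = True
            return "took_key"
        if action == "open" and target == "door":
            if not self.has_key:
                return "door_locked"
            self.door_open = True
            return "door_opened"
        return "nothing_happens"


def fake_llm(player_text: str) -> str:
    """Stub for the model call that maps free text to a JSON intent."""
    text = player_text.lower()
    if "key" in text:
        return json.dumps({"action": "take", "target": "key"})
    if "door" in text:
        return json.dumps({"action": "open", "target": "door"})
    return json.dumps({"action": "talk", "target": None})


def handle_turn(scenario: LockedDoorScenario, player_text: str) -> str:
    try:
        intent = json.loads(fake_llm(player_text))
    except json.JSONDecodeError:
        return "could_not_parse"
    outcome = scenario.apply(intent)
    scenario.log.append((player_text, outcome))
    # A real system would pass `outcome` back to the LLM to write the reply.
    return outcome
```

The design choice is that the model never decides whether the door opens; it can only propose an intent, and anything outside `ALLOWED` is rejected by the rules.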
maxwelljoslyn 4/1/2025||
That's sweet! I think LLMs have incredible potential for descriptions and for NPC behavior, and I really like that you have this bridge between freeform intent and a rules engine. I'd like to pick your brain about it - I'll send an email.
greysteil 3/30/2025||
I've been building mentions.us[1] - it sends you alerts when your keywords are mentioned on Hacker News, Reddit, Bluesky, LinkedIn and a few other places. For anyone who uses F5Bot, it's similar but with some extra data sources and a Slack integration.

It's been a fun project. Dealing with the scale of Reddit (~300 posts/second) creates some interesting technical challenges. It's also let me polish up my frontend development skills.

I don't think it will ever be a money spinner: it has ~70 folks using it, but they're all on the free tier. It's felt really good to build something useful, though.

[1]: https://mentions.us

maxwelljoslyn 3/30/2025||
You just got a signup :) Free plan, I'll admit. I don't need or want anything other than email notifications, and the free plan for that is very generous. Thanks for building this.
greysteil 3/31/2025||
:D
csomar 3/31/2025|||
For the social platforms, are you hooking up to their APIs or just using Google? I'm only interested in emails and would pay a small price for that (say 5-7/month). I've signed up and added my first keyword to test.

That being said, here's an additional feature idea: being able to track Discord/Slack/Telegram by providing my API key and having you stream the content of the groups I've joined.

greysteil 3/31/2025||
We’re hooking up to the APIs - the goal is to alert you of mentions as quickly as possible, so waiting for Google to index results would introduce (much) too much lag.

Interesting feature request! I’ll have a think on it.

dailydetour123 3/31/2025|||
This is really interesting, thanks for sharing! I'm keen to know how it compares to a tool like Pulsar. I've been quoted a huge amount to use their service, and it looks like mentions.us basically fulfills the same social listening function? If it does, then I will definitely push my org to sign up!
greysteil 3/31/2025||
Thanks! I haven't used Pulsar, but the general answer is that mentions.us is focussed on sending you alerts, whereas more sophisticated social listening tools provide a lot more analytics (e.g., sentiment analysis).

If your company just wants alerts when their keywords are mentioned on social media then mentions.us should work great for them. If you work for Coca Cola then you likely need something very different from your social listening tool!

dailydetour123 3/31/2025||
Thanks for clarifying, we're a small org and so the few mentions we get could be analysed manually I'm sure. I will flag it to the marketing team!
juliensalinas 3/31/2025|||
Sounds very cool. I'm curious how you manage to monitor LinkedIn, though. The only tool that seems capable of monitoring LinkedIn is https://kwatch.io, so if you manage to do that too, it's impressive.
greysteil 3/31/2025||
Hey Julien! I’ve seen you advertising KWatch in lots of places, assume you’re connected to it / know the founder?

For LinkedIn monitoring we use the voyager APIs. It’s not perfect because it gets posts but not comments, but it’s pretty good.

fmxsh 3/31/2025|||
Perhaps it would be of interest to people into social media marketing or people trying to build social media presence. Keywords mean a lot to them. I'm sure you've thought of it. Perhaps that is where market potential exists for it.
aniketsaurav18 3/31/2025|||
Your pricing is a little confusing: the free plan gives you 100 keywords, and your most expensive plan also gives you 100 keywords. In fact, the only difference between the two is Slack notifications. What's the motivation behind this pricing plan?
greysteil 3/31/2025||
I put more details in a reply to another comment, but basically I think the number of people willing to pay for email alerts is small, so I’ve made the service free for them. It’s only teams who want Slack notifications who have paid plans.

I’m not optimising to extract every possible $ from the market with that pricing strategy. Instead I hope it will maximise the number of users whilst breaking even on costs.

huksley 3/31/2025|||
Looks very interesting!! I registered and found an issue: when I add the mention keyword, it shows two results, but after saving it, it shows zero results. I tried checking mentions for my side project DollarDeploy.
greysteil 3/31/2025||
Thanks for the feedback! For saved terms we show you the number of matches we’ve notified you about, which always starts at zero, whereas during creation we show you how many you would have matched. That’s a confusing UI and I should improve it.
kshitij_libra 3/31/2025|||
This seems very useful. Why not make it paid ? Do you think your customers won’t buy ? Have you tried ?

What would your customers need to make them want to pay for it ?

greysteil 3/31/2025||
I think most of the people who sign up for email alerts would never pay. Lots of them are indie hackers or folks with a side project - I've been there, and know how price sensitive those communities are. I'd rather they use the service for free than not at all - I get valuable feedback from that, a marketing boost if they tell others about it, and the validation of having built something other people use.

I do have a paid plan for people who want Slack notifications, and I think those folks ought to be happy to pay. My hope is that I'll eventually get a few paid signups and that those will cover the costs of the service (which are minimal).

I know I lose a bit of revenue with the above approach, but it's a tradeoff I'm happy to make.

clueless 3/30/2025|||
How do you get real-time access to Reddit posts?
greysteil 3/30/2025||
Through the API - in particular the info endpoint[1], combined with the fact that Reddit IDs are base36 encoded sequentially increasing integers[2]. You can get 100 objects at a time, so if you make ~3 requests a second it's enough to get all of the new posts and comments.

[1] https://www.reddit.com/dev/api/#GET_api_info

[2] https://www.reddit.com/dev/api/#fullnames
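A small Python sketch of the ID arithmetic: convert the newest known base36 ID to an integer, then walk forward in batches of 100 fullnames for the info endpoint's comma-separated `id` parameter. Posts use the `t3_` prefix and comments `t1_`, each with its own ID sequence. HTTP calls, authentication and rate limiting are left out, and the helper names are hypothetical. At ~300 new objects per second, batches of 100 work out to the ~3 requests per second mentioned above.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # base36 digits used by Reddit IDs


def to_base36(n: int) -> str:
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def next_batch(last_seen: str, kind: str = "t3", size: int = 100) -> list:
    """Fullnames for the `size` IDs after `last_seen` (base36, without prefix)."""
    start = int(last_seen, 36) + 1
    return [f"{kind}_{to_base36(i)}" for i in range(start, start + size)]


def info_url(fullnames: list) -> str:
    # /api/info takes up to 100 comma-separated fullnames in `id`
    # (authenticated clients would use oauth.reddit.com instead).
    return "https://www.reddit.com/api/info.json?id=" + ",".join(fullnames)
```

IDs beyond the newest existing object simply don't come back yet, so a poller would only advance its cursor to the highest ID actually returned.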

qwikhost 4/2/2025|||
How do you get realtime data from Reddit?
qwikhost 3/31/2025||
How do you get realtime data from LinkedIn?
greysteil 3/31/2025||
For now we use the LinkedIn voyager API's search endpoint
erekp 3/31/2025||
With your account? You never get blocked? That's impressive.
amrangaye 3/31/2025||
Does “vibe coding” count? :-) I’m from West Africa and lately have been very interested in African fairy tales to read to my daughter. I ended up building a GPT-backed interface that can insert her into any African story she wants. We also have a list of African queens who aren’t famous anymore but did amazing things (look up Queen Nzinga, for example). So I’m doing a series of little children’s books about each queen, exported to PDF so I can print them out and bind them for her: her own little collection of fairy tales. I plan to put it online later; even if you’re not African, I think it’s a great way to explore our history.
tcmart14 3/31/2025||
This sounds pretty cool. Since another commenter asked whether you have a write-up, I just want to throw my hat in and say I would enjoy reading a write-up of what you're doing, from the technical bits to your daughter's review of how well the stories came out.
amrangaye 3/31/2025||
Thank you. Yes, she’s been enjoying them a lot now that we don’t have to repeat the same stories every night :) I’ll share on here once I’m done, along with the write-up.
Aaronstotle 3/31/2025|||
I love learning more history, thank you for the recommendation, I would have never discovered this myself.
apilove 4/1/2025||
There's a nice list of URL shortener APIs at https://publicapis.dev/category/url-shorteners, worth trying.
joelfried 3/31/2025|||
Sounds super unique. Please include me too if you do any public follow-up! I'd actually enjoy reading those myself, as I've collected books of folktales for years!
amrangaye 3/31/2025||
Will do. HN has no way of tagging people or following up, though, so I’m not sure how to share once I go live. But if you shoot me a message, I’ll update you once it’s done. amrangaye at gmail dot com.
ryanjamurphy 3/31/2025||
Would love to do something similar. Have you written about the technicals/set up somewhere?
amrangaye 3/31/2025||
Not yet as I’m still finessing it to her needs. She has a problem getting rid of thumb sucking, and finally asked for a fairy tale with a thumb sucking princess who eventually stops sucking her thumb lol. It’s a fun activity and also letting me learn about LLMs.
ryanjamurphy 4/1/2025||
We've used them to generate stories about our kids and their favourite characters, too. It's a great use case — your approach sounds excellent. Good luck to you!
mhuffman 3/31/2025||
I dusted off an old app (from 2012 or so) that I wrote for my girlfriend and me, since we're in a long-distance relationship. I call it "Date Night Movie Player". It needs updates, and she mentioned that she missed using it. Basically, it lets two people watch a video or movie in sync while chatting in a transparent side overlay. It has a remote control with interesting buttons like a timed "beer break" and "bathroom break", along with pause, so you can draw an arrow on the screen or circle something of interest. There is also a button that might let you steal the remote control from the other user (it's random chance, and you can only try a few times). Only the person with the remote can really pause, after all! It gives the experience of watching a movie together and being able to comment on what's happening, like when we're together.
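A minimal Python sketch of two of the mechanics described: keeping playback in sync by extrapolating the other player's last reported position and seeking only when drift exceeds a threshold, and a limited, random "steal the remote" action. The 0.5-second threshold, the 3 attempts and the 25% odds are assumptions, not the app's actual values, and the sketch assumes both machines' clocks are roughly synchronized (e.g. via NTP).

```python
import random
from typing import Optional

DRIFT_THRESHOLD = 0.5  # seconds of drift tolerated before seeking (assumed)


def expected_position(position: float, sent_at: float, now: float, playing: bool) -> float:
    """Where the sender's playback should be now, given its last update."""
    return position + (now - sent_at) if playing else position


def correction(local_pos: float, remote_pos: float) -> Optional[float]:
    """Seek target if we've drifted too far from the remote, else None."""
    return remote_pos if abs(local_pos - remote_pos) > DRIFT_THRESHOLD else None


class Remote:
    """Only the holder may pause; the other viewer gets a few chances to steal it."""

    def __init__(self, holder: str, steal_chance: float = 0.25, max_steals: int = 3, rng=None):
        self.holder = holder
        self.steal_chance = steal_chance
        self.max_steals = max_steals
        self.attempts_left = {}
        self.rng = rng or random.Random()

    def try_steal(self, who: str) -> bool:
        if who == self.holder:
            return False
        left = self.attempts_left.setdefault(who, self.max_steals)
        if left == 0:
            return False
        self.attempts_left[who] = left - 1
        if self.rng.random() < self.steal_chance:
            self.holder = who
            return True
        return False

    def can_pause(self, who: str) -> bool:
        return who == self.holder
```

The threshold avoids constant tiny seeks from network jitter; only a real divergence (a stall or a slow connection) triggers a jump.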
sram1337 3/31/2025||
Have you seen the YouTube Watch Together app on Discord? It's built into the client
mhuffman 3/31/2025||
I have! It was one of those "I should have moved when I had the chance" moments. But such is life as a developer.
mvieira38 3/31/2025|||
I would 100% pay for this as a big movie fan and living away from my gf
mhuffman 3/31/2025||
Interesting to hear. One big thing I had to address was the fact that she lives in a rural area with very slow internet access and I don't. So I built in an option for us both to select a movie and a time for the date, and it would pre-download a high-quality version.
dmonitor 3/31/2025||
It's a shame that apps like these are "illegal". There's so much fun innovation to be had.
mhuffman 3/31/2025|||
I don't think the apps themselves are illegal any more than VLC is illegal. You could likely watch something on the app that is illegal depending on where you live.
mvieira38 3/31/2025|||
Maybe the legality is clearer if they are self hostable? Then all the liability is on the user who downloaded the movies in the first place
mhuffman 3/31/2025||
I think this is the way to think about it. Mine was a desktop application with no central server.
fillskills 3/31/2025|||
Would love to learn about this. I am building something similar for my family
randomhustler 3/31/2025||
I had this use case and ended up using Google Meet and screen sharing.
hedayet 3/30/2025|
https://aponlink.com/

I built a no-AI, human-only social network focused on ONLY one thing - keeping people connected.

I'd stepped away from mainstream social media last year due to the overwhelming negativity, privacy violation, etc. Then around early this year, I started to feel I was missing updates from people who actually matter in my life. Instead of going back to traditional platforms, I decided to create a simple solution myself.

The platform emphasizes:

- No AI algorithms or content manipulation

- No infinite scrolling designed to trap your attention

- A simple interface for sharing life updates with close connections (text and photos only for now)

We've intentionally made connecting difficult: no user search and no friend suggestions - you only connect with people you already know and care about.

Web: https://aponlink.com/

Android App: https://play.google.com/store/apps/details?id=com.aponlink.a... (iOS version coming soon)

I'd love to hear how this approach resonates with the HN community, particularly from those who've also grown tired of traditional social media.

protocolture 3/30/2025||
So I like what you are doing but I think it might be worth having a larger think about this

> no user search and no friend suggestions

I get the intentionality, but the reason Facebook was successful was that it found, for you, the people you intentionally wanted to communicate with.

The issue is that the social graph overstays its welcome. After it's done finding all the people you want to communicate with, it suggests a ton of people you don't.

I actually find this similar to Netflix and Spotify suggestions, both of which found things I wanted to consume early on, but now just give me waves of shit.

Consider doing something a lot smaller, like an opt in, 1 month activation at a time, depth 1 search to find people you might want to connect with, but without the hassle of having to swap details on another platform.
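A rough Python sketch of that suggestion, under one reading of "depth 1": suggest only people directly connected to one of your existing connections who have themselves opted in within the last 30 days, and only while your own opt-in is active. The data layout and the 30-day window are assumptions for illustration.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)  # one-month activation (assumed length)


def discoverable(opted_in_at: dict, user: str, now: datetime) -> bool:
    """True if `user` opted in to discovery within the activation window."""
    started = opted_in_at.get(user)
    return started is not None and now - started <= WINDOW


def suggestions(graph: dict, opted_in_at: dict, me: str, now: datetime) -> set:
    """People one hop beyond my connections who have an active opt-in."""
    if not discoverable(opted_in_at, me, now):
        return set()  # searching is opt-in too
    mine = graph.get(me, set())
    candidates = set()
    for friend in mine:
        candidates |= graph.get(friend, set())
    candidates -= mine | {me}
    return {c for c in candidates if discoverable(opted_in_at, c, now)}
```

Because both sides must opt in and the window expires on its own, nobody keeps resurfacing in suggestions after they've stopped looking.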

hedayet 3/31/2025||
That’s a great insight. I totally agree that early Facebook’s ability to surface actual connections was valuable before it turned into an endless recommendation machine.

The challenge is figuring out how to offer just enough discoverability that doesn't creep users. I like your idea of an opt-in, time-limited, depth-1 search, it keeps things intentional while reducing friction. Definitely something to think about.

Curious: would you see value in a simple "import contacts" option, or do you think that would risk overstepping?

protocolture 3/31/2025||
As long as it isn't intrusive. Both LinkedIn and Facebook have done this to me at some point, and I get endless prompts to try again; there's also a bunch of users on those platforms who are now recommended to me because of the search.

It would be useful to identify my friends, but I don't want a loose thread to some guy I emailed 20 years ago constantly bugging me.

gwbas1c 3/30/2025|||
You need an About page with screenshots, etc. I'd like to know what I'm getting into before I invest my extremely valuable time and attention in your site.

---

The links at the bottom of the page (About, Privacy Policy, etc.) don't work.

In general: Non-functioning links / buttons are a huge no-no. When I encounter non-functioning links / buttons in software, I just assume I'm going to waste my time and move on.

I know that sometimes when designing a UI, you want to be able to "see" what the final product will look like. But leaving links in before they work is sloppy, and gives the impression that your product has other loose ends too.

hedayet 3/30/2025||
Thanks for the feedback!

1. The bottom nav links are fixed now, really appreciate you pointing that out.

2. "...an About page with screenshots": great suggestion! We’ll work on adding that soon to better showcase what Aponlink is about.

btbuildem 3/30/2025|||
Something like this has been on my mind for a while now -- take the useful, positive elements from across the socials (network of connections, media sharing, events, etc) and create mini-nets that let people who want to, stay in touch.

How do you envision onboarding? Do I join, and then try to convince a handful of people to join as well?

hedayet 3/30/2025||
Glad this resonates with you! That’s exactly the goal—keeping the useful parts of social networking while removing the noise and AI-driven manipulation.

> How do you envision onboarding? Do I join, and then try to convince a handful of people to join as well?

Yes, that's been the idea so far for onboarding. But we’re also exploring ways to make the platform more organically discoverable and valuable from day one (without AI).

In my case, it's been easy to convince my network to move and I found they shared a similar level of dismay towards traditional networks.

Please let me know if you have any suggestions on the onboarding process

btbuildem 4/1/2025||
I wish I had suggestions! The daunting nature of the onboarding is what cooled my jets in the first place, and I never got past the ideation phase with this particular project.

The need is there (at least for some of us!) so the sell shouldn't be so hard, but I feel like I'm missing the "a-ha!" differentiator here. It's not enough to pull the good/useful remnants from the sludge socials are today; it would need an extra something to excite people enough to make the effort to engage with yet another online service.

gonzaloalvarez 3/30/2025||
Trying to sign up, got this error:

Error Firebase: Error (auth/network-request-failed).

hedayet 3/30/2025||
Thanks for trying it out! That particular error usually happens due to a temporary network issue or if third-party cookies are blocked. Could you try refreshing or using a different browser?

I’ll also check on my end to make sure everything is running smoothly. Appreciate the heads-up!
