Posted by ytkimirti 3 hours ago

Show HN: I made Google Trends for Hacker News by indexing 18 years of comments(hackernewstrends.com)
395 points | 113 comments
zX41ZdbW 55 minutes ago|
I host a publicly open database with Hacker News data at https://play.clickhouse.com/play?user=play#U0VMRUNUICogRlJPT...

So you can create any sort of similar services in a single SQL query and an HTML page.

I also host it as a publicly accessible data lake, which you can query from anywhere: https://github.com/ClickHouse/ClickHouse/issues/29693#issuec...

It is also updated in real-time.

linmer 26 minutes ago||
Thank you for providing this, you are a hero!!! I'm gonna try to do cool stuff with it!
tgv 27 minutes ago|||
It probably also got swamped in real-time...
linmer 14 minutes ago||
Do you mean it's not updated? You gotta sort by the update_time column. The table looks sorted, but you gotta sort it explicitly with a query like:

SELECT * FROM hackernews_history
ORDER BY update_time DESC
LIMIT 100;

And yeah, I got that from deepseek because I don't have a brain.

GeoAtreides 20 minutes ago||
oh hey, per HN terms and conditions I license my HN data only to HN. Can you please remove my data from the set? Thank you!
snowwrestler 4 minutes ago|||
Not sure if joking, but if this product is not republishing the text of your contributions (to which you hold copyright), you’re probably not going to convince a court to do anything here.

Generally speaking it is not a violation to scrape, index, and analyze web content as long as you don’t republish copyrighted content without a license, or violate access controls. For example: search engine indexes.

linmer 9 minutes ago||||
Wait, so I have to ask for every single person's permissions to use this data?

uhhhhhhhhhhhhhhhhhhhhhh

pelagicAustral 19 minutes ago||||
You must be fun at parties
moralestapia 15 minutes ago|||
By uploading any User Content you hereby grant and will grant Y Combinator and its affiliated companies a nonexclusive, worldwide, royalty free, fully paid up, transferable, sublicensable, perpetual, irrevocable license to copy, display, upload, perform, distribute, store, modify and otherwise use your User Content for any Y Combinator-related purpose in any form, medium or technology now known or later developed.

@zX41ZdbW, you can safely ignore this guy.

@GeoAtreides, next time read the actual terms of service before hallucinating.

codingdave 5 minutes ago|||
> for any Y Combinator-related purpose

That is actually the key phrase. HN can provide the API, no problem. People can consume the API, no problem. But I'd ask an attorney if API consumers can then re-release the data for purposes not related to YC. By my reading, they cannot.

moralestapia 3 minutes ago||
You might want to read it again, then:

https://opensource.org/license/mit

codingdave 2 minutes ago||
That is about the software, not the data.
GeoAtreides 12 minutes ago|||
>Y Combinator and its affiliated companies

is zX41ZdbW either?

moralestapia 11 minutes ago||
Oh, now I see my comment might be a bit harsh.

I didn't consider you might not know about:

https://github.com/hackernews/api

GeoAtreides 7 minutes ago||
yes, and per HN terms and conditions only YC and YC affiliated (as you quoted) can use the api legally. I don't license my content to anyone else and so it shouldn't be used by anyone else, even if it's available on a free-for-all API (nice move HN, btw).
moralestapia 6 minutes ago||
https://github.com/HackerNews/API/blob/master/LICENSE

It's right there, you just have to click the link I shared ...

Aachen 59 minutes ago||
Google Trends is about searches

This is about published text. More like if Google Trends counted word occurrences on webpages. Or if Google Ngrams counted webpages instead of books

People don't write much about non-newsworthy things whereas many people search "burger" anytime they want a burger delivery. The datasets aren't usable in the same way

Edit: not to say it's not a cool product! Just keep this in mind and enjoy using it :)

Aachen 39 minutes ago||
Someone asked an imo good question (that I was going to vouch for, idk why it was dead), but deleted it. Not sure why, so I won't credit the username in case they don't want that, and I changed some words to avoid stylometric matching

> The concept seems pretty comparable. From the title I had a good idea of what it was; when clicking on it, the visual presentation felt familiar & intuitive.
>
> Being a little less literal can be useful!

That's why I'm pointing it out: the title leads you to think they're the same metric, the page looks visually similar, and so you treat it as the same data type; but when you read the data through this lens, you draw wrong conclusions. It took me a while, scrolling down the examples, before I realised why it felt so off and that my mindset is wrong. It's what's being written about currently, not what people on HN are actually looking for

It's indeed not about being nonliteral, it's for me about having been confused about the data being shown

john_strinlai 34 minutes ago||
>Someone asked an imo good question but deleted it. Not sure why

it was me, and i deleted it because i realized my last sentence "being a little less literal can be useful" came across as unnecessarily blunt, which i didn't want. but i wasnt sure how to express what i wanted to say without it being that way. so i deleted it while rethinking my phrasing, and rethinking your comment.

in the end, i kind of came around to understand where you were coming from, so i didnt bother to recomment.

Aachen 33 minutes ago||
Thanks! Didn't come across like that to me though, all good
morkalork 48 minutes ago|||
Now if Algolia had a dataset of what people are searching for on HN that'd be it
Aachen 42 minutes ago||
Was considering that as well, but I doubt that people use Algolia in the same way that they use Google
gslepak 25 minutes ago||
Very cool! There seems to be a bug here: https://hackernewstrends.com/?q=vim&q=emacs&q=zed

For some reason the results cut off at 2018-10 even though "Popular Comparisons" preview shows more.

dacox 1 minute ago||
very cool! not sure if something is broken, but there seems to be no data past 2019 on any of the queries that i can see
kaelyx 1 hour ago||
Hello, /api/hn -> 502 {"error":"Your database has been temporarily rate-limited, please contact support@upstash.com for further details."}
simonpure 2 hours ago||
Hug of death

` /api/hn -> 504 An error occurred with your deployment FUNCTION_INVOCATION_TIMEOUT cle1::c8vgv-1782399959042-aeba3cae05ff `

docheinestages 2 hours ago||
If this project is an ad for their product (Upstash, promising "Highly Available, Infinitely Scalable"), then the last thing they'd want is a hug of death :/
ryan_n 2 hours ago|||
Oof that would be hilarious/tragic
steve1977 1 hour ago|||
Downstash
y1n0 1 hour ago||
Must stash
superxpro12 1 hour ago|||
/api/hn -> 502 {"error":"Your database has been temporarily rate-limited, please contact support@upstash.com for further details."}
Roonerelli 1 hour ago|||
I get

/api/hn -> 502 {"error":"Search entry should have an initialized schema, command was: [\"SEARCH.AGGREGATE\",\"hn\",\"{\\\"$or\\\":[{\\\"title\\\":{\\\"$eq\\\":\\\"anthropic\\\",\\\"$boost\\\":5}},{\\\"text\\\":{\\\"$eq\\\":\\\"anthropic\\\"}}]}\",\"{\\\"by_month\\\":{\\\"$dateHistogram\\\":{\\\"field\\\":\\\"time\\\",\\\"fixedInterval\\\":\\\"30d\\\"}},\\\"top_authors\\\":{\\\"$terms\\\":{\\\"field\\\":\\\"by\\\",\\\"size\\\":6}},\\\"by_type\\\":{\\\"$terms\\\":{\\\"field\\\":\\\"type\\\",\\\"size\\\":4}}}\"]"}

jjordan 2 hours ago|||
back in my day we called this a good ole' fashioned slashdotting.
lysace 1 hour ago|||
Our startup (~20 people) got slashdotted in 1998 or so. I was the only one randomly awake at the time. Remember watching all the logs from our web server in realtime, ready to immediately kill anything or anyone threatening the overall availability.

512 kbps uplink, I think. Even accidental DoS was trivial. We had a self-hosted little data center at our office with the only available stupidly expensive commercial connection.

Felt some dread having to restart the main (async, single-process) web server a few times to keep things going due to bugs in our code. So many* people on dial-up patiently waiting for the page to load.

It was exhilarating though :).

*) Surely at least a hundred!

mysterydip 1 hour ago||
One of the things I love about HN is having stories like this in the comments from otherwise random unassuming usernames
Onavo 1 hour ago|||
It's funny that these days the bottleneck is usually the data layer. Servers are so powerful now that even your average $5 server can handle HN levels of load if configured correctly.
ytkimirti 2 hours ago|||
We will be with you shortly :)
aNapierkowski 2 hours ago||
yeah we killed it :(
kpw94 1 hour ago||
The huge spike of "lk-99" in science & frontier tech is amusing...

This is a cool concept. I would love a positive/negative sentiment score computed for each comment that refers to a given word, so you can see trends of "cloudflare (positive)" vs "cloudflare (negative)". The first one counts comments only if the sentiment score is greater than, say, 0.6, and the other one counts comments only if it is less than 0.4 (assuming a [0,1] sentiment score).
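A minimal sketch of that thresholding, assuming every comment already carries a sentiment score in [0, 1]. How the score is produced is out of scope here, and `count_by_sentiment` and the sample data are hypothetical names made up for illustration, not anything the site exposes:

```python
from collections import Counter


def count_by_sentiment(comments, positive_above=0.6, negative_below=0.4):
    """Count comments per month in a positive and a negative bucket.

    `comments` is an iterable of (month, score) pairs, where `score` is an
    assumed precomputed sentiment value in [0, 1]. Scores that fall between
    the two thresholds are treated as neutral and not counted at all.
    """
    positive = Counter()
    negative = Counter()
    for month, score in comments:
        if score > positive_above:
            positive[month] += 1
        elif score < negative_below:
            negative[month] += 1
    return positive, negative


# Made-up sample: (month, sentiment score) for comments mentioning one term.
sample = [
    ("2024-01", 0.9),  # positive
    ("2024-01", 0.2),  # negative
    ("2024-01", 0.5),  # neutral, ignored
    ("2024-02", 0.7),  # positive
]
positive, negative = count_by_sentiment(sample)
```

The two counters could then be plotted as separate series, one for "term (positive)" and one for "term (negative)".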

Petersipoi 14 minutes ago||
It's funny how "trump" dwarfs just about any other term. Truly a hacker forum.
throwaway29812 14 minutes ago|
[dead]
arjie 1 hour ago||
One useful feature would be to normalize by total so that I can see changes in something as opposed to just total site growth. Right now I have to chart a single generic parameter but if I pick poorly it’ll confuse the issue.
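A rough sketch of that normalization, assuming you have monthly counts for the term and monthly totals for the whole site. `normalize_by_total` and the numbers below are hypothetical, for illustration only:

```python
def normalize_by_total(term_counts, total_counts):
    """Return the term's share of all comments for each month.

    Both arguments map a month label to a count. Months with a zero total
    are skipped so the division is always defined; months where the term
    never appears get a share of 0.0.
    """
    shares = {}
    for month, total in total_counts.items():
        if total == 0:
            continue
        shares[month] = term_counts.get(month, 0) / total
    return shares


# Made-up numbers: raw mentions double, but the site grows fourfold,
# so the term's share of the conversation actually halves.
term = {"2015-01": 10, "2020-01": 20}
total = {"2015-01": 1000, "2020-01": 4000}
shares = normalize_by_total(term, total)
```

Charting `shares` instead of the raw counts separates a change in interest from overall site growth.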
apitman 5 minutes ago|
I'd love to see the opposite as well, ie how much has HN grown over time.
smalltorch 2 hours ago|
Reminds me of this side project I'm working on.

https://gitlab/here_forawhile/torum

It's an HN clone that syncs with HN and lets you establish smaller private communities that can discuss anything that's on HN without actually being on HN.

It also indexes the DB and lets you search through it, which I find really useful for finding things that peak my interest.

hk__2 1 hour ago||
Fixed link: https://gitlab.com/here_forawhile/torum
all2 1 hour ago||
*pique

'peak' refers to the top of a thing, commonly mountains

smalltorch 14 minutes ago||
*find things that align with my interest peaks