Guy running a Google rival from his laundry room

Posted by coloneltcb 9/10/2025

Guy running a Google rival from his laundry room (www.fastcompany.com)

246 points | 151 comments

evanjrowley 9/10/2025|

Search websites by Ryan Pearce:

- SearchaPage - Web Search Engine https://searcha.page/

- Seek Ninja - Stealthy Search Engine https://seek.ninja/

317070 9/10/2025||

https://searcha.page/s?q=blog https://seek.ninja/s?q=blog

Both of them are erroring out right now?

chiefsearchaco 9/10/2025|||

Yep, it was load. Usage increased 20x week over week, especially today. I think I failed my trial by fire. Got a good plan for scaling capacity and for better UX when it's under strain.

kitd 9/10/2025|||

Were you trying them via Chrome, by any chance? ;)

jslakro 9/10/2025||

Firefox here and it's not working

thm 9/10/2025||

I'm running one for news https://mozberg.com - not in my basement though.

cosmicgadget 9/10/2025||

Where is it?

lxe 9/10/2025||

This is a cool hobby project, but why is this notable? Why a FastCompany article? I'm trying to figure out anything that sets this apart from thousands of other little hobby search projects.

I understand companies like Perplexity or Brave or DuckDuckGo "rivaling Google", but building a hobby index and crawler is nice, and worthy of a "Show HN:"... but an actual media article?

gowld 9/10/2025|

It's only notable as a clickbait narrative for ignorant readers -- FastCompany's target market

the_real_cher 9/10/2025||

I always wondered why someone couldn't do this.

Google was invented many years ago by two guys in a dorm room. Since then there have been so many white papers and advancements in the public sphere, and the underlying problem hasn't changed that much, so it seems like it could be done by a small group or an independent person.

dec0dedab0de 9/10/2025||

Crawling is much more difficult than it used to be. Significantly more content is behind a login, JavaScript is required for way more than it should be, and almost the entire web is behind Cloudflare or another type of captcha.

marginalia_nu 9/10/2025||

These things are actually fairly small problems.

The parts that absolutely require JS can't be reliably linked to, and nobody indexes that stuff. Most apparent SPAs serve an HTML alternative if you don't claim to be a web browser in the UA.

Cloudflare and the like are also fairly easy to deal with as long as your crawler is well behaved. You can register the fingerprint and mostly get access to Cloudflare-protected websites.
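What "well behaved" means in practice varies, but two common baselines are honoring robots.txt and spacing out requests to the same host. Here is a minimal sketch using Python's standard-library `urllib.robotparser`, assuming the robots.txt body has already been downloaded. The bot name, the example robots.txt, and the `PoliteFetcher` class are hypothetical, and the fingerprint registration mentioned in the comment is outside its scope.

```python
import urllib.robotparser

# Hypothetical bot identity: an honest, non-browser UA with a contact URL.
USER_AGENT = "ExampleHobbyBot/0.1 (+https://example.com/bot)"

# A robots.txt body assumed to have been fetched already (hypothetical content).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_parser(robots_txt):
    """Parse an already-downloaded robots.txt body."""
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp


class PoliteFetcher:
    """Enforces robots.txt Disallow rules and a per-host delay between requests."""

    def __init__(self, rp, default_delay=1.0):
        self.rp = rp
        self.default_delay = default_delay
        self.last_hit = {}  # host -> time of our last request to it

    def allowed(self, url):
        return self.rp.can_fetch(USER_AGENT, url)

    def delay(self):
        # Honor Crawl-delay if the site sets one, else use our own default.
        d = self.rp.crawl_delay(USER_AGENT)
        return float(d) if d is not None else self.default_delay

    def wait_time(self, host, now):
        """Seconds to wait before hitting `host` again (0.0 if we may go now)."""
        last = self.last_hit.get(host)
        if last is None:
            return 0.0
        return max(0.0, last + self.delay() - now)

    def record(self, host, now):
        self.last_hit[host] = now
```

A real crawler would call `wait_time` before each request and `record` after it; keeping state per host rather than globally lets it crawl many sites in parallel without hammering any one of them.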

non_aligned 9/10/2025|||

I think there are two factors that helped Google. First, the search engine landscape back then was absolutely abysmal. I'm sure someone will chime in saying that it's abysmal today as well, but the reality is that 99%+ of consumer searches get good results today. And that's simply because the nature of search has changed: we have billions of people using the internet, and they overwhelmingly just search for products to buy, local restaurants that offer takeout, or for familiar pop content to watch or listen to. And there's some SEO spam there, but also pretty fierce quality assurance by search engines.

Second, the internet was different: when all nerds declared that Google is good, that was CNN-grade newsworthy (and CNN used to matter a lot more back then), simply because the internet seemed kinda important, but there was no other authority on the topic. Today, that's not the case. If you need someone to opine on the internet on air, you invite some political pundit or a business analyst.

So no, I don't think you can repeat the success of Google the same way. It was a product of its time.

snek_case 9/10/2025||

Google Maps is probably a big moat that's very hard to replicate. You can't as easily just crawl all of that data. It's not easy to generate directions. The average user doesn't want to use your search engine for one thing and Google for everything else, they just want a one stop shop for search.

cadamsdotcom 9/11/2025||

The average user might want a one stop shop.

That's not a showstopper. It's ok to not be everything to everyone.

balder1991 9/10/2025|||

We have Marginalia which serves a specific use-case: https://about.marginalia-search.com/

mdaniel 9/10/2025||

That's what I was expecting this submission to be about, although to be honest I'm not certain that Marginalia would want the influx of Fast Company-sized tire-kicking.

marginalia_nu 9/10/2025||

To be fair I'm on a colocated server now. No more apartment hosting for me.

OutOfHere 9/10/2025|||

The actual underlying problem has changed altogether. PageRank is easily gamed by SEO.

Search candidates and rankings now require assessment by LLM. Moreover, as a default, users want the results intelligently synthesized into a text response with references rather than as raw results.

Crawling too requires innovative approaches to bypass server filters.

I doubt any independent person can afford to run a vector database or LLMs at immense scale.
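The claim that PageRank is easily gamed can be illustrated with the textbook formulation computed by power iteration. This is a minimal sketch, not Google's actual ranking: it assumes the classic damping factor of 0.85 and spreads the rank of pages with no outlinks evenly over all pages (one common convention). The page names and the "link farm" are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links: dict mapping each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outlinks.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for t in pages:
                    new[t] += share
        rank = new
    return rank


# A small hypothetical web where "spam" has no inbound links.
HONEST = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["c"], "spam": ["a"]}

# The same web plus ten hypothetical link-farm pages pointing at "spam".
FARMED = dict(HONEST)
for i in range(10):
    FARMED[f"farm{i}"] = ["spam"]

before = pagerank(HONEST)["spam"]  # about 0.03
after = pagerank(FARMED)["spam"]   # about 0.095
```

Ten throwaway pages roughly triple the target's score, which is why link-based signals on their own invite link farms.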

kcbanner 9/10/2025|||

> users want the results intelligently synthesized into a text response with references rather than as raw results.

The reason I pay for Kagi is that I specifically don't want this to occur.

OutOfHere 9/10/2025|||

If you pay for a service (web search) that 99.9% use for free, you're an extreme outlier, and not necessarily a justifiable one either. After all, DDG, Google and various others still have raw results for free.

Workaccount2 9/10/2025|||

How much do you technologically relate to the average person on the street though?

Every person I have seen (outside the tiny tech bubble) google something has just read the AI overview without skipping a beat.

yepitwas 9/10/2025|||

That's worrisome since I've seen those be for-sure wrong a pretty high percentage of the time.

[EDIT] Incidentally, are there any sites that do actual web search any more, better than Yandex? I'd rather avoid a Russian site if I can, but there are whole topics where it's impossible to find anything useful on heavily "massaged" allegedly-Web-search-but-not-really sites like Google and DDG (Bing), but I can find what I want on page 1 or 2 of a Yandex search. Is Kagi as good as that, or is their index simply ignoring a whole bunch of the Web like so many others? I don't mind paying.

degamad 9/10/2025|||

Google "Web" results (not the default results you get when you search) still seem okay for me. You can force them with the udm=14 url trick, or select the "Web" tab in the results. No AI, no images or shopping results, and slightly better text results.
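The `udm=14` trick amounts to adding one query parameter to a normal Google search URL. This is a small sketch, assuming the parameter keeps behaving as the comment describes; it is not an officially documented API and could change. The helper name is hypothetical.

```python
from urllib.parse import urlencode


def google_web_only_url(query):
    """Build a Google search URL that requests the plain "Web" results tab.

    Assumes udm=14 still selects the Web-only view (undocumented, may change).
    """
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": "14"})
```

Browsers that support custom search engines accept the same pattern with `%s` in place of the query, which makes the Web tab the default.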

franktankbank 9/10/2025|||

Yep, same here. Ask it "should I wash venison tenderloin" and you get an initial "No, because" followed by a general "yes, it's important to clean it, including with water" in the longer description. Wow, a self-contradictory answer! Good job!

jkestner 9/10/2025||||

We’re being force fed them. I’m an AI hater and I catch myself reading those sometimes.

Yes, people want the answer directly. Google wants you to stay on their site to read some mishmash. I think the ideal would be to immediately go to the source’s site.

throwmeaway222 9/10/2025|||

At this point the web is also so centralized you only need 3 bookmarks these days (your news, YouTube and Amazon)

A search is just learning what you don't know and AI does a better job than search has ever done for me - and I'm in tech.

ricardo81 9/10/2025||||

>Pagerank

Also, a lot of site owners are reluctant to link out. So much so that 'nofollow' has been reduced to a hint rather than a directive.
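For a hobby crawler, honoring `nofollow` starts with telling those links apart when extracting them. Here is a minimal sketch using Python's standard-library `html.parser`; it also treats `ugc` and `sponsored`, which Google introduced alongside the 2019 change to hint status, as hints. Whether hinted links are dropped or just down-weighted is a policy choice left open here, and the class name is hypothetical.

```python
from html.parser import HTMLParser

# rel tokens treated as "don't pass ranking credit" hints.
HINT_RELS = {"nofollow", "ugc", "sponsored"}


class LinkExtractor(HTMLParser):
    """Collects <a href> targets, separating normal links from rel-hinted ones."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.hinted = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. rel="nofollow noopener".
        rels = set((attr.get("rel") or "").lower().split())
        if rels & HINT_RELS:
            self.hinted.append(href)
        else:
            self.followed.append(href)
```

Usage is `p = LinkExtractor(); p.feed(html)`, after which `p.followed` and `p.hinted` hold the raw `href` values (still relative, so resolve them with `urllib.parse.urljoin` before queuing).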

freeopinion 9/10/2025||||

> users want the results intelligently synthesized into a text response with references rather than as raw results

This leads directly to another big change.

People used to submit their sites to search engines and now they might actively block search engines. So a search engine author might have to spend a lot of effort in adversarial games.

iamacyborg 9/10/2025|||

> Moreover, as a default, users want the results intelligently synthesized into a text response with references rather than as raw results.

Citation needed

OutOfHere 9/10/2025||

You mean all the users of chat services aren't evidence? Chat services increasingly incorporate web links for references in their responses, and this is as the users seek. The tide continues to shift from traditional search to LLM synthesis.

iamacyborg 9/10/2025||

I suspect there are more users of traditional search than there are of LLM chat apps.

freeopinion 9/10/2025||

I suspect that chat apps dominate (80+%?) the under-20 demographic, and have a sizable chunk of the under-30 demographic. Within the next five years it will probably represent 50+% of total search traffic. Maybe it already does. It makes sense that any search site that wants to be in the game tomorrow would keep racing down the AI chat path.

jrm4 9/10/2025|||

More to the point, it's a shame that we can't collectively grok (dammit, they took that from us too) concepts like "personal" and/or "curated" directories, e.g. individual and group wikis and so forth on perhaps more directed topics with lists of good links.

cosmicgadget 9/10/2025|||

Other than the obvious (but surmountable) technical challenges with crawling and indexing, trying to establish "goodness" for a given user is tough. For a blogger it will be "hey, you are reading this so you probably like what I like". That's often true but as soon as you try to have a centralized service with arbitrary users, it is hard to do anything better than filtering purely commercial content.

sdf4j 9/10/2025|||

What do you mean we can't? There are a lot of curated content directories out there.

jrm4 9/10/2025||

Right, I suppose I mean "getting more people to think about why a few of these bookmarked for your favorite topics, especially tied to a trustworthy person, is a million times better than just hitting up Google."

Or, perhaps, "a better Google should just take you to these."

Something like that.

CalRobert 9/10/2025|||

Among other things, I think crawling is a lot harder now.

ambicapter 9/10/2025|||

Google basically invented the modern cloud in order to efficiently use the hardware necessary to actually build those search engine indices. It's not really a question of implementing a good algorithm and away we go.

lif 9/10/2025||

Provided they have the kind of massive government support Google has had from the get-go, sure!

_joel 9/10/2025||

The photo of the power socket right next to the sink looks safe

throwway120385 9/10/2025||

Looks like a GFCI. Should be fine.

chiefsearchaco 9/10/2025||

I'm planning on running a cord through my wall, I just keep putting it off :D

zrobotics 9/11/2025||

Absolutely don't run an extension cord through a wall; that's only slightly less of a fire hazard than storing a gasoline can on top of the server. Extension cords are normally heavily derated, since they expect occasional use and ample cooling from not being inside a wall. Better to keep it as-is, or have a dedicated 20A circuit run.

authnopuz 9/10/2025||

https://archive.is/HA7y4

HardCodedBias 9/10/2025||

I know that Google engineers have a cushy life, but I actually find it unlikely that a guy who isn't attempting some radical new type of search (like PageRank back in the day) can hope to compete with the orgs in Google that support search.

Again, those orgs are likely too comfortable and less productive than people would like, but we're talking about many, many thousands of people, and depending upon how you define "the work" of search, upwards of 10k.

I didn't see any new secret sauce in the article, and Google has said that Google Brain has been involved in search since 2015 (?).

This is not to say that Google couldn't be dislodged by search via LLM or similar; that is "new" research.

freeopinion 9/10/2025|

If you wrote that 100 people could outwork one person, I'd nod my head. If you wrote that 10k people could outwork 1k people, I'd shrug. If you tell me that 100 people can combine to tie my shoe faster than I can, I'd question that.

Building a state-of-the-art search engine is not shoelaces. But upwards of 10k workers is not impressive in the right direction.

One person starting out with anything at all can quickly grow into one person with one or two really innovative ideas. One or two good ideas can catch fire pretty quickly. Don't be too dismissive.

freedomben 9/10/2025||

> Why the laundry room? Two reasons: Heat and noise. Pearce’s server was initially in his bedroom, but the machine was so hot, it actually made it too uncomfortable to sleep.

This is a rite of passage and a badge of honor for homelabbers/tinkerers/hackers to discover for themselves IMHO. If you haven't tried it, you should. The heat is bad enough to warrant moving it, but add the noise too, sprinkle in a few nights of bad sleep, and it becomes an effective form of torture :-D

Just don't decide to move it to a closet unless you also install some fans in there. I ended up finding a cozy spot under the staircase, which worked quite well.

tolerance 9/10/2025||

The great thing about this is that with the decentralization/recentralization of the Web, it may become easier for certain people to roll their own search engines for their respective communities and crawl/index pages only according to their shared tastes.

The bad thing about this is...read above.
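A community search engine like the one described might start with a crawler that only follows links onto an agreed allowlist of sites. Below is a minimal breadth-first sketch under that assumption. `get_links` stands in for fetching and parsing a page, and the hosts and link graph are hypothetical; a real crawler would also need the robots.txt checks and rate limiting discussed elsewhere in the thread.

```python
from collections import deque
from urllib.parse import urlparse


def focused_crawl(seeds, get_links, allowed_hosts, max_pages=100):
    """Breadth-first crawl that only visits URLs on allowlisted hosts.

    get_links(url) -> list of outbound URLs (a stand-in for fetch + parse).
    Returns the visited URLs in crawl order.
    """
    seen = set()
    order = []
    queue = deque(seeds)
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        if urlparse(url).hostname not in allowed_hosts:
            continue  # off-community link: never fetched
        order.append(url)
        for link in get_links(url):
            if link not in seen:
                queue.append(link)
    return order


# Hypothetical link graph standing in for the live web.
GRAPH = {
    "https://blog.example/": ["https://blog.example/post1", "https://ads.example/buy"],
    "https://blog.example/post1": ["https://wiki.example/page", "https://blog.example/"],
    "https://wiki.example/page": [],
}
```

Calling `focused_crawl(["https://blog.example/"], lambda u: GRAPH.get(u, []), {"blog.example", "wiki.example"})` visits the blog, its post, and the wiki page, and skips the ad host entirely.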

risico 9/10/2025||

One of my dream projects as well, sadly it feels a lot harder to crawl the internet these days, as others have said around here as well.

What are some good practices these days to ensure a good crawl/scrape? Invest in proxies, preferably residential?

mooiedingen 9/10/2025|

Nothing new, as it has been done before, and the concept is simple enough. Step 1: an indexer, e.g. Solr/Lucene. Step 2: a crawler, of which there are several FOSS ones, or build one yourself. Or you just run YaCy, which combines both, hook it up to an old-school Searx instance, and you will be granted the title of seeker by the spirit of Fravia+, who was elder of the searchlores!!! Not only will you filter out crap made by machine learning models, but thou shalt find what thou seekest!

I refuse to call a 16-line for loop, triggering on tokenized data loaded in memory (where the data can be anything from a scientific paper hallucinated by a chatbot to a message between two lovers), anything intelligent, for it is not intelligence but a blob of tokenized fcking data in memory getting triggered for an output by a derp with a 16-line for loop!!!
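The "indexer" step rests on an inverted index: a map from each term to the documents containing it, which is also the core structure inside engines such as Lucene. Here is a toy sketch assuming simple lowercase alphanumeric tokenization and AND semantics for queries; it does no ranking, stemming, or on-disk storage, all of which real indexers handle.

```python
import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase and split on anything that isn't a letter or digit."""
    return TOKEN_RE.findall(text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids
        self.docs = {}                    # doc id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return sorted ids of docs containing every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        # .get avoids inserting empty postings for unknown terms.
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)
```

Intersecting posting sets is what keeps lookups fast: a query only touches the documents listed under its terms instead of scanning every stored page.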

rurban 9/11/2025|

Xapian is easier and faster. No Java memory eater.

I once built a good company-wide search engine with custom crawlers and result hooks, e.g. into crazy SAP or other ticket systems. Gmane was also legendary.
