
Posted by coloneltcb 5 days ago

Guy running a Google rival from his laundry room (www.fastcompany.com)
245 points | 150 comments
evanjrowley 4 days ago|
Search websites by Ryan Pearce:

- SearchaPage - Web Search Engine https://searcha.page/

- Seek Ninja - Stealthy Search Engine https://seek.ninja/

317070 4 days ago||
https://searcha.page/s?q=blog https://seek.ninja/s?q=blog

Both of them are erroring out right now?

chiefsearchaco 4 days ago|||
Yep, it was load. Usage increased 20x week over week, especially today. I think I failed my trial by fire. I've got a good plan for scaling capacity and better UX for when it's under strain.
kitd 4 days ago|||
Were you trying them via Chrome, by any chance? ;)
jslakro 4 days ago||
firefox here and it's not working
thm 4 days ago||
I'm running one for news https://mozberg.com - not in my basement though.
cosmicgadget 4 days ago||
Where is it?
lxe 4 days ago||
This is a cool hobby project, but why is this notable? Why a FastCompany article? I'm trying to figure out anything that sets this apart from thousands of other little hobby search projects.

I understand companies like Perplexity or Brave or DuckDuckGo "rivaling Google", but building a hobby index and crawler is nice, and worthy of a "Show HN:"... but an actual media article?

gowld 4 days ago|
It's only notable as a clickbait narrative for ignorant readers -- FastCompany's target market
the_real_cher 4 days ago||
I always wondered why someone couldn't do this.

Google was invented many years ago by two guys in a dorm room. Since then there have been so many white papers and public advancements, and the underlying problem hasn't changed that much, so it seems like it could be done by a small group or an independent person.

dec0dedab0de 4 days ago||
Crawling is much more difficult than it used to be. Significantly more content is behind a login, JavaScript is required for way more than it should be, and almost the entire web is behind Cloudflare or another type of captcha.
marginalia_nu 4 days ago||
These things are actually fairly small problems.

The parts that absolutely require JS can't be reliably linked to, and nobody indexes that stuff. Most apparent SPAs serve an HTML alternative if you don't claim to be a web browser in the UA.
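To illustrate the User-Agent point: a minimal sketch, assuming Python's standard urllib, of a request that identifies itself as a bot rather than a browser. The bot name, contact URL, and target URL are hypothetical placeholders, and whether a particular site actually serves an HTML fallback to such a client varies from site to site.

```python
import urllib.request

# Hypothetical crawler identity; not the UA string of any real crawler.
CRAWLER_UA = "ExampleBot/0.1 (+https://example.com/bot-info)"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request that identifies as a bot instead of claiming to be a browser."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": CRAWLER_UA, "Accept": "text/html"},
    )


req = build_request("https://example.com/some-spa-page")
# urllib stores header names capitalized, so the key becomes "User-agent".
print(req.get_header("User-agent"))
# Fetching would be urllib.request.urlopen(req); left out so the sketch runs offline.
```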

Cloudflare and the like are also fairly easy to deal with as long as your crawler is well behaved. You can register the fingerprint and mostly get access to Cloudflare-protected websites.
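Part of being "well behaved" is honoring robots.txt. A minimal sketch using Python's standard urllib.robotparser against a hypothetical robots.txt; Crawl-delay is a non-standard directive that many sites still publish. This does not cover the fingerprint registration mentioned above, which happens outside the crawler code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; a real crawler would fetch /robots.txt per host.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

UA = "ExampleBot"  # hypothetical crawler name


def make_parser(robots_txt: str) -> RobotFileParser:
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return rp


rp = make_parser(ROBOTS_TXT)
print(rp.can_fetch(UA, "https://example.com/blog/post"))     # True
print(rp.can_fetch(UA, "https://example.com/private/page"))  # False
print(rp.crawl_delay(UA))  # 5, i.e. wait 5 seconds between requests to this host
```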

non_aligned 4 days ago|||
I think there are two factors that helped Google. First, the search engine landscape back then was absolutely abysmal. I'm sure someone will chime in saying that it's abysmal today as well, but the reality is that 99%+ of consumer searches get good results today. And that's simply because the nature of search has changed: we have billions of people using the internet, and they overwhelmingly just search for products to buy, local restaurants that offer takeout, or for familiar pop content to watch or listen to. And there's some SEO spam there, but also pretty fierce quality assurance by search engines.

Second, the internet was different: when all nerds declared that Google is good, that was CNN-grade newsworthy (and CNN used to matter a lot more back then), simply because the internet seemed kinda important, but there was no other authority on the topic. Today, that's not the case. If you need someone to opine on the internet on air, you invite some political pundit or a business analyst.

So no, I don't think you can repeat the success of Google the same way. It was a product of its time.

snek_case 4 days ago||
Google Maps is probably a big moat that's very hard to replicate. You can't as easily just crawl all of that data. It's not easy to generate directions. The average user doesn't want to use your search engine for one thing and Google for everything else, they just want a one stop shop for search.
cadamsdotcom 4 days ago||
The average user might want a one stop shop.

That's not a showstopper. It's ok to not be everything to everyone.

balder1991 4 days ago|||
We have Marginalia which serves a specific use-case: https://about.marginalia-search.com/
mdaniel 4 days ago||
That's what I was expecting this submission to be about, although to be honest I'm not certain Marginalia would want the influx of Fast Company-sized tire kicking.
marginalia_nu 4 days ago||
To be fair I'm on a colocated server now. No more apartment hosting for me.
OutOfHere 4 days ago|||
The actual underlying problem has changed altogether. Pagerank is easily gamed by SEO.

Search candidates and rankings now require assessment by LLM. Moreover, as a default, users want the results intelligently synthesized into a text response with references rather than as raw results.

Crawling too requires innovative approaches to bypass server filters.

I doubt any independent person can afford to run a vector database or LLMs at immense scale.
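To see why textbook PageRank is gameable: a minimal sketch of the classic power-iteration formulation (damping factor 0.85, dangling pages spreading their rank evenly). This is the published algorithm, not whatever Google ranks with today. The toy graph and the ten-page "link farm" are hypothetical; the point is that rank flows along links, so cheap pages that all link to one target can push it above honest pages.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iters: int = 100) -> dict[str, float]:
    """Classic PageRank by power iteration; scores sum to 1."""
    nodes = set(links) | {dst for outs in links.values() for dst in outs}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Every page gets the "random jump" share.
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            outs = links.get(v, [])
            if outs:
                # Split this page's rank evenly across its outlinks.
                share = damping * rank[v] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:
                # Dangling page: spread its rank over all pages.
                for dst in nodes:
                    new[dst] += damping * rank[v] / n
        rank = new
    return rank


# Hypothetical toy web: three honest pages in a cycle plus one page with no inlinks.
web = {"a": ["b"], "b": ["c"], "c": ["a"], "spam": []}
honest = pagerank(web)

# Hypothetical link farm: ten throwaway pages that all point at "spam".
farmed_web = {**web, **{f"farm{i}": ["spam"] for i in range(10)}}
farmed = pagerank(farmed_web)

print(honest["spam"] < honest["a"])  # True
print(farmed["spam"] > farmed["a"])  # True
```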

kcbanner 4 days ago|||
> users want the results intelligently synthesized into a text response with references rather than as raw results.

The reason I pay for Kagi is that I specifically don't want this to occur.

OutOfHere 4 days ago|||
If you pay for a service (web search) that 99.9% use for free, you're an extreme outlier, and not necessarily a justifiable one either. After all, DDG, Google and various others still have raw results for free.
Workaccount2 4 days ago|||
How much do you technologically relate to the average person on the street though?

Every person I have seen (outside the tiny tech bubble) google something has just read the AI overview without skipping a beat.

yepitwas 4 days ago|||
That's worrisome since I've seen those be for-sure wrong a pretty high percentage of the time.

[EDIT] Incidentally, are there any sites that do actual web search any more, better than Yandex? I'd rather avoid a Russian site if I can, but there are whole topics where it's impossible to find anything useful on heavily "massaged" allegedly-Web-search-but-not-really sites like Google and DDG (Bing), but I can find what I want on page 1 or 2 of a Yandex search. Is Kagi as good as that, or is their index simply ignoring a whole bunch of the Web like so many others? I don't mind paying.

degamad 4 days ago|||
Google "Web" results (not the default results you get when you search) still seem okay for me. You can force them with the udm=14 URL trick, or select the "Web" tab in the results. No AI, no images or shopping results, and slightly better text results.
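For reference, the udm=14 trick is just an extra query parameter on the search URL. A small sketch that builds such a URL with Python's standard urlencode; what udm=14 does is Google's undocumented behavior, so it may change at any time.

```python
from urllib.parse import urlencode


def google_web_only_url(query: str) -> str:
    # udm=14 selects the plain "Web" results tab (undocumented, may change).
    return "https://www.google.com/search?" + urlencode({"q": query, "udm": "14"})


print(google_web_only_url("should I wash venison tenderloin"))
# https://www.google.com/search?q=should+I+wash+venison+tenderloin&udm=14
```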
franktankbank 4 days ago|||
Yep, same here. Ask it "should I wash venison tenderloin" and you get an initial "No, because" followed by a general "yes, it's important to clean, including with water" in the longer description. Wow, a self-contradictory answer! Good job!
jkestner 4 days ago||||
We’re being force fed them. I’m an AI hater and I catch myself reading those sometimes.

Yes, people want the answer directly. Google wants you to stay on their site to read some mishmash. I think the ideal would be to immediately go to the source’s site.

throwmeaway222 4 days ago|||
At this point the web is also so centralized you only need 3 bookmarks these days (your news, YouTube and Amazon)

A search is just learning what you don't know and AI does a better job than search has ever done for me - and I'm in tech.

ricardo81 4 days ago||||
>Pagerank

Also, a lot of site owners are reluctant to link out. So much so that 'nofollow' has been reduced to a hint rather than a directive.
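Since nofollow is now only a hint, a crawler still sees those links and decides for itself how much weight to give them. A minimal sketch using Python's standard html.parser that separates followed links from ones whose rel attribute contains nofollow; the sample HTML is made up.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect <a href> values, split by whether rel contains 'nofollow'."""

    def __init__(self) -> None:
        super().__init__()
        self.followed: list[str] = []
        self.nofollow: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. "nofollow ugc".
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.nofollow.append(href)
        else:
            self.followed.append(href)


html = '<p><a href="/about">About</a> <a rel="nofollow ugc" href="https://example.org/">x</a></p>'
parser = LinkExtractor()
parser.feed(html)
print(parser.followed)  # ['/about']
print(parser.nofollow)  # ['https://example.org/']
```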

freeopinion 4 days ago||||
> users want the results intelligently synthesized into a text response with references rather than as raw results

This leads directly to another big change.

People used to submit their sites to search engines and now they might actively block search engines. So a search engine author might have to spend a lot of effort in adversarial games.

iamacyborg 4 days ago|||
> Moreover, as a default, users want the results intelligently synthesized into a text response with references rather than as raw results.

Citation needed

OutOfHere 4 days ago||
You mean all the users of chat services aren't evidence? Chat services increasingly incorporate web links for references in their responses, and this is as the users seek. The tide continues to shift from traditional search to LLM synthesis.
iamacyborg 4 days ago||
I suspect there are more users of traditional search than there are of LLM chat apps.
freeopinion 4 days ago||
I suspect that chat apps dominate (80+%?) the under-20 demographic, and have a sizable chunk of the under-30 demographic. Within the next five years it will probably represent 50+% of total search traffic. Maybe it already does. It makes sense that any search site that wants to be in the game tomorrow would keep racing down the AI chat path.
jrm4 4 days ago|||
More to the point, it's a shame that we can't collectively grok (dammit, they took that from us too) concepts like "personal" and/or "curated" directories, e.g. individual and group wikis and so forth on perhaps more directed topics with lists of good links.
cosmicgadget 4 days ago|||
Other than the obvious (but surmountable) technical challenges with crawling and indexing, trying to establish "goodness" for a given user is tough. For a blogger it will be "hey, you are reading this so you probably like what I like". That's often true but as soon as you try to have a centralized service with arbitrary users, it is hard to do anything better than filtering purely commercial content.
sdf4j 4 days ago|||
What do you mean we can't? There are a lot of curated content directories out there.
jrm4 4 days ago||
Right, I suppose I mean "getting more people to think about why a few of these bookmarked for your favorite topics, especially tied to a trustworthy person, is a million times better than just hitting up Google."

Or, perhaps, "a better Google should just take you to these."

Something like that.

CalRobert 4 days ago|||
Among other things, I think crawling is a lot harder now.
ambicapter 4 days ago|||
Google basically invented the modern cloud in order to efficiently use the hardware necessary to actually build those search engine indices. It's not really a question of implementing a good algorithm and away we go.
lif 4 days ago||
Provided they have the kind of massive government support Google has had from the get-go, sure!
_joel 4 days ago||
The photo of the power socket right next to the sink looks safe
throwway120385 4 days ago||
Looks like a GFCI. Should be fine.
chiefsearchaco 4 days ago||
I'm planning on running a cord through my wall, I just keep putting it off :D
zrobotics 4 days ago||
Absolutely don't run an extension cord through a wall; that's only slightly less of a fire hazard than storing a gasoline can on top of the server. Extension cords are normally heavily derated, designed for occasional use with ample cooling, not for being inside a wall. Better to keep it as-is, or have a dedicated 20A circuit run.
authnopuz 5 days ago||
https://archive.is/HA7y4
HardCodedBias 4 days ago||
I know that Google engineers have a cushy life but I actually find it unlikely that a guy, who isn't attempting some radical new type of search (like pagerank back in the day) can hope to compete with the orgs in Google who support search.

Again, those orgs are likely too comfortable and less productive than people would like, but we're talking about many-many thousands and depending upon how you define "the work" of search upwards of 10k.

I didn't see any new secret sauce in the article, and Google has said that since 2015 (?) Google Brain has been involved in search.

This is not to say that Google couldn't be dislodged by search via LLM or similar, that is "new" research.

freeopinion 4 days ago|
If you wrote that 100 people could outwork one person, I'd nod my head. If you wrote that 10k people could outwork 1k people, I'd shrug. If you tell me that 100 people can combine to tie my shoe faster than I can, I'd question that.

Building a state-of-the-art search engine is not shoelaces. But upwards of 10k workers is not impressive in the right direction.

One person starting out with anything at all can quickly grow into one person with one or two really innovative ideas. One or two good ideas can catch fire pretty quickly. Don't be too dismissive.

freedomben 4 days ago||
> Why the laundry room? Two reasons: Heat and noise. Pearce’s server was initially in his bedroom, but the machine was so hot, it actually made it too uncomfortable to sleep.

This is a rite of passage and a badge of honor for homelabbers/tinkerers/hackers to discover for themselves IMHO. If you haven't tried it, you should. The heat is bad enough to warrant moving it, but add the noise too, sprinkle in a few nights of bad sleep, and it becomes an effective form of torture :-D

Just don't decide to move it to a closet unless you also install some fans in there. I ended up finding a cozy spot under the staircase, which worked quite well.

tolerance 4 days ago||
The great thing about this is that with the decentralization/recentralization of the Web, it may become easier for certain people to roll their own search engines for their respective communities and crawl/index pages only according to their shared tastes.

The bad thing about this is...read above.
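A community-scoped engine like that can start very small. A minimal sketch, assuming the crawl has already produced page text for a hand-picked set of sites: an inverted index mapping each term to the pages containing it, queried with AND semantics. The URLs and page text are hypothetical, and real engines add ranking, stemming, and on-disk storage.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of URLs whose text contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in docs.items():
        for term in tokenize(text):
            index[term].add(url)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return URLs containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical pages from a hand-picked community crawl.
docs = {
    "https://example.com/a": "Homelab server noise and heat in the bedroom",
    "https://example.com/b": "Quiet fans for a homelab closet",
}
index = build_index(docs)
print(search(index, "homelab heat"))  # {'https://example.com/a'}
```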

rurban 4 days ago||
Just switched to Search Ninja as my default search engine in Firefox on Android. No tracking, faster, better than DuckDuckGo. Now I'm just looking into how to get search suggestions enabled.
risico 4 days ago|
One of my dream projects as well, sadly it feels a lot harder to crawl the internet these days, as others have said around here as well.

What are some good practices these days to ensure a good crawl/scrape? Invest in proxies, preferably residential?
