Posted by elsewhen 10/26/2024
Google is trying to show they are not behind in the AI race by advertising something that is probably barely out of alpha testing. It just reinforces the idea that Gemini is still inferior to Claude and ChatGPT.
I tried Gemini once and then tried Claude. The difference was so huge that I can't imagine how Google, which created the transformer architecture, can be so far behind a tiny startup a fraction of its size.
I'm really unimpressed by the velocity of feature development from these AI orgs; I don't expect them to reach full feature parity any time soon, if at all.
As always, pick the right tool for the job. There is almost never a one-size-fits-all best selection.
Before this came along, we had tried various tools and RAG applications, and nothing compares to what Gemini delivers. And the cost is nearly nothing compared to the gains.
I feel two ways about this: on one hand, there's a lot of opportunity for fast-moving startups. On the other: how does a startup remain defensible when the giant comes along?
Copilot is the example: Microsoft announced it 19 months ago and it's still very rough around the edges, while many open-source projects are doing a fairly decent job of filling the gap in the meantime.
> I'm really unimpressed by the velocity of feature development from these AI orgs
Seems about right though. Do they even know what to build? I get the sense that they are getting an idea started and then moving mostly based on feedback. We're still very early in this game.
https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leade...
I’m not sure they’re behind; maybe they’re just focusing on different things? Being fast makes sense for a lot of use cases, large context windows are important for products like NotebookLM, and citing sources is important for safety.
(Friendly reminder not to trust everything you read on the internet)
But ChatGPT hallucinates and flounders so much on almost anything worthwhile I ask that it is simply of no worth to me as far as trustworthiness is concerned. It tries to be flowery to give an impression of being "good". It is not. Is it decent for writing quick office replies that you could quick-edit and send? I would think so. Anything more "serious"? Nope!
Gemini (I could use it via a non-Gmail throwaway-email Google account) wasn't that verbose, and it didn't wander all over. It was more restrained and didn't try too hard on things it didn't know or couldn't do anything about.
I think a lot of the reason ChatGPT seems "better" is that it is easily accessible, and the company/founder achieved all the intended viral marketing they could, including via that firing-and-rehiring saga, the ScarJo episode, et al.
In all seriousness I firmly believe they’ll embrace ads in AI responses and I see zero reason to think they wouldn’t.
Billboards on the highway are limited in scope due to safety and other reasons. A billboard can't have mechanical arms that swing about causing driver distraction. If "information safety" is becoming a thing, the equivalent of "no mechanical arms on billboards" might be enforced on AI generators? Or am I suggesting a remedy worse than the problem it solves?
Where it gets problematic is if the AI pushes a specific brand as a necessary step of the cake-making process. Suddenly it's unethical.
It would also be a problem if the AI recommended a competing brand of appliance if I were specifically asking the AI to tell me how to use XYZ brand of appliance. Kind of like how Google lets advertisers buy ads for competitor keywords, which in my opinion is grubby and borderline unethical.
Edit: it looks like I need to turn "app activity" on, which means I need to opt in to someone being allowed to read my interactions and annotate them, making them undeletable. Then I need to connect Google Workspace. Then it creates a reminder but tapping on it prompts me to install the Google Tasks app. It's absolutely clown shoes for something that could be done on the device with existing APIs.
Google’s not even pretending to care about privacy any more.
It really says something about the state of competition, the power of capital, and the level of data hoarding these megacorps enjoy. A startup fumbling this way would be dead on arrival and receive no second chances.
There’s also something very offensive about Google championing the death of ad blockers in browsers while sucking in all our data to power their invasive browser features.
This feels more like a sticky move than an innovation in what AI can do.
This is the "know more about you, better context for results" trick, which any other vendor could deliver given your "user habit profile" as RAG data. A browser hook is a great way to collect it.
Gemini is terrible. It's way worse than even GPT 3. Never mind 3.5 or Claude. It's basically useless. Even the simplest things like trivial code transformations don't work. Gemini goes rogue all the time and starts to do things it shouldn't.
I get the feeling that in desperation people at Google are hacking the metrics to make their model look good. While in reality it's just junk.
No model, and I've tried a lot of them, has such a massive gap between good benchmark performance and horrible real world performance.
If you knew this ahead of time then what value did it provide you? Put another way, once the counter party realizes that you've pushed this responsibility onto an LLM, aren't you worried that they could take advantage of this fact to produce intentionally misleading query results?
In an adversarial world what long term value could possibly exist here?
I think I should have clarified that the responses were accurate when we benchmarked them against RAG applications built on other LLMs. If the results had been poor or below a set threshold, the RAG application built on Gemini Pro would have been shelved.
>> Put another way, once the counter party realizes that you've pushed this responsibility onto an LLM, aren't you worried that they could take advantage of this fact to produce intentionally misleading query results?
This is an excellent point but inapplicable in our use case, as this was only meant for internal users, who otherwise would be opening Word and PDF documents and searching through them, or relying on their own memory.
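The benchmark-and-threshold process described above can be sketched roughly like this. Everything here is illustrative, not the commenter's actual setup: the pipeline callables, the evaluation set, and the keyword-overlap scorer are all placeholder assumptions.

```python
# Hedged sketch: score several RAG pipelines against a shared evaluation set
# and shelve any pipeline whose average score falls below a threshold.
# The pipeline functions below are stand-ins for real RAG clients.

THRESHOLD = 0.8

def score(answer: str, expected: str) -> float:
    """Crude scorer: fraction of expected keywords found in the answer."""
    keywords = expected.lower().split()
    if not keywords:
        return 0.0
    hits = sum(1 for k in keywords if k in answer.lower())
    return hits / len(keywords)

def benchmark(pipelines: dict, eval_set: list) -> dict:
    """Return the average score per pipeline over (question, expected) pairs."""
    results = {}
    for name, ask in pipelines.items():
        scores = [score(ask(q), expected) for q, expected in eval_set]
        results[name] = sum(scores) / len(scores)
    return results

if __name__ == "__main__":
    # Tiny illustrative eval set; real ones would cover many internal documents.
    eval_set = [("What is the refund window?", "30 days")]
    pipelines = {
        "gemini_rag": lambda q: "Refunds are accepted within 30 days.",
        "other_rag": lambda q: "Please contact support.",
    }
    for name, avg in benchmark(pipelines, eval_set).items():
        verdict = "keep" if avg >= THRESHOLD else "shelve"
        print(f"{name}: {avg:.2f} -> {verdict}")
```

In practice a keyword scorer is far too crude for real document QA; the point is only the shape of the comparison: same questions, multiple pipelines, a cutoff below which a pipeline gets shelved.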
FAANGs don't understand the concept of consent, let alone informed consent.