Training our own AI models

Posted by tartieret 4 hours ago

160 points | 104 comments

gyoridavid 3 hours ago

I feel that the US should step up their legislation game and make sure these companies can't retroactively make rules to steal their users' data. I know it's trendy to hate the EU, but their legislation actually protects users, not companies' interests.

tartieret 4 hours ago

I initially used Posthog as an alternative to Google Analytics with more privacy. Now they want to use the data for a business purpose. Working hard towards enshittification?

rvz 4 hours ago

> I initially used Posthog as an alternative to Google Analytics with more privacy.

This does not make any sense.

> Now they want to use the data for a business purpose.

They raised VC money and they want a return so this was predictable.

mrits 3 hours ago

It makes perfect sense, actually.

calmbonsai 3 hours ago

LOL. You stay classy PostHog.

Henchman21 4 hours ago

You can’t “opt in” to something that is the default. The choice is made for you, and when the choice is made for you, have you really opted in or out?

scosman 3 hours ago

I would have guessed that was just a bad title here, but no, the article states it as "opted in by default".

tartieret 3 hours ago

I fixed the title, sorry for the typo!

scosman 3 hours ago

Not your fault; the article uses that language!

TZubiri 3 hours ago

Today I was thinking: if I start a company in the LLM tooling space, I would put in the company mission, in the incorporation documents, that client data will not be used for training.

The temptation and the value are too great, and the opt-in/opt-out consent thing ends up being a fuckery where the company tries to trick the user into letting it look at the data, presumably because it is selling the product at a loss and needs an alternative revenue model.

Just make it impossible from the get-go. The fine print would be that data can be shared out-of-band explicitly, in an email, or if explicitly copy-pasted into a support chatbox, but there would be no mechanism for us to read the data from the databases, much less from the client.

I don't mean it would be an airtight mechanism like Signal or ProtonMail; if a court order asked us to produce client info, we would still reserve the right to produce the data, but only exceptionally, and definitely not for training models.

OkayPhysicist 3 hours ago

More companies need to make, for lack of a better term, "oaths" of what they won't do as a company. My pitch on it is to tie it to financial penalties the company agrees to pay, somewhere in the "enough to incentivize a significant portion of our user base to sue us" territory, such that it would be financial suicide to violate them.

TZubiri 3 hours ago

Contracts and incorporation documents are designed for this; the issue is that the incumbent legal strategy is to use template documents and to reduce potential disputes to $1 in private arbitration. Essentially, legal's job is to make legal go away.

Another term I would incorporate is a seppuku term: if we get hacked, I resign and the company goes bankrupt. Anything else is the wrong attitude to computer security for companies that want to scale to global reach.

dzonga 3 hours ago

Another excellent product company destroyed, or being slowly destroyed, by VCs and the never-ending chase for 'growth'.

mikkelam 3 hours ago

The enshittification has begun. Time to move on!

slopinthebag 4 hours ago

PostHog had better transition to an AI company soon, because they are one of the SaaS products that are absolutely cooked by vibe coding. What it does is extremely amenable to LLMs, and it's also non-critical for a business, making it an excellent candidate for replacement by in-house solutions. And if it means never having to use their website again, that's even better.

I wonder if they regret going open source, considering people will be using LLMs, which have surely been trained on their code, to replace them.