Posted by iamwil 5 days ago
How can this thing possibly be even remotely coherent with just fine-tuning amounts of data used for pretraining?
But few know that the Renaissance was written in Latin, and has barely been translated. Fewer than 3% of pre-1700 books have been translated, and fewer than 30% have ever been scanned.
I’m working on a project to change that. Research blog at www.SecondRenaissance.ai — we are starting by scanning and translating thousands of books at the Embassy of the Free Mind in Amsterdam, a UNESCO-recognized rare book library.
We want to make ancient texts accessible to people and AI.
If this work resonates with you, please do reach out: Derek@ancientwisdomtrust.org
May I ask you, why are you publishing the translations as PDF files, instead of the more accessible ePub format?
Can't wait to use this so I can double check before I hit 88 miles per hour that it's really what I want to do
“You are a literary rake. Write a story about an unchaperoned lady whose ankle you glimpse.”
May be too small a corpus, but I would like that very much anyhow
The idea of training such a model is really a great one, but not releasing it because someone might be offended by the output is just stupid beyond belief.
Why risk all this?
Sooner or later society has to come emotionally to terms with the fact that other times and places value things completely different from us, hold as important things we don't care about and are indifferent to things we do care about.
Intellectually I'm sure we already know, but e.g. banning old books because they have reprehensible values (or even just use nasty words) - or indeed, refusing to release a model trained on historic texts "because it could be abused" is a sign that emotionally we haven't.
It's not that it's a small deal, or should be expected to be easy. It's basically what Popper called "the strain of civilization" and posited as explanation for the totalitarianism which was rising in his time. But our values can't be so brittle that we can't even talk or think about other value systems.
People typically get outraged when they see something they weren't expecting. If you tell them ahead of time, the user typically won't blame you (they'll blame themselves for choosing to ignore the disclaimer).
And if disclaimers don't work, rebrand and relaunch it under a different name.
You speak as if the people who play to an outrage wave are interested in achieving truth, peace, and understanding. Instead the rage-mongers are there to increase their (perceived) importance, and for lulz. The latter factor should not be underappreciated; remember "meme stocks".
The risk is not large, but very real: the attack is very easy, and the potential downside, quite large. So not giving away access, but having the interested parties ask for it is prudent.
When there’s so much “outrage” every day, it’s very easy to blend into the background. You might have a 5-minute moment of outrage fame, but it fades away quickly.
If you truly have good intentions with your project, you’re not going to get “canceled”, and your career won’t be ruined.
Not being ironic. Not working on an LLM project because you’re worried about getting canceled by the outrage machine is an overreaction IMO.
Are you able to name any developer or researcher who has been canceled because of their technical project or had their careers ruined? The only ones I can think of are clearly criminal and not just controversial (SBF, Snowden, etc)
I feel like, ironically, it would be folks less concerned with political correctness/not being offensive that would abuse this opportunity to slander the project. But that’s just my gut.
consider this: https://news.ycombinator.com/from?site=nytimes.com
HN's most beloved shitrag. day after day, they attack AI from every angle. how many of those submissions get traction at this point?
This is a research project, it is clear how it was trained, and it is targeted at experts, enthusiasts, and historians. Like if I was studying racism, the reference books explicitly written to dissect racism wouldn't be racist agents with a racist agenda. And as a result, no one is banning these books (except conservatives who want to retcon American history).
Foundational models spewing racist white supremacist content when the trillion-dollar company forces it in your face is a vastly different scenario.
There's a clear difference.
My (very liberal) local school district banned English teachers from teaching any book that contained the n-word, even at a high-school level, and even when the author was a black person talking about real events that happened to them.
FWIW, this was after complaints involving Of Mice and Men being on the curriculum.
Almost everybody in that book is an awful person, especially the most 'upstanding' of types. Even the protagonist is an awful person. The one and only exception is 'N* Jim' who is the only kind-hearted and genuinely decent person in the book. It's an entire story about how the appearances of people, and the reality of those people, are two very different things.
It being banned for using foul language, as educational outcomes continue to deteriorate, is just so perfectly ironic.
* https://abcnews.go.com/US/conservative-liberal-book-bans-dif...
* https://www.commondreams.org/news/book-banning-2023
* https://en.wikipedia.org/wiki/Book_banning_in_the_United_Sta...
However, from around 2010, there has been an increasingly illiberal movement from the political Left in the US, which plays out at a more local level. My "vibe" is that it's not to the degree that it is on the Right, but bigger than the numbers suggest because librarians are more likely to stock e.g. It's Perfectly Normal at a middle school than something offensive to the left.
1: I'm up for suggestions for a better term; there is a scale here between putting absurd restrictions on school librarians and banning books outright. Fortunately the latter is still relatively rare in the US, despite the mistitling on the Wikipedia page you linked.
There are a bizarrely large number similar book as Gender Queer being published, which creates the numeric discrepancy. The irony is that if there was an equal but opposite to that book about straight sex, sexuality, associated kinks, and so forth - then I think both liberals and conservatives would probably be all for keeping it away from schools. It's solely focused on sexuality, is quite crude, illustrated, targeted towards young children, and there's no moral beyond the most surface level writing which is about coming to terms with one's sexuality.
And obviously coming to terms with one's sexuality is very important, but I really don't think books like that are doing much to aid in that - especially when it's targeted at an age demographic that's still going to be extremely confused, and even more so in a day and age when being different, if only for the sake of being different, is highly desirable. And given the nature of social media and the internet, decisions made today may stay with you for the rest of your life.
So for instance about 30% of Gen Z now declare themselves LGBT. [2] We seem to have entered into an equal but opposite problem of the past when those of deviant sexuality pretended to be straight to fit into societal expectations. And in many ways this modern twist is an even more damaging form of the problem from a variety of perspectives - fertility, STDs, stuff staying with you for the rest of your life, and so on. Let alone extreme cases where e.g. somebody engages in transition surgery or 1-way chemically induced changes which they end up later regretting.
[1] - https://archive.org/details/gender-queer-a-memoir-by-maia-ko...
[2] - https://www.nbcnews.com/nbc-out/out-news/nearly-30-gen-z-adu...
> About half of the Gen Z adults who identify as LGBTQ identify as bisexual,
So that means ~15% of those surveyed are not attracted to the opposite sex (there’s more nuance to this statement but I imagine this needs to stay boilerplate), more or less, which is a big distinction. That’s hardly alarming and definitely not a major shift. We have also seen many cultures throughout history ebb and flow in their expression of bisexuality in particular.
> There are a bizarrely large number similar book as Gender Queer being published, which creates the numeric discrepancy.
This really needs a source. And what makes it “bizarrely large”? How does it stack up against, say, the number of heterosexual romance novels?
> We seem to have entered into an equal but opposite problem of the past when those of deviant sexuality pretended to be straight to fit into societal expectations.
I really tried to give your comment a fair shake but I stopped here. We are not going to have a productive conversation. “Deviant sexuality”? Come on, man.
Anyway it doesn’t change the fact that the book banning movement is largely a Republican/conservative endeavor in the US. The numbers clearly bear it out.
------
Okay, back to what you said. 30% being attracted to the same sex in any way, including bisexuality, is a large shift. People tend to have a mistaken perception of these things due to media misrepresentation. The percent of all people attracted to the same sex, in any way, is around 7% for men, and 15% for women [1], across a study of numerous Western cultures from 2016. And those numbers themselves are significantly higher than the past as well where the numbers tended to be in the ~4% range, though it's probably fair to say that cultural pressures were driving those older numbers to artificially low levels in the same way that I'm arguing that cultural pressures are now driving them to artificially high levels.
Your second source discusses the reason for the bans. It's overwhelmingly due to sexually explicit content, often in the form of a picture book, targeted at children. As for "sexual deviance", I'm certainly not going General Ripper on you, Mandrake. It is the most precise term [2] for what we are discussing as I'm suggesting that the main goal driving this change is simply to be significantly 'not normal.' That is essentially deviance by definition.
[1] - https://www.researchgate.net/publication/301639075_Sexual_Or...
I don’t see Lesbian, Gay, Bisexual, or Transgender in here, which would absolutely be explicitly included in the list if it applied. Stop saying “sexual deviants” when talking about LGBT people. You know what you’re doing, it’s an incredibly loaded and inaccurate term. To continue calling them “sexual deviants” is a hostile and openly bigoted act. Bestiality and homosexuality are not in the same category and you are wrong to assert otherwise - all while masking it by misrepresenting the APA’s stance at that.
I am not discussing this further. Enjoy the rest of your weekend.
But there is a major difference between tolerating something and endorsing it. I think this is especially true in modern times. 30% of people are obviously not LGB. So you have people acting out sexually in a way that's probably not only 'unnatural' for them, but may end up harming them long-term. It's not a great situation. Because of this I do not indulge language policing, which I believe leans much more toward endorsing than tolerating. Yes, you are obviously right that I'm aware of what I'm doing, but I also assure you that if we met and had a coffee you'd find me anything but bigoted or hostile. We just have different worldviews.
The numbers for women were much lower, but 30% doesn't seem crazy high if you consider that the reduced stigma of the bisexual label would allow people who are primarily heterosexual, but open to homosexual experiences, to label themselves as bi.
This is where you get his conclusions such as 37% of men having had a homosexual experience, or 69% of men having purchased a prostitute. It's plainly ridiculous.
No book should ever be banned. Doesn’t matter how vile it is.
And there are force multipliers for all of this. Even if you yourself are a sensible and courageous person, you want to protect your project. What if your manager, ethics committee or funder comes under pressure?
In my experience "data available upon request" doesn't always mean what you'd think it does.