Posted by ph4evers 4/1/2025
Every video is transcribed to get much better transcripts than the closed captions. I filter for high-quality transcripts, and afterwards an LLM selects only plausible segments for the exercises. This seems to work well for quality control and seems reliable enough for these short exercises.
Would love your thoughts!
1) Let us keep the right sidebar permanently out, and DON'T grey out the rest of the screen. I want to be able to click on target language words and immediately see them. Like, you've given us the translated sentence, but I can't see which word is which;
2) Colour _the same words_ in both languages when doing mouseover;
3) Or just highlight BOTH as we're listening [but note issue below!];
4) Make the keyboard use a bit more intuitive - i.e. left/right obviously means "go back or forward in the video/audio", but now I have to CLICK on the yt video again to get that behavior. It should be auto so I don't have to do that. Similarly, I want to click on a word to know its meaning, but then go back to space->pause behavior. Rn clicking a word breaks that. Just adds friction for users.
5) Consider yt-dlp to save the videos so if we are studying one, and yt pulls it, we can keep using. Maybe for the roadmap.
6) Consider allowing us to add words to vocab -- and which vocab -- directly from mouseover [without clogging up UI - not sure there]. Right now it's a bit convoluted [right sidebar, which again should be permanent and integrated, not greying out the main screen - but even if that was fixed, that's a lot of mouse movement]
6) Handle idiomatic language issues better. You'll probably need another LLM pass/method for this, but IT'S a BIG ONE! Languages don't map 1:1 obviously, so for example this one:
https://app.fluentsubs.com/stream?v=cm8mnqrqe084ervb0mi6a4sa...
"genommen" was translated as "taken" <- means nothing.
I dumped it into 4o and it explains:
In the phrase „genau genommen“, the word „genommen“ is part of a fixed idiomatic expression and doesn't translate literally as "taken."
„Genau genommen“ means "strictly speaking" or "to be precise."
So the full sentence:
„Wir sind heute wieder auf der Straße unterwegs, genau genommen auf dem Flohmarkt…“
translates to:
"Today we're out on the street again — strictly speaking, at the flea market…"
It’s specifying or narrowing down what “on the street” means in this context.
**
So you'll need to pull out these idiomatic phrases and then make sure they can be analysed as a single unit, so to speak. Learners are gonna have to be acquainted with those, and now the workflow is obviously broken.
Basically just get a model to bundle them and then in the sidebar on the right that has like "drill into X" you've got the PHRASE as a unit of analysis.
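To make the "phrase as a unit" idea concrete, here's a rough sketch in Python. It assumes you already have a list of known idioms (hard-coded here; in practice it might come from an LLM pass or a curated dictionary, I don't know how the app actually does it). `KNOWN_PHRASES` and `bundle_phrases` are made-up names, and the whitespace tokenization is deliberately naive.

```python
# Hypothetical phrase table: normalized token tuple -> gloss.
# In a real pipeline this would likely be produced by a model or a dictionary.
KNOWN_PHRASES = {
    ("genau", "genommen"): "strictly speaking",
}

PUNCTUATION = ".,;:!?…„“\""


def normalize(token):
    """Lower-case a token and strip surrounding punctuation for matching."""
    return token.lower().strip(PUNCTUATION)


def bundle_phrases(tokens, phrases=KNOWN_PHRASES):
    """Greedily merge runs of tokens that match a known phrase (longest first).

    Returns a list of units; each unit is a dict with the surface text and
    a gloss (None for ordinary single words).
    """
    max_len = max((len(p) for p in phrases), default=1)
    units = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate window first, down to two tokens.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            key = tuple(normalize(t) for t in tokens[i:i + n])
            if key in phrases:
                units.append({"text": " ".join(tokens[i:i + n]),
                              "gloss": phrases[key]})
                i += n
                break
        else:
            # No phrase starts here, so keep the word as its own unit.
            units.append({"text": tokens[i], "gloss": None})
            i += 1
    return units


sentence = ("Wir sind heute wieder auf der Straße unterwegs, "
            "genau genommen auf dem Flohmarkt")
for unit in bundle_phrases(sentence.split()):
    print(unit)
```

With this, "genau genommen" comes out as one clickable unit glossed as "strictly speaking" instead of two separate words, which is the thing the sidebar could then drill into.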
1. Makes sense!
2./3. That's a bit hard, but like point 6 I think it is possible to map certain parts.
4. Makes sense
5. I put it on the roadmap but I think it is not so much of a priority now. I want to have an offline mode at a certain point (as well as a dedicated app)
6. Yes, this is hard and expensive. But I think that I should have a high quality section with proper quality control. I have some ideas to quickly create lessons as a teacher, but right now I'm mainly firefighting stability and quality.
Thanks again for the extensive write-up
Right now the sidebar greying out the foreground and not being able to stay locked REALLY breaks flow. Fixing that would mitigate it somewhat.
It’s amazing tho and I’ll subscribe soon enough.
Also, your page needs to disclose any content filtered by or generated by a model.
As for the Union Jack: the UK has at least 3 rather different languages (English, Gaelic, Welsh), possibly a few more depending on how you count the different kinds of Gaelic.
Using a country flag to represent a language has always struck me as being silly. Only rarely do they map 1-to-1.
https://www.bbc.com/travel/article/20180206-the-tiny-us-isla...
Also, for what it's worth:
> Some people have characterised Tangier’s way of speaking as ‘Elizabethan’ or ‘Restoration’ English, but that’s nonsense. Languages aren’t static and the Tangier dialect has changed a lot because of its isolation. It’s a distinct creation of its own," Shores said.
[0] https://www.bbc.com/travel/article/20190623-the-us-island-th...
Mixed with, yes, the variant spellings and word choices (e.g. chips/crisps/biscuits) that make it apparent to British English readers when something is American.
EDIT: Of course, it doesn’t matter one bit in the grand scheme of things—feel free to ignore my pedantry over a silly joke :-)
To use your example, there are plenty of Irish people who speak English but would resent being forced to identify with the Union Flag.
For another example that is very relevant today, there are plenty of Russian-speaking Ukrainians who hate Russia. Using the Russian flag to represent them would at best be distasteful.
[1] https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes
For example, the very first English video I got was a South African English accent.
Of the five languages I have configured in KDE, three of them are country-specific. So I use the flag indicator, which is far quicker for me to locate and identify out of the corner of my eye than would be a text label (which would require using the retina and thus more time and attention).
As for English, the United States has far and away the largest number of native English speakers.
Not that I think the stars and stripes has any more right to represent “English” as a concept than the Union Jack. If you’re going on origin, why not the flag of England instead?
> If you’re going on origin, why not the flag of England instead?
I actually really like that idea. The US and UK flags seem to represent more culture than language. The moral is: don't try to draw boxes around languages.
All that said, I do understand why someone would want to use flags as shorthand for language. It's wrong, but it's useful.
You would be far more likely to understand any given English speaking person in the USA than in England. It should really be called American at this point.
I picked news channels because they often have short, well-spoken videos.