Posted by ph4evers 4 days ago
Every video is transcribed to get much better transcripts than the closed captions. I filter for high-quality transcripts, and afterwards an LLM selects only plausible segments for the exercises. This seems to work well for quality control and seems to be reliable enough for these short exercises.
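For readers who want a concrete picture, here is a rough sketch of what a two-stage filter like this could look like. It is not the actual fluentsubs code: the `Segment` type, the `confidence` field, the `0.8` threshold, and the `looks_like_sentence` stand-in for the LLM judgment are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Segment:
    text: str
    confidence: float  # hypothetical per-segment transcription confidence in [0, 1]


def transcript_quality(segments: Iterable[Segment]) -> float:
    """Mean confidence over all segments; 0.0 for an empty transcript."""
    segs = list(segments)
    if not segs:
        return 0.0
    return sum(s.confidence for s in segs) / len(segs)


def select_segments(
    segments: List[Segment],
    is_plausible: Callable[[str], bool],
    min_quality: float = 0.8,  # assumed threshold; may need tuning per language
) -> List[Segment]:
    """Reject the whole transcript if its quality is low, then keep plausible segments."""
    if transcript_quality(segments) < min_quality:
        return []
    return [s for s in segments if is_plausible(s.text)]


def looks_like_sentence(text: str) -> bool:
    """Placeholder for the LLM check: 3 to 15 words, ending in sentence punctuation."""
    words = text.split()
    return 3 <= len(words) <= 15 and text.rstrip().endswith((".", "!", "?"))
```

In a real pipeline `is_plausible` would wrap a call to an LLM; passing it in as a function keeps the filter testable without one.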
Would love your thoughts!
https://app.fluentsubs.com/exercises/cm8v909oq00fj9x1kztl1ez...
What are your long-term plans with this? I'd love at some point to be able to combine something like this with an algorithm I'm working on called Guided Immersion.
Basically, the system tracks what words you know and don't know, and so could tell you how hard a given sentence is for you. And it also tracks what words it would be useful to review and/or learn (spaced repetition and frequency analysis), to tell you how valuable a sentence would be for you.
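To make the idea concrete, a minimal sketch of per-learner sentence scoring might look like the following. This is not the Guided Immersion algorithm itself, only an illustration of the two signals described above. The function names, the `freq` weights, and the scoring formulas are assumptions, and sentences are passed in already split into words because tokenization differs by language.

```python
from typing import Dict, List, Set


def difficulty(words: List[str], known: Set[str]) -> float:
    """Fraction of words in the sentence the learner does not know yet."""
    if not words:
        return 0.0
    unknown = [w for w in words if w not in known]
    return len(unknown) / len(words)


def value(
    words: List[str],
    known: Set[str],
    due: Set[str],           # known words that spaced repetition says to review
    freq: Dict[str, float],  # assumed frequency weight per word; higher is more common
) -> float:
    """Sum of frequency weights over distinct words that are new or due for review."""
    useful = {w for w in words if w not in known or w in due}
    return sum(freq.get(w, 0.0) for w in useful)
```

A content provider could then prefer sentences with low `difficulty` and high `value` for a given learner.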
The algorithm is generic and can be adapted to any language; right now it's been adapted to Mandarin Chinese, Korean, and New Testament Greek. (Which unfortunately so far doesn't seem to overlap with any of your available languages.) I'm working on an API to allow any content provider to use the algorithm.
Adding this to your system could help focus the content you're showing people on things that they're likely to be able to understand without having to look up most words, and help them incrementally grow and solidify their vocabulary using the built-in spaced repetition.
Drop me a line if you want to chat at some point -- my email is in my about.
I only checked English, French, Dutch and German and assumed that Spanish would be OK. Was this for drag & drop? And do you maybe have the video? Maybe I need to tune the quality threshold specifically for Spanish videos.
Question: out of the processing steps you mention (transcription, quality filtering, segment selection, and, I guess, wrong-word selection), are there any truly manual steps? Those would be the ones that prevent you from building this for just about any language that has good transcription available, right?
So this is a welcome tool I am definitely gonna check out.
I focussed a lot on European languages at first, so the support for the Asian languages is a bit lacking. The only thing I've done so far is change the font and increase the font size. There is a lot more to do! Thanks!