Posted by ph4evers 4 days ago
Every video is transcribed to get much better transcripts than the closed captions. I filter for high-quality transcripts, and then an LLM selects only plausible segments for the exercises. This seems to work well for quality control and is reliable enough for these short exercises.
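Roughly, the pipeline looks like this (a minimal sketch only: the function names, the confidence threshold, and the stand-in for the LLM pass are illustrative, not the actual code):

```python
# Hypothetical sketch of the two-stage quality filter described above.
# Thresholds and the "LLM" stand-in are assumptions, not the real system.

def filter_high_quality(transcripts, min_confidence=0.9):
    """Keep only transcripts whose ASR confidence clears a threshold."""
    return [t for t in transcripts if t["confidence"] >= min_confidence]

def llm_selects_plausible(segments):
    """Stand-in for the LLM pass: keep short, self-contained segments.
    The real system would prompt a model to judge each segment."""
    return [s for s in segments if 3 <= len(s["text"].split()) <= 15]

def build_exercises(transcripts):
    good = filter_high_quality(transcripts)
    segments = [seg for t in good for seg in t["segments"]]
    return llm_selects_plausible(segments)

transcripts = [
    {"confidence": 0.95, "segments": [
        {"text": "Le chat dort sur le canapé."},
        {"text": "Euh..."},  # filler, too short to be an exercise
    ]},
    {"confidence": 0.60, "segments": [
        {"text": "Garbled low-confidence transcript."},
    ]},
]

print(build_exercises(transcripts))
```

The point of the two stages is that cheap heuristics (ASR confidence) throw away obviously bad material first, so the more expensive LLM judgment only runs on candidates worth checking.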
Would love your thoughts!
After 4 retries, the spinner finally gave up, but it incorrectly said "Sorry, no exercise available for this language today." instead of what it should have said: "We were unable to load the exercises. Try again later, or contact support at ${email}"
---
The AppSec-er in me wants to point out that returning the version of nginx you're running is an antipattern, since it enables more targeted attacks if that version has known vulnerabilities. It shows up both in the error page and in the headers.
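For reference, the usual first step is to suppress the version string in the nginx config (assuming stock nginx; removing the `Server` header entirely needs an extra module such as headers-more):

```nginx
# nginx.conf (http block): stop advertising the exact version
# in the Server header and on built-in error pages.
http {
    server_tokens off;
}
```

With `server_tokens off;` the server still identifies itself as nginx, but without the version number, in both the `Server` response header and the default error pages.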
Yes, the server got knocked out. I was not expecting this much traffic, hah. I've already upgraded it, but I have an NLP server with 10 language models loaded and it seems to be grinding through CPU.
It would be nice to narrow the YouTube content a bit: not just news, but an option for news in slow French, for example. At least for me, news in slow French is way easier to understand than news in French at 0.5x on YouTube.
Maybe it's just my phone, but the dragging and dropping wasn't hit-or-miss; it was mostly broken. On an English-language video (my native language), filling in three gaps took me about five video repetitions to get the words in place. It made me feel a lot better about my Spanish-speaking performance. Just clicking the words, like someone else suggested, would solve the problem completely for me, but it might be a "hit box" problem on the words.
I've been working hard to get the quality up. And now that I have some paid users for the large languages, I can also auto-transcribe high-quality channels. The main reason for the poor exercises (especially for German) was that I initially picked some poor channels and was being cheap.
I've updated the German channel, and that should hopefully result in a better experience.
I'm using AssemblyAI and Deepgram for the transcripts at the moment. Unfortunately, they don't support Irish. However, I did see this: https://elevenlabs.io/speech-to-text/irish . Not sure how accurate it is.
Also, I'm maybe JLPT N4 and the text was too hard; you should let me choose the difficulty.
However, I was very confused by the interface at first. I started with a 3-gap exercise. I dragged what I thought was the correct word into the gap. Listened again, changed my mind, but I couldn't drag in my new choice. It was a while before I realised that the correct word had been inserted for me, despite me not having completed the other gaps.
It would be better if the answers weren't revealed until the user submits them.