Posted by ph4evers 4/1/2025

Show HN: Duolingo-style exercises but with real-world content like the news (app.fluentsubs.com)
I've been working on a little side project that combines Duolingo-like listening comprehension exercises with real content.

Every video is transcribed to get much better transcripts than the closed captions. I filter on high-quality transcripts, and afterwards an LLM selects only plausible segments for the exercises. This seems to work well for quality control and seems to be reliable enough for these short exercises.

Would love your thoughts!

472 points | 184 comments | page 3
owenpalmer 4/1/2025|
I absolutely love the idea. I would honestly use this. However, when I tried the English learning, it incorrectly marked words wrong several times. Something to check out.
ph4evers 4/1/2025|
Thank you! It seems that the video is pretty well transcribed but it unluckily selected segments with a few words missing :/
owenpalmer 4/2/2025||
The words were correct in the YouTube subtitles, both in spelling and sequence. It must have been a problem with the transcription or some other bug (whitespace?)
mitthrowaway2 4/1/2025||
I tried Japanese; the YouTube video that autoplayed had its timing slightly off so that instead of hearing あたまも疲れました I only heard まも疲れました. It was pretty confusing, but fortunately the answer was displayed right in the video because the video itself had its text spelled out.

https://app.fluentsubs.com/exercises/cm8v909oq00fj9x1kztl1ez...

gwd 4/1/2025||
I love having real sentences -- so much more engaging than the random things made up by Duolingo!

What are your long-term plans with this? I'd love at some point to be able to combine something like this with an algorithm I'm working on called Guided Immersion.

Basically, the system tracks what words you know and don't know, and so could tell you how hard a given sentence is for you. And it also tracks what words it would be useful to review and/or learn (spaced repetition and frequency analysis), to tell you how valuable a sentence would be for you.
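
As a rough illustration of the two scores described above, here is a minimal sketch. It is not the actual Guided Immersion algorithm, which isn't spelled out in this thread. The names `score_sentence` and `SentenceScore` are hypothetical, difficulty is assumed to be the share of unknown words, and value is assumed to be the number of words due for review (new words worth learning are ignored here).

```python
from dataclasses import dataclass


@dataclass
class SentenceScore:
    difficulty: float  # share of distinct words the learner does not know (0.0 to 1.0)
    value: int         # number of distinct words currently due for review


def score_sentence(words, known, due_for_review):
    """Score one sentence for one learner (hypothetical helper, not a real API)."""
    unique = {w.lower() for w in words}
    if not unique:
        return SentenceScore(0.0, 0)
    unknown = unique - known
    return SentenceScore(
        difficulty=len(unknown) / len(unique),
        value=len(unique & due_for_review),
    )


# Assumed learner state: four known words, one of them due for review.
known = {"le", "chat", "mange", "une"}
due = {"mange"}
score = score_sentence(["Le", "chat", "mange", "une", "souris"], known, due)
print(score)  # SentenceScore(difficulty=0.2, value=1)
```

A content provider could then prefer sentences with low difficulty and high value, though how the real system weighs the two is not stated here.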

The algorithm is generic and can be adapted to any language; right now it's been adapted to Mandarin Chinese, Korean, and New Testament Greek. (Which unfortunately so far doesn't seem to overlap with any of your available languages.) I'm working on an API to allow any content providers to use the algorithm.

Adding this to your system could help focus the content you're showing people to things that they're likely to be able to understand without having to look up most words, and helping them incrementally grow and solidify their vocabulary using the built-in spaced repetition.

Drop me a line if you want to chat at some point -- my email is in my about.

mcjiggerlog 4/1/2025||
Really cool idea! I tried a few Spanish ones (I speak Spanish) and unfortunately it was incorrectly marking things as wrong on 2 of the 5 videos I did!
ph4evers 4/1/2025|
That's a bit unfortunate, sorry about that!

I only checked English, French, Dutch and German and assumed that Spanish would be OK. Was this for drag & drop? And do you maybe have the video? Maybe I need to tune the quality threshold specifically for Spanish videos.

mcjiggerlog 4/1/2025||
I actually did the same video on desktop and the same answers worked fine! Screenshots of it failing in an Android WebView, but passing on desktop Firefox: https://imgur.com/a/vALlFdH.
ph4evers 4/1/2025||
Oh wow, I think this is a cross-platform bug where I dumbly assumed that strings were equal without normalizing them. I'll fix it! Thanks!
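
For readers wondering what that normalization might look like, here is a minimal sketch. It assumes the mismatch comes from composed vs. decomposed accented characters (or stray whitespace), which this thread does not confirm. `normalize_answer` and `answers_match` are hypothetical names, not the app's actual code.

```python
import unicodedata


def normalize_answer(text: str) -> str:
    """Canonicalize a string before comparing it with the expected answer."""
    # NFC composes a base letter plus combining mark into one code point,
    # so "e" + U+0301 and the precomposed "é" become identical.
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of whitespace (including non-breaking spaces) and trim.
    text = " ".join(text.split())
    # casefold() is a stronger lower() intended for caseless comparison.
    return text.casefold()


def answers_match(given: str, expected: str) -> bool:
    return normalize_answer(given) == normalize_answer(expected)


composed = "qu\u00e9 tal"       # "é" as a single code point
decomposed = "que\u0301  tal"   # "e" + combining acute accent, plus an extra space
print(composed == decomposed)               # False
print(answers_match(composed, decomposed))  # True
```

Different keyboards and platforms can produce either form for the same visible text, which would explain an answer passing on one device and failing on another.
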
mattsouth 4/1/2025||
It looks great - nice work. I tried the French example and found it challenging and useful - a great addition to my Duolingo practice. So much so that I signed up. But in doing so I lost the credits that I'd apparently acquired by completing the example, which was a little disappointing. I hadn't seen the Easy French videos before - they look nice too.
black_puppydog 4/1/2025||
Hey, this looks really nice and worked like a breeze for French!

Question: out of the processing steps you mention (transcription, quality filtering, segment selection, and, I guess, wrong-word selection), are there any truly manual steps? Those would be the ones that prevent you from building this for just about any language that has good transcription available, right?

ph4evers 4/1/2025|
There are no manual steps. But it is hard to gatekeep quality. The transcription models work well for the large languages but not so much for the smaller ones.
nougati 4/1/2025||
As a resident Duolingo apologist this is certainly awesome! I appreciate how little landing-page fluff there was before I could give it a shot. I tried Japanese and felt it was only reasonable in tandem with my in-built translation extension, since Kanji-reading knowledge itself is a major hurdle of learning. Furigana would really help this, but personally, being able to translate the words I pick helps a lot during the challenge of hearing new vocabulary in native Japanese.

As well, I am learning multiple languages, and noticed that the settings panel seems to be the way to switch between them. I think it's a little unnatural to force a user to do this, but if there's an intention for bookmarking languages of interest for separate collections of videos & transcription exercises I can say I'd be happy to pay, honestly. The pricing itself seems reasonable and I appreciate that I can feel the app out for free.

Interesting project!

ph4evers 4/1/2025|
Thank you! And great suggestions!

I focussed first on European languages as I'm learning French. But I'll put some more effort in improving the Japanese experience since it seems to be very popular.

> but if there's an intention for bookmarking languages of interest for separate collections of videos & transcription exercises I can say I'd be happy to pay, honestly.

Would a language selection box at the top be enough? Or do you mean a more elaborate way to switch between languages?

nougati 4/1/2025||
For me, the main interest would be to switch the interface for one language to another in ideally 1-2 clicks; so if there was an interface element that captured the languages I was 'working on' that would be neat. Then I'd be happy to peruse the full list whenever curiosity spikes.

Otherwise, great work on a good use of existing technologies to provide meaningful educational benefit for yourself and others!

N-Krause 4/1/2025||
I am an audio-visual learner, and on Duolingo, which I am currently using to learn Spanish, my biggest problem is that I don't have a real visual for the words. Sure, sometimes you get a picture for single substantives, but learning via video and watching mouth movements is so much better for me.

So this is a welcome tool I am definitely gonna check out.

annienar 4/1/2025||
I tried the Japanese one. As an A1, I can't read/don't know kanji yet, so it would be nice to have an option to see katakana/hiragana only, an option to have furigana, and an option to see the kanji. I would also like an option to save phrases and not just a word. But I like it overall.
ph4evers 4/1/2025|
Thank you! It should store the word and the sentence so that rehearsal is always with both.

I focussed a lot on European languages at first so the support for the Asian languages is a bit lacking. The only thing I did so far was changing the font and increasing the font size. There is a lot more to do! Thanks!

mdaniel 4/1/2025|
It seems we either ate all your LLM credits or knocked your server over since the spinner just spins (checking dev tools coughs up that https://app.fluentsubs.com/api/exercises/daily?language=fr is 504)

After 4 retries, the spinner finally gave up but it incorrectly said "Sorry, no exercise available for this language today." and not, as it should have, "We were unable to load the exercises. Try again later, or contact support at ${email}"
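
A minimal sketch of the distinction being asked for, assuming the client can see the HTTP status and the parsed exercise list. `exercise_status_message` and the placeholder address `support@example.com` are hypothetical, not taken from the app.

```python
from typing import Optional, Sequence


def exercise_status_message(
    status_code: Optional[int],
    exercises: Sequence[dict],
    support_email: str = "support@example.com",  # placeholder address
) -> str:
    """Pick the user-facing message; status_code is None when no response arrived."""
    # A failed request (timeout, 4xx, 5xx such as 504) is a loading error,
    # not evidence that the language has no exercise today.
    if status_code is None or status_code >= 400:
        return (
            "We were unable to load the exercises. "
            f"Try again later, or contact support at {support_email}"
        )
    # Only a successful but empty response means there is nothing to show.
    if not exercises:
        return "Sorry, no exercise available for this language today."
    return f"Loaded {len(exercises)} exercise(s)."


print(exercise_status_message(504, []))
print(exercise_status_message(200, []))
```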

---

The AppSec-er in me wants to point out that returning the version of nginx that you're using is an antipattern since it enables more targeted attacks if the version has woes; it does it in the error, and it does it in the headers

ph4evers 4/1/2025|
Thanks for reporting! I'll fix the Nginx version exposure.

Yes, the server got knocked out. I was not expecting this much traffic hah. I already upgraded it but I have an NLP server with 10 language models loaded and it seems to be grinding CPU resources.
