Posted by sohamrj 5 hours ago
I used this to build a CLI that indexes hours of footage into ChromaDB, then searches it with natural language and auto-trims the matching clip. Demo video on the GitHub README. Indexing costs ~$2.50/hr of footage. Still-frame detection skips idle chunks, so security camera / sentry mode footage is much cheaper.
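For anyone curious how the still-frame skipping could work, here is a rough sketch in plain Python. It is not the CLI's actual code. The frame representation (flattened grayscale pixel lists), the function names, and the threshold of 2.0 are all assumptions made for illustration; only the ~$2.50/hr rate is the real figure.

```python
from typing import List, Sequence

# Assumption: a frame is a flattened list of grayscale pixel values (0-255).
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two frames."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("frames must be non-empty and the same size")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def is_still(chunk: Sequence[Frame], threshold: float = 2.0) -> bool:
    """True if every consecutive frame pair differs by less than threshold.

    A chunk with fewer than two frames has no pairs to compare, so it
    counts as still. The threshold value is a hypothetical default.
    """
    return all(mean_abs_diff(p, q) < threshold for p, q in zip(chunk, chunk[1:]))


def chunks_to_index(chunks: Sequence[Sequence[Frame]], threshold: float = 2.0) -> List[int]:
    """Indices of the chunks that contain motion and are worth embedding."""
    return [i for i, chunk in enumerate(chunks) if not is_still(chunk, threshold)]


def estimated_cost(n_chunks: int, chunk_seconds: float, usd_per_hour: float = 2.50) -> float:
    """Rough indexing cost for the chunks that actually get embedded."""
    return n_chunks * chunk_seconds / 3600 * usd_per_hour


idle = [[10, 10, 10, 10]] * 3                             # nothing changes
moving = [[0, 0, 0, 0], [50, 50, 50, 50], [0, 0, 0, 0]]   # large changes
to_embed = chunks_to_index([idle, moving, idle])          # only the middle chunk
```

With mostly idle footage, only a small share of chunks reaches the embedding step, which is where the savings on security camera footage would come from.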
If there is text in the video (like a caption or whatever), will the embedding capture that? Never thought about this before.
If the video has audio, does the embedding capture that too?
Cool project, thanks for sharing!
It's a bit expensive right now, so it's not that practical at scale. But once the embedding model comes out of public preview, and we hopefully get a local equivalent, this will be a lot more practical.