
Posted by jamesxv7 6/30/2025

Ask HN: What's the 2025 stack for a self-hosted photo library with local AI?

First of all, this is purely a personal learning project for me, aiming to combine three of my passions: photography, software engineering, and my family memories. I have a large collection of family photos and want to build an interactive experience to explore them, à la Google Photos or Apple Photos.

My goal is to create a system with smart search capabilities, and one of the most important requirements is that it must run entirely on my local hardware. Privacy is key, but the main driver is the challenge and joy of building it myself (and, obviously, learning).

The key features I'm aiming for are:

Automatic identification and tagging of family members (local face recognition).

Generation of descriptive captions for each photo.

Natural language search (e.g., "Show me photos of us at the beach in Luquillo from last summer").
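To make the Luquillo example concrete: one way to handle such a query is to peel off the parts that map to structured metadata (a date range, a known place name) and keep the rest as free text for the semantic search. This is a hypothetical sketch: the `parse_query` function, the `KNOWN_PLACES` tuple, and the rule that "last summer" means June 1 to August 31 of the most recently completed Northern Hemisphere summer are all assumptions for illustration, not part of any library.

```python
from __future__ import annotations

import datetime as dt
import re
from dataclasses import dataclass, field

# Hypothetical list of place names you have already tagged (e.g. from reverse-geocoded GPS EXIF).
KNOWN_PLACES = ("luquillo", "san juan")


@dataclass
class ParsedQuery:
    text: str                       # leftover words for the semantic/vector search
    start: dt.date | None = None    # inclusive date-range filter
    end: dt.date | None = None
    places: list[str] = field(default_factory=list)


def parse_query(query: str, today: dt.date) -> ParsedQuery:
    q = query.lower()
    parsed = ParsedQuery(text="")
    # Assumption: "last summer" = June 1 to Aug 31 of the most recent
    # Northern Hemisphere summer that has already ended.
    if "last summer" in q:
        year = today.year if today >= dt.date(today.year, 9, 1) else today.year - 1
        parsed.start, parsed.end = dt.date(year, 6, 1), dt.date(year, 8, 31)
        q = q.replace("last summer", " ")
    for place in KNOWN_PLACES:
        if place in q:
            parsed.places.append(place)
            q = q.replace(place, " ")
    # Drop filler words so the remainder reads like a caption fragment.
    q = re.sub(r"\b(show me|photos of|from|in|at)\b", " ", q)
    parsed.text = " ".join(q.split())
    return parsed
```

In practice you might hand this step to a local LLM asked to return JSON instead of using regexes; either way, dates and places become database filters while the leftover text goes to the vector search.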

I've already prompted AI tools for a high-level project plan, and they provided a solid blueprint (e.g., Ollama with LLaVA, a vector DB like ChromaDB, you name it). Now, I'm highly interested in the real-world human experience. I'm looking for advice, learning stories, and the little details that only come from building something similar.

What tools, models, and best practices would you recommend for a project like this in 2025? Specifically, I'm curious about combining structured metadata (EXIF), face recognition data, and semantic vector search into a single, cohesive application.
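A minimal sketch of one way to keep all three in a single store, assuming SQLite via Python's stdlib `sqlite3`: EXIF date and GPS go in ordinary columns, face labels in a join table, and the embedding as a JSON-encoded vector that is ranked in Python after the SQL filters run. The table layout and the `add_photo`/`search` helpers are hypothetical; a real vector index (ChromaDB, or a vector extension for SQLite) would replace the brute-force cosine loop once the library grows.

```python
import json
import math
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.executescript(
        """
        CREATE TABLE IF NOT EXISTS photos (
            id        INTEGER PRIMARY KEY,
            path      TEXT UNIQUE NOT NULL,
            taken_at  TEXT,   -- ISO 8601, from EXIF DateTimeOriginal
            lat       REAL,   -- from EXIF GPS, when present
            lon       REAL,
            caption   TEXT,   -- generated by a local VLM
            embedding TEXT    -- JSON-encoded vector
        );
        CREATE TABLE IF NOT EXISTS faces (
            photo_id  INTEGER REFERENCES photos(id),
            person    TEXT NOT NULL  -- name from face clustering / manual tagging
        );
        """
    )
    return db


def add_photo(db, path, taken_at, caption, embedding, people=()):
    cur = db.execute(
        "INSERT INTO photos (path, taken_at, caption, embedding) VALUES (?, ?, ?, ?)",
        (path, taken_at, caption, json.dumps(embedding)),
    )
    db.executemany(
        "INSERT INTO faces (photo_id, person) VALUES (?, ?)",
        [(cur.lastrowid, person) for person in people],
    )


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(db, query_vec, person=None, start=None, end=None, k=10):
    """Filter on EXIF date and face labels in SQL, then rank survivors by cosine similarity."""
    sql = "SELECT p.path, p.embedding FROM photos p WHERE 1 = 1"
    params = []
    if start:
        sql += " AND date(p.taken_at) >= ?"
        params.append(start)
    if end:
        sql += " AND date(p.taken_at) <= ?"
        params.append(end)
    if person:
        sql += " AND EXISTS (SELECT 1 FROM faces f WHERE f.photo_id = p.id AND f.person = ?)"
        params.append(person)
    rows = db.execute(sql, params).fetchall()
    ranked = sorted(((cosine(query_vec, json.loads(emb)), path) for path, emb in rows), reverse=True)
    return [path for _, path in ranked[:k]]
```

Doing the cheap, exact filters (dates, people) first shrinks the candidate set before the fuzzy ranking, which is also how most vector databases handle metadata filters.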

Any and all advice would be deeply appreciated. Thanks!

230 points | 121 comments
ciaranmca 6/30/2025|
I have used https://www.photoprism.app/ and have found the face recognition to work quite well.
osdamv 6/30/2025|
PhotoPrism is OK, but Immich's AI features are far superior.
hammyhavoc 7/1/2025||
What value are you finding in the AI?
slackpad 6/30/2025||
Haven’t tried it yet (I’d love to find something like this too) but I saw a conference talk on https://docs.voxel51.com/ that looked pretty interesting. It is kind of a data frame for images with a GUI for exploring them. They make it pretty easy to rip various models over your images to add tags, and to evaluate the results.
ggm-at-algebras 7/1/2025||
Dedupe over edited photos, and handling highly approximate date information are my "nobody has this right yet" criteria.
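For the dedupe half, one common technique (not one the commenter named) is a perceptual "difference hash": shrink the image to a 9×8 grayscale grid, record whether each pixel is brighter than the one to its right, and compare the resulting 64-bit hashes by Hamming distance. Uniform exposure tweaks usually leave the hash unchanged, while crops and heavy edits can break it. This sketch assumes the image has already been reduced to 8 rows of 9 values in 0-255 (with Pillow, something like `img.convert("L").resize((9, 8))`); the `max_distance=10` cutoff is an assumed starting point to tune on your own photos.

```python
from __future__ import annotations


def dhash(pixels: list[list[int]]) -> int:
    """Difference hash over an 8-row x 9-column grayscale grid -> 64-bit int."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            # One bit per horizontal neighbor pair: is the left pixel brighter?
            bits = (bits << 1) | (1 if left > right else 0)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def near_duplicates(hashes: dict[str, int], max_distance: int = 10) -> list[tuple[str, str]]:
    """All pairs of files whose hashes differ in at most max_distance bits (O(n^2))."""
    names = sorted(hashes)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= max_distance:
                pairs.append((a, b))
    return pairs
```

The pairwise loop is fine for a few thousand photos; for a large library you would bucket hashes (or use a BK-tree) before comparing.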
spacecadet 6/30/2025||
I built this same solution for myself last year, used Hugging Face's "SmolVLM". It works surprisingly well. I use the model to generate verbose descriptions of each image, embed the descriptions using another model, which I also use for the query embedding.
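A rough sketch of that caption-then-embed pipeline. The captions are hard-coded stand-ins for what a VLM like SmolVLM would produce, and `embed` is a toy bag-of-words vectorizer standing in for a real text-embedding model; what it illustrates is that captions and queries go through the same `embed` function, so their vectors are comparable. All names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a text-embedding model: L2-normalized bag of words."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def similarity(a, b):
    # Cosine similarity, since both vectors are already unit length.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())


def build_index(captions):
    # captions: {path: caption text from the VLM}
    return {path: embed(caption) for path, caption in captions.items()}


def search(index, query, k=3):
    q = embed(query)  # same embedding function as the captions
    ranked = sorted(((similarity(q, vec), path) for path, vec in index.items()), reverse=True)
    return [path for score, path in ranked[:k] if score > 0]
```

A real embedding model would also match "kids" to "children", which this toy vectorizer cannot; that gap is exactly what the second model is for.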

The stack is hacky, since it was mostly for myself...

joesweetsox 6/30/2025||
Are any of these systems doing true image-based entity resolution? It seems like it's only pairwise similarity checking. If you are trying to index, say, 20 years of family photos, how do they link kindergarteners to their adult images?
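One way to get from pairwise similarity to linked identities is to treat every sufficiently similar pair of face embeddings as an edge and take connected components, so a kindergarten face can reach the adult face through the intermediate years even if the two ends look nothing alike. This sketch assumes the face embeddings are already computed and uses union-find with a hypothetical cosine threshold of 0.8. It is single-linkage clustering, and the same chaining that links a child to their adult self can also merge siblings, so treat the output as suggestions to confirm by hand.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def cluster_faces(embeddings, threshold=0.8):
    """Connected components of the 'similar enough' graph (single linkage)."""
    n = len(embeddings)
    uf = UnionFind(n)
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                uf.union(i, j)
    groups = {}
    for i in range(n):
        groups.setdefault(uf.find(i), []).append(i)
    return sorted(groups.values())
```

For example, toy 2-D embeddings at angles 0°, 30°, 60° and 90° end up in one cluster even though the 0° and 90° vectors have a cosine similarity of zero, because each neighboring pair clears the threshold.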
stormfather 6/30/2025||
I would try the Qwen models before LLaVA.

Do you need the embeddings to be private? Or just the photos?

msgodel 6/30/2025|
For photo indexing I'd run CLIP directly and save on compute, no need to use a whole language model.
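Roughly, CLIP-style tagging skips caption generation: embed the image once, embed a handful of text prompts once, and take a softmax over the scaled cosine similarities. This sketch assumes you already have vectors from CLIP's image and text encoders (for example via open_clip or Hugging Face transformers); the toy 2-D vectors, the prompt strings, and the `zero_shot_probs` helper are hypothetical, and `scale=100` roughly matches the learned logit scale in the released CLIP models.

```python
import math


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def zero_shot_probs(image_emb, prompt_embs, scale=100.0):
    """Softmax over scaled cosine similarities between one image and several text prompts."""
    img = normalize(image_emb)
    logits = {
        prompt: scale * sum(a * b for a, b in zip(img, normalize(emb)))
        for prompt, emb in prompt_embs.items()
    }
    top = max(logits.values())  # subtract the max for numerical stability
    exps = {p: math.exp(z - top) for p, z in logits.items()}
    total = sum(exps.values())
    return {p: e / total for p, e in exps.items()}
```

The stored image embeddings can also be compared against an embedded search query directly, so one model covers both tagging and natural-language search, which is presumably the compute saving the comment has in mind.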
SirFatty 6/30/2025||
It looks like you are primarily using a phone to view and share? We often share (visually) via our living room TV (through an attached computer). Is that something you're looking to incorporate?
xnx 6/30/2025||
https://www.digikam.org/ does a lot of what you're looking for.
iforgotpassword 6/30/2025|
Not web-based, and it's really starting to show its age.
kilpikaarna 7/1/2025||
I don't think the OP specified web-based?

Personally I'd love a separate thing that could crawl the photos in a folder I point it to and then let me search using semantics and natural language. But can it please just be an exe I can double click when I need it? If it involves maintaining a server or faffing about with Docker I'm probably not going to bother.
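The crawl-a-folder part of that wish is small enough to sketch with the standard library alone: walk the tree, keep a tiny SQLite table of each image's size and modification time, and report only files that are new or changed, so the expensive captioning and embedding runs once per photo. The `crawl` function and the extension list are hypothetical, and size plus mtime is an assumed change test (a content hash would be more robust but slower).

```python
from __future__ import annotations

import sqlite3
from pathlib import Path

# Hypothetical set of extensions to treat as photos.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".heic", ".webp"}


def crawl(root: str, db: sqlite3.Connection) -> list[str]:
    """Return image paths under root that are new or changed since the last crawl."""
    db.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, size INTEGER, mtime REAL)")
    changed = []
    for p in sorted(Path(root).rglob("*")):
        if not p.is_file() or p.suffix.lower() not in IMAGE_EXTS:
            continue
        st = p.stat()
        key = str(p.resolve())
        row = db.execute("SELECT size, mtime FROM files WHERE path = ?", (key,)).fetchone()
        if row != (st.st_size, st.st_mtime):
            changed.append(key)  # hand these to the captioning/embedding step
            db.execute(
                "INSERT OR REPLACE INTO files (path, size, mtime) VALUES (?, ?, ?)",
                (key, st.st_size, st.st_mtime),
            )
    db.commit()
    return changed
```

Keeping the state in one SQLite file next to the photos (rather than in a server) fits the "no Docker, no daemon" preference; re-running the crawl is cheap because unchanged files are skipped.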

ProfessorZoom 6/30/2025||
I'm also curious about the best local, high-quality background removal, such as for graduation photos where people are wearing tassels.
washadjeffmad 7/1/2025||
Flux Kontext is probably the best local option for a few reasons, but it's slow, uses a lot of VRAM, and changes the quality and resolution. Amazing results if you want <2MP final images, though.

If you need a detailed mask for editing in another application, Florence-2 or SAM. Or rembg for decent all-purpose, one-shot removals, as long as you have a touch-up process or don't mind rerunning the failures.

oth001 6/30/2025||
Stable Diffusion (Web UI or whatever) has add-ons (e.g. rembg) that were really good at this, last time I checked.
east4ming 7/12/2025|
digiKam + NAS