Posted by threeturn 4 days ago

Ask HN: Who uses open LLMs and coding assistants locally? Share setup and laptop

Dear Hackers, I’m interested in your real-world workflows for using open-source LLMs and open-source coding assistants on your laptop (not just cloud/enterprise SaaS). Specifically:

Which model(s) are you running, with which runtime (e.g., Ollama, LM Studio, or others), and which open-source coding assistant/integration (for example, a VS Code plugin) are you using?

What laptop hardware do you have (CPU, GPU/NPU, memory, whether discrete GPU or integrated, OS), and how does it perform for your workflow?

What kinds of tasks do you use it for (code completion, refactoring, debugging, code review), and how reliable is it (what works well / where it falls short)?

I'm conducting my own investigation, which I'll be happy to share as well once it's done.

Thanks! Andrea.

339 points | 187 comments
mwambua 3 days ago|
Tangential question: what do people use for search? Which search engines provide the best quality-to-cost ratio?

Also, are there good solutions for searching through a local collection of documents?

andai 3 days ago||
ddg (python lib) is free and I'd say good enough for most tasks. (I think the endpoint is unofficial, but from what I've heard it's fine for typical usage.)

There's also Google's Custom Search JSON API, which gives you 100 free requests a day.

Here's the search.py I use:

    import os
    import requests  # standing in for the author's custom "req" helper

    # Create a search engine ID at:
    # https://programmablesearchengine.google.com/controlpanel/create
    GOOGLE_SEARCH_API_KEY = os.getenv('GOOGLE_SEARCH_API_KEY')
    GOOGLE_SEARCH_API_ID = os.getenv('GOOGLE_SEARCH_API_ID')

    url = "https://customsearch.googleapis.com/customsearch/v1"

    def search(query):
        params = {
            "q": query,
            "cx": GOOGLE_SEARCH_API_ID,
            "key": GOOGLE_SEARCH_API_KEY,
        }
        response = requests.get(url, params=params)
        response.raise_for_status()
        # The API returns the search hits under the "items" key.
        return response.json()["items"]

    if __name__ == "__main__":
        while True:
            query = input('query: ')
            results = search(query)
            print(results)

and the ddg version:

    # pip install duckduckgo-search (uses an unofficial DuckDuckGo endpoint)
    from duckduckgo_search import DDGS

    def search(query, max_results=8):
        # Each result is a dict with 'title', 'href', and 'body' keys.
        results = DDGS().text(query, max_results=max_results)
        return results
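And, for parity with the Google version above, a minimal REPL around it (a sketch; it just reuses the search function as defined here):

    if __name__ == "__main__":
        while True:
            query = input('query: ')
            for r in search(query):
                print(r['title'], r['href'])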
mwambua 3 days ago||
Oh, nice! Thanks! This reminds me of the unofficial Yahoo Finance API.
nickthegreek 2 days ago||
Just set up SearXNG yesterday, plus an MCP server for it in LM Studio, to be able to search the net for answers to simple queries. A small IBM Granite model worked surprisingly well, while gpt-oss-20b seemed to keep looping searches.
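For anyone wanting to skip MCP and call SearXNG directly: an instance can return JSON when the json format is enabled in its settings.yml. A minimal sketch, assuming a local instance on port 8080 (adjust host/port for your deployment):

    import requests

    # Assumes SearXNG runs locally with "json" added to search.formats
    # in settings.yml.
    SEARXNG_URL = "http://localhost:8080/search"

    def searx(query, max_results=8):
        resp = requests.get(SEARXNG_URL, params={"q": query, "format": "json"})
        resp.raise_for_status()
        return resp.json()["results"][:max_results]

    for r in searx("ibm granite small models"):
        print(r["title"], r["url"])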
Weloveall 2 days ago||
I use Ollama with Open WebUI, with Llama 3 as the model. But I also have a Flask app (reached over SSH) where I use the Claude API.
more_corn 3 days ago||
My friend uses a 4-GPU server in her office and hits the Ollama API over the local network. If you want it to work from anywhere, a free Tailscale account does the job.
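That setup is easy to reproduce, since Ollama exposes a plain REST API on port 11434 (it has to be bound to 0.0.0.0 via the OLLAMA_HOST environment variable to be reachable from other machines). A minimal client sketch, with a placeholder hostname:

    import requests

    OLLAMA_HOST = "http://gpu-server:11434"  # placeholder; a LAN or Tailscale name

    def generate(prompt, model="llama3"):
        resp = requests.post(
            f"{OLLAMA_HOST}/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
        )
        resp.raise_for_status()
        return resp.json()["response"]

    print(generate("Explain mmap in one paragraph."))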
lovelydata 3 days ago||
llama.cpp + Qwen3-4B running on an older PC with an AMD Radeon GPU (Vulkan). Users connect via the web UI. Usually around 30 tokens/sec. Usable.
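For reference, llama.cpp's bundled llama-server exposes an OpenAI-compatible HTTP endpoint, which is presumably what the web UI talks to. A client sketch (host, port, and the exact GGUF filename are assumptions):

    import requests

    # Server started with something like:
    #   llama-server -m Qwen3-4B-Q4_K_M.gguf --host 0.0.0.0 --port 8080
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": "Write a haiku about VRAM."}],
            "max_tokens": 128,
        },
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])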
NicoJuicy 3 days ago|
What do they use it for? It's a very small model.
embedding-shape 3 days ago||
Autocomplete, I'd wager. Yeah, it's a super tiny model that can barely produce coherent output in many cases.
packetmuse 3 days ago||
Running local LLMs on laptops still feels like early days, but it’s great to see how fast everyone’s improving and sharing real setups.
timenotwasted 3 days ago||
I have an old 2080 Ti that I use to run Ollama and Qdrant. It has been OK; I haven't found it so good that it has replaced Claude or Codex, but there are times when having RAG available locally is a nice setup for more specific queries. I also just enjoy tinkering with random models, which this makes super easy.

My daily drivers, though, are still either Codex or GPT-5. Claude Code used to be one, but it just doesn't deliver the same results as it did previously.
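The local-RAG piece of that setup boils down to an embed-then-search loop. A sketch assuming a running Qdrant instance, an already-populated collection, and an embedding model pulled in Ollama (the model and collection names here are placeholders):

    import requests
    from qdrant_client import QdrantClient  # pip install qdrant-client

    client = QdrantClient("localhost", port=6333)

    def embed(text, model="nomic-embed-text"):
        # Ollama's embeddings endpoint returns {"embedding": [...]}.
        resp = requests.post(
            "http://localhost:11434/api/embeddings",
            json={"model": model, "prompt": text},
        )
        resp.raise_for_status()
        return resp.json()["embedding"]

    def retrieve(question, collection="notes", k=4):
        # Vector search over the stored documents; payloads hold the text.
        hits = client.search(
            collection_name=collection,
            query_vector=embed(question),
            limit=k,
        )
        return [hit.payload for hit in hits]

    print(retrieve("how does the auth middleware work?"))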

Frannky 3 days ago||
I wonder how long it will be before running 400B+ models locally at 2K TPS becomes a reality for under $10K.
system2 3 days ago||
For those who use these: can you compare the quality of the code to Claude Sonnet 4.5 or Opus 4.1?
Gracana 3 days ago||
I don’t own a laptop. I run DeepSeek-V3 IQ4_XS on a Xeon workstation with lots of RAM and a few RTX A4000s.

It’s not very fast, and I built it up slowly without knowing quite where I was headed. If I could do it over again, I’d go with a recent EPYC with 12 channels of DDR5 and pair it with a single RTX 6000 Pro Blackwell.

mooiedingen 3 days ago|
Vim + ollama-vim. Start a new file with the instructions it needs to follow at the top, in comments, and let it work like a sort of autocomplete. Example:

    # The following is a Python
    # script that uses the
    # libraries requests and
    # BeautifulSoup to scrape
    # url_to_scrape = input(
    #     "what url do i need to fetch?")
    import ...
    """autocompletes the rest from here"""

Anywhere in a script you can # comment instructions this way; I find that far more effective than asking "write me a script for this or that". Take a coding model and fine-tune it with commonly used snippets of your own code: this is completely customizable and will stay coherent with your own writing style. I made embeddings per language, even md: Python, JavaScript, Vimscript, Lua, PHP, HTML, JSON (output is JSON anyway), XML, CSS...