Posted by CoffeeOnWrite 5 days ago

LLM code generation may lead to an erosion of trust (jaysthoughts.com)
248 points | 274 comments | page 2
beau_g 5 days ago|
The article opens with a statement saying the author isn't going to reword what others are writing, but the article reads as that and only that.

That said, I do think it would be nice for people to note in pull requests which files in the diff contain AI-generated code. It's still a good idea to look at LLM-generated code with a slightly different lens than human code; the mistakes each makes are often different in flavor, and knowing which is which would save me time in a review. Has anyone seen this at a larger org, and is it of value to you as a reviewer? Maybe some toolsets can already do this automatically (I suppose all these companies that report the % of code that is LLM-generated must have one, if they actually have these granular metrics?)
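
I haven't seen tooling that does this out of the box. As a minimal sketch of the kind of check I mean, assuming a team convention where authors put a hypothetical "@ai-generated" marker comment at the top of machine-written files (the marker name and workflow are invented for illustration, not an existing standard):

    #!/usr/bin/env python3
    """List files in a diff that carry an '@ai-generated' marker comment,
    so a reviewer can triage them with a different lens. The marker is a
    hypothetical team convention, not an existing standard."""

    import subprocess

    MARKER = "@ai-generated"  # hypothetical convention

    def changed_files(base: str = "origin/main") -> list[str]:
        """Return files changed relative to the base branch."""
        out = subprocess.run(
            ["git", "diff", "--name-only", base],
            capture_output=True, text=True, check=True,
        )
        return [line for line in out.stdout.splitlines() if line]

    def is_ai_flagged(path: str) -> bool:
        """True if the marker appears in the file's first few lines."""
        try:
            with open(path, encoding="utf-8", errors="ignore") as f:
                head = [next(f, "") for _ in range(5)]
        except OSError:
            return False  # deleted or unreadable file
        return any(MARKER in line for line in head)

    if __name__ == "__main__":
        flagged = [p for p in changed_files() if is_ai_flagged(p)]
        print("AI-flagged files in this diff:")
        for p in flagged or ["(none)"]:
            print(f"  {p}")

Run from a repo checkout before review; a CI job could post the same list as a PR comment.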

acedTrex 5 days ago|
Author here:

> The article opens with a statement saying the author isn't going to reword what others are writing, but the article reads as that and only that.

Hmm, I was just saying I hadn't seen much literature or discussion on trust dynamics in teams with LLMs. Maybe I'm just in the wrong spaces for such discussions but I haven't really come across it.

kordlessagain 2 days ago||
I'm spending half my time building Claude Code tools (MCP servers) and half my time working on Gnosis, an AI-powered oracle.

What is an oracle? It's a system that:

- Knows things - Has pre-crawled, indexed information about specific domains

- Answers authoritatively - Not just web search, but curated, verified data

- Connects isolated systems - Apps can query Gnosis instead of implementing their own crawling/search

- May have some practical use for blockchain actions (typically a crypto "oracle" bridges web data with chain data; in this context the "oracle" is AI + storage + transactions on the chain).

The Core Components:

- Evolve: Our tooling layer - manages the MCP servers, handles deployment, monitors health. Agentic tools.

- Wraith: Web crawler that fetches and processes content from URLs, handles JavaScript rendering, screenshots, and more. Agentic crawler.

- Alaya: Vector database (streaming projected dimensions) for storing and searching through all the collected information. Agentic storage.

- Gnosis-Docker: Container orchestration MCP server for managing these services locally. Agentic DevOps.

There's more coming.
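
To make the "apps can query Gnosis instead of implementing their own crawling/search" point above concrete, here's a hypothetical sketch of the query pattern. The endpoint, route, and response schema below are invented for illustration; the real services may expose MCP tools instead:

    """Hypothetical sketch: an app asks the oracle instead of crawling.
    The URL, route, and JSON schema are invented for illustration."""

    import json
    import urllib.request

    ORACLE_URL = "http://localhost:8080/query"  # hypothetical local deployment

    def ask_oracle(question: str, domain: str) -> dict:
        """POST a question scoped to a pre-crawled domain, return the answer."""
        payload = json.dumps({"question": question, "domain": domain}).encode()
        req = urllib.request.Request(
            ORACLE_URL, data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.load(resp)

    if __name__ == "__main__":
        result = ask_oracle("What changed in the v2 API?", domain="docs.example.com")
        print(result.get("answer"), "| sources:", result.get("sources"))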

https://github.com/kordless/gnosis-evolve

https://linkedin.com/in/kordless

https://github.com/kordless/gnosis-wraith (under heavy development)

There's also a complete MCP inspection and debugging system for Python here: https://github.com/kordless/gnosis-mystic

acedTrex 5 days ago||
Hi everyone, author here.

Sorry about the JS stuff; I wrote this while fooling around with alpine.js for fun. I never expected it to make it to HN. I'll get a static version up and running.

Happy to answer any questions or hear other thoughts.

Edit: https://static.jaysthoughts.com/

Static version here with slightly wonky formatting, sorry for the hassle.

Edit2: Should work well on mobile now; added a quick breakpoint.

konaraddi 5 days ago|
Given the topic of your post and the high PageSpeed results, I think >99% of your intended audience can already read the original. No need to apologize to or please HN users.
davidthewatson 5 days ago||
Well said. The death of trust in software is a well-worn path, from the money that funds and founds it to the design and engineering that builds it - at least in the two-guys-in-a-garage startup work I was involved in for decades. HITL is key. Yet even with a human in the loop, you can wind up at Therac-25. That's exactly where hybrid closed-loop insulin pumps are right now: autonomy and insulin don't mix well. If there weren't a moat of attorneys keeping the signal-to-noise ratio down, we'd already realize that at scale - like the PR teams at three-letter technical universities designed to protect parents from the exploding pressure inside the halls there.
geor9e 5 days ago||
They changed the headline to "Yes, I will judge you for using AI..." so I feel like I got the whole story already.
DyslexicAtheist 5 days ago||
It's really hard (though not impossible) to use AI to produce meaningful offensive security work that improves defense, because there are way too many guardrails.

Real nation-state threat actors, on the other hand, face no such limitations.

On a more general level, what concerns me isn't whether people get utility out of it (objecting to that would be silly), but the power imbalance in the hands of a few, a divide that gets wider as more people pour their questions into it. And it's not just the people using AI directly: every post online eventually gets used for training, so to be against it would mean to stop producing digital content.

heisenbit 5 days ago||
> The reality is that LLMs enable an inexperienced engineer to punch far above their proverbial weight class. That is to say, it allows them to work with concepts immediately that might have taken days, months or even years otherwise to get to that level of output.

At the moment LLMs allow me to punch far above my weight class in Python, where I'm doing a short-term job. But then, I know all the concepts from decades of dabbling in other ecosystems. Let's all admit there is a huge amount of accidental complexity (h/t Brooks's "No Silver Bullet") in our world. For better or worse, skill silos are now breaking down.

lawlessone 5 days ago||
One trust-breaking issue is that we still can't know why an LLM makes specific choices.

Sure, we can ask it why it did something, but any reason it gives is just something generated to sound plausible.

mensetmanusman 5 days ago||
All this means is that QC is going to be 10x more important.
archibaldJ 4 days ago|
This could be solved once the ARC puzzle is cracked (https://arcprize.org/play), letting us automate correctness-checking, as in Coq, but for program synthesis.
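
We aren't there yet, but property-based testing is a rough approximation available today: instead of proving a synthesized function correct, you check it mechanically against a specification over many generated inputs. A minimal sketch using Python's hypothesis library, where sort_items stands in for LLM-generated code:

    """Property-based check as a weak stand-in for proof-style verification.
    sort_items plays the role of LLM-generated code under test."""

    from hypothesis import given, strategies as st

    def sort_items(xs: list[int]) -> list[int]:
        # Pretend this implementation came from an LLM.
        return sorted(xs)

    @given(st.lists(st.integers()))
    def test_sort_matches_spec(xs: list[int]) -> None:
        out = sort_items(xs)
        # Spec 1: output is ordered.
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Spec 2: output is a permutation of the input.
        assert sorted(xs) == out

    if __name__ == "__main__":
        test_sort_matches_spec()  # hypothesis generates and runs many cases
        print("spec held on all generated cases")

This is far from Coq-style certainty, but it automates the checking step.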