I used Claude Code to get a second opinion on my MRI

Posted by engmarketer 3 hours ago

I used Claude Code to get a second opinion on my MRI (antoine.fi)

101 points | 136 comments | page 2

lucfranken 2 hours ago|

Why wouldn’t you, as a doctor, run the images through a certified, compliant LLM as standard practice? Cost won’t be the obstacle, and you can then see whether you get any new ideas from it: whether it’s just wrong, or whether it spotted a little detail you missed.

The LLM doesn’t need to be leading or whatever, but then you can have a conversation with the patient. If their ChatGPT report differs, that can be analyzed as well.

It feels like the time constraint of the 15-minute doctor sessions is the thing. But if the LLM report is prepared immediately after the scan, then why not?

New developments and innovations always need time to be factored in, and that’s fine. Blindly moving work from human to LLM is wrong. But learning about and testing the AI tools that keep arriving won’t be a waste. More and more tools in these processes will sit outside human judgement, so it is better to improve the workflows now, to be able to test and plug in new models and systems when they are ready.

KaiserPro 1 hour ago||

> standard run the images through a certified compliant LLM?

Because they don't exist, yet.

In the UK, MRIs and other imaging systems need two opinions. There has been a move to allow the first opinion to be ML based.

The _problem_ is that you are basically doing grey smudge analysis, and that's fucking hard.

foobarian 1 hour ago|||

I've been starting to think of LLMs as a great tool for "lead generation," borrowing a term from sales. Most of the things they come up with don't pan out, but in many cases they're things we wouldn't have thought of, or at least not as quickly. This is especially true in the context of web service or SaaS outages.

yread 1 hour ago||

Because they might bias you. And because you have your own brain, training and experience.

davikr 1 hour ago||

You can try sending basic chest radiographs to GPT and it'll fail at interpretation. I'd be wary of premature conclusions.

LogicFailsMe 1 hour ago||

I did the same exercise here with medical reports and CT scans for a friend's cancer diagnosis and I got ahead of the oncologists predicting they were about to be cured. Spoilers: yep, cancer free now.

And yes, I have the appropriate life science degrees to navigate clinical trial reports and research publications, and that was likely indispensable for steering Claude Code where it went, so the radiologist's caution is merited here. But this is not amateur hour for me: I have 2 decades of academic research in my rearview mirror.

darepublic 1 hour ago||

I would like it if we could have a site where you submit your MRI and doctor commenters anonymously post their opinions. In general I want a forum where, when people come with questions on which opinions vary, we don't just have people leave their 2c and then jet. The thread persists, duplicated ideas get merged, erroneous statements get purged, and gradually we refine shining truth.

lostlogin 1 hour ago|

I’m wondering how many radiologists want to work all day, then come home and work.

Many can get paid fee-for-service for after hours work, so would probably prefer that.

eqvinox 2 hours ago||

> My hope is that in a couple of model generations, we'll trust AI to review MRIs the way we trust it to proofread our emails.

https://www.nature.com/articles/d41586-026-01947-1

I've started asking my doctors whether they use AI, and if they say yes, I look for another one.

rmbyrro 1 hour ago||

That study seems to overlook confounding factors and rush to a questionable conclusion.

A very plausible explanation for the adenoma detection rate to have gone down is simply that its prevalence went down among the population in the second three-month period.

This was not a randomized trial. Concluding that "AI usage degrades physicians' skills" is questionable at the very least.
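
A rough sketch of that prevalence argument, treating the adenoma detection rate (ADR) as prevalence times per-case detection sensitivity. That model is a simplifying assumption, and every number below is hypothetical, not taken from the study:

```python
def adenoma_detection_rate(prevalence: float, sensitivity: float) -> float:
    """Fraction of colonoscopies in which an adenoma is found.

    Simplified model (an assumption for illustration): a patient either has
    an adenoma or not, and the endoscopist finds it with fixed sensitivity.
    """
    return prevalence * sensitivity


# Hypothetical numbers: physician skill (sensitivity) is identical in both
# periods, only the patient mix changes.
period_1 = adenoma_detection_rate(prevalence=0.30, sensitivity=0.90)
period_2 = adenoma_detection_rate(prevalence=0.24, sensitivity=0.90)

print(f"period 1 ADR: {period_1:.3f}")
print(f"period 2 ADR: {period_2:.3f}")
```

In this toy model the ADR drops between periods even though sensitivity never changed, which is why a before/after comparison without randomization cannot separate the two explanations.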

eqvinox 1 hour ago||

There's a whole bunch of other studies on this topic, as well as metastudies, and from what I can tell the problem is real.

https://www.sciencedirect.com/science/article/pii/S245195882... (+ cf. its references)

throwatdem12311 2 hours ago||

I don’t even trust AI to proofread my emails.

TSiege 2 hours ago||

Always worth a share for this scenario. It's not clear if LLMs are capable of doing actual analysis on medical imaging. For details see this article https://futurism.com/artificial-intelligence/frontier-models...

> As detailed in a new, yet-to-be-peer-reviewed paper, a team of researchers at Stanford University found that frontier AI models readily generated “detailed image descriptions and elaborate reasoning traces, including pathology-biased clinical findings, for images never provided.”

> In other words, the AI models happily came up with answers to questions about a supposedly accompanying image — even if the researchers never even showed it an image.

> As opposed to hallucinations, which involve AI models arbitrarily filling in the gaps within a logical framework, the team coined a new term for the phenomenon: “mirage reasoning.”

> The effect “involves constructing a false epistemic frame, i.e., describing a multi-modal input never provided by the user and basing the rest of the conversation on that, therefore changing the context of the task at hand,” the researchers wrote in their paper.

> The damning findings suggest AI models cheat by diving into the data they were given — and coming up with the rest based on probability, even if it’s almost entirely conjecture.

kierangill 2 hours ago||

I work at a telemedicine company. We’ve benchmarked a few frontier LLMs on public medical imaging datasets. One test included high-quality and high-consensus otoscopic images. We didn’t expect the models to do well on something so niche, but what concerned us was how poorly calibrated the models were.

I know you can’t trust an LLM’s self-assessed “confidence” of a prediction, but I’ve found that confidence can at least be directionally correct for some tasks. For our benchmarks, however, confidence was poorly correlated. What’s worse is that binary classification models (“Do you see $diagnosis in this photo?”) highly influenced the LLM to confidently predict $diagnosis.

I’m concerned for those using LLMs for diagnostics, and getting confidently led to the wrong conclusion.
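
For readers unfamiliar with the term, here is a minimal sketch of one common way to quantify "poorly calibrated": expected calibration error (ECE), the weighted gap between stated confidence and observed accuracy. The function name, bin count, and all numbers are made up for illustration; this is not the benchmark code or data described above.

```python
def expected_calibration_error(confidences, correct, n_bins=5):
    """Weighted average of |mean confidence - accuracy| over confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Hypothetical model that says ~93% confident but is right half the time.
overconfident = expected_calibration_error(
    [0.95, 0.90, 0.92, 0.97], [True, False, False, True]
)
# Hypothetical model that says 50% and is right half the time.
calibrated = expected_calibration_error(
    [0.5, 0.5, 0.5, 0.5], [True, False, True, False]
)
print(overconfident, calibrated)
```

A large value means the stated confidence tells you little about how often the answer is right, which is the failure mode described above.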

nostrebored 2 hours ago||

But the binary classification models can be made ternary easily. RL on congruence plus penalty for misdiagnosis is easy to set up and gives great results.

What I’ve seen be the true bottleneck is people not setting up the structured data. But making a tiny reasoning model with OPSD -> GRPO is totally doable with a bit of money.

appplication 2 hours ago|||

It makes a lot of sense if you understand how these models work, but this was a cool read anyway, and studies like this are important for curbing the unfortunate fever dream some folks seem to be collectively having about LLM omnipotence.

seanmcdirmid 2 hours ago|||

I don’t understand how this is a different result from giving any LLM a task that is not completely grounded. I’ve observed this in coding tasks: if I forget to include a file referred to in the spec, the LLM will just hallucinate a version of it and my results suck. If I give it the file (and really, all the information I claimed it had access to), the task works fine. I fixed this in my pipeline with a prompt that does an extensive grounding analysis to determine whether the assets I’m giving it are complete with respect to the spec (and that the spec is grounded as well, i.e. it doesn’t refer to something that is undefined).

I wonder if the above problem can be fixed similarly? Just ask the LLM to do a conservative grounding analysis before jumping to the main task?
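
A deterministic cousin of that idea, as a minimal sketch: before calling the model at all, check that every file the spec mentions was actually attached. This is a plain regex pre-check, not the prompt-based grounding analysis described above; the function name and the list of file extensions are hypothetical.

```python
import re

# Hypothetical pattern: tokens that look like file names with a few
# illustrative extensions. A real pipeline would need its own list.
FILE_REF = re.compile(r"\b[\w./-]+\.(?:py|json|md|txt|csv|png|dcm)\b")


def missing_assets(spec_text, provided):
    """Return the files referenced in the spec that were not provided."""
    referenced = set(FILE_REF.findall(spec_text))
    return sorted(referenced - set(provided))


spec = "Load config.json and describe scan.png using helpers from utils.py."
missing = missing_assets(spec, provided=["config.json", "utils.py"])
if missing:
    # Refuse to run the main task instead of letting the model improvise.
    print("ungrounded spec, missing:", missing)
```

For the image case discussed in this thread, the equivalent check is simply refusing to ask an image question when no image is attached.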

pickleRick243 1 hour ago||

It's not different. There's a line of research and reasoning where people who don't use LLMs regularly point out issues that have been known (and more or less solved) for more than a year now (which is an eternity in the LLM space).

tracerbulletx 2 hours ago|||

The only thing that matters is the success rate when they are actually provided an image.

consensus1 2 hours ago||

But why should I care? If you demonstrated that a model can perform more accurate diagnoses than a doctor, but also it had this strange behavior when no image was presented, why should that deter me from using the model?

swiftcoder 2 hours ago||

Because you don’t have any way of telling if it actually used the image presented, or based its conclusions on a different image it made up.

simianwords 2 hours ago||

Really? You know you could just ask it.

Aeolun 3 hours ago||

I would not use Claude to get a second opinion on anything that’s an image.

rmbyrro 1 hour ago||

I agree with you for some kinds of images, but not all.

LLMs are the best PDF-to-markdown converters, in my experience. I have a CLI that converts PDF to PNG, then run a background agent to "read" each PNG and write it down as markdown; it works flawlessly even for complex math formulas, it can "translate" complex charts, graphs, and tables into words.

It's slow and arguably expensive compared to traditional OCR, but very effective and precise.

maxall4 3 hours ago|||

Especially an MRI, which is a 3D medium, something current LLMs are very bad at.

lostlogin 1 hour ago|||

> MRI which is a 3D medium

The finer detail (which you may already know) is more complicated.

MR does ‘2D’ scans which are a slice, then a gap of non-imaged tissue (typically 10% of the slice thickness), then a slice. Each slice is an image with a number of pixels, say 320. Each pixel in the slice is small, e.g. 0.5mm, but very thick due to the slice being thick, which is required for MRI signal. The pixels are 3mm thick in the shoulder scan done here.

‘3D’ scans don’t have a gap between slices, and are often isotropic, meaning the same resolution in all directions. The voxel (a pixel with depth) would be something like 1mm x 1mm x 1mm.

3D scans are slow, prone to movement artifact and never as pretty in plane as a good 2D. You can reformat them to look ok in any plane.
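
To put numbers on the geometry above, a small sketch using the figures from this comment (0.5mm in-plane pixels, 3mm slices, a 10% gap, 1mm isotropic 3D voxels). The class and function names are hypothetical, and the 20-slice stack is an assumed value, not taken from the scan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Voxel:
    dx_mm: float  # in-plane pixel width
    dy_mm: float  # in-plane pixel height
    dz_mm: float  # slice thickness

    @property
    def volume_mm3(self) -> float:
        return self.dx_mm * self.dy_mm * self.dz_mm

    @property
    def anisotropy(self) -> float:
        """Longest edge divided by shortest edge; 1.0 means isotropic."""
        edges = (self.dx_mm, self.dy_mm, self.dz_mm)
        return max(edges) / min(edges)


def stack_coverage_mm(n_slices: int, thickness_mm: float, gap_fraction: float) -> float:
    """Distance covered by a 2D stack, including the non-imaged gaps between slices."""
    gap_mm = thickness_mm * gap_fraction
    return n_slices * thickness_mm + (n_slices - 1) * gap_mm


two_d = Voxel(0.5, 0.5, 3.0)    # the '2D' shoulder scan described above
three_d = Voxel(1.0, 1.0, 1.0)  # a typical isotropic '3D' voxel

print(two_d.volume_mm3, two_d.anisotropy)      # voxel is 6x longer than wide
print(three_d.volume_mm3, three_d.anisotropy)
print(stack_coverage_mm(20, 3.0, 0.10))        # 20 slices is an assumed count
```

The 2D voxel is a thin stick rather than a cube, which is why reformatting a 2D stack into another plane looks poor while an isotropic 3D volume can be resliced in any direction.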

amluto 2 hours ago|||

I know little about radiology, but MRI is a 3D medium. I would not be at all surprised if one could slice an MRI the wrong way to produce a 2D image that fails to show a feature that exists in the source data.

yolo3000 3 hours ago|||

I used it on an ankle fracture xray, it was quite useful to make sense of things. But not like a 2nd opinion.

behnamoh 2 hours ago||

What's wrong with Claude? I've asked it to analyze images and even Opus 4 would perfectly nail it.

throwrioawfo 2 hours ago|||

Sure, it can see obvious stuff in images, but as far as I'm aware it is not designed for (or tested on) performing the kind of microscopic analysis that radiology involves.

nostrebored 2 hours ago|||

Claude is the worst FM at image understanding. Prior to gpt-5.4 the only usable models were Gemini and Qwen.

mootothemax 2 hours ago||

Can any LLM give you the rough pixel coordinates of an item it identifies in an image?

I found that while Claude, GPT etc could describe an image, there was no way to link the description back to specific pixels in the image itself. Not even to a bounding box or segment.

intoXbox 2 hours ago||

Radiologists very often have to weigh up different theories and guidelines based on the symptoms. The certainty of their diagnosis is their added value, or if they don’t know, they will tell you why.

An AI telling you it could be X or Y because theory ABC… is the academic answer and a luxury clinicians don’t have. AI doesn’t give you what you want. I don’t see any added value in using generic AI models for this.

skybrian 3 hours ago|

Getting an actual second opinion seems like the next step?
