
Posted by zdw 1 day ago

Inside the M4 Apple Neural Engine, Part 1: Reverse Engineering (maderix.substack.com)
220 points | 55 comments
blobbers 13 minutes ago|
Can someone help me understand when these neural engines kick in in open source software?

I typically use python ML libraries like lightgbm, sklearn, xgboost etc.

I also use numpy for large correlation matrices, covariance etc.

Are these operations accelerated? Is there a simple way to benchmark?

I see a lot of benchmarks on what look like C functions, but these days in my job I rely on higher-level libraries. I don't know if they perform any better on Apple HW, and unless they have a flag like use_ane I'm inclined to think they don't.

Of course chatgpt suggested I benchmark an Intel Mac vs. newer apple silicon. Thanks chatgpt, there's a reason people still hate AI.
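(As far as I know, numpy and friends never touch the Neural Engine; on Apple silicon, numpy is typically linked against Accelerate, which runs on the CPU's AMX/SIMD units, and the ANE is only reachable through Core ML. A minimal, library-agnostic way to sanity-check what your numpy build is actually delivering is to time a large matmul; the function name and sizes below are just illustrative:)

```python
# Quick sanity benchmark: times a large float32 matmul with whatever
# BLAS numpy was built against (on Apple silicon this is usually
# Accelerate, i.e. CPU/AMX -- NOT the Neural Engine).
import time
import numpy as np

def bench_matmul(n=2048, repeats=5):
    rng = np.random.default_rng(0)
    a = rng.standard_normal((n, n), dtype=np.float32)
    b = rng.standard_normal((n, n), dtype=np.float32)
    a @ b  # warm-up: triggers any lazy BLAS/threadpool init
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        a @ b
        times.append(time.perf_counter() - t0)
    # A square matmul does ~2*n^3 floating-point operations.
    gflops = 2 * n**3 / min(times) / 1e9
    return gflops

if __name__ == "__main__":
    print(f"{bench_matmul():.1f} GFLOP/s")
```

Comparing the number this prints on an Intel Mac vs. Apple silicon (or with `OPENBLAS_NUM_THREADS`/`VECLIB_MAXIMUM_THREADS` pinned to 1) at least tells you how much your BLAS backend matters, independent of any ANE question.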

LatencyKills 6 hours ago||
I worked on the Xcode team for years and know the lengths Apple goes to in making this stuff difficult to figure out.

I just wanted to say that you’ve done an excellent job, and I’m looking forward to the 3rd installment.

RetpolineDrama 4 hours ago|
>I worked on the Xcode team for years

Why did you guys remove the ability to detach the console and move it to another window?

estimator7292 3 hours ago||
[flagged]
Octoth0rpe 6 hours ago||
Part 2 has benchmarks: https://maderix.substack.com/p/inside-the-m4-apple-neural-en...

6.6 FLOPS/W, plus the ability to completely turn off when not in use, so 0W at idle.

notepad0x90 2 hours ago||
I've been guilty of this myself, but every other comment here is like "What about <insert something unrelated to the topic but related to apple>".
zozbot234 2 hours ago||
We already knew the very basics of much of this from documentation of the M1/M2 ANE as accessed bare-metal from Asahi Linux, but it's nice to see it confirmed and explored in further depth. Note that according to OP's Parts 1/2, for very large matmuls CoreML adds little to no overhead compared to the lower-level interface, so there seems to be plenty of scope for supporting the ANE for prefill in local AI frameworks. Decode is generally memory-bandwidth limited unless the context is very large, and the ANE requires special handling (converting a matmul to a 1x1 convolution as described here is wasteful of memory bandwidth, as is potentially dequantizing to INT8/FP16 in memory), so it's less of a clear win there.
eleventyseven 6 hours ago||
> Throughout this series, “we” refers to maderix (human) and Claude Opus 4.6 (by Anthropic) working as a pair. The reverse engineering, benchmarking, and training code were developed collaboratively

Sure, "collaboratively." Why would I ever trust a vibe coded analysis? How do I, a non expert in this niche, know that Opus isn't pulling a fast one on both of us? LLMs write convincing bullshit that even fools experts. Have you manually verified each fact in this piece? I doubt it. Thanks for the disclaimer, it saved me from having to read it.

Anonbrit 5 hours ago||
Humans also write endless amounts of convincing bullshit, and have done since time immemorial. False papers and faked results have been a growing scourge in academia before LLMs were a thing, and that's just counting the intentional fraud - the reproducibility crisis in science, especially medical and psychological science, affects even the best designed and well intentioned of studies.

Humans also make mistakes and assumptions while reverse engineering, so it will always take more engineers going through the results and testing things.

withinboredom 6 hours ago||
Claude likes to hide bad benchmarks from you and show you only where you are clearly winning. You can even see some weird benchmarks in the article.
behnamoh 5 hours ago||
It's insane that the source code of ANE is not available even to the MLX team, possibly one of the reasons Awni (MLX project head) left Apple.
mathisfun123 5 hours ago|
[flagged]
behnamoh 5 hours ago||
Yes I haven't worked at a hardware company, nothing to be ashamed of!
timcobb 4 hours ago|||
I'm not op, but I don't think op meant to shame. I understand the construction "tell me you're... without telling me" as a way to highlight that something is unexpected to people who haven't done it: that is, that something is particularly unintuitive without some special experience.
webdevver 4 hours ago||
he did a reddit (cringe) and now must be punished for it (the text becomes an absolutely fucking unreadable shade of light grey)
webdevver 4 hours ago||||
actually, it really is not necessarily a 'hardware company' thing. i've been in 'hardware companies' where the rtl was just as available for viewing as the rest of the firmware/software.

in big hardware companies, things start getting siloed, but that probably has more to do with big companies (seemingly invariably) operating as a union of fiefdoms (dunbar-number-ification?)

mathisfun123 3 hours ago|||
> It's insane that the source code of ANE is not available even to the MLX team

no it's not insane - it's completely mundane policy. that's my point - that you're calling something out as insane with exactly zero experience (which is the actually insane thing...).

9dev 1 hour ago||
on that line of argument, nobody would have ever called out the emperor for not wearing any clothes, civilians would not go to peace protests, and nobody would ever improve things by looking at something from another angle.
mathisfun123 1 hour ago||
This is a completely asinine take - you're not observing the emperor with no clothes here - you're completely outside the kingdom hypothesizing that the emperor has no clothes. To wit: you don't actually know that the ANE "source" isn't available to MLX. Hint: it probably is, but there's just red tape involved.
GeekyBear 5 hours ago||
The recent news is that Apple is supposedly replacing the Core ML framework with an updated version that will make it easier to integrate third party LLMs into your apps.

> the company is also planning a few other software-based AI upgrades, including a new framework called Core AI. The idea is to replace the long-existing Core ML with something a bit more modern.

https://www.bloomberg.com/news/newsletters/2026-03-01/apple-...

love2read 7 hours ago||
This article was clearly written by a human (and AI) but still has a few "LLMisms" such as:

- The key insight - [CoreML] doesn't XXX. It YYY.

With that being said, this is a highly informative article that I enjoyed thoroughly! :)

The article links to their own Github repo: https://github.com/maderix/ANE

walthamstow 7 hours ago||
We've got about a year before so many people are interacting with LLMs on a daily basis that their style starts to reverse-infect human speech and writing
baxtr 4 hours ago|||
Great insight – Would you like to try and identify some specific "AI-isms" that you've noticed creeping into your own writing or your colleagues' emails lately?
pixl97 6 hours ago||||
That said, there were people who talked like this before LLMs; it didn't develop out of whole cloth.
pcrh 3 hours ago|||
The article above doesn't read well, at all.

It's not my subject, but it reads as a list of things. There's little exposition.

DrScientist 5 hours ago|||
Exactly. LLMs are mimics.

People seem to be going around pointing out that people talk like parrots, when in reality it's the parrots that talk like people.

pixl97 5 hours ago||
I mean, it's both.

Did you develop your own whole language at any point to describe the entire world? No, you, me, and society mimic what is around us.

Humans have the advantage, at least at this point, of being a continuous learning device so we adapt and change with the language use around us.

Angostura 6 hours ago|||
My honest take? You're probably right
sholladay 5 hours ago||
You are absolutely right.

Here is why you are correct:

- I see what you did there.

- You are always right.

rafram 5 hours ago||
Also the Prior Art section, which has telltale repetition of useless verbs like "documenting," "providing insight into," and "confirming" on each line. This was definitely AI-written, at least in part.
tzs 2 hours ago||
Below are the items from that section. How should they be written to not look like an AI?

> hollance/neural-engine — Matthijs Hollemans’ comprehensive community documentation of ANE behavior, performance characteristics, and supported operations. The single best existing resource on ANE.

> mdaiter/ane — Early reverse engineering with working Python and Objective-C samples, documenting the ANECompiler framework and IOKit dispatch.

> eiln/ane — A reverse-engineered Linux driver for ANE (Asahi Linux project), providing insight into the kernel-level interface.

> apple/ml-ane-transformers — Apple’s own reference implementation of transformers optimized for ANE, confirming design patterns like channel-first layout and 1×1 conv preference.

grey-area 3 hours ago|
If only they could fix the iOS autocomplete, which is getting worse with every iteration.