Posted by bhouston 4 days ago
Let's take two very similar cores, the SiFive P550 @1.4GHz and a Cortex-A72 @1.5GHz: https://browser.geekbench.com/v6/cpu/compare/123?baseline=74...
Notice that the A72 is a lot faster in some of the benchmarks. Looking at the manual suggests that all of them are, to varying degrees, SIMD-optimized for Arm Neon: https://www.geekbench.com/doc/geekbench6-benchmark-internals...
The most ISA-agnostic benchmark you can compare is the clang one, as it just compiles code with Clang, which doesn't benefit much from dedicated instructions that aren't enabled for RISC-V. Notice how there the P550 outperforms the A72.
---
That compared unoptimized code, but let's compare hand-optimized ARM Neon against hand-optimized RISC-V RVV.
This time I'm comparing a Cortex-A55 with the SpacemiT X60:
* Cortex-A55: in-order, 1.8GHz, dual-issue Neon (two 128-bit execution units)
* SpacemiT X60: in-order, 1.6GHz. Has RVV with 256-bit vectors, but two 128-bit execution units with an unusual split:
  EX1 only: shift, bitwise, compare, mask ops, merge, gather, compress
  EX1 & EX2: int/float arithmetic, including mul & div
The cores should be quite comparable, although the A55 sits in a much better SoC (my phone) than the X60 and should be slightly faster, all else being equal.
The hand-optimized code is from simdutf and the gnuradio kernel library, both of which I ported to RVV:
gnuradio kernels
A55 Neon vs X60 RVV
i16c mul: 674 vs 634 ms
i16c dot: 361 vs 415 ms
f32c conj dot: 1043 vs 900 ms
rotator2: 763 vs 918 ms
min index u32: 404 vs 156 ms
f32 interleave: 308 vs 742 ms
f32 log2: 1208 vs 789 ms
f32 sin: 2155 vs 2152 ms
f32 tan: 2962 vs 3152 ms
f32 stddev&mean: 609 vs 277 ms
f32 poly sum: 667 vs 266 ms
u8 conv k2: 10413 vs 7523 ms
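For context, the "min index u32" row above (where the X60's 256-bit RVV wins big, 404 vs 156 ms) boils down to this scalar loop before vectorization. A minimal C sketch of what the kernel computes, not the actual gnuradio or RVV code:

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference for the "min index u32" kernel: return the index of the
 * smallest element. The Neon/RVV versions perform this comparison across a
 * whole vector register per iteration; wider vectors mean fewer iterations,
 * which is why this pattern favors the X60's 256-bit RVV. */
static size_t min_index_u32(const uint32_t *v, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (v[i] < v[best])
            best = i;
    return best;
}
```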
simdutf:             utf8 to utf16        utf16 to utf8
                     Neon A55 vs RVV X60  Neon A55 vs RVV X60
arabic.utf8.txt 0.25 vs 0.34 b/c 0.66 vs 0.95 b/c
chinese.utf8.txt 0.23 vs 0.28 b/c 0.51 vs 0.56 b/c
czech.utf8.txt 0.21 vs 0.35 b/c 0.67 vs 0.90 b/c
english.utf8.txt 0.81 vs 0.85 b/c 1.14 vs 1.63 b/c
esperanto.utf8.txt 0.34 vs 0.45 b/c 0.91 vs 1.12 b/c
french.utf8.txt 0.28 vs 0.35 b/c 0.86 vs 0.98 b/c
german.utf8.txt 0.35 vs 0.41 b/c 0.94 vs 1.10 b/c
greek.utf8.txt 0.24 vs 0.37 b/c 0.64 vs 1.03 b/c
hebrew.utf8.txt 0.21 vs 0.33 b/c 0.58 vs 0.92 b/c
hindi.utf8.txt 0.24 vs 0.29 b/c 0.48 vs 0.57 b/c
japanese.utf8.txt 0.23 vs 0.29 b/c 0.49 vs 0.57 b/c
korean.utf8.txt 0.21 vs 0.28 b/c 0.47 vs 0.55 b/c
persan.utf8.txt 0.23 vs 0.33 b/c 0.60 vs 0.80 b/c
portuguese.utf8.txt 0.29 vs 0.36 b/c 0.88 vs 1.02 b/c
russian.utf8.txt 0.23 vs 0.31 b/c 0.60 vs 0.84 b/c
thai.utf8.txt 0.29 vs 0.30 b/c 0.52 vs 0.60 b/c
turkish.utf8.txt 0.25 vs 0.38 b/c 0.73 vs 0.99 b/c
vietnamese.utf8.txt 0.18 vs 0.28 b/c 0.46 vs 0.54 b/c
Note: b/c is bytes/cycle, so bigger is better
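Bytes/cycle converts to absolute throughput once you fix a clock; using the clocks quoted above (A55 at 1.8GHz, X60 at 1.6GHz), the A55's lower b/c figures are partly offset by its higher clock. A quick sketch of the conversion:

```c
/* bytes/cycle at a given clock -> MB/s.
 * b/c * (GHz * 1e9 cycles/s) = bytes/s; divide by 1e6 for MB/s,
 * which nets out to a factor of 1000. */
static double mb_per_s(double bytes_per_cycle, double clock_ghz)
{
    return bytes_per_cycle * clock_ghz * 1000.0;
}
```

For english.utf8.txt utf8-to-utf16, that makes the A55's 0.81 b/c about 1458 MB/s and the X60's 0.85 b/c about 1360 MB/s.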
As you can see, the performance is very competitive between processors at a similar design point.

I don't think anyone has put the kind of money into a RISC-V processor that Apple has put into developing the 3nm M4.
I was going to say it isn't an apples to apples comparison but I will restrain myself.
Suppose Facebook are tired of paying a premium to Cisco et al and decide to commission their own network equipment. That stuff doesn't have to be competitive with x86 on single thread performance, it just has to be reasonably power efficient. So they take some existing free RISC-V core and make a few improvements to it and use that. But they publish the improvements, because they're not actually trying to be a hardware OEM and if someone else takes their design and does the same thing, they know they get those improvements for their next generation.
So then that happens. Google want the same thing and make more improvements. Netgear use it in a consumer router, and they're not big enough to improve the chip, but they ship it in a product that sells a million units, so widespread use causes the community to optimize software for it and fix bugs. At this point Samsung or Qualcomm realize they only have to improve the SIMD support a little and they can stop paying ARM for their low and mid range phone SoCs. But if half of Android devices are now RISC-V and Qualcomm are already designing the high end cores themselves, why pay ARM for that either? So now it's in the high end phones, and someone starts putting the same chip into laptops.
All it really takes is for enough people to not want to pay ARM to create an ecosystem that allows everybody else to do the same thing. The free designs eat the low end of the market and then the high end uses the same architecture because why wouldn't it?
I don’t know if it will happen, but it would be extremely funny if Intel cut off Arm and went with RISC-V. (False reports of the death of x86 have been around for decades, but it is bound to happen eventually, right?)
You can't write off the first car for only doing 15 km/h because your horse can do 40 km/h.
Most of ARM's design work is done in the US (Austin), India (Bangalore, Noida), and China (Beijing), though ARM China should basically be treated as a separate company at this point due to corporate shenanigans.
That said, in the chip design space (which tends to be concentrated in the US, Israel, India, and China), RISC-V has become much more popular for commodity embedded use cases: the less restrictive licensing means better profit margins, which is allowing fabless chip startups to potentially leap ahead of ARM.
And that probably only happened because Apple co-founded ARM.
That's other people being short-sighted, not China doing anything wrong or sinister.
There are in fact quite a lot of exciting non-Chinese developments being announced recently, including at the RISC-V Summit that is on now, but those things will take several years to make their way in to the market.
It's not about how great the teams behind these CPUs are.
It's about how great the CPUs are.
That's the ignoble rhetorical device of applephasis
https://camel-cdr.github.io/rvv-bench-results/articles/xperm... (scroll to bottom and compare scalar performance between Ryzen 1600x and XiangShanV3)
You may notice that while scalar performance is faster, vector performance is slower; this is because their vector implementation is still quite new and missing a few optimizations.
XiangShan repo: https://github.com/OpenXiangShan/XiangShan
More micro architectural details: https://www.servethehome.com/xiangshan-high-performance-risc...
BTW, XiangShanV2 has already been taped out and will be available in a laptop in the future: https://milkv.io/ruyibook
(In the following years AMD reduced and then eliminated their initial three-year handicap: Zen 2 matched Skylake in IPC, and from Zen 3 onward they have matched the IPC of the best contemporaneous Intel cores.)
So even such an unusually fast RISC-V core has 10 years of handicap to recover before it can match modern CPU cores, like the Apple and Qualcomm Arm cores or the current AMD and Intel cores.
Moreover, unless your RTL simulation includes the cache hierarchy and slow DRAM, the simulated IPC will be far too optimistic. In any real CPU, IPC is cut to a fraction of its ideal value by cache misses that stall the CPU while data is loaded from main memory.
Yes.
But why do you say this as if it's news, or even bad news?
Two years ago the best RISC-V core in the market had 30 years of handicap to x86/PowerPC in IPC.
Catching up by 20 years in 2 years is pretty impressive, don't you think?
Parity is coming well before 2030, even allowing for x86 advances in that time.
So I just upgraded my Zen1+ processor (2400G) to a Zen3 (5700G) which doubled the core count and upped single core performance by about 50 percent. My favorite benchmark runs 3x as fast. Now Zen5 AFAICT is not more than 2x the performance of Zen3. So per-core the state of the art AMD is maybe 3x that of Zen1.
In the last 10+ years the only significant jump due to an advanced node was with the switch to EUV lithography (TSMC 7nm) and that jump was included between Zen1 and Zen3. All the other node advancements seem to be 10-15 percent performance with new CPUs getting a modest IPC increase on top of that.
If there really is a RISC-V chip with Zen1 performance that's really quite good and I'd be happy to have it. Not sure what node it's on either, so there's probably room to "buy" more performance.
No it didn't. ARM began on Acorn desktop computers, the Acorn Archimedes. ARM originally stood for Acorn RISC Machine.
So sorry, but I have to partly disagree, they did.
Edit: Interesting... sharing what the designer of the first ARM core said leads to downvoting on HN... HN is a very different place than it used to be...
I just want some graphics
is that really so much to ask
RISC-V is not slow.
The RISC-V chips currently available in off-the-shelf hardware, which have CPU cores released in 2018/2019 at the same time as the original specs were formally frozen (ratified), are slow.
Big money started to be invested into RISC-V designs in 2021 and 2022. The results of that will be seen in hardware in the market in 2026 or 2027 or so.
The whole RISC-V "winning" meme is so weird. What do you gain if RISC-V "wins"?
Even with an open source CPU core you're not getting away from binary blobs. Any wireless baseband is going to be pretty much a sealed system for regulatory reasons. Manufacturers will still lock down firmwares. Attestation chains will still be required for security. HDCP won't magically open up.
High performance is largely divorced from the ISA and more related to the low level chip design and process node/chemistry. If RISC-V were to "win", the decent chips will still be manufactured by TSMC. Existing chip designers won't commit mass seppuku, they'll just start working on RISC-V designs. Compiler toolchains will just target RISC-V.
So to you what do you expect to change with RISC-V? Unless you've shorted ARM and just want them to go out of business there's no magic upside for you as an end user or even as a device designer. Maybe your next phone has a RISC-V cellular baseband? You're not going to be able to tweak the EIRP of the radio any more than you can today. ARM at the core of the baseband doesn't control that but the regulatory licensing.
What problem or problems do you think exists? What of those problems do you think RISC-V somehow solves? An ARM laptop can run some code you write. A RISC-V laptop can as well. The difference is immaterial unless you really love writing RISC-V assembly.
RISC-V doesn't change physics so it doesn't obviate radio emission regulation. RISC-V doesn't change licensing to industry SIGs for protocol compliance badging. RISC-V doesn't change security postures so it's going to still use a signed bootloader to make enterprise sales. A decent performing RISC-V chip still requires a factory costing billions of dollars so you're not going to be manufacturing your own.
The open (or closed) nature of a CPU core isn't really changing the dynamics of electronics in general or computing specifically.
Also, a complete re-boot of the ISA with AArch64.
This is a mostly-uninformed theory, but I'd love to hear thoughts on it: AArch64 was a substantial break from AArch32, with lots of design changes intended to ease superscalar OoO implementations. Conditional execution mostly gone, Thumb gone, PC no longer a GPR, etc. Clearly, AArch64 has excelled for this. There are even rumors that Apple basically commissioned AArch64 for the types of cores they wanted to build: https://news.ycombinator.com/item?id=31368489
RISC-V is quite similar to MIPS, an ISA which hasn't had a high-performance leading-edge implementation in 20+ years (dating back to SGI's last parts). Will this heritage make it harder to build high-performance OoO implementations? Does RISC-V need an AArch64-style reboot? Maybe this can be mostly done through extensions?
They have a whole book where they go through each instruction and explain why they added it.
It's not optimized purely for high performance, but that was certainly a major factor.
The architecture has been evaluated by people like Dave Ditzel and Jim Keller. The lead designer at Jim Keller's company worked on the M1. And they all seem to think it's a good design.
In general, ARM optimisation has caught up quite a bit, although it still lags behind in some places. RISC-V still has some way to go. For example, for Go:
[/usr/lib/go]% (for f in **/*.s; print ${${(s:_:)f:t}[-1]}) | sort | uniq -c | sort -hr | head -n10
86 amd64.s
72 arm64.s
63 s390x.s
56 s
48 arm.s
47 386.s
35 riscv64.s
32 ppc64x.s
22 loong64.s
20 mips64x.s
[/usr/lib/go]% wc -l **/*amd64.s | tail -n1
28801 total
[/usr/lib/go]% wc -l **/*arm64.s | tail -n1
21956 total
[/usr/lib/go]% wc -l **/*riscv64.s | tail -n1
7804 total
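For readers not fluent in zsh: the one-liner takes each assembly file's basename, splits it on `_`, and counts the last field (the architecture suffix). The same classification step in C, for illustration:

```c
#include <string.h>

/* Mirror the zsh ${${(s:_:)f:t}[-1]}: take the basename of the path,
 * split on '_', return the last field.
 * e.g. "crypto/sha256/sha256block_arm64.s" -> "arm64.s" */
static const char *arch_suffix(const char *path)
{
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;          /* basename */
    const char *us = strrchr(base, '_');
    return us ? us + 1 : base;              /* last '_'-separated field */
}
```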
Rough measurement of course, but at least some code paths on RISC-V will be slower just because they're not optimised (yet).

I strongly disagree with how this frames the conversation. For most applications our desktop machines could be 10x faster if application programmers were incentivized to care about performance. I'd happily take a 2.5x slowdown for cheaper, simpler, non-proprietary hardware. And that leads into the next point:
> RISC-V implementations in the wild lack advanced features that modern CPUs rely on for speed, including sophisticated pipelining mechanisms, out-of-order execution capabilities, advanced branch prediction, and multi-tiered cache hierarchies. Most commercial RISC-V chips remain in-order processors, meaning they execute instructions sequentially rather than optimizing their order for performance.
Get your pitchforks out, because I consider this a feature. Spectre should have been a wakeup call that these performance optimizations are incompatible with secure computing. "Look how much faster our new minivan careens off the nearest cliff and explodes in midair!" is not the selling point people seem to think it is. I'm eagerly awaiting a RISC-V mainboard for the Framework for this reason. If I want performance, I'll use a burner PC. If I want security, I want a CPU design where it's actually tractable to make it secure.
But they are not going to be. Google and Slack are going to chomp RAM and burn my CPU cycles, and my desire to use RISC-V isn't going to change their behavior. If you want people to use your hardware, you have to meet them where they are.
RISC-V does not guarantee that the CPU core designs are open-source, that the chip designs are open-source, that your computer doesn't use Secure Boot, that your computer doesn't have proprietary drivers, or that your computer doesn't use DRM at the hardware level.
It's only an instruction set. A good first step, but only that. Don't get people's hopes up.
There are many, many applications for which performance isn't as important as cost and/or power draw. For instance, these days it can be cheaper to run an additional microcontroller close to where it's needed than it is to fabricate a wiring harness to bring sensor data all the way to a centralized location. RISC-V excels here.
It's a mature enough ecosystem that people can compile whatever software they want and run it without fuss on RISC-V. Nobody is going to buy a RISC-V laptop with a full GUI and be disappointed when it doesn't perform like a MacBook.
So if I can buy a small SBC with two ethernets and a few RISC-V cores that takes few enough watts they can be counted on a single hand for a handful of dollars so that I can make a hardware VPN device, and I can download and compile most software on it, that interests me. Processors like Intel's N200 have their uses and definitely more performant than RISC-V, but take way too much power and therefore generate way too much heat. So why would I even bother comparing? They're in different leagues.
"RISC-V implementations in the wild lack advanced features that modern CPUs rely on for speed, including sophisticated pipelining mechanisms, out-of-order execution capabilities, advanced branch prediction, and multi-tiered cache hierarchies. Most commercial RISC-V chips remain in-order processors, meaning they execute instructions sequentially rather than optimizing their order for performance. This architectural simplicity creates a fundamental performance ceiling that's difficult to overcome without significant architectural changes."
As mentioned in the article, Berkeley's SonicBOOM is out-of-order. And you could certainly enhance the memory architecture with multi-level caches easily; the ISA is blind to this (consider how little the x86 ISA has changed to accommodate ever-improving cache/memory strategies since the i386).
RISC-V is also extensible, so you can keep improving things! The incredibly awesome efficiency-focused PULP group is working on a massive many-core research chip, Occamy. It has huge vector processing units, many many many of them. Ok, so, promising. https://pulp-platform.org/occamy/
And they have their own extension, Stream Semantic Registers, which adds a very CISC-y set of instructions that combine real work with the load/store (and incrementing the data pointer for the next loop iteration), allowing DSP-like performance in loops. A super slick extension that massively increases throughput and reduces the ISA's footprint in loops. https://arxiv.org/abs/1911.08356
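The effect can be illustrated in plain C. In a conventional loop the loads and address arithmetic below are explicit instructions; with the stream-register extension, reading a designated register implicitly performs the load and bumps the address, so the loop body shrinks to essentially one multiply-accumulate per iteration. A hedged sketch (the real extension is configured separately, not expressible in portable C):

```c
#include <stddef.h>

/* A dot product as a conventional load/compute loop. With stream semantic
 * registers, the two loads and the index increment fold into register reads,
 * leaving only the fused multiply-add as explicit work in the loop body. */
static double dot(const double *a, const double *b, size_t n)
{
    double acc = 0.0;
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];  /* loads + addressing + FMA; streams keep only the FMA */
    return acc;
}
```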
"Rivos has not yet revealed details of its products or technology publicly."
Great that they're working on that, but when discussing the _current_ state of the RISC-V ecosystem, I think they can be safely ignored until such a time that they actually start shipping stuff.
It seems Ventana has shipped the Veyron V1, but I'm having a hard time finding concrete information on that in a quick search (other than from Ventana themselves), so not entirely sure what the status of that is? Their V2 chip is planned for 2025.
Some ISAs are more, or less, amenable to implementations that are fast for modern workloads.
E.g., a really bad ISA could make SIMD ops, floating point math, prompt interrupt handling, 64-bit addressing, etc. really hard to make into a fast implementation.
So based on the novelty of RISC-V, that's a plausible interpretation of the title.
Various low power rv64 CPUs are actually outperforming x86 when you compare them in terms of die area and energy usage.
Obviously
I'll defer to Jim Keller (DEC Alpha, AMD Zen, now Tenstorrent...)