Posted by luu 1 day ago
When you download "debian-live-12.7.0-amd64-kde.iso", all the programs in the repos support all current Intel and AMD CPUs, right? Do they just target the lowest common denominator of operations? Or do they somehow adapt to the operations supported by the user's CPU?
Do dynamic languages (Javascript, Python, PHP...) get a speed boost because they can compile just in time and use all the features of the user's CPU?
Mostly the former. Some highly optimized bits of software do the latter—they are built with multiple code paths optimized for different hardware capabilities, and select which one to use at runtime.
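For a concrete picture of the second approach, here's a minimal sketch of manual runtime dispatch (assuming GCC or Clang on x86; the function names are made up for illustration):

    #include <cstddef>

    // Two implementations of the same kernel. The AVX2 one would live in a
    // translation unit compiled with -mavx2; the scalar one is the fallback.
    void sum_scalar(float *dst, const float *src, std::size_t n);
    void sum_avx2(float *dst, const float *src, std::size_t n);

    using kernel_fn = void (*)(float *, const float *, std::size_t);

    // Probe the CPU once and route all future calls to the best version.
    kernel_fn pick_kernel() {
        if (__builtin_cpu_supports("avx2"))
            return sum_avx2;
        return sum_scalar;
    }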
> Do dynamic languages (Javascript, Python, PHP...) get a speed boost because they can compile just in time and use all the features of the user's CPU?
Hypothetically yes, but in practice no for the languages you mentioned because they don't map well to things like SIMD. Some JIT-based numerical computing systems as well as JIT-based ML compilers do reap those benefits.
There was an attempt at something similar in JavaScript (SIMD.js), but it added way too much complexity to the runtimes and ended up getting dropped in favor of WASM SIMD.
https://docs.oracle.com/en/java/javase/23/docs/api/jdk.incub...
SSE4 is from 2008, so making it a requirement isn't unreasonable.
Even AVX2 is from 2013, so some apps require it nowadays.
It is extremely difficult for a compiler to convert scalar code to SIMD automatically; even static C++ compilers really suck at it.
A dynamic compiler for JavaScript would have no real hope of any meaningful gains.
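As a hedged illustration (GCC/Clang at -O3 on x86): the first loop below autovectorizes trivially, while the second has a loop-carried dependency and comes out scalar, even though a SIMD prefix sum is possible with a different algorithm:

    #include <cstddef>

    // Independent iterations, unit stride: compilers vectorize this easily.
    void scale(float *x, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            x[i] *= 2.0f;
    }

    // Each iteration reads the previous result, so there is no direct SIMD
    // mapping; autovectorizers give up and emit scalar code.
    void prefix_sum(float *x, std::size_t n) {
        for (std::size_t i = 1; i < n; ++i)
            x[i] += x[i - 1];
    }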
https://github.com/Cons-Cat/libCat/blob/main/src%2Flibraries...
RAD's codecs are expensive, but that's the expertise you're paying for.
This is called runtime dispatch. You can do it manually or use a library, like Google Highway. GCC supports multiversioning where you write separate versions of a function and the right one is selected at runtime.
https://github.com/google/highway
https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Function-Multiv...
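A minimal sketch of the GCC feature from the second link, using the target_clones attribute (assuming x86, glibc, and a reasonably recent GCC): the compiler emits one clone per listed target plus an ifunc resolver that picks the best match for the running CPU at load time:

    #include <cstddef>

    // One source function, three generated bodies (baseline, SSE4.2, AVX2);
    // the resolver chooses among them automatically when the program loads.
    __attribute__((target_clones("default", "sse4.2", "avx2")))
    void add(float *dst, const float *a, const float *b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = a[i] + b[i];
    }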
Mostly the former; some specialized software does the latter. The lowest common denominator is called the baseline, and it differs over time and between distributions. Debian, for example, still supports x86-64-v1 (the original 64-bit extension to x86), but RHEL 10 will require x86-64-v3, which adds AVX2 on top of the SSE4.2 already required by v2.
Note that in recent years the chosen LCD for some distros has changed - they're starting to target the v2 feature set rather than the original.
See https://developers.redhat.com/blog/2021/01/05/building-red-h...
> Do dynamic languages (Javascript, Python, PHP...) get a speed boost because they can compile just in time and use all the features of the user's CPU?
Dynamically-typed languages can't benefit from this at all (they may include a C library that uses runtime dispatch though). Statically-typed JIT'ed languages like Java can (and you see occasional "look, Java is faster than C" benchmarks citing this), but only if you avoid classes and use only arrays. C# can do better than Java but still suffers from its Windows-centric history.
Please do look into the kind of codegen emitted by OpenJDK and .NET before assuming this. It's a bit difficult with -XX:+PrintAssembly and much easier with DOTNET_JitDisasm='pattern'/Disasmo/NativeAOT+Ghidra. Once you do, you will clearly see how the exact set of ISA extensions influences instruction selection for all sorts of operations: stack zeroing, loads/stores, loop vectorization (automatic or manual), etc. .NET has extensive intrinsics and portable SIMD APIs that are effectively statically dispatched: the path is still picked at runtime, but only once, during JIT compilation.
> still suffers from its Windows-centric history.
This is provably wrong, especially in performance-related scenarios.
sudo apt/dnf install dotnet-sdk-8.0
For Debian use this: https://learn.microsoft.com/en-us/dotnet/core/install/linux-...
It is a better user experience than dealing with C/C++ or Java tooling, too.
For shipping packages you don’t even need this since you can just publish them as self-contained or as native binaries.
But yeah, it's mostly code compiled to the lowest common spec, and a bit of code with dynamic dispatching.
PMULLD is probably just doing 2x PMULUDQ and discarding the high bits.
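For reference, the classic SSE2 emulation of PMULLD (_mm_mullo_epi32) looks roughly like this; a sketch of the software workaround, not a claim about what the hardware does internally:

    #include <emmintrin.h>

    // Emulate SSE4.1 _mm_mullo_epi32 with SSE2 _mm_mul_epu32 (PMULUDQ),
    // which multiplies the even 32-bit lanes into 64-bit products. Run it
    // twice (even and odd lanes) and keep only the low 32 bits of each.
    static __m128i mullo_epi32_sse2(__m128i a, __m128i b) {
        __m128i even = _mm_mul_epu32(a, b);                  // lanes 0 and 2
        __m128i odd  = _mm_mul_epu32(_mm_srli_epi64(a, 32),
                                     _mm_srli_epi64(b, 32)); // lanes 1 and 3
        // Gather the low dword of each 64-bit product back into one vector.
        even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));
        odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));
        return _mm_unpacklo_epi32(even, odd);
    }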
(I tried commenting on his blog but it's awaiting moderation - I don't know if that's ever checked, or just sits in the queue forever)
Maximum for signed bytes is +127, not +128. The minimum is correct: it's -128.
can someone explain this to me? isn't 32640 < 32767? how's this an overflow?
32640 * 2 > 32767. The operation sums two adjacent products into a single signed 16-bit result, so it's the pairwise sum that overflows, not either product on its own.
As an aside, the quoted section of the article seems to have an error. The maximum value of an i8 is 127 and the maximum value of one of these products is 32385.
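To make the arithmetic concrete, here is a scalar model of the pairwise multiply-add being discussed (assuming it's the PMADDUBSW-style u8 x i8 dot product; the helper name is made up):

    #include <algorithm>
    #include <cstdint>

    // Each product is u8 (0..255) * i8 (-128..127): a single product peaks
    // at 255 * 127 = 32385 and fits in an i16, but the sum of two adjacent
    // products can reach 64770, past INT16_MAX (32767), hence saturation.
    int16_t maddubs_pair(uint8_t a0, int8_t b0, uint8_t a1, int8_t b1) {
        int32_t sum = int32_t(a0) * b0 + int32_t(a1) * b1;
        return int16_t(std::clamp<int32_t>(sum, -32768, 32767));
    }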
Suggest you look at the Julia language: high-level, but still capable of C-like speed.
It has built-in support for SIMD (and GPU) processing.
Julia is designed for scientific computing, with a growing library ecosystem spanning many domains.