Big-Endian Testing with QEMU

Posted by jandeboevrie 3 days ago

Big-Endian Testing with QEMU (www.hanshq.net)

115 points | 128 comments | page 2

drob518 3 days ago|

This whole endianness issue can be traced to western civilization adopting Arabic numbers. Western languages are written left to right, but Arabic is right to left. Thus, Arabic numbers appear as big-endian when viewed in western languages. Consequently, big-endian appears to be "normal" for us in the modern age. But in Arabic, numbers appear little-endian because everything is right to left. Roman numbers are big-endian, though. Maybe that's why we kept the Arabic ordering even when adopting the system? We could have flipped Arabic numbers around and written them as little-endian, but we didn't.

zajio1am 3 days ago||

'Arabic' numerals originally come from India, from Brahmi numerals. Brahmi script was written left to right, so big-endian was 'normal' even originally; it was the Arabs who kept left-to-right numbers within a right-to-left script (and therefore use little-endian relative to the direction of Arabic script).

zephen 3 days ago|||

> it was Arabs who kept left-to-right numbers within right-to-left script

Do they do this? I thought they swapped this as well.

Narishma 3 days ago||

There are actually two ways to read numbers in Arabic.

The most common is to start from the most significant digit and read left-to-right until the last two digits, which you then read right-to-left.

A less common alternative is to read right-to-left starting from the least significant digit.

zephen 2 days ago||

Interesting! How do you know which alternative is being used?

Narishma 2 days ago||

To give an example, you could read 1234 as either 'one thousand and two hundred and four and thirty' or 'four and thirty and two hundred and one thousand'.

Now that I think about it though, I've only seen the latter way used for the year in a date.

pezezin 3 days ago||

How do you read a number like 1234? "One thousand two hundred thirty-four", big endian.

Most (all?) Western languages say their numbers in big endian, as do East Asian languages like Chinese, Japanese and Korean. It is only natural that we write our numbers down in big endian; it can be argued that the mistake was making little-endian CPUs.

userbinator 3 days ago|||

Little endian: byte N has value 256^N, bit n has value 2^n.

Big endian: byte N has value 256^(L-1-N), bit n has value 2^n or 2^(l-1-n) depending on the architecture (some are effectively little bit-endian but big byte-endian), where L and l are the byte and bit sizes of the whole integer respectively, and N and n count from zero.

Design hardware or even write arbitrary precision routines, and you'll quickly realise that "big endian is backwards, little endian is logical".
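A minimal Python sketch of the byte-weight rule described above, assuming bytes are indexed from zero (so the big-endian weight works out to 256^(L-1-N)). The function names are made up for illustration; the results are cross-checked against Python's built-in `int.from_bytes`.

```python
def from_little_endian(data: bytes) -> int:
    # Byte N contributes data[N] * 256**N: the weight depends only on the index.
    return sum(b * 256**n for n, b in enumerate(data))


def from_big_endian(data: bytes) -> int:
    # Byte N contributes data[N] * 256**(L - 1 - N): the weight also depends
    # on the total length L of the integer.
    length = len(data)
    return sum(b * 256**(length - 1 - n) for n, b in enumerate(data))


value = 0x0A0B0C0D
le = value.to_bytes(4, "little")  # bytes 0D 0C 0B 0A
be = value.to_bytes(4, "big")     # bytes 0A 0B 0C 0D

# Both decoders recover the same integer from their respective layouts.
assert from_little_endian(le) == int.from_bytes(le, "little") == value
assert from_big_endian(be) == int.from_bytes(be, "big") == value
```

Note that the little-endian decoder never needs to know the length up front, which is the property the comment is pointing at for hardware and arbitrary-precision code.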

zajio1am 2 days ago||

As bits are generally not addressable / not ordered, it makes no sense to call a CPU architecture big/little bit-endian. That makes sense only for serial lines/buses.

userbinator 2 days ago||

Wrong for any CPU with bit manipulation instructions... which is nearly all of them.

DavidVoid 2 days ago|||

How do you read a number like 17? "seventeen", small endian.

Language is messy, some more than others [1].

[1]: https://en.wikipedia.org/wiki/Danish_language#Numerals

pezezin 2 days ago||

"Diecisiete", still big endian. (you could make an argument for numbers between 11 and 15 though)

Yeah, I know there are exceptions, but on average most human languages are big endian.

eisbaw 3 days ago||

I did that many years back, but with MIPS and MIPSel: https://youtu.be/BGzJp1ybpHo?si=eY_Br8BalYzKPJMG&t=1130

presented at Embedded Linux Conf

BobbyTables2 3 days ago||

MIPS is often big endian.

Of course the endianness only matters to C programmers who take endless pleasure in casting raw data from external sources into structs.
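For contrast with the cast-into-a-struct habit, here is a rough Python sketch of parsing external data with an explicit byte order, so the result does not depend on the host. The 6-byte header layout (a 32-bit length followed by 16-bit flags, little-endian on the wire) and the function names are hypothetical, chosen only for illustration.

```python
import struct


def parse_header(raw: bytes) -> tuple:
    # "<" pins the byte order to little-endian with no padding,
    # regardless of the machine running this code.
    length, flags = struct.unpack("<IH", raw[:6])
    return length, flags


def parse_u32_le_by_hand(raw: bytes) -> int:
    # The same idea with shifts: assemble the value from individual bytes
    # instead of reinterpreting memory.
    return raw[0] | (raw[1] << 8) | (raw[2] << 16) | (raw[3] << 24)


wire = bytes([0x78, 0x56, 0x34, 0x12, 0x01, 0x00])
length, flags = parse_header(wire)

assert length == 0x12345678 and flags == 1
assert parse_u32_le_by_hand(wire) == 0x12345678
# Reading the same four bytes as big-endian gives a different number.
assert struct.unpack(">I", wire[:4])[0] == 0x78563412
```

The shift-based version is the one that carries over to C directly: it reads bytes, not host-order words, so it behaves the same on either kind of machine.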

bigstrat2003 3 days ago|

Hey, there's no need to kink shame C programmers like that.

beached_whale 3 days ago||

I've used docker buildx to do this in the past. It's easier to work with than qemu directly (it uses qemu under the hood).

siraben 3 days ago||

Without installing anything, this can also be reproduced with a shell script that uses a Nix shebang to specify the cross compilers.

https://gist.github.com/siraben/cb0eb96b820a50e11218f0152f2e...

1over137 3 days ago||

> But without access to a big-endian machine, how does one test it? QEMU provides a convenient solution. With its user mode emulation we can easily run a binary on an emulated big-endian system

Nice article! But pity it does not elaborate on how...

IshKebab 3 days ago||

> When programming, it is still important to write code that runs correctly on systems with either byte order

Eh, is it? There aren't any big endian systems left that matter for anyone that isn't doing super niche stuff. Unless you are writing a really foundational library that you want to work everywhere (like libc, zlib, libpng etc.) you can safely just assume everything is little endian. I usually just put a static_assert that the system is little endian for C++.
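A rough Python analogue of that guard, as a sketch only: Python has no compile-time assert, so this is a runtime check that fails fast at import. The function name `require_little_endian` is made up for illustration; `sys.byteorder` is the standard way to query the host byte order.

```python
import sys


def require_little_endian(byteorder: str = sys.byteorder) -> None:
    # Refuse to run on hosts whose byte order the code was never tested on.
    if byteorder != "little":
        raise RuntimeError(
            f"unsupported byte order: {byteorder!r} (little-endian only)"
        )


# Called once at module import; raises on a big-endian host.
require_little_endian()
```

The parameter exists only so the check can be exercised without a big-endian machine, for example `require_little_endian("big")` raises `RuntimeError`.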

throwaway2027 3 days ago|

Is there any benefit in edge cases to using big-endian these days?

zephen 3 days ago|

Well, blogging about how it's important can certainly give insight to others about the age of your credentials, just in case repeatedly shouting "Get off my lawn!" didn't suffice.