Posted by surprisetalk 3 days ago
Because Windows, and only Windows, shows it this way. It is official and documented: https://devblogs.microsoft.com/oldnewthing/20090611-00/?p=17...
> Explorer is just following existing practice. Everybody (to within experimental error) refers to 1024 bytes as a kilobyte, not a kibibyte. If Explorer were to switch to the term kibibyte, it would merely be showing users information in a form they cannot understand, and for what purpose? So you can feel superior because you know what that term means and other people don’t.
ls does unless you pass --si.
user@machine:~$ python3
>>> with open('/tmp/a', 'wb') as f:
...     f.write(b'a'*1000)
...
1000
>>> with open('/tmp/b', 'wb') as f:
...     f.write(b'a'*1024)
...
1024
$ ll /tmp -h
-rw-r--r-- 1 user user 1000 Feb 5 10:40 a
-rw-r--r-- 1 user user 1.0K Feb 5 10:40 b
What the hell is a "kibibyte"? Sounds like a brand of dog food.
I don't know what the better alternative would have been, but this certainly wasn't it.
1. defined traditional suffixes and abbreviations to mean powers of two, not ten, aligning with most existing usages, but...
2. deprecated their use, especially in formal settings...
3. defined new spelled-out vocabulary for both pow10 and pow2 units, e.g. in English "two megabytes" becomes "two binary megabytes" or "two decimal megabytes", and...
4. defined new unambiguous abbreviations for both decimal and binary units, e.g. "5MB" (traditional) becomes "5bMB" (simplified, binary) or "5dMB" (simplified, decimal)
This way, most people most of the time could keep using the traditional units and be understood just fine, but in formal contexts in which precision is paramount, you'd have a standard way of spelling out exactly what you meant.
I'd have gone one step further too and stipulate that truth in advertising would require storage makers to use "5dMB" or "5 decimal megabytes" or whatever in advertising and specifications if that's what they meant. No cheating using traditional units.
(We could also split bits versus bytes using similar principles, e.g. "bi" vs "by".)
I mean consider UK, which still uses pounds, stone, and miles. In contexts where you'd use those units, writing "10KB" or "one megabyte" would be fine too.
It's leagues better than "kibibyte".
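To make the proposed abbreviations concrete, here is a minimal Python sketch of a parser for them. The "bMB"/"dMB" notation is the hypothetical scheme from the comment above, not an existing standard, and `parse_size` is an illustrative name. Bare traditional units such as "5MB" are rejected on purpose, since the point of the scheme is that formal contexts spell out the base.

```python
import re

# Hypothetical notation from the proposal above: "b" = binary, "d" = decimal,
# placed between the number and a traditional unit such as KB, MB, GB, TB.
_EXPONENT = {"K": 1, "M": 2, "G": 3, "T": 4}
_PATTERN = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([bd])([KMGT])B\s*$")


def parse_size(text: str) -> int:
    """Return the byte count for strings like '5bMB' or '5dMB'."""
    match = _PATTERN.match(text)
    if match is None:
        # Covers malformed input and bare units like '5MB' with no base flag.
        raise ValueError(f"ambiguous or malformed size: {text!r}")
    number, base_flag, prefix = match.groups()
    base = 1024 if base_flag == "b" else 1000
    return round(float(number) * base ** _EXPONENT[prefix])


print(parse_size("5bMB"))  # 5242880
print(parse_size("5dMB"))  # 5000000
```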
Yeah it sounds dumb, but it’s really not that different from your suggestion.
The difference is the GP focused more on the abbreviation, but the implementation logic is similar.
This ambiguity is documented at least back to 1984, by IBM, the pre-eminent computer company of the time.
In 1972 IBM started selling the IBM 3333 magnetic disk drive. This product catalog [0] from 1979 shows them marketing the corresponding disks as "100 million bytes" or "200 million bytes" (3336 mdl 1 and 3336 mdl 11, respectively). By 1984, those same disks were marketed in the "IBM Input/Output Device Summary"[1] (which was intended for a customer audience) as "100MB" and "200MB"
0: (PDF page 281) "IBM 3330 DISK STORAGE" http://electronicsandbooks.com/edt/manual/Hardware/I/IBM%20w...
1: (PDF page 38, labeled page 2-7, Fig 2-4) http://electronicsandbooks.com/edt/manual/Hardware/I/IBM%20w...
Also, hats off to http://electronicsandbooks.com/ for keeping such incredible records available for the internet to browse.
-------
Edit: The below is wrong. Older experience has corrected me - there has always been ambiguity (perhaps bifurcated between CPU/OS and storage domains). "And that with such great confidence!", indeed.
-------
The article presents wishful thinking. The wish is for "kilobyte" to have one meaning. For the majority of its existence, it had only one meaning - 1024 bytes. Now it has an ambiguous meaning. People wish for an unambiguous term for 1000 bits, however that word does not exist. People also might wish that others use kibibyte any time they reference 1024 bytes, but that is also wishful thinking.
The author's wishful thinking is falsely presented as fact.
I think kilobyte was the wrong word to ever use for 1024 bytes, and I'd love to go back in time to tell computer scientists that they needed to invent a new prefix to mean "1,024" / "2^10" of something, which kilo- never meant before kilobit / kilobyte were invented. Kibi- is fine, the phonetics sound slightly silly to native English speakers, but the 'bi' indicates binary and I think that's reasonable.
I'm just not going to fool myself with wishful thinking. If, in arrogance or self-righteousness, one simply assumes that every time they see "kilobyte" it means 1,000 bytes - then they will make many, many failures. We will always have to take care to verify whether "kilobyte" means 1,000 or 1,024 bytes before implementing something which relies on that for correctness.
There was always a confusion about whether a kilobyte was 1000 or 1024 bytes. Early diskettes always used 1000, only when the 8 bit home computer era started was the 1024 convention firmly established.
Before that it made no sense to talk about kilo as 1024. Earlier computers measured space in records and words, and I guess you can see how in 1960, no one would use kilo to mean 1024 for a 13 bit computer with 40 byte records. A kiloword was, naturally, 1000 words, so why would a kilobyte be 1024?
1024 being nearly ubiquitous was only the case in the 90s or so - except for drive manufacturing and signal processing. Binary prefixes didn't invent the confusion, they were a partial solution. As you point out, while it's possible to clearly indicate binary prefixes, we have no unambiguous notation for decimal bytes.
Even worse, the 3.5" HD floppy disk format used a confusing combination of the two. Its true capacity (when formatted as FAT12) is 1,474,560 bytes. Divide that by 1024 and you get 1440KB; divide that by 1000 and you get the oft-quoted (and often printed on the disk itself) "1.44MB", which is inaccurate no matter how you look at it.
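The arithmetic is easy to check. A short Python sketch, using only the 1,474,560-byte figure quoted above, shows that "1.44" falls out only when one divisor is 1024 and the other is 1000:

```python
CAPACITY_BYTES = 1_474_560  # FAT12-formatted 3.5" HD floppy, as quoted above

decimal_mb = CAPACITY_BYTES / (1000 * 1000)  # pure decimal megabytes
binary_mib = CAPACITY_BYTES / (1024 * 1024)  # pure binary mebibytes
mixed_mb = CAPACITY_BYTES / (1000 * 1024)    # the marketing "megabyte"

print(f"decimal: {decimal_mb:.5f}")  # decimal: 1.47456
print(f"binary:  {binary_mib:.5f}")  # binary:  1.40625
print(f"mixed:   {mixed_mb:.5f}")    # mixed:   1.44000
```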
But that said, we aren't talking about sector sizes. Of course storage mediums are always going to use sector sizes of powers of two. What's being talked about here is the confusion in how to refer to the storage medium's total capacity.
Actually, that's not true.
As far as I know, IBM floppy disks always used power-of-2 sizes. The first read-write IBM floppy drives to ship to customers were part of the IBM 3740 Data Entry System (released 1973), designed as a replacement for punched cards. IBM's standard punched card format stored 80 bytes per card, although some of their systems used a 96 byte format instead. 128-byte sectors were enough to fit either, plus some room for expansion. In their original use case, files were stored with one record/line/card per disk sector.
However, unlike floppies, (most) IBM mainframe hard disks didn't use power-of-2 sectors. Instead, they supported variable sector sizes ("CKD" format) – when you created a file, it would be assigned one or more hard disk tracks, which then would be formatted with whatever sector size you wanted. In early systems, it was common to use 80 byte sectors, so you could store one punched card per sector. You could even use variable length sectors, so successive sectors on the same track could be of different sizes.
There was a limit on how many bytes you could fit in a track - for an IBM 3390 mainframe hard disk (released 1989), the maximum track size is 56,664 bytes – not a power of two.
IBM mainframes historically used physical hard disks with special firmware that supported all these unusual features. Nowadays, however, they use industry standard SSDs and hard disks, with power of two sector sizes, but running special software on the SAN which makes it look like a busload of those legacy physical hard disks to the mainframe. And newer mainframe applications use a type of file (VSAM) which uses power-of-two sector sizes (512 bytes through 32KB, but 4KB is most common). So weird sector sizes are really only a thing for legacy apps (BSAM, BDAM, BPAM-sans-PDSE), and certain core system files which are stuck on that format due to backward compatibility requirements. But go back to the 1960s/1970s, and non-power-of-2 sector sizes were totally mainstream on IBM mainframe hard disks.
And in that environment, 1000 bytes rather than 1024 bytes makes complete sense. However, file sizes were commonly given in allocation units of tracks/cylinders instead of bytes.
I wonder if there's a wikipedia article listing these...
Example: in 1972, DEC PDP 11/40 handbook [0] said on first page: "16-bit word (two 8-bit bytes), direct addressing of 32K 16-bit words or 64K 8-bit bytes (K = 1024)". Same with Intel - in 1977 [1], they proudly said "Static 1K RAMs" on the first page.
[0] https://pdos.csail.mit.edu/6.828/2005/readings/pdp11-40.pdf
[1] https://deramp.com/downloads/mfe_archive/050-Component%20Spe...
But once hard drives started hitting about a gigabyte was when everyone started noticing and howling.
Similarly, the 4104 chip was a "4kb x 1 bit" RAM chip and stored 4096 bits. You'd see this in the whole 41xx series, and beyond.
I was going to say that what it could address and what they called what it could address is an important distinction, but found this fun ad from 1976[1].
"16K Bytes of RAM Memory, expandable to 60K Bytes", "4K Bytes of ROM/RAM Monitor software", seems pretty unambiguous that you're correct.
Interestingly wikipedia at least implies the IBM System 360 popularized the base-2 prefixes[2], citing their 1964 documentation, but I can't find any use of it in there for the main core storage docs they cite[3]. Amusingly the only use of "kb" I can find in the pdf is for data rate off magnetic tape, which is explicitly defined as "kb = thousands of bytes per second", and the only reference to "kilo-" is for "kilobaud", which would have again been base-10. If we give them the benefit of the doubt on this, presumably it was from later System 360 publications where they would have had enough storage to need prefixes to describe it.
[1] https://commons.wikimedia.org/wiki/File:Zilog_Z-80_Microproc...
[2] https://en.wikipedia.org/wiki/Byte#Units_based_on_powers_of_...
[3] http://www.bitsavers.org/pdf/ibm/360/systemSummary/A22-6810-...
I don't know if that's correct, but at least it'd explain the mismatch.
That's the microcomputer era that has defined the vast majority of our relationship with computers.
IMO, having lived through this era, the only people pushing 1,000 byte kilobytes were storage manufacturers, because it allows them to bump their numbers up.
https://www.latimes.com/archives/la-xpm-2007-nov-03-fi-seaga...
More like late 60s. In fact, in the 70s and 80s, I remember the storage vendors being excoriated for "lying" by following the SI standard.
There were two proposals to fix things in the late 60s, by Donald Morrison and Donald Knuth. Neither was accepted.
Another article suggesting we just roll over and accept the decimal versions is here:
https://cacm.acm.org/opinion/si-and-binary-prefixes-clearing...
This article helpfully explains that decimal KB has been "standard" since the very late 90s.
But when such an august personality as Donald Knuth declares the proposal DOA, I have no heartburn using binary KB.
In fact, they practically say the same exact thing you have said: In a nutshell, base-10 prefixes were used for base-2 numbers, and now it's hard to undo that standard in practice. They didn't say anything about making assumptions. The only difference is that the author wants to keep trying, and you don't think it's possible? Which is perfectly fine. It's just not as dramatic as your tone implies.
Here's my theory. In the beginning, everything was base10. Because humans.
Binary addressing made sense for RAM. Especially since it makes decoding address lines into chip selects (or slabs of core, or whatever) a piece of cake, having chips be a round number in binary made life easier for everyone.
Then early DOS systems (CP/M comes to mind particularly) mapped disk sectors to RAM regions, so to enable this shortcut, disk sectors became RAM-shaped. The 512-byte sector was born. File sizes can be written in bytes, but what actually matters is how many sectors they take up. So file sizing inherited this shortcut.
But these shortcuts never affected "real computers", only the hamstrung crap people were running at home.
So today we have multiple ecosystems. Some born out of real computers, some with a heavy DOS inheritance. Some of us were taught DOS's limitations as truth, and some of us weren't.
However it doesn't seem to be divided into sectors at all, more like each track is like a loop of magnetic tape. In that context it makes a bit more sense to use decimal units, measuring in bits per second like for serial comms.
Or maybe there were some extra characters used for ECC? 5 million / 100 / 100 = 500 characters per track, leaves 72 bits over for that purpose if the actual size was 512.
First floppy disks - also from IBM - had 128-byte sectors. IIRC, it was chosen because it was the smallest power of two that could store an 80-column line of text (made standard by IBM punched cards).
Disk controllers need to know how many bytes to read for each sector, and the easiest way to do this is by detecting overflow of an n-bit counter. Comparing with 80 or 100 would take more circuitry.
You can get away with those on machines with 64 bit address spaces and TFLOPs of math capacity. You can't on anything older or smaller.
> The author's wishful thinking is falsely presented as fact.
There's good reason why the meanings of SI prefixes aren't set by convention or by common usage or by immemorial tradition, but by the SI. We had several thousand years of setting weights and measures by local and trade tradition and it was a nightmare, which is how we ended up with the SI. It's not a good show for computing to come along and immediately recreate the long and short ton.
Adding to your point, it is human nature to create industry- or context-specific units and refuse to play with others.
In the non-metric world, I see examples like: Paper publishing uses points (1/72 inch), metal machinists use thousandths of an inch, woodworkers use feet and inches and binary fractions, land surveyors use decimal feet (unusual!), waist circumference is in inches, body height is in feet and inches, but you buy fabric by the yard, airplane altitudes are in hundreds to tens of thousands of feet instead of decimal miles. Crude oil is traded in barrels but gasoline is dispensed in gallons. Everyone thinks their usage of units and numbers is intuitive and optimal, and everyone refuses to change.
In the metric(ish) world, I still see many tensions. The micron is a common alternate name for the micrometre, yet why don't we have a millin or nanon or picon? The solution is to eliminate the micron. I've seen the angstrom (0.1 nm) in spectroscopy and in the discussion of CPU transistor sizes, yet it diverts attention away from the picometre. The bar (100 kPa) is popular in talking about things like tire pressure because it's nearly 1 atmosphere. The mmHg is a unit of pressure that sounds metric but is not; the correct unit is pascal. No one in astronomy uses mega/giga/tera/peta/etc.-metres; instead they use AU and parsec and (thousand, million, billion) light-years. Particle physics use eV/keV/MeV instead of some units around the picojoule.
Having a grab bag of units and domains that don't talk to each other is indeed the natural state of things. To put your foot down and say no, your industry does not get its own special snowflake unit, stop that nonsense and use the standardized unit - that takes real effort to achieve.
You need character to admit that. I bow to you.
Kudos for getting back. (and closing the tap of "you are wrong" comments :))
Which makes it really @#ing annoying when you have things like "I want to transmit 8 gigabytes (meaning gibibytes, 2^30) over a 1 gigabit/s link, how long will it take?". Welcome to every networking class in the 90s.
We should continue moving towards a world where 2^k prefixes have separate names and we use SI prefixes only for their precise base-10 meanings. The past is polluted but we hopefully have hundreds of years ahead of us to do things better.
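For the networking-class question above, a small Python sketch shows how much the reading of "gigabyte" matters. It assumes an ideal link with no protocol overhead and a decimal link rate (10^9 bits per second); `transfer_seconds` is just an illustrative helper name.

```python
LINK_BITS_PER_SECOND = 10**9  # 1 gigabit/s, taken as decimal


def transfer_seconds(size_bytes: int, link_bps: int = LINK_BITS_PER_SECOND) -> float:
    """Ideal transfer time: payload bits divided by link rate, no overhead."""
    return size_bytes * 8 / link_bps


binary_reading = transfer_seconds(8 * 2**30)   # 8 GiB
decimal_reading = transfer_seconds(8 * 10**9)  # 8 GB

print(f"8 GiB: {binary_reading:.3f} s")   # 8 GiB: 68.719 s
print(f"8 GB:  {decimal_reading:.3f} s")  # 8 GB:  64.000 s
```

The two readings differ by about 7%, which is the gap between 2^30 and 10^9.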
Which doesn't make it more correct, of course, even though I strongly believe that it is (where appropriate for things like memory sizes). Just saying, it goes much further back than 1984.
Which is the reality. "kilobyte" means "1000 bytes". There's no possible discussion over this fact.
Many people have been using it wrong for decades, but its literal value did not change.
You are free to intend only one meaning in your own communication, but you may sometimes find yourself being misunderstood: that, too, is reality.
E.g., M-W lists both, with even the 1,024 B definition being listed first. Wiktionary lists the 1,024 B definition, though it is tagged as "informal".
As a prescriptivist myself I would love if the world could standardize on kilo = 1000, kibi = 1024, but that'll likely take some time … and the introduction of the word to the wider public, who I do not think is generally aware of the binary prefixes, and some large companies deciding to use the term, which they likely won't do, since companies are apt to always trade for low-grade perpetual confusion over some short-term confusion during the switch.
This is a myth. The first IBM hard drive was 5,000,000 characters in 1956 - before bytes were even in common usage. Drives have always been base10, it's not a conspiracy.
Drives are base10, lines are base10, clocks are base10, pretty much everything but RAM is base10. Base2 is the exception, not the rule.
You can say that one meaning is more correct than the other, but that doesn't vanish the other meaning from existence.
Now, it depends.
Yeah, I already knew that, lol.
But thanks for bringing it to my attention. :-)
https://www-cs-faculty.stanford.edu/~knuth/news99.html
And he was right.
Context is important.
"K" is an excellent prefix for 1024 bytes when working with small computers, and a metric shit ton of time has been saved by standardizing on that.
When you get to bigger units, marketing intervenes, and, as other commenters have pointed out, we have the storage standard of MB == 1000 * 1024.
But why is that? Certainly it's because of the marketing, but also it's because KB has been standardized for bytes.
> Which is the reality. "kilobyte" means "1000 bytes". There's no possible discussion over this fact.
You couldn't be more wrong. Absolutely nobody talks about 8K bytes of memory and means 8000.
In fact, this is the only case I can think of where that has ever happened.
If we are talking about kilobytes, it could just as easily be the opposite.
Unless you were referring to only contracts which you yourself draft, in which case it'd be whatever you personally want.
KB is 1024 bytes, and don't you dare try stealing those 24 bytes from me
Anyway, here's my contribution to help make everything worse. I think we should use Kylobyte, etc. when we don't care whether it's 1000 or 1024. KyB. See! Works great.
You can use `--si` for fake, 1000-byte kilobytes. Trying it, it seems weird that these are reported with a lowercase 'k' but 'M' and so on remain uppercase.
For SI units, the abbreviations are defined, so a lowercase k for kilo and uppercase M for mega is correct. Lower case m is milli, c is centi, d is deci. Uppercase G is giga, T is tera and so on.
https://en.wikipedia.org/wiki/International_System_of_Units#...
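To illustrate the two conventions side by side, here is a rough Python sketch of a size formatter. It is not a reimplementation of `ls` (whose exact rounding rules differ); `human_size` and its `si` flag are illustrative names. It follows the casing described above: uppercase "K" for 1024, lowercase "k" for the SI 1000, and uppercase "M", "G" and so on in both modes.

```python
def human_size(num_bytes: int, si: bool = False) -> str:
    """Format a byte count with powers of 1024 (default) or 1000 (si=True)."""
    base = 1000 if si else 1024
    # SI defines lowercase 'k' for kilo; the binary convention uses 'K'.
    prefixes = ["", "k" if si else "K", "M", "G", "T", "P"]
    value = float(num_bytes)
    index = 0
    while value >= base and index < len(prefixes) - 1:
        value /= base
        index += 1
    if index == 0:
        return str(num_bytes)  # small sizes are shown as plain byte counts
    return f"{value:.1f}{prefixes[index]}"


print(human_size(1000))           # 1000
print(human_size(1024))           # 1.0K
print(human_size(1000, si=True))  # 1.0k
```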
SI units are attempting to fix standard measurements with perceived constants in nature. A meter (distance) is the distance light travels in a vacuum, back and forth, within a certain number of oscillations of a cesium atom (time). This doesn't mean we tweak the meter to conform to observational results, as we'd all be happier if light really was 300 000 km/s instead of ~299 792 km/s.
Then there's the problem of not mixing different measurement units. SI was designed to conform all measurements to the same base 10 exponents (cm, m, km versus feet, inches and yards). But the author's attempt to resolve this matter doesn't even conform to standardised SI units as we would expect them to.
What is a byte? Well, 8 bits, sometimes. What is a kilobit? 1000 bits. What is a kilobyte? 1000 bytes, or 1024 bytes.
Now we've already mixed units based on what a bit or a byte even is and the addition of the 8 multiplier in addition to the exponent of 1000 or 1024.
And if you think, hey, at least the bit is the least divisible unit of information, that's not even correct. If there should be a reformalisation of information units, you would agree that the amount of "0"s is the least divisible unit of information. A kilo of zeros would be 1000. A 'byte' would be defined as containing up to 256 zeros. A megazero would contain up to a million zeros.
It wouldn't make any intuitive sense for anyone to count 0's, which would automatically convert your information back to base 10, but it does prove that the most sensible unit of information is already what we've had before, that is, you're not mixing bytes (powers of 2) with SI-defined units of 1000
1 kB is 1024 B. It's a measurement-unit thing, not logic. 8 bits in a byte. Not 10, neither 6 nor 5. Just 8.
Like feet in yards, inches in feet, meters in kilometers and ounces in pounds.
It's 1024. Period.
Then of course you are free to count as many bytes as you want and call that bunch a kB.
Pretending anyone else will agree on that is a different thing.
Otherwise this discussion would have ended long ago.