Posted by lerno 7 days ago
Given your examples, I think you'd have fewer issues if you were working with unsigned integers exclusively. Although I'm curious about what other code you were referencing with this: "But seeing how each change both made the code easier to reason about and more correct, I couldn’t deny the evidence."
With regards to modulo, in Zig if you try to use it with a signed integer it will tell you to specify whether you want `@mod` or `@rem` semantics. In my case, I'd almost never write `x % 2`, I'd write `x & 1`. I do use unsigned division but I'd pretty much never write code that would emit the `div` instruction.
I'm not saying you're wrong though! Everyone has a different mind. If you attain higher correctness and understandability through using signed integers, that's great. I'm just saying I'm in the opposite camp.
The if statement won't work since Zig would force a cast.
The tricky wrap sucks unless you use a power of 2. Then the Zig type can match (u4, u5, u7, etc.) and you would use wrapping arithmetic operators. And on smaller CPUs you NEED to use a power of 2 because division and mod are expensive.
I don’t really get this claim. Indexing should just look up the element corresponding to the value provided. It’s easy to come up with semantics that are intuitive and sound, even if signed integers or ones smaller than size_t are used.
This is especially inconvenient in C, where there are extremely dangerous legacy implicit casts between signed and unsigned integers, which are very likely to produce incorrect values.
Because the index is typically a signed integer, comparing it with an unsigned limit without explicit casts is likely to cause bugs. Explicitly casting the smaller unsigned integers to bigger signed integers results in correct code, but it is cumbersome.
These problems are avoided, as TFA says, by making "sizeof" and the like yield 64-bit signed integer values instead of unsigned values.
Well-chosen implicit conversions are good for a programming language, since they reduce unnecessary verbosity, but the implicit integer conversions of C are just wrong; they are by far the worst mistake of C, much worse than any other C feature.
Other C features are criticized because they may be misused by inexperienced or careless programmers, but most of the implicit integer conversions are just incorrect. There is no way of using them correctly. Only the conversions from a smaller signed integer to a bigger signed integer are correct.
Mixed-signedness conversions have always been wrong, and the conversions between unsigned integers were made wrong by the change in the C standard that defined unsigned integers as integer residues modulo 2^N rather than as non-negative integers.
For modular integers, the only correct conversions are from bigger numbers to smaller numbers, i.e. the opposite of the implicit conversions of C. The implicit conversions of C unsigned numbers would have been correct for non-negative integers, but in the current C standard there are no such numbers.
The current C standard is inconsistent: the meaning of "sizeof" is a non-negative integer, and the same is true of the conversions between unsigned numbers, yet all arithmetic operations on unsigned numbers are defined as operations on integer residues, not on non-negative numbers.
The hardware of most processors implements at least 3 kinds of arithmetic operations: operations with signed integers, operations with non-negative integers and operations with integer residues.
Any decent programming language should define distinct types for these kinds of numbers; otherwise the only way to use the processor hardware completely is assembly language. Because C does not do this, you have to use at least inline assembly, if not separate assembly source files, to implement operations on big numbers.
It was undefined what happens on unsigned overflow and underflow. Therefore a compiler could choose to implement "unsigned" either as non-negative numbers or as integer residues.
The semantics of "sizeof" and the implicit conversions between "unsigned" numbers are consistent only with non-negative numbers, so the undefined behavior should have been defined accordingly.
Instead, in some version of the standard (I have not checked which, but it might have been C99), the behavior was changed from undefined to the defined behavior of integer residues.
I do not know the reason for this choice; it may have been simple expedience, because it is easier to implement in compilers and gives maximum performance in the absence of bugs. In any case, this decision broke the standard, because the arithmetic operations became incompatible with the implicit conversions between "unsigned" types and with the semantics of "sizeof", which must be non-negative.
For non-negative numbers, the correct conversions are from smaller sizes to bigger sizes, while for integer residues the correct conversions go only in the opposite direction, from bigger sizes to smaller sizes (e.g. a number that is 257 modulo 65536 is also 1 modulo 256, so truncating it yields a correct value; but a number that is 1 modulo 256 could be 1, 257, 513, 769 etc. modulo 65536, so you cannot extend it without additional information).
Judging from the implicit conversions, it is clear that the intention of C's designers during the seventies was that "unsigned" numbers be non-negative integers, not integer residues. The modern C standard is guilty of the current inconsistencies, which greatly increase the chances of bugs.
I get your argument about the conversion order, but I do not buy it in terms of language design. You also do not want to go to a quotient ring implicitly, so I do not agree that this conversion direction would be more "correct" for implicit conversion either and from a practical point of view the C design is defensible.
I think the motivation originally was merely to expose the common capabilities of the hardware, nothing more. What we miss from this perspective are polynomials over F_2, but nobody has pushed too hard for this so far.
NaN is almost always a mistake, and adding it breaks the law of identity. You don't want it.
But I can't agree with the claim that "nan is almost always a mistake". Certainly if you're doing floating-point computation on large arrays, the last thing you want is e.g. for an error to be thrown in an elementwise division just because two corresponding elements both happen to be zero.
It's true that nan!=nan is one of the more 'controversial' parts of the standard, that possibly would have been decided the other way in a perfect world. But it was also a reasonable pragmatic decision at the time the standard was developed. See here: https://stackoverflow.com/a/1573715/1013442
I don't recall similar arguments being made for Pascal or Ada.
Looking at the state of our C++ and C software and all the CVEs, I think we should probably stop caring about unsigned vs. signed loop indices and move on before regulatory pressure forces us to. Please, language designers, give us some interesting alternatives to Rust.
Fix the language. Don't hack around it by using the wrong type.
Using signed sizes adds a lot of footguns and performance degradations and in exchange gives only small code simplifications in rare and niche cases.
Sizes and indices of course need to be unsigned, and any self-respecting compiler should warn about dangerous usage.
So in fact it is not just telling me it’s a hard problem, it’s telling me that the cost-benefit is still not there. It’s like it’s just not a very important problem (in an economic sense). And that is what surprises me, given that computers were made to do arbitrary calculations.
I used to imagine for someone in construction a wall must be some really simple thing. But it's only simple after millennia of building walls. So I now have lots of grace and patience for humanity to figure out numbers in computers, whether integers or reals.
Your explanation is possibly the same just in different words. It's a hard problem and probably needs a whole lifetime. But it's in no single person's economic interest to devote to it the time it needs (not to mention the diverse skills required; once one has a solution one has to pitch it to the world). And so it will happen over a hundred lifetimes.
Sure, it's possible to write bugs in C. And if you really want to, you can disable the compiler warnings which flag tautologous comparisons and mixed-sign comparisons (a common reason for doing this is to avoid spurious warnings in generic-type code).
But, uhh, "people can deliberately write bugs" has got to be the weakest justification I've ever seen for changing a language feature -- especially one as fundamental as "sizes of objects can't be negative".
Signed integers can be negative. The so-called "unsigned" integers of C are integer residues modulo 2^N, which are neither positive nor negative, i.e. these concepts are not applicable to "unsigned" integers.
An alternative view is that any C "unsigned" is both positive and negative. For example the unsigned short "1" is the same number as "65537" and as "-65535".
So any sizeof value in C is negative (while also being positive).
Contrary to what you say, the change described in TFA, making sizes 64-bit signed integers, is the only way to guarantee that sizes are non-negative in a language that lacks dedicated non-negative integers.
Other programming languages have non-negative integers, but C and C++ and many languages derived from them do not have such integers.
The arithmetic operations with non-negative integers differ from the arithmetic operations of C. On overflows and underflows, they either generate exceptions or have saturating behavior.
This can be disproven by the fact that dividing by `unsigned e = 1U` is well defined and always yields the starting number. If the unsigned numbers were really modular numbers as you suggest, division could not be defined.
The oldest parts of the C language are all consistent with "unsigned" numbers being non-negative integers. The implicit conversions between different sizes of "unsigned", the sizeof operator, the relational operators and division are consistent with non-negative integers.
However, the first C standard, instead of defining the correct behavior, left many corner cases of the arithmetic operations undefined, allowing "unsigned" to be implemented either as non-negative integers or as integer residues.
Eventually, the undefined behaviors for addition, subtraction and multiplication have been defined to be those of integer residues, not those of non-negative integers.
These contradictory properties are the cause of many confusions and bugs.
In extensible languages, like C++, it is possible to define proper non-negative integers and integer residues and bit strings and to always use those types instead of the built-in "unsigned".
In C, it is better to always use signed numbers and avoid unsigned, by casting unsigned to bigger sizes of signed before using such a value.
    #include <stdio.h>

    int main(void) {
        unsigned short a = 1;
        long b = a;
        printf("%ld\n", b);
    }
If not, why?