Dijkstra already wrote this in the 80s, and many teachers still complain about it today. I also know that, at least in the Netherlands, the curriculum is judged by the percentage of students that pass. If too few students pass, the material is made easier (never harder!), so you can imagine what happens when this process has been going on for half a century.
On the foolishness of "natural language programming". https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
Apparently Dijkstra loved using the em-dash!
This hits so hard. cough dynamic typing enthusiasts and vibe coders cough
https://www.cs.utexas.edu/~EWD/transcriptions/EWD12xx/EWD124...
And it's just so obviously correct.
(And also, as DW points out, software is way more complex now. But in this case, it's the requirements discovery.)
I think a compelling argument can be made that 0-based is better for offsets and 1-based is better for indexes, and that we should not think of both as the same thing. https://hisham.hm/2021/01/18/again-on-0-based-vs-1-based-ind...
Zero-based indexing is naturally coupled with using only half-open ranges.
When using zero-based indexing and half-open ranges, accessing an array forwards, backwards or circularly is equally easy.
In this case you can also do what the language Icon does, where non-negative indices access the array forwards, while negative indices access it backwards (i.e. -1 is the index of the last element of the array, while 0 is the index of the first element).
In languages lacking the Icon feature, you just have to explicitly add the length of the array to the negative index.
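A minimal C sketch of the above (the array and its length are made up for illustration): with zero-based indexing and half-open ranges, the forward, backward and circular traversals all share one loop shape, and the Icon-style negative index is emulated by adding the length.

    /* Zero-based indexing with the half-open range [0, n):
       forwards, backwards and circular access share one loop shape. */
    #include <stdio.h>

    int main(void)
    {
        int a[] = {10, 20, 30, 40, 50};
        int n = 5;

        for (int i = 0; i < n; i++)             /* forwards        */
            printf("%d ", a[i]);
        printf("\n");

        for (int i = 0; i < n; i++)             /* backwards       */
            printf("%d ", a[n - 1 - i]);
        printf("\n");

        int start = 3;                          /* arbitrary start */
        for (int i = 0; i < n; i++)             /* circularly      */
            printf("%d ", a[(start + i) % n]);
        printf("\n");

        /* No Icon-style negative indices in C: add the length explicitly,
           so -1 (the "index of the last element") becomes -1 + n. */
        printf("%d\n", a[-1 + n]);
        return 0;
    }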
There is absolutely no reason to distinguish offsets and indices. The fewer distinct kinds of things you need to keep in mind, the fewer chances there are for errors from using the wrong thing. Therefore one should not add extra kinds of things to a programming language unless there is a real need for them.
There are things that are missing from most languages and which are needed, e.g. separate types for signed integers, unsigned integers, modular numbers, bit strings and binary polynomials (instead of using ambiguous unsigned integers for all four of the latter types, which prevents the detection of dangerous errors, e.g. unsigned overflow), but distinguishing offsets from indices is not a useful thing.
Distinguishing offsets and indices would be useful only if the sets of operations applicable to them were different. However, this is not true, because the main reason for using indices is to be able to apply arithmetic operations to them. Otherwise you would not use numbers for indexing but names, i.e. you would not use arrays but structures (or hash tables, when the names used for access are not known at compile time).
The whole point here is to use a single kind of range, without exceptions, in order to avoid the errors caused by using the wrong type of range for the context.
For backwards iteration the right range is [-1,-1-n), i.e. none of those listed by you.
Like for any such range, the number of accessed elements is the difference of the range limits, i.e. n, which is how you check that you have written the correct limits. When the end of the range is less than the start, the index must be decremented. Some programming languages select incrementing or decrementing automatically, based on the relationship between the limits. In less clever languages, like C/C++, you have to choose between incrementing and decrementing yourself (i.e. between "i=0;i<n;i++" and "i=-1;i>-1-n;i--").
It is easy to remember the backwards range, as it is obtained by the conversion rules: 0 => -1 (i.e. first element => last element) and n => -n (i.e. forwards => backwards).
To a negative index, the length of the array must be added, unless you use a programming language where that is done implicitly.
In the C language, instead of adding the length of the array, one can use a negative index together with a pointer to one element past the array (e.g. obtained as the address of the element indexed by the length of the array). Such a pointer is valid in C, even though accessing memory directly through it would be an out-of-range error, as would taking the address of any element with an index greater than the length of the array. The validity of such a pointer is specified in the standard exactly to allow accessing an array backwards, using negative indices.
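A hedged sketch of that technique (the array and names are illustrative): the one-past-the-end pointer may be formed and used as a base for negative indices, as long as it is never dereferenced itself.

    /* Backwards iteration over the half-open range [-1, -1-n),
       using negative indices off a one-past-the-end pointer. */
    #include <stdio.h>

    int main(void)
    {
        double a[] = {1.0, 2.0, 3.0, 4.0};
        int n = 4;
        double *end = a + n;               /* one past the end: valid to form */

        for (int i = -1; i > -1 - n; i--)  /* n elements: (-1) - (-1 - n) = n */
            printf("%g ", end[i]);         /* end[-1] is the last element     */
        printf("\n");
        return 0;
    }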
Some decades ago, I disassembled and studied Microsoft's Fortran compiler.
The fact that Fortran uses 1-based indexing caused a lot of unnecessary complications in that compiler. After seeing firsthand the problems caused by 1-based indexing, I have no doubt that Dijkstra was right. Yes, the compiler could handle 1-based indexing perfectly well, but there really was no reason for all that effort, which would have been better spent on features providing a serious advantage to the programmer.
The use of 1-based indexing and/or closed intervals, instead of consistently using only 0-based indexing and half-open intervals, is likely the cause of most off-by-one errors.
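A small sketch of why (function names are made up for illustration): with half-open intervals the element count is simply the difference of the bounds, and adjacent ranges tile with no overlap or gap, so there is no "+1" to forget.

    /* With a half-open range [lo, hi), the count is hi - lo, and
       [a, b) followed by [b, c) covers [a, c) exactly.
       With a closed range [lo, hi], the count is hi - lo + 1, and
       the shared endpoint of adjacent ranges needs special care. */
    int count_half_open(int lo, int hi) { return hi - lo;     }
    int count_closed(int lo, int hi)    { return hi - lo + 1; }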
The IBM 704's index registers were decrementing, subtracting their contents from an instruction's address field to form an effective address.
Type A instructions had a three-bit tag indicating which index registers to use.
With those three index registers you get Fortran's efficient 7-dimensional column-major arrays. This made data aggregation much more efficient, and Cray/CUDA/modern column-oriented DBs do something similar.
But for Algol-like languages on PDP-inspired hardware I do like his convention, if, as in the example, you have a total ordering on the indexes.
But Fortran also changed the way it was taught, shifting from offsets into memory, indexed down from an address (which makes sense when the width of the address space changes with machine configuration), to index values.
Technically Fortran, at least on word machines, meets his convention:

    pointer + offset <= word < pointer + offset + 1

It is just that the offset is always negative.

The use of 1-based indexing was just inherited from the index notation used for vectors and matrices in most mathematical journals at that time.
When the IBM 704 was designed (as the successor of the IBM 701, and taking into account the experience from the IBM NORC), it was designed to allow an efficient implementation of FORTRAN, not vice versa.
The column-major ordering of FORTRAN allows more efficient implementations of the matrix-vector and matrix-matrix products (by reading the elements sequentially), when these operations are done correctly, i.e. using not the naive dot-product definitions presented in most linear algebra textbooks, but the useful definitions of these operations (based on AXPY and on the tensor product of vectors).
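A minimal C sketch of the AXPY formulation (layout and names assumed for illustration): computing y = A*x as a sum of scaled columns reads a column-major matrix sequentially, unlike the row-wise dot-product form.

    #include <stddef.h>

    /* y = A * x for a column-major m-by-n matrix A, in AXPY form:
       y += x[j] * A(:,j) for each column j, so A is read sequentially. */
    void matvec_colmajor(int m, int n, const double *A,
                         const double *x, double *y)
    {
        for (int i = 0; i < m; i++)
            y[i] = 0.0;
        for (int j = 0; j < n; j++)                  /* one column at a time */
            for (int i = 0; i < m; i++)
                y[i] += A[(size_t)j * m + i] * x[j];
    }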
Then what do you call “here”?
The name for where you start from in this scenario is usually not required, because it's obvious what you mean: everyone understands that "the first block" means you first have to walk a block, not that where you start is the first block.
So in that sense yes we have a zeroth chapter. That’s when you’re at the beginning of the first one but haven’t read all the way.
What is the name of the block you left to enter the first block? Before you started walking, I mean.
And mustn't that block come before the first one? When we move from where we start, we count up, so mustn't an earlier block count down? Counting down would mean a number smaller than one.
And are blocks not counted in units, as whole numbers?
So would it not be the case that one block less than 1 must be by necessity the zeroth block?
In other words if you agree that “as soon as you start walking, you are in the first block”, then you must also agree that before you left you began in the zeroth block.
How else could it be interpreted?
Think of jogging on a road. When you are at the beginning of the road, you are at the start of the first mile, not in the zeroth mile. There is no extra mile before the first mile.
And everybody knows a pandemic starts with patient 1!
That's why we start from 0, not because of voltages, at least in compsci.
I guess the practice was influenced by computer science - I don't know of an example that precedes it, but one fairly early one I've found is Bishop and Goldberg's Tensor Analysis on Manifolds from 1968, with a chapter 0 on set theory and topology. Back then the authors felt the need to justify their numbering in the preface:
"The initial chapter has been numbered 0 because it logically precedes the main topics"
Quite straightforward.
There's also the "zeroth law of thermodynamics", which was explicitly identified long after the first, second, and third laws, but was felt to be more primary or basic, hence the need for an "ordinal before the first".
I think you’re just biased to think that starting must “naturally” begin with 1.
Zero is just as good a place to start, and some people do start counting from zero.
Just the convenience of having an ordinal number to say? Rather than saying "chapter 0, chapter 1, chapter 2" one can say "the fourth chapter"? Or is it the fact that the chapter with number 4 has 3 chapters preceding it?
On first glance I find this all rather meaningless pedantry.
EDIT: Yeah, I don't know why book chapter labels shouldn't start with "0". It seems fine to me. They could use letters instead of numbers for all I care.
Definitions are fine, and I agree that "A" is the first letter. But that's no use to people who need to think clearly about the offset between "A" and "C" right now. Should I tell them they're wrong, they have to count to three and then subtract one? Because the dictionary says so?
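For what it's worth, C makes that offset arithmetic direct (a tiny illustrative snippet):

    /* Character arithmetic yields the offset between letters directly. */
    #include <stdio.h>

    int main(void)
    {
        printf("%d\n", 'C' - 'A');  /* prints 2: "C" is two letters past "A" */
        return 0;
    }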
You're also wrong about there being no 0th mile. https://www.atlasobscura.com/places/u-s-route-1-mile-0-sign
A while back someone posted EWD765 looking for an alternate solution; I don't recall if any other solution was found. That was my introduction to these.
Besides a mathematical inclination, an exceptionally good mastery of one's native tongue is the most vital asset of a competent programmer.
...
The use of anthropomorphic terminology when dealing with computing systems is a symptom of professional immaturity.
...
Projects promoting programming in "natural language" are intrinsically doomed to fail.
I'd also recommend EWD1305 https://www.cs.utexas.edu/~EWD/transcriptions/EWD13xx/EWD130... Answers to questions from students of Software Engineering
[The approximate reconstruction of the questions is left as an exercise to the reader.]
...
No, I'm afraid that computer science has suffered from the popularity of the Internet. It has attracted an increasing —not to say: overwhelming!— number of students with very little scientific inclination and in research it has only strengthened the prevailing (and somewhat vulgar) obsession with speed and capacity.
Yes, I share your concern: how to program well —though a teachable topic— is hardly taught. The situation is similar to that in mathematics, where the explicit curriculum is confined to mathematical results; how to do mathematics is something the student must absorb by osmosis, so to speak. One reason for preferring symbol-manipulating, calculating arguments is that their design is much better teachable than the design of verbal/pictorial arguments. Large-scale introduction of courses on such calculational methodology, however, would encounter unsurmountable political problems.

Also, the burn at the beginning of EWD899 (not transcribed) is noteworthy:
A review of a paper in AI. I read "Default Reasoning as Likelihood Reasoning" by Elaine Rich. (My copy did not reveal where it had been published; the format suggests some conference proceedings. If that impression is correct, I am glad I did not attend the conference in question.
I keep meaning to sit down with this site and make my way through it all. I might make more progress if I got them into an eReader-friendly format and could then peruse them more easily when travelling.
Less likely to make mistakes if you can’t erase