Posted by signa11 5 days ago
First of all, the fixed points are LITERALLY NOT FIXED POINTS. They're decimal floats. Fixed points are just integers that re-scale when multiplied or divided. There is no exponent field, no nothing. The author seems to have confused the notion "fixed points allow for precise calculations of monetary values" to mean that they're decimal. They're not. That section of the book contradicts itself constantly and also the code is wrong.
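To make the distinction concrete, here is a minimal sketch of binary fixed point in C. The Q16.16 layout and every name in it (`fix16`, `FIX16_ONE`, the helper functions) are assumptions chosen for illustration and are not taken from the book. The point is only that the value is a plain integer with an implied scale, and that multiplication and division are the operations that need a re-scale.

```c
#include <stdint.h>

/* Hypothetical Q16.16 fixed point: a plain integer with an implied
   scale of 2^16. There is no exponent field anywhere. */
typedef int32_t fix16;
#define FIX16_ONE ((fix16)1 << 16)

static fix16 fix16_from_int(int32_t n) { return (fix16)(n * FIX16_ONE); }

/* Addition is ordinary integer addition; the scale is unchanged. */
static fix16 fix16_add(fix16 a, fix16 b) { return a + b; }

/* A product carries the scale twice, so shift it back down once.
   (Right-shifting a negative value is implementation-defined in C;
   mainstream compilers shift arithmetically.) */
static fix16 fix16_mul(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * b) >> 16);
}

/* A quotient cancels the scale, so pre-scale the dividend. */
static fix16 fix16_div(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * FIX16_ONE) / b);
}
```

Overflow and division by zero are deliberately left unhandled in this sketch.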
Also an ordered vector is used to implement a map/set. Because:
> Most people would likely instinctively reach for hash tables, and typically spend the next few months researching optimal hash algorithms and table designs.
> A binary searched vector is as simple as it gets and performs pretty well while being more predictable.
A basic hash table or hash set[1] is both simpler and faster than this solution. And I don't see what's stopping someone from spending the next few months researching optimal dynamic array growth and searching algorithms instead. This line of reasoning just doesn't make any sense.
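For a sense of scale, this is roughly what a "basic" hash set could look like in C. It is a sketch under loud assumptions: fixed capacity, `uint32_t` keys only, linear probing, no deletion and no resizing. All names (`int_set`, `mix_u32`, and so on) are hypothetical, and whether this counts as simpler than a binary-searched vector is left to the reader.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SET_SLOTS 64u /* fixed power-of-two capacity, for brevity */

struct int_set {
    uint32_t keys[SET_SLOTS];
    bool used[SET_SLOTS];
    size_t count;
};

/* Cheap integer mixer; the constant is an arbitrary odd multiplier. */
static uint32_t mix_u32(uint32_t x) {
    x ^= x >> 16;
    x *= 0x45d9f3bu;
    x ^= x >> 16;
    return x;
}

/* Linear probe for key. Returns the slot holding it, or the first empty
   slot on its probe path, or SET_SLOTS if the table is full without it. */
static size_t int_set_slot(const struct int_set *s, uint32_t key) {
    size_t i = mix_u32(key) & (SET_SLOTS - 1);
    for (size_t n = 0; n < SET_SLOTS; n++) {
        if (!s->used[i] || s->keys[i] == key)
            return i;
        i = (i + 1) & (SET_SLOTS - 1);
    }
    return SET_SLOTS;
}

static bool int_set_contains(const struct int_set *s, uint32_t key) {
    size_t i = int_set_slot(s, key);
    return i < SET_SLOTS && s->used[i];
}

/* Returns false only when the table is full and key is absent. */
static bool int_set_insert(struct int_set *s, uint32_t key) {
    size_t i = int_set_slot(s, key);
    if (i == SET_SLOTS)
        return false;
    if (!s->used[i]) {
        s->used[i] = true;
        s->keys[i] = key;
        s->count++;
    }
    return true;
}
```

A zero-initialized `struct int_set` is an empty set, so like the sorted vector it needs no setup beyond its own storage.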
And "Once nice advantage is that since they don't need any infrastructure, they're comparably cheap to create." What? It needs a dynamic array!
The growable array type ("vector" following C++ parlance) lacks the bifurcated reservation API meaning it has the same problem as Bjarne's std::vector - but it's 2025 people, just because C++ made this mistake last century doesn't mean you need to copy them.
And finally yes you want a really good general purpose hash table, this is one of the places where generics shine most brightly, don't "spend the next few months researching" pick a language which does a decent job of this out of the box, but since you're in C, your utility library should likewise provide a decent hash table out of the box.
Swiss Tables are literally just a single growable allocation, this idea that you've somehow made your thing cheaper than a hash table by using the growable array type underneath it means you're at best four decades behind the state of the art, which is a bad sign.
This is a Young Discipline. David Musser's "Introspective sorting" paper was written after I learned sorting at University. Literally the class where they taught me about sorting was held before that paper was even written, let alone widely disseminated. The whole terminology of "Lock free" versus "Wait free" again, that's newer than my undergraduate final project on distributed systems. Because this is a Young Discipline it's crucial to go check, hey, the stuff I learned in class years ago, is that actually still correct, and does my understanding match reality - or am I about to recite a known falsehood because I forgot how time works and/or I didn't pay attention in class?
What's "the bifurcated reservation API"?
There are several ways you could arrange this, but some of them can't optimize certain scenarios practically. I call Rust's choice here a "bifurcated" API because it has two functions named `reserve` and `reserve_exact` where many languages provide only one (typically named `reserve` but analogous to `reserve_exact`).
Because we know the circumstance, we can use the amortized growth strategy where appropriate in `reserve` even though we don't use it for `reserve_exact`.
Suppose I'm receiving bundles of Doodads, to form a Shipment, I can see how many Doodads are in the bundle I received, but I only know it's the last bundle of the shipment or it's not, I don't have advance notice of the full size of the Shipment.
Say I receive bundles of 10, 15, 11, 20, 9, 14 and finally 13 Doodads, for a total shipment size of 92 Doodads.
With just naive doubling, we grow to 1, 2, 4, 8, 16, 32, 64 and finally 128 Doodads space, we perform 127 copies + 92 new writes = 219 Doodad writes, 8 heap allocations. That's our base case, it's what Bjarne Stroustrup recommends and what you'd get in many languages out of the box or if you've never used a reservation API.
If we abuse Vec::reserve_exact - as might be tempting if it's the only API - we grow 10, 25, 36, 56, 65, 79, 92, we perform 271 copies + 92 new writes = 363 Doodad writes, 7 heap allocations, one fewer allocation but many more copies, probably slightly worse. Ouch.
If we use the bifurcated API we grow 10, 25, 50, 100, we perform 85 copies + 92 new writes = 177 Doodad writes, 4 allocations, we're doing markedly better.
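The three tallies above can be reproduced with a small bookkeeping simulation in C. This is a sketch, not Rust's actual implementation: no memory is allocated, the names are hypothetical, and each growth is counted as a realloc-style copy of the whole old allocation, which is the assumption that yields the figures quoted above. The amortized strategy is modeled as "at least double, more if the request demands it".

```c
#include <stddef.h>

/* Capacity bookkeeping only: no memory is actually allocated. */
struct growth { size_t cap, len, copies, allocs; };

/* Model a realloc-style move: the whole old allocation is copied. */
static void grow_to(struct growth *g, size_t new_cap) {
    g->copies += g->cap;
    g->cap = new_cap;
    g->allocs++;
}

/* No reservation: double whenever a push finds the array full. */
static struct growth run_doubling(const size_t *bundles, size_t n) {
    struct growth g = {0};
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < bundles[i]; j++) {
            if (g.len == g.cap)
                grow_to(&g, g.cap ? g.cap * 2 : 1);
            g.len++;
        }
    }
    return g;
}

/* reserve_exact-style: grow to exactly what each bundle needs. */
static struct growth run_reserve_exact(const size_t *bundles, size_t n) {
    struct growth g = {0};
    for (size_t i = 0; i < n; i++) {
        size_t need = g.len + bundles[i];
        if (need > g.cap)
            grow_to(&g, need);
        g.len = need;
    }
    return g;
}

/* reserve-style: amortized growth, at least double the old capacity. */
static struct growth run_reserve(const size_t *bundles, size_t n) {
    struct growth g = {0};
    for (size_t i = 0; i < n; i++) {
        size_t need = g.len + bundles[i];
        if (need > g.cap) {
            size_t cap = g.cap * 2;
            if (cap < need)
                cap = need;
            grow_to(&g, cap);
        }
        g.len = need;
    }
    return g;
}
```

Fed the bundle sizes 10, 15, 11, 20, 9, 14, 13, the three functions report 127 copies and 8 allocations, 271 copies and 7 allocations, and 85 copies and 4 allocations respectively.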
What would you recommend as a source instead?
#define hc_task_yield(task)        \
  do {                             \
    task->state = __LINE__;        \
    return;                        \
    case __LINE__:;                \
  } while (0)
That's just diabolical. I would not have thought to write "case __LINE__". In the case of a macro, using __LINE__ twice expands to the same value where the macro is used, even if the macro has newlines. It makes sense, but TIL.
from https://github.com/codr7/hacktical-c/blob/main/macro/macro.h
#define hc_align(base, size) ({                                  \
    __auto_type _base = base;                                    \
    __auto_type _size = hc_min((size), _Alignof(max_align_t));   \
    (_base) + _size - ((ptrdiff_t)(_base)) % _size;              \
  })
After preprocessing it is a single line.
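The `case __LINE__` trick is easier to follow in a complete function. The sketch below uses a hypothetical `task_yield` macro and `struct task` modeled on the macro quoted above, not the library's real types. The body is wrapped in a `switch` on the saved state, so each call jumps back to just after the last yield. It assumes at most one yield per source line, and any local that must survive a yield lives in the task struct.

```c
#include <stdbool.h>

/* Hypothetical task record for this sketch. */
struct task {
    int state; /* 0 on first entry, otherwise the line of the last yield */
    int i;     /* loop counter kept here because locals do not survive a return */
    int out;   /* value produced by the most recent step */
    bool done;
};

/* Both uses of __LINE__ expand to the line where the macro is invoked. */
#define task_yield(t)            \
    do {                         \
        (t)->state = __LINE__;   \
        return;                  \
        case __LINE__:;          \
    } while (0)

/* Produces 0, 1, 2 on successive calls, then marks the task done. */
static void counter_body(struct task *t) {
    switch (t->state) {
    case 0:
        for (t->i = 0; t->i < 3; t->i++) {
            t->out = t->i;
            task_yield(t);
        }
    }
    t->done = true;
}
```

Starting from a zeroed `struct task`, the first three calls set `out` to 0, 1 and 2, and the fourth sets `done`.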
https://www.chiark.greenend.org.uk/~sgtatham/coroutines.html
Shows how old this post is. In fact I remember reading it well over 10 years ago, maybe more like 20. archive.org says that it's at least as old as 2001. A great article.
I'm very excited to see he's published a new article on C++20 coroutines. I've read (or maybe skimmed...) a few introductions and not really got them, despite having used C# and Python coroutines a lot with no problems (even making changes to an async runtime for Python). Given how clear his C coroutine article is, I'm optimistic about the C++ article.
> So, after the course, I went away and studied on my own, and wrote the introduction to C++ coroutines that I’d have liked to see.
https://www.chiark.greenend.org.uk/~sgtatham/quasiblog/corou...
Simon Tatham's Portable Puzzle Collection https://www.chiark.greenend.org.uk/~sgtatham/puzzles/
#define CO_BEGIN static void* cr_state_ = &&cr_st_0; goto *cr_state_; cr_st_0:
#define CO_RETURN(x) ({ __label__ resume; cr_state_ = &&resume; return (x); resume:; })
https://www.kernel.org/doc/html/latest/filesystems/vfs.html
The vast majority of the function pointers in those structures are optional (even if not explicitly stated). To give a few common sense examples:
* If your filesystem does not support extended attributes, you would not implement .listxattr and instead set it to NULL.
* There are multiple ways of implementing read and write in file_operations. You have the basic read and write operations, and more efficient variants. You don’t need to implement the more efficient variants if you don’t want to implement them.
* The .bmap call is used to find out how the filesystem stores a file on a block device, which used to be used by syslinux (and might still be). This is obviously incompatible with NFS (or any multidisk filesystem like ZFS), so it absolutely must be optional.
Then there are other options, like not supporting mmap, or not supporting creation/removal of subdirectories. That sounds absurd, but some FUSE filesystems, particularly those exporting a program’s statistics, don’t bother with either of those since they are not needed. I do not believe Linux sysfs allows users to make directories either.
I could continue, but this gives a few examples of why you might want to have optional functionality in a class-like interface.
By the way, I mentioned setting things you do not implement to NULL. This is done simply by not specifying them when using the structure initializer syntax. The compiler will zero unspecified members.
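A minimal sketch of that pattern in C, with a hypothetical two-entry operations table rather than the kernel's real `file_operations`. The names `fs_ops`, `fs_mmap` and `FS_UNSUPPORTED` are made up for illustration.

```c
#include <stddef.h>

enum { FS_UNSUPPORTED = -1 }; /* hypothetical error code for this sketch */

struct fs_ops {
    int (*read)(void *ctx); /* required */
    int (*mmap)(void *ctx); /* optional: may be NULL */
};

static int statfs_read(void *ctx) { (void)ctx; return 0; }
static int diskfs_read(void *ctx) { (void)ctx; return 0; }
static int diskfs_mmap(void *ctx) { (void)ctx; return 0; }

/* .mmap is not named here, so the compiler initializes it to NULL. */
static const struct fs_ops statfs_ops = { .read = statfs_read };

static const struct fs_ops diskfs_ops = {
    .read = diskfs_read,
    .mmap = diskfs_mmap,
};

/* The caller tests for NULL before dispatching to an optional operation. */
static int fs_mmap(const struct fs_ops *ops, void *ctx) {
    if (ops->mmap == NULL)
        return FS_UNSUPPORTED;
    return ops->mmap(ctx);
}
```

The check is a single pointer comparison at run time, and it works for operation tables defined in code that did not exist when the caller was compiled.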
If you want to make something fancy, use templates: `if constexpr` with a `requires` check on the function to call, then call the function.
You need to use hacks to shoehorn C++ class member functions into this. In particular, you need stub functions. Then either, call them and have them either return a special error code or throw an exception, or use a custom query function that is implemented by derived classes that lets you find out if a function is a stub or not to allow you to skip calling it. Another idea would be to use thread local storage with setjmp()/longjmp(), which is probably the sanest way of doing this insane idea:
https://godbolt.org/z/4GWdvsz6z
And the C way for comparison:
https://godbolt.org/z/qG3v5zcYc
The idea that the simplest way of approximating what you can do with function pointers in C structures via C++ class member functions is to use TLS and setjmp/longjmp shows what a bad idea it is to use class member functions instead of function pointers for optional functions in the first place.
The same C example compiled in C++23 mode, https://godbolt.org/z/MWa7qqrK7
As for possible alternatives, here is a basic one without taking into consideration virtual mechanics, only to show the principles.
#include <concepts>

template <class T>
concept has_mmap = requires (T obj)
{
    { obj.mmap() } -> std::convertible_to<int>;
};

class VFS {
public:
    VFS() = default;
    virtual ~VFS() = default;
};

class ExampleFS : public VFS {
    // mmap not available
};

class ExampleWithMMAP : public VFS {
public:
    int mmap() {
        return 0;
    }
};

int main() {
    ExampleFS fs;
    ExampleWithMMAP fsWithMMAP;

    /*
    <source>: In function 'int main()':
    <source>:33:19: error: 'class ExampleFS' has no member named 'mmap'
    40 | return fs.mmap();
       |
    */
    if constexpr (has_mmap<ExampleFS>) {
        return fs.mmap();
    }

    // ExampleWithMMAP has mmap(), just call it without issues
    if constexpr (has_mmap<ExampleWithMMAP>) {
        return fsWithMMAP.mmap();
    }

    // want to use the variable name instead of the type?
    if constexpr (has_mmap<decltype(fsWithMMAP)>) {
        return fsWithMMAP.mmap();
    }
}
-- https://godbolt.org/z/cjcbrzT3z
Naturally it is possible to be even more creative, and more so with C++26 reflection.
> The same C example compiled in C++23 mode, https://godbolt.org/z/MWa7qqrK7
Everyone knows this. The original comment was saying not to do this (even in C++) and use C++ classes instead. I was making the point that that is a bad idea. You seem to have not understood that.
1. It is not possible to add optional member functions (which would be pure virtual functions) to a C++ base class and then check at runtime if they are unimplemented in the object (at least not without implementing some way to query the object, which is slow). If you say to handle this by having typeid checks at runtime, look at the VFS and then notice that you cannot implement this typeid check in advance, since you cannot add a typeid check for a derived class that did not even exist when you compiled your code. Thus, you still need to use structs of function pointers in C++. Maybe you can use C++ classes for some cases where structs of function pointers are used, but you would be giving up the ability to implement optional functions in a sane way.
2. It ignores all of the things in C that are absent from C++. In particular, C++ refuses to support C’s variably modified types and variable length arrays, which are useful language features.
3. It ignores all of the things in C++ that you likely do not want, such as exceptions and RTTI. The requirement to typecast whenever you assign a void pointer to any other pointer is also ridiculous.
Thankfully regarding 2., Google went the extra mile to pay for removing them from the Linux kernel, and they were made optional C11 onwards exactly because they are an attack vector.
3. It is called stronger type safety; what is ridiculous is the C community still approaching computers as if writing K&R C.
Furthermore, less is more. You get faster build times with C because it does not support all of the features C++ has. Just because you can do it in C++ does not mean you should.
I used C++ for one of my first projects for a startup in health care and I really wish I had not. C++ made development a hellish experience as I spent most of it on fighting the compiler to be able to use every C++ language feature I could imagine and not enough on actual issues. It easily doubled development time since I spent most of it on things that only existed because C++ had overcomplicated everything (e.g. reference versus pointer, public versus private, shoehorning OOP into places it did not belong, operator overloading, templates, etcetera). This was during my initial attempt at graduate studies and after ruining a semester because of it (this had been intended to be a part time thing), I parted ways with the company. The C++ daemon went on to be the heart of the company, despite the lingering bugs.
I ended up fixing the remaining issues as a consultant years later, but eventually, I realized that everything would have been better had I not used C++ in the first place. There are times when I fantasize about rewriting it in C. One of these days, I might actually do that for the company for free if only to put an end to a mistake of my youth. Unfortunately, now that I have fixed the daemon, it has the advantage of being a mature, reliable codebase, so it is difficult to justify a rewrite.
That said, despite my complaints about the effect C++ had on development, I did a number of things right when architecting that daemon. The lingering bugs turned out to be trivial and it has scaled with the company for 13 years with no end in sight. When it finally is replaced, the reason will likely be that it did not support HA, rather than some inability to scale. My younger self had refrained from pursuing HA since it seemed infeasible to do within the spare time I had during a single semester.
The difference in build times between identical code compiled with the C language or C++ language is probably negligible. Or at least dwarfed by using a better build system, a faster build machine, and/or some sort of build caching technology.
> Beyond that, there is no compiler flag to stop requiring explicit casts of void pointers before assigning them.
I believe that's true. And there are probably a few other ergonomic differences beyond this one. Has anyone proposed that as a feature flag for Clang and/or GCC? Open source C and C++ compiler devs don't have a lot of free time such that they peruse social media looking for things to do.
No comment on your anecdote other than to say I have heard versions of that story before but with other programs and in basically every other language. Including C.
I'm not saying you're wrong. I think a lot of your points are valid points. About taste. Which is fair and fine, but it's also true that the difference between C and C-style C++ are pretty minor, especially if someone knows how to enforce coding standards with clang-query wired up to CI or something like that.
https://godbolt.org/z/z9M55s3q6
What is particularly nice about that code is that a C compiler will realize that it has a buffer overflow. Adapting it for C++ will cause the C++ compiler to not notice the buffer overflow.
If you are going to be writing C, there is no reason to compile it as C++. Using C++ limits your ability to use newer features of C and exposes you to headaches like the ABI compatibility break of GCC 5.0 that was done for C++11. C has never had an ABI compatibility break caused by a revision of the language. Your suggestion that people should use C++ even when it is not what anyone wants befuddles me.
If you said this in a room with Linus Torvalds, I wonder if he would start cursing again.
https://godbolt.org/z/4GWdvsz6z
That is the closest I can get it to implementing an optional function via a C++ class member function instead of a function pointer. It is not only insane, but also masochistic in comparison to how it would be done via function pointers:
Because in C++ the features are just there right around the corner, they will seep into the code base.
And I don't want even classes, there's too much junk in there that I don't need.
You can sort of emulate it using pointers to member but it quickly loses its appeal.
Same thing people said about other people not compiling by hand lol.
I love C because it doesn't make my life very inconvenient to protect me from stubbing my toe in it. I hate C when I stub my toe in it.
I understand where this is coming from, but I think this is less true than it used to be, and (for that reason) it often devolves into arguments about whether the C standard is the actual source of truth for what you're "really" allowed to do in C. For example, the standard says I must never:
- cast a `struct Foo*` into a `struct Bar*` and access the Foo through it (in practice we teach this as the "strict aliasing" rules, and that's how all(?) compilers implement it, but that's not what §6.5 paragraph 7 of the standard says!)
- allow a signed integer to overflow
- pass a NULL pointer to memcpy, even if the length is zero
- read an uninitialized object, even if I "don't care" what value I get
- read and write a value from different threads without locking or atomics, even if I know exactly what instructions those reads and writes compile into and the ISA manual says it's 100% fine to do that
All of these are ways that (modern, standard) C doesn't really "do what the programmer said". A lot of big real-world projects build with flags like -fno-strict-aliasing, so that they can get away with doing these things even though the standard says they shouldn't. But then, are they really writing C or "C with custom extensions"? When we compare C to other languages, whose extensions are we talking about?
> cast a `struct Foo*` into a `struct Bar*` and access the Foo through it (in practice we teach this as the "strict aliasing" rules, and that's how all(?) compilers implement it, but that's not what §6.5 paragraph 7 of the standard says!)
Use the union type. Abusing it for aliasing violates the standard too, but GCC and Clang implement an extension that permits this. Alternatively, just allocate a char array and cast it as you please. Strict aliasing does not apply to char arrays if I recall.
> allow a signed integer to overflow
Is this still true? I thought that the reason for this is because C left the implementation to define how signed arithmetic worked, meaning you could not assume two’s complement, but the most recent C standard was supposed to mandate two’s complement.
> pass a NULL pointer to memcpy, even if the length is zero
There is a reason for this. memcpy is allowed to start reading early as a performance optimization, before it does a branch that checks if reading is only. I do wonder what happens if you only want to copy 1 byte and that byte has invalid memory right next to it. Presumably, this optimization would read more than a byte.
> read an uninitialized object, even if I "don't care" what value I get
You are probably doing something wrong if you do this. It is not even good as an entropy source.
> read and write a value from different threads without locking or atomics, even if I know exactly what instructions those reads and writes compile into and the ISA manual says it's 100% fine to do that
Earlier C standards likely did not say anything about this because they did not support multithreading, but outside of possibly reading/writing to hardware registers, you do not want to do this because of races. Even if you think you know better, you almost certainly do not.
While that's true, overflows are not automatically wrapping because they instead may trap for several reasons. (C++20 likewise mandates two's complement but still leaves signed overflow undefined. [1])
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2412.pdf
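Since signed overflow stays undefined whatever the representation, portable code usually tests before it adds. A minimal sketch follows, with `add_checked` as a hypothetical helper name. C23's `<stdckdint.h>` offers `ckd_add` for the same purpose where it is available.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false if the sum
   would not fit in an int. No overflowing operation is ever evaluated. */
static bool add_checked(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

The two comparisons are themselves overflow-free: `INT_MAX - b` is only evaluated for positive `b`, and `INT_MIN - b` only for negative `b`.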
> memcpy is allowed to start reading early as a performance optimization, [...]
Most modern memcpy implementations would branch on the length anyway, because word-based copying is generally faster than byte-based copying whenever possible. Also many would try SIMD when the copy size exceeds some threshold for the same reason.
>> read an unitialized object, even if I "don't care" what value I get
> You are probably doing something wrong if you do this.
The GP meant the case like this. Consider `struct foo { bool avail; int value; } foos[100];` where `value` would be only set when `avail` is true. If we are summing all available `value`s, we may want to avoid a branch misprediction by something like `accum += foos[i].avail * foos[i].value;` for each `foos[i]`, since the actual `value` shouldn't matter when `avail` is false. But the current specification prohibits this construction because it considers that each read from `foos[i].value` may be different from each other (!). In reality, this kind of issues is so widespread that LLVM has a special "poison" value which gets resolved to some fixed value after the first use.
As for the last one, I would probably bzero() that structure, as it is faster than setting just 1 field to zero in a loop, which presumably is what you would do until you have some need to “allocate” a value. That would avoid the problem entirely.
I know bzero() was removed from POSIX, but “bzero()” is nicer to write than “memset() it to zero”.
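The zero-first approach can be sketched as follows, using the `struct foo` from the comment above and `memset` in place of `bzero()`. The function names are hypothetical. Once every byte has been zeroed, each `value` holds a defined 0, so the branch-free sum never reads an uninitialized object.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct foo { bool avail; int value; };

/* Zero every byte so each value is initialized (to 0) even where
   avail is false. Assumes foos points at n valid elements. */
static void foos_reset(struct foo *foos, size_t n) {
    memset(foos, 0, n * sizeof *foos);
}

/* Branch-free sum: unavailable entries contribute 0 * value. */
static long foos_sum(const struct foo *foos, size_t n) {
    long accum = 0;
    for (size_t i = 0; i < n; i++)
        accum += (long)foos[i].avail * foos[i].value;
    return accum;
}
```

The cost is one pass over the array up front, in exchange for a summing loop with no data-dependent branch.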
> Use the union type. Abusing it for aliasing violates the standard too, but GCC and Clang implement an extension that permits this. Alternatively, just allocate a char array and cast it as you please. Strict aliasing does not apply to char arrays if I recall.
I could be misreading, but you seem to be implying that you can trick the aliasing rules by casting Foo* to char* and then cast the char* to Bar*, but that still violates the rule. Even a union isn't allowed as a way of aliasing, but as you say it's often allowed in practice and is heavily used in the Linux kernel (and Linus has made his opinion on this part of the language standard very clear).
In theory, the right way to access the bits of a Foo as a Bar is to memcpy to a fresh Bar object, and then memcpy back if you want to update the original variable. The compiler is then allowed to optimise this into a direct access of the bits.
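A sketch of that memcpy round trip in C. The member layouts of `struct Foo` and `struct Bar` are invented for illustration (the thread never defines them), and the static assertion guards the assumption that both have the same size.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts: the thread only names the types. */
struct Foo { uint32_t a; uint32_t b; };
struct Bar { uint16_t h[4]; };

_Static_assert(sizeof(struct Foo) == sizeof(struct Bar),
               "this sketch assumes equally sized types");

/* View the bytes of a Foo as a fresh Bar object. */
static struct Bar foo_as_bar(const struct Foo *f) {
    struct Bar b;
    memcpy(&b, f, sizeof b);
    return b;
}

/* Write a (possibly modified) Bar back over the original Foo. */
static void bar_into_foo(struct Foo *f, const struct Bar *b) {
    memcpy(f, b, sizeof *f);
}
```

Neither function ever accesses a Foo through a Bar lvalue or the reverse, so §6.5p7 is not in play, and optimizing compilers will typically reduce the small fixed-size copies to plain loads and stores.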
If you insist on doing what you described, just skip char * and mark the pointer with __attribute__((may_alias)) and then it will be okay. That is a compiler extension that lets you turn off strict aliasing rules.
char x[sizeof(struct Foo)];
struct Foo* f = (struct Foo*)&x;
struct Bar* b = (struct Bar*)&x;
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
https://www.iso-9899.info/n1570.html#6.5p7
I realise that many implementations will allow it anyway but if you're relying on that then you may as well fall back to a straight cast from Foo* to Bar*, which is also not allowed in theory.
>> pass a NULL pointer to memcpy, even if the length is zero
> There is a reason for this. memcpy is allowed to start reading early as a performance optimization, before it does a branch that checks if reading is only.
Where did you get this idea from? It's not possible, since you can hand an address at the end of an array, and length 0. The array ends at the end of a page.
You can't read extra bytes in this case!
This led me to think of the case where you hand it the address right before the end of a byte array where the byte after the last byte is an unmapped page and tell it to copy 1 byte. I suspect systems that have such an optimization would read beyond 1 byte into invalid memory. This is my criticism of the idea of having memcpy(NULL, NULL, 0) be undefined to make that speed trick legal. I am suggesting that, by the same logic, copying any small number of bytes would also have to be undefined, yet it is not under the standard.
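Whatever the rationale, code that may end up with a NULL pointer and a zero length can sidestep the question with a thin wrapper. `copy_bytes` below is a hypothetical helper, not a standard function.

```c
#include <stddef.h>
#include <string.h>

/* Skips the call entirely when n == 0, so a NULL dst or src never
   reaches memcpy in the zero-length case. */
static void *copy_bytes(void *dst, const void *src, size_t n) {
    if (n != 0)
        memcpy(dst, src, n);
    return dst;
}
```

For nonzero lengths the usual memcpy requirements still apply: both pointers must be valid and the regions must not overlap.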
"C assumes you know what you're doing, which is only a problem because you don't know what you're doing."
Periodically, especially in r/cpp I run into people who are apparently faultless and so don't make the mistakes that make these languages dangerous, weirdly none of these people seem to have written any software I can inspect to see for myself what that looks like, and furthermore the universe I live in doesn't seem to have any of the resulting software. I choose to interpret this mystery as: People are idiots and liars, but of course there could be other interpretations.
That said, I have not written perfect C code myself, but I have fixed a number of mistakes others made in their C code. Many of my commits to OpenZFS were done to fix others’ mistakes. A few of my commits even contained my own mistakes that I or others later caught. Feel free to inspect the codebase yourself. You should find it is a very well written codebase.
So basically Jeff Sutherland ever since he started talking about AI. "My AI agents have formed a Scrum team that's 30 times faster than any human developer!" Great, Jeff. Working in which company's production codebase?
The real danger with Rust is the cult like delusion that's not the case for them.
Other than writing memory safe code, as history has shown.
Because it allows things that are difficult, like writing your own memory allocators.
If you don't like working at that difficulty level, then C programming isn't for you. And that's fine.
This line of argumentation reminds me of this:
Advertise and promote a shortcoming or a fault as a virtue.
For example, ultra-cheap single-use film cameras are advertised as "No Focusing Required." The truth is, no focusing is possible, because those cameras have cheap plastic fixed-focus lenses that won't move and can't be focused. What is a serious shortcoming for a camera — the inability to properly focus on the subject — is sold as a convenience: "You don't have to bother with focusing."
https://orangepapers.eth.limo/orange-propaganda.html#make_vi...
When your computer is a PDP-11, otherwise it is a high level systems language like any other.
Your C optimizer is emulating that VM when performing symbolic execution, and the compiler backend is cross-compiling from it. It's an abstract hardware that doesn't have signed overflow, has a hidden extra bit for every byte of memory that says whether it's initialized or not, etc.
Assembly-level languages let you write your own calling conventions, arrange the stack how you want, and don't make padding bytes in structs cursed.
The existence of undefined behaviour isn't proof that there is a C "virtual machine" that code is being run on. Undefined behaviour is a relaxation of requirements on the compiler. It is not that the C abstract machine lacks signed overflow; rather, the compiler is allowed to do what it likes when signed overflow is encountered. This was originally a concession to portability, since the common saying is not that C is close to assembly, but rather that it is "portable" assembler. It is kept around because it benefits performance, which is again one of the primary reasons people write C.
> The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant.
This belief that C targets the hardware directly makes C devs frustrated that UB seems like an intentional trap added by compilers that refuse to "just" do what the target CPU does.
The reality is that front-end/back-end split in compilers gave us the machine from the C spec as its own optimization target with its own semantics.
Before C got formalised in this form, it wasn't very portable beyond PDP. C was too opinionated and bloated for 8-bit computers. It wouldn't assume 8-bit bytes (because PDP-11 didn't have them), but it did assume linear memory (even though most 16-bit CPUs didn't have it). All those "checking wetness of water... wet" checks in ./configure used to have a purpose!
Originally C didn't count as an assembly any more than asm.js does today. C was too abstract to let programmers choose addressing modes and use flags back when these mattered (e.g. you could mark a variable as `register`, but not specifically as an A register on 68K). C was too high level for tricks like self-modifying code (pretty standard practice where performance mattered until I-cache and OoO killed it).
C is now a portable assembly more because CPUs that didn't fit C's model have died out (VLIW) or remained non-standard specialized targets (SIMT).
It gets very frustrating to communicate at this level.
Isn't this true for most higher level languages as well? C++ for instance builds on top of C and many languages call into and out of C based libraries. Go might be slightly different as it is interacting with slightly less C code (especially if you avoid CGO).
Describing C as "high-level" seems like deliberate abuse of the term. The virtual machine abstraction doesn't imply any benefits to the developer.
Honestly it doesn't really matter. High level and low level are relative to each-other (and machine language), and nothing changes based on what label you use.
Best thing to do is shrug and say "ok".
C has always been classed as a high level language since its inception. That term's meaning has shifted though. When C was created, it wasn't assembly (middle) or directly writing CPU op codes in binary/hex (low level).
What makes C low-level is that it can work directly with the representation of objects in memory. This has nothing to do with CPU features, but with direct interoperability with other components of a system. And this is what C can do better than any other language: solve problems by being a part of a more complex system.
The integral promotion rules come directly from the PDP-11 CPU instruction set.
If I recall correctly so does the float->double promotions.
CPUs started adapting to C semantics around the mid-80's. CPU designers would profile C generated code and change to be able to more efficiently run it.
No one is claiming it was built for today's processors, just that it puts fewer obstacles between you and the hardware than almost any other language. Assembler and Forth being the two I'm familiar with.
One of the very first systems programming languages was JOVIAL, from 1958. C's inventors were still finalising their studies.
The other approach, taken by Rust (and to some degree C++), is to nail everything to the floor and force the programmer to express a solution in a specific format that's easier to verify and make guarantees about. Which is fine.
Both approaches have their appeal, which is best depends on context.
Keep waiting for the examples where they can't do what ISO C allows for, and if the example uses compiler extensions to the ISO C, I also feel within the right to use extensions to those languages on the counter example.
The difference is clear. Assembly language programs specify sequences of CPU instructions. C programs specify runtime behavior.
But then again, you booby trapped the question with popular language.
So which of these languages do you think is a better representation of hardware and not a PDP-11?
None of them, you use Assembly if you want the better representation of hardware.
Yes, I am quite confident, because I have been dispelling the C myth of the true and only systems programming language since the 1990's.
C does not have an infinite number of libraries and examples. The number of libraries and examples C has is quite large, and there are an infinite number of theoretically possible libraries and examples, but the number of libraries and examples that exist is finite.
For one, I would expect that a low level language wouldn't be so completely worthless at bit twiddling. Another thing, if C is so low level, why can't I define a new calling convention optimized for my use case? Why doesn't C have a rich library for working with SIMD types that have been ubiquitous in processors for 25 years?
Otherwise it says, do whatever you feel like.
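One reading of the bit-twiddling complaint, sketched under that assumption: the language has no rotate operator, so operations that are a single instruction on most CPUs are spelled as idioms and left for the compiler to recognize. The function names below are illustrative only.

```c
#include <stdint.h>

/* Rotate left by n bits using the shift-or idiom. Both shift counts are
   masked so that n == 0 and n >= 32 never produce a shift by 32, which
   would be undefined behaviour for a 32-bit operand. */
uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

/* Count set bits with Kernighan's loop: each iteration clears the
   lowest set bit, so the loop runs once per set bit. */
unsigned popcount32(uint32_t x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        count++;
    }
    return count;
}
```

Whether these compile down to single rotate and population-count instructions depends on the compiler and target, which is arguably the point being made.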
What's standardized was never as important in C land, at least traditionally, which I guess partly explains why it's trailing so far behind. But the stability of the language is also one of its features.
But sure, if all you're doing is dot products I guess you can write a standard function that will work on most SIMD platforms, but who cares, use a linalg library instead.
Which language more accurately represents hardware then?
The real answer is obviously Assembly - pick a random instruction from any random modern CPU and I'd wager there's a 95% chance it's something you can't express in C at all. If the goal is to model hardware (it's not), it's doing a terrible job.
Besides, the operations are all wrong and only work for trivial values of the exponents, like 0, 1 and 2.
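For contrast, binary fixed point in the sense described at the top of the thread (a plain integer with an implied scale that gets re-applied after multiply and divide) can be sketched as below. This is not the book's code; the Q16.16 layout and every name here are assumptions for illustration, and overflow handling is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical Q16.16 type: the real value is raw / 65536. */
typedef int32_t fix16_t;
#define FIX16_ONE ((fix16_t)1 << 16)

fix16_t fix16_from_int(int32_t n)
{
    return (fix16_t)(n * FIX16_ONE);
}

/* Addition needs no rescaling: both operands share the same scale. */
fix16_t fix16_add(fix16_t a, fix16_t b)
{
    return a + b;
}

/* The raw product carries the scale twice, so widen to 64 bits
   and divide one copy of the scale back out. */
fix16_t fix16_mul(fix16_t a, fix16_t b)
{
    return (fix16_t)(((int64_t)a * b) / FIX16_ONE);
}

/* Pre-scale the dividend so the quotient keeps its fractional bits. */
fix16_t fix16_div(fix16_t a, fix16_t b)
{
    assert(b != 0);
    return (fix16_t)(((int64_t)a * FIX16_ONE) / b);
}
```

There is no exponent field anywhere: the scale is fixed by convention, which is why it is called fixed point.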
First of all, those languages do not "help" "reducing" some classes of bugs. They often entirely remove them.
Then, even assuming that any safe language with unsafe regions (Rust, C#, etc) would not give you comparable flexibility at a fraction of the risk... if your flexible, effortless solution contains entire classes of bugs, then there is no point in comparing "effort". You should at least take into account the effort of delivering software with high confidence that those bugs are not there.
There are good reasons to use C. It's best to approach it with a clear mind and a practical understanding of its limitations. Be prepared to mitigate those shortcomings. It's no small task!
While I am sure there can not be enough security, I am not at all sure the extreme focus on memory safety is worth it, and I am also not sure the added complexity of Rust is really worth it. I would prefer to simplify the stack and make C safer.
This was the only notable failing of Sean's (abandoned) "Safe C++" - it delivers all the technology a safe C++ culture would have needed, but there is no safe C++ culture so it was waved away as unimportant.
The guy whose mine loses fifty miners in a roof collapse doesn't need better mining technology, inadequate technology isn't why those miners died, culture is. His mine didn't have safety culture, probably because he didn't give a shit about safety, and his workers either shared this dismissal or had no choice in the matter.
Also "extreme focus" is a misinterpretation. It's not an extreme focus, it's just mandatory, it's like if you said humans have an "extreme focus" on breathing air, they really don't - they barely even think about breathing air - it was just mandatory so if you don't do it then I guess that stands out.
Also, relevantly here, nobody actually wants these terrible bugs. This is not A or B, Red or Blue, this is very much Cake or Death and like, there just aren't any people queueing up for Death, there are people who don't particularly want Cake but that's not the same thing at all.
Hence why I am so into cybersecurity laws, and if this is the only way to make C and C++ communities embrace a safety culture, instead of downplaying it as straitjacket programming like in the C vs Pascal/Modula-2 Usenet discussion days, then so be it.
Maybe that could be through a type system. Maybe that could be through a more capable run-time system. We've tried these avenues through other languages, through experimental compilers, etc.
Without introducing anything new to the language we have a plethora of tools at our disposal:
- Coq + Iris, or some other proof automation framework with separation logic.
- TLA+, Alloy, or some form of model checking where proofs are too burdensome/unnecessary
- AFL, Valgrind and other testing and static analysis tools
- CompCert: a formally verified compiler
- MISRA and other coding guidelines
... and all of this to be used in tandem in order to really say that for the parts specified and tested, we're confident there are no use-after-free memory errors or leaks. That is a lot of effort in order to make that statement. The vast, vast majority of software out there won't even use most of these tools. Most software developers argue that they'll never use formal methods in industry because it's just too hard. Maybe they'll use Valgrind if you're lucky.
Or -- you could add something to the language in order to prevent at least some of the errors by definition.
I'm not a big Rust user. Maybe it's not great and is too difficult to use, I don't know. And I do like C. I just think people need to be aware that writing safe C is really expensive and time consuming, difficult and nothing is guaranteed. It might be worth the effort to learn Rust or use another language and at least get some guarantees; it's probably not as hard as writing safe C.
(Maybe not as safe as using Rust + formal methods, but at least you'll be forced to think about your specification up front before your code goes into production... and where you do have unsafe code, hopefully it will be small and not too hard to verify for correctness)
Update: fixed markup
It is the lack of a culture of using them unless there is a government mandate to impose them, as in high-integrity computing.
I might too some day, who knows.
(Predictable response: "But they can only occur in unsafe regions which you can grep for" and my response to that: "so?")
It's unfortunate that C has so many truly unnecessary bugs which are only caused by stupid overly "clever" exploitation of undefined behaviour by compilers.
But what bugs? Suboptimal choices maybe; but any backwards compatible, popular language is going to have its share of those.
Nowadays, pick your scripting or managed language, and if C is really needed, place it cleanly in a loadable module for that language, with all the security invariants enforced there, instead of writing 100% pure C source.
My solution since early 2000's.
The situation is both worse than this and better than this. Consider the .set_len() method on Rust's Vec. It's unsafe, because you could just .set_len(1_000_000) and then the Vec would happily let you try to read the nonexistent millionth element and segfault. However, if you could edit the standard library sources, you could add this new method to Vec without touching any unsafe code:
    pub fn set_len_totally_safe_i_promise(&mut self, new_len: usize) {
        self.len = new_len;
    }
This is exactly the same as the real set_len, except it's a "fn" instead of an "unsafe fn". Now the Vec API is totally broken, and safe callers can corrupt memory. Also critically, we didn't write any unsafe code in "set_len_totally_safe_i_promise". The key detail is that this new method has access to the private self.len field of Vec that unsafe blocks in the same module rely on.
In other words, grepping for all the unsafe blocks isn't sufficient for saying that a program is UB-free. You also have to make sure that none of the safe code ever violates an invariant that the unsafe blocks rely on. Read the comments, think really hard, etc.
So...what's the point of all this? The point is that it lets us define a notion of "soundness", such that if I only write safe code, and I only use libraries that are "sound", we can guarantee that my program is UB-free. In other words, any UB in my program would necessarily point to a bug in one of my dependencies, in the stdlib, or in the compiler. (Or you know, in the hardware, or in mathematics itself.) In other other words, instead of auditing my entire gigantic (safe) program for UB, we can reduce the problem to auditing my dependencies for soundness. Crucially, this decouples the difficulty of the problem from the size of my program.
This wouldn't be very interesting if "safe code" was some impoverished subset, like "unsigned integer arithmetic only". But in fact safe code can use pointers, tagged unions, pointers into tagged unions, heap allocation/freeing, and multithreading. Lots of large, complicated, useful, real-world programs are written in 100% safe code.
Here's the version of this story with all the caveats and footnotes: https://jacko.io/safety_and_soundness.html
But yes, this is nice and we should (and probably will) have a safe mode in C too.
edit: I'm sorry that my captain obvious moment is turning out to be some truth bomb for some. Please keep downvoting as a way to regain your inner peace.
*you or anyone else in your chain of dependencies that use unsafe
$ pandoc --pdf-engine=xelatex --toc README.md {macro,fix,list,task,malloc1,vector,error,set,malloc2,dynamic,stream1,slog}/README.md -o book.pdf