Posted by simonpure 4 days ago
[1]: https://nee.lv/2021/02/28/How-I-cut-GTA-Online-loading-times...
In that case, the fix is not to change C strings (breaking a lot of existing code), but to introduce a stringbuilder type.
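To make that concrete, here is a minimal sketch of what such a stringbuilder type could look like in C. The names (`strbuf`, `strbuf_append`, and so on) are made up for illustration and don't come from any particular library. The point is only that tracking the length explicitly means appends never have to rescan for the terminator, while the buffer stays NUL-terminated so existing C APIs still accept it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string builder: length and capacity are stored, so an
   append never has to call strlen() on the accumulated buffer. */
typedef struct {
    char  *data; /* NUL-terminated after the first successful append */
    size_t len;  /* bytes in use, excluding the terminator */
    size_t cap;  /* bytes allocated */
} strbuf;

/* Append n bytes from src. Returns 0 on success, -1 on failure. */
static int strbuf_append(strbuf *sb, const char *src, size_t n)
{
    assert(sb != NULL && (src != NULL || n == 0));
    if (n > SIZE_MAX - sb->len - 1)
        return -1; /* length would overflow size_t */

    size_t need = sb->len + n + 1; /* +1 for the terminator */
    if (need > sb->cap) {
        size_t cap = sb->cap ? sb->cap : 16;
        while (cap < need) {
            if (cap > SIZE_MAX / 2) { cap = need; break; }
            cap *= 2; /* geometric growth keeps appends amortized O(1) */
        }
        char *p = realloc(sb->data, cap);
        if (p == NULL)
            return -1; /* the old buffer is still valid */
        sb->data = p;
        sb->cap = cap;
    }
    if (n > 0)
        memcpy(sb->data + sb->len, src, n);
    sb->len += n;
    sb->data[sb->len] = '\0'; /* keep it usable as a plain C string */
    return 0;
}

/* Convenience wrapper for ordinary C strings. */
static int strbuf_append_cstr(strbuf *sb, const char *s)
{
    return strbuf_append(sb, s, strlen(s));
}

static void strbuf_free(strbuf *sb)
{
    free(sb->data);
    sb->data = NULL;
    sb->len = sb->cap = 0;
}
```

Start from a zeroed struct (`strbuf sb = {0};`), append as needed, and pass `sb.data` to anything that expects a `char *`.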
It's a bit of extra code, yes. Not necessarily all that much, but some. On average it is only slightly more expensive than null termination, and considered as a proportion of the size of the strings themselves it's hardly anything. It's probably better than the strings getting hard-limited to 0-255, though, which was quite frequently a user-visible quirk.
The next level (110x xxxx) would give you 8MiB strings, which are going to be fine for most things.
But remember the first Macintosh shipped with 128KB of RAM, 131,072 bytes. Three more bytes per string hurts a lot more there...
... although, that said, even in that era, given the number of errors that null-terminated strings caused, and even completely ignoring security, I do still wonder whether defaulting to 2 bytes of length and doing something special for strings over 64K wouldn't have been the right tradeoff, even for short strings. Today we mostly focus on security, but null-terminated strings also caused a lot of just plain-old bugs. But so did 1-byte length strings... it's way too easy to run out of 255 characters even on those old systems.
Pascal strings - historically, and the reason people even remember this being an issue - were limited to 255 chars; for anything longer you had to use a different string type.
You might still want raw pointers for all sorts of low-level stuff, but you almost never want null-terminated strings for anything but back-compat. They are one of the worst things ever, even on memory-constrained systems.
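For anyone who hasn't run into them, a rough sketch of a Pascal-style string in C, assuming the classic layout of one length byte followed by the data. The type and function names (`pstr`, `pstr_from_cstr`, `pstr_length`) are invented for this example. It shows both sides of the tradeoff: the length is available in constant time, but anything over 255 bytes simply does not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PSTR_MAX 255 /* the largest value one length byte can hold */

/* Hypothetical Pascal-style string: length prefix, no terminator. */
typedef struct {
    unsigned char len;            /* 0..255 */
    char          data[PSTR_MAX]; /* not NUL-terminated */
} pstr;

/* Convert a C string. Returns 0 on success, -1 if it does not fit. */
static int pstr_from_cstr(pstr *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    size_t n = strlen(src);
    if (n > PSTR_MAX)
        return -1; /* this is the user-visible 255 limit */
    dst->len = (unsigned char)n;
    memcpy(dst->data, src, n);
    return 0;
}

/* O(1): read the prefix instead of scanning for a terminator. */
static size_t pstr_length(const pstr *s)
{
    return s->len;
}
```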
* NUL: An ASCII non-printing character with the byte value of 0
* NULL: A pointer that does not point to usable memory; in C it compares equal to ((void *) 0).
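A small sketch of the distinction in code, with helper names made up for the example: an empty string is a valid pointer to a NUL byte, which is not the same thing as a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* True when s is a valid pointer whose first byte is NUL ('\0'). */
static int is_empty_string(const char *s)
{
    return s != NULL && s[0] == '\0';
}

/* Counts the bytes before the NUL terminator. s must not be NULL:
   a NULL pointer has no bytes to inspect at all. */
static size_t bytes_before_nul(const char *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```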
I don’t think anyone in this thread is confusing the null character with the null pointer.
Yes it was an abbreviation in ASCII, as are all the non-printable first 32 codes.
It's just a shame that such a confusing name was chosen for such a niche use case (fixed width records that require null padding).
I was curious: Why have it, instead of just using memcpy_and_pad?
AI's answer (paraphrased) was:
* Avoids possible bugs from manually writing sizeof(dest)
* Enforces the __nonstring attribute
* Signals "I am converting an actual C-string into a fixed-width legacy memory field", as opposed to copying binary data and padding it.
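To illustrate the "C string into a fixed-width padded field" operation being discussed, here is a user-space sketch. It is not the kernel's code: `copy_to_padded_field` is a hypothetical name, and truncating over-long input is an assumption made for this example. The destination is deliberately not NUL-terminated when the source fills the whole field, which is the kind of buffer an attribute like __nonstring is meant to mark.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the C string src into a fixed-width field
   of dest_size bytes and fill the remainder with pad. Input longer
   than the field is truncated. The field is NOT guaranteed to end in
   NUL, so it must not be handed to functions expecting a C string. */
static void copy_to_padded_field(char *dest, size_t dest_size,
                                 const char *src, int pad)
{
    assert(dest != NULL && src != NULL);

    /* Bounded length: never read past the terminator, never count
       more than the field can hold. */
    size_t n = 0;
    while (n < dest_size && src[n] != '\0')
        n++;

    memcpy(dest, src, n);
    memset(dest + n, pad, dest_size - n);
}
```

With an 8-byte field, copying "abc" with a space pad leaves "abc" followed by five spaces, and copying a 10-character source leaves only its first 8 bytes with no terminator.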
Interesting to learn about the __nonstring attribute:
https://github.com/torvalds/linux/blob/1a3746ccbb0a97bed3c06...
https://github.com/search?q=repo%3Atorvalds%2Flinux+__nonstr...
Could you please elaborate on this? Both `man strncpy_s` and `man strcpy_s` didn't return any manual page on my Linux system.
For a moment, I misunderstood it as (g)libc removing strncpy and was worried about the trouble it's going to cause.
The race conditions appear to be a result of the Linux kernel implementation but UNIX style syscalls introduce these races by default. It is not an inherent flaw of the API or even the implementation Linux was using.
The only useable C string API has always been memcpy anyways.
I should have sent them a nice fruit basket to commemorate the occasion.