Stephen Bourne wanted to write his shell in ALGOL so badly that he relentlessly beat C with its own preprocessor until it began to resemble his preferred language.
https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...
https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/cmd/sh...
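For flavor, a few of the macros from mac.h in those sources (reproduced from memory, so treat the exact list as approximate):

#define IF      if(
#define THEN    ){
#define ELSE    } else {
#define ELIF    } else if(
#define FI      ;}
#define BEGIN   {
#define END     }
#define WHILE   while(
#define DO      ){
#define OD      ;}

With those in scope, the shell sources read as IF ... THEN ... ELSE ... FI rather than C's braces.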
#include "pretty.h"
void print_int(int value){
println(value);
}
int main (int argc, string argv[])
{
long value = 23849234723748234;
print_int(value);
}
How is this strongly typed?

$ cc test.c -o test && ./test
-1411401334
And to be clear, weak vs strong isn't a boolean property but a spectrum, but it would be hard to argue with a straight face that C is a strongly typed language.

Java is weakly typed in its generics, despite being statically typed. I'm sure there are more examples.
It's weak in many areas, such as, oh, that you can implicitly convert an out-of-range floating point value to an integer type and get undefined behavior.
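For example (a minimal sketch), this compiles without complaint, and the C standard leaves the result of the conversion undefined:

#include <stdio.h>

int main(void) {
    double d = 1e30;    /* far outside the range of int */
    int i = d;          /* implicit conversion; undefined behavior */
    printf("%d\n", i);  /* prints whatever the platform happens to produce */
    return 0;
}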
Linkage in C is not type safe. An extern int x declaration in one translation unit can be matched with an extern double x = 0.0 definition in another. Linkers for C typically accept that without a diagnostic: the program links.
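A sketch of the mismatch, with made-up file names:

/* a.c */
extern int x;
int read_x(void) { return x; }  /* reinterprets the bytes of a double */

/* b.c */
double x = 0.0;  /* same symbol, different type */

$ cc a.c b.c   # typically no diagnostic; the program links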
I saw a blog post a long time ago that went into the details of how ./foo worked, and how it executed an ELF file. Could you register `.c` programs in the same way to be compiled and run?
[0] https://gist.github.com/jdarpinian/1952a58b823222627cc1a8b83...
(For even more insanity I guess you could also trigger on // and /*, although there’s some risk of false positives then!)
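There's also the polyglot trick from the gist linked above: make the first line of the file both a C comment and a shell command. A sketch along those lines (not necessarily the gist's exact incantation):

//bin/sh -c 'cc -o "${0%.c}" "$0" && exec "${0%.c}" "$@"' "$0" "$@"; exit

#include <stdio.h>
int main(void) {
    puts("hello from a C 'script'");
    return 0;
}

Then chmod +x hello.c && ./hello.c. No binfmt_misc registration needed, though it relies on the invoking shell falling back to sh for executable files without a shebang.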
If you want to insist that scripting languages can be either compiled or interpreted, then it's better to drop the label altogether and just say "language", because the "scripting" part has utterly lost its identity at that point.
There are good reasons for why scripts are often interpreted and why systems are often compiled, but that's not what defines them. There are definitely scripts that are compiled and systems that are interpreted out in the wild.
compiled languages are rarely used for one-offs because the effort they require is usually greater than the task calls for.
a big part of perl/python use is in tying together libraries written in more difficult lower level compiled languages.
you'll also see scripting used to refer to languages embedded in larger projects. lua scripts to control entities in a game, for instance. do they compile these somehow? I never did in the little project I used lua for.
----
all of that together, I expect that scripting as a concept largely boils down to conceptually simpler languages with less view of the ugly underbelly of how things actually work in a computer, used to chain together abstractions created by lower level code.
scripting is duct-tape. whether you duct-tape together a one-off task or some wad of long-running functionality is beside the point.
Yes, but this is conceptually exactly the same as the aforementioned shell scenario. This is not something different.
Just as I suspected, there is only one definition, and one that has proven to actually be well defined to boot, as you managed to reiterate, to perfection, the only definition I have ever known.
Haha love this!
I love this to the very core of my being.
I'd argue that strings and bytes are the same general type, but it's sometimes useful to give well-formed utf8 bytes a different type internally. Rust gets this mostly correct with OsString and String.
The thing I think Rust maybe goofed, or at least made a little complicated, is their weird distinction between a String and a str (and a &str). As a newbie learning the language, I have no idea which one to use, and usually just pick one, try to compile, then if it fails, pick the other one. I'm sure there was a great reason to have two types for the same thing, that I will understand when I know the language better.
If you want to understand more deeply, the Rust Programming Language, chapter 4, uses String and &String and &str to talk about ownership and borrowing. Here’s a link to the start of that chapter: https://doc.rust-lang.org/stable/book/ch04-00-understanding-...
Your blog post is practical and clearly explains what to do, when, which is helpful. What's confusing is why Rust has the two types and why the language designers decided it was a good idea to have to convert back and forth between them depending on whether it was going in a struct or being passed as an argument. I suppose the "why" is probably better found in the Rust docs.
As a long-time C++ user, it seems like std::string vs const char* all over again, and we somehow didn't find a better way.
It’s closer to std::string and std::string_view. But yes, in a language with value and reference semantics, when you also care about performance, you just can’t do any better: you need both types. Or at least, if you want the additional correctness guarantees and safety provided by communicating ownership semantics in the type. C gets away with just char * but then you have to read the docs to figure out what you’re allowed to do with it and what your responsibilities are.
A pointer to some memory is not the same thing as a struct that has a pointer to memory, as well as a capacity field and the ability to resize itself.
To give a real example, I once wrote some python scripts to parse serial messages coming off a bus. They'd read the messages, extract some values with regex, and move on.
Unfortunately the bus had some electrical bugs and would intermittently flip random bits with no CRC to correct them. From my point of view, no big deal. If it's in something outside the fields I care about, I won't notice it. If it's flipped something I do care about we have a bad sample to drop or noise the signal processing will deal with. Either way, it's fine. Python on the other hand cared very much. I rewrote everything in C once I got sufficiently annoyed of dealing with it and more importantly explaining to others how they couldn't "simplify" things using the stdlib APIs.
A String in Rust is roughly this C struct:

// NB: Must be utf-8!
struct string {
    size_t sz;
    size_t capacity;
    unsigned char *buffer;
};
&String in Rust is roughly like `const struct string *`.

str in Rust is just an array of (guaranteed utf-8) unsigned bytes. It does not have a capacity, so it can't be resized. You can't directly construct one (on the stack), because its size is undetermined and Rust doesn't have dynamic-sized stack allocation.
&str, and Box<str>, are pointers to str, along with a size, and are roughly like this C:

// NOTE: Must be utf-8!
struct str_ptr {
    size_t sz;
    unsigned char *buffer;
};
The difference between &str and Box<str> is that the latter is an owned pointer to a heap allocation which will be freed when it goes out of scope. &str is unowned and might point anywhere: to a Box<str> on the heap, to a String on the heap, or to read-only static memory.

IMO, it's probably easier to first try to understand the difference between `Vec<u8>`, `&[u8]`, and `&Vec<u8>`, because they are slightly less "weird" than the string types: they aren't syntactically special like `str` is[1], and they don't have an implicit requirement to be utf8 that is inexpressible in the type system.
[1]: `str` is syntactically special because it is basically a slice, but isn't written in slice notation.
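Continuing the C analogy (a sketch, not real Rust internals): Box<str> is like an owning str_ptr whose holder must free the buffer, while &str is like a borrowing str_ptr viewing bytes owned elsewhere.

#include <stdlib.h>
#include <string.h>

/* Like Box<str>: the caller owns the buffer and must free it. */
struct str_ptr box_from(const unsigned char *src, size_t n) {
    struct str_ptr p = { n, malloc(n) };  /* error handling omitted */
    memcpy(p.buffer, src, n);
    return p;
}

/* Like &str: a view into bytes owned by someone else; nothing to free. */
struct str_ptr borrow(struct str_ptr p, size_t off, size_t len) {
    struct str_ptr v = { len, p.buffer + off };
    return v;
}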
Rust could have done better in naming, but a definite design goal of the language (for better and worse) is to not make things that are complicated for the compiler appear simple to the user. Which unfortunately results in:
String/str
CString/CStr
OsString/OsStr
Vec<u8>/[u8]
AsRef<str>
Cow<'a, str>
Of course if you provide a separate set of functions for treating a string as human readable vs not you can also work with that. Basically len() vs byte_len().
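In C terms that split might look like this (hypothetical names, and it assumes the buffer already holds valid UTF-8):

#include <string.h>

size_t byte_len(const char *s) { return strlen(s); }

/* Counts codepoints by skipping UTF-8 continuation bytes (10xxxxxx). */
size_t len(const char *s) {
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}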
But you can’t concat two human readable strings without ensuring they are of the same encoding. You can’t search a string by bytes if your needle is of a different encoding. You can’t sort without taking encoding and locale preferences into account, etc.
Pretending like you don’t care about encoding doesn’t work as we have seen time and again.
At the language level C historically hasn't offered much support for working with specific character sets and their encodings. With C11 and C23 we get u"...", U"...", u8"...", the type char8_t, and similar, but there's still little/no built-in tooling for text processing.
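For example (the u"..." and U"..." forms are C11; char8_t arrives in C23):

#include <uchar.h>  /* char16_t, char32_t */

char16_t s16[] = u"héllo";   /* UTF-16 code units */
char32_t s32[] = U"héllo";   /* UTF-32 code units */
char     s8[]  = u8"héllo";  /* UTF-8 bytes; the literal's element type becomes char8_t in C23 */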
For text processing, work with char* whose bytes are some encoding of Unicode, e.g. UTF-8; you can then use a C library such as libunistring or ICU.
However the bytes of a char* could instead be an encoding of a non-Unicode character set, e.g. GB2312 encoded as EUC-CN.
So char* is character set and encoding agnostic. And C-the-language doesn't even try to offer you tools for working with different sets and encodings. Instead, you can use a library or write your own code for that purpose.
A number of languages make the same decision, keeping the string type set/encoding agnostic, with libraries taking up the slack.
In Nim, for example, the string type is essentially raw bytes (string literals in .nim sources are UTF-8). If you're doing Unicode text processing then you'd use facilities from the std/unicode module
https://nim-lang.org/docs/unicode.html
Same story with Zig
https://ziglang.org/documentation/0.8.0/std/#std;unicode
Lua too, and you'll probably use a 3rd party library such as luautf8 for working with Unicode/UTF-8
https://github.com/starwing/luautf8
Returning to the matter of pretty.c, since it's just sugar for C, it makes sense (to me) that the string type is just an alias for the set/encoding agnostic char*. It's up to the programmer to know and decide what the bytes represent and choose a library accordingly.
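Presumably something like this (assumed; I haven't checked the pretty.c source for the exact spelling):

typedef char *string;  /* set/encoding agnostic, as described above */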
Treating strings as distinct from bytes usually rests on assumptions like:

- String data will be properly encoded
- There is one encoding of strings (UTF-8 usually)
- Validation must occur when string data is created
- Truncating a logical codepoint is never acceptable
- You may not do string things to "invalid" bytes
- Proper encoding is the beginning and the end of validation
None of these things are consistently true. It's a useful practice to wrap validated byte sequences in a type which can only be created by validation, and once you're doing that, `Utf8String` and `EmailAddress` are basically the same thing; there's no reason to privilege the encoding in the type system.

If it's "human-readable text", then fine, a string is not the same thing as an arbitrary byte array.
But lots of languages don't enforce that definition.
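That validate-at-construction wrapping, sketched in C (is_valid_utf8 is an assumed helper, and the "only created by validation" guarantee is convention here unless the struct is made opaque):

#include <stdbool.h>
#include <stddef.h>

typedef struct { const unsigned char *bytes; size_t len; } utf8_str;

bool is_valid_utf8(const unsigned char *b, size_t n);  /* assumed helper */

bool utf8_str_new(const unsigned char *b, size_t n, utf8_str *out) {
    if (!is_valid_utf8(b, n))
        return false;  /* callers never see an invalid utf8_str */
    out->bytes = b;
    out->len = n;
    return true;
}

Swap utf8_str for an EmailAddress type with a different validator and nothing else changes, which is the point.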
Before he wrote the Bourne shell, the author wrote an ALGOL compiler, and ALGOL inspired the Bourne syntax.

I think in Europe C was not as common as other languages at the time, so the terseness looked odd.
Characters like []{}\|~ are behind multi-finger access and often not printed at all on the physical keys (at least in the past). You can see how this adds a hurdle to writing C…
Pascal was designed by a European, so he preferred keywords which could be typed on every international keyboard. C basically just used every symbol from 7-bit ASCII that happened to be on the keyboards in Bell Labs.
You get used to them, though you start feeling like a pianist after a short coding session. The most annoying one for me is the fancy JavaScript/TypeScript quote, which I have to use all too often: the backtick, AltGr+7.
Also, practically every time I need to write a comment, commit message or email I need my č, š and ž. It's kinda nice to have them only a single keypress away.
In addition, our layout overwrites only the numerics – all other symbols are the same as on a US layout.
setxkbmap us -option ctrl:swapcaps -option compose:rwin
Problem solved. US layout, and with the right Windows key you can compose European characters.

That assumes a lot. For starters, that they're on Linux, that they feel comfortable running complex CLI commands, that they can memorize the U.S. layout just like that, and that they can type without looking at the physical keys (because changing the virtual mapping means keys produce something other than what the label says).
In reality, the learner’s first exposure to C family languages is more likely to be a website where you can run some JavaScript in a text box. And the first hurdle is to figure out how to even type {}. American developers just completely forget about that.
It looks like it was deliberately designed for press/office usage and not for proper programming.
The AltGr brackets are fine. The truly annoying character to type is the backtick (which is quite a new addition to the pantheon of special characters; C doesn't use it).
My personal opinion is that Niklaus Wirth had the better overall ideas about clarity and inclusiveness in programming language design, but that battle is long lost. (What you consider the character set needed for "proper programming" is really a relatively new development, mid-1990s and later.)
My intuition is that Perl would be the most challenging on a keyboard where it's harder to type unusual punctuation, since it feels like a very punctuation-heavy language, but I don't know whether it actually uses more than C (I think the backtick has a shell-style meaning in Perl too).
Well, unless opting for something like Dvorak, you are indeed doomed to something that was specifically designed to please typewriter mechanical constraints without much care for the resulting ergonomics.
I use a Bépo layout personally, on a Typematrix 2030 most of the time, as French is my native language.
This is not far off from the guidelines in many cases, e.g. Windows code (well, not every variable, of course). A lot of Java design was copied from C++.
https://learn.microsoft.com/en-us/cpp/cpp/property-cpp?view=...
All in all: quite a solid attempt. I'll give you 8/10 for the design of this. The way you sketched this out in C using macros is really elegant; this actually looks like good code. Would I use it? It's a new language and I like C already. But it could help people learn C and think about language design, since the way you've done this is very clear.
"unless" seems more readable than "ifnt".
I've seen "loop" in other languages. But Qt calls it "forever", and that is indeed very pretty. Very Qt, even
#define ever ;;
for (ever) { }  /* expands to for (;;), an infinite loop */

#define never ;0;
for (never) { }  /* expands to for (;0;), a loop that never runs */
You can break a "forever" loop so I think "loop" is a better name.
repeat {}
repeat while <condition> {}
repeat {} while <condition>
repeat <count> {}
> The word "REPEAT" should not be used in place of "SAY AGAIN", especially in the vicinity of naval or other firing ranges, as "REPEAT" is an artillery proword defined in ACP 125 U.S. Supp-2(A) with the wholly different meaning of "request for the same volume of fire to be fired again with or without corrections or changes" (e.g., at the same coordinates as the previous round).
https://en.wikipedia.org/wiki/Procedure_word#Say_again
More seriously, Pascal has repeat-until loops, similar to do-while loops in C.
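The correspondence, for the record: Pascal's repeat runs the body first and exits when the condition becomes true; C's do-while runs the body first and continues while the condition stays true. Same loop, inverted test:

int i = 0;
do {
    i++;
} while (i < 10);  /* Pascal: repeat i := i + 1 until i >= 10; */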
Oh shit wait, you're John Tromp, BLC creator! I'm a fan!