Posted by ingve 3 days ago
- https://mcyoung.xyz/2021/06/01/linker-script/
Sections and segments are more or less the same concept: metadata that tells the loader how to map each part of the file into the correct memory regions with the correct memory protection attributes. The biggest difference is that segments don't have names. They also aren't neatly organized into logical blocks like sections are; they're just big file extents. The segments table is essentially a table of arguments for the mmap system call.
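To make that concrete, here is a rough sketch of how one PT_LOAD program header maps onto an mmap call. This is my own illustration, not actual loader code; page rounding and BSS zeroing are deliberately omitted:

    #include <elf.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <sys/types.h>

    /* One PT_LOAD entry is essentially a pre-computed argument list for mmap. */
    static void *map_load_segment(int fd, const Elf64_Phdr *ph, uintptr_t base)
    {
        int prot = ((ph->p_flags & PF_R) ? PROT_READ  : 0)
                 | ((ph->p_flags & PF_W) ? PROT_WRITE : 0)
                 | ((ph->p_flags & PF_X) ? PROT_EXEC  : 0);
        return mmap((void *)(base + ph->p_vaddr), ph->p_memsz,
                    prot, MAP_PRIVATE | MAP_FIXED, fd, (off_t)ph->p_offset);
    }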
Learning this stuff from scratch was pretty tough. Linker script has commands to manipulate the program header table but I couldn't figure those out. In the end I asked developers to add command line options instead and the maintainer of mold actually obliged.
Looks like very few people know about stuff like this. One can use it to do some heavy wizardry though. I leveraged this machinery into a cool mechanism for embedding arbitrary data into ELF files. The kernel just memory maps the data in before the program has even begun execution. Typical solutions involve the program finding its own executable on the file system, reading it into memory and then finding some embedded data section. I made the kernel do almost all of that automatically.
https://www.matheusmoreira.com/articles/self-contained-lone-...
Generally there are many sections combined into a single segment, other than special-purpose ones. Unless you are reimplementing ld.so, you almost certainly don't want to touch segments; sections are far easier to work with.
Also, normally you just call `getauxval`, but if needed the type is already named `ElfW(auxv_t)*`.
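A minimal sketch of the libc route (own_phdrs is just a name I made up):

    #include <link.h>      /* ElfW() and the Elf types */
    #include <stddef.h>
    #include <sys/auxv.h>  /* getauxval, AT_PHDR, AT_PHNUM */

    /* The kernel passes the program header table's location in the aux vector. */
    static ElfW(Phdr) *own_phdrs(size_t *count)
    {
        *count = (size_t)getauxval(AT_PHNUM);
        return (ElfW(Phdr) *)getauxval(AT_PHDR);
    }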
They are both metadata about file extents and their memory images.
> sections are far easier to work with
Yes. They are not, however, loaded into memory by default. Linkers do not generate LOAD segments for section metadata since they are not needed for execution. Thus it's impossible for a program to introspect its own sections without additional logic and I/O to read them into memory.
> Also, normally you just call `getauxval`, but if needed the type is already named `ElfW(auxv_t)*`.
True. I didn't use it because it was not available. I wrote my article in the context of a freestanding nolibc program.
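In that situation you have to find the auxiliary vector yourself; it sits right after the environment block on the initial process stack. A minimal sketch, assuming your entry point hands you the raw envp:

    #include <elf.h>

    /* The aux vector starts immediately after envp's terminating NULL. */
    static Elf64_auxv_t *find_auxv(char **envp)
    {
        while (*envp)
            envp++;
        return (Elf64_auxv_t *)(envp + 1);
    }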
It's trivial to put arbitrary files into sections:
    objcopy --add-section program.files.1=file.1.dat \
            --add-section program.files.2=file.2.dat \
            program program+files
The problem is the program.files.* sections do not get mapped in by a LOAD segment. I ended up having to write my own tool to patch a LOAD segment into the segments table because objcopy cannot do it. I even asked a Stack Overflow question about this two years ago:
https://stackoverflow.com/q/77468641
The only answer I got told me to simply read the sections into memory via /proc/self/exe or edit the segments table and make it so that the LOAD segments cover the whole file. I eventually figured out ways to add LOAD segments to the table. By that point I didn't need sections anymore, just a custom segment type.
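With the custom segment type in the table, locating the kernel-mapped data at run time is just a walk over the program headers. A sketch of the idea (PT_MY_DATA is a made-up value, and the PIE load bias handling is elided):

    #include <link.h>
    #include <stddef.h>
    #include <sys/auxv.h>

    #define PT_MY_DATA (PT_LOOS + 0x1)  /* hypothetical OS-specific segment type */

    /* Find the custom segment; the kernel already mapped its contents for us. */
    static void *find_embedded_data(size_t *size)
    {
        ElfW(Phdr) *ph = (ElfW(Phdr) *)getauxval(AT_PHDR);
        size_t phnum = (size_t)getauxval(AT_PHNUM);
        for (size_t i = 0; i < phnum; i++) {
            if (ph[i].p_type == PT_MY_DATA) {
                *size = ph[i].p_memsz;
                return (void *)ph[i].p_vaddr;  /* add the load bias for PIEs */
            }
        }
        return NULL;
    }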
Use `ld --verbose` to see what sections are mapped by default (a linker cannot work without such a default linker script; we're just lucky that GNU ld exposes it in a sane form rather than hard-coding it as C code). In modern versions of the linker (old documentation still turns up in search engines), you can specify multiple SECTIONS commands (likely from multiple scripts, i.e. just files passed on the command line), but why would you when you can conform to the default one?
You should pick a section name that won't collide with the section names generated by `-fdata-sections` (or `-ffunction-sections` if that's ever relevant for you).
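For example, -fdata-sections gives every object its own compiler-generated section name, so a project-specific prefix (made up below) stays clear of that namespace:

    /* With -fdata-sections the compiler places this in its own
       .rodata.some_constant section. */
    const char some_constant[] = "hello";

    /* Explicitly named section; a distinct prefix avoids colliding with
       the compiler-generated .data.* and .rodata.* names. */
    __attribute__((section("program.files.0")))
    const char embedded_blob[] = "embedded payload";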
Every time I deal with target triples I get confused and have to refresh my memory. This article makes me feel better in knowing that target triples are an unmitigated clusterfuck of cruft and bad design.
> Go does the correct thing and distributes a cross compiler.
Yes but also no. AFAIK Zig is the only toolchain to provide native cross compiling out of the box without bullshit.
Missing from this discussion is the ability to specify and target different versions of glibc, something that I think only Zig even attempts to do, because Linux’s philosophy of building against whatever the local system provides is an incomprehensibly bad choice. So all these target triples are woefully underspecified.
I like that at least Rust defines its own clear list of target triples that are more rational than LLVM’s. At this point I feel like the whole concept of a target triples needs to be thrown away. Everything about it is bad.
Ideally each component in the target "triple" would be a separate argument.
* https://en.wikipedia.org/wiki/Endianness#Hardware
Is there anything that is used a lot that is not little? IBM's stuff?
Network byte order is BE.
All of my custom network serialization formats use LE because there’s literally no reason to use BE for network byte order. It’s pure legacy cruft.
I’m more than happy to static_assert little endian. If any platform needs BE support then I’ll add support to the minimum amount of libraries necessary to do so. Super easy.
Here’s the thing: if you wrote BE-compatible code today, you probably don’t even have a way to test it. So you’re adding a bunch of complexity and doing a bunch of work that you can’t even verify is correct! Complete and total waste of time.
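A minimal sketch of that check plus an encoder, assuming GCC/Clang's __BYTE_ORDER__ macros; the byte-wise store compiles down to a plain write on LE targets:

    #include <stdint.h>

    /* Refuse to build on big-endian targets until someone actually needs them. */
    _Static_assert(__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__,
                   "little-endian platforms only");

    /* Portable little-endian encode; a no-op reshuffle on LE machines. */
    static inline void put_u32_le(uint8_t *out, uint32_t v)
    {
        out[0] = (uint8_t)(v);
        out[1] = (uint8_t)(v >> 8);
        out[2] = (uint8_t)(v >> 16);
        out[3] = (uint8_t)(v >> 24);
    }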
In practice, any real JVM implementation will simply use native byte order as much as possible. While bytecode and other data in class files is serialized in big-endian order, it will be converted to native order whenever it's actually used. If you do pull out the unsafe APIs, you can see that e.g. values are little-endian on x86(-64). The JVM would suffer major performance issues if it tried to impose a byte order different from the underlying platform's.
The comments in `config.guess` and `config.sub`, which are the origin of triples, use a large variety of terms, at least the following:
configuration name
configuration type
[machine] specification
system name
triplet
tuple
This is technically incorrect. The 286 had protected mode. It was a 16-bit protected mode, being a 16-bit processor. It was also incompatible with the later protected mode of the 386 through today’s processors. It did, however, exist.
That's a wild take. I think it's pretty universally accepted that GCC and the GNU toolchain are what made this ubiquitous.
Also, the x32 ABI is still around and still supported; I don't know where the author got that notion.
It also doesn't really tell me anything about the content, except where I'm going to see tables or code blocks, so I'm not sure what the benefit is.
Given the really janky scrolling, I'd like to have a way to hide it.
Why isn't it called wasm32-none-none?