Posted by todsacerdoti 5 days ago
- the same object can move between containers with no allocation and no need for a dedicated complex API
- the same object can be part of multiple containers at once; particularly useful for intrusive binary trees, for indexing data with different criteria.
- the container can be fully polymorphic, no need for all elements to be the same dynamic type.
- no need for complex allocators, you can just store the objects as you see fit.
This is true also of the previous design.
> no need for complex allocators, you can just store the objects as you see fit.
Same here; the previous design leaves it up to the user.
I'm not sure I understand this one. Since the object contains the reference to where it belongs inside a container (e.g. object.node.next) how can it be re-used in multiple containers. Conversely, in a non-intrusive data structure, multiple containers can hold a ref to the same object through an intermediate node object
In the non-intrusive case, all you know is how you got there, you don't have return info to the other structures— particularly annoying if you're trying to destroy an object and have to speculatively look for references to it in the other lists in order to get rid of them.
struct Node<T> { Node<T> *next; T x; }
```C++ struct MyType : Node<MyType> { Node<MyType> next, prev; // rest of data }; ```
I usually reach for Boost.Intrusive when I need this [0].
[0] https://www.boost.org/doc/libs/1_88_0/doc/html/intrusive/usa...
The layouts look the same?
In other words, intrusive is like this:
struct T : NodeType<T> { // T data NodeType<T> next, prev; };
whereas non-intrusive data structures the T's are not the nodes themselves
struct NodeType<T> { NodeType<T> next, prev; T t; };
Doing non-intrusive means you need to own the lifecycle of the nodes (and make them copyable, move-constructible, or some combo thereof). Intrusive means that the caller can manage the nodes' lifecycles and just say "here's a node, I promise it'll live while the list is still alive" and have that node be on the stack, heap, static, etc.
Having said that, intrusive linked lists are quite an important data structure in code which is required to be allocation-free. Kernel code, signal handlers and code which requires consistent performance (eg. audio processing) need to be able to move objects between containers or have objects in multiple containers without allocating.
But yes there is a place for these data structures. It just shouldn't be common.
I don't think you mean this, but elaborating on why a non-intrusive linked list is not very useful, that's because it takes linear time to find an object in a non-intrusive linked list, but constant time in an intrusive linked list.
For example:
struct Node {
Node* prev_by_last_update;
Node* next_by_last_update;
Node* prev_with_owner;
Node* next_with_owner;
Payload data;
}
This intrusive structure allows reassigning and updating nodes, whilst being able to efficiently find eg. the next most recently updated node, or to iterate over nodes with the same owner.Extremely few can use one safely.
They're an elegant weapon for a more civilized age - namely one without multithreading, aliasing-based optimization, and a host of other nice things.
They're also pretty useful for free-list structures which overwrite entries to be removed (they're scheduled for deletion thus assumptiong here is that the object is now dead) with linked list nodes that would fit in that space thus never needing to allocate any additional as you can reuse existing space.
Anecdotally, I’ve seen wait-free structures used incorrectly many more times than I’ve seen them used correctly. Some people think they are magically faster because “acquiring a mutex lock is expensive”, but that’s far from always the case.
More the opposite, their simplicity makes them safer to deal with than the alternatives. And sometimes their unique features (mostly the intrusive variant) are exactly what you need.
Multi-threaded accesses to a vector would be just as bad?
Trying to write a safer version of the above without losing the performance / advantages takes you into the realm of ATS2 https://www.cs.bu.edu/~hwxi/atslangweb/ which quickly becomes more difficult to work with than practical (and no, rust can't solve this with it's implementation of resource types as far as I'm aware).
So yes, doing the above makes sense when the problem demands it as allocating memory you don't need when there's a clear way to represent a graph without it just wastes cycles on memory management you didn't even need to do.
It depends on whether you approach programming from a perspective of writing piles of obviously correct abstraction layers (the total effect of which is to leak so much the solution ends up being wrong anyway), or from the direct unabstracted perspective of actually making the computer do what you want it to do (which does not preclude the solution from using abstractions).
I call them the Uncle Bob vs Casey Muratori school, after Casey's lectures such as "Clean Code, Horrible Performance": https://www.youtube.com/watch?v=tD5NrevFtbU
I'm not for forcing inefficient safe data structures. You are pretending that this is a dichotomy when the vast majority of the time it isn't. You are completely misinterpreting what I said. I'm for safe data structures. And the data structures described are huge footguns. I made no point about efficiency.
> He also hates static typing
I mean, in OOP there are many reasons to dislike it. It's harder to extend and test. Calls to such functions within other methods are a nuisance to remove. E.g. `Instant.now()`.
In a cloud DB webservice what will save you is having a good tracing/debugging story. From it, it will be easy to see E2E performance. Sure, some things will translate, but going SIMD optimized layout on a class, that will have to talk to DB across the ocean is a fool's errands.
To optimize you must have a good target and make sure you are measuring correctly.
Until a colleague or future you, or someone else calling the code, forgets.
Which leads to "I WANT TO KILL WHOEVER WROTE THIS" debugging sessions.
Encapsulation exists for a reason.
var x: *T = @fieldParentPtr("node", node_ptr);
@fieldParentPtr knows that its result must have type pointer T; and then the builtin function "looks" inside of T and notices there is a "node" field. Then it can backtrack the pointer of T given the pointer to the node field, since the layout of T is well-defined at compile time.So you could have a mixed-type linked list, and at the callsite zig can at compile-time back out the parent type. There obviously needs to be outside logic (and an outside mechanism to track) this with branches for each type that you could be putting into the linked list. And yes, this is more than a little bit type-unsafe because of the potential for type erasure.
You could probably pretty trivially write a typesafe wrapper for this, as suggested in one of the comments here.
Is this something like "list starts with type A, and that will tell you if the next node is type A or B"?
Is there a way to stuff those bits in with the node somehow so you always know the type you expect back?
Yup. You'd have to set that up yourself. It could be anywhere. It could be in a separate array, it could be in the previous node-data, etc.
you wouldn't put it in the Node (capital N) datastructure though, that is privileged.
Personally, I would say doing something like this is "bad idea bears" unless you are absolutely certain you need it and you should explore using, say, data payload type with a union field instead, if possible (though know that that comes with a potential memory penalty)
I don’t get this. In the generic version the data is embedded with the linked list node, so there’s only one allocation per node. ‘T’ isn’t a pointer (or doesn’t have to be) there’s no indirection between data and node.
I think the reasoning behind this change is that (from Zig's perspective), if you are using linked lists you are probably doing something wrong, unless your application requires the above-mentioned kind of juggling, which favors explicitly intrusive LLs. In addition, this change de-generifys the lists's methods, like `prepend`, which reduces code size a little.
At least that's my understanding.
You could also do this with the entire node in the case of a generic implementation though, the API just needs to expose the node struct to you and allow you to detach it; but the same is true for this new implementation as well.
In terms of memory, a detached node that embeds the payload struct isn't different from an object with an embedded detached node.
What changes is that now, if you have an object class that you don't want to (or can't) extend to include a list node, you have to wrap it in a container struct that, again, looks the same in memory but now has a node and your object as its members. I'm not sure if this is really much of an improvement at the end of the day.
Also, correct me if I'm wrong (I don't do zig), but shouldn't it be possible to create a list of void elements from the generic implementation and embed its node type inside your object, then proceed from there as if it were the new implementation?
This, by the way, is a typical design consideration in Zig -- using "friction" to steer programmers away from doing the Wrong Thing (according to Andrew). In addition, Zig is really more of a fancy macro-assembler for outputting optimal code, and less a pragmatic general-purpose language, and makes design decisions accordingly. Taking both together, the linked-list change sort of makes sense, even though personally, I would have just added a separate intrusive list data structure.
An object can also be part of multiple lists, each used by a different subsystem with its own concerns.
Deleting an item involves unlinking it from all lists (though that requires a doubly-linked list)
Edit: found the article
https://www.codeofhonor.com/blog/tough-times-on-the-road-to-...
The reason is that the object being placed "in" the list can have a longer lifetime than the list node, in fact that's generally the case. Like, I work on Zephyr and in Zephyr we have "threads", whose tracking structs are managed by the application code. But they can be in kernel-maintained lists like a scheduler queue, which is an ephemeral state that the essentially-perpetual thread will long outlive. There's no place to allocate a "generic" node at list insert time.
(Also, more importantly, Zephyr is an RTOS and disallows heap use in the core parts of the system to permit use in environments that need to disallow dynamic allocation. But that's not really relevant to a generics discussion.)
But you can trivially put a scheduler queue node right into the thread object, knowing that it's a behavior of the type. The semantics become different though: you only get the one node, it's not possible to have multiple "lists of threads" using this technique unless you know exactly what each list is for ahead of time.
I believe the implication is there's fewer than one allocation per node with the new API. You allocate contiguous memory once, and use it to store n elements.
> The new version isn't generic. Rather, you embed the linked list node with your data. This is known as an intrusive linked list and tends to perform better and require fewer allocations. Except in trivial examples, the data that we store in a linked list is typically stored on the heap. Because an intrusive linked list has the linked list node embedded in the data, it doesn't need its own allocation.
Is simply a misunderstanding. The storage layout will be the same for the generic and the intrusive one.
The benefit of intrusive linked lists is that each node can be a member of several linked lists with a single allocation per node. This is used heavily e.g. in the Linux kernel.
The cost is that you need to keep track of the offset from object address to which linked list pointer you're currently dealing with. That's often a compile time constant, but makes the API more awkward. In this case it seems to be the string "node", which seems nice enough. C libraries often use the preprocessor to do something similar.
new Node<TStruct>[16];
new TStructContainingNode[16];
What’s the difference?What are the benefits of this approach? Is it limited to data alignment, or is it out of a greater desire to remove generics?
Upon connecting the object is connected to the channel X linked list as well as some other list of objects that are killed at the timestamp 5 minutes in the future.
With an intrusive linked list the link-node resides within the object, the only thing needed when installing is linking it into the 2 lists (this is a few move-operations), an external linked list would _require 2 extra allocations_ for the linked-list nodes.
Broadcasting from X is almost the same since it's a simple traversal, albeit with double the cache pressure since the object and the linked-list node probably lives separately.
The real HORROR comes when disconnecting, if channel X has millions of listeners it could become a real bottleneck to traverse the linked list to find the link-node that connects the actual object since there could be tons of jumping around memory. An intrusive linked list would just disconnect itself if it's doubly-linked.
This is why hashsets/tables,vectors/arraylists,etc are often used in practice (because many "modern" OO languages never added good support for the machinery needed here) making linked lists look quite bad (there is other badness but using non-intrusive linked lists is almost always worse than using something else than a linked list altogether).
The removal scenario is merely specifying that you are passing in ConnectionListNode instead of a Connection. Although maybe it's a good idea to think about how they compose comparatively.
On AmigaOS (which is entirely built out of intrusive doubly linked list) the list node is placed at the start of a struct, so the pointer to the node is also the pointer to the linked item. There's also no 'coupling' because the list manipulation functions take pointers to the list node structs, but they don't need to know the 'outer' item struct.
Zig's @fieldParentPtr is more flexible since the node struct can be anywhere in the parent struct and you still can get from the node pointer to the item base pointer (and more importantly, it makes it trivial to link the same item into multiple linked lists).
I don't know if this was the motiviation for the design change though, but IMHO intrusive linked list are the 'proper' linked list design, and C++-style "nodes with payload" are a weird outlier.
Another reason might be that generic code leads to a lot of code duplication in the binary (which may or may not be optimized away by the compiler though - but at the cost of increased build times though).
Perhaps I'm missing something, but isn't it the other way around? In C, this is what I'd expect for the intrusive and extrusive:
// Intrusive
struct user_data_t {
struct user_data_t *next;
struct user_data_t *prev;
// The object's fields
};
// Extrusive
struct node_t {
struct node_t *next;
struct node_t *prev;
struct user_data_t *payload;
};
The intrusive one obviously can't be shared between lists. typedef struct { node_t* next } node_t;
typedef struct { ... } payload_t;
typedef struct {
node_t node;
payload_t payload;
} user_data_t;
...and if you want user_data_t to be included into multiple lists: typedef struct {
node_t list1_node;
node_t list2_node;
node_t list3_node;
payload_t payload;
} user_data_t;
...of course in C now it gets tricky to get to the start of user_data_t given a pointer to `list3_node`, but that's where @fieldParentPtr comes in.The advantage versus your extrusive example is that the payload doesn't need to be referenced through a pointer, which drastically simplifies lifetime tracking / memory management.
typedef struct { node_t* next } node_t;
typedef struct { ... } payload_t;
typedef struct {
node_t node;
payload_t payload;
} user_data_t;
That's exactly the same as my intrusive structure, no?> typedef struct { node_t list1_node; node_t list2_node; node_t list3_node; payload_t payload; } user_data_t;
In C, at any rate, that doesn't give you "inclusion into multiple lists". It gives you "inclusion into at most 3 lists. The extrusive example I posted gives "inclusion into multiple lists".
So, yeah, I'm still not seeing your point.
It's not, for the reason they put in parentheses:
> note: no pointers in user_data_t
An allocation of user_data_t also allocates space for payload_t, rather than just allocating space for a pointer to payload_t. Your structure requires an additional allocation to point the payload_t* at something. The fact that they hid the next node_t* in a struct doesn't matter though.
>> note: no pointers in user_data_t
I feel like I am taking crazy pills: where, in my intrusive example, are pointers in the `user_data_t`?
My intrusive code sample had no pointers for the user_data_t. It's exactly the same as GP's intrusive example.
EDIT: Previously: https://news.ycombinator.com/item?id=26988839
https://github.com/codr7/hacktical-c/tree/main/list
One pretty big advantage is you get some help from the compiler making sure you mean what you say. As opposed to just blindly casting items and assuming no one forgot to put the list in their struct. Also the possibility to be part of multiple lists.
The benefits of intrusive linked lists are higher performance; you can use comptime to still have decoupled code.
- linked lists aren’t useful on modern systems because traversing them causes to many cache misses. Therefore, we shouldn’t provide such a simple implementation.
- but we need one in low level OS code, and there, intrusive lists are preferred. Their limitation that you cannot store an object in multiple lists isn’t a problem, and the extra speed and the fact that you can move items between lists without allocations is desired.
I don’t see why the intrusive implementation has to be so basic, though. Can’t you, in Zig, express that a generic type T used in a generic list class has to have a nullable next field that points to a T?
Only if you "default to using" (and only use) malloc. Zig encourages you to use different allocators within the same program for different use cases, including, potentially allocators which will thrash the cache far less (for example thread local arenas).
Another advantage is smaller code size, as the compiler doesn't need to generate code for SinglyLinkedList(i32), SinglyLinkedList(User), and SinglyLinkedList(PointCloud). This could have a performance impact by making it more likely that code remains in the cache.
Usually if I have multiple lists holding something I have one that's the 'owner' and then the secondary data structures would have a non-owning pointer to it. Is that the case where the performance would be better with an intrusive list? My intuition would be that having multiple Node members would pollute the cache and not actually be a performance win but maybe it is still better off because it's all colocated? Seems like the kind of thing I'd have to benchmark to know since it would depend on the number of lists and the size of the actual data.
Since you shouldn't reach for a linked list as a default data structure modern hardware anyway, I actually do see how this change makes sense for Zig. Neat!
I can only give a handwavey answer because I've yet to see any data, and if an engineer tells you something is better but doesn't provide any data, they're not a very good engineer. So grain of salt and all that. But the answer I got was because cache performance. Writing code this way your CPU will spend less time waiting for main memory, and the branch predictor will have better luck. The argument makes sense, but like I said,I've yet to see real world data.
> isn't the resulting type exactly the same in memory unless you intentionally made T a pointer type?
Yes and no. If I understand what you mean, the bit layout will be the 'same'. But I think your confusion is more about how what a compiler means by pointer type, and what a human means. If you pull away enough layers of abstraction, the compiler doesn't see *Type it'll only see *anyopaque, phrased completely differently; according to the compiler, all pointers are the same and are exactly memory_address_size() big. *Type doesn't really exist.
Writing it this way, imagine using just the LinkedList type, without a container of any kind. node to node to node, without any data. While it would be pointless, that would (might) be faster to walk that list, right? There's no extra data loads for the whole struct? That's what this is. Using it this way it gets complicated, but translating theory to asm is always messy. Even more messy when you try to account for speculative execution.
Check out the implementation of SinglyLinkedList in the latest release (before the change in the post)[0]. You'll notice the argument for SinglyLinkedList is (comptime T: type), which is used in the internal Node struct for the data field, which is of type T. Notably, the data field is not a *T.
In Zig, when you call the function SinglyLinkedList with the argument i32 (like SinglyLinkedList(i32)) to return a type for a list of integers, the i32 is used in the place of T, and a Node struct that is unique for i32 is defined and used internally in the returned type. Similarly, if you had a struct like Person with fields like name and age, and you created a list of Persons like SinglyLinkedList(Person), a new Node type would be internally defined for the new struct type returned for Person lists. This internal Node struct would instead use Person in place of T. The memory for the Node type used internally in SinglyLinkedList(Person) actually embeds the memory for the Person type, rather than just containing a pointer to a Person.
These types are very much known to the compiler, as the layout of a Node for a SinglyLinkedList(i32) is not the same as the layout of a Node for a SinglyLinkedList(Person), because the argument T is not used as a pointer. Unless, as the gp mentioned, T is explicitly made to be a pointer (like SinglyLinkedList(*Person)).
[0] https://ziglang.org/documentation/0.14.0/std/#src/std/linked...
That's not realistically dealt with by the compiler re-organizing the struct layout.
pub const SinglyLinkedList(comptime T: type) type {
return struct {
first: ?*Node = null,
pub const Node = struct {
next: ?*Node = null,
};
const Self = @This();
pub fn data(self: Self) *T {
return @fieldParentPtr("node", self);
}
};
const some_data = struct {
// Some data fields
// ...
bar_node: SinglyLinkedList.Node,
baz_node: SinglyLinkedList.Node,
};
I do get why you need an array-like list, a dictionary/hashmap, a stack and a queue. I got the impression that linked lists aren't used very often. Or maybe it's a different case with Zig?
SinglyLinkedList is used in the standard library in std.Thread.Pool, and std.heap.ArenaAllocator. DoublyLinkedList is used in the http client's connection pool, as well as the ObjectCache for the package manager's git client.
I did some cursory research on the software I've been using today and presented the results here: https://news.ycombinator.com/item?id=43684121
I think there is a misconception around linked lists mostly stemming from the fact that they're not great for cache locality. Someone presents the idea that linked lists under common circumstances are unintuitively slow and therefore should be considered carefully for performance sensitive applications. A game of Chinese whispers immediately forms around that idea and what comes out the other end is a new idea: that linked lists are bad, shouldn't be used and aren't used. Meanwhile, they continue to be used extensively in system software.
Back in days of yore, small memory and CPU meant that linked lists and arrays were really the only effective data structures. Allocating and computation were super expensive, so following a link or having an index is really the only thing that works.
On today's computers, CPU is so much faster than memory that it is almost always better to recompute a value than try to look it up. Locality is so important that linear actions trump asymptotic complexity until you get quite a lot more elements than you might expect. Memory is so cheap that it is worth exponentially overallocating in order to amortize complexity. etc.
At the end of the day with modern programming, you should almost always be reaching for something simpler than a linked list (array/vector) or you should be reaching for an abstraction that is more complex than a linked list (hash/dict, queue/deque, etc.).
With modern CPUs, if you are using a linked list for anything other than building a more complex data structure, you're probably making an incorrect choice on almost all fronts.
The reverse of this can be true. In the following paper, they found that William's Heap Construction (one by one, O(n log n) time) beat Floyd's Heap Construction (O(n) time) when the input is larger than the size of off-chip cache.
Which operation of a linked list has any better complexity. The only one I can think of is a frequent removal of random elements in the middle -- and opt not to use tombstones.
Their prime use is that adding new elements lack a terrible worst case, aside the fact it puts more pressure on the memory allocator - need to be concurrent friendly.
If there is a full control of the memory allocation (e.g. kernels) linked lists make sense, for virtually all other cases there are better data structures.
What if I only ever iterate over a list of some hundred entries in the rare case of an error? Should I be reaching for the most performant data structure or the one that most obviously and clearly maps to the operations the program needs to perform on it at some infinitesimal cost?
> Linked lists have higher constant const memory wise
Than a fixed vector with a known maximum number of elements, yes. With something like a dynamic array you are however usually better off using a linked list memory-wise, unless you want to reallocate every few insertions or your node data is smaller than the link.
> effectively removals of a random in the middle are the same as vector.
You're conflating concepts. Seeking is O(n). Removal is O(1). There are cases where seeking for the purpose of removal is either amortized (e.g. you're already iterating over all the nodes, executing an associated handler for each which may or may not determine that the node should be unlinked in-place) or completely unnecessary (some other data structure already tracks a reference to the node and some code associated with that determines if/when it is removed).
One might similarly argue that I am conflating compacting (O(n)) with removal (O(1)) in the vector case, but in the case of linked lists, the difference is just a natural consequence of design, while for a typical implementation of vector operations they are the same operation and require an additional data structure to be anything but that in order to track the holes.
Moving elements around in memory also comes with a host of other issues, for example if there are many references to the elements, requiring maintaining and indirectly accessing memory via an index array.
> aside the fact it puts more pressure on the memory allocator - need to be concurrent friendly.
This is again a conflation of concepts: that of using linked lists and that of making space for the nodes or preparing an allocator for concurrent application. The nodes could be on the stack for some applications, in a static pool for others.
So, you are completely correct. The issue is simply that you are at a different level of abstraction.