
Posted by jer-irl 5 hours ago

Show HN: Threadprocs – executables sharing one address space (0-copy pointers)(github.com)
This project launches multiple independent programs into a single shared virtual address space, while still behaving like separate processes (independent binaries, globals, and lifetimes). When threadprocs share their address space, pointers are valid across them with no code changes for well-behaved Linux binaries.

Unlike threads, each threadproc is a standalone and semi-isolated process. Unlike dlopen-based plugin systems, threadprocs run traditional executables with a `main()` function. Unlike POSIX processes, pointers remain valid across threadprocs because they share the same address space.

This means that idiomatic pointer-based data structures like `std::string` or `std::unordered_map` can be passed between threadprocs and accessed directly (with the usual data race considerations).

This gives a programming model somewhere between pthreads and multi-process shared memory IPC.

The implementation relies on directing ASLR and virtual address layout at load time and implementing a user-space analogue of `exec()`, as well as careful manipulation of threadproc file descriptors, signals, etc. It is implemented entirely in unprivileged user space code: <https://github.com/jer-irl/threadprocs/blob/main/docs/02-imp...>.

There is a simple demo demonstrating “cross-threadproc” memory dereferencing at <https://github.com/jer-irl/threadprocs/tree/main?tab=readme-...>, including a high-level diagram.

This is relevant to systems built from multiple processes communicating over shared memory (often ring buffers or flat tables). Those designs usually require serialization or copying and steer away from idiomatic C++ or Rust data structures, because pointer-based structures cannot be passed directly.

There are significant limitations and edge cases, and it’s not clear this is a practical model, but the project explores a way to relax traditional process memory boundaries while still structuring a system as independently launched components.

46 points | 33 comments
whalesalad 4 hours ago|
I feel like this could unlock some huge performance gains with Python. If you want to truly "defeat the GIL" you must use processes instead of threads. This could be a middle ground.
hun3 4 hours ago||
This is exactly what subinterpreters are for! Basically isolated copies of Python in the same process.

https://docs.python.org/3/library/concurrent.interpreters.ht...

If you want a higher-level interface, there is InterpreterPoolExecutor:

https://docs.python.org/3/library/concurrent.futures.html#co...

short_sells_poo 4 hours ago||
How would this really help python though? This doesn't solve the difficult problem, which is that python objects don't support parallel access by multiple threads/processes, no? Concurrent threads, yes, but only one thread can be operating on a python object at a time (I'm simplifying here for brevity).

There are already means of passing around bulk data with zero copy characteristics in python, but there's a lot of bureaucracy around it. A true solution must work with the GIL (or remove it altogether), no?

jer-irl 4 hours ago||
I'm not familiar with CPython GC internals, but I think there are mechanisms for Python objects to be safely handed to C/C++ libraries and used there in parallel? Perhaps a handoff scheme could build on those same mechanisms? Interesting idea!
philipwhiuk 4 hours ago|
This is basically 'De-ASLR' is it not?
jer-irl 3 hours ago|
Could you clarify what you mean by that? This does heavily rely on loaded code being position-independent, because memory goes into whatever regions `mmap()` (called without `MAP_FIXED`) returns.