Top
Best
New

Posted by robin_reala 16 hours ago

RIP pthread_cancel(eissing.org)
188 points | 85 commentspage 2
enva2712 3 hours ago|
i’ve run into this issue with pthread_cancel

it makes cleanup logic convoluted when you can’t easily pthread_cleanup_push, forcing you to either block on the join or signal cancellation and detach

hacker_homie 9 hours ago||
what's old is new again, I loved java in the early 2000's trying to remotely stop a thread

Thread.destroy() Thread.stop() Thread.suspend()

so much potential for corrupted state.

Aardwolf 14 hours ago||
Maybe this is naive, but could there just be some amount of worker threads that run forever, wait for and take jobs when needed, and message when the jobs are done? Don't need to be canceled, don't block
danappelxx 14 hours ago||
If the DNS resolution call blocks the thread, then you need N worker threads to perform N DNS calls. Threads aren’t free, so this is suboptimal. OTOH some thread pools e.g. libdispatch on Apple operating systems will spawn new threads on demand to prevent starvation, so this _can_ be viable. Though of course this can lead to thread explosion which may be even more problematic depending on the use case. In libcurl’s situation, spawning a million threads is probably even worse than a memory leak, which is worse than long timeouts.

In general, what you really want is for the API call to be nonblocking so you’re not forced to burn a thread.

variadix 11 hours ago|||
Yeah, I’m not sure I see the problem (other than that threads are more expensive than e.g. file descriptors, but this is a moot point without a better API). You define how many requests in flight you want to allow and that sets the cap on how many worker threads you spawn/use, you could also support an unbounded number in flight this way by lazily spawning worker threads per requests. Cancellation/kill interfaces for multithreading are pretty much always a footgun. Even for multiprocessing on modern machines, if you’re doing something non-trivial and decide to use SIGKILL to terminate a worker process, it’s easy to leave e.g. file system resources in a bad state.
ComputerGuru 14 hours ago||
This is, essentially, what the previous (largely pathetic) excuse for true asynchronous I/O on Linux did with the libc aio(7) interface to essentially fake support for truly asynchronous file IO. It wasn’t great.
Someone 14 hours ago||
> Then it needs to sort them if there is more than one address. And in order to do that it needs to read /etc/gai.conf

I don’t see why glibc would have to do that inside a call to getaddrinfo. can’t it do that once at library initialization? If it has to react to changes to that file while a process is running, couldn’t it have a separate thread for polling that file for changes, or use inotify for a separate thread to be called when it changes? Swapping in the new config atomically might be problematic, but I would think that is solvable.

Even ignoring the issue mentioned it seems wasteful to open, parse, and close that file repeatedly.

loeg 14 hours ago||
I think the libc people might argue this level of functionality is just outside the scope of libc. (Arguably, it is a mistake for DNS to be part of libc, given how complicated it is.)
ComputerGuru 14 hours ago||
To be sure, complexity isn’t the determinator for whether something is or isn’t in scope for libc though.
loeg 6 hours ago||
Historically libcs have been leery of imposing threading on otherwise singlethreaded applications; and for similar reasons, try to minimize startup costs.
cesarb 12 hours ago|||
> I don’t see why glibc would have to do that inside a call to getaddrinfo. can’t it do that once at library initialization?

If it were a library dedicated to DNS, sure, but glibc is used by nearly every process in the system, including many which will never call getaddrinfo.

NewJazz 12 hours ago||
You want libc to start a thread whenever it is loaded?
Someone 3 hours ago||
If it’s the only alternative to being broken: yes.

It could read and parse the file the first time a thread gets created, too.

A cheaper alternative is to check the modification time of the config file, and only reparse it in pthread_cancel when that changes. That doesn’t 10% fix the problem in theory, but would do it in practice.

jart 14 hours ago||
Why can't they help fix the C library in question? Cancelation is really tricky to implement for the C library author. It's one of those concepts that, like fork, has implications that pervade everything. Please give your C library maintainers a little leeway if they get cancelation wrong. Especially if it's just a memory leak.
RedShift1 14 hours ago|
I'm betting this code is so old and its behavior so ingrained everywhere else that nobody dares touching it.
jart 8 hours ago||
No it sounds like they just need to add a pthread_cleanup_push() call somewhere in the getaddrinfo() implementation.

C libraries are not black magic. Nor are they holy code. We needn't fear them. It's the simplest part of the software stack.

nly 14 hours ago||
Why is running the DNS resolution thread a problem? It should be dequeuing resolution requests and pushing responses and sleeping when there is nothing to do

When someone kills off the curl context surely you simply set a suicide flag on the thread and wake it up so it can be joined.

foota 14 hours ago||
The thread started sounds like it's single use, not a thread handling requests in a loop. Anyway, a single thread handling requests in a loop would serialize these DNS lookups which if they're hanging would be problematic.
loeg 14 hours ago||
Yes, but why? As GP notes, the thread doesn't have to be single-use.
rwmj 14 hours ago||
One problem may be that fork() kills background threads, so now any program that uses libcurl + fork has to have a new API to restart the DNS thread (or use posix_atfork which is a big PITA), and that might break existing programs using curl.
ComputerGuru 14 hours ago|||
It’s not too much of an exaggeration to say that everything about using fork() instead of vfork() plus exec() is essentially fundamentally broken in modern osdev without a whole stack of hacks to try and patch individual issues one-by-one.
EPWN3D 9 hours ago||
It's not an exaggeration in any sense. fork(2) basically cannot be done correctly in modern userspace stacks.
loeg 14 hours ago|||
A surmountable problem, sure.
rwmj 14 hours ago||
Sometimes. To give one counterexample, golang doesn't have a way to restart the threads it uses for I/O (apparently a decision the golang developers made), so if you're embedding golang code in another binary, it better not call fork. (The reason for this warning: https://gitlab.com/nbdkit/nbdkit/-/commit/2589e6da40939af9ae...)
loeg 6 hours ago||
This is a policy choice of Golang, but a C library like Curl (the topic of this thread) is not constrained by the policy choices of the Go developers. Curl could use MADV_WIPEONFORK or other primitives to restart its DNS thread automatically.
charcircuit 10 hours ago||
Why isn't DNS in a service on the operating system instead of libc? You'll want requests to be locally cache anyways. This also makes it easier to just abandon a RPC instead of stopping a thread you don't control.
cesarb 9 hours ago|
> Why isn't DNS in a service on the operating system instead of libc?

On modern Linux systems, it is: systemd-resolved (https://www.freedesktop.org/software/systemd/man/latest/syst...) is a system service which can be queried through RPC (using dbus or varlink), through the traditional glibc APIs (using a NSS plugin), or even by being queried on the loopback interface as if it were a normal recursive DNS server (for compatibility with software which bypasses glibc and does direct DNS queries).

kelnos 5 hours ago|||
Not sure how standard that is, though. E.g. on Debian systemd-resolved isn't even installed by default, let alone enabled and set up as the default resolver.
charcircuit 6 hours ago|||
Well from the blogpost it looks like calling it via glibc does extra work than just querying the service.
gary_0 14 hours ago||
[deleted]
okl 14 hours ago|
https://c-ares.org/
throwaway81523 14 hours ago|
There might be a way to getaddrinfo asynchronously with io_uring by now. Otherwise just call the synchronous version in another thread and let it time out so the thread exits normally, right? Why bother with pthread_cancel?
gary_0 13 hours ago||
The problem is that the standard library function is specified to be blocking (and it's in userspace, so io_uring is not relevant). It's quite possible to do a non-blocking DNS lookup but you have to use a separate non-standard library (like c-ares).
yxhuvud 14 hours ago|||
No. Getaddrinfo is libc, not the kernel. It is of course possible, but complicated, to implement dns resolution with io_uring, but making it behave the same as glibc is very much a nontrivial piece of work.
loeg 14 hours ago|||
io_uring is for calling kernel APIs; this is a userspace API.
1over137 13 hours ago||
io_uring is a linux-ism, curl is cross-platform.