-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rare class init / atomic property deadlock #284
Comments
I'm still trying to work through this, but initially I'm reaching for adding another set of spinlocks as a fairly straightforward solution? So something along the lines of dropping
And then update from one I'm not 100% on this, but it does appear as though the And that should at least ensure Update: added PR #285 as a better illustration of the idea. |
I think that is the right solution, yes. I, a bit confused about how you get here though. The retain call will create a dtable only if the class of the object has not been sent any messages before. I’m not sure what would trigger that. How are you assigning an object as an ivar of another object without ever sending it any messages? I guess it must be an unsafe property or it would have been sent a retain message on the way in. |
That is a great question... I didn't actually look into the application code (I was just brought that stack trace, and told it was only happening during initial startup, which is what lead me down this path). I wouldn't be surprised as I have definitely seen a number of direct ivar assignments of a properties backing storage in the codebase in question. I can definitely confirm if that is the case though. |
Well now I'm confused too. The initial assignment for the property I was told produced this specific stack trace was assigned via the property accessor, IE:
...? That would definitely mean it had to run through |
The +alloc there should have initialised the class, so we shouldn’t hit that path at all. Can you run with Asan or valgrind? You would see the same symptoms if the ivar contained a dangling pointer. |
Wait... did you say that this class is also directly assigning to the ivar? That's almost certainly UB: if you have an atomic property, then you must use it to get the atomicity. If you are assigning to an ivar directly and it is concurrently assigned to via an atomic property then this is not atomic and may introduce temporal safety bugs. |
I thought it was a possibility, in the codebase there is ~20 years worth of code, and some of it definitely has ... less than ideal 'features', lets say? However, I did double check the specific application code the stack trace above came from and it does seem to be uniformly using property accessors correctly (no unsafe direct accesses). |
Yeah, I can try to get another repro with valgrind. I might even try to create a more simple (more importantly, non-proprietary) example I can share if I can get something that'll somewhat reliably repro this issue (I think getting those pointer hashes to collide is the key, which makes this pretty random).
So the flow leading up to this deadlock is super simple, and nothing other than an initial assignment then method call happens on the properties. Basically flows like:
Im not too sure how we'd get to a point where there is a dangling pointer for that property, where that is all very straightforward, and even all happens on a single thread...? Unfortunately, I couldn't find another path through the code that would lead to taking out two spinlocks... But it feels like there must be something else that I'm still missing here... |
There’s one other way you could get there: if the isa pointer of the object in the ivar is corrupted (e.g. to point to a different, u initialised, class). |
Okay, so had some other fun to attend to the last couple of days, but the latest update is: its looking like the compiler might be reordering / 'optimizing' things on us? Still need to test out a theory, but at first glance when stepping through in Anyways, still chipping away at this one between some other tasks, so I'll update again once I get a couple theories confirmed/squashed. |
That shouldn't be possible, but if it is happening then you can try adding an atomic signal fence after the spinlock acquire. It may also be that LLVM is now better at optimising atomics and we need to make the spinlock code use proper C11 atomics with acquire / release semantics to ensure that the reordering doesn't happen. |
Well I'm no closer to understanding the why, but I do have a bit more data at least. I tried out reordering the assignment of
But my coworker still hit this same issue again (This time we got some extra symbols loaded into
This time we can definitively see that we're hitting the And those |
While investigating some intermittent deadlocks being observed in a processes that is part of a suite of software I help maintain, I think I ended up tracking down a potential deadlock.
(Partially due to how pointer hashes could collide for the for the 'spinlocks' used in a few places... but I'm not sure any hash algorithm would ensure this couldn't be hit, just help reduce the rate it happens at)
The process seems to only lock up during initial startup. If gets through initial startup, it seems like it continues chugging along without any hiccups. The following rabbit hole I'm about to describe does use this fact as a basis for making some assumptions on which code paths are taken.
Stack trace of the deadlocked thread (this is a multi threaded application, but none of the other threads are really interesting to look at):
So it looks like inside
objc_getProperty
we attempt toretain
, while inside aspin_lock(0xaaaafd5ab608 + propOffset)
At this point we've taken the
lock_spinlock(0xaaaafd5ab608 + propOffset)
At
#6
theobjc_msg_lookup_sender()
calls theobjc_msg_lookup_internal()
version, and where this deadlock only happens during initial startup, I'm assuming we're hitting the 'uninstalled dtable' path:Inside
objc_send_initialize()
, we essentially hit a@synchronized()
block:Essentially
LOCK_OBJECT_FOR_SCOPE()
is a macro that runsobjc_sync_enter()
/objc_sync_exit()
automatically for the current lexical scope, like a@synchronized
block would. So now we're at#4
, and we attempt to either loop up or create the lock for this property:Once
referenceListForObject()
is called withcreate == YES
, then withinreferenceListForObject()
, if thereference_list->lock
needs to be initialized, we take another spin lock, this time on[(0xaaaafd5ab608 + propOffset).class]->isa
essentially:In either code path where the lock needs to be initialized, we're essentially now taking a second
lock_for_pointer()
, and I'm thinking in some cases the pointer hashes collide, making uslock_spinlock()
on the same spinlock for both pointers... and in turn, deadlocking the main thread.Unfortunately, I haven't figured out a good solution for this one yet. But I at least wanted to get this issue posted, in case anyone else has seen weird behavior with atomic properties, or this sparks any ideas on how to fix the issue.
The text was updated successfully, but these errors were encountered: