Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

collector: fix potential forgetting unlock in Persister.Inject #386

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

BurtonQin
Copy link

Description
This patch fixes a potential forgetting unlock bug in Persister.Inject, which may lead to a deadlock.
In src/mongoshake/collector/persister.go,
Function Inject calls p.diskQueueMutex.Lock() but forgets p.diskQueueMutex.Unlock() on a specific branch.

p.diskQueueMutex.Lock()
if p.DiskQueue != nil { // double check
// should send to diskQueue
atomic.AddUint64(&p.diskWriteCount, 1)
if err := p.DiskQueue.Put(input); err != nil {
LOG.Crashf("persister inject replset[%v] put oplog to disk queue failed[%v]",
p.replset, err)
}
} else {
// should send to pending queue
p.PushToPendingQueue(input)
}

Inject is called by next in syncer.go, and next is called multiple times by poll.
So it is possible that p.disQueueMutex.Lock() is acquired twice, leading to a deadlock.
sync.persister.Inject(log)

for quorum.IsMaster() {
// limit the qps if enabled
if sync.qos.Limit > 0 {
sync.qos.FetchBucket()
}
// only get one
sync.next()
}

Fix
The fix is to add a defer p.diskQueueMutex.Unlock() right after p.diskQueueMutex.Lock(). This way Unlock is always called as long as Lock is called, even if there is a panic (LOG.Crashf).

Suggestions
One interesting thing is that there is a unit testing TestInject in src/mongoshake/collector/persister_test.go.
But it fails to detect this potential deadlock because of the limited configurations.
e.g.

conf.Options.FullSyncReaderOplogStoreDisk = false

So p.enableDiskPersist will always be false and the Lock branch will never be taken.
enableDiskPersist: conf.Options.SyncMode == utils.VarSyncModeAll &&
conf.Options.FullSyncReaderOplogStoreDisk,

I suggest changing the configurations to cover the Lock branch.

Though unlikely, if leaving out Unlock is the intention of the developers (e.g., for misconfiguration detection), I suggest adding error messages and doc to better explain this.

@CLAassistant
Copy link

CLAassistant commented Jul 12, 2020

CLA assistant check
All committers have signed the CLA.

@vinllen
Copy link
Collaborator

vinllen commented Jul 13, 2020

Thanks for your PR. This PR will be merged later when I want to open the disk persistent feature.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants