Go race and deadlock detection #50

Petelin opened this issue Nov 19, 2019 · 0 comments
Go race condition and deadlock detection

References

How the race detector is implemented
Chinese blog post (may contain errors)

Go race condition detection

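  1. The detector sets up a region of shadow memory in which each 8 bytes of application memory correspond to N (2, 4, or 8) 8-byte structures. N is presumably the number of access events kept per 8-byte word.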

Shadow Word is a 64-bit object that contains the following fields:

TID (Thread Id) 16 bits (configurable)
Scalar Clock 42 bits (configurable)
IsWrite 1 bit
Access Size (1, 2, 4 or 8) 2 bits
Address Offset (0..7) 3 bits

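Each read or write of memory is then recorded as one event, and the event is stored in the structure above.
The last 2 + 3 bits (Access Size and Address Offset) say which of the 8 bytes in the original word were accessed.
For example, [2:4] (offset 2, size 4) means the 3rd through 6th bytes (3, 4, 5, 6) were accessed.

I did not understand how application addresses are mapped to shadow memory; the document only says the mapping is direct.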
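To make the field list concrete, here is a minimal Go sketch that packs and unpacks a 64-bit shadow word. The field widths come from the list above; the bit positions, the `ShadowWord` type, and every function name are assumptions made for illustration and are not taken from the race detector's source.

```go
package main

import "fmt"

// Field widths from the list above. The order of the fields inside the
// 64-bit word is an assumption made for this sketch.
const (
	offsetBits = 3
	sizeBits   = 2
	writeBits  = 1
	clockBits  = 42
)

// ShadowWord is a hypothetical encoding of one recorded memory access.
type ShadowWord uint64

// Pack builds a shadow word. sizeLog is log2 of the access size
// (0, 1, 2, 3 for 1, 2, 4, 8 bytes); offset is the first byte touched (0..7).
func Pack(tid uint16, clock uint64, isWrite bool, sizeLog, offset uint8) ShadowWord {
	w := uint64(offset) & (1<<offsetBits - 1)
	w |= (uint64(sizeLog) & (1<<sizeBits - 1)) << offsetBits
	if isWrite {
		w |= 1 << (offsetBits + sizeBits)
	}
	w |= (clock & (1<<clockBits - 1)) << (offsetBits + sizeBits + writeBits)
	w |= uint64(tid) << (offsetBits + sizeBits + writeBits + clockBits)
	return ShadowWord(w)
}

// Offset returns the first byte touched inside the 8-byte word.
func (w ShadowWord) Offset() int { return int(w & (1<<offsetBits - 1)) }

// Size returns the access size in bytes (1, 2, 4 or 8).
func (w ShadowWord) Size() int {
	return 1 << uint((w>>offsetBits)&(1<<sizeBits-1))
}

// IsWrite reports whether the access was a write.
func (w ShadowWord) IsWrite() bool { return (w>>(offsetBits+sizeBits))&1 == 1 }

// Clock returns the 42-bit scalar clock.
func (w ShadowWord) Clock() uint64 {
	return uint64(w>>(offsetBits+sizeBits+writeBits)) & (1<<clockBits - 1)
}

// TID returns the 16-bit thread id.
func (w ShadowWord) TID() uint16 {
	return uint16(w >> (offsetBits + sizeBits + writeBits + clockBits))
}

// Overlaps reports whether two accesses touch at least one common byte.
func Overlaps(a, b ShadowWord) bool {
	return a.Offset() < b.Offset()+b.Size() && b.Offset() < a.Offset()+a.Size()
}

func main() {
	w := Pack(1, 100, true, 2, 2) // a 4-byte write starting at offset 2
	fmt.Println("first byte:", w.Offset(), "last byte:", w.Offset()+w.Size()-1)
}
```

Two recorded accesses are only candidates for a race when their byte ranges overlap and at least one is a write; the clock comparison that decides whether they are actually unordered is left out of this sketch.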

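Deadlock ("deadly embrace", E. W. Dijkstra): concepts

Deadlock here means threads blocking one another. A thread that is blocked in wait releases its own lock while it waits, so by this definition a single lock cannot produce a deadlock.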

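Mutual Exclusion

A resource can be held by only one process at a time; two or more processes cannot hold it simultaneously. A single-log bridge is such an exclusive resource: people from the two sides cannot cross at the same time.

Fix: build two bridges. In a program that means, for example, running two MySQL instances.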

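No Preemption

A resource can be released only voluntarily by the process holding it.

Until a process has finished using a resource, a requester cannot forcibly take it away; only the process that holds the resource can release it. In the bridge picture, neither side may remove the other: the person on the bridge has to finish crossing and free the bridge (release the resource voluntarily) before the other side can cross. In short, nobody else can release the resource, only its holder.

Fix: preempt. For example, restart the service, or forcibly kill the other party and then cross.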

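Hold and Wait (resource holding)

A process is currently holding at least one resource and requesting additional resources which are being held by other processes.

The process already holds at least one resource and asks for another one. Someone already holds the left half of the bridge and also wants the right half.

Fix: release the resources you hold before requesting new ones.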

Circular Wait

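There is a sequence of waiting processes {P1, P2, …, Pn} in which P1 waits for a resource held by P2, P2 waits for a resource held by P3, and so on, until Pn waits for a resource held by P1, which closes the cycle.

In the bridge problem, A waits for the part of the bridge held by B while B waits for the part held by A, so each waits on the other forever.

Fix: acquire locks in the same order everywhere. For example, only allow crossing the bridge from left to right.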
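The fixed-order rule can be sketched in Go as follows. The `Account` type, its `ID` field, and the `Transfer` and `RunTransfers` functions are hypothetical names chosen for this example, and the sketch assumes every account has a unique ID.

```go
package main

import (
	"fmt"
	"sync"
)

// Account is a hypothetical resource guarded by its own mutex.
// IDs are assumed to be unique.
type Account struct {
	ID      int
	mu      sync.Mutex
	Balance int
}

// Transfer always locks the account with the smaller ID first, so two
// opposite transfers can never each hold one lock and wait for the other.
func Transfer(from, to *Account, amount int) {
	if from == to {
		return // locking the same mutex twice would block forever
	}
	first, second := from, to
	if second.ID < first.ID {
		first, second = second, first
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()

	from.Balance -= amount
	to.Balance += amount
}

// RunTransfers moves money in both directions concurrently and returns
// the final balances.
func RunTransfers(n int) (int, int) {
	a := &Account{ID: 1, Balance: 1000}
	b := &Account{ID: 2, Balance: 1000}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); Transfer(a, b, 1) }()
		go func() { defer wg.Done(); Transfer(b, a, 1) }()
	}
	wg.Wait()
	return a.Balance, b.Balance
}

func main() {
	fmt.Println(RunTransfers(1000))
}
```

If `Transfer` locked `from` before `to` instead, a transfer from a to b and a transfer from b to a could each take their first lock and then wait on the other, which is exactly the circular wait described above.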

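All four conditions above hold at the same time whenever a deadlock occurs. In other words, if any one of these necessary conditions is not met, deadlock is ruled out.

Deadlock detection algorithms

Deadlock detection with one resource of each type

Such a system might have a scanner, a Blu-ray recorder, a plotter, and a tape drive, but no more than one device of each type. Having two printers at once, for instance, is excluded.

For such a system we can build a resource allocation graph, as shown in Figure 6-3. If the graph contains one or more cycles, a deadlock exists, and every process on a cycle is deadlocked. If there is no cycle, the system is not deadlocked.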
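The cycle check can be sketched with a depth-first search. The `Graph` type and `HasCycle` function are hypothetical names; the sketch assumes an edge from a resource to a process means "held by" and an edge from a process to a resource means "waiting for".

```go
package main

import "fmt"

// Graph is a directed resource allocation graph stored as adjacency lists.
// Assumed convention: resource -> process means "held by",
// process -> resource means "waiting for".
type Graph map[string][]string

// HasCycle reports whether the graph contains a directed cycle,
// using a three-colour depth-first search.
func HasCycle(g Graph) bool {
	const (
		white = iota // not visited yet
		grey         // on the current DFS path
		black        // fully explored
	)
	color := make(map[string]int)

	var visit func(n string) bool
	visit = func(n string) bool {
		color[n] = grey
		for _, m := range g[n] {
			switch color[m] {
			case grey:
				return true // back edge: m is on the current path
			case white:
				if visit(m) {
					return true
				}
			}
		}
		color[n] = black
		return false
	}

	for n := range g {
		if color[n] == white && visit(n) {
			return true
		}
	}
	return false
}

func main() {
	// A holds R and waits for S; B holds S and waits for R.
	g := Graph{"R": {"A"}, "A": {"S"}, "S": {"B"}, "B": {"R"}}
	fmt.Println("deadlock:", HasCycle(g))
}
```

Dropping the edge from B to R breaks the cycle, and `HasCycle` then reports false.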

Multiple resources of each type

Detection uses two matrices. I have not worked out the algorithm.
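For reference, the detection algorithm commonly taught for this case works with a current-allocation matrix, a request matrix, and a vector of currently available resources. Whether this is exactly the algorithm the source material meant is an assumption; the function and parameter names below are hypothetical.

```go
package main

import "fmt"

// fits reports whether every entry of need is covered by have.
func fits(need, have []int) bool {
	for j := range need {
		if need[j] > have[j] {
			return false
		}
	}
	return true
}

// Deadlocked returns the indices of the processes that can never finish.
// alloc[i][j] is how many units of resource j process i holds,
// request[i][j] is how many more units it is waiting for, and
// available[j] is how many units of resource j are currently free.
func Deadlocked(available []int, alloc, request [][]int) []int {
	avail := append([]int(nil), available...) // work on a copy
	done := make([]bool, len(alloc))

	for progress := true; progress; {
		progress = false
		for i := range alloc {
			if done[i] || !fits(request[i], avail) {
				continue
			}
			// Process i can run to completion and hand back what it holds.
			for j := range avail {
				avail[j] += alloc[i][j]
			}
			done[i] = true
			progress = true
		}
	}

	var stuck []int
	for i, d := range done {
		if !d {
			stuck = append(stuck, i)
		}
	}
	return stuck
}

func main() {
	available := []int{2, 1, 0, 0}
	alloc := [][]int{{0, 0, 1, 0}, {2, 0, 0, 1}, {0, 1, 2, 0}}
	request := [][]int{{2, 0, 0, 1}, {1, 0, 1, 0}, {2, 1, 0, 0}}
	fmt.Println("deadlocked processes:", Deadlocked(available, alloc, request))
}
```

In this example the third process can be satisfied first, its released resources let the second one finish, and then the first, so no process is left over. A process that stays unmarked at the end is deadlocked.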

Deadlock detection in Go

When Go detects a deadlock, it prints this message:

fatal error: all goroutines are asleep - deadlock!

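  1. "Asleep" here means a state in which a goroutine can only be woken by another user goroutine. A goroutine in time.Sleep does not count: a runtime timer will wake it, so the detector does not treat it as asleep.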

// _Gwaiting means this goroutine is blocked in the runtime.
// It is not executing user code. It is not on a run queue,
// but should be recorded somewhere (e.g., a channel wait
// queue) so it can be ready()d when necessary. The stack is
// not owned except that a channel operation may read or
// write parts of the stack under the appropriate channel
// lock. Otherwise, it is not safe to access the stack after a
// goroutine enters _Gwaiting (e.g., it may get moved).

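  2. Only one situation is checked: every goroutine in the whole program is asleep. Individual locks are not checked.
  3. The check relies on the Circular Wait property. If every goroutine is waiting for someone else to release a resource, the program can never finish.
  4. If every goroutine is waiting for itself to release something, the same error is raised. Strictly speaking that is not a multi-thread deadlock (deadly embrace) but a self deadlock / recursive deadlock.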
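The self-deadlock case can be shown without hanging the program by using `sync.Mutex.TryLock` (available since Go 1.18). `SecondLockWouldBlock` is a hypothetical helper written for this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// SecondLockWouldBlock reports whether a second Lock on a mutex that the
// caller already holds would block. sync.Mutex is not reentrant, so the
// TryLock attempt fails.
func SecondLockWouldBlock() bool {
	var mu sync.Mutex
	mu.Lock()
	defer mu.Unlock()
	return !mu.TryLock()
}

func main() {
	// Calling mu.Lock() a second time instead of TryLock, in a program
	// whose only goroutine is main, makes the runtime abort with the
	// "all goroutines are asleep" message quoted above.
	fmt.Println("second Lock would block:", SecondLockWouldBlock())
}
```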

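P.S. Under cgo, the race detector (-race) cannot check everything, because memory accesses may happen inside C calls that it does not see.

Detailed analysis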

A goroutine can get stuck

  • either because it's waiting for a channel or

  • because it is waiting for one of the locks in the sync package.

Typical reasons are:

  • No other goroutine has access to the channel or the lock, or

  • A group of goroutines are waiting for each other and none of them is able to proceed. (This is the formal definition of deadlock.)

Currently Go only detects when the program as a whole freezes, not when a subset of goroutines get stuck.

With channels it's often easy to figure out what caused a deadlock. Programs that make heavy use of mutexes can on the other hand be notoriously difficult to debug.
