同样作为非并发安全的数据结构,slice和map在有并发安全问题时,为什么表现相差那么大

fliter

发布于 2023-06-18 11:34:44

1980

发布于 2023-06-18 11:34:44

文章被收录于专栏：旅途散记

TL;DR:

Because some cases are easy and cheap to detect for maps, but there

are no cases that are easy or cheap to detect for slices.

Go一共有27种细分数据类型 (可参考利用反射,探究Go语言中的数据类型 https://dashen.tech/2020/06/27/%E5%8F%8D%E5%B0%84/)

除channel外(结构体中有mutex，保证其他字段的并发安全)，一般情况下，byte，bool，int，float，point,func是并发安全的

(这些数据类型的位宽不会超过64位，所以在64位的指令集架构中可以由一条机器指令完成，不存在被细分为更小的操作单位，故而这些类型的并发赋值是安全的；但也和操作系统的位数有关，如int64在32位操作系统中，高32位和低32位是分开赋值的，此时是非并发安全的）

而 string，slice，map这三种最常用的数据结构是并发不安全的

(interface，complex，struct，数组，往往也是并发不安全的) 如:

package main

import (
 "fmt"
 "sort"
 "time"
)

var s []int

func appendValue(i int) {
 s = append(s, i)
}

func main() {

 for i := 0; i < 10000; i++ {
  go appendValue(i)
 }

 sort.Ints(s) //给切片排序,先排完序再打印,

 for i, v := range s {
  fmt.Println(i, ":", v)

 }

 time.Sleep(5e9)

}

输出为:

0 : 0
1 : 0
2 : 0
3 : 0
4 : 0

...


80 : 0
81 : 0
82 : 0
83 : 0
84 : 0
85 : 1
86 : 2
87 : 3
88 : 6
89 : 8
90 : 12
91 : 13
92 : 14
93 : 19
94 : 28
95 : 30
96 : 31
97 : 32
98 : 33
99 : 34
100 : 35
101 : 36
102 : 44
103 : 45
104 : 46
105 : 47
106 : 48
107 : 49
108 : 50
109 : 51
110 : 52
111 : 53
112 : 54
113 : 55
114 : 56
115 : 57

...

8328 : 9990
8329 : 9991
8330 : 9992
8331 : 9993
8332 : 9995
8333 : 9996
8334 : 9994
8335 : 9997
8336 : 9998
8337 : 9999

最后没有到 9999 : 9999，但也没出任何提示

再如:


package main

import (
 "fmt"
 "time"
)

const N = 500

func main() {

 m := make(map[int]int)

 go func() {
  for i := 0; i < N; i++ {
   m[i] = i*10 + 6
  }
 }()

 go func() {
  for i := 0; i < N; i++ {
   fmt.Println(i, m[i])
  }
 }()

 time.Sleep(1e9 * 5)

}

第一个协程对map写,第二个对map读

N 较大时,该程序将报错:

fatal error: concurrent map read and map write

goroutine 18 [running]:
runtime.throw(0x10cb62d, 0x21)
...

而且这种报错，无法通过recover捕获

看代码可知，除了并发读写/写写 map这种case，还有另外几种情况，同样无法通过recover恢复，需要特别注意:

堆栈内存耗尽(如递归): fatal error: stack overflow
将 nil 函数作为 goroutine 启动 fatal error: go of nil func value
goroutines 死锁 fatal error: all goroutines are asleep - deadlock!
线程超过设置的最大限制 fatal error: thread exhaustion
超出可用内存 fatal error: runtime: out of memory

等

那map和slice同样作为非并发安全的数据结构，为什么map被设计成在有并发冲突时抛出一个无法恢复的致命错误，而slice却没有任何提示？是否有任何设计上的考量？

在 golang-nuts上提出了这个问题

活跃于社区孜孜不倦的Ian Lancer大佬给出了如上回复

即检测map的并发问题非常容易*低成本，而检测slice的并发问题很困难&代价高昂

sliceheader:

// runtime/slice.go
type slice struct {
 array unsafe.Pointer
 len   int
 cap   int
}

mapheader


// runtime/map.go

 // flags
 iterator     = 1 // there may be an iterator using buckets
 oldIterator  = 2 // there may be an iterator using oldbuckets
 hashWriting  = 4 // a goroutine is writing to the map
 sameSizeGrow = 8 // the current map growth is to a new map of the same size


// A header for a Go map.
type hmap struct {
 // Note: the format of the hmap is also encoded in cmd/compile/internal/reflectdata/reflect.go.
 // Make sure this stays in sync with the compiler's definition.
 count     int // # live cells == size of map.  Must be first (used by len() builtin)
 flags     uint8
 B         uint8  // log_2 of # of buckets (can hold up to loadFactor * 2^B items)
 noverflow uint16 // approximate number of overflow buckets; see incrnoverflow for details
 hash0     uint32 // hash seed

 buckets    unsafe.Pointer // array of 2^B Buckets. may be nil if count==0.
 oldbuckets unsafe.Pointer // previous bucket array of half the size, non-nil only when growing
 nevacuate  uintptr        // progress counter for evacuation (buckets less than this have been evacuated)

 extra *mapextra // optional fields
}

// A bucket for a Go map.
type bmap struct {
 // tophash generally contains the top byte of the hash value
 // for each key in this bucket. If tophash[0] < minTopHash,
 // tophash[0] is a bucket evacuation state instead.
 tophash [bucketCnt]uint8
 // Followed by bucketCnt keys and then bucketCnt elems.
 // NOTE: packing all the keys together and then all the elems together makes the
 // code a bit more complicated than alternating key/elem/key/elem/... but it allows
 // us to eliminate padding which would be needed for, e.g., map[int64]int8.
 // Followed by an overflow pointer.
}

// mapextra holds fields that are not present on all maps.
type mapextra struct {
 // If both key and elem do not contain pointers and are inline, then we mark bucket
 // type as containing no pointers. This avoids scanning such maps.
 // However, bmap.overflow is a pointer. In order to keep overflow buckets
 // alive, we store pointers to all overflow buckets in hmap.extra.overflow and hmap.extra.oldoverflow.
 // overflow and oldoverflow are only used if key and elem do not contain pointers.
 // overflow contains overflow buckets for hmap.buckets.
 // oldoverflow contains overflow buckets for hmap.oldbuckets.
 // The indirection allows to store a pointer to the slice in hiter.
 overflow    *[]*bmap
 oldoverflow *[]*bmap

 // nextOverflow holds a pointer to a free overflow bucket.
 nextOverflow *bmap
}

相比于slice的"简约"(其实扩容也挺复杂的)，map要繁复庞杂得多

其中hmap中的flags字段，用于存储哈希表的各种标志位信息。该字段是一个无符号整数类型（uint8）。

flags字段的位表示在哈希表中具有不同的含义。下面是flags字段的各个位表示的标志位含义：

低2位（least significant 2 bits）：表示哈希表的状态。具体取值如下：
- 00：哈希表为空。
- 01：哈希表正在被使用。
- 10：哈希表正在被迭代（遍历）。
- 11：哈希表正在被扩容。
第3位（bit 2）：表示哈希表是否使用指针作为键（key）的布尔标志位。取值为1表示使用指针作为键，取值为0表示使用非指针类型作为键。
第4位（bit 3）：表示哈希表的键（key）是否为字符串类型的布尔标志位。取值为1表示键为字符串类型，取值为0表示键为非字符串类型。
第5位（bit 4）：表示哈希表是否为幕后结构的布尔标志位。取值为1表示该哈希表为幕后结构（backing store），即哈希表是另一个哈希表的底层数据结构。
第6位（bit 5）：表示哈希表是否禁用完整性检查的布尔标志位。取值为1表示禁用完整性检查，取值为0表示启用完整性检查。
第7位（bit 6）：保留位，未使用。

这些标志位用于在哈希表的操作和状态之间进行标识和传递信息。通过flags字段，可以了解哈希表的状态、键的类型、底层结构等信息，从而在哈希表的实现中进行相应的逻辑处理和优化。