在测试 LinkADR 功能时,发现了 STACK 中一个 ChMask 的问题,经过好几天的摸索,终于搞定了问题。根源是 range 语句中的内存分配问题,本能手觉得是个非常易错的点,值得好好记录一下。
小能手这段时间在学习 The Things Network LoRaWAN Stack V3,从使用和代码等角度对该 Stack 进行了分析,详细可点此查看。
NS LinkADRReq 中的 ChannelMask 与频点计划不匹配。超过 16 信道的频段都存在这个问题:US902-928、AU915-928、CN470-510。
当节点使用 CN_470_510_FSB_11 的频段,它收到的 LinkADRReq 将会是 {{03 00 FF 00 01},{03 00 FF 00 11},{03 00 FF 00 21},{03 00 FF 00 31},{03 00 FF 00 41},{03 00 FF 00 51}}
当节点使用 CN_470_510_FSB_1 的频段,它收到的 LinkADRReq 将会是 {{03 00 00 00 01},{03 00 00 00 11},{03 00 00 00 21},{03 00 00 00 31},{03 00 00 00 41},{03 00 00 00 51}}
其实和这个问题是一样的,https://github.com/TheThingsNetwork/lorawan-stack/issues/544,它并没有解决掉。
当节点使用 CN_470_510_FSB_11 的频段,它收到的 LinkADRReq 将会是 {{03 00 00 00 01},{03 00 00 00 11},{03 00 00 00 21},{03 00 00 00 31},{03 00 00 00 41},{03 00 FF 00 51}}
当节点使用 CN_470_510_FSB_1 的频段,它收到的 LinkADRReq 将会是 {{03 00 FF 00 01},{03 00 00 00 11},{03 00 00 00 21},{03 00 00 00 31},{03 00 00 00 41},{03 00 00 00 51}}
和服务器没有关系,是代码问题。
在研究了代码之后,我把问题代码抽出来写成了一个测试程序 chmask_issue.go。核心问题是在这段代码处理:
for _, m := range desiredMasks {
pld := &MACCommand_LinkADRReq {
ChannelMask: m.Mask[:],
}
fmt.Printf("p = %p %p\r\n", pld.ChannelMask, &m.Mask[0])
}
我们将 m.Mask 数组传递给了切片 ChannelMask,相当于把数组的指针传递给了切片ChannelMask,而问题核心就在于, m.Mask 在每次循环中只是向 desiredMasks 获取了内容,其指针并没有发生改变。
所以,所有的 pld 的 ChannelMask,都会被覆盖成最后一个循环中 m.Mask。
当节点使用 CN_470_510_FSB_11 的频段,所有的 ChannelMask,都被覆盖成最后一个循环中的 m.Mask, 数值为 {true, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false}。
当节点使用 CN_470_510_FSB_1 的频段,所有的 ChannelMask,都被覆盖成最后一个循环中的 m.Mask, 数值为 {false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false}。
Go语言提供了关键字range,用于便捷地遍历容器中的元素,每次循环只是将容器中的元素复制到副本中。
副本只在 for 循环开始时分配了内存,其内存地址在之后的循环中都不会改变。
一些 Gopher 也有类似经历,做个参考:
https://tam7t.com/golang-range-and-pointers/
https://blog.csdn.net/idwtwt/article/details/87378419
我把能想到的几种解决方式都列在这里。
for i, m := range desiredMasks {
pld := &ttnpb.MACCommand_LinkADRReq{
ChannelMask: desiredMasks[i].Mask[:],
}
}
这种方式的代码改动量最小。
临时变量将会分配一个新的内存,这样就可以进行切片的赋值。
它的改动量也比较小,但是结合上下文发现它有个比较大的问题。因为在函数外面还需要使用 ChannelMask,tempMask 的内存很快就会被释放,这个解决方案在这里并不合适,有内存泄漏的问题。
for _, m := range desiredMasks {
tempMask := make([]bool, 0, 16)
copy(tempMask, m.Mask[:])
pld := &ttnpb.MACCommand_LinkADRReq{
ChannelMask: tempMask,
}
}
将结构体中的切片也改成数组,严格将 for range 只用作数值传递。
type MACCommand_LinkADRReq struct {
ChannelMask [16]bool `protobuf:"varint,3,rep,packed,name=channel_mask,json=channelMask,proto3" json:"channel_mask,omitempty"`
}
这种方式至少需要改动4个地方,但是是最建议的方式,不担心内存泄漏问题。
我把这个问题到 github 上提交了 issue,https://github.com/TheThingsNetwork/lorawan-stack/issues/586,原文如下:
Summary:
The ChannelMask field in the LinkADRReq command mismatch with the frequency_plan. This issue exists in the band which have more than 16 channels: US902-928、AU915-928、CN470-510.
Steps to Reproduce:
What do you see now?
When end-device takes the CN_470_510_FSB_11, it get the LinkADRReq: {{03 00 FF 00 01},{03 00 FF 00 11},{03 00 FF 00 21},{03 00 FF 00 31},{03 00 FF 00 41},{03 00 FF 00 51}}
When end-device takes the CN_470_510_FSB_1, it get the LinkADRReq: {{03 00 00 00 01},{03 00 00 00 11},{03 00 00 00 21},{03 00 00 00 31},{03 00 00 00 41},{03 00 00 00 51}}
This issue is basically the same as another issue(https://github.com/TheThingsNetwork/lorawan-stack/issues/544). I updated the latest code and used the latest frequency plan, but the issue still exists.
What do you want to see instead?
When end-device takes the CN_470_510_FSB_11, it should get the LinkADRReq: {{03 00 00 00 01},{03 00 00 00 11},{03 00 00 00 21},{03 00 00 00 31},{03 00 00 00 41},{03 00 FF 00 51}}
When end-device takes the CN_470_510_FSB_1, it should get the LinkADRReq: {{03 00 FF 00 01},{03 00 00 00 11},{03 00 00 00 21},{03 00 00 00 31},{03 00 00 00 41},{03 00 00 00 51}}
Environment:
This issue can be reproduced by any environment.
How do you propose to implement this?
I have created a minimized program, chmask_issue.go, attatched in this post. The key code is this:
for _, m := range desiredMasks {
pld := &MACCommand_LinkADRReq {
ChannelMask: m.Mask[:],
}
fmt.Printf("p = %p %p\r\n", pld.ChannelMask, &m.Mask[0])
}
When we pass value from m.Mask array to ChannelMask slice, we know that it actually pass the pointer of the array to the slice. Problem happens here, m.Mask could only change its value in each iteration of the loop, its pointer would be consistent, which is allocated once at the beginning of the loop.
So, each pld.ChannelMask would be overwrited with the m.Mask in the last iteration of the loop:
i. When end-device takes the CN_470_510_FSB_11, all ChannelMask would be overwrited with the m.Mask in the last iteration of the loop, the value is {true, true, true, true, true, true, true, true, false, false, false, false, false, false, false, false}.
ii. When end-device takes the CN_470_510_FSB_1, all ChannelMask would be overwrited with the m.Mask in the last iteration of the loop, the value is {false, false, false, false, false, false, false, false, false, false, false, false, false, false, false, false}.
Can you do this yourself and submit a Pull Request?
I would like to recommend 2 solutions.
Solution 1. Passing real memory addresses by array subscript
for i, m := range desiredMasks {
pld := &ttnpb.MACCommand_LinkADRReq{
ChannelMask: desiredMasks[i].Mask[:],
}
}
This solution takes the least amount of code changes. I have made test, this issue is finally fixed.
Solution 2. Passing value by changing the type of ChannelMask to array
type MACCommand_LinkADRReq struct {
ChannelMask [16]bool `protobuf:"varint,3,rep,packed,name=channel_mask,json=channelMask,proto3" json:"channel_mask,omitempty"`
}
This solution needs to adjust at least 4 places, but it is the better way to operate in Range loop, no need to care about memory leak.
Which one do you prefer for these two solution? or any other suggestion? I’d like to submit a Pull Request to this repository.