文章/答案/技术大牛

发布

问平方根计算算法
EN

Stack Overflow用户

提问于 2021-05-13 14:50:34

回答 3查看 261关注 0票数 1

我一直在用C语言实现控制软件，其中一个控制算法需要平方根计算。我一直在寻找合适的平方根计算算法，它将具有恒定的执行时间，而与根值无关。这个要求排除了标准库中的sqrt函数。

就我的平台而言，我一直在使用基于浮点32位ARM Cortex A9的机器。就我的应用程序中的基数范围而言，算法是以物理单位计算的，因此我希望使用以下范围<0, 400>。至于所需的误差，我认为大约1%的误差就足够了。有没有人能推荐一个适合我的平方根计算算法？

algorithm

math

embedded

sqrt

回答 3

Stack Overflow用户

发布于 2021-05-13 15:56:18

我最初方法是将泰勒级数用于平方根，并在许多固定点上预先计算系数。这将把计算减少为减法和多次乘法。

查找表将是一个2D数组，如下所示：

point | C0  | C1  | C2  | C3  | C4  | ...
-----------------------------------------
 0.5  | f00 | f01 | f02 | f03 | f04 |
-----------------------------------------
 1.0  | f10 | f11 | f12 | f13 | f14 |
-----------------------------------------
 1.5  | f20 | f21 | f22 | f23 | f24 |
-----------------------------------------
....

因此，在计算sqrt( x )时，请使用与x最接近的表行。

示例：

sqrt(1.1) (i.e. use point 1.0 coeffients)

f10 + 
f11 * (1.1 - 1.0) + 
f12 * (1.1 - 1.0) ^ 2 + 
f13 * (1.1 - 1.0) ^ 3 + 
f14 * (1.1 - 1.0) ^ 4

上表建议预先计算系数的点之间的固定距离(即每个点之间的0.5 )。但是，由于平方根的性质，您可能会发现，对于不同的x范围，点之间的距离应该不同。例如，x in 0-1 ->距离0.1，x in 1-2 ->距离0.25，x in 2- 10 ->距离0.5，依此类推。

另一件事是获得所需精度所需的项数。在这里，您可能还会发现，不同范围的x可能需要不同数量的系数。

所有这些都很容易在普通计算机上进行预计算(例如使用excel)。

注意:对于非常接近于零的值，这种方法并不好用。也许牛顿方法会是一个更好的选择。

泰勒级数：https://en.wikipedia.org/wiki/Taylor_series

牛顿方法：https://en.wikipedia.org/wiki/Newton%27s_method

票数 1

Stack Overflow用户

发布于 2021-05-13 16:06:02

Arm v7指令集提供了用于两个同时近似的vrsqrte_f32和四个近似的vrsqrteq_f32的反倒数平方根计算的快速指令。(标量变体vrsqrtes_f32仅在Arm64 v8.2上可用)。

然后，可以通过x * vrsqrte_f32(x);简单地计算结果，它在整个正值x范围内具有优于0.33%的相对精度。请参阅https://www.mdpi.com/2079-3197/9/2/21/pdf

ARM霓虹灯指令FRSQRTE给出了8.25个正确的结果位。

At x==0 vrsqrtes_f32(x) == Inf，所以x*vrsqrtes_f32(x)应该是NaN。

如果x==0的值是不可避免的，那么最优的两条指令序列需要更多的调整：

float sqrtest(float a) {
    // need to "transfer" or "convert" the scalar input 
    // to a vector of two
    // - optimally we would not need an instruction for that
    // but we would just let the processor calculate the instruction
    // for all the lanes in the register
    float32x2_t a2 = vdup_n_f32(a);

    // next we create a mask that is all ones for the legal
    // domain of 1/sqrt(x)
    auto is_legal = vreinterpret_f32_u32(vcgt_f32(a2, vdup_n_f32(0.0f)));

    // calculate two reciprocal estimates in parallel 
    float32x2_t a2est = vrsqrte_f32(a2);

    // we need to mask the result, so that effectively
    // all non-legal values of a2est are zeroed
    a2est = vand_u32(is_legal, a2est);

    // x * 1/sqrt(x) == sqrt(x)
    a2 = vmul_f32(a2, a2est);

    // finally we get only the zero lane of the result
    // discarding the other half
    return vget_lane_f32(a2, 0);
}

当然，这种方法的吞吐量几乎是

void sqrtest2(float &a, float &b) {
    float32x2_t a2 = vset_lane_f32(b, vdup_n_f32(a), 1);
    float32x2_t is_legal = vreinterpret_f32_u32(vcgt_f32(a2, vdup_n_f32(0.0f)));
    float32x2_t a2est = vrsqrte_f32(a2);
    a2est = vand_u32(is_legal, a2est);
    a2 = vmul_f32(a2, a2est);
    a = vget_lane_f32(a2,0); 
    b = vget_lane_f32(a2,1); 
}

如果您可以直接使用float32x2_t或float32x4_t输入和输出，那就更好了。

float32x2_t sqrtest2(float32x2_t a2) {
    float32x2_t is_legal = vreinterpret_f32_u32(vcgt_f32(a2, vdup_n_f32(0.0f)));
    float32x2_t a2est = vrsqrte_f32(a2);
    a2est = vand_u32(is_legal, a2est);
    return vmul_f32(a2, a2est);
}

此实现提供了sqrtest2(1) == 0.998和sqrtest2(400) == 19.97 (在带有arm64的MacBook M1上测试)。由于是无分支和无LUT的，假设所有指令在恒定的周期数内执行，这可能具有恒定的执行时间。

票数 1

Stack Overflow用户

发布于 2021-05-14 15:12:06

我决定使用下面的方法。我选择了牛顿法，然后我实验地设置了固定的迭代次数，使整个根值范围内的误差，即<0,400>不超过规定值。我在六次迭代中结束了。至于值为0的根数，我决定返回0而不做任何计算。

票数 0

页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持

原文链接：

https://stackoverflow.com/questions/67515027

复制

相似问题

问平方根计算算法
EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问平方根计算算法EN

回答 3

Stack Overflow用户

Stack Overflow用户

Stack Overflow用户

社区

活动

圈层

关于

腾讯云开发者

热门产品

热门推荐

更多推荐

问平方根计算算法
EN