# 2. Plain implementation

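```cpp
// IM_Max / IM_Min are not defined in the article; these definitions are assumed.
#define IM_Max(a, b) (((a) > (b)) ? (a) : (b))
#define IM_Min(a, b) (((a) < (b)) ? (a) : (b))

// Src:  24-bit BGR image, Stride bytes per row.
// Skin: Width x Height 8-bit mask (255 = skin, 16 = non-skin).
void IM_GetRoughSkinRegion(unsigned char *Src, unsigned char *Skin, int Width, int Height, int Stride) {
    for (int Y = 0; Y < Height; Y++)
    {
        unsigned char *LinePS = Src + Y * Stride;
        unsigned char *LinePD = Skin + Y * Width;
        for (int X = 0; X < Width; X++)
        {
            int Blue = LinePS[0], Green = LinePS[1], Red = LinePS[2];
            if (Red >= 60 && Green >= 40 && Blue >= 20 && Red >= Blue && (Red - Green) >= 10 && IM_Max(IM_Max(Red, Green), Blue) - IM_Min(IM_Min(Red, Green), Blue) >= 10)
                LinePD[X] = 255;    // skin
            else
                LinePD[X] = 16;     // non-skin
            LinePS += 3;            // advance to the next BGR triple
        }
    }
}
```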

| Resolution | Version | Iterations | Time |
|---|---|---|---|
| 4272x2848 | Plain implementation | 1000 | 41.40 ms |
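The article does not show how the timings were taken. As a rough sketch only, a harness along the following lines would match the setup behind the figures above (one 4272x2848 BGR frame, 1000 calls, averaged wall-clock time). `AverageMs` is a hypothetical helper name, and the tightly packed stride and constant fill value are assumptions.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical helper: calls fn(src, dst, width, height, stride) `iters` times
// on a synthetic BGR frame and returns the average time per call in milliseconds.
template <typename Fn>
double AverageMs(Fn fn, int width, int height, int iters) {
    assert(width > 0 && height > 0 && iters > 0);
    const int stride = width * 3;  // assumption: tightly packed 24-bit BGR rows
    std::vector<unsigned char> src(static_cast<std::size_t>(stride) * height, 128);
    std::vector<unsigned char> dst(static_cast<std::size_t>(width) * height);

    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; i++)
        fn(src.data(), dst.data(), width, height, stride);
    const auto t1 = std::chrono::steady_clock::now();

    return std::chrono::duration<double, std::milli>(t1 - t0).count() / iters;
}
```

With the function from this section in scope, a call such as `AverageMs(IM_GetRoughSkinRegion, 4272, 2848, 1000)` would give a per-call average; real photographs will exercise the branches differently from a constant frame, so absolute numbers may differ from the article's.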

# 3. Skin detection: first optimization

```cpp
void IM_GetRoughSkinRegion_OpenMP(unsigned char *Src, unsigned char *Skin, int Width, int Height, int Stride) {
    for (int Y = 0; Y < Height; Y++)
    {
        unsigned char *LinePS = Src + Y * Stride;
        unsigned char *LinePD = Skin + Y * Width;
        // Pixels are addressed by X rather than by a running pointer, so the
        // iterations are independent and can be shared among 4 threads.
        // Build with OpenMP enabled (e.g. -fopenmp or /openmp).
#pragma omp parallel for num_threads(4)
        for (int X = 0; X < Width; X++)
        {
            int Blue = LinePS[X*3 + 0], Green = LinePS[X*3 + 1], Red = LinePS[X*3 + 2];
            if (Red >= 60 && Green >= 40 && Blue >= 20 && Red >= Blue && (Red - Green) >= 10 && IM_Max(IM_Max(Red, Green), Blue) - IM_Min(IM_Min(Red, Green), Blue) >= 10)
                LinePD[X] = 255;    // skin
            else
                LinePD[X] = 16;     // non-skin
        }
    }
}
```

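| Resolution | Version | Iterations | Time |
|---|---|---|---|
| 4272x2848 | Plain implementation | 1000 | 41.40 ms |
| 4272x2848 | OpenMP, 4 threads | 1000 | 36.54 ms |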

# 4. Skin detection: second optimization

```cpp
#include <tmmintrin.h>  // SSSE3 for _mm_shuffle_epi8 (also pulls in the SSE2 intrinsics)

// Unsigned per-byte a >= b: a is the larger value exactly when max(a, b) == a.
#define _mm_cmpge_epu8(a, b) _mm_cmpeq_epi8(_mm_max_epu8(a, b), a)

void IM_GetRoughSkinRegion_SSE(unsigned char *Src, unsigned char *Skin, int Width, int Height, int Stride) {
    const int NonSkinLevel = 16;  // value written for non-skin pixels (16 here); 255 would mark every pixel as skin, which is meaningless
    const int BlockSize = 16;     // pixels handled per SSE iteration
    int Block = Width / BlockSize;
    for (int Y = 0; Y < Height; Y++) {
        unsigned char *LinePS = Src + Y * Stride;
        unsigned char *LinePD = Skin + Y * Width;
        for (int X = 0; X < Block * BlockSize; X += BlockSize, LinePS += BlockSize * 3, LinePD += BlockSize) {
            __m128i Src1, Src2, Src3, Blue, Green, Red, Result, Max, Min;
            // 16 BGR pixels = 48 bytes = three 128-bit loads.
            Src1 = _mm_loadu_si128((__m128i *)(LinePS + 0));
            Src2 = _mm_loadu_si128((__m128i *)(LinePS + 16));
            Src3 = _mm_loadu_si128((__m128i *)(LinePS + 32));

            // Gather each channel into its own register; a mask entry of -1 writes 0, so the three parts can be OR-ed.
            Blue = _mm_shuffle_epi8(Src1, _mm_setr_epi8(0, 3, 6, 9, 12, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1));
            Blue = _mm_or_si128(Blue, _mm_shuffle_epi8(Src2, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, 2, 5, 8, 11, 14, -1, -1, -1, -1, -1)));
            Blue = _mm_or_si128(Blue, _mm_shuffle_epi8(Src3, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 4, 7, 10, 13)));

            Green = _mm_shuffle_epi8(Src1, _mm_setr_epi8(1, 4, 7, 10, 13, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1));
            Green = _mm_or_si128(Green, _mm_shuffle_epi8(Src2, _mm_setr_epi8(-1, -1, -1, -1, -1, 0, 3, 6, 9, 12, 15, -1, -1, -1, -1, -1)));
            Green = _mm_or_si128(Green, _mm_shuffle_epi8(Src3, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 2, 5, 8, 11, 14)));

            Red = _mm_shuffle_epi8(Src1, _mm_setr_epi8(2, 5, 8, 11, 14, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1));
            Red = _mm_or_si128(Red, _mm_shuffle_epi8(Src2, _mm_setr_epi8(-1, -1, -1, -1, -1, 1, 4, 7, 10, 13, -1, -1, -1, -1, -1, -1)));
            Red = _mm_or_si128(Red, _mm_shuffle_epi8(Src3, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 3, 6, 9, 12, 15)));

            Max = _mm_max_epu8(_mm_max_epu8(Blue, Green), Red);  // IM_Max(IM_Max(Red, Green), Blue)
            Min = _mm_min_epu8(_mm_min_epu8(Blue, Green), Red);  // IM_Min(IM_Min(Red, Green), Blue)
            Result = _mm_cmpge_epu8(Blue, _mm_set1_epi8(20));                                               // Blue >= 20
            Result = _mm_and_si128(Result, _mm_cmpge_epu8(Green, _mm_set1_epi8(40)));                       // Green >= 40
            Result = _mm_and_si128(Result, _mm_cmpge_epu8(Red, _mm_set1_epi8(60)));                         // Red >= 60
            Result = _mm_and_si128(Result, _mm_cmpge_epu8(Red, Blue));                                      // Red >= Blue
            Result = _mm_and_si128(Result, _mm_cmpge_epu8(_mm_subs_epu8(Red, Green), _mm_set1_epi8(10)));   // (Red - Green) >= 10
            Result = _mm_and_si128(Result, _mm_cmpge_epu8(_mm_subs_epu8(Max, Min), _mm_set1_epi8(10)));     // Max - Min >= 10
            Result = _mm_or_si128(Result, _mm_set1_epi8(NonSkinLevel));  // 255 stays 255, 0 becomes NonSkinLevel
            _mm_storeu_si128((__m128i*)(LinePD + 0), Result);
        }
        // Scalar tail for the last Width % 16 pixels of the row.
        for (int X = Block * BlockSize; X < Width; X++, LinePS += 3, LinePD++)
        {
            int Blue = LinePS[0], Green = LinePS[1], Red = LinePS[2];
            if (Red >= 60 && Green >= 40 && Blue >= 20 && Red >= Blue && (Red - Green) >= 10 && IM_Max(IM_Max(Red, Green), Blue) - IM_Min(IM_Min(Red, Green), Blue) >= 10)
                LinePD[0] = 255;            // skin
            else
                LinePD[0] = NonSkinLevel;   // non-skin
        }
    }
}
```
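To check the channel split at the top of the SSE loop in isolation, the sketch below applies the same shuffle masks to a single group of 16 pixels. `SplitBGR16` and `IM_SSSE3` are hypothetical names introduced here for illustration; the target attribute is only an assumed convenience so that GCC/Clang accept the SSSE3 intrinsic without a global `-mssse3` flag.

```cpp
#include <cassert>
#include <tmmintrin.h>  // SSSE3: _mm_shuffle_epi8

#if defined(__GNUC__)
#define IM_SSSE3 __attribute__((target("ssse3")))
#else
#define IM_SSSE3
#endif

// Hypothetical helper: de-interleaves 16 BGR pixels (48 bytes) into three
// 16-byte planes, using the shuffle masks from IM_GetRoughSkinRegion_SSE.
IM_SSSE3 void SplitBGR16(const unsigned char *bgr, unsigned char *b, unsigned char *g, unsigned char *r) {
    assert(bgr && b && g && r);
    const __m128i s1 = _mm_loadu_si128((const __m128i *)(bgr + 0));
    const __m128i s2 = _mm_loadu_si128((const __m128i *)(bgr + 16));
    const __m128i s3 = _mm_loadu_si128((const __m128i *)(bgr + 32));

    // A mask byte of -1 produces 0 in that lane, so the partial results can be OR-ed.
    __m128i B = _mm_shuffle_epi8(s1, _mm_setr_epi8(0, 3, 6, 9, 12, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1));
    B = _mm_or_si128(B, _mm_shuffle_epi8(s2, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, 2, 5, 8, 11, 14, -1, -1, -1, -1, -1)));
    B = _mm_or_si128(B, _mm_shuffle_epi8(s3, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 4, 7, 10, 13)));

    __m128i G = _mm_shuffle_epi8(s1, _mm_setr_epi8(1, 4, 7, 10, 13, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1));
    G = _mm_or_si128(G, _mm_shuffle_epi8(s2, _mm_setr_epi8(-1, -1, -1, -1, -1, 0, 3, 6, 9, 12, 15, -1, -1, -1, -1, -1)));
    G = _mm_or_si128(G, _mm_shuffle_epi8(s3, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 2, 5, 8, 11, 14)));

    __m128i R = _mm_shuffle_epi8(s1, _mm_setr_epi8(2, 5, 8, 11, 14, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1));
    R = _mm_or_si128(R, _mm_shuffle_epi8(s2, _mm_setr_epi8(-1, -1, -1, -1, -1, 1, 4, 7, 10, 13, -1, -1, -1, -1, -1, -1)));
    R = _mm_or_si128(R, _mm_shuffle_epi8(s3, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 3, 6, 9, 12, 15)));

    _mm_storeu_si128((__m128i *)b, B);
    _mm_storeu_si128((__m128i *)g, G);
    _mm_storeu_si128((__m128i *)r, R);
}
```

Filling the input so that each channel carries a distinct pattern (for example blue = i, green = 100 + i, red = 200 + i for pixel i) makes it easy to confirm that every lane ends up in the right plane.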

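• First, as in part 2 of this series (【AI PC端算法优化】二，一步步优化自然饱和度算法, on optimizing the natural saturation algorithm step by step), the B/G/R components are each extracted into their own SSE variable.
• Next, consider the condition `Red >= 60 && Green >= 40 && Blue >= 20`. It needs a comparison on `unsigned char` values, but SSE only offers a greater-than comparison for `signed char` (`_mm_cmpgt_epi8`) plus an equality test (`_mm_cmpeq_epi8`). An unsigned `a >= b` can be built from what is available, because `a >= b` holds exactly when `max(a, b) == a`, i.e. `_mm_cmpeq_epi8(_mm_max_epu8(a, b), a)`. The `_mm_cmpge_epu8` used in the code is this helper, not a built-in SSE intrinsic.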

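• Next, the condition `(Red - Green) >= 10`. Computing `Red - Green` directly would require widening both operands to a signed 16-bit type, since the result can be negative. With the saturating subtraction `_mm_subs_epu8`, however, `Red - Green` is clamped to `0` whenever `Red < Green`, so `(Red - Green) >= 10` evaluates to `false`; when `Red >= Green` nothing is clamped. That is exactly the behavior needed. `_mm_subs_epu8` computes `max(a - b, 0)` independently for each of the 16 unsigned bytes.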

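• The remaining condition is `IM_Max(IM_Max(Red, Green), Blue) - IM_Min(IM_Min(Red, Green), Blue) >= 10`. This is the simplest one: `_mm_max_epu8` and `_mm_min_epu8` give the maximum and minimum of the B/G/R components, and since `max >= min` always holds, `_mm_subs_epu8` yields the exact difference with no clamping.
• SSE comparisons produce only two results per byte, `0` or `255`, so the `6` conditions above can simply be combined with `and`. A pixel that satisfies every condition ends up as `255`; any other pixel ends up as `0`.
• Finally, pixels that fail the test have to be set to `16` (or some other value). The original author's approach is to `or` the result with that value: `255` stays `255` when `or`-ed with any value, while `0` becomes that value.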
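The points above can be checked against the scalar predicate with a small sketch that needs only SSE2. `CmpGeU8`, `SkinMask16` and `SkinScalar` are hypothetical names introduced for illustration, and `SkinMask16` assumes the three channels have already been split into separate 16-byte planes.

```cpp
#include <algorithm>
#include <cassert>
#include <emmintrin.h>  // SSE2

// Scalar reference: same predicate as the plain implementation.
inline unsigned char SkinScalar(int B, int G, int R) {
    const int mx = std::max(std::max(R, G), B);
    const int mn = std::min(std::min(R, G), B);
    return (R >= 60 && G >= 40 && B >= 20 && R >= B && (R - G) >= 10 && mx - mn >= 10) ? 255 : 16;
}

// Unsigned per-byte a >= b, built from max + equality: 0xFF where true, 0 where false.
static inline __m128i CmpGeU8(__m128i a, __m128i b) {
    return _mm_cmpeq_epi8(_mm_max_epu8(a, b), a);
}

// Hypothetical helper: classifies 16 pixels given as separate B, G, R planes.
inline void SkinMask16(const unsigned char *b, const unsigned char *g, const unsigned char *r, unsigned char *out) {
    assert(b && g && r && out);
    const __m128i B = _mm_loadu_si128((const __m128i *)b);
    const __m128i G = _mm_loadu_si128((const __m128i *)g);
    const __m128i R = _mm_loadu_si128((const __m128i *)r);
    const __m128i Max = _mm_max_epu8(_mm_max_epu8(B, G), R);
    const __m128i Min = _mm_min_epu8(_mm_min_epu8(B, G), R);

    __m128i m = CmpGeU8(B, _mm_set1_epi8(20));
    m = _mm_and_si128(m, CmpGeU8(G, _mm_set1_epi8(40)));
    m = _mm_and_si128(m, CmpGeU8(R, _mm_set1_epi8(60)));
    m = _mm_and_si128(m, CmpGeU8(R, B));
    // Saturating subtraction: R < G gives 0, which fails the >= 10 test as required.
    m = _mm_and_si128(m, CmpGeU8(_mm_subs_epu8(R, G), _mm_set1_epi8(10)));
    m = _mm_and_si128(m, CmpGeU8(_mm_subs_epu8(Max, Min), _mm_set1_epi8(10)));
    // 255 | 16 == 255 (skin stays skin), 0 | 16 == 16 (non-skin level).
    m = _mm_or_si128(m, _mm_set1_epi8(16));
    _mm_storeu_si128((__m128i *)out, m);
}
```

Comparing `SkinMask16` with `SkinScalar` lane by lane over a range of inputs, including pixels with `Red < Green`, is a quick way to confirm that the saturating subtraction and the `or` trick reproduce the scalar result.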

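| Resolution | Version | Iterations | Time |
|---|---|---|---|
| 4272x2848 | Plain implementation | 1000 | 41.40 ms |
| 4272x2848 | OpenMP, 4 threads | 1000 | 36.54 ms |
| 4272x2848 | SSE, first version | 1000 | 6.77 ms |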

# 5. Skin detection: third optimization

```cpp
#include <algorithm>
#include <cstdint>
#include <future>
#include <thread>
#include <vector>
#include <tmmintrin.h>  // SSSE3 for _mm_shuffle_epi8

// Unsigned per-byte a >= b: a is the larger value exactly when max(a, b) == a.
#define _mm_cmpge_epu8(a, b) _mm_cmpeq_epi8(_mm_max_epu8(a, b), a)

// Worker: same SSE kernel as before, restricted to rows [start_row, start_row + thread_stride).
void _IM_GetRoughSkinRegion(unsigned char* Src, const int32_t Width, const int32_t start_row, const int32_t thread_stride, const int32_t Stride, unsigned char* Dest) {
    const int NonSkinLevel = 16;  // value written for non-skin pixels (16 here); 255 would mark every pixel as skin, which is meaningless
    const int BlockSize = 16;
    int Block = Width / BlockSize;
    for (int Y = start_row; Y < start_row + thread_stride; Y++) {
        unsigned char *LinePS = Src + Y * Stride;
        unsigned char *LinePD = Dest + Y * Width;
        for (int X = 0; X < Block * BlockSize; X += BlockSize, LinePS += BlockSize * 3, LinePD += BlockSize) {
            __m128i Src1, Src2, Src3, Blue, Green, Red, Result, Max, Min;
            Src1 = _mm_loadu_si128((__m128i *)(LinePS + 0));
            Src2 = _mm_loadu_si128((__m128i *)(LinePS + 16));
            Src3 = _mm_loadu_si128((__m128i *)(LinePS + 32));

            Blue = _mm_shuffle_epi8(Src1, _mm_setr_epi8(0, 3, 6, 9, 12, 15, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1));
            Blue = _mm_or_si128(Blue, _mm_shuffle_epi8(Src2, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, 2, 5, 8, 11, 14, -1, -1, -1, -1, -1)));
            Blue = _mm_or_si128(Blue, _mm_shuffle_epi8(Src3, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 1, 4, 7, 10, 13)));

            Green = _mm_shuffle_epi8(Src1, _mm_setr_epi8(1, 4, 7, 10, 13, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1));
            Green = _mm_or_si128(Green, _mm_shuffle_epi8(Src2, _mm_setr_epi8(-1, -1, -1, -1, -1, 0, 3, 6, 9, 12, 15, -1, -1, -1, -1, -1)));
            Green = _mm_or_si128(Green, _mm_shuffle_epi8(Src3, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 2, 5, 8, 11, 14)));

            Red = _mm_shuffle_epi8(Src1, _mm_setr_epi8(2, 5, 8, 11, 14, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1));
            Red = _mm_or_si128(Red, _mm_shuffle_epi8(Src2, _mm_setr_epi8(-1, -1, -1, -1, -1, 1, 4, 7, 10, 13, -1, -1, -1, -1, -1, -1)));
            Red = _mm_or_si128(Red, _mm_shuffle_epi8(Src3, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 3, 6, 9, 12, 15)));

            Max = _mm_max_epu8(_mm_max_epu8(Blue, Green), Red);  // IM_Max(IM_Max(Red, Green), Blue)
            Min = _mm_min_epu8(_mm_min_epu8(Blue, Green), Red);  // IM_Min(IM_Min(Red, Green), Blue)
            Result = _mm_cmpge_epu8(Blue, _mm_set1_epi8(20));                                               // Blue >= 20
            Result = _mm_and_si128(Result, _mm_cmpge_epu8(Green, _mm_set1_epi8(40)));                       // Green >= 40
            Result = _mm_and_si128(Result, _mm_cmpge_epu8(Red, _mm_set1_epi8(60)));                         // Red >= 60
            Result = _mm_and_si128(Result, _mm_cmpge_epu8(Red, Blue));                                      // Red >= Blue
            Result = _mm_and_si128(Result, _mm_cmpge_epu8(_mm_subs_epu8(Red, Green), _mm_set1_epi8(10)));   // (Red - Green) >= 10
            Result = _mm_and_si128(Result, _mm_cmpge_epu8(_mm_subs_epu8(Max, Min), _mm_set1_epi8(10)));     // Max - Min >= 10
            Result = _mm_or_si128(Result, _mm_set1_epi8(NonSkinLevel));  // 255 stays 255, 0 becomes NonSkinLevel
            _mm_storeu_si128((__m128i*)(LinePD + 0), Result);
        }
        // Scalar tail for the last Width % 16 pixels of the row.
        for (int X = Block * BlockSize; X < Width; X++, LinePS += 3, LinePD++)
        {
            int Blue = LinePS[0], Green = LinePS[1], Red = LinePS[2];
            if (Red >= 60 && Green >= 40 && Blue >= 20 && Red >= Blue && (Red - Green) >= 10 && IM_Max(IM_Max(Red, Green), Blue) - IM_Min(IM_Min(Red, Green), Blue) >= 10)
                LinePD[0] = 255;            // skin
            else
                LinePD[0] = NonSkinLevel;   // non-skin
        }
    }
}

void IM_GetRoughSkinRegion_SSE2(unsigned char *Src, unsigned char *Skin, int width, int height, int stride) {
    // At most one task per 16 rows, capped by the hardware thread count, and never fewer
    // than one (hardware_concurrency() may return 0, and height may be below 16).
    const int32_t hw_concur = std::max(1, std::min(height >> 4, static_cast<int32_t>(std::thread::hardware_concurrency())));
    const int thread_stride = (height - 1) / hw_concur + 1;  // rows per task, rounded up
    std::vector<std::future<void>> fut;
    for (int start = 0; start < height; start += thread_stride)
    {
        // Clamp the last band so it does not run past the bottom of the image.
        const int rows = std::min(thread_stride, height - start);
        fut.push_back(std::async(std::launch::async, _IM_GetRoughSkinRegion, Src, width, start, rows, stride, Skin));
    }
    for (auto &f : fut)
        f.wait();
}
```
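The launcher hands each task a band of `thread_stride` rows. When `height` is not an exact multiple of the task count, the last band has to be shorter, otherwise the worker would read and write past the end of the image. The row split can be sketched on its own as follows; `SplitRows` is a hypothetical helper, not part of the article's code.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper: splits rows [0, height) into at most `tasks` contiguous
// bands and returns (start_row, row_count) pairs. The last band is clamped.
inline std::vector<std::pair<int, int>> SplitRows(int height, int tasks) {
    assert(height >= 0);
    std::vector<std::pair<int, int>> bands;
    tasks = std::max(1, tasks);                          // guard against 0 tasks
    const int per_task = (height + tasks - 1) / tasks;   // ceil(height / tasks)
    for (int start = 0; start < height; start += per_task)
        bands.emplace_back(start, std::min(per_task, height - start));
    return bands;
}
```

For example, 2850 rows split over 8 tasks gives seven bands of 357 rows and a final band of 351 rows; without the clamp the final band would extend 6 rows beyond the image.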

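| Resolution | Version | Iterations | Time |
|---|---|---|---|
| 4272x2848 | Plain implementation | 1000 | 41.40 ms |
| 4272x2848 | OpenMP, 4 threads | 1000 | 36.54 ms |
| 4272x2848 | SSE, first version | 1000 | 6.77 ms |
| 4272x2848 | SSE, second version (std::async) | 1000 | 4.73 ms |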

# 6. Summary
