Surface Blur: Principle and Python Implementation

为为为什么

Published 2022-08-05 14:51:00


This article is part of the column: 又见苍岚

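Conventional blur algorithms such as Gaussian blur also blur the edges in an image. In many scenarios we want to keep the image's texture while blurring some of the fine detail, and that is what the Surface Blur filter in Photoshop is for.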

Surface Blur

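Surface blur has two parameters: the radius (Radius) and the threshold (Threshold). Given the histogram data Hist of the window of radius Radius centered on a pixel, together with that pixel's value, the original algorithm computes the result as: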

x = \frac{\sum_{i=1}^{(2r+1)^2} \left[ \left( 1 - \frac{|x_i - x_1|}{2.5Y} \right) x_i \right]}{\sum_{i=1}^{(2r+1)^2} \left( 1 - \frac{|x_i - x_1|}{2.5Y} \right)}

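where r is the radius, Y is the threshold, x_1 is the gray level of the current pixel, x_i is the value of a pixel inside the window, and x is the resulting gray level of the current pixel.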

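The core idea is a weighted average over the neighborhood of the current pixel: pixels whose values are close to x_1 receive large weights and pixels that differ strongly receive small weights, which preserves edges while smoothing flat regions. In the code below, weights that would become negative (when |x_i - x_1| > 2.5Y) are clamped to zero, so those pixels do not contribute at all.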
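As a quick sanity check of the weighting, here is a minimal sketch that evaluates the formula for a single 3x3 window (r = 1) sitting next to an edge. The helper name `surface_blur_pixel`, the pixel values, and the threshold Y = 20 are illustration values chosen for this example, not part of the original algorithm description; negative weights are clamped to zero.

```python
import numpy as np


def surface_blur_pixel(window, threshold):
    """Surface-blur value for the centre pixel of a (2r+1) x (2r+1) window."""
    window = window.astype(np.float64)      # avoid uint8 wrap-around in the difference
    r = window.shape[0] // 2
    center = window[r, r]                   # x_1 in the formula
    weights = 1.0 - np.abs(window - center) / (2.5 * threshold)
    weights = np.maximum(weights, 0.0)      # pixels too far from x_1 get weight 0
    return (weights * window).sum() / weights.sum()


# Left two columns are a flat region around 100, right column is a bright edge.
window = np.array([[100, 102, 200],
                   [ 98, 100, 205],
                   [101,  99, 210]], dtype=np.uint8)

result = surface_blur_pixel(window, threshold=20)
print(f"surface blur: {result:.1f}, plain mean: {window.mean():.1f}")
# surface blur: 100.0, plain mean: 135.0
```

The three bright pixels differ from the centre by at least 100, which is more than 2.5Y = 50, so they drop out of the average entirely; a plain box mean would pull the pixel toward the edge instead.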
Python code:

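This is based on a Python version that circulates widely online, with a few small optimizations and fixes. It uses numba's CPU JIT, which gives roughly a 10x speedup, although it is still not as fast as C++.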

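import numba as nb
import numpy as np


@nb.jit(nopython=True)
def Surface_blur(I_in, thre, radius):
    # I_in: 2-D single-channel image; thre: threshold Y; radius: window radius r
    I_out = I_in.copy()
    row, col = I_in.shape

    # Pixels without a full window (the image border) keep their original values
    for ii in range(radius, row - radius):
        for jj in range(radius, col - radius):
            p0 = float(I_in[ii, jj])
            # Cast to float so the difference cannot wrap around for uint8 input
            aa = I_in[ii-radius: ii+radius+1, jj-radius: jj+radius+1].astype(np.float64)

            mask_2 = 1 - np.abs(aa - p0) / (2.5 * thre)
            mask_3 = mask_2 * (mask_2 > 0)  # clamp negative weights to zero
            t1 = aa * mask_3
            # The centre pixel always has weight 1, so the denominator is never zero
            I_out[ii, jj] = t1.sum() / mask_3.sum()

    return I_out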
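For checking results without numba installed, the same computation can be sketched in plain NumPy by accumulating over window offsets instead of looping over pixels. The function name `surface_blur_numpy` is hypothetical, and, unlike the function above, this sketch returns a float64 array rather than the input dtype.

```python
import numpy as np


def surface_blur_numpy(image, threshold, radius):
    """Surface blur of a 2-D image; border pixels are left unchanged.

    Returns a float64 array with the same shape as `image`.
    """
    src = image.astype(np.float64)
    rows, cols = src.shape
    out = src.copy()
    if rows <= 2 * radius or cols <= 2 * radius:
        return out  # no pixel has a full window

    # Interior region: every pixel that has a complete (2r+1) x (2r+1) window
    center = src[radius:rows - radius, radius:cols - radius]
    num = np.zeros_like(center)  # weighted sum of neighbours
    den = np.zeros_like(center)  # sum of weights

    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Neighbour at offset (dy, dx) for every interior pixel at once
            shifted = src[radius + dy:rows - radius + dy,
                          radius + dx:cols - radius + dx]
            w = 1.0 - np.abs(shifted - center) / (2.5 * threshold)
            w = np.maximum(w, 0.0)  # clamp negative weights to zero
            num += w * shifted
            den += w

    # den >= 1 everywhere because the (0, 0) offset always has weight 1
    out[radius:rows - radius, radius:cols - radius] = num / den
    return out
```

This runs (2r+1)^2 whole-array passes, so it trades memory traffic for the absence of Python-level pixel loops; it is meant as a reference for comparing outputs, not as a replacement for the JIT-compiled version.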

numba CUDA-accelerated code:

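from numba import cuda

@cuda.jit
def image_process_cuda(img_cuda, result_img_cuda, y_size, x_size):
    # One thread per pixel: global (y, x) coordinates of this thread
    y = cuda.threadIdx.y + cuda.blockDim.y * cuda.blockIdx.y
    x = cuda.threadIdx.x + cuda.blockDim.x * cuda.blockIdx.x

    radius = 8   # window radius r (hard-coded)
    thre = 20    # threshold Y (hard-coded)
    # Only pixels with a full window are processed; border pixels are not written
    if radius <= y < y_size-radius and radius <= x < x_size-radius:
        a = 0.0  # weighted sum of pixel values (numerator)
        b = 0.0  # sum of weights (denominator)
        x_1 = float(img_cuda[y, x, 0])
        for x_index in range(x-radius, x+radius+1):
            for y_index in range(y-radius, y+radius+1):
                x_i = float(img_cuda[y_index, x_index, 0])

                temp_b = (1 - abs(x_i - x_1) / 2.5 / thre)
                if temp_b <= 0:
                    continue  # negative weights are dropped
                b += temp_b
                a += temp_b * x_i
        # The value is computed from channel 0 only and written to all three channels
        for i in range(3):
            result_img_cuda[y, x, i] = int(round(a / b))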

Usage:

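import math

img = ...  # H x W x 3 image array
copy_image = img.copy()  # assumed: output buffer initialised with the input, so border pixels keep their values
y_size, x_size = img.shape[:2]
threads_per_block = (16, 16)  # assumed block size (x, y); tune for your GPU
blocks_per_grid = (math.ceil(x_size / 16), math.ceil(y_size / 16))
ori_image_cuda = cuda.to_device(img)
copy_image_cuda = cuda.to_device(copy_image)
image_process_cuda[blocks_per_grid, threads_per_block](ori_image_cuda, copy_image_cuda, y_size, x_size)
result_img = copy_image_cuda.copy_to_host()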