Image hashing with OpenCV and Python

Source: Penguin Account (企鹅号) - qzxy

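Image hashing (also called perceptual hashing) is the process of:

Examining the contents of an image

Constructing a hash value that uniquely identifies the input image based on those contents.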

A well-known image hashing implementation and service is TinEye (a reverse image search engine). With TinEye, a user uploads an image and TinEye reports which websites the image appears on.

Here we build a computer vision application that takes two directories of images as input and compares the hash values of the images they contain.

The image hashing algorithm we will implement is called difference hashing, or dHash for short. It consists of the following steps:

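Step #1: Convert to grayscale:

Convert the input image to grayscale, discarding all color information. ① Dropping color lets us hash the image faster, because only a single channel has to be processed. ② It also helps match images whose color spaces differ only slightly. Alternatively, you could hash each of the three color channels separately and combine the results at the end.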
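To make the single-channel idea concrete, here is a minimal pure-Python sketch of a grayscale conversion. The helper names `bgr_to_gray` and `to_grayscale` are hypothetical, and the weights are the standard BT.601 luma weights, which is an assumption about what a typical BGR-to-gray conversion does rather than something this article specifies. The implementation later in the article uses cv2.cvtColor instead.

```python
# Hypothetical helpers for illustration only; the article itself uses
# cv2.cvtColor(image, cv2.COLOR_BGR2GRAY).

def bgr_to_gray(b, g, r):
    # Weighted sum of the three channels (BT.601 luma weights, assumed here).
    return int(round(0.114 * b + 0.587 * g + 0.299 * r))

def to_grayscale(image):
    """image: list of rows, each row a list of (b, g, r) tuples."""
    return [[bgr_to_gray(*px) for px in row] for row in image]

# A 2x2 test image: blue, green, red, and white pixels in BGR order.
tiny = [[(255, 0, 0), (0, 255, 0)],
        [(0, 0, 255), (255, 255, 255)]]
gray = to_grayscale(tiny)
print(gray)
```

Each pixel is reduced from three values to one, so every later step touches a third of the data.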

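Step #2: Resize:

Next, shrink the image to 9x8 pixels. (For most images and datasets, the resizing/interpolation step is the slowest part of the algorithm.)

We squash the image to 9x8 and ignore its aspect ratio, so that the resulting hash matches similar photos regardless of their original spatial dimensions.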
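The following is a rough sketch of what "squash to 9x8, ignoring aspect ratio" means. It uses nearest-neighbor sampling in pure Python, which is a deliberate simplification: the article's implementation calls cv2.resize, which interpolates between pixels by default. The function name `resize_nearest` and the synthetic input are assumptions made for illustration.

```python
def resize_nearest(gray, width, height):
    """Downsample a 2D list of gray levels by nearest-neighbor sampling.

    Width and height are scaled independently, so the source aspect
    ratio is ignored.
    """
    src_h, src_w = len(gray), len(gray[0])
    return [
        [gray[y * src_h // height][x * src_w // width] for x in range(width)]
        for y in range(height)
    ]

# Synthetic 16-row by 18-column gradient standing in for a real image.
gray = [[(x + y) % 256 for x in range(18)] for y in range(16)]
small = resize_nearest(gray, width=9, height=8)
print(len(small), len(small[0]))
```

Whatever the input size, the output always has 8 rows of 9 pixels, which is what lets hashes of differently sized copies be compared.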

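Step #3: Compute the difference:

Our end goal is a 64-bit hash, since 8x8 = 64. Each row of the resized image has 9 pixels, so computing the difference between adjacent column pixels yields 8 differences per row. Eight rows of eight differences give the 64 bits of the hash.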
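As a small worked example of why 9 pixels give 8 results, consider a single row. The pixel values are made up, and the comparison direction (right pixel brighter than left) is chosen to match the dhash function in this article's implementation.

```python
# One row of the 9x8 image: 9 made-up gray levels.
row = [10, 20, 20, 5, 7, 7, 90, 80, 81]

# Compare each pixel with its right-hand neighbor: 9 pixels -> 8 comparisons.
comparisons = [row[i + 1] > row[i] for i in range(len(row) - 1)]
print(comparisons)
```

Repeating this for all 8 rows produces 8 x 8 = 64 boolean values.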

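Step #4: Build the hash:

The final step is to assign bits and build the resulting hash. Compare each pair of adjacent pixels: if the left pixel is brighter than the right pixel, output 1; otherwise output 0. This yields 64 binary values, which are then combined into a single 64-bit integer (the actual image hash). The direction of the test is a convention and only needs to be applied consistently; the dhash function in the implementation below sets a bit when the right pixel is the brighter one.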
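Combining the binary values into one integer can be sketched as follows. The helper name `pack_bits` is hypothetical; it mirrors the `sum(2 ** i ...)` idiom used by the dhash function in this article, where bit i contributes 2**i when it is set.

```python
def pack_bits(bits):
    # Bit i contributes 2**i to the result when it is set.
    return sum(1 << i for i, v in enumerate(bits) if v)

# Eight example bits (one row's worth of comparisons).
bits = [True, False, False, True, False, True, False, True]
value = pack_bits(bits)
print(value)

# With 64 bits all set, the result is the largest 64-bit unsigned value.
print(pack_bits([True] * 64) == 2 ** 64 - 1)
```

Because Python integers are arbitrary precision, no special 64-bit type is needed to hold the hash.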

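Benefits of dHash:

① The image hash does not change when the aspect ratio of the input image changes, because the aspect ratio is ignored when resizing.

② Adjusting brightness or contrast either leaves the hash unchanged or alters it only slightly.

③ Difference hashing is very fast to compute.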
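Benefit ② can be checked on synthetic data. The sketch below is a pure-Python stand-in for the hash (the name `dhash_rows` and the pixel values are assumptions), applied to an already resized 9x8 grid. A uniform brightness shift or contrast scaling preserves the ordering of neighboring pixels, so the bits do not move.

```python
def dhash_rows(small):
    # One bit per adjacent pixel pair, row by row (right > left).
    bits = [row[x + 1] > row[x] for row in small for x in range(len(row) - 1)]
    return sum(1 << i for i, v in enumerate(bits) if v)

# Synthetic 8-row by 9-column grid with values below 200, so the
# adjustments that follow stay inside the 0-255 range.
small = [[(x * 37 + y * 11) % 200 for x in range(9)] for y in range(8)]
brighter = [[p + 40 for p in row] for row in small]
higher_contrast = [[p * 1.2 for p in row] for row in small]

print(dhash_rows(small) == dhash_rows(brighter) == dhash_rows(higher_contrast))
```

On real 8-bit images, values are clipped at 0 and 255, which can make neighboring pixels equal and flip a few bits. That is why the hash may change slightly rather than not at all.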

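Comparing difference hashes:

Hashes are usually compared using the Hamming distance, which counts the number of bit positions in which two hashes differ. A Hamming distance of 0 means the two hashes are identical (no bits differ), and the two images are identical or perceptually similar.

Dr. Neal Krawetz of HackerFactor suggests that hashes differing in more than 10 bits most likely belong to different images, while a Hamming distance between 1 and 10 probably indicates a variation of the same image.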
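A minimal sketch of this comparison for integer hashes follows. The function names `hamming` and `likely_same_image` are hypothetical, and the default threshold of 10 is taken from the rule of thumb above rather than from any library.

```python
def hamming(a, b):
    # XOR leaves a 1 in every position where the hashes differ.
    return bin(a ^ b).count("1")

def likely_same_image(a, b, threshold=10):
    # Rule of thumb: 10 or fewer differing bits suggests the same
    # image or a variation of it.
    return hamming(a, b) <= threshold

print(hamming(0b1011, 0b0010))
print(likely_same_image(0xFFFF, 0xFFF0))
print(likely_same_image(0, 2 ** 64 - 1))
```

Note that the script in this article only looks for exact hash matches (distance 0) through a dictionary lookup; a distance threshold like this would be needed to catch near-duplicates.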

Implementing image hashing with OpenCV and Python

# import the necessary packages
from imutils import paths
import argparse
import time
import sys
import cv2
import os

def dhash(image, hashSize=8):
    # resize the input image, adding a single column (width) so we
    # can compute the horizontal gradient
    resized = cv2.resize(image, (hashSize + 1, hashSize))

    # compute the (relative) horizontal gradient between adjacent
    # column pixels
    diff = resized[:, 1:] > resized[:, :-1]

    # convert the difference image to a hash
    return sum([2 ** i for (i, v) in enumerate(diff.flatten()) if v])

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-a", "--haystack", required=True,
    help="dataset of images to search through (i.e., the haystack)")
ap.add_argument("-n", "--needles", required=True,
    help="set of images we are searching for (i.e., needles)")
args = vars(ap.parse_args())

# grab the paths to both the haystack and needle images
print("[INFO] computing hashes for haystack...")
haystackPaths = list(paths.list_images(args["haystack"]))
needlePaths = list(paths.list_images(args["needles"]))

# remove the `\` character from any filenames containing a space
# (assuming you're executing the code on a Unix machine)
if sys.platform != "win32":
    haystackPaths = [p.replace("\\", "") for p in haystackPaths]
    needlePaths = [p.replace("\\", "") for p in needlePaths]

# grab the base subdirectories for the needle paths, initialize the
# dictionary that will map each image hash to the corresponding image
# paths, then start the timer
BASE_PATHS = set([p.split(os.path.sep)[-2] for p in needlePaths])
haystack = {}
start = time.time()

# loop over the haystack paths
for p in haystackPaths:
    # load the image from disk
    image = cv2.imread(p)

    # if the image is None then we could not load it from disk (so
    # skip it)
    if image is None:
        continue

    # convert the image to grayscale and compute the hash
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    imageHash = dhash(image)

    # update the haystack dictionary
    l = haystack.get(imageHash, [])
    l.append(p)
    haystack[imageHash] = l

# show timing for hashing haystack images, then start computing the
# hashes for needle images
print("[INFO] processed {} images in {:.2f} seconds".format(
    len(haystack), time.time() - start))
print("[INFO] computing hashes for needles...")

# loop over the needle paths
for p in needlePaths:
    # load the image from disk
    image = cv2.imread(p)

    # if the image is None then we could not load it from disk (so
    # skip it)
    if image is None:
        continue

    # convert the image to grayscale and compute the hash
    image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    imageHash = dhash(image)

    # grab all image paths that match the hash
    matchedPaths = haystack.get(imageHash, [])

    # loop over all matched paths
    for matchedPath in matchedPaths:
        # extract the subdirectory from the image path
        b = p.split(os.path.sep)[-2]

        # if the subdirectory exists in the base path for the needle
        # images, remove it
        if b in BASE_PATHS:
            BASE_PATHS.remove(b)

# display directories to check
print("[INFO] check the following directories...")

# loop over each subdirectory and display it
for b in BASE_PATHS:
    print("[INFO] {}".format(b))
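The bookkeeping in the script can be exercised without any images. The sketch below is an illustration under assumptions: the function name `unmatched_dirs`, the integer hashes, and the file paths are all made up, and the logic is a condensed restatement of the loops above rather than the article's code. It starts with every needle subdirectory, removes a subdirectory whenever one of its needle images has an exact hash match in the haystack, and returns whatever is left.

```python
import os

def unmatched_dirs(haystack, needle_hashes):
    """Return needle subdirectories with no image found in the haystack.

    haystack: dict mapping hash -> list of haystack image paths.
    needle_hashes: dict mapping needle image path -> hash.
    """
    # Start with the parent directory name of every needle image.
    base_paths = {p.split(os.path.sep)[-2] for p in needle_hashes}
    for path, image_hash in needle_hashes.items():
        # An exact hash match means this needle exists in the haystack.
        if haystack.get(image_hash):
            base_paths.discard(path.split(os.path.sep)[-2])
    return base_paths

# Made-up hashes and paths for illustration.
needles = {
    os.path.join("needles", "PIX", "a.jpg"): 1,
    os.path.join("needles", "Trips", "b.jpg"): 2,
}
haystack = {2: [os.path.join("haystack", "x", "b_copy.jpg")]}

print(unmatched_dirs(haystack, needles))
```

Only the subdirectory whose image has no counterpart in the haystack is reported, which is the list the script prints under "check the following directories".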

Image hashing with OpenCV and Python results

Open a terminal, change to the directory containing the script, and run:

python hash_and_search.py --haystack haystack --needles needles

The output is:

[INFO] computing hashes for haystack...
[INFO] processed 1000 images in 7.43 seconds
[INFO] computing hashes for needles...
[INFO] check the following directories...
[INFO] PIX
[INFO] December2014

The source code used for this test can be downloaded from:

https://www.pyimagesearch.com/2017/11/27/image-hashing-opencv-python/

Further reading:

https://github.com/opencv/opencv_contrib/tree/master/modules/img_hash/src

http://qtandopencv.blogspot.com/2016/06/introduction-to-image-hash-module-of.html

https://github.com/JohannesBuchner/imagehash

http://www.hackerfactor.com/blog/index.php?/archives/529-Kind-of-Like-That.html

Published: 2018-04-06 16:39:41
Original link: http://kuaibao.qq.com/s/20180406G0SXDN00?refer=cp_1026