Q: Photo similarity check - appending loop results to a dataframe

Stack Overflow user

Asked 2021-07-17 07:56:03

1 answer, 33 views, 0 followers, 1 vote

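I have two folders of photos. The second folder is supposed to consist entirely of copies of photos in the first folder. My job is to confirm that the second folder does in fact consist entirely of copies.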

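My script takes a photo from folder 2 and compares it with every photo in folder 1. Each comparison yields a similarity value. If the similarity value is greater than 16 (indicating a positive match), a counter variable is incremented by 1. Once the photo from folder 2 has been checked against all the photos in folder 1, the counter is checked. If it is still zero, the photo is added to a list. This part of the code works and I'm happy with it.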
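For readers skimming the script, the similarity value is the number of "good" matches as a percentage of the larger keypoint count. Here is a minimal sketch of that arithmetic using plain numbers instead of cv2 objects. The function name, the sample distances, and the zero-keypoint guard are assumptions for illustration, not part of the original script.

```python
def similarity_percent(distance_pairs, n_kp_1, n_kp_2, ratio=0.6):
    """Percentage of matches that pass the ratio test.

    distance_pairs: (best, second_best) match distances per keypoint.
    n_kp_1, n_kp_2: keypoint counts of the two images.
    """
    # keep a match only when the best candidate is clearly closer than the runner-up
    good = sum(1 for best, second in distance_pairs if best < ratio * second)
    largest = max(n_kp_1, n_kp_2)
    if largest == 0:
        return 0  # assumption: no keypoints means no similarity
    return int(good / largest * 100)


# made-up distances: the first and third pairs pass the 0.6 ratio test
pairs = [(10.0, 50.0), (30.0, 40.0), (5.0, 100.0)]
print(similarity_percent(pairs, 8, 6))  # 2 good matches out of 8 keypoints -> 25
```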

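The problem is that I would also like to capture the near matches from folder 1 (i.e. photos that have a similarity ranking from 1 to 16 with the photo in folder 2), so that I can check these photos manually. I would also like these results in dataframe format for easy rendering into a visual HTML page. Here is my desired end result: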

import pandas as pd

# raw strings keep the backslashes in the Windows paths literal
data = {'Photo': [r'C:\Lucy Maud in Garden.jpg', r'C:\Henry by car.jpg', r'C:\Lucy and Henry arms together.jpg', r'C:\Lucy Maud with dog.jpg'],
        'NearMatch': [r'C:\Lucy Maud in Garden2.jpg', r'C:\Henry by car2.jpg', r'C:\Lucy and Henry arms together2.jpg', r'C:\Lucy Maud with dog2.jpg'],
        'Similarity': [1, 2, 1, 11]}

df = pd.DataFrame(data, columns=['Photo', 'NearMatch', 'Similarity'])
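Since the end goal is an HTML page, a dataframe of this shape can be turned into an HTML table with pandas' `DataFrame.to_html`. This is a minimal sketch with a single made-up row; the variable names are only for illustration.

```python
import pandas as pd

# one made-up row in the same three-column layout
preview = pd.DataFrame(
    {
        "Photo": [r"C:\Henry by car.jpg"],
        "NearMatch": [r"C:\Henry by car2.jpg"],
        "Similarity": [2],
    }
)

# index=False leaves the 0, 1, 2... row labels out of the table
html_table = preview.to_html(index=False)
print(html_table)
```

The resulting string can be written to a file or embedded in a larger page template.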

Here is my code:

from __future__ import division

import cv2
import numpy as np
import glob
import pandas as pd

# SIFT and FLANN
sift = cv2.SIFT_create()


index_params = dict(algorithm=0, trees=5)
search_params = dict()
flann = cv2.FlannBasedMatcher(index_params, search_params)

#prep the empty lists

countInner = 0
countOuter = 1
countNoMatch = 0
nearMatch = []
nearMatch2 = []
listOfSimilarities = []
listOfDisimilarities = []

# Load all the images

folder1 = r"C:/ProbablyDups/**"
folder2 = r"C:/DefinitiveCopy/**"


extensionsOnly = ('.jpeg','.jpg','.png','.tif','.tiff','.gif')

siftOut1 = {}
for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut1[a]=(kp_1,desc_1)

siftOut2 = {}
for a in glob.iglob(folder2,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut2[a]=(kp_1,desc_1)

#Compare photos in loops
for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue

    (kp_1,desc_1) = siftOut1[a]

    for b in glob.iglob(folder2,recursive=True):


        if not b.lower().endswith(extensionsOnly):

            continue

        if b.lower().endswith(extensionsOnly):

            countInner += 1


        (kp_2,desc_2) = siftOut2[b]

        matches = flann.knnMatch(desc_1, desc_2, k=2)

        good_points = []

        for m, n in matches:
            if m.distance < 0.6*n.distance:
                good_points.append(m)

        number_keypoints = 0
        if len(kp_1) >= len(kp_2):
            number_keypoints = len(kp_1)
        else:
            number_keypoints = len(kp_2)

        percentage_similarity = int(float(len(good_points)) / number_keypoints * 100)
        # add a tick to the counter if there is positive match
        if percentage_similarity > 16:
            countNoMatch += 1
        #part that is not working:
        if percentage_similarity < 16 and percentage_similarity > 0:
            nearMatch.append(a)
            nearMatch2.append(b)
            listOfSimilarities.append(percentage_similarity)
    
    if countNoMatch == 0:
        listOfDisimilarities.append(a)
        df2=pd.DataFrame({"NoMatch":listOfDisimilarities})
        zippedList =  list(zip(nearMatch,nearMatch2, listOfSimilarities))
        print(zippedList)
        nearMatch = []
        nearMatch2 = []
        final_df = pd.concat(zippedList, ignore_index=True)
    
    countNoMatch = 0
    if a.lower().endswith(extensionsOnly):
        countOuter += 1
print(final_df)

df.to_csv(r"C:/Documents/NearMatch.csv")

What I have tried:

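I tried adding a new check which asks, at the point of comparison: is this similarity ranking between 1 and 16? If so, add it to the list nearMatch2. Then, when the loop is done, the code asks a new question: is the counter (indicating no positive match above 16) zero? If so, zip the following lists together: nearMatch2, nearMatch and listOfSimilarities (representing the ranking number).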

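The problem is that when everything is done I get the data as a list of tuples, but I can't work out how to turn it into a dataframe. I have tried append, assign, loc and iloc, and concat, but none of them work. With concat, the error I get is: Error: cannot concatenate object of type '<class 'tuple'>'; only Series and DataFrame objs are valid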
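The error message points at the cause: `pd.concat` joins objects that are already Series or DataFrames, while the zipped list holds plain tuples. A short sketch of the difference follows. The file names are made up, and the column names are borrowed from the desired result above.

```python
import pandas as pd

# hypothetical stand-in for the zippedList built in the loop
zippedList = [
    ("photo1.jpg", "photo1_copy.jpg", 1),
    ("photo2.jpg", "photo2_copy.jpg", 11),
]

# the DataFrame constructor accepts a list of tuples directly
final_df = pd.DataFrame(zippedList, columns=["Photo", "NearMatch", "Similarity"])

# pd.concat is for joining existing DataFrames (or Series), for example:
more = pd.DataFrame(
    [("photo3.jpg", "photo3_copy.jpg", 2)], columns=final_df.columns
)
combined = pd.concat([final_df, more], ignore_index=True)
print(combined)
```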

python

dataframe

cv2

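1 Answer

Stack Overflow user

Answered 2021-07-18 01:57:24

Got it working - found something called extend, which adds the items of one list onto another. Still not entirely elegant, though - other solutions are welcome.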

from __future__ import division

import cv2
import numpy as np
import glob
import pandas as pd



# SIFT and FLANN
sift = cv2.SIFT_create()


index_params = dict(algorithm=0, trees=5)
search_params = dict()
flann = cv2.FlannBasedMatcher(index_params, search_params)

# prep the empty lists

countInner = 0
countOuter = 1
countNoMatch = 0
nearMatch = []
nearMatch2 = []
listOfSimilarities = []
listOfDisimilarities = []  # was missing: appended to below, so it must exist
nearMatchAgg = []
nearMatch2Agg = []
listOfSimilaritiesAgg = []


folder1 = r"/media/folderTwo/**"
folder2 = r"/media/folderOne/**"


extensionsOnly = ('.jpeg','.jpg','.png','.tif','.tiff','.gif')

siftOut1 = {}
for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut1[a]=(kp_1,desc_1)

siftOut2 = {}
for a in glob.iglob(folder2,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut2[a]=(kp_1,desc_1)


for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue

    (kp_1,desc_1) = siftOut1[a]

    for b in glob.iglob(folder2,recursive=True):


        if not b.lower().endswith(extensionsOnly):

            continue

        if b.lower().endswith(extensionsOnly):

            countInner += 1

        # print(countInner, "", countOuter, "", countNoMatch)

        # you don't need this when you are comparing two folders
        # if countInner <= countOuter:

        #     continue


        (kp_2,desc_2) = siftOut2[b]

        matches = flann.knnMatch(desc_1, desc_2, k=2)

        good_points = []

        for m, n in matches:
            if m.distance < 0.6*n.distance:
                good_points.append(m)

        number_keypoints = 0
        if len(kp_1) >= len(kp_2):
            number_keypoints = len(kp_1)
        else:
            number_keypoints = len(kp_2)

        percentage_similarity = int(float(len(good_points)) / number_keypoints * 100)
        # print(percentage_similarity)
        # a value above 16 counts as a positive match
        if percentage_similarity > 16:
            countNoMatch += 1
        # near match: 1 to 16 inclusive
        if 0 < percentage_similarity <= 16:
            nearMatch.append(a)
            nearMatch2.append(b)
            listOfSimilarities.append(percentage_similarity)
    
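    if countNoMatch == 0:
        # no positive match: keep this photo and its near matches
        listOfDisimilarities.append(a)
        df2=pd.DataFrame({"NoMatch":listOfDisimilarities})
        nearMatchAgg.extend(nearMatch)
        nearMatch2Agg.extend(nearMatch2)
        listOfSimilaritiesAgg.extend(listOfSimilarities)

    # reset the per-photo lists on every pass, so that near matches of a
    # photo that did have a positive match do not leak into the next photo
    nearMatch = []
    nearMatch2 = []
    listOfSimilarities=[]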
    
    zippedList = list(zip(nearMatchAgg,nearMatch2Agg, listOfSimilaritiesAgg))
    
    countNoMatch = 0
    if a.lower().endswith(extensionsOnly):
        countOuter += 1
dfObj = pd.DataFrame(zippedList, columns = ['Original', 'Title' , 'Similarity'])

dfObj.to_csv(r"C:/Documents/PhotoResults.csv")
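Since the answer invites tidier alternatives, one possible sketch is to collect one dict per near match in a single list and build the dataframe once at the end. That replaces the three parallel lists and their Agg twins. The function name and the sample scores are hypothetical, and the similarity percentages are assumed to have been computed already, as in the loop above.

```python
import pandas as pd

THRESHOLD = 16  # from the question: above this counts as a positive match


def near_matches(scores_by_photo):
    """Return one row per near match, skipping photos with a positive match."""
    rows = []
    for photo, scores in scores_by_photo.items():
        if any(s > THRESHOLD for s in scores.values()):
            continue  # a copy exists, nothing to review by hand
        rows.extend(
            {"Photo": photo, "NearMatch": other, "Similarity": s}
            for other, s in scores.items()
            if 0 < s <= THRESHOLD
        )
    return pd.DataFrame(rows, columns=["Photo", "NearMatch", "Similarity"])


# made-up similarity percentages: photo -> {candidate: score}
scores_by_photo = {
    "x.jpg": {"a.jpg": 3, "b.jpg": 0},   # no positive match, one near match
    "y.jpg": {"a.jpg": 40, "b.jpg": 5},  # positive match, so skipped
}
result = near_matches(scores_by_photo)
print(result)
```

Each row stays in one piece, so the three columns cannot drift out of step with one another.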

Votes: 1

Original content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/68416339
