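I have two folders of photos. The second folder is supposed to consist entirely of copies of photos in the first folder. My job is to confirm that the second folder really does consist entirely of copies.
My script takes a photo from folder 2 and compares it with every photo in folder 1. Each comparison produces a similarity value. If the similarity value is greater than 16 (indicating a positive match), a counter variable is incremented by 1. Once the photo from folder 2 has been checked against all the photos in folder 1, the counter is checked. If it is still zero, the photo is added to a list. This part of the code works and I am happy with it.
The problem is that I also want to pull out the near matches from folder 1 (i.e., photos with a similarity ranking from 1 to 16 against a photo in folder 2) so that I can check those photos manually. I also want these results in DataFrame format so they can easily be rendered into a visual HTML page. Here is the end result I am after: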
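The counting logic described above can be sketched without any image code. This is only an illustration: the `scores` table, the file names, and the helper `find_unmatched` are made up, standing in for the real SIFT/FLANN comparison.

```python
# Hypothetical similarity scores (percent), keyed by (copy, original).
# In the real script these come from the SIFT/FLANN comparison.
scores = {
    ("copy_a.jpg", "orig_a.jpg"): 42,
    ("copy_a.jpg", "orig_b.jpg"): 3,
    ("copy_x.jpg", "orig_a.jpg"): 0,
    ("copy_x.jpg", "orig_b.jpg"): 11,
}
MATCH_THRESHOLD = 16  # a score above this counts as a positive match


def find_unmatched(copies, originals, scores, threshold=MATCH_THRESHOLD):
    """Return the copies that have no positive match among the originals."""
    unmatched = []
    for copy in copies:
        match_count = 0
        for original in originals:
            if scores[(copy, original)] > threshold:
                match_count += 1
        # Counter still zero: nothing in the other folder matched this photo.
        if match_count == 0:
            unmatched.append(copy)
    return unmatched


unmatched = find_unmatched(
    ["copy_a.jpg", "copy_x.jpg"], ["orig_a.jpg", "orig_b.jpg"], scores
)
print(unmatched)
```

Here `copy_x.jpg` is the only photo whose best score (11) stays at or below the threshold, so it is the only one reported.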
import pandas as pd

data = {
    'Photo': [r'C:\Lucy Maud in Garden.jpg', r'C:\Henry by car.jpg', r'C:\Lucy and Henry arms together.jpg', r'C:\Lucy Maud with dog.jpg'],
    'NearMatch': [r'C:\Lucy Maud in Garden2.jpg', r'C:\Henry by car2.jpg', r'C:\Lucy and Henry arms together2.jpg', r'C:\Lucy Maud with dog2.jpg'],
    'Similarity': [1, 2, 1, 11],
}
df = pd.DataFrame(data, columns=['Photo', 'NearMatch', 'Similarity'])
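For the HTML step, one option is pandas' own `DataFrame.to_html`, which returns the table as an HTML string. A minimal sketch, assuming a small frame with made-up file names; how the string is embedded in a page is left open.

```python
import pandas as pd

# Made-up rows in the same shape as the desired result.
near_matches = pd.DataFrame(
    {
        "Photo": ["photos/garden.jpg", "photos/car.jpg"],
        "NearMatch": ["copies/garden2.jpg", "copies/car2.jpg"],
        "Similarity": [1, 2],
    }
)

# index=False leaves the row numbers out of the rendered table.
html = near_matches.to_html(index=False)
print(html)
```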
Here is my code:
from __future__ import division
import cv2
import numpy as np
import glob
import pandas as pd

# Sift and Flann
sift = cv2.SIFT_create()
index_params = dict(algorithm=0, trees=5)
search_params = dict()
flann = cv2.FlannBasedMatcher(index_params, search_params)

#prep the empty lists
countInner = 0
countOuter = 1
countNoMatch = 0
nearMatch = []
nearMatch2 = []
listOfSimilarities = []
listOfDisimilarities = []

# Load all the images
folder1 = r"C:/ProbablyDups/**"
folder2 = r"C:/DefinitiveCopy/**"
extensionsOnly = ('.jpeg','.jpg','.png','.tif','.tiff','.gif')

siftOut1 = {}
for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut1[a]=(kp_1,desc_1)

siftOut2 = {}
for a in glob.iglob(folder2,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut2[a]=(kp_1,desc_1)

#Compare photos in loops
for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    (kp_1,desc_1) = siftOut1[a]

    for b in glob.iglob(folder2,recursive=True):
        if not b.lower().endswith(extensionsOnly):
            continue
        if b.lower().endswith(extensionsOnly):
            countInner += 1
        (kp_2,desc_2) = siftOut2[b]
        matches = flann.knnMatch(desc_1, desc_2, k=2)
        good_points = []
        for m, n in matches:
            if m.distance < 0.6*n.distance:
                good_points.append(m)
        number_keypoints = 0
        if len(kp_1) >= len(kp_2):
            number_keypoints = len(kp_1)
        else:
            number_keypoints = len(kp_2)
        percentage_similarity = int(float(len(good_points)) / number_keypoints * 100)
        # add a tick to the counter if there is positive match
        if percentage_similarity > 16:
            countNoMatch =+1
        #part that is not working:
        if percentage_similarity < 16 and percentage_similarity > 0:
            nearMatch.append(a)
            nearMatch2.append(b)
            listOfSimilarities.append(percentage_similarity)

    if countNoMatch == 0:
        listOfDisimilarities.append(a)
        df2=pd.DataFrame({"NoMatch":listOfDisimilarities})
        zippedList = list(zip(nearMatch,nearMatch2, listOfSimilarities))
        print(zippedList)
        nearMatch = []
        nearMatch2 = []
        final_df = pd.concat(zippedList, ignore_index=True)
    countNoMatch = 0
    if a.lower().endswith(extensionsOnly):
        countOuter += 1

print(final_df)
df.to_csv(r"C:/Documents/NearMatch.csv")
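What I have tried:
I added a new check at the comparison point that asks: is this similarity ranking between 1 and 16? If so, the photo is appended to the list nearMatch2. Then, when the inner loop finishes, the code asks a second question: is the counter (which records positive matches above 16) still zero? If so, the following lists are zipped together: nearMatch2, nearMatch and listOfSimilarities (the ranking numbers).
The problem is that when everything is done I end up with the data as a list of tuples, and I don't know how to turn that into a DataFrame. I have tried append, assign, loc and iloc, and concat, but none of them work. With concat, the error I get is Error: cannot concatenate object of type '<class 'tuple'>'; only Series and DataFrame objs are valid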
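As the error message says, `pd.concat` only accepts Series and DataFrame objects, so it cannot consume a list of tuples. The `pd.DataFrame` constructor, however, takes a list of tuples directly, treating each tuple as one row. A minimal sketch with made-up file names:

```python
import pandas as pd

# Made-up stand-ins for the three parallel lists built in the loop.
near_match = ["dups/garden.jpg", "dups/car.jpg"]
near_match2 = ["copy/garden2.jpg", "copy/car2.jpg"]
similarities = [1, 2]

# zip() pairs the lists up row by row: [(photo, near match, score), ...]
zipped = list(zip(near_match, near_match2, similarities))

# Each tuple becomes one row; `columns` names the tuple positions.
df = pd.DataFrame(zipped, columns=["Photo", "NearMatch", "Similarity"])
print(df)
```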
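Posted on 2021-07-18 01:57:24
Got it working: I found something called extend, which adds the items of one list to another list. It is still not entirely elegant, though, so other solutions are welcome.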
from __future__ import division
import cv2
import numpy as np
import glob
import pandas as pd

# Sift and Flann
sift = cv2.SIFT_create()
index_params = dict(algorithm=0, trees=5)
search_params = dict()
flann = cv2.FlannBasedMatcher(index_params, search_params)

# Counters and per-photo lists
countInner = 0
countOuter = 1
countNoMatch = 0
nearMatch = []
nearMatch2 = []
listOfSimilarities = []
listOfDisimilarities = []  # photos with no positive match
# Aggregated near matches across all photos
nearMatchAgg = []
nearMatch2Agg = []
listOfSimilaritiesAgg = []

# Load all the images
folder1 = r"/media/folderTwo/**"
folder2 = r"/media/folderOne/**"
extensionsOnly = ('.jpeg','.jpg','.png','.tif','.tiff','.gif')

siftOut1 = {}
for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut1[a]=(kp_1,desc_1)

siftOut2 = {}
for a in glob.iglob(folder2,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    image1 = cv2.imread(a)
    kp_1, desc_1 = sift.detectAndCompute(image1, None)
    siftOut2[a]=(kp_1,desc_1)

for a in glob.iglob(folder1,recursive=True):
    if not a.lower().endswith(extensionsOnly):
        continue
    (kp_1,desc_1) = siftOut1[a]

    for b in glob.iglob(folder2,recursive=True):
        if not b.lower().endswith(extensionsOnly):
            continue
        if b.lower().endswith(extensionsOnly):
            countInner += 1
        # print(countInner, "", countOuter, "", countNoMatch)
        # you don't need this when you are comparing two folders
        # if countInner <= countOuter:
        #     continue
        (kp_2,desc_2) = siftOut2[b]
        matches = flann.knnMatch(desc_1, desc_2, k=2)
        # Ratio test: keep a match only if it is clearly better than the runner-up
        good_points = []
        for m, n in matches:
            if m.distance < 0.6*n.distance:
                good_points.append(m)
        number_keypoints = 0
        if len(kp_1) >= len(kp_2):
            number_keypoints = len(kp_1)
        else:
            number_keypoints = len(kp_2)
        percentage_similarity = int(float(len(good_points)) / number_keypoints * 100)
        # print(percentage_similarity)
        # Positive match: tick the counter
        if percentage_similarity > 16:
            countNoMatch += 1
        # Near match: remember the pair and its score
        if percentage_similarity < 16 and percentage_similarity > 0:
            nearMatch.append(a)
            nearMatch2.append(b)
            listOfSimilarities.append(percentage_similarity)

    # No positive match for this photo: keep its near matches for manual review
    if countNoMatch == 0:
        listOfDisimilarities.append(a)
        df2=pd.DataFrame({"NoMatch":listOfDisimilarities})
        nearMatchAgg.extend(nearMatch)
        nearMatch2Agg.extend(nearMatch2)
        listOfSimilaritiesAgg.extend(listOfSimilarities)
    # Reset the per-photo state before moving on to the next photo
    nearMatch = []
    nearMatch2 = []
    listOfSimilarities=[]
    countNoMatch = 0
    if a.lower().endswith(extensionsOnly):
        countOuter += 1

# One row per near match: (photo, near match, similarity)
zippedList = list(zip(nearMatchAgg,nearMatch2Agg, listOfSimilaritiesAgg))
dfObj = pd.DataFrame(zippedList, columns = ['Original', 'Title' , 'Similarity'])
dfObj.to_csv(r"C:/Documents/PhotoResults.csv")
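One possibly tidier variant is to keep a single list of row tuples instead of three parallel lists plus their aggregated copies, and to build the DataFrame once at the end. The sketch below is an assumption about how that could look, not a tested replacement for the script: `collect_near_matches`, `fake_scores` and `fake_similarity` are hypothetical names, and `fake_similarity` stands in for the SIFT/FLANN percentage.

```python
import pandas as pd

MATCH_THRESHOLD = 16  # same cut-off as in the script above


def collect_near_matches(photos, candidates, similarity):
    """Return (no_match, rows) for photos without a positive match.

    `similarity(a, b)` is a caller-supplied function returning a percentage.
    """
    no_match = []
    rows = []
    for a in photos:
        scored = [(b, similarity(a, b)) for b in candidates]
        # Skip photos that have at least one positive match.
        if any(score > MATCH_THRESHOLD for _, score in scored):
            continue
        no_match.append(a)
        # Keep only the near matches (score between 1 and 15) as row tuples.
        rows.extend(
            (a, b, score) for b, score in scored if 0 < score < MATCH_THRESHOLD
        )
    return no_match, rows


# Made-up scores so the sketch runs without any images.
fake_scores = {
    ("a.jpg", "x.jpg"): 40,
    ("a.jpg", "y.jpg"): 5,
    ("b.jpg", "x.jpg"): 11,
    ("b.jpg", "y.jpg"): 0,
}


def fake_similarity(a, b):
    return fake_scores.get((a, b), 0)


no_match, rows = collect_near_matches(
    ["a.jpg", "b.jpg"], ["x.jpg", "y.jpg"], fake_similarity
)
result = pd.DataFrame(rows, columns=["Original", "Title", "Similarity"])
print(no_match)
print(result)
```

Because each row is stored as one tuple, there is nothing to zip and no per-photo lists to reset between iterations.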
https://stackoverflow.com/questions/68416339