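I want to search a txt file for duplicate lines, ignoring the [p] marker and the extension when comparing. Once the matching lines are identified, only the line that does not contain [p] should be shown, with its extension. I have these lines in test.txt: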
Peliculas/Desperados (2020)[p].mp4
Peliculas/La Duquesa (2008)[p].mp4
Peliculas/Nueva York Año 2012 (1975).mkv
Peliculas/Acoso en la noche (1980) .mkv
Peliculas/Angustia a Flor de Piel (1982).mkv
Peliculas/Desperados (2020).mkv
Peliculas/Angustia (1947).mkv
Peliculas/Días de radio (1987) BR1080[p].mp4
Peliculas/Mona Lisa (1986) BR1080[p].mp4
Peliculas/La decente (1970) FlixOle WEB-DL 1080p [Buzz][p].mp4
Peliculas/Mona Lisa (1986) BR1080.mkv
In this file, lines 1 and 6 and lines 9 and 11 are the same (ignoring the extension and [p]). Desired output:
Peliculas/Desperados (2020).mkv
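Peliculas/Mona Lisa (1986) BR1080.mkv
I tried the following, but it only shows the matching lines with the extension and the [p] pattern removed, so I cannot tell which original line is the right one. I need the complete line.
sed 's/\[p\]//' ./test.txt | sed 's/\.[^.]*$//' | sort | uniq -d
Wrong output (extension missing):
Peliculas/Desperados (2020)
Peliculas/Mona Lisa (1986) BR1080
Posted on 2020-07-07 00:27:36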
Since you mentioned Bash...
Remove any line containing p:
cat test.txt | grep -v p
home/folder/house from earth.mkv
home/folder3/window 1.avi
Remove any line containing [p]:
cat test.txt | grep -v '\[p\]'
home/folder/house from earth.mkv
home/folder3/window 1.avi
home/folder4/little mouse.mpg
That is unlikely to be what you need, so just in case: remove [p] from every line, then sort and deduplicate:
cat test.txt | sed 's/\[p\]//g' | sort | uniq
home/folder/house from earth.mkv
home/folder/house from earth.mp4
home/folder2/test.mp4
home/folder3/window 1.avi
home/folder3/window 1.mp4
home/folder4/little mouse.mpg
Posted on 2020-07-07 00:38:27
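In Python, you can use itertools.groupby with a function that builds a key from the file name with any [p] and the extension removed.
For any group of size 2 or more, every file name that does not contain "[p]" is printed.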
import itertools
import re

def make_key(line):
    # Comparison key: the name with "[p]" and the extension removed.
    return re.sub(r'\.[^.]*$', '', line.replace('[p]', ''))

with open('test.txt') as f:
    lines = [line.strip() for line in f]

# groupby only groups adjacent items, so sort by the same key first.
lines.sort(key=make_key)

for key, group in itertools.groupby(lines, make_key):
    files = list(group)
    if len(files) > 1:
        for file in files:
            if '[p]' not in file:
                print(file)
This gives:
home/folder/house from earth.mkv
home/folder3/window 1.avi
Posted on 2020-07-07 03:59:11
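One caveat for the itertools.groupby approach above: groupby only merges adjacent items that share a key, so the lines need to be sorted by that key (or already arrive grouped) before pairs such as the two Desperados entries can be detected. A minimal sketch of the effect follows; group_sizes is a hypothetical helper introduced here for illustration, and the three sample lines are copied from the question's test.txt.

```python
import itertools
import re


def make_key(line):
    # Same key as in the answer: drop "[p]", then drop the final extension.
    return re.sub(r'\.[^.]*$', '', line.replace('[p]', ''))


def group_sizes(lines):
    # Size of each run of adjacent lines that share a key (hypothetical helper).
    return [len(list(group)) for _key, group in itertools.groupby(lines, make_key)]


# Three lines from the question's test.txt, in their original relative order.
sample = [
    'Peliculas/Desperados (2020)[p].mp4',
    'Peliculas/La Duquesa (2008)[p].mp4',
    'Peliculas/Desperados (2020).mkv',
]

print(group_sizes(sample))                        # [1, 1, 1]: the pair is split
print(group_sizes(sorted(sample, key=make_key)))  # [2, 1]: the pair is adjacent
```

Sorting changes the order of the output, so if the original file order matters, a dictionary keyed by make_key may be a better fit than groupby.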
If a solution that reads test.txt twice is acceptable, please try:
declare -A ary                      # associate the filename with the base
while IFS= read -r file; do
    if [[ $file != *\[p\]* ]]; then # the filename does not include "[p]"
        base="${file%.*}"           # remove the extension
        ary[$base]="$file"          # create a map
    fi
done < test.txt

while IFS= read -r base; do
    echo "${ary[$base]}"
done < <(sed 's/\[p\]//' ./test.txt | sed 's/\.[^.]*$//' | sort | uniq -d)
Output:
Peliculas/Desperados (2020).mkv
Peliculas/Mona Lisa (1986) BR1080.mkv
If you prefer a one-pass solution (which will be faster), please try:
declare -A ary                      # associate the filename with the base
declare -A count                    # count the occurrences of the base
while IFS= read -r file; do
    base="${file%.*}"               # remove the extension
    if [[ $base =~ (.*)\[p\](.*) ]]; then
                                    # "$base" contains the substring "[p]"
        (( count[${BASH_REMATCH[1]}${BASH_REMATCH[2]}]++ ))
                                    # increment the counter
    else
        (( count[$base]++ ))        # increment the counter
        ary[$base]="$file"          # map the filename
    fi
done < test.txt

for base in "${!ary[@]}"; do        # loop over the keys of ${ary[@]}
    if (( count[$base] > 1 )); then
                                    # it duplicates
        echo "${ary[$base]}"
    fi
done
https://stackoverflow.com/questions/62766221
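For comparison, the one-pass idea above (count every base name, remember the line without [p]) can be sketched in Python with two dictionaries. This is only an illustration under stated assumptions: the function name unmarked_duplicates is hypothetical, the key is built as in the Python answer (drop [p], then drop the extension), and the sample list stands in for reading test.txt.

```python
import re
from collections import Counter


def make_key(line):
    # Comparison key: drop the "[p]" marker, then drop the final extension.
    return re.sub(r'\.[^.]*$', '', line.replace('[p]', ''))


def unmarked_duplicates(lines):
    """Return the lines without "[p]" whose key occurs more than once."""
    counts = Counter()  # key -> number of lines sharing that key
    unmarked = {}       # key -> last line seen without "[p]"
    for line in lines:
        key = make_key(line)
        counts[key] += 1
        if '[p]' not in line:
            unmarked[key] = line
    # Dicts preserve insertion order (Python 3.7+), so the result follows
    # the order in which the unmarked keys first appeared in the input.
    return [line for key, line in unmarked.items() if counts[key] > 1]


# The lines of test.txt from the question, inlined so the sketch is self-contained.
sample = [
    'Peliculas/Desperados (2020)[p].mp4',
    'Peliculas/La Duquesa (2008)[p].mp4',
    'Peliculas/Nueva York Año 2012 (1975).mkv',
    'Peliculas/Acoso en la noche (1980) .mkv',
    'Peliculas/Angustia a Flor de Piel (1982).mkv',
    'Peliculas/Desperados (2020).mkv',
    'Peliculas/Angustia (1947).mkv',
    'Peliculas/Días de radio (1987) BR1080[p].mp4',
    'Peliculas/Mona Lisa (1986) BR1080[p].mp4',
    'Peliculas/La decente (1970) FlixOle WEB-DL 1080p [Buzz][p].mp4',
    'Peliculas/Mona Lisa (1986) BR1080.mkv',
]

for line in unmarked_duplicates(sample):
    print(line)
```

On the sample this prints the two lines listed as the desired output in the question. Unlike looping over the keys of a Bash associative array, whose order is unspecified, the result here keeps the input order.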