专栏首页SnailTyanLinux删除重复文件

Linux删除重复文件

文章作者:Tyan 博客:noahsnail.com | CSDN | 简书

1. 引言

在Linux系统处理数据时,经常会遇到删除重复文件的问题。例如,在进行图片分类任务时,希望删除训练数据中的重复图片。在Linux系统中,存在一个fdupes命令可以查找并删除重复文件。

2. Fdupes介绍

Fdupes是Adrian Lopez用C语言编写的Linux实用程序,它能够在给定的目录和子目录集中找到重复文件,Fdupes通过比较文件的MD5签名然后进行字节比较来识别重复文件。其比较顺序为:

大小比较 > 部分MD5签名比较 > 完整MD5签名比较 > 字节比较

3. 安装fdupes

以CentOS系统为例,fdupes的安装命令为:

sudo yum install -y fdupes

4. fdupes的使用

删除重复文件,并且不需要询问用户:

$ fdupes -dN [folder_name]

其中,-d参数表示保留一个文件,并删除其它重复文件,-N-d一起使用,表示保留第一个重复文件并删除其它重复文件,不需要提示用户。

使用说明:

$ fdupes -h
Usage: fdupes [options] DIRECTORY...

 -r --recurse           for every directory given follow subdirectories
                        encountered within
 -R --recurse:          for each directory given after this option follow
                        subdirectories encountered within (note the ':' at
                        the end of the option, manpage for more details)
 -s --symlinks          follow symlinks
 -H --hardlinks         normally, when two or more files point to the same
                        disk area they are treated as non-duplicates; this
                        option will change this behavior
 -n --noempty           exclude zero-length files from consideration
 -A --nohidden          exclude hidden files from consideration
 -f --omitfirst         omit the first file in each set of matches
 -1 --sameline          list each set of matches on a single line
 -S --size              show size of duplicate files
 -m --summarize         summarize dupe information
 -q --quiet             hide progress indicator
 -d --delete            prompt user for files to preserve and delete all
                        others; important: under particular circumstances,
                        data may be lost when using this option together
                        with -s or --symlinks, or when specifying a
                        particular directory more than once; refer to the
                        fdupes documentation for additional information
 -N --noprompt          together with --delete, preserve the first file in
                        each set of duplicates and delete the rest without
                        prompting the user
 -I --immediate         delete duplicates as they are encountered, without
                        grouping into sets; implies --noprompt
 -p --permissions       don't consider files with different owner/group or
                        permission bits as duplicates
 -o --order=BY          select sort order for output and deleting; by file
                        modification time (BY='time'; default), status
                        change time (BY='ctime'), or filename (BY='name')
 -i --reverse           reverse order while sorting
 -v --version           display fdupes version
 -h --help              display this help message

参考资料

  1. https://www.tecmint.com/fdupes-find-and-delete-duplicate-files-in-linux/
  2. https://www.howtoing.com/fdupes-find-and-delete-duplicate-files-in-linux
  3. http://www.runoob.com/linux/linux-comm-who.html 关注

本文参与腾讯云自媒体分享计划,欢迎正在阅读的你也加入,一起分享。

我来说两句

0 条评论
登录 后参与评论

相关文章

  • Batch Normalization论文翻译——中英文对照

    Batch Normalization: Accelerating Deep Network Training by Reducing Internal Cov...

    Tyan
  • Linux下shell显示用户名和主机名

    版权声明:博客文章都是作者辛苦整理的,转载请注明出处,谢谢! https://blog.cs...

    Tyan
  • U-Net - Convolutional Networks 论文翻译——中英文对照

    翻译论文汇总:https://github.com/SnailTyan/deep-learning-papers-translation

    Tyan
  • POJ-1887 Testing the CATCHER(dp,最长下降子序列)

    Testing the CATCHER Time Limit: 1000MS Memory Limit: 30000K Total Submi...

    ShenduCC
  • 简单web安全入门

    Secure by default, in software, means that the default configuration settings ar...

    linjinhe
  • Solution for wear-leveling

    Flash is a type of electrically-erasable programmable read-only memory (EEPROM) ...

    anytao
  • No thread-bound request found异常

    本文主要研究下spring mvc的No thread-bound request found异常

    codecraft
  • 为 VUE 项目添加 PWA 解决发布后刷新报错问题

    下面这些文件忘记出处是哪,Github也能搜到一些,之前写的 PWA 的 Demo 里面拿过来的~

    易墨
  • 机器学习基本概念-3

    前两篇介绍了ML中的一些基本概念,还有一些很重要的概念也还没有说到,作为入门教程还是需要直观点,所以先举个最简单的例子线性回归(linear regresion...

    GavinZhou
  • leetcode450. Delete Node in a BST

    假设有一棵二叉搜索树,现在要求从二叉搜索树中删除指定值,使得删除后的结果依然是一棵二叉搜索树。

    眯眯眼的猫头鹰

扫码关注云+社区

领取腾讯云代金券