Q: How do I remove duplicate words in Notepad++?

Stack Overflow user

Asked on 2016-03-13 23:25:31


I have a large text file that looks like this:

Mitchel-2
Anna-2
Witold-4
Serena-3
Serena-9
Witros-3

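I need the first word, the one before the "-", to never repeat. Any method that deletes every occurrence except the first is fine. So if I have 3000 lines that start with "Serena", each with a different number after the "-", is there a way to delete 2999 of the Serena lines and keep only the first one?

Also, Serena is just an example; I have more than 200 repeated words.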

duplicates

notepad++

1 Answer

Stack Overflow user

Answered on 2016-03-14 03:10:55

I don't think you can do this with Notepad++. You could use one regular expression per name, but since you have more than 200 names, that is impractical.

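But you can write a program to do the job for you. Basically, you go through two steps:

1) Search for every unique name and save it in a set (a container that does not allow duplicate entries).
2) For each unique name in the set, search the file for its duplicates.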
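The two steps above can be sketched without regular expressions, which may be easier to port between compilers. This is only a minimal sketch: the helper names `name_of`, `collect_names` and `duplicates_of` are made up for illustration, and it assumes the name is everything before the first "-" on a line.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: the part of a line before the first '-'.
// If there is no '-', the whole line is treated as the name.
std::string name_of(const std::string& line) {
    return line.substr(0, line.find('-'));
}

// Step 1: collect every unique name in a set (duplicates are ignored).
std::set<std::string> collect_names(const std::string& text) {
    std::set<std::string> names;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) names.insert(name_of(line));
    }
    return names;
}

// Step 2: for one name, return every matching line after the first one.
std::vector<std::string> duplicates_of(const std::string& name,
                                       const std::string& text) {
    std::vector<std::string> dups;
    std::istringstream in(text);
    std::string line;
    bool seen_first = false;
    while (std::getline(in, line)) {
        if (line.empty() || name_of(line) != name) continue;
        if (seen_first) dups.push_back(line); // skip only the first occurrence
        seen_first = true;
    }
    return dups;
}
```

Calling `duplicates_of` once for each entry returned by `collect_names` gives the list of lines that would have to be deleted.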

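I've written a simple C++ program that finds the duplicates in a string variable. You can adapt it to the language you prefer. I compiled it with Microsoft Visual Studio Community 2015 (it does not work on cpp.sh).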

#include "stdafx.h" // Visual Studio precompiled header; remove this line for other compilers
#include <regex>
#include <string>
#include <iostream>
#include <set>

using namespace std;

int main()
{

    typedef match_results<const char*> cmatch;
    set<string> names;

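    string notepad_text = "Serena-1\nSerena-2\nSerena-3\nSerena-4\nAna-1\nSerena-5\nWilson-1\nAna-2\nJohn-1\nAna-3\nJohn-2\nWilson-2";
    //double backslashes are needed because this is in a string literal
    //note: this relies on ^ matching at the start of every line, which may not hold with every standard library implementation
    regex regex_find_names("^\\w+");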

    // 1) Let's find every name

    //sregex_iterator it_beg(notepad_text.begin(), notepad_text.end(), regex_find_names);
    sregex_iterator find_names_itit(notepad_text.begin(), notepad_text.end(), regex_find_names);
    sregex_iterator it_end; //defaults to the end condition

    while (find_names_itit != it_end) {
        names.insert(find_names_itit->str()); //automatically deletes duplicates
        ++find_names_itit;
    }

    // 2) For demonstration purposes, let's print what we've found

    cout << "---printing the names we've found:\n\n";
    set<string>::const_iterator names_it; // declare an iterator
    names_it = names.begin();             // assign it to the start of the set
    while (names_it != names.end())       // while it hasn't reached the end
    {
        cout << *names_it << " ";
        ++names_it; 
    }

    // 3) Let's find the duplicates

    cout << "\n\n---printing the regex matches:\n";

    string current_name;
    set<string>::const_iterator current_name_it; //this iterates over every name we've found
    current_name_it = names.begin();
    while (current_name_it != names.end())
    {
        // we're building something like "^Serena.*"
        current_name = "^"; 
        current_name += *current_name_it; 
        current_name += ".*"; 
        cout << "\n-Lets find duplicates of: " << *current_name_it << endl;
        ++current_name_it;

        // let's iterate through the matches
        regex regex_obj(current_name); //matches every line that starts with the current name
        sregex_iterator it_beg(notepad_text.begin(), notepad_text.end(), regex_obj);
        sregex_iterator it(notepad_text.begin(), notepad_text.end(), regex_obj); //this iterates over the match results
        sregex_iterator it_end;
        //string res = *it;

        while (it != it_end) {
            if (it != it_beg)
            {
                cout << it->str() << endl;
            }
            ++it;

        }

    }


    int i; //depending on the compiler, reading this extra input is necessary to keep the console window open
    cin >> i;
    return 0;
}

The input string is:

Serena-1
Serena-2
Serena-3
Serena-4
Ana-1
Serena-5
Wilson-1
Ana-2
John-1
Ana-3
John-2
Wilson-2

and the program prints:

---printing the names we've found:

Ana John Serena Wilson

---printing the regex matches:

-Lets find duplicates of: Ana
Ana-2
Ana-3

-Lets find duplicates of: John
John-2

-Lets find duplicates of: Serena
Serena-2
Serena-3
Serena-4
Serena-5

-Lets find duplicates of: Wilson
Wilson-2
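The program above only prints the duplicates. If the goal is to produce the cleaned text directly, a single pass over the lines is enough. The following is a minimal sketch, not part of the original answer: `keep_first_per_name` is a hypothetical helper, and it assumes the name is everything before the first "-" on a line.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical helper: keeps only the first line for each name,
// where the name is the text before the first '-'.
// Lines without a '-' are treated as a name on their own.
std::string keep_first_per_name(const std::string& text) {
    std::unordered_set<std::string> seen;
    std::istringstream in(text);
    std::ostringstream out;
    std::string line;
    while (std::getline(in, line)) {
        std::string name = line.substr(0, line.find('-'));
        // insert() reports whether the name was new; only new names are kept
        if (seen.insert(name).second) out << line << '\n';
    }
    return out.str();
}
```

The result can be written to a new file and opened in Notepad++. The original order of the lines is preserved, and with 200 names and a few thousand lines the lookups in the set are cheap.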


Original content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/35972083
