Q: Removing duplicate words from an unsorted list

Code Review user

Asked 2014-02-13 19:34:29

Answers 1, Views 250, Followers 0, Votes 6

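I had been removing duplicate words by brute force, because the lists were really small. But I want a solution that won't become too slow as the input grows.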

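This function creates a binary tree and inserts every word from the list into it, then collects the unique words without sorting them. Duplicate words are handled during insertion. For the tree, I use roughly the same code as in "Unbalanced binary tree".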

#include "bst.h"
#include <strings.h>
#include <stdlib.h>

#define LIST_TERMINATOR 1

static size_t i = 0;
static char **final_list;

static void insert(void *word)
{
    final_list[i++] = word;
}

char **unique_words(const char **words)
{
    //Binary tree containing the words
    BST unique;
    bst_init(&unique, (int(*)(const void *, const void *))strcasecmp);

    //Every word will be inserted at most 1 time
    while(*words != NULL){
        if(bst_insert(&unique, (void *)*words) == BST_NO_MEMORY){
            bst_free(&unique);
            return NULL;
        }

        ++words;
    }

    //Array to return
    final_list = malloc(sizeof(char *) * (unique.node_count + LIST_TERMINATOR));
    if(final_list == NULL){
        bst_free(&unique);
        return NULL;
    }

    //Collect words without sorting, so if the list is merged with another 
    //and passed again, the tree won't become a linked list
    if(bst_iterate_top_down(&unique, insert) == BST_NO_MEMORY){
        free(final_list);
        bst_free(&unique);
        i = 0; //Reset the static index so a later call starts from 0
        return NULL;
    }
    final_list[i] = NULL;

    bst_free(&unique);

    //Clear state
    i = 0;

    return final_list;
}

Would it be faster to sort the input and then remove the duplicates?

performance

algorithm

1 Answer

Code Review user

Accepted answer

Posted 2014-02-14 04:45:04

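The question comes down to the number and order of operations. If you are building a large list but removing a large number of duplicates, either use a hash table, or sort the list and then make it unique.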

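Sorting the list and making it unique is O(n log n) at best. Removing one duplicate is O(n) in the worst case, so removing m duplicates is O(m * n). Although in general O(k * n) = O(n) for a constant k, m is not a constant here: once m exceeds log n, the one-time cost of sorting pays for itself. Lookups in a hash table, by contrast, stay O(1) on average.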
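
To make the sort-then-unique option concrete, here is a minimal sketch in C. It is not the asker's code: the function name `unique_words_sorted` and the helper `cmp_words` are hypothetical, and it assumes a POSIX system (for `strcasecmp`) and a NULL-terminated input array like the one in the question.

```c
#include <stdlib.h>
#include <strings.h>  /* strcasecmp (POSIX, not ISO C) */

/* qsort comparator: each element is a `const char *`, so qsort hands us
 * pointers to those pointers. */
static int cmp_words(const void *a, const void *b)
{
    const char *const *wa = a;
    const char *const *wb = b;
    return strcasecmp(*wa, *wb);
}

/* Hypothetical helper: returns a newly allocated, NULL-terminated array
 * holding one representative of each case-insensitively distinct word,
 * in sorted order. Returns NULL if allocation fails. The strings are
 * not copied, and the caller frees the returned array. */
const char **unique_words_sorted(const char *const *words)
{
    size_t n = 0;
    while (words[n] != NULL)
        ++n;

    const char **out = malloc(sizeof *out * (n + 1));
    if (out == NULL)
        return NULL;

    for (size_t k = 0; k < n; ++k)
        out[k] = words[k];

    /* O(n log n) in typical implementations */
    qsort(out, n, sizeof *out, cmp_words);

    /* O(n): equal words are now adjacent, so keep a word only if it
     * differs from the last one kept. */
    size_t kept = 0;
    for (size_t k = 0; k < n; ++k) {
        if (kept == 0 || strcasecmp(out[kept - 1], out[k]) != 0)
            out[kept++] = out[k];
    }
    out[kept] = NULL;
    return out;
}
```

Two trade-offs are worth noting. The result comes back sorted, which is exactly what the question wanted to avoid when feeding the list back into an unbalanced tree. And because `qsort` is not guaranteed to be stable, which spelling survives among "Apple" and "apple" is unspecified.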

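You have to evaluate the cost with actual values of m and n to decide which approach is better. Of course, those values shift over time as the relative costs of CPU, RAM, and disk change. Still, for most "likely" values of m and n, rely on the standard library's collection classes.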
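
ISO C has no ready-made set type, so in C the hash-table option mentioned above has to be hand-rolled or taken from a library. The following is a rough sketch only, assuming POSIX `strcasecmp` and the "C" locale. The names `unique_words_hashed` and `hash_word` are hypothetical, and the table is a simple open-addressing design chosen for brevity.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <strings.h>  /* strcasecmp (POSIX, not ISO C) */

/* FNV-1a over lower-cased bytes, so "Apple" and "apple" hash alike.
 * Assumes tolower() and strcasecmp() agree on case folding, which
 * holds in the "C" locale. */
static uint32_t hash_word(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; ++s) {
        h ^= (uint32_t)tolower((unsigned char)*s);
        h *= 16777619u;
    }
    return h;
}

/* Hypothetical helper: returns a newly allocated, NULL-terminated array
 * of the case-insensitively distinct words, keeping the first
 * occurrence of each in input order. Returns NULL if allocation fails.
 * The strings are not copied, and the caller frees the returned array. */
const char **unique_words_hashed(const char *const *words)
{
    size_t n = 0;
    while (words[n] != NULL)
        ++n;

    /* Power-of-two capacity of at least 2n keeps the load factor at or
     * below 0.5, so every probe sequence reaches an empty slot.
     * (Overflow of 2 * n is ignored in this sketch.) */
    size_t cap = 8;
    while (cap < 2 * n)
        cap *= 2;

    const char **table = malloc(sizeof *table * cap);
    const char **out = malloc(sizeof *out * (n + 1));
    if (table == NULL || out == NULL) {
        free(table);
        free(out);
        return NULL;
    }
    for (size_t s = 0; s < cap; ++s)
        table[s] = NULL;

    size_t kept = 0;
    for (size_t k = 0; k < n; ++k) {
        /* Linear probing: stop at an empty slot or at an equal word. */
        size_t slot = hash_word(words[k]) & (cap - 1);
        while (table[slot] != NULL &&
               strcasecmp(table[slot], words[k]) != 0)
            slot = (slot + 1) & (cap - 1);

        if (table[slot] == NULL) {  /* first time this word is seen */
            table[slot] = words[k];
            out[kept++] = words[k];
        }
    }
    out[kept] = NULL;
    free(table);
    return out;
}
```

Unlike sorting, this keeps the words in their original relative order, so the result is no more sorted than the input was. The expected cost is O(n) overall, at the price of a temporary table of roughly 2n pointers.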

Votes 5

Original page content provided by Code Review. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://codereview.stackexchange.com/questions/41581
