I have a file like this:
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
I want to generate a two-column list. The first column shows the words that occur, the second column shows how often they occur, for example:
this@1
is@1
a@1
file@1
with@1
many@1
words@3
some@2
of@2
the@2
only@1
appear@2
more@1
than@1
one@1
once@1
time@1
words and word should count as two separate words. So far, this is what I have:
sed -i "s/ /\n/g" ./file1.txt # put all words on a new line
while read line
do
count="$(grep -c $line file1.txt)"
echo $line"@"$count >> file2.txt # add word and frequency to file
done < ./file1.txt
sort -u -d # remove duplicate lines
For some reason, this only shows "0" after each word.
How can I generate a list of every word that appears in the file, along with its frequency?
Posted on 2012-05-11 22:05:35
Not sed and grep, but tr, sort, uniq, and awk:
% (tr ' ' '\n' | sort | uniq -c | awk '{print $2"@"$1}') <<EOF
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
EOF
a@1
appear@2
file@1
is@1
many@1
more@1
of@2
once.@1
one@1
only@1
Some@2
than@1
the@2
This@1
time.@1
with@1
words@2
words.@1
In most cases you also want to remove numbers and punctuation, convert everything to lowercase (otherwise "THE", "The" and "the" are counted separately), and suppress entries for zero-length words. For ASCII text you can do all of that with this modified command:
sed -e 's/[^A-Za-z]/ /g' text.txt | tr 'A-Z' 'a-z' | tr ' ' '\n' | grep -v '^$'| sort | uniq -c | sort -rn
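For illustration, here is the same pipeline applied to the sample text from the question, fed through a here-document instead of a text.txt file:

```shell
# Strip non-letters, lowercase, split into one word per line,
# drop empty lines, then count and sort by frequency (descending).
sed -e 's/[^A-Za-z]/ /g' <<'EOF' | tr 'A-Z' 'a-z' | tr ' ' '\n' | grep -v '^$' | sort | uniq -c | sort -rn
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
EOF
# The most frequent word, "words" (3 occurrences, since the trailing
# period is stripped), now appears on the first output line.
```

Note that "This" and "the" are folded together here, which is usually what you want for word-frequency counts.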
Posted on 2014-10-03 06:37:17
uniq -c already does what you want; you just need to sort the input first:
echo 'a s d s d a s d s a a d d s a s d d s a' | tr ' ' '\n' | sort | uniq -c
输出:
6 a
7 d
7 s
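To turn this uniq -c output into the word@count format the question asked for, an awk step can be appended, as in the first answer. A sketch using the same sample input:

```shell
# Split words onto separate lines, count duplicates, then swap the
# columns of uniq -c output into word@count form with awk.
echo 'a s d s d a s d s a a d d s a s d d s a' \
  | tr ' ' '\n' | sort | uniq -c | awk '{print $2 "@" $1}'
# prints:
#   a@6
#   d@7
#   s@7
```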
Posted on 2018-11-26 20:21:17
You can use tr for this task; just run
tr ' ' '\12' <NAME_OF_FILE| sort | uniq -c | sort -nr > result.txt
Example output for a text file of city names:
3026 Toronto
2006 Montréal
1117 Edmonton
1048 Calgary
905 Ottawa
724 Winnipeg
673 Vancouver
495 Brampton
489 Mississauga
482 London
467 Hamilton
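The '\12' in that tr command is the octal escape for a newline (ASCII 10), so it is equivalent to writing '\n'. A quick check on a small made-up input:

```shell
# Both escapes translate spaces to newlines, so the two pipelines
# produce identical frequency counts.
printf 'Toronto Toronto Ottawa' | tr ' ' '\12' | sort | uniq -c | sort -nr
printf 'Toronto Toronto Ottawa' | tr ' ' '\n'  | sort | uniq -c | sort -nr
# each prints:
#   2 Toronto
#   1 Ottawa
```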
https://stackoverflow.com/questions/10552803