Find the most repeated word in a folder

Date: 2021-10-01 19:14:48

Is there a way in Linux to find the most repeated word in the files of the current folder and its subfolders? I need this to find the most used C++ classes in my project. The output could look like this:

class alpha : 157,
class beta  : 98,
class gamma : 13,
...

This means 157 references to the class alpha, and so on.

Can this be done using a Linux command (maybe grep)? Or should I use a dedicated tool for this?

1 answer

#1


To find the most used words within the files, you can use

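grep -rhoE "\w+" . | sort | uniq -c | sort -g

This counts every word in the files of the current folder and its subfolders (-r), as your question asked for. The output is sorted in ascending order, so the most frequent words come last.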

grep -rhoE "\w{7,}" . | sort | uniq -c | sort -g

This counts only words that are longer than six characters.
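Since the goal is C++ class names, it may help to limit the search to source files and list the most frequent words first. The following is only a sketch: the function name `count_words` and the list of extensions (`*.cpp`, `*.h`, `*.hpp`) are assumptions made for illustration, and `--include` and `\w` rely on GNU grep.

```shell
# Hypothetical helper: count words in C++ sources under a directory,
# most frequent first.
# Usage: count_words DIR [MIN_LENGTH]
count_words() {
    dir="${1:-.}"   # directory to search, defaults to the current one
    min="${2:-1}"   # minimum word length, defaults to 1
    # -r: recurse, -h: no file names, -o: print only the matches
    grep -rhoE --include='*.cpp' --include='*.h' --include='*.hpp' \
        "\w{${min},}" "$dir" \
        | sort | uniq -c | sort -rn
}
```

For example, `count_words . 7 | head -n 10` would print the ten most frequent words of seven or more characters. Note that this still counts every word, including keywords and comments, not just class names.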
