Is there a way in Linux to find the most repeated words in the files of the current folder and its subfolders? I need this to find the most used C++ classes in my project. The output could look like this:
class alpha : 157,
class beta : 98,
class gamma : 13,
...
meaning there are 157 references to the class alpha, and so on.
Can this be done using a Linux command (maybe grep), or should I use a dedicated tool for this?
1 Solution
#1
To find the most used words within the files, you can use:
grep -rhoE "\w+" . | sort | uniq -c | sort -g
This counts all words, as your question asked, and lists them in ascending order of count, so the most frequent words come last.
grep -rhoE "\w{7,}" . | sort | uniq -c | sort -g
This counts only words that are longer than six characters, that is, seven characters or more.
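If you only want class names, in the output format from the question, the same pipeline can be narrowed down. The following is a rough sketch rather than a definitive solution. It assumes GNU grep, assumes your sources use the extensions .cpp, .h and .hpp, and treats any identifier that follows the keyword "class" as a class name (so template parameters such as "template <class T>" get picked up too). The function name count_class_refs is made up for this example.

```shell
# Hypothetical helper: count how often each declared class name occurs
# under a directory (defaults to the current one).
count_class_refs() {
    dir="${1:-.}"
    names=$(mktemp)

    # Step 1: collect every identifier that follows the keyword "class".
    grep -rhoE --include='*.cpp' --include='*.h' --include='*.hpp' \
        'class[[:space:]]+[A-Za-z_][A-Za-z0-9_]*' "$dir" \
        | awk '{ print $2 }' | sort -u > "$names"

    # Step 2: split the sources into words, keep only words that exactly
    # match a collected name, then count and print the most frequent first.
    grep -rhoE --include='*.cpp' --include='*.h' --include='*.hpp' '\w+' "$dir" \
        | grep -Fxf "$names" \
        | sort | uniq -c | sort -rn \
        | awk '{ printf "class %s : %d,\n", $2, $1 }'

    rm -f "$names"
}
```

Calling count_class_refs . from the project root prints one line per class. Note that the count includes the declaration itself, as well as any mention in comments or strings, so treat the numbers as an approximation.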