I am writing a program to read a file and count the occurrences of specific words within that file.
I have got the code working up to a point. I put the words I want to count in a String[]. The problem is that the program either counts the occurrences of all the words in the file (including the ones I do not want to count), or it counts the words in the String[] itself rather than in the file.
How do I go about getting the program to count the words in the file that match the words in the array? I've looked through many similar questions and have tried using StringTokenizer and Lists but can't get them fully working either.
My aim is that if my file has the text " yellow red blue white black purple blue", I want my output to be "red: 1, blue: 2, yellow: 1"
I just want a nudge in the right direction. I know it is something silly I am stuck on, and as always, any constructive feedback is appreciated.
Here is my code so far:
static String[] words = { "red", "blue", "yellow", "green" };

public static void main(String[] args) throws FileNotFoundException, IOException {
    System.out.println("This program will count the occurences of the specific words from a text file.");
    System.out.println("\nThe words to be counted are; red, blue, yellow, and green.\n");
    Map map = new HashMap();
    try (BufferedReader br = new BufferedReader(new FileReader("colours.txt"))) {
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();
        while (line != null) {
            words = line.split(" "); // keeping this counts all words separated by whitespace, removing it counts words in my array instead of the file, so I'll get red: 1, blue: 1, yellow: 1 etc.,
            for (int i = 0; i < words.length; i++) {
                if (map.get(words[i]) == null) {
                    map.put(words[i], 1);
                }
                else {
                    int newValue = Integer.valueOf(String.valueOf(map.get(words[i])));
                    newValue++;
                    map.put(words[i], newValue);
                }
            }
            sb.append(System.lineSeparator());
            line = br.readLine();
        }
    }
    Map<String, String> sorted = new TreeMap<String, String>(map);
    for (Object key : sorted.keySet()) {
        System.out.println(key + ": " + map.get(key));
    }
}
2 Solutions
#1
1
The main issue above is that you are overwriting the initial array of words when you split the line that you just read.
I have written this (I modified the variable names a bit for my own understanding):
(updated based on comments, thanks @shmosel)
// Imports needed by this snippet:
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public static void main(String[] args) throws FileNotFoundException, IOException {
    String[] keywords = {"red", "blue", "yellow", "green"};
    // a List makes it easy to ask "is this word one of the keywords?"
    List<String> keywordList = Arrays.asList(keywords);
    System.out.println("This program will count the occurrences of the specific words from a text file.");
    System.out.println("\nThe words to be counted are: " + keywordList + ".\n");
    Map<String, Integer> wordMap = new HashMap<>();
    try (BufferedReader br = new BufferedReader(new FileReader("/path/to/file/colours.txt"))) {
        // read a line
        String line = br.readLine();
        while (line != null) {
            // split into a separate local array so the keyword array is never overwritten
            String[] words = line.split(" ");
            for (String oneWord : words) {
                // count only the words that appear in the keyword list
                if (keywordList.contains(oneWord)) {
                    // thanks @shmosel for the improvement suggested in comments
                    wordMap.merge(oneWord, 1, Integer::sum);
                }
            }
            line = br.readLine();
        }
    }
    // a TreeMap sorts the keys alphabetically for printing
    Map<String, Integer> sorted = new TreeMap<>(wordMap);
    for (Map.Entry<String, Integer> entry : sorted.entrySet()) {
        System.out.println(entry.getKey() + ": " + entry.getValue());
    }
}
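If the output should follow the order of the keyword list (as in the question's "red: 1, blue: 2, yellow: 1") rather than alphabetical order, one possible variation is to pre-seed a LinkedHashMap with the keywords and only increment keys that are already present. This is a sketch, not the code from the answer above: the class name KeywordCounter and the method countKeywords are made up for illustration, the input is an in-memory line instead of a file, and List.of assumes Java 9 or later.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class KeywordCounter {

    // Hypothetical helper: counts only the tokens that are listed in `keywords`.
    // The returned map keeps the keywords in the order they were given.
    public static Map<String, Integer> countKeywords(Iterable<String> lines, List<String> keywords) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String keyword : keywords) {
            counts.put(keyword, 0); // pre-seed, so the map itself acts as the filter
        }
        for (String line : lines) {
            // trim + "\\s+" tolerates leading spaces, tabs and repeated spaces
            for (String token : line.trim().split("\\s+")) {
                // computeIfPresent does nothing for tokens that are not keywords
                counts.computeIfPresent(token, (word, n) -> n + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample text from the question, held in memory instead of colours.txt
        List<String> lines = List.of(" yellow red blue white black purple blue");
        Map<String, Integer> counts =
                countKeywords(lines, List.of("red", "blue", "yellow", "green"));
        // Skip keywords that never occurred (green in this sample)
        counts.forEach((word, n) -> {
            if (n > 0) {
                System.out.println(word + ": " + n);
            }
        });
    }
}
```

With the sample line this prints red: 1, blue: 2 and yellow: 1 on separate lines. To read from a real file, the lines could be obtained with Files.readAllLines(Path.of("colours.txt")) and passed in unchanged.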
#2
1
There are probably two issues in the code.
- The array 'words' is used initially to list the words you are interested in, but the same array is then reused to hold the words in each line [see words = line.split(" ");]. Use a different array to hold the words in the line.
- There is no check for whether a word from the initial list exists in the line. You need to add this check. Also remember that a word can repeat many times in the same line.
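The two points above could be sketched roughly as follows. This is an illustration under assumptions, not the asker's final program: the class name LineKeywordCount and the method count are invented for the example, and the lines are passed in as strings rather than read from colours.txt.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class LineKeywordCount {

    // The keyword array is only ever read, never reassigned.
    static final String[] WORDS = { "red", "blue", "yellow", "green" };

    // Hypothetical helper: counts keyword occurrences across the given lines.
    public static Map<String, Integer> count(String[] keywords, String... lines) {
        Set<String> wanted = new HashSet<>(Arrays.asList(keywords));
        Map<String, Integer> counts = new TreeMap<>(); // sorted keys, as in the question
        for (String line : lines) {
            // Point 1: a separate array holds the words of the current line
            String[] tokens = line.split(" ");
            for (String token : tokens) {
                // Point 2: only count tokens that are in the keyword set
                if (wanted.contains(token)) {
                    // every occurrence adds 1, so repeats on one line are all counted
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(WORDS, "blue red blue", "green white blue");
        System.out.println(counts); // prints {blue=3, green=1, red=1}
    }
}
```

A HashSet is used for the membership check here because contains on a set does not scan the whole keyword list; for four keywords the difference is negligible, so a List works just as well.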