Find all occurrences of the string "the" in a .txt file

Time: 2022-11-29 19:27:12

Here is my code:

// Import io so we can use file objects
import java.io.*;

public class SearchThe {
    public static void main(String[] args) {
        try {
            String stringSearch = "the";
            // Open the file test.txt as a buffered reader
            BufferedReader bf = new BufferedReader(new FileReader("test.txt"));

            // Start a line count and declare a string to hold our current line.
            int linecount = 0;
            String line;

            // Let the user know what we are searching for
            System.out.println("Searching for " + stringSearch + " in file...");

            // Loop through each line, stashing the line into our line variable.
            while (( line = bf.readLine()) != null){
                // Increment the count and find the index of the word
                linecount++;
                int indexfound = line.indexOf(stringSearch);

                // If greater than -1, means we found the word
                if (indexfound > -1) {
                    System.out.println("Word was found at position " + indexfound + " on line " + linecount);
                }
            }

            // Close the file after done searching
            bf.close();
        }
        catch (IOException e) {
            System.out.println("IO Error Occurred: " + e.toString());
        }
    }
}

I want to find every occurrence of the word "the" in the file test.txt. The problem is that once my program finds the first "the" on a line, it does not look for any more on that line.

Also, when the text contains a word like "then", my program treats it as the word "the".

5 solutions

#1


15  

Use a case-insensitive regex with word boundaries to find every instance of "the", including capitalized ones.

indexOf("the") cannot distinguish "the" from "then", since both start with "the". Likewise, "the" is found in the middle of "anathema".

To avoid this, use a regex that searches for "the" with a word boundary (\b) on either side. Use word boundaries instead of splitting on " " or searching with indexOf(" the ") (a space on either side), because neither of those would find "the." or other occurrences next to punctuation. You can also make the search case-insensitive so that it finds "The" as well.
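
The three approaches compared in this answer can be tried side by side on a single line. This is a minimal sketch: the class name WordBoundaryDemo, its helper methods, and the sample sentence are illustrative and not part of the original answer. It shows that a raw indexOf("the") also hits "anathema" and "then", that " the " misses both "The" at the start of the line and "the." before punctuation, and that the case-insensitive \b pattern finds only the standalone words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Case-insensitive "the" with a word boundary on each side
    private static final Pattern THE = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // Every start index of "the" as a raw substring (repeated indexOf)
    static List<Integer> substringMatches(String line) {
        List<Integer> hits = new ArrayList<>();
        for (int i = line.indexOf("the"); i != -1; i = line.indexOf("the", i + 1)) {
            hits.add(i);
        }
        return hits;
    }

    // Every match of " the " (a space on both sides), reported at the "t"
    static List<Integer> spacedMatches(String line) {
        List<Integer> hits = new ArrayList<>();
        for (int i = line.indexOf(" the "); i != -1; i = line.indexOf(" the ", i + 1)) {
            hits.add(i + 1);
        }
        return hits;
    }

    // Every match of the word-boundary pattern
    static List<Integer> boundaryMatches(String line) {
        List<Integer> hits = new ArrayList<>();
        Matcher m = THE.matcher(line);
        while (m.find()) {
            hits.add(m.start());
        }
        return hits;
    }

    public static void main(String[] args) {
        String line = "The anathema: then the end of the.";
        System.out.println("indexOf(\"the\"):   " + substringMatches(line));
        System.out.println("indexOf(\" the \"): " + spacedMatches(line));
        System.out.println("\\bthe\\b, ignoring case: " + boundaryMatches(line));
    }
}
```

On the sample line, the raw substring search reports positions 7, 14, 19 and 30, the spaced search reports only 19, and the word-boundary regex reports 0, 19 and 30.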

// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
Pattern p = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

while ( (line = bf.readLine()) != null) {
    linecount++;

    Matcher m = p.matcher(line);

    // indicate all matches on the line
    while (m.find()) {
        System.out.println("Word was found at position " + 
                       m.start() + " on line " + linecount);
    }
}
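
For reference, here is one way the fragment above might fit into a complete program. This is a sketch, not the answerer's code: the class SearchTheRegex and the method findAll are hypothetical names, and main assumes the same test.txt file as the question. Taking a java.io.Reader instead of a file name makes the search easy to try on an in-memory string, and try-with-resources closes the reader even if an exception is thrown.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SearchTheRegex {
    // \b on both sides rejects "then" and "anathema"; CASE_INSENSITIVE also accepts "The"
    private static final Pattern THE = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // Returns "line:position" for every match, reading the input line by line
    static List<String> findAll(Reader source) throws IOException {
        List<String> hits = new ArrayList<>();
        // try-with-resources closes the reader even if readLine() throws
        try (BufferedReader bf = new BufferedReader(source)) {
            String line;
            int linecount = 0;
            while ((line = bf.readLine()) != null) {
                linecount++;
                Matcher m = THE.matcher(line);
                // keep calling find() so every match on the line is reported
                while (m.find()) {
                    hits.add(linecount + ":" + m.start());
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        try {
            for (String hit : findAll(new FileReader("test.txt"))) {
                String[] parts = hit.split(":");
                System.out.println("Word was found at position " + parts[1] + " on line " + parts[0]);
            }
        } catch (IOException e) {
            System.out.println("IO Error Occurred: " + e);
        }
    }
}
```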

#2


3  

You shouldn't use indexOf for this, because it matches any substring of your string. Since "then" contains the string "the", it counts as a match too.

More about indexOf

indexOf

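public int indexOf(String str, int fromIndex) Returns the index within this string of the first occurrence of the specified substring, starting at the specified index. The integer returned is the smallest value k for which: k >= Math.min(fromIndex, this.length()) && this.startsWith(str, k). If no such value of k exists, then -1 is returned.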
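
As the quoted Javadoc suggests, the fromIndex overload is how you keep searching past the first hit: start each new search just after the previous match. The sketch below (the class IndexOfLoop and method allIndexes are illustrative names) collects every occurrence this way. It fixes the "stops after the first match" problem, but it is still plain substring matching, so "then" is reported too.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfLoop {
    // Collects every start index of target in line by resuming the search
    // just past the previous hit with indexOf(String, int).
    static List<Integer> allIndexes(String line, String target) {
        List<Integer> hits = new ArrayList<>();
        if (target.isEmpty()) {
            return hits; // an empty target would match everywhere and never advance
        }
        int from = 0;
        int k;
        while ((k = line.indexOf(target, from)) != -1) {
            hits.add(k);
            from = k + target.length(); // continue after this (non-overlapping) match
        }
        return hits;
    }

    public static void main(String[] args) {
        // Still substring matching: the "the" inside "then" (index 9) is included
        System.out.println(allIndexes("the cat, then the dog", "the"));
    }
}
```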

You should split each line into words, loop over the words, and compare each one to "the".

String[] words = line.split(" ");
for (String word : words) {
  if (word.equals("the")) {
    System.out.println("Found the word");
  }
}

The snippet above checks every word on the line, so it finds every "the", not just the first one. A single indexOf(str) call only ever returns the first occurrence.
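
One limitation of split(" ") is that punctuation stays attached to the words, so "the." or "the," never equals "the", and equals is case-sensitive, so "The" is missed. A possible variant, sketched below with a hypothetical countThe method, splits on runs of non-word characters (\W+) and compares with equalsIgnoreCase. Splitting discards character positions, so this suits counting matches rather than reporting where each one is.

```java
public class SplitCount {
    // Counts standalone "the" tokens in a line. Splitting on runs of non-word
    // characters (\W+) instead of a single space means "the." and "the,"
    // still yield the token "the"; equalsIgnoreCase also accepts "The".
    static int countThe(String line) {
        int count = 0;
        for (String word : line.split("\\W+")) {
            if (word.equalsIgnoreCase("the")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // "Then" is its own token and does not match
        System.out.println(countThe("The end. Then the sequel, the."));
    }
}
```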

#3


0  

Your current implementation will only find the first instance of 'the' per line.

Consider splitting each line into words, iterating over the list of words, and comparing each word to 'the' instead:

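while ((line = bf.readLine()) != null)
{
    linecount++;
    String[] words = line.split(" ");

    // indexfound isn't defined in this loop, so track each word's character
    // offset ourselves (split(" ") consumed exactly one space after each word)
    int position = 0;
    for (String word : words)
    {
        if (word.equals(stringSearch))
            System.out.println("Word was found at position " + position + " on line " + linecount);
        position += word.length() + 1;
    }
}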

#4


0  

It doesn't sound like the point of the exercise is to build your regular-expression skills (I could be wrong, but it seems a little basic for that), even though regexes would indeed be the real-world solution to problems like this.

My advice is to focus on the basics: use indexOf and substring to test the string. Think about how you could account for the fact that string comparison is case-sensitive. Also, does your reader always get closed (i.e., is there a way that bf.close() might not be executed)?
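
Following this advice, a sketch built only on indexOf and character checks might look like the one below. It illustrates the idea under stated assumptions and is not the answerer's code: SearchTheBasic and findWord are hypothetical names, the search runs on a lowercased copy of each line (which assumes lowercasing does not change the line's length, true for ASCII text), and a match is kept only if the characters on either side are not letters or digits. Opening the reader with try-with-resources addresses the last question, because the reader is closed even if readLine() throws.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class SearchTheBasic {
    // True if index i is inside s and holds a letter or digit
    private static boolean isWordChar(String s, int i) {
        return i >= 0 && i < s.length() && Character.isLetterOrDigit(s.charAt(i));
    }

    // Start index of every standalone, case-insensitive occurrence of word.
    // Assumes lowercasing keeps the length unchanged (true for ASCII text),
    // so indexes in the lowercased copy line up with the original line.
    static List<Integer> findWord(String line, String word) {
        List<Integer> hits = new ArrayList<>();
        String lower = line.toLowerCase(Locale.ROOT);
        String target = word.toLowerCase(Locale.ROOT);
        if (target.isEmpty()) {
            return hits; // nothing sensible to search for
        }
        int k = lower.indexOf(target);
        while (k != -1) {
            int end = k + target.length();
            // Reject matches glued to other letters, e.g. "then" or "anathema"
            if (!isWordChar(lower, k - 1) && !isWordChar(lower, end)) {
                hits.add(k);
            }
            k = lower.indexOf(target, k + 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader bf = new BufferedReader(new FileReader("test.txt"))) {
            String line;
            int linecount = 0;
            while ((line = bf.readLine()) != null) {
                linecount++;
                for (int pos : findWord(line, "the")) {
                    System.out.println("Word was found at position " + pos + " on line " + linecount);
                }
            }
        } catch (IOException e) {
            System.out.println("IO Error Occurred: " + e);
        }
    }
}
```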

#5


-1  

Ideally, you should use regular expressions for this kind of search. As a quick-and-dirty workaround, you could change stringSearch from

String stringSearch = "the";

to

String stringSearch = " the ";
