How to find words (strings) consisting only of uppercase letters?

Date: 2022-09-06 22:42:22

I was wondering whether there is any function or other way to select, from an arbitrary text, all words (strings) that consist only of uppercase letters. To be more specific, I want to take all uppercase words from the text and put them into a string array, because those uppercase words are important to me.

For example, from the text: "This text was just made RANDOMLY to show what I MEANT."

the string array would contain the words RANDOMLY and MEANT.

The array should look like this: String[] myArray = {"RANDOMLY", "MEANT"};

The only thing I can think of is to go through every single letter and check whether it is uppercase,

if yes

  • save the letter to a string variable

  • increase the value of a helper integer variable (int count) by one

  • and take a look at the next letter,
    • if it is uppercase again, repeat this part

    • if not, move on to the next letter.
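The steps above could be sketched roughly as follows. This is only an illustration of the manual scan, not a recommended final form: the class name UppercaseScan, the method findUppercaseWords, and the minLength parameter are made up for the example, and the length of the collected word plays the role of int count. It assumes that a word is a run of letters or digits, so any other character ends the current word.

```java
import java.util.ArrayList;
import java.util.List;

class UppercaseScan {
    // Hypothetical helper: returns all words that consist only of uppercase
    // letters and are at least minLength characters long.
    static List<String> findUppercaseWords(String text, int minLength) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean allUpper = true; // false once the current word has a non-uppercase character

        // Iterate one step past the end so the last word is flushed as well.
        for (int i = 0; i <= text.length(); i++) {
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c)) {
                if (!Character.isUpperCase(c)) {
                    allUpper = false;
                }
                current.append(c);
            } else {
                // End of a word: keep it only if it is all uppercase and long enough.
                if (allUpper && current.length() >= minLength) {
                    result.add(current.toString());
                }
                current.setLength(0);
                allUpper = true;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "This text was just made RANDOMLY to show what I MEANT.";
        List<String> words = findUppercaseWords(text, 4);
        String[] myArray = words.toArray(new String[0]);
        System.out.println(words); // prints [RANDOMLY, MEANT]
    }
}
```

A single pass like this is linear in the length of the text, so efficiency is not really the problem with the approach; the main cost is the amount of hand-written bookkeeping.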

I think my solution is not very efficient, so could you tell me your opinion about it? Or perhaps how to make it more efficient?

PS: int count is there to exclude short words of 3 letters or fewer.

4 Answers

#1


Probably the easiest way to achieve this is to use a regex like \b[A-Z]{4,}\b, which consists of

  • \b word boundary - the position between a word character (letter, digit, or underscore) and a non-word character

  • [A-Z] character in the range A-Z

  • {4,} which appears at least 4 times (so that short words, including single-letter words like I, are not counted) (more info at: http://www.regular-expressions.info/repeat.html)

  • \b another word boundary to make sure that we are reading the entire word

So your code could look like:

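import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "This text was just made RANDOMLY to show what I MEANT.";

// Whole words made of at least 4 uppercase ASCII letters
Pattern p = Pattern.compile("\\b[A-Z]{4,}\\b");
Matcher m = p.matcher(s);
while (m.find()) {
    String word = m.group();
    System.out.println(word); // prints RANDOMLY, then MEANT
}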

Besides printing each word to the console, you can also store it in a List<String>.
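As a rough sketch of that idea, the matches can be collected in a list and then converted to the String[] the question asks for. The class name UppercaseRegex and the method extract are invented for this example; the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UppercaseRegex {
    // Compiled once and reused; whole words of at least 4 uppercase ASCII letters.
    private static final Pattern UPPER_WORD = Pattern.compile("\\b[A-Z]{4,}\\b");

    // Hypothetical helper: returns every match as a String array.
    static String[] extract(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = UPPER_WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String s = "This text was just made RANDOMLY to show what I MEANT.";
        String[] myArray = extract(s);
        System.out.println(Arrays.toString(myArray)); // prints [RANDOMLY, MEANT]
    }
}
```

Note that [A-Z] only covers ASCII letters; for uppercase letters in other alphabets, \p{Lu} could be used in the pattern instead.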

#2


Split your sentence on whitespace. Then you can use, for instance, StringUtils.isAllUpperCase(CharSequence cs) from Apache Commons Lang to check each string.

http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isAllUpperCase(java.lang.CharSequence)
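A minimal sketch of this split-and-check approach is shown below. To keep it free of dependencies, it uses a small plain-Java stand-in for StringUtils.isAllUpperCase rather than the library call itself; the stand-in is an assumption about the intended behavior (true only for non-empty strings made up entirely of uppercase characters). The class name SplitAndCheck and the punctuation-stripping step are also additions for the example, since a token such as "MEANT." would otherwise fail the check because of the trailing dot.

```java
import java.util.ArrayList;
import java.util.List;

class SplitAndCheck {
    // Plain-Java stand-in for StringUtils.isAllUpperCase (assumed behavior):
    // true only if the string is non-empty and every character is uppercase.
    static boolean isAllUpperCase(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isUpperCase(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    static List<String> extract(String sentence) {
        List<String> result = new ArrayList<>();
        for (String token : sentence.split("\\s+")) {
            // Remove ASCII punctuation so that "MEANT." becomes "MEANT".
            String word = token.replaceAll("\\p{Punct}", "");
            if (isAllUpperCase(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "This text was just made RANDOMLY to show what I MEANT.";
        System.out.println(extract(s)); // prints [RANDOMLY, I, MEANT]
    }
}
```

The single-letter word I passes the check here, so a length filter is still needed if short words should be excluded.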

#3


Use a regex to extract them, like this:

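import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        String dataStr = "This text was just made RANDOMLY to show what I MEANT.";
        // Runs of two or more consecutive uppercase ASCII letters
        Pattern pattern = Pattern.compile("[A-Z][A-Z]+");
        Matcher matcher = pattern.matcher(dataStr);
        while (matcher.find()) {
            words.add(matcher.group());
        }

        System.out.println(words);
    }
}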

Output:

[RANDOMLY, MEANT]

With this in place, you can later adjust the search pattern to extract whatever you want.
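One adjustment worth illustrating is the effect of word boundaries. Without \b, the pattern [A-Z][A-Z]+ also matches an uppercase run inside a mixed-case word. The sketch below compares the two patterns; the class name PatternComparison, the helper findAll, and the sample sentence are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternComparison {
    // Hypothetical helper: returns all matches of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "Use HTTPClient and JSON now";

        // No boundaries: the uppercase prefix of HTTPClient is matched too.
        System.out.println(findAll("[A-Z][A-Z]+", text));       // prints [HTTPC, JSON]

        // With boundaries: only whole uppercase words are matched.
        System.out.println(findAll("\\b[A-Z]{2,}\\b", text));   // prints [JSON]
    }
}
```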

#4


Here is a solution with minimal use of regex.

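import java.util.ArrayList;

String s = "This text was just made RANDOMLY to show what I MEANT.";
String[] words = s.split(" |\\.");
ArrayList<String> result = new ArrayList<>();

for (String word : words) {
    // A word is all uppercase if upper-casing it changes nothing.
    // Empty tokens (split can produce them, e.g. from ". ") are skipped.
    String wordToUpperCase = word.toUpperCase();
    if (!word.isEmpty() && wordToUpperCase.equals(word)) {
        result.add(word);
    }
}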

The line of code:

String[] words = s.split(" |\\.");

means that the string will be split on either a space (" ") or a dot (".").

More info on why the backslashes (escaping) are needed here: Java string split with "." (dot)
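To make the escaping point concrete, here is a small sketch. String.split takes a regular expression, and an unescaped dot matches any character, so every character becomes a delimiter and only empty strings remain, which split then drops. The class name SplitDotDemo, the helper splitOnDot, and the sample string are invented for the example.

```java
import java.util.Arrays;

class SplitDotDemo {
    // Hypothetical helper: splits on a literal dot by escaping it in the regex.
    static String[] splitOnDot(String s) {
        return s.split("\\.");
    }

    public static void main(String[] args) {
        String s = "RANDOMLY.MEANT";

        // Unescaped: "." matches every character, so nothing is left.
        System.out.println(s.split(".").length);              // prints 0

        // Escaped: the dot is treated literally.
        System.out.println(Arrays.toString(splitOnDot(s)));   // prints [RANDOMLY, MEANT]
    }
}
```

Pattern.quote(".") from java.util.regex is another way to get a literal dot without writing the backslashes by hand.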

If you had split the string on spaces only, like this:

String[] words = s.split(" ");

it would have left undesirable results such as "MEANT." (with the trailing dot still attached).

In either case, the word "I" is included in the result. If you don't want that, add a check that every word has a length greater than 1.
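Putting the pieces of this answer together, a version with the length check might look like the sketch below. The class name SplitFilter, the method extract, and the minLength parameter are assumptions made for the example; passing 4 instead of 2 would match the question's wish to drop words of 3 letters or fewer.

```java
import java.util.ArrayList;
import java.util.List;

class SplitFilter {
    // Hypothetical helper: keeps tokens that are unchanged by toUpperCase()
    // and have at least minLength characters.
    static List<String> extract(String s, int minLength) {
        List<String> result = new ArrayList<>();
        for (String word : s.split(" |\\.")) {
            // The length check also filters out empty tokens.
            // Note: tokens without letters, such as "42", pass the case check too.
            if (word.length() >= minLength && word.equals(word.toUpperCase())) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "This text was just made RANDOMLY to show what I MEANT.";
        System.out.println(extract(s, 2)); // prints [RANDOMLY, MEANT]
        System.out.println(extract(s, 1)); // prints [RANDOMLY, I, MEANT]
    }
}
```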
