I was wondering whether there is any function or way to select, from an arbitrary text, all words (strings) that consist only of uppercase letters. To be more specific, I want to take all uppercase words from a text and put them into a string array, because those uppercase words are important to me.
For example, from the text: "This text was just made RANDOMLY to show what I MEANT."
the string array should contain the words RANDOMLY and MEANT.
The array should look like this: String[] myArray = {"RANDOMLY", "MEANT"};
The only approach I can think of is to go through every single letter and check whether it is uppercase. If it is:
- save the letter to a string variable
- increase the value of a helper integer variable (int count) by one and look at the next letter
- if that letter is uppercase too, repeat this step
- if not, move on to the next letter.
I think my solution is not very efficient, so can you tell me your opinion about it? Or perhaps how to make it more efficient?
PS: int count is there to exclude short words of 3 letters or fewer.
4 solutions
#1
Probably the easiest way to achieve this is to use a regex like \b[A-Z]{4,}\b, which represents:
- \b a word boundary: the position between a word character (letter, digit, or underscore) and a non-word character
- [A-Z] a character in the range A-Z
- {4,} which appears at least 4 times (so that short words of three letters or fewer, such as I, are not counted) (more info at: http://www.regular-expressions.info/repeat.html)
- \b another word boundary, to make sure that we are reading an entire word
So your code could look like:
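// needs: import java.util.regex.Matcher; import java.util.regex.Pattern;
String s = "This text was just made RANDOMLY to show what I MEANT.";
// at least 4 consecutive letters A-Z, delimited by word boundaries
Pattern p = Pattern.compile("\\b[A-Z]{4,}\\b");
Matcher m = p.matcher(s);
while (m.find()) {
    String word = m.group();
    System.out.println(word); // prints RANDOMLY, then MEANT
}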
Besides printing each word to the console, you can also store it in a List<String>.
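Since the question asks for a String[] rather than console output, the loop above could collect the matches and convert them at the end. This is only a minimal sketch; the class name UppercaseWordsToArray and the method name extract are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class UppercaseWordsToArray {
    // Words of at least 4 letters A-Z, delimited by word boundaries.
    private static final Pattern UPPERCASE_WORD = Pattern.compile("\\b[A-Z]{4,}\\b");

    static String[] extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = UPPERCASE_WORD.matcher(text);
        while (m.find()) {
            found.add(m.group()); // store each match instead of printing it
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] myArray = extract("This text was just made RANDOMLY to show what I MEANT.");
        System.out.println(String.join(", ", myArray)); // prints RANDOMLY, MEANT
    }
}
```

The list is used as an intermediate step because the number of matches is not known in advance.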
#2
Split your sentence by whitespace. Then you can use, for instance, StringUtils.isAllUpperCase(CharSequence cs) (from Apache Commons Lang) to check every single string.
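A rough sketch of this idea follows. To keep it free of external dependencies, it uses a small hand-written isAllUpperCase helper as a stand-in for the Apache Commons Lang method, which is a substitution on my part. Stripping leading and trailing punctuation is also an assumption added here, because splitting on whitespace alone leaves the token "MEANT." with its dot attached.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, for illustration only.
class SplitAndCheck {
    // Stand-in for StringUtils.isAllUpperCase: true only for a
    // non-empty string whose characters are all uppercase letters.
    static boolean isAllUpperCase(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isUpperCase(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            // Assumption: drop punctuation at either end, e.g. "MEANT." -> "MEANT".
            String word = token.replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (isAllUpperCase(word)) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("This text was just made RANDOMLY to show what I MEANT."));
        // prints [RANDOMLY, I, MEANT]
    }
}
```

Note that the single-letter word "I" passes the check, so a length filter would still be needed to exclude short words.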
#3
Use a regex to extract them, like this:
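// needs: import java.util.ArrayList; import java.util.List;
//        import java.util.regex.Matcher; import java.util.regex.Pattern;
public static void main(String[] args) {
    List<String> words = new ArrayList<>();
    String dataStr = "This text was just made RANDOMLY to show what I MEANT.";
    // two or more consecutive uppercase letters A-Z
    Pattern pattern = Pattern.compile("[A-Z][A-Z]+");
    Matcher matcher = pattern.matcher(dataStr);
    while (matcher.find()) {
        words.add(matcher.group());
    }
    System.out.println(words);
}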
Output:
[RANDOMLY, MEANT]
With this approach, you can later adjust the search pattern to extract whatever you want.
#4
Here is a solution with minimal use of regex.
String s = "This text was just made RANDOMLY to show what I MEANT.";
String[] words = s.split(" |\\.");
ArrayList<String> result = new ArrayList<>();
for (String word : words) {
    String wordToUpperCase = word.toUpperCase();
    if (wordToUpperCase.equals(word)) {
        result.add(word);
    }
}
The line of code:
String[] words = s.split(" |\\.");
means that the string will be split either by a space (" ") or by a dot (".").
The backslashes (escaping) are needed because split() takes a regular expression, in which an unescaped dot matches any character. More info here: Java string split with "." (dot)
If you had split the string by spaces only, like this:
String[] words = s.split(" ");
it could have left unwanted results like "MEANT." (with the trailing dot).
In either case, the word "I" is included in the result. If you don't want that, add a check so that every word has a length greater than 1.
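The length check mentioned above might look like the following sketch. The class name, the method name extract and the minLength parameter are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, for illustration only.
class UppercaseWithMinLength {
    static List<String> extract(String text, int minLength) {
        List<String> result = new ArrayList<>();
        for (String word : text.split(" |\\.")) {
            // Keep only words that are long enough and unchanged by toUpperCase().
            if (word.length() >= minLength && word.equals(word.toUpperCase())) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "This text was just made RANDOMLY to show what I MEANT.";
        System.out.println(extract(s, 2)); // prints [RANDOMLY, MEANT]
        System.out.println(extract(s, 1)); // prints [RANDOMLY, I, MEANT]
    }
}
```

As a side effect, any minLength of 1 or more also discards the empty tokens that split() produces when two delimiters are adjacent, which would otherwise pass the toUpperCase() comparison. Passing 4 would match the "more than 3 letters" rule from the question.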