I am pretty new to Java (started a course less than half a year ago) and I'm not sure how to go about implementing this. Hopefully it can be covered with some sort of regex - though I haven't covered regex in my course yet so if someone can explain their answer briefly it would be appreciated.
Here is the code so far:
import java.io.*;
import java.util.*;
import java.net.*;

public class definerNotOrganised
{
    public static void main(String[]args) throws Exception
    {
        System.out.println("\f\n\tWelcome to the word definer! (Input '*' to exit)");
        while (true)
        {
            System.out.print("\n\tEnter a word to Define: ");
            input();
        }
    }

    private static void input() throws Exception
    {
        Scanner sc = new Scanner(System.in);
        String userWord = sc.nextLine();
        if (userWord.equalsIgnoreCase("*"))
        {
            System.out.println("Exiting...");
            System.exit(0);
        }
        else
        {
            System.out.print(define(userWord));
        }
    }

    private static String define(String word) throws Exception
    {
        String notFound = "I'm sorry, I can't find that word...";
        String line = "";
        BufferedReader br = new BufferedReader(new InputStreamReader(new URL("https://raw.githubusercontent.com/sujithps/Dictionary/master/Oxford%20English%20Dictionary.txt").openStream()));
        try {
            while (line != null)
            {
                line = br.readLine();
                String lineFirstWord = firstWord(line);
                if ((lineFirstWord.equalsIgnoreCase(word))&&(line.length() > 5))
                {
                    cleanUp(line);
                }
            }
        } catch (Exception E)
        {
            return notFound;
        }
        return notFound;
    }

    private static String firstWord(String line) {
        if (line.indexOf(' ') > -1)
        {
            return line.substring(0, line.indexOf(' '));
        } else
        {
            return line;
        }
    }

    private static void cleanUp(String line)
    {
        //Unsure what to put in here
    }
}
The code I am writing is meant to define words, which it does by searching https://raw.githubusercontent.com/sujithps/Dictionary/master/Oxford%20English%20Dictionary.txt for a definition of a word the user enters. It isn't very optimal and takes a while to search - but that's not what I'm trying to solve right now.
I'm sure there are many issues, but currently what I want to know is what to put in the cleanUp method to make the output better.
The main issue with my code is that the output can be very messy if the word has multiple definitions.
For example, the output for the word "nice" would be:
Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind, good-natured. 3 iron. Bad or awkward (nice mess). 4 fine or subtle (nice distinction). 5 fastidious; delicately sensitive. 6 (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm). nicely adv. Niceness n. Nicish adj. (also niceish). [originally = foolish, from latin nescius ignorant]
This gets printed out all in one line by the console, which looks messy. I want the output to be something more like this:
Nice adj.
pleasant, satisfactory.
(of a person) kind, good-natured.
iron. Bad or awkward (nice mess).
etc.
Originally I thought that the solution was to have the code find a number in the string, and then add a \n before it.
However, some of the definitions themselves contain numbers so this wouldn't work out.
Each new definition comes after the end of a sentence, so ideally the code needs to look for ". [a number]" and then insert a line break before the number.
It also needs to accommodate for up to two digit numbers, because some words have a lot of definitions.
As a further safeguard (just in case the conditions are met somewhere unexpected), it would be useful if the line break were applied only when the number is one higher than the last one that triggered a break. (If the code finds ". 1" and then for some reason ". 7", it should not break the line, but if it finds ". 2" it should.)
Sorry if something similar to this has been posted before, but I'm not even sure where to start with this. Someone I know who is much more competent than I am tried to offer a regex solution, but it didn't work out; hopefully someone here can be of assistance.
Not all of the criteria above need to be met, and it doesn't have to be perfect; I just wanted to give an idea of what I am going for. Sorry for the long read, and thanks in advance.
3 Solutions
#1
1
You're going to have a harder time than you think because of the dictionary format. Printed (as opposed to online) dictionaries use many formatting techniques to shorten the text and thus the book itself.
Operating on the basis that you need to look for a period followed by a number (. #) will not be enough. Look at what you will get for definition 6 in your example:
- (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm). nicely adv. Niceness n. Nicish adj. (also niceish). [originally = foolish, from latin nescius ignorant]
But this is incorrect because the dictionary format is such that different parts of speech are written sequentially. What you would probably like is to have
Nice adj.
...
- (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm).
nicely adv.
Niceness n.
Nicish adj. (also niceish). [originally = foolish, from latin nescius ignorant]
And this is before accounting for any other formatting conventions. You will have to consult the first pages of the dictionary, which explain the abbreviations and the definition format.
For now, I suggest you write a list of keywords, like adj, adv, n etc., and search for them in addition to searching for ". #". Here is an incomplete attempt:
public class KeywordSplit {
    public static void main(String[] args) {
        // Regex fragments for part-of-speech abbreviations; "\\." matches a literal period.
        final String[] KEYWORDS = {" adj\\. ", " n\\. ", " adv\\. "};
        String s = "Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind, good-natured. 3 iron. Bad or awkward (nice mess). 4 fine or subtle (nice distinction). 5 fastidious; delicately sensitive. 6 (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm). nicely adv. Niceness n. Nicish adj. (also niceish). [originally = foolish, from latin nescius ignorant]";
        String r = s;
        // Break after each keyword, unless a ")" follows with no "(" in between.
        for (String kw : KEYWORDS)
            r = r.replaceAll(kw + "(?![^(]+\\))", kw + "\n");
        // Break before each numbered sense: ". 2" becomes ".\n 2."
        r = r.replaceAll("\\.\\s+(\\d+)", ".\n $1.");
        System.out.println(r);
    }
}
with the output
Nice adj.
 1. pleasant, satisfactory.
 2. (of a person) kind, good-natured.
 3. iron. Bad or awkward (nice mess).
 4. fine or subtle (nice distinction).
 5. fastidious; delicately sensitive.
 6. (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm). nicely adv.
Niceness n.
Nicish adj.
(also niceish). [originally = foolish, from latin nescius ignorant]
Note that one would need an arbitrary-length lookbehind to fix the "nicely adv." in definition 6. Also, in the "Nicish adj." form, the additional info should not be separated by a line break.
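One possible way around the lookbehind limitation is to look ahead instead: break at a ". " that is followed by a single word and then an abbreviation, so the break lands before "nicely adv." rather than after it. The sketch below is only an illustration under assumptions: the class name DerivedFormSplit, the method format and the three-item abbreviation list are hypothetical, and it assumes derived forms are always one word followed by an abbreviation, which a real dictionary may not guarantee.

```java
import java.util.regex.Pattern;

public class DerivedFormSplit {
    // Hypothetical abbreviation list; a real dictionary uses many more.
    private static final String POS = "(?:adj|adv|n)";

    // ". " followed by a one- or two-digit sense number and a space.
    private static final Pattern SENSE = Pattern.compile("\\. (?=\\d{1,2} )");

    // ". " followed by a single word and then an abbreviation, e.g. "nicely adv."
    private static final Pattern DERIVED =
            Pattern.compile("\\. (?=\\p{L}+ " + POS + "\\.)");

    public static String format(String entry) {
        // Put each numbered sense on its own indented line.
        String r = SENSE.matcher(entry).replaceAll(".\n\t");
        // Put each derived form (word + abbreviation) on its own line.
        return DERIVED.matcher(r).replaceAll(".\n");
    }

    public static void main(String[] args) {
        String s = "Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind, good-natured. "
                + "3 iron. Bad or awkward (nice mess). 4 fine or subtle (nice distinction). "
                + "5 fastidious; delicately sensitive. 6 (foll. By an adj., often with and) "
                + "satisfactory in terms of the quality described (a nice long time; nice and warm). "
                + "nicely adv. Niceness n. Nicish adj. (also niceish). "
                + "[originally = foolish, from latin nescius ignorant]";
        System.out.println(format(s));
    }
}
```

Because the lookahead consumes nothing, "Nicish adj. (also niceish)" stays on one line: the ". " after "adj" is followed by "(", not by a word and an abbreviation.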
#2
0
I believe your concern that "fake" numbers might exist in the text is unfounded, and further you couldn't realistically guard against a "fake" number that happened to match the next expected sequential number.
Thus, this will suffice:
String formatted = definition.replaceAll("\\. (\\d+)", ".\n\t$1");
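For readers new to regex, the pattern in this answer reads as follows: `\\.` is a literal period, the space matches itself, `(\\d+)` captures one or more digits as group 1, and `$1` in the replacement puts those digits back after the inserted line break and tab. A minimal sketch of how it might be wired into the question's code, assuming cleanUp is changed to return a String (in the question it returns void and define discards its result); the class name CleanUpSketch is hypothetical:

```java
public class CleanUpSketch {
    // Same replacement as in the answer, wrapped so the caller gets the text back.
    public static String cleanUp(String line) {
        // ". 2" becomes ".\n\t2": the period stays, the space becomes newline + tab.
        return line.replaceAll("\\. (\\d+)", ".\n\t$1");
    }

    public static void main(String[] args) {
        String entry = "Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind, "
                + "good-natured. 3 iron. Bad or awkward (nice mess).";
        System.out.println(cleanUp(entry));
    }
}
```

In define, the matching branch would then need to `return cleanUp(line);` instead of only calling it, otherwise the method still falls through to the "can't find that word" message.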
#3
0
I myself am new to Java, but I thought I could give it a try. I added "fake" numbers in the enumeration to make sure it works correctly. I encourage more experienced Java programmers to comment on this post to further improve things.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String str = "Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind,"
                + " good-natured. 3 iron. 2 A fake number. Bad or awkward (nice mess). 4 fine or"
                + " subtle (nice distinction). 5 fastidious; delicately "
                + "sensitive. 8 Another fake number. 6 (foll. By an adj., often with and) satisfactory"
                + " in terms of the quality described (a nice long time; nice"
                + " and warm). nicely adv. Niceness n. Nicish adj. (also "
                + "niceish). [originally = foolish, from latin nescius ignorant]";
        String strClean = cleanUp(str);
        System.out.println(strClean);
    }

    private static String cleanUp(String str) {
        StringBuilder cleaned = new StringBuilder();
        int currentLevel = 0;

        /* The initial pre-digit information */
        Matcher initialMatcher = Pattern.compile("(.*?)(?=\\. 1)").matcher(str);
        // We must initialise the matcher before grouping
        boolean initialMatchBool = initialMatcher.find();
        cleaned.append(initialMatcher.group(1) + ".");

        /* Digit listing */
        List<String> startDigitList = new ArrayList<String>();
        Matcher startDigitMatcher = Pattern.compile("(?<=\\. )(\\d[^\\d]*)").matcher(str);
        while (startDigitMatcher.find()) {
            startDigitList.add(startDigitMatcher.group());
        }

        for (String match: startDigitList) {
            /* The first digit of a match */
            Matcher digitMatcher = Pattern.compile("(^\\d+)").matcher(match);
            // We must initialise the matcher before grouping
            boolean digitMatchBool = digitMatcher.find();
            int precedingDigit = Integer.parseInt(digitMatcher.group(1));
            if (precedingDigit == currentLevel+1) {
                cleaned.append("\n\t");
                currentLevel++;
            }
            cleaned.append(match);
        }
        return cleaned.toString();
    }
}
Output:
Nice adj.
1 pleasant, satisfactory.
2 (of a person) kind, good-natured.
3 iron. 2 A fake number. Bad or awkward (nice mess).
4 fine or subtle (nice distinction).
5 fastidious; delicately sensitive. 8 Another fake number.
6 (foll. By an adj., often with and) satisfactory in terms of the quality described (a nice long time; nice and warm). nicely adv. Niceness n. Nicish adj. (also niceish). [originally = foolish, from latin nescius ignorant]
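The same "only break on the next expected number" idea can probably be done in a single pass with Matcher.appendReplacement and appendTail. This is a sketch rather than a drop-in replacement: the class name SequentialSenses is hypothetical, and it assumes sense numbers have at most two digits and are followed by a space. The pattern `\\d[^\\d]*` above reads only the first digit of a number, so entries with ten or more senses would likely need something like this.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SequentialSenses {
    // ". " followed by a one- or two-digit number and a space.
    private static final Pattern CANDIDATE = Pattern.compile("\\. (\\d{1,2}) ");

    public static String cleanUp(String entry) {
        Matcher m = CANDIDATE.matcher(entry);
        StringBuffer out = new StringBuffer();
        int expected = 1; // next sense number that is allowed to start a new line
        while (m.find()) {
            int n = Integer.parseInt(m.group(1));
            if (n == expected) {
                // Genuine next sense: replace the space with newline + tab.
                m.appendReplacement(out, ".\n\t$1 ");
                expected++;
            } else {
                // Out-of-sequence number: copy the matched text back unchanged.
                m.appendReplacement(out, Matcher.quoteReplacement(m.group()));
            }
        }
        m.appendTail(out); // everything after the last candidate
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "Nice adj. 1 pleasant, satisfactory. 2 (of a person) kind, good-natured. "
                + "3 iron. 2 A fake number. Bad or awkward (nice mess). 4 fine or subtle "
                + "(nice distinction). 5 fastidious; delicately sensitive. 8 Another fake number. "
                + "6 (foll. By an adj., often with and) satisfactory.";
        System.out.println(cleanUp(str));
    }
}
```

StringBuffer is used because the appendReplacement overload that accepts a StringBuilder was only added in Java 9.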