Extracting information from a text file in Java

Date: 2020-12-27 21:45:32

I'm writing a program that needs to read a text file and extract some specific strings. The text is written in the DOT language; here is an example of the file:

digraph G {
node [shape=circle];
0 [xlabel="[]"];
1 [xlabel="[[Text]]"];
0 -> 1 [label="a"];//this
1 -> 2 [label="ab"];//this
1 -> 3 [label="123"];//this
}

I want to ignore everything except the lines that have the same structure as the ones marked with the //this comment.

Then I want to split each of those lines into three parts. For example, this line:

 1 -> 2 [label="ab"];

would be saved as a list (or array) of strings:

 [1,2,ab]

I tried a lot with regex, but I couldn't get the expected results.

2 Solutions

#1



Here is the regex you can use:


(?m)^(\d+)\s+->\s+(\d+)\s+\[\w+="([^"]*)"];\s*//[^/\n]*$

See regex demo.


The three values you need are captured in groups 1, 2 and 3: the source node, the target node and the label text.

See Java code:


import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// The statements below belong inside a method, e.g. main
String str = "digraph G {\nnode [shape=circle];\n0 [xlabel=\"[]\"];\n1 [xlabel=\"[[Text]]\"];\n0 -> 1 [label=\"a\"];//this\n1 -> 2 [label=\"ab\"];//this\n1 -> 3 [label=\"123\"];//this\n}";
// Group 1 = source node, group 2 = target node, group 3 = label text
Pattern ptrn = Pattern.compile("(?m)^(\\d+)\\s+->\\s+(\\d+)\\s+\\[\\w+=\"([^\"]*)\"\\];\\s*//[^/\n]*$");
Matcher m = ptrn.matcher(str);
List<String[]> results = new ArrayList<>();
while (m.find()) {
    results.add(new String[]{m.group(1), m.group(2), m.group(3)});
}
for (String[] result : results) { // Display results
    System.out.println(Arrays.toString(result));
}
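
Since the question is about reading a file rather than a hard-coded string, here is a minimal sketch of how the same three groups could be applied line by line. The class name DotEdgeReader, the method parse and the file name graph.dot are all hypothetical, and the pattern is a variation on the one above, not the same pattern: it only accepts the label attribute and treats the trailing // comment as optional, on the assumption that the real file may not carry the //this markers.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and the default file path are illustrative only.
class DotEdgeReader {
    // Same three groups as the pattern above; the trailing // comment is optional here (assumption).
    private static final Pattern EDGE = Pattern.compile(
            "\\s*(\\d+)\\s*->\\s*(\\d+)\\s*\\[label=\"([^\"]*)\"\\];\\s*(?://.*)?");

    // Returns one {source, target, label} array per edge line; all other lines are skipped.
    static List<String[]> parse(List<String> lines) {
        List<String[]> edges = new ArrayList<>();
        for (String line : lines) {
            Matcher m = EDGE.matcher(line);
            if (m.matches()) { // the whole line must have the edge structure
                edges.add(new String[]{m.group(1), m.group(2), m.group(3)});
            }
        }
        return edges;
    }

    public static void main(String[] args) throws IOException {
        // Assumes the DOT text lives in graph.dot unless a path is passed as the first argument.
        String path = args.length > 0 ? args[0] : "graph.dot";
        for (String[] edge : parse(Files.readAllLines(Paths.get(path)))) {
            System.out.println(Arrays.toString(edge));
        }
    }
}
```

Matching each line with matches() removes the need for the (?m) flag and the ^ and $ anchors, because every line is tested as a whole.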

#2



If you are guaranteed that the line will always be in the format a -> b [label="someLabel"]; then you can use a few splits to get what you need:

// outputLine is one line of the DOT file, e.g. 1 -> 2 [label="ab"];//this
if (outputLine.contains("->") && outputLine.contains("[label=\"")) {
    String[] split1 = outputLine.split("->", 2);
    String first = split1[0].trim(); // source node, e.g. 1
    String[] split2 = split1[1].split("\\[label=\"");
    String second = split2[0].trim(); // target node, e.g. 2
    // Keep only the text before the closing quote, so a trailing ];//this is dropped
    String label = split2[1].substring(0, split2[1].indexOf('"'));
    String[] finalArray = {first, second, label};
    System.out.println(Arrays.toString(finalArray)); // [1, 2, ab]
}

This seems clunky. There is probably a better way to do it.
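
One possible tidier variant of the split approach, sketched under the same format assumption, locates the arrow and the label attribute with indexOf and cuts the three parts out with substring. The class name EdgeLineSplitter and the method splitEdge are hypothetical. Unlike the snippet above, it returns null for lines that are not edges instead of relying on a contains check.

```java
import java.util.Arrays;

// Hypothetical helper; the class and method names are illustrative only.
class EdgeLineSplitter {
    private static final String LABEL_OPEN = "[label=\"";

    // Returns {source, target, label}, or null when the line is not an edge line.
    static String[] splitEdge(String line) {
        int arrow = line.indexOf("->");
        int attr = line.indexOf(LABEL_OPEN);
        if (arrow < 0 || attr < arrow) {
            return null; // no arrow, or no label attribute after it
        }
        int labelStart = attr + LABEL_OPEN.length();
        int labelEnd = line.indexOf('"', labelStart);
        if (labelEnd < 0) {
            return null; // label has no closing quote
        }
        String source = line.substring(0, arrow).trim();
        String target = line.substring(arrow + 2, attr).trim();
        String label = line.substring(labelStart, labelEnd); // spaces inside the label are kept
        return new String[]{source, target, label};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitEdge("1 -> 2 [label=\"ab\"];//this"))); // [1, 2, ab]
        System.out.println(Arrays.toString(splitEdge("node [shape=circle];")));        // null
    }
}
```

Because the label is read up to its closing quote, anything after it (the ]; terminator or a trailing comment) never reaches the result.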
