I am reading in a log file and extracting certain data contained within it. I am able to extract the time for each line of the log file.
Now I want to extract the id "ieatrcxb4498-1". All of the ids start with the substring ieatrcxb, which I have tried to search for in order to return the full string.
I have tried many different suggestions from other posts, but I have been unsuccessful with the following patterns:
(?i)\\b("ieatrcxb"(?:.+?)?)\\b
(?i)\\b\\w*"ieatrcxb"\\w*\\b"
^.*ieatrcxb.*$
I have also tried to extract the full id based on the string starting with i and ending in 1, as all of the ids do.
Line of the log file
150: 2017-06-14 18:02:21 INFO monitorinfo : Info: Lock VCS on node "ieatrcxb4498-1"
Code
Scanner s = new Scanner(new FileReader(new File("lock-unlock.txt")));
ArrayList<Record> list = new ArrayList<>();
while (s.hasNextLine()) {
    String line = s.nextLine();
    Record newRec = new Record();
    newRec.time = regexChecker("([0-1]?\\d|2[0-3]):([0-5]?\\d):([0-5]?\\d)", line);
    newRec.ID = regexChecker("^.*ieatrcxb.*$", line);
    list.add(newRec);
}

// Returns the last non-empty match of regEx in str2Check, or "" if there is none.
public static String regexChecker(String regEx, String str2Check) {
    Pattern checkRegex = Pattern.compile(regEx);
    Matcher regexMatcher = checkRegex.matcher(str2Check);
    String regMat = "";
    while (regexMatcher.find()) {
        if (regexMatcher.group().length() != 0) {
            regMat = regexMatcher.group();
        }
    }
    return regMat;
}
I need a simple pattern which will do this for me.
3 solutions
#1
1
Does the ID always have the format "ieatrcxb, followed by 4 digits, followed by -, followed by 1 digit"?
If that's the case, you can do:
regexChecker("ieatrcxb\\d{4}-\\d", line);
Note the {4} quantifier, which matches exactly 4 digits (\\d). If the last digit is always 1, you could also use "ieatrcxb\\d{4}-1".
If the number of digits varies, you can use "ieatrcxb\\d+-\\d+", where + means "1 or more".
You can also use the {} quantifier with a minimum and maximum number of occurrences. Example: "ieatrcxb\\d{4,6}-\\d", where {4,6} means "minimum of 4 and maximum of 6 occurrences" (that's just an example, I don't know if that's your case). This is useful if you know the range of digits the ID can have.
All of the above work for your case, returning ieatrcxb4498-1. Which one to use will depend on how your input varies.
If you want just the numbers without the ieatrcxb part (4498-1), you can use a lookbehind regex:
regexChecker("(?<=ieatrcxb)\\d{4,6}-\\d", line);
This keeps ieatrcxb out of the match, thus returning just 4498-1.
If you also don't want the -1 and want just 4498, you can combine this with a lookahead:
regexChecker("(?<=ieatrcxb)\\d{4,6}(?=-\\d)", line);
This returns just 4498.
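As an alternative to lookarounds, capturing groups let you pull the same pieces out of a single match. This is only a sketch: the class IdGroupsDemo and the helper idPart are hypothetical, and the {4,6} range is carried over from the example above rather than known to fit your data.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdGroupsDemo {

    // Group 0: whole id, group 1: digits-dash-digit, group 2: leading digits only.
    private static final Pattern ID = Pattern.compile("ieatrcxb((\\d{4,6})-\\d)");

    // Hypothetical helper: returns the requested group of the first id found, or "" if none.
    static String idPart(String line, int group) {
        Matcher m = ID.matcher(line);
        return m.find() ? m.group(group) : "";
    }

    public static void main(String[] args) {
        String line = "150: 2017-06-14 18:02:21 INFO monitorinfo : Info: Lock VCS on node \"ieatrcxb4498-1\"";

        System.out.println(idPart(line, 0)); // ieatrcxb4498-1
        System.out.println(idPart(line, 1)); // 4498-1
        System.out.println(idPart(line, 2)); // 4498
    }
}
```

Compiling the pattern once and reading groups avoids running three separate regexes when you need more than one of these values per line.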
#2
1
public static void main(String[] args) {
    String line = "150: 2017-06-14 18:02:21 INFO monitorinfo : Info: Lock VCS on node \"ieatrcxb4498-1\"";
    String regex = "ieatrcxb.*1";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(line);
    while (m.find()) {
        System.out.println(m.group());
    }
}
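or, if the ids are all quoted:
// Note: the result includes the surrounding quotes. To drop them, use
// line.substring(line.indexOf("\"") + 1, line.lastIndexOf("\"")).
String id = line.substring(line.indexOf("\""), line.lastIndexOf("\"") + 1);
System.out.println(id);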
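One thing to watch with "ieatrcxb.*1": the .* is greedy, so it runs to the last 1 on the line. That is harmless for the sample line, where the id is the last thing on it, but not if anything follows. The sketch below uses a made-up line with trailing text to show the difference; the class GreedyDemo, the helper firstMatch, and the sample line are all assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyDemo {

    // Hypothetical helper: returns the first match of regex in input, or "" if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Made-up line: same id as in the question, but followed by more text.
        String line = "Lock VCS on node \"ieatrcxb4498-1\" at 18:02:21";

        // Greedy .* extends to the last '1' on the line.
        System.out.println(firstMatch("ieatrcxb.*1", line));     // ieatrcxb4498-1" at 18:02:21

        // Stopping at the closing quote keeps the match inside the id.
        System.out.println(firstMatch("ieatrcxb[^\"]*", line));  // ieatrcxb4498-1
    }
}
```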
#3
0
You are doing this the hard way. If every line of the lock-unlock.txt file has the same format as the snippet you posted, you can do the following:
File logFile = new File("lock-unlock.txt");
List<String> lines = Files.readAllLines(logFile.toPath());
List<Integer> ids = lines.stream()
        .filter(line -> line.contains("ieatrcxb"))
        .map(line -> line.split("\"")[1])            // "ieatrcxb4498-1"
        .map(line -> line.replaceAll("\\D+", ""))    // "44981"
        .map(Integer::parseInt)                      // 44981
        .collect(Collectors.toList());
If you want more than just the ID number, remove or comment out the second and third .map() calls. The result will then be a List of Strings instead of Integers, so the variable must be declared as List<String>.
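If the lines are not all shaped the same, splitting on the quote may pick up the wrong field. A possible variation, sketched below, applies one of the regex patterns from answer #1 inside the stream and keeps the ids as strings. The class StreamIdDemo, the method extractIds, and the second and third sample lines are made up for illustration; the lines are given in memory instead of being read from lock-unlock.txt so the sketch runs on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StreamIdDemo {

    private static final Pattern ID = Pattern.compile("ieatrcxb\\d+-\\d+");

    // Hypothetical helper: returns the first id of every line that contains one.
    static List<String> extractIds(List<String> lines) {
        return lines.stream()
                .map(ID::matcher)        // one Matcher per line
                .filter(Matcher::find)   // keep only lines where an id is found
                .map(Matcher::group)     // the matched id text
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "150: 2017-06-14 18:02:21 INFO monitorinfo : Info: Lock VCS on node \"ieatrcxb4498-1\"",
            "151: 2017-06-14 18:05:40 INFO monitorinfo : Info: Unlock VCS on node \"ieatrcxb4499-1\"", // made-up line
            "152: 2017-06-14 18:06:00 INFO monitorinfo : Info: nothing to extract here"                 // made-up line
        );
        System.out.println(extractIds(lines)); // [ieatrcxb4498-1, ieatrcxb4499-1]
    }
}
```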