I need to parse this input file, but I can't figure out how to go about it. I've tried using Scanner.useDelimiter(), but I'm still having problems. Does anyone have an idea how to do this properly?
Here is one line from the input file:
200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1" 200 52464 "http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album" "Opera/6.01 (Windows 98; U) [en]"
It is supposed to break into sections like this:
- address = 200.88.223.98
- date = 01/Feb/2007:04:02:22 -0500
- request = GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1
- status = 200
- bytes = 52464
- refer = http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album
- agent = Opera/6.01 (Windows 98; U) [en]
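The breakdown above follows a fixed pattern rather than a single separator: the date sits in square brackets, the request, refer and agent sit in double quotes, and the two "-" placeholders after the address are skipped. (The line looks like the Apache "combined" access log format; that is an assumption based on its shape.) A minimal sketch of matching one whole line with a regular expression, one capturing group per field, might look like this. The class name LogLineRegex and the SAMPLE, NAMES and parse members are illustrative only:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogLineRegex {
    // The sample line from the question.
    static final String SAMPLE = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] \"GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1\" 200 52464 \"http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album\" \"Opera/6.01 (Windows 98; U) [en]\"";

    // Groups 1-7: address, date (inside [...]), request (inside "..."), status,
    // bytes, refer (inside "...") and agent (inside "..."). The two "-" fields
    // after the address are matched by \S+ but not captured. Bytes also accepts
    // "-", which Apache writes instead of 0 when no body was sent.
    static final Pattern LOG_LINE = Pattern.compile(
            "(\\S+) \\S+ \\S+ \\[([^\\]]+)\\] \"([^\"]*)\" (\\d{3}) (\\d+|-) \"([^\"]*)\" \"([^\"]*)\"");

    static final String[] NAMES = {"address", "date", "request", "status", "bytes", "refer", "agent"};

    // Returns the seven fields in order, or null if the whole line does not match.
    static String[] parse(String line) {
        Matcher m = LOG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[NAMES.length];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String[] fields = parse(SAMPLE);
        for (int i = 0; i < NAMES.length; i++) {
            System.out.println(NAMES[i] + " = " + fields[i]);
        }
    }
}
```

Because parse returns null for a line that does not fit the pattern, a reading loop can skip malformed lines instead of throwing. Status and bytes come back as strings; Integer.parseInt converts them when numbers are needed.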
Here is the part of my code attempting to parse it:
Scanner scan = new Scanner(input);
scan.useDelimiter("[-']+");
while (scan.hasNextLine())
{
    String address = scan.next();
    String date = scan.next();
    String request = scan.next();
    int status = scan.nextInt();
    int bytes = scan.nextInt();
    String refer = scan.next();
    String agent = scan.next();
}
The following error is shown:
Exception in thread "main" java.util.InputMismatchException
at java.util.Scanner.throwFor(Scanner.java:840)
at java.util.Scanner.next(Scanner.java:1461)
at java.util.Scanner.nextInt(Scanner.java:2091)
at java.util.Scanner.nextInt(Scanner.java:2050)
at Analyzer.start(Unknown Source)
at Driver.main(Unknown Source)
Java Result: 1
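To see why nextInt() fails, it can help to print the tokens that useDelimiter("[-']+") actually produces for the sample line. Only runs of - or ' act as separators, so whitespace is never skipped. As a result, address gets "200.88.223.98 " (with a trailing space), date gets a single space, request gets " [01/Feb/2007:04:02:22 ", and nextInt() then sees a token starting with "0500]", which is not an integer. A small sketch that lists those tokens (the class name DelimiterTokens and its members are illustrative only):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterTokens {
    // The sample line from the question.
    static final String SAMPLE = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] \"GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1\" 200 52464 \"http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album\" \"Opera/6.01 (Windows 98; U) [en]\"";

    // Collects every token Scanner returns when only runs of '-' or '\'' are delimiters,
    // which is the same delimiter the question's code uses.
    static List<String> tokens(String line) {
        List<String> result = new ArrayList<>();
        try (Scanner scan = new Scanner(line)) {
            scan.useDelimiter("[-']+");
            while (scan.hasNext()) {
                result.add(scan.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = tokens(SAMPLE);
        for (int i = 0; i < tokens.size(); i++) {
            // Brackets make leading and trailing spaces visible.
            System.out.println(i + ": [" + tokens.get(i) + "]");
        }
    }
}
```

Only four tokens come out, and the last one runs from "0500]" to the end of the line, because the rest of the line contains no - or ' characters.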
1 Answer
Try this: split the line on spaces and extract the data from the resulting array.
String s = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] \"GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1\" 200 52464 \"http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album\" \"Opera/6.01 (Windows 98; U) [en]\"";
// Split on single spaces and print each piece with its index
String[] arr = s.split(" ");
for (int i = 0; i < arr.length; i++) {
    System.out.println(i + " : " + arr[i]);
}
And the output is:
0 : 200.88.223.98
1 : -
2 : -
3 : [01/Feb/2007:04:02:22
4 : -0500]
5 : "GET
6 : /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852
7 : HTTP/1.1"
8 : 200
9 : 52464
10 : "http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album"
11 : "Opera/6.01
12 : (Windows
13 : 98;
14 : U)
15 : [en]"
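So element 0 gives you the IP address, elements 3 and 4 give you the date, elements 5 to 7 give you the request, and so on; you can extract the rest of your data the same way. You will still need to strip the surrounding brackets and quotes, and rejoin elements 11 to 15 for the user agent, because it contains spaces.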
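Following that index mapping, a minimal sketch that turns the split array back into the seven fields might look like this. It assumes every line has exactly the shape of the sample: a three-piece request, no spaces in the refer field, and the user agent running to the end of the line. The class name SplitFields and its members are illustrative only:

```java
import java.util.Arrays;

class SplitFields {
    // The sample line from the question.
    static final String SAMPLE = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] \"GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1\" 200 52464 \"http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album\" \"Opera/6.01 (Windows 98; U) [en]\"";

    static final String[] NAMES = {"address", "date", "request", "status", "bytes", "refer", "agent"};

    // Rebuilds the seven fields from the space-separated pieces using fixed indices.
    static String[] extract(String line) {
        String[] arr = line.split(" ");
        String address = arr[0];
        // Pieces 3 and 4 are "[01/Feb/2007:04:02:22" and "-0500]": rejoin, drop the brackets.
        String date = (arr[3] + " " + arr[4]).replaceAll("^\\[|\\]$", "");
        // Pieces 5 to 7 are "\"GET", the path and "HTTP/1.1\"": rejoin, drop the quotes.
        String request = stripQuotes(String.join(" ", Arrays.copyOfRange(arr, 5, 8)));
        String status = arr[8];
        String bytes = arr[9];
        String refer = stripQuotes(arr[10]);
        // Pieces 11 to the end form the agent, which contains spaces of its own.
        String agent = stripQuotes(String.join(" ", Arrays.copyOfRange(arr, 11, arr.length)));
        return new String[] {address, date, request, status, bytes, refer, agent};
    }

    // Removes a leading and a trailing double quote, if present.
    static String stripQuotes(String s) {
        return s.replaceAll("^\"|\"$", "");
    }

    public static void main(String[] args) {
        String[] fields = extract(SAMPLE);
        for (int i = 0; i < NAMES.length; i++) {
            System.out.println(NAMES[i] + " = " + fields[i]);
        }
    }
}
```

If any field other than the agent can contain a space, the fixed indices shift, so this approach is only as reliable as that assumption about the input.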