How to parse an input file

Time: 2022-10-18 17:47:39

I need to parse this input file and can't figure out how to go about it. I've tried using Scanner.useDelimiter() but I'm still having problems. Does anyone have an idea how to do this properly?

Here is one line from the input file:

200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1" 200 52464 "http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album" "Opera/6.01 (Windows 98; U) [en]"

It is supposed to break into sections like this:

  1. address = 200.88.223.98
  2. date = 01/Feb/2007:04:02:22 -0500
  3. request = GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1
  4. status = 200
  5. bytes = 52464
  6. refer = http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album
  7. agent = Opera/6.01 (Windows 98; U) [en]

Here is the part of my code attempting to parse it:

Scanner scan = new Scanner(input);
scan.useDelimiter("[-']+");
while (scan.hasNextLine()) 
{
    String address = scan.next();
    String date = scan.next();
    String request = scan.next();
    int status = scan.nextInt();
    int bytes = scan.nextInt();
    String refer = scan.next();
    String agent = scan.next(); 
}

The following error is shown:

Exception in thread "main" java.util.InputMismatchException      
  at java.util.Scanner.throwFor(Scanner.java:840) 
  at java.util.Scanner.next(Scanner.java:1461) 
  at java.util.Scanner.nextInt(Scanner.java:2091) 
  at java.util.Scanner.nextInt(Scanner.java:2050) 
  at Analyzer.start(Unknown Source) 
  at Driver.main(Unknown Source) 
Java Result: 1

1 solution

#1


Think about it this way: split the line on spaces and extract the data from the resulting array.

String s = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] \"GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1\" 200 52464 \"http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album\" \"Opera/6.01 (Windows 98; U) [en]\"";

String[] arr = s.split(" ");

// Print each token together with its index in the array.
for (int i = 0; i < arr.length; i++) {
    System.out.println(i + " : " + arr[i]);
}

The output is:

0 : 200.88.223.98
1 : -
2 : -
3 : [01/Feb/2007:04:02:22
4 : -0500]
5 : "GET
6 : /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852
7 : HTTP/1.1"
8 : 200
9 : 52464
10 : "http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album"
11 : "Opera/6.01
12 : (Windows
13 : 98;
14 : U)
15 : [en]"

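So element 0 gives you the IP address, elements 3 and 4 give the date (strip the surrounding brackets), elements 5 through 7 give the request (strip the surrounding quotes), elements 8 and 9 are the status and byte count, element 10 is the referrer, and elements 11 onward make up the agent. This only works as long as the request and the referrer contain no extra spaces.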
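
As a rough sketch of that extraction, assuming every line has the same layout as the sample (a request made of exactly three space-separated parts and a referrer without spaces), the pieces could be joined back together like this. The class name LogLineFields, the helper stripEnds, and the shortened sample line in main are made up for illustration.

```java
import java.util.Arrays;

class LogLineFields {
    // Drops the first and last character, e.g. the [ ] around the date
    // or the quotes around the request, referrer and agent.
    static String stripEnds(String s) {
        return s.substring(1, s.length() - 1);
    }

    // Returns {address, date, request, status, bytes, refer, agent}.
    // Assumption: tokens sit at the same positions as in the sample line.
    static String[] parse(String line) {
        String[] arr = line.split(" ");
        String address = arr[0];
        String date = stripEnds(arr[3] + " " + arr[4]);
        String request = stripEnds(arr[5] + " " + arr[6] + " " + arr[7]);
        String status = arr[8];
        String bytes = arr[9];
        String refer = stripEnds(arr[10]);
        // The agent may contain spaces, so rejoin everything from token 11 on.
        String agent = stripEnds(
                String.join(" ", Arrays.copyOfRange(arr, 11, arr.length)));
        return new String[] {address, date, request, status, bytes, refer, agent};
    }

    public static void main(String[] args) {
        // Shortened, hypothetical line in the same format as the question's sample.
        String line = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "
                + "\"GET /index.html HTTP/1.1\" 200 52464 "
                + "\"http://example.com/\" \"Opera/6.01 (Windows 98; U) [en]\"";
        for (String field : parse(line)) {
            System.out.println(field);
        }
    }
}
```

The status and bytes fields are left as strings here; Integer.parseInt can convert them once you know the line is well formed. A line with a different number of tokens before the agent would break the fixed indexes, so this is only suitable when the log format is consistent.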
