Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
I have a list of 100 names that sit between anchor tags. I need to extract just the names, not the tags, using Java regular expressions.
E.g., from the sample below I need Aaron, Teb; Abacha, Jui; and Abashidze, Harry, each on its own line.
<a class="listing" href=http://eeee/a/hank_aaron/index.html">Aaron, Teb</a><br>
<a class="listing" href=http://eeee/t/sani_abacha/index.html">Abacha, Jui</a><br>
<a class="listing" href=http://eeee/i/aslan_abashidze/index.html">Abashidze, Harry</a><br>
I wrote the following code, but it prints the tags too. Where am I going wrong? Should I strip the tags from the output, or is the regex itself wrong?
public static void main(String[] args) throws Exception {
    URL oracle = new URL("http://eeee/all/people/index.html");
    BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));
    String input;
    String REGEX = "<a class=\"listing\"[^>]*>";
    while ((input = in.readLine()) != null) {
        Pattern p = Pattern.compile(REGEX);
        Matcher m = p.matcher(input);
        while (m.find()) {
            System.out.println(input);
        }
    }
    in.close();
}
1 Answer
#1
Use this regexp:
(?:<a class=\"listing\"[^>]*>)([^<]*)(?:<)
Group 1 will capture the name, so print m.group(1) rather than the whole input line.
P.S. You should move Pattern p = Pattern.compile(REGEX); outside the loop, so the pattern is compiled once instead of once per line.
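Putting the two suggestions together, a minimal sketch might look like the following. It assumes the sample lines from the question are supplied as hard-coded strings rather than read from the URL, and the names ListingNames and extractNames are hypothetical, chosen only for illustration. The pattern is compiled once and only group 1 is collected.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original question or answer.
class ListingNames {
    // Compiled once, outside any loop. Group 1 is the text between
    // the end of the opening <a ...> tag and the next '<'.
    private static final Pattern LISTING =
            Pattern.compile("(?:<a class=\"listing\"[^>]*>)([^<]*)(?:<)");

    // Collects group 1 of every match found in the given lines.
    static List<String> extractNames(List<String> lines) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LISTING.matcher(line);
            while (m.find()) {
                names.add(m.group(1)); // the name only, not the whole line
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Sample input copied from the question (assumed, not fetched).
        List<String> sample = Arrays.asList(
            "<a class=\"listing\" href=http://eeee/a/hank_aaron/index.html\">Aaron, Teb</a><br>",
            "<a class=\"listing\" href=http://eeee/t/sani_abacha/index.html\">Abacha, Jui</a><br>",
            "<a class=\"listing\" href=http://eeee/i/aslan_abashidze/index.html\">Abashidze, Harry</a><br>");
        for (String name : extractNames(sample)) {
            System.out.println(name); // one name per line
        }
    }
}
```

In the original loop, the same idea amounts to replacing System.out.println(input) with System.out.println(m.group(1)). For anything beyond a fixed, simple listing like this one, an HTML parser is usually more robust than a regex, as the linked duplicate discusses.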