I'm trying to split this string in PHP:
11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
How can I split this into IP, date, HTTP method, domain name, and browser?
4 solutions
#1
12
This looks like Apache's combined log format. Try this regular expression:
/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m
The matching groups are as follows:
- remote IP address
- request date
- request HTTP method
- User-Agent value
But the domain is not listed in the log line. The second quoted string is the Referer value.
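As a minimal sketch of how that pattern could be used from PHP, the snippet below runs it against the sample line from the question with preg_match. The function name parse_log_line and the array keys are my own choices for illustration, not part of any library.

```php
<?php
// Hypothetical helper: the name and the array keys are illustrative only.
function parse_log_line(string $line): ?array
{
    // Same pattern as above; single quotes keep the backslashes intact.
    $pattern = '/^(\S+) \S+ \S+ \[([^\]]+)\] "([A-Z]+)[^"]*" \d+ \d+ "[^"]*" "([^"]*)"$/m';

    if (preg_match($pattern, $line, $m) !== 1) {
        return null; // the line does not match the expected layout
    }

    return [
        'ip'         => $m[1],
        'date'       => $m[2],
        'method'     => $m[3],
        'user_agent' => $m[4],
    ];
}

// Sample line from the question.
$line = '11.11.11.11 - - [25/Jan/2000:14:00:01 +0100] "GET /1986.js HTTP/1.1" 200 932 "http://domain.com/index.html" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"';

$parsed = parse_log_line($line);
print_r($parsed);
```

Returning null for a non-matching line lets the caller decide whether to skip or report it.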
#2
4
You should check out a regular expression tutorial. But here is the answer:
if (preg_match('/^(\S+) \S+ \S+ \[(.*?)\] "(\S+).*?" \d+ \d+ "(.*?)" "(.*?)"/', $line, $m)) {
    $ip = $m[1];
    $date = $m[2];
    $method = $m[3];
    $referer = $m[4];
    $browser = $m[5];
}
Take care: what the log contains is not the domain name but the HTTP referer.
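If the host part of the referer is what you are after, a sketch along these lines might do, using PHP's built-in parse_url. The value of $referer is hard-coded here from the sample line, and the fallback text is my own placeholder. Keep in mind this gives the domain of the referring page, which is not necessarily the domain that served the request.

```php
<?php
// Referer value as captured from the sample log line in the question.
$referer = 'http://domain.com/index.html';

// With PHP_URL_HOST, parse_url() returns only the host component,
// or null when the string has no host (for example a "-" placeholder).
$domain = parse_url($referer, PHP_URL_HOST);

// '(no referer host)' is an illustrative fallback, not a standard value.
echo $domain ?? '(no referer host)', "\n";
```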
#3
4
This is Perl rather than PHP, but the same regex can be used. It has parsed everything I've seen so far, and clients can send some bizarre requests:
my ($ip, $date, $method, $url, $protocol, $alt_url, $code, $bytes,
    $referrer, $ua) = (m/
    ^(\S+)\s                    # IP
    \S+\s+                      # remote logname
    (?:\S+\s+)+                 # remote user
    \[([^]]+)\]\s               # date
    "(\S*)\s?                   # method
    (?:((?:[^"]*(?:\\")?)*)\s   # URL
    ([^"]*)"\s|                 # protocol
    ((?:[^"]*(?:\\")?)*)"\s)    # or, possibly URL with no protocol
    (\S+)\s                     # status code
    (\S+)\s                     # bytes
    "((?:[^"]*(?:\\")?)*)"\s    # referrer
    "(.*)"$                     # user agent
/x);
die "Couldn't match $_" unless $ip;
$alt_url ||= '';
$url ||= $alt_url;
#4
1
// Parses NCSA Combined Log Format lines:
$pattern = '/^([^ ]+) ([^ ]+) ([^ ]+) (\[[^\]]+\]) "(.*) (.*) (.*)" ([0-9\-]+) ([0-9\-]+) "(.*)" "(.*)"$/';
Usage:
if (preg_match($pattern, $yourstuff, $matches)) {
    // Put each part of the match in a named variable
    list($whole_match, $remote_host, $logname, $user, $date_time, $method, $request, $protocol, $status, $bytes, $referer, $user_agent) = $matches;
}
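Once the pieces are split out, the date is still just a string. As a rough sketch, assuming $date_time holds the bracketed value captured by the pattern above, it could be turned into a date object with DateTimeImmutable::createFromFormat. The variable $when is my own name for the result.

```php
<?php
// $date_time as captured by the pattern above, brackets included.
$date_time = '[25/Jan/2000:14:00:01 +0100]';

// Strip the surrounding brackets, then parse
// day/month-name/year:hour:minute:second followed by the UTC offset.
$when = DateTimeImmutable::createFromFormat('d/M/Y:H:i:s O', trim($date_time, '[]'));

if ($when === false) {
    echo "Could not parse the timestamp\n";
} else {
    echo $when->format(DATE_ATOM), "\n"; // 2000-01-25T14:00:01+01:00
}
```

Trimming the brackets first keeps the format string simple, since the brackets are not format characters.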