I have a number of LZO-compressed log files on Amazon S3, which I want to read from PHP. The AWS SDK provides a nice StreamWrapper for reading these files efficiently, but since the files are compressed, I need to decompress the content before I can process it.
I have installed the PHP-LZO extension, which allows me to do lzo_decompress($data), but since I'm dealing with a stream rather than the full file contents, I assume I'll need to consume the string one LZO compressed block at a time. In other words, I want to do something like:
$s3 = S3Client::factory($myAwsCredentials);
$s3->registerStreamWrapper();
$stream = fopen("s3://my_bucket/my_logfile", 'r');
$compressed_data = '';
while (!feof($stream)) {
    $compressed_data .= fread($stream, 1024);
    // TODO: determine if we have a full LZO block yet
    if (contains_full_lzo_block($compressed_data)) {
        // TODO: extract the LZO block
        $lzo_block = get_lzo_block($compressed_data);
        $input = lzo_decompress($lzo_block);
        // ... and do stuff to the decompressed input
    }
}
fclose($stream);
The two TODOs are where I'm unsure what to do:
- Inspecting the data stream to determine whether I have a full LZO block yet
- Extracting this block for decompression
Since the compression was done by Amazon (s3distCp) I don't have control over the block size, so I'll probably need to inspect the incoming stream to determine how big the blocks are -- is this a correct assumption?
(Ideally, I'd use a custom StreamFilter directly on the stream, but I haven't been able to find anyone who has done that before.)
1 solution
#1
Executing a command from PHP can be done in many different ways, something like:
// escape each path on its own; escapeshellcmd() on the whole string would also escape the ">" redirect
$command = 'gunzip -c ' . escapeshellarg('/path/src') . ' > ' . escapeshellarg('/path/dest');
system($command);
or also
shell_exec('gunzip -c /path/src > /path/dest');
will do the work. Now it's a matter of which command to execute: under Linux there's a nice command-line tool called lzop, which extracts or compresses lzop files.
You can use it via something like:
lzop -dN sources.lzo
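When a command like this is launched from PHP, it helps to escape every argument and to check the exit status, since shell_exec() alone does not report whether the tool failed. The following is only a minimal sketch: runCommand() is a hypothetical helper (not part of PHP or lzop), and it assumes a POSIX shell is available.

```php
<?php
// Hypothetical helper: runs a command given as an argument list, escaping
// every argument, and throws if the command exits with a non-zero status.
function runCommand(array $argv): array
{
    // Quote each piece so spaces or shell metacharacters cannot change the command.
    $command = implode(' ', array_map('escapeshellarg', $argv));

    // Merge stderr into stdout so error messages end up in $output.
    exec($command . ' 2>&1', $output, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException(
            "Command failed with exit code $exitCode: " . implode("\n", $output)
        );
    }
    return $output; // one array entry per line of output
}

// Usage, assuming lzop is installed and sources.lzo exists in the working directory:
// runCommand(['lzop', '-dN', 'sources.lzo']);
```

Passing the command as a list keeps the escaping in one place, so file names taken from user input or from a bucket listing cannot inject extra shell syntax.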
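lzop itself cannot open s3:// URLs, because the stream wrapper only exists inside PHP, so the file has to be copied to a local path first. After that, your final code might be something as easy as:
shell_exec('lzop -dN /path/my_logfile.lzo');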
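To connect this with the S3 stream from the question, one possible approach is to copy the stream to a temporary local file and then read the output of lzop line by line. This is a sketch under assumptions, not a tested solution for s3distCp output: copyToTempFile() and readLzopLines() are hypothetical helpers, lzop is assumed to be installed and on the PATH, and the logs are assumed to be line-oriented text.

```php
<?php
// Hypothetical helper: copies any readable stream URL (for example an s3://
// path once the AWS stream wrapper is registered) to a local temporary file.
function copyToTempFile(string $sourceUrl): string
{
    $source = fopen($sourceUrl, 'r');
    if ($source === false) {
        throw new RuntimeException("Cannot open $sourceUrl");
    }
    $tempPath = tempnam(sys_get_temp_dir(), 'lzo');
    $target = fopen($tempPath, 'w');
    // Copies in chunks, so the compressed file is never held fully in memory.
    stream_copy_to_stream($source, $target);
    fclose($target);
    fclose($source);
    return $tempPath;
}

// Hypothetical helper: yields decompressed lines one at a time by reading the
// standard output of `lzop -dc`, which is fed the local file on standard input.
function readLzopLines(string $localPath): Generator
{
    $pipe = popen('lzop -dc < ' . escapeshellarg($localPath), 'r');
    if ($pipe === false) {
        throw new RuntimeException('Could not start lzop');
    }
    try {
        while (($line = fgets($pipe)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        pclose($pipe);
    }
}

// Usage, assuming registerStreamWrapper() has been called as in the question:
// $local = copyToTempFile('s3://my_bucket/my_logfile');
// foreach (readLzopLines($local) as $line) { /* process $line */ }
// unlink($local);
```

Letting lzop handle the container format avoids having to detect block boundaries by hand in PHP, at the cost of some temporary disk space and one extra process per file.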