Parsing Stackdriver LogEntry JSON in a Dataflow pipeline

Date: 2022-02-02 15:34:32

I'm building a Dataflow pipeline to process Stackdriver logs: the data is read from Pub/Sub and the results are written to BigQuery. When I read from Pub/Sub I get JSON strings of LogEntry objects, but what I'm really interested in is the protoPayload.line records, which contain the user log messages. To get those I need to parse the LogEntry JSON object, and I found a two-year-old Google example of how to do it:

import java.io.IOException;

import com.google.api.client.json.JsonParser;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.logging.model.LogEntry;

// entry is the LogEntry JSON string read from Pub/Sub
String logString = null;
try {
    JsonParser parser = new JacksonFactory().createJsonParser(entry);
    LogEntry logEntry = parser.parse(LogEntry.class);
    logString = logEntry.getTextPayload();
} catch (IOException e) {
    LOG.error("IOException parsing entry: " + e.getMessage());
} catch (NullPointerException e) {
    LOG.error("NullPointerException parsing entry: " + e.getMessage());
}

Unfortunately this doesn't work for me: logEntry.getTextPayload() returns null. I'm not even sure it's supposed to work, as the com.google.api.services.logging library is not mentioned anywhere in the Google Cloud docs; the current logging library seems to be google-cloud-logging.

So, could anyone suggest the right or simplest way of parsing LogEntry objects?

1 solution

#1


I ended up manually parsing the LogEntry JSON with the gson library, using its tree-traversal approach. Here is a small snippet:

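import com.google.gson.JsonArray;
import com.google.gson.JsonElement;
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import org.apache.beam.sdk.transforms.DoFn; // assuming the Beam-based Dataflow SDK 2.x

static class ProcessLogMessages extends DoFn<String, String> {
    @ProcessElement
    public void processElement(ProcessContext c) {
        String entry = c.element();

        // JsonParser.parseString needs gson 2.8.6+; on older versions use new JsonParser().parse(entry)
        JsonElement element = JsonParser.parseString(entry);
        if (!element.isJsonObject()) {
            return;
        }
        JsonObject root = element.getAsJsonObject();

        // Skip entries without protoPayload.line instead of failing with a NullPointerException
        JsonElement payload = root.get("protoPayload");
        if (payload == null || !payload.isJsonObject()) {
            return;
        }
        JsonElement lineField = payload.getAsJsonObject().get("line");
        if (lineField == null || !lineField.isJsonArray()) {
            return;
        }
        JsonArray lines = lineField.getAsJsonArray();
        for (int i = 0; i < lines.size(); i++) {
            JsonObject line = lines.get(i).getAsJsonObject();
            String logMessage = line.get("logMessage").getAsString();

            // Do what you need with the logMessage here
            c.output(logMessage);
        }
    }
}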

This is simple enough and works fine for me, since I'm only interested in the protoPayload.line.logMessage values. But I guess it is not an ideal way of parsing LogEntry objects if you need to work with many attributes.
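
If you do need a few more attributes, one option is a small null-safe path helper so each lookup does not repeat the same checks. The sketch below is only an illustration: LogEntryPaths, at, logMessages and sampleEntry are made-up names, the sample entry is invented, and plain java.util maps and lists stand in for the parsed tree so that it runs without any dependency. With gson the same walk would go through JsonObject.get.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: null-safe traversal of a LogEntry-shaped tree. */
final class LogEntryPaths {

    /** Walks nested maps along the given keys; returns null if any step is missing. */
    static Object at(Object node, String... keys) {
        for (String key : keys) {
            if (!(node instanceof Map)) {
                return null;
            }
            node = ((Map<?, ?>) node).get(key);
        }
        return node;
    }

    /** Collects protoPayload.line[*].logMessage, skipping lines without a string message. */
    static List<String> logMessages(Map<String, Object> entry) {
        List<String> messages = new ArrayList<>();
        Object lines = at(entry, "protoPayload", "line");
        if (!(lines instanceof List)) {
            return messages; // entry carries no user log lines
        }
        for (Object line : (List<?>) lines) {
            Object message = at(line, "logMessage");
            if (message instanceof String) {
                messages.add((String) message);
            }
        }
        return messages;
    }

    /** Invented sample entry containing only the fields used above. */
    static Map<String, Object> sampleEntry() {
        return Map.of(
            "insertId", "sample-id",
            "protoPayload", Map.of(
                "line", List.of(
                    Map.of("logMessage", "first"),
                    Map.of("severity", "INFO"), // no logMessage: skipped
                    Map.of("logMessage", "second"))));
    }

    public static void main(String[] args) {
        System.out.println(logMessages(sampleEntry()));       // prints [first, second]
        System.out.println(at(sampleEntry(), "textPayload")); // prints null (field absent)
    }
}
```

The helper returns null for any missing step rather than throwing, so the caller decides whether an absent field is an error or just an entry to skip.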
