Circular references are not handled in Avro

Time: 2022-08-23 08:32:27

There is a tool called Avro-Tools which ships with Avro and can be used to convert between JSON, Avro-Schema (.avsc) and binary formats. But it does not work with circular references.

We have two files:

  1. circular.avsc (generated by Avro)

  2. circular.json (generated by Jackson, because the data has a circular reference and Avro does not accept that).

circular.avsc

{
   "type":"record",
   "name":"Parent",
   "namespace":"bigdata.example.avro",
   "fields":[
      {
         "name":"name",
         "type":[
            "null",
            "string"
         ],
         "default":null
      },
      {
         "name":"child",
         "type":[
            "null",
            {
               "type":"record",
               "name":"Child",
               "fields":[
                  {
                     "name":"name",
                     "type":[
                        "null",
                        "string"
                     ],
                     "default":null
                  },
                  {
                     "name":"parent",
                     "type":[
                        "null",
                        "Parent"
                     ],
                     "default":null
                  }
               ]
            }
         ],
         "default":null
      }
   ]
}

circular.json

{
   "@class":"bigdata.example.avro.Parent",
   "@circle_ref_id":1,
   "name":"parent",
   "child":{
      "@class":"bigdata.example.avro.DerivedChild",
      "@circle_ref_id":2,
      "name":"hello",
      "parent":1
   }
}

Command to run avro-tools on the above

java -jar avro-tools-1.7.6.jar fromjson --schema-file circular.avsc circular.json

Output

2014-06-09 14:29:17.759 java[55860:1607] Unable to load realm mapping info from SCDynamicStore
Objavro.codenullavro.schema? {"type":"record","name":"Parent","namespace":"bigdata.example.avro","fields":[{"name":"name","type":["null","string"],"default":null},{"name":"child","type":["null",{"type":"record","name":"Child","fields":[{"name":"name","type":["null","string"],"default":null},{"name":"parent","type":["null","Parent"],"default":null}]}],"default":null}]}?'???K?jH!??Ė?
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)

Some other JSON values I tried with the same schema; none of them worked:

JSON 1

{
   "name":"parent",
   "child":{
      "name":"hello",
      "parent":null
   }
}

JSON 2

{
   "name":"parent",
   "child":{
      "name":"hello"
   }
}

JSON 3

 {
   "@class":"bigdata.example.avro.Parent",
   "@circle_ref_id":1,
   "name":"parent",
   "child":{
      "@class":"bigdata.example.avro.DerivedChild",
      "@circle_ref_id":2,
      "name":"hello",
      "parent":null
   }
}

Removing some of the "optional" elements:

circular.avsc

{
   "type":"record",
   "name":"Parent",
   "namespace":"bigdata.example.avro",
   "fields":[
      {
         "name":"name",
         "type":
            "string",
         "default":null
      },
      {
         "name":"child",
         "type":
            {
               "type":"record",
               "name":"Child",
               "fields":[
                  {
                     "name":"name",
                     "type":
                        "string",
                     "default":null
                  },
                  {
                     "name":"parent",
                     "type":
                        "Parent",
                     "default":null
                  }
               ]
            },
         "default":null
      }
   ]
}

circular.json

 {
   "@class":"bigdata.example.avro.Parent",
   "@circle_ref_id":1,
   "name":"parent",
   "child":{
      "@class":"bigdata.example.avro.DerivedChild",
      "@circle_ref_id":2,
      "name":"hello",
      "parent":1
   }
}

output

2014-06-09 15:30:53.716 java[56261:1607] Unable to load realm mapping info from SCDynamicStore
Objavro.codenullavro.schema?{"type":"record","name":"Parent","namespace":"bigdata.example.avro","fields":[{"name":"name","type":"string","default":null},{"name":"child","type":{"type":"record","name":"Child","fields":[{"name":"name","type":"string","default":null},{"name":"parent","type":"Parent","default":null}]},"default":null}]}?x?N??O"?M?`Ab
Exception in thread "main" java.lang.StackOverflowError
    at org.apache.avro.io.parsing.Symbol.flattenedSize(Symbol.java:212)
    at org.apache.avro.io.parsing.Symbol$Sequence.flattenedSize(Symbol.java:323)
    at org.apache.avro.io.parsing.Symbol.flattenedSize(Symbol.java:216)
    at org.apache.avro.io.parsing.Symbol$Sequence.flattenedSize(Symbol.java:323)
    at org.apache.avro.io.parsing.Symbol.flattenedSize(Symbol.java:216)
    at org.apache.avro.io.parsing.Symbol$Sequence.flattenedSize(Symbol.java:323)

Does anyone know how I can make circular reference work with Avro?

1 solution

I ran into the same problem recently and resolved it with a workaround; hopefully it helps.

Based on the Avro specification:

JSON Encoding

Except for unions, the JSON encoding is the same as is used to encode field default values.

The value of a union is encoded in JSON as follows:

  • if its type is null, then it is encoded as a JSON null;
  • otherwise it is encoded as a JSON object with one name/value pair whose name is the type's name and whose value is the recursively encoded value. For Avro's named types (record, fixed or enum) the user-specified name is used, for other types the type name is used.

For example, the union schema ["null","string","Foo"], where Foo is a record name, would encode:

  • null as null;
  • the string "a" as {"string": "a"};
  • and a Foo instance as {"Foo": {...}}, where {...} indicates the JSON encoding of a Foo instance.
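
To make the rule concrete for the first schema in the question, here is a small stand-alone sketch (plain Java, no Avro dependency) that builds the JSON text a spec-conforming decoder would expect. The class and method names (UnionJson, wrap, quote, parentJson) are made up for illustration. The branch label "bigdata.example.avro.Child" assumes the decoder matches named types by their full name including the namespace, which I believe is what Avro's Java implementation does; please verify against your version. The back-reference is written as null because this encoding has no way to point at an already-written object.

```java
public class UnionJson {
    /** Wraps an already-encoded JSON value as a union branch: {"<label>": <value>}. */
    public static String wrap(String branchLabel, String encodedValue) {
        return "{\"" + branchLabel + "\":" + encodedValue + "}";
    }

    /** Encodes a string as a JSON string literal (this sketch only escapes backslashes and quotes). */
    public static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds the JSON for a Parent with one Child whose parent field is null. */
    public static String parentJson(String parentName, String childName) {
        // Child record: every non-null union value is wrapped in {"<type>": value}.
        String child = "{\"name\":" + wrap("string", quote(childName))
                + ",\"parent\":null}";
        // Assumption: the record branch is labelled with its full name.
        return "{\"name\":" + wrap("string", quote(parentName))
                + ",\"child\":" + wrap("bigdata.example.avro.Child", child) + "}";
    }

    public static void main(String[] args) {
        System.out.println(parentJson("parent", "hello"));
    }
}
```

Running main prints {"name":{"string":"parent"},"child":{"bigdata.example.avro.Child":{"name":{"string":"hello"},"parent":null}}}, which is the shape the union rule asks for, in contrast to the plain "name":"parent" form that Jackson produced.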

If the source file cannot be changed to follow this requirement, the decoding code has to change instead. So I took the original org.apache.avro.io.JsonDecoder class from the avro-1.7.7 package and customized it into my own class, MyJsonDecoder.

Here is the key place I changed, besides creating new constructors and renaming the class:

@Override
public int readIndex() throws IOException {
    advance(Symbol.UNION);
    Symbol.Alternative a = (Symbol.Alternative) parser.popSymbol();

    String label;
    if (in.getCurrentToken() == JsonToken.VALUE_NULL) {
        label = "null";
//***********************************************
// Original code, following the "JSON Encoding" section of the Avro spec:
// a union value is encoded as a JSON object with one name/value pair whose
//   name is the type's name and whose value is the recursively encoded value.
// The source data can't be changed, so this rule is removed.
//        } else if (in.getCurrentToken() == JsonToken.START_OBJECT &&
//                in.nextToken() == JsonToken.FIELD_NAME) {
//            label = in.getText();
//            in.nextToken();
//            parser.pushSymbol(Symbol.UNION_END);
//***********************************************
    } else {
        // Customized code: look up the union branch that matches the type
        // of the current JSON token, then parse the value as that branch.
        label = findTypeInUnion(in.getCurrentToken(), a);

        // Either the field is missing but not allowed to be null,
        // or the field's type is not declared in the union.
        if (label == null) {
            throw error("start-union, type may not be in UNION,");
        }
    }
//***********************************************
// Original code: fail immediately for any other token
//        } else {
//                throw error("start-union");
//        }
//***********************************************
    int n = a.findLabel(label);
    if (n < 0)
        throw new AvroTypeException("Unknown union branch " + label);
    parser.pushSymbol(a.getSymbol(n));
    return n;
}

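/**
 * Checks whether the type of the current JSON token is declared in the union.
 * Does NOT support "record", "enum", or "fixed": these types require a
 * user-defined name in the Avro schema, and if that name cannot be found in
 * the JSON file, the value can't be decoded.
 * If two declared types share a JSON token (int/long, float/double,
 * bytes/string), the one declared later in the union wins.
 * Requires java.util.HashMap to be imported.
 *
 * @param jsonToken         current JSON token
 * @param symbolAlternative the union's alternatives
 * @return the label of the union branch to decode with, or null if none matches
 */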
private String findTypeInUnion(final JsonToken jsonToken,
                               final Symbol.Alternative symbolAlternative) {
    // Create a map for looking up: JsonToken and Avro type
    final HashMap<JsonToken, String> json2Avro = new HashMap<>();

    for (int i = 0; i < symbolAlternative.size(); i++) {
        // Get the type declared in union: symbolAlternative.getLabel(i).
        // Map the JsonToken with Avro type.
        switch (symbolAlternative.getLabel(i)) {
            case "null":
                json2Avro.put(JsonToken.VALUE_NULL, "null");
                break;
            case "boolean":
                json2Avro.put(JsonToken.VALUE_TRUE, "boolean");
                json2Avro.put(JsonToken.VALUE_FALSE, "boolean");
                break;
            case "int":
                json2Avro.put(JsonToken.VALUE_NUMBER_INT, "int");
                break;
            case "long":
                json2Avro.put(JsonToken.VALUE_NUMBER_INT, "long");
                break;
            case "float":
                json2Avro.put(JsonToken.VALUE_NUMBER_FLOAT, "float");
                break;
            case "double":
                json2Avro.put(JsonToken.VALUE_NUMBER_FLOAT, "double");
                break;
            case "bytes":
                json2Avro.put(JsonToken.VALUE_STRING, "bytes");
                break;
            case "string":
                json2Avro.put(JsonToken.VALUE_STRING, "string");
                break;
            case "array":
                json2Avro.put(JsonToken.START_ARRAY, "array");
                break;
            case "map":
                json2Avro.put(JsonToken.START_OBJECT, "map");
                break;
            default: break;
        }
    }

    // Looking up the map to find out related Avro type to JsonToken
    return json2Avro.get(jsonToken);
}

The general idea is to check whether the type found in the source file is declared in the union.

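The lookup can be tried out without Avro or Jackson on the classpath. The following is a minimal stand-alone sketch of the same idea, not the decoder code itself: TokenKind is a hypothetical stand-in for Jackson's JsonToken, and the union is passed as a plain list of branch labels instead of a Symbol.Alternative. It also shows what happens when two branches share one JSON token.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

public class UnionLookupSketch {
    /** Hypothetical stand-in for the JSON token kinds the decoder sees. */
    public enum TokenKind { NULL, BOOLEAN, INT_NUMBER, FLOAT_NUMBER, STRING, ARRAY_START, OBJECT_START }

    /** Returns the label of the union branch matching the token, or null if none is declared. */
    public static String findTypeInUnion(TokenKind token, List<String> unionLabels) {
        Map<TokenKind, String> tokenToLabel = new EnumMap<>(TokenKind.class);
        for (String label : unionLabels) {
            switch (label) {
                case "null":    tokenToLabel.put(TokenKind.NULL, label); break;
                case "boolean": tokenToLabel.put(TokenKind.BOOLEAN, label); break;
                case "int":
                case "long":    tokenToLabel.put(TokenKind.INT_NUMBER, label); break;
                case "float":
                case "double":  tokenToLabel.put(TokenKind.FLOAT_NUMBER, label); break;
                case "bytes":
                case "string":  tokenToLabel.put(TokenKind.STRING, label); break;
                case "array":   tokenToLabel.put(TokenKind.ARRAY_START, label); break;
                case "map":     tokenToLabel.put(TokenKind.OBJECT_START, label); break;
                default:        break; // named types (record, enum, fixed) are not handled
            }
        }
        return tokenToLabel.get(token);
    }

    public static void main(String[] args) {
        List<String> nullableString = Arrays.asList("null", "string");
        System.out.println(findTypeInUnion(TokenKind.STRING, nullableString));     // string
        System.out.println(findTypeInUnion(TokenKind.INT_NUMBER, nullableString)); // null
        // Both branches map to the same token, so the later declaration wins.
        System.out.println(findTypeInUnion(TokenKind.INT_NUMBER, Arrays.asList("int", "long"))); // long
    }
}
```

The last call is worth noting: with a union such as ["int", "long"], every integer is decoded as the branch declared later, so unions that mix types sharing a JSON token need extra care.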

There are still some issues:

  1. This solution doesn't support the "record", "enum", or "fixed" Avro types, because these types require a user-defined name. E.g. for a union such as "type": ["null", {"name": "abc", "type": "record", "fields" : ...}], this code will not work. For primitive types it should work, but please test it before you use it in your project.

  2. Personally, I think records should not be nullable: a record is something I need to make sure exists, and if it is missing, I have a bigger problem. If a value can be omitted, I prefer to define it as a "map" instead of a "record" in the schema.

Hopefully this helps.
