There is a tool called avro-tools, which ships with Avro and can convert between JSON, Avro schema (.avsc), and binary formats. However, it does not work with circular references.
We have two files:
- circular.avsc (generated by Avro)
- circular.json (generated by Jackson, because the data has a circular reference and Avro cannot serialize it).
circular.avsc
{
"type":"record",
"name":"Parent",
"namespace":"bigdata.example.avro",
"fields":[
{
"name":"name",
"type":[
"null",
"string"
],
"default":null
},
{
"name":"child",
"type":[
"null",
{
"type":"record",
"name":"Child",
"fields":[
{
"name":"name",
"type":[
"null",
"string"
],
"default":null
},
{
"name":"parent",
"type":[
"null",
"Parent"
],
"default":null
}
]
}
],
"default":null
}
]
}
circular.json
{
"@class":"bigdata.example.avro.Parent",
"@circle_ref_id":1,
"name":"parent",
"child":{
"@class":"bigdata.example.avro.DerivedChild",
"@circle_ref_id":2,
"name":"hello",
"parent":1
}
}
Command used to run avro-tools on the files above:
java -jar avro-tools-1.7.6.jar fromjson --schema-file circular.avsc circular.json
Output
2014-06-09 14:29:17.759 java[55860:1607] Unable to load realm mapping info from SCDynamicStore
Objavro.codenullavro.schema? {"type":"record","name":"Parent","namespace":"bigdata.example.avro","fields":[{"name":"name","type":["null","string"],"default":null},{"name":"child","type":["null",{"type":"record","name":"Child","fields":[{"name":"name","type":["null","string"],"default":null},{"name":"parent","type":["null","Parent"],"default":null}]}],"default":null}]}?'???K?jH!??Ė?
Exception in thread "main" org.apache.avro.AvroTypeException: Expected start-union. Got VALUE_STRING
    at org.apache.avro.io.JsonDecoder.error(JsonDecoder.java:697)
    at org.apache.avro.io.JsonDecoder.readIndex(JsonDecoder.java:441)
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
I also tried some other JSON values with the same schema, but none of them worked:
JSON 1
{
"name":"parent",
"child":{
"name":"hello",
"parent":null
}
}
JSON 2
{
"name":"parent",
"child":{
"name":"hello",
}
}
JSON 3
{
"@class":"bigdata.example.avro.Parent",
"@circle_ref_id":1,
"name":"parent",
"child":{
"@class":"bigdata.example.avro.DerivedChild",
"@circle_ref_id":2,
"name":"hello",
"parent":null
}
}
Removing some of the "optional" elements:
circular.avsc
{
"type":"record",
"name":"Parent",
"namespace":"bigdata.example.avro",
"fields":[
{
"name":"name",
"type":
"string",
"default":null
},
{
"name":"child",
"type":
{
"type":"record",
"name":"Child",
"fields":[
{
"name":"name",
"type":
"string",
"default":null
},
{
"name":"parent",
"type":
"Parent",
"default":null
}
]
},
"default":null
}
]
}
circular.json
{
"@class":"bigdata.example.avro.Parent",
"@circle_ref_id":1,
"name":"parent",
"child":{
"@class":"bigdata.example.avro.DerivedChild",
"@circle_ref_id":2,
"name":"hello",
"parent":1
}
}
Output
2014-06-09 15:30:53.716 java[56261:1607] Unable to load realm mapping info from SCDynamicStore
Objavro.codenullavro.schema?{"type":"record","name":"Parent","namespace":"bigdata.example.avro","fields":[{"name":"name","type":"string","default":null},{"name":"child","type":{"type":"record","name":"Child","fields":[{"name":"name","type":"string","default":null},{"name":"parent","type":"Parent","default":null}]},"default":null}]}?x?N??O"?M?`Ab
Exception in thread "main" java.lang.StackOverflowError
    at org.apache.avro.io.parsing.Symbol.flattenedSize(Symbol.java:212)
    at org.apache.avro.io.parsing.Symbol$Sequence.flattenedSize(Symbol.java:323)
    at org.apache.avro.io.parsing.Symbol.flattenedSize(Symbol.java:216)
    at org.apache.avro.io.parsing.Symbol$Sequence.flattenedSize(Symbol.java:323)
    at org.apache.avro.io.parsing.Symbol.flattenedSize(Symbol.java:216)
    at org.apache.avro.io.parsing.Symbol$Sequence.flattenedSize(Symbol.java:323)
Does anyone know how I can make circular references work with Avro?
1 Answer
I ran into the same problem recently and resolved it with a workaround; hopefully it helps.
Based on the Avro specification:
JSON Encoding
Except for unions, the JSON encoding is the same as is used to encode field default values.
The value of a union is encoded in JSON as follows:
- if its type is null, then it is encoded as a JSON null;
- otherwise it is encoded as a JSON object with one name/value pair whose name is the type's name and whose value is the recursively encoded value. For Avro's named types (record, fixed or enum) the user-specified name is used, for other types the type name is used.
For example, the union schema ["null","string","Foo"], where Foo is a record name, would encode:
- null as null;
- the string "a" as {"string": "a"};
- and a Foo instance as {"Foo": {...}}, where {...} indicates the JSON encoding of a Foo instance.
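To make the union rule quoted above concrete, here is a minimal sketch in plain Java (no Avro dependency) that wraps values the way the specification describes and applies it to the Parent/Child record from the question. The class and method names are made up for illustration. One assumption to check against your Avro version: I use the namespaced name bigdata.example.avro.Child as the branch label for the record, which as far as I know is what the Java implementation matches on. The sketch sets the inner parent to null, so it only shows the union wrapping; it does not express the cycle itself.

```java
public class UnionJsonSketch {

    /**
     * Wraps an already-encoded JSON value in Avro's union form.
     * The "null" branch is written as a bare JSON null; any other branch
     * becomes {"<branchName>": <encodedValue>}.
     */
    static String encodeUnion(String branchName, String encodedValue) {
        if ("null".equals(branchName)) {
            return "null";
        }
        return "{\"" + branchName + "\":" + encodedValue + "}";
    }

    /** Builds the Parent record from the question with every union field wrapped. */
    static String parentRecord() {
        String child = "{\"name\":" + encodeUnion("string", "\"hello\"")
                + ",\"parent\":" + encodeUnion("null", null) + "}";
        // Assumption: the branch label of a named type is its full,
        // namespaced name (here bigdata.example.avro.Child).
        return "{\"name\":" + encodeUnion("string", "\"parent\"")
                + ",\"child\":" + encodeUnion("bigdata.example.avro.Child", child) + "}";
    }

    public static void main(String[] args) {
        System.out.println(encodeUnion("string", "\"a\"")); // {"string":"a"}
        System.out.println(parentRecord());
    }
}
```

Compared with "JSON 1" in the question, the only difference is that each non-null union value is wrapped in an object keyed by its branch name, which is what the "Expected start-union. Got VALUE_STRING" error is complaining about.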
If the source file cannot be changed to follow this requirement, the code may have to change instead. So I customized the original org.apache.avro.io.JsonDecoder class from the avro-1.7.7 package and created my own class, MyJsonDecoder.
Here is the key part I changed, apart from creating new constructors and renaming the class:
// Requires java.util.HashMap in addition to the imports already in JsonDecoder.
@Override
public int readIndex() throws IOException {
    advance(Symbol.UNION);
    Symbol.Alternative a = (Symbol.Alternative) parser.popSymbol();

    String label;
    if (in.getCurrentToken() == JsonToken.VALUE_NULL) {
        label = "null";
    //***********************************************
    // Original code, following the "JSON Encoding" section of the Avro
    // specification: a non-null union value is encoded as a JSON object with
    // one name/value pair whose name is the type's name and whose value is
    // the recursively encoded value.
    // The source data can't be changed, so this rule is removed.
    // } else if (in.getCurrentToken() == JsonToken.START_OBJECT &&
    //            in.nextToken() == JsonToken.FIELD_NAME) {
    //     label = in.getText();
    //     in.nextToken();
    //     parser.pushSymbol(Symbol.UNION_END);
    //***********************************************
    // Customized code: infer the union branch from the JSON token type
    // and check that this type is declared in the union.
    } else {
        label = findTypeInUnion(in.getCurrentToken(), a);
        // Either the field is missing but is not allowed to be null,
        // or the token's type is not declared in the union.
        if (label == null) {
            throw error("start-union, type may not be in UNION,");
        }
    }
    //***********************************************
    // Original code: fail immediately if the value is not union-encoded.
    // } else {
    //     throw error("start-union");
    // }
    //***********************************************
    int n = a.findLabel(label);
    if (n < 0) {
        throw new AvroTypeException("Unknown union branch " + label);
    }
    parser.pushSymbol(a.getSymbol(n));
    return n;
}

/**
 * Checks whether the type of the current JSON token is declared in the union.
 * Does NOT support "record", "enum", or "fixed": these types require a
 * user-defined name in the Avro schema, and if that name cannot be found
 * in the JSON file, the value can't be decoded.
 *
 * @param jsonToken         the current JSON token
 * @param symbolAlternative the union's alternatives
 * @return the label of the union branch to decode with, or null if none matches
 */
private String findTypeInUnion(final JsonToken jsonToken,
                               final Symbol.Alternative symbolAlternative) {
    // Lookup table from JsonToken to the Avro type declared in the union.
    // Note: "int"/"long", "float"/"double" and "bytes"/"string" share a token,
    // so if both are declared, the branch declared later in the union wins.
    final HashMap<JsonToken, String> json2Avro = new HashMap<>();
    for (int i = 0; i < symbolAlternative.size(); i++) {
        // symbolAlternative.getLabel(i) is the type declared in the union.
        switch (symbolAlternative.getLabel(i)) {
            case "null":
                json2Avro.put(JsonToken.VALUE_NULL, "null");
                break;
            case "boolean":
                json2Avro.put(JsonToken.VALUE_TRUE, "boolean");
                json2Avro.put(JsonToken.VALUE_FALSE, "boolean");
                break;
            case "int":
                json2Avro.put(JsonToken.VALUE_NUMBER_INT, "int");
                break;
            case "long":
                json2Avro.put(JsonToken.VALUE_NUMBER_INT, "long");
                break;
            case "float":
                json2Avro.put(JsonToken.VALUE_NUMBER_FLOAT, "float");
                break;
            case "double":
                json2Avro.put(JsonToken.VALUE_NUMBER_FLOAT, "double");
                break;
            case "bytes":
                json2Avro.put(JsonToken.VALUE_STRING, "bytes");
                break;
            case "string":
                json2Avro.put(JsonToken.VALUE_STRING, "string");
                break;
            case "array":
                json2Avro.put(JsonToken.START_ARRAY, "array");
                break;
            case "map":
                json2Avro.put(JsonToken.START_OBJECT, "map");
                break;
            default:
                // Named types (record, enum, fixed) are not supported.
                break;
        }
    }
    // Look up the Avro type that corresponds to the current token.
    return json2Avro.get(jsonToken);
}
The general idea is to check whether the type of the value in the source file can be found in the union.
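The lookup idea can be tried in isolation with a stripped-down sketch that needs neither Avro nor Jackson. The Token enum below is a hypothetical stand-in for Jackson's JsonToken, and the union is passed as a plain list of labels instead of a Symbol.Alternative, so treat it as an illustration of the approach rather than the decoder code itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnionLookupSketch {

    /** Hypothetical stand-in for Jackson's JsonToken, reduced to what is needed here. */
    enum Token { NULL, BOOL, INT, FLOAT, STRING, ARRAY, OBJECT }

    /** Returns the label of the union branch that accepts the token, or null if none does. */
    static String findTypeInUnion(Token token, List<String> unionLabels) {
        Map<Token, String> tokenToLabel = new HashMap<>();
        for (String label : unionLabels) {
            switch (label) {
                case "null":    tokenToLabel.put(Token.NULL, label);   break;
                case "boolean": tokenToLabel.put(Token.BOOL, label);   break;
                case "int":
                case "long":    tokenToLabel.put(Token.INT, label);    break;
                case "float":
                case "double":  tokenToLabel.put(Token.FLOAT, label);  break;
                case "bytes":
                case "string":  tokenToLabel.put(Token.STRING, label); break;
                case "array":   tokenToLabel.put(Token.ARRAY, label);  break;
                case "map":     tokenToLabel.put(Token.OBJECT, label); break;
                default:        break; // named types (record, enum, fixed) are skipped
            }
        }
        return tokenToLabel.get(token);
    }

    public static void main(String[] args) {
        // A bare JSON string resolves to the "string" branch.
        System.out.println(findTypeInUnion(Token.STRING, Arrays.asList("null", "string")));
        // A JSON object cannot be matched to a record branch, so the lookup fails.
        System.out.println(findTypeInUnion(Token.OBJECT, Arrays.asList("null", "Child")));
        // Two branches sharing one token: the one declared later wins.
        System.out.println(findTypeInUnion(Token.INT, Arrays.asList("int", "long")));
    }
}
```

The last two calls show the limits of inferring the branch from the token alone: a record branch is never matched, and when two declared types share a token (for example "int" and "long"), the one that appears later in the union is picked.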
There are still some issues:
- This solution doesn't support the "record", "enum", or "fixed" Avro types, because these types require a user-defined name. E.g. if you want the union "type": ["null", {"name": "abc", "type": "record", "fields" : ...}], this code will not work. For primitive types it should work, but please test it before you use it in your project.
- Personally, I think records should not be nullable, because I consider records to be the things I need to make sure exist; if one is missing, that means I have a bigger problem. If a value can be omitted, I prefer to use "map" as the type instead of "record" when defining the schema.
Hopefully this helps.