Processing uneven data with the JSON Input step

Date: 2021-09-07 18:14:49

I'm trying to process the following with a JSON Input step:

{"address":[
  {"AddressId":"1_1","Street":"A Street"},
  {"AddressId":"1_101","Street":"Another Street"},
  {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},
  {"AddressId":"1_102","Locality":"New York"}
]}

However, this does not seem to be possible:

Json Input.0 - ERROR (version 4.2.1-stable, build 15952 from 2011-10-25 15.27.10 by buildguy) : 
The data structure is not the same inside the resource! 
We found 1 values for json path [$..Locality], which is different that the number retourned for path [$..Street] (3509 values). 
We MUST have the same number of values for all paths.

The step provides an Ignore Missing Path flag, but it only works if all the rows are missing the same path. In that case the step behaves as expected and fills the missing values with null.

This limits the power of this step to read uneven data, which was really one of my priorities.

My step Fields are defined as follows:

[Image: screenshot of the step's Fields definition]

Am I missing something? Is this the correct behavior?

2 Solutions

#1


10  

What I did was use a JSON Input step with the path $.address[*] to read the full map of each element into a jsonRow field. For example, given:

{"address":[
    {"AddressId":"1_1","Street":"A Street"},  
    {"AddressId":"1_101","Street":"Another Street"},  
    {"AddressId":"1_102","Street":"One more street", "Locality":"Buenos Aires"},   
    {"AddressId":"1_102","Locality":"New York"} 
]}

This results in 4 jsonRows, one for each element, e.g. jsonRow = {"AddressId":"1_101","Street":"Another Street"}. Then, in a Javascript step, I map my values like this:

var AddressId = getFromMap('AddressId', jsonRow);
var Street = getFromMap('Street', jsonRow);
var Locality = getFromMap('Locality', jsonRow);

In a second script tab I inserted the minified JSON parse code from https://github.com/douglascrockford/JSON-js together with the getFromMap function:

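// Returns the value stored under `key` in the JSON object held in jsonRow,
// or null when the key is absent or the JSON cannot be parsed.
function getFromMap(key, jsonRow) {
  var map;
  try {
    map = JSON.parse(jsonRow);
  } catch (e) {
    // Send the row to the step's error handling and skip it.
    var message = "Unparsable JSON: " + jsonRow + " Desc: " + e.message;
    var nr_errors = 1;
    var field = "jsonRow";
    var errcode = "JSON_PARSE";
    _step_.putError(getInputRowMeta(), row, nr_errors, message, field, errcode);
    trans_Status = SKIP_TRANSFORMATION;
    return null;
  }

  // A missing key becomes null instead of undefined.
  if (map[key] == undefined) {
    return null;
  }
  trans_Status = CONTINUE_TRANSFORMATION;
  return map[key];
}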
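
Note that getFromMap parses the same jsonRow string once for every field. A possible variation is sketched below as plain JavaScript that can be tried outside Pentaho (for example in Node.js). The name extractFields is hypothetical and no Pentaho API is used, so error routing with putError would still have to be added around it. It parses the row once and fills absent keys with null:

```javascript
// Hypothetical helper: parse jsonRow once and return an object that has
// every requested key, with null for keys the element does not contain.
function extractFields(jsonRow, keys) {
  var map = JSON.parse(jsonRow); // throws SyntaxError on invalid JSON
  var result = {};
  for (var i = 0; i < keys.length; i++) {
    var key = keys[i];
    result[key] = Object.prototype.hasOwnProperty.call(map, key) ? map[key] : null;
  }
  return result;
}

var example = extractFields(
  '{"AddressId":"1_102","Locality":"New York"}',
  ["AddressId", "Street", "Locality"]
);
console.log(JSON.stringify(example));
// prints {"AddressId":"1_102","Street":null,"Locality":"New York"}
```

The sketch sticks to var and a classic for loop on the assumption that the scripting engine embedded in an older Pentaho release may not accept newer syntax.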

#2


2  

You can solve this by changing the JSONPath and splitting the work across two JSON Input steps. The following website explains a lot about JSONPath: http://goessner.net/articles/JsonPath/

$..AddressId

This path does in fact return all the AddressId values in the address array. But Pentaho works with a grid of rows for input and output (here 4 rows x 3 columns), so it cannot cope when $..Street returns only 3 values and $..Locality only 2: the array itself contains no null placeholders for the missing keys, so the columns cannot be lined up. It is like trying to drive out of your garage with 3 wheels on your car instead of the usual 4.
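
To make the mismatch concrete, here is a small plain-JavaScript sketch (not Pentaho code). The helper collectValues is hypothetical and only imitates, roughly, what a recursive-descent path such as $..Street selects. Run against the sample data from the question, it shows that each path yields a different number of values:

```javascript
// Sample document from the question.
const doc = {
  address: [
    { AddressId: "1_1", Street: "A Street" },
    { AddressId: "1_101", Street: "Another Street" },
    { AddressId: "1_102", Street: "One more street", Locality: "Buenos Aires" },
    { AddressId: "1_102", Locality: "New York" }
  ]
};

// Hypothetical helper: gather every value stored under `key` anywhere in
// the tree, roughly what a path like $..key would select.
function collectValues(node, key, out = []) {
  if (Array.isArray(node)) {
    node.forEach((item) => collectValues(item, key, out));
  } else if (node !== null && typeof node === "object") {
    for (const [k, v] of Object.entries(node)) {
      if (k === key) out.push(v);
      collectValues(v, key, out);
    }
  }
  return out;
}

const counts = {
  AddressId: collectValues(doc, "AddressId").length,
  Street: collectValues(doc, "Street").length,
  Locality: collectValues(doc, "Locality").length
};
console.log(counts); // { AddressId: 4, Street: 3, Locality: 2 }
```

With 4, 3 and 2 values there is no way to tell which Street or Locality belongs to which AddressId, which is why the step refuses to build rows from them.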

I guess your script returns null values (marked X below, with A, S and L standing for AddressId, Street and Locality) like this:

A S X
A S X
A S L
A X L

The scripting step can be avoided by changing the Fields path of the first JSON Input step to:

$.address[*]

This retrieves all 4 address lines. Then add a second JSON Input step that takes the new field containing the address line as its source and retrieves the address details per line:

$.AddressId
$.Street
$.Locality

This yields null values on the four address lines wherever a detail is not present in a line.
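
The two-step idea can be imitated in plain JavaScript to check what the result should look like. This is only a sketch of the logic, not of how the JSON Input step is implemented, and the variable names (source, addressLines, rows) are made up for the illustration:

```javascript
// Sample input from the question, as a JSON string.
const source = `{"address":[
  {"AddressId":"1_1","Street":"A Street"},
  {"AddressId":"1_101","Street":"Another Street"},
  {"AddressId":"1_102","Street":"One more street","Locality":"Buenos Aires"},
  {"AddressId":"1_102","Locality":"New York"}
]}`;

// Stage 1: one row per array element, the effect of $.address[*].
const addressLines = JSON.parse(source).address.map((el) => JSON.stringify(el));

// Stage 2: read each field from a single element, the effect of
// $.AddressId, $.Street and $.Locality. Missing keys become null.
const fields = ["AddressId", "Street", "Locality"];
const rows = addressLines.map((line) => {
  const obj = JSON.parse(line);
  return fields.map((f) =>
    Object.prototype.hasOwnProperty.call(obj, f) ? obj[f] : null
  );
});

rows.forEach((r) => console.log(r));
```

Because every element is handled on its own in the second stage, each row always has exactly three columns, and a missing key simply shows up as null in that row.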
