I have multiple JSON files, each containing JSON data. The JSON structure looks like this:
{
  "Name": "Vipin Suman",
  "Email": "vpn2330@gmail.com",
  "Designation": "Trainee Programmer",
  "Age": 22,
  "location": {
    "City": {
      "Pin": 324009,
      "City Name": "Ahmedabad"
    },
    "State": "Gujarat"
  },
  "Company": {
    "Company Name": "Elegant",
    "Domain": "Java"
  },
  "Test": ["Test1", "Test2"]
}
I tried this:
String jsonFilePath = "/home/vipin/workspace/Smarten/jsonParsing/Employee/Employee-03.json";
String[] jsonFiles = jsonFilePath.split(",");
Dataset<Row> people = sparkSession.read().json(jsonFiles);
The schema I get for this is:
root
|-- Age: long (nullable = true)
|-- Company: struct (nullable = true)
| |-- Company Name: string (nullable = true)
| |-- Domain: string (nullable = true)
|-- Designation: string (nullable = true)
|-- Email: string (nullable = true)
|-- Name: string (nullable = true)
|-- Test: array (nullable = true)
| |-- element: string (containsNull = true)
|-- location: struct (nullable = true)
| |-- City: struct (nullable = true)
| | |-- City Name: string (nullable = true)
| | |-- Pin: long (nullable = true)
| |-- State: string (nullable = true)
And the table view I get is:
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+
|Age| Company| Designation| Email| Name| Test| location|
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+
| 22|[Elegant,Java]|Trainee Programmer|vpn2330@gmail.com|Vipin Suman|[Test1, Test2]|[[Ahmedabad,32400...|
+---+--------------+------------------+-----------------+-----------+--------------+--------------------+
The result I want is:
Age | Company Name     | Domain | Designation | Email             | Name        | Test  | City Name | Pin    | State
22  | Elegant MicroWeb | Java   | Programmer  | vpn2330@gmail.com | Vipin Suman | Test1 | Ahmedabad | 324009 | Gujarat
22  | Elegant MicroWeb | Java   | Programmer  | vpn2330@gmail.com | Vipin Suman | Test2 | Ahmedabad | 324009 | Gujarat
How can I get a table in the above format? I have tried everything. I am new to Apache Spark; can anyone help me out?
3 Answers
#1
If you want to parse a JSON file that has nested keys, first find the key set from the JSON files, then fire a select command in Spark to get the nested data.
#2
I suggest you do your work in Scala, which is better supported by Spark. You can use the "select" API to select specific columns and use alias to rename a column; see this post on how to select complex data formats: https://databricks.com/blog/2017/02/23/working-complex-data-formats-structured-streaming-apache-spark-2-1.html
Based on your desired result, you will also need the "explode" API (see Flattening Rows in Spark).
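To see what explode does conceptually, here is a plain-Java sketch (no Spark required, and `ExplodeDemo` is just an illustrative name): each element of the array column becomes its own output row, with the scalar columns repeated alongside it.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of Spark's explode: one output row per
// array element, with the other column values repeated on each row.
public class ExplodeDemo {
    public static List<String> explodeRows(String name, List<String> tests) {
        List<String> rows = new ArrayList<>();
        for (String t : tests) {
            rows.add(name + " | " + t);  // scalar column repeated per element
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> rows = explodeRows("Vipin Suman", List.of("Test1", "Test2"));
        rows.forEach(System.out::println);
        // Vipin Suman | Test1
        // Vipin Suman | Test2
    }
}
```

This is exactly why the desired output has two rows for one input record: the `Test` array has two elements.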
#3
In Scala it could be done like this:
people.select(
  $"Age",
  $"Company.*",
  $"Designation",
  $"Email",
  $"Name",
  explode($"Test"),
  $"location.City.*",
  $"location.State")
Unfortunately, the following code in Java would fail:
people.select(
  people.col("Age"),
  people.col("Company.*"),
  people.col("Designation"),
  people.col("Email"),
  people.col("Name"),
  explode(people.col("Test")),
  people.col("location.City.*"),
  people.col("location.State"));
You can use selectExpr instead, though:
people.selectExpr(
  "Age",
  "Company.*",
  "Designation",
  "Email",
  "Name",
  "EXPLODE(Test) AS Test",
  "location.City.*",
  "location.State");
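One caveat when working with selectExpr: field names containing spaces, such as `Company Name` or `City Name`, must be backtick-quoted if you reference them individually in a SQL expression (the `*` expansion above handles them automatically). A small helper sketching the quoting rule (`ColumnQuoter` is a hypothetical name, not part of Spark):

```java
// Hypothetical helper: backtick-quote a field name for use inside a
// Spark SQL expression when it is not a plain identifier, e.g.
// selectExpr("Company.`Company Name` AS CompanyName").
public class ColumnQuoter {
    public static String quote(String name) {
        if (name.matches("[A-Za-z_][A-Za-z0-9_]*")) {
            return name;  // plain identifier, no quoting needed
        }
        // escape embedded backticks, then wrap the whole name
        return "`" + name.replace("`", "``") + "`";
    }

    public static void main(String[] args) {
        System.out.println(quote("Age"));           // Age
        System.out.println(quote("Company Name"));  // `Company Name`
    }
}
```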
PS: You can pass the path to a directory (or directories) instead of the list of JSON files in sparkSession.read().json(jsonFiles);.