Reading JSON data with Apache Spark

Time: 2020-12-06 23:11:07

I am trying to read a sample JSON file with Apache Spark. While doing this, I noticed that the entire JSON object has to be kept on a single line: if I put the whole JSON object on one line, the code works well; otherwise I get an exception.

This is my JSON data:

    [
    {
        "id": 2,
        "name": "An ice sculpture",
        "price": 12.50,
        "tags": ["cold", "ice"],
        "dimensions": {
            "length": 7.0,
            "width": 12.0,
            "height": 9.5
        },
        "warehouseLocation": {
            "latitude": -78.75,
            "longitude": 20.4
        }
    },
    {
        "id": 3,
        "name": "A blue mouse",
        "price": 25.50,
        "dimensions": {
            "length": 3.1,
            "width": 1.0,
            "height": 1.0
        },
        "warehouseLocation": {
            "latitude": 54.4,
            "longitude": -32.7
        }
    }
    ]

This is my code:

    SparkSession session = SparkSession.builder().appName("JsonRead").master("local").getOrCreate();
    // The default JSON reader expects one self-contained JSON object per line
    Dataset<Row> json = session.read().json("/Users/mac/Desktop/a.json");
    json.select("tags").show();

For small datasets this is fine, but is there any other way to process large JSON datasets?

1 solution

#1


See the documentation: http://spark.apache.org/docs/2.0.1/sql-programming-guide.html#json-datasets

JSON Datasets

Note that the file that is offered as a json file is not a typical JSON file. Each line must contain a separate, self-contained valid JSON object. As a consequence, a regular multi-line JSON file will most often fail.
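Since Spark 2.2, the JSON reader has a `multiLine` option for exactly this case. Below is a minimal sketch, assuming Spark 2.2 or later is on the classpath; the class and helper names (`MultiLineJsonRead`, `readPrettyJson`, `convertToJsonLines`) are hypothetical. With `multiLine` set to `true`, Spark parses each file as one JSON document, and a top-level array like the one in the question becomes one row per element.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MultiLineJsonRead {

    // Reads a regular (pretty-printed, multi-line) JSON file.
    // multiLine=true makes Spark parse each file as a whole JSON document
    // instead of expecting one JSON object per line.
    static Dataset<Row> readPrettyJson(SparkSession session, String path) {
        return session.read()
                .option("multiLine", true)
                .json(path);
    }

    // Hypothetical helper: rewrites the data as JSON Lines (one object per line),
    // which later jobs can read with the default line-delimited reader.
    static void convertToJsonLines(SparkSession session, String inPath, String outDir) {
        readPrettyJson(session, inPath).write().mode("overwrite").json(outDir);
    }

    public static void main(String[] args) {
        SparkSession session = SparkSession.builder()
                .appName("JsonRead")
                .master("local")
                .getOrCreate();
        try {
            Dataset<Row> json = readPrettyJson(session, "/Users/mac/Desktop/a.json");
            json.select("tags").show();
        } finally {
            session.stop();
        }
    }
}
```

One caveat for large inputs: as far as I know, a file read in `multiLine` mode cannot be split, so each file is parsed by a single task. For big datasets it is usually better to convert the data to JSON Lines once (`DataFrameWriter.json` writes one object per line, which is what `convertToJsonLines` relies on) and let later jobs use the default reader in parallel.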

请注意,作为json文件提供的文件不是典型的JSON文件。每行必须包含一个单独的,自包含的有效JSON对象。因此,常规的多行JSON文件通常会失败。

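If the `multiLine` option is not available (for example on the 2.0.x release the linked guide describes), the file can instead be preprocessed into JSON Lines outside Spark. The following is a minimal stdlib-only sketch, not a full JSON parser: the class name `JsonArrayToLines` is hypothetical, and it assumes the input is valid JSON whose top level is an array. It streams character by character, tracking nesting depth and whether it is inside a string, so memory use does not grow with file size.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;

public class JsonArrayToLines {

    // Streams a top-level JSON array from `in` and writes each element to `out`
    // on its own line (JSON Lines). Whitespace outside strings is dropped.
    // Assumes valid JSON whose top level is an array; nothing is validated.
    static void convert(Reader in, Writer out) throws IOException {
        int depth = 0;            // nesting depth of [ and {; the outer array is depth 1
        boolean inString = false; // currently inside a string literal
        boolean escaped = false;  // previous char was a backslash inside a string
        boolean wroteAny = false; // at least one element character was written
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (inString) {
                out.write(ch); // copy string contents verbatim
                if (escaped) {
                    escaped = false;
                } else if (ch == '\\') {
                    escaped = true;
                } else if (ch == '"') {
                    inString = false;
                }
                continue;
            }
            if (Character.isWhitespace(ch)) {
                continue;
            }
            if (ch == '[' || ch == '{') {
                depth++;
                if (depth == 1) continue; // drop the outer '['
            } else if (ch == ']' || ch == '}') {
                depth--;
                if (depth == 0) continue; // drop the outer ']'
            } else if (ch == ',' && depth == 1) {
                out.write('\n');          // comma between top-level elements
                continue;
            } else if (ch == '"') {
                inString = true;
            }
            out.write(ch);
            wroteAny = true;
        }
        if (wroteAny) out.write('\n'); // terminate the last line
    }

    // Convenience overload for in-memory strings.
    static String convert(String json) {
        StringWriter out = new StringWriter();
        try {
            convert(new StringReader(json), out);
        } catch (IOException e) {
            throw new IllegalStateException(e); // not expected for in-memory I/O
        }
        return out.toString();
    }

    // Usage: java JsonArrayToLines <input.json> <output.jsonl>
    public static void main(String[] args) throws IOException {
        try (Reader in = Files.newBufferedReader(Paths.get(args[0]));
             Writer out = Files.newBufferedWriter(Paths.get(args[1]))) {
            convert(in, out);
        }
    }
}
```

The resulting file has one object per line, so the question's original `session.read().json(...)` call can read it without extra options.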