Best way to populate a DynamoDB table

Time: 2022-05-29 22:18:53

Please keep in mind that this is an open question: I am not looking for a specific answer, just approaches and routes I could take.

Essentially, I am getting a CSV file from my AWS S3 bucket. I can fetch it successfully with:

AmazonS3 s3Client = new AmazonS3Client(new ProfileCredentialsProvider());        
S3Object object = s3Client.getObject(
                  new GetObjectRequest(bucketName, key));

Now I want to populate a DynamoDB table using this file.

I got confused because I found all sorts of approaches online.

Here is one suggestion. However, that approach only reads the file; it does not insert anything into the DynamoDB table.

Here is another suggestion. That approach is a lot closer to what I am looking for: it populates a table from a JSON file.

However, I was wondering whether there is a generic way to read any JSON file and populate a DynamoDB table from it. Also, which approach is best for my case?

Since I originally asked the question, I have done more work.

What I have done so far

I have a CSV file in S3 that looks like this:

name,position,points,assists,rebounds
Lebron James,SF,41,12,11
Kyrie Irving,PG,41,7,5
Stephen Curry,PG,29,8,4
Klay Thompson,SG,31,5,5

I can successfully retrieve it as an S3Object with the following:

AmazonS3 s3client = new AmazonS3Client(/**new ProfileCredentialsProvider()*/);
S3Object object = s3client.getObject(
        new GetObjectRequest("lambda-function-bucket-blah-blah", "nba.json"));
InputStream objectData = object.getObjectContent();

Now I want to insert this into my DynamoDB table, so I am attempting the following.

AmazonDynamoDBClient dbClient = new AmazonDynamoDBClient();
dbClient.setRegion(Region.getRegion(Regions.US_BLAH_1));

DynamoDB dynamoDB = new DynamoDB(dbClient); 
//DynamoDB dynamoDB = new DynamoDB(client); 
Table table = dynamoDB.getTable("MyTable"); 

// After this point I have tried many JSON parsers, table.put(item), and so on, but nothing has worked. I would appreciate any help.

1 answer

#1


For CSV parsing, you can use a plain reader, since your file looks quite simple:

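    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import com.amazonaws.regions.Region;
    import com.amazonaws.regions.Regions;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
    import com.amazonaws.services.dynamodbv2.document.DynamoDB;
    import com.amazonaws.services.dynamodbv2.document.Item;
    import com.amazonaws.services.dynamodbv2.document.Table;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3Object;

    AmazonS3 s3client = new AmazonS3Client(/**new ProfileCredentialsProvider()*/);
    S3Object object = s3client.getObject(
                new GetObjectRequest("lambda-function-bucket-blah-blah", "nba.json"));
    InputStream objectData = object.getObjectContent();

    AmazonDynamoDBClient dbClient = new AmazonDynamoDBClient();
    // US_BLAH_1 is a placeholder; substitute your actual region constant
    dbClient.setRegion(Region.getRegion(Regions.US_BLAH_1));

    DynamoDB dynamoDB = new DynamoDB(dbClient);
    Table table = dynamoDB.getTable("MyTable");

    String line;
    String csvSplitBy = ",";

    // try-with-resources closes the reader (and the S3 stream) when done
    try (BufferedReader br = new BufferedReader(
                                new InputStreamReader(objectData, "UTF-8"))) {

        // skip the header row (name,position,points,assists,rebounds)
        br.readLine();

        while ((line = br.readLine()) != null) {

            // use comma as separator
            String[] elements = line.split(csvSplitBy);

            try {
                table.putItem(new Item()
                    .withPrimaryKey("name", elements[0])
                    .withString("position", elements[1])
                    // withInt expects an int, so convert the CSV text first
                    .withInt("points", Integer.parseInt(elements[2]))
                    // ..... add the remaining columns the same way
                    );

                System.out.println("PutItem succeeded: " + elements[0]);

            } catch (Exception e) {
                // print the raw line; printing the array would only show its reference
                System.err.println("Unable to add user: " + line);
                System.err.println(e.getMessage());
                break;
            }

        }

    } catch (IOException e) {
        e.printStackTrace();
    }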

Depending on the complexity of your CSV, you can use third-party libraries such as Apache Commons CSV or OpenCSV.
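To illustrate where a plain comma split stops being enough, here is a minimal sketch of a quote-aware line splitter in plain Java. The class name `QuotedCsvLine` and the quoted sample row are hypothetical and only serve the illustration; the sketch handles commas and doubled quotes inside quoted fields, but not fields that span several lines, which is one reason a dedicated library may still be the better choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the AWS SDK or any CSV library.
class QuotedCsvLine {

    // Splits one CSV line on commas, treating commas inside double quotes as data.
    // A doubled quote ("") inside a quoted field is read as one literal quote.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;                 // skip the second quote of the pair
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);        // start the next field
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Assumed sample row: a name that itself contains a comma
        String row = "\"James, Lebron\",SF,41,12,11";
        System.out.println("naive split fields: " + row.split(",").length);
        System.out.println("quote-aware fields: " + split(row).size());
    }
}
```

With a row like the one in `main`, `line.split(",")` cuts the quoted name in two, while the quote-aware version keeps it as a single field.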

I am leaving my original answer, which covers parsing JSON, below.

I would use the Jackson library and, building on your code, do the following:

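    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Iterator;
    import com.amazonaws.regions.Region;
    import com.amazonaws.regions.Regions;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClient;
    import com.amazonaws.services.dynamodbv2.document.DynamoDB;
    import com.amazonaws.services.dynamodbv2.document.Item;
    import com.amazonaws.services.dynamodbv2.document.Table;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.GetObjectRequest;
    import com.amazonaws.services.s3.model.S3Object;
    import com.fasterxml.jackson.core.JsonFactory;
    import com.fasterxml.jackson.core.JsonParser;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import com.fasterxml.jackson.databind.node.ObjectNode;

    AmazonS3 s3client = new AmazonS3Client(/**new ProfileCredentialsProvider()*/);
    S3Object object = s3client.getObject(
                new GetObjectRequest("lambda-function-bucket-blah-blah", "nba.json"));
    InputStream objectData = object.getObjectContent();

    AmazonDynamoDBClient dbClient = new AmazonDynamoDBClient();
    // US_BLAH_1 is a placeholder; substitute your actual region constant
    dbClient.setRegion(Region.getRegion(Regions.US_BLAH_1));

    DynamoDB dynamoDB = new DynamoDB(dbClient);
    Table table = dynamoDB.getTable("MyTable");

    // try-with-resources closes the parser even if reading fails
    try (JsonParser parser = new JsonFactory().createParser(objectData)) {

        // the root is expected to be a JSON array of objects
        JsonNode rootNode = new ObjectMapper().readTree(parser);
        Iterator<JsonNode> iter = rootNode.iterator();

        ObjectNode currentNode;

        while (iter.hasNext()) {
            currentNode = (ObjectNode) iter.next();

            String lastName  = currentNode.path("lastName").asText();
            String firstName = currentNode.path("firstName").asText();
            int minutes      = currentNode.path("minutes").asInt();
            // read all attributes from your JSON file

            try {
                table.putItem(new Item()
                    .withPrimaryKey("lastName", lastName, "firstName", firstName)
                    .withInt("minutes", minutes));

                System.out.println("PutItem succeeded: " + lastName + " " + firstName);

            } catch (Exception e) {
                System.err.println("Unable to add user: " + lastName + " " + firstName);
                System.err.println(e.getMessage());
                break;
            }
        }

    } catch (IOException e) {
        e.printStackTrace();
    }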

How you insert the records depends on your table's schema; the attributes above are just an arbitrary example. Either way, this shows how to read the file and how to insert into the DynamoDB table.
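Because the mapping depends on the schema, one way to keep the reading step generic is to take the attribute names from the CSV header row instead of hard-coding them. The following is only a sketch under stated assumptions: the class `HeaderDrivenRows` and its methods are hypothetical names, the file is assumed to be simple comma-separated text with a header line, and any value that parses as a whole number is treated as numeric.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration: turns CSV lines with a header row
// into one attribute map per data row.
class HeaderDrivenRows {

    static List<Map<String, Object>> toAttributeMaps(List<String> lines) {
        List<Map<String, Object>> rows = new ArrayList<>();
        if (lines.isEmpty()) {
            return rows;
        }
        // The first line supplies the attribute names
        String[] header = lines.get(0).split(",");
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            // limit -1 keeps trailing empty columns
            String[] values = line.split(",", -1);
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 0; i < header.length && i < values.length; i++) {
                row.put(header[i].trim(), toNumberIfPossible(values[i].trim()));
            }
            rows.add(row);
        }
        return rows;
    }

    // Assumption: whole numbers become Long, everything else stays a String
    static Object toNumberIfPossible(String value) {
        try {
            return Long.valueOf(value);
        } catch (NumberFormatException e) {
            return value;
        }
    }
}
```

Each resulting map could then be turned into an item, for example with `Item.fromMap(row)` from the SDK's document API, and written with `table.putItem(...)`. The map has to contain the table's key attributes (here that would be `name`), so the header names must match the schema.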

Since you asked about different approaches, another possibility is to set up an AWS Pipeline.
