Importing a CSV into Google Cloud Datastore

时间:2022-09-05 15:48:44

I have a CSV file with 2 columns and 20,000 rows that I would like to import into Google Cloud Datastore. I'm new to Google Cloud and NoSQL databases. I have tried using Dataflow, but it requires a JavaScript UDF function name. Does anyone have an example of this? I will be querying this data once it's in Datastore. Any advice or guidance on how to set this up would be appreciated.

2 Answers

#1



Using Apache Beam, you can read a CSV file using the TextIO class. See the TextIO documentation.

Pipeline p = Pipeline.create();

p.apply(TextIO.read().from("gs://path/to/file.csv"));

Next, apply a transform that parses each row in the CSV file and returns an Entity object. Construct the Entity to match how you want each row stored; the Cloud Datastore documentation has an example of how to create an Entity object.

.apply(ParDo.of(new DoFn<String, Entity>() {
    @ProcessElement
    public void processElement(ProcessContext c) {
        String row = c.element();
        // TODO: parse row (split) and construct Entity object
        Entity entity = ...
        c.output(entity);
    }
}));
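The parsing inside that DoFn is left as a TODO. A minimal Python sketch of the same step (an assumption on my part: the two-column file from the question, with hypothetical property names `property1`/`property2`; it uses the stdlib csv module so quoted commas survive) could look like:

```python
import csv
import io

def parse_row(line):
    """Split one CSV line into its two column values.

    Uses csv.reader rather than a plain split(',') so that
    quoted fields containing commas are handled correctly.
    """
    values = next(csv.reader(io.StringIO(line)))
    return {'property1': values[0], 'property2': values[1]}

print(parse_row('"Doe, Jane",42'))  # {'property1': 'Doe, Jane', 'property2': '42'}
```

In the Beam pipeline you would build an Entity from this dict inside `processElement` instead of printing it.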

Lastly, write the Entity objects to Cloud Datastore. See the DatastoreIO documentation.

.apply(DatastoreIO.v1().write().withProjectId(projectId));

#2



This is simple in Python, but it can easily be adapted to other languages. Use the split() method to loop through the lines and the comma-separated values in each:

from google.appengine.api import urlfetch  # first-generation (Python 2) App Engine API
from my.models import MyModel

csv_string = 'http://someplace.com/myFile.csv'
csv_response = urlfetch.fetch(csv_string, allow_truncated=True)

if csv_response.status_code == 200:
    for row in csv_response.content.split('\n'):
        if not row.strip():
            continue  # skip blank lines, e.g. a trailing newline
        row_values = row.split(',')
        # CSV values are strings; cast them if they need to be something else
        new_entry = MyModel(
            property1=row_values[0],
            property2=row_values[1]
        )
        new_entry.put()
else:
    print 'cannot load file: {}'.format(csv_string)
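One caveat with the split(',') approach above: it mis-parses fields that contain quoted commas. A small sketch (plain Python 3, no App Engine dependency; the content string is a made-up example) shows how the stdlib csv module avoids this:

```python
import csv
import io

# Hypothetical two-column content as it might come back from the fetch
content = 'a,1\n"b, with comma",2\n'

rows = [row for row in csv.reader(io.StringIO(content)) if row]
# csv.reader keeps the quoted comma inside a single field,
# whereas row.split(',') would have produced three pieces
print(rows)  # [['a', '1'], ['b, with comma', '2']]
```

If your 2-column file never contains quoted or embedded commas, the plain split() version is fine.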
