Good day. After a lot of struggling to get to grips with this technology (Dataflow), I've managed to get the pipeline working 100%.
It loads a bunch of CSV files into the pipeline (from Google Cloud Storage), transforms them into "Domain" objects, and then saves them in JSON format to a file.
What I'd like to do instead, though, is take the JSON object and push it directly to a database (Google Cloud Firestore).
The final transform that I apply to my data at this stage is:
.apply(DatastoreIO.v1().write().withProjectId("____"));
As I understand it, that call requires the previous transform to return an Entity object, which I am unable to create:
public Entity toEntity() {
    Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
    Key taskKey = datastore.newKeyFactory().setKind("Task").newKey("Test");
    Entity e = Entity.newBuilder(taskKey)
            .set("Domain", domain)
            .set("LocationOnsite", locOnSite)
            .set("Company", company)
            .build();
    return e;
}
This returns a com.google.cloud.datastore.Entity instead of the required com.google.datastore.v1.Entity.
I think it's worth noting that the "Domain" object also contains some ArrayLists of other objects, such as "Emails", which need to be included in the database.
Below is a sample of the JSON output that I currently have:
{
"Vertical": "Business And Industrial",
"Zip": "35229",
"Company": "Alabama Association of Nonprofits",
"QuantCast": "229219",
"Twitter": "",
"Vimeo": "",
"LocationOnSite": "",
"LastIndexed": "2018-02-01",
"Pinterest": "",
"Youtube": "",
"TechSpend": "$250+",
"Emails": [
{
"Email": "shannon@alabamanonprofits.org"
},
{
"Email": "support@alabamanonprofits.org"
},
{
"Email": "carla@alabamanonprofits.org"
},
{
"Email": "kellie@alabamanonprofits.org"
},
{
"Email": "ashley@alabamanonprofits.org"
},
{
"Email": "Unknown"
}
],
"Facebook": "",
"Google+": "",
"Alexa": "",
"Github": "",
"FirstIndexed": "2011-01-03",
"People": [
{
"Email": "Unknown",
"Name": "Joshua Cirulnick"
},
{
"Email": "Unknown",
"Position": "Other",
"Name": " Elaine Lin"
},
{
"Email": "Unknown",
"Position": "Other",
"Name": " Terry Burkle"
},
{
"Email": "Unknown",
"Position": "Director",
"Name": " Ashley Gilbert"
},
{
"Email": "Unknown",
"Position": "President",
"Name": " Carol Weisman"
},
{
"Email": "Unknown",
"Position": "Csuite",
"Name": " Shannon Ammons"
},
{
"Email": "Unknown",
"Position": "Founder",
"Name": " Kelly McDonald"
}
],
"City": "Birmingham",
"Telephone#s": [
{
"Telephone#": "+1-205-879-4712"
},
{
"Telephone#": "+1-205-871-7740"
}
],
"FirstDetected": "N/A",
"LinkedIn": "",
"VK": "",
"State": "AL",
"Instagram": "",
"Country": "US",
"Domain": "alabamanonprofits.org",
"LastFound": "N/A"
}
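To make the shape of this record concrete, here is a minimal sketch of how such an object could be flattened into nested Maps and Lists, which is the general form a document database expects. The class name DomainRecord, its constructor, and the toDocument method are hypothetical stand-ins for the "Domain" class described above; only three of the fields from the sample are included, and the actual write to the database is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the "Domain" class; field names follow the sample JSON.
final class DomainRecord {
    private final String domain;
    private final String company;
    private final List<String> emails;

    DomainRecord(String domain, String company, List<String> emails) {
        this.domain = domain;
        this.company = company;
        this.emails = new ArrayList<>(emails);
    }

    // Builds a nested Map/List structure mirroring the JSON output:
    // scalar fields become entries, and each e-mail becomes a small map in a list.
    Map<String, Object> toDocument() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("Domain", domain);
        doc.put("Company", company);

        List<Map<String, Object>> emailDocs = new ArrayList<>();
        for (String email : emails) {
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("Email", email);
            emailDocs.add(entry);
        }
        doc.put("Emails", emailDocs);
        return doc;
    }

    public static void main(String[] args) {
        DomainRecord record = new DomainRecord(
                "alabamanonprofits.org",
                "Alabama Association of Nonprofits",
                List.of("shannon@alabamanonprofits.org", "support@alabamanonprofits.org"));
        System.out.println(record.toDocument());
    }
}
```

The other list fields ("People", "Telephone#s") would presumably follow the same pattern as "Emails".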
If anyone could point me in the right direction on how to get objects such as these into the Google Cloud Firestore database effectively, I'd be very happy!
1 Solution
You can write the data to Cloud Pub/Sub, which can trigger a Cloud Function that writes the data to Cloud Firestore. There was a great example of this at Google I/O 2017, which does the same thing but with the Realtime Database.
You can watch it here: Data Pipelines with Firebase and Google Cloud (Google I/O '17)
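As a rough illustration of the hand-off between the pipeline and the function, the sketch below shows only the payload handling: the JSON string becomes the message bytes, and the receiving side decodes them again. It assumes the function sees the message data base64-encoded, as Pub/Sub does in its JSON event envelope. The class PubSubPayload and its method names are hypothetical, and both the publish call and the Firestore write are omitted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; not part of any Google client library.
final class PubSubPayload {

    // Pipeline side: the JSON text becomes the raw message data.
    static byte[] toMessageBytes(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    // How the raw data appears inside a JSON event envelope (base64 text).
    static String encodeForEvent(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Function side: recover the original JSON text from the envelope field.
    static String decodeFromEvent(String base64Data) {
        byte[] raw = Base64.getDecoder().decode(base64Data);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String json = "{\"Domain\":\"alabamanonprofits.org\",\"State\":\"AL\"}";
        String envelopeField = encodeForEvent(toMessageBytes(json));
        // The decoded text should match the JSON that was sent.
        System.out.println(decodeFromEvent(envelopeField).equals(json));
    }
}
```

The decoded JSON could then be parsed and written as a document by whatever client the function uses.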