Output from Google Dataflow to Google Cloud Firestore

Time: 2022-04-15 15:26:32

Good day. After a lot of struggling to get to grips with this technology (Dataflow), I've managed to get the pipeline working 100%.

What it does is load a bunch of CSV files into the pipeline (from Google Cloud Storage), transform them into "Domain" objects, and then save them in JSON format to a file.

What I'd like to do instead, though, is take the JSON object and push it directly to a database (Google Cloud Firestore).

The final transform that I apply to my data at this stage is:

.apply(DatastoreIO.v1().write().withProjectId("____"));

As I understand it, that call requires the previous transform to return an Entity object, which I am unable to create:

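import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Key;

// Builds an entity with the google-cloud-datastore client library.
public Entity toEntity() {
    Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
    Key taskKey = datastore.newKeyFactory().setKind("Task").newKey("Test");
    // Entity here is com.google.cloud.datastore.Entity.
    return Entity.newBuilder(taskKey)
            .set("Domain", domain)
            .set("LocationOnsite", locOnSite)
            .set("Company", company)
            .build();
}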

This returns a com.google.cloud.datastore.Entity instead of the required com.google.datastore.v1.Entity.

I think it's worth noting that the "Domain" object also contains some ArrayLists of other objects, such as "Emails", which need to be included in the database.
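
For readers following along, here is a minimal sketch of one way such nested lists could be represented before writing. It assumes the target is a Firestore document: the Firestore Java client accepts a Map<String, Object> whose values may themselves be lists and maps, so a plain nested map is a convenient intermediate shape. The class DomainDocument and its three fields are hypothetical stand-ins for the real "Domain" class, with key names taken from the sample JSON; only the JDK is used.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the "Domain" class; only three fields are shown. */
final class DomainDocument {
    private final String domain;
    private final String company;
    private final List<String> emails;

    DomainDocument(String domain, String company, List<String> emails) {
        this.domain = domain;
        this.company = company;
        this.emails = new ArrayList<>(emails);
    }

    /**
     * Builds a nested map that mirrors the JSON output: scalar fields as
     * strings, and "Emails" as a list of single-key maps.
     */
    Map<String, Object> toMap() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("Domain", domain);
        doc.put("Company", company);

        List<Map<String, Object>> emailList = new ArrayList<>();
        for (String email : emails) {
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("Email", email);
            emailList.add(entry);
        }
        doc.put("Emails", emailList);
        return doc;
    }
}
```

The same pattern would extend to "People" and "Telephone#s". The resulting map could then be handed to whichever writer is used downstream; that part is left out here because it depends on the client library chosen.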

Below is a sample of the JSON output that I currently have:

{
   "Vertical": "Business And Industrial",
   "Zip": "35229",
   "Company": "Alabama Association of Nonprofits",
   "QuantCast": "229219",
   "Twitter": "",
   "Vimeo": "",
   "LocationOnSite": "",
   "LastIndexed": "2018-02-01",
   "Pinterest": "",
   "Youtube": "",
   "TechSpend": "$250+",
   "Emails": [
      {
         "Email": "shannon@alabamanonprofits.org"
      },
      {
         "Email": "support@alabamanonprofits.org"
      },
      {
         "Email": "carla@alabamanonprofits.org"
      },
      {
         "Email": "kellie@alabamanonprofits.org"
      },
      {
         "Email": "ashley@alabamanonprofits.org"
      },
      {
         "Email": "Unknown"
      }
   ],
   "Facebook": "",
   "Google+": "",
   "Alexa": "",
   "Github": "",
   "FirstIndexed": "2011-01-03",
   "People": [
      {
         "Email": "Unknown",
         "Name": "Joshua Cirulnick"
      },
      {
         "Email": "Unknown",
         "Position": "Other",
         "Name": " Elaine Lin"
      },
      {
         "Email": "Unknown",
         "Position": "Other",
         "Name": " Terry Burkle"
      },
      {
         "Email": "Unknown",
         "Position": "Director",
         "Name": " Ashley Gilbert"
      },
      {
         "Email": "Unknown",
         "Position": "President",
         "Name": " Carol Weisman"
      },
      {
         "Email": "Unknown",
         "Position": "Csuite",
         "Name": " Shannon Ammons"
      },
      {
         "Email": "Unknown",
         "Position": "Founder",
         "Name": " Kelly McDonald"
      }
   ],
   "City": "Birmingham",
   "Telephone#s": [
      {
         "Telephone#": "+1-205-879-4712"
      },
      {
         "Telephone#": "+1-205-871-7740"
      }
   ],
   "FirstDetected": "N/A",
   "LinkedIn": "",
   "VK": "",
   "State": "AL",
   "Instagram": "",
   "Country": "US",
   "Domain": "alabamanonprofits.org",
   "LastFound": "N/A"
}

If anyone could point me in the right direction on how to get objects such as these into the Google Cloud Firestore database effectively, I'd be very happy!

1 Solution

#1


2  

You can write the data to Cloud Pub/Sub, which can trigger a function to write the data to Cloud Firestore. There was a great example of this at Google I/O 2017, which does the same thing, but with the Realtime Database.

You can watch it here: Data Pipelines with Firebase and Google Cloud (Google I/O '17)
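
One thing that may be worth planning for with this approach: Pub/Sub delivers messages at least once by default, so the triggered function can see the same record twice. A common way to keep the write idempotent is to derive the Firestore document ID from the record itself instead of letting it be auto-generated. Below is a minimal sketch of that idea, assuming the "Domain" field is unique per record (the question does not state this). The class DocumentIds and the method forDomain are hypothetical names; the checks reflect Firestore's documented ID constraints (no forward slash, not "." or "..", not of the form `__.*__`).

```java
import java.util.Locale;

/** Hypothetical helper: derives a stable document ID from a domain name. */
final class DocumentIds {
    private DocumentIds() {}

    /**
     * Normalises the domain (trim, lower-case) and rejects values that
     * Firestore would not accept as a document ID.
     */
    static String forDomain(String domain) {
        String id = domain.trim().toLowerCase(Locale.ROOT);
        if (id.isEmpty()
                || id.contains("/")
                || id.equals(".")
                || id.equals("..")
                || id.matches("__.*__")) {
            throw new IllegalArgumentException("Not usable as a document ID: " + domain);
        }
        return id;
    }
}
```

With an ID like this, a redelivered message overwrites the same document with the same content rather than creating a duplicate.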
