I have used Spark SQL to retrieve data from a Cassandra database:
DataFrame customers = sqlContext.cassandraSql("SELECT email, first_name, last_name FROM customer " +
"WHERE CAST(store_id as string) = '" + storeId + "'");
After that I filtered the data, and now I want to save it into another Cassandra table that looks like this:
store_id uuid,
report_name text,
report_time timestamp,
sharder int,
customer_email text,
count int static,
firts_name text,
last_name text,
PRIMARY KEY ((store_id, report_name, report_time, sharder), customer_email)
How can I add these additional properties when I save the DataFrame into the new table? Also, what is the best practice for sharding the long Cassandra row in this example? I expect to have 4k-6k records in the DataFrame, so sharding the long row is a must, but I am not sure whether counting the records and then changing the sharder after a certain number of items is the best practice in Spark or Cassandra.
2 Answers
#1
3
After you have the DataFrame, you can define a case class that has the structure of the new schema, including the added properties.
You can create the case class like this: case class DataFrameRecord(property1: String, property2: Long, property3: String, property4: Double)
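Then you can use map to convert each row into the new structure using the case class: df.rdd.map(p => DataFrameRecord(prop1, prop2, prop3, prop4)).toDF(). Here prop1 to prop4 are placeholders for values read from the row p or supplied as constants, and toDF() requires import sqlContext.implicits._ to be in scope.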
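To make the mapping step concrete for the table in the question, here is a minimal plain-Java sketch of what the body of such a map could compute. It uses no Spark or Cassandra classes, so it only illustrates the per-record logic. ReportRow, ReportRows, enrich, sharderFor and the shardCount parameter are hypothetical names, not part of any library. Deriving the sharder from a hash of the email is an assumption offered as one alternative to counting records, not an established recommendation from the answer. The static count column is left out.

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.UUID;

// Hypothetical row type mirroring the target table's columns
// (the Java counterpart of the case class suggested in the answer).
final class ReportRow {
    final UUID storeId;
    final String reportName;
    final Date reportTime;
    final int sharder;
    final String customerEmail;
    final String firstName;
    final String lastName;

    ReportRow(UUID storeId, String reportName, Date reportTime, int sharder,
              String customerEmail, String firstName, String lastName) {
        this.storeId = storeId;
        this.reportName = reportName;
        this.reportTime = reportTime;
        this.sharder = sharder;
        this.customerEmail = customerEmail;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

final class ReportRows {
    // Assumption: pick the shard from a hash of the clustering key, so the
    // same email always lands in the same shard and no counting is needed.
    static int sharderFor(String customerEmail, int shardCount) {
        // floorMod keeps the result in [0, shardCount) even for negative hashes.
        return Math.floorMod(customerEmail.hashCode(), shardCount);
    }

    // Each input element is {email, first_name, last_name}, in the order
    // selected by the query in the question. The report-level values are
    // the "additional properties" and are the same for every row.
    static List<ReportRow> enrich(List<String[]> customers, UUID storeId,
                                  String reportName, Date reportTime,
                                  int shardCount) {
        List<ReportRow> rows = new ArrayList<>(customers.size());
        for (String[] c : customers) {
            rows.add(new ReportRow(storeId, reportName, reportTime,
                    sharderFor(c[0], shardCount), c[0], c[1], c[2]));
        }
        return rows;
    }
}
```

With a hash-based sharder the shards are only roughly equal in size, but each record's shard can be recomputed from its email alone, which a running counter does not allow.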
#2
0
You will need to do some sort of transformation (like map()) to add the properties to the data frame.