We have a file that we want to split into three parts, and we need to perform some data cleanup on it before it can be imported into HANA Vora; otherwise everything has to be typed as String, which is not ideal.
We can import and prepare the DataFrames in Spark just fine, but when I try to write them either to HDFS or, preferably, save them as a table in the "com.sap.spark.vora" data source, I get errors.
Can anyone advise on a reliable way to import Spark-prepared datasets into HANA Vora? Thanks!
Answer
Vora currently only officially supports appending data to an existing table (using the APPEND statement). For details, see the SAP HANA Vora Developer Guide, chapter "3.5 Appending Data to Existing Tables".
This means you would have to create an intermediate file. Vora supports reading from CSV, ORC and Parquet files. A DataFrame can be saved as ORC or Parquet files directly from Spark (see https://spark.apache.org/docs/1.6.1/sql-programming-guide.htm). To write CSV files from Spark, see https://github.com/databricks/spark-csv