Reading Excel files in a streaming way in Spark 2.0.0

Time: 2022-12-15 20:51:48

I have a set of Excel files that need to be read by Spark (2.0.0) whenever an Excel file is loaded into a local directory. The Scala version used here is 2.11.8.

I've tried using the readStream method of SparkSession, but I'm not able to read the files in a streaming way. I'm able to read Excel files statically as follows:

val df = spark.read.format("com.crealytics.spark.excel").option("sheetName", "Data").option("useHeader", "true").load("Sample.xlsx")

Is there any other way of reading excel files in streaming way from a local directory?

Any answers would be helpful.

Thanks

Changes done:

val spark = SparkSession.builder().master("local[*]").config("spark.sql.warehouse.dir","file:///D:/pooja").appName("Spark SQL Example").getOrCreate()
spark.conf.set("spark.sql.streaming.schemaInference", true)
import spark.implicits._  
val dataFrame = spark.readStream.format("csv").option("inferSchema",true).option("header", true).load("file:///D:/pooja/sample.csv")
dataFrame.writeStream.format("console").start()
dataFrame.show()

Updated code:

val spark = SparkSession.builder().master("local[*]").appName("Spark SQL Example").getOrCreate()
spark.conf.set("spark.sql.streaming.schemaInference", true)
import spark.implicits._  
val df = spark.readStream.format("com.crealytics.spark.excel").option("header", true).load("file:///filepath/*.xlsx")
df.writeStream.format("memory").queryName("tab").start().awaitTermination()
val res = spark.sql("select * from tab")
res.show()

Error:

Exception in thread "main" java.lang.UnsupportedOperationException: Data source com.crealytics.spark.excel does not support streamed reading

Can anyone help me resolve this issue?

1 solution

#1



For a streaming DataFrame you have to provide a schema, and currently DataStreamReader does not support option("inferSchema", true|false). Alternatively, you can set the SQLConf setting "spark.sql.streaming.schemaInference" to true; this needs to be set at the session level.

You can refer here
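Below is a minimal sketch of the explicit-schema approach, applied to the CSV attempt from the question. It does not make the Excel source streamable: the error message above says that com.crealytics.spark.excel does not support streamed reading. The input directory `file:///D:/pooja/input/` and the columns `name` and `value` are hypothetical placeholders. Replace them with your own directory and column layout. The sketch calls `processAllAvailable()` instead of `awaitTermination()`. `awaitTermination()` blocks until the query stops, so the statements after it never run.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

object CsvStreamExample {
  // Hypothetical schema: replace these column names/types with those of your files.
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("value", IntegerType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("CSV streaming example")
      .getOrCreate()

    // File sources watch a directory: new files dropped into it are picked up
    // in later micro-batches. The path below is a hypothetical placeholder.
    val df = spark.readStream
      .schema(schema)            // explicit schema, so no inference is needed
      .option("header", "true")
      .csv("file:///D:/pooja/input/")

    // Write the stream into an in-memory table named "tab".
    val query = df.writeStream
      .format("memory")
      .queryName("tab")
      .outputMode("append")
      .start()

    // Process the files currently available, then query the in-memory table.
    query.processAllAvailable()
    spark.sql("select * from tab").show()

    query.stop()
    spark.stop()
  }
}
```

If you prefer not to declare the schema, the other option mentioned in the answer is `spark.conf.set("spark.sql.streaming.schemaInference", true)` before calling `readStream`. The question's code already does this.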
