Spark scala: File already exists exception when uploading a csv file to azure blob

Time: 2021-06-27 23:12:10

I am reading a SAS file from Azure Blob, converting it to CSV, and trying to upload the CSV back to Azure Blob. For small files (a few MB) this works successfully with the following Spark Scala code.

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SQLContext
    import com.github.saurfang.sas.spark._

    val sqlContext = new SQLContext(sc)
    val df = sqlContext.sasFile("wasbs://container@storageaccount/input.sas7bdat")
    df.write.format("csv").save("wasbs://container@storageaccount/output.csv")

But for large files (several GB) it throws an AnalysisException: wasbs://container@storageaccount/output.csv file already exists. I have also tried overwrite, but no luck. Any help would be appreciated.

1 solution

#1


Actually, you cannot overwrite an existing file on HDFS by default, even for small files of a few MB.

Please try the code below to overwrite the output. Also check your Spark version, because the method differs slightly between Spark versions.

df.write.format("csv").mode("overwrite").save("wasbs://container@storageaccount/output.csv");

I am not sure whether the overwrite code above is what you already tried, as you mentioned.

Alternatively, you can first delete the existing files before doing the overwrite operation.

    // Replace the URI with your file system root, e.g.
    // hdfs://<namenodehost>/ or wasb[s]://<containername>@<accountname>.blob.core.windows.net/<path>
    val hadoopConf = new org.apache.hadoop.conf.Configuration()
    val hdfs = org.apache.hadoop.fs.FileSystem.get(new java.net.URI("<filesystem-uri>"), hadoopConf)
    // Recursively delete the existing output path; ignore the failure if it does not exist yet
    try { hdfs.delete(new org.apache.hadoop.fs.Path(filepath), true) } catch { case _: Throwable => }
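
For completeness, a hypothetical end-to-end wiring of this delete-then-write approach could look like the sketch below; `filepath` is assumed to point at the same wasbs output path used in the write:

    // Hypothetical: remove any previous output, then write the CSV fresh
    val filepath = "wasbs://container@storageaccount/output.csv"
    try { hdfs.delete(new org.apache.hadoop.fs.Path(filepath), true) } catch { case _: Throwable => }
    df.write.format("csv").save(filepath)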

A similar issue was discussed on the Spark user list; see http://apache-spark-user-list.1001560.n3.nabble.com/How-can-I-make-Spark-1-0-saveAsTextFile-to-overwrite-existing-file-td6696.html.
