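I recently had to process a large batch of .csv files. Most rows in the collected data have 9 columns, but some rows have more than 9, so I wanted to write a program that batch-processes these files into a regular shape that is easy to import into a database.
Method 1:
My first idea was to use opencsv for the processing, but I ran into a problem along the way. One note first: the files I process are .csv files named after the pinyin spelling of Chinese provinces, for example anhui.csv.
Here is the code: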
package anhui;

import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

import com.opencsv.CSVReader;
import com.opencsv.CSVWriter;

public class Anhui {
    public static void main(String[] args) throws IOException {
        FileReader fileReader = new FileReader(new File("F:\\Porject\\data\\WeiboDataShare-master\\anhui.csv"));
        FileWriter fileWriter = new FileWriter(new File("F:\\Porject\\data\\Sina_dataAfterDeal\\anhui_new.csv"));
        CSVReader reader = new CSVReader(fileReader);
        CSVWriter writer = new CSVWriter(fileWriter);

        int count = 0;       // rows with exactly 9 columns (kept)
        int wrong_count = 0; // rows with any other column count (dropped)
        int k = 0;           // rows read so far

        String[] strs;
        // readNext() returns null once the end of the file is reached
        while ((strs = reader.readNext()) != null) {
            k++;
            if (strs.length == 9) {
                writer.writeNext(strs);
                count++;
            } else {
                wrong_count++;
                System.out.println("Wrong line: " + k);
                System.out.println("Wrong count: " + wrong_count);
            }
            System.out.println("Reached line " + k);
        }

        reader.close();
        writer.close();
        System.out.println("Number of right lines: " + count);
        System.out.println("Number of wrong lines: " + (k - count));
        System.out.println("END!!");
    }
}
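The main job of this code is to drop every row whose column count is not 9 and write the remaining rows to a new .csv file.
When I ran it, a small number of files triggered a problem: on reaching a certain line of the file, the program stalled. It neither finished nor threw an exception. I still have not figured this out. I looked carefully at the line where the program stalled and found no structural difference from the other lines. Most of the other files were processed without trouble. If anyone has solved this problem, I would be very grateful for some guidance!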
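One possible cause, not confirmed against this data: by default opencsv's CSVReader treats the double quote as a quote character and allows a quoted field to span several lines, so a stray, unmatched quote can make it keep reading the following lines as part of one field. The sketch below is a standard-library check, not part of opencsv, and the names QuoteCheck, hasUnbalancedQuotes and suspiciousLines are made up for illustration. It flags lines with an odd number of double quotes so they can be inspected by hand.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class QuoteCheck {
    // True when the line holds an odd number of double-quote characters,
    // which a quote-aware CSV parser may read as a field continuing on the next line.
    public static boolean hasUnbalancedQuotes(String line) {
        int quotes = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == '"') {
                quotes++;
            }
        }
        return quotes % 2 != 0;
    }

    // Collects the 1-based numbers of the lines worth checking manually.
    public static List<Integer> suspiciousLines(List<String> lines) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (hasUnbalancedQuotes(lines.get(i))) {
                result.add(i + 1);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; the real files would be read line by line instead.
        List<String> sample = Arrays.asList(
                "1,anhui,\"hello, world\",ok",
                "2,anhui,he said \"hi,ok",
                "3,anhui,plain,ok");
        System.out.println(suspiciousLines(sample)); // prints [2]
    }
}
```

If the line where the program stalls shows up in this list, the quote handling is worth a closer look. If it does not, the cause lies elsewhere.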
Because the program above could not get through all of my data, I switched to a second method:
package anhui;

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class Readmethod2 {
    public static void main(String[] args) throws IOException {
        String encoding = "utf-8";
        // One input file per province, named <province>.csv
        String[] provinces = {"anhui","aomen","beijing","chongqing","fujian","ganshu","guangdong","guangxi","guizhou"
                ,"hainan","hebei","heilongjiang","henan","huan","hubei","jiangsu","jiangxi",
                "jilin","liaoning","neimenggu","ningxia","qinghai","shan1xi","shan3xi","shandong","shanghai",
                "sicuan","*","tianjin","xianggang","*","xizang","yunnan","zhejiang"};

        for (int i = 0; i < provinces.length; i++) {
            String name_str = provinces[i] + ".csv";
            String newName_str = provinces[i] + "_new.csv";
            String readFilePath = "F:\\Porject\\data\\WeiboDataShare-master\\" + name_str;
            String writeFilePath = "F:\\Porject\\data\\deal_data\\" + newName_str;
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(new File(readFilePath)), encoding));
            BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(new File(writeFilePath)), encoding));

            String string = null;
            int count = 0; // lines kept (exactly 9 comma-separated parts)
            int sum = 0;   // lines read

            while ((string = reader.readLine()) != null) {
                sum++;
                // Plain split on commas: quoted fields are not treated specially
                String[] strs = string.split(",");
                if (strs.length == 9) {
                    writer.write(string);
                    writer.newLine();
                    count++;
                }
            }

            // close() flushes the buffered output, so no per-line flush is needed
            reader.close();
            writer.close();
            System.out.println(name_str + " finished: kept " + count + " of " + sum + " lines");
        }
    }
}
This program was able to batch-process all of my data. I hope it is useful to others.
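One detail of the second method is worth knowing: String.split(",") discards trailing empty strings, so a row whose last columns are empty is counted as having fewer than 9 columns and is dropped. Passing a negative limit keeps those empty fields. The sketch below illustrates the difference. The class name ColumnCount and the helper countColumns are made up for this example, and like the method above it still does not handle commas inside quoted fields.

```java
public class ColumnCount {
    // Counts comma-separated columns, keeping trailing empty fields.
    public static int countColumns(String line) {
        return line.split(",", -1).length;
    }

    public static void main(String[] args) {
        // Hypothetical row with 9 columns, the last two of them empty
        String row = "1,2,3,4,5,6,7,,";
        System.out.println(row.split(",").length); // prints 7
        System.out.println(countColumns(row));     // prints 9
    }
}
```

Whether rows with empty trailing columns should be kept depends on the data. If they should, comparing countColumns(string) with 9 in the loop above would retain them.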