Problem writing a file in pandas

Time: 2022-06-29 00:05:22

I'm currently trying to write an Excel file from a .tr8 file using pandas' to_excel. The Excel file gets written, but when I open it in Excel I cannot see the full data. Here is my code for converting the .tr8 file:

import pandas as pd

output_file = pd.ExcelWriter('20131001103311.xlsx')
widths = [1, 8, 2, 4, 2, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 10, 1]
df = pd.read_fwf('20131001103311.tr8', widths=widths, header=True)
df.columns = ['TIP. REG.', 'COD. EST.', 'TIP. INF.', 'AGNO', 'DEL', 'ENE', 'OBS', 'FEB', 'OBS', 'MAR', 'OBS', 'ABR',
              'OBS', 'MAY', 'OBS', 'JUN', 'OBS', 'JUL', 'OBS', 'AGO', 'OBS', 'SEP', 'OBS', 'OCT', 'OBS', 'NOV', 'OBS',
              'DIC', 'OBS', 'ESP.', 'TIP. DATO']
df.to_excel(output_file, '20131001103311')
output_file.save()
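For readers on a current pandas release, here is a minimal sketch of the same conversion. It is an assumption-laden rewrite, not the asker's code: it assumes the .tr8 file has no header line and that an Excel engine such as openpyxl is installed, and the function names read_tr8 and tr8_to_excel are hypothetical. It uses ExcelWriter as a context manager because ExcelWriter.save() was removed in pandas 2.0, and it passes sheet_name by keyword.

```python
import pandas as pd

# Field widths and column names copied from the question.
WIDTHS = [1, 8, 2, 4, 2, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 5, 1, 10, 1]
COLUMNS = ['TIP. REG.', 'COD. EST.', 'TIP. INF.', 'AGNO', 'DEL', 'ENE', 'OBS', 'FEB', 'OBS', 'MAR', 'OBS', 'ABR',
           'OBS', 'MAY', 'OBS', 'JUN', 'OBS', 'JUL', 'OBS', 'AGO', 'OBS', 'SEP', 'OBS', 'OCT', 'OBS', 'NOV', 'OBS',
           'DIC', 'OBS', 'ESP.', 'TIP. DATO']


def read_tr8(source):
    """Hypothetical helper: read a .tr8 fixed-width file (path or file-like) into a DataFrame."""
    # Assumption: the file has no header line. If its first line holds column
    # titles, pass header=0 (a row number) rather than header=True.
    df = pd.read_fwf(source, widths=WIDTHS, header=None)
    # Assigning labels afterwards, as in the question, keeps the repeated "OBS" labels.
    df.columns = COLUMNS
    return df


def tr8_to_excel(source, xlsx_path, sheet_name="Sheet1"):
    """Hypothetical helper: convert a .tr8 file into a one-sheet .xlsx workbook."""
    df = read_tr8(source)
    # Leaving the with-block saves and closes the workbook.
    with pd.ExcelWriter(xlsx_path) as writer:
        df.to_excel(writer, sheet_name=sheet_name, index=False)
```

For example: tr8_to_excel('20131001103311.tr8', '20131001103311.xlsx', sheet_name='20131001103311'). index=False leaves the DataFrame's row index out of the sheet; drop that argument to keep the layout of the original code.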

1 answer

#1


I simplified your program down to 2 columns of data for testing:

import pandas as pd

output_file = pd.ExcelWriter('20131001103311.xlsx')

widths = [10, 10]
df = pd.read_fwf('20131001103311.tr8', widths=widths, header=True)

df.columns = ['TIP. REG.', 'COD. EST.']

df.to_excel(output_file, '20131001103311')
output_file.save()

I ran it against the following fixed-width format (fwf) file:

$ cat 20131001103311.tr8
TIP. REG. COD. EST.
1         1000
2         300
3         7000
4         600
5         12345

I didn't get any execution errors, and the output looked as it should.

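The first row of data is missing because header=True was passed to read_fwf. pandas apparently treated the bool as the row number 1, so the second line of the file was used as the header (and then overwritten by df.columns). Passing header=0 keeps that row.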
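The lost row can be reproduced without the original file. The sketch below rebuilds this answer's five-row sample in memory with io.StringIO (ROWS and SAMPLE are illustrative names) and uses header=1 to mimic the old header=True behaviour, on the assumption that older pandas read True as the row number 1. Recent pandas versions, as far as I know, reject a bool for header outright, so an integer row number is the portable choice.

```python
import io

import pandas as pd

# The answer's test file, rebuilt in memory: each value is
# left-justified in a 10-character field.
ROWS = [("TIP. REG.", "COD. EST."), ("1", "1000"), ("2", "300"),
        ("3", "7000"), ("4", "600"), ("5", "12345")]
SAMPLE = "".join(f"{a:<10}{b}\n" for a, b in ROWS)

# header=1: line 1 ("1", "1000") becomes the column labels and line 0 is
# skipped, so that data row disappears once df.columns is reassigned.
old_style = pd.read_fwf(io.StringIO(SAMPLE), widths=[10, 10], header=1)
old_style.columns = ["TIP. REG.", "COD. EST."]

# header=0: labels come from the real title line and all five rows are kept.
fixed = pd.read_fwf(io.StringIO(SAMPLE), widths=[10, 10], header=0)

print(len(old_style), len(fixed))  # 4 5
```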

So it doesn't seem to be a pandas issue.

I would check the column widths in your fixed-width file. Print the DataFrame right after reading it to see whether all of the columns you name in df.columns have been parsed correctly.
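A quick way to automate part of that check is sketched below. parse_problems is a hypothetical helper, not part of pandas; call it on the freshly read DataFrame (before assigning df.columns), alongside print(df.head()). It only flags two common symptoms of wrong widths: a column count that differs from the number of names you intend to assign, and columns that came out entirely empty.

```python
def parse_problems(df, expected_names):
    """Hypothetical helper: list common signs of a bad fixed-width parse."""
    problems = []
    # Wrong widths change how many columns read_fwf produces.
    if df.shape[1] != len(expected_names):
        problems.append(f"parsed {df.shape[1]} columns, expected {len(expected_names)}")
    # A field that is blank on every line often means a width is off.
    # Columns are checked by position, so repeated labels such as "OBS" are fine.
    for i in range(df.shape[1]):
        if df.iloc[:, i].isna().all():
            problems.append(f"column {i} ({df.columns[i]!r}) is empty")
    return problems
```

For example, after df = pd.read_fwf('20131001103311.tr8', widths=widths, header=None), print(parse_problems(df, names)) should print an empty list if the widths line up with the list of column names.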

Update: Looking at the images of the input data and the output file that @jchavarro tried to upload, there may be an issue here after all: the Excel output doesn't match the DataFrame data, probably because of the repeated OBS column names.
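If you are stuck on a pandas release without a fix, one workaround, assuming the repeated OBS labels really are the trigger, is to give every column a unique label before calling to_excel. make_unique below is a hypothetical helper that appends .1, .2, ... to repeats, similar to how read_csv renames duplicate header names. It uses str.format rather than f-strings so it also runs on the older Pythons those pandas releases supported.

```python
def make_unique(labels):
    """Hypothetical helper: rename repeats as OBS, OBS.1, OBS.2, ..."""
    seen = {}
    unique = []
    for label in labels:
        count = seen.get(label, 0)
        # The first occurrence keeps its name; later ones get a numeric suffix.
        unique.append(label if count == 0 else "{}.{}".format(label, count))
        seen[label] = count + 1
    return unique
```

Usage: df.columns = make_unique(df.columns) before df.to_excel(...). It does not guard against a generated name such as OBS.1 clashing with a label that already exists.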

Update 2: This is an issue. I've raised it on GitHub and submitted a fix.

Update 3: My fix for the above issue has now been merged into the pandas master branch and should be released as part of pandas 0.13.
