I have multiple files which I process using Numpy and SciPy, but I am required to deliver an Excel file. How can I efficiently copy/paste a huge numpy array to Excel?
I have tried converting to a Pandas DataFrame object, which has the very useful function to_clipboard(excel=True), but I spend most of my time converting the array into a DataFrame.
I cannot simply write the array to a CSV file and then open it in Excel, because I have to add the array to an existing file, which is very hard to achieve with xlrd/xlwt and other Excel tools.
4 Solutions
#1
12
My best solution here would be to turn the array into a string, then use win32clipboard to send it to the clipboard. This is not a cross-platform solution, but then again, Excel is not available on every platform anyway.
Excel uses tabs (\t) to mark a column change, and \r\n to indicate a line change.
The relevant code would be:
import win32clipboard as clipboard

def toClipboardForExcel(array):
    r"""
    Copies a 2-D array to the clipboard in a string format accepted by Excel.
    Columns separated by \t, rows separated by \r\n
    """
    # Create string from array
    line_strings = []
    for line in array:
        line_strings.append("\t".join(line.astype(str)).replace("\n", ""))
    array_string = "\r\n".join(line_strings)

    # Put string into clipboard (open, clear, set, close)
    clipboard.OpenClipboard()
    clipboard.EmptyClipboard()
    clipboard.SetClipboardText(array_string)
    clipboard.CloseClipboard()
I have tested this code with random arrays of shape (1000, 10000), and the biggest bottleneck seems to be passing the data to the function. (When I add a print statement at the beginning of the function, I still have to wait a bit before it prints anything.)
EDIT: The previous paragraph described my experience in Python Tools for Visual Studio. In that environment, it seems the print statement is delayed. In a direct command-line interface, the bottleneck is in the loop, as expected.
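Since the row loop is where the time goes, one possible alternative is to let np.savetxt produce the same tab and \r\n layout in an in-memory buffer. This is only a sketch and not part of the tested code above: the function name array_to_excel_text and the default fmt are assumptions, and np.savetxt still iterates over rows internally, so any speed difference should be measured rather than assumed.

```python
import io

import numpy as np


def array_to_excel_text(array, fmt="%.18g"):
    """Format a 2-D numeric array as text with tab-separated columns and CRLF rows."""
    buffer = io.StringIO()
    # np.savetxt writes one row per line, joining the columns with `delimiter`.
    np.savetxt(buffer, array, fmt=fmt, delimiter="\t", newline="\r\n")
    # Drop the trailing line break so that no empty row is pasted.
    return buffer.getvalue().rstrip("\r\n")
```

The returned string can be passed to clipboard.SetClipboardText in place of array_string.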
#2
2
As of today, you can also use xlwings. It's open source and fully compatible with NumPy arrays and Pandas DataFrames.
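A minimal sketch of how this might look for the "add to an existing file" case follows. The function name write_array_to_workbook and its parameters are hypothetical, and running it requires xlwings plus a local Excel installation.

```python
import numpy as np


def write_array_to_workbook(array, workbook_path, sheet_name, top_left="A1"):
    """Write a 2-D array into an existing workbook, starting at cell `top_left`."""
    # Imported here because xlwings needs a local Excel installation to work.
    import xlwings as xw

    workbook = xw.Book(workbook_path)  # opens the existing file in Excel
    sheet = workbook.sheets[sheet_name]
    # Assigning a 2-D array to a single cell fills the block below and to the right of it.
    sheet.range(top_left).value = np.asarray(array)
    workbook.save()
```

A call such as write_array_to_workbook(data, "report.xlsx", "Sheet1", "B2") would place the array with its top-left corner at B2; the file and sheet names here are placeholders.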
#3
1
If I needed to process multiple files loaded into Python and then write the results to Excel, I would probably build some tools using xlwt.
That said, may I offer my recipe, "Pasting python data into a spread sheet", which is open to any edits, complaints or feedback. It uses no third-party libraries and should be cross-platform.
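The recipe itself is not reproduced here. As an illustration only, one standard-library route to the clipboard is tkinter; whether the recipe works this way is an assumption, and both function names below are hypothetical.

```python
def rows_to_clipboard_text(rows):
    """Join cells with tabs and rows with CRLF, the layout Excel pastes as a grid."""
    return "\r\n".join("\t".join(str(cell) for cell in row) for row in rows)


def copy_to_clipboard(text):
    """Place text on the system clipboard using only the standard library."""
    import tkinter  # imported lazily because it needs a graphical session

    root = tkinter.Tk()
    root.withdraw()  # do not show a window
    root.clipboard_clear()
    root.clipboard_append(text)
    root.update()  # let Tk process the clipboard request
    root.destroy()
```

On some platforms the clipboard content may be lost once the Tk root is destroyed, so the window may have to stay alive until the paste has been done.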