Deleting rows from an Excel spreadsheet with Python

Time: 2020-12-18 04:25:20

I have a very large Excel file and I need to delete about 20,000 rows that meet a simple condition. Excel won't let me delete such a complex range when using a filter. The condition is:

If the first column contains the value X, then I need to be able to delete the entire row.

I'm trying to automate this using Python and xlwt, but I'm not quite sure where to start. I'm looking for some code snippets to get me started. Grateful for any help that's out there!

6 answers

#1


9  

Don't delete. Just copy what you need.

  1. read the original file
  2. open a new file
  3. iterate over the rows of the original file (if the first column of the row does not contain the value X, add this row to the new file)
  4. close both files
  5. rename the new file to the name of the original file
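
The steps above could be sketched roughly as follows. This is only an illustration under some assumptions that are not in the answer: the workbook is an .xlsx file, the openpyxl package is installed, and only the active sheet matters. The names rows_to_keep and copy_without_marked_rows are made up for this example.

```python
import os


def rows_to_keep(rows, marker="X"):
    """Yield only the rows whose first cell is not equal to `marker`."""
    for row in rows:
        if not row or row[0] != marker:
            yield row


def copy_without_marked_rows(src_path, marker="X"):
    """Copy the kept rows into a new workbook, then swap it in for the original."""
    # Imported here so rows_to_keep() stays usable without openpyxl installed.
    from openpyxl import Workbook, load_workbook

    tmp_path = src_path + ".tmp.xlsx"  # hypothetical temporary file name

    # Read the original in streaming mode and open a new write-only workbook.
    src = load_workbook(src_path, read_only=True)
    dst = Workbook(write_only=True)
    out_sheet = dst.create_sheet(title=src.active.title)

    # Copy every row that does not start with the marker.
    for row in rows_to_keep(src.active.iter_rows(values_only=True), marker):
        out_sheet.append(row)

    # Close both files, then rename the new file over the original.
    dst.save(tmp_path)
    src.close()
    os.replace(tmp_path, src_path)
```

Note that this copies cell values only, so formatting, formulas and any other sheets would be lost. For an .xls file a different reader and writer would be needed.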

#2


2  

You can try using the csv reader:

http://docs.python.org/library/csv.html
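
A minimal sketch of that idea, assuming the sheet has first been saved from Excel as a CSV file (the csv module cannot read .xls files directly). The function name filter_csv and the temporary file suffix are made up for this example.

```python
import csv
import os


def filter_csv(src_path, marker="X"):
    """Rewrite the CSV file without the rows whose first field equals `marker`.

    Returns the number of rows that were dropped.
    """
    tmp_path = src_path + ".tmp"  # hypothetical temporary file name
    dropped = 0
    with open(src_path, newline="") as src, open(tmp_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if row and row[0] == marker:
                dropped += 1
                continue
            writer.writerow(row)
    # Both files are closed here, so the new file can replace the original.
    os.replace(tmp_path, src_path)
    return dropped
```

The filtered CSV can then be opened in Excel again and saved back as a workbook if needed.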

#3


1  

I like using COM objects for this kind of fun:

import win32com.client
from win32com.client import constants

f = r"h:\Python\Examples\test.xls"
DELETE_THIS = "X"

# Start Excel through COM and open the workbook.
exc = win32com.client.gencache.EnsureDispatch("Excel.Application")
exc.Visible = 1
exc.Workbooks.Open(Filename=f)

row = 1
while True:
    # Column B holds the data, column A holds the delete marker.
    data = exc.Range("B%d" % row).FormulaR1C1
    condition = exc.Range("A%d" % row).FormulaR1C1

    if data == '':
        # The first empty cell in column B marks the end of the data.
        break
    elif condition == DELETE_THIS:
        # Deleting shifts the rows below up, so stay on the same row number.
        exc.Rows("%d:%d" % (row, row)).Delete(Shift=constants.xlUp)
    else:
        row += 1

# Before
# 
#      a
#      b
# X    c
#      d
#      e
# X    d
#      g
#        

# After
#
#      a
#      b
#      d
#      e
#      g

I usually record snippets of Excel macros and glue them together with Python as I dislike Visual Basic :-D.

#4


0  

If you only need to clear the data (rather than removing the row itself, which shifts the rows below it up), you can try using my module, PyWorkbooks. You can get the most recent version here:

https://sourceforge.net/projects/pyworkbooks/

There is a pdf tutorial to guide you through how to use it. Happy coding!

#5


0  

With an open Excel spreadsheet (sh is presumably the worksheet object), you can use:

sh.Range(sh.Cells(1,1),sh.Cells(20000,1)).EntireRow.Delete()

This deletes rows 1 to 20,000. So, to delete a row only when its first cell contains X:

if sh.Cells(1,1).Value == 'X':
    sh.Cells(1,1).EntireRow.Delete()
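
To apply that check to every row, one option is to loop from the bottom of the sheet upwards, so that a deletion never shifts a row that has not been checked yet. The sketch below assumes sh is a worksheet object exposing Cells(row, col) as in the lines above. The function name delete_marked_rows and the last_row parameter are made up for this example.

```python
def delete_marked_rows(sh, last_row, marker="X"):
    """Delete every row from 1 to last_row whose first cell equals `marker`.

    Returns the number of rows that were deleted.
    """
    deleted = 0
    # Walk upwards: deleting a row only shifts the rows below it,
    # which have already been checked.
    for row in range(last_row, 0, -1):
        if sh.Cells(row, 1).Value == marker:
            sh.Cells(row, 1).EntireRow.Delete()
            deleted += 1
    return deleted
```

For the case in the question this might be called as delete_marked_rows(sh, 20000). With that many rows, one COM call per cell is likely to be slow.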

#6


-2  

I achieved this using the pandas package:

import pandas as pd

# Read from Excel
xl = pd.ExcelFile("test.xls")

# Parse the first sheet into a DataFrame
dfs = xl.parse(xl.sheet_names[0])

# Update the DataFrame as required
# (here: remove the rows that have a blank value in the "Name" column;
# pandas reads empty cells as NaN, so check for NaN as well as '')
dfs = dfs[dfs['Name'].notna() & (dfs['Name'] != '')]

# Write the updated DataFrame back to the Excel file
# (writing .xls relies on the xlwt engine, which recent pandas versions
# may no longer support; saving as .xlsx may be needed instead)
dfs.to_excel("test.xls", sheet_name='Sheet1', index=False)
