加速VBA代码以查找具有相似数字出现次数的行

时间:2022-05-05 19:14:47

I have and excel filled with 0 and 1 like this one: 加速VBA代码以查找具有相似数字出现次数的行

我有和excel填充0和1像这样:

and I want to find which rows have three (1) in common with the others rows and delete them.

我想找到哪些行与其他行共有三(1)个并删除它们。

for example, to check with rows have three (1) in common with the first row I put this function in the column G :

例如,要检查行与第一行共有三(1)个,我将此函数放在G列中:

G2:=SUM.IF(A2:F2;"=1";A2:F2)

G3:=SUM.IF(A2:F2;"=1";A3:F3)

G4:=SUM.IF(A2:F2;"=1";A4:F4)

G5:=SUM.IF(A2:F2;"=1";A5:F5)

and obviously I want to do this to a lot of Rows (5000++) and columns (51) and this is my code:

显然我想对很多行(5000 ++)和列(51)做这个,这是我的代码:

Sub Macro_NUEVA()

Dim maxRows, maxColumns, rowCount, row As Integer
maxRows= 10
maxColumns= 51
maxRows = InputBox("Number of rows?:", "Number of rows")

sngStartTime = Timer 'Just a timer
Application.ScreenUpdating = False 'Do not update screen to save some time

For rowCount = 2 To maxRows 'Iterate all Rows  

    For row = rowCount To maxRows 'loop to compare every single row with the actual row
      ActiveSheet.Cells(row, maxRows + 1).Select
      ActiveCell.Formula = "=SUMIF(" & Range("B" & rowCount & ":AY" & rowCount ).Address(False,     False) & ",""=1""," & Range("B" & row & ":AY" & row).Address(False, False) & ")"
        If Selection = 3 Then 'If three ones in common -> delete row
        Selection.EntireRow.Delete
        maxRows = maxRows - 1
        row= row- 1
      End If
  Next row 
Next rowCount

Application.ScreenUpdating = True
sngTotalTime = Timer - sngStartTime
MsgBox "Tiempo Empleado:  " & Round(sngTotalTime, 2) & " Segundos"

End Sub

This code is working fine, but it takes a lot of time... (7000 Rows -> 25 hours)

这段代码工作正常,但需要花费很多时间......(7000行 - > 25小时)

I'm a beginner with VBA, and I don't know if this code is efficient, but I didn't find any other way to solve the problem, also I'm thinking of doing this program in C (just parse a CSV).

我是VBA的初学者,我不知道这段代码是否有效,但我没有找到任何其他方法来解决问题,我也想在C中做这个程序(只需解析一个CSV) )。

1 个解决方案

#1


3  

See if this speeds things up for you. Tested on A2:AY5000 filled with =RANDBETWEEN(0,1) then copied & paste special values. Row 1 was a header row with column labels. You will need to rename your worksheet Matriz or modify the code line that names the worksheet.

看看这是否能为您加速。测试A2:AY5000填充= RANDBETWEEN(0,1)然后复制并粘贴特殊值。第1行是带有列标签的标题行。您需要重命名工作表Matriz或修改命名工作表的代码行。

     加速VBA代码以查找具有相似数字出现次数的行

Option Explicit

Sub Macro_NUEVA()
    Dim maxRws As Long, maxCols As Long, rwCount As Long, ws As Worksheet
    Dim f As Long, fc As Long, c As Long, cl As Long, rw As Long, n As Long
    Dim sngTime As Double, app As Application
    maxRws = 5000
    maxRws = InputBox("Número de filas?:", "Número de filas", maxRws)

    Set app = Application
    app.ScreenUpdating = False
    app.EnableEvents = False
    app.Calculation = xlCalculationManual
    sngTime = Timer 'Just a timer

    Set ws = Sheets("Matriz")
    With ws.Cells(1, 1).CurrentRegion
        If Not ws.AutoFilterMode Then .AutoFilter
        On Error Resume Next: ws.ShowAllData: On Error GoTo 0
        maxCols = .Columns.Count
        For rw = 2 To maxRws
            For cl = 1 To (.Columns.Count - 2)
                If app.CountIf(.Cells(rw, cl).Resize(1, (maxCols - cl) + 1), 1) > 2 Then
                    f = 0
                    For fc = cl To maxCols
                        If .Cells(rw, fc).Value = 1 Then
                            .AutoFilter Field:=fc, Criteria1:=1
                            f = f + 1
                            If f = 3 Then Exit For
                        End If
                    Next fc
                    If f = 3 And app.Subtotal(102, .Columns(1)) > 1 Then
                        Debug.Print "deleting " & app.Subtotal(102, .Columns(1)) - 1 & " row(s)"
                        '.Offset(2, 0).EntireRow.Delete Shift:=xlUp
                        'next line is a modification of the offset to delete
                        .Offset(.Offset(1, 0).SpecialCells(xlCellTypeVisible).Cells(1).Row, 0).EntireRow.Delete Shift:=xlUp
                    End If
                    ws.ShowAllData
                End If
            Next cl
            If Not CBool(app.Count(Rows(rw + 1))) Then Exit For
        Next rw
        If ws.AutoFilterMode Then .AutoFilter
    End With
    Set ws = Nothing

    sngTime = Timer - sngTime
    MsgBox "Tiempo Empleado:  " & Round(sngTime, 2) & " Segundos"

    app.Calculation = xlCalculationAutomatic
    app.EnableEvents = False
    app.ScreenUpdating = True
    Set app = Application
End Sub

     加速VBA代码以查找具有相似数字出现次数的行

Your own elapsed time will depend greatly on the ratio of ones and zeroes. Mine was ~50%/50% if RANDBETWEEN was operating properly. More zeroes means more rows and columns have to be examined. You can check the VBE's Immediate Window for the count of rows that were deleted.

您自己的经过时间将在很大程度上取决于1和0的比率。如果RANDBETWEEN正常运行,我的约为50%/ 50%。更多的零意味着必须检查更多的行和列。您可以检查VBE的立即窗口以获取已删除的行数。

There are good reasons for not resetting your increment variable within a VBA For ... Next; a) getting caught in an infinite loop and b) not resetting the end of the loop means useless iterations are two of them. There are other reasons; in general it is not good programming methodology. In the above method, I don't have to worry about proceeding from top to bottom because I am leaving the row being examined alone and deleting every other match; not the other way around. I also have an exit when the row being examined no longer has any values.

有充分的理由不在VBA中重置增量变量For ... Next; a)陷入无限循环并且b)不重置循环的结束意味着无用的迭代是其中两个。还有其他原因;一般来说,它不是很好的编程方法。在上面的方法中,我不必担心从上到下进行,因为我正在单独检查行并删除其他所有匹配项;而不是相反。当被检查的行不再具有任何值时,我也有一个退出。

I am a little curious about the purpose of this beyond an intellectual exercise. With 51 columns of >5000 rows with only a choice of 0 or 1 then there seems little chance that there would be much remaining after removing matching triplet sets of ones. Perhaps you could expand on that subject a bit in a comment or even your original posting.

除了智力锻炼之外,我对这个目的有点好奇。有51列> 5000行,只有0或1的选择,那么在删除匹配的三元组之后,似乎很少有剩余的可能性。也许你可以在评论或甚至原始帖子中稍微扩展一下这个主题。

#1


3  

See if this speeds things up for you. Tested on A2:AY5000 filled with =RANDBETWEEN(0,1) then copied & paste special values. Row 1 was a header row with column labels. You will need to rename your worksheet Matriz or modify the code line that names the worksheet.

看看这是否能为您加速。测试A2:AY5000填充= RANDBETWEEN(0,1)然后复制并粘贴特殊值。第1行是带有列标签的标题行。您需要重命名工作表Matriz或修改命名工作表的代码行。

     加速VBA代码以查找具有相似数字出现次数的行

Option Explicit

Sub Macro_NUEVA()
    Dim maxRws As Long, maxCols As Long, rwCount As Long, ws As Worksheet
    Dim f As Long, fc As Long, c As Long, cl As Long, rw As Long, n As Long
    Dim sngTime As Double, app As Application
    maxRws = 5000
    maxRws = InputBox("Número de filas?:", "Número de filas", maxRws)

    Set app = Application
    app.ScreenUpdating = False
    app.EnableEvents = False
    app.Calculation = xlCalculationManual
    sngTime = Timer 'Just a timer

    Set ws = Sheets("Matriz")
    With ws.Cells(1, 1).CurrentRegion
        If Not ws.AutoFilterMode Then .AutoFilter
        On Error Resume Next: ws.ShowAllData: On Error GoTo 0
        maxCols = .Columns.Count
        For rw = 2 To maxRws
            For cl = 1 To (.Columns.Count - 2)
                If app.CountIf(.Cells(rw, cl).Resize(1, (maxCols - cl) + 1), 1) > 2 Then
                    f = 0
                    For fc = cl To maxCols
                        If .Cells(rw, fc).Value = 1 Then
                            .AutoFilter Field:=fc, Criteria1:=1
                            f = f + 1
                            If f = 3 Then Exit For
                        End If
                    Next fc
                    If f = 3 And app.Subtotal(102, .Columns(1)) > 1 Then
                        Debug.Print "deleting " & app.Subtotal(102, .Columns(1)) - 1 & " row(s)"
                        '.Offset(2, 0).EntireRow.Delete Shift:=xlUp
                        'next line is a modification of the offset to delete
                        .Offset(.Offset(1, 0).SpecialCells(xlCellTypeVisible).Cells(1).Row, 0).EntireRow.Delete Shift:=xlUp
                    End If
                    ws.ShowAllData
                End If
            Next cl
            If Not CBool(app.Count(Rows(rw + 1))) Then Exit For
        Next rw
        If ws.AutoFilterMode Then .AutoFilter
    End With
    Set ws = Nothing

    sngTime = Timer - sngTime
    MsgBox "Tiempo Empleado:  " & Round(sngTime, 2) & " Segundos"

    app.Calculation = xlCalculationAutomatic
    app.EnableEvents = False
    app.ScreenUpdating = True
    Set app = Application
End Sub

     加速VBA代码以查找具有相似数字出现次数的行

Your own elapsed time will depend greatly on the ratio of ones and zeroes. Mine was ~50%/50% if RANDBETWEEN was operating properly. More zeroes means more rows and columns have to be examined. You can check the VBE's Immediate Window for the count of rows that were deleted.

您自己的经过时间将在很大程度上取决于1和0的比率。如果RANDBETWEEN正常运行,我的约为50%/ 50%。更多的零意味着必须检查更多的行和列。您可以检查VBE的立即窗口以获取已删除的行数。

There are good reasons for not resetting your increment variable within a VBA For ... Next; a) getting caught in an infinite loop and b) not resetting the end of the loop means useless iterations are two of them. There are other reasons; in general it is not good programming methodology. In the above method, I don't have to worry about proceeding from top to bottom because I am leaving the row being examined alone and deleting every other match; not the other way around. I also have an exit when the row being examined no longer has any values.

有充分的理由不在VBA中重置增量变量For ... Next; a)陷入无限循环并且b)不重置循环的结束意味着无用的迭代是其中两个。还有其他原因;一般来说,它不是很好的编程方法。在上面的方法中,我不必担心从上到下进行,因为我正在单独检查行并删除其他所有匹配项;而不是相反。当被检查的行不再具有任何值时,我也有一个退出。

I am a little curious about the purpose of this beyond an intellectual exercise. With 51 columns of >5000 rows with only a choice of 0 or 1 then there seems little chance that there would be much remaining after removing matching triplet sets of ones. Perhaps you could expand on that subject a bit in a comment or even your original posting.

除了智力锻炼之外,我对这个目的有点好奇。有51列> 5000行,只有0或1的选择,那么在删除匹配的三元组之后,似乎很少有剩余的可能性。也许你可以在评论或甚至原始帖子中稍微扩展一下这个主题。