I am trying to create a tool that will search 300+ .txt files for a string that may occur several times in each file.
I want to be able to go through each file and get the string between each of the occurrences.
It sounds a bit twisted, I know. I have been scratching my head for hours while testing code.
What I have tried
I read through each file and check whether it contains my search text at least once. If it does, I add the full path of that file to a list.
Dim FileNamesList As New List(Of String)
Dim occurList As New List(Of String)
Dim textSearch As String = TextBox1.Text.ToLower
'check each file to see if it even contains textbox1.text
'if it does, then add matching files to list
For Each f As FileInfo In dir.GetFiles("*.txt")
    Dim tmpRead = File.ReadAllText(f.FullName).ToLower
    Dim tIndex As Integer = tmpRead.IndexOf(textSearch)
    If tIndex > -1 Then
        FileNamesList.Add(f.FullName)
    End If
Next
Then I thought, oh, now all I need to do is go through each path in that 'approved' files list and add the entire contents of each file to a new list.
Then I go through each entry in that list and get the string between two delimiters.
And... I just get lost from there...
Here is the get-string-between-delimiters function I have tried using.
Private Function GetStringBetweenTags(ByVal startIdentifer As String, ByVal endIndentifier As String, ByVal textsource As String) As String
    Dim idLength As Int16 = startIdentifer.Length
    Dim s As String = textsource
    Try
        s = s.Substring(s.IndexOf(startIdentifer) + idLength)
        s = s.Substring(0, s.IndexOf(endIndentifier))
        'MsgBox(s)
    Catch
    End Try
    Return s
End Function
In simple terms...
- I have 300 .txt files
- Some may contain a string that I am after
- I want the substring of each string
Normally I am fine and never need to ask questions, but there are too many "for-ceptions" (nested For loops) going on.
Logical Example
== Table.txt ==
print("I am tony")
print("pineapple")
print("brown cows")
log("cable ties")
log("bad ocd")
log("bingo")
== Cherry.txt ==
print("grapes")
print("pie")
print("apples")
log("laugh")
log("tuna")
log("gonuts")
== Tower.txt ==
print("tall")
print("clouds")
print("nomountain")
log("goggles?")
log("kuwait")
log("india")
I want to end up with a list of the text inside only the print(...) calls from all 3 files.
I haven't found any other thread about this, probably because it is a stupid question.
So I should end with
== ResultList ==
I am tony
pineapple
brown cows
grapes
pie
apples
tall
clouds
nomountain
2 Solutions
#1
Regex is probably your best choice for something like this. For instance:
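' Requires: Imports System.IO and Imports System.Text.RegularExpressions
' filePaths is assumed to be a collection of full paths to the .txt files.
Dim results As New List(Of String)()
Dim r As New Regex("print\(""(.*)""\)")
For Each path As String In filePaths
    ' Load the whole file, then collect capture group 1 from every match.
    Dim contents As String = File.ReadAllText(path)
    For Each m As Match In r.Matches(contents)
        If m.Success Then
            results.Add(m.Groups(1).Value)
        End If
    Next
Next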
As you can see, the code loops through a list of file paths. For each one, it loads the entire contents of the file into a string. It then searches that string for all matches of the following regular expression pattern: print\("(.*)"\). It then loops through all of those matches and grabs the value of the first capture group from each one. Those values are added to the results list, which then contains your desired strings. Here's the meaning of the parts of the regex:
- print - Matches the literal text "print".
- \( - The next character after "print" must be an opening parenthesis (the backslash is an escape character, because a bare parenthesis would start a group).
- " - The next character after the opening parenthesis must be a double quote (in the VB source it is written twice, "", to escape it so that VB doesn't think it's the end of the string literal).
- (.*) - The parentheses define a capturing group (so that we can pull out just this value from each match). The .* means any number of any characters other than a newline.
- "\) - The match must end with a double quote followed by a closing parenthesis.
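One caveat with the pattern print\("(.*)"\): the .* is greedy, so if two print calls share a line, it captures everything from the first opening quote to the last closing quote on that line. Below is a minimal sketch of the difference, assuming a made-up sample line; the module and variable names are illustrative and not from the answer.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GreedyVersusLazyDemo
    Sub Main()
        ' Hypothetical sample: two print calls on the same line.
        Dim sample As String = "print(""pie"") print(""apples"")"

        ' Greedy: .* extends to the LAST "") on the line, giving one long match.
        Dim greedyRegex As New Regex("print\(""(.*)""\)")
        For Each m As Match In greedyRegex.Matches(sample)
            Console.WriteLine("greedy: " & m.Groups(1).Value)
        Next

        ' Lazy: .*? stops at the FIRST "") it can reach, giving one match per call.
        Dim lazyRegex As New Regex("print\(""(.*?)""\)")
        For Each m As Match In lazyRegex.Matches(sample)
            Console.WriteLine("lazy:   " & m.Groups(1).Value)
        Next
    End Sub
End Module
```

On this sample the greedy pattern yields a single capture, pie") print("apples, while the lazy pattern yields pie and apples separately. When every print call sits on its own line, as in the example files, both patterns give the same results.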
#2
Use Regex:
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        ' Sample inputs standing in for the contents of the three files.
        Dim input1 As String = _
            "print(""I am tony"") " + _
            "print(""pineapple"") " + _
            "print(""brown cows"") " + _
            "log(""cable ties"") " + _
            "log(""bad ocd"") " + _
            "log(""bingo"")"
        Dim input2 As String = _
            "print(""grapes"") " + _
            "print(""pie"") " + _
            "print(""apples"") " + _
            "log(""laugh"") " + _
            "log(""tuna"") " + _
            "log(""gonuts"")"
        Dim input3 As String = _
            "print(""tall"") " + _
            "print(""clouds"") " + _
            "print(""nomountain"") " + _
            "log(""goggles?"") " + _
            "log(""kuwait"") " + _
            "log(""india"")"
        ' [^""]* matches everything up to the next double quote.
        Dim pattern As String = "print\(""([^""]*)""\)"
        Dim expr As Regex = New Regex(pattern, RegexOptions.Singleline)
        Dim matches As MatchCollection = Nothing
        Dim data As List(Of String) = New List(Of String)()
        matches = expr.Matches(input1)
        For Each mat As Match In matches
            data.Add(mat.Groups(1).Value)
        Next mat
        matches = expr.Matches(input2)
        For Each mat As Match In matches
            data.Add(mat.Groups(1).Value)
        Next mat
        matches = expr.Matches(input3)
        For Each mat As Match In matches
            data.Add(mat.Groups(1).Value)
        Next mat
    End Sub
End Module
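To apply the same pattern to real files rather than hard-coded strings, the repeated loops can be folded into one helper that is called once per file. This is only a sketch under stated assumptions: the folder path "C:\Scripts" and the helper name ExtractPrintArguments are hypothetical and not from either answer.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module ExtractFromFilesSketch
    ' Hypothetical helper: returns the text inside every print("...") call in one string.
    Function ExtractPrintArguments(ByVal contents As String) As List(Of String)
        Dim found As New List(Of String)()
        For Each m As Match In Regex.Matches(contents, "print\(""([^""]*)""\)")
            found.Add(m.Groups(1).Value)
        Next
        Return found
    End Function

    Sub Main()
        ' Assumed folder; replace with the directory that holds the .txt files.
        Dim folder As String = "C:\Scripts"
        Dim results As New List(Of String)()

        ' Read each .txt file and append whatever the helper extracts from it.
        For Each filePath As String In Directory.EnumerateFiles(folder, "*.txt")
            results.AddRange(ExtractPrintArguments(File.ReadAllText(filePath)))
        Next

        For Each item As String In results
            Console.WriteLine(item)
        Next
    End Sub
End Module
```

Files without any print call simply contribute nothing, so the separate "does this file contain the text" pass from the question is not needed. The order in which Directory.EnumerateFiles returns files is not guaranteed, so sort the paths first if the result order matters.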