I have this
var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");
So this captures values until it hits a W in the string, correct? I need it to stop at a W or an S. I have tried a few different ways, but I can't get it to work. Does anyone have any pointers?
More info:
record = record.Replace(" ", "").Replace("\r\n", "").Replace("-", "/");
var regex = new Regex(@"StartDate:(.*)EndDate:(.*)W.*Status:(.*)");
Match match = regex.Match(record); // match once and reuse the result
string strStartDate = match.Groups[1].Value;
string strEndDate = match.Groups[2].Value;
// Compare case-insensitively; the original ToUpper().StartsWith("In") could never be true
string Status = match.Groups[3].Value.StartsWith("In", StringComparison.OrdinalIgnoreCase) ? "Inactive" : "Active";
I am trying to parse a big string of values, and I only want 3 things: Start Date, End Date, and Status (active/inactive). However, there are 3 different values for each (3 start dates, 3 end dates, 3 statuses).
The first 2 strings look like this
"Start Date:
2014-09-08
End Date:
2017-09-07
Warranty Type:
XXX
Status:
Active
Serial Number/IMEI:
XXXXXXXXXXX
Description:
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
The 3rd string is like this
"Start Date:
2014-09-08
End Date:
2017-09-07
Status:
Active
Warranty Upgrade Code:
SVC_PRIORITY"
On the last string it does not return the dates, I am guessing because of the W.* after the end date.
I am not getting the 2 dates on the last string
4 solutions
#1
1
EDIT Please try the function to parse using regex:
using System.Collections.Generic; // needed for List<T>
using System.Text.RegularExpressions;
using System.Linq;
using System.Windows.Forms;
private static List<string[]> parseString(string input)
{
var pattern = @"Start\s+Date:\s+([0-9-]+)\s+End\s+Date:\s+([0-9-]+)\s+(?:Warranty\s+Type:\s+\w+\s+)?Status:\s+(\w+)\s*";
return Regex.Matches(input, pattern).Cast<Match>().ToList().ConvertAll(m => new string[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value });
}
// To show the result string
var result1 = parseString(str1);
string result_string = string.Join("\n", result1.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
MessageBox.Show(result_string);
EDIT2 For OP's situation, you could call the function from inside the foreach loop like this:
foreach (HtmlElement el in webBrowser1.Document.GetElementsByTagName("div"))
{
if (el.GetAttribute("className") == "fluid-row Borderfluid")
{
string record = el.InnerText;
//if record is the string to parse
var result = parseString(record);
var result_string = string.Join("\n", result.ConvertAll(r => string.Format("Start Date: {0}\nEnd Date: {1}\nStatus: {2}", r)).ToArray());
MessageBox.Show(result_string);
}
}
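As a rough check of why the optional group `(?:Warranty\s+Type:\s+\w+\s+)?` matters, here is a minimal sketch run against two shortened records in the shape shown in the question. The class name `OptionalGroupDemo` and the method `Parse` are made up for illustration, and the sketch assumes the Warranty Type value is a single word, as `\w+` requires.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OptionalGroupDemo
{
    // Same idea as the pattern above: the "Warranty Type" line sits inside an
    // optional non-capturing group, so records with and without it both match.
    private const string Pattern =
        @"Start\s+Date:\s+([0-9-]+)\s+End\s+Date:\s+([0-9-]+)\s+(?:Warranty\s+Type:\s+\w+\s+)?Status:\s+(\w+)";

    // Returns { start date, end date, status } for the first record, or an empty array.
    public static string[] Parse(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success
            ? new[] { m.Groups[1].Value, m.Groups[2].Value, m.Groups[3].Value }
            : new string[0];
    }

    public static void Main()
    {
        // Hypothetical shortened inputs modelled on the question's samples.
        string withWarranty = "Start Date:\n2014-09-08\nEnd Date:\n2017-09-07\nWarranty Type:\nXXX\nStatus:\nActive\n";
        string withoutWarranty = "Start Date:\n2014-09-08\nEnd Date:\n2017-09-07\nStatus:\nActive\nWarranty Upgrade Code:\nSVC_PRIORITY";

        Console.WriteLine(string.Join(", ", Parse(withWarranty)));
        Console.WriteLine(string.Join(", ", Parse(withoutWarranty)));
    }
}
```

Both calls should yield the same three values, because the engine simply skips the optional group when the next text is already `Status:`.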
#2
1
No need to replace the new lines in your example
List<string> resultList = new List<string>();
var subjectString = @"Start Date: xxxxx
End Date: yyyy
Warranty Type: zzzz
Status: uuuu
Start Date: aaaa
End Date: bbbb
Status: cccc";
Regex regexObj = new Regex(@"Start Date: (.*?)\nEnd Date: (.*?)\n(.|\n)*?Status: (.*)");
Match matchResult = regexObj.Match(subjectString);
while (matchResult.Success) {
resultList.Add(matchResult.Groups[1].Value);
resultList.Add(matchResult.Groups[2].Value);
resultList.Add(matchResult.Groups[4].Value);
matchResult = matchResult.NextMatch();
}
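One thing to watch with the pattern above: it expects bare `\n` line endings, and if the text arrives with `\r\n` the `.` will happily swallow the `\r`. The following is only a sketch of one way around that, under the assumption that each field sits on a single line as in this answer's sample; `LineEndingDemo` and `Extract` are illustrative names, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LineEndingDemo
{
    // [^\r\n]* stops at either line-ending character, so a capture never
    // contains a stray \r, and \r?\n accepts both Windows and Unix line breaks.
    private static readonly Regex RecordRegex = new Regex(
        @"Start Date: ([^\r\n]*)\r?\nEnd Date: ([^\r\n]*)\r?\n(?:Warranty Type: [^\r\n]*\r?\n)?Status: ([^\r\n]*)");

    // Returns start date, end date and status for every record, flattened into one list.
    public static List<string> Extract(string text)
    {
        var results = new List<string>();
        for (Match m = RecordRegex.Match(text); m.Success; m = m.NextMatch())
        {
            results.Add(m.Groups[1].Value);
            results.Add(m.Groups[2].Value);
            results.Add(m.Groups[3].Value);
        }
        return results;
    }

    public static void Main()
    {
        // First record uses \r\n, second uses \n, to exercise both cases.
        string text = "Start Date: xxxxx\r\nEnd Date: yyyy\r\nWarranty Type: zzzz\r\nStatus: uuuu\r\n"
                    + "Start Date: aaaa\nEnd Date: bbbb\nStatus: cccc";
        Console.WriteLine(string.Join(",", Extract(text)));
    }
}
```

Making the Warranty Type line an explicit optional group also removes the need for the `(.|\n)*?` skip, so the status ends up in group 3 rather than group 4.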
#3
0
You may replace your code with the following one (see IDEONE demo):
var s = @"Start Date: xxxxx
End Date: xxxx
Warranty Type: xxxx
Status: xxxx";
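var res = Regex.Replace(s, @":\s+", ": ") // Collapse the whitespace after each colon so a key and its value share a line
.Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries) // Split into lines
.Select(line => line.Split(new[] { ": " }, 2, StringSplitOptions.None)) // Split each line at `:`+space
.ToDictionary(n => n[0], n => n[1]); // Create a dictionary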
string strStartDate = string.Empty;
string strEndDate = string.Empty;
string Status = string.Empty;
string Warranty = string.Empty;
// Demo & variable assignment
if (res.ContainsKey("Start Date")) {
Console.WriteLine(res["Start Date"]);
strStartDate = res["Start Date"];
}
if (res.ContainsKey("Warranty Type")) {
Console.WriteLine(res["Warranty Type"]);
Warranty = res["Warranty Type"];
}
if (res.ContainsKey("End Date")) {
Console.WriteLine(res["End Date"]);
strEndDate = res["End Date"];
}
if (res.ContainsKey("Status")) {
Console.WriteLine(res["Status"]);
Status = res["Status"];
}
Note that the best approach is to declare your own class with fields like WarrantyType, StartDate, etc. and initialize it right in the LINQ code.
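A minimal sketch of that idea follows. The class `WarrantyRecord`, its property names, and the helper `WarrantyRecordDemo.Parse` are all hypothetical, and the sketch assumes the input holds a single record, since `ToDictionary` throws on duplicate keys.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container for the fields of one record.
public class WarrantyRecord
{
    public string StartDate { get; set; }
    public string EndDate { get; set; }
    public string WarrantyType { get; set; }
    public string Status { get; set; }
}

public static class WarrantyRecordDemo
{
    public static WarrantyRecord Parse(string text)
    {
        // Put each key and value on one line, then build a key/value lookup.
        Dictionary<string, string> fields = Regex.Replace(text, @":\s+", ": ")
            .Split(new[] { "\r", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Split(new[] { ": " }, 2, StringSplitOptions.None))
            .Where(parts => parts.Length == 2) // ignore lines without a "key: value" shape
            .ToDictionary(parts => parts[0].Trim(), parts => parts[1].Trim());

        // Missing keys become empty strings instead of throwing.
        string value;
        return new WarrantyRecord
        {
            StartDate = fields.TryGetValue("Start Date", out value) ? value : string.Empty,
            EndDate = fields.TryGetValue("End Date", out value) ? value : string.Empty,
            WarrantyType = fields.TryGetValue("Warranty Type", out value) ? value : string.Empty,
            Status = fields.TryGetValue("Status", out value) ? value : string.Empty
        };
    }

    public static void Main()
    {
        WarrantyRecord record = Parse(
            "Start Date:\n2014-09-08\nEnd Date:\n2017-09-07\nStatus:\nActive\nWarranty Upgrade Code:\nSVC_PRIORITY");
        Console.WriteLine(record.StartDate + " " + record.EndDate + " " + record.Status);
    }
}
```

Using `TryGetValue` replaces the chain of `ContainsKey` checks, so a record without a Warranty Type line simply ends up with an empty `WarrantyType`.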
#4
0
Avoid .* because it is a catch-all that gets regex pattern authors into trouble. Instead, build the pattern around something specific that always occurs in the data.
Your pattern is the two dates, \d\d\d\d-\d\d-\d\d; the rest is anchor text, which should be used as static anchors that can be skipped.
Here is an example that looks for the date patterns. Once found, regex puts them into named capture groups, (?<GroupNameHere>...), and LINQ projects each match into an anonymous type and parses the dates.
Data
Note the first date is reversed as per your example
var data = @"Start Date:
2014-09-08
End Date:
2017-09-07
Status:
Active
Start Date:
2014-09-09
End Date:
2017-09-10
Status:
In-Active
";
Pattern
string pattern = @"
^Start\sDate:\s+ # An anchor of start date that always starts at the BOL
(?<Start>\d\d\d\d-\d\d-\d\d) # actual start date pattern
\s+ # a lot of space including \r\n
^End\sDate:\s+ # End date anchor and space
(?<End>\d\d\d\d-\d\d-\d\d) # pattern of the end date.
\s+ # Same pattern as above for Status
^Status:\s+
(?<Status>[^\s]+)
";
Processing
// ExplicitCapture tells the parser to ignore any unnamed groups (..) and capture only the named ones.
// Multiline makes ^ and $ match at the beginning and end of each line rather than of the whole buffer.
// IgnorePatternWhitespace allows us to comment the pattern; it does not affect processing.
var results = Regex.Matches(data, pattern, RegexOptions.ExplicitCapture |
RegexOptions.Multiline |
RegexOptions.IgnorePatternWhitespace)
.OfType<Match>()
.Select (mt => new
{
Status = mt.Groups["Status"].Value,
StartDate = DateTime.Parse(mt.Groups["Start"].Value),
EndDate = DateTime.Parse(mt.Groups["End"].Value)
});
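The same principle can be applied to the asker's own preprocessing, where spaces and line breaks are stripped and `-` becomes `/`. The sketch below is an assumption-laden illustration rather than a drop-in fix: `FlattenedRecordDemo`, `Flatten` and `Parse` are invented names, and it assumes the status text is literally "Active" or "Inactive" and that the Warranty Type value contains only word characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FlattenedRecordDemo
{
    // Dates are matched by shape, the Warranty Type block is optional, and the
    // status is matched by its known values because, once whitespace is gone,
    // nothing separates it from the next label.
    private static readonly Regex RecordRegex = new Regex(
        @"StartDate:(?<Start>\d{4}/\d{2}/\d{2})EndDate:(?<End>\d{4}/\d{2}/\d{2})(?:WarrantyType:\w*?)?Status:(?<Status>Inactive|Active)",
        RegexOptions.IgnoreCase);

    // Mirrors the preprocessing from the question.
    public static string Flatten(string record)
    {
        return record.Replace(" ", "").Replace("\r\n", "").Replace("-", "/");
    }

    // Returns { start date, end date, status }, or an empty array if nothing matches.
    public static string[] Parse(string record)
    {
        Match m = RecordRegex.Match(Flatten(record));
        return m.Success
            ? new[] { m.Groups["Start"].Value, m.Groups["End"].Value, m.Groups["Status"].Value }
            : new string[0];
    }

    public static void Main()
    {
        string third = "Start Date:\r\n2014-09-08\r\nEnd Date:\r\n2017-09-07\r\nStatus:\r\nActive\r\nWarranty Upgrade Code:\r\nSVC_PRIORITY";
        Console.WriteLine(string.Join(" | ", Parse(third)));
    }
}
```

Because nothing in the pattern depends on a literal W or S following the end date, the third record, where Status comes straight after End Date, is handled the same way as the first two.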