I would like to split up a string using a space as my delimiter, but if there are multiple words enclosed in double or single quotes, then I would like them to be returned as one item.
For example if the input string is:
CALL "C:\My File Name With Space" /P1 P1Value /P1 P2Value
The output array would be:
Array[0]=CALL
Array[1]=C:\My File Name With Space
Array[2]=/P1
Array[3]=P1Value
Array[4]=/P1
Array[5]=P2Value
How do you use regular expressions to do this? I realize that there are command-line parsers. I took a cursory look at a popular one, but it did not handle multiple parameters with the same name. In any event, rather than learning how to use a command-line parsing library (I'll leave that for another day), I'm interested in getting more exposure to RegEx functions.
How would you use a RegEx function to parse this?
3 Solutions
#1
10
The link in Jim Mischel's comment points out that the Win32 API provides a function for this. I'd recommend using that for consistency. Here's a sample (from PInvoke).
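// Place these members inside a class. Required namespaces:
using System;
using System.ComponentModel;          // Win32Exception
using System.Runtime.InteropServices; // DllImport, Marshal

static string[] SplitArgs(string unsplitArgumentLine)
{
    int numberOfArgs;
    IntPtr ptrToSplitArgs;
    string[] splitArgs;

    // On success the API returns a pointer to an array of numberOfArgs
    // wide-string pointers; on failure it returns NULL.
    ptrToSplitArgs = CommandLineToArgvW(unsplitArgumentLine, out numberOfArgs);
    if (ptrToSplitArgs == IntPtr.Zero)
        throw new ArgumentException("Unable to split argument.",
            new Win32Exception());
    try
    {
        splitArgs = new string[numberOfArgs];
        // Read each pointer out of the array and copy its string into managed memory.
        for (int i = 0; i < numberOfArgs; i++)
            splitArgs[i] = Marshal.PtrToStringUni(
                Marshal.ReadIntPtr(ptrToSplitArgs, i * IntPtr.Size));
        return splitArgs;
    }
    finally
    {
        // The API allocates the result as one block; release it with LocalFree.
        LocalFree(ptrToSplitArgs);
    }
}

[DllImport("shell32.dll", SetLastError = true)]
static extern IntPtr CommandLineToArgvW(
    [MarshalAs(UnmanagedType.LPWStr)] string lpCmdLine,
    out int pNumArgs);

[DllImport("kernel32.dll")]
static extern IntPtr LocalFree(IntPtr hMem);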
If you want a quick-and-dirty, inflexible, fragile regex solution you can do something like this:
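// Requires: using System.Linq; using System.Text.RegularExpressions;
var rex = new Regex(@"("".*?""|[^ ""]+)+");
string test = "CALL \"C:\\My File Name With Space\" /P1 P1Value /P1 P2Value";
// Each match still includes its surrounding quotes, so trim them from both ends.
var array = rex.Matches(test).OfType<Match>().Select(m => m.Value.Trim('"')).ToArray();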
#2
1
I wouldn't do it with Regex, for various reasons shown above.
If I did need to, this would match your simple requirements:
(".*?")|([^ ]+)
However, this doesn't include:
- Escaped quotes
- Single quotes
- Non-ASCII quotes (you don't think people will paste smart quotes from Word into your file?)
- combinations of the above
And that's just off the top of my head.
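To illustrate how a pattern might cover two of the gaps listed above (single quotes and typographic quotes), here is a minimal sketch. The class name `QuoteAwareSplitter` and its `Split` method are made up for illustration. Escaped quotes are still not handled, and a quote that is never closed is simply treated as part of an ordinary token.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class QuoteAwareSplitter
{
    // Alternatives are tried left to right: a double-quoted span, a single-quoted
    // span, a span in typographic double quotes, then any run of non-whitespace.
    // The quote characters sit outside the capture groups, so they are dropped.
    private static readonly Regex TokenPattern = new Regex(
        "\"([^\"]*)\"|'([^']*)'|\u201C([^\u201D]*)\u201D|(\\S+)");

    public static string[] Split(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in TokenPattern.Matches(input))
        {
            // Exactly one of the four groups takes part in each match.
            for (int g = 1; g <= 4; g++)
            {
                if (m.Groups[g].Success)
                {
                    tokens.Add(m.Groups[g].Value);
                    break;
                }
            }
        }
        return tokens.ToArray();
    }
}
```

Calling `QuoteAwareSplitter.Split` on the question's sample input yields six items, with the quoted path returned as a single item without its quotes.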
#3
1
@chad Henderson you forgot to include the single quotes, and this also has the problem of capturing anything that comes before a set of quotes.
Here is the correction including the single quotes, but it also shows the problem with the extra capture before a quote: http://regexhero.net/tester/?id=81cebbb2-5548-4973-be19-b508f14c3348
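One possible way to deal with text that sits directly against a quoted span (for example `/P1="two words"`) is to treat the whole thing as one token and strip the quotes afterwards. The sketch below is only an illustration of that idea, not the pattern behind the link above. The class name `AdjacentQuoteSplitter` is hypothetical, and escaped quotes are not handled.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class AdjacentQuoteSplitter
{
    // A token is one or more adjacent pieces: a quoted span, a run of plain
    // characters, or (as a fallback) a stray quote character with no partner.
    private static readonly Regex Token = new Regex(
        "(?:\"[^\"]*\"|'[^']*'|[^\\s\"']+|[\"'])+");

    // Matches a complete quoted span so that only its content is kept.
    private static readonly Regex QuotedSpan = new Regex(
        "\"([^\"]*)\"|'([^']*)'");

    public static string[] Split(string input)
    {
        return Token.Matches(input)
            .Cast<Match>()
            // A group that did not take part in the match substitutes as empty.
            .Select(m => QuotedSpan.Replace(m.Value, "$1$2"))
            .ToArray();
    }
}
```

With this approach an apostrophe inside a word such as `don't` stays in the token, as long as no later single quote on the line pairs up with it.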