I have a PDF file that I converted to .txt using an online tool. Now I want to parse the data in it and split it using a regular expression. I am almost done but stuck at one point.
Example data:
00 41 53 Bid Form – Design/Build (Single-Prime Contract)
27 05 13.23 T1 Services
I want to split it into two entries. The first is: 00 41 53 Bid Form – Design/Build (Single-Prime Contract)
and the second is: 27 05 13.23 T1 Services
The regular expression I'm using is [0-9](\d|\ |\.)*(\D)*
Each entry can start with numbers separated by spaces and/or dots, followed by text that can contain letters, dots, commas, (, ), -, and digits.
I cannot match an entry if its text contains a number, like the "T1 Services" above.
2 solutions
#1
2
If I understood this correctly, you are trying to split on newline characters. This is in C#.
string[] Result = Regex.Split(inputText, "[\r\n]+");
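Once the text is split into lines, each line still has to be separated into its number and its title. Here is a minimal sketch of that step, assuming every entry sits on its own line and starts with digit groups separated by single spaces or dots. The class `SectionParser`, the method `Parse`, and the group names `code` and `title` are hypothetical names chosen for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SectionParser
{
    // Assumed layout: digit groups joined by a space or a dot (the code),
    // then whitespace, then the rest of the line (the title).
    // The title is matched with ".+", so digits such as the "1" in "T1" are allowed.
    private static readonly Regex EntryPattern =
        new Regex(@"^(?<code>\d+(?:[ .]\d+)*)\s+(?<title>.+)$");

    // Returns null when the line does not look like an entry (for example, an empty line).
    public static (string Code, string Title)? Parse(string line)
    {
        Match m = EntryPattern.Match(line.Trim());
        if (!m.Success)
        {
            return null;
        }
        return (m.Groups["code"].Value, m.Groups["title"].Value);
    }
}
```

Calling `SectionParser.Parse` on each element of `Result` should give the code "27 05 13.23" and the title "T1 Services" for the second sample line. A title that itself begins with a standalone number would be ambiguous under this pattern, so treat it as a starting point rather than a complete parser.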
#2
0
You can also do it without regex, like this:
string phrase = ".......\n,,,,.ll..\r\n....";
string[] words;
// Split on either newline character and drop the empty entries produced by "\r\n"
words = phrase.Split(new string[] { "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);
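The same regex-free idea can be carried one step further to separate the leading number from the title on each line. This is only a sketch, assuming the number is made of space-separated tokens that contain nothing but digits and dots. `PlainSplitter`, `IsCodeToken`, and `SplitEntry` are hypothetical names, not part of any library:

```csharp
using System;
using System.Linq;

public static class PlainSplitter
{
    // A token belongs to the code when it contains only digits and dots,
    // so "13.23" qualifies but "T1" does not.
    private static bool IsCodeToken(string token) =>
        token.Length > 0 && token.All(c => char.IsDigit(c) || c == '.');

    public static (string Code, string Title) SplitEntry(string line)
    {
        string[] tokens = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);

        // Count the leading tokens that look like part of the code.
        int codeCount = tokens.TakeWhile(IsCodeToken).Count();

        string code = string.Join(" ", tokens.Take(codeCount));
        string title = string.Join(" ", tokens.Skip(codeCount));
        return (code, title);
    }
}
```

Note that rejoining the tokens collapses any run of spaces inside the title into a single space, which may or may not matter for text extracted from a PDF.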
If you want a regex-only approach, use @mhasan's solution.