Removing commas between letters, numbers, or anything other than quotes?

Time: 2021-10-22 11:48:00

Is there a way, perhaps even a regex, to remove any commas that sit inside a pair of quotes and are surrounded by letters or numbers?

I'm not sure what else to do here, and this is my last hope before I start looking at CSV helpers.

I am using Visual Studio SSIS/BI to import text files into a DB. The problem is that SSIS will choke if the file contains data like this:

"Soccer rocks, yes it does"

To remedy this, I used the Replace method, which solved the problem temporarily. I am running this code in a Visual Studio BI/SSIS Script Task to convert the text file to CSV before sending it to the DB.

using System;
using System.IO;

class Program
{
    // Removes every ", " (comma followed by one space), then writes the text out line by line.
    static void AddComma(string s, TextWriter writer)
    {
        foreach (var line in s.Replace(", ", "").Split(new string[] { Environment.NewLine }, StringSplitOptions.None))
        {
            writer.WriteLine(line);
        }
        writer.Flush();
    }

    static void Main(string[] args)
    {
        // Read the whole source file into memory.
        TextReader reader = new StreamReader(@"C:\sample\test.txt");
        string a = reader.ReadToEnd();
        reader.Close();

        // Write the cleaned text to the CSV file.
        FileStream aFile = new FileStream(@"C:\sample\test.csv", FileMode.Create);
        AddComma(a, new StreamWriter(aFile));
        aFile.Close();
    }
}

Note: I am replacing a comma followed by a single space:

Replace(", ", "");

The problem arises when the data in the text file looks like this:

"Soccer rocks,yes it does"

The Replace method will not catch it, obviously.

Is there a way, perhaps even a regex, to remove any commas that sit inside a pair of quotes and are surrounded by letters or numbers?

So if the data looks like this: "Soccer rocks, yes it does" or "Soccer rocks 54,23 yes it does", then it should end up like this: "Soccer rocks yes it does"

I am not sure what is possible and am simply looking for some kind of solution.

1 solution

#1


1  

Did you mean something like this?

If so, you can use a matcher with the regex pattern ("[\w\s]*),([\w\s]*") and take the first and second groups; joining them gives you the quoted text without the comma.
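
As a rough illustration of the group-based idea, here is a minimal C# sketch. The class name GroupCommaRemover and the helper RemoveQuotedComma are made up for this example; only the pattern comes from the answer. It assumes each quoted field contains at most one comma and nothing but letters, digits, underscores, and whitespace around it.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupCommaRemover
{
    // Group 1: opening quote plus the text before the comma.
    // Group 2: the text after the comma plus the closing quote.
    private static readonly Regex QuotedComma =
        new Regex(@"(""[\w\s]*),([\w\s]*"")");

    // Hypothetical helper: rebuilds each match from its two groups,
    // which drops the comma that sat between them.
    public static string RemoveQuotedComma(string input)
    {
        return QuotedComma.Replace(input, "$1$2");
    }

    static void Main()
    {
        // prints "Soccer rocks yes it does" (quotes included)
        Console.WriteLine(RemoveQuotedComma("\"Soccer rocks, yes it does\""));
        // prints 1,"Soccer rocks 5423 yes it does",3
        Console.WriteLine(RemoveQuotedComma("1,\"Soccer rocks 54,23 yes it does\",3"));
    }
}
```

Because [\w\s] excludes commas, a quoted field with two or more commas will not match at all and is left unchanged, so this is probably not enough for arbitrary CSV data.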

If you use C#, you are using the .NET regex engine, which allows unbounded repetition inside a lookbehind.

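You can then try something like Regex.Replace(s, @"(?<=""[\w\s]+),(?=[\w\s]+"")", "-") (string.Replace does not accept a pattern, so it has to be Regex.Replace). That way you can replace the comma directly without needing to match and extract groups; pass "" instead of "-" if you want the comma removed rather than replaced.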
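
A minimal sketch of the lookaround version follows. The names LookaroundCommaRemover and RemoveQuotedComma, and the optional replacement parameter, are assumptions made for this example. Like the pattern it wraps, it only handles a quoted field with a single comma surrounded by letters, digits, underscores, or whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaroundCommaRemover
{
    // Matches only the comma itself: the lookbehind requires an opening quote
    // followed by word/space characters, and the lookahead requires
    // word/space characters followed by a closing quote.
    private static readonly Regex QuotedComma =
        new Regex(@"(?<=""[\w\s]+),(?=[\w\s]+"")");

    // Hypothetical helper: replaces the matched comma; the default removes it.
    public static string RemoveQuotedComma(string input, string replacement = "")
    {
        return QuotedComma.Replace(input, replacement);
    }

    static void Main()
    {
        // prints "Soccer rocks-yes it does" (quotes included)
        Console.WriteLine(RemoveQuotedComma("\"Soccer rocks,yes it does\"", "-"));
        // prints 1,"Soccer rocks yes it does",3
        Console.WriteLine(RemoveQuotedComma("1,\"Soccer rocks, yes it does\",3"));
    }
}
```

The commas that separate fields are left alone because they are not preceded by an opening quote plus text. If the quoted fields can contain several commas or other punctuation, a real CSV parser is likely the safer route.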
