PHP regex: removing excess line breaks (string-processing problem)

Time: 2021-08-04 12:14:06

My input is a dirty string: lots of extra spaces, excess line breaks, and stray spaces just before punctuation characters.

My desired output is explained in the code below.

I have managed to collapse excess whitespace and to remove the space just before punctuation characters, but my output still contains unwanted extra line breaks.

I call the functions below when printing user input from the MySQL database to the screen:

echo "\t\t".'<p>'.nl2br(convert_str(htmlspecialchars($comment))).'</p>'."\r\n";

My custom function is below:

function convert_str ($str)
{
    // remove excess whitespace:
    // look for one or more spaces and replace them all with a single space.
    $str = preg_replace('/ +/', ' ', $str);
    // check for instances of more than two line breaks in a row
    // and then change them to a total of two line breaks
    // did not work for me --> preg_replace('/(?:(?:\r\n|\r|\n)\s*){2}/s', "\n\n", $str);
    $str = preg_replace('/[ \t]+/', ' ', preg_replace('/\s*$^\s*/m', "\n", $str));
    // if present, remove one space character just before the punctuation marks below:
    // $punc = array('.',',',';',':','...','?','!','-','—','/','\\','“','”','‘','’','"','\'','(',')','[',']','’','{','}','*','&','#','^','<','>','|');
    $punc = array(' .',' ,',' ;',' :',' ...',' ?',' !',' -',' —',' /',' \\',' “',' ”',' ‘',' ’',' "',' \'',' (',' )',' [',' ]',' ’',' {',' }',' *',' &',' #',' ^',' <',' >',' |');
    $replace = array('.',',',';',':','...','?','!','-','—','/','\\','“','”','‘','’','"','\'','(',')','[',']','’','{','}','*','&','#','^','<','>','|');
    $str = str_replace($punc,$replace,$str);
    return $str;
}

Can you please correct my code?

Update: I use prepared statements to insert user input into the MySQL tables, and I do not modify users' data on its way into the database.

1 solution

#1


2  

I found the cause. It was simple, but it cost me five hours: I was using just \n instead of \r\n in the replacement string.
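The post does not say why the line-ending style mattered. One plausible reading, offered here as an assumption, is that the stored comments contained \r\n line endings, which is what browsers send for textarea input. The short sketch below uses a made-up sample string to show how a pattern written only for \n fails to see a run of \r\n breaks, while PCRE's \R escape matches either style.

```php
<?php
// Hypothetical sample: three line breaks in a row, stored as CRLF.
$crlf = "first\r\n\r\n\r\nsecond";

// A pattern that only knows "\n" never finds two "\n" characters side by side,
// because a "\r" sits between them, so nothing is collapsed.
$lfOnly = preg_replace('/\n{2,}/', "\n\n", $crlf);
var_dump($lfOnly === $crlf);

// \R matches any newline sequence (including \r\n, \r and \n) as one unit,
// so the whole run is recognised and reduced to two CRLF breaks.
$any = preg_replace('/\R{2,}/', "\r\n\r\n", $crlf);
var_dump($any === "first\r\n\r\nsecond");
```

Both var_dump calls should report true. The alternation (?:\r\n|\r|\n) used in the post covers the same three cases as \R does here.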

So the code that satisfies my requirements is:

function convert_str ($str)
{
    // remove excess whitespace:
    // look for one or more spaces and replace them all with a single space.
    $str = preg_replace('/ +/', ' ', $str);
    // look for two or more line breaks in a row (with optional whitespace after each)
    // and replace the whole run with exactly two line breaks
    $str = preg_replace('/(?:(?:\r\n|\r|\n)\s*){2}/s', "\r\n\r\n", $str);
    // if present, remove one space character just before the punctuation marks below:
    // $punc = array('.',',',';',':','...','?','!','-','—','/','\\','“','”','‘','’','"','\'','(',')','[',']','’','{','}','*','&','#','^','<','>','|');
    $punc = array(' .',' ,',' ;',' :',' ...',' ?',' !',' -',' —',' /',' \\',' “',' ”',' ‘',' ’',' "',' \'',' (',' )',' [',' ]',' ’',' {',' }',' *',' &',' #',' ^',' <',' >',' |');
    $replace = array('.',',',';',':','...','?','!','-','—','/','\\','“','”','‘','’','"','\'','(',')','[',']','’','{','}','*','&','#','^','<','>','|');
    $str = str_replace($punc,$replace,$str);
    return $str;
}
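For comparison, here is a minimal alternative sketch, not the author's method: it first normalizes every line ending to \n, so the later patterns only deal with one form, and it replaces the long punctuation arrays with a single character class. The function name clean_comment and the shortened punctuation list are assumptions made for illustration; extend the character class if the full list from the post is needed.

```php
<?php
// clean_comment() is a hypothetical name, not part of the original post.
function clean_comment(string $str): string
{
    // Normalize CRLF and lone CR to "\n" so later patterns see one newline style.
    $str = preg_replace('/\r\n|\r/', "\n", $str);
    // Collapse runs of spaces and tabs into a single space.
    $str = preg_replace('/[ \t]+/', ' ', $str);
    // Drop the single space that may remain on either side of a line break.
    $str = preg_replace('/ ?\n ?/', "\n", $str);
    // Reduce three or more line breaks to exactly two (one blank line).
    $str = preg_replace('/\n{3,}/', "\n\n", $str);
    // Remove one space directly before common punctuation (assumed subset).
    $str = preg_replace('/ ([.,;:?!])/', '$1', $str);
    return $str;
}

$dirty = "Hello   world ,  this is  a test .\r\n\r\n\r\n\r\nNew   paragraph !";
echo clean_comment($dirty);
```

Because the result uses \n only, it can still be passed through nl2br() for display, as in the echo statement from the question.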
