Handling escape characters in a string

Date: 2022-12-07 22:28:12

I have strings from user input that I need to convert. The use case is pretty simple:

  • When a semicolon is in the string, the string is split into multiple lines.
  • When there are two semicolons in a row, they're converted to one.

In theory, this is no big problem. I use Python, but I'm sure people using other languages will find it just as easy with regular expressions.

import re

def get_lines(text):
    """Return a list of lines (list of str)."""
    command_stacking = ";"
    delimiter = re.escape(command_stacking)
    # Split only on a delimiter that is neither preceded nor followed by another one
    re_del = re.compile("(?<!{s}){s}(?!{s})".format(s=delimiter), re.UNICODE)
    chunks = re_del.split(text)

    # Clean the double delimiters
    for i, chunk in enumerate(chunks):
        chunks[i] = chunk.replace(2 * command_stacking, command_stacking)

    return chunks

That seems to work:

>>> get_lines("first line;second line;third line with;;a semicolon")
['first line', 'second line', 'third line with;a semicolon']

But when there are three or four semicolons in a row, it doesn't behave as expected.

The runs of semicolons are ignored by the regular expression (as they should be), but when replacing ;; with ;, ;;; becomes ;; and ;;;; also becomes ;;, and so on. It would be great if two semicolons were replaced by one, three by two, four by three, and so on; that's a rule I could explain to my users.
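The mismatch comes from how str.replace works: it replaces non-overlapping occurrences from left to right, so each complete ;; pair collapses to ; and an odd leftover ; is kept. A run of n semicolons therefore shrinks to roughly half its length (rounded up) instead of n - 1. A minimal sketch that reproduces this outside get_lines (collapse_with_replace is a hypothetical helper name):

```python
def collapse_with_replace(run):
    """Hypothetical helper mirroring the question's cleanup step."""
    # str.replace scans left to right and replaces non-overlapping ";;" pairs,
    # so each complete pair becomes ";" and an odd leftover ";" is kept as-is.
    return run.replace(";;", ";")


for n in range(2, 6):
    run = ";" * n
    result = collapse_with_replace(run)
    print(f"{n} semicolons -> {len(result)} (wanted {n - 1})")
```

For 2 and 3 semicolons the counts happen to match, but 4 and 5 semicolons come out as 2 and 3 instead of the wanted 3 and 4.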

What would be the best solution to do that?

Thanks for your help,

3 solutions

#1

The repl argument of re.sub can be a function: it is called with each match object, and its return value is used as the replacement.

>>> import re
>>> s = 'a;;b;;;c;;;;d'
>>> pattern = ';{2,}'  # two or more semicolons in a row
>>> def f(m):
...     # drop the first ';' of the matched run
...     return m.group(0)[1:]
...
>>> re.sub(pattern, f, s)
'a;b;;c;;;d'
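Combined with the lookaround split from the question, this callback gives the requested n to n - 1 rule. A minimal sketch, assuming the delimiter is always ; (the question's code builds the pattern from command_stacking with re.escape instead; the constant names LINE_SEPARATOR and ESCAPED_RUN are hypothetical):

```python
import re

# A lone ";" (neither preceded nor followed by another ";") separates lines.
LINE_SEPARATOR = re.compile(r"(?<!;);(?!;)")
# Two or more ";" in a row form an escaped run.
ESCAPED_RUN = re.compile(r";{2,}")


def get_lines(text):
    """Split on lone semicolons, then shorten each run of n >= 2 semicolons to n - 1."""
    chunks = LINE_SEPARATOR.split(text)
    # The callback drops the first ";" of every run:
    # ";;" -> ";", ";;;" -> ";;", ";;;;" -> ";;;".
    return [ESCAPED_RUN.sub(lambda m: m.group(0)[1:], chunk) for chunk in chunks]


print(get_lines("first line;second line;third line with;;a semicolon"))
# ['first line', 'second line', 'third line with;a semicolon']
print(get_lines("a;;b;;;c;;;;d;e"))
# ['a;b;;c;;;d', 'e']
```

Because the split only fires on lone semicolons, every run survives intact into a chunk, and the substitution then shortens each run independently.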

#2

Instead of the string replace method, use re.sub() with count=1:

import re
re.sub(';;', ';', 'foo;;;bar', count=1)

https://docs.python.org/2/library/re.html#re.sub
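One caveat: count=1 limits re.sub to the first match in the whole string, not to each run, so this only works when a chunk contains a single run of semicolons. A minimal sketch contrasting it with a backreference substitution that shortens every run by one (shorten_runs is a hypothetical helper, not part of the re module):

```python
import re


def shorten_runs(chunk):
    """Shorten every run of n >= 2 semicolons to n - 1 (hypothetical helper)."""
    # ";(;+)" matches a whole run greedily; group 1 is the run minus its first ";".
    return re.sub(r";(;+)", r"\1", chunk)


# count=1 only touches the first ";;" in the string:
print(re.sub(";;", ";", "foo;;;bar", count=1))  # foo;;bar
print(re.sub(";;", ";", "a;;b;;;c", count=1))   # a;b;;;c  (second run unchanged)

# The backreference version handles every run in the chunk:
print(shorten_runs("a;;b;;;c;;;;d"))            # a;b;;c;;;d
```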

#3

You can use re.split with lookarounds.

Example

>>> string = "first line;second line;third line with;;a semicolon"
>>> re.split(r'(?<!;);(?!;)', string)
['first line', 'second line', 'third line with;;a semicolon']

Regex

  • (?<!;) Negative lookbehind. Checks that the ; is not preceded by another ;
  • ; Matches the ;
  • (?!;) Negative lookahead. Checks that the ; is not followed by another ;

>>> [x.replace(';;', ';') for x in re.split(r'(?<!;);(?!;)', string)]
['first line', 'second line', 'third line with;a semicolon']
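The same pattern can be written with re.VERBOSE, which ignores whitespace and # comments inside the pattern, so each part of the regex breakdown sits next to its explanation. A minimal sketch (the name LONE_SEMICOLON is hypothetical):

```python
import re

# The lookaround pattern from this answer, written with re.VERBOSE so each
# part carries its own comment. Whitespace and "#" comments are ignored.
LONE_SEMICOLON = re.compile(
    r"""
    (?<!;)   # negative lookbehind: not preceded by another ;
    ;        # the separator itself
    (?!;)    # negative lookahead: not followed by another ;
    """,
    re.VERBOSE,
)

text = "first line;second line;third line with;;a semicolon"
print(LONE_SEMICOLON.split(text))
# ['first line', 'second line', 'third line with;;a semicolon']
```

Runs of two or more semicolons are left untouched by the split, so they still need a separate cleanup step afterwards.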
