How do I escape LaTeX code received through user input?

I read a string that the user enters in a GUI textbox and pass it to pandoc. The string contains LaTeX math directives, which use backslash characters. I want to send the string to pandoc as a raw string, but something like "\theta" becomes a tab followed by "heta".

How can I convert a string literal that contains backslash characters to a raw string...?

Edit:

Thanks develerx, flying sheep and unutbu. But none of the solutions seem to help me. The reason is that there are other backslashed characters that have no special meaning in Python but do have a meaning in LaTeX.

For example '\lambda'. All the methods suggested produce

\\lambda

which LaTeX does not process correctly; it should remain \lambda.

Another edit:

If I can get this to work, I think I will be done. @Mark: All three methods give results that I don't want.

a='\nu + \lambda + \theta'; b=a.replace(r"\\",r"\\\\"); c='%r' %a; d=a.encode('string_escape');
print a

u + \lambda +   heta
print b

u + \lambda +   heta
print c
'\nu + \\lambda + \theta'
print d
\nu + \\lambda + \theta

5 Answers

#1

Python’s raw strings are just a way to tell the Python interpreter that it should treat backslashes in a string literal as literal backslashes. If you read strings entered by the user, they are already past the point where they could have been raw. Also, user input is most likely read in literally, i.e. “raw”.
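To make the distinction concrete, here is a minimal Python 3 sketch. The `io.StringIO` object is only a stand-in for a textbox or file (an assumption for illustration); the point is that escape processing applies to literals in source code, not to text that arrives at run time.

```python
import io

# Escape sequences are processed only when Python parses a literal in source code.
cooked = "\theta"    # "\t" becomes a TAB, followed by "heta"
raw = r"\theta"      # raw literal: one backslash, then "theta"

# Stand-in for user input: the "widget" holds the six characters \theta.
fake_input = io.StringIO(raw)
from_user = fake_input.read()

print(len(cooked), len(raw), len(from_user))  # 5 6 6
print(from_user == raw)                       # True
```

Text read at run time behaves like `raw` here: nothing turns its backslash into a tab.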

This means the interpreting happens somewhere else. But if you know that it happens, why not escape the backslashes for whatever is interpreting it?

s = s.replace("\\", "\\\\")

(Note that you can't do r"\" as “a raw string cannot end in a single backslash”, but I could have used r"\\" as well for the second argument.)
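As a rough illustration of what that replacement does, the Python 3 sketch below doubles every backslash in a sample string (the sample text is an assumption standing in for real user input). Doubling is only appropriate when a later stage un-escapes the text once; otherwise the extra backslashes reach the consumer unchanged.

```python
s = r"\nu + \lambda + \theta"        # what a textbox would typically hand back

doubled = s.replace("\\", "\\\\")    # every backslash becomes two

print(s.count("\\"), doubled.count("\\"))  # 3 6
print(r"\\" == "\\\\")                     # True: both spell two backslashes
print(doubled)
```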

If that doesn’t work, whatever reads your user input is, for some arcane reason, interpreting the backslashes, so you’ll need a way to tell it to stop doing that.

#2

If you want to convert an existing string to a raw string, you can reassign it like this:

s1 = "welcome\tto\tPython"
raw_s1 = "%r"%s1
print(raw_s1)

This prints the repr of the string, including the surrounding quotes:

'welcome\tto\tPython'
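A small Python 3 sketch of what `%r` actually produces may help here. It formats the `repr()` of the string, which is a new string containing quotes and escape sequences; the original string's contents are not changed. The sample value is an assumption chosen for illustration.

```python
s = "\\lambda"               # one backslash followed by "lambda"

print(len(s))                # 7
print(s)                     # \lambda
print(repr(s))               # '\\lambda'
print(len(repr(s)))          # 10: two quotes plus a doubled backslash
print("%r" % s == repr(s))   # True
```

So the doubled backslash belongs to the displayed representation, not to `s` itself.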

#3

a='\nu + \lambda + \theta'
d=a.encode('string_escape').replace('\\\\','\\')
print(d)
# \nu + \lambda + \theta

This shows that there is a single backslash before the n, l and t:

print(list(d))
# ['\\', 'n', 'u', ' ', '+', ' ', '\\', 'l', 'a', 'm', 'b', 'd', 'a', ' ', '+', ' ', '\\', 't', 'h', 'e', 't', 'a']
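The `string_escape` codec exists only in Python 2. A hedged Python 3 approximation of the same idea uses the `unicode_escape` codec; the sketch below assumes the input is ASCII text in which a newline and a tab were produced by accident from `\n` and `\t`. The repair is lossy: a genuine newline and a typed backslash followed by `n` end up identical.

```python
# Same value as the question's `a`: newline, one literal backslash, tab.
cooked = "\nu + \\lambda + \theta"

# Turn control characters back into escape sequences (also doubles real backslashes).
escaped = cooked.encode("unicode_escape").decode("ascii")

# Collapse the doubled backslashes again.
restored = escaped.replace("\\\\", "\\")

print(escaped)    # \nu + \\lambda + \theta
print(restored)   # \nu + \lambda + \theta
```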

There is something funky going on with your GUI. Here is a simple example of grabbing some user input through a Tkinter.Entry. Notice that the text retrieved only has a single backslash before the n, l, and t. Thus no extra processing should be necessary:

import Tkinter as tk  # Python 2; on Python 3 use: import tkinter as tk

def callback():
    # Show the entered text one character at a time
    print(list(text.get()))

root = tk.Tk()
root.config()
b = tk.Button(root, text="get", width=10, command=callback)
text = tk.StringVar()
entry = tk.Entry(root, textvariable=text)
b.pack(padx=5, pady=5)
entry.pack(padx=5, pady=5)
root.mainloop()

If you type \nu + \lambda + \theta into the Entry box, the console will (correctly) print:

['\\', 'n', 'u', ' ', '+', ' ', '\\', 'l', 'a', 'm', 'b', 'd', 'a', ' ', '+', ' ', '\\', 't', 'h', 'e', 't', 'a']

If your GUI is not returning similar results (as your post seems to suggest), then I'd recommend looking into fixing the GUI problem, rather than mucking around with string_escape and string replace.
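If the GUI does return the text unchanged, it can be handed to pandoc as-is. The following is a minimal Python 3 sketch, not a definitive implementation: `run_filter` is a hypothetical helper, `user_text` stands in for the textbox contents, and the pandoc arguments (`-f latex -t html`) are an assumed choice of formats. Passing the text on stdin with an argument list avoids any shell that might interpret backslashes.

```python
import shutil
import subprocess
import sys

def run_filter(cmd, text):
    """Send text to cmd on stdin and return its stdout (no shell involved)."""
    result = subprocess.run(cmd, input=text, capture_output=True, text=True, check=True)
    return result.stdout

user_text = r"$\nu + \lambda + \theta$"   # stand-in for text read from the GUI

# Echo through a child Python process to show the text survives unchanged.
echo_cmd = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
echoed = run_filter(echo_cmd, user_text)
print(echoed == user_text)

# Assumed pandoc invocation: LaTeX in, HTML out; skipped if pandoc is not installed.
if shutil.which("pandoc"):
    print(run_filter(["pandoc", "-f", "latex", "-t", "html"], user_text))
```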

#4

When you read the string from the GUI control, it is already a "raw" string. If you display the string at the interactive prompt (which shows its repr), you might see the backslashes doubled up, but that's an artifact of how Python displays strings; internally there's still only a single backslash.

>>> a='\nu + \lambda + \theta'
>>> a
'\nu + \\lambda + \theta'
>>> len(a)
20
>>> b=r'\nu + \lambda + \theta'
>>> b
'\\nu + \\lambda + \\theta'
>>> len(b)
22
>>> b[0]
'\\'
>>> print b
\nu + \lambda + \theta

#5

I spent a lot of time trying different answers from all around the internet, and I suspect the reason one thing works for some people and not for others is very small, odd differences in application. For context, I needed to read in file names from a csv file that had strange and/or unmappable unicode characters and write them to a new csv file. For what it's worth, here's what worked for me:

s = '\u00e7\u00a3\u0085\u00e5\u008d\u0095' # csv freaks if you try to write this
s = repr(s.encode('utf-8', 'ignore'))[2:-1]
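To show what that slice does, here is the same idea spelled out step by step, assuming Python 3 (where the repr of a bytes object starts with `b'` and ends with `'`). The variable names `encoded` and `text` are introduced only for this illustration.

```python
s = "\u00e7\u00a3\u0085\u00e5\u008d\u0095"

encoded = s.encode("utf-8", "ignore")   # bytes: two bytes per character here
text = repr(encoded)[2:-1]              # drop the leading b' and the trailing '

print(len(encoded))      # 12
print(text)              # \xc3\xa7\xc2\xa3\xc2\x85\xc3\xa5\xc2\x8d\xc2\x95
print(text.isascii())    # True
```

The result is plain ASCII text made of `\x..` escapes, which is why it can be written to the csv file without encoding errors.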
