How to escape/remove special characters in a LaTeX document?

Time: 2022-05-22 22:28:55

We have implemented an online service that generates PDFs with a predefined structure. The user chooses a LaTeX template and then compiles it with the appropriate inputs.

Our concern is security: a malicious user must not be able to gain shell access by injecting special instructions into the LaTeX document.

We need a workaround for this, or at least a list of special characters that we should strip from the input data.

The preferred language is PHP, but any suggestions, constructions and links are very welcome.

PS. In a few words, we're looking for a mysql_real_escape_string for LaTeX.

5 answers

#1


3  

The only way (AFAIK) to perform harmful operations using LaTeX is to call external commands using \write18. This only works if you run LaTeX with the --shell-escape or --enable-write18 argument (depending on your distribution).

So as long as you do not run it with one of these arguments you should be safe without the need to filter out any parts.

Besides that, one can still write other files using the \newwrite, \openout and \write commands. Letting users create and (over)write files might be unwanted, so you could filter out occurrences of these commands. But blacklisting certain commands is prone to fail, since someone with bad intentions can easily hide the actual command by obfuscating the input document.

Edit: Running the LaTeX command under a limited account (i.e. one that cannot write to directories unrelated to LaTeX or the project) in combination with disabling \write18 might be easier and more secure than keeping a blacklist of 'dangerous' commands.
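
As a rough illustration of that setup, the following PHP sketch builds a pdflatex command line with shell escape switched off and runs it in a dedicated job directory. It assumes a TeX Live style pdflatex on a Unix-like host, where -no-shell-escape, -interaction=nonstopmode, -halt-on-error and -output-directory are available. The function names, the log file name and the directory layout are hypothetical, and the limited account itself has to be arranged outside PHP (for example by running the web worker as an unprivileged user).

```php
<?php

/**
 * Build the argument list for a pdflatex run with \write18 disabled.
 * $texFile and $workDir are hypothetical names chosen for this sketch.
 */
function buildPdflatexCommand(string $texFile, string $workDir): array
{
    return [
        'pdflatex',
        '-no-shell-escape',          // refuse \write18 shell calls
        '-interaction=nonstopmode',  // never wait for keyboard input
        '-halt-on-error',            // stop at the first error
        '-output-directory=' . $workDir,
        $texFile,
    ];
}

/**
 * Run pdflatex inside $workDir and return its exit code.
 * Passing the command as an array (PHP 7.4+) avoids the system shell.
 */
function runPdflatex(string $texFile, string $workDir): int
{
    $log = $workDir . '/build-output.txt';   // hypothetical log file name
    $descriptors = [
        0 => ['file', '/dev/null', 'r'],     // no stdin (Unix-only path)
        1 => ['file', $log, 'w'],
        2 => ['file', $log, 'a'],
    ];
    $pipes = [];
    $process = proc_open(
        buildPdflatexCommand($texFile, $workDir),
        $descriptors,
        $pipes,
        $workDir                             // working directory of the job
    );
    if (!is_resource($process)) {
        throw new RuntimeException('Could not start pdflatex');
    }
    return proc_close($process);
}
```

A caller would create a fresh directory per request, write the filled-in template there, and call runPdflatex() with that directory, so that anything the document writes stays inside it.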

#2


15  

Here's some code that implements Geoff Reedy's answer. I place this code in the public domain.

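<?php

// Demo: print the escaped form of every LaTeX special character.
$test = 'Test characters: # $ % & ~ _ ^ \ { }.';
header( "content-type:text/plain" );
print latexSpecialChars( $test );
exit;

// Escape the ten LaTeX special characters in $string.
function latexSpecialChars( $string )
{
    $map = array( 
            "#"=>"\\#",
            "$"=>"\\$",
            "%"=>"\\%",
            "&"=>"\\&",
            "~"=>"\\~{}",
            "_"=>"\\_",
            "^"=>"\\^{}",
            "\\"=>"\\textbackslash{}", // the {} keeps a following space from being swallowed
            "{"=>"\\{",
            "}"=>"\\}",
    );
    // strtr() replaces in a single pass, so the backslashes and braces it
    // inserts are not escaped a second time. It also avoids preg_replace()
    // with the /e modifier, which was removed in PHP 7.
    return strtr( $string, $map );
}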

#3


3  

In general, achieving security purely through escaping command sequences is hard to do without drastically reducing expressivity, since there is no principled way to distinguish safe control sequences (cs's) from unsafe ones: TeX is just not a clean enough programming language to allow this. I'd say abandon this approach in favour of eliminating the security holes themselves.

Veger's summary of the security holes in LaTeX agrees with mine: the issues are shell escapes and file creation/overwriting, though he has missed a shell escape vulnerability. Some additional points follow, then some recommendations:

  1. It is not enough to avoid actively invoking --shell-escape, since it can be implicitly enabled in texmf.cnf. You should explicitly pass --no-shell-escape to override texmf.cnf;

  2. \write18 is a primitive of e-TeX, not Knuth's TeX. So you can avoid LaTeX implementations that provide it (which, unfortunately, is most of them);

  3. If you are using dvips, there is another risk: \special commands can create .dvi files that ask dvips to execute shell commands. So, if you use dvips, you should pass the -R2 option to forbid invoking shell commands;

  4. texmf.cnf allows you to specify where TeX can create files;

  5. You might not be able to disable font creation if you want to give your clients much freedom in which fonts they may create. Take a look at the notes on security for Kpathsea; the default behaviour seems reasonable to me, but you could have a per-user font tree, to prevent one user from stepping on another user's toes.
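
Points 1, 3 and 4 could be wired into a PHP build step roughly as follows. This is only a sketch under assumptions: it targets a TeX Live style installation, the function names are hypothetical, and it assumes that Kpathsea lets the openout_any setting from texmf.cnf be overridden through an environment variable of the same name (the value p, "paranoid", is usually already the default). Check the behaviour of your own distribution before relying on it.

```php
<?php

/** latex (DVI output) with shell escape explicitly disabled (point 1). */
function buildLatexCommand(string $texFile): array
{
    return ['latex', '-no-shell-escape', '-interaction=nonstopmode', $texFile];
}

/** dvips in its most restrictive security mode (point 3). */
function buildDvipsCommand(string $dviFile, string $psFile): array
{
    return ['dvips', '-R2', '-o', $psFile, $dviFile];
}

/**
 * Environment for the TeX child processes (point 4): the parent environment
 * plus openout_any=p, assuming Kpathsea honours it as an override.
 */
function restrictedTexEnv(): array
{
    return array_merge(getenv(), ['openout_any' => 'p']);
}
```

The arrays returned by the two command builders can be passed to proc_open() (PHP 7.4+) together with restrictedTexEnv() as its environment argument, which keeps the system shell out of the picture entirely.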

Options:

  1. Sandbox your clients' LaTeX invocations, and allow them freedom to misbehave in the sandbox;

  2. Trust in Kpathsea's defaults, and forbid shell escapes in latex and any other executables used to build the PDF output;

  3. Drastically reduce expressivity, forbidding your clients the ability to create font files or any new client-specified files. Run latex as a process that can only write to certain already existing files;

  4. You can create a format file in which the \write18 cs and the file creation cs's are not bound, and only macros that invoke them safely, such as those for font/toc/bbl creation, exist. This means you have to decide what functionality your clients have: they would not be able to freely choose which packages they import, but must make use of the choices you have imposed on them. Depending on what kind of 'templates' you have in mind, this could be a good option, allowing use of packages that use shell escapes, but you will need to audit the TeX/LaTeX code that goes into your format file.

Postscript

There's a TUGboat article, "Server side PDF generation based on LaTeX templates", which takes a different angle on the question from the one I have taken, namely generating PDFs from form input using LaTeX.

#4


2  

According to http://www.tug.org/tutorials/latex2e/Special_Characters.html the special characters in LaTeX are # $ % & ~ _ ^ \ { }. Most can be escaped with a simple backslash, but ~, ^ and \ need special treatment.

For caret use \^{} (or \textasciicircum), for tilde use \~{} (or \textasciitilde) and for backslash use \textbackslash.

If you want the user input to appear as typewriter text, there is also the \verb command, which can be used like \verb+asdf$$&\~^+. The delimiter (+ here) can be any character, as long as it does not occur in the text itself.
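
A small PHP sketch of that idea: pick the first delimiter that does not occur in the input and wrap the text in \verb. The function name latexVerb() and the candidate list are hypothetical. Note also that \verb cannot span lines and does not work inside the argument of another command, so this only suits templates that place the user text at the top level.

```php
<?php

/**
 * Wrap $text in \verb using the first delimiter that does not occur in it.
 * latexVerb() and the candidate list are hypothetical, chosen for this sketch.
 */
function latexVerb(string $text): string
{
    if (preg_match('/[\r\n]/', $text)) {
        throw new InvalidArgumentException('\verb cannot span multiple lines');
    }
    foreach (['+', '|', '!', '/', '=', '@', ':', ';'] as $delimiter) {
        if (strpos($text, $delimiter) === false) {
            return '\verb' . $delimiter . $text . $delimiter;
        }
    }
    throw new InvalidArgumentException('No free delimiter for this text');
}

echo latexVerb('asdf$$&\~^'), "\n";   // \verb+asdf$$&\~^+
echo latexVerb('1+1=2'), "\n";        // \verb|1+1=2|
```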

#5


0  

You'd probably want to make sure that your \write18 is disabled.

See http://www.fceia.unr.edu.ar/lcc/cdrom/Instalaciones/LaTex/MiKTex/doc/ch04s08.html and http://www.texdev.net/2009/10/06/what-does-write18-mean/
