How can one turn regular quotes (i.e. ', ") into LaTeX/TeX quotes (i.e. `', ``'')

Date: 2023-02-04 06:07:49

Given a document written with normal quotes, e.g.

Ben said "buttons, dear sir".
I replied "Did you say 'buttons'?" to him.

How can one turn text like this into LaTeX quotes with the appropriate semantics, i.e.

Ben said ``buttons, dear sir''.
I replied ``Did you say `buttons'?'' to him.

So that LaTeX produces:

Ben said “buttons, dear sir”.
I replied “Did you say ‘buttons’?” to him.

My first thought is to turn to a regex. However, I'm not getting any hits from Google or the regex libraries for "LaTeX quotes regular expression", and of course "TeX quotes regular expression" seems to return too many.

Thank you.

8 answers

#1


4  

In general, this problem is harder than it looks.

The simplest cases can be treated with regular expressions, but for more general situations you will almost certainly need to build a recursive parser: regular expressions will only work if there is no nesting.

The big problem will be identifying single "'"s that are not paired, as in contractions (the "'" in "don't" should not be changed, and should not be paired).


Let's see if we can write a usable EBNF description:

input:       text+
text:        uquote | squote | dquote
squote:      "'" text "'"
dquote:      '"' text '"'
uquote:      (contraction | .)+
contraction: [A-Za-z]+ "'" [A-Za-z]+

which is limited to contractions that have the "'" in the middle of the word. The associated actions all just echo the input, except that the squote and dquote rules replace the quotes as appropriate.

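One way to read the grammar above is as a single left-to-right scan that keeps a stack of the quotes currently open. The following is a minimal Python sketch under that assumption, not a full parser. The function name `to_tex_quotes` is hypothetical, and the only contraction rule it knows is the mid-word one from the grammar, so possessives such as dogs' or leading apostrophes such as 'tis would be misread.

```python
def to_tex_quotes(text: str) -> str:
    """Convert straight quotes to TeX quotes, tracking nesting with a stack."""
    out = []
    open_quotes = []  # stack of quote characters that are currently open
    for i, ch in enumerate(text):
        if ch not in "'\"":
            out.append(ch)  # ordinary text is echoed unchanged
            continue
        prev_ch = text[i - 1] if i > 0 else ""
        next_ch = text[i + 1] if i + 1 < len(text) else ""
        if ch == "'" and prev_ch.isalpha() and next_ch.isalpha():
            # contraction rule: letters on both sides, leave the apostrophe alone
            out.append(ch)
        elif open_quotes and open_quotes[-1] == ch:
            # same quote character as the innermost open one: treat as closing
            open_quotes.pop()
            out.append("''" if ch == '"' else "'")
        else:
            # otherwise treat it as opening a new (possibly nested) quote
            open_quotes.append(ch)
            out.append("``" if ch == '"' else "`")
    return "".join(out)


print(to_tex_quotes('I replied "Did you say \'buttons\'?" to him.'))
```

On the two sentences from the question this produces the desired output, but anything the contraction rule does not cover would still need a human fix-up.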


I used regular expressions followed by human fix-ups for a fairly simple one-off, but that would be labor intensive for on-going work.

#2


3  

I want to take the opportunity to point to XeTeX, which is shipped with the (highly recommended!) TeX Live distribution.

Among other things, XeTeX directly supports Unicode. In your case this means that you no longer have to deal with these (sometimes tedious) replacement characters: instead of typing ``'' you can use “” directly in your LaTeX code.

IMHO, that’s a big and important step. TeX is a great typesetting system, but its lack of support for modern features such as Unicode has made many tasks arduous.

#3


2  

Here is the Python regex that I use for my LaTeX documents:

'([ \w-]+)'", " `\\1'

There is a Python script that applies the regex to a LaTeX file (here). It works most of the time. Happy typesetting! :)

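The linked script is not reproduced here, so the following is only a sketch of how such a regex could be applied with Python's `re.sub`. It assumes the line above is a pattern/replacement pair, read as pattern `'([ \w-]+)'` and replacement `` `\1' ``; the function name `fix_single_quotes` is hypothetical.

```python
import re

# Assumed reading of the regex above: a run of word characters, spaces
# and hyphens enclosed in straight single quotes.
SINGLE_QUOTED = re.compile(r"'([ \w-]+)'")


def fix_single_quotes(text: str) -> str:
    """Replace 'word' with `word' (TeX single quotes)."""
    return SINGLE_QUOTED.sub(r"`\1'", text)


print(fix_single_quotes("Did you say 'buttons'?"))
```

The "most of the time" caveat is easy to reproduce: in a line such as `don't say 'well-known'` the pattern pairs the apostrophe of the contraction with the next single quote, so the contraction is changed and the real quotation is left alone.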

#4


1  

Here are some Perl regular expression substitutions that might be good enough for what you want to do.

s/"(\w)/``$1/g;
s/'(\w)/`$1/g;
s/([\w\.?!])"/$1''/g;

The code assumes that a single or double quote followed by an alphanumeric character begins a quote. Also, it assumes that a double quote following an alphanumeric character or punctuation mark ends a quote. These assumptions are probably true most of the time but there may be exceptions.

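To make those assumptions and their exceptions concrete, here is a rough Python port of the three substitutions. The function name `perl_style_quotes` is hypothetical, and the port assumes Python's `\w` is close enough to Perl's for this purpose.

```python
import re


def perl_style_quotes(text: str) -> str:
    """Apply the three substitutions above, in the same order."""
    text = re.sub(r'"(\w)', r"``\1", text)       # " before a word char opens
    text = re.sub(r"'(\w)", r"`\1", text)        # ' before a word char opens
    text = re.sub(r'([\w.?!])"', r"\1''", text)  # " after word/punctuation closes
    return text


print(perl_style_quotes('I replied "Did you say \'buttons\'?" to him.'))
print(perl_style_quotes("don't"))  # an exception: the contraction is altered
```

The second call shows one of the exceptions: the apostrophe in a contraction is followed by a word character, so the second rule treats it as an opening quote.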

#5


1  

Thanks for the input - helpful and appreciated.

I've also come across this, from CPAN's LaTeX::Encode.pm:

    # A single or double quote before a word character, preceded
    # by start of line, whitespace or punctuation gets converted
    # to "`" or "``" respectively.

    $text =~ s{ ( ^ | [\s\p{IsPunct}] )( ['"] ) (?= \w ) }
              { $2 eq '"' ? "$1``" : "$1`" }mgxe;

    # A double quote preceded by a word or punctuation character
    # and followed by whitespace or end of line gets converted to
    # "''".  (Final single quotes are represented by themselves so
    # we don't need to worry about those.)

    $text =~ s{ (?<= [\w\p{IsPunct}] ) " (?= \s | $ ) }
              { "''" }mgxe
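
For readers working in Python, the same two rules can be sketched with lookarounds in the `re` module. This is an approximation and not part of the module: Python's `re` has no `\p{IsPunct}`, so the sketch assumes ASCII punctuation from `string.punctuation` is sufficient, and the names `OPENING`, `CLOSING` and `encode_quotes` are hypothetical.

```python
import re
import string

# Assumption: ASCII punctuation stands in for Perl's \p{IsPunct}.
PUNCT = re.escape(string.punctuation)

# A quote before a word character, preceded by start of line,
# whitespace or punctuation, is an opening quote.
OPENING = re.compile(rf"(^|[\s{PUNCT}])(['\"])(?=\w)", re.MULTILINE)

# A double quote preceded by a word or punctuation character and
# followed by whitespace or end of line is a closing quote.
CLOSING = re.compile(rf'(?<=[\w{PUNCT}])"(?=\s|$)', re.MULTILINE)


def encode_quotes(text: str) -> str:
    text = OPENING.sub(
        lambda m: m.group(1) + ("``" if m.group(2) == '"' else "`"), text
    )
    return CLOSING.sub("''", text)


print(encode_quotes('I replied "Did you say \'buttons\'?" to him.'))
```

Because an opening quote must follow whitespace, punctuation or the start of a line, contractions such as don't are left alone. In this sketch a closing double quote that is followed directly by a period, as in the first sentence of the question, is not converted.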

#6


0  

Do not use regular expressions for this kind of task!

Maybe you can get some inspiration from SmartyPants?

#7


0  

I was looking for an answer to this problem and decided to learn a little Lisp today. I put this Lisp function in my ~/.emacs file and then run it with M-x tex-set-quotes:

(defun tex-set-quotes ()  
  (interactive)  
  (latex-mode)  
  (while (search-forward "\"" nil t)  
   (replace-match "" nil t)  
   (tex-insert-quote nil)))

#8


-3  

Simply use `` for opening quotations and '' for closing.
