I would like to use " as a token separator for the input by using PatternTokenizer. My setting in schema.xml is as follows:
<tokenizer class="solr.PatternTokenizerFactory" pattern="[\s\.,!(){\[\]:}\"]+"/>
But this fails because the second " is mistaken for the closing quote of the pattern attribute (Solr cannot start with this configuration). How can I get the desired behavior?
1 solution
#1
2
You need to update the line to
pattern="[\s.,!(){\[\]:}&quot;]+"
The literal quote must be replaced with the XML entity &quot; because the attribute value is itself delimited by double quotes.
As an alternative, you may use \u0022, which the regex engine parses as a literal double quote.
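To check that both spellings behave the same once they reach the regex engine, here is a minimal sketch in plain Java. It assumes PatternTokenizer hands the attribute value to java.util.regex and treats matches as separators. The class name, the `split` helper and the sample input are hypothetical and not part of Solr.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class QuoteSplitSketch {
    // The regex as the engine sees it after the XML parser has decoded the
    // quote entity into a plain double-quote character.
    static final Pattern FROM_ENTITY =
            Pattern.compile("[\\s.,!(){\\[\\]:}\"]+");

    // Same character class, with the quote written as a regex-level Unicode
    // escape. The doubled backslash keeps the escape away from the Java
    // compiler, so the regex engine is the one that resolves it.
    static final Pattern FROM_ESCAPE =
            Pattern.compile("[\\s.,!(){\\[\\]:}\\u0022]+");

    // Hypothetical helper: split on the separator pattern, drop empty pieces.
    static List<String> split(Pattern separators, String input) {
        return Arrays.stream(separators.split(input))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String input = "say \"hello world\", then (stop)";
        System.out.println(split(FROM_ENTITY, input));
        System.out.println(split(FROM_ESCAPE, input));
    }
}
```

Both calls should yield the same token list, with the double quotes consumed as separators rather than kept inside tokens.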