Lucene RangeQuery not filtering properly

Time: 2021-07-17 05:52:24

I'm using RangeQuery to get all the documents whose amount is between, say, 0 and 2. When I execute the query, Lucene also gives me documents whose amount is greater than 2. What am I missing here?

Here is my code:

Term lowerTerm = new Term("amount", minAmount);
Term upperTerm = new Term("amount", maxAmount);

RangeQuery amountQuery = new RangeQuery(lowerTerm, upperTerm, true);

finalQuery.Add(amountQuery, BooleanClause.Occur.MUST);

And here is what goes into my index:

doc.Add(new Field("amount", amount.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED, Field.TermVector.YES));

2 answers

#1


UPDATE: As @basZero said in his comment, starting with Lucene 2.9 you can add numeric fields (NumericField) to your documents. Just remember to use NumericRangeQuery instead of RangeQuery when you search.

Original answer

Lucene treats numbers as words, so their order is lexicographic (character by character), not numeric:

0
1
12
123
2
22

That means that for Lucene, 12 is between 0 and 2. If you want a proper numeric range, you need to index the numbers zero-padded to a fixed width and then do a range search of [0000 TO 0002], padding the query bounds the same way. (The amount of padding you need depends on the expected range of values.)
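
A small Java sketch of both halves of that explanation. It does not use Lucene at all: it only mimics an inclusive term range with `String.compareTo`, which for ASCII digits orders strings the same way Lucene orders text terms. The class and method names are hypothetical, and the width of 4 digits used below is an arbitrary assumption.

```java
class TermRangeDemo {
    // Inclusive range check that compares strings character by character,
    // the way text terms are ordered, not by numeric value.
    static boolean inRange(String term, String lower, String upper) {
        return lower.compareTo(term) <= 0 && term.compareTo(upper) <= 0;
    }

    // Left-pads a non-negative integer with zeros to a fixed width so that
    // string order matches numeric order for values that fit in `width` digits.
    static String pad(long value, int width) {
        if (value < 0) {
            throw new IllegalArgumentException("negative values need a different encoding");
        }
        String padded = String.format("%0" + width + "d", value);
        if (padded.length() > width) {
            throw new IllegalArgumentException("value does not fit in " + width + " digits");
        }
        return padded;
    }
}
```

`TermRangeDemo.inRange("12", "0", "2")` is `true`, which is exactly the surprise in the question, while `TermRangeDemo.inRange(TermRangeDemo.pad(12, 4), "0000", "0002")` is `false`.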

If you have negative numbers, just add another zero for non-negative numbers. (EDIT: WRONG WRONG WRONG. See the update below.)

If your numbers include a fractional part, leave it as is and zero-pad only the integer part.

Example:

-00002.12
-00001

000000
000001
000003.1415
000022
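
The same idea for non-negative numbers with a fractional part, as a Java sketch (the class name is hypothetical; the width of 6 matches the example above). It pads only the integer part and keeps the fraction as written. Negative values are rejected on purpose, since simply padding them does not sort correctly, as the update below explains.

```java
import java.math.BigDecimal;

class DecimalPadding {
    // Zero-pads only the integer part of a non-negative decimal and keeps the
    // fraction as written, e.g. "3.1415" -> "000003.1415" for width 6.
    static String pad(String number, int width) {
        BigDecimal value = new BigDecimal(number); // rejects malformed input
        if (value.signum() < 0) {
            throw new IllegalArgumentException("only non-negative values are handled here");
        }
        String plain = value.toPlainString(); // never uses exponent notation
        int dot = plain.indexOf('.');
        String intPart = dot < 0 ? plain : plain.substring(0, dot);
        String fraction = dot < 0 ? "" : plain.substring(dot); // includes the '.'
        if (intPart.length() > width) {
            throw new IllegalArgumentException("integer part wider than " + width + " digits");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = intPart.length(); i < width; i++) {
            sb.append('0');
        }
        return sb.append(intPart).append(fraction).toString();
    }
}
```

Sorting the padded strings for 22, 3.1415, 0 and 1 gives `000000`, `000001`, `000003.1415`, `000022`, the same order as the example.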

UPDATE: Negative numbers are a bit tricky, since -1 comes before -2 lexicographically (so in the example above, -00001 actually sorts before -00002.12). This article gives a complete explanation about dealing with negative numbers and numbers in general in Lucene. Basically, you have to "encode" numbers into strings whose lexicographic order matches their numeric order.
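
One possible encoding for whole numbers, sketched in Java under the assumption that values fit in a signed 64-bit `long` (this is not necessarily the method the article describes, and the class name is hypothetical): flip the sign bit, then write the result as fixed-width lowercase hexadecimal. Flipping the sign bit turns signed order into unsigned order, and fixed-width hex strings compare character by character in that same unsigned order.

```java
class SortableLong {
    // Encodes a signed 64-bit integer as a 16-character hex string whose
    // lexicographic order matches numeric order, negatives included.
    static String encode(long value) {
        // XOR with the sign bit maps Long.MIN_VALUE..Long.MAX_VALUE onto
        // 0x0000000000000000..0xffffffffffffffff, in order, read as unsigned.
        long flipped = value ^ Long.MIN_VALUE;
        // %x prints a long as unsigned hex; '0'-'9' sort before 'a'-'f' in ASCII.
        return String.format("%016x", flipped);
    }

    // Reverses encode().
    static long decode(String encoded) {
        return Long.parseUnsignedLong(encoded, 16) ^ Long.MIN_VALUE;
    }
}
```

Both the indexed values and the query bounds go through `encode`, so a range from 0 to 2 becomes `[8000000000000000 TO 8000000000000002]`, and -2 (`7ffffffffffffffe`) now sorts before -1 (`7fffffffffffffff`).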

#2


I created a PHP function that converts numbers into strings suitable for Lucene/Solr range searches.

0.5 is converted to 10000000000.5
-0.5 is converted to 09999999999.5

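function luceneNumeric($numeric)
{
    // Negative values are shifted up by 10^10 so they become positive while
    // keeping their relative order (assumes |$numeric| < 10^10).
    $negative = $numeric < 0;
    $numeric = $negative ? 10000000000 + $numeric : $numeric;

    // Split into integer and fraction parts; a comma decimal separator is
    // normalised because float-to-string was locale-dependent before PHP 8.
    $parts = explode('.', str_replace(',', '.', (string) $numeric));

    // Prefix '0' for negatives and '1' for non-negatives, so every negative
    // value sorts before every non-negative one.
    $lucene = $negative ? '0' : '1';
    // Zero-pad the integer part to 10 digits and keep the fraction as is.
    $lucene .= str_pad($parts[0], 10, '0', STR_PAD_LEFT);
    $lucene .= isset($parts[1]) ? '.' . $parts[1] : '';

    return $lucene;
}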

It seems to work; hope this helps someone!
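
For readers on the JVM, here is a rough Java port of the same scheme: prefix `1` for non-negative values, prefix `0` plus a 10^10 offset for negative ones, a 10-digit integer part, and the fraction kept as is. It is a sketch, not the answer's code. It assumes the input arrives as a decimal string and uses `BigDecimal`, so the result never depends on float-to-string formatting (in PHP, for instance, `(string) 0.00001` is `1.0E-5`, which the function above would not split correctly). Like the original, it only handles magnitudes below 10^10; the class name is hypothetical.

```java
import java.math.BigDecimal;

class LuceneNumericPort {
    // 10^10: negative values are shifted up by this amount, as in the PHP version.
    private static final BigDecimal OFFSET = new BigDecimal("10000000000");

    // Encodes a decimal string so that lexicographic order matches numeric order.
    // Assumes |value| < 10^10, the same limit as the PHP function.
    static String encode(String number) {
        BigDecimal value = new BigDecimal(number);
        if (value.abs().compareTo(OFFSET) >= 0) {
            throw new IllegalArgumentException("out of range: " + number);
        }
        boolean negative = value.signum() < 0;
        BigDecimal shifted = negative ? value.add(OFFSET) : value;

        // toPlainString never uses exponent notation, so splitting on '.' is safe.
        String plain = shifted.toPlainString();
        int dot = plain.indexOf('.');
        String intPart = dot < 0 ? plain : plain.substring(0, dot);
        String fraction = dot < 0 ? "" : plain.substring(dot); // keeps the '.'

        // '0' prefix for negatives, '1' for non-negatives, then a 10-digit integer part.
        return (negative ? "0" : "1") + String.format("%010d", Long.parseLong(intPart)) + fraction;
    }
}
```

`LuceneNumericPort.encode("0.5")` returns `10000000000.5` and `LuceneNumericPort.encode("-0.5")` returns `09999999999.5`, matching the examples in the answer.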
