How to sort search results on multiple fields using a weighting function?

Time: 2022-09-27 03:12:48

I have a Lucene index where every document has several fields which contain numeric values. Now I would like to sort the search results on a weighted sum of these fields. For example:

field1=100
field2=002
field3=014

And the weighting function looks like:

f(d) = field1 * 0.5 + field2 * 1.4 + field3 * 1.8

The results should be ordered by f(d) where d represents the document. The sorting function should be non-static and could differ from search to search because the constant factors are influenced by the user who performs the search.
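To make the requirement concrete, here is a minimal plain-Java sketch (no Lucene involved) of f(d) with the weights passed in by the caller, so they can differ from one search to the next. The class name `WeightedSum` and the methods `score` and `rank` are hypothetical illustrations, and documents are modeled as simple `int[]` rows in field order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class WeightedSum {

    // f(d) = sum of values[i] * weights[i]; the weights come from the caller,
    // so each search can supply its own factors.
    static double score(int[] values, double[] weights) {
        if (values.length != weights.length) {
            throw new IllegalArgumentException("one weight per field is required");
        }
        double sum = 0.0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i] * weights[i];
        }
        return sum;
    }

    // Returns document indices ordered by f(d), highest first.
    static List<Integer> rank(int[][] docs, double[] weights) {
        List<Integer> order = new ArrayList<>();
        for (int d = 0; d < docs.length; d++) {
            order.add(d);
        }
        order.sort(Comparator.comparingDouble((Integer d) -> score(docs[d], weights)).reversed());
        return order;
    }

    public static void main(String[] args) {
        int[][] docs = { {100, 2, 14}, {10, 50, 0}, {0, 0, 60} };
        double[] weights = {0.5, 1.4, 1.8};
        System.out.println(score(docs[0], weights));
        System.out.println(rank(docs, weights));
    }
}
```

For the example document, f(d) = 100 * 0.5 + 2 * 1.4 + 14 * 1.8 = 50 + 2.8 + 25.2 = 78 (up to floating-point rounding).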

Does anyone have an idea how to solve this, or how to accomplish this goal in another way?

4 solutions

#1


You could try implementing a custom ScoreDocComparator. For example:

public class ScaledScoreDocComparator implements ScoreDocComparator {

    private int[][] values;
    private float[] scalars;

    public ScaledScoreDocComparator(IndexReader reader, String[] fields, float[] scalars) throws IOException {
        this.scalars = scalars;
        this.values = new int[fields.length][];
        for (int i = 0; i < values.length; i++) {
            this.values[i] = FieldCache.DEFAULT.getInts(reader, fields[i]);
        }
    }

    protected float score(ScoreDoc scoreDoc) {
        int doc = scoreDoc.doc;

        float score = 0;
        for (int i = 0; i < values.length; i++) {
            int value = values[i][doc];
            float scalar = scalars[i];
            score += (value * scalar);
        }
        return score;
    }

    @Override
    public int compare(ScoreDoc i, ScoreDoc j) {
        float iScore = score(i);
        float jScore = score(j);
        return Float.compare(iScore, jScore);
    }

    @Override
    public int sortType() {
        return SortField.CUSTOM;
    }

    @Override
    public Comparable<?> sortValue(ScoreDoc i) {
        float score = score(i);
        return Float.valueOf(score);
    }

}

Here is an example of ScaledScoreDocComparator in action. It works in my tests, but I encourage you to verify it against your own data.

final String[] fields = new String[]{ "field1", "field2", "field3" };
final float[] scalars = new float[]{ 0.5f, 1.4f, 1.8f };

Sort sort = new Sort(
    new SortField(
        "",
        new SortComparatorSource() {
            public ScoreDocComparator newComparator(IndexReader reader, String fieldName) throws IOException {
                return new ScaledScoreDocComparator(reader, fields, scalars);
            }
        }
    )
);

IndexSearcher indexSearcher = ...;
Query query = ...;
Filter filter = ...; // can be null
int nDocs = 100;

TopFieldDocs topFieldDocs = indexSearcher.search(query, filter, nDocs, sort);
ScoreDoc[] scoreDocs = topFieldDocs.scoreDocs;
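The comparator above reads one cached `int[]` per field, indexed by document id, and combines them on the fly. Its `compare` orders documents by ascending weighted score, so getting the largest f(d) first will probably require sorting in reverse. The plain-Java sketch below mimics that column-per-field layout with a `java.util.Comparator` over document ids; `ColumnWeightedComparator` is a hypothetical name and the arrays stand in for what FieldCache would return.

```java
import java.util.Arrays;
import java.util.Comparator;

public class ColumnWeightedComparator implements Comparator<Integer> {

    // values[field][doc]: one int[] per field, indexed by document id,
    // the same shape as the per-field arrays the Lucene comparator caches.
    private final int[][] values;
    private final float[] scalars;

    public ColumnWeightedComparator(int[][] values, float[] scalars) {
        this.values = values;
        this.scalars = scalars;
    }

    // Weighted sum for a single document id.
    public float score(int doc) {
        float score = 0f;
        for (int i = 0; i < values.length; i++) {
            score += values[i][doc] * scalars[i];
        }
        return score;
    }

    // Ascending by weighted score, like ScaledScoreDocComparator.compare.
    @Override
    public int compare(Integer a, Integer b) {
        return Float.compare(score(a), score(b));
    }

    public static void main(String[] args) {
        int[][] values = {
            {100, 10, 0}, // field1 for docs 0, 1, 2
            {2, 50, 0},   // field2
            {14, 0, 60}   // field3
        };
        float[] scalars = {0.5f, 1.4f, 1.8f};
        Integer[] docs = {0, 1, 2};
        // Reverse the comparator so the highest f(d) comes first.
        Arrays.sort(docs, new ColumnWeightedComparator(values, scalars).reversed());
        System.out.println(Arrays.toString(docs));
    }
}
```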

Bonus!

It appears that the Lucene developers are deprecating the ScoreDocComparator interface (it's currently deprecated in the Subversion repository). Here is an example of the ScaledScoreDocComparator modified to adhere to ScoreDocComparator's successor, FieldComparator:

public class ScaledComparator extends FieldComparator {

    private String[] fields;
    private float[] scalars;
    private int[][] slotValues;
    private int[][] currentReaderValues;
    private int bottomSlot;

    public ScaledComparator(int numHits, String[] fields, float[] scalars) {
        this.fields = fields;
        this.scalars = scalars;

        this.slotValues = new int[this.fields.length][];
        for (int fieldIndex = 0; fieldIndex < this.fields.length; fieldIndex++) {
            this.slotValues[fieldIndex] = new int[numHits];
        }

        this.currentReaderValues = new int[this.fields.length][];
    }

    protected float score(int[][] values, int secondaryIndex) {
        float score = 0;

        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            int value = values[fieldIndex][secondaryIndex];
            float scalar = scalars[fieldIndex];
            score += (value * scalar);
        }

        return score;
    }

    protected float scoreSlot(int slot) {
        return score(slotValues, slot);
    }

    protected float scoreDoc(int doc) {
        return score(currentReaderValues, doc);
    }

    @Override
    public int compare(int slot1, int slot2) {
        float score1 = scoreSlot(slot1);
        float score2 = scoreSlot(slot2);
        return Float.compare(score1, score2);
    }

    @Override
    public int compareBottom(int doc) throws IOException {
        float bottomScore = scoreSlot(bottomSlot);
        float docScore = scoreDoc(doc);
        return Float.compare(bottomScore, docScore);
    }

    @Override
    public void copy(int slot, int doc) throws IOException {
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            slotValues[fieldIndex][slot] = currentReaderValues[fieldIndex][doc];
        }
    }

    @Override
    public void setBottom(int slot) {
        bottomSlot = slot;
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase, int numSlotsFull) throws IOException {
        for (int fieldIndex = 0; fieldIndex < fields.length; fieldIndex++) {
            String field = fields[fieldIndex];
            currentReaderValues[fieldIndex] = FieldCache.DEFAULT.getInts(reader, field);
        }
    }

    @Override
    public int sortType() {
        return SortField.CUSTOM;
    }

    @Override
    public Comparable<?> value(int slot) {
        float score = scoreSlot(slot);
        return Float.valueOf(score);
    }

}
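The slot-based methods of `FieldComparator` (`copy`, `setBottom`, `compareBottom`) exist because Lucene only keeps the top `numHits` documents: each new document is compared against the weakest kept entry (the "bottom") and copied into its slot only if it beats it. The sketch below is a plain-Java analogy of that idea, not Lucene's actual collector; `TopNByWeightedScore` is hypothetical, the scores are assumed to be precomputed f(d) values, and the bottom is found by a linear scan for clarity where a real collector would use a priority queue.

```java
import java.util.Arrays;

public class TopNByWeightedScore {

    // Keeps the numHits best documents: a fixed set of slots plus a "bottom"
    // slot (the weakest kept entry) that each new document is compared against.
    static int[] topN(float[] docScores, int numHits) {
        if (numHits <= 0) {
            return new int[0];
        }
        int[] slotDoc = new int[numHits];
        float[] slotScore = new float[numHits];
        int filled = 0;
        int bottom = -1; // slot holding the lowest score once all slots are full

        for (int doc = 0; doc < docScores.length; doc++) {
            float s = docScores[doc];
            if (filled < numHits) {
                // Like copy(slot, doc) while there is still a free slot.
                slotDoc[filled] = doc;
                slotScore[filled] = s;
                filled++;
                if (filled == numHits) {
                    bottom = weakest(slotScore, filled);
                }
            } else if (Float.compare(slotScore[bottom], s) < 0) {
                // Like compareBottom(doc) < 0: the new doc beats the bottom, replace it.
                slotDoc[bottom] = doc;
                slotScore[bottom] = s;
                bottom = weakest(slotScore, filled);
            }
        }

        // Order the kept slots by score, highest first.
        Integer[] order = new Integer[filled];
        for (int i = 0; i < filled; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Float.compare(slotScore[b], slotScore[a]));
        int[] result = new int[filled];
        for (int i = 0; i < filled; i++) result[i] = slotDoc[order[i]];
        return result;
    }

    // Like setBottom: locate the slot with the lowest score.
    static int weakest(float[] slotScore, int filled) {
        int w = 0;
        for (int i = 1; i < filled; i++) {
            if (slotScore[i] < slotScore[w]) w = i;
        }
        return w;
    }
}
```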

Using this new class is very similar to the original, except that the definition of the sort object is a bit different:

final String[] fields = new String[]{ "field1", "field2", "field3" };
final float[] scalars = new float[]{ 0.5f, 1.4f, 1.8f };

Sort sort = new Sort(
    new SortField(
        "",
        new FieldComparatorSource() {
            public FieldComparator newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                return new ScaledComparator(numHits, fields, scalars);
            }
        }
    )
);

#2


I'm thinking one way to do this would be to accept these as parameters to your sorting function:

number of fields, array of documents, list of weight factors (one per field)

Calculate the weighting function for each document, storing the result in a separate array in the same order as the document array. Then perform any sort you wish (quicksort would probably be best), making sure you sort not just the f(d) array but the document array along with it. Return the sorted document array and you're done.
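A minimal sketch of the parallel-array idea, assuming documents are represented by hypothetical `String` ids and their field values by an `int[][]` in the same order. Instead of a hand-written quicksort, it sorts an index array with the library's `Arrays.sort` (a stable sort for object arrays) and then uses that index to reorder the documents, which keeps the f(d) values and the documents in step.

```java
import java.util.Arrays;

public class ParallelArraySort {

    // Computes f(d) into a separate array in document order, sorts an index
    // array by those scores, then reorders the documents through the index.
    static String[] sortByWeightedSum(String[] docs, int[][] fieldValues, double[] weights) {
        int n = docs.length;
        double[] f = new double[n];
        for (int d = 0; d < n; d++) {
            double sum = 0.0;
            for (int k = 0; k < weights.length; k++) {
                sum += fieldValues[d][k] * weights[k];
            }
            f[d] = sum;
        }

        Integer[] idx = new Integer[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        // Highest f(d) first.
        Arrays.sort(idx, (a, b) -> Double.compare(f[b], f[a]));

        String[] sorted = new String[n];
        for (int i = 0; i < n; i++) sorted[i] = docs[idx[i]];
        return sorted;
    }
}
```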

#3


Implement your own Similarity class and override the idf(Term, Searcher) method. In this method, you can return the score as follows:

    float score = 0f;
    // Term exposes its field name and text through field() and text().
    if (term.field().equals("field1")) {
        score = 0.5f * Integer.parseInt(term.text());
    } else if (term.field().equals("field2")) {
        score = 1.4f * Integer.parseInt(term.text());
    } // and so on
    return score;

When you execute the query, make sure it covers all the fields. That is, the query should look like:

field1:term field2:term field3:term

Query normalization will also factor some extra weights into the final score, but that will not affect the relative ranking of the documents under the equation you gave.
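Because the question asks for weights that change per search, hard-coding 0.5 / 1.4 / 1.8 in an if/else chain is limiting. One option is to build a field-to-weight map for each search and let the custom Similarity consult it. The sketch below shows only that lookup and arithmetic in plain Java; `FieldWeights` is a hypothetical helper, and wiring it into a Similarity's `idf` method is omitted because that method's signature may differ between Lucene versions.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldWeights {

    // Per-search weights keyed by field name.
    private final Map<String, Float> weights;

    public FieldWeights(Map<String, Float> weights) {
        this.weights = new HashMap<>(weights);
    }

    // Same arithmetic as the snippet above: weight(field) * numeric term text.
    // Fields without a configured weight contribute 0.
    public float score(String field, String termText) {
        Float w = weights.get(field);
        if (w == null) {
            return 0f;
        }
        return w * Integer.parseInt(termText);
    }
}
```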

#4


Create a wrapper which holds the rating and is comparable. Something like:

// Datum is your own document type with numeric fields field1..field3.
// Sorts data in place by the weighted rating, ascending.
public void sort(Datum[] data) {
   Rating[] ratings = new Rating[data.length];
   for (int i = 0; i < data.length; i++)
     ratings[i] = new Rating(data[i]);
   Arrays.sort(ratings); // java.util.Arrays
   for (int i = 0; i < data.length; i++)
     data[i] = ratings[i].datum;
}

// Wraps a Datum and caches its rating so compareTo doesn't recompute it.
class Rating implements Comparable<Rating> {
   final double rating;
   final Datum datum;

   public Rating(Datum datum) {
      this.datum = datum;
      rating = datum.field1 * 0.5 + datum.field2 * 1.4 + datum.field3 * 1.8;
   }

   public int compareTo(Rating other) {
      return Double.compare(rating, other.rating);
   }
}
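The wrapper above hard-codes the factors, while the question needs them to vary per search. A minimal self-contained variant passes the weights into each sort call; `WeightedRatingSort` and its nested `Datum` class are hypothetical stand-ins for your own types.

```java
import java.util.Arrays;

public class WeightedRatingSort {

    // Hypothetical document type with the three numeric fields from the question.
    static final class Datum {
        final int field1, field2, field3;

        Datum(int field1, int field2, int field3) {
            this.field1 = field1;
            this.field2 = field2;
            this.field3 = field3;
        }
    }

    // Wrapper that computes f(d) once, so compareTo does not recompute it.
    static final class Rating implements Comparable<Rating> {
        final double rating;
        final Datum datum;

        Rating(Datum datum, double w1, double w2, double w3) {
            this.datum = datum;
            this.rating = datum.field1 * w1 + datum.field2 * w2 + datum.field3 * w3;
        }

        // Ascending by rating, so Arrays.sort puts the lowest f(d) first.
        @Override
        public int compareTo(Rating other) {
            return Double.compare(rating, other.rating);
        }
    }

    // Sorts data in place by f(d), ascending; the weights are supplied per call.
    static void sort(Datum[] data, double w1, double w2, double w3) {
        Rating[] ratings = new Rating[data.length];
        for (int i = 0; i < data.length; i++) {
            ratings[i] = new Rating(data[i], w1, w2, w3);
        }
        Arrays.sort(ratings);
        for (int i = 0; i < data.length; i++) {
            data[i] = ratings[i].datum;
        }
    }
}
```

To list the highest-rated documents first, iterate the sorted array from the end or reverse it after sorting.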