Which performs better: a cross join or a new table?

Time: 2021-03-04 04:15:19

I'm building a face match web application.

Note: I just found out that people don't actually call this type of application a facematch application.

Here is a basic workflow.

  1. users upload photos
  2. an admin either approves or denies each photo
  3. when a user accesses the page, two photos are randomly selected from the database
  4. the user has two options
    1. choose one of the photos
    2. skip to another match

There is one condition: users must not see a duplicate match. If a user has already played 1 vs 2, that user should not see 2 vs 1 again.
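
One way to read this condition is to treat a match as an unordered pair, so 1 vs 2 and 2 vs 1 map to the same key. Below is a minimal Python sketch of that idea. The helper name match_key and the in-memory set are only for illustration and are not part of the original application.

```python
def match_key(photo_a, photo_b):
    """Return an order-independent key for a match between two photo ids."""
    if photo_a == photo_b:
        raise ValueError("a photo cannot be matched against itself")
    # Always put the smaller id first, so (2, 1) and (1, 2) collapse to (1, 2).
    return (min(photo_a, photo_b), max(photo_a, photo_b))


seen = set()                      # matches one user has already played
seen.add(match_key(1, 2))         # the user plays 1 vs 2

print(match_key(2, 1) in seen)    # True: 2 vs 1 is the same match
print(match_key(1, 3) in seen)    # False: 1 vs 3 has not been played yet
```

Storing the smaller id first is the same convention as the p1.id < p2.id filter used further down.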

Let's say I have the following 4 photos

table photos

id
1
2
3
4

There are 6 possible matches:

1 vs 2
1 vs 3
1 vs 4

2 vs 3
2 vs 4

3 vs 4

To generate those matches, I use the following cross join query:

select p1.id, p2.id from photos as p1 cross join photos as p2 where p1.id < p2.id

It works without a problem. My concern is that it will get slower as the number of matches grows.

I get 1999000 matches with just 2000 photos. That is such a huge number.
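
That number is just the count of unordered pairs, n * (n - 1) / 2, so it grows quadratically with the number of photos. A small Python check using an in-memory SQLite database as a stand-in (the real database engine is not stated in the question; count_matches_sql is a hypothetical helper):

```python
import math
import sqlite3


def count_matches_sql(n_photos):
    """Count pairs with the question's cross join on a throwaway in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY)")
    conn.executemany(
        "INSERT INTO photos (id) VALUES (?)",
        [(i,) for i in range(1, n_photos + 1)],
    )
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM photos AS p1 CROSS JOIN photos AS p2 "
        "WHERE p1.id < p2.id"
    ).fetchone()
    conn.close()
    return count


print(count_matches_sql(4))   # 6, the six matches listed earlier
print(math.comb(2000, 2))     # 1999000, i.e. 2000 * 1999 / 2
print(math.comb(4000, 2))     # 7998000: twice the photos, about four times the matches
```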

So I thought about a solution and came up with creating a new table that stores all the possible matches. The rows would be created when the admin approves a photo.

table matches

id1 id2
1    2
1    3
1    4
and so on
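
For illustration, here is a rough sketch of how such a table could be filled on approval. It assumes an approved flag on photos and a hypothetical approve_photo helper, neither of which appears in the question, and it uses SQLite syntax (INSERT OR IGNORE, two-argument MIN/MAX), which other databases spell differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photos (id INTEGER PRIMARY KEY, approved INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "CREATE TABLE matches (id1 INTEGER NOT NULL, id2 INTEGER NOT NULL, "
    "PRIMARY KEY (id1, id2))"
)


def approve_photo(conn, photo_id):
    """Pair the photo with every already-approved photo, then mark it approved."""
    with conn:  # run both statements in one transaction
        conn.execute(
            """
            INSERT OR IGNORE INTO matches (id1, id2)
            SELECT MIN(id, :new_id), MAX(id, :new_id)  -- scalar min/max in SQLite
            FROM photos
            WHERE approved = 1 AND id <> :new_id
            """,
            {"new_id": photo_id},
        )
        conn.execute(
            "UPDATE photos SET approved = 1 WHERE id = :new_id",
            {"new_id": photo_id},
        )


conn.executemany("INSERT INTO photos (id) VALUES (?)", [(1,), (2,), (3,), (4,)])
for photo_id in (1, 2, 3, 4):
    approve_photo(conn, photo_id)

rows = conn.execute("SELECT id1, id2 FROM matches ORDER BY id1, id2").fetchall()
print(rows)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Note that approving the n-th photo inserts n - 1 rows, so the table ends up with the same quadratic row count as the cross join result.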

Finally, my question is:

Should I keep using the cross join, or should I create a new table 'matches'?

Which one would be better?

Any other, better solutions would be appreciated!

1 solution

#1


2  

I think in this case you'd be better off not storing all matches at all. As you've figured out, the number of matches is quadratic in the number of photos. Based on your use case, it seems it would be better to keep a table of the pairs each user has already seen and exclude those pairs at the time you query for that user. That table will likely be pretty sparse compared to the entire space of combinations. Unless you need to store data for all combinations at the time the admin approves a photo, there's no reason to generate them at that time.
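
A minimal sketch of what that could look like, again using an in-memory SQLite database as a stand-in. The table name seen_pairs, its columns, and the helpers next_match and record_seen are all hypothetical; the answer does not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE photos (id INTEGER PRIMARY KEY);
    CREATE TABLE seen_pairs (
        user_id INTEGER NOT NULL,
        id1     INTEGER NOT NULL,
        id2     INTEGER NOT NULL,
        PRIMARY KEY (user_id, id1, id2),
        CHECK (id1 < id2)            -- always store the smaller id first
    );
    INSERT INTO photos (id) VALUES (1), (2), (3), (4);
    """
)


def next_match(conn, user_id):
    """Return a random pair this user has not seen, or None when none are left."""
    return conn.execute(
        """
        SELECT p1.id, p2.id
        FROM photos AS p1
        JOIN photos AS p2 ON p1.id < p2.id
        WHERE NOT EXISTS (
            SELECT 1 FROM seen_pairs AS s
            WHERE s.user_id = ? AND s.id1 = p1.id AND s.id2 = p2.id
        )
        ORDER BY RANDOM()
        LIMIT 1
        """,
        (user_id,),
    ).fetchone()


def record_seen(conn, user_id, id_a, id_b):
    """Remember that the user was shown this pair, in either order."""
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO seen_pairs (user_id, id1, id2) VALUES (?, ?, ?)",
            (user_id, min(id_a, id_b), max(id_a, id_b)),
        )


user_id = 7  # made-up user id
played = []
while (pair := next_match(conn, user_id)) is not None:
    record_seen(conn, user_id, *pair)
    played.append(pair)

print(len(played), len(set(played)))  # 6 6: every pair shown exactly once
```

Only pairs a user has actually been shown are stored, so the table grows with usage rather than with the square of the photo count. ORDER BY RANDOM() still enumerates the candidate pairs for each request, so with many photos it may be cheaper to pick two random ids directly and retry when the pair is already in seen_pairs.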
