How do you efficiently compare one row against all other rows in a database?

Time: 2022-12-09 13:36:19

I have a database with primarily three tables: (ImageID, imageName), (ImageID | Tags), and (tagID, tagName).

So each image can have many tags associated with it. How would I efficiently and scalably select one image and find the next x images that are most similar to it (that is, the ones that share the most tags with it)?

This is all done on the web using JavaScript, Ajax, and PHP. Thanks for any suggestions and hints on how to approach this!

Edit:

Yes, MySQL.

The format was ( Table ) and ( ROW | ROW ).

IMAGEID and TAGID are primary keys.

So yes, there is a normalized index of IMAGEIDs and TAGIDs to save room.

What I am trying to get is this: if image A has 10 of 10 tags in common with image B, then image B should be returned higher than image C, which has 6 of 10 tags in common.

Sorry for being ambiguous. I am developing the site, so I can add keys, foreign keys, etc. if it's impossible to do this with what I have. It doesn't have to be done in one giant SQL statement; I just don't want to get into an O(n^2) situation by comparing my first row to every other row one at a time.

2 solutions

#1


2  

Unfortunately, this design isn't actually very scalable, simply because you really will be comparing the tags of one image against the tags of pretty much every other image.

It's codable; it's just not overly scalable. (Hundreds of images? Great! Tens of thousands? You'll be able to measure the lookup speed.)

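-- Count the tags every other image shares with image 123
SELECT
  allImage.ImageID,
  COUNT(*)           AS commonTags
FROM
  image_tag    AS allImage
INNER JOIN
  image_tag    AS myImage
    ON allImage.TagID = myImage.TagID
WHERE
  myImage.ImageID = 123
  AND allImage.ImageID <> 123   -- otherwise image 123 matches itself first
GROUP BY
  allImage.ImageID
ORDER BY
  COUNT(*) DESC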

Then use LIMIT or TOP (depending on your flavour of SQL; MySQL uses LIMIT) to pick only the first N images.
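To try the query end to end, here is a minimal sketch that runs it against an in-memory SQLite database from Python. SQLite is only a stand-in for MySQL here, and the function name `most_similar`, the sample rows, and the tie-break on ImageID are assumptions made for illustration. It also excludes the reference image itself, so it never shows up as its own best match.

```python
import sqlite3


def most_similar(conn, image_id, limit):
    """Return (ImageID, commonTags) rows for the images sharing the most tags."""
    sql = """
        SELECT allImage.ImageID, COUNT(*) AS commonTags
        FROM image_tag AS allImage
        INNER JOIN image_tag AS myImage
            ON allImage.TagID = myImage.TagID
        WHERE myImage.ImageID = ?
          AND allImage.ImageID <> ?          -- skip the reference image itself
        GROUP BY allImage.ImageID
        ORDER BY commonTags DESC, allImage.ImageID   -- tie-break is an assumption
        LIMIT ?
    """
    return conn.execute(sql, (image_id, image_id, limit)).fetchall()


# Hypothetical sample data: one (ImageID, TagID) pair per row.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_tag ("
    "ImageID INTEGER, TagID INTEGER, PRIMARY KEY (ImageID, TagID))"
)
conn.executemany(
    "INSERT INTO image_tag VALUES (?, ?)",
    [
        (123, 1), (123, 2), (123, 3),   # the reference image
        (200, 1), (200, 2), (200, 3),   # shares three tags
        (300, 1), (300, 2),             # shares two tags
        (400, 9),                       # shares none, so it never appears
    ],
)

top = most_similar(conn, 123, 2)
print(top)
```

Images with no tag in common drop out of the join entirely, so the database only touches rows for tags the reference image actually has.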

NOTE: This assumes you don't have all the tags for an image stored as a string in one field of one row. If you do, you really should normalise the data to have one (ImageID, TagID) per row.
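If the tags did live in a single string column, the normalisation step could look roughly like the sketch below. The comma delimiter, the function name `normalise`, and the sample rows are all assumptions; the asker's schema is already normalised, so this only applies to the case the note warns about.

```python
def normalise(rows):
    """Turn (ImageID, 'tag,tag,...') rows into one (ImageID, tag) pair per row."""
    pairs = []
    for image_id, tag_string in rows:
        seen = set()  # drop repeated tags within one image
        for tag in tag_string.split(","):
            tag = tag.strip()
            if tag and tag not in seen:  # skip empty pieces such as ",,"
                seen.add(tag)
                pairs.append((image_id, tag))
    return pairs


# Hypothetical denormalised rows: every tag of an image packed into one string.
denormalised = [(1, "cat, pet, indoor"), (2, "dog,pet,,pet")]
pairs = normalise(denormalised)
print(pairs)
```

Each resulting pair can then be inserted as its own row, which is the shape the join above relies on.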

#2


1  

I would create an index on ImageID in the first two tables to increase the speed, then use a simple SELECT WHERE query.
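As a rough illustration of that suggestion, the sketch below creates the indexes and runs plain SELECT ... WHERE lookups, again using in-memory SQLite from Python as a stand-in for MySQL. The table names `image` and `image_tag`, the index names, and the sample rows are hypothetical. Where ImageID already leads a primary key, that key serves as the ImageID index; the second index, led by TagID, is an addition that covers the reverse lookup from a tag to its images.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE image (ImageID INTEGER PRIMARY KEY, imageName TEXT);
    CREATE TABLE image_tag (ImageID INTEGER, TagID INTEGER);

    -- Index for "which tags does this image have?"
    CREATE INDEX idx_image_tag_image ON image_tag (ImageID);
    -- Index for "which images carry this tag?" (an addition to the suggestion)
    CREATE INDEX idx_image_tag_tag ON image_tag (TagID, ImageID);

    INSERT INTO image VALUES (1, 'a.jpg'), (2, 'b.jpg');
    INSERT INTO image_tag VALUES (1, 10), (1, 11), (2, 11);
    """
)

# Simple SELECT ... WHERE lookups that the two indexes can serve.
tags_of_1 = [
    row[0]
    for row in conn.execute(
        "SELECT TagID FROM image_tag WHERE ImageID = ? ORDER BY TagID", (1,)
    )
]
images_with_11 = [
    row[0]
    for row in conn.execute(
        "SELECT ImageID FROM image_tag WHERE TagID = ? ORDER BY ImageID", (11,)
    )
]
print(tags_of_1, images_with_11)
```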
