Which hierarchical model should I use? Adjacency, nested or enumerated?

Time: 2022-09-09 16:39:52

I have a table that contains all the geographical locations in the world and their relationships.

Here is an example that shows the hierarchy. You will see that the data is actually stored in all three forms:

  • Enumerated Path
  • Adjacency list
  • Nested Set

The data also never changes. Below is an example of the direct ancestors of Brighton, England, which has a woeid of 13911.

Table: geoplanet_places (5.6 million rows). Large image: http://tinyurl.com/68q4ndx
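To make the three storage forms listed above concrete, here is a small Python sketch that encodes the Brighton chain all three ways. Only the woeid 13911 comes from the question; the other ids, the `:`-separated path format and the helper names are hypothetical, and the real geoplanet_places columns may differ.

```python
# Hypothetical ids; only 13911 (Brighton) is taken from the question.
NAMES = {1: "Earth", 2: "United Kingdom", 3: "England", 4: "East Sussex",
         5: "Brighton and Hove City", 13911: "Brighton", 6: "Devon"}

# Adjacency list: each place stores only its parent's id (None for the root).
PARENT = {1: None, 2: 1, 3: 2, 4: 3, 5: 4, 13911: 5, 6: 3}


def enumerated_path(node):
    """Enumerated (materialized) path: ids from the root down to node, each followed by ':'."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    return "".join(f"{n}:" for n in reversed(chain))


def nested_set(root=1):
    """Nested set: (lft, rgt) from a depth-first walk; descendants nest inside."""
    children = {}
    for child, parent in PARENT.items():
        if parent is not None:
            children.setdefault(parent, []).append(child)
    bounds = {}
    counter = 0

    def walk(node):
        nonlocal counter
        counter += 1
        lft = counter
        for child in sorted(children.get(node, [])):
            walk(child)
        counter += 1
        bounds[node] = (lft, counter)

    walk(root)
    return bounds


if __name__ == "__main__":
    print(enumerated_path(13911))  # 1:2:3:4:5:13911:
    bounds = nested_set()
    lft, rgt = bounds[13911]
    # Ancestors are exactly the places whose interval encloses Brighton's.
    print([NAMES[n] for n, (l, r) in sorted(bounds.items(), key=lambda kv: kv[1])
           if l < lft and r > rgt])
    # ['Earth', 'United Kingdom', 'England', 'East Sussex', 'Brighton and Hove City']
```

The adjacency list is the cheapest to maintain, the path makes ancestors a string operation, and the nested set makes "everything under X" a single range comparison.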

I then have another table called entities. This table stores the items that I would like to map to geographical locations. I store some basic information, but most importantly I store the woeid, which is a foreign key referencing geoplanet_places.

Eventually the entities table will contain several thousand entities, and I would like a way to return a full tree of all of the nodes that contain entities.

I plan to build something that makes it easy to filter and search entities by their geographical location, and to see how many entities can be found under each node.

So if I only have one entity in my entities table, I might have something like this:

Earth (1)
  United Kingdom (1)
    England (1)
      East Sussex (1)
        Brighton and Hove City (1)
          Brighton (1)

Let's then say that I have another entity located in Devon. Then it would show something like:

Earth (2)
  United Kingdom (2)
    England (2)
      Devon (1)
      East Sussex (1)
      ... etc.

The counts, which say how many entities are "inside" each geographical location, do not need to be live. I can live with generating my object every hour and caching it.
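Because the counts only need refreshing hourly, one option is a batch job that walks up from each entity's woeid, adds one to every place on the way, and stores the result in a cache table. A minimal Python sketch, assuming a hypothetical in-memory parent map loaded from geoplanet_places and a list of entity woeids; all ids other than 13911 are made up:

```python
from collections import Counter

# Hypothetical parent map (adjacency list); 13911 is Brighton, 6 stands in for Devon.
PARENT = {1: None, 2: 1, 3: 2, 4: 3, 5: 4, 13911: 5, 6: 3}


def rollup_counts(entity_woeids):
    """Count every entity at its own place and at each ancestor of that place."""
    counts = Counter()
    for woeid in entity_woeids:
        node = woeid
        while node is not None:
            counts[node] += 1
            node = PARENT[node]
    return counts


if __name__ == "__main__":
    counts = rollup_counts([13911, 6])  # one entity in Brighton, one in Devon
    print(counts[1], counts[3], counts[4], counts[6])  # 2 2 1 1
```

The work is roughly (number of entities) x (depth of the hierarchy), so a few thousand entities a handful of levels deep should be cheap to recompute each hour; the result could be written to a hypothetical (woeid, entity_count) table.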

The aim is to create an interface that might start out by showing only the countries that have entities.

Something like:

Argentina (1021), Chile (291), ..., United States (32,103), United Kingdom (12,338)

The user will then click on a location, such as United Kingdom, and be given all of the immediate child nodes of United Kingdom that have at least one entity somewhere beneath them.

If there are 32 counties in the United Kingdom, but only 23 of them have entities stored somewhere beneath them when you drill down, then I don't want to display the other 9. It is only locations.
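Given a cached count per place, the drill-down only needs the immediate children of the clicked place whose count is non-zero. A sketch with hypothetical ids, names and counts (Kent, id 7, stands in for a county with no entities):

```python
# Hypothetical data: parent map, display names, and cached roll-up counts.
PARENT = {1: None, 2: 1, 3: 2, 4: 3, 5: 4, 13911: 5, 6: 3, 7: 3}
NAMES = {1: "Earth", 2: "United Kingdom", 3: "England", 4: "East Sussex",
         5: "Brighton and Hove City", 13911: "Brighton", 6: "Devon", 7: "Kent"}
COUNTS = {1: 2, 2: 2, 3: 2, 4: 1, 5: 1, 13911: 1, 6: 1}  # Kent (7) has none


def children_with_entities(parent_id):
    """Immediate children of parent_id that contain at least one entity."""
    return sorted(
        (NAMES[child], COUNTS[child])
        for child, parent in PARENT.items()
        if parent == parent_id and COUNTS.get(child, 0) > 0
    )


if __name__ == "__main__":
    print(children_with_entities(3))  # [('Devon', 1), ('East Sussex', 1)]
```

In the database this corresponds to selecting the children of a place by parent id and joining them to the cached counts, keeping only non-zero rows.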

This site aptly demonstrates the functionality that I wish to achieve: http://www.homeaway.com/vacation-rentals/europe/r5

How do you recommend that I manage such a data structure?

Things I am using:

  • PHP
  • MySQL
  • Solr

I plan on making the drill-downs as fast as possible, with an AJAX interface that feels seamless when searching.

I would also be interested to know which columns you would recommend indexing on.

2 solutions

#1



Typically, there are three kinds of hierarchical queries that cause trouble:

  1. Return all ancestors
  2. Return all descendants
  3. Return all children (immediate descendants)

Here's a little table which shows the performance of different methods in MySQL:

                        Ancestors  Descendants  Children        Maintainability  InnoDB support
Adjacency list          Good       Decent       Excellent       Easy             Yes
Nested sets (classic)   Poor       Excellent    Poor/Excellent  Very hard        Yes
Nested sets (spatial)   Excellent  Very good    Poor/Excellent  Very hard        No
Materialized path       Excellent  Very good    Poor/Excellent  Hard             Yes

In the Children column, Poor/Excellent means that the answer depends on whether you mix the method with an adjacency list, i.e. also store the parentID in each record.
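The difference shows up on a few hypothetical rows. The ids 1, 234, 6345 and 45454 reuse the example path from this answer, 777 is invented, and the sketch assumes each path ends with the row's own id. With a parent id, children are a plain equality match; with only a path, you scan the subtree by prefix and then keep rows exactly one level deeper, so the depth check runs over every row in the subtree:

```python
# Hypothetical rows: (id, parent_id, path); each path ends with the row's own id.
ROWS = [
    (1, None, "1:"),
    (234, 1, "1:234:"),
    (6345, 234, "1:234:6345:"),
    (45454, 6345, "1:234:6345:45454:"),
    (777, 234, "1:234:777:"),
]


def children_by_parent_id(node_id):
    """Adjacency list mixed in: children are a direct equality match."""
    return [rid for rid, parent, _ in ROWS if parent == node_id]


def children_by_path_only(node_path):
    """Path only: scan the subtree by prefix, keep rows exactly one level deeper."""
    depth = node_path.count(":")
    return [rid for rid, _, path in ROWS
            if path.startswith(node_path) and path.count(":") == depth + 1]


if __name__ == "__main__":
    print(children_by_parent_id(234))       # [6345, 777]
    print(children_by_path_only("1:234:"))  # [6345, 777]
```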

For your task, you need all three queries:

  1. All ancestors, to show the Earth / UK / Devon breadcrumb
  2. All children, to show "Destinations in Europe" (the items)
  3. All descendants, to show "Destinations in Europe" (the counts)

I would go for materialized paths, since this kind of hierarchy rarely changes (only in case of war, revolt, etc.).

Create a varchar column called path, index it, and fill it with values like this:

1:234:6345:45454:

where the numbers are the primary keys of the appropriate parents, in the correct order (1 for Europe, 234 for UK, etc.).

You will also need a table called levels that holds the numbers from 1 to 20 (or up to whatever maximum nesting level you need).

To select all ancestors:

SELECT   pa.*
FROM     places p
JOIN     levels l
ON       SUBSTRING_INDEX(p.path, ':', l.level) <> p.path
JOIN     places pa
ON       pa.path = CONCAT(SUBSTRING_INDEX(p.path, ':', l.level), ':') 
WHERE    p.id = @id_of_place_in_devon
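To see what the levels join does, here is a Python emulation of MySQL's SUBSTRING_INDEX (positive and negative counts) applied to the example path. Each level yields the prefix up to that many ':' separators; once the level exceeds the number of separators, SUBSTRING_INDEX returns the whole string and the `<> p.path` condition drops it. The `substring_index` and `ancestor_paths` helpers are hypothetical stand-ins, not MySQL APIs:

```python
LEVELS = range(1, 21)  # stands in for the "levels" table (1..20)


def substring_index(s, delim, count):
    """Python stand-in for MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])   # left of the count-th delimiter
    if count < 0:
        return delim.join(parts[count:])   # right of the |count|-th delimiter from the end
    return ""


def ancestor_paths(path):
    """Paths the ancestor query looks up for one place, root first."""
    result = []
    for level in LEVELS:
        prefix = substring_index(path, ":", level)
        if prefix != path:               # SUBSTRING_INDEX(p.path, ':', l.level) <> p.path
            result.append(prefix + ":")  # CONCAT(..., ':')
    return result


if __name__ == "__main__":
    print(ancestor_paths("1:234:6345:45454:"))
    # ['1:', '1:234:', '1:234:6345:', '1:234:6345:45454:']
```

Because each prefix is matched against pa.path, the query finds ancestors only if every place's path ends with its own id; in that case the last prefix is the place itself.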

To select all children and counts of places within them:

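SELECT  pc.*, COUNT(pp.id)
FROM    places p
JOIN    places pc
ON      pc.parentId = p.id
JOIN    places pp
ON      pp.path BETWEEN pc.path AND CONCAT(pc.path, ':')
        AND pp.id NOT IN
        (
        -- leaves only; without the IS NOT NULL filter, a NULL parentId
        -- (e.g. the root's) makes NOT IN match no rows at all
        SELECT  parentId
        FROM    places
        WHERE   parentId IS NOT NULL
        )
WHERE   p.id = @id_of_europe
GROUP BY
        pc.id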
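The BETWEEN condition works because ':' (0x3A) sorts immediately after '9' (0x39), so, assuming an ASCII-ordered comparison and paths that end with each place's own id, every descendant path lies between pc.path and pc.path followed by ':'. The trailing ':' also keeps '1:23:' from matching '1:234:'. The Python sketch below mirrors the query on hypothetical rows; like the SQL, it counts leaf places, not entities:

```python
# Hypothetical rows: (id, parent_id, path); each path ends with the row's own id.
ROWS = [
    (1, None, "1:"),
    (23, 1, "1:23:"),
    (234, 1, "1:234:"),
    (6345, 234, "1:234:6345:"),
    (45454, 6345, "1:234:6345:45454:"),
    (777, 234, "1:234:777:"),
]


def children_with_leaf_counts(parent_id):
    """Mirror of the children-and-counts query: {child id: leaf places under it}."""
    # The NOT IN subquery: ids that are somebody's parent (NULL parent ids excluded).
    parents = {parent for _, parent, _ in ROWS if parent is not None}
    result = {}
    for cid, cparent, cpath in ROWS:
        if cparent != parent_id:        # pc.parentId = p.id
            continue
        upper = cpath + ":"             # CONCAT(pc.path, ':')
        count = sum(1 for pid, _, ppath in ROWS
                    if cpath <= ppath <= upper and pid not in parents)
        if count:                       # inner JOIN: children with no match vanish
            result[cid] = count
    return result


if __name__ == "__main__":
    print(children_with_leaf_counts(1))  # {23: 1, 234: 2}
```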

#2



This is the query that I came up with. It is an adaptation of your suggestion, Quassnoi.

SELECT   pa.*,  level, SUBSTRING_INDEX(p.ancestry, '/', l.level),  p.*
FROM     geoplanet_places p
JOIN     levels l
ON       SUBSTRING_INDEX(p.ancestry, '/', l.level) <> p.ancestry 
JOIN     geoplanet_places  pa
ON       pa.woeid =  SUBSTRING_INDEX( SUBSTRING_INDEX(p.ancestry, '/', l.level),'/',-1)
WHERE    p.woeid = "13911"

This returns all of the parents of Brighton.
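The nested SUBSTRING_INDEX(..., -1) in the query above turns each prefix into a single woeid, so the join is on the id rather than on a shared path. Emulated in Python, assuming the ancestry column holds parent woeids separated by '/' with a trailing '/' (the exact format is not shown in the question, and the ids below are hypothetical):

```python
LEVELS = range(1, 21)  # stands in for the "levels" table


def substring_index(s, delim, count):
    """Python stand-in for MySQL SUBSTRING_INDEX(str, delim, count)."""
    parts = s.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    if count < 0:
        return delim.join(parts[count:])
    return ""


def parent_woeids(ancestry):
    """woeids the adapted query joins on, root first."""
    woeids = []
    for level in LEVELS:
        prefix = substring_index(ancestry, "/", level)
        if prefix != ancestry:  # SUBSTRING_INDEX(p.ancestry, '/', l.level) <> p.ancestry
            woeids.append(substring_index(prefix, "/", -1))  # last id in the prefix
    return woeids


if __name__ == "__main__":
    print(parent_woeids("1/2/3/4/5/"))  # ['1', '2', '3', '4', '5']
```

Without the trailing '/', the deepest level would return the whole string, fail the `<>` test, and silently drop the direct parent.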

The problem with your query was that it didn't return the path to the parents, but instead any node that shared the same path.

SELECT   pa.*, GROUP_CONCAT(pa.name ORDER BY pa.lft ASC), GROUP_CONCAT(pa.lft), pa.ancestry
FROM     geo_places p
JOIN     levels l
ON       SUBSTRING_INDEX(CONCAT(p.ancestry, p.woeid, '/'), '/', l.level) <> p.ancestry
JOIN     geo_places pa
ON       pa.woeid = SUBSTRING_INDEX(SUBSTRING_INDEX(CONCAT(p.ancestry, p.woeid, '/'), '/', l.level), '/', -1)
WHERE    p.woeid IN ("12767488", "12832668", "12844837", "131390", "131391", "12846428", "24534461")
GROUP BY p.woeid
