I have a blogging program (sort of like Twitter), and I am currently building a "recently visited" box that shows the 9 people who most recently visited your page.
Everyone is registered with a different username.
What I've currently got is a database like this:
-----------------------------
| id | username | who_visit |
-----------------------------
For example, if 9 users, foo1 to foo9, visited foo10's page, the table would contain a single row:
------------------------------------------------------------------------
| id | username | who_visit |
------------------------------------------------------------------------
| 1 | foo10 | foo1, foo2, foo3, foo4, foo5, foo6, foo7, foo8, foo9 |
------------------------------------------------------------------------
Then, when foo11 visits foo10's page, I would remove foo9 from the end of the string and add foo11 to the front.
But the main problem now is: what if foo1 visited foo10's page, then foo2 visited foo10's page, and then foo1 visited foo10's page again? I would have to search the 9 users, remove any duplicates, place the latest visitor in front, and then continue executing. But then the box would only show 8 users.
The only solution to this problem I could think of was making a database like this:
-----------------------------
| id | username | who_visit |
-----------------------------
And instead of populating them in one row, I would add a new row for every visit:
-----------------------------
| id | username | who_visit |
-----------------------------
| 1 | foo10 | foo1 |
-----------------------------
| 2 | foo10 | foo2 |
-----------------------------
| 3 | foo10 | foo3 |
-----------------------------
| 4 | foo10 | foo4 |
-----------------------------
| 5 | foo10 | foo5 |
-----------------------------
| 6 | foo10 | foo6 |
-----------------------------
| 7 | foo10 | foo7 |
-----------------------------
| 8 | foo10 | foo8 |
-----------------------------
| 9 | foo10 | foo9 |
-----------------------------
But this would take up heaps and heaps of unnecessary space.
Is there a method I missed that can solve this problem efficiently, without adding more than 50,000 rows for one user to the database?
Update: For those with the same problem, as PM 77-1 stated below in the comments, one could delete the earliest duplicate row when a new row is inserted. This way, you won't get 'data-bloat'.
5 Answers
#1
3
Your second method is the best. When I first started using databases in my apps, I tried your first method. It creates problems as soon as you want to expand or change how that data set is handled.
You should have no problem sorting through this data quickly if indexed properly.
You still want to remove the oldest row from the whovisit table. This will prevent your 50k entries. In theory you keep only 9 records in the whovisit table for each user, so your actual table size would be 9*Number_of_users.
Table one: users
id | username
-----|-----------
1 | foo1
2 | foo2
Table two: whovisit
id   | user(id) | visited(userId) | Date/time stamp
-----|----------|-----------------|--------------------
1    | 1        | 2               | 9999-12-31 23:59:59
When you insert the new visit, query by the user id and get a row count. If the count is 9 or fewer, you're fine; if it is more than 9, delete the oldest row, leaving a total of 9 rows for the user.
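The insert-then-prune step could be sketched as follows, here in Python with the standard sqlite3 module. The column names (owner_id, visitor_id, visited_at), the index, and the helper names are assumptions for illustration, not the asker's actual schema. The extra delete of an earlier row for the same visitor follows the update in the question, so each visitor appears only once in the box.

```python
import sqlite3

MAX_VISITORS = 9  # size of the "recently visited" box


def make_db():
    """Create an in-memory database with a whovisit table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE whovisit (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            owner_id   INTEGER NOT NULL,  -- whose page was visited
            visitor_id INTEGER NOT NULL,  -- who visited it
            visited_at INTEGER NOT NULL   -- timestamp of the visit
        )
        """
    )
    # Index so per-owner lookups stay fast as the table grows.
    conn.execute(
        "CREATE INDEX idx_whovisit_owner ON whovisit (owner_id, visited_at)"
    )
    return conn


def record_visit(conn, owner_id, visitor_id, visited_at):
    """Store a visit, then keep only the newest MAX_VISITORS rows per owner."""
    with conn:  # one transaction: commit on success, roll back on error
        # Drop any earlier row for this visitor so they are listed once.
        conn.execute(
            "DELETE FROM whovisit WHERE owner_id = ? AND visitor_id = ?",
            (owner_id, visitor_id),
        )
        conn.execute(
            "INSERT INTO whovisit (owner_id, visitor_id, visited_at) "
            "VALUES (?, ?, ?)",
            (owner_id, visitor_id, visited_at),
        )
        # Prune everything older than the newest MAX_VISITORS rows.
        conn.execute(
            "DELETE FROM whovisit WHERE owner_id = ? AND id NOT IN ("
            "  SELECT id FROM whovisit WHERE owner_id = ? "
            "  ORDER BY visited_at DESC, id DESC LIMIT ?)",
            (owner_id, owner_id, MAX_VISITORS),
        )


def recent_visitors(conn, owner_id):
    """Return visitor ids for one page owner, most recent first."""
    rows = conn.execute(
        "SELECT visitor_id FROM whovisit WHERE owner_id = ? "
        "ORDER BY visited_at DESC, id DESC",
        (owner_id,),
    )
    return [row[0] for row in rows]
```

Calling record_visit once per page view keeps the table at no more than 9 rows per page owner, so recent_visitors never has to scan a long history.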
#2
2
It seems it would be helpful to add a date/time stamp for each visit.
If you do, the logic could be like this:
- user already on the list - update that user's earlier time stamp to the current date/time
- user is not yet on the list - find the overall earliest visit and update that record with this user's information
#3
0
I would suggest using two tables:
Table users
id | name
1 | foo1
2 | foo2
3 | foo3
4 | foo4
...
10 | foo10
Table visits
host_userid | visitor_userid
10 | 1
10 | 2
10 | 3
10 | 4
The visits table might also have a date column or a primary key, if necessary. Storing just two integers results in a very small row size.
#4
0
Your idea is called normalization, and it is actually a good idea.
Table user:
-----------------
| id | name |
-----------------
| 1 | foo1 |
-----------------
| 2 | foo2 |
-----------------
| 3 | foo3 |
-----------------
Table visit:
-----------------------------
| id | user_id | visit_id |
-----------------------------
| 1 | 1 | 2 |
-----------------------------
| 2 | 2 | 3 |
-----------------------------
Now you can easily and quickly store and retrieve visit data. If you put all of that into one field (as in your first example), you end up in programmer's hell.
You could include a timestamp in the visit table and delete entries older than x days.
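The age-based cleanup might look roughly like this in Python with the standard sqlite3 module. The visited_at column (stored as seconds since the epoch) and the purge_old_visits helper are assumptions added for illustration. The id, user_id, and visit_id columns follow the visit table above.

```python
import sqlite3

SECONDS_PER_DAY = 86400


def make_db():
    """Create an in-memory visit table with an added time stamp column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE visit (
            id         INTEGER PRIMARY KEY AUTOINCREMENT,
            user_id    INTEGER NOT NULL,
            visit_id   INTEGER NOT NULL,
            visited_at INTEGER NOT NULL  -- assumed: seconds since the epoch
        )
        """
    )
    return conn


def purge_old_visits(conn, now, max_age_days):
    """Delete visits older than max_age_days; return how many were removed."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM visit WHERE visited_at < ?", (cutoff,)
        )
    return cur.rowcount
```

Such a purge could run from a scheduled job instead of on every page view, passing int(time.time()) as now.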
#5
0
Use a relationship table instead. Creating multiple IDs within your Users table is not recommended, for obvious reasons.
For example:
Users table
[UserID][UserName]
Visits table
[Source_User_ID][Visitor_User_ID][Visit_Count]
Then your SQL statements become much simpler:
SELECT TOP 9 [Visitor_User_ID] FROM [Visits] WHERE [Source_User_ID]=### ORDER BY [Visit_Count] DESC