I have a query like this, that looks at loads of different URLs, and groups them by hostname. It's pretty ugly, but seems fast enough to use.
How can I write it so that the ugly substring (which grabs the first part of the domain) is written in a neater way? I'm generating the query from an array of social media sites, so potentially there could be more sites there.
SELECT substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) AS referer_domain,
count(USER) AS hits,
r.id
FROM core c,
referer r
WHERE c.site_url = 12
AND r.name LIKE '%/%'
AND c.referer = r.id
AND (substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "www.delicious.com"
OR substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "www.facebook.com"
OR substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "m.facebook.com"
OR substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "www.reddit.com"
OR substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "twitter.com"
OR substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) = "news.ycombinator.com")
GROUP BY substring(r.name, 8, locate("/",substring(r.name FROM 8))-1)
ORDER BY hits DESC
4 Solutions
#1
2
In your case, you have already created an output column referer_domain which you can reference for the GROUP BY.
To use it in the WHERE clause, though, you need a view.
CREATE VIEW ref_domain_view AS SELECT *,substring(name, 8, locate("/",substring(name FROM 8))-1) as referer_domain FROM referer;
SELECT r.referer_domain,
count(USER) AS hits,
r.id
FROM core c,
ref_domain_view r
WHERE c.site_url = 12
AND r.name LIKE '%/%'
AND c.referer = r.id
AND (referer_domain = "www.delicious.com"
OR referer_domain = "www.facebook.com"
OR referer_domain = "m.facebook.com"
OR referer_domain = "www.reddit.com"
OR referer_domain = "twitter.com"
OR referer_domain = "news.ycombinator.com")
GROUP BY referer_domain
ORDER BY hits DESC
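Since the query is generated from an array of sites, the chain of ORs can probably be collapsed into a single IN list. The following is only a sketch, assuming the ref_domain_view defined above already exists; it uses an explicit JOIN and drops r.id from the select list, because r.id is not well defined once rows are grouped by domain.

```sql
-- Sketch only: assumes ref_domain_view from the answer above has been created.
-- The IN list is what the generating code would build from its array of sites.
SELECT r.referer_domain,
       count(USER) AS hits
FROM core c
JOIN ref_domain_view r ON c.referer = r.id
WHERE c.site_url = 12
  AND r.referer_domain IN ('www.delicious.com',
                           'www.facebook.com',
                           'm.facebook.com',
                           'www.reddit.com',
                           'twitter.com',
                           'news.ycombinator.com')
GROUP BY r.referer_domain
ORDER BY hits DESC;
```

Using IN also sidesteps the AND/OR precedence problem, because the whole list is a single condition.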
#3
0
One way to do it would be to write a user-defined function (UDF) that does the string extraction. UDFs are fast, but more of a pain to write. Another alternative is to write a stored function (a type of stored routine), which is simpler to write and more in line with SQL syntax.
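As a rough illustration of the stored-function route, here is a minimal sketch. The function name referer_host and the VARCHAR(255) width are assumptions made for the example, not something from the original schema. It uses MySQL's SUBSTRING_INDEX instead of the substring/locate pair, so it should handle both http:// and https:// prefixes, provided the URL includes a scheme.

```sql
-- Hypothetical helper: referer_host is an illustrative name.
-- The inner SUBSTRING_INDEX keeps everything before the third slash,
-- e.g. 'http://www.reddit.com'; the outer call keeps what follows the
-- last remaining slash, e.g. 'www.reddit.com'.
CREATE FUNCTION referer_host(url VARCHAR(255))
RETURNS VARCHAR(255)
DETERMINISTIC
RETURN SUBSTRING_INDEX(SUBSTRING_INDEX(url, '/', 3), '/', -1);

-- Usage: the ugly expression is now written once, inside the function.
SELECT referer_host(r.name) AS referer_domain,
       count(USER) AS hits
FROM core c
JOIN referer r ON c.referer = r.id
WHERE c.site_url = 12
  AND referer_host(r.name) IN ('www.delicious.com', 'www.facebook.com')
GROUP BY referer_domain
ORDER BY hits DESC;
```

The function is still evaluated for every candidate row, so this tidies the query rather than speeding it up.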
#4
0
While the above two answers have very good points, I have to ask you something.
Do you really need all the substring function calls?
Have you tried something like this?
SELECT substring(r.name, 8, locate("/",substring(r.name FROM 8))-1) AS referer_domain,
count(USER) AS hits,
r.id
FROM core c,
referer r
WHERE c.site_url = 12
AND r.name LIKE '%/%'
AND c.referer = r.id
AND (r.name = "http://www.delicious.com"
OR r.name = "http://www.facebook.com"
OR r.name = "http://m.facebook.com"
OR r.name = "http://www.reddit.com"
OR r.name = "http://twitter.com"
OR r.name = "http://news.ycombinator.com")
GROUP BY substring(r.name, 8, locate("/",substring(r.name FROM 8))-1)
ORDER BY hits DESC
The only problem with the above query is that tracking more domains means altering the query. If you put the domains into a table and joined to that table, you could probably get rid of the substring completely and never have to alter the query.
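The domain-table idea could look roughly like this. It is a sketch under assumptions: the table name social_domain and its column are made up for illustration, and the join assumes every referer starts with http:// followed by the host and a slash.

```sql
-- Hypothetical lookup table; social_domain is an illustrative name.
CREATE TABLE social_domain (
    domain VARCHAR(100) NOT NULL PRIMARY KEY
);

INSERT INTO social_domain (domain) VALUES
    ('www.delicious.com'),
    ('www.facebook.com'),
    ('m.facebook.com'),
    ('www.reddit.com'),
    ('twitter.com'),
    ('news.ycombinator.com');

-- Join on a URL prefix built from each tracked domain, so no substring
-- expression is needed and new sites only require a new row.
SELECT d.domain AS referer_domain,
       count(USER) AS hits
FROM core c
JOIN referer r ON c.referer = r.id
JOIN social_domain d ON r.name LIKE CONCAT('http://', d.domain, '/%')
WHERE c.site_url = 12
GROUP BY d.domain
ORDER BY hits DESC;
```

Adding a site to track then becomes an INSERT into the lookup table rather than a change to the generated query.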