I have a table called results with 5 columns.
I'd like to use the title column to find rows that match, say, WHERE title LIKE '%for sale%', and then list the most popular words in that column. One would be "for" and another would be "sale", but I want to see which other words correlate with these.
Sample data:
title
cheap cars for sale
house for sale
cats and dogs for sale
iphones and androids for sale
cheap phones for sale
house furniture for sale
Results (single words):
for 6
sale 6
cheap 2
and 2
house 2
furniture 1
cars 1
etc...
6 solutions
#1
7
You can extract words with some string manipulation. Assuming you have a numbers table and that words are separated by single spaces:
select substring_index(substring_index(r.title, ' ', n.n), ' ', -1) as word,
count(*)
from results r join
numbers n
on n.n <= length(title) - length(replace(title, ' ', '')) + 1
group by word;
If you don't have a numbers table, you can construct one manually using a subquery:
from results r join
(select 1 as n union all select 2 union all select 3 union all . . .
) n
. . .
The SQL Fiddle (courtesy of @GrzegorzAdamKowalski) is here.
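To make the nested substring_index call easier to follow, here is a minimal Python sketch of the same idea. The helper names substring_index and nth_word are hypothetical stand-ins written for this illustration (they only approximate the MySQL function for a non-empty delimiter), and the titles are the sample data from the question. The inner call keeps the first n words; the outer call keeps the last word of that prefix.

```python
from collections import Counter


def substring_index(text: str, delim: str, count: int) -> str:
    """Rough stand-in for MySQL SUBSTRING_INDEX (non-empty delimiter assumed)."""
    if count == 0:
        return ""
    parts = text.split(delim)
    if count > 0:
        # Everything to the left of the count-th delimiter.
        return delim.join(parts[:count])
    # Everything to the right of the |count|-th delimiter from the end.
    return delim.join(parts[count:])


def nth_word(title: str, n: int) -> str:
    # Keep the first n words, then take the last word of that prefix.
    return substring_index(substring_index(title, " ", n), " ", -1)


titles = [
    "cheap cars for sale",
    "house for sale",
    "cats and dogs for sale",
    "iphones and androids for sale",
    "cheap phones for sale",
    "house furniture for sale",
]

counts = Counter()
for title in titles:
    # Same word-count expression as the join condition: spaces + 1.
    word_count = len(title) - len(title.replace(" ", "")) + 1
    for n in range(1, word_count + 1):
        counts[nth_word(title, n)] += 1

for word, count in counts.most_common():
    print(word, count)
```

Like the SQL, this sketch assumes words are separated by single spaces; repeated spaces would produce empty "words".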
#2
3
You can use ExtractValue in an interesting way. See the SQL Fiddle here: http://sqlfiddle.com/#!9/0b0a0/45
We need only one table:
CREATE TABLE text (`title` varchar(29));
INSERT INTO text (`title`)
VALUES
('cheap cars for sale'),
('house for sale'),
('cats and dogs for sale'),
('iphones and androids for sale'),
('cheap phones for sale'),
('house furniture for sale')
;
Now we construct a series of SELECTs that extract whole words from the text converted to XML. Each SELECT extracts the N-th word from the text.
select words.word, count(*) as `count` from
(select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[1]') as word from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[2]') from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[3]') from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[4]') from `text`
union all
select ExtractValue(CONCAT('<w>', REPLACE(title, ' ', '</w><w>'), '</w>'), '//w[5]') from `text`) as words
where length(words.word) > 0
group by words.word
order by `count` desc, words.word asc
#3
2
This would give you single words (if I understand what you mean by "single word"):
select concat(val,' ',cnt) as result from(
select (substring_index(substring_index(t.title, ' ', n.n), ' ', -1)) val,count(*) as cnt
from results t cross join(
select a.n + b.n * 10 + 1 n
from
(select 0 as n union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9) a,
(select 0 as n union all select 1 union all select 2 union all select 3
union all select 4 union all select 5 union all select 6
union all select 7 union all select 8 union all select 9) b
order by n
) n
where n.n <= 1 + (length(t.title) - length(replace(t.title, ' ', '')))
group by val
order by cnt desc
) as x
The result should look like this:
Result
--------
for 6
sale 6
house 2
and 2
cheap 2
phones 1
iphones 1
dogs 1
furniture 1
cars 1
androids 1
cats 1
But if what you need is everything on a single line, like this:
result
-----------
for 6 sale 6 house 2 and 2 cheap 2 phones 1 iphones 1 dogs 1 furniture 1 cars 1 androids 1 cats 1
Just modify the query above to:
select group_concat(concat(val,' ',cnt) separator ' ') as result from( ...
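The derived table n in the query above is built from two copies of the digits 0 to 9. A short Python sketch of that arithmetic (variable names are chosen for this illustration only) shows why it yields every number from 1 to 100 exactly once:

```python
digits = range(10)

# a supplies the ones digit, b the tens digit; +1 shifts 0..99 to 1..100.
numbers = sorted(a + b * 10 + 1 for a in digits for b in digits)

print(numbers[0], numbers[-1], len(numbers))
```

So the query can split titles of up to 100 words; a third digit table would be needed for longer ones.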
#4
0
Update
Idea taken from https://*.com/a/17942691/98491
This query works on my machine (MySQL 5.7); however, SQL Fiddle reports an error. The basic idea is that you should either create a table with the numbers from 1 to the maximum number of words in your field (4 here) or, as I did, use a UNION of 1 .. 4 for simplicity.
CREATE TABLE products (
`id` int,
`name` varchar(45)
);
INSERT INTO products
(`id`, `name`)
VALUES
(1, 'for sale'),
(2, 'for me'),
(3, 'for you'),
(4, 'you and me')
;
SELECT name, COUNT(*) as count FROM
(
SELECT
product.id,
SUBSTRING_INDEX(SUBSTRING_INDEX(product.name, ' ', numbers.n), ' ', -1) name
FROM
(
SELECT 1 AS n
UNION SELECT 2
UNION SELECT 3
UNION SELECT 4
) AS numbers
INNER JOIN products product
ON CHAR_LENGTH(product.name)
-CHAR_LENGTH(REPLACE(product.name, ' ', ''))>=numbers.n-1
ORDER BY
id, n
)
AS result
GROUP BY name
ORDER BY count DESC
The result will be:
for | 3
you | 2
me | 2
and | 1
sale| 1
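The join condition pairs each product with every n that does not exceed its word count (number of spaces >= n - 1). A small Python sketch of that pairing, using the same products rows, may make this clearer; the loop structure is an illustration of the join, not how MySQL executes it.

```python
from collections import Counter

products = [
    (1, "for sale"),
    (2, "for me"),
    (3, "for you"),
    (4, "you and me"),
]
numbers = [1, 2, 3, 4]  # stands in for the UNION of 1 .. 4

counts = Counter()
for _id, name in products:
    spaces = len(name) - len(name.replace(" ", ""))
    for n in numbers:
        # Same test as the ON clause: keep n only if the name has an n-th word.
        if spaces >= n - 1:
            counts[name.split(" ")[n - 1]] += 1

for word, count in counts.most_common():
    print(f"{word} | {count}")
```

If a name had more than four words, the extra words would silently be dropped, which is why the numbers list has to cover the longest name.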
#5
0
SQL is not well suited for this task. While it is possible, there are limitations (the number of words, for example).
A quick PHP script to do the same task may be easier to use in the long term (and is likely quicker too):
<?php
$rows = [
"cheap cars for sale",
"house for sale",
"cats and dogs for sale",
"iphones and androids for sale",
"cheap phones for sale",
"house furniture for sale",
];
//rows here should be replaced by the SQL result
$wordTotals = [];
foreach ($rows as $row) {
$words = explode(" ", $row);
foreach ($words as $word) {
if (isset($wordTotals[$word])) {
$wordTotals[$word]++;
continue;
}
$wordTotals[$word] = 1;
}
}
arsort($wordTotals);
foreach($wordTotals as $word => $count) {
echo $word . " " . $count . PHP_EOL;
}
Output
for 6
sale 6
and 2
cheap 2
house 2
phones 1
androids 1
furniture 1
cats 1
cars 1
dogs 1
iphones 1
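The question also filters rows with LIKE '%for sale%' before counting. A minimal Python sketch of that combined step follows, assuming the titles have already been fetched from the database; the function name word_counts and the two extra non-matching rows are hypothetical, and the case-insensitive match is an assumption about how the LIKE comparison behaves for the column's collation.

```python
from collections import Counter

titles = [
    "cheap cars for sale",
    "house for sale",
    "cats and dogs for sale",
    "iphones and androids for sale",
    "cheap phones for sale",
    "house furniture for sale",
    "house to rent",       # hypothetical row that should be filtered out
    "wanted: cheap cars",  # hypothetical row that should be filtered out
]


def word_counts(rows, phrase):
    # Keep only rows containing the phrase (like LIKE '%phrase%'),
    # then count every whitespace-separated word in those rows.
    matching = [t for t in rows if phrase.lower() in t.lower()]
    return Counter(word for t in matching for word in t.split())


counts = word_counts(titles, "for sale")
for word, count in counts.most_common():
    print(word, count)
```

Unlike explode(" ", ...), str.split() with no argument also collapses repeated whitespace, so double spaces do not produce empty words.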
#6
0
Here is a working SQL Fiddle: http://sqlfiddle.com/#!9/0b0a0/32
Let's start with two tables - one for texts and one for numbers:
CREATE TABLE text (`title` varchar(29));
INSERT INTO text
(`title`)
VALUES
('cheap cars for sale'),
('house for sale'),
('cats and dogs for sale'),
('iphones and androids for sale'),
('cheap phones for sale'),
('house furniture for sale')
;
CREATE TABLE iterator (`index` int);
INSERT INTO iterator
(`index`)
VALUES
(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),
(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30)
;
The second table, iterator, must contain the numbers from 1 to N, where N is at least one more than the length of the longest string in text, because the query appends a trailing space and j.index has to reach it (here the longest title has 29 characters, so N = 30).
Then, run this query:
select
words.word, count(*) as `count`
from
(select
substring(concat(' ', t.title, ' '), i.index+1, j.index-i.index) as word
from
text as t, iterator as i, iterator as j
where
substring(concat(' ', t.title), i.index, 1) = ' '
and substring(concat(t.title, ' '), j.index, 1) = ' '
and i.index < j.index
) AS words
where
length(words.word) > 0
and words.word not like '% %'
group by words.word
order by `count` desc, words.word asc
There are two SELECTs. The outer one simply groups and counts single words (strings of length greater than 0 that contain no spaces). The inner one extracts every substring that starts right after a space and ends right before another space, so its results are not all words (despite the subquery being named words), because they can contain spaces other than the two delimiting ones.
Results:
word count
for 6
sale 6
and 2
cheap 2
house 2
androids 1
cars 1
cats 1
dogs 1
furniture 1
iphones 1
phones 1