You guys are amazing. I've posted here twice in the past couple of days - a new user - and I've been blown away by the help. So, I figured I'd take the slowest query I've got in my software and see if anyone can help me speed it up. I use this query as a view, so it's important that it be fast (and it isn't!).
First, I have a Contacts table that stores my company's customers. The table has a JobTitle column containing an ID that is defined in the Contacts_Def_JobFunctions table. There is also a table called contacts_link_jobfunctions, which holds the ContactID and any additional job functions the customer has - also defined in the Contacts_Def_JobFunctions table.
Secondly, the Contacts_Def_JobFunctions table records have a parent/child relationship with themselves. In this manner, we cluster similar job functions (for example: maid, laundry service, housekeeping, cleaning, etc. are all the same basic job - while the job title may vary). Job functions which we don't currently work with are maintained as children of ParentJobID 1841.
Third, the institutionswithzipcodesadditional table simply provides geographical data to the final result.
Lastly, like all responsible companies, we maintain a remove list for any of our customers that wish to opt-out of our newsletter (after opting in).
I use the following query to build a table of those people who have opted-in to receive our newsletter and who have a job function or job title relevant to the services/products we offer.
Here's my UGLY query:
SELECT DISTINCT
dbo.contacts_link_emails.Email, dbo.contacts.ContactID, dbo.contacts.First AS ContactFirstName, dbo.contacts.Last AS ContactLastName, dbo.contacts.InstitutionID,
dbo.institutionswithzipcodesadditional.CountyID, dbo.institutionswithzipcodesadditional.StateID, dbo.institutionswithzipcodesadditional.DistrictID
FROM
dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_3
INNER JOIN
dbo.contacts
INNER JOIN
dbo.contacts_link_emails
ON dbo.contacts.ContactID = dbo.contacts_link_emails.ContactID
ON contacts_def_jobfunctions_3.JobID = dbo.contacts.JobTitle
INNER JOIN
dbo.institutionswithzipcodesadditional
ON dbo.contacts.InstitutionID = dbo.institutionswithzipcodesadditional.InstitutionID
LEFT OUTER JOIN
dbo.contacts_def_jobfunctions
INNER JOIN
dbo.contacts_link_jobfunctions
ON dbo.contacts_def_jobfunctions.JobID = dbo.contacts_link_jobfunctions.JobID
ON dbo.contacts.ContactID = dbo.contacts_link_jobfunctions.ContactID
WHERE
(dbo.contacts.JobTitle IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_1
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist))
OR
(dbo.contacts_link_jobfunctions.JobID IN
(SELECT JobID
FROM dbo.contacts_def_jobfunctions AS contacts_def_jobfunctions_2
WHERE (ParentJobID <> '1841')))
AND
(dbo.contacts_link_emails.Email NOT IN
(SELECT EmailAddress
FROM dbo.newsletterremovelist AS newsletterremovelist))
I'm hoping some of you superstars can help me tune this up.
Thanks so much,
Russell Schutte
UPDATE - UPDATE - UPDATE - UPDATE - UPDATE
After getting several feedback messages, most notably from Khanzor, I've worked hard on tuning this query and have come up with the following:
SELECT DISTINCT
contacts_link_emails.Email, contacts.ContactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID,
institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM contacts
INNER JOIN
contacts_def_jobfunctions ON contacts.jobtitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
contacts_link_jobfunctions ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
INNER JOIN
contacts_link_emails ON contacts.ContactID = contacts_link_emails.ContactID
INNER JOIN
institutionswithzipcodesadditional ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
LEFT JOIN
newsletterremovelist ON newsletterremovelist.emailaddress = contacts_link_emails.email
WHERE
newsletterremovelist.emailaddress IS NULL
This isn't quite perfect (I suspect I should have made something an outer join or a right join or something, and I'm not really sure). My result set is about 40% of the records my original query provided (which I'm no longer 100% positive was a perfect query).
To clean things up, I took out all the "dbo." prefixes that SQL Studio adds. Do they do anything?
What am I doing wrong now?
Thanks,
Russell Schutte
== == == == == == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == ANOTHER UPDATE == == == == ==
I've been working on this one query for several hours now. I've got it down to this:
SELECT DISTINCT
contacts_link_emails.Email, contacts.contactID, contacts.First AS ContactFirstName, contacts.Last AS ContactLastName, contacts.InstitutionID,
institutionswithzipcodesadditional.CountyID, institutionswithzipcodesadditional.StateID, institutionswithzipcodesadditional.DistrictID
FROM
contacts INNER JOIN institutionswithzipcodesadditional
ON contacts.InstitutionID = institutionswithzipcodesadditional.InstitutionID
INNER JOIN contacts_link_emails
ON contacts.ContactID = contacts_link_emails.ContactID
LEFT OUTER JOIN contacts_def_jobfunctions
ON contacts.JobTitle = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
LEFT OUTER JOIN contacts_link_jobfunctions
ON contacts_link_jobfunctions.JobID = contacts_def_jobfunctions.JobID AND contacts_def_jobfunctions.ParentJobID <> '1841'
LEFT OUTER JOIN
newsletterremovelist ON newsletterremovelist.EmailAddress = contacts_link_emails.Email
WHERE (newsletterremovelist.EmailAddress IS NULL)
Disappointingly, I'm just not able to fill in the gaps in my knowledge. I'm new to joins, except when I have the visual tool build them for me, so I'm thinking I want everything from contacts, institutionswithzipcodesadditional, and contacts_link_emails, so I've INNER JOINed them (above).
I am stumped on the next bit. If I INNER JOIN them, then I get people who have the proper jobs (<> 1841) - but I'm thinking I LOSE out on people who don't have an entry for both JobTitle AND JobFunctions. In many cases, this isn't right. I could have a JobTitle "Custodian" which I'd want to keep on our newsletter list, but if he doesn't also have a JobFunction entry, I think he'll fall off the list if I use INNER JOIN.
BUT, if I do the query with LEFT OUTER JOINs, as above, I think I get lots of people with the wrong JobTitles, simply because anyone who is lacking EITHER a JobTitle OR a JobFunction would be ON my list - they could be a "High Level Executive" with no JobFunction, and they'd be on the list - which isn't right. We no longer have services appropriate to "High Level Executives".
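One way to express the "relevant JobTitle OR relevant JobFunction" rule described above is a pair of EXISTS checks. The following is only a minimal sketch, not a tested fix for the real schema: it runs against an in-memory SQLite database rather than SQL Server, the JobID values 10, 20 and 30 and the parent ID 5 are made up for illustration, and the tables are cut down to the columns that matter here. It also runs the two INNER JOINs from the first update (which join the link table on JobID only) to show how a contact with a good JobTitle but no JobFunction rows drops out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts_def_jobfunctions (JobID INTEGER PRIMARY KEY, ParentJobID INTEGER);
CREATE TABLE contacts (ContactID INTEGER PRIMARY KEY, JobTitle INTEGER);
CREATE TABLE contacts_link_jobfunctions (ContactID INTEGER, JobID INTEGER);

-- Hypothetical IDs: 10 = Custodian, 30 = Housekeeping (both relevant),
-- 20 = High Level Executive (filed under ParentJobID 1841, not relevant).
INSERT INTO contacts_def_jobfunctions VALUES (10, 5), (20, 1841), (30, 5);

-- Contact 1: custodian with no extra job functions.
-- Contact 2: executive with no extra job functions.
-- Contact 3: executive who also has the housekeeping job function.
INSERT INTO contacts VALUES (1, 10), (2, 20), (3, 20);
INSERT INTO contacts_link_jobfunctions VALUES (3, 30);
""")

# Keep a contact if the title is relevant OR any linked function is relevant.
either_sql = """
SELECT c.ContactID
FROM contacts AS c
WHERE EXISTS (SELECT 1
              FROM contacts_def_jobfunctions AS jf
              WHERE jf.JobID = c.JobTitle AND jf.ParentJobID <> 1841)
   OR EXISTS (SELECT 1
              FROM contacts_link_jobfunctions AS l
              JOIN contacts_def_jobfunctions AS jf2 ON jf2.JobID = l.JobID
              WHERE l.ContactID = c.ContactID AND jf2.ParentJobID <> 1841)
ORDER BY c.ContactID
"""
either_rows = [row[0] for row in conn.execute(either_sql)]

# The INNER JOIN shape from the first update: the link table is joined on
# JobID only, so a contact needs a link row with the same JobID as the title.
inner_sql = """
SELECT DISTINCT c.ContactID
FROM contacts AS c
INNER JOIN contacts_def_jobfunctions AS jf
        ON c.JobTitle = jf.JobID AND jf.ParentJobID <> 1841
INNER JOIN contacts_link_jobfunctions AS l
        ON l.JobID = jf.JobID
ORDER BY c.ContactID
"""
inner_rows = [row[0] for row in conn.execute(inner_sql)]

print("EXISTS version:", either_rows)
print("INNER JOIN version:", inner_rows)
```

With this toy data the EXISTS version keeps the custodian and the executive who also does housekeeping, and drops the plain executive, while the INNER JOIN version returns nothing at all. Whether EXISTS performs well on the real tables would depend on their indexes and on the SQL Server optimizer, so it is worth checking the plan.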
Then I see how the LEFT OUTER JOIN works for the newsletterremovelist. It's pretty slick and I think I've done it right...
But I'm still stuck. Hopefully someone can see what I'm trying to do here and steer me in the right direction.
Thanks,
Russell Schutte
UPDATE AGAIN
Sadly, this thread seems to have died, without a perfect solution - but I'm getting close. Please see a new thread started which restarts the discussion: click here
(awarded a correct answer for the massive amount of work provided - even while a correct answer hasn't quite been reached).
Thanks!
Russell Schutte
3 Answers
#1
6
Move the subqueries in your WHERE clause out to actual joins. These are often called correlated subqueries (strictly speaking, yours don't reference the outer query, so they are uncorrelated), and are the work of Voldemort. As joins, they are only executed once, which should speed up your query.
For the NOT IN sections, use a left outer join, and check that the column you joined on is NULL.
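The NOT IN advice can be sketched as follows. This is a minimal illustration on an in-memory SQLite database, not SQL Server, with made-up email addresses. It also assumes, purely for the sake of the example, that the remove list contains a NULL EmailAddress: in that case NOT IN filters out every row, while the LEFT JOIN ... IS NULL form still behaves as intended.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts_link_emails (ContactID INTEGER, Email TEXT);
CREATE TABLE newsletterremovelist (EmailAddress TEXT);

INSERT INTO contacts_link_emails VALUES
  (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');

-- One real opt-out plus a stray NULL (an assumption made for this example).
INSERT INTO newsletterremovelist VALUES ('b@example.com'), (NULL);
""")

# Original style: NOT IN against the remove list.
not_in_sql = """
SELECT e.Email
FROM contacts_link_emails AS e
WHERE e.Email NOT IN (SELECT EmailAddress FROM newsletterremovelist)
ORDER BY e.Email
"""
not_in_rows = [row[0] for row in conn.execute(not_in_sql)]

# Suggested style: left outer join, then keep only the unmatched rows.
anti_join_sql = """
SELECT e.Email
FROM contacts_link_emails AS e
LEFT JOIN newsletterremovelist AS rl
       ON rl.EmailAddress = e.Email
WHERE rl.EmailAddress IS NULL
ORDER BY e.Email
"""
anti_join_rows = [row[0] for row in conn.execute(anti_join_sql)]

print("NOT IN:", not_in_rows)
print("LEFT JOIN / IS NULL:", anti_join_rows)
```

If the remove list can never hold a NULL, the two forms return the same rows and the choice is mostly about readability and the plan the optimizer picks.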
Also, avoid using OR in WHERE clauses where possible - remember that OR is not necessarily a short-circuit operation.
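One common way to avoid the OR is to split it into two queries and combine them with UNION, which also removes duplicates, so DISTINCT is no longer needed. The sketch below is only an illustration under assumptions: it uses an in-memory SQLite database, trimmed-down tables, and invented JobID values (10, 20, 30) and parent ID 5. It checks that the UNION form returns the same contacts as an OR form shaped like the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contacts_def_jobfunctions (JobID INTEGER PRIMARY KEY, ParentJobID INTEGER);
CREATE TABLE contacts (ContactID INTEGER PRIMARY KEY, JobTitle INTEGER);
CREATE TABLE contacts_link_jobfunctions (ContactID INTEGER, JobID INTEGER);

-- Hypothetical IDs: 10 and 30 are relevant jobs, 20 sits under ParentJobID 1841.
INSERT INTO contacts_def_jobfunctions VALUES (10, 5), (20, 1841), (30, 5);
INSERT INTO contacts VALUES (1, 10), (2, 20), (3, 20);
INSERT INTO contacts_link_jobfunctions VALUES (1, 30), (3, 30);
""")

# OR form, shaped like the original: outer join plus two IN subqueries.
or_sql = """
SELECT DISTINCT c.ContactID
FROM contacts AS c
LEFT JOIN contacts_link_jobfunctions AS l ON l.ContactID = c.ContactID
WHERE c.JobTitle IN (SELECT JobID FROM contacts_def_jobfunctions
                     WHERE ParentJobID <> 1841)
   OR l.JobID IN (SELECT JobID FROM contacts_def_jobfunctions
                  WHERE ParentJobID <> 1841)
ORDER BY c.ContactID
"""
or_rows = [row[0] for row in conn.execute(or_sql)]

# UNION form: one branch per condition, each a plain inner join.
union_sql = """
SELECT c.ContactID
FROM contacts AS c
JOIN contacts_def_jobfunctions AS jf ON jf.JobID = c.JobTitle
WHERE jf.ParentJobID <> 1841
UNION
SELECT l.ContactID
FROM contacts_link_jobfunctions AS l
JOIN contacts_def_jobfunctions AS jf ON jf.JobID = l.JobID
WHERE jf.ParentJobID <> 1841
ORDER BY 1
"""
union_rows = [row[0] for row in conn.execute(union_sql)]

print("OR form:", or_rows)
print("UNION form:", union_rows)
```

Contact 1 matches both branches but appears once, because UNION deduplicates. Whether the UNION form is actually faster on SQL Server depends on the data and indexes, so it should be compared against the OR form with real plans.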
An example is as follows:
SELECT
*
FROM
dbo.contacts AS c
INNER JOIN
dbo.contacts_def_jobfunctions AS jf
ON c.JobTitle = jf.JobId AND jf.ParentJobID <> '1841'
INNER JOIN
dbo.contacts_link_emails AS e
ON c.ContactID = e.ContactID AND jf.JobID = c.JobTitle
LEFT JOIN
dbo.newsletterremovelist AS rl
ON e.Email = rl.EmailAddress
WHERE
rl.EmailAddress IS NULL
Please don't use this, as it's almost certainly incorrect (not to mention the SELECT *). I've ignored the logic for contacts_def_jobfunctions_3 to provide a simple example.
For a (really) nice explanation of joins, try this visual explanation of joins
#2
0
Create some views representing the common associations that you make, so that your sub-query is simpler. Views may also execute a bit quicker, as they do not need to be interpreted each time they are run (though a plain, non-indexed view is generally expanded into the calling query, so measure before relying on this).
#3
0
It could be any number of things. My first question is: are the columns you're joining on indexed?
Better yet, do a SHOWPLAN and paste it into your question.
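To show what checking the plan before and after adding an index looks like, here is a small sketch. It uses SQLite's EXPLAIN QUERY PLAN as a stand-in, since the SHOWPLAN output in this thread would come from SQL Server (for example via SET SHOWPLAN_TEXT ON) and looks different. The index name ix_emails_email is made up, and the table is reduced to two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts_link_emails (ContactID INTEGER, Email TEXT)")

query = "SELECT ContactID FROM contacts_link_emails WHERE Email = ?"


def plan(sql):
    """Return the 'detail' column of SQLite's query plan as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("x@example.com",))
    return " | ".join(row[3] for row in rows)


# Without an index on Email, the lookup has to scan the table.
before = plan(query)

# Hypothetical index on the column used for lookups and joins.
conn.execute("CREATE INDEX ix_emails_email ON contacts_link_emails (Email)")

# With the index in place, the plan should mention it by name.
after = plan(query)

print("before:", before)
print("after: ", after)
```

The same idea applies to the join columns in the question (ContactID, JobID, InstitutionID, Email and EmailAddress): if the plan shows scans on large tables for those joins, an index on the joined column is the first thing to try.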