I am currently trying to optimize some bobj reports where our backend is Teradata. The Teradata optimizer seems very finicky, and I was wondering if anyone has come up with a solution or workaround to get the optimizer to treat LIKE in a similar regard to equals.
My issue is that we allow the user to search by one of two methods:
1. Enter the Number:
or
2. Enter a Number like:
Option one performs like a dream while option two is dragging our query times from 6 seconds to 2 minutes.
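Roughly, the two prompts produce predicates along these lines (the table and column names here are just placeholders, not the real names):

-- Option 1: exact match on the number
SELECT * FROM AcctTable WHERE AcctNumber = '123456';

-- Option 2: prefix search -- this is the slow one
SELECT * FROM AcctTable WHERE AcctNumber LIKE '123%';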
In addition to this, does anyone know of any good articles, discussions, videos, etc. on optimizing SQL statements for the Teradata optimizer?
4 Answers
#1
Because the column is defined as a VARCHAR and you are using the LIKE operator, you eliminate the possibility of using the PI for single-AMP access. Remember, the primary index's first job is distributing the data across the AMPs in the system. Because you are using the LIKE operator against the PI, the optimizer must perform an 'all-AMP' operation to satisfy it.
WHERE MyPIColumn LIKE '123%'
Values starting with '123' can and will hash to multiple AMPs.
WHERE MyPIColumn = '123'
Hashing '123' will place every record with that value on the same AMP. Querying for '123' will always be a single-AMP operation.
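You can verify the difference with EXPLAIN (placeholder names below):

-- The equality should show single-AMP access via the PI
EXPLAIN SELECT * FROM MyTable WHERE MyPIColumn = '123';
-- The LIKE should show an all-AMP retrieve
EXPLAIN SELECT * FROM MyTable WHERE MyPIColumn LIKE '123%';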
Statistics on this may help with row estimates but will likely not eliminate the 'all AMP' operation.
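If you do collect them, something along these lines (exact syntax varies by Teradata release; MyTable/MyPIColumn are placeholders):

-- Column-level statistics give the optimizer better row estimates for the LIKE
COLLECT STATISTICS ON MyTable COLUMN (MyPIColumn);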
- Is this a Unique PI or Non-Unique PI?
- Why was the data type chosen to be character over numeric? Although GT(E) or LT(E) would likely result in the same 'all-AMP' operation.
- Is this PI shared by other tables in the model to facilitate AMP local join strategies?
#2
I take it that Number is indexed? Teradata uses hashing for its indexes, so an equality comparison will result in the index being used, while LIKE will result in a full table scan.
If you have a genuine need to use LIKE, there's not an awful lot you can do. One thing you could try is using SUBSTR(Number, 1, 3) = '123' rather than Number LIKE '123%'. I've gotten small performance improvements from this in the past, but don't expect anything spectacular.
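For example (MyTable and Number are placeholder names; check the EXPLAIN to see whether the plan actually changes):

-- Prefix search as written
SELECT * FROM MyTable WHERE Number LIKE '123%';

-- Rewritten with SUBSTR; occasionally a bit cheaper, but still an all-AMP scan
SELECT * FROM MyTable WHERE SUBSTR(Number, 1, 3) = '123';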
#3
1
You will need a full-text / pre-tokenized index, e.g. Lucene, and also a two-pass search.
e.g. when you insert "12345" into your database, create links from "1", "12", "123", "234", etc. to "12345".
Then, when the user searches for something like "123*", find "123" in the lookup table and seek to the record "12345".
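A rough sketch of that idea in plain SQL (no Lucene involved; all names here are made up):

-- Lookup table keyed by prefix, so the prefix itself becomes the PI
CREATE TABLE NumberPrefix (
    Prefix VARCHAR(20) NOT NULL,
    Number VARCHAR(20) NOT NULL
) PRIMARY INDEX (Prefix);

-- Populate one row per prefix when '12345' is inserted into the base table
INSERT INTO NumberPrefix VALUES ('1', '12345');
INSERT INTO NumberPrefix VALUES ('12', '12345');
INSERT INTO NumberPrefix VALUES ('123', '12345');

-- A prefix search then becomes an equality lookup plus a join back to the base table
SELECT t.*
FROM NumberPrefix p
JOIN MyTable t ON t.Number = p.Number
WHERE p.Prefix = '123';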
#4
If you are doing a direct VARCHAR comparison, i.e.
Column LIKE 'VALUE'
then you could try to use a NUSI (non-unique secondary index) on that column. Make sure that you collect statistics for the table's primary index and for the NUSI.
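A minimal sketch of that setup, with placeholder names (whether the optimizer actually picks the NUSI for a LIKE depends on selectivity; check the EXPLAIN):

-- Non-unique secondary index on the compared column
CREATE INDEX NumberNUSI (Number) ON MyTable;

-- Statistics on the primary index column and on the new index column
COLLECT STATISTICS ON MyTable COLUMN (MyPIColumn);
COLLECT STATISTICS ON MyTable COLUMN (Number);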