SQL does not group lowercase and uppercase email addresses

Time: 2023-01-25 09:22:37

I am running a SQL query that does not group the upper-case and lower-case versions of an email address together. I thought SQL was not case sensitive, so I don't understand why it behaves this way.

SELECT
Customers.EmailAddress,
o.TotalOrders AS 'overall NumOrders',
o.TotalOrdered AS 'overall TotalOrdered',
o1.TotalOrders AS '2017 NumOrders',
o1.TotalOrdered AS '2017 TotalOrdered',
o2.TotalOrders AS '2016 NumOrders',
o2.TotalOrdered AS '2016 TotalOrdered'
FROM Customers 
JOIN Orders
ON Customers.Customerid=Orders.Customerid
FULL  JOIN
(
SELECT DISTINCT Customers.EmailAddress,
COUNT(Orders.OrderID) as TotalOrders,
(SUM(Orders.PaymentAmount)) as TotalOrdered
FROM
Customers  WITH (NOLOCK),Orders  WITH (NOLOCK)
WHERE
 Customers.CustomerID = Orders.CustomerID 
AND Orders.OrderStatus NOT IN ('Cancelled','Payment Declined')
AND Orders.OrderDate BETWEEN '01/01/2016 00:00' AND '11/30/2017 23:59'
GROUP BY
Customers.EmailAddress
 ) AS o ON o.EmailAddress = Customers.EmailAddress
FULL  JOIN
(
SELECT DISTINCT
Customers.EmailAddress,
COUNT(Orders.OrderID) as TotalOrders,
SUM(Orders.PaymentAmount) as TotalOrdered
FROM
Orders  WITH (NOLOCK), Customers  WITH (NOLOCK)
WHERE
Orders.CustomerID = Customers.CustomerID
AND Orders.OrderStatus NOT IN ('Cancelled','Payment Declined')
AND Orders.OrderDate BETWEEN '01/01/2017 00:00' AND '11/30/2017 23:59'
GROUP BY
Customers.EmailAddress
) AS o1 ON o1.EmailAddress = Customers.EmailAddress
FULL JOIN
(
SELECT DISTINCT
Customers.EmailAddress,
COUNT(Orders.OrderID) as TotalOrders,
SUM(Orders.PaymentAmount) as TotalOrdered
FROM
Orders  WITH (NOLOCK), Customers  WITH (NOLOCK)
WHERE
Orders.CustomerID = Customers.CustomerID
AND Orders.OrderStatus NOT IN ('Cancelled','Payment Declined')
AND Orders.OrderDate BETWEEN '01/01/2016 00:00' AND '12/31/2016 23:59'
GROUP BY
Customers.EmailAddress
) AS o2 ON o2.EmailAddress = Customers.EmailAddress
WHERE Orders.Orderdate BETWEEN '1/1/2016 00:00' AND '11/30/2017 23:59'
 AND  Orders.OrderStatus NOT IN('Cancelled','Payment Declined')
GROUP BY
Customers.EmailAddress,
o.TotalOrders ,
o.TotalOrdered ,
o1.TotalOrders ,
o1.TotalOrdered ,
o2.TotalOrders ,
o2.TotalOrdered ,
o3.TotalOrders , 
o3.TotalOrdered 

I am getting the following results:

EMAILADDRESS           overallnumorders totalorders 2017totalorder ...2016 totalor
    SMITHWORKS @ GMAIL.COM   3                $23.99           3             
    smithworks@gmail.com     1                                                 

I want to combine both of these email addresses into one row:

EMAILADDRESS           overallnumorders ...
    smithworks@gmail.com     4                ...

1 solution

#1



The two email addresses differ by more than just case: there are two spaces as well (SMITHWORKS @ GMAIL.COM <> smithworks@gmail.com), so they are distinct values and are treated as two rows.
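
To make the difference concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for SQL Server so it runs anywhere. The rows are invented for illustration, and SQLite groups case-sensitively by default, which may not match your server's collation. It suggests that lower() alone still leaves the spaced address in its own group, while stripping the spaces with replace() as well merges the two.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, EmailAddress TEXT)"
)
# Hypothetical sample rows modelled on the addresses in the question.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "SMITHWORKS @ GMAIL.COM"),
        (2, "smithworks@gmail.com"),
        (3, "Jones@Example.com"),
    ],
)

# Grouping on lower() only: the embedded spaces still keep two smithworks groups.
lower_only = conn.execute(
    """
    SELECT lower(EmailAddress) AS Email, COUNT(*) AS Cnt
    FROM Customers
    GROUP BY lower(EmailAddress)
    ORDER BY Email
    """
).fetchall()

# Grouping on a fully normalised key: spaces removed, then lower-cased.
normalized = conn.execute(
    """
    SELECT lower(replace(EmailAddress, ' ', '')) AS Email, COUNT(*) AS Cnt
    FROM Customers
    GROUP BY lower(replace(EmailAddress, ' ', ''))
    ORDER BY Email
    """
).fetchall()

print(lower_only)
print(normalized)
```

REPLACE and LOWER both exist in T-SQL as well, so the same grouping expression should carry over, but check it against your own data before relying on it.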

You could try lower(Customers.EmailAddress) in both the SELECT and GROUP BY clauses. Your existing query can also be simplified by using "conditional aggregates" (a CASE expression placed inside an aggregate function):

SELECT
      lower(Customers.EmailAddress) AS EmailAddress
      -- overall totals for the whole filtered date range
    , COUNT(Orders.OrderID)   AS 'overall NumOrders'
    , SUM(Orders.PaymentAmount)  AS 'overall TotalOrdered'
      -- conditional aggregates: the CASE returns NULL outside the year, and NULLs are ignored
    , COUNT(case when Orders.OrderDate >= '20170101' then Orders.OrderID       end) AS '2017 NumOrders'
    , SUM(  case when Orders.OrderDate >= '20170101' then Orders.PaymentAmount end) AS '2017 TotalOrdered'
    , COUNT(case when Orders.OrderDate <  '20170101' then Orders.OrderID       end) AS '2016 NumOrders'
    , SUM(  case when Orders.OrderDate <  '20170101' then Orders.PaymentAmount end) AS '2016 TotalOrdered'
FROM Customers
JOIN Orders ON Customers.Customerid = Orders.Customerid
WHERE Orders.OrderStatus NOT IN ('Cancelled', 'Payment Declined')
AND Orders.OrderDate >= '20160101' AND Orders.OrderDate < '20171201'
GROUP BY
      lower(Customers.EmailAddress)
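
The conditional-aggregate pattern can be tried out with a small sketch like the one below. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, a cut-down version of the Customers and Orders tables, invented sample rows, and ISO-style text dates (SQLite has no native date type). The column aliases are simplified names chosen for the example.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, EmailAddress TEXT);
    CREATE TABLE Orders (
        OrderID INTEGER PRIMARY KEY,
        CustomerID INTEGER,
        PaymentAmount REAL,
        OrderStatus TEXT,
        OrderDate TEXT  -- ISO text dates, so string comparison matches date order
    );
    """
)
# Hypothetical data: two customer rows that differ only by case.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [(1, "SMITHWORKS@GMAIL.COM"), (2, "smithworks@gmail.com")],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 10.0, "Shipped", "2016-03-01"),
        (2, 1, 5.0, "Shipped", "2017-02-01"),
        (3, 2, 7.5, "Shipped", "2017-05-01"),
        (4, 2, 99.0, "Cancelled", "2017-06-01"),  # removed by the status filter
    ],
)

# One pass over the join; each CASE yields NULL for rows outside its year,
# and COUNT/SUM skip NULLs, so every column only sees "its" rows.
rows = conn.execute(
    """
    SELECT
          lower(c.EmailAddress) AS Email
        , COUNT(o.OrderID)       AS NumOrders
        , SUM(o.PaymentAmount)   AS TotalOrdered
        , COUNT(CASE WHEN o.OrderDate >= '2017-01-01' THEN o.OrderID       END) AS NumOrders2017
        , SUM(  CASE WHEN o.OrderDate >= '2017-01-01' THEN o.PaymentAmount END) AS TotalOrdered2017
        , COUNT(CASE WHEN o.OrderDate <  '2017-01-01' THEN o.OrderID       END) AS NumOrders2016
        , SUM(  CASE WHEN o.OrderDate <  '2017-01-01' THEN o.PaymentAmount END) AS TotalOrdered2016
    FROM Customers c
    JOIN Orders o ON c.CustomerID = o.CustomerID
    WHERE o.OrderStatus NOT IN ('Cancelled', 'Payment Declined')
      AND o.OrderDate >= '2016-01-01' AND o.OrderDate < '2017-12-01'
    GROUP BY lower(c.EmailAddress)
    """
).fetchall()

print(rows)
```

With this sample data both customer rows collapse into a single smithworks@gmail.com row, and the cancelled order is left out of every total.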

Notes:

  • In SQL Server, MM/dd/yyyy is NOT a safe format for date literals. The safest way to specify a date in T-SQL is YYYYMMDD (no separators).

  • AND '11/30/2017 23:59' is NOT a good way to define an end point: it is one full minute short of December 1st. It is far better to stop using BETWEEN for date ranges. "Less than December 1, 2017" is the most accurate way of defining the cutoff point, so instead of BETWEEN, use the style shown in this answer, which involves >= and <.

  • SELECT DISTINCT is NOT needed when you are doing GROUP BY in the same query. (The GROUP BY clause by definition produces unique rows, so SELECT DISTINCT is just wasted effort. The grouping is also done before the SELECT clause is evaluated.)

  • FULL JOIN is very expensive and not often used. It wasn't required in your existing query because all the data comes from the exact same tables, so there can be no unmatched rows coming from the subqueries.
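
The note about BETWEEN and the 23:59 end point can be checked with a short sketch. It again assumes Python's built-in sqlite3 module rather than SQL Server, a hypothetical one-table schema, and timestamps stored as ISO text (which SQLite compares as strings, in the same order a datetime column would sort). The three sample orders are invented.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, OrderDate TEXT, PaymentAmount REAL)"
)
# Hypothetical orders around the end of November 2017.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [
        (1, "2017-11-30 23:58:00", 10.0),
        (2, "2017-11-30 23:59:30", 20.0),  # falls in the last minute of November
        (3, "2017-12-01 00:00:00", 30.0),  # belongs to December
    ],
)

# BETWEEN with a 23:59 end point silently drops the order placed at 23:59:30.
between_count = conn.execute(
    """
    SELECT COUNT(*) FROM Orders
    WHERE OrderDate BETWEEN '2017-11-01 00:00' AND '2017-11-30 23:59'
    """
).fetchone()[0]

# Half-open range: >= start and < first instant of the next month.
half_open_count = conn.execute(
    """
    SELECT COUNT(*) FROM Orders
    WHERE OrderDate >= '2017-11-01' AND OrderDate < '2017-12-01'
    """
).fetchone()[0]

print(between_count, half_open_count)
```

The half-open form picks up every November order without having to guess the last representable instant of the month, and it still excludes the order stamped exactly at midnight on December 1st.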
