U-SQL build error: equijoin has different types

Date: 2021-08-27 15:49:52

I'm trying to create a U-SQL job and have defined my columns for the CSVs they are extracted from. However, I keep hitting a build error on the JOIN, which says the columns I am matching are of different types. This is odd, because I have declared them with the same type. The screenshot shows where the issue lies:

[Screenshot: U-SQL build error, equijoin has different types]

Here is the complete USQL:

@guestCheck = 
    EXTRACT GuestCheckID int,
            POSCheckGUID Guid,
            POSCheckNumber int?,
            OwnerEmployeeID int,
            CreatedDateTime DateTime?,
            ClosedDateTime DateTime?,
            TicketReference string,
            CheckAmount decimal?,
            POSTerminalID int,
            CheckState string,
            LocationID int?,
            TableID int?,
            Covers int?,
            PostedDateTime DateTime?,
            OrderChannelID int?,
            MealPeriodID int?,
            RVCLocationID int?,
            ReopenedTerminalID int?,
            ReopenedEmployeeID int?,
            ReopenedDateTime DateTime?,
            ClosedBusDate int?,
            PostedBusDate int?,
            BusHour byte?,
            TaxExempt bool?,
            TaxExemptReference string
    FROM "/GuestCheck/GuestCheck-incomplete.csv"
    USING Extractors.Csv();

@guestCheckAncillaryAmount =
    EXTRACT CheckAncillaryAmountID int,
            GuestCheckID int,
            GuestCheckItemID int?,
            AncillaryAmountTypeID int,
            Amount decimal,
            FirstDetail int?,
            LastDetail int?,
            IsReturn bool?,
            ReturnReasonID int?,
            AncillaryReasonID int?,
            AncillaryNote string,
            ClosedBusDate int?,
            PostedBusDate int?,
            BusHour byte?,
            LocationID int?,
            RVCLocationID int?,
            IsDelisted bool?,
            Exempted bool?
    FROM "/GuestCheck/GuestCheckAncillaryAmount.csv"
    USING Extractors.Csv();

@ancillaryAmountType = 
    EXTRACT AncillaryAmountTypeID int,
            AncillaryAmountCategoryID int,
            CustomerID int,
            CheckTitle string,
            ReportTitle string,
            Percentage decimal,
            FixedAmount decimal,
            IncludeOnCheck bool,
            AutoCalculate bool,
            StoreAtCheckLevel bool?,
            DateTimeModified DateTime?,
            CheckTitleToken Guid?,
            ReportTitleToken Guid?,
            DeletedFlag bool,
            MaxUsageQty int?,
            ApplyToBasePriceOnly bool?,
            Exclusive bool,
            IsItem bool,
            MinValue decimal,
            MaxValue decimal,
            ItemGroupID int?,
            LocationID int,
            ApplicationOrder int?,
            RequiresReason bool,
            Exemptable bool?
    FROM "/GuestCheck/AncillaryAmountType.csv"
    USING Extractors.Csv();

@read =
    SELECT t.POSCheckGUID,
           t.POSCheckNumber,
           t.CheckAmount,
           aat.AncillaryAmountTypeID,
           aat.CheckTitle,
           gcd.Amount
    FROM @guestCheck AS t         
         LEFT JOIN
             @guestCheckAncillaryAmount AS gcd
         ON t.GuestCheckID == gcd.GuestCheckID
         LEFT JOIN
             @ancillaryAmountType AS aat
         ON gcd.AncillaryAmountTypeID == aat.AncillaryAmountTypeID
    WHERE aat.AncillaryAmountCategoryID IN(2, 4, 8);

OUTPUT @read
TO "/GuestCheckOutput/output.csv"
USING Outputters.Csv();

2 Answers

#1


2  

Indeed, U-SQL is strongly typed, and int and int? are different types. You would need to cast in an intermediate rowset:

@ancillaryAmountType2 =
SELECT (int?) aat.AncillaryAmountTypeID AS AncillaryAmountTypeID,
       aat.AncillaryAmountCategoryID,
       aat.CheckTitle
FROM @ancillaryAmountType AS aat;
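
Applied to the query in the question, the second join would then read from the cast rowset instead of the original one. This is only a sketch, assuming the EXTRACT statements from the question and the @ancillaryAmountType2 rowset above are in scope and the rest of the script is unchanged:

```usql
// gcd is on the null-supplying side of the first LEFT JOIN, so
// gcd.AncillaryAmountTypeID is treated as int? in the second join.
// Joining to @ancillaryAmountType2 makes both sides int?.
@read =
    SELECT t.POSCheckGUID,
           t.POSCheckNumber,
           t.CheckAmount,
           aat.AncillaryAmountTypeID,
           aat.CheckTitle,
           gcd.Amount
    FROM @guestCheck AS t
         LEFT JOIN
             @guestCheckAncillaryAmount AS gcd
         ON t.GuestCheckID == gcd.GuestCheckID
         LEFT JOIN
             @ancillaryAmountType2 AS aat
         ON gcd.AncillaryAmountTypeID == aat.AncillaryAmountTypeID
    WHERE aat.AncillaryAmountCategoryID IN(2, 4, 8);
```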

Or, better, use dimensional modeling best practice, and avoid nullable "dimensions" for the reasons stated in http://blog.chrisadamson.com/2013/01/avoid-null-in-dimensions.html.

#2


3  

This is not to do with the nullability of the columns as specified in the EXTRACT definition because, as the OP's code shows, neither of the join columns is declared nullable (i.e. with ?) there. It is to do with the multiple outer joins and what is known as a null-supplying table.

If you think about it logically, imagine you have three tables: TableA has three records, TableB has two records and TableC has one record, something like this:

[Image: example contents of TableA, TableB and TableC]

If you start with tableA and left outer join to tableB, you instinctively know you will get three records, but column x from tableB will be null for the tableA record that has no match. tableB is your null-supplying table, and this is where the nullability comes from.
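
The three-table scenario can be sketched with inline rowsets. The sample values, the rowset names (@a, @b, @c, @c2, @result) and the output path are all hypothetical, since the original image is not available; treat this as an illustration rather than a reproduction of the answer's tables:

```usql
// Hypothetical sample data: three, two and one record respectively.
@a = SELECT * FROM (VALUES (1), (2), (3)) AS t(x);
@b = SELECT * FROM (VALUES (1), (2)) AS t(x);
@c = SELECT * FROM (VALUES (1)) AS t(x);

// b.x is supplied by the null-supplying side of the first LEFT JOIN,
// so it is treated as int? when compared in the second join.
// Cast c.x to int? beforehand so both sides of the equijoin match.
@c2 =
    SELECT (int?) x AS x
    FROM @c;

@result =
    SELECT a.x AS ax,
           b.x AS bx,
           c.x AS cx
    FROM @a AS a
         LEFT JOIN @b AS b
         ON a.x == b.x
         LEFT JOIN @c2 AS c
         ON b.x == c.x;

OUTPUT @result
TO "/output/nullSupplying.csv"
USING Outputters.Csv();
```

Joining @c directly instead of @c2 in the second join is the pattern that should trigger the "different types" build error described in the question.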

Thankfully the fix is the same: change the nullability of the column earlier on, or specify substitute values, e.g. -1.

@t3 =
    SELECT (int?) x AS x, 2 AS a
    FROM dbo.tmpC;

// OR

// Use conditional operator to supply substitute values
@t3 =
    SELECT x == null ? -1 : x AS x, 2 AS a
    FROM dbo.tmpC;

However, there is another problem with your particular query. In most relational databases, a WHERE predicate on a column from the table on the right-hand side of a left outer join effectively converts the join to an inner join, because unmatched rows carry null in that column and fail the predicate. It's the same in U-SQL. You might want to think about the real result you are trying to get and consider rewriting your query.
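
One possible rewrite, if the intent is to keep every guest check, is to filter the right-hand rowset before the join rather than in the outer WHERE. This is a sketch only, assuming the EXTRACT statements from the question are in scope; @aatFiltered is a hypothetical rowset name:

```usql
// Apply the category filter up front and cast the join key to int?
// so it matches gcd.AncillaryAmountTypeID after the first LEFT JOIN.
@aatFiltered =
    SELECT (int?) aat.AncillaryAmountTypeID AS AncillaryAmountTypeID,
           aat.CheckTitle
    FROM @ancillaryAmountType AS aat
    WHERE aat.AncillaryAmountCategoryID IN (2, 4, 8);

// No outer WHERE on aat, so unmatched guest checks are kept
// with null values in the aat columns.
@read =
    SELECT t.POSCheckGUID,
           t.POSCheckNumber,
           t.CheckAmount,
           aat.AncillaryAmountTypeID,
           aat.CheckTitle,
           gcd.Amount
    FROM @guestCheck AS t
         LEFT JOIN
             @guestCheckAncillaryAmount AS gcd
         ON t.GuestCheckID == gcd.GuestCheckID
         LEFT JOIN
             @aatFiltered AS aat
         ON gcd.AncillaryAmountTypeID == aat.AncillaryAmountTypeID;
```

If only checks with a matching ancillary amount type are wanted, writing the joins as INNER JOIN states that intent directly.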

HTH
