How do I debug a SQL Server data issue that can't seem to be reproduced in any test environment?

I have a stored procedure in production that does two things: it updates one table and then inserts a record into another. The first step (the update) seems to occur, but by examining the data we've found instances where the second step did not. I have looked at the data and confirmed that it is not a data issue: the queries return the appropriate data, so under normal circumstances both statements should execute. I don't know if perhaps there is some sort of performance or blocking issue on the second step that prevents it from occurring.

The error handling for the stored procedure is as follows.

BEGIN TRY

   BEGIN TRANSACTION;

   -- perform update to data

   -- insert record into second table.

   IF ( @@ERROR = 0 AND @@TRANCOUNT > 0 )
      COMMIT TRANSACTION;

END TRY
BEGIN CATCH

   IF ( @@TRANCOUNT > 0 )
   BEGIN
      ROLLBACK TRANSACTION;
   END

   DECLARE @WebSafeErrorId INT;
   EXEC dbo.spErrorInsert @WebSafeErrorId OUTPUT, 'Proc';

   -- Reraise the error back to the client.
   IF ( @WebSafeErrorId != 0 )
   BEGIN
      DECLARE @Error VARCHAR(20);
      SET @Error = CAST( @WebSafeErrorId AS VARCHAR(20) );
      RAISERROR( @Error, 11, 1 );
   END
   ELSE
   BEGIN
      RAISERROR( 'An error has occurred but there is no error to log.', 11, 1 );
   END

END CATCH;

Surely if an error occurred in this procedure that caused the insert not to occur, it would be logged and then raised. The code for spErrorInsert is below.

CREATE PROCEDURE [dbo].[spErrorInsert]  
@ReturnErrorId INT OUTPUT  
, @ErrorSourceType VARCHAR(4) = NULL  
, @ParentErrorId INT = NULL  
, @StackTrace VARCHAR(MAX) = NULL  
AS  

SET NOCOUNT ON;  
--SET XACT_ABORT ON;  

-- Will indicate an error was not logged.  
SET @ReturnErrorID = 0;   

DECLARE  
  @ErrorSource VARCHAR(200)  
  , @ErrorMessage VARCHAR(MAX)  
  , @ComposedErrorMessage VARCHAR(MAX)  
  , @ErrorLine INT  
  , @ErrorSeverity INT  
  , @ErrorState INT  
  , @ErrorNumber INT;  

 SET @ErrorSource = ERROR_PROCEDURE();  
 SET @ErrorMessage = ERROR_MESSAGE();  
 SET @ErrorLine = ERROR_LINE();  
 SET @ErrorSeverity = ERROR_SEVERITY();  
 SET @ErrorState = ERROR_STATE();  
 SET @ErrorNumber = ERROR_NUMBER();  
 SET @ComposedErrorMessage = 'Message: Error occurred in procedure ' + CAST( @ErrorSource AS VARCHAR(MAX) )  
  + ' on line ' + CAST( @ErrorLine AS VARCHAR(MAX) )   
  + '. Error: ' + @ErrorMessage;  

BEGIN TRY  

   INSERT INTO Errors(  
      ParentId  
      , ErrorSourceType  
      , ErrorSource  
      , [Message]  
      , [LineNo]  
      , Severity  
      , Stacktrace  
      , ts)  
   VALUES (@ParentErrorId  
      , @ErrorSourceType --@ErrorSourceType --- NOTE: move this into a parameter ...   
      , @ErrorSource  
      , @ComposedErrorMessage  
      , @ErrorLine  
      , @ErrorState  
      , @Stacktrace  
      , GETDATE()  
      );  

  SET @ReturnErrorId = SCOPE_IDENTITY();  

END TRY  
BEGIN CATCH  

   RAISERROR( 'An error has occurred but there is no error to log.', 11, 1 );  

END CATCH;

I don't know if there is a way to get a snapshot of what's going on in the database at the specific time a certain procedure is called. I'm not sure how to determine why something isn't happening when it should. Are there any tools or SQL features I don't know about that I could use?

4 Solutions

#1

If you want to monitor the database, SQL Profiler is a good place to start, but note that it is deprecated.

Extended events are much more capable and I would suggest reading about those if you are really interested in monitoring what's going on.
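As a rough starting point, an Extended Events session along these lines could record errors and client cancellations while the procedure runs. This is only a sketch: the session name and file name are made up for illustration, and the severity filter is an assumption you may want to adjust.

```sql
-- Hypothetical session name and file name; adjust to your environment.
CREATE EVENT SESSION [TrackProcErrors] ON SERVER
ADD EVENT sqlserver.error_reported (
    -- Capture who ran what when an error of severity 11 or higher is raised.
    ACTION (sqlserver.session_id, sqlserver.sql_text, sqlserver.client_app_name)
    WHERE severity >= 11
),
ADD EVENT sqlserver.attention (
    -- Fires when a client cancels a request, for example on a command timeout.
    ACTION (sqlserver.session_id, sqlserver.sql_text)
)
ADD TARGET package0.event_file (SET filename = N'TrackProcErrors.xel')
WITH (STARTUP_STATE = OFF);
GO

-- Start collecting; stop later with STATE = STOP.
ALTER EVENT SESSION [TrackProcErrors] ON SERVER STATE = START;
```

The captured events can then be browsed in SSMS under Management > Extended Events > Sessions.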

As a thought, if your procedure code is using the same data to update the row as it is to insert to the other table, consider using OUTPUT.

UPDATE sometable                              -- placeholder table name
SET col1 = 'value'
OUTPUT inserted.col1 INTO othertable (col1)   -- placeholder target table
WHERE col3 = @stuff;                          -- placeholder filter value

OUTPUT and OUTPUT INTO

Or, if this is for some sort of audit or log table, you can use DELETED.col1, which holds the original value from before the update. Note that INSERTED returns the new value whether you are updating or inserting; it is called INSERTED in both cases.
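To make the idea concrete, here is a small self-contained sketch using temporary tables. The table and column names are invented for illustration and are not taken from the question.

```sql
-- Hypothetical tables for illustration only.
CREATE TABLE #Account (
    AccountId int PRIMARY KEY,
    Status    varchar(20) NOT NULL
);
CREATE TABLE #AccountAudit (
    AccountId int NOT NULL,
    OldStatus varchar(20) NOT NULL,
    NewStatus varchar(20) NOT NULL,
    ChangedAt datetime NOT NULL DEFAULT GETDATE()
);

INSERT INTO #Account (AccountId, Status)
VALUES (1, 'Pending'), (2, 'Pending');

-- One statement performs the update and writes the audit row,
-- so the two steps cannot drift apart.
UPDATE #Account
SET Status = 'Active'
OUTPUT deleted.AccountId, deleted.Status, inserted.Status
INTO #AccountAudit (AccountId, OldStatus, NewStatus)
WHERE AccountId = 1;

-- Returns one row: 1, Pending, Active
SELECT AccountId, OldStatus, NewStatus FROM #AccountAudit;

DROP TABLE #AccountAudit;
DROP TABLE #Account;
```

Because the update and the audit insert are a single statement, either both happen or neither does.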

#2

If you have a copy of Visual Studio, try it. It allows you to step through stored procedures.

#3

The approach I would try is to first take a copy of this procedure and comment out the try/catch. I have found that T-SQL does not raise an error to the caller if the failing code is inside a TRY block; control passes to the CATCH block instead, so unless the CATCH block re-raises it the error is swallowed. I guess this is the SQL equivalent of an exception being handled by a catch clause.
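A minimal sketch of that behaviour, using a deliberate divide-by-zero as the failing statement (the example is mine, not from the question):

```sql
BEGIN TRY
    SELECT 1 / 0;  -- raises error 8134 (divide by zero)
END TRY
BEGIN CATCH
    -- Without a re-raise, the caller sees no error, only this result set.
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;

    -- Uncomment to pass the original error on to the caller (SQL Server 2012+):
    -- THROW;
END CATCH;
```

Note that the procedure in the question does re-raise with RAISERROR in its CATCH block, so commenting out the try/catch mainly helps you see the original error number and line rather than the logged id.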

I use a table in my database to keep a permanent record of errors as they occur (a trick I learned doing embedded programming). The code that creates the errors table is:

-- Created in the jc_diagnostics schema so that it matches the procedure below.
CREATE TABLE jc_diagnostics.errors (
    id smallint NOT NULL IDENTITY(1, 1) PRIMARY KEY,
    errordesc nvarchar(max) NOT NULL,
    dateandtime smalldatetime NOT NULL, -- Date and time of last occurrence
    errorcount int NOT NULL);

My stored procedure for adding a record into the error table is:

CREATE PROCEDURE jc_diagnostics.jpwsp0005_adderrordescription(
    @Errordescription nvarchar(max))
AS
BEGIN
    DECLARE @Id smallint;

    -- Fall back to a placeholder when no description is supplied.
    IF (@Errordescription IS NULL OR @Errordescription = '')
    BEGIN
        SET @Errordescription = 'Error description missing';
    END;

    -- Look for an existing row with the same description.
    -- @Id stays NULL when no row matches.
    SELECT TOP (1) @Id = id
    FROM jc_diagnostics.errors
    WHERE errordesc = @Errordescription;

    IF (@Id IS NOT NULL)
    BEGIN
        -- Known error: bump the counter and refresh the timestamp.
        UPDATE jc_diagnostics.errors
        SET errorcount = errorcount + 1,
            dateandtime = CONVERT(smalldatetime, GETDATE())
        WHERE id = @Id;
    END
    ELSE
    BEGIN
        -- New error: insert it with a count of 1.
        INSERT INTO jc_diagnostics.errors (errordesc, dateandtime, errorcount)
        VALUES (@Errordescription, CONVERT(smalldatetime, GETDATE()), 1);

        SET @Id = SCOPE_IDENTITY();
    END;

    -- Return the id of the row that was updated or inserted.
    RETURN @Id;
END;

The calling code when an error occurs is:

DECLARE @Retval int;
DECLARE @Failuredesc nvarchar(max) = 'Description of error';
EXEC @Retval = jc_diagnostics.jpwsp0005_adderrordescription @Failuredesc;

The return value @Retval contains the id of the record in the error table, so you can look it up.

Finally, I would write some code that calls your procedure repeatedly until an error is recorded. You can then inspect the error table and see if it sheds any light on your problem.
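Such a loop might look roughly like the sketch below. The temporary procedure #ProcUnderTest is a hypothetical stand-in that fails on purpose on its 25th call; you would replace it with a call to your own procedure.

```sql
-- #ProcUnderTest is a hypothetical stand-in; replace it with your own procedure.
CREATE PROCEDURE #ProcUnderTest @Attempt int
AS
BEGIN
    -- Simulated intermittent failure on the 25th call.
    IF @Attempt = 25
        RAISERROR('Simulated failure', 16, 1);
END;
GO

DECLARE @Attempt int = 0,
        @MaxAttempts int = 1000,
        @Failed bit = 0;

-- Keep calling until something fails or the attempt limit is reached.
WHILE @Attempt < @MaxAttempts AND @Failed = 0
BEGIN
    SET @Attempt += 1;

    BEGIN TRY
        EXEC #ProcUnderTest @Attempt = @Attempt;
    END TRY
    BEGIN CATCH
        SET @Failed = 1;
        PRINT CONCAT('Attempt ', @Attempt, ' failed: ', ERROR_MESSAGE());
    END CATCH;
END;

DROP PROCEDURE #ProcUnderTest;
```

Since the symptom in the question is a missing row rather than a reported error, it may also be worth checking inside the loop that the second table received its row after each call, and stopping when it did not.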

Hope this helps. Jude

#4

Thinking logically: because you begin a transaction before these two steps, any error would roll back both of them. So most likely there is no error at all here. I would suggest inspecting your queries again, as the problem seems more likely to be in them than anywhere else. Please post the entire code if you would like more suggestions.
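The following sketch illustrates the point with two made-up temporary tables: the second step fails, the CATCH block rolls back, and the first step's update disappears with it.

```sql
-- Hypothetical tables standing in for the two steps of the procedure.
CREATE TABLE #Step1 (Id int PRIMARY KEY, Val varchar(10) NOT NULL);
CREATE TABLE #Step2 (Id int PRIMARY KEY, Note varchar(10) NOT NULL);

INSERT INTO #Step1 (Id, Val) VALUES (1, 'old');

BEGIN TRY
    BEGIN TRANSACTION;

    -- Step 1 succeeds.
    UPDATE #Step1 SET Val = 'new' WHERE Id = 1;

    -- Step 2 fails: NULL is not allowed in Note (error 515).
    INSERT INTO #Step2 (Id, Note) VALUES (1, NULL);

    COMMIT TRANSACTION;
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;
END CATCH;

-- Returns 'old': the update was rolled back together with the failed insert.
SELECT Val FROM #Step1;

DROP TABLE #Step2;
DROP TABLE #Step1;
```

So if the update is visible in production while the insert is missing, an error inside this transaction is an unlikely explanation.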

Regards

Roman
