
I'm having a problem with Event Notifications. On the receiving machine/drive/database (the one the messages are sent to), the drive filled up when nobody was looking, so messages have been backing up all day.

Now that we've freed up space on the drive, it's accepting messages into the queue, but it doesn't seem to be processing them - no new records are being inserted, even though the queue now holds 22 million messages and is still growing(!). The queue IS enabled:

is_activation_enabled = 1
is_receive_enabled = 1
is_enqueue_enabled = 1

I see the activated SP in activation_procedure, but when I look in sp_WhoIsActive, I don't see any active readers.

Before I blow out the drive again - what am I doing wrong? How can I get it to either process or flush the messages? Thanks in advance.
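For reference, the status flags and reader check above came from queries along these lines (the queue name is mine; the catalog views and DMVs are standard):

```sql
-- Queue status flags (the three is_*_enabled columns above)
SELECT name, is_activation_enabled, is_receive_enabled, is_enqueue_enabled,
       activation_procedure
FROM sys.service_queues
WHERE name = 'ENAudit_SBQueue';

-- Is the queue monitor in a state where it would activate anything?
SELECT qm.state, qm.last_empty_rowset_time, qm.last_activated_time, qm.tasks_waiting
FROM sys.dm_broker_queue_monitors qm
JOIN sys.service_queues sq ON sq.object_id = qm.queue_id
WHERE sq.name = 'ENAudit_SBQueue';

-- Any activated readers running right now?
SELECT spid, procedure_name
FROM sys.dm_broker_activated_tasks;
```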

Update

One thought - since is_enqueue_enabled is set, maybe it's storing all the messages until it can process them? If so, can I safely turn that off?

CREATE PROCEDURE [dbo].[Parse_EN_Messages]
AS
--mdb 2012/09/05 version 1.2  
-- With apologies and thanks to Remus Rusanu, Jonathon Kehayias, Mladen Prajdic, and Jasper Smith for writing
-- about EN, answering questions, and getting the word out about this awesome feature of SQL Server 2005+.
-- Also thanks to Mikael Eriksson for a faster parse with the XML filter.
-- Their code modified, combined, and used below.  Any errors herein are mine, not theirs.  
-- Part of the code came from MVP Deep Dives Vol 1 Chapter 28 (Mladen), PASS Presentations by Jasper and Jonathon,
-- and Stackexchange (below) from Remus and Mikael Eriksson
-- http://dba.stackexchange.com/questions/10273/how-to-create-an-event-notification-that-runs-a-job-procedure-when-mirroring-sta
-- http://stackoverflow.com/questions/12308099/t-sql-dynamically-filter-xml-on-multiple-conditions/12358926
--History:  1.00 2012/08/27 first release
--          1.01 2012/09/05 added server-based exclusions and eventsubclass = 0
--          1.1  2012/09/17 added exclusion_sets which allow multi-condition filtering and improved performance
--          1.11 2012/10/05 removing the 1=1 as per suggestion by Rusanu; 
--                              this could cause it to spin forever, blowing out the error_log..and the drive.
--          1.12 2012/11/14 adding a RETURN in the @@ROWCOUNT. It fails to exit and then hits a COMMIT, causing records
--                              in enaudit_error.  That was due to the 1.11 change where I no longer use a 1=1.
--          1.13 2014/01/16 changing ERRORLOG to write the first 500 chars to the CommandText field, as tested in Canada.

SET NOCOUNT ON 
DECLARE @message_type NVARCHAR(256),
@message VARBINARY(MAX),
@conversation_handle UNIQUEIDENTIFIER,
@auditdata XML,
@queuing_order BIGINT,
@conversation_group_id UNIQUEIDENTIFIER

BEGIN
    BEGIN TRANSACTION;
    BEGIN TRY;
    WAITFOR (
        RECEIVE TOP(1)
        @conversation_handle = [conversation_handle], --aka dialog
        @conversation_group_id = [conversation_group_id],
        @message_type = message_type_name,
        @message = message_body, 
        @queuing_order = queuing_order
        FROM dbo.ENAudit_SBQueue --ORDER BY queuing_order --order by doesn't work there.
        ), TIMEOUT 5000 --we need the timeout so that it won't hold transactions open indefinitely.

    IF (@@ROWCOUNT = 0)
    BEGIN
        ROLLBACK TRANSACTION
  --mdb 1.12 2012/11/14 adding a return as otherwise it tries to commit later and fails, causing records in enaudit_error      
        RETURN 
    END

    SELECT @auditdata = CAST(@message AS XML)

    IF @message_type = N'http://schemas.microsoft.com/SQL/Notifications/EventNotification'
    -- Dynamically shred the XML and compare to our exclusion table.  You should be able to filter on any field.
    -- Exclusion set: unique char(2) name. Has to match every condition.  Servername is one of the fields handled.
    -- Be careful as the filters could impact performance.
    and NOT EXISTS --if all active members of the same exclusion_set match, the event is excluded.
    (
    SELECT * FROM 
    (
    select COUNT(*) AS match_count, exclusion_set
                  from enaudit_exclusion_list 
                  where exists (
                               select *
                               from (
                                    select X.N.value('local-name(.)', 'varchar(128)') as NodeName,
                                           X.N.value('./text()[1]', 'varchar(max)') as NodeValue
                                    from @auditdata.nodes('//*') as X(N)
                                    ) T
                               where T.NodeName = enaudit_exclusion_list.exclusion_type and
                                     T.NodeValue like enaudit_exclusion_list.excluded_value 
                                     AND  enaudit_exclusion_list.active = 1
                               )
    GROUP BY exclusion_set
    ) matches_per_set
    INNER JOIN 
    (SELECT COUNT(*) AS total_count, exclusion_set FROM enaudit_exclusion_list WHERE active = 1 GROUP BY exclusion_set) grouped_set
    ON match_count = total_count
    AND grouped_set.exclusion_set = matches_per_set.exclusion_set
    )
    BEGIN
    INSERT INTO ENAudit_Events
            ( ServerName ,
                queuing_order ,
                PostTime ,
                StartTime,
                EventType ,
                SPID ,
                LoginName ,
                UserName ,
                DatabaseName ,
                SchemaName ,
                ObjectName ,
                ObjectType ,
                TargetObjectName ,
                TargetObjectType ,
                CommandText ,
                insert_datetime,
                message_body_xml
            )
--over 128 elements exist, I've chosen the most useful for what I'm doing.
--To get a full list, GROUP BY in the XSD from http://schemas.microsoft.com/sqlserver/2006/11/eventdata/events.xsd
-- More information in EVENTDATA http://msdn.microsoft.com/en-us/library/ms187909.aspx
    SELECT 
        @auditdata.value('(/EVENT_INSTANCE/ServerName)[1]', 'varchar(128)' ) AS ServerName,
        @queuing_order AS Queuing_Order, 
        @auditdata.value('(/EVENT_INSTANCE/PostTime)[1]', 'datetime') AS PostTime,
        @auditdata.value('(/EVENT_INSTANCE/StartTime)[1]', 'datetime') AS StartTime,
        @auditdata.value('(/EVENT_INSTANCE/EventType)[1]', 'varchar(128)' ) as EventType,
        @auditdata.value('(/EVENT_INSTANCE/SPID)[1]', 'bigint') AS SPID,
        @auditdata.value('(/EVENT_INSTANCE/LoginName)[1]', 'varchar(128)' ) AS LoginName,
        @auditdata.value('(/EVENT_INSTANCE/UserName)[1]', 'varchar(128)' ) AS UserName,
        @auditdata.value('(/EVENT_INSTANCE/DatabaseName)[1]', 'varchar(128)' ) AS DatabaseName,
        @auditdata.value('(/EVENT_INSTANCE/SchemaName)[1]', 'varchar(128)' ) AS SchemaName,
        @auditdata.value('(/EVENT_INSTANCE/ObjectName)[1]', 'varchar(128)' ) AS ObjectName,
        @auditdata.value('(/EVENT_INSTANCE/ObjectType)[1]', 'varchar(128)' ) AS ObjectType,
        @auditdata.value('(/EVENT_INSTANCE/TargetObjectName)[1]', 'varchar(128)' ) AS TargetObjectName,
        @auditdata.value('(/EVENT_INSTANCE/TargetObjectType)[1]', 'varchar(128)' ) AS TargetObjectType,
        --@auditdata.value('(/EVENT_INSTANCE/PropertyName)[1]', 'varchar(128)' ) AS PropertyName,
        --@auditdata.value('(/EVENT_INSTANCE/PropertyValue)[1]', 'varchar(128)' ) AS PropertyValue,
        --@auditdata.value('(/EVENT_INSTANCE/Parameters)[1]', 'varchar(128)' ) AS Parameters,
        CASE @auditdata.value('(/EVENT_INSTANCE/EventType)[1]', 'varchar(128)' )
        WHEN 'ERRORLOG' THEN @auditdata.value('/EVENT_INSTANCE[1]/TextData[1]', 'varchar(500)')
        ELSE @auditdata.value('(/EVENT_INSTANCE/TSQLCommand/CommandText)[1]', 'varchar(max)' ) END AS CommandText,
        GETDATE(),
        @auditdata 
    --Other possibilities for doing a dynamic XML query?
    --      http://www.dotnetgenerics.com/Modules/TricksAndTips/SQLServer/DynamicWhereClause.aspx
    --      http://www.beefycode.com/post/Expressing-Filter-Queries-as-XML-in-SQL-Server.aspx
    --      http://stackoverflow.com/questions/923136/t-sql-filtering-on-dynamic-name-value-pairs
    --      http://stackoverflow.com/questions/1729973/filter-sql-queries-on-the-xml-column-using-xpath-xquery
    END         
    else 
    IF @message_type = N'http://schemas.microsoft.com/SQL/ServiceBroker/Error' --log error messages
    BEGIN 
        WITH XMLNAMESPACES ('http://schemas.microsoft.com/SQL/ServiceBroker/Error' AS ssb)
        INSERT INTO ENAudit_Errors ([conversation_group_id], [conversation_handle], 
        [queuing_order], error_code, error_description, insert_datetime, message_body_raw)
        SELECT @conversation_group_id, @conversation_handle, @queuing_order,
                @auditdata.value('(//ssb:Error/ssb:Code)[1]', 'INT') AS error_code,
                @auditdata.value('(//ssb:Error/ssb:Description)[1]', 'NVARCHAR(4000)') AS error_description,
                GETDATE(), 
                @message

         end conversation @conversation_handle --close the conversation if there was an error
    END
    ELSE 
    IF @message_type =  N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
        begin
            end conversation @conversation_handle;
        end
    COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
    declare @xact_state int = xact_state(), 
            @error_number int = error_number(), 
            @error_message nvarchar(4000) = error_message(),
            @has_rolled_back bit = 0;
        if @xact_state = -1
        begin
            -- Doomed transaction, it must rollback
            rollback;
            set @has_rolled_back = 1;
        end
        else if @xact_state = 0
        begin
            -- transaction was already rolled back (deadlock?)
            set @has_rolled_back = 1;
        end
        insert INTO ENAudit_Errors(
            insert_datetime,
            error_code,
            error_description,
            message_body_raw)
        values (
            getdate(),
            @error_number,
            @error_message,
            @message);
        if (@has_rolled_back = 0)
        begin
            commit;
        end
    end catch
 END




GO
mbourgon
  • So, from the description of your problem, you're using internal activation. Is there anything listed in sys.dm_broker_activated_tasks? What does your activation procedure look like? Are you able to post it here? It could be that the procedure is trying to process all the messages at once (i.e. in one transaction) which could be bad. Also, there's nothing that prevents you from running the activation by hand yourself. – Ben Thul Nov 16 '15 at 03:14
  • Queue is probably disabled due to a poison message (too many failures). Please query sys.transmission_queue and tell us what errors it is reporting. If my suspicion is correct, then you should try to restart the queue using an ALTER QUEUE statement. – Robert L Davis Nov 16 '15 at 04:14
  • Comment first for @BenThul : Nothing in sys.dm_broker_activated tasks. It's not trying to process all at once - it's doing one at a time. ("receive top (1) [...]from EnAudit_SBQueue"). And I can't run while queue is disabled. And if queue is enabled, it's growing fast enough (10mb every couple seconds) that it's going to blow out the drive again, even with 4 spids running the processing in a loop. – mbourgon Nov 16 '15 at 04:22
  • @RobertLDavis it's not poisoned message - checked my table. I have a monitor checking for "queue disabled", too, and it's fine. It'll turn on and accept messages just fine, though. just won't process them. Sys.transmission_queue is empty. – mbourgon Nov 16 '15 at 04:24
  • @BenThul okay, when the queue is on, after a minute I do see the 5 readers I should have. However, when I tried running a test for a couple of minutes, where I enabled the queue and manually ran 4 loops of the activated SP, I only wound up with 32 rows inserted into the final table. Granted, not all rows make it in, but it ran enabled for several hours without any rows making it in - which seems odd, right? If I run it by hand, why does it process so many more than the activated SP? My only idea is that it's so swamped catching, it doesn't have time to process (or is blocked on RECEIVE). – mbourgon Nov 16 '15 at 04:42
  • @shawnmelton thanks for tweaking the formatting. – mbourgon Nov 16 '15 at 04:47
  • Just because you're doing a receive top(1) doesn't mean that it's doing them one at a time. That is, if you do something like: begin tran; while(1=1)begin; receive top(1)…; if (@@rowcount = 0) break; <<do processing>>;end; commit you'd effectively be trying to process all the messages in one transaction. Again, I think seeing the proc would be helpful. – Ben Thul Nov 16 '15 at 05:21
  • @BenThul added the Activated Stored Procedure code to the question. FWIW, I don't use 1=1. I spoke with Remus Rusanu about it a couple of years back, and he spoke on how there's no reason whatsoever to use it, and I modified mine accordingly. That being said, I probably ought to look at changing it to perform batches of 100/1000, for speed. But until it got backed up, it had been working like a champ, so I was loathe to touch it. – mbourgon Nov 16 '15 at 15:28
  • I disagree that the loop is unnecessary for exactly the situation that you're in now. According to this BOL article (https://technet.microsoft.com/en-us/library/ms171601%28v=sql.105%29.aspx), activation is not likely being triggered often enough. Which is why, when it is triggered, you want each procedure to keep processing messages until there aren't any left to process. Just my two cents. In order to dig yourself out of the hole you're in now, you could do something like while exists (select 1 from dbo.ENAudit_SBQueue) begin exec [dbo].[Parse_EN_Messages]; end in SSMS to fake it. – Ben Thul Nov 16 '15 at 17:20
  • Have a look at http://rusanu.com/2008/08/03/understanding-queue-monitors/ and see if it applies – Remus Rusanu Nov 16 '15 at 18:01
  • @BenThul when we were having the problem i manually ran 4 SPIDs like you suggested. In 2 or 3 minutes they wrote a total of 32 rows to my final table. Granted, that's more than the activated SP "attached" to the queue were doing, so that hints at something else amiss. Apparently I'm going to have to try and duplicate it. – mbourgon Nov 17 '15 at 23:24
  • @RemusRusanu I'm not sure - I met all 4 of the criteria at the time (I was checking sys.service_queues, though not dm_broker_queue_monitors), and the activated SP had been running fine for months before. I do remember checking sp_whoisactive, which normally shows a couple spids as active, and it didn't have anything. Guess I need to reproduce the problem and check queue_monitors. I do know that at one point I tried turning off and on the queue, so maybe it was stuck in that state. Will try and duplicate. – mbourgon Nov 17 '15 at 23:40
  • Troubleshoot it like any other slow stored procedure. Find out which statement(s) is/are slow in the proc then see what they're waiting on. It could have nothing to do with service broker or everything to do with it. But until you profile and get data, it's just guessing. – Ben Thul Nov 18 '15 at 05:05
  • By way of coincidence, I'm implementing my own SB solution right now. I found that by taking any transactions out of the activation procedure that performance jumped dramatically. As a compromise, I implemented a "dead letter" table. I wrapped my processing logic in a gigantic "try/catch" and if there are any problems, I throw the message in that table. But processing the queue marches forward (i.e. no rollbacks). – Ben Thul Nov 20 '15 at 19:58

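The manual drain that Ben Thul suggests in the comments can be sketched as follows (run from SSMS, possibly in several sessions at once; queue and procedure names as in the question). Each call to the proc receives and commits one message, so the loop keeps chewing through the backlog until the queue is empty:

```sql
-- Manual drain: keep invoking the activation proc while messages remain.
WHILE EXISTS (SELECT 1 FROM dbo.ENAudit_SBQueue)
BEGIN
    EXEC dbo.Parse_EN_Messages;
END
```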
1 Answer


For the TCP endpoints, on both sides, check the service account being used and its CONNECT permission. Try re-granting the permission, then stop and restart the endpoints, even if the GUI or a DMV says they are started.
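A minimal sketch of the above, assuming a Service Broker endpoint; the endpoint and account names here are placeholders, not taken from the question:

```sql
-- Placeholder names: substitute your actual endpoint and service account.
GRANT CONNECT ON ENDPOINT::BrokerEndpoint TO [DOMAIN\SqlServiceAccount];

-- Bounce the endpoint even if sys.endpoints or the GUI reports it as STARTED.
ALTER ENDPOINT BrokerEndpoint STATE = STOPPED;
ALTER ENDPOINT BrokerEndpoint STATE = STARTED;
```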