Session Capture System

April 4, 2011

“Tom, there was a production problem yesterday at about 2.30pm for 20 minutes. Users complained about timeouts. What was going on in the database?”

This is a classic  “umm … (oh damn)” moment for a DBA. You weren’t running Profiler. You had no Perfmon running.Your boss won’t countenance spending several thousand big ones on 3rd party performance apps. You can’t go back in time. But you want to! What was running? What plans? What were they waiting on? Was there blocking?

Help is at hand.

It is quite easy to implement a session capture system. This will query the DMVs to capture current running process information, every (specified) X seconds, indefinitely. It will capture a lot of data but will auto-prune itself to prevent excessive data bloat. And having it will mean you can precisely answer that formerly embarrassing question: “well, yesterday at 2.30pm, the following was happening … “. You will look good, prepared, clever, proactive, and in charge. Your users, your boss, your developers – your business – will love you.

This is how you do it.

1. Create a utitlity DBA database on your Prod server. It will need 10 – 15 GB disk space. Put it in the SIMPLE recovery mode. Ensure it is being backed up regularly.

2. Create the following table:-


CREATE TABLE [dbo].[session_capture](

[date_captured] [datetime] NULL,

[session_id] [smallint] NULL,

[database_id] [smallint] NULL,

[host_name] [nvarchar](256) NULL,

[command] [nvarchar](32) NULL,

[sql_text] [xml] NULL,

[wait_type] [nvarchar](120) NULL,

[wait_time] [int] NULL,

[wait_resource] [nvarchar](512) NULL,

[last_wait_type] [nvarchar](120) NULL,

[total_elapsed_time] [int] NULL,

[blocking_session_id] [int] NULL,

[blocking_text] [xml] NULL,

[start_time] [datetime] NULL,

[login_name] [nvarchar](256) NULL,

[program_name] [nvarchar](128) NULL,

[login_time] [datetime] NULL,

[last_request_start_time] [datetime] NULL,

[last_request_end_time] [datetime] NULL,

[transaction_id] [bigint] NULL,

[transaction_isolation_level] [smallint] NULL,

[open_transaction_count] [int] NULL,

[totalReads] [bigint] NULL,

[totalWrites] [bigint] NULL,

[totalCPU] [int] NULL,

[writes_in_tempdb] [bigint] NULL,

[sql_plan] [xml] NULL

) ON [PRIMARY]

CREATE CLUSTERED INDEX [IDX1] ON [dbo].[session_capture]

(

[date_captured] ASC,

[session_id] ASC

)

CREATE NONCLUSTERED INDEX [IDX2] ON [dbo].[session_capture]

(

[date_captured] ASC,

[wait_time] ASC,

[blocking_session_id] ASC

)

CREATE NONCLUSTERED INDEX [IDX3] ON [dbo].[session_capture]

(

[database_id] ASC,

[wait_time] ASC,

[wait_type] ASC,

[host_name] ASC

)


3. Create the following 2 Stored Procedures in your utility DB:-


CREATE PROC [dbo].[SessionCapture]

@TargetSizeMB     bigint = 4000,                -- Purge oldest 20% of data if SessionCapture table is larger than this size (MB)

@Threshold        int = 15000,                  -- Alert on blocking that has wait-time greater than threshold (millisecs)

@Recipients       varchar(500) = NULL,    -- Who will receive blocking alert emails. Separate email addresses with ; Set to NULL to disable alerting

@MailInterval     int = 60                      -- Minimum interval between alert emails (secs)

AS

SET NOCOUNT ON

DECLARE @mailsubject    varchar(50),

@tableHTML        nvarchar(max),

@LastEmail        datetime

SET @LastEmail = GETDATE()

-- Loop indefinitely every 10 seonds

WHILE 1=1

BEGIN

BEGIN TRY

INSERT DBAdata.dbo.session_capture

SELECT

GETDATE(),

x.session_id,

x.database_id,

x.host_name,

x.command,

(

SELECT

text AS [text()]

FROM sys.dm_exec_sql_text(x.sql_handle)

FOR XML PATH(''), TYPE

) AS sql_text,

x.wait_type,

x.wait_time,

x.wait_resource,

x.last_wait_type,

x.total_elapsed_time,

COALESCE(x.blocking_session_id, 0) AS blocking_session_id,

(

SELECT

p.text

FROM

(

SELECT

MIN(sql_handle) AS sql_handle

FROM sys.dm_exec_requests r2

WHERE

r2.session_id = x.blocking_session_id

) AS r_blocking

CROSS APPLY

(

SELECT

text AS [text()]

FROM sys.dm_exec_sql_text(r_blocking.sql_handle)

FOR XML PATH(''), TYPE

) p (text)

) AS blocking_text,

x.start_time,

x.login_name,

x.program_name,

x.login_time,

x.last_request_start_time,

x.last_request_end_time,

x.transaction_id,

x.transaction_isolation_level,

x.open_transaction_count,

x.totalReads,

x.totalWrites,

x.totalCPU,

x.writes_in_tempdb

,(select query_plan from sys.dm_exec_query_plan(x.plan_handle)) as sql_plan

FROM

(

SELECT

r.session_id,

r.database_id,

s.host_name,

s.login_name,

s.program_name,

s.login_time,

r.start_time,

r.sql_handle,

r.plan_handle,

r.blocking_session_id,

r.wait_type,

r.wait_resource,

r.wait_time,

r.last_wait_type,

r.total_elapsed_time,

r.transaction_id,

r.transaction_isolation_level,

r.open_transaction_count,

r.command,

s.last_request_start_time,

s.last_request_end_time,

SUM(r.reads) AS totalReads,

SUM(r.writes) AS totalWrites,

SUM(r.cpu_time) AS totalCPU,

SUM(tsu.user_objects_alloc_page_count + tsu.internal_objects_alloc_page_count) AS writes_in_tempdb

FROM sys.dm_exec_requests r

JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id

JOIN sys.dm_db_task_space_usage tsu ON s.session_id = tsu.session_id and r.request_id = tsu.request_id

WHERE r.status IN ('running', 'runnable', 'suspended')

AND r.session_id <> @@spid

AND r.session_id > 50

GROUP BY

r.session_id,

r.database_id,

s.host_name,

s.login_name,

s.login_time,

s.program_name,

r.start_time,

r.sql_handle,

r.plan_handle,

r.blocking_session_id,

r.wait_type,

r.wait_resource,

r.wait_time,

r.last_wait_type,

r.total_elapsed_time,

r.transaction_id,

r.transaction_isolation_level,

r.open_transaction_count,

r.command,

s.last_request_start_time,

s.last_request_end_time

) x

-- If email recipients set and blocking above threshold then email blocking alert

-- But don't send another email within @MailInterval

IF @Recipients IS NOT NULL AND DATEDIFF (ss, @LastEmail, GETDATE()) > @MailInterval

BEGIN

IF EXISTS (

SELECT 1

FROM session_capture sc

WHERE sc.date_captured = (SELECT MAX(Date_Captured) FROM Session_Capture)

AND wait_time > @Threshold

AND blocking_session_id <> 0

)

BEGIN

SELECT @MailSubject =  'Prod Blocking Alert'

SET @tableHTML =

N'<H5 style="font-family:verdana"> Blocking and blocked sessions</H5>' +

N'<table width = 1300 border="1" style="font-family:verdana;font-size:60%">' +

N'<tr><th width = 100>Session_ID</th><th width = 100>Blocking_Session</th><th width = 100>Wait_Time</th><th width = 100>Hostname</th>' +

N'<th width = 100>NT_Username</th><th width = 100>DB_Name</th><th width = 500>Text</th>' +

N'<th width = 100>Current_Command</th><th width = 100>Sequence</th></tr>'

+

-- Query blocked and lead blocker session data

CAST ( ( SELECT DISTINCT

td = sp.spid,'     ',

td = sp.blocked,'     ',

td = sp.waittime,'     ',

td = sp.hostname,'     ',

td = sp.nt_username + ' ','     ',

td = sd.name,'     ',

td = cast(st.text as nvarchar(500)),'     ',

td = case when sp.stmt_start > 0 then

case when sp.stmt_end > sp.stmt_start then

substring(st2.text, 1 + sp.stmt_start/2, 1 + (sp.stmt_end/2)-(sp.stmt_start/2))

else

substring(st2.text, 1 + sp.stmt_start/2, len(st2.text))

end

else ''

end, '     ',

td =  case sp.waittime

when 0 then 999999

else sp.waittime

end, '     '

from sys.sysprocesses (nolock) sp

left outer join sys.sysprocesses (nolock) s2

on sp.spid = s2.blocked

inner join sys.sysdatabases (nolock) sd

on sp.dbid = sd.dbid

left outer join sys.dm_exec_requests er

on sp.spid = er.session_id

cross apply

( SELECT text AS [text()]

FROM sys.dm_exec_sql_text(sp.sql_handle)

) st (text)

cross apply

sys.dm_exec_sql_text(sp.sql_handle) st2

WHERE ((s2.spid IS NOT NULL AND sp.blocked = 0) OR sp.blocked <> 0)

AND sp.SPID > 50

and (s2.ecid is null or s2.ecid = 0)

order by case sp.waittime

when 0 then 999999

else sp.waittime

end desc

FOR XML PATH('tr'), TYPE

) AS NVARCHAR(MAX) ) +

N'</table>'

EXEC msdb.dbo.sp_send_dbmail

@recipients = @Recipients,

@subject = @MailSubject,

@body = @TableHTML,

@body_format = 'HTML'

SET @LastEmail = GETDATE()

END

END

-- Check table size, if > @TargetSizeMB then purge old data

IF (SELECT  SUM(reserved_page_count) FROM sys.dm_db_partition_stats where object_id = object_id('session_capture')) > @TargetSizeMB*1024*1024/8192

BEGIN

-- shrink table size to 80% of @TargetSizeMB

DECLARE @totalrows bigint, @totalpages bigint

SELECT  @totalpages = SUM(reserved_page_count),

@totalrows = SUM(CASE WHEN (index_id < 2) THEN row_count ELSE 0 END)

FROM sys.dm_db_partition_stats ps

WHERE object_id = object_id('session_capture')

-- Calculate how many rows to delete to be left with 80% of TargetSize

SELECT @totalrows = @totalrows - (@totalrows * @TargetSizeMB*1024*1024/8192*0.8 / @totalpages)

DELETE sc

FROM (SELECT TOP (@totalrows) *

FROM SESSION_CAPTURE

ORDER BY date_captured) sc

END

WAITFOR DELAY '00:00:10'

END TRY

BEGIN CATCH

WAITFOR DELAY '00:00:10'

END CATCH

END

GO

CREATE PROC [dbo].[SessionCapturePurge]

@RetentionDays int = 14

AS

DELETE

FROM session_capture

WHERE date_captured < DATEADD(dd, - @RetentionDays, GETDATE())

GO

And create this one in master:-


USE [master]
GO

/****** Object: StoredProcedure [dbo].[SessionCapture] Script Date: 04/05/2011 10:21:03 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE procedure [dbo].[SessionCapture]
as
exec DBAdata.dbo.SessionCapture
@TargetSizeMB = 4000,
@Threshold = 20000,
@Recipients = 'yourname@youremaildomain.com',
@MailInterval = 900

GO

4. Mark the master.dbo.SessionCapture as a startup proc, by executing the following 2 SPs:-

exec sp_configure 'scan for startup procs', 1
RECONFIGURE WITH OVERRIDE

exec sp_procoption 'SessionCapture', 'startup', 1

5. Create an SQL Server Agent job to daily prune the old data.

Schedule the SessionCapturePurge SP every day, to prune (at least) the last 7 days’ capture.

This data & its collection can be very helpful  Firstly it caqn alert you (by email/database mail) to blocking. It can be very useful in analyzing long-running SQL batches, and blocking, historically. Also, it can be very helpful for analyzing wait stats in hindsight. I will add some example queries for analyzing your collected data in the next few days.

2 Responses to “Session Capture System”

  1. Charles Says:

    Umm, DBAdmin you say. You need more underscores…..

  2. thomaspullen Says:

    Er I’m not very good on the underscores! Philip? Jonathan? To,?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s