Session Capture System

“Tom, there was a production problem yesterday at about 2.30pm for 20 minutes. Users complained about timeouts. What was going on in the database?”

This is a classic  “umm … (oh damn)” moment for a DBA. You weren’t running Profiler. You had no Perfmon running.Your boss won’t countenance spending several thousand big ones on 3rd party performance apps. You can’t go back in time. But you want to! What was running? What plans? What were they waiting on? Was there blocking?

Help is at hand.

It is quite easy to implement a session capture system. This will query the DMVs to capture current running process information, every (specified) X seconds, indefinitely. It will capture a lot of data but will auto-prune itself to prevent excessive data bloat. And having it will mean you can precisely answer that formerly embarrassing question: “well, yesterday at 2.30pm, the following was happening … “. You will look good, prepared, clever, proactive, and in charge. Your users, your boss, your developers – your business – will love you.

This is how you do it.

1. Create a utility DBA database on your Prod server. It will need 10 – 15 GB disk space. Put it in the SIMPLE recovery mode. Ensure it is being backed up regularly.

2. Create the following table:-


CREATE TABLE [dbo].[session_capture](

[date_captured] [datetime] NULL,

[session_id] [smallint] NULL,

[database_id] [smallint] NULL,

[host_name] [nvarchar](256) NULL,

[command] [nvarchar](32) NULL,

[sql_text] [xml] NULL,

[wait_type] [nvarchar](120) NULL,

[wait_time] [int] NULL,

[wait_resource] [nvarchar](512) NULL,

[last_wait_type] [nvarchar](120) NULL,

[total_elapsed_time] [int] NULL,

[blocking_session_id] [int] NULL,

[blocking_text] [xml] NULL,

[start_time] [datetime] NULL,

[login_name] [nvarchar](256) NULL,

[program_name] [nvarchar](128) NULL,

[login_time] [datetime] NULL,

[last_request_start_time] [datetime] NULL,

[last_request_end_time] [datetime] NULL,

[transaction_id] [bigint] NULL,

[transaction_isolation_level] [smallint] NULL,

[open_transaction_count] [int] NULL,

[totalReads] [bigint] NULL,

[totalWrites] [bigint] NULL,

[totalCPU] [int] NULL,

[writes_in_tempdb] [bigint] NULL,

[sql_plan] [xml] NULL

) ON [PRIMARY]

CREATE CLUSTERED INDEX [IDX1] ON [dbo].[session_capture]

(

[date_captured] ASC,

[session_id] ASC

)

CREATE NONCLUSTERED INDEX [IDX2] ON [dbo].[session_capture]

(

[date_captured] ASC,

[wait_time] ASC,

[blocking_session_id] ASC

)

CREATE NONCLUSTERED INDEX [IDX3] ON [dbo].[session_capture]

(

[database_id] ASC,

[wait_time] ASC,

[wait_type] ASC,

[host_name] ASC

)
3. Create the following 2 Stored Procedures in your utility DB:-


CREATE PROC [dbo].[SessionCapture]

@TargetSizeMB     bigint = 4000,                — Purge oldest 20% of data if SessionCapture table is larger than this size (MB)

@Threshold        int = 15000,                  — Alert on blocking that has wait-time greater than threshold (millisecs)

@Recipients       varchar(500) = NULL,    — Who will receive blocking alert emails. Separate email addresses with ; Set to NULL to disable alerting

@MailInterval     int = 60                      — Minimum interval between alert emails (secs)

AS

SET NOCOUNT ON

DECLARE @mailsubject    varchar(50),

@tableHTML        nvarchar(max),

@LastEmail        datetime

SET @LastEmail = GETDATE()

— Loop indefinitely every 10 seonds

WHILE 1=1

BEGIN

BEGIN TRY

INSERT DBAdata.dbo.session_capture

SELECT

GETDATE(),

x.session_id,

x.database_id,

x.host_name,

x.command,

(

SELECT

text AS [text()]

FROM sys.dm_exec_sql_text(x.sql_handle)

FOR XML PATH(”), TYPE

) AS sql_text,

x.wait_type,

x.wait_time,

x.wait_resource,

x.last_wait_type,

x.total_elapsed_time,

COALESCE(x.blocking_session_id, 0) AS blocking_session_id,

(

SELECT

p.text

FROM

(

SELECT

MIN(sql_handle) AS sql_handle

FROM sys.dm_exec_requests r2

WHERE

r2.session_id = x.blocking_session_id

) AS r_blocking

CROSS APPLY

(

SELECT

text AS [text()]

FROM sys.dm_exec_sql_text(r_blocking.sql_handle)

FOR XML PATH(”), TYPE

) p (text)

) AS blocking_text,

x.start_time,

x.login_name,

x.program_name,

x.login_time,

x.last_request_start_time,

x.last_request_end_time,

x.transaction_id,

x.transaction_isolation_level,

x.open_transaction_count,

x.totalReads,

x.totalWrites,

x.totalCPU,

x.writes_in_tempdb

,(select query_plan from sys.dm_exec_query_plan(x.plan_handle)) as sql_plan

FROM

(

SELECT

r.session_id,

r.database_id,

s.host_name,

s.login_name,

s.program_name,

s.login_time,

r.start_time,

r.sql_handle,

r.plan_handle,

r.blocking_session_id,

r.wait_type,

r.wait_resource,

r.wait_time,

r.last_wait_type,

r.total_elapsed_time,

r.transaction_id,

r.transaction_isolation_level,

r.open_transaction_count,

r.command,

s.last_request_start_time,

s.last_request_end_time,

SUM(r.reads) AS totalReads,

SUM(r.writes) AS totalWrites,

SUM(r.cpu_time) AS totalCPU,

SUM(tsu.user_objects_alloc_page_count + tsu.internal_objects_alloc_page_count) AS writes_in_tempdb

FROM sys.dm_exec_requests r

JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id

JOIN sys.dm_db_task_space_usage tsu ON s.session_id = tsu.session_id and r.request_id = tsu.request_id

WHERE r.status IN (‘running’, ‘runnable’, ‘suspended’)

AND r.session_id <> @@spid

AND r.session_id > 50

GROUP BY

r.session_id,

r.database_id,

s.host_name,

s.login_name,

s.login_time,

s.program_name,

r.start_time,

r.sql_handle,

r.plan_handle,

r.blocking_session_id,

r.wait_type,

r.wait_resource,

r.wait_time,

r.last_wait_type,

r.total_elapsed_time,

r.transaction_id,

r.transaction_isolation_level,

r.open_transaction_count,

r.command,

s.last_request_start_time,

s.last_request_end_time

) x

— If email recipients set and blocking above threshold then email blocking alert

— But don’t send another email within @MailInterval

IF @Recipients IS NOT NULL AND DATEDIFF (ss, @LastEmail, GETDATE()) > @MailInterval

BEGIN

IF EXISTS (

SELECT 1

FROM session_capture sc

WHERE sc.date_captured = (SELECT MAX(Date_Captured) FROM Session_Capture)

AND wait_time > @Threshold

AND blocking_session_id <> 0

)

BEGIN

SELECT @MailSubject =  ‘Prod Blocking Alert’

SET @tableHTML =

N'<H5 style=”font-family:verdana”> Blocking and blocked sessions</H5>’ +

N'<table width = 1300 border=”1″ style=”font-family:verdana;font-size:60%”>’ +

N'<tr><th width = 100>Session_ID</th><th width = 100>Blocking_Session</th><th width = 100>Wait_Time</th><th width = 100>Hostname</th>’ +

N'<th width = 100>NT_Username</th><th width = 100>DB_Name</th><th width = 500>Text</th>’ +

N'<th width = 100>Current_Command</th><th width = 100>Sequence</th></tr>’

+

— Query blocked and lead blocker session data

CAST ( ( SELECT DISTINCT

td = sp.spid,’     ‘,

td = sp.blocked,’     ‘,

td = sp.waittime,’     ‘,

td = sp.hostname,’     ‘,

td = sp.nt_username + ‘ ‘,’     ‘,

td = sd.name,’     ‘,

td = cast(st.text as nvarchar(500)),’     ‘,

td = case when sp.stmt_start > 0 then

case when sp.stmt_end > sp.stmt_start then

substring(st2.text, 1 + sp.stmt_start/2, 1 + (sp.stmt_end/2)-(sp.stmt_start/2))

else

substring(st2.text, 1 + sp.stmt_start/2, len(st2.text))

end

else ”

end, ‘     ‘,

td =  case sp.waittime

when 0 then 999999

else sp.waittime

end, ‘     ‘

from sys.sysprocesses (nolock) sp

left outer join sys.sysprocesses (nolock) s2

on sp.spid = s2.blocked

inner join sys.sysdatabases (nolock) sd

on sp.dbid = sd.dbid

left outer join sys.dm_exec_requests er

on sp.spid = er.session_id

cross apply

( SELECT text AS [text()]

FROM sys.dm_exec_sql_text(sp.sql_handle)

) st (text)

cross apply

sys.dm_exec_sql_text(sp.sql_handle) st2

WHERE ((s2.spid IS NOT NULL AND sp.blocked = 0) OR sp.blocked <> 0)

AND sp.SPID > 50

and (s2.ecid is null or s2.ecid = 0)

order by case sp.waittime

when 0 then 999999

else sp.waittime

end desc

FOR XML PATH(‘tr’), TYPE

) AS NVARCHAR(MAX) ) +

N'</table>’

EXEC msdb.dbo.sp_send_dbmail

@recipients = @Recipients,

@subject = @MailSubject,

@body = @TableHTML,

@body_format = ‘HTML’

SET @LastEmail = GETDATE()

END

END

— Check table size, if > @TargetSizeMB then purge old data

IF (SELECT  SUM(reserved_page_count) FROM sys.dm_db_partition_stats where object_id = object_id(‘session_capture’)) > @TargetSizeMB*1024*1024/8192

BEGIN

— shrink table size to 80% of @TargetSizeMB

DECLARE @totalrows bigint, @totalpages bigint

SELECT  @totalpages = SUM(reserved_page_count),

@totalrows = SUM(CASE WHEN (index_id < 2) THEN row_count ELSE 0 END)

FROM sys.dm_db_partition_stats ps

WHERE object_id = object_id(‘session_capture’)

— Calculate how many rows to delete to be left with 80% of TargetSize

SELECT @totalrows = @totalrows – (@totalrows * @TargetSizeMB*1024*1024/8192*0.8 / @totalpages)

DELETE sc

FROM (SELECT TOP (@totalrows) *

FROM SESSION_CAPTURE

ORDER BY date_captured) sc

END

WAITFOR DELAY ’00:00:10′

END TRY

BEGIN CATCH

WAITFOR DELAY ’00:00:10′

END CATCH

END

GO

CREATE PROC [dbo].[SessionCapturePurge]

@RetentionDays int = 14

AS

DELETE

FROM session_capture

WHERE date_captured < DATEADD(dd, – @RetentionDays, GETDATE())

GO

 

And create this one in master:-


USE [master]
GO

/****** Object: StoredProcedure [dbo].[SessionCapture] Script Date: 04/05/2011 10:21:03 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE procedure [dbo].[SessionCapture]
as
exec DBAdata.dbo.SessionCapture
@TargetSizeMB = 4000,
@Threshold = 20000,
@Recipients = ‘yourname@youremaildomain.com’,
@MailInterval = 900

GO

 

4. Mark the master.dbo.SessionCapture as a startup proc, by executing the following 2 SPs:-

 

exec sp_configure ‘scan for startup procs’, 1
RECONFIGURE WITH OVERRIDE

exec sp_procoption ‘SessionCapture’, ‘startup’, 1

5. Create an SQL Server Agent job to daily prune the old data.

Schedule the SessionCapturePurge SP every day, to prune (at least) the last 7 days’ capture.

This data & its collection can be very helpful  Firstly it can alert you (by email/database mail) to blocking. It can be very useful in analyzing long-running SQL batches, and blocking, historically. Also, it can be very helpful for analyzing wait stats in hindsight.

SQL Server 2008 Clustering on Windows 2003 Server

This morning I am playing with clustering SQL Server 2008 on Windows 2003 for the first time. This is to prove that we will be able to install a new SQL Server 2008 instance into our existing 4-node, 4-instance cluster at work, which is currently Windows 2003 Server with SQL Server 2005 instances. It took a while to get hold of a test cluster, but I finally have it.

I have set up clustering (with the usual MSDTC hassles!), and a SQL 2005 instance. Created a group for the first 2008 instance, ran the setup for 2008, did the prerequisites, Hotfix for MS Installer, .Net 3.0 whatever it is, then went through the WHOLE set up .. which is about 12 screens full of configs, etc -everything basically, no I don’t want to use FILESREAM .. and the final page, it tells me I can’t proceed because I haven’t installed a hotfix for FILESTREAM for Windows 2003 Server. GRRRR!!

I don’t even WANT to install pesky FILESTREAM! So I had to cancel out of that entire setup, download the Hotfix (KB937444), install it, reboot both nodes, then start again from scratch. Big GRRR! Anyway, I’m getting there now. Will post some details on the setup with screenshots when I’ve done it.

By the way, this is my first ever blog post! My name is Tom Pullen and I am a Senior DBA (Database Administrator) in Oxfordshire, UK. More to come!