Session Capture System

April 4, 2011

“Tom, there was a production problem yesterday at about 2.30pm for 20 minutes. Users complained about timeouts. What was going on in the database?”

This is a classic “umm … (oh damn)” moment for a DBA. You weren’t running Profiler. You had no Perfmon running. Your boss won’t countenance spending several thousand big ones on 3rd-party performance apps. You can’t go back in time. But you want to! What was running? What plans? What were they waiting on? Was there blocking?

Help is at hand.

It is quite easy to implement a session capture system. This will query the DMVs to capture current running process information every (specified) X seconds, indefinitely. It will capture a lot of data, but will auto-prune itself to prevent excessive data bloat. And having it will mean you can precisely answer that formerly embarrassing question: “well, yesterday at 2.30pm, the following was happening …”. You will look good, prepared, clever, proactive, and in charge. Your users, your boss, your developers – your business – will love you.

This is how you do it.

1. Create a utility DBA database on your Prod server (the scripts below call it DBAdata). It will need 10 – 15 GB of disk space. Put it in SIMPLE recovery mode. Ensure it is being backed up regularly.
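If you want something to start from, a minimal sketch is below – file sizes and placement are up to you and aren’t specified here; the only assumption the later scripts make is that the database is called DBAdata.

-- Minimal sketch: create the utility DB and switch it to SIMPLE recovery.
-- File sizes/locations are left at the defaults here; adjust to suit your server.
CREATE DATABASE DBAdata
GO
ALTER DATABASE DBAdata SET RECOVERY SIMPLE
GO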

2. Create the following table:-


CREATE TABLE [dbo].[session_capture](
    [date_captured] [datetime] NULL,
    [session_id] [smallint] NULL,
    [database_id] [smallint] NULL,
    [host_name] [nvarchar](256) NULL,
    [command] [nvarchar](32) NULL,
    [sql_text] [xml] NULL,
    [wait_type] [nvarchar](120) NULL,
    [wait_time] [int] NULL,
    [wait_resource] [nvarchar](512) NULL,
    [last_wait_type] [nvarchar](120) NULL,
    [total_elapsed_time] [int] NULL,
    [blocking_session_id] [int] NULL,
    [blocking_text] [xml] NULL,
    [start_time] [datetime] NULL,
    [login_name] [nvarchar](256) NULL,
    [program_name] [nvarchar](128) NULL,
    [login_time] [datetime] NULL,
    [last_request_start_time] [datetime] NULL,
    [last_request_end_time] [datetime] NULL,
    [transaction_id] [bigint] NULL,
    [transaction_isolation_level] [smallint] NULL,
    [open_transaction_count] [int] NULL,
    [totalReads] [bigint] NULL,
    [totalWrites] [bigint] NULL,
    [totalCPU] [int] NULL,
    [writes_in_tempdb] [bigint] NULL,
    [sql_plan] [xml] NULL
) ON [PRIMARY]

CREATE CLUSTERED INDEX [IDX1] ON [dbo].[session_capture]
(
    [date_captured] ASC,
    [session_id] ASC
)

CREATE NONCLUSTERED INDEX [IDX2] ON [dbo].[session_capture]
(
    [date_captured] ASC,
    [wait_time] ASC,
    [blocking_session_id] ASC
)

CREATE NONCLUSTERED INDEX [IDX3] ON [dbo].[session_capture]
(
    [database_id] ASC,
    [wait_time] ASC,
    [wait_type] ASC,
    [host_name] ASC
)


3. Create the following 2 Stored Procedures in your utility DB:-


CREATE PROC [dbo].[SessionCapture]
    @TargetSizeMB  bigint       = 4000,  -- Purge oldest 20% of data if the session_capture table is larger than this size (MB)
    @Threshold     int          = 15000, -- Alert on blocking that has a wait time greater than this threshold (millisecs)
    @Recipients    varchar(500) = NULL,  -- Who will receive blocking alert emails. Separate addresses with ; Set to NULL to disable alerting
    @MailInterval  int          = 60     -- Minimum interval between alert emails (secs)
AS
SET NOCOUNT ON

DECLARE @mailsubject varchar(50),
        @tableHTML   nvarchar(max),
        @LastEmail   datetime

SET @LastEmail = GETDATE()

-- Loop indefinitely, every 10 seconds
WHILE 1 = 1
BEGIN
    BEGIN TRY
        INSERT DBAdata.dbo.session_capture
        SELECT
            GETDATE(),
            x.session_id,
            x.database_id,
            x.host_name,
            x.command,
            (
                SELECT text AS [text()]
                FROM sys.dm_exec_sql_text(x.sql_handle)
                FOR XML PATH(''), TYPE
            ) AS sql_text,
            x.wait_type,
            x.wait_time,
            x.wait_resource,
            x.last_wait_type,
            x.total_elapsed_time,
            COALESCE(x.blocking_session_id, 0) AS blocking_session_id,
            (
                SELECT p.text
                FROM
                (
                    SELECT MIN(sql_handle) AS sql_handle
                    FROM sys.dm_exec_requests r2
                    WHERE r2.session_id = x.blocking_session_id
                ) AS r_blocking
                CROSS APPLY
                (
                    SELECT text AS [text()]
                    FROM sys.dm_exec_sql_text(r_blocking.sql_handle)
                    FOR XML PATH(''), TYPE
                ) p (text)
            ) AS blocking_text,
            x.start_time,
            x.login_name,
            x.program_name,
            x.login_time,
            x.last_request_start_time,
            x.last_request_end_time,
            x.transaction_id,
            x.transaction_isolation_level,
            x.open_transaction_count,
            x.totalReads,
            x.totalWrites,
            x.totalCPU,
            x.writes_in_tempdb,
            (SELECT query_plan FROM sys.dm_exec_query_plan(x.plan_handle)) AS sql_plan
        FROM
        (
            SELECT
                r.session_id,
                r.database_id,
                s.host_name,
                s.login_name,
                s.program_name,
                s.login_time,
                r.start_time,
                r.sql_handle,
                r.plan_handle,
                r.blocking_session_id,
                r.wait_type,
                r.wait_resource,
                r.wait_time,
                r.last_wait_type,
                r.total_elapsed_time,
                r.transaction_id,
                r.transaction_isolation_level,
                r.open_transaction_count,
                r.command,
                s.last_request_start_time,
                s.last_request_end_time,
                SUM(r.reads) AS totalReads,
                SUM(r.writes) AS totalWrites,
                SUM(r.cpu_time) AS totalCPU,
                SUM(tsu.user_objects_alloc_page_count + tsu.internal_objects_alloc_page_count) AS writes_in_tempdb
            FROM sys.dm_exec_requests r
            JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id
            JOIN sys.dm_db_task_space_usage tsu ON s.session_id = tsu.session_id AND r.request_id = tsu.request_id
            WHERE r.status IN ('running', 'runnable', 'suspended')
              AND r.session_id <> @@SPID
              AND r.session_id > 50
            GROUP BY
                r.session_id,
                r.database_id,
                s.host_name,
                s.login_name,
                s.login_time,
                s.program_name,
                r.start_time,
                r.sql_handle,
                r.plan_handle,
                r.blocking_session_id,
                r.wait_type,
                r.wait_resource,
                r.wait_time,
                r.last_wait_type,
                r.total_elapsed_time,
                r.transaction_id,
                r.transaction_isolation_level,
                r.open_transaction_count,
                r.command,
                s.last_request_start_time,
                s.last_request_end_time
        ) x

        -- If email recipients are set and blocking is above the threshold, email a blocking alert
        -- But don't send another email within @MailInterval
        IF @Recipients IS NOT NULL AND DATEDIFF(ss, @LastEmail, GETDATE()) > @MailInterval
        BEGIN
            IF EXISTS (
                SELECT 1
                FROM session_capture sc
                WHERE sc.date_captured = (SELECT MAX(date_captured) FROM session_capture)
                  AND wait_time > @Threshold
                  AND blocking_session_id <> 0
            )
            BEGIN
                SELECT @mailsubject = 'Prod Blocking Alert'

                SET @tableHTML =
                    N'<H5 style="font-family:verdana"> Blocking and blocked sessions</H5>' +
                    N'<table width = 1300 border="1" style="font-family:verdana;font-size:60%">' +
                    N'<tr><th width = 100>Session_ID</th><th width = 100>Blocking_Session</th><th width = 100>Wait_Time</th><th width = 100>Hostname</th>' +
                    N'<th width = 100>NT_Username</th><th width = 100>DB_Name</th><th width = 500>Text</th>' +
                    N'<th width = 100>Current_Command</th><th width = 100>Sequence</th></tr>' +
                    -- Query blocked and lead blocker session data
                    CAST((
                        SELECT DISTINCT
                            td = sp.spid, '     ',
                            td = sp.blocked, '     ',
                            td = sp.waittime, '     ',
                            td = sp.hostname, '     ',
                            td = sp.nt_username + ' ', '     ',
                            td = sd.name, '     ',
                            td = CAST(st.text AS nvarchar(500)), '     ',
                            td = CASE WHEN sp.stmt_start > 0 THEN
                                     CASE WHEN sp.stmt_end > sp.stmt_start THEN
                                         SUBSTRING(st2.text, 1 + sp.stmt_start/2, 1 + (sp.stmt_end/2) - (sp.stmt_start/2))
                                     ELSE
                                         SUBSTRING(st2.text, 1 + sp.stmt_start/2, LEN(st2.text))
                                     END
                                 ELSE ''
                                 END, '     ',
                            td = CASE sp.waittime WHEN 0 THEN 999999 ELSE sp.waittime END, '     '
                        FROM sys.sysprocesses (NOLOCK) sp
                        LEFT OUTER JOIN sys.sysprocesses (NOLOCK) s2
                            ON sp.spid = s2.blocked
                        INNER JOIN sys.sysdatabases (NOLOCK) sd
                            ON sp.dbid = sd.dbid
                        LEFT OUTER JOIN sys.dm_exec_requests er
                            ON sp.spid = er.session_id
                        CROSS APPLY
                        (
                            SELECT text AS [text()]
                            FROM sys.dm_exec_sql_text(sp.sql_handle)
                        ) st (text)
                        CROSS APPLY sys.dm_exec_sql_text(sp.sql_handle) st2
                        WHERE ((s2.spid IS NOT NULL AND sp.blocked = 0) OR sp.blocked <> 0)
                          AND sp.spid > 50
                          AND (s2.ecid IS NULL OR s2.ecid = 0)
                        ORDER BY CASE sp.waittime WHEN 0 THEN 999999 ELSE sp.waittime END DESC
                        FOR XML PATH('tr'), TYPE
                    ) AS nvarchar(max)) +
                    N'</table>'

                EXEC msdb.dbo.sp_send_dbmail
                    @recipients  = @Recipients,
                    @subject     = @mailsubject,
                    @body        = @tableHTML,
                    @body_format = 'HTML'

                SET @LastEmail = GETDATE()
            END
        END

        -- Check table size; if > @TargetSizeMB then purge old data
        IF (SELECT SUM(reserved_page_count) FROM sys.dm_db_partition_stats WHERE object_id = OBJECT_ID('session_capture')) > @TargetSizeMB*1024*1024/8192
        BEGIN
            -- Shrink table size to 80% of @TargetSizeMB
            DECLARE @totalrows bigint, @totalpages bigint

            SELECT @totalpages = SUM(reserved_page_count),
                   @totalrows  = SUM(CASE WHEN (index_id < 2) THEN row_count ELSE 0 END)
            FROM sys.dm_db_partition_stats ps
            WHERE object_id = OBJECT_ID('session_capture')

            -- Calculate how many rows to delete to be left with 80% of the target size
            SELECT @totalrows = @totalrows - (@totalrows * @TargetSizeMB*1024*1024/8192*0.8 / @totalpages)

            DELETE sc
            FROM (SELECT TOP (@totalrows) *
                  FROM session_capture
                  ORDER BY date_captured) sc
        END

        WAITFOR DELAY '00:00:10'
    END TRY
    BEGIN CATCH
        WAITFOR DELAY '00:00:10'
    END CATCH
END
GO

CREATE PROC [dbo].[SessionCapturePurge]
    @RetentionDays int = 14
AS
DELETE
FROM session_capture
WHERE date_captured < DATEADD(dd, -@RetentionDays, GETDATE())
GO

And create this one in master:-


USE [master]
GO

/****** Object: StoredProcedure [dbo].[SessionCapture] Script Date: 04/05/2011 10:21:03 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE procedure [dbo].[SessionCapture]
as
exec DBAdata.dbo.SessionCapture
@TargetSizeMB = 4000,
@Threshold = 20000,
@Recipients = 'yourname@youremaildomain.com',
@MailInterval = 900

GO

4. Mark master.dbo.SessionCapture as a startup proc by executing the following two system SPs:-

exec sp_configure 'scan for startup procs', 1
RECONFIGURE WITH OVERRIDE

exec sp_procoption 'SessionCapture', 'startup', 1
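A quick sanity check, if you want one – this sketch should return 1 once the wrapper proc is flagged to run at startup:

-- Returns 1 if master.dbo.SessionCapture will run when SQL Server starts
USE master
GO
SELECT OBJECTPROPERTY(OBJECT_ID('dbo.SessionCapture'), 'ExecIsStartup') AS runs_at_startup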

5. Create an SQL Server Agent job to prune the old data daily.

Schedule the SessionCapturePurge SP to run every day, retaining at least the last 7 days’ capture (the proc defaults to 14 days’ retention).
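If you’d rather script the job than click through the Agent GUI, something along these lines will do – the job name and the 2am schedule are just examples, and it assumes the utility DB is called DBAdata:

-- Sketch of a daily purge job (job name and run time are illustrative)
USE msdb
GO
EXEC dbo.sp_add_job         @job_name = N'DBA - SessionCapture purge'
EXEC dbo.sp_add_jobstep     @job_name = N'DBA - SessionCapture purge',
                            @step_name = N'Purge old session_capture rows',
                            @subsystem = N'TSQL',
                            @database_name = N'DBAdata',
                            @command = N'EXEC dbo.SessionCapturePurge @RetentionDays = 14'
EXEC dbo.sp_add_jobschedule @job_name = N'DBA - SessionCapture purge',
                            @name = N'Daily 02:00',
                            @freq_type = 4,            -- daily
                            @freq_interval = 1,        -- every 1 day
                            @active_start_time = 20000 -- 02:00:00 (HHMMSS)
EXEC dbo.sp_add_jobserver   @job_name = N'DBA - SessionCapture purge', @server_name = N'(local)'
GO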

This data & its collection can be very helpful. Firstly, it can alert you (by email/database mail) to blocking. It is very useful for analyzing long-running SQL batches, and blocking, historically. It is also very helpful for analyzing wait stats in hindsight. I will add some example queries for analyzing your collected data in the next few days.
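In the meantime, here is the sort of thing I mean – a sketch against the table above, answering the question we started with (adjust the time window to whatever period you are interested in):

-- What was running, waiting and blocking yesterday between 14:25 and 14:55?
SELECT  date_captured,
        session_id,
        DB_NAME(database_id) AS database_name,
        host_name,
        login_name,
        command,
        wait_type,
        wait_time,
        blocking_session_id,
        sql_text,
        sql_plan
FROM    dbo.session_capture
WHERE   date_captured BETWEEN '20110403 14:25' AND '20110403 14:55'
ORDER BY date_captured, wait_time DESC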

On Thursday (26.8.2010) last week I had the privilege to be invited to spend the day at Microsoft HQ in Reading, for SQL Bits Speaker Training. The day was run by Guy Smith-Ferrier, an MVP, conference regular, and trainer.

We were told to prepare, in advance, a 5-minute presentation on any subject. Mine was on how an aeroplane’s engines work. The day consisted of Guy giving sessions on subjects like demos, how to prepare your laptop and so on, and his presentations contained lots of very useful tips and thought-provoking ideas (and discussion). Interleaved with this were two breakout sessions, where we divided into 4 groups of 5, each led by a conference luminary: Simon Sabin, Mike Taulty, Dave McMahon and Andrew Fryer.

In these groups each participant (not the leaders) gave their 5 minute presentation then faced 5 minutes of feedback and constructive criticism from the group. This was very useful, somewhat nerve-wracking for some people, but clearly beneficial! We then did the same thing nearer the end of the day, in different groups, and there was a clear improvement over what we’d managed in the morning.

Guy tells us how to handle Questions!

Some of the key learning points I took away from the day are:-

- Don’t say “thanks for coming” (credit to Dave McMahon for this one). Say only at the end “Thank you for supporting the community” or some such. You are the one they should be thanking, especially at an event like SQL Bits Community Day. Not all people agreed with this one and there was an interesting discussion, but I decided I agreed with Dave.

- Strategies for dealing with nerves

- How to diplomatically get past obstructive questioners

- Techniques for good demos

- Tips for preparing your laptop

- How to structure your agenda, how to summarise

- How to do an elevator talk (i.e. a 1 to 2 minute initial summary of the session’s goal and outline)

- How to react to negative feedback

There was loads to take away, but I would like to add a personal thank you to all the group leaders and organisers: Chris Testa-O’Neill, who was instrumental in the planning but sadly couldn’t make it to the actual event; Simon and the group leaders; and principally Guy, for laying on some really excellent sessions and giving me plenty to think about in how to improve my public speaking.

I have previously written recruitment advice for candidates. In my new job – which is lovely and full of variety – I have been tasked with doing telephone interviews for contract SQL developers. This is fairly straightforward: I have their CVs, which I review beforehand, and I have a list of 15 technical questions which I use some of (but never all) during the course of the conversation.

Reviewing CVs is very interesting. I have my own views about how a CV should look. I prefer them to be very simple: laid out with no tables, no lines, no multi-column sections. I like only the use of bold for emphasis, no colour, and above all – not too much detail!

One CV I was reviewing recently was 12 pages long! I nearly choked. No one is ever going to read that much; it is simply boring. CVs also have limited utility – they are factual documents that give you an idea of a candidate’s skill and experience, but very little more. Even some of the ‘personal statement’-type sections can be a little sterile.

Obviously it’s a subjective thing, but I really like CVs in which there is some kind of evidence of personality. I look for touches of humour, or something about a candidate’s personal (non-work) interests that shows a bit of colour.

In terms of the telephone interviews themselves, I’m not a great fan of the phone in general; again, not being able to see a person’s face and how they react to things makes the process of limited utility. But you can get an idea of a person’s communication skills and a bit of personality, and use the technical questions as a filter.

It has certainly made me realise I’m selling myself far too cheaply!

My DailyWTF Job

July 21, 2010

I’ve landed myself a really big one. I’m trying to sort out a database and application which have things like this in them:-

1. DDL designed (and executed) by the user in the application – not temporary tables, not worktables, not permanent tables in tempdb, not table variables, but permanent user tables in the database that are actually crucially important to the application. None of them has any more indexes than a clustered primary key on an IDENTITY field. Yeah.

2. Use of the ‘One True Lookup Table’. Well two actually! One of these is particularly heinous. Free text data, or optionally, images (ugh) are stored in an image data type field and are then converted & looked up by some really WTF-style UDFs. Urgh, urgh, urgh.

3. Use of an ORM tool to abstract interaction with the databases for the programmers. In a word, yuk. Have you ever seen LINQ queries? ‘nuff said. This isn’t even LINQ.

4. Error handling. WTF? The only error the application can ever return is “invalid data”. This covers every single SQL Server error that may get raised. Glorious.

5. Cursors in recursive scalar functions. Yum!

6. Views of views of views of views. Love them, I really do.

7. Use of DISTINCT – just in case!

8. Duplicate data. Duplicate columns! Duplicate duplicates!

9. Cartesian product from a join eliminated by using DISTINCT. Performant!

I think I need a lie down. Wish me luck!

Voting is now open for sessions you’d like to see at SQL Bits 7 in York at the end of September 2010. I have submitted a session on ‘Performance Tuning Quick Wins’. So head on over there and vote. For me! Obviously.

I have recently started a new job and I am working on a post about the challenges facing me. Not to put too fine a point on it, there is enough daily WTFery to fill that wonderful site with content for the rest of the year!

More on that topic later …. until then, get voting! For me!

Submit a session for SQLBits
SQL Bits 7 has just been announced and it’s going to be in York at the end of September! The free community day is Saturday, with 2 paid days on Thursday and Friday, similar to SQL Bits 5. I have submitted a session on ‘Performance Tuning Quick Wins’! It was a compulsive thing .. I hope I get chosen!

I will let you know when voting opens!

I have a week off this week, in between jobs. I had my last day at RM on Friday last week, and I will be starting my new role at Helveta on Monday next week. I am very excited about it!

I’m greatly enjoying a bit of free time. I am spending a lot of it here:-

Yesterday, the summer solstice, was perfect: 25°C, cloudless sky, and barely any wind. The pool was heavenly!

For the rest of the week, it’s more of the same, with a few lunch meetings with friends and ex-colleagues, then on Thursday I’m going down to look after my Mum’s farm (pictured, right) while she and her husband are away. The forecast is excellent, I can’t wait for some lovely swims in the Atlantic (maybe at Perranporth beach, below)!

I am going to take my bike (bottom right) and do some riding on the lovely, rolling, quiet roads of North Devon. I hope you’re envious!!

I love having a week off at this time of the year – I should definitely do it more often!

But maybe not always unpaid …..

I have recently been offered, and have accepted, a new job. When this news filtered out, many people asked me for advice about CVs, interviews, and job hunting in general. Without wishing to sound big-headed, I am good at it. I have been offered every single job I have applied for in the past 15 years. My CV is lean, neat, short & to the point, and I believe I am justified in being a little proud of it.

I decided to blog about this so I could point people at it rather than having to repeat myself.

CV
First of all, let’s start with the CV. My golden rules are as follows:
1. No more than 2 sides of A4. This is non-negotiable. Any more than that is just boring. No, you are not that interesting.
2. Endless lists are dull. List your technical skills, tersely, and honestly.
3. Your CV must be completely honest. Lies will only end in tears.
4. It must be grammatically perfect. Including punctuation.
5. You should not omit the word “I”. Sentences should read like sentences. This is a descriptive document, not minutes from a meeting.
6. The format and layout should be simple, clean and uncluttered. Lose the fancy headers and footers. KISS. (‘keep it simple, stupid’).
7. Include a summary “Personal Profile” saying what you are like, what your soft skills are and why you’re a good person to employ. I have one in my CV and I think it does me a lot of favours.
8. Keep employment history relevant and brief. We don’t care about that casual gardening job you did when you were 16.
9. Put in something to make you sound like a human, not an automaton. Tell us you like watching telly, or going to the pub. I’m more interested in people who sound like they actually have a life outside of 9 to 5.
10. Don’t worry unduly about the minutiae – wording, etc. The facts will speak for themselves. If your CV is good – it will get you an interview. But it’s the facts on it that count – the job history, the qualifications, the personal profile.

Check out my CV here – I do genuinely think it is a good one. Or email me for a copy.

Interviews
You will probably have 2 rounds of interviews. Possibly a screening telephone interview first too. Most of the following advice is common sense, but I’m going to say it anyway!

1. Stay calm. Being a little nervous is good, it will make you sharper. However don’t get gripped by panic and fear – it won’t help you sell yourself.

2. Look good. Be clean. Wear smart clothes – suit and tie mandatory for men. Shave. Squeeze your spots. Moisturize. It will help make you feel good about yourself. Iron your shirt. Have your suit dry-cleaned.

3. DO YOUR RESEARCH. Ensure you know the following key facts about your prospective employer
- summary history of the company
- rough figures of turnover, profit, number of employees, location of offices, etc
- background or basic knowledge of their sector, services and products, and their markets
- know in advance the name(s), job role and title of the people who will be interviewing you

4. Have good quality questions ready, for example:
What makes this company a good place to work?
Who are your competitors?
Why did the incumbent leave, why have you got a vacancy?
What are the prospects for this company, is it growing, what are its threats and opportunities?

5. Do not discuss pay and benefits until you are asked about it. Do not broach it. Wait. They will come to it when they are ready. You do not want to appear to be a money-grabbing desperado.

6. Do not slag off your current employer. Explain your motivations for looking for a new job in positive terms (e.g. ‘I am looking for new challenges that I feel my current role lacks’, not ‘I am bored’.) Make sure you KNOW what your motivations are before you even start typing your CV.

7. Have examples of things ready in your mind. Be ready with examples to use in answer to these type of questions:
When have you had to work under pressure?
Give an example of having had to deal with a difficult colleague/client/boss etc.
Tell us about a piece of work you were particularly proud of.
Give us an example of something you might consider to be a weakness. And no – do NOT say you don’t have any.
Tell us about a time you felt like you were out of your depth, how did you cope?
How do you persuade someone to do something for you that they don’t want to do?
Have all this stuff ready, with meaty, relevant, fresh examples at your fingertips. Do not just say “I am hard working and well organised.” Say, “I am hard working and well organised. A good example of this is the time I …” Do this unprompted too, if possible – it is very impressive. They will love you.

8. Be prepared to pause. It is fine to say “let me think about that for a moment”. Use natural pauses to compose yourself. Try to come across as steady, thoughtful and composed – it’s very attractive.

9. Admit it if you do not know something – do not flannel, bluff or make cr*p up on the spot – it is painfully transparent. Instead say, “I don’t know that, but I know where I’d look to find out.” Everyone has Google. There is no need to pretend you know everything.

10. Be respectful. Do not use first names unless explicitly told to do so. Above all – do not get flustered. Even if you feel you are struggling.

11. Do not slouch, swear, chew gum, be over-familiar, or crack too many jokes.

Technical Stuff
You will have to answer technical questions if it’s a tech role. Don’t worry if you don’t know everything. I had a screening 20-question tech interview by my recruitment consultant, before I even got anywhere near the hiring company. I didn’t know it all, but I got 16/20 and that was good enough.

In the tech interview(s), be prepared to talk about work you have done, be ready with examples of things. This is where your technical skill and knowledge will shine through. There will be open questions where you can really make yourself sound good. Tell them about the time you worked through the night to deliver a new highly-available clustered system without downtime during the working day. They will love you. But you have to come prepared. But do not worry – I repeat – no one is expected to know everything. You just have to convince them that you know enough.

Recruitment Consultants
These people need to be very carefully marshalled and I will write a whole separate post about them. They can be your friend, they can also be an almighty pain in the ar*e. They either ignore you or bombard you with calls to your mobile phone at inappropriate times during the working day – there seems to be no middle ground. Keep your personal mobile on silent during the working day – insist they deal with you via your (personal) email. Lay the rules down for them – don’t let them make rules for you. They stand to make up to £10k in commission/fees out of getting you the job – make them earn it.

Offers/Notice/Salary negotiations
Again, it’s common sense. DO NOT hand your notice in until you have a written offer and you have formally accepted it. Do not haggle unduly over salary, but set a hard minimum – hopefully some way above your current salary, although not necessarily – and rigidly stick to it. Don’t let them get you cheap. Insist on benefits for pension, bonus, child care and private healthcare that match your current ones at a minimum. If it is a small company with no pension scheme of its own, insist they contribute to your private pension.

This is all a bit of a brain dump – I will be editing this post as new things occur to me. Email me if you want specific advice. I don’t mind casting my critical eye over your CV, as long as you are ready to receive unsparing criticism back.

On Friday 16th April 2010, I presented at SQLBits 6. It was held for the first time in London, and also (another great improvement) on a Friday as opposed to Saturdays as before. Unsurprisingly it was by far the best attended SQLBits Community Day (free for attendees, with speakers unpaid and expenses covered by sponsors) so far, with 438 attendees.

I really enjoyed the sessions that I managed to attend. Keith Burns presented first, on SQL Azure, which seems to be progressing steadily despite the hitherto apparently crippling lack of features. Conor Cunningham was next, having stealthily managed to avoid the volcanic ash cloud from Iceland to fly in from Texas and present a fascinating session on the Query Optimizer. He was in a room which was used to hold parliament during WW2 (the venue itself is very close to the Houses of Parliament, right next to Westminster Abbey). It was packed – standing room only!

I also attended Tony Rogerson’s session on Normalisation, and a round-table open Q&A session after lunch with Keith, Conor and Simon Sabin to fill in for a cancelled session by Klaus Aschenbrenner, who couldn’t make it due to the closed airspace. I was up at 3.10pm, presenting on High Availability In The Real World, attempting to give an honest and grounded outline of the pros and cons of the various HA options. I was pleased with the turn-out, had some good questions during and after, and felt generally satisfied.

Whizzed off after that to check in to the very cool hotel (picture from the atrium looking north (L)) – it’s only been open for 3 weeks! There are some other good pictures here, including some of the day in general.

Then I hurried back to the Church House Conference Centre for some well-earned beers at the post-conference party. I hung out for a while with Iain Kick from Quest who was pleased to have heard me plugging his company’s software twice during my session!

I stumbled out at 7.30 to meet my girlfriend and some other friends for drinks and dinner. All in all a great day, thanks as ever to the organisers, Simon, Darren, Allan, Tim, Martin, James, etc – great job, can’t wait till the next one.

Simon mentioned it might be in Oxford next time round – yes please, I’ll be able to cycle in!

You will be able to find links to my slides and sample scripts on the SQLBits site soon.

I am still struggling on with this issue. Despite several weeks of my own and PSS’s efforts, we are no nearer a resolution.

I took advantage of New Year downtime in the factory at work to completely remove then reinstall the instance. Sledgehammer to crack a nut? Well..

A lot of effort, many hours of planning and preparation, and all to no avail – both the timeouts issue and the (probably unrelated) Access Violation on a certain update remain.

I thought I’d share a quick overview of how I went about the reinstall, if any of you ever find yourself in a similar boat. The outline plan went like this:-

1. Put all cluster resources in maintenance mode in monitoring tool. Full backup to disk of all DBs.
2. Disable all relevant jobs, interfaces, etc, keeping a list for re-enabling (or script as appropriate)
3. Unsubscribe publications to and from the instance.
4. Detach user DBs and leave data and log files intact in the cluster partitions (see the sketch after this list).
5. Remove instance in turn from each cluster node.
6. Reinstall the instance on each node in turn (I used a vanilla RTM install this time, to try to eliminate slipstreaming SP1 as a possible cause of issues).
7. Applied SP1 and CU5 to the instance on each node.
8. Reconfigure (trace flags, memory, static port assignment, tempdb layout etc)
9. Restore msdb (with SQL Agent cluster resource offline)
10. Stop SQL Server cluster resource.
11. Start SQL Server locally via command line in single user mode (outside of clustering).
12. Restore master using SQLCMD.
13. Restart all cluster resources and check. Check all logins and jobs are back.
14. Remove cluster resources from maintenance mode.
15. Recreate publications as necessary, re-subscribe and re-sync.
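For what it’s worth, step 4 (and the corresponding re-attach once the instance was back) boils down to something like this per user database – the database name and file paths below are placeholders, not my real ones:

-- Before removing the instance: detach the user DB, leaving its files in place on the cluster partitions
EXEC sp_detach_db @dbname = N'YourUserDB'

-- After the reinstall: re-attach the database from the untouched files
CREATE DATABASE [YourUserDB]
ON (FILENAME = N'S:\SQLData\YourUserDB.mdf'),
   (FILENAME = N'T:\SQLLogs\YourUserDB_log.ldf')
FOR ATTACH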

Restoring the master database on a cluster is interesting (first time I’d ever done it).

Navigate to the location of the SQL Server executables from the command prompt (in my case, something like..
C:\Program Files\Microsoft SQL Server\MSSQL10.INSTANCENAME\MSSQL\Binn

and type

sqlservr -c -m -s INSTANCENAME

where INSTANCENAME is simply the named instance name, not including the SQL Server virtual name (that is not clear in BOL).

Then open a SQLCMD connection, like this, and restore the master DB:-

sqlcmd -S SERVER\INSTANCENAME -E
>restore database master from disk = '\\path\backup.bak' with replace
>go
 

This restores master and then SQL Server is stopped and you are unceremoniously disconnected.

Of course, this was vaguely interesting, but I certainly had better things to be doing at 6am on the 2nd of January! And it didn’t fix my problems. But it was worth a try, I guess.

Under Microsoft Premier Support Services’ direction, we are testing installing a brand new 2008 instance into the cluster to see if we can isolate the issue to this one problematic instance.

A few more details on my current woes with SQL Server 2008. This is 64-bit Enterprise Edition, SP1 + CU5. We are getting severe, intermittent, but easily reproducible connection timeouts. This is when you attempt to connect to the instance, but the client eventually gets fed up of waiting and gives up, like this in Management Studio:-

[Screenshot: connection timeout error in SSMS]

This timeout is set by default in SSMS to be 15 seconds (plenty long enough to establish a connection to an instance on your LAN, Gigabit connected, fast server)…

[Screenshot: the default 15-second connection timeout setting in SSMS]

This error is slightly different when you generate it using SQLCMD:-

[Screenshot: the connection timeout error from SQLCMD]

I am getting other symptoms of the same underlying issue, e.g. occasionally when trying to read the error log in a query window using xp_readerrorlog:-

 
[Screenshot: error returned by xp_readerrorlog]

I have no resolution for this issue yet; I will update if/when I do. Troubleshooting with Microsoft Premier Support has involved registry changes for things like Max Ports, turning chimney off, and SynAttackProtect (no benefit). We have also done network tracing using Wireshark. I have done some testing with a custom Microsoft-supplied app, and also my own testing using SQLCMD. And before you ask, yes, I have tried rebooting!! That’s not an easy thing to arrange when it’s a live 4-node cluster supporting 24x7 websites and applications… one of the (many) compelling reasons to have a passive node in your cluster, which you can use precisely for this kind of thing.

Wish me luck.. even the Director’s started asking for daily updates. Urgh!