Tuesday, August 26, 2008

It is a small world (no, not the Disney kind)

Dolmen is a large company and today I met Christophe Vandeplas because we will be working on a project together. Check out his blog, it has some fine reading material on it :).

Interesting case - part 2

In my previous post I told you about the medical institute having problems. Yesterday their server crashed again. One could wonder why some of us consider this a good thing. Well, a problem that repeats itself has a better chance of being solved than one that occurs only once.

I was out of the office, so my colleague got the dump file, and the output was exactly the same. Then we compared the cause and the configuration (sorry to stay vague, but I think it is bad practice to name customers and their configuration on my blog) and it seems that Windows had only 2 GB of memory and everything else was dedicated to SQL Server.

It is fine to dedicate a whole lot of memory to your database server process, but the OS has got to breathe too.
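The arithmetic behind that remark can be sketched in a few lines of Python. This is only an illustration: the function name and the default 4 GB reserve are my own assumptions, not figures from the customer or an official sizing rule.

```python
def suggest_max_sql_memory_mb(total_ram_mb, os_reserve_mb=4096):
    """Return a memory cap (in MB) for the database engine.

    os_reserve_mb is a hypothetical headroom for Windows and other
    processes; pick a value that fits the actual server.
    """
    if total_ram_mb <= os_reserve_mb:
        raise ValueError("server has no memory to spare beyond the OS reserve")
    return total_ram_mb - os_reserve_mb


if __name__ == "__main__":
    # Example: a 16 GB server that keeps 4 GB back for the OS.
    print(suggest_max_sql_memory_mb(16 * 1024))
```

The result is the kind of number you would give to SQL Server's "max server memory (MB)" setting instead of letting the engine take everything.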

One of the other problems is that they have only one server, every database in the institute is on it, and they expect it to be highly available. I proposed that they contact their sales contact, who would come by with a specialized salesperson, since what they need is highly available databases and anything less is strictly useless.

Tuesday, August 19, 2008

Interesting case

Today I had to be at a medical institute where a server crashed a week ago, and I had to look at the server.

The SQL Server has produced a minidump, so a post about the SQL minidump will appear on this blog in the near future :).

There is a huge number of errors. I'll have to analyse them and will write something about them as well.

The third topic I'll have to do some research on is Windows 2003 (64-bit) paging, because there has been a huge amount of paging since their crash.

Thursday, August 7, 2008

A night at an ISP

Recently I spent the night at one of Belgium's bigger Internet service providers. The ISP had had some trouble with their databases last December and I had to implement database mirroring.

At the beginning of July I had created a test database for their IT people so they could play with it. Now the time had come to implement mirroring for all their databases, as a test, so they could adapt their programming and make it failover-aware.

There were some specifics: the mirror had to be synchronous and encrypted, and it had to use the same port on each server.

So here are my findings:
  • Use SQL Server 2005 SP2; it figures, but I prefer to mention it ;)
  • You need the database to be in the full recovery model
  • Watch out for the auto close option, it ruins the fun
  • You need a full backup and a transaction log backup
It actually went pretty smoothly since everything (mirror endpoints, encryption and witness) was already there from the previous month.

The mirror wizard doesn't use the fully qualified network name for the principal server, so at the end it proposes to start mirroring but that fails until you manually adapt the principal server address.

The only thing that was a real problem was 1 database. For some reason it failed time after time and the error message was that it was unable to connect to the witness or mirror server.

The cause was one app that writes constantly to the database. Since it took about 15 minutes to move the backup and restore it with no recovery on the mirror, it was not possible to create the mirror.

To work around this I made the full backup, restored it with no recovery and then made the transaction log backup. I had permission to take the database offline once the transaction log backup had finished and to restore that backup on the mirror. Once I had put the database back online, the mirroring was no problem at all.
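The order of the steps matters more than the steps themselves, so here is a small Python sketch that just prints the T-SQL in sequence. The database name, backup folder, host names and port are placeholders I made up; the statements are the standard ones, but this is not the exact script I ran that night.

```python
def mirroring_script(db, backup_dir, principal_fqdn, mirror_fqdn, port=5022):
    """Build the T-SQL steps, in order, as (where_to_run, statement) pairs.

    All names, paths and the port are placeholders supplied by the caller.
    Assumption: backup_dir is reachable from both servers.
    """
    full = f"{backup_dir}\\{db}_full.bak"
    log = f"{backup_dir}\\{db}_log.trn"
    return [
        ("principal", f"ALTER DATABASE [{db}] SET RECOVERY FULL;"),
        ("principal", f"ALTER DATABASE [{db}] SET AUTO_CLOSE OFF;"),
        ("principal", f"BACKUP DATABASE [{db}] TO DISK = N'{full}';"),
        ("mirror", f"RESTORE DATABASE [{db}] FROM DISK = N'{full}' WITH NORECOVERY;"),
        ("principal", f"BACKUP LOG [{db}] TO DISK = N'{log}';"),
        ("mirror", f"RESTORE LOG [{db}] FROM DISK = N'{log}' WITH NORECOVERY;"),
        # Partners are set on the mirror first, then on the principal,
        # each pointing at the fully qualified name of the other side.
        ("mirror", f"ALTER DATABASE [{db}] SET PARTNER = 'TCP://{principal_fqdn}:{port}';"),
        ("principal", f"ALTER DATABASE [{db}] SET PARTNER = 'TCP://{mirror_fqdn}:{port}';"),
    ]


if __name__ == "__main__":
    steps = mirroring_script("Sales", r"D:\backup", "sql1.example.local", "sql2.example.local")
    for server, statement in steps:
        print(f"-- run on {server}\n{statement}")
```

Writing the partner addresses out with fully qualified names also sidesteps the wizard problem mentioned earlier.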

We ran some tests and everything went fine. The only thing my customer still has to do is create maintenance plans on the mirror (for some weird reason you can't mirror those) and alter his apps.

At the break of rush hour we all went home for some sleep :).

An update: 10 days later something went wrong. For some reason one database went suspect on the principal.

Sunday, August 3, 2008

Disable 8.3 short-name generation

One of the things that is still a remnant of the past is the 8.3 short name. You know it can be a pain to access some things in Program Files. Well, you can solve this by editing the registry.

Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem,
add the DWORD value NtfsDisable8dot3NameCreation and set the value to 1.
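If you would rather not click through regedit on every machine, the same change can be written to a .reg file. The little Python helper below only builds the file text; the function name is my own and it is a sketch, not a tested deployment tool.

```python
FILESYSTEM_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem"


def reg_file_text(value_name, dword_value, key=FILESYSTEM_KEY):
    """Return the text of a .reg file that sets one DWORD value."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return (
        "Windows Registry Editor Version 5.00\r\n"
        "\r\n"
        f"[{key}]\r\n"
        # .reg files store DWORDs as eight hexadecimal digits.
        f'"{value_name}"=dword:{dword_value:08x}\r\n'
    )


if __name__ == "__main__":
    print(reg_file_text("NtfsDisable8dot3NameCreation", 1))
```

Save the output as a .reg file and import it on the target machine. The helper takes any DWORD value name under that key.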

Disable last access update

If for some reason you don't need the NTFS file system to keep track of when a file was last accessed you can disable this with a registry key.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem

Add the DWORD value NtfsDisableLastAccessUpdate and set the value to 1. This takes effect after you have rebooted your system.

From a security standpoint it might be a bad idea to do this, since it makes it impossible to tell whether someone did or did not access a file.

Where is that procedure?

Last week, after the power outage at the medical facility where I was doing a project, I noticed that one phrase came back often: "Where is that procedure?".

Apparently there were 2 problems: all procedures were Word documents saved on a file server but the network was down, and after the network was back it seemed quite a struggle to find the correct procedure.

In a previous job I organized a 24/7 IT-standby team to give support. I started out just like this customer but realized quite fast that managing a document library wasn't going to do the trick. I tried to identify what I wanted and what the problems were with the document library.

I wanted a system that was easy to maintain and where everybody on the IT team (15 people) could add the necessary info, since gathering the information was dull and the information was usually nearly outdated by the time a document was "ready". One of the concerns was of course that the system should be accessible only to the IT department and that the data stayed accessible even when the network and the servers in the server room failed.

The solution was simple: I took 1 ordinary desktop and put a WAMP server on it with MediaWiki. The wiki access was restricted and if the system went down, you just had to go and sit at that particular computer.

It was not the best system and it would probably have been better to use a wiki on a stick or XAMPP, since these can be used on a USB dongle, but I wasn't aware of those solutions at the time.

The point is that people should give thought to the documentation and not just ask for documentation for the sake of having a document. Another thing is to test that documentation, because you can have a procedure and a company that does it for you, but in the end you are responsible for your systems.

Business and procedures

Last week I was working in a public medical facility and there was a power outage at 11:20 AM. This gave some interesting insights. There was no recovery plan, so it was stressful for the IT department of the hospital.

Some backup power system powered the computers but the network was down. The server room should have had 2 backup systems (the emergency room's and the hospital's) according to one of their IT guys, but once the network was back I saw that all VMs were restarting and the ESX cluster had been down during the outage because the network connections were down.

They were lucky and lost no data, but it is frightening that something like this causes panic, since it is quite obvious that these things will happen even in 2008 in Belgium.

I have written some technical procedures, like backup and restore, for their SQL Server, but when it all comes down to it the basic needs are not fulfilled, and I know from experience that this is the case in many companies.

Maybe some questions should be asked, like "What are the business-critical systems?", followed by proper risk management for each asset in the organisation.

Historic data

One of the things I notice over and over again is that many applications clutter databases with historic data. It is actually not that hard for a programmer to make a maintenance plan part of his application.

Let's consider a real-life situation. Most organizations have accountants and they generate quite some data year after year. Today we're in 2008 and honestly, I still haven't figured out why the data of 2005 should be on the system. It is valuable to the organization and the law demands that it be kept, but why keep it on a production system?
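To show how little code such a maintenance step needs, here is a minimal Python sketch using the built-in sqlite3 module as a stand-in for a real database server. The table names (bookings, bookings_archive) and the year column are hypothetical, and in practice the archive would live on another system rather than in the same database.

```python
import sqlite3


def archive_before(conn, cutoff_year):
    """Move rows older than cutoff_year from 'bookings' to 'bookings_archive'.

    Returns the number of rows moved. Both tables are hypothetical and
    are assumed to have identical columns.
    """
    with conn:  # one transaction: the copy and the delete succeed or fail together
        conn.execute(
            "INSERT INTO bookings_archive SELECT * FROM bookings WHERE year < ?",
            (cutoff_year,),
        )
        cur = conn.execute("DELETE FROM bookings WHERE year < ?", (cutoff_year,))
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    columns = "(id INTEGER PRIMARY KEY, year INTEGER, amount REAL)"
    conn.execute(f"CREATE TABLE bookings {columns}")
    conn.execute(f"CREATE TABLE bookings_archive {columns}")
    conn.executemany(
        "INSERT INTO bookings (year, amount) VALUES (?, ?)",
        [(2005, 100.0), (2006, 50.0), (2008, 75.0)],
    )
    print(archive_before(conn, 2007), "rows archived")
```

Run on a schedule, something like this keeps the production table small while the old years stay available elsewhere.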