Saturday, February 25, 2012

Problems with Active / Passive SQL cluster

Hi everybody
I've run into a strange problem at a customer site.
We're running a two node cluster with one virtual SQL server.
Software
Windows 2003 Server Enterprise
Microsoft SQL Server 2000 Enterprise Edition (SP3)
Here's the HW setup
2 HP DL 740 G2
each of them equipped with
8 * 2.7 GHz CPUs with 2 MB cache
33 GB RAM (usable)
2 * 72 GB RAID1 internal drives
Storage
HP EVA 3000 with redundant paths
300 GB LUN VRaid1 (Database)
50 GB LUN VRaid1 (Logfiles)
300 GB LUN VRaid1 (Database dump)
1 GB LUN VRaid1 (Quorum)
All firmware is at the latest version, as confirmed by HP.
My problem
All cluster resources reside on Node1, and everything is working fine.
But...
If I power down the passive node (Node2), the active node (Node1) hangs.
I can't even bring up the Start menu...
How can powering down a passive node affect the active node like that?
Could the software vendor for the database application have made a mistake
in the SQL scripts? Could some processes be running on the passive node?
I should mention that SQL Server was installed on the passive node.
Please help me, because I'm lost...
Regards
Zeke
Hi
It should not.....
Sounds like a hardware or OS problem. You can prove this by taking SQL
Server offline and repeating the same test.
What happens when you power up node 1 with node 2 down?
Regards
Mike

Hi Mike
Since the cluster serves a journal system at a major hospital, I'm only
allowed to work on it 2:00 AM - 4:00 AM on Sundays... (-:
I'll try to test, but it will probably take a while...
So you think it's the OS or the hardware? Well, I'm ruling out hardware: I've
run every test possible on the hardware, and HP has been here to verify the
installation too.
So I should try just stopping SQL Server on the passive node? And if I
get the same problem, then what?

I have a similar problem to Zeke's. All the active resources were initially
running on the active node; the server is set up as active/active. But
suddenly the resources are now split between the 2 nodes. For example, node 1
has the cluster resources running on it, while node 2 has the SQL Server
resources running on it. What could cause this, and how do I solve the problem?
Victord70

Hi
You have 1 SQL instance in a 2-node cluster.
Am I understanding you right that at any one time you have SQL Server
running on both nodes of the cluster? If so, that is not correct.
With hardware problems, what might be happening is that with the other node
up, either the fiber or the network is terminated correctly, and with
the other node off, it is not.
Can you post here the following configuration: What resources are
configured, their names, and on which nodes they run (e.g. Cluster Name,
node 1, Cluster IP address node 1, Quorum Drive node 1, SQL IP address, Node
1 etc...)
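For anyone putting that listing together, here is a minimal Python sketch of one way to record which node owns each resource and to spot anything that is not on the expected active node. The resource names come from the examples above; the node names NODE1 and NODE2 and the two helper functions are hypothetical, and the inventory is typed in by hand rather than read from the cluster.

```python
from collections import defaultdict

# Hypothetical inventory: resource name -> current owner node.
# Entered by hand for illustration; not read from a real cluster.
resources = {
    "Cluster Name": "NODE1",
    "Cluster IP Address": "NODE1",
    "Quorum Drive": "NODE1",
    "SQL IP Address": "NODE1",
    "SQL Server": "NODE1",
}


def owners_by_node(inventory):
    """Group resource names by the node that currently owns them."""
    grouped = defaultdict(list)
    for name, node in inventory.items():
        grouped[node].append(name)
    return dict(grouped)


def misplaced(inventory, expected_node):
    """Return the resources that are not on the node expected to be active."""
    return sorted(name for name, node in inventory.items() if node != expected_node)


if __name__ == "__main__":
    for node, names in sorted(owners_by_node(resources).items()):
        print(node, "->", ", ".join(sorted(names)))
    print("Not on NODE1:", misplaced(resources, "NODE1") or "none")
```

In an active/passive setup every resource should show the same owner, so anything reported on the last line other than "none" is worth a closer look.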
Regards
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland
IM: mike@.epprecht.net
MVP Program: http://www.microsoft.com/mvp
Blog: http://www.msmvps.com/epprecht/

Hi Mike!
No! It's an Active/Passive configuration, installed on the default
SQL Server instance.
Due to a mishap yesterday a failover occurred. This is what happened:
Node 1, which was active, was taken down due to a cabling error.
The failover went as it was supposed to.
When Node 2 had the resources online, we powered down Node 1 without any
hangs or performance drops.
The problem only occurs when Node 1 has the resources online and I power
down Node 2.
Weird?

"vdavid70" wrote:
> I have a similar problem to Zeke's. All the active resources were initially
> running on the active node; the server is set up as active/active. But
> suddenly the resources are now split between the 2 nodes. For example, node 1
> has the cluster resources running on it, while node 2 has the SQL Server
> resources running on it. What could cause this, and how do I solve the problem?
I would make sure your heartbeat is set up correctly.
