We have a DSSv6 with 16 x 1TB SATA disks set up in RAID 6, and a single LUN connected to a three-node Server 2008 R2 cluster.
Our SAN has four 1Gbps network cards: one for management and one connected to each host via a crossover cable.
The SAN is used to store the VHD files for our Hyper-V virtual servers; we have nine virtual servers on the SAN (three on each host).
Recently, after being up for about 40 days, SAN performance went from good to terrible. The statistics page showed that the load had increased from an average of 2-4 up to 4-8.
I downloaded the logs and opened up the tests file; hdparm was reporting speeds of around 10MB/s on the RAID.
As soon as I had a maintenance window I shut down all of the virtual machines and restarted the SAN, after which hdparm was reporting around 400MB/s on the RAID.
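For anyone who wants to reproduce the check outside the tests file: as far as I can tell the figures come from the standard hdparm timing switches, so something like this should give comparable numbers (a sketch only - /dev/sda is a guess at how the RAID volume is presented and may differ on your unit):

# -T times cached reads (memory/bus), -t times buffered sequential reads from the array
# run against an otherwise idle array if possible, as concurrent I/O skews the result
hdparm -tT /dev/sda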
I booted the virtual servers and performance was good again.
This lasted for seven days until the issue recurred. I restarted the SAN again last night and performance is now fine again.
I've raised a support ticket but haven't heard much back so far.
Anyone else had this or know what the problem might be?
Restarting the SAN once a week or even once a month really isn't a viable solution for me.
Thanks for your post! I've posted the tests you mentioned below; critical_errors was empty, and dmesg.2 has quite a lot of text in it - do you know what I should be looking for?
Our hardware is:
1 x Intel Xeon 5410 Quad Core 2.33GHz, 12MB Cache, 1333MHz FSB
4GB DDR2-667 ECC FB-DIMM
16 x Barracuda ES.2 SATA 3.0-Gb/s 1-TB
LSI MegaRAID SAS 8704EM2 controller
Performance has been poor again today... I can't understand what's going on here; we've made no config changes.
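(The first block below is the protocol counter section from the log bundle - I believe it's the same data you'd get live from the /proc/net/snmp counters, sketched here in case it's worth re-running to see whether the Tcp retransmit figure is still climbing:)

# dump protocol-level counters (Ip/Icmp/Tcp/Udp); compare two runs a few minutes apart
netstat -s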
error parsing /proc/net/snmp: Success
Ip:
294868332 total packets received
7 with invalid addresses
0 forwarded
0 incoming packets discarded
294807638 incoming packets delivered
169314537 requests sent out
Icmp:
20 ICMP messages received
1 input ICMP message failed.
ICMP input histogram:
destination unreachable: 20
25 ICMP messages sent
0 ICMP messages failed
ICMP output histogram:
destination unreachable: 25
IcmpMsg:
InType3: 20
OutType3: 25
Tcp:
64551 active connections openings
67024 passive connection openings
46 failed connection attempts
1492 connection resets received
40 connections established
292407920 segments received
166946801 segments send out
29576 segments retransmited
0 bad segments received.
1640 resets sent
Udp:
2395572 packets received
6 packets to unknown port received.
0 packet receive errors
2338133 packets sent
UdpLite:
*-----------------------------------------------------------------------------*
ifconfig -a
*-----------------------------------------------------------------------------*
P.S. I've not tried a memtest as the SAN is in use... although if it's unavoidable I could try it at the weekend - I'd prefer not to do this, though.
How have you been determining that performance is poor?
Do you have any network monitoring on the switch ports to see the level of network activity during periods of poor performance?
Have you investigated the VMs themselves? Are any of them used as a file server?
Are you running perfmon on the VMs during the poor performance to see if they may be contributing to it?
Nothing appears to be wrong with your setup. I will say this, though: don't let hdparm fool you. Just because hdparm says 400MB per second does not mean your array can sustain that!
Until Open-E provides an iozone-type test, it's difficult to gauge the true performance of your disk subsystem from the local point of view. On new configurations, I test them under Linux before deploying Open-E on the hardware.
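For what it's worth, my pre-deployment test is along these lines (a sketch only - the mount point is made up, and the file size should be several times the RAM in the box so the cache can't hide the disks):

# sequential write then read with a file larger than RAM
# -s 16g file size, -r 1m record size, -i 0 write/rewrite test, -i 1 read/reread test
iozone -s 16g -r 1m -i 0 -i 1 -f /mnt/test/iozone.tmp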
Eth3 is connected to our local network and used to manage the SAN. I've unzipped dmesg.2 and posted the bottom section below.
Hi enealDC, all of our VMs are very slow at the moment; one is a file server, one is Exchange, one is MSSQL, etc. I should point out that they had all been running fine before this.
None of the virtual machines seems to be under any extra load, and the statistics page on the SAN shows that network activity over the links to the hosts is about the same during periods of poor performance.
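(If raw numbers are more useful than the statistics page, I can also grab the interface byte counters from the console - two snapshots a few seconds apart give a rough throughput figure; a sketch:)

# snapshot per-interface byte counters, wait, snapshot again and diff the totals by hand
cat /proc/net/dev; sleep 10; cat /proc/net/dev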
Thanks for the help on this guys,
Jon
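(For reference, the "bottom section" below is just the tail of the unpacked file - roughly this, give or take the line count:)

# last chunk of the unpacked dmesg file
tail -n 200 dmesg.2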
iscsi-scst: ***ERROR*** Connection with initiator iqn.1991-05.com.microsoft:hvh01.domain unexpectedly closed!
scst: Using security group "Default_iqn.2009-11:san01.target1" for initiator "iqn.1991-05.com.microsoft:hvh01.domain"
iscsi-scst: Negotiated parameters: InitialR2T Yes, ImmediateData No, MaxConnections 1, MaxRecvDataSegmentLength 65536, MaxXmitDataSegmentLength 65536,
iscsi-scst: MaxBurstLength 262144, FirstBurstLength 65536, DefaultTime2Wait 2, DefaultTime2Retain 20,
iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048
e1000e: eth1 NIC Link is Down
e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
e1000: eth2: e1000_watchdog_task: NIC Link is Down
iscsi-scst: ***ERROR*** Connection with initiator iqn.1991-05.com.microsoft:hvh02.domain unexpectedly closed!
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
scst: Using security group "Default_iqn.2009-11:san01.target1" for initiator "iqn.1991-05.com.microsoft:hvh02.domain"
iscsi-scst: Negotiated parameters: InitialR2T Yes, ImmediateData No, MaxConnections 1, MaxRecvDataSegmentLength 65536, MaxXmitDataSegmentLength 65536,
iscsi-scst: MaxBurstLength 262144, FirstBurstLength 65536, DefaultTime2Wait 2, DefaultTime2Retain 20,
iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048
iscsi-scst: ***ERROR*** Connection with initiator iqn.1991-05.com.microsoft:hvh03.domain unexpectedly closed!
scst: Using security group "Default_iqn.2009-11:san01.target1" for initiator "iqn.1991-05.com.microsoft:hvh03.domain"
iscsi-scst: Negotiated parameters: InitialR2T Yes, ImmediateData No, MaxConnections 1, MaxRecvDataSegmentLength 65536, MaxXmitDataSegmentLength 65536,
iscsi-scst: MaxBurstLength 262144, FirstBurstLength 65536, DefaultTime2Wait 2, DefaultTime2Retain 20,
iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048
e1000e: eth1 NIC Link is Down
e1000e: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
0000:04:00.1: eth1: 10/100 speed: disabling TSO
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
e1000e: eth0 NIC Link is Down
e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
0000:04:00.0: eth0: 10/100 speed: disabling TSO
e1000e: eth1 NIC Link is Down
e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000e: eth0 NIC Link is Down
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000: eth2: e1000_watchdog_task: NIC Link is Down
e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
e1000e: eth0 NIC Link is Down
e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
iscsi-scst: ***ERROR*** Connection with initiator iqn.1991-05.com.microsoft:hvh01.domain unexpectedly closed!
scst: Using security group "Default_iqn.2009-11:san01.target1" for initiator "iqn.1991-05.com.microsoft:hvh01.domain"
iscsi-scst: Negotiated parameters: InitialR2T Yes, ImmediateData No, MaxConnections 1, MaxRecvDataSegmentLength 65536, MaxXmitDataSegmentLength 65536,
iscsi-scst: MaxBurstLength 262144, FirstBurstLength 65536, DefaultTime2Wait 2, DefaultTime2Retain 20,
iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048
e1000e: eth1 NIC Link is Down
e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
iscsi-scst: ***ERROR*** Connection with initiator iqn.1991-05.com.microsoft:hvh02.domain unexpectedly closed!
BTW, on the statistics page, what does "load" actually mean? CPU, memory and network statistics all seem the same as normal, but "load" has pretty much doubled.
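(My guess is that it's the standard Linux load average - runnable tasks plus tasks blocked in uninterruptible disk I/O - which would explain it doubling while CPU, memory and network all look normal, but I'd like to confirm. From the console that would be the same figure as:)

# 1-, 5- and 15-minute load averages, plus running/total task counts
cat /proc/loadavg
uptime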