
Thread: Slow performance after a few days of use

  1. #1


    Greetings,

    We have a DSS V6 with 16 x 1TB SATA disks set up as RAID 6, and a single LUN connected to a three-node Server 2008 R2 cluster.
    Our SAN has four 1Gbps network cards: one for management and one connected to each host via a crossover cable.
    The SAN is used to store the VHD files for our Hyper-V virtual servers; we have nine servers on the SAN (three on each host).

    Recently, after being up for about 40 days, the SAN's performance went from good to terrible. The statistics page showed that the load had increased from an average of 2-4 up to 4-8.

    I downloaded the logs and opened the tests file; hdparm was reporting speeds of around 10 MB/s on the RAID.
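
    (For reference, the same figure can be re-measured from the console rather than from the log bundle - a minimal sketch; the device name is an assumption, so check your controller tools for the actual one:)

    # Cached and buffered read timings against the RAID volume
    hdparm -tT /dev/sda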

    I shut down all of the virtual machines as soon as I had a maintenance window and restarted the SAN; hdparm then reported around 400 MB/s on the RAID.
    I booted the virtual servers and performance was good again.

    This lasted for seven days until the issue recurred. I restarted the SAN again last night and performance is fine again for now.

    I've raised a support ticket but haven't heard much back so far.

    Has anyone else had this, or does anyone know what the problem might be?
    Restarting the SAN once a week or even once a month really isn't a viable solution for me.

    Many thanks,

    Jon

  2. #2


    Hi Jon,

    Did you test the memory?

    Did you check netstat, ifconfig and meminfo?
    Was there anything in dmesg, dmesg.2 or the critical_errors log?

    What is your hardware setup?
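
    (For reference, these checks can be run straight from the DSS console or against the extracted log bundle - a minimal sketch, assuming the standard Linux tools are available:)

    # Listening sockets and per-protocol counters
    netstat -lnp
    netstat -s

    # Interface configuration and error counters
    ifconfig -a

    # Memory state - a very low MemFree alongside a large Cached figure is normal page cache use
    cat /proc/meminfo

    # Recent kernel messages (the log bundle also contains a rotated dmesg.2 with older messages)
    dmesg | tail -n 50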

  3. #3


    Hi Symm,

    Thanks for your post! I've posted the tests you mentioned below. critical_errors was empty, and dmesg.2 has quite a lot of text in it - do you know what I should be looking for?
    Our hardware is:

    1 x Intel Xeon 5410 Quad Core 2.33GHz, 12MB Cache, 1333MHz FSB
    4GB DDR2-667 ECC FB-DIMM
    16 x Barracuda ES.2 SATA 3.0-Gb/s 1-TB
    LSI MegaRAID SAS 8704EM2 controller

    Performance has been poor again today... I can't understand what's going on here; we've made no config changes.

    Thanks for all your help,
    Jonathon

    *-----------------------------------------------------------------------------*
    netstat -lnp
    *-----------------------------------------------------------------------------*

    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
    tcp 0 0 0.0.0.0:49152 0.0.0.0:* LISTEN 18286/mrmonitord
    tcp 0 0 0.0.0.0:49153 0.0.0.0:* LISTEN 18487/mrmonitord
    tcp 0 0 0.0.0.0:389 0.0.0.0:* LISTEN 17206/slapd
    tcp 0 0 0.0.0.0:199 0.0.0.0:* LISTEN 18159/snmpd
    tcp 0 0 0.0.0.0:49258 0.0.0.0:* LISTEN 18274/java
    tcp 0 0 0.0.0.0:139 0.0.0.0:* LISTEN 19694/smbd
    tcp 0 0 0.0.0.0:11211 0.0.0.0:* LISTEN 16455/memcached
    tcp 0 0 127.0.0.1:9999 0.0.0.0:* LISTEN 17163/perl
    tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 14720/portmap
    tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 3266/apache2
    tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 3266/apache2
    tcp 0 0 0.0.0.0:3260 0.0.0.0:* LISTEN 16734/iscsi-scstd
    tcp 0 0 0.0.0.0:445 0.0.0.0:* LISTEN 19694/smbd
    tcp 0 0 0.0.0.0:3071 0.0.0.0:* LISTEN 18274/java
    udp 0 0 172.31.101.1:137 0.0.0.0:* 19680/nmbd
    udp 0 0 192.168.101.1:137 0.0.0.0:* 19680/nmbd
    udp 0 0 192.168.102.1:137 0.0.0.0:* 19680/nmbd
    udp 0 0 192.168.103.1:137 0.0.0.0:* 19680/nmbd
    udp 0 0 0.0.0.0:137 0.0.0.0:* 19680/nmbd
    udp 0 0 172.31.101.1:138 0.0.0.0:* 19680/nmbd
    udp 0 0 192.168.101.1:138 0.0.0.0:* 19680/nmbd
    udp 0 0 192.168.102.1:138 0.0.0.0:* 19680/nmbd
    udp 0 0 192.168.103.1:138 0.0.0.0:* 19680/nmbd
    udp 0 0 0.0.0.0:138 0.0.0.0:* 19680/nmbd
    udp 0 0 0.0.0.0:161 0.0.0.0:* 18159/snmpd
    udp 0 0 0.0.0.0:36284 0.0.0.0:* 18274/java
    udp 0 0 0.0.0.0:5571 0.0.0.0:* 18274/java
    udp 0 0 0.0.0.0:111 0.0.0.0:* 14720/portmap
    udp 0 0 192.168.103.1:500 0.0.0.0:* 14175/racoon
    udp 0 0 192.168.102.1:500 0.0.0.0:* 14175/racoon
    udp 0 0 192.168.101.1:500 0.0.0.0:* 14175/racoon
    udp 0 0 172.31.101.1:500 0.0.0.0:* 14175/racoon
    udp 0 0 127.0.0.1:500 0.0.0.0:* 14175/racoon
    udp 0 0 0.0.0.0:3071 0.0.0.0:* 18274/java
    Active UNIX domain sockets (only servers)
    Proto RefCnt Flags Type State I-Node PID/Program name Path
    unix 2 [ ACC ] STREAM LISTENING 17155 9467/iscsid @ISCSIADM_ABSTRACT_NAMESPACE
    unix 2 [ ACC ] STREAM LISTENING 28729 7405/eventd /tmp/eventd
    unix 2 [ ACC ] STREAM LISTENING 27752 16734/iscsi-scstd @ISCSI_SCST_ADM
    unix 2 [ ACC ] STREAM LISTENING 23131 14175/racoon /var/run/racoon/racoon.sock
    unix 2 [ ACC ] STREAM LISTENING 26297 16185/acpid /var/run/acpid.socket
    unix 2 [ ACC ] STREAM LISTENING 23801 14538/syslog-ng /dev/log

    *-----------------------------------------------------------------------------*
    netstat -s
    *-----------------------------------------------------------------------------*

    error parsing /proc/net/snmp: Success
    Ip:
    294868332 total packets received
    7 with invalid addresses
    0 forwarded
    0 incoming packets discarded
    294807638 incoming packets delivered
    169314537 requests sent out
    Icmp:
    20 ICMP messages received
    1 input ICMP message failed.
    ICMP input histogram:
    destination unreachable: 20
    25 ICMP messages sent
    0 ICMP messages failed
    ICMP output histogram:
    destination unreachable: 25
    IcmpMsg:
    InType3: 20
    OutType3: 25
    Tcp:
    64551 active connections openings
    67024 passive connection openings
    46 failed connection attempts
    1492 connection resets received
    40 connections established
    292407920 segments received
    166946801 segments send out
    29576 segments retransmited
    0 bad segments received.
    1640 resets sent
    Udp:
    2395572 packets received
    6 packets to unknown port received.
    0 packet receive errors
    2338133 packets sent
    UdpLite:


    *-----------------------------------------------------------------------------*
    ifconfig -a
    *-----------------------------------------------------------------------------*

    eth0 Link encap:Ethernet HWaddr 00:30:48:BA:C4:0C
    inet addr:192.168.101.1 Bcast:192.168.101.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
    RX packets:54478417 errors:0 dropped:0 overruns:0 frame:0
    TX packets:68743410 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:59405455272 (55.3 GiB) TX bytes:81821787782 (76.2 GiB)

    eth1 Link encap:Ethernet HWaddr 00:30:48:BA:C4:0D
    inet addr:192.168.102.1 Bcast:192.168.102.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MULTICAST MTU:9000 Metric:1
    RX packets:236233683 errors:0 dropped:0 overruns:0 frame:0
    TX packets:279553463 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:267740193313 (249.3 GiB) TX bytes:368008336212 (342.7 GiB)

    eth2 Link encap:Ethernet HWaddr 00:04:238:59:22
    inet addr:192.168.103.1 Bcast:192.168.103.255 Mask:255.255.255.0
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
    RX packets:0 errors:0 dropped:0 overruns:0 frame:0
    TX packets:2622 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:100
    RX bytes:0 (0.0 b) TX bytes:246818 (241.0 KiB)

    eth3 Link encap:Ethernet HWaddr 00:04:76E:C0:AB
    inet addr:172.31.101.1 Bcast:172.31.255.255 Mask:255.255.0.0
    UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
    RX packets:949366 errors:125166 dropped:0 overruns:179 frame:125166
    TX packets:702466 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:815043713 (777.2 MiB) TX bytes:97487602 (92.9 MiB)
    Interrupt:28 Base address:0x4000

    ipddp0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
    BROADCAST NOARP MULTICAST MTU:585 Metric:1
    RX packets:0 errors:0 dropped:0 overruns:0 frame:0
    TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:1000
    RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

    lo Link encap:Local Loopback
    inet addr:127.0.0.1 Mask:255.0.0.0
    UP LOOPBACK RUNNING MTU:16436 Metric:1
    RX packets:3339148 errors:0 dropped:0 overruns:0 frame:0
    TX packets:3339148 errors:0 dropped:0 overruns:0 carrier:0
    collisions:0 txqueuelen:0
    RX bytes:638853650 (609.2 MiB) TX bytes:638853650 (609.2 MiB)

    *-----------------------------------------------------------------------------*
    cat /proc/meminfo
    *-----------------------------------------------------------------------------*

    MemTotal: 4035208 kB
    MemFree: 27348 kB
    Buffers: 3752 kB
    Cached: 3708316 kB
    SwapCached: 0 kB
    Active: 576460 kB
    Inactive: 3213604 kB
    SwapTotal: 0 kB
    SwapFree: 0 kB
    Dirty: 7636 kB
    Writeback: 0 kB
    AnonPages: 77704 kB
    Mapped: 29340 kB
    Slab: 158960 kB
    SReclaimable: 119848 kB
    SUnreclaim: 39112 kB
    PageTables: 2288 kB
    NFS_Unstable: 0 kB
    Bounce: 0 kB
    WritebackTmp: 0 kB
    CommitLimit: 2017604 kB
    Committed_AS: 615660 kB
    VmallocTotal: 34359738367 kB
    VmallocUsed: 16124 kB
    VmallocChunk: 34359721739 kB
    DirectMap4k: 3392 kB
    DirectMap2M: 4190208 kB

  4. #4


    P.S. I've not tried a memtest as the SAN is in use... although if it's unavoidable I could always try it at the weekend - I'd prefer not to, though.

    Thanks,
    Jon

  5. #5

    Hi, a couple of questions.

    How have you been determining that performance is poor?

    Do you have any network monitoring on the switch ports to see the level of network activity during periods of poor performance?

    Have you investigated the VMs themselves? Are any of them used as a file server?

    Are you running perfmon on the VMs during the poor performance to see if they may be contributing to it?

    Nothing appears to be wrong with your setup. I will say this, though: don't let hdparm fool you. Just because hdparm reports 400 MB/s does not mean your array can sustain that!

    Until Open-E provides an iozone-type test, it's difficult to gauge the true performance of your disk subsystem from the local point of view. On new configurations, I test them with Linux before deploying Open-E on the hardware.
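
    (A rough sketch of the kind of local test meant here, run from a Linux live environment on the hardware before Open-E is installed - the mount point and file sizes are just examples:)

    # Sequential write, bypassing the page cache, against a filesystem on the array
    dd if=/dev/zero of=/mnt/array/testfile bs=1M count=8192 oflag=direct

    # Sequential read of the same file
    dd if=/mnt/array/testfile of=/dev/null bs=1M iflag=direct

    # Equivalent iozone run, if iozone is available: write/rewrite and read/reread, 1 MB records, 8 GB file
    iozone -i 0 -i 1 -r 1m -s 8g -f /mnt/array/iozone.tmp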

  6. #6


    What is connected to eth3? There are a bunch of errors on it.

    What is at the bottom of dmesg.2?
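
    (If it helps, the per-interface error counters can be checked from the console - a minimal sketch; the counter names exposed by ethtool vary with the NIC driver:)

    # RX/TX error, drop and frame counters for the interface
    ifconfig eth3 | grep -E 'errors|dropped'

    # Driver-level statistics (e1000/e1000e report CRC, alignment and length errors here)
    ethtool -S eth3

    # Current negotiated speed and duplex
    ethtool eth3 | grep -E 'Speed|Duplex'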

  7. #7


    Hi Symm,

    Eth3 is connected to our local network and used to manage the SAN. I've unzipped dmesg.2 and posted the bottom section below.

    Hi enealDC, all of our VMs are very slow at the moment - one is a file server, one Exchange, one MSSQL, etc. I should point out that they had all been running fine before this.

    None of the virtual machines seems to be under any extra load, and the statistics page on the SAN shows that network activity over the links to the hosts is about the same during times of poor performance.

    Thanks for the help on this guys,
    Jon

    .domain unexpectedly closed!
    scst: Using security group "Default_iqn.2009-11:san01.target1" for initiator "iqn.1991-05.com.microsoft:hvh01.domain"
    iscsi-scst: Negotiated parameters: InitialR2T Yes, ImmediateData No, MaxConnections 1, MaxRecvDataSegmentLength 65536, MaxXmitDataSegmentLength 65536,
    iscsi-scst: MaxBurstLength 262144, FirstBurstLength 65536, DefaultTime2Wait 2, DefaultTime2Retain 20,
    iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
    iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048
    e1000e: eth1 NIC Link is Down
    e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    iscsi-scst: ***ERROR*** Connection with initiator iqn.1991-05.com.microsoft:hvh02.domain unexpectedly closed!
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
    scst: Using security group "Default_iqn.2009-11:san01.target1" for initiator "iqn.1991-05.com.microsoft:hvh02.domain"
    iscsi-scst: Negotiated parameters: InitialR2T Yes, ImmediateData No, MaxConnections 1, MaxRecvDataSegmentLength 65536, MaxXmitDataSegmentLength 65536,
    iscsi-scst: MaxBurstLength 262144, FirstBurstLength 65536, DefaultTime2Wait 2, DefaultTime2Retain 20,
    iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
    iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048
    iscsi-scst: ***ERROR*** Connection with initiator iqn.1991-05.com.microsoft:hvh03.domain unexpectedly closed!
    scst: Using security group "Default_iqn.2009-11:san01.target1" for initiator "iqn.1991-05.com.microsoft:hvh03.domain"
    iscsi-scst: Negotiated parameters: InitialR2T Yes, ImmediateData No, MaxConnections 1, MaxRecvDataSegmentLength 65536, MaxXmitDataSegmentLength 65536,
    iscsi-scst: MaxBurstLength 262144, FirstBurstLength 65536, DefaultTime2Wait 2, DefaultTime2Retain 20,
    iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
    iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048
    e1000e: eth1 NIC Link is Down
    e1000e: eth1 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
    0000:04:00.1: eth1: 10/100 speed: disabling TSO
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
    e1000e: eth0 NIC Link is Down
    e1000e: eth0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
    0000:04:00.0: eth0: 10/100 speed: disabling TSO
    e1000e: eth1 NIC Link is Down
    e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000e: eth0 NIC Link is Down
    e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000: eth2: e1000_watchdog_task: NIC Link is Down
    e1000: eth2: e1000_watchdog_task: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX
    e1000e: eth0 NIC Link is Down
    e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
    iscsi-scst: ***ERROR*** Connection with initiator iqn.1991-05.com.microsoft:hvh01.domain unexpectedly closed!
    scst: Using security group "Default_iqn.2009-11:san01.target1" for initiator "iqn.1991-05.com.microsoft:hvh01.domain"
    iscsi-scst: Negotiated parameters: InitialR2T Yes, ImmediateData No, MaxConnections 1, MaxRecvDataSegmentLength 65536, MaxXmitDataSegmentLength 65536,
    iscsi-scst: MaxBurstLength 262144, FirstBurstLength 65536, DefaultTime2Wait 2, DefaultTime2Retain 20,
    iscsi-scst: MaxOutstandingR2T 1, DataPDUInOrder Yes, DataSequenceInOrder Yes, ErrorRecoveryLevel 0,
    iscsi-scst: HeaderDigest None, DataDigest None, OFMarker No, IFMarker No, OFMarkInt 2048, IFMarkInt 2048
    e1000e: eth1 NIC Link is Down
    e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
    iscsi-scst: ***ERROR*** Connection with initiator iqn.1991-05.com.microsoft:hvh02.domain unexpectedly closed!

  8. #8


    BTW... on the statistics page, what does "load" mean? CPU, memory and network statistics all seem the same as normal, but "load" has pretty much doubled.

    Thanks,
    Jon
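
    (If the statistics page reports the standard Linux load average - an assumption about how the GUI computes it - then it counts processes that are runnable or blocked in uninterruptible I/O wait, so load can double while CPU, memory and network look flat if requests are queueing on the disks. It can be cross-checked from the console:)

    # 1-, 5- and 15-minute load averages plus runnable/total process counts
    cat /proc/loadavg

    # Per-device utilisation and wait times, sampled every 5 seconds (if iostat from sysstat is present)
    iostat -x 5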

  9. #9


    Check out the connection to eth2 - it seems to keep dropping.
    Try the cable or a different port on the switch.

    Let us know what happens.
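
    (A quick way to keep an eye on that link from the console - a minimal sketch; the interface name comes from the dmesg output above:)

    # Show only the eth2 link up/down messages from the kernel log
    dmesg | grep eth2 | grep -i Link

    # Current negotiated speed and duplex
    ethtool eth2

    # Re-check the error counters every few seconds, if watch is available
    watch -n 5 'ifconfig eth2 | grep -E "errors|dropped"'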

  10. #10


    I had a similar problem on my production data network; it turned out to be a bad switch. Like symm said, I would rule out other hardware issues.
