
Thread: Disconnects and corrupt filesystem...

  1. #1

    Disconnects and corrupt filesystem...

    Hello,

    We have two new 12-disk cabinets (600GB 15K disks) with a dedicated 10Gb replication link and 4x1Gb multipath connections, redundant over HP3500YL switches.
    We use Open-E 7 in active-active mode. We have MTU 9000 set and no flow control on the dedicated SAN switches. We use XenServer 6.1 on two dual-CPU machines with 128GB memory, and everything is wired with new Cat6A cables.

    We now have a problem with one heavily used VM in particular and need to get it fixed quickly (other VMs may be affected too, but they are not used as heavily).

    Once in a while we get this on the XenServer 6.1 host:
    Dec 2 16:01:31 xs2 tapdisk[3227]: ERROR: errno -16 at __tapdisk_vbd_request_timeout: req tap-4.22 timed out, retried 0 times
    Dec 2 16:01:31 xs2 tapdisk[3227]: nbd: NBD server unpause(0x9870140) - listening_fd = -1
    Dec 2 16:01:31 xs2 kernel: [139954.440854] end_request: I/O error, dev tde, sector 159705793
    Dec 2 16:01:31 xs2 tapdisk[3227]: Res=-16, image->type=4

    In the VM:
    Dec 2 16:01:33 server02 kernel: [<ffffffff88036e31>] :jbd:log_wait_commit+0xa3/0xf5
    Dec 2 16:01:33 server02 xinetd[2002]: EXIT: nrpe status=0 pid=23859 duration=1(sec)
    Dec 2 16:01:33 server02 kernel: [<ffffffff8029ec32>] autoremove_wake_function+0x0/0x2e
    Dec 2 16:01:33 server02 xinetd[2002]: START: nrpe pid=23868 from=31.3.103.45
    Dec 2 16:01:33 server02 nrpe[23868]: Error: Could not complete SSL handshake. 5
    Dec 2 16:01:33 server02 kernel: [<ffffffff880317a1>] :jbd:journal_stop+0x1d3/0x203
    Dec 2 16:01:33 server02 kernel: [<ffffffff80230ef0>] __writeback_single_inode+0x1dd/0x31c
    Dec 2 16:01:33 server02 kernel: [<ffffffff802d6a54>] do_readv_writev+0x26e/0x291
    Dec 2 16:01:33 server02 xinetd[2002]: EXIT: nrpe status=0 pid=23868 duration=0(sec)
    Dec 2 16:01:33 server02 kernel: [<ffffffff802ea4f4>] sync_inode+0x24/0x33
    Dec 2 16:01:33 server02 xinetd[2002]: EXIT: nrpe status=0 pid=23871 duration=0(sec)
    Dec 2 16:01:33 server02 kernel: [<ffffffff8804c37e>] :ext3:ext3_sync_file+0xce/0xf8
    Dec 2 16:01:33 server02 xinetd[2002]: START: nrpe pid=23884 from=31.3.103.45
    Dec 2 16:01:33 server02 kernel: [<ffffffff80251b39>] do_fsync+0x52/0xa4
    Dec 2 16:01:33 server02 xinetd[2002]: EXIT: nrpe status=0 pid=23884 duration=0(sec)
    Dec 2 16:01:33 server02 kernel: [<ffffffff802d7261>] __do_fsync+0x23/0x36
    Dec 2 16:01:33 server02 kernel: [<ffffffff802602f9>] tracesys+0xab/0xb6
    Dec 2 16:01:33 server02 xinetd[2002]: START: nrpe pid=23894 from=31.3.103.45
    Dec 2 16:01:33 server02 kernel:
    Dec 2 16:01:33 server02 xinetd[2002]: EXIT: nrpe status=0 pid=23894 duration=0(sec)
    Dec 2 16:01:33 server02 kernel: end_request: I/O error, dev xvda, sector 159705793
    Dec 2 16:01:33 server02 xinetd[2002]: START: nrpe pid=23897 from=31.3.103.45
    Dec 2 16:01:33 server02 kernel: Buffer I/O error on device xvda4, logical block 14049296
    Dec 2 16:01:33 server02 kernel: lost page write due to I/O error on xvda4
    Dec 2 16:01:33 server02 xinetd[2002]: EXIT: nrpe status=0 pid=23897 duration=0(sec)
    Dec 2 16:01:33 server02 kernel: end_request: I/O error, dev xvda, sector 159705801
    Dec 2 16:01:33 server02 kernel: Buffer I/O error on device xvda4, logical block 14049297
    Dec 2 16:01:33 server02 kernel: lost page write due to I/O error on xvda4
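
    To compare the failing sectors on the host and in the VM without the nrpe noise in between, a rough Python sketch like the one below could pull the device and sector out of the "end_request" lines. The function name and the regex are only illustrative assumptions based on the log lines above; this is not part of any XenServer or Open-E tooling.

```python
import re

# Matches block-layer error lines such as:
#   "end_request: I/O error, dev xvda, sector 159705793"
# (pattern is an assumption derived from the log excerpts in this post)
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def failed_sectors(lines):
    """Return (device, sector) pairs for every I/O error line found."""
    hits = []
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            hits.append((match.group(1), int(match.group(2))))
    return hits


# A few lines copied from the host and VM logs above.
sample = [
    "Dec 2 16:01:31 xs2 kernel: [139954.440854] end_request: I/O error, dev tde, sector 159705793",
    "Dec 2 16:01:33 server02 xinetd[2002]: EXIT: nrpe status=0 pid=23894 duration=0(sec)",
    "Dec 2 16:01:33 server02 kernel: end_request: I/O error, dev xvda, sector 159705793",
    "Dec 2 16:01:33 server02 kernel: end_request: I/O error, dev xvda, sector 159705801",
]

for device, sector in failed_sectors(sample):
    print(device, sector)
```

    Run against the lines above, it lists the same sector number (159705793) for tde on the host and for xvda in the VM, which is how the two log excerpts line up.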

    We then need to reboot, since the VM's filesystem is set read-only. Very annoying.

    What could cause this? Better still, how can we solve it? With flow control, or something else?

    Thanks for the answers!

  2. #2


    Just an update: this happens on XenServer when the VM storage space is bigger than 150GB. It is a XenServer coalesce error and is not related to Open-E.

  3. #3
    Join Date
    Aug 2010
    Posts
    404


    westm003, thank you for your update.
