cmnd_abort

01-21-2008, 02:45 PM
Eisofen

Hello,

For nearly an hour I have been receiving emails from DSS Lite:
2008/01/21 15:43:17 cmnd_abort(1143) 7c000010 1 1000 42 4096 0 1

Should I get nervous? :-)

Cheers

Matthias
01-21-2008, 10:26 PM
To-M

This message usually means that some iSCSI initiator commands were aborted, probably because of high load. It can keep repeating for a while.
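To see how often these aborts occur, the lines from the notification emails or the log can be collected with a small parser. This is a minimal sketch that assumes only the two line shapes quoted in this thread; it deliberately keeps the numeric values as raw strings, because their meaning is not documented here. The number in parentheses (1143) is most likely the source line of the kernel function that printed the message, but treat that as an assumption.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Matches the two line shapes quoted in this thread, e.g.
#   2008/01/21 15:43:17 cmnd_abort(1143) 7c000010 1 1000 42 4096 0 1
#   2008/01/22 09:42:14 kernel:cmnd_abort(1143) 4b000090 1 1000 42 4096 0 1
ABORT_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?:kernel:)?cmnd_abort\((?P<src_line>\d+)\) (?P<fields>.+)$"
)


@dataclass
class AbortEvent:
    timestamp: str
    src_line: int      # the number in parentheses, e.g. 1143
    fields: list[str]  # remaining values, kept as uninterpreted strings


def parse_abort(line: str) -> AbortEvent | None:
    """Return an AbortEvent for a cmnd_abort line, or None for any other line."""
    m = ABORT_RE.match(line.strip())
    if m is None:
        return None
    return AbortEvent(m["ts"], int(m["src_line"]), m["fields"].split())


event = parse_abort("2008/01/21 15:43:17 cmnd_abort(1143) 7c000010 1 1000 42 4096 0 1")
print(event)
```

Feeding every line of the mail or log through `parse_abort` and discarding the `None` results gives a list of events that can be counted per hour or per day.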

You can try changing the iSCSI daemon options: on the console screen press CTRL + ALT + W, open Tuning options, and set the following:

MaxRecvDataSegmentLength 65536
MaxXmitDataSegmentLength 65536
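Before typing values like these into the Tuning options screen, they can be sanity-checked against the range RFC 3720 defines for MaxRecvDataSegmentLength (512 to 2^24 - 1 bytes, default 8192). MaxXmitDataSegmentLength is not defined in RFC 3720, so applying the same bounds to it is an assumption of this sketch, not a documented DSS limit.

```python
from __future__ import annotations

# The two tuning values suggested above for the DSS iSCSI daemon.
SUGGESTED = {
    "MaxRecvDataSegmentLength": 65536,
    "MaxXmitDataSegmentLength": 65536,
}

# RFC 3720 allows MaxRecvDataSegmentLength between 512 and 2**24 - 1 bytes
# (default 8192). Using the same bounds for MaxXmitDataSegmentLength is an
# assumption made for this sketch.
MIN_LEN, MAX_LEN, RFC_DEFAULT = 512, 2**24 - 1, 8192


def out_of_range(options: dict[str, int]) -> list[str]:
    """List every option whose value falls outside the RFC 3720 bounds."""
    return [
        f"{key}={value}"
        for key, value in options.items()
        if not MIN_LEN <= value <= MAX_LEN
    ]


for key, value in SUGGESTED.items():
    print(f"{key} {value}  ({value // RFC_DEFAULT}x the RFC 3720 default)")
print("out of range:", out_of_range(SUGGESTED))
```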
01-22-2008, 07:45 AM
Eisofen

OK,

I'll try that, but this morning I had two new entries at 03:43, and nobody is working here at that time :-)

Regards

Matthias
01-22-2008, 09:29 AM
Eisofen

Hello To-M,

setting the options didn't help:
2008/01/22 09:44:44 :n
2008/01/22 09:44:44 kernel:cmnd_abort(1143) 11000090 1 1000 42 4096 0 1
2008/01/22 09:43:34 kernel:cmnd_abort(1143) 6000090 1 1000 42 4096 0 1
2008/01/22 09:42:14 kernel:cmnd_abort(1143) 4b000090 1 1000 42 4096 0 1
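The spacing between aborts can hint at whether they follow some periodic activity on the initiators or arrive at random. A minimal sketch, using only the three timestamps quoted above:

```python
from datetime import datetime

# Timestamps of the three cmnd_abort lines quoted above.
STAMPS = ["2008/01/22 09:44:44", "2008/01/22 09:43:34", "2008/01/22 09:42:14"]


def gaps_seconds(stamps):
    """Sort the timestamps and return the gaps between neighbours in seconds."""
    times = sorted(datetime.strptime(s, "%Y/%m/%d %H:%M:%S") for s in stamps)
    return [int((later - earlier).total_seconds()) for earlier, later in zip(times, times[1:])]


print(gaps_seconds(STAMPS))  # [80, 70]
```

Here the aborts come roughly a minute apart; with a longer log, the same function would show whether that rhythm holds.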

Here is a snippet from dmesg (posts are limited to 10,000 characters):

Using IPI Shortcut mode
Freeing unused kernel memory: 240k freed
squashfs: version 3.2-r2 (2007/01/15) Phillip Lougher
aufs 20070903
attempt to access beyond end of device
loop0: rw=0, want=66, limit=8
isofs_fill_super: bread failed, dev=loop0, iso_blknum=16, block=32
attempt to access beyond end of device
loop0: rw=0, want=68, limit=8
attempt to access beyond end of device
loop0: rw=0, want=1252, limit=8
attempt to access beyond end of device
loop0: rw=0, want=1028, limit=8
UDF-fs: No partition found (1)
XFS: bad magic number
XFS: SB validate failed
Vendor: TTI-MSA Model: USB 2.0 MD Rev: PMAP
Type: Direct-Access ANSI SCSI revision: 00
SCSI device sda: 985088 512-byte hdwr sectors (504 MB)
sda: Write Protect is off
sda: Mode Sense: 23 00 00 00
SCSI device sda: 985088 512-byte hdwr sectors (504 MB)
sda: Write Protect is off
sda: Mode Sense: 23 00 00 00
sda: sda1
sd 0:0:0:0: Attached scsi removable disk sda
usb-storage: device scan complete
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
attempt to access beyond end of device
dm-1: rw=0, want=66, limit=8
isofs_fill_super: bread failed, dev=dm-1, iso_blknum=16, block=32
attempt to access beyond end of device
dm-1: rw=0, want=68, limit=8
attempt to access beyond end of device
dm-1: rw=0, want=1252, limit=8
attempt to access beyond end of device
dm-1: rw=0, want=1028, limit=8
UDF-fs: No partition found (1)
XFS: bad magic number
XFS: SB validate failed
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
attempt to access beyond end of device
dm-6: rw=0, want=354, limit=352
isofs_fill_super: bread failed, dev=dm-6, iso_blknum=88, block=176
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
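The "attempt to access beyond end of device" lines are easier to judge once the numbers are read: in these kernel messages, `want` is the sector at which the failed request would end and `limit` is the device size, both counted in 512-byte sectors, and `rw=0` indicates a read. So `limit=8` means a device of only 4 KiB. The sketch below only decodes the message format; it does not say why the read was attempted.

```python
import re
from typing import Optional

SECTOR_BYTES = 512  # want/limit in these kernel messages count 512-byte sectors

BEYOND_RE = re.compile(
    r"^(?P<dev>\S+): rw=(?P<rw>\d+), want=(?P<want>\d+), limit=(?P<limit>\d+)$"
)


def describe_overrun(line: str) -> Optional[dict]:
    """Decode a 'dev: rw=..., want=..., limit=...' line, or return None."""
    m = BEYOND_RE.match(line.strip())
    if m is None:
        return None
    want, limit = int(m["want"]), int(m["limit"])
    return {
        "device": m["dev"],
        "device_bytes": limit * SECTOR_BYTES,  # size the kernel believes the device has
        "past_end_sectors": want - limit,      # how far the request reached beyond it
    }


for sample in ("loop0: rw=0, want=66, limit=8", "dm-6: rw=0, want=354, limit=352"):
    print(describe_overrun(sample))
```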

I've created three volumes that will be accessed by two servers via RHCS/clvm, and I've formatted all three with GFS2. Currently both servers are connected, but only one of them is accessing one target.

Cheers

Matthias
01-22-2008, 09:41 AM
Eisofen

Hello again,

I've checked my servers and found this:
Jan 22 09:13:40 xen-2 iscsid: connect failed (111)
Jan 22 09:13:41 xen-2 iscsid: Kernel reported iSCSI connection 3:0 error (1011) state (3)
Jan 22 09:13:41 xen-2 iscsid: connection1:0 is operational after recovery (2 attempts)
Jan 22 09:13:43 xen-2 iscsid: connection4:0 is operational after recovery (3 attempts)
Jan 22 09:13:44 xen-2 iscsid: connection3:0 is operational after recovery (2 attempts)
Jan 22 09:13:46 xen-2 kernel: connection2:0: iscsi: detected conn error (1011)
Jan 22 09:13:46 xen-2 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Jan 22 09:13:51 xen-2 kernel: connection1:0: iscsi: detected conn error (1011)
Jan 22 09:13:51 xen-2 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Jan 22 09:13:53 xen-2 kernel: connection4:0: iscsi: detected conn error (1011)
Jan 22 09:13:54 xen-2 kernel: connection3:0: iscsi: detected conn error (1011)
Jan 22 09:13:54 xen-2 iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
Jan 22 09:13:54 xen-2 iscsid: connection1:0 is operational after recovery (2 attempts)
Jan 22 09:13:54 xen-2 iscsid: Kernel reported iSCSI connection 3:0 error (1011) state (3)
Jan 22 09:13:56 xen-2 iscsid: connection4:0 is operational after recovery (2 attempts)
Jan 22 09:13:57 xen-2 iscsid: connection3:0 is operational after recovery (2 attempts)
Jan 22 09:14:07 xen-2 kernel: connection4:0: iscsi: detected conn error (1011)
Jan 22 09:14:07 xen-2 iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
Jan 22 09:14:08 xen-2 kernel: connection3:0: iscsi: detected conn error (1011)
Jan 22 09:14:08 xen-2 iscsid: Kernel reported iSCSI connection 3:0 error (1011) state (3)
Jan 22 09:14:10 xen-2 iscsid: connect failed (111)
Jan 22 09:14:11 xen-2 iscsid: connect failed (111)
Jan 22 09:14:13 xen-2 kernel: connection1:0: iscsi: detected conn error (1011)
Jan 22 09:14:13 xen-2 iscsid: connection4:0 is operational after recovery (3 attempts)
Jan 22 09:14:14 xen-2 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Jan 22 09:14:14 xen-2 iscsid: connection3:0 is operational after recovery (3 attempts)
Jan 22 09:14:16 xen-2 iscsid: connection1:0 is operational after recovery (2 attempts)
Jan 22 09:14:24 xen-2 kernel: connection4:0: iscsi: detected conn error (1011)
Jan 22 09:14:24 xen-2 iscsid: Kernel reported iSCSI connection 4:0 error (1011) state (3)
Jan 22 09:14:24 xen-2 kernel: connection3:0: iscsi: detected conn error (1011)
Jan 22 09:14:25 xen-2 iscsid: Kernel reported iSCSI connection 3:0 error (1011) state (3)
Jan 22 09:14:25 xen-2 iscsid: connection2:0 is operational after recovery (6 attempts)
Jan 22 09:14:27 xen-2 kernel: connection1:0: iscsi: detected conn error (1011)
Jan 22 09:14:27 xen-2 iscsid: connection4:0 is operational after recovery (2 attempts)
Jan 22 09:14:27 xen-2 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Jan 22 09:14:28 xen-2 iscsid: connection3:0 is operational after recovery (2 attempts)
Jan 22 09:14:30 xen-2 iscsid: connection1:0 is operational after recovery (2 attempts)
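To see whether all sessions are affected equally, the iscsid lines above can be tallied per connection. This sketch counts only the iscsid "Kernel reported ..." lines (the matching kernel "detected conn error" lines would double-count each event), sums the recovery attempts, and counts "connect failed (111)"; errno 111 is ECONNREFUSED on Linux, meaning the target actively refused the TCP connection.

```python
import re
from collections import Counter

ERR_RE = re.compile(r"Kernel reported iSCSI connection (\d+):\d+ error \((\d+)\)")
OK_RE = re.compile(r"connection(\d+):\d+ is operational after recovery \((\d+) attempts\)")


def summarize(lines):
    """Tally iscsid error reports, recoveries and refused connects per connection."""
    errors, recoveries, attempts = Counter(), Counter(), Counter()
    refused = 0
    for line in lines:
        if (m := ERR_RE.search(line)) is not None:
            errors[int(m[1])] += 1
        elif (m := OK_RE.search(line)) is not None:
            recoveries[int(m[1])] += 1
            attempts[int(m[1])] += int(m[2])
        elif "connect failed (111)" in line:  # 111 == ECONNREFUSED on Linux
            refused += 1
    return {"errors": errors, "recoveries": recoveries,
            "attempts": attempts, "refused": refused}


SAMPLE = [
    "Jan 22 09:13:40 xen-2 iscsid: connect failed (111)",
    "Jan 22 09:13:46 xen-2 kernel: connection2:0: iscsi: detected conn error (1011)",
    "Jan 22 09:13:46 xen-2 iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)",
    "Jan 22 09:14:25 xen-2 iscsid: connection2:0 is operational after recovery (6 attempts)",
]
print(summarize(SAMPLE))
```

Running it over the full syslog of each server, rather than this excerpt, would show whether one connection (and therefore one NIC or path) stands out.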

The initiator and target are connected to the same switch. Could the switch be the problem? :-)

Cheers

Matthias
01-22-2008, 01:19 PM
To-M

Try connecting directly to DSS Lite to find out whether this is a switch issue.
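One way to compare the switched path with a direct cable is to probe the target's iSCSI port repeatedly from the initiator and count the outcomes. This is a minimal sketch that assumes the target listens on the standard iSCSI port 3260; the address in the usage comment is a hypothetical placeholder. A plain TCP connect only tests reachability, not the iSCSI session itself.

```python
import socket
from collections import Counter

ISCSI_PORT = 3260  # standard iSCSI target port; assumed, check your DSS setup


def probe(host: str, port: int = ISCSI_PORT, timeout: float = 3.0) -> str:
    """Open one TCP connection and report the outcome as a short label."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        return "refused"  # errno 111 on Linux, as in iscsid's "connect failed (111)"
    except socket.timeout:
        return "timeout"
    except OSError as exc:
        return f"error {exc.errno}"


def probe_many(host: str, count: int = 10, port: int = ISCSI_PORT) -> Counter:
    """Repeat probe() and tally the labels."""
    return Counter(probe(host, port) for _ in range(count))


# Hypothetical usage: run once via the switch, once over a direct cable.
# print(probe_many("192.0.2.10", count=50))
```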
01-22-2008, 02:29 PM
To-M

After researching the "XFS: SB validate failed" and "XFS: bad magic number" errors, I think this could be a problem with your volume. I'm not sure whether you have any data on these volumes, but if you can, back up the data, reconfigure the unit (RAID set), and start over.
I would recommend using the "Clear contents of units" function from Extended Console Tools to delete the VG and LV configuration (a reboot will follow). Then add the unit to the storage again in the WebGUI.
02-20-2008, 12:55 PM
Eisofen

I've recreated the unit as you suggested and plugged the remaining NIC into a separate switch to which only the DSS and both servers are connected. The problem persists :-|
At the moment I'm a little bit pissed, since recreating the setup, the clvm volumes and everything else took nearly three hours.

Oh, and when this timeout happens, the Xen DomU, which IS on the iSCSI volume, stops responding...

dmesg:
SCSI device sda: 985088 512-byte hdwr sectors (504 MB)
sda: Write Protect is off
sda: Mode Sense: 23 00 00 00
sda: sda1
sd 0:0:0:0: Attached scsi removable disk sda
usb-storage: device scan complete
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
kjournald starting. Commit interval 5 seconds
EXT3 FS on dm-0, internal journal
ext3_orphan_cleanup: deleting unreferenced inode 2526
EXT3-fs: dm-0: 1 orphan inode deleted
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
attempt to access beyond end of device
dm-1: rw=0, want=66, limit=8
isofs_fill_super: bread failed, dev=dm-1, iso_blknum=16, block=32
attempt to access beyond end of device
dm-1: rw=0, want=68, limit=8
attempt to access beyond end of device
dm-1: rw=0, want=1252, limit=8
attempt to access beyond end of device
dm-1: rw=0, want=1028, limit=8
UDF-fs: No partition found (1)
XFS: bad magic number
XFS: SB validate failed
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed
attempt to access beyond end of device
dm-6: rw=0, want=354, limit=352
isofs_fill_super: bread failed, dev=dm-6, iso_blknum=88, block=176
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed

and again:
2008/02/20 13:47:36 kernel:cmnd_abort(1143) 2a000000 1 1000 42 4096 0 1

regards

Matthias
02-20-2008, 10:15 PM
masim

As far as I know, this [cmnd_abort (1143)] is related to the write process. If the disks are heavily loaded, the iSCSI command can sometimes time out before the write completes.
The iSCSI initiator should then retry the write.

The same message also appears when a disk hardware problem slows down writing, not only under high load.
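If the values after `cmnd_abort(1143)` follow the layout of the IET target's debug output, the fourth value (42 in every line in this thread) would be the SCSI operation code printed in decimal. That layout is an assumption, not confirmed for the DSS build, but under it 42 decimal is 0x2A, WRITE(10), which fits the write-related explanation above. A small decoding sketch:

```python
# A few SCSI operation codes from the SPC/SBC command sets.
SCSI_OPCODES = {
    0x00: "TEST UNIT READY",
    0x12: "INQUIRY",
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x88: "READ(16)",
    0x8A: "WRITE(16)",
}


def decode_opcode(field: str, base: int = 10) -> str:
    """Render a numeric field as hex plus its SCSI command name, if known."""
    code = int(field, base)
    return f"0x{code:02X} {SCSI_OPCODES.get(code, 'unknown')}"


line = "cmnd_abort(1143) 7c000010 1 1000 42 4096 0 1"
# ASSUMPTION: the 5th whitespace-separated token (4th value after the
# function name) is the SCSI opcode, printed in decimal.
print(decode_opcode(line.split()[4]))  # 0x2A WRITE(10)
```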

What hardware are you using?

Can you check your drives and RAID?
02-21-2008, 04:30 AM
Eisofen

The server has an Opteron 185 or 2218 (dual-core), 4 GB RAM and a 3ware 9550 4-port HBA.
The disks are four WD3200s configured as RAID 5 on the 3ware. OK, I know 3ware cards suck on Linux, but we're using them on our other identical servers too.
The RAID looks fine to me; the HBA reports no problems.
One thing I've noticed is the load and CPU usage when writing to the array: CPU usage goes up to 50% when writing from one initiator, and the load average goes up to 3 or 4.
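Since the thread points at disk load during writes, a quick back-of-the-envelope for this array may help. The sketch assumes the WD3200 drives are 320 GB each and uses the usual RAID 5 small-write cost of four back-end I/Os (read old data, read old parity, write new data, write new parity) for a write that does not cover a full stripe. The per-disk IOPS figure is a hypothetical placeholder, not a measured value, and controller write caching can hide much of this penalty.

```python
def raid5_usable_gb(disks: int, disk_gb: float) -> float:
    """Usable capacity: one disk's worth of space goes to parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least three disks")
    return (disks - 1) * disk_gb


SMALL_WRITE_PENALTY = 4  # back-end I/Os per partial-stripe random write


def random_write_iops(disks: int, per_disk_iops: float) -> float:
    """Rough front-end random-write IOPS the array can sustain without cache."""
    return disks * per_disk_iops / SMALL_WRITE_PENALTY


print(raid5_usable_gb(4, 320))    # 960
print(random_write_iops(4, 80))   # 80.0 (80 IOPS per disk is a hypothetical figure)
```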

regards

Matthias
02-21-2008, 09:19 AM
Eisofen

I reviewed the hardware this morning and upgraded the firmware on the 9550SXU to 3.08xxxx. Maybe this speeds things up...

Processor: Opteron 285
mem: 8GB

Cheers
Matthias
02-21-2008, 03:02 PM
Eisofen

Well..

That didn't help...
I'm still getting cmnd_abort and
XFS: bad magic number
XFS: SB validate failed
attempt to access beyond end of device
dm-6: rw=0, want=354, limit=352
isofs_fill_super: bread failed, dev=dm-6, iso_blknum=88, block=176
UDF-fs: No VRS found
XFS: bad magic number
XFS: SB validate failed

Looks like I'll trash Open-E and use GNBD, since I need a working solution...

cheers

Matthias
07-09-2009, 10:16 PM
Robotbeat

Okay, I guess I'm kind of necro-bumping an old post, but we have a customer who is getting lots of these cmnd_abort(1143) errors. They are using the old build 3278. Obviously, the Atlanta build uses a different iSCSI target framework by default (SCST instead of the old IET). So this will most likely fix the problem, right?

(Also, we're making the changes suggested in the knowledge base article regarding buffer lengths for the iSCSI target.) We are also enabling the disk write cache, which we've seen GREATLY increase performance with 7200 rpm drives and the Areca 1680 controller (rebuild times decrease by a full order of magnitude).
07-09-2009, 10:26 PM
To-M

I saw this link; it could be something.

https://forums.openfiler.com/viewtopic.php?id=1654
07-09-2009, 10:53 PM
Robotbeat

Yeah, this seems to confirm it. Openfiler uses IET, I believe. While googling, I only really saw "cmnd_abort(1143)" in connection with IET. So either this issue isn't a problem with SCST, or SCST reports it under a different error name/number. Good to know!
07-09-2009, 10:56 PM
To-M

Correct, they use IET as well. We now use SCST, though IET is still included and you can switch to it.