Hello,
I'm a colleague of Marco Chiavacci, whom you already know (he's on holiday now).
I'm posting this thread because we are experiencing a serious problem with snapshots.
This problem usually happens on the "old" iSCSI XSR Enterprise, but sometimes it also happens on the new iSCSI R3:
When you try to access data on a snapshot partition (mounted on a Windows server with the MS iSCSI initiator), some folders are corrupted and unreadable, while others are accessible as normal.
If you try to access the same folders on the standard iSCSI partition, all folders are readable (this makes no sense: if the data were really corrupted, they should be inaccessible on both the target and its snapshot).
If you simply reboot the iSCSI server, all data become accessible again.
Do you have any suggestions?
All iSCSI servers are updated to the most recent software versions; the iSCSI initiator on the target servers is 2.03 (we are upgrading it to 2.04, but we don't think this is the problem).
Thank you, and please excuse my bad English.
Marco Grassi.
Thank you for your help.
I'm sending you the log files by e-mail.
Currently, snap003, 005, 006, 008, 009, 011, 013 and 014 are corrupted (you can access the snapshot, but you cannot read some directories such as /inetpub/webs).
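To pin down exactly which directories fail on a snapshot but not on the live target, both can be walked side by side. The following is only a minimal sketch, assuming the snapshot and the standard partition are mounted on the same server; the function names and the drive letters in the usage part are hypothetical and not taken from our setup.

```python
import os


def unreadable_dirs(root):
    """Return sorted relative paths of directories under root that cannot be listed."""
    failed = []

    def on_error(err):
        # os.walk passes the OSError raised while listing a directory;
        # err.filename holds the path of the directory that failed.
        failed.append(os.path.relpath(err.filename, root))

    for _ in os.walk(root, onerror=on_error):
        pass
    return sorted(failed)


def snapshot_only_failures(snapshot_root, live_root):
    """Directories unreadable on the snapshot but readable on the live volume."""
    live_failed = set(unreadable_dirs(live_root))
    return [d for d in unreadable_dirs(snapshot_root) if d not in live_failed]


if __name__ == "__main__":
    # Hypothetical mount points: S: is the snapshot, T: is the standard target.
    for path in snapshot_only_failures("S:\\", "T:\\"):
        print(path)
```

Running this before and after a reboot of the iSCSI server would show whether the same directories are affected each time.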
Regards,
Marco Grassi.
Please re-flash the iSCSI module with 1.72; you have very little space left.
Also, I see unconfigured Volume replication tasks. Are you using these? They could be part of the reason for the snapshot issue.
The execute_task_management errors mean that the initiator was not reachable for a while.
We report these as critical errors because this behavior can sometimes damage the file system or data. The error is generated when packet errors exist in the test.log file. It can also be caused by bonding issues on the NIC, switch or network.
If this is happening on the iSCSI-R3 Enterprise as well, as you stated, then we will need to see those logs too in order to verify.
Aug 7 07:16:56 iSCSI-T kernel: execute_task_management(1236) 39eafdb 5 ffffffff
Aug 7 07:16:57 iSCSI-T kernel: execute_task_management(1236) 1cf6447 5 ffffffff
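To see when these entries cluster (for example during snapshot activity), the log can be summarized per hour. This is a minimal sketch, assuming every entry follows the same syslog-style layout as the two lines above; the function name is hypothetical, and the trailing fields after `execute_task_management(1236)` are not interpreted.

```python
import re
from collections import Counter

# Matches lines such as:
# Aug 7 07:16:56 iSCSI-T kernel: execute_task_management(1236) 39eafdb 5 ffffffff
LINE_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+"
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})\s+"
    r"(?P<host>\S+)\s+kernel:\s+execute_task_management\(\d+\)"
)


def count_task_management_errors(lines):
    """Count execute_task_management entries per (host, month, day, hour)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            key = (
                match["host"],
                match["month"],
                int(match["day"]),
                int(match["hour"]),
            )
            counts[key] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Aug 7 07:16:56 iSCSI-T kernel: execute_task_management(1236) 39eafdb 5 ffffffff",
        "Aug 7 07:16:57 iSCSI-T kernel: execute_task_management(1236) 1cf6447 5 ffffffff",
    ]
    for key, count in sorted(count_task_management_errors(sample).items()):
        print(key, count)
```

Hours with many entries can then be compared against the times when snapshots were active or the clients were doing heavy I/O.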
Hello,
I don't understand why I have to re-flash the server. I have 15 iSCSI servers and all of them have the issue; do you really think that re-flashing will solve anything?
About the Volume replication tasks: I cannot find any of them in the interface (I'm sending you a screenshot). How can I delete them?
About the execute_task issue: I'm sure that we don't have network problems. The iSCSI connections are on a dedicated "backup" network built with very fast switches (please also note that we have never experienced snapshot corruption on our NAS XSR servers).
The main suspect could be the iSCSI initiator, but why do we only ever see corruption on the snapshot device and never on the main targets? Why do the "corrupted" data become readable again if I reboot the iSCSI server?
Is it possible that 14 targets (and 14 snapshots) accessed by 14 different iSCSI clients are too many for a single iSCSI server?
Do you have any other suggestions?
Bye,
Marco Grassi.
You sent the log file, we saw specific errors, and we feel that re-flashing would fix the issues pointed out in the previous reply. If you do not wish to do that, then let's move on. Do you see any errors or Event IDs? Otherwise, try version 2.02. If you are saying that only the snapshot is corrupted, and only while the snapshot is active, maybe there is heavy I/O or not enough memory on the systems. Have you checked the performance values? For iSCSI Ent. use https://x.x.x.x/check_sys/127.0.0.1.html and review the memory load during the snapshot. Running out of memory can sometimes cause this, and 14 clients doing heavy I/O could trigger it (now the execute_task_management(1236) entries start to tell us more). In that case, increasing the iSCSI daemon settings, adding RAM or using a larger bond may help.
You cannot use the iSCSI-R3 Enterprise USB with existing iSCSI Enterprise IDE volumes. They use different volume managers, so you have to back up and then restore.