[09] WHAT SHOULD I DO IF A SYSTEM CRASHES OR LOCKS UP?

Hopefully this will never happen to you, but if you experience 'lock ups' or 'freezes', please follow these steps to help prevent loss of your own data.

It is also important to note that you do not have a direct connection to SDF and are most likely hopping through 10 or more networks to get to SDF. You can use ping and traceroute to measure the lag between your computer and SDF. Your experience of lag on SDF is therefore subjective, and it is very important for you to understand that.

Typically a lockup will occur when you are trying to access a file that resides on the fileserver. For instance, say you are trying to cat a file and instead of seeing the contents you get either nothing or a message similar to:

  ol1:/sys: not responding

Be patient. The fileserver will recover shortly and your task will be completed. You will probably see:

  ol1:/sys: is alive again

which means your request will actually begin to be processed. During the hang, you can use ^T (CTRL-T) to display the status of your job. For instance:

  load: 2.04 cmd: tail 12966 [select] 0.00u 0.00s 0% 808k

Here [select] is the current state of process ID 12966, which is the 'tail' program. If the system is waiting on actual disk I/O, you'll probably see [biowait]. In cases of a hang you may see either [nfsrcvlk] (Network File System Receive Lock) or [vnlock] (Virtual Node Lock). The system will usually recover from these, but a prolonged state can point to a serious resource problem on the NFS client.

In the event that the fileserver becomes unavailable, it is important that you do not become impatient and interrupt, quit or suspend your jobs (^C, ^\ or ^Z), but rather wait them out. If you are patient, your chances of losing data will be significantly reduced. The fileserver will usually respond within a few seconds, and rarely takes longer.

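As a rough illustration of how to read the bracketed state in a ^T status line, here is a minimal sketch. The function name explain_state is hypothetical (it is not an SDF or NetBSD tool), and the descriptions it prints simply restate the states mentioned in this entry.

```shell
#!/bin/sh
# Sketch only: interpret the wait state shown in a ^T status line.
# explain_state is a hypothetical helper name, not an SDF or NetBSD command.
explain_state() {
    # Extract the text between the first pair of square brackets.
    state=$(printf '%s\n' "$1" | sed -n 's/^[^[]*\[\([^]]*\)].*$/\1/p')
    case "$state" in
        biowait)  echo "biowait: waiting on disk I/O" ;;
        nfsrcvlk) echo "nfsrcvlk: NFS lock, usually recovers; wait it out" ;;
        vnlock)   echo "vnlock: vnode lock, usually recovers; wait it out" ;;
        "")       echo "no wait state found" ;;
        *)        echo "$state: not a state discussed in this entry" ;;
    esac
}

# Example: paste the line that ^T printed as the argument.
explain_state 'load: 2.04 cmd: tail 12966 [select] 0.00u 0.00s 0% 808k'
```

You would copy the line printed by ^T and pass it, quoted, as the single argument. States not covered in this entry (such as select) are reported as such rather than guessed at.
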
In the case where the problem lies with the NFS client (vnlock for more than, say, 20 seconds), that particular host will most likely need to be reset.

More on this: SDF is pushing NetBSD to its limits, and we are currently (2003-2004) doing quite a bit of investigation with the uvm/vfs/vnode code developers to help NetBSD become scalable in high usage situations such as the loads we experience on SDF. Solutions we find will be incorporated into the public code.