From: fengguang tian (fernyabc_at_gmail_dot_com)
Date: Fri Apr 02 2010 - 09:50:11 PDT
Hi,

Checkpoint and restart work fine on a cluster with two machines, but on a cluster with three or more machines the restart fails with this error:

mpirun noticed that process rank 6 with PID 27878 on node xxxx exited on signal 11 (Segmentation fault).

I am using an NFS file system shared between the machines.

Cheers