restart error in cluster

From: fengguang tian (fernyabc_at_gmail_dot_com)
Date: Fri Apr 02 2010 - 09:50:11 PDT

  • Next message: Balazs Gerofi: "BLCR @ BG/P?"
    Hi,
    Checkpoint and restart work fine in a cluster with two machines,
    but when the cluster has three or more machines, the restart fails
    with the following error:
    
    mpirun noticed that process rank 6 with PID 27878 on node xxxx exited on
    signal 11 (Segmentation fault).
    
    So a segmentation fault occurs during restart. I am using NFS to share files between the machines.
    
    cheers
    
