Re: 32bit and 64bit platforms: suggestion

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Aug 01 2007 - 13:44:30 PDT

  • Next message: Paul H. Hargrove: "Re: Simple blcr API usage?"
    Adolfo J. Banchio wrote:
    > We have here a cluster which has a mixture of ia32
    > and EM64T CPUs. As we already know, it is
    > not possible to restart a job (even a 32-bit one)
    > that started on a 64-bit node on a 32-bit one.
    > However, even though the default script for the queue
    > system defines a default architecture to prevent jobs
    > started on one from continuing on the other,
    > one user ended up restarting on the wrong architecture,
    > producing a KERNEL PANIC!
    > So my suggestion is, if possible, to prevent cr_restart
    > from proceeding if it detects that the checkpoint is from
    > a different architecture, and to deliver a corresponding
    > error message.
    > We are using blcr-0.5.0_b5-1 on the 64bit nodes and
    > blcr-0.5.0_b1-1 on the 32bit ones. Just for your information.
    > Best regards,
    > adolfo
    > P.S.: again, this is just a suggestion, for a minor thing.
    IMHO a kernel panic caused by a non-root user is *not* a minor thing.
    We really could/should include an architecture identifier in the BLCR 
    file header.
    I've entered a bug report for this issue
    and hope to resolve it for the current 0.6.0 beta series.
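
To make the header idea concrete, here is a minimal sketch in C of how an architecture tag could be written at checkpoint time and verified before a restart proceeds. Everything in it is an assumption made for illustration: the struct layout, the "CKPT" magic, the enum values, the error codes, and the function names are hypothetical and are not BLCR's actual file format or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical architecture tags (not BLCR's real values). */
enum ckpt_arch {
    CKPT_ARCH_UNKNOWN = 0,
    CKPT_ARCH_I386    = 1,
    CKPT_ARCH_X86_64  = 2
};

/* Hypothetical result codes for the header check. */
enum ckpt_status {
    CKPT_OK        =  0,
    CKPT_ERR_MAGIC = -1,  /* not a recognised checkpoint file */
    CKPT_ERR_ARCH  = -2   /* checkpoint taken on another architecture */
};

/* Hypothetical header layout; the real context file differs. */
struct ckpt_header {
    char     magic[4];  /* illustrative magic bytes "CKPT" */
    uint32_t arch;      /* enum ckpt_arch of the checkpointing kernel */
};

/*
 * Architecture this code was compiled for. In a kernel module this
 * would correspond to the kernel's architecture, which is the value
 * that matters here: a 32-bit job checkpointed under a 64-bit kernel
 * must still be tagged as a 64-bit checkpoint.
 */
uint32_t ckpt_native_arch(void)
{
#if defined(__x86_64__)
    return CKPT_ARCH_X86_64;
#elif defined(__i386__)
    return CKPT_ARCH_I386;
#else
    return CKPT_ARCH_UNKNOWN;
#endif
}

/* Fill in the header when the checkpoint is written. */
void ckpt_header_init(struct ckpt_header *hdr, uint32_t arch)
{
    assert(hdr != NULL);
    memcpy(hdr->magic, "CKPT", sizeof hdr->magic);
    hdr->arch = arch;
}

/* Validate the header before any process state is restored. */
int ckpt_header_check(const struct ckpt_header *hdr, uint32_t running_arch)
{
    assert(hdr != NULL);
    if (memcmp(hdr->magic, "CKPT", sizeof hdr->magic) != 0)
        return CKPT_ERR_MAGIC;
    if (hdr->arch != running_arch)
        return CKPT_ERR_ARCH;
    return CKPT_OK;
}
```

A restart path built along these lines would call ckpt_header_check(&hdr, ckpt_native_arch()) first and, on CKPT_ERR_ARCH, print an error and exit instead of attempting to restore anything.
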
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
