From: Jeff Squyres (jsquyres_at_open-mpi.org)
Date: Wed Jul 27 2005 - 12:32:48 PDT
I didn't dig, but I'm guessing that it calls aio_init() (or whatever) --
doesn't that spawn off another thread and/or set up things with resources
that could be non-checkpointable?

On Jul 27, 2005, at 11:21 AM, Paul H. Hargrove wrote:

> Jeff,
>
> I am not sure this explains why a simple hello world program should
> fail to restart. Even if romio runs some initialization code at
> MPI_Init time, I can't see how any actual async I/O would be started.
>
> -Paul
>
> Jeff Squyres wrote:
>
>> On Jul 26, 2005, at 5:01 PM, Paul H. Hargrove wrote:
>>
>>> There is no support in current BLCR versions for either POSIX or
>>> Linux-native async I/O. While this has nothing to do with
>>> whatever linker problems Jeff mentioned, it could be the cause of
>>> the problems you've been seeing.
>>
>>
>> I'm inferring from Pradeep's mail that there was an RPM that was
>> removed, but has now been replaced (LAM won't use libaio unless it
>> finds it during configure -- so it must have been there at some point
>> and then was later removed).
>>
>>> How/when is async I/O used in LAM? Is there a simple way to
>>> disable it via ssi params?
>>
>>
>> It's used in ROMIO. There are currently no SSI params to remove its
>> use -- part of the problem is that the wrapper compilers add "-laio".
>> So it's not just a run-time switch to change ROMIO's behavior, it's
>> a compile-time decision (ROMIO makes a bunch of decisions and sets
>> #define's based on whether AIO is present or not) for both LAM and
>> ROMIO.
>>
>> But this also explains why we rarely (never?) saw this problem in our
>> own testing -- the vast majority of our manual testing builds disable
>> ROMIO because it takes so long to compile. Urgh. This also explains
>> why my LAM build on Pradeep's system worked -- I configured and built
>> LAM after the libaio-devel RPM was removed, so my build did not add
>> -laio.
>>
>> The quick and easy solution is to disable ROMIO ("--without-romio").
>> Not really an optimal solution, but it'll work.
>>
>
> --
> Paul H. Hargrove  PHHargrove_at_lbl_dot_gov
> Future Technologies Group  HPC Research Department
> Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory  Fax: +1-510-486-6900
>

--
{+} Jeff Squyres
{+} The Open MPI Project
{+} http://www.open-mpi.org/
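
For reference, the "--without-romio" workaround mentioned in the quoted
text could look roughly like the following rebuild. This is only a sketch
under assumptions not stated in the thread: it is run from the top of an
unpacked LAM source tree, the install prefix is a hypothetical placeholder,
and the final check assumes the wrapper compiler's -showme option prints
the underlying compile/link command line.

```shell
# Assumption: run from the top of an unpacked LAM source tree.
# PREFIX is a hypothetical install location; substitute your own.
PREFIX="$HOME/lam-noromio"

# Build LAM with ROMIO disabled, as suggested in the thread.
./configure --prefix="$PREFIX" --without-romio
make
make install

# Check whether the wrapper compiler still adds -laio.
# (Assumes "mpicc -showme" prints the command line it would run.)
if "$PREFIX/bin/mpicc" -showme | grep -q -- '-laio'; then
    echo "wrapper still adds -laio"
else
    echo "wrapper does not add -laio"
fi
```

If the check still reports -laio, the build picked up libaio some other
way, and the checkpoint/restart failure may persist.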