Re: Q: Status of integration with Torque?

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Jul 03 2007 - 11:40:07 PDT

  • Next message: Jerry Mersel: "Re: berkeley checkpoint and matlab"
    Brian Dobbins wrote:
    > Hi Paul,
    >   I came across the Spruce presentation that said there's some hope 
    > for BLCR integration with Torque, possibly slated for SC'07, and I'd 
    > love to learn a bit more about the state of things.  At the moment, I 
    > have little experience with Torque or BLCR directly, but I'm willing 
    > to help out with such development.  The slide mentions some 
    > 'engineering support' from Cluster Resources, as well as this being a 
    > Cray deliverable for the NERSC project, so perhaps some other people 
    > might already be making tons of progress (in which case I doubt my own 
    > contributions would be terribly meaningful), but even if it is minor 
    > stuff like testing, I've got a small cluster largely for my own use 
    > and would be happy to do what I can.
    >   Cheers,
    >   - Brian
    > Brian Dobbins
    > Yale Engineering HPC
      Thank you for your interest in BLCR/Torque integration and your offer 
    to help.
      Currently our focus is on pushing out a 0.6.0 release of BLCR later 
    this month.  Once that is done, the Torque integration will be one of 
    the top active development items for us.  The "Engineering Support" from 
    Cluster Resources roughly means that they are happy to answer any 
    questions we may have about where or how to make changes to Torque, but 
    that they will not be the ones writing the code.  The current state is 
    that we have a "proof of concept" prototype based on a pre-0.5.0 version 
    of BLCR that lacked support for checkpoint/restart of full process 
    groups or sessions.  Most of that prototype will probably be discarded 
    (rather than incrementally modified) and the "lessons learned" applied 
    to writing a fresh version.
      The nature of the work to be done doesn't really lend itself to 
    breaking into smaller sub-projects, so I'd rather handle the development 
    work myself.  However, when we have some code to test, I'll keep your 
    offer of assistance in mind.
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
