Re: Announcing the release of BLCR 0.6.0_beta1

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Wed Jul 25 2007 - 09:53:23 PDT

    Cedric Le Goater wrote:
    > Cedric Le Goater wrote:
    >   
    >>> Hmm, not 100% sure on that one.  There is code in
    >>> cr_module/cr_module.c:cr_init_module() that compares the addresses of
    >>> two symbols as probed from the System.map at configure time against
    >>> their addresses as resolved by the kernel's module linker/loader.  BLCR
    >>> refuses to load the module if these don't match, since it will be making
    >>> function calls to other addresses obtained in the same way (and we
    >>> really don't want to invoke code at random addresses in kernel context).
    >>>
    >>> So, my best guess is that message means what it says, perhaps due to
    >>> BLCR's autoconf machinery locating the wrong System.map file.  You
    >>> should try comparing the output of
    >>>  $ grep register_chrdev /proc/kallsyms
    >>> against that of
    >>>  $ grep register_chrdev [MAPFILE]
    >>> where [MAPFILE] is the System.map file being used by BLCR (try "grep
    >>> LINUX_SYMTAB_FILE Makefile" in your BLCR build directory).  If they
    >>> match, then it is possible that something is happening with respect to
    >>> kernel linking/relocation that BLCR is not prepared to deal with.  If
    >>> they don't match then it might still be a relocation issue, but more
    >>> likely it means BLCR found the wrong System.map file.  If that is the
    >>> case, try passing --with-system-map=[WHATEVER] when configuring BLCR. 
    >>> Let us know what you find.
    >>>       
    >> hmm, the addresses in the /boot/System.map-2.6.22.1-27.fc7 file and 
    >> the ones from /proc/kallsyms (same kernel shipped by fedora) are
    >> different. weird. I'll investigate.
    >>     
    >
    > dunno why this is not the same.
    >   
    
    My guess is that fc7 has enabled relocation of the kernel image, in 
    which case the difference between the addresses in System.map and 
    /proc/kallsyms would probably be a multiple of the page size.  Could you 
    check for CONFIG_RELOCATABLE in your kernel config (probably in 
    /boot/config-2.6.22.1-27.fc7)?
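
    One way to test the relocation guess is to compare a few symbols in
    both tables and see whether they all differ by the same page-aligned
    amount.  The following is only a rough sketch, not anything BLCR
    does: the helper names are made up for illustration, it assumes the
    usual "ADDRESS TYPE NAME" line layout in both files, and it assumes
    4 KiB pages.

```python
import sys

PAGE_SIZE = 4096  # assumption: 4 KiB pages, as on x86


def parse_symbols(text):
    """Map symbol name -> address from 'ADDRESS TYPE NAME [module]' lines."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        try:
            address = int(fields[0], 16)
        except ValueError:
            continue  # skip lines that do not start with a hex address
        # Keep the first occurrence of a name; a table may list duplicates.
        table.setdefault(fields[2], address)
    return table


def relocation_offset(map_text, live_text, symbols):
    """Return the common (live - map) offset for `symbols`.

    Returns None if no requested symbol is present in both tables or
    if the symbols disagree about the offset.
    """
    map_table = parse_symbols(map_text)
    live_table = parse_symbols(live_text)
    deltas = {
        live_table[name] - map_table[name]
        for name in symbols
        if name in map_table and name in live_table
    }
    if len(deltas) != 1:
        return None
    return deltas.pop()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_syms.py System.map /proc/kallsyms
    with open(sys.argv[1]) as f:
        map_text = f.read()
    with open(sys.argv[2]) as f:
        live_text = f.read()
    offset = relocation_offset(map_text, live_text,
                               ["register_chrdev", "_text"])
    if offset is None:
        print("no single consistent offset; tables may be unrelated")
    else:
        print("offset 0x%x, page multiple: %s"
              % (offset, offset % PAGE_SIZE == 0))
```

    A single non-zero offset that is a multiple of the page size would
    be consistent with a relocated kernel image; inconsistent offsets
    point more toward a System.map from a different kernel build.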
    
    If that is the case, then BLCR is going to need to find a way to deal 
    with this (and I don't have any bright ideas at the moment).
    
    >   
    >>>> what about glibc 2.6 ?    
    >>>>         
    >>> What about it?  I don't have any systems running glibc 2.6.  If you have
    >>> specific problems (once past the System.map problem), please let us know
    >>> and we'll see what we can do to sort them out.
    >>>       
    >> I will as soon as i get that module loaded ! :)
    >>     
    >
    > so I generated a real System.map with :
    >
    > 	$ cat /proc/kallsyms > System.map
    >   
    
    
    If that works for you, then --with-system-map=/proc/kallsyms should work 
    as well.
    
    Did that really work with no other changes?  I've been unable to use 
    that approach on other systems because the symbol "_end" (which we key 
    off to validate System.map) is missing from /proc/kallsyms where I've 
    tried it.
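
    A quick way to see whether a given symbol table would pass that
    kind of check is to look for an exact "_end" entry in it.  This is
    a minimal sketch under the assumption of "ADDRESS TYPE NAME" lines;
    has_symbol is a hypothetical helper and is not how BLCR's configure
    script performs its validation.

```python
import sys


def has_symbol(symtab_text, name):
    """Return True if `name` appears as an exact symbol name in the table."""
    for line in symtab_text.splitlines():
        fields = line.split()
        # The third whitespace-separated field is the symbol name.
        if len(fields) >= 3 and fields[2] == name:
            return True
    return False


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python has_end.py /proc/kallsyms
    with open(sys.argv[1]) as f:
        text = f.read()
    print("_end present" if has_symbol(text, "_end") else "_end missing")
```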
    
    >  
    > configured, built and run 'make check' :
    >  
    > PASS: atomics
    > PASS: cr_run
    > PASS: bug2003
    > PASS: stage0001.st
    > PASS: stage0002.st
    > PASS: stage0003.st
    > PASS: critical_sections.st
    > PASS: replace_cb.st
    > PASS: failed_cb.st
    > PASS: pid_in_use.st
    > PASS: simple.ct
    > PASS: simple_pthread.ct
    > PASS: cwd.ct
    > PASS: dup.ct
    > PASS: filedescriptors.ct
    > PASS: pipe.ct
    > PASS: named_fifo.ct
    > PASS: cloexec.ct
    > PASS: get_info.ct
    > PASS: orphan.ct
    > PASS: overlap.ct
    > PASS: child.ct
    > PASS: mmaps.ct
    > No hugetlbfs mount point found (test skipped)
    > SKIP: hugetlbfs.ct
    > PASS: readdir.ct
    > PASS: dev_null.ct
    > PASS: cr_signal.ct
    > PASS: linked_fifo.ct
    > ======================
    > All 27 tests passed
    > (1 tests were not run)
    >
    > it seems fine on a bare fc7. I suppose that the tests are doing a 
    > checkpoint+restart sequence. right ? 
    >
    >   
    
    Yes, all the tests that end in ".ct" are doing checkpoint+restart (some 
    of them multiple times).
    
    > thanks !
    >
    > C.
    >   
    
    You are welcome.
    
    -Paul
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 
    HPC Research Department                   Tel: +1-510-495-2352
    Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
    
