Re: math.ct failure

From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Fri Jun 05 2009 - 12:03:27 PDT

    Peter,
    
     The CFLAGS you quote from our configure script are obtained by running 
    the kernel kbuild infrastructure on a dummy module:
         make V=1 -C ${LINUX_SRCDIR} builddir=${TOPBUILDDIR}/conftestdir
    where conftestdir contains a Makefile and conftest.c for a dummy kernel 
    module.  The CFLAGS are then extracted from the output of that command.  
    So, it is the kernel's own build infrastructure (perhaps w/ gentoo 
    specific changes or additions) that is providing these CFLAGS.
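
To make the extraction step concrete, here is a minimal sketch of pulling the flags out of verbose kbuild output. It is not the actual logic in the BLCR configure script: the function name `extract_cflags`, the `/tmp/conftestdir` paths, and the sample output are assumptions for illustration, modeled loosely on what `make V=1` prints.

```python
import shlex

# Hypothetical helper: NOT BLCR's real configure logic, only an
# illustration of pulling compiler flags out of verbose kbuild output.
def extract_cflags(make_output, source="conftest.c"):
    """Return the flags from the first command line that compiles `source`."""
    for line in make_output.splitlines():
        try:
            words = shlex.split(line)
        except ValueError:
            # Lines such as "Entering directory `/x'" have unbalanced quotes.
            continue
        # Only consider compile commands: "<cc> ... -c ... conftest.c"
        if "-c" not in words or not any(w.endswith(source) for w in words):
            continue
        flags = []
        skip_next = False
        for word in words[1:]:              # words[0] is the compiler itself
            if skip_next:                   # argument of a dropped option
                skip_next = False
            elif word == "-o":              # drop "-o <object file>"
                skip_next = True
            elif word == "-c" or word.endswith(source):
                pass                        # drop "-c" and the source file
            elif word.startswith("-Wp,"):   # drop dependency-tracking options
                pass
            else:
                flags.append(word)
        return flags
    return []


# Assumed sample output; the paths are placeholders, not from a real build.
sample = """\
make: Entering directory `/usr/src/linux'
  gcc -Wp,-MD,/tmp/conftestdir/.conftest.o.d -nostdinc -D__KERNEL__ -Wall -m32 -msoft-float -DMODULE -c -o /tmp/conftestdir/conftest.o /tmp/conftestdir/conftest.c
"""
print(" ".join(extract_cflags(sample)))
```

Whatever the exact filtering, the point is the same: every flag that survives comes from the kernel's makefiles, not from anything chosen by configure.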
    
      I doubt that "-msoft-float" could be a factor.  Along with "-mno-sse 
    -mno-mmx -mno-sse2 -mno-3dnow", these come from arch/x86/Makefile and 
    are included for all 32-bit x86 kernel builds to ensure that there are 
    no floating-point instructions in the kernel code other than inline 
    assembly which is guarded by "kernel_fpu_begin()" and 
    "kernel_fpu_end()".  This is required because the kernel runs with fpu 
    register state "borrowed" from the previously running user process.  
    Similar flags are passed on other architectures for the same reason.
    
    While I have no gentoo installations to test, I can run BLCR with a 
    vanilla 2.6.28 kernel on a Pentium4 with no problems, and have confirmed 
    that the CFLAGS there are very similar to yours:
    
    -nostdinc -isystem 
    /usr/local/pkg/gcc-4.1.2/lib/gcc/i686-pc-linux-gnu/4.1.2/include 
    -D__KERNEL__ -I/lib/modules/2.6.28/build/include 
    -I/lib/modules/2.6.28/build/include2 
    -I/home/data1/phargrov/kernel-src/linux-2.6.28/include 
    -I/home/data1/phargrov/kernel-src/linux-2.6.28/arch/x86/include -include 
    /lib/modules/2.6.28/build/include/linux/autoconf.h 
    -I/data/blcr/BUILD-2.6.28/conftestdir -Wall -Wundef -Wstrict-prototypes 
    -Wno-trigraphs -fno-strict-aliasing -fno-common 
    -Werror-implicit-function-declaration -m32 -msoft-float -mregparm=3 
    -freg-struct-return -mpreferred-stack-boundary=2 -march=i686 
    -mtune=pentium4 -ffreestanding -pipe -Wno-sign-compare 
    -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow 
    -I/home/data1/phargrov/kernel-src/linux-2.6.28/arch/x86/include/asm/mach-default 
    -I/lib/modules/2.6.28/build/arch/x86/include/asm/mach-default 
    -fno-stack-protector -fomit-frame-pointer -Wdeclaration-after-statement 
    -Wno-pointer-sign -DMODULE -DKBUILD_STR(s)=#s 
    -DKBUILD_BASENAME=KBUILD_STR(conftest) -DKBUILD_MODNAME=KBUILD_STR(conftest)
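
A comparison like this can also be done mechanically rather than by eye. The following is a rough sketch, assuming the two flag lists are available as plain strings; the helper names `portable_flags` and `compare_flags` are made up for this example, and the short strings at the bottom are abbreviated placeholders, not the full lists from this thread.

```python
# Options whose value is the *next* word on the command line.
TAKES_ARGUMENT = {"-isystem", "-include"}

def portable_flags(cflags):
    """Return the set of flags that do not depend on local paths."""
    kept = set()
    words = iter(cflags.split())
    for word in words:
        if word in TAKES_ARGUMENT:
            next(words, None)        # discard the path argument as well
        elif not word.startswith("-I"):
            kept.add(word)
    return kept

def compare_flags(mine, yours):
    """Return (only_in_mine, only_in_yours) as sorted lists."""
    a, b = portable_flags(mine), portable_flags(yours)
    return sorted(a - b), sorted(b - a)


# Abbreviated placeholder inputs; /a and /b stand in for real paths.
mine = "-nostdinc -isystem /a/include -I/a/build -m32 -msoft-float -march=i686 -mtune=pentium4"
yours = "-nostdinc -isystem /b/include -I/b/build -m32 -msoft-float -march=athlon -fwrapv"
print(compare_flags(mine, yours))
```

Fed the two complete lists in this thread, a comparison of this kind should leave only -march=i686 and -mtune=pentium4 on the vanilla 2.6.28 side, and -march=athlon, -Wa,-mtune=generic32, -DCONFIG_AS_CFI=1, -DCONFIG_AS_CFI_SIGNAL_FRAME=1 and -fwrapv on the gentoo side.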
    
    So, I am afraid I still have no good guesses as to the source of your 
    problem.
    
    -Paul
    
    
    Peter Elias wrote:
    >   Hello,
    > I have successfully fixed my CHOST (and recompiled everything, to be 
    > sure). As one may expect, it did not help. But I have noticed the 
    > following when running configure:
    >
    > checking for flags to compile Linux kernel probes...  -nostdinc 
    > -isystem /usr/lib/gcc/i686-pc-linux-gnu/4.3.2/include -D__KERNEL__ 
    > -I/lib/modules/2.6.28-gentoo-r5/build/include 
    > -I/usr/src/linux-2.6.28-gentoo-r5/arch/x86/include -include 
    > /lib/modules/2.6.28-gentoo-r5/build/include/linux/autoconf.h -Wall 
    > -Wundef -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing 
    > -fno-common -Werror-implicit-function-declaration -m32 -msoft-float 
    > -mregparm=3 -freg-struct-return -mpreferred-stack-boundary=2 
    > -march=athlon -Wa,-mtune=generic32 -ffreestanding -DCONFIG_AS_CFI=1 
    > -DCONFIG_AS_CFI_SIGNAL_FRAME=1 -pipe -Wno-sign-compare 
    > -fno-asynchronous-unwind-tables -mno-sse -mno-mmx -mno-sse2 -mno-3dnow 
    > -I/lib/modules/2.6.28-gentoo-r5/build/arch/x86/include/asm/mach-default 
    > -fno-stack-protector -fomit-frame-pointer 
    > -Wdeclaration-after-statement -Wno-pointer-sign -fwrapv -DMODULE 
    > -DKBUILD_STR(s)=#s -DKBUILD_BASENAME=KBUILD_STR(conftest) 
    > -DKBUILD_MODNAME=KBUILD_STR(conftest)
    >
    > Is that what it should be? Note especially the use of -msoft-float.
    > How are the compiler flags guessed? I am now afraid that Gentoo 
    > manages CFLAGS in a slightly non-standard way. Could this cause the problem?
    >   Peter
    >
    >
    > Peter Elias wrote:
    >> Hello,
    >>   I apologize for my late answer; I had some trouble running X after 
    >> upgrading to Xorg 1.5 ...
    >>   I am running the latest gentoo-patched kernel 
    >> (linux-2.6.28-gentoo-r5), and NOT running Xen (nor any other 
    >> virtualization software).
    >>   Is there any chance that the failure of the math.ct test signals 
    >> some hardware problem?
    >>   There is one thing I should probably try: when I installed 
    >> Gentoo, I downloaded the wrong stage3 tarball, so now I have CHOST 
    >> set to i486-pc-linux-gnu (instead of i686-pc-linux-gnu). Anyway, I 
    >> then compiled everything with the -march=athlon option. As I have 
    >> learned, a change of CHOST leads to some (possibly a lot of) 
    >> recompiling. I do not know exactly what the change of CHOST will do, 
    >> but who knows...
    >>   Peter
    >>
    >> Eric Roman wrote:
    >>> Can you please let us know which kernel you're running?
    >>>
    >>> Eric
    >>>
    >>> On Thu, May 14, 2009 at 07:43:02AM -0700, Paul Hargrove wrote:
    >>>> Peter,
    >>>>  The list address is checkpoint_at_lbl_dot_gov (I've cc:ed this reply and 
    >>>> set Reply-To)
    >>>>
    >>>>  If you are seeing this failure, then I would NOT proceed with 
    >>>> using BLCR.  We should figure out the source of the failure.
    >>>>  Given the things we've been through in the past, my only guess is 
    >>>> that you are using Xen with a version older than 3.1.2.  If that is 
    >>>> the case, the bug is actually in Xen, not BLCR, and you will need to 
    >>>> upgrade Xen.  If that is not the case, let us know and we'll see if 
    >>>> we can figure out the source of the problem.
    >>>>
    >>>> -Paul
    >>>>
    >>>> Peter Elias wrote:
    >>>>>  Dear Paul H. Hargrove!
    >>>>>
    >>>>>  I apologize for sending this e-mail to your personal address. I 
    >>>>> planned to send it to the checkpoint mailing list (archives are at 
    >>>>> www.nersc.gov), but I couldn't find the list's address.
    >>>>>  I am trying to compile and run blcr-0.8.1 on my machine (Gentoo 
    >>>>> Linux, AMD Athlon CPU). Compilation went without errors, and of 
    >>>>> all the tests only one failed:
    >>>>>
    >>>>> .../blcr-0.8.1/builddir/tests/.libs/lt-math[31579]: file " 
    >>>>> ../../tests/math.c", line 54, in math_callback: FP restore failure 
    >>>>> nan != 38842.2
    >>>>>
    >>>>> cr_poll_request() failed: Restart aborted: restart cancelled and 
    >>>>> job killed
    >>>>> FAIL: math.ct
    >>>>>
    >>>>>  Shall I install and use blcr? I want to save and then repeatedly 
    >>>>> restore some ocaml sessions.
    >>>>>
    >>>>>  With best regards,
    >>>>>
    >>>>> Peter Elias
    >>>>>
    >>>>
    >>>> -- 
    >>>> Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    >>>> Future Technologies Group                 Tel: +1-510-495-2352
    >>>> HPC Research Department                   Fax: +1-510-486-6900
    >>>> Lawrence Berkeley National Laboratory     
    >>
    >>
    >
    
    
    -- 
    Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
    Future Technologies Group                 Tel: +1-510-495-2352
    HPC Research Department                   Fax: +1-510-486-6900
    Lawrence Berkeley National Laboratory     
    