From: Paul H. Hargrove (PHHargrove_at_lbl_dot_gov)
Date: Tue Jun 17 2008 - 15:39:53 PDT
Parviz,

Thanks for the dmesg output you sent last week. Unfortunately, there is nothing in there to point to the actual cause of the EFAULT.

The attached patch will print an error message in the two most likely paths to generate the EFAULT. If you could, please recompile BLCR with this patch applied and "make insmod cr_ktrace_mask=2" to load the module with only minimal debugging enabled. Then please send the dmesg output for a failed restart. Hopefully the added output will help me narrow the possible causes.

Also, is it possible for me to get a copy of the application source code? It would be helpful if I could reproduce the problem locally. If you wish, contact me directly at the email address in my .sig below, rather than posting your code to the archived checkpoint_at_lbl_dot_gov list.

-Paul

--
Paul H. Hargrove                          PHHargrove_at_lbl_dot_gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

Index: vmadump4/vmadump_common.c
===================================================================
RCS file: /var/local/cvs/lbnl_cr/vmadump4/vmadump_common.c,v
retrieving revision 1.58
diff -u -r1.58 vmadump_common.c
--- vmadump4/vmadump_common.c	26 May 2008 05:26:37 -0000	1.58
+++ vmadump4/vmadump_common.c	17 Jun 2008 22:32:25 -0000
@@ -105,6 +105,7 @@
     oldfs = get_fs(); set_fs(KERNEL_DS);
     err = read_user(ctx, file, buf, count);
     set_fs(oldfs);
+    if (err == -EFAULT) printk("vmadump: EFAULT in read_kern\n");
     return err;
 }
@@ -572,6 +573,7 @@
 err:
     if (r >= 0) r = -EIO; /* map short reads to EIO */
+    if (r == -EFAULT) printk("vmadump: EFAULT loading %d pages at %p\n", (int)page.num_pages, (void*)page.start);
     return r;
 }