We have encountered a problem with the minidump code on the amd64 kernel in the 6.x release: minidumps fail, and we were not able to obtain core files.

We also found a seemingly relevant "bounds check" update (applied to multiple files; one example is http://www.freebsd.org/cgi/cvsweb.cgi/src/sys/amd64/amd64/dump_machdep.c.diff?r1=1.12;r2=1.12.2.1 ), but this change was only back-ported to 7.x in CVS. We hunted this down and back-ported it to FreeBSD 6.x. Although we believe the change is safe and probably necessary, it did not fully address the issue. Instead we ran into a new symptom, the message "Attempt to write outside dump device boundaries".

A workaround, of course, is to enable full core dumps, but on the amd64 kernel this can result in a very big core file.

minidumpsys() computes the expected size of the core dump, and then places the dump at the end of the dump device. For example, if the dump is expected to be 500M and the dump device size is 8G, the dump starts at offset 7.5G. In the failing case, the dump is larger than expected, so the last writes would fall past the end of the device, which would explain the message above.

There are several stages to the dump; the one that goes awry is the "memory chunks" stage, which is the largest part of the core dump. The logic is as follows:

- During initialization, create and zero an array of uint64_t, called
  vm_page_dump, with a size of vm_page_dump_size. This array is used as
  a bitmap, with each bit representing a page of physical memory.

- When we decide to do a minidump:

  (1) Walk the kernel page tables; for each mapping, find the physical
      page(s) underlying it, and set their bits in the vm_page_dump
      array.

  (2) Walk the vm_page_dump array; for each bit set:
      (a) if !is_dumpable(associated physical address), clear the bit
          in vm_page_dump;
      (b) otherwise, add one to the count of pages to be dumped.

  (3) Walk the vm_page_dump array again, dumping each page.
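As a rough illustration of the bitmap logic above, here is a minimal user-space sketch. It is not the kernel code: every sketch_* name, the 256-page toy bitmap, the 4 KiB page size, and the rule used by sketch_is_dumpable() are assumptions made only for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12                  /* assumed 4 KiB pages */
#define SKETCH_WORDS      4                   /* toy bitmap: 4 * 64 = 256 pages */
#define SKETCH_PAGES      (SKETCH_WORDS * 64)

/* Stand-in for vm_page_dump: one bit per physical page, zeroed at start. */
static uint64_t sketch_bitmap[SKETCH_WORDS];

/* Set the bit for the page that contains physical address pa. */
static void
sketch_add_page(uint64_t pa)
{
    uint64_t pfn = pa >> SKETCH_PAGE_SHIFT;

    assert(pfn < SKETCH_PAGES);
    sketch_bitmap[pfn >> 6] |= (uint64_t)1 << (pfn & 63);
}

/* Clear the bit for the page that contains physical address pa. */
static void
sketch_drop_page(uint64_t pa)
{
    uint64_t pfn = pa >> SKETCH_PAGE_SHIFT;

    assert(pfn < SKETCH_PAGES);
    sketch_bitmap[pfn >> 6] &= ~((uint64_t)1 << (pfn & 63));
}

/*
 * Hypothetical stand-in for is_dumpable(): this example simply pretends
 * that the first 16 pages are not dumpable.
 */
static int
sketch_is_dumpable(uint64_t pa)
{
    return ((pa >> SKETCH_PAGE_SHIFT) >= 16);
}

/*
 * Step (2): walk the bitmap, clear the bits of non-dumpable pages,
 * and count the pages that remain.
 */
static size_t
sketch_count_pages(void)
{
    size_t count = 0;

    for (size_t i = 0; i < SKETCH_WORDS; i++) {
        for (int b = 0; b < 64; b++) {
            uint64_t pa;

            if ((sketch_bitmap[i] & ((uint64_t)1 << b)) == 0)
                continue;
            pa = ((uint64_t)i * 64 + b) << SKETCH_PAGE_SHIFT;
            if (sketch_is_dumpable(pa))
                count++;
            else
                sketch_drop_page(pa);
        }
    }
    return (count);
}
```

The count returned by sketch_count_pages() plays the role of the number of pages that the size computation reserves room for. Step (3) would walk the same bitmap a second time, so any bit set between the two walks is written out without having been counted.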
The root of the overrun appears to be that dump_add_page() is called from uma_small_alloc(), and this can happen during the actual dump, from blk_write(). Any page added that way after the dump size has been computed is dumped without having been counted. Comments in vm_page_startup() kind of explain it:

    #if defined(__amd64__) || defined(__i386__)
        /*
         * Allocate a bitmap to indicate that a random physical page
         * needs to be included in a minidump.
         *
         * The amd64 port needs this to indicate which direct map pages
         * need to be dumped, via calls to dump_add_page()/dump_drop_page().
         *
         * However, i386 still needs this workspace internally within the
         * minidump code.  In theory, they are not needed on i386, but are
         * included should the sf_buf code decide to use them.
         */
        page_range = phys_avail[(nblocks - 1) * 2 + 1] / PAGE_SIZE;
        vm_page_dump_size = round_page(roundup2(page_range, NBBY) / NBBY);
        new_end -= vm_page_dump_size;
        vm_page_dump = (void *)(uintptr_t)pmap_map(&vaddr, new_end,
            new_end + vm_page_dump_size, VM_PROT_READ | VM_PROT_WRITE);
        bzero((void *)vm_page_dump, vm_page_dump_size);
    #endif
    #ifdef __amd64__
        /*
         * pmap_map on amd64 comes out of the direct-map, not kvm like i386,
         * so the pages must be tracked for a crashdump to include this data.
         * This includes the vm_page_array and the early UMA bootstrap pages.
         */
        for (pa = new_end; pa < phys_avail[biggestone + 1]; pa += PAGE_SIZE)
            dump_add_page(pa);
    #endif

Our simple solution is to put a flag in minidump_machdep.c. If that flag is set, dump_add_page() becomes a no-op. The flag is set just before the pages to be dumped are counted. Since the pages added to support the dump are not part of the system state at the point of the crash, there should be no loss of debugging value from this change. Note that our patch is specific to the amd64 kernel (patch follows).

Comments on our approach greatly appreciated.

Thanks,

Arlie Stephens
Engineer

Dorr H. Clark
Advisor

Graduate School of Engineering
Santa Clara University, Santa Clara, CA

usr/src/sys/amd64/amd64/minidump_machdep.c
@@ -68,6 +68,16 @@
 
 CTASSERT(sizeof(*vm_page_dump) == 8);
 
+/*
+ * We freeze the vm_page_dump[] array before we compute the amount to dump
+ * so we don't risk having a new increase in pages and a consequent overrun
+ * (Actually, this is a misnomer; we still allow pages to be dropped,
+ * because the code to count the pages needed also has calls
+ * to dump_drop_page()* and while these seem to be no-ops, it's better
+ * to be safe.)
+ */
+static int vm_page_dump_frozen;
+
 static int
 is_dumpable(vm_paddr_t pa)
 {
@@ -256,6 +268,13 @@
 	}
 
 	/* Calculate dump size. */
+	/*
+	 * Don't let more pages be added to the set of pages to be dumped
+	 * after this point, lest we try to use more memory than
+	 * we reserve
+	 */
+	vm_page_dump_frozen = 1;
+
 	dumpsize = ptesize;
 	dumpsize += round_page(msgbufp->msg_size);
 	dumpsize += round_page(vm_page_dump_size);
@@ -444,6 +465,9 @@
 {
 	int idx, bit;
 
+	if (vm_page_dump_frozen) {
+		return;
+	}
 	pa >>= PAGE_SHIFT;
 	idx = pa >> 6;		/* 2^6 = 64 */
 	bit = pa & 63;

http://www.cse.scu.edu/~dclark/coen_284_FreeBSD/minidump_fix_amd64.txt
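
To illustrate the effect of the flag outside the kernel, here is a small user-space model. It is only a sketch under stated assumptions: the sim_* names, the 256-page toy bitmap, and the single "late" page that stands in for an allocation made during the dump are all hypothetical, and none of this is the patched kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_WORDS 4                     /* toy bitmap: 4 * 64 = 256 pages */
#define SIM_PAGES (SIM_WORDS * 64)

static uint64_t sim_bitmap[SIM_WORDS];  /* stand-in for vm_page_dump */
static int sim_frozen;                  /* stand-in for vm_page_dump_frozen */

/* Mark a page (by page frame number) for dumping, unless frozen. */
static void
sim_add_page(uint64_t pfn)
{
    assert(pfn < SIM_PAGES);
    if (sim_frozen)
        return;
    sim_bitmap[pfn >> 6] |= (uint64_t)1 << (pfn & 63);
}

/* Count the pages currently marked for dumping. */
static size_t
sim_count_bits(void)
{
    size_t count = 0;

    for (uint64_t pfn = 0; pfn < SIM_PAGES; pfn++) {
        if (sim_bitmap[pfn >> 6] & ((uint64_t)1 << (pfn & 63)))
            count++;
    }
    return (count);
}

/*
 * Model one dump: count the pages (the space that gets reserved), let
 * one allocation happen "during the dump", then count what would be
 * written. Returns the number of pages written beyond the reservation.
 */
static size_t
sim_dump_overrun(int freeze, uint64_t late_pfn)
{
    size_t reserved, written;

    sim_frozen = freeze;          /* the patch sets this before counting */
    reserved = sim_count_bits();
    sim_add_page(late_pfn);       /* models an allocation during the dump */
    written = sim_count_bits();
    sim_frozen = 0;               /* reset only so the model can be reused */
    return (written - reserved);
}
```

With freeze set to 0 and a late page that was not already marked, sim_dump_overrun() returns 1, which models the overrun; with freeze set to 1 it returns 0. In the real patch the flag is never cleared, since the system is going down anyway.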