1. 21 5月, 2010 1 次提交
  2. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  3. 25 3月, 2010 1 次提交
  4. 07 3月, 2010 3 次提交
  5. 04 3月, 2010 1 次提交
    • E
      init: Open /dev/console from rootfs · 2bd3a997
      Eric W. Biederman 提交于
      To avoid potential problems with an empty /dev open /dev/console
      from rootfs instead of waiting to mount our root filesystem and
      mounting it there.   This effectively guarantees that there will
      be a device node, and it won't be on a filesystem that we will
      ever unmount, so there are no issues with leaving /dev/console
      open and pinning the filesystem.
      
      This is actually more effective than automatically mounting
      devtmpfs on /dev because it removes removes the occasionally
      problematic assumption that /dev/console exists from the boot
      code.
      
      With this patch I was able to throw busybox on my /boot partition
      (which has no /dev directory) and boot into userspace without
      problems.
      
      The only possible negative consequence I can think of is that
      someone out there deliberately used did not use a character device
      that is major 5 minor 2 for /dev/console.  Does anyone know of a
      situation in which that could make sense?
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2bd3a997
  6. 25 2月, 2010 1 次提交
    • P
      sched: Use lockdep-based checking on rcu_dereference() · d11c563d
      Paul E. McKenney 提交于
      Update the rcu_dereference() usages to take advantage of the new
      lockdep-based checking.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-6-git-send-email-paulmck@linux.vnet.ibm.com>
      [ -v2: fix allmodconfig missing symbol export build failure on x86 ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      d11c563d
  7. 18 2月, 2010 2 次提交
  8. 07 2月, 2010 1 次提交
  9. 17 12月, 2009 1 次提交
  10. 16 12月, 2009 1 次提交
  11. 03 12月, 2009 1 次提交
  12. 24 9月, 2009 3 次提交
    • A
      headers: utsname.h redux · 2bcd57ab
      Alexey Dobriyan 提交于
      * remove asm/atomic.h inclusion from linux/utsname.h --
         not needed after kref conversion
       * remove linux/utsname.h inclusion from files which do not need it
      
      NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
      due to some personality stuff it _is_ needed -- cowardly leave ELF-related
      headers and files alone.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2bcd57ab
    • R
      cpumask: remove unused cpu_mask_all · 72d78d05
      Rusty Russell 提交于
      It's only defined for NR_CPUS > BITS_PER_LONG; cpu_all_mask is always
      defined (and const).
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      72d78d05
    • P
      rcu: Clean up code based on review feedback from Josh Triplett, part 2 · 1eba8f84
      Paul E. McKenney 提交于
      These issues identified during an old-fashioned face-to-face code
      review extending over many hours.
      
      o	Add comments for tricky parts of code, and correct comments
      	that have passed their sell-by date.
      
      o	Get rid of the vestiges of rcu_init_sched(), which is no
      	longer needed now that PREEMPT_RCU is gone.
      
      o	Move the #include of rcutree_plugin.h to the end of
      	rcutree.c, which means that, rather than having a random
      	collection of forward declarations, the new set of forward
      	declarations document the set of plugins.  The new home for
      	this #include also allows __rcu_init_preempt() to move into
      	rcutree_plugin.h.
      
      o	Fix rcu_preempt_check_callbacks() to be static.
      Suggested-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12537246443924-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Peter Zijlstra <peterz@infradead.org>
      1eba8f84
  13. 22 9月, 2009 1 次提交
  14. 16 9月, 2009 1 次提交
    • K
      Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev · 2b2af54a
      Kay Sievers 提交于
      Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
      very early at kernel initialization, before any driver-core device
      is registered. Every device with a major/minor will provide a
      device node in devtmpfs.
      
      Devtmpfs can be changed and altered by userspace at any time,
      and in any way needed - just like today's udev-mounted tmpfs.
      Unmodified udev versions will run just fine on top of it, and will
      recognize an already existing kernel-created device node and use it.
      The default node permissions are root:root 0600. Proper permissions
      and user/group ownership, meaningful symlinks, all other policy still
      needs to be applied by userspace.
      
      If a node is created by devtmps, devtmpfs will remove the device node
      when the device goes away. If the device node was created by
      userspace, or the devtmpfs created node was replaced by userspace, it
      will no longer be removed by devtmpfs.
      
      If it is requested to auto-mount it, it makes init=/bin/sh work
      without any further userspace support. /dev will be fully populated
      and dynamic, and always reflect the current device state of the kernel.
      With the commonly used dynamic device numbers, it solves the problem
      where static devices nodes may point to the wrong devices.
      
      It is intended to make the initial bootup logic simpler and more robust,
      by de-coupling the creation of the inital environment, to reliably run
      userspace processes, from a complex userspace bootstrap logic to provide
      a working /dev.
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NJan Blunck <jblunck@suse.de>
      Tested-By: NHarald Hoyer <harald@redhat.com>
      Tested-By: NScott James Remnant <scott@ubuntu.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      2b2af54a
  15. 04 9月, 2009 1 次提交
  16. 02 9月, 2009 1 次提交
  17. 29 8月, 2009 1 次提交
  18. 27 8月, 2009 1 次提交
    • T
      init: Move sched_clock_init after late_time_init · fa84e9ee
      Thomas Gleixner 提交于
      Some architectures initialize clocks and timers in late_time_init and
      x86 wants to do the same to avoid FIXMAP hackery for calibrating the
      TSC. That would result in undefined sched_clock readout and wreckaged
      printk timestamps again. We probably have those already on archs which
      do all their time/clock setup in late_time_init.
      
      There is no harm to move that after late_time_init except that a few
      more boot timestamps are stale. The scheduler is not active at that
      point so no real wreckage is expected.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Cc: linux-arch@vger.kernel.org
      fa84e9ee
  19. 21 8月, 2009 1 次提交
    • I
      tracing: Fix too large stack usage in do_one_initcall() · 4a683bf9
      Ingo Molnar 提交于
      One of my testboxes triggered this nasty stack overflow crash
      during SCSI probing:
      
      [    5.874004] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
      [    5.875004] device: 'sda': device_add
      [    5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
      [    5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
      [    5.878004] *pde = 00000000
      [    5.878004] Thread overran stack, or stack corrupted
      [    5.878004] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [    5.878004] last sysfs file:
      [    5.878004]
      [    5.878004] Pid: 1, comm: swapper Not tainted (2.6.31-rc6-tip-01272-g9919e28-dirty #5685)
      [    5.878004] EIP: 0060:[<b1008321>] EFLAGS: 00010083 CPU: 0
      [    5.878004] EIP is at print_context_stack+0x81/0x110
      [    5.878004] EAX: cf8a3000 EBX: cf8a3fe4 ECX: 00000049 EDX: 00000000
      [    5.878004] ESI: b1cfce84 EDI: 00000000 EBP: cf8a3018 ESP: cf8a2ff4
      [    5.878004]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      [    5.878004] Process swapper (pid: 1, ti=cf8a2000 task=cf8a8000 task.ti=cf8a3000)
      [    5.878004] Stack:
      [    5.878004]  b1004867 fffff000 cf8a3ffc
      [    5.878004] Call Trace:
      [    5.878004]  [<b1004867>] ? kernel_thread_helper+0x7/0x10
      [    5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
      [    5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
      [    5.878004] *pde = 00000000
      [    5.878004] Thread overran stack, or stack corrupted
      [    5.878004] Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
      
      The oops did not reveal any more details about the real stack
      that we have and the system got into an infinite loop of
      recursive pagefaults.
      
      So i booted with CONFIG_STACK_TRACER=y and the 'stacktrace' boot
      parameter. The box did not crash (timings/conditions probably
      changed a tiny bit to trigger the catastrophic crash), but the
      /debug/tracing/stack_trace file was rather revealing:
      
              Depth    Size   Location    (72 entries)
              -----    ----   --------
        0)     3704      52   __change_page_attr+0xb8/0x290
        1)     3652      24   __change_page_attr_set_clr+0x43/0x90
        2)     3628      60   kernel_map_pages+0x108/0x120
        3)     3568      40   prep_new_page+0x7d/0x130
        4)     3528      84   get_page_from_freelist+0x106/0x420
        5)     3444     116   __alloc_pages_nodemask+0xd7/0x550
        6)     3328      36   allocate_slab+0xb1/0x100
        7)     3292      36   new_slab+0x1c/0x160
        8)     3256      36   __slab_alloc+0x133/0x2b0
        9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
       10)     3216     108   create_object+0x28/0x250
       11)     3108      40   kmemleak_alloc+0x81/0xc0
       12)     3068      24   kmem_cache_alloc+0x162/0x1d0
       13)     3044      52   scsi_pool_alloc_command+0x29/0x70
       14)     2992      20   scsi_host_alloc_command+0x22/0x70
       15)     2972      24   __scsi_get_command+0x1b/0x90
       16)     2948      28   scsi_get_command+0x35/0x90
       17)     2920      24   scsi_setup_blk_pc_cmnd+0xd4/0x100
       18)     2896     128   sd_prep_fn+0x332/0xa70
       19)     2768      36   blk_peek_request+0xe7/0x1d0
       20)     2732      56   scsi_request_fn+0x54/0x520
       21)     2676      12   __generic_unplug_device+0x2b/0x40
       22)     2664      24   blk_execute_rq_nowait+0x59/0x80
       23)     2640     172   blk_execute_rq+0x6b/0xb0
       24)     2468      32   scsi_execute+0xe0/0x140
       25)     2436      64   scsi_execute_req+0x152/0x160
       26)     2372      60   scsi_vpd_inquiry+0x6c/0x90
       27)     2312      44   scsi_get_vpd_page+0x112/0x160
       28)     2268      52   sd_revalidate_disk+0x1df/0x320
       29)     2216      92   rescan_partitions+0x98/0x330
       30)     2124      52   __blkdev_get+0x309/0x350
       31)     2072       8   blkdev_get+0xf/0x20
       32)     2064      44   register_disk+0xff/0x120
       33)     2020      36   add_disk+0x6e/0xb0
       34)     1984      44   sd_probe_async+0xfb/0x1d0
       35)     1940      44   __async_schedule+0xf4/0x1b0
       36)     1896       8   async_schedule+0x12/0x20
       37)     1888      60   sd_probe+0x305/0x360
       38)     1828      44   really_probe+0x63/0x170
       39)     1784      36   driver_probe_device+0x5d/0x60
       40)     1748      16   __device_attach+0x49/0x50
       41)     1732      32   bus_for_each_drv+0x5b/0x80
       42)     1700      24   device_attach+0x6b/0x70
       43)     1676      16   bus_attach_device+0x47/0x60
       44)     1660      76   device_add+0x33d/0x400
       45)     1584      52   scsi_sysfs_add_sdev+0x6a/0x2c0
       46)     1532     108   scsi_add_lun+0x44b/0x460
       47)     1424     116   scsi_probe_and_add_lun+0x182/0x4e0
       48)     1308      36   __scsi_add_device+0xd9/0xe0
       49)     1272      44   ata_scsi_scan_host+0x10b/0x190
       50)     1228      24   async_port_probe+0x96/0xd0
       51)     1204      44   __async_schedule+0xf4/0x1b0
       52)     1160       8   async_schedule+0x12/0x20
       53)     1152      48   ata_host_register+0x171/0x1d0
       54)     1104      60   ata_pci_sff_activate_host+0xf3/0x230
       55)     1044      44   ata_pci_sff_init_one+0xea/0x100
       56)     1000      48   amd_init_one+0xb2/0x190
       57)      952       8   local_pci_probe+0x13/0x20
       58)      944      32   pci_device_probe+0x68/0x90
       59)      912      44   really_probe+0x63/0x170
       60)      868      36   driver_probe_device+0x5d/0x60
       61)      832      20   __driver_attach+0x89/0xa0
       62)      812      32   bus_for_each_dev+0x5b/0x80
       63)      780      12   driver_attach+0x1e/0x20
       64)      768      72   bus_add_driver+0x14b/0x2d0
       65)      696      36   driver_register+0x6e/0x150
       66)      660      20   __pci_register_driver+0x53/0xc0
       67)      640       8   amd_init+0x14/0x16
       68)      632     572   do_one_initcall+0x2b/0x1d0
       69)       60      12   do_basic_setup+0x56/0x6a
       70)       48      20   kernel_init+0x84/0xce
       71)       28      28   kernel_thread_helper+0x7/0x10
      
      There's a lot of fat functions on that stack trace, but
      the largest of all is do_one_initcall(). This is due to
      the boot trace entry variables being on the stack.
      
      Fixing this is relatively easy, initcalls are fundamentally
      serialized, so we can move the local variables to file scope.
      
      Note that this large stack footprint was present for a
      couple of months already - what pushed my system over
      the edge was the addition of kmemleak to the call-chain:
      
        6)     3328      36   allocate_slab+0xb1/0x100
        7)     3292      36   new_slab+0x1c/0x160
        8)     3256      36   __slab_alloc+0x133/0x2b0
        9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
       10)     3216     108   create_object+0x28/0x250
       11)     3108      40   kmemleak_alloc+0x81/0xc0
       12)     3068      24   kmem_cache_alloc+0x162/0x1d0
       13)     3044      52   scsi_pool_alloc_command+0x29/0x70
      
      This pushes the total to ~3800 bytes, only a tiny bit
      more was needed to corrupt the on-kernel-stack thread_info.
      
      The fix reduces the stack footprint from 572 bytes
      to 28 bytes.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4a683bf9
  20. 14 8月, 2009 1 次提交
  21. 22 7月, 2009 1 次提交
    • J
      x86, intel_txt: Intel TXT reboot/halt shutdown support · 840c2baf
      Joseph Cihula 提交于
      Support for graceful handling of kernel reboots after an Intel(R) TXT launch.
      
      Without this patch, attempting to reboot or halt the system will cause the
      TXT hardware to lock memory upon system restart because the secrets-in-memory
      flag that was set on launch was never cleared.  This will in turn cause BIOS
      to execute a TXT Authenticated Code Module (ACM) that will scrub all of memory
      and then unlock it.  Depending on the amount of memory in the system and its type,
      this may take some time.
      
      This patch creates a 1:1 address mapping to the tboot module and then calls back
      into tboot so that it may properly and securely clean up system state and clear
      the secrets-in-memory flag.  When it has completed these steps, the tboot module
      will reboot or halt the system.
      
       arch/x86/kernel/reboot.c |    8 ++++++++
       init/main.c              |    3 +++
       2 files changed, 11 insertions(+)
      Signed-off-by: NJoseph Cihula <joseph.cihula@intel.com>
      Signed-off-by: NShane Wang <shane.wang@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      840c2baf
  22. 24 6月, 2009 1 次提交
    • T
      percpu: use dynamic percpu allocator as the default percpu allocator · e74e3962
      Tejun Heo 提交于
      This patch makes most !CONFIG_HAVE_SETUP_PER_CPU_AREA archs use
      dynamic percpu allocator.  The first chunk is allocated using
      embedding helper and 8k is reserved for modules.  This ensures that
      the new allocator behaves almost identically to the original allocator
      as long as static percpu variables are concerned, so it shouldn't
      introduce much breakage.
      
      s390 and alpha use custom SHIFT_PERCPU_PTR() to work around addressing
      range limit the addressing model imposes.  Unfortunately, this breaks
      if the address is specified using a variable, so for now, the two
      archs aren't converted.
      
      The following architectures are affected by this change.
      
      * sh
      * arm
      * cris
      * mips
      * sparc(32)
      * blackfin
      * avr32
      * parisc (broken, under investigation)
      * m32r
      * powerpc(32)
      
      As this change makes the dynamic allocator the default one,
      CONFIG_HAVE_DYNAMIC_PER_CPU_AREA is replaced with its invert -
      CONFIG_HAVE_LEGACY_PER_CPU_AREA, which is added to yet-to-be converted
      archs.  These archs implement their own setup_per_cpu_areas() and the
      conversion is not trivial.
      
      * powerpc(64)
      * sparc(64)
      * ia64
      * alpha
      * s390
      
      Boot and batch alloc/free tests on x86_32 with debug code (x86_32
      doesn't use default first chunk initialization).  Compile tested on
      sparc(32), powerpc(32), arm and alpha.
      
      Kyle McMartin reported that this change breaks parisc.  The problem is
      still under investigation and he is okay with pushing this patch
      forward and fixing parisc later.
      
      [ Impact: use dynamic allocator for most archs w/o custom percpu setup ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Reviewed-by: NChristoph Lameter <cl@linux.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Bryan Wu <cooloney@kernel.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      e74e3962
  23. 23 6月, 2009 1 次提交
  24. 19 6月, 2009 2 次提交
  25. 17 6月, 2009 2 次提交
    • B
      mm: Move pgtable_cache_init() earlier · c868d550
      Benjamin Herrenschmidt 提交于
      Some architectures need to initialize SLAB caches to be able
      to allocate page tables. They do that from pgtable_cache_init()
      so the later should be called earlier now, best is before
      vmalloc_init().
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c868d550
    • M
      cpuset,mm: update tasks' mems_allowed in time · 58568d2a
      Miao Xie 提交于
      Fix allocating page cache/slab object on the unallowed node when memory
      spread is set by updating tasks' mems_allowed after its cpuset's mems is
      changed.
      
      In order to update tasks' mems_allowed in time, we must modify the code of
      memory policy.  Because the memory policy is applied in the process's
      context originally.  After applying this patch, one task directly
      manipulates anothers mems_allowed, and we use alloc_lock in the
      task_struct to protect mems_allowed and memory policy of the task.
      
      But in the fast path, we didn't use lock to protect them, because adding a
      lock may lead to performance regression.  But if we don't add a lock,the
      task might see no nodes when changing cpuset's mems_allowed to some
      non-overlapping set.  In order to avoid it, we set all new allowed nodes,
      then clear newly disallowed ones.
      
      [lee.schermerhorn@hp.com:
        The rework of mpol_new() to extract the adjusting of the node mask to
        apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
        with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
        allocation.  Fix this by adding the check for MPOL_PREFERRED and empty
        node mask to mpol_new_mpolicy().
      
        Remove the now unneeded 'nodes = NULL' from mpol_new().
      
        Note that mpol_new_mempolicy() is always called with a non-NULL
        'nodes' parameter now that it has been removed from mpol_new().
        Therefore, we don't need to test nodes for NULL before testing it for
        'empty'.  However, just to be extra paranoid, add a VM_BUG_ON() to
        verify this assumption.]
      [lee.schermerhorn@hp.com:
      
        I don't think the function name 'mpol_new_mempolicy' is descriptive
        enough to differentiate it from mpol_new().
      
        This function applies cpuset set context, usually constraining nodes
        to those allowed by the cpuset.  However, when the 'RELATIVE_NODES flag
        is set, it also translates the nodes.  So I settled on
        'mpol_set_nodemask()', because the comment block for mpol_new() mentions
        that we need to call this function to "set nodes".
      
        Some additional minor line length, whitespace and typo cleanup.]
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      58568d2a
  26. 13 6月, 2009 2 次提交
  27. 12 6月, 2009 6 次提交
    • P
      slab,slub: don't enable interrupts during early boot · 7e85ee0c
      Pekka Enberg 提交于
      As explained by Benjamin Herrenschmidt:
      
        Oh and btw, your patch alone doesn't fix powerpc, because it's missing
        a whole bunch of GFP_KERNEL's in the arch code... You would have to
        grep the entire kernel for things that check slab_is_available() and
        even then you'll be missing some.
      
        For example, slab_is_available() didn't always exist, and so in the
        early days on powerpc, we used a mem_init_done global that is set form
        mem_init() (not perfect but works in practice). And we still have code
        using that to do the test.
      
      Therefore, mask out __GFP_WAIT, __GFP_IO, and __GFP_FS in the slab allocators
      in early boot code to avoid enabling interrupts.
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      7e85ee0c
    • K
      memcg: fix page_cgroup fatal error in FLATMEM · ca371c0d
      KAMEZAWA Hiroyuki 提交于
      Now, SLAB is configured in very early stage and it can be used in
      init routine now.
      
      But replacing alloc_bootmem() in FLAT/DISCONTIGMEM's page_cgroup()
      initialization breaks the allocation, now.
      (Works well in SPARSEMEM case...it supports MEMORY_HOTPLUG and
       size of page_cgroup is in reasonable size (< 1 << MAX_ORDER.)
      
      This patch revive FLATMEM+memory cgroup by using alloc_bootmem.
      
      In future,
      We stop to support FLATMEM (if no users) or rewrite codes for flatmem
      completely.But this will adds more messy codes and overheads.
      Reported-by: NLi Zefan <lizf@cn.fujitsu.com>
      Tested-by: NLi Zefan <lizf@cn.fujitsu.com>
      Tested-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      ca371c0d
    • P
      init: introduce mm_init() · 444f478f
      Pekka Enberg 提交于
      As suggested by Christoph Lameter, introduce mm_init() now that we initialize
      all the kernel memory allocations together.
      
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      444f478f
    • P
      vmalloc: use kzalloc() instead of alloc_bootmem() · 43ebdac4
      Pekka Enberg 提交于
      We can call vmalloc_init() after kmem_cache_init() and use kzalloc() instead of
      the bootmem allocator when initializing vmalloc data structures.
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      43ebdac4
    • P
      slab: setup allocators earlier in the boot sequence · 83b519e8
      Pekka Enberg 提交于
      This patch makes kmalloc() available earlier in the boot sequence so we can get
      rid of some bootmem allocations. The bulk of the changes are due to
      kmem_cache_init() being called with interrupts disabled which requires some
      changes to allocator boostrap code.
      
      Note: 32-bit x86 does WP protect test in mem_init() so we must setup traps
      before we call mem_init() during boot as reported by Ingo Molnar:
      
        We have a hard crash in the WP-protect code:
      
        [    0.000000] Checking if this processor honours the WP bit even in supervisor mode...BUG: Int 14: CR2 ffcff000
        [    0.000000]      EDI 00000188  ESI 00000ac7  EBP c17eaf9c  ESP c17eaf8c
        [    0.000000]      EBX 000014e0  EDX 0000000e  ECX 01856067  EAX 00000001
        [    0.000000]      err 00000003  EIP c10135b1   CS 00000060  flg 00010002
        [    0.000000] Stack: c17eafa8 c17fd410 c16747bc c17eafc4 c17fd7e5 000011fd f8616000 c18237cc
        [    0.000000]        00099800 c17bb000 c17eafec c17f1668 000001c5 c17f1322 c166e039 c1822bf0
        [    0.000000]        c166e033 c153a014 c18237cc 00020800 c17eaff8 c17f106a 00020800 01ba5003
        [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip-02161-g7a74539-dirty #52203
        [    0.000000] Call Trace:
        [    0.000000]  [<c15357c2>] ? printk+0x14/0x16
        [    0.000000]  [<c10135b1>] ? do_test_wp_bit+0x19/0x23
        [    0.000000]  [<c17fd410>] ? test_wp_bit+0x26/0x64
        [    0.000000]  [<c17fd7e5>] ? mem_init+0x1ba/0x1d8
        [    0.000000]  [<c17f1668>] ? start_kernel+0x164/0x2f7
        [    0.000000]  [<c17f1322>] ? unknown_bootoption+0x0/0x19c
        [    0.000000]  [<c17f106a>] ? __init_begin+0x6a/0x6f
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      83b519e8
    • C
      kmemleak: Add the base support · 3c7b4e6b
      Catalin Marinas 提交于
      This patch adds the base support for the kernel memory leak
      detector. It traces the memory allocation/freeing in a way similar to
      the Boehm's conservative garbage collector, the difference being that
      the unreferenced objects are not freed but only shown in
      /sys/kernel/debug/kmemleak. Enabling this feature introduces an
      overhead to memory allocations.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      3c7b4e6b