1. 17 June 2009, 4 commits
• page allocator: do not check NUMA node ID when the caller knows the node is valid · 6484eb3e
Committed by Mel Gorman
      Callers of alloc_pages_node() can optionally specify -1 as a node to mean
      "allocate from the current node".  However, a number of the callers in
      fast paths know for a fact their node is valid.  To avoid a comparison and
      branch, this patch adds alloc_pages_exact_node() that only checks the nid
      with VM_BUG_ON().  Callers that know their node is valid are then
      converted.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Paul Mundt <lethal@linux-sh.org>	[for the SLOB NUMA bits]
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6484eb3e
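A minimal sketch of the shape of this change, simplified from the 2.6.31-era page allocator headers (treat it as illustrative rather than the verbatim patch):

    /* Existing helper: tolerates nid == -1, paying a test-and-branch
     * on every call. */
    static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
                                                unsigned int order)
    {
            if (nid < 0)
                    nid = numa_node_id();   /* -1 means "current node" */

            return __alloc_pages(gfp_mask, order,
                                 node_zonelist(nid, gfp_mask));
    }

    /* New helper: the caller vouches that nid is a real node, so the
     * only check left is a VM_BUG_ON() that compiles away unless
     * CONFIG_DEBUG_VM is set. */
    static inline struct page *alloc_pages_exact_node(int nid,
                                                      gfp_t gfp_mask,
                                                      unsigned int order)
    {
            VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES);

            return __alloc_pages(gfp_mask, order,
                                 node_zonelist(nid, gfp_mask));
    }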
• cpuset,mm: update tasks' mems_allowed in time · 58568d2a
Committed by Miao Xie
Fix allocation of page cache/slab objects on a disallowed node when
memory spread is set, by updating tasks' mems_allowed as soon as their
cpuset's mems is changed.

In order to update tasks' mems_allowed in time, we must modify the
memory policy code, because the memory policy was originally applied
in the process's own context.  After this patch, one task directly
manipulates another's mems_allowed, and we use alloc_lock in the
task_struct to protect the task's mems_allowed and memory policy.

In the fast path, however, we do not take the lock, because doing so
could cause a performance regression.  Without the lock, a task might
transiently see an empty nodemask while its cpuset's mems_allowed is
changed to a non-overlapping set.  To avoid this, we first set all the
newly allowed nodes, then clear the newly disallowed ones.
      
      [lee.schermerhorn@hp.com:
        The rework of mpol_new() to extract the adjusting of the node mask to
        apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
        with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
        allocation.  Fix this by adding the check for MPOL_PREFERRED and empty
node mask to mpol_new_mempolicy().
      
        Remove the now unneeded 'nodes = NULL' from mpol_new().
      
        Note that mpol_new_mempolicy() is always called with a non-NULL
        'nodes' parameter now that it has been removed from mpol_new().
        Therefore, we don't need to test nodes for NULL before testing it for
        'empty'.  However, just to be extra paranoid, add a VM_BUG_ON() to
        verify this assumption.]
      [lee.schermerhorn@hp.com:
      
        I don't think the function name 'mpol_new_mempolicy' is descriptive
        enough to differentiate it from mpol_new().
      
        This function applies cpuset set context, usually constraining nodes
to those allowed by the cpuset.  However, when the 'RELATIVE_NODES' flag
        is set, it also translates the nodes.  So I settled on
        'mpol_set_nodemask()', because the comment block for mpol_new() mentions
        that we need to call this function to "set nodes".
      
        Some additional minor line length, whitespace and typo cleanup.]
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      58568d2a
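The lockless ordering trick can be sketched like this (an illustrative reconstruction; the helper and field names follow the cpuset/mempolicy code of the period, but the body below is a simplification, not the literal patch):

    /*
     * Retarget tsk->mems_allowed from old to *newmems so that a
     * concurrent lockless reader never observes an empty mask:
     *
     *     old = {0,1}, new = {2,3}
     *     step 1: mems_allowed = old | new = {0,1,2,3}   (grow first)
     *     step 2: mems_allowed = new       = {2,3}       (then shrink)
     */
    static void cpuset_change_task_nodemask(struct task_struct *tsk,
                                            nodemask_t *newmems)
    {
            nodes_or(tsk->mems_allowed, tsk->mems_allowed, *newmems);
            mpol_rebind_task(tsk, newmems);
            tsk->mems_allowed = *newmems;
    }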
• cpusets: update tasks' page/slab spread flags in time · 950592f7
Committed by Miao Xie
Fix a bug where the kernel did not spread page cache/slab objects
evenly over all the allowed nodes when the spread flags were set; do
this by updating tasks' page/slab spread flags as soon as their
cpuset's flags change.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      950592f7
• cpusets: restructure the function cpuset_update_task_memory_state() · f3b39d47
Committed by Miao Xie
When 'memory_spread_page' is set, the kernel still allocates page
cache on the old nodes after the cpuset's mems is modified; in other
words, it does not spread the page cache evenly over all the nodes the
faulting task is allowed to use once memory_spread_page is set.  This
is caused by the task's stale mems_allowed and flags: the current
kernel does not update them until some function invokes
cpuset_update_task_memory_state(), which is sometimes too late.  We
must update the tasks' mems_allowed and flags in time.
      
      Slab has the same problem.
      
The following patches fix this bug by updating a task's mems_allowed
and spread flags after its cpuset's mems or spread flags are changed.
      
      This patch:
      
Extract a function from cpuset_update_task_memory_state().  It will be
used later to update tasks' page/slab spread flags after their
cpuset's flags are changed.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Paul Menage <menage@google.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f3b39d47
2. 13 June 2009, 10 commits
3. 12 June 2009, 18 commits
• sched: export kick_process · b43e3521
Committed by Rusty Russell
      lguest needs kick_process: wake_up_process() does nothing if a process
      is running, which isn't sufficient (we need it in the kernel).
      
      And lguest support is usually modular.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      b43e3521
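The change itself is essentially a one-line export of the existing scheduler helper; a sketch of what it looks like in kernel/sched.c (the body follows the scheduler code of the time; the export flavour shown is an assumption):

    /* kick_process() sends a reschedule IPI to the CPU a task is
     * currently running on, forcing it through the scheduler. */
    void kick_process(struct task_struct *p)
    {
            int cpu;

            preempt_disable();
            cpu = task_cpu(p);
            if ((cpu != smp_processor_id()) && task_curr(p))
                    smp_send_reschedule(cpu);
            preempt_enable();
    }
    EXPORT_SYMBOL_GPL(kick_process);  /* so modular lguest can call it */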
• perf_counter: Add forward/backward attribute ABI compatibility · 974802ea
Committed by Peter Zijlstra
      Provide for means of extending the perf_counter_attr in a 'natural' way.
      
      We allow growing the structure by appending fields at the end by specifying
      the full structure size inside it.
      
When a new kernel sees a smaller (old) structure, it zero-pads the
tail.  When an old kernel sees a larger (new) structure, it verifies
that the tail consists of zeroes, and otherwise fails.
      
If we fail due to a size mismatch, we return -E2BIG and write the
kernel's native attribute size back into the provided structure.
      
      Furthermore, add some attribute verification, so that we'll fail counter
      creation when unknown bits are present (PERF_SAMPLE, PERF_FORMAT, or in
      the __reserved fields).
      
      (This ABI detail is introduced while keeping the existing syscall ABI.)
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
      974802ea
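The negotiation can be sketched as follows (a simplified, hypothetical rendering of the attribute-copy path; copy_attr_from_user() is an illustrative name, not the kernel's):

    static int copy_attr_from_user(struct perf_counter_attr __user *uattr,
                                   struct perf_counter_attr *attr)
    {
            u32 size;

            if (get_user(size, &uattr->size))
                    return -EFAULT;

            if (size > sizeof(*attr)) {
                    /* Newer userspace, older kernel: the tail we do not
                     * understand must be all zeroes, else reject. */
                    unsigned char __user *addr, *end;
                    unsigned char val;

                    addr = (unsigned char __user *)uattr + sizeof(*attr);
                    end  = (unsigned char __user *)uattr + size;

                    for (; addr < end; addr++) {
                            if (get_user(val, addr))
                                    return -EFAULT;
                            if (val) {
                                    /* Report our native size back. */
                                    put_user(sizeof(*attr), &uattr->size);
                                    return -E2BIG;
                            }
                    }
                    size = sizeof(*attr);
            }

            if (copy_from_user(attr, uattr, size))
                    return -EFAULT;

            /* Older userspace, newer kernel: zero the new fields it
             * did not supply. */
            memset((char *)attr + size, 0, sizeof(*attr) - size);
            return 0;
    }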
• perf_counter: Remove PERF_TYPE_RAW special casing · 081fad86
Committed by Peter Zijlstra
      The PERF_TYPE_RAW special case seems superfluous these days. Remove
      it and add it to the switch() stmt like the others.
      
      [ Impact: cleanup ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
      081fad86
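After the change, counter initialisation dispatches uniformly on the counter type; roughly (a sketch, with case arms matching the perf_counter types of this era):

    switch (counter->attr.type) {
    case PERF_TYPE_RAW:
    case PERF_TYPE_HARDWARE:
    case PERF_TYPE_HW_CACHE:
            pmu = hw_perf_counter_init(counter);
            break;

    case PERF_TYPE_SOFTWARE:
            pmu = sw_perf_counter_init(counter);
            break;

    case PERF_TYPE_TRACEPOINT:
            pmu = tp_perf_counter_init(counter);
            break;

    default:
            break;
    }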
• module: trim exception table on init free. · ad6561df
Committed by Rusty Russell
      It's theoretically possible that there are exception table entries
      which point into the (freed) init text of modules.  These could cause
      future problems if other modules get loaded into that memory and cause
      an exception as we'd see the wrong fixup.  The only case I know of is
      kvm-intel.ko (when CONFIG_CC_OPTIMIZE_FOR_SIZE=n).
      
      Amerigo fixed this long-standing FIXME in the x86 version, but this
      patch is more general.
      
      This implements trim_init_extable(); most archs are simple since they
      use the standard lib/extable.c sort code.  Alpha and IA64 use relative
addresses in their fixups, so their trimming is a slight variation.
      
      Sparc32 is unique; it doesn't seem to define ARCH_HAS_SORT_EXTABLE,
      yet it defines its own sort_extable() which overrides the one in lib.
      It doesn't sort, so we have to mark deleted entries instead of
      actually trimming them.
Inspired-by: Amerigo Wang <amwang@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: linux-alpha@vger.kernel.org
      Cc: sparclinux@vger.kernel.org
      Cc: linux-ia64@vger.kernel.org
      ad6561df
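For archs on the standard sort, the generic helper amounts to shaving the (sorted, hence contiguous) init-text entries off both ends of the table; a sketch close to the generic lib/extable.c version:

    /* The table is sorted by address, so entries pointing into the
     * module's init text form contiguous runs at the ends. */
    void trim_init_extable(struct module *m)
    {
            /* Trim the beginning... */
            while (m->num_exentries &&
                   within_module_init(m->extable[0].insn, m)) {
                    m->extable++;
                    m->num_exentries--;
            }
            /* ...and the end. */
            while (m->num_exentries &&
                   within_module_init(m->extable[m->num_exentries - 1].insn,
                                      m))
                    m->num_exentries--;
    }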
• module_param: allow 'bool' module_params to be bool, not just int. · fddd5201
Committed by Rusty Russell
      Impact: API cleanup
      
      For historical reasons, 'bool' parameters must be an int, not a bool.
      But there are around 600 users, so a conversion seems like useless churn.
      
      So we use __same_type() to distinguish, and handle both cases.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      fddd5201
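The compile-time discrimination rests on GCC's __builtin_types_compatible_p(); a hedged sketch of the mechanism (KPARAM_ISBOOL follows this patch series; how the flag gets recorded at module_param() registration is elided):

    /* True iff a and b have compatible C types, decided at compile time. */
    #define __same_type(a, b) \
            __builtin_types_compatible_p(typeof(a), typeof(b))

    /* Registration can then set KPARAM_ISBOOL when __same_type(var, bool),
     * and the setter stores through the right pointer type: */
    int param_set_bool(const char *val, struct kernel_param *kp)
    {
            bool v;

            if (!val)
                    val = "1";              /* bare flag means "enable" */
            switch (val[0]) {
            case 'y': case 'Y': case '1': v = true;  break;
            case 'n': case 'N': case '0': v = false; break;
            default:
                    return -EINVAL;
            }

            if (kp->flags & KPARAM_ISBOOL)
                    *(bool *)kp->arg = v;
            else
                    *(int *)kp->arg = v;    /* legacy int-backed 'bool' */
            return 0;
    }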
• module_param: split perm field into flags and perm · 45fcc70c
Committed by Rusty Russell
      Impact: cleanup
      
      Rather than hack KPARAM_KMALLOCED into the perm field, separate it out.
      Since the perm field was 32 bits and only needs 16, we don't add bloat.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      45fcc70c
• module_param: invbool should take a 'bool', not an 'int' · 9a71af2c
Committed by Rusty Russell
      It takes an 'int' for historical reasons, and there are only two
      users: simply switch it over to bool.
      
      The other user (uvesafb.c) will get a (harmless-on-x86) warning until
      the next patch is applied.
      
      Cc: Brad Douglas <brad@neruo.com>
      Cc: Michal Januszewski <spock@gentoo.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      9a71af2c
• irq: slab alloc for default irq_affinity · 28be225b
Committed by Yinghai Lu
Ingo hit the following boot warning:
      
      [    0.000000] ------------[ cut here ]------------
      [    0.000000] WARNING: at mm/bootmem.c:537 alloc_arch_preferred_bootmem+0x2b/0x71()
      [    0.000000] Hardware name: System Product Name
      [    0.000000] Modules linked in:
      [    0.000000] Pid: 0, comm: swapper Tainted: G        W  2.6.30-tip-03087-g0bb2618-dirty #52506
      [    0.000000] Call Trace:
      [    0.000000]  [<81032588>] warn_slowpath_common+0x60/0x90
      [    0.000000]  [<810325c5>] warn_slowpath_null+0xd/0x10
      [    0.000000]  [<819d1bc0>] alloc_arch_preferred_bootmem+0x2b/0x71
      [    0.000000]  [<819d1c31>] ___alloc_bootmem_nopanic+0x2b/0x9a
      [    0.000000]  [<81050a0a>] ? lock_release+0xac/0xb2
      [    0.000000]  [<819d1d4c>] ___alloc_bootmem+0xe/0x2d
      [    0.000000]  [<819d1e9f>] __alloc_bootmem+0xa/0xc
      [    0.000000]  [<819d7c63>] alloc_bootmem_cpumask_var+0x21/0x26
      [    0.000000]  [<819d0cc8>] early_irq_init+0x15/0x10d
      [    0.000000]  [<819bb75a>] start_kernel+0x167/0x326
      [    0.000000]  [<819bb06b>] __init_begin+0x6b/0x70
      [    0.000000] ---[ end trace 4eaa2a86a8e2da23 ]---
      [    0.000000] NR_IRQS:2304 nr_irqs:424
      [    0.000000] CPU 0 irqstacks, hard=821e6000 soft=821e7000
      
We need to update init_irq_default_affinity() to allocate from slab
rather than bootmem.
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      28be225b
• Push BKL down into ->remount_fs() · 337eb00a
Committed by Alessio Igor Bogani
      [xfs, btrfs, capifs, shmem don't need BKL, exempt]
Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      337eb00a
• Switch collect_mounts() to struct path · 589ff870
Committed by Al Viro
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      589ff870
• slow_work_thread() should do the exclusive wait · b415c49a
Committed by Oleg Nesterov
      slow_work_thread() sleeps on slow_work_thread_wq without WQ_FLAG_EXCLUSIVE,
      this means that slow_work_enqueue()->__wake_up(nr_exclusive => 1) wakes up all
      kslowd threads.  This is not what we want, so we change slow_work_thread() to
      use prepare_to_wait_exclusive() instead.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b415c49a
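The distinction: __wake_up(q, mode, 1, key) stops after waking one task only if that task queued itself with WQ_FLAG_EXCLUSIVE, which prepare_to_wait_exclusive() sets. A sketch of the fixed wait loop (slow_work_available() and slow_work_execute() are stand-in names for the queue check and work processing, not necessarily the real identifiers):

    for (;;) {
            DEFINE_WAIT(wait);

            /* Queue exclusively so a nr_exclusive == 1 wakeup rouses
             * exactly one kslowd thread instead of the whole herd. */
            prepare_to_wait_exclusive(&slow_work_thread_wq, &wait,
                                      TASK_INTERRUPTIBLE);
            if (!slow_work_available() && !kthread_should_stop())
                    schedule();
            finish_wait(&slow_work_thread_wq, &wait);

            if (kthread_should_stop())
                    break;

            slow_work_execute();
    }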
• irq: use kcalloc() instead of the bootmem allocator · 22fb4e71
Committed by Pekka Enberg
      Fixes the following problem:
      
      [    0.000000] Experimental hierarchical RCU init done.
      [    0.000000] NR_IRQS:4352 nr_irqs:256
      [    0.000000] ------------[ cut here ]------------
      [    0.000000] WARNING: at mm/bootmem.c:537 alloc_arch_preferred_bootmem+0x40/0x7e()
      [    0.000000] Hardware name: To Be Filled By O.E.M.
      [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.30-tip-02161-g7a74539-dirty #59709
      [    0.000000] Call Trace:
      [    0.000000]  [<ffffffff823f8c8e>] ? alloc_arch_preferred_bootmem+0x40/0x7e
      [    0.000000]  [<ffffffff81067168>] warn_slowpath_common+0x88/0xcb
      [    0.000000]  [<ffffffff810671d2>] warn_slowpath_null+0x27/0x3d
      [    0.000000]  [<ffffffff823f8c8e>] alloc_arch_preferred_bootmem+0x40/0x7e
      [    0.000000]  [<ffffffff823f9307>] ___alloc_bootmem_nopanic+0x4e/0xec
      [    0.000000]  [<ffffffff823f93c5>] ___alloc_bootmem+0x20/0x61
      [    0.000000]  [<ffffffff823f962e>] __alloc_bootmem+0x1e/0x34
      [    0.000000]  [<ffffffff823f757c>] early_irq_init+0x6d/0x118
      [    0.000000]  [<ffffffff823e0140>] ? early_idt_handler+0x0/0x71
      [    0.000000]  [<ffffffff823e0cf7>] start_kernel+0x192/0x394
      [    0.000000]  [<ffffffff823e0140>] ? early_idt_handler+0x0/0x71
      [    0.000000]  [<ffffffff823e02ad>] x86_64_start_reservations+0xb4/0xcf
      [    0.000000]  [<ffffffff823e0000>] ? __init_begin+0x0/0x140
      [    0.000000]  [<ffffffff823e0420>] x86_64_start_kernel+0x158/0x17b
      [    0.000000] ---[ end trace a7919e7f17c0a725 ]---
      [    0.000000] Fast TSC calibration using PIT
      [    0.000000] Detected 2002.510 MHz processor.
      [    0.004000] Console: colour VGA+ 80x25
Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      22fb4e71
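The conversion pattern in this and the following commits is the same: once kmem_cache_init() has run, early boot code can use the regular slab interfaces. A hedged before/after sketch (the kstat_irqs field is used for illustration; GFP_NOWAIT keeps the allocation from sleeping this early in boot):

    /* Before: bootmem, the only allocator assumed usable this early. */
    desc->kstat_irqs = alloc_bootmem(nr * sizeof(*desc->kstat_irqs));

    /* After: slab is already up, so allocate normally. */
    desc->kstat_irqs = kcalloc(nr, sizeof(*desc->kstat_irqs), GFP_NOWAIT);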
• sched: use slab in cpupri_init() · 0fb53029
Committed by Pekka Enberg
Let's not use the bootmem allocator in cpupri_init(), as slab is
already up when it runs.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      0fb53029
• sched: use alloc_cpumask_var() instead of alloc_bootmem_cpumask_var() · 4bdddf8f
Committed by Pekka Enberg
Slab is already initialized by the time sched_init() runs, so let's
use alloc_cpumask_var().
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      4bdddf8f
• irq/cpumask: make memoryless node zero happy · dad213ae
Committed by Yinghai Lu
      Don't hardcode to node zero for early boot IRQ setup memory allocations.
      
      [ penberg@cs.helsinki.fi: minor cleanups ]
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      dad213ae
• x86: remove some alloc_bootmem_cpumask_var calling · 38c7fed2
Committed by Yinghai Lu
      Now that we set up the slab allocator earlier, we can get rid of some
      alloc_bootmem_cpumask_var() calls in boot code.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      38c7fed2
• sched: use kzalloc() instead of the bootmem allocator · 36b7b6d4
Committed by Pekka Enberg
      Now that kmem_cache_init() happens before sched_init(), we should use kzalloc()
      and not the bootmem allocator.
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      36b7b6d4
• kmemleak: Add modules support · 4f2294b6
Committed by Catalin Marinas
This patch handles the kmemleak operations needed during module
loading so that memory allocations made from inside a module are
properly tracked.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      4f2294b6
4. 11 June 2009, 8 commits