1. 10 Jan, 2006 2 commits
  2. 09 Jan, 2006 38 commits
    • [PATCH] Make vm86 support optional · 64ca9004
      Matt Mackall committed
      This adds an option to remove vm86 support under CONFIG_EMBEDDED.  Saves
      about 5k.
      
      This version eliminates most of the #ifdefs of the previous version and
      instead uses function stubs in vm86.h.  Also, release_vm86_irqs is moved
      from asm-i386/irq.h to a more appropriate home in vm86.h so that the stubs
      can live together.
      
      $ size vmlinux-baseline vmlinux-novm86
         text    data     bss     dec     hex filename
      2920821  523232  190652 3634705  377611 vmlinux-baseline
      2916268  523100  190492 3629860  376324 vmlinux-novm86
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
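The stub approach described in the vm86 commit above can be sketched roughly as follows. This is an illustration only, not the patch itself: `CONFIG_VM86_SKETCH`, `struct task_sketch`, `vm86_fault_sketch()` and `exit_thread_sketch()` are hypothetical names, and the signature given to `release_vm86_irqs()` here is a simplification.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a task structure. */
struct task_sketch {
    int vm86_irqs;   /* number of IRQs claimed for vm86 use */
};

/* Define CONFIG_VM86_SKETCH (hypothetical) to build the real code in. */
#ifdef CONFIG_VM86_SKETCH
/* Real versions; these would be defined in a separately compiled file. */
int vm86_fault_sketch(struct task_sketch *t, long error_code);
void release_vm86_irqs(struct task_sketch *t);
#else
/* Stubs: callers compile unchanged and the compiler can drop the calls. */
static inline int vm86_fault_sketch(struct task_sketch *t, long error_code)
{
    (void)t;
    (void)error_code;
    return 0;   /* nothing was handled */
}

static inline void release_vm86_irqs(struct task_sketch *t)
{
    (void)t;    /* nothing to release when vm86 is compiled out */
}
#endif

/* A caller that needs no #ifdef of its own. */
static int exit_thread_sketch(struct task_sketch *t)
{
    assert(t != NULL);
    release_vm86_irqs(t);
    return vm86_fault_sketch(t, 0);
}
```

Keeping both the declarations and the stubs in one header is what lets the call sites stay free of conditionals.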
    • [PATCH] tiny: Make *[ug]id16 support optional · e585e470
      Matt Mackall committed
      Configurable 16-bit UID and friends support
      
      This allows turning off the legacy 16 bit UID interfaces on embedded platforms.
      
         text    data     bss     dec     hex filename
      3330172  529036  190556 4049764  3dcb64 vmlinux-baseline
      3328268  529040  190556 4047864  3dc3f8 vmlinux
      
      From: Adrian Bunk <bunk@stusta.de>
      
          UID16 was accidentally disabled for !EMBEDDED.
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] simplify k_getrusage() · 0f59cc4a
      Oleg Nesterov committed
      Factor out common code for different RUSAGE_xxx cases.
      
      Don't take ->sighand->siglock in RUSAGE_SELF case, suggested by Ravikiran G
      Thirumalai <kiran@scalex86.org>.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fix workqueue oops during cpu offline · f756d5e2
      Nathan Lynch committed
      Use first_cpu(cpu_possible_map) for the single-thread workqueue case.  We
      used to hardcode 0, but that broke on systems where !cpu_possible(0) when
      workqueue_struct->cpu_workqueue_struct was changed from a static array to
      alloc_percpu.
      
      Commit id bce61dd4 ("Fix hardcoded cpu=0 in
      workqueue for per_cpu_ptr() calls") fixed that for Ben's funky sparc64
      system, but it regressed my Power5.  Offlining cpu 0 oopses upon the next
      call to queue_work for a single-thread workqueue, because now we try to
      manipulate per_cpu_ptr(wq->cpu_wq, 1), which is uninitialized.
      
      So we need to establish an unchanging "slot" for single-thread workqueues
      which will have a valid percpu allocation.  Since alloc_percpu keys off of
      cpu_possible_map, which must not change after initialization, make this
      slot == first_cpu(cpu_possible_map).
      Signed-off-by: Nathan Lynch <ntl@pobox.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
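A minimal sketch of the slot choice described in the workqueue commit above, assuming CPU maps are plain bitmasks. `first_cpu()` and `cpu_possible_map` are named in the message; `first_cpu_sketch()`, `singlethread_slot_sketch()` and `NR_CPUS_SKETCH` are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>

#define NR_CPUS_SKETCH 32   /* hypothetical upper bound on CPU numbers */

/* Lowest CPU number set in mask, or NR_CPUS_SKETCH if the mask is empty. */
static int first_cpu_sketch(unsigned long mask)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        if (mask & (1UL << cpu))
            return cpu;
    return NR_CPUS_SKETCH;
}

/*
 * Slot used by a single-thread workqueue.  It depends only on the possible
 * map, which is fixed after boot, so taking a CPU offline never moves it.
 */
static int singlethread_slot_sketch(unsigned long possible_map)
{
    int slot = first_cpu_sketch(possible_map);

    assert(slot < NR_CPUS_SKETCH);   /* at least one CPU must be possible */
    return slot;
}
```

With a possible map of 0x6 (CPU 0 not possible) the slot is 1, whereas a hardcoded 0 would name a CPU that has no per-CPU allocation.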
    • [PATCH] kernel/module.c: remove redundant spinlock in resolve_symbol() · eb46996f
      Ashutosh Naik committed
      Remove the redundant spinlock in the function resolve_symbol() as we are
      not altering the module list, and we already hold the semaphore.
      Signed-off-by: Ashutosh Naik <ashutosh.naik@gmail.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] modules: mark TAINT_FORCED_RMMOD correctly · fb169793
      Akinobu Mita committed
      Currently TAINT_FORCED_RMMOD is totally unused, because the kernel is
      marked TAINT_FORCED_MODULE instead when the user forces a module unload.
      This patch marks it correctly.
      Signed-off-by: Akinobu Mita <mita@miraclelinux.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] modules: prevent overriding of symbols · eea8b54d
      Ashutosh Naik committed
      Ensure that an exported symbol does not already exist in the kernel or in
      some other module's exported symbol table.  This is done by checking the
      symbol tables for the exported symbol at the time of loading the module.
      Currently this is done after the relocation of the symbol.
      Signed-off-by: Ashutosh Naik <ashutosh.naik@gmail.com>
      Signed-off-by: Anand Krishnan <anandhkrishnan@yahoo.co.in>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
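The load-time duplicate check described in the commit above might look roughly like this. It is a sketch under simplifying assumptions: symbol tables are flat arrays of names scanned linearly, and `symbol_known_sketch()` and `verify_exports_sketch()` are hypothetical helpers, not the functions in kernel/module.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if name is present in table, 0 otherwise. */
static int symbol_known_sketch(const char *const *table, size_t n,
                               const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i], name) == 0)
            return 1;
    return 0;
}

/*
 * Returns 0 if none of the module's exports clash with an existing symbol,
 * or -1 on the first duplicate (the load would then be refused).
 */
static int verify_exports_sketch(const char *const *existing, size_t n_existing,
                                 const char *const *exports, size_t n_exports)
{
    assert(existing != NULL && exports != NULL);
    for (size_t i = 0; i < n_exports; i++)
        if (symbol_known_sketch(existing, n_existing, exports[i]))
            return -1;
    return 0;
}
```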
    • [PATCH] copy_process: error path cleanup · fe7d37d1
      Oleg Nesterov committed
      This patch moves 'fork_out:' under 'bad_fork_free:', and removes now
      unneeded 'if (retval)' check.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] setpgid: should not accept ptraced childs · f7dd795e
      Oleg Nesterov committed
      sys_setpgid() allows changing the ->pgrp of ptraced children.

      'man setpgid' says nothing about that, so I consider this behaviour
      a bug.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Oren Laadan <orenl@cs.columbia.edu>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] setpgid: should work for sub-threads · e19f247a
      Oren Laadan committed
      setsid() does not work unless the calling process is a
      thread_group_leader().
      
      'man setpgid' says nothing about that, so I consider this behaviour
      a bug.
      Signed-off-by: Oren Laadan <orenl@cs.columbia.edu>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] setpgid: should work for sub-threads · ee0acf90
      Oleg Nesterov committed
      setpgid(0, pgid) or setpgid(forked_child_pid, pgid) does not work unless
      the calling process is a thread_group_leader().
      
      'man setpgid' says nothing about that, so I consider this behaviour
      a bug.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Oren Laadan <orenl@cs.columbia.edu>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] fork: fix race in setting child's pgrp and tty · 9a5d3023
      Oren Laadan committed
      In fork, the child should recopy the parent's pgrp/tty once it holds
      tasklist_lock.  Otherwise, if setpgid() is called on the parent *after*
      copy_signal() (e.g. because copy_mm() sleeps a long while due to memory
      pressure), the child will own a stale pgrp, which may be reused.  There is
      a similar issue for the tty.
      Signed-off-by: Oren Laadan <orenl@cs.columbia.edu>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Don't attempt to power off if power off is not implemented · 5e38291d
      Eric W. Biederman committed
      The problem: it is expected that /sbin/halt -p works exactly like
      /sbin/halt when the kernel does not implement power off functionality.
      
      The kernel can do a lot of work in the reboot notifiers and in
      device_shutdown before we even get to machine_power_off.  Some of that
      shutdown is not safe if you are leaving the power on, and it definitely
      gets in the way of using sysrq or pressing ctrl-alt-del.  Since the
      shutdown happens in generic code there is no way to fix this in
      architecture specific code :(
      
      Some machines are kernel oopsing today because of this.
      
      The simple solution is to turn LINUX_REBOOT_CMD_POWER_OFF into
      LINUX_REBOOT_CMD_HALT if power_off functionality is not implemented.
      
      This has the unfortunate side effect of disabling the power off
      functionality on architectures that leave pm_power_off NULL and still
      implement something in machine_power_off.  And it will break the build on
      some architectures that don't have a pm_power_off variable at all.
      
      On both counts I say tough.
      
      For architectures like alpha that don't implement the pm_power_off
      variable: pm_power_off is declared in linux/pm.h, it is a generic part of
      our power management code, and all architectures should implement it.

      For architectures like parisc that have a default power off method in
      machine_power_off if pm_power_off is not implemented or fails, it is easy
      enough to set the pm_power_off variable.  And nothing bad happens there;
      the machines just stop powering off.
      
      The current semantics are impossible without a flag at the top level so we
      can avoid the problem code if a power off is not implemented.  pm_power_off
      is as good a flag as any with the bonus that it works without modification
      on at least x86, x86_64, powerpc, and ppc today.
      
      Andrew can you pick this up and put this in the mm tree.  Kernels that
      don't compile or don't power off seem saner than kernels that oops or
      panic.  Until we get the arch specific patches for the problem
      architectures this probably isn't smart to push into the stable kernel.
      Unfortunately I don't have the time at the moment to walk through every
      architecture and make them work.  And even if I did I couldn't test it :(
      
      From: Hirokazu Takata <takata@linux-m32r.org>
      
          Add pm_power_off() for build fix of arch/m32r/kernel/process.c.
      
      From: Miklos Szeredi <miklos@szeredi.hu>
      
          UML build fix
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Hayato Fujiwara <fujiwara@linux-m32r.org>
      Signed-off-by: Hirokazu Takata <takata@linux-m32r.org>
      Signed-off-by: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
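The fallback described in the power-off commit above can be sketched as a small decision function. The enum values and `effective_cmd_sketch()` are hypothetical stand-ins for the LINUX_REBOOT_CMD_* handling; only the idea of using a NULL `pm_power_off` as the "not implemented" flag comes from the message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the LINUX_REBOOT_CMD_* values. */
enum reboot_cmd_sketch { CMD_HALT_SKETCH, CMD_POWER_OFF_SKETCH };

typedef void (*power_off_fn)(void);

/* Example handler an architecture might install (hypothetical). */
static void example_power_off(void)
{
}

/*
 * Decide which command to run.  A NULL power-off handler serves as the
 * "power off is not implemented" flag, so the request is downgraded to a
 * halt before any shutdown work starts.
 */
static enum reboot_cmd_sketch effective_cmd_sketch(enum reboot_cmd_sketch cmd,
                                                   power_off_fn pm_power_off)
{
    assert(cmd == CMD_HALT_SKETCH || cmd == CMD_POWER_OFF_SKETCH);
    if (cmd == CMD_POWER_OFF_SKETCH && pm_power_off == NULL)
        return CMD_HALT_SKETCH;
    return cmd;
}
```

Making the decision up front is the point: the reboot notifiers and device shutdown that precede the actual power off are never entered on a machine that cannot power off.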
    • [PATCH] Extend RCU torture module to test tickless idle CPU · d84f5203
      Srivatsa Vaddagiri committed
      This patch forces RCU torture threads off various CPUs in the system,
      allowing those CPUs to become idle and go tickless.  It is meant to test
      RCU's support for such tickless idle CPUs.
      Signed-off-by: Srivatsa Vaddagiri <vatsa@in.ibm.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Add tainting for proprietary helper modules · 9841d61d
      Dave Jones committed
      Kernels that have had Windows drivers loaded into them are undebuggable.
      I've wasted a number of hours chasing bugs filed in Fedora bugzilla only to
      find out much later that the user had used such 'helpers', and their
      problems were unreproducible without them loaded.
      Acked-by: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] shrink dentry struct · 5160ee6f
      Eric Dumazet committed
      A long time ago, the dentry struct was carefully tuned so that on 32-bit
      UP, sizeof(struct dentry) was exactly 128, i.e. a power of 2 and a multiple
      of the memory cache line size.

      Then RCU was added and the dentry struct grew by two pointers, with nice
      results for SMP but not so good on UP, because it broke the above tuning
      (128 + 8 = 136 bytes).

      This patch reverts this unwanted side effect by using a union (d_u),
      where d_rcu and d_child are placed so that these two fields can share
      their memory.
      
      At the time d_free() is called (and d_rcu is really used), d_child is known
      to be empty and not touched by the dentry freeing.
      
      Lockless lookups only access d_name, d_parent, d_lock, d_op, d_flags (so
      the previous content of d_child is not needed if said dentry was unhashed
      but still accessed by a CPU because of RCU constraints)
      
      As the dentry cache easily contains millions of entries, a size reduction
      is worth the extra complexity of the ugly C union.
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Cc: Dipankar Sarma <dipankar@in.ibm.com>
      Cc: Maneesh Soni <maneesh@in.ibm.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Al Viro <viro@ftp.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Neil Brown <neilb@cse.unsw.edu.au>
      Cc: James Morris <jmorris@namei.org>
      Cc: Stephen Smalley <sds@epoch.ncsc.mil>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
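The space saving from the union in the dentry commit above can be illustrated with two toy structs. The field names `d_child`, `d_rcu`, `d_u` and `d_flags` follow the message; the struct layouts and the `_sketch` types are simplified assumptions and not the real `struct dentry`.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's list and RCU head types. */
struct list_head_sketch { struct list_head_sketch *next, *prev; };
struct rcu_head_sketch {
    struct rcu_head_sketch *next;
    void (*func)(struct rcu_head_sketch *head);
};

/* Before: both fields take up space for the whole life of the object. */
struct dentry_separate_sketch {
    unsigned int d_flags;
    struct list_head_sketch d_child;
    struct rcu_head_sketch d_rcu;
};

/* After: d_rcu is only needed once d_child is no longer in use. */
struct dentry_union_sketch {
    unsigned int d_flags;
    union {
        struct list_head_sketch d_child;
        struct rcu_head_sketch d_rcu;
    } d_u;
};

/* Checked at compile time: the union version is strictly smaller. */
static_assert(sizeof(struct dentry_union_sketch) <
              sizeof(struct dentry_separate_sketch),
              "sharing storage should shrink the struct");
```

The sharing is only safe because the two members are never live at the same time, which is the argument the message makes about d_free().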
    • [PATCH] remove unneeded sig->curr_target recalculation · 86174cdc
      Oleg Nesterov committed
      This patch removes unneeded sig->curr_target recalculation under 'if
      (atomic_dec_and_test(&sig->count))' in __exit_signal().
      
      When sig->count == 0 the signal can't be sent to this task and
      next_thread(tsk) == tsk anyway.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] little do_group_exit() cleanup · 485a6435
      Oleg Nesterov committed
      zap_other_threads() sets SIGNAL_GROUP_EXIT at the very start,
      do_group_exit() doesn't need to do it.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] kill_proc_info_as_uid: don't use hardcoded constants · 0811af28
      Oleg Nesterov committed
      Use symbolic names instead of hardcoded constants.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Acked-by: Harald Welte <laforge@gnumonks.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Unchecked alloc_percpu() return in __create_workqueue() · 676121fc
      Ben Collins committed
      __create_workqueue() was not checking the return value of alloc_percpu().

      A NULL dereference was possible.
      Signed-off-by: Ben Collins <bcollins@ubuntu.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] sigaction should clear all signals on SIG_IGN, not just < 32 · 71fabd5e
      George Anzinger committed
      While rooting around in the signal code trying to understand how to fix the
      SIG_IGN ploy (set sig handler to SIG_IGN and flood system with high speed
      repeating timers) I came across what, I think, is a problem in sigaction()
      in that when processing a SIG_IGN request it flushes signals from 1 to
      SIGRTMIN and leaves the rest.  Attempt to fix this.
      Signed-off-by: George Anzinger <george@mvista.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Linus Torvalds <torvalds@osdl.org>
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] printk return value: fix it · 025510cd
      Guillaume Chazarain committed
      What's the true meaning of the printk return value?  Should it include the
      priority prefix length of 3?  and what about the timing information?  In
      both cases it was broken:
      
      strace -e write echo 1 > /dev/kmsg
      => write(1, "1\n", 2)                      = 5
      strace -e write echo "<1>1" > /dev/kmsg
      => write(1, "<1>1\n", 5)                   = 8
      
      The returned length was "length of input string + 3"; I made it "length
      of string output to the log buffer".

      Note that I couldn't find any printk caller in the kernel interested in its
      return value besides kmsg_write.
      Signed-off-by: Guillaume Chazarain <guichaz@yahoo.fr>
      Acked-By: Tim Bird <tim.bird@am.sony.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] use ptrace_get_task_struct in various places · 6b9c7ed8
      Christoph Hellwig committed
      The ptrace_get_task_struct() helper that I added as part of the ptrace
      consolidation is useful in a variety of places that currently open-code it.
      Switch them to the common helpers.
      
      Add a ptrace_traceme() helper that needs to be explicitly called, and simplify
      the ptrace_get_task_struct() interface.  We don't need the request argument
      now, and we return the task_struct directly, using ERR_PTR() for error
      returns.  It's a bit more code in the callers, but we have two sane routines
      that do one thing well now.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
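The "return the task_struct directly, using ERR_PTR() for error returns" convention mentioned in the ptrace commit above can be sketched in user space as follows. The `_sketch` helpers imitate the idea of the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() macros but are assumptions written for illustration, and `get_task_sketch()` is a hypothetical lookup, not `ptrace_get_task_struct()`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO_SKETCH 4095   /* largest errno value encoded in a pointer */

/* Encode a negative errno value as a pointer. */
static void *err_ptr_sketch(long err)
{
    assert(err < 0 && err >= -MAX_ERRNO_SKETCH);
    return (void *)(intptr_t)err;
}

/* Recover the errno value from an error pointer. */
static long ptr_err_sketch(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr lies in the top MAX_ERRNO_SKETCH addresses, i.e. is an error. */
static int is_err_sketch(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

struct task_sketch { int pid; };

static struct task_sketch known_task = { 1 };

/* Hypothetical lookup: returns the task, or an encoded -ESRCH if not found. */
static struct task_sketch *get_task_sketch(int pid)
{
    if (pid == known_task.pid)
        return &known_task;
    return err_ptr_sketch(-ESRCH);
}
```

Callers pay for this with an explicit error check on the returned pointer, which matches the message's remark that it is "a bit more code in the callers".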
    • [PATCH] rcu file: use atomic primitives · 095975da
      Nick Piggin committed
      Use atomic_inc_not_zero for rcu files instead of special case rcuref.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
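For readers unfamiliar with the primitive named in the commit above, here is a user-space sketch of "increment unless zero" using C11 atomics. `inc_not_zero_sketch()` is a hypothetical analogue written for illustration; it is not the kernel's `atomic_inc_not_zero()` implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Increment *v unless it is zero.  Returns true if the increment happened.
 * A zero count means the object is already on its way to being freed, so
 * a lockless reader must not take a new reference.
 */
static bool inc_not_zero_sketch(atomic_int *v)
{
    int old;

    assert(v != NULL);
    old = atomic_load(v);
    while (old != 0) {
        /* On failure, old is refreshed with the current value. */
        if (atomic_compare_exchange_weak(v, &old, old + 1))
            return true;
    }
    return false;
}
```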
    • [PATCH] kernel/: small cleanups · 97a41e26
      Adrian Bunk committed
      This patch contains the following cleanups:
      - make needlessly global functions static
      - every file should include the headers containing the prototypes for
        its global functions
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Acked-by: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: skip rcu check if task is in root cpuset · 03a285f5
      Paul Jackson committed
      For systems that aren't using cpusets, but have CONFIG_CPUSET enabled in
      their kernel (eventually this may be most distribution kernels), this patch
      removes even the minimal rcu_read_lock() from the memory page allocation path.
      
      Actually, it removes that rcu call for any task that is in the root cpuset
      (top_cpuset), which on systems not actively using cpusets, is all tasks.
      
      We don't need the rcu check for tasks in the top_cpuset, because the
      top_cpuset is statically allocated, so it is at no risk of being freed out
      from underneath us.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: mark number_of_cpusets read_mostly · 7edc5962
      Paul Jackson committed
      Mark cpuset global 'number_of_cpusets' as __read_mostly.
      
      This global is accessed every time a zone is considered in the zonelist loops
      beneath __alloc_pages, looking for a free memory page.  If number_of_cpusets
      is just one, then we can short circuit the mems_allowed check.

      Since this global is read a lot on a hot path, and written rarely, it is an
      excellent candidate for __read_mostly.
      
      Thanks to Christoph Lameter for the suggestion.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: use rcu directly optimization · 6b9c2603
      Paul Jackson committed
      Optimize the cpuset impact on page allocation, the most performance critical
      cpuset hook in the kernel.
      
      On each page allocation, the cpuset hook needs to check for a possible change
      in the current task's cpuset.  It can now handle the common case, of no change,
      without taking any spinlock or semaphore, thanks to RCU.
      
      Convert a spinlock on the current task to an rcu_read_lock(), saving
      approximately a memory barrier and an atomic op, depending on architecture.
      
      This is done by adding rcu_assign_pointer() and synchronize_rcu() calls to the
      write side of the task->cpuset pointer, in cpuset.c:attach_task(), to delay
      freeing up a detached cpuset until after any critical sections referencing
      that pointer.
      
      Thanks to Andi Kleen, Nick Piggin and Eric Dumazet for ideas.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: remove test for null cpuset from alloc code path · c417f024
      Paul Jackson committed
      Remove a couple more lines of code from the cpuset hooks in the page
      allocation code path.
      
      There was a check for a NULL cpuset pointer in the routine
      cpuset_update_task_memory_state() that was only needed during system boot,
      after the memory subsystem was initialized, before the cpuset subsystem was
      initialized, to catch a NULL task->cpuset pointer.
      
      Add a cpuset_init_early() routine, just before the mem_init() call in
      init/main.c, that sets up just enough of the init task's cpuset structure to
      render cpuset_update_task_memory_state() calls harmless.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: migrate all tasks in cpuset at once · 04c19fa6
      Paul Jackson committed
      Given the mechanism in the previous patch to handle rebinding the per-vma
      mempolicies of all tasks in a cpuset that changes its memory placement, it is
      now easier to handle the page migration requirements of such tasks at the same
      time.
      
      The previous code didn't actually attempt to migrate the pages of the tasks in
      a cpuset whose memory placement changed until the next time each such task
      tried to allocate memory.  This was undesirable, as users invoking memory page
      migration expected it to happen when the placement changed, not some
      unspecified time later when the task needed more memory.
      
      It is now trivial to handle the page migration at the same time as the per-vma
      rebinding is done.
      
      The routine cpuset.c:update_nodemask(), which handles changing a cpuset's
      memory placement ('mems'), now checks for the special case of being asked to
      write a placement that is the same as before.  It was harmless enough before
      to just recompute everything again, even though nothing had changed.  But page
      migration is a heavyweight operation - moving pages about.  So now it is
      worth avoiding that if asked to move a cpuset to its current location.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: rebind vma mempolicies fix · 4225399a
      Paul Jackson committed
      Fix more of a longstanding bug in cpuset/mempolicy interaction.
      
      NUMA mempolicies (mm/mempolicy.c) are constrained by the current task's cpuset
      to just the Memory Nodes allowed by that cpuset.  The kernel maintains
      internal state for each mempolicy, tracking what nodes are used for the
      MPOL_INTERLEAVE, MPOL_BIND or MPOL_PREFERRED policies.

      When a task's cpuset memory placement changes, whether because the cpuset
      changed, or because the task was attached to a different cpuset, then the
      task's mempolicies have to be rebound to the new cpuset placement, so as to
      preserve the cpuset-relative numbering of the nodes in that policy.
      
      An earlier fix handled such mempolicy rebinding for mempolicies attached to a
      task.
      
      This fix rebinds mempolicies attached to vma's (address ranges in a task's
      address space).  Due to the need to hold the task->mm->mmap_sem semaphore while
      updating vma's, the rebinding of vma mempolicies has to be done when the
      cpuset memory placement is changed, at which time mmap_sem can be safely
      acquired.  The task's mempolicy is rebound later, when the task next attempts
      to allocate memory and notices that its task->cpuset_mems_generation is
      out-of-date with its cpuset's mems_generation.
      
      Because walking the tasklist to find all tasks attached to a changing cpuset
      requires holding tasklist_lock, a spinlock, one cannot update the vma's of the
      affected tasks while doing the tasklist scan.  In general, one cannot acquire
      a semaphore (which can sleep) while already holding a spinlock (such as
      tasklist_lock).  So a list of mm references has to be built up during the
      tasklist scan, then the tasklist lock dropped, then for each mm, its mmap_sem
      acquired, and the vma's in that mm rebound.
      
      Once the tasklist lock is dropped, affected tasks may fork new tasks, before
      their mm's are rebound.  A kernel global 'cpuset_being_rebound' is set to
      point to the cpuset being rebound (there can only be one; cpuset modifications
      are done under a global 'manage_sem' semaphore), and the mpol_copy code that
      is used to copy a task's mempolicies during fork catches such forking tasks,
      and ensures their children are also rebound.

      When a task is moved to a different cpuset, it is easier, as there is only one
      task involved.  Its mm->vma's are scanned, using the same
      mpol_rebind_policy() as used above.
      
      It may happen that both the mpol_copy hook and the update done via the
      tasklist scan update the same mm twice.  This is ok, as the mempolicies of
      each vma in an mm keep track of what mems_allowed they are relative to, and
      safely no-op a second request to rebind to the same nodes.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: number_of_cpusets optimization · 202f72d5
      Paul Jackson committed
      Easy little optimization hack to avoid actually having to call
      cpuset_zone_allowed() and check mems_allowed, in the main page allocation
      routine, __alloc_pages().  This saves several CPU cycles per page allocation
      on systems not using cpusets.
      
      A counter is updated each time a cpuset is created or removed, and whenever
      there is only one cpuset in the system, it must be the root cpuset, which
      contains all CPUs and all Memory Nodes.  In that case, when the counter is
      one, all allocations are allowed.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
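The counter shortcut described in the commit above can be sketched like this. All names ending in `_sketch` are hypothetical, node masks are modelled as plain bitmasks, and the slow-path counter exists only to make the effect visible; none of this is the kernel's `cpuset_zone_allowed()`.

```c
#include <assert.h>

/* Counter kept in step with cpuset creation and removal. */
static int number_of_cpusets_sketch = 1;

/* Counts how often the slower check runs (for illustration only). */
static int slow_checks_sketch;

/* Slower path: test the node against a mems_allowed bitmask. */
static int node_allowed_slow_sketch(unsigned long mems_allowed, int node)
{
    slow_checks_sketch++;
    return (int)((mems_allowed >> node) & 1UL);
}

/* Fast path: with only the root cpuset, every node is allowed. */
static int node_allowed_sketch(unsigned long mems_allowed, int node)
{
    assert(node >= 0 && node < 32);
    if (number_of_cpusets_sketch <= 1)
        return 1;
    return node_allowed_slow_sketch(mems_allowed, node);
}
```

On a system that never creates a second cpuset, the slower function is never called, which is the saving the message describes.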
    • [PATCH] cpuset: numa_policy_rebind cleanup · 74cb2155
      Paul Jackson committed
      Clean up, reorganize and make more robust the mempolicy.c code to rebind
      mempolicies relative to the containing cpuset after a task's memory placement
      changes.
      
      The real motivator for this cleanup patch is to lay more groundwork for the
      upcoming patch to correctly rebind NUMA mempolicies that are attached to vma's
      after the containing cpuset memory placement changes.
      
      NUMA mempolicies are constrained by the cpuset their task is a member of.
      When either (1) a task is moved to a different cpuset, or (2) the 'mems'
      mems_allowed of a cpuset is changed, then the NUMA mempolicies have embedded
      node numbers (for MPOL_BIND, MPOL_INTERLEAVE and MPOL_PREFERRED) that need to
      be recalculated, relative to their new cpuset placement.
      
      The old code used an unreliable method of determining what was the old
      mems_allowed constraining the mempolicy.  It just looked at the task's
      mems_allowed value.  This sort of worked with the present code, that just
      rebinds the -task- mempolicy, and leaves any -vma- mempolicies broken,
      referring to the old nodes.  But in an upcoming patch, the vma mempolicies
      will be rebound as well.  Then the order in which the various task and vma
      mempolicies are updated will no longer be deterministic, and one can no longer
      count on the task->mems_allowed holding the old value for as long as needed.
      It's not even clear if the current code was guaranteed to work reliably for
      task mempolicies.
      
      So I added a mems_allowed field to each mempolicy, stating exactly what
      mems_allowed the policy is relative to, and updated synchronously and reliably
      anytime that the mempolicy is rebound.
      
      Also removed a useless wrapper routine, numa_policy_rebind(), and had its
      caller, cpuset_update_task_memory_state(), call directly to the rewritten
      policy_rebind() routine, and made that rebind routine extern instead of
      static, and added a "mpol_" prefix to its name, making it
      mpol_rebind_policy().
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
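A rough sketch of why recording the mems_allowed a policy is relative to makes rebinding reliable, as the commit above describes. It assumes node sets are bitmasks and uses a simplified remapping (the n-th allowed node in the old set maps to the n-th in the new set, and nodes outside the old set are dropped). `struct mempolicy_sketch` and `rebind_sketch()` are hypothetical and are not `mpol_rebind_policy()`.

```c
#include <assert.h>

#define MAX_NODES_SKETCH 32

struct mempolicy_sketch {
    unsigned long nodes;          /* nodes the policy uses */
    unsigned long mems_allowed;   /* set that 'nodes' is relative to */
};

static int count_bits_sketch(unsigned long mask)
{
    int n = 0;

    for (int bit = 0; bit < MAX_NODES_SKETCH; bit++)
        if (mask & (1UL << bit))
            n++;
    return n;
}

/* Position of the n-th (0-based) set bit in mask. */
static int nth_set_bit_sketch(unsigned long mask, int n)
{
    assert(n >= 0 && n < count_bits_sketch(mask));
    for (int bit = 0; bit < MAX_NODES_SKETCH; bit++)
        if ((mask & (1UL << bit)) && n-- == 0)
            return bit;
    return -1;   /* not reached while the assertion holds */
}

/*
 * Rebind pol to new_allowed, preserving each node's position relative to
 * the allowed set.  Returns 1 if the policy changed, 0 for a no-op.
 */
static int rebind_sketch(struct mempolicy_sketch *pol, unsigned long new_allowed)
{
    unsigned long out = 0;
    int new_count = count_bits_sketch(new_allowed);
    int ord = 0;

    if (pol->mems_allowed == new_allowed || new_count == 0)
        return 0;   /* already relative to this set, or nothing to map to */

    for (int bit = 0; bit < MAX_NODES_SKETCH; bit++) {
        if (!(pol->mems_allowed & (1UL << bit)))
            continue;
        if (pol->nodes & (1UL << bit))
            out |= 1UL << nth_set_bit_sketch(new_allowed, ord % new_count);
        ord++;
    }
    pol->nodes = out;
    pol->mems_allowed = new_allowed;
    return 1;
}
```

Because the policy carries its own reference set, a second request to rebind to the same nodes is detected and skipped, whatever order the task and vma policies are updated in.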
    • [PATCH] cpuset: implement cpuset_mems_allowed · 909d75a3
      Paul Jackson committed
      Provide a cpuset_mems_allowed() method, which the sys_migrate_pages() code
      needed, to obtain the mems_allowed vector of a cpuset, and replaced the
      workaround in sys_migrate_pages() to call this new method.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: combine refresh_mems and update_mems · cf2a473c
      Paul Jackson committed
      The important code paths through alloc_pages_current() and alloc_page_vma(),
      by which most kernel page allocations go, both called
      cpuset_update_current_mems_allowed(), which in turn called refresh_mems().
      -Both- of these latter two routines did a tasklock, got the task's cpuset
      pointer, and checked for out of date cpuset->mems_generation.
      
      That was a silly duplication of code and waste of CPU cycles on an important
      code path.
      
      Consolidated those two routines into a single routine, called
      cpuset_update_task_memory_state(), since it updates more than just
      mems_allowed.
      
      Changed all callers of either routine to call the new consolidated routine.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: fork hook fix · b4b26418
      Paul Jackson committed
      Fix obscure, never seen in real life, cpuset fork race.  The cpuset_fork()
      call in fork.c was setting up the correct task->cpuset pointer after the
      tasklist_lock was dropped, which briefly exposed the newly forked process with
      an unsafe (copied from parent without locks or usage counter increment) cpuset
      pointer.
      
      In theory, that exposed cpuset pointer could have been pointing at a cpuset
      that was already freed and removed, and in theory another task that had been
      sitting on the tasklist_lock waiting to scan the task list could have raced
      down the entire tasklist, found our new child at the far end, and dereferenced
      that bogus cpuset pointer.
      
      To fix, set up the correct cpuset pointer in the new child by calling
      cpuset_fork() before the new task is linked into the tasklist, and with that,
      add a fork failure case, to dereference that cpuset, if the fork fails along
      the way, after cpuset_fork() was called.
      
      Had to remove a BUG_ON() from cpuset_exit(), because it was no longer valid -
      the call to cpuset_exit() from a failed fork would not have PF_EXITING set.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] cpuset: update_nodemask code reformat · 59dac16f
      Paul Jackson committed
      Restructure code layout of the kernel/cpuset.c update_nodemask() routine,
      removing embedded returns and nested if's in favor of goto completion labels.
      This is being done in anticipation of adding more logic to this routine, which
      will favor the goto style structure.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
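The "goto completion label" layout that the commit above adopts can be illustrated with a small user-space function. This is a generic sketch of the style only: `update_mask_sketch()` is hypothetical, parses a hex mask from a string, and bears no relation to the real body of `update_nodemask()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a hex node mask from buf and store it in *mask.  Every exit goes
 * through the single 'done' label, so cleanup is written once.
 */
static int update_mask_sketch(unsigned long *mask, const char *buf)
{
    char *copy = NULL;
    char *end;
    unsigned long val;
    int retval;

    assert(mask != NULL);

    if (buf == NULL) {
        retval = -EINVAL;
        goto done;
    }
    copy = malloc(strlen(buf) + 1);
    if (copy == NULL) {
        retval = -ENOMEM;
        goto done;
    }
    strcpy(copy, buf);

    val = strtoul(copy, &end, 16);
    if (end == copy || *end != '\0') {
        retval = -EINVAL;
        goto done;
    }
    if (val == *mask) {
        retval = 0;   /* same placement as before: nothing to do */
        goto done;
    }
    *mask = val;
    retval = 0;
done:
    free(copy);       /* free(NULL) is a no-op */
    return retval;
}
```

Adding another check later means adding one more `goto done` branch rather than another nesting level or another copy of the cleanup code.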
    • [PATCH] cpuset: minor spacing initializer fixes · c5b2aff8
      Paul Jackson committed
      Four trivial cpuset fixes: remove extra spaces, remove useless initializers,
      mark one __read_mostly.
      Signed-off-by: Paul Jackson <pj@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>