1. 30 Jun, 2006 4 commits
    • I
      [PATCH] genirq: cleanup: merge irq_affinity[] into irq_desc[] · a53da52f
      Committed by Ingo Molnar
      Consolidation: remove the irq_affinity[NR_IRQS] array and move it into the
      irq_desc[NR_IRQS].affinity field.
      
      [akpm@osdl.org: sparc64 build fix]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a53da52f
    • I
      [PATCH] genirq: rename desc->handler to desc->chip · d1bef4ed
      Committed by Ingo Molnar
      This patch-queue improves the generic IRQ layer to be truly generic, by adding
      various abstractions and features to it, without impacting existing
      functionality.
      
      While the queue can be best described as "fix and improve everything in the
      generic IRQ layer that we could think of", and thus it consists of many
      smaller features and lots of cleanups, the one feature that stands out most is
      the new 'irq chip' abstraction.
      
      The irq-chip abstraction is about describing and coding an IRQ controller
      driver by mapping its raw hardware capabilities [and quirks, if needed] in a
      straightforward way, without having to think about "IRQ flow"
      (level/edge/etc.) type of details.
      
      This stands in contrast with the current 'irq-type' model of genirq
      architectures, which 'mixes' raw hardware capabilities with 'flow' details.
      The patchset supports both types of irq controller designs at once, and
      converts i386 and x86_64 to the new irq-chip design.
      
      As a bonus side-effect of the irq-chip approach, chained interrupt controllers
      (master/slave PIC constructs, etc.) are now supported by design as well.
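
      As a rough illustration of the split described above, the following toy
      Python model keeps raw controller operations in a "chip" object and puts the
      level/edge ordering into separate flow handlers.  It is only a sketch: the
      names ToyChip, ToyDesc, handle_level and handle_edge are hypothetical, and
      the call order is a simplification rather than the kernel's actual C code.

```python
# Toy model of the irq-chip idea: the chip object only exposes raw hardware
# operations, while a separate flow handler decides in which order to use them.
# All names here are illustrative; this is not the kernel's C interface.

class ToyChip:
    """Raw controller operations; records calls so the order can be inspected."""

    def __init__(self, name):
        self.name = name
        self.log = []

    def ack(self, irq):
        self.log.append(("ack", irq))

    def mask(self, irq):
        self.log.append(("mask", irq))

    def unmask(self, irq):
        self.log.append(("unmask", irq))


def handle_level(chip, irq, action):
    """Level-type flow: keep the line masked while the action runs."""
    chip.mask(irq)
    chip.ack(irq)
    action(irq)
    chip.unmask(irq)


def handle_edge(chip, irq, action):
    """Edge-type flow: acknowledge first, then run the action unmasked."""
    chip.ack(irq)
    action(irq)


class ToyDesc:
    """Per-interrupt descriptor: a chip, a flow handler and an action."""

    def __init__(self, chip, flow, action):
        self.chip = chip
        self.flow = flow
        self.action = action

    def dispatch(self, irq):
        self.flow(self.chip, irq, self.action)


handled = []
chip = ToyChip("toy-pic")
level_desc = ToyDesc(chip, handle_level, handled.append)
edge_desc = ToyDesc(chip, handle_edge, handled.append)
level_desc.dispatch(5)
edge_desc.dispatch(7)
```

      The same ToyChip serves both descriptors; only the flow handler differs,
      which is the property the irq-chip design is after.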
      
      The end result of this patchset intends to be simpler architecture-level code
      and more consolidation between architectures.
      
      We reused many bits of code and many concepts from Russell King's ARM IRQ
      layer, the merging of which was one of the motivations for this patchset.
      
      This patch:
      
      rename desc->handler to desc->chip.
      
      Originally I did not want to do this, because it's a big patch.  But having
      "desc->handler", "desc->handle_irq" and "action->handler" side by side caused
      a large degree of confusion and made the code appear a lot less clean than it
      truly is.
      
      I also attempted a dual approach by introducing a desc->chip alias, but that
      just wasn't robust enough and broke frequently.
      
      So let's get this over with quickly.  The conversion was done automatically
      via scripts and converts all the code in the kernel.
      
      This renaming patch is the first one amongst the patches, so that the
      remaining patches can stay flexible and can be merged and split up
      without having some big monolithic patch act as a merge barrier.
      
      [akpm@osdl.org: build fix]
      [akpm@osdl.org: another build fix]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      d1bef4ed
    • K
      [PATCH] i4l: remove unneeded include/linux/isdn/tpam.h · 9dc3885d
      Committed by Karsten Keil
      The TPAM isdn driver was removed in 2.6.12, but include/linux/isdn/tpam.h
      was missed.
      Signed-off-by: Karsten Keil <kkeil@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      9dc3885d
    • D
      [PATCH] Keys: Allow in-kernel key requestor to pass auxiliary data to upcaller · 4e54f085
      Committed by David Howells
      The proposed NFS key type uses its own method of passing key requests to
      userspace (upcalling) rather than invoking /sbin/request-key.  This is
      because the responsible userspace daemon should already be running and will
      be contacted through rpc_pipefs.
      
      This patch permits the NFS filesystem to pass auxiliary data to the upcall
      operation (struct key_type::request_key) so that the upcaller can use a
      pre-existing communications channel more easily.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Acked-By: Kevin Coffman <kwc@citi.umich.edu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      4e54f085
  2. 29 Jun, 2006 8 commits
  3. 28 Jun, 2006 28 commits
    • A
      [PATCH] drivers/char/ipmi/ipmi_msghandler.c: make proc_ipmi_root static · 456229a9
      Committed by Adrian Bunk
      Make struct proc_ipmi_root static.
      
      Besides this, it removes an unused #ifdef CONFIG_PROC_FS from
      include/linux/ipmi.h.
      Acked-by: Corey Minyard <minyard@acm.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      456229a9
    • T
      [PATCH] rtmutex: Propagate priority settings into PI lock chains · 95e02ca9
      Committed by Thomas Gleixner
      When the priority of a task, which is blocked on a lock, changes we must
      propagate this change into the PI lock chain.  Therefore the chain walk code
      is changed to get rid of the references to current to avoid false positives
      in the deadlock detector, as setscheduler might be called by a task which
      holds the lock on which the task whose priority is changed is blocked.
      
      Also add some comments about the get/put_task_struct usage to avoid
      confusion.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      95e02ca9
    • I
      [PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support · c87e2837
      Committed by Ingo Molnar
      This adds the actual pi-futex implementation, based on rt-mutexes.
      
      [dino@in.ibm.com: fix an oops-causing race]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Dinakar Guniguntala <dino@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c87e2837
    • T
      [PATCH] pi-futex: rt mutex tester · 61a87122
      Committed by Thomas Gleixner
      RT-mutex tester: scriptable tester for rt mutexes, which allows userspace
      scripting of mutex unit-tests (and dynamic tests as well), using the actual
      rt-mutex implementation of the kernel.
      
      [akpm@osdl.org: fixlet]
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      61a87122
    • I
      [PATCH] pi-futex: rt mutex debug · e7eebaf6
      Committed by Ingo Molnar
      Runtime debugging functionality for rt-mutexes.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e7eebaf6
    • I
      [PATCH] pi-futex: rt mutex core · 23f78d4a
      Committed by Ingo Molnar
      Core functions for the rt-mutex subsystem.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      23f78d4a
    • I
      [PATCH] pi-futex: scheduler support for pi · b29739f9
      Committed by Ingo Molnar
      Add framework to boost/unboost the priority of RT tasks.
      
      This consists of:
      
       - caching the 'normal' priority in ->normal_prio
       - providing functions to set/get the priority of the task
       - making sched_setscheduler() aware of boosting
      
      The effective_prio() cleanups also fix a priority-calculation bug pointed out
      by Andrey Gelman, in set_user_nice().
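
      A minimal sketch of the boost/unboost bookkeeping listed above, written as a
      toy Python model.  It assumes the kernel convention that a numerically lower
      value means a higher priority; the class ToyTask and its method names are
      hypothetical and do not correspond to the functions this patch adds.

```python
class ToyTask:
    """Tracks a task's own priority separately from its boosted priority."""

    def __init__(self, name, prio):
        self.name = name
        self.normal_prio = prio  # cached priority without any boosting
        self.prio = prio         # effective priority used for scheduling

    def boost(self, waiter_prio):
        # Lower number = higher priority, so take the minimum.
        self.prio = min(self.normal_prio, waiter_prio)

    def unboost(self):
        self.prio = self.normal_prio

    def set_priority(self, new_prio, top_waiter_prio=None):
        """Boost-aware priority change, in the spirit of sched_setscheduler()."""
        self.normal_prio = new_prio
        if top_waiter_prio is None:
            self.prio = new_prio
        else:
            # A pending boost must survive the priority change.
            self.prio = min(new_prio, top_waiter_prio)


low = ToyTask("low", 120)
low.boost(10)                              # a prio-10 task waits on our lock
boosted = (low.prio, low.normal_prio)
low.set_priority(110, top_waiter_prio=10)  # change priority while boosted
changed = (low.prio, low.normal_prio)
low.unboost()                              # lock released, waiter gone
final = (low.prio, low.normal_prio)
```

      Keeping normal_prio separate is what lets the task fall back to the right
      value once the boost ends, even if its priority was changed in between.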
      
      has_rt_policy() fix: Peter Williams <pwil3058@bigpond.net.au>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Andrey Gelman <agelman@012.net.il>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b29739f9
    • I
      [PATCH] pi-futex: add plist implementation · 77ba89c5
      Committed by Ingo Molnar
      Add the priority-sorted list (plist) implementation.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      77ba89c5
    • I
      [PATCH] pi-futex: introduce debug_check_no_locks_freed() · f9b8404c
      Committed by Ingo Molnar
      Add debug_check_no_locks_freed(), as a central inline to add
      bad-lock-free-debugging functionality to.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f9b8404c
    • I
      [PATCH] pi-futex: futex code cleanups · e2970f2f
      Committed by Ingo Molnar
      We are pleased to announce "lightweight userspace priority inheritance" (PI)
      support for futexes.  The following patchset and glibc patch implement it,
      on top of the robust-futexes patchset which is included in 2.6.16-mm1.
      
      We are calling it lightweight for 3 reasons:
      
       - in the user-space fastpath a PI-enabled futex involves no kernel work
         (or any other PI complexity) at all.  No registration, no extra kernel
         calls - just pure fast atomic ops in userspace.
      
       - in the slowpath (in the lock-contention case), the system call and
         scheduling pattern is in fact better than that of normal futexes, due to
         the 'integrated' nature of FUTEX_LOCK_PI.  [more about that further down]
      
       - the in-kernel PI implementation is streamlined around the mutex
         abstraction, with strict rules that keep the implementation relatively
         simple: only a single owner may own a lock (i.e.  no read-write lock
         support), only the owner may unlock a lock, no recursive locking, etc.
      
        Priority Inheritance - why, oh why???
        -------------------------------------
      
      Many of you heard the horror stories about the evil PI code circling Linux for
      years, which makes no real sense at all and is only used by buggy applications
      and which has horrible overhead.  Some of you have dreaded this very moment,
      when someone actually submits working PI code ;-)
      
      So why would we like to see PI support for futexes?
      
      We'd like to see it done purely for technological reasons.  We don't think it's
      a buggy concept, we think it's useful functionality to offer to applications,
      which functionality cannot be achieved in other ways.  We also think it's the
      right thing to do, and we think we've got the right arguments and the right
      numbers to prove that.  We also believe that we can address all the
      counter-arguments as well.  For these reasons (and the reasons outlined below)
      we are submitting this patch-set for upstream kernel inclusion.
      
      What are the benefits of PI?
      
        The short reply:
        ----------------
      
      User-space PI helps achieve or improve determinism for user-space
      applications.  In the best-case, it can help achieve determinism and
      well-bound latencies.  Even in the worst-case, PI will improve the statistical
      distribution of locking related application delays.
      
        The longer reply:
        -----------------
      
      Firstly, sharing locks between multiple tasks is a common programming
      technique that often cannot be replaced with lockless algorithms.  As we can
      see it in the kernel [which is a quite complex program in itself], lockless
      structures are rather the exception than the norm - the current ratio of
      lockless vs.  locky code for shared data structures is somewhere between 1:10
      and 1:100.  Lockless is hard, and the complexity of lockless algorithms often
      endangers the ability to do robust reviews of said code.  I.e.  critical RT
      apps often choose lock structures to protect critical data structures, instead
      of lockless algorithms.  Furthermore, there are cases (like shared hardware,
      or other resource limits) where lockless access is mathematically impossible.
      
      Media players (such as Jack) are an example of reasonable application design
      with multiple tasks (with multiple priority levels) sharing short-held locks:
      for example, a highprio audio playback thread is combined with medium-prio
      construct-audio-data threads and low-prio display-colory-stuff threads.  Add
      video and decoding to the mix and we've got even more priority levels.
      
      So once we accept that synchronization objects (locks) are an unavoidable fact
      of life, and once we accept that multi-task userspace apps have a very fair
      expectation of being able to use locks, we've got to think about how to offer
      the option of a deterministic locking implementation to user-space.
      
      Most of the technical counter-arguments against doing priority inheritance
      only apply to kernel-space locks.  But user-space locks are different: there
      we cannot disable interrupts or make the task non-preemptible in a critical
      section, so the 'use spinlocks' argument does not apply (user-space spinlocks
      have the same priority inversion problems as other user-space locking
      constructs).  Fact is, pretty much the only technique that currently enables
      good determinism for userspace locks (such as futex-based pthread mutexes) is
      priority inheritance:
      
      Currently (without PI), if a high-prio and a low-prio task share a lock [this
      is a quite common scenario for most non-trivial RT applications], even if all
      critical sections are coded carefully to be deterministic (i.e.  all critical
      sections are short in duration and only execute a limited number of
      instructions), the kernel cannot guarantee any deterministic execution of the
      high-prio task: any medium-priority task could preempt the low-prio task while
      it holds the shared lock and executes the critical section, and could delay it
      indefinitely.
      
        Implementation:
        ---------------
      
      As mentioned before, the userspace fastpath of PI-enabled pthread mutexes
      involves no kernel work at all - they behave quite similarly to normal
      futex-based locks: a 0 value means unlocked, and a value==TID means locked.
      (This is the same method as used by list-based robust futexes.) Userspace uses
      atomic ops to lock/unlock these mutexes without entering the kernel.
      
      To handle the slowpath, we have added two new futex ops:
      
        FUTEX_LOCK_PI
        FUTEX_UNLOCK_PI
      
      If the lock-acquire fastpath fails, [i.e.  an atomic transition from 0 to TID
      fails], then FUTEX_LOCK_PI is called.  The kernel does all the remaining work:
      if there is no futex-queue attached to the futex address yet then the code
      looks up the task that owns the futex [it has put its own TID into the futex
      value], and attaches a 'PI state' structure to the futex-queue.  The pi_state
      includes an rt-mutex, which is a PI-aware, kernel-based synchronization
      object.  The 'other' task is made the owner of the rt-mutex, and the
      FUTEX_WAITERS bit is atomically set in the futex value.  Then this task tries
      to lock the rt-mutex, on which it blocks.  Once it returns, it has the mutex
      acquired, and it sets the futex value to its own TID and returns.  Userspace
      has no other work to perform - it now owns the lock, and futex value contains
      FUTEX_WAITERS|TID.
      
      If the unlock side fastpath succeeds, [i.e.  userspace manages to do a TID ->
      0 atomic transition of the futex value], then no kernel work is triggered.
      
      If the unlock fastpath fails (because the FUTEX_WAITERS bit is set), then
      FUTEX_UNLOCK_PI is called, and the kernel unlocks the futex on the behalf of
      userspace - and it also unlocks the attached pi_state->rt_mutex and thus wakes
      up any potential waiters.
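
      The lock and unlock paths described above can be sketched as a
      single-threaded toy model in Python.  This is an illustration under heavy
      assumptions: ToyPiFutex and its methods are hypothetical names, "blocking"
      is modelled by returning False, the waiter list stands in for the rt-mutex,
      and no priority boosting takes place.  Only the futex value transitions
      follow the text.

```python
FUTEX_WAITERS = 0x80000000   # waiters bit in the 32-bit futex word
FUTEX_TID_MASK = 0x3FFFFFFF  # low bits hold the owner's TID


class ToyPiFutex:
    def __init__(self):
        self.value = 0         # 0 means unlocked
        self.waiters = []      # stands in for the kernel-side rt-mutex wait list
        self.kernel_calls = 0  # how often the slowpath was entered

    def _cmpxchg(self, old, new):
        """Models the userspace atomic compare-and-exchange."""
        if self.value == old:
            self.value = new
            return True
        return False

    def lock(self, tid):
        """Return True if the lock was taken, False if the caller would block."""
        if self._cmpxchg(0, tid):   # fastpath: 0 -> TID, no kernel work
            return True
        self.kernel_calls += 1      # models the FUTEX_LOCK_PI call
        self.value |= FUTEX_WAITERS
        self.waiters.append(tid)
        return False

    def unlock(self, tid):
        """Return the TID of the task that now owns the lock, or None."""
        if self._cmpxchg(tid, 0):   # fastpath: TID -> 0, no kernel work
            return None
        self.kernel_calls += 1      # models the FUTEX_UNLOCK_PI call
        assert (self.value & FUTEX_TID_MASK) == tid, "only the owner may unlock"
        if not self.waiters:
            self.value = 0
            return None
        # A real rt-mutex would pick the highest-priority waiter; the toy
        # model simply takes the oldest one.
        next_tid = self.waiters.pop(0)
        self.value = FUTEX_WAITERS | next_tid
        return next_tid


f = ToyPiFutex()
uncontended = (f.lock(100), f.value, f.unlock(100), f.value, f.kernel_calls)
f.lock(100)                          # owner takes the lock via the fastpath
blocked = f.lock(200)                # contended: slowpath, waiters bit is set
value_contended = f.value
new_owner = f.unlock(100)            # fastpath fails, kernel hands the lock over
value_after_handoff = f.value
```

      In the uncontended case kernel_calls stays at zero, which is the
      "lightweight" property the announcement stresses.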
      
      Note that under this approach, contrary to other PI-futex approaches, there is
      no prior 'registration' of a PI-futex.  [which is not quite possible anyway,
      due to existing ABI properties of pthread mutexes.]
      
      Also, under this scheme, 'robustness' and 'PI' are two orthogonal properties
      of futexes, and all four combinations are possible: futex, robust-futex,
      PI-futex, robust+PI-futex.
      
        glibc support:
        --------------
      
      Ulrich Drepper and Jakub Jelinek have written glibc support for PI-futexes
      (and robust futexes), enabling robust and PI (PTHREAD_PRIO_INHERIT) POSIX
      mutexes.  (PTHREAD_PRIO_PROTECT support will be added later on too, no
      additional kernel changes are needed for that).  [NOTE: The glibc patch is
      obviously unofficial and unsupported without matching upstream kernel
      functionality.]
      
      The patch-queue and the glibc patch can also be downloaded from:
      
        http://redhat.com/~mingo/PI-futex-patches/
      
      Many thanks go to the people who helped us create this kernel feature: Steven
      Rostedt, Esben Nielsen, Benedikt Spranger, Daniel Walker, John Cooper, Arjan
      van de Ven, Oleg Nesterov and others.  Credit for related prior projects goes
      to Dirk Grambow, Inaky Perez-Gonzalez, Bill Huey and many others.
      
      Clean up the futex code, before adding more features to it:
      
       - use u32 as the futex field type - that's the ABI
       - use __user and pointers to u32 instead of unsigned long
       - code style / comment style cleanups
       - rename hash-bucket name from 'bh' to 'hb'.
      
      I checked the pre and post futex.o object files to make sure this
      patch has no code effects.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jakub Jelinek <jakub@redhat.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e2970f2f
    • S
      [PATCH] sched: mc/smt power savings sched policy · 5c45bf27
      Committed by Siddha, Suresh B
      sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in
      /sys/devices/system/cpu/ control the MC/SMT power savings policy for the
      scheduler.
      
      Based on the values (1 = enable, 0 = disable) of these controls, the sched
      groups' cpu power will be determined for the different domains.  When the
      power savings policy is enabled, then under light load conditions the
      scheduler will minimize the number of physical packages/cpu cores carrying
      the load and thus conserve power (with a perf impact based on the workload
      characteristics; see the OLS 2005 CMP kernel scheduler paper for more
      details).
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Con Kolivas <kernel@kolivas.org>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      5c45bf27
    • S
      [PATCH] sched_domain: handle kmalloc failure · 51888ca2
      Committed by Srivatsa Vaddagiri
      Try to handle mem allocation failures in build_sched_domains by bailing out
      and cleaning up thus-far allocated memory.  The patch has a direct consequence
      that we disable load balancing completely (even at sibling level) upon *any*
      memory allocation failure.
      
      [Lee.Schermerhorn@hp.com: bugfix]
      Signed-off-by: Srivatsa Vaddagir <vatsa@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      51888ca2
    • P
      [PATCH] sched: implement smpnice · 2dd73a4f
      Committed by Peter Williams
      Problem:
      
      The introduction of separate run queues per CPU has brought with it "nice"
      enforcement problems that are best described by a simple example.
      
      For the sake of argument suppose that on a single CPU machine with a
      nice==19 hard spinner and a nice==0 hard spinner running that the nice==0
      task gets 95% of the CPU and the nice==19 task gets 5% of the CPU.  Now
      suppose that there is a system with 2 CPUs and 2 nice==19 hard spinners and
      2 nice==0 hard spinners running.  The user of this system would be entitled
      to expect that the nice==0 tasks each get 95% of a CPU and the nice==19
      tasks only get 5% each.  However, whether this expectation is met is pretty
      much down to luck as there are four equally likely distributions of the
      tasks to the CPUs that the load balancing code will consider to be balanced
      with loads of 2.0 for each CPU.  Two of these distributions involve one
      nice==0 and one nice==19 task per CPU and in these circumstances the user's
      expectations will be met.  The other two distributions both involve both
      nice==0 tasks being on one CPU and both nice==19 being on the other CPU and
      each task will get 50% of a CPU and the user's expectations will not be
      met.
      
      Solution:
      
      The solution to this problem that is implemented in the attached patch is
      to use weighted loads when determining if the system is balanced and, when
      an imbalance is detected, to move an amount of weighted load between run
      queues (as opposed to a number of tasks) to restore the balance.  Once
      again, the easiest way to explain why both of these measures are necessary
      is to use a simple example.  Suppose (in a slight variation of the above
      example) that we have a two-CPU system with 4 nice==0 and 4 nice==19
      hard spinning tasks running and that the 4 nice==0 tasks are on one CPU and
      the 4 nice==19 tasks are on the other CPU.  The weighted loads for the two
      CPUs would be 4.0 and 0.2 respectively and the load balancing code would
      move 2 tasks resulting in one CPU with a load of 2.0 and the other with
      load of 2.2.  If this was considered to be a big enough imbalance to
      justify moving a task and that task was moved using the current
      move_tasks() then it would move the highest priority task that it found and
      this would result in one CPU with a load of 3.0 and the other with a load
      of 1.2 which would result in the movement of a task in the opposite
      direction and so on -- infinite loop.  If, on the other hand, an amount of
      load to be moved is calculated from the imbalance (in this case 0.1) and
      move_tasks() skips tasks until it finds ones whose contributions to the
      weighted load are less than this amount it would move two of the nice==19
      tasks resulting in a system with 2 nice==0 and 2 nice==19 on each CPU with
      loads of 2.1 for each CPU.
      
      One of the advantages of this mechanism is that on a system where all tasks
      have nice==0 the load balancing calculations would be mathematically
      identical to the current load balancing code.
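
      The arithmetic of the example above can be checked with a short Python
      sketch.  It assumes weights of 1.00 for nice==0 and 0.05 for nice==19 (taken
      from the 95%/5% figures in the example) and stores them in hundredths to
      avoid floating-point noise.  The helper names are hypothetical and the
      functions are not the kernel's move_tasks().

```python
# Load weights in hundredths: nice==0 -> 1.00, nice==19 -> 0.05 (assumed from
# the 95%/5% split used in the example).
WEIGHT = {0: 100, 19: 5}


def load(cpu):
    """Weighted load of a CPU, given the nice values of its runnable tasks."""
    return sum(WEIGHT[nice] for nice in cpu)


def move_one_highest_prio(src, dst):
    """Count-based move: take the highest-priority task, ignoring its weight."""
    task = min(src)  # lowest nice value = highest priority
    src.remove(task)
    dst.append(task)


def move_by_load(src, dst, max_load_move):
    """Load-based move: skip any task heavier than what is left to move."""
    remaining = max_load_move
    for nice in list(src):
        weight = WEIGHT[nice]
        if weight <= remaining:
            src.remove(nice)
            dst.append(nice)
            remaining -= weight
    return max_load_move - remaining


# State after the balancer has already moved two nice==0 tasks: loads 2.0 / 2.2.
cpu0, cpu1 = [0, 0], [19, 19, 19, 19, 0, 0]
start = (load(cpu0), load(cpu1))

# Count-based: moving the highest-priority task overshoots to 3.0 / 1.2.
a0, a1 = list(cpu0), list(cpu1)
move_one_highest_prio(a1, a0)
overshoot = (load(a0), load(a1))

# Load-based: move half the difference (0.1), which selects two nice==19 tasks.
imbalance = (load(cpu1) - load(cpu0)) // 2
moved = move_by_load(cpu1, cpu0, imbalance)
balanced = (load(cpu0), load(cpu1))
```

      With every task at nice==0 each weight is the same, so moving by load and
      moving by count select the same tasks, in line with the remark above.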
      
      Notes:
      
      struct task_struct:
      
      has a new field load_weight which (in a trade off of space for speed)
      stores the contribution that this task makes to a CPU's weighted load when
      it is runnable.
      
      struct runqueue:
      
      has a new field raw_weighted_load which is the sum of the load_weight
      values for the currently runnable tasks on this run queue.  This field
      always needs to be updated when nr_running is updated so two new inline
      functions inc_nr_running() and dec_nr_running() have been created to make
      sure that this happens.  This also offers a convenient way to optimize away
      this part of the smpnice mechanism when CONFIG_SMP is not defined.
      
      int try_to_wake_up():
      
      in this function the value SCHED_LOAD_SCALE is used to represent the load
      contribution of a single task in various calculations in the code that
      decides which CPU to put the waking task on.  While this would be valid
      on a system where the nice values for the runnable tasks were distributed
      evenly around zero it will lead to anomalous load balancing if the
      distribution is skewed in either direction.  To overcome this problem
      SCHED_LOAD_SCALE has been replaced by the load_weight for the relevant task
      or by the average load_weight per task for the queue in question (as
      appropriate).
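
      The run queue bookkeeping and the per-task average mentioned in the notes
      above might look roughly like this in a toy Python model.  The method names
      inc_nr_running() and dec_nr_running() come from the text; ToyRunqueue,
      avg_load_per_task() and the fallback used for an empty queue are assumptions
      made for illustration.

```python
class ToyRunqueue:
    """Keeps nr_running and raw_weighted_load in step with each other."""

    def __init__(self):
        self.nr_running = 0
        self.raw_weighted_load = 0

    def inc_nr_running(self, load_weight):
        # A task became runnable: count it and add its weight.
        self.nr_running += 1
        self.raw_weighted_load += load_weight

    def dec_nr_running(self, load_weight):
        # A task stopped being runnable: undo both updates together.
        self.nr_running -= 1
        self.raw_weighted_load -= load_weight

    def avg_load_per_task(self, default_weight):
        """Average weight per runnable task; default_weight for an empty queue."""
        if self.nr_running == 0:
            return default_weight
        return self.raw_weighted_load // self.nr_running


rq = ToyRunqueue()
for weight in (100, 100, 5, 5):  # two nice==0-like and two nice==19-like tasks
    rq.inc_nr_running(weight)
rq.dec_nr_running(5)             # one light task goes to sleep
```

      Funnelling every change through the two helpers is what guarantees that the
      weighted load never drifts away from the task count.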
      
      int move_tasks():
      
      The modifications to this function were complicated by the fact that
      active_load_balance() uses it to move exactly one task without checking
      whether an imbalance actually exists.  This precluded the simple
      overloading of max_nr_move with max_load_move and necessitated the addition
      of the latter as an extra argument to the function.  The internal
      implementation is then modified to move up to max_nr_move tasks and
      max_load_move of weighted load.  This slightly complicates the code where
      move_tasks() is called and if ever active_load_balance() is changed to not
      use move_tasks() the implementation of move_tasks() should be simplified
      accordingly.
      
      struct sched_group *find_busiest_group():
      
      Similar to try_to_wake_up(), there are places in this function where
      SCHED_LOAD_SCALE is used to represent the load contribution of a single
      task and the same issues are created.  A similar solution is adopted except
      that it is now the average per task contribution to a group's load (as
      opposed to a run queue) that is required.  As this value is not directly
      available from the group it is calculated on the fly as the queues in the
      groups are visited when determining the busiest group.
      
      A key change to this function is that it no longer needs to scale down
      *imbalance on exit as move_tasks() uses the load in its scaled form.
      
      void set_user_nice():
      
      has been modified to update the task's load_weight field when its nice
      value changes, and also to ensure that its run queue's raw_weighted_load
      field is updated if it was runnable.
      
      From: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      
      With smpnice, sched groups with highest priority tasks can mask the imbalance
      between the other sched groups within the same domain.  This patch fixes some
      of the scenarios listed below by not considering the sched groups which are
      lightly loaded.
      
      a) On a simple 4-way MP system, if we have one high priority and 4 normal
         priority tasks, with smpnice we would like to see the high priority task
         scheduled on one cpu, two other cpus getting one normal task each and the
         fourth cpu getting the remaining two normal tasks.  But with current
         smpnice the extra normal priority task keeps jumping from one cpu to
         another cpu having a normal priority task.  This is because of the
         busiest_has_loaded_cpus, nr_loaded_cpus logic.  We are not including the
         cpu with the high priority task in the max_load calculations but are
         including it in the total and avg_load calculations, leading to
         max_load < avg_load.  Load balance between cpus running normal priority
         tasks (2 vs 1) will always show an imbalance, as the extra normal priority
         task will keep moving from one cpu to another cpu having a normal priority
         task.
      
      b) 4-way system with HT (8 logical processors).  Package-P0 T0 has a
         highest priority task, T1 is idle.  Package-P1: both T0 and T1 have 1 normal
         priority task each.  P2 and P3 are idle.  With this patch, one of the
         normal priority tasks on P1 will be moved to P2 or P3.
      
      c) With the current weighted smp nice calculations, it doesn't always make
         sense to look at the highest weighted runqueue in the busy group.
         Consider a load balance scenario on a DP with HT system, with Package-0
         containing one high priority and one low priority task, and Package-1
         containing one low priority task (with the other thread being idle).
         Package-1 thinks that it needs to take the low priority thread from
         Package-0, but find_busiest_queue() returns the cpu thread with the
         highest priority task, and ultimately (with the help of active load
         balance) we move the high priority task to Package-1.  The same then
         continues with Package-0, moving the high priority task from Package-1
         back to Package-0.  Even without the presence of active load balance,
         load balance will fail to balance the above scenario.  Fix
         find_busiest_queue to use "imbalance" when it is lightly loaded.
      
      [kernel@kolivas.org: sched: store weighted load on up]
      [kernel@kolivas.org: sched: add discrete weighted cpu load function]
      [suresh.b.siddha@intel.com: sched: remove dead code]
      Signed-off-by: Peter Williams <pwil3058@bigpond.com.au>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Con Kolivas <kernel@kolivas.org>
      Cc: John Hawkes <hawkes@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      2dd73a4f
    • J
      [PATCH] chardev: GPIO for SCx200 & PC-8736x: use dev_dbg in common module · f31000e5
      Committed by Jim Cromie
      Use of dev_dbg() and friends is considered good practice.  dev_dbg() needs a
      struct device *devp, but nsc_gpio is only a helper module, so it doesn't
      have/need its own.  To provide devp to the user-modules (scx200 & pc8736x
      _gpio), we add it to the vtable, and set it during init.
      
      Also squeeze nsc_gpio_dump()'s format a little.
      
      [  199.259879]  pc8736x_gpio.0: io09: 0x0044 TS OD PUE  EDGE LO DEBOUNCE
      Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      f31000e5
    • J
      [PATCH] chardev: GPIO for SCx200 & PC-8736x: migrate file-ops to common module · 1a66fdf0
      Jim Cromie committed
      Now that the read(), write() file-ops are dispatching gpio-ops via the vtable,
      they are generic, and can be moved 'verbatim' to the nsc_gpio common-support
      module.  After the move, various symbols are renamed to update 'scx200_' to
      'nsc_', and headers are adjusted accordingly.
      Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1a66fdf0
    • J
      [PATCH] chardev: GPIO for SCx200 & PC-8736x: add gpio-ops vtable · fe3a168a
      Jim Cromie committed
      Abstract the gpio operations into a new nsc_gpio_ops vtable.
      Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      fe3a168a
    • J
      [PATCH] chardev: GPIO for SCx200 & PC-8736x: device minor numbers are unsigned ints · 55b8c045
      Jim Cromie committed
      Per kernel headers, device minor numbers are unsigned ints.  Do the same in
      this driver.
      Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      55b8c045
    • J
      [PATCH] chardev: GPIO for SCx200 & PC-8736x: whitespace pre-clean · 62c83cde
      Jim Cromie committed
      GPIO SUPPORT FOR SCx200 & PC8736x
      
      The patch-set reworks the 2.4 vintage scx200_gpio driver for modern 2.6, and
      refactors GPIO support to reuse it in a new driver for the GPIO on PC-8736x
      chips.  It's handy for the Soekris.com net-4801, which has both chips.
      
      These patches have been seen recently on Kernel-Mentors, and then on the
      Kernel-Newbies ML, where Jesper Juhl kindly reviewed them.  His feedback has
      been incorporated.  Thanks, Jesper!
      
      It's also gone to soekris-tech@soekris.com for possible testing by linux folks;
      I've gotten 1 promise so far.  They're mostly BSD folk over there, but we'll
      see.
      
      Device-file & Sysfs
      
      The driver preserves the existing device-file interface, including the
      write/cmd set, but adds 'v' to 'view' the pin-settings & configs by inducing,
      via gpio_dump(), a dev_info() call.  It's a fairly crappy way to get status,
      but it sticks to the syslog approach, conservatively.
      
      Allowing users to voluntarily trigger logging is good: it gives them a
      familiar way to confirm their app's control & use of the pins, and I've thus
      reduced the pin-mode-updates from dev_info to dev_dbg.
      
      I've recently bolted on a proto sysfs interface for both new drivers.  I'm not
      including those patches here; they (the patch + doc-pre-patch) are still quite
      raw (and unreviewed on KNML), and since they 'invent' a convention for GPIO, a
      proper vetting is needed.  Since this patchset is much bigger than my previous
      ones, I'd like to keep things simpler, and address it 1st, before bolting on
      more stuff.
      
      The driver-split
      
      The Geode CPU and the PC-87366 Super-IO chip have GPIO units which share a
      common pin-architecture (same pin features, with same bits controlling), but
      with different addressing mechanics and port organizations.
      
      The vintage driver expresses the pin capabilities with pin-mode commands
      [OoPpTt],etc that change the pin configurations, and since the 2 chips share
      pin-arch, we can reuse the read(), write() commands, once the implementation
      is suitably adjusted.
      
      The patchset adds a vtable: struct nsc_gpio_ops, to abstract the existing gpio
      operations, then adjusts fileops.write() code to invoke operations via that
      vtable.  Driver-specific open()s set private_data to the vtable so it's
      available for use by write().
      
      The vtable gets the gpio_dump() too, since it's user-friendly, and (could be
      construed as) part of the current device-file interface.  To support use of
      dev_dbg() in write() & _dump(), the vtable gets a dev ptr too, set by both
      the scx200_gpio & pc8736x_gpio drivers.
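The dispatch described above can be pictured with a small user-space sketch.  This is not the kernel code: struct gpio_ops_sketch, its members, the two fake "chips" and common_write() are hypothetical names made up for illustration, and the sketch assumes only two commands, '0' and '1', to set a pin low or high.  A plain string stands in for the device pointer that dev_dbg() would need.

```c
#include <stdio.h>

/* Hypothetical stand-in for the vtable idea; member names are assumptions. */
struct gpio_ops_sketch {
	const char *dev_name;                       /* stands in for the dev ptr */
	void (*gpio_set)(unsigned index, int val);  /* drive a pin */
	int (*gpio_get)(unsigned index);            /* read a pin back */
};

/* Fake chip A: one int per pin. */
static int chip_a_pins[8];
static void chip_a_set(unsigned i, int v) { chip_a_pins[i] = v; }
static int chip_a_get(unsigned i) { return chip_a_pins[i]; }

/* Fake chip B: same pin semantics, different storage (one bitmask). */
static unsigned chip_b_bits;
static void chip_b_set(unsigned i, int v)
{
	if (v)
		chip_b_bits |= 1u << i;
	else
		chip_b_bits &= ~(1u << i);
}
static int chip_b_get(unsigned i) { return (chip_b_bits >> i) & 1u; }

static struct gpio_ops_sketch chip_a_ops = { "chip_a.0", chip_a_set, chip_a_get };
static struct gpio_ops_sketch chip_b_ops = { "chip_b.0", chip_b_set, chip_b_get };

/*
 * Shared command interpreter: it knows nothing about either chip and only
 * calls through the ops table it is handed.  Returns the number of commands
 * it understood.
 */
static int common_write(const struct gpio_ops_sketch *ops, unsigned pin,
			const char *cmds)
{
	int handled = 0;

	for (; *cmds; cmds++) {
		switch (*cmds) {
		case '0':
			ops->gpio_set(pin, 0);
			handled++;
			break;
		case '1':
			ops->gpio_set(pin, 1);
			handled++;
			break;
		default:
			fprintf(stderr, "%s: io%02u: unknown command '%c'\n",
				ops->dev_name, pin, *cmds);
			break;
		}
	}
	return handled;
}
```

Calling common_write(&chip_a_ops, 3, "1") and common_write(&chip_b_ops, 3, "1") runs the same interpreter against two different back ends, which is the property that lets the file-ops move into a common module.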
      
      Here's how the pins are presented in syslog:
      
      [ 1890.176223]  scx200_gpio.0: io00: 0x0044 TS OD PUE  EDGE LO DEBOUNCE
      [ 1890.287223]  scx200_gpio.0: io01: 0x0003 OE PP PUD  EDGE LO
      
      nsc_gpio.c: new file is new home of several file-ops methods, which are
      modified to get their vtable from filp->private_data, and use it where needed.
      
      scx200_gpio.c: keeps some of its existing gpio routines, but now wires them up
      via the vtable (they're invoked by nsc_gpio.c:nsc_gpio_write() thru this
      vtable).  A driver-specific open() initializes filp->private_data with the
      vtable.
      
      Once the split is clean, and the scx200_gpio driver is working, we copy and
      modify the function and variable names, and rework the access-method bodies
      for the different addressing scheme.
      
      Here's a working overview of the patchset:
      
      # series file for GPIO
      
      # Spring Cleaning
      gpio-scx/patch.preclean        # scripts/Lindent fixes, editor-ctrl comments
      
      # API Modernization
      
      gpio-scx/patch.api26        # what I learned from LDD3
      gpio-scx/patch.platform-dev-2    # get pdev, support for dev_dbg()
      gpio-scx/patch.unsigned-minor    # fix to match std practice
      
      # Debuggability
      
      gpio-scx/patch.dump-diet    # shrink gpio_dump()
      gpio-scx/patch.viewpins        # add new 'command' to call dump()
      gpio-scx/patch.init-refactor    # pull shadow-register init to sub
      
      # Access-Abstraction (add vtable)
      
      gpio-scx/patch.access-vtable    # introduce nsc_gpio_ops vtable, w dump
      gpio-scx/patch.vtable-calls    # add & use the vtable in scx200_gpio
      gpio-scx/patch.nscgpio-shell    # add empty driver for common-fops
      
      # move code under abstraction
      gpio-scx/patch.migrate-fops    # move file-ops methods from scx200_gpio
      gpio-scx/patch.common-dump    # mv scx200.c:scx200_gpio_dump() to nsc_gpio.c
      gpio-scx/patch.add-pc8736x-gpio    # add new driver, like old, w chip adapt
      # gpio-scx/patch.add-DEBUG    # enable all dev_dbg()s
      
      # Cleanups
      
      # finish printk -> dev_dbg() etc
      gpio-scx/patch.pdev-pc8736x    # new drvr needs pdev too,
      gpio-scx/patch.devdbg-nscgpio    # add device to 'vtable', use in dev_dbg()
      
      # gpio-scx/patch.pin-config-view    # another 'c' 'command'
      # gpio-scx/quiet-getset        # take out excess dbg stuff (pretty quiet now)
      gpio-scx/patch.shadow-current    # imitate scx200_gpio's shadow regs in pc87*
      
      # post KMentors-post patches ..
      
      gpio-scx/patch.mutexes        # use mutexes for config-locks
      gpio-scx/patch.viewpins-values    # extend dump to obsolete separate 'c' cmd
      
      gpio-scx/patch.kconfig        # add stuff for kbuild
      
      # TBC
      # combine api26 with pdev, which is just one step.
      # merge c&v commands to single do-all-fn
      # delay viewpins, dump-diet should also un-ifdef it too.
      
      diff.sys-gpio-rollup-1
      
      This patch:
      
      Removed editor format-control comments, and used scripts/Lindent to clean up
      whitespace, then deleted the bogus chunks :-(
      Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      62c83cde
    • C
      [PATCH] cpu hotplug: add hotplug versions of cpu_notifier · 39f4885c
      Chandra Seetharaman committed
      Define new macros register_hotcpu_notifier() and unregister_hotcpu_notifier()
      that redefine register_cpu_notifier() and unregister_cpu_notifier() for use
      only when HOTPLUG_CPU is defined.
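A rough user-space sketch of the idea follows.  The stub struct notifier_block, the counting bodies of register_cpu_notifier()/unregister_cpu_notifier(), and the exact no-op form chosen for the non-hotplug case are assumptions made so the example compiles on its own; the patch's actual definitions may differ.

```c
#include <stddef.h>

/* Stub type so the sketch builds outside the kernel (assumption). */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long action, void *data);
};

/* Stand-ins for the real functions: they only count registrations. */
int registered_count;

int register_cpu_notifier(struct notifier_block *nb)
{
	(void)nb;
	registered_count++;
	return 0;
}

void unregister_cpu_notifier(struct notifier_block *nb)
{
	(void)nb;
	registered_count--;
}

#ifdef CONFIG_HOTPLUG_CPU
/* Hotplug kernels: the hotcpu variants are the plain functions. */
#define register_hotcpu_notifier(nb)	register_cpu_notifier(nb)
#define unregister_hotcpu_notifier(nb)	unregister_cpu_notifier(nb)
#else
/* No hotplug: callers compile unchanged, but nothing is registered. */
#define register_hotcpu_notifier(nb)	((void)(nb), 0)
#define unregister_hotcpu_notifier(nb)	((void)(nb))
#endif
```

Built without -DCONFIG_HOTPLUG_CPU, a call to register_hotcpu_notifier(&nb) leaves registered_count untouched, so code that only cares about CPUs arriving after boot costs nothing on non-hotplug builds.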
      Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      39f4885c
    • C
      [PATCH] cpu hotplug: make [un]register_cpu_notifier init time only · 65edc68c
      Chandra Seetharaman committed
      CPUs come online only at init time (unless CONFIG_HOTPLUG_CPU is defined).
      So, cpu_notifier functionality needs to be available only at init time.
      
      This patch makes register_cpu_notifier() available only at init time, unless
      CONFIG_HOTPLUG_CPU is defined.
      
      This patch exports register_cpu_notifier() and unregister_cpu_notifier() only
      if CONFIG_HOTPLUG_CPU is defined.
      Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      65edc68c
    • P
      [PATCH] rcutorture: add call_rcu_bh() operations · c32e0660
      Paul E. McKenney committed
      Add operations for the call_rcu_bh() variant of RCU.  Also add an
      rcu_batches_completed_bh() function, which is needed by rcutorture.
      Signed-off-by: Paul E. McKenney <paulmck@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c32e0660
    • D
      [PATCH] Remove gratuitous inclusion of <linux/config.h> from <linux/dmaengine.h> · 1c0f16e5
      David Woodhouse committed
      We include config.h on the compiler command line. There's no need for it
      to be included again.
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      1c0f16e5
    • A
      [PATCH] fs/buffer.c: cleanups · b6cd0b77
      Adrian Bunk committed
      - add a proper prototype for the following global function:
        - buffer_init()
      
      - make the following needlessly global function static:
        - end_buffer_async_write()
      Signed-off-by: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b6cd0b77
    • R
      [PATCH] poison: add & use more constants · a7807a32
      Randy Dunlap committed
      Add more poison values to include/linux/poison.h.  It's not clear to me
      whether some others should be added or not, so I haven't added any of
      these:
      
      ./include/linux/libata.h:#define ATA_TAG_POISON		0xfafbfcfdU
      ./arch/ppc/8260_io/fcc_enet.c:1918:	memset((char *)(&(immap->im_dprambase[(mem_addr+64)])), 0x88, 32);
      ./drivers/usb/mon/mon_text.c:429:	memset(mem, 0xe5, sizeof(struct mon_event_text));
      ./drivers/char/ftape/lowlevel/ftape-ctl.c:738:		memset(ft_buffer[i]->address, 0xAA, FT_BUFF_SIZE);
      ./drivers/block/sx8.c:/* 0xf is just arbitrary, non-zero noise; this is sorta like poisoning */
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      a7807a32
    • R
      [PATCH] update two drivers for poison.h · b3c681e0
      Randy Dunlap committed
      Update two drivers to use poison.h.
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      b3c681e0
    • R
      [PATCH] add poison.h and patch primary users · c9cf5528
      Randy Dunlap committed
      Localize poison values into one header file for better documentation and
      easier/quicker debugging and so that the same values won't be used for
      multiple purposes.
      
      Use these constants in core arch., mm, driver, and fs code.
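To illustrate why named poison values in one place help debugging, here is a small user-space sketch.  The three byte values are assumed to mirror the slab poison bytes and should be treated as illustrative; poison_free() and poison_check() are hypothetical helpers, not functions from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative values, assumed to match the slab poison bytes. */
#define POISON_INUSE	0x5a	/* pattern for allocated, uninitialised memory */
#define POISON_FREE	0x6b	/* pattern written over freed memory */
#define POISON_END	0xa5	/* marker for the last byte of a poisoned area */

/* Fill a released object with the free pattern plus an end marker. */
static void poison_free(unsigned char *buf, size_t len)
{
	if (len == 0)
		return;
	memset(buf, POISON_FREE, len - 1);
	buf[len - 1] = POISON_END;
}

/*
 * Verify that a released object is still intact.  Returns the index of the
 * first byte that no longer carries the expected pattern (evidence of a
 * write after free), or -1 when nothing was touched.
 */
static long poison_check(const unsigned char *buf, size_t len)
{
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char want = (i == len - 1) ? POISON_END : POISON_FREE;

		if (buf[i] != want)
			return (long)i;
	}
	return -1;
}
```

Because each pattern has exactly one meaning, a value such as 0x6b6b6b6b showing up in a register dump points directly at a use of freed memory instead of requiring a search through several subsystems that picked the same byte.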
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Acked-by: Matt Mackall <mpm@selenic.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      c9cf5528
    • I
      [PATCH] vdso: randomize the i386 vDSO by moving it into a vma · e6e5494c
      Ingo Molnar committed
      Move the i386 VDSO down into a vma and thus randomize it.
      
      Besides the security implications, this feature also helps debuggers, which
      can COW a vma-backed VDSO just like a normal DSO and can thus do
      single-stepping and other debugging features.
      
      It's good for hypervisors (Xen, VMWare) too, which typically live in the same
      high-mapped address space as the VDSO, hence whenever the VDSO is used, they
      get lots of guest pagefaults and have to fix such guest accesses up - which
      slows things down instead of speeding things up (the primary purpose of the
      VDSO).
      
      There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
      for older glibcs that still rely on a prelinked high-mapped VDSO.  Newer
      distributions (using glibc 2.3.3 or later) can turn this option off.  Turning
      it off is also recommended for security reasons: attackers cannot use the
      predictable high-mapped VDSO page as syscall trampoline anymore.
      
      There is a new vdso=[0|1] boot option as well, and a runtime
      /proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
      on/off.
      
      (This version of the VDSO-randomization patch also has working ELF
      coredumping, the previous patch crashed in the coredumping code.)
      
      This code is a combined work of the exec-shield VDSO randomization
      code and Gerd Hoffmann's hypervisor-centric VDSO patch. Rusty Russell
      started this patch and I completed it.
      
      [akpm@osdl.org: cleanups]
      [akpm@osdl.org: compile fix]
      [akpm@osdl.org: compile fix 2]
      [akpm@osdl.org: compile fix 3]
      [akpm@osdl.org: revert MAXMEM change]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@infradead.org>
      Cc: Gerd Hoffmann <kraxel@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      e6e5494c
    • K
      [PATCH] node hotplug: register cpu: remove node struct · 76b67ed9
      KAMEZAWA Hiroyuki committed
      With Goto-san's patch, we can add a new pgdat/node at runtime.  I'm now
      considering node-hot-add with cpu + memory on ACPI.
      
      I found that an ACPI container, which describes a node, could evaluate cpu
      before memory.  This means cpu-hot-add occurs before memory hot add.
      
      For the most part, cpu-hot-add doesn't depend on node hot add.  But
      register_cpu(), which creates a symbolic link from node to cpu, requires that
      the node be onlined before register_cpu() is called.  When a node is onlined,
      its pgdat should be there.
      
      This patch-set holds off creating the symbolic link from node to cpu
      until the node is onlined.
      
      This removes node arguments from register_cpu().
      
      Currently, register_cpu() requires 'struct node' as its argument.  But the
      array of struct node is now unified in drivers/base/node.c (by Goto's node
      hotplug patch), so we can get the struct node in a generic way and the
      argument is no longer necessary.
      
      This patch also guarantees that a cpu is added under a node only when the node
      is onlined.  This is necessary for the node-hot-add vs. cpu-hot-add patch
      following this.
      
      Moreover, register_cpu() calculates cpu->node_id by cpu_to_node() without
      regard to its 'struct node *root' argument.  This patch removes it.
      
      Also modify callers of register_cpu()/unregister_cpu(), whose args are changed
      by the register-cpu-remove-node-struct patch.
      
      [Brice.Goglin@ens-lyon.org: fix it]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Ashok Raj <ashok.raj@intel.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Brice Goglin <Brice.Goglin@ens-lyon.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      76b67ed9