1. 01 10月, 2006 1 次提交
  2. 30 9月, 2006 7 次提交
  3. 26 9月, 2006 2 次提交
    • A
      [PATCH] Add the canary field to the PDA area and the task struct · 0a425405
      Arjan van de Ven 提交于
      This patch adds the per thread cookie field to the task struct and the PDA.
      Also it makes sure that the PDA value gets the new cookie value at context
      switch, and that a new task gets a new cookie at task creation time.
      Signed-off-by: NArjan van Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      CC: Andi Kleen <ak@suse.de>
      0a425405
    • A
      [PATCH] non lazy "sleazy" fpu implementation · e07e23e1
      Arjan van de Ven 提交于
      Right now the kernel on x86-64 has a 100% lazy fpu behavior: after *every*
      context switch a trap is taken for the first FPU use to restore the FPU
      context lazily.  This is of course great for applications that have very
      sporadic or no FPU use (since then you avoid doing the expensive
      save/restore all the time).  However for very frequent FPU users...  you
      take an extra trap every context switch.
      
      The patch below adds a simple heuristic to this code: After 5 consecutive
      context switches of FPU use, the lazy behavior is disabled and the context
      gets restored every context switch.  If the app indeed uses the FPU, the
      trap is avoided.  (the chance of the 6th time slice using FPU after the
      previous 5 having done so are quite high obviously).
      
      After 256 switches, this is reset and lazy behavior is returned (until
      there are 5 consecutive ones again).  The reason for this is to give apps
      that do longer bursts of FPU use still the lazy behavior back after some
      time.
      
      [akpm@osdl.org: place new task_struct field next to jit_keyring to save space]
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@muc.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      e07e23e1
  4. 02 9月, 2006 1 次提交
  5. 06 8月, 2006 1 次提交
  6. 15 7月, 2006 5 次提交
  7. 04 7月, 2006 5 次提交
    • I
      [PATCH] sched: cleanup, convert sched.c-internal typedefs to struct · 70b97a7f
      Ingo Molnar 提交于
      convert:
      
       - runqueue_t to 'struct rq'
       - prio_array_t to 'struct prio_array'
       - migration_req_t to 'struct migration_req'
      
      I was the one who added these but they are both against the kernel coding
      style and also were used inconsistently at places.  So just get rid of them at
      once, now that we are flushing the scheduler patch-queue anyway.
      
      Conversion was mostly scripted, the result was reviewed and all secondary
      whitespace and style impact (if any) was fixed up by hand.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      70b97a7f
    • I
      [PATCH] sched: cleanup, remove task_t, convert to struct task_struct · 36c8b586
      Ingo Molnar 提交于
      cleanup: remove task_t and convert all the uses to struct task_struct. I
      introduced it for the scheduler anno and it was a mistake.
      
      Conversion was mostly scripted, the result was reviewed and all
      secondary whitespace and style impact (if any) was fixed up by hand.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      36c8b586
    • I
      [PATCH] lockdep: core · fbb9ce95
      Ingo Molnar 提交于
      Do 'make oldconfig' and accept all the defaults for new config options -
      reboot into the kernel and if everything goes well it should boot up fine and
      you should have /proc/lockdep and /proc/lockdep_stats files.
      
      Typically if the lock validator finds some problem it will print out
      voluminous debug output that begins with "BUG: ..." and which syslog output
      can be used by kernel developers to figure out the precise locking scenario.
      
      What does the lock validator do?  It "observes" and maps all locking rules as
      they occur dynamically (as triggered by the kernel's natural use of spinlocks,
      rwlocks, mutexes and rwsems).  Whenever the lock validator subsystem detects a
      new locking scenario, it validates this new rule against the existing set of
      rules.  If this new rule is consistent with the existing set of rules then the
      new rule is added transparently and the kernel continues as normal.  If the
      new rule could create a deadlock scenario then this condition is printed out.
      
      When determining validity of locking, all possible "deadlock scenarios" are
      considered: assuming arbitrary number of CPUs, arbitrary irq context and task
      context constellations, running arbitrary combinations of all the existing
      locking scenarios.  In a typical system this means millions of separate
      scenarios.  This is why we call it a "locking correctness" validator - for all
      rules that are observed the lock validator proves it with mathematical
      certainty that a deadlock could not occur (assuming that the lock validator
      implementation itself is correct and its internal data structures are not
      corrupted by some other kernel subsystem).  [see more details and conditionals
      of this statement in include/linux/lockdep.h and
      Documentation/lockdep-design.txt]
      
      Furthermore, this "all possible scenarios" property of the validator also
      enables the finding of complex, highly unlikely multi-CPU multi-context races
      via single single-context rules, increasing the likelyhood of finding bugs
      drastically.  In practical terms: the lock validator already found a bug in
      the upstream kernel that could only occur on systems with 3 or more CPUs, and
      which needed 3 very unlikely code sequences to occur at once on the 3 CPUs.
      That bug was found and reported on a single-CPU system (!).  So in essence a
      race will be found "piecemail-wise", triggering all the necessary components
      for the race, without having to reproduce the race scenario itself!  In its
      short existence the lock validator found and reported many bugs before they
      actually caused a real deadlock.
      
      To further increase the efficiency of the validator, the mapping is not per
      "lock instance", but per "lock-class".  For example, all struct inode objects
      in the kernel have inode->inotify_mutex.  If there are 10,000 inodes cached,
      then there are 10,000 lock objects.  But ->inotify_mutex is a single "lock
      type", and all locking activities that occur against ->inotify_mutex are
      "unified" into this single lock-class.  The advantage of the lock-class
      approach is that all historical ->inotify_mutex uses are mapped into a single
      (and as narrow as possible) set of locking rules - regardless of how many
      different tasks or inode structures it took to build this set of rules.  The
      set of rules persist during the lifetime of the kernel.
      
      To see the rough magnitude of checking that the lock validator does, here's a
      portion of /proc/lockdep_stats, fresh after bootup:
      
       lock-classes:                            694 [max: 2048]
       direct dependencies:                  1598 [max: 8192]
       indirect dependencies:               17896
       all direct dependencies:             16206
       dependency chains:                    1910 [max: 8192]
       in-hardirq chains:                      17
       in-softirq chains:                     105
       in-process chains:                    1065
       stack-trace entries:                 38761 [max: 131072]
       combined max dependencies:         2033928
       hardirq-safe locks:                     24
       hardirq-unsafe locks:                  176
       softirq-safe locks:                     53
       softirq-unsafe locks:                  137
       irq-safe locks:                         59
       irq-unsafe locks:                      176
      
      The lock validator has observed 1598 actual single-thread locking patterns,
      and has validated all possible 2033928 distinct locking scenarios.
      
      More details about the design of the lock validator can be found in
      Documentation/lockdep-design.txt, which can also found at:
      
         http://redhat.com/~mingo/lockdep-patches/lockdep-design.txt
      
      [bunk@stusta.de: cleanups]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fbb9ce95
    • I
      [PATCH] lockdep: irqtrace subsystem, core · de30a2b3
      Ingo Molnar 提交于
      Accurate hard-IRQ-flags and softirq-flags state tracing.
      
      This allows us to attach extra functionality to IRQ flags on/off
      events (such as trace-on/off).
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      de30a2b3
    • I
      [PATCH] lockdep: better lock debugging · 9a11b49a
      Ingo Molnar 提交于
      Generic lock debugging:
      
       - generalized lock debugging framework. For example, a bug in one lock
         subsystem turns off debugging in all lock subsystems.
      
       - got rid of the caller address passing (__IP__/__IP_DECL__/etc.) from
         the mutex/rtmutex debugging code: it caused way too much prototype
         hackery, and lockdep will give the same information anyway.
      
       - ability to do silent tests
      
       - check lock freeing in vfree too.
      
       - more finegrained debugging options, to allow distributions to
         turn off more expensive debugging features.
      
      There's no separate 'held mutexes' list anymore - but there's a 'held locks'
      stack within lockdep, which unifies deadlock detection across all lock
      classes.  (this is independent of the lockdep validation stuff - lockdep first
      checks whether we are holding a lock already)
      
      Here are the current debugging options:
      
      CONFIG_DEBUG_MUTEXES=y
      CONFIG_DEBUG_LOCK_ALLOC=y
      
      which do:
      
       config DEBUG_MUTEXES
                bool "Mutex debugging, basic checks"
      
       config DEBUG_LOCK_ALLOC
               bool "Detect incorrect freeing of live mutexes"
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9a11b49a
  8. 01 7月, 2006 1 次提交
  9. 28 6月, 2006 8 次提交
    • T
      [PATCH] rtmutex: Propagate priority settings into PI lock chains · 95e02ca9
      Thomas Gleixner 提交于
      When the priority of a task, which is blocked on a lock, changes we must
      propagate this change into the PI lock chain.  Therefor the chain walk code
      is changed to get rid of the references to current to avoid false positives
      in the deadlock detector, as setscheduler might be called by a task which
      holds the lock on which the task whose priority is changed is blocked.
      
      Also add some comments about the get/put_task_struct usage to avoid
      confusion.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      95e02ca9
    • I
      [PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support · c87e2837
      Ingo Molnar 提交于
      This adds the actual pi-futex implementation, based on rt-mutexes.
      
      [dino@in.ibm.com: fix an oops-causing race]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NDinakar Guniguntala <dino@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c87e2837
    • T
      [PATCH] pi-futex: rt mutex tester · 61a87122
      Thomas Gleixner 提交于
      RT-mutex tester: scriptable tester for rt mutexes, which allows userspace
      scripting of mutex unit-tests (and dynamic tests as well), using the actual
      rt-mutex implementation of the kernel.
      
      [akpm@osdl.org: fixlet]
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      61a87122
    • I
      [PATCH] pi-futex: rt mutex core · 23f78d4a
      Ingo Molnar 提交于
      Core functions for the rt-mutex subsystem.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      23f78d4a
    • I
      [PATCH] pi-futex: scheduler support for pi · b29739f9
      Ingo Molnar 提交于
      Add framework to boost/unboost the priority of RT tasks.
      
      This consists of:
      
       - caching the 'normal' priority in ->normal_prio
       - providing a functions to set/get the priority of the task
       - make sched_setscheduler() aware of boosting
      
      The effective_prio() cleanups also fix a priority-calculation bug pointed out
      by Andrey Gelman, in set_user_nice().
      
      has_rt_policy() fix: Peter Williams <pwil3058@bigpond.net.au>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Andrey Gelman <agelman@012.net.il>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b29739f9
    • S
      [PATCH] sched: mc/smt power savings sched policy · 5c45bf27
      Siddha, Suresh B 提交于
      sysfs entries 'sched_mc_power_savings' and 'sched_smt_power_savings' in
      /sys/devices/system/cpu/ control the MC/SMT power savings policy for the
      scheduler.
      
      Based on the values (1-enable, 0-disable) for these controls, sched groups
      cpu power will be determined for different domains.  When power savings
      policy is enabled and under light load conditions, scheduler will minimize
      the physical packages/cpu cores carrying the load and thus conserving
      power(with a perf impact based on the workload characteristics...  see OLS
      2005 CMP kernel scheduler paper for more details..)
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Con Kolivas <kernel@kolivas.org>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5c45bf27
    • S
      [PATCH] sched_domain: handle kmalloc failure · 51888ca2
      Srivatsa Vaddagiri 提交于
      Try to handle mem allocation failures in build_sched_domains by bailing out
      and cleaning up thus-far allocated memory.  The patch has a direct consequence
      that we disable load balancing completely (even at sibling level) upon *any*
      memory allocation failure.
      
      [Lee.Schermerhorn@hp.com: bugfix]
      Signed-off-by: NSrivatsa Vaddagir <vatsa@in.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      51888ca2
    • P
      [PATCH] sched: implement smpnice · 2dd73a4f
      Peter Williams 提交于
      Problem:
      
      The introduction of separate run queues per CPU has brought with it "nice"
      enforcement problems that are best described by a simple example.
      
      For the sake of argument suppose that on a single CPU machine with a
      nice==19 hard spinner and a nice==0 hard spinner running that the nice==0
      task gets 95% of the CPU and the nice==19 task gets 5% of the CPU.  Now
      suppose that there is a system with 2 CPUs and 2 nice==19 hard spinners and
      2 nice==0 hard spinners running.  The user of this system would be entitled
      to expect that the nice==0 tasks each get 95% of a CPU and the nice==19
      tasks only get 5% each.  However, whether this expectation is met is pretty
      much down to luck as there are four equally likely distributions of the
      tasks to the CPUs that the load balancing code will consider to be balanced
      with loads of 2.0 for each CPU.  Two of these distributions involve one
      nice==0 and one nice==19 task per CPU and in these circumstances the users
      expectations will be met.  The other two distributions both involve both
      nice==0 tasks being on one CPU and both nice==19 being on the other CPU and
      each task will get 50% of a CPU and the user's expectations will not be
      met.
      
      Solution:
      
      The solution to this problem that is implemented in the attached patch is
      to use weighted loads when determining if the system is balanced and, when
      an imbalance is detected, to move an amount of weighted load between run
      queues (as opposed to a number of tasks) to restore the balance.  Once
      again, the easiest way to explain why both of these measures are necessary
      is to use a simple example.  Suppose that (in a slight variation of the
      above example) that we have a two CPU system with 4 nice==0 and 4 nice=19
      hard spinning tasks running and that the 4 nice==0 tasks are on one CPU and
      the 4 nice==19 tasks are on the other CPU.  The weighted loads for the two
      CPUs would be 4.0 and 0.2 respectively and the load balancing code would
      move 2 tasks resulting in one CPU with a load of 2.0 and the other with
      load of 2.2.  If this was considered to be a big enough imbalance to
      justify moving a task and that task was moved using the current
      move_tasks() then it would move the highest priority task that it found and
      this would result in one CPU with a load of 3.0 and the other with a load
      of 1.2 which would result in the movement of a task in the opposite
      direction and so on -- infinite loop.  If, on the other hand, an amount of
      load to be moved is calculated from the imbalance (in this case 0.1) and
      move_tasks() skips tasks until it find ones whose contributions to the
      weighted load are less than this amount it would move two of the nice==19
      tasks resulting in a system with 2 nice==0 and 2 nice=19 on each CPU with
      loads of 2.1 for each CPU.
      
      One of the advantages of this mechanism is that on a system where all tasks
      have nice==0 the load balancing calculations would be mathematically
      identical to the current load balancing code.
      
      Notes:
      
      struct task_struct:
      
      has a new field load_weight which (in a trade off of space for speed)
      stores the contribution that this task makes to a CPU's weighted load when
      it is runnable.
      
      struct runqueue:
      
      has a new field raw_weighted_load which is the sum of the load_weight
      values for the currently runnable tasks on this run queue.  This field
      always needs to be updated when nr_running is updated so two new inline
      functions inc_nr_running() and dec_nr_running() have been created to make
      sure that this happens.  This also offers a convenient way to optimize away
      this part of the smpnice mechanism when CONFIG_SMP is not defined.
      
      int try_to_wake_up():
      
      in this function the value SCHED_LOAD_BALANCE is used to represent the load
      contribution of a single task in various calculations in the code that
      decides which CPU to put the waking task on.  While this would be a valid
      on a system where the nice values for the runnable tasks were distributed
      evenly around zero it will lead to anomalous load balancing if the
      distribution is skewed in either direction.  To overcome this problem
      SCHED_LOAD_SCALE has been replaced by the load_weight for the relevant task
      or by the average load_weight per task for the queue in question (as
      appropriate).
      
      int move_tasks():
      
      The modifications to this function were complicated by the fact that
      active_load_balance() uses it to move exactly one task without checking
      whether an imbalance actually exists.  This precluded the simple
      overloading of max_nr_move with max_load_move and necessitated the addition
      of the latter as an extra argument to the function.  The internal
      implementation is then modified to move up to max_nr_move tasks and
      max_load_move of weighted load.  This slightly complicates the code where
      move_tasks() is called and if ever active_load_balance() is changed to not
      use move_tasks() the implementation of move_tasks() should be simplified
      accordingly.
      
      struct sched_group *find_busiest_group():
      
      Similar to try_to_wake_up(), there are places in this function where
      SCHED_LOAD_SCALE is used to represent the load contribution of a single
      task and the same issues are created.  A similar solution is adopted except
      that it is now the average per task contribution to a group's load (as
      opposed to a run queue) that is required.  As this value is not directly
      available from the group it is calculated on the fly as the queues in the
      groups are visited when determining the busiest group.
      
      A key change to this function is that it is no longer to scale down
      *imbalance on exit as move_tasks() uses the load in its scaled form.
      
      void set_user_nice():
      
      has been modified to update the task's load_weight field when it's nice
      value and also to ensure that its run queue's raw_weighted_load field is
      updated if it was runnable.
      
      From: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      
      With smpnice, sched groups with highest priority tasks can mask the imbalance
      between the other sched groups with in the same domain.  This patch fixes some
      of the listed down scenarios by not considering the sched groups which are
      lightly loaded.
      
      a) on a simple 4-way MP system, if we have one high priority and 4 normal
         priority tasks, with smpnice we would like to see the high priority task
         scheduled on one cpu, two other cpus getting one normal task each and the
         fourth cpu getting the remaining two normal tasks.  but with current
         smpnice extra normal priority task keeps jumping from one cpu to another
         cpu having the normal priority task.  This is because of the
         busiest_has_loaded_cpus, nr_loaded_cpus logic..  We are not including the
         cpu with high priority task in max_load calculations but including that in
         total and avg_load calcuations..  leading to max_load < avg_load and load
         balance between cpus running normal priority tasks(2 Vs 1) will always show
         imbalanace as one normal priority and the extra normal priority task will
         keep moving from one cpu to another cpu having normal priority task..
      
      b) 4-way system with HT (8 logical processors).  Package-P0 T0 has a
         highest priority task, T1 is idle.  Package-P1 Both T0 and T1 have 1 normal
         priority task each..  P2 and P3 are idle.  With this patch, one of the
         normal priority tasks on P1 will be moved to P2 or P3..
      
      c) With the current weighted smp nice calculations, it doesn't always make
         sense to look at the highest weighted runqueue in the busy group..
         Consider a load balance scenario on a DP with HT system, with Package-0
         containing one high priority and one low priority, Package-1 containing one
         low priority(with other thread being idle)..  Package-1 thinks that it need
         to take the low priority thread from Package-0.  And find_busiest_queue()
         returns the cpu thread with highest priority task..  And ultimately(with
         help of active load balance) we move high priority task to Package-1.  And
         same continues with Package-0 now, moving high priority task from package-1
         to package-0..  Even without the presence of active load balance, load
         balance will fail to balance the above scenario..  Fix find_busiest_queue
         to use "imbalance" when it is lightly loaded.
      
      [kernel@kolivas.org: sched: store weighted load on up]
      [kernel@kolivas.org: sched: add discrete weighted cpu load function]
      [suresh.b.siddha@intel.com: sched: remove dead code]
      Signed-off-by: NPeter Williams <pwil3058@bigpond.com.au>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: "Chen, Kenneth W" <kenneth.w.chen@intel.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NCon Kolivas <kernel@kolivas.org>
      Cc: John Hawkes <hawkes@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2dd73a4f
  10. 27 6月, 2006 1 次提交
  11. 26 6月, 2006 3 次提交
    • K
      [PATCH] pacct: none-delayed process accounting accumulation · 77787bfb
      KaiGai Kohei 提交于
      In current 2.6.17 implementation, signal_struct refered from task_struct is
      used for per-process data structure.  The pacct facility also uses it as a
      per-process data structure to store stime, utime, minflt, majflt.  But those
      members are saved in __exit_signal().  It's too late.
      
      For example, if some threads exits at same time, pacct facility has a
      possibility to drop accountings for a part of those threads.  (see, the
      following 'The results of original 2.6.17 kernel') I think accounting
      information should be completely collected into the per-process data structure
      before writing out an accounting record.
      
      This patch fixes this matter.  Accumulation of stime, utime, minflt and majflt
      are done before generating accounting record.
      
      [mingo@elte.hu: fix acct_collect() siglock bug found by lockdep]
      Signed-off-by: NKaiGai Kohei <kaigai@ak.jp.nec.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      77787bfb
    • K
      [PATCH] pacct: avoidance to refer the last thread as a representation of the process · f6ec29a4
      KaiGai Kohei 提交于
      When pacct facility generate an 'ac_flag' field in accounting record, it
      refers a task_struct of the thread which died last in the process.  But any
      other task_structs are ignored.
      
      Therefore, pacct facility drops ASU flag even if root-privilege operations are
      used by any other threads except the last one.  In addition, AFORK flag is
      always set when the thread of group-leader didn't die last, although this
      process has called execve() after fork().
      
      We have a same matter in ac_exitcode.  The recorded ac_exitcode is an exit
      code of the last thread in the process.  There is a possibility this exitcode
      is not the group leader's one.
      f6ec29a4
    • K
      [PATCH] pacct: add pacct_struct to fix some pacct bugs. · 0e464814
      KaiGai Kohei 提交于
      The pacct facility need an i/o operation when an accounting record is
      generated.  There is a possibility to wake OOM killer up.  If OOM killer is
      activated, it kills some processes to make them release process memory
      regions.
      
      But acct_process() is called in the killed processes context before calling
      exit_mm(), so those processes cannot release own memory.  In the results, any
      processes stop in this point and it finally cause a system stall.
      0e464814
  12. 23 6月, 2006 2 次提交
    • J
      [PATCH] Kill PF_SYNCWRITE flag · b31dc66a
      Jens Axboe 提交于
      A process flag to indicate whether we are doing sync io is incredibly
      ugly. It also causes performance problems when one does a lot of async
      io and then proceeds to sync it. Part of the io will go out as async,
      and the other part as sync. This causes a disconnect between the
      previously submitted io and the synced io. For io schedulers such as CFQ,
      this will cause us lost merges and suboptimal behaviour in scheduling.
      
      Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
      the O_DIRECT path just directly indicate that the writes are sync
      by using WRITE_SYNC instead.
      Signed-off-by: NJens Axboe <axboe@suse.de>
      b31dc66a
    • E
      [PATCH] ptrace: document the locking rules · 260ea101
      Eric W. Biederman 提交于
      After a lot of reading the code and thinking about how it behaves I have
      managed to figure out what the current ptrace locking rules are.  The
      current code is in much better that it appears at first glance.  The
      troublesome code paths are actually the code paths that violate the current
      rules.
      
      ptrace uses simple exclusive access as it's locking.  You can only touch
      task->ptrace if the task is stopped and you are the ptracer, or if the task
      is running and are the task itself.
      
      Very simple, very easy to maintain.  It just needs to be documented so
      people know not to touch ptrace from elsewhere.
      
      Currently we do have a few pieces of code that are in violation of this
      rule.  Particularly the core dump code, and ptrace_attach.  But so far the
      code looks fixable.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      260ea101
  13. 20 6月, 2006 1 次提交
  14. 27 4月, 2006 1 次提交
  15. 25 4月, 2006 1 次提交