1. 11 9月, 2005 6 次提交
    • I
      [PATCH] sched: TASK_NONINTERACTIVE · d79fc0fc
      Ingo Molnar 提交于
      This patch implements a task state bit (TASK_NONINTERACTIVE), which can be
      used by blocking points to mark the task's wait as "non-interactive".  This
      does not mean the task will be considered a CPU-hog - the wait will simply
      not have an effect on the waiting task's priority - positive or negative
      alike.  Right now only pipe_wait() will make use of it, because it's a
      common source of not-so-interactive waits (kernel compilation jobs, etc.).
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d79fc0fc
    • I
      [PATCH] sched cleanups · 95cdf3b7
      Ingo Molnar 提交于
      whitespace cleanups.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      95cdf3b7
    • M
      [PATCH] sched: make idlest_group/cpu cpus_allowed-aware · da5a5522
      M.Baris Demiray 提交于
      Add relevant checks into find_idlest_group() and find_idlest_cpu() to make
      them return only the groups that have allowed CPUs and allowed CPUs
      respectively.
      Signed-off-by: NM.Baris Demiray <baris@labristeknoloji.com>
      Signed-off-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      da5a5522
    • C
      [PATCH] sched: run SCHED_NORMAL tasks with real time tasks on SMT siblings · fc38ed75
      Con Kolivas 提交于
      The hyperthread aware nice handling currently puts to sleep any non real
      time task when a real time task is running on its sibling cpu.  This can
      lead to prolonged starvation by having the non real time task pegged to the
      cpu with load balancing not pulling that task away.
      
      Currently we force lower priority hyperthread tasks to run a percentage of
      time difference based on timeslice differences which is meaningless when
      comparing real time tasks to SCHED_NORMAL tasks.  We can allow non real
      time tasks to run with real time tasks on the sibling up to per_cpu_gain%
      if we use jiffies as a counter.
      
      Cleanups and micro-optimisations to the relevant code section should make
      it more understandable as well.
      Signed-off-by: NCon Kolivas <kernel@kolivas.org>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fc38ed75
    • P
      [PATCH] cpuset semaphore depth check deadlock fix · 4247bdc6
      Paul Jackson 提交于
      The cpusets-formalize-intermediate-gfp_kernel-containment patch
      has a deadlock problem.
      
      This patch was part of a set of four patches to make more
      extensive use of the cpuset 'mem_exclusive' attribute to
      manage kernel GFP_KERNEL memory allocations and to constrain
      the out-of-memory (oom) killer.
      
      A task that is changing cpusets in particular ways on a system
      when it is very short of free memory could double trip over
      the global cpuset_sem semaphore (get the lock and then deadlock
      trying to get it again).
      
      The second attempt to get cpuset_sem would be in the routine
      cpuset_zone_allowed().  This was discovered by code inspection.
      I can not reproduce the problem except with an artifically
      hacked kernel and a specialized stress test.
      
      In real life you cannot hit this unless you are manipulating
      cpusets, and are very unlikely to hit it unless you are rapidly
      modifying cpusets on a memory tight system.  Even then it would
      be a rare occurence.
      
      If you did hit it, the task double tripping over cpuset_sem
      would deadlock in the kernel, and any other task also trying
      to manipulate cpusets would deadlock there too, on cpuset_sem.
      Your batch manager would be wedged solid (if it was cpuset
      savvy), but classic Unix shells and utilities would work well
      enough to reboot the system.
      
      The unusual condition that led to this bug is that unlike most
      semaphores, cpuset_sem _can_ be acquired while in the page
      allocation code, when __alloc_pages() calls cpuset_zone_allowed.
      So it easy to mistakenly perform the following sequence:
        1) task makes system call to alter a cpuset
        2) take cpuset_sem
        3) try to allocate memory
        4) memory allocator, via cpuset_zone_allowed, trys to take cpuset_sem
        5) deadlock
      
      The reason that this is not a serious bug for most users
      is that almost all calls to allocate memory don't require
      taking cpuset_sem.  Only some code paths off the beaten
      track require taking cpuset_sem -- which is good.  Taking
      a global semaphore on the main code path for allocating
      memory would not scale well.
      
      This patch fixes this deadlock by wrapping the up() and down()
      calls on cpuset_sem in kernel/cpuset.c with code that tracks
      the nesting depth of the current task on that semaphore, and
      only does the real down() if the task doesn't hold the lock
      already, and only does the real up() if the nesting depth
      (number of unmatched downs) is exactly one.
      
      The previous required use of refresh_mems(), anytime that
      the cpuset_sem semaphore was acquired and the code executed
      while holding that semaphore might try to allocate memory, is
      no longer required.  Two refresh_mems() calls were removed
      thanks to this.  This is a good change, as failing to get
      all the necessary refresh_mems() calls placed was a primary
      source of bugs in this cpuset code.  The only remaining call
      to refresh_mems() is made while doing a memory allocation,
      if certain task memory placement data needs to be updated
      from its cpuset, due to the cpuset having been changed behind
      the tasks back.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4247bdc6
    • I
      [PATCH] spinlock consolidation · fb1c8f93
      Ingo Molnar 提交于
      This patch (written by me and also containing many suggestions of Arjan van
      de Ven) does a major cleanup of the spinlock code.  It does the following
      things:
      
       - consolidates and enhances the spinlock/rwlock debugging code
      
       - simplifies the asm/spinlock.h files
      
       - encapsulates the raw spinlock type and moves generic spinlock
         features (such as ->break_lock) into the generic code.
      
       - cleans up the spinlock code hierarchy to get rid of the spaghetti.
      
      Most notably there's now only a single variant of the debugging code,
      located in lib/spinlock_debug.c.  (previously we had one SMP debugging
      variant per architecture, plus a separate generic one for UP builds)
      
      Also, i've enhanced the rwlock debugging facility, it will now track
      write-owners.  There is new spinlock-owner/CPU-tracking on SMP builds too.
      All locks have lockup detection now, which will work for both soft and hard
      spin/rwlock lockups.
      
      The arch-level include files now only contain the minimally necessary
      subset of the spinlock code - all the rest that can be generalized now
      lives in the generic headers:
      
       include/asm-i386/spinlock_types.h       |   16
       include/asm-x86_64/spinlock_types.h     |   16
      
      I have also split up the various spinlock variants into separate files,
      making it easier to see which does what. The new layout is:
      
         SMP                         |  UP
         ----------------------------|-----------------------------------
         asm/spinlock_types_smp.h    |  linux/spinlock_types_up.h
         linux/spinlock_types.h      |  linux/spinlock_types.h
         asm/spinlock_smp.h          |  linux/spinlock_up.h
         linux/spinlock_api_smp.h    |  linux/spinlock_api_up.h
         linux/spinlock.h            |  linux/spinlock.h
      
      /*
       * here's the role of the various spinlock/rwlock related include files:
       *
       * on SMP builds:
       *
       *  asm/spinlock_types.h: contains the raw_spinlock_t/raw_rwlock_t and the
       *                        initializers
       *
       *  linux/spinlock_types.h:
       *                        defines the generic type and initializers
       *
       *  asm/spinlock.h:       contains the __raw_spin_*()/etc. lowlevel
       *                        implementations, mostly inline assembly code
       *
       *   (also included on UP-debug builds:)
       *
       *  linux/spinlock_api_smp.h:
       *                        contains the prototypes for the _spin_*() APIs.
       *
       *  linux/spinlock.h:     builds the final spin_*() APIs.
       *
       * on UP builds:
       *
       *  linux/spinlock_type_up.h:
       *                        contains the generic, simplified UP spinlock type.
       *                        (which is an empty structure on non-debug builds)
       *
       *  linux/spinlock_types.h:
       *                        defines the generic type and initializers
       *
       *  linux/spinlock_up.h:
       *                        contains the __raw_spin_*()/etc. version of UP
       *                        builds. (which are NOPs on non-debug, non-preempt
       *                        builds)
       *
       *   (included on UP-non-debug builds:)
       *
       *  linux/spinlock_api_up.h:
       *                        builds the _spin_*() APIs.
       *
       *  linux/spinlock.h:     builds the final spin_*() APIs.
       */
      
      All SMP and UP architectures are converted by this patch.
      
      arm, i386, ia64, ppc, ppc64, s390/s390x, x64 was build-tested via
      crosscompilers.  m32r, mips, sh, sparc, have not been tested yet, but should
      be mostly fine.
      
      From: Grant Grundler <grundler@parisc-linux.org>
      
        Booted and lightly tested on a500-44 (64-bit, SMP kernel, dual CPU).
        Builds 32-bit SMP kernel (not booted or tested).  I did not try to build
        non-SMP kernels.  That should be trivial to fix up later if necessary.
      
        I converted bit ops atomic_hash lock to raw_spinlock_t.  Doing so avoids
        some ugly nesting of linux/*.h and asm/*.h files.  Those particular locks
        are well tested and contained entirely inside arch specific code.  I do NOT
        expect any new issues to arise with them.
      
       If someone does ever need to use debug/metrics with them, then they will
        need to unravel this hairball between spinlocks, atomic ops, and bit ops
        that exist only because parisc has exactly one atomic instruction: LDCW
        (load and clear word).
      
      From: "Luck, Tony" <tony.luck@intel.com>
      
         ia64 fix
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NArjan van de Ven <arjanv@infradead.org>
      Signed-off-by: NGrant Grundler <grundler@parisc-linux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Signed-off-by: NHirokazu Takata <takata@linux-m32r.org>
      Signed-off-by: NMikael Pettersson <mikpe@csd.uu.se>
      Signed-off-by: NBenoit Boissinot <benoit.boissinot@ens-lyon.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fb1c8f93
  2. 10 9月, 2005 7 次提交
  3. 08 9月, 2005 23 次提交
    • K
      [PATCH] kprobes: fix bug when probed on task and isr functions · deac66ae
      Keshavamurthy Anil S 提交于
      This patch fixes a race condition where in system used to hang or sometime
      crash within minutes when kprobes are inserted on ISR routine and a task
      routine.
      
      The fix has been stress tested on i386, ia64, pp64 and on x86_64.  To
      reproduce the problem insert kprobes on schedule() and do_IRQ() functions
      and you should see hang or system crash.
      Signed-off-by: NAnil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Signed-off-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
      Acked-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      deac66ae
    • P
      [PATCH] Kprobes: prevent possible race conditions generic · d0aaff97
      Prasanna S Panchamukhi 提交于
      There are possible race conditions if probes are placed on routines within the
      kprobes files and routines used by the kprobes.  For example if you put probe
      on get_kprobe() routines, the system can hang while inserting probes on any
      routine such as do_fork().  Because while inserting probes on do_fork(),
      register_kprobes() routine grabs the kprobes spin lock and executes
      get_kprobe() routine and to handle probe of get_kprobe(), kprobes_handler()
      gets executed and tries to grab kprobes spin lock, and spins forever.  This
      patch avoids such possible race conditions by preventing probes on routines
      within the kprobes file and routines used by kprobes.
      
      I have modified the patches as per Andi Kleen's suggestion to move kprobes
      routines and other routines used by kprobes to a seperate section
      .kprobes.text.
      
      Also moved page fault and exception handlers, general protection fault to
      .kprobes.text section.
      
      These patches have been tested on i386, x86_64 and ppc64 architectures, also
      compiled on ia64 and sparc64 architectures.
      Signed-off-by: NPrasanna S Panchamukhi <prasanna@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d0aaff97
    • P
      [PATCH] introduce and use kzalloc · dd392710
      Pekka J Enberg 提交于
      This patch introduces a kzalloc wrapper and converts kernel/ to use it.  It
      saves a little program text.
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      dd392710
    • M
      [PATCH] remove duplicated code from proc and ptrace · ab8d11be
      Miklos Szeredi 提交于
      Extract common code used by ptrace_attach() and may_ptrace_attach()
      into a separate function.
      Signed-off-by: NMiklos Szeredi <miklos@szeredi.hu>
      Cc: <viro@parcelfarce.linux.theplanet.co.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ab8d11be
    • J
      [PATCH] cpusets: re-enable "dynamic sched domains" · 0811bab2
      John Hawkes 提交于
      Revert the hack introduced last week.
      Signed-off-by: NJohn Hawkes <hawkes@sgi.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0811bab2
    • J
      [PATCH] cpusets: fix the "dynamic sched domains" bug · d1b55138
      John Hawkes 提交于
      For a NUMA system with multiple CPUs per node, declaring a cpu-exclusive
      cpuset that includes only some, but not all, of the CPUs in a node will mangle
      the sched domain structures.
      Signed-off-by: NJohn Hawkes <hawkes@sgi.com>
      Cc; Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d1b55138
    • J
      9c1cfda2
    • P
      [PATCH] cpusets: confine oom_killer to mem_exclusive cpuset · ef08e3b4
      Paul Jackson 提交于
      Now the real motivation for this cpuset mem_exclusive patch series seems
      trivial.
      
      This patch keeps a task in or under one mem_exclusive cpuset from provoking an
      oom kill of a task under a non-overlapping mem_exclusive cpuset.  Since only
      interrupt and GFP_ATOMIC allocations are allowed to escape mem_exclusive
      containment, there is little to gain from oom killing a task under a
      non-overlapping mem_exclusive cpuset, as almost all kernel and user memory
      allocation must come from disjoint memory nodes.
      
      This patch enables configuring a system so that a runaway job under one
      mem_exclusive cpuset cannot cause the killing of a job in another such cpuset
      that might be using very high compute and memory resources for a prolonged
      time.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ef08e3b4
    • P
      [PATCH] cpusets: formalize intermediate GFP_KERNEL containment · 9bf2229f
      Paul Jackson 提交于
      This patch makes use of the previously underutilized cpuset flag
      'mem_exclusive' to provide what amounts to another layer of memory placement
      resolution.  With this patch, there are now the following four layers of
      memory placement available:
      
       1) The whole system (interrupt and GFP_ATOMIC allocations can use this),
       2) The nearest enclosing mem_exclusive cpuset (GFP_KERNEL allocations can use),
       3) The current tasks cpuset (GFP_USER allocations constrained to here), and
       4) Specific node placement, using mbind and set_mempolicy.
      
      These nest - each layer is a subset (same or within) of the previous.
      
      Layer (2) above is new, with this patch.  The call used to check whether a
      zone (its node, actually) is in a cpuset (in its mems_allowed, actually) is
      extended to take a gfp_mask argument, and its logic is extended, in the case
      that __GFP_HARDWALL is not set in the flag bits, to look up the cpuset
      hierarchy for the nearest enclosing mem_exclusive cpuset, to determine if
      placement is allowed.  The definition of GFP_USER, which used to be identical
      to GFP_KERNEL, is changed to also set the __GFP_HARDWALL bit, in the previous
      cpuset_gfp_hardwall_flag patch.
      
      GFP_ATOMIC and GFP_KERNEL allocations will stay within the current tasks
      cpuset, so long as any node therein is not too tight on memory, but will
      escape to the larger layer, if need be.
      
      The intended use is to allow something like a batch manager to handle several
      jobs, each job in its own cpuset, but using common kernel memory for caches
      and such.  Swapper and oom_kill activity is also constrained to Layer (2).  A
      task in or below one mem_exclusive cpuset should not cause swapping on nodes
      in another non-overlapping mem_exclusive cpuset, nor provoke oom_killing of a
      task in another such cpuset.  Heavy use of kernel memory for i/o caching and
      such by one job should not impact the memory available to jobs in other
      non-overlapping mem_exclusive cpusets.
      
      This patch enables providing hardwall, inescapable cpusets for memory
      allocations of each job, while sharing kernel memory allocations between
      several jobs, in an enclosing mem_exclusive cpuset.
      
      Like Dinakar's patch earlier to enable administering sched domains using the
      cpu_exclusive flag, this patch also provides a useful meaning to a cpuset flag
      that had previously done nothing much useful other than restrict what cpuset
      configurations were allowed.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9bf2229f
    • P
      [PATCH] futex: remove duplicate code · 39ed3fde
      Pekka Enberg 提交于
      This patch cleans up the error path of futex_fd() by removing duplicate
      code.
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      39ed3fde
    • O
      [PATCH] fix send_sigqueue() vs thread exit race · e752dd6c
      Oleg Nesterov 提交于
      posix_timer_event() first checks that the thread (SIGEV_THREAD_ID case)
      does not have PF_EXITING flag, then it calls send_sigqueue() which locks
      task list.  But if the thread exits in between the kernel will oops
      (->sighand == NULL after __exit_sighand).
      
      This patch moves the PF_EXITING check into the send_sigqueue(), it must be
      done atomically under tasklist_lock.  When send_sigqueue() detects exiting
      thread it returns -1.  In that case posix_timer_event will send the signal
      to thread group.
      
      Also, this patch fixes task_struct use-after-free in posix_timer_event.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e752dd6c
    • J
      [PATCH] remove a redundant variable in sys_prctl() · 0730ded5
      Jesper Juhl 提交于
      The patch removes a redundant variable `sig' from sys_prctl().
      
      For some reason, when sys_prctl is called with option == PR_SET_PDEATHSIG
      then the value of arg2 is assigned to an int variable named sig.  Then sig
      is tested with valid_signal() and later used to set the value of
      current->pdeath_signal .
      
      There is no reason to use this intermediate variable since valid_signal()
      takes a unsigned long argument, so it can handle being passed arg2
      directly, and if the call to valid_signal is OK, then we know the value of
      arg2 is in the range zero to _NSIG and thus it'll easily fit in a plain int
      and thus there's no problem assigning it later to current->pdeath_signal
      (which is an int).
      
      The patch gets rid of the pointless variable `sig'.
      This reduces the size of kernel/sys.o in 2.6.13-rc6-mm1 by 32 bytes on my
      system.
      
      Patch has been compile tested, boot tested, and just to make damn sure I
      didn't break anything I wrote a quick test app that calls
      prctl(PR_SET_PDEATHSIG ...) with the entire range of values for a
      unsigned long, and it behaves as expected with and without the patch.
      Signed-off-by: NJesper Juhl <jesper.juhl@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0730ded5
    • P
      [PATCH] largefile support for accounting · 6c9c0b52
      Peter Staubach 提交于
      There is a problem in the accounting subsystem in the kernel can not
      correctly handle files larger than 2GB.  The output file containing the
      process accounting data can grow very large if the system is large enough
      and active enough.  If the 2GB limit is reached, then the system simply
      stops storing process accounting data.
      
      Another annoying problem is that once the system reaches this 2GB limit,
      then every process which exits will receive a signal, SIGXFSZ.  This signal
      is generated because an attempt was made to write beyond the limit for the
      file descriptor.  This signal makes it look like every process has exited
      due to a signal, when in fact, they have not.
      
      The solution is to add the O_LARGEFILE flag to the list of flags used to
      open the accounting file.  The rest of the accounting support is already
      largefile safe.
      
      The changes were tested by constructing a large file (just short of 2GB),
      enabling accounting, and then running enough commands to cause the
      accounting data generated to increase the size of the file to 2GB.  Without
      the changes, the file grows to 2GB and the last command run in the test
      script appears to exit due a signal when it has not.  With the changes,
      things work as expected and quietly.
      
      There are some user level changes required so that it can deal with
      largefiles, but those are being handled separately.
      Signed-off-by: NPeter Staubach <staubach@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6c9c0b52
    • O
      [PATCH] do_notify_parent_cldstop() cleanup · bc505a47
      Oleg Nesterov 提交于
      This patch simplifies the usage of do_notify_parent_cldstop(), it lessens
      the source and .text size slightly, and makes the code (in my opinion) a
      bit more readable.
      
      I am sending this patch now because I'm afraid Paul will touch
      do_notify_parent_cldstop() really soon, It's better to cleanup first.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bc505a47
    • K
      [PATCH] CHECK_IRQ_PER_CPU() to avoid dead code in __do_IRQ() · f26fdd59
      Karsten Wiese 提交于
      IRQ_PER_CPU is not used by all architectures.  This patch introduces the
      macros ARCH_HAS_IRQ_PER_CPU and CHECK_IRQ_PER_CPU() to avoid the generation
      of dead code in __do_IRQ().
      
      ARCH_HAS_IRQ_PER_CPU is defined by architectures using IRQ_PER_CPU in their
      include/asm_ARCH/irq.h file.
      
      Through grepping the tree I found the following architectures currently use
      IRQ_PER_CPU:
      
              cris, ia64, ppc, ppc64 and parisc.
      Signed-off-by: NKarsten Wiese <annabellesgarden@yahoo.de>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f26fdd59
    • M
      [PATCH] create_workqueue_thread() signedness fix · 230649da
      Mika Kukkonen 提交于
      With "-W -Wno-unused -Wno-sign-compare" I get the following compile warning:
      
        CC      kernel/workqueue.o
      kernel/workqueue.c: In function `workqueue_cpu_callback':
      kernel/workqueue.c:504: warning: ordered comparison of pointer with integer zero
      
      On error create_workqueue_thread() returns NULL, not negative pointer, so
      following trivial patch suggests itself.
      Signed-off-by: NMika Kukkonen <mikukkon@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      230649da
    • T
      [PATCH] flush icache early when loading module · 378bac82
      Thomas Koeller 提交于
      Change the sequence of operations performed during module loading to flush
      the instruction cache before module parameters are processed.  If a module
      has parameters of an unusual type that cannot be handled using the standard
      accessor functions param_set_xxx and param_get_xxx, it has to to provide a
      set of accessor functions for this type.  This requires module code to be
      executed during parameter processing, which is of course only possible
      after the icache has been flushed.
      Signed-off-by: NThomas Koeller <thomas@koeller.dyndns.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      378bac82
    • A
      [PATCH] optimize writer path in time_interpolator_get_counter() · 486d46ae
      Alex Williamson 提交于
            Christoph Lameter <clameter@engr.sgi.com>
      
      When using a time interpolator that is susceptible to jitter there's
      potentially contention over a cmpxchg used to prevent time from going
      backwards.  This is unnecessary when the caller holds the xtime write
      seqlock as all readers will be blocked from returning until the write is
      complete.  We can therefore allow writers to insert a new value and exit
      rather than fight with CPUs who only hold a reader lock.
      Signed-off-by: NAlex Williamson <alex.williamson@hp.com>
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      486d46ae
    • D
      [PATCH] Provide better printk() support for SMP machines · fe21773d
      David Howells 提交于
      The attached patch prevents oopses interleaving with characters from
      other printks on other CPUs by only breaking the lock if the oops is
      happening on the machine holding the lock.
      
      It might be better if the oops generator got the lock and then called an
      inner vprintk routine that assumed the caller holds the lock, thus
      making oops reports "atomic".
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      fe21773d
    • I
      [PATCH] detect soft lockups · 8446f1d3
      Ingo Molnar 提交于
      This patch adds a new kernel debug feature: CONFIG_DETECT_SOFTLOCKUP.
      
      When enabled then per-CPU watchdog threads are started, which try to run
      once per second.  If they get delayed for more than 10 seconds then a
      callback from the timer interrupt detects this condition and prints out a
      warning message and a stack dump (once per lockup incident).  The feature
      is otherwise non-intrusive, it doesnt try to unlock the box in any way, it
      only gets the debug info out, automatically, and on all CPUs affected by
      the lockup.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-Off-By: NMatthias Urlichs <smurf@smurf.noris.de>
      Signed-off-by: NRichard Purdie <rpurdie@rpsys.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8446f1d3
    • J
      [PATCH] FUTEX_WAKE_OP: pthread_cond_signal() speedup · 4732efbe
      Jakub Jelinek 提交于
      ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter
      (which at least on UP usually means an immediate context switch to one of
      the waiter threads).  This waiter wakes up and after a few instructions it
      attempts to acquire the cv internal lock, but that lock is still held by
      the thread calling pthread_cond_signal.  So it goes to sleep and eventually
      the signalling thread is scheduled in, unlocks the internal lock and wakes
      the waiter again.
      
      Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal
      to avoid this performance issue, but it was removed when locks were
      redesigned to the 3 state scheme (unlocked, locked uncontended, locked
      contended).
      
      Following scenario shows why simply using FUTEX_REQUEUE in
      pthread_cond_signal together with using lll_mutex_unlock_force in place of
      lll_mutex_unlock is not enough and probably why it has been disabled at
      that time:
      
      The number is value in cv->__data.__lock.
              thr1            thr2            thr3
      0       pthread_cond_wait
      1       lll_mutex_lock (cv->__data.__lock)
      0       lll_mutex_unlock (cv->__data.__lock)
      0       lll_futex_wait (&cv->__data.__futex, futexval)
      0                       pthread_cond_signal
      1                       lll_mutex_lock (cv->__data.__lock)
      1                                       pthread_cond_signal
      2                                       lll_mutex_lock (cv->__data.__lock)
      2                                         lll_futex_wait (&cv->__data.__lock, 2)
      2                       lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock)
                                # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE
      2                       lll_mutex_unlock_force (cv->__data.__lock)
      0                         cv->__data.__lock = 0
      0                         lll_futex_wake (&cv->__data.__lock, 1)
      1       lll_mutex_lock (cv->__data.__lock)
      0       lll_mutex_unlock (cv->__data.__lock)
                # Here, lll_mutex_unlock doesn't know there are threads waiting
                # on the internal cv's lock
      
      Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal,
      but it will cost us not one, but 2 extra syscalls and, what's worse, one of
      these extra syscalls will be done for every single waiting loop in
      pthread_cond_*wait.
      
      We would need to use lll_mutex_unlock_force in pthread_cond_signal after
      requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait.
      
      Another alternative is to do the unlocking pthread_cond_signal needs to do
      (the lock can't be unlocked before lll_futex_wake, as that is racy) in the
      kernel.
      
      I have implemented both variants, futex-requeue-glibc.patch is the first
      one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel.
       The kernel interface allows userland to specify how exactly an unlocking
      operation should look like (some atomic arithmetic operation with optional
      constant argument and comparison of the previous futex value with another
      constant).
      
      It has been implemented just for ppc*, x86_64 and i?86, for other
      architectures I'm including just a stub header which can be used as a
      starting point by maintainers to write support for their arches and ATM
      will just return -ENOSYS for FUTEX_WAKE_OP.  The requeue patch has been
      (lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running
      32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL.
      
      With the following benchmark on UP x86-64 I get:
      
      for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \
      for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done
      time elf/ld.so --library-path .:nptl-orig /tmp/bench
      real 0m0.655s user 0m0.253s sys 0m0.403s
      real 0m0.657s user 0m0.269s sys 0m0.388s
      time elf/ld.so --library-path .:nptl-requeue /tmp/bench
      real 0m0.496s user 0m0.225s sys 0m0.271s
      real 0m0.531s user 0m0.242s sys 0m0.288s
      time elf/ld.so --library-path .:nptl-wake_op /tmp/bench
      real 0m0.380s user 0m0.176s sys 0m0.204s
      real 0m0.382s user 0m0.175s sys 0m0.207s
      
      The benchmark is at:
      http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt
      Older futex-requeue-glibc.patch version is at:
      http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt
      Older futex-wake_op-glibc.patch version is at:
      http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt
      Will post a new version (just x86-64 fixes so that the patch
      applies against pthread_cond_signal.S) to libc-hacker ml soon.
      
      Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded
      testcase that will not test the atomicity of the operation, but at least
      check if the threads that should have been woken up are woken up and
      whether the arithmetic operation in the kernel gave the expected results.
      Acked-by: NIngo Molnar <mingo@redhat.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: Jamie Lokier <jamie@shareable.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NYoichi Yuasa <yuasa@hh.iij4u.or.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4732efbe
    • P
      [PATCH] swsusp: update documentation · d7ae79c7
      Pavel Machek 提交于
      This updates documentation a bit (mostly removing obsolete stuff), and
      marks swsusp as no longer experimental in config.
      Signed-off-by: NPavel Machek <pavel@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d7ae79c7
    • A
      [PATCH] x86/x86_64: deferred handling of writes to /proc/irqxx/smp_affinity · 54d5d424
      Ashok Raj 提交于
      When handling writes to /proc/irq, current code is re-programming rte
      entries directly. This is not recommended and could potentially cause
      chipset's to lockup, or cause missing interrupts.
      
      CONFIG_IRQ_BALANCE does this correctly, where it re-programs only when the
      interrupt is pending. The same needs to be done for /proc/irq handling as well.
      Otherwise user space irq balancers are really not doing the right thing.
      
      - Changed pending_irq_balance_cpumask to pending_irq_migrate_cpumask for
        lack of a generic name.
      - added move_irq out of IRQ_BALANCE, and added this same to X86_64
      - Added new proc handler for write, so we can do deferred write at irq
        handling time.
      - Display of /proc/irq/XX/smp_affinity used to display CPU_MASKALL, instead
        it now shows only active cpu masks, or exactly what was set.
      - Provided a common move_irq implementation, instead of duplicating
        when using generic irq framework.
      
      Tested on i386/x86_64 and ia64 with CONFIG_PCI_MSI turned on and off.
      Tested UP builds as well.
      
      MSI testing: tbd: I have cards, need to look for a x-over cable, although I
      did test an earlier version of this patch.  Will test in a couple days.
      Signed-off-by: NAshok Raj <ashok.raj@intel.com>
      Acked-by: NZwane Mwaikambo <zwane@holomorphy.com>
      Grudgingly-acked-by: NAndi Kleen <ak@muc.de>
      Signed-off-by: NCoywolf Qi Hunt <coywolf@lovecn.org>
      Signed-off-by: NAshok Raj <ashok.raj@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      54d5d424
  4. 07 9月, 2005 1 次提交
  5. 05 9月, 2005 3 次提交
    • L
      [PATCH] UML Support - Ptrace: adds the host SYSEMU support, for UML and general usage · ed75e8d5
      Laurent Vivier 提交于
            Jeff Dike <jdike@addtoit.com>,
            Paolo 'Blaisorblade' Giarrusso <blaisorblade_spam@yahoo.it>,
            Bodo Stroesser <bstroesser@fujitsu-siemens.com>
      
      Adds a new ptrace(2) mode, called PTRACE_SYSEMU, resembling PTRACE_SYSCALL
      except that the kernel does not execute the requested syscall; this is useful
      to improve performance for virtual environments, like UML, which want to run
      the syscall on their own.
      
      In fact, using PTRACE_SYSCALL means stopping child execution twice, on entry
      and on exit, and each time you also have two context switches; with SYSEMU you
      avoid the 2nd stop and so save two context switches per syscall.
      
      Also, some architectures don't have support in the host for changing the
      syscall number via ptrace(), which is currently needed to skip syscall
      execution (UML turns any syscall into getpid() to avoid it being executed on
      the host).  Fixing that is hard, while SYSEMU is easier to implement.
      
      * This version of the patch includes some suggestions of Jeff Dike to avoid
        adding any instructions to the syscall fast path, plus some other little
        changes, by myself, to make it work even when the syscall is executed with
        SYSENTER (but I'm unsure about them). It has been widely tested for quite a
        lot of time.
      
      * Various fixed were included to handle the various switches between
        various states, i.e. when for instance a syscall entry is traced with one of
        PT_SYSCALL / _SYSEMU / _SINGLESTEP and another one is used on exit.
        Basically, this is done by remembering which one of them was used even after
        the call to ptrace_notify().
      
      * We're combining TIF_SYSCALL_EMU with TIF_SYSCALL_TRACE or TIF_SINGLESTEP
        to make do_syscall_trace() notice that the current syscall was started with
        SYSEMU on entry, so that no notification ought to be done in the exit path;
        this is a bit of a hack, so this problem is solved in another way in next
        patches.
      
      * Also, the effects of the patch:
      "Ptrace - i386: fix Syscall Audit interaction with singlestep"
      are cancelled; they are restored back in the last patch of this series.
      
      Detailed descriptions of the patches doing this kind of processing follow (but
      I've already summed everything up).
      
      * Fix behaviour when changing interception kind #1.
      
        In do_syscall_trace(), we check the status of the TIF_SYSCALL_EMU flag
        only after doing the debugger notification; but the debugger might have
        changed the status of this flag because he continued execution with
        PTRACE_SYSCALL, so this is wrong.  This patch fixes it by saving the flag
        status before calling ptrace_notify().
      
      * Fix behaviour when changing interception kind #2:
        avoid intercepting syscall on return when using SYSCALL again.
      
        A guest process switching from using PTRACE_SYSEMU to PTRACE_SYSCALL
        crashes.
      
        The problem is in arch/i386/kernel/entry.S.  The current SYSEMU patch
        inhibits the syscall-handler to be called, but does not prevent
        do_syscall_trace() to be called after this for syscall completion
        interception.
      
        The appended patch fixes this.  It reuses the flag TIF_SYSCALL_EMU to
        remember "we come from PTRACE_SYSEMU and now are in PTRACE_SYSCALL", since
        the flag is unused in the depicted situation.
      
      * Fix behaviour when changing interception kind #3:
        avoid intercepting syscall on return when using SINGLESTEP.
      
        When testing 2.6.9 and the skas3.v6 patch, with my latest patch and had
        problems with singlestepping on UML in SKAS with SYSEMU.  It looped
        receiving SIGTRAPs without moving forward.  EIP of the traced process was
        the same for all SIGTRAPs.
      
      What's missing is to handle switching from PTRACE_SYSCALL_EMU to
      PTRACE_SINGLESTEP in a way very similar to what is done for the change from
      PTRACE_SYSCALL_EMU to PTRACE_SYSCALL_TRACE.
      
      I.e., after calling ptrace(PTRACE_SYSEMU), on the return path, the debugger is
      notified and then wake ups the process; the syscall is executed (or skipped,
      when do_syscall_trace() returns 0, i.e.  when using PTRACE_SYSEMU), and
      do_syscall_trace() is called again.  Since we are on the return path of a
      SYSEMU'd syscall, if the wake up is performed through ptrace(PTRACE_SYSCALL),
      we must still avoid notifying the parent of the syscall exit.  Now, this
      behaviour is extended even to resuming with PTRACE_SINGLESTEP.
      Signed-off-by: NPaolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ed75e8d5
    • P
      [PATCH] pm: clean up /sys/power/disk · 57c4ce3c
      Pavel Machek 提交于
      Clean code up a bit, and only show suspend to disk as available when
      it is configured in.
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      57c4ce3c
    • P
      [PATCH] pm: fix process freezing · 6161b2ce
      Pavel Machek 提交于
      If process freezing fails, some processes are frozen, and rest are left in
      "were asked to be frozen" state.  Thats wrong, we should leave it in some
      consistent state.
      Signed-off-by: NPavel Machek <pavel@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      6161b2ce