1. 01 10月, 2006 19 次提交
    • J
      [PATCH] csa: basic accounting over taskstats · f3cef7a9
      Jay Lan 提交于
      Add some basic accounting fields to the taskstats struct, add a new
      kernel/tsacct.c to handle basic accounting data handling upon exit.  A handle
      is added to taskstats.c to invoke the basic accounting data handling.
      Signed-off-by: NJay Lan <jlan@sgi.com>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Jes Sorensen <jes@sgi.com>
      Cc: Chris Sturtivant <csturtiv@sgi.com>
      Cc: Tony Ernst <tee@sgi.com>
      Cc: Guillaume Thouvenin <guillaume.thouvenin@bull.net>
      Cc: "Michal Piotrowski" <michal.k.k.piotrowski@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f3cef7a9
    • B
      [PATCH] Fix taskstats size calculation (use the new genetlink utility functions) · 0ae64684
      Balbir Singh 提交于
      The addition of the CSA patch pushed the size of struct taskstats to 256
      bytes.  This exposed a problem with prepare_reply(), we were not allocating
      space for the netlink and genetlink header.  It worked earlier because
      alloc_skb() would align the skb to SMP_CACHE_BYTES, which added some additonal
      bytes.
      Signed-off-by: NBalbir Singh <balbir@in.ibm.com>
      Cc: Jamal Hadi <hadi@cyberus.ca>
      Cc: Shailabh Nagar <nagar@watson.ibm.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jay Lan <jlan@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0ae64684
    • A
      [PATCH] kill wall_jiffies · 8ef38609
      Atsushi Nemoto 提交于
      With 2.6.18-rc4-mm2, now wall_jiffies will always be the same as jiffies.
      So we can kill wall_jiffies completely.
      
      This is just a cleanup and logically should not change any real behavior
      except for one thing: RTC updating code in (old) ppc and xtensa use a
      condition "jiffies - wall_jiffies == 1".  This condition is never met so I
      suppose it is just a bug.  I just remove that condition only instead of
      kill the whole "if" block.
      
      [heiko.carstens@de.ibm.com: s390 build fix and cleanup]
      Signed-off-by: NAtsushi Nemoto <anemo@mba.ocn.ne.jp>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Hirokazu Takata <takata.hirokazu@renesas.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8ef38609
    • A
      [PATCH] kernel/time/ntp.c: possible cleanups · 70bc42f9
      Adrian Bunk 提交于
      This patch contains the following possible cleanups:
      - make the following needlessly global function static:
        - ntp_update_frequency()
      - make the following needlessly global variables static:
        - time_state
        - time_offset
        - time_constant
        - time_reftime
      - remove the following read-only global variable:
        - time_precision
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      70bc42f9
    • R
      [PATCH] ntp: convert to the NTP4 reference model · f1992393
      Roman Zippel 提交于
      This converts the kernel ntp model into a model which matches the nanokernel
      reference implementations.  The previous patches already increased the
      resolution and precision of the computations, so that this conversion becomes
      quite simple.
      
      <linux@horizon.com> explains:
      
      The original NTP kernel interface was defined in units of microseconds.
      That's what Linux implements.  As computers have gotten faster and can now
      split microseconds easily, a new kernel interface using nanosecond units was
      defined ("the nanokernel", confusing as that name is to OS hackers), and
      there's an STA_NANO bit in the adjtimex() status field to tell the application
      which units it's using.
      
      The current ntpd supports both, but Linux loses some possible timing
      resolution because of quantization effects, and the ntpd hackers would really
      like to be able to drop the backwards compatibility code.
      
      Ulrich Windl has been maintaining a patch set to do the conversion for years,
      but it's hard to keep in sync.
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      f1992393
    • R
      [PATCH] ntp: convert time_freq to nsec value · 04b617e7
      Roman Zippel 提交于
      This converts time_freq to a scaled nsec value and adds around 6bit of extra
      resolution.  This pushes the time_freq to its 32bit limits so the calculatons
      have to be done with 64bit.
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      04b617e7
    • R
      [PATCH] ntp: remove time_tolerance · 97eebe13
      Roman Zippel 提交于
      time_tolerance isn't changed at all in the kernel, so simply remove it, this
      simplifies the next patch, as it avoids a number of conversions.
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      97eebe13
    • R
      [PATCH] ntp: add time_adjust to tick length · 8f807f8d
      Roman Zippel 提交于
      This folds update_ntp_one_tick() into second_overflow() and adds time_adjust
      to the tick length, this makes time_next_adjust unnecessary.  This slightly
      changes the adjtime() behaviour, instead of applying it to the next tick, it's
      applied to the next second.
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8f807f8d
    • R
      [PATCH] ntp: prescale time_offset · 3d3675cc
      Roman Zippel 提交于
      This converts time_offset into a scaled per tick value.  This avoids now
      completely the crude compensation in second_overflow().
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3d3675cc
    • R
      [PATCH] ntp: add time_freq to tick length · dc6a43e4
      Roman Zippel 提交于
      This adds the frequency part to ntp_update_frequency().
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      dc6a43e4
    • R
      [PATCH] ntp: add time_adj to tick length · ab8783b6
      Roman Zippel 提交于
      This makes time_adj local to second_overflow() and integrates it into the tick
      length instead of adding it everytime.
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ab8783b6
    • R
      [PATCH] ntp: add ntp_update_frequency · b0ee7556
      Roman Zippel 提交于
      This introduces ntp_update_frequency() and deinlines ntp_clear() (as it's not
      performance critical).  ntp_update_frequency() calculates the base tick length
      using tick_usec and adds a base adjustment, in case the frequency doesn't
      divide evenly by HZ.
      Signed-off-by: NRoman Zippel <zippel@linux-m68k.org>
      Cc: john stultz <johnstul@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b0ee7556
    • J
      [PATCH] NTP: Move all the NTP related code to ntp.c · 4c7ee8de
      john stultz 提交于
      Move all the NTP related code to ntp.c
      
      [akpm@osdl.org: cleanups, build fix]
      Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4c7ee8de
    • M
      [PATCH] Directed yield: cpu_relax variants for spinlocks and rw-locks · ef6edc97
      Martin Schwidefsky 提交于
      On systems running with virtual cpus there is optimization potential in
      regard to spinlocks and rw-locks.  If the virtual cpu that has taken a lock
      is known to a cpu that wants to acquire the same lock it is beneficial to
      yield the timeslice of the virtual cpu in favour of the cpu that has the
      lock (directed yield).
      
      With CONFIG_PREEMPT="n" this can be implemented by the architecture without
      common code changes.  Powerpc already does this.
      
      With CONFIG_PREEMPT="y" the lock loops are coded with _raw_spin_trylock,
      _raw_read_trylock and _raw_write_trylock in kernel/spinlock.c.  If the lock
      could not be taken cpu_relax is called.  A directed yield is not possible
      because cpu_relax doesn't know anything about the lock.  To be able to
      yield the lock in favour of the current lock holder variants of cpu_relax
      for spinlocks and rw-locks are needed.  The new _raw_spin_relax,
      _raw_read_relax and _raw_write_relax primitives differ from cpu_relax
      insofar that they have an argument: a pointer to the lock structure.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ef6edc97
    • C
      [PATCH] CodingStyle cleanup for kernel/sys.c · 756184b7
      Cal Peake 提交于
      Fix up kernel/sys.c to be consistent with CodingStyle and the rest of the
      file.
      Signed-off-by: NCal Peake <cp@absolutedigital.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      756184b7
    • A
      [PATCH] maximum latency tracking infrastructure · 5c87579e
      Arjan van de Ven 提交于
      Add infrastructure to track "maximum allowable latency" for power saving
      policies.
      
      The reason for adding this infrastructure is that power management in the
      idle loop needs to make a tradeoff between latency and power savings
      (deeper power save modes have a longer latency to running code again).  The
      code that today makes this tradeoff just does a rather simple algorithm;
      however this is not good enough: There are devices and use cases where a
      lower latency is required than that the higher power saving states provide.
       An example would be audio playback, but another example is the ipw2100
      wireless driver that right now has a very direct and ugly acpi hook to
      disable some higher power states randomly when it gets certain types of
      error.
      
      The proposed solution is to have an interface where drivers can
      
      * announce the maximum latency (in microseconds) that they can deal with
      * modify this latency
      * give up their constraint
      
      and a function where the code that decides on power saving strategy can
      query the current global desired maximum.
      
      This patch has a user of each side: on the consumer side, ACPI is patched
      to use this, on the producer side the ipw2100 driver is patched.
      
      A generic maximum latency is also registered of 2 timer ticks (more and you
      lose accurate time tracking after all).
      
      While the existing users of the patch are x86 specific, the infrastructure
      is not.  I'd like to ask the arch maintainers of other architectures if the
      infrastructure is generic enough for their use (assuming the architecture
      has such a tradeoff as concept at all), and the sound/multimedia driver
      owners to look at the driver facing API to see if this is something they
      can use.
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NJesse Barnes <jesse.barnes@intel.com>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5c87579e
    • D
      [PATCH] BLOCK: Make it possible to disable the block layer [try #6] · 9361401e
      David Howells 提交于
      Make it possible to disable the block layer.  Not all embedded devices require
      it, some can make do with just JFFS2, NFS, ramfs, etc - none of which require
      the block layer to be present.
      
      This patch does the following:
      
       (*) Introduces CONFIG_BLOCK to disable the block layer, buffering and blockdev
           support.
      
       (*) Adds dependencies on CONFIG_BLOCK to any configuration item that controls
           an item that uses the block layer.  This includes:
      
           (*) Block I/O tracing.
      
           (*) Disk partition code.
      
           (*) All filesystems that are block based, eg: Ext3, ReiserFS, ISOFS.
      
           (*) The SCSI layer.  As far as I can tell, even SCSI chardevs use the
           	 block layer to do scheduling.  Some drivers that use SCSI facilities -
           	 such as USB storage - end up disabled indirectly from this.
      
           (*) Various block-based device drivers, such as IDE and the old CDROM
           	 drivers.
      
           (*) MTD blockdev handling and FTL.
      
           (*) JFFS - which uses set_bdev_super(), something it could avoid doing by
           	 taking a leaf out of JFFS2's book.
      
       (*) Makes most of the contents of linux/blkdev.h, linux/buffer_head.h and
           linux/elevator.h contingent on CONFIG_BLOCK being set.  sector_div() is,
           however, still used in places, and so is still available.
      
       (*) Also made contingent are the contents of linux/mpage.h, linux/genhd.h and
           parts of linux/fs.h.
      
       (*) Makes a number of files in fs/ contingent on CONFIG_BLOCK.
      
       (*) Makes mm/bounce.c (bounce buffering) contingent on CONFIG_BLOCK.
      
       (*) set_page_dirty() doesn't call __set_page_dirty_buffers() if CONFIG_BLOCK
           is not enabled.
      
       (*) fs/no-block.c is created to hold out-of-line stubs and things that are
           required when CONFIG_BLOCK is not set:
      
           (*) Default blockdev file operations (to give error ENODEV on opening).
      
       (*) Makes some /proc changes:
      
           (*) /proc/devices does not list any blockdevs.
      
           (*) /proc/diskstats and /proc/partitions are contingent on CONFIG_BLOCK.
      
       (*) Makes some compat ioctl handling contingent on CONFIG_BLOCK.
      
       (*) If CONFIG_BLOCK is not defined, makes sys_quotactl() return -ENODEV if
           given command other than Q_SYNC or if a special device is specified.
      
       (*) In init/do_mounts.c, no reference is made to the blockdev routines if
           CONFIG_BLOCK is not defined.  This does not prohibit NFS roots or JFFS2.
      
       (*) The bdflush, ioprio_set and ioprio_get syscalls can now be absent (return
           error ENOSYS by way of cond_syscall if so).
      
       (*) The seclvl_bd_claim() and seclvl_bd_release() security calls do nothing if
           CONFIG_BLOCK is not set, since they can't then happen.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      9361401e
    • D
      [PATCH] BLOCK: Move extern declarations out of fs/*.c into header files [try #6] · 07f3f05c
      David Howells 提交于
      Create a new header file, fs/internal.h, for common definitions local to the
      sources in the fs/ directory.
      
      Move extern definitions that should be in header files from fs/*.c to
      fs/internal.h or other main header files where they span directories.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      07f3f05c
    • D
      [PATCH] BLOCK: Remove duplicate declaration of exit_io_context() [try #6] · 0d67a46d
      David Howells 提交于
      Remove the duplicate declaration of exit_io_context() from linux/sched.h.
      Signed-Off-By: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0d67a46d
  2. 30 9月, 2006 21 次提交
    • A
    • A
      [PATCH] x86: Clean up x86 NMI sysctls · 29cbc78b
      Andi Kleen 提交于
      Use prototypes in headers
      Don't define panic_on_unrecovered_nmi for all architectures
      
      Cc: dzickus@redhat.com
      Signed-off-by: NAndi Kleen <ak@suse.de>
      29cbc78b
    • P
      [PATCH] cpuset: fix obscure attach_task vs exiting race · 181b6480
      Paul Jackson 提交于
      Fix obscure race condition in kernel/cpuset.c attach_task() code.
      
      There is basically zero chance of anyone accidentally being harmed by this
      race.
      
      It requires a special 'micro-stress' load and a special timing loop hacks
      in the kernel to hit in less than an hour, and even then you'd have to hit
      it hundreds or thousands of times, followed by some unusual and senseless
      cpuset configuration requests, including removing the top cpuset, to cause
      any visibly harm affects.
      
      One could, with perhaps a few days or weeks of such effort, get the
      reference count on the top cpuset below zero, and manage to crash the
      kernel by asking to remove the top cpuset.
      
      I found it by code inspection.
      
      The race was introduced when 'the_top_cpuset_hack' was introduced, and one
      piece of code was not updated.  An old check for a possibly null task
      cpuset pointer needed to be changed to a check for a task marked
      PF_EXITING.  The pointer can't be null anymore, thanks to
      the_top_cpuset_hack (documented in kernel/cpuset.c).  But the task could
      have gone into PF_EXITING state after it was found in the task_list scan.
      
      If a task is PF_EXITING in this code, it is possible that its task->cpuset
      pointer is pointing to the top cpuset due to the_top_cpuset_hack, rather
      than because the top_cpuset was that tasks last valid cpuset.  In that
      case, the wrong cpuset reference counter would be decremented.
      
      The fix is trivial.  Instead of failing the system call if the tasks cpuset
      pointer is null here, fail it if the task is in PF_EXITING state.
      
      The code for 'the_top_cpuset_hack' that changes an exiting tasks cpuset to
      the top_cpuset is done without locking, so could happen at anytime.  But it
      is done during the exit handling, after the PF_EXITING flag is set.  So if
      we verify that a task is still not PF_EXITING after we copy out its cpuset
      pointer (into 'oldcs', below), we know that 'oldcs' is not one of these
      hack references to the top_cpuset.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      181b6480
    • I
      [PATCH] lockdep core: improve the lock-chain-hash · 03cbc358
      Ingo Molnar 提交于
      With CONFIG_DEBUG_LOCK_ALLOC turned off i was getting sporadic failures in
      the locking self-test:
      
        ------------>
        | Locking API testsuite:
        ----------------------------------------------------------------------------
                                         | spin |wlock |rlock |mutex | wsem | rsem |
          --------------------------------------------------------------------------
                             A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                         A-B-B-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                     A-B-B-C-C-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                     A-B-C-A-B-C deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                 A-B-B-C-C-D-D-A deadlock:  ok  |FAILED|  ok  |  ok  |  ok  |  ok  |
                 A-B-C-D-B-D-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
                 A-B-C-D-B-C-D-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |FAILED|
      
      after much debugging it turned out to be caused by accidental chain-hash
      key collisions.  The current hash is:
      
       #define iterate_chain_key(key1, key2) \
      	(((key1) << MAX_LOCKDEP_KEYS_BITS/2) ^ \
      	((key1) >> (64-MAX_LOCKDEP_KEYS_BITS/2)) ^ \
       	(key2))
      
      where MAX_LOCKDEP_KEYS_BITS is 11.  This hash is pretty good as it will
      shift by 5 bits in every iteration, where every new ID 'mixed' into the
      hash would have up to 11 bits.  But because there was a 6 bits overlap
      between subsequent IDs and their high bits tended to be similar, there was
      a chance for accidental chain-hash collision for a low number of locks
      held.
      
      the solution is to shift by 11 bits:
      
       #define iterate_chain_key(key1, key2) \
      	(((key1) << MAX_LOCKDEP_KEYS_BITS) ^ \
      	((key1) >> (64-MAX_LOCKDEP_KEYS_BITS)) ^ \
       	(key2))
      
      This keeps the hash perfect up to 5 locks held, but even above that the
      hash is still good because 11 bits is a relative prime to the total 64
      bits, so a complete match will only occur after 64 held locks (which doesnt
      happen in Linux).  Even after 5 locks held, entropy of the 5 IDs mixed into
      the hash is already good enough so that overlap doesnt generate a colliding
      hash ID.
      
      with this change the false positives went away.
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      03cbc358
    • A
      [PATCH] audit/accounting: tty locking · eb84a20e
      Alan Cox 提交于
      Add tty locking around the audit and accounting code.
      
      The whole current->signal-> locking is all deeply strange but it's for
      someone else to sort out.  Add rather than replace the lock for acct.c
      Signed-off-by: NAlan Cox <alan@redhat.com>
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      eb84a20e
    • R
      [PATCH] stop_machine.c copyright · e5582ca2
      Rusty Russell 提交于
      I had to look back: this code was extracted from the module.c code in 2005.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e5582ca2
    • I
      [PATCH] /sys/modules: allow full length section names · 04b1db9f
      Ian S. Nelson 提交于
      I've been using systemtap for some debugging and I noticed that it can't
      probe a lot of modules.  Turns out it's kind of silly, the sections section
      of /sys/module is limited to 32byte filenames and many of the actual
      sections are a a bit longer than that.
      
      [akpm@osdl.org: rewrite to use dymanic allocation]
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      04b1db9f
    • P
      [PATCH] cpuset: hotunplug cpus and mems in all cpusets · b1aac8bb
      Paul Jackson 提交于
      The cpuset code handling hot unplug of CPUs or Memory Nodes was incorrect -
      it could remove a CPU or Node from the top cpuset, while leaving it still
      in some child cpusets.
      
      One basic rule of cpusets is that each cpusets cpus and mems are subsets of
      its parents.  The cpuset hot unplug code violated this rule.
      
      So the cpuset hotunplug handler must walk down the tree, removing any
      removed CPU or Node from all cpusets.
      
      However, it is not allowed to make a cpusets cpus or mems become empty.
      They can only transition from empty to non-empty, not back.
      
      So if the last CPU or Node would be removed from a cpuset by the above
      walk, we scan back up the cpuset hierarchy, finding the nearest ancestor
      that still has something online, and copy its CPU or Memory placement.
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Cc: Nathan Lynch <ntl@pobox.com>
      Cc: Anton Blanchard <anton@samba.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      b1aac8bb
    • P
      [PATCH] cpuset: top_cpuset tracks hotplug changes to node_online_map · 38837fc7
      Paul Jackson 提交于
      Change the list of memory nodes allowed to tasks in the top (root) nodeset
      to dynamically track what cpus are online, using a call to a cpuset hook
      from the memory hotplug code.  Make this top cpus file read-only.
      
      On systems that have cpusets configured in their kernel, but that aren't
      actively using cpusets (for some distros, this covers the majority of
      systems) all tasks end up in the top cpuset.
      
      If that system does support memory hotplug, then these tasks cannot make
      use of memory nodes that are added after system boot, because the memory
      nodes are not allowed in the top cpuset.  This is a surprising regression
      over earlier kernels that didn't have cpusets enabled.
      
      One key motivation for this change is to remain consistent with the
      behaviour for the top_cpuset's 'cpus', which is also read-only, and which
      automatically tracks the cpu_online_map.
      
      This change also has the minor benefit that it fixes a long standing,
      little noticed, minor bug in cpusets.  The cpuset performance tweak to
      short circuit the cpuset_zone_allowed() check on systems with just a single
      cpuset (see 'number_of_cpusets', in linux/cpuset.h) meant that simply
      changing the 'mems' of the top_cpuset had no affect, even though the change
      (the write system call) appeared to succeed.  With the following change,
      that write to the 'mems' file fails -EACCES, and the 'mems' file stubbornly
      refuses to be changed via user space writes.  Thus no one should be mislead
      into thinking they've changed the top_cpusets's 'mems' when in affect they
      haven't.
      
      In order to keep the behaviour of cpusets consistent between systems
      actively making use of them and systems not using them, this patch changes
      the behaviour of the 'mems' file in the top (root) cpuset, making it read
      only, and making it automatically track the value of node_online_map.  Thus
      tasks in the top cpuset will have automatic use of hot plugged memory nodes
      allowed by their cpuset.
      
      [akpm@osdl.org: build fix]
      [bunk@stusta.de: build fix]
      Signed-off-by: NPaul Jackson <pj@sgi.com>
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      38837fc7
    • O
      [PATCH] introduce TASK_DEAD state · c394cc9f
      Oleg Nesterov 提交于
      I am not sure about this patch, I am asking Ingo to take a decision.
      
      task_struct->state == EXIT_DEAD is a very special case, to avoid a confusion
      it makes sense to introduce a new state, TASK_DEAD, while EXIT_DEAD should
      live only in ->exit_state as documented in sched.h.
      
      Note that this state is not visible to user-space, get_task_state() masks off
      unsuitable states.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c394cc9f
    • O
      [PATCH] kill PF_DEAD flag · 55a101f8
      Oleg Nesterov 提交于
      After the previous change (->flags & PF_DEAD) <=> (->state == EXIT_DEAD), we
      don't need PF_DEAD any longer.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      55a101f8
    • O
      [PATCH] set EXIT_DEAD state in do_exit(), not in schedule() · 29b88492
      Oleg Nesterov 提交于
      schedule() checks PF_DEAD on every context switch and sets ->state = EXIT_DEAD
      to ensure that the exiting task will be deactivated.  Note that this EXIT_DEAD
      is in fact a "random" value, we can use any bit except normal TASK_XXX values.
      
      It is better to set this state in do_exit() along with PF_DEAD flag and remove
      that check in schedule().
      
      We are safe wrt concurrent try_to_wake_up() (for example ptrace, tkill), it
      can not change task's ->state: the 'state' argument of try_to_wake_up() can't
      have EXIT_DEAD bit.  And in case when try_to_wake_up() sees a stale value of
      ->state == TASK_RUNNING it will do nothing.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      29b88492
    • O
      [PATCH] sys_get_robust_list(): don't take tasklist_lock · aaa2a97e
      Oleg Nesterov 提交于
      use rcu locks for find_task_by_pid().
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      aaa2a97e
    • O
      [PATCH] futex_find_get_task(): don't take tasklist_lock · d359b549
      Oleg Nesterov 提交于
      It is ok to do find_task_by_pid() + get_task_struct() under
      rcu_read_lock(), we cand drop tasklist_lock.
      
      Note that testing of ->exit_state is racy with or without tasklist anyway.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      d359b549
    • O
      [PATCH] copy_process: cosmetic ->ioprio tweak · 5b160f5e
      Oleg Nesterov 提交于
      copy_process:
      // holds tasklist_lock + ->siglock
             /*
              * inherit ioprio
              */
             p->ioprio = current->ioprio;
      
      Why?  ->ioprio was already copied in dup_task_struct().  I guess this is
      needed to ensure that the child can't escape
      sys_ioprio_set(IOPRIO_WHO_{PGRP,USER}), yes?
      
      In that case we don't need ->siglock held, and the comment should be
      updated.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Jens Axboe <axboe@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5b160f5e
    • O
      [PATCH] reparent_to_init(): use has_rt_policy() · 1c573afe
      Oleg Nesterov 提交于
      Remove open-coded has_rt_policy(), no changes in kernel/exit.o
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      1c573afe
    • O
      [PATCH] sched_setscheduler: fix? policy checks · 8dc3e909
      Oleg Nesterov 提交于
      I am not sure this patch is correct: I can't understand what the current
      code does, and I don't know what it was supposed to do.
      
      The comment says:
      
      		 * can't change policy, except between SCHED_NORMAL
      		 * and SCHED_BATCH:
      
      The code:
      
      		if (((policy != SCHED_NORMAL && p->policy != SCHED_BATCH) &&
      			(policy != SCHED_BATCH && p->policy != SCHED_NORMAL)) &&
      
      But this is equivalent to:
      
      		if ( (is_rt_policy(policy) && has_rt_policy(p)) &&
      
      which means something different.  We can't _decrease_ the current
      ->rt_priority with such a check (if rlim[RLIMIT_RTPRIO] == 0).
      
      Probably, it was supposed to be:
      
      		if (	!(policy == SCHED_NORMAL && p->policy == SCHED_BATCH)  &&
      			!(policy == SCHED_BATCH  && p->policy == SCHED_NORMAL)
      
      this matches the comment, but strange: it doesn't allow to _drop_ the
      realtime priority when rlim[RLIMIT_RTPRIO] == 0.
      
      I think the right check would be:
      
      		/* can't set/change rt policy */
      		if (is_rt_policy(policy) &&
      				policy != p->policy &&
      				!rlim_rtprio)
      			return -EPERM;
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8dc3e909
    • O
      [PATCH] introduce is_rt_policy() helper · 57a6f51c
      Oleg Nesterov 提交于
      Imho, makes the code a bit easier to read.
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      57a6f51c
    • O
      [PATCH] do_sched_setscheduler(): don't take tasklist_lock · 5fe1d75f
      Oleg Nesterov 提交于
      Use rcu locks instead. sched_setscheduler() now takes ->siglock
      before reading ->signal->rlim[].
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5fe1d75f
    • C
      [PATCH] kill extraneous printk in kernel_restart() · c9472e0f
      Cal Peake 提交于
      Get rid of an extraneous printk in kernel_restart().
      Signed-off-by: NCal Peake <cp@absolutedigital.net>
      Acked-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      c9472e0f
    • B
      [PATCH] Fix ____call_usermodehelper errors being silently ignored · 111dbe0c
      Björn Steinbrink 提交于
      If ____call_usermodehelper fails, we're not interested in the child
      process' exit value, but the real error, so let's stop wait_for_helper from
      overwriting it in that case.
      
      Issue discovered by Benedikt Böhm while working on a Linux-VServer usermode
      helper.
      Signed-off-by: NBjörn Steinbrink <B.Steinbrink@gmx.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      111dbe0c