1. 26 Sep, 2006 16 commits
    • [PATCH] Disable CPU hotplug during suspend · e3920fb4
      Rafael J. Wysocki committed
      The current suspend code has to be run on one CPU, so we use the CPU
      hotplug to take the non-boot CPUs offline on SMP machines.  However, we
      should also make sure that these CPUs will not be enabled by someone else
      after we have disabled them.
      
      The functions disable_nonboot_cpus() and enable_nonboot_cpus() are moved to
      kernel/cpu.c, because they now refer to some stuff in there that should
      better be static.  Also it's better if disable_nonboot_cpus() returns an
      error instead of panicking if something goes wrong, and
      enable_nonboot_cpus() has no reason to panic(), because the CPUs may have
      been enabled by the userland before it tries to take them online.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
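The idea in the commit above can be sketched as a small user-space model. This is only an illustration, not the kernel code: the names `model_cpu_up`, `model_disable_nonboot_cpus`, `model_enable_nonboot_cpus`, `hotplug_disabled`, `frozen` and `fail_cpu` are all hypothetical, and the error codes are chosen for the example.

```c
#include <errno.h>
#include <stdbool.h>

#define NR_CPUS 4

static bool cpu_online[NR_CPUS] = { true, true, true, true };
static bool frozen[NR_CPUS];     /* CPUs taken offline on behalf of suspend */
static bool hotplug_disabled;    /* set while suspend owns the non-boot CPUs */
static int fail_cpu = -1;        /* hypothetical fault injection for the model */

/* Bring a CPU online, unless suspend has disabled hotplug. */
static int model_cpu_up(int cpu)
{
    if (cpu < 0 || cpu >= NR_CPUS)
        return -EINVAL;
    if (hotplug_disabled)
        return -EBUSY;
    cpu_online[cpu] = true;
    return 0;
}

/* Take every CPU except the boot CPU (0) offline; return an error on failure. */
static int model_disable_nonboot_cpus(void)
{
    for (int cpu = 1; cpu < NR_CPUS; cpu++) {
        if (!cpu_online[cpu])
            continue;
        if (cpu == fail_cpu)
            return -EBUSY;       /* report the failure instead of panicking */
        cpu_online[cpu] = false;
        frozen[cpu] = true;
    }
    hotplug_disabled = true;     /* nobody else may enable CPUs from now on */
    return 0;
}

/* Re-enable hotplug and bring back only the CPUs that suspend took down. */
static void model_enable_nonboot_cpus(void)
{
    hotplug_disabled = false;
    for (int cpu = 1; cpu < NR_CPUS; cpu++) {
        if (!frozen[cpu])
            continue;
        frozen[cpu] = false;
        cpu_online[cpu] = true;  /* harmless if the CPU is already online */
    }
}
```

In this model, a `model_cpu_up()` call made between the disable and enable steps is refused, which is the guarantee the commit message asks for.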
    • [PATCH] swsusp: struct snapshot_handle cleanup · fb13a28b
      Rafael J. Wysocki committed
      Add comments describing struct snapshot_handle and its members, and change
      the confusing name of its member 'page' to 'cur'.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: clean up browsing of pfns · ae83c5ee
      Rafael J. Wysocki committed
      Clean up some loops over pfns for each zone in snapshot.c: reduce the
      number of additions to perform, rework detection of saveable pages and make
      the code a bit less difficult to understand, hopefully.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: read speedup · 546e0d27
      Andrew Morton committed
      Implement async reads for swsusp resuming.
      
      Crufty old PIII testbox:
      	15.7 MB/s -> 20.3 MB/s
      
      Sony Vaio:
      	14.6 MB/s -> 33.3 MB/s
      
      I didn't implement the post-resume bio_set_pages_dirty().  I don't really
      understand why resume needs to run set_page_dirty() against these pages.
      
      It might be a worry that this code modifies PG_Uptodate, PG_Error and
      PG_Locked against the image pages.  Can this possibly affect the resumed-into
      kernel?  Hopefully not, if we're atomically restoring its mem_map?
      
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Jens Axboe <axboe@suse.de>
      Cc: Laurent Riffard <laurent.riffard@free.fr>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: add read-speed instrumentation · 8c002494
      Andrew Morton committed
      Add some instrumentation to the swsusp readin code to show what bandwidth
      we're achieving.
      
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: write speedup · ab954160
      Andrew Morton committed
      Switch the swsusp writeout code from 4k-at-a-time to 4MB-at-a-time.
      
      Crufty old PIII testbox:
      	12.9 MB/s -> 20.9 MB/s
      
      Sony Vaio:
      	14.7 MB/s -> 26.5 MB/s
      
      The implementation is crude.  A better one would use larger BIOs, but wouldn't
      gain any performance.
      
      The memcpys will be mostly pipelined with the IO and basically come for free.
      
      The ENOMEM path has not been tested.  It should be.
      
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] swsusp: add write-speed instrumentation · 3a4f7577
      Andrew Morton committed
      Add some instrumentation to the swsusp writeout code to show what bandwidth
      we're achieving.
      
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] FRV: permit __do_IRQ() to be dispensed with · af8c65b5
      David Howells committed
      Permit __do_IRQ() to be dispensed with based on a configuration option.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] selinux: rename selinux_ctxid_to_string · 1a70cd40
      Stephen Smalley committed
      Rename selinux_ctxid_to_string to selinux_sid_to_string to be
      consistent with other interfaces.
      Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] selinux: eliminate selinux_task_ctxid · 62bac018
      Stephen Smalley committed
      Eliminate selinux_task_ctxid since it duplicates selinux_task_get_sid.
      Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] NUMA: Add zone_to_nid function · 89fa3024
      Christoph Lameter committed
      There are many places where we need to determine the node of a zone.
      Currently we use a difficult-to-read sequence of pointer dereferences.
      Put that into an inline function and use it throughout the VM.  Maybe we
      can find a way to optimize the lookup in the future.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
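A helper of this kind might look roughly like the sketch below. The two structures are heavily simplified stand-ins for the kernel's `struct zone` and `struct pglist_data` (only the fields needed for the lookup are modelled), so treat the layout as an assumption made for the example.

```c
/* Simplified stand-ins for the kernel structures; real ones have many more fields. */
struct pglist_data {
    int node_id;                     /* NUMA node this data belongs to */
};

struct zone {
    struct pglist_data *zone_pgdat;  /* back-pointer to the owning node */
};

/* Wrap the pointer chase in one readable, inlinable helper. */
static inline int zone_to_nid(const struct zone *zone)
{
    return zone->zone_pgdat->node_id;
}
```

Callers then write `zone_to_nid(z)` instead of spelling out the dereference chain, which also leaves a single place to change if the lookup is optimized later.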
    • [PATCH] zone_reclaim: dynamic slab reclaim · 0ff38490
      Christoph Lameter committed
      Currently one can enable slab reclaim by setting an explicit option in
      /proc/sys/vm/zone_reclaim_mode.  Slab reclaim is then used as a final
      option if the freeing of unmapped file backed pages is not enough to free
      enough pages to allow a local allocation.
      
      However, that means that the slab can grow excessively and that most memory
      of a node may be used by slabs.  We have had a case where a machine with
      46GB of memory was using 40-42GB for slab.  Zone reclaim was effective in
      dealing with pagecache pages.  However, slab reclaim was only done during
      global reclaim (which is a bit rare on NUMA systems).
      
      This patch implements slab reclaim during zone reclaim.  Zone reclaim
      occurs if there is a danger of an off node allocation.  At that point we
      
      1. Shrink the per node page cache if the number of pagecache
         pages is more than min_unmapped_ratio percent of pages in a zone.
      
      2. Shrink the slab cache if the number of the node's reclaimable slab pages
         (this patch depends on an earlier one that implements that counter)
         is more than min_slab_ratio (a new /proc/sys/vm tunable).
      
      The shrinking of the slab cache is a bit problematic since it is not node
      specific.  So we simply calculate what point in the slab we want to reach
      (current per node slab use minus the number of pages that need to be
      allocated) and then repeatedly run the global reclaim until that is
      unsuccessful or we have reached the limit.  I hope we will have zone based
      slab reclaim at some point which will make that easier.
      
      The default for min_slab_ratio is 5%.
      
      Also remove the slab option from /proc/sys/vm/zone_reclaim_mode.
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
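The two threshold checks and the slab target described above can be sketched as follows. This is a simplified model, not the kernel implementation: `struct zone_model`, its fields and all function names are hypothetical, and the ratios are treated as plain percentages of the zone's pages.

```c
#include <stdbool.h>

/* Hypothetical per-zone counters, in pages. */
struct zone_model {
    unsigned long present_pages;
    unsigned long unmapped_pagecache_pages;
    unsigned long reclaimable_slab_pages;
};

/* Convert a percentage tunable into a page count for this zone. */
static unsigned long ratio_to_pages(const struct zone_model *z, unsigned int percent)
{
    return z->present_pages * percent / 100;
}

/* Step 1: shrink the page cache only above min_unmapped_ratio. */
static bool should_shrink_pagecache(const struct zone_model *z,
                                    unsigned int min_unmapped_ratio)
{
    return z->unmapped_pagecache_pages > ratio_to_pages(z, min_unmapped_ratio);
}

/* Step 2: shrink the slab only above min_slab_ratio. */
static bool should_shrink_slab(const struct zone_model *z,
                               unsigned int min_slab_ratio)
{
    return z->reclaimable_slab_pages > ratio_to_pages(z, min_slab_ratio);
}

/* Target slab size: current use minus the pages that need to be allocated. */
static unsigned long slab_target(const struct zone_model *z, unsigned long nr_pages)
{
    if (nr_pages >= z->reclaimable_slab_pages)
        return 0;
    return z->reclaimable_slab_pages - nr_pages;
}
```

A caller in this model would keep running the global slab shrinker until the node's slab use drops to `slab_target()` or the shrinker stops making progress.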
    • [PATCH] Profiling: require buffer allocation on the correct node · fbd98167
      Christoph Lameter committed
      Profiling really suffers with off node buffers.  Fail if no memory is
      available on the nodes.  The profiling code can deal with these failures
      should they occur.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Add __GFP_THISNODE to avoid fallback to other nodes and ignore cpuset/memory policy restrictions · 9b819d20
      Christoph Lameter committed
      Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes.  This
      flag is essential if a kernel component requires memory to be located on a
      certain node.  It will be needed for alloc_pages_node() to force allocation
      on the indicated node and for alloc_pages() to force allocation on the
      current node.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
    • [PATCH] Fix longstanding load balancing bug in the scheduler · 0a2966b4
      Christoph Lameter committed
      The scheduler will stop load balancing if the most busy processor contains
      processes pinned via processor affinity.
      
      The scheduler currently only does one search for busiest cpu.  If it cannot
      pull any tasks away from the busiest cpu because they were pinned then the
      scheduler goes into a corner and sulks leaving the idle processors idle.
      
      For example, if processor 0 is busy running four tasks pinned via taskset,
      processor 1 is idle, and two processes have just been started on
      processor 2, then the scheduler will not move one of the two processes away
      from processor 2.
      
      This patch fixes that issue by forcing the scheduler to come out of its
      corner and retrying the load balancing by considering other processors for
      load balancing.
      
      This patch was originally developed by John Hawkes and discussed at
      
          http://marc.theaimsgroup.com/?l=linux-kernel&m=113901368523205&w=2.
      
      I have removed extraneous material and gone back to equipping struct rq
      with the cpu the queue is associated with since this makes the patch much
      easier and it is likely that others in the future will have the same
      difficulty of figuring out which processor owns which runqueue.
      
      The overhead added through these patches is a single word on the stack if
      the kernel is configured to support 32 cpus or less (32 bit).  For 32 bit
      environments the maximum number of cpus that can be configured is 255 which
      would result in the use of 32 additional bytes on the stack.  On IA64 up to
      1k cpus can be configured which will result in the use of 128 additional
      bytes on the stack.  The maximum additional cache footprint is one
      cacheline.  Typically memory use will be much less than a cacheline and the
      additional cpumask will be placed on the stack in a cacheline that already
      contains other local variables.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: John Hawkes <hawkes@sgi.com>
      Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Williams <pwil3058@bigpond.net.au>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
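The retry behaviour described in this commit can be modelled in user space roughly as below. Everything here is an assumption made for illustration: `struct rq_model`, `find_busiest`, `balance_from`, the three-CPU setup and the "difference of at least two tasks" imbalance rule are not taken from the kernel. The local `mask` plays the role of the additional on-stack cpumask mentioned in the message.

```c
#define NR_CPUS 3

/* Hypothetical per-CPU run queue summary. */
struct rq_model {
    int nr_running;   /* tasks queued on this CPU */
    int nr_pinned;    /* of those, tasks that may not migrate */
};

/* Return the busiest CPU still set in mask (excluding this_cpu), or -1. */
static int find_busiest(const struct rq_model *rq, unsigned int mask, int this_cpu)
{
    int busiest = -1;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (cpu == this_cpu || !(mask & (1u << cpu)))
            continue;
        /* Pulling only helps if the other CPU has at least two more tasks. */
        if (rq[cpu].nr_running - rq[this_cpu].nr_running < 2)
            continue;
        if (busiest < 0 || rq[cpu].nr_running > rq[busiest].nr_running)
            busiest = cpu;
    }
    return busiest;
}

/* Pull one task to this_cpu; returns the source CPU, or -1 if nothing moved. */
static int balance_from(struct rq_model *rq, int this_cpu)
{
    unsigned int mask = (1u << NR_CPUS) - 1;   /* all CPUs are candidates */

    for (;;) {
        int busiest = find_busiest(rq, mask, this_cpu);

        if (busiest < 0)
            return -1;
        if (rq[busiest].nr_running > rq[busiest].nr_pinned) {
            rq[busiest].nr_running--;          /* migrate one unpinned task */
            rq[this_cpu].nr_running++;
            return busiest;
        }
        /* Every task there is pinned: drop this CPU and search again. */
        mask &= ~(1u << busiest);
    }
}
```

With the scenario from the message (four pinned tasks on processor 0, none on processor 1, two on processor 2), a single search would stop at processor 0, whereas the retry moves on to processor 2.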
    • [PATCH] load_module: no BUG if module_subsys uninitialized · 1cc5f714
      Ed Swierk committed
      Invoking load_module() before param_sysfs_init() is called crashes in
      mod_sysfs_setup(), since the kset in module_subsys is not initialized yet.
      
      In my case, net-pf-1 is getting modprobed as a result of hotplug trying to
      create a UNIX socket.  Calls to hotplug begin after the topology_init
      initcall.
      
      Another patch for the same symptom (module_subsys-initialize-earlier.patch)
      moves param_sysfs_init() to the subsys initcalls, but this is still not
      early enough in the boot process in some cases.  In particular,
      topology_init() causes /sbin/hotplug to run, which requests net-pf-1 (the
      UNIX socket protocol) which can be compiled as a module.  Moving
      param_sysfs_init() to the postcore initcalls fixes this particular race,
      but there might well be other cases where a usermodehelper causes a module
      to load earlier still.
      
      The patch makes load_module() return an error rather than crashing the
      kernel if invoked before module_subsys is initialized.
      
      Cc: Mark Huang <mlhuang@cs.princeton.edu>
      Cc: Greg KH <greg@kroah.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
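The shape of such a guard can be sketched as follows. This is a user-space model only: `struct kset_model`, `module_subsys_model` and `model_mod_sysfs_setup` are hypothetical names, and the `-EINVAL` return value is an assumption chosen for the example rather than something stated in the commit message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kset that param_sysfs_init() sets up. */
struct kset_model {
    bool initialized;
};

static struct kset_model module_subsys_model;

/* Refuse to register a module in sysfs before the subsystem exists. */
static int model_mod_sysfs_setup(const char *name)
{
    if (name == NULL)
        return -EINVAL;
    if (!module_subsys_model.initialized)
        return -EINVAL;   /* fail the load instead of crashing */
    return 0;             /* sysfs registration would happen here */
}
```

An early request such as the `net-pf-1` load from the message would then fail with an error and could be retried once initialization has run.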
  2. 23 Sep, 2006 1 commit
  3. 19 Sep, 2006 1 commit
    • [PATCH] genirq core: fix handle_level_irq() · 86998aa6
      Ingo Molnar committed
      while porting the -rt tree to 2.6.18-rc7 i noticed the following
      screaming-IRQ scenario on an SMP system:
      
       2274  0Dn.:1 0.001ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.010ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.020ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.029ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.039ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.048ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.058ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.068ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.077ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.087ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
       2274  0Dn.:1 0.097ms: do_IRQ+0xc/0x103  <= (ret_from_intr+0x0/0xf)
      
      as it turns out, the bug is caused by handle_level_irq(), which if it
      races with another CPU already handling this IRQ, it _unmasks_ the IRQ
      line on the way out. This is not how 2.6.17 works, and we introduced
      this bug in one of the early genirq cleanups right before it went into
      -mm. (the bug was not in the genirq patchset for a long time, and we
      didn't notice the bug due to the lack of -rt rebase to the new genirq
      code. -rt, and hardirq-preemption in particular opens up such races much
      wider than anything else.)
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
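The race described above can be illustrated with a small model of a level-triggered flow handler. It is a sketch under stated assumptions: `struct irq_model` and `model_handle_level_irq` are hypothetical, locking is omitted, and the "other CPU" is represented only by the `in_progress` flag.

```c
#include <stdbool.h>

/* Hypothetical state of one level-triggered interrupt line. */
struct irq_model {
    bool masked;          /* line is masked at the interrupt controller */
    bool in_progress;     /* another CPU is currently running the handler */
    unsigned int handled; /* number of times the handler actually ran */
};

static void model_handle_level_irq(struct irq_model *irq)
{
    irq->masked = true;   /* mask (and ack) the line on entry */

    /*
     * If another CPU is already handling this IRQ, leave the line masked.
     * Unmasking here would let the still-asserted level interrupt fire
     * again immediately, producing the screaming-IRQ trace above.
     */
    if (irq->in_progress)
        return;

    irq->in_progress = true;
    irq->handled++;       /* run the device handler */
    irq->in_progress = false;

    irq->masked = false;  /* unmask only after the handler has run */
}
```

In this model the CPU that loses the race exits with the line still masked, and the CPU that owns the handler is the one that unmasks it.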
  4. 17 Sep, 2006 2 commits
  5. 13 Sep, 2006 2 commits
  6. 12 Sep, 2006 4 commits
  7. 09 Sep, 2006 1 commit
    • [PATCH] Use the correct restart option for futex_lock_pi · c5780e97
      Thomas Gleixner committed
      The current implementation of futex_lock_pi returns -ERESTART_RESTARTBLOCK
      when the lock operation has been interrupted by a signal.  This results in
      a return of -EINTR to userspace if there is a handler for the signal.  This
      is wrong, because userspace expects that the lock function does not return
      in any case of signal delivery.
      
      This was not caught by my insufficient test case, but triggered a nasty
      userspace problem in a high-load application scenario.  Unfortunately,
      glibc does not check for this invalid return value either.
      
      Using -ERESTARTNOINTR makes sure that the interrupted syscall is restarted.
      The restart block related code can be safely removed, as the possible
      timeout argument is an absolute time value.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  8. 07 Sep, 2006 3 commits
  9. 03 Sep, 2006 1 commit
  10. 02 Sep, 2006 2 commits
  11. 28 Aug, 2006 5 commits
  12. 15 Aug, 2006 2 commits
    • [PATCH] workqueue: remove lock_cpu_hotplug() · 9b41ea72
      Andrew Morton committed
      Use a private lock instead.  It protects all per-cpu data structures in
      workqueue.c, including the workqueues list.
      
      Fix a bug in schedule_on_each_cpu(): it was forgetting to lock down the
      per-cpu resources.
      
      Unfixed long-standing bug: if someone unplugs the CPU identified by
      `singlethread_cpu' the kernel will get very sick.
      
      Cc: Dave Jones <davej@codemonkey.org.uk>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
    • [PATCH] futex_handle_fault always fails · e579dcbf
      john stultz committed
      We found this issue last week w/ the -RT kernel, but it seems the same
      issue is in mainline as well.
      
      Basically it is possible for futex_unlock_pi to return without actually
      freeing the lock.  This is due to buggy logic in the use of
      futex_handle_fault() and its attempt argument in a failure case.
      
      Looking at futex.c the logic is as follows:
      
      1) In futex_unlock_pi() we start w/ ret=0 and we go down to the first
         futex_atomic_cmpxchg_inatomic(), where we find uval==-EFAULT.  We then
         jump to the pi_faulted label.
      
      2) From pi_faulted: We increment attempt, unlock the sem and hit the
         retry label.
      
      3) From the retry label, with ret still zero, we again hit EFAULT on the
         first futex_atomic_cmpxchg_inatomic(), and again goto the pi_faulted
         label.
      
      4) Again from pi_faulted: we increment attempt and enter the
         conditional, where we call futex_handle_fault.
      
      5) futex_handle_fault fails, and we goto the out_unlock_release_sem
         label.
      
      6) From out_unlock_release_sem we return, and since ret is still zero,
         we return without error, while never actually unlocking the lock.
      
      Issue #1: at the first futex_atomic_cmpxchg_inatomic() we should probably
      be setting ret=-EFAULT before jumping to pi_faulted: However in our case
      this doesn't really affect anything, as the glibc we're using ignores the
      error value from futex_unlock_pi().
      
      Issue #2: Look at futex_handle_fault(): its first conditional will return
      -EFAULT if attempt is >= 2.  However, from the "if(attempt++)
      futex_handle_fault(attempt)" logic above, we'll *never* call
      futex_handle_fault when attempt is less than two.  So we never get a chance
      to even try to fault the page in.
      
      The following patch addresses these two issues by 1) Always setting ret to
      -EFAULT if futex_handle_fault fails, and 2) Removing the = in
      futex_handle_fault's (attempt >= 2) check.
      
      I'm really not sure this is the right fix, but wanted to bring it up so
      folks knew the issue is alive and well in the current -git tree.  From
      looking at the git logs the logic was first introduced (then later copied
      to other places) in the following commit almost a year ago:
      
      http://www.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=4732efbeb997189d9f9b04708dc26bf8613ed721;hp=5b039e681b8c5f30aac9cc04385cc94be45d0823
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
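The off-by-one in Issue #2 can be reproduced with a small user-space model of the retry loop. The names `model_handle_fault` and `model_unlock_pi` are hypothetical, the fault-in itself is simulated by a flag, and the model already returns -EFAULT on failure (that is, it assumes the fix for Issue #1 is applied) so that the two variants of the attempt check can be compared directly.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Stand-in for futex_handle_fault(): gives up once too many attempts were
 * made. inclusive_check selects the original ">= 2" test; otherwise "> 2".
 */
static int model_handle_fault(int attempt, bool inclusive_check)
{
    if (inclusive_check ? attempt >= 2 : attempt > 2)
        return -EFAULT;
    return 0;   /* pretend the page was faulted in successfully */
}

/* Model of the unlock path: the user page stays absent until faulted in. */
static int model_unlock_pi(bool inclusive_check, bool *unlocked)
{
    int attempt = 0;
    bool page_present = false;

    *unlocked = false;
retry:
    if (page_present) {
        *unlocked = true;   /* the cmpxchg on the user address succeeds */
        return 0;
    }
    /* pi_faulted: the first fault only retries, later ones call the handler */
    if (attempt++) {
        if (model_handle_fault(attempt, inclusive_check))
            return -EFAULT; /* handler refused; the lock is never released */
        page_present = true;
    }
    goto retry;
}
```

Because `attempt` has already been incremented to 2 by the time the handler is first called, the inclusive check rejects every call, while the exclusive check allows exactly one real fault-in attempt.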