1. 08 12月, 2006 40 次提交
    • R
      [PATCH] Use freezeable workqueues in XFS · 58e14b14
      Rafael J. Wysocki 提交于
      Make the workqueues used by XFS freezeable, so their worker threads don't
      submit any I/O after the suspend image has been created.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Cc: David Chinner <dgc@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      58e14b14
    • R
      [PATCH] Support for freezeable workqueues · 341a5958
      Rafael J. Wysocki 提交于
      Make it possible to create a workqueue the worker thread of which will be
      frozen during suspend, along with other kernel threads.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Cc: David Chinner <dgc@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      341a5958
    • P
      [PATCH] swsusp: kill write-only variable · 5045cfc1
      Pavel Machek 提交于
      Cleanup write-only variable, suggested by D Binderman.
      Signed-off-by: NPavel Machek <pavel@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5045cfc1
    • R
      [PATCH] PM: Fix swsusp debug mode testproc · 2d87595e
      Rafael J. Wysocki 提交于
      The 'testproc' swsusp debug mode thaws tasks twice in a row, which is _very_
      confusing.  Fix that.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2d87595e
    • P
      [PATCH] s2ram debugging documentation · 06df6a5c
      Pavel Machek 提交于
      Linus posted quite nice TRACE_RESUME how-to, and I think it is too nice to
      be hidden in archives of mailing list, so I turned it into Documentation
      piece.
      Signed-off-by: NPavel Machek <pavel@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      06df6a5c
    • R
      [PATCH] swsusp: Fix labels · 59a49335
      Rafael J. Wysocki 提交于
      Move all labels in the swsusp code to the second column, so that they won't
      fool diff -p.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      59a49335
    • R
      [PATCH] swsusp: Fix coding style in suspend.c · 5b6d15de
      Rafael J. Wysocki 提交于
      Fix coding style in suspend.c.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5b6d15de
    • R
      [PATCH] swsusp: Untangle freeze_processes · 11b2ce2b
      Rafael J. Wysocki 提交于
      Move the loop from freeze_processes() to a separate function and call it
      independently for user space processes and kernel threads so that the order
      of freezing tasks is clearly visible.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      11b2ce2b
    • R
      [PATCH] swsusp: Untangle thaw_processes · a9b6f562
      Rafael J. Wysocki 提交于
      Move the loop from thaw_processes() to a separate function and call it
      independently for kernel threads and user space processes so that the order
      of thawing tasks is clearly visible.
      
      Drop thaw_kernel_threads() which is never used.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a9b6f562
    • S
      [PATCH] convert pm_sem to a mutex · a6d70980
      Stephen Hemminger 提交于
      The power management semaphore is only used as mutex, so convert it.
      
      [akpm@osdl.org: fix rotten bug]
      Signed-off-by: NStephen Hemminger <shemminger@osdl.org>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      a6d70980
    • R
      [PATCH] suspend to disk fails if gdb is suspended with a traced child · 3eb1b3a4
      Rafael J. Wysocki 提交于
      Fix http://bugzilla.kernel.org/show_bug.cgi?id=7534
      
      Fix the freezing of processes so that it won't fail if there is a traced
      process the parent of which has been stopped.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Cc: maurice barnum <pixi+kbug@burble.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3eb1b3a4
    • R
      [PATCH] swsusp: Measure memory shrinking time · 0d3a9abe
      Rafael J. Wysocki 提交于
      Make swsusp measure and print the time needed to shrink memory during the
      suspend.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      0d3a9abe
    • S
      [PATCH] suspend: don't change cpus_allowed for task initiating the suspend · 112cecb2
      Siddha, Suresh B 提交于
      Don't modify the cpus_allowed of the task initiating the suspend.
      _cpu_down() already makes sure that the task doing the suspend doesn't run
      on dying cpu.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Nigel Cunningham <nigel@suspend2.net>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      112cecb2
    • R
      [PATCH] swsusp: Support i386 systems with PAE or without PSE · 2d4a34c9
      Rafael J. Wysocki 提交于
      Make swsusp support i386 systems with PAE or without PSE.
      
      This is done by creating temporary page tables located in resume-safe page
      frames before the suspend image is restored in the same way as x86_64 does
      it.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Nigel Cunningham <ncunningham@linuxmail.org>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      2d4a34c9
    • N
      [PATCH] swsusp: thaw userspace and kernel space separately · ff39593a
      Nigel Cunningham 提交于
      Modify process thawing so that we can thaw kernel space without thawing
      userspace, and thaw kernelspace first.  This will be useful in later
      patches, where I intend to get swsusp thawing kernel threads only before
      seeking to free memory.
      Signed-off-by: NNigel Cunningham <nigel@suspend2.net>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ff39593a
    • N
      [PATCH] swsusp: clean up whitespace in freezer output · 14b5b7cf
      Nigel Cunningham 提交于
      Minor whitespace and formatting modifications for the freezer.
      Signed-off-by: NNigel Cunningham <nigel@suspend2.net>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      14b5b7cf
    • N
      [PATCH] swsusp: quieten Freezer if !CONFIG_PM_DEBUG · 32d50f57
      Nigel Cunningham 提交于
      The freezer currently prints an '=' for every process that is frozen.  This
      is pretty pointless, as the equals sign says nothing about which process is
      frozen, and makes logs look messier (especially if there were a large
      number of processes running).  All we really need to know is that we
      started trying to freeze processes and what processes (if any) failed to
      freeze, or that we succeeded.
      Signed-off-by: NNigel Cunningham <nigel@suspend2.net>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      32d50f57
    • N
      [PATCH] Add include/linux/freezer.h and move definitions from sched.h · 7dfb7103
      Nigel Cunningham 提交于
      Move process freezing functions from include/linux/sched.h to freezer.h, so
      that modifications to the freezer or the kernel configuration don't require
      recompiling just about everything.
      
      [akpm@osdl.org: fix ueagle driver]
      Signed-off-by: NNigel Cunningham <nigel@suspend2.net>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7dfb7103
    • S
      [PATCH] swsusp: fix platform mode · 8a05aac2
      Stefan Seyfried 提交于
      At some point after 2.6.13, in-kernel software suspend got "incomplete" for
      the so-called "platform" mode.  pm_ops->prepare() is never called.  A
      visible sign of this is the "moon" light on thinkpads not flashing during
      suspend.  Fix by readding the pm_ops->prepare call during suspend.
      Signed-off-by: NStefan Seyfried <seife@suse.de>
      Acked-by: N"Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8a05aac2
    • R
      [PATCH] swsusp: use __GFP_WAIT · 85949121
      Rafael J. Wysocki 提交于
      swsusp uses GFP_ATOMIC, but it can afford to use __GFP_WAIT, which will
      permit it to reclaim clean pagecache instead of emitting scary
      page-allocation-failure messages.
      
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      85949121
    • R
      [PATCH] swsusp: Improve handling of highmem · 8357376d
      Rafael J. Wysocki 提交于
      Currently swsusp saves the contents of highmem pages by copying them to the
      normal zone which is quite inefficient (eg.  it requires two normal pages
      to be used for saving one highmem page).  This may be improved by using
      highmem for saving the contents of saveable highmem pages.
      
      Namely, during the suspend phase of the suspend-resume cycle we try to
      allocate as many free highmem pages as there are saveable highmem pages.
      If there are not enough highmem image pages to store the contents of all of
      the saveable highmem pages, some of them will be stored in the "normal"
      memory.  Next, we allocate as many free "normal" pages as needed to store
      the (remaining) image data.  We use a memory bitmap to mark the allocated
      free pages (ie.  highmem as well as "normal" image pages).
      
      Now, we use another memory bitmap to mark all of the saveable pages
      (highmem as well as "normal") and the contents of the saveable pages are
      copied into the image pages.  Then, the second bitmap is used to save the
      pfns corresponding to the saveable pages and the first one is used to save
      their data.
      
      During the resume phase the pfns of the pages that were saveable during the
      suspend are loaded from the image and used to mark the "unsafe" page
      frames.  Next, we try to allocate as many free highmem page frames as to
      load all of the image data that had been in the highmem before the suspend
      and we allocate so many free "normal" page frames that the total number of
      allocated free pages (highmem and "normal") is equal to the size of the
      image.  While doing this we have to make sure that there will be some extra
      free "normal" and "safe" page frames for two lists of PBEs constructed
      later.
      
      Now, the image data are loaded, if possible, into their "original" page
      frames.  The image data that cannot be written into their "original" page
      frames are loaded into "safe" page frames and their "original" kernel
      virtual addresses, as well as the addresses of the "safe" pages containing
      their copies, are stored in one of two lists of PBEs.
      
      One list of PBEs is for the copies of "normal" suspend pages (ie.  "normal"
      pages that were saveable during the suspend) and it is used in the same way
      as previously (ie.  by the architecture-dependent parts of swsusp).  The
      other list of PBEs is for the copies of highmem suspend pages.  The pages
      in this list are restored (in a reversible way) right before the
      arch-dependent code is called.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      8357376d
    • R
      [PATCH] swsusp: update userland interface documentation · bf73bae6
      Rafael J. Wysocki 提交于
      The swsusp userland interface has recently changed for a couple of times, but
      the changes have not been documented.  Fix this, and document the
      SNAPSHOT_SET_SWAP_AREA ioctl().
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      bf73bae6
    • R
      [PATCH] swsusp: add ioctl for swap files support · 37b2ba12
      Rafael J. Wysocki 提交于
      To be able to use swap files as suspend storage from the userland suspend
      tools we need an additional ioctl() that will allow us to provide the kernel
      with both the swap header's offset and the identification of the resume
      partition.
      
      The new ioctl() should be regarded as a replacement for the
      SNAPSHOT_SET_SWAP_FILE ioctl() that from now on will be considered as
      obsolete, but has to stay for backwards compatibility of the interface.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      37b2ba12
    • R
      [PATCH] swsusp: document support for swap files · ecbd0da1
      Rafael J. Wysocki 提交于
      Document the "resume_offset=" command line parameter as well as the way in
      which swap files are supported by swsusp.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      ecbd0da1
    • R
      [PATCH] swsusp: add resume_offset command line parameter · 9a154d9d
      Rafael J. Wysocki 提交于
      Add the kernel command line parameter "resume_offset=" allowing us to specify
      the offset, in <PAGE_SIZE> units, from the beginning of the partition pointed
      to by the "resume=" parameter at which the swap header is located.
      
      This offset can be determined, for example, by an application using the FIBMAP
      ioctl to obtain the swap header's block number for given file.
      
      [akpm@osdl.org: we don't know what type sector_t is]
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      9a154d9d
    • R
      [PATCH] swsusp: use block device offsets to identify swap locations · 3aef83e0
      Rafael J. Wysocki 提交于
      Make swsusp use block device offsets instead of swap offsets to identify swap
      locations and make it use the same code paths for writing as well as for
      reading data.
      
      This allows us to use the same code for handling swap files and swap
      partitions and to simplify the code, eg.  by dropping rw_swap_page_sync().
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3aef83e0
    • R
      [PATCH] swsusp: rearrange swap-handling code · 3fc6b34f
      Rafael J. Wysocki 提交于
      Rearrange the code in kernel/power/swap.c so that the next patch is more
      readable.
      
      [This patch only moves the existing code.]
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3fc6b34f
    • R
      [PATCH] swsusp: use partition device and offset to identify swap areas · 915bae9e
      Rafael J. Wysocki 提交于
      The Linux kernel handles swap files almost in the same way as it handles swap
      partitions and there are only two differences between these two types of swap
      areas:
      
      (1) swap files need not be contiguous,
      
      (2) the header of a swap file is not in the first block of the partition
          that holds it.  From the swsusp's point of view (1) is not a problem,
          because it is already taken care of by the swap-handling code, but (2) has
          to be taken into consideration.
      
      In principle the location of a swap file's header may be determined with the
      help of appropriate filesystem driver.  Unfortunately, however, it requires
      the filesystem holding the swap file to be mounted, and if this filesystem is
      journaled, it cannot be mounted during a resume from disk.  For this reason we
      need some other means by which swap areas can be identified.
      
      For example, to identify a swap area we can use the partition that holds the
      area and the offset from the beginning of this partition at which the swap
      header is located.
      
      The following patch allows swsusp to identify swap areas this way.  It changes
      swap_type_of() so that it takes an additional argument representing an offset
      of the swap header within the partition represented by its first argument.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      915bae9e
    • S
      [PATCH] uswsusp: add pmops->{prepare,enter,finish} support (aka "platform mode") · 3592695c
      Stefan Seyfried 提交于
      Add an ioctl to the userspace swsusp code that enables the usage of the
      pmops->prepare, pmops->enter and pmops->finish methods (the in-kernel
      suspend knows these as "platform method").  These are needed on many
      machines to (among others) speed up resuming by letting the BIOS skip some
      steps or let my hp nx5000 recognise the correct ac_adapter state after
      resume again.
      
      It also ensures on many machines, that changed hardware (unplugged AC
      adapters) gets correctly detected and that kacpid does not run wild after
      resume.
      Signed-off-by: NStefan Seyfried <seife@suse.de>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3592695c
    • A
      [PATCH] alpha: switch to pci_get API · 074cec54
      Alan Cox 提交于
      Now that we have pci_get_bus_and_slot we can do the job correctly.  Note that
      some of these calls intentionally leak a device - this is because the device
      in question is always needed from boot to reboot.
      Signed-off-by: NAlan Cox <alan@redhat.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      074cec54
    • M
      [PATCH] h8300 stray bracket fix · 3869aa29
      Mariusz Kozlowski 提交于
      Signed-off-by: NMariusz Kozlowski <m.kozlowski@tuxland.pl>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3869aa29
    • P
      [PATCH] avr32: fixup kprobes preemption handling · 4d3eeeac
      Paul Mundt 提交于
      While working on SH kprobes, I noticed that avr32 got the preemption
      handling wrong in the no probe case.  The idea is that upon entry of
      kprobe_handler() preemption is disabled outright across the life of the
      kprobe, only to be re-enabled in post_kprobe_handler().
      
      However, in the event that the probe is never activated, there's never any
      chance of hitting the post probe handler, which allows for the current
      avr32 implementation to disable preemption indefinitely, as it's currently
      missing a re-enable when no probe is activated.
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4d3eeeac
    • A
      [PATCH] arch/frv/kernel/futex.c must #include <linux/uaccess.h> · e9c1528a
      Adrian Bunk 提交于
      This patch fixes the following compile error with
      -Werror-implicit-function-declaration
      (without -Werror-implicit-function-declaration it's a link error):
      
          ...
            CC      arch/frv/kernel/futex.o
          /home/bunk/linux/kernel-2.6/linux-2.6.19-rc6-mm2/arch/frv/kernel/futex.c:
          In function 'futex_atomic_op_inuser':
          /home/bunk/linux/kernel-2.6/linux-2.6.19-rc6-mm2/arch/frv/kernel/futex.c:203:
          error: implicit declaration of function 'pagefault_disable'
          /home/bunk/linux/kernel-2.6/linux-2.6.19-rc6-mm2/arch/frv/kernel/futex.c:226:
          error: implicit declaration of function 'pagefault_enable'
          make[2]: *** [arch/frv/kernel/futex.o] Error 1
          ...
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e9c1528a
    • E
      48ad504e
    • N
      [PATCH] radix-tree: RCU lockless readside · 7cf9c2c7
      Nick Piggin 提交于
      Make radix tree lookups safe to be performed without locks.  Readers are
      protected against nodes being deleted by using RCU based freeing.  Readers
      are protected against new node insertion by using memory barriers to ensure
      the node itself will be properly written before it is visible in the radix
      tree.
      
      Each radix tree node keeps a record of their height (above leaf nodes).
      This height does not change after insertion -- when the radix tree is
      extended, higher nodes are only inserted in the top.  So a lookup can take
      the pointer to what is *now* the root node, and traverse down it even if
      the tree is concurrently extended and this node becomes a subtree of a new
      root.
      
      "Direct" pointers (tree height of 0, where root->rnode points directly to
      the data item) are handled by using the low bit of the pointer to signal
      whether rnode is a direct pointer or a pointer to a radix tree node.
      
      When a reader wants to traverse the next branch, they will take a copy of
      the pointer.  This pointer will be either NULL (and the branch is empty) or
      non-NULL (and will point to a valid node).
      
      [akpm@osdl.org: cleanups]
      [Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications]
      [clameter@sgi.com: build fix]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7cf9c2c7
    • A
      [PATCH] Save some bytes in struct mm_struct · 36de6437
      Arnaldo Carvalho de Melo 提交于
      Before:
      [acme@newtoy net-2.6.20]$ pahole --cacheline 32 kernel/sched.o mm_struct
      
      /* include2/asm/processor.h:542 */
      struct mm_struct {
              struct vm_area_struct *    mmap;                 /*     0     4 */
              struct rb_root             mm_rb;                /*     4     4 */
              struct vm_area_struct *    mmap_cache;           /*     8     4 */
              long unsigned int          (*get_unmapped_area)(); /*    12     4 */
              void                       (*unmap_area)();      /*    16     4 */
              long unsigned int          mmap_base;            /*    20     4 */
              long unsigned int          task_size;            /*    24     4 */
              long unsigned int          cached_hole_size;     /*    28     4 */
              /* ---------- cacheline 1 boundary ---------- */
              long unsigned int          free_area_cache;      /*    32     4 */
              pgd_t *                    pgd;                  /*    36     4 */
              atomic_t                   mm_users;             /*    40     4 */
              atomic_t                   mm_count;             /*    44     4 */
              int                        map_count;            /*    48     4 */
              struct rw_semaphore        mmap_sem;             /*    52    64 */
              spinlock_t                 page_table_lock;      /*   116    40 */
              struct list_head           mmlist;               /*   156     8 */
              mm_counter_t               _file_rss;            /*   164     4 */
              mm_counter_t               _anon_rss;            /*   168     4 */
              long unsigned int          hiwater_rss;          /*   172     4 */
              long unsigned int          hiwater_vm;           /*   176     4 */
              long unsigned int          total_vm;             /*   180     4 */
              long unsigned int          locked_vm;            /*   184     4 */
              long unsigned int          shared_vm;            /*   188     4 */
              /* ---------- cacheline 6 boundary ---------- */
              long unsigned int          exec_vm;              /*   192     4 */
              long unsigned int          stack_vm;             /*   196     4 */
              long unsigned int          reserved_vm;          /*   200     4 */
              long unsigned int          def_flags;            /*   204     4 */
              long unsigned int          nr_ptes;              /*   208     4 */
              long unsigned int          start_code;           /*   212     4 */
              long unsigned int          end_code;             /*   216     4 */
              long unsigned int          start_data;           /*   220     4 */
              /* ---------- cacheline 7 boundary ---------- */
              long unsigned int          end_data;             /*   224     4 */
              long unsigned int          start_brk;            /*   228     4 */
              long unsigned int          brk;                  /*   232     4 */
              long unsigned int          start_stack;          /*   236     4 */
              long unsigned int          arg_start;            /*   240     4 */
              long unsigned int          arg_end;              /*   244     4 */
              long unsigned int          env_start;            /*   248     4 */
              long unsigned int          env_end;              /*   252     4 */
              /* ---------- cacheline 8 boundary ---------- */
              long unsigned int          saved_auxv[44];       /*   256   176 */
              unsigned int               dumpable:2;           /*   432     4 */
              cpumask_t                  cpu_vm_mask;          /*   436     4 */
              mm_context_t               context;              /*   440    68 */
              long unsigned int          swap_token_time;      /*   508     4 */
              /* ---------- cacheline 16 boundary ---------- */
              char                       recent_pagein;        /*   512     1 */
      
              /* XXX 3 bytes hole, try to pack */
      
              int                        core_waiters;         /*   516     4 */
              struct completion *        core_startup_done;    /*   520     4 */
              struct completion          core_done;            /*   524    52 */
              rwlock_t                   ioctx_list_lock;      /*   576    36 */
              struct kioctx *            ioctx_list;           /*   612     4 */
      }; /* size: 616, sum members: 613, holes: 1, sum holes: 3, cachelines: 20,
            last cacheline: 8 bytes */
      
      After:
      
      [acme@newtoy net-2.6.20]$ pahole --cacheline 32 kernel/sched.o mm_struct
      /* include2/asm/processor.h:542 */
      struct mm_struct {
              struct vm_area_struct *    mmap;                 /*     0     4 */
              struct rb_root             mm_rb;                /*     4     4 */
              struct vm_area_struct *    mmap_cache;           /*     8     4 */
              long unsigned int          (*get_unmapped_area)(); /*    12     4 */
              void                       (*unmap_area)();      /*    16     4 */
              long unsigned int          mmap_base;            /*    20     4 */
              long unsigned int          task_size;            /*    24     4 */
              long unsigned int          cached_hole_size;     /*    28     4 */
              /* ---------- cacheline 1 boundary ---------- */
              long unsigned int          free_area_cache;      /*    32     4 */
              pgd_t *                    pgd;                  /*    36     4 */
              atomic_t                   mm_users;             /*    40     4 */
              atomic_t                   mm_count;             /*    44     4 */
              int                        map_count;            /*    48     4 */
              struct rw_semaphore        mmap_sem;             /*    52    64 */
              spinlock_t                 page_table_lock;      /*   116    40 */
              struct list_head           mmlist;               /*   156     8 */
              mm_counter_t               _file_rss;            /*   164     4 */
              mm_counter_t               _anon_rss;            /*   168     4 */
              long unsigned int          hiwater_rss;          /*   172     4 */
              long unsigned int          hiwater_vm;           /*   176     4 */
              long unsigned int          total_vm;             /*   180     4 */
              long unsigned int          locked_vm;            /*   184     4 */
              long unsigned int          shared_vm;            /*   188     4 */
              /* ---------- cacheline 6 boundary ---------- */
              long unsigned int          exec_vm;              /*   192     4 */
              long unsigned int          stack_vm;             /*   196     4 */
              long unsigned int          reserved_vm;          /*   200     4 */
              long unsigned int          def_flags;            /*   204     4 */
              long unsigned int          nr_ptes;              /*   208     4 */
              long unsigned int          start_code;           /*   212     4 */
              long unsigned int          end_code;             /*   216     4 */
              long unsigned int          start_data;           /*   220     4 */
              /* ---------- cacheline 7 boundary ---------- */
              long unsigned int          end_data;             /*   224     4 */
              long unsigned int          start_brk;            /*   228     4 */
              long unsigned int          brk;                  /*   232     4 */
              long unsigned int          start_stack;          /*   236     4 */
              long unsigned int          arg_start;            /*   240     4 */
              long unsigned int          arg_end;              /*   244     4 */
              long unsigned int          env_start;            /*   248     4 */
              long unsigned int          env_end;              /*   252     4 */
              /* ---------- cacheline 8 boundary ---------- */
              long unsigned int          saved_auxv[44];       /*   256   176 */
              cpumask_t                  cpu_vm_mask;          /*   432     4 */
              mm_context_t               context;              /*   436    68 */
              long unsigned int          swap_token_time;      /*   504     4 */
              char                       recent_pagein;        /*   508     1 */
              unsigned char              dumpable:2;           /*   509     1 */
      
              /* XXX 2 bytes hole, try to pack */
      
              int                        core_waiters;         /*   512     4 */
              struct completion *        core_startup_done;    /*   516     4 */
              struct completion          core_done;            /*   520    52 */
              rwlock_t                   ioctx_list_lock;      /*   572    36 */
              struct kioctx *            ioctx_list;           /*   608     4 */
      }; /* size: 612, sum members: 610, holes: 1, sum holes: 2, cachelines: 20,
            last cacheline: 4 bytes */
      
      [acme@newtoy net-2.6.20]$ codiff -V /tmp/sched.o.before kernel/sched.o
      /pub/scm/linux/kernel/git/acme/net-2.6.20/kernel/sched.c:
        struct mm_struct |   -4
          dumpable:2;
           from: unsigned int          /*   432(30)    4(2) */
           to:   unsigned char         /*   509(6)     1(2) */
      < SNIP other offset changes >
       1 struct changed
      [acme@newtoy net-2.6.20]$
      
      I'm not aware of any problem about using 2 byte wide bitfields where
      previously a 4 byte wide one was, holler if there is any, I wouldn't be
      surprised, bitfields are things from hell.
      
      For the curious, 432(30) means: at offset 432 from the struct start, at
      offset 30 in the bitfield (yeah, it comes backwards, hellish, huh?) ditto
      for 509(6), while 4(2) and 1(2) means "struct field size(bitfield size)".
      
      Now we have a 2 bytes hole and are using only 4 bytes of the last 32
      bytes cacheline, any takers? :-)
      Signed-off-by: NArnaldo Carvalho de Melo <acme@mandriva.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      36de6437
    • A
      [PATCH] mm: make compound page destructor handling explicit · 33f2ef89
      Andy Whitcroft 提交于
      Currently we we use the lru head link of the second page of a compound page
      to hold its destructor.  This was ok when it was purely an internal
      implmentation detail.  However, hugetlbfs overrides this destructor
      violating the layering.  Abstract this out as explicit calls, also
      introduce a type for the callback function allowing them to be type
      checked.  For each callback we pre-declare the function, causing a type
      error on definition rather than on use elsewhere.
      
      [akpm@osdl.org: cleanups]
      Signed-off-by: NAndy Whitcroft <apw@shadowen.org>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      33f2ef89
    • C
      [PATCH] slab: better fallback allocation behavior · 3c517a61
      Christoph Lameter 提交于
      Currently we simply attempt to allocate from all allowed nodes using
      GFP_THISNODE.  However, GFP_THISNODE does not do reclaim (it wont do any at
      all if the recent GFP_THISNODE patch is accepted).  If we truly run out of
      memory in the whole system then fallback_alloc may return NULL although
      memory may still be available if we would perform more thorough reclaim.
      
      This patch changes fallback_alloc() so that we first only inspect all the
      per node queues for available slabs.  If we find any then we allocate from
      those.  This avoids slab fragmentation by first getting rid of all partial
      allocated slabs on every node before allocating new memory.
      
      If we cannot satisfy the allocation from any per node queue then we extend
      a slab.  We now call into the page allocator without specifying
      GFP_THISNODE.  The page allocator will then implement its own fallback (in
      the given cpuset context), perform necessary reclaim (again considering not
      a single node but the whole set of allowed nodes) and then return pages for
      a new slab.
      
      We identify from which node the pages were allocated and then insert the
      pages into the corresponding per node structure.  In order to do so we need
      to modify cache_grow() to take a parameter that specifies the new slab.
      kmem_getpages() can no longer set the GFP_THISNODE flag since we need to be
      able to use kmem_getpage to allocate from an arbitrary node.  GFP_THISNODE
      needs to be specified when calling cache_grow().
      
      One key advantage is that the decision from which node to allocate new
      memory is removed from slab fallback processing.  The patch allows to go
      back to use of the page allocators fallback/reclaim logic.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      3c517a61
    • C
      [PATCH] GFP_THISNODE must not trigger global reclaim · 952f3b51
      Christoph Lameter 提交于
      The intent of GFP_THISNODE is to make sure that an allocation occurs on a
      particular node.  If this is not possible then NULL needs to be returned so
      that the caller can choose what to do next on its own (the slab allocator
      depends on that).
      
      However, GFP_THISNODE currently triggers reclaim before returning a failure
      (GFP_THISNODE means GFP_NORETRY is set).  If we have over allocated a node
      then we will currently do some reclaim before returning NULL.  The caller
      may want memory from other nodes before reclaim should be triggered.  (If
      the caller wants reclaim then he can directly use __GFP_THISNODE instead).
      
      There is no flag to avoid reclaim in the page allocator and adding yet
      another GFP_xx flag would be difficult given that we are out of available
      flags.
      
      So just compare and see if all bits for GFP_THISNODE (__GFP_THISNODE,
      __GFP_NORETRY and __GFP_NOWARN) are set.  If so then we return NULL before
      waking up kswapd.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      952f3b51
    • C
      [PATCH] slab: fix two issues in kmalloc_node / __cache_alloc_node · 5bcd234d
      Christoph Lameter 提交于
      This addresses two issues:
      
      1. Kmalloc_node() may intermittently return NULL if we are allocating
         from the current node and are unable to obtain memory for the current
         node from the page allocator.  This is because we call ___cache_alloc()
         if nodeid == numa_node_id() and ____cache_alloc is not able to fallback
         to other nodes.
      
         This was introduced in the 2.6.19 development cycle.  <= 2.6.18 in
         that case does not do a restricted allocation and blindly trusts the
         page allocator to have given us memory from the indicated node.  It
         inserts the page regardless of the node it came from into the queues for
         the current node.
      
      2. If kmalloc_node() is used on a node that has not been bootstrapped
         yet then we may try to pass an invalid node number to
         ____cache_alloc_node() triggering a BUG().
      
         Change the function to call fallback_alloc() instead.  Only call
         fallback_alloc() if we are allowed to fallback at all.  The need to
         handle a node not bootstrapped yet also first surfaced in the 2.6.19
         cycle.
      
      Update the comments since they were still describing the old kmalloc_node
      from 2.6.12.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      5bcd234d