1. 20 Jul, 2007 24 commits
    • PM: Reduce code duplication between main.c and user.c · 6c961dfb
      Rafael J. Wysocki committed
      The SNAPSHOT_S2RAM ioctl code is outdated and it should not duplicate the
      suspend code in kernel/power/main.c.  Fix that.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • PM: prevent frozen user mode helpers from failing the freezing of tasks · ccd4b65a
      Rafael J. Wysocki committed
      At present, if a user mode helper is running while
      usermodehelper_pm_callback() is executed, the helper may be frozen and the
      completion in call_usermodehelper_exec() won't be completed until user
      space processes are thawed.  As a result, the freezing of kernel threads
      may fail, which is not desirable.
      
      Prevent this from happening by introducing a counter of running user mode
      helpers and allowing usermodehelper_pm_callback() to succeed for action =
      PM_HIBERNATION_PREPARE or action = PM_SUSPEND_PREPARE only if there are no
      helpers running.  [Namely, usermodehelper_pm_callback() waits for at most
      RUNNING_HELPERS_TIMEOUT for the number of running helpers to become zero
      and fails if that doesn't happen.]
      
      Special thanks to Uli Luckas <u.luckas@road.de>, Pavel Machek
      <pavel@ucw.cz> and Oleg Nesterov <oleg@tv-sign.ru> for reviewing the
      previous versions of this patch and for very useful comments.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Uli Luckas <u.luckas@road.de>
      Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
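The counting scheme described in the commit above can be sketched in user-space C. This is a minimal illustration and not the kernel code: `helper_start()`, `helper_finish()`, `pm_prepare()`, `pm_finish()` and the `tick` callback are hypothetical names, the `max_ticks` budget stands in for RUNNING_HELPERS_TIMEOUT, and the `-EBUSY`/`-EAGAIN` return values are assumptions of the sketch.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

static atomic_int running_helpers;    /* helpers currently executing */
static atomic_bool helpers_disabled;  /* true while a transition is pending */

/* Account for a new helper; refuse it if helpers are disabled. */
int helper_start(void)
{
    /* Count first, then check the flag, so pm_prepare() cannot miss us. */
    atomic_fetch_add(&running_helpers, 1);
    if (atomic_load(&helpers_disabled)) {
        atomic_fetch_sub(&running_helpers, 1);
        return -EBUSY;
    }
    return 0;
}

/* A helper has completed. */
void helper_finish(void)
{
    atomic_fetch_sub(&running_helpers, 1);
}

/*
 * Called before hibernation/suspend: block new helpers, then wait a bounded
 * number of ticks for the running ones to drain. Fails if they do not.
 */
int pm_prepare(int max_ticks, void (*tick)(void))
{
    atomic_store(&helpers_disabled, true);
    for (int i = 0; i <= max_ticks; i++) {
        if (atomic_load(&running_helpers) == 0)
            return 0;
        if (tick != NULL)
            tick();  /* stand-in for sleeping one slice of the timeout */
    }
    atomic_store(&helpers_disabled, false);  /* give up and re-enable */
    return -EAGAIN;
}

/* Called after the transition: allow helpers again. */
void pm_finish(void)
{
    atomic_store(&helpers_disabled, false);
}
```

Incrementing the counter before testing the flag is one way to keep a helper from slipping in between the flag being set and the counter being read; whether the kernel orders the two steps this way is not stated in the commit message.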
    • PM: disable usermode helper before hibernation and suspend · 8cdd4936
      Rafael J. Wysocki committed
      Use a hibernation and suspend notifier to disable the user mode helper before
      a hibernation/suspend and enable it after the operation.
      
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • PM: introduce hibernation and suspend notifiers · b10d9117
      Rafael J. Wysocki committed
      Make it possible to register hibernation and suspend notifiers, so that
      subsystems can perform hibernation-related or suspend-related operations that
      should not be carried out by device drivers' .suspend() and .resume()
      routines.
      
      [akpm@linux-foundation.org: build fixes]
      [akpm@linux-foundation.org: cleanups]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
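The idea of hibernation and suspend notifiers can be illustrated with a small user-space notifier chain. This is only a sketch: `struct pm_notifier`, `pm_notifier_register()` and `pm_notify()` are hypothetical names, and the kernel's real interface differs. PM_HIBERNATION_PREPARE and PM_SUSPEND_PREPARE are named elsewhere in this log; the two PM_POST_* events are assumed here for symmetry.

```c
#include <stddef.h>

enum pm_event {
    PM_HIBERNATION_PREPARE,
    PM_POST_HIBERNATION,   /* assumed counterpart event */
    PM_SUSPEND_PREPARE,
    PM_POST_SUSPEND        /* assumed counterpart event */
};

struct pm_notifier {
    int (*call)(enum pm_event event);  /* nonzero return vetoes the transition */
    struct pm_notifier *next;
};

struct pm_notifier *pm_chain;  /* head of the registered notifiers */

/* Add a notifier to the front of the chain. */
void pm_notifier_register(struct pm_notifier *nb)
{
    nb->next = pm_chain;
    pm_chain = nb;
}

/* Deliver an event to every notifier; stop at the first error. */
int pm_notify(enum pm_event event)
{
    for (struct pm_notifier *nb = pm_chain; nb != NULL; nb = nb->next) {
        int err = nb->call(event);
        if (err)
            return err;
    }
    return 0;
}

/* Example subscriber: a subsystem that reacts only to the PREPARE events. */
int prepare_calls;

int example_pm_call(enum pm_event event)
{
    if (event == PM_HIBERNATION_PREPARE || event == PM_SUSPEND_PREPARE)
        prepare_calls++;
    return 0;
}

struct pm_notifier example_nb = { .call = example_pm_call, .next = NULL };
```

A subsystem registers `example_nb` once and is then called before and after each transition, without needing a device driver's .suspend()/.resume() routines.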
    • Freezer: remove redundant check in try_to_freeze_tasks · c2cf7d87
      Rafael J. Wysocki committed
      We don't need to check if todo is positive before calling time_after() in
      try_to_freeze_tasks(), because if todo is zero at this point, the loop will be
      broken anyway due to the while () condition being false.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Freezer: return int from freeze_processes · e7cd8a72
      Rafael J. Wysocki committed
      Make try_to_freeze_tasks() and freeze_processes() return -EBUSY on failure
      instead of the number of unfrozen tasks (none of the callers actually uses
      this number).
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Freezer: use __set_current_state in refrigerator · f4a3a7d6
      Rafael J. Wysocki committed
      Use __set_current_state() as appropriate in refrigerator() instead of
      accessing current->state directly.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Freezer: avoid freezing kernel threads prematurely · 0c1eecfb
      Rafael J. Wysocki committed
      Kernel threads should not have TIF_FREEZE set when user space processes are
      being frozen, since otherwise some of them might be frozen prematurely.
      To prevent this from happening we can (1) make exit_mm() unset TIF_FREEZE
      unconditionally just after clearing tsk->mm and (2) make try_to_freeze_tasks()
      check if p->mm is different from zero and PF_BORROWED_MM is unset in p->flags
      when user space processes are to be frozen.
      
      Namely, when user space processes are being frozen, we should only set
      TIF_FREEZE for tasks that have p->mm different from NULL and don't have
      PF_BORROWED_MM set in p->flags.  For this reason task_lock() must be used to
      prevent try_to_freeze_tasks() from racing with use_mm()/unuse_mm(), in which
      p->mm and p->flags.PF_BORROWED_MM are changed under task_lock(p).  Also, we
      need to prevent the following scenario from happening:
      
      * daemonize() is called by a task spawned from a user space code path
      * freezer checks if the task has p->mm set and the result is positive
      * task enters exit_mm() and clears its TIF_FREEZE
      * freezer sets TIF_FREEZE for the task
      * task calls try_to_freeze() and goes to the refrigerator, which is wrong at
        that point
      
      This requires us to acquire task_lock(p) before p->flags.PF_BORROWED_MM and
      p->mm are examined and release it after TIF_FREEZE is set for p (or it turns
      out that TIF_FREEZE should not be set).
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Hibernation: prepare to enter the low power state · b1457bcc
      Rafael J. Wysocki committed
      During hibernation we call hibernation_ops->prepare() before creating the image,
      but then, before saving it, we cancel the power transition by calling
      hibernation_ops->finish().  Thus prior to calling hibernation_ops->enter() we
      should let the platform firmware know that we're going to enter the low power
      state after all.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swsusp: fix hibernation code ordering · 10a1803d
      Rafael J. Wysocki committed
      Change the code ordering so that hibernation_ops->prepare() is called after
      device_suspend().  This is needed so that we don't violate the ACPI
      specification, which states that the _PTS and _GTS system-control methods,
      executed from acpi_sleep_prepare(), ought to be called after devices have been
      put in low power states.
      
      The "Finish" label in hibernation_restore() is moved, because device_suspend()
      resumes devices if the suspending of them fails and the restore code ordering
      should reflect the hibernation code ordering.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swsusp: introduce restore platform operations · a634cc10
      Rafael J. Wysocki committed
      At least on some machines it is necessary to prepare the ACPI firmware for the
      restoration of the system memory state from the hibernation image if the
      "platform" mode of hibernation has been used.  Namely, in those cases we need
      to disable the GPEs before replacing the "boot" kernel with the "frozen"
      kernel (cf.  http://bugzilla.kernel.org/show_bug.cgi?id=7887).  After the
      restore they will be re-enabled by hibernation_ops->finish(), but if the
      restore fails, they have to be re-enabled by the restore code explicitly.
      
      For this purpose we can introduce two additional hibernation operations,
      called pre_restore() and restore_cleanup() and call them from the restore code
      path.  Still, they should be called if the "platform" mode of hibernation has
      been used, so we need to pass the information about the hibernation mode from
      the "frozen" kernel to the "boot" kernel in the image header.
      
      Apparently, we can't drop the disabling of GPEs before the restore because of
      Bug #7887.  We also can't do it unconditionally, because the GPEs wouldn't
      have been enabled after a successful restore if the suspend had been done in
      the 'shutdown' or 'reboot' mode.
      
      In principle we could (and probably should) unconditionally disable the GPEs
      before each snapshot creation *and* before the restore, but then we'd have to
      unconditionally enable them after the snapshot creation as well as after the
      restore (or restore failure).  Still, for this purpose we'd need to modify
      acpi_enter_sleep_state_prep() and acpi_leave_sleep_state() and we'd have to
      introduce some mechanism synchronizing the disabling/enabling of the GPEs with
      the device drivers' .suspend()/.resume() routines and with
      disable_/enable_nonboot_cpus().  However, this would have affected the
      suspend (i.e. s2ram) code as well as the hibernation, which I'd like to avoid
      in this patch series.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swsusp: remove code duplication between disk.c and user.c · 7777fab9
      Rafael J. Wysocki committed
      Currently, much of the code in kernel/power/disk.c is duplicated in
      kernel/power/user.c, mainly for historical reasons.  By eliminating this code
      duplication we can reduce the size of user.c quite substantially and remove
      the maintenance difficulty resulting from it.
      
      [bunk@stusta.de: kernel/power/disk.c: make code static]
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swsusp: remove incorrect code from user.c · 127067a9
      Rafael J. Wysocki committed
      In the face of the recent change of suspend code ordering (cf.
      http://marc.info/?l=linux-acpi&m=117938245931603&w=2) we should also modify
      the code ordering in swsusp so that hibernation_ops->prepare() is executed
      after device_suspend().
      
      However, for this purpose it seems reasonable to eliminate the code
      duplication between kernel/power/disk.c and kernel/power/user.c first.  By
      eliminating it we can reduce the size of user.c quite substantially and remove
      the maintenance difficulty with making essentially the same changes in two
      different places.
      
      Moreover, we should also remove the calls to "platform" functions from the
      restore code path, since it doesn't carry out any power transition of the
      system, but we generally need to disable the GPEs before the restore if the
      'platform' hibernation mode has been used.  To do this, we can introduce two
      new hibernation_ops to be used in the restore code.
      
      This patch:
      
      Make the hibernation code in kernel/power/user.c functionally
      equivalent to the corresponding code in kernel/power/disk.c, as it should be.
      
      The calls to the platform functions removed by this patch are incorrect.  They
      should be replaced with some other "platform" invocations that will be
      introduced in one of the subsequent patches.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • PM: Do not require dev spew to get PM_DEBUG · a0349828
      Ben Collins committed
      In order to enable things like PM_TRACE, you're required to enable
      PM_DEBUG, which sends a large spew of messages on boot and can often
      overflow the dmesg buffer.
      
      Create a new PM_VERBOSE option and make it the one that enables
      drivers/base/power's messages.
      Signed-off-by: Ben Collins <bcollins@ubuntu.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • freezer: run show_state() when freezing times out · 328616e3
      Andrew Morton committed
      To see which tasks are stuck where.
      
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • only allow nonlinear vmas for ram backed filesystems · 3ee6dafc
      Miklos Szeredi committed
      page_mkclean() doesn't re-protect ptes for non-linear mappings, so a later
      re-dirty through such a mapping will not generate a fault, PG_dirty will
      not reflect the dirty state and the dirty count will be skewed.  This
      implies that msync() is also currently broken for nonlinear mappings.
      
      The easiest solution is to emulate remap_file_pages on non-linear mappings
      with simple mmap() for non ram-backed filesystems.  Applications continue
      to work (albeit slower), as long as the number of remappings remains below
      the maximum vma count.
      
      However all currently known real uses of non-linear mappings are for ram
      backed filesystems, which this patch doesn't affect.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Remove alloc_zeroed_user_highpage() · bb2d5ce1
      Mel Gorman committed
      alloc_zeroed_user_highpage() has no in-tree users and it is not exported.
      As it is not exported, it can simply be removed.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Acked-by: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix clear_page_dirty_for_io vs fault race · 79352894
      Nick Piggin committed
      Fix msync data loss and (less importantly) dirty page accounting
      inaccuracies due to the race remaining in clear_page_dirty_for_io().
      
      The deleted comment explains what the race was, and the added comments
      explain how it is fixed.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fault feedback #2 · 83c54070
      Nick Piggin committed
      This patch completes Linus's wish that the fault return codes be made into
      bit flags, which I agree makes everything nicer.  This requires
      all handle_mm_fault callers to be modified (possibly the modifications
      should go further and do things like fault accounting in handle_mm_fault --
      however that would be for another patch).
      
      [akpm@linux-foundation.org: fix alpha build]
      [akpm@linux-foundation.org: fix s390 build]
      [akpm@linux-foundation.org: fix sparc build]
      [akpm@linux-foundation.org: fix sparc64 build]
      [akpm@linux-foundation.org: fix ia64 build]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ian Molton <spyro@f2s.com>
      Cc: Bryan Wu <bryan.wu@analog.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Richard Curnow <rc@rc0.org.uk>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it>
      Cc: Miles Bader <uclinux-v850@lsi.nec.co.jp>
      Cc: Chris Zankel <chris@zankel.net>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Acked-by: Andi Kleen <ak@muc.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Still apparently needs some ARM and PPC loving - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fault feedback #1 · d0217ac0
      Nick Piggin committed
      Change ->fault prototype.  We now return an int, which contains
      VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
      FAULT_RET_ code tells the VM whether a page was found, whether it has been
      locked, and potentially other things.  This is not quite the way Linus wanted
      it yet, but that's changed in the next patch (which requires changes to
      arch code).
      
      This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
      that a page is locked, which requires filemap_nopage to go away (because we
      can no longer remain backward compatible without that flag), but we were
      going to do that anyway.
      
      struct fault_data is renamed to struct vm_fault as Linus asked. address
      is now a void __user * that we should firmly encourage drivers not to use
      without really good reason.
      
      The page is now returned via a page pointer in the vm_fault struct.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
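The two-byte return value described in the commit above can be sketched as follows. The numeric values are chosen for illustration and may not match the kernel headers; the accessor functions `fault_code()`, `fault_flags()` and `fault_page_locked()` are hypothetical helpers, not kernel API.

```c
/* Illustrative values only; the kernel headers may define them differently. */
#define VM_FAULT_SIGBUS   0x01
#define VM_FAULT_MINOR    0x02
#define VM_FAULT_MASK     0x00ff  /* low byte: VM_FAULT_xxx code */

#define FAULT_RET_NOPAGE  0x0100  /* handler did not return a page */
#define FAULT_RET_LOCKED  0x0200  /* returned page is locked */
#define FAULT_RET_MASK    0xff00  /* next byte: FAULT_RET_xxx flags */

/* Extract the VM_FAULT_xxx code from a ->fault return value. */
static inline int fault_code(int ret)
{
    return ret & VM_FAULT_MASK;
}

/* Extract the FAULT_RET_xxx flags from a ->fault return value. */
static inline int fault_flags(int ret)
{
    return ret & FAULT_RET_MASK;
}

/* True if the handler reported that it returned a locked page. */
static inline int fault_page_locked(int ret)
{
    return (ret & FAULT_RET_LOCKED) != 0;
}
```

A handler in this sketch would return, for example, `VM_FAULT_MINOR | FAULT_RET_LOCKED`, and the caller would split the value with the two masks.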
    • Document ->page_mkwrite() locking · ed2f2f9b
      Mark Fasheh committed
      There seems to be very little documentation about this callback in general.
      The locking in particular is a bit tricky, so it's worth having this in
      writing.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: release page lock before calling ->page_mkwrite · 69676147
      Mark Fasheh committed
      __do_fault() was calling ->page_mkwrite() with the page lock held, which
      violates the locking rules for that callback.  Release and retake the page
      lock around the callback to avoid deadlocking file systems which manually
      take it.
      Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Nick Piggin committed
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should not need to know anything about the virtual memory mapping.  The hitch here
      is that the ->nopage handler didn't pass down enough information (i.e. pgoff).
      But it is more logical to pass pgoff rather than have the ->nopage function
      calculate it itself anyway (because that's a similar layering violation).
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place,
      so there is a lot of duplication that can be removed, and nice cleanups that
      become possible, if everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix fault vs invalidate race for linear mappings · d00806b1
      Nick Piggin committed
      Fix the race between invalidate_inode_pages and do_no_page.
      
      Andrea Arcangeli identified a subtle race between invalidation of pages from
      pagecache with userspace mappings, and do_no_page.
      
      The issue is that invalidation has to shoot down all mappings to the page,
      before it can be discarded from the pagecache.  Between shooting down ptes to
      a particular page, and actually dropping the struct page from the pagecache,
      do_no_page from any process might fault on that page and establish a new
      mapping to the page just before it gets discarded from the pagecache.
      
      The most common case where such invalidation is used is in file truncation.
      This case was catered for by doing a sort of open-coded seqlock between the
      file's i_size, and its truncate_count.
      
      Truncation will decrease i_size, then increment truncate_count before
      unmapping userspace pages; do_no_page will read truncate_count, then find the
      page if it is within i_size, and then check truncate_count under the page
      table lock and back out and retry if it had subsequently been changed (ptl
      will serialise against unmapping, and ensure a potentially updated
      truncate_count is actually visible).
      
      Complexity and documentation issues aside, the locking protocol fails in the
      case where we would like to invalidate pagecache inside i_size.  do_no_page
      can come in anytime and filemap_nopage is not aware of the invalidation in
      progress (as it is when it is outside i_size).  The end result is that
      dangling (->mapping == NULL) pages that appear to be from a particular file
      may be mapped into userspace with nonsense data.  Valid mappings to the same
      place will see a different page.
      
      Andrea implemented two working fixes, one using a real seqlock, another using
      a page->flags bit.  He also proposed using the page lock in do_no_page, but
      that was initially considered too heavyweight.  However, it is not a global or
      per-file lock, and the page cacheline is modified in do_no_page to increment
      _count and _mapcount anyway, so a further modification should not be a large
      performance hit.  Scalability is not an issue.
      
      This patch implements this latter approach.  ->nopage implementations return
      with the page locked if it is possible for their underlying file to be
      invalidated (in that case, they must set a special vm_flags bit to indicate
      so).  do_no_page only unlocks the page after setting up the mapping
      completely.  invalidation is excluded because it holds the page lock during
      invalidation of each page (and ensures that the page is not mapped while
      holding the lock).
      
      This also allows significant simplifications in do_no_page, because we have
      the page locked in the right place in the pagecache from the start.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
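The older truncate_count protocol that the commit above describes (the open-coded seqlock, not the page-lock fix itself) can be sketched in single-threaded C. Everything here is a simplified assumption: `struct file_state`, `truncate_file()`, `fault_in()` and the `racer` hook that simulates a concurrent truncation are hypothetical, and i_size is counted in pages for brevity.

```c
struct file_state {
    unsigned long i_size;         /* file size in pages, for simplicity */
    unsigned int truncate_count;  /* bumped by every truncation */
};

/* Truncation: shrink i_size first, then bump the counter. */
void truncate_file(struct file_state *f, unsigned long new_size)
{
    f->i_size = new_size;
    f->truncate_count++;
    /* ...unmapping of userspace pages would happen here... */
}

/* Example "concurrent" event used to exercise the retry path. */
void truncate_to_zero(struct file_state *f)
{
    truncate_file(f, 0);
}

/*
 * Returns 1 if a mapping would be installed, 0 if pgoff lies outside i_size.
 * racer, if non-null, runs once between the lookup and the recheck.
 */
int fault_in(struct file_state *f, unsigned long pgoff,
             void (*racer)(struct file_state *), int *retries)
{
    for (;;) {
        unsigned int seen = f->truncate_count;  /* sample before lookup */

        if (pgoff >= f->i_size)
            return 0;  /* page is beyond the end of the file */

        if (racer) {
            racer(f);  /* simulated truncation racing with the fault */
            racer = 0;
        }

        /* The real code does this recheck under the page table lock. */
        if (seen != f->truncate_count) {
            (*retries)++;
            continue;  /* back out and retry */
        }
        return 1;
    }
}
```

The sketch only catches events that change truncate_count, which matches the commit message's point that this protocol does not cover invalidation of pagecache inside i_size.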
  2. 19 Jul, 2007 16 commits
    • Merge branch 'isdn-fix' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6 · 589f1e81
      Linus Torvalds committed
      * 'isdn-fix' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6:
        ISDN HiSax: uninitialized return in hisax_cs_setup
    • Merge branch 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 · ce524c83
      Linus Torvalds committed
      * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6:
        eHEA: Fix bonding support
        Blackfin ethernet driver: on chip ethernet MAC controller driver
        fix wrong argument of tc35815_read_plat_dev_addr()
        ARM/ETHER3: Handle multicast frames.
        SAA9730: Handle multicast frames.
        NI5010: Handle multicast frames.
        NS83820: Handle multicast frames.
        Fix RGMII-ID handling in gianfar
        Fix Vitesse RGMII-ID support
        Add phy-connection-type to gianfar nodes
        Fix Vitesse 824x PHY interrupt acking
        [PATCH] zd1211rw: Add ID for Siemens Gigaset USB Stick 54
        [PATCH] zd1211rw: Add ID for Planex GW-US54GXS
        [PATCH] Update version ipw2200 stamp to 1.2.2
        [PATCH] ipw2200: Fix ipw_isr() comments error on shared IRQ
        [PATCH] Fix ipw2200 set wrong power parameter causing firmware error
        [PATCH] ipw2100: Fix `iwpriv set_power` error
        [PATCH] softmac: Channel is listed twice in scan output
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6 · 789c56b7
      Linus Torvalds committed
      * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (24 commits)
        [CIFS] merge conflict in fs/cifs/export.c
        [CIFS] Allow disabling CIFS Unix Extensions as mount option
        [CIFS] More whitespace/formatting fixes (noticed by checkpatch)
        [CIFS] Typo in previous patch
        [CIFS] zero_user_page() conversions
        [CIFS] use simple_prepare_write to zero page data
        [CIFS] Fix build break - inet.h not included when experimental ifdef off
        [CIFS] Add support for new POSIX unlink
        [CIFS] whitespace/formatting fixes
        [CIFS] Fix oops in cifs_create when nfsd server exports cifs mount
        [CIFS] whitespace cleanup
        [CIFS] Fix packet signatures for NTLMv2 case
        [CIFS] more whitespace fixes
        [CIFS] more whitespace cleanup
        [CIFS] whitespace cleanup
        [CIFS] whitespace cleanup
        [CIFS] ipv6 support no longer experimental
        [CIFS] Mount should fail if server signing off but client mount option requires it
        [CIFS] whitespace fixes
        [CIFS] Fix sign mount option and sign proc config setting
        ...
    • Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/docs-2.6 · 7209a1dc
      Linus Torvalds committed
      * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/docs-2.6:
        zh_CN/HOWTO: update URLs of git trees
        Chinese translation of Documentation/stable_api_nonsense.txt
        HOWTO: add Chinese translation of Documentation/HOWTO
        Documentation: add Japanese translated stable_api_nonsense.txt
        HOWTO: add Japanese translation of Documentation/HOWTO
    • Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6 · 29e7ee37
      Linus Torvalds committed
      * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6:
        sysfs: cosmetic clean up on node creation failure paths
        sysfs: kill an extra put in sysfs_create_link() failure path
        Driver core: check return code of sysfs_create_link()
        HOWTO: Add the knwon_regression URI to the documentation
        dev_vdbg() documentation
        dev_vdbg(), available with -DVERBOSE_DEBUG
        sysfs: make sysfs_init_inode() static
        sysfs: fix sysfs root inode nlink accounting
        Documentation fix devres.txt: lib/iomap.c -> lib/devres.c
        sysfs: avoid kmem_cache_free(NULL)
        PM: remove deprecated dpm_runtime_* routines
        PM: Remove deprecated sysfs files
        Driver core: accept all valid action-strings in uevent-trigger
        debugfs: remove rmdir() non-empty complaint
      29e7ee37
    • L
      Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/uio-2.6 · fc15bc81
      Linus Torvalds committed
      * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/uio-2.6:
        UIO: Hilscher CIF card driver
        UIO: Documentation
        UIO: Add the User IO core code
      fc15bc81
    • L
      Merge branch 'for-linus' of git://linux-nfs.org/~bfields/linux · a8dcf12f
      Linus Torvalds committed
      * 'for-linus' of git://linux-nfs.org/~bfields/linux:
        locks: fix vfs_test_lock() comment
        locks: make posix_test_lock() interface more consistent
        nfs: disable leases over NFS
        gfs2: stop giving out non-cluster-coherent leases
        locks: export setlease to filesystems
        locks: provide a file lease method enabling cluster-coherent leases
        locks: rename lease functions to reflect locks.c conventions
        locks: share more common lease code
        locks: clean up lease_alloc()
        locks: convert an -EINVAL return to a BUG
        leases: minor break_lease() comment clarification
      a8dcf12f
    • L
      Merge branch 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband · d796e641
      Linus Torvalds committed
      * 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband: (29 commits)
        IB/mthca: Simplify use of size0 in work request posting
        IB/mthca: Factor out setting WQE UD segment entries
        IB/mthca: Factor out setting WQE remote address and atomic segment entries
        IB/mlx4: Factor out setting other WQE segments
        IB/mlx4: Factor out setting WQE data segment entries
        IB/mthca: Factor out setting WQE data segment entries
        IB/mlx4: Return receive queue sizes for userspace QPs from query QP
        IB/mlx4: Increase max outstanding RDMA reads as target
        RDMA/cma: Remove local write permission from QP access flags
        IB/mthca: Use uninitialized_var() for f0
        IB/cm: Make internal function cm_get_ack_delay() static
        IB/ipath: Remove ipath_get_user_pages_nocopy()
        IB/ipath: Make a few functions static
        mlx4_core: Reset device when internal error is detected
        IB/iser: Make a couple of functions static
        IB/mthca: Fix printk format used for firmware version in warning
        IB/mthca: Schedule MSI support for removal
        IB/ehca: Fix warnings issued by checkpatch.pl
        IB/ehca: Restructure ehca_set_pagebuf()
        IB/ehca: MR/MW structure refactoring
        ...
      d796e641
    • S
      Merge branch 'master' of /pub/scm/linux/kernel/git/torvalds/linux-2.6 · 1ff8392c
      Steve French committed
      Conflicts:
      
      	fs/cifs/export.c
      1ff8392c
    • S
      [CIFS] merge conflict in fs/cifs/export.c · 70b315b0
      Steve French committed
      Signed-off-by: Steve French <sfrench@us.ibm.com>
      70b315b0
    • S
      [CIFS] Allow disabling CIFS Unix Extensions as mount option · c18c842b
      Steve French committed
      Previously the only way to do this was to umount all mounts to that server
      and then turn off a proc setting (/proc/fs/cifs/LinuxExtensionsEnabled).
      
      Fixes Samba bugzilla bug number: 4582 (and also 2008)
      Signed-off-by: Steve French <sfrench@us.ibm.com>
      c18c842b
    • J
      locks: fix vfs_test_lock() comment · 6924c554
      J. Bruce Fields committed
      Thanks to Doug Chapman for pointing out that the comment here is
      inconsistent with the function prototype.
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      6924c554
    • J
      locks: make posix_test_lock() interface more consistent · 6d34ac19
      J. Bruce Fields committed
      Since posix_test_lock(), like fcntl() and ->lock(), indicates absence or
      presence of a conflict lock by setting fl_type to, respectively, F_UNLCK
      or something other than F_UNLCK, the return value is no longer needed.
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      6d34ac19
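
The convention described above, where the answer comes back in the lock type rather than in the return value, is the same one user space sees through fcntl(F_GETLK). A minimal sketch under stated assumptions: would_conflict() is a hypothetical helper, not part of the patch, and it uses the user-space struct flock (field l_type) rather than the kernel's struct file_lock (field fl_type).

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch): ask whether a write lock on
 * the whole file would conflict with a lock held by another process.
 * Returns 1 if a conflicting lock exists, 0 if not, -1 on error.
 */
static int would_conflict(int fd)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = 0,
		.l_len = 0,	/* 0 means "through end of file" */
	};

	assert(fd >= 0);
	if (fcntl(fd, F_GETLK, &fl) == -1)
		return -1;
	/* The answer is carried in l_type, not in the return value. */
	return fl.l_type != F_UNLCK;
}
```

On a freshly created file that no other process has locked, would_conflict(fd) should report no conflict, because F_GETLK leaves every field untouched except l_type, which it sets to F_UNLCK.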
    • J
      nfs: disable leases over NFS · 370f6599
      J. Bruce Fields committed
      As Peter Staubach says elsewhere
      (http://marc.info/?l=linux-kernel&m=118113649526444&w=2):
      
      > The problem is that some file system such as NFSv2 and NFSv3 do
      > not have sufficient support to be able to support leases correctly.
      > In particular for these two file systems, there is no over the wire
      > protocol support.
      >
      > Currently, these two file systems fail the fcntl(F_SETLEASE) call
      > accidentally, due to a reference counting difference.  These file
      > systems should fail more consciously, with a proper error to
      > indicate that the call is invalid for them.
      
      Define an nfs setlease method that just returns -EINVAL.
      
      If someone can demonstrate a real need, perhaps we could reenable
      them in the presence of the "nolock" mount option.
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      Cc: Peter Staubach <staubach@redhat.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      370f6599
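
From user space, the effect of a setlease method that returns -EINVAL would presumably show up as fcntl(F_SETLEASE) failing with errno set to EINVAL. A minimal sketch of how a program might probe for that: try_read_lease() is a hypothetical helper, not code from the patch, and the error actually reported depends on the kernel and file system in use.

```c
#define _GNU_SOURCE	/* F_SETLEASE is a Linux extension */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): try to take a read lease on
 * fd, then release it again.  Returns 0 if the lease was granted,
 * otherwise the errno value reported by the kernel.
 */
static int try_read_lease(int fd)
{
	if (fcntl(fd, F_SETLEASE, F_RDLCK) == -1) {
		int err = errno;

		assert(err != 0);
		fprintf(stderr, "F_SETLEASE refused: %s\n", strerror(err));
		return err;
	}
	/* Granted: drop the lease so the probe leaves no side effects. */
	fcntl(fd, F_SETLEASE, F_UNLCK);
	return 0;
}
```

A caller would open the file read-only, since a read lease requires a read-only descriptor, and treat any nonzero result as "leases unavailable here" rather than relying on one particular error code.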
    • M
      gfs2: stop giving out non-cluster-coherent leases · 60446067
      Marc Eshel committed
      Since gfs2 can't prevent conflicting opens or leases on other nodes, we
      probably shouldn't allow it to give out leases at all.
      
      Put the newly defined lease operation into use in gfs2 by turning off
      leases, unless we're using the "nolock" locking module (in which case all
      locking is local anyway).
      Signed-off-by: Marc Eshel <eshel@almaden.ibm.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      60446067
    • J
      locks: export setlease to filesystems · 4698afe8
      J. Bruce Fields committed
      Export setlease so it can be used by filesystems to implement their lease
      methods.
      Signed-off-by: "J. Bruce Fields" <bfields@citi.umich.edu>
      4698afe8