1. 04 4月, 2014 1 次提交
    • J
      mm + fs: store shadow entries in page cache · 91b0abe3
      Johannes Weiner 提交于
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      91b0abe3
  2. 13 3月, 2014 1 次提交
    • T
      fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Theodore Ts'o 提交于
      Previously, the no-op "mount -o mount /dev/xxx" operation when the
      file system is already mounted read-write causes an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly
      documented or guaraunteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
      02b9984d
  3. 22 2月, 2014 1 次提交
    • J
      Revert "writeback: do not sync data dirtied after sync start" · 0dc83bd3
      Jan Kara 提交于
      This reverts commit c4a391b5. Dave
      Chinner <david@fromorbit.com> has reported the commit may cause some
      inodes to be left out from sync(2). This is because we can call
      redirty_tail() for some inode (which sets i_dirtied_when to current time)
      after sync(2) has started or similarly requeue_inode() can set
      i_dirtied_when to current time if writeback had to skip some pages. The
      real problem is in the functions clobbering i_dirtied_when but fixing
      that isn't trivial so revert is a safer choice for now.
      
      CC: stable@vger.kernel.org # >= 3.13
      Signed-off-by: NJan Kara <jack@suse.cz>
      0dc83bd3
  4. 13 11月, 2013 1 次提交
    • J
      writeback: do not sync data dirtied after sync start · c4a391b5
      Jan Kara 提交于
      When there are processes heavily creating small files while sync(2) is
      running, it can easily happen that quite some new files are created
      between WB_SYNC_NONE and WB_SYNC_ALL pass of sync(2).  That can happen
      especially if there are several busy filesystems (remember that sync
      traverses filesystems sequentially and waits in WB_SYNC_ALL phase on one
      fs before starting it on another fs).  Because WB_SYNC_ALL pass is slow
      (e.g.  causes a transaction commit and cache flush for each inode in
      ext3), resulting sync(2) times are rather large.
      
      The following script reproduces the problem:
      
        function run_writers
        {
          for (( i = 0; i < 10; i++ )); do
            mkdir $1/dir$i
            for (( j = 0; j < 40000; j++ )); do
              dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
            done &
          done
        }
      
        for dir in "$@"; do
          run_writers $dir
        done
      
        sleep 40
        time sync
      
      Fix the problem by disregarding inodes dirtied after sync(2) was called
      in the WB_SYNC_ALL pass.  To allow for this, sync_inodes_sb() now takes
      a time stamp when sync has started which is used for setting up work for
      flusher threads.
      
      To give some numbers, when above script is run on two ext4 filesystems
      on simple SATA drive, the average sync time from 10 runs is 267.549
      seconds with standard deviation 104.799426.  With the patched kernel,
      the average sync time from 10 runs is 2.995 seconds with standard
      deviation 0.096.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4a391b5
  5. 31 10月, 2013 1 次提交
  6. 24 10月, 2013 4 次提交
    • D
      xfs: decouple inode and bmap btree header files · a4fbe6ab
      Dave Chinner 提交于
      Currently the xfs_inode.h header has a dependency on the definition
      of the BMAP btree records as the inode fork includes an array of
      xfs_bmbt_rec_host_t objects in it's definition.
      
      Move all the btree format definitions from xfs_btree.h,
      xfs_bmap_btree.h, xfs_alloc_btree.h and xfs_ialloc_btree.h to
      xfs_format.h to continue the process of centralising the on-disk
      format definitions. With this done, the xfs inode definitions are no
      longer dependent on btree header files.
      
      The enables a massive culling of unnecessary includes, with close to
      200 #include directives removed from the XFS kernel code base.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      a4fbe6ab
    • D
      xfs: decouple log and transaction headers · 239880ef
      Dave Chinner 提交于
      xfs_trans.h has a dependency on xfs_log.h for a couple of
      structures. Most code that does transactions doesn't need to know
      anything about the log, but this dependency means that they have to
      include xfs_log.h. Decouple the xfs_trans.h and xfs_log.h header
      files and clean up the includes to be in dependency order.
      
      In doing this, remove the direct include of xfs_trans_reserve.h from
      xfs_trans.h so that we remove the dependency between xfs_trans.h and
      xfs_mount.h. Hence the xfs_trans.h include can be moved to the
      indicate the actual dependencies other header files have on it.
      
      Note that these are kernel only header files, so this does not
      translate to any userspace changes at all.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      239880ef
    • D
      xfs: unify directory/attribute format definitions · 57062787
      Dave Chinner 提交于
      The on-disk format definitions for the directory and attribute
      structures are spread across 3 header files right now, only one of
      which is dedicated to defining on-disk structures and their
      manipulation (xfs_dir2_format.h). Pull all the format definitions
      into a single header file - xfs_da_format.h - and switch all the
      code over to point at that.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      57062787
    • D
      xfs: create a shared header file for format-related information · 70a9883c
      Dave Chinner 提交于
      All of the buffer operations structures are needed to be exported
      for xfs_db, so move them all to a common location rather than
      spreading them all over the place. They are verifying the on-disk
      format, so while xfs_format.h might be a good place, it is not part
      of the on disk format.
      
      Hence we need to create a new header file that we centralise these
      related definitions. Start by moving the bffer operations
      structures, and then also move all the other definitions that have
      crept into xfs_log_format.h and xfs_format.h as there was no other
      shared header file to put them in.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      70a9883c
  7. 18 10月, 2013 1 次提交
  8. 02 10月, 2013 1 次提交
  9. 11 9月, 2013 2 次提交
    • D
      fs: convert inode and dentry shrinking to be node aware · 9b17c623
      Dave Chinner 提交于
      Now that the shrinker is passing a node in the scan control structure, we
      can pass this to the the generic LRU list code to isolate reclaim to the
      lists on matching nodes.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NGlauber Costa <glommer@parallels.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9b17c623
    • D
      shrinker: convert superblock shrinkers to new API · 0a234c6d
      Dave Chinner 提交于
      Convert superblock shrinker to use the new count/scan API, and propagate
      the API changes through to the filesystem callouts.  The filesystem
      callouts already use a count/scan API, so it's just changing counters to
      longs to match the VM API.
      
      This requires the dentry and inode shrinker callouts to be converted to
      the count/scan API.  This is mainly a mechanical change.
      
      [glommer@openvz.org: use mult_frac for fractional proportions, build fixes]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NGlauber Costa <glommer@openvz.org>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      0a234c6d
  10. 13 8月, 2013 4 次提交
  11. 31 7月, 2013 1 次提交
  12. 23 7月, 2013 1 次提交
  13. 29 6月, 2013 2 次提交
  14. 28 6月, 2013 1 次提交
    • D
      xfs: Inode create log items · 3ebe7d2d
      Dave Chinner 提交于
      Introduce the inode create log item type for logical inode create logging.
      Instead of logging the changes in buffers, pass the range to be
      initialised through the log by a new transaction type.  This reduces
      the amount of log space required to record initialisation during
      allocation from about 128 bytes per inode to a small fixed amount
      per inode extent to be initialised.
      
      This requires a new log item type to track it through the log
      and the AIL. This is a relatively simple item - most callbacks are
      noops as this item has the same life cycle as the transaction.
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      3ebe7d2d
  15. 20 6月, 2013 1 次提交
    • J
      xfs: Remove XFS_MOUNT_RETERR · 39a45d84
      Jie Liu 提交于
      XFS_MOUNT_RETERR is going to be set at xfs_parseargs() if
      mp->m_dalign is enabled, so any time we enter "if (mp->m_dalign)"
      branch in xfs_update_alignment(), XFS_MOUNT_RETERR is set and so
      we always be emitting a warning and returning an error.
      
      Hence, we can remove it and get rid of a couple of redundant
      check up against it at xfs_upate_alignment().
      
      Thanks Dave Chinner for the suggestions of simplify the code
      in xfs_parseargs().
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      39a45d84
  16. 06 6月, 2013 2 次提交
  17. 04 3月, 2013 1 次提交
    • E
      fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman 提交于
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as it's module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regards to the users permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesytem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Acked-by: NKees Cook <keescook@chromium.org>
      Reported-by: NKees Cook <keescook@google.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      7f78e035
  18. 14 1月, 2013 1 次提交
  19. 09 11月, 2012 1 次提交
  20. 18 10月, 2012 10 次提交
    • D
      xfs: rename xfs_sync.[ch] to xfs_icache.[ch] · 6d8b79cf
      Dave Chinner 提交于
      xfs_sync.c now only contains inode reclaim functions and inode cache
      iteration functions. It is not related to sync operations anymore.
      Rename to xfs_icache.c to reflect it's contents and prepare for
      consolidation with the other inode cache file that exists
      (xfs_iget.c).
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      6d8b79cf
    • D
      xfs: xfs_quiesce_attr() should quiesce the log like unmount · c75921a7
      Dave Chinner 提交于
      xfs_quiesce_attr() is supposed to leave the log empty with an
      unmount record written. Right now it does not wait for the AIL to be
      emptied before writing the unmount record, not does it wait for
      metadata IO completion, either. Fix it to use the same method and
      code as xfs_log_unmount().
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c75921a7
    • D
      xfs: move xfs_quiesce_attr() into xfs_super.c · c7eea6f7
      Dave Chinner 提交于
      Both callers of xfs_quiesce_attr() are in xfs_super.c, and there's
      nothing really sync-specific about this functionality so it doesn't
      really matter where it lives. Move it to benext to it's callers, so
      all the remount/sync_fs code is in the one place.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      c7eea6f7
    • D
      xfs: xfs_sync_fsdata is redundant · 34061f5c
      Dave Chinner 提交于
      Why do we need to write the superblock to disk once we've written
      all the data?  We don't actually - the reasons for doing this are
      lost in the mists of time, and go back to the way Irix used to drive
      VFS flushing.
      
      On linux, this code is only called from two contexts: remount and
      .sync_fs. In the remount case, the call is followed by a metadata
      sync, which unpins and writes the superblock.  In the sync_fs case,
      we only need to force the log to disk to ensure that the superblock
      is correctly on disk, so we don't actually need to write it. Hence
      the functionality is either redundant or superfluous and thus can be
      removed.
      
      Seeing as xfs_quiesce_data is essentially now just a log force,
      remove it as well and fold the code back into the two callers.
      Neither of them need the log covering check, either, as that is
      redundant for the remount case, and unnecessary for the .sync_fs
      case.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      34061f5c
    • D
      xfs: syncd workqueue is no more · 5889608d
      Dave Chinner 提交于
      With the syncd functions moved to the log and/or removed, the syncd
      workqueue is the only remaining bit left. It is used by the log
      covering/ail pushing work, as well as by the inode reclaim work.
      
      Given how cheap workqueues are these days, give the log and inode
      reclaim work their own work queues and kill the syncd work queue.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      5889608d
    • D
      xfs: xfs_sync_data is redundant. · 9aa05000
      Dave Chinner 提交于
      We don't do any data writeback from XFS any more - the VFS is
      completely responsible for that, including for freeze. We can
      replace the remaining caller with a VFS level function that
      achieves the same thing, but without conflicting with current
      writeback work.
      
      This means we can remove the flush_work and xfs_flush_inodes() - the
      VFS functionality completely replaces the internal flush queue for
      doing this writeback work in a separate context to avoid stack
      overruns.
      
      This does have one complication - it cannot be called with page
      locks held.  Hence move the flushing of delalloc space when ENOSPC
      occurs back up into xfs_file_aio_buffered_write when we don't hold
      any locks that will stall writeback.
      
      Unfortunately, writeback_inodes_sb_if_idle() is not sufficient to
      trigger delalloc conversion fast enough to prevent spurious ENOSPC
      whent here are hundreds of writers, thousands of small files and GBs
      of free RAM.  Hence we need to use sync_sb_inodes() to block callers
      while we wait for writeback like the previous xfs_flush_inodes
      implementation did.
      
      That means we have to hold the s_umount lock here, but because this
      call can nest inside i_mutex (the parent directory in the create
      case, held by the VFS), we have to use down_read_trylock() to avoid
      potential deadlocks. In practice, this trylock will succeed on
      almost every attempt as unmount/remount type operations are
      exceedingly rare.
      
      Note: we always need to pass a count of zero to
      generic_file_buffered_write() as the previously written byte count.
      We only do this by accident before this patch by the virtue of ret
      always being zero when there are no errors. Make this explicit
      rather than needing to specifically zero ret in the ENOSPC retry
      case.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Tested-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      9aa05000
    • D
      xfs: sync work is now only periodic log work · f661f1e0
      Dave Chinner 提交于
      The only thing the periodic sync work does now is flush the AIL and
      idle the log. These are really functions of the log code, so move
      the work to xfs_log.c and rename it appropriately.
      
      The only wart that this leaves behind is the xfssyncd_centisecs
      sysctl, otherwise the xfssyncd is dead. Clean up any comments that
      related to xfssyncd to reflect it's passing.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      f661f1e0
    • D
      xfs: don't run the sync work if the filesystem is read-only · 7f7bebef
      Dave Chinner 提交于
      If the filesystem is mounted or remounted read-only, stop the sync
      worker that tries to flush or cover the log if the filesystem is
      dirty. It's read-only, so it isn't dirty. Restart it on a remount,rw
      as necessary. This avoids the need for RO checks in the work.
      
      Similarly, stop the sync work when the filesystem is frozen, and
      start it again when the filesysetm is thawed. This avoids the need
      for special freeze checks in the work.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      7f7bebef
    • D
      xfs: rationalise xfs_mount_wq users · 7e18530b
      Dave Chinner 提交于
      Instead of starting and stopping background work on the xfs_mount_wq
      all at the same time, separate them to where they really are needed
      to start and stop.
      
      The xfs_sync_worker, only needs to be started after all the mount
      processing has completed successfully, while it needs to be stopped
      before the log is unmounted.
      
      The xfs_reclaim_worker is started on demand, and can be
      stopped before the unmount process does it's own inode reclaim pass.
      
      The xfs_flush_inodes work is run on demand, and so we really only
      need to ensure that it has stopped running before we start
      processing an unmount, freeze or remount,ro.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      7e18530b
    • D
      xfs: xfs_syncd_stop must die · 33c7a2bc
      Dave Chinner 提交于
      xfs_syncd_start and xfs_syncd_stop tie a bunch of unrelated
      functionailty together that actually have different start and stop
      requirements. Kill these functions and open code the start/stop
      methods for each of the background functions.
      
      Subsequent patches will move the start/stop functions around to the
      correct places to avoid races and shutdown issues.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NMark Tinguely <tinguely@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      33c7a2bc
  21. 03 10月, 2012 1 次提交
  22. 27 9月, 2012 1 次提交