1. 03 6月, 2014 1 次提交
  2. 16 5月, 2014 2 次提交
  3. 14 5月, 2014 1 次提交
    • B
      GFS2: remove transaction glock · 24972557
      Benjamin Marzinski 提交于
      GFS2 has a transaction glock, which must be grabbed for every
      transaction, whose purpose is to deal with freezing the filesystem.
      Aside from this involving a large amount of locking, it is very easy to
      make the current fsfreeze code hang on unfreezing.
      
      This patch rewrites how gfs2 handles freezing the filesystem. The
      transaction glock is removed. In it's place is a freeze glock, which is
      cached (but not held) in a shared state by every node in the cluster
      when the filesystem is mounted. This lock only needs to be grabbed on
      freezing, and actions which need to be safe from freezing, like
      recovery.
      
      When a node wants to freeze the filesystem, it grabs this glock
      exclusively.  When the freeze glock state changes on the nodes (either
      from shared to unlocked, or shared to exclusive), the filesystem does a
      special log flush.  gfs2_log_flush() does all the work for flushing out
      the and shutting down the incore log, and then it tries to grab the
      freeze glock in a shared state again.  Since the filesystem is stuck in
      gfs2_log_flush, no new transaction can start, and nothing can be written
      to disk. Unfreezing the filesytem simply involes dropping the freeze
      glock, allowing gfs2_log_flush() to grab and then release the shared
      lock, so it is cached for next time.
      
      However, in order for the unfreezing ioctl to occur, gfs2 needs to get a
      shared lock on the filesystem root directory inode to check permissions.
      If that glock has already been grabbed exclusively, fsfreeze will be
      unable to get the shared lock and unfreeze the filesystem.
      
      In order to allow the unfreeze, this patch makes gfs2 grab a shared lock
      on the filesystem root directory during the freeze, and hold it until it
      unfreezes the filesystem.  The functions which need to grab a shared
      lock in order to allow the unfreeze ioctl to be issued now use the lock
      grabbed by the freeze code instead.
      
      The freeze and unfreeze code take care to make sure that this shared
      lock will not be dropped while another process is using it.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      24972557
  4. 28 4月, 2014 1 次提交
  5. 18 4月, 2014 1 次提交
  6. 17 4月, 2014 1 次提交
  7. 08 4月, 2014 1 次提交
  8. 04 4月, 2014 1 次提交
    • J
      mm + fs: store shadow entries in page cache · 91b0abe3
      Johannes Weiner 提交于
      Reclaim will be leaving shadow entries in the page cache radix tree upon
      evicting the real page.  As those pages are found from the LRU, an
      iput() can lead to the inode being freed concurrently.  At this point,
      reclaim must no longer install shadow pages because the inode freeing
      code needs to ensure the page tree is really empty.
      
      Add an address_space flag, AS_EXITING, that the inode freeing code sets
      under the tree lock before doing the final truncate.  Reclaim will check
      for this flag before installing shadow pages.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Bob Liu <bob.liu@oracle.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Luigi Semenzato <semenzato@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Metin Doslu <metin@citusdata.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Ozgun Erdogan <ozgun@citusdata.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Roman Gushchin <klamm@yandex-team.ru>
      Cc: Ryan Mallon <rmallon@gmail.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      91b0abe3
  9. 01 4月, 2014 1 次提交
  10. 31 3月, 2014 2 次提交
  11. 19 3月, 2014 3 次提交
  12. 13 3月, 2014 1 次提交
    • T
      fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Theodore Ts'o 提交于
      Previously, the no-op "mount -o mount /dev/xxx" operation when the
      file system is already mounted read-write causes an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly
      documented or guaraunteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
      02b9984d
  13. 12 3月, 2014 3 次提交
  14. 07 3月, 2014 5 次提交
  15. 06 3月, 2014 1 次提交
  16. 03 3月, 2014 1 次提交
    • S
      GFS2: Clean up journal extent mapping · b50f227b
      Steven Whitehouse 提交于
      This patch fixes a long standing issue in mapping the journal
      extents. Most journals will consist of only a single extent,
      and although the cache took account of that by merging extents,
      it did not actually map large extents, but instead was doing a
      block by block mapping. Since the journal was only being mapped
      on mount, this was not normally noticeable.
      
      With the updated code, it is now possible to use the same extent
      mapping system during journal recovery (which will be added in a
      later patch). This will allow checking of the integrity of the
      journal before any reply of the journal content is attempted. For
      this reason the code is moving to bmap.c, since it will be used
      more widely in due course.
      
      An exercise left for the reader is to compare the new function
      gfs2_map_journal_extents() with gfs2_write_alloc_required()
      
      Additionally, should there be a failure, the error reporting is
      also updated to show more detail about what went wrong.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b50f227b
  17. 27 2月, 2014 1 次提交
  18. 25 2月, 2014 3 次提交
    • S
      GFS2: Remove extra "if" in gfs2_log_flush() · b1ab1e44
      Steven Whitehouse 提交于
      By reordering some of the assignments in gfs2_log_flush() it
      is possible to remove one of the "if" statements as it can be
      merged with one higher up the function.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b1ab1e44
    • S
      GFS2: Move log buffer accounting to transaction · 022ef4fe
      Steven Whitehouse 提交于
      Now we have a master transaction into which other transactions
      are merged, the accounting can be done using this master
      transaction. We no longer require the superblock fields which
      were being used for this function.
      
      In addition, this allows for a clean up in calc_reserved()
      making it rather easier understand. Also, by reducing the
      number of variables used to track the buffers being added
      and removed from the journal, a number of error checks are
      now no longer required.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      022ef4fe
    • S
      GFS2: Move log buffer lists into transaction · d69a3c65
      Steven Whitehouse 提交于
      Over time, we hope to be able to improve the concurrency available
      in the log code. This is one small step towards that, by moving
      the buffer lists from the super block, and into the transaction
      structure, so that each transaction builds its own buffer lists.
      
      At transaction commit time, the buffer lists are merged into
      the currently accumulating transaction. That transaction then
      is passed into the before and after commit functions at journal
      flush time. Thus there should be no change in overall behaviour
      yet.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      d69a3c65
  19. 21 2月, 2014 1 次提交
  20. 17 2月, 2014 1 次提交
  21. 10 2月, 2014 1 次提交
  22. 07 2月, 2014 1 次提交
    • S
      GFS2: Add meta readahead field in directory entries · 44aaada9
      Steven Whitehouse 提交于
      The intent of this new field in the directory entry is to
      allow a subsequent lookup to know how many blocks, which
      are contiguous with the inode, contain metadata which relates
      to the inode. This will then allow the issuing of a single
      read to read these blocks, rather than reading the inode
      first, and then issuing a second read for the metadata.
      
      This only works under some fairly strict conditions, since
      we do not have back pointers from inodes to directory entries
      we must ensure that the blocks referenced in this way will
      always belong to the inode.
      
      This rules out being able to use this system for indirect
      blocks, as these can change as a result of truncate/rewrite.
      
      So the idea here is to restrict this to xattr blocks only
      for the time being. For most inodes, that means only a
      single block. Also, when using ACLs and/or SELinux or
      other LSMs, these will be added at inode creation time
      so that they will be contiguous with the inode on disk and
      also will almost always be needed when we read the inode in
      for permissions checks.
      
      Once an xattr block for an inode is allocated, it will never
      change until the inode is deallocated.
      
      This patch adds the new field, a further patch will add the
      readahead in due course.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      44aaada9
  23. 06 2月, 2014 2 次提交
    • B
      GFS2: Lock i_mutex and use a local gfs2_holder for fallocate · a0846a53
      Bob Peterson 提交于
      This patch causes GFS2 to lock the i_mutex during fallocate. It
      also switches from using a dinode's inode glock to using a local
      holder like the other GFS2 i_operations.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      a0846a53
    • S
      GFS2: journal data writepages update · 774016b2
      Steven Whitehouse 提交于
      GFS2 has carried what is more or less a copy of the
      write_cache_pages() for some time. It seems that this
      copy has slipped behind the core code over time. This
      patch brings it back uptodate, and in addition adds the
      tracepoint which would otherwise be missing.
      
      We could go further, and eliminate some or all of the
      code duplication here. The issue is that if we do that,
      then the function we need to split out from the existing
      write_cache_pages(), which will look a lot like
      gfs2_jdata_write_pagevec(), would land up putting quite a
      lot of extra variables on the stack. I know that has been
      a problem in the past in the writeback code path, which
      is why I've hesitated to do it here.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      774016b2
  24. 04 2月, 2014 1 次提交
    • S
      GFS2: Allocate block for xattr at inode alloc time, if required · b2c8b3ea
      Steven Whitehouse 提交于
      This is another step towards improving the allocation of xattr
      blocks at inode allocation time. Here we take advantage of
      Christoph's recent work on ACLs to allocate a block for the
      xattrs early if we know that we will be adding ACLs to the
      inode later on. The advantage of that is that it is much
      more likely that we'll get a contiguous run of two blocks
      where the first is the inode and the second is the xattr block.
      
      We still have to fall back to the original system in case we
      don't get the requested two contiguous blocks, or in case the
      ACLs are too large to fit into the block.
      
      Future patches will move more of the ACL setting code further
      up the gfs2_inode_create() function. Also, I'd like to be
      able to do the same thing with the xattrs from LSMs in
      due course, too. That way we should be able to slowly reduce
      the number of independent transactions, at least in the
      most common cases.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b2c8b3ea
  25. 03 2月, 2014 1 次提交
    • S
      GFS2: Plug on AIL flush · 885bceca
      Steven Whitehouse 提交于
      When we do a flush of the AIL list, we are writing out what is
      likely to be a lot of small I/Os, which are possibly in an order
      which is not ideal performance-wise. Since this is done by calling
      filemap_fdatatwrite for each individual inode's address space there
      is no overall plugging going on.
      
      In addition to that, we do not always wait for AIL i/o when we flush
      it, so that it is possible for things to get left behind on the queue.
      By adding explicit plugging here, we reduce the chances of this
      being an issues. A quick test using the AIL flush tracepoint shows a
      small, but measurable improvement.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      885bceca
  26. 26 1月, 2014 2 次提交