1. 26 10月, 2010 1 次提交
  2. 28 9月, 2010 2 次提交
  3. 20 9月, 2010 3 次提交
    • B
      GFS2: fallocate support · 3921120e
      Benjamin Marzinski 提交于
      This patch adds support for fallocate to gfs2.  Since the gfs2 does not support
      uninitialized data blocks, it must write out zeros to all the blocks.  However,
      since it does not need to lock any pages to read from, gfs2 can write out the
      zero blocks much more efficiently.  On a moderately full filesystem, fallocate
      works around 5 times faster on average.  The fallocate call also allows gfs2 to
      add blocks to the file without changing the filesize, which will make it
      possible for gfs2 to preallocate space for the rindex file, so that gfs2 can
      grow a completely full filesystem.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      3921120e
    • S
      GFS2: Remove i_disksize · a2e0f799
      Steven Whitehouse 提交于
      With the update of the truncate code, ip->i_disksize and
      inode->i_size are merely copies of each other. This means
      we can remove ip->i_disksize and use inode->i_size exclusively
      reducing the size of a GFS2 inode by 8 bytes.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      a2e0f799
    • S
      GFS2: New truncate sequence · ff8f33c8
      Steven Whitehouse 提交于
      This updates GFS2's truncate code to use the new truncate
      sequence correctly. This is a stepping stone to being
      able to remove ip->i_disksize in favour of using i_size
      everywhere now that the two sizes are always identical.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Christoph Hellwig <hch@lst.de>
      ff8f33c8
  4. 10 8月, 2010 2 次提交
    • C
      check ATTR_SIZE contraints in inode_change_ok · 2c27c65e
      Christoph Hellwig 提交于
      Make sure we check the truncate constraints early on in ->setattr by adding
      those checks to inode_change_ok.  Also clean up and document inode_change_ok
      to make this obvious.
      
      As a fallout we don't have to call inode_newsize_ok from simple_setsize and
      simplify it down to a truncate_setsize which doesn't return an error.  This
      simplifies a lot of setattr implementations and means we use truncate_setsize
      almost everywhere.  Get rid of fat_setsize now that it's trivial and mark
      ext2_setsize static to make the calling convention obvious.
      
      Keep the inode_newsize_ok in vmtruncate for now as all callers need an
      audit for its removal anyway.
      
      Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
      needs a deeper audit, but that is left for later.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2c27c65e
    • C
      sort out blockdev_direct_IO variants · eafdc7d1
      Christoph Hellwig 提交于
      Move the call to vmtruncate to get rid of accessive blocks to the callers
      in prepearation of the new truncate calling sequence.  This was only done
      for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
      was not needed anyway.  Get rid of blockdev_direct_IO_no_locking and
      its _newtrunc variant while at it as just opencoding the two additional
      paramters is shorted than the name suffix.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      eafdc7d1
  5. 29 7月, 2010 2 次提交
  6. 28 5月, 2010 1 次提交
  7. 29 3月, 2010 1 次提交
    • S
      GFS2: Clean up stuffed file copying · 602c89d2
      Steven Whitehouse 提交于
      If the inode size was corrupt for stuffed files, it was possible
      for the copying of data to overrun the block and/or page. This patch
      checks for that condition so that this is no longer possible.
      
      This is also preparation for the new truncate sequence patch which
      requires the ability to have stuffed files with larger sizes than
      (disk block size - sizeof(on disk inode)) with the restriction that
      only the initial part of the file may be non-zero.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      602c89d2
  8. 01 3月, 2010 1 次提交
    • S
      GFS2: Metadata address space clean up · 009d8518
      Steven Whitehouse 提交于
      Since the start of GFS2, an "extra" inode has been used to store
      the metadata belonging to each inode. The only reason for using
      this inode was to have an extra address space, the other fields
      were unused. This means that the memory usage was rather inefficient.
      
      The reason for keeping each inode's metadata in a separate address
      space is that when glocks are requested on remote nodes, we need to
      be able to efficiently locate the data and metadata which relating
      to that glock (inode) in order to sync or sync and invalidate it
      (depending on the remotely requested lock mode).
      
      This patch adds a new type of glock, which has in addition to
      its normal fields, has an address space. This applies to all
      inode and rgrp glocks (but to no other glock types which remain
      as before). As a result, we no longer need to have the second
      inode.
      
      This results in three major improvements:
       1. A saving of approx 25% of memory used in caching inodes
       2. A removal of the circular dependency between inodes and glocks
       3. No confusion between "normal" and "metadata" inodes in super.c
      
      Although the first of these is the more immediately apparent, the
      second is just as important as it now enables a number of clean
      ups at umount time. Those will be the subject of future patches.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      009d8518
  9. 03 12月, 2009 2 次提交
  10. 16 9月, 2009 1 次提交
    • A
      HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Andi Kleen 提交于
      Enable removing of corrupted pages through truncation
      for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
      These should cover most server needs.
      
      I chose the set of migration aware file systems for this
      for now, assuming they have been especially audited.
      But in general it should be safe for all file systems
      on the data area that support read/write and truncate.
      
      Caveat: the hardware error handler does not take i_mutex
      for now before calling the truncate function. Is that ok?
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      aa261f54
  11. 30 7月, 2009 1 次提交
    • B
      GFS2: keep statfs info in sync on grows · 1946f70a
      Benjamin Marzinski 提交于
      GFS2 wasn't syncing its statfs info on grows.  This causes a problem
      when you grow the filesystem on multiple nodes.  GFS2 would calculate
      the new space based on the resource groups (which are always current),
      and then assume that the filesystem had grown the from the existing
      statfs size.  If you grew the filesystem on two different nodes in a
      short time, the second node wouldn't see the statfs size change from the
      first node, and would assume that it was grown by a larger amount than
      it was.  When all these changes were synced out, the total fileystem
      size would be incorrect (the first grow would be counted twice).
      
      This patch syncs makes GFS2 read in the statfs changes from disk before
      a grow, and write them out after the grow, while the master statfs inode
      is locked.
      Signed-off-by: NBenjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1946f70a
  12. 22 5月, 2009 1 次提交
    • S
      GFS2: Clean up some file names · b1e71b06
      Steven Whitehouse 提交于
      This patch renames the ops_*.c files which have no counterpart
      without the ops_ prefix in order to shorten the name and make
      it more readable. In addition, ops_address.h (which was very
      small) is moved into inode.h and inode.h is cleaned up by
      adding extern where required.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      b1e71b06
  13. 12 5月, 2009 1 次提交
  14. 24 3月, 2009 3 次提交
    • H
      GFS2: Pagecache usage optimization on GFS2 · 229615de
      Hisashi Hifumi 提交于
      I introduced "is_partially_uptodate" aops for GFS2.
      
      A page can have multiple buffers and even if a page is not uptodate, some buffers
      can be uptodate on pagesize != blocksize environment.
      This aops checks that all buffers which correspond to a part of a file
      that we want to read are uptodate. If so, we do not have to issue actual
      read IO to HDD even if a page is not uptodate because the portion we
      want to read are uptodate.
      "block_is_partially_uptodate" function is already used by ext2/3/4.
      With the following patch random read/write mixed workloads or random read after
      random write workloads can be optimized and we can get performance improvement.
      
      I did a performance test using the sysbench.
      
      #sysbench --num-threads=16 --max-requests=200000 --test=fileio --file-num=1
      --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0
      --file-rw-ratio=1 run
      
      -2.6.29-rc6
      Test execution summary:
          total time:                          202.6389s
          total number of events:              200000
          total time taken by event execution: 2580.0480
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0129s
               max:                            49.5852s
               approx.  95 percentile:         0.0462s
      
      -2.6.29-rc6-patched
      Test execution summary:
          total time:                          177.8639s
          total number of events:              200000
          total time taken by event execution: 2419.0199
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0121s
               max:                            52.4306s
               approx.  95 percentile:         0.0444s
      
      arch: ia64
      pagesize: 16k
      blocksize: 4k
      Signed-off-by: NHisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      229615de
    • S
      GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse 提交于
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplifcation of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem with out requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before its merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off the the last three months
      and its passed a number of different tests so far.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      f057f6cd
    • A
      GFS2: change gfs2_quota_scan into a shrinker · 0a7ab79c
      Abhijith Das 提交于
      Deallocation of gfs2_quota_data objects now happens on-demand through a
      shrinker instead of routinely deallocating through the quotad daemon.
      Signed-off-by: NAbhijith Das <adas@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      0a7ab79c
  15. 07 1月, 2009 1 次提交
  16. 05 1月, 2009 4 次提交
    • S
      GFS2: Streamline alloc calculations for writes · 7ed122e4
      Steven Whitehouse 提交于
      This patch removes some unused code, and make the calculation
      of the number of blocks required conditional in order to reduce
      the number of times this (potentially expensive) calculation
      is done.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      7ed122e4
    • S
      GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize · c9e98886
      Steven Whitehouse 提交于
      This patch moved the i_size field from the gfs2_dinode_host and
      following the ext3 convention renames it i_disksize.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      c9e98886
    • S
      GFS2: Fix up jdata writepage/delete_inode · 1bb7322f
      Steven Whitehouse 提交于
      There is a bug in writepage and delete_inode which allows jdata files to
      invalidate pages from the address space without being in a transaction at
      the time. This causes problems in case the pages are in the journal. This
      patch fixes that case and prevents the resulting oops.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1bb7322f
    • N
      fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin 提交于
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in ints write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54566b2c
  17. 18 9月, 2008 1 次提交
    • S
      GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse 提交于
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options of any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweekable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      719ee344
  18. 15 9月, 2008 1 次提交
  19. 27 6月, 2008 2 次提交
    • S
      [GFS2] Revise readpage locking · 01b7c7ae
      Steven Whitehouse 提交于
      The previous attempt to fix the locking in readpage failed due
      to the use of a "try lock" which resulted in occasional high
      cpu usage during testing (due to repeated tries) and also it
      did not resolve all the ordering problems wrt the transaction
      lock (although it did solve all the inode lock ordering problems).
      
      This patch avoids the problem by unlocking the page and getting the
      locks in the correct order. This means that we have to retest the
      page to ensure that it hasn't changed when we relock the page.
      
      This now passes the tests which were previously failing.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      01b7c7ae
    • S
      [GFS2] Clean up the glock core · 6802e340
      Steven Whitehouse 提交于
      This patch implements a number of cleanups to the core of the
      GFS2 glock code. As a result a lot of code is removed. It looks
      like a really big change, but actually a large part of this patch
      is either removing or moving existing code.
      
      There are some new bits too though, such as the new run_queue()
      function which is considerably streamlined. Highlights of this
      patch include:
      
       o Fixes a cluster coherency bug during SH -> EX lock conversions
       o Removes the "glmutex" code in favour of a single bit lock
       o Removes the ->go_xmote_bh() for inodes since it was duplicating
         ->go_lock()
       o We now only use the ->lm_lock() function for both locks and
         unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
       o The fast path is considerably shortly, giving performance gains
         especially with lock_nolock
       o The glock_workqueue is now used for all the callbacks from the DLM
         which allows us to simplify the lock_dlm module (see following patch)
       o The way is now open to make further changes such as eliminating the two
         threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
         scheme.
      
      This patch has undergone extensive testing with various test suites
      so it should be pretty stable by now.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      6802e340
  20. 28 4月, 2008 1 次提交
  21. 31 3月, 2008 5 次提交
  22. 06 2月, 2008 1 次提交
    • C
      Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user · eebd2aa3
      Christoph Lameter 提交于
      Simplify page cache zeroing of segments of pages through 3 functions
      
      zero_user_segments(page, start1, end1, start2, end2)
      
              Zeros two segments of the page. It takes the position where to
              start and end the zeroing which avoids length calculations and
      	makes code clearer.
      
      zero_user_segment(page, start, end)
      
              Same for a single segment.
      
      zero_user(page, start, length)
      
              Length variant for the case where we know the length.
      
      We remove the zero_user_page macro. Issues:
      
      1. Its a macro. Inline functions are preferable.
      
      2. The KM_USER0 macro is only defined for HIGHMEM.
      
         Having to treat this special case everywhere makes the
         code needlessly complex. The parameter for zeroing is always
         KM_USER0 except in one single case that we open code.
      
      Avoiding KM_USER0 makes a lot of code not having to be dealing
      with the special casing for HIGHMEM anymore. Dealing with
      kmap is only necessary for HIGHMEM configurations. In those
      configurations we use KM_USER0 like we do for a series of other
      functions defined in highmem.h.
      
      Since KM_USER0 is depends on HIGHMEM the existing zero_user_page
      function could not be a macro. zero_user_* functions introduced
      here can be be inline because that constant is not used when these
      functions are called.
      
      Also extract the flushing of the caches to be outside of the kmap.
      
      [akpm@linux-foundation.org: fix nfs and ntfs build]
      [akpm@linux-foundation.org: fix ntfs build some more]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eebd2aa3
  23. 25 1月, 2008 2 次提交
    • S
      [GFS2] Reduce inode size by moving i_alloc out of line · 6dbd8224
      Steven Whitehouse 提交于
      It is possible to reduce the size of GFS2 inodes by taking the i_alloc
      structure out of the gfs2_inode. This patch allocates the i_alloc
      structure whenever its needed, and frees it afterward. This decreases
      the amount of low memory we use at the expense of requiring a memory
      allocation for each page or partial page that we write. A quick test
      with postmark shows that the overhead is not measurable and I also note
      that OCFS2 use the same approach.
      
      In the future I'd like to solve the problem by shrinking down the size
      of the members of the i_alloc structure, but for now, this reduces the
      immediate problem of using too much low-memory on x86 and doesn't add
      too much overhead.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      6dbd8224
    • S
      [GFS2] Fix problems relating to execution of files on GFS2 · 9656b2c1
      Steven Whitehouse 提交于
      This patch fixes a couple of problems which affected the execution of files
      on GFS2. The first is that there was a corner case where inodes were not
      always uptodate at the point at which permissions checks were being carried
      out, this was resulting in refusal of execute permission, but only on the
      first lookup, subsequent requests worked correctly. The second was a problem
      relating to incorrect updating of file sizes which was introduced with the
      write_begin/end code for GFS2 a little while back.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Abhijith Das <adas@redhat.com>
      9656b2c1