1. 16 Sep, 2009 1 commit
    • HWPOISON: Enable .remove_error_page for migration aware file systems · aa261f54
      Andi Kleen committed
      Enable removing of corrupted pages through truncation
      for a bunch of file systems: ext*, xfs, gfs2, ocfs2, ntfs
      These should cover most server needs.
      
      I chose the set of migration aware file systems for this
      for now, assuming they have been especially audited.
      But in general it should be safe for all file systems
      on the data area that support read/write and truncate.
      
      Caveat: the hardware error handler does not take i_mutex
      for now before calling the truncate function. Is that ok?
      
      Cc: tytso@mit.edu
      Cc: hch@infradead.org
      Cc: mfasheh@suse.com
      Cc: aia21@cantab.net
      Cc: hugh.dickins@tiscali.co.uk
      Cc: swhiteho@redhat.com
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      aa261f54
  2. 30 Jul, 2009 1 commit
    • GFS2: keep statfs info in sync on grows · 1946f70a
      Benjamin Marzinski committed
      GFS2 wasn't syncing its statfs info on grows.  This causes a problem
      when you grow the filesystem on multiple nodes.  GFS2 would calculate
      the new space based on the resource groups (which are always current),
      and then assume that the filesystem had grown from the existing
      statfs size.  If you grew the filesystem on two different nodes in a
      short time, the second node wouldn't see the statfs size change from the
      first node, and would assume that it was grown by a larger amount than
      it was.  When all these changes were synced out, the total filesystem
      size would be incorrect (the first grow would be counted twice).
      
      This patch makes GFS2 read in the statfs changes from disk before
      a grow, and write them out after the grow, while the master statfs inode
      is locked.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      1946f70a
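The double counting described in 1946f70a can be reproduced with plain arithmetic. The following is a minimal userspace sketch, not GFS2 code: the `sketch_` names, the structure layout and the block counts (100, 150, 200) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: each node holds a possibly stale copy of the total
 * recorded in the master statfs file, plus a change not yet written back. */
struct sketch_node {
    int64_t cached_total;   /* statfs total as last seen by this node */
    int64_t local_delta;    /* local change waiting to be synced out */
};

/* One grow as seen from one node. rgrp_total is the size computed from the
 * resource groups, which are always current. With refresh set, the node
 * re-reads the on-disk total first and writes its change out immediately,
 * which models doing both while the master statfs inode is locked. */
static void sketch_grow(struct sketch_node *n, int64_t *disk_total,
                        int64_t rgrp_total, int refresh)
{
    if (refresh)
        n->cached_total = *disk_total;
    assert(rgrp_total >= n->cached_total);  /* a grow never shrinks */
    n->local_delta = rgrp_total - n->cached_total;
    if (refresh) {
        *disk_total += n->local_delta;
        n->cached_total = *disk_total;
        n->local_delta = 0;
    }
}

/* Deferred write-back used only in the unpatched scenario. */
static void sketch_sync_out(struct sketch_node *n, int64_t *disk_total)
{
    *disk_total += n->local_delta;
    n->local_delta = 0;
}

/* Grow 100 -> 150 on node A, then 150 -> 200 on node B, and return the
 * total that ends up on disk. */
static int64_t sketch_two_node_grow(int refresh)
{
    int64_t disk_total = 100;
    struct sketch_node a = { 100, 0 }, b = { 100, 0 };

    sketch_grow(&a, &disk_total, 150, refresh);
    sketch_grow(&b, &disk_total, 200, refresh);
    if (!refresh) {
        sketch_sync_out(&a, &disk_total);
        sketch_sync_out(&b, &disk_total);
    }
    return disk_total;
}
```

Without the refresh, node B computes its change against the stale total of 100, so the first grow of 50 blocks is counted twice and the model ends at 250 instead of 200. With the refresh, it ends at 200.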
  3. 22 May, 2009 1 commit
    • GFS2: Clean up some file names · b1e71b06
      Steven Whitehouse committed
      This patch renames the ops_*.c files which have no counterpart
      without the ops_ prefix in order to shorten the name and make
      it more readable. In addition, ops_address.h (which was very
      small) is moved into inode.h and inode.h is cleaned up by
      adding extern where required.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b1e71b06
  4. 12 May, 2009 1 commit
  5. 24 Mar, 2009 3 commits
    • GFS2: Pagecache usage optimization on GFS2 · 229615de
      Hisashi Hifumi committed
      I introduced "is_partially_uptodate" aops for GFS2.
      
      A page can have multiple buffers, and even if a page is not uptodate, some
      of its buffers can be uptodate in a pagesize != blocksize environment.
      This aop checks that all buffers which correspond to the part of a file
      that we want to read are uptodate. If so, we do not have to issue actual
      read IO to the HDD even if the page is not uptodate, because the portion we
      want to read is uptodate.
      The "block_is_partially_uptodate" function is already used by ext2/3/4.
      With the following patch, random read/write mixed workloads or random read
      after random write workloads can be optimized and we can get a performance
      improvement.
      
      I did a performance test using sysbench.
      
      #sysbench --num-threads=16 --max-requests=200000 --test=fileio --file-num=1
      --file-block-size=8K --file-total-size=2G --file-test-mode=rndrw --file-fsync-freq=0
      --file-rw-ratio=1 run
      
      -2.6.29-rc6
      Test execution summary:
          total time:                          202.6389s
          total number of events:              200000
          total time taken by event execution: 2580.0480
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0129s
               max:                            49.5852s
               approx.  95 percentile:         0.0462s
      
      -2.6.29-rc6-patched
      Test execution summary:
          total time:                          177.8639s
          total number of events:              200000
          total time taken by event execution: 2419.0199
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0121s
               max:                            52.4306s
               approx.  95 percentile:         0.0444s
      
      arch: ia64
      pagesize: 16k
      blocksize: 4k
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      229615de
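The check that 229615de describes can be modelled in userspace as follows. This is a simplified sketch under stated assumptions: a page is represented by an array of per-block flags, and `sketch_range_is_uptodate` is a hypothetical name. The kernel's `block_is_partially_uptodate` works on buffer heads rather than a flag array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when every block overlapping the byte range
 * [from, from + count) of the page is marked uptodate. */
static bool sketch_range_is_uptodate(const bool *blk_uptodate, size_t nblocks,
                                     size_t blocksize, size_t from,
                                     size_t count)
{
    assert(blocksize > 0);
    if (count == 0)
        return true;                    /* nothing to read */

    size_t first = from / blocksize;
    size_t last = (from + count - 1) / blocksize;

    if (last >= nblocks)
        return false;                   /* range runs past the page */
    for (size_t i = first; i <= last; i++)
        if (!blk_uptodate[i])
            return false;               /* a needed block is stale */
    return true;
}
```

With the 16k page and 4k block size from the test environment above, a page whose third block is stale can still satisfy a read of the first 8k without any IO, while a read that touches bytes 8192 to 12287 cannot.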
    • GFS2: Merge lock_dlm module into GFS2 · f057f6cd
      Steven Whitehouse committed
      This is the big patch that I've been working on for some time
      now. There are many reasons for wanting to make this change
      such as:
       o Reducing overhead by eliminating duplicated fields between structures
       o Simplification of the code (reduces the code size by a fair bit)
       o The locking interface is now the DLM interface itself as proposed
         some time ago.
       o Fewer lookups of glocks when processing replies from the DLM
       o Fewer memory allocations/deallocations for each glock
       o Scope to do further optimisations in the future (but this patch is
         more than big enough for now!)
      
      Please note that (a) this patch relates to the lock_dlm module and
      not the DLM itself, that is still a separate module; and (b) that
      we retain the ability to build GFS2 as a standalone single node
      filesystem without requiring the DLM.
      
      This patch needs a lot of testing, hence my keeping it until I restarted
      my -git tree after the last merge window. That way, this has the maximum
      exposure before it's merged. This is (modulo a few minor bug fixes) the
      same patch that I've been posting on and off for the last three months,
      and it has passed a number of different tests so far.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      f057f6cd
    • GFS2: change gfs2_quota_scan into a shrinker · 0a7ab79c
      Abhijith Das committed
      Deallocation of gfs2_quota_data objects now happens on-demand through a
      shrinker instead of routinely deallocating through the quotad daemon.
      Signed-off-by: Abhijith Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      0a7ab79c
  6. 07 Jan, 2009 1 commit
  7. 05 Jan, 2009 4 commits
    • GFS2: Streamline alloc calculations for writes · 7ed122e4
      Steven Whitehouse committed
      This patch removes some unused code, and makes the calculation
      of the number of blocks required conditional in order to reduce
      the number of times this (potentially expensive) calculation
      is done.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      7ed122e4
    • GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize · c9e98886
      Steven Whitehouse committed
      This patch moves the i_size field from the gfs2_dinode_host and,
      following the ext3 convention, renames it i_disksize.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      c9e98886
    • GFS2: Fix up jdata writepage/delete_inode · 1bb7322f
      Steven Whitehouse committed
      There is a bug in writepage and delete_inode which allows jdata files to
      invalidate pages from the address space without being in a transaction at
      the time. This causes problems in case the pages are in the journal. This
      patch fixes that case and prevents the resulting oops.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      1bb7322f
    • fs: symlink write_begin allocation context fix · 54566b2c
      Nick Piggin committed
      With the write_begin/write_end aops, page_symlink was broken because it
      could no longer pass a GFP_NOFS type mask into the point where the
      allocations happened.  They are done in write_begin, which would always
      assume that the filesystem can be entered from reclaim.  This bug could
      cause filesystem deadlocks.
      
      The funny thing with having a gfp_t mask there is that it doesn't really
      allow the caller to arbitrarily tinker with the context in which it can be
      called.  It couldn't ever be GFP_ATOMIC, for example, because it needs to
      take the page lock.  The only thing any callers care about is __GFP_FS
      anyway, so turn that into a single flag.
      
      Add a new flag for write_begin, AOP_FLAG_NOFS.  Filesystems can now act on
      this flag in their write_begin function.  Change __grab_cache_page to
      accept a nofs argument as well, to honour that flag (while we're there,
      change the name to grab_cache_page_write_begin which is more instructive
      and does away with random leading underscores).
      
      This is really a more flexible way to go in the end anyway -- if a
      filesystem happens to want any extra allocations aside from the pagecache
      ones in its write_begin function, it may now use GFP_KERNEL (rather than
      GFP_NOFS) for common case allocations (eg.  ocfs2_alloc_write_ctxt, for a
      random example).
      
      [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
      [kosaki.motohiro@jp.fujitsu.com: fix fuse]
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: <stable@kernel.org>		[2.6.28.x]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      [ Cleaned up the calling convention: just pass in the AOP flags
        untouched to the grab_cache_page_write_begin() function.  That
        just simplifies everybody, and may even allow future expansion of the
        logic.   - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      54566b2c
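The effect of the new flag in 54566b2c can be sketched as a mask operation: when the caller passes the no-FS flag, the filesystem-reentry bit is cleared from the allocation mask used for the pagecache page. The model below is a userspace illustration only. The `SKETCH_` constants use made-up bit values and do not match the kernel's `__GFP_FS`, `GFP_KERNEL` or `AOP_FLAG_NOFS` definitions, and `sketch_write_begin_gfp` is a hypothetical helper.

```c
#include <assert.h>

/* Made-up bit values, for illustration only. */
#define SKETCH_GFP_IO        0x1u   /* allocation may start block IO */
#define SKETCH_GFP_FS        0x2u   /* allocation may re-enter the filesystem */
#define SKETCH_GFP_KERNEL    (SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_AOP_FLAG_NOFS 0x4u   /* caller must not recurse into the fs */

/* Derive the allocation mask for a write_begin page allocation from the
 * mapping's default mask and the flags handed to write_begin. */
static unsigned int sketch_write_begin_gfp(unsigned int mapping_gfp,
                                           unsigned int aop_flags)
{
    unsigned int notmask = 0;

    /* Only the sketch bits are understood by this model. */
    assert((mapping_gfp & ~SKETCH_GFP_KERNEL) == 0);

    if (aop_flags & SKETCH_AOP_FLAG_NOFS)
        notmask = SKETCH_GFP_FS;
    return mapping_gfp & ~notmask;
}
```

Passing the flags through untouched, as the note from Linus describes, keeps the decision in one place: the caller states its context and the page-grabbing helper picks the mask.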
  8. 18 Sep, 2008 1 commit
    • GFS2: high time to take some time over atime · 719ee344
      Steven Whitehouse committed
      Until now, we've used the same scheme as GFS1 for atime. This has failed
      since atime is a per vfsmnt flag, not a per fs flag and as such the
      "noatime" flag was not getting passed down to the filesystems. This
      patch removes all the "special casing" around atime updates and we
      simply use the VFS's atime code.
      
      The net result is that GFS2 will now support all the same atime related
      mount options of any other filesystem on a per-vfsmnt basis. We do lose
      the "lazy atime" updates, but we gain "relatime". We could add lazy
      atime to the VFS at a later date, if there is a requirement for that
      variant still - I suspect relatime will be enough.
      
      Also we lose about 100 lines of code after this patch has been applied,
      and I have a suspicion that it will speed things up a bit, even when
      atime is "on". So it seems like a nice clean up as well.
      
      From a user perspective, everything stays the same except the loss of
      the per-fs atime quantum tweakable (ought to be per-vfsmnt at the very
      least, and to be honest I don't think anybody ever used it) and that a
      number of options which were ignored before now work correctly.
      
      Please let me know if you've got any comments. I'm pushing this out
      early so that you can all see what my plans are.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      719ee344
  9. 15 Sep, 2008 1 commit
  10. 27 Jun, 2008 2 commits
    • [GFS2] Revise readpage locking · 01b7c7ae
      Steven Whitehouse committed
      The previous attempt to fix the locking in readpage failed due
      to the use of a "try lock" which resulted in occasional high
      cpu usage during testing (due to repeated tries) and also it
      did not resolve all the ordering problems wrt the transaction
      lock (although it did solve all the inode lock ordering problems).
      
      This patch avoids the problem by unlocking the page and getting the
      locks in the correct order. This means that we have to retest the
      page to ensure that it hasn't changed when we relock the page.
      
      This now passes the tests which were previously failing.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      01b7c7ae
    • [GFS2] Clean up the glock core · 6802e340
      Steven Whitehouse committed
      This patch implements a number of cleanups to the core of the
      GFS2 glock code. As a result a lot of code is removed. It looks
      like a really big change, but actually a large part of this patch
      is either removing or moving existing code.
      
      There are some new bits too though, such as the new run_queue()
      function which is considerably streamlined. Highlights of this
      patch include:
      
       o Fixes a cluster coherency bug during SH -> EX lock conversions
       o Removes the "glmutex" code in favour of a single bit lock
       o Removes the ->go_xmote_bh() for inodes since it was duplicating
         ->go_lock()
       o We now only use the ->lm_lock() function for both locks and
         unlocks (i.e. unlock is a lock with target mode LM_ST_UNLOCKED)
       o The fast path is considerably shorter, giving performance gains
         especially with lock_nolock
       o The glock_workqueue is now used for all the callbacks from the DLM
         which allows us to simplify the lock_dlm module (see following patch)
       o The way is now open to make further changes such as eliminating the two
         threads (gfs2_glockd and gfs2_scand) in favour of a more efficient
         scheme.
      
      This patch has undergone extensive testing with various test suites
      so it should be pretty stable by now.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Bob Peterson <rpeterso@redhat.com>
      6802e340
  11. 28 Apr, 2008 1 commit
  12. 31 Mar, 2008 5 commits
  13. 06 Feb, 2008 1 commit
    • Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user · eebd2aa3
      Christoph Lameter committed
      Simplify page cache zeroing of segments of pages through 3 functions
      
      zero_user_segments(page, start1, end1, start2, end2)
      
              Zeros two segments of the page. It takes the position where to
              start and end the zeroing which avoids length calculations and
              makes code clearer.
      
      zero_user_segment(page, start, end)
      
              Same for a single segment.
      
      zero_user(page, start, length)
      
              Length variant for the case where we know the length.
      
      We remove the zero_user_page macro. Issues:
      
      1. It's a macro. Inline functions are preferable.
      
      2. The KM_USER0 macro is only defined for HIGHMEM.
      
         Having to treat this special case everywhere makes the
         code needlessly complex. The parameter for zeroing is always
         KM_USER0 except in one single case that we open code.
      
      Avoiding KM_USER0 means that a lot of code no longer has to deal
      with the special casing for HIGHMEM. Dealing with
      kmap is only necessary for HIGHMEM configurations. In those
      configurations we use KM_USER0 like we do for a series of other
      functions defined in highmem.h.
      
      Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
      had to be a macro and could not be an inline function. The zero_user_*
      functions introduced here can be inline because that constant is not
      used when these functions are called.
      
      Also extract the flushing of the caches to be outside of the kmap.
      
      [akpm@linux-foundation.org: fix nfs and ntfs build]
      [akpm@linux-foundation.org: fix ntfs build some more]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      eebd2aa3
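The relationship between the three helpers in eebd2aa3 can be shown with a userspace model that operates on a plain byte buffer instead of a struct page. This is a sketch under stated assumptions: the `sketch_` prefixed names and the 4096-byte page size are chosen for illustration, and the kmap and cache flushing steps of the kernel versions are left out.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Zero the byte ranges [start1, end1) and [start2, end2) of a page buffer.
 * Taking start and end positions avoids length arithmetic at call sites. */
static void sketch_zero_user_segments(unsigned char *page,
                                      unsigned start1, unsigned end1,
                                      unsigned start2, unsigned end2)
{
    assert(end1 <= SKETCH_PAGE_SIZE && end2 <= SKETCH_PAGE_SIZE);

    if (end1 > start1)
        memset(page + start1, 0, end1 - start1);
    if (end2 > start2)
        memset(page + start2, 0, end2 - start2);
}

/* Single-segment variant: the second segment is empty. */
static void sketch_zero_user_segment(unsigned char *page,
                                     unsigned start, unsigned end)
{
    sketch_zero_user_segments(page, start, end, 0, 0);
}

/* Length variant for callers that already know the length. */
static void sketch_zero_user(unsigned char *page,
                             unsigned start, unsigned length)
{
    sketch_zero_user_segment(page, start, start + length);
}
```

Building the two simpler helpers on top of the two-segment one keeps the bounds check in a single place.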
  14. 25 Jan, 2008 13 commits
    • [GFS2] Reduce inode size by moving i_alloc out of line · 6dbd8224
      Steven Whitehouse committed
      It is possible to reduce the size of GFS2 inodes by taking the i_alloc
      structure out of the gfs2_inode. This patch allocates the i_alloc
      structure whenever it's needed, and frees it afterward. This decreases
      the amount of low memory we use at the expense of requiring a memory
      allocation for each page or partial page that we write. A quick test
      with postmark shows that the overhead is not measurable and I also note
      that OCFS2 uses the same approach.
      
      In the future I'd like to solve the problem by shrinking down the size
      of the members of the i_alloc structure, but for now, this reduces the
      immediate problem of using too much low-memory on x86 and doesn't add
      too much overhead.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      6dbd8224
    • [GFS2] Fix problems relating to execution of files on GFS2 · 9656b2c1
      Steven Whitehouse committed
      This patch fixes a couple of problems which affected the execution of files
      on GFS2. The first is that there was a corner case where inodes were not
      always uptodate at the point at which permission checks were being carried
      out. This resulted in refusal of execute permission, but only on the
      first lookup; subsequent requests worked correctly. The second was a problem
      relating to incorrect updating of file sizes which was introduced with the
      write_begin/end code for GFS2 a little while back.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Abhijith Das <adas@redhat.com>
      9656b2c1
    • [GFS2] Allow page migration for writeback and ordered pages · e5d9dc27
      Steven Whitehouse committed
      To improve performance on NUMA, we use the VM's standard page
      migration for writeback and ordered pages. Probably we could
      also do the same for journaled data, but that would need a
      careful audit of the code, so will be the subject of a later
      patch.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      e5d9dc27
    • [GFS2] Remove function gfs2_get_block · e9e1ef2b
      Bob Peterson committed
      This patch is just a cleanup.  Function gfs2_get_block() just calls
      function gfs2_block_map reversing the last two parameters.  By
      reversing the parameters, gfs2_block_map() may be called directly
      and function gfs2_get_block may be eliminated altogether.
      Since this function is called for every block operation,
      this streamlines the code and makes it a little bit more efficient.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      e9e1ef2b
    • [GFS2] Use correct include file in ops_address.c · 47e83b50
      Steven Whitehouse committed
      Something changed in the upstream kernel, and it needs this
      one-liner to allow ops_address.c to build.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      47e83b50
    • [GFS2] Don't hold page lock when starting transaction · c41d4f09
      Steven Whitehouse committed
      This is an addendum to the new AOPs work which moves the point
      at which we take the page lock so that we don't get it until
      the last possible moment. This resolves a conflict between
      starting transactions and the page lock.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      c41d4f09
    • [GFS2] Add writepages for GFS2 jdata · b8e7cbb6
      Steven Whitehouse committed
      This patch resolves a lock ordering issue where we had been getting
      a transaction lock in the wrong order with respect to the page lock.
      By using writepages rather than just writepage, it is then possible
      to start a transaction before locking the page, and thus matching the
      locking order elsewhere in the code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b8e7cbb6
    • [GFS2] Split gfs2_writepage into three cases · 9ff8ec32
      Steven Whitehouse committed
      This patch splits gfs2_writepage into separate functions for each of
      the three cases: writeback, ordered and journalled. As a result
      it becomes a lot easier to see what each one is doing. The common
      code is moved into gfs2_writepage_common.
      
      This fixes a performance bug where we were doing more work than
      strictly required in the ordered write case.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      9ff8ec32
    • [GFS2] Introduce gfs2_set_aops() · 5561093e
      Steven Whitehouse committed
      Just like ext3 we now have three sets of address space operations
      to cover the cases of writeback, ordered and journalled data
      writes. This means that the individual operations can now become
      less complicated as we are able to remove some of the tests for
      file data mode from the code.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      5561093e
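The idea in 5561093e of choosing one of three operation tables up front, rather than testing the data mode inside every operation, can be sketched as below. This is a hypothetical userspace model: the `sketch_` types, the table contents and the integer return codes are assumptions for illustration and are not taken from the GFS2 source.

```c
#include <assert.h>
#include <stddef.h>

enum sketch_data_mode { SKETCH_WRITEBACK, SKETCH_ORDERED, SKETCH_JDATA };

/* A cut-down operations table with a single entry point. */
struct sketch_aops {
    int (*writepage)(void);
};

/* Each mode gets its own implementation, so none of them needs to test
 * the data mode again. The return values only identify the path taken. */
static int sketch_writeback_writepage(void) { return 1; }
static int sketch_ordered_writepage(void)   { return 2; }
static int sketch_jdata_writepage(void)     { return 3; }

static const struct sketch_aops sketch_writeback_aops = { sketch_writeback_writepage };
static const struct sketch_aops sketch_ordered_aops   = { sketch_ordered_writepage };
static const struct sketch_aops sketch_jdata_aops     = { sketch_jdata_writepage };

/* Pick the table once, when the inode's data mode is known. */
static const struct sketch_aops *sketch_set_aops(enum sketch_data_mode mode)
{
    switch (mode) {
    case SKETCH_WRITEBACK: return &sketch_writeback_aops;
    case SKETCH_ORDERED:   return &sketch_ordered_aops;
    case SKETCH_JDATA:     return &sketch_jdata_aops;
    }
    assert(!"unknown data mode");
    return NULL;
}
```

The mode test happens once per inode instead of once per operation, which is what lets the individual operations drop their own mode checks.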
    • [GFS2] Add gfs2_is_writeback() · bf36a713
      Steven Whitehouse committed
      This adds a function "gfs2_is_writeback()" along the lines of the
      existing "gfs2_is_jdata()" in order to clean up the code and make
      the various tests for the inode mode more obvious. It also fixes
      the PageChecked() logic where we were resetting the flag too early
      in the case of an error path.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      bf36a713
    • [GFS2] Remove useless i_cache from inodes · f91a0d3e
      Steven Whitehouse committed
      The i_cache was designed to keep references to the indirect blocks
      used during block mapping so that they didn't have to be looked
      up continually. The idea failed because there are too many places
      where the i_cache needs to be freed, and this has in the past been
      the cause of many bugs.
      
      In addition there was no performance benefit being gained since the
      disk blocks in question were cached anyway. So this patch removes
      it in order to simplify the code to prepare for other changes which
      would otherwise have had to add further support for this feature.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      f91a0d3e
    • [GFS2] Use ->page_mkwrite() for mmap() · 3cc3f710
      Steven Whitehouse committed
      This cleans up the mmap() code path for GFS2 by implementing the
      page_mkwrite function for GFS2. We are thus able to use the
      generic filemap_fault function for our ->fault() implementation.
      
      This now means that shared writable mappings will be much more
      efficiently shared across the cluster if there is a reasonable
      proportion of read activity (the greater the proportion, the better).
      
      As a side effect, it also reduces the size of the code, removes
      special cases from readpage and readpages, and makes the code
      path easier to follow.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      3cc3f710
    • [GFS2] Clean up internal read function · 51ff87bd
      Steven Whitehouse committed
      As requested by Christoph, this patch cleans up GFS2's internal
      read function so that it no longer uses the do_generic_mapping_read
      function. This function is obsolete and GFS2 is the last user of it.
      
      As a side effect the internal read code gets smaller and easier
      to read and gfs2_readpage is split into two. One function has the locking
      and the other function has the rest of the logic.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      51ff87bd
  15. 17 Oct, 2007 1 commit
  16. 10 Oct, 2007 3 commits
    • S
      891ba6d4
    • [GFS2] Data corruption fix · de986e85
      Wendy Cheng committed
      * GFS2 has been using the i_cache array to store its indirect meta blocks.
      Its flush routine doesn't correctly clean up all the entries. The
      problem shows up when multiple nodes do simultaneous writes to the
      same file. Upon glock exclusive lock transfer, if the file is a sparse
      file with a large file size where the indirect meta blocks span multiple
      array entries with "zero" entries in between, the flush routine
      prematurely stops the flushing, which leaves old (stale) entries around.
      This leads to several nasty issues, including data corruption.
      * Fix gfs2_get_block_noalloc checking to correctly return EIO upon
      an unmapped buffer.
      Signed-off-by: Wendy Cheng <wcheng@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      de986e85
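The flush problem described in de986e85 comes down to a loop that treats the first empty slot as the end of the data. The sketch below models the cache as an array of integers where 0 means an empty slot. The function names and the integer representation are assumptions made for illustration, not the GFS2 implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Flush that stops at the first empty slot. Entries that follow a gap
 * are never released and remain behind as stale data. */
static size_t sketch_flush_stop_at_gap(int *cache, size_t n)
{
    size_t released = 0;

    assert(cache != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cache[i] == 0)
            break;          /* wrong: a gap is not the end of the array */
        cache[i] = 0;
        released++;
    }
    return released;
}

/* Flush that skips empty slots and visits every entry. */
static size_t sketch_flush_all(int *cache, size_t n)
{
    size_t released = 0;

    assert(cache != NULL);
    for (size_t i = 0; i < n; i++) {
        if (cache[i] == 0)
            continue;       /* skip the gap, keep scanning */
        cache[i] = 0;
        released++;
    }
    return released;
}
```

For a cache holding entries in slots 0 and 2 with a gap in slot 1, the first version releases one entry and leaves slot 2 populated, while the second releases both.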
    • [GFS2] Clean up journaled data writing · 16615be1
      Steven Whitehouse committed
      This patch cleans up the code for writing journaled data into the log.
      It also removes the need to allocate a small "tag" structure for each
      block written into the log. Instead we just keep count of the outstanding
      I/O so that we can be sure that it's all been written at the correct time.
      Another result of this patch is that a number of ll_rw_block() calls
      have become submit_bh() calls, closing some races at the same time.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      16615be1