1. 22 May, 2010 2 commits
  2. 19 May, 2010 3 commits
    • Ocfs2: Optimize punching-hole code. · c1631d4a
      Committed by Tristan Ye
      This patch simplifies the logic of handling existing holes and
      skipping extent blocks and removes some confusing comments.
      
      The patch survived the fill_verify_holes testcase in ocfs2-test.
      It also passed my manual sanity check and stress tests with enormous
      extent records.
      
      Currently, punching a hole in a file whose extent tree is 3 or more
      levels deep is a performance disaster.  It can even take several hours,
      though we may not hit this in real life, since it requires such a huge
      number of extents.
      
      One simple way to improve the performance is quite straightforward.
      Borrowing the logic of truncate, we can punch the hole from hole_end
      back to hole_start, which significantly reduces the overhead of btree
      operations such as tree rotation and moving.
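
A minimal sketch of why the direction matters, assuming a single packed, sorted array of records as a stand-in for one btree leaf. The names `demo_leaf`, `demo_remove_rec`, `demo_punch_from_start` and `demo_punch_from_end` are hypothetical and are not ocfs2 functions; the real extent tree is multi-level and its costs also include rotations, which this toy model does not capture.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_RECS 1024

/* Toy model: extent records kept sorted and packed, as in one btree leaf. */
struct demo_leaf {
    unsigned int recs[DEMO_MAX_RECS]; /* starting cluster of each record */
    size_t count;
};

static void demo_fill(struct demo_leaf *leaf, size_t n)
{
    assert(n <= DEMO_MAX_RECS);
    for (size_t i = 0; i < n; i++)
        leaf->recs[i] = (unsigned int)i;
    leaf->count = n;
}

/* Remove the record at idx; return how many records were shifted left. */
static size_t demo_remove_rec(struct demo_leaf *leaf, size_t idx)
{
    size_t moved = leaf->count - idx - 1;

    memmove(&leaf->recs[idx], &leaf->recs[idx + 1],
            moved * sizeof(leaf->recs[0]));
    leaf->count--;
    return moved;
}

/* Punch all n records starting at the front: every removal shifts the rest. */
static size_t demo_punch_from_start(size_t n)
{
    struct demo_leaf leaf;
    size_t total = 0;

    demo_fill(&leaf, n);
    while (leaf.count)
        total += demo_remove_rec(&leaf, 0);
    return total;
}

/* Punch all n records starting at the back: nothing ever needs to move. */
static size_t demo_punch_from_end(size_t n)
{
    struct demo_leaf leaf;
    size_t total = 0;

    demo_fill(&leaf, n);
    while (leaf.count)
        total += demo_remove_rec(&leaf, leaf.count - 1);
    return total;
}
```

In this model the front-to-back order moves a quadratic number of records in total, while the back-to-front order moves none.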
      
      The following are the test results for punching a hole from byte 0 to
      the end of a 1G file.  The file consists of 256k extent records, each
      covering 4k of data (just one cluster; the cluster size is 4k):
      
      ===========================================================================
       * Original punching-hole mechanism:
      ===========================================================================
      
         I waited 1 hour for it to complete; unfortunately, it was still running.
      
      ===========================================================================
       * Patched punching-hole mechanism:
      ===========================================================================
      
         real 0m2.518s
         user 0m0.000s
         sys  0m2.445s
      
      That means we've gained up to a 1000-times performance improvement in this
      case, whee! It's fairly cool, and it looks like the performance gain will
      grow as the number of extent records grows.
      
      This patch is based on my previous 2 patches, which optimized the
      truncate code and fixed hole punching to handle CoW.
      Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • Ocfs2: Fix hole punching to correctly do CoW during cluster zeroing. · e8aec068
      Committed by Tristan Ye
      Building on the previous patch that optimized truncate, the bugfix for
      refcount trees when punching holes is fairly easy and straightforward,
      since most of the refcounting work we need to take into account has
      already been done in ocfs2_remove_btree_range().
      
      This patch performs CoW for refcounted extents when the start or end
      offset of the hole being punched falls in the middle of a cluster, which
      means the cluster is about to be partially zeroed.
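
The alignment condition described above can be sketched as follows. This is only an illustration under stated assumptions: `demo_mid_cluster` and `demo_hole_needs_partial_zero` are hypothetical helpers, not ocfs2 functions, and the real code additionally has to check whether the affected extent is actually refcounted before doing CoW.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if byte_off is not on a cluster boundary (cluster size = 1 << bits). */
static bool demo_mid_cluster(uint64_t byte_off, uint32_t cluster_bits)
{
    assert(cluster_bits < 64);
    uint64_t mask = ((uint64_t)1 << cluster_bits) - 1;

    return (byte_off & mask) != 0;
}

/*
 * A hole [start, end) whose start or end sits mid-cluster leaves a partial
 * cluster that must be zeroed in place, i.e. written to, which is why a
 * refcounted (shared) cluster would need CoW first.
 */
static bool demo_hole_needs_partial_zero(uint64_t start, uint64_t end,
                                         uint32_t cluster_bits)
{
    return demo_mid_cluster(start, cluster_bits) ||
           demo_mid_cluster(end, cluster_bits);
}
```

With 4k clusters (`cluster_bits` of 12), a hole from byte 0 to byte 8192 needs no partial zeroing, while a hole starting at byte 100 does.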
      
      The patch has been tested and fixes the following bug:
      
      http://oss.oracle.com/bugzilla/show_bug.cgi?id=1216
      Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    • Ocfs2: Optimize ocfs2 truncate to use ocfs2_remove_btree_range() instead. · 78f94673
      Committed by Tristan Ye
      Truncate is just a special case of punching holes (from the new i_size
      to the end), so we can take advantage of the existing
      ocfs2_remove_btree_range() to reduce the complexity and redundancy in
      alloc.c.  The goal here is to make truncate more generic and
      straightforward.
      
      Several functions only used by ocfs2_commit_truncate() will simply be
      removed.
      
      ocfs2_remove_btree_range() was originally used by the hole punching
      code, which didn't take refcount trees into account (definitely a bug).
      We therefore need to change that func a bit to handle refcount trees.
      It must take the refcount lock, calculate and reserve blocks for
      refcount tree changes, and decrease refcounts at the end.  We replace 
      ocfs2_lock_allocators() here by adding a new func
      ocfs2_reserve_blocks_for_rec_trunc() which accepts some extra blocks to
      reserve.  This will not hurt any other code using
      ocfs2_remove_btree_range() (such as dir truncate and hole punching).
      
      I merged the following steps into one patch since they may be
      logically doing one thing, though I know it looks a little bit fat
      to review.
      
      1). Remove redundant code used by ocfs2_commit_truncate(), since we're
          moving to ocfs2_remove_btree_range anyway.
      
      2). Add a new func ocfs2_reserve_blocks_for_rec_trunc() for purpose of
          accepting some extra blocks to reserve.
      
      3). Change ocfs2_prepare_refcount_change_for_del() a bit to fit our
          needs.  It's safe to do this since it's only being called by
          truncate.
      
      4). Change ocfs2_remove_btree_range() a bit to take refcount case into
          account.
      
      5). Finally, we change ocfs2_commit_truncate() to call
          ocfs2_remove_btree_range() in a proper way.
      
      The patch has passed normal sanity checks; stress tests with heavier
      workloads are still expected.
      
      Based on this patch, fixing the punching holes bug will be fairly easy.
      Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
  3. 06 May, 2010 2 commits
    • ocfs2: use allocation reservations during file write · 4fe370af
      Committed by Mark Fasheh
      Add a per-inode reservations structure and pass it through to the
      reservations code.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Make ocfs2_journal_dirty() void. · ec20cec7
      Committed by Joel Becker
      jbd[2]_journal_dirty_metadata() only returns 0.  It's been returning 0
      since before the kernel moved to git.  There is no point in checking
      this error.
      
      ocfs2_journal_dirty() has been faithfully returning the status since the
      beginning.  All over ocfs2, we have blocks of code checking this can't
      fail status.  In the past few years, we've tried to avoid adding these
      checks, because they are pointless.  But anyone who looks at our code
      assumes they are needed.
      
      Finally, ocfs2_journal_dirty() is made a void function.  All error
      checking is removed from other files.  We'll BUG_ON() the status of
      jbd2_journal_dirty_metadata() just in case they change it someday.  They
      won't.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
  4. 01 May, 2010 1 commit
  5. 16 Apr, 2010 1 commit
  6. 31 Mar, 2010 1 commit
  7. 05 Mar, 2010 4 commits
    • dquot: cleanup dquot initialize routine · 871a2931
      Committed by Christoph Hellwig
      Get rid of the initialize dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_initialize helper to __dquot_initialize
      and vfs_dq_init to dquot_initialize to have a consistent namespace.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: move dquot initialization responsibility into the filesystem · 907f4554
      Committed by Christoph Hellwig
      Currently various places in the VFS call vfs_dq_init directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the initialization.  For most metadata operations
      this is a straightforward move into the methods, but for truncate and
      open it's a bit more complicated.
      
      For truncate we currently only call vfs_dq_init for the sys_truncate case
      because open already takes care of it for ftruncate and open(O_TRUNC) - the
      new code causes an additional vfs_dq_init for those which is harmless.
      
      For open the initialization is moved from do_filp_open into the open method,
      which means it happens slightly earlier now, and only for regular files.
      The latter is fine because we don't need to initialize it for operations
      on special files, and we already do it as part of the namespace operations
      for directories.
      
      Add a dquot_file_open helper that filesystems that support generic quotas
      can use to fill in ->open.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup dquot transfer routine · b43fa828
      Committed by Christoph Hellwig
      Get rid of the transfer dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_transfer helper to __dquot_transfer
      and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
      and make the new dquot_transfer return a normal negative errno value
      which all callers expect.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • dquot: cleanup space allocation / freeing routines · 5dd4056d
      Committed by Christoph Hellwig
      Get rid of the alloc_space, free_space, reserve_space, claim_space and
      release_rsv dquot operations - they are always called from the filesystem
      and if a filesystem really needs its own (which none currently does)
      it can just call into its own routine directly.
      
      Move shared logic into the common __dquot_alloc_space,
      dquot_claim_space_nodirty and __dquot_free_space low-level methods,
      and rationalize the wrappers around it to move as much code as
      possible into the common block for CONFIG_QUOTA vs not.  Also rename
      all these helpers to be named dquot_* instead of vfs_dq_*.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  8. 28 Feb, 2010 1 commit
  9. 27 Feb, 2010 2 commits
  10. 03 Feb, 2010 1 commit
  11. 26 Jan, 2010 1 commit
  12. 31 Dec, 2009 1 commit
  13. 10 Dec, 2009 1 commit
    • vfs: Implement proper O_SYNC semantics · 6b2f3d1f
      Committed by Christoph Hellwig
      While Linux provided an O_SYNC flag basically since day 1, it took until
      Linux 2.4.0-test12pre2 to actually get it implemented for filesystems.
      Since that day we have had generic_osync_around with only minor changes and the
      great "For now, when the user asks for O_SYNC, we'll actually give
      O_DSYNC" comment.  This patch intends to actually give us real O_SYNC
      semantics in addition to the O_DSYNC semantics.  After Jan's O_SYNC
      patches, which are required before this patch, it's actually surprisingly
      simple: we just need to figure out when to set the datasync flag to
      vfs_fsync_range and when not.
      
      This patch renames the existing O_SYNC flag to O_DSYNC while keeping its
      numerical value to keep binary compatibility, and adds a new real O_SYNC
      flag.  To guarantee backwards compatibility it is defined as expanding to
      both the O_DSYNC and the new additional binary flag (__O_SYNC) to make
      sure we are backwards-compatible when compiled against the new headers.
      
      This also means that all places that don't care about the differences can
      just check O_DSYNC and get the right behaviour for O_SYNC, too - only
      places that actually care need to check __O_SYNC in addition.  Drivers and
      network filesystems have been updated in a fail safe way to always do the
      full sync magic if O_DSYNC is set.  The few places setting O_SYNC for
      lower layers are kept that way for now to stay failsafe.
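
A minimal sketch of the flag layout described above. The `DEMO_` names and the helper functions are hypothetical, and the octal values are illustrative assumptions rather than a statement about any particular architecture's headers; only the relationship "O_SYNC expands to the O_DSYNC bit plus one extra bit" is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; real per-architecture values may differ. */
#define DEMO_O_DSYNC    00010000                       /* the old O_SYNC bit */
#define DEMO_O_SYNC_BIT 04000000                       /* stand-in for __O_SYNC */
#define DEMO_O_SYNC     (DEMO_O_SYNC_BIT | DEMO_O_DSYNC)

/* Any O_SYNC request also carries the O_DSYNC bit by construction. */
static_assert((DEMO_O_SYNC & DEMO_O_DSYNC) == DEMO_O_DSYNC,
              "O_SYNC must imply O_DSYNC");

/* Code that does not care about the difference only tests O_DSYNC. */
static bool demo_wants_data_sync(int flags)
{
    return (flags & DEMO_O_DSYNC) != 0;
}

/* Code that cares about full O_SYNC semantics also tests the extra bit. */
static bool demo_wants_full_sync(int flags)
{
    return (flags & DEMO_O_SYNC_BIT) != 0;
}
```

A binary built against old headers passes only the old bit and is therefore treated as O_DSYNC, which matches the behaviour it had before.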
      
      We enforce that O_DSYNC is set when __O_SYNC is set early in the open path
      to make sure we always get these sane options.
      
      Note that parisc really screwed up their headers as they already define an
      O_DSYNC that has always been a no-op.  We try to repair it by using it for
      the new O_DSYNC and redefining O_SYNC to send both the traditional
      O_SYNC numerical value _and_ the O_DSYNC one.
      
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger@sun.com>
      Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Acked-by: Kyle McMartin <kyle@mcmartin.ca>
      Acked-by: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Jan Kara <jack@suse.cz>
  14. 29 Oct, 2009 1 commit
  15. 23 Sep, 2009 3 commits
    • ocfs2: Call refcount tree remove process properly. · 8b2c0dba
      Committed by Tao Ma
      Now with xattr refcount support, we need to check whether
      we have refcounted xattrs before we remove the refcount tree.
      
      Now the mechanism is:
      1) Check whether i_clusters == 0; if not, exit.
      2) Check whether we have i_xattr_loc in the dinode; if yes, exit.
      3) Check whether we have inline xattrs stored outside; if yes, exit.
      4) Remove the tree.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
    • ocfs2: CoW a reflinked cluster when it is truncated. · 37f8a2bf
      Committed by Tao Ma
      When we truncate a file to a specific size which resides in a reflinked
      cluster, we need to CoW it since ocfs2_zero_range_for_truncate will
      zero the space after the size (just another type of write).
      
      So we add a "max_cpos" in ocfs2_refcount_cow so that it will stop when
      it hits the max cluster offset.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
    • ocfs2: Integrate CoW in file write. · 293b2f70
      Committed by Tao Ma
      When we use mmap, we CoW the refcounted clusters in
      ocfs2_write_begin_nolock, while for normal file
      I/O (including direct I/O), we do CoW in
      ocfs2_prepare_inode_for_write.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
  16. 14 Sep, 2009 2 commits
  17. 05 Sep, 2009 3 commits
  18. 21 Jul, 2009 1 commit
  19. 11 Jul, 2009 1 commit
  20. 23 Jun, 2009 1 commit
  21. 10 Jun, 2009 1 commit
    • ocfs2: fdatasync should skip unimportant metadata writeout · e04cc15f
      Committed by Hisashi Hifumi
      In ocfs2, fdatasync and fsync are identical.
      I think fdatasync should skip committing the transaction when
      inode->i_state has only I_DIRTY_SYNC set, which indicates
      only atime and/or mtime updates.
      The following patch improves fdatasync throughput.
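
The decision described above can be sketched roughly as follows. This is an illustration, not the patch itself: the `DEMO_` flag names, their bit values, and `demo_must_commit` are hypothetical, and the sketch assumes that "only I_DIRTY_SYNC is set" can be tested as "the data-relevant dirty bit is not set".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for inode dirty-state bits. */
#define DEMO_I_DIRTY_SYNC     (1u << 0) /* e.g. timestamp-only updates */
#define DEMO_I_DIRTY_DATASYNC (1u << 1) /* metadata needed to read data back */

static_assert(DEMO_I_DIRTY_SYNC != DEMO_I_DIRTY_DATASYNC,
              "demo dirty bits must be distinct");

/*
 * Decide whether a sync call has to commit the journal transaction.
 * fsync (datasync == false) always commits; fdatasync commits only when
 * metadata that matters for data retrieval is dirty.
 */
static bool demo_must_commit(bool datasync, unsigned int i_state)
{
    if (!datasync)
        return true;
    return (i_state & DEMO_I_DIRTY_DATASYNC) != 0;
}
```

Under this assumption, an fdatasync call on an inode whose only dirty state is a timestamp update returns without forcing a journal commit.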
      
      #sysbench --num-threads=16 --max-requests=300000 --test=fileio
      --file-block-size=4K --file-total-size=16G --file-test-mode=rndwr
      --file-fsync-mode=fdatasync run
      
      Results:
      -2.6.30-rc8
      Test execution summary:
          total time:                          107.1445s
          total number of events:              119559
          total time taken by event execution: 116.1050
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0010s
               max:                            0.1220s
               approx.  95 percentile:         0.0016s
      
      Threads fairness:
          events (avg/stddev):           7472.4375/303.60
          execution time (avg/stddev):   7.2566/0.64
      
      -2.6.30-rc8-patched
      Test execution summary:
          total time:                          86.8529s
          total number of events:              300016
          total time taken by event execution: 24.3077
          per-request statistics:
               min:                            0.0000s
               avg:                            0.0001s
               max:                            0.0336s
               approx.  95 percentile:         0.0001s
      
      Threads fairness:
          events (avg/stddev):           18751.0000/718.75
          execution time (avg/stddev):   1.5192/0.05
      Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
  22. 04 Jun, 2009 1 commit
    • ocfs2: Fix possible deadlock with quotas in ocfs2_setattr() · 65bac575
      Committed by Jan Kara
      We called vfs_dq_transfer() with the global quota file lock held. This can lead
      to deadlocks because if vfs_dq_transfer() has to allocate a new quota structure,
      it calls ocfs2_dquot_acquire(), which tries to get the quota file lock again, and
      this can block if another node requested the lock in the meantime.
      
      Since we have to call vfs_dq_transfer() with transaction already started
      and quota file lock ranks above the transaction start, we cannot just rely
      on ocfs2_dquot_acquire() or ocfs2_dquot_release() on getting the lock
      if they need it. We fix the problem by acquiring pointers to all quota
      structures needed by vfs_dq_transfer() already before calling the function.
      By this we are sure that all quota structures are properly allocated and
      they can be freed only after we drop references to them. Thus we don't need
      quota file lock anywhere inside vfs_dq_transfer().
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
  23. 15 Apr, 2009 1 commit
  24. 07 Apr, 2009 1 commit
    • splice: fix deadlock in splicing to file · 7bfac9ec
      Committed by Miklos Szeredi
      There's a possible deadlock in generic_file_splice_write(),
      splice_from_pipe() and ocfs2_file_splice_write():
      
       - task A calls generic_file_splice_write()
       - this calls inode_double_lock(), which locks i_mutex on both
         pipe->inode and target inode
       - ordering depends on inode pointers, can happen that pipe->inode is
         locked first
       - __splice_from_pipe() needs more data, calls pipe_wait()
       - this releases lock on pipe->inode, goes to interruptible sleep
       - task B calls generic_file_splice_write(), similarly to the first
       - this locks pipe->inode, then tries to lock inode, but that is
         already held by task A
       - task A is interrupted, it tries to lock pipe->inode, but fails, as
         it is already held by task B
       - ABBA deadlock
      
      Fix this by explicitly ordering locks: the outer lock must be on
      target inode and the inner lock (which is later unlocked and relocked)
      must be on pipe->inode.  This is OK, pipe inodes and target inodes
      form two nonoverlapping sets, generic_file_splice_write() and friends
      are not called with a target which is a pipe.
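
The fixed ordering can be sketched with a single-threaded toy model. Everything here is hypothetical (`demo_lock`, `demo_splice_ctx`, `demo_pipe_wait`, `demo_splice_write`); the booleans stand in for i_mutex and only check the ordering rule, they do not provide real mutual exclusion.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy lock: a flag that asserts on double acquire or double release. */
struct demo_lock { bool held; };

static void demo_acquire(struct demo_lock *l) { assert(!l->held); l->held = true; }
static void demo_release(struct demo_lock *l) { assert(l->held); l->held = false; }

struct demo_splice_ctx {
    struct demo_lock target_inode; /* outer lock */
    struct demo_lock pipe_inode;   /* inner lock, dropped while waiting */
    int waits;
};

/* Stand-in for waiting on the pipe: only the inner lock is dropped. */
static void demo_pipe_wait(struct demo_splice_ctx *c)
{
    assert(c->target_inode.held);  /* outer lock stays held across the wait */
    demo_release(&c->pipe_inode);
    c->waits++;                    /* the task would sleep here */
    demo_acquire(&c->pipe_inode);  /* retaken while still holding the outer lock */
}

/* Always take the target inode first, then the pipe inode. */
static void demo_splice_write(struct demo_splice_ctx *c, int chunks)
{
    demo_acquire(&c->target_inode);
    demo_acquire(&c->pipe_inode);
    for (int i = 0; i < chunks; i++)
        demo_pipe_wait(c);
    demo_release(&c->pipe_inode);
    demo_release(&c->target_inode);
}
```

Because every caller takes the target inode before the pipe inode, no task can hold the pipe lock while waiting for a target lock, which removes the ABBA cycle from the scenario above.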
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  25. 09 Jan, 2009 1 commit
  26. 06 Jan, 2009 2 commits
    • ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. · 13723d00
      Committed by Joel Becker
      The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
      commit triggers and allow us to compute metadata ecc right before the
      buffers are written out.  This commit provides ecc for inodes, extent
      blocks, group descriptors, and quota blocks.  It is not safe to use
      extended attributes and metaecc at the same time yet.
      
      The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
      the type of block at their root.  Before, it didn't matter, but now the
      root block must use the appropriate ocfs2_journal_access_*() function.
      To keep this abstract, the structures now have a pointer to the matching
      journal_access function and a wrapper call to call it.
      
      A few places use naked ocfs2_write_block() calls instead of adding the
      blocks to the journal.  We make sure to calculate their checksum and ecc
      before the write.
      
      Since we pass around the journal_access functions, let's typedef them
      in ocfs2.h.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Add quota calls for allocation and freeing of inodes and space · a90714c1
      Committed by Jan Kara
      Add quota calls for allocation and freeing of inodes and space, also update
      estimates on number of needed credits for a transaction. Move out inode
      allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called
      outside of a transaction.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>