1. 01 Jun, 2015 1 commit
    • B
      xfs: fix quota block reservation leak when tp allocates and frees blocks · 7f884dc1
      Brian Foster committed
      Al Viro reports that generic/231 fails frequently on XFS and bisected
      the problem to the following commit:
      
      	5d11fb4b xfs: rework zero range to prevent invalid i_size updates
      
      ... which is just the first commit that happens to cause fsx to
      reproduce the problem. fsx reproduces via zero range calls. The
      aforementioned commit overhauls zero range to use hole punch and
      fallocate. As it turns out, the problem is reproducible on demand using
      basic hole punch as follows:
      
      $ mkfs.xfs -f -m crc=1,finobt=1 <dev>
      $ mount <dev> /mnt -o uquota
      $ xfs_io -f -c "falloc 0 50m" /mnt/file
      $ for i in $(seq 1 20); do xfs_io -c "fpunch ${i}m 32k" /mnt/file; done
      $ rm -f /mnt/file
      $ repquota -us /mnt
      ...
      User            used    soft    hard  grace    used  soft  hard  grace
      ----------------------------------------------------------------------
      root      --     32K      0K      0K              3     0     0
      
      A file is allocated with a single 50m extent. The extent count increases
      via hole punches until the bmap converts to btree format. The file is
      removed but quota reports 32k of space usage for the user. This
      reservation is effectively leaked for the lifetime of the mount.
      
      The reason this occurs is because the quota block reservation tracking
      is confused when a transaction happens to free and allocate blocks at
      the same time. Consider the following sequence of events:
      
      - tp is allocated from xfs_free_file_space() and reserves several blocks
        for btree management. Blocks are reserved against the dquot and marked
        as such in the transaction (qtrx->qt_blk_res).
      - 8 blocks are accounted free when the 32k range is punched out.
        xfs_trans_mod_dquot() is called with XFS_TRANS_DQ_BCOUNT and sets
        ->qt_bcount_delta to -8.
      - Subsequently, a block is allocated against the same transaction by
        xfs_bmap_extents_to_btree() for btree conversion. A call to
        xfs_trans_mod_dquot() increases qt_blk_res_used to 1 and qt_bcount_delta
        to -7.
      - The transaction is dup'd and committed by xfs_bmap_finish().
        xfs_trans_dup_dqinfo() sets the first transaction up such that it has a
        matching qt_blk_res and qt_blk_res_used of 1. The remaining unused
        reservation is transferred to the duplicate tp.
      
      When the transactions are committed, the dquots are fixed up in
      xfs_trans_apply_dquot_deltas() according to one of two methods:
      
      1.) If the transaction holds a block reservation (->qt_blk_res != 0),
      _only_ the unused portion of the reservation is unaccounted from the dquot.
      Note that the tp duplication behavior of xfs_bmap_finish() makes it such
      that qt_blk_res is typically 0 for tp's with unused reservation.
      2.) Otherwise, the dquot is fixed up based on the block delta
      (->qt_bcount_delta) created by the transaction.
      
      Therefore, if a transaction has a negative qt_bcount_delta and positive
      qt_blk_res_used, the former set of blocks that have been removed from
      the file are never factored out of the in-core dquot reservation.
      Instead, *_apply_dquot_deltas() sees 1 block used out of a 1 block
      reservation and believes there is nothing to fix up. The on-disk
      d_bcount is updated independently from qt_bcount_delta, and thus is
      correct (and allows the quota usage to correct on remount).
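The sequence above can be replayed in a toy model. This is a minimal sketch, not the kernel code: the class names, the starting reservation of 100 blocks and the 4-block transaction reservation are assumptions made for illustration, and only the two fix-up rules described in the message are modelled.

```python
from dataclasses import dataclass


@dataclass
class Dquot:
    res_bcount: int = 0  # toy stand-in for the in-core reserved block count


@dataclass
class QTrx:
    blk_res: int = 0       # models qt_blk_res
    blk_res_used: int = 0  # models qt_blk_res_used
    bcount_delta: int = 0  # models qt_bcount_delta


def apply_deltas_old(dq: Dquot, q: QTrx) -> None:
    """Fix up the dquot using the two rules described in the message."""
    if q.blk_res != 0:
        # Rule 1: only the unused part of the reservation is given back.
        dq.res_bcount -= q.blk_res - q.blk_res_used
    else:
        # Rule 2: fall back to the net block delta.
        dq.res_bcount += q.bcount_delta


def punch_with_btree_conversion(dq: Dquot, reserve: int = 4) -> None:
    # tp reserves blocks against the dquot (reserve=4 is an assumed value).
    dq.res_bcount += reserve
    tp = QTrx(blk_res=reserve)
    # The 32k punch frees 8 blocks.
    tp.bcount_delta -= 8
    # The extents-to-btree conversion allocates 1 block in the same tp.
    tp.blk_res_used += 1
    tp.bcount_delta += 1
    # Duplication: the first tp keeps exactly what it used, the rest moves on.
    dup = QTrx(blk_res=tp.blk_res - tp.blk_res_used)
    tp.blk_res = tp.blk_res_used
    apply_deltas_old(dq, tp)
    apply_deltas_old(dq, dup)


dq = Dquot(res_bcount=100)          # assumed starting reservation
punch_with_btree_conversion(dq)
expected = 100 - 8 + 1              # 8 blocks freed, 1 allocated
leaked_blocks = dq.res_bcount - expected
print(leaked_blocks)
```

With 4k blocks, the 8 blocks left over in this model correspond to the 32K that repquota reports after the file is removed.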
      
      To deal with this situation, we effectively want the "used reservation"
      part of the transaction to be consistent with any freed blocks with
      respect to quota tracking. For example, if 8 blocks are freed, the
      subsequent single block allocation does not need to consume the initial
      reservation made by the tp. Instead, it simply borrows one from the
      previously freed. One possible implementation of such borrowing is to
      avoid the blks_res_used increment when bcount_delta is negative. This
      alone is flawed logic in that it only handles the case where blocks are
      freed before allocated, however.
      
      Rather than add more complexity to manage synchronization between
      bcount_delta and blks_res_used, kill the latter entirely. blks_res_used
      is only updated in one place and always in sync with bcount_delta.
      Therefore, the net block reservation consumption of the transaction is
      always available from bcount_delta. Calculate the reservation
      consumption on the fly where necessary based on whether the tp has a
      reservation and results in a positive net block delta on the inode.
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
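One possible reading of the approach described in this message, sketched as a toy model rather than the actual patch: the used-reservation counter is dropped and consumption is derived from the net block delta whenever it is needed. The class names, the helper names and the numbers (100 blocks reserved at the start, 4 blocks reserved by the transaction) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Dquot:
    res_bcount: int = 0  # toy stand-in for the in-core reserved block count


@dataclass
class QTrx:
    blk_res: int = 0       # models qt_blk_res
    bcount_delta: int = 0  # models qt_bcount_delta


def res_used(q: QTrx) -> int:
    """Reservation consumed: only if tp has one and the net delta is positive."""
    return max(q.bcount_delta, 0) if q.blk_res else 0


def dup(q: QTrx) -> QTrx:
    """Keep the consumed part on the old tp, move the rest to the new one."""
    used = res_used(q)
    new = QTrx(blk_res=q.blk_res - used)
    q.blk_res = used
    return new


def apply_deltas(dq: Dquot, q: QTrx) -> None:
    if q.blk_res != 0:
        dq.res_bcount -= q.blk_res - res_used(q)
    else:
        dq.res_bcount += q.bcount_delta


def run(deltas, start=100, reserve=4) -> int:
    """Replay block deltas in one tp, duplicate it, commit both tps."""
    dq = Dquot(res_bcount=start + reserve)
    tp = QTrx(blk_res=reserve)
    for d in deltas:
        tp.bcount_delta += d
    nxt = dup(tp)
    apply_deltas(dq, tp)
    apply_deltas(dq, nxt)
    return dq.res_bcount


free_then_alloc = run([-8, +1])   # the hole punch case from the report
alloc_then_free = run([+1, -8])   # the ordering the borrowing idea missed
alloc_only = run([+2])            # plain allocation still consumes the reservation
print(free_then_alloc, alloc_then_free, alloc_only)
```

Because only the net delta matters in this model, the result does not depend on whether blocks are freed before or after they are allocated.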
      
  2. 24 Oct, 2013 2 commits
  3. 04 Sep, 2013 1 commit
  4. 16 Aug, 2013 1 commit
  5. 13 Aug, 2013 2 commits
  6. 12 Jul, 2013 1 commit
  7. 11 Jul, 2013 1 commit
    • C
      xfs: Add pquota fields where gquota is used. · 92f8ff73
      Chandra Seetharaman committed
      Add project quota changes to all the places where group quota field
      is used:
         * add separate project quota members into various structures
         * split project quota and group quotas so that instead of overriding
           the group quota members incore, the new project quota members are
           used instead
         * get rid of usage of the OQUOTA flag incore, in favor of separate
           group and project quota flags.
         * add a project dquot argument to various functions.
      
      Not using the pquotino field from superblock yet.
      Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  8. 10 Jul, 2013 1 commit
    • D
      xfs: dquot log reservations are too small · b0a9dab7
      Dave Chinner committed
      During review of the separate project quota inode patches, it became
      obvious that the dquot log reservation calculation underestimated
      the number of dquots that can be modified in a transaction. This has
      its roots way back in the Irix quota implementation.
      
      That is, when quotas were first implemented in XFS, it only
      supported user and project quotas as Irix did not have group quotas.
      Hence the worst case operation involving dquot modification was
      calculated to involve 2 user dquots and 1 project dquot or 1 user
      dquot and 2 project dquots. i.e. 3 dquots. This was determined back
      in 1996, and has remained unchanged ever since.
      
      However, back in 2001, the Linux XFS port dropped all support for
      project quota and implemented group quotas over the top. This was
      effectively done with a search-and-replace of project with group,
      and as such the log reservation was not changed. However, with the
      advent of group quotas, chmod and rename now could modify more than
      3 dquots in a single transaction - both could modify 4 dquots. Hence
      this log reservation has been wrong for a long time.
      
      In 2005, project quota support was reintroduced into Linux, but it
      was implemented to be mutually exclusive to group quotas and so this
      didn't add any new changes to the dquot log reservation. Hence when
      project quotas were in use (rather than group quotas) the log
      reservation was again valid, just like in the Irix days.
      
      Now, with the addition of the separate project quota inode, group
      and project quotas are no longer mutually exclusive, and hence
      operations can now modify three dquots per inode where previously it
      was only two. The worst case here is the rename transaction, which
      can allocate/free space on two different directory inodes, and if
      they have different uid/gid/prid configurations and are world
      writeable, then rename can actually modify 6 different dquots now.
      
      Further, the dquot log reservation doesn't take into account the
      space used by the dquot log format structure that precedes the dquot
      that is logged, and hence further underestimates the worst case
      log space required by dquots during a transaction. This has been
      missing since the first commit in 1996.
      
      Hence the worst case log reservation needs to be increased from 3 to
      6, and it needs to take into account a log format header for each of
      those dquots.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
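The counting argument in this message can be written out as a short calculation. This is only a sketch: the byte sizes below are hypothetical placeholders, not the real structure sizes, and the function names are invented for illustration.

```python
# Hypothetical sizes in bytes, chosen only to make the arithmetic visible.
DQUOT_BYTES = 128
LOGFORMAT_BYTES = 32


def worst_case_dquots(inodes: int, quota_types: int) -> int:
    """Each inode can touch one dquot per enabled quota type."""
    return inodes * quota_types


def dquot_log_res(n_dquots: int, with_header: bool) -> int:
    """Log space for n dquots, optionally counting the log format header."""
    per_dquot = DQUOT_BYTES + (LOGFORMAT_BYTES if with_header else 0)
    return n_dquots * per_dquot


# Rename touches two directory inodes.
rename_user_group = worst_case_dquots(inodes=2, quota_types=2)
rename_user_group_project = worst_case_dquots(inodes=2, quota_types=3)

old_res = dquot_log_res(3, with_header=False)   # the 1996 estimate
new_res = dquot_log_res(6, with_header=True)    # after this change
print(rename_user_group, rename_user_group_project, old_res, new_res)
```

Whatever the real sizes are, the new reservation in this model is more than twice the old one, because both the dquot count and the per-dquot size grow.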
  9. 29 Jun, 2013 1 commit
  10. 06 Jun, 2013 1 commit
    • D
      xfs: rework dquot CRCs · bb9b8e86
      Dave Chinner committed
      Calculating dquot CRCs when the backing buffer is written back just
      doesn't work reliably. There are several places which manipulate
      dquots directly in the buffers, and they don't calculate CRCs
      appropriately, nor do they always set the buffer up to calculate
      CRCs appropriately.
      
      Firstly, if we log a dquot buffer (e.g. during allocation) it gets
      logged without valid CRC, and so on recovery we end up with a dquot
      that is not valid.
      
      Secondly, if we recover/repair a dquot, we don't have a verifier
      attached to the buffer and hence CRCs are not calculated on the way
      down to disk.
      
      Thirdly, calculating the CRC after we've changed the contents means
      that if we re-read the dquot from the buffer, we cannot verify the
      contents of the dquot are valid, as the CRC is invalid.
      
      So, to avoid all the dquot CRC errors that are being detected by the
      read verifier, change to using the same model as for inodes. That
      is, dquot CRCs are calculated and written to the backing buffer at
      the time the dquot is flushed to the backing buffer. If we modify
      the dquot directly in the backing buffer, calculate the CRC
      immediately after the modification is complete. Hence the dquot in
      the on-disk buffer should always have a valid CRC.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
      
      (cherry picked from commit 6fcdc59d)
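The model described in this message, where the CRC is recalculated as soon as a dquot is changed in its backing buffer, can be illustrated with a toy record. This is a sketch under stated assumptions: the two-counter record layout is invented, and `zlib.crc32` stands in for the checksum only to keep the example self-contained; it is not the on-disk format or the checksum XFS uses.

```python
import struct
import zlib

# Hypothetical record: two 64-bit counters followed by a 32-bit CRC.
PAYLOAD = struct.Struct("<QQ")
CRC = struct.Struct("<I")


def checksum(buf: bytearray) -> int:
    return zlib.crc32(bytes(buf[:PAYLOAD.size]))


def pack_dquot(bcount: int, icount: int) -> bytearray:
    """Flush a dquot into a fresh buffer, CRC included."""
    buf = bytearray(PAYLOAD.size + CRC.size)
    PAYLOAD.pack_into(buf, 0, bcount, icount)
    CRC.pack_into(buf, PAYLOAD.size, checksum(buf))
    return buf


def verify(buf: bytearray) -> bool:
    """Read verifier: the stored CRC must match the contents."""
    (stored,) = CRC.unpack_from(buf, PAYLOAD.size)
    return stored == checksum(buf)


def modify_in_buffer(buf: bytearray, bcount: int, icount: int) -> None:
    """Change the dquot in place and recalculate the CRC immediately."""
    PAYLOAD.pack_into(buf, 0, bcount, icount)
    CRC.pack_into(buf, PAYLOAD.size, checksum(buf))


buf = pack_dquot(8, 3)
ok_after_flush = verify(buf)

stale = bytearray(buf)
PAYLOAD.pack_into(stale, 0, 9, 3)      # in-place change with no CRC update
ok_when_crc_deferred = verify(stale)   # a re-read before writeback fails

modify_in_buffer(buf, 9, 3)
ok_after_modify = verify(buf)
print(ok_after_flush, ok_when_crc_deferred, ok_after_modify)
```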
  11. 05 Jun, 2013 1 commit
    • D
      xfs: rework dquot CRCs · 6fcdc59d
      Dave Chinner committed
      Calculating dquot CRCs when the backing buffer is written back just
      doesn't work reliably. There are several places which manipulate
      dquots directly in the buffers, and they don't calculate CRCs
      appropriately, nor do they always set the buffer up to calculate
      CRCs appropriately.
      
      Firstly, if we log a dquot buffer (e.g. during allocation) it gets
      logged without valid CRC, and so on recovery we end up with a dquot
      that is not valid.
      
      Secondly, if we recover/repair a dquot, we don't have a verifier
      attached to the buffer and hence CRCs are not calculated on the way
      down to disk.
      
      Thirdly, calculating the CRC after we've changed the contents means
      that if we re-read the dquot from the buffer, we cannot verify the
      contents of the dquot are valid, as the CRC is invalid.
      
      So, to avoid all the dquot CRC errors that are being detected by the
      read verifier, change to using the same model as for inodes. That
      is, dquot CRCs are calculated and written to the backing buffer at
      the time the dquot is flushed to the backing buffer. If we modify
      the dquot directly in the backing buffer, calculate the CRC
      immediately after the modification is complete. Hence the dquot in
      the on-disk buffer should always have a valid CRC.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  12. 22 Apr, 2013 1 commit
  13. 04 Feb, 2012 1 commit
  14. 16 Dec, 2011 1 commit
  15. 15 Dec, 2011 1 commit
  16. 14 Dec, 2011 2 commits
  17. 13 Dec, 2011 1 commit
  18. 07 Mar, 2011 1 commit
  19. 11 Nov, 2010 1 commit
  20. 19 May, 2010 1 commit
  21. 04 Feb, 2010 1 commit
  22. 22 Jan, 2010 1 commit
  23. 15 Dec, 2009 1 commit
    • C
      xfs: event tracing support · 0b1b213f
      Christoph Hellwig committed
      Convert the old xfs tracing support that could only be used with the
      out of tree kdb and xfsidbg patches to use the generic event tracer.
      
      To use it make sure CONFIG_EVENT_TRACING is enabled and then enable
      all xfs trace channels by:
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/enable
      
      or alternatively enable single events by just doing the same in one
      event subdirectory, e.g.
      
         echo 1 > /sys/kernel/debug/tracing/events/xfs/xfs_ihold/enable
      
      or set more complex filters, etc. In Documentation/trace/events.txt
      all this is described in more detail.  To read the events do a
      
         cat /sys/kernel/debug/tracing/trace
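The same switches can be driven from a script. The helper below is a hypothetical convenience wrapper, not part of any XFS or tracing tooling; it only builds the enable-file paths shown above and writes 1 or 0 to them. The tracing root is passed in as a parameter so the helper can be tried against a scratch directory.

```python
from pathlib import Path
from typing import Optional

# Default location taken from the commands in this message.
TRACING_ROOT = "/sys/kernel/debug/tracing"


def xfs_enable_path(tracing_root: str, event: Optional[str] = None) -> Path:
    """Return the enable file for all xfs events, or for one named event."""
    base = Path(tracing_root) / "events" / "xfs"
    return base / event / "enable" if event else base / "enable"


def set_xfs_tracing(tracing_root: str, event: Optional[str] = None,
                    on: bool = True) -> Path:
    """Write 1 (enable) or 0 (disable) to the selected enable file."""
    path = xfs_enable_path(tracing_root, event)
    path.write_text("1" if on else "0")
    return path


def read_trace(tracing_root: str) -> str:
    """Equivalent of reading the trace file with cat."""
    return (Path(tracing_root) / "trace").read_text()
```

On a real system these writes need root and a mounted debugfs, for example `set_xfs_tracing(TRACING_ROOT, "xfs_ihold")`.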
      
      Compared to the last posting this patch converts the tracing mostly to
      the one tracepoint per callsite model that other users of the new
      tracing facility also employ.  This allows a very fine-grained control
      of the tracing, a cleaner output of the traces and also enables the
      perf tool to use each tracepoint as a virtual performance counter,
      allowing us to e.g. count how often certain workloads hit various
      spots in XFS.  Take a look at
      
          http://lwn.net/Articles/346470/
      
      for some examples.
      
      Also the btree tracing isn't included at all yet, as it will require
      additional core tracing features not in mainline yet; I plan to
      deliver it later.
      
      And the really nice thing about this patch is that it actually removes
      many lines of code while adding this nice functionality:
      
       fs/xfs/Makefile                |    8
       fs/xfs/linux-2.6/xfs_acl.c     |    1
       fs/xfs/linux-2.6/xfs_aops.c    |   52 -
       fs/xfs/linux-2.6/xfs_aops.h    |    2
       fs/xfs/linux-2.6/xfs_buf.c     |  117 +--
       fs/xfs/linux-2.6/xfs_buf.h     |   33
       fs/xfs/linux-2.6/xfs_fs_subr.c |    3
       fs/xfs/linux-2.6/xfs_ioctl.c   |    1
       fs/xfs/linux-2.6/xfs_ioctl32.c |    1
       fs/xfs/linux-2.6/xfs_iops.c    |    1
       fs/xfs/linux-2.6/xfs_linux.h   |    1
       fs/xfs/linux-2.6/xfs_lrw.c     |   87 --
       fs/xfs/linux-2.6/xfs_lrw.h     |   45 -
       fs/xfs/linux-2.6/xfs_super.c   |  104 ---
       fs/xfs/linux-2.6/xfs_super.h   |    7
       fs/xfs/linux-2.6/xfs_sync.c    |    1
       fs/xfs/linux-2.6/xfs_trace.c   |   75 ++
       fs/xfs/linux-2.6/xfs_trace.h   | 1369 +++++++++++++++++++++++++++++++++++++++++
       fs/xfs/linux-2.6/xfs_vnode.h   |    4
       fs/xfs/quota/xfs_dquot.c       |  110 ---
       fs/xfs/quota/xfs_dquot.h       |   21
       fs/xfs/quota/xfs_qm.c          |   40 -
       fs/xfs/quota/xfs_qm_syscalls.c |    4
       fs/xfs/support/ktrace.c        |  323 ---------
       fs/xfs/support/ktrace.h        |   85 --
       fs/xfs/xfs.h                   |   16
       fs/xfs/xfs_ag.h                |   14
       fs/xfs/xfs_alloc.c             |  230 +-----
       fs/xfs/xfs_alloc.h             |   27
       fs/xfs/xfs_alloc_btree.c       |    1
       fs/xfs/xfs_attr.c              |  107 ---
       fs/xfs/xfs_attr.h              |   10
       fs/xfs/xfs_attr_leaf.c         |   14
       fs/xfs/xfs_attr_sf.h           |   40 -
       fs/xfs/xfs_bmap.c              |  507 +++------------
       fs/xfs/xfs_bmap.h              |   49 -
       fs/xfs/xfs_bmap_btree.c        |    6
       fs/xfs/xfs_btree.c             |    5
       fs/xfs/xfs_btree_trace.h       |   17
       fs/xfs/xfs_buf_item.c          |   87 --
       fs/xfs/xfs_buf_item.h          |   20
       fs/xfs/xfs_da_btree.c          |    3
       fs/xfs/xfs_da_btree.h          |    7
       fs/xfs/xfs_dfrag.c             |    2
       fs/xfs/xfs_dir2.c              |    8
       fs/xfs/xfs_dir2_block.c        |   20
       fs/xfs/xfs_dir2_leaf.c         |   21
       fs/xfs/xfs_dir2_node.c         |   27
       fs/xfs/xfs_dir2_sf.c           |   26
       fs/xfs/xfs_dir2_trace.c        |  216 ------
       fs/xfs/xfs_dir2_trace.h        |   72 --
       fs/xfs/xfs_filestream.c        |    8
       fs/xfs/xfs_fsops.c             |    2
       fs/xfs/xfs_iget.c              |  111 ---
       fs/xfs/xfs_inode.c             |   67 --
       fs/xfs/xfs_inode.h             |   76 --
       fs/xfs/xfs_inode_item.c        |    5
       fs/xfs/xfs_iomap.c             |   85 --
       fs/xfs/xfs_iomap.h             |    8
       fs/xfs/xfs_log.c               |  181 +----
       fs/xfs/xfs_log_priv.h          |   20
       fs/xfs/xfs_log_recover.c       |    1
       fs/xfs/xfs_mount.c             |    2
       fs/xfs/xfs_quota.h             |    8
       fs/xfs/xfs_rename.c            |    1
       fs/xfs/xfs_rtalloc.c           |    1
       fs/xfs/xfs_rw.c                |    3
       fs/xfs/xfs_trans.h             |   47 +
       fs/xfs/xfs_trans_buf.c         |   62 -
       fs/xfs/xfs_vnodeops.c          |    8
       70 files changed, 2151 insertions(+), 2592 deletions(-)
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Alex Elder <aelder@sgi.com>
  24. 13 Jun, 2009 1 commit
  25. 08 Jun, 2009 1 commit
    • C
      xfs: kill xfs_qmops · 7d095257
      Christoph Hellwig committed
      Kill the quota ops function vector and replace it with direct calls or
      stubs in the CONFIG_XFS_QUOTA=n case.
      
      Make sure we check XFS_IS_QUOTA_RUNNING in the right spots.  We can reduce
      the number of those checks because the XFS_TRANS_DQ_DIRTY flag can't be set
      otherwise.
      
      This brings us back closer to the way this code worked in IRIX and earlier
      Linux versions, but we keep a lot of the more useful factoring of common
      code.
      
      Eventually we should also kill xfs_qm_bhv.c, but that's left for a later
      patch.
      
      Reduces the size of the source code by about 250 lines and the size of
      XFS module by about 1.5 kilobytes with quotas enabled:
      
         text	   data	    bss	    dec	    hex	filename
       615957	   2960	   3848	 622765	  980ad	fs/xfs/xfs.o
       617231	   3152	   3848	 624231	  98667	fs/xfs/xfs.o.old
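As a quick check of the "about 1.5 kilobytes" figure, the two rows above can be subtracted column by column. The numbers are copied from the table; the variable names are illustrative.

```python
# Section sizes in bytes, copied from the size(1) output above.
new = {"text": 615957, "data": 2960, "bss": 3848}   # fs/xfs/xfs.o
old = {"text": 617231, "data": 3152, "bss": 3848}   # fs/xfs/xfs.o.old

# The dec column is the sum of text, data and bss.
dec_new = sum(new.values())
dec_old = sum(old.values())

saved = {section: old[section] - new[section] for section in new}
saved_total = dec_old - dec_new

print(saved, saved_total, round(saved_total / 1000, 2))
```

Most of the saving comes from the text section, with a smaller part from data and none from bss.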
      
      Fallout:
      
       - xfs_qm_dqattach is split into xfs_qm_dqattach_locked which expects
         the inode locked and xfs_qm_dqattach which does the locking around it,
         thus removing XFS_QMOPT_ILOCKED.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Eric Sandeen <sandeen@sandeen.net>
  26. 09 Feb, 2009 2 commits
  27. 04 Dec, 2008 1 commit
  28. 16 Oct, 2007 1 commit
  29. 08 May, 2007 1 commit
  30. 28 Sep, 2006 1 commit
  31. 09 Jun, 2006 1 commit
  32. 31 Mar, 2006 1 commit
  33. 29 Mar, 2006 1 commit
  34. 02 Nov, 2005 3 commits