1. 07 Jan 2016, 1 commit
  2. 04 Mar 2015, 1 commit
  3. 17 Feb 2015, 1 commit
    • Btrfs: ctree: reduce args where only fs_info used · b7a0365e
      Daniel Dressler committed
      This patch is part of a larger project to clean up btrfs's internal usage
      of struct btrfs_root. Many functions take btrfs_root only to grab a
      pointer to fs_info.
      
      This causes programmers to ponder which root can be passed. Since only
      the fs_info is read, affected functions can accept any root, but this
      is only obvious upon inspection.
      
      This patch reduces the specificity of such functions by having them
      accept the fs_info directly.
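The sketch below makes the point concrete. It uses heavily simplified, hypothetical stand-ins for struct btrfs_root and struct btrfs_fs_info. The two helpers are illustrative only and are not the functions this patch changes. The second signature states the helper's real dependency, while the first hides it behind a root.

```c
#include <assert.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct btrfs_fs_info {
	unsigned int sectorsize;
};

struct btrfs_root {
	struct btrfs_fs_info *fs_info;
	unsigned long long objectid;
};

/* Before: callers must wonder which root to pass, although only
 * root->fs_info is ever read. */
static unsigned int sectorsize_from_root(struct btrfs_root *root)
{
	return root->fs_info->sectorsize;
}

/* After: the signature names the only thing the function depends on. */
static unsigned int sectorsize_from_fs_info(struct btrfs_fs_info *fs_info)
{
	return fs_info->sectorsize;
}
```

Any root sharing the same fs_info gives the same answer, which is why the root parameter carried no extra information.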
      
      This patch does not address the two functions in ctree.c (insert_ptr
      and split_item) which only use root for BUG_ONs.
      
      This patch affects the following functions:
        1) fixup_low_keys
        2) btrfs_set_item_key_safe
      Signed-off-by: Daniel Dressler <danieru.dressler@gmail.com>
      Signed-off-by: David Sterba <dsterba@suse.cz>
  4. 04 Nov 2014, 1 commit
  5. 18 Sep 2014, 3 commits
  6. 15 Aug 2014, 1 commit
    • Btrfs: fix csum tree corruption, duplicate and outdated checksums · 27b9a812
      Filipe Manana committed
      Under rare circumstances we can end up leaving 2 versions of a checksum
      for the same file extent range.
      
      The reason for this is that after calling btrfs_next_leaf we process
      slot 0 of the leaf it returns, instead of processing the slot set in
      path->slots[0]. Most of the time (by far) path->slots[0] is 0, but after
      btrfs_next_leaf() releases the path and before it searches for the next
      leaf, another task might cause a split of the next leaf, which migrates
      some of its keys to the leaf we were processing before calling
      btrfs_next_leaf(). In this case btrfs_next_leaf() returns the same leaf
      again, but with path->slots[0] having a slot number corresponding
      to the first new key it got, that is, a slot number that didn't exist
      before calling btrfs_next_leaf(), as the leaf now has more keys than
      it had before. So we must really process the returned leaf starting at
      path->slots[0] always, as it isn't always 0, and the key at slot 0 can
      have an offset much lower than our search offset/bytenr.
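This is a minimal model of the bug. It assumes a hypothetical, simplified leaf that stores only csum key offsets, plus a path holding a single slot that plays the role of path->slots[0]. The leaf contents are the three keys of leaf N after the push_leaf_left() scenario described below. Reading slot 0 picks up an unrelated, much lower key; honouring the returned slot picks up the key that was actually moved in.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of a leaf holding csum item key offsets. */
struct leaf {
	uint64_t offsets[8];
	int nritems;
};

struct path {
	struct leaf *node;
	int slot;	/* plays the role of path->slots[0] */
};

/* Buggy: always reads slot 0 of the leaf returned by btrfs_next_leaf(). */
static uint64_t next_offset_buggy(const struct path *p)
{
	return p->node->offsets[0];
}

/* Fixed: starts from the slot that btrfs_next_leaf() left in the path. */
static uint64_t next_offset_fixed(const struct path *p)
{
	return p->node->offsets[p->slot];
}
```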
      
      For example, consider the following scenario, where we have:
      
      sums->bytenr: 40157184, sums->len: 16384, sums end: 40173568
      four 4kb file data blocks with offsets 40157184, 40161280, 40165376, 40169472
      
        Leaf N:
      
          slot = 0                           slot = btrfs_header_nritems() - 1
        |-------------------------------------------------------------------|
        | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4] |
        |-------------------------------------------------------------------|
      
        Leaf N + 1:
      
            slot = 0                          slot = btrfs_header_nritems() - 1
        |--------------------------------------------------------------------|
        | [(CSUM CSUM 40161280), size 32] ... [(CSUM CSUM 40615936), size 8] |
        |--------------------------------------------------------------------|
      
      Because we are at the last slot of leaf N, we call btrfs_next_leaf() to
      find the next highest key, which releases the current path and then searches
      for that next key. However after releasing the path and before finding that
      next key, the item at slot 0 of leaf N + 1 gets moved to leaf N, due to a call
      to ctree.c:push_leaf_left() (via ctree.c:split_leaf()), and therefore
      btrfs_next_leaf() will return a path again with leaf N, but with the slot
      pointing to its new last key (CSUM CSUM 40161280). This new version of leaf N
      is then:
      
          slot = 0                        slot = btrfs_header_nritems() - 2  slot = btrfs_header_nritems() - 1
        |----------------------------------------------------------------------------------------------------|
        | [(CSUM CSUM 39239680), size 8] ... [(CSUM CSUM 40116224), size 4]  [(CSUM CSUM 40161280), size 32] |
        |----------------------------------------------------------------------------------------------------|
      
      Incorrectly using slot 0 makes us set next_offset to 39239680, and we jump
      into the "insert:" label, which will set tmp to:
      
          tmp = min((sums->len - total_bytes) >> blocksize_bits,
              (next_offset - file_key.offset) >> blocksize_bits) =
          min((16384 - 0) >> 12, (39239680 - 40157184) >> 12) =
          min(4, (u64)-917504 = 18446744073708634112 >> 12) = 4
      
      and
      
         ins_size = csum_size * tmp = 4 * 4 = 16 bytes.
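The arithmetic above can be reproduced in a few lines of C. This is a sketch of the min() expression only, with the example's values; the helper names are hypothetical. Because next_offset is below file_key.offset, the unsigned subtraction wraps to a huge value and never limits tmp. With the correct next offset (40161280) the same expression yields 1, meaning a single 4KB block.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Hypothetical helper mirroring the "insert:" computation of tmp. */
static uint64_t csums_to_insert(uint64_t sums_len, uint64_t total_bytes,
				uint64_t next_offset, uint64_t file_offset,
				unsigned int blocksize_bits)
{
	/* If next_offset < file_offset, this wraps around to a huge u64,
	 * so the second operand no longer bounds the result. */
	return min_u64((sums_len - total_bytes) >> blocksize_bits,
		       (next_offset - file_offset) >> blocksize_bits);
}
```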
      
      In other words, we insert a new csum item in the tree with key
      (CSUM_OBJECTID CSUM_KEY 40157184 = sums->bytenr) that contains the checksums
      for all the data (4 blocks of 4096 bytes each = sums->len), which is wrong,
      because the item with key (CSUM CSUM 40161280) (the one that was moved from
      leaf N + 1 to the end of leaf N) contains the old checksums of the last 12288
      bytes of our data and won't get those old checksums removed.
      
      So this leaves us with 2 different checksums for 3 4kb blocks of data in the tree,
      and breaks the logical rule:
      
         Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover
      
      An obvious bad effect of this is that a subsequent csum tree lookup to get
      the checksum of any of the blocks with logical offset of 40161280, 40165376
      or 40169472 (the last 3 4kb blocks of file data), will get the old checksums.
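The ordering rule can be written as a small checker over consecutive csum items. This is a sketch assuming a hypothetical model in which each item records its key offset and the number of data blocks its checksums cover. In the corrupted tree, the new 4-csum item at 40157184 covers up to 40173568, past the start of the 8-csum item at 40161280 (size 32 bytes / 4 bytes per csum), so the rule is broken.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one csum item: key offset and blocks covered. */
struct csum_item {
	uint64_t offset;
	uint64_t nr_blocks;
};

/* Checks Key_N+1.offset >= Key_N.offset + length_of_data_its_checksums_cover
 * for every pair of adjacent items. */
static bool csum_items_ordered(const struct csum_item *items, size_t n,
			       uint64_t blocksize)
{
	for (size_t i = 1; i < n; i++) {
		uint64_t end = items[i - 1].offset +
			       items[i - 1].nr_blocks * blocksize;

		if (items[i].offset < end)
			return false;
	}
	return true;
}
```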
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
  7. 10 Jun 2014, 3 commits
    • Btrfs: make fsync work after cloning into a file · 7ffbb598
      Filipe Manana committed
      When cloning into a file, we were correctly replacing the extent
      items in the target range and removing the extent maps. However
      we weren't replacing the extent maps with new ones that point to
      the new extents - as a consequence, an incremental fsync (when the
      inode doesn't have the full sync flag) was a NOOP, since it relies
      on the existence of extent maps in the modified list of the inode's
      extent map tree, which was empty. Therefore add new extent maps to
      reflect the target clone range.
      
      A test case for xfstests follows.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: don't access non-existent key when csum tree is empty · 35045bf2
      Filipe Manana committed
      When the csum tree is empty, our leaf (path->nodes[0]) has a number
      of items equal to 0 and since btrfs_header_nritems() returns an
      unsigned integer (and so is our local nritems variable) the following
      comparison always evaluates to false:
      
           if (path->slots[0] >= nritems - 1) {
      
      As the casting rules lead to:
      
           if ((u32)0 >= (u32)4294967295) {
      
      This makes us access the key at slot path->slots[0] + 1 (slot 1) of the
      empty leaf some lines below:
      
          btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot);
          if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID ||
              found_key.type != BTRFS_EXTENT_CSUM_KEY) {
                  found_next = 1;
                  goto insert;
          }
      
      So just don't access such a non-existent slot and don't set found_next to 1
      when the tree is empty. It's very unlikely we'll get a random key with the
      objectid and type values above, which is where we could go into trouble.
      
      If nritems is 0, just set found_next to 1 anyway as it will make us insert
      a csum item covering our whole extent (or the whole leaf) when the tree is
      empty.
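Here is a minimal userspace sketch of the comparison. It uses hypothetical helper names and plain uint32_t in place of the kernel's u32. With nritems == 0, nritems - 1 wraps to 4294967295, so the "last slot" test is false. The fixed variant follows the behaviour described above: an empty leaf is treated as "go to insert" (found_next = 1) before nritems - 1 is ever evaluated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Buggy: for an empty leaf, nritems - 1 wraps to UINT32_MAX. */
static bool at_last_slot_buggy(uint32_t slot, uint32_t nritems)
{
	return slot >= nritems - 1;
}

/* Fixed: short-circuit on an empty leaf before subtracting. */
static bool at_last_slot_fixed(uint32_t slot, uint32_t nritems)
{
	return nritems == 0 || slot >= nritems - 1;
}
```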
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: do not increment on bio_index one by one · d2cbf2a2
      Liu Bo committed
      'bio_index' is just an index, so it's really not necessary to increment
      it one by one.
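As an illustration only, here is a hypothetical pair of helpers; they are not the btrfs code itself. Since bio_index is a plain counter, advancing it once per consumed checksum gives the same result as a single addition.

```c
#include <assert.h>

/* Hypothetical: bump the index once per checksum consumed. */
static unsigned int skip_one_by_one(unsigned int bio_index, unsigned int count)
{
	while (count--)
		bio_index++;
	return bio_index;
}

/* Hypothetical: the index is only a counter, so add the count directly. */
static unsigned int skip_at_once(unsigned int bio_index, unsigned int count)
{
	return bio_index + count;
}
```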
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Chris Mason <clm@fb.com>
  8. 29 Jan 2014, 1 commit
  9. 24 Nov 2013, 1 commit
    • block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet committed
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
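The following userspace model shows what an explicit iterator buys. It assumes the iterator shape the message anticipates: bi_sector, bi_size and bi_idx, plus the bi_bvec_done member that a later patch adds. The bio_vec here keeps only a length, and iter_advance() is a hypothetical helper rather than a kernel API. Advancing can stop in the middle of a bvec without modifying the bvec array itself.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model: a real bio_vec also has a page and an offset. */
struct bio_vec {
	unsigned int bv_len;
};

struct bvec_iter {
	uint64_t bi_sector;		/* device address in 512-byte sectors */
	unsigned int bi_size;		/* residual I/O count in bytes */
	unsigned int bi_idx;		/* current index into the bio_vec array */
	unsigned int bi_bvec_done;	/* bytes completed in the current bvec */
};

/* Hypothetical helper: advance the iterator by `bytes`. */
static void iter_advance(const struct bio_vec *bv, struct bvec_iter *iter,
			 unsigned int bytes)
{
	assert(bytes <= iter->bi_size);
	iter->bi_sector += bytes >> 9;
	while (bytes) {
		unsigned int left = bv[iter->bi_idx].bv_len - iter->bi_bvec_done;
		unsigned int len = bytes < left ? bytes : left;

		bytes -= len;
		iter->bi_size -= len;
		iter->bi_bvec_done += len;
		if (iter->bi_bvec_done == bv[iter->bi_idx].bv_len) {
			iter->bi_bvec_done = 0;
			iter->bi_idx++;
		}
	}
}
```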
      Signed-off-by: Kent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>
  10. 12 Nov 2013, 2 commits
  11. 01 Sep 2013, 2 commits
  12. 02 Jul 2013, 1 commit
    • Btrfs: remove btrfs_sector_sum structure · f51a4a18
      Miao Xie committed
      Using the structure btrfs_sector_sum to keep the checksum value is
      unnecessary, because the extents that btrfs_sector_sum points to are
      contiguous: we can find the expected checksum from btrfs_ordered_sum's
      bytenr and the offset, so we can remove btrfs_sector_sum's bytenr. After
      removing bytenr, there is only one member left in the structure, so it
      makes no sense to keep the structure; just remove it and use a u32 array
      to store the checksum values.
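Because the range is contiguous, the checksum for any block is found by index arithmetic from the ordered sum's start. This is a minimal sketch assuming a hypothetical, fixed-size ordered_sum with a flat u32 array; the real structure differs. It also shows the "several checksums at once" benefit as a single memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified ordered sum: one u32 csum per block of a
 * contiguous range starting at bytenr. */
struct ordered_sum {
	uint64_t bytenr;
	uint32_t len;		/* bytes covered */
	uint32_t sums[16];
};

/* No per-entry bytenr is needed: the position follows from the offset. */
static const uint32_t *csum_for(const struct ordered_sum *s,
				uint64_t disk_bytenr,
				unsigned int blocksize_bits)
{
	assert(disk_bytenr >= s->bytenr && disk_bytenr < s->bytenr + s->len);
	return &s->sums[(disk_bytenr - s->bytenr) >> blocksize_bits];
}

/* Copy nr consecutive checksums in one go instead of one by one. */
static void copy_csums(uint32_t *dst, const struct ordered_sum *s,
		       uint64_t disk_bytenr, unsigned int nr,
		       unsigned int blocksize_bits)
{
	memcpy(dst, csum_for(s, disk_bytenr, blocksize_bits),
	       nr * sizeof(uint32_t));
}
```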
      
      With this change, we no longer use a while loop to get the checksums one
      by one. Now we can get several checksum values at a time, which improved
      performance by ~74% on my SSD (31MB/s -> 54MB/s).
      
      test command:
       # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  13. 07 May 2013, 6 commits
  14. 28 Mar 2013, 2 commits
  15. 21 Feb 2013, 1 commit
    • Btrfs: extend the checksum item as much as possible · 2f697dc6
      Liu Bo committed
      For writes, we also reserve some space for COW blocks when updating
      the checksum tree, and we calculate the number of blocks by checking
      whether the number of outstanding bytes that are going to need csums
      requires one more block for csums.
      
      When we add these checksums into the checksum tree, we use the ordered
      sums list. Every ordered sum contains csums for each sector, and we'll
      first try to look up an existing csum item:
      a) if we don't yet have a proper csum item, then we need to insert one,
      b) or if we find one but the csum item is not big enough, then we need
      to extend it.
      
      The point is that we unlock the whole path and then insert or extend,
      so others can get in and update the tree in between.
      
      Each insert or extend needs to update the tree with COW on, and we may
      need to insert/extend many times.
      
      That means what we've reserved for updating the checksum tree is
      indeed NOT enough.
      
      The case is even more serious with several write threads running at the
      same time: they can quickly eat up our reserved space and start eating
      into the global reserve pool instead.
      
      I haven't yet come up with a way to calculate the worst case for
      updating csums, but extending the checksum item as much as possible
      helped in my testing.
      
      The idea behind it is that this reduces the number of times we
      insert/extend, which saves us precious reserved space.
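The "as much as possible" idea can be expressed as taking the minimum of three limits: how many csums are still pending, how many fit in the leaf's free space, and how many more the item may hold. This is a hedged sketch; the helper and its parameters are hypothetical and do not reproduce the patch's actual code. Each extension then covers the largest possible batch, so fewer insert/extend rounds (and fewer COW'd tree updates) are needed.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t min3_u64(uint64_t a, uint64_t b, uint64_t c)
{
	uint64_t m = a < b ? a : b;

	return m < c ? m : c;
}

/* Hypothetical: number of checksums to add in a single extension. */
static uint64_t csums_to_extend(uint64_t csums_remaining,
				uint64_t leaf_free_bytes,
				uint64_t item_csums, uint64_t max_item_csums,
				uint32_t csum_size)
{
	uint64_t fit_in_leaf = leaf_free_bytes / csum_size;
	uint64_t fit_in_item = max_item_csums > item_csums ?
			       max_item_csums - item_csums : 0;

	return min3_u64(csums_remaining, fit_in_leaf, fit_in_item);
}
```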
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  16. 25 Jan 2013, 1 commit
    • Btrfs: put csums on the right ordered extent · e58dd74b
      Josef Bacik committed
      I noticed a WARN_ON going off when adding csums because we were going over
      the amount of csum bytes that should have been allowed for an ordered
      extent.  This is a leftover from when we used to hold the csums privately
      for direct io, but now we use the normal ordered sum stuff so we need to
      make sure and check if we've moved on to another extent so that the csums
      are added to the right extent.  Without this we could end up with csums for
      bytenrs that don't have extents to cover them yet.  Thanks,
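This is a hedged sketch of the check the message describes: while walking sectors, move on to the next ordered extent once the current one is exhausted, rather than charging every csum to the first extent. The structure and both helpers are hypothetical simplifications. The assert plays the role of the WARN_ON, and a sector that no extent covers yet is rejected instead of receiving a csum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an ordered extent covering a file range. */
struct ordered_extent {
	uint64_t file_offset;
	uint64_t len;
	uint64_t csum_bytes;	/* bytes that already have csums */
};

/* Advance *cur past extents that end at or before offset. */
static struct ordered_extent *extent_for(struct ordered_extent *exts, size_t n,
					 size_t *cur, uint64_t offset)
{
	while (*cur < n && offset >= exts[*cur].file_offset + exts[*cur].len)
		(*cur)++;
	if (*cur == n || offset < exts[*cur].file_offset)
		return NULL;	/* no extent covers this offset yet */
	return &exts[*cur];
}

static int add_csums(struct ordered_extent *exts, size_t n, uint64_t start,
		     uint64_t len, uint32_t sectorsize)
{
	size_t cur = 0;

	for (uint64_t off = start; off < start + len; off += sectorsize) {
		struct ordered_extent *oe = extent_for(exts, n, &cur, off);

		if (!oe)
			return -1;
		oe->csum_bytes += sectorsize;
		/* Analogue of the WARN_ON: never exceed the extent's length. */
		assert(oe->csum_bytes <= oe->len);
	}
	return 0;
}
```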
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  17. 13 Dec 2012, 1 commit
  18. 09 Oct 2012, 1 commit
    • btrfs: fix min csum item size warnings in 32bit · 221b8318
      Zach Brown committed
      Commit 7ca4be45 limited csum items to
      PAGE_CACHE_SIZE. It used min() with types that are incompatible on
      32-bit, which generates warnings:
      
      fs/btrfs/file-item.c: In function ‘btrfs_csum_file_blocks’:
      fs/btrfs/file-item.c:717: warning: comparison of distinct pointer types lacks a cast
      
      This uses min_t(u32,) to fix the warnings.  u32 seemed reasonable
      because btrfs_root->leafsize is u32 and PAGE_CACHE_SIZE is unsigned
      long.
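This is a userspace sketch of why min() warns and min_t() does not. The strict variant is modelled on the kernel's type-checking min() and is renamed strict_min here to avoid confusion; it compares the addresses of its two temporaries, which makes GCC emit "comparison of distinct pointer types lacks a cast" when the argument types differ. min_t() converts both sides to an explicit type first. The code relies on GNU statement expressions and __typeof__, as the kernel does.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t u32;

/* Type-checking min: &_min1 == &_min2 triggers a warning if x and y
 * have different types; the comparison itself is discarded. */
#define strict_min(x, y) ({			\
	__typeof__(x) _min1 = (x);		\
	__typeof__(y) _min2 = (y);		\
	(void)(&_min1 == &_min2);		\
	_min1 < _min2 ? _min1 : _min2; })

/* min_t converts both operands to one explicit type, so no warning. */
#define min_t(type, x, y) ({			\
	type __min1 = (x);			\
	type __min2 = (y);			\
	__min1 < __min2 ? __min1 : __min2; })
```

A call such as strict_min(leafsize, page_size) with a u32 and an unsigned long would warn. min_t(u32, leafsize, page_size) compiles cleanly.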
      Signed-off-by: Zach Brown <zab@zabbo.net>
  19. 02 Oct 2012, 1 commit
  20. 29 Aug 2012, 1 commit
    • Btrfs: don't allocate a seperate csums array for direct reads · c329861d
      Josef Bacik committed
      We've been allocating a big array for csums instead of storing them in the
      io_tree like we do for buffered reads because previously we were locking the
      entire range, so we didn't have an extent state for each sector of the
      range.  But now that we do the range locking as we map the buffers we can
      limit the mapping length to sectorsize and use the private part of the
      io_tree for our csums.  This allows us to avoid an extra memory allocation
      for direct reads which could incur latency.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  21. 24 Jul 2012, 2 commits
    • Btrfs: kill root from btrfs_is_free_space_inode · 83eea1f1
      Liu Bo committed
      Since root can be fetched via the BTRFS_I macro directly, we can drop an
      argument from btrfs_is_free_space_inode().
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: change how we indicate we're adding csums · 0e721106
      Josef Bacik committed
      There is weird logic I had to put in place to make sure that when we were
      adding csums we used the delalloc block rsv instead of the global
      block rsv.  Part of this meant that we had to free up our transaction
      reservation before we ran the delayed refs since csum deletion happens
      during the delayed ref work.  The problem with this is that when we release
      a reservation we will add it to the global reserve if it is not full in
      order to keep us going along longer before we have to force a transaction
      commit.  By releasing our reservation before we run delayed refs we don't
      get the opportunity to drain down the global reserve for the work we did, so
      we won't refill it as often.  This isn't a problem per se, it just results
      in us possibly committing transactions more and more often, and in rare
      cases could cause those WARN_ON()'s to pop in use_block_rsv because we ran
      out of space in our block rsv.
      
      This also helps us by holding onto space while the delayed refs run so we
      don't end up with as many people trying to do things at the same time, which
      again will help us not force commits or hit the use_block_rsv warnings.
      Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  22. 29 Mar 2012, 1 commit
    • Btrfs: don't use crc items bigger than 4KB · 7ca4be45
      Chris Mason committed
      With the big metadata blocks, we can have crc items
      that are much bigger than a page.  There are a few
      places that we try to kmalloc memory to hold the
      items during a split.
      
      Items bigger than 4KB don't really have a huge benefit
      in efficiency, but they do trigger larger order allocations.
      This commit changes the csums to make sure they stay under
      4KB.  This is not a format change, just a #define to limit
      huge items.
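The cap can be sketched as a second limit on the number of checksums per item. The macro name and helper below are hypothetical and do not reproduce the actual #define. The leaf limit is also simplified: it ignores item headers. With 4-byte checksums, a 64KB leaf would otherwise allow 16384 csums (64KB) in one item; the cap keeps the item at 1024 csums, which is 4KB.

```c
#include <assert.h>
#include <stdint.h>

#define CSUM_ITEM_SIZE_LIMIT 4096u	/* hypothetical name for the 4KB cap */

/* Hypothetical: max csums per item, bounded by the leaf's data area
 * (simplified, no item headers) and by the 4KB item size cap. */
static uint32_t max_csums_per_item(uint32_t leaf_data_size, uint32_t csum_size)
{
	uint32_t by_leaf = leaf_data_size / csum_size;
	uint32_t by_cap = CSUM_ITEM_SIZE_LIMIT / csum_size;

	return by_leaf < by_cap ? by_leaf : by_cap;
}
```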
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  23. 22 Mar 2012, 3 commits
    • btrfs: replace many BUG_ONs with proper error handling · 79787eaa
      Jeff Mahoney committed
      btrfs currently handles most errors with BUG_ON. This patch is a
      work-in-progress but aims to handle most errors other than internal logic
      errors and ENOMEM more gracefully.
      
      This iteration prevents most crashes but can run into lockups with
      the page lock on occasion when the timing "works out."
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
    • btrfs: Don't BUG_ON kzalloc error in btrfs_lookup_csums_range() · 0678b618
      Mark Fasheh committed
      Unfortunately it isn't enough to just exit here - the kzalloc() happens in a
      loop and the allocated items are added to a linked list whose head is passed
      in from the caller.
      
      To fix the BUG_ON() and also provide the semantic that the list passed in is
      only modified on success, I create a function-local temporary list that we
      add items to. If no error is met, that list is spliced onto the caller's
      list at the end of the function. Otherwise the list is walked and all items
      freed before the error value is returned.
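The pattern can be sketched with a plain singly linked list standing in for the kernel's list_head. The node type, lookup_range() and its fail_at parameter are hypothetical; fail_at only simulates a failed allocation at a chosen iteration. Items accumulate on a local list, the whole batch is handed to the caller only on success, and on failure everything built so far is freed and the caller's list is untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical node type standing in for a csum list entry. */
struct csum_node {
	int value;
	struct csum_node *next;
};

static void free_list(struct csum_node *head)
{
	while (head) {
		struct csum_node *next = head->next;

		free(head);
		head = next;
	}
}

/* Build n nodes on a local list; fail_at simulates an allocation
 * failure at that iteration (-1 means no failure). */
static int lookup_range(struct csum_node **caller_list, int n, int fail_at)
{
	struct csum_node *tmp = NULL;
	struct csum_node **tail = &tmp;

	for (int i = 0; i < n; i++) {
		struct csum_node *node = (i == fail_at) ? NULL : malloc(sizeof(*node));

		if (!node) {
			free_list(tmp);	/* caller's list was never touched */
			return -ENOMEM;
		}
		node->value = i;
		node->next = NULL;
		*tail = node;
		tail = &node->next;
	}
	/* Success: splice the local list onto the front of the caller's. */
	*tail = *caller_list;
	*caller_list = tmp;
	return 0;
}
```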
      
      I did a simple test on this patch by forcing an error at the kzalloc() point
      and verifying that when this hits (git clone seemed to exercise this), the
      function throws the proper error. Unfortunately but predictably, we later
      hit a BUG_ON(ret) type line that still hasn't been fixed up ;)
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • 143bede5
  24. 20 Mar 2012, 1 commit
  25. 06 Nov 2011, 1 commit
    • btrfs: separate superblock items out of fs_info · 6c41761f
      David Sterba committed
      fs_info is now ~9kb, more than fits into one page. This will cause
      mount failures when memory is too fragmented. The top space consumers
      are the super block structures super_copy and super_for_commit, ~2.8kb
      each. Allocate them dynamically, so fs_info will be ~3.5kb (measured on
      x86_64).
      
      Add a wrapper for freeing fs_info and all of its dynamically allocated
      members.
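This is a minimal sketch of the split, using hypothetical, trimmed-down types; the 2800-byte payload only approximates the "~2.8kb" in the message. The two large super block copies become separately allocated pointers, which keeps the containing structure small. A single free wrapper releases everything; it is also safe on a partially built object because free(NULL) is a no-op.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in, roughly 2.8KB per the commit message. */
struct super_block_copy {
	unsigned char data[2800];
};

/* Hypothetical, trimmed-down fs_info: the big members become pointers. */
struct fs_info {
	struct super_block_copy *super_copy;
	struct super_block_copy *super_for_commit;
};

/* Wrapper that frees fs_info and all of its dynamically allocated members. */
static void free_fs_info(struct fs_info *fs_info)
{
	if (!fs_info)
		return;
	free(fs_info->super_copy);
	free(fs_info->super_for_commit);
	free(fs_info);
}

static struct fs_info *alloc_fs_info(void)
{
	struct fs_info *fs_info = calloc(1, sizeof(*fs_info));

	if (!fs_info)
		return NULL;
	fs_info->super_copy = calloc(1, sizeof(*fs_info->super_copy));
	fs_info->super_for_commit = calloc(1, sizeof(*fs_info->super_for_commit));
	if (!fs_info->super_copy || !fs_info->super_for_commit) {
		free_fs_info(fs_info);	/* handles partially built objects */
		return NULL;
	}
	return fs_info;
}
```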
      Signed-off-by: David Sterba <dsterba@suse.cz>