1. 10 6月, 2014 3 次提交
    • F
      Btrfs: make fsync work after cloning into a file · 7ffbb598
      Filipe Manana 提交于
      When cloning into a file, we were correctly replacing the extent
      items in the target range and removing the extent maps. However
      we weren't replacing the extent maps with new ones that point to
      the new extents - as a consequence, an incremental fsync (when the
      inode doesn't have the full sync flag) was a NOOP, since it relies
      on the existence of extent maps in the modified list of the inode's
      extent map tree, which was empty. Therefore add new extent maps to
      reflect the target clone range.
      
      A test case for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      7ffbb598
    • F
      Btrfs: don't access non-existent key when csum tree is empty · 35045bf2
      Filipe Manana 提交于
      When the csum tree is empty, our leaf (path->nodes[0]) has a number
      of items equal to 0 and since btrfs_header_nritems() returns an
      unsigned integer (and so is our local nritems variable) the following
      comparison always evaluates to false:
      
           if (path->slots[0] >= nritems - 1) {
      
      As the casting rules lead to:
      
           if ((u32)0 >= (u32)4294967295) {
      
      This makes us access key at slot paths->slots[0] + 1 (1) of the empty leaf
      some lines below:
      
          btrfs_item_key_to_cpu(path->nodes[0], &found_key, slot);
          if (found_key.objectid != BTRFS_EXTENT_CSUM_OBJECTID ||
              found_key.type != BTRFS_EXTENT_CSUM_KEY) {
      		found_next = 1;
      		goto insert;
          }
      
      So just don't access such non-existent slot and don't set found_next to 1
      when the tree is empty. It's very unlikely we'll get a random key with the
      objectid and type values above, which is where we could go into trouble.
      
      If nritems is 0, just set found_next to 1 anyway as it will make us insert
      a csum item covering our whole extent (or the whole leaf) when the tree is
      empty.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      35045bf2
    • L
      Btrfs: do not increment on bio_index one by one · d2cbf2a2
      Liu Bo 提交于
      'bio_index' is just a index, it's really not necessary to do increment
      one by one.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      d2cbf2a2
  2. 29 1月, 2014 1 次提交
  3. 24 11月, 2013 1 次提交
    • K
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet 提交于
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      Signed-off-by: NKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  4. 12 11月, 2013 2 次提交
  5. 01 9月, 2013 2 次提交
  6. 02 7月, 2013 1 次提交
    • M
      Btrfs: remove btrfs_sector_sum structure · f51a4a18
      Miao Xie 提交于
      Using the structure btrfs_sector_sum to keep the checksum value is
      unnecessary, because the extents that btrfs_sector_sum points to are
      continuous, we can find out the expected checksums by btrfs_ordered_sum's
      bytenr and the offset, so we can remove btrfs_sector_sum's bytenr. After
      removing bytenr, there is only one member in the structure, so it makes
      no sense to keep the structure, just remove it, and use a u32 array to
      store the checksum value.
      
      By this change, we don't use the while loop to get the checksums one by
      one. Now, we can get several checksum value at one time, it improved the
      performance by ~74% on my SSD (31MB/s -> 54MB/s).
      
      test command:
       # dd if=/dev/zero of=/mnt/btrfs/file0 bs=1M count=1024 oflag=sync
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      f51a4a18
  7. 07 5月, 2013 6 次提交
  8. 28 3月, 2013 2 次提交
  9. 21 2月, 2013 1 次提交
    • L
      Btrfs: extend the checksum item as much as possible · 2f697dc6
      Liu Bo 提交于
      For write, we also reserve some space for COW blocks during updating
      the checksum tree, and we calculate the number of blocks by checking
      if the number of bytes outstanding that are going to need csums needs
      one more block for csum.
      
      When we add these checksum into the checksum tree, we use ordered sums
      list.
      Every ordered sum contains csums for each sector, and we'll first try
      to look up an existing csum item,
      a) if we don't yet have a proper csum item, then we need to insert one,
      b) or if we find one but the csum item is not big enough, then we need
      to extend it.
      
      The point is we'll unlock the whole path and then insert or extend.
      So others can hack in and update the tree.
      
      Each insert or extend needs update the tree with COW on, and we may need
      to insert/extend for many times.
      
      That means what we've reserved for updating checksum tree is NOT enough
      indeed.
      
      The case is even more serious with having several write threads at the
      same time, it can end up eating our reserved space quickly and starting
      eating globle reserve pool instead.
      
      I don't yet come up with a way to calculate the worse case for updating
      csum, but extending the checksum item as much as possible can be helpful
      in my test.
      
      The idea behind is that it can reduce the times we insert/extend so that
      it saves us precious reserved space.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      2f697dc6
  10. 25 1月, 2013 1 次提交
    • J
      Btrfs: put csums on the right ordered extent · e58dd74b
      Josef Bacik 提交于
      I noticed a WARN_ON going off when adding csums because we were going over
      the amount of csum bytes that should have been allowed for an ordered
      extent.  This is a leftover from when we used to hold the csums privately
      for direct io, but now we use the normal ordered sum stuff so we need to
      make sure and check if we've moved on to another extent so that the csums
      are added to the right extent.  Without this we could end up with csums for
      bytenrs that don't have extents to cover them yet.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      e58dd74b
  11. 13 12月, 2012 1 次提交
  12. 09 10月, 2012 1 次提交
    • Z
      btrfs: fix min csum item size warnings in 32bit · 221b8318
      Zach Brown 提交于
      commit 7ca4be45 limited csum items to
      PAGE_CACHE_SIZE.  It used min() with incompatible types in 32bit which
      generates warnings:
      
      fs/btrfs/file-item.c: In function ‘btrfs_csum_file_blocks’:
      fs/btrfs/file-item.c:717: warning: comparison of distinct pointer types lacks a cast
      
      This uses min_t(u32,) to fix the warnings.  u32 seemed reasonable
      because btrfs_root->leafsize is u32 and PAGE_CACHE_SIZE is unsigned
      long.
      Signed-off-by: NZach Brown <zab@zabbo.net>
      221b8318
  13. 02 10月, 2012 1 次提交
  14. 29 8月, 2012 1 次提交
    • J
      Btrfs: don't allocate a seperate csums array for direct reads · c329861d
      Josef Bacik 提交于
      We've been allocating a big array for csums instead of storing them in the
      io_tree like we do for buffered reads because previously we were locking the
      entire range, so we didn't have an extent state for each sector of the
      range.  But now that we do the range locking as we map the buffers we can
      limit the mapping lenght to sectorsize and use the private part of the
      io_tree for our csums.  This allows us to avoid an extra memory allocation
      for direct reads which could incur latency.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      c329861d
  15. 24 7月, 2012 2 次提交
    • L
      Btrfs: kill root from btrfs_is_free_space_inode · 83eea1f1
      Liu Bo 提交于
      Since root can be fetched via BTRFS_I macro directly, we can save an args
      for btrfs_is_free_space_inode().
      Signed-off-by: NLiu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      83eea1f1
    • J
      Btrfs: change how we indicate we're adding csums · 0e721106
      Josef Bacik 提交于
      There is weird logic I had to put in place to make sure that when we were
      adding csums that we'd used the delalloc block rsv instead of the global
      block rsv.  Part of this meant that we had to free up our transaction
      reservation before we ran the delayed refs since csum deletion happens
      during the delayed ref work.  The problem with this is that when we release
      a reservation we will add it to the global reserve if it is not full in
      order to keep us going along longer before we have to force a transaction
      commit.  By releasing our reservation before we run delayed refs we don't
      get the opportunity to drain down the global reserve for the work we did, so
      we won't refill it as often.  This isn't a problem per-se, it just results
      in us possibly committing transactions more and more often, and in rare
      cases could cause those WARN_ON()'s to pop in use_block_rsv because we ran
      out of space in our block rsv.
      
      This also helps us by holding onto space while the delayed refs run so we
      don't end up with as many people trying to do things at the same time, which
      again will help us not force commits or hit the use_block_rsv warnings.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      0e721106
  16. 29 3月, 2012 1 次提交
    • C
      Btrfs: don't use crc items bigger than 4KB · 7ca4be45
      Chris Mason 提交于
      With the big metadata blocks, we can have crc items
      that are much bigger than a page.  There are a few
      places that we try to kmalloc memory to hold the
      items during a split.
      
      Items bigger than 4KB don't really have a huge benefit
      in efficiency, but they do trigger larger order allocations.
      This commits changes the csums to make sure they stay under
      4KB.  This is not a format change, just a #define to limit
      huge items.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      7ca4be45
  17. 22 3月, 2012 3 次提交
    • J
      btrfs: replace many BUG_ONs with proper error handling · 79787eaa
      Jeff Mahoney 提交于
       btrfs currently handles most errors with BUG_ON. This patch is a work-in-
       progress but aims to handle most errors other than internal logic
       errors and ENOMEM more gracefully.
      
       This iteration prevents most crashes but can run into lockups with
       the page lock on occasion when the timing "works out."
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      79787eaa
    • M
      btrfs: Don't BUG_ON kzalloc error in btrfs_lookup_csums_range() · 0678b618
      Mark Fasheh 提交于
      Unfortunately it isn't enough to just exit here - the kzalloc() happens in a
      loop and the allocated items are added to a linked list whose head is passed
      in from the caller.
      
      To fix the BUG_ON() and also provide the semantic that the list passed in is
      only modified on success, I create function-local temporary list that we add
      items too. If no error is met, that list is spliced to the callers at the
      end of the function. Otherwise the list will be walked and all items freed
      before the error value is returned.
      
      I did a simple test on this patch by forcing an error at the kzalloc() point
      and verifying that when this hits (git clone seemed to exercise this), the
      function throws the proper error. Unfortunately but predictably, we later
      hit a BUG_ON(ret) type line that still hasn't been fixed up ;)
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      0678b618
    • J
      143bede5
  18. 20 3月, 2012 1 次提交
  19. 06 11月, 2011 1 次提交
    • D
      btrfs: separate superblock items out of fs_info · 6c41761f
      David Sterba 提交于
      fs_info has now ~9kb, more than fits into one page. This will cause
      mount failure when memory is too fragmented. Top space consumers are
      super block structures super_copy and super_for_commit, ~2.8kb each.
      Allocate them dynamically. fs_info will be ~3.5kb. (measured on x86_64)
      
      Add a wrapper for freeing fs_info and all of it's dynamically allocated
      members.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      6c41761f
  20. 11 9月, 2011 1 次提交
  21. 28 7月, 2011 2 次提交
    • C
      Btrfs: use the commit_root for reading free_space_inode crcs · 2cf8572d
      Chris Mason 提交于
      Now that we are using regular file crcs for the free space cache,
      we can deadlock if we try to read the free_space_inode while we are
      updating the crc tree.
      
      This commit fixes things by using the commit_root to read the crcs.  This is
      safe because we the free space cache file would already be loaded if
      that block group had been changed in the current transaction.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      2cf8572d
    • C
      Btrfs: stop using highmem for extent_buffers · a6591715
      Chris Mason 提交于
      The extent_buffers have a very complex interface where
      we use HIGHMEM for metadata and try to cache a kmap mapping
      to access the memory.
      
      The next commit adds reader/writer locks, and concurrent use
      of this kmap cache would make it even more complex.
      
      This commit drops the ability to use HIGHMEM with extent buffers,
      and rips out all of the related code.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      a6591715
  22. 15 7月, 2011 1 次提交
    • M
      btrfs: don't BUG_ON btrfs_alloc_path() errors · d8926bb3
      Mark Fasheh 提交于
      This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation
      failure. All the sites that are fixed in this patch were checked by me to
      be fairly trivial to fix because of at least one of two criteria:
      
       - Callers of the function catch errors from it already so bubbling the
         error up will be handled.
       - Callers of the function might BUG_ON any nonzero return code in which
         case there is no behavior changed (but we still got to remove a BUG_ON)
      
      The following functions were updated:
      
      btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group,
      btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written,
      btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink,
      insert_reserved_file_extent, and run_delalloc_nocow
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      d8926bb3
  23. 24 5月, 2011 2 次提交
  24. 12 5月, 2011 1 次提交
    • A
      btrfs: scrub · a2de733c
      Arne Jansen 提交于
      This adds an initial implementation for scrub. It works quite
      straightforward. The usermode issues an ioctl for each device in the
      fs. For each device, it enumerates the allocated device chunks. For
      each chunk, the contained extents are enumerated and the data checksums
      fetched. The extents are read sequentially and the checksums verified.
      If an error occurs (checksum or EIO), a good copy is searched for. If
      one is found, the bad copy will be rewritten.
      All enumerations happen from the commit roots. During a transaction
      commit, the scrubs get paused and afterwards continue from the new
      roots.
      
      This commit is based on the series originally posted to linux-btrfs
      with some improvements that resulted from comments from David Sterba,
      Ilya Dryomov and Jan Schmidt.
      Signed-off-by: NArne Jansen <sensille@gmx.net>
      a2de733c
  25. 02 5月, 2011 1 次提交