1. 30 5月, 2012 4 次提交
    • M
      Btrfs: fix the same inode id problem when doing auto defragment · 762f2263
      Miao Xie 提交于
      Two files in the different subvolumes may have the same inode id, so
      The rb-tree which is used to manage the defragment object must take it
      into account. This patch fix this problem.
      Signed-off-by: NMiao Xie <miaox@cn.fujitsu.com>
      762f2263
    • J
      Btrfs: convert the inode bit field to use the actual bit operations · 72ac3c0d
      Josef Bacik 提交于
      Miao pointed this out while I was working on an orphan problem that messing
      with a bitfield where different ranges are protected by different locks
      doesn't work out right.  Turns out we've been doing this forever where we
      have different parts of the bit field protected by either no lock at all or
      different locks which could cause all sorts of weird problems including the
      issue I was hitting.  So instead make a runtime_flags thing that we use the
      normal bit operations on that are all atomic so we can keep having our
      no/different locking for the different flags and then make force_compress
      it's own thing so it can be treated normally.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      72ac3c0d
    • J
      Btrfs: do not do filemap_write_and_wait_range in fsync · 0885ef5b
      Josef Bacik 提交于
      We already do the btrfs_wait_ordered_range which will do this for us, so
      just remove this call so we don't call it twice.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      0885ef5b
    • J
      Btrfs: use i_version instead of our own sequence · 0c4d2d95
      Josef Bacik 提交于
      We've been keeping around the inode sequence number in hopes that somebody
      would use it, but nobody uses it and people actually use i_version which
      serves the same purpose, so use i_version where we used the incore inode's
      sequence number and that way the sequence is updated properly across the
      board, and not just in file write.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      0c4d2d95
  2. 28 4月, 2012 1 次提交
  3. 22 3月, 2012 2 次提交
  4. 15 2月, 2012 1 次提交
  5. 01 2月, 2012 1 次提交
  6. 17 1月, 2012 1 次提交
  7. 11 1月, 2012 1 次提交
  8. 22 12月, 2011 1 次提交
    • A
      Btrfs: mark delayed refs as for cow · 66d7e7f0
      Arne Jansen 提交于
      Add a for_cow parameter to add_delayed_*_ref and pass the appropriate value
      from every call site. The for_cow parameter will later on be used to
      determine if a ref will change anything with respect to qgroups.
      
      Delayed refs coming from relocation are always counted as for_cow, as they
      don't change subvol quota.
      
      Also pass in the fs_info for later use.
      
      btrfs_find_all_roots() will use this as an optimization, as changes that are
      for_cow will not change anything with respect to which root points to a
      certain leaf. Thus, we don't need to add the current sequence number to
      those delayed refs.
      Signed-off-by: NArne Jansen <sensille@gmx.net>
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      66d7e7f0
  9. 18 12月, 2011 1 次提交
  10. 17 12月, 2011 1 次提交
  11. 16 12月, 2011 1 次提交
    • J
      Btrfs: deal with enospc from dirtying inodes properly · 22c44fe6
      Josef Bacik 提交于
      Now that we're properly keeping track of delayed inode space we've been getting
      a lot of warnings out of btrfs_dirty_inode() when running xfstest 83.  This is
      because a bunch of people call mark_inode_dirty, which is void so we can't
      return ENOSPC.  This needs to be fixed in a few areas
      
      1) file_update_time - this updates the mtime and such when writing to a file,
      which will call mark_inode_dirty.  So copy file_update_time into btrfs so we can
      call btrfs_dirty_inode directly and return an error if we get one appropriately.
      
      2) fix symlinks to use btrfs_setattr for ->setattr.  For some reason we weren't
      setting ->setattr for symlinks, even though we should have been.  This catches
      one of the cases where we were getting errors in mark_inode_dirty.
      
      3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
      instead of mark_inode_dirty.  This lets us return errors properly for truncate
      and chown/anything related to setattr.
      
      4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
      print an error if we have one.  The only remaining user we can't control for
      this is touch_atime(), but we don't really want to keep people from walking
      down the tree if we don't have space to save the atime update, so just complain
      but don't worry about it.
      
      With this patch xfstests 83 complains a handful of times instead of hundreds of
      times.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      22c44fe6
  12. 28 10月, 2011 1 次提交
    • A
      vfs: do (nearly) lockless generic_file_llseek · ef3d0fd2
      Andi Kleen 提交于
      The i_mutex lock use of generic _file_llseek hurts.  Independent processes
      accessing the same file synchronize over a single lock, even though
      they have no need for synchronization at all.
      
      Under high utilization this can cause llseek to scale very poorly on larger
      systems.
      
      This patch does some rethinking of the llseek locking model:
      
      First the 64bit f_pos is not necessarily atomic without locks
      on 32bit systems. This can already cause races with read() today.
      This was discussed on linux-kernel in the past and deemed acceptable.
      The patch does not change that.
      
      Let's look at the different seek variants:
      
      SEEK_SET: Doesn't really need any locking.
      If there's a race one writer wins, the other loses.
      
      For 32bit the non atomic update races against read()
      stay the same. Without a lock they can also happen
      against write() now.  The read() race was deemed
      acceptable in past discussions, and I think if it's
      ok for read it's ok for write too.
      
      => Don't need a lock.
      
      SEEK_END: This behaves like SEEK_SET plus it reads
      the maximum size too. Reading the maximum size would have the
      32bit atomic problem. But luckily we already have a way to read
      the maximum size without locking (i_size_read), so we
      can just use that instead.
      
      Without i_mutex there is no synchronization with write() anymore,
      however since the write() update is atomic on 64bit it just behaves
      like another racy SEEK_SET.  On non atomic 32bit it's the same
      as SEEK_SET.
      
      => Don't need a lock, but need to use i_size_read()
      
      SEEK_CUR: This has a read-modify-write race window
      on the same file. One could argue that any application
      doing unsynchronized seeks on the same file is already broken.
      But for the sake of not adding a regression here I'm
      using the file->f_lock to synchronize this. Using this
      lock is much better than the inode mutex because it doesn't
      synchronize between processes.
      
      => So still need a lock, but can use a f_lock.
      
      This patch implements this new scheme in generic_file_llseek.
      I dropped generic_file_llseek_unlocked and changed all callers.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      ef3d0fd2
  13. 20 10月, 2011 2 次提交
  14. 01 10月, 2011 1 次提交
    • J
      Btrfs: force a page fault if we have a shorty copy on a page boundary · b6316429
      Josef Bacik 提交于
      A user reported a problem where ceph was getting into 100% cpu usage while doing
      some writing.  It turns out it's because we were doing a short write on a not
      uptodate page, which means we'd fall back at one page at a time and fault the
      page in.  The problem is our position is on the page boundary, so our fault in
      logic wasn't actually reading the page, so we'd just spin forever or until the
      page got read in by somebody else.  This will force a readpage if we end up
      doing a short copy.  Alexandre could reproduce this easily with ceph and reports
      it fixes his problem.  I also wrote a reproducer that no longer hangs my box
      with this patch.  Thanks,
      Reported-and-tested-by: NAlexandre Oliva <aoliva@redhat.com>
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      b6316429
  15. 18 9月, 2011 1 次提交
  16. 11 9月, 2011 1 次提交
  17. 18 8月, 2011 2 次提交
  18. 17 8月, 2011 1 次提交
  19. 02 8月, 2011 2 次提交
  20. 28 7月, 2011 2 次提交
    • J
      Btrfs: fix enospc problems with delalloc · 9e0baf60
      Josef Bacik 提交于
      So I had this brilliant idea to use atomic counters for outstanding and reserved
      extents, but this turned out to be a bad idea.  Consider this where we have 1
      outstanding extent and 1 reserved extent
      
      Reserver				Releaser
      					atomic_dec(outstanding) now 0
      atomic_read(outstanding)+1 get 1
      atomic_read(reserved) get 1
      don't actually reserve anything because
      they are the same
      					atomic_cmpxchg(reserved, 1, 0)
      atomic_inc(outstanding)
      atomic_add(0, reserved)
      					free reserved space for 1 extent
      
      Then the reserver now has no actual space reserved for it, and when it goes to
      finish the ordered IO it won't have enough space to do it's allocation and you
      get those lovely warnings.
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      9e0baf60
    • J
      Btrfs: use find_or_create_page instead of grab_cache_page · a94733d0
      Josef Bacik 提交于
      grab_cache_page will use mapping_gfp_mask(), which for all inodes is set to
      GFP_HIGHUSER_MOVABLE.  So instead use find_or_create_page in all cases where we
      need GFP_NOFS so we don't deadlock.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      a94733d0
  21. 21 7月, 2011 2 次提交
    • J
      fs: push i_mutex and filemap_write_and_wait down into ->fsync() handlers · 02c24a82
      Josef Bacik 提交于
      Btrfs needs to be able to control how filemap_write_and_wait_range() is called
      in fsync to make it less of a painful operation, so push down taking i_mutex and
      the calling of filemap_write_and_wait() down into the ->fsync() handlers.  Some
      file systems can drop taking the i_mutex altogether it seems, like ext3 and
      ocfs2.  For correctness sake I just pushed everything down in all cases to make
      sure that we keep the current behavior the same for everybody, and then each
      individual fs maintainer can make up their mind about what to do from there.
      Thanks,
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      02c24a82
    • J
      Btrfs: implement our own ->llseek · b2675157
      Josef Bacik 提交于
      In order to handle SEEK_HOLE/SEEK_DATA we need to implement our own llseek.
      Basically for the normal SEEK_*'s we will just defer to the generic helper, and
      for SEEK_HOLE/SEEK_DATA we will use our fiemap helper to figure out the nearest
      hole or data.  Currently this helper doesn't check for delalloc bytes for
      prealloc space, so for now treat prealloc as data until that is fixed.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      b2675157
  22. 15 7月, 2011 1 次提交
    • M
      btrfs: don't BUG_ON btrfs_alloc_path() errors · d8926bb3
      Mark Fasheh 提交于
      This patch fixes many callers of btrfs_alloc_path() which BUG_ON allocation
      failure. All the sites that are fixed in this patch were checked by me to
      be fairly trivial to fix because of at least one of two criteria:
      
       - Callers of the function catch errors from it already so bubbling the
         error up will be handled.
       - Callers of the function might BUG_ON any nonzero return code in which
         case there is no behavior changed (but we still got to remove a BUG_ON)
      
      The following functions were updated:
      
      btrfs_lookup_extent, alloc_reserved_tree_block, btrfs_remove_block_group,
      btrfs_lookup_csums_range, btrfs_csum_file_blocks, btrfs_mark_extent_written,
      btrfs_inode_by_name, btrfs_new_inode, btrfs_symlink,
      insert_reserved_file_extent, and run_delalloc_nocow
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      d8926bb3
  23. 04 6月, 2011 2 次提交
  24. 27 5月, 2011 1 次提交
  25. 24 5月, 2011 1 次提交
    • J
      Btrfs: kill trans_mutex · a4abeea4
      Josef Bacik 提交于
      We use trans_mutex for lots of things, here's a basic list
      
      1) To serialize trans_handles joining the currently running transaction
      2) To make sure that no new trans handles are started while we are committing
      3) To protect the dead_roots list and the transaction lists
      
      Really the serializing trans_handles joining is not too hard, and can really get
      bogged down in acquiring a reference to the transaction.  So replace the
      trans_mutex with a trans_lock spinlock and use it to do the following
      
      1) Protect fs_info->running_transaction.  All trans handles have to do is check
      this, and then take a reference of the transaction and keep on going.
      2) Protect the fs_info->trans_list.  This doesn't get used too much, basically
      it just holds the current transactions, which will usually just be the currently
      committing transaction and the currently running transaction at most.
      3) Protect the dead roots list.  This is only ever processed by splicing the
      list so this is relatively simple.
      4) Protect the fs_info->reloc_ctl stuff.  This is very lightweight and was using
      the trans_mutex before, so this is a pretty straightforward change.
      5) Protect fs_info->no_trans_join.  Because we don't hold the trans_lock over
      the entirety of the commit we need to have a way to block new people from
      creating a new transaction while we're doing our work.  So we set no_trans_join
      and in join_transaction we test to see if that is set, and if it is we do a
      wait_on_commit.
      6) Make the transaction use count atomic so we don't need to take locks to
      modify it when we're dropping references.
      7) Add a commit_lock to the transaction to make sure multiple people trying to
      commit the same transaction don't race and commit at the same time.
      8) Make open_ioctl_trans an atomic so we don't have to take any locks for ioctl
      trans.
      
      I have tested this with xfstests, but obviously it is a pretty hairy change so
      lots of testing is greatly appreciated.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      a4abeea4
  26. 02 5月, 2011 3 次提交
  27. 25 4月, 2011 1 次提交
    • L
      Btrfs: Always use 64bit inode number · 33345d01
      Li Zefan 提交于
      There's a potential problem in 32bit system when we exhaust 32bit inode
      numbers and start to allocate big inode numbers, because btrfs uses
      inode->i_ino in many places.
      
      So here we always use BTRFS_I(inode)->location.objectid, which is an
      u64 variable.
      
      There are 2 exceptions that BTRFS_I(inode)->location.objectid !=
      inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2),
      and inode->i_ino will be used in those cases.
      
      Another reason to make this change is I'm going to use a special inode
      to save free ino cache, and the inode number must be > (u64)-256.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      33345d01
  28. 09 4月, 2011 1 次提交
    • J
      Btrfs: deal with the case that we run out of space in the cache · be1a12a0
      Josef Bacik 提交于
      Currently we don't handle running out of space in the cache, so to fix this we
      keep track of how far in the cache we are.  Then we only dirty the pages if we
      successfully modify all of them, otherwise if we have an error or run out of
      space we can just drop them and not worry about the vm writing them out.
      Thanks,
      
      Tested-by Johannes Hirte <johannes.hirte@fem.tu-ilmenau.de>
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      be1a12a0