1. 18 Mar, 2011 5 commits
    • Btrfs: handle errors in btrfs_orphan_cleanup · 66b4ffd1
      Committed by Josef Bacik
      If we cannot truncate an inode for some reason we will never delete the orphan
      item associated with that inode, which means that we will loop forever in
      btrfs_orphan_cleanup.  Instead of doing this just return error so we fail to
      mount.  It sucks, but hey it's better than hanging.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: cleanup error handling in the truncate path · 3893e33b
      Committed by Josef Bacik
      Now that we can handle having errors in the truncate path, let's make sure we
      return errors instead of doing BUG_ON() and such.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: convert to the new truncate sequence · a41ad394
      Committed by Josef Bacik
      ->truncate() is going away, instead all of the work needs to be done in
      ->setattr().  So this converts us over to do this.  It's fairly straightforward,
      just get rid of our .truncate inode operation and call btrfs_truncate() directly
      from btrfs_setsize.  This works out better for us since truncate can technically
      return ENOSPC, and before we had no way of letting anybody know.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: use a slab for the free space entries · dc89e982
      Committed by Josef Bacik
      Since we alloc/free free space entries a whole lot, let's use a slab to keep
      track of them.  This makes some of my tests slightly faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: change reserved_extents to an atomic_t · 57a45ced
      Committed by Josef Bacik
      We track delayed allocation per inode via 2 counters, outstanding_extents
      and reserved_extents.  outstanding_extents is already an atomic_t, but
      reserved_extents is not and is protected by a spinlock.  So convert this to
      an atomic_t and, instead of using a spinlock, use atomic_cmpxchg when
      releasing delalloc bytes.  This makes our inode 72 bytes smaller, and
      reduces locking overhead (albeit it was minimal to begin with).  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
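The lock-free release described in this commit can be illustrated with a compare-and-exchange retry loop. The sketch below is a userspace analogue using C11 `<stdatomic.h>`, not the kernel's `atomic_cmpxchg` and not the actual Btrfs code; the struct and function names (`inode_counters`, `release_reserved`) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-inode counter; not the kernel struct. */
struct inode_counters {
    atomic_int reserved_extents;
};

/*
 * Drop up to `nr` reservations without taking a lock.
 * Returns how many were actually dropped; the counter never goes below zero.
 */
static int release_reserved(struct inode_counters *c, int nr)
{
    int old = atomic_load(&c->reserved_extents);

    for (;;) {
        int drop = nr < old ? nr : old;

        /* On failure, `old` is refreshed with the current value and we retry. */
        if (atomic_compare_exchange_weak(&c->reserved_extents, &old, old - drop))
            return drop;
    }
}
```

The point of the loop is that the clamp (`drop`) is recomputed from whatever value the counter holds at the moment the exchange succeeds, so no separate lock is needed to keep the read and the update consistent.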
  2. 11 Mar, 2011 2 commits
  3. 24 Feb, 2011 1 commit
    • Btrfs: fix fiemap bugs with delalloc · ec29ed5b
      Committed by Chris Mason
      The Btrfs fiemap code wasn't properly returning delalloc extents,
      so applications that trust fiemap to decide if there are holes in the
      file see holes instead of delalloc.
      
      This reworks the btrfs fiemap code, adding a get_extent helper that
      searches for delalloc ranges and also adding a helper for extent_fiemap
      that skips past holes in the file.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  4. 15 Feb, 2011 1 commit
  5. 06 Feb, 2011 1 commit
  6. 01 Feb, 2011 1 commit
  7. 29 Jan, 2011 3 commits
    • Btrfs: fix check_path_shared so it returns the right value · dedefd72
      Committed by Josef Bacik
      When running xfstests 224 I kept getting ENOSPC when trying to remove the files,
      and this is because we were returning ret from check_path_shared while it was
      uninitialized, which isn't right.  Fix this to return 0 properly, and now
      xfstests 224 doesn't freak out when it tries to clean itself up.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: fix return value check of btrfs_join_transaction() · 3612b495
      Committed by Tsutomu Itoh
      Add error checks for btrfs_join_transaction()/btrfs_join_transaction_nolock(),
      and correct the mistaken error checks in several places.
      
      For a more stable Btrfs, I think that we should reduce the use of BUG_ON().
      But I think that this will take a long time.
      So, I propose this patch as a short-term solution.
      
      With this patch:
       - The parts that should be corrected to make Btrfs more stable are made clear.
       - A NULL pointer dereference etc. no longer causes a panic (even if
         the number of BUG_ON() calls is increased temporarily).
       - The error code is returned in places where an error can easily be
         returned.
      
      As a long-term plan:
       - BUG_ON() is reduced by using the forced-readonly framework, etc.
      Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • fs/btrfs/inode.c: Add missing IS_ERR test · 34d19bad
      Committed by Julia Lawall
      After the conditional that precedes the following code, inode may be an
      ERR_PTR value.  This can e.g. result from a memory allocation failure via the
      call to btrfs_iget, and thus does not imply that root is different than
      sub_root.  Thus, an IS_ERR check is added to ensure that there is no
      dereference of inode in this case.
      
      The semantic match that finds this problem is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @r@
      identifier f;
      @@
      f(...) { ... return ERR_PTR(...); }
      
      @@
      identifier r.f, fld;
      expression x;
      statement S1,S2;
      @@
       x = f(...)
       ... when != IS_ERR(x)
      (
       if (IS_ERR(x) ||...) S1 else S2
      |
      *x->fld
      )
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
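The pattern this commit enforces, checking a returned pointer for an encoded error before dereferencing it, can be sketched in userspace C. This is only an analogue of the kernel's `ERR_PTR`/`IS_ERR`/`PTR_ERR` helpers: every `demo_`-prefixed name is hypothetical, and `demo_iget` is a toy stand-in, not `btrfs_iget`.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095 /* assumption: errors occupy the top 4095 addresses */

/* Encode a negative errno value in a pointer, and decode it again. */
static inline void *demo_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long demo_ptr_err(const void *p) { return (long)(intptr_t)p; }

/* True when the pointer value falls in the range reserved for error codes. */
static inline int demo_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

struct demo_inode { int ino; };

/* Toy lookup: returns an encoded -ENOMEM when asked to fail. */
static struct demo_inode *demo_iget(int fail)
{
    static struct demo_inode inode = { 42 };
    return fail ? demo_err_ptr(-ENOMEM) : &inode;
}

static long demo_lookup(int fail)
{
    struct demo_inode *inode = demo_iget(fail);

    /* Without this check, the dereference below would touch a bogus address. */
    if (demo_is_err(inode))
        return demo_ptr_err(inode);
    return inode->ino;
}
```

Note that a NULL check alone would not catch the failure here, since an encoded error pointer is non-NULL.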
  8. 27 Jan, 2011 1 commit
  9. 17 Jan, 2011 2 commits
    • fallocate should be a file operation · 2fe17c10
      Committed by Christoph Hellwig
      Currently all filesystems except XFS implement fallocate asynchronously,
      while XFS forces a commit.  Both of these are suboptimal - in case of O_SYNC
      I/O we really want our allocation on disk, especially for the !KEEP_SIZE
      case where we actually grow the file with user-visible zeroes.  On the
      other hand always committing the transaction is a bad idea for fast-path
      uses of fallocate like for example in recent Samba versions.  Given
      that block allocation is a data plane operation anyway, change it from
      an inode operation to a file operation so that we have the file structure
      available that lets us check for O_SYNC.
      
      This also includes moving the code around for a few of the filesystems,
      and removing the already unneeded S_ISDIR checks given that we only wire
      up fallocate for regular files.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • make the feature checks in ->fallocate future proof · 64c23e86
      Committed by Christoph Hellwig
      Instead of various home grown checks that might need updates for new
      flags just check for any bit outside the mask of the features supported
      by the filesystem.  This makes the check future proof for any newly
      added flag.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
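The "any bit outside the supported mask" check described in this commit can be sketched as follows. The flag names and values here are hypothetical and chosen only for the example; they are not the kernel's `FALLOC_FL_*` definitions, and `demo_fallocate_check` is not a function from any filesystem.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag bits; hypothetical, for this sketch only. */
#define DEMO_FL_KEEP_SIZE  0x01
#define DEMO_FL_PUNCH_HOLE 0x02

/* The set of flags this imaginary filesystem knows how to handle. */
#define DEMO_FL_SUPPORTED (DEMO_FL_KEEP_SIZE)

/*
 * Reject any mode that has a bit set outside the supported mask.
 * A flag added in the future is rejected automatically, with no
 * change needed to this check.
 */
static int demo_fallocate_check(int mode)
{
    if (mode & ~DEMO_FL_SUPPORTED)
        return -EOPNOTSUPP;
    return 0;
}
```

Compared with testing for each unsupported flag by name, the mask form fails closed: the filesystem only has to list what it does support.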
  10. 13 Jan, 2011 2 commits
  11. 07 Jan, 2011 5 commits
    • btrfs: provide simple rcu-walk ACL implementation · 258a5aa8
      Committed by Nick Piggin
      This simple implementation just checks for no ACLs on the inode, and
      if so, then the rcu-walk may proceed, otherwise fail it.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: provide rcu-walk aware permission i_ops · b74c79e9
      Committed by Nick Piggin
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: dcache reduce branches in lookup path · fb045adb
      Committed by Nick Piggin
      Reduce some branches and memory accesses in dcache lookup by adding dentry
      flags to indicate common d_ops are set, rather than having to check them.
      This saves a pointer memory access (dentry->d_op) in common path lookup
      situations, and saves another pointer load and branch in cases where we
      have d_op but not the particular operation.
      
      Patched with:
      
      git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: icache RCU free inodes · fa0d7e3d
      Committed by Nick Piggin
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (i.e. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: change d_delete semantics · fe15ce44
      Committed by Nick Piggin
      Change d_delete from a dentry deletion notification to a dentry caching
      advice, more like ->drop_inode. Require it to be constant and idempotent,
      and not take d_lock. This is how all existing filesystems use the callback
      anyway.
      
      This makes fine grained dentry locking of dput and dentry lru scanning
      much simpler.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
  12. 23 Dec, 2010 1 commit
    • Btrfs: Add readonly snapshots support · b83cc969
      Committed by Li Zefan
      Usage:
      
      Set BTRFS_SUBVOL_RDONLY of btrfs_ioctl_vol_args_v2->flags, and call
      ioctl(BTRFS_IOC_SNAP_CREATE_V2).
      
      Implementation:
      
      - Set readonly bit of btrfs_root_item->flags.
      - Add readonly checks in btrfs_permission (inode_permission),
      btrfs_setattr, btrfs_set/remove_xattr and some ioctls.
      
      Changelog for v3:
      
      - Eliminate btrfs_root->readonly, but check btrfs_root->root_item.flags.
      - Rename BTRFS_ROOT_SNAP_RDONLY to BTRFS_ROOT_SUBVOL_RDONLY.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
  13. 22 Dec, 2010 1 commit
  14. 11 Dec, 2010 2 commits
  15. 10 Dec, 2010 1 commit
  16. 29 Nov, 2010 1 commit
  17. 28 Nov, 2010 3 commits
  18. 22 Nov, 2010 6 commits
    • Btrfs: make btrfs_add_nondir take parent inode as an argument · a1b075d2
      Committed by Josef Bacik
      Everybody who calls btrfs_add_nondir just passes in the dentry of the new file
      and then we dereference dentry->d_parent->d_inode, but every caller of
      btrfs_add_nondir() is already passed the parent's inode.  So instead of
      dereferencing dentry->d_parent, just make btrfs_add_nondir take the dir inode as
      an argument and pass that along so we don't have to worry about d_parent.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: use dget_parent where we can UPDATED · 6a912213
      Committed by Josef Bacik
      There are lots of places where we do dentry->d_parent->d_inode without holding
      the dentry->d_lock.  This could cause problems with rename.  So instead we need
      to use dget_parent() and hold the reference to the parent as long as we are
      going to use its inode and then dput it at the end.
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Cc: raven@themaw.net
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix more ESTALE problems with NFS · 76195853
      Committed by Josef Bacik
      When creating new inodes we don't set up inode->i_generation.  So if we generate
      an fh with a newly created inode we save the generation of 0, but if we flush
      the inode to disk and have to read it back when getting the inode on the server
      we'll have the right i_generation, so gens won't match and we get ESTALE.  This
      patch properly sets inode->i_generation when we create the new inode and now I'm
      no longer getting ESTALE.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: Show device attr correctly for symlinks · f209561a
      Committed by Li Zefan
      Symlinks and files of other types show different device numbers, though
      they are on the same partition:
      
       $ touch tmp; ln -s tmp tmp2; stat tmp tmp2
         File: `tmp'
         Size: 0         	Blocks: 0          IO Block: 4096   regular empty file
       Device: 15h/21d	Inode: 984027      Links: 1
       --- snip ---
         File: `tmp2' -> `tmp'
         Size: 3         	Blocks: 0          IO Block: 4096   symbolic link
       Device: 13h/19d	Inode: 984028      Links: 1
      Reported-by: Toke Høiland-Jørgensen <toke@toke.dk>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: fix panic caused by direct IO · e65e1535
      Committed by Miao Xie
      btrfs panicked when we wrote >64KB of data by direct IO at one time.
      
      Reproduce steps:
       # mkfs.btrfs /dev/sda5 /dev/sda6
       # mount /dev/sda5 /mnt
       # dd if=/dev/zero of=/mnt/tmpfile bs=100K count=1 oflag=direct
      
      Then btrfs panicked:
      mapping failed logical 1103155200 bio len 69632 len 12288
      ------------[ cut here ]------------
      kernel BUG at fs/btrfs/volumes.c:3010!
      [SNIP]
      Pid: 1992, comm: btrfs-worker-0 Not tainted 2.6.37-rc1 #1 D2399/PRIMERGY
      RIP: 0010:[<ffffffffa03d1462>]  [<ffffffffa03d1462>] btrfs_map_bio+0x202/0x210 [btrfs]
      [SNIP]
      Call Trace:
       [<ffffffffa03ab3eb>] __btrfs_submit_bio_done+0x1b/0x20 [btrfs]
       [<ffffffffa03a35ff>] run_one_async_done+0x9f/0xb0 [btrfs]
       [<ffffffffa03d3d20>] run_ordered_completions+0x80/0xc0 [btrfs]
       [<ffffffffa03d45a4>] worker_loop+0x154/0x5f0 [btrfs]
       [<ffffffffa03d4450>] ? worker_loop+0x0/0x5f0 [btrfs]
       [<ffffffffa03d4450>] ? worker_loop+0x0/0x5f0 [btrfs]
       [<ffffffff81083216>] kthread+0x96/0xa0
       [<ffffffff8100cec4>] kernel_thread_helper+0x4/0x10
       [<ffffffff81083180>] ? kthread+0x0/0xa0
       [<ffffffff8100cec0>] ? kernel_thread_helper+0x0/0x10
      
      We fix this problem by splitting bios when we submit them.
      Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Tested-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • btrfs: fix free dip and dip->csums twice · 0c56fa96
      Committed by Miao Xie
      bio_endio() will free dip and dip->csums, so dip and dip->csums will
      be freed twice. Fix it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  19. 30 Oct, 2010 1 commit