1. 23 2月, 2015 1 次提交
    • D
      VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) · e36cb0b8
      David Howells 提交于
      Convert the following where appropriate:
      
       (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).
      
       (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).
      
       (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry).  This is actually more
           complicated than it appears as some calls should be converted to
           d_can_lookup() instead.  The difference is whether the directory in
           question is a real dir with a ->lookup op or whether it's a fake dir with
           a ->d_automount op.
      
      In some circumstances, we can subsume checks for dentry->d_inode not being
      NULL into this, provided we the code isn't in a filesystem that expects
      d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
      use d_inode() rather than d_backing_inode() to get the inode pointer).
      
      Note that the dentry type field may be set to something other than
      DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
      manages the fall-through from a negative dentry to a lower layer.  In such a
      case, the dentry type of the negative union dentry is set to the same as the
      type of the lower dentry.
      
      However, if you know d_inode is not NULL at the call site, then you can use
      the d_is_xxx() functions even in a filesystem.
      
      There is one further complication: a 0,0 chardev dentry may be labelled
      DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE.  Strictly, this was
      intended for special directory entry types that don't have attached inodes.
      
      The following perl+coccinelle script was used:
      
      use strict;
      
      my @callers;
      open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
          die "Can't grep for S_ISDIR and co. callers";
      @callers = <$fd>;
      close($fd);
      unless (@callers) {
          print "No matches\n";
          exit(0);
      }
      
      my @cocci = (
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISLNK(E->d_inode->i_mode)',
          '+ d_is_symlink(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISDIR(E->d_inode->i_mode)',
          '+ d_is_dir(E)',
          '',
          '@@',
          'expression E;',
          '@@',
          '',
          '- S_ISREG(E->d_inode->i_mode)',
          '+ d_is_reg(E)' );
      
      my $coccifile = "tmp.sp.cocci";
      open($fd, ">$coccifile") || die $coccifile;
      print($fd "$_\n") || die $coccifile foreach (@cocci);
      close($fd);
      
      foreach my $file (@callers) {
          chomp $file;
          print "Processing ", $file, "\n";
          system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
      	die "spatch failed";
      }
      
      [AV: overlayfs parts skipped]
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      e36cb0b8
  2. 25 11月, 2014 2 次提交
    • F
      Btrfs: fix snapshot inconsistency after a file write followed by truncate · 9ea24bbe
      Filipe Manana 提交于
      If right after starting the snapshot creation ioctl we perform a write against a
      file followed by a truncate, with both operations increasing the file's size, we
      can get a snapshot tree that reflects a state of the source subvolume's tree where
      the file truncation happened but the write operation didn't. This leaves a gap
      between 2 file extent items of the inode, which makes btrfs' fsck complain about it.
      
      For example, if we perform the following file operations:
      
          $ mkfs.btrfs -f /dev/vdd
          $ mount /dev/vdd /mnt
          $ xfs_io -f \
                -c "pwrite -S 0xaa -b 32K 0 32K" \
                -c "fsync" \
                -c "pwrite -S 0xbb -b 32770 16K 32770" \
                -c "truncate 90123" \
                /mnt/foobar
      
      and the snapshot creation ioctl was just called before the second write, we often
      can get the following inode items in the snapshot's btree:
      
              item 120 key (257 INODE_ITEM 0) itemoff 7987 itemsize 160
                      inode generation 146 transid 7 size 90123 block group 0 mode 100600 links 1 uid 0 gid 0 rdev 0 flags 0x0
              item 121 key (257 INODE_REF 256) itemoff 7967 itemsize 20
                      inode ref index 282 namelen 10 name: foobar
              item 122 key (257 EXTENT_DATA 0) itemoff 7914 itemsize 53
                      extent data disk byte 1104855040 nr 32768
                      extent data offset 0 nr 32768 ram 32768
                      extent compression 0
              item 123 key (257 EXTENT_DATA 53248) itemoff 7861 itemsize 53
                      extent data disk byte 0 nr 0
                      extent data offset 0 nr 40960 ram 40960
                      extent compression 0
      
      There's a file range, corresponding to the interval [32K; ALIGN(16K + 32770, 4096)[
      for which there's no file extent item covering it. This is because the file write
      and file truncate operations happened both right after the snapshot creation ioctl
      called btrfs_start_delalloc_inodes(), which means we didn't start and wait for the
      ordered extent that matches the write and, in btrfs_setsize(), we were able to call
      btrfs_cont_expand() before being able to commit the current transaction in the
      snapshot creation ioctl. So this made it possibe to insert the hole file extent
      item in the source subvolume (which represents the region added by the truncate)
      right before the transaction commit from the snapshot creation ioctl.
      
      Btrfs' fsck tool complains about such cases with a message like the following:
      
          "root 331 inode 257 errors 100, file extent discount"
      
      >From a user perspective, the expectation when a snapshot is created while those
      file operations are being performed is that the snapshot will have a file that
      either:
      
      1) is empty
      2) only the first write was captured
      3) only the 2 writes were captured
      4) both writes and the truncation were captured
      
      But never capture a state where only the first write and the truncation were
      captured (since the second write was performed before the truncation).
      
      A test case for xfstests follows.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9ea24bbe
    • F
      Btrfs: ensure send always works on roots without orphans · e5fa8f86
      Filipe Manana 提交于
      Move the logic from the snapshot creation ioctl into send. This avoids
      doing the transaction commit if send isn't used, and ensures that if
      a crash/reboot happens after the transaction commit that created the
      snapshot and before the transaction commit that switched the commit
      root, send will not get a commit root that differs from the main root
      (that has orphan items).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e5fa8f86
  3. 20 11月, 2014 1 次提交
  4. 24 10月, 2014 1 次提交
  5. 17 10月, 2014 1 次提交
    • C
      Revert "Btrfs: race free update of commit root for ro snapshots" · d3797308
      Chris Mason 提交于
      This reverts commit 9c3b306e.
      
      Switching only one commit root during a transaction is wrong because it
      leads the fs into an inconsistent state. All commit roots should be
      switched at once, at transaction commit time, otherwise backref walking
      can often miss important references that were only accessible through
      the old commit root.  Plus, the root item for the snapshot's root wasn't
      getting updated and preventing the next transaction commit to do it.
      
      This made several users get into random corruption issues after creation
      of readonly snapshots.
      
      A regression test for xfstests will follow soon.
      
      Cc: stable@vger.kernel.org # 3.17
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d3797308
  6. 09 10月, 2014 1 次提交
  7. 02 10月, 2014 3 次提交
  8. 18 9月, 2014 10 次提交
  9. 09 9月, 2014 1 次提交
    • D
      Btrfs: kfree()ing ERR_PTRs · c47ca32d
      Dan Carpenter 提交于
      The "inherit" in btrfs_ioctl_snap_create_v2() and "vol_args" in
      btrfs_ioctl_rm_dev() are ERR_PTRs so we can't call kfree() on them.
      
      These kind of bugs are "One Err Bugs" where there is just one error
      label that does everything.  I could set the "inherit = NULL" and keep
      the single out label but it ends up being more complicated that way.  It
      makes the code simpler to re-order the unwind so it's in the mirror
      order of the allocation and introduce some new error labels.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c47ca32d
  10. 27 8月, 2014 1 次提交
    • C
      Btrfs: fix autodefrag with compression · e9512d72
      Chris Mason 提交于
      The autodefrag code skips defrag when two extents are adjacent.  But one
      big advantage for autodefrag is cutting down on the number of small
      extents, even when they are adjacent.  This commit changes it to defrag
      all small extents.
      Signed-off-by: NChris Mason <clm@fb.com>
      e9512d72
  11. 21 8月, 2014 2 次提交
    • F
      Btrfs: clone, don't create invalid hole extent map · 62e2390e
      Filipe Manana 提交于
      When cloning a file that consists of an inline extent, we were creating
      an extent map that represents a non-existing trailing hole starting at a
      file offset that isn't a multiple of the sector size. This happened because
      when processing an inline extent we weren't aligning the extent's length to
      the sector size, and therefore incorrectly treating the range
      [inline_extent_length; sector_size[ as a hole.
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NSatoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      62e2390e
    • F
      Btrfs: race free update of commit root for ro snapshots · 9c3b306e
      Filipe Manana 提交于
      This is a better solution for the problem addressed in the following
      commit:
      
          Btrfs: update commit root on snapshot creation after orphan cleanup
          (3821f348)
      
      The previous solution wasn't the best because of 2 reasons:
      
          1) It added another full transaction commit, which is more expensive
             than just swapping the commit root with the root;
      
          2) If a reboot happened after the first transaction commit (the one
             that creates the snapshot) and before the second transaction commit,
             then we would end up with the same problem if a send using that
             snapshot was requested before the first transaction commit after
             the reboot.
      
      This change addresses those 2 issues. The second issue is addressed by
      switching the commit root in the dentry lookup VFS callback, which is
      also called by the snapshot/subvol creation ioctl and performs orphan
      cleanup if needed. Like the vfs, the ioctl locks the parent inode too,
      preventing race issues between a dentry lookup and snapshot creation.
      
      Cc: Alex Lyakas <alex.btrfs@zadarastorage.com>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      9c3b306e
  12. 03 7月, 2014 2 次提交
  13. 14 6月, 2014 1 次提交
  14. 13 6月, 2014 5 次提交
  15. 10 6月, 2014 8 次提交
    • F
      Btrfs: make fsync work after cloning into a file · 7ffbb598
      Filipe Manana 提交于
      When cloning into a file, we were correctly replacing the extent
      items in the target range and removing the extent maps. However
      we weren't replacing the extent maps with new ones that point to
      the new extents - as a consequence, an incremental fsync (when the
      inode doesn't have the full sync flag) was a NOOP, since it relies
      on the existence of extent maps in the modified list of the inode's
      extent map tree, which was empty. Therefore add new extent maps to
      reflect the target clone range.
      
      A test case for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      7ffbb598
    • A
      trivial: fs/btrfs/ioctl.c: fix typo s/substract/subtract/ · 93915584
      Antonio Ospite 提交于
      Signed-off-by: NAntonio Ospite <ao2@ao2.it>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: linux-btrfs@vger.kernel.org
      Signed-off-by: NChris Mason <clm@fb.com>
      93915584
    • F
      Btrfs: fix clone to deal with holes when NO_HOLES feature is enabled · f82a9901
      Filipe Manana 提交于
      If the NO_HOLES feature is enabled holes don't have file extent items in
      the btree that represent them anymore. This made the clone operation
      ignore the gaps that exist between consecutive file extent items and
      therefore not create the holes at the destination. When not using the
      NO_HOLES feature, the holes were created at the destination.
      
      A test case for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      f82a9901
    • G
      btrfs: replace EINVAL with ERANGE for resize when ULLONG_MAX · 902c68a4
      Gui Hecheng 提交于
      To be accurate about the error case,
      if the new size is beyond ULLONG_MAX, return ERANGE instead of EINVAL.
      Signed-off-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      902c68a4
    • F
      Btrfs: update commit root on snapshot creation after orphan cleanup · 3821f348
      Filipe Manana 提交于
      On snapshot creation (either writable or read-only), we do orphan cleanup
      against the root of the snapshot. If the cleanup did remove any orphans,
      then the current root node will be different from the commit root node
      until the next transaction commit happens.
      
      A send operation always uses the commit root of a snapshot - this means
      it will see the orphans if it starts computing the send stream before the
      next transaction commit happens (triggered by a timer or sync() for .e.g),
      which is when the commit root gets assigned a reference to current root,
      where the orphans are not visible anymore. The consequence of send seeing
      the orphans is explained below.
      
      For example:
      
          mkfs.btrfs -f /dev/sdd
          mount -o commit=999 /dev/sdd /mnt
      
          # open a file with O_TMPFILE and leave it open
          # write some data to the file
          btrfs subvolume snapshot -r /mnt /mnt/snap1
      
          btrfs send /mnt/snap1 -f /tmp/send.data
      
      The send operation will fail with the following error:
      
          ERROR: send ioctl failed with -116: Stale file handle
      
      What happens here is that our snapshot has an orphan inode still visible
      through the commit root, that corresponds to the tmpfile. However send
      will attempt to call inode.c:btrfs_iget(), with the goal of reading the
      file's data, which will return -ESTALE because it will use the current
      root (and not the commit root) of the snapshot.
      
      Of course, there are other cases where we can get orphans, but this
      example using a tmpfile makes it much easier to reproduce the issue.
      
      Therefore on snapshot creation, after calling btrfs_orphan_cleanup, if
      the commit root is different from the current root, just commit the
      transaction associated with the snapshot's root (if it exists), so that
      a send will not see any orphans that don't exist anymore. This also
      guarantees a send will always see the same content regardless of whether
      a transaction commit happened already before the send was requested and
      after the orphan cleanup (meaning the commit root and current roots are
      the same) or it hasn't happened yet (commit and current roots are
      different).
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3821f348
    • F
      Btrfs: ioctl, don't re-lock extent range when not necessary · ff5df9b8
      Filipe Manana 提交于
      In ioctl.c:lock_extent_range(), after locking our target range, the
      ordered extent that btrfs_lookup_first_ordered_extent() returns us
      may not overlap our target range at all. In this case we would just
      unlock our target range, wait for any new ordered extents that overlap
      the range to complete, lock again the range and repeat all these steps
      until we don't get any ordered extent and the delalloc flag isn't set
      in the io tree for our target range.
      
      Therefore just stop if we get an ordered extent that doesn't overlap
      our target range and the dealalloc flag isn't set for the range in
      the inode's io tree.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      ff5df9b8
    • F
      Btrfs: avoid visiting all extent items when cloning a range · 2c463823
      Filipe Manana 提交于
      When cloning a range of a file, we were visiting all the extent items in
      the btree that belong to our source inode. We don't need to visit those
      extent items that don't overlap the range we are cloning, as doing so only
      makes us waste time and do unnecessary btree navigations (btrfs_next_leaf)
      for inodes that have a large number of file extent items in the btree.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      2c463823
    • F
      Btrfs: set dead flag on the right root when destroying snapshot · c55bfa67
      Filipe Manana 提交于
      We were setting the BTRFS_ROOT_SUBVOL_DEAD flag on the root of the
      parent of our target snapshot, instead of setting it in the target
      snapshot's root.
      
      This is easy to observe by running the following scenario:
      
          mkfs.btrfs -f /dev/sdd
          mount /dev/sdd /mnt
      
          btrfs subvolume create /mnt/first_subvol
          btrfs subvolume snapshot -r /mnt /mnt/mysnap1
      
          btrfs subvolume delete /mnt/first_subvol
          btrfs subvolume snapshot -r /mnt /mnt/mysnap2
      
          btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data
      
      The send command failed because the send ioctl returned -EPERM.
      A test case for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      c55bfa67