1. 22 1月, 2018 1 次提交
  2. 22 6月, 2017 1 次提交
    • S
      btrfs: Check name_len with boundary in verify dir_item · e79a3327
      Su Yue 提交于
      Originally, verify_dir_item verifies name_len of dir_item with fixed
      values but not item boundary.
      If corrupted name_len was not bigger than the fixed value, for example
      255, the function will think the dir_item is fine. And then reading
      beyond boundary will cause crash.
      
      Example:
      	1. Corrupt one dir_item name_len to be 255.
              2. Run 'ls -lar /mnt/test/ > /dev/null'
      dmesg:
      [   48.451449] BTRFS info (device vdb1): disk space caching is enabled
      [   48.451453] BTRFS info (device vdb1): has skinny extents
      [   48.489420] general protection fault: 0000 [#1] SMP
      [   48.489571] Modules linked in: ext4 jbd2 mbcache btrfs xor raid6_pq
      [   48.489716] CPU: 1 PID: 2710 Comm: ls Not tainted 4.10.0-rc1 #5
      [   48.489853] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
      [   48.490008] task: ffff880035df1bc0 task.stack: ffffc90004800000
      [   48.490008] RIP: 0010:read_extent_buffer+0xd2/0x190 [btrfs]
      [   48.490008] RSP: 0018:ffffc90004803d98 EFLAGS: 00010202
      [   48.490008] RAX: 000000000000001b RBX: 000000000000001b RCX: 0000000000000000
      [   48.490008] RDX: ffff880079dbf36c RSI: 0005080000000000 RDI: ffff880079dbf368
      [   48.490008] RBP: ffffc90004803dc8 R08: ffff880078e8cc48 R09: ffff880000000000
      [   48.490008] R10: 0000160000000000 R11: 0000000000001000 R12: ffff880079dbf288
      [   48.490008] R13: ffff880078e8ca88 R14: 0000000000000003 R15: ffffc90004803e20
      [   48.490008] FS:  00007fef50c60800(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
      [   48.490008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   48.490008] CR2: 000055f335ac2ff8 CR3: 000000007356d000 CR4: 00000000001406e0
      [   48.490008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   48.490008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   48.490008] Call Trace:
      [   48.490008]  btrfs_real_readdir+0x3b7/0x4a0 [btrfs]
      [   48.490008]  iterate_dir+0x181/0x1b0
      [   48.490008]  SyS_getdents+0xa7/0x150
      [   48.490008]  ? fillonedir+0x150/0x150
      [   48.490008]  entry_SYSCALL_64_fastpath+0x18/0xad
      [   48.490008] RIP: 0033:0x7fef5032546b
      [   48.490008] RSP: 002b:00007ffeafcdb830 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
      [   48.490008] RAX: ffffffffffffffda RBX: 00007fef5061db38 RCX: 00007fef5032546b
      [   48.490008] RDX: 0000000000008000 RSI: 000055f335abaff0 RDI: 0000000000000003
      [   48.490008] RBP: 00007fef5061dae0 R08: 00007fef5061db48 R09: 0000000000000000
      [   48.490008] R10: 000055f335abafc0 R11: 0000000000000206 R12: 00007fef5061db38
      [   48.490008] R13: 0000000000008040 R14: 00007fef5061db38 R15: 000000000000270e
      [   48.490008] RIP: read_extent_buffer+0xd2/0x190 [btrfs] RSP: ffffc90004803d98
      [   48.499455] ---[ end trace 321920d8e8339505 ]---
      
      Fix it by adding a parameter @slot and check name_len with item boundary
      by calling btrfs_is_name_len_valid.
      Signed-off-by: NSu Yue <suy.fnst@cn.fujitsu.com>
      rev
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e79a3327
  3. 14 2月, 2017 2 次提交
  4. 06 12月, 2016 3 次提交
  5. 28 9月, 2016 1 次提交
  6. 28 5月, 2016 1 次提交
  7. 18 5月, 2016 1 次提交
  8. 11 4月, 2016 1 次提交
  9. 02 3月, 2016 1 次提交
    • F
      Btrfs: fix listxattrs not listing all xattrs packed in the same item · daac7ba6
      Filipe Manana 提交于
      In the listxattrs handler, we were not listing all the xattrs that are
      packed in the same btree item, which happens when multiple xattrs have
      a name that when crc32c hashed produce the same checksum value.
      
      Fix this by processing them all.
      
      The following test case for xfstests reproduces the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            cd /
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
        . ./common/attr
      
        # real QA test starts here
        _supported_fs generic
        _supported_os Linux
        _require_scratch
        _require_attrs
      
        rm -f $seqres.full
      
        _scratch_mkfs >>$seqres.full 2>&1
        _scratch_mount
      
        # Create our test file with a few xattrs. The first 3 xattrs have a name
        # that when given as input to a crc32c function result in the same checksum.
        # This made btrfs list only one of the xattrs through listxattrs system call
        # (because it packs xattrs with the same name checksum into the same btree
        # item).
        touch $SCRATCH_MNT/testfile
        $SETFATTR_PROG -n user.foobar -v 123 $SCRATCH_MNT/testfile
        $SETFATTR_PROG -n user.WvG1c1Td -v qwerty $SCRATCH_MNT/testfile
        $SETFATTR_PROG -n user.J3__T_Km3dVsW_ -v hello $SCRATCH_MNT/testfile
        $SETFATTR_PROG -n user.something -v pizza $SCRATCH_MNT/testfile
        $SETFATTR_PROG -n user.ping -v pong $SCRATCH_MNT/testfile
      
        # Now call getfattr with --dump, which calls the listxattrs system call.
        # It should list all the xattrs we have set before.
        $GETFATTR_PROG --absolute-names --dump $SCRATCH_MNT/testfile | _filter_scratch
      
        status=0
        exit
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      daac7ba6
  10. 18 2月, 2016 1 次提交
  11. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  12. 07 1月, 2016 1 次提交
  13. 07 12月, 2015 1 次提交
  14. 03 12月, 2015 1 次提交
  15. 10 11月, 2015 1 次提交
    • F
      Btrfs: fix race when listing an inode's xattrs · f1cd1f0b
      Filipe Manana 提交于
      When listing a inode's xattrs we have a time window where we race against
      a concurrent operation for adding a new hard link for our inode that makes
      us not return any xattr to user space. In order for this to happen, the
      first xattr of our inode needs to be at slot 0 of a leaf and the previous
      leaf must still have room for an inode ref (or extref) item, and this can
      happen because an inode's listxattrs callback does not lock the inode's
      i_mutex (nor does the VFS does it for us), but adding a hard link to an
      inode makes the VFS lock the inode's i_mutex before calling the inode's
      link callback.
      
      If we have the following leafs:
      
                     Leaf X (has N items)                    Leaf Y
      
       [ ... (257 INODE_ITEM 0) (257 INODE_REF 256) ]  [ (257 XATTR_ITEM 12345), ... ]
                 slot N - 2         slot N - 1              slot 0
      
      The race illustrated by the following sequence diagram is possible:
      
             CPU 1                                               CPU 2
      
        btrfs_listxattr()
      
          searches for key (257 XATTR_ITEM 0)
      
          gets path with path->nodes[0] == leaf X
          and path->slots[0] == N
      
          because path->slots[0] is >=
          btrfs_header_nritems(leaf X), it calls
          btrfs_next_leaf()
      
          btrfs_next_leaf()
            releases the path
      
                                                         adds key (257 INODE_REF 666)
                                                         to the end of leaf X (slot N),
                                                         and leaf X now has N + 1 items
      
            searches for the key (257 INODE_REF 256),
            with path->keep_locks == 1, because that
            is the last key it saw in leaf X before
            releasing the path
      
            ends up at leaf X again and it verifies
            that the key (257 INODE_REF 256) is no
            longer the last key in leaf X, so it
            returns with path->nodes[0] == leaf X
            and path->slots[0] == N, pointing to
            the new item with key (257 INODE_REF 666)
      
          btrfs_listxattr's loop iteration sees that
          the type of the key pointed by the path is
          different from the type BTRFS_XATTR_ITEM_KEY
          and so it breaks the loop and stops looking
          for more xattr items
            --> the application doesn't get any xattr
                listed for our inode
      
      So fix this by breaking the loop only if the key's type is greater than
      BTRFS_XATTR_ITEM_KEY and skip the current key if its type is smaller.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      f1cd1f0b
  16. 16 4月, 2015 1 次提交
  17. 27 3月, 2015 1 次提交
  18. 03 3月, 2015 1 次提交
  19. 21 11月, 2014 1 次提交
    • F
      Btrfs: make xattr replace operations atomic · 5f5bc6b1
      Filipe Manana 提交于
      Replacing a xattr consists of doing a lookup for its existing value, delete
      the current value from the respective leaf, release the search path and then
      finally insert the new value. This leaves a time window where readers (getxattr,
      listxattrs) won't see any value for the xattr. Xattrs are used to store ACLs,
      so this has security implications.
      
      This change also fixes 2 other existing issues which were:
      
      *) Deleting the old xattr value without verifying first if the new xattr will
         fit in the existing leaf item (in case multiple xattrs are packed in the
         same item due to name hash collision);
      
      *) Returning -EEXIST when the flag XATTR_CREATE is given and the xattr doesn't
         exist but we have have an existing item that packs muliple xattrs with
         the same name hash as the input xattr. In this case we should return ENOSPC.
      
      A test case for xfstests follows soon.
      
      Thanks to Alexandre Oliva for reporting the non-atomicity of the xattr replace
      implementation.
      Reported-by: NAlexandre Oliva <oliva@gnu.org>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5f5bc6b1
  20. 18 9月, 2014 1 次提交
  21. 29 1月, 2014 1 次提交
    • F
      Btrfs: add support for inode properties · 63541927
      Filipe David Borba Manana 提交于
      This change adds infrastructure to allow for generic properties for
      inodes. Properties are name/value pairs that can be associated with
      inodes for different purposes. They are stored as xattrs with the
      prefix "btrfs."
      
      Properties can be inherited - this means when a directory inode has
      inheritable properties set, these are added to new inodes created
      under that directory. Further, subvolumes can also have properties
      associated with them, and they can be inherited from their parent
      subvolume. Naturally, directory properties have priority over subvolume
      properties (in practice a subvolume property is just a regular
      property associated with the root inode, objectid 256, of the
      subvolume's fs tree).
      
      This change also adds one specific property implementation, named
      "compression", whose values can be "lzo" or "zlib" and it's an
      inheritable property.
      
      The corresponding changes to btrfs-progs were also implemented.
      A patch with xfstests for this feature will follow once there's
      agreement on this change/feature.
      
      Further, the script at the bottom of this commit message was used to
      do some benchmarks to measure any performance penalties of this feature.
      
      Basically the tests correspond to:
      
      Test 1 - create a filesystem and mount it with compress-force=lzo,
      then sequentially create N files of 64Kb each, measure how long it took
      to create the files, unmount the filesystem, mount the filesystem and
      perform an 'ls -lha' against the test directory holding the N files, and
      report the time the command took.
      
      Test 2 - create a filesystem and don't use any compression option when
      mounting it - instead set the compression property of the subvolume's
      root to 'lzo'. Then create N files of 64Kb, and report the time it took.
      The unmount the filesystem, mount it again and perform an 'ls -lha' like
      in the former test. This means every single file ends up with a property
      (xattr) associated to it.
      
      Test 3 - same as test 2, but uses 4 properties - 3 are duplicates of the
      compression property, have no real effect other than adding more work
      when inheriting properties and taking more btree leaf space.
      
      Test 4 - same as test 3 but with 10 properties per file.
      
      Results (in seconds, and averages of 5 runs each), for different N
      numbers of files follow.
      
      * Without properties (test 1)
      
                          file creation time        ls -lha time
      10 000 files              3.49                   0.76
      100 000 files            47.19                   8.37
      1 000 000 files         518.51                 107.06
      
      * With 1 property (compression property set to lzo - test 2)
      
                          file creation time        ls -lha time
      10 000 files              3.63                    0.93
      100 000 files            48.56                    9.74
      1 000 000 files         537.72                  125.11
      
      * With 4 properties (test 3)
      
                          file creation time        ls -lha time
      10 000 files              3.94                    1.20
      100 000 files            52.14                   11.48
      1 000 000 files         572.70                  142.13
      
      * With 10 properties (test 4)
      
                          file creation time        ls -lha time
      10 000 files              4.61                    1.35
      100 000 files            58.86                   13.83
      1 000 000 files         656.01                  177.61
      
      The increased latencies with properties are essencialy because of:
      
      *) When creating an inode, we now synchronously write 1 more item
         (an xattr item) for each property inherited from the parent dir
         (or subvolume). This could be done in an asynchronous way such
         as we do for dir intex items (delayed-inode.c), which could help
         reduce the file creation latency;
      
      *) With properties, we now have larger fs trees. For this particular
         test each xattr item uses 75 bytes of leaf space in the fs tree.
         This could be less by using a new item for xattr items, instead of
         the current btrfs_dir_item, since we could cut the 'location' and
         'type' fields (saving 18 bytes) and maybe 'transid' too (saving a
         total of 26 bytes per xattr item) from the btrfs_dir_item type.
      
      Also tried batching the xattr insertions (ignoring proper hash
      collision handling, since it didn't exist) when creating files that
      inherit properties from their parent inode/subvolume, but the end
      results were (surprisingly) essentially the same.
      
      Test script:
      
      $ cat test.pl
        #!/usr/bin/perl -w
      
        use strict;
        use Time::HiRes qw(time);
        use constant NUM_FILES => 10_000;
        use constant FILE_SIZES => (64 * 1024);
        use constant DEV => '/dev/sdb4';
        use constant MNT_POINT => '/home/fdmanana/btrfs-tests/dev';
        use constant TEST_DIR => (MNT_POINT . '/testdir');
      
        system("mkfs.btrfs", "-l", "16384", "-f", DEV) == 0 or die "mkfs.btrfs failed!";
      
        # following line for testing without properties
        #system("mount", "-o", "compress-force=lzo", DEV, MNT_POINT) == 0 or die "mount failed!";
      
        # following 2 lines for testing with properties
        system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
        system("btrfs", "prop", "set", MNT_POINT, "compression", "lzo") == 0 or die "set prop failed!";
      
        system("mkdir", TEST_DIR) == 0 or die "mkdir failed!";
        my ($t1, $t2);
      
        $t1 = time();
        for (my $i = 1; $i <= NUM_FILES; $i++) {
            my $p = TEST_DIR . '/file_' . $i;
            open(my $f, '>', $p) or die "Error opening file!";
            $f->autoflush(1);
            for (my $j = 0; $j < FILE_SIZES; $j += 4096) {
                print $f ('A' x 4096) or die "Error writing to file!";
            }
            close($f);
        }
        $t2 = time();
        print "Time to create " . NUM_FILES . ": " . ($t2 - $t1) . " seconds.\n";
        system("umount", DEV) == 0 or die "umount failed!";
        system("mount", DEV, MNT_POINT) == 0 or die "mount failed!";
      
        $t1 = time();
        system("bash -c 'ls -lha " . TEST_DIR . " > /dev/null'") == 0 or die "ls failed!";
        $t2 = time();
        print "Time to ls -lha all files: " . ($t2 - $t1) . " seconds.\n";
        system("umount", DEV) == 0 or die "umount failed!";
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      63541927
  22. 26 1月, 2014 1 次提交
  23. 07 5月, 2013 1 次提交
    • E
      btrfs: make static code static & remove dead code · 48a3b636
      Eric Sandeen 提交于
      Big patch, but all it does is add statics to functions which
      are in fact static, then remove the associated dead-code fallout.
      
      removed functions:
      
      btrfs_iref_to_path()
      __btrfs_lookup_delayed_deletion_item()
      __btrfs_search_delayed_insertion_item()
      __btrfs_search_delayed_deletion_item()
      find_eb_for_page()
      btrfs_find_block_group()
      range_straddles_pages()
      extent_range_uptodate()
      btrfs_file_extent_length()
      btrfs_scrub_cancel_devid()
      btrfs_start_transaction_lflush()
      
      btrfs_print_tree() is left because it is used for debugging.
      btrfs_start_transaction_lflush() and btrfs_reada_detach() are
      left for symmetry.
      
      ulist.c functions are left, another patch will take care of those.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      48a3b636
  24. 17 12月, 2012 3 次提交
  25. 30 5月, 2012 1 次提交
    • J
      Btrfs: use i_version instead of our own sequence · 0c4d2d95
      Josef Bacik 提交于
      We've been keeping around the inode sequence number in hopes that somebody
      would use it, but nobody uses it and people actually use i_version which
      serves the same purpose, so use i_version where we used the incore inode's
      sequence number and that way the sequence is updated properly across the
      board, and not just in file write.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      0c4d2d95
  26. 17 1月, 2012 1 次提交
    • J
      Btrfs: do not use btrfs_end_transaction_throttle everywhere · 7ad85bb7
      Josef Bacik 提交于
      A user reported a problem where things like open with O_CREAT would take up to
      30 seconds when he had nfs activity on the same mount.  This is because all of
      our quick metadata operations, like create, symlink etc all do
      btrfs_end_transaction_throttle, which if the transaction is blocked will wait
      for the commit to complete before it returns.  This adds a ridiculous amount of
      latency and isn't really needed.  The normal btrfs_end_transaction will mark the
      transaction as blocked and wake the transaction kthread up if it thinks the
      transaction needs to end (this being in the running out of global reserve space
      scenario), and this is all that is really needed since we've already done
      everything we're going to do, we just need to return.  This should help people
      with the latency they were seeing when using synchronous heavy workloads.
      Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      7ad85bb7
  27. 20 10月, 2011 1 次提交
    • J
      Btrfs: fix regression in re-setting a large xattr · ed3ee9f4
      Josef Bacik 提交于
      Recently I changed the xattr stuff to unconditionally set the xattr first in
      case the xattr didn't exist yet.  This has introduced a regression when setting
      an xattr that already exists with a large value.  If we find the key we are
      looking for split_leaf will assume that we're extending that item.  The problem
      is the size we pass down to btrfs_search_slot includes the size of the item
      already, so if we have the largest xattr we can possibly have plus the size of
      the xattr item plus the xattr item that btrfs_search_slot we'd overflow the
      leaf.  Thankfully this is not what we're doing, but split_leaf doesn't know this
      so it just returns EOVERFLOW.  So in the xattr code we need to check and see if
      we got back EOVERFLOW and treat it like EEXIST since that's really what
      happened.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      ed3ee9f4
  28. 11 9月, 2011 1 次提交
  29. 19 7月, 2011 1 次提交
    • M
      security: new security_inode_init_security API adds function callback · 9d8f13ba
      Mimi Zohar 提交于
      This patch changes the security_inode_init_security API by adding a
      filesystem specific callback to write security extended attributes.
      This change is in preparation for supporting the initialization of
      multiple LSM xattrs and the EVM xattr.  Initially the callback function
      walks an array of xattrs, writing each xattr separately, but could be
      optimized to write multiple xattrs at once.
      
      For existing security_inode_init_security() calls, which have not yet
      been converted to use the new callback function, such as those in
      reiserfs and ocfs2, this patch defines security_old_inode_init_security().
      Signed-off-by: NMimi Zohar <zohar@us.ibm.com>
      9d8f13ba
  30. 11 7月, 2011 1 次提交
    • J
      Btrfs: try to only do one btrfs_search_slot in do_setxattr · fa09200b
      Josef Bacik 提交于
      I've been watching how many btrfs_search_slot()'s we do and I noticed that when
      we create a file with selinux enabled we were doing 2 each time we initialize
      the security context.  That's because we lookup the xattr first so we can delete
      it if we're setting a new value to an existing xattr.  But in the create case we
      don't have any xattrs, so it is completely useless to have the extra lookup.  So
      re-arrange things so that we only lookup first if we specifically have
      XATTR_REPLACE.  That way in the basic case we only do 1 search, and in the more
      complicated case we do the normal 2 lookups.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      fa09200b
  31. 24 5月, 2011 1 次提交
    • J
      Btrfs: kill BTRFS_I(inode)->block_group · d82a6f1d
      Josef Bacik 提交于
      Originally this was going to be used as a way to give hints to the allocator,
      but frankly we can get much better hints elsewhere and it's not even used at all
      for anything usefull.  In addition to be completely useless, when we initialize
      an inode we try and find a freeish block group to set as the inodes block group,
      and with a completely full 40gb fs this takes _forever_, so I imagine with say
      1tb fs this is just unbearable.  So just axe the thing altoghether, we don't
      need it and it saves us 8 bytes in the inode and saves us 500 microseconds per
      inode lookup in my testcase.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      d82a6f1d
  32. 02 5月, 2011 1 次提交
  33. 25 4月, 2011 1 次提交
    • L
      Btrfs: Always use 64bit inode number · 33345d01
      Li Zefan 提交于
      There's a potential problem in 32bit system when we exhaust 32bit inode
      numbers and start to allocate big inode numbers, because btrfs uses
      inode->i_ino in many places.
      
      So here we always use BTRFS_I(inode)->location.objectid, which is an
      u64 variable.
      
      There are 2 exceptions that BTRFS_I(inode)->location.objectid !=
      inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2),
      and inode->i_ino will be used in those cases.
      
      Another reason to make this change is I'm going to use a special inode
      to save free ino cache, and the inode number must be > (u64)-256.
      Signed-off-by: NLi Zefan <lizf@cn.fujitsu.com>
      33345d01
  34. 13 4月, 2011 1 次提交
  35. 18 3月, 2011 1 次提交