1. 22 Jun, 2017 2 commits
    • btrfs: Check name_len with boundary in verify dir_item · e79a3327
      Committed by Su Yue
      Originally, verify_dir_item checked the name_len of a dir_item only
      against fixed limits, not against the item boundary.
      If a corrupted name_len did not exceed the fixed limit, for example
      255, the function accepted the dir_item as valid. The subsequent read
      beyond the item boundary then caused a crash.
      
      Example:
        1. Corrupt one dir_item name_len to be 255.
        2. Run 'ls -lar /mnt/test/ > /dev/null'
      dmesg:
      [   48.451449] BTRFS info (device vdb1): disk space caching is enabled
      [   48.451453] BTRFS info (device vdb1): has skinny extents
      [   48.489420] general protection fault: 0000 [#1] SMP
      [   48.489571] Modules linked in: ext4 jbd2 mbcache btrfs xor raid6_pq
      [   48.489716] CPU: 1 PID: 2710 Comm: ls Not tainted 4.10.0-rc1 #5
      [   48.489853] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-20170228_101828-anatol 04/01/2014
      [   48.490008] task: ffff880035df1bc0 task.stack: ffffc90004800000
      [   48.490008] RIP: 0010:read_extent_buffer+0xd2/0x190 [btrfs]
      [   48.490008] RSP: 0018:ffffc90004803d98 EFLAGS: 00010202
      [   48.490008] RAX: 000000000000001b RBX: 000000000000001b RCX: 0000000000000000
      [   48.490008] RDX: ffff880079dbf36c RSI: 0005080000000000 RDI: ffff880079dbf368
      [   48.490008] RBP: ffffc90004803dc8 R08: ffff880078e8cc48 R09: ffff880000000000
      [   48.490008] R10: 0000160000000000 R11: 0000000000001000 R12: ffff880079dbf288
      [   48.490008] R13: ffff880078e8ca88 R14: 0000000000000003 R15: ffffc90004803e20
      [   48.490008] FS:  00007fef50c60800(0000) GS:ffff88007d400000(0000) knlGS:0000000000000000
      [   48.490008] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   48.490008] CR2: 000055f335ac2ff8 CR3: 000000007356d000 CR4: 00000000001406e0
      [   48.490008] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   48.490008] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   48.490008] Call Trace:
      [   48.490008]  btrfs_real_readdir+0x3b7/0x4a0 [btrfs]
      [   48.490008]  iterate_dir+0x181/0x1b0
      [   48.490008]  SyS_getdents+0xa7/0x150
      [   48.490008]  ? fillonedir+0x150/0x150
      [   48.490008]  entry_SYSCALL_64_fastpath+0x18/0xad
      [   48.490008] RIP: 0033:0x7fef5032546b
      [   48.490008] RSP: 002b:00007ffeafcdb830 EFLAGS: 00000206 ORIG_RAX: 000000000000004e
      [   48.490008] RAX: ffffffffffffffda RBX: 00007fef5061db38 RCX: 00007fef5032546b
      [   48.490008] RDX: 0000000000008000 RSI: 000055f335abaff0 RDI: 0000000000000003
      [   48.490008] RBP: 00007fef5061dae0 R08: 00007fef5061db48 R09: 0000000000000000
      [   48.490008] R10: 000055f335abafc0 R11: 0000000000000206 R12: 00007fef5061db38
      [   48.490008] R13: 0000000000008040 R14: 00007fef5061db38 R15: 000000000000270e
      [   48.490008] RIP: read_extent_buffer+0xd2/0x190 [btrfs] RSP: ffffc90004803d98
      [   48.499455] ---[ end trace 321920d8e8339505 ]---
      
      Fix it by adding a @slot parameter and checking name_len against the item
      boundary by calling btrfs_is_name_len_valid.
      Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
      rev
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: Introduce btrfs_is_name_len_valid to avoid reading beyond boundary · 19c6dcbf
      Committed by Su Yue
      Introduce the function btrfs_is_name_len_valid.
      
      The function compares the @name_len parameter against the item boundary
      and returns true if name_len is valid.
      Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ s/btrfs_leaf_data/BTRFS_LEAF_DATA_OFFSET/ ]
      Signed-off-by: David Sterba <dsterba@suse.com>
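The boundary check described in the two commits above can be pictured with a small userspace sketch. This is not the kernel's `btrfs_is_name_len_valid`: `struct demo_item`, `demo_name_len_valid`, and the field layout are hypothetical names, chosen only to show a name length being compared against the end of its item instead of against a fixed maximum such as 255.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an item inside a leaf (not a kernel struct). */
struct demo_item {
    uint32_t offset; /* byte offset of the item's data within the leaf */
    uint32_t size;   /* number of bytes that belong to the item */
};

/*
 * Return true only if the name that starts at name_offset and spans
 * name_len bytes lies completely inside the item.
 */
static bool demo_name_len_valid(const struct demo_item *item,
                                uint32_t name_offset, uint32_t name_len)
{
    /* Widen to 64 bits so the additions below cannot wrap around. */
    uint64_t item_end = (uint64_t)item->offset + item->size;
    uint64_t name_end = (uint64_t)name_offset + name_len;

    return name_offset >= item->offset && name_end <= item_end;
}
```

For an item covering bytes 100 to 139, a 10-byte name at offset 130 passes, while a corrupted length of 255 at the same offset is rejected before any read takes place.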
  2. 20 Jun, 2017 11 commits
    • btrfs: Round down values which are written for total_bytes_size · 7dfb8be1
      Committed by Nikolay Borisov
      We got an internal report about a file system not wanting to mount
      following 99e3ecfc ("Btrfs: add more validation checks for
      superblock").
      
      BTRFS error (device sdb1): super_total_bytes 1000203816960 mismatch with
      fs_devices total_rw_bytes 1000203820544
      
      Subtracting the numbers we get a difference of less than 4 KiB. Upon
      closer inspection it became apparent that mkfs actually rounds down the
      size of the device to a multiple of sector size. However, the same
      cannot be said for various functions which modify the total size and are
      called from btrfs_balance as well as when adding a new device. So this
      patch ensures that values being saved into on-disk data structures are
      always rounded down to a multiple of sectorsize.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
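The mismatch in the error message above can be reproduced with plain arithmetic. The sketch below assumes a sector size of 4096 bytes, which the commit message does not state explicitly; `demo_round_down` and `demo_round_down_loss` are hypothetical helpers, not kernel functions.

```c
#include <stdint.h>

/* Round value down to a multiple of align; align must be a power of two. */
static uint64_t demo_round_down(uint64_t value, uint64_t align)
{
    return value & ~(align - 1);
}

/* Number of bytes discarded by rounding value down to align. */
static uint64_t demo_round_down_loss(uint64_t value, uint64_t align)
{
    return value - demo_round_down(value, align);
}
```

Under that assumption, rounding total_rw_bytes 1000203820544 down to a multiple of 4096 gives 1000203816960, exactly the super_total_bytes value in the message, and the 3584 bytes that are dropped account for the reported difference.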
    • btrfs: Manually implement device_total_bytes getter/setter · eca152ed
      Committed by Nikolay Borisov
      The device->total_bytes member needs to always be rounded down to sectorsize
      so that it corresponds to the value of super->total_bytes. However, there are
      multiple places where the setter is fed a value which is not rounded which
      can cause a fs to be unmountable due to the check introduced in
      99e3ecfc ("Btrfs: add more validation checks for superblock"). This patch
      implements the getter/setter manually so that in a later patch I can add
      necessary code to catch offenders.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: obsolete and remove mount option alloc_start · 0d0c71b3
      Committed by David Sterba
      The mount option alloc_start was used in the past for debugging and
      stressing the chunk allocator. It was never meant to be used by users,
      so we're not breaking anybody's setup.
      
      There was some added complexity in handling changes of the value and the
      case when it was not the same as the default. Such code has likely been
      untested and I think it's better to remove it.
      
      This patch kills all use of alloc_start, and by doing that also fixes
      a bug when alloc_size is set, potentially called from statfs:
      
      in btrfs_calc_avail_data_space, traversing the list in RCU, the RCU
      protection is temporarily dropped so btrfs_account_dev_extents_size can
      be called and then RCU is locked again! Doing that inside
      list_for_each_entry_rcu is just asking for trouble, but unlikely to be
      observed in practice.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: move fs_info::fs_frozen to the flags · fac03c8d
      Committed by David Sterba
      We can keep the state among the other fs_info flags, there's no reason
      why fs_frozen would need to be separate.
      Reviewed-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: use generic slab for btrfs_transaction · 4b5faeac
      Committed by David Sterba
      Observing the number of slab objects of btrfs_transaction, there's just
      one active on an almost quiescent filesystem, and the number of objects
      goes to about ten when sync is in progress. Then the number goes down to
      1.  This matches the expectations of the transaction lifetime.
      
      For such use the separate slab cache is not justified, as we do not
      reuse objects frequently. For the short-lived transaction, the generic
      slab (size 512) should be ok. We can optimistically expect that the 512
      slabs are not all used (fragmentation) and there are free slots to take
      when we do the allocation, compared to potentially allocating a whole new
      page for the separate slab.
      
      We'll lose the stats about the object use, which could be added later if
      we really need them.
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: remove __BTRFS_LEAF_DATA_SIZE · 118c701e
      Committed by Nikolay Borisov
      __BTRFS_LEAF_DATA_SIZE is used only by BTRFS_LEAF_DATA_SIZE. Make the
      latter subsume the former.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: rename btrfs_leaf_data to BTRFS_LEAF_DATA_OFFSET · 3d9ec8c4
      Committed by Nikolay Borisov
      Commit 5f39d397 ("Btrfs: Create extent_buffer interface
      for large blocksizes") refactored the btrfs_leaf_data function to take an
      extent_buffer rather than a struct btrfs_leaf. However, as it turns out, the
      parameter being passed is never used. Furthermore, this function no longer
      returns the leaf data but rather the offset to it. So rename the function
      to BTRFS_LEAF_DATA_OFFSET to make it consistent with other BTRFS_LEAF_*
      helpers and turn it into a macro.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      [ removed () from the macro ]
      Signed-off-by: David Sterba <dsterba@suse.com>
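The reasoning in the rename commit above (a helper that ignores its argument and returns a fixed offset is better expressed as a macro) can be sketched as follows. The struct layouts and the name `DEMO_LEAF_DATA_OFFSET` are hypothetical simplifications for illustration; they are not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified layouts; the real on-disk structs differ. */
struct demo_header {
    uint64_t bytenr;
    uint64_t generation;
    uint32_t nritems;
    uint32_t level;
};

struct demo_leaf_item {
    uint64_t key;
    uint32_t offset;
    uint32_t size;
};

struct demo_leaf {
    struct demo_header header;
    struct demo_leaf_item items[]; /* item array follows the header */
};

/* The offset is a compile-time constant, so no argument is needed. */
#define DEMO_LEAF_DATA_OFFSET offsetof(struct demo_leaf, items)
```

Because the value depends only on the type layout, callers no longer have to pass a buffer that was never inspected in the first place.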
    • btrfs: cleanup root usage by btrfs_get_alloc_profile · 1b86826d
      Committed by Jeff Mahoney
      There are two places where we don't already know what kind of alloc
      profile we need before calling btrfs_get_alloc_profile, but we need
      access to a root everywhere we call it.
      
      This patch adds helpers for btrfs_{data,metadata,system}_alloc_profile()
      and relegates btrfs_system_alloc_profile to a static for use in those
      two cases.  The next patch will eliminate one of those.
      Signed-off-by: Jeff Mahoney <jeffm@suse.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: replace tree->mapping with tree->private_data · c6100a4b
      Committed by Josef Bacik
      For extent_io trees we have carried the address_mapping of the inode
      around in the io tree in order to pull the inode back out for calling
      into various tree ops hooks.  This works fine when everything that has
      an extent_io_tree has an inode.  But we are going to remove the
      btree_inode, so we need to change this.  Instead just have a generic
      void * for private data that we can initialize with, and have all the
      tree ops use that instead.  This had a lot of cascading changes but
      should be relatively straightforward.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Reviewed-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor reordering of the callback prototypes ]
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: add quota override flag to enable quota override for CAP_SYS_RESOURCE · f29efe29
      Committed by Sargun Dhillon
      This patch introduces the quota override flag to btrfs_fs_info, and a
      change to quota limit checking code to temporarily allow for quota to be
      overridden for processes with CAP_SYS_RESOURCE.
      
      It's useful for administrative programs, such as log rotation, that may
      need to temporarily use more disk space in order to free up a greater
      amount of overall disk space without yielding more disk space to the
      rest of userland.
      
      Eventually, we may want to add the idea of an operator-specific quota,
      operator reserved space, or something else to allow for administrative
      override, but this is perhaps the simplest solution.
      Signed-off-by: Sargun Dhillon <sargun@sargun.me>
      Reviewed-by: David Sterba <dsterba@suse.com>
      [ minor changelog edits ]
      Signed-off-by: David Sterba <dsterba@suse.com>
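The override rule from the quota commit above can be modelled in a few lines. This is a sketch under stated assumptions, not the kernel's limit-checking code: `struct demo_qgroup`, `demo_may_reserve`, and the two boolean parameters are hypothetical, and the capability check is reduced to a flag supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a quota group; not the kernel's qgroup struct. */
struct demo_qgroup {
    uint64_t limit;    /* maximum number of bytes allowed */
    uint64_t reserved; /* bytes already accounted against the limit */
};

/*
 * Decide whether a reservation of `bytes` may proceed.
 * override_enabled: the filesystem-wide override flag is set.
 * has_sys_resource: the calling process holds CAP_SYS_RESOURCE.
 */
static bool demo_may_reserve(const struct demo_qgroup *qg, uint64_t bytes,
                             bool override_enabled, bool has_sys_resource)
{
    /* The limit is skipped only when both conditions hold. */
    if (override_enabled && has_sys_resource)
        return true;

    /* Written as a subtraction so the comparison cannot overflow. */
    return qg->reserved <= qg->limit && bytes <= qg->limit - qg->reserved;
}
```

In this model a log rotation tool running with the capability can exceed the limit temporarily while the flag is set, and every other process stays bound by it.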
    • btrfs: Convert fs_info->free_chunk_space to atomic64_t · a5ed45f8
      Committed by Nikolay Borisov
      The ->free_chunk_space variable is used to track the unallocated space
      and access to it is protected by a spinlock, which is not used for
      anything else.  Make the code a bit more self-explanatory by switching the
      variable to an atomic64_t type and kill the spinlock.
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      [ not a performance critical code, use of atomic type is ok ]
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
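The idea behind the conversion above, a counter whose only protection was a dedicated lock becoming an atomic variable, can be illustrated in userspace. The sketch uses C11 `<stdatomic.h>` as a stand-in for the kernel's atomic64_t API, which is a different interface; `struct demo_fs_info` and the helper names are hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of fs_info. */
struct demo_fs_info {
    _Atomic uint64_t free_chunk_space; /* no separate lock required */
};

static void demo_init(struct demo_fs_info *info)
{
    atomic_init(&info->free_chunk_space, 0);
}

/* Each update is a single atomic read-modify-write operation. */
static void demo_add_free_space(struct demo_fs_info *info, uint64_t bytes)
{
    atomic_fetch_add(&info->free_chunk_space, bytes);
}

static void demo_sub_free_space(struct demo_fs_info *info, uint64_t bytes)
{
    atomic_fetch_sub(&info->free_chunk_space, bytes);
}

static uint64_t demo_read_free_space(struct demo_fs_info *info)
{
    return atomic_load(&info->free_chunk_space);
}
```

With the counter self-contained, the lock and every lock/unlock pair around it can be dropped, which is the simplification the commit describes.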
  3. 10 Jun, 2017 1 commit
    • Btrfs: fix delalloc accounting leak caused by u32 overflow · 70e7af24
      Committed by Omar Sandoval
      btrfs_calc_trans_metadata_size() does an unsigned 32-bit multiplication,
      which can overflow if num_items >= 4 GB / (nodesize * BTRFS_MAX_LEVEL * 2).
      For a nodesize of 16kB, this overflow happens at 16k items. Usually,
      num_items is a small constant passed to btrfs_start_transaction(), but
      we also use btrfs_calc_trans_metadata_size() for metadata reservations
      for extent items in btrfs_delalloc_{reserve,release}_metadata().
      
      In drop_outstanding_extents(), num_items is calculated as
      inode->reserved_extents - inode->outstanding_extents. The difference
      between these two counters is usually small, but if many delalloc
      extents are reserved and then the outstanding extents are merged in
      btrfs_merge_extent_hook(), the difference can become large enough to
      overflow in btrfs_calc_trans_metadata_size().
      
      The overflow manifests itself as a leak of a multiple of 4 GB in
      delalloc_block_rsv and the metadata bytes_may_use counter. This in turn
      can cause early ENOSPC errors. Additionally, these WARN_ONs in
      extent-tree.c will be hit when unmounting:
      
          WARN_ON(fs_info->delalloc_block_rsv.size > 0);
          WARN_ON(fs_info->delalloc_block_rsv.reserved > 0);
          WARN_ON(space_info->bytes_pinned > 0 ||
                  space_info->bytes_reserved > 0 ||
                  space_info->bytes_may_use > 0);
      
      Fix it by casting nodesize to a u64 so that
      btrfs_calc_trans_metadata_size() does a full 64-bit multiplication.
      While we're here, do the same in btrfs_calc_trunc_metadata_size(); this
      can't overflow with any existing uses, but it's better to be safe here
      than have another hard-to-debug problem later on.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: Chris Mason <clm@fb.com>
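The overflow threshold quoted in the commit above can be checked with a short userspace sketch. `DEMO_MAX_LEVEL` is set to 8 on the assumption that this matches BTRFS_MAX_LEVEL, which is the value that makes the commit's own arithmetic work out (a 16 kB nodesize overflowing at 16k items). The function names are hypothetical, not the kernel's.

```c
#include <stdint.h>

#define DEMO_MAX_LEVEL 8u /* assumption: same value as BTRFS_MAX_LEVEL */

/* Every operand is 32 bits wide, so the product wraps modulo 2^32. */
static uint64_t demo_size_u32(uint32_t nodesize, uint32_t num_items)
{
    uint32_t bytes = nodesize * DEMO_MAX_LEVEL * 2u * num_items;

    return bytes;
}

/* Casting the first operand widens the whole multiplication to 64 bits. */
static uint64_t demo_size_u64(uint32_t nodesize, uint32_t num_items)
{
    return (uint64_t)nodesize * DEMO_MAX_LEVEL * 2u * num_items;
}
```

With a nodesize of 16384 and 16384 items, the 32-bit version returns 0 while the 64-bit version returns 4294967296, so exactly 4 GB of reservation goes unaccounted. That matches the "multiple of 4 GB" leak described in the commit message.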
  4. 21 Apr, 2017 1 commit
  5. 18 Apr, 2017 6 commits
  6. 29 Mar, 2017 1 commit
    • btrfs: Change qgroup_meta_rsv to 64bit · ce0dcee6
      Committed by Goldwyn Rodrigues
      Using an int value is causing qg->reserved to become negative and
      exclusive -EDQUOT to be reached prematurely.
      
      This affects exclusive qgroups only.
      
      TEST CASE:
      
      DEVICE=/dev/vdb
      MOUNTPOINT=/mnt
      SUBVOL=$MOUNTPOINT/tmp
      
      umount $SUBVOL
      umount $MOUNTPOINT
      
      mkfs.btrfs -f $DEVICE
      mount /dev/vdb $MOUNTPOINT
      btrfs quota enable $MOUNTPOINT
      btrfs subvol create $SUBVOL
      umount $MOUNTPOINT
      mount /dev/vdb $MOUNTPOINT
      mount -o subvol=tmp $DEVICE $SUBVOL
      btrfs qgroup limit -e 3G $SUBVOL
      
      btrfs quota rescan /mnt -w
      
      for i in `seq 1 44000`; do
        dd if=/dev/zero of=/mnt/tmp/test_$i bs=10k count=1
        if [[ $? -ne 0 ]]; then
           btrfs qgroup show -pcref $SUBVOL
           exit 1
        fi
      done
      Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
      [ add reproducer to changelog ]
      Signed-off-by: David Sterba <dsterba@suse.com>
  7. 02 Mar, 2017 1 commit
  8. 28 Feb, 2017 14 commits
  9. 25 Feb, 2017 1 commit
  10. 17 Feb, 2017 2 commits