1. 14 Feb 2017, 1 commit
  2. 11 Feb 2017, 1 commit
    • Btrfs: fix btrfs_decompress_buf2page() · 6e78b3f7
      Omar Sandoval committed
      If btrfs_decompress_buf2page() is handed a bio with its page in the
      middle of the working buffer, then we adjust the offset into the working
      buffer. After we copy into the bio, we advance the iterator by the
      number of bytes we copied. Then, we have some logic to handle the case
      of discontiguous pages and adjust the offset into the working buffer
      again. However, if we didn't advance the bio to a new page, we may enter
      this case in error, essentially repeating the adjustment that we already
      made when we entered the function. The end result is bogus data in the
      bio.
      
      Previously, we only checked for this case when we advanced to a new
      page, but the conversion to bio iterators changed that. This restores
      the old, correct behavior.
      
      A case I saw when testing with zlib was:
      
          buf_start = 42769
          total_out = 46865
          working_bytes = total_out - buf_start = 4096
          start_byte = 45056
      
      The condition (total_out > start_byte && buf_start < start_byte) is
      true, so we adjust the offset:
      
          buf_offset = start_byte - buf_start = 2287
          working_bytes -= buf_offset = 1809
          current_buf_start = buf_start = 42769
      
      Then, we copy
      
          bytes = min(bvec.bv_len, PAGE_SIZE - buf_offset, working_bytes) = 1809
          buf_offset += bytes = 4096
          working_bytes -= bytes = 0
          current_buf_start += bytes = 44578
      
      After bio_advance(), we are still in the same page, so start_byte is the
      same. Then, we check (total_out > start_byte && current_buf_start < start_byte),
      which is true! So, we adjust the values again:
      
          buf_offset = start_byte - buf_start = 2287
          working_bytes = total_out - start_byte = 1809
          current_buf_start = buf_start + buf_offset = 45056
      
      But note that working_bytes was already zero before this, so we should
      have stopped copying.
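
      As a self-contained illustration, the plain-C model below replays the
      numbers from the example above; the variable names mirror the commit
      message, but this is only a sketch, not the kernel function itself.

          #include <stdio.h>

          #define PAGE_SIZE 4096ul

          int main(void)
          {
              /* numbers from the worked example above */
              unsigned long buf_start = 42769, total_out = 46865, start_byte = 45056;
              unsigned long working_bytes = total_out - buf_start;   /* 4096 */
              unsigned long buf_offset = 0;

              if (total_out > start_byte && buf_start < start_byte) {
                  buf_offset = start_byte - buf_start;               /* 2287 */
                  working_bytes -= buf_offset;                       /* 1809 */
              }

              while (working_bytes > 0) {
                  unsigned long bytes = PAGE_SIZE - buf_offset;

                  if (bytes > working_bytes)
                      bytes = working_bytes;
                  /* memcpy() from the working buffer into the bio page goes here */
                  buf_offset += bytes;
                  working_bytes -= bytes;
                  /*
                   * Re-deriving buf_offset/working_bytes at this point, while
                   * start_byte is unchanged, would resurrect working_bytes
                   * (back to 1809) and copy stale data into the bio, which is
                   * the bug described above; the adjustment is only valid
                   * after the bio has advanced to a new page.
                   */
              }
              printf("working_bytes=%lu, copy loop stopped cleanly\n", working_bytes);
              return 0;
          }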
      
      Fixes: 974b1adc ("btrfs: use bio iterators for the decompression handlers")
      Reported-by: Pat Erley <pat-lkml@erley.org>
      Reviewed-by: Chris Mason <clm@fb.com>
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Tested-by: Liu Bo <bo.li.liu@oracle.com>
  3. 09 Feb 2017, 1 commit
  4. 27 Jan 2017, 3 commits
  5. 20 Jan 2017, 3 commits
    • Btrfs: fix truncate down when no_holes feature is enabled · 91298eec
      Liu Bo committed
      For a file mapping like this,

      [0-4k][hole][8k-12k]

      in NO_HOLES mode we don't have the [hole] extent any more.
      Commit c1aa4575 ("Btrfs: fix shrinking truncate when the no_holes feature is enabled")
      fixed the disk isize not being updated in NO_HOLES mode when data is not flushed.

      However, even if data has been flushed, we can still have trouble
      updating the disk isize, since we set the disk isize to the 'start'
      offset of the last evicted extent.
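
      As a rough illustration of the intent (a sketch with made-up names, not
      the kernel code), the on-disk isize after a shrinking truncate has to be
      capped at the new size even when the last dropped extent starts beyond it:

          /* toy helper: pick the on-disk i_size after truncating down to new_size */
          static unsigned long long disk_isize_after_truncate(unsigned long long last_extent_start,
                                                              unsigned long long new_size)
          {
              /* [0-4k][hole][8k-12k] truncated to 6k: 8k would be wrong, use 6k */
              return last_extent_start > new_size ? new_size : last_extent_start;
          }
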
      Reviewed-by: Chris Mason <clm@fb.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Btrfs: Fix deadlock between direct IO and fast fsync · 97dcdea0
      Chandan Rajendra committed
      The following deadlock is seen when executing the generic/113 test:
      
       ---------------------------------------------------------+----------------------------------------------------
        Direct I/O task                                           Fast fsync task
       ---------------------------------------------------------+----------------------------------------------------
        btrfs_direct_IO
          __blockdev_direct_IO
           do_blockdev_direct_IO
            do_direct_IO
             btrfs_get_blocks_direct
              while (blocks need to be written)
               get_more_blocks (first iteration)
                btrfs_get_blocks_direct
                 btrfs_create_dio_extent
                   down_read(&BTRFS_I(inode)->dio_sem)
                   Create and add extent map and ordered extent
                   up_read(&BTRFS_I(inode)->dio_sem)
                                                                  btrfs_sync_file
                                                                    btrfs_log_dentry_safe
                                                                     btrfs_log_inode_parent
                                                                      btrfs_log_inode
                                                                       btrfs_log_changed_extents
                                                                        down_write(&BTRFS_I(inode)->dio_sem)
                                                                         Collect new extent maps and ordered extents
                                                                          wait for ordered extent completion
               get_more_blocks (second iteration)
                btrfs_get_blocks_direct
                 btrfs_create_dio_extent
                   down_read(&BTRFS_I(inode)->dio_sem)
       --------------------------------------------------------------------------------------------------------------
      
      In the above description, the Btrfs direct I/O code path has not yet
      started submitting bios for the file range covered by the initial ordered
      extent. Meanwhile, the fast fsync task obtains the write semaphore and
      waits for I/O on the ordered extent to complete. However, the
      direct I/O task is now blocked on obtaining the read semaphore.
      
      To resolve the deadlock, this commit modifies the Direct I/O code path
      to obtain the read semaphore before invoking
      __blockdev_direct_IO(). The semaphore is then given up after
      __blockdev_direct_IO() returns. This allows the Direct I/O code to
      complete I/O on all the ordered extents it creates.
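
      A minimal user-space model of the new lock ordering (pthreads; the
      function names here are invented for illustration, not the btrfs code):
      the submitting side holds the read side once around the whole submission
      instead of re-taking it per ordered extent, so a waiting fsync can no
      longer wedge a half-built direct I/O.

          #include <pthread.h>
          #include <stdio.h>

          static pthread_rwlock_t dio_sem = PTHREAD_RWLOCK_INITIALIZER;

          static void direct_io_write(int nr_extents)
          {
              pthread_rwlock_rdlock(&dio_sem);     /* was: taken once per extent */
              for (int i = 0; i < nr_extents; i++) {
                  /* create extent map + ordered extent and submit its bio */
              }
              pthread_rwlock_unlock(&dio_sem);     /* all ordered extents submitted */
          }

          static void fast_fsync(void)
          {
              pthread_rwlock_wrlock(&dio_sem);     /* only runs between complete submissions */
              /* collect ordered extents and wait for their completion */
              pthread_rwlock_unlock(&dio_sem);
          }

          int main(void)
          {
              direct_io_write(2);
              fast_fsync();
              puts("no deadlock: fsync never waits on a partially submitted write");
              return 0;
          }
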
      Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
      Reviewed-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • btrfs: fix false enospc error when truncating heavily reflinked file · 47b5d646
      Wang Xiaoguang committed
      The test script below can reveal this bug:
          dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=100
          dev=$(losetup --show -f fs.img)
          mkdir -p /mnt/mntpoint
          mkfs.btrfs  -f $dev
          mount $dev /mnt/mntpoint
          cd /mnt/mntpoint
      
          echo "workdir is: /mnt/mntpoint"
          blocksize=$((128 * 1024))
          dd if=/dev/zero of=testfile bs=$blocksize count=1
          sync
          count=$((17*1024*1024*1024/blocksize))
          echo "file size is:" $((count*blocksize))
          for ((i = 1; i <= $count; i++)); do
              dst_offset=$((blocksize * i))
              xfs_io -f -c "reflink testfile 0 $dst_offset $blocksize"\
                      testfile > /dev/null
          done
          sync
          truncate --size 0 testfile
      
      The last truncate operation will fail with ENOSPC, but it should not
      fail.

      In btrfs_truncate(), we use a temporary block_rsv for the truncate
      operation. With every btrfs_truncate_inode_items() call, we migrate space
      to this block_rsv, but forget to clean up the previous reservation, so
      this block_rsv's reserved bytes keep growing, and the reserved space is
      only released at the end of btrfs_truncate(). This metadata leak impacts
      other tasks' metadata reservations; in this case,
      "btrfs_start_transaction(root, 2);" fails with an enospc error, which
      makes the truncate operation fail.
      
      Call btrfs_block_rsv_release() to fix this bug.
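
      A self-contained toy model of the leak (plain C with made-up numbers and
      names, not the btrfs code): each pass must hand back whatever the
      temporary reservation still holds before migrating more space, otherwise
      the pool eventually runs dry.

          #include <stdio.h>

          int main(void)
          {
              long global_free = 64, min_size = 4, rsv = 0;

              for (int pass = 1; pass <= 20; pass++) {
                  /* the fix: release what the temporary rsv still holds;
                   * comment these two lines out to see the leak hit ENOSPC */
                  global_free += rsv;
                  rsv = 0;

                  if (global_free < min_size) {
                      printf("pass %d: ENOSPC\n", pass);
                      return 1;
                  }
                  global_free -= min_size;   /* migrate into the temporary block_rsv */
                  rsv += min_size;
                  /* the truncate work would consume part of rsv here */
              }
              printf("all passes reserved space successfully\n");
              return 0;
          }
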
      Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  6. 09 Jan 2017, 2 commits
  7. 04 Jan 2017, 1 commit
  8. 03 Jan 2017, 4 commits
  9. 20 Dec 2016, 1 commit
  10. 15 Dec 2016, 3 commits
  11. 14 Dec 2016, 1 commit
    • btrfs: limit async_work allocation and worker func duration · 2939e1a8
      Maxim Patlasov committed
      Problem statement: an unprivileged user who has read-write access to more
      than one btrfs subvolume may easily consume all kernel memory (eventually
      triggering the oom-killer).
      
      Reproducer (./mkrmdir below essentially loops over mkdir/rmdir):
      
      [root@kteam1 ~]# cat prep.sh
      
      DEV=/dev/sdb
      mkfs.btrfs -f $DEV
      mount $DEV /mnt
      for i in `seq 1 16`
      do
      	mkdir /mnt/$i
      	btrfs subvolume create /mnt/SV_$i
      	ID=`btrfs subvolume list /mnt |grep "SV_$i$" |cut -d ' ' -f 2`
      	mount -t btrfs -o subvolid=$ID $DEV /mnt/$i
      	chmod a+rwx /mnt/$i
      done
      
      [root@kteam1 ~]# sh prep.sh
      
      [maxim@kteam1 ~]$ for i in `seq 1 16`; do ./mkrmdir /mnt/$i 2000 2000 & done
      
      [root@kteam1 ~]# for i in `seq 1 4`; do grep "kmalloc-128" /proc/slabinfo | grep -v dma; sleep 60; done
      kmalloc-128        10144  10144    128   32    1 : tunables    0    0    0 : slabdata    317    317      0
      kmalloc-128       9992352 9992352    128   32    1 : tunables    0    0    0 : slabdata 312261 312261      0
      kmalloc-128       24226752 24226752    128   32    1 : tunables    0    0    0 : slabdata 757086 757086      0
      kmalloc-128       42754240 42754240    128   32    1 : tunables    0    0    0 : slabdata 1336070 1336070      0
      
      The huge numbers above come from the insane number of async_works
      allocated and queued by btrfs_wq_run_delayed_node.
      
      The problem is caused by btrfs_wq_run_delayed_node() queuing more and more
      works when the number of delayed items is above BTRFS_DELAYED_BACKGROUND. The
      worker func (btrfs_async_run_delayed_root) processes at least
      BTRFS_DELAYED_BATCH items (if they are present in the list). So the machinery
      works as expected while the list is almost empty. As soon as it gets
      bigger, the worker func starts to process more than one item at a time, it
      takes longer, and the chance of having more async_works queued than needed
      gets higher.
      
      The problem above is worsened by another flaw of the delayed-inode
      implementation: if an async_work was queued in a throttling branch (number of
      items >= BTRFS_DELAYED_WRITEBACK), the corresponding worker func won't quit until
      the number of items drops below BTRFS_DELAYED_BACKGROUND / 2. So it is possible
      that the func occupies the CPU for a very long time (up to 30 sec in my
      experiments): while the func is trying to drain the list, user activity may
      keep adding more items to the list.
      
      The patch fixes both problems in a straightforward way: refuse to queue too
      many works in btrfs_wq_run_delayed_node and bail out of the worker func once
      at least BTRFS_DELAYED_WRITEBACK items have been processed.
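
      In sketch form (illustrative names and thresholds; the exact kernel
      checks differ), the two guards look roughly like this:

          /* producer side: refuse to queue yet another work if enough are queued */
          static int should_queue_work(int delayed_items, int works_in_flight)
          {
              int enough = delayed_items / 16;    /* roughly one work per batch */

              return works_in_flight <= enough;
          }

          /* worker side: bound how much a single invocation may process */
          static int worker_should_bail(int items_processed, int writeback_thresh)
          {
              return items_processed >= writeback_thresh;  /* e.g. BTRFS_DELAYED_WRITEBACK */
          }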
      
      Changed in v2: remove support of thresh == NO_THRESHOLD.
      Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      Cc: stable@vger.kernel.org # v3.15+
  12. 13 Dec 2016, 1 commit
    • printk/btrfs: handle more message headers · 262c5e86
      Petr Mladek committed
      Commit 4bcc595c ("printk: reinstate KERN_CONT for printing
      continuation lines") makes it possible to define more message headers for a
      single message.  The motivation is that continuation lines might get mixed,
      so it makes sense to define the right log level for every piece of a
      cont line.
      
      The current btrfs_printk() macros do not support continuation lines at the
      moment, but it is better to be prepared for custom messages and avoid a
      potential "lvl" buffer overflow.
      
      This patch iterates over the entire message header.  It is interested
      only in the message level, like the original code.
      
      This patch also introduces PRINTK_MAX_SINGLE_HEADER_LEN.  Two bytes
      are enough for the message level header at the moment, but it used to
      be three; see commit 04d2c8c8 ("printk: convert the format for
      KERN_<LEVEL> to a 2 byte pattern").
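
      A minimal stand-alone sketch of that scan, assuming the 2-byte SOH +
      level headers; parse_level() is a made-up helper for illustration, not
      the btrfs_printk() code.

          #include <stdio.h>

          #define SOH '\001'   /* start of a KERN_<LEVEL> / KERN_CONT header */

          static char parse_level(const char *fmt, const char **text)
          {
              char level = 0;

              /* walk over every 2-byte header, remembering only the last level */
              while (fmt[0] == SOH && fmt[1]) {
                  if (fmt[1] >= '0' && fmt[1] <= '7')
                      level = fmt[1];
                  fmt += 2;
              }
              *text = fmt;
              return level;
          }

          int main(void)
          {
              const char *text;
              char lvl = parse_level("\001" "6" "\001" "c" "device fsid changed\n", &text);

              printf("level=%c text=%s", lvl ? lvl : 'd', text);
              return 0;
          }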
      
      Also I fixed the default ratelimit level.  It looked very strange when it
      was different from the default log level.
      
      [pmladek@suse.com: Fix a check of the valid message level]
        Link: http://lkml.kernel.org/r/20161111183236.GD2145@dhcp128.suse.cz
      Link: http://lkml.kernel.org/r/1478695291-12169-4-git-send-email-pmladek@suse.com
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Acked-by: David Sterba <dsterba@suse.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Jaroslav Kysela <perex@perex.cz>
      Cc: Takashi Iwai <tiwai@suse.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 12 Dec 2016, 1 commit
  14. 10 Dec 2016, 1 commit
    • fs: try to clone files first in vfs_copy_file_range · a76b5b04
      Christoph Hellwig committed
      A clone is a perfectly fine implementation of a file copy, so most
      file systems just implement the copy that way.  Instead of duplicating
      this logic, move it to the VFS.  Currently btrfs and XFS implement copies
      the same way as clones, so there is no behavior change for them; cifs
      only implements clones and grows support for copy_file_range with this
      patch.  NFS implements both, so this will allow copy_file_range to work
      on servers that only implement CLONE and be a lot more efficient on
      servers that implement both CLONE and COPY.
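
      The fallback order can be mimicked from user space (an illustration of
      the idea only; copy_whole_file() is a made-up helper, not the VFS code):
      try a reflink clone first and copy bytes only when the filesystem cannot
      clone.

          #include <linux/fs.h>        /* FICLONE */
          #include <sys/ioctl.h>
          #include <sys/sendfile.h>
          #include <sys/stat.h>
          #include <unistd.h>

          /* clone src into dst if possible, otherwise fall back to a byte copy */
          static int copy_whole_file(int src, int dst)
          {
              struct stat st;
              off_t off = 0;

              if (ioctl(dst, FICLONE, src) == 0)   /* cheap: shares extents */
                  return 0;

              if (fstat(src, &st) < 0)
                  return -1;
              while (off < st.st_size) {           /* fallback: plain data copy */
                  ssize_t n = sendfile(dst, src, &off, st.st_size - off);

                  if (n <= 0)
                      return -1;
              }
              return 0;
          }
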
      Signed-off-by: Christoph Hellwig <hch@lst.de>
  15. 09 Dec 2016, 2 commits
  16. 06 Dec 2016, 14 commits