1. 30 Oct, 2013 1 commit
    • sysfs: merge sysfs_elem_bin_attr into sysfs_elem_attr · 56b3f3b8
      Committed by Tejun Heo
      3124eb16 ("sysfs: merge regular and bin file handling") folded bin
      file handling into regular file handling.  Among other things, bin
      files now share the same open path including sysfs_open_dirent
      association using sysfs_dirent->s_attr.open.  This is buggy because
      ->s_bin_attr lives in the same union and doesn't have the field.  This
      bug doesn't trigger because sysfs_elem_bin_attr doesn't have an active
      field at the conflicting position.  It does have a field "buffers" but
      it isn't used anymore.
      
      This patch collapses sysfs_elem_bin_attr into sysfs_elem_attr so that
      the bin_attr is accessed through ->s_attr.bin_attr which lives with
      ->s_attr.attr in an anonymous union.  The code paths already assume
      bin_attr contains attr as the first element, so this doesn't add any
      more assumptions while making it explicit that the two types are
      handled together.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
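The layout assumption described in this commit message (bin_attr contains attr as its first element, and both pointers share an anonymous union) can be sketched in userspace C. The struct and field names below are simplified stand-ins chosen for illustration, not the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; the field sets are illustrative. */
struct attribute {
    const char *name;
    unsigned short mode;
};

struct bin_attribute {
    struct attribute attr;      /* assumed to stay the first member */
    size_t size;
};

struct open_dirent;             /* opaque in this sketch */

/* Hypothetical model of the merged element: one struct, one anonymous union. */
struct elem_attr {
    union {
        struct attribute *attr;
        struct bin_attribute *bin_attr;
    };
    struct open_dirent *open;   /* shared by regular and bin files */
};

/* Compile-time check of the "attr is the first element" assumption. */
static_assert(offsetof(struct bin_attribute, attr) == 0,
              "attr must be the first member of bin_attribute");

/* Works for both file types because both union members alias one pointer. */
static const char *elem_name(const struct elem_attr *e)
{
    return e->attr->name;
}
```

With this layout, code that only needs the generic attribute can go through `->attr` even when the element was set up through `->bin_attr`, and the `open` field no longer overlaps anything type-specific.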
  2. 25 Oct, 2013 1 commit
  3. 19 Oct, 2013 1 commit
  4. 17 Oct, 2013 4 commits
  5. 16 Oct, 2013 1 commit
  6. 15 Oct, 2013 2 commits
  7. 14 Oct, 2013 1 commit
    • sysfs: make sysfs_file_ops() follow ignore_lockdep flag · 785a162d
      Committed by Tejun Heo
      375b611e ("sysfs: remove sysfs_buffer->ops") introduced
      sysfs_file_ops() which determines the associated file operation of a
      given sysfs_dirent.  As file ops access should be protected by an
      active reference, the new function includes a lockdep assertion on the
      sysfs_dirent; unfortunately, I forgot to take attr->ignore_lockdep
      flag into account and the lockdep assertion trips spuriously for files
      which opt out from active reference lockdep checking.
      
      # cat /sys/devices/pci0000:00/0000:00:01.2/usb1/authorized
      
       ------------[ cut here ]------------
       WARNING: CPU: 1 PID: 540 at /work/os/work/fs/sysfs/file.c:79 sysfs_file_ops+0x4e/0x60()
       Modules linked in:
       CPU: 1 PID: 540 Comm: cat Not tainted 3.11.0-work+ #3
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000009 ffff880016205c08 ffffffff81ca0131 0000000000000000
        ffff880016205c40 ffffffff81096d0d ffff8800166cb898 ffff8800166f6f60
        ffffffff8125a220 ffff880011ab1ec0 ffff88000aff0c78 ffff880016205c50
       Call Trace:
        [<ffffffff81ca0131>] dump_stack+0x4e/0x82
        [<ffffffff81096d0d>] warn_slowpath_common+0x7d/0xa0
        [<ffffffff81096dea>] warn_slowpath_null+0x1a/0x20
        [<ffffffff8125994e>] sysfs_file_ops+0x4e/0x60
        [<ffffffff8125a274>] sysfs_open_file+0x54/0x300
        [<ffffffff811df612>] do_dentry_open.isra.17+0x182/0x280
        [<ffffffff811df820>] finish_open+0x30/0x40
        [<ffffffff811f0623>] do_last+0x503/0xd90
        [<ffffffff811f0f6b>] path_openat+0xbb/0x6d0
        [<ffffffff811f23ba>] do_filp_open+0x3a/0x90
        [<ffffffff811e09a9>] do_sys_open+0x129/0x220
        [<ffffffff811e0abe>] SyS_open+0x1e/0x20
        [<ffffffff81caf3c2>] system_call_fastpath+0x16/0x1b
       ---[ end trace aa48096b111dafdb ]---
      
      Rename fs/sysfs/dir.c::ignore_lockdep() to sysfs_ignore_lockdep() and
      move it to fs/sysfs/sysfs.h and make sysfs_file_ops() skip lockdep
      assertion if sysfs_ignore_lockdep() is true.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
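The rule this fix introduces (skip the assertion for files that opt out of active-reference lockdep checking) can be modeled in a few lines of userspace C. This is only a toy model: the types, the `active_ref_held` flag, and the function names are hypothetical, and real lockdep tracks far more state than a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; not the kernel's sysfs_dirent or attribute. */
struct attr_model {
    bool ignore_lockdep;        /* file opted out of lockdep checking */
};

struct dirent_model {
    struct attr_model attr;
    bool active_ref_held;       /* what lockdep has on record, in this model */
};

/* Stand-in for the renamed sysfs_ignore_lockdep() helper. */
static bool model_ignore_lockdep(const struct dirent_model *sd)
{
    return sd->attr.ignore_lockdep;
}

/* Returns true when the assertion in the file-ops lookup would warn. */
static bool model_file_ops_warns(const struct dirent_model *sd)
{
    /* Opted-out files have nothing recorded, so asserting would be spurious. */
    if (model_ignore_lockdep(sd))
        return false;
    return !sd->active_ref_held;
}
```

Before the fix, the equivalent of the first check was missing, so an opted-out file always looked like a missing active reference.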
  8. 13 Oct, 2013 2 commits
  9. 11 Oct, 2013 4 commits
    • Btrfs: fix oops caused by the space balance and dead roots · c00869f1
      Committed by Miao Xie
      When doing space balance and subvolume destroy at the same time, we hit
      the following oops:
      
      kernel BUG at fs/btrfs/relocation.c:2247!
      RIP: 0010: [<ffffffffa04cec16>] prepare_to_merge+0x154/0x1f0 [btrfs]
      Call Trace:
       [<ffffffffa04b5ab7>] relocate_block_group+0x466/0x4e6 [btrfs]
       [<ffffffffa04b5c7a>] btrfs_relocate_block_group+0x143/0x275 [btrfs]
       [<ffffffffa0495c56>] btrfs_relocate_chunk.isra.27+0x5c/0x5a2 [btrfs]
       [<ffffffffa0459871>] ? btrfs_item_key_to_cpu+0x15/0x31 [btrfs]
       [<ffffffffa048b46a>] ? btrfs_get_token_64+0x7e/0xcd [btrfs]
       [<ffffffffa04a3467>] ? btrfs_tree_read_unlock_blocking+0xb2/0xb7 [btrfs]
       [<ffffffffa049907d>] btrfs_balance+0x9c7/0xb6f [btrfs]
       [<ffffffffa049ef84>] btrfs_ioctl_balance+0x234/0x2ac [btrfs]
       [<ffffffffa04a1e8e>] btrfs_ioctl+0xd87/0x1ef9 [btrfs]
       [<ffffffff81122f53>] ? path_openat+0x234/0x4db
       [<ffffffff813c3b78>] ? __do_page_fault+0x31d/0x391
       [<ffffffff810f8ab6>] ? vma_link+0x74/0x94
       [<ffffffff811250f5>] vfs_ioctl+0x1d/0x39
       [<ffffffff811258c8>] do_vfs_ioctl+0x32d/0x3e2
       [<ffffffff811259d4>] SyS_ioctl+0x57/0x83
       [<ffffffff813c3bfa>] ? do_page_fault+0xe/0x10
       [<ffffffff813c73c2>] system_call_fastpath+0x16/0x1b
      
      This happened because, during space relocation, we returned an error if
      the reference count of the root was 0. That is wrong: even though the
      root is dead (refs == 0), the space it holds still needs to be relocated,
      or we cannot remove the block group. So in this case, we should return
      the root whether it is dead or not.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: insert orphan roots into fs radix tree · 14927d95
      Committed by Miao Xie
      We no longer drop all the deleted snapshots/subvolumes before the space
      balance, which means we have to relocate the space held by the dead
      snapshots/subvolumes. So we must insert them into the fs radix tree, or we
      would forget to commit their changes during transaction commit, which
      would corrupt the metadata.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: limit delalloc pages outside of find_delalloc_range · 7bf811a5
      Committed by Josef Bacik
      Liu fixed part of this problem, but unfortunately I steered him in slightly
      the wrong direction, so the fix was incomplete.  The problem is that we
      limit the size of the delalloc range we are looking for to max bytes and then
      try to lock that range.  If we fail to lock the pages in that range, we
      shrink the max bytes to a single page and loop again.  However, if our first
      page is inside the delalloc range, we end up limiting the end of the range
      to a point before our first page.  This is illustrated below:
      
      [0 -------- delalloc range --------- 256mb]
                                        [page]
      
      So find_delalloc_range will return with delalloc_start as 0 and end as 128mb,
      and then we will notice that delalloc_start < *start and adjust it up, but not
      adjust delalloc_end up, so things go sideways.  To fix this we need to not limit
      the max bytes in find_delalloc_range, but in find_lock_delalloc_range and that
      way we don't end up with this confusion.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
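The ordering described above (first move the range start up to the locked page, then apply the max bytes limit) can be sketched as a small userspace function. The struct and function names are hypothetical; only the order of the two adjustments is taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive byte range, as a stand-in for delalloc_start / delalloc_end. */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical caller-side clamp: 'found' is the full, unlimited delalloc
 * range, 'page_start' is the offset of the page we already hold locked.
 */
static struct byte_range clamp_after_adjust(struct byte_range found,
                                            uint64_t page_start,
                                            uint64_t max_bytes)
{
    assert(max_bytes > 0);
    assert(found.start <= found.end && page_start <= found.end);

    /* Step 1: never start before the page we already have. */
    if (found.start < page_start)
        found.start = page_start;

    /* Step 2: only now limit the length, so end can never precede start. */
    if (found.end + 1 - found.start > max_bytes)
        found.end = found.start + max_bytes - 1;

    return found;
}
```

Applying the limit inside the search instead would cut the range at 128mb in the illustrated case, before the start is moved up to the page, which is how the end could land before the start.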
    • Btrfs: use right root when checking for hash collision · 4871c158
      Committed by Josef Bacik
      btrfs_rename was using the root of the old dir instead of the root of the new
      dir when checking for a hash collision, so if you tried to move a file into a
      subvol it would freak out because it would see the file you are trying to move
      in its current root.  This fixes the bug where the following would fail:
      
      btrfs subvol create test1
      btrfs subvol create test2
      mv test1 test2
      
      Thanks to Chris Murphy for catching this,
      
      Cc: stable@vger.kernel.org
      Reported-by: Chris Murphy <lists@colorremedies.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
  10. 07 Oct, 2013 3 commits
  11. 06 Oct, 2013 16 commits
    • do not treat non-symlink reparse points as valid symlinks · c31f3307
      Committed by Steve French
      Windows 8 and later can create NFS symlinks (within reparse points)
      which we were assuming were normal NTFS symlinks and thus reporting
      corrupt paths for.  Add check for reparse points to make sure that
      they really are normal symlinks before we try to parse the pathname.
      
      We also should not be parsing other types of reparse points (DFS
      junctions etc.) as if they were symlinks, so return EOPNOTSUPP
      on those.  Also fix endian errors (we were not parsing symlink
      lengths as little endian).
      
      This fixes commit d244bf2d, which implemented follow link for
      non-Unix CIFS mounts.
      
      CC: Stable <stable@kernel.org>
      Reviewed-by: Andrew Bartlett <abartlet@samba.org>
      Signed-off-by: Steve French <smfrench@gmail.com>
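The two checks named in this message (reject reparse points that are not symlinks, and read lengths as little endian) can be sketched in portable C. This is not the CIFS client code: the function name is hypothetical, and the 20-byte header layout is an assumption based on the symbolic link reparse data buffer as commonly documented for Windows.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define REPARSE_TAG_SYMLINK 0xA000000Cu   /* value of IO_REPARSE_TAG_SYMLINK */
#define SYMLINK_HDR_LEN     20u           /* assumed fixed header size */

/* Byte-wise little-endian reads; correct on any host byte order. */
static uint16_t get_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Hypothetical helper: returns the substitute name length in bytes,
 * or a negative errno value if the buffer is not a usable symlink.
 */
static int symlink_name_len(const uint8_t *buf, size_t len)
{
    if (len < SYMLINK_HDR_LEN)
        return -EINVAL;
    if (get_le32(buf) != REPARSE_TAG_SYMLINK)
        return -EOPNOTSUPP;                 /* NFS, DFS, other reparse types */

    uint16_t name_off = get_le16(buf + 8);  /* SubstituteNameOffset */
    uint16_t name_len = get_le16(buf + 10); /* SubstituteNameLength */

    if ((size_t)SYMLINK_HDR_LEN + name_off + name_len > len)
        return -EINVAL;
    return name_len;
}
```

Reading the fields byte by byte avoids depending on host endianness, which is the class of error the message says was fixed.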
    • sysfs: merge regular and bin file handling · 3124eb16
      Committed by Tejun Heo
      With the previous changes, sysfs regular file code is ready to handle
      bin files too.  This patch makes bin files share the regular file
      path.
      
      * sysfs_create/remove_bin_file() are moved to fs/sysfs/file.c.
      
      * sysfs_init_inode() is updated to use the new sysfs_bin_operations
        instead of bin_fops for bin files.
      
      * fs/sysfs/bin.c and the related pieces are removed.
      
      This patch shouldn't introduce any behavior difference to bin file
      accesses.
      
      Overall, this unification reduces the amount of duplicate logic, makes
      behaviors more consistent and paves the road for building a simpler and
      more versatile interface which will allow other subsystems to make use
      of sysfs for their pseudo filesystems.
      
      v2: Stale fs/sysfs/bin.c reference dropped from
          Documentation/DocBook/filesystems.tmpl.  Reported by kbuild test
          robot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: prepare open path for unified regular / bin file handling · 49fe6047
      Committed by Tejun Heo
      sysfs bin file handling will be merged into the regular file support.
      This patch prepares the open path.
      
      This patch updates sysfs_open_file() such that it can handle both
      regular and bin files.
      
      This is a preparation and the new bin file path isn't used yet.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c · 73d97146
      Committed by Tejun Heo
      sysfs bin file handling will be merged into the regular file support.
      This patch copies mmap support from bin so that fs/sysfs/file.c can
      handle mmapping bin files.
      
      The code is copied mostly verbatim with the following updates.
      
      * ->mmapped and ->vm_ops are added to sysfs_open_file and bin_buffer
        references are replaced with sysfs_open_file ones.
      
      * Symbols are prefixed with sysfs_.
      
      * sysfs_unmap_bin_file() grabs sysfs_open_dirent and traverses
        ->files.  Invocation of this function is added to
        sysfs_addrm_finish().
      
      * sysfs_bin_mmap() is added to sysfs_bin_operations.
      
      This is a preparation and the new mmap path isn't used yet.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: add sysfs_bin_read() · 2f0c6b75
      Committed by Tejun Heo
      sysfs bin file handling will be merged into the regular file support.
      This patch prepares the read path.
      
      Copy fs/sysfs/bin.c::read() to fs/sysfs/file.c and make it use
      sysfs_open_file instead of bin_buffer.  The function is an identical
      copy except for the use of sysfs_open_file.
      
      The new function is added to sysfs_bin_operations.  This isn't used
      yet but will eventually replace fs/sysfs/bin.c.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: prepare path write for unified regular / bin file handling · f9b9a621
      Committed by Tejun Heo
      sysfs bin file handling will be merged into the regular file support.
      This patch prepares the write path.
      
      bin file write is almost identical to regular file write except that
      the write length is capped by the inode size and @off is passed to the
      write method.  This patch adds bin file handling to sysfs_write_file()
      so that it can handle both regular and bin files.
      
      A new file_operations struct sysfs_bin_operations is added, which
      currently only hosts sysfs_write_file() and generic_file_llseek().
      This isn't used yet but will eventually replace fs/sysfs/bin.c.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
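The one behavioral difference this message describes for bin files (the write length is capped by the inode size, taking the offset into account) can be sketched as a pure function. The name and signature are hypothetical, and treating an offset at or past the size as "nothing to write" is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many bytes of a 'len'-byte write at offset 'off'
 * fit into a bin file whose inode size is 'size'.
 */
static size_t bin_write_len(size_t len, int64_t off, int64_t size)
{
    /* Assumption: writes starting at or beyond the end write nothing. */
    if (off < 0 || off >= size)
        return 0;

    uint64_t room = (uint64_t)(size - off);

    return len < room ? len : (size_t)room;
}
```

A regular file write would skip this step entirely, which is why a single write function can serve both kinds of file once the cap is made conditional.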
    • sysfs: collapse fs/sysfs/bin.c::fill_read() into read() · 3ff65d3c
      Committed by Tejun Heo
      read() is simple enough and fill_read() being in a separate function
      doesn't add anything.  Let's collapse it into read().  This will make
      merging bin file handling with regular file handling easier.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: skip bin_buffer->buffer while reading · 91270162
      Committed by Tejun Heo
      After b31ca3f5 ("sysfs: fix deadlock"), bin read() first writes
      data to bb->buffer and bounces it to a transient kernel buffer which
      is then copied out to userland.  The double bouncing doesn't add
      anything.  Let's just use the transient buffer directly.
      
      While at it, rename @temp to @buf for clarity.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: use seq_file when reading regular files · 13c589d5
      Committed by Tejun Heo
      sysfs read path implements its own buffering scheme between userland
      and kernel callbacks, which essentially is a degenerate duplicate of
      seq_file.  This patch replaces the custom read buffering
      implementation in sysfs with seq_file.
      
      While the amount of code reduction is small, this reduces low level
      hairiness and enables future development of a new versatile API based
      on seq_file so that sysfs features can be shared with other
      subsystems.
      
      As write path was already converted to not use sysfs_open_file->page,
      this patch makes ->page and ->count unused and removes them.
      
      Userland behavior remains the same except for some extreme corner
      cases - e.g. sysfs will now regenerate the content each time a file is
      read after a non-contiguous seek whereas the original code would keep
      using the same content.  While this is a userland visible behavior
      change, it is extremely unlikely to be noticeable and brings sysfs
      behavior closer to that of procfs.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: use transient write buffer · 8ef445f0
      Committed by Tejun Heo
      There isn't much to be gained by keeping around a kernel buffer while a
      file is open, especially as the read path is planned to be converted to
      use seq_file and won't use the buffer.  This patch makes
      sysfs_write_file() use per-write transient buffer instead of
      sysfs_open_file->page.
      
      This simplifies the write path, enables removing sysfs_open_file->page
      once read path is updated and will help merging bin file write path
      which already requires the use of a transient buffer due to a locking
      order issue.
      
      As the function comments of flush_write_buffer() and
      sysfs_write_buffer() are being updated anyway, reformat them so that
      they're more conventional.
      
      v2: Use min_t() instead of min() in sysfs_write_file() to avoid build
          warning on arm.  Reported by build test robot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
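The per-write transient buffer idea can be illustrated in userspace C, with malloc() standing in for the kernel allocator and memcpy() for the copy from userland. The function names, the callback type, and the one-page limit constant are assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u   /* assumed page size for the sketch */

/* Hypothetical stand-in for an attribute's store callback. */
typedef long (*store_fn)(const char *buf, size_t count);

/*
 * Allocate a buffer for this write only, hand it to the callback, free it.
 * Nothing is kept around between writes or while the file stays open.
 */
static long write_with_transient_buffer(const char *src, size_t count,
                                        store_fn store)
{
    /* Cap at one page minus one byte to leave room for the terminator. */
    size_t len = count < MODEL_PAGE_SIZE - 1 ? count : MODEL_PAGE_SIZE - 1;
    char *buf = malloc(len + 1);
    long ret;

    if (!buf)
        return -ENOMEM;

    memcpy(buf, src, len);
    buf[len] = '\0';            /* callbacks may expect a C string */
    ret = store(buf, len);
    free(buf);
    return ret;
}

/* Example callback for illustration: accepts everything it is given. */
static long accept_all_store(const char *buf, size_t count)
{
    (void)buf;
    return (long)count;
}
```

The buffer's lifetime is bounded by a single call, so there is no per-open state left to initialize, protect, or tear down.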
    • sysfs: add sysfs_open_file->sd and ->file · bcafe4ee
      Committed by Tejun Heo
      sysfs will be converted to use seq_file for read path, which will make
      it difficult to pass around multiple pointers directly.  This patch
      adds sysfs_open_file->sd and ->file so that we can reach all the
      necessary data structures from sysfs_open_file.
      
      flush_write_buffer() is updated to drop @dentry which was used to
      discover the sysfs_dirent as it's now available through
      sysfs_open_file->sd.
      
      This patch doesn't cause any behavior difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: rename sysfs_buffer to sysfs_open_file · 58282d8d
      Committed by Tejun Heo
      sysfs read path will be converted to use seq_file which will handle
      buffering making sysfs_buffer a misnomer.  Rename sysfs_buffer to
      sysfs_open_file, and sysfs_open_dirent->buffers to ->files.
      
      This patch is a pure rename.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: add sysfs_open_file_mutex · c75ec764
      Committed by Tejun Heo
      Add a separate mutex to protect sysfs_open_dirent->buffers list.  This
      will allow performing sleepable operations while traversing
      sysfs_buffers, which will be renamed to sysfs_open_file.
      
      Note that currently sysfs_open_dirent->buffers list isn't being used
      for anything and this patch doesn't make any functional difference.
      It will be used to merge regular and bin file supports.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: remove sysfs_buffer->ops · 375b611e
      Committed by Tejun Heo
      Currently, sysfs_ops is fetched during sysfs_open_file() and cached in
      sysfs_buffer->ops to be used while the file is open.  This patch
      removes the caching and makes each operation directly fetch sysfs_ops.
      
      This patch doesn't introduce any behavior difference and is to prepare
      for merging regular and bin file supports.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: remove sysfs_buffer->needs_read_fill · aea585ef
      Committed by Tejun Heo
      ->needs_read_fill is used to implement the following behaviors.
      
      1. Ensure buffer filling on the first read.
      2. Force buffer filling after a write.
      3. Force buffer filling after a successful poll.
      
      However, #2 and #3 don't really work as sysfs doesn't reset file
      position.  While the read buffer would be refilled, the next read
      would continue from the position after the last read or write,
      requiring an explicit seek to the start for it to be useful, which
      makes ->needs_read_fill superfluous as read buffer is always refilled
      if f_pos == 0.
      
      Update sysfs_read_file() to test buffer->page for #1 instead and
      remove ->needs_read_fill.  While this changes behavior in extreme
      corner cases - e.g. re-reading a sysfs file after seeking to non-zero
      position after a write or poll, it's highly unlikely to lead to actual
      breakage.  This change is to prepare for using seq_file in the read
      path.
      
      While at it, reformat a comment in fill_write_buffer().
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kay Sievers <kay@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • sysfs: remove unused sysfs_buffer->pos · 89e51dab
      Committed by Tejun Heo
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  12. 05 Oct, 2013 4 commits
    • btrfs: Fix crash due to not allocating integrity data for a bioset · b208c2f7
      Committed by Darrick J. Wong
      When btrfs creates a bioset, we must also allocate the integrity data pool.
      Otherwise btrfs will crash when it tries to submit a bio to a checksumming
      disk:
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
       IP: [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
       PGD 2305e4067 PUD 23063d067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
       Modules linked in: btrfs scsi_debug xfs ext4 jbd2 ext3 jbd mbcache
      sch_fq_codel eeprom lpc_ich mfd_core nfsd exportfs auth_rpcgss af_packet
      raid6_pq xor zlib_deflate libcrc32c [last unloaded: scsi_debug]
       CPU: 1 PID: 4486 Comm: mount Not tainted 3.12.0-rc1-mcsum #2
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
       task: ffff8802451c9720 ti: ffff880230698000 task.ti: ffff880230698000
       RIP: 0010:[<ffffffff8111e28a>]  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
       RSP: 0018:ffff880230699688  EFLAGS: 00010286
       RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000005f8445
       RDX: 0000000000000001 RSI: 0000000000000010 RDI: 0000000000000000
       RBP: ffff8802306996f8 R08: 0000000000011200 R09: 0000000000000008
       R10: 0000000000000020 R11: ffff88009d6e8000 R12: 0000000000011210
       R13: 0000000000000030 R14: ffff8802306996b8 R15: ffff8802451c9720
       FS:  00007f25b8a16800(0000) GS:ffff88024fc80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000018 CR3: 0000000230576000 CR4: 00000000000007e0
       Stack:
        ffff8802451c9720 0000000000000002 ffffffff81a97100 0000000000281250
        ffffffff81a96480 ffff88024fc99150 ffff880228d18200 0000000000000000
        0000000000000000 0000000000000040 ffff880230e8c2e8 ffff8802459dc900
       Call Trace:
        [<ffffffff811b2208>] bio_integrity_alloc+0x48/0x1b0
        [<ffffffff811b26fc>] bio_integrity_prep+0xac/0x360
        [<ffffffff8111e298>] ? mempool_alloc+0x58/0x150
        [<ffffffffa03e8041>] ? alloc_extent_state+0x31/0x110 [btrfs]
        [<ffffffff81241579>] blk_queue_bio+0x1c9/0x460
        [<ffffffff8123e58a>] generic_make_request+0xca/0x100
        [<ffffffff8123e639>] submit_bio+0x79/0x160
        [<ffffffffa03f865e>] btrfs_map_bio+0x48e/0x5b0 [btrfs]
        [<ffffffffa03c821a>] btree_submit_bio_hook+0xda/0x110 [btrfs]
        [<ffffffffa03e7eba>] submit_one_bio+0x6a/0xa0 [btrfs]
        [<ffffffffa03ef450>] read_extent_buffer_pages+0x250/0x310 [btrfs]
        [<ffffffff8125eef6>] ? __radix_tree_preload+0x66/0xf0
        [<ffffffff8125f1c5>] ? radix_tree_insert+0x95/0x260
        [<ffffffffa03c66f6>] btree_read_extent_buffer_pages.constprop.128+0xb6/0x120 [btrfs]
        [<ffffffffa03c8c1a>] read_tree_block+0x3a/0x60 [btrfs]
        [<ffffffffa03caefd>] open_ctree+0x139d/0x2030 [btrfs]
        [<ffffffffa03a282a>] btrfs_mount+0x53a/0x7d0 [btrfs]
        [<ffffffff8113ab0b>] ? pcpu_alloc+0x8eb/0x9f0
        [<ffffffff81167305>] ? __kmalloc_track_caller+0x35/0x1e0
        [<ffffffff81176ba0>] mount_fs+0x20/0xd0
        [<ffffffff81191096>] vfs_kern_mount+0x76/0x120
        [<ffffffff81193320>] do_mount+0x200/0xa40
        [<ffffffff81135cdb>] ? strndup_user+0x5b/0x80
        [<ffffffff81193bf0>] SyS_mount+0x90/0xe0
        [<ffffffff8156d31d>] system_call_fastpath+0x1a/0x1f
       Code: 4c 8d 75 a8 4c 89 6d e8 45 89 e0 4c 8d 6f 30 48 89 5d d8 41 83 e0 af 48
      89 fb 49 83 c6 18 4c 89 7d f8 65 4c 8b 3c 25 c0 b8 00 00 <48> 8b 73 18 44 89 c7
      44 89 45 98 ff 53 20 48 85 c0 48 89 c2 74
       RIP  [<ffffffff8111e28a>] mempool_alloc+0x4a/0x150
        RSP <ffff880230699688>
       CR2: 0000000000000018
       ---[ end trace 7a96042017ed21e2 ]---
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: fix a use-after-free bug in btrfs_dev_replace_finishing · 1357272f
      Committed by Ilya Dryomov
      free_device rcu callback, scheduled from btrfs_rm_dev_replace_srcdev,
      can be processed before btrfs_scratch_superblock is called, which would
      result in a use-after-free on btrfs_device contents.  Fix this by
      zeroing the superblock before the rcu callback is registered.
      
      Cc: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: eliminate races in worker stopping code · 964fb15a
      Committed by Ilya Dryomov
      The current implementation of worker threads in Btrfs has races in
      worker stopping code, which cause all kinds of panics and lockups when
      running btrfs/011 xfstest in a loop.  The problem is that
      btrfs_stop_workers is unsynchronized with respect to check_idle_worker,
      check_busy_worker and __btrfs_start_workers.
      
      E.g., check_idle_worker race flow:
      
             btrfs_stop_workers():            check_idle_worker(aworker):
      - grabs the lock
      - splices the idle list into the
        working list
      - removes the first worker from the
        working list
      - releases the lock to wait for
        its kthread's completion
                                        - grabs the lock
                                        - if aworker is on the working list,
                                          moves aworker from the working list
                                          to the idle list
                                        - releases the lock
      - grabs the lock
      - puts the worker
      - removes the second worker from the
        working list
                                    ......
              btrfs_stop_workers returns, aworker is on the idle list
                       FS is umounted, memory is freed
                                    ......
                    aworker is woken up, fireworks ensue
      
      With this applied, I wasn't able to trigger the problem in 48 hours,
      whereas previously I could reliably reproduce at least one of these
      races within an hour.
      Reported-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix crash of compressed writes · 385fe0be
      Committed by Liu Bo
      The crash[1] was found by xfstests/generic/208 with "-o compress".
      It doesn't reproduce every time, but it does panic.
      
      The bug is quite interesting: it was actually introduced by a recent commit,
      573aecaf ("Btrfs: actually limit the size of delalloc range").
      
      Btrfs implements delayed allocation, so during writeback, we
      (1) get a page A and lock it
      (2) search the state tree for delalloc bytes and lock all pages within the range
      (3) process the delalloc range, including find disk space and create
          ordered extent and so on.
      (4) submit the page A.
      
      It runs well in normal cases, but in a racy case, e.g.
      buffered compressed writes and aio-dio writes,
      we may sometimes fail to lock all pages in the 'delalloc' range,
      in which case we need to fall back to searching the state tree again with
      a smaller range limit (max_bytes = PAGE_CACHE_SIZE - offset).
      
      The mentioned commit has a side effect, that is, in the fallback case,
      we can find delalloc bytes before the index of the page we already have locked,
      so we're in the case of (delalloc_end <= *start) and return with (found > 0).
      
      This ends with not locking delalloc pages but making ->writepage still
      process them, and the crash happens.
      
      This fixes it by treating this case as if nothing was found and returning
      to the caller, which knows how to deal with it properly.
      
      [1]:
      ------------[ cut here ]------------
      kernel BUG at mm/page-writeback.c:2170!
      [...]
      CPU: 2 PID: 11755 Comm: btrfs-delalloc- Tainted: G           O 3.11.0+ #8
      [...]
      RIP: 0010:[<ffffffff810f5093>]  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
      [...]
      [ 4934.248731] Stack:
      [ 4934.248731]  ffff8801477e5dc8 ffffea00049b9f00 ffff8801869f9ce8 ffffffffa02b841a
      [ 4934.248731]  0000000000000000 0000000000000000 0000000000000fff 0000000000000620
      [ 4934.248731]  ffff88018db59c78 ffffea0005da8d40 ffffffffa02ff860 00000001810016c0
      [ 4934.248731] Call Trace:
      [ 4934.248731]  [<ffffffffa02b841a>] extent_range_clear_dirty_for_io+0xcf/0xf5 [btrfs]
      [ 4934.248731]  [<ffffffffa02a8889>] compress_file_range+0x1dc/0x4cb [btrfs]
      [ 4934.248731]  [<ffffffff8104f7af>] ? detach_if_pending+0x22/0x4b
      [ 4934.248731]  [<ffffffffa02a8bad>] async_cow_start+0x35/0x53 [btrfs]
      [ 4934.248731]  [<ffffffffa02c694b>] worker_loop+0x14b/0x48c [btrfs]
      [ 4934.248731]  [<ffffffffa02c6800>] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
      [ 4934.248731]  [<ffffffff810608f5>] kthread+0x8d/0x95
      [ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
      [ 4934.248731]  [<ffffffff814fe09c>] ret_from_fork+0x7c/0xb0
      [ 4934.248731]  [<ffffffff81060868>] ? kthread_freezable_should_stop+0x43/0x43
      [ 4934.248731] Code: ff 85 c0 0f 94 c0 0f b6 c0 59 5b 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb e8 2c de 00 00 49 89 c4 48 8b 03 a8 01 75 02 <0f> 0b 4d 85 e4 74 52 49 8b 84 24 80 00 00 00 f6 40 20 01 75 44
      [ 4934.248731] RIP  [<ffffffff810f5093>] clear_page_dirty_for_io+0x1e/0x83
      [ 4934.248731]  RSP <ffff8801869f9c48>
      [ 4934.280307] ---[ end trace 36f06d3f8750236a ]---
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
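The condition this fix adds can be stated as a one-line predicate: a search result only counts if it found something and the range extends past the page that is already locked. The function below is a hypothetical userspace restatement of that rule, not the Btrfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * 'found' is the number of delalloc bytes the search reported, 'start' is
 * the offset of the already locked page, 'delalloc_end' is the end of the
 * range the search returned.
 */
static bool delalloc_range_usable(uint64_t found, uint64_t start,
                                  uint64_t delalloc_end)
{
    /* A range ending at or before 'start' is treated as "nothing found". */
    return found != 0 && delalloc_end > start;
}
```

When the predicate is false the caller takes its existing "nothing found" path, so no unlocked pages are handed on to ->writepage.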