1. 17 Apr, 2017 5 commits
    • nbd: add a basic netlink interface · e46c7287
      Committed by Josef Bacik
      The existing ioctl interface for configuring NBD devices is a bit
      cumbersome and hard to extend.  The other problem is we leave a
      userspace app sitting in its syscall until the device disconnects,
      which is less than ideal.
      
      This patch introduces a netlink interface for adding and disconnecting
      nbd devices.  This has the benefits of being easily extendable without
      breaking older userspace applications, and allows us to configure an nbd
      device without leaving a userspace app sitting waiting for the device to
      disconnect.
      
      With this interface we also gain the ability to configure more devices
      than are preallocated at insmod time.  We also have gained the ability
      to not specify a particular device and be provided one for us so that
      userspace doesn't need to find a free device to configure.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
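The "be provided one for us" behaviour described above can be pictured with a small userspace model. This is only a sketch: `nbd_model_connect`, `ANY_INDEX` and the fixed `MAX_DEVS` table are hypothetical names invented for illustration, and the real driver is not limited to a fixed table (the commit says more devices than were preallocated can be configured).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEVS 8        /* hypothetical cap, only for this sketch */
#define ANY_INDEX (-1)    /* caller did not name a particular device */

static bool in_use[MAX_DEVS];

/* Returns the index that was configured, or -1 if nothing is available. */
static int nbd_model_connect(int requested)
{
    assert(requested >= ANY_INDEX && requested < MAX_DEVS);

    if (requested != ANY_INDEX) {
        /* Caller asked for a specific device: fail if it is taken. */
        if (in_use[requested])
            return -1;
        in_use[requested] = true;
        return requested;
    }

    /* No device named: hand back the first free one. */
    for (int i = 0; i < MAX_DEVS; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            return i;
        }
    }
    return -1;
}

static void nbd_model_disconnect(int index)
{
    assert(index >= 0 && index < MAX_DEVS);
    in_use[index] = false;
}
```

With this model, userspace never has to scan for a free device itself; it passes `ANY_INDEX` and uses whatever index comes back.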
    • nbd: stop using the bdev everywhere · 29eaadc0
      Committed by Josef Bacik
      In preparation for the upcoming netlink interface we need to not rely on
      already having the bdev for the NBD device we are doing operations on.
      Instead of passing the bdev around, just use it in places where we know
      we already have the bdev.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: separate out the config information · 5ea8d108
      Committed by Josef Bacik
      In order to properly refcount the various aspects of an NBD device we
      need to separate out the configuration elements of the nbd device.  The
      configuration of an NBD device has a different lifetime from the actual
      device, so it doesn't make sense to bundle these two concepts.  Add a
      config_refs to keep track of the configuration structure, that way we
      can be sure that we never access it when we've torn down the device.
      Add a new nbd_config structure to hold all of the transient
      configuration information.  Finally create this when we open the device
      so that it is in place when we start to configure the device.  This has
      a nice side effect of fixing a long-standing problem where you could end
      up with a half-configured nbd device that needed to be "disconnected" in
      order to be usable again.  Now once we close our device the
      configuration will be discarded.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
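The separate lifetime of the configuration can be sketched in userspace as follows. The struct and function names (`nbd_device_model`, `model_open`, `model_release`) are hypothetical; only the `config_refs` counter and the idea of an `nbd_config` structure created at open time come from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Transient configuration; lives only while someone holds a reference. */
struct nbd_config_model {
    int num_connections;
};

/* The device itself outlives any single configuration. */
struct nbd_device_model {
    int config_refs;
    struct nbd_config_model *config;
};

/* Opening the device creates the config if needed and takes a reference. */
static int model_open(struct nbd_device_model *nbd)
{
    if (nbd->config_refs == 0) {
        nbd->config = calloc(1, sizeof(*nbd->config));
        if (!nbd->config)
            return -1;
    }
    nbd->config_refs++;
    return 0;
}

/* Dropping the last reference discards the configuration. */
static void model_release(struct nbd_device_model *nbd)
{
    assert(nbd->config_refs > 0);
    if (--nbd->config_refs == 0) {
        free(nbd->config);
        nbd->config = NULL;
    }
}
```

Because the configuration is thrown away on the last release, a half-configured state cannot survive past the close of the device in this model.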
    • nbd: handle single path failures gracefully · f3733247
      Committed by Josef Bacik
      Currently if we have multiple connections and one of them goes down we will tear
      down the whole device.  However there's no reason we need to do this as we
      could have other connections that are working fine.  Deal with this by keeping
      track of the state of the different connections, and if we lose one we mark it
      as dead and send all IO destined for that socket to one of the other healthy
      sockets.  Any outstanding requests that were on the dead socket will time out and
      be re-submitted properly.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
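The fallback described above might look roughly like this in a userspace model. `conn_model` and `pick_healthy` are hypothetical names, and the round-robin search order is an assumption; the commit only says IO is sent to "one of the other healthy sockets".

```c
#include <assert.h>
#include <stdbool.h>

/* Per-connection state; a lost connection is marked dead, not torn down. */
struct conn_model {
    bool dead;
};

/*
 * Return the preferred connection if it is healthy, otherwise the next
 * healthy one (wrapping around). Returns -1 when every connection is dead.
 */
static int pick_healthy(const struct conn_model *conns, int n, int preferred)
{
    assert(n > 0 && preferred >= 0 && preferred < n);

    for (int step = 0; step < n; step++) {
        int idx = (preferred + step) % n;
        if (!conns[idx].dead)
            return idx;
    }
    return -1;
}
```

Only the -1 case, where no healthy connection remains, would correspond to failing the device as a whole.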
    • nbd: put socket in error cases · 9b1355d5
      Committed by Josef Bacik
      When adding a new socket we look it up and then try to add it to our
      configuration.  If any of those steps fail we need to make sure we put
      the socket so we don't leak them.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
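The leak being fixed follows a common pattern: a lookup takes a reference, so every failure path after the lookup must drop it. A minimal userspace sketch, with hypothetical names (`sock_model`, `sock_lookup`, `sock_put`, `add_socket`) standing in for the kernel's socket helpers:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for a refcounted socket. */
struct sock_model {
    int refs;
};

/* Looking a socket up takes a reference on it. */
static struct sock_model *sock_lookup(struct sock_model *s)
{
    s->refs++;
    return s;
}

static void sock_put(struct sock_model *s)
{
    assert(s->refs > 0);
    s->refs--;
}

/*
 * Add a socket to the configuration. config_has_room simulates whether
 * the later steps (for example growing the socket array) succeed.
 */
static int add_socket(struct sock_model *s, bool config_has_room)
{
    struct sock_model *sock = sock_lookup(s);

    if (!config_has_room) {
        sock_put(sock);   /* error path: give the reference back */
        return -1;
    }
    return 0;             /* success: the config now owns the reference */
}
```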
  2. 09 Apr, 2017 1 commit
  3. 31 Mar, 2017 1 commit
  4. 25 Mar, 2017 4 commits
    • nbd: replace kill_bdev() with __invalidate_device() · abbbdf12
      Committed by Ratna Manoj Bolla
      When a filesystem is mounted on an nbd device and the device disconnects,
      kill_bdev() and the reset of the bdev size to zero destroy buffer_head
      mappings underneath the mounted filesystem.
      
      After a bdev size reset (i.e. bdev->bd_inode->i_size = 0) on a disconnect,
      followed by a sys_umount(),
              generic_shutdown_super()->...
              ->__sync_blockdev()->...
              ->blkdev_writepages()->...
              ->do_invalidatepage()->...
              ->discard_buffer()   is discarding superblock buffer_head assumed
      to be in mapped state by ext4_commit_super().
      
      [mlin: ported to 4.11-rc2]
      Signed-off-by: Ratna Manoj Bolla <manoj.br@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: set queue timeout properly · f8586855
      Committed by Josef Bacik
      We can't just set the timeout on the tagset, we have to set it on the
      queue as it would have been set up already at this point.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: set rq->errors to actual error code · c103b4da
      Committed by Josef Bacik
      We've been relying on the block layer to assume rq->errors being set
      translates into -EIO.  I noticed in testing that sometimes this isn't
      true, and really there's not much of a reason to have a counter instead
      of just using -EIO.  So set it properly so we don't leak random numbers
      to unsuspecting victims.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • nbd: handle ERESTARTSYS properly · 9dd5d3ab
      Committed by Josef Bacik
      We can submit IO in a process's context, which means there can be
      pending signals.  This isn't a fatal error for NBD, but it does require
      some finesse.  If the signal happens before we transmit anything then we
      are ok, just requeue the request and carry on.  However if we've done a
      partial transmit we can't allow anything else to be transmitted on this
      socket until we transmit the remaining part of the request.  Deal with
      this by keeping track of how much we've sent for the current request,
      and if we get an ERESTARTSYS during any part of our transmission save
      the state of that request and requeue the IO.  If anybody tries to
      submit a request that isn't our pending request then requeue that
      request until we are able to service the one that is pending.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
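The bookkeeping described above can be modelled in userspace. Everything here is a hypothetical sketch: `sock_state`, `send_request` and the `budget` parameter (the number of bytes that go out before a simulated signal interrupts the send) are invented for illustration and are not the driver's actual structures.

```c
#include <assert.h>
#include <stddef.h>

enum send_result { SEND_DONE, SEND_REQUEUE };

/* Per-socket record of a partially transmitted request. */
struct sock_state {
    int pending_id;   /* request with a partial send, or -1 if none */
    size_t sent;      /* bytes of the pending request already sent */
};

/*
 * Try to send request `id` of `len` bytes. `budget` simulates how many
 * bytes can be written before an interruption (ERESTARTSYS in the kernel).
 */
static enum send_result send_request(struct sock_state *st, int id,
                                     size_t len, size_t budget)
{
    assert(len > 0 && id >= 0);

    /* Another request is half-sent on this socket: wait our turn. */
    if (st->pending_id != -1 && st->pending_id != id)
        return SEND_REQUEUE;

    size_t already = (st->pending_id == id) ? st->sent : 0;
    size_t remaining = len - already;

    if (budget < remaining) {
        /* Interrupted. Remember progress only if bytes have gone out. */
        if (already + budget > 0) {
            st->pending_id = id;
            st->sent = already + budget;
        }
        return SEND_REQUEUE;
    }

    /* The whole request is on the wire; the socket is free again. */
    st->pending_id = -1;
    st->sent = 0;
    return SEND_DONE;
}
```

An interruption before the first byte leaves the socket free for any request, while an interruption mid-request reserves the socket until that same request finishes.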
  5. 02 Mar, 2017 1 commit
  6. 22 Feb, 2017 3 commits
  7. 02 Feb, 2017 2 commits
  8. 01 Feb, 2017 3 commits
  9. 20 Jan, 2017 1 commit
  10. 11 Jan, 2017 1 commit
  11. 27 Dec, 2016 2 commits
  12. 25 Dec, 2016 1 commit
  13. 09 Dec, 2016 2 commits
  14. 04 Dec, 2016 1 commit
    • nbd: fix 64-bit division · e88f72cb
      Committed by Jens Axboe
      We have this:
      
      ERROR: "__aeabi_ldivmod" [drivers/block/nbd.ko] undefined!
      ERROR: "__divdi3" [drivers/block/nbd.ko] undefined!
      nbd.c:(.text+0x247c72): undefined reference to `__divdi3'
      
      due to a recent commit that did 64-bit division. Use the proper
      divider function so that 32-bit compiles don't break.
      
      Fixes: ef77b515 ("nbd: use loff_t for blocksize and nbd_set_size args")
      Signed-off-by: Jens Axboe <axboe@fb.com>
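The link errors above arise because a plain `/` on 64-bit operands makes a 32-bit compiler emit calls to helper routines such as `__divdi3`, which the kernel does not provide; kernel code is expected to go through helpers like `div_s64()` from `<linux/math64.h>` instead. The sketch below is a userspace stand-in only: `div_s64_model` and `blocks_for_size` are hypothetical names, and whether the actual fix used `div_s64()` specifically is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for a kernel 64-by-32 signed division helper.
 * In userspace the C library supplies the division routines, so a
 * plain '/' is fine here; in the kernel the helper must be used.
 */
static int64_t div_s64_model(int64_t dividend, int32_t divisor)
{
    assert(divisor != 0);
    return dividend / divisor;
}

/* Number of whole blocks in a device of bytesize bytes. */
static int64_t blocks_for_size(int64_t bytesize, int32_t blksize)
{
    return div_s64_model(bytesize, blksize);
}
```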
  15. 03 Dec, 2016 1 commit
  16. 23 Nov, 2016 2 commits
  17. 18 Nov, 2016 1 commit
  18. 07 Nov, 2016 1 commit
  19. 25 Oct, 2016 1 commit
  20. 24 Sep, 2016 1 commit
  21. 09 Sep, 2016 4 commits
  22. 05 Aug, 2016 1 commit
    • nbd: fix race in ioctl · 97240963
      Committed by Vegard Nossum
      Quentin ran into this bug:
      
      WARNING: CPU: 64 PID: 10085 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x65/0x80
      sysfs: cannot create duplicate filename '/devices/virtual/block/nbd3/pid'
      Modules linked in: nbd
      CPU: 64 PID: 10085 Comm: qemu-nbd Tainted: G      D         4.6.0+ #7
       0000000000000000 ffff8820330bba68 ffffffff814b8791 ffff8820330bbac8
       0000000000000000 ffff8820330bbab8 ffffffff810d04ab ffff8820330bbaa8
       0000001f00000296 0000000000017681 ffff8810380bf000 ffffffffa0001790
      Call Trace:
       [<ffffffff814b8791>] dump_stack+0x4d/0x6c
       [<ffffffff810d04ab>] __warn+0xdb/0x100
       [<ffffffff810d0574>] warn_slowpath_fmt+0x44/0x50
       [<ffffffff81218c65>] sysfs_warn_dup+0x65/0x80
       [<ffffffff81218a02>] sysfs_add_file_mode_ns+0x172/0x180
       [<ffffffff81218a35>] sysfs_create_file_ns+0x25/0x30
       [<ffffffff81594a76>] device_create_file+0x36/0x90
       [<ffffffffa0000e8d>] __nbd_ioctl+0x32d/0x9b0 [nbd]
       [<ffffffff814cc8e8>] ? find_next_bit+0x18/0x20
       [<ffffffff810f7c29>] ? select_idle_sibling+0xe9/0x120
       [<ffffffff810f6cd7>] ? __enqueue_entity+0x67/0x70
       [<ffffffff810f9bf0>] ? enqueue_task_fair+0x630/0xe20
       [<ffffffff810efa76>] ? resched_curr+0x36/0x70
       [<ffffffff810f0078>] ? check_preempt_curr+0x78/0x90
       [<ffffffff810f00a2>] ? ttwu_do_wakeup+0x12/0x80
       [<ffffffff810f01b1>] ? ttwu_do_activate.constprop.86+0x61/0x70
       [<ffffffff810f0c15>] ? try_to_wake_up+0x185/0x2d0
       [<ffffffff810f0d6d>] ? default_wake_function+0xd/0x10
       [<ffffffff81105471>] ? autoremove_wake_function+0x11/0x40
       [<ffffffffa0001577>] nbd_ioctl+0x67/0x94 [nbd]
       [<ffffffff814ac0fd>] blkdev_ioctl+0x14d/0x940
       [<ffffffff811b0da2>] ? put_pipe_info+0x22/0x60
       [<ffffffff811d96cc>] block_ioctl+0x3c/0x40
       [<ffffffff811ba08d>] do_vfs_ioctl+0x8d/0x5e0
       [<ffffffff811aa329>] ? ____fput+0x9/0x10
       [<ffffffff810e9092>] ? task_work_run+0x72/0x90
       [<ffffffff811ba627>] SyS_ioctl+0x47/0x80
       [<ffffffff8185f5df>] entry_SYSCALL_64_fastpath+0x17/0x93
      ---[ end trace 7899b295e4f850c8 ]---
      
      It seems fairly obvious that device_create_file() is not being protected
      from being run concurrently on the same nbd.
      
      Quentin found the following relevant commits:
      
      1a2ad211 nbd: add locking to nbd_ioctl
      90b8f282 [PATCH] end of methods switch: remove the old ones
      d4430d62 [PATCH] beginning of methods conversion
      08f85851 [PATCH] move block_device_operations to blkdev.h
      
      It would seem that the race was introduced in the process of moving nbd
      from BKL to unlocked ioctls.
      
      By setting nbd->task_recv while the mutex is held, we can prevent other
      processes from running concurrently (since nbd->task_recv is also checked
      while the mutex is held).
      Reported-and-tested-by: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Markus Pargmann <mpa@pengutronix.de>
      Cc: Paul Clements <paul.clements@steeleye.com>
      Cc: Pavel Machek <pavel@suse.cz>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
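The fix amounts to making "check whether a receiver exists" and "become the receiver" a single step under the mutex. A userspace sketch of that idea using a pthread mutex follows; `nbd_model`, `model_do_it` and the `sysfs_files_created` counter (standing in for `device_create_file()`) are hypothetical, and only the `task_recv` field name comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

struct nbd_model {
    pthread_mutex_t lock;
    const void *task_recv;      /* non-NULL once a receiver has claimed it */
    int sysfs_files_created;    /* stands in for device_create_file() */
};

/* Returns 0 for the caller that wins the claim, -EBUSY for everyone else. */
static int model_do_it(struct nbd_model *nbd, const void *task)
{
    assert(task != NULL);

    pthread_mutex_lock(&nbd->lock);
    if (nbd->task_recv) {
        pthread_mutex_unlock(&nbd->lock);
        return -EBUSY;
    }
    nbd->task_recv = task;      /* claim while the lock is still held */
    pthread_mutex_unlock(&nbd->lock);

    /* Only the single winner reaches the once-only setup. */
    nbd->sysfs_files_created++;
    return 0;
}
```

If the claim were made after dropping the lock, two callers could both see `task_recv == NULL` and both run the setup, which is the duplicate sysfs file seen in the warning.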