1. 12 Oct, 2019 1 commit
    • M
      nbd: fix max number of supported devs · 9f0f39c9
      Mike Christie committed
      commit e9e006f5fcf2bab59149cb38a48a4817c1b538b4 upstream.
      
      This fixes a bug added in 4.10 with commit:
      
      commit 9561a7ad
      Author: Josef Bacik <jbacik@fb.com>
      Date:   Tue Nov 22 14:04:40 2016 -0500
      
          nbd: add multi-connection support
      
      that limited the number of devices to 256. Before the patch we could
      create 1000s of devices, but the patch switched us from using our
      own thread to using a work queue which has a default limit of 256
      active works.
      
      The problem is that our recv_work function sits in a loop until
      disconnection but only handles IO for one connection. The work is
      started when the connection is started/restarted, but if we end up
      creating 257 or more connections, the queue_work call just queues
      connection257+'s recv_work and that waits for connection 1 - 256's
      recv_work to be disconnected and that work instance completing.
      
      Instead of reverting back to kthreads, this has us allocate a
      workqueue_struct per device, so we can block in the work.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
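The starvation described in this commit message can be modelled with a few lines of arithmetic. This is only a userspace sketch, not the driver code: the `demo_*` names are hypothetical, and the limit of 256 is taken from the commit message. It assumes every receive work item blocks until its connection goes away, so a running item never frees its slot.

```c
#include <stddef.h>

/* Default limit on concurrently active work items, as cited in the commit message. */
#define DEMO_MAX_ACTIVE 256

/* Shared queue: every connection's blocking receive loop competes for the same slots. */
size_t demo_running_shared(size_t nr_connections)
{
    return nr_connections < DEMO_MAX_ACTIVE ? nr_connections : DEMO_MAX_ACTIVE;
}

/* Connections whose receive work is queued but cannot start until another one exits. */
size_t demo_stalled_shared(size_t nr_connections)
{
    return nr_connections - demo_running_shared(nr_connections);
}

/* Per-device queues: the limit applies to each device's queue separately. */
size_t demo_running_per_device(size_t nr_devices, size_t conns_per_device)
{
    size_t per_dev = conns_per_device < DEMO_MAX_ACTIVE ? conns_per_device
                                                        : DEMO_MAX_ACTIVE;
    return nr_devices * per_dev;
}
```

With one shared queue, connection 257 waits indefinitely; with one queue per device, a thousand single-connection devices can all block in their receive work at once.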
  2. 05 Oct, 2019 1 commit
  3. 07 Aug, 2019 1 commit
    • M
      nbd: replace kill_bdev() with __invalidate_device() again · eb828241
      Munehisa Kamata committed
      commit 2b5c8f0063e4b263cf2de82029798183cf85c320 upstream.
      
      Commit abbbdf12 ("replace kill_bdev() with __invalidate_device()")
      once did this, but 29eaadc0 ("nbd: stop using the bdev everywhere")
      resurrected kill_bdev() and it has been there since then. So buffer_head
      mappings still get killed on a server disconnection, and we can still
      hit the BUG_ON on a filesystem on the top of the nbd device.
      
        EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
        block nbd0: Receive control failed (result -32)
        block nbd0: shutting down sockets
        print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
        EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
        print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
        EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
        EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
        ------------[ cut here ]------------
        kernel BUG at fs/buffer.c:3057!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
        Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
        RIP: 0010:submit_bh_wbc+0x18b/0x190
        ...
        Call Trace:
         jbd2_write_superblock+0xf1/0x230 [jbd2]
         ? account_entity_enqueue+0xc5/0xf0
         jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
         jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
         ? __switch_to_asm+0x40/0x70
         ...
         ? lock_timer_base+0x67/0x80
         kjournald2+0x121/0x360 [jbd2]
         ? remove_wait_queue+0x60/0x60
         kthread+0xf8/0x130
         ? commit_timeout+0x10/0x10 [jbd2]
         ? kthread_bind+0x10/0x10
         ret_from_fork+0x35/0x40
      
      With __invalidate_device(), I no longer hit the BUG_ON with sync or
      unmount on the disconnected device.
      
      Fixes: 29eaadc0 ("nbd: stop using the bdev everywhere")
      Cc: linux-block@vger.kernel.org
      Cc: Ratna Manoj Bolla <manoj.br@gmail.com>
      Cc: nbd@other.debian.org
      Cc: stable@vger.kernel.org
      Cc: David Woodhouse <dwmw@amazon.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  4. 23 Jan, 2019 1 commit
  5. 05 Sep, 2018 1 commit
  6. 21 Jul, 2018 1 commit
  7. 17 Jul, 2018 2 commits
    • J
      nbd: handle unexpected replies better · 8f3ea359
      Josef Bacik committed
      If the server or network is misbehaving and we get an unexpected reply
      we can sometimes miss the request not being started and wait on a
      request and never get a response, or even double complete the same
      request.  Fix this by replacing the send_complete completion with just a
      per command lock.  Add a per command cookie as well so that we can know
      if we're getting a double completion for a previous event.  Also check
      to make sure we don't have REQUEUED set, as that means we raced with the
      timeout handler and need to just let the retry occur.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
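One way to picture the per-command cookie is to fold it into the handle sent with each request, so a reply that belongs to an earlier use of the same tag no longer matches. The sketch below is a userspace illustration under that assumption. The struct layout, the 32-bit split, and all `demo_*` names are hypothetical rather than taken from the driver, and the per-command lock is left out.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_COOKIE_BITS 32

/* Hypothetical per-command state: a reusable tag plus a cookie bumped on every start. */
struct demo_cmd {
    uint32_t tag;
    uint32_t cookie;
    bool in_flight;
};

/* Pack cookie and tag into the opaque handle that travels with the request. */
uint64_t demo_make_handle(const struct demo_cmd *cmd)
{
    return ((uint64_t)cmd->cookie << DEMO_COOKIE_BITS) | cmd->tag;
}

/* Starting a request gives it a fresh cookie, invalidating older handles. */
uint64_t demo_start(struct demo_cmd *cmd)
{
    cmd->cookie++;
    cmd->in_flight = true;
    return demo_make_handle(cmd);
}

/* Accept a reply only if the command is in flight and the cookie is current. */
bool demo_complete(struct demo_cmd *cmd, uint64_t handle)
{
    uint32_t cookie = (uint32_t)(handle >> DEMO_COOKIE_BITS);

    if (!cmd->in_flight || cookie != cmd->cookie)
        return false; /* double completion or stale reply */
    cmd->in_flight = false;
    return true;
}
```

A second reply carrying the same handle is rejected because the command is no longer in flight, and a late reply for a previous use of the tag is rejected because its cookie is out of date.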
    • J
      nbd: don't requeue the same request twice. · d7d94d48
      Josef Bacik committed
      We can race with the snd timeout and the per-request timeout and end up
      requeuing the same request twice.  We can't use the send_complete
      completion to tell if everything is ok because we hold the tx_lock
      during send, so the timeout stuff will block waiting to mark the socket
      dead, and we could be marked complete and still requeue.  Instead add a
      flag to the socket so we know whether we've been requeued yet.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
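The "have we been requeued yet" flag amounts to an atomic test-and-set: whichever timeout path gets there first wins, and the other backs off. Below is a minimal userspace sketch using C11 atomics. The `demo_*` names and the counter are hypothetical, and the kernel's own bit operations are not shown.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request state: a one-shot flag plus a counter for observation. */
struct demo_req {
    atomic_flag requeued;
    int times_requeued;
};

#define DEMO_REQ_INIT { ATOMIC_FLAG_INIT, 0 }

/*
 * Requeue at most once. atomic_flag_test_and_set() returns the previous
 * value, so only the first caller sees "false" and performs the requeue.
 */
bool demo_requeue_once(struct demo_req *req)
{
    if (atomic_flag_test_and_set(&req->requeued))
        return false; /* the other timeout path already requeued this request */
    req->times_requeued++;
    return true;
}

/* Clear the flag when the request is issued again. */
void demo_reissue(struct demo_req *req)
{
    atomic_flag_clear(&req->requeued);
}
```

If both the send timeout and the per-request timeout call `demo_requeue_once()`, exactly one call returns true.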
  8. 21 Jun, 2018 1 commit
  9. 05 Jun, 2018 2 commits
  10. 31 May, 2018 2 commits
  11. 29 May, 2018 2 commits
  12. 25 May, 2018 1 commit
  13. 24 May, 2018 1 commit
  14. 23 May, 2018 1 commit
  15. 17 May, 2018 6 commits
  16. 09 Mar, 2018 1 commit
  17. 28 Feb, 2018 1 commit
  18. 07 Nov, 2017 2 commits
  19. 25 Oct, 2017 1 commit
  20. 10 Oct, 2017 1 commit
    • J
      nbd: don't set the device size until we're connected · 639812a1
      Josef Bacik committed
      A user reported a regression with using the normal ioctl interface on
      newer kernels.  This happens because I was setting the device size
      before the device was actually connected, which caused us to error out
      and close everything down.  This didn't happen on netlink because we
      hold the device lock the whole time we're setting things up, but we
      don't do that for the ioctl path.  This fixes the problem.
      
      Cc: stable@vger.kernel.org
      Fixes: 29eaadc0 ("nbd: stop using the bdev everywhere")
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  21. 03 Oct, 2017 1 commit
    • J
      nbd: fix -ERESTARTSYS handling · 6e60a3bb
      Josef Bacik committed
      Christoph's change made it so that whenever we returned BLK_STS_RESOURCE
      after getting ERESTARTSYS from sending our packets, we'd end up returning
      BLK_STS_OK, which means we'd never requeue and just hang.  We really need
      to return the right value from the upper layer.
      
      Fixes: fc17b653 ("blk-mq: switch ->queue_rq return value to blk_status_t")
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
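The failure mode here is a status value getting lost between two layers. The sketch below models it in userspace, assuming the lower layer reports "try again" as a positive status while the upper layer translates results for the block layer. The enum values, `DEMO_ERESTARTSYS`, and all `demo_*` functions are hypothetical stand-ins, not the driver's code.

```c
/* Hypothetical stand-ins for block-layer status codes; only the names echo the commit. */
enum demo_status {
    DEMO_STS_OK = 0,
    DEMO_STS_RESOURCE = 1, /* "busy, requeue me later" */
    DEMO_STS_IOERR = 2
};

/* Kernel-internal errno value, redefined here because userspace headers lack it. */
#define DEMO_ERESTARTSYS 512

/* Lower layer: negative errno on failure, 0 on success, positive status for a retry. */
int demo_send_cmd(int sock_result)
{
    if (sock_result == -DEMO_ERESTARTSYS)
        return DEMO_STS_RESOURCE;
    return sock_result < 0 ? sock_result : 0;
}

/* Lossy upper layer: every non-negative result collapses to OK, so no requeue happens. */
enum demo_status demo_queue_rq_lossy(int ret)
{
    return ret < 0 ? DEMO_STS_IOERR : DEMO_STS_OK;
}

/* Upper layer that passes a positive status through unchanged. */
enum demo_status demo_queue_rq_fixed(int ret)
{
    if (ret < 0)
        return DEMO_STS_IOERR;
    if (ret == 0)
        return DEMO_STS_OK;
    return (enum demo_status)ret;
}
```

In the lossy version an interrupted send is reported as success, so the request is never requeued and never completed; the pass-through version lets the retry status reach the caller.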
  22. 25 Sep, 2017 1 commit
    • J
      nbd: ignore non-nbd ioctl's · 1dae69be
      Josef Bacik committed
      In testing we noticed that nbd would spew if you ran a fio job against
      the raw device itself.  This is because fio calls a block device
      specific ioctl, however the block layer will first pass this back to the
      driver ioctl handler in case the driver wants to do something special.
      Since the device was set up using netlink this caused us to spew every
      time fio called this ioctl.  Since we don't have special handling, just
      error out for any non-nbd specific ioctl's that come in.  This fixes the
      spew.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
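Filtering out foreign ioctls can be done by looking at the "type" byte encoded in the command number. The sketch below assumes the common Linux encoding, where the type sits in bits 8 to 15, and relies on NBD ioctls using type 0xab. The `DEMO_*` macros and `demo_nbd_ioctl()` are hypothetical names for illustration, and the dispatch of real NBD commands is omitted.

```c
#include <errno.h>

/* Extract the ioctl "type" byte, assuming the common Linux layout (bits 8..15). */
#define DEMO_IOC_TYPE(cmd) (((cmd) >> 8) & 0xffu)

/* Type byte used by the NBD ioctl family. */
#define DEMO_NBD_TYPE 0xabu

/*
 * Reject anything that is not an NBD ioctl before doing any other work,
 * so generic block-layer ioctls fail quietly instead of logging errors.
 */
int demo_nbd_ioctl(unsigned int cmd)
{
    if (DEMO_IOC_TYPE(cmd) != DEMO_NBD_TYPE)
        return -EINVAL;
    return 0; /* a real handler would dispatch the NBD command here */
}
```

For example, `0xab00` (type 0xab, number 0) is accepted, while a block-layer command such as `0x1261` (type 0x12) is rejected with `-EINVAL`.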
  23. 29 Aug, 2017 1 commit
  24. 18 Aug, 2017 2 commits
  25. 26 Jul, 2017 1 commit
    • J
      nbd: clear disconnected on reconnect · 7a362ea9
      Josef Bacik committed
      If our device loses its connection for longer than the dead timeout we
      will set NBD_DISCONNECTED in order to quickly fail any pending IO's that
      flood in after the IO's that were waiting during the dead timer.
      However if we re-connect at some point in the future we'll still see
      this DISCONNECTED flag set if we then lose our connection again after
      that, which means we won't get notifications for our newly lost
      connections.  Fix this by just clearing the DISCONNECTED flag on
      reconnect in order to make sure everything works as it's supposed to.
      Reported-by: Dan Melnic <dmm@fb.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
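The stale-flag problem can be reproduced with a single bit and a counter. This is a simplified userspace model, assuming a link-dead notification is sent only when the disconnected flag is not already set. The struct, the flag value, and the `demo_*` functions are hypothetical.

```c
#include <stdbool.h>

#define DEMO_DISCONNECTED (1u << 0)

/* Hypothetical device state: runtime flags plus a count of notifications sent. */
struct demo_dev {
    unsigned int flags;
    int notifications;
};

/* Connection lost past the dead timeout: notify once, then mark the device. */
void demo_link_dead(struct demo_dev *dev)
{
    if (dev->flags & DEMO_DISCONNECTED)
        return; /* already marked, so no new notification goes out */
    dev->flags |= DEMO_DISCONNECTED;
    dev->notifications++;
}

/* Reconnect; clear_flag selects between the old and the fixed behaviour. */
void demo_reconnect(struct demo_dev *dev, bool clear_flag)
{
    if (clear_flag)
        dev->flags &= ~DEMO_DISCONNECTED;
}
```

Without clearing the flag on reconnect, a second connection loss is silent; with it, each loss produces its own notification.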
  26. 23 Jul, 2017 3 commits
  27. 13 Jul, 2017 1 commit