1. 24 Jun, 2020 1 commit
  2. 12 Mar, 2020 2 commits
  3. 30 Jan, 2020 1 commit
    • nbd: add a flush_workqueue in nbd_start_device · 5c0dd228
      Committed by Sun Ke
      When kzalloc fails, nbd may try to destroy the workqueue from
      inside the workqueue.
      
      If num_connections is m (2 < m), and allocations NO.1 ~ NO.n
      (1 < n < m) succeed but NO.(n + 1) fails, nbd_start_device
      returns ENOMEM to nbd_start_device_ioctl, and
      nbd_start_device_ioctl returns immediately without running
      flush_workqueue. However, n recv threads are still running.
      If nbd_release runs first, a recv thread may drop the last
      config_refs reference and try to destroy the workqueue from
      inside the workqueue.
      
      To fix it, add a flush_workqueue in nbd_start_device.
      
      Fixes: e9e006f5 ("nbd: fix max number of supported devs")
      Signed-off-by: Sun Ke <sunke32@huawei.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  4. 17 Dec, 2019 1 commit
    • nbd: fix shutdown and recv work deadlock v2 · 1c05839a
      Committed by Mike Christie
      This fixes a regression added with:
      
      commit e9e006f5
      Author: Mike Christie <mchristi@redhat.com>
      Date:   Sun Aug 4 14:10:06 2019 -0500
      
          nbd: fix max number of supported devs
      
      where we can deadlock during device shutdown. The problem occurs if
      the recv_work's nbd_config_put occurs after nbd_start_device_ioctl has
      returned and the userspace app has dropped its reference via closing
      the device and running nbd_release. The recv_work nbd_config_put call
      would then drop the refcount to zero and try to destroy the config which
      would try to do destroy_workqueue from the recv work.
      
      This patch just has nbd_start_device_ioctl do a flush_workqueue when it
      wakes so we know after the ioctl returns running works have exited. This
      also fixes a possible race where we could try to reuse the device while
      old recv_works are still running.
      
      Cc: stable@vger.kernel.org
      Fixes: e9e006f5 ("nbd: fix max number of supported devs")
      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
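The ordering problem described in this commit can be illustrated with a small single-threaded userspace model. This is only a sketch under stated assumptions: `model_config`, `model_config_put` and the two teardown functions are hypothetical names, not kernel code, and the "flush" is modelled simply as the worker's put happening before the release put.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are the kernel's. */
struct model_config {
    int refs;               /* outstanding references */
    bool destroyed;         /* set when the last reference is dropped */
    bool destroyed_by_work; /* true if a worker dropped the last reference */
};

/* Drop one reference. in_worker says whether the caller runs on the
 * workqueue that the final put would have to destroy. */
void model_config_put(struct model_config *c, bool in_worker)
{
    assert(c->refs > 0);
    if (--c->refs == 0) {
        c->destroyed = true;
        c->destroyed_by_work = in_worker; /* the self-destroy hazard */
    }
}

/* Without a flush: the release path runs first, the worker finishes later. */
bool teardown_without_flush(void)
{
    struct model_config c = { .refs = 2 }; /* opener + one recv work */
    model_config_put(&c, false); /* release-like put */
    model_config_put(&c, true);  /* worker drops the last reference */
    return c.destroyed_by_work;
}

/* With a flush: the ioctl waits for the worker before it returns. */
bool teardown_with_flush(void)
{
    struct model_config c = { .refs = 2 };
    model_config_put(&c, true);  /* flush: the worker's put completes first */
    model_config_put(&c, false); /* release drops the last reference */
    return c.destroyed_by_work;
}
```

In this model only the unflushed ordering leaves the final put to the worker, which is the situation the commit message says leads to destroy_workqueue being called from the recv work.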
  5. 22 Nov, 2019 1 commit
  6. 20 Nov, 2019 1 commit
  7. 26 Oct, 2019 3 commits
    • nbd: verify socket is supported during setup · cf1b2326
      Committed by Mike Christie
      nbd requires socket families to support the shutdown method so the nbd
      recv workqueue can be woken up from its sock_recvmsg call. If the socket
      does not support the callout we will leave recv works running or get hangs
      later when the device or module is removed.
      
      This adds a check during socket connection/reconnection to make sure the
      socket being passed in supports the needed callout.
      
      Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
      Fixes: e9e006f5 ("nbd: fix max number of supported devs")
      Tested-by: Richard W.M. Jones <rjones@redhat.com>
      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
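A check of this kind can be sketched in userspace as a comparison of a function pointer against a "not supported" stub. Everything below is a hypothetical model (`model_sock_ops`, `model_no_shutdown`, `model_check_socket`); the commit message does not show the actual check, so this only illustrates the idea of rejecting a socket at setup time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a socket family's operation table. */
struct model_sock_ops {
    int (*shutdown)(int how);
};

/* Stub a family would install when it does not implement shutdown. */
int model_no_shutdown(int how)
{
    (void)how;
    return -EOPNOTSUPP;
}

/* A family that does implement shutdown. */
int model_real_shutdown(int how)
{
    (void)how;
    return 0;
}

/* Reject sockets whose shutdown is missing or is the unsupported stub,
 * so that a blocked receive can always be woken up later. */
int model_check_socket(const struct model_sock_ops *ops)
{
    assert(ops != NULL);
    if (ops->shutdown == NULL || ops->shutdown == model_no_shutdown)
        return -EINVAL;
    return 0;
}
```

Doing the check when the socket is handed in, rather than at teardown, means an unsupported socket is refused before any receive work can block on it.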
    • nbd: handle racing with error'ed out commands · 7ce23e8e
      Committed by Josef Bacik
      We hit the following warning in production
      
      print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
      ------------[ cut here ]------------
      refcount_t: underflow; use-after-free.
      WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
      Workqueue: knbd-recv recv_work [nbd]
      RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
      Call Trace:
       blk_mq_free_request+0xb7/0xf0
       blk_mq_complete_request+0x62/0xf0
       recv_work+0x29/0xa1 [nbd]
       process_one_work+0x1f5/0x3f0
       worker_thread+0x2d/0x3d0
       ? rescuer_thread+0x340/0x340
       kthread+0x111/0x130
       ? kthread_create_on_node+0x60/0x60
       ret_from_fork+0x1f/0x30
      ---[ end trace b079c3c67f98bb7c ]---
      
      This was preceded by us timing out everything and shutting down the
      sockets for the device.  The problem is we had a request in the queue at
      the same time, so we completed the request twice.  This can actually
      happen in a lot of cases: we fail to get a ref on our config, we only
      have one connection and just error out the command, etc.
      
      Fix this by checking cmd->status in nbd_read_stat.  We only change this
      under the cmd->lock, so we are safe to check this here and see if we've
      already error'ed this command out, which would indicate that we've
      completed it as well.
      Reviewed-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
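The fix described above, checking a status field under the per-command lock before completing, can be modelled in userspace roughly as follows. The struct and function names are hypothetical, a pthread mutex stands in for cmd->lock, and the model covers only the "errored out, then a late reply arrives" race.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace model of a command; not the kernel's struct. */
struct model_cmd {
    pthread_mutex_t lock;
    int status;      /* 0 = in flight, nonzero = already errored out */
    int completions; /* how many times the request was completed */
};

/* Timeout/shutdown path: error the command out and complete it once. */
void model_error_out(struct model_cmd *cmd, int err)
{
    assert(err != 0);
    pthread_mutex_lock(&cmd->lock);
    if (cmd->status == 0) {
        cmd->status = err;
        cmd->completions++;
    }
    pthread_mutex_unlock(&cmd->lock);
}

/* Reply path: returns true only if this call completed the command. */
bool model_handle_reply(struct model_cmd *cmd)
{
    bool completed = false;

    pthread_mutex_lock(&cmd->lock);
    if (cmd->status == 0) { /* not errored out, so not yet completed */
        cmd->completions++;
        completed = true;
    }
    pthread_mutex_unlock(&cmd->lock);
    return completed;
}
```

Because the status is only written while the lock is held, the reply path sees either "in flight" or "already errored out", never a half-updated state.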
    • nbd: protect cmd->status with cmd->lock · de6346ec
      Committed by Josef Bacik
      We already do this for the most part, except in timeout and clear_req.
      For the timeout case we take the lock after we grab a ref on the config,
      but that isn't really necessary because we're safe to touch the cmd at
      this point, so just move the order around.
      
      For the clear_req case this is initiated by the user, so again is safe.
      Reviewed-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 10 Oct, 2019 1 commit
  9. 18 Sep, 2019 2 commits
    • nbd: fix possible page fault for nbd disk · 8454d685
      Committed by Xiubo Li
      When the NBD_CFLAG_DESTROY_ON_DISCONNECT flag is set and the socket is
      closed because the server daemon was restarted, starting a new
      connection with the old nbd_index just before the last DISCONNECT has
      fully completed causes random crashes, like:
      
      <3>[  110.151949] block nbd1: Receive control failed (result -32)
      <1>[  110.152024] BUG: unable to handle page fault for address: 0000058000000840
      <1>[  110.152063] #PF: supervisor read access in kernel mode
      <1>[  110.152083] #PF: error_code(0x0000) - not-present page
      <6>[  110.152094] PGD 0 P4D 0
      <4>[  110.152106] Oops: 0000 [#1] SMP PTI
      <4>[  110.152120] CPU: 0 PID: 6698 Comm: kworker/u5:1 Kdump: loaded Not tainted 5.3.0-rc4+ #2
      <4>[  110.152136] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      <4>[  110.152166] Workqueue: knbd-recv recv_work [nbd]
      <4>[  110.152187] RIP: 0010:__dev_printk+0xd/0x67
      <4>[  110.152206] Code: 10 e8 c5 fd ff ff 48 8b 4c 24 18 65 48 33 0c 25 28 00 [...]
      <4>[  110.152244] RSP: 0018:ffffa41581f13d18 EFLAGS: 00010206
      <4>[  110.152256] RAX: ffffa41581f13d30 RBX: ffff96dd7374e900 RCX: 0000000000000000
      <4>[  110.152271] RDX: ffffa41581f13d20 RSI: 00000580000007f0 RDI: ffffffff970ec24f
      <4>[  110.152285] RBP: ffffa41581f13d80 R08: ffff96dd7fc17908 R09: 0000000000002e56
      <4>[  110.152299] R10: ffffffff970ec24f R11: 0000000000000003 R12: ffff96dd7374e900
      <4>[  110.152313] R13: 0000000000000000 R14: ffff96dd7374e9d8 R15: ffff96dd6e3b02c8
      <4>[  110.152329] FS:  0000000000000000(0000) GS:ffff96dd7fc00000(0000) knlGS:0000000000000000
      <4>[  110.152362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[  110.152383] CR2: 0000058000000840 CR3: 0000000067cc6002 CR4: 00000000001606f0
      <4>[  110.152401] Call Trace:
      <4>[  110.152422]  _dev_err+0x6c/0x83
      <4>[  110.152435]  nbd_read_stat.cold+0xda/0x578 [nbd]
      <4>[  110.152448]  ? __switch_to_asm+0x34/0x70
      <4>[  110.152468]  ? __switch_to_asm+0x40/0x70
      <4>[  110.152478]  ? __switch_to_asm+0x34/0x70
      <4>[  110.152491]  ? __switch_to_asm+0x40/0x70
      <4>[  110.152501]  ? __switch_to_asm+0x34/0x70
      <4>[  110.152511]  ? __switch_to_asm+0x40/0x70
      <4>[  110.152522]  ? __switch_to_asm+0x34/0x70
      <4>[  110.152533]  recv_work+0x35/0x9e [nbd]
      <4>[  110.152547]  process_one_work+0x19d/0x340
      <4>[  110.152558]  worker_thread+0x50/0x3b0
      <4>[  110.152568]  kthread+0xfb/0x130
      <4>[  110.152577]  ? process_one_work+0x340/0x340
      <4>[  110.152609]  ? kthread_park+0x80/0x80
      <4>[  110.152637]  ret_from_fork+0x35/0x40
      
      This is very easy to reproduce by running the nbd-runner.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Xiubo Li <xiubli@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nbd: rename the runtime flags as NBD_RT_ prefixed · ec76a7b9
      Committed by Xiubo Li
      Preparation for fixing the destroy-on-disconnect crash.
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Xiubo Li <xiubli@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 21 Aug, 2019 5 commits
  11. 31 Jul, 2019 1 commit
    • nbd: replace kill_bdev() with __invalidate_device() again · 2b5c8f00
      Committed by Munehisa Kamata
      Commit abbbdf12 ("replace kill_bdev() with __invalidate_device()")
      once did this, but 29eaadc0 ("nbd: stop using the bdev everywhere")
      resurrected kill_bdev() and it has been there since then. So buffer_head
      mappings still get killed on a server disconnection, and we can still
      hit the BUG_ON on a filesystem on the top of the nbd device.
      
        EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
        block nbd0: Receive control failed (result -32)
        block nbd0: shutting down sockets
        print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
        EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
        print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
        EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
        EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
        ------------[ cut here ]------------
        kernel BUG at fs/buffer.c:3057!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
        Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
        RIP: 0010:submit_bh_wbc+0x18b/0x190
        ...
        Call Trace:
         jbd2_write_superblock+0xf1/0x230 [jbd2]
         ? account_entity_enqueue+0xc5/0xf0
         jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
         jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
         ? __switch_to_asm+0x40/0x70
         ...
         ? lock_timer_base+0x67/0x80
         kjournald2+0x121/0x360 [jbd2]
         ? remove_wait_queue+0x60/0x60
         kthread+0xf8/0x130
         ? commit_timeout+0x10/0x10 [jbd2]
         ? kthread_bind+0x10/0x10
         ret_from_fork+0x35/0x40
      
      With __invalidate_device(), I no longer hit the BUG_ON with sync or
      unmount on the disconnected device.
      
      Fixes: 29eaadc0 ("nbd: stop using the bdev everywhere")
      Cc: linux-block@vger.kernel.org
      Cc: Ratna Manoj Bolla <manoj.br@gmail.com>
      Cc: nbd@other.debian.org
      Cc: stable@vger.kernel.org
      Cc: David Woodhouse <dwmw@amazon.com>
      Reviewed-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Munehisa Kamata <kamatam@amazon.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  12. 11 Jul, 2019 2 commits
  13. 31 May, 2019 1 commit
  14. 28 Apr, 2019 3 commits
    • genetlink: optionally validate strictly/dumps · ef6243ac
      Committed by Johannes Berg
      Add options to strictly validate messages and dump messages.
      Validating dump messages non-strictly may sometimes be required,
      so add an option for that as well.
      
      Since none of this can really be applied to existing commands,
      set the options everywhere using the following spatch:
      
          @@
          identifier ops;
          expression X;
          @@
          struct genl_ops ops[] = {
          ...,
           {
                  .cmd = X,
          +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
                  ...
           },
          ...
          };
      
      For new commands one should just not copy the .validate 'opt-out'
      flags and thus get strict validation.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netlink: make validation more configurable for future strictness · 8cb08174
      Committed by Johannes Berg
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
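The four strictness options listed in this commit compose naturally as a bitmask. The sketch below is a userspace model with hypothetical `MODEL_` names and a deliberately simplified attribute check. It assumes that an expected length of 0 stands for an unspecified policy entry and that attributes shorter than expected are always rejected. It also checks per-attribute rules only, so the TRAILING option appears as a flag but is not exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag names modelled on the four options listed above. */
enum model_validate {
    MODEL_TRAILING     = 1 << 0, /* reject trailing data after attributes */
    MODEL_MAXTYPE      = 1 << 1, /* reject attrs > max known type */
    MODEL_UNSPEC       = 1 << 2, /* reject attrs with an unspecified policy */
    MODEL_STRICT_ATTRS = 1 << 3, /* strictly validate attribute size */
};

#define MODEL_LIBERAL 0u
#define MODEL_DEPRECATED_STRICT (MODEL_TRAILING | MODEL_MAXTYPE)
#define MODEL_STRICT (MODEL_TRAILING | MODEL_MAXTYPE | \
                      MODEL_UNSPEC | MODEL_STRICT_ATTRS)

struct model_attr {
    int type;
    size_t len;
};

/* Validate one attribute against a one-entry "policy".
 * expected_len == 0 models an unspecified policy entry (an assumption). */
bool model_attr_ok(struct model_attr a, int max_type, size_t expected_len,
                   unsigned int validate)
{
    assert(max_type >= 0);
    if (a.type > max_type)
        return !(validate & MODEL_MAXTYPE);
    if (expected_len == 0)
        return !(validate & MODEL_UNSPEC);
    if (a.len < expected_len)
        return false; /* too short: rejected in every mode of this model */
    if (a.len > expected_len)
        return !(validate & MODEL_STRICT_ATTRS);
    return true;
}
```

With this layout the "deprecated strict" level is just two of the four bits, which matches the description of the old *_strict() behaviour as a combination of TRAILING and MAXTYPE, and the default for new users is all four bits.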
    • netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Committed by Michal Kubecek
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
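The compatibility concern in this commit, userspace reading nlattr::nla_type directly instead of masking out the flags, can be shown with plain integer arithmetic. The constants below mirror the flag values from the uapi header `<linux/netlink.h>` but are redefined under `MODEL_` names so the sketch builds anywhere; the two helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror the uapi flag bits; the MODEL_ names are local to this sketch. */
#define MODEL_NLA_F_NESTED        (1u << 15)
#define MODEL_NLA_F_NET_BYTEORDER (1u << 14)
#define MODEL_NLA_FLAGS (MODEL_NLA_F_NESTED | MODEL_NLA_F_NET_BYTEORDER)
#define MODEL_NLA_TYPE_MASK ((uint16_t)~MODEL_NLA_FLAGS)

/* What a flag-adding nest-start wrapper would store in nla_type. */
uint16_t model_nest_type(uint16_t attrtype)
{
    assert((attrtype & MODEL_NLA_FLAGS) == 0); /* caller passes a bare type */
    return (uint16_t)(attrtype | MODEL_NLA_F_NESTED);
}

/* Flag-agnostic type lookup, as a masking helper would do. */
uint16_t model_attr_type(uint16_t nla_type)
{
    return (uint16_t)(nla_type & MODEL_NLA_TYPE_MASK);
}
```

A parser that compares the raw field against the bare type stops matching once the flag is set, while one that masks first is unaffected, which is why the flag could not simply be added to every existing caller.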
  15. 27 Apr, 2019 2 commits
  16. 22 Mar, 2019 1 commit
    • genetlink: make policy common to family · 3b0f31f2
      Committed by Johannes Berg
      Since maxattr is common, the policy can't really differ sanely,
      so make it common as well.
      
      The only user that did in fact manage to make a non-common policy
      is taskstats, which has to be really careful about it (since it's
      still using a common maxattr!). This is no longer supported, but
      we can fake it using pre_doit.
      
      This reduces the size of e.g. nl80211.o (which has lots of commands):
      
         text	   data	    bss	    dec	    hex	filename
       398745	  14323	   2240	 415308	  6564c	net/wireless/nl80211.o (before)
       397913	  14331	   2240	 414484	  65314	net/wireless/nl80211.o (after)
      --------------------------------
         -832      +8       0    -824
      
      Which is obviously just 8 bytes for each command, and an added 8
      bytes for the new policy pointer. I'm not sure why the ops list is
      counted as .text though.
      
      Most of the code transformations were done using the following spatch:
          @ops@
          identifier OPS;
          expression POLICY;
          @@
          struct genl_ops OPS[] = {
          ...,
           {
          -	.policy = POLICY,
           },
          ...
          };
      
          @@
          identifier ops.OPS;
          expression ops.POLICY;
          identifier fam;
          expression M;
          @@
          struct genl_family fam = {
                  .ops = OPS,
                  .maxattr = M,
          +       .policy = POLICY,
                  ...
          };
      
      This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
      the cb->data as ops, which we want to change in a later genl patch.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 01 Mar, 2019 1 commit
  18. 15 Feb, 2019 1 commit
  19. 15 Jan, 2019 1 commit
  20. 09 Nov, 2018 1 commit
  21. 24 Oct, 2018 1 commit
    • iov_iter: Separate type from direction and use accessor functions · aa563d7b
      Committed by David Howells
      In the iov_iter struct, separate the iterator type from the iterator
      direction and use accessor functions to access them in most places.
      
      Convert a bunch of places to use switch-statements to access them rather
      than chains of bitwise-AND statements.  This makes it easier to add further
      iterator types.  Also, this can be more efficient as to implement a switch
      of small contiguous integers, the compiler can use ~50% fewer compare
      instructions than it has to use bitwise-and instructions.
      
      Further, cease passing the iterator type into the iterator setup function.
      The iterator function can set that itself.  Only the direction is required.
      Signed-off-by: David Howells <dhowells@redhat.com>
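The design described here, a small contiguous type code kept apart from the direction and reached through accessors, might look like the following in a userspace model. All names are hypothetical stand-ins and do not reproduce the kernel's iov_iter definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: contiguous type codes plus a separate direction. */
enum model_iter_type { MODEL_IOVEC, MODEL_KVEC, MODEL_BVEC, MODEL_PIPE };
enum model_iter_dir { MODEL_READ, MODEL_WRITE };

struct model_iter {
    enum model_iter_type type;
    enum model_iter_dir dir;
};

/* Accessors keep callers away from the field layout. */
enum model_iter_type model_iter_type_of(const struct model_iter *i)
{
    assert(i != NULL);
    return i->type;
}

int model_iter_is_write(const struct model_iter *i)
{
    assert(i != NULL);
    return i->dir == MODEL_WRITE;
}

/* Setup takes only the direction; the function fixes the type itself. */
struct model_iter model_kvec_init(enum model_iter_dir dir)
{
    struct model_iter i = { MODEL_KVEC, dir };
    return i;
}

/* A switch over the type replaces a chain of bitwise-AND tests.
 * Returns 1 if the segments are modelled as user-space memory. */
int model_iter_is_user_memory(const struct model_iter *i)
{
    switch (model_iter_type_of(i)) {
    case MODEL_IOVEC:
        return 1;
    case MODEL_KVEC:
    case MODEL_BVEC:
    case MODEL_PIPE:
        return 0;
    }
    return 0;
}
```

Adding a further iterator type in this layout means adding one enumerator and one case label, rather than finding a free bit and extending every mask test.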
  22. 05 Sep, 2018 1 commit
  23. 21 Jul, 2018 1 commit
  24. 17 Jul, 2018 2 commits
    • nbd: handle unexpected replies better · 8f3ea359
      Committed by Josef Bacik
      If the server or network is misbehaving and we get an unexpected reply
      we can sometimes miss the request not being started and wait on a
      request and never get a response, or even double complete the same
      request.  Fix this by replacing the send_complete completion with just a
      per command lock.  Add a per command cookie as well so that we can know
      if we're getting a double completion for a previous event.  Also check
      to make sure we don't have REQUEUED set as that means we raced with the
      timeout handler and need to just let the retry occur.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
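One way a per-command cookie can expose a stale or duplicate reply is to fold it into the handle sent with each request. The sketch below is a userspace model under assumptions: the 32/32 bit split, the struct, and both function names are hypothetical, and the commit message does not specify the encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_COOKIE_BITS 32 /* assumed split: high half cookie, low half tag */

/* Hypothetical per-command state. */
struct model_tagged_cmd {
    uint32_t tag;    /* reused every time the slot is reused */
    uint32_t cookie; /* bumped on every (re)issue of the tag */
};

/* Bump the cookie and build the handle that travels with the request. */
uint64_t model_issue(struct model_tagged_cmd *cmd)
{
    assert(cmd != NULL);
    cmd->cookie++;
    return ((uint64_t)cmd->cookie << MODEL_COOKIE_BITS) | cmd->tag;
}

/* A reply is current only if its cookie matches the latest issue. */
bool model_reply_is_current(const struct model_tagged_cmd *cmd, uint64_t handle)
{
    uint32_t tag = (uint32_t)handle;
    uint32_t cookie = (uint32_t)(handle >> MODEL_COOKIE_BITS);

    return tag == cmd->tag && cookie == cmd->cookie;
}
```

A reply carrying the handle of an earlier issue of the same tag fails the comparison, so it can be treated as a double completion for a previous event instead of completing the current request.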
    • nbd: don't requeue the same request twice. · d7d94d48
      Committed by Josef Bacik
      We can race with the snd timeout and the per-request timeout and end up
      requeuing the same request twice.  We can't use the send_complete
      completion to tell if everything is ok because we hold the tx_lock
      during send, so the timeout stuff will block waiting to mark the socket
      dead, and we could be marked complete and still requeue.  Instead add a
      flag to the socket so we know whether we've been requeued yet.
      Signed-off-by: Josef Bacik <josef@toxicpanda.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
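A "requeued yet?" flag of the kind this commit describes can be modelled with an atomic test-and-set, so that only the first of two racing timeout paths performs the requeue. This is a userspace sketch using C11 `<stdatomic.h>`; the struct and function are hypothetical, and where the real flag lives is not taken from the source.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request state for the model. */
struct model_request {
    atomic_flag requeued; /* set once the request has been requeued */
    int requeue_count;    /* how many requeues were actually performed */
};

/* Returns true if this caller won the right to requeue the request. */
bool model_requeue_once(struct model_request *req)
{
    assert(req != NULL);
    if (atomic_flag_test_and_set(&req->requeued))
        return false; /* another path already requeued it */
    req->requeue_count++; /* only the winner reaches this line */
    return true;
}
```

The flag answers the question directly, which avoids inferring the state from a completion that may be signalled while the send path still holds its lock.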
  25. 21 Jun, 2018 1 commit
  26. 05 Jun, 2018 2 commits