1. 18 9月, 2019 2 次提交
    • X
      nbd: fix possible page fault for nbd disk · 8454d685
      Xiubo Li 提交于
      When the NBD_CFLAG_DESTROY_ON_DISCONNECT flag is set and at the same
      time when the socket is closed due to the server daemon is restarted,
      just before the last DISCONNET is totally done if we start a new connection
      by using the old nbd_index, there will be crashing randomly, like:
      
      <3>[  110.151949] block nbd1: Receive control failed (result -32)
      <1>[  110.152024] BUG: unable to handle page fault for address: 0000058000000840
      <1>[  110.152063] #PF: supervisor read access in kernel mode
      <1>[  110.152083] #PF: error_code(0x0000) - not-present page
      <6>[  110.152094] PGD 0 P4D 0
      <4>[  110.152106] Oops: 0000 [#1] SMP PTI
      <4>[  110.152120] CPU: 0 PID: 6698 Comm: kworker/u5:1 Kdump: loaded Not tainted 5.3.0-rc4+ #2
      <4>[  110.152136] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
      <4>[  110.152166] Workqueue: knbd-recv recv_work [nbd]
      <4>[  110.152187] RIP: 0010:__dev_printk+0xd/0x67
      <4>[  110.152206] Code: 10 e8 c5 fd ff ff 48 8b 4c 24 18 65 48 33 0c 25 28 00 [...]
      <4>[  110.152244] RSP: 0018:ffffa41581f13d18 EFLAGS: 00010206
      <4>[  110.152256] RAX: ffffa41581f13d30 RBX: ffff96dd7374e900 RCX: 0000000000000000
      <4>[  110.152271] RDX: ffffa41581f13d20 RSI: 00000580000007f0 RDI: ffffffff970ec24f
      <4>[  110.152285] RBP: ffffa41581f13d80 R08: ffff96dd7fc17908 R09: 0000000000002e56
      <4>[  110.152299] R10: ffffffff970ec24f R11: 0000000000000003 R12: ffff96dd7374e900
      <4>[  110.152313] R13: 0000000000000000 R14: ffff96dd7374e9d8 R15: ffff96dd6e3b02c8
      <4>[  110.152329] FS:  0000000000000000(0000) GS:ffff96dd7fc00000(0000) knlGS:0000000000000000
      <4>[  110.152362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      <4>[  110.152383] CR2: 0000058000000840 CR3: 0000000067cc6002 CR4: 00000000001606f0
      <4>[  110.152401] Call Trace:
      <4>[  110.152422]  _dev_err+0x6c/0x83
      <4>[  110.152435]  nbd_read_stat.cold+0xda/0x578 [nbd]
      <4>[  110.152448]  ? __switch_to_asm+0x34/0x70
      <4>[  110.152468]  ? __switch_to_asm+0x40/0x70
      <4>[  110.152478]  ? __switch_to_asm+0x34/0x70
      <4>[  110.152491]  ? __switch_to_asm+0x40/0x70
      <4>[  110.152501]  ? __switch_to_asm+0x34/0x70
      <4>[  110.152511]  ? __switch_to_asm+0x40/0x70
      <4>[  110.152522]  ? __switch_to_asm+0x34/0x70
      <4>[  110.152533]  recv_work+0x35/0x9e [nbd]
      <4>[  110.152547]  process_one_work+0x19d/0x340
      <4>[  110.152558]  worker_thread+0x50/0x3b0
      <4>[  110.152568]  kthread+0xfb/0x130
      <4>[  110.152577]  ? process_one_work+0x340/0x340
      <4>[  110.152609]  ? kthread_park+0x80/0x80
      <4>[  110.152637]  ret_from_fork+0x35/0x40
      
      This is very easy to reproduce by running the nbd-runner.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NXiubo Li <xiubli@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8454d685
    • X
      nbd: rename the runtime flags as NBD_RT_ prefixed · ec76a7b9
      Xiubo Li 提交于
      Preparing for the destory when disconnecting crash fixing.
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NXiubo Li <xiubli@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ec76a7b9
  2. 21 8月, 2019 5 次提交
  3. 31 7月, 2019 1 次提交
    • M
      nbd: replace kill_bdev() with __invalidate_device() again · 2b5c8f00
      Munehisa Kamata 提交于
      Commit abbbdf12 ("replace kill_bdev() with __invalidate_device()")
      once did this, but 29eaadc0 ("nbd: stop using the bdev everywhere")
      resurrected kill_bdev() and it has been there since then. So buffer_head
      mappings still get killed on a server disconnection, and we can still
      hit the BUG_ON on a filesystem on the top of the nbd device.
      
        EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
        block nbd0: Receive control failed (result -32)
        block nbd0: shutting down sockets
        print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
        EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
        print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
        EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
        EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
        ------------[ cut here ]------------
        kernel BUG at fs/buffer.c:3057!
        invalid opcode: 0000 [#1] SMP PTI
        CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
        Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
        RIP: 0010:submit_bh_wbc+0x18b/0x190
        ...
        Call Trace:
         jbd2_write_superblock+0xf1/0x230 [jbd2]
         ? account_entity_enqueue+0xc5/0xf0
         jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
         jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
         ? __switch_to_asm+0x40/0x70
         ...
         ? lock_timer_base+0x67/0x80
         kjournald2+0x121/0x360 [jbd2]
         ? remove_wait_queue+0x60/0x60
         kthread+0xf8/0x130
         ? commit_timeout+0x10/0x10 [jbd2]
         ? kthread_bind+0x10/0x10
         ret_from_fork+0x35/0x40
      
      With __invalidate_device(), I no longer hit the BUG_ON with sync or
      unmount on the disconnected device.
      
      Fixes: 29eaadc0 ("nbd: stop using the bdev everywhere")
      Cc: linux-block@vger.kernel.org
      Cc: Ratna Manoj Bolla <manoj.br@gmail.com>
      Cc: nbd@other.debian.org
      Cc: stable@vger.kernel.org
      Cc: David Woodhouse <dwmw@amazon.com>
      Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NMunehisa Kamata <kamatam@amazon.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2b5c8f00
  4. 11 7月, 2019 2 次提交
  5. 31 5月, 2019 1 次提交
  6. 28 4月, 2019 3 次提交
    • J
      genetlink: optionally validate strictly/dumps · ef6243ac
      Johannes Berg 提交于
      Add options to strictly validate messages and dump messages,
      sometimes perhaps validating dump messages non-strictly may
      be required, so add an option for that as well.
      
      Since none of this can really be applied to existing commands,
      set the options everwhere using the following spatch:
      
          @@
          identifier ops;
          expression X;
          @@
          struct genl_ops ops[] = {
          ...,
           {
                  .cmd = X,
          +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
                  ...
           },
          ...
          };
      
      For new commands one should just not copy the .validate 'opt-out'
      flags and thus get strict validation.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef6243ac
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
    • M
      netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek 提交于
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae0be8de
  7. 27 4月, 2019 2 次提交
  8. 22 3月, 2019 1 次提交
    • J
      genetlink: make policy common to family · 3b0f31f2
      Johannes Berg 提交于
      Since maxattr is common, the policy can't really differ sanely,
      so make it common as well.
      
      The only user that did in fact manage to make a non-common policy
      is taskstats, which has to be really careful about it (since it's
      still using a common maxattr!). This is no longer supported, but
      we can fake it using pre_doit.
      
      This reduces the size of e.g. nl80211.o (which has lots of commands):
      
         text	   data	    bss	    dec	    hex	filename
       398745	  14323	   2240	 415308	  6564c	net/wireless/nl80211.o (before)
       397913	  14331	   2240	 414484	  65314	net/wireless/nl80211.o (after)
      --------------------------------
         -832      +8       0    -824
      
      Which is obviously just 8 bytes for each command, and an added 8
      bytes for the new policy pointer. I'm not sure why the ops list is
      counted as .text though.
      
      Most of the code transformations were done using the following spatch:
          @ops@
          identifier OPS;
          expression POLICY;
          @@
          struct genl_ops OPS[] = {
          ...,
           {
          -	.policy = POLICY,
           },
          ...
          };
      
          @@
          identifier ops.OPS;
          expression ops.POLICY;
          identifier fam;
          expression M;
          @@
          struct genl_family fam = {
                  .ops = OPS,
                  .maxattr = M,
          +       .policy = POLICY,
                  ...
          };
      
      This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
      the cb->data as ops, which we want to change in a later genl patch.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b0f31f2
  9. 01 3月, 2019 1 次提交
  10. 15 2月, 2019 1 次提交
  11. 15 1月, 2019 1 次提交
  12. 09 11月, 2018 1 次提交
  13. 24 10月, 2018 1 次提交
    • D
      iov_iter: Separate type from direction and use accessor functions · aa563d7b
      David Howells 提交于
      In the iov_iter struct, separate the iterator type from the iterator
      direction and use accessor functions to access them in most places.
      
      Convert a bunch of places to use switch-statements to access them rather
      then chains of bitwise-AND statements.  This makes it easier to add further
      iterator types.  Also, this can be more efficient as to implement a switch
      of small contiguous integers, the compiler can use ~50% fewer compare
      instructions than it has to use bitwise-and instructions.
      
      Further, cease passing the iterator type into the iterator setup function.
      The iterator function can set that itself.  Only the direction is required.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      aa563d7b
  14. 05 9月, 2018 1 次提交
  15. 21 7月, 2018 1 次提交
  16. 17 7月, 2018 2 次提交
    • J
      nbd: handle unexpected replies better · 8f3ea359
      Josef Bacik 提交于
      If the server or network is misbehaving and we get an unexpected reply
      we can sometimes miss the request not being started and wait on a
      request and never get a response, or even double complete the same
      request.  Fix this by replacing the send_complete completion with just a
      per command lock.  Add a per command cookie as well so that we can know
      if we're getting a double completion for a previous event.  Also check
      to make sure we dont have REQUEUED set as that means we raced with the
      timeout handler and need to just let the retry occur.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8f3ea359
    • J
      nbd: don't requeue the same request twice. · d7d94d48
      Josef Bacik 提交于
      We can race with the snd timeout and the per-request timeout and end up
      requeuing the same request twice.  We can't use the send_complete
      completion to tell if everything is ok because we hold the tx_lock
      during send, so the timeout stuff will block waiting to mark the socket
      dead, and we could be marked complete and still requeue.  Instead add a
      flag to the socket so we know whether we've been requeued yet.
      Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      d7d94d48
  17. 21 6月, 2018 1 次提交
  18. 05 6月, 2018 2 次提交
  19. 31 5月, 2018 2 次提交
  20. 29 5月, 2018 2 次提交
  21. 25 5月, 2018 1 次提交
  22. 24 5月, 2018 1 次提交
  23. 23 5月, 2018 1 次提交
  24. 17 5月, 2018 4 次提交