提交 · dcb77e4b274b8f13ac6482dfb09160cd2fae9a40 · openeuler / Kernel

26 10月, 2019 3 次提交

nbd: verify socket is supported during setup · cf1b2326

由 Mike Christie 提交于 10月 17, 2019

nbd requires socket families to support the shutdown method so the nbd
recv workqueue can be woken up from its sock_recvmsg call. If the socket
does not support the callout we will leave recv works running or get hangs
later when the device or module is removed.

This adds a check during socket connection/reconnection to make sure the
socket being passed in supports the needed callout.

Reported-by: syzbot+24c12fa8d218ed26011a@syzkaller.appspotmail.com
Fixes: e9e006f5 ("nbd: fix max number of supported devs")
Tested-by: NRichard W.M. Jones <rjones@redhat.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cf1b2326

nbd: handle racing with error'ed out commands · 7ce23e8e

由 Josef Bacik 提交于 10月 21, 2019

We hit the following warning in production

print_req_error: I/O error, dev nbd0, sector 7213934408 flags 80700
------------[ cut here ]------------
refcount_t: underflow; use-after-free.
WARNING: CPU: 25 PID: 32407 at lib/refcount.c:190 refcount_sub_and_test_checked+0x53/0x60
Workqueue: knbd-recv recv_work [nbd]
RIP: 0010:refcount_sub_and_test_checked+0x53/0x60
Call Trace:
 blk_mq_free_request+0xb7/0xf0
 blk_mq_complete_request+0x62/0xf0
 recv_work+0x29/0xa1 [nbd]
 process_one_work+0x1f5/0x3f0
 worker_thread+0x2d/0x3d0
 ? rescuer_thread+0x340/0x340
 kthread+0x111/0x130
 ? kthread_create_on_node+0x60/0x60
 ret_from_fork+0x1f/0x30
---[ end trace b079c3c67f98bb7c ]---

This was preceded by us timing out everything and shutting down the
sockets for the device.  The problem is we had a request in the queue at
the same time, so we completed the request twice.  This can actually
happen in a lot of cases, we fail to get a ref on our config, we only
have one connection and just error out the command, etc.

Fix this by checking cmd->status in nbd_read_stat.  We only change this
under the cmd->lock, so we are safe to check this here and see if we've
already error'ed this command out, which would indicate that we've
completed it as well.
Reviewed-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7ce23e8e

nbd: protect cmd->status with cmd->lock · de6346ec

由 Josef Bacik 提交于 10月 21, 2019

We already do this for the most part, except in timeout and clear_req.
For the timeout case we take the lock after we grab a ref on the config,
but that isn't really necessary because we're safe to touch the cmd at
this point, so just move the order around.

For the clear_req cause this is initiated by the user, so again is safe.
Reviewed-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

de6346ec

10 10月, 2019 1 次提交

nbd: fix possible sysfs duplicate warning · 86248810

由 Xiubo Li 提交于 9月 19, 2019

1. nbd_put takes the mutex and drops nbd->ref to 0. It then does
idr_remove and drops the mutex.

2. nbd_genl_connect takes the mutex. idr_find/idr_for_each fails
to find an existing device, so it does nbd_dev_add.

3. just before the nbd_put could call nbd_dev_remove or not finished
totally, but if nbd_dev_add try to add_disk, we can hit:

debugfs: Directory 'nbd1' with parent 'block' already present!

This patch will make sure all the disk add/remove stuff are done
by holding the nbd_index_mutex lock.
Reported-by: NMike Christie <mchristi@redhat.com>
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NXiubo Li <xiubli@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

86248810

18 9月, 2019 2 次提交

nbd: fix possible page fault for nbd disk · 8454d685

由 Xiubo Li 提交于 9月 17, 2019

When the NBD_CFLAG_DESTROY_ON_DISCONNECT flag is set and at the same
time when the socket is closed due to the server daemon is restarted,
just before the last DISCONNET is totally done if we start a new connection
by using the old nbd_index, there will be crashing randomly, like:

<3>[  110.151949] block nbd1: Receive control failed (result -32)
<1>[  110.152024] BUG: unable to handle page fault for address: 0000058000000840
<1>[  110.152063] #PF: supervisor read access in kernel mode
<1>[  110.152083] #PF: error_code(0x0000) - not-present page
<6>[  110.152094] PGD 0 P4D 0
<4>[  110.152106] Oops: 0000 [#1] SMP PTI
<4>[  110.152120] CPU: 0 PID: 6698 Comm: kworker/u5:1 Kdump: loaded Not tainted 5.3.0-rc4+ #2
<4>[  110.152136] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
<4>[  110.152166] Workqueue: knbd-recv recv_work [nbd]
<4>[  110.152187] RIP: 0010:__dev_printk+0xd/0x67
<4>[  110.152206] Code: 10 e8 c5 fd ff ff 48 8b 4c 24 18 65 48 33 0c 25 28 00 [...]
<4>[  110.152244] RSP: 0018:ffffa41581f13d18 EFLAGS: 00010206
<4>[  110.152256] RAX: ffffa41581f13d30 RBX: ffff96dd7374e900 RCX: 0000000000000000
<4>[  110.152271] RDX: ffffa41581f13d20 RSI: 00000580000007f0 RDI: ffffffff970ec24f
<4>[  110.152285] RBP: ffffa41581f13d80 R08: ffff96dd7fc17908 R09: 0000000000002e56
<4>[  110.152299] R10: ffffffff970ec24f R11: 0000000000000003 R12: ffff96dd7374e900
<4>[  110.152313] R13: 0000000000000000 R14: ffff96dd7374e9d8 R15: ffff96dd6e3b02c8
<4>[  110.152329] FS:  0000000000000000(0000) GS:ffff96dd7fc00000(0000) knlGS:0000000000000000
<4>[  110.152362] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  110.152383] CR2: 0000058000000840 CR3: 0000000067cc6002 CR4: 00000000001606f0
<4>[  110.152401] Call Trace:
<4>[  110.152422]  _dev_err+0x6c/0x83
<4>[  110.152435]  nbd_read_stat.cold+0xda/0x578 [nbd]
<4>[  110.152448]  ? __switch_to_asm+0x34/0x70
<4>[  110.152468]  ? __switch_to_asm+0x40/0x70
<4>[  110.152478]  ? __switch_to_asm+0x34/0x70
<4>[  110.152491]  ? __switch_to_asm+0x40/0x70
<4>[  110.152501]  ? __switch_to_asm+0x34/0x70
<4>[  110.152511]  ? __switch_to_asm+0x40/0x70
<4>[  110.152522]  ? __switch_to_asm+0x34/0x70
<4>[  110.152533]  recv_work+0x35/0x9e [nbd]
<4>[  110.152547]  process_one_work+0x19d/0x340
<4>[  110.152558]  worker_thread+0x50/0x3b0
<4>[  110.152568]  kthread+0xfb/0x130
<4>[  110.152577]  ? process_one_work+0x340/0x340
<4>[  110.152609]  ? kthread_park+0x80/0x80
<4>[  110.152637]  ret_from_fork+0x35/0x40

This is very easy to reproduce by running the nbd-runner.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NXiubo Li <xiubli@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8454d685

nbd: rename the runtime flags as NBD_RT_ prefixed · ec76a7b9

由 Xiubo Li 提交于 9月 17, 2019

Preparing for the destory when disconnecting crash fixing.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NXiubo Li <xiubli@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ec76a7b9

21 8月, 2019 5 次提交

nbd: fix max number of supported devs · e9e006f5

由 Mike Christie 提交于 8月 04, 2019

This fixes a bug added in 4.10 with commit:

commit 9561a7ad
Author: Josef Bacik <jbacik@fb.com>
Date:   Tue Nov 22 14:04:40 2016 -0500

    nbd: add multi-connection support

that limited the number of devices to 256. Before the patch we could
create 1000s of devices, but the patch switched us from using our
own thread to using a work queue which has a default limit of 256
active works.

The problem is that our recv_work function sits in a loop until
disconnection but only handles IO for one connection. The work is
started when the connection is started/restarted, but if we end up
creating 257 or more connections, the queue_work call just queues
connection257+'s recv_work and that waits for connection 1 - 256's
recv_work to be disconnected and that work instance completing.

Instead of reverting back to kthreads, this has us allocate a
workqueue_struct per device, so we can block in the work.

Cc: stable@vger.kernel.org
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e9e006f5

nbd: fix zero cmd timeout handling v2 · 2da22da5

由 Mike Christie 提交于 8月 13, 2019

This fixes a regression added in 4.9 with commit:

commit 0eadf37a
Author: Josef Bacik <jbacik@fb.com>
Date:   Thu Sep 8 12:33:40 2016 -0700

    nbd: allow block mq to deal with timeouts

where before the patch userspace would set the timeout to 0 to disable
it. With the above patch, a zero timeout tells the block layer to use
the default value of 30 seconds. For setups where commands can take a
long time or experience transient issues like network disruptions this
then results in IO errors being sent to the application.

To fix this, the patch still uses the common block layer timeout
framework, but if zero is set, nbd just logs a message and then resets
the timer when it expires.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2da22da5

nbd: add missing config put · 887e975c

由 Mike Christie 提交于 8月 13, 2019

Fix bug added with the patch:

commit 8f3ea359
Author: Josef Bacik <josef@toxicpanda.com>
Date:   Mon Jul 16 12:11:35 2018 -0400

    nbd: handle unexpected replies better

where if the timeout handler runs when the completion path is and we fail
to grab the mutex in the timeout handler we will leave a config reference
and cannot free the config later.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

887e975c

nbd: add function to convert blk req op to nbd cmd · 00514677

由 Mike Christie 提交于 8月 13, 2019

This adds a helper function to convert a block req op to a nbd cmd type.
It will be used in the last patch to log the type in the timeout
handler.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

00514677

nbd: add set cmd timeout helper · 55313e92

由 Mike Christie 提交于 8月 13, 2019

Add a helper to set the cmd timeout. It does not really do a lot now,
but will be more useful in the next patches.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

55313e92

31 7月, 2019 1 次提交

nbd: replace kill_bdev() with __invalidate_device() again · 2b5c8f00

由 Munehisa Kamata 提交于 7月 31, 2019

Commit abbbdf12 ("replace kill_bdev() with __invalidate_device()")
once did this, but 29eaadc0 ("nbd: stop using the bdev everywhere")
resurrected kill_bdev() and it has been there since then. So buffer_head
mappings still get killed on a server disconnection, and we can still
hit the BUG_ON on a filesystem on the top of the nbd device.

  EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
  block nbd0: Receive control failed (result -32)
  block nbd0: shutting down sockets
  print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
  EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
  print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
  EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
  EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
  ------------[ cut here ]------------
  kernel BUG at fs/buffer.c:3057!
  invalid opcode: 0000 [#1] SMP PTI
  CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
  Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
  RIP: 0010:submit_bh_wbc+0x18b/0x190
  ...
  Call Trace:
   jbd2_write_superblock+0xf1/0x230 [jbd2]
   ? account_entity_enqueue+0xc5/0xf0
   jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
   jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
   ? __switch_to_asm+0x40/0x70
   ...
   ? lock_timer_base+0x67/0x80
   kjournald2+0x121/0x360 [jbd2]
   ? remove_wait_queue+0x60/0x60
   kthread+0xf8/0x130
   ? commit_timeout+0x10/0x10 [jbd2]
   ? kthread_bind+0x10/0x10
   ret_from_fork+0x35/0x40

With __invalidate_device(), I no longer hit the BUG_ON with sync or
unmount on the disconnected device.

Fixes: 29eaadc0 ("nbd: stop using the bdev everywhere")
Cc: linux-block@vger.kernel.org
Cc: Ratna Manoj Bolla <manoj.br@gmail.com>
Cc: nbd@other.debian.org
Cc: stable@vger.kernel.org
Cc: David Woodhouse <dwmw@amazon.com>
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMunehisa Kamata <kamatam@amazon.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2b5c8f00

11 7月, 2019 2 次提交

nbd: add netlink reconfigure resize support · 4ddeaae8

由 Mike Christie 提交于 5月 29, 2019

If the device is setup with ioctl we can resize the device after the
initial setup, but if the device is setup with netlink we cannot use the
resize related ioctls and there is no netlink reconfigure size ATTR
handling code.

This patch adds netlink reconfigure resize support to match the ioctl
interface.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

4ddeaae8

nbd: fix crash when the blksize is zero · 553768d1

由 Xiubo Li 提交于 5月 29, 2019

This will allow the blksize to be set zero and then use 1024 as
default.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NXiubo Li <xiubli@redhat.com>
[fix to use goto out instead of return in genl_connect]
Signed-off-by: NMike Christie <mchristi@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

553768d1

31 5月, 2019 1 次提交

treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 127 · eb1fe3bf

由 Thomas Gleixner 提交于 5月 24, 2019

Based on 1 normalized pattern(s):

  this file is released under gplv2 or later

extracted by the scancode license scanner the SPDX license identifier

  GPL-2.0-or-later

has been chosen to replace the boilerplate/reference in 1 file(s).
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Reviewed-by: NRichard Fontana <rfontana@redhat.com>
Reviewed-by: NKate Stewart <kstewart@linuxfoundation.org>
Reviewed-by: NAllison Randal <allison@lohutok.net>
Reviewed-by: NArmijn Hemel <armijn@tjaldur.nl>
Cc: linux-spdx@vger.kernel.org
Link: https://lkml.kernel.org/r/20190527063114.295960793@linutronix.de
Link: https://lkml.kernel.org/r/20190524100843.018830140@linutronix.deSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

eb1fe3bf

28 4月, 2019 3 次提交

genetlink: optionally validate strictly/dumps · ef6243ac

由 Johannes Berg 提交于 4月 26, 2019

Add options to strictly validate messages and dump messages,
sometimes perhaps validating dump messages non-strictly may
be required, so add an option for that as well.

Since none of this can really be applied to existing commands,
set the options everwhere using the following spatch:

    @@
    identifier ops;
    expression X;
    @@
    struct genl_ops ops[] = {
    ...,
     {
            .cmd = X,
    +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
            ...
     },
    ...
    };

For new commands one should just not copy the .validate 'opt-out'
flags and thus get strict validation.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ef6243ac

netlink: make validation more configurable for future strictness · 8cb08174

由 Johannes Berg 提交于 4月 26, 2019

We currently have two levels of strict validation:

 1) liberal (default)
     - undefined (type >= max) & NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted
     - garbage at end of message accepted
 2) strict (opt-in)
     - NLA_UNSPEC attributes accepted
     - attribute length >= expected accepted

Split out parsing strictness into four different options:
 * TRAILING     - check that there's no trailing data after parsing
                  attributes (in message or nested)
 * MAXTYPE      - reject attrs > max known type
 * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
 * STRICT_ATTRS - strictly validate attribute size

The default for future things should be *everything*.
The current *_strict() is a combination of TRAILING and MAXTYPE,
and is renamed to _deprecated_strict().
The current regular parsing has none of this, and is renamed to
*_parse_deprecated().

Additionally it allows us to selectively set one of the new flags
even on old policies. Notably, the UNSPEC flag could be useful in
this case, since it can be arranged (by filling in the policy) to
not be an incompatible userspace ABI change, but would then going
forward prevent forgetting attribute entries. Similar can apply
to the POLICY flag.

We end up with the following renames:
 * nla_parse           -> nla_parse_deprecated
 * nla_parse_strict    -> nla_parse_deprecated_strict
 * nlmsg_parse         -> nlmsg_parse_deprecated
 * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
 * nla_parse_nested    -> nla_parse_nested_deprecated
 * nla_validate_nested -> nla_validate_nested_deprecated

Using spatch, of course:
    @@
    expression TB, MAX, HEAD, LEN, POL, EXT;
    @@
    -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
    +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, TB, MAX, POL, EXT;
    @@
    -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
    +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)

    @@
    expression TB, MAX, NLA, POL, EXT;
    @@
    -nla_parse_nested(TB, MAX, NLA, POL, EXT)
    +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)

    @@
    expression START, MAX, POL, EXT;
    @@
    -nla_validate_nested(START, MAX, POL, EXT)
    +nla_validate_nested_deprecated(START, MAX, POL, EXT)

    @@
    expression NLH, HDRLEN, MAX, POL, EXT;
    @@
    -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
    +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)

For this patch, don't actually add the strict, non-renamed versions
yet so that it breaks compile if I get it wrong.

Also, while at it, make nla_validate and nla_parse go down to a
common __nla_validate_parse() function to avoid code duplication.

Ultimately, this allows us to have very strict validation for every
new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
next patch, while existing things will continue to work as is.

In effect then, this adds fully strict validation for any new command.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8cb08174

netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de

由 Michal Kubecek 提交于 4月 26, 2019

Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
netlink based interfaces (including recently added ones) are still not
setting it in kernel generated messages. Without the flag, message parsers
not aware of attribute semantics (e.g. wireshark dissector or libmnl's
mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
the structure of their contents.

Unfortunately we cannot just add the flag everywhere as there may be
userspace applications which check nlattr::nla_type directly rather than
through a helper masking out the flags. Therefore the patch renames
nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
are rewritten to use nla_nest_start().

Except for changes in include/net/netlink.h, the patch was generated using
this semantic patch:

@@ expression E1, E2; @@
-nla_nest_start(E1, E2)
+nla_nest_start_noflag(E1, E2)

@@ expression E1, E2; @@
-nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
+nla_nest_start(E1, E2)
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Acked-by: NJiri Pirko <jiri@mellanox.com>
Acked-by: NDavid Ahern <dsahern@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ae0be8de

27 4月, 2019 2 次提交

nbd: add tracepoints for send/receive timing · 2abd2de7

由 Andrew Hall 提交于 4月 26, 2019

This adds four tracepoints to nbd, enabling separate tracing of payload
and header sending/receipt.

In the send path for headers that have already been sent, we also
explicitly initialize the handle so it can be referenced by the later
tracepoint.
Signed-off-by: NAndrew Hall <hall@fb.com>
Signed-off-by: NMatt Mullins <mmullins@fb.com>
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

2abd2de7

nbd: trace sending nbd requests · ea106722

由 Matt Mullins 提交于 4月 26, 2019

This adds a tracepoint that can both observe the nbd request being sent
to the server, as well as modify that request , e.g., setting a flag in
the request that will cause the server to collect detailed tracing data.

The struct request * being handled is included to permit correlation
with the block tracepoints.
Signed-off-by: NMatt Mullins <mmullins@fb.com>
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

ea106722

22 3月, 2019 1 次提交

genetlink: make policy common to family · 3b0f31f2

由 Johannes Berg 提交于 3月 21, 2019

Since maxattr is common, the policy can't really differ sanely,
so make it common as well.

The only user that did in fact manage to make a non-common policy
is taskstats, which has to be really careful about it (since it's
still using a common maxattr!). This is no longer supported, but
we can fake it using pre_doit.

This reduces the size of e.g. nl80211.o (which has lots of commands):

   text	   data	    bss	    dec	    hex	filename
 398745	  14323	   2240	 415308	  6564c	net/wireless/nl80211.o (before)
 397913	  14331	   2240	 414484	  65314	net/wireless/nl80211.o (after)
--------------------------------
   -832      +8       0    -824

Which is obviously just 8 bytes for each command, and an added 8
bytes for the new policy pointer. I'm not sure why the ops list is
counted as .text though.

Most of the code transformations were done using the following spatch:
    @ops@
    identifier OPS;
    expression POLICY;
    @@
    struct genl_ops OPS[] = {
    ...,
     {
    -	.policy = POLICY,
     },
    ...
    };

    @@
    identifier ops.OPS;
    expression ops.POLICY;
    identifier fam;
    expression M;
    @@
    struct genl_family fam = {
            .ops = OPS,
            .maxattr = M,
    +       .policy = POLICY,
            ...
    };

This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
the cb->data as ops, which we want to change in a later genl patch.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b0f31f2

01 3月, 2019 1 次提交

nbd: propagate genlmsg_reply return code · cd46eb89

由 Li RongQing 提交于 2月 19, 2019

genlmsg_reply can fail, so propagate its return code
Signed-off-by: NLi RongQing <lirongqing@baidu.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cd46eb89

15 2月, 2019 1 次提交

block: kill BLK_MQ_F_SG_MERGE · 56d18f62

由 Ming Lei 提交于 2月 15, 2019

QUEUE_FLAG_NO_SG_MERGE has been killed, so kill BLK_MQ_F_SG_MERGE too.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

56d18f62

15 1月, 2019 1 次提交

nbd: Use set_blocksize() to set device blocksize · c8a83a6b

由 Jan Kara 提交于 1月 14, 2019

NBD can update block device block size implicitely through
bd_set_size(). Make it explicitely set blocksize with set_blocksize() as
this behavior of bd_set_size() is going away.

CC: Josef Bacik <jbacik@fb.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c8a83a6b

09 11月, 2018 1 次提交

blk-mq-tag: change busy_iter_fn to return whether to continue or not · 7baa8572

由 Jens Axboe 提交于 11月 08, 2018

We have this functionality in sbitmap, but we don't export it in
blk-mq for users of the tags busy iteration. This can be useful
for stopping the iteration, if the caller doesn't need to find
more requests.
Reviewed-by: NMike Snitzer <snitzer@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7baa8572

24 10月, 2018 1 次提交

iov_iter: Separate type from direction and use accessor functions · aa563d7b

由 David Howells 提交于 10月 20, 2018

In the iov_iter struct, separate the iterator type from the iterator
direction and use accessor functions to access them in most places.

Convert a bunch of places to use switch-statements to access them rather
then chains of bitwise-AND statements. This makes it easier to add further
iterator types. Also, this can be more efficient as to implement a switch
of small contiguous integers, the compiler can use ~50% fewer compare
instructions than it has to use bitwise-and instructions.

Further, cease passing the iterator type into the iterator setup function.
The iterator function can set that itself. Only the direction is required.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

aa563d7b

05 9月, 2018 1 次提交

nbd: don't allow invalid blocksize settings · bc811f05

由 Jens Axboe 提交于 9月 04, 2018

syzbot reports a divide-by-zero off the NBD_SET_BLKSIZE ioctl.
We need proper validation of the input here. Not just if it's
zero, but also if the value is a power-of-2 and in a valid
range. Add that.

Cc: stable@vger.kernel.org
Reported-by: Nsyzbot <syzbot+25dbecbec1e62c6b0dd4@syzkaller.appspotmail.com>
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

bc811f05

21 7月, 2018 1 次提交

nbd: constify nla_policy · a86c4120

由 Stephen Hemminger 提交于 7月 18, 2018

The netlink policy should be const like other drivers.
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a86c4120

17 7月, 2018 2 次提交

nbd: handle unexpected replies better · 8f3ea359

由 Josef Bacik 提交于 7月 16, 2018

If the server or network is misbehaving and we get an unexpected reply
we can sometimes miss the request not being started and wait on a
request and never get a response, or even double complete the same
request. Fix this by replacing the send_complete completion with just a
per command lock. Add a per command cookie as well so that we can know
if we're getting a double completion for a previous event. Also check
to make sure we dont have REQUEUED set as that means we raced with the
timeout handler and need to just let the retry occur.
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8f3ea359

nbd: don't requeue the same request twice. · d7d94d48

由 Josef Bacik 提交于 7月 16, 2018

We can race with the snd timeout and the per-request timeout and end up
requeuing the same request twice. We can't use the send_complete
completion to tell if everything is ok because we hold the tx_lock
during send, so the timeout stuff will block waiting to mark the socket
dead, and we could be marked complete and still requeue. Instead add a
flag to the socket so we know whether we've been requeued yet.
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d7d94d48

21 6月, 2018 1 次提交

nbd: Add the nbd NBD_DISCONNECT_ON_CLOSE config flag. · 08ba91ee

由 Doron Roberts-Kedes 提交于 6月 15, 2018

If NBD_DISCONNECT_ON_CLOSE is set on a device, then the driver will
issue a disconnect from nbd_release if the device has no remaining
bdev->bd_openers.

Fix ret val so reconfigure with only setting the flag succeeds.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NDoron Roberts-Kedes <doronrk@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

08ba91ee

05 6月, 2018 2 次提交

nbd: set discard_alignment to the granularity · 07ce213f

由 Josef Bacik 提交于 6月 05, 2018

Technically we should be able to get away with 0 as the
discard_alignment, but there's no way currently for the protocol to
indicate different alignments, and in real life most disks have
discard_alignment == discard_granularity.  Just set our alignment to our
blocksize to make sure discards will actually work properly with 4k
drives.
Signed-off-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

07ce213f

nbd: Consistently use request pointer in debug messages. · ee57a05c

由 Kevin Vigor 提交于 6月 04, 2018

Existing dev_dbg messages sometimes identify request using request
pointer, sometimes using nbd_cmd pointer. This makes it hard to
follow request flow. Consistently use request pointer instead.
Reviewed-by: NJosef Bacik <jbacik@toxicpanda.com>
Signed-off-by: NKevin Vigor <kvigor@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

ee57a05c

31 5月, 2018 2 次提交

blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter · d250bf4e

由 Christoph Hellwig 提交于 5月 30, 2018

We already check for started commands in all callbacks, but we should
also protect against already completed commands.  Do this by taking
the checks to common code.
Acked-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d250bf4e

nbd: clear DISCONNECT_REQUESTED flag once disconnection occurs. · 5e3c3a7e

由 Kevin Vigor 提交于 5月 30, 2018

When a userspace client requests a NBD device be disconnected, the
DISCONNECT_REQUESTED flag is set. While this flag is set, the driver
will not inform userspace when a connection is closed.

Unfortunately the flag was never cleared, so once a disconnect was
requested the driver would thereafter never tell userspace about a
closed connection. Thus when connections failed due to timeout, no
attempt to reconnect was made and eventually the device would fail.

Fix by clearing the DISCONNECT_REQUESTED flag (and setting the
DISCONNECTED flag) once all connections are closed.
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NKevin Vigor <kvigor@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5e3c3a7e

29 5月, 2018 2 次提交

nbd: complete requests from ->timeout · e5eab017

由 Christoph Hellwig 提交于 5月 29, 2018

By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

e5eab017

block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE · 6600593c

由 Christoph Hellwig 提交于 5月 29, 2018

The BLK_EH_NOT_HANDLED implies nothing happen, but very often that
is not what is happening - instead the driver already completed the
command.  Fix the symbolic name to reflect that a little better.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NHannes Reinecke <hare@suse.com>
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6600593c

25 5月, 2018 1 次提交

block drivers/block: Use octal not symbolic permissions · 5657a819

由 Joe Perches 提交于 5月 24, 2018

Convert the S_<FOO> symbolic permissions to their octal equivalents as
using octal and not symbolic permissions is preferred by many as more
readable.

see: https://lkml.org/lkml/2016/8/2/1945

Done with automated conversion via:
$ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...>

Miscellanea:

o Wrapped modified multi-line calls to a single line where appropriate
o Realign modified multi-line calls to open parenthesis
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

5657a819

24 5月, 2018 1 次提交

nbd: set discard granularity properly · 6df133a1

由 Josef Bacik 提交于 5月 23, 2018

For some reason we had discard granularity set to 512 always even when
discards were disabled.  Fix this by having the default be 0, and then
if we turn it on set the discard granularity to the blocksize.
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

6df133a1

23 5月, 2018 1 次提交

block/ndb: add WQ_UNBOUND to the knbd-recv workqueue · 2189c97c

由 Dan Melnic 提交于 9月 18, 2017

Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound
to a single CPU that is selected at device creation time.
Signed-off-by: NDan Melnic <dmm@fb.com>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

2189c97c

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功