提交 · 07b05bcc3213ac9f8c28c9d835b4bf3d5798cc60 · openeuler / Kernel

22 9月, 2018 3 次提交

blkcg: convert blkg_lookup_create to find closest blkg · 07b05bcc

由 Dennis Zhou (Facebook) 提交于 9月 11, 2018

There are several scenarios where blkg_lookup_create can fail. Examples
include the blkcg dying, request_queue is dying, or simply being OOM. At
the end of the day, most handle this by simply falling back to the
q->root_blkg and calling it a day.

This patch implements the notion of closest blkg. During
blkg_lookup_create, if it fails to create, return the closest blkg
found or the q->root_blkg. blkg_try_get_closest is introduced and used
during association so a bio is always attached to a blkg.
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NDennis Zhou <dennisszhou@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

07b05bcc

blkcg: update blkg_lookup_create to do locking · 49f4c2dc

由 Dennis Zhou (Facebook) 提交于 9月 11, 2018

To know when to create a blkg, the general pattern is to do a
blkg_lookup and if that fails, lock and then do a lookup again and if
that fails finally create. It doesn't make much sense for everyone who
wants to do creation to write this themselves.

This changes blkg_lookup_create to do locking and implement this
pattern. The old blkg_lookup_create is renamed to __blkg_lookup_create.
If a call site wants to do its own error handling or already owns the
queue lock, they can use __blkg_lookup_create. This will be used in
upcoming patches.
Signed-off-by: NDennis Zhou <dennisszhou@gmail.com>
Reviewed-by: NJosef Bacik <josef@toxicpanda.com>
Acked-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NLiu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

49f4c2dc

blkcg: fix ref count issue with bio_blkcg using task_css · 27e6fa99

由 Dennis Zhou (Facebook) 提交于 9月 11, 2018

The accessor function bio_blkcg either returns the blkcg associated with
the bio or finds one in the current context. This can cause an issue
when trying to associate a bio with a blkcg. Particularly, it's the
third case that is problematic:

	return css_to_blkcg(task_css(current, io_cgrp_id));

As the above may race against task migration and the cgroup exiting, it
is not always ok to take a reference on the blkcg returned from
bio_blkcg.

This patch adds association ahead of calling bio_blkcg rather than
after. This makes association a required and explicit step along the
code paths for calling bio_blkcg. blk_get_rl is modified as well to get
a reference to the blkcg it may use and blk_put_rl will always put the
reference back. Association is also moved above the bio_blkcg call to
ensure it will not return NULL in blk-iolatency.

BFQ and CFQ utilize this flaw, but due to the complexity, I do not want
to address this in this series. I've created a private version of the
function with notes not to use it describing the flaw. Hopefully soon,
that code can be cleaned up.
Signed-off-by: NDennis Zhou <dennisszhou@gmail.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

27e6fa99

21 9月, 2018 1 次提交

Blk-throttle: update to use rbtree with leftmost node cached · 9ff01255

由 Liu Bo 提交于 8月 21, 2018

As rbtree has native support of caching leftmost node,
i.e. rb_root_cached, no need to do the caching by ourselves.
Signed-off-by: NLiu Bo <bo.liu@linux.alibaba.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9ff01255

20 9月, 2018 1 次提交

block: use bio_add_page in bio_iov_iter_get_pages · 576ed913

由 Christoph Hellwig 提交于 9月 20, 2018

Replace a nasty hack with a different nasty hack to prepare for multipage
bio_vecs.  By moving the temporary page array as far up as possible in
the space allocated for the bio_vec array we can iterate forward over it
and thus use bio_add_page.  Using bio_add_page means we'll be able to
merge physically contiguous pages once support for multipath bio_vecs is
merged.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

576ed913

15 9月, 2018 3 次提交

blok, bfq: do not plug I/O if all queues are weight-raised · c8765de0

由 Paolo Valente 提交于 9月 14, 2018

To reduce latency for interactive and soft real-time applications, bfq
privileges the bfq_queues containing the I/O of these
applications. These privileged queues, referred-to as weight-raised
queues, get a much higher share of the device throughput
w.r.t. non-privileged queues. To preserve this higher share, the I/O
of any non-weight-raised queue must be plugged whenever a sync
weight-raised queue, while being served, remains temporarily empty. To
attain this goal, bfq simply plugs any I/O (from any queue), if a sync
weight-raised queue remains empty while in service.

Unfortunately, this plugging typically lowers throughput with random
I/O, on devices with internal queueing (because it reduces the filling
level of the internal queues of the device).

This commit addresses this issue by restricting the cases where
plugging is performed: if a sync weight-raised queue remains empty
while in service, then I/O plugging is performed only if some of the
active bfq_queues are *not* weight-raised (which is actually the only
circumstance where plugging is needed to preserve the higher share of
the throughput of weight-raised queues). This restriction proved able
to boost throughput in really many use cases needing only maximum
throughput.
Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c8765de0

block, bfq: inject other-queue I/O into seeky idle queues on NCQ flash · d0edc247

由 Paolo Valente 提交于 9月 14, 2018

The Achilles' heel of BFQ is its failing to reach a high throughput
with sync random I/O on flash storage with internal queueing, in case
the processes doing I/O have differentiated weights.

The cause of this failure is as follows. If at least two processes do
sync I/O, and have a different weight from each other, then BFQ plugs
I/O dispatching every time one of these processes, while it is being
served, remains temporarily without pending I/O requests. This
plugging is necessary to guarantee that every process enjoys a
bandwidth proportional to its weight; but it empties the internal
queue(s) of the drive. And this kills throughput with random I/O. So,
if some processes have differentiated weights and do both sync and
random I/O, the end result is a throughput collapse.

This commit tries to counter this problem by injecting the service of
other processes, in a controlled way, while the process in service
happens to have no I/O. This injection is performed only if the medium
is non rotational and performs internal queueing, and the process in
service does random I/O (service injection might be beneficial for
sequential I/O too, we'll work on that).

As an example of the benefits of this commit, on a PLEXTOR PX-256M5S
SSD, and with five processes having differentiated weights and doing
sync random 4KB I/O, this commit makes the throughput with bfq grow by
400%, from 25 to 100MB/s. This higher throughput is 10MB/s lower than
that reached with none. As some less random I/O is added to the mix,
the throughput becomes equal to or higher than that with none.

This commit is a very first attempt to recover throughput without
losing control, and certainly has many limitations. One is, e.g., that
the processes whose service is injected are not chosen so as to
distribute the extra bandwidth they receive in accordance to their
weights. Thus there might be loss of weighted fairness in some
cases. Anyway, this loss concerns extra service, which would not have
been received at all without this commit. Other limitations and issues
will probably show up with usage.
Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d0edc247

block, bfq: correctly charge and reset entity service in all cases · cbeb869a

由 Paolo Valente 提交于 9月 14, 2018

BFQ schedules entities (which represent either per-process queues or
groups of queues) as a function of their timestamps. In particular, as
a function of their (virtual) finish times. The finish time of an
entity is computed as a function of the budget assigned to the entity,
assuming, tentatively, that the entity, once in service, will receive
an amount of service equal to its budget. Then, when the entity is
expired because it finishes to be served, this finish time is updated
as a function of the actual service received by the entity. This
allows the entity to be correctly charged with only the service
received, and then to be correctly re-scheduled.

Yet an entity may receive service also while not being the entity in
service (in the scheduling environment of its parent entity), for
several reasons. If the entity remains with no backlog while receiving
this 'unofficial' service, then it is expired. Also on such an
expiration, the finish time of the entity should be updated to account
for only the service actually received by the entity. Unfortunately,
such an update is not performed for an entity expiring without being
the entity in service.

In a similar vein, the service counter of the entity in service is
reset when the entity is expired, to be ready to be used for next
service cycle. This reset too should be performed also in case an
entity is expired because it remains empty after receiving service
while not being the entity in service. But in this case the reset is
not performed.

This commit performs the above update of the finish time and reset of
the service received, also for an entity expiring while not being the
entity in service.
Signed-off-by: NPaolo Valente <paolo.valente@linaro.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

cbeb869a

14 9月, 2018 1 次提交

blk-iolatency: remove set but not used variables 'changed' and 'blkiolat' · f8c0d7b1

由 YueHaibing 提交于 9月 14, 2018

Fixes gcc '-Wunused-but-set-variable' warning:

block/blk-iolatency.c: In function 'scale_change':
block/blk-iolatency.c:301:7: warning:
 variable 'changed' set but not used [-Wunused-but-set-variable]

block/blk-iolatency.c: In function 'iolatency_set_limit':
block/blk-iolatency.c:765:24: warning:
 variable 'blkiolat' set but not used [-Wunused-but-set-variable]
Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f8c0d7b1

12 9月, 2018 1 次提交

rsxx: Remove unnecessary parentheses · 798ef9e7

由 Nathan Chancellor 提交于 9月 11, 2018

Clang warns when more than one set of parentheses is used for a
single conditional statement:

drivers/block/rsxx/cregs.c:279:15: warning: equality comparison with
extraneous parentheses [-Wparentheses-equality]
        if ((cmd->op == CREG_OP_READ)) {
             ~~~~~~~~^~~~~~~~~~~~~~~
drivers/block/rsxx/cregs.c:279:15: note: remove extraneous parentheses
around the comparison to silence this warning
        if ((cmd->op == CREG_OP_READ)) {
            ~        ^              ~
drivers/block/rsxx/cregs.c:279:15: note: use '=' to turn this equality
comparison into an assignment
        if ((cmd->op == CREG_OP_READ)) {
                     ^~
                     =
1 warning generated.
Reported-by: NNick Desaulniers <ndesaulniers@google.com>
Reviewed-by: NNick Desaulniers <ndesaulniers@google.com>
Signed-off-by: NNathan Chancellor <natechancellor@gmail.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

798ef9e7

08 9月, 2018 1 次提交

block: umem: replace spin_lock_bh with spin_lock in tasklet callback · 902d5391

由 jun qian 提交于 9月 07, 2018

As you are already in a tasklet, it is unnecessary to call spin_lock_bh.
Signed-off-by: Njun qian <hangdianqj@163.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

902d5391

07 9月, 2018 7 次提交

block: remove bio_rewind_iter() · 7759eb23

由 Ming Lei 提交于 9月 05, 2018

It is pointed that bio_rewind_iter() is one very bad API[1]:

1) bio size may not be restored after rewinding

2) it causes some bogus change, such as 5151842b (block: reset
bi_iter.bi_done after splitting bio)

3) rewinding really makes things complicated wrt. bio splitting

4) unnecessary updating of .bi_done in fast path

[1] https://marc.info/?t=153549924200005&r=1&w=2

So this patch takes Kent's suggestion to restore one bio into its original
state via saving bio iterator(struct bvec_iter) in bio_integrity_prep(),
given now bio_rewind_iter() is only used by bio integrity code.

Cc: Dmitry Monakhov <dmonakhov@openvz.org>
Cc: Hannes Reinecke <hare@suse.com>
Suggested-by: NKent Overstreet <kent.overstreet@gmail.com>
Acked-by: NKent Overstreet <kent.overstreet@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

7759eb23

drbd: Convert from ahash to shash · 3d0e6375

由 Kees Cook 提交于 8月 06, 2018

In preparing to remove all stack VLA usage from the kernel[1], this
removes the discouraged use of AHASH_REQUEST_ON_STACK in favor of
the smaller SHASH_DESC_ON_STACK by converting from ahash-wrapped-shash
to direct shash. By removing a layer of indirection this both improves
performance and reduces stack usage. The stack allocation will be made
a fixed size in a later patch to the crypto subsystem.

The bulk of the lines in this change are simple s/ahash/shash/, but the
main logic differences are in drbd_csum_ee() and drbd_csum_bio(), which
externalizes the page walking with k(un)map_atomic() instead of using
scattergather.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.comAcked-by: NLars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3d0e6375

Merge tag 'for-linus-20180906' of git://git.kernel.dk/linux-block · ca16eb34

由 Linus Torvalds 提交于 9月 06, 2018

Pull block fixes from Jens Axboe:
 "Small collection of fixes that should go into this release. This
  contains:

   - Small series that fixes a race between blkcg teardown and writeback
     (Dennis Zhou)

   - Fix disallowing invalid block size settings from the nbd ioctl (me)

   - BFQ fix for a use-after-free on last release of a bfqg (Konstantin
     Khlebnikov)

   - Fix for the "don't warn for flush" fix (Mikulas)"

* tag 'for-linus-20180906' of git://git.kernel.dk/linux-block:
  block: bfq: swap puts in bfqg_and_blkg_put
  block: don't warn when doing fsync on read-only devices
  nbd: don't allow invalid blocksize settings
  blkcg: use tryget logic when associating a blkg with a bio
  blkcg: delay blkg destruction until after writeback has finished
  Revert "blk-throttle: fix race between blkcg_bio_issue_check() and cgroup_rmdir()"

ca16eb34

block: bfq: swap puts in bfqg_and_blkg_put · d5274b3c

由 Konstantin Khlebnikov 提交于 9月 06, 2018

Fix trivial use-after-free. This could be last reference to bfqg.

Fixes: 8f9bebc3 ("block, bfq: access and cache blkg data only when safe")
Acked-by: NPaolo Valente <paolo.valente@linaro.org>
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

d5274b3c

Merge tag 'apparmor-pr-2018-09-06' of... · db44bf4b

由 Linus Torvalds 提交于 9月 06, 2018

Merge tag 'apparmor-pr-2018-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull apparmor fix from John Johansen:
 "A fix for an issue syzbot discovered last week:

   - Fix for bad debug check when converting secids to secctx"

* tag 'apparmor-pr-2018-09-06' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor:
  apparmor: fix bad debug check in apparmor_secid_to_secctx()

db44bf4b

Merge tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace · be65e259

由 Linus Torvalds 提交于 9月 06, 2018

Pull tracing fixes from Steven Rostedt:
 "This fixes two annoying bugs:

   - The first one is a side effect caused by using SRCU for rcuidle
     tracepoints. It seems that the perf was depending on the rcuidle
     tracepoints to make RCU watch when it wasn't.

     The real fix will be to have perf use SRCU instead of depending on
     RCU watching, but that can't be done until SRCU is safe to use in
     NMI context (Paul's working on that).

   - The second bug fix is for a bug that's been periodically making my
     tests fail randomly for some time. I haven't had time to track it
     down, but finally have. It has to do with stressing NMIs (via perf)
     while enabling or disabling ftrace function handling with lockdep
     enabled.

     If an interrupt happens and just as it returns, it sets lockdep
     back to "interrupts enabled" but before it returns an NMI is
     triggered, and if this happens while printk_nmi_enter has a
     breakpoint attached to it (because ftrace is converting it to or
     from nop to call fentry), the breakpoint trap also calls into
     lockdep, and since returning from the NMI to a interrupt handler,
     interrupts were disabled when the NMI went off, lockdep keeps its
     state as interrupts disabled when it returns back from the
     interrupt handler where interrupts are enabled.

     This causes lockdep_assert_irqs_enabled() to trigger a false
     positive"

* tag 'trace-v4.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  printk/tracing: Do not trace printk_nmi_enter()
  tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints

be65e259

Merge tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 5404525b

由 Linus Torvalds 提交于 9月 06, 2018

Pull btrfs fixes from David Sterba:

 - fix for improper fsync after hardlink

 - fix for a corruption during file deduplication

 - use after free fixes

 - RCU warning fix

 - fix for buffered write to nodatacow file

* tag 'for-4.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: Fix suspicious RCU usage warning in btrfs_debug_in_rcu
  btrfs: use after free in btrfs_quota_enable
  btrfs: btrfs_shrink_device should call commit transaction at the end
  btrfs: fix qgroup_free wrong num_bytes in btrfs_subvolume_reserve_metadata
  Btrfs: fix data corruption when deduplicating between different files
  Btrfs: sync log after logging new name
  Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space

5404525b

06 9月, 2018 5 次提交

printk/tracing: Do not trace printk_nmi_enter() · d1c392c9

由 Steven Rostedt (VMware) 提交于 9月 05, 2018

I hit the following splat in my tests:

------------[ cut here ]------------
IRQs not enabled as expected
WARNING: CPU: 3 PID: 0 at kernel/time/tick-sched.c:982 tick_nohz_idle_enter+0x44/0x8c
Modules linked in: ip6t_REJECT nf_reject_ipv6 ip6table_filter ip6_tables ipv6
CPU: 3 PID: 0 Comm: swapper/3 Not tainted 4.19.0-rc2-test+ #2
Hardware name: MSI MS-7823/CSM-H87M-G43 (MS-7823), BIOS V1.6 02/22/2014
EIP: tick_nohz_idle_enter+0x44/0x8c
Code: ec 05 00 00 00 75 26 83 b8 c0 05 00 00 00 75 1d 80 3d d0 36 3e c1 00
75 14 68 94 63 12 c1 c6 05 d0 36 3e c1 01 e8 04 ee f8 ff <0f> 0b 58 fa bb a0
e5 66 c1 e8 25 0f 04 00 64 03 1d 28 31 52 c1 8b
EAX: 0000001c EBX: f26e7f8c ECX: 00000006 EDX: 00000007
ESI: f26dd1c0 EDI: 00000000 EBP: f26e7f40 ESP: f26e7f38
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010296
CR0: 80050033 CR2: 0813c6b0 CR3: 2f342000 CR4: 001406f0
Call Trace:
 do_idle+0x33/0x202
 cpu_startup_entry+0x61/0x63
 start_secondary+0x18e/0x1ed
 startup_32_smp+0x164/0x168
irq event stamp: 18773830
hardirqs last  enabled at (18773829): [<c040150c>] trace_hardirqs_on_thunk+0xc/0x10
hardirqs last disabled at (18773830): [<c040151c>] trace_hardirqs_off_thunk+0xc/0x10
softirqs last  enabled at (18773824): [<c0ddaa6f>] __do_softirq+0x25f/0x2bf
softirqs last disabled at (18773767): [<c0416bbe>] call_on_stack+0x45/0x4b
---[ end trace b7c64aa79e17954a ]---

After a bit of debugging, I found what was happening. This would trigger
when performing "perf" with a high NMI interrupt rate, while enabling and
disabling function tracer. Ftrace uses breakpoints to convert the nops at
the start of functions to calls to the function trampolines. The breakpoint
traps disable interrupts and this makes calls into lockdep via the
trace_hardirqs_off_thunk in the entry.S code. What happens is the following:

  do_idle {

    [interrupts enabled]

    <interrupt> [interrupts disabled]
	TRACE_IRQS_OFF [lockdep says irqs off]
	[...]
	TRACE_IRQS_IRET
	    test if pt_regs say return to interrupts enabled [yes]
	    TRACE_IRQS_ON [lockdep says irqs are on]

	    <nmi>
		nmi_enter() {
		    printk_nmi_enter() [traced by ftrace]
		    [ hit ftrace breakpoint ]
		    <breakpoint exception>
			TRACE_IRQS_OFF [lockdep says irqs off]
			[...]
			TRACE_IRQS_IRET [return from breakpoint]
			   test if pt_regs say interrupts enabled [no]
			   [iret back to interrupt]
	   [iret back to code]

    tick_nohz_idle_enter() {

	lockdep_assert_irqs_enabled() [lockdep say no!]

Although interrupts are indeed enabled, lockdep thinks it is not, and since
we now do asserts via lockdep, it gives a false warning. The issue here is
that printk_nmi_enter() is called before lockdep_off(), which disables
lockdep (for this reason) in NMIs. By simply not allowing ftrace to see
printk_nmi_enter() (via notrace annotation) we keep lockdep from getting
confused.

Cc: stable@vger.kernel.org
Fixes: 42a0bb3f ("printk/nmi: generic solution for safe printk in NMI")
Acked-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NPetr Mladek <pmladek@suse.com>
Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>

d1c392c9

block: don't warn when doing fsync on read-only devices · 8b2ded1c

由 Mikulas Patocka 提交于 9月 05, 2018

It is possible to call fsync on a read-only handle (for example, fsck.ext2
does it when doing read-only check), and this call results in kernel
warning.

The patch b089cfd9 ("block: don't warn for flush on read-only device")
attempted to disable the warning, but it is buggy and it doesn't
(op_is_flush tests flags, but bio_op strips off the flags).
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Fixes: 721c7fc7 ("block: fail op_is_write() requests to read-only partitions")
Cc: stable@vger.kernel.org	# 4.18
Signed-off-by: NJens Axboe <axboe@kernel.dk>

8b2ded1c

Merge tag 'gpio-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio · b36fdc68

由 Linus Torvalds 提交于 9月 05, 2018

Pull GPIO fixes from Linus Walleij:
 "Some GPIO fixes. The ACPI stuff is probably the most annoying for
  users that get fixed this time.

   - Atomic contexts, cansleep* calls and such fastpath/slopwpath
     things.

   - Defer ACPI event handler registration to late_initcall() so IRQs do
     not fire in our face before other drivers have a chance to register
     handlers.

   - Race condition if a consumer requests a GPIO after
     gpiochip_add_data_with_key() but before of_gpiochip_add()

   - Probe errorpath in the dwapb driver"

* tag 'gpio-v4.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio:
  gpio: Fix crash due to registration race
  gpio: dwapb: Fix error handling in dwapb_gpio_probe()
  gpiolib-acpi: Register GpioInt ACPI event handlers from a late_initcall
  gpiolib: acpi: Switch to cansleep version of GPIO library call
  gpio: adp5588: Fix sleep-in-atomic-context bug

b36fdc68

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · f4697d9a

由 Linus Torvalds 提交于 9月 05, 2018

Pull SCSI fixes from James Bottomley:
 "A set of very minor fixes and a couple of reverts to fix a major
  problem (the attempt to change the busy count causes a hang when
  attempting to change the drive cache type)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: aacraid: fix a signedness bug
  Revert "scsi: core: avoid host-wide host_busy counter for scsi_mq"
  Revert "scsi: core: fix scsi_host_queue_ready"
  scsi: libata: Add missing newline at end of file
  scsi: target: iscsi: cxgbit: use pr_debug() instead of pr_info()
  scsi: hpsa: limit transfer length to 1MB, not 512kB
  scsi: lpfc: Correct MDS diag and nvmet configuration
  scsi: lpfc: Default fdmi_on to on
  scsi: csiostor: fix incorrect port capabilities
  scsi: csiostor: add a check for NULL pointer after kmalloc()
  scsi: documentation: add scsi_mod.use_blk_mq to scsi-parameters
  scsi: core: Update SCSI_MQ_DEFAULT help text to match default

f4697d9a

Merge tag 'nds32-for-linus-4.19-tag1' of... · d0c1db1d

由 Linus Torvalds 提交于 9月 05, 2018

Merge tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux

Pull nds32 updates from Greentime Hu:
 "Contained in here are the bug fixes, building error fixes and ftrace
  support for nds32"

* tag 'nds32-for-linus-4.19-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/greentime/linux:
  nds32: linker script: GCOV kernel may refers data in __exit
  nds32: fix build error because of wrong semicolon
  nds32: Fix a kernel panic issue because of wrong frame pointer access.
  nds32: Only print one page of stack when die to prevent printing too much information.
  nds32: Add macro definition for offset of lp register on stack
  nds32: Remove the deprecated ABI implementation
  nds32/stack: Get real return address by using ftrace_graph_ret_addr
  nds32/ftrace: Support dynamic function graph tracer
  nds32/ftrace: Support dynamic function tracer
  nds32/ftrace: Add RECORD_MCOUNT support
  nds32/ftrace: Support static function graph tracer
  nds32/ftrace: Support static function tracer
  nds32: Extract the checking and getting pointer to a macro
  nds32: Clean up the coding style
  nds32: Fix get_user/put_user macro expand pointer problem
  nds32: Fix empty call trace
  nds32: add NULL entry to the end of_device_id array
  nds32: fix logic for module

d0c1db1d

05 9月, 2018 17 次提交

tracing: Add back in rcu_irq_enter/exit_irqson() for rcuidle tracepoints · 865e63b0

由 Steven Rostedt (VMware) 提交于 9月 04, 2018

Borislav reported the following splat:

 =============================
 WARNING: suspicious RCU usage
 4.19.0-rc1+ #1 Not tainted
 -----------------------------
 ./include/linux/rcupdate.h:631 rcu_read_lock() used illegally while idle!
 other info that might help us debug this:

 RCU used illegally from idle CPU!
 rcu_scheduler_active = 2, debug_locks = 1
 RCU used illegally from extended quiescent state!
 1 lock held by swapper/0/0:
  #0: 000000004557ee0e (rcu_read_lock){....}, at: perf_event_output_forward+0x0/0x130

 stack backtrace:
 CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc1+ #1
 Hardware name: LENOVO 2320CTO/2320CTO, BIOS G2ET86WW (2.06 ) 11/13/2012
 Call Trace:
  dump_stack+0x85/0xcb
  perf_event_output_forward+0xf6/0x130
  __perf_event_overflow+0x52/0xe0
  perf_swevent_overflow+0x91/0xb0
  perf_tp_event+0x11a/0x350
  ? find_held_lock+0x2d/0x90
  ? __lock_acquire+0x2ce/0x1350
  ? __lock_acquire+0x2ce/0x1350
  ? retint_kernel+0x2d/0x2d
  ? find_held_lock+0x2d/0x90
  ? tick_nohz_get_sleep_length+0x83/0xb0
  ? perf_trace_cpu+0xbb/0xd0
  ? perf_trace_buf_alloc+0x5a/0xa0
  perf_trace_cpu+0xbb/0xd0
  cpuidle_enter_state+0x185/0x340
  do_idle+0x1eb/0x260
  cpu_startup_entry+0x5f/0x70
  start_kernel+0x49b/0x4a6
  secondary_startup_64+0xa4/0xb0

This is due to the tracepoints moving to SRCU usage which does not require
RCU to be "watching". But perf uses these tracepoints with RCU and expects
it to be. Hence, we still need to add in the rcu_irq_enter/exit_irqson()
calls for "rcuidle" tracepoints. This is a temporary fix until we have SRCU
working in NMI context, and then perf can be converted to use that instead
of normal RCU.

Link: http://lkml.kernel.org/r/20180904162611.6a120068@gandalf.local.home

Cc: x86-ml <x86@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: NBorislav Petkov <bp@alien8.de>
Tested-by: NBorislav Petkov <bp@alien8.de>
Reviewed-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Fixes: e6753f23 ("tracepoint: Make rcuidle tracepoint callers use SRCU")
Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>

865e63b0

nds32: linker script: GCOV kernel may refers data in __exit · 3350139c

由 Greentime Hu 提交于 9月 04, 2018

This patch is used to fix nds32 allmodconfig/allyesconfig build error
because GCOV kernel embeds counters in the kernel for each line
and a part of that embed in __exit text. So we need to keep the
EXIT_TEXT and EXIT_DATA if CONFIG_GCOV_KERNEL=y.

Link: https://lkml.org/lkml/2018/9/1/125Signed-off-by: NGreentime Hu <greentime@andestech.com>
Reviewed-by: NMasami Hiramatsu <mhiramat@kernel.org>

3350139c

Merge branch 'akpm' (patches from Andrew) · 0e9b1039

由 Linus Torvalds 提交于 9月 04, 2018

Merge misc fixes from Andrew Morton:
 "17 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  nilfs2: convert to SPDX license tags
  drivers/dax/device.c: convert variable to vm_fault_t type
  lib/Kconfig.debug: fix three typos in help text
  checkpatch: add __ro_after_init to known $Attribute
  mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal
  uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name
  memory_hotplug: fix kernel_panic on offline page processing
  checkpatch: add optional static const to blank line declarations test
  ipc/shm: properly return EIDRM in shm_lock()
  mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported.
  mm/util.c: improve kvfree() kerneldoc
  tools/vm/page-types.c: fix "defined but not used" warning
  tools/vm/slabinfo.c: fix sign-compare warning
  kmemleak: always register debugfs file
  mm: respect arch_dup_mmap() return value
  mm, oom: fix missing tlb_finish_mmu() in __oom_reap_task_mm().
  mm: memcontrol: print proper OOM header when no eligible victim left

0e9b1039

nilfs2: convert to SPDX license tags · ae98043f

由 Ryusuke Konishi 提交于 9月 04, 2018

Remove the verbose license text from NILFS2 files and replace them with
SPDX tags. This does not change the license of any of the code.

Link: http://lkml.kernel.org/r/1535624528-5982-1-git-send-email-konishi.ryusuke@lab.ntt.co.jpSigned-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ae98043f

drivers/dax/device.c: convert variable to vm_fault_t type · 36bdac1e

由 Souptick Joarder 提交于 9月 04, 2018

As part of 226ab561 ("device-dax: Convert to vmf_insert_mixed and
vm_fault_t") in 4.19-rc1, 'rc' was not converted to vm_fault_t.  Now
converted.

Link: http://lkml.kernel.org/r/20180830153813.GA26059@jordon-HP-15-Notebook-PCSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Dave Jiang <dave.jiang@intel.com>
Cc: Ross Zwisler <zwisler@kernel.org>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

36bdac1e

lib/Kconfig.debug: fix three typos in help text · 4c5d114e

由 Thibaut Sautereau 提交于 9月 04, 2018

Fix three typos in CONFIG_WARN_ALL_UNSEEDED_RANDOM help text.

Link: http://lkml.kernel.org/r/20180830194505.4778-1-thibaut@sautereau.frSigned-off-by: NThibaut Sautereau <thibaut@sautereau.fr>
Acked-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4c5d114e

checkpatch: add __ro_after_init to known $Attribute · c5967e98

由 Joe Perches 提交于 9月 04, 2018

__ro_after_init is a specific __attribute__ that checkpatch does currently
not understand.

Add it to the known $Attribute types so that code that uses variables
declared with __ro_after_init are not thought to be a modifier type.

This appears as a defect in checkpatch output of code like:

static bool trust_cpu __ro_after_init = IS_ENABLED(CONFIG_RANDOM_TRUST_CPU);
[...]
       if (trust_cpu && arch_init) {

where checkpatch reports:

ERROR: space prohibited after that '&&' (ctx:WxW)
	if (trust_cpu && arch_init) {

Link: http://lkml.kernel.org/r/0fa8a2cb83ade4c525e18261ecf6cfede3015983.camel@perches.comSigned-off-by: NJoe Perches <joe@perches.com>
Reported-by: NKees Cook <keescook@chromium.org>
Tested-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c5967e98

mm: fix BUG_ON() in vmf_insert_pfn_pud() from VM_MIXEDMAP removal · 62ec0d8c

由 Dave Jiang 提交于 9月 04, 2018

It looks like I missed the PUD path when doing VM_MIXEDMAP removal.
This can be triggered by:
1. Boot with memmap=4G!8G
2. build ndctl with destructive flag on
3. make TESTS=device-dax check

[  +0.000675] kernel BUG at mm/huge_memory.c:824!

Applying the same change that was applied to vmf_insert_pfn_pmd() in the
original patch.

Link: http://lkml.kernel.org/r/153565957352.35524.1005746906902065126.stgit@djiang5-desk3.ch.intel.com
Fixes: e1fb4a08 ("dax: remove VM_MIXEDMAP for fsdax and device dax")
Signed-off-by: NDave Jiang <dave.jiang@intel.com>
Reported-by: NVishal Verma <vishal.l.verma@intel.com>
Tested-by: NVishal Verma <vishal.l.verma@intel.com>
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

62ec0d8c

uapi/linux/keyctl.h: don't use C++ reserved keyword as a struct member name · 8a2336e5

由 Randy Dunlap 提交于 9月 04, 2018

Since this header is in "include/uapi/linux/", apparently people want to
use it in userspace programs -- even in C++ ones.  However, the header
uses a C++ reserved keyword ("private"), so change that to "dh_private"
instead to allow the header file to be used in C++ userspace.

Fixes https://bugzilla.kernel.org/show_bug.cgi?id=191051
Link: http://lkml.kernel.org/r/0db6c314-1ef4-9bfa-1baa-7214dd2ee061@infradead.org
Fixes: ddbb4114 ("KEYS: Add KEYCTL_DH_COMPUTE command")
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: David Howells <dhowells@redhat.com>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Mat Martineau <mathew.j.martineau@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8a2336e5

memory_hotplug: fix kernel_panic on offline page processing · 4e8346d0

由 Mikhail Zaslonko 提交于 9月 04, 2018

Within show_valid_zones() the function test_pages_in_a_zone() should be
called for online memory blocks only.

Otherwise it might lead to the VM_BUG_ON due to uninitialized struct
pages (when CONFIG_DEBUG_VM_PGFLAGS kernel option is set):

 page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
 ------------[ cut here ]------------
 Call Trace:
 ([<000000000038f91e>] test_pages_in_a_zone+0xe6/0x168)
  [<0000000000923472>] show_valid_zones+0x5a/0x1a8
  [<0000000000900284>] dev_attr_show+0x3c/0x78
  [<000000000046f6f0>] sysfs_kf_seq_show+0xd0/0x150
  [<00000000003ef662>] seq_read+0x212/0x4b8
  [<00000000003bf202>] __vfs_read+0x3a/0x178
  [<00000000003bf3ca>] vfs_read+0x8a/0x148
  [<00000000003bfa3a>] ksys_read+0x62/0xb8
  [<0000000000bc2220>] system_call+0xdc/0x2d8

That VM_BUG_ON was triggered by the page poisoning introduced in
mm/sparse.c with the git commit d0dc12e8 ("mm/memory_hotplug:
optimize memory hotplug").

With the same commit the new 'nid' field has been added to the struct
memory_block in order to store and later on derive the node id for
offline pages (instead of accessing struct page which might be
uninitialized).  But one reference to nid in show_valid_zones() function
has been overlooked.  Fixed with current commit.  Also, nr_pages will
not be used any more after test_pages_in_a_zone() call, do not update
it.

Link: http://lkml.kernel.org/r/20180828090539.41491-1-zaslonko@linux.ibm.com
Fixes: d0dc12e8 ("mm/memory_hotplug: optimize memory hotplug")
Signed-off-by: NMikhail Zaslonko <zaslonko@linux.ibm.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NPavel Tatashin <pavel.tatashin@microsoft.com>
Cc: <stable@vger.kernel.org>	[4.17+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4e8346d0

checkpatch: add optional static const to blank line declarations test · 328b5f41

由 Joe Perches 提交于 9月 04, 2018

Using a static const struct definition as part of a series of
declarations produces a false positive "Missing a blank line after
declarations" for code like:

  WARNING: Missing a blank line after declarations
  #710: FILE: drivers/gpu/drm/tidss/tidss_scale_coefs.c:137:
  +       int inc;
  +       static const struct {

So fix it.

Link: http://lkml.kernel.org/r/5905126e70b0ed1781e49265fd5c49c5090d0223.camel@perches.comSigned-off-by: NJoe Perches <joe@perches.com>
Reported-by: NJyri Sarha <jsarha@ti.com>
Cc: "Valkeinen, Tomi" <tomi.valkeinen@ti.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

328b5f41

ipc/shm: properly return EIDRM in shm_lock() · 9c21dae2

由 Davidlohr Bueso 提交于 9月 04, 2018

When getting rid of the general ipc_lock(), this was missed furthermore,
making the comment around the ipc object validity check bogus.  Under
EIDRM conditions, callers will in turn not see the error and continue
with the operation.

Link: http://lkml.kernel.org/r/20180824030920.GD3677@linux-r8p5
Link: http://lkml.kernel.org/r/20180823024051.GC13343@shao2-debian
Fixes: 82061c57 ("ipc: drop ipc_lock()")
Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
Reported-by: Nkernel test robot <rong.a.chen@intel.com>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9c21dae2

mm/hugetlb: filter out hugetlb pages if HUGEPAGE migration is not supported. · 464c7ffb

由 Aneesh Kumar K.V 提交于 9月 04, 2018

When scanning for movable pages, filter out Hugetlb pages if hugepage
migration is not supported.  Without this we hit infinte loop in
__offline_pages() where we do

	pfn = scan_movable_pages(start_pfn, end_pfn);
	if (pfn) { /* We have movable pages */
		ret = do_migrate_range(pfn, end_pfn);
		goto repeat;
	}

Fix this by checking hugepage_migration_supported both in
has_unmovable_pages which is the primary backoff mechanism for page
offlining and for consistency reasons also into scan_movable_pages
because it doesn't make any sense to return a pfn to non-migrateable
huge page.

This issue was revealed by, but not caused by 72b39cfc ("mm,
memory_hotplug: do not fail offlining too early").

Link: http://lkml.kernel.org/r/20180824063314.21981-1-aneesh.kumar@linux.ibm.com
Fixes: 72b39cfc ("mm, memory_hotplug: do not fail offlining too early")
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Reported-by: NHaren Myneni <haren@linux.vnet.ibm.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

464c7ffb

mm/util.c: improve kvfree() kerneldoc · 04b8e946

由 Andrew Morton 提交于 9月 04, 2018

Scooped from an email from Matthew.

Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

04b8e946

tools/vm/page-types.c: fix "defined but not used" warning · 7ab660f8

由 Naoya Horiguchi 提交于 9月 04, 2018

debugfs_known_mountpoints[] is not used any more, so let's remove it.

Link: http://lkml.kernel.org/r/1535102651-19418-1-git-send-email-n-horiguchi@ah.jp.nec.comSigned-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7ab660f8

tools/vm/slabinfo.c: fix sign-compare warning · 90450656

由 Naoya Horiguchi 提交于 9月 04, 2018

Currently we get the following compiler warning:

    slabinfo.c:854:22: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
       if (s->object_size < min_objsize)
                          ^

due to the mismatch of signed/unsigned comparison.  ->object_size and
->slab_size are never expected to be negative, so let's define them as
unsigned int.

[n-horiguchi@ah.jp.nec.com: convert everything - none of these can be negative]
  Link: http://lkml.kernel.org/r/20180826234947.GA9787@hori1.linux.bs1.fc.nec.co.jp
Link: http://lkml.kernel.org/r/1535103134-20239-1-git-send-email-n-horiguchi@ah.jp.nec.comSigned-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

90450656

kmemleak: always register debugfs file · b353756b

由 Vincent Whitchurch 提交于 9月 04, 2018

If kmemleak built in to the kernel, but is disabled by default, the
debugfs file is never registered. Because of this, it is not possible
to find out if the kernel is built with kmemleak support by checking for
the presence of this file. To allow this, always register the file.

After this patch, if the file doesn't exist, kmemleak is not available
in the kernel. If writing "scan" or any other value than "clear" to
this file results in EBUSY, then kmemleak is available but is disabled
by default and can be activated via the kernel command line.

Catalin: "that's also consistent with a late disabling of kmemleak when
the debugfs entry sticks around."

Link: http://lkml.kernel.org/r/20180824131220.19176-1-vincent.whitchurch@axis.comSigned-off-by: NVincent Whitchurch <vincent.whitchurch@axis.com>
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b353756b

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功