Commits · 2d1b69ed65ee033aa541518cc9f6a815296ac493 · openeuler / Kernel

28 Jun, 2020 2 commits

net/mlx5: kTLS, Improve TLS params layout structures · 2d1b69ed

Authored by Tariq Toukan on Jun 25, 2020

Add explicit WQE segment structures for the TLS static and progress
params.
According to the HW spec, TISN is not part of the progress params context,
so take it out of it.
Rename the control segment tisn field as it could hold either a TIS or
a TIR number.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

2d1b69ed

net/mlx5: Avoid RDMA file inclusion in core driver · 9205d7b1

Authored by Parav Pandit on Jun 25, 2020

mlx5 cq.h does not depend on RDMA verbs.
Remove RDMA verbs file inclusion.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

9205d7b1

23 Jun, 2020 2 commits

net/mlx5: Add support in query QP, CQ and MKEY segments · 608ca553

Authored by Maor Gottlieb on Apr 08, 2020

Introduce new resource dump segments - PRM_QUERY_QP,
PRM_QUERY_CQ and PRM_QUERY_MKEY. These segments contain the resource
dump in PRM query format.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

608ca553

net/mlx5: Export resource dump interface · d63cc249

Authored by Maor Gottlieb on Apr 08, 2020

Export some of the resource dump API. mlx5_ib driver will use
it in downstream patches.
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>

d63cc249

20 Jun, 2020 1 commit

mm: Allow arches to provide ptep_get() · 481e980a

Authored by Christophe Leroy on Jun 15, 2020

Since commit 9e343b46 ("READ_ONCE: Enforce atomicity for
{READ,WRITE}_ONCE() memory accesses") it is not possible anymore to
use READ_ONCE() to access complex page table entries like the one
defined for powerpc 8xx with 16k size pages.

Define a ptep_get() helper that architectures can override instead
of performing a READ_ONCE() on the page table entry pointer.
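
The override pattern described above can be sketched in user space as follows. This is only an illustration: the guard macro DEMO_HAVE_ARCH_PTEP_GET and the one-word pte_t are assumptions, and a volatile load stands in for READ_ONCE().

```c
#include <assert.h>

/* Stand-in for a page table entry; the real pte_t is architecture-specific. */
typedef struct { unsigned long pte; } pte_t;

/*
 * An architecture with a complex entry layout would define the guard
 * (name assumed for this sketch) and supply its own ptep_get().
 */
#ifndef DEMO_HAVE_ARCH_PTEP_GET
static inline pte_t ptep_get(pte_t *ptep)
{
	pte_t val;

	/* Generic fallback: one volatile load, standing in for READ_ONCE(). */
	val.pte = *(volatile unsigned long *)&ptep->pte;
	return val;
}
#endif
```

Callers then use ptep_get(ptep) everywhere instead of reading *ptep directly, so an architecture only has to replace this one helper.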

Fixes: 9e343b46 ("READ_ONCE: Enforce atomicity for {READ,WRITE}_ONCE() memory accesses")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/087fa12b6e920e32315136b998aa834f99242695.1592225558.git.christophe.leroy@csgroup.eu

481e980a

19 Jun, 2020 4 commits

i2c: remove deprecated i2c_new_device API · 390fd047

Authored by Wolfram Sang on Jun 15, 2020

All in-tree users have been converted to the new i2c_new_client_device
function, so remove this deprecated one.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>

390fd047

maccess: make get_kernel_nofault() check for minimal type compatibility · 0c389d89

Authored by Linus Torvalds on Jun 18, 2020

Now that we've renamed probe_kernel_address() to get_kernel_nofault()
and made it look and behave more in line with get_user(), some of the
subtle type behavior differences end up being more obvious and possibly
dangerous.

When you do

        get_user(val, user_ptr);

the type of the access comes from the "user_ptr" part, and the above
basically acts as

        val = *user_ptr;

by design (except, of course, for the fact that the actual dereference
is done with a user access).

Note how in the above case, the type of the end result comes from the
pointer argument, and then the value is cast to the type of 'val' as
part of the assignment.

So the type of the pointer is ultimately the more important type for
the access itself.

But 'get_kernel_nofault()' may now _look_ similar, but it behaves very
differently.  When you do

        get_kernel_nofault(val, kernel_ptr);

it behaves like

        val = *(typeof(val) *)kernel_ptr;

except, of course, for the fact that the actual dereference is done with
exception handling so that a faulting access is suppressed and returned
as the error code.

But note how different the casting behavior of the two superficially
similar accesses is: one does the actual access in the size of the type
the pointer points to, while the other does the access in the size of
the target, and ignores the pointer type entirely.

Actually changing get_kernel_nofault() to act like get_user() is almost
certainly the right thing to do eventually, but in the meantime this
patch adds logic to at least verify that the pointer type is compatible
with the type of the result.

In many cases, this involves just casting the pointer to 'void *' to
make it obvious that the type of the pointer is not the important part.
It's not how 'get_user()' acts, but at least the behavioral difference
is now obvious and explicit.
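
The difference between the two access styles, and the kind of compile-time pointer check described above, can be sketched in plain C. The macro names are made up for this illustration, memcpy() stands in for the fault-suppressing copy, and no exception handling is modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* get_user()-like: the access is done in the pointer's type, and the
 * result is then converted to the type of val by the assignment. */
#define demo_get_user_like(val, ptr) ((val) = *(ptr))

/* get_kernel_nofault()-like: the access is done in the size of val.
 * The typed local pointer makes the compiler complain when ptr is
 * neither "void *" nor a pointer to the type of val. */
#define demo_get_nofault_like(val, ptr) do {		\
	const __typeof__(val) *demo_p_ = (ptr);		\
	memcpy(&(val), demo_p_, sizeof(val));		\
} while (0)
```

With a uint8_t buffer and a uint32_t destination, the first macro reads one byte and the second reads four, which is why the second needs an explicit (void *) cast to state that the pointer type is not the important part.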

Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

0c389d89

maccess: rename probe_kernel_address to get_kernel_nofault · 25f12ae4

Authored by Christoph Hellwig on Jun 17, 2020

Better describe what this helper does, and match the naming of
copy_from_kernel_nofault.

Also switch the argument order around, so that it acts and looks
like get_user().
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

25f12ae4

sparse: use identifiers to define address spaces · 670d0a4b

Authored by Luc Van Oostenryck on Jun 18, 2020

Currently, address spaces in warnings are displayed as '<asn:X>' with
'X' being the address space's arbitrary number.

But since sparse v0.6.0-rc1 (late December 2018), sparse allows you to
define the address spaces using an identifier instead of a number.  This
identifier is then directly used in the warnings.

So, use the identifiers '__user', '__iomem', '__percpu' & '__rcu' for
the corresponding address spaces.  The default address space, __kernel,
being not displayed in warnings, stays defined as '0'.

With this change, warnings that used to be displayed as:

	cast removes address space '<asn:1>' of expression
	... void [noderef] <asn:2> *

will now be displayed as:

	cast removes address space '__user' of expression
	... void [noderef] __iomem *

This also moves the __kernel annotation to be the first one, since it is
quite different from the others because it's the default one, and so:

 - it's never displayed

 - it's normally not needed, neither in type annotations nor in casts
   between address spaces. The only time it's needed is when it's
   combined with a typeof to express "the same type as this one but
   without the address space"

 - it can't be defined with a name, '0' must be used.

So, it seemed strange to me to have it in the middle of the other
ones.
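
A minimal sketch of what identifier-based address space annotations can look like, assuming they are only visible to sparse (which defines __CHECKER__) and expand to nothing for a regular compiler. The helper function is hypothetical and only shows an annotated pointer in a signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifdef __CHECKER__
/* Seen only by sparse, which accepts identifiers as address space
 * names since v0.6.0-rc1 according to the commit message. */
# define __kernel __attribute__((address_space(0)))
# define __user   __attribute__((noderef, address_space(__user)))
# define __iomem  __attribute__((noderef, address_space(__iomem)))
#else
/* A regular compiler sees the annotations as empty. */
# define __kernel
# define __user
# define __iomem
#endif

/* Hypothetical helper: comparing an annotated pointer needs no
 * dereference, so the noderef attribute is not violated. */
static bool demo_user_ptr_is_null(const void __user *uptr)
{
	return uptr == NULL;
}
```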
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

670d0a4b

18 Jun, 2020 4 commits

block: make function 'kill_bdev' static · 3373a346

Authored by Zheng Bin on Jun 18, 2020

kill_bdev does not have any external user, so make it static.
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

3373a346

libata: Use per port sync for detach · b5292111

Authored by Kai-Heng Feng on Jun 03, 2020

Commit 130f4caf ("libata: Ensure ata_port probe has completed before
detach") may cause system freeze during suspend.

Using async_synchronize_full() in PM callbacks is wrong, since async
callbacks that are already scheduled may wait for not-yet-scheduled
callbacks, causing a circular dependency.

Instead of using a big hammer like async_synchronize_full(), use an async
cookie to make sure port probes are synced, without affecting other
scheduled PM callbacks.

Fixes: 130f4caf ("libata: Ensure ata_port probe has completed before detach")
Suggested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: John Garry <john.garry@huawei.com>
BugLink: https://bugs.launchpad.net/bugs/1867983
Signed-off-by: Jens Axboe <axboe@kernel.dk>

b5292111

maccess: rename probe_user_{read,write} to copy_{from,to}_user_nofault · c0ee37e8

Authored by Christoph Hellwig on Jun 17, 2020

Better describe what these functions do.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

c0ee37e8

maccess: rename probe_kernel_{read,write} to copy_{from,to}_kernel_nofault · fe557319

Authored by Christoph Hellwig on Jun 17, 2020

Better describe what these functions do.
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fe557319

17 Jun, 2020 2 commits

overflow.h: Add flex_array_size() helper · b19d57d0

Authored by Gustavo A. R. Silva on Jun 08, 2020

Add flex_array_size() helper for the calculation of the size, in bytes,
of a flexible array member contained within an enclosing structure.

Example of usage:

struct something {
	size_t count;
	struct foo items[];
};

struct something *instance;

instance = kmalloc(struct_size(instance, items, count), GFP_KERNEL);
instance->count = count;
memcpy(instance->items, src, flex_array_size(instance, items, instance->count));

The helper returns SIZE_MAX on overflow instead of wrapping around.

Additionally replaces parameter "n" with "count" in struct_size() helper
for greater clarity and unification.
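
The saturating behaviour can be sketched in user space. The demo_ macros below are stand-ins written for this illustration, not the kernel's overflow.h implementation; they only mirror the idea that an overflowing size becomes SIZE_MAX so that a later allocation fails instead of being too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo { int a; int b; };

struct something {
	size_t count;
	struct foo items[];
};

/* Multiply two sizes, saturating at SIZE_MAX instead of wrapping. */
static size_t demo_size_mul(size_t a, size_t b)
{
	if (a != 0 && b > SIZE_MAX / a)
		return SIZE_MAX;
	return a * b;
}

/* Add two sizes, saturating at SIZE_MAX instead of wrapping. */
static size_t demo_size_add(size_t a, size_t b)
{
	if (b > SIZE_MAX - a)
		return SIZE_MAX;
	return a + b;
}

/* Bytes used by "count" elements of the trailing array only. */
#define demo_flex_array_size(p, member, count) \
	demo_size_mul((count), sizeof(*(p)->member))

/* Bytes used by the whole structure including "count" elements. */
#define demo_struct_size(p, member, count) \
	demo_size_add(sizeof(*(p)), demo_flex_array_size(p, member, count))
```

The pointer argument is only used inside sizeof, so it is never dereferenced and may even be uninitialized at the point of the call.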
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/20200609012233.GA3371@embeddedor
Signed-off-by: Kees Cook <keescook@chromium.org>

b19d57d0

kretprobe: Prevent triggering kretprobe from within kprobe_flush_task · 9b38cc70

Authored by Jiri Olsa on May 12, 2020

Ziqian reported a lockup when adding a retprobe on _raw_spin_lock_irqsave.
My test was also able to trigger lockdep output:

 ============================================
 WARNING: possible recursive locking detected
 5.6.0-rc6+ #6 Not tainted
 --------------------------------------------
 sched-messaging/2767 is trying to acquire lock:
 ffffffff9a492798 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_hash_lock+0x52/0xa0

 but task is already holding lock:
 ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&(kretprobe_table_locks[i].lock));
   lock(&(kretprobe_table_locks[i].lock));

  *** DEADLOCK ***

  May be due to missing lock nesting notation

 1 lock held by sched-messaging/2767:
  #0: ffffffff9a491a18 (&(kretprobe_table_locks[i].lock)){-.-.}, at: kretprobe_trampoline+0x0/0x50

 stack backtrace:
 CPU: 3 PID: 2767 Comm: sched-messaging Not tainted 5.6.0-rc6+ #6
 Call Trace:
  dump_stack+0x96/0xe0
  __lock_acquire.cold.57+0x173/0x2b7
  ? native_queued_spin_lock_slowpath+0x42b/0x9e0
  ? lockdep_hardirqs_on+0x590/0x590
  ? __lock_acquire+0xf63/0x4030
  lock_acquire+0x15a/0x3d0
  ? kretprobe_hash_lock+0x52/0xa0
  _raw_spin_lock_irqsave+0x36/0x70
  ? kretprobe_hash_lock+0x52/0xa0
  kretprobe_hash_lock+0x52/0xa0
  trampoline_handler+0xf8/0x940
  ? kprobe_fault_handler+0x380/0x380
  ? find_held_lock+0x3a/0x1c0
  kretprobe_trampoline+0x25/0x50
  ? lock_acquired+0x392/0xbc0
  ? _raw_spin_lock_irqsave+0x50/0x70
  ? __get_valid_kprobe+0x1f0/0x1f0
  ? _raw_spin_unlock_irqrestore+0x3b/0x40
  ? finish_task_switch+0x4b9/0x6d0
  ? __switch_to_asm+0x34/0x70
  ? __switch_to_asm+0x40/0x70

The code within the kretprobe handler checks for probe reentrancy,
so we won't trigger any _raw_spin_lock_irqsave probe in there.

The problem is outside of that, in kprobe_flush_task, where we call:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave

where _raw_spin_lock_irqsave triggers the kretprobe and installs
kretprobe_trampoline handler on _raw_spin_lock_irqsave return.

The kretprobe_trampoline handler is then executed with
kretprobe_table_locks already held, and the first thing it does is to
lock kretprobe_table_locks ;-) The whole lockup path looks like:

  kprobe_flush_task
    kretprobe_table_lock
      raw_spin_lock_irqsave
        _raw_spin_lock_irqsave ---> probe triggered, kretprobe_trampoline installed

        ---> kretprobe_table_locks locked

        kretprobe_trampoline
          trampoline_handler
            kretprobe_hash_lock(current, &head, &flags);  <--- deadlock

Add kprobe_busy_begin/end helpers that mark a section of code as having
a fake probe installed, which prevents another kprobe from triggering
within that code.

Use these helpers in kprobe_flush_task so that the probe recursion
protection check is hit and the probe is never set, which prevents the
above lockup.
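
The idea of a "fake probe" guard can be sketched as a single-threaded user-space model. All demo_ names are invented for this illustration, and a plain global stands in for the per-CPU "currently running probe" slot; preemption and locking are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_probe { const char *name; };

/* Stand-in for the per-CPU slot that records the active probe. */
static struct demo_probe *demo_current_probe;
static struct demo_probe demo_busy_marker = { "busy" };
static int demo_handler_runs;

/* Pretend a probe is already running so that nested hits are skipped. */
static void demo_busy_begin(void) { demo_current_probe = &demo_busy_marker; }
static void demo_busy_end(void)   { demo_current_probe = NULL; }

/* Called when a probed function is hit; returns true if the handler ran. */
static bool demo_probe_hit(struct demo_probe *p)
{
	if (demo_current_probe)		/* recursion check */
		return false;
	demo_current_probe = p;
	demo_handler_runs++;		/* the handler body would run here */
	demo_current_probe = NULL;
	return true;
}

/* Models the flush path: the lock function it calls is itself probed. */
static bool demo_flush_task(struct demo_probe *lock_probe)
{
	bool fired;

	demo_busy_begin();
	fired = demo_probe_hit(lock_probe);	/* taking the table lock */
	demo_busy_end();
	return fired;
}
```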

Link: http://lkml.kernel.org/r/158927059835.27680.7011202830041561604.stgit@devnote2

Fixes: ef53d9c5 ("kprobes: improve kretprobe scalability with hashed locking")
Cc: Ingo Molnar <mingo@kernel.org>
Cc: "Gustavo A . R . Silva" <gustavoars@kernel.org>
Cc: Anders Roxell <anders.roxell@linaro.org>
Cc: "Naveen N . Rao" <naveen.n.rao@linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David Miller <davem@davemloft.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Reported-by: "Ziqian SUN (Zamir)" <zsun@redhat.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

9b38cc70

16 Jun, 2020 12 commits

libceph: move away from global osd_req_flags · 22d2cfdf

Authored by Ilya Dryomov on Jun 04, 2020

osd_req_flags is overly general and doesn't suit its only user
(read_from_replica option) well:

- applying osd_req_flags in account_request() affects all OSD
  requests, including linger (i.e. watch and notify).  However,
  linger requests should always go to the primary even though
  some of them are reads (e.g. notify has side effects but it
  is a read because it doesn't result in mutation on the OSDs).

- calls to class methods that are reads are allowed to go to
  the replica, but most such calls issued for "rbd map" and/or
  exclusive lock transitions are requested to be resent to the
  primary via EAGAIN, doubling the latency.

Get rid of global osd_req_flags and set read_from_replica flag
only on specific OSD requests instead.

Fixes: 8ad44d5e ("libceph: read_from_replica option")
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>

22d2cfdf

tifm: Replace zero-length array with flexible-array · 5cab1634

Authored by Gustavo A. R. Silva on May 28, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].
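
As an illustration of the pattern these conversions move to, here is a small user-space sketch of a structure with a C99 flexible array member. The structure and helper names are hypothetical, and malloc() stands in for a kernel allocator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_msg {
	size_t len;
	unsigned char data[];	/* flexible array member; older style: data[0] */
};

/* Allocate a message with room for len trailing bytes and copy them in. */
static struct demo_msg *demo_msg_new(const void *src, size_t len)
{
	struct demo_msg *m;

	if (len > SIZE_MAX - sizeof(*m))	/* avoid size wrap-around */
		return NULL;
	m = malloc(sizeof(*m) + len);
	if (!m)
		return NULL;
	m->len = len;
	memcpy(m->data, src, len);
	return m;
}
```

Unlike data[0] or data[1], the data[] form must be the last member, so the compiler can reject a structure where something is accidentally placed after the trailing array.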

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

5cab1634

sctp: Replace zero-length array with flexible-array · af6bb61c

Authored by Gustavo A. R. Silva on May 28, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

af6bb61c

libata: Replace zero-length array with flexible-array · 9c5fbf05

Authored by Gustavo A. R. Silva on May 28, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

9c5fbf05

kprobes: Replace zero-length array with flexible-array · 67a862a9

Authored by Gustavo A. R. Silva on May 28, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

67a862a9

kexec: Replace zero-length array with flexible-array · 50b6951f

Authored by Gustavo A. R. Silva on May 28, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

50b6951f

KVM: Replace zero-length array with flexible-array · 764e515f

Authored by Gustavo A. R. Silva on May 28, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

764e515f

FS-Cache: Replace zero-length array with flexible-array · 67cd4624

Authored by Gustavo A. R. Silva on May 28, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

67cd4624

cb710: Replace zero-length array with flexible-array · 6b5679d2

Authored by Gustavo A. R. Silva on May 28, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

6b5679d2

can: Replace zero-length array with flexible-array · d6562f1c

Authored by Gustavo A. R. Silva on May 28, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

d6562f1c

dmaengine: Replace zero-length array with flexible-array · 466f966b

Authored by Gustavo A. R. Silva on May 28, 2020

There is a regular need in the kernel to provide a way to declare having a
dynamically sized set of trailing elements in a structure. Kernel code should
always use “flexible array members”[1] for these cases. The older style of
one-element or zero-length arrays should no longer be used[2].

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://github.com/KSPP/linux/issues/21
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>

466f966b

scsi: libata: Provide an ata_scsi_dma_need_drain stub for !CONFIG_ATA · 7bb7ee87

Authored by Christoph Hellwig on Jun 15, 2020

SAS drivers can be compiled with ata support disabled. Provide a stub so
that the drivers don't have to ifdef around wiring up
ata_scsi_dma_need_drain.

Link: https://lore.kernel.org/r/20200615064624.37317-2-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

7bb7ee87

15 Jun, 2020 2 commits

crypto: ccp - Fix sparse warnings in sev-dev · 376bd28d

Authored by Herbert Xu on Jun 04, 2020

This patch fixes a bunch of sparse warnings in sev-dev where the
__user marking is incorrectly handled.
Reported-by: kbuild test robot <lkp@intel.com>
Fixes: 7360e4b1 ("crypto: ccp: Implement SEV_PEK_CERT_IMPORT...")
Fixes: e7990356 ("crypto: ccp: Implement SEV_PEK_CSR ioctl...")
Fixes: 76a2b524 ("crypto: ccp: Implement SEV_PDH_CERT_EXPORT...")
Fixes: d6112ea0 ("crypto: ccp - introduce SEV_GET_ID2 command")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Brijesh Singh <brijesh.singh@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

376bd28d

security: Add LSM hooks to set*gid syscalls · 39030e13

Authored by Thomas Cedeno on Jun 09, 2020

The SafeSetID LSM uses the security_task_fix_setuid hook to filter
set*uid() syscalls according to its configured security policy. In
preparation for adding analogous support in the LSM for set*gid()
syscalls, we add the requisite hook here. Tested by putting print
statements in the security_task_fix_setgid hook and seeing them get hit
during kernel boot.
Signed-off-by: Thomas Cedeno <thomascedeno@google.com>
Signed-off-by: Micah Morton <mortonm@chromium.org>

39030e13

13 Jun, 2020 1 commit

ext4, jbd2: ensure panic by fix a race between jbd2 abort and ext4 error handlers · 7b97d868

Authored by zhangyi (F) on Jun 09, 2020

In the ext4 filesystem with errors=panic, if one process is recording
errno in the superblock when invoking jbd2_journal_abort() due to some
error cases, it could be raced by another __ext4_abort() which is
setting the SB_RDONLY flag but missing panic because errno has not been
recorded.

jbd2_journal_commit_transaction()
 jbd2_journal_abort()
  journal->j_flags |= JBD2_ABORT;
  jbd2_journal_update_sb_errno()
                                    | ext4_journal_check_start()
                                    |  __ext4_abort()
                                    |   sb->s_flags |= SB_RDONLY;
                                    |   if (!JBD2_REC_ERR)
                                    |        return;
  journal->j_flags |= JBD2_REC_ERR;

Finally, it will no longer trigger panic because the filesystem has
already been set read-only. Fix this by introducing j_abort_mutex to make
sure journal abort is completed before panic, and remove the JBD2_REC_ERR
flag.
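
The general technique, making "mark aborted" and "record the error" one step that other callers must wait for, can be sketched with a pthread mutex. This is a simplified user-space model with invented demo_ names; it does not reproduce the actual jbd2 or ext4 code paths.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

struct demo_journal {
	pthread_mutex_t abort_mutex;	/* plays the role of j_abort_mutex */
	bool aborted;
	int sb_errno;			/* error recorded in a pretend superblock */
};

/* Mark the journal aborted and record the error as one step. */
static void demo_journal_abort(struct demo_journal *j, int err)
{
	pthread_mutex_lock(&j->abort_mutex);
	if (!j->aborted) {
		j->aborted = true;
		j->sb_errno = err;
	}
	pthread_mutex_unlock(&j->abort_mutex);
}

/* Error handler under errors=panic: returns true when it would panic. */
static bool demo_fs_abort(struct demo_journal *j, int err)
{
	/* Blocks until an abort already in progress has recorded its errno. */
	demo_journal_abort(j, err);
	return j->sb_errno != 0;
}
```

Because the second caller has to take the same mutex, it can no longer observe the aborted state without the recorded error, so no separate "error recorded" flag is needed in this model.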

Fixes: 4327ba52 ("ext4, jbd2: ensure entering into panic after recording an error in superblock")
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20200609073540.3810702-1-yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

7b97d868

12 Jun, 2020 8 commits

compiler_types.h, kasan: Use __SANITIZE_ADDRESS__ instead of CONFIG_KASAN to decide inlining · 1f44328e

Authored by Marco Elver on May 21, 2020

Use __always_inline in compilation units that have instrumentation
disabled (KASAN_SANITIZE_foo.o := n) for KASAN, like it is done for
KCSAN.

Also, add common documentation for KASAN and KCSAN explaining the
attribute.

 [ bp: Massage commit message. ]
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200521142047.169334-12-elver@google.com

1f44328e

compiler.h: Move function attributes to compiler_types.h · eb73876c

Authored by Marco Elver on May 21, 2020

Cleanup and move the KASAN and KCSAN related function attributes to
compiler_types.h, where the rest of the same kind live.

No functional change intended.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200521142047.169334-11-elver@google.com

eb73876c

compiler.h: Avoid nested statement expression in data_race() · 95c094fc

Authored by Marco Elver on May 21, 2020

It appears that compilers have trouble with nested statement
expressions. Therefore, remove one level of statement expression nesting
from the data_race() macro. This will help avoid potential problems
in the future as its usage increases.
Reported-by: Borislav Petkov <bp@suse.de>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lkml.kernel.org/r/20200520221712.GA21166@zn.tnic
Link: https://lkml.kernel.org/r/20200521142047.169334-10-elver@google.com

95c094fc

compiler.h: Remove data_race() and unnecessary checks from {READ,WRITE}_ONCE() · 44b97dcc

Authored by Marco Elver on May 21, 2020

The volatile accesses no longer need to be wrapped in data_race()
because compilers that emit instrumentation distinguishing volatile
accesses are required for KCSAN.

Consequently, the explicit kcsan_check_atomic*() are no longer required
either since the compiler emits instrumentation distinguishing the
volatile accesses.

Finally, simplify __READ_ONCE_SCALAR() and remove __WRITE_ONCE_SCALAR().

 [ bp: Convert commit message to passive voice. ]
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200521142047.169334-9-elver@google.com

44b97dcc

kcsan: Remove 'noinline' from __no_kcsan_or_inline · e3b779d9

Authored by Marco Elver on May 21, 2020

Some compilers incorrectly inline small __no_kcsan functions, which then
results in instrumenting the accesses. For this reason, the 'noinline'
attribute was added to __no_kcsan_or_inline. All known versions of GCC
are affected by this. Supported versions of Clang are unaffected, and
never inline a no_sanitize function.

However, the attribute 'noinline' in __no_kcsan_or_inline causes
unexpected code generation in functions that are __no_kcsan and call a
__no_kcsan_or_inline function.

In certain situations it is expected that the __no_kcsan_or_inline
function is actually inlined by the __no_kcsan function, and *no* calls
are emitted. By removing the 'noinline' attribute, give the compiler
the ability to inline and generate the expected code in __no_kcsan
functions.
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/CANpmjNNOpJk0tprXKB_deiNAv_UmmORf1-2uajLhnLWQQ1hvoA@mail.gmail.com
Link: https://lkml.kernel.org/r/20200521142047.169334-6-elver@google.com

e3b779d9

nfs: set invalid blocks after NFSv4 writes · 3a39e778

Authored by Zheng Bin on May 21, 2020

Use the following commands to test NFSv4 (file1M is 1MB in size):
mount -t nfs -o vers=4.0,actimeo=60 127.0.0.1/dir1 /mnt
cp file1M /mnt
du -h /mnt/file1M  -->0 within 60s, then 1M

When the write is done (cp file1M /mnt), the following call chain runs:
nfs_writeback_done
  nfs4_write_done
    nfs4_write_done_cb
      nfs_writeback_update_inode
        nfs_post_op_update_inode_force_wcc_locked(change, ctime, mtime)
nfs_post_op_update_inode_force_wcc_locked
   nfs_set_cache_invalid
   nfs_refresh_inode_locked
     nfs_update_inode

The nfsd write response contains change, ctime and mtime, so the flag is
cleared after nfs_update_inode. However, the write response does not contain
space_used, and the previous open response contained a space_used value of 0,
so inode->i_blocks is still 0.

nfs_getattr  -->called by "du -h"
  do_update |= force_sync || nfs_attribute_cache_expired -->false in 60s
  cache_validity = READ_ONCE(NFS_I(inode)->cache_validity)
  do_update |= cache_validity & (NFS_INO_INVALID_ATTR    -->false
  if (do_update) {
        __nfs_revalidate_inode
  }

Within 60s, no getattr request is sent to nfsd, thus "du -h /mnt/file1M"
reports 0.

Add an NFS_INO_INVALID_BLOCKS flag and set it when an NFSv4 write is done.

Fixes: 16e14375 ("NFS: More fine grained attribute tracking")
Signed-off-by: Zheng Bin <zhengbin13@huawei.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

3a39e778

SUNRPC: receive buffer size estimation values almost never change · 53bc19f1

Authored by Chuck Lever on May 12, 2020

Avoid unnecessary cache sloshing by placing the buffer size
estimation update logic behind an atomic bit flag.

The size of GSS information included in each wrapped Reply does
not change during the lifetime of a GSS context. Therefore, the
au_rslack and au_ralign fields need to be updated only once after
establishing a fresh GSS credential.

Thus a slack size update must occur after a cred is created,
duplicated, renewed, or expires. I'm not sure I have this exactly
right. A trace point is introduced to track updates to these
variables to enable troubleshooting the problem if I missed a spot.
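
The "update only when a flag is set" idea can be sketched with C11 atomics. The structure, field and function names are invented for this illustration and do not correspond to the SUNRPC data structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_auth {
	atomic_bool slack_needed;	/* set when a fresh credential is established */
	unsigned int rslack;
	unsigned int updates;		/* how often the shared field was written */
};

/* Called when a credential is created, renewed, or otherwise replaced. */
static void demo_cred_refreshed(struct demo_auth *a)
{
	atomic_store(&a->slack_needed, true);
}

/* Called for every reply; writes the shared field only when flagged. */
static void demo_update_slack(struct demo_auth *a, unsigned int slack)
{
	/* Cheap read first, so the common path does not dirty the cache line. */
	if (!atomic_load_explicit(&a->slack_needed, memory_order_relaxed))
		return;
	/* Test-and-clear: only one caller performs the update. */
	if (atomic_exchange(&a->slack_needed, false)) {
		a->rslack = slack;
		a->updates++;
	}
}
```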
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

53bc19f1

KVM: async_pf: Inject 'page ready' event only if 'page not present' was previously injected · 2a18b7e7

Authored by Vitaly Kuznetsov on Jun 10, 2020

'Page not present' event may or may not get injected depending on
guest's state. If the event wasn't injected, there is no need to
inject the corresponding 'page ready' event as the guest may get
confused. E.g. Linux thinks that the corresponding 'page not present'
event wasn't delivered *yet* and allocates a 'dummy entry' for it.
This entry is never freed.

Note, 'wakeup all' events have no corresponding 'page not present'
event and always get injected.
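
The rule described above can be sketched as a small predicate. The structure and function names are hypothetical; the sketch only models the decision, not the actual event delivery.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_async_pf {
	bool notpresent_injected;	/* was 'page not present' delivered? */
	bool wakeup_all;		/* broadcast event with no matching fault */
};

/* Remember whether the 'page not present' event could be injected. */
static void demo_page_not_present(struct demo_async_pf *w, bool can_inject)
{
	w->notpresent_injected = can_inject;
}

/* 'page ready' is injected only for 'wakeup all' or a delivered fault. */
static bool demo_should_inject_page_ready(const struct demo_async_pf *w)
{
	return w->wakeup_all || w->notpresent_injected;
}
```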

s390 seems to always be able to inject 'page not present', the
change is effectively a nop.
Suggested-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200610175532.779793-2-vkuznets@redhat.com>
Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=208081
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2a18b7e7

11 Jun, 2020 2 commits

x86/{mce,mm}: Unmap the entire page if the whole page is affected and poisoned · 17fae129

Authored by Tony Luck on May 20, 2020

An interesting thing happened when a guest Linux instance took a machine
check. The VMM unmapped the bad page from guest physical space and
passed the machine check to the guest.

Linux took all the normal actions to offline the page from the process
that was using it. But then guest Linux crashed because it said there
was a second machine check inside the kernel with this stack trace:

do_memory_failure
    set_mce_nospec
         set_memory_uc
              _set_memory_uc
                   change_page_attr_set_clr
                        cpa_flush
                             clflush_cache_range_opt

This was odd, because a CLFLUSH instruction shouldn't raise a machine
check (it isn't consuming the data). Further investigation showed that
the VMM had passed in another machine check because it appeared that the
guest was accessing the bad page.

The fix is to check the scope of the poison by checking the MCi_MISC register.
If the entire page is affected, then unmap the page. If only part of the
page is affected, then mark the page as uncacheable.

This assumes that VMMs will do the logical thing and pass in the "whole
page scope" via the MCi_MISC register (since they unmapped the entire
page).
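
The scope check can be sketched as follows, assuming that the low six bits of MCi_MISC give the least significant valid bit of the reported address and that pages are 4 KiB. The demo_ names, the validity parameter and the conservative default when no scope information is available are assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12	/* 4 KiB pages, assumed */

/* Bits 5:0 of MCi_MISC: least significant valid bit of the address. */
#define DEMO_MISC_ADDR_LSB(misc) ((unsigned int)((misc) & 0x3f))

enum demo_action { DEMO_UNMAP_PAGE, DEMO_MARK_UNCACHEABLE };

/* True when the reported poison covers at least a whole page. */
static bool demo_whole_page(uint64_t misc, bool misc_valid)
{
	if (!misc_valid)	/* no scope information: assume the worst */
		return true;
	return DEMO_MISC_ADDR_LSB(misc) >= DEMO_PAGE_SHIFT;
}

/* Unmap when the whole page is bad, otherwise only stop caching it. */
static enum demo_action demo_nospec_action(uint64_t misc, bool misc_valid)
{
	return demo_whole_page(misc, misc_valid) ? DEMO_UNMAP_PAGE
						 : DEMO_MARK_UNCACHEABLE;
}
```

A value of 6 in the low bits corresponds to a 64-byte cache line, so only part of the page is affected and the page is marked uncacheable rather than unmapped.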

  [ bp: Adjust to x86/entry changes. ]

Fixes: 284ce401 ("x86/memory_failure: Introduce {set, clear}_mce_nospec()")
Reported-by: NJue Wang <juew@google.com>
Signed-off-by: NTony Luck <tony.luck@intel.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NJue Wang <juew@google.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/20200520163546.GA7977@agluck-desk2.amr.corp.intel.com

17fae129

x86/entry: Unbreak __irqentry_text_start/end magic · f0178fc0

由 Thomas Gleixner 提交于 6月 10, 2020

The entry rework moved interrupt entry code from the irqentry to the
noinstr section which made the irqentry section empty.

This breaks boundary checks which rely on the __irqentry_text_start/end
markers to find out whether a function in a stack trace is
interrupt/exception entry code. This affects the function graph tracer and
filter_irq_stacks().

As the IDT entry points are all sequentially emitted this is rather simple
to unbreak by injecting __irqentry_text_start/end as global labels.

To make this work correctly:

  - Remove the IRQENTRY_TEXT section from the x86 linker script
  - Define __irqentry so it breaks the build if it's used
  - Adjust the entry mirroring in PTI
  - Remove the redundant kprobes and unwinder bound checks
Reported-by: Qian Cai <cai@lca.pw>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

f0178fc0
