提交 · e1720fee27246dfb84f7c433e35367170a2d4436 · openanolis / cloud-kernel

03 8月, 2018 2 次提交

mm/memblock: add a name for memblock flags enumeration · e1720fee

由 Mike Rapoport 提交于 6月 30, 2018

Since kernel-doc does not like anonymous enums the name is required for
adding documentation. While on it, I've also updated all the function
declarations to use 'enum memblock_flags' instead of unsigned long.
Signed-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>

e1720fee

M
docs/mm: bootmem: add kernel-doc description of 'struct bootmem_data' · a83c0ea7
由 Mike Rapoport 提交于 6月 30, 2018
```
Signed-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: NJonathan Corbet <corbet@lwn.net>
```
a83c0ea7

29 6月, 2018 4 次提交

sg: remove ->sg_magic member · 9544bc53

由 Jens Axboe 提交于 6月 29, 2018

This was introduced more than a decade ago when sg chaining was
added, but we never really caught anything with it. The scatterlist
entry size can be critical, since drivers allocate it, so remove
the magic member. Recently it's been triggering allocation stalls
and failures in NVMe.
Tested-by: NJordan Glover <Golden_Miller83@protonmail.ch>
Acked-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

9544bc53

include/linux/dax.h: dax_iomap_fault() returns vm_fault_t · f77bc3a8

由 Souptick Joarder 提交于 6月 27, 2018

Commit 1c8f4220 ("mm: change return type to vm_fault_t") missed a
conversion.  It's not a big problem at present because mainline is still
using

	typedef int vm_fault_t;

Fixes: 1c8f4220 ("mm: change return type to vm_fault_t")
Link: http://lkml.kernel.org/r/20180620172046.GA27894@jordon-HP-15-Notebook-PCSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f77bc3a8

slub: fix failure when we delete and create a slab cache · d50d82fa

由 Mikulas Patocka 提交于 6月 27, 2018

In kernel 4.17 I removed some code from dm-bufio that did slab cache
merging (commit 21bb1327: "dm bufio: remove code that merges slab
caches") - both slab and slub support merging caches with identical
attributes, so dm-bufio now just calls kmem_cache_create and relies on
implicit merging.

This uncovered a bug in the slub subsystem - if we delete a cache and
immediatelly create another cache with the same attributes, it fails
because of duplicate filename in /sys/kernel/slab/.  The slub subsystem
offloads freeing the cache to a workqueue - and if we create the new
cache before the workqueue runs, it complains because of duplicate
filename in sysfs.

This patch fixes the bug by moving the call of kobject_del from
sysfs_slab_remove_workfn to shutdown_cache.  kobject_del must be called
while we hold slab_mutex - so that the sysfs entry is deleted before a
cache with the same attributes could be created.

Running device-mapper-test-suite with:

  dmtest run --suite thin-provisioning -n /commit_failure_causes_fallback/

triggered:

  Buffer I/O error on dev dm-0, logical block 1572848, async page read
  device-mapper: thin: 253:1: metadata operation 'dm_pool_alloc_data_block' failed: error = -5
  device-mapper: thin: 253:1: aborting current metadata transaction
  sysfs: cannot create duplicate filename '/kernel/slab/:a-0000144'
  CPU: 2 PID: 1037 Comm: kworker/u48:1 Not tainted 4.17.0.snitm+ #25
  Hardware name: Supermicro SYS-1029P-WTR/X11DDW-L, BIOS 2.0a 12/06/2017
  Workqueue: dm-thin do_worker [dm_thin_pool]
  Call Trace:
   dump_stack+0x5a/0x73
   sysfs_warn_dup+0x58/0x70
   sysfs_create_dir_ns+0x77/0x80
   kobject_add_internal+0xba/0x2e0
   kobject_init_and_add+0x70/0xb0
   sysfs_slab_add+0xb1/0x250
   __kmem_cache_create+0x116/0x150
   create_cache+0xd9/0x1f0
   kmem_cache_create_usercopy+0x1c1/0x250
   kmem_cache_create+0x18/0x20
   dm_bufio_client_create+0x1ae/0x410 [dm_bufio]
   dm_block_manager_create+0x5e/0x90 [dm_persistent_data]
   __create_persistent_data_objects+0x38/0x940 [dm_thin_pool]
   dm_pool_abort_metadata+0x64/0x90 [dm_thin_pool]
   metadata_operation_failed+0x59/0x100 [dm_thin_pool]
   alloc_data_block.isra.53+0x86/0x180 [dm_thin_pool]
   process_cell+0x2a3/0x550 [dm_thin_pool]
   do_worker+0x28d/0x8f0 [dm_thin_pool]
   process_one_work+0x171/0x370
   worker_thread+0x49/0x3f0
   kthread+0xf8/0x130
   ret_from_fork+0x35/0x40
  kobject_add_internal failed for :a-0000144 with -EEXIST, don't try to register things with the same name in the same directory.
  kmem_cache_create(dm_bufio_buffer-16) failed with error -17

Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1806151817130.6333@file01.intranet.prod.int.rdu2.redhat.comSigned-off-by: NMikulas Patocka <mpatocka@redhat.com>
Reported-by: NMike Snitzer <snitzer@redhat.com>
Tested-by: NMike Snitzer <snitzer@redhat.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d50d82fa

Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43

由 Linus Torvalds 提交于 6月 28, 2018

The poll() changes were not well thought out, and completely
unexplained.  They also caused a huge performance regression, because
"->poll()" was no longer a trivial file operation that just called down
to the underlying file operations, but instead did at least two indirect
calls.

Indirect calls are sadly slow now with the Spectre mitigation, but the
performance problem could at least be largely mitigated by changing the
"->get_poll_head()" operation to just have a per-file-descriptor pointer
to the poll head instead.  That gets rid of one of the new indirections.

But that doesn't fix the new complexity that is completely unwarranted
for the regular case.  The (undocumented) reason for the poll() changes
was some alleged AIO poll race fixing, but we don't make the common case
slower and more complex for some uncommon special case, so this all
really needs way more explanations and most likely a fundamental
redesign.

[ This revert is a revert of about 30 different commits, not reverted
  individually because that would just be unnecessarily messy  - Linus ]

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a11e1d43

27 6月, 2018 1 次提交

block: Fix transfer when chunk sectors exceeds max · 15bfd21f

由 Keith Busch 提交于 6月 26, 2018

A device may have boundary restrictions where the number of sectors
between boundaries exceeds its max transfer size. In this case, we need
to cap the max size to the smaller of the two limits.
Reported-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com>
Tested-by: NJitendra Bhivare <jitendra.bhivare@broadcom.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NKeith Busch <keith.busch@intel.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

15bfd21f

25 6月, 2018 5 次提交

disable -Wattribute-alias warning for SYSCALL_DEFINEx() · bee20031

由 Arnd Bergmann 提交于 6月 19, 2018

gcc-8 warns for every single definition of a system call entry
point, e.g.:

include/linux/compat.h:56:18: error: 'compat_sys_rt_sigprocmask' alias between functions of incompatible types 'long int(int,  compat_sigset_t *, compat_sigset_t *, compat_size_t)' {aka 'long int(int,  struct <anonymous> *, struct <anonymous> *, unsigned int)'} and 'long int(long int,  long int,  long int,  long int)' [-Werror=attribute-alias]
  asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
                  ^~~~~~~~~~
include/linux/compat.h:45:2: note: in expansion of macro 'COMPAT_SYSCALL_DEFINEx'
  COMPAT_SYSCALL_DEFINEx(4, _##name, __VA_ARGS__)
  ^~~~~~~~~~~~~~~~~~~~~~
kernel/signal.c:2601:1: note: in expansion of macro 'COMPAT_SYSCALL_DEFINE4'
 COMPAT_SYSCALL_DEFINE4(rt_sigprocmask, int, how, compat_sigset_t __user *, nset,
 ^~~~~~~~~~~~~~~~~~~~~~
include/linux/compat.h:60:18: note: aliased declaration here
  asmlinkage long compat_SyS##name(__MAP(x,__SC_LONG,__VA_ARGS__))\
                  ^~~~~~~~~~

The new warning seems reasonable in principle, but it doesn't
help us here, since we rely on the type mismatch to sanitize the
system call arguments. After I reported this as GCC PR82435, a new
-Wno-attribute-alias option was added that could be used to turn the
warning off globally on the command line, but I'd prefer to do it a
little more fine-grained.

Interestingly, turning a warning off and on again inside of
a single macro doesn't always work, in this case I had to add
an extra statement inbetween and decided to copy the __SC_TEST
one from the native syscall to the compat syscall macro.  See
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83256 for more details
about this.

[paul.burton@mips.com:
  - Rebase atop current master.
  - Split GCC & version arguments to __diag_ignore() in order to match
    changes to the preceding patch.
  - Add the comment argument to match the preceding patch.]

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82435Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NPaul Burton <paul.burton@mips.com>
Tested-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Tested-by: NStafford Horne <shorne@gmail.com>
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>

bee20031

kbuild: add macro for controlling warnings to linux/compiler.h · 8793bb7f

由 Arnd Bergmann 提交于 6月 19, 2018

I have occasionally run into a situation where it would make sense to
control a compiler warning from a source file rather than doing so from
a Makefile using the $(cc-disable-warning, ...) or $(cc-option, ...)
helpers.

The approach here is similar to what glibc uses, using __diag() and
related macros to encapsulate a _Pragma("GCC diagnostic ...") statement
that gets turned into the respective "#pragma GCC diagnostic ..." by
the preprocessor when the macro gets expanded.

Like glibc, I also have an argument to pass the affected compiler
version, but decided to actually evaluate that one. For now, this
supports GCC_4_6, GCC_4_7, GCC_4_8, GCC_4_9, GCC_5, GCC_6, GCC_7,
GCC_8 and GCC_9. Adding support for CLANG_5 and other interesting
versions is straightforward here. GNU compilers starting with gcc-4.2
could support it in principle, but "#pragma GCC diagnostic push"
was only added in gcc-4.6, so it seems simpler to not deal with those
at all. The same versions show a large number of warnings already,
so it seems easier to just leave it at that and not do a more
fine-grained control for them.

The use cases I found so far include:

- turning off the gcc-8 -Wattribute-alias warning inside of the
  SYSCALL_DEFINEx() macro without having to do it globally.

- Reducing the build time for a simple re-make after a change,
  once we move the warnings from ./Makefile and
  ./scripts/Makefile.extrawarn into linux/compiler.h

- More control over the warnings based on other configurations,
  using preprocessor syntax instead of Makefile syntax. This should make
  it easier for the average developer to understand and change things.

- Adding an easy way to turn the W=1 option on unconditionally
  for a subdirectory or a specific file. This has been requested
  by several developers in the past that want to have their subsystems
  W=1 clean.

- Integrating clang better into the build systems. Clang supports
  more warnings than GCC, and we probably want to classify them
  as default, W=1, W=2 etc, but there are cases in which the
  warnings should be classified differently due to excessive false
  positives from one or the other compiler.

- Adding a way to turn the default warnings into errors (e.g. using
  a new "make E=0" tag) while not also turning the W=1 warnings into
  errors.

This patch for now just adds the minimal infrastructure in order to
do the first of the list above. As the #pragma GCC diagnostic
takes precedence over command line options, the next step would be
to convert a lot of the individual Makefiles that set nonstandard
options to use __diag() instead.

[paul.burton@mips.com:
  - Rebase atop current master.
  - Add __diag_GCC, or more generally __diag_<compiler>, abstraction to
    avoid code outside of linux/compiler-gcc.h needing to duplicate
    knowledge about different GCC versions.
  - Add a comment argument to __diag_{ignore,warn,error} which isn't
    used in the expansion of the macros but serves to push people to
    document the reason for using them - per feedback from Kees Cook.
  - Translate severity to GCC-specific pragmas in linux/compiler-gcc.h
    rather than using GCC-specific in linux/compiler_types.h.
  - Drop all but GCC 8 macros, since we only need to define macros for
    versions that we need to introduce pragmas for, and as of this
    series that's just GCC 8.
  - Capitalize comments in linux/compiler-gcc.h to match the style of
    the rest of the file.
  - Line up macro definitions with tabs in linux/compiler-gcc.h.]
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NPaul Burton <paul.burton@mips.com>
Tested-by: NChristophe Leroy <christophe.leroy@c-s.fr>
Tested-by: NStafford Horne <shorne@gmail.com>
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>

8793bb7f

acpi: Add helper for deactivating memory region · d2d2e3c4

由 Heikki Krogerus 提交于 6月 21, 2018

Sometimes memory resource may be overlapping with
SystemMemory Operation Region by design, for example if the
memory region is used as a mailbox for communication with a
firmware in the system. One occasion of such mailboxes is
USB Type-C Connector System Software Interface (UCSI).

With regions like that, it is important that the driver is
able to map the memory with the requirements it has. For
example, the driver should be allowed to map the memory as
non-cached memory. However, if the operation region has been
accessed before the driver has mapped the memory, the memory
has been marked as write-back by the time the driver is
loaded. That means the driver will fail to map the memory
if it expects non-cached memory.

To work around the problem, introducing helper that the
drivers can use to temporarily deactivate (unmap)
SystemMemory Operation Regions that overlap with their
IO memory.

Fixes: 8243edf4 ("usb: typec: ucsi: Add ACPI driver")
Cc: stable@vger.kernel.org
Reviewed-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: NHeikki Krogerus <heikki.krogerus@linux.intel.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

d2d2e3c4

PM / Domains: Rename opp_node to np · ad6384ba

由 Viresh Kumar 提交于 6月 20, 2018

The DT node passed here isn't necessarily an OPP node, as this routine
can also be used for cases where the "required-opps" property is present
directly in the device's node. Rename it.

This also removes a stale comment.
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

ad6384ba

PM / Domains: Fix return value of of_genpd_opp_to_performance_state() · 5e03aa61

由 Viresh Kumar 提交于 6月 14, 2018

of_genpd_opp_to_performance_state() should return 0 for errors, but the
dummy routine isn't doing that. Fix it.
Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
Acked-by: NUlf Hansson <ulf.hansson@linaro.org>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

5e03aa61

24 6月, 2018 1 次提交

iio: buffer: fix the function signature to match implementation · 92397a6c

由 Phil Reid 提交于 6月 05, 2018

linux/iio/buffer-dma.h was not updated to when length was changed to
unsigned int.

Fixes: c043ec1c ("iio:buffer: make length types match kfifo types")
Signed-off-by: NPhil Reid <preid@electromag.com.au>
Signed-off-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>

92397a6c

23 6月, 2018 2 次提交

bdi: Fix another oops in wb_workfn() · 3ee7e869

由 Jan Kara 提交于 6月 18, 2018

syzbot is reporting NULL pointer dereference at wb_workfn() [1] due to
wb->bdi->dev being NULL. And Dmitry confirmed that wb->state was
WB_shutting_down after wb->bdi->dev became NULL. This indicates that
unregister_bdi() failed to call wb_shutdown() on one of wb objects.

The problem is in cgwb_bdi_unregister() which does cgwb_kill() and thus
drops bdi's reference to wb structures before going through the list of
wbs again and calling wb_shutdown() on each of them. This way the loop
iterating through all wbs can easily miss a wb if that wb has already
passed through cgwb_remove_from_bdi_list() called from wb_shutdown()
from cgwb_release_workfn() and as a result fully shutdown bdi although
wb_workfn() for this wb structure is still running. In fact there are
also other ways cgwb_bdi_unregister() can race with
cgwb_release_workfn() leading e.g. to use-after-free issues:

CPU1                            CPU2
                                cgwb_bdi_unregister()
                                  cgwb_kill(*slot);

cgwb_release()
  queue_work(cgwb_release_wq, &wb->release_work);
cgwb_release_workfn()
                                  wb = list_first_entry(&bdi->wb_list, ...)
                                  spin_unlock_irq(&cgwb_lock);
  wb_shutdown(wb);
  ...
  kfree_rcu(wb, rcu);
                                  wb_shutdown(wb); -> oops use-after-free

We solve these issues by synchronizing writeback structure shutdown from
cgwb_bdi_unregister() with cgwb_release_workfn() using a new mutex. That
way we also no longer need synchronization using WB_shutting_down as the
mutex provides it for CONFIG_CGROUP_WRITEBACK case and without
CONFIG_CGROUP_WRITEBACK wb_shutdown() can be called only once from
bdi_unregister().
Reported-by: Nsyzbot <syzbot+4a7438e774b21ddd8eca@syzkaller.appspotmail.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

3ee7e869

rseq: Avoid infinite recursion when delivering SIGSEGV · 784e0300

由 Will Deacon 提交于 6月 22, 2018

When delivering a signal to a task that is using rseq, we call into
__rseq_handle_notify_resume() so that the registers pushed in the
sigframe are updated to reflect the state of the restartable sequence
(for example, ensuring that the signal returns to the abort handler if
necessary).

However, if the rseq management fails due to an unrecoverable fault when
accessing userspace or certain combinations of RSEQ_CS_* flags, then we
will attempt to deliver a SIGSEGV. This has the potential for infinite
recursion if the rseq code continuously fails on signal delivery.

Avoid this problem by using force_sigsegv() instead of force_sig(), which
is explicitly designed to reset the SEGV handler to SIG_DFL in the case
of a recursive fault. In doing so, remove rseq_signal_deliver() from the
internal rseq API and have an optional struct ksignal * parameter to
rseq_handle_notify_resume() instead.
Signed-off-by: NWill Deacon <will.deacon@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: peterz@infradead.org
Cc: paulmck@linux.vnet.ibm.com
Cc: boqun.feng@gmail.com
Link: https://lkml.kernel.org/r/1529664307-983-1-git-send-email-will.deacon@arm.com

784e0300

22 6月, 2018 2 次提交

irqdesc: Delete irq_desc_get_msi_desc() · bed9df97

由 John Garry 提交于 6月 22, 2018

Function irq_desc_get_msi_desc() is not referenced in the kernel (and does
not seem to have been referenced since e39758e0, 3 years ago), so
delete it.
Signed-off-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: <marc.zyngier@arm.com>
Cc: <will.deacon@arm.com>
Cc: <kstewart@linuxfoundation.org>
Cc: <julien.thierry@arm.com>
Cc: <andrew@lunn.ch>
Cc: <trivial@kernel.org>
Link: https://lkml.kernel.org/r/1529667333-92959-1-git-send-email-john.garry@huawei.com

bed9df97

genirq/debugfs: Add missing IRQCHIP_SUPPORTS_LEVEL_MSI debug · 72a8edc2

由 Marc Zyngier 提交于 6月 22, 2018

Debug is missing the IRQCHIP_SUPPORTS_LEVEL_MSI debug entry, making debugfs
slightly less useful.

Take this opportunity to also add a missing comment in the definition of
IRQCHIP_SUPPORTS_LEVEL_MSI.

Fixes: 6988e0e0 ("genirq/msi: Limit level-triggered MSI to platform devices")
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Alexandre Belloni <alexandre.belloni@bootlin.com>
Cc: Yang Yingliang <yangyingliang@huawei.com>
Cc: Sumit Garg <sumit.garg@linaro.org>
Link: https://lkml.kernel.org/r/20180622095254.5906-2-marc.zyngier@arm.com

72a8edc2

21 6月, 2018 4 次提交

kernel.h: Fix a typo in comment · 8730662d

由 Wei Wang 提交于 4月 24, 2018

Signed-off-by: NWei Wang <wvw@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: Crt Mori <cmo@melexis.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: gregkh@linuxfoundation.org
Cc: wei.vince.wang@gmail.com
Link: https://lkml.kernel.org/lkml/20180424212241.16013-1-wvw@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8730662d

x86/platform/UV: Add adjustable set memory block size function · f642fb58

由 mike.travis@hpe.com 提交于 5月 24, 2018

Add a new function to "adjust" the current fixed UV memory block size
of 2GB so it can be changed to a different physical boundary.  This is
out of necessity so arch dependent code can accommodate specific BIOS
requirements which can align these new PMEM modules at less than the
default boundaries.

A "set order" type of function was used to insure that the memory block
size will be a power of two value without requiring a validity check.
64GB was chosen as the upper limit for memory block size values to
accommodate upcoming 4PB systems which have 6 more bits of physical
address space (46 becoming 52).
Signed-off-by: NMike Travis <mike.travis@hpe.com>
Reviewed-by: NAndrew Banman <andrew.banman@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russ Anderson <russ.anderson@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: dan.j.williams@intel.com
Cc: jgross@suse.com
Cc: kirill.shutemov@linux.intel.com
Cc: mhocko@suse.com
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/lkml/20180524201711.609546602@stormcage.americas.sgi.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

f642fb58

rseq/cleanup: Do not abort rseq c.s. in child on fork() · 9a789fcf

由 Mathieu Desnoyers 提交于 6月 19, 2018

Considering that we explicitly forbid system calls in rseq critical
sections, it is not valid to issue a fork or clone system call within a
rseq critical section, so rseq_fork() is not required to restart an
active rseq c.s. in the child process.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ben Maurer <bmaurer@fb.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chris Lameter <cl@linux.com>
Cc: Dave Watson <davejwatson@fb.com>
Cc: Joel Fernandes <joelaf@google.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul Turner <pjt@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-api@vger.kernel.org
Cc: linux-kselftest@vger.kernel.org
Link: https://lore.kernel.org/lkml/20180619133230.4087-4-mathieu.desnoyers@efficios.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

9a789fcf

bpf: enforce correct alignment for instructions · 92624782

由 Eric Dumazet 提交于 6月 20, 2018

After commit 9facc336 ("bpf: reject any prog that failed read-only lock")
offsetof(struct bpf_binary_header, image) became 3 instead of 4,
breaking powerpc BPF badly, since instructions need to be word aligned.

Fixes: 9facc336 ("bpf: reject any prog that failed read-only lock")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

92624782

19 6月, 2018 1 次提交

pNFS/flexfiles: Don't tie up all the rpciod threads in resends · 42f86b44

由 Trond Myklebust 提交于 6月 18, 2018

We do not want to have rpciod threads perform recursive calls into the
RPC layer since that can deadlock. In particular, having to wait for
a layoutget can be nasty... We want rather to defer scheduling those
retries until we're in the rpc_release() callback, since that is
called from the nfsiod workqueue.
Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>

42f86b44

17 6月, 2018 2 次提交

firmware: dmi: Add access to the SKU ID string · b23908d3

由 Simon Glass 提交于 6月 17, 2018

This is used in some systems from user space for determining the identity
of the device.

Expose this as a file so that that user-space tools don't need to read
from /sys/firmware/dmi/tables/DMI
Signed-off-by: NSimon Glass <sjg@chromium.org>
Signed-off-by: NJean Delvare <jdelvare@suse.de>

b23908d3

atm: Preserve value of skb->truesize when accounting to vcc · 9bbe60a6

由 David Woodhouse 提交于 6月 16, 2018

ATM accounts for in-flight TX packets in sk_wmem_alloc of the VCC on
which they are to be sent. But it doesn't take ownership of those
packets from the sock (if any) which originally owned them. They should
remain owned by their actual sender until they've left the box.

There's a hack in pskb_expand_head() to avoid adjusting skb->truesize
for certain skbs, precisely to avoid messing up sk_wmem_alloc
accounting. Ideally that hack would cover the ATM use case too, but it
doesn't — skbs which aren't owned by any sock, for example PPP control
frames, still get their truesize adjusted when the low-level ATM driver
adds headroom.

This has always been an issue, it seems. The truesize of a packet
increases, and sk_wmem_alloc on the VCC goes negative. But this wasn't
for normal traffic, only for control frames. So I think we just got away
with it, and we probably needed to send 2GiB of LCP echo frames before
the misaccounting would ever have caused a problem and caused
atm_may_send() to start refusing packets.

Commit 14afee4b ("net: convert sock.sk_wmem_alloc from atomic_t to
refcount_t") did exactly what it was intended to do, and turned this
mostly-theoretical problem into a real one, causing PPPoATM to fail
immediately as sk_wmem_alloc underflows and atm_may_send() *immediately*
starts refusing to allow new packets.

The least intrusive solution to this problem is to stash the value of
skb->truesize that was accounted to the VCC, in a new member of the
ATM_SKB(skb) structure. Then in atm_pop_raw() subtract precisely that
value instead of the then-current value of skb->truesize.

Fixes: 158f323b ("net: adjust skb->truesize in pskb_expand_head()")
Signed-off-by: NDavid Woodhouse <dwmw2@infradead.org>
Tested-by: NKevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9bbe60a6

16 6月, 2018 6 次提交

xdp: Fix handling of devmap in generic XDP · 6d5fc195

由 Toshiaki Makita 提交于 6月 14, 2018

Commit 67f29e07 ("bpf: devmap introduce dev_map_enqueue") changed
the return value type of __devmap_lookup_elem() from struct net_device *
to struct bpf_dtab_netdev * but forgot to modify generic XDP code
accordingly.

Thus generic XDP incorrectly used struct bpf_dtab_netdev where struct
net_device is expected, then skb->dev was set to invalid value.

v2:
- Fix compiler warning without CONFIG_BPF_SYSCALL.

Fixes: 67f29e07 ("bpf: devmap introduce dev_map_enqueue")
Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Acked-by: NYonghong Song <yhs@fb.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>

6d5fc195

fix a series of Documentation/ broken file name references · 44348e8a

由 Mauro Carvalho Chehab 提交于 6月 14, 2018

As files move around, their previous links break. Fix the
references for them.
Acked-by: NAndy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: NJonathan Corbet <corbet@lwn.net>

44348e8a

docs: Fix some broken references · 5fb94e9c

由 Mauro Carvalho Chehab 提交于 5月 08, 2018

As we move stuff around, some doc references are broken. Fix some of
them via this script:
	./scripts/documentation-file-ref-check --fix

Manually checked if the produced result is valid, removing a few
false-positives.
Acked-by: NTakashi Iwai <tiwai@suse.de>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Acked-by: NStephen Boyd <sboyd@kernel.org>
Acked-by: NCharles Keepax <ckeepax@opensource.wolfsonmicro.com>
Acked-by: NMathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: NColy Li <colyli@suse.de>
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: NJonathan Corbet <corbet@lwn.net>

5fb94e9c

docs: fix broken references with multiple hints · ec15872d

由 Mauro Carvalho Chehab 提交于 5月 08, 2018

The script:
	./scripts/documentation-file-ref-check --fix

Gives multiple hints for broken references on some files.
Manually use the one that applies for some files.
Acked-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: NJames Morris <james.morris@microsoft.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab+samsung@kernel.org>
Acked-by: NJonathan Corbet <corbet@lwn.net>

ec15872d

bpf: reject any prog that failed read-only lock · 9facc336

由 Daniel Borkmann 提交于 6月 15, 2018

We currently lock any JITed image as read-only via bpf_jit_binary_lock_ro()
as well as the BPF image as read-only through bpf_prog_lock_ro(). In
the case any of these would fail we throw a WARN_ON_ONCE() in order to
yell loudly to the log. Perhaps, to some extend, this may be comparable
to an allocation where __GFP_NOWARN is explicitly not set.

Added via 65869a47 ("bpf: improve read-only handling"), this behavior
is slightly different compared to any of the other in-kernel set_memory_ro()
users who do not check the return code of set_memory_ro() and friends /at
all/ (e.g. in the case of module_enable_ro() / module_disable_ro()). Given
in BPF this is mandatory hardening step, we want to know whether there
are any issues that would leave both BPF data writable. So it happens
that syzkaller enabled fault injection and it triggered memory allocation
failure deep inside x86's change_page_attr_set_clr() which was triggered
from set_memory_ro().

Now, there are two options: i) leaving everything as is, and ii) reworking
the image locking code in order to have a final checkpoint out of the
central bpf_prog_select_runtime() which probes whether any of the calls
during prog setup weren't successful, and then bailing out with an error.
Option ii) is a better approach since this additional paranoia avoids
altogether leaving any potential W+X pages from BPF side in the system.
Therefore, lets be strict about it, and reject programs in such unlikely
occasion. While testing I noticed also that one bpf_prog_lock_ro()
call was missing on the outer dummy prog in case of calls, e.g. in the
destructor we call bpf_prog_free_deferred() on the main prog where we
try to bpf_prog_unlock_free() the program, and since we go via
bpf_prog_select_runtime() do that as well.

Reported-by: syzbot+3b889862e65a98317058@syzkaller.appspotmail.com
Reported-by: syzbot+9e762b52dd17e616a7a5@syzkaller.appspotmail.com
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

9facc336

bpf: fix panic in prog load calls cleanup · 7d1982b4

由 Daniel Borkmann 提交于 6月 15, 2018

While testing I found that when hitting error path in bpf_prog_load()
where we jump to free_used_maps and prog contained BPF to BPF calls
that were JITed earlier, then we never clean up the bpf_prog_kallsyms_add()
done under jit_subprogs(). Add proper API to make BPF kallsyms deletion
more clear and fix that.

Fixes: 1c2a088a ("bpf: x64: add JIT support for multi-function programs")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NMartin KaFai Lau <kafai@fb.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>

7d1982b4

15 6月, 2018 5 次提交

afs: Display manually added cells in dynamic root mount · 0da0b7fd

由 David Howells 提交于 6月 15, 2018

Alter the dynroot mount so that cells created by manipulation of
/proc/fs/afs/cells and /proc/fs/afs/rootcell and by specification of a root
cell as a module parameter will cause directories for those cells to be
created in the dynamic root superblock for the network namespace[*].

To this end:

 (1) Only one dynamic root superblock is now created per network namespace
     and this is shared between all attempts to mount it.  This makes it
     easier to find the superblock to modify.

 (2) When a dynamic root superblock is created, the list of cells is walked
     and directories created for each cell already defined.

 (3) When a new cell is added, if a dynamic root superblock exists, a
     directory is created for it.

 (4) When a cell is destroyed, the directory is removed.

 (5) These directories are created by calling lookup_one_len() on the root
     dir which automatically creates them if they don't exist.

[*] Inasmuch as network namespaces are currently supported here.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

0da0b7fd

block: remov blk_queue_invalidate_tags · be7f99c5

由 Christoph Hellwig 提交于 6月 15, 2018

This function is entirely unused, so remove it and the tag_queue_busy
member of struct request_queue.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

be7f99c5

mm: fix oom_kill event handling · fe6bdfc8

由 Roman Gushchin 提交于 6月 14, 2018

Commit e27be240 ("mm: memcg: make sure memory.events is uptodate
when waking pollers") converted most of memcg event counters to
per-memcg atomics, which made them less confusing for a user.  The
"oom_kill" counter remained untouched, so now it behaves differently
than other counters (including "oom").  This adds nothing but confusion.

Let's fix this by adding the MEMCG_OOM_KILL event, and follow the
MEMCG_OOM approach.

This also removes a hack from count_memcg_event_mm(), introduced earlier
specially for the OOM_KILL counter.

[akpm@linux-foundation.org: fix for droppage of memcg-replace-mm-owner-with-mm-memcg.patch]
Link: http://lkml.kernel.org/r/20180508124637.29984-1-guro@fb.comSigned-off-by: NRoman Gushchin <guro@fb.com>
Acked-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fe6bdfc8

sched/core / kcov: avoid kcov_area during task switch · 0ed557aa

由 Mark Rutland 提交于 6月 14, 2018

During a context switch, we first switch_mm() to the next task's mm,
then switch_to() that new task. This means that vmalloc'd regions which
had previously been faulted in can transiently disappear in the context
of the prev task.

Functions instrumented by KCOV may try to access a vmalloc'd kcov_area
during this window, and as the fault handling code is instrumented, this
results in a recursive fault.

We must avoid accessing any kcov_area during this window. We can do so
with a new flag in kcov_mode, set prior to switching the mm, and cleared
once the new task is live. Since task_struct::kcov_mode isn't always a
specific enum kcov_mode value, this is made an unsigned int.

The manipulation is hidden behind kcov_{prepare,finish}_switch() helpers,
which are empty for !CONFIG_KCOV kernels.

The code uses macros because I can't use static inline functions without a
circular include dependency between <linux/sched.h> and <linux/kcov.h>,
since the definition of task_struct uses things defined in <linux/kcov.h>

Link: http://lkml.kernel.org/r/20180504135535.53744-4-mark.rutland@arm.comSigned-off-by: NMark Rutland <mark.rutland@arm.com>
Acked-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0ed557aa

mm: fix race between kmem_cache destroy, create and deactivate · 92ee383f

由 Shakeel Butt 提交于 6月 14, 2018

The memcg kmem cache creation and deactivation (SLUB only) is
asynchronous.  If a root kmem cache is destroyed whose memcg cache is in
the process of creation or deactivation, the kernel may crash.

Example of one such crash:
	general protection fault: 0000 [#1] SMP PTI
	CPU: 1 PID: 1721 Comm: kworker/14:1 Not tainted 4.17.0-smp
	...
	Workqueue: memcg_kmem_cache kmemcg_deactivate_workfn
	RIP: 0010:has_cpu_slab
	...
	Call Trace:
	? on_each_cpu_cond
	__kmem_cache_shrink
	kmemcg_cache_deact_after_rcu
	kmemcg_deactivate_workfn
	process_one_work
	worker_thread
	kthread
	ret_from_fork+0x35/0x40

To fix this race, on root kmem cache destruction, mark the cache as
dying and flush the workqueue used for memcg kmem cache creation and
deactivation.  SLUB's memcg kmem cache deactivation also includes RCU
callback and thus make sure all previous registered RCU callbacks have
completed as well.

[shakeelb@google.com: handle the RCU callbacks for SLUB deactivation]
  Link: http://lkml.kernel.org/r/20180611192951.195727-1-shakeelb@google.com
[shakeelb@google.com: add more documentation, rename fields for readability]
  Link: http://lkml.kernel.org/r/20180522201336.196994-1-shakeelb@google.com
[akpm@linux-foundation.org: fix build, per Shakeel]
[shakeelb@google.com: v3.  Instead of refcount, flush the workqueue]
  Link: http://lkml.kernel.org/r/20180530001204.183758-1-shakeelb@google.com
Link: http://lkml.kernel.org/r/20180521174116.171846-1-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com>
Acked-by: NVladimir Davydov <vdavydov.dev@gmail.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

92ee383f

14 6月, 2018 4 次提交

blk-mq: remove blk_mq_tagset_iter · e6c3456a

由 Christoph Hellwig 提交于 6月 14, 2018

Unused now that nvme stopped using it.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NJens Axboe <axboe@kernel.dk>

e6c3456a

blk-mq: don't time out requests again that are in the timeout handler · da661267

由 Christoph Hellwig 提交于 6月 14, 2018

We can currently call the timeout handler again on a request that has
already been handed over to the timeout handler.  Prevent that with a new
flag.

Fixes: 12f5b931 ("blk-mq: Remove generation seqeunce")
Reported-by: NAndrew Randrianasulu <randrianasulu@gmail.com>
Tested-by: NAndrew Randrianasulu <randrianasulu@gmail.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

da661267

dma-mapping: move all DMA mapping code to kernel/dma · cf65a0f6

由 Christoph Hellwig 提交于 6月 12, 2018

Currently the code is split over various files with dma- prefixes in the
lib/ and drives/base directories, and the number of files keeps growing.
Move them into a single directory to keep the code together and remove
the file name prefixes. To match the irq infrastructure this directory
is placed under the kernel/ directory.
Signed-off-by: NChristoph Hellwig <hch@lst.de>

cf65a0f6

Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables · 050e9baa

由 Linus Torvalds 提交于 6月 14, 2018

The changes to automatically test for working stack protector compiler
support in the Kconfig files removed the special STACKPROTECTOR_AUTO
option that picked the strongest stack protector that the compiler
supported.

That was all a nice cleanup - it makes no sense to have the AUTO case
now that the Kconfig phase can just determine the compiler support
directly.

HOWEVER.

It also meant that doing "make oldconfig" would now _disable_ the strong
stackprotector if you had AUTO enabled, because in a legacy config file,
the sane stack protector configuration would look like

  CONFIG_HAVE_CC_STACKPROTECTOR=y
  # CONFIG_CC_STACKPROTECTOR_NONE is not set
  # CONFIG_CC_STACKPROTECTOR_REGULAR is not set
  # CONFIG_CC_STACKPROTECTOR_STRONG is not set
  CONFIG_CC_STACKPROTECTOR_AUTO=y

and when you ran this through "make oldconfig" with the Kbuild changes,
it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
used to be disabled (because it was really enabled by AUTO), and would
disable it in the new config, resulting in:

  CONFIG_HAVE_CC_STACKPROTECTOR=y
  CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
  CONFIG_CC_STACKPROTECTOR=y
  # CONFIG_CC_STACKPROTECTOR_STRONG is not set
  CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

That's dangerously subtle - people could suddenly find themselves with
the weaker stack protector setup without even realizing.

The solution here is to just rename not just the old RECULAR stack
protector option, but also the strong one.  This does that by just
removing the CC_ prefix entirely for the user choices, because it really
is not about the compiler support (the compiler support now instead
automatially impacts _visibility_ of the options to users).

This results in "make oldconfig" actually asking the user for their
choice, so that we don't have any silent subtle security model changes.
The end result would generally look like this:

  CONFIG_HAVE_CC_STACKPROTECTOR=y
  CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
  CONFIG_STACKPROTECTOR=y
  CONFIG_STACKPROTECTOR_STRONG=y
  CONFIG_CC_HAS_SANE_STACKPROTECTOR=y

where the "CC_" versions really are about internal compiler
infrastructure, not the user selections.
Acked-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

050e9baa

13 6月, 2018 1 次提交

mm: Introduce kvcalloc() · 1c542f38

由 Kees Cook 提交于 6月 11, 2018

The kv*alloc()-family was missing kvcalloc(). Adding this allows for
2-argument multiplication conversions of kvzalloc(a * b, ...) into
kvcalloc(a, b, ...).
Signed-off-by: NKees Cook <keescook@chromium.org>

1c542f38

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功