提交 · af2c68fe94e8c0a628519b60ba070c5cf6526a99 · openeuler / Kernel

05 8月, 2019 15 次提交

block: Declare several function pointer arguments 'const' · af2c68fe

由 Bart Van Assche 提交于 8月 01, 2019

Make it clear to the compiler and also to humans that the functions
that query request queue properties do not modify any member of the
request_queue data structure.
Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.com>
Signed-off-by: NBart Van Assche <bvanassche@acm.org>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

af2c68fe

blk-mq: remove blk_mq_complete_request_sync · a87ccce0

由 Ming Lei 提交于 7月 24, 2019

blk_mq_tagset_wait_completed_request() has been applied for waiting
for completed request's fn, so not necessary to use
blk_mq_complete_request_sync() any more.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

a87ccce0

nvme: wait until all completed request's complete fn is called · 622b8b68

由 Ming Lei 提交于 7月 24, 2019

When aborting in-flight request for recovering controller, we have
to make sure that queue's complete function is called on completed
request before moving on. Otherwise, for example, the warning of
WARN_ON_ONCE(qp->mrs_used > 0) in ib_destroy_qp_user() may be
triggered on nvme-rdma.

Fix this issue by using blk_mq_tagset_wait_completed_request.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

622b8b68

nvme: don't abort completed request in nvme_cancel_request · 78ca4072

由 Ming Lei 提交于 7月 24, 2019

Before aborting in-flight requests, all IO queues and their interrupts
have been shutdown. However, request's completion function may not be
done yet because it can be scheduled to run via IPI.

So don't abort one request if it is marked as completed, otherwise
we may abort one normal completed request.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

78ca4072

blk-mq: introduce blk_mq_tagset_wait_completed_request() · f9934a80

由 Ming Lei 提交于 7月 24, 2019

blk-mq may schedule to call queue's complete function on remote CPU via
IPI, but doesn't provide any way to synchronize the request's complete
fn. The current queue freeze interface can't provide the synchonization
because aborted requests stay at blk-mq queues during EH.

In some driver's EH(such as NVMe), hardware queue's resource may be freed &
re-allocated. If the completed request's complete fn is run finally after the
hardware queue's resource is released, kernel crash will be triggered.

Prepare for fixing this kind of issue by introducing
blk_mq_tagset_wait_completed_request().

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

f9934a80

blk-mq: introduce blk_mq_request_completed() · aa306ab7

由 Ming Lei 提交于 7月 24, 2019

NVMe needs this function to decide if one request to be aborted has
been completed in normal IO path already.

So introduce it.

Cc: Max Gurtovoy <maxg@mellanox.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Keith Busch <keith.busch@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
Signed-off-by: NMing Lei <ming.lei@redhat.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

aa306ab7

L

Linux 5.3-rc3 · e21a712a
由 Linus Torvalds 提交于 8月 04, 2019

e21a712a

Merge tag 'tpmdd-next-20190805' of git://git.infradead.org/users/jjs/linux-tpmdd · a6831a89

由 Linus Torvalds 提交于 8月 04, 2019

Pull tpm fixes from Jarkko Sakkinen:
 "Two bug fixes that did not make into my first pull request"

* tag 'tpmdd-next-20190805' of git://git.infradead.org/users/jjs/linux-tpmdd:
  tpm: tpm_ibm_vtpm: Fix unallocated banks
  tpm: Fix null pointer dereference on chip register error path

a6831a89

Merge tag 'mtd/fixes-for-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux · 62d17163

由 Linus Torvalds 提交于 8月 04, 2019

Pull MTD fixes from Miquel Raynal:
 "NAND:

   - Fix Micron driver as some chips enable internal ECC correction
     during their discovery while they advertize they do not have any.

  Hyperbus:

   - Restrict the build to only ARM64 SoCs (and compile testing) which
     is what should have been done since the beginning.

   - Fix Kconfig issue by selection something instead of implying it"

* tag 'mtd/fixes-for-5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux:
  mtd: hyperbus: Add hardware dependency to AM654 driver
  mtd: hyperbus: Kconfig: Fix HBMC_AM654 dependencies
  mtd: rawnand: micron: handle on-die "ECC-off" devices correctly

62d17163

tpm: tpm_ibm_vtpm: Fix unallocated banks · fa4f99c0

由 Nayna Jain 提交于 7月 11, 2019

The nr_allocated_banks and allocated banks are initialized as part of
tpm_chip_register. Currently, this is done as part of auto startup
function. However, some drivers, like the ibm vtpm driver, do not run
auto startup during initialization. This results in uninitialized memory
issue and causes a kernel panic during boot.

This patch moves the pcr allocation outside the auto startup function
into tpm_chip_register. This ensures that allocated banks are initialized
in any case.

Fixes: 879b5892 ("tpm: retrieve digest size of unknown algorithms with PCR read")
Reported-by: NMichal Suchanek <msuchanek@suse.de>
Signed-off-by: NNayna Jain <nayna@linux.ibm.com>
Reviewed-by: NMimi Zohar <zohar@linux.ibm.com>
Tested-by: NSachin Sant <sachinp@linux.vnet.ibm.com>
Tested-by: NMichal Suchánek <msuchanek@suse.de>
Reviewed-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

fa4f99c0

tpm: Fix null pointer dereference on chip register error path · 1e5ac630

由 Milan Broz 提交于 7月 04, 2019

If clk_enable is not defined and chip initialization
is canceled code hits null dereference.

Easily reproducible with vTPM init fail:
  swtpm chardev --tpmstate dir=nonexistent_dir --tpm2 --vtpm-proxy

BUG: kernel NULL pointer dereference, address: 00000000
...
Call Trace:
 tpm_chip_start+0x9d/0xa0 [tpm]
 tpm_chip_register+0x10/0x1a0 [tpm]
 vtpm_proxy_work+0x11/0x30 [tpm_vtpm_proxy]
 process_one_work+0x214/0x5a0
 worker_thread+0x134/0x3e0
 ? process_one_work+0x5a0/0x5a0
 kthread+0xd4/0x100
 ? process_one_work+0x5a0/0x5a0
 ? kthread_park+0x90/0x90
 ret_from_fork+0x19/0x24

Fixes: 719b7d81 ("tpm: introduce tpm_chip_start() and tpm_chip_stop()")
Cc: stable@vger.kernel.org # v5.1+
Signed-off-by: NMilan Broz <gmazyland@gmail.com>
Reviewed-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: NJarkko Sakkinen <jarkko.sakkinen@linux.intel.com>

1e5ac630

Merge tag 'powerpc-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 4b6f2316

由 Linus Torvalds 提交于 8月 04, 2019

Pull powerpc fixes from Michael Ellerman:
 "Some more powerpc fixes for 5.3:

   - Wire up the new clone3 syscall.

   - A fix for the PAPR SCM nvdimm driver, to fix a crash when firmware
     gives us a device that's attached to a non-online NUMA node.

   - A fix for a boot failure on 32-bit with KASAN enabled.

   - Three fixes for implicit fall through warnings, some of which are
     errors for us due to -Werror.

  Thanks to: Aneesh Kumar K.V, Christophe Leroy, Kees Cook, Santosh
  Sivaraj, Stephen Rothwell"

* tag 'powerpc-5.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/kasan: fix early boot failure on PPC32
  drivers/macintosh/smu.c: Mark expected switch fall-through
  powerpc/spe: Mark expected switch fall-throughs
  powerpc/nvdimm: Pick nearby online node if the device node is not online
  powerpc/kvm: Fall through switch case explicitly
  powerpc: Wire up clone3 syscall

4b6f2316

MAINTAINERS: Add Geert as Renesas SoC Co-Maintainer · 4c0d228c

由 Geert Uytterhoeven 提交于 7月 29, 2019

At the end of the v5.3 upstream kernel development cycle, Simon will be
stepping down from his role as Renesas SoC maintainer. Starting with
the v5.4 development cycle, Geert is taking over this role.

Add Geert as a co-maintainer, and add his git repository and branch.
Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: NNiklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Acked-by: NSimon Horman <horms+renesas@verge.net.au>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4c0d228c

Merge tag 'kbuild-fixes-v5.3-2' of... · 05e4f88b

由 Linus Torvalds 提交于 8月 04, 2019

Merge tag 'kbuild-fixes-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

Pull Kbuild fixes from Masahiro Yamada:

 - detect missing missing "WITH Linux-syscall-note" for uapi headers

 - fix needless rebuild when using Clang

 - fix false-positive cc-option in Kconfig when using Clang

 - avoid including corrupted .*.cmd files in the modpost stage

 - fix warning of 'make vmlinux'

 - fix {m,n,x,g}config to not generate the broken .config on the second
   save operation.

 - some trivial Makefile fixes

* tag 'kbuild-fixes-v5.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
  kconfig: Clear "written" flag to avoid data loss
  kbuild: Check for unknown options with cc-option usage in Kconfig and clang
  lib/raid6: fix unnecessary rebuild of vpermxor*.c
  kbuild: modpost: do not parse unnecessary rules for vmlinux modpost
  kbuild: modpost: remove unnecessary dependency for __modpost
  kbuild: modpost: handle KBUILD_EXTRA_SYMBOLS only for external modules
  kbuild: modpost: include .*.cmd files only when targets exist
  kbuild: initialize CLANG_FLAGS correctly in the top Makefile
  kbuild: detect missing "WITH Linux-syscall-note" for uapi headers

05e4f88b

Merge tag 'safesetid-maintainers-correction-5.3-rc2' of git://github.com/micah-morton/linux · 8449c980

由 Linus Torvalds 提交于 8月 04, 2019

Pull SafeSetID maintainer update from Micah Morton:
 "Add entry in MAINTAINERS file for SafeSetID LSM"

* tag 'safesetid-maintainers-correction-5.3-rc2' of git://github.com/micah-morton/linux:
  Add entry in MAINTAINERS file for SafeSetID LSM

8449c980

04 8月, 2019 8 次提交

kconfig: Clear "written" flag to avoid data loss · 0c5b6c28

由 M. Vefa Bicakci 提交于 8月 03, 2019

Prior to this commit, starting nconfig, xconfig or gconfig, and saving
the .config file more than once caused data loss, where a .config file
that contained only comments would be written to disk starting from the
second save operation.

This bug manifests itself because the SYMBOL_WRITTEN flag is never
cleared after the first call to conf_write, and subsequent calls to
conf_write then skip all of the configuration symbols due to the
SYMBOL_WRITTEN flag being set.

This commit resolves this issue by clearing the SYMBOL_WRITTEN flag
from all symbols before conf_write returns.

Fixes: 8e2442a5 ("kconfig: fix missing choice values in auto.conf")
Cc: linux-stable <stable@vger.kernel.org> # 4.19+
Signed-off-by: NM. Vefa Bicakci <m.v.b@runbox.com>
Signed-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>

0c5b6c28

Merge tag 'xtensa-20190803' of git://github.com/jcmvbkbc/linux-xtensa · d8778f13

由 Linus Torvalds 提交于 8月 03, 2019

Pull Xtensa fix from Max Filippov:
 "Fix build for xtensa cores with coprocessors that was broken by
  entry/return abstraction patch"

* tag 'xtensa-20190803' of git://github.com/jcmvbkbc/linux-xtensa:
  xtensa: fix build for cores with coprocessors

d8778f13

Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux · cf6c8aef

由 Linus Torvalds 提交于 8月 03, 2019

Pull i2c fixes from Wolfram Sang:
 "A set of driver fixes for the I2C subsystem"

* 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: s3c2410: Mark expected switch fall-through
  i2c: at91: fix clk_offset for sama5d2
  i2c: at91: disable TXRDY interrupt after sending data
  i2c: iproc: Fix i2c master read more than 63 bytes
  eeprom: at24: make spd world-readable again

cf6c8aef

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 8b7fd679

由 Linus Torvalds 提交于 8月 03, 2019

Pull perf tooling fixes from Thomas Gleixner:
 "A set of updates for perf tools and documentation:

  perf header:
    - Prevent a division by zero
    - Deal with an uninitialized warning proper

  libbpf:
    - Fix the missiong __WORDSIZE definition for musl & al

  UAPI headers:
    - Synchronize kernel headers

  Documentation:
    - Fix the memory units for perf.data size"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  libbpf: fix missing __WORDSIZE definition
  perf tools: Fix perf.data documentation units for memory size
  perf header: Fix use of unitialized value warning
  perf header: Fix divide by zero error if f_header.attr_size==0
  tools headers UAPI: Sync if_link.h with the kernel
  tools headers UAPI: Sync sched.h with the kernel
  tools headers UAPI: Sync usbdevice_fs.h with the kernels to get new ioctl
  tools perf beauty: Fix usbdevfs_ioctl table generator to handle _IOC()
  tools headers UAPI: Update tools's copy of drm.h headers
  tools headers UAPI: Update tools's copy of mman.h headers
  tools headers UAPI: Update tools's copy of kvm.h headers
  tools include UAPI: Sync x86's syscalls_64.tbl and generic unistd.h to pick up clone3 and pidfd_open

8b7fd679

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 0432a0a0

由 Linus Torvalds 提交于 8月 03, 2019

Pull vdso timer fixes from Thomas Gleixner:
 "A series of commits to deal with the regression caused by the generic
  VDSO implementation.

  The usage of clock_gettime64() for 32bit compat fallback syscalls
  caused seccomp filters to kill innocent processes because they only
  allow clock_gettime().

  Handle the compat syscalls with clock_gettime() as before, which is
  not a functional problem for the VDSO as the legacy compat application
  interface is not y2038 safe anyway. It's just extra fallback code
  which needs to be implemented on every architecture.

  It's opt in for now so that it does not break the compile of already
  converted architectures in linux-next. Once these are fixed, the
  #ifdeffery goes away.

  So much for trying to be smart and reuse code..."

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  arm64: compat: vdso: Use legacy syscalls as fallback
  x86/vdso/32: Use 32bit syscall fallback
  lib/vdso/32: Provide legacy syscall fallbacks
  lib/vdso: Move fallback invocation to the callers
  lib/vdso/32: Remove inconsistent NULL pointer checks

0432a0a0

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · af42e745

由 Linus Torvalds 提交于 8月 03, 2019

Pull irq fixes from Thomas Gleixner:
 "A small bunch of fixes from the irqchip department:

   - Fix a couple of UAF on error paths (RZA1, GICv3 ITS)

   - Fix iMX GPCv2 trigger setting

   - Add missing of_node_put() on error path in MBIGEN

   - Add another bunch of /* fall-through */ to silence warnings"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/renesas-rza1: Fix an use-after-free in rza1_irqc_probe()
  irqchip/irq-imx-gpcv2: Forward irq type to parent
  irqchip/irq-mbigen: Add of_node_put() before return
  irqchip/gic-v3-its: Free unused vpt_page when alloc vpe table fail
  irqchip/gic-v3: Mark expected switch fall-through

af42e745

Merge tag 'xfs-5.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · e12b243d

由 Linus Torvalds 提交于 8月 03, 2019

Pull xfs fixes from Darrick Wong:

 - Avoid leaking kernel stack contents to userspace

 - Fix a potential null pointer dereference in the dabtree scrub code

* tag 'xfs-5.3-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: Fix possible null-pointer dereferences in xchk_da_btree_block_check_sibling()
  xfs: fix stack contents leakage in the v1 inumber ioctls

e12b243d

Merge branch 'akpm' (patches from Andrew) · b7aea68a

由 Linus Torvalds 提交于 8月 03, 2019

Merge misc fixes from Andrew Morton:
 "17 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  drivers/acpi/scan.c: document why we don't need the device_hotplug_lock
  memremap: move from kernel/ to mm/
  lib/test_meminit.c: use GFP_ATOMIC in RCU critical section
  asm-generic: fix -Wtype-limits compiler warnings
  cgroup: kselftest: relax fs_spec checks
  mm/memory_hotplug.c: remove unneeded return for void function
  mm/migrate.c: initialize pud_entry in migrate_vma()
  coredump: split pipe command whitespace before expanding template
  page flags: prioritize kasan bits over last-cpuid
  ubsan: build ubsan.c more conservatively
  kasan: remove clang version check for KASAN_STACK
  mm: compaction: avoid 100% CPU usage during compaction when a task is killed
  mm: migrate: fix reference check race between __find_get_block() and migration
  mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker
  ocfs2: remove set but not used variable 'last_hash'
  Revert "kmemleak: allow to coexist with fault injection"
  kernel/signal.c: fix a kernel-doc markup

b7aea68a

03 8月, 2019 17 次提交

Merge tag 'riscv/for-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 61672549

由 Linus Torvalds 提交于 8月 03, 2019

Pull RISC-V fixes from Paul Walmsley:
 "Three minor RISC-V-related changes for v5.3-rc3:

   - Add build ID to VDSO builds to avoid a double-free in perf when
     libelf isn't used

   - Align the RV64 defconfig to the output of "make savedefconfig" so
     subsequent defconfig patches don't get out of hand

   - Drop a superfluous DT property from the FU540 SoC DT data (since it
     must be already set in board data that includes it)"

* tag 'riscv/for-v5.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: defconfig: align RV64 defconfig to the output of "make savedefconfig"
  riscv: dts: fu540-c000: drop "timebase-frequency"
  riscv: Fix perf record without libelf support

61672549

drivers/acpi/scan.c: document why we don't need the device_hotplug_lock · 7291edca

由 David Hildenbrand 提交于 8月 02, 2019

Let's document why the lock is not needed in acpi_scan_init(), right now
this is not really obvious.

[akpm@linux-foundation.org: fix tpyo]
Link: http://lkml.kernel.org/r/20190731135306.31524-1-david@redhat.comSigned-off-by: NDavid Hildenbrand <david@redhat.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7291edca

memremap: move from kernel/ to mm/ · 14c5ceba

由 Christoph Hellwig 提交于 8月 02, 2019

memremap.c implements MM functionality for ZONE_DEVICE, so it really
should be in the mm/ directory, not the kernel/ one.

Link: http://lkml.kernel.org/r/20190722094143.18387-1-hch@lst.deSigned-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NAnshuman Khandual <anshuman.khandual@arm.com>
Acked-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

14c5ceba

lib/test_meminit.c: use GFP_ATOMIC in RCU critical section · 733d1d1a

由 Alexander Potapenko 提交于 8月 02, 2019

kmalloc() shouldn't sleep while in RCU critical section, therefore use
GFP_ATOMIC instead of GFP_KERNEL.

The bug was spotted by the 0day kernel testing robot.

Link: http://lkml.kernel.org/r/20190725121703.210874-1-glider@google.com
Fixes: 7e659650cbda ("lib: introduce test_meminit module")
Signed-off-by: NAlexander Potapenko <glider@google.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Reported-by: Nkernel test robot <lkp@intel.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

733d1d1a

asm-generic: fix -Wtype-limits compiler warnings · cbedfe11

由 Qian Cai 提交于 8月 02, 2019

Commit d66acc39 ("bitops: Optimise get_order()") introduced a
compilation warning because "rx_frag_size" is an "ushort" while
PAGE_SHIFT here is 16.

The commit changed the get_order() to be a multi-line macro where
compilers insist to check all statements in the macro even when
__builtin_constant_p(rx_frag_size) will return false as "rx_frag_size"
is a module parameter.

In file included from ./arch/powerpc/include/asm/page_64.h:107,
                 from ./arch/powerpc/include/asm/page.h:242,
                 from ./arch/powerpc/include/asm/mmu.h:132,
                 from ./arch/powerpc/include/asm/lppaca.h:47,
                 from ./arch/powerpc/include/asm/paca.h:17,
                 from ./arch/powerpc/include/asm/current.h:13,
                 from ./include/linux/thread_info.h:21,
                 from ./arch/powerpc/include/asm/processor.h:39,
                 from ./include/linux/prefetch.h:15,
                 from drivers/net/ethernet/emulex/benet/be_main.c:14:
drivers/net/ethernet/emulex/benet/be_main.c: In function 'be_rx_cqs_create':
./include/asm-generic/getorder.h:54:9: warning: comparison is always
true due to limited range of data type [-Wtype-limits]
   (((n) < (1UL << PAGE_SHIFT)) ? 0 :  \
         ^
drivers/net/ethernet/emulex/benet/be_main.c:3138:33: note: in expansion
of macro 'get_order'
  adapter->big_page_size = (1 << get_order(rx_frag_size)) * PAGE_SIZE;
                                 ^~~~~~~~~

Fix it by moving all of this multi-line macro into a proper function,
and killing __get_order() off.

[akpm@linux-foundation.org: remove __get_order() altogether]
[cai@lca.pw: v2]
  Link: http://lkml.kernel.org/r/1564000166-31428-1-git-send-email-cai@lca.pw
Link: http://lkml.kernel.org/r/1563914986-26502-1-git-send-email-cai@lca.pw
Fixes: d66acc39 ("bitops: Optimise get_order()")
Signed-off-by: NQian Cai <cai@lca.pw>
Reviewed-by: NNathan Chancellor <natechancellor@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Jakub Jelinek <jakub@redhat.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Bill Wendling <morbo@google.com>
Cc: James Y Knight <jyknight@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cbedfe11

cgroup: kselftest: relax fs_spec checks · b59b1baa

由 Chris Down 提交于 8月 02, 2019

On my laptop most memcg kselftests were being skipped because it claimed
cgroup v2 hierarchy wasn't mounted, but this isn't correct.  Instead, it
seems current systemd HEAD mounts it with the name "cgroup2" instead of
"cgroup":

    % grep cgroup /proc/mounts
    cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate 0 0

I can't think of a reason to need to check fs_spec explicitly
since it's arbitrary, so we can just rely on fs_vfstype.

After these changes, `make TARGETS=cgroup kselftest` actually runs the
cgroup v2 tests in more cases.

Link: http://lkml.kernel.org/r/20190723210737.GA487@chrisdown.nameSigned-off-by: NChris Down <chris@chrisdown.name>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b59b1baa

mm/memory_hotplug.c: remove unneeded return for void function · aa4996b3

由 Weitao Hou 提交于 8月 02, 2019

return is unneeded in void function

Link: http://lkml.kernel.org/r/20190723130814.21826-1-houweitaoo@gmail.comSigned-off-by: NWeitao Hou <houweitaoo@gmail.com>
Reviewed-by: NDavid Hildenbrand <david@redhat.com>
Reviewed-by: NOscar Salvador <osalvador@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aa4996b3

mm/migrate.c: initialize pud_entry in migrate_vma() · 7b358c6f

由 Ralph Campbell 提交于 8月 02, 2019

When CONFIG_MIGRATE_VMA_HELPER is enabled, migrate_vma() calls
migrate_vma_collect() which initializes a struct mm_walk but didn't
initialize mm_walk.pud_entry.  (Found by code inspection) Use a C
structure initialization to make sure it is set to NULL.

Link: http://lkml.kernel.org/r/20190719233225.12243-1-rcampbell@nvidia.com
Fixes: 8763cb45 ("mm/migrate: new memory migration helper for use with device memory")
Signed-off-by: NRalph Campbell <rcampbell@nvidia.com>
Reviewed-by: NJohn Hubbard <jhubbard@nvidia.com>
Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7b358c6f

coredump: split pipe command whitespace before expanding template · 315c6926

由 Paul Wise 提交于 8月 02, 2019

Save the offsets of the start of each argument to avoid having to update
pointers to each argument after every corename krealloc and to avoid
having to duplicate the memory for the dump command.

Executable names containing spaces were previously being expanded from
%e or %E and then split in the middle of the filename. This is
incorrect behaviour since an argument list can represent arguments with
spaces.

The splitting could lead to extra arguments being passed to the core
dump handler that it might have interpreted as options or ignored
completely.

Core dump handlers that are not aware of this Linux kernel issue will be
using %e or %E without considering that it may be split and so they will
be vulnerable to processes with spaces in their names breaking their
argument list. If their internals are otherwise well written, such as
if they are written in shell but quote arguments, they will work better
after this change than before. If they are not well written, then there
is a slight chance of breakage depending on the details of the code but
they will already be fairly broken by the split filenames.

Core dump handlers that are aware of this Linux kernel issue will be
placing %e or %E as the last item in their core_pattern and then
aggregating all of the remaining arguments into one, separated by
spaces. Alternatively they will be obtaining the filename via other
methods. Both of these will be compatible with the new arrangement.

A side effect from this change is that unknown template types (for
example %z) result in an empty argument to the dump handler instead of
the argument being dropped. This is a desired change as:

It is easier for dump handlers to process empty arguments than dropped
ones, especially if they are written in shell or don't pass each
template item with a preceding command-line option in order to
differentiate between individual template types. Most core_patterns in
the wild do not use options so they can confuse different template types
(especially numeric ones) if an earlier one gets dropped in old kernels.
If the kernel introduces a new template type and a core_pattern uses it,
the core dump handler might not expect that the argument can be dropped
in old kernels.

For example, this can result in security issues when %d is dropped in
old kernels. This happened with the corekeeper package in Debian and
resulted in the interface between corekeeper and Linux having to be
rewritten to use command-line options to differentiate between template
types.

The core_pattern for most core dump handlers is written by the handler
author who would generally not insert unknown template types so this
change should be compatible with all the core dump handlers that exist.

Link: http://lkml.kernel.org/r/20190528051142.24939-1-pabs3@bonedaddy.net
Fixes: 74aadce9 ("core_pattern: allow passing of arguments to user mode helper when core_pattern is a pipe")
Signed-off-by: NPaul Wise <pabs3@bonedaddy.net>
Reported-by: Jakub Wilk <jwilk@jwilk.net> [https://bugs.debian.org/924398]
Reported-by: Paul Wise <pabs3@bonedaddy.net> [https://lore.kernel.org/linux-fsdevel/c8b7ecb8508895bf4adb62a748e2ea2c71854597.camel@bonedaddy.net/]
Suggested-by: NJakub Wilk <jwilk@jwilk.net>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

315c6926

page flags: prioritize kasan bits over last-cpuid · ee38d94a

由 Arnd Bergmann 提交于 8月 02, 2019

ARM64 randdconfig builds regularly run into a build error, especially
when NUMA_BALANCING and SPARSEMEM are enabled but not SPARSEMEM_VMEMMAP:

  #error "KASAN: not enough bits in page flags for tag"

The last-cpuid bits are already contitional on the available space, so
the result of the calculation is a bit random on whether they were
already left out or not.

Adding the kasan tag bits before last-cpuid makes it much more likely to
end up with a successful build here, and should be reliable for
randconfig at least, as long as that does not randomize NR_CPUS or
NODES_SHIFT but uses the defaults.

In order for the modified check to not trigger in the x86 vdso32 code
where all constants are wrong (building with -m32), enclose all the
definitions with an #ifdef.

[arnd@arndb.de: build fix]
  Link: http://lkml.kernel.org/r/CAK8P3a3Mno1SWTcuAOT0Wa9VS15pdU6EfnkxLbDpyS55yO04+g@mail.gmail.com
Link: http://lkml.kernel.org/r/20190722115520.3743282-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190618095347.3850490-1-arnd@arndb.de/
Fixes: 2813b9c0 ("kasan, mm, arm64: tag non slab memory allocated via pagealloc")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NAndrey Konovalov <andreyknvl@google.com>
Reviewed-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ee38d94a

ubsan: build ubsan.c more conservatively · af700eae

由 Arnd Bergmann 提交于 8月 02, 2019

objtool points out several conditions that it does not like, depending
on the combination with other configuration options and compiler
variants:

stack protector:
  lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0xbf: call to __stack_chk_fail() with UACCESS enabled
  lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0xbe: call to __stack_chk_fail() with UACCESS enabled

stackleak plugin:
  lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x4a: call to stackleak_track_stack() with UACCESS enabled
  lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x4a: call to stackleak_track_stack() with UACCESS enabled

kasan:
  lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch()+0x25: call to memcpy() with UACCESS enabled
  lib/ubsan.o: warning: objtool: __ubsan_handle_type_mismatch_v1()+0x25: call to memcpy() with UACCESS enabled

The stackleak and kasan options just need to be disabled for this file
as we do for other files already.  For the stack protector, we already
attempt to disable it, but this fails on clang because the check is
mixed with the gcc specific -fno-conserve-stack option.  According to
Andrey Ryabinin, that option is not even needed, dropping it here fixes
the stackprotector issue.

Link: http://lkml.kernel.org/r/20190722125139.1335385-1-arnd@arndb.de
Link: https://lore.kernel.org/lkml/20190617123109.667090-1-arnd@arndb.de/t/
Link: https://lore.kernel.org/lkml/20190722091050.2188664-1-arnd@arndb.de/t/
Fixes: d08965a2 ("x86/uaccess, ubsan: Fix UBSAN vs. SMAP")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Reviewed-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

af700eae

kasan: remove clang version check for KASAN_STACK · ebb6d35a

由 Arnd Bergmann 提交于 8月 02, 2019

asan-stack mode still uses dangerously large kernel stacks of tens of
kilobytes in some drivers, and it does not seem that anyone is working
on the clang bug.

Turn it off for all clang versions to prevent users from accidentally
enabling it once they update to clang-9, and to help automated build
testing with clang-9.

Link: https://bugs.llvm.org/show_bug.cgi?id=38809
Link: http://lkml.kernel.org/r/20190719200347.2596375-1-arnd@arndb.de
Fixes: 6baec880 ("kasan: turn off asan-stack for clang-8 and earlier")
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Acked-by: NNick Desaulniers <ndesaulniers@google.com>
Reviewed-by: NMark Brown <broonie@kernel.org>
Reviewed-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ebb6d35a

mm: compaction: avoid 100% CPU usage during compaction when a task is killed · 670105a2

由 Mel Gorman 提交于 8月 02, 2019

"howaboutsynergy" reported via kernel buzilla number 204165 that
compact_zone_order was consuming 100% CPU during a stress test for
prolonged periods of time. Specifically the following command, which
should exit in 10 seconds, was taking an excessive time to finish while
the CPU was pegged at 100%.

stress -m 220 --vm-bytes 1000000000 --timeout 10

Tracing indicated a pattern as follows

stress-3923 [007] 519.106208: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106212: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106216: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106219: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106223: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106227: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106231: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106235: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106238: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0
stress-3923 [007] 519.106242: mm_compaction_isolate_migratepages: range=(0x70bb80 ~ 0x70bb80) nr_scanned=0 nr_taken=0

Note that compaction is entered in rapid succession while scanning and
isolating nothing. The problem is that when a task that is compacting
receives a fatal signal, it retries indefinitely instead of exiting
while making no progress as a fatal signal is pending.

It's not easy to trigger this condition although enabling zswap helps on
the basis that the timing is altered. A very small window has to be hit
for the problem to occur (signal delivered while compacting and
isolating a PFN for migration that is not aligned to SWAP_CLUSTER_MAX).

This was reproduced locally -- 16G single socket system, 8G swap, 30%
zswap configured, vm-bytes 22000000000 using Colin Kings stress-ng
implementation from github running in a loop until the problem hits).
Tracing recorded the problem occurring almost 200K times in a short
window. With this patch, the problem hit 4 times but the task existed
normally instead of consuming CPU.

This problem has existed for some time but it was made worse by commit
cf66f070 ("mm, compaction: do not consider a need to reschedule as
contention"). Before that commit, if the same condition was hit then
locks would be quickly contended and compaction would exit that way.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204165
Link: http://lkml.kernel.org/r/20190718085708.GE24383@techsingularity.net
Fixes: cf66f070 ("mm, compaction: do not consider a need to reschedule as contention")
Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
Reviewed-by: NVlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org> [5.1+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

670105a2

mm: migrate: fix reference check race between __find_get_block() and migration · ebdf4de5

由 Jan Kara 提交于 8月 02, 2019

buffer_migrate_page_norefs() can race with bh users in the following
way:

CPU1                                    CPU2
buffer_migrate_page_norefs()
  buffer_migrate_lock_buffers()
  checks bh refs
  spin_unlock(&mapping->private_lock)
                                        __find_get_block()
                                          spin_lock(&mapping->private_lock)
                                          grab bh ref
                                          spin_unlock(&mapping->private_lock)
  move page                               do bh work

This can result in various issues like lost updates to buffers (i.e.
metadata corruption) or use after free issues for the old page.

This patch closes the race by holding mapping->private_lock while the
mapping is being moved to a new page.  Ordinarily, a reference can be
taken outside of the private_lock using the per-cpu BH LRU but the
references are checked and the LRU invalidated if necessary.  The
private_lock is held once the references are known so the buffer lookup
slow path will spin on the private_lock.  Between the page lock and
private_lock, it should be impossible for other references to be
acquired and updates to happen during the migration.

A user had reported data corruption issues on a distribution kernel with
a similar page migration implementation as mainline.  The data
corruption could not be reproduced with this patch applied.  A small
number of migration-intensive tests were run and no performance problems
were noted.

[mgorman@techsingularity.net: Changelog, removed tracing]
Link: http://lkml.kernel.org/r/20190718090238.GF24383@techsingularity.net
Fixes: 89cb0888 "mm: migrate: provide buffer_migrate_page_norefs()"
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NMel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>	[5.0+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ebdf4de5

mm: vmscan: check if mem cgroup is disabled or not before calling memcg slab shrinker · fa1e512f

由 Yang Shi 提交于 8月 02, 2019

Shakeel Butt reported premature oom on kernel with
"cgroup_disable=memory" since mem_cgroup_is_root() returns false even
though memcg is actually NULL.  The drop_caches is also broken.

It is because commit aeed1d32 ("mm/vmscan.c: generalize
shrink_slab() calls in shrink_node()") removed the !memcg check before
!mem_cgroup_is_root().  And, surprisingly root memcg is allocated even
though memory cgroup is disabled by kernel boot parameter.

Add mem_cgroup_disabled() check to make reclaimer work as expected.

Link: http://lkml.kernel.org/r/1563385526-20805-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: aeed1d32 ("mm/vmscan.c: generalize shrink_slab() calls in shrink_node()")
Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com>
Reported-by: NShakeel Butt <shakeelb@google.com>
Reviewed-by: NShakeel Butt <shakeelb@google.com>
Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Jan Hadrava <had@kam.mff.cuni.cz>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Roman Gushchin <guro@fb.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>	[4.19+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa1e512f

ocfs2: remove set but not used variable 'last_hash' · 7bc36e3c

由 YueHaibing 提交于 8月 02, 2019

Fixes gcc '-Wunused-but-set-variable' warning:

  fs/ocfs2/xattr.c: In function ocfs2_xattr_bucket_find:
  fs/ocfs2/xattr.c:3828:6: warning: variable last_hash set but not used [-Wunused-but-set-variable]

It's never used and can be removed.

Link: http://lkml.kernel.org/r/20190716132110.34836-1-yuehaibing@huawei.comSigned-off-by: NYueHaibing <yuehaibing@huawei.com>
Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
Cc: Mark Fasheh <mark@fasheh.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Changwei Ge <gechangwei@live.cn>
Cc: Gang He <ghe@suse.com>
Cc: Jun Piao <piaojun@huawei.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7bc36e3c

Revert "kmemleak: allow to coexist with fault injection" · df9576de

由 Yang Shi 提交于 8月 02, 2019

When running ltp's oom test with kmemleak enabled, the below warning was
triggerred since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
passed in:

  WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
  Modules linked in: loop dax_pmem dax_pmem_core ip_tables x_tables xfs virtio_net net_failover virtio_blk failover ata_generic virtio_pci virtio_ring virtio libata
  CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-20190710+ #7
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
  RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
  ...
   kmemleak_alloc+0x4e/0xb0
   kmem_cache_alloc+0x2a7/0x3e0
   mempool_alloc_slab+0x2d/0x40
   mempool_alloc+0x118/0x2b0
   bio_alloc_bioset+0x19d/0x350
   get_swap_bio+0x80/0x230
   __swap_writepage+0x5ff/0xb20

The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, however kmemleak
has __GFP_NOFAIL set all the time due to d9570ee3 ("kmemleak:
allow to coexist with fault injection").  But, it doesn't make any sense
to have __GFP_NOFAIL and ~__GFP_DIRECT_RECLAIM specified at the same
time.

According to the discussion on the mailing list, the commit should be
reverted for short term solution.  Catalin Marinas would follow up with
a better solution for longer term.

The failure rate of kmemleak metadata allocation may increase in some
circumstances, but this should be expected side effect.

Link: http://lkml.kernel.org/r/1563299431-111710-1-git-send-email-yang.shi@linux.alibaba.com
Fixes: d9570ee3 ("kmemleak: allow to coexist with fault injection")
Signed-off-by: NYang Shi <yang.shi@linux.alibaba.com>
Suggested-by: NCatalin Marinas <catalin.marinas@arm.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Qian Cai <cai@lca.pw>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

df9576de

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功