提交 · 784c9a29e99eb40b842c29ecf1cc3a79e00fb629 · openanolis / cloud-kernel

08 8月, 2018 3 次提交

dm kcopyd: avoid softlockup in run_complete_job · 784c9a29

由 John Pittman 提交于 8月 06, 2018

It was reported that softlockups occur when using dm-snapshot ontop of
slow (rbd) storage.  E.g.:

[ 4047.990647] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [kworker/10:23:26177]
...
[ 4048.034151] Workqueue: kcopyd do_work [dm_mod]
[ 4048.034156] RIP: 0010:copy_callback+0x41/0x160 [dm_snapshot]
...
[ 4048.034190] Call Trace:
[ 4048.034196]  ? __chunk_is_tracked+0x70/0x70 [dm_snapshot]
[ 4048.034200]  run_complete_job+0x5f/0xb0 [dm_mod]
[ 4048.034205]  process_jobs+0x91/0x220 [dm_mod]
[ 4048.034210]  ? kcopyd_put_pages+0x40/0x40 [dm_mod]
[ 4048.034214]  do_work+0x46/0xa0 [dm_mod]
[ 4048.034219]  process_one_work+0x171/0x370
[ 4048.034221]  worker_thread+0x1fc/0x3f0
[ 4048.034224]  kthread+0xf8/0x130
[ 4048.034226]  ? max_active_store+0x80/0x80
[ 4048.034227]  ? kthread_bind+0x10/0x10
[ 4048.034231]  ret_from_fork+0x35/0x40
[ 4048.034233] Kernel panic - not syncing: softlockup: hung tasks

Fix this by calling cond_resched() after run_complete_job()'s callout to
the dm_kcopyd_notify_fn (which is dm-snap.c:copy_callback in the above
trace).
Signed-off-by: NJohn Pittman <jpittman@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

784c9a29

dm cache metadata: save in-core policy_hint_size to on-disk superblock · fd2fa954

由 Mike Snitzer 提交于 8月 02, 2018

policy_hint_size starts as 0 during __write_initial_superblock().  It
isn't until the policy is loaded that policy_hint_size is set in-core
(cmd->policy_hint_size).  But it never got recorded in the on-disk
superblock because __commit_transaction() didn't deal with transfering
the in-core cmd->policy_hint_size to the on-disk superblock.

The in-core cmd->policy_hint_size gets initialized by metadata_open()'s
__begin_transaction_flags() which re-reads all superblock fields.
Because the superblock's policy_hint_size was never properly stored, when
the cache was created, hints_array_available() would always return false
when re-activating a previously created cache.  This means
__load_mappings() always considered the hints invalid and never made use
of the hints (these hints served to optimize).

Another detremental side-effect of this oversight is the cache_check
utility would fail with: "invalid hint width: 0"

Cc: stable@vger.kernel.org
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

fd2fa954

dm thin: stop no_space_timeout worker when switching to write-mode · 75294442

由 Hou Tao 提交于 8月 02, 2018

Now both check_for_space() and do_no_space_timeout() will read & write
pool->pf.error_if_no_space.  If these functions run concurrently, as
shown in the following case, the default setting of "queue_if_no_space"
can get lost.

precondition:
    * error_if_no_space = false (aka "queue_if_no_space")
    * pool is in Out-of-Data-Space (OODS) mode
    * no_space_timeout worker has been queued

CPU 0:                          CPU 1:
// delete a thin device
process_delete_mesg()
// check_for_space() invoked by commit()
set_pool_mode(pool, PM_WRITE)
    pool->pf.error_if_no_space = \
     pt->requested_pf.error_if_no_space

				// timeout, pool is still in OODS mode
				do_no_space_timeout
				    // "queue_if_no_space" config is lost
				    pool->pf.error_if_no_space = true
    pool->pf.mode = new_mode

Fix it by stopping no_space_timeout worker when switching to write mode.

Fixes: bcc696fa ("dm thin: stay in out-of-data-space mode once no_space_timeout expires")
Cc: stable@vger.kernel.org
Signed-off-by: NHou Tao <houtao1@huawei.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

75294442

01 8月, 2018 1 次提交

dm kcopyd: return void from dm_kcopyd_copy() · 7209049d

由 Mike Snitzer 提交于 7月 31, 2018

dm_kcopyd_copy() only ever returns 0 so there is no need for callers to
account for possible failure.  Same goes for dm_kcopyd_zero().
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

7209049d

30 7月, 2018 1 次提交

dm thin: include metadata_low_watermark threshold in pool status · 63c8ecb6

由 Andy Grover 提交于 7月 27, 2018

The metadata low watermark threshold is set by the kernel.  But the
kernel depends on userspace to extend the thinpool metadata device when
the threshold is crossed.

Since the metadata low watermark threshold is not visible to userspace,
upon receiving an event, userspace cannot tell that the kernel wants the
metadata device extended, instead of some other eventing condition.
Making it visible (but not settable) enables userspace to affirmatively
know the kernel is asking for a metadata device extension, by comparing
metadata_low_watermark against nr_free_blocks_metadata, also reported in
status.

Current solutions like dmeventd have their own thresholds for extending
the data and metadata devices, and both devices are checked against
their thresholds on each event.  This lessens the value of the kernel-set
threshold, since userspace will either extend the metadata device sooner,
when receiving another event; or will receive the metadata lowater event
and do nothing, if dmeventd's threshold is less than the kernel's.
(This second case is dangerous. The metadata lowater event will not be
re-sent, so no further event will be generated before the metadata
device is out if space, unless some other event causes userspace to
recheck its thresholds.)
Signed-off-by: NAndy Grover <agrover@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

63c8ecb6

28 7月, 2018 16 次提交

dm writecache: report start_sector in status line · 9ff07e7d

由 Mikulas Patocka 提交于 7月 25, 2018

Fixes: d284f824 ("dm writecache: support optional offset for start of device")
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

9ff07e7d

dm crypt: convert essiv from ahash to shash · c07c88f5

由 Kees Cook 提交于 7月 15, 2018

In preparing to remove all stack VLA usage from the kernel[1], remove
the discouraged use of AHASH_REQUEST_ON_STACK in favor of the smaller
SHASH_DESC_ON_STACK by converting from ahash-wrapped-shash to direct
shash.  The stack allocation will be made a fixed size in a later patch
to the crypto subsystem.

[1] https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qPXydAacU1RqZWA@mail.gmail.comSigned-off-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

c07c88f5

dm crypt: use wake_up_process() instead of a wait queue · c7329eff

由 Mikulas Patocka 提交于 7月 11, 2018

This is a small simplification of dm-crypt - use wake_up_process()
instead of a wait queue in a case where only one process may be
waiting.  dm-writecache uses a similar pattern.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

c7329eff

dm integrity: recalculate checksums on creation · a3fcf725

由 Mikulas Patocka 提交于 7月 03, 2018

When using external metadata device and internal hash, recalculate the
checksums when the device is created - so that dm-integrity doesn't
have to overwrite the device.  The superblock stores the last position
when the recalculation ended, so that it is properly restarted.

Integrity tags that haven't been recalculated yet are ignored.

Also bump the target version.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

a3fcf725

dm integrity: flush journal on suspend when using separate metadata device · 747829a8

由 Mikulas Patocka 提交于 7月 03, 2018

Flush the journal on suspend when using separate data and metadata devices,
so that the metadata device can be discarded and the table can be reloaded
with a linear target pointing to the data device.

NOTE: the journal is deliberately not flushed when using the same device
for metadata and data, so that the journal replay code is tested.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

747829a8

dm integrity: use version 2 for separate metadata · 1f9fc0b8

由 Mikulas Patocka 提交于 7月 03, 2018

Use version "2" in the superblock when data and metadata devices are
separate, so that the device is not accidentally read by older kernel
version.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

1f9fc0b8

dm integrity: allow separate metadata device · 356d9d52

由 Mikulas Patocka 提交于 7月 03, 2018

Add the ability to store DM integrity metadata on a separate device.
This feature is activated with the option "meta_device:/dev/device".
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

356d9d52

dm integrity: add ic->start in get_data_sector() · 71e9ddbc

由 Mikulas Patocka 提交于 7月 03, 2018

A small refactoring.  Add the variable ic->start to the result
returned by get_data_sector() and not in the callers.  This is a
prerequisite for the commit that adds the ability to use an external
metadata device.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

71e9ddbc

M
dm integrity: report provided data sectors in the status · f84fd2c9
由 Mikulas Patocka 提交于 7月 03, 2018
```
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>
```
f84fd2c9

dm integrity: implement fair range locks · 724376a0

由 Mikulas Patocka 提交于 7月 03, 2018

dm-integrity locks a range of sectors to prevent concurrent I/O or journal
writeback. These locks were not fair - so that many small overlapping I/Os
could starve a large I/O indefinitely.

Fix this by making the range locks fair. The ranges that are waiting are
added to the list "wait_list". If a new I/O overlaps some of the waiting
I/Os, it is not dispatched, but it is also added to that wait list.
Entries on the wait list are processed in first-in-first-out order, so
that an I/O can't starve indefinitely.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

724376a0

dm integrity: decouple common code in dm_integrity_map_continue() · 518748b1

由 Mikulas Patocka 提交于 7月 03, 2018

Decouple how dm_integrity_map_continue() responds to being out of free
sectors and when add_new_range() fails.

This has no functional change, but helps prepare for the next commit.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

518748b1

dm integrity: change 'suspending' variable from bool to int · c21b1639

由 Mikulas Patocka 提交于 7月 03, 2018

Early alpha processors can't write a byte or short atomically - they
read 8 bytes, modify the byte or two bytes in registers and write back
8 bytes.

The modification of the variable "suspending" may race with
modification of the variable "failed".  Fix this by changing
"suspending" to an int.

Cc: stable@vger.kernel.org
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

c21b1639

dm delay: add flush as a third class of IO · cda6b5ab

由 Mikulas Patocka 提交于 4月 17, 2018

Add a new class for dm-delay that delays flush requests.  Previously,
flushes were delayed as writes, but it caused problems if the user
needed to create a device with one or a few slow sectors for the purpose
of testing - all flushes would be forwarded to this device and delayed,
and that skews the test results.  Fix this by allowing to select 0 delay
for flushes.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

cda6b5ab

dm delay: refactor repetitive code · 3876ac76

由 Mikulas Patocka 提交于 4月 17, 2018

dm-delay has a lot of code that is repeated for delaying read and write
bios. Repetitive code is generally bad; refactor out the repetitive
code in preperation for adding another delay class for flush bios.
Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

3876ac76

dm cache: only allow a single io_mode cache feature to be requested · af9313c3

由 John Pittman 提交于 6月 21, 2018

More than one io_mode feature can be requested when creating a dm cache
device (as is: last one wins). The io_mode selections are incompatible
with one another, we should force them to be selected exclusively. Add
a counter to check for more than one io_mode selection.

Fixes: 629d0a8a ("dm cache metadata: add "metadata2" feature")
Signed-off-by: NJohn Pittman <jpittman@redhat.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

af9313c3

dm thin: update stale "Status" Documentation · 6c7413c0

由 Mike Snitzer 提交于 7月 23, 2018

Documentation/device-mapper-/thin-provisioning.txt's "Status" section no
longer reflected the current fitness level of DM thin-provisioning.
That is, DM thinp is no longer "EXPERIMENTAL".  It has since seen
considerable improvement, has been fairly widely deployed and has
performed in a robust manner.

Update Documentation to dispel concern raised by potential DM thinp
users.
Reported-by: NDrew Hastings <dhastings@crucialwebhost.com>
Signed-off-by: NMike Snitzer <snitzer@redhat.com>

6c7413c0

23 7月, 2018 4 次提交

L

Linux 4.18-rc6 · d72e90f3
由 Linus Torvalds 提交于 7月 22, 2018

d72e90f3

Merge tag 'nvme-for-4.18' of git://git.infradead.org/nvme · 74413084

由 Linus Torvalds 提交于 7月 22, 2018

Pull NVMe fixes from Christoph Hellwig:

 - fix a regression in 4.18 that causes a memory leak on probe failure
   (Keith Bush)

 - fix a deadlock in the passthrough ioctl code (Scott Bauer)

 - don't enable AENs if not supported (Weiping Zhang)

 - fix an old regression in metadata handling in the passthrough ioctl
   code (Roland Dreier)

* tag 'nvme-for-4.18' of git://git.infradead.org/nvme:
  nvme: fix handling of metadata_len for NVME_IOCTL_IO_CMD
  nvme: don't enable AEN if not supported
  nvme: ensure forward progress during Admin passthru
  nvme-pci: fix memory leak on probe failure

74413084

Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 165ea0d1

由 Linus Torvalds 提交于 7月 22, 2018

Pull vfs fixes from Al Viro:
 "Fix several places that screw up cleanups after failures halfway
  through opening a file (one open-coding filp_clone_open() and getting
  it wrong, two misusing alloc_file()). That part is -stable fodder from
  the 'work.open' branch.

  And Christoph's regression fix for uapi breakage in aio series;
  include/uapi/linux/aio_abi.h shouldn't be pulling in the kernel
  definition of sigset_t, the reason for doing so in the first place had
  been bogus - there's no need to expose struct __aio_sigset in
  aio_abi.h at all"

* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  aio: don't expose __aio_sigset in uapi
  ocxlflash_getfile(): fix double-iput() on alloc_file() failures
  cxl_getfile(): fix double-iput() on alloc_file() failures
  drm_mode_create_lease_ioctl(): fix open-coded filp_clone_open()

165ea0d1

alpha: fix osf_wait4() breakage · f88a333b

由 Al Viro 提交于 7月 22, 2018

kernel_wait4() expects a userland address for status - it's only
rusage that goes as a kernel one (and needs a copyout afterwards)

[ Also, fix the prototype of kernel_wait4() to have that __user
  annotation   - Linus ]

Fixes: 92ebce5a ("osf_wait4: switch to kernel_wait4()")
Cc: stable@kernel.org # v4.13+
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f88a333b

22 7月, 2018 15 次提交

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 45ae4df9

由 Linus Torvalds 提交于 7月 21, 2018

Pull ARM SoC fixes from Olof Johansson:

 - Fix interrupt type on ethernet switch for i.MX-based RDU2

 - GPC on i.MX exposed too large a register window which resulted in
   userspace being able to crash the machine.

 - Fixup of bad merge resolution moving GPIO DT nodes under pinctrl on
   droid4.

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  ARM: dts: imx6: RDU2: fix irq type for mv88e6xxx switch
  soc: imx: gpc: restrict register range for regmap access
  ARM: dts: omap4-droid4: fix dts w.r.t. pwm

45ae4df9

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ef81e63e

由 Linus Torvalds 提交于 7月 21, 2018

Pull x86 fix from Ingo Molnar:
 "A single fix for a MCE-polling regression, which prevented the
  disabling of polling"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/MCE: Remove min interval polling limitation

ef81e63e

Merge branch 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 43227e09

由 Linus Torvalds 提交于 7月 21, 2018

Pull x86 pti fixes from Ingo Molnar:
 "An APM fix, and a BTS hardware-tracing fix related to PTI changes"

* 'x86-pti-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/apm: Don't access __preempt_count with zeroed fs
  x86/events/intel/ds: Fix bts_interrupt_threshold alignment

43227e09

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 48b1db7c

由 Linus Torvalds 提交于 7月 21, 2018

Pull scheduler fixes from Ingo Molnar:
 "Two fixes: a stop-machine preemption fix and a SCHED_DEADLINE fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/deadline: Fix switched_from_dl() warning
  stop_machine: Disable preemption when waking two stopper threads

48b1db7c

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ea75a2c7

由 Linus Torvalds 提交于 7月 21, 2018

Pull core kernel fixes from Ingo Molnar:
 "This is mostly the copy_to_user_mcsafe() related fixes from Dan
  Williams, and an ORC fix for Clang"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/asm/memcpy_mcsafe: Fix copy_to_user_mcsafe() exception handling
  lib/iov_iter: Fix pipe handling in _copy_to_iter_mcsafe()
  lib/iov_iter: Document _copy_to_iter_flushcache()
  lib/iov_iter: Document _copy_to_iter_mcsafe()
  objtool: Use '.strtab' if '.shstrtab' doesn't exist, to support ORC tables on Clang

ea75a2c7

Merge tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · ffb48e79

由 Linus Torvalds 提交于 7月 21, 2018

Pull powerpc fixes from Michael Ellerman:
 "Two regression fixes, one for xmon disassembly formatting and the
  other to fix the E500 build.

  Two commits to fix a potential security issue in the VFIO code under
  obscure circumstances.

  And finally a fix to the Power9 idle code to restore SPRG3, which is
  user visible and used for sched_getcpu().

  Thanks to: Alexey Kardashevskiy, David Gibson. Gautham R. Shenoy,
  James Clarke"

* tag 'powerpc-4.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/powernv: Fix save/restore of SPRG3 on entry/exit from stop (idle)
  powerpc/Makefile: Assemble with -me500 when building for E500
  KVM: PPC: Check if IOMMU page is contained in the pinned physical page
  vfio/spapr: Use IOMMU pageshift rather than pagesize
  powerpc/xmon: Fix disassembly since printf changes

ffb48e79

Merge tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 55b636b4

由 Linus Torvalds 提交于 7月 21, 2018

Pull btrfs fix from David Sterba:
 "A fix of a corruption regarding fsync and clone, under some very
  specific conditions explained in the patch.

  The fix is marked for stable 3.16+ so I'd like to get it merged now
  given the impact"

* tag 'for-4.18-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  Btrfs: fix file data corruption after cloning a range and fsync

55b636b4

mm: make vm_area_alloc() initialize core fields · 490fc053

由 Linus Torvalds 提交于 7月 21, 2018

Like vm_area_dup(), it initializes the anon_vma_chain head, and the
basic mm pointer.

The rest of the fields end up being different for different users,
although the plan is to also initialize the 'vm_ops' field to a dummy
entry.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

490fc053

mm: make vm_area_dup() actually copy the old vma data · 95faf699

由 Linus Torvalds 提交于 7月 21, 2018

.. and re-initialize th eanon_vma_chain head.

This removes some boiler-plate from the users, and also makes it clear
why it didn't need use the 'zalloc()' version.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

95faf699

mm: use helper functions for allocating and freeing vm_area structs · 3928d4f5

由 Linus Torvalds 提交于 7月 21, 2018

The vm_area_struct is one of the most fundamental memory management
objects, but the management of it is entirely open-coded evertwhere,
ranging from allocation and freeing (using kmem_cache_[z]alloc and
kmem_cache_free) to initializing all the fields.

We want to unify this in order to end up having some unified
initialization of the vmas, and the first step to this is to at least
have basic allocation functions.

Right now those functions are literally just wrappers around the
kmem_cache_*() calls.  This is a purely mechanical conversion:

    # new vma:
    kmem_cache_zalloc(vm_area_cachep, GFP_KERNEL) -> vm_area_alloc()

    # copy old vma
    kmem_cache_alloc(vm_area_cachep, GFP_KERNEL) -> vm_area_dup(old)

    # free vma
    kmem_cache_free(vm_area_cachep, vma) -> vm_area_free(vma)

to the point where the old vma passed in to the vm_area_dup() function
isn't even used yet (because I've left all the old manual initialization
alone).
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3928d4f5

Merge branch 'akpm' (patches from Andrew) · 191a3afa

由 Linus Torvalds 提交于 7月 21, 2018

Merge fixes from Andrew Morton:
 "5 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: memcg: fix use after free in mem_cgroup_iter()
  mm/huge_memory.c: fix data loss when splitting a file pmd
  fat: fix memory allocation failure handling of match_strdup()
  MAINTAINERS: Peter has moved
  mm/memblock: add missing include <linux/bootmem.h>

191a3afa

mm: memcg: fix use after free in mem_cgroup_iter() · 9f15bde6

由 Jing Xia 提交于 7月 20, 2018

It was reported that a kernel crash happened in mem_cgroup_iter(), which
can be triggered if the legacy cgroup-v1 non-hierarchical mode is used.

Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b8f
......
Call trace:
  mem_cgroup_iter+0x2e0/0x6d4
  shrink_zone+0x8c/0x324
  balance_pgdat+0x450/0x640
  kswapd+0x130/0x4b8
  kthread+0xe8/0xfc
  ret_from_fork+0x10/0x20

  mem_cgroup_iter():
      ......
      if (css_tryget(css))    <-- crash here
	    break;
      ......

The crashing reason is that mem_cgroup_iter() uses the memcg object whose
pointer is stored in iter->position, which has been freed before and
filled with POISON_FREE(0x6b).

And the root cause of the use-after-free issue is that
invalidate_reclaim_iterators() fails to reset the value of iter->position
to NULL when the css of the memcg is released in non- hierarchical mode.

Link: http://lkml.kernel.org/r/1531994807-25639-1-git-send-email-jing.xia@unisoc.com
Fixes: 6df38689 ("mm: memcontrol: fix possible memcg leak due to interrupted reclaim")
Signed-off-by: NJing Xia <jing.xia.mail@gmail.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: <chunyan.zhang@unisoc.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9f15bde6

mm/huge_memory.c: fix data loss when splitting a file pmd · e1f1b157

由 Hugh Dickins 提交于 7月 20, 2018

__split_huge_pmd_locked() must check if the cleared huge pmd was dirty,
and propagate that to PageDirty: otherwise, data may be lost when a huge
tmpfs page is modified then split then reclaimed.

How has this taken so long to be noticed?  Because there was no problem
when the huge page is written by a write system call (shmem_write_end()
calls set_page_dirty()), nor when the page is allocated for a write fault
(fault_dirty_shared_page() calls set_page_dirty()); but when allocated for
a read fault (which MAP_POPULATE simulates), no set_page_dirty().

Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1807111741430.1106@eggly.anvils
Fixes: d21b9e57 ("thp: handle file pages in split_huge_pmd()")
Signed-off-by: NHugh Dickins <hughd@google.com>
Reported-by: NAshwin Chaugule <ashwinch@google.com>
Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
Reviewed-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: <stable@vger.kernel.org>	[4.8+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e1f1b157

fat: fix memory allocation failure handling of match_strdup() · 35033ab9

由 OGAWA Hirofumi 提交于 7月 20, 2018

In parse_options(), if match_strdup() failed, parse_options() leaves
opts->iocharset in unexpected state (i.e. still pointing the freed
string). And this can be the cause of double free.

To fix, this initialize opts->iocharset always when freeing.

Link: http://lkml.kernel.org/r/8736wp9dzc.fsf@mail.parknet.co.jpSigned-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Reported-by: syzbot+90b8e10515ae88228a92@syzkaller.appspotmail.com
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

35033ab9

MAINTAINERS: Peter has moved · 5a696494

由 Peter Senna Tschudin 提交于 7月 20, 2018

Update my E-mail address in the MAINTAINERS file.

Link: http://lkml.kernel.org/r/20180710144702.1308-1-peter.senna@gmail.comSigned-off-by: NPeter Senna Tschudin <peter.senna@gmail.com>
Reviewed-by: NSebastian Reichel <sebastian.reichel@collabora.co.uk>
Acked-by: NMartyn Welch <martyn.welch@collabora.co.uk>
Cc: David S. Miller <davem@davemloft.net>
Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Martin Donnelly <martin.donnelly@ge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5a696494

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功