提交 · 2fe59f507a65dbd734b990a11ebc7488f6f87a24 · openanolis / cloud-kernel

24 8月, 2017 1 次提交

timers: Fix excessive granularity of new timers after a nohz idle · 2fe59f50

由 Nicholas Piggin 提交于 8月 22, 2017

When a timer base is idle, it is forwarded when a new timer is added
to ensure that granularity does not become excessive. When not idle,
the timer tick is expected to increment the base.

However there are several problems:

- If an existing timer is modified, the base is forwarded only after
  the index is calculated.

- The base is not forwarded by add_timer_on.

- There is a window after a timer is restarted from a nohz idle, after
  it is marked not-idle and before the timer tick on this CPU, where a
  timer may be added but the ancient base does not get forwarded.

These result in excessive granularity (a 1 jiffy timeout can blow out
to 100s of jiffies), which cause the rcu lockup detector to trigger,
among other things.

Fix this by keeping track of whether the timer base has been idle
since it was last run or forwarded, and if so then forward it before
adding a new timer.

There is still a case where mod_timer optimises the case of a pending
timer mod with the same expiry time, where the timer can see excessive
granularity relative to the new, shorter interval. A comment is added,
but it's not changed because it is an important fastpath for
networking.

This has been tested and found to fix the RCU softlockup messages.

Testing was also done with tracing to measure requested versus
achieved wakeup latencies for all non-deferrable timers in an idle
system (with no lockup watchdogs running). Wakeup latency relative to
absolute latency is calculated (note this suffers from round-up skew
at low absolute times) and analysed:

             max     avg      std
upstream   506.0    1.20     4.68
patched      2.0    1.08     0.15

The bug was noticed due to the lockup detector Kconfig changes
dropping it out of people's .configs and resulting in larger base
clk skew When the lockup detectors are enabled, no CPU can go idle for
longer than 4 seconds, which limits the granularity errors.
Sub-optimal timer behaviour is observable on a smaller scale in that
case:

	     max     avg      std
upstream     9.0    1.05     0.19
patched      2.0    1.04     0.11

Fixes: Fixes: a683f390 ("timers: Forward the wheel clock whenever possible")
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: NDavid Miller <davem@davemloft.net>
Cc: dzickus@redhat.com
Cc: sfr@canb.auug.org.au
Cc: mpe@ellerman.id.au
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: linuxarm@huawei.com
Cc: abdhalee@linux.vnet.ibm.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: akpm@linux-foundation.org
Cc: paulmck@linux.vnet.ibm.com
Cc: torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.com

2fe59f50

21 8月, 2017 6 次提交

L

Linux 4.13-rc6 · 14ccee78
由 Linus Torvalds 提交于 8月 20, 2017

14ccee78

Sanitize 'move_pages()' permission checks · 197e7e52

由 Linus Torvalds 提交于 8月 20, 2017

The 'move_paghes()' system call was introduced long long ago with the
same permission checks as for sending a signal (except using
CAP_SYS_NICE instead of CAP_SYS_KILL for the overriding capability).

That turns out to not be a great choice - while the system call really
only moves physical page allocations around (and you need other
capabilities to do a lot of it), you can check the return value to map
out some the virtual address choices and defeat ASLR of a binary that
still shares your uid.

So change the access checks to the more common 'ptrace_may_access()'
model instead.

This tightens the access checks for the uid, and also effectively
changes the CAP_SYS_NICE check to CAP_SYS_PTRACE, but it's unlikely that
anybody really _uses_ this legacy system call any more (we hav ebetter
NUMA placement models these days), so I expect nobody to notice.

Famous last words.
Reported-by: NOtto Ebeling <otto.ebeling@iki.fi>
Acked-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: stable@kernel.org
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

197e7e52

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7f680d7e

由 Linus Torvalds 提交于 8月 20, 2017

Pull x86 fixes from Thomas Gleixner:
 "Another pile of small fixes and updates for x86:

   - Plug a hole in the SMAP implementation which misses to clear AC on
     NMI entry

   - Fix the norandmaps/ADDR_NO_RANDOMIZE logic so the command line
     parameter works correctly again

   - Use the proper accessor in the startup64 code for next_early_pgt to
     prevent accessing of invalid addresses and faulting in the early
     boot code.

   - Prevent CPU hotplug lock recursion in the MTRR code

   - Unbreak CPU0 hotplugging

   - Rename overly long CPUID bits which got introduced in this cycle

   - Two commits which mark data 'const' and restrict the scope of data
     and functions to file scope by making them 'static'"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86: Constify attribute_group structures
  x86/boot/64/clang: Use fixup_pointer() to access 'next_early_pgt'
  x86/elf: Remove the unnecessary ADDR_NO_RANDOMIZE checks
  x86: Fix norandmaps/ADDR_NO_RANDOMIZE
  x86/mtrr: Prevent CPU hotplug lock recursion
  x86: Mark various structures and functions as 'static'
  x86/cpufeature, kvm/svm: Rename (shorten) the new "virtualized VMSAVE/VMLOAD" CPUID flag
  x86/smpboot: Unbreak CPU0 hotplug
  x86/asm/64: Clear AC on NMI entries

7f680d7e

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 2615a38f

由 Linus Torvalds 提交于 8月 20, 2017

Pull timer fixes from Thomas Gleixner:
 "A few small fixes for timer drivers:

   - Prevent infinite recursion in the arm architected timer driver with
     ftrace

   - Propagate error codes to the caller in case of failure in EM STI
     driver

   - Adjust a bogus loop iteration in the arm architected timer driver

   - Add a missing Kconfig dependency to the pistachio clocksource to
     prevent build failures

   - Correctly check for IS_ERR() instead of NULL in the shared timer-of
     code"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  clocksource/drivers/arm_arch_timer: Avoid infinite recursion when ftrace is enabled
  clocksource/drivers/Kconfig: Fix CLKSRC_PISTACHIO dependencies
  clocksource/drivers/timer-of: Checking for IS_ERR() instead of NULL
  clocksource/drivers/em_sti: Fix error return codes in em_sti_probe()
  clocksource/drivers/arm_arch_timer: Fix mem frame loop initialization

2615a38f

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e46db8d2

由 Linus Torvalds 提交于 8月 20, 2017

Pull perf fixes from Thomas Gleixner:
 "Two fixes for the perf subsystem:

   - Fix an inconsistency of RDPMC mm struct tagging across exec() which
     causes RDPMC to fault.

   - Correct the timestamp mechanics across IOC_DISABLE/ENABLE which
     causes incorrect timestamps and total time calculations"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/core: Fix time on IOC_ENABLE
  perf/x86: Fix RDPMC vs. mm_struct tracking

e46db8d2

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 9dae41a2

由 Linus Torvalds 提交于 8月 20, 2017

Pull irq fixes from Thomas Gleixner:
 "A pile of smallish changes all over the place:

   - Add a missing ISB in the GIC V1 driver

   - Remove an ACPI version check in the GIC V3 ITS driver

   - Add the missing irq_pm_shutdown function for BRCMSTB-L2 to avoid
     spurious wakeups

   - Remove the artifical limitation of ITS instances to the number of
     NUMA nodes which prevents utilizing the ITS hardware correctly

   - Prevent a infinite parsing loop in the GIC-V3 ITS/MSI code

   - Honour the force affinity argument in the GIC-V3 driver which is
     required to make perf work correctly

   - Correctly report allocation failures in GIC-V2/V3 to avoid using
     half allocated and initialized interrupts.

   - Fixup checks against nr_cpu_ids in the generic IPI code"

* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  genirq/ipi: Fixup checks against nr_cpu_ids
  genirq: Restore trigger settings in irq_modify_status()
  MAINTAINERS: Remove Jason Cooper's irqchip git tree
  irqchip/gic-v3-its-platform-msi: Fix msi-parent parsing loop
  irqchip/gic-v3-its: Allow GIC ITS number more than MAX_NUMNODES
  irqchip: brcmstb-l2: Define an irq_pm_shutdown function
  irqchip/gic: Ensure we have an ISB between ack and ->handle_irq
  irqchip/gic-v3-its: Remove ACPICA version check for ACPI NUMA
  irqchip/gic-v3: Honor forced affinity setting
  irqchip/gic-v3: Report failures in gic_irq_domain_alloc
  irqchip/gic-v2: Report failures in gic_irq_domain_alloc
  irqchip/atmel-aic: Remove root argument from ->fixup() prototype
  irqchip/atmel-aic: Fix unbalanced refcount in aic_common_rtc_irq_fixup()
  irqchip/atmel-aic: Fix unbalanced of_node_put() in aic_common_irq_fixup()

9dae41a2

20 8月, 2017 2 次提交

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · e18a5ebc

由 Linus Torvalds 提交于 8月 20, 2017

Pull watchdog fix from Thomas Gleixner:
 "A fix for the hardlockup watchdog to prevent false positives with
  extreme Turbo-Modes which make the perf/NMI watchdog fire faster than
  the hrtimer which is used to verify.

  Slightly larger than the minimal fix, which just would increase the
  hrtimer frequency, but comes with extra overhead of more watchdog
  timer interrupts and thread wakeups for all users.

  With this change we restrict the overhead to the extreme Turbo-Mode
  systems"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  kernel/watchdog: Prevent false positives with turbo modes

e18a5ebc

genirq/ipi: Fixup checks against nr_cpu_ids · 8fbbe2d7

由 Alexey Dobriyan 提交于 8月 19, 2017

Valid CPU ids are [0, nr_cpu_ids-1] inclusive.

Fixes: 3b8e29a8 ("genirq: Implement ipi_send_mask/single()")
Fixes: f9bce791 ("genirq: Add a new function to get IPI reverse mapping")
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170819095751.GB27864@avx2

8fbbe2d7

19 8月, 2017 22 次提交

Merge branch 'akpm' (patches from Andrew) · 58d4e450

由 Linus Torvalds 提交于 8月 18, 2017

Merge misc fixes from Andrew Morton:
 "14 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes
  mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM
  mm/mempolicy: fix use after free when calling get_mempolicy
  mm/cma_debug.c: fix stack corruption due to sprintf usage
  signal: don't remove SIGNAL_UNKILLABLE for traced tasks.
  mm, oom: fix potential data corruption when oom_reaper races with writer
  mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS
  slub: fix per memcg cache leak on css offline
  mm: discard memblock data later
  test_kmod: fix description for -s -and -c parameters
  kmod: fix wait on recursive loop
  wait: add wait_event_killable_timeout()
  kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog
  mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()

58d4e450

mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes · c715b72c

由 Kees Cook 提交于 8月 18, 2017

Moving the x86_64 and arm64 PIE base from 0x555555554000 to 0x000100000000
broke AddressSanitizer.  This is a partial revert of:

  eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
  02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")

The AddressSanitizer tool has hard-coded expectations about where
executable mappings are loaded.

The motivation for changing the PIE base in the above commits was to
avoid the Stack-Clash CVEs that allowed executable mappings to get too
close to heap and stack.  This was mainly a problem on 32-bit, but the
64-bit bases were moved too, in an effort to proactively protect those
systems (proofs of concept do exist that show 64-bit collisions, but
other recent changes to fix stack accounting and setuid behaviors will
minimize the impact).

The new 32-bit PIE base is fine for ASan (since it matches the ET_EXEC
base), so only the 64-bit PIE base needs to be reverted to let x86 and
arm64 ASan binaries run again.  Future changes to the 64-bit PIE base on
these architectures can be made optional once a more dynamic method for
dealing with AddressSanitizer is found.  (e.g.  always loading PIE into
the mmap region for marked binaries.)

Link: http://lkml.kernel.org/r/20170807201542.GA21271@beast
Fixes: eab09532 ("binfmt_elf: use ELF_ET_DYN_BASE only for PIE")
Fixes: 02445990 ("arm64: move ELF_ET_DYN_BASE to 4GB / 4MB")
Signed-off-by: NKees Cook <keescook@chromium.org>
Reported-by: NKostya Serebryany <kcc@google.com>
Acked-by: NWill Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c715b72c

mm/vmalloc.c: don't unconditonally use __GFP_HIGHMEM · 704b862f

由 Laura Abbott 提交于 8月 18, 2017

Commit 19809c2d ("mm, vmalloc: use __GFP_HIGHMEM implicitly") added
use of __GFP_HIGHMEM for allocations.  vmalloc_32 may use
GFP_DMA/GFP_DMA32 which does not play nice with __GFP_HIGHMEM and will
trigger a BUG in gfp_zone.

Only add __GFP_HIGHMEM if we aren't using GFP_DMA/GFP_DMA32.

Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1482249
Link: http://lkml.kernel.org/r/20170816220705.31374-1-labbott@redhat.com
Fixes: 19809c2d ("mm, vmalloc: use __GFP_HIGHMEM implicitly")
Signed-off-by: NLaura Abbott <labbott@redhat.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

704b862f

mm/mempolicy: fix use after free when calling get_mempolicy · 73223e4e

由 zhong jiang 提交于 8月 18, 2017

I hit a use after free issue when executing trinity and repoduced it
with KASAN enabled.  The related call trace is as follows.

  BUG: KASan: use after free in SyS_get_mempolicy+0x3c8/0x960 at addr ffff8801f582d766
  Read of size 2 by task syz-executor1/798

  INFO: Allocated in mpol_new.part.2+0x74/0x160 age=3 cpu=1 pid=799
     __slab_alloc+0x768/0x970
     kmem_cache_alloc+0x2e7/0x450
     mpol_new.part.2+0x74/0x160
     mpol_new+0x66/0x80
     SyS_mbind+0x267/0x9f0
     system_call_fastpath+0x16/0x1b
  INFO: Freed in __mpol_put+0x2b/0x40 age=4 cpu=1 pid=799
     __slab_free+0x495/0x8e0
     kmem_cache_free+0x2f3/0x4c0
     __mpol_put+0x2b/0x40
     SyS_mbind+0x383/0x9f0
     system_call_fastpath+0x16/0x1b
  INFO: Slab 0xffffea0009cb8dc0 objects=23 used=8 fp=0xffff8801f582de40 flags=0x200000000004080
  INFO: Object 0xffff8801f582d760 @offset=5984 fp=0xffff8801f582d600

  Bytes b4 ffff8801f582d750: ae 01 ff ff 00 00 00 00 5a 5a 5a 5a 5a 5a 5a 5a  ........ZZZZZZZZ
  Object ffff8801f582d760: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
  Object ffff8801f582d770: 6b 6b 6b 6b 6b 6b 6b a5                          kkkkkkk.
  Redzone ffff8801f582d778: bb bb bb bb bb bb bb bb                          ........
  Padding ffff8801f582d8b8: 5a 5a 5a 5a 5a 5a 5a 5a                          ZZZZZZZZ
  Memory state around the buggy address:
  ffff8801f582d600: fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff8801f582d680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  >ffff8801f582d700: fc fc fc fc fc fc fc fc fc fc fc fc fb fb fb fc

!shared memory policy is not protected against parallel removal by other
thread which is normally protected by the mmap_sem.  do_get_mempolicy,
however, drops the lock midway while we can still access it later.

Early premature up_read is a historical artifact from times when
put_user was called in this path see https://lwn.net/Articles/124754/
but that is gone since 8bccd85f ("[PATCH] Implement sys_* do_*
layering in the memory policy layer.").  but when we have the the
current mempolicy ref count model.  The issue was introduced
accordingly.

Fix the issue by removing the premature release.

Link: http://lkml.kernel.org/r/1502950924-27521-1-git-send-email-zhongjiang@huawei.comSigned-off-by: Nzhong jiang <zhongjiang@huawei.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>	[2.6+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

73223e4e

mm/cma_debug.c: fix stack corruption due to sprintf usage · da094e42

由 Prakash Gupta 提交于 8月 18, 2017

name[] in cma_debugfs_add_one() can only accommodate 16 chars including
NULL to store sprintf output.  It's common for cma device name to be
larger than 15 chars.  This can cause stack corrpution.  If the gcc
stack protector is turned on, this can cause a panic due to stack
corruption.

Below is one example trace:

  Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in:
  ffffff8e69a75730
  Call trace:
     dump_backtrace+0x0/0x2c4
     show_stack+0x20/0x28
     dump_stack+0xb8/0xf4
     panic+0x154/0x2b0
     print_tainted+0x0/0xc0
     cma_debugfs_init+0x274/0x290
     do_one_initcall+0x5c/0x168
     kernel_init_freeable+0x1c8/0x280

Fix the short sprintf buffer in cma_debugfs_add_one() by using
scnprintf() instead of sprintf().

Link: http://lkml.kernel.org/r/1502446217-21840-1-git-send-email-guptap@codeaurora.org
Fixes: f318dd08 ("cma: Store a name in the cma structure")
Signed-off-by: NPrakash Gupta <guptap@codeaurora.org>
Acked-by: NLaura Abbott <labbott@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

da094e42

signal: don't remove SIGNAL_UNKILLABLE for traced tasks. · eb61b591

由 Jamie Iles 提交于 8月 18, 2017

When forcing a signal, SIGNAL_UNKILLABLE is removed to prevent recursive
faults, but this is undesirable when tracing. For example, debugging an
init process (whether global or namespace), hitting a breakpoint and
SIGTRAP will force SIGTRAP and then remove SIGNAL_UNKILLABLE.
Everything continues fine, but then once debugging has finished, the
init process is left killable which is unlikely what the user expects,
resulting in either an accidentally killed init or an init that stops
reaping zombies.

Link: http://lkml.kernel.org/r/20170815112806.10728-1-jamie.iles@oracle.comSigned-off-by: NJamie Iles <jamie.iles@oracle.com>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

eb61b591

mm, oom: fix potential data corruption when oom_reaper races with writer · 6b31d595

由 Michal Hocko 提交于 8月 18, 2017

Wenwei Tao has noticed that our current assumption that the oom victim
is dying and never doing any visible changes after it dies, and so the
oom_reaper can tear it down, is not entirely true.

__task_will_free_mem consider a task dying when SIGNAL_GROUP_EXIT is set
but do_group_exit sends SIGKILL to all threads _after_ the flag is set.
So there is a race window when some threads won't have
fatal_signal_pending while the oom_reaper could start unmapping the
address space.  Moreover some paths might not check for fatal signals
before each PF/g-u-p/copy_from_user.

We already have a protection for oom_reaper vs.  PF races by checking
MMF_UNSTABLE.  This has been, however, checked only for kernel threads
(use_mm users) which can outlive the oom victim.  A simple fix would be
to extend the current check in handle_mm_fault for all tasks but that
wouldn't be sufficient because the current check assumes that a kernel
thread would bail out after EFAULT from get_user*/copy_from_user and
never re-read the same address which would succeed because the PF path
has established page tables already.  This seems to be the case for the
only existing use_mm user currently (virtio driver) but it is rather
fragile in general.

This is even more fragile in general for more complex paths such as
generic_perform_write which can re-read the same address more times
(e.g.  iov_iter_copy_from_user_atomic to fail and then
iov_iter_fault_in_readable on retry).

Therefore we have to implement MMF_UNSTABLE protection in a robust way
and never make a potentially corrupted content visible.  That requires
to hook deeper into the PF path and check for the flag _every time_
before a pte for anonymous memory is established (that means all
!VM_SHARED mappings).

The corruption can be triggered artificially
(http://lkml.kernel.org/r/201708040646.v746kkhC024636@www262.sakura.ne.jp)
but there doesn't seem to be any real life bug report.  The race window
should be quite tight to trigger most of the time.

Link: http://lkml.kernel.org/r/20170807113839.16695-3-mhocko@kernel.org
Fixes: aac45363 ("mm, oom: introduce oom reaper")
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Reported-by: NWenwei Tao <wenwei.tww@alibaba-inc.com>
Tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6b31d595

mm: fix double mmap_sem unlock on MMF_UNSTABLE enforced SIGBUS · 5b53a6ea

由 Michal Hocko 提交于 8月 18, 2017

Tetsuo Handa has noticed that MMF_UNSTABLE SIGBUS path in
handle_mm_fault causes a lockdep splat

  Out of memory: Kill process 1056 (a.out) score 603 or sacrifice child
  Killed process 1056 (a.out) total-vm:4268108kB, anon-rss:2246048kB, file-rss:0kB, shmem-rss:0kB
  a.out (1169) used greatest stack depth: 11664 bytes left
  DEBUG_LOCKS_WARN_ON(depth <= 0)
  ------------[ cut here ]------------
  WARNING: CPU: 6 PID: 1339 at kernel/locking/lockdep.c:3617 lock_release+0x172/0x1e0
  CPU: 6 PID: 1339 Comm: a.out Not tainted 4.13.0-rc3-next-20170803+ #142
  Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
  RIP: 0010:lock_release+0x172/0x1e0
  Call Trace:
     up_read+0x1a/0x40
     __do_page_fault+0x28e/0x4c0
     do_page_fault+0x30/0x80
     page_fault+0x28/0x30

The reason is that the page fault path might have dropped the mmap_sem
and returned with VM_FAULT_RETRY.  MMF_UNSTABLE check however rewrites
the error path to VM_FAULT_SIGBUS and we always expect mmap_sem taken in
that path.  Fix this by taking mmap_sem when VM_FAULT_RETRY is held in
the MMF_UNSTABLE path.

We cannot simply add VM_FAULT_SIGBUS to the existing error code because
all arch specific page fault handlers and g-u-p would have to learn a
new error code combination.

Link: http://lkml.kernel.org/r/20170807113839.16695-2-mhocko@kernel.org
Fixes: 3f70dc38 ("mm: make sure that kthreads will not refault oom reaped memory")
Reported-by: NTetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Andrea Argangeli <andrea@kernel.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Wenwei Tao <wenwei.tww@alibaba-inc.com>
Cc: <stable@vger.kernel.org>	[4.9+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5b53a6ea

slub: fix per memcg cache leak on css offline · f6ba4880

由 Vladimir Davydov 提交于 8月 18, 2017

To avoid a possible deadlock, sysfs_slab_remove() schedules an
asynchronous work to delete sysfs entries corresponding to the kmem
cache.  To ensure the cache isn't freed before the work function is
called, it takes a reference to the cache kobject.  The reference is
supposed to be released by the work function.

However, the work function (sysfs_slab_remove_workfn()) does nothing in
case the cache sysfs entry has already been deleted, leaking the kobject
and the corresponding cache.

This may happen on a per memcg cache destruction, because sysfs entries
of a per memcg cache are deleted on memcg offline if the cache is empty
(see __kmemcg_cache_deactivate()).

The kmemleak report looks like this:

  unreferenced object 0xffff9f798a79f540 (size 32):
    comm "kworker/1:4", pid 15416, jiffies 4307432429 (age 28687.554s)
    hex dump (first 32 bytes):
      6b 6d 61 6c 6c 6f 63 2d 31 36 28 31 35 39 39 3a  kmalloc-16(1599:
      6e 65 77 72 6f 6f 74 29 00 23 6b c0 ff ff ff ff  newroot).#k.....
    backtrace:
       kmemleak_alloc+0x4a/0xa0
       __kmalloc_track_caller+0x148/0x2c0
       kvasprintf+0x66/0xd0
       kasprintf+0x49/0x70
       memcg_create_kmem_cache+0xe6/0x160
       memcg_kmem_cache_create_func+0x20/0x110
       process_one_work+0x205/0x5d0
       worker_thread+0x4e/0x3a0
       kthread+0x109/0x140
       ret_from_fork+0x2a/0x40
  unreferenced object 0xffff9f79b6136840 (size 416):
    comm "kworker/1:4", pid 15416, jiffies 4307432429 (age 28687.573s)
    hex dump (first 32 bytes):
      40 fb 80 c2 3e 33 00 00 00 00 00 40 00 00 00 00  @...>3.....@....
      00 00 00 00 00 00 00 00 10 00 00 00 10 00 00 00  ................
    backtrace:
       kmemleak_alloc+0x4a/0xa0
       kmem_cache_alloc+0x128/0x280
       create_cache+0x3b/0x1e0
       memcg_create_kmem_cache+0x118/0x160
       memcg_kmem_cache_create_func+0x20/0x110
       process_one_work+0x205/0x5d0
       worker_thread+0x4e/0x3a0
       kthread+0x109/0x140
       ret_from_fork+0x2a/0x40

Fix the leak by adding the missing call to kobject_put() to
sysfs_slab_remove_workfn().

Link: http://lkml.kernel.org/r/20170812181134.25027-1-vdavydov.dev@gmail.com
Fixes: 3b7b3140 ("slub: make sysfs file removal asynchronous")
Signed-off-by: NVladimir Davydov <vdavydov.dev@gmail.com>
Reported-by: NAndrei Vagin <avagin@gmail.com>
Tested-by: NAndrei Vagin <avagin@gmail.com>
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>	[4.12.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f6ba4880

mm: discard memblock data later · 3010f876

由 Pavel Tatashin 提交于 8月 18, 2017

There is existing use after free bug when deferred struct pages are
enabled:

The memblock_add() allocates memory for the memory array if more than
128 entries are needed.  See comment in e820__memblock_setup():

  * The bootstrap memblock region count maximum is 128 entries
  * (INIT_MEMBLOCK_REGIONS), but EFI might pass us more E820 entries
  * than that - so allow memblock resizing.

This memblock memory is freed here:
        free_low_memory_core_early()

We access the freed memblock.memory later in boot when deferred pages
are initialized in this path:

        deferred_init_memmap()
                for_each_mem_pfn_range()
                  __next_mem_pfn_range()
                    type = &memblock.memory;

One possible explanation for why this use-after-free hasn't been hit
before is that the limit of INIT_MEMBLOCK_REGIONS has never been
exceeded at least on systems where deferred struct pages were enabled.

Tested by reducing INIT_MEMBLOCK_REGIONS down to 4 from the current 128,
and verifying in qemu that this code is getting excuted and that the
freed pages are sane.

Link: http://lkml.kernel.org/r/1502485554-318703-2-git-send-email-pasha.tatashin@oracle.com
Fixes: 7e18adb4 ("mm: meminit: initialise remaining struct pages in parallel with kswapd")
Signed-off-by: NPavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: NSteven Sistare <steven.sistare@oracle.com>
Reviewed-by: NDaniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: NBob Picco <bob.picco@oracle.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3010f876

test_kmod: fix description for -s -and -c parameters · 768dc4e4

由 Luis R. Rodriguez 提交于 8月 18, 2017

The descriptions were reversed, correct this.

Link: http://lkml.kernel.org/r/20170809234635.13443-4-mcgrof@kernel.org
Fixes: 64b67120 ("test_sysctl: add generic script to expand on tests")
Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
Reported-by: NDaniel Mentz <danielmentz@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matt Redfearn <matt.redfearn@imgetc.com>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Michal Marek <mmarek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

768dc4e4

kmod: fix wait on recursive loop · 2ba293c9

由 Luis R. Rodriguez 提交于 8月 18, 2017

Recursive loops with module loading were previously handled in kmod by
restricting the number of modprobe calls to 50 and if that limit was
breached request_module() would return an error and a user would see the
following on their kernel dmesg:

  request_module: runaway loop modprobe binfmt-464c
  Starting init:/sbin/init exists but couldn't execute it (error -8)

This issue could happen for instance when a 64-bit kernel boots a 32-bit
userspace on some architectures and has no 32-bit binary format
hanlders.  This is visible, for instance, when a CONFIG_MODULES enabled
64-bit MIPS kernel boots a into o32 root filesystem and the binfmt
handler for o32 binaries is not built-in.

After commit 6d7964a7 ("kmod: throttle kmod thread limit") we now
don't have any visible signs of an error and the kernel just waits for
the loop to end somehow.

Although this *particular* recursive loop could also be addressed by
doing a sanity check on search_binary_handler() and disallowing a
modular binfmt to be required for modprobe, a generic solution for any
recursive kernel kmod issues is still needed.

This should catch these loops.  We can investigate each loop and address
each one separately as they come in, this however puts a stop gap for
them as before.

Link: http://lkml.kernel.org/r/20170809234635.13443-3-mcgrof@kernel.org
Fixes: 6d7964a7 ("kmod: throttle kmod thread limit")
Signed-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
Reported-by: NMatt Redfearn <matt.redfearn@imgtec.com>
Tested-by: NMatt Redfearn <matt.redfearn@imgetc.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Michal Marek <mmarek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2ba293c9

wait: add wait_event_killable_timeout() · 8ada9279

由 Luis R. Rodriguez 提交于 8月 18, 2017

These are the few pending fixes I have queued up for v4.13-final.  One
is a a generic regression fix for recursive loops on kmod and the other
one is a trivial print out correction.

During the v4.13 development we assumed that recursive kmod loops were
no longer possible.  Clearly that is not true.  The regression fix makes
use of a new killable wait.  We use a killable wait to be paranoid in
how signals might be sent to modprobe and only accept a proper SIGKILL.
The signal will only be available to userspace to issue *iff* a thread
has already entered a wait state, and that happens only if we've already
throttled after 50 kmod threads have been hit.

Note that although it may seem excessive to trigger a failure afer 5
seconds if all kmod thread remain busy, prior to the series of changes
that went into v4.13 we would actually *always* fatally fail any request
which came in if the limit was already reached.  The new waiting
implemented in v4.13 actually gives us *more* breathing room -- the wait
for 5 seconds is a wait for *any* kmod thread to finish.  We give up and
fail *iff* no kmod thread has finished and they're *all* running
straight for 5 consecutive seconds.  If 50 kmod threads are running
consecutively for 5 seconds something else must be really bad.

Recursive loops with kmod are bad but they're also hard to implement
properly as a selftest without currently fooling current userspace tools
like kmod [1].  For instance kmod will complain when you run depmod if
it finds a recursive loop with symbol dependency between modules as such
this type of recursive loop cannot go upstream as the modules_install
target will fail after running depmod.

These tests already exist on userspace kmod upstream though (refer to
the testsuite/module-playground/mod-loop-*.c files).  The same is not
true if request_module() is used though, or worst if aliases are used.

Likewise the issue with 64-bit kernels booting 32-bit userspace without
a binfmt handler built-in is also currently not detected and proactively
avoided by userspace kmod tools, or kconfig for all architectures.
Although we could complain in the kernel when some of these individual
recursive issues creep up, proactively avoiding these situations in
userspace at build time is what we should keep striving for.

Lastly, since recursive loops could happen with kmod it may mean
recursive loops may also be possible with other kernel usermode helpers,
this should be investigated and long term if we can come up with a more
sensible generic solution even better!

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20170809-kmod-for-v4.13-final
[1] https://git.kernel.org/pub/scm/utils/kernel/kmod/kmod.git

This patch (of 3):

This wait is similar to wait_event_interruptible_timeout() but only
accepts SIGKILL interrupt signal.  Other signals are ignored.

Link: http://lkml.kernel.org/r/20170809234635.13443-2-mcgrof@kernel.orgSigned-off-by: NLuis R. Rodriguez <mcgrof@kernel.org>
Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Cc: Jessica Yu <jeyu@redhat.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michal Marek <mmarek@suse.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Miroslav Benes <mbenes@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Matt Redfearn <matt.redfearn@imgtec.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: David Binderman <dcb314@hotmail.com>
Cc: Matt Redfearn <matt.redfearn@imgetc.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8ada9279

kernel/watchdog: fix Kconfig constraints for perf hardlockup watchdog · 92e5aae4

由 Nicholas Piggin 提交于 8月 18, 2017

Commit 05a4a952 ("kernel/watchdog: split up config options") lost
the perf-based hardlockup detector's dependency on PERF_EVENTS, which
can result in broken builds with some powerpc configurations.

Restore the dependency.  Add it in for x86 too, despite x86 always
selecting PERF_EVENTS it seems reasonable to make the dependency
explicit.

Link: http://lkml.kernel.org/r/20170810114452.6673-1-npiggin@gmail.com
Fixes: 05a4a952 ("kernel/watchdog: split up config options")
Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
Acked-by: NDon Zickus <dzickus@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

92e5aae4

mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback() · 739f79fc

由 Johannes Weiner 提交于 8月 18, 2017

Jaegeuk and Brad report a NULL pointer crash when writeback ending tries
to update the memcg stats:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000003b0
    IP: test_clear_page_writeback+0x12e/0x2c0
    [...]
    RIP: 0010:test_clear_page_writeback+0x12e/0x2c0
    Call Trace:
     <IRQ>
     end_page_writeback+0x47/0x70
     f2fs_write_end_io+0x76/0x180 [f2fs]
     bio_endio+0x9f/0x120
     blk_update_request+0xa8/0x2f0
     scsi_end_request+0x39/0x1d0
     scsi_io_completion+0x211/0x690
     scsi_finish_command+0xd9/0x120
     scsi_softirq_done+0x127/0x150
     __blk_mq_complete_request_remote+0x13/0x20
     flush_smp_call_function_queue+0x56/0x110
     generic_smp_call_function_single_interrupt+0x13/0x30
     smp_call_function_single_interrupt+0x27/0x40
     call_function_single_interrupt+0x89/0x90
    RIP: 0010:native_safe_halt+0x6/0x10

    (gdb) l *(test_clear_page_writeback+0x12e)
    0xffffffff811bae3e is in test_clear_page_writeback (./include/linux/memcontrol.h:619).
    614		mod_node_page_state(page_pgdat(page), idx, val);
    615		if (mem_cgroup_disabled() || !page->mem_cgroup)
    616			return;
    617		mod_memcg_state(page->mem_cgroup, idx, val);
    618		pn = page->mem_cgroup->nodeinfo[page_to_nid(page)];
    619		this_cpu_add(pn->lruvec_stat->count[idx], val);
    620	}
    621
    622	unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
    623							gfp_t gfp_mask,

The issue is that writeback doesn't hold a page reference and the page
might get freed after PG_writeback is cleared (and the mapping is
unlocked) in test_clear_page_writeback().  The stat functions looking up
the page's node or zone are safe, as those attributes are static across
allocation and free cycles.  But page->mem_cgroup is not, and it will
get cleared if we race with truncation or migration.

It appears this race window has been around for a while, but less likely
to trigger when the memcg stats were updated first thing after
PG_writeback is cleared.  Recent changes reshuffled this code to update
the global node stats before the memcg ones, though, stretching the race
window out to an extent where people can reproduce the problem.

Update test_clear_page_writeback() to look up and pin page->mem_cgroup
before clearing PG_writeback, then not use that pointer afterward.  It
is a partial revert of 62cccb8c ("mm: simplify lock_page_memcg()")
but leaves the pageref-holding callsites that aren't affected alone.

Link: http://lkml.kernel.org/r/20170809183825.GA26387@cmpxchg.org
Fixes: 62cccb8c ("mm: simplify lock_page_memcg()")
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reported-by: NJaegeuk Kim <jaegeuk@kernel.org>
Tested-by: NJaegeuk Kim <jaegeuk@kernel.org>
Reported-by: NBradley Bolen <bradleybolen@gmail.com>
Tested-by: NBrad Bolen <bradleybolen@gmail.com>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>	[4.6+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

739f79fc

Merge tag 'xfs-4.13-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · cc28fcdc

由 Linus Torvalds 提交于 8月 18, 2017

Pull xfs fixes from Darrick Wong:
 "A handful more bug fixes for you today.

  Changes since last time:

   - Don't leak resources when mount fails

   - Don't accidentally clobber variables when looking for free inodes"

* tag 'xfs-4.13-fixes-5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: don't leak quotacheck dquots when cow recovery
  xfs: clear MS_ACTIVE after finishing log recovery
  iomap: fix integer truncation issues in the zeroing and dirtying helpers
  xfs: fix inobt inode allocation search optimization

cc28fcdc

Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 70bfc741

由 Linus Torvalds 提交于 8月 18, 2017

Pull block fixes from Jens Axboe:
 "A small set of fixes that should go into this release. This contains:

   - An NVMe pull request from Christoph, with a few select fixes.

     One of them fix a polling regression in this series, in which it's
     trivial to cause the kernel to disable most of the hardware queue
     interrupts.

   - Fixup for a blk-mq queue usage imbalance on request allocation,
     from Keith.

   - A xen block pull request from Konrad, fixing two issues with
     xen/xen-blkfront"

* 'for-linus' of git://git.kernel.dk/linux-block:
  blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL
  nvme-pci: set cqe_seen on polled completions
  nvme-fabrics: fix reporting of unrecognized options
  nvmet-fc: eliminate incorrect static markers on local variables
  nvmet-fc: correct use after free on list teardown
  nvmet: don't overwrite identify sn/fr with 0-bytes
  xen-blkfront: use a right index when checking requests
  xen: fix bio vec merging
  blk-mq: Fix queue usage on failed request allocation

70bfc741

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma · edb20a1b

由 Linus Torvalds 提交于 8月 18, 2017

Pull rdma fixes from Doug Ledford:
 "Fourth set of -rc fixes for 4.13 cycle. This is all of the -rc fixes
  that we know of. I suspect this will be the last rc pull request, but
  you never know, I could be wrong.

  Nothing major here. There are the i40iw patches I mentioned in my last
  pull request minus one that I pulled out because it wasn't a fix and
  not appropriate for the rc cycle. Then a few other items trickled in
  and were added to the pull request. It's fairly small aside from those
  five i40iw patches

   - Set of five i40iw fixes (the first of these is rather large by line
     count consideration, but I decided to send it because if fixes a
     legitimate issue and the line count is because it does so by
     creating a new function and using it where needed instead of just
     patching up a few lines...a smaller fix could probably be done, but
     the larger fix is the better code solution)

   - One vmw_pvrdma fix

   - One hns_roce fix (this silences a checker warning, but can't
     actually happen, I expect a patch to remove this from all drivers
     that share this same check in for-next)

   - One iw_cxgb4 fix

   - Two IB core fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
  IB/uverbs: Fix NULL pointer dereference during device removal
  IB/core: Protect sysfs entry on ib_unregister_device
  iw_cxgb4: fix misuse of integer variable
  IB/hns: fix memory leak on ah on error return path
  i40iw: Fix potential fcn_id_array out of bounds
  i40iw: Use correct alignment for CQ0 memory
  i40iw: Fix typecast of tcp_seq_num
  i40iw: Correct variable names
  i40iw: Fix parsing of query/commit FPM buffers
  RDMA/vmw_pvrdma: Report CQ missed events

edb20a1b

Merge tag 'powerpc-4.13-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 039a8e38

由 Linus Torvalds 提交于 8月 18, 2017

Pull powerpc fixes from Michael Ellerman:
 "A bug in the VSX register saving that could cause userspace FP/VMX
  register corruption.

  Never seen to happen (that we know of), was found by code inspection,
  but still tagged for stable given the consequences"

* tag 'powerpc-4.13-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Fix VSX enabling/flushing to also test MSR_FP and MSR_VEC

039a8e38

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 42833468

由 Linus Torvalds 提交于 8月 18, 2017

Pull ARM SoC fixes from Arnd Bergmann:
 "A small number of bugfixes, nothing serious this time. Here is a full
  list.

  4.13 regression fix:

   - imx7d-sdb pinctrl support regressed in 4.13 due to an incomplete
     patch

  DT fixes for recently added devices:

   - badly copied DT entries on imx6qdl-nitrogen6_som broke PCI reset

   - sama5d2 memory controller had the wrong ID and registers

   - imx7 power domains did not work correctly with deferred probing
     (driver added in 4.12)

   - Allwinner H5 pinctrl (added in 4.12) did not work right with GPIO
     interrupts

  Fixes for older bugs that just got noticed:

   - i.MX25 ADC support (added in 4.6) apparently never worked right due
     to a missing 'ranges' property in DT.

   - Renesas Salvador Audio support (added in v4.5) was broken for
     device repeated bind/unbind due to a naming conflict.

   - Various allwinner boards are missing an 'ethernet' alias in DT,
     leading to unstable device naming.

  Preventive bugfix:

   - TI Keystone needs a fix to prevent a NULL pointer dereference with
     an upcoming PM change"

* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  soc: ti: ti_sci_pm_domains: Populate name for genpd
  ARM: dts: imx6qdl-nitrogen6_som2: fix PCIe reset
  arm64: allwinner: h5: fix pinctrl IRQs
  arm64: allwinner: a64: sopine: add missing ethernet0 alias
  arm64: allwinner: a64: pine64: add missing ethernet0 alias
  arm64: allwinner: a64: bananapi-m64: add missing ethernet0 alias
  arm64: renesas: salvator-common: avoid audio_clkout naming conflict
  ARM: dts: i.MX25: add ranges to tscadc
  soc: imx: gpcv2: fix regulator deferred probe
  ARM: dts: at91: sama5d2: fix EBI/NAND controllers declaration
  ARM: dts: at91: sama5d2: use sama5d2 compatible string for SMC
  ARM: dts: imx7d-sdb: Put pinctrl_spi4 in the correct location

42833468

Merge tag 'sound-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · cb247857

由 Linus Torvalds 提交于 8月 18, 2017

Pull sound fixes from Takashi Iwai:
 "A collection of small fixes, mostly for regression fixes (sequencer
  kconfig and emu10k1 probe) and device-specific quirks (three for USB
  and one for HD-audio).

  One significant change is a fix for races in ALSA sequencer core,
  which covers over the previous incomplete fix"

* tag 'sound-4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: emu10k1: Fix forgotten user-copy conversion in init code
  ALSA: usb-audio: add DSD support for new Amanero PID
  ALSA: usb-audio: Add mute TLV for playback volumes on C-Media devices
  ALSA: usb-audio: Apply sample rate quirk to Sennheiser headset
  ALSA: seq: 2nd attempt at fixing race creating a queue
  ALSA: hda/realtek - Fix pincfg for Dell XPS 13 9370
  ALSA: seq: Fix CONFIG_SND_SEQ_MIDI dependency

cb247857

Merge tag 'dma-mapping-4.13-3' of git://git.infradead.org/users/hch/dma-mapping · 4478976a

由 Linus Torvalds 提交于 8月 18, 2017

Pull dma-mapping fix from Christoph Hellwig:
 "Another dma-mapping regression fix"

* tag 'dma-mapping-4.13-3' of git://git.infradead.org/users/hch/dma-mapping:
  of: fix DMA mask generation

4478976a

18 8月, 2017 9 次提交

blk-mq-pci: add a fallback when pci_irq_get_affinity returns NULL · c0053903

由 Christoph Hellwig 提交于 8月 17, 2017

While pci_irq_get_affinity should never fail for SMP kernel that
implement the affinity mapping, it will always return NULL in the
UP case, so provide a fallback mapping of all queues to CPU 0 in
that case.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org
Reviewed-by: NOmar Sandoval <osandov@fb.com>
Signed-off-by: NJens Axboe <axboe@kernel.dk>

c0053903

Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-linus · 6caa0503

由 Jens Axboe 提交于 8月 18, 2017

Pull NVMe changes from Christoph:

"The fixes are getting really small now - two for FC, one for PCI, one
 for the fabrics layer and one for the target."

6caa0503

kernel/watchdog: Prevent false positives with turbo modes · 7edaeb68

由 Thomas Gleixner 提交于 8月 15, 2017

The hardlockup detector on x86 uses a performance counter based on unhalted
CPU cycles and a periodic hrtimer. The hrtimer period is about 2/5 of the
performance counter period, so the hrtimer should fire 2-3 times before the
performance counter NMI fires. The NMI code checks whether the hrtimer
fired since the last invocation. If not, it assumess a hard lockup.

The calculation of those periods is based on the nominal CPU
frequency. Turbo modes increase the CPU clock frequency and therefore
shorten the period of the perf/NMI watchdog. With extreme Turbo-modes (3x
nominal frequency) the perf/NMI period is shorter than the hrtimer period
which leads to false positives.

A simple fix would be to shorten the hrtimer period, but that comes with
the side effect of more frequent hrtimer and softlockup thread wakeups,
which is not desired.

Implement a low pass filter, which checks the perf/NMI period against
kernel time. If the perf/NMI fires before 4/5 of the watchdog period has
elapsed then the event is ignored and postponed to the next perf/NMI.

That solves the problem and avoids the overhead of shorter hrtimer periods
and more frequent softlockup thread wakeups.

Fixes: 58687acb ("lockup_detector: Combine nmi_watchdog and softlockup detector")
Reported-and-tested-by: NKan Liang <Kan.liang@intel.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: dzickus@redhat.com
Cc: prarit@redhat.com
Cc: ak@linux.intel.com
Cc: babu.moger@oracle.com
Cc: peterz@infradead.org
Cc: eranian@google.com
Cc: acme@redhat.com
Cc: stable@vger.kernel.org
Cc: atomlin@redhat.com
Cc: akpm@linux-foundation.org
Cc: torvalds@linux-foundation.org
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1708150931310.1886@nanos

7edaeb68

genirq: Restore trigger settings in irq_modify_status() · e8f24189

由 Marc Zyngier 提交于 8月 18, 2017

irq_modify_status starts by clearing the trigger settings from
irq_data before applying the new settings, but doesn't restore them,
leaving them to IRQ_TYPE_NONE.

That's pretty confusing to the potential request_irq() that could
follow. Instead, snapshot the settings before clearing them, and restore
them if the irq_modify_status() invocation was not changing the trigger.

Fixes: 1e2a7d78 ("irqdomain: Don't set type when mapping an IRQ")
Reported-and-tested-by: Njeffy <jeffy.chen@rock-chips.com>
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Jon Hunter <jonathanh@nvidia.com>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170818095345.12378-1-marc.zyngier@arm.com

e8f24189

soc: ti: ti_sci_pm_domains: Populate name for genpd · 4dd6a997

由 Dave Gerlach 提交于 7月 28, 2017

Commit b6a1d093 ("PM / Domains: Extend generic power domain
debugfs") now creates a debugfs directory for each genpd based on the
name of the genpd. Currently no name is given to the genpd created by
ti_sci_pm_domains driver so because of this we see a NULL pointer
dereferences when it is accessed on boot when the debugfs entry creation
is attempted.

Give the genpd a name before registering it to avoid this.

Fixes: 52835d59 ("soc: ti: Add ti_sci_pm_domains driver")
Signed-off-by: NDave Gerlach <d-gerlach@ti.com>
Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: NArnd Bergmann <arnd@arndb.de>

4dd6a997

Merge tag 'imx-fixes-4.13-3' of... · 93112486

由 Arnd Bergmann 提交于 8月 18, 2017

Merge tag 'imx-fixes-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes

Pull "i.MX fixes for 4.13, round 3" from Shawn Guo:

 - Fix PCIe reset GPIO of imx6qdl-nitrogen6_som2 board, which was
   a bad copy from nitrogen6_max device tree.

* tag 'imx-fixes-4.13-3' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
  ARM: dts: imx6qdl-nitrogen6_som2: fix PCIe reset

93112486

Merge tag 'sunxi-fixes-for-4.13-2' of... · 552c497c

由 Arnd Bergmann 提交于 8月 18, 2017

Merge tag 'sunxi-fixes-for-4.13-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes

Pull "Allwinner fixes for 4.13, round 2" from Chen-Yu Tsai:

Three fixes adding a missing alias for the Ethernet controller on A64
boards. One adding a missing interrupt for the pin controller.

* tag 'sunxi-fixes-for-4.13-2' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
  arm64: allwinner: h5: fix pinctrl IRQs
  arm64: allwinner: a64: sopine: add missing ethernet0 alias
  arm64: allwinner: a64: pine64: add missing ethernet0 alias
  arm64: allwinner: a64: bananapi-m64: add missing ethernet0 alias

552c497c

x86: Constify attribute_group structures · 45bd07ad

由 Arvind Yadav 提交于 7月 20, 2017

attribute_groups are not supposed to change at runtime and none of the
groups is modified.

Mark the non-const structs as const.

[ tglx: Folded into one big patch ]
Signed-off-by: NArvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: tony.luck@intel.com
Cc: bp@alien8.de
Link: http://lkml.kernel.org/r/1500550238-15655-2-git-send-email-arvind.yadav.cs@gmail.com

45bd07ad

MAINTAINERS: Remove Jason Cooper's irqchip git tree · 7374bfb8

由 Florian Fainelli 提交于 7月 27, 2017

Jason's irqchip tree does not seem to have been updated for many months
now, remove it from the list of trees to avoid any possible confusion.

Jason says:

  "Unfortunately, when I have time for irqchip, I don't always have the
   time to properly follow up with pull-requests. So, for the time being,
   I'll stick to reviewing as I can."
Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Acked-by: NJason Cooper <jason@lakedaemon.net>
Cc: marc.zyngier@arm.com
Link: http://lkml.kernel.org/r/20170727224733.8288-1-f.fainelli@gmail.com

7374bfb8

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功