提交 · 2aa79af64263190eec610422b07f60e99a7d230a · openeuler / raspberrypi-kernel

08 5月, 2015 12 次提交

locking/qspinlock: Revert to test-and-set on hypervisors · 2aa79af6

由 Peter Zijlstra (Intel) 提交于 4月 24, 2015

When we detect a hypervisor (!paravirt, see qspinlock paravirt support
patches), revert to a simple test-and-set lock to avoid the horrors
of queue preemption.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-8-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

2aa79af6

locking/qspinlock: Use a simple write to grab the lock · 2c83e8e9

由 Waiman Long 提交于 4月 24, 2015

Currently, atomic_cmpxchg() is used to get the lock. However, this
is not really necessary if there is more than one task in the queue
and the queue head don't need to reset the tail code. For that case,
a simple write to set the lock bit is enough as the queue head will
be the only one eligible to get the lock as long as it checks that
both the lock and pending bits are not set. The current pending bit
waiting code will ensure that the bit will not be set as soon as the
tail code in the lock is set.

With that change, the are some slight improvement in the performance
of the queued spinlock in the 5M loop micro-benchmark run on a 4-socket
Westere-EX machine as shown in the tables below.

		[Standalone/Embedded - same node]
  # of tasks	Before patch	After patch	%Change
  ----------	-----------	----------	-------
       3	 2324/2321	2248/2265	 -3%/-2%
       4	 2890/2896	2819/2831	 -2%/-2%
       5	 3611/3595	3522/3512	 -2%/-2%
       6	 4281/4276	4173/4160	 -3%/-3%
       7	 5018/5001	4875/4861	 -3%/-3%
       8	 5759/5750	5563/5568	 -3%/-3%

		[Standalone/Embedded - different nodes]
  # of tasks	Before patch	After patch	%Change
  ----------	-----------	----------	-------
       3	12242/12237	12087/12093	 -1%/-1%
       4	10688/10696	10507/10521	 -2%/-2%

It was also found that this change produced a much bigger performance
improvement in the newer IvyBridge-EX chip and was essentially to close
the performance gap between the ticket spinlock and queued spinlock.

The disk workload of the AIM7 benchmark was run on a 4-socket
Westmere-EX machine with both ext4 and xfs RAM disks at 3000 users
on a 3.14 based kernel. The results of the test runs were:

                AIM7 XFS Disk Test
  kernel                 JPM    Real Time   Sys Time    Usr Time
  -----                  ---    ---------   --------    --------
  ticketlock            5678233    3.17       96.61       5.81
  qspinlock             5750799    3.13       94.83       5.97

                AIM7 EXT4 Disk Test
  kernel                 JPM    Real Time   Sys Time    Usr Time
  -----                  ---    ---------   --------    --------
  ticketlock            1114551   16.15      509.72       7.11
  qspinlock             2184466    8.24      232.99       6.01

The ext4 filesystem run had a much higher spinlock contention than
the xfs filesystem run.

The "ebizzy -m" test was also run with the following results:

  kernel               records/s  Real Time   Sys Time    Usr Time
  -----                ---------  ---------   --------    --------
  ticketlock             2075       10.00      216.35       3.49
  qspinlock              3023       10.00      198.20       4.80
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-7-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

2c83e8e9

locking/qspinlock: Optimize for smaller NR_CPUS · 69f9cae9

由 Peter Zijlstra (Intel) 提交于 4月 24, 2015

When we allow for a max NR_CPUS < 2^14 we can optimize the pending
wait-acquire and the xchg_tail() operations.

By growing the pending bit to a byte, we reduce the tail to 16bit.
This means we can use xchg16 for the tail part and do away with all
the repeated compxchg() operations.

This in turn allows us to unconditionally acquire; the locked state
as observed by the wait loops cannot change. And because both locked
and pending are now a full byte we can use simple stores for the
state transition, obviating one atomic operation entirely.

This optimization is needed to make the qspinlock achieve performance
parity with ticket spinlock at light load.

All this is horribly broken on Alpha pre EV56 (and any other arch that
cannot do single-copy atomic byte stores).
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-6-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

69f9cae9

locking/qspinlock: Extract out code snippets for the next patch · 6403bd7d

由 Waiman Long 提交于 4月 24, 2015

This is a preparatory patch that extracts out the following 2 code
snippets to prepare for the next performance optimization patch.

 1) the logic for the exchange of new and previous tail code words
    into a new xchg_tail() function.
 2) the logic for clearing the pending bit and setting the locked bit
    into a new clear_pending_set_locked() function.

This patch also simplifies the trylock operation before queuing by
calling queued_spin_trylock() directly.
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-5-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6403bd7d

locking/qspinlock: Add pending bit · c1fb159d

由 Peter Zijlstra (Intel) 提交于 4月 24, 2015

Because the qspinlock needs to touch a second cacheline (the per-cpu
mcs_nodes[]); add a pending bit and allow a single in-word spinner
before we punt to the second cacheline.

It is possible so observe the pending bit without the locked bit when
the last owner has just released but the pending owner has not yet
taken ownership.

In this case we would normally queue -- because the pending bit is
already taken. However, in this case the pending bit is guaranteed
to be released 'soon', therefore wait for it and avoid queueing.
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-4-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

c1fb159d

locking/qspinlock, x86: Enable x86-64 to use queued spinlocks · d73a3397

由 Waiman Long 提交于 4月 24, 2015

This patch makes the necessary changes at the x86 architecture
specific layer to enable the use of queued spinlocks for x86-64. As
x86-32 machines are typically not multi-socket. The benefit of queue
spinlock may not be apparent. So queued spinlocks are not enabled.

Currently, there is some incompatibilities between the para-virtualized
spinlock code (which hard-codes the use of ticket spinlock) and the
queued spinlocks. Therefore, the use of queued spinlocks is disabled
when the para-virtualized spinlock is enabled.

The arch/x86/include/asm/qspinlock.h header file includes some x86
specific optimization which will make the queueds spinlock code
perform better than the generic implementation.
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-3-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d73a3397

locking/qspinlock: Introduce a simple generic 4-byte queued spinlock · a33fda35

由 Waiman Long 提交于 4月 24, 2015

This patch introduces a new generic queued spinlock implementation that
can serve as an alternative to the default ticket spinlock. Compared
with the ticket spinlock, this queued spinlock should be almost as fair
as the ticket spinlock. It has about the same speed in single-thread
and it can be much faster in high contention situations especially when
the spinlock is embedded within the data structure to be protected.

Only in light to moderate contention where the average queue depth
is around 1-3 will this queued spinlock be potentially a bit slower
due to the higher slowpath overhead.

This queued spinlock is especially suit to NUMA machines with a large
number of cores as the chance of spinlock contention is much higher
in those machines. The cost of contention is also higher because of
slower inter-node memory traffic.

Due to the fact that spinlocks are acquired with preemption disabled,
the process will not be migrated to another CPU while it is trying
to get a spinlock. Ignoring interrupt handling, a CPU can only be
contending in one spinlock at any one time. Counting soft IRQ, hard
IRQ and NMI, a CPU can only have a maximum of 4 concurrent lock waiting
activities.  By allocating a set of per-cpu queue nodes and used them
to form a waiting queue, we can encode the queue node address into a
much smaller 24-bit size (including CPU number and queue node index)
leaving one byte for the lock.

Please note that the queue node is only needed when waiting for the
lock. Once the lock is acquired, the queue node can be released to
be used later.
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paolo Bonzini <paolo.bonzini@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/1429901803-29771-2-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

a33fda35

kernel: Replace reference to ASSIGN_ONCE() with WRITE_ONCE() in comment · 663fdcbe

由 Preeti U Murthy 提交于 4月 30, 2015

Looks like commit :

 43239cbe ("kernel: Change ASSIGN_ONCE(val, x) to WRITE_ONCE(x, val)")

left behind a reference to ASSIGN_ONCE(). Update this to WRITE_ONCE().
Signed-off-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: borntraeger@de.ibm.com
Cc: dave@stgolabs.net
Cc: paulmck@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/20150430115721.22278.94082.stgit@preeti.in.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

663fdcbe

locking/rwsem: Reduce spinlock contention in wakeup after up_read()/up_write() · 59aabfc7

由 Waiman Long 提交于 4月 30, 2015

In up_write()/up_read(), rwsem_wake() will be called whenever it
detects that some writers/readers are waiting. The rwsem_wake()
function will take the wait_lock and call __rwsem_do_wake() to do the
real wakeup.  For a heavily contended rwsem, doing a spin_lock() on
wait_lock will cause further contention on the heavily contended rwsem
cacheline resulting in delay in the completion of the up_read/up_write
operations.

This patch makes the wait_lock taking and the call to __rwsem_do_wake()
optional if at least one spinning writer is present. The spinning
writer will be able to take the rwsem and call rwsem_wake() later
when it calls up_write(). With the presence of a spinning writer,
rwsem_wake() will now try to acquire the lock using trylock. If that
fails, it will just quit.
Suggested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: NDavidlohr Bueso <dave@stgolabs.net>
Acked-by: NJason Low <jason.low2@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Douglas Hatch <doug.hatch@hp.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Scott J Norton <scott.norton@hp.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1430428337-16802-2-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

59aabfc7

Merge tag 'pm+acpi-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 3e0283a5

由 Linus Torvalds 提交于 5月 07, 2015

Pull power management and ACPI fixes from Rafael Wysocki:
 "These include three regression fixes (PCI resources management,
  ACPI/PNP device enumeration, ACPI SBS on MacBook) and two ACPI
  documentation fixes related to GPIO.

  Specifics:

   - Fix for a PCI resources management regression introduced during the
     4.0 cycle and related to the handling of ACPI resources'
     Producer/Consumer flags that turn out to be useless (Jiang Liu)

   - Fix for a MacBook regression related to the Smart Battery Subsystem
     (SBS) driver causing various problems (stalls on boot, failure to
     detect or report battery) to happen and introduced during the 3.18
     cycle (Chris Bainbridge)

   - Fix for an ACPI/PNP device enumeration regression introduced during
     the 3.16 cycle caused by failing to include two PNP device IDs into
     the list of IDs that PNP device objects need to be created for
     (Witold Szczeponik)

   - Fixes for two minor mistakes in the ACPI GPIO properties
     documentation (Antonio Ospite, Rafael J Wysocki)"

* tag 'pm+acpi-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / PNP: add two IDs to list for PNPACPI device enumeration
  ACPI / documentation: Fix ambiguity in the GPIO properties document
  ACPI / documentation: fix a sentence about GPIO resources
  ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook
  x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus

3e0283a5

Merge branches 'acpi-resources', 'acpi-battery', 'acpi-doc' and 'acpi-pnp' · 9a5d9315

由 Rafael J. Wysocki 提交于 5月 07, 2015

* acpi-resources:
  x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus

* acpi-battery:
  ACPI / SBS: Add 5 us delay to fix SBS hangs on MacBook

* acpi-doc:
  ACPI / documentation: Fix ambiguity in the GPIO properties document
  ACPI / documentation: fix a sentence about GPIO resources

* acpi-pnp:
  ACPI / PNP: add two IDs to list for PNPACPI device enumeration

9a5d9315

Merge tag 'for-f2fs-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs · 68c2f356

由 Linus Torvalds 提交于 5月 07, 2015

Pull f2fs fixes from Jaegeuk Kim:
 "Fix a performance regression and a bug"

* tag 'for-f2fs-4.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: fix wrong error hanlder in f2fs_follow_link
  Revert "f2fs: enhance multi-threads performance"

68c2f356

07 5月, 2015 7 次提交

Merge tag 'pinctrl-v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl · fbb7b92f

由 Linus Torvalds 提交于 5月 07, 2015

Pull pin control fixes from Linus Walleij:
 "Here is a smallish set of pin control fixes for the v4.1 cycle,
  collected the last two weeks:

   - fix a real nasty legacy bug that has screwed up the protection of
     adding pinctrl maps dynamically.  Normally this didn't happen so
     much but Dough Anderson ran into it and fixed it, kudos!

  - minor driver fixes for Qualcomm spmi, mediatek and Marvell drivers"

* tag 'pinctrl-v4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl: Don't just pretend to protect pinctrl_maps, do it for real
  pinctrl: mediatek: mtk-common: initialize unmask
  pinctrl: qcom-spmi-mpp: Fix input value report
  pinctrl: qcom-spmi: Fix pin direction configuration
  pinctrl: mvebu: Fix mapping of pin 63 (gpo -> gpio)

fbb7b92f

Merge tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfio · 7bbcd1b8

由 Linus Torvalds 提交于 5月 07, 2015

Pull vfio fixes from Alex Williamson:
 "Fix some undesirable behavior with the vfio device request interface:

   - increase verbosity of device request channel (Alex Williamson)

   - fix runaway interruptible timeout (Alex Williamson)"

* tag 'vfio-v4.1-rc3' of git://github.com/awilliam/linux-vfio:
  vfio: Fix runaway interruptible timeout
  vfio-pci: Log device requests more verbosely

7bbcd1b8

Merge tag 'for-linus' of git://github.com/dledford/linux · 8cb7c15b

由 Linus Torvalds 提交于 5月 07, 2015

Pull infiniband updates from Doug Ledford:
 "Minor updates for 4.1-rc

  Most of the changes are fairly small and well confined.  The iWARP
  address reporting changes are the only ones that are a medium size.  I
  had these queued up prior to rc1, but due to the shuffle in
  maintainers, they did not get submitted when I expected.  My apologies
  for that.  I feel comfortable with them however due to the testing
  they've received, so I left them in this submission"

* tag 'for-linus' of git://github.com/dledford/linux:
  MAINTAINERS: Update InfiniBand subsystem maintainer
  MAINTAINERS: add include/rdma/ to InfiniBand subsystem
  IPoIB/CM: Fix indentation level
  iw_cxgb4: Remove negative advice dmesg warnings
  IB/core: Fix unaligned accesses
  IB/core: change rdma_gid2ip into void function as it always return zero
  IB/qib: use arch_phys_wc_add()
  IB/qib: add acounting for MTRR
  IB/core: dma unmap optimizations
  IB/core: dma map/unmap locking optimizations
  RDMA/cxgb4: Report the actual address of the remote connecting peer
  RDMA/nes: Report the actual address of the remote connecting peer
  RDMA/core: Enable the iWarp Port Mapper to provide the actual address of the connecting peer to its clients
  iw_cxgb4: enforce qp/cq id requirements
  iw_cxgb4: use BAR2 GTS register for T5 kernel mode CQs
  iw_cxgb4: 32b platform fixes
  iw_cxgb4: Cleanup register defines/MACROS
  RDMA/CMA: Canonize IPv4 on IPV6 sockets properly

8cb7c15b

Merge tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 0e1dc427

由 Linus Torvalds 提交于 5月 06, 2015

Pull xen bug fixes from David Vrabel:

 - fix blkback regression if using persistent grants

 - fix various event channel related suspend/resume bugs

 - fix AMD x86 regression with X86_BUG_SYSRET_SS_ATTRS

 - SWIOTLB on ARM now uses frames <4 GiB (if available) so device only
   capable of 32-bit DMA work.

* tag 'for-linus-4.1b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM
  hypervisor/x86/xen: Unset X86_BUG_SYSRET_SS_ATTRS on Xen PV guests
  xen/events: Set irq_info->evtchn before binding the channel to CPU in __startup_pirq()
  xen/console: Update console event channel on resume
  xen/xenbus: Update xenbus event channel on resume
  xen/events: Clear cpu_evtchn_mask before resuming
  xen-pciback: Add name prefix to global 'permissive' variable
  xen: Suspend ticks on all CPUs during suspend
  xen/grant: introduce func gnttab_unmap_refs_sync()
  xen/blkback: safely unmap purge persistent grants

0e1dc427

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3d54ac9e

由 Linus Torvalds 提交于 5月 06, 2015

Pull x86 fixes from Ingo Molnar:
 "EFI fixes, and FPU fix, a ticket spinlock boundary condition fix and
  two build fixes"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/fpu: Always restore_xinit_state() when use_eager_cpu()
  x86: Make cpu_tss available to external modules
  efi: Fix error handling in add_sysfs_runtime_map_entry()
  x86/spinlocks: Fix regression in spinlock contention detection
  x86/mm: Clean up types in xlate_dev_mem_ptr()
  x86/efi: Store upper bits of command line buffer address in ext_cmd_line_ptr
  efivarfs: Ensure VariableName is NUL-terminated

3d54ac9e

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · d8fce2db

由 Linus Torvalds 提交于 5月 06, 2015

Pull perf fixes from Ingo Molnar:
 "Mostly tooling fixes, but also an uncore PMU driver fix and an uncore
  PMU driver hardware-enablement addition"

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf probe: Fix segfault if passed with ''.
  perf report: Fix -T/--threads option to work again
  perf bench numa: Fix immediate meeting of convergence condition
  perf bench numa: Fixes of --quiet argument
  perf bench futex: Fix hung wakeup tasks after requeueing
  perf probe: Fix bug with global variables handling
  perf top: Fix a segfault when kernel map is restricted.
  tools lib traceevent: Fix build failure on 32-bit arch
  perf kmem: Fix compiles on RHEL6/OL6
  tools lib api: Undefine _FORTIFY_SOURCE before setting it
  perf kmem: Consistently use PRIu64 for printing u64 values
  perf trace: Disable events and drain events when forked workload ends
  perf trace: Enable events when doing system wide tracing and starting a workload
  perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
  perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
  perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu

d8fce2db

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 02f0f572

由 Linus Torvalds 提交于 5月 06, 2015

Pull RCU fix from Ingo Molnar:
 "An RCU Kconfig fix that eliminates an annoying interactive kconfig
  question for CONFIG_RCU_TORTURE_TEST_SLOW_INIT"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  rcu: Control grace-period delays directly from value

02f0f572

06 5月, 2015 21 次提交

pinctrl: Don't just pretend to protect pinctrl_maps, do it for real · c5272a28

由 Doug Anderson 提交于 5月 01, 2015

Way back, when the world was a simpler place and there was no war, no
evil, and no kernel bugs, there was just a single pinctrl lock.  That
was how the world was when (57291ce2 pinctrl: core device tree mapping
table parsing support) was written.  In that case, there were
instances where the pinctrl mutex was already held when
pinctrl_register_map() was called, hence a "locked" parameter was
passed to the function to indicate that the mutex was already locked
(so we shouldn't lock it again).

A few years ago in (42fed7ba pinctrl: move subsystem mutex to
pinctrl_dev struct), we switched to a separate pinctrl_maps_mutex.
...but (oops) we forgot to re-think about the whole "locked" parameter
for pinctrl_register_map().  Basically the "locked" parameter appears
to still refer to whether the bigger pinctrl_dev mutex is locked, but
we're using it to skip locks of our (now separate) pinctrl_maps_mutex.

That's kind of a bad thing(TM).  Probably nobody noticed because most
of the calls to pinctrl_register_map happen at boot time and we've got
synchronous device probing.  ...and even cases where we're
asynchronous don't end up actually hitting the race too often.  ...but
after banging my head against the wall for a bug that reproduced 1 out
of 1000 reboots and lots of looking through kgdb, I finally noticed
this.

Anyway, we can now safely remove the "locked" parameter and go back to
a war-free, evil-free, and kernel-bug-free world.

Fixes: 42fed7ba ("pinctrl: move subsystem mutex to pinctrl_dev struct")
Signed-off-by: NDoug Anderson <dianders@chromium.org>
Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>

c5272a28

xen: Add __GFP_DMA flag when xen_swiotlb_init gets free pages on ARM · 8746515d

由 Stefano Stabellini 提交于 4月 24, 2015

Make sure that xen_swiotlb_init allocates buffers that are DMA capable
when at least one memblock is available below 4G. Otherwise we assume
that all devices on the SoC can cope with >4G addresses. We do this on
ARM and ARM64, where dom0 is mapped 1:1, so pfn == mfn in this case.

No functional changes on x86.

From: Chen Baozi <baozich@gmail.com>
Signed-off-by: NChen Baozi <baozich@gmail.com>
Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
Tested-by: NChen Baozi <baozich@gmail.com>
Acked-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>

8746515d

x86/fpu: Always restore_xinit_state() when use_eager_cpu() · c88d4748

由 Bobby Powers 提交于 4月 27, 2015

The following commit:

  f893959b ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")

removed drop_init_fpu() usage from flush_thread(). This seems to break
things for me - the Go 1.4 test suite fails all over the place with
floating point comparision errors (offending commit found through
bisection).

The functional change was that flush_thread() after this commit
only calls restore_init_xstate() when both use_eager_fpu() and
!used_math() are true. drop_init_fpu() (now fpu_reset_state()) calls
restore_init_xstate() regardless of whether current used_math() - apply
the same logic here.

Switch used_math() -> tsk_used_math(tsk) to consistently use the grabbed
tsk instead of current, like in the rest of flush_thread().
Tested-by: NDave Hansen <dave.hansen@intel.com>
Signed-off-by: NBobby Powers <bobbypowers@gmail.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NOleg Nesterov <oleg@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pekka Riikonen <priikone@iki.fi>
Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Suresh Siddha <sbsiddha@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: f893959b ("x86/fpu: Don't abuse drop_init_fpu() in flush_thread()")
Link: http://lkml.kernel.org/r/1430147441-9820-1-git-send-email-bobbypowers@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

c88d4748

Merge tag 'efi-urgent' of... · c102cb09

由 Ingo Molnar 提交于 5月 06, 2015

Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent

Pull EFI fixes from Matt Fleming:

 * Avoid garbage names in efivarfs due to buggy firmware by zeroing
   EFI variable name. (Ross Lagerwall)

 * Stop erroneously dropping upper 32 bits of boot command line pointer
   in EFI boot stub and stash them in ext_cmd_line_ptr. (Roy Franz)

 * Fix double-free bug in error handling code path of EFI runtime map
   code. (Dan Carpenter)
Signed-off-by: NIngo Molnar <mingo@kernel.org>

c102cb09

Merge tag 'perf-urgent-for-mingo' of... · 74f40c1f

由 Ingo Molnar 提交于 5月 06, 2015

Merge tag 'perf-urgent-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/urgent fixes from Arnaldo Carvalho de Melo:

  - Fix 'perf probe -a' segfault if passed with '' (Wang Nan)

  - Fix report -T/--threads option (Namhyung Kim)
Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

74f40c1f

Merge tag 'for-linus-4.1-1' of git://git.code.sf.net/p/openipmi/linux-ipmi · 5198b443

由 Linus Torvalds 提交于 5月 05, 2015

Pull IPMI fixes from Corey Minyard:
 "Lots of minor IPMI fixes, especially ones that have have come up since
  the SSIF driver has been in the main kernel for a while"

* tag 'for-linus-4.1-1' of git://git.code.sf.net/p/openipmi/linux-ipmi:
  ipmi: Fix multi-part message handling
  ipmi: Add alert handling to SSIF
  ipmi: Fix a problem that messages are not issued in run_to_completion mode
  ipmi: Report an error if ACPI _IFT doesn't exist
  ipmi: Remove unused including <linux/version.h>
  ipmi: Don't report err in the SI driver for SSIF devices
  ipmi: Remove incorrect use of seq_has_overflowed
  ipmi:ssif: Ignore spaces when comparing I2C adapter names
  ipmi_ssif: Fix the logic on user-supplied addresses

5198b443

Merge branch 'akpm' (patches from Andrew) · 2a171aa2

由 Linus Torvalds 提交于 5月 05, 2015

Merge misc fixes from Andrew Morton:
 "16 patches

  This includes a new rtc driver for the Abracon AB x80x and isn't very
  appropriate for -rc2.  It was still being fiddled with a bit during
  the merge window and I fell asleep during -rc1"

[ So I took the new driver, it seems small and won't regress anything.
  I'm a softy.   - Linus ]

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  rtc: armada38x: fix concurrency access in armada38x_rtc_set_time
  ocfs2: dlm: fix race between purge and get lock resource
  nilfs2: fix sanity check of btree level in nilfs_btree_root_broken()
  util_macros.h: have array pointer point to array of constants
  configfs: init configfs module earlier at boot time
  mm/hwpoison-inject: check PageLRU of hpage
  mm/hwpoison-inject: fix refcounting in no-injection case
  mm: soft-offline: fix num_poisoned_pages counting on concurrent events
  rtc: add rtc-abx80x, a driver for the Abracon AB x80x i2c rtc
  Documentation: bindings: add abracon,abx80x
  kasan: show gcc version requirements in Kconfig and Documentation
  mm/memory-failure: call shake_page() when error hits thp tail page
  lib: delete lib/find_last_bit.c
  MAINTAINERS: add co-maintainer for LED subsystem
  zram: add Designated Reviewer for zram in MAINTAINERS
  revert "zram: move compact_store() to sysfs functions area"

2a171aa2

Merge tag 'platform-drivers-x86-v4.1-2' of... · 3ce05a4e

由 Linus Torvalds 提交于 5月 05, 2015

Merge tag 'platform-drivers-x86-v4.1-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86

Pull x86 platform driver fixes from Darren Hart:
 "This includes a trivial warning and adding a Lenovo laptop to an
  existing quirk.

  I've held off on things like the latter in the past, but I didn't feel
  it was risky enough to push out to 4.2.

   - thinkpad_acpi:
        Fix warning for static not at beginning

   - ideapad_laptop:
        Add Lenovo G40-30 to devices without radio switch"

* tag 'platform-drivers-x86-v4.1-2' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86:
  thinkpad_acpi: Fix warning for static not at beginning
  ideapad_laptop: Add Lenovo G40-30 to devices without radio switch

3ce05a4e

ipmi: Fix multi-part message handling · 3d69d43b

由 Corey Minyard 提交于 4月 29, 2015

Lots of little fixes for multi-part messages:

The values was not being re-initialized, if something went wrong
handling a multi-part message and it got left in a bad state, it
might be an issue.

The commands were not correct when issuing multi-part reads, the
code was not passing in the proper value for commands.  Also clean
up some minor formatting issues.

Get the block number from the right location, limit the maximum send
message size to 63 bytes and explain why, and fix some minor sylistic
issues.
Signed-off-by: NCorey Minyard <cminyard@mvista.com>

3d69d43b

ipmi: Add alert handling to SSIF · 91620521

由 Corey Minyard 提交于 4月 24, 2015

The SSIF interface can optionally have an SMBus alert come in when
data is ready.  Unfortunately, the IPMI spec gives wiggle room to
the implementer to allow them to always have the alert enabled,
even if the driver doesn't enable it.  So implement alerts.
If you don't in this situation, the SMBus alert handling will
constantly complain.
Signed-off-by: NCorey Minyard <cminyard@mvista.com>

91620521

ipmi: Fix a problem that messages are not issued in run_to_completion mode · 9f812704

由 Hidehiro Kawai 提交于 4月 23, 2015

start_next_msg() issues a message placed in smi_info->waiting_msg
if it is non-NULL.  However, sender() sets a message to
smi_info->curr_msg and NULL to smi_info->waiting_msg in the context
of run_to_completion mode.  As the result, it leads an infinite
loop by waiting the completion of unissued message when leaving
dying message after kernel panic.

sender() should set the message to smi_info->waiting_msg not
curr_msg.
Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
Signed-off-by: NCorey Minyard <cminyard@mvista.com>

9f812704

ipmi: Report an error if ACPI _IFT doesn't exist · a182a4b2

由 Corey Minyard 提交于 4月 22, 2015

When probing an ACPI table, report a specific error, instead of just
returning an error, if _IFT doesn't exist.
Signed-off-by: NCorey Minyard <cminyard@mvista.com>

a182a4b2

ipmi: Remove unused including <linux/version.h> · 15c5725e

由 Wei Yongjun 提交于 4月 16, 2015

Remove including <linux/version.h> that don't need it.
Signed-off-by: NWei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: NCorey Minyard <cminyard@mvista.com>

15c5725e

rtc: armada38x: fix concurrency access in armada38x_rtc_set_time · 489405fe

由 Gregory CLEMENT 提交于 5月 05, 2015

While setting the time, the RTC TIME register should not be accessed.
However due to hardware constraints, setting the RTC time involves
sleeping during 100ms.  This sleep was done outside the critical section
protected by the spinlock, so it was possible to read the RTC TIME
register and get an incorrect value.  This patch introduces a mutex for
protecting the RTC TIME access, unlike the spinlock it is allowed to
sleep in a critical section protected by a mutex.

The RTC STATUS register can still be used from the interrupt handler but
it has no effect on setting the time.
Signed-off-by: NGregory CLEMENT <gregory.clement@free-electrons.com>
Acked-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com>
Acked-by: NAndrew Lunn <andrew@lunn.ch>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: <stable@vger.kernel.org>	[4.0]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

489405fe

ocfs2: dlm: fix race between purge and get lock resource · b1432a2a

由 Junxiao Bi 提交于 5月 05, 2015

There is a race window in dlm_get_lock_resource(), which may return a
lock resource which has been purged.  This will cause the process to
hang forever in dlmlock() as the ast msg can't be handled due to its
lock resource not existing.

    dlm_get_lock_resource {
        ...
        spin_lock(&dlm->spinlock);
        tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
        if (tmpres) {
             spin_unlock(&dlm->spinlock);
             >>>>>>>> race window, dlm_run_purge_list() may run and purge
                              the lock resource
             spin_lock(&tmpres->spinlock);
             ...
             spin_unlock(&tmpres->spinlock);
        }
    }
Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b1432a2a

nilfs2: fix sanity check of btree level in nilfs_btree_root_broken() · d8fd150f

由 Ryusuke Konishi 提交于 5月 05, 2015

The range check for b-tree level parameter in nilfs_btree_root_broken()
is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even
though the level is limited to values in the range of 0 to
(NILFS_BTREE_LEVEL_MAX - 1).

Since the level parameter is read from storage device and used to index
nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it
can cause memory overrun during btree operations if the boundary value
is set to the level parameter on device.

This fixes the broken sanity check and adds a comment to clarify that
the upper bound NILFS_BTREE_LEVEL_MAX is exclusive.
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d8fd150f

util_macros.h: have array pointer point to array of constants · 05836c37

由 Guenter Roeck 提交于 5月 05, 2015

Using the new find_closest() macro can result in the following sparse
warnings.

  drivers/hwmon/lm85.c:194:16: warning:
  		incorrect type in initializer (different modifiers)
  drivers/hwmon/lm85.c:194:16:    expected int *__fc_a
  drivers/hwmon/lm85.c:194:16:    got int static const [toplevel] *<noident>
  drivers/hwmon/lm85.c:210:16: warning:
  		incorrect type in initializer (different modifiers)
  drivers/hwmon/lm85.c:210:16:    expected int *__fc_a
  drivers/hwmon/lm85.c:210:16:    got int const *map

This is because the array passed to find_closest() will typically be
declared as array of constants, but the macro declares a non-constant
pointer to it.
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

05836c37

configfs: init configfs module earlier at boot time · f5b69770

由 Daniel Baluta 提交于 5月 05, 2015

We need this earlier in the boot process to allow various subsystems to
use configfs (e.g Industrial IIO).

Also, debugfs is at core_initcall level and configfs should be on the same
level from infrastructure point of view.
Signed-off-by: NDaniel Baluta <daniel.baluta@intel.com>
Suggested-by: NLars-Peter Clausen <lars@metafoo.de>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f5b69770

mm/hwpoison-inject: check PageLRU of hpage · e386eed8

由 Naoya Horiguchi 提交于 5月 05, 2015

Hwpoison injector checks PageLRU of the raw target page to find out
whether the page is an appropriate target, but current code now filters
out thp tail pages, which prevents us from testing for such cases via this
interface.  So let's check hpage instead of p.
Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: NDean Nelson <dnelson@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e386eed8

mm/hwpoison-inject: fix refcounting in no-injection case · 7ea434a4

由 Naoya Horiguchi 提交于 5月 05, 2015

Hwpoison injection via debugfs:hwpoison/corrupt-pfn takes a refcount of
the target page.  But current code doesn't release it if the target page
is not supposed to be injected, which results in memory leak.  This patch
simply adds the refcount releasing code.
Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: NDean Nelson <dnelson@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7ea434a4

mm: soft-offline: fix num_poisoned_pages counting on concurrent events · 602498f9

由 Naoya Horiguchi 提交于 5月 05, 2015

If multiple soft offline events hit one free page/hugepage concurrently,
soft_offline_page() can handle the free page/hugepage multiple times,
which makes num_poisoned_pages counter increased more than once.  This
patch fixes this wrong counting by checking TestSetPageHWPoison for normal
papes and by checking the return value of dequeue_hwpoisoned_huge_page()
for hugepages.
Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: NDean Nelson <dnelson@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: <stable@vger.kernel.org>	[3.14+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

602498f9