提交 · c79574abe2baddf569532e7e430e4977771dd25c · openeuler / raspberrypi-kernel

16 4月, 2015 2 次提交

mm: allow compaction of unevictable pages · 5bbe3547

由 Eric B Munson 提交于 4月 15, 2015

Currently, pages which are marked as unevictable are protected from
compaction, but not from other types of migration.  The POSIX real time
extension explicitly states that mlock() will prevent a major page
fault, but the spirit of this is that mlock() should give a process the
ability to control sources of latency, including minor page faults.
However, the mlock manpage only explicitly says that a locked page will
not be written to swap and this can cause some confusion.  The
compaction code today does not give a developer who wants to avoid swap
but wants to have large contiguous areas available any method to achieve
this state.  This patch introduces a sysctl for controlling compaction
behavior with respect to the unevictable lru.  Users who demand no page
faults after a page is present can set compact_unevictable_allowed to 0
and users who need the large contiguous areas can enable compaction on
locked memory by leaving the default value of 1.

To illustrate this problem I wrote a quick test program that mmaps a
large number of 1MB files filled with random data.  These maps are
created locked and read only.  Then every other mmap is unmapped and I
attempt to allocate huge pages to the static huge page pool.  When the
compact_unevictable_allowed sysctl is 0, I cannot allocate hugepages
after fragmenting memory.  When the value is set to 1, allocations
succeed.
Signed-off-by: NEric B Munson <emunson@akamai.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5bbe3547

memcg: zap mem_cgroup_lookup() · adbe427b

由 Vladimir Davydov 提交于 4月 15, 2015

mem_cgroup_lookup() is a wrapper around mem_cgroup_from_id(), which
checks that id != 0 before issuing the function call.  Today, there is
no point in this additional check apart from optimization, because there
is no css with id <= 0, so that css_from_id, called by
mem_cgroup_from_id, will return NULL for any id <= 0.

Since mem_cgroup_from_id is only called from mem_cgroup_lookup, let us
zap mem_cgroup_lookup, substituting calls to it with mem_cgroup_from_id
and moving the check if id > 0 to css_from_id.
Signed-off-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

adbe427b

15 4月, 2015 10 次提交

kernel, cpuset: remove exception for __GFP_THISNODE · 6e276d2a

由 David Rientjes 提交于 4月 14, 2015

Nothing calls __cpuset_node_allowed() with __GFP_THISNODE set anymore, so
remove the obscure comment about it and its special-case exception.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NVlastimil Babka <vbabka@suse.cz>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Pravin Shelar <pshelar@nicira.com>
Cc: Jarno Rajahalme <jrajahalme@nicira.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6e276d2a

watchdog: introduce the hardlockup_detector_disable() function · 692297d8

由 Ulrich Obergfell 提交于 4月 14, 2015

Have kvm_guest_init() use hardlockup_detector_disable() instead of
watchdog_enable_hardlockup_detector(false).

Remove the watchdog_hardlockup_detector_is_enabled() and the
watchdog_enable_hardlockup_detector() function which are no longer needed.
Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

692297d8

watchdog: clean up some function names and arguments · b2f57c3a

由 Ulrich Obergfell 提交于 4月 14, 2015

Rename the update_timers*() functions to update_watchdog*().

Remove the boolean argument from watchdog_enable_all_cpus() because
update_watchdog_all_cpus() is now a generic function to change the run
state of the lockup detectors and to have the lockup detectors use a new
sample period.
Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b2f57c3a

watchdog: enable the new user interface of the watchdog mechanism · 195daf66

由 Ulrich Obergfell 提交于 4月 14, 2015

With the current user interface of the watchdog mechanism it is only
possible to disable or enable both lockup detectors at the same time.
This series introduces new kernel parameters and changes the semantics of
some existing kernel parameters, so that the hard lockup detector and the
soft lockup detector can be disabled or enabled individually.  With this
series applied, the user interface is as follows.

- parameters in /proc/sys/kernel

  . soft_watchdog
    This is a new parameter to control and examine the run state of
    the soft lockup detector.

  . nmi_watchdog
    The semantics of this parameter have changed. It can now be used
    to control and examine the run state of the hard lockup detector.

  . watchdog
    This parameter is still available to control the run state of both
    lockup detectors at the same time. If this parameter is examined,
    it shows the logical OR of soft_watchdog and nmi_watchdog.

  . watchdog_thresh
    The semantics of this parameter are not affected by the patch.

- kernel command line parameters

  . nosoftlockup
    The semantics of this parameter have changed. It can now be used
    to disable the soft lockup detector at boot time.

  . nmi_watchdog=0 or nmi_watchdog=1
    Disable or enable the hard lockup detector at boot time. The patch
    introduces '=1' as a new option.

  . nowatchdog
    The semantics of this parameter are not affected by the patch. It
    is still available to disable both lockup detectors at boot time.

Also, remove the proc_dowatchdog() function which is no longer needed.

[dzickus@redhat.com: wrote changelog]
[dzickus@redhat.com: update documentation for kernel params and sysctl]
Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

195daf66

watchdog: implement error handling for failure to set up hardware perf events · bcfba4f4

由 Ulrich Obergfell 提交于 4月 14, 2015

If watchdog_nmi_enable() fails to set up the hardware perf event of one
CPU, the entire hard lockup detector is deemed unreliable.  Hence, disable
the hard lockup detector and shut down the hardware perf events on all
CPUs.

[dzickus@redhat.com: update comments to explain some code]
Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bcfba4f4

watchdog: introduce separate handlers for parameters in /proc/sys/kernel · 83a80a39

由 Ulrich Obergfell 提交于 4月 14, 2015

Separate handlers for each watchdog parameter in /proc/sys/kernel replace
the proc_dowatchdog() function.  Three of those handlers merely call
proc_watchdog_common() with one different argument.
Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

83a80a39

watchdog: introduce proc_watchdog_common() · ef246a21

由 Ulrich Obergfell 提交于 4月 14, 2015

Three of four handlers for the watchdog parameters in /proc/sys/kernel
essentially have to do the same thing.

  if the parameter is being read {
    return the state of the corresponding bit(s) in 'watchdog_enabled'
  } else {
    set/clear the state of the corresponding bit(s) in 'watchdog_enabled'
    update the run state of the lockup detector(s)
  }

Hence, introduce a common function that can be called by those handlers.
The callers pass a 'bit mask' to this function to indicate which bit(s)
should be set/cleared in 'watchdog_enabled'.

This function handles an uncommon race with watchdog_nmi_enable() where a
concurrent update of 'watchdog_enabled' is possible.  We use 'cmpxchg' to
detect the concurrency.  [This avoids introducing a new spinlock or a
mutex to synchronize updates of 'watchdog_enabled'.  Using the same lock
or mutex in watchdog thread context and in system call context needs to be
considered carefully because it can make the code prone to deadlock
situations in connection with parking/unparking the watchdog threads.]
Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef246a21

watchdog: move definition of 'watchdog_proc_mutex' outside of proc_dowatchdog() · f54c2274

由 Ulrich Obergfell 提交于 4月 14, 2015

This series removes proc_dowatchdog().  Since multiple new functions need
the 'watchdog_proc_mutex' to serialize access to the watchdog parameters
in /proc/sys/kernel, move the mutex outside of any function.
Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f54c2274

watchdog: introduce the proc_watchdog_update() function · a0c9cbb9

由 Ulrich Obergfell 提交于 4月 14, 2015

This series introduces a separate handler for each watchdog parameter in
/proc/sys/kernel.  The separate handlers need a common function that they
can call to update the run state of the lockup detectors, or to have the
lockup detectors use a new sample period.
Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a0c9cbb9

watchdog: new definitions and variables, initialization · 84d56e66

由 Ulrich Obergfell 提交于 4月 14, 2015

The hardlockup and softockup had always been tied together.  Due to the
request of KVM folks, they had a need to have one enabled but not the
other.  Internally rework the code to split things apart more cleanly.

There is a bunch of churn here, but the end result should be code that
should be easier to maintain and fix without knowing the internals of what
is going on.

This patch (of 9):

Introduce new definitions and variables to separate the user interface in
/proc/sys/kernel from the internal run state of the lockup detectors.  The
internal run state is represented by two bits in a new variable that is
named 'watchdog_enabled'.  This helps simplify the code, for example:

- In order to check if any of the two lockup detectors is enabled,
  it is sufficient to check if 'watchdog_enabled' is not zero.

- In order to enable/disable one or both lockup detectors,
  it is sufficient to set/clear one or both bits in 'watchdog_enabled'.

- Concurrent updates of 'watchdog_enabled' need not be synchronized via
  a spinlock or a mutex. Updates can either be atomic or concurrency can
  be detected by using 'cmpxchg'.
Signed-off-by: NUlrich Obergfell <uobergfe@redhat.com>
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

84d56e66

13 4月, 2015 1 次提交

cpu: Defer smpboot kthread unparking until CPU known to scheduler · 00df35f9

由 Paul E. McKenney 提交于 4月 12, 2015

Currently, smpboot_unpark_threads() is invoked before the incoming CPU
has been added to the scheduler's runqueue structures.  This might
potentially cause the unparked kthread to run on the wrong CPU, since the
correct CPU isn't fully set up yet.

That causes a sporadic, hard to debug boot crash triggering on some
systems, reported by Borislav Petkov, and bisected down to:

  2a442c9c ("x86: Use common outgoing-CPU-notification code")

This patch places smpboot_unpark_threads() in a CPU hotplug
notifier with priority set so that these kthreads are unparked just after
the CPU has been added to the runqueues.
Reported-and-tested-by: NBorislav Petkov <bp@suse.de>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NIngo Molnar <mingo@kernel.org>

00df35f9

09 4月, 2015 4 次提交

locking/mutex: Further simplify mutex_spin_on_owner() · 01ac33c1

由 Jason Low 提交于 4月 08, 2015

Similar to what Linus suggested for rwsem_spin_on_owner(), in
mutex_spin_on_owner() instead of having while (true) and
breaking out of the spin loop on lock->owner != owner, we can
have the loop directly check for while (lock->owner == owner) to
improve the readability of the code.

It also shrinks the code a bit:

   text    data     bss     dec     hex filename
   3721       0       0    3721     e89 mutex.o.before
   3705       0       0    3705     e79 mutex.o.after
Signed-off-by: NJason Low <jason.low2@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Link: http://lkml.kernel.org/r/1428521960-5268-2-git-send-email-jason.low2@hp.com
[ Added code generation info. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

01ac33c1

Copy the kernel module data from user space in chunks · 3afe9f84

由 Linus Torvalds 提交于 4月 07, 2015

Unlike most (all?) other copies from user space, kernel module loading
is almost unlimited in size.  So we do a potentially huge
"copy_from_user()" when we copy the module data from user space to the
kernel buffer, which can be a latency concern when preemption is
disabled (or voluntary).

Also, because 'copy_from_user()' clears the tail of the kernel buffer on
failures, even a *failed* copy can end up wasting a lot of time.

Normally neither of these are concerns in real life, but they do trigger
when doing stress-testing with trinity.  Running in a VM seems to add
its own overheadm causing trinity module load testing to even trigger
the watchdog.

The simple fix is to just chunk up the module loading, so that it never
tries to copy insanely big areas in one go.  That bounds the latency,
and also the amount of (unnecessarily, in this case) cleared memory for
the failure case.
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3afe9f84

genirq: Allow the irqchip state of an IRQ to be save/restored · 1b7047ed

由 Marc Zyngier 提交于 3月 18, 2015

There is a number of cases where a kernel subsystem may want to
introspect the state of an interrupt at the irqchip level:

- When a peripheral is shared between virtual machines,
  its interrupt state becomes part of the guest's state,
  and must be switched accordingly. KVM on arm/arm64 requires
  this for its guest-visible timer
- Some GPIO controllers seem to require peeking into the
  interrupt controller they are connected to to report
  their internal state

This seem to be a pattern that is common enough for the core code
to try and support this without too many horrible hacks. Introduce
a pair of accessors (irq_get_irqchip_state/irq_set_irqchip_state)
to retrieve the bits that can be of interest to another subsystem:
pending, active, and masked.

- irq_get_irqchip_state returns the state of the interrupt according
  to a parameter set to IRQCHIP_STATE_PENDING, IRQCHIP_STATE_ACTIVE,
  IRQCHIP_STATE_MASKED or IRQCHIP_STATE_LINE_LEVEL.
- irq_set_irqchip_state similarly sets the state of the interrupt.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Reviewed-by: NBjorn Andersson <bjorn.andersson@sonymobile.com>
Tested-by: NBjorn Andersson <bjorn.andersson@sonymobile.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Abhijeet Dharmapurikar <adharmap@codeaurora.org>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Phong Vo <pvo@apm.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Tin Huynh <tnhuynh@apm.com>
Cc: Y Vo <yvo@apm.com>
Cc: Toan Le <toanle@apm.com>
Cc: Bjorn Andersson <bjorn@kryo.se>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Link: http://lkml.kernel.org/r/1426676484-21812-2-git-send-email-marc.zyngier@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

1b7047ed

genirq: MSI: Fix freeing of unallocated MSI · fe0c52fc

由 Marc Zyngier 提交于 1月 26, 2015

While debugging an unrelated issue with the GICv3 ITS driver, the
following trace triggered:

WARNING: CPU: 1 PID: 1 at kernel/irq/irqdomain.c:1121 irq_domain_free_irqs+0x160/0x17c()
NULL pointer, cannot free irq
Modules linked in:
CPU: 1 PID: 1 Comm: swapper/0 Tainted: G        W      3.19.0-rc6+ #3690
Hardware name: FVP Base (DT)
Call trace:
[<ffffffc000089398>] dump_backtrace+0x0/0x13c
[<ffffffc0000894e4>] show_stack+0x10/0x1c
[<ffffffc00066d134>] dump_stack+0x74/0x94
[<ffffffc0000a92f8>] warn_slowpath_common+0x9c/0xd4
[<ffffffc0000a938c>] warn_slowpath_fmt+0x5c/0x80
[<ffffffc0000ee04c>] irq_domain_free_irqs+0x15c/0x17c
[<ffffffc0000ef918>] msi_domain_free_irqs+0x58/0x74
[<ffffffc000386f58>] free_msi_irqs+0xb4/0x1c0

    // The msi_prepare callback fails here

[<ffffffc0003872c0>] pci_enable_msix+0x25c/0x3d4
[<ffffffc00038746c>] pci_enable_msix_range+0x34/0x80
[<ffffffc0003924ac>] vp_try_to_find_vqs+0xec/0x528
[<ffffffc000392954>] vp_find_vqs+0x6c/0xa8
[<ffffffc0003ee2a8>] init_vq+0x120/0x248
[<ffffffc0003eefb0>] virtblk_probe+0xb0/0x6bc
[<ffffffc00038fc34>] virtio_dev_probe+0x17c/0x214
[<ffffffc0003d4a04>] driver_probe_device+0x7c/0x23c
[<ffffffc0003d4cb0>] __driver_attach+0x98/0xa0
[<ffffffc0003d2c60>] bus_for_each_dev+0x60/0xb4
[<ffffffc0003d455c>] driver_attach+0x1c/0x28
[<ffffffc0003d41b0>] bus_add_driver+0x150/0x208
[<ffffffc0003d54c0>] driver_register+0x64/0x130
[<ffffffc00038f9e8>] register_virtio_driver+0x24/0x68
[<ffffffc00091320c>] init+0x70/0xac
[<ffffffc0000828f0>] do_one_initcall+0x94/0x1d0
[<ffffffc0008e9b00>] kernel_init_freeable+0x144/0x1e4
[<ffffffc00066a434>] kernel_init+0xc/0xd8
---[ end trace f9ee562a77cc7bae ]---

The ITS msi_prepare callback having failed, we end-up trying to
free MSIs that have never been allocated. Oddly enough, the kernel
is pretty upset about it.

It turns out that this behaviour was expected before the MSI domain
was introduced (and dealt with in arch_teardown_msi_irqs).

The obvious fix is to detect this early enough and bail out.
Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
Reviewed-by: NJiang Liu <jiang.liu@linux.intel.com>
Link: http://lkml.kernel.org/r/1422299419-6051-1-git-send-email-marc.zyngier@arm.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>

fe0c52fc

08 4月, 2015 4 次提交

tracing: Add enum_map file to show enums that have been mapped · 9828413d

由 Steven Rostedt (Red Hat) 提交于 3月 31, 2015

Add a enum_map file in the tracing directory to see what enums have been
saved to convert in the print fmt files.

As this requires the enum mapping to be persistent in memory, it is only
created if the new config option CONFIG_TRACE_ENUM_MAP_FILE is enabled.
This is for debugging and will increase the persistent memory footprint
of the kernel.

Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.orgReviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

9828413d

tracing: Allow for modules to convert their enums to values · 3673b8e4

由 Steven Rostedt (Red Hat) 提交于 3月 25, 2015

Update the infrastructure such that modules that declare TRACE_DEFINE_ENUM()
will have those enums converted into their values in the tracepoint
print fmt strings.

Link: http://lkml.kernel.org/r/87vbhjp74q.fsf@rustcorp.com.auAcked-by: NRusty Russell <rusty@rustcorp.com.au>
Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

3673b8e4

tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values · 0c564a53

由 Steven Rostedt (Red Hat) 提交于 3月 24, 2015

Several tracepoints use the helper functions __print_symbolic() or
__print_flags() and pass in enums that do the mapping between the
binary data stored and the value to print. This works well for reading
the ASCII trace files, but when the data is read via userspace tools
such as perf and trace-cmd, the conversion of the binary value to a
human string format is lost if an enum is used, as userspace does not
have access to what the ENUM is.

For example, the tracepoint trace_tlb_flush() has:

 __print_symbolic(REC->reason,
    { TLB_FLUSH_ON_TASK_SWITCH, "flush on task switch" },
    { TLB_REMOTE_SHOOTDOWN, "remote shootdown" },
    { TLB_LOCAL_SHOOTDOWN, "local shootdown" },
    { TLB_LOCAL_MM_SHOOTDOWN, "local mm shootdown" })

Which maps the enum values to the strings they represent. But perf and
trace-cmd do no know what value TLB_LOCAL_MM_SHOOTDOWN is, and would
not be able to map it.

With TRACE_DEFINE_ENUM(), developers can place these in the event header
files and ftrace will convert the enums to their values:

By adding:

 TRACE_DEFINE_ENUM(TLB_FLUSH_ON_TASK_SWITCH);
 TRACE_DEFINE_ENUM(TLB_REMOTE_SHOOTDOWN);
 TRACE_DEFINE_ENUM(TLB_LOCAL_SHOOTDOWN);
 TRACE_DEFINE_ENUM(TLB_LOCAL_MM_SHOOTDOWN);

 $ cat /sys/kernel/debug/tracing/events/tlb/tlb_flush/format
[...]
 __print_symbolic(REC->reason,
    { 0, "flush on task switch" },
    { 1, "remote shootdown" },
    { 2, "local shootdown" },
    { 3, "local mm shootdown" })

The above is what userspace expects to see, and tools do not need to
be modified to parse them.

Link: http://lkml.kernel.org/r/20150403013802.220157513@goodmis.org

Cc: Guilherme Cox <cox@computer.org>
Cc: Tony Luck <tony.luck@gmail.com>
Cc: Xie XiuQi <xiexiuqi@huawei.com>
Acked-by: NNamhyung Kim <namhyung@kernel.org>
Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>

0c564a53

mm: numa: disable change protection for vma(VM_HUGETLB) · 6b79c57b

由 Naoya Horiguchi 提交于 4月 07, 2015

Currently when a process accesses a hugetlb range protected with
PROTNONE, unexpected COWs are triggered, which finally puts the hugetlb
subsystem into a broken/uncontrollable state, where for example
h->resv_huge_pages is subtracted too much and wraps around to a very
large number, and the free hugepage pool is no longer maintainable.

This patch simply stops changing protection for vma(VM_HUGETLB) to fix
the problem.  And this also allows us to avoid useless overhead of minor
faults.
Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Suggested-by: NMel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6b79c57b

07 4月, 2015 1 次提交

Revert "PM / hibernate: avoid unsafe pages in e820 reserved regions" · f82daee4

由 Rafael J. Wysocki 提交于 4月 07, 2015

Commit 84c91b7a (PM / hibernate: avoid unsafe pages in e820 reserved
regions) is reported to make resume from hibernation on Lenovo x230
unreliable, so revert it.

We will revisit the issue the commit in question was supposed to fix
in the future.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=96111Reported-by: Nrhn <kebuac.rhn@porcupinefactory.org>
Cc: 3.17+ <stable@vger.kernel.org> # 3.17+
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

f82daee4

06 4月, 2015 1 次提交

workqueue: Reorder sysfs code · 6ba94429

由 Frederic Weisbecker 提交于 4月 02, 2015

The sysfs code usually belongs to the botom of the file since it deals
with high level objects. In the workqueue code it's misplaced and such
that we'll need to work around functions references to allow the sysfs
code to call APIs like apply_workqueue_attrs().

Lets move that block further in the file, almost the botom.

And declare workqueue_sysfs_unregister() just before destroy_workqueue()
which reference it.

tj: Moved workqueue_sysfs_unregister() forward declaration where other
    forward declarations are.
Suggested-by: NTejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Kevin Hilman <khilman@linaro.org>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
Cc: Mike Galbraith <bitbucket@online.de>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NTejun Heo <tj@kernel.org>

6ba94429

03 4月, 2015 17 次提交

timers/PM: Drop unnecessary braces from tick_freeze() · def74708

由 Rafael J. Wysocki 提交于 4月 03, 2015

Some braces in tick_freeze() are not necessary, so drop them.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: peterz@infradead.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1534128.H5hN3KBFB4@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

def74708

timers/PM: Fix up tick_unfreeze() · 422fe750

由 Rafael J. Wysocki 提交于 4月 03, 2015

A recent conflict resolution has left tick_resume() in
tick_unfreeze() which leads to an unbalanced execution of
tick_resume_broadcast() every time that function runs.

Fix that by replacing the tick_resume() in tick_unfreeze()
with tick_resume_local() as appropriate.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: boris.ostrovsky@oracle.com
Cc: david.vrabel@citrix.com
Cc: konrad.wilk@oracle.com
Cc: peterz@infradead.org
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/8099075.V0LvN3pQAV@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

422fe750

sched/core: Drop debugging leftover trace_printk call · 62a935b2

由 Borislav Petkov 提交于 4月 03, 2015

Commit:

  3c18d447 ("sched/core: Check for available DL bandwidth in cpuset_cpu_inactive()")

forgot a trace_printk() debugging piece in and Steve's banner screamed
in dmesg. Remove it.
Signed-off-by: NBorislav Petkov <bp@suse.de>
Cc: Juri Lelli <juri.lelli@arm.com>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/1428050570-21041-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

62a935b2

timekeeping: Get rid of stale comment · 347c6f6d

由 Thomas Gleixner 提交于 4月 03, 2015

Arch specific management of xtime/jiffies/wall_to_monotonic is
gone for quite a while. Zap the stale comment.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/2422730.dmO29q661S@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

347c6f6d

clockevents: Cleanup dead cpu explicitely · a49b116d

由 Thomas Gleixner 提交于 4月 03, 2015

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the cleanup function for a dead cpu and invoke it
directly from the cpu down code. Make it conditional on
CPU_HOTPLUG as well.

Temporary change, will be refined in the future.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
[ Rebased, added clockevents_notify() removal ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1735025.raBZdQHM3m@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

a49b116d

clockevents: Make tick handover explicit · 52c063d1

由 Thomas Gleixner 提交于 4月 03, 2015

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the tick_handover call and invoke it explicitely from
the hotplug code. Temporary solution will be cleaned up in later
patches.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
[ Rebase ]
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1658173.RkEEILFiQZ@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

52c063d1

clockevents: Remove broadcast oneshot control leftovers · ffa48c0d

由 Rafael J. Wysocki 提交于 4月 03, 2015

Now that all users are converted over to explicit calls into the
clockevents state machine, remove the notification chain leftovers.

Original-from: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/14018863.NQUzkFuafr@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

ffa48c0d

sched/idle: Use explicit broadcast oneshot control function · 335f4919

由 Thomas Gleixner 提交于 4月 03, 2015

Replace the clockevents_notify() call with an explicit function call.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/6422336.RMm7oUHcXh@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

335f4919

clockevents: Provide explicit broadcast oneshot control functions · 1fe5d5c3

由 Thomas Gleixner 提交于 4月 03, 2015

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the broadcast oneshot control into a separate function
and provide inline helpers. Switch clockevents_notify() over.
This will go away once all callers are converted.

This also gets rid of the nested locking of clockevents_lock and
broadcast_lock. The broadcast oneshot control functions do not
require clockevents_lock. Only the managing functions
(setup/shutdown/suspend/resume of the broadcast device require
clockevents_lock.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Alexandre Courbot <gnurou@gmail.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/13000649.8qZuEDV0OA@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

1fe5d5c3

clockevents: Remove the broadcast control leftovers · 89feddbf

由 Thomas Gleixner 提交于 4月 03, 2015

All users converted. Remove the notify leftovers.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/2076318.76XJZ8QYP3@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

89feddbf

clockevents: Provide explicit broadcast control functions · 592a438f

由 Thomas Gleixner 提交于 4月 03, 2015

clockevents_notify() is a leftover from the early design of the
clockevents facility. It's really not a notification mechanism,
it's a multiplex call. We are way better off to have explicit
calls instead of this monstrosity.

Split out the broadcast control into a separate function and
provide inline helpers. Switch clockevents_notify() over. This
will go away once all callers are converted.

This also gets rid of the nested locking of clockevents_lock and
broadcast_lock. The broadcast control functions do not require
clockevents_lock. Only the managing functions
(setup/shutdown/suspend/resume of the broadcast device require
clockevents_lock.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tony Lindgren <tony@atomide.com>
Link: http://lkml.kernel.org/r/8086559.ttsuS0n1Xr@vostro.rjw.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

592a438f

clocksource: Improve comment explaining clocks_calc_max_nsecs()'s 50% safety margin · 8e56f33f

由 John Stultz 提交于 4月 01, 2015

Ingo noted that the description of clocks_calc_max_nsecs()'s
50% safety margin was somewhat circular. So this patch tries
to improve the comment to better explain what we mean by the
50% safety margin and why we need it.
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-20-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

8e56f33f

time, drivers/rtc: Don't bother with rtc_resume() for the nonstop clocksource · 0fa88cb4

由 Xunlei Pang 提交于 4月 01, 2015

If a system does not provide a persistent_clock(), the time
will be updated on resume by rtc_resume(). With the addition
of the non-stop clocksources for suspend timing, those systems
set the time on resume in timekeeping_resume(), but may not
provide a valid persistent_clock().

This results in the rtc_resume() logic thinking no one has set
the time and it then will over-write the suspend time again,
which is not necessary and only increases clock error.

So, fix this for rtc_resume().

This patch also improves the name of persistent_clock_exist to
make it more grammatical.
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-19-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

0fa88cb4

time: Fix a bug in timekeeping_suspend() with no persistent clock · 264bb3f7

由 Xunlei Pang 提交于 4月 01, 2015

When there's no persistent clock, normally
timekeeping_suspend_time should always be zero, but this can
break in timekeeping_suspend().

At T1, there was a system suspend, so old_delta was assigned T1.
After some time, one time adjustment happened, and xtime got the
value of T1-dt(0s<dt<2s). Then, there comes another system
suspend soon after this adjustment, obviously we will get a
small negative delta_delta, resulting in a negative
timekeeping_suspend_time.

This is problematic, when doing timekeeping_resume() if there is
no nonstop clocksource for example, it will hit the else leg and
inject the improper sleeptime which is the wrong logic.

So, we can solve this problem by only doing delta related code
when the persistent clock is existent. Actually the code only
makes sense for persistent clock cases.
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-18-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

264bb3f7

time: Don't build timekeeping_inject_sleeptime64() if no one uses it · 7f298139

由 Xunlei Pang 提交于 4月 01, 2015

timekeeping_inject_sleeptime64() is only used by RTC
suspend/resume, so add build dependencies on the necessary RTC
related macros.
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
[ Improve commit message clarity. ]
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-16-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

7f298139

time: Add y2038 safe update_persistent_clock64() · 3c00a1fe

由 Xunlei Pang 提交于 4月 01, 2015

As part of addressing in-kernel y2038 issues, this patch adds
update_persistent_clock64() and replaces all the call sites of
update_persistent_clock() with this function. This is a __weak
implementation, which simply calls the existing y2038 unsafe
update_persistent_clock().

This allows architecture specific implementations to be
converted independently, and eventually y2038-unsafe
update_persistent_clock() can be removed after all its
architecture specific implementations have been converted to
update_persistent_clock64().
Suggested-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-4-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

3c00a1fe

time: Add y2038 safe read_persistent_clock64() · 2ee96632

由 Xunlei Pang 提交于 4月 01, 2015

As part of addressing in-kernel y2038 issues, this patch adds
read_persistent_clock64() and replaces all the call sites of
read_persistent_clock() with this function. This is a __weak
implementation, which simply calls the existing y2038 unsafe
read_persistent_clock().

This allows architecture specific implementations to be
converted independently, and eventually the y2038 unsafe
read_persistent_clock() can be removed after all its
architecture specific implementations have been converted to
read_persistent_clock64().
Suggested-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NXunlei Pang <pang.xunlei@linaro.org>
Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1427945681-29972-3-git-send-email-john.stultz@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

2ee96632