1. 15 Dec 2016, 1 commit
  2. 13 Dec 2016, 7 commits
  3. 11 Dec 2016, 2 commits
    • sched/core: Use load_avg for selecting idlest group · 6b94780e
      Vincent Guittot authored
      find_idlest_group() only compares the runnable_load_avg when looking
      for the least loaded group. But on fork-intensive use cases like
      hackbench, where tasks block quickly after the fork, this can lead to
      selecting the same CPU instead of other CPUs which have similar
      runnable load but a lower load_avg.
      
      When the runnable_load_avg of 2 CPUs is close, we now take into
      account the amount of blocked load as a 2nd selection factor. There
      are now 3 zones for the runnable_load of the rq:
      
       - [0 .. (runnable_load - imbalance)]:
      	Select the new rq which has significantly less runnable_load
      
       - [(runnable_load - imbalance) .. (runnable_load + imbalance)]:
      	The runnable loads are close so we use load_avg to choose
      	between the 2 rq
      
       - [(runnable_load + imbalance) .. ULONG_MAX]:
      	Keep the current rq which has significantly less runnable_load
      
      The scale factor that is currently used for comparing runnable_load
      doesn't work well with small values. As an example, the use of a
      scaling factor fails as soon as this_runnable_load == 0, because we
      always select the local rq even if min_runnable_load is only 1, which
      doesn't really make sense because they are just the same. So instead
      of a scaling factor, we use an absolute margin for runnable_load to
      detect CPUs with similar runnable_load, and we keep using a scaling
      factor for blocked load.
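      
      The resulting decision can be sketched as follows (a minimal sketch
      following the changelog's terminology; the variable names and the
      imbalance_scale factor are assumptions, not necessarily the final
      fair.c code):
      
        /* Zone 3: keep the local rq, it has significantly less
         * runnable load. */
        if (min_runnable_load > (this_runnable_load + imbalance))
                return NULL;
      
        /* Zone 2: the runnable loads are close, so let the blocked
         * load (load_avg) break the tie in favour of the local rq. */
        if ((this_runnable_load < (min_runnable_load + imbalance)) &&
            (100 * this_avg_load < imbalance_scale * min_avg_load))
                return NULL;
      
        /* Zone 1: the remote group is significantly less loaded. */
        return idlest;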
      
      For use cases like hackbench, this enables the scheduler to select
      different CPUs during the fork sequence and to spread tasks across the
      system.
      
      Tests have been done on a Hikey board (ARM-based octa-core) for
      several kernels. The result below gives min, max, avg and stdev values
      of 18 runs with each configuration.
      
      The patches depend on the "no missing update_rq_clock()" work.
      
      hackbench -P -g 1
      
               ea86cb4b  7dc603c9  v4.8        v4.8+patches
        min    0.049         0.050         0.051       0.048
        avg    0.057         0.057(0%)     0.057(0%)   0.055(+5%)
        max    0.066         0.068         0.070       0.063
        stdev  +/-9%         +/-9%         +/-8%       +/-9%
      
      More performance numbers here:
      
        https://lkml.kernel.org/r/20161203214707.GI20785@codeblueprint.co.uk
      Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten.Rasmussen@arm.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: kernellwp@gmail.com
      Cc: umgwanakikbuti@gmail.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1481216215-24651-3-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6b94780e
    • sched/core: Fix find_idlest_group() for fork · f519a3f1
      Vincent Guittot authored
      During fork, the utilization of a task is initialized once the rq has
      been selected, because the current utilization level of the rq is used
      to set the utilization of the forked task. As the task's utilization is
      still 0 at this step of the fork sequence, it doesn't make sense to
      look for spare capacity that can fit the task's utilization.
      Furthermore, I can see perf regressions for the test:
      
         hackbench -P -g 1
      
      because the least loaded policy is always bypassed and tasks are not
      spread during fork.
      
      With this patch and the fix below, we are back to the same performance
      as v4.8. The fix below is only a temporary one, used for the test
      until a smarter solution is found, because we can't simply remove the
      test, which is useful for other benchmarks.
      
      | @@ -5708,13 +5708,6 @@ static int select_idle_cpu(struct task_struct *p, struct sched_domain *sd, int t
      |
      |	avg_cost = this_sd->avg_scan_cost;
      |
      | -	/*
      | -	 * Due to large variance we need a large fuzz factor; hackbench in
      | -	 * particularly is sensitive here.
      | -	 */
      | -	if ((avg_idle / 512) < avg_cost)
      | -		return -1;
      | -
      |	time = local_clock();
      |
      |	for_each_cpu_wrap(cpu, sched_domain_span(sd), target, wrap) {
      Tested-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Matt Fleming <matt@codeblueprint.co.uk>
      Acked-by: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dietmar.eggemann@arm.com
      Cc: kernellwp@gmail.com
      Cc: umgwanakikbuti@gmail.com
      Cc: yuyang.du@intel.com
      Link: http://lkml.kernel.org/r/1481216215-24651-2-git-send-email-vincent.guittot@linaro.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f519a3f1
  4. 09 Dec 2016, 6 commits
    • timekeeping: Use mul_u64_u32_shr() instead of open coding it · c029a2be
      Thomas Gleixner authored
      The resume code must deal with a clocksource delta which is potentially big
      enough to overflow the 64bit mult.
      
      Replace the open coded handling with the proper function.
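      
      For illustration, a sketch of what the replacement looks like (the
      delta/clock names are assumed context from the resume path, not the
      exact patch):
      
        #include <linux/math64.h>
      
        /* Open coded:  nsec = (delta * clock->mult) >> clock->shift;
         * can overflow the 64bit multiplication for a big delta.
         * mul_u64_u32_shr() does the multiplication in a wider
         * intermediate space before shifting: */
        u64 nsec = mul_u64_u32_shr(delta, clock->mult, clock->shift);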
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Liav Rehana <liavr@mellanox.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/20161208204228.921674404@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      c029a2be
    • timekeeping: Get rid of pointless typecasts · cbd99e3b
      Thomas Gleixner authored
      cycle_t is defined as u64, so casting it to u64 is a pointless and
      confusing exercise. cycle_t should simply go away and be replaced with a
      plain u64 to avoid further confusion.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Liav Rehana <liavr@mellanox.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/20161208204228.844699737@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      cbd99e3b
    • timekeeping: Make the conversion call chain consistently unsigned · acc89612
      Thomas Gleixner authored
      Propagating an unsigned value through signed variables and functions makes
      absolutely no sense and is just prone to (re)introduce subtle signed
      vs. unsigned issues as happened recently.
      
      Clean it up.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: Liav Rehana <liavr@mellanox.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Link: http://lkml.kernel.org/r/20161208204228.765843099@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      acc89612
    • timekeeping: Force unsigned clocksource to nanoseconds conversion · 9c164572
      Thomas Gleixner authored
      The clocksource delta to nanoseconds conversion is using signed math, but
      the delta is unsigned. This makes the conversion space smaller than
      necessary and in case of a multiplication overflow the conversion can
      become negative. The conversion is done with scaled math:
      
          s64 nsec_delta = ((s64)clkdelta * clk->mult) >> clk->shift;
      
      Shifting a signed integer right obviously preserves the sign, which has
      interesting consequences:
       
       - Time jumps backwards
       
       - __iter_div_u64_rem() which is used in one of the calling code paths
         will take forever to piecewise calculate the seconds/nanoseconds part.
      
      This has been reported by several people with different scenarios:
      
      David observed that when stopping a VM with a debugger:
      
       "It was essentially the stopped by debugger case.  I forget exactly why,
        but the guest was being explicitly stopped from outside, it wasn't just
        scheduling lag.  I think it was something in the vicinity of 10 minutes
        stopped."
      
       When lifting the stop the machine went dead.
      
      The stopped by debugger case is not really interesting, but nevertheless it
      would be a good thing not to die completely.
      
      But this was also observed on a live system by Liav:
      
       "When the OS is too overloaded, delta will get a high enough value for the
        msb of the sum delta * tkr->mult + tkr->xtime_nsec to be set, and so
        after the shift the nsec variable will gain a value similar to
        0xffffffffff000000."
      
      Unfortunately this has been reintroduced recently with commit 6bd58f09
      ("time: Add cycles to nanoseconds translation"). It had been fixed a year
      ago already in commit 35a4933a ("time: Avoid signed overflow in
      timekeeping_get_ns()").
      
      It's not surprising that the issue was reintroduced, because the
      function itself and the whole call chain use s64 for the result and
      its propagation. The change in this recent commit is subtle:
      
         s64 nsec;
      
      -  nsec = (d * m + n) >> s:
      +  nsec = d * m + n;
      +  nsec >>= s;
      
      d being of type cycle_t adds another level of obfuscation.
      
      This wouldn't have happened if the previous change to unsigned
      computation had made the 'nsec' variable u64 right away and a follow-up
      patch had cleaned up the whole call chain.
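      
      A standalone demo of the sign-preservation effect described above
      (illustrative constants, assuming two's complement; not taken from
      the reports):
      
        #include <stdio.h>
        #include <stdint.h>
      
        int main(void)
        {
                uint64_t d = 1ULL << 39;    /* large clocksource delta */
                uint32_t mult = 1U << 24;   /* illustrative mult/shift pair */
                unsigned int shift = 24;
      
                /* d * mult == 1ULL << 63, so the msb is set. */
                int64_t  s = (int64_t)(d * mult) >> shift;  /* arithmetic shift */
                uint64_t u = (d * mult) >> shift;           /* logical shift */
      
                printf("signed:   %lld\n", (long long)s);          /* -549755813888 */
                printf("unsigned: %llu\n", (unsigned long long)u); /*  549755813888 */
                return 0;
        }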
      
      There have been patches submitted which basically did a revert of the
      above patch, leaving everything else unchanged as signed. Back to
      square one. This spawned an admittedly pointless discussion about
      potential users which rely on the unsigned behaviour, until someone
      pointed out that it had been fixed before. The changelogs of said
      patches added further confusion, as they made ultimately false claims
      about the consequences for eventual users which expect signed results.
      
      Despite delta being cycle_t, aka. u64, it's very well possible to hand in
      a signed negative value and the signed computation will happily return the
      correct result. But nobody actually sat down and analyzed the code which
      was added as a user after the probably unintended signed conversion.
      
      Though in sensitive code like this it's better to analyze it properly
      and make sure that nothing relies on this than to hunt the subtle wreckage half
      a year later. After analyzing all call chains it stands that no caller can
      hand in a negative value (which actually would work due to the s64 cast)
      and rely on the signed math to do the right thing.
      
      Change the conversion function to unsigned math. The conversion of all call
      chains is done in a follow up patch.
      
      This solves the starvation issue, which was caused by the negative
      result, but it does not solve the underlying problem. It merely
      postpones it. When the timekeeper update is deferred long enough that
      the unsigned multiplication overflows, then time going backwards is
      observable again.
      
      Neither does it solve the issue of clocksources with a small counter
      width, which will possibly wrap around several times and cause random
      time stamps to be generated. But those are usually not found on
      systems used for virtualization, so this is likely a non-issue.
      
      I took the liberty of claiming authorship for this, simply because
      analyzing all callsites and writing the changelog took substantially
      more time than making the simple s/s64/u64/ change and ignoring the
      rest.
      
      Fixes: 6bd58f09 ("time: Add cycles to nanoseconds translation")
      Reported-by: David Gibson <david@gibson.dropbear.id.au>
      Reported-by: Liav Rehana <liavr@mellanox.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Laurent Vivier <lvivier@redhat.com>
      Cc: "Christopher S. Hall" <christopher.s.hall@intel.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161208204228.688545601@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9c164572
    • bpf: xdp: Allow head adjustment in XDP prog · 17bedab2
      Martin KaFai Lau authored
      This patch allows an XDP prog to extend/remove the packet
      data at the head (like adding or removing a header). It is
      done by adding a new XDP helper, bpf_xdp_adjust_head().
      
      It also renames bpf_helper_changes_skb_data() to
      bpf_helper_changes_pkt_data() to better reflect
      that XDP progs do not work on skbs.
      
      This patch adds one "xdp_adjust_head" bit to bpf_prog for the
      XDP-capable driver to check if the XDP prog requires
      bpf_xdp_adjust_head() support.  The driver can then decide
      to error out during XDP_SETUP_PROG.
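      
      A hypothetical XDP program using the new helper to strip an outer
      encapsulation header; ENCAP_LEN, the section name and the helper
      header location are illustrative assumptions:
      
        #include <linux/bpf.h>
        #include "bpf_helpers.h"
      
        #define ENCAP_LEN 20
      
        SEC("xdp")
        int xdp_decap(struct xdp_md *ctx)
        {
                /* A positive delta moves the data pointer forward,
                 * shrinking the packet at the head. */
                if (bpf_xdp_adjust_head(ctx, ENCAP_LEN))
                        return XDP_DROP;
                return XDP_PASS;
        }
      
        char _license[] SEC("license") = "GPL";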
      Signed-off-by: Martin KaFai Lau <kafai@fb.com>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      17bedab2
    • bpf: fix state equivalence · d2a4dd37
      Alexei Starovoitov authored
      Commits 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
      and 48461135 ("bpf: allow access into map value arrays") by themselves
      are correct, but in combination they make state equivalence ignore the 'id'
      field of the register state, which can lead to accepting an invalid program.
      
      Fixes: 57a09bf0 ("bpf: Detect identical PTR_TO_MAP_VALUE_OR_NULL registers")
      Fixes: 48461135 ("bpf: allow access into map value arrays")
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2a4dd37
  5. 08 Dec 2016, 9 commits
    • kthread: Don't abuse kthread_create_on_cpu() in __kthread_create_worker() · 8fb9dcbd
      Oleg Nesterov authored
      kthread_create_on_cpu() sets KTHREAD_IS_PER_CPU and kthread->cpu; this
      only makes sense if this kthread can be parked/unparked by cpuhp code.
      kthread workers never call kthread_parkme() so this has no effect.
      
      Change __kthread_create_worker() to simply call kthread_bind(task, cpu).
      The very fact that kthread_create_on_cpu() doesn't accept a generic fmt
      shows that it should not be used outside of smpboot.c.
      
      Now, the only reason we can not unexport this helper and move it into
      smpboot.c is that it sets kthread->cpu and struct kthread is not exported.
      And the only reason we can not kill kthread->cpu is that kthread_unpark()
      is used by drivers/gpu/drm/amd/scheduler/gpu_scheduler.c and thus we can
      not turn _unpark into kthread_unpark(struct smp_hotplug_thread *, cpu).
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Tested-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175110.GA5342@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      8fb9dcbd
    • kthread: Don't use to_live_kthread() in kthread_[un]park() · cf380a4a
      Oleg Nesterov authored
      Now that to_kthread() is always valid, change kthread_park() and
      kthread_unpark() to use it and kill to_live_kthread().
      
      The conversion of kthread_unpark() is trivial. If KTHREAD_IS_PARKED is
      set then the task has called complete(&self->parked), and thus the
      function cannot race against a concurrent kthread_stop() and exit.
      
      kthread_park() is more tricky, because its semantics are not well
      defined. It returns -ENOSYS if the thread exited, but this can never
      happen, and as Roman pointed out, kthread_park() can obviously block
      forever if it races with the exiting kthread.
      
      The usage of kthread_park() in cpuhp code (cpu.c, smpboot.c, stop_machine.c)
      is fine. It can never see an exiting/exited kthread, smpboot_destroy_threads()
      clears *ht->store, smpboot_park_thread() checks it is not NULL under the same
      smpboot_threads_lock. cpuhp_threads and cpu_stop_threads never exit, so other
      callers are fine too.
      
      But it has two more users:
      
      - watchdog_park_threads():
      
        The code is actually correct, get_online_cpus() ensures that
        kthread_park() can't race with itself (note that kthread_park() can't
        handle this race correctly), but it should not use kthread_park()
        directly.
      
      - drivers/gpu/drm/amd/scheduler/gpu_scheduler.c should not use
        kthread_park() either.
      
        kthread_park() must not be called after amd_sched_fini() which does
        kthread_stop(), otherwise even to_live_kthread() is not safe because
        task_struct can be already freed and sched->thread can point to nowhere.
      
      The usage of kthread_park/unpark should either be restricted to core code
      which is properly protected against the exit race or made more robust so it
      is safe to use it in drivers.
      
      To catch potential exit issues, add a WARN_ON(PF_EXITING) for now.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175107.GA5339@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      cf380a4a
    • kthread: Don't use to_live_kthread() in kthread_stop() · efb29fbf
      Oleg Nesterov authored
      kthread_stop() had to use to_live_kthread() simply because it was not
      possible to access kthread->exited after the exiting task clears
      task_struct->vfork_done. Now that to_kthread() is always valid,
      wake_up_process() + wait_for_completion() can be done
      unconditionally. It's not an issue anymore if the task has already
      issued complete_vfork_done() or died.
      
      The exiting task can get the spurious wakeup after mm_release(), but
      this is possible without this change too and is fine; do_task_dead()
      ensures that this can't do any harm.
      
      As a further enhancement this could be converted to task_work_add() later,
      so ->vfork_done can be avoided completely.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175103.GA5336@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      efb29fbf
    • Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function" · eff96625
      Oleg Nesterov authored
      Revert "kthread: Pin the stack via try_get_task_stack()/put_task_stack() in to_live_kthread() function"
      
      This reverts commit 23196f2e.
      
      Now that struct kthread is kmalloc'ed and no longer on the task stack
      there is no need anymore to pin the stack.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175100.GA5333@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      
      eff96625
    • kthread: Make struct kthread kmalloc'ed · 1da5c46f
      Oleg Nesterov authored
      commit 23196f2e "kthread: Pin the stack via try_get_task_stack() /
      put_task_stack() in to_live_kthread() function" is a workaround for the
      fragile design of struct kthread being allocated on the task stack.
      
      struct kthread in its current form should be removed, but this needs
      cleanups outside of kthread.c.
      
      As a first step, move struct kthread away from the task stack by making
      it kmalloc'ed. This allows accessing kthread.exited without the magic of
      trying to pin the task stack and the try logic in to_live_kthread().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chunming Zhou <David1.Zhou@amd.com>
      Cc: Roman Pen <roman.penyaev@profitbricks.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20161129175057.GA5330@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      1da5c46f
    • hotplug: Make register and unregister notifier API symmetric · 777c6e0d
      Michal Hocko authored
      Yu Zhao has noticed that __unregister_cpu_notifier() only unregisters
      its notifiers when HOTPLUG_CPU=y, while the registration might succeed
      even when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
      might keep a stale notifier on the list after the manual cleanup during
      the pool teardown and thus corrupt the list, resulting in the following:
      
      [  144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
      [  144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
      <snipped>
      [  145.122628] Call Trace:
      [  145.125086]  [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
      [  145.131350]  [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
      [  145.137268]  [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
      [  145.143188]  [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
      [  145.149018]  [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
      [  145.154940]  [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
      [  145.160511]  [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
      [  145.167035]  [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
      [  145.172694]  [<ffffffffa290848d>] module_attr_store+0x1d/0x30
      [  145.178443]  [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
      [  145.183925]  [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
      [  145.189761]  [<ffffffffa2a99248>] __vfs_write+0x18/0x40
      [  145.194982]  [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
      [  145.200122]  [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
      [  145.205177]  [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
      
      This can be even triggered manually by changing
      /sys/module/zswap/parameters/compressor multiple times.
      
      Fix this issue by making unregister APIs symmetric to the register so
      there are no surprises.
      
      Fixes: 47e627bc ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
      Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Cc: linux-mm@kvack.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      777c6e0d
    • kcov: add missing #include <linux/sched.h> · 166ad0e1
      Kefeng Wang authored
      In __sanitizer_cov_trace_pc we use task_struct and fields within it, but
      as we haven't included <linux/sched.h>, it is not guaranteed to be
      defined.  While we usually happen to acquire the definition through a
      transitive include, this is fragile (and hasn't been true in the past,
      causing issues with backports).
      
      Include <linux/sched.h> to avoid any fragility.
      
      [mark.rutland@arm.com: rewrote changelog]
      Link: http://lkml.kernel.org/r/1481007384-27529-1-git-send-email-wangkefeng.wang@huawei.com
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: James Morse <james.morse@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      166ad0e1
    • bpf: fix loading of BPF_MAXINSNS sized programs · ef0915ca
      Daniel Borkmann authored
      The general assumption is that a single program can hold up to
      BPF_MAXINSNS, that is, 4096 instructions. That is the case with cBPF,
      and the limit was carried over to eBPF. When recently testing the
      digest, I noticed that it's actually not possible to feed 4096
      instructions via bpf(2).
      
      The check for > BPF_MAXINSNS was added back then to bpf_check() in
      cbd35700 ("bpf: verifier (add ability to receive verification log)").
      However, 09756af4 ("bpf: expand BPF syscall with program load/unload")
      added yet another check, preceding it, in bpf_prog_load(), but
      this one already bails out for >= BPF_MAXINSNS.
      
      Fix it up and perform the check early in bpf_prog_load(), so we can drop
      the second one in bpf_check(). It makes sense, because a 0-insn
      program is also useless and we don't want to waste any resources doing
      work up to the bpf_check() point. The existing bpf(2) man page documents
      E2BIG as the official error for such cases, so just stick with it as well.
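      
      A sketch of the reordered check (simplified; the surrounding syscall
      plumbing of bpf_prog_load() is omitted):
      
        /* Reject empty and oversized programs early, with E2BIG as
         * documented in the bpf(2) man page. */
        if (attr->insn_cnt == 0 || attr->insn_cnt > BPF_MAXINSNS)
                return -E2BIG;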
      
      Fixes: 09756af4 ("bpf: expand BPF syscall with program load/unload")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ef0915ca
    • clocksource: export the clocks_calc_mult_shift to use by timestamp code · 5304121a
      Murali Karicheri authored
      The CPSW CPTS driver is capable of doing timestamping on tx/rx packets
      and requires knowing the mult and shift factors for timestamp conversion
      from raw value to nanoseconds (ptp clock). Currently these mult and
      shift factors are calculated manually and provided through DT, which
      makes it very hard to support a large number of platforms, especially
      if the CPTS refclk is not the same for some kinds of boards and depends
      on efuse settings (Keystone 2 platforms). Hence, export
      clocks_calc_mult_shift() to allow drivers like CPSW CPTS (and other ptp
      drivers) to benefit from automatic calculation of the mult and shift
      factors.
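      
      A hedged usage sketch of the now-exported helper (the 500 MHz refclk
      and the 4 second conversion range are illustrative values, not from
      the patch):
      
        #include <linux/clocksource.h>
      
        u32 mult, shift;
      
        /* Derive mult/shift for converting refclk cycles to ns instead
         * of hard-coding the factors in DT; maxsec bounds the range
         * over which the conversion must not overflow. */
        clocks_calc_mult_shift(&mult, &shift, 500000000, NSEC_PER_SEC, 4);
      
        /* ns = ((u64)cycles * mult) >> shift; */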
      
      Cc: John Stultz <john.stultz@linaro.org>
      Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5304121a
  6. 07 Dec 2016, 1 commit
  7. 06 Dec 2016, 4 commits
    • lockdep: Fix report formatting · f943fe0f
      Dmitry Vyukov authored
      Since commit:
      
        4bcc595c ("printk: reinstate KERN_CONT for printing continuation lines")
      
      printk() requires KERN_CONT to continue log messages. Lots of printk()
      calls in lockdep.c and print_ip_sym() don't have it. As a result,
      lockdep reports are completely messed up.
      
      Add missing KERN_CONT and inline print_ip_sym() where necessary.
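      
      An illustrative before/after for one continuation (the message text
      here is made up; the pattern is what matters):
      
        /* Pre-4bcc595c this stayed on one line; now each printk()
         * without KERN_CONT starts a new line, so continuations must
         * be marked explicitly: */
        printk("%s (", lock_name);
        printk(KERN_CONT "%s", class_name);
        printk(KERN_CONT "), at: %pS\n", ip);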
      
      Example of a messed up report:
      
        0-rc5+ #41 Not tainted
        -------------------------------------------------------
        syz-executor0/5036 is trying to acquire lock:
         (
        rtnl_mutex
        ){+.+.+.}
        , at:
        [<ffffffff86b3d6ac>] rtnl_lock+0x1c/0x20
        but task is already holding lock:
         (
        &net->packet.sklist_lock
        ){+.+...}
        , at:
        [<ffffffff873541a6>] packet_diag_dump+0x1a6/0x1920
        which lock already depends on the new lock.
        the existing dependency chain (in reverse order) is:
        -> #3
         (
        &net->packet.sklist_lock
        +.+...}
        ...
      
      Without this patch all scripts that parse kernel bug reports are broken.
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: andreyknvl@google.com
      Cc: aryabinin@virtuozzo.com
      Cc: joe@perches.com
      Cc: syzkaller@googlegroups.com
      Link: http://lkml.kernel.org/r/1480343083-48731-1-git-send-email-dvyukov@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f943fe0f
    • perf/core: Remove invalid warning from list_update_cgroup_event() · 8fc31ce8
      David Carrillo-Cisneros authored
      The warning introduced in commit:
      
        864c2357 ("perf/core: Do not set cpuctx->cgrp for unscheduled cgroups")
      
      assumed that a cgroup switch always precedes list_del_event. This is
      not the case. Remove the warning.
      
      Make sure that cpuctx->cgrp is NULL until a cgroup event is sched in
      or ctx->nr_cgroups == 0.
      Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Nilay Vaish <nilayvaish@gmail.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi V Shankar <ravi.v.shankar@intel.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Vikas Shivappa <vikas.shivappa@linux.intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1480841177-27299-1-git-send-email-davidcc@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8fc31ce8
    • bpf: add prog_digest and expose it via fdinfo/netlink · 7bd509e3
      Daniel Borkmann authored
      When loading a BPF program via bpf(2), calculate the digest over
      the program's instruction stream and store it in struct bpf_prog's
      digest member. This is done at a point in time before any instructions
      are rewritten by the verifier. Any unstable map file descriptor
      number part of the imm field will be zeroed for the hash.
      
      fdinfo example output for progs:
      
        # cat /proc/1590/fdinfo/5
        pos:          0
        flags:        02000002
        mnt_id:       11
        prog_type:    1
        prog_jited:   1
        prog_digest:  b27e8b06da22707513aa97363dfb11c7c3675d28
        memlock:      4096
      
      When programs are pinned and retrieved by an ELF loader, the loader
      can check the program's digest through fdinfo and compare it against
      one that was generated over the ELF file's program section to see
      if the program needs to be reloaded. Furthermore, this can also be
      exposed through other means such as netlink in case of a tc cls/act
      dump (or xdp in future), but also through tracepoints or other
      facilities to identify the program. Other than that, the digest can
      also serve as a base name for the work-in-progress kallsyms support
      of programs. The digest doesn't depend on or select the crypto layer,
      since we need to keep dependencies to a minimum. iproute2 will get support
      for this facility.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7bd509e3
    • bpf: Preserve const register type on const OR alu ops · 3c839744
      Gianluca Borello authored
      Occasionally, clang (e.g. version 3.8.1) translates a sum between two
      constant operands using a BPF_OR instead of a BPF_ADD. The verifier is
      currently not handling this scenario, and the destination register type
      becomes UNKNOWN_VALUE even if it's still storing a constant. As a result,
      the destination register cannot be used as an argument to a helper
      function expecting an ARG_CONST_STACK_*, limiting some use cases.
      
      Modify the verifier to handle this case, and add a few tests to make sure
      all combinations are supported, and stack boundaries are still verified
      even with BPF_OR.
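      
      The idea, as a simplified sketch (not verifier-verbatim; the structure
      and names here are assumptions): an ALU op whose operands are both
      known constants yields another known constant, so BPF_OR must not
      demote the destination register to UNKNOWN_VALUE:
      
        case BPF_OR:
                if (dst_reg->type == CONST_IMM && src_reg->type == CONST_IMM) {
                        dst_reg->imm |= src_reg->imm;
                        break;  /* dst_reg stays a known constant */
                }
                dst_reg->type = UNKNOWN_VALUE;
                break;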
      Signed-off-by: Gianluca Borello <g.borello@gmail.com>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3c839744
  8. 03 Dec 2016, 2 commits
  9. 02 Dec 2016, 7 commits
    • bpf: BPF for lightweight tunnel infrastructure · 3a0af8fd
      Thomas Graf authored
      Registers new BPF program types which correspond to the LWT hooks:
        - BPF_PROG_TYPE_LWT_IN   => dst_input()
        - BPF_PROG_TYPE_LWT_OUT  => dst_output()
        - BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit()
      
      The separate program types are required to differentiate between the
      capabilities each LWT hook allows:
      
       * Programs attached to dst_input() or dst_output() are restricted and
         may only read the data of an skb. This prevents modification and
         possible invalidation of already validated packet headers on receive,
         and the construction of illegal headers while the IP headers are
         still being assembled.
      
       * Programs attached to lwtunnel_xmit() are allowed to modify packet
         content as well as prepending an L2 header via a newly introduced
         helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is
         invoked after the IP header has been assembled completely.
      
      All BPF programs receive an skb with L3 headers attached and may return
      one of the following error codes:
      
       BPF_OK - Continue routing as per nexthop
       BPF_DROP - Drop skb and return EPERM
       BPF_REDIRECT - Redirect skb to device as per redirect() helper.
                      (Only valid in lwtunnel_xmit() context)
      
      The return codes are binary compatible with their TC_ACT_
      relatives to ease compatibility.
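      
      A hypothetical minimal BPF_PROG_TYPE_LWT_IN program showing the verdict
      interface (the section name, the helper header location and the length
      check are illustrative):
      
        #include <linux/bpf.h>
        #include "bpf_helpers.h"
      
        SEC("lwt_in")
        int lwt_filter(struct __sk_buff *skb)
        {
                /* dst_input() programs may only read skb data. */
                if (skb->len > 1500)
                        return BPF_DROP;        /* drop, sender sees EPERM */
                return BPF_OK;                  /* continue routing per nexthop */
        }
      
        char _license[] SEC("license") = "GPL";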
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3a0af8fd
    • locking/rtmutex: Explain locking rules for rt_mutex_proxy_unlock()/init_proxy_locked() · 84d82ec5
      Thomas Gleixner authored
      While debugging the unlock vs. dequeue race which resulted in state
      corruption of futexes, the lockless nature of rt_mutex_proxy_unlock()
      caused some confusion.
      
      Add commentary to explain why it is safe to do this lockless. Add
      matching comments to rt_mutex_init_proxy_locked() for completeness' sake.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/20161130210030.591941927@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      84d82ec5
    • locking/rtmutex: Get rid of RT_MUTEX_OWNER_MASKALL · b5016e82
      Thomas Gleixner authored
      This is a leftover from the original rtmutex implementation, which used
      both bit0 and bit1 in the owner pointer. Commit:
      
        8161239a ("rtmutex: Simplify PI algorithm and make highest prio task get lock")
      
      ... removed the usage of bit1, but kept the extra mask around. This is
      confusing at best.
      
      Remove it and just use RT_MUTEX_HAS_WAITERS for the masking.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/20161130210030.509567906@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      b5016e82
    • locking/rtmutex: Use READ_ONCE() in rt_mutex_owner() · 1be5d4fa
      Thomas Gleixner authored
      While debugging the rtmutex unlock vs. dequeue race, Will suggested
      using READ_ONCE() in rt_mutex_owner(), as it might race against the
      cmpxchg_release() in unlock_rt_mutex_safe().
      
      Will: "It's a minor thing which will most likely not matter in practice"
      
      A careful search did not unearth an actual problem in today's code, but
      it's better to be safe than surprised.
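      
      A sketch of the resulting accessor (with the RT_MUTEX_OWNER_MASKALL
      cleanup above, the mask is simply RT_MUTEX_HAS_WAITERS):
      
        static inline struct task_struct *rt_mutex_owner(struct rt_mutex *lock)
        {
                /* READ_ONCE() keeps the load from being torn or repeated
                 * while racing against the fastpath cmpxchg_release(). */
                unsigned long owner = (unsigned long) READ_ONCE(lock->owner);
      
                return (struct task_struct *) (owner & ~RT_MUTEX_HAS_WAITERS);
        }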
      Suggested-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20161130210030.431379999@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1be5d4fa
    • locking/rtmutex: Prevent dequeue vs. unlock race · dbb26055
      Thomas Gleixner authored
      David reported a futex/rtmutex state corruption. It's caused by the
      following problem:
      
      CPU0		CPU1		CPU2
      
      l->owner=T1
      		rt_mutex_lock(l)
      		lock(l->wait_lock)
      		l->owner = T1 | HAS_WAITERS;
      		enqueue(T2)
      		boost()
      		  unlock(l->wait_lock)
      		schedule()
      
      				rt_mutex_lock(l)
      				lock(l->wait_lock)
      				l->owner = T1 | HAS_WAITERS;
      				enqueue(T3)
      				boost()
      				  unlock(l->wait_lock)
      				schedule()
      		signal(->T2)	signal(->T3)
      		lock(l->wait_lock)
      		dequeue(T2)
      		deboost()
      		  unlock(l->wait_lock)
      				lock(l->wait_lock)
      				dequeue(T3)
      				  ===> wait list is now empty
      				deboost()
      				 unlock(l->wait_lock)
      		lock(l->wait_lock)
      		fixup_rt_mutex_waiters()
      		  if (wait_list_empty(l)) {
      		    owner = l->owner & ~HAS_WAITERS;
      		    l->owner = owner
      		     ==> l->owner = T1
      		  }
      
      				lock(l->wait_lock)
      rt_mutex_unlock(l)		fixup_rt_mutex_waiters()
      				  if (wait_list_empty(l)) {
      				    owner = l->owner & ~HAS_WAITERS;
      cmpxchg(l->owner, T1, NULL)
       ===> Success (l->owner = NULL)
      				    l->owner = owner
      				     ==> l->owner = T1
      				  }
      
      That means the problem is caused by fixup_rt_mutex_waiters(), which does
      the RMW to clear the waiters bit unconditionally when there are no
      waiters in the rtmutex's rbtree.
      
      This can be fatal: A concurrent unlock can release the rtmutex in the
      fastpath because the waiters bit is not set. If the cmpxchg() gets in the
      middle of the RMW operation then the previous owner, which just unlocked
      the rtmutex, is set as the owner again when the write takes place after
      the successful cmpxchg().
      
      The solution is rather trivial: verify that the owner member of the rtmutex
      has the waiters bit set before clearing it. This does not require a
      cmpxchg() or other atomic operations because the waiters bit can only be
      set and cleared with the rtmutex wait_lock held. It's also safe against the
      fast path unlock attempt. The unlock attempt via cmpxchg() will either see
      the bit set and take the slowpath or see the bit cleared and release it
      atomically in the fastpath.
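      
      A sketch of the resulting fix (close to the description above;
      simplified, not necessarily the committed diff):
      
        static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
        {
                unsigned long owner, *p = (unsigned long *) &lock->owner;
      
                if (rt_mutex_has_waiters(lock))
                        return;
      
                /* Only clear the waiters bit if it is actually set, so a
                 * racing fastpath unlock (cmpxchg of the owner to NULL)
                 * can never be overwritten with a stale owner value. */
                owner = READ_ONCE(*p);
                if (owner & RT_MUTEX_HAS_WAITERS)
                        WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
        }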
      
      It's remarkable that the test program provided by David triggers on ARM64
      and MIPS64 really quickly, but it refuses to reproduce on x86-64, while
      the problem exists there as well. That refusal might explain why this was
      not discovered earlier despite the bug existing from day one of the
      rtmutex implementation more than 10 years ago.
      
      Thanks to David for meticulously instrumenting the code and providing
      the information which allowed this subtle problem to be decoded.
      Reported-by: David Daney <ddaney@caviumnetworks.com>
      Tested-by: David Daney <david.daney@cavium.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Siewior <bigeasy@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: stable@vger.kernel.org
      Fixes: 23f78d4a ("[PATCH] pi-futex: rt mutex core")
      Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      dbb26055
    • tracing/rb: Convert to hotplug state machine · b32614c0
      Sebastian Andrzej Siewior authored
      Install the callbacks via the state machine. The notifier in struct
      ring_buffer is replaced by the multi-instance interface. Upon
      __ring_buffer_alloc() invocation, cpuhp_state_add_instance() will invoke
      trace_rb_cpu_prepare() on each CPU.
      
      This callback may now fail. This means __ring_buffer_alloc() will fail
      and clean up (as before), and during a CPU-up event this failure will
      not allow the CPU to come up.
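      
      A sketch of the multi-instance pattern (state registered once, one
      hlist node per ring buffer; the state name string is assumed):
      
        /* Once, at init time: */
        ret = cpuhp_setup_state_multi(CPUHP_TRACE_RB_PREPARE,
                                      "trace/RB:prepare",
                                      trace_rb_cpu_prepare, NULL);
      
        /* In __ring_buffer_alloc(), for each new buffer: */
        ret = cpuhp_state_add_instance(CPUHP_TRACE_RB_PREPARE,
                                       &buffer->node);
        if (ret < 0)
                goto fail;      /* the allocation fails instead of
                                 * silently ignoring the error */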
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: rt@linutronix.de
      Link: http://lkml.kernel.org/r/20161126231350.10321-7-bigeasy@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      b32614c0
    • audit: remove useless synchronize_net() · 60602982
      WANG Cong authored
      The netlink kernel socket is protected by a refcount, not RCU.
      Its rcv path is not protected by RCU either. So the synchronize_net()
      is just pointless.
      
      Cc: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      60602982
  10. 01 Dec 2016, 1 commit
    • alarmtimer: Add tracepoints for alarm timers · 4a057549
      Baolin Wang authored
      Alarm timers are one of the mechanisms to wake up a system from suspend,
      but there exist no tracepoints to analyse which process/thread armed an
      alarmtimer.
      
      Add tracepoints for start/cancel/expire of individual alarm timers and one
      for tracing the suspend time decision when to resume the system.
      
      The following trace excerpt illustrates the new mechanism:
      
      Binder:3292_2-3304  [000] d..2   149.981123: alarmtimer_cancel:
      alarmtimer:ffffffc1319a7800 type:REALTIME
      expires:1325463120000000000 now:1325376810370370245
      
      Binder:3292_2-3304  [000] d..2   149.981136: alarmtimer_start:
      alarmtimer:ffffffc1319a7800 type:REALTIME
      expires:1325376840000000000 now:1325376810370384591
      
      Binder:3292_9-3953  [000] d..2   150.212991: alarmtimer_cancel:
      alarmtimer:ffffffc1319a5a00 type:BOOTTIME
      expires:179552000000 now:150154008122
      
      Binder:3292_9-3953  [000] d..2   150.213006: alarmtimer_start:
      alarmtimer:ffffffc1319a5a00 type:BOOTTIME
      expires:179551000000 now:150154025622
      
      system_server-3000  [002] ...1  162.701940: alarmtimer_suspend:
      alarmtimer type:REALTIME expires:1325376840000000000
      
      The wakeup time which is selected at suspend time allows mapping it back
      to the task arming the timer: Binder:3292_2.
      
      [ tglx: Store alarm timer expiry time instead of some useless RTC relative
        	information, add proper type information for wakeups which are
        	handled via the clock_nanosleep/freezer and massage the changelog. ]
      Signed-off-by: Baolin Wang <baolin.wang@linaro.org>
      Signed-off-by: John Stultz <john.stultz@linaro.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Richard Cochran <richardcochran@gmail.com>
      Link: http://lkml.kernel.org/r/1480372524-15181-5-git-send-email-john.stultz@linaro.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      4a057549