1. 07 May 2014 (6 commits)
  2. 03 May 2014 (1 commit)
    • tracing: Use rcu_dereference_sched() for trace event triggers · 561a4fe8
      Steven Rostedt (Red Hat) authored
      As trace event triggers are now part of the mainline kernel, I added
      my trace event trigger tests to the test suite I run on all my kernels.
      These tests get run under different config options, and one of
      those options is CONFIG_PROVE_RCU, which checks under lockdep that
      the RCU locking primitives are being used correctly. This triggered
      the following splat:
      
      ===============================
      [ INFO: suspicious RCU usage. ]
      3.15.0-rc2-test+ #11 Not tainted
      -------------------------------
      kernel/trace/trace_events_trigger.c:80 suspicious rcu_dereference_check() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 1, debug_locks = 0
      4 locks held by swapper/1/0:
       #0:  ((&(&j_cdbs->work)->timer)){..-...}, at: [<ffffffff8104d2cc>] call_timer_fn+0x5/0x1be
       #1:  (&(&pool->lock)->rlock){-.-...}, at: [<ffffffff81059856>] __queue_work+0x140/0x283
       #2:  (&p->pi_lock){-.-.-.}, at: [<ffffffff8106e961>] try_to_wake_up+0x2e/0x1e8
       #3:  (&rq->lock){-.-.-.}, at: [<ffffffff8106ead3>] try_to_wake_up+0x1a0/0x1e8
      
      stack backtrace:
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.15.0-rc2-test+ #11
      Hardware name:                  /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
       0000000000000001 ffff88007e083b98 ffffffff819f53a5 0000000000000006
       ffff88007b0942c0 ffff88007e083bc8 ffffffff81081307 ffff88007ad96d20
       0000000000000000 ffff88007af2d840 ffff88007b2e701c ffff88007e083c18
      Call Trace:
       <IRQ>  [<ffffffff819f53a5>] dump_stack+0x4f/0x7c
       [<ffffffff81081307>] lockdep_rcu_suspicious+0x107/0x110
       [<ffffffff810ee51c>] event_triggers_call+0x99/0x108
       [<ffffffff810e8174>] ftrace_event_buffer_commit+0x42/0xa4
       [<ffffffff8106aadc>] ftrace_raw_event_sched_wakeup_template+0x71/0x7c
       [<ffffffff8106bcbf>] ttwu_do_wakeup+0x7f/0xff
       [<ffffffff8106bd9b>] ttwu_do_activate.constprop.126+0x5c/0x61
       [<ffffffff8106eadf>] try_to_wake_up+0x1ac/0x1e8
       [<ffffffff8106eb77>] wake_up_process+0x36/0x3b
       [<ffffffff810575cc>] wake_up_worker+0x24/0x26
       [<ffffffff810578bc>] insert_work+0x5c/0x65
       [<ffffffff81059982>] __queue_work+0x26c/0x283
       [<ffffffff81059999>] ? __queue_work+0x283/0x283
       [<ffffffff810599b7>] delayed_work_timer_fn+0x1e/0x20
       [<ffffffff8104d3a6>] call_timer_fn+0xdf/0x1be
       [<ffffffff8104d2cc>] ? call_timer_fn+0x5/0x1be
       [<ffffffff81059999>] ? __queue_work+0x283/0x283
       [<ffffffff8104d823>] run_timer_softirq+0x1a4/0x22f
       [<ffffffff8104696d>] __do_softirq+0x17b/0x31b
       [<ffffffff81046d03>] irq_exit+0x42/0x97
       [<ffffffff81a08db6>] smp_apic_timer_interrupt+0x37/0x44
       [<ffffffff81a07a2f>] apic_timer_interrupt+0x6f/0x80
       <EOI>  [<ffffffff8100a5d8>] ? default_idle+0x21/0x32
       [<ffffffff8100a5d6>] ? default_idle+0x1f/0x32
       [<ffffffff8100ac10>] arch_cpu_idle+0xf/0x11
       [<ffffffff8107b3a4>] cpu_startup_entry+0x1a3/0x213
       [<ffffffff8102a23c>] start_secondary+0x212/0x219
      
      The cause is that the triggers are protected by rcu_read_lock_sched(),
      but the data is dereferenced with rcu_dereference(), which expects it to
      be protected by rcu_read_lock(). The proper accessor here is
      rcu_dereference_sched().
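      As a hedged illustration (a minimal sketch, not the kernel's exact code;
      trigger_list is a hypothetical RCU-protected pointer), the mismatch looks
      like this:

        #include <linux/rcupdate.h>

        /* Hypothetical pointer standing in for the trigger list. */
        static struct event_trigger_data __rcu *trigger_list;

        static void walk_triggers(void)
        {
            struct event_trigger_data *p;

            rcu_read_lock_sched();  /* sched-flavor read side */

            /* Wrong: under CONFIG_PROVE_RCU, rcu_dereference() asserts that
             * rcu_read_lock() is held, producing exactly the splat above. */
            p = rcu_dereference(trigger_list);

            /* Right: the check matches the rcu_read_lock_sched() section. */
            p = rcu_dereference_sched(trigger_list);

            (void)p;
            rcu_read_unlock_sched();
        }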
      
      Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
      Cc: stable@vger.kernel.org # 3.14+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  3. 30 Apr 2014 (3 commits)
    • timer: Prevent overflow in apply_slack · 98a01e77
      Jiri Bohac authored
      On architectures with sizeof(int) < sizeof(long), the computation of
      the mask inside apply_slack() can be undefined if the computed bit
      index is 32 or greater.
      
      E.g. with: expires = 0xffffe6f5 and slack = 25, we get:
      
      expires_limit = 0x20000000e
      bit = 33
      mask = (1 << 33) - 1  /* undefined */
      
      On x86, mask becomes 1 and the slack is not applied properly.
      On s390, mask is -1, expires is set to 0 and the timer fires immediately.
      
      Use 1UL << bit to solve that issue.
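      A small userspace demonstration of the overflow (a sketch assuming a
      64-bit long, as on the affected sizeof(int) < sizeof(long)
      architectures; the real fix is in apply_slack() in kernel/timer.c):

        #include <stdio.h>

        int main(void)
        {
            int bit = 33;   /* bit index from the example above */

            /* Broken: the constant 1 is a 32-bit int, so shifting it by 33
             * is undefined behaviour; x86 happened to yield mask == 1,
             * s390 yielded mask == -1. */
            /* unsigned long mask = (1 << bit) - 1; */

            /* Fixed: shift an unsigned long instead. */
            unsigned long mask = (1UL << bit) - 1;

            printf("mask = %#lx\n", mask);  /* prints 0x1ffffffff */
            return 0;
        }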
      Suggested-by: Deborah Townsend <dstownse@us.ibm.com>
      Signed-off-by: Jiri Bohac <jbohac@suse.cz>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140418152310.GA13654@midget.suse.cz
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • hrtimer: Prevent remote enqueue of leftmost timers · 012a45e3
      Leon Ma authored
      If a cpu is idle and starts an hrtimer which is not pinned on that
      same cpu, the nohz code might target the timer to a different cpu.
      
      In the case that we switch the cpu base of the timer we already have a
      sanity check in place, which determines whether the timer is earlier
      than the current leftmost timer on the target cpu. In that case we
      enqueue the timer on the current cpu because we cannot reprogram the
      clock event device on the target.
      
      If the timer's base is already the target cpu, we do not have this
      sanity check in place, so we enqueue the timer as the leftmost timer
      in the target cpu's rb tree even though we cannot reprogram the clock
      event device on the target cpu. The timer therefore expires late and
      subsequently prevents the reprogramming of the target cpu's clock
      event device until the previously programmed event fires or a timer
      with an earlier expiry time gets enqueued on the target cpu itself.
      
      Add the same target check as we have for the switch base case and
      start the timer on the current cpu if it would become the leftmost
      timer on the target.
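      A hedged sketch of the shape of the fix, loosely following
      switch_hrtimer_base(); hrtimer_check_target() is the existing helper
      that reports whether the timer would become the leftmost timer on a
      cpu whose clock event device we cannot reprogram from here. Control
      flow is illustrative:

        again:
            new_base = &per_cpu(hrtimer_bases, cpu).clock_base[basenum];

            /* Previously this check only ran when switching the base
             * (cpu != this_cpu); running it unconditionally makes a timer
             * whose base already is the remote target fall back to the
             * current cpu as well. */
            if (hrtimer_check_target(timer, new_base)) {
                cpu = this_cpu;
                goto again;
            }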
      
      [ tglx: Rewrote subject and changelog ]
      Signed-off-by: Leon Ma <xindong.ma@intel.com>
      Link: http://lkml.kernel.org/r/1398847391-5994-1-git-send-email-xindong.ma@intel.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • hrtimer: Prevent all reprogramming if hang detected · 6c6c0d5a
      Stuart Hayes authored
      If the last hrtimer interrupt detected a hang it sets hang_detected=1
      and programs the clock event device with a delay to let the system
      make progress.
      
      If hang_detected == 1, we prevent reprogramming of the clock event
      device in hrtimer_reprogram() but not in hrtimer_force_reprogram().
      
      This can lead to the following situation:
      
      hrtimer_interrupt()
         hang_detected = 1;
         program the clock event device to Xms from now (hang delay)
      
      We have two timers pending:
         T1 expires 50ms from now
         T2 expires 5s from now
      
      Now T1 gets canceled, which causes hrtimer_force_reprogram() to be
      invoked, which in turn programs the clock event device to T2 (5
      seconds from now).
      
      Any hrtimer_start after that will not reprogram the hardware due to
      hang_detected still being set. So we effectively block all timers until
      the T2 event fires and cleans up the hang situation.
      
      Add a check for hang_detected to hrtimer_force_reprogram(), which
      prevents overwriting the programmed hang delay in the hardware
      timer. The subsequent hrtimer_interrupt will resolve all outstanding
      issues.
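      A hedged sketch of the added guard (field names follow struct
      hrtimer_cpu_base; the exact placement is illustrative):

        static void hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base,
                                            int skip_equal)
        {
            /* The hardware is already programmed with the hang delay; do
             * not overwrite it with a later event. The next
             * hrtimer_interrupt will clear hang_detected and reprogram. */
            if (cpu_base->hang_detected)
                return;

            /* ... normal reprogramming to the new leftmost expiry ... */
        }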
      
      [ tglx: Rewrote subject and changelog and fixed up the comment in
        	hrtimer_force_reprogram() ]
      Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
      Link: http://lkml.kernel.org/r/53602DC6.2060101@gmail.com
      Cc: stable@vger.kernel.org
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  4. 28 Apr 2014 (3 commits)
    • ftrace/module: Hardcode ftrace_module_init() call into load_module() · a949ae56
      Steven Rostedt (Red Hat) authored
      A race exists between module loading and enabling of the function tracer.
      
      	CPU 1				CPU 2
      	-----				-----
        load_module()
         module->state = MODULE_STATE_COMING
      
      				register_ftrace_function()
      				 mutex_lock(&ftrace_lock);
      				 ftrace_startup()
      				  update_ftrace_function();
      				   ftrace_arch_code_modify_prepare()
      				    set_all_module_text_rw();
      				   <enables-ftrace>
      				    ftrace_arch_code_modify_post_process()
      				     set_all_module_text_ro();
      
      				[ here all module text is set to RO,
      				  including the module that is
      				  loading!! ]
      
         blocking_notifier_call_chain(MODULE_STATE_COMING);
          ftrace_init_module()
      
           [ tries to modify code, but it's RO, and fails!
             ftrace_bug() is called]
      
      When this race happens, ftrace_bug() produces a nasty warning and
      all of the function tracing features are disabled until reboot.
      
      The simple solution is to treat module load the same way the core
      kernel is treated at boot: hardcode the ftrace function modification
      that converts mcount calls into nops. This is done in init/main.c,
      and there's no reason it could not be done in load_module(). This gives
      better control of the changes and doesn't tie the state of the
      module to its notifiers as much. Ftrace is special; it needs to be
      treated as such.
      
      The reason this works is that ftrace_module_init() is called while
      the module is in MODULE_STATE_UNFORMED, which is ignored by the
      set_all_module_text_ro() call.
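      A hedged sketch of the resulting ordering inside load_module()
      (kernel/module.c); elided steps are marked in comments:

        static int load_module_sketch(struct module *mod)
        {
            /* mod->state is still MODULE_STATE_UNFORMED here, so concurrent
             * set_all_module_text_ro()/rw() calls from the ftrace enable
             * path skip this module and the modification cannot hit
             * read-only text. */
            ftrace_module_init(mod);    /* convert mcount call sites to nops */

            /* ... only afterwards: mod->state = MODULE_STATE_COMING; and
             * blocking_notifier_call_chain(), which no longer has to do
             * the ftrace initialization. */
            return 0;
        }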
      
      Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com
      Reported-by: Takao Indoh <indou.takao@jp.fujitsu.com>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: stable@vger.kernel.org # 2.6.38+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • genirq: x86: Ensure that dynamic irq allocation does not conflict · 62a08ae2
      Thomas Gleixner authored
      On x86 the allocation of irq descriptors may hand out interrupts which
      are in the range of the GSI interrupts. That's wrong, as those
      interrupts are hardwired and we don't have the irq domain translation
      that PPC has. So one of these interrupts can later be hooked up to a
      device which is hardwired to it, and the io_apic init code for that
      particular interrupt line happily reuses the descriptor with a
      completely different configuration, so all hell breaks loose.
      
      Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
      except for a few usage sites which have not yet blown up in our face
      for whatever reason. But for drivers which need an irq range, like the
      GPIO drivers, we have no limit in place and we don't want to expose
      such a detail to a driver.
      
      To cure this introduce a function which an architecture can implement
      to impose a lower bound on the dynamic interrupt allocations.
      
      Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
      the end of the hardwired interrupt space, so all dynamic allocations
      happen above.
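      A hedged sketch of the hook (nr_gsi_irqs is the name this changelog
      uses; treat both bodies as illustrative):

        /* Generic core (kernel/irq/): weak default, no restriction; the
         * descriptor allocator raises its search start with this value. */
        unsigned int __weak arch_dynirq_lower_bound(unsigned int from)
        {
            return from;
        }

        /* x86 override: dynamic allocations never land in the hardwired
         * GSI range. */
        unsigned int arch_dynirq_lower_bound(unsigned int from)
        {
            return from < nr_gsi_irqs ? nr_gsi_irqs : from;
        }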
      
      That not only allows the GPIO driver to work sanely, it also protects
      the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
      htirq code. They need to be cleaned up as well, but that's a separate
      issue.
      Reported-by: Jin Yao <yao.jin@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Krogerus Heikki <heikki.krogerus@intel.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • module: remove warning about waiting module removal. · 79465d2f
      Rusty Russell authored
      We removed waiting module removal in commit 3f2b9c9c (September
      2013), but it turns out that modprobe in kmod (< version 16) was
      still asking for waiting module removal.  No one noticed, since modprobe
      checks for zero usage immediately before trying to remove the module,
      and the race is unlikely.
      
      However, it means that anyone running old (but not ancient) kmod
      versions is hitting the printk designed to see if anyone was running
      "rmmod -w".  All reports so far have been false positives, so remove
      the warning.
      
      Fixes: 3f2b9c9c
      Reported-by: Valerio Vanni <valerio.vanni@inwind.it>
      Cc: Elliott, Robert (Server Storage) <Elliott@hp.com>
      Cc: stable@kernel.org
      Acked-by: Lucas De Marchi <lucas.de.marchi@gmail.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  5. 25 Apr 2014 (1 commit)
  6. 22 Apr 2014 (1 commit)
  7. 19 Apr 2014 (1 commit)
  8. 18 Apr 2014 (5 commits)
    • hrtimer: Export __hrtimer_start_range_ns() · 8588a2bb
      Yan, Zheng authored
      Export __hrtimer_start_range_ns() to allow building the perf Intel
      uncore driver as a module.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1395133004-23205-2-git-send-email-zheng.z.yan@intel.com
      Cc: eranian@google.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf: Allow building PMU drivers as modules · c464c76e
      Yan, Zheng authored
      This patch adds support for building a PMU driver as a module. It exports
      the functions perf_pmu_{register,unregister}() and adds reference tracking
      for the PMU driver module.
      
      When the PMU driver is built as a module, each active event of the PMU
      holds a reference to the driver module.
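      A hedged sketch of the refcounting idea (field and function placement
      are illustrative):

        struct pmu {
            /* ... */
            struct module *module;  /* owning module, NULL for built-in PMUs */
        };

        static int event_init_sketch(struct perf_event *event, struct pmu *pmu)
        {
            /* try_module_get(NULL) succeeds, so built-in PMUs are free. */
            if (!try_module_get(pmu->module))
                return -ENODEV;     /* module is on its way out */
            event->pmu = pmu;
            return 0;
            /* on event teardown: module_put(event->pmu->module); */
        }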
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1395133004-23205-1-git-send-email-zheng.z.yan@intel.com
      Cc: eranian@google.com
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • genirq: Allow forcing cpu affinity of interrupts · 01f8fa4f
      Thomas Gleixner authored
      The current implementation of irq_set_affinity() rightfully refuses
      to route an interrupt to an offline cpu.
      
      But there is a special case, where this is actually desired. Some of
      the ARM SoCs have per cpu timers which require setting the affinity
      during cpu startup where the cpu is not yet in the online mask.
      
      If we can't do that, then the local timer interrupt of the
      about-to-become-online cpu is routed to some random online cpu.
      
      The developers of the affected machines tried to work around that
      issue, but that results in a massive mess in that timer code.
      
      We have a so-far unused argument in the set_affinity callbacks of the
      irq chips, which I added back then for a similar reason. It was never
      required, so it went unused. But I'm happy that I never removed it.
      
      That allows us to implement sane handling of the above scenario: the
      affected SoC drivers can add the required force handling to their
      interrupt chip, switch the timer code to irq_force_affinity(), and
      things just work.
      
      This does not affect any existing user of irq_set_affinity().
      
      Tagged for stable to allow a simple fix of the affected SoC clock
      event drivers.
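      A hedged sketch of the resulting API shape (the force flag is what
      ultimately reaches the irq chip's set_affinity callback):

        /* include/linux/interrupt.h, approximately: one worker, two entry
         * points that differ only in the force flag. */
        extern int __irq_set_affinity(unsigned int irq,
                                      const struct cpumask *m, bool force);

        static inline int irq_set_affinity(unsigned int irq,
                                           const struct cpumask *m)
        {
            return __irq_set_affinity(irq, m, false); /* offline cpus refused */
        }

        static inline int irq_force_affinity(unsigned int irq,
                                             const struct cpumask *m)
        {
            return __irq_set_affinity(irq, m, true);  /* cpu-bringup case */
        }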
      Reported-and-tested-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Tomasz Figa <t.figa@samsung.com>
      Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
      Cc: Kukjin Kim <kgene.kim@samsung.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20140416143315.717251504@linutronix.de
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • uprobes/x86: Send SIGILL if arch_uprobe_post_xol() fails · 014940ba
      Oleg Nesterov authored
      Currently the error from arch_uprobe_post_xol() is silently ignored.
      This doesn't look good, and it can lead to hard-to-debug problems.
      
      1. Change handle_singlestep() to loudly complain and send SIGILL
         (see the sketch after this list).
      
         Note: this only affects x86, ppc/arm can't fail.
      
      2. Change arch_uprobe_post_xol() to call arch_uprobe_abort_xol() and
         avoid TF games if it is going to return an error.
      
         This can help to analyze the problem; if nothing else, we should
         not report ->ip = xol_slot in the core file.
      
         Note: this means that handle_riprel_post_xol() can be called twice,
         but this is fine because it is idempotent.
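      A hedged sketch of point 1 (the real handle_singlestep() differs; the
      signal-on-failure idea is what matters):

        static void handle_singlestep_sketch(struct uprobe_task *utask,
                                             struct pt_regs *regs)
        {
            int err = arch_uprobe_post_xol(&utask->active_uprobe->arch, regs);

            if (err) {
                /* loudly complain and kill the task rather than let it
                 * continue with inconsistent state */
                pr_warn("uprobe: post-xol failed, sending SIGILL\n");
                force_sig(SIGILL, current);
            }
        }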
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
    • uprobes: Kill UPROBE_SKIP_SSTEP and can_skip_sstep() · 8a6b1732
      Oleg Nesterov authored
      UPROBE_COPY_INSN, UPROBE_SKIP_SSTEP, and uprobe->flags must die. This
      patch kills UPROBE_SKIP_SSTEP. I never understood why it was added;
      not only does it not help, it harms.
      
      It can only help to avoid arch_uprobe_skip_sstep() if it was already
      called before and failed. But this is ugly: if we want to know whether
      we can emulate this instruction or not, we should do the analysis in
      arch_uprobe_analyze_insn(), not when we hit this probe for the first
      time.
      
      And in fact this logic is simply wrong. arch_uprobe_skip_sstep() can
      fail or succeed depending on the task/register state: if this insn can
      be emulated but, say, put_user() fails, we need to xol it this time,
      but that doesn't mean we shouldn't try to emulate it when this or
      another thread hits this bp next time.
      
      And this is the actual reason for this change. We need to emulate the
      "call" insn, but push(return-address) can obviously fail.
      
      Per-arch notes:
      
      	x86: __skip_sstep() can only emulate "rep;nop". With this
      	     change it will be called every time and most probably
      	     for no reason.
      
      	     This will be fixed by the next changes. We need to
      	     change this suboptimal code anyway.
      
      	arm: Should not be affected. It has its own "bool simulate"
      	     flag checked in arch_uprobe_skip_sstep().
      
      	ppc: Looks like, it can emulate almost everything. Does it
      	     actually need to record the fact that emulate_step()
      	     failed? Hopefully not. But if yes, it can add the ppc-
      	     specific flag into arch_uprobe.
      
      TODO: rename arch_uprobe_skip_sstep() to arch_uprobe_emulate_insn().
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Reviewed-by: David A. Long <dave.long@linaro.org>
      Reviewed-by: Jim Keniston <jkenisto@us.ibm.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
  9. 17 Apr 2014 (4 commits)
  10. 16 Apr 2014 (3 commits)
  11. 15 Apr 2014 (2 commits)
    • user namespace: fix incorrect memory barriers · e79323bd
      Mikulas Patocka authored
      smp_read_barrier_depends() can be used only if there is a data dependency
      between the reads, i.e. if the read after the barrier uses an address
      that was obtained by the read before the barrier.
      
      In this file there is only a control dependency, no data dependency, so
      the use of smp_read_barrier_depends() is incorrect. The code could fail
      in the following way:
      * the cpu predicts that idx < extents is true and starts executing the
        body of the for loop
      * the cpu fetches map->extent[0].first and map->extent[0].count
      * the cpu fetches map->nr_extents
      * the cpu verifies that idx < extents is true, so it commits the
        instructions in the body of the for loop
      
      The problem is that in this scenario the cpu reads map->extent[0].first
      and map->nr_extents in the wrong order. We need a full read memory
      barrier to prevent that.
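      A hedged sketch of the corrected reader, loosely following
      map_id_range_down() in kernel/user_namespace.c:

        static u32 map_lookup_sketch(struct uid_gid_map *map, u32 id)
        {
            unsigned extents = map->nr_extents;
            unsigned idx;

            /* Was smp_read_barrier_depends(): insufficient, because no
             * load data-depends on nr_extents; only the loop condition
             * does, and control dependencies do not order loads. */
            smp_rmb();

            for (idx = 0; idx < extents; idx++) {
                u32 first = map->extent[idx].first;

                if (id >= first && id - first < map->extent[idx].count)
                    return map->extent[idx].lower_first + (id - first);
            }
            return (u32) -1;
        }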
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF · 2eac7648
      Daniel Borkmann authored
      Linus reports that on 32-bit x86, Chromium triggers the following
      seccomp audit log messages:
      
        audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
      gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
      pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
      syscall=172 compat=0 ip=0xb2dd9852 code=0x30000
      
        audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
      gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
      pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
      compat=0 ip=0xb2dd9852 code=0x50000
      
      These audit messages are being triggered via audit_seccomp() through
      __secure_computing() in seccomp mode (BPF) filter with seccomp return
      codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
      during filter runtime. Moreover, Linus reports that x86_64 Chromium
      seems fine.
      
      The underlying issue that explains this is that the implementation of
      populate_seccomp_data() is wrong. Our seccomp data structure sd, which
      is shared as user ABI, is:
      
        struct seccomp_data {
          int nr;
          __u32 arch;
          __u64 instruction_pointer;
          __u64 args[6];
        };
      
      Therefore, a simple cast to 'unsigned long *' for storing the values of
      the syscall arguments via syscall_get_arguments() is just wrong: on
      32-bit x86 (or any other 32-bit arch) it stores a0-a5 at the wrong
      offsets in the args[] member, and thus i) could leak stack memory to
      user space and ii) tampers with the logic of seccomp BPF programs that
      read out and check syscall arguments:
      
        syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);
      
      Tested on 32-bit x86 with Google Chrome (unfortunately only via a remote
      test machine over slow ssh X forwarding), and it fixes the issue on my
      side. So fix it up by storing the args in correctly typed variables; gcc
      is clever and optimizes the copy away in other cases, e.g. x86_64.
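      A hedged sketch of the fix in populate_seccomp_data(): fetch the
      arguments into native unsigned longs, then widen them explicitly into
      the ABI's __u64 args[] (the loop is illustrative; the effect matches
      the patch):

        static void populate_seccomp_data_sketch(struct seccomp_data *sd,
                                                 struct task_struct *task,
                                                 struct pt_regs *regs)
        {
            unsigned long args[6];
            int i;

            syscall_get_arguments(task, regs, 0, 6, args);
            for (i = 0; i < 6; i++)
                sd->args[i] = args[i];  /* zero-extends on 32-bit */
        }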
      
      Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set")
      Reported-and-bisected-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 13 Apr 2014 (1 commit)
  13. 12 Apr 2014 (1 commit)
  14. 11 Apr 2014 (3 commits)
  15. 10 Apr 2014 (1 commit)
  16. 09 Apr 2014 (4 commits)
    • futex: avoid race between requeue and wake · 69cd9eba
      Linus Torvalds authored
      Jan Stancek reported:
       "pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
        occasionally fails, because some threads fail to wake up.
      
        Testcase creates 5 threads, which are all waiting on same condition.
        Main thread then calls pthread_cond_broadcast() without holding mutex,
        which calls:
      
            futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)
      
        This immediately wakes up single thread A, which unlocks mutex and
        tries to wake up another thread:
      
            futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)
      
        If thread A manages to call futex_wake() before any waiters are
        requeued for uaddr2, no other thread is woken up"
      
      The ordering constraints for the hash bucket waiter counting are that
      the waiter counts have to be incremented _before_ getting the spinlock
      (because the spinlock acts as part of the memory barrier), but the
      "requeue" operation didn't honor those rules, and nobody had even
      thought about that case.
      
      This fairly simple patch just increments the waiter count for the target
      hash bucket (hb2) when requeueing a futex, before taking the locks.  It
      then decrements it again after releasing the lock; the code that
      actually moves the futex(es) between hash buckets will do the additional
      required waiter count housekeeping.
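      A hedged sketch of the ordering inside futex_requeue() (helper names
      follow kernel/futex.c of that era; placement is illustrative):

        hb_waiters_inc(hb2);    /* make the target bucket look non-empty
                                 * BEFORE taking the locks, so a concurrent
                                 * futex_wake(uaddr2) cannot bail out early */
        double_lock_hb(hb1, hb2);

        /* ... requeue waiters from hb1 to hb2; the moves do their own
         * per-waiter accounting ... */

        double_unlock_hb(hb1, hb2);
        hb_waiters_dec(hb2);    /* drop the temporary count */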
      Reported-and-tested-by: Jan Stancek <jstancek@redhat.com>
      Acked-by: Davidlohr Bueso <davidlohr@hp.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org # 3.14
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tracepoint: Fix sparse warnings in tracepoint.c · b725dfea
      Mathieu Desnoyers authored
      Fix the following sparse warnings:
      
        CHECK   kernel/tracepoint.c
      kernel/tracepoint.c:184:18: warning: incorrect type in assignment (different address spaces)
      kernel/tracepoint.c:184:18:    expected struct tracepoint_func *tp_funcs
      kernel/tracepoint.c:184:18:    got struct tracepoint_func [noderef] <asn:4>*funcs
      kernel/tracepoint.c:216:18: warning: incorrect type in assignment (different address spaces)
      kernel/tracepoint.c:216:18:    expected struct tracepoint_func *tp_funcs
      kernel/tracepoint.c:216:18:    got struct tracepoint_func [noderef] <asn:4>*funcs
      kernel/tracepoint.c:392:24: error: return expression in void function
        CC      kernel/tracepoint.o
      kernel/tracepoint.c: In function tracepoint_module_going:
      kernel/tracepoint.c:491:6: warning: symbol 'syscall_regfunc' was not declared. Should it be static?
      kernel/tracepoint.c:508:6: warning: symbol 'syscall_unregfunc' was not declared. Should it be static?
      
      Link: http://lkml.kernel.org/r/1397049883-28692-1-git-send-email-mathieu.desnoyers@efficios.com
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracepoint: Simplify tracepoint module search · eb7d035c
      Steven Rostedt (Red Hat) authored
      Instead of copying num_tracepoints and tracepoints_ptrs from
      the module structure to the tp_mod structure, which only uses them to
      find the module associated with tracepoints of modules that are coming
      and going, simply copy the pointer to the module struct into the
      tracepoint tp_module structure.
      
      Also removed unneeded braces around an if statement.
      
      Link: http://lkml.kernel.org/r/20140408201705.4dad2c4a@gandalf.local.home
      Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints · de7b2973
      Mathieu Desnoyers authored
      Register/unregister tracepoint probes with struct tracepoint pointer
      rather than tracepoint name.
      
      This change, which vastly simplifies tracepoint.c, was proposed by
      Steven Rostedt. It also removes 8.8kB (mostly text) from the vmlinux
      size.
      
      From this point on, tracers need to pass a struct tracepoint pointer
      to probe register/unregister. A probe can now only be connected to a
      tracepoint that exists. Moreover, tracers are responsible for
      unregistering the probe before the module containing its associated
      tracepoint is unloaded. (A sketch of the new API follows the size
      comparison below.)
      
         text    data     bss     dec     hex filename
      10443444        4282528 10391552        25117524        17f4354 vmlinux.orig
      10434930        4282848 10391552        25109330        17f2352 vmlinux
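      A hedged sketch of the resulting API from a tracer's point of view
      (the probe's argument list is illustrative; the __tracepoint_<name>
      symbol is emitted by the tracepoint definition macros):

        extern struct tracepoint __tracepoint_sched_switch;

        static void my_probe(void *data, struct task_struct *prev,
                             struct task_struct *next)
        {
            /* called every time the tracepoint fires */
        }

        static int __init my_tracer_init(void)
        {
            /* Register against the tracepoint struct itself, not a name:
             * the tracepoint must exist, and we must unregister before the
             * module defining it is unloaded. */
            return tracepoint_probe_register(&__tracepoint_sched_switch,
                                             my_probe, NULL);
        }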
      
      Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com
      
      CC: Ingo Molnar <mingo@kernel.org>
      CC: Frederic Weisbecker <fweisbec@gmail.com>
      CC: Andrew Morton <akpm@linux-foundation.org>
      CC: Frank Ch. Eigler <fche@redhat.com>
      CC: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      [ SDR - fixed return val in void func in tracepoint_module_going() ]
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>