1. 18 3月, 2011 2 次提交
    • M
      call_function_many: add missing ordering · 45a57919
      Milton Miller 提交于
      Paul McKenney's review pointed out two problems with the barriers in the
      2.6.38 update to the smp call function many code.
      
      First, a barrier that would force the func and info members of data to
      be visible before their consumption in the interrupt handler was
      missing.  This can be solved by adding a smp_wmb between setting the
      func and info members and setting setting the cpumask; this will pair
      with the existing and required smp_rmb ordering the cpumask read before
      the read of refs.  This placement avoids the need a second smp_rmb in
      the interrupt handler which would be executed on each of the N cpus
      executing the call request.  (I was thinking this barrier was present
      but was not).
      
      Second, the previous write to refs (establishing the zero that we the
      interrupt handler was testing from all cpus) was performed by a third
      party cpu.  This would invoke transitivity which, as a recient or
      concurrent addition to memory-barriers.txt now explicitly states, would
      require a full smp_mb().
      
      However, we know the cpumask will only be set by one cpu (the data
      owner) and any preivous iteration of the mask would have cleared by the
      reading cpu.  By redundantly writing refs to 0 on the owning cpu before
      the smp_wmb, the write to refs will follow the same path as the writes
      that set the cpumask, which in turn allows us to keep the barrier in the
      interrupt handler a smp_rmb instead of promoting it to a smp_mb (which
      will be be executed by N cpus for each of the possible M elements on the
      list).
      
      I moved and expanded the comment about our (ab)use of the rcu list
      primitives for the concurrent walk earlier into this function.  I
      considered moving the first two paragraphs to the queue list head and
      lock, but felt it would have been too disconected from the code.
      
      Cc: Paul McKinney <paulmck@linux.vnet.ibm.com>
      Cc: stable@kernel.org (2.6.32 and later)
      Signed-off-by: NMilton Miller <miltonm@bga.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      45a57919
    • M
      call_function_many: fix list delete vs add race · e6cd1e07
      Milton Miller 提交于
      Peter pointed out there was nothing preventing the list_del_rcu in
      smp_call_function_interrupt from running before the list_add_rcu in
      smp_call_function_many.
      
      Fix this by not setting refs until we have gotten the lock for the list.
      Take advantage of the wmb in list_add_rcu to save an explicit additional
      one.
      
      I tried to force this race with a udelay before the lock & list_add and
      by mixing all 64 online cpus with just 3 random cpus in the mask, but
      was unsuccessful.  Still, inspection shows a valid race, and the fix is
      a extension of the existing protection window in the current code.
      
      Cc: stable@kernel.org (v2.6.32 and later)
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NMilton Miller <miltonm@bga.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e6cd1e07
  2. 17 3月, 2011 1 次提交
  3. 15 3月, 2011 13 次提交
  4. 14 3月, 2011 4 次提交
    • K
      printk: do not mangle valid userspace syslog prefixes · 9d90c8d9
      Kay Sievers 提交于
      printk: do not mangle valid userspace syslog prefixes with /dev/kmsg
      
      Log messages passed to the kernel log by using /dev/kmsg or /dev/ttyprintk
      might contain a syslog prefix including the syslog facility value.
      
      This makes printk to recognize these headers properly, extract the real log
      level from it to use, and add the prefix as a proper prefix to the
      log buffer, instead of wrongly printing it as the log message text.
      
      Before:
        $ echo '<14>text' > /dev/kmsg
        $ dmesg -r
        <4>[135159.594810] <14>text
      
      After:
        $ echo '<14>text' > /dev/kmsg
        $ dmesg -r
        <14>[   50.750654] text
      
      Cc: Lennart Poettering <lennart@poettering.net>
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      9d90c8d9
    • A
      open-style analog of vfs_path_lookup() · 73d049a4
      Al Viro 提交于
      new function: file_open_root(dentry, mnt, name, flags) opens the file
      vfs_path_lookup would arrive to.
      
      Note that name can be empty; in that case the usual requirement that
      dentry should be a directory is lifted.
      
      open-coded equivalents switched to it, may_open() got down exactly
      one caller and became static.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      73d049a4
    • A
      kill path_lookup() · c9c6cac0
      Al Viro 提交于
      all remaining callers pass LOOKUP_PARENT to it, so
      flags argument can die; renamed to kern_path_parent()
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      c9c6cac0
    • A
      fix race in audit_get_nd() · 15a9155f
      Al Viro 提交于
      don't rely on pathname resolution ending up twice at the same point...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      15a9155f
  5. 13 3月, 2011 1 次提交
    • T
      posix-clocks: Check write permissions in posix syscalls · 6e6823d1
      Torben Hohn 提交于
      pc_clock_settime() and pc_clock_adjtime() do not check whether the fd
      was opened in write mode, so a clock can be set with a read only fd.
      
      [ tglx: We deliberately do not return -EPERM as we want this to be
        	distingushable from the capability based permission check ]
      Signed-off-by: NTorben Hohn <torbenh@gmx.de>
      LKML-Reference: <1299173174-348-4-git-send-email-torbenh@gmx.de>
      Cc: Richard Cochran <richard.cochran@omicron.at>
      Cc: John Stultz <johnstul@us.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      6e6823d1
  6. 12 3月, 2011 3 次提交
    • T
      genirq: Add chip flag to force mask on suspend · d209a699
      Thomas Gleixner 提交于
      On suspend we disable all interrupts in the core code, but this does
      not mask the interrupt line in the default implementation as we use a
      lazy disable approach. That means we mark the interrupt disabled, but
      leave the hardware unmasked. That's an optimization because we avoid
      the hardware access for the common case where no interrupt happens
      after we marked it disabled. If an interrupt happens, then the
      interrupt flow handler masks the line at the hardware level and marks
      it pending.
      
      Suspend makes use of this delayed disable as it "disables" all
      interrupts when preparing the suspend transition. Right before the
      system goes into hardware suspend state it checks whether one of the
      interrupts which is marked as a wakeup interrupt came in after
      disabling it.
      
      Most interrupt chips have a separate register which selects the
      interrupts which can wake up the system from suspend, so we don't have
      to mask any on the non wakeup interrupts.
      
      But now we have to deal with brilliant designed hardware which lacks
      such a wakeup configuration facility. For such hardware it's necessary
      to mask all non wakeup interrupts before going into suspend in order
      to avoid the wakeup from random interrupts.
      
      Rather than working around this in the affected interrupt chip
      implementations we can solve this elegant in the core code itself.
      
      Add a flag IRQCHIP_MASK_ON_SUSPEND which can be set by the irq chip
      implementation to indicate, that the interrupts which are not selected
      as wakeup sources must be masked in the suspend path. Mask them in the
      loop which checks the wakeup interrupts pending flag.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NAbhijeet Dharmapurikar <adharmap@codeaurora.org>
      LKML-Reference: <alpine.LFD.2.00.1103112112310.2787@localhost6.localdomain6>
      d209a699
    • L
      futex,plist: Remove debug lock assignment from plist_node · 017f2b23
      Lai Jiangshan 提交于
      The original code uses &plist_node->plist as the fake head of
      the priority list for plist_del(), these debug locks in
      the fake head are needed for CONFIG_DEBUG_PI_LIST.
      
      But now we always pass the real head to plist_del(), the debug locks
      in plist_node will not be used, so we remove these assignments.
      Acked-by: NDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4D10797E.7040803@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      017f2b23
    • L
      futex,plist: Pass the real head of the priority list to plist_del() · 2e12978a
      Lai Jiangshan 提交于
      Some plist_del()s in kernel/futex.c are passed a faked head of the
      priority list.
      
      It does not fail because the current code does not require the real head
      in plist_del(). The current code of plist_del() just uses the head for checking,
      so it will not cause a bad result even when we use a faked head.
      
      But it is undocumented usage:
      
      /**
       * plist_del - Remove a @node from plist.
       *
       * @node:	&struct plist_node pointer - entry to be removed
       * @head:	&struct plist_head pointer - list head
       */
      
      The document says that the @head is the "list head" head of the priority list.
      
      In futex code, several places use "plist_del(&q->list, &q->list.plist);",
      they pass a fake head. We need to fix them all.
      
      Thanks to Darren Hart for many suggestions.
      Acked-by: NDarren Hart <dvhart@linux.intel.com>
      Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      LKML-Reference: <4D11984A.5030203@cn.fujitsu.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      2e12978a
  7. 11 3月, 2011 5 次提交
    • M
      futex: Sanitize cmpxchg_futex_value_locked API · 37a9d912
      Michel Lespinasse 提交于
      The cmpxchg_futex_value_locked API was funny in that it returned either
      the original, user-exposed futex value OR an error code such as -EFAULT.
      This was confusing at best, and could be a source of livelocks in places
      that retry the cmpxchg_futex_value_locked after trying to fix the issue
      by running fault_in_user_writeable().
          
      This change makes the cmpxchg_futex_value_locked API more similar to the
      get_futex_value_locked one, returning an error code and updating the
      original value through a reference argument.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>  [tile]
      Acked-by: Tony Luck <tony.luck@intel.com>  [ia64]
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Tested-by: Michal Simek <monstr@monstr.eu>  [microblaze]
      Acked-by: David Howells <dhowells@redhat.com> [frv]
      Cc: Darren Hart <darren@dvhart.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <20110311024851.GC26122@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      37a9d912
    • T
      futex: Avoid redudant evaluation of task_pid_vnr() · c0c9ed15
      Thomas Gleixner 提交于
      The result is not going to change under us, so no need to reevaluate
      this over and over. Seems to be a leftover from the mechanical mass
      conversion of task->pid to task_pid_vnr(tsk).
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      c0c9ed15
    • T
      SUNRPC: Close a race in __rpc_wait_for_completion_task() · bf294b41
      Trond Myklebust 提交于
      Although they run as rpciod background tasks, under normal operation
      (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
      and nfs4_do_close() want to be fully synchronous. This means that when we
      exit, we want all references to the rpc_task to be gone, and we want
      any dentry references etc. held by that task to be released.
      
      For this reason these functions call __rpc_wait_for_completion_task(),
      followed by rpc_put_task() in the expectation that the latter will be
      releasing the last reference to the rpc_task, and thus ensuring that the
      callback_ops->rpc_release() has been called synchronously.
      
      This patch fixes a race which exists due to the fact that
      rpciod calls rpc_complete_task() (in order to wake up the callers of
      __rpc_wait_for_completion_task()) and then subsequently calls
      rpc_put_task() without ensuring that these two steps are done atomically.
      
      In order to avoid adding new spin locks, the patch uses the existing
      waitqueue spin lock to order the rpc_task reference count releases between
      the waiting process and rpciod.
      The common case where nobody is waiting for completion is optimised for by
      checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
      reference count is 1: in those cases we drop trying to grab the spin lock,
      and immediately free up the rpc_task.
      
      Those few processes that need to put the rpc_task from inside an
      asynchronous context and that do not care about ordering are given a new
      helper: rpc_put_task_async().
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      bf294b41
    • M
      futex: Update futex_wait_setup comments about locking · 8fe8f545
      Michel Lespinasse 提交于
      Reviving a cleanup I had done about a year ago as part of a larger
      futex_set_wait proposal. Over the years, the locking of the hashed
      futex queue got improved, so that some of the "rare but normal" race
      conditions described in comments can't actually happen anymore.
      Signed-off-by: NMichel Lespinasse <walken@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Darren Hart <dvhltc@us.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      LKML-Reference: <20110307020750.GA31188@google.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      8fe8f545
    • T
      hrtimer: Remove empty hrtimer_init_hres_timer() · a9e7acff
      Thomas Gleixner 提交于
      Leftover from earlier implementation. All empty, remove it.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      a9e7acff
  8. 10 3月, 2011 9 次提交
    • S
      tracing: Fix irqoff selftest expanding max buffer · 4a0b1665
      Steven Rostedt 提交于
      If the kernel command line declares a tracer "ftrace=sometracer" and
      that tracer is either not defined or is enabled after irqsoff,
      then the irqs off selftest will fail with the following error:
      
      Testing tracer irqsoff:
      ------------[ cut here ]------------
      WARNING: at /home/rostedt/work/autotest/nobackup/linux-test.git/kernel/trace/tra
      ce.c:713 update_max_tr_single+0xfa/0x11b()
      Hardware name:
      Modules linked in:
      Pid: 1, comm: swapper Not tainted 2.6.38-rc8-test #1
      Call Trace:
       [<c0441d9d>] ? warn_slowpath_common+0x65/0x7a
       [<c049adb2>] ? update_max_tr_single+0xfa/0x11b
       [<c0441dc1>] ? warn_slowpath_null+0xf/0x13
       [<c049adb2>] ? update_max_tr_single+0xfa/0x11b
       [<c049e454>] ? stop_critical_timing+0x154/0x204
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049e529>] ? time_hardirqs_on+0x25/0x28
       [<c0468bca>] ? trace_hardirqs_on_caller+0x18/0x12f
       [<c0468cec>] ? trace_hardirqs_on+0xb/0xd
       [<c049b54b>] ? trace_selftest_startup_irqsoff+0x5b/0xc1
       [<c049b6b8>] ? register_tracer+0xf8/0x1a3
       [<c14e93fe>] ? init_irqsoff_tracer+0xd/0x11
       [<c040115e>] ? do_one_initcall+0x71/0x121
       [<c14e93f1>] ? init_irqsoff_tracer+0x0/0x11
       [<c14ce3a9>] ? kernel_init+0x13a/0x1b6
       [<c14ce26f>] ? kernel_init+0x0/0x1b6
       [<c0403842>] ? kernel_thread_helper+0x6/0x10
      ---[ end trace e93713a9d40cd06c ]---
      .. no entries found ..FAILED!
      
      What happens is the "ftrace=..." will expand the ring buffer to its
      default size (from its minimum size) but it will not expand the
      max ring buffer (the ring buffer to store maximum latencies).
      When the irqsoff test runs, it will call the ring buffer swap routine
      that checks if the max ring buffer is the same size as the normal
      ring buffer, and will fail if it is not. This causes the test to fail.
      
      The solution is to expand the max ring buffer before running the self
      test if the max ring buffer is used by that tracer and the normal ring
      buffer is expanded. The max ring buffer should be shrunk again after
      the test is done to save space.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4a0b1665
    • S
      tracing: Align 4 byte ints together in struct tracer · 9a24470b
      Steven Rostedt 提交于
      Move elements in struct tracer for better alignment.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      9a24470b
    • Y
      tracing: Export trace_set_clr_event() · 56355b83
      Yuanhan Liu 提交于
      Trace events belonging to a module only exists when the module is
      loaded. Well, we can use trace_set_clr_event funtion to enable some
      trace event at the module init routine, so that we will not miss
      something while loading then module.
      
      So, Export the trace_set_clr_event function so that module can use it.
      Signed-off-by: NYuanhan Liu <yuanhan.liu@linux.intel.com>
      LKML-Reference: <1289196312-25323-1-git-send-email-yuanhan.liu@linux.intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      56355b83
    • J
      tracing: Explain about unstable clock on resume with ring buffer warning · 31274d72
      Jiri Olsa 提交于
      The "Delta way too big" warning might appear on a system with a
      unstable shed clock right after the system is resumed and tracing
      was enabled at time of suspend.
      
      Since it's not realy a bug, and the unstable sched clock is working
      fast and reliable otherwise, Steven suggested to keep using the
      sched clock in any case and just to make note in the warning itself.
      
      v2 changes:
      - added #ifdef CONFIG_HAVE_UNSTABLE_SCHED_CLOCK
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      LKML-Reference: <20110218145219.GD2604@jolsa.brq.redhat.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      31274d72
    • D
      tracing: Adjust conditional expression latency formatting. · 10da37a6
      David Sharp 提交于
      Formatting change only to improve code readability. No code changes except to
      introduce intermediate variables.
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      LKML-Reference: <1291421609-14665-13-git-send-email-dhsharp@google.com>
      
      [ Keep variable declarations and assignment separate ]
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      10da37a6
    • D
      tracing: Fix event alignment: ftrace:context_switch and ftrace:wakeup · 140e4f2d
      David Sharp 提交于
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      LKML-Reference: <1291421609-14665-6-git-send-email-dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      140e4f2d
    • S
      tracing: Remove lock_depth from event entry · e6e1e259
      Steven Rostedt 提交于
      The lock_depth field in the event headers was added as a temporary
      data point for help in removing the BKL. Now that the BKL is pretty
      much been removed, we can remove this field.
      
      This in turn changes the header from 12 bytes to 8 bytes,
      removing the 4 byte buffer that gcc would insert if the first field
      in the data load was 8 bytes in size.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e6e1e259
    • D
      ring-buffer: Remove unused #include <linux/trace_irq.h> · de29be5e
      David Sharp 提交于
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      LKML-Reference: <1291421609-14665-3-git-send-email-dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      de29be5e
    • D
      tracing: Add an 'overwrite' trace_option. · 750912fa
      David Sharp 提交于
      Add an "overwrite" trace_option for ftrace to control whether the buffer should
      be overwritten on overflow or not. The default remains to overwrite old events
      when the buffer is full. This patch adds the option to instead discard newest
      events when the buffer is full. This is useful to get a snapshot of traces just
      after enabling traces. Dropping the current event is also a simpler code path.
      Signed-off-by: NDavid Sharp <dhsharp@google.com>
      LKML-Reference: <1291844807-15481-1-git-send-email-dhsharp@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      750912fa
  9. 09 3月, 2011 1 次提交
  10. 08 3月, 2011 1 次提交
    • S
      debugobjects: Add hint for better object identification · 99777288
      Stanislaw Gruszka 提交于
      In complex subsystems like mac80211 structures can contain several
      timers and work structs, so identifying a specific instance from the
      call trace and object type output of debugobjects can be hard.
      
      Allow the subsystems which support debugobjects to provide a hint
      function. This function returns a pointer to a kernel address
      (preferrably the objects callback function) which is printed along
      with the debugobjects type.
      
      Add hint methods for timer_list, work_struct and hrtimer.
      
      [ tglx: Massaged changelog, made it compile ]
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      LKML-Reference: <20110307085809.GA9334@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      99777288