1. 27 6月, 2013 1 次提交
    • N
      net: fix kernel deadlock with interface rename and netdev name retrieval. · 5dbe7c17
      Nicolas Schichan 提交于
      When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
      rename of a network interface, it can end up waiting for a workqueue
      to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
      SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
      the fact that read_secklock_begin() will spin forever waiting for the
      writer process (the one doing the interface rename) to update the
      devnet_rename_seq sequence.
      
      This patch fixes the problem by adding a helper (netdev_get_name())
      and using it in the code handling the SIOCGIFNAME ioctl and
      SO_BINDTODEVICE setsockopt.
      
      The netdev_get_name() helper uses raw_seqcount_begin() to avoid
      spinning forever, waiting for devnet_rename_seq->sequence to become
      even. cond_resched() is used in the contended case, before retrying
      the access to give the writer process a chance to finish.
      
      The use of raw_seqcount_begin() will incur some unneeded work in the
      reader process in the contended case, but this is better than
      deadlocking the system.
      Signed-off-by: NNicolas Schichan <nschichan@freebox.fr>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5dbe7c17
  2. 26 6月, 2013 1 次提交
  3. 24 6月, 2013 1 次提交
    • R
      ACPI / dock / PCI: Synchronous handling of dock events for PCI devices · 21a31013
      Rafael J. Wysocki 提交于
      The interactions between the ACPI dock driver and the ACPI-based PCI
      hotplug (acpiphp) are currently problematic because of ordering
      issues during hot-remove operations.
      
      First of all, the current ACPI glue code expects that physical
      devices will always be deleted before deleting the companion ACPI
      device objects.  Otherwise, acpi_unbind_one() will fail with a
      warning message printed to the kernel log, for example:
      
      [  185.026073] usb usb5: Oops, 'acpi_handle' corrupt
      [  185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
      [  185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
      [  180.013656]  port1: Oops, 'acpi_handle' corrupt
      
      This means, in particular, that struct pci_dev objects have to
      be deleted before the struct acpi_device objects they are "glued"
      with.
      
      Now, the following happens the during the undocking of an ACPI-based
      dock station:
       1) hotplug_dock_devices() invokes registered hotplug callbacks to
          destroy physical devices associated with the ACPI device objects
          depending on the dock station.  It calls dd->ops->handler() for
          each of those device objects.
       2) For PCI devices dd->ops->handler() points to
          handle_hotplug_event_func() that queues up a separate work item
          to execute _handle_hotplug_event_func() for the given device and
          returns immediately.  That work item will be executed later.
       3) hotplug_dock_devices() calls dock_remove_acpi_device() for each
          device depending on the dock station.  This runs acpi_bus_trim()
          for each of them, which causes the underlying ACPI device object
          to be destroyed, but the work items queued up by
          handle_hotplug_event_func() haven't been started yet.
       4) _handle_hotplug_event_func() queued up in step 2) are executed
          and cause the above failure to happen, because the PCI devices
          they handle do not have the companion ACPI device objects any
          more (those objects have been deleted in step 3).
      
      The possible breakage doesn't end here, though, because
      hotplug_dock_devices() may return before at least some of the
      _handle_hotplug_event_func() work items spawned by it have a
      chance to complete and then undock() will cause _DCK to be
      evaluated and that will cause the devices handled by the
      _handle_hotplug_event_func() to go away possibly while they are
      being accessed.
      
      This means that dd->ops->handler() for PCI devices should not point
      to handle_hotplug_event_func().  Instead, it should point to a
      function that will do the work of _handle_hotplug_event_func()
      synchronously.  For this reason, introduce such a function,
      hotplug_event_func(), and modity acpiphp_dock_ops to point to
      it as the handler.
      
      Unfortunately, however, this is not sufficient, because if the dock
      code were not changed further, hotplug_event_func() would now
      deadlock with hotplug_dock_devices() that called it, since it would
      run unregister_hotplug_dock_device() which in turn would attempt to
      acquire the dock station's hp_lock mutex already acquired by
      hotplug_dock_devices().
      
      To resolve that deadlock use the observation that
      unregister_hotplug_dock_device() won't need to acquire hp_lock
      if PCI bridges the devices on the dock station depend on are
      prevented from being removed prematurely while the first loop in
      hotplug_dock_devices() is in progress.
      
      To make that possible, introduce a mechanism by which the callers of
      register_hotplug_dock_device() can provide "init" and "release"
      routines that will be executed, respectively, during the addition
      and removal of the physical device object associated with the
      given ACPI device handle.  Make acpiphp use two new functions,
      acpiphp_dock_init() and acpiphp_dock_release(), that call
      get_bridge() and put_bridge(), respectively, on the acpiphp bridge
      holding the given device, for this purpose.
      
      In addition to that, remove the dock station's list of
      "hotplug devices" and make the dock code always walk the whole list
      of "dependent devices" instead in such a way that the loops in
      hotplug_dock_devices() and dock_event() (replacing the loops over
      "hotplug devices") will take references to the list entries that
      register_hotplug_dock_device() has been called for.  That prevents
      the "release" routines associated with those entries from being
      called while the given entry is being processed and for PCI
      devices this means that their bridges won't be removed (by a
      concurrent thread) while hotplug_event_func() handling them is
      being executed.
      
      This change is based on two earlier patches from Jiang Liu.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=59501Reported-and-tested-by: NAlexander E. Patrakov <patrakov@gmail.com>
      Tracked-down-by: NJiang Liu <jiang.liu@huawei.com>
      Tested-by: NIllya Klymov <xanf@xanf.me>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Cc: 3.9+ <stable@vger.kernel.org>
      21a31013
  4. 20 6月, 2013 4 次提交
  5. 19 6月, 2013 1 次提交
    • S
      tracing/context-tracking: Add preempt_schedule_context() for tracing · 29bb9e5a
      Steven Rostedt 提交于
      Dave Jones hit the following bug report:
      
       ===============================
       [ INFO: suspicious RCU usage. ]
       3.10.0-rc2+ #1 Not tainted
       -------------------------------
       include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
       other info that might help us debug this:
       RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       2 locks held by cc1/63645:
        #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
        #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
      
       CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
       Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
        0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
        ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
        0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
       Call Trace:
        [<ffffffff816ae383>] dump_stack+0x19/0x1b
        [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
        [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
        [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
        [<ffffffff8108dffc>] update_curr+0xec/0x240
        [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
        [<ffffffff816b3a71>] __schedule+0x161/0x9b0
        [<ffffffff816b4721>] preempt_schedule+0x51/0x80
        [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
        [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
        [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
        [<ffffffff816be280>] ftrace_call+0x5/0x2f
        [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
        [<ffffffff816b4805>] ? schedule_user+0x5/0x70
        [<ffffffff816b4805>] ? schedule_user+0x5/0x70
        [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
       ------------[ cut here ]------------
      
      What happened was that the function tracer traced the schedule_user() code
      that tells RCU that the system is coming back from userspace, and to
      add the CPU back to the RCU monitoring.
      
      Because the function tracer does a preempt_disable/enable_notrace() calls
      the preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
      then preempt_schedule() is called. But this is called before the user_exit()
      function can inform the kernel that the CPU is no longer in user mode and
      needs to be accounted for by RCU.
      
      The fix is to create a new preempt_schedule_context() that checks if
      the kernel is still in user mode and if so to switch it to kernel mode
      before calling schedule. It also switches back to user mode coming back
      from schedule in need be.
      
      The only user of this currently is the preempt_enable_notrace(), which is
      only used by the tracing subsystem.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.homeSigned-off-by: NIngo Molnar <mingo@kernel.org>
      29bb9e5a
  6. 15 6月, 2013 1 次提交
    • D
      smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu(). · f21afc25
      David Daney 提交于
      Thanks to commit f91eb62f ("init: scream bloody murder if interrupts
      are enabled too early"), "bloody murder" is now being screamed.
      
      With a MIPS OCTEON config, we use on_each_cpu() in our
      irq_chip.irq_bus_sync_unlock() function.  This gets called in early as a
      result of the time_init() call.  Because the !SMP version of
      on_each_cpu() unconditionally enables irqs, we get:
      
          WARNING: at init/main.c:560 start_kernel+0x250/0x410()
          Interrupts were enabled early
          CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
          Call Trace:
            show_stack+0x68/0x80
            warn_slowpath_common+0x78/0xb0
            warn_slowpath_fmt+0x38/0x48
            start_kernel+0x250/0x410
      
      Suggested fix: Do what we already do in the SMP version of
      on_each_cpu(), and use local_irq_save/local_irq_restore.  Because we
      need a flags variable, make it a static inline to avoid name space
      issues.
      
      [ Change from v1: Convert on_each_cpu to a static inline function, add
        #include <linux/irqflags.h> to avoid build breakage on some files.
      
        on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
        on_each_cpu(), but they are not causing !SMP bugs for me, so I will
        defer changing them to a less urgent patch. ]
      Signed-off-by: NDavid Daney <david.daney@cavium.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f21afc25
  7. 13 6月, 2013 5 次提交
  8. 12 6月, 2013 2 次提交
  9. 11 6月, 2013 3 次提交
  10. 07 6月, 2013 2 次提交
  11. 06 6月, 2013 1 次提交
    • P
      arch, mm: Remove tlb_fast_mode() · 29eb7782
      Peter Zijlstra 提交于
      Since the introduction of preemptible mmu_gather TLB fast mode has been
      broken. TLB fast mode relies on there being absolutely no concurrency;
      it frees pages first and invalidates TLBs later.
      
      However now we can get concurrency and stuff goes *bang*.
      
      This patch removes all tlb_fast_mode() code; it was found the better
      option vs trying to patch the hole by entangling tlb invalidation with
      the scheduler.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
      Reported-by: NMax Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      29eb7782
  12. 05 6月, 2013 1 次提交
  13. 04 6月, 2013 1 次提交
  14. 03 6月, 2013 2 次提交
  15. 01 6月, 2013 3 次提交
  16. 31 5月, 2013 4 次提交
    • F
      kvm: Move guest entry/exit APIs to context_tracking · 521921ba
      Frederic Weisbecker 提交于
      The kvm_host.h header file doesn't handle well
      inclusion when archs don't support KVM.
      
      This results in build crashes for such archs when they
      want to implement context tracking because this subsystem
      includes kvm_host.h in order to implement the
      guest_enter/exit APIs but it doesn't handle KVM off case.
      
      To fix this, move the guest_enter()/guest_exit()
      declarations and generic implementation to the context
      tracking headers. These generic APIs actually belong to
      this subsystem, besides other domains boundary tracking
      like user_enter() et al.
      
      KVM now properly becomes a user of this library, not the
      other buggy way around.
      Reported-by: NKevin Hilman <khilman@linaro.org>
      Reviewed-by: NKevin Hilman <khilman@linaro.org>
      Tested-by: NKevin Hilman <khilman@linaro.org>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      521921ba
    • F
      vtime: Use consistent clocks among nohz accounting · 45eacc69
      Frederic Weisbecker 提交于
      While computing the cputime delta of dynticks CPUs,
      we are mixing up clocks of differents natures:
      
      * local_clock() which takes care of unstable clock
      sources and fix these if needed.
      
      * sched_clock() which is the weaker version of
      local_clock(). It doesn't compute any fixup in case
      of unstable source.
      
      If the clock source is stable, those two clocks are the
      same and we can safely compute the difference against
      two random points.
      
      Otherwise it results in random deltas as sched_clock()
      can randomly drift away, back or forward, from local_clock().
      
      As a consequence, some strange behaviour with unstable tsc
      has been observed such as non progressing constant zero cputime.
      (The 'top' command showing no load).
      
      Fix this by only using local_clock(), or its irq safe/remote
      equivalent, in vtime code.
      Reported-by: NMike Galbraith <efault@gmx.de>
      Suggested-by: NMike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      45eacc69
    • N
      target: Propigate up ->cmd_kref put return via transport_generic_free_cmd · d5ddad41
      Nicholas Bellinger 提交于
      Go ahead and propigate up the ->cmd_kref put return value from
      target_put_sess_cmd() -> transport_release_cmd() -> transport_put_cmd()
      -> transport_generic_free_cmd().
      
      This is useful for certain fabrics when determining the active I/O
      shutdown case with SCF_ACK_KREF where a final target_put_sess_cmd()
      is still required by the caller.
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      d5ddad41
    • L
      aerdrv: Move cper_print_aer() call out of interrupt context · 37448adf
      Lance Ortiz 提交于
      The following warning was seen on 3.9 when a corrected PCIe error was being
      handled by the AER subsystem.
      
      WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90()
      
      This occurred because a call to pci_get_domain_bus_and_slot() was added to
      cper_print_pcie() to setup for the call to cper_print_aer().  The warning
      showed up because cper_print_pcie() is called in an interrupt context and
      pci_get* functions are not supposed to be called in that context.
      
      The solution is to move the cper_print_aer() call out of the interrupt
      context and into aer_recover_work_func() to avoid any warnings when calling
      pci_get* functions.
      Signed-off-by: NLance Ortiz <lance.ortiz@hp.com>
      Acked-by: NBorislav Petkov <bp@suse.de>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      37448adf
  17. 30 5月, 2013 2 次提交
    • R
      scatterlist: sg_set_buf() argument must be in linear mapping · ac4e97ab
      Rusty Russell 提交于
      Add a check behind CONFIG_DEBUG_SG to verify this.
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      ac4e97ab
    • N
      target: Re-instate sess_wait_list for target_wait_for_sess_cmds · 9b31a328
      Nicholas Bellinger 提交于
      Switch back to pre commit 1c7b13fe list splicing logic for active I/O
      shutdown with tcm_qla2xxx + ib_srpt fabrics.
      
      The original commit was done under the incorrect assumption that it's safe to
      walk se_sess->sess_cmd_list unprotected in target_wait_for_sess_cmds() after
      sess->sess_tearing_down = 1 has been set by target_sess_cmd_list_set_waiting()
      during session shutdown.
      
      So instead of adding sess->sess_cmd_lock protection around sess->sess_cmd_list
      during target_wait_for_sess_cmds(), switch back to sess->sess_wait_list to
      allow wait_for_completion() + TFO->release_cmd() to occur without having to
      walk ->sess_cmd_list after the list_splice.
      
      Also add a check to exit if target_sess_cmd_list_set_waiting() has already
      been called, and add a WARN_ON to check for any fabric bug where new se_cmds
      are added to sess->sess_cmd_list after sess->sess_tearing_down = 1 has already
      been set.
      
      Cc: Joern Engel <joern@logfs.org>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: NNicholas Bellinger <nab@linux-iscsi.org>
      9b31a328
  18. 29 5月, 2013 2 次提交
  19. 28 5月, 2013 1 次提交
    • P
      perf: Fix perf mmap bugs · 26cb63ad
      Peter Zijlstra 提交于
      Vince reported a problem found by his perf specific trinity
      fuzzer.
      
      Al noticed 2 problems with perf's mmap():
      
       - it has issues against fork() since we use vma->vm_mm for accounting.
       - it has an rb refcount leak on double mmap().
      
      We fix the issues against fork() by using VM_DONTCOPY; I don't
      think there's code out there that uses this; we didn't hear
      about weird accounting problems/crashes. If we do need this to
      work, the previously proposed VM_PINNED could make this work.
      
      Aside from the rb reference leak spotted by Al, Vince's example
      prog was indeed doing a double mmap() through the use of
      perf_event_set_output().
      
      This exposes another problem, since we now have 2 events with
      one buffer, the accounting gets screwy because we account per
      event. Fix this by making the buffer responsible for its own
      accounting.
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Link: http://lkml.kernel.org/r/20130528085548.GA12193@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      26cb63ad
  20. 25 5月, 2013 2 次提交