1. 08 Jul 2013: 1 commit
2. 05 Jul 2013: 1 commit
• clocksource: Reselect clocksource when watchdog validated high-res capability · 332962f2
  Thomas Gleixner committed
Up to commit 5d33b883 (clocksource: Always verify highres capability)
there was no sanity check when selecting a clocksource, i.e. nothing
prevented a non-highres-capable clocksource from being picked after the
system had already switched to highres/nohz mode.
      
The new sanity check works, as Alex and Tim found out: it prevents the
TSC from being used. This happens because on x86 the boot process
      looks like this:
      
 tsc_start_frequency_validation(TSC);
       clocksource_register(HPET);
       clocksource_done_booting();
      	clocksource_select()
      		Selects HPET which is valid for high-res
      
       switch_to_highres();
      
       clocksource_register(TSC);
       	TSC is not selected, because it is not yet
      	flagged as VALID_HIGH_RES
      
       clocksource_watchdog()
      	Validates TSC for highres, but that does not make TSC
      	the current clocksource.
      
      Before the sanity check was added, we installed TSC unvalidated which
      worked most of the time. If the TSC was really detected as unstable,
      then the unstable logic removed it and installed HPET again.
      
      The sanity check is correct and needed. So the watchdog needs to kick
      a reselection of the clocksource, when it qualifies TSC as a valid
      high res clocksource.
      
To solve this, we mark the clocksource which got the flag
CLOCK_SOURCE_VALID_FOR_HRES set by the watchdog with a new flag,
CLOCK_SOURCE_RESELECT, and trigger the watchdog thread. The watchdog
thread evaluates the flag and invokes clocksource_select() when set.
      
To avoid having the clocksource_done_booting() code, which is about to
install the first real clocksource anyway, go through
clocksource_select() and tick_oneshot_notify() pointlessly, split out
the clocksource_watchdog_kthread() list walk code and invoke the
select/notify only when called from clocksource_watchdog_kthread().
      
This way clocksource_done_booting() can use the same split-out code
without the select/notify invocation and without the clocksource_mutex
unlock/relock dance.
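
A minimal sketch of that handoff, assuming the simplified structure below
(the flag names, clocksource_select() and clocksource_mutex come from the
commit text; clocksource_mark_hres_valid() and the helper bodies are
illustrative, not the actual diff):

 /* Watchdog path: a clocksource (e.g. TSC) just passed highres validation. */
 static void clocksource_mark_hres_valid(struct clocksource *cs)
 {
 	cs->flags |= CLOCK_SOURCE_VALID_FOR_HRES;
 	cs->flags |= CLOCK_SOURCE_RESELECT;	/* ask for a reselection */
 	/* ...wake up the watchdog kthread... */
 }

 /* Shared list walk used by both the kthread and clocksource_done_booting(). */
 static int __clocksource_watchdog_kthread(void)
 {
 	int reselect = 0;
 	/* walk the watchdog list, drop unstable clocksources and note
 	 * any entry carrying CLOCK_SOURCE_RESELECT */
 	return reselect;
 }

 /* Only the kthread caller triggers the select/notify path. */
 static int clocksource_watchdog_kthread(void *data)
 {
 	mutex_lock(&clocksource_mutex);
 	if (__clocksource_watchdog_kthread())
 		clocksource_select();
 	mutex_unlock(&clocksource_mutex);
 	return 0;
 }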
Reported-and-tested-by: Alex Shi <alex.shi@intel.com>
      Cc: Hans Peter Anvin <hpa@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Andi Kleen <andi.kleen@intel.com>
Tested-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1307042239150.11637@ionos.tec.linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      332962f2
3. 03 Jul 2013: 1 commit
• posix_cpu_timer: consolidate expiry time type · 55ccb616
  Frederic Weisbecker committed
The posix cpu timer expiry time is stored in a union of two types: a
64-bit field if we rely on precise scheduler accounting, or a cputime_t
if we rely on jiffies.
      
This results in quite a bit of duplicated code and special cases to
handle the two types.
      
Just unify this into a single 64-bit field; a cputime_t value always
fits into it.
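
The shape of the change, roughly (a sketch assuming the 3.10-era
definitions; the "before" union is the existing type, the "after" line is
illustrative of how the field ends up being declared):

 /* Before: two possible representations of the expiry time. */
 union cpu_time_count {
 	cputime_t cpu;			/* jiffies-based accounting */
 	unsigned long long sched;	/* precise scheduler clock, in ns */
 };

 /* After: one representation, used everywhere the union was used. */
 u64 expires;	/* cputime_t values are converted and stored here */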
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Olivier Langlois <olivier@trillion01.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      55ccb616
4. 29 Jun 2013: 1 commit
5. 27 Jun 2013: 1 commit
• net: fix kernel deadlock with interface rename and netdev name retrieval. · 5dbe7c17
  Nicolas Schichan committed
      When the kernel (compiled with CONFIG_PREEMPT=n) is performing the
      rename of a network interface, it can end up waiting for a workqueue
      to complete. If userland is able to invoke a SIOCGIFNAME ioctl or a
      SO_BINDTODEVICE getsockopt in between, the kernel will deadlock due to
the fact that read_seqcount_begin() will spin forever waiting for the
      writer process (the one doing the interface rename) to update the
      devnet_rename_seq sequence.
      
      This patch fixes the problem by adding a helper (netdev_get_name())
      and using it in the code handling the SIOCGIFNAME ioctl and
SO_BINDTODEVICE getsockopt.
      
      The netdev_get_name() helper uses raw_seqcount_begin() to avoid
      spinning forever, waiting for devnet_rename_seq->sequence to become
      even. cond_resched() is used in the contended case, before retrying
      the access to give the writer process a chance to finish.
      
      The use of raw_seqcount_begin() will incur some unneeded work in the
      reader process in the contended case, but this is better than
      deadlocking the system.
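
A sketch of the helper's retry loop under those assumptions (simplified;
fallback locking and some error paths trimmed):

 int netdev_get_name(struct net *net, char *name, int ifindex)
 {
 	struct net_device *dev;
 	unsigned int seq;

 retry:
 	/* raw_seqcount_begin() does not wait for an in-flight writer. */
 	seq = raw_seqcount_begin(&devnet_rename_seq);
 	rcu_read_lock();
 	dev = dev_get_by_index_rcu(net, ifindex);
 	if (!dev) {
 		rcu_read_unlock();
 		return -ENODEV;
 	}
 	strcpy(name, dev->name);
 	rcu_read_unlock();
 	if (read_seqcount_retry(&devnet_rename_seq, seq)) {
 		/* A rename raced with us: give the writer a chance, retry. */
 		cond_resched();
 		goto retry;
 	}
 	return 0;
 }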
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
      5dbe7c17
6. 26 Jun 2013: 1 commit
7. 24 Jun 2013: 1 commit
• ACPI / dock / PCI: Synchronous handling of dock events for PCI devices · 21a31013
  Rafael J. Wysocki committed
      The interactions between the ACPI dock driver and the ACPI-based PCI
      hotplug (acpiphp) are currently problematic because of ordering
      issues during hot-remove operations.
      
      First of all, the current ACPI glue code expects that physical
      devices will always be deleted before deleting the companion ACPI
      device objects.  Otherwise, acpi_unbind_one() will fail with a
      warning message printed to the kernel log, for example:
      
      [  185.026073] usb usb5: Oops, 'acpi_handle' corrupt
      [  185.035150] pci 0000:1b:00.0: Oops, 'acpi_handle' corrupt
      [  185.035515] pci 0000:18:02.0: Oops, 'acpi_handle' corrupt
      [  180.013656]  port1: Oops, 'acpi_handle' corrupt
      
      This means, in particular, that struct pci_dev objects have to
      be deleted before the struct acpi_device objects they are "glued"
      with.
      
Now, the following happens during the undocking of an ACPI-based
      dock station:
       1) hotplug_dock_devices() invokes registered hotplug callbacks to
          destroy physical devices associated with the ACPI device objects
          depending on the dock station.  It calls dd->ops->handler() for
          each of those device objects.
       2) For PCI devices dd->ops->handler() points to
          handle_hotplug_event_func() that queues up a separate work item
          to execute _handle_hotplug_event_func() for the given device and
          returns immediately.  That work item will be executed later.
       3) hotplug_dock_devices() calls dock_remove_acpi_device() for each
          device depending on the dock station.  This runs acpi_bus_trim()
          for each of them, which causes the underlying ACPI device object
          to be destroyed, but the work items queued up by
          handle_hotplug_event_func() haven't been started yet.
 4) The _handle_hotplug_event_func() work items queued up in step 2) are executed
          and cause the above failure to happen, because the PCI devices
          they handle do not have the companion ACPI device objects any
          more (those objects have been deleted in step 3).
      
The possible breakage doesn't end here, though, because
hotplug_dock_devices() may return before at least some of the
_handle_hotplug_event_func() work items spawned by it have a
chance to complete. Then undock() will cause _DCK to be
evaluated, and that will cause the devices handled by
_handle_hotplug_event_func() to go away, possibly while they are
still being accessed.
      
      This means that dd->ops->handler() for PCI devices should not point
      to handle_hotplug_event_func().  Instead, it should point to a
      function that will do the work of _handle_hotplug_event_func()
      synchronously.  For this reason, introduce such a function,
hotplug_event_func(), and modify acpiphp_dock_ops to point to
      it as the handler.
      
      Unfortunately, however, this is not sufficient, because if the dock
      code were not changed further, hotplug_event_func() would now
      deadlock with hotplug_dock_devices() that called it, since it would
      run unregister_hotplug_dock_device() which in turn would attempt to
      acquire the dock station's hp_lock mutex already acquired by
      hotplug_dock_devices().
      
To resolve that deadlock, use the observation that
unregister_hotplug_dock_device() won't need to acquire hp_lock
if the PCI bridges that the devices on the dock station depend on are
prevented from being removed prematurely while the first loop in
hotplug_dock_devices() is in progress.
      
      To make that possible, introduce a mechanism by which the callers of
      register_hotplug_dock_device() can provide "init" and "release"
      routines that will be executed, respectively, during the addition
      and removal of the physical device object associated with the
      given ACPI device handle.  Make acpiphp use two new functions,
      acpiphp_dock_init() and acpiphp_dock_release(), that call
      get_bridge() and put_bridge(), respectively, on the acpiphp bridge
      holding the given device, for this purpose.
      
      In addition to that, remove the dock station's list of
      "hotplug devices" and make the dock code always walk the whole list
      of "dependent devices" instead in such a way that the loops in
      hotplug_dock_devices() and dock_event() (replacing the loops over
      "hotplug devices") will take references to the list entries that
      register_hotplug_dock_device() has been called for.  That prevents
      the "release" routines associated with those entries from being
      called while the given entry is being processed and for PCI
      devices this means that their bridges won't be removed (by a
      concurrent thread) while hotplug_event_func() handling them is
      being executed.
      
      This change is based on two earlier patches from Jiang Liu.
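
A rough sketch of the acpiphp side of this arrangement, assuming the
shape described above (the field path to the bridge and the exact ops
layout are illustrative, not taken from the actual diff):

 /* Pin the acpiphp bridge while the dock code processes this device. */
 static void acpiphp_dock_init(void *data)
 {
 	struct acpiphp_func *func = data;

 	get_bridge(func->slot->bridge);		/* illustrative field path */
 }

 static void acpiphp_dock_release(void *data)
 {
 	struct acpiphp_func *func = data;

 	put_bridge(func->slot->bridge);
 }

 static const struct acpi_dock_ops acpiphp_dock_ops = {
 	/* synchronous: does the work of _handle_hotplug_event_func() directly */
 	.handler = hotplug_event_func,
 };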
      
References: https://bugzilla.kernel.org/show_bug.cgi?id=59501
Reported-and-tested-by: Alexander E. Patrakov <patrakov@gmail.com>
Tracked-down-by: Jiang Liu <jiang.liu@huawei.com>
Tested-by: Illya Klymov <xanf@xanf.me>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Yinghai Lu <yinghai@kernel.org>
      Cc: 3.9+ <stable@vger.kernel.org>
      21a31013
8. 20 Jun 2013: 4 commits
9. 19 Jun 2013: 2 commits
• tracing/context-tracking: Add preempt_schedule_context() for tracing · 29bb9e5a
  Steven Rostedt committed
      Dave Jones hit the following bug report:
      
       ===============================
       [ INFO: suspicious RCU usage. ]
       3.10.0-rc2+ #1 Not tainted
       -------------------------------
       include/linux/rcupdate.h:771 rcu_read_lock() used illegally while idle!
       other info that might help us debug this:
       RCU used illegally from idle CPU! rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       2 locks held by cc1/63645:
        #0:  (&rq->lock){-.-.-.}, at: [<ffffffff816b39fd>] __schedule+0xed/0x9b0
        #1:  (rcu_read_lock){.+.+..}, at: [<ffffffff8109d645>] cpuacct_charge+0x5/0x1f0
      
       CPU: 1 PID: 63645 Comm: cc1 Not tainted 3.10.0-rc2+ #1 [loadavg: 40.57 27.55 13.39 25/277 64369]
       Hardware name: Gigabyte Technology Co., Ltd. GA-MA78GM-S2H/GA-MA78GM-S2H, BIOS F12a 04/23/2010
        0000000000000000 ffff88010f78fcf8 ffffffff816ae383 ffff88010f78fd28
        ffffffff810b698d ffff88011c092548 000000000023d073 ffff88011c092500
        0000000000000001 ffff88010f78fd60 ffffffff8109d7c5 ffffffff8109d645
       Call Trace:
        [<ffffffff816ae383>] dump_stack+0x19/0x1b
        [<ffffffff810b698d>] lockdep_rcu_suspicious+0xfd/0x130
        [<ffffffff8109d7c5>] cpuacct_charge+0x185/0x1f0
        [<ffffffff8109d645>] ? cpuacct_charge+0x5/0x1f0
        [<ffffffff8108dffc>] update_curr+0xec/0x240
        [<ffffffff8108f528>] put_prev_task_fair+0x228/0x480
        [<ffffffff816b3a71>] __schedule+0x161/0x9b0
        [<ffffffff816b4721>] preempt_schedule+0x51/0x80
        [<ffffffff816b4800>] ? __cond_resched_softirq+0x60/0x60
        [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
        [<ffffffff810ff3cc>] ftrace_ops_control_func+0x1dc/0x210
        [<ffffffff816be280>] ftrace_call+0x5/0x2f
        [<ffffffff816b681d>] ? retint_careful+0xb/0x2e
        [<ffffffff816b4805>] ? schedule_user+0x5/0x70
        [<ffffffff816b4805>] ? schedule_user+0x5/0x70
        [<ffffffff816b6824>] ? retint_careful+0x12/0x2e
       ------------[ cut here ]------------
      
      What happened was that the function tracer traced the schedule_user() code
      that tells RCU that the system is coming back from userspace, and to
      add the CPU back to the RCU monitoring.
      
Because the function tracer wraps its handler in
preempt_disable_notrace()/preempt_enable_notrace() calls,
preempt_enable_notrace() checks the NEED_RESCHED flag. If it is set,
then preempt_schedule() is called. But this is called before the user_exit()
function can inform the kernel that the CPU is no longer in user mode and
needs to be accounted for by RCU.
      
The fix is to create a new preempt_schedule_context() that checks if
the kernel is still in user mode and, if so, switches it to kernel mode
before calling schedule. It also switches back to user mode when coming
back from schedule, if need be.
      
The only user of this currently is preempt_enable_notrace(), which is
      only used by the tracing subsystem.
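
Conceptually, the new helper brackets the preemption point with an
exception_enter()/exception_exit() pair (a sketch based on the
description above; the real function also takes care not to trace
itself, so details differ):

 asmlinkage void __sched notrace preempt_schedule_context(void)
 {
 	enum ctx_state prev_ctx;

 	if (likely(!preemptible()))
 		return;

 	do {
 		add_preempt_count_notrace(PREEMPT_ACTIVE);
 		/* Leave "user" context so RCU is watching across the schedule. */
 		prev_ctx = exception_enter();
 		__schedule();
 		/* Go back to user context if that is where we came from. */
 		exception_exit(prev_ctx);
 		sub_preempt_count_notrace(PREEMPT_ACTIVE);
 		barrier();
 	} while (need_resched());
 }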
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1369423420.6828.226.camel@gandalf.local.home
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      29bb9e5a
• Revert "dw_apb_timer_of.c: Remove parts that were picoxcell-specific" · d3d8fee4
  John Stultz committed
      This reverts commit 55a68c23.
      
      In order to avoid a collision with dw_apb_timer changes in
      the arm-soc tree, revert this change.
      
      I'm leaving it to the arm-soc folks to sort out if they want
      to keep the other side of the collision or if they're just going
      to back it all out and try again during the next release cycle.
Reported-by: Dinh Nguyen <dinguyen@altera.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
      d3d8fee4
10. 15 Jun 2013: 1 commit
• smp.h: Use local_irq_{save,restore}() in !SMP version of on_each_cpu(). · f21afc25
  David Daney committed
      Thanks to commit f91eb62f ("init: scream bloody murder if interrupts
      are enabled too early"), "bloody murder" is now being screamed.
      
      With a MIPS OCTEON config, we use on_each_cpu() in our
irq_chip.irq_bus_sync_unlock() function.  This gets called early as a
      result of the time_init() call.  Because the !SMP version of
      on_each_cpu() unconditionally enables irqs, we get:
      
          WARNING: at init/main.c:560 start_kernel+0x250/0x410()
          Interrupts were enabled early
          CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
          Call Trace:
            show_stack+0x68/0x80
            warn_slowpath_common+0x78/0xb0
            warn_slowpath_fmt+0x38/0x48
            start_kernel+0x250/0x410
      
      Suggested fix: Do what we already do in the SMP version of
      on_each_cpu(), and use local_irq_save/local_irq_restore.  Because we
      need a flags variable, make it a static inline to avoid name space
      issues.
      
      [ Change from v1: Convert on_each_cpu to a static inline function, add
        #include <linux/irqflags.h> to avoid build breakage on some files.
      
        on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
        on_each_cpu(), but they are not causing !SMP bugs for me, so I will
        defer changing them to a less urgent patch. ]
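
A minimal sketch of what the !SMP definition ends up looking like under
these assumptions (on_each_cpu_mask()/on_each_cpu_cond() are left for a
later patch, as noted above):

 #include <linux/irqflags.h>

 /* UP version: "all CPUs" is just this one; preserve the caller's IRQ
  * state instead of unconditionally re-enabling interrupts. */
 static inline int on_each_cpu(void (*func)(void *info), void *info, int wait)
 {
 	unsigned long flags;

 	local_irq_save(flags);
 	func(info);
 	local_irq_restore(flags);
 	return 0;
 }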
Signed-off-by: David Daney <david.daney@cavium.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f21afc25
11. 13 Jun 2013: 6 commits
12. 12 Jun 2013: 2 commits
13. 11 Jun 2013: 3 commits
14. 07 Jun 2013: 2 commits
15. 06 Jun 2013: 1 commit
• arch, mm: Remove tlb_fast_mode() · 29eb7782
  Peter Zijlstra committed
Since the introduction of preemptible mmu_gather, TLB fast mode has been
      broken. TLB fast mode relies on there being absolutely no concurrency;
      it frees pages first and invalidates TLBs later.
      
      However now we can get concurrency and stuff goes *bang*.
      
This patch removes all tlb_fast_mode() code; this was found to be the
better option versus trying to patch the hole by entangling TLB
invalidation with the scheduler.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Luck <tony.luck@intel.com>
Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      29eb7782
16. 05 Jun 2013: 1 commit
17. 04 Jun 2013: 1 commit
18. 03 Jun 2013: 2 commits
19. 01 Jun 2013: 3 commits
20. 31 May 2013: 4 commits
• kvm: Move guest entry/exit APIs to context_tracking · 521921ba
  Frederic Weisbecker committed
The kvm_host.h header file doesn't handle inclusion well
when archs don't support KVM.
      
This results in build breakage for such archs when they
want to implement context tracking, because this subsystem
includes kvm_host.h in order to implement the
guest_enter/exit APIs but doesn't handle the KVM-off case.
      
      To fix this, move the guest_enter()/guest_exit()
      declarations and generic implementation to the context
      tracking headers. These generic APIs actually belong to
      this subsystem, besides other domains boundary tracking
      like user_enter() et al.
      
      KVM now properly becomes a user of this library, not the
      other buggy way around.
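
After the move, the generic helpers live with the rest of the context
tracking API; a sketch of their shape from that kernel era (assuming the
vtime helpers shown; exact naming may differ):

 /* Sketch of the generic guest_enter()/guest_exit() in
  * include/linux/context_tracking.h, no longer pulling in kvm_host.h. */
 static inline void guest_enter(void)
 {
 	if (vtime_accounting_enabled())
 		vtime_guest_enter(current);
 	else
 		current->flags |= PF_VCPU;
 }

 static inline void guest_exit(void)
 {
 	if (vtime_accounting_enabled())
 		vtime_guest_exit(current);
 	else
 		current->flags &= ~PF_VCPU;
 }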
Reported-by: Kevin Hilman <khilman@linaro.org>
Reviewed-by: Kevin Hilman <khilman@linaro.org>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      521921ba
• vtime: Use consistent clocks among nohz accounting · 45eacc69
  Frederic Weisbecker committed
While computing the cputime delta of dynticks CPUs,
we are mixing up clocks of different natures:

* local_clock(), which takes care of unstable clock
sources and fixes them up if needed.

* sched_clock(), which is the weaker version of
local_clock(). It doesn't compute any fixup in case
of an unstable source.
      
If the clock source is stable, those two clocks are the
same and we can safely compute the difference between
two arbitrary points.

Otherwise it results in random deltas, as sched_clock()
can randomly drift away, backward or forward, from local_clock().

As a consequence, some strange behaviour has been observed with an
unstable TSC, such as cputime stuck at a constant zero and never
progressing (the 'top' command showing no load).
      
      Fix this by only using local_clock(), or its irq safe/remote
      equivalent, in vtime code.
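
In other words, the nohz cputime delta is always taken against
local_clock() (or its irq-safe equivalent); a minimal sketch, where the
helper name and the snapshot argument are hypothetical:

 /* Compute the time accumulated since the last snapshot using one
  * consistent, fixup-aware clock. */
 static u64 vtime_delta_ns(u64 *last_snapshot)
 {
 	u64 now = local_clock();	/* handles unstable clock sources */
 	u64 delta = now - *last_snapshot;

 	*last_snapshot = now;
 	return delta;
 }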
Reported-by: Mike Galbraith <efault@gmx.de>
Suggested-by: Mike Galbraith <efault@gmx.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Mike Galbraith <efault@gmx.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
      45eacc69
• target: Propagate up ->cmd_kref put return via transport_generic_free_cmd · d5ddad41
  Nicholas Bellinger committed
Go ahead and propagate up the ->cmd_kref put return value from
target_put_sess_cmd() -> transport_release_cmd() -> transport_put_cmd()
-> transport_generic_free_cmd().
      
      This is useful for certain fabrics when determining the active I/O
      shutdown case with SCF_ACK_KREF where a final target_put_sess_cmd()
      is still required by the caller.
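
The shape of the change is simply that each level hands the kref put
result back to its caller; a sketch under that assumption (signatures
simplified, teardown details omitted):

 /* Returns the kref_put() result: non-zero if this call dropped the
  * final ->cmd_kref reference, zero if the fabric still owes a
  * target_put_sess_cmd(). */
 int transport_generic_free_cmd(struct se_cmd *cmd, int wait_for_tasks)
 {
 	/* ... existing teardown work ... */
 	return transport_put_cmd(cmd);	/* now propagates the kref put result */
 }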
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
      d5ddad41
• aerdrv: Move cper_print_aer() call out of interrupt context · 37448adf
  Lance Ortiz committed
      The following warning was seen on 3.9 when a corrected PCIe error was being
      handled by the AER subsystem.
      
      WARNING: at .../drivers/pci/search.c:214 pci_get_dev_by_id+0x8a/0x90()
      
      This occurred because a call to pci_get_domain_bus_and_slot() was added to
      cper_print_pcie() to setup for the call to cper_print_aer().  The warning
      showed up because cper_print_pcie() is called in an interrupt context and
      pci_get* functions are not supposed to be called in that context.
      
      The solution is to move the cper_print_aer() call out of the interrupt
      context and into aer_recover_work_func() to avoid any warnings when calling
      pci_get* functions.
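
Schematically, the decode moves into the work function, where the
pci_get* lookup is allowed; a sketch (the recovery-ring entry layout is
simplified and illustrative):

 /* Runs in process context, so pci_get_domain_bus_and_slot() is fine here. */
 static void aer_recover_work_func(struct work_struct *work)
 {
 	struct aer_recover_entry entry;
 	struct pci_dev *pdev;

 	while (kfifo_get(&aer_recover_ring, &entry)) {
 		pdev = pci_get_domain_bus_and_slot(entry.domain, entry.bus,
 						   entry.devfn);
 		if (!pdev)
 			continue;
 		cper_print_aer(pdev, entry.severity, entry.regs);
 		/* ... hand off to the actual recovery handling ... */
 		pci_dev_put(pdev);
 	}
 }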
Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
Acked-by: Borislav Petkov <bp@suse.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
      37448adf
21. 30 May 2013: 1 commit