1. 08 2月, 2018 1 次提交
  2. 07 2月, 2018 3 次提交
  3. 05 2月, 2018 1 次提交
    • B
      cpufreq: Skip cpufreq resume if it's not suspended · 703cbaa6
      Bo Yan 提交于
      cpufreq_resume can be called even without preceding cpufreq_suspend.
      This can happen in following scenario:
      
          suspend_devices_and_enter
             --> dpm_suspend_start
                --> dpm_prepare
                    --> device_prepare : this function errors out
                --> dpm_suspend: this is skipped due to dpm_prepare failure
                                 this means cpufreq_suspend is skipped over
             --> goto Recover_platform, due to previous error
             --> goto Resume_devices
             --> dpm_resume_end
                 --> dpm_resume
                     --> cpufreq_resume
      
      In case schedutil is used as frequency governor, cpufreq_resume will
      eventually call sugov_start, which does following:
      
          memset(sg_cpu, 0, sizeof(*sg_cpu));
          ....
      
      This effectively erases function pointer for frequency update, causing
      crash later on. The function pointer would have been set correctly if
      subsequent cpufreq_add_update_util_hook runs successfully, but that
      function returns earlier because cpufreq_suspend was not called:
      
          if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
      		return;
      
      The fix is to check cpufreq_suspended first, if it's false, that means
      cpufreq_suspend was not called in the first place, so do not resume
      cpufreq.
      Signed-off-by: NBo Yan <byan@nvidia.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      [ rjw: Dropped printing a message ]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      703cbaa6
  4. 30 1月, 2018 3 次提交
    • L
      Merge tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 7f3fdd40
      Linus Torvalds 提交于
      Pull power management updates from Rafael Wysocki:
       "This includes some infrastructure changes in the PM core, mostly
        related to integration between runtime PM and system-wide suspend and
        hibernation, plus some driver changes depending on them and fixes for
        issues in that area which have become quite apparent recently.
      
        Also included are changes making more x86-based systems use the Low
        Power Sleep S0 _DSM interface by default, which turned out to be
        necessary to handle power button wakeups from suspend-to-idle on
        Surface Pro3.
      
        On the cpufreq front we have fixes and cleanups in the core, some new
        hardware support, driver updates and the removal of some unused code
        from the CPU cooling thermal driver.
      
        Apart from this, the Operating Performance Points (OPP) framework is
        prepared to be used with power domains in the future and there is a
        usual bunch of assorted fixes and cleanups.
      
        Specifics:
      
         - Define a PM driver flag allowing drivers to request that their
           devices be left in suspend after system-wide transitions to the
           working state if possible and add support for it to the PCI bus
           type and the ACPI PM domain (Rafael Wysocki).
      
         - Make the PM core carry out optimizations for devices with driver PM
           flags set in some cases and make a few drivers set those flags
           (Rafael Wysocki).
      
         - Fix and clean up wrapper routines allowing runtime PM device
           callbacks to be re-used for system-wide PM, change the generic
           power domains (genpd) framework to stop using those routines
           incorrectly and fix up a driver depending on that behavior of genpd
           (Rafael Wysocki, Ulf Hansson, Geert Uytterhoeven).
      
         - Fix and clean up the PM core's device wakeup framework and
           re-factor system-wide PM core code related to device wakeup
           (Rafael Wysocki, Ulf Hansson, Brian Norris).
      
         - Make more x86-based systems use the Low Power Sleep S0 _DSM
           interface by default (to fix power button wakeup from
           suspend-to-idle on Surface Pro3) and add a kernel command line
           switch to tell it to ignore the system sleep blacklist in the ACPI
           core (Rafael Wysocki).
      
         - Fix a race condition related to cpufreq governor module removal and
           clean up the governor management code in the cpufreq core (Rafael
           Wysocki).
      
         - Drop the unused generic code related to the handling of the static
           power energy usage model in the CPU cooling thermal driver along
           with the corresponding documentation (Viresh Kumar).
      
         - Add mt2712 support to the Mediatek cpufreq driver (Andrew-sh
           Cheng).
      
         - Add a new operating point to the imx6ul and imx6q cpufreq drivers
           and switch the latter to using clk_bulk_get() (Anson Huang, Dong
           Aisheng).
      
         - Add support for multiple regulators to the TI cpufreq driver along
           with a new DT binding related to that and clean up that driver
           somewhat (Dave Gerlach).
      
         - Fix a powernv cpufreq driver regression leading to incorrect CPU
           frequency reporting, fix that driver to deal with non-continguous
           P-states correctly and clean it up (Gautham Shenoy, Shilpasri
           Bhat).
      
         - Add support for frequency scaling on Armada 37xx SoCs through the
           generic DT cpufreq driver (Gregory CLEMENT).
      
         - Fix error code paths in the mvebu cpufreq driver (Gregory CLEMENT).
      
         - Fix a transition delay setting regression in the longhaul cpufreq
           driver (Viresh Kumar).
      
         - Add Skylake X (server) support to the intel_pstate cpufreq driver
           and clean up that driver somewhat (Srinivas Pandruvada).
      
         - Clean up the cpufreq statistics collection code (Viresh Kumar).
      
         - Drop cluster terminology and dependency on physical_package_id from
           the PSCI driver and drop dependency on arm_big_little from the SCPI
           cpufreq driver (Sudeep Holla).
      
         - Add support for system-wide suspend and resume to the RAPL power
           capping driver and drop a redundant semicolon from it (Zhen Han,
           Luis de Bethencourt).
      
         - Make SPI domain validation (in the SCSI SPI transport driver) and
           system-wide suspend mutually exclusive as they rely on the same
           underlying mechanism and cannot be carried out at the same time
           (Bart Van Assche).
      
         - Fix the computation of the amount of memory to preallocate in the
           hibernation core and clean up one function in there (Rainer Fiebig,
           Kyungsik Lee).
      
         - Prepare the Operating Performance Points (OPP) framework for being
           used with power domains and clean up one function in it (Viresh
           Kumar, Wei Yongjun).
      
         - Clean up the generic sysfs interface for device PM (Andy
           Shevchenko).
      
         - Fix several minor issues in power management frameworks and clean
           them up a bit (Arvind Yadav, Bjorn Andersson, Geert Uytterhoeven,
           Gustavo Silva, Julia Lawall, Luis de Bethencourt, Paul Gortmaker,
           Sergey Senozhatsky, gaurav jindal).
      
         - Make it easier to disable PM via Kconfig (Mark Brown).
      
         - Clean up the cpupower and intel_pstate_tracer utilities (Doug
           Smythies, Laura Abbott)"
      
      * tag 'pm-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (89 commits)
        PCI / PM: Remove spurious semicolon
        cpufreq: scpi: remove arm_big_little dependency
        drivers: psci: remove cluster terminology and dependency on physical_package_id
        powercap: intel_rapl: Fix trailing semicolon
        dmaengine: rcar-dmac: Make DMAC reinit during system resume explicit
        PM / runtime: Allow no callbacks in pm_runtime_force_suspend|resume()
        PM / hibernate: Drop unused parameter of enough_swap
        PM / runtime: Check ignore_children in pm_runtime_need_not_resume()
        PM / runtime: Rework pm_runtime_force_suspend/resume()
        PM / genpd: Stop/start devices without pm_runtime_force_suspend/resume()
        cpufreq: powernv: Dont assume distinct pstate values for nominal and pmin
        cpufreq: intel_pstate: Add Skylake servers support
        cpufreq: intel_pstate: Replace bxt_funcs with core_funcs
        platform/x86: surfacepro3: Support for wakeup from suspend-to-idle
        ACPI / PM: Use Low Power S0 Idle on more systems
        PM / wakeup: Print warn if device gets enabled as wakeup source during sleep
        PM / domains: Don't skip driver's ->suspend|resume_noirq() callbacks
        PM / core: Propagate wakeup_path status flag in __device_suspend_late()
        PM / core: Re-structure code for clearing the direct_complete flag
        powercap: add suspend and resume mechanism for SOC power limit
        ...
      7f3fdd40
    • L
      Merge tag 'sound-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 1c1f395b
      Linus Torvalds 提交于
      Pull sound updates from Takashi Iwai:
       "The major changes in the core API side in this cycle are the still
        on-going ASoC componentization works. Other than that, only few small
        changes such as 20bit PCM format support are found.
      
        Meanwhile the rest majority of changes are for ASoC drivers:
      
         - Large cleanups of some of the TI CODEC drivers
      
         - Continued work on Intel ASoC stuff for new quirks, ACPI GPIO
           handling, Kconfigs and lots of cleanups
      
         - Refactoring of the Freescale SSI driver, as preliminary work for
           the upcoming changes
      
         - Work on ST DFSDM driver, including the required IIO patches
      
         - New drivers for Allwinner A83T, Maxim MAX89373, SocioNext UiniPhier
           EVEA Tempo Semiconductor TSCS42xx and TI PCM816x, TAS5722 and
           TAS6424 devices
      
         - Removal of dead codes for SN95031 and board drivers
      
        Last but not least, a few HD-audio and USB-audio quirks are included
        as usual, too"
      
      * tag 'sound-4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (303 commits)
        ALSA: hda - Reduce the suspend time consumption for ALC256
        ASoC: use seq_file to dump the contents of dai_list,platform_list and codec_list
        ASoC: soc-core: add missing EXPORT_SYMBOL_GPL() for snd_soc_rtdcom_lookup
        IIO: ADC: stm32-dfsdm: remove unused variable again
        ASoC: bcm2835: fix hw_params error when device is in prepared state
        ASoC: mxs-sgtl5000: Do not print error on probe deferral
        ASoC: sgtl5000: Do not print error on probe deferral
        ASoC: Intel: remove select on non-existing SND_SOC_INTEL_COMMON
        ALSA: usb-audio: Support changing input on Sound Blaster E1
        ASoC: Intel: remove second duplicated assignment to pointer 'res'
        ALSA: hda/realtek - update ALC215 depop optimize
        ALSA: hda/realtek - Support headset mode for ALC215/ALC285/ALC289
        ALSA: pcm: Fix trailing semicolon
        ASoC: add Component level .read/.write
        ASoC: cx20442: fix regression by adding back .read/.write
        ASoC: uda1380: fix regression by adding back .read/.write
        ASoC: tlv320dac33: fix regression by adding back .read/.write
        ALSA: hda - Use IS_REACHABLE() for dependency on input
        IIO: ADC: stm32-dfsdm: fix static check warning
        IIO: ADC: stm32-dfsdm: code optimization
        ...
      1c1f395b
    • L
      Merge tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · 49f9c355
      Linus Torvalds 提交于
      Pull init_task initializer cleanups from David Howells:
       "It doesn't seem useful to have the init_task in a header file rather
        than in a normal source file. We could consolidate init_task handling
        instead and expand out various macros.
      
        Here's a series of patches that consolidate init_task handling:
      
         (1) Make THREAD_SIZE available to vmlinux.lds for cris, hexagon and
             openrisc.
      
         (2) Alter the INIT_TASK_DATA linker script macro to set
             init_thread_union and init_stack rather than defining these in C.
      
             Insert init_task and init_thread_into into the init_stack area in
             the linker script as appropriate to the configuration, with
             different section markers so that they end up correctly ordered.
      
             We can then get merge ia64's init_task.c into the main one.
      
             We then have a bunch of single-use INIT_*() macros that seem only
             to be macros because they used to be used per-arch. We can then
             expand these in place of the user and get rid of a few lines and
             a lot of backslashes.
      
         (3) Expand INIT_TASK() in place.
      
         (4) Expand in place various small INIT_*() macros that are defined
             conditionally. Expand them and surround them by #if[n]def/#endif
             in the .c file as it takes fewer lines.
      
         (5) Expand INIT_SIGNALS() and INIT_SIGHAND() in place.
      
         (6) Expand INIT_STRUCT_PID in place.
      
        These macros can then be discarded"
      
      * tag 'init_task-20180117' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
        Expand INIT_STRUCT_PID and remove
        Expand the INIT_SIGNALS and INIT_SIGHAND macros and remove
        Expand various INIT_* macros and remove
        Expand INIT_TASK() in init/init_task.c and remove
        Construct init thread stack in the linker script rather than by union
        openrisc: Make THREAD_SIZE available to vmlinux.lds
        hexagon: Make THREAD_SIZE available to vmlinux.lds
        cris: Make THREAD_SIZE available to vmlinux.lds
      49f9c355
  5. 29 1月, 2018 7 次提交
  6. 28 1月, 2018 1 次提交
  7. 27 1月, 2018 8 次提交
    • T
      hrtimer: Reset hrtimer cpu base proper on CPU hotplug · d5421ea4
      Thomas Gleixner 提交于
      The hrtimer interrupt code contains a hang detection and mitigation
      mechanism, which prevents that a long delayed hrtimer interrupt causes a
      continous retriggering of interrupts which prevent the system from making
      progress. If a hang is detected then the timer hardware is programmed with
      a certain delay into the future and a flag is set in the hrtimer cpu base
      which prevents newly enqueued timers from reprogramming the timer hardware
      prior to the chosen delay. The subsequent hrtimer interrupt after the delay
      clears the flag and resumes normal operation.
      
      If such a hang happens in the last hrtimer interrupt before a CPU is
      unplugged then the hang_detected flag is set and stays that way when the
      CPU is plugged in again. At that point the timer hardware is not armed and
      it cannot be armed because the hang_detected flag is still active, so
      nothing clears that flag. As a consequence the CPU does not receive hrtimer
      interrupts and no timers expire on that CPU which results in RCU stalls and
      other malfunctions.
      
      Clear the flag along with some other less critical members of the hrtimer
      cpu base to ensure starting from a clean state when a CPU is plugged in.
      
      Thanks to Paul, Sebastian and Anna-Maria for their help to get down to the
      root cause of that hard to reproduce heisenbug. Once understood it's
      trivial and certainly justifies a brown paperbag.
      
      Fixes: 41d2e494 ("hrtimer: Tune hrtimer_interrupt hang logic")
      Reported-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sebastian Sewior <bigeasy@linutronix.de>
      Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801261447590.2067@nanos
      d5421ea4
    • H
      x86: Mark hpa as a "Designated Reviewer" for the time being · 8a95b74d
      H. Peter Anvin 提交于
      Due to some unfortunate events, I have not been directly involved in
      the x86 kernel patch flow for a while now.  I have also not been able
      to ramp back up by now like I had hoped to, and after reviewing what I
      will need to work on both internally at Intel and elsewhere in the near
      term, it is clear that I am not going to be able to ramp back up until
      late 2018 at the very earliest.
      
      It is not acceptable to not recognize that this load is currently
      taken by Ingo and Thomas without my direct participation, so I mark
      myself as R: (designated reviewer) rather than M: (maintainer) until
      further notice.  This is in fact recognizing the de facto situation
      for the past few years.
      
      I have obviously no intention of going away, and I will do everything
      within my power to improve Linux on x86 and x86 for Linux.  This,
      however, puts credit where it is due and reflects a change of focus.
      
      This patch also removes stale entries for portions of the x86
      architecture which have not been maintained separately from arch/x86
      for a long time.  If there is a reason to re-introduce them then that
      can happen later.
      Signed-off-by: NH. Peter Anvin <h.peter.anvin@intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Bruce Schlobohm <bruce.schlobohm@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180125195934.5253-1-hpa@zytor.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8a95b74d
    • L
      Merge tag 'riscv-for-linus-4.15-maintainers' of... · c4e0ca7f
      Linus Torvalds 提交于
      Merge tag 'riscv-for-linus-4.15-maintainers' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
      
      Pull RISC-V update from Palmer Dabbelt:
       "RISC-V: We have a new mailing list and git repo!
      
        Sorry to send something essentially as late as possible (Friday after
        an rc9), but we managed to get a mailing list for the RISC-V Linux
        port. We've been using patches@groups.riscv.org for a while, but that
        list has some problems (it's Google Groups and it's shared over all
        RISC-V software projects). The new infaread.org list is much better.
        We just got it on Wednesday but I used it a bit on Thursday to shake
        out all the configuration problems and it appears to be in working
        order.
      
        When I updated the mailing list I noticed that the MAINTAINERS file
        was pointing to our github repo, but now that we have a kernel.org
        repo I'd like to point to that instead so I changed that as well.
        We'll be centralizing all RISC-V Linux related development here as
        that seems to be the saner way to go about it.
      
        I can understand if it's too late to get this into 4.15, but given
        that it's not a code change I was hoping it'd still be OK. It would be
        nice to have the new mailing list and git repo in the release tarballs
        so when people start to find bugs they'll get to the right place"
      
      * tag 'riscv-for-linus-4.15-maintainers' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
        Update the RISC-V MAINTAINERS file
      c4e0ca7f
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · ba804bb4
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) The per-network-namespace loopback device, and thus its namespace,
          can have its teardown deferred for a long time if a kernel created
          TCP socket closes and the namespace is exiting meanwhile. The kernel
          keeps trying to finish the close sequence until it times out (which
          takes quite some time).
      
          Fix this by forcing the socket closed in this situation, from Dan
          Streetman.
      
       2) Fix regression where we're trying to invoke the update_pmtu method
          on route types (in this case metadata tunnel routes) that don't
          implement the dst_ops method. Fix from Nicolas Dichtel.
      
       3) Fix long standing memory corruption issues in r8169 driver by
          performing the chip statistics DMA programming more correctly. From
          Francois Romieu.
      
       4) Handle local broadcast sends over VRF routes properly, from David
          Ahern.
      
       5) Don't refire the DCCP CCID2 timer endlessly, otherwise the socket
          can never be released. From Alexey Kodanev.
      
       6) Set poll flags properly in VSOCK protocol layer, from Stefan
          Hajnoczi.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSING
        dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state
        net: vrf: Add support for sends to local broadcast address
        r8169: fix memory corruption on retrieval of hardware statistics.
        net: don't call update_pmtu unconditionally
        net: tcp: close sock if net namespace is exiting
      ba804bb4
    • L
      Merge tag 'drm-fixes-for-v4.15-rc10-2' of git://people.freedesktop.org/~airlied/linux · db218549
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "A fairly urgent nouveau regression fix for broken irqs across
        suspend/resume came in. This was broken before but a patch in 4.15 has
        made it much more obviously broken and now s/r fails a lot more often.
      
        The fix removes freeing the irq across s/r which never should have
        been done anyways.
      
        Also two vc4 fixes for a NULL deference and some misrendering /
        flickering on screen"
      
      * tag 'drm-fixes-for-v4.15-rc10-2' of git://people.freedesktop.org/~airlied/linux:
        drm/nouveau: Move irq setup/teardown to pci ctor/dtor
        drm/vc4: Fix NULL pointer dereference in vc4_save_hang_state()
        drm/vc4: Flush the caches before the bin jobs, as well.
      db218549
    • S
      VSOCK: set POLLOUT | POLLWRNORM for TCP_CLOSING · ba3169fc
      Stefan Hajnoczi 提交于
      select(2) with wfds but no rfds must return when the socket is shut down
      by the peer.  This way userspace notices socket activity and gets -EPIPE
      from the next write(2).
      
      Currently select(2) does not return for virtio-vsock when a SEND+RCV
      shutdown packet is received.  This is because vsock_poll() only sets
      POLLOUT | POLLWRNORM for TCP_CLOSE, not the TCP_CLOSING state that the
      socket is in when the shutdown is received.
      Signed-off-by: NStefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ba3169fc
    • A
      dccp: don't restart ccid2_hc_tx_rto_expire() if sk in closed state · dd5684ec
      Alexey Kodanev 提交于
      ccid2_hc_tx_rto_expire() timer callback always restarts the timer
      again and can run indefinitely (unless it is stopped outside), and after
      commit 120e9dab ("dccp: defer ccid_hc_tx_delete() at dismantle time"),
      which moved ccid_hc_tx_delete() (also includes sk_stop_timer()) from
      dccp_destroy_sock() to sk_destruct(), this started to happen quite often.
      The timer prevents releasing the socket, as a result, sk_destruct() won't
      be called.
      
      Found with LTP/dccp_ipsec tests running on the bonding device,
      which later couldn't be unloaded after the tests were completed:
      
        unregister_netdevice: waiting for bond0 to become free. Usage count = 148
      
      Fixes: 2a91aa39 ("[DCCP] CCID2: Initial CCID2 (TCP-Like) implementation")
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dd5684ec
    • P
      Update the RISC-V MAINTAINERS file · 6572cc2b
      Palmer Dabbelt 提交于
      Now that we're upstream in Linux we've been able to make some
      infrastructure changes so our port works a bit more like other ports.
      Specifically:
      
      * We now have a mailing list specific to the RISC-V Linux port, hosted
        at lists.infreadead.org.
      * We now have a kernel.org git tree where work on our port is
        coordinated.
      
      This patch changes the RISC-V maintainers entry to reflect these new
      bits of infrastructure.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NPalmer Dabbelt <palmer@sifive.com>
      6572cc2b
  8. 26 1月, 2018 11 次提交
  9. 25 1月, 2018 5 次提交
    • D
      net: tcp: close sock if net namespace is exiting · 4ee806d5
      Dan Streetman 提交于
      When a tcp socket is closed, if it detects that its net namespace is
      exiting, close immediately and do not wait for FIN sequence.
      
      For normal sockets, a reference is taken to their net namespace, so it will
      never exit while the socket is open.  However, kernel sockets do not take a
      reference to their net namespace, so it may begin exiting while the kernel
      socket is still open.  In this case if the kernel socket is a tcp socket,
      it will stay open trying to complete its close sequence.  The sock's dst(s)
      hold a reference to their interface, which are all transferred to the
      namespace's loopback interface when the real interfaces are taken down.
      When the namespace tries to take down its loopback interface, it hangs
      waiting for all references to the loopback interface to release, which
      results in messages like:
      
      unregister_netdevice: waiting for lo to become free. Usage count = 1
      
      These messages continue until the socket finally times out and closes.
      Since the net namespace cleanup holds the net_mutex while calling its
      registered pernet callbacks, any new net namespace initialization is
      blocked until the current net namespace finishes exiting.
      
      After this change, the tcp socket notices the exiting net namespace, and
      closes immediately, releasing its dst(s) and their reference to the
      loopback interface, which lets the net namespace continue exiting.
      
      Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1711407
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=97811Signed-off-by: NDan Streetman <ddstreet@canonical.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ee806d5
    • P
      perf/x86: Fix perf,x86,cpuhp deadlock · efe951d3
      Peter Zijlstra 提交于
      More lockdep gifts, a 5-way lockup race:
      
      	perf_event_create_kernel_counter()
      	  perf_event_alloc()
      	    perf_try_init_event()
      	      x86_pmu_event_init()
      		__x86_pmu_event_init()
      		  x86_reserve_hardware()
       #0		    mutex_lock(&pmc_reserve_mutex);
      		    reserve_ds_buffer()
       #1		      get_online_cpus()
      
      	perf_event_release_kernel()
      	  _free_event()
      	    hw_perf_event_destroy()
      	      x86_release_hardware()
       #0		mutex_lock(&pmc_reserve_mutex)
      		release_ds_buffer()
       #1		  get_online_cpus()
      
       #1	do_cpu_up()
      	  perf_event_init_cpu()
       #2	    mutex_lock(&pmus_lock)
       #3	    mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #3	    mutex_lock(ctx->mutex)
       #4	    mutex_lock_nested(ctx->mutex, 1);
      
      	perf_try_init_event()
       #4	  mutex_lock_nested(ctx->mutex, 1)
      	  x86_pmu_event_init()
      	    intel_pmu_hw_config()
      	      x86_add_exclusive()
       #0		mutex_lock(&pmc_reserve_mutex)
      
      Fix it by using ordering constructs instead of locking.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      efe951d3
    • P
      perf/core: Fix ctx::mutex deadlock · 0c7296ca
      Peter Zijlstra 提交于
      Lockdep noticed the following 3-way lockup scenario:
      
      	sys_perf_event_open()
      	  perf_event_alloc()
      	    perf_try_init_event()
       #0	      ctx = perf_event_ctx_lock_nested(1)
      	      perf_swevent_init()
      		swevent_hlist_get()
       #1		  mutex_lock(&pmus_lock)
      
      	perf_event_init_cpu()
       #1	  mutex_lock(&pmus_lock)
       #2	  mutex_lock(&ctx->mutex)
      
      	sys_perf_event_open()
      	  mutex_lock_double()
       #2	   mutex_lock()
       #0	   mutex_lock_nested()
      
      And while we need that perf_event_ctx_lock_nested() for HW PMUs such
      that they can iterate the sibling list, trying to match it to the
      available counters, the software PMUs need do no such thing. Exclude
      them.
      
      In particular the swevent triggers the above invertion, while the
      tpevent PMU triggers a more elaborate one through their event_mutex.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0c7296ca
    • P
      perf/core: Fix another perf,trace,cpuhp lock inversion · 43fa87f7
      Peter Zijlstra 提交于
      Lockdep noticed the following 3-way lockup race:
      
              perf_trace_init()
       #0       mutex_lock(&event_mutex)
                perf_trace_event_init()
                  perf_trace_event_reg()
                    tp_event->class->reg() := tracepoint_probe_register
       #1              mutex_lock(&tracepoints_mutex)
                        trace_point_add_func()
       #2                  static_key_enable()
      
       #2	do_cpu_up()
      	  perf_event_init_cpu()
       #3	    mutex_lock(&pmus_lock)
       #4	    mutex_lock(&ctx->mutex)
      
      	perf_ioctl()
       #4	  ctx = perf_event_ctx_lock()
      	  _perf_iotcl()
      	    ftrace_profile_set_filter()
       #0	      mutex_lock(&event_mutex)
      
      Fudge it for now by noting that the tracepoint state does not depend
      on the event <-> context relation. Ugly though :/
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      43fa87f7
    • P
      perf/core: Fix lock inversion between perf,trace,cpuhp · 82d94856
      Peter Zijlstra 提交于
      Lockdep gifted us with noticing the following 4-way lockup scenario:
      
              perf_trace_init()
       #0       mutex_lock(&event_mutex)
                perf_trace_event_init()
                  perf_trace_event_reg()
                    tp_event->class->reg() := tracepoint_probe_register
       #1             mutex_lock(&tracepoints_mutex)
                        trace_point_add_func()
       #2                 static_key_enable()
      
       #2     do_cpu_up()
                perf_event_init_cpu()
       #3         mutex_lock(&pmus_lock)
       #4         mutex_lock(&ctx->mutex)
      
              perf_event_task_disable()
                mutex_lock(&current->perf_event_mutex)
       #4       ctx = perf_event_ctx_lock()
       #5       perf_event_for_each_child()
      
              do_exit()
                task_work_run()
                  __fput()
                    perf_release()
                      perf_event_release_kernel()
       #4               mutex_lock(&ctx->mutex)
       #5               mutex_lock(&event->child_mutex)
                        free_event()
                          _free_event()
                            event->destroy() := perf_trace_destroy
       #0                     mutex_lock(&event_mutex);
      
      Fix that by moving the free_event() out from under the locks.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      82d94856