1. 31 3月, 2018 2 次提交
    • T
      net/dim: Fix int overflow · f97c3dc3
      Tal Gilboa 提交于
      When calculating difference between samples, the values
      are multiplied by 100. Large values may cause int overflow
      when multiplied (usually on first iteration).
      Fixed by forcing 100 to be of type unsigned long.
      
      Fixes: 4c4dbb4a ("net/mlx5e: Move dynamic interrupt coalescing code to include/linux")
      Signed-off-by: NTal Gilboa <talgi@mellanox.com>
      Reviewed-by: NAndy Gospodarek <gospo@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f97c3dc3
    • T
      vlan: Fix vlan insertion for packets without ethernet header · c769accd
      Toshiaki Makita 提交于
      In some situation vlan packets do not have ethernet headers. One example
      is packets from tun devices. Users can specify vlan protocol in tun_pi
      field instead of IP protocol. When we have a vlan device with reorder_hdr
      disabled on top of the tun device, such packets from tun devices are
      untagged in skb_vlan_untag() and vlan headers will be inserted back in
      vlan_insert_inner_tag().
      
      vlan_insert_inner_tag() however did not expect packets without ethernet
      headers, so in such a case size argument for memmove() underflowed.
      
      We don't need to copy headers for packets which do not have preceding
      headers of vlan headers, so skip memmove() in that case.
      Also don't write vlan protocol in skb->data when it does not have enough
      room for it.
      
      Fixes: cbe7128c ("vlan: Fix out of order vlan headers with reorder header off")
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c769accd
  2. 29 3月, 2018 1 次提交
  3. 27 3月, 2018 1 次提交
  4. 23 3月, 2018 1 次提交
    • D
      Revert "mm: page_alloc: skip over regions of invalid pfns where possible" · f59f1caf
      Daniel Vacek 提交于
      This reverts commit b92df1de ("mm: page_alloc: skip over regions of
      invalid pfns where possible").  The commit is meant to be a boot init
      speed up skipping the loop in memmap_init_zone() for invalid pfns.
      
      But given some specific memory mapping on x86_64 (or more generally
      theoretically anywhere but on arm with CONFIG_HAVE_ARCH_PFN_VALID) the
      implementation also skips valid pfns which is plain wrong and causes
      'kernel BUG at mm/page_alloc.c:1389!'
      
        crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
        kernel BUG at mm/page_alloc.c:1389!
        invalid opcode: 0000 [#1] SMP
        --
        RIP: 0010: move_freepages+0x15e/0x160
        --
        Call Trace:
          move_freepages_block+0x73/0x80
          __rmqueue+0x263/0x460
          get_page_from_freelist+0x7e1/0x9e0
          __alloc_pages_nodemask+0x176/0x420
        --
      
        crash> page_init_bug -v | grep RAM
        <struct resource 0xffff88067fffd2f8>          1000 -        9bfff       System RAM (620.00 KiB)
        <struct resource 0xffff88067fffd3a0>        100000 -     430bffff       System RAM (  1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
        <struct resource 0xffff88067fffd410>      4b0c8000 -     4bf9cfff       System RAM ( 14.83 MiB = 15188.00 KiB)
        <struct resource 0xffff88067fffd480>      4bfac000 -     646b1fff       System RAM (391.02 MiB = 400408.00 KiB)
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct resource 0xffff88067fffd640>     100000000 -    67fffffff       System RAM ( 22.00 GiB)
      
        crash> page_init_bug | head -6
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct page 0xffffea0001ede200>   1fffff00000000  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        <struct page 0xffffea0001ede200>       505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
        <struct page 0xffffea0001ed8000>                0  0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA               1       4095
        <struct page 0xffffea0001edffc0>   1fffff00000400  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        BUG, zones differ!
      
        crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
              PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
        ffffea0001e00000  78000000                0        0  0 0
        ffffea0001ed7fc0  7b5ff000                0        0  0 0
        ffffea0001ed8000  7b600000                0        0  0 0       <<<<
        ffffea0001ede1c0  7b787000                0        0  0 0
        ffffea0001ede200  7b788000                0        0  1 1fffff00000000
      
      Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
      Fixes: b92df1de ("mm: page_alloc: skip over regions of invalid pfns where possible")
      Signed-off-by: NDaniel Vacek <neelx@redhat.com>
      Acked-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f59f1caf
  5. 22 3月, 2018 1 次提交
  6. 21 3月, 2018 1 次提交
  7. 20 3月, 2018 3 次提交
    • J
      jump_label: Disable jump labels in __exit code · 578ae447
      Josh Poimboeuf 提交于
      With the following commit:
      
        33352244 ("jump_label: Explicitly disable jump labels in __init code")
      
      ... we explicitly disabled jump labels in __init code, so they could be
      detected and not warned about in the following commit:
      
        dc1dd184 ("jump_label: Warn on failed jump_label patching attempt")
      
      In-kernel __exit code has the same issue.  It's never used, so it's
      freed along with the rest of initmem.  But jump label entries in __exit
      code aren't explicitly disabled, so we get the following warning when
      enabling pr_debug() in __exit code:
      
        can't patch jump_label at dmi_sysfs_exit+0x0/0x2d
        WARNING: CPU: 0 PID: 22572 at kernel/jump_label.c:376 __jump_label_update+0x9d/0xb0
      
      Fix the warning by disabling all jump labels in initmem (which includes
      both __init and __exit code).
      Reported-and-tested-by: NLi Wang <liwang@redhat.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: dc1dd184 ("jump_label: Warn on failed jump_label patching attempt")
      Link: http://lkml.kernel.org/r/7121e6e595374f06616c505b6e690e275c0054d1.1521483452.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      578ae447
    • P
      sched/fair: Add util_est on top of PELT · 7f65ea42
      Patrick Bellasi 提交于
      The util_avg signal computed by PELT is too variable for some use-cases.
      For example, a big task waking up after a long sleep period will have its
      utilization almost completely decayed. This introduces some latency before
      schedutil will be able to pick the best frequency to run a task.
      
      The same issue can affect task placement. Indeed, since the task
      utilization is already decayed at wakeup, when the task is enqueued in a
      CPU, this can result in a CPU running a big task as being temporarily
      represented as being almost empty. This leads to a race condition where
      other tasks can be potentially allocated on a CPU which just started to run
      a big task which slept for a relatively long period.
      
      Moreover, the PELT utilization of a task can be updated every [ms], thus
      making it a continuously changing value for certain longer running
      tasks. This means that the instantaneous PELT utilization of a RUNNING
      task is not really meaningful to properly support scheduler decisions.
      
      For all these reasons, a more stable signal can do a better job of
      representing the expected/estimated utilization of a task/cfs_rq.
      Such a signal can be easily created on top of PELT by still using it as
      an estimator which produces values to be aggregated on meaningful
      events.
      
      This patch adds a simple implementation of util_est, a new signal built on
      top of PELT's util_avg where:
      
          util_est(task) = max(task::util_avg, f(task::util_avg@dequeue))
      
      This allows to remember how big a task has been reported by PELT in its
      previous activations via f(task::util_avg@dequeue), which is the new
      _task_util_est(struct task_struct*) function added by this patch.
      
      If a task should change its behavior and it runs longer in a new
      activation, after a certain time its util_est will just track the
      original PELT signal (i.e. task::util_avg).
      
      The estimated utilization of cfs_rq is defined only for root ones.
      That's because the only sensible consumer of this signal are the
      scheduler and schedutil when looking for the overall CPU utilization
      due to FAIR tasks.
      
      For this reason, the estimated utilization of a root cfs_rq is simply
      defined as:
      
          util_est(cfs_rq) = max(cfs_rq::util_avg, cfs_rq::util_est::enqueued)
      
      where:
      
          cfs_rq::util_est::enqueued = sum(_task_util_est(task))
                                       for each RUNNABLE task on that root cfs_rq
      
      It's worth noting that the estimated utilization is tracked only for
      objects of interests, specifically:
      
       - Tasks: to better support tasks placement decisions
       - root cfs_rqs: to better support both tasks placement decisions as
                       well as frequencies selection
      Signed-off-by: NPatrick Bellasi <patrick.bellasi@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: NDietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Joel Fernandes <joelaf@google.com>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Rafael J . Wysocki <rafael.j.wysocki@intel.com>
      Cc: Steve Muckle <smuckle@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Todd Kjos <tkjos@android.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Link: http://lkml.kernel.org/r/20180309095245.11071-2-patrick.bellasi@arm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7f65ea42
    • T
      percpu_ref: Update doc to dissuade users from depending on internal RCU grace periods · b3a5d111
      Tejun Heo 提交于
      percpu_ref internally uses sched-RCU to implement the percpu -> atomic
      mode switching and the documentation suggested that this could be
      depended upon.  This doesn't seem like a good idea.
      
      * percpu_ref uses sched-RCU which has different grace periods regular
        RCU.  Users may combine percpu_ref with regular RCU usage and
        incorrectly believe that regular RCU grace periods are performed by
        percpu_ref.  This can lead to, for example, use-after-free due to
        premature freeing.
      
      * percpu_ref has a grace period when switching from percpu to atomic
        mode.  It doesn't have one between the last put and release.  This
        distinction is subtle and can lead to surprising bugs.
      
      * percpu_ref allows starting in and switching to atomic mode manually
        for debugging and other purposes.  This means that there may not be
        any grace periods from kill to release.
      
      This patch makes it clear that the grace periods are percpu_ref's
      internal implementation detail and can't be depended upon by the
      users.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      b3a5d111
  8. 17 3月, 2018 1 次提交
  9. 16 3月, 2018 3 次提交
    • P
      perf: Fix sibling iteration · 7eb709f2
      Peter Zijlstra 提交于
      Mark noticed that the change to sibling_list changed some iteration
      semantics; because previously we used group_list as list entry,
      sibling events would always have an empty sibling_list.
      
      But because we now use sibling_list for both list head and list entry,
      siblings will report as having siblings.
      
      Fix this with a custom for_each_sibling_event() iterator.
      
      Fixes: 8343aae6 ("perf/core: Remove perf_event::group_entry")
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Suggested-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: vincent.weaver@maine.edu
      Cc: alexander.shishkin@linux.intel.com
      Cc: torvalds@linux-foundation.org
      Cc: alexey.budankov@linux.intel.com
      Cc: valery.cherepennikov@intel.com
      Cc: eranian@google.com
      Cc: acme@redhat.com
      Cc: linux-tip-commits@vger.kernel.org
      Cc: davidcc@google.com
      Cc: kan.liang@intel.com
      Cc: Dmitry.Prohorov@intel.com
      Cc: jolsa@redhat.com
      Link: https://lkml.kernel.org/r/20180315170129.GX4043@hirez.programming.kicks-ass.net
      7eb709f2
    • T
      vlan: Fix out of order vlan headers with reorder header off · cbe7128c
      Toshiaki Makita 提交于
      With reorder header off, received packets are untagged in skb_vlan_untag()
      called from within __netif_receive_skb_core(), and later the tag will be
      inserted back in vlan_do_receive().
      
      This caused out of order vlan headers when we create a vlan device on top
      of another vlan device, because vlan_do_receive() inserts a tag as the
      outermost vlan tag. E.g. the outer tag is first removed in skb_vlan_untag()
      and inserted back in vlan_do_receive(), then the inner tag is next removed
      and inserted back as the outermost tag.
      
      This patch fixes the behaviour by inserting the inner tag at the right
      position.
      Signed-off-by: NToshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cbe7128c
    • E
      fs: Teach path_connected to handle nfs filesystems with multiple roots. · 95dd7758
      Eric W. Biederman 提交于
      On nfsv2 and nfsv3 the nfs server can export subsets of the same
      filesystem and report the same filesystem identifier, so that the nfs
      client can know they are the same filesystem.  The subsets can be from
      disjoint directory trees.  The nfsv2 and nfsv3 filesystems provides no
      way to find the common root of all directory trees exported form the
      server with the same filesystem identifier.
      
      The practical result is that in struct super s_root for nfs s_root is
      not necessarily the root of the filesystem.  The nfs mount code sets
      s_root to the root of the first subset of the nfs filesystem that the
      kernel mounts.
      
      This effects the dcache invalidation code in generic_shutdown_super
      currently called shrunk_dcache_for_umount and that code for years
      has gone through an additional list of dentries that might be dentry
      trees that need to be freed to accomodate nfs.
      
      When I wrote path_connected I did not realize nfs was so special, and
      it's hueristic for avoiding calling is_subdir can fail.
      
      The practical case where this fails is when there is a move of a
      directory from the subtree exposed by one nfs mount to the subtree
      exposed by another nfs mount.  This move can happen either locally or
      remotely.  With the remote case requiring that the move directory be cached
      before the move and that after the move someone walks the path
      to where the move directory now exists and in so doing causes the
      already cached directory to be moved in the dcache through the magic
      of d_splice_alias.
      
      If someone whose working directory is in the move directory or a
      subdirectory and now starts calling .. from the initial mount of nfs
      (where s_root == mnt_root), then path_connected as a heuristic will
      not bother with the is_subdir check.  As s_root really is not the root
      of the nfs filesystem this heuristic is wrong, and the path may
      actually not be connected and path_connected can fail.
      
      The is_subdir function might be cheap enough that we can call it
      unconditionally.  Verifying that will take some benchmarking and
      the result may not be the same on all kernels this fix needs
      to be backported to.  So I am avoiding that for now.
      
      Filesystems with snapshots such as nilfs and btrfs do something
      similar.  But as the directory tree of the snapshots are disjoint
      from one another and from the main directory tree rename won't move
      things between them and this problem will not occur.
      
      Cc: stable@vger.kernel.org
      Reported-by: NAl Viro <viro@ZenIV.linux.org.uk>
      Fixes: 397d425d ("vfs: Test for and handle paths that are unreachable from their mnt_root")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      95dd7758
  10. 15 3月, 2018 2 次提交
    • M
      KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid · 16ca6a60
      Marc Zyngier 提交于
      The vgic code is trying to be clever when injecting GICv2 SGIs,
      and will happily populate LRs with the same interrupt number if
      they come from multiple vcpus (after all, they are distinct
      interrupt sources).
      
      Unfortunately, this is against the letter of the architecture,
      and the GICv2 architecture spec says "Each valid interrupt stored
      in the List registers must have a unique VirtualID for that
      virtual CPU interface.". GICv3 has similar (although slightly
      ambiguous) restrictions.
      
      This results in guests locking up when using GICv2-on-GICv3, for
      example. The obvious fix is to stop trying so hard, and inject
      a single vcpu per SGI per guest entry. After all, pending SGIs
      with multiple source vcpus are pretty rare, and are mostly seen
      in scenario where the physical CPUs are severely overcomitted.
      
      But as we now only inject a single instance of a multi-source SGI per
      vcpu entry, we may delay those interrupts for longer than strictly
      necessary, and run the risk of injecting lower priority interrupts
      in the meantime.
      
      In order to address this, we adopt a three stage strategy:
      - If we encounter a multi-source SGI in the AP list while computing
        its depth, we force the list to be sorted
      - When populating the LRs, we prevent the injection of any interrupt
        of lower priority than that of the first multi-source SGI we've
        injected.
      - Finally, the injection of a multi-source SGI triggers the request
        of a maintenance interrupt when there will be no pending interrupt
        in the LRs (HCR_NPIE).
      
      At the point where the last pending interrupt in the LRs switches
      from Pending to Active, the maintenance interrupt will be delivered,
      allowing us to add the remaining SGIs using the same process.
      
      Cc: stable@vger.kernel.org
      Fixes: 0919e84c ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
      Acked-by: NChristoffer Dall <cdall@kernel.org>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      16ca6a60
    • E
      net: use skb_to_full_sk() in skb_update_prio() · 4dcb31d4
      Eric Dumazet 提交于
      Andrei Vagin reported a KASAN: slab-out-of-bounds error in
      skb_update_prio()
      
      Since SYNACK might be attached to a request socket, we need to
      get back to the listener socket.
      Since this listener is manipulated without locks, add const
      qualifiers to sock_cgroup_prioidx() so that the const can also
      be used in skb_update_prio()
      
      Also add the const qualifier to sock_cgroup_classid() for consistency.
      
      Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NAndrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4dcb31d4
  11. 14 3月, 2018 4 次提交
    • L
      vga_switcheroo: Use device link for HDA controller · 07f4f97d
      Lukas Wunner 提交于
      Back in 2013, runtime PM for GPUs with integrated HDA controller was
      introduced with commits 0d69704a ("gpu/vga_switcheroo: add driver
      control power feature. (v3)") and 246efa4a ("snd/hda: add runtime
      suspend/resume on optimus support (v4)").
      
      Briefly, the idea was that the HDA controller is forced on and off in
      unison with the GPU.
      
      The original code is mostly still in place even though it was never a
      100% perfect solution:  E.g. on access to the HDA controller, the GPU
      is powered up via vga_switcheroo_runtime_resume_hdmi_audio() but there
      are no provisions to keep it resumed until access to the HDA controller
      has ceased:  The GPU autosuspends after 5 seconds, rendering the HDA
      controller inaccessible.
      
      Additionally, a kludge is required when hda_intel.c probes:  It has to
      check whether the GPU is powered down (check_hdmi_disabled()) and defer
      probing if so.
      
      However in the meantime (in v4.10) the driver core has gained a feature
      called device links which promises to solve such issues in a clean way:
      It allows us to declare a dependency from the HDA controller (consumer)
      to the GPU (supplier).  The PM core then automagically ensures that the
      GPU is runtime resumed as long as the HDA controller's ->probe hook is
      executed and whenever the HDA controller is accessed.
      
      By default, the HDA controller has a dependency on its parent, a PCIe
      Root Port.  Adding a device link creates another dependency on its
      sibling:
      
                                  PCIe Root Port
                                   ^          ^
                                   |          |
                                   |          |
                                  HDA  ===>  GPU
      
      The device link is not only used for runtime PM, it also guarantees that
      on system sleep, the HDA controller suspends before the GPU and resumes
      after the GPU, and on system shutdown the HDA controller's ->shutdown
      hook is executed before the one of the GPU.  It is a complete solution.
      
      Using this functionality is as simple as calling device_link_add(),
      which results in a dmesg entry like this:
      
              pci 0000:01:00.1: Linked as a consumer to 0000:01:00.0
      
      The code for the GPU-governed audio power management can thus be removed
      (except where it's still needed for legacy manual power control).
      
      The device link is added in a PCI quirk rather than in hda_intel.c.
      It is therefore legal for the GPU to runtime suspend to D3cold even if
      the HDA controller is not bound to a driver or if CONFIG_SND_HDA_INTEL
      is not enabled, for accesses to the HDA controller will cause the GPU to
      wake up regardless if they're occurring outside of hda_intel.c (think
      config space readout via sysfs).
      
      Contrary to the previous implementation, the HDA controller's power
      state is now self-governed, rather than GPU-governed, whereas the GPU's
      power state is no longer fully self-governed.  (The HDA controller needs
      to runtime suspend before the GPU can.)
      
      It is thus crucial that runtime PM is always activated on the HDA
      controller even if CONFIG_SND_HDA_POWER_SAVE_DEFAULT is set to 0 (which
      is the default), lest the GPU stays awake.  This is achieved by setting
      the auto_runtime_pm flag on every codec and the AZX_DCAPS_PM_RUNTIME
      flag on the HDA controller.
      
      A side effect is that power consumption might be reduced if the GPU is
      in use but the HDA controller is not, because the HDA controller is now
      allowed to go to D3hot.  Before, it was forced to stay in D0 as long as
      the GPU was in use.  (There is no reduction in power consumption on my
      Nvidia GK107, but there might be on other chips.)
      
      The code paths for legacy manual power control are adjusted such that
      runtime PM is disabled during power off, thereby preventing the PM core
      from resuming the HDA controller.
      
      Note that the device link is not only added on vga_switcheroo capable
      systems, but for *any* GPU with integrated HDA controller.  The idea is
      that the HDA controller streams audio via connectors located on the GPU,
      so the GPU needs to be on for the HDA controller to do anything useful.
      
      This commit implicitly fixes an unbalanced runtime PM ref upon unbind of
      hda_intel.c:  On ->probe, a runtime PM ref was previously released under
      the condition "azx_has_pm_runtime(chip) || hda->use_vga_switcheroo", but
      on ->remove a runtime PM ref was only acquired under the first of those
      conditions.  Thus, binding and unbinding the driver twice on a
      vga_switcheroo capable system caused the runtime PM refcount to drop
      below zero.  The issue is resolved because the AZX_DCAPS_PM_RUNTIME flag
      is now always set if use_vga_switcheroo is true.
      
      For more information on device links please refer to:
      https://www.kernel.org/doc/html/latest/driver-api/device_link.html
      Documentation/driver-api/device_link.rst
      
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NTakashi Iwai <tiwai@suse.de>
      Reviewed-by: NPeter Wu <peter@lekensteyn.nl>
      Tested-by: Kai Heng Feng <kai.heng.feng@canonical.com> # AMD PowerXpress
      Tested-by: Mike Lothian <mike@fireburn.co.uk>          # AMD PowerXpress
      Tested-by: Denis Lisov <dennis.lissov@gmail.com>       # Nvidia Optimus
      Tested-by: Peter Wu <peter@lekensteyn.nl>              # Nvidia Optimus
      Tested-by: Lukas Wunner <lukas@wunner.de>              # MacBook Pro
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/51bd38360ff502a8c42b1ebf4405ee1d3f27118d.1520068884.git.lukas@wunner.de
      07f4f97d
    • L
      PCI: Make pci_wakeup_bus() & pci_bus_set_current_state() public · 2a4d2c42
      Lukas Wunner 提交于
      There are PCI devices which are power-manageable by a nonstandard means,
      such as a custom ACPI method.  One example are discrete GPUs in hybrid
      graphics laptops, another are Thunderbolt controllers in Macs.
      
      Such devices can't be put into D3cold with pci_set_power_state() because
      pci_platform_power_transition() fails with -ENODEV.  Instead they're put
      into D3hot by pci_set_power_state() and subsequently into D3cold by
      invoking the nonstandard means.  However as a consequence the cached
      current_state is incorrectly left at D3hot.
      
      What we need to do is walk the hierarchy below such a PCI device on
      powerdown and update the current_state to D3cold.  On powerup the PCI
      device itself and the hierarchy below it is in D0uninitialized, so we
      need to walk the hierarchy again and wake all devices, causing them to
      be put into D0active and then letting them autosuspend as they see fit.
      
      To this end make pci_wakeup_bus() & pci_bus_set_current_state() public
      so PCI drivers don't have to reinvent the wheel.
      
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/2962443259e7faec577274b4ef8c54aad66f9a94.1520068884.git.lukas@wunner.de
      2a4d2c42
    • S
      workqueue: remove unused cancel_work() · 6417250d
      Stephen Hemminger 提交于
      Found this by accident.
      There are no usages of bare cancel_work() in current kernel source.
      Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      6417250d
    • B
      IB/mlx5: Fix integer overflows in mlx5_ib_create_srq · c2b37f76
      Boris Pismenny 提交于
      This patch validates user provided input to prevent integer overflow due
      to integer manipulation in the mlx5_ib_create_srq function.
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
      Signed-off-by: NLeon Romanovsky <leon@kernel.org>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      c2b37f76
  12. 13 3月, 2018 1 次提交
    • M
      perf/core: Implement fast breakpoint modification via _IOC_MODIFY_ATTRIBUTES · 32ff77e8
      Milind Chabbi 提交于
      Problem and motivation: Once a breakpoint perf event (PERF_TYPE_BREAKPOINT)
      is created, there is no flexibility to change the breakpoint type
      (bp_type), breakpoint address (bp_addr), or breakpoint length (bp_len). The
      only option is to close the perf event and configure a new breakpoint
      event. This inflexibility has a significant performance overhead. For
      example, sampling-based, lightweight performance profilers (and also
      concurrency bug detection tools),  monitor different addresses for a short
      duration using PERF_TYPE_BREAKPOINT and change the address (bp_addr) to
      another address or change the kind of breakpoint (bp_type) from  "write" to
      a "read" or vice-versa or change the length (bp_len) of the address being
      monitored. The cost of these modifications is prohibitive since it involves
      unmapping the circular buffer associated with the perf event, closing the
      perf event, opening another perf event and mmaping another circular buffer.
      
      Solution: The new ioctl flag for perf events,
      PERF_EVENT_IOC_MODIFY_ATTRIBUTES, introduced in this patch takes a pointer
      to a struct perf_event_attr as an argument to update an old breakpoint
      event with new address, type, and size. This facility allows retaining a
      previous mmaped perf events ring buffer and avoids having to close and
      reopen another perf event.
      
      This patch supports only changing PERF_TYPE_BREAKPOINT event type; future
      implementations can extend this feature. The patch replicates some of its
      functionality of modify_user_hw_breakpoint() in
      kernel/events/hw_breakpoint.c. modify_user_hw_breakpoint cannot be called
      directly since perf_event_ctx_lock() is already held in _perf_ioctl().
      
      Evidence: Experiments show that the baseline (not able to modify an already
      created breakpoint) costs an order of magnitude (~10x) more than the
      suggested optimization (having the ability to dynamically modifying a
      configured breakpoint via ioctl). When the breakpoints typically do not
      trap, the speedup due to the suggested optimization is ~10x; even when the
      breakpoints always trap, the speedup is ~4x due to the suggested
      optimization.
      
      Testing: tests posted at
      https://github.com/linux-contrib/perf_event_modify_bp demonstrate the
      performance significance of this patch. Tests also check the functional
      correctness of the patch.
      Signed-off-by: NMilind Chabbi <chabbi.milind@gmail.com>
      [ Using modify_user_hw_breakpoint_check function. ]
      [ Reformated PERF_EVENT_IOC_*, so the values are all in one column. ]
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Hari Bathini <hbathini@linux.vnet.ibm.com>
      Cc: Jin Yao <yao.jin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Oleg Nesterov <onestero@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Link: http://lkml.kernel.org/r/20180312134548.31532-8-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      32ff77e8
  13. 12 3月, 2018 7 次提交
    • X
      sock_diag: request _diag module only when the family or proto has been registered · bf2ae2e4
      Xin Long 提交于
      Now when using 'ss' in iproute, kernel would try to load all _diag
      modules, which also causes corresponding family and proto modules
      to be loaded as well due to module dependencies.
      
      Like after running 'ss', sctp, dccp, af_packet (if it works as a module)
      would be loaded.
      
      For example:
      
        $ lsmod|grep sctp
        $ ss
        $ lsmod|grep sctp
        sctp_diag              16384  0
        sctp                  323584  5 sctp_diag
        inet_diag              24576  4 raw_diag,tcp_diag,sctp_diag,udp_diag
        libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
      
      As these family and proto modules are loaded unintentionally, it
      could cause some problems, like:
      
      - Some debug tools use 'ss' to collect the socket info, which loads all
        those diag and family and protocol modules. It's noisy for identifying
        issues.
      
      - Users usually expect to drop sctp init packet silently when they
        have no sense of sctp protocol instead of sending abort back.
      
      - It wastes resources (especially with multiple netns), and SCTP module
        can't be unloaded once it's loaded.
      
      ...
      
      In short, it's really inappropriate to have these family and proto
      modules loaded unexpectedly when just doing debugging with inet_diag.
      
      This patch is to introduce sock_load_diag_module() where it loads
      the _diag module only when it's corresponding family or proto has
      been already registered.
      
      Note that we can't just load _diag module without the family or
      proto loaded, as some symbols used in _diag module are from the
      family or proto module.
      
      v1->v2:
        - move inet proto check to inet_diag to avoid a compiling err.
      v2->v3:
        - define sock_load_diag_module in sock.c and export one symbol
          only.
        - improve the changelog.
      Reported-by: NSabrina Dubroca <sd@queasysnail.net>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NPhil Sutter <phil@nwl.cc>
      Acked-by: NSabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf2ae2e4
    • B
      net: phy: Tell caller result of phy_change() · a2c054a8
      Brad Mouring 提交于
      In 664fcf12 (net: phy: Threaded interrupts allow some simplification)
      the phy_interrupt system was changed to use a traditional threaded
      interrupt scheme instead of a workqueue approach.
      
      With this change, the phy status check moved into phy_change, which
      did not report back to the caller whether or not the interrupt was
      handled. This means that, in the case of a shared phy interrupt,
      only the first phydev's interrupt registers are checked (since
      phy_interrupt() would always return IRQ_HANDLED). This leads to
      interrupt storms when it is a secondary device that's actually the
      interrupt source.
      Signed-off-by: NBrad Mouring <brad.mouring@ni.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a2c054a8
    • P
      perf/core: Optimize ctx_sched_out() · 6668128a
      Peter Zijlstra 提交于
      When an event group contains more events than can be scheduled on the
      hardware, iterating the full event group for ctx_sched_out is a waste
      of time.
      
      Keep track of the events that got programmed on the hardware, such
      that we can iterate this smaller list in order to schedule them out.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      6668128a
    • P
      perf/core: Remove perf_event::group_entry · 8343aae6
      Peter Zijlstra 提交于
      Now that all the grouping is done with RB trees, we no longer need
      group_entry and can replace the whole thing with sibling_list.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexey Budankov <alexey.budankov@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8343aae6
    • A
      perf/cor: Use RB trees for pinned/flexible groups · 8e1a2031
      Alexey Budankov 提交于
      Change event groups into RB trees sorted by CPU and then by a 64bit
      index, so that multiplexing hrtimer interrupt handler would be able
      skipping to the current CPU's list and ignore groups allocated for the
      other CPUs.
      
      New API for manipulating event groups in the trees is implemented as well
      as adoption on the API in the current implementation.
      
      pinned_group_sched_in() and flexible_group_sched_in() API are
      introduced to consolidate code enabling the whole group from pinned
      and flexible groups appropriately.
      Signed-off-by: NAlexey Budankov <alexey.budankov@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: David Carrillo-Cisneros <davidcc@google.com>
      Cc: Dmitri Prokhorov <Dmitry.Prohorov@intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Valery Cherepennikov <valery.cherepennikov@intel.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/372f9c8b-0cfe-4240-e44d-83d863d40813@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8e1a2031
    • P
      sched/core: Remove TASK_ALL · 9e49e244
      Peter Zijlstra 提交于
      It's unused:
      
        $ git grep "\<TASK_ALL\>" | wc -l
        1
      
      ... and it is also dangerous, kill the bugger.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/20180227160510.10829-1-bigeasy@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9e49e244
    • F
      netfilter: x_tables: add and use xt_check_proc_name · b1d0a5d0
      Florian Westphal 提交于
      recent and hashlimit both create /proc files, but only check that
      name is 0 terminated.
      
      This can trigger WARN() from procfs when name is "" or "/".
      Add helper for this and then use it for both.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reported-by: NEric Dumazet <eric.dumazet@gmail.com>
      Reported-by: <syzbot+0502b00edac2a0680b61@syzkaller.appspotmail.com>
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      b1d0a5d0
  14. 10 3月, 2018 1 次提交
  15. 09 3月, 2018 4 次提交
    • P
      sched/cpufreq: Provide migration hint · ea14b57e
      Peter Zijlstra 提交于
      It was suggested that a migration hint might be usefull for the
      CPU-freq governors.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ea14b57e
    • P
      sched/nohz: Clean up nohz enter/exit · 00357f5e
      Peter Zijlstra 提交于
      The primary observation is that nohz enter/exit is always from the
      current CPU, therefore NOHZ_TICK_STOPPED does not in fact need to be
      an atomic.
      
      Secondary is that we appear to have 2 nearly identical hooks in the
      nohz enter code, set_cpu_sd_state_idle() and
      nohz_balance_enter_idle(). Fold the whole set_cpu_sd_state thing into
      nohz_balance_{enter,exit}_idle.
      
      Removes an atomic op from both enter and exit paths.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      00357f5e
    • P
      cpufreq/schedutil: Rewrite CPUFREQ_RT support · 8f111bc3
      Peter Zijlstra 提交于
      Instead of trying to duplicate scheduler state to track if an RT task
      is running, directly use the scheduler runqueue state for it.
      
      This vastly simplifies things and fixes a number of bugs related to
      sugov and the scheduler getting out of sync wrt this state.
      
      As a consequence we not also update the remove cfs/dl state when
      iterating the shared mask.
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      8f111bc3
    • P
      cpufreq/schedutil: Remove unused CPUFREQ_DL · 4042d003
      Peter Zijlstra 提交于
      Bitrot...
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Viresh Kumar <viresh.kumar@linaro.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4042d003
  16. 08 3月, 2018 1 次提交
    • E
      net: usbnet: fix potential deadlock on 32bit hosts · 2695578b
      Eric Dumazet 提交于
      Marek reported a LOCKDEP issue occurring on 32bit host,
      that we tracked down to the fact that usbnet could either
      run from soft or hard irqs.
      
      This patch adds u64_stats_update_begin_irqsave() and
      u64_stats_update_end_irqrestore() helpers to solve this case.
      
      [   17.768040] ================================
      [   17.772239] WARNING: inconsistent lock state
      [   17.776511] 4.16.0-rc3-next-20180227-00007-g876c53a7493c #453 Not tainted
      [   17.783329] --------------------------------
      [   17.787580] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
      [   17.793607] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [   17.798751]  (&syncp->seq#5){?.-.}, at: [<9b22e5f0>]
      asix_rx_fixup_internal+0x188/0x288
      [   17.806790] {IN-HARDIRQ-W} state was registered at:
      [   17.811677]   tx_complete+0x100/0x208
      [   17.815319]   __usb_hcd_giveback_urb+0x60/0xf0
      [   17.819770]   xhci_giveback_urb_in_irq+0xa8/0x240
      [   17.824469]   xhci_td_cleanup+0xf4/0x16c
      [   17.828367]   xhci_irq+0xe74/0x2240
      [   17.831827]   usb_hcd_irq+0x24/0x38
      [   17.835343]   __handle_irq_event_percpu+0x98/0x510
      [   17.840111]   handle_irq_event_percpu+0x1c/0x58
      [   17.844623]   handle_irq_event+0x38/0x5c
      [   17.848519]   handle_fasteoi_irq+0xa4/0x138
      [   17.852681]   generic_handle_irq+0x18/0x28
      [   17.856760]   __handle_domain_irq+0x6c/0xe4
      [   17.860941]   gic_handle_irq+0x54/0xa0
      [   17.864666]   __irq_svc+0x70/0xb0
      [   17.867964]   arch_cpu_idle+0x20/0x3c
      [   17.871578]   arch_cpu_idle+0x20/0x3c
      [   17.875190]   do_idle+0x144/0x218
      [   17.878468]   cpu_startup_entry+0x18/0x1c
      [   17.882454]   start_kernel+0x394/0x400
      [   17.886177] irq event stamp: 161912
      [   17.889616] hardirqs last  enabled at (161912): [<7bedfacf>]
      __netdev_alloc_skb+0xcc/0x140
      [   17.897893] hardirqs last disabled at (161911): [<d58261d0>]
      __netdev_alloc_skb+0x94/0x140
      [   17.904903] exynos5-hsi2c 12ca0000.i2c: tx timeout
      [   17.906116] softirqs last  enabled at (161904): [<387102ff>]
      irq_enter+0x78/0x80
      [   17.906123] softirqs last disabled at (161905): [<cf4c628e>]
      irq_exit+0x134/0x158
      [   17.925722].
      [   17.925722] other info that might help us debug this:
      [   17.933435]  Possible unsafe locking scenario:
      [   17.933435].
      [   17.940331]        CPU0
      [   17.942488]        ----
      [   17.944894]   lock(&syncp->seq#5);
      [   17.948274]   <Interrupt>
      [   17.950847]     lock(&syncp->seq#5);
      [   17.954386].
      [   17.954386]  *** DEADLOCK ***
      [   17.954386].
      [   17.962422] no locks held by swapper/0/0.
      
      Fixes: c8b5d129 ("net: usbnet: support 64bit stats")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NMarek Szyprowski <m.szyprowski@samsung.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2695578b
  17. 07 3月, 2018 2 次提交
    • P
      rhashtable: Fix rhlist duplicates insertion · d3dcf8eb
      Paul Blakey 提交于
      When inserting duplicate objects (those with the same key),
      current rhlist implementation messes up the chain pointers by
      updating the bucket pointer instead of prev next pointer to the
      newly inserted node. This causes missing elements on removal and
      travesal.
      
      Fix that by properly updating pprev pointer to point to
      the correct rhash_head next pointer.
      
      Issue: 1241076
      Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7
      Fixes: ca26893f ('rhashtable: Add rhlist interface')
      Signed-off-by: NPaul Blakey <paulb@mellanox.com>
      Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d3dcf8eb
    • D
      usb: quirks: add control message delay for 1b1c:1b20 · cb88a058
      Danilo Krummrich 提交于
      Corsair Strafe RGB keyboard does not respond to usb control messages
      sometimes and hence generates timeouts.
      
      Commit de3af5bf ("usb: quirks: add delay init quirk for Corsair
      Strafe RGB keyboard") tried to fix those timeouts by adding
      USB_QUIRK_DELAY_INIT.
      
      Unfortunately, even with this quirk timeouts of usb_control_msg()
      can still be seen, but with a lower frequency (approx. 1 out of 15):
      
      [   29.103520] usb 1-8: string descriptor 0 read error: -110
      [   34.363097] usb 1-8: can't set config #1, error -110
      
      Adding further delays to different locations where usb control
      messages are issued just moves the timeouts to other locations,
      e.g.:
      
      [   35.400533] usbhid 1-8:1.0: can't add hid device: -110
      [   35.401014] usbhid: probe of 1-8:1.0 failed with error -110
      
      The only way to reliably avoid those issues is having a pause after
      each usb control message. In approx. 200 boot cycles no more timeouts
      were seen.
      
      Addionaly, keep USB_QUIRK_DELAY_INIT as it turned out to be necessary
      to have the delay in hub_port_connect() after hub_port_init().
      
      The overall boot time seems not to be influenced by these additional
      delays, even on fast machines and lightweight distributions.
      
      Fixes: de3af5bf ("usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard")
      Cc: stable@vger.kernel.org
      Signed-off-by: NDanilo Krummrich <danilokrummrich@dk-develop.de>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb88a058
  18. 06 3月, 2018 2 次提交
  19. 05 3月, 2018 2 次提交