1. 23 Mar, 2018 1 commit
    • D
      Revert "mm: page_alloc: skip over regions of invalid pfns where possible" · f59f1caf
      Daniel Vacek committed on
      This reverts commit b92df1de ("mm: page_alloc: skip over regions of
      invalid pfns where possible").  The commit is meant to speed up boot
      init by skipping the loop in memmap_init_zone() for invalid pfns.
      
      But given some specific memory mapping on x86_64 (or more generally
      theoretically anywhere but on arm with CONFIG_HAVE_ARCH_PFN_VALID) the
      implementation also skips valid pfns which is plain wrong and causes
      'kernel BUG at mm/page_alloc.c:1389!'
      
        crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
        kernel BUG at mm/page_alloc.c:1389!
        invalid opcode: 0000 [#1] SMP
        --
        RIP: 0010: move_freepages+0x15e/0x160
        --
        Call Trace:
          move_freepages_block+0x73/0x80
          __rmqueue+0x263/0x460
          get_page_from_freelist+0x7e1/0x9e0
          __alloc_pages_nodemask+0x176/0x420
        --
      
        crash> page_init_bug -v | grep RAM
        <struct resource 0xffff88067fffd2f8>          1000 -        9bfff       System RAM (620.00 KiB)
        <struct resource 0xffff88067fffd3a0>        100000 -     430bffff       System RAM (  1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
        <struct resource 0xffff88067fffd410>      4b0c8000 -     4bf9cfff       System RAM ( 14.83 MiB = 15188.00 KiB)
        <struct resource 0xffff88067fffd480>      4bfac000 -     646b1fff       System RAM (391.02 MiB = 400408.00 KiB)
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct resource 0xffff88067fffd640>     100000000 -    67fffffff       System RAM ( 22.00 GiB)
      
        crash> page_init_bug | head -6
        <struct resource 0xffff88067fffd560>      7b788000 -     7b7fffff       System RAM (480.00 KiB)
        <struct page 0xffffea0001ede200>   1fffff00000000  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        <struct page 0xffffea0001ede200>       505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
        <struct page 0xffffea0001ed8000>                0  0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA               1       4095
        <struct page 0xffffea0001edffc0>   1fffff00000400  0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32          4096    1048575
        BUG, zones differ!
      
        crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
              PAGE        PHYSICAL      MAPPING       INDEX CNT FLAGS
        ffffea0001e00000  78000000                0        0  0 0
        ffffea0001ed7fc0  7b5ff000                0        0  0 0
        ffffea0001ed8000  7b600000                0        0  0 0       <<<<
        ffffea0001ede1c0  7b787000                0        0  0 0
        ffffea0001ede200  7b788000                0        0  1 1fffff00000000
      
      Link: http://lkml.kernel.org/r/20180316143855.29838-1-neelx@redhat.com
      Fixes: b92df1de ("mm: page_alloc: skip over regions of invalid pfns where possible")
      Signed-off-by: Daniel Vacek <neelx@redhat.com>
      Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Paul Burton <paul.burton@imgtec.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f59f1caf
  2. 22 Mar, 2018 1 commit
  3. 21 Mar, 2018 1 commit
  4. 20 Mar, 2018 2 commits
    • J
      jump_label: Disable jump labels in __exit code · 578ae447
      Josh Poimboeuf committed on
      With the following commit:
      
        33352244 ("jump_label: Explicitly disable jump labels in __init code")
      
      ... we explicitly disabled jump labels in __init code, so they could be
      detected and not warned about in the following commit:
      
        dc1dd184 ("jump_label: Warn on failed jump_label patching attempt")
      
      In-kernel __exit code has the same issue.  It's never used, so it's
      freed along with the rest of initmem.  But jump label entries in __exit
      code aren't explicitly disabled, so we get the following warning when
      enabling pr_debug() in __exit code:
      
        can't patch jump_label at dmi_sysfs_exit+0x0/0x2d
        WARNING: CPU: 0 PID: 22572 at kernel/jump_label.c:376 __jump_label_update+0x9d/0xb0
      
      Fix the warning by disabling all jump labels in initmem (which includes
      both __init and __exit code).
      
      Reported-and-tested-by: Li Wang <liwang@redhat.com>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Jason Baron <jbaron@akamai.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: dc1dd184 ("jump_label: Warn on failed jump_label patching attempt")
      Link: http://lkml.kernel.org/r/7121e6e595374f06616c505b6e690e275c0054d1.1521483452.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      578ae447
    • T
      percpu_ref: Update doc to dissuade users from depending on internal RCU grace periods · b3a5d111
      Tejun Heo committed on
      percpu_ref internally uses sched-RCU to implement the percpu -> atomic
      mode switching and the documentation suggested that this could be
      depended upon.  This doesn't seem like a good idea.
      
      * percpu_ref uses sched-RCU which has different grace periods from
        regular RCU.  Users may combine percpu_ref with regular RCU usage and
        incorrectly believe that regular RCU grace periods are performed by
        percpu_ref.  This can lead to, for example, use-after-free due to
        premature freeing.
      
      * percpu_ref has a grace period when switching from percpu to atomic
        mode.  It doesn't have one between the last put and release.  This
        distinction is subtle and can lead to surprising bugs.
      
      * percpu_ref allows starting in and switching to atomic mode manually
        for debugging and other purposes.  This means that there may not be
        any grace periods from kill to release.
      
      This patch makes it clear that the grace periods are percpu_ref's
      internal implementation detail and can't be depended upon by the
      users.
      
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      b3a5d111
  5. 16 Mar, 2018 2 commits
    • T
      vlan: Fix out of order vlan headers with reorder header off · cbe7128c
      Toshiaki Makita committed on
      With reorder header off, received packets are untagged in skb_vlan_untag()
      called from within __netif_receive_skb_core(), and later the tag will be
      inserted back in vlan_do_receive().
      
      This caused out of order vlan headers when we create a vlan device on top
      of another vlan device, because vlan_do_receive() inserts a tag as the
      outermost vlan tag. E.g. the outer tag is first removed in skb_vlan_untag()
      and inserted back in vlan_do_receive(), then the inner tag is next removed
      and inserted back as the outermost tag.
      
      This patch fixes the behaviour by inserting the inner tag at the right
      position.
      
      Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cbe7128c
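      The reordering described above can be reproduced with a toy model.
      This is only a sketch under stated assumptions, not the kernel code:
      the frame is reduced to a list of tag names (outermost first), each
      loop pass stands for one vlan device handling the frame, and the
      function name receive() and its "fixed" switch are hypothetical.

```python
def receive(tags, fixed):
    """Toy model: each loop pass is one vlan device handling the frame."""
    header = list(tags)  # VLAN tags in the frame, outermost first
    parsed = 0           # tags already handled by lower vlan devices
    while parsed < len(header):
        tag = header.pop(parsed)        # untag: strip the tag at the parse point
        where = parsed if fixed else 0  # buggy path always reinserts outermost
        header.insert(where, tag)       # put the tag back into the frame
        parsed += 1
    return header


print(receive(["outer", "inner"], fixed=False))  # ['inner', 'outer']
print(receive(["outer", "inner"], fixed=True))   # ['outer', 'inner']
```

      In this model the buggy path swaps the two tags, while reinserting at
      the position where the tag was found keeps the original order.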
    • E
      fs: Teach path_connected to handle nfs filesystems with multiple roots. · 95dd7758
      Eric W. Biederman committed on
      On nfsv2 and nfsv3 the nfs server can export subsets of the same
      filesystem and report the same filesystem identifier, so that the nfs
      client can know they are the same filesystem.  The subsets can be from
      disjoint directory trees.  The nfsv2 and nfsv3 filesystems provide no
      way to find the common root of all directory trees exported from the
      server with the same filesystem identifier.
      
      The practical result is that for nfs, s_root in struct super is
      not necessarily the root of the filesystem.  The nfs mount code sets
      s_root to the root of the first subset of the nfs filesystem that the
      kernel mounts.
      
      This affects the dcache invalidation code in generic_shutdown_super,
      currently called shrink_dcache_for_umount, and that code has for years
      gone through an additional list of dentries that might be dentry
      trees that need to be freed to accommodate nfs.
      
      When I wrote path_connected I did not realize nfs was so special, and
      its heuristic for avoiding calling is_subdir can fail.
      
      The practical case where this fails is when there is a move of a
      directory from the subtree exposed by one nfs mount to the subtree
      exposed by another nfs mount.  This move can happen either locally or
      remotely.  The remote case requires that the moved directory be cached
      before the move, and that after the move someone walks the path
      to where the moved directory now exists and in so doing causes the
      already cached directory to be moved in the dcache through the magic
      of d_splice_alias.
      
      If someone whose working directory is in the moved directory or a
      subdirectory now starts calling .. from the initial mount of nfs
      (where s_root == mnt_root), then path_connected as a heuristic will
      not bother with the is_subdir check.  As s_root really is not the root
      of the nfs filesystem this heuristic is wrong, and the path may
      actually not be connected and path_connected can fail.
      
      The is_subdir function might be cheap enough that we can call it
      unconditionally.  Verifying that will take some benchmarking and
      the result may not be the same on all kernels this fix needs
      to be backported to.  So I am avoiding that for now.
      
      Filesystems with snapshots such as nilfs and btrfs do something
      similar.  But as the directory tree of the snapshots are disjoint
      from one another and from the main directory tree rename won't move
      things between them and this problem will not occur.
      
      Cc: stable@vger.kernel.org
      Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
      Fixes: 397d425d ("vfs: Test for and handle paths that are unreachable from their mnt_root")
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      95dd7758
  6. 15 Mar, 2018 2 commits
    • M
      KVM: arm/arm64: vgic: Don't populate multiple LRs with the same vintid · 16ca6a60
      Marc Zyngier committed on
      The vgic code is trying to be clever when injecting GICv2 SGIs,
      and will happily populate LRs with the same interrupt number if
      they come from multiple vcpus (after all, they are distinct
      interrupt sources).
      
      Unfortunately, this is against the letter of the architecture,
      and the GICv2 architecture spec says "Each valid interrupt stored
      in the List registers must have a unique VirtualID for that
      virtual CPU interface.". GICv3 has similar (although slightly
      ambiguous) restrictions.
      
      This results in guests locking up when using GICv2-on-GICv3, for
      example. The obvious fix is to stop trying so hard, and inject
      a single vcpu per SGI per guest entry. After all, pending SGIs
      with multiple source vcpus are pretty rare, and are mostly seen
      in scenarios where the physical CPUs are severely overcommitted.
      
      But as we now only inject a single instance of a multi-source SGI per
      vcpu entry, we may delay those interrupts for longer than strictly
      necessary, and run the risk of injecting lower priority interrupts
      in the meantime.
      
      In order to address this, we adopt a three stage strategy:
      - If we encounter a multi-source SGI in the AP list while computing
        its depth, we force the list to be sorted
      - When populating the LRs, we prevent the injection of any interrupt
        of lower priority than that of the first multi-source SGI we've
        injected.
      - Finally, the injection of a multi-source SGI triggers the request
        of a maintenance interrupt when there will be no pending interrupt
        in the LRs (HCR_NPIE).
      
      At the point where the last pending interrupt in the LRs switches
      from Pending to Active, the maintenance interrupt will be delivered,
      allowing us to add the remaining SGIs using the same process.
      
      Cc: stable@vger.kernel.org
      Fixes: 0919e84c ("KVM: arm/arm64: vgic-new: Add IRQ sync/flush framework")
      Acked-by: Christoffer Dall <cdall@kernel.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      16ca6a60
    • E
      net: use skb_to_full_sk() in skb_update_prio() · 4dcb31d4
      Eric Dumazet committed on
      Andrei Vagin reported a KASAN: slab-out-of-bounds error in
      skb_update_prio()
      
      Since SYNACK might be attached to a request socket, we need to
      get back to the listener socket.
      Since this listener is manipulated without locks, add const
      qualifiers to sock_cgroup_prioidx() so that the const can also
      be used in skb_update_prio()
      
      Also add the const qualifier to sock_cgroup_classid() for consistency.
      
      Fixes: ca6fb065 ("tcp: attach SYNACK messages to request sockets instead of listener")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4dcb31d4
  7. 14 Mar, 2018 4 commits
    • L
      vga_switcheroo: Use device link for HDA controller · 07f4f97d
      Lukas Wunner committed on
      Back in 2013, runtime PM for GPUs with integrated HDA controller was
      introduced with commits 0d69704a ("gpu/vga_switcheroo: add driver
      control power feature. (v3)") and 246efa4a ("snd/hda: add runtime
      suspend/resume on optimus support (v4)").
      
      Briefly, the idea was that the HDA controller is forced on and off in
      unison with the GPU.
      
      The original code is mostly still in place even though it was never a
      100% perfect solution:  E.g. on access to the HDA controller, the GPU
      is powered up via vga_switcheroo_runtime_resume_hdmi_audio() but there
      are no provisions to keep it resumed until access to the HDA controller
      has ceased:  The GPU autosuspends after 5 seconds, rendering the HDA
      controller inaccessible.
      
      Additionally, a kludge is required when hda_intel.c probes:  It has to
      check whether the GPU is powered down (check_hdmi_disabled()) and defer
      probing if so.
      
      However in the meantime (in v4.10) the driver core has gained a feature
      called device links which promises to solve such issues in a clean way:
      It allows us to declare a dependency from the HDA controller (consumer)
      to the GPU (supplier).  The PM core then automagically ensures that the
      GPU is runtime resumed as long as the HDA controller's ->probe hook is
      executed and whenever the HDA controller is accessed.
      
      By default, the HDA controller has a dependency on its parent, a PCIe
      Root Port.  Adding a device link creates another dependency on its
      sibling:
      
                                  PCIe Root Port
                                   ^          ^
                                   |          |
                                   |          |
                                  HDA  ===>  GPU
      
      The device link is not only used for runtime PM, it also guarantees that
      on system sleep, the HDA controller suspends before the GPU and resumes
      after the GPU, and on system shutdown the HDA controller's ->shutdown
      hook is executed before the one of the GPU.  It is a complete solution.
      
      Using this functionality is as simple as calling device_link_add(),
      which results in a dmesg entry like this:
      
              pci 0000:01:00.1: Linked as a consumer to 0000:01:00.0
      
      The code for the GPU-governed audio power management can thus be removed
      (except where it's still needed for legacy manual power control).
      
      The device link is added in a PCI quirk rather than in hda_intel.c.
      It is therefore legal for the GPU to runtime suspend to D3cold even if
      the HDA controller is not bound to a driver or if CONFIG_SND_HDA_INTEL
      is not enabled, for accesses to the HDA controller will cause the GPU to
      wake up regardless if they're occurring outside of hda_intel.c (think
      config space readout via sysfs).
      
      Contrary to the previous implementation, the HDA controller's power
      state is now self-governed, rather than GPU-governed, whereas the GPU's
      power state is no longer fully self-governed.  (The HDA controller needs
      to runtime suspend before the GPU can.)
      
      It is thus crucial that runtime PM is always activated on the HDA
      controller even if CONFIG_SND_HDA_POWER_SAVE_DEFAULT is set to 0 (which
      is the default), lest the GPU stays awake.  This is achieved by setting
      the auto_runtime_pm flag on every codec and the AZX_DCAPS_PM_RUNTIME
      flag on the HDA controller.
      
      A side effect is that power consumption might be reduced if the GPU is
      in use but the HDA controller is not, because the HDA controller is now
      allowed to go to D3hot.  Before, it was forced to stay in D0 as long as
      the GPU was in use.  (There is no reduction in power consumption on my
      Nvidia GK107, but there might be on other chips.)
      
      The code paths for legacy manual power control are adjusted such that
      runtime PM is disabled during power off, thereby preventing the PM core
      from resuming the HDA controller.
      
      Note that the device link is not only added on vga_switcheroo capable
      systems, but for *any* GPU with integrated HDA controller.  The idea is
      that the HDA controller streams audio via connectors located on the GPU,
      so the GPU needs to be on for the HDA controller to do anything useful.
      
      This commit implicitly fixes an unbalanced runtime PM ref upon unbind of
      hda_intel.c:  On ->probe, a runtime PM ref was previously released under
      the condition "azx_has_pm_runtime(chip) || hda->use_vga_switcheroo", but
      on ->remove a runtime PM ref was only acquired under the first of those
      conditions.  Thus, binding and unbinding the driver twice on a
      vga_switcheroo capable system caused the runtime PM refcount to drop
      below zero.  The issue is resolved because the AZX_DCAPS_PM_RUNTIME flag
      is now always set if use_vga_switcheroo is true.
      
      For more information on device links please refer to:
      https://www.kernel.org/doc/html/latest/driver-api/device_link.html
      Documentation/driver-api/device_link.rst
      
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: Takashi Iwai <tiwai@suse.de>
      Reviewed-by: Peter Wu <peter@lekensteyn.nl>
      Tested-by: Kai Heng Feng <kai.heng.feng@canonical.com> # AMD PowerXpress
      Tested-by: Mike Lothian <mike@fireburn.co.uk>          # AMD PowerXpress
      Tested-by: Denis Lisov <dennis.lissov@gmail.com>       # Nvidia Optimus
      Tested-by: Peter Wu <peter@lekensteyn.nl>              # Nvidia Optimus
      Tested-by: Lukas Wunner <lukas@wunner.de>              # MacBook Pro
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/51bd38360ff502a8c42b1ebf4405ee1d3f27118d.1520068884.git.lukas@wunner.de
      07f4f97d
    • L
      PCI: Make pci_wakeup_bus() & pci_bus_set_current_state() public · 2a4d2c42
      Lukas Wunner committed on
      There are PCI devices which are power-manageable by a nonstandard means,
      such as a custom ACPI method.  One example is discrete GPUs in hybrid
      graphics laptops, another is Thunderbolt controllers in Macs.
      
      Such devices can't be put into D3cold with pci_set_power_state() because
      pci_platform_power_transition() fails with -ENODEV.  Instead they're put
      into D3hot by pci_set_power_state() and subsequently into D3cold by
      invoking the nonstandard means.  However as a consequence the cached
      current_state is incorrectly left at D3hot.
      
      What we need to do is walk the hierarchy below such a PCI device on
      powerdown and update the current_state to D3cold.  On powerup the PCI
      device itself and the hierarchy below it is in D0uninitialized, so we
      need to walk the hierarchy again and wake all devices, causing them to
      be put into D0active and then letting them autosuspend as they see fit.
      
      To this end make pci_wakeup_bus() & pci_bus_set_current_state() public
      so PCI drivers don't have to reinvent the wheel.
      
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Lukas Wunner <lukas@wunner.de>
      Link: https://patchwork.freedesktop.org/patch/msgid/2962443259e7faec577274b4ef8c54aad66f9a94.1520068884.git.lukas@wunner.de
      2a4d2c42
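      The powerdown half of the walk described above can be sketched as a
      simple recursive traversal. This is an illustration only: the Device
      class and set_state_below() are hypothetical stand-ins, not the
      kernel's struct pci_dev or pci_bus_set_current_state(), and the state
      names are plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str
    current_state: str = "D0"
    children: list = field(default_factory=list)


def set_state_below(dev, state):
    """Update the cached state of every device in the hierarchy below dev."""
    for child in dev.children:
        child.current_state = state
        set_state_below(child, state)


# A device powered off by a nonstandard means, with one device below it.
bridge = Device("bridge", "D3hot", [Device("endpoint", "D3hot")])
bridge.current_state = "D3cold"    # the driver knows its own device is off
set_state_below(bridge, "D3cold")  # fix up the stale D3hot entries below it
```

      The powerup direction would walk the same hierarchy again, waking each
      device instead of rewriting its cached state.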
    • S
      workqueue: remove unused cancel_work() · 6417250d
      Stephen Hemminger committed on
      Found this by accident.
      There are no usages of bare cancel_work() in current kernel source.
      
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      6417250d
    • B
      IB/mlx5: Fix integer overflows in mlx5_ib_create_srq · c2b37f76
      Boris Pismenny committed on
      This patch validates user provided input to prevent integer overflow due
      to integer manipulation in the mlx5_ib_create_srq function.
      
      Cc: syzkaller <syzkaller@googlegroups.com>
      Fixes: e126ba97 ("mlx5: Add driver for Mellanox Connect-IB adapters")
      Signed-off-by: Boris Pismenny <borisp@mellanox.com>
      Signed-off-by: Leon Romanovsky <leon@kernel.org>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      c2b37f76
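      The class of bug being fixed can be illustrated with a small sketch of
      32-bit size arithmetic on user-provided counts. This is not the
      driver's code: the function names, the parameter names and the 16-byte
      header and segment sizes are assumptions chosen for the example.

```python
U32_MAX = 0xFFFFFFFF


def wrapped_buf_size(max_wr, desc_size):
    """What an unchecked 32-bit multiplication would produce."""
    return (max_wr * desc_size) & U32_MAX


def checked_buf_size(max_wr, max_sge, hdr_size=16, seg_size=16):
    """Validate the user-provided counts before sizing the buffer."""
    desc_size = hdr_size + max_sge * seg_size
    if max_wr == 0 or desc_size > U32_MAX:
        raise ValueError("invalid request")
    buf_size = max_wr * desc_size
    if buf_size > U32_MAX:
        raise ValueError("buffer size overflows 32 bits")
    return buf_size


print(wrapped_buf_size(2**28, 32))  # 0: the product wraps to an empty buffer
print(checked_buf_size(4, 1))       # 128
```

      Rejecting the request up front avoids allocating a buffer that is far
      smaller than the number of entries later written into it.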
  8. 12 Mar, 2018 3 commits
    • X
      sock_diag: request _diag module only when the family or proto has been registered · bf2ae2e4
      Xin Long committed on
      Currently, when using 'ss' from iproute, the kernel tries to load all
      _diag modules, which also causes the corresponding family and proto
      modules to be loaded as well due to module dependencies.
      
      For instance, after running 'ss', sctp, dccp and af_packet (if built as
      a module) would be loaded.
      
      For example:
      
        $ lsmod|grep sctp
        $ ss
        $ lsmod|grep sctp
        sctp_diag              16384  0
        sctp                  323584  5 sctp_diag
        inet_diag              24576  4 raw_diag,tcp_diag,sctp_diag,udp_diag
        libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
      
      As these family and proto modules are loaded unintentionally, it
      could cause some problems, like:
      
      - Some debug tools use 'ss' to collect the socket info, which loads all
        those diag and family and protocol modules. It's noisy for identifying
        issues.
      
      - Users who are unaware of the sctp protocol usually expect sctp init
        packets to be dropped silently instead of an abort being sent back.
      
      - It wastes resources (especially with multiple netns), and SCTP module
        can't be unloaded once it's loaded.
      
      ...
      
      In short, it's really inappropriate to have these family and proto
      modules loaded unexpectedly when just doing debugging with inet_diag.
      
      This patch introduces sock_load_diag_module(), which loads
      the _diag module only when its corresponding family or proto has
      already been registered.
      
      Note that we can't just load _diag module without the family or
      proto loaded, as some symbols used in _diag module are from the
      family or proto module.
      
      v1->v2:
        - move inet proto check to inet_diag to avoid a compile error.
      v2->v3:
        - define sock_load_diag_module in sock.c and export one symbol
          only.
        - improve the changelog.
      
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Phil Sutter <phil@nwl.cc>
      Acked-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bf2ae2e4
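      The gating rule described above can be sketched as follows. This is a
      toy model, not the kernel implementation: the registries are plain
      Python sets, load_diag_module_if_registered() is a hypothetical name,
      and the module names it records are illustrative.

```python
registered_families = {"inet"}
registered_protos = {("inet", "tcp"), ("inet", "udp")}
loaded_modules = []


def load_diag_module_if_registered(family, proto=None):
    """Request a _diag module only if its family/proto is already present."""
    if family not in registered_families:
        return False
    if proto is not None and (family, proto) not in registered_protos:
        return False
    # Stand-in for the module request; nothing is pulled in otherwise.
    loaded_modules.append(f"{proto or family}_diag")
    return True


load_diag_module_if_registered("inet", "tcp")   # tcp is registered: loaded
load_diag_module_if_registered("inet", "sctp")  # sctp is not: nothing loaded
```

      With this rule, a diagnostic query for sctp no longer drags the sctp
      module in as a dependency of its _diag module.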
    • B
      net: phy: Tell caller result of phy_change() · a2c054a8
      Brad Mouring committed on
      In 664fcf12 (net: phy: Threaded interrupts allow some simplification)
      the phy_interrupt system was changed to use a traditional threaded
      interrupt scheme instead of a workqueue approach.
      
      With this change, the phy status check moved into phy_change, which
      did not report back to the caller whether or not the interrupt was
      handled. This means that, in the case of a shared phy interrupt,
      only the first phydev's interrupt registers are checked (since
      phy_interrupt() would always return IRQ_HANDLED). This leads to
      interrupt storms when it is a secondary device that's actually the
      interrupt source.
      
      Signed-off-by: Brad Mouring <brad.mouring@ni.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a2c054a8
    • F
      netfilter: x_tables: add and use xt_check_proc_name · b1d0a5d0
      Florian Westphal committed on
      recent and hashlimit both create /proc files, but only check that
      name is 0 terminated.
      
      This can trigger WARN() from procfs when name is "" or "/".
      Add helper for this and then use it for both.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Reported-by: <syzbot+0502b00edac2a0680b61@syzkaller.appspotmail.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      b1d0a5d0
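      A name check of the kind described above might look like the sketch
      below. It is a Python approximation under stated assumptions, not the
      kernel helper: check_proc_name() is a hypothetical name, the input is a
      fixed-size byte buffer as it would arrive from userspace, and rejecting
      "." and ".." goes beyond the two cases named in the commit message.

```python
import errno


def check_proc_name(buf: bytes) -> int:
    """Return 0 if buf holds a usable /proc file name, else a negative errno."""
    if buf[:1] in (b"", b"\0"):
        return -errno.EINVAL        # empty name
    end = buf.find(b"\0")
    if end == -1:
        return -errno.ENAMETOOLONG  # not 0 terminated within the buffer
    name = buf[:end]
    if name in (b".", b"..") or b"/" in name:
        return -errno.EINVAL        # would confuse procfs path handling
    return 0


print(check_proc_name(b"recent\0"))  # 0
```

      Both callers can then share one check instead of each testing only for
      the terminating 0 byte.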
  9. 10 Mar, 2018 1 commit
  10. 08 Mar, 2018 1 commit
    • E
      net: usbnet: fix potential deadlock on 32bit hosts · 2695578b
      Eric Dumazet committed on
      Marek reported a LOCKDEP issue occurring on 32bit host,
      that we tracked down to the fact that usbnet could either
      run from soft or hard irqs.
      
      This patch adds u64_stats_update_begin_irqsave() and
      u64_stats_update_end_irqrestore() helpers to solve this case.
      
      [   17.768040] ================================
      [   17.772239] WARNING: inconsistent lock state
      [   17.776511] 4.16.0-rc3-next-20180227-00007-g876c53a7493c #453 Not tainted
      [   17.783329] --------------------------------
      [   17.787580] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
      [   17.793607] swapper/0/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
      [   17.798751]  (&syncp->seq#5){?.-.}, at: [<9b22e5f0>]
      asix_rx_fixup_internal+0x188/0x288
      [   17.806790] {IN-HARDIRQ-W} state was registered at:
      [   17.811677]   tx_complete+0x100/0x208
      [   17.815319]   __usb_hcd_giveback_urb+0x60/0xf0
      [   17.819770]   xhci_giveback_urb_in_irq+0xa8/0x240
      [   17.824469]   xhci_td_cleanup+0xf4/0x16c
      [   17.828367]   xhci_irq+0xe74/0x2240
      [   17.831827]   usb_hcd_irq+0x24/0x38
      [   17.835343]   __handle_irq_event_percpu+0x98/0x510
      [   17.840111]   handle_irq_event_percpu+0x1c/0x58
      [   17.844623]   handle_irq_event+0x38/0x5c
      [   17.848519]   handle_fasteoi_irq+0xa4/0x138
      [   17.852681]   generic_handle_irq+0x18/0x28
      [   17.856760]   __handle_domain_irq+0x6c/0xe4
      [   17.860941]   gic_handle_irq+0x54/0xa0
      [   17.864666]   __irq_svc+0x70/0xb0
      [   17.867964]   arch_cpu_idle+0x20/0x3c
      [   17.871578]   arch_cpu_idle+0x20/0x3c
      [   17.875190]   do_idle+0x144/0x218
      [   17.878468]   cpu_startup_entry+0x18/0x1c
      [   17.882454]   start_kernel+0x394/0x400
      [   17.886177] irq event stamp: 161912
      [   17.889616] hardirqs last  enabled at (161912): [<7bedfacf>]
      __netdev_alloc_skb+0xcc/0x140
      [   17.897893] hardirqs last disabled at (161911): [<d58261d0>]
      __netdev_alloc_skb+0x94/0x140
      [   17.904903] exynos5-hsi2c 12ca0000.i2c: tx timeout
      [   17.906116] softirqs last  enabled at (161904): [<387102ff>]
      irq_enter+0x78/0x80
      [   17.906123] softirqs last disabled at (161905): [<cf4c628e>]
      irq_exit+0x134/0x158
      [   17.925722].
      [   17.925722] other info that might help us debug this:
      [   17.933435]  Possible unsafe locking scenario:
      [   17.933435].
      [   17.940331]        CPU0
      [   17.942488]        ----
      [   17.944894]   lock(&syncp->seq#5);
      [   17.948274]   <Interrupt>
      [   17.950847]     lock(&syncp->seq#5);
      [   17.954386].
      [   17.954386]  *** DEADLOCK ***
      [   17.954386].
      [   17.962422] no locks held by swapper/0/0.
      
      Fixes: c8b5d129 ("net: usbnet: support 64bit stats")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2695578b
  11. 07 Mar, 2018 2 commits
    • P
      rhashtable: Fix rhlist duplicates insertion · d3dcf8eb
      Paul Blakey committed on
      When inserting duplicate objects (those with the same key), the
      current rhlist implementation messes up the chain pointers by
      updating the bucket pointer instead of the previous node's next
      pointer to point to the newly inserted node. This causes missing
      elements on removal and traversal.
      
      Fix that by properly updating pprev pointer to point to
      the correct rhash_head next pointer.
      
      Issue: 1241076
      Change-Id: I86b2c140bcb4aeb10b70a72a267ff590bb2b17e7
      Fixes: ca26893f ('rhashtable: Add rhlist interface')
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d3dcf8eb
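      The pointer mix-up can be reproduced with a small linked-list model.
      This is a sketch under stated assumptions rather than the rhashtable
      code: Node, Bucket and demo() are hypothetical, a single bucket is
      modelled, and the "fixed" switch selects between the buggy and the
      corrected update.

```python
class Node:
    def __init__(self, key, name):
        self.key, self.name = key, name
        self.next = None  # next entry in the bucket chain (distinct keys)
        self.dup = None   # next object with the same key


class Bucket:
    def __init__(self, fixed):
        self.head, self.fixed = None, fixed

    def insert(self, obj):
        prev, cur = None, self.head
        while cur is not None and cur.key != obj.key:
            prev, cur = cur, cur.next
        if cur is None:
            # New key: link at the front of the chain.
            obj.next, self.head = self.head, obj
            return
        # Duplicate key: obj takes cur's place and keeps cur on its dup list.
        obj.next, obj.dup = cur.next, cur
        if self.fixed and prev is not None:
            prev.next = obj  # update the previous node's next pointer
        else:
            self.head = obj  # buggy path: always overwrite the bucket pointer

    def keys(self):
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.key)
            cur = cur.next
        return out


def demo(fixed):
    bucket = Bucket(fixed)
    for key, name in [(1, "a"), (2, "b"), (1, "a2")]:
        bucket.insert(Node(key, name))
    return bucket.keys()


print(demo(fixed=False))  # [1]: key 2 fell off the chain
print(demo(fixed=True))   # [2, 1]
```

      In the buggy run the duplicate overwrites the bucket pointer, so every
      entry that preceded the matching key in the chain becomes unreachable.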
    • D
      usb: quirks: add control message delay for 1b1c:1b20 · cb88a058
      Danilo Krummrich committed on
      Corsair Strafe RGB keyboard does not respond to usb control messages
      sometimes and hence generates timeouts.
      
      Commit de3af5bf ("usb: quirks: add delay init quirk for Corsair
      Strafe RGB keyboard") tried to fix those timeouts by adding
      USB_QUIRK_DELAY_INIT.
      
      Unfortunately, even with this quirk timeouts of usb_control_msg()
      can still be seen, but with a lower frequency (approx. 1 out of 15):
      
      [   29.103520] usb 1-8: string descriptor 0 read error: -110
      [   34.363097] usb 1-8: can't set config #1, error -110
      
      Adding further delays to different locations where usb control
      messages are issued just moves the timeouts to other locations,
      e.g.:
      
      [   35.400533] usbhid 1-8:1.0: can't add hid device: -110
      [   35.401014] usbhid: probe of 1-8:1.0 failed with error -110
      
      The only way to reliably avoid those issues is having a pause after
      each usb control message. In approx. 200 boot cycles no more timeouts
      were seen.
      
      Additionally, keep USB_QUIRK_DELAY_INIT as it turned out to be necessary
      to have the delay in hub_port_connect() after hub_port_init().
      
      The overall boot time seems not to be influenced by these additional
      delays, even on fast machines and lightweight distributions.
      
      Fixes: de3af5bf ("usb: quirks: add delay init quirk for Corsair Strafe RGB keyboard")
      Cc: stable@vger.kernel.org
      Signed-off-by: Danilo Krummrich <danilokrummrich@dk-develop.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      cb88a058
  12. 06 Mar, 2018 2 commits
  13. 05 Mar, 2018 2 commits
  14. 04 Mar, 2018 2 commits
    • F
      of: change overlay apply input data from unflattened to FDT · 39a751a4
      Frank Rowand committed
      Move duplicating and unflattening of an overlay flattened devicetree
      (FDT) into the overlay application code.  To accomplish this,
      of_overlay_apply() is replaced by of_overlay_fdt_apply().
      
      The copy of the FDT (aka "duplicate FDT") now belongs to devicetree
      code, which is thus responsible for freeing the duplicate FDT.  The
      caller of of_overlay_fdt_apply() remains responsible for freeing the
      original FDT.
      
      The unflattened devicetree now belongs to devicetree code, which is
      thus responsible for freeing the unflattened devicetree.
      
      These ownership changes prevent early freeing of the duplicated FDT
      or the unflattened devicetree, which could result in use after free
      errors.
      
      of_overlay_fdt_apply() is a private function for the anticipated
      overlay loader.
      
      Update unittest.c to use of_overlay_fdt_apply() instead of
      of_overlay_apply().
      
      Move overlay fragments from artificial locations in
      drivers/of/unittest-data/tests-overlay.dtsi into one devicetree
      source file per overlay.  This led to changes in
      drivers/of/unittest-data/Makefile and drivers/of/unittest.c.
      
        - Add overlay directives to the overlay devicetree source files so
          that dtc will compile them as true overlays into one FDT data
          chunk per overlay.
      
        - Set CFLAGS for drivers/of/unittest-data/testcases.dts so that
          symbols will be generated for overlay resolution of overlays
          that are no longer artificially contained in testcases.dts
      
        - Unflatten and apply each unittest overlay FDT using
          of_overlay_fdt_apply().
      
        - Enable the of_resolve_phandles() check for whether the unflattened
          overlay is detached.  This check was previously disabled because the
          overlays from tests-overlay.dtsi were not unflattened into detached
          trees.
      
        - Other changes to unittest.c infrastructure to manage multiple test
          FDTs built into the kernel image (access by name instead of
          arbitrary number).
      
        - of_unittest_overlay_high_level(): previously unused code to add
          properties from the overlay_base devicetree to the live tree
          was triggered by the restructuring of tests-overlay.dtsi and thus
          testcases.dts.  This exposed two bugs: (1) the need to dup a
          property before adding it, and (2) property 'name' is
          auto-generated in the unflatten code and thus will be a duplicate
          in the __symbols__ node - do not treat this duplicate as an error.
      Signed-off-by: Frank Rowand <frank.rowand@sony.com>
      39a751a4
    • D
      bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat to deal with gso sctp skbs · d02f51cb
      Daniel Axtens committed
      SCTP GSO skbs have a gso_size of GSO_BY_FRAGS, so any sort of
      unconditional mangling of that will result in a nonsense value
      and would corrupt the skb later on.
      
      Therefore, i) add two helpers skb_increase_gso_size() and
      skb_decrease_gso_size() that would throw a one time warning and
      bail out for such skbs and ii) refuse and return early with an
      error in those BPF helpers that are affected. We do need to bail
      out as early as possible from there before any changes on the
      skb have been performed.
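
A rough user-space sketch of the guard described above follows. Everything in it is a model under assumptions: `struct shinfo_model`, `model_increase_gso_size()` and `model_decrease_gso_size()` are hypothetical stand-ins, the sentinel is assumed to be 0xFFFF as in the kernel's GSO_BY_FRAGS, and the `-EINVAL` return value is an illustrative choice rather than what the patch returns. It also folds the "warn once" behaviour of the helpers and the "bail out early" behaviour of the BPF callers into one function for brevity.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: mirrors the kernel's GSO_BY_FRAGS sentinel value. */
#define GSO_BY_FRAGS 0xFFFFu

/* Hypothetical stand-in for the gso_size field of skb_shared_info. */
struct shinfo_model {
	uint16_t gso_size;
};

/* Refuse to touch a sentinel gso_size; warn only the first time. */
static int gso_size_is_sentinel(const struct shinfo_model *sh)
{
	static int warned;

	if (sh->gso_size != GSO_BY_FRAGS)
		return 0;
	if (!warned) {
		warned = 1;
		fprintf(stderr, "refusing to adjust GSO_BY_FRAGS gso_size\n");
	}
	return 1;
}

static int model_increase_gso_size(struct shinfo_model *sh, uint16_t inc)
{
	if (gso_size_is_sentinel(sh))
		return -EINVAL; /* illustrative error code */
	sh->gso_size += inc;
	return 0;
}

static int model_decrease_gso_size(struct shinfo_model *sh, uint16_t dec)
{
	if (gso_size_is_sentinel(sh))
		return -EINVAL; /* illustrative error code */
	sh->gso_size -= dec;
	return 0;
}
```

The check comes before any modification, so a caller that sees the error can return without having changed the packet model at all.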
      
      Fixes: 6578171a ("bpf: add bpf_skb_change_proto helper")
      Co-authored-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Daniel Axtens <dja@axtens.net>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      d02f51cb
  15. 03 Mar, 2018 1 commit
    • M
      signals: Move put_compat_sigset to compat.h to silence hardened usercopy · fde9fc76
      Matt Redfearn committed
      Since commit afcc90f8 ("usercopy: WARN() on slab cache usercopy
      region violations"), MIPS systems booting with a compat root filesystem
      emit a warning when copying compat siginfo to userspace:
      
      WARNING: CPU: 0 PID: 953 at mm/usercopy.c:81 usercopy_warn+0x98/0xe8
      Bad or missing usercopy whitelist? Kernel memory exposure attempt
      detected from SLAB object 'task_struct' (offset 1432, size 16)!
      Modules linked in:
      CPU: 0 PID: 953 Comm: S01logging Not tainted 4.16.0-rc2 #10
      Stack : ffffffff808c0000 0000000000000000 0000000000000001 65ac85163f3bdc4a
      	65ac85163f3bdc4a 0000000000000000 90000000ff667ab8 ffffffff808c0000
      	00000000000003f8 ffffffff808d0000 00000000000000d1 0000000000000000
      	000000000000003c 0000000000000000 ffffffff808c8ca8 ffffffff808d0000
      	ffffffff808d0000 ffffffff80810000 fffffc0000000000 ffffffff80785c30
      	0000000000000009 0000000000000051 90000000ff667eb0 90000000ff667db0
      	000000007fe0d938 0000000000000018 ffffffff80449958 0000000020052798
      	ffffffff808c0000 90000000ff664000 90000000ff667ab0 00000000100c0000
      	ffffffff80698810 0000000000000000 0000000000000000 0000000000000000
      	0000000000000000 0000000000000000 ffffffff8010d02c 65ac85163f3bdc4a
      	...
      Call Trace:
      [<ffffffff8010d02c>] show_stack+0x9c/0x130
      [<ffffffff80698810>] dump_stack+0x90/0xd0
      [<ffffffff80137b78>] __warn+0x100/0x118
      [<ffffffff80137bdc>] warn_slowpath_fmt+0x4c/0x70
      [<ffffffff8021e4a8>] usercopy_warn+0x98/0xe8
      [<ffffffff8021e68c>] __check_object_size+0xfc/0x250
      [<ffffffff801bbfb8>] put_compat_sigset+0x30/0x88
      [<ffffffff8011af24>] setup_rt_frame_n32+0xc4/0x160
      [<ffffffff8010b8b4>] do_signal+0x19c/0x230
      [<ffffffff8010c408>] do_notify_resume+0x60/0x78
      [<ffffffff80106f50>] work_notifysig+0x10/0x18
      ---[ end trace 88fffbf69147f48a ]---
      
      Commit 5905429a ("fork: Provide usercopy whitelisting for
      task_struct") noted that:
      
      "While the blocked and saved_sigmask fields of task_struct are copied to
      userspace (via sigmask_to_save() and setup_rt_frame()), it is always
      copied with a static length (i.e. sizeof(sigset_t))."
      
      However, this is not true in the case of compat signals, whose sigset
      is copied by put_compat_sigset and receives size as an argument.
      
      At most call sites, put_compat_sigset is copying a sigset from the
      current task_struct. This triggers a warning when
      CONFIG_HARDENED_USERCOPY is active. However, by marking this function as
      static inline, the warning can be avoided because in all of these cases
      the size is constant at compile time, which is allowed. The only site
      where this is not the case is handling the rt_sigpending syscall, but
      there the copy is being made from a stack local variable so does not
      trigger the warning.
      
      Move put_compat_sigset to compat.h, and mark it static inline. This
      fixes the WARN on MIPS.
      
      Fixes: afcc90f8 ("usercopy: WARN() on slab cache usercopy region violations")
      Signed-off-by: Matt Redfearn <matt.redfearn@mips.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: "Dmitry V . Levin" <ldv@altlinux.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/18639/
      Signed-off-by: James Hogan <jhogan@kernel.org>
      fde9fc76
  16. 01 Mar, 2018 1 commit
  17. 28 Feb, 2018 2 commits
    • T
      tty: make n_tty_read() always abort if hangup is in progress · 28b0f8a6
      Tejun Heo committed
      A tty is hung up by __tty_hangup() setting file->f_op to
      hung_up_tty_fops, which is skipped on ttys whose write operation isn't
      tty_write().  This means that, for example, /dev/console whose write
      op is redirected_tty_write() is never actually marked hung up.
      
      Because n_tty_read() uses the hung up status to decide whether to
      abort the waiting readers, the lack of hung-up marking can lead to the
      following scenario.
      
       1. A session contains two processes.  The leader and its child.  The
          child ignores SIGHUP.
      
       2. The leader exits and starts disassociating from the controlling
          terminal (/dev/console).
      
       3. __tty_hangup() skips setting f_op to hung_up_tty_fops.
      
       4. SIGHUP is delivered and ignored.
      
       5. tty_ldisc_hangup() is invoked.  It wakes up the waits which should
          clear the read lockers of tty->ldisc_sem.
      
       6. The reader wakes up but because tty_hung_up_p() is false, it
          doesn't abort and goes back to sleep while read-holding
          tty->ldisc_sem.
      
       7. The leader progresses to tty_ldisc_lock() in tty_ldisc_hangup()
          and is now stuck in D sleep indefinitely waiting for
          tty->ldisc_sem.
      
      The following is Alan's explanation on why some ttys aren't hung up.
      
       http://lkml.kernel.org/r/20171101170908.6ad08580@alans-desktop
      
       1. It broke the serial consoles because they would hang up and close
          down the hardware. With tty_port that *should* be fixable properly
          for any cases remaining.
      
       2. The console layer was (and still is) completely broken and doesn't
          refcount properly. So if you turn on console hangups it breaks (as
          indeed does freeing consoles and half a dozen other things).
      
      As neither can be fixed quickly, this patch works around the problem
      by introducing a new flag, TTY_HUPPING, which is used solely to tell
      n_tty_read() that hang-up is in progress for the console and the
      readers should be aborted regardless of the hung-up status of the
      device.
      
      The following is a sample hung task warning caused by this issue.
      
        INFO: task agetty:2662 blocked for more than 120 seconds.
              Not tainted 4.11.3-dbg-tty-lockup-02478-gfd6c7ee-dirty #28
        "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
            0  2662      1 0x00000086
        Call Trace:
         __schedule+0x267/0x890
         schedule+0x36/0x80
         schedule_timeout+0x23c/0x2e0
         ldsem_down_write+0xce/0x1f6
         tty_ldisc_lock+0x16/0x30
         tty_ldisc_hangup+0xb3/0x1b0
         __tty_hangup+0x300/0x410
         disassociate_ctty+0x6c/0x290
         do_exit+0x7ef/0xb00
         do_group_exit+0x3f/0xa0
         get_signal+0x1b3/0x5d0
         do_signal+0x28/0x660
         exit_to_usermode_loop+0x46/0x86
         do_syscall_64+0x9c/0xb0
         entry_SYSCALL64_slow_path+0x25/0x25
      
      The following is the repro.  Run "$PROG /dev/console".  The parent
      process hangs in D state.
      
        #include <sys/types.h>
        #include <sys/stat.h>
        #include <sys/wait.h>
        #include <sys/ioctl.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <errno.h>
        #include <signal.h>
        #include <time.h>
        #include <termios.h>
      
        int main(int argc, char **argv)
        {
      	  struct sigaction sact = { .sa_handler = SIG_IGN };
      	  struct timespec ts1s = { .tv_sec = 1 };
      	  pid_t pid;
      	  int fd;
      
      	  if (argc < 2) {
      		  fprintf(stderr, "test-hung-tty /dev/$TTY\n");
      		  return 1;
      	  }
      
      	  /* fork a child to ensure that it isn't already the session leader */
      	  pid = fork();
      	  if (pid < 0) {
      		  perror("fork");
      		  return 1;
      	  }
      
      	  if (pid > 0) {
      		  /* top parent, wait for everyone */
      		  while (waitpid(-1, NULL, 0) >= 0)
      			  ;
      		  if (errno != ECHILD)
      			  perror("waitpid");
      		  return 0;
      	  }
      
      	  /* new session, start a new session and set the controlling tty */
      	  if (setsid() < 0) {
      		  perror("setsid");
      		  return 1;
      	  }
      
      	  fd = open(argv[1], O_RDWR);
      	  if (fd < 0) {
      		  perror("open");
      		  return 1;
      	  }
      
      	  if (ioctl(fd, TIOCSCTTY, 1) < 0) {
      		  perror("ioctl");
      		  return 1;
      	  }
      
      	  /* fork a child, sleep a bit and exit */
      	  pid = fork();
      	  if (pid < 0) {
      		  perror("fork");
      		  return 1;
      	  }
      
      	  if (pid > 0) {
      		  nanosleep(&ts1s, NULL);
      		  printf("Session leader exiting\n");
      		  exit(0);
      	  }
      
      	  /*
      	   * The child ignores SIGHUP and keeps reading from the controlling
      	   * tty.  Because SIGHUP is ignored, the child doesn't get killed on
      	   * parent exit and the bug in n_tty makes the read(2) block the
      	   * parent's control terminal hangup attempt.  The parent ends up in
      	   * D sleep until the child is explicitly killed.
      	   */
      	  sigaction(SIGHUP, &sact, NULL);
      	  printf("Child reading tty\n");
      	  while (1) {
      		  char buf[1024];
      
      		  if (read(fd, buf, sizeof(buf)) < 0) {
      			  perror("read");
      			  return 1;
      		  }
      	  }
      
      	  return 0;
        }
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Alan Cox <alan@llwyncelyn.cymru>
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      28b0f8a6
    • A
      net: phy: Restore phy_resume() locking assumption · 9c2c2e62
      Andrew Lunn committed
      Commit f5e64032 ("net: phy: fix resume handling") changes the
      locking semantics for phy_resume() such that the caller now needs to
      hold the phy mutex. Not all call sites were adapted to the new
      semantics, resulting in warnings from the added
      WARN_ON(!mutex_is_locked(&phydev->lock)).  Rather than change the
      semantics, add a __phy_resume() and restore the old behavior of
      phy_resume().
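
The split between a lock-taking wrapper and a caller-holds-lock variant can be illustrated with a single-threaded model. All names here (`struct model_dev`, `model_resume()`, `model_resume_locked()`) are hypothetical, and the "lock" is just a flag checked with assert(), not a real mutex. It is a sketch of the pattern, not the phylib code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device model; lock_held stands in for a real mutex. */
struct model_dev {
	bool lock_held;
	bool suspended;
	int resume_calls;
};

static void model_lock(struct model_dev *d)
{
	assert(!d->lock_held);
	d->lock_held = true;
}

static void model_unlock(struct model_dev *d)
{
	assert(d->lock_held);
	d->lock_held = false;
}

/* Variant for callers that already hold the lock (the __phy_resume() role). */
static int model_resume_locked(struct model_dev *d)
{
	assert(d->lock_held);
	d->suspended = false;
	d->resume_calls++;
	return 0;
}

/* Variant that takes the lock itself (the phy_resume() role). */
static int model_resume(struct model_dev *d)
{
	int ret;

	model_lock(d);
	ret = model_resume_locked(d);
	model_unlock(d);
	return ret;
}
```

Existing callers that never held the lock keep calling the wrapper unchanged, while code paths that already hold it use the `_locked` variant, so neither group needs to change its locking.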
      Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
      Fixes: f5e64032 ("net: phy: fix resume handling")
      Signed-off-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9c2c2e62
  18. 27 Feb, 2018 4 commits
    • D
      dax: fix vma_is_fsdax() helper · 230f5a89
      Dan Williams committed
      Gerd reports that ->i_mode may contain other bits besides S_IFCHR. Use
      S_ISCHR() instead. Otherwise, get_user_pages_longterm() may fail on
      device-dax instances when those are meant to be explicitly allowed.
      
      Fixes: 2bb6d283 ("mm: introduce get_user_pages_longterm")
      Cc: <stable@vger.kernel.org>
      Reported-by: Gerd Rausch <gerd.rausch@oracle.com>
      Acked-by: Jane Chu <jane.chu@oracle.com>
      Reported-by: Haozhong Zhang <haozhong.zhang@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      230f5a89
    • J
      genhd: Fix BUG in blkdev_open() · 56c0908c
      Jan Kara committed
      When two blkdev_open() calls for a partition race with device removal
      and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in
      blkdev_open(). The race can happen as follows:
      
      CPU0				CPU1			CPU2
      							del_gendisk()
      							  bdev_unhash_inode(part1);
      
      blkdev_open(part1, O_EXCL)	blkdev_open(part1, O_EXCL)
        bdev = bd_acquire()		  bdev = bd_acquire()
        blkdev_get(bdev)
          bd_start_claiming(bdev)
            - finds old inode 'whole'
            bd_prepare_to_claim() -> 0
      							  bdev_unhash_inode(whole);
      							<device removed>
      							<new device under same
      							 number created>
      				  blkdev_get(bdev);
      				    bd_start_claiming(bdev)
      				      - finds new inode 'whole'
      				      bd_prepare_to_claim()
      					- this also succeeds as we have
      					  different 'whole' here...
      					- bad things happen now as we
      					  have two exclusive openers of
      					  the same bdev
      
      The problem here is that block device opens can see various intermediate
      states while gendisk is shutting down and then being recreated.
      
      We fix the problem by introducing new lookup_sem in gendisk that
      synchronizes gendisk deletion with get_gendisk() and furthermore by
      making sure that get_gendisk() does not return gendisk that is being (or
      has been) deleted. This makes sure that once we ever manage to look up
      newly created bdev inode, we are also guaranteed that following
      get_gendisk() will either return failure (and we fail open) or it
      returns gendisk for the new device and following bdget_disk() will
      return new bdev inode (i.e., blkdev_open() follows the path as if it is
      completely run after new device is created).
      Reported-and-analyzed-by: Hou Tao <houtao1@huawei.com>
      Tested-by: Hou Tao <houtao1@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      56c0908c
    • J
      genhd: Add helper put_disk_and_module() · 9df6c299
      Jan Kara committed
      Add a proper counterpart to get_disk_and_module() -
      put_disk_and_module(). Currently it is open-coded in several places.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      9df6c299
    • J
      genhd: Rename get_disk() to get_disk_and_module() · 3079c22e
      Jan Kara committed
      Rename get_disk() to get_disk_and_module() to make clear what the
      function does. It's not a great name but at least it is now clear that
      put_disk() is not its counterpart.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3079c22e
  19. 26 Feb, 2018 1 commit
    • M
      regmap: mmio: Add function to attach a clock · 31895662
      Maxime Ripard committed
      regmap_init_mmio_clk allows specifying a clock that needs to be enabled
      while accessing the registers.
      
      However, that clock is retrieved through its clock ID, which means the
      lookup is based on the device that registers the regmap and, in the DT
      case, only searches that device's OF node.
      
      This might be problematic if the clock to enable is stored in another node.
      Let's add a function that attaches an already-retrieved clock to a
      regmap in order to fix this.
      Signed-off-by: Maxime Ripard <maxime.ripard@free-electrons.com>
      Signed-off-by: Mark Brown <broonie@kernel.org>
      31895662
  20. 24 Feb, 2018 2 commits
  21. 23 Feb, 2018 2 commits
  22. 22 Feb, 2018 1 commit
    • A
      bug.h: work around GCC PR82365 in BUG() · 173a3efd
      Arnd Bergmann committed
      Looking at functions with large stack frames across all architectures
      led me to discover that BUG() suffers from the same problem as
      fortify_panic(), which I've added a workaround for already.
      
      In short, variables that go out of scope by calling a noreturn function
      or __builtin_unreachable() keep using stack space in functions
      afterwards.
      
      A workaround that was identified is to insert an empty assembler
      statement just before calling the function that doesn't return.  I'm
      adding a macro "barrier_before_unreachable()" to document this, and
      insert calls to that in all instances of BUG() that currently suffer
      from this problem.
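
A user-space approximation of the workaround, assuming GCC or Clang: an empty inline assembler statement is placed just before the call that does not return. `model_barrier_before_unreachable()`, `MODEL_BUG()` and `checked_div()` are hypothetical names for this sketch, and abort() stands in for whatever noreturn path an architecture's BUG() actually uses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Empty asm statement; assumes a GCC-compatible compiler. */
#define model_barrier_before_unreachable() __asm__ __volatile__("")

/* Report the location, place the barrier, then call a noreturn function. */
#define MODEL_BUG() do {                                           \
	fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);     \
	model_barrier_before_unreachable();                        \
	abort();                                                   \
} while (0)

/* Example caller: the BUG path is taken only on an impossible input. */
static int checked_div(int a, int b)
{
	if (b == 0)
		MODEL_BUG();
	return a / b;
}
```

Whether the barrier changes stack usage depends on the compiler version and the function in question; this sketch shows only where the statement goes, not the frame-size effect reported in the commit.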
      
      The files that saw the largest change from this had these frame sizes
      before, and much less with my patch:
      
        fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=]
        fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=]
        net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=]
        drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=]
      
      In case of ARC and CRIS, it turns out that the BUG() implementation
      actually does return (or at least the compiler thinks it does),
      resulting in lots of warnings about uninitialized variable use and
      leaving noreturn functions, such as:
      
        block/cfq-iosched.c: In function 'cfq_async_queue_prio':
        block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type]
        include/linux/dmaengine.h: In function 'dma_maxpq':
        include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type]
      
      This makes them call __builtin_trap() instead, which should normally
      dump the stack and kill the current process, like some of the other
      architectures already do.
      
      I tried adding barrier_before_unreachable() to panic() and
      fortify_panic() as well, but that had very little effect, so I'm not
      submitting that patch.
      
      Vineet said:
      
      : For ARC, it is double win.
      :
      : 1. Fixes 3 -Wreturn-type warnings
      :
      : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function
      : [-Wreturn-type]
      : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function
      : [-Wreturn-type]
      : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of
      : non-void function [-Wreturn-type]
      :
      : 2.  bloat-o-meter reports code size improvements as gcc elides the
      :    generated code for stack return.
      
      Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
      Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.de
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc]
      Tested-by: Vineet Gupta <vgupta@synopsys.com>	[arch/arc]
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Christopher Li <sparse@chrisli.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      173a3efd