1. 29 9月, 2019 4 次提交
    • D
      mm, page_alloc: allow hugepage fallback to remote nodes when madvised · 76e654cc
      David Rientjes 提交于
      For systems configured to always try hard to allocate transparent
      hugepages (thp defrag setting of "always") or for memory that has been
      explicitly madvised to MADV_HUGEPAGE, it is often better to fallback to
      remote memory to allocate the hugepage if the local allocation fails
      first.
      
      The point is to allow the initial call to __alloc_pages_node() to attempt
      to defragment local memory to make a hugepage available, if possible,
      rather than immediately fallback to remote memory.  Local hugepages will
      always have a better access latency than remote (huge)pages, so an attempt
      to make a hugepage available locally is always preferred.
      
      If memory compaction cannot be successful locally, however, it is likely
      better to fallback to remote memory.  This could take on two forms: either
      allow immediate fallback to remote memory or do per-zone watermark checks.
      It would be possible to fallback only when per-zone watermarks fail for
      order-0 memory, since that would require local reclaim for all subsequent
      faults so remote huge allocation is likely better than thrashing the local
      zone for large workloads.
      
      In this case, it is assumed that because the system is configured to try
      hard to allocate hugepages or the vma is advised to explicitly want to try
      hard for hugepages that remote allocation is better when local allocation
      and memory compaction have both failed.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      76e654cc
    • D
      mm, page_alloc: avoid expensive reclaim when compaction may not succeed · b39d0ee2
      David Rientjes 提交于
      Memory compaction has a couple significant drawbacks as the allocation
      order increases, specifically:
      
       - isolate_freepages() is responsible for finding free pages to use as
         migration targets and is implemented as a linear scan of memory
         starting at the end of a zone,
      
       - failing order-0 watermark checks in memory compaction does not account
         for how far below the watermarks the zone actually is: to enable
         migration, there must be *some* free memory available.  Per the above,
         watermarks are not always suffficient if isolate_freepages() cannot
         find the free memory but it could require hundreds of MBs of reclaim to
         even reach this threshold (read: potentially very expensive reclaim with
         no indication compaction can be successful), and
      
       - if compaction at this order has failed recently so that it does not even
         run as a result of deferred compaction, looping through reclaim can often
         be pointless.
      
      For hugepage allocations, these are quite substantial drawbacks because
      these are very high order allocations (order-9 on x86) and falling back to
      doing reclaim can potentially be *very* expensive without any indication
      that compaction would even be successful.
      
      Reclaim itself is unlikely to free entire pageblocks and certainly no
      reliance should be put on it to do so in isolation (recall lumpy reclaim).
      This means we should avoid reclaim and simply fail hugepage allocation if
      compaction is deferred.
      
      It is also not helpful to thrash a zone by doing excessive reclaim if
      compaction may not be able to access that memory.  If order-0 watermarks
      fail and the allocation order is sufficiently large, it is likely better
      to fail the allocation rather than thrashing the zone.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b39d0ee2
    • D
      Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask"" · 19deb769
      David Rientjes 提交于
      This reverts commit 92717d42.
      
      Since commit a8282608 ("Revert "mm, thp: restore node-local hugepage
      allocations"") is reverted in this series, it is better to restore the
      previous 5.2 behavior between the thp allocation and the page allocator
      rather than to attempt any consolidation or cleanup for a policy that is
      now reverted.  It's less risky during an rc cycle and subsequent patches
      in this series further modify the same policy that the pre-5.3 behavior
      implements.
      
      Consolidation and cleanup can be done subsequent to a sane default page
      allocation strategy, so this patch reverts a cleanup done on a strategy
      that is now reverted and thus is the least risky option.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      19deb769
    • D
      Revert "Revert "mm, thp: restore node-local hugepage allocations"" · ac79f78d
      David Rientjes 提交于
      This reverts commit a8282608.
      
      The commit references the original intended semantic for MADV_HUGEPAGE
      which has subsequently taken on three unique purposes:
      
       - enables or disables thp for a range of memory depending on the system's
         config (is thp "enabled" set to "always" or "madvise"),
      
       - determines the synchronous compaction behavior for thp allocations at
         fault (is thp "defrag" set to "always", "defer+madvise", or "madvise"),
         and
      
       - reverts a previous MADV_NOHUGEPAGE (there is no madvise mode to only
         clear previous hugepage advice).
      
      These are the three purposes that currently exist in 5.2 and over the
      past several years that userspace has been written around.  Adding a
      NUMA locality preference adds a fourth dimension to an already conflated
      advice mode.
      
      Based on the semantic that MADV_HUGEPAGE has provided over the past
      several years, there exist workloads that use the tunable based on these
      principles: specifically that the allocation should attempt to
      defragment a local node before falling back.  It is agreed that remote
      hugepages typically (but not always) have a better access latency than
      remote native pages, although on Naples this is at parity for
      intersocket.
      
      The revert commit that this patch reverts allows hugepage allocation to
      immediately allocate remotely when local memory is fragmented.  This is
      contrary to the semantic of MADV_HUGEPAGE over the past several years:
      that is, memory compaction should be attempted locally before falling
      back.
      
      The performance degradation of remote hugepages over local hugepages on
      Rome, for example, is 53.5% increased access latency.  For this reason,
      the goal is to revert back to the 5.2 and previous behavior that would
      attempt local defragmentation before falling back.  With the patch that
      is reverted by this patch, we see performance degradations at the tail
      because the allocator happily allocates the remote hugepage rather than
      even attempting to make a local hugepage available.
      
      zone_reclaim_mode is not a solution to this problem since it does not
      only impact hugepage allocations but rather changes the memory
      allocation strategy for *all* page allocations.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac79f78d
  2. 16 9月, 2019 2 次提交
    • L
      Linux 5.3 · 4d856f72
      Linus Torvalds 提交于
      4d856f72
    • L
      Revert "ext4: make __ext4_get_inode_loc plug" · 72dbcf72
      Linus Torvalds 提交于
      This reverts commit b03755ad.
      
      This is sad, and done for all the wrong reasons.  Because that commit is
      good, and does exactly what it says: avoids a lot of small disk requests
      for the inode table read-ahead.
      
      However, it turns out that it causes an entirely unrelated problem: the
      getrandom() system call was introduced back in 2014 by commit
      c6e9d6f3 ("random: introduce getrandom(2) system call"), and people
      use it as a convenient source of good random numbers.
      
      But part of the current semantics for getrandom() is that it waits for
      the entropy pool to fill at least partially (unlike /dev/urandom).  And
      at least ArchLinux apparently has a systemd that uses getrandom() at
      boot time, and the improvements in IO patterns means that existing
      installations suddenly start hanging, waiting for entropy that will
      never happen.
      
      It seems to be an unlucky combination of not _quite_ enough entropy,
      together with a particular systemd version and configuration.  Lennart
      says that the systemd-random-seed process (which is what does this early
      access) is supposed to not block any other boot activity, but sadly that
      doesn't actually seem to be the case (possibly due bogus dependencies on
      cryptsetup for encrypted swapspace).
      
      The correct fix is to fix getrandom() to not block when it's not
      appropriate, but that fix is going to take a lot more discussion.  Do we
      just make it act like /dev/urandom by default, and add a new flag for
      "wait for entropy"? Do we add a boot-time option? Or do we just limit
      the amount of time it will wait for entropy?
      
      So in the meantime, we do the revert to give us time to discuss the
      eventual fix for the fundamental problem, at which point we can re-apply
      the ext4 inode table access optimization.
      Reported-by: NAhmed S. Darwish <darwish.07@gmail.com>
      Cc: Ted Ts'o <tytso@mit.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Alexander E. Patrakov <patrakov@gmail.com>
      Cc: Lennart Poettering <mzxreary@0pointer.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72dbcf72
  3. 15 9月, 2019 7 次提交
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 1609d760
      Linus Torvalds 提交于
      Pull kvm fixes from Paolo Bonzini:
       "The main change here is a revert of reverts. We recently simplified
        some code that was thought unnecessary; however, since then KVM has
        grown quite a few cond_resched()s and for that reason the simplified
        code is prone to livelocks---one CPUs tries to empty a list of guest
        page tables while the others keep adding to them. This adds back the
        generation-based zapping of guest page tables, which was not
        unnecessary after all.
      
        On top of this, there is a fix for a kernel memory leak and a couple
        of s390 fixlets as well"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot
        KVM: x86: work around leak of uninitialized stack contents
        KVM: nVMX: handle page fault in vmread
        KVM: s390: Do not leak kernel stack data in the KVM_S390_INTERRUPT ioctl
        KVM: s390: kvm_s390_vm_start_migration: check dirty_bitmap before using it as target for memset()
      1609d760
    • L
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 1f9c632c
      Linus Torvalds 提交于
      Pull virtio fix from Michael Tsirkin:
       "A last minute revert
      
        The 32-bit build got broken by the latest defence in depth patch.
        Revert and we'll try again in the next cycle"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        Revert "vhost: block speculation of translated descriptors"
      1f9c632c
    • L
      Merge tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · b03c036e
      Linus Torvalds 提交于
      Pull RISC-V fix from Paul Walmsley:
       "Last week, Palmer and I learned that there was an error in the RISC-V
        kernel image header format that could make it less compatible with the
        ARM64 kernel image header format. I had missed this error during my
        original reviews of the patch.
      
        The kernel image header format is an interface that impacts
        bootloaders, QEMU, and other user tools. Those packages must be
        updated to align with whatever is merged in the kernel. We would like
        to avoid proliferating these image formats by keeping the RISC-V
        header as close as possible to the existing ARM64 header. Since the
        arch/riscv patch that adds support for the image header was merged
        with our v5.3-rc1 pull request as commit 0f327f2a ("RISC-V: Add
        an Image header that boot loader can parse."), we think it wise to try
        to fix this error before v5.3 is released.
      
        The fix itself should be backwards-compatible with any project that
        has already merged support for premature versions of this interface.
        It primarily involves ensuring that the RISC-V image header has
        something useful in the same field as the ARM64 image header"
      
      * tag 'riscv/for-v5.3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
        riscv: modify the Image header to improve compatibility with the ARM64 header
      b03c036e
    • M
      Revert "vhost: block speculation of translated descriptors" · 0d4a3f2a
      Michael S. Tsirkin 提交于
      This reverts commit a89db445.
      
      I was hasty to include this patch, and it breaks the build on 32 bit.
      Defence in depth is good but let's do it properly.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      0d4a3f2a
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 36024fcf
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
      
       1) Don't corrupt xfrm_interface parms before validation, from Nicolas
          Dichtel.
      
       2) Revert use of usb-wakeup in btusb, from Mario Limonciello.
      
       3) Block ipv6 packets in bridge netfilter if ipv6 is disabled, from
          Leonardo Bras.
      
       4) IPS_OFFLOAD not honored in ctnetlink, from Pablo Neira Ayuso.
      
       5) Missing ULP check in sock_map, from John Fastabend.
      
       6) Fix receive statistic handling in forcedeth, from Zhu Yanjun.
      
       7) Fix length of SKB allocated in 6pack driver, from Christophe
          JAILLET.
      
       8) ip6_route_info_create() returns an error pointer, not NULL. From
          Maciej Żenczykowski.
      
       9) Only add RDS sock to the hashes after rs_transport is set, from
          Ka-Cheong Poon.
      
      10) Don't double clean TX descriptors in ixgbe, from Ilya Maximets.
      
      11) Presence of transmit IPSEC offload in an SKB is not tested for
          correctly in ixgbe and ixgbevf. From Steffen Klassert and Jeff
          Kirsher.
      
      12) Need rcu_barrier() when register_netdevice() takes one of the
          notifier based failure paths, from Subash Abhinov Kasiviswanathan.
      
      13) Fix leak in sctp_do_bind(), from Mao Wenan.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
        cdc_ether: fix rndis support for Mediatek based smartphones
        sctp: destroy bucket if failed to bind addr
        sctp: remove redundant assignment when call sctp_get_port_local
        sctp: change return type of sctp_get_port_local
        ixgbevf: Fix secpath usage for IPsec Tx offload
        sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()'
        ixgbe: Fix secpath usage for IPsec TX offload.
        net: qrtr: fix memort leak in qrtr_tun_write_iter
        net: Fix null de-reference of device refcount
        ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()'
        tun: fix use-after-free when register netdev failed
        tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR
        ixgbe: fix double clean of Tx descriptors with xdp
        ixgbe: Prevent u8 wrapping of ITR value to something less than 10us
        mlx4: fix spelling mistake "veify" -> "verify"
        net: hns3: fix spelling mistake "undeflow" -> "underflow"
        net: lmc: fix spelling mistake "runnin" -> "running"
        NFC: st95hf: fix spelling mistake "receieve" -> "receive"
        net/rds: An rds_sock is added too early to the hash table
        mac80211: Do not send Layer 2 Update frame before authorization
        ...
      36024fcf
    • L
      Merge tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 1c4c5e25
      Linus Torvalds 提交于
      Pull MMC fixes from Ulf Hansson:
      
       - tmio: Fixup runtime PM management during probe and remove
      
       - sdhci-pci-o2micro: Fix eMMC initialization for an AMD SoC
      
       - bcm2835: Prevent lockups when terminating work
      
      * tag 'mmc-v5.3-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
        mmc: tmio: Fixup runtime PM management during remove
        mmc: tmio: Fixup runtime PM management during probe
        Revert "mmc: tmio: move runtime PM enablement to the driver implementations"
        Revert "mmc: sdhci: Remove unneeded quirk2 flag of O2 SD host controller"
        Revert "mmc: bcm2835: Terminate timeout work synchronously"
      1c4c5e25
    • L
      Merge tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm · 592b8d87
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "From the maintainer summit, just some last minute fixes for final:
      
        lima:
         - fix gem_wait ioctl
      
        core:
         - constify modes list
      
        i915:
         - DP MST high color depth regression
         - GPU hangs on vulkan compute workloads"
      
      * tag 'drm-fixes-2019-09-13' of git://anongit.freedesktop.org/drm/drm:
        drm/lima: fix lima_gem_wait() return value
        drm/i915: Restore relaxed padding (OCL_OOB_SUPPRES_ENABLE) for skl+
        drm/i915: Limit MST to <= 8bpc once again
        drm/modes: Make the whitelist more const
      592b8d87
  4. 14 9月, 2019 10 次提交
    • P
      Merge tag 'kvm-s390-master-5.3-1' of... · a9c20bb0
      Paolo Bonzini 提交于
      Merge tag 'kvm-s390-master-5.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into kvm-master
      
      KVM: s390: Fixes for 5.3
      
      - prevent a user triggerable oops in the migration code
      - do not leak kernel stack content
      a9c20bb0
    • S
      KVM: x86/mmu: Reintroduce fast invalidate/zap for flushing memslot · 002c5f73
      Sean Christopherson 提交于
      James Harvey reported a livelock that was introduced by commit
      d012a06a ("Revert "KVM: x86/mmu: Zap only the relevant pages when
      removing a memslot"").
      
      The livelock occurs because kvm_mmu_zap_all() as it exists today will
      voluntarily reschedule and drop KVM's mmu_lock, which allows other vCPUs
      to add shadow pages.  With enough vCPUs, kvm_mmu_zap_all() can get stuck
      in an infinite loop as it can never zap all pages before observing lock
      contention or the need to reschedule.  The equivalent of kvm_mmu_zap_all()
      that was in use at the time of the reverted commit (4e103134, "KVM:
      x86/mmu: Zap only the relevant pages when removing a memslot") employed
      a fast invalidate mechanism and was not susceptible to the above livelock.
      
      There are three ways to fix the livelock:
      
      - Reverting the revert (commit d012a06a) is not a viable option as
        the revert is needed to fix a regression that occurs when the guest has
        one or more assigned devices.  It's unlikely we'll root cause the device
        assignment regression soon enough to fix the regression timely.
      
      - Remove the conditional reschedule from kvm_mmu_zap_all().  However, although
        removing the reschedule would be a smaller code change, it's less safe
        in the sense that the resulting kvm_mmu_zap_all() hasn't been used in
        the wild for flushing memslots since the fast invalidate mechanism was
        introduced by commit 6ca18b69 ("KVM: x86: use the fast way to
        invalidate all pages"), back in 2013.
      
      - Reintroduce the fast invalidate mechanism and use it when zapping shadow
        pages in response to a memslot being deleted/moved, which is what this
        patch does.
      
      For all intents and purposes, this is a revert of commit ea145aac
      ("Revert "KVM: MMU: fast invalidate all pages"") and a partial revert of
      commit 7390de1e ("Revert "KVM: x86: use the fast way to invalidate
      all pages""), i.e. restores the behavior of commit 5304b8d3 ("KVM:
      MMU: fast invalidate all pages") and commit 6ca18b69 ("KVM: x86:
      use the fast way to invalidate all pages") respectively.
      
      Fixes: d012a06a ("Revert "KVM: x86/mmu: Zap only the relevant pages when removing a memslot"")
      Reported-by: NJames Harvey <jamespharvey20@gmail.com>
      Cc: Alex Willamson <alex.williamson@redhat.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      002c5f73
    • F
      KVM: x86: work around leak of uninitialized stack contents · 541ab2ae
      Fuqian Huang 提交于
      Emulation of VMPTRST can incorrectly inject a page fault
      when passed an operand that points to an MMIO address.
      The page fault will use uninitialized kernel stack memory
      as the CR2 and error code.
      
      The right behavior would be to abort the VM with a KVM_EXIT_INTERNAL_ERROR
      exit to userspace; however, it is not an easy fix, so for now just ensure
      that the error code and CR2 are zero.
      Signed-off-by: NFuqian Huang <huangfq.daxian@gmail.com>
      Cc: stable@vger.kernel.org
      [add comment]
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      541ab2ae
    • P
      KVM: nVMX: handle page fault in vmread · f7eea636
      Paolo Bonzini 提交于
      The implementation of vmread to memory is still incomplete, as it
      lacks the ability to do vmread to I/O memory just like vmptrst.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      f7eea636
    • P
      riscv: modify the Image header to improve compatibility with the ARM64 header · 474efecb
      Paul Walmsley 提交于
      Part of the intention during the definition of the RISC-V kernel image
      header was to lay the groundwork for a future merge with the ARM64
      image header.  One error during my original review was not noticing
      that the RISC-V header's "magic" field was at a different size and
      position than the ARM64's "magic" field.  If the existing ARM64 Image
      header parsing code were to attempt to parse an existing RISC-V kernel
      image header format, it would see a magic number 0.  This is
      undesirable, since it's our intention to align as closely as possible
      with the ARM64 header format.  Another problem was that the original
      "res3" field was not being initialized correctly to zero.
      
      Address these issues by creating a 32-bit "magic2" field in the RISC-V
      header which matches the ARM64 "magic" field.  RISC-V binaries will
      store "RSC\x05" in this field.  The intention is that the use of the
      existing 64-bit "magic" field in the RISC-V header will be deprecated
      over time.  Increment the minor version number of the file format to
      indicate this change, and update the documentation accordingly.  Fix
      the assembler directives in head.S to ensure that reserved fields are
      properly zero-initialized.
      Signed-off-by: NPaul Walmsley <paul.walmsley@sifive.com>
      Reported-by: NPalmer Dabbelt <palmer@sifive.com>
      Reviewed-by: NPalmer Dabbelt <palmer@sifive.com>
      Cc: Atish Patra <atish.patra@wdc.com>
      Cc: Karsten Merker <merker@debian.org>
      Link: https://lore.kernel.org/linux-riscv/194c2f10c9806720623430dbf0cc59a965e50448.camel@wdc.com/T/#u
      Link: https://lore.kernel.org/linux-riscv/mhng-755b14c4-8f35-4079-a7ff-e421fd1b02bc@palmer-si-x1e/T/#t
      474efecb
    • B
      cdc_ether: fix rndis support for Mediatek based smartphones · 4d7ffcf3
      Bjørn Mork 提交于
      A Mediatek based smartphone owner reports problems with USB
      tethering in Linux.  The verbose USB listing shows a rndis_host
      interface pair (e0/01/03 + 10/00/00), but the driver fails to
      bind with
      
      [  355.960428] usb 1-4: bad CDC descriptors
      
      The problem is a failsafe test intended to filter out ACM serial
      functions using the same 02/02/ff class/subclass/protocol as RNDIS.
      The serial functions are recognized by their non-zero bmCapabilities.
      
      No RNDIS function with non-zero bmCapabilities were known at the time
      this failsafe was added. But it turns out that some Wireless class
      RNDIS functions are using the bmCapabilities field. These functions
      are uniquely identified as RNDIS by their class/subclass/protocol, so
      the failing test can safely be disabled.  The same applies to the two
      types of Misc class RNDIS functions.
      
      Applying the failsafe to Communication class functions only retains
      the original functionality, and fixes the problem for the Mediatek based
      smartphone.
      
      Tow examples of CDC functional descriptors with non-zero bmCapabilities
      from Wireless class RNDIS functions are:
      
      0e8d:000a  Mediatek Crosscall Spider X5 3G Phone
      
            CDC Header:
              bcdCDC               1.10
            CDC ACM:
              bmCapabilities       0x0f
                connection notifications
                sends break
                line coding and serial state
                get/set/clear comm features
            CDC Union:
              bMasterInterface        0
              bSlaveInterface         1
            CDC Call Management:
              bmCapabilities       0x03
                call management
                use DataInterface
              bDataInterface          1
      
      and
      
      19d2:1023  ZTE K4201-z
      
            CDC Header:
              bcdCDC               1.10
            CDC ACM:
              bmCapabilities       0x02
                line coding and serial state
            CDC Call Management:
              bmCapabilities       0x03
                call management
                use DataInterface
              bDataInterface          1
            CDC Union:
              bMasterInterface        0
              bSlaveInterface         1
      
      The Mediatek example is believed to apply to most smartphones with
      Mediatek firmware.  The ZTE example is most likely also part of a larger
      family of devices/firmwares.
      Suggested-by: NLars Melin <larsm17@gmail.com>
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4d7ffcf3
    • D
      Merge branch 'sctp_do_bind-leak' · ae3b06ed
      David S. Miller 提交于
      Mao Wenan says:
      
      ====================
      fix memory leak for sctp_do_bind
      
      First two patches are to do cleanup, remove redundant assignment,
      and change return type of sctp_get_port_local.
      Third patch is to fix memory leak for sctp_do_bind if failed
      to bind address.
      
      v2: add one patch to change return type of sctp_get_port_local.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae3b06ed
    • M
      sctp: destroy bucket if failed to bind addr · 29b99f54
      Mao Wenan 提交于
      There is one memory leak bug report:
      BUG: memory leak
      unreferenced object 0xffff8881dc4c5ec0 (size 40):
        comm "syz-executor.0", pid 5673, jiffies 4298198457 (age 27.578s)
        hex dump (first 32 bytes):
          02 00 00 00 81 88 ff ff 00 00 00 00 00 00 00 00  ................
          f8 63 3d c1 81 88 ff ff 00 00 00 00 00 00 00 00  .c=.............
        backtrace:
          [<0000000072006339>] sctp_get_port_local+0x2a1/0xa00 [sctp]
          [<00000000c7b379ec>] sctp_do_bind+0x176/0x2c0 [sctp]
          [<000000005be274a2>] sctp_bind+0x5a/0x80 [sctp]
          [<00000000b66b4044>] inet6_bind+0x59/0xd0 [ipv6]
          [<00000000c68c7f42>] __sys_bind+0x120/0x1f0 net/socket.c:1647
          [<000000004513635b>] __do_sys_bind net/socket.c:1658 [inline]
          [<000000004513635b>] __se_sys_bind net/socket.c:1656 [inline]
          [<000000004513635b>] __x64_sys_bind+0x3e/0x50 net/socket.c:1656
          [<0000000061f2501e>] do_syscall_64+0x72/0x2e0 arch/x86/entry/common.c:296
          [<0000000003d1e05e>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      This is because in sctp_do_bind, if sctp_get_port_local is to
      create hash bucket successfully, and sctp_add_bind_addr failed
      to bind address, e.g return -ENOMEM, so memory leak found, it
      needs to destroy allocated bucket.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29b99f54
    • M
      sctp: remove redundant assignment when call sctp_get_port_local · e0e4b8de
      Mao Wenan 提交于
      There are more parentheses in if clause when call sctp_get_port_local
      in sctp_do_bind, and redundant assignment to 'ret'. This patch is to
      do cleanup.
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0e4b8de
    • M
      sctp: change return type of sctp_get_port_local · 8e2ef6ab
      Mao Wenan 提交于
      Currently sctp_get_port_local() returns a long
      which is either 0,1 or a pointer casted to long.
      It's neither of the callers use the return value since
      commit 62208f12 ("net: sctp: simplify sctp_get_port").
      Now two callers are sctp_get_port and sctp_do_bind,
      they actually assumend a casted to an int was the same as
      a pointer casted to a long, and they don't save the return
      value just check whether it is zero or non-zero, so
      it would better change return type from long to int for
      sctp_get_port_local.
      Signed-off-by: NMao Wenan <maowenan@huawei.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8e2ef6ab
  5. 13 9月, 2019 8 次提交
    • J
      ixgbevf: Fix secpath usage for IPsec Tx offload · 8f6617ba
      Jeff Kirsher 提交于
      Port the same fix for ixgbe to ixgbevf.
      
      The ixgbevf driver currently does IPsec Tx offloading
      based on an existing secpath. However, the secpath
      can also come from the Rx side, in this case it is
      misinterpreted for Tx offload and the packets are
      dropped with a "bad sa_idx" error. Fix this by using
      the xfrm_offload() function to test for Tx offload.
      
      CC: Shannon Nelson <snelson@pensando.io>
      Fixes: 7f68d430 ("ixgbevf: enable VF IPsec offload operations")
      Reported-by: NJonathan Tooker <jonathan@reliablehosting.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Acked-by: NShannon Nelson <snelson@pensando.io>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8f6617ba
    • U
      mmc: tmio: Fixup runtime PM management during remove · 87b5d602
      Ulf Hansson 提交于
      Accessing the device when it may be runtime suspended is a bug, which is
      the case in tmio_mmc_host_remove(). Let's fix the behaviour.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Tested-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      87b5d602
    • U
      mmc: tmio: Fixup runtime PM management during probe · aa86f1a3
      Ulf Hansson 提交于
      The tmio_mmc_host_probe() calls pm_runtime_set_active() to update the
      runtime PM status of the device, as to make it reflect the current status
      of the HW. This works fine for most cases, but unfortunate not for all.
      Especially, there is a generic problem when the device has a genpd attached
      and that genpd have the ->start|stop() callbacks assigned.
      
      More precisely, if the driver calls pm_runtime_set_active() during
      ->probe(), genpd does not get to invoke the ->start() callback for it,
      which means the HW isn't really fully powered on. Furthermore, in the next
      phase, when the device becomes runtime suspended, genpd will invoke the
      ->stop() callback for it, potentially leading to usage count imbalance
      problems, depending on what's implemented behind the callbacks of course.
      
      To fix this problem, convert to call pm_runtime_get_sync() from
      tmio_mmc_host_probe() rather than pm_runtime_set_active(). Additionally, to
      avoid bumping usage counters and unnecessary re-initializing the HW the
      first time the tmio driver's ->runtime_resume() callback is called,
      introduce a state flag to keeping track of this.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Tested-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      aa86f1a3
    • U
      Revert "mmc: tmio: move runtime PM enablement to the driver implementations" · 8861474a
      Ulf Hansson 提交于
      This reverts commit 7ff21319.
      
      It turns out that the above commit introduces other problems. For example,
      calling pm_runtime_set_active() must not be done prior calling
      pm_runtime_enable() as that makes it fail. This leads to additional
      problems, such as clock enables being wrongly balanced.
      
      Rather than fixing the problem on top, let's start over by doing a revert.
      
      Fixes: 7ff21319 ("mmc: tmio: move runtime PM enablement to the driver implementations")
      Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
      Tested-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      8861474a
    • L
      Merge branch 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · a7f89616
      Linus Torvalds 提交于
      Pull cgroup fix from Tejun Heo:
       "Roman found and fixed a bug in the cgroup2 freezer which allows new
        child cgroup to escape frozen state"
      
      * 'for-5.3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
        cgroup: freezer: fix frozen state inheritance
        kselftests: cgroup: add freezer mkdir test
      a7f89616
    • L
      Merge tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 1b304a1a
      Linus Torvalds 提交于
      Pull btrfs fixes from David Sterba:
       "Here are two fixes, one of them urgent fixing a bug introduced in 5.2
        and reported by many users. It took time to identify the root cause,
        catching the 5.3 release is higly desired also to push the fix to 5.2
        stable tree.
      
        The bug is a mess up of return values after adding proper error
        handling and honestly the kind of bug that can cause sleeping
        disorders until it's caught. My appologies to everybody who was
        affected.
      
        Summary of what could happen:
      
        1) either a hang when committing a transaction, if this happens
           there's no risk of corruption, still the hang is very inconvenient
           and can't be resolved without a reboot
      
        2) writeback for some btree nodes may never be started and we end up
           committing a transaction without noticing that, this is really
           serious and that will lead to the "parent transid verify failed"
           messages"
      
      * tag 'for-5.3-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        Btrfs: fix unwritten extent buffers and hangs on future writeback attempts
        Btrfs: fix assertion failure during fsync and use of stale transaction
      1b304a1a
    • R
      cgroup: freezer: fix frozen state inheritance · 97a61369
      Roman Gushchin 提交于
      If a new child cgroup is created in the frozen cgroup hierarchy
      (one or more of ancestor cgroups is frozen), the CGRP_FREEZE cgroup
      flag should be set. Otherwise if a process will be attached to the
      child cgroup, it won't become frozen.
      
      The problem can be reproduced with the test_cgfreezer_mkdir test.
      
      This is the output before this patch:
        ~/test_freezer
        ok 1 test_cgfreezer_simple
        ok 2 test_cgfreezer_tree
        ok 3 test_cgfreezer_forkbomb
        Cgroup /sys/fs/cgroup/cg_test_mkdir_A/cg_test_mkdir_B isn't frozen
        not ok 4 test_cgfreezer_mkdir
        ok 5 test_cgfreezer_rmdir
        ok 6 test_cgfreezer_migrate
        ok 7 test_cgfreezer_ptrace
        ok 8 test_cgfreezer_stopped
        ok 9 test_cgfreezer_ptraced
        ok 10 test_cgfreezer_vfork
      
      And with this patch:
        ~/test_freezer
        ok 1 test_cgfreezer_simple
        ok 2 test_cgfreezer_tree
        ok 3 test_cgfreezer_forkbomb
        ok 4 test_cgfreezer_mkdir
        ok 5 test_cgfreezer_rmdir
        ok 6 test_cgfreezer_migrate
        ok 7 test_cgfreezer_ptrace
        ok 8 test_cgfreezer_stopped
        ok 9 test_cgfreezer_ptraced
        ok 10 test_cgfreezer_vfork
      Reported-by: NMark Crossen <mcrossen@fb.com>
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Fixes: 76f969e8 ("cgroup: cgroup v2 freezer")
      Cc: Tejun Heo <tj@kernel.org>
      Cc: stable@vger.kernel.org # v5.2+
      Signed-off-by: NTejun Heo <tj@kernel.org>
      97a61369
    • R
      kselftests: cgroup: add freezer mkdir test · 44e9d308
      Roman Gushchin 提交于
      Add a new cgroup freezer selftest, which checks that if a cgroup is
      frozen, their new child cgroups will properly inherit the frozen
      state.
      
      It creates a parent cgroup, freezes it, creates a child cgroup
      and populates it with a dummy process. Then it checks that both
      parent and child cgroup are frozen.
      Signed-off-by: NRoman Gushchin <guro@fb.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      44e9d308
  6. 12 9月, 2019 9 次提交