1. 10 Dec, 2017 1 commit
  2. 09 Dec, 2017 39 commits
    • M
      kmemcheck: rip it out for real · f335195a
      Michal Hocko committed
      Commit 4675ff05 ("kmemcheck: rip it out") removed the code, but
      for some reason the SPDX header stayed in place.  This looks like a rebase
      mistake in the mmotm tree or a merge mistake.  Let's drop those
      leftovers as well.
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f335195a
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · e9ef1fe3
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) CAN fixes from Martin Kelly (cancel URBs properly in all the CAN usb
          drivers).
      
       2) Revert returning -EEXIST from __dev_alloc_name() as this propagates
          to userspace and broke some apps. From Johannes Berg.
      
       3) Fix conn memory leaks and crashes in TIPC, from Jon Malloc and Cong
          Wang.
      
       4) Gianfar MAC can't do EEE so don't advertise it by default, from
          Claudiu Manoil.
      
       5) Relax strict netlink attribute validation, but emit a warning. From
          David Ahern.
      
       6) Fix regression in checksum offload of thunderx driver, from Florian
          Westphal.
      
       7) Fix UAPI bpf issues on s390, from Hendrik Brueckner.
      
       8) New card support in iwlwifi, from Ihab Zhaika.
      
       9) BBR congestion control bug fixes from Neal Cardwell.
      
      10) Fix port stats in nfp driver, from Pieter Jansen van Vuuren.
      
      11) Fix leaks in qualcomm rmnet, from Subash Abhinov Kasiviswanathan.
      
      12) Fix DMA API handling in sh_eth driver, from Thomas Petazzoni.
      
      13) Fix spurious netpoll warnings in bnxt_en, from Calvin Owens.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (67 commits)
        net: mvpp2: fix the RSS table entry offset
        tcp: evaluate packet losses upon RTT change
        tcp: fix off-by-one bug in RACK
        tcp: always evaluate losses in RACK upon undo
        tcp: correctly test congestion state in RACK
        bnxt_en: Fix sources of spurious netpoll warnings
        tcp_bbr: reset long-term bandwidth sampling on loss recovery undo
        tcp_bbr: reset full pipe detection on loss recovery undo
        tcp_bbr: record "full bw reached" decision in new full_bw_reached bit
        sfc: pass valid pointers from efx_enqueue_unwind
        gianfar: Disable EEE autoneg by default
        tcp: invalidate rate samples during SACK reneging
        can: peak/pcie_fd: fix potential bug in restarting tx queue
        can: usb_8dev: cancel urb on -EPIPE and -EPROTO
        can: kvaser_usb: cancel urb on -EPIPE and -EPROTO
        can: esd_usb2: cancel urb on -EPIPE and -EPROTO
        can: ems_usb: cancel urb on -EPIPE and -EPROTO
        can: mcba_usb: cancel urb on -EPROTO
        usbnet: fix alignment for frames with no ethernet header
        tcp: use current time in tcp_rcv_space_adjust()
        ...
      e9ef1fe3
    • L
      Merge tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · 77071bc6
      Linus Torvalds committed
      Pull media fixes from Mauro Carvalho Chehab:
      
       "A series of fixes for the media subsystem:
      
         - The largest amount of fixes in this series is with regards to
           comments that aren't kernel-doc, but start with "/**".
      
           A new check added for 4.15 makes the build produce a *huge* amount of
           new warnings (I'm compiling here with W=1). Most of the patches in
           this series fix those.
      
           No code changes - just comment changes at the source files
      
         - rc: some fixes in order to better handle RC repetition codes
      
         - v4l-async: use the v4l2_dev from the root notifier when matching
           sub-devices
      
         - v4l2-fwnode: Check subdev count after checking port
      
         - ov13858 and et8ek8: compilation fix with randconfigs
      
         - usbtv: a trivial new USB ID addition
      
         - dibusb-common: don't do DMA on stack on firmware load
      
         - imx274: Fix error handling, add MAINTAINERS entry
      
         - sir_ir: detect presence of port"
      
      * tag 'media/v4.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (50 commits)
        media: imx274: Fix error handling, add MAINTAINERS entry
        media: v4l: async: use the v4l2_dev from the root notifier when matching sub-devices
        media: v4l2-fwnode: Check subdev count after checking port
        media: et8ek8: select V4L2_FWNODE
        media: ov13858: Select V4L2_FWNODE
        media: rc: partial revert of "media: rc: per-protocol repeat period"
        media: dvb: i2c transfers over usb cannot be done from stack
        media: dvb-frontends: complete kernel-doc markups
        media: docs: add documentation for frontend attach info
        media: dvb_frontends: fix kernel-doc macros
        media: drivers: remove "/**" from non-kernel-doc comments
        media: lm3560: add a missing kernel-doc parameter
        media: rcar_jpu: fix two kernel-doc markups
        media: vsp1: add a missing kernel-doc parameter
        media: soc_camera: fix a kernel-doc markup
        media: mt2063: fix some kernel-doc warnings
        media: radio-wl1273: fix a parameter name at kernel-doc macro
        media: s3c-camif: add missing description at s3c_camif_find_format()
        media: mtk-vpu: add description for wdt fields at struct mtk_vpu
        media: vdec: fix some kernel-doc warnings
        ...
      77071bc6
    • L
      Merge tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux · 4066aa72
      Linus Torvalds committed
      Pull drm fixes from Dave Airlie:
       "This pull is a bit larger than I'd like but a large bunch of it is
        license fixes, AMD wanted to fix the licenses for a bunch of files
        that were missing them,
      
       Otherwise a bunch of TTM regression fixes since the hugepage support,
       some i915 and gvt fixes, a core connector free in a safe context fix,
       and one bridge fix"
      
      * tag 'drm-fixes-for-v4.15-rc3' of git://people.freedesktop.org/~airlied/linux: (26 commits)
        drm/bridge: analogix dp: Fix runtime PM state in get_modes() callback
        Revert "drm/i915: Display WA #1133 WaFbcSkipSegments:cnl, glk"
        drm/vc4: Fix false positive WARN() backtrace on refcount_inc() usage
        drm/i915: Call i915_gem_init_userptr() before taking struct_mutex
        drm/exynos: remove unnecessary function declaration
        drm/exynos: remove unnecessary descrptions
        drm/exynos: gem: Drop NONCONTIG flag for buffers allocated without IOMMU
        drm/exynos: Fix dma-buf import
        drm/ttm: swap consecutive allocated pooled pages v4
        drm: safely free connectors from connector_iter
        drm/i915/gvt: set max priority for gvt context
        drm/i915/gvt: Don't mark vgpu context as inactive when preempted
        drm/i915/gvt: Limit read hw reg to active vgpu
        drm/i915/gvt: Export intel_gvt_render_mmio_to_ring_id()
        drm/i915/gvt: Emulate PCI expansion ROM base address register
        drm/ttm: swap consecutive allocated cached pages v3
        drm/ttm: roundup the shrink request to prevent skip huge pool
        drm/ttm: add page order support in ttm_pages_put
        drm/ttm: add set_pages_wb for handling page order more than zero
        drm/ttm: add page order in page pool
        ...
      4066aa72
    • L
      Merge tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md · 7267212c
      Linus Torvalds committed
      Pull md fixes from Shaohua Li:
       "Some MD fixes.
      
        The notable one is a raid5-cache deadlock bug with dm-raid, others are
        not significant"
      
      * tag 'md/4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md:
        md/raid1/10: add missed blk plug
        md: limit mdstat resync progress to max_sectors
        md/r5cache: move mddev_lock() out of r5c_journal_mode_set()
        md/raid5: correct degraded calculation in raid5_error
      7267212c
    • L
      Merge tag 'devicetree-fixes-for-4.15-part2' of... · 78d9b048
      Linus Torvalds committed
      Merge tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
      
      Pull DeviceTree fixes from Rob Herring:
       "Another set of DT fixes:
      
         - Fixes from overlay code rework. A trifecta of fixes to the locking,
           an out of bounds access, and a memory leak in of_overlay_apply()
      
         - Clean-up at25 eeprom binding document
      
         - Remove leading '0x' in unit-addresses from binding docs"
      
      * tag 'devicetree-fixes-for-4.15-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
        of: overlay: Make node skipping in init_overlay_changeset() clearer
        of: overlay: Fix out-of-bounds write in init_overlay_changeset()
        of: overlay: Fix (un)locking in of_overlay_apply()
        of: overlay: Fix memory leak in of_overlay_apply() error path
        dt-bindings: eeprom: at25: Document device-specific compatible values
        dt-bindings: eeprom: at25: Grammar s/are can/can/
        dt-bindings: Remove leading 0x from bindings notation
        of: overlay: Remove else after goto
        of: Spelling s/changset/changeset/
        of: unittest: Remove bogus overlay mutex release from overlay_data_add()
      78d9b048
    • L
      Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost · 900add27
      Linus Torvalds committed
      Pull virtio bugfixes from Michael Tsirkin:
       "A couple of minor bugfixes"
      
      * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
        virtio_net: fix return value check in receive_mergeable()
        virtio_mmio: add cleanup for virtio_mmio_remove
        virtio_mmio: add cleanup for virtio_mmio_probe
      900add27
    • L
      Merge tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · 32abeb09
      Linus Torvalds committed
      Pull xen fixes from Juergen Gross:
       "Just two small fixes for the new pvcalls frontend driver"
      
      * tag 'for-linus-4.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xen/pvcalls: Fix a check in pvcalls_front_remove()
        xen/pvcalls: check for xenbus_read() errors
      32abeb09
    • L
      Merge tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · d90696ed
      Linus Torvalds committed
      Pull powerpc fixes from Michael Ellerman:
      
       "One notable fix for kexec on Power9, where we were not clearing MMU
        PID properly which sometimes leads to hangs. Finally debugged to a
        root cause by Nick.
      
        A revert of a patch which tried to rework our panic handling to get
        more output on the console, but inadvertently broke reporting the
        panic to the hypervisor, which apparently people care about.
      
        Then a fix for an oops in the PMU code, and finally some s/%p/%px/ in
        xmon.
      
        Thanks to: David Gibson, Nicholas Piggin, Ravi Bangoria"
      
      * tag 'powerpc-4.15-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/xmon: Don't print hashed pointers in xmon
        powerpc/64s: Initialize ISAv3 MMU registers before setting partition table
        Revert "powerpc: Do not call ppc_md.panic in fadump panic notifier"
        powerpc/perf: Fix oops when grouping different pmu events
      d90696ed
    • D
      Merge tag 'linux-can-fixes-for-4.15-20171208' of... · fd29117a
      David S. Miller committed
      Merge tag 'linux-can-fixes-for-4.15-20171208' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2017-12-08
      
      this is a pull request of 6 patches for net/master.
      
      Martin Kelly provides 5 patches for various USB based CAN drivers, that
      properly cancel the URBs on adapter unplug, so that the driver doesn't
      end up in an endless loop. Stephane Grosjean provides a patch to restart
      the tx queue if zero length packages are transmitted.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fd29117a
    • G
      macvlan: fix memory hole in macvlan_dev · 5e54b3c1
      Girish Moodalbail committed
      Move 'macaddr_count' from after 'netpoll' to after 'nest_level' to pack
      and reduce a memory hole.
      
      Fixes: 88ca59d1 (macvlan: remove unused fields in struct macvlan_dev)
      Signed-off-by: Girish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5e54b3c1
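The effect of reordering members to close a hole can be sketched in plain C. The two structs below are hypothetical and are not the real `struct macvlan_dev`; they only reuse the field names from the commit message to show how a 4-byte member placed between pointer-sized members leaves padding on a typical 64-bit ABI, and how moving it next to another 4-byte member removes that padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout (not the kernel's): a 4-byte counter sits between
 * two pointers, so an LP64 compiler pads it out to pointer alignment. */
struct layout_with_hole {
    void     *netpoll;
    uint32_t  macaddr_count;   /* followed by padding on LP64 */
    void     *next;
    uint32_t  nest_level;      /* followed by tail padding on LP64 */
};

/* Same members, with the two 4-byte fields placed back to back. */
struct layout_packed {
    void     *netpoll;
    void     *next;
    uint32_t  nest_level;
    uint32_t  macaddr_count;   /* occupies what used to be padding */
};

/* Bytes saved by the reordering on the current target (may be zero on
 * ABIs where pointers are 4 bytes wide). */
size_t bytes_saved(void)
{
    return sizeof(struct layout_with_hole) - sizeof(struct layout_packed);
}
```

On kernel builds, a tool such as `pahole` is the usual way to inspect the real layout; the sketch only illustrates the principle.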
    • D
      Merge tag 'wireless-drivers-for-davem-2017-12-08' of... · 03afb6e4
      David S. Miller committed
      Merge tag 'wireless-drivers-for-davem-2017-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
      
      Kalle Valo says:
      
      ====================
      wireless-drivers fixes for 4.15
      
      Second set of fixes for 4.15. This time a lot of iwlwifi patches and
      two brcmfmac patches. Most important here are the MIC and IVC fixes
      for iwlwifi to unbreak 9000 series.
      
      iwlwifi
      
      * fix rate-scaling to not start at the lowest possible rate
      
      * fix the TX queue hang detection for AP/GO modes
      
      * fix the TX queue hang timeout in monitor interfaces
      
      * fix packet injection
      
      * remove a wrong error message when dumping PCI registers
      
      * fix race condition with RF-kill
      
      * tell mac80211 when the MIC has been stripped (9000 series)
      
      * tell mac80211 when the IVC has been stripped (9000 series)
      
      * add 2 new PCI IDs, one for 9000 and one for 22000
      
      * fix a queue hang during a P2P Remain-on-Channel operation
      
      brcmfmac
      
      * fix a race which sometimes caused a crash during sdio unbind
      
      * fix a kernel-doc related build error
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      03afb6e4
    • M
      slip: sl_alloc(): remove unused parameter "dev_t line" · 936e5d8b
      Marc Kleine-Budde committed
      The first and only parameter of sl_alloc() is unused, so remove it.
      
      Fixes: 5342b77c ("slip: Clean up create and destroy")
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      936e5d8b
    • A
      net: mvpp2: fix the RSS table entry offset · 8a7b741e
      Antoine Tenart committed
      The macro used to access or set an RSS table entry was using an offset
      of 8, while it should use an offset of 0. This led to a wrongly configured
      RSS table, because the right entries were not accessed.
      
      Fixes: 1d7d15d7 ("net: mvpp2: initialize the RSS tables")
      Signed-off-by: Antoine Tenart <antoine.tenart@free-electrons.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8a7b741e
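A minimal sketch of why a wrong shift in an entry-selection macro hits the wrong entries. The macro names, the 5-bit index field and the `selected_entry()` helper are all assumptions made for illustration; the real mvpp2 register layout is not given in the commit message.

```c
#include <stdint.h>

/* Hypothetical macros: the buggy one shifts the index by 8 bits,
 * the fixed one places it at bit 0 as the commit message describes. */
#define RSS_ENTRY_BUGGY(idx) ((uint32_t)(idx) << 8)
#define RSS_ENTRY_FIXED(idx) ((uint32_t)(idx) << 0)

/* Assumed width of the index field in the (hypothetical) register. */
#define RSS_ENTRY_MASK 0x1fu

/* Models the hardware decoding the entry index from the low bits of the
 * value written to the index register. */
uint32_t selected_entry(uint32_t reg_val)
{
    return reg_val & RSS_ENTRY_MASK;
}
```

Under these assumptions, a value built with the buggy macro leaves the index field at zero, so every access lands on entry 0 regardless of the index that was requested.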
    • D
      Merge branch 'cxgb4-collect-hardware-logs-via-ethtool' · 2deeb495
      David S. Miller committed
      Rahul Lakkireddy says:
      
      ====================
      cxgb4: collect hardware logs via ethtool
      
      Collect more hardware logs via ethtool --get-dump facility.
      
      Patch 1 collects on-chip memory layout information.
      
      Patch 2 collects on-chip MC memory dumps.
      
      Patch 3 collects HMA memory dump.
      
      Patch 4 evaluates and skips TX and RX payload regions in memory dumps.
      
      Patch 5 collects egress and ingress SGE queue contexts.
      
      Patch 6 collects PCIe configuration logs
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2deeb495
    • R
    • R
      cxgb4: collect egress and ingress SGE queue contexts · 736c3b94
      Rahul Lakkireddy committed
      Use meminfo to identify the egress and ingress context regions and
      fetch all valid contexts from those regions. Also flush all contexts
      before attempting collection to prevent stale information.
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      736c3b94
    • R
      cxgb4: skip TX and RX payload regions in memory dumps · c1219653
      Rahul Lakkireddy committed
      Use meminfo to identify TX and RX payload regions and skip them in
      collection of EDC, MC, and HMA.
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c1219653
    • R
      4db0401f
    • R
      cxgb4: collect MC memory dump · a1c69520
      Rahul Lakkireddy committed
      Use meminfo to get base address and size of MC memory.  Also use same
      meminfo for EDC memory dumps.
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1c69520
    • R
      cxgb4: collect on-chip memory information · 123e25c4
      Rahul Lakkireddy committed
      Collect memory layout of various on-chip memory regions.  Move code
      for collecting on-chip memory information to cudbg_lib.c and update
      cxgb4_debugfs.c to use the common function.  Also include
      cudbg_entity.h before cudbg_lib.h to avoid adding cudbg entity
      structure forward declarations in cudbg_lib.h.
      Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      123e25c4
    • D
      Merge branch 'veth-and-GSO-maximums' · 62fd8b18
      David S. Miller committed
      Stephen Hemminger says:
      
      ====================
      veth and GSO maximums
      
      This is a more general way of solving the issue of GSO limits
      not being set correctly for containers on Azure. If a GSO packet
      that exceeds the limit (reported by NDIS) is sent to the host, then
      the host is forced to do segmentation in software, which has a noticeable
      performance impact.
      
      The core rtnetlink infrastructure already has the messages and
      infrastructure to allow changing gso limits. With an updated iproute2
      the following already works:
        # ip li set dev dummy0 gso_max_size 30000
      
      These patches are about making it easier with veth.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      62fd8b18
    • S
      veth: set peer GSO values · 72d24955
      Stephen Hemminger committed
      When new veth is created, and GSO values have been configured
      on one device, clone those values to the peer.
      
      For example:
         # ip link add dev vm1 gso_max_size 65530 type veth peer name vm2
      
      This should create vm1 <--> vm2 with both having GSO maximum
      size set to 65530.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      72d24955
    • S
      rtnetlink: allow GSO maximums to be set on device creation · 46e6b992
      Stephen Hemminger committed
      Netlink already allows changing a device's GSO sizes with the
      ip link set command. The missing part is overriding
      GSO settings on device creation.
      Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      46e6b992
    • N
      net: stmmac: fix broken dma_interrupt handling for multi-queues · 5a6a0445
      Niklas Cassel committed
      There is nothing that says that number of TX queues == number of RX
      queues. E.g. the ARTPEC-6 SoC has 2 TX queues and 1 RX queue.
      
      This code is obviously wrong:
      for (chan = 0; chan < tx_channel_count; chan++) {
          struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
      
      priv->rx_queue has size MTL_MAX_RX_QUEUES, so this will send an
      uninitialized napi_struct to __napi_schedule(), causing us to
      crash in net_rx_action(), because napi_struct->poll is zero.
      
      [12846.759880] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [12846.768014] pgd = (ptrval)
      [12846.770742] [00000000] *pgd=39ec7831, *pte=00000000, *ppte=00000000
      [12846.777023] Internal error: Oops: 80000007 [#1] PREEMPT SMP ARM
      [12846.782942] Modules linked in:
      [12846.785998] CPU: 0 PID: 161 Comm: dropbear Not tainted 4.15.0-rc2-00285-gf5fb5f2f39a7 #36
      [12846.794177] Hardware name: Axis ARTPEC-6 Platform
      [12846.798879] task: (ptrval) task.stack: (ptrval)
      [12846.803407] PC is at 0x0
      [12846.805942] LR is at net_rx_action+0x274/0x43c
      [12846.810383] pc : [<00000000>]    lr : [<80bff064>]    psr: 200e0113
      [12846.816648] sp : b90d9ae8  ip : b90d9ae8  fp : b90d9b44
      [12846.821871] r10: 00000008  r9 : 0013250e  r8 : 00000100
      [12846.827094] r7 : 0000012c  r6 : 00000000  r5 : 00000001  r4 : bac84900
      [12846.833619] r3 : 00000000  r2 : b90d9b08  r1 : 00000000  r0 : bac84900
      
      Since each DMA channel can be used for rx and tx simultaneously,
      the current code should probably be rewritten so that napi_struct is
      embedded in a new struct stmmac_channel.
      That way, stmmac_poll() can call stmmac_tx_clean() on just the tx queue
      where we got the IRQ, instead of looping through all tx queues.
      This is also how the xgbe driver does it (another driver for this IP).
      
      Fixes: c22a3f48 ("net: stmmac: adding multiple napi mechanism")
      Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5a6a0445
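The core of the stmmac problem, indexing the RX queue array with a loop bounded by the TX queue count, can be sketched as follows. The struct and function names are hypothetical simplifications, not the driver's real types; the point is only that the RX loop must be bounded by the RX count.

```c
#include <stddef.h>

#define MAX_RX_QUEUES 4

/* Hypothetical stand-ins for the driver's per-queue and private data. */
struct rx_queue {
    int scheduled;            /* set when this queue's poll is scheduled */
};

struct priv {
    unsigned int rx_count;    /* number of RX queues actually in use */
    unsigned int tx_count;    /* number of TX queues actually in use */
    struct rx_queue rx_queue[MAX_RX_QUEUES];
};

/* Schedule polling on every RX queue in use and return how many were
 * scheduled. The loop bound is rx_count, not tx_count, so entries that
 * were never initialized are never touched. */
unsigned int schedule_rx(struct priv *p)
{
    unsigned int n = 0;

    for (unsigned int chan = 0; chan < p->rx_count; chan++) {
        p->rx_queue[chan].scheduled = 1;
        n++;
    }
    return n;
}
```

With the ARTPEC-6 configuration quoted in the message (2 TX queues, 1 RX queue), a loop bounded by `tx_count` would reach `rx_queue[1]`, which this sketch leaves untouched.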
    • D
      Merge branch 'tcp-RACK-loss-recovery-bug-fixes' · b7e445a1
      David S. Miller committed
      Yuchung Cheng says:
      
      ====================
      tcp: RACK loss recovery bug fixes
      
      This patch set has four minor bug fixes in TCP RACK loss recovery.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b7e445a1
    • Y
      tcp: evaluate packet losses upon RTT change · 6065fd0d
      Yuchung Cheng committed
      RACK skips an ACK unless it advances the most recently delivered
      TX timestamp (rack.mstamp). Since RACK also uses the most recent
      RTT to decide if a packet is lost, RACK should still run the
      loss detection whenever the most recent RTT changes. For example,
      an ACK that does not advance the timestamp but triggers the cwnd
      undo due to reordering, would then use the most recent (higher)
      RTT measurement to detect further losses.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6065fd0d
    • Y
      tcp: fix off-by-one bug in RACK · 428aec5e
      Yuchung Cheng committed
      RACK should mark a packet lost when remaining wait time is zero.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      428aec5e
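The boundary condition behind this off-by-one can be sketched in a few lines. The function name and parameters below are hypothetical, and the kernel's actual RACK code is more involved; the sketch assumes the remaining wait time is the RTT plus the reordering window minus the time elapsed since the packet was sent.

```c
#include <stdbool.h>
#include <stdint.h>

/* All times are in microseconds. Returns true when the packet should be
 * marked lost: the remaining wait time has reached zero or gone negative.
 * Testing "remaining < 0" instead would keep waiting when it is exactly 0. */
bool rack_should_mark_lost(int64_t rtt_us, int64_t reo_wnd_us,
                           int64_t elapsed_us)
{
    int64_t remaining = rtt_us + reo_wnd_us - elapsed_us;

    return remaining <= 0;
}
```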
    • Y
      tcp: always evaluate losses in RACK upon undo · cd1fc85b
      Yuchung Cheng committed
      When sender detects spurious retransmission, all packets
      marked lost are remarked to be in-flight. However some may
      be considered lost based on their timestamps in RACK. This patch
      forces RACK to re-evaluate, which may have been skipped previously if
      the ACK does not advance the RACK timestamp.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd1fc85b
    • Y
      tcp: correctly test congestion state in RACK · 0ce294d8
      Yuchung Cheng committed
      RACK does not test the loss recovery state correctly to compute
      the reordering window. It assumes if lost_out is zero then TCP is
      not in loss recovery. But it can be zero during recovery before
      calling tcp_rack_detect_loss(): when an ACK acknowledges all
      packets marked lost before receiving this ACK, but has yet
      to discover new ones by tcp_rack_detect_loss(). The fix is to
      simply test the congestion state directly.
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Reviewed-by: Neal Cardwell <ncardwell@google.com>
      Reviewed-by: Priyaranjan Jha <priyarjha@google.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0ce294d8
    • E
      net: dsa: lan9303: Protect ALR operations with mutex · 2e8d243e
      Egil Hjelmeland committed
      ALR table operations are a sequence of related register operations which
      should be protected from concurrent access. The alr_cache should also be
      protected. Add alr_mutex doing that.
      Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2e8d243e
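The idea of guarding a multi-step register sequence with one lock can be sketched in userspace C. This uses a POSIX mutex as a stand-in for the kernel's `struct mutex`, and the `struct alr` layout, register names and helper functions are hypothetical; the real lan9303 register protocol is not described in the commit message.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define ALR_ENTRIES 8u

/* Hypothetical model: an address register and a data register that must
 * be used as a pair, plus the table they give access to. */
struct alr {
    pthread_mutex_t lock;
    uint32_t addr_reg;
    uint32_t data_reg;
    uint32_t table[ALR_ENTRIES];
};

void alr_init(struct alr *a)
{
    pthread_mutex_init(&a->lock, NULL);
    a->addr_reg = 0;
    a->data_reg = 0;
    for (size_t i = 0; i < ALR_ENTRIES; i++)
        a->table[i] = 0;
}

/* The whole select-then-write sequence runs under the lock, so another
 * thread cannot change addr_reg between the two steps. */
void alr_write(struct alr *a, uint32_t idx, uint32_t val)
{
    pthread_mutex_lock(&a->lock);
    a->addr_reg = idx % ALR_ENTRIES;
    a->data_reg = val;
    a->table[a->addr_reg] = a->data_reg;
    pthread_mutex_unlock(&a->lock);
}

/* The select-then-read sequence is protected the same way. */
uint32_t alr_read(struct alr *a, uint32_t idx)
{
    uint32_t val;

    pthread_mutex_lock(&a->lock);
    a->addr_reg = idx % ALR_ENTRIES;
    a->data_reg = a->table[a->addr_reg];
    val = a->data_reg;
    pthread_mutex_unlock(&a->lock);
    return val;
}
```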
    • J
      net: sched: fix use-after-free in tcf_block_put_ext · df45bf84
      Jiri Pirko committed
      Since the block is freed with last chain being put, once we reach the
      end of iteration of list_for_each_entry_safe, the block may be
      already freed. I'm hitting this only by creating and deleting clsact:
      
      [  202.171952] ==================================================================
      [  202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390
      [  202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796
      [  202.194508]
      [  202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5
      [  202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
      [  202.213613] Call Trace:
      [  202.216369]  dump_stack+0xda/0x169
      [  202.220192]  ? dma_virt_map_sg+0x147/0x147
      [  202.224790]  ? show_regs_print_info+0x54/0x54
      [  202.229691]  ? tcf_chain_destroy+0x1dc/0x250
      [  202.234494]  print_address_description+0x83/0x3d0
      [  202.239781]  ? tcf_block_put_ext+0x240/0x390
      [  202.244575]  kasan_report+0x1ba/0x460
      [  202.248707]  ? tcf_block_put_ext+0x240/0x390
      [  202.253518]  tcf_block_put_ext+0x240/0x390
      [  202.258117]  ? tcf_chain_flush+0x290/0x290
      [  202.262708]  ? qdisc_hash_del+0x82/0x1a0
      [  202.267111]  ? qdisc_hash_add+0x50/0x50
      [  202.271411]  ? __lock_is_held+0x5f/0x1a0
      [  202.275843]  clsact_destroy+0x3d/0x80 [sch_ingress]
      [  202.281323]  qdisc_destroy+0xcb/0x240
      [  202.285445]  qdisc_graft+0x216/0x7b0
      [  202.289497]  tc_get_qdisc+0x260/0x560
      
      Fix this by holding the block also by chain 0 and put chain 0
      explicitly, out of the list_for_each_entry_safe loop at the very
      end of tcf_block_put_ext.
      
      Fixes: efbf7897 ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()")
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      df45bf84
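The general pattern, keeping an extra reference on an object while a loop drops the references that would otherwise free it, can be sketched as below. This is a simplified model with hypothetical types and a plain counter; the actual fix holds the block through chain 0 rather than through a generic extra reference.

```c
#include <stdlib.h>

/* Hypothetical refcounted object: one reference per chain. */
struct block {
    int refcnt;
    int nchains;
    int *freed;               /* incremented when the block is released */
};

struct block *block_new(int nchains, int *freed)
{
    struct block *b = malloc(sizeof(*b));

    if (!b)
        return NULL;
    b->refcnt = nchains;
    b->nchains = nchains;
    b->freed = freed;
    return b;
}

/* Drop one reference; the last one frees the block. */
static void block_put(struct block *b)
{
    if (--b->refcnt == 0) {
        ++*b->freed;
        free(b);
    }
}

/* Drop every chain reference. Without the extra hold, the final put in
 * the loop would free the block and the loop condition would then read
 * b->nchains from freed memory. */
void block_put_all(struct block *b)
{
    b->refcnt++;                          /* extra hold across the loop */
    for (int i = 0; i < b->nchains; i++)
        block_put(b);
    block_put(b);                         /* drop the hold last */
}
```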
    • C
      bnxt_en: Fix sources of spurious netpoll warnings · 2edbdb31
      Calvin Owens committed
      After applying 2270bc5d ("bnxt_en: Fix netpoll handling") and
      903649e7 ("bnxt_en: Improve -ENOMEM logic in NAPI poll loop."),
      we still see the following WARN fire:
      
        ------------[ cut here ]------------
        WARNING: CPU: 0 PID: 1875170 at net/core/netpoll.c:165 netpoll_poll_dev+0x15a/0x160
        bnxt_poll+0x0/0xd0 exceeded budget in poll
        <snip>
        Call Trace:
         [<ffffffff814be5cd>] dump_stack+0x4d/0x70
         [<ffffffff8107e013>] __warn+0xd3/0xf0
         [<ffffffff8107e07f>] warn_slowpath_fmt+0x4f/0x60
         [<ffffffff8179519a>] netpoll_poll_dev+0x15a/0x160
         [<ffffffff81795f38>] netpoll_send_skb_on_dev+0x168/0x250
         [<ffffffff817962fc>] netpoll_send_udp+0x2dc/0x440
         [<ffffffff815fa9be>] write_ext_msg+0x20e/0x250
         [<ffffffff810c8125>] call_console_drivers.constprop.23+0xa5/0x110
         [<ffffffff810c9549>] console_unlock+0x339/0x5b0
         [<ffffffff810c9a88>] vprintk_emit+0x2c8/0x450
         [<ffffffff810c9d5f>] vprintk_default+0x1f/0x30
         [<ffffffff81173df5>] printk+0x48/0x50
         [<ffffffffa0197713>] edac_raw_mc_handle_error+0x563/0x5c0 [edac_core]
         [<ffffffffa0197b9b>] edac_mc_handle_error+0x42b/0x6e0 [edac_core]
         [<ffffffffa01c3a60>] sbridge_mce_output_error+0x410/0x10d0 [sb_edac]
         [<ffffffffa01c47cc>] sbridge_check_error+0xac/0x130 [sb_edac]
         [<ffffffffa0197f3c>] edac_mc_workq_function+0x3c/0x90 [edac_core]
         [<ffffffff81095f8b>] process_one_work+0x19b/0x480
         [<ffffffff810967ca>] worker_thread+0x6a/0x520
         [<ffffffff8109c7c4>] kthread+0xe4/0x100
         [<ffffffff81884c52>] ret_from_fork+0x22/0x40
      
      This happens because we increment rx_pkts on -ENOMEM and -EIO, resulting
      in rx_pkts > 0. Fix this by only bumping rx_pkts if we were actually
      given a non-zero budget.
      Signed-off-by: Calvin Owens <calvinowens@fb.com>
      Acked-by: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2edbdb31
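The accounting rule described in the message can be sketched as a small helper. The function name and its parameters are hypothetical and this is not the driver's code; it only models the stated rule that a failed packet is counted toward `rx_pkts` when the poll was given a non-zero budget.

```c
#include <errno.h>

/* Returns the updated rx_pkts count after processing one RX completion.
 * rc >= 0 means that many packets were delivered; -ENOMEM and -EIO are
 * the error cases named in the commit message. */
int account_rx(int rc, int budget, int rx_pkts)
{
    if (rc >= 0)
        return rx_pkts + rc;
    if ((rc == -ENOMEM || rc == -EIO) && budget > 0)
        return rx_pkts + 1;   /* count the failure only when we had budget */
    return rx_pkts;           /* zero budget (netpoll): report no work */
}
```

With a zero budget, as netpoll uses, the count stays at zero, so the "exceeded budget in poll" check has nothing to complain about.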
    • D
      Merge branch 'lockless-qdisc-series' · fc8b81a5
      David S. Miller committed
      John Fastabend says:
      
      ====================
      lockless qdisc series
      
      This series adds support for building lockless qdiscs. This is
      the result of noticing the qdisc lock is a common hot-spot in
      perf analysis of the Linux network stack, especially when testing
      with high packet per second rates. However, nothing is free and
      most qdiscs rely on the qdisc lock for their data structures so
      each qdisc must be converted on a case by case basis. In this
      series, to kick things off, we make pfifo_fast, mq, and mqprio
      lockless. Follow up series can address additional qdiscs as needed.
      For example sch_tbf might be useful. To allow this the lockless
      design is an opt-in flag. In some future utopia we convert all
      qdiscs and we get to drop this case analysis, but in order to
      make progress we live in the real-world.
      
      There are also a handful of optimizations I have behind this
      series and a few code cleanups that I couldn't figure out how
      to fit neatly into this series without increasing the patch
      count. Once this is in additional patches can address this. The
      most notable is in skb_dequeue we can push the consumer lock
      out a bit and consume multiple skbs off the skb_array in pfifo
      fast per iteration. Ideally we could push arrays of packets at
      drivers as well but we would need the infrastructure for this.
      The other notable improvement is to do less locking in the
      overrun cases where bad tx queue list and gso_skb are being
      hit. Although, nice in theory in practice this is the error
      case and I haven't found a benchmark where this matters yet.
      
      For testing...
      
      My first test case uses multiple containers (via cilium) where
      multiple client containers use 'wrk' to benchmark connections with
      a server container running lighttpd, which is configured
      to use multiple threads, one per core. Additionally, this test has
      a proxy agent running, so all traffic takes an extra hop through a
      proxy container. In these cases each TCP packet traverses the egress
      qdisc layer at least four times and the ingress qdisc layer an
      additional four times. This makes for a good stress test IMO; perf
      details are below.
      
      The other micro-benchmark I run injects packets directly into
      the qdisc layer using pktgen. This uses the benchmark script,
      
       ./pktgen_bench_xmit_mode_queue_xmit.sh
      
      Benchmarks were taken in two cases: "base", running the latest
      net-next with no changes to the qdisc layer, and "qdisc", run with
      the lockless qdisc updates. Numbers are reported in req/sec. All
      virtual 'veth' devices run with pfifo_fast in the qdisc test case.
      
      `wrk -t16 -c $conns -d30 "http://[$SERVER_IP4]:80"`
      
      conns      16     32     64   1024
      ----------------------------------
      base:   18831  20201  21393  29151
      qdisc:  19309  21063  23899  29265
      
      Notice that in all cases we see a performance improvement when
      running the qdisc case.
      
      Microbenchmarks using pktgen are as follows,
      
      `pktgen_bench_xmit_mode_queue_xmit.sh -t 1 -i eth2 -c 20000000`
      
      base(mq):          2.1Mpps
      base(pfifo_fast):  2.1Mpps
      qdisc(mq):         2.6Mpps
      qdisc(pfifo_fast): 2.6Mpps
      
      Notice the numbers are the same for mq and pfifo_fast because we
      are only testing a single thread here. In both tests we see a nice
      bump in performance. The key with 'mq' is that it is already per
      txq ring, so contention is minimal in the above cases. Qdiscs
      such as tbf or htb, which have more contention, will likely show
      larger gains when/if lockless versions are implemented.
      
      Thanks to everyone who helped with this work, especially Daniel
      Borkmann, Eric Dumazet and Willem de Bruijn for discussing the
      design and reviewing versions of the code.
      
      Changes from the RFC: dropped a couple of patches off the end,
      fixed a bug with skb_queue_walk_safe not unlinking skb in all
      cases, fixed a lockdep splat with pfifo_fast_destroy not calling
      the *_bh lock variant, and addressed _most_ of Willem's comments.
      There was also a bug in the bulk locking (final patch) of the RFC
      series.
      
      @Willem, I left out lockdep annotation for a follow on series
      to add lockdep more completely, rather than just in code I
      touched.
      
      Comments and feedback welcome.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fc8b81a5
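To put the benchmark figures in the cover letter above into relative terms, the following short Python sketch computes the percentage gain of "qdisc" over "base". The numbers are copied from the tables in the message; the helper name `gain_pct` is just an illustrative choice.

```python
def gain_pct(base, new):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# wrk results in req/sec, keyed by connection count: (base, qdisc)
wrk = {
    16: (18831, 19309),
    32: (20201, 21063),
    64: (21393, 23899),
    1024: (29151, 29265),
}

# pktgen results in Mpps (same for mq and pfifo_fast): (base, qdisc)
pktgen = (2.1, 2.6)

for conns, (base, qdisc) in wrk.items():
    print(f"wrk conns={conns}: {gain_pct(base, qdisc):.1f}%")
print(f"pktgen: {gain_pct(*pktgen):.1f}%")
```

With these numbers the wrk gains range from about 0.4% (1024 connections) to about 11.7% (64 connections), and the single-thread pktgen gain is about 24%.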
    • J
      net: sched: pfifo_fast use skb_array · c5ad119f
      John Fastabend committed
      This converts the pfifo_fast qdisc to use the skb_array data structure
      and set the lockless qdisc bit. pfifo_fast is the first qdisc to support
      the lockless bit that can be a child of a qdisc requiring locking. So
      we add logic to clear the lock bit on initialization in these cases when
      the qdisc graft operation occurs.
      
      This also removes the logic used to pick the next band to dequeue from
      and instead just checks a per priority array for packets from top priority
      to lowest. This might need to be a bit more clever but seems to work
      for now.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c5ad119f
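The dequeue rule this commit describes (check a per-priority array from top priority to lowest) can be modelled in a few lines of Python. This is only a sketch: `collections.deque` stands in for the kernel's skb_array, the class and method names are hypothetical, and the per-band `limit` is an assumed parameter. The three-band layout matches pfifo_fast, where band 0 is the highest priority.

```python
from collections import deque

NUM_BANDS = 3  # pfifo_fast has three bands; band 0 is served first


class BandedFifo:
    """Toy strict-priority queue with one FIFO per band (hypothetical)."""

    def __init__(self, limit=1000):
        self.bands = [deque() for _ in range(NUM_BANDS)]
        self.limit = limit  # assumed per-band capacity

    def enqueue(self, pkt, band):
        q = self.bands[band]
        if len(q) >= self.limit:
            return False  # band is full, the caller would drop the packet
        q.append(pkt)
        return True

    def peek(self):
        # Look at the next packet without removing it.
        for q in self.bands:
            if q:
                return q[0]
        return None

    def dequeue(self):
        # Scan from the highest-priority band to the lowest.
        for q in self.bands:
            if q:
                return q.popleft()
        return None
```

A lower band is only served when every higher band is empty, which is the simple scan the commit message says "might need to be a bit more clever".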
    • J
      net: skb_array: expose peek API · 4a86a4cf
      John Fastabend committed
      This adds a peek routine to skb_array.h for use with qdisc.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4a86a4cf
    • J
      net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio · ce679e8d
      John Fastabend committed
      The sch_mqprio qdisc creates a sub-qdisc per tx queue, each of which
      is then called independently for enqueue and dequeue operations.
      However, statistics are aggregated and pushed up to the "master" qdisc.
      
      This patch adds support for any of the sub-qdiscs to be per-CPU
      statistics qdiscs. To handle this case, add a check when calculating
      stats and aggregate the per-CPU stats if needed.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ce679e8d
    • J
      net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq · b01ac095
      John Fastabend committed
      The sch_mq qdisc creates a sub-qdisc per tx queue, each of which
      is then called independently for enqueue and dequeue operations.
      However, statistics are aggregated and pushed up to the "master" qdisc.
      
      This patch adds support for any of the sub-qdiscs to be per-CPU
      statistics qdiscs. To handle this case, add a check when calculating
      stats and aggregate the per-CPU stats if needed.
      
      Also exports __gnet_stats_copy_queue() to use as a helper function.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b01ac095
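The aggregation described in the sch_mq and sch_mqprio commits above can be sketched as follows. This is a simplified Python model, not the kernel code: the `QueueStats` and `SubQdisc` types, their fields, and the `aggregate` function are hypothetical names chosen for illustration. The one idea taken from the commit messages is the check for whether a sub-qdisc keeps per-CPU stats before summing.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class QueueStats:
    qlen: int = 0
    backlog: int = 0
    drops: int = 0


@dataclass
class SubQdisc:
    # Single shared counters, used by lock-protected sub-qdiscs.
    stats: QueueStats = field(default_factory=QueueStats)
    # One QueueStats per CPU; set only for per-CPU (lockless) sub-qdiscs.
    cpu_stats: Optional[List[QueueStats]] = None


def aggregate(children):
    """Sum stats of all sub-qdiscs into one 'master' view."""
    total = QueueStats()
    for child in children:
        # The check the commits add: per-CPU stats if present,
        # otherwise the child's single set of counters.
        if child.cpu_stats is not None:
            sources = child.cpu_stats
        else:
            sources = [child.stats]
        for s in sources:
            total.qlen += s.qlen
            total.backlog += s.backlog
            total.drops += s.drops
    return total
```

Mixing both kinds of sub-qdisc under one parent works because the parent only ever sees the summed totals.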
    • J
      net: sched: helpers to sum qlen and qlen for per cpu logic · 7e66016f
      John Fastabend committed
      Add qdisc qlen helper routines for lockless qdiscs to use.
      
      The qdisc qlen is no longer used in the hotpath but it is reported
      via stats query on the qdisc so it still needs to be tracked. This
      adds the per cpu operations needed along with a helper to return
      the summation of per cpu stats.
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7e66016f
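The per-CPU qlen tracking described in this commit can be illustrated with a small Python model. It is a sketch only: the class `PerCpuQlen` and its methods are hypothetical and do not mirror the kernel helper names. It shows why the sum across CPUs is the meaningful value, since a packet may be enqueued on one CPU and dequeued on another.

```python
class PerCpuQlen:
    """Toy per-CPU queue-length counter (hypothetical model)."""

    def __init__(self, num_cpus):
        self.counts = [0] * num_cpus

    def inc(self, cpu):
        # Called on enqueue; touches only this CPU's counter.
        self.counts[cpu] += 1

    def dec(self, cpu):
        # Called on dequeue; may run on a different CPU than the enqueue.
        self.counts[cpu] -= 1

    def total(self):
        # Only needed for stats queries, not in the hot path.
        return sum(self.counts)
```

An individual counter can go negative when dequeues happen on a different CPU than enqueues, but the summed value still reflects the true queue length.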