1. 28 Nov, 2018 1 commit
    • N
      net: bridge: add support for user-controlled bool options · a428afe8
      Nikolay Aleksandrov committed
      We have been adding many new bridge options, many of which are
      boolean but still take up netlink attribute ids and waste space in the skb.
      Recently we discussed learning from link-local packets[1] and decided
      that yet another new boolean option will be needed, thus introducing this API
      to save some bridge nl space.
      The API supports changing the value of multiple boolean options at once
      via the br_boolopt_multi struct which has an optmask (which options to
      set, bit per opt) and optval (options' new values). Future boolean
      options will only be added to the br_boolopt_id enum and then will have
      to be handled in br_boolopt_toggle/get. The API will automatically
      add the ability to change and export them via netlink; sysfs can use the
      single boolopt function versions to do the same. The behaviour on
      failure or success is the same as with normal netlink option changing.
      
      If an option requires mapping to an internal kernel flag or needs special
      configuration to be enabled then it should be handled in
      br_boolopt_toggle. An option's current state should likewise be
      retrievable via br_boolopt_get.
      
      v2: WARN_ON() on unsupported option as that shouldn't be possible and
          also will help catch people who add new options without handling
          them for both set and get. Pass down extack so if an option desires
          it could set it on error and be more user-friendly.
      
      [1] https://www.spinics.net/lists/netdev/msg532698.html
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
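The optmask/optval scheme described in the bridge commit above can be sketched in user space as follows. This is a minimal model, not the kernel code: the `demo_` names, the two option ids and the plain `bool` array are all assumptions made for illustration; only the idea of one mask bit and one value bit per option comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical option ids; the real ones live in the br_boolopt_id enum. */
enum demo_boolopt_id {
	DEMO_BOOLOPT_A,
	DEMO_BOOLOPT_B,
	DEMO_BOOLOPT_MAX
};

/* The two fields the commit message names: one bit per option in each. */
struct demo_boolopt_multi {
	uint32_t optval;  /* new values */
	uint32_t optmask; /* which options to set */
};

struct demo_bridge {
	bool opts[DEMO_BOOLOPT_MAX];
};

/* Set one option; an unknown id is rejected. */
static int demo_boolopt_toggle(struct demo_bridge *br, int id, bool on)
{
	if (id < 0 || id >= DEMO_BOOLOPT_MAX)
		return -1;
	br->opts[id] = on;
	return 0;
}

/* Read one option: 1 or 0, or -1 for an unknown id. */
static int demo_boolopt_get(const struct demo_bridge *br, int id)
{
	if (id < 0 || id >= DEMO_BOOLOPT_MAX)
		return -1;
	return br->opts[id] ? 1 : 0;
}

/* Change every option selected in optmask, stopping at the first error. */
static int demo_boolopt_multi_toggle(struct demo_bridge *br,
				     const struct demo_boolopt_multi *bm)
{
	for (int id = 0; id < DEMO_BOOLOPT_MAX; id++) {
		uint32_t bit = 1u << id;
		int err;

		if (!(bm->optmask & bit))
			continue;
		err = demo_boolopt_toggle(br, id, (bm->optval & bit) != 0);
		if (err)
			return err;
	}
	return 0;
}

/* Export all options: optmask marks the known ids, optval their state. */
static void demo_boolopt_multi_get(const struct demo_bridge *br,
				   struct demo_boolopt_multi *bm)
{
	bm->optval = 0;
	bm->optmask = 0;
	for (int id = 0; id < DEMO_BOOLOPT_MAX; id++) {
		bm->optmask |= 1u << id;
		if (demo_boolopt_get(br, id) > 0)
			bm->optval |= 1u << id;
	}
}
```

Options whose mask bit is clear are left untouched, which is what lets a single request change several options at once without naming the others.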
  2. 27 Nov, 2018 15 commits
    • D
      Merge branch 'virtio-support-packed-ring' · 02c72d5e
      David S. Miller committed
      Tiwei Bie says:
      
      ====================
      virtio: support packed ring
      
      This patch set implements packed ring support in the virtio driver.
      
      A performance test between pktgen (pktgen_sample03_burst_single_flow.sh)
      and DPDK vhost (testpmd/rxonly/vhost-PMD) has been done; I saw a
      ~30% performance gain with packed ring in this case.
      
      To make this patch set work with the patch set below for vhost,
      some hacks are needed to set the _F_NEXT flag in indirect
      descriptors (this should be fixed in vhost):
      
      https://lkml.org/lkml/2018/7/3/33
      
      v2 -> v3:
      - Use leXX instead of virtioXX (MST);
      - Refactor split ring first (MST);
      - Add debug helpers (MST);
      - Put split/packed ring specific fields in sub structures (MST);
      - Handle normal descriptors and indirect descriptors differently (MST);
      - Track the DMA addr/len related info in a separate structure (MST);
      - Calculate AVAIL/USED flags only when wrap counter wraps (MST);
      - Define a struct/union to read event structure (MST);
      - Define a macro for wrap counter bit in uapi (MST);
      - Define the AVAIL/USED bits as shifts instead of values (MST);
      - s/_F_/_FLAG_/ in VRING_PACKED_EVENT_* as they are values (MST);
      - Drop the notify workaround for QEMU's tx-timer in packed ring (MST);
      
      v1 -> v2:
      - Use READ_ONCE() to read event off_wrap and flags together (Jason);
      - Add comments related to ccw (Jason);
      
      RFC v6 -> v1:
      - Avoid extra virtio_wmb() in virtqueue_enable_cb_delayed_packed()
        when event idx is off (Jason);
      - Fix bufs calculation in virtqueue_enable_cb_delayed_packed() (Jason);
      - Test the state of the desc at used_idx instead of last_used_idx
        in virtqueue_enable_cb_delayed_packed() (Jason);
      - Save wrap counter (as part of queue state) in the return value
        of virtqueue_enable_cb_prepare_packed();
      - Refine the packed ring definitions in uapi;
      - Rebase on the net-next tree;
      
      RFC v5 -> RFC v6:
      - Avoid tracking addr/len/flags when DMA API isn't used (MST/Jason);
      - Define wrap counter as bool (Jason);
      - Use ALIGN() in vring_init_packed() (Jason);
      - Avoid using pointer to track `next` in detach_buf_packed() (Jason);
      - Add comments for barriers (Jason);
      - Don't enable RING_PACKED on ccw for now (noticed by Jason);
      - Refine the memory barrier in virtqueue_poll();
      - Add a missing memory barrier in virtqueue_enable_cb_delayed_packed();
      - Remove the hacks in virtqueue_enable_cb_prepare_packed();
      
      RFC v4 -> RFC v5:
      - Save DMA addr, etc in desc state (Jason);
      - Track used wrap counter;
      
      RFC v3 -> RFC v4:
      - Make ID allocation support out-of-order (Jason);
      - Various fixes for EVENT_IDX support;
      
      RFC v2 -> RFC v3:
      - Split into small patches (Jason);
      - Add helper virtqueue_use_indirect() (Jason);
      - Just set id for the last descriptor of a list (Jason);
      - Calculate the prev in virtqueue_add_packed() (Jason);
      - Fix/improve desc suppression code (Jason/MST);
      - Refine the code layout for XXX_split/packed and wrappers (MST);
      - Fix the comments and API in uapi (MST);
      - Remove the BUG_ON() for indirect (Jason);
      - Some other refinements and bug fixes;
      
      RFC v1 -> RFC v2:
      - Add indirect descriptor support - compile test only;
      - Add event suppression support - compile test only;
      - Move vring_packed_init() out of uapi (Jason, MST);
      - Merge two loops into one in virtqueue_add_packed() (Jason);
      - Split vring_unmap_one() for packed ring and split ring (Jason);
      - Avoid using '%' operator (Jason);
      - Rename free_head -> next_avail_idx (Jason);
      - Add comments for virtio_wmb() in virtqueue_add_packed() (Jason);
      - Some other refinements and bug fixes;
      ====================
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: advertize packed ring layout · f959a128
      Tiwei Bie committed
      Advertize the packed ring layout support.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: disable packed ring on unsupported transports · 3a814fdf
      Tiwei Bie committed
      Currently, ccw, vop and remoteproc need some legacy virtio
      APIs to create or access virtio rings, which are not supported
      by packed ring. So disable packed ring on these transports
      for now.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: leverage event idx in packed ring · f51f9826
      Tiwei Bie committed
      Leverage the EVENT_IDX feature in packed ring to suppress
      events when it's available.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
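A rough sketch of how an event index can suppress notifications in a packed ring. The comparison in `demo_need_event()` is the familiar split-ring event-index arithmetic; the way the wrap counter is folded into the offset is an assumption about how the packed layout normalizes it, and every `demo_` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the wrap counter bit inside the 16-bit off_wrap field. */
#define DEMO_EVENT_F_WRAP_CTR 15

/*
 * True when event_idx lies in [old, new_idx), computed modulo 2^16:
 * the other side asked to be notified at an index we have just passed.
 */
static bool demo_need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old)
{
	return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old);
}

/*
 * Split off_wrap into offset and wrap counter. If the requested wrap
 * counter differs from ours, the offset refers to the other lap of the
 * ring, so shift it back by one ring size before comparing.
 */
static bool demo_packed_need_kick(uint16_t off_wrap, uint16_t ring_size,
				  bool avail_wrap_counter,
				  uint16_t old, uint16_t new_idx)
{
	bool wrap = (off_wrap >> DEMO_EVENT_F_WRAP_CTR) != 0;
	uint16_t event_idx = (uint16_t)(off_wrap & ~(1u << DEMO_EVENT_F_WRAP_CTR));

	if (wrap != avail_wrap_counter)
		event_idx = (uint16_t)(event_idx - ring_size);
	return demo_need_event(event_idx, new_idx, old);
}
```

With event suppression available, a notification is only worth sending when the batch of newly added descriptors crosses the index the other side asked about; otherwise the kick can be skipped.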
    • T
      virtio_ring: introduce packed ring support · 1ce9e605
      Tiwei Bie committed
      Introduce the packed ring support. Packed ring can only be
      created by vring_create_virtqueue() and each chunk of packed
      ring will be allocated individually. Packed ring can not be
      created on preallocated memory by vring_new_virtqueue() or
      the like for now.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: cache whether we will use DMA API · fb3fba6b
      Tiwei Bie committed
      Cache whether we will use DMA API, instead of doing the
      check every time. With packed ring, we are going to check
      more often whether DMA API is used.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: extract split ring handling from ring creation · d79dca75
      Tiwei Bie committed
      Introduce a specific function to create the split ring.
      And also move the DMA allocation and size information to
      the .split sub-structure.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: allocate desc state for split ring separately · cbeedb72
      Tiwei Bie committed
      Put the split ring's desc state into the .split sub-structure,
      and allocate desc state for split ring separately. This makes
      the code more readable and more consistent with what we will
      do for packed ring.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: introduce helper for indirect feature · 2f18c2d1
      Tiwei Bie committed
      Introduce a helper to check whether we will use indirect
      feature. It will be used by packed ring too.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: introduce debug helpers · 4d6a105e
      Tiwei Bie committed
      Introduce debug helpers for updating, checking and
      invalidating last_add_time. They will be used by packed ring too.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: put split ring fields in a sub struct · e593bf97
      Tiwei Bie committed
      Put the split ring specific fields in a sub-struct named
      "split" to avoid misuse after introducing packed ring.
      There is no functional change.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: put split ring functions together · e6f633e5
      Tiwei Bie committed
      Put the xxx_split() functions together to make the
      code more readable and avoid misuse after introducing
      the packed ring. There is no functional change.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio_ring: add _split suffix for split ring functions · 138fd251
      Tiwei Bie committed
      Add _split suffix for split ring specific functions. This
      is a preparation for introducing the packed ring support.
      There is no functional change.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • T
      virtio: add packed ring types and macros · 89a9157e
      Tiwei Bie committed
      Add types and macros for packed ring.
      Signed-off-by: Tiwei Bie <tiwei.bie@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
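For orientation, a packed-ring descriptor and its AVAIL/USED flags might be modeled like this. The bit positions 7 and 15 follow the virtio 1.1 packed layout; the `demo_` names are hypothetical, and the fields are kept host-endian here for simplicity, whereas the cover letter notes that the real definitions use little-endian (leXX) types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag positions expressed as shifts rather than values. */
#define DEMO_DESC_F_AVAIL 7
#define DEMO_DESC_F_USED  15

/* One packed-ring descriptor (host-endian in this sketch). */
struct demo_packed_desc {
	uint64_t addr;
	uint32_t len;
	uint16_t id;
	uint16_t flags;
};

/*
 * Flags a driver writes to make a descriptor available:
 * AVAIL equals the avail wrap counter, USED is its inverse.
 */
static uint16_t demo_avail_flags(bool avail_wrap_counter)
{
	unsigned avail = avail_wrap_counter ? 1u : 0u;

	return (uint16_t)((avail << DEMO_DESC_F_AVAIL) |
			  ((avail ^ 1u) << DEMO_DESC_F_USED));
}

/* A descriptor counts as used when AVAIL == USED == used wrap counter. */
static bool demo_desc_is_used(const struct demo_packed_desc *d,
			      bool used_wrap_counter)
{
	bool avail = (d->flags & (1u << DEMO_DESC_F_AVAIL)) != 0;
	bool used = (d->flags & (1u << DEMO_DESC_F_USED)) != 0;

	return avail == used && used == used_wrap_counter;
}
```

Because both bits are compared against a wrap counter that flips on every lap of the ring, stale flags from the previous lap are never mistaken for fresh ones.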
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 4afe60a9
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf-next 2018-11-26
      
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Extend BTF to support function call types and improve the BPF
         symbol handling with this info for kallsyms and bpftool program
         dump to make debugging easier, from Martin and Yonghong.
      
      2) Optimize LPM lookups by making longest_prefix_match() handle
         multiple bytes at a time, from Eric.
      
      3) Add support for loading and attaching flow dissector BPF progs
         from bpftool, from Stanislav.
      
      4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.
      
      5) Enable verifier to support narrow context loads with offset > 0
         to adapt to LLVM code generation (currently only offset of 0 was
         supported). Add test cases as well, from Andrey.
      
      6) Simplify passing device functions for offloaded BPF progs by
         adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
         Also convert nfp and netdevsim to make use of them, from Quentin.
      
      7) Add support for sock_ops based BPF programs to send events to
         the perf ring-buffer through perf_event_output helper, from
         Sowmini and Daniel.
      
      8) Add read / write support for skb->tstamp from tc BPF and cg BPF
         programs to allow for supporting rate-limiting in EDT qdiscs
         like fq from BPF side, from Vlad.
      
      9) Extend libbpf API to support map in map types and add test cases
         for it as well to BPF kselftests, from Nikita.
      
      10) Account the maximum packet offset accessed by a BPF program in
          the verifier and use it for optimizing nfp JIT, from Jiong.
      
      11) Fix error handling regarding kprobe_events in BPF sample loader,
          from Daniel T.
      
      12) Add support for queue and stack map type in bpftool, from David.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
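Item 2 of the bpf-next pull request mentions making longest_prefix_match() handle multiple bytes at a time. The general technique can be sketched as below: XOR whole words and count leading zero bits instead of comparing byte by byte. This is an illustration only, not the kernel implementation; the function names, the 4-byte step and the portable leading-zero loop are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading zero bits of a non-zero 32-bit value (portable loop). */
static unsigned demo_clz32(uint32_t v)
{
	unsigned n = 0;

	while (!(v & 0x80000000u)) {
		v <<= 1;
		n++;
	}
	return n;
}

/* Load four bytes as a big-endian word so bit order matches prefix order. */
static uint32_t demo_load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Number of leading bits a and b share, capped at max_bits. */
static size_t demo_prefix_match(const uint8_t *a, const uint8_t *b,
				size_t len, size_t max_bits)
{
	size_t i = 0, matched = 0;

	/* Fast path: compare four bytes per iteration. */
	for (; i + 4 <= len && matched < max_bits; i += 4) {
		uint32_t diff = demo_load_be32(a + i) ^ demo_load_be32(b + i);

		if (diff) {
			matched += demo_clz32(diff);
			return matched < max_bits ? matched : max_bits;
		}
		matched += 32;
	}
	/* Tail: remaining bytes one at a time. */
	for (; i < len && matched < max_bits; i++) {
		uint8_t diff = (uint8_t)(a[i] ^ b[i]);

		if (diff) {
			matched += demo_clz32((uint32_t)diff << 24);
			break;
		}
		matched += 8;
	}
	return matched < max_bits ? matched : max_bits;
}
```

The first differing bit is found with one XOR and one leading-zero count per word, so long common prefixes cost a quarter of the iterations of a bytewise loop.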
  3. 26 Nov, 2018 8 commits
  4. 25 Nov, 2018 8 commits
    • W
      selftests/net: add txring_overwrite · 358be656
      Willem de Bruijn committed
      Packet sockets with PACKET_TX_RING send skbs with user data in frags.
      
      Before commit 5cd8d46e ("packet: copy user buffers before orphan
      or clone") ring slots could be released prematurely, possibly allowing
      a process to overwrite data still in flight.
      
      This test opens two packet sockets, one to send and one to read.
      The sender has a tx ring of one slot. It sends two packets with
      different payload, then reads both and verifies their payload.
      
      Before the above commit, both receive calls return the same data, as
      the send calls use the same buffer. Since that commit, the clone
      needed for looping onto a packet socket triggers an skb_copy_ubufs
      to create a private copy. The separate sends each arrive correctly.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • C
      net: qualcomm: rmnet: move null check on dev before dereferecing it · 3c18aa14
      Colin Ian King committed
      Currently dev is dereferenced by the call dev_net(dev) before dev is null
      checked.  Fix this by null checking dev before the potential null
      pointer dereference.
      
      Detected by CoverityScan, CID#1462955 ("Dereference before null check")
      
      Fixes: 23790ef1 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
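The ordering problem fixed in the rmnet commit above can be shown with a small stand-alone sketch: the pointer has to be checked before anything reads through it. The types, the error value and the function names are hypothetical stand-ins, not the driver's code.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_ENODEV (-19)

struct demo_net {
	int id;
};

struct demo_dev {
	struct demo_net *net;
};

/* Stand-in for a helper that dereferences dev unconditionally. */
static struct demo_net *demo_dev_net(const struct demo_dev *dev)
{
	return dev->net;
}

/* Check dev first, and only then call anything that reads through it. */
static int demo_changelink(const struct demo_dev *dev)
{
	const struct demo_net *net;

	if (!dev)
		return DEMO_ENODEV;
	net = demo_dev_net(dev); /* safe: dev is known to be non-NULL here */
	return net ? net->id : DEMO_ENODEV;
}
```

Placing the helper call in the variable initializer, above the check, is the pattern a "dereference before null check" report typically points at.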
    • Y
      cxgb4: remove set but not used variables 'multitrc, speed' · 21ab664a
      YueHaibing committed
      Fixes gcc '-Wunused-but-set-variable' warning:
      
      drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:5883:6:
       warning: variable 'multitrc' set but not used [-Wunused-but-set-variable]
      
      drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:8585:32:
       warning: variable 'speed' set but not used [-Wunused-but-set-variable]
      
      'multitrc' never used since introduction in
      commit 8e3d04fd ("cxgb4: Add MPS tracing support")
      
      'speed' never used since introduction in
      commit c3168cab ("cxgb4/cxgbvf: Handle 32-bit fw port capabilities")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      net: fixup type in netdev_start_xmit() · 2183435c
      Alexey Dobriyan committed
      The return type should formally be "netdev_tx_t".
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      b1bf78bf
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · d146194f
      Linus Torvalds committed
      Pull arm64 fixes from Catalin Marinas:
      
       - Fix wrong conflict resolution around CONFIG_ARM64_SSBD
      
       - Fix sparse warning on unsigned long constant
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
        arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block
        arm64: sysreg: fix sparse warnings
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 857fa628
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter.
      
       2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann.
      
       3) Fix socket wmem accounting in SCTP, from Xin Long.
      
       4) Fix failed resume crash in ena driver, from Arthur Kiyanovski.
      
       5) qed driver passes bytes instead of bits into second arg of
          bitmap_weight(). From Denis Bolotin.
      
       6) Fix reset deadlock in ibmvnic, from Juliet Kim.
      
       7) skb_scrub_packet() needs to scrub the fwd marks too, from Petr
          Machata.
      
       8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK
          compression during this period, from Eric Dumazet.
      
       9) Add atomicity to SMC protocol cursor handling, from Ursula Braun.
      
      10) Don't leave dangling error pointer if bpf_prog_add() fails in
          thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO
          headers, set sq->tso_hdrs to NULL.
      
      11) Fix race condition over state variables in act_police, from Davide
          Caratti.
      
      12) Disable guest csum in the presence of XDP in virtio_net, from Jason
          Wang.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
        net: gemini: Fix copy/paste error
        net: phy: mscc: fix deadlock in vsc85xx_default_config
        dt-bindings: dsa: Fix typo in "probed"
        net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
        net: amd: add missing of_node_put()
        team: no need to do team_notify_peers or team_mcast_rejoin when disabling port
        virtio-net: fail XDP set if guest csum is negotiated
        virtio-net: disable guest csum during XDP set
        net/sched: act_police: add missing spinlock initialization
        net: don't keep lonely packets forever in the gro hash
        net/ipv6: re-do dad when interface has IFF_NOARP flag change
        packet: copy user buffers before orphan or clone
        ibmvnic: Update driver queues after change in ring size support
        ibmvnic: Fix RX queue buffer cleanup
        net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
        net/dim: Update DIM start sample after each DIM iteration
        net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts
        net/smc: use after free fix in smc_wr_tx_put_slot()
        net/smc: atomic SMCD cursor handling
        net/smc: add SMC-D shutdown signal
        ...
    • L
      Merge tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · abe72ff4
      Linus Torvalds committed
      Pull xfs fixes from Darrick Wong:
       "Dave and I have continued our work fixing corruption problems that can
        be found when running long-term burn-in exercisers on xfs. Here are
        some patches fixing most of the problems, but there will likely be
        more. :/
      
         - Numerous corruption fixes for copy on write
      
         - Numerous corruption fixes for blocksize < pagesize writes
      
         - Don't miscalculate AG reservations for small final AGs
      
         - Fix page cache truncation to work properly for reflink and extent
           shifting
      
         - Fix use-after-free when retrying failed inode/dquot buffer logging
      
         - Fix corruptions seen when using copy_file_range in directio mode"
      
      * tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        iomap: readpages doesn't zero page tail beyond EOF
        vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP
        iomap: dio data corruption and spurious errors when pipes fill
        iomap: sub-block dio needs to zeroout beyond EOF
        iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents
        xfs: delalloc -> unwritten COW fork allocation can go wrong
        xfs: flush removing page cache in xfs_reflink_remap_prep
        xfs: extent shifting doesn't fully invalidate page cache
        xfs: finobt AG reserves don't consider last AG can be a runt
        xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
        xfs: uncached buffer tracing needs to print bno
        xfs: make xfs_file_remap_range() static
        xfs: fix shared extent data corruption due to missing cow reservation
  5. 24 Nov, 2018 8 commits
    • A
      net: gemini: Fix copy/paste error · 07093b76
      Andreas Fiedler committed
      The TX stats should be started with the tx_stats_syncp;
      there seems to be a copy/paste error in the driver.
      Signed-off-by: Andreas Fiedler <andreas.fiedler@gmx.net>
      Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Q
      net: phy: mscc: fix deadlock in vsc85xx_default_config · 3fa528b7
      Quentin Schulz committed
      The vsc85xx_default_config function called in the vsc85xx_config_init
      function which is used by VSC8530, VSC8531, VSC8540 and VSC8541 PHYs
      mistakenly calls phy_read and phy_write in-between phy_select_page and
      phy_restore_page.
      
      phy_select_page and phy_restore_page take and release the MDIO
      bus lock respectively, and phy_write and phy_read each take and
      release the same lock to write or read a PHY register.
      
      Let's fix this deadlock by using phy_modify_paged, which correctly
      handles a read followed by a write in a non-standard page.
      
      Fixes: 6a0bfbbe ("net: phy: mscc: migrate to phy_select/restore_page functions")
      Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
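The locking problem in the mscc commit above can be modeled in user space. In this toy, the bus lock is a flag that reports a second acquisition instead of blocking forever, so the buggy sequence returns an error rather than hanging. All `demo_` names, the register layout and the error value are assumptions for illustration; only the select/restore and locking-accessor behaviour is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_EDEADLK (-35)

/* Toy MDIO bus: a lock flag, the selected page, and paged registers. */
struct demo_bus {
	bool locked;
	int page;
	uint16_t regs[4][32];
};

/* A real mutex would block forever on a second lock; the toy reports it. */
static int demo_bus_lock(struct demo_bus *bus)
{
	if (bus->locked)
		return DEMO_EDEADLK;
	bus->locked = true;
	return 0;
}

static void demo_bus_unlock(struct demo_bus *bus)
{
	bus->locked = false;
}

/* Locking accessor: takes and releases the bus lock around one read. */
static int demo_read(struct demo_bus *bus, int reg)
{
	int val, err = demo_bus_lock(bus);

	if (err)
		return err;
	val = bus->regs[bus->page][reg];
	demo_bus_unlock(bus);
	return val;
}

/* Takes the lock and keeps it held; returns the previous page. */
static int demo_select_page(struct demo_bus *bus, int page)
{
	int old, err = demo_bus_lock(bus);

	if (err)
		return err;
	old = bus->page;
	bus->page = page;
	return old;
}

/* Restores the previous page and drops the lock taken by select. */
static void demo_restore_page(struct demo_bus *bus, int old_page)
{
	bus->page = old_page;
	demo_bus_unlock(bus);
}

/* Buggy pattern: a locking read between select and restore. */
static int demo_buggy_config(struct demo_bus *bus, int page, int reg)
{
	int ret, old = demo_select_page(bus, page);

	if (old < 0)
		return old;
	ret = demo_read(bus, reg); /* tries to take the lock we already hold */
	demo_restore_page(bus, old);
	return ret;
}

/* Fixed pattern: read-modify-write on a page under a single lock hold. */
static int demo_modify_paged(struct demo_bus *bus, int page, int reg,
			     uint16_t mask, uint16_t set)
{
	int old = demo_select_page(bus, page);
	uint16_t val;

	if (old < 0)
		return old;
	val = bus->regs[bus->page][reg]; /* lock already held: no re-lock */
	bus->regs[bus->page][reg] = (uint16_t)((val & ~mask) | set);
	demo_restore_page(bus, old);
	return 0;
}
```

The fixed helper never calls a locking accessor while the page is selected, which is the property the commit relies on when it switches to phy_modify_paged.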
    • F
      dt-bindings: dsa: Fix typo in "probed" · e7b9fb4f
      Fabio Estevam committed
      The correct form is "can be probed", so fix the typo.
      Signed-off-by: Fabio Estevam <festevam@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • L
      net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue · ef2a7cf1
      Lorenzo Bianconi committed
      Reset snd_queue tso_hdrs pointer to NULL in nicvf_free_snd_queue routine
      since it is used to check if tso dma descriptor queue has been previously
      allocated. The issue can be triggered with the following reproducer:
      
      $ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o
      $ip link set dev enP2p1s0v0 xdpdrv off
      
      [  341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0
      [  341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
      [  341.521874] pstate: 60400005 (nZCv daif +PAN -UAO)
      [  341.526654] pc : __vunmap+0x98/0xe0
      [  341.530132] lr : __vunmap+0x98/0xe0
      [  341.533609] sp : ffff00001c5db860
      [  341.536913] x29: ffff00001c5db860 x28: 0000000000020000
      [  341.542214] x27: ffff810feb5090b0 x26: ffff000017e57000
      [  341.547515] x25: 0000000000000000 x24: 00000000fbd00000
      [  341.552816] x23: 0000000000000000 x22: ffff810feb5090b0
      [  341.558117] x21: 0000000000000000 x20: 0000000000000000
      [  341.563418] x19: ffff000017e57000 x18: 0000000000000000
      [  341.568719] x17: 0000000000000000 x16: 0000000000000000
      [  341.574020] x15: 0000000000000010 x14: ffffffffffffffff
      [  341.579321] x13: ffff00008985eb27 x12: ffff00000985eb2f
      [  341.584622] x11: ffff0000096b3000 x10: ffff00001c5db510
      [  341.589923] x9 : 00000000ffffffd0 x8 : ffff0000086868e8
      [  341.595224] x7 : 3430303030303030 x6 : 00000000000006ef
      [  341.600525] x5 : 00000000003fffff x4 : 0000000000000000
      [  341.605825] x3 : 0000000000000000 x2 : ffffffffffffffff
      [  341.611126] x1 : ffff0000096b3728 x0 : 0000000000000038
      [  341.616428] Call trace:
      [  341.618866]  __vunmap+0x98/0xe0
      [  341.621997]  vunmap+0x3c/0x50
      [  341.624961]  arch_dma_free+0x68/0xa0
      [  341.628534]  dma_direct_free+0x50/0x80
      [  341.632285]  nicvf_free_resources+0x160/0x2d8 [nicvf]
      [  341.637327]  nicvf_config_data_transfer+0x174/0x5e8 [nicvf]
      [  341.642890]  nicvf_stop+0x298/0x340 [nicvf]
      [  341.647066]  __dev_close_many+0x9c/0x108
      [  341.650977]  dev_close_many+0xa4/0x158
      [  341.654720]  rollback_registered_many+0x140/0x530
      [  341.659414]  rollback_registered+0x54/0x80
      [  341.663499]  unregister_netdevice_queue+0x9c/0xe8
      [  341.668192]  unregister_netdev+0x28/0x38
      [  341.672106]  nicvf_remove+0xa4/0xa8 [nicvf]
      [  341.676280]  nicvf_shutdown+0x20/0x30 [nicvf]
      [  341.680630]  pci_device_shutdown+0x44/0x88
      [  341.684720]  device_shutdown+0x144/0x250
      [  341.688640]  kernel_restart_prepare+0x44/0x50
      [  341.692986]  kernel_restart+0x20/0x68
      [  341.696638]  __se_sys_reboot+0x210/0x238
      [  341.700550]  __arm64_sys_reboot+0x24/0x30
      [  341.704555]  el0_svc_handler+0x94/0x110
      [  341.708382]  el0_svc+0x8/0xc
      [  341.711252] ---[ end trace 3f4019c8439959c9 ]---
      [  341.715874] page:ffff7e0003ef4000 count:0 mapcount:0 mapping:0000000000000000 index:0x4
      [  341.723872] flags: 0x1fffe000000000()
      [  341.727527] raw: 001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000
      [  341.735263] raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
      [  341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)
      
      where xdp_dummy.c is a simple bpf program that forwards the incoming
      frames to the network stack (available here:
      https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_dummy.c)
      
      Fixes: 05c773f5 ("net: thunderx: Add basic XDP support")
      Fixes: 4863dea3 ("net: Adding support for Cavium ThunderX network controller")
      Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
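The thunderx fix above rests on a common pattern: when a pointer doubles as the "has this been allocated" marker, the free path has to reset it. A minimal user-space sketch, with `malloc`/`free` standing in for the DMA allocation and all names hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_snd_queue {
	void *tso_hdrs; /* non-NULL means the buffer is currently allocated */
};

static int demo_free_calls; /* counts real releases, for demonstration */

static int demo_alloc_snd_queue(struct demo_snd_queue *sq, size_t size)
{
	sq->tso_hdrs = malloc(size);
	return sq->tso_hdrs ? 0 : -1;
}

static void demo_free_snd_queue(struct demo_snd_queue *sq)
{
	if (!sq->tso_hdrs)
		return; /* nothing allocated, nothing to release */
	free(sq->tso_hdrs);
	demo_free_calls++;
	sq->tso_hdrs = NULL; /* without this, a second call frees again */
}
```

Running the free routine twice, as an XDP attach followed by a detach and teardown can do, is then harmless because the second call sees NULL and returns early.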
    • Y
      ptp: Fix pass zero to ERR_PTR() in ptp_clock_register · aea0a897
      YueHaibing committed
      Fix smatch warning:
      
      drivers/ptp/ptp_clock.c:298 ptp_clock_register() warn:
       passing zero to 'ERR_PTR'
      
      'err' should be set when device_create_with_groups() or
      pps_register_source() fails.
      
      Fixes: 85a66e55 ("ptp: create "pins" together with the rest of attributes")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
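Why passing zero to ERR_PTR() matters can be seen in a small user-space model of the error-pointer convention. The helpers below imitate the kernel macros but are hypothetical re-implementations, and the registration function, its flags and the -12 error value are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Only the top DEMO_MAX_ERRNO addresses count as errors. */
static bool demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-DEMO_MAX_ERRNO;
}

static long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/*
 * Hypothetical registration routine with one failing step.
 * set_err models the fix: record an error code before bailing out.
 */
static void *demo_register(bool step_ok, bool set_err)
{
	static int clock_obj;
	long err = 0;

	if (!step_ok) {
		if (set_err)
			err = -12; /* assumed errno value for the demo */
		return demo_err_ptr(err); /* err == 0 yields NULL, not an error */
	}
	return &clock_obj;
}
```

With `err` left at zero the failure path returns NULL, which the error check does not recognize, so a caller relying on that check alone would treat the failed registration as a success.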
    • D
      Merge branch 'switchdev-blocking-notifiers' · 06d21290
      David S. Miller committed
      Petr Machata says:
      
      ====================
      switchdev: Convert switchdev_port_obj_{add,del}() to notifiers
      
      An offloading driver may need to have access to switchdev events on
      ports that aren't directly under its control. An example is a VXLAN port
      attached to a bridge offloaded by a driver. The driver needs to know
      about VLANs configured on the VXLAN device. However the VXLAN device
      isn't stashed between the bridge and a front-panel-port device (such as
      is the case e.g. for LAG devices), so the usual switchdev ops don't
      reach the driver.
      
      VXLAN is likely not the only device type like this: in theory any L2
      tunnel device that needs offloading will prompt requirement of this
      sort.
      
      A way to fix this is to give up the notion of port object addition /
      deletion as a switchdev operation, which assumes somewhat tight coupling
      between the message producer and consumer. And instead send the message
      over a notifier chain.
      
      The series starts with a clean-up patch #1, where
      SWITCHDEV_OBJ_PORT_{VLAN, MDB}() are fixed up to lift the constraint
      that the passed-in argument be a simple variable named "obj".
      
      switchdev_port_obj_add and _del are invoked in a context that permits
      blocking. Not only that, at least for the VLAN notification, being able
      to signal failure is actually important. Therefore introduce a new
      blocking notifier chain that the new events will be sent on. That's done
      in patch #2. Retain the current (atomic) notifier chain for the
      preexisting notifications.
      
      In patch #3, introduce two new switchdev notifier types,
      SWITCHDEV_PORT_OBJ_ADD and SWITCHDEV_PORT_OBJ_DEL. These notifier types
      communicate the same event as the corresponding switchdev op, except in
      a form of a notification. struct switchdev_notifier_port_obj_info was
      added to carry the fields that correspond to the switchdev op arguments.
      An additional field, handled, will be used to communicate back to
      switchdev that the event has reached an interested party, which will be
      important for the two-phase commit.
      
      In patches #4, #5, and #7, rocker, DSA resp. ethsw are updated to
      subscribe to the switchdev blocking notifier chain, and handle the new
      notifier types. #6 introduces a helper to determine whether a
      netdevice corresponds to a front panel port.
      
      What these three drivers have in common is that their ports don't
      support any uppers besides bridge. That makes it possible to ignore any
      notifiers that don't reference a front-panel port device, because they
      are certainly out of scope.
      
      Unlike the previous three, mlxsw and ocelot drivers admit stacked
      devices as uppers. While the current switchdev code recursively descends
      through layers of lower devices, eventually calling the op on a
      front-panel port device, the notifier would reference a stacking device
      that's one of front-panel ports uppers. The filtering is thus more
      complex.
      
      For ocelot, such iteration is currently pretty much required, because
      there's no bookkeeping of LAG devices. mlxsw does keep the list of LAGs,
      however it iterates the lower devices anyway when deciding whether an
      event on a tunnel device pertains to the driver or not.
      
      Therefore this patch set instead introduces, in patch #8, a helper to
      iterate through lowers, much like the current switchdev code does,
      looking for devices that match a given predicate.
      
      Then in patches #9 and #10, first mlxsw and then ocelot are updated to
      dispatch the newly-added notifier types to the preexisting
      port_obj_add/_del handlers. The dispatch is done via the new helper, to
      recursively descend through lower devices.
      
      Finally in patch #11, the actual switch is made, retiring the current
      SDO-based code in favor of a notifier.
      
      Now that the event is distributed through a notifier, the explicit
      netdevice check in rocker, DSA and ethsw doesn't let through any events
      except those done on a front-panel port itself. It is therefore
      unnecessary to check in VLAN-handling code whether a VLAN was added to
      the bridge itself: such events will simply be ignored much sooner.
      Therefore remove it in patch #12.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
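The notifier-based dispatch described in the switchdev cover letter above can be approximated in user space as a list of callbacks plus an info structure carrying a `handled` flag. Everything here is a simplified model: the fixed-size chain, the error values, the device-name matching and all `demo_` identifiers are assumptions, not the switchdev API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_EINVAL     (-22)
#define DEMO_EOPNOTSUPP (-95)
#define DEMO_MAX_NOTIFIERS 8

enum demo_event {
	DEMO_PORT_OBJ_ADD,
	DEMO_PORT_OBJ_DEL
};

/* Carries the op arguments plus a flag set by whoever acts on the event. */
struct demo_port_obj_info {
	const char *dev_name;
	int vid;
	bool handled;
};

typedef int (*demo_notifier_fn)(enum demo_event ev,
				struct demo_port_obj_info *info);

static demo_notifier_fn demo_chain[DEMO_MAX_NOTIFIERS];
static size_t demo_chain_len;

static int demo_register_notifier(demo_notifier_fn fn)
{
	if (demo_chain_len == DEMO_MAX_NOTIFIERS)
		return -1;
	demo_chain[demo_chain_len++] = fn;
	return 0;
}

/* Call every subscriber; a failure aborts, and "nobody cared" is reported. */
static int demo_port_obj_notify(enum demo_event ev,
				struct demo_port_obj_info *info)
{
	info->handled = false;
	for (size_t i = 0; i < demo_chain_len; i++) {
		int err = demo_chain[i](ev, info);

		if (err)
			return err;
	}
	return info->handled ? 0 : DEMO_EOPNOTSUPP;
}

/* Example subscriber: a driver interested only in events on "vxlan0". */
static int demo_driver_event(enum demo_event ev,
			     struct demo_port_obj_info *info)
{
	if (strcmp(info->dev_name, "vxlan0") != 0)
		return 0; /* not our device: ignore without claiming it */
	if (ev == DEMO_PORT_OBJ_ADD && info->vid > 4094)
		return DEMO_EINVAL; /* failure is signalled back to the sender */
	info->handled = true;
	return 0;
}
```

The producer no longer needs to know which driver sits below a device: each subscriber filters events itself, and the `handled` flag tells the sender whether any of them acted on the event.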
    • P
      rocker, dsa, ethsw: Don't filter VLAN events on bridge itself · ab4a1686
      Petr Machata committed
      Due to an explicit check in rocker_world_port_obj_vlan_add(),
      dsa_slave_switchdev_event() resp. port_switchdev_event(), VLAN objects
      that are added to a device that is not a front-panel port device are
      ignored. Therefore this check is immaterial.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      switchdev: Replace port obj add/del SDO with a notification · d17d9f5e
      Petr Machata committed
      Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of
      this field from all clients, which were migrated to use switchdev
      notification in the previous patches.
      
      Add a new function switchdev_port_obj_notify() that sends the switchdev
      notifications SWITCHDEV_PORT_OBJ_ADD and _DEL.
      
      Update switchdev_port_obj_del_now() to dispatch to this new function.
      Drop __switchdev_port_obj_add() and update switchdev_port_obj_add()
      likewise.
      Signed-off-by: Petr Machata <petrm@mellanox.com>
      Reviewed-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>