1. 16 Sep, 2019 7 commits
    • bpf: fix accessing bpf_sysctl.file_pos on s390 · d895a0f1
      Ilya Leoshkevich committed
      "ctx:file_pos sysctl:read write ok" fails on s390 with "Read value  !=
      nux". This is because verifier rewrites a complete 32-bit
      bpf_sysctl.file_pos update to a partial update of the first 32 bits of
      64-bit *bpf_sysctl_kern.ppos, which is not correct on big-endian
      systems.
      
      Fix by using an offset on big-endian systems.
      
      Ditto for bpf_sysctl.file_pos reads. Currently the test does not detect
      a problem there, since it expects to see 0, which it gets with high
      probability in error cases, so change it to seek to offset 3 and expect
      3 in bpf_sysctl.file_pos.
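
The offset issue described above can be sketched in plain C. This is only an illustration and not the verifier code: the helper names are hypothetical, and the byte order is passed as a parameter so that both layouts can be exercised on any host. It also shows why the old test passed by accident: on a big-endian layout, offset 0 holds the high half, which is 0 for small positions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of the low-order `narrow` bytes inside a `wide`-byte integer.
 * On little-endian the low bytes come first, on big-endian they come last. */
static size_t narrow_access_offset(size_t wide, size_t narrow, int big_endian)
{
	assert(narrow <= wide);
	return big_endian ? wide - narrow : 0;
}

/* Serialize a 64-bit value in the requested byte order. */
static void store_u64(uint8_t out[8], uint64_t v, int big_endian)
{
	for (int i = 0; i < 8; i++) {
		int shift = big_endian ? 8 * (7 - i) : 8 * i;

		out[i] = (uint8_t)(v >> shift);
	}
}

/* Read a 32-bit field at byte offset `off`, using the same byte order. */
static uint32_t load_u32_at(const uint8_t buf[8], size_t off, int big_endian)
{
	uint32_t v = 0;

	for (int i = 0; i < 4; i++) {
		int shift = big_endian ? 8 * (3 - i) : 8 * i;

		v |= (uint32_t)buf[off + i] << shift;
	}
	return v;
}
```

With a 64-bit position of 3 stored big-endian, reading at offset 0 yields 0, while reading at the adjusted offset yields 3.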
      
      Fixes: e1550bfe ("bpf: Add file_pos field to bpf_sysctl ctx")
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Acked-by: Yonghong Song <yhs@fb.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20190816105300.49035-1-iii@linux.ibm.com/
    • xdp: Fix race in dev_map_hash_update_elem() when replacing element · af58e7ee
      Toke Høiland-Jørgensen committed
      syzbot found a crash in dev_map_hash_update_elem(), when replacing an
      element with a new one. Jesper correctly identified the cause of the crash
      as a race condition between the initial lookup in the map (which is done
      before taking the lock), and the removal of the old element.
      
      Rather than just add a second lookup into the hashmap after taking the
      lock, fix this by reworking the function logic to take the lock before the
      initial lookup.
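
A minimal user-space sketch of the "lock first, then look up" ordering, assuming a pthread mutex in place of the kernel spinlock and a hypothetical fixed-size table in place of the devmap hash. It is not the kernel function, only the shape of the fix: the lookup of the old element and its replacement form one critical section, so no other thread can remove the old element in between.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define SLOTS 8	/* hypothetical table size */

struct dev_table {
	pthread_mutex_t lock;
	int *slot[SLOTS];	/* NULL means "no element" */
};

/* Replace the element stored for `key` and return the previous one so the
 * caller can release it after the lock is dropped. */
static int *table_replace(struct dev_table *t, unsigned int key, int *new_elem)
{
	int *old;

	assert(t != NULL);
	pthread_mutex_lock(&t->lock);
	old = t->slot[key % SLOTS];	/* lookup happens under the lock */
	t->slot[key % SLOTS] = new_elem;
	pthread_mutex_unlock(&t->lock);
	return old;
}
```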
      
      Fixes: 6f9d451a ("xdp: Add devmap_hash map type for looking up devices by hashed index")
      Reported-and-tested-by: syzbot+4e7a85b1432052e8d6f8@syzkaller.appspotmail.com
      Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • Merge branch 'bpf-af-xdp-unaligned-fixes' · a4fa6e16
      Daniel Borkmann committed
      Ciara Loftus says:
      
      ====================
      This patch set contains some fixes for AF_XDP zero copy in the i40e and
      ixgbe drivers as well as a fix for the 'xdpsock' sample application when
      running in unaligned mode.
      
      Patches 1 and 2 fix a regression for the i40e and ixgbe drivers which
      caused the umem headroom to be added to the xdp handle twice, resulting in
      an incorrect value being received by the user for the case where the umem
      headroom is non-zero.
      
      Patch 3 fixes an issue with the xdpsock sample application whereby the
      start of the tx packet data (offset) was not being set correctly when the
      application was being run in unaligned mode.
      
      This patch set has been applied against commit a2c11b03 ("kcm: use
      BPF_PROG_RUN")
      ====================
      Acked-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • samples/bpf: fix xdpsock l2fwd tx for unaligned mode · 5a712e13
      Ciara Loftus committed
      Preserve the offset of the address of the received descriptor, and include
      it in the address set for the tx descriptor, so the kernel can correctly
      locate the start of the packet data.
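
The effect of dropping versus preserving the offset can be sketched as follows. The layout is an assumption made for illustration (low 48 bits hold the chunk base address, high 16 bits hold the offset into the chunk), and all names here are hypothetical stand-ins rather than the sample's actual helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed unaligned-mode descriptor layout: base address in the low 48 bits,
 * offset into the chunk in the high 16 bits. */
#define BUF_OFFSET_SHIFT 48
#define BUF_ADDR_MASK ((1ULL << BUF_OFFSET_SHIFT) - 1)

static uint64_t desc_base(uint64_t addr)
{
	return addr & BUF_ADDR_MASK;
}

static uint64_t desc_offset(uint64_t addr)
{
	return addr >> BUF_OFFSET_SHIFT;
}

/* Where the packet data starts inside the umem for a given descriptor. */
static uint64_t packet_start(uint64_t addr)
{
	return desc_base(addr) + desc_offset(addr);
}

/* Forwarding that keeps only the base: the offset is lost on tx. */
static uint64_t tx_addr_dropping_offset(uint64_t rx_addr)
{
	return desc_base(rx_addr);
}

/* Forwarding that carries the rx offset over into the tx address. */
static uint64_t tx_addr_preserving_offset(uint64_t rx_addr)
{
	return desc_base(rx_addr) | (desc_offset(rx_addr) << BUF_OFFSET_SHIFT);
}
```

For an rx descriptor with base 0x1000 and offset 0x100, only the preserving variant makes the tx side locate the data at 0x1100.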
      
      Fixes: 03895e63 ("samples/bpf: add buffer recycling for unaligned chunks to xdpsock")
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • ixgbe: fix xdp handle calculations · 2e78fc62
      Ciara Loftus committed
      Commit 7cbbf9f1 ("ixgbe: fix xdp handle calculations") reintroduced
      the addition of the umem headroom to the xdp handle in the ixgbe_zca_free,
      ixgbe_alloc_buffer_slow_zc and ixgbe_alloc_buffer_zc functions. However,
      the headroom is already added to the handle in the function
      ixgbe_run_xdp_zc. This commit removes the latter addition and fixes the
      case where the headroom is non-zero.
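
The arithmetic behind the double addition is simple enough to sketch. The function names and the two-stage split are hypothetical and do not mirror the driver code; the point is only that a handle built from a chunk address must include the headroom exactly once.

```c
#include <assert.h>
#include <stdint.h>

/* Headroom added in one place only: the handle points at chunk + headroom. */
static uint64_t handle_added_once(uint64_t chunk_addr, uint64_t headroom)
{
	return chunk_addr + headroom;
}

/* Headroom added in two places: first when the buffer is set up, then again
 * on the receive path. The result is off by exactly `headroom`. */
static uint64_t handle_added_twice(uint64_t chunk_addr, uint64_t headroom)
{
	uint64_t handle = chunk_addr + headroom;	/* first addition */

	return handle + headroom;			/* second addition */
}
```

With a zero headroom both variants agree, which is why the problem only shows up when the headroom is non-zero.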
      
      Fixes: 7cbbf9f1 ("ixgbe: fix xdp handle calculations")
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • i40e: fix xdp handle calculations · 168dfc3a
      Ciara Loftus committed
      Commit 4c5d9a7f ("i40e: fix xdp handle calculations") reintroduced
      the addition of the umem headroom to the xdp handle in the i40e_zca_free,
      i40e_alloc_buffer_slow_zc and i40e_alloc_buffer_zc functions. However,
      the headroom is already added to the handle in the function i40e_run_xdp_zc.
      This commit removes the latter addition and fixes the case where the
      headroom is non-zero.
      
      Fixes: 4c5d9a7f ("i40e: fix xdp handle calculations")
      Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • selftests/bpf: add bpf-gcc support · 4ce150b6
      Ilya Leoshkevich committed
      Now that binutils and gcc support for BPF is upstream, make use of it in
      BPF selftests using alu32-like approach. Share as much as possible of
      CFLAGS calculation with clang.
      
      Fixes only obvious issues, leaving more complex ones for later:
      - Use gcc-provided bpf-helpers.h instead of manually defining the
        helpers, change bpf_helpers.h include guard to avoid conflict.
      - Include <linux/stddef.h> for __always_inline.
      - Add $(OUTPUT)/../usr/include to include path in order to use local
        kernel headers instead of system kernel headers when building with O=.
      
      In order to activate the bpf-gcc support, one needs to configure
      binutils and gcc with --target=bpf and make them available in $PATH. In
      particular, gcc must be installed as `bpf-gcc`, which is the default.
      
      Right now with binutils 25a2915e8dba and gcc r275589 only a handful of
      tests work:
      
      	# ./test_progs_bpf_gcc
      	# Summary: 7/39 PASSED, 1 SKIPPED, 98 FAILED
      
      The reasons for those failures are as follows:
      
      - Build errors:
        - `error: too many function arguments for eBPF` for __always_inline
          functions read_str_var and read_map_var - must be inlining issue,
          and for process_l3_headers_v6, which relies on optimizing away
          function arguments.
        - `error: indirect call in function, which are not supported by eBPF`
          where there are no obvious indirect calls in the source code, e.g.
          in __encap_ipip_none.
        - `error: field 'lock' has incomplete type` for fields of `struct
          bpf_spin_lock` type - bpf_spin_lock is re#defined by bpf-helpers.h,
          so its usage is sensitive to order of #includes.
        - `error: eBPF stack limit exceeded` in sysctl_tcp_mem.
      - Load errors:
        - Missing object files due to above build errors.
        - `libbpf: failed to create map (name: 'test_ver.bss')`.
        - `libbpf: object file doesn't contain bpf program`.
        - `libbpf: Program '.text' contains unrecognized relo data pointing to
          section 0`.
        - `libbpf: BTF is required, but is missing or corrupted` - no BTF
          support in gcc yet.
      Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
      Cc: Jose E. Marchesi <jose.marchesi@oracle.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
  2. 07 Sep, 2019 9 commits
  3. 06 Sep, 2019 20 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next · 1e46c09e
      David S. Miller committed
      Daniel Borkmann says:
      
      ====================
      The following pull-request contains BPF updates for your *net-next* tree.
      
      The main changes are:
      
      1) Add the ability to use unaligned chunks in the AF_XDP umem. By
         relaxing where the chunks can be placed, it allows to use an
         arbitrary buffer size and place whenever there is a free
         address in the umem. Helps more seamless DPDK AF_XDP driver
         integration. Support for i40e, ixgbe and mlx5e, from Kevin and
         Maxim.
      
      2) Addition of a wakeup flag for AF_XDP tx and fill rings so the
         application can wake up the kernel for rx/tx processing which
         avoids busy-spinning of the latter, useful when app and driver
         are located on the same core. Support for i40e, ixgbe and mlx5e,
         from Magnus and Maxim.
      
      3) bpftool fixes for printf()-like functions so compiler can actually
         enforce checks, bpftool build system improvements for custom output
         directories, and addition of 'bpftool map freeze' command, from Quentin.
      
      4) Support attaching/detaching XDP programs from 'bpftool net' command,
         from Daniel.
      
      5) Automatic xskmap cleanup when AF_XDP socket is released, and several
         barrier/{read,write}_once fixes in AF_XDP code, from Björn.
      
      6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf
         inclusion as well as libbpf versioning improvements, from Andrii.
      
      7) Several new BPF kselftests for verifier precision tracking, from Alexei.
      
      8) Several BPF kselftest fixes wrt endianness to run on s390x, from Ilya.
      
      9) And more BPF kselftest improvements all over the place, from Stanislav.
      
      10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub.
      
      11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan.
      
      12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni.
      
      13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin.
      
      14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar.
      
      15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari,
          Peter, Wei, Yue.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lan743x: remove redundant assignment to variable rx_process_result · f9bcfe21
      Colin Ian King committed
      The variable rx_process_result is being initialized with a value that
      is never read and is being re-assigned immediately afterwards. The
      assignment is redundant, so replace it with the return from function
      lan743x_rx_process_packet.
      
      Addresses-Coverity: ("Unused value")
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ravb-remove-use-of-undocumented-registers' · 5b1ab1ae
      David S. Miller committed
      Simon Horman says:
      
      ====================
      ravb: remove use of undocumented registers
      
      This short series cleans up the RAVB driver a little.
      
      The first patch corrects the spelling of the FBP field of SFO register.
      This register field is unused and should have no run-time effect.
      
      The remaining patches remove the use of undocumented registers
      after some consultation with the internal Renesas BSP team.
      
      Changes in v2:
      * Corrected mangled state of first patch
      * Patches 2/4 and 3/4 split out of a large patch
      * Accumulated acks
      * Tweaked changelog
      * Claimed authorship of all patches
      
      v1 of this series was tested on the following platforms.
      No behaviour change is expected in v2.
      * E3 Ebisu
      * H3 Salvator-XS (ES2.0)
      * M3-W Salvator-XS
      * M3-N Salvator-XS
      * RZ/G1C iW-RainboW-G23S
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ravb: TROCR register is only present on R-Car Gen3 · fd8ab76a
      Simon Horman committed
      Only use the TROCR register on R-Car Gen3 as it is not present on other
      SoCs.
      
      Offsets used for the undocumented registers are considered reserved and
      should not be written to. After some internal investigation with Renesas it
      remains unclear why this driver accesses these fields on R-Car Gen2 but
      regardless of what the historical reasons are the current code is
      considered incorrect.
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ravb: remove undocumented endianness selection · 2d957a7e
      Simon Horman committed
      This patch removes the use of the undocumented BOC bit of the CCC register.
      
      Current documentation for EtherAVB (ravb) describes the offset of what the
      driver uses as the BOC bit as reserved and that only a value of 0 should be
      written. After some internal investigation with Renesas it remains unclear
      why this driver accesses these fields but regardless of what the historical
      reasons are the current code is considered incorrect.
      
      Based on work by Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ravb: remove undocumented counter processing · 009a4703
      Simon Horman committed
      This patch removes the use of the undocumented counter registers
      CDCR, LCCR, CERCR, CEECR.
      
      Offsets used for undocumented registers are considered reserved and
      should not be written to. After some internal investigation with Renesas
      it remains unclear why this driver accesses these fields but regardless of
      what the historical reasons are the current code is considered incorrect.
      
      Based on work by Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ravb: correct typo in FBP field of SFO register · 845e4b80
      Simon Horman committed
      The field name is FBP rather than FPB.
      
      This field is unused and could equally be removed from the driver entirely.
      But there seems no harm in leaving as documentation of the presence of the
      field.
      
      Based on work by Kazuya Mizuguchi <kazuya.mizuguchi.ks@renesas.com>
      Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
      Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'net-hns3-add-some-bugfixes-and-cleanups' · 7250a9d2
      David S. Miller committed
      Huazhong Tan says:
      
      ====================
      net: hns3: add some bugfixes and cleanups
      
      This patch-set includes bugfixes and cleanups for the HNS3
      ethernet controller driver.
      
      [patch 01/07] fixes an error when setting VLAN offload.
      
      [patch 02/07] fixes a double free issue when setting ringparam.
      
      [patch 03/07] fixes a mis-assignment of hdev->reset_level.
      
      [patch 04/07] adds a check for the client's validity.
      
      [patch 05/07] simplifies bool variable assignment.
      
      [patch 06/07] disables loopback when initializing.
      
      [patch 07/07] makes an internal function static.
      
      Change log:
      V1->V2: fixes comment from Sergei Shtylyov.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: make hclge_dbg_get_m7_stats_info static · 91f8ff09
      Guojia Liao committed
      hclge_dbg_get_m7_stats_info is used only in hclge_debugfs.c,
      so it should be declared static.
      Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: disable loopback setting in hclge_mac_init · 1cbc662d
      Yufeng Mo committed
      If the selftest and reset are performed at the same time, the loopback
      setting may be still in the enable state after the reset. As a result,
      packets cannot be sent out.
      
      This patch fixes this issue by disabling loopback in hclge_mac_init.
      Signed-off-by: Yufeng Mo <moyufeng@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: remove explicit conversion to bool · 1483fa49
      Guojia Liao committed
      Relational and logical operators evaluate to bool,
      explicit conversion is overly verbose and unnecessary.
      Signed-off-by: Guojia Liao <liaoguojia@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: add client node validity judgment · b7cf22b7
      Peng Li committed
      The HNS3 driver can only unregister a client that is included in
      hnae3_client_list. This patch adds a validity check for the client node.
      Signed-off-by: Peng Li <lipeng321@huawei.com>
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix mis-assignment to hdev->reset_level in hclge_reset · 525a294e
      Huazhong Tan committed
      Since hclge_get_reset_level may return HNAE3_NONE_RESET,
      hdev->reset_level must not be assigned its return value
      in hclge_reset(); otherwise, the later use of
      hdev->reset_level in hclge_reset_event goes wrong.
      
      Fixes: 012fcb52 ("net: hns3: activate reset timer when calling reset_event")
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix double free bug when setting ringparam · 323a2ac5
      Huazhong Tan committed
      The system panics when the ringparam is changed in the HNS3 driver:
      
      [ 1459.627727] hns3 0000:bd:00.0 eth6: Changing Tx/Rx ring ds from 1024/1024 to 24/24
      [ 1459.635766] hns3 0000:bd:00.0 eth6: link down
      [ 1459.640788] BUG: Bad page state in process ethtool  pfn:203f75c18
      [ 1459.646940] page:ffff7ee4ffd70600 refcount:0 mapcount:0 mapping:ffff993fff40f400 index:0x0 compound_mapcount: 0
      [ 1459.656987] flags: 0x9fffe00000010200(slab|head)
      [ 1459.661591] raw: 9fffe00000010200 dead000000000100 dead000000000122 ffff993fff40f400
      [ 1459.669302] raw: 0000000000000000 0000000080100010 00000000ffffffff 0000000000000000
      [ 1459.677016] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
      [ 1459.683432] bad because of flags: 0x200(slab)
      [ 1459.687775] Modules linked in: ib_ipoib ib_umad rpcrdma ib_iser libiscsi scsi_transport_iscsi hns_roce_hw_v2 crct10dif_ce hns3 ses hclge hnae3 hisi_hpre hisi_zip qm uacce ip_tables x_tables hisi_sas_v3_hw hisi_sas_main libsas scsi_transport_sas
      [ 1459.709329] CPU: 14 PID: 17244 Comm: ethtool Tainted: G           O      5.3.0-rc4-00415-gc86f057 #1
      [ 1459.718419] Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V3.B040.01 07/26/2019
      [ 1459.727248] Call trace:
      [ 1459.729688]  dump_backtrace+0x0/0x150
      [ 1459.733335]  show_stack+0x24/0x30
      [ 1459.736639]  dump_stack+0xa0/0xc4
      [ 1459.739943]  bad_page+0xf0/0x158
      [ 1459.743157]  free_pages_check_bad+0x84/0xa0
      [ 1459.747322]  __free_pages_ok+0x348/0x378
      [ 1459.751228]  page_frag_free+0x80/0x88
      [ 1459.754877]  skb_free_head+0x38/0x48
      [ 1459.758436]  skb_release_data+0x134/0x160
      [ 1459.762427]  skb_release_all+0x30/0x40
      [ 1459.766158]  consume_skb+0x38/0x108
      [ 1459.769633]  __dev_kfree_skb_any+0x58/0x68
      [ 1459.773718]  hns3_fini_ring+0x48/0x58 [hns3]
      [ 1459.777970]  hns3_set_ringparam+0x2a8/0x418 [hns3]
      [ 1459.782741]  dev_ethtool+0x5f4/0x2080
      [ 1459.786390]  dev_ioctl+0x190/0x3d8
      [ 1459.789777]  sock_do_ioctl+0xf8/0x220
      [ 1459.793423]  sock_ioctl+0x3bc/0x490
      [ 1459.796896]  do_vfs_ioctl+0xc4/0x868
      [ 1459.800454]  ksys_ioctl+0x8c/0xa0
      [ 1459.803752]  __arm64_sys_ioctl+0x28/0x38
      [ 1459.807658]  el0_svc_common.constprop.0+0xe0/0x1e0
      [ 1459.812426]  el0_svc_handler+0x34/0x90
      [ 1459.816158]  el0_svc+0x10/0x14
      [ 1459.819220] Disabling lock debugging due to kernel taint
      [ 1459.825182] ------------[ cut here ]------------
      
      Since ndo_stop reclaims the RX skbs allocated by the driver,
      the backed-up ring parameters should not keep this info.
      
      Fixes: a723fb8e ("net: hns3: refine for set ring parameters")
      Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: hns3: fix error VF index when setting VLAN offload · d9c0f275
      Jian Shen committed
      In the original code, the VF index is used incorrectly in the functions
      hclge_set_vlan_tx_offload_cfg() and hclge_set_vlan_rx_offload_cfg().
      When the VF id is greater than 8, for example 9, it sets the
      same bit as VF id 1.
      
      This patch fixes it by using  vport->vport_id % HCLGE_VF_NUM_PER_CMD /
      HCLGE_VF_NUM_PER_BYTE as the array index, instead of vport->vport_id /
      HCLGE_VF_NUM_PER_CMD.
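
The two index expressions can be compared with a small sketch. The constant values (64 VFs per command, 8 VFs per byte) are assumptions chosen for illustration, and the struct and function names are hypothetical. Each VF maps to one bit in a byte array, so the byte index must change every 8 VFs.

```c
#include <assert.h>
#include <stdint.h>

#define VF_NUM_PER_CMD	64	/* assumed value */
#define VF_NUM_PER_BYTE	8	/* assumed value */

struct vf_bit {
	unsigned int byte;	/* index into the per-command bitmap */
	uint8_t mask;		/* bit inside that byte */
};

/* Old expression: the byte index stays 0 for every VF id below 64. */
static struct vf_bit vf_bit_old(unsigned int vport_id)
{
	struct vf_bit b = {
		.byte = vport_id / VF_NUM_PER_CMD,
		.mask = (uint8_t)(1U << (vport_id % VF_NUM_PER_BYTE)),
	};

	return b;
}

/* Fixed expression: the byte index advances every 8 VF ids. */
static struct vf_bit vf_bit_new(unsigned int vport_id)
{
	struct vf_bit b = {
		.byte = vport_id % VF_NUM_PER_CMD / VF_NUM_PER_BYTE,
		.mask = (uint8_t)(1U << (vport_id % VF_NUM_PER_BYTE)),
	};

	return b;
}
```

Under these assumptions the old expression maps VF 9 and VF 1 to byte 0, bit 1, while the fixed one puts VF 9 in byte 1.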
      
      Fixes: 052ece6d ("net: hns3: add ethtool related offload command")
      Signed-off-by: NJian Shen <shenjian15@huawei.com>
      Signed-off-by: NHuazhong Tan <tanhuazhong@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9c0f275
    • A
      stmmac: platform: adjust messages and move to dev level · c3a502de
      Andy Shevchenko 提交于
      This patch amends the error and warning messages across the platform driver.
      It includes the following changes:
       - append \n to the end of messages
       - change pr_* macros to dev_*
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: Do not check Link status when loopback is enabled · fe4a7a41
      Jose Abreu committed
      While running stmmac selftests I found that in my 1G setup some tests
      were failing when running with PHY loopback enabled.

      It looks like when loopback is enabled the PHY will report that Link is
      down even though there is a valid connection.

      As in loopback mode the data will not be sent anywhere we can bypass the
      logic of checking if Link is valid thus saving unnecessary reads.
      Signed-off-by: Jose Abreu <joabreu@synopsys.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net_sched: act_police: add 2 new attributes to support police 64bit rate and peakrate · d1967e49
      David Dai committed
      A high speed adapter like the Mellanox CX-5 card can reach up to
      100 Gbits per second bandwidth. Currently htb already supports 64bit rate
      in tc utility. However police action rate and peakrate are still limited
      to 32bit value (up to 32 Gbits per second). Add 2 new attributes
      TCA_POLICE_RATE64 and TCA_POLICE_PEAKRATE64 in kernel for 64bit support
      so that tc utility can use them for 64bit rate and peakrate value to
      break the 32bit limit, and still keep the backward binary compatibility.
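
A rough sketch of the 32-bit limit, assuming rates are carried in bytes per second. The helper names are hypothetical, and saturating the legacy field to its maximum is an assumption about how a sender might keep old receivers working; it is not taken from this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Convert a rate in Gbit/s to bytes per second. */
static uint64_t gbit_to_bytes_per_sec(uint64_t gbit)
{
	return gbit * 1000000000ULL / 8;
}

/* True when the rate does not fit the legacy 32-bit field, so a separate
 * 64-bit attribute has to carry the real value. */
static bool needs_rate64(uint64_t bytes_per_sec)
{
	return bytes_per_sec > UINT32_MAX;
}

/* Value for the legacy 32-bit field: saturate instead of wrapping around
 * (assumption made for this sketch). */
static uint32_t legacy_rate(uint64_t bytes_per_sec)
{
	return needs_rate64(bytes_per_sec) ? UINT32_MAX : (uint32_t)bytes_per_sec;
}
```

A 10 Gbit/s rate (1.25e9 bytes/s) still fits in 32 bits, while 100 Gbit/s (1.25e10 bytes/s) does not.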
      Tested-by: David Dai <zdai@linux.vnet.ibm.com>
      Signed-off-by: David Dai <zdai@linux.vnet.ibm.com>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: openvswitch: Set OvS recirc_id from tc chain index · 95a7233c
      Paul Blakey committed
      Offloaded OvS datapath rules are translated one to one to tc rules,
      for example the following simplified OvS rule:
      
      recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
      
      Will be translated to the following tc rule:
      
      $ tc filter add dev dev1 ingress \
      	    prio 1 chain 0 proto ip \
      		flower tcp ct_state -trk \
      		action ct pipe \
      		action goto chain 2
      
      Received packets will first travel through tc, and if they aren't stolen
      by it, like in the above rule, they will continue to OvS datapath.
      Since we already did some actions (action ct in this case) which might
      modify the packets, and updated action stats, we would like to continue
      the processing with the correct recirc_id in OvS (here recirc_id(2))
      where we left off.
      
      To support this, introduce a new skb extension for tc, which
      will be used for translating tc chain to ovs recirc_id to
      handle these miss cases. Last tc chain index will be set
      by tc goto chain action and read by OvS datapath.
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • nfp: Drop unnecessary continue in nfp_net_pf_alloc_vnics · 47e25277
      zhong jiang committed
      Continue is not needed at the bottom of a loop.
      Signed-off-by: zhong jiang <zhongjiang@huawei.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 05 Sep, 2019 4 commits
    • Merge branch 'bpf-af-xdp-barrier-fixes' · 593f191a
      Daniel Borkmann committed
      Björn Töpel says:
      
      ====================
      This is a four patch series of various barrier, {READ, WRITE}_ONCE
      cleanups in the AF_XDP socket code. More details can be found in the
      corresponding commit message. Previous revisions: v1 [4] and v2 [5].
      
      For an AF_XDP socket, most control plane operations are done under the
      control mutex (struct xdp_sock, mutex), but there are some places
      where members of the struct are read outside the control mutex. The
      dev, queue_id members are set in bind() and cleared at cleanup. The
      umem, fq, cq, tx, rx, and state members are all assigned in various
      places, e.g. bind() and setsockopt(). When the members are assigned,
      they are protected by the control mutex, but since they are read
      outside the mutex, a WRITE_ONCE is required to avoid store-tearing on
      the read-side.
      
      Before the state variable was introduced by Ilya, the dev member was
      used to determine whether the socket was bound or not. However, when
      dev was read, proper SMP barriers and READ_ONCE were missing. In order
      to address the missing barriers and READ_ONCE, we start using the
      state variable as a point of synchronization. The state member
      read/write is paired with proper SMP barriers, and from this follows
      that the members described above do not need READ_ONCE statements if
      used in conjunction with state check.
      
      To summarize: The struct xdp_sock members dev, queue_id, umem,
      fq, cq, tx, rx, and state were read lock-less, with incorrect barriers
      and missing {READ, WRITE}_ONCE. After this series umem, fq, cq, tx,
      rx, and state are read lock-less. When these members are updated,
      WRITE_ONCE is used. When read, READ_ONCE is only used when read
      outside the control mutex (e.g. mmap) or when not synchronized with the
      state member (XSK_BOUND plus smp_rmb()).
      
      [1] https://lore.kernel.org/bpf/beef16bb-a09b-40f1-7dd0-c323b4b89b17@iogearbox.net/
      [2] https://lwn.net/Articles/793253/
      [3] https://github.com/google/ktsan/wiki/READ_ONCE-and-WRITE_ONCE
      [4] https://lore.kernel.org/bpf/20190822091306.20581-1-bjorn.topel@gmail.com/
      [5] https://lore.kernel.org/bpf/20190826061053.15996-1-bjorn.topel@gmail.com/
      
      v2->v3:
        Minor restructure of commits.
        Improve cover and commit messages. (Daniel)
      v1->v2:
        Removed redundant dev check. (Jonathan)
      ====================
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: lock the control mutex in sock_diag interface · 25dc18ff
      Björn Töpel committed
      When accessing the members of an XDP socket, the control mutex should
      be held. This commit fixes that.
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Fixes: a36b38aa ("xsk: add sock_diag interface for AF_XDP")
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: use state member for socket synchronization · 42fddcc7
      Björn Töpel committed
      Before the state variable was introduced by Ilya, the dev member was
      used to determine whether the socket was bound or not. However, when
      dev was read, proper SMP barriers and READ_ONCE were missing. In order
      to address the missing barriers and READ_ONCE, we start using the
      state variable as a point of synchronization. The state member
      read/write is paired with proper SMP barriers, and from this follows
      that the members described above do not need READ_ONCE if used in
      conjunction with state check.
      
      In all syscalls and the xsk_rcv path we check if state is
      XSK_BOUND. If that is the case we do an SMP read barrier, and this
      implies that the dev, umem and all rings are correctly set up. Note
      that no READ_ONCE is needed for these variables if used when state is
      XSK_BOUND (plus the read barrier).
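
The publish-then-check pattern can be approximated in user space with C11 atomics. This is a sketch under stated assumptions and not the kernel code: a release store stands in for the write barrier plus the state update, an acquire load stands in for the state check plus smp_rmb(), and the struct, enum, and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum sock_state { SOCK_READY, SOCK_BOUND, SOCK_UNBOUND };

struct sock_sketch {
	int *dev;		/* plain members, published through `state` */
	int *umem;
	_Atomic int state;
};

/* Writer: fill in the members first, then publish by storing the state.
 * The release ordering keeps the member stores before the state store. */
static void sock_bind(struct sock_sketch *s, int *dev, int *umem)
{
	s->dev = dev;
	s->umem = umem;
	atomic_store_explicit(&s->state, SOCK_BOUND, memory_order_release);
}

/* Reader: only touch the members after observing SOCK_BOUND. The acquire
 * load guarantees the member stores made before the publish are visible. */
static int *sock_get_umem(struct sock_sketch *s)
{
	if (atomic_load_explicit(&s->state, memory_order_acquire) != SOCK_BOUND)
		return NULL;
	return s->umem;
}
```

Because the members are only read after the state check succeeds, they need no per-access annotation in this sketch, which matches the reasoning in the paragraph above.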
      
      To summarize: The struct xdp_sock members dev, queue_id, umem,
      fq, cq, tx, rx, and state were read lock-less, with incorrect barriers
      and missing {READ, WRITE}_ONCE. Now, umem, fq, cq, tx, rx, and state
      are read lock-less. When these members are updated, WRITE_ONCE is
      used. When read, READ_ONCE is only used when read outside the control
      mutex (e.g. mmap) or when not synchronized with the state member
      (XSK_BOUND plus smp_rmb()).

      Note that dev and queue_id do not need a WRITE_ONCE or READ_ONCE, due
      to the introduced state synchronization (XSK_BOUND plus smp_rmb()).

      Introducing the state check also fixes a race, found by syzkaller, in
      xsk_poll() where umem could be accessed when stale.
      Suggested-by: Hillf Danton <hdanton@sina.com>
      Reported-by: syzbot+c82697e3043781e08802@syzkaller.appspotmail.com
      Fixes: 77cd0d7b ("xsk: add support for need_wakeup flag in AF_XDP rings")
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
    • xsk: avoid store-tearing when assigning umem · 9764f4b3
      Björn Töpel committed
      The umem member of struct xdp_sock is read outside of the control
      mutex, in the mmap implementation, and needs a WRITE_ONCE to avoid
      potential store-tearing.
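
A simplified illustration of what such an annotation does, assuming pointer-sized members only. The macros below are hypothetical stand-ins and not the kernel's WRITE_ONCE/READ_ONCE definitions: the volatile-qualified access asks the compiler to emit the store or load as written instead of splitting, merging, or repeating it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pointer-only stand-ins for the kernel macros. */
#define WRITE_ONCE_PTR(x, val)	(*(void *volatile *)&(x) = (val))
#define READ_ONCE_PTR(x)	(*(void *volatile *)&(x))

struct xsk_sketch {
	void *umem;	/* assigned under a lock, read without it */
};

/* Writer side: the caller is assumed to hold the control lock. */
static void xsk_assign_umem(struct xsk_sketch *xs, void *umem)
{
	WRITE_ONCE_PTR(xs->umem, umem);
}

/* Lock-less reader side, e.g. an mmap-like path. */
static void *xsk_read_umem(struct xsk_sketch *xs)
{
	return READ_ONCE_PTR(xs->umem);
}
```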
      Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
      Fixes: 423f3832 ("xsk: add umem fill queue support and mmap")
      Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>