1. 03 Jun, 2018 25 commits
  2. 02 Jun, 2018 15 commits
    • Revert "vfio/type1: Improve memory pinning process for raw PFN mapping" · 89c29def
      Committed by Alex Williamson
      Bisection by Amadeusz Sławiński implicates this commit leading to bad
      page state issues after VM shutdown, likely due to unbalanced page
      references.  The original commit was intended only as a performance
      improvement, therefore revert for offline rework.
      
      Link: https://lkml.org/lkml/2018/6/2/97
      Fixes: 356e88eb ("vfio/type1: Improve memory pinning process for raw PFN mapping")
      Cc: Jason Cai (Xiang Feng) <jason.cai@linux.alibaba.com>
      Reported-by: Amadeusz Sławiński <amade@asmblr.net>
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      89c29def
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next · 1ffdd8e1
      Committed by David S. Miller
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter/IPVS updates for net-next
      
      The following patchset contains Netfilter/IPVS updates for your net-next
      tree, the most relevant things in this batch are:
      
      1) Compile masquerade infrastructure into NAT module, from Florian Westphal.
         Same thing with the redirection support.
      
      2) Abort transaction if early initialization of the commit phase fails.
         Also from Florian.
      
      3) Get rid of synchronize_rcu() by using rule array in nf_tables, from
         Florian.
      
      4) Abort nf_tables batch if fatal signal is pending, from Florian.
      
      5) Use .call_rcu nfnetlink from nf_tables to make dumps fully lockless.
         From Florian Westphal.
      
      6) Support to match transparent sockets from nf_tables, from Máté Eckl.
      
      7) Audit support for nf_tables, from Phil Sutter.
      
      8) Validate chain dependencies from commit phase, fall back to fine grain
         validation only in case of errors.
      
      9) Attach dst to skbuff from netfilter flowtable packet path, from
         Jason A. Donenfeld.
      
      10) Use artificial maximum attribute cap to remove VLA from nfnetlink.
          Patch from Kees Cook.
      
      11) Add extension to allow forwarding packets through the neighbour layer.
      
      12) Add IPv6 conntrack helper support to IPVS, from Julian Anastasov.
      
      13) Add IPv6 FTP conntrack support to IPVS, from Julian Anastasov.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      1ffdd8e1
    • Merge tag 'mlx5e-updates-2018-06-01' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · f39c6b29
      Committed by David S. Miller
      Saeed Mahameed says:
      
      ====================
      mlx5e-updates-2018-06-01
      
      1) From Tariq, two patches to fix IPoIB issues introduced in
         "net/mlx5e: TX, Use actual WQE size for SQ edge fill"
      
      2) From Eran, Additional improvements to mlx5e statistics reporting
      
      3) From Maor, Increase aRFS flow tables size
      
      4) From Adi, Support MTU change for ethernet representors
      
      5) From Ilan and Adi, Handle QP error events in FPGA
      
      6) From Tariq, the last 10 patches mainly deal with RX buffer scheme improvements for legacy RQ
         to use only order-0 pages and fragmented SKBs for large MTUs.
      
      -  Tariq starts with some refactoring and removing HW LRO support from traditional
         (legacy) RQ, since it complicates the buffer scheme and removing it makes it smoother
         to move to cyclic descriptor buffer for traditional RQ.
      
      - Use cyclic WQ in legacy RQ, which has many benefits and paves the way for fragmented SKBs
        for large MTUs.
      
      - Enhance legacy Receive Queue memory scheme, such that only order-0 pages are used.
        Whenever possible, prefer using a linear SKB, and build it wrapping the WQE buffer.
        Otherwise (for example, jumbo frames on x86), use non-linear SKB, with as many frags
        as needed. In this case, multiple WQE scatter entries are used, up to a maximum of 4
        frags and 10KB of MTU.
      
      - TX statistics access improvements.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f39c6b29
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf · cd075ce4
      Committed by David S. Miller
      Daniel Borkmann says:
      
      ====================
      pull-request: bpf 2018-06-02
      
      The following pull-request contains BPF updates for your *net* tree.
      
      The main changes are:
      
      1) BPF uapi fix in struct bpf_prog_info and struct bpf_map_info in
         order to fix offsets on 32 bit archs.
      
      This will have a minor merge conflict with net-next which has the
      __u32 gpl_compatible:1 bitfield in struct bpf_prog_info at this
      location. Resolution is to use the gpl_compatible member.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cd075ce4
    • bpf: fix uapi hole for 32 bit compat applications · 36f9814a
      Committed by Daniel Borkmann
      In 64 bit, we have a 4 byte hole between ifindex and netns_dev in the
      case of struct bpf_map_info but also struct bpf_prog_info. In net-next
      commit b85fab0e ("bpf: Add gpl_compatible flag to struct bpf_prog_info")
      added a bitfield into it to expose some flags related to programs. Thus,
      add an unnamed __u32 bitfield for both so that the alignment stays the
      same in both 32 and 64 bit cases, and can be naturally extended from
      there as in b85fab0e.
      
      Before:
      
        # file test.o
        test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
        # pahole test.o
        struct bpf_map_info {
      	__u32                      type;                 /*     0     4 */
      	__u32                      id;                   /*     4     4 */
      	__u32                      key_size;             /*     8     4 */
      	__u32                      value_size;           /*    12     4 */
      	__u32                      max_entries;          /*    16     4 */
      	__u32                      map_flags;            /*    20     4 */
      	char                       name[16];             /*    24    16 */
      	__u32                      ifindex;              /*    40     4 */
      	__u64                      netns_dev;            /*    44     8 */
      	__u64                      netns_ino;            /*    52     8 */
      
      	/* size: 64, cachelines: 1, members: 10 */
      	/* padding: 4 */
        };
      
      After (same as on 64 bit):
      
        # file test.o
        test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
        # pahole test.o
        struct bpf_map_info {
      	__u32                      type;                 /*     0     4 */
      	__u32                      id;                   /*     4     4 */
      	__u32                      key_size;             /*     8     4 */
      	__u32                      value_size;           /*    12     4 */
      	__u32                      max_entries;          /*    16     4 */
      	__u32                      map_flags;            /*    20     4 */
      	char                       name[16];             /*    24    16 */
      	__u32                      ifindex;              /*    40     4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	__u64                      netns_dev;            /*    48     8 */
      	__u64                      netns_ino;            /*    56     8 */
      	/* --- cacheline 1 boundary (64 bytes) --- */
      
      	/* size: 64, cachelines: 1, members: 10 */
      	/* sum members: 60, holes: 1, sum holes: 4 */
        };

      Reported-by: Dmitry V. Levin <ldv@altlinux.org>
      Reported-by: Eugene Syromiatnikov <esyr@redhat.com>
      Fixes: 52775b33 ("bpf: offload: report device information about offloaded maps")
      Fixes: 675fc275 ("bpf: offload: report device information for offloaded programs")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      36f9814a
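
The layout difference described in the commit above can be reproduced outside the kernel. The following is a minimal sketch, not the kernel's uapi header: the struct names `map_info_implicit` and `map_info_explicit` are hypothetical, and `uint32_t`/`uint64_t` stand in for `__u32`/`__u64`. It assumes a compiler such as GCC or Clang that accepts `uint32_t` as a bitfield type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the struct before the fix. On ABIs where
 * uint64_t is only 4-byte aligned (e.g. i386), netns_dev lands at
 * offset 44; on 64-bit ABIs the compiler inserts a hole and it lands
 * at offset 48. The layout therefore depends on the target. */
struct map_info_implicit {
	uint32_t type, id, key_size, value_size, max_entries, map_flags;
	char     name[16];
	uint32_t ifindex;
	uint64_t netns_dev;
	uint64_t netns_ino;
};

/* Same members, with the hole made explicit by an unnamed 32-bit
 * bitfield, so that netns_dev sits at offset 48 on both ABIs. */
struct map_info_explicit {
	uint32_t type, id, key_size, value_size, max_entries, map_flags;
	char     name[16];
	uint32_t ifindex;
	uint32_t :32;
	uint64_t netns_dev;
	uint64_t netns_ino;
};

/* Compile-time checks that hold for 32-bit and 64-bit builds alike. */
_Static_assert(offsetof(struct map_info_explicit, netns_dev) == 48,
	       "netns_dev must start at offset 48");
_Static_assert(sizeof(struct map_info_explicit) == 64,
	       "struct must be 64 bytes");

/* Runtime helper to observe the target-dependent offset (44 or 48). */
static inline size_t implicit_netns_dev_offset(void)
{
	return offsetof(struct map_info_implicit, netns_dev);
}
```

Building this once with `-m32` and once with `-m64` should show `implicit_netns_dev_offset()` differing between the two, while the static assertions on the explicit variant pass in both cases.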
    • net/mlx5e: TX, Separate cachelines of xmit and completion stats · f65a59ff
      Committed by Tariq Toukan
      Avoid false sharing of cachelines by separating the cachelines of
      TX stats that are dirtied in xmit flow and in completion flow.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      f65a59ff
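
The idea of keeping counters written by different flows on different cache lines can be sketched in plain C11. This is an illustration only, not the driver's actual stats layout: the struct and field names are hypothetical, and the 64-byte line size is an assumption (the kernel has its own annotations for this, such as `____cacheline_aligned_in_smp`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_SIZE 64 /* assumption: 64-byte cache lines */

/* Counters written from the transmit path (hypothetical fields). */
struct xmit_stats {
	uint64_t packets;
	uint64_t bytes;
};

/* Counters written from the completion path (hypothetical fields). */
struct completion_stats {
	uint64_t cqes;
	uint64_t wake;
};

/* Each group starts on its own cache line, so a CPU updating one
 * group does not invalidate the line holding the other group. */
struct sq_stats {
	_Alignas(CACHELINE_SIZE) struct xmit_stats xmit;
	_Alignas(CACHELINE_SIZE) struct completion_stats cq;
};

/* Returns nonzero if two byte offsets fall in the same cache line. */
static inline int same_cacheline(size_t a, size_t b)
{
	return a / CACHELINE_SIZE == b / CACHELINE_SIZE;
}
```

The cost of this separation is padding: the struct grows to two full cache lines even though the counters themselves occupy 32 bytes.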
    • net/mlx5e: RX, Always prefer Linear SKB configuration · 5ffd8194
      Committed by Tariq Toukan
      Prefer the linear SKB configuration of Legacy RQ over the
      non-linear one of Striding RQ.
      
      This implies that ConnectX-4 LX now uses legacy RQ by default,
      as it does not support the linear configuration of Striding RQ.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      5ffd8194
    • net/mlx5e: RX, Enhance legacy Receive Queue memory scheme · 069d1146
      Committed by Tariq Toukan
      Enhance the memory scheme of the legacy RQ, such that
      only order-0 pages are used.
      
      Whenever possible, prefer using a linear SKB, and build it
      wrapping the WQE buffer.
      
      Otherwise (for example, jumbo frames on x86), use non-linear SKB,
      with as many frags as needed. In this case, multiple WQE
      scatter entries are used, up to a maximum of 4 frags and 10KB of MTU.
      
      This required removing support for HW LRO in legacy RQ, as it would
      need a large number of page allocations and scatter entries per WQE
      on archs with PAGE_SIZE = 4KB, yielding bad performance.
      
      In earlier patches, we guaranteed that all completions are in-order,
      and that we use a cyclic WQ.
      This creates an opportunity for a performance optimization:
      The mapping between a "struct mlx5e_dma_info", and the
      WQEs (struct mlx5e_wqe_frag_info) pointing to it, is constant
      across different cycles of a WQ. This allows initializing
      the mapping at RQ creation time, instead of handling it
      in the datapath.
      
      A struct mlx5e_dma_info that is shared between different WQEs
      is allocated by the first WQE, and freed by the last one.
      This implies an important requirement: WQEs that share the same
      struct mlx5e_dma_info must be posted within the same NAPI.
      Otherwise, upon completion, struct mlx5e_wqe_frag_info would mistakenly
      point to the new struct mlx5e_dma_info, not the one that was posted
      (and the HW wrote to).
      This bulking requirement is actually good also for performance reasons,
      hence we extend the bulk beyond the minimal requirement above.
      
      With this memory scheme, the RQ's memory footprint is reduced by a
      factor of 2 on x86, and by a factor of 32 on PowerPC.
      The same factors apply to the number of pages in a GRO session.
      
      Performance tests:
      ConnectX-4, single core, single RX ring, default MTU.
      
      x86:
      CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
      
      Packet rate (early drop in TC): no degradation
      TCP streams: ~5% improvement
      
      PowerPC:
      CPU: POWER8 (raw), altivec supported
      
      Packet rate (early drop in TC): 20% gain
      TCP streams: 25% gain
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      069d1146
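
The constant frag-to-page mapping described in the commit above can be sketched as a one-time initialization loop. This is a simplified illustration under stated assumptions, not the driver code: all names (`rq_sketch`, `rq_init_frags`, `dma_info`, `wqe_frag_info`) and the constants are hypothetical, each WQE is assumed to have a single frag, and two frags are assumed to fit in one 4KB page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SZ     4096u /* assumption: 4KB pages */
#define FRAG_STRIDE 2048u /* assumption: two frags share one page */
#define WQ_SIZE     8u    /* assumption: tiny queue for illustration */
#define NUM_PAGES   (WQ_SIZE * FRAG_STRIDE / PAGE_SZ)

struct dma_info {
	void *page; /* backing page, (re)allocated per cycle */
};

struct wqe_frag_info {
	struct dma_info *di;  /* fixed for the lifetime of the queue */
	unsigned int offset;  /* fixed offset inside the page */
	bool last_in_page;    /* the last user is the one that frees */
};

struct rq_sketch {
	struct dma_info di[NUM_PAGES];
	struct wqe_frag_info frags[WQ_SIZE];
};

/* Build the frag -> dma_info mapping once, at queue creation time.
 * Because the queue is cyclic and completions are in order, the same
 * mapping stays valid on every pass over the queue. */
static void rq_init_frags(struct rq_sketch *rq)
{
	unsigned int page = 0, offset = 0;

	for (unsigned int i = 0; i < WQ_SIZE; i++) {
		struct wqe_frag_info *f = &rq->frags[i];

		f->di = &rq->di[page];
		f->offset = offset;
		offset += FRAG_STRIDE;
		/* No room for another frag: close this page. */
		f->last_in_page = (offset + FRAG_STRIDE > PAGE_SZ);
		if (f->last_in_page) {
			page++;
			offset = 0;
		}
	}
}
```

In this sketch the first frag that points at a page would trigger its allocation and the frag flagged `last_in_page` would release it, which is why frags sharing a page need to be posted together.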
    • net/mlx5e: RX, Use cyclic WQ in legacy RQ · 99cbfa93
      Committed by Tariq Toukan
      Now that LRO is not supported for Legacy RQ, there is no source of
      out-of-order completions in the WQ, and we can use a cyclic one.
      This has multiple advantages:
      - reduces the WQE size (smaller PCI transactions).
      - lower overhead in datapath (no handling of 'next' pointers).
      - no reserved WQE for the WQ head (was needed in the linked list).
      - allows using a constant map between frag and dma_info struct, in downstream patch.
      
      Performance tests:
      ConnectX-4, single core, single RX ring.
      Major gain in packet rate of single ring XDP drop.
      Bottleneck is shifted from HW (at 16Mpps) to SW (at 20Mpps).
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      99cbfa93
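
To illustrate why a cyclic queue needs no 'next' pointers and no reserved head entry, here is a minimal sketch of a power-of-two ring indexed by free-running counters. It is a generic illustration, not the mlx5 work queue implementation: the names (`cyc_wq`, `cyc_wq_post`, `cyc_wq_complete`) and the queue size are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WQ_LOG_SIZE 3
#define WQ_SIZE     (1u << WQ_LOG_SIZE) /* must be a power of two */

struct cyc_wq {
	uint32_t entries[WQ_SIZE];
	uint16_t pc; /* producer counter, free-running */
	uint16_t cc; /* consumer counter, free-running */
};

/* Map a free-running counter to a slot; no 'next' pointer is stored. */
static inline uint16_t cyc_wq_idx(uint16_t ctr)
{
	return ctr & (WQ_SIZE - 1);
}

static inline int cyc_wq_is_full(const struct cyc_wq *wq)
{
	/* All WQ_SIZE slots are usable: full and empty are told apart
	 * by the counter difference, not by a reserved entry. */
	return (uint16_t)(wq->pc - wq->cc) == WQ_SIZE;
}

static inline int cyc_wq_is_empty(const struct cyc_wq *wq)
{
	return wq->pc == wq->cc;
}

/* Post one entry; returns 0 on success, -1 if the queue is full. */
static inline int cyc_wq_post(struct cyc_wq *wq, uint32_t val)
{
	if (cyc_wq_is_full(wq))
		return -1;
	wq->entries[cyc_wq_idx(wq->pc)] = val;
	wq->pc++;
	return 0;
}

/* Complete the oldest entry, in order; returns -1 if empty. */
static inline int cyc_wq_complete(struct cyc_wq *wq, uint32_t *val)
{
	if (cyc_wq_is_empty(wq))
		return -1;
	*val = wq->entries[cyc_wq_idx(wq->cc)];
	wq->cc++;
	return 0;
}
```

This scheme relies on in-order completion, which matches the commit's premise that removing LRO leaves no source of out-of-order completions.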
    • net/mlx5e: RX, Split WQ objects for different RQ types · 422d4c40
      Committed by Tariq Toukan
      Replace the common RQ WQ object with two separate ones for the
      different RQ types.
      This is in preparation for switching to using a cyclic WQ type
      in Legacy RQ.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      422d4c40
    • net/mlx5e: RX, Remove HW LRO support in legacy RQ · 6c3a823e
      Committed by Tariq Toukan
      Current LRO implementation in Legacy RQ uses high-order pages.
      In downstream patches of this series we complete the transition
      to using only order-0 pages in RX datapath (which was already done
      in Striding RQ).
      
      Unlike the more advanced Striding RQ, Legacy RQ does not reuse
      any non-consumed buffers of non-full LRO sessions, and combining
      it with order-0 pages has many performance drawbacks.
      
      Hence, here we totally remove LRO support in Legacy RQ.
      This guarantees having no out-of-order completions, which allows using
      a cyclic work queue (instead of a linked-list) in a downstream patch.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      6c3a823e
    • net/mlx5e: RX, Dedicate a function for copying SKB header · 386471f1
      Committed by Tariq Toukan
      Move the logic of copying the packet header into the SKB linear part
      into a generic function. The function handles copy length alignment
      and DMA buffer sync.
      
      It is currently called only within the MPWQE flow.
      In a downstream patch, it will be called within the legacy RQ flow
      as well.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      386471f1
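
The "copy length alignment" mentioned above can be sketched as rounding the header length up to a word multiple before copying, so the copy runs over whole words. This is a user-space illustration only: `copy_header` and `ALIGN_UP` are hypothetical names, the DMA buffer sync step is omitted, and the caller is assumed to guarantee that both buffers are large enough for the rounded-up length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round x up to the next multiple of a (a must be a power of two). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Copy a packet header of 'headlen' bytes from src to dst, rounding
 * the copied length up to a multiple of sizeof(long).
 * Assumption: both buffers hold at least the rounded-up length.
 * Returns the number of bytes actually copied. */
static inline size_t copy_header(void *dst, const void *src, size_t headlen)
{
	size_t len = ALIGN_UP(headlen, sizeof(long));

	memcpy(dst, src, len);
	return len;
}
```

The extra bytes copied past `headlen` are harmless only because the destination has room for them and they are not treated as header data afterwards.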
    • net/mlx5e: RX, Generalise function of SKB frag addition · fa698366
      Committed by Tariq Toukan
      Rename it and pass truesize as an extra argument, as it will be used also
      in Legacy RQ in a downstream patch.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      fa698366
    • net/mlx5e: RX, Generalise name of non-linear SKB head size · 75aa889f
      Committed by Tariq Toukan
      Make name more generic by dropping MPWRQ from it, as it will be
      used also in Legacy RQ in a downstream patch.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      75aa889f
    • net/mlx5e: TX, Obsolete maintaining local copies of skb->len/data · 5e7d77a9
      Committed by Tariq Toukan
      Instead of maintaining a local copy of skb->len/data and updating
      it upon every copy to the WQE inline part, just calculate it once
      when needed, using the ihs.
      
      This obsoletes the function mlx5e_tx_skb_pull_inline.
      Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      5e7d77a9
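
The change described above amounts to deriving the remaining data pointer and length from the inline header size (ihs) once, rather than updating local copies after every inline copy. A minimal sketch, assuming a simplified linear packet: `struct pkt`, `struct dma_region`, and `pkt_dma_region` are hypothetical names, not driver types.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for a linear packet buffer (hypothetical). */
struct pkt {
	const unsigned char *data;
	unsigned int len;
};

/* The part of the packet that is not inlined and must be DMA-mapped. */
struct dma_region {
	const unsigned char *addr;
	unsigned int len;
};

/* Compute the region once from ihs (the number of header bytes
 * already copied inline into the WQE), instead of tracking a moving
 * data pointer and a shrinking length during the copy.
 * Assumption: ihs <= p->len. */
static inline struct dma_region pkt_dma_region(const struct pkt *p,
					       unsigned int ihs)
{
	struct dma_region r = { p->data + ihs, p->len - ihs };

	return r;
}
```

Because the packet itself is never modified, the original `data` and `len` remain available to any later step that needs them.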