1. 24 4月, 2020 1 次提交
  2. 23 7月, 2019 1 次提交
  3. 02 4月, 2019 1 次提交
  4. 21 3月, 2019 1 次提交
    • P
      net: remove 'fallback' argument from dev->ndo_select_queue() · a350ecce
      Paolo Abeni 提交于
      After the previous patch, all the callers of ndo_select_queue()
      provide as a 'fallback' argument netdev_pick_tx.
      The only exceptions are nested calls to ndo_select_queue(),
      which pass down the 'fallback' available in the current scope
      - still netdev_pick_tx.
      
      We can drop such argument and replace fallback() invocation with
      netdev_pick_tx(). This avoids an indirect call per xmit packet
      in some scenarios (TCP syn, UDP unconnected, XDP generic, pktgen)
      with device drivers implementing such ndo. It also clean the code
      a bit.
      
      Tested with ixgbe and CONFIG_FCOE=m
      
      With pktgen using queue xmit:
      threads		vanilla 	patched
      		(kpps)		(kpps)
      1		2334		2428
      2		4166		4278
      4		7895		8100
      
       v1 -> v2:
       - rebased after helper's name change
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a350ecce
  5. 18 12月, 2018 1 次提交
  6. 04 11月, 2018 1 次提交
  7. 10 7月, 2018 2 次提交
  8. 30 4月, 2018 1 次提交
  9. 25 10月, 2017 1 次提交
    • M
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns... · 6aa7de05
      Mark Rutland 提交于
      locking/atomics: COCCINELLE/treewide: Convert trivial ACCESS_ONCE() patterns to READ_ONCE()/WRITE_ONCE()
      
      Please do not apply this to mainline directly, instead please re-run the
      coccinelle script shown below and apply its output.
      
      For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
      preference to ACCESS_ONCE(), and new code is expected to use one of the
      former. So far, there's been no reason to change most existing uses of
      ACCESS_ONCE(), as these aren't harmful, and changing them results in
      churn.
      
      However, for some features, the read/write distinction is critical to
      correct operation. To distinguish these cases, separate read/write
      accessors must be used. This patch migrates (most) remaining
      ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
      coccinelle script:
      
      ----
      // Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
      // WRITE_ONCE()
      
      // $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
      
      virtual patch
      
      @ depends on patch @
      expression E1, E2;
      @@
      
      - ACCESS_ONCE(E1) = E2
      + WRITE_ONCE(E1, E2)
      
      @ depends on patch @
      expression E;
      @@
      
      - ACCESS_ONCE(E)
      + READ_ONCE(E)
      ----
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: davem@davemloft.net
      Cc: linux-arch@vger.kernel.org
      Cc: mpe@ellerman.id.au
      Cc: shuah@kernel.org
      Cc: snitzer@redhat.com
      Cc: thor.thayer@linux.intel.com
      Cc: tj@kernel.org
      Cc: viro@zeniv.linux.org.uk
      Cc: will.deacon@arm.com
      Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      6aa7de05
  10. 12 10月, 2017 3 次提交
  11. 10 10月, 2017 1 次提交
  12. 17 8月, 2017 1 次提交
  13. 24 7月, 2017 1 次提交
  14. 18 7月, 2017 1 次提交
  15. 30 6月, 2017 1 次提交
    • I
      net/mlx4_en: Do not allocate redundant TX queues when TC is disabled · ec327f7a
      Inbar Karmy 提交于
      Currently the number of TX queues that are allocated doesn't depend
      on the number of TCs, the module always loads with max num of UP
      per channel.
      In order to prevent the allocation of unnecessary memory, the
      module will load with minimum number of UPs per channel, and the
      user will be able to control the number of TX queues per channel
      by changing the number of TC to 8 using the tc command.
      The variable num_up will hold the information about the current
      number of UPs.
      Due to the change, needed to remove the lines that set the value of
      UP to be different than zero in the func "mlx4_en_select_queue",
      since now the num of TX queues that are allocated is only one per channel
      in default.
      In order not to force the UP to be zero in case of only one TC, added
      a condition before forcing it in the func "mlx4_en_fill_qp_context".
      
      Tested:
      After the module is loaded with minimum number of UP per channel, to
      increase num of TCs to 8, use:
      tc qdisc add dev ens8 root mqprio num_tc 8
      In order to decrease the number of TCs to minimum number of UP per channel,
      use:
      tc qdisc del dev ens8 root
      Signed-off-by: NInbar Karmy <inbark@mellanox.com>
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Cc: Tarick Bedeir <tarick@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ec327f7a
  16. 16 6月, 2017 7 次提交
  17. 09 5月, 2017 1 次提交
    • M
      treewide: use kv[mz]alloc* rather than opencoded variants · 752ade68
      Michal Hocko 提交于
      There are many code paths opencoding kvmalloc.  Let's use the helper
      instead.  The main difference to kvmalloc is that those users are
      usually not considering all the aspects of the memory allocator.  E.g.
      allocation requests <= 32kB (with 4kB pages) are basically never failing
      and invoke OOM killer to satisfy the allocation.  This sounds too
      disruptive for something that has a reasonable fallback - the vmalloc.
      On the other hand those requests might fallback to vmalloc even when the
      memory allocator would succeed after several more reclaim/compaction
      attempts previously.  There is no guarantee something like that happens
      though.
      
      This patch converts many of those places to kv[mz]alloc* helpers because
      they are more conservative.
      
      Link: http://lkml.kernel.org/r/20170306103327.2766-2-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> # Xen bits
      Acked-by: NKees Cook <keescook@chromium.org>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Acked-by: Andreas Dilger <andreas.dilger@intel.com> # Lustre
      Acked-by: Christian Borntraeger <borntraeger@de.ibm.com> # KVM/s390
      Acked-by: Dan Williams <dan.j.williams@intel.com> # nvdim
      Acked-by: David Sterba <dsterba@suse.com> # btrfs
      Acked-by: Ilya Dryomov <idryomov@gmail.com> # Ceph
      Acked-by: Tariq Toukan <tariqt@mellanox.com> # mlx4
      Acked-by: Leon Romanovsky <leonro@mellanox.com> # mlx5
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Anton Vorontsov <anton@enomsg.org>
      Cc: Colin Cross <ccross@android.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Kent Overstreet <kent.overstreet@gmail.com>
      Cc: Santosh Raspatur <santosh@chelsio.com>
      Cc: Hariprasad S <hariprasad@chelsio.com>
      Cc: Yishai Hadas <yishaih@mellanox.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: "Yan, Zheng" <zyan@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      752ade68
  18. 07 4月, 2017 1 次提交
  19. 10 3月, 2017 2 次提交
  20. 31 1月, 2017 1 次提交
  21. 09 12月, 2016 1 次提交
  22. 25 11月, 2016 1 次提交
    • E
      mlx4: reorganize struct mlx4_en_tx_ring · e3f42f84
      Eric Dumazet 提交于
      Goal is to reorganize this critical structure to increase performance.
      
      ndo_start_xmit() should only dirty one cache line, and access as few
      cache lines as possible.
      
      Add sp_ (Slow Path) prefix to fields that are not used in fast path,
      to make clear what is going on.
      
      After this patch pahole reports something much better, as all
      ndo_start_xmit() needed fields are packed into two cache lines instead
      of seven or eight
      
      struct mlx4_en_tx_ring {
      	u32                        last_nr_txbb;         /*     0   0x4 */
      	u32                        cons;                 /*   0x4   0x4 */
      	long unsigned int          wake_queue;           /*   0x8   0x8 */
      	struct netdev_queue *      tx_queue;             /*  0x10   0x8 */
      	u32                        (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /*  0x18   0x8 */
      	struct mlx4_en_rx_ring *   recycle_ring;         /*  0x20   0x8 */
      
      	/* XXX 24 bytes hole, try to pack */
      
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	u32                        prod;                 /*  0x40   0x4 */
      	unsigned int               tx_dropped;           /*  0x44   0x4 */
      	long unsigned int          bytes;                /*  0x48   0x8 */
      	long unsigned int          packets;              /*  0x50   0x8 */
      	long unsigned int          tx_csum;              /*  0x58   0x8 */
      	long unsigned int          tso_packets;          /*  0x60   0x8 */
      	long unsigned int          xmit_more;            /*  0x68   0x8 */
      	struct mlx4_bf             bf;                   /*  0x70  0x18 */
      	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
      	__be32                     doorbell_qpn;         /*  0x88   0x4 */
      	__be32                     mr_key;               /*  0x8c   0x4 */
      	u32                        size;                 /*  0x90   0x4 */
      	u32                        size_mask;            /*  0x94   0x4 */
      	u32                        full_size;            /*  0x98   0x4 */
      	u32                        buf_size;             /*  0x9c   0x4 */
      	void *                     buf;                  /*  0xa0   0x8 */
      	struct mlx4_en_tx_info *   tx_info;              /*  0xa8   0x8 */
      	int                        qpn;                  /*  0xb0   0x4 */
      	u8                         queue_index;          /*  0xb4   0x1 */
      	bool                       bf_enabled;           /*  0xb5   0x1 */
      	bool                       bf_alloced;           /*  0xb6   0x1 */
      	u8                         hwtstamp_tx_type;     /*  0xb7   0x1 */
      	u8 *                       bounce_buf;           /*  0xb8   0x8 */
      	/* --- cacheline 3 boundary (192 bytes) --- */
      	long unsigned int          queue_stopped;        /*  0xc0   0x8 */
      	struct mlx4_hwq_resources  sp_wqres;             /*  0xc8  0x58 */
      	/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
      	struct mlx4_qp             sp_qp;                /* 0x120  0x30 */
      	/* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */
      	struct mlx4_qp_context     sp_context;           /* 0x150  0xf8 */
      	/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
      	cpumask_t                  sp_affinity_mask;     /* 0x248  0x20 */
      	enum mlx4_qp_state         sp_qp_state;          /* 0x268   0x4 */
      	u16                        sp_stride;            /* 0x26c   0x2 */
      	u16                        sp_cqn;               /* 0x26e   0x2 */
      
      	/* size: 640, cachelines: 10, members: 36 */
      	/* sum members: 600, holes: 1, sum holes: 24 */
      	/* padding: 16 */
      };
      
      Instead of this silly placement :
      
      struct mlx4_en_tx_ring {
      	u32                        last_nr_txbb;         /*     0   0x4 */
      	u32                        cons;                 /*   0x4   0x4 */
      	long unsigned int          wake_queue;           /*   0x8   0x8 */
      
      	/* XXX 48 bytes hole, try to pack */
      
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	u32                        prod;                 /*  0x40   0x4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	long unsigned int          bytes;                /*  0x48   0x8 */
      	long unsigned int          packets;              /*  0x50   0x8 */
      	long unsigned int          tx_csum;              /*  0x58   0x8 */
      	long unsigned int          tso_packets;          /*  0x60   0x8 */
      	long unsigned int          xmit_more;            /*  0x68   0x8 */
      	unsigned int               tx_dropped;           /*  0x70   0x4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct mlx4_bf             bf;                   /*  0x78  0x18 */
      	/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
      	long unsigned int          queue_stopped;        /*  0x90   0x8 */
      	cpumask_t                  affinity_mask;        /*  0x98  0x10 */
      	struct mlx4_qp             qp;                   /*  0xa8  0x30 */
      	/* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
      	struct mlx4_hwq_resources  wqres;                /*  0xd8  0x58 */
      	/* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
      	u32                        size;                 /* 0x130   0x4 */
      	u32                        size_mask;            /* 0x134   0x4 */
      	u16                        stride;               /* 0x138   0x2 */
      
      	/* XXX 2 bytes hole, try to pack */
      
      	u32                        full_size;            /* 0x13c   0x4 */
      	/* --- cacheline 5 boundary (320 bytes) --- */
      	u16                        cqn;                  /* 0x140   0x2 */
      
      	/* XXX 2 bytes hole, try to pack */
      
      	u32                        buf_size;             /* 0x144   0x4 */
      	__be32                     doorbell_qpn;         /* 0x148   0x4 */
      	__be32                     mr_key;               /* 0x14c   0x4 */
      	void *                     buf;                  /* 0x150   0x8 */
      	struct mlx4_en_tx_info *   tx_info;              /* 0x158   0x8 */
      	struct mlx4_en_rx_ring *   recycle_ring;         /* 0x160   0x8 */
      	u32                        (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168   0x8 */
      	u8 *                       bounce_buf;           /* 0x170   0x8 */
      	struct mlx4_qp_context     context;              /* 0x178  0xf8 */
      	/* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */
      	int                        qpn;                  /* 0x270   0x4 */
      	enum mlx4_qp_state         qp_state;             /* 0x274   0x4 */
      	u8                         queue_index;          /* 0x278   0x1 */
      	bool                       bf_enabled;           /* 0x279   0x1 */
      	bool                       bf_alloced;           /* 0x27a   0x1 */
      
      	/* XXX 5 bytes hole, try to pack */
      
      	/* --- cacheline 10 boundary (640 bytes) --- */
      	struct netdev_queue *      tx_queue;             /* 0x280   0x8 */
      	int                        hwtstamp_tx_type;     /* 0x288   0x4 */
      
      	/* size: 704, cachelines: 11, members: 36 */
      	/* sum members: 587, holes: 6, sum holes: 65 */
      	/* padding: 52 */
      };
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reviewed-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3f42f84
  23. 03 11月, 2016 2 次提交
    • T
      net/mlx4_en: Add ethtool statistics for XDP cases · 15fca2c8
      Tariq Toukan 提交于
      XDP statistics are reported in ethtool, in total and per ring,
      as follows:
      - xdp_drop: the number of packets dropped by xdp.
      - xdp_tx: the number of packets forwarded by xdp.
      - xdp_tx_full: the number of times an xdp forward failed
      	due to a full tx xdp ring.
      
      In addition, all packets that are dropped/forwarded by XDP
      are no longer accounted in rx_packets/rx_bytes of the ring,
      so that they count traffic that is passed to the stack.
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15fca2c8
    • T
      net/mlx4_en: Refactor the XDP forwarding rings scheme · 67f8b1dc
      Tariq Toukan 提交于
      Separately manage the two types of TX rings: regular ones, and XDP.
      Upon an XDP set, do not borrow regular TX rings and convert them
      into XDP ones, but allocate new ones, unless we hit the max number
      of rings.
      Which means that in systems with smaller #cores we will not consume
      the current TX rings for XDP, while we are still in the num TX limit.
      
      XDP TX rings counters are not shown in ethtool statistics.
      Instead, XDP counters will be added to the respective RX rings
      in a downstream patch.
      
      This has no performance implications.
      Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      67f8b1dc
  24. 12 9月, 2016 1 次提交
  25. 20 7月, 2016 2 次提交
    • B
      net/mlx4_en: add xdp forwarding and data write support · 9ecc2d86
      Brenden Blanco 提交于
      A user will now be able to loop packets back out of the same port using
      a bpf program attached to xdp hook. Updates to the packet contents from
      the bpf program is also supported.
      
      For the packet write feature to work, the rx buffers are now mapped as
      bidirectional when the page is allocated. This occurs only when the xdp
      hook is active.
      
      When the program returns a TX action, enqueue the packet directly to a
      dedicated tx ring, so as to avoid completely any locking. This requires
      the tx ring to be allocated 1:1 for each rx ring, as well as the tx
      completion running in the same softirq.
      
      Upon tx completion, this dedicated tx ring recycles pages without
      unmapping directly back to the original rx ring. In steady state tx/drop
      workload, effectively 0 page allocs/frees will occur.
      
      In order to separate out the paths between free and recycle, a
      free_tx_desc func pointer is introduced that is optionally updated
      whenever recycle_ring is activated. By default the original free
      function is always initialized.
      Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ecc2d86
    • B
      net/mlx4_en: break out tx_desc write into separate function · 224e92e0
      Brenden Blanco 提交于
      In preparation for writing the tx descriptor from multiple functions,
      create a helper for both normal and blueflame access.
      Signed-off-by: NBrenden Blanco <bblanco@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      224e92e0
  26. 26 5月, 2016 1 次提交
  27. 06 5月, 2016 1 次提交
    • H
      net/mlx4: Avoid wrong virtual mappings · 73898db0
      Haggai Abramovsky 提交于
      The dma_alloc_coherent() function returns a virtual address which can
      be used for coherent access to the underlying memory.  On some
      architectures, like arm64, undefined behavior results if this memory is
      also accessed via virtual mappings that are not coherent.  Because of
      their undefined nature, operations like virt_to_page() return garbage
      when passed virtual addresses obtained from dma_alloc_coherent().  Any
      subsequent mappings via vmap() of the garbage page values are unusable
      and result in bad things like bus errors (synchronous aborts in ARM64
      speak).
      
      The mlx4 driver contains code that does the equivalent of:
      vmap(virt_to_page(dma_alloc_coherent)), this results in an OOPs when the
      device is opened.
      
      Prevent Ethernet driver to run this problematic code by forcing it to
      allocate contiguous memory. As for the Infiniband driver, at first we
      are trying to allocate contiguous memory, but in case of failure roll
      back to work with fragmented memory.
      Signed-off-by: NHaggai Abramovsky <hagaya@mellanox.com>
      Signed-off-by: NYishai Hadas <yishaih@mellanox.com>
      Reported-by: NDavid Daney <david.daney@cavium.com>
      Tested-by: NSinan Kaya <okaya@codeaurora.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73898db0
  28. 05 5月, 2016 1 次提交
    • A
      net/mlx4_en: Add support for inner IPv6 checksum offloads and TSO · 09067122
      Alexander Duyck 提交于
      >From what I can tell the ConnectX-3 will support an inner IPv6 checksum and
      segmentation offload, however it cannot support outer IPv6 headers.  This
      assumption is based on the fact that I could see the checksum being
      offloaded for inner header on IPv4 tunnels, but not on IPv6 tunnels.
      
      For this reason I am adding the feature to the hw_enc_features and adding
      an extra check to the features_check call that will disable GSO and
      checksum offload in the case that the encapsulated frame has an outer IP
      version of that is not 4.  The check in mlx4_en_features_check could be
      removed if at some point in the future a fix is found that allows the
      hardware to offload segmentation/checksum on tunnels with an outer IPv6
      header.
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09067122