1. 25 Nov 2016, 22 commits
    • A
      samples/bpf: fix bpf loader · db6a71dd
      Committed by Alexei Starovoitov
      llvm can emit relocations into sections other than program code
      (such as debug info sections). Ignore them when parsing the ELF file.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      db6a71dd
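      The fix described here (skip relocations that apply to non-program sections) can be sketched in plain C. The helper and section names below are illustrative stand-ins, not the actual samples/bpf loader symbols:

```c
#include <stdbool.h>
#include <string.h>

/* Hedged sketch of the loader-side filter: while walking the ELF
 * relocation sections, ignore any that apply to non-program sections,
 * such as the DWARF debug info llvm now emits. */
static bool should_process_relocs(const char *target_section)
{
    /* debug info sections carry relocations of their own; skip them */
    if (strncmp(target_section, ".debug", 6) == 0)
        return false;
    /* likewise for other metadata sections (illustrative list) */
    if (strcmp(target_section, ".symtab") == 0 ||
        strcmp(target_section, ".strtab") == 0)
        return false;
    return true;   /* program sections like "socket/0" are processed */
}
```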
    • A
      samples/bpf: fix sockex2 example · d2b024d3
      Committed by Alexei Starovoitov
      Since the llvm commit "Do not expand UNDEF SDNode during insn selection lowering",
      llvm generates code that uses uninitialized registers in cases
      where the C code actually uses uninitialized data,
      so this sockex2 example is technically broken.
      Fix it by fully initializing the on-stack variable.
      Also increase the verifier buffer limit, since the verifier output
      may not fit in 64k for this sockex2 code, depending on the llvm version.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d2b024d3
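      The "fully initialize the on-stack variable" fix boils down to zero-initializing the whole struct before partially filling it. This standalone sketch uses an illustrative struct, not the real sockex2 flow-key type:

```c
#include <string.h>

/* Illustrative stand-in for the flow-key struct sockex2 builds. */
struct flow_key_record {
    unsigned int src, dst;
    unsigned short port16[2];
    unsigned int ip_proto;
};

static void parse_flow(struct flow_key_record *out)
{
    /* Zero-initialize the whole struct first: with newer llvm,
     * reading fields that were never written is genuine undefined
     * behavior, and the BPF verifier sees uninitialized registers. */
    struct flow_key_record flow = {0};

    flow.ip_proto = 6;          /* fill only what this path knows */
    memcpy(out, &flow, sizeof(flow));
}
```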
    • E
      mlx4: reorganize struct mlx4_en_tx_ring · e3f42f84
      Committed by Eric Dumazet
      The goal is to reorganize this critical structure to increase performance.
      
      ndo_start_xmit() should only dirty one cache line, and access as few
      cache lines as possible.
      
      Add an sp_ (Slow Path) prefix to fields that are not used in the fast path,
      to make clear what is going on.
      
      After this patch, pahole reports something much better: all fields
      needed by ndo_start_xmit() are packed into two cache lines instead
      of seven or eight:
      
      struct mlx4_en_tx_ring {
      	u32                        last_nr_txbb;         /*     0   0x4 */
      	u32                        cons;                 /*   0x4   0x4 */
      	long unsigned int          wake_queue;           /*   0x8   0x8 */
      	struct netdev_queue *      tx_queue;             /*  0x10   0x8 */
      	u32                        (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /*  0x18   0x8 */
      	struct mlx4_en_rx_ring *   recycle_ring;         /*  0x20   0x8 */
      
      	/* XXX 24 bytes hole, try to pack */
      
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	u32                        prod;                 /*  0x40   0x4 */
      	unsigned int               tx_dropped;           /*  0x44   0x4 */
      	long unsigned int          bytes;                /*  0x48   0x8 */
      	long unsigned int          packets;              /*  0x50   0x8 */
      	long unsigned int          tx_csum;              /*  0x58   0x8 */
      	long unsigned int          tso_packets;          /*  0x60   0x8 */
      	long unsigned int          xmit_more;            /*  0x68   0x8 */
      	struct mlx4_bf             bf;                   /*  0x70  0x18 */
      	/* --- cacheline 2 boundary (128 bytes) was 8 bytes ago --- */
      	__be32                     doorbell_qpn;         /*  0x88   0x4 */
      	__be32                     mr_key;               /*  0x8c   0x4 */
      	u32                        size;                 /*  0x90   0x4 */
      	u32                        size_mask;            /*  0x94   0x4 */
      	u32                        full_size;            /*  0x98   0x4 */
      	u32                        buf_size;             /*  0x9c   0x4 */
      	void *                     buf;                  /*  0xa0   0x8 */
      	struct mlx4_en_tx_info *   tx_info;              /*  0xa8   0x8 */
      	int                        qpn;                  /*  0xb0   0x4 */
      	u8                         queue_index;          /*  0xb4   0x1 */
      	bool                       bf_enabled;           /*  0xb5   0x1 */
      	bool                       bf_alloced;           /*  0xb6   0x1 */
      	u8                         hwtstamp_tx_type;     /*  0xb7   0x1 */
      	u8 *                       bounce_buf;           /*  0xb8   0x8 */
      	/* --- cacheline 3 boundary (192 bytes) --- */
      	long unsigned int          queue_stopped;        /*  0xc0   0x8 */
      	struct mlx4_hwq_resources  sp_wqres;             /*  0xc8  0x58 */
      	/* --- cacheline 4 boundary (256 bytes) was 32 bytes ago --- */
      	struct mlx4_qp             sp_qp;                /* 0x120  0x30 */
      	/* --- cacheline 5 boundary (320 bytes) was 16 bytes ago --- */
      	struct mlx4_qp_context     sp_context;           /* 0x150  0xf8 */
      	/* --- cacheline 9 boundary (576 bytes) was 8 bytes ago --- */
      	cpumask_t                  sp_affinity_mask;     /* 0x248  0x20 */
      	enum mlx4_qp_state         sp_qp_state;          /* 0x268   0x4 */
      	u16                        sp_stride;            /* 0x26c   0x2 */
      	u16                        sp_cqn;               /* 0x26e   0x2 */
      
      	/* size: 640, cachelines: 10, members: 36 */
      	/* sum members: 600, holes: 1, sum holes: 24 */
      	/* padding: 16 */
      };
      
      Instead of this silly placement:
      
      struct mlx4_en_tx_ring {
      	u32                        last_nr_txbb;         /*     0   0x4 */
      	u32                        cons;                 /*   0x4   0x4 */
      	long unsigned int          wake_queue;           /*   0x8   0x8 */
      
      	/* XXX 48 bytes hole, try to pack */
      
      	/* --- cacheline 1 boundary (64 bytes) --- */
      	u32                        prod;                 /*  0x40   0x4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	long unsigned int          bytes;                /*  0x48   0x8 */
      	long unsigned int          packets;              /*  0x50   0x8 */
      	long unsigned int          tx_csum;              /*  0x58   0x8 */
      	long unsigned int          tso_packets;          /*  0x60   0x8 */
      	long unsigned int          xmit_more;            /*  0x68   0x8 */
      	unsigned int               tx_dropped;           /*  0x70   0x4 */
      
      	/* XXX 4 bytes hole, try to pack */
      
      	struct mlx4_bf             bf;                   /*  0x78  0x18 */
      	/* --- cacheline 2 boundary (128 bytes) was 16 bytes ago --- */
      	long unsigned int          queue_stopped;        /*  0x90   0x8 */
      	cpumask_t                  affinity_mask;        /*  0x98  0x10 */
      	struct mlx4_qp             qp;                   /*  0xa8  0x30 */
      	/* --- cacheline 3 boundary (192 bytes) was 24 bytes ago --- */
      	struct mlx4_hwq_resources  wqres;                /*  0xd8  0x58 */
      	/* --- cacheline 4 boundary (256 bytes) was 48 bytes ago --- */
      	u32                        size;                 /* 0x130   0x4 */
      	u32                        size_mask;            /* 0x134   0x4 */
      	u16                        stride;               /* 0x138   0x2 */
      
      	/* XXX 2 bytes hole, try to pack */
      
      	u32                        full_size;            /* 0x13c   0x4 */
      	/* --- cacheline 5 boundary (320 bytes) --- */
      	u16                        cqn;                  /* 0x140   0x2 */
      
      	/* XXX 2 bytes hole, try to pack */
      
      	u32                        buf_size;             /* 0x144   0x4 */
      	__be32                     doorbell_qpn;         /* 0x148   0x4 */
      	__be32                     mr_key;               /* 0x14c   0x4 */
      	void *                     buf;                  /* 0x150   0x8 */
      	struct mlx4_en_tx_info *   tx_info;              /* 0x158   0x8 */
      	struct mlx4_en_rx_ring *   recycle_ring;         /* 0x160   0x8 */
      	u32                        (*free_tx_desc)(struct mlx4_en_priv *, struct mlx4_en_tx_ring *, int, u8, u64, int); /* 0x168   0x8 */
      	u8 *                       bounce_buf;           /* 0x170   0x8 */
      	struct mlx4_qp_context     context;              /* 0x178  0xf8 */
      	/* --- cacheline 9 boundary (576 bytes) was 48 bytes ago --- */
      	int                        qpn;                  /* 0x270   0x4 */
      	enum mlx4_qp_state         qp_state;             /* 0x274   0x4 */
      	u8                         queue_index;          /* 0x278   0x1 */
      	bool                       bf_enabled;           /* 0x279   0x1 */
      	bool                       bf_alloced;           /* 0x27a   0x1 */
      
      	/* XXX 5 bytes hole, try to pack */
      
      	/* --- cacheline 10 boundary (640 bytes) --- */
      	struct netdev_queue *      tx_queue;             /* 0x280   0x8 */
      	int                        hwtstamp_tx_type;     /* 0x288   0x4 */
      
      	/* size: 704, cachelines: 11, members: 36 */
      	/* sum members: 587, holes: 6, sum holes: 65 */
      	/* padding: 52 */
      };
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e3f42f84
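      The layout principle behind the patch can be checked mechanically with offsetof(): keep every field ndo_start_xmit() touches at the front so they share the first cache lines, and push sp_-prefixed slow-path fields behind them. The struct below is a toy illustration, not the real mlx4_en_tx_ring:

```c
#include <stddef.h>

/* Toy model of the hot/cold split: fast-path fields first, slow-path
 * (sp_) fields last, so the transmit path touches few cache lines. */
struct tx_ring_sketch {
    /* fast path: dirtied or read on every transmitted packet */
    unsigned int  prod;
    unsigned long bytes;
    unsigned long packets;
    void         *buf;

    /* slow path: only touched at setup/teardown time */
    unsigned short sp_stride;
    unsigned short sp_cqn;
};
```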
    • F
      ethtool: Protect {get, set}_phy_tunable with PHY device mutex · 4b65246b
      Committed by Florian Fainelli
      PHY drivers should be able to rely on the caller of {get,set}_tunable
      having acquired the PHY device mutex, both to serialize concurrent
      calls of these functions and to serialize against PHY state machine
      changes. All ethtool PHY-level functions do this except
      {get,set}_tunable, so we make them consistent here as well.
      
      We need to update the Microsemi PHY driver in the same commit to avoid
      introducing either deadlocks or a lack of proper locking.
      
      Fixes: 968ad9da ("ethtool: Implements ETHTOOL_PHY_GTUNABLE/ETHTOOL_PHY_STUNABLE")
      Fixes: 310d9ad5 ("net: phy: Add downshift get/set support in Microsemi PHYs driver")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Allan W. Nielsen <allan.nielsen@microsemi.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4b65246b
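      The calling convention this patch establishes can be sketched in userspace C: the ethtool core takes the PHY device mutex before invoking the driver's {get,set}_tunable hook, so the hook runs serialized against both concurrent tunable calls and the PHY state machine. The types here are simplified stand-ins, not the kernel's phy_device:

```c
#include <pthread.h>

/* Simplified stand-in for a PHY device with one example tunable. */
struct phy_device_sketch {
    pthread_mutex_t lock;
    int downshift_count;
};

static int phy_ethtool_set_tunable(struct phy_device_sketch *phydev, int val)
{
    /* the core acquires the device mutex before calling the driver hook */
    pthread_mutex_lock(&phydev->lock);
    phydev->downshift_count = val;   /* driver hook body runs here */
    pthread_mutex_unlock(&phydev->lock);
    return 0;
}
```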
    • D
      Merge branch 'mlx5-next' · fab96ec8
      Committed by David S. Miller
      Saeed Mahameed says:
      
      ====================
      Mellanox 100G mlx5 SRIOV switchdev update
      
      This series from Roi and Or further enhances the new SRIOV switchdev mode.
      
      Roi's patches allow users to configure, through devlink,
      the level of inline headers that the VF should be setting in order for
      the eswitch HW to do proper matching. We also enforce that the matching
      required for offloaded TC rules is aligned with that level on the PF driver.
      
      Or's patches allow the user to control the VF operational
      link state through admin directives on the mlx5 VF rep link. Also in this series
      is an implementation of HW and SW counters for the mlx5 VF rep, aligned
      with the design set by commit a5ea31f5 'Merge branch net-offloaded-stats'.
      
      v1 --> v2:
      * constified the net-device param of the get offloaded stats ndo in mlxsw
        (pointed out by 0-day screaming at us...)
      * added Or's Reviewed-by tags for Roi's patches
      
      This series was generated against commit
      e796f49d ("net: ieee802154: constify ieee802154_ops structures")
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      fab96ec8
    • R
      net/mlx5e: Enforce min inline mode when offloading flows · de0af0bf
      Committed by Roi Dayan
      A flow should be offloaded only if its matches are
      allowed according to the min inline mode.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      de0af0bf
    • R
      net/mlx5: E-Switch, Add control for inline mode · bffaa916
      Committed by Roi Dayan
      Implement devlink show and set of the HW inline mode.
      The supported modes: none, link, network, transport.
      We currently support one mode for all vports, so set is applied to all vports.
      When the eswitch is first initialized, the inline mode is queried from the FW.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bffaa916
    • R
      net/mlx5: Enable to query min inline for a specific vport · 34e4e990
      Committed by Roi Dayan
      Also move the inline capabilities enum to the shared header vport.h.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      34e4e990
    • R
      devlink: Add E-Switch inline mode control · 59bfde01
      Committed by Roi Dayan
      Some HWs need the VF driver to put part of the packet headers on the
      TX descriptor so the e-switch can do proper matching and steering.
      
      The supported modes: none, link, network, transport.
      Signed-off-by: Roi Dayan <roid@mellanox.com>
      Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      59bfde01
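      The four inline modes this commit exposes can be sketched as an enum plus a string parser of the kind a devlink front end would use. The identifiers below are illustrative; the real uapi names differ:

```c
#include <string.h>

/* The four modes: how much of the packet headers the VF driver must
 * place on the TX descriptor for e-switch matching and steering. */
enum eswitch_inline_mode_sketch {
    INLINE_MODE_NONE,       /* no headers required on the TX descriptor */
    INLINE_MODE_LINK,       /* L2 headers */
    INLINE_MODE_NETWORK,    /* headers up to L3 */
    INLINE_MODE_TRANSPORT,  /* headers up to L4 */
};

static int inline_mode_from_str(const char *s)
{
    if (strcmp(s, "none") == 0)      return INLINE_MODE_NONE;
    if (strcmp(s, "link") == 0)      return INLINE_MODE_LINK;
    if (strcmp(s, "network") == 0)   return INLINE_MODE_NETWORK;
    if (strcmp(s, "transport") == 0) return INLINE_MODE_TRANSPORT;
    return -1;   /* unknown mode string */
}
```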
    • O
      net/mlx5e: Support VF vport link state control for SRIOV switchdev mode · 20a1ea67
      Committed by Or Gerlitz
      Reflect administrative link changes done on the VF representor to the
      VF e-switch vport. This means that running ip link set down/up on
      the VF rep will modify the e-switch vport state, which in turn makes
      well-behaved VF drivers set their carrier accordingly.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      20a1ea67
    • O
      net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode · 370bad0f
      Committed by Or Gerlitz
      Switchdev driver net-device port statistics should follow the model introduced
      in commit a5ea31f5 'Merge branch net-offloaded-stats'.
      
      For VF reps we return the SRIOV eswitch vport stats as the usual ones, and SW stats
      if asked. For the PF, if we're in switchdev mode, we return the uplink stats
      and SW stats if asked; otherwise, as before. The uplink stats are implemented using
      the PPCNT 802_3 counters, which are already read and cached by the driver.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      370bad0f
    • O
      net: Add net-device param to the get offloaded stats ndo · 3df5b3c6
      Committed by Or Gerlitz
      Some drivers need to check a few internal matters for
      that. To be used in a downstream mlx5 commit.
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3df5b3c6
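      The point of the signature change can be sketched as follows: the offloaded-stats hook gains a const pointer to the net device, so a driver can inspect its own state when deciding what to report. All names below are simplified stand-ins for the kernel types, chosen for illustration:

```c
/* Simplified stand-ins for net_device and the stats structure. */
struct net_device_sketch { int switchdev_mode; unsigned long hw_rx; };
struct rtnl_stats_sketch { unsigned long rx_packets; };

static int get_offload_stats_sketch(const struct net_device_sketch *dev,
                                    struct rtnl_stats_sketch *sp)
{
    /* the new const dev parameter lets the driver branch on its
     * own internal state (here: whether it runs in switchdev mode) */
    sp->rx_packets = dev->switchdev_mode ? dev->hw_rx : 0;
    return 0;
}
```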
    • D
      Merge branch 'phy-broadcom-wirespeed-downshift-support' · ac32378f
      Committed by David S. Miller
      Florian Fainelli says:
      
      ====================
      net: phy: broadcom: Wirespeed/downshift support
      
      This patch series adds support for the Broadcom Wirespeed, aka
      downshift, feature utilizing the recently added ethtool PHY tunables.
      
      Tested with two Gigabit link partners with a 4-wire cable having only
      2 pairs connected.
      
      The last patch in the series is a fix that was required for testing, which
      should make it to -stable; I can submit it separately against net if
      you prefer, David.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ac32378f
    • F
      net: dsa: bcm_sf2: Ensure we re-negotiate EEE after link change · 30ce0de4
      Committed by Florian Fainelli
      In case the link changes and EEE is enabled or disabled, always try to
      re-negotiate this with the link partner.
      
      Fixes: 450b05c1 ("net: dsa: bcm_sf2: add support for controlling EEE")
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30ce0de4
    • F
      net: phy: bcm7xxx: Add support for downshift/Wirespeed · db88816b
      Committed by Florian Fainelli
      Add support for configuring the downshift/Wirespeed enable/disable
      toggles and for specifying a link retry value ranging from 1 to 9. Since the
      integrated BCM7xxx PHYs have issues when both wirespeed and EEE are
      enabled, we disable EEE when wirespeed is enabled.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      db88816b
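      The policy described here (retry count within 1 to 9, and wirespeed forcing EEE off) can be sketched in a few lines of C. The struct and function names are illustrative, not the real driver symbols:

```c
#include <stdbool.h>

/* Illustrative stand-in for the PHY's downshift/EEE configuration. */
struct bcm7xxx_phy_sketch { bool wirespeed; bool eee; int retries; };

static int bcm_set_downshift(struct bcm7xxx_phy_sketch *phy,
                             bool enable, int retries)
{
    /* link retry value must be in 1..9 when enabling */
    if (enable && (retries < 1 || retries > 9))
        return -1;                 /* -EINVAL in a real driver */

    phy->wirespeed = enable;
    phy->retries = enable ? retries : 0;
    if (enable)
        phy->eee = false;          /* wirespeed and EEE conflict here */
    return 0;
}
```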
    • F
      net: phy: broadcom: Allow enabling or disabling of EEE · 99cec8a4
      Committed by Florian Fainelli
      In preparation for adding support for Wirespeed/downshift, we need to
      change bcm_phy_eee_enable() to allow enabling or disabling EEE, so make
      the function take an extra enable/disable boolean parameter and rename
      it to illustrate that it sets EEE, not necessarily just enables it.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      99cec8a4
    • F
      net: phy: broadcom: Add support code for downshift/Wirespeed · d06f78c4
      Committed by Florian Fainelli
      Broadcom's Wirespeed feature allows us to configure how auto-negotiation
      should behave with fewer working pairs of wires on a cable. Add support
      code for retrieving and setting such downshift counters using the
      recently added ethtool downshift tunables.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d06f78c4
    • F
      net: phy: broadcom: Move bcm54xx_auxctl_{read, write} to common library · 5519da87
      Committed by Florian Fainelli
      We are going to need these functions to implement support for Broadcom
      Wirespeed, aka downshift.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5519da87
    • E
      tcp: enhance tcp_collapse_retrans() with skb_shift() · f8071cde
      Committed by Eric Dumazet
      In commit 2331ccc5 ("tcp: enhance tcp collapsing"),
      we made a first step allowing the right skb's data to be copied into
      the left skb's head.
      
      Since all skbs in the socket write queue are headless (except possibly
      the very first one), this strategy often does not work.
      
      This patch extends tcp_collapse_retrans() to perform frag shifting,
      thanks to the skb_shift() helper.
      
      This helper must not BUG on non-headless skbs, as callers are OK
      with that.
      
      Tested:
      
      The following packetdrill test now passes:
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
         +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
         +0 bind(3, ..., ...) = 0
         +0 listen(3, 1) = 0
      
         +0 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 8>
         +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 8>
      +.100 < . 1:1(0) ack 1 win 257
         +0 accept(3, ..., ...) = 4
      
         +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0
         +0 write(4, ..., 200) = 200
         +0 > P. 1:201(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 201:401(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 401:601(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 601:801(200) ack 1
      +.001 write(4, ..., 200) = 200
         +0 > P. 801:1001(200) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1001:1101(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1101:1201(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1201:1301(100) ack 1
      +.001 write(4, ..., 100) = 100
         +0 > P. 1301:1401(100) ack 1
      
      +.099 < . 1:1(0) ack 201 win 257
      +.001 < . 1:1(0) ack 201 win 257 <nop,nop,sack 1001:1401>
         +0 > P. 201:1001(800) ack 1
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f8071cde
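      The headless-skb point can be modeled in a few lines: before this patch, collapsing could only copy the right skb's linear bytes, so skbs whose payload lives entirely in page frags blocked the optimization; skb_shift() lets the frag descriptors move too. This is a toy model, not the kernel code:

```c
/* Toy model: an skb is (linear bytes, frag bytes). Collapsing merges
 * the right skb into the left; the skb_shift() step is what allows
 * frag bytes to move instead of bailing out on headless skbs. */
struct skb_sketch { int linear_len; int frag_len; };

static void collapse_sketch(struct skb_sketch *left, struct skb_sketch *right)
{
    left->linear_len += right->linear_len;  /* the old copy step */
    left->frag_len   += right->frag_len;    /* the new skb_shift() step */
    right->linear_len = 0;
    right->frag_len   = 0;
}
```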
    • S
      net: dsa: mv88e6xxx: add MV88E6097 switch · 7d381a02
      Committed by Stefan Eichenberger
      Add support for the MV88E6097 switch. The change was tested on an Armada
      based platform with a MV88E6097 switch.
      Signed-off-by: Stefan Eichenberger <stefan.eichenberger@netmodule.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d381a02
    • U
      net/phy: add trace events for mdio accesses · e22e996b
      Committed by Uwe Kleine-König
      Make it possible to generate trace events for mdio read and write accesses.
      Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e22e996b
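      A trace event like this typically records the bus id, the access direction, the phy address, the register number, and the value. The exact field set is an assumption for illustration; this userspace stand-in simply formats those fields the way a trace line might look:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative formatter for an mdio access trace record; the field
 * names and layout are assumptions, not the kernel's trace format. */
static int format_mdio_trace(char *buf, size_t n, const char *bus_id,
                             int is_read, int phy_addr, int regnum, int val)
{
    return snprintf(buf, n, "mdio %s %s addr=%02x reg=%02x val=%04x",
                    bus_id, is_read ? "read" : "write",
                    phy_addr, regnum, val);
}
```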
    • S
      VSOCK: add loopback to virtio_transport · b9116823
      Committed by Stefan Hajnoczi
      The VMware VMCI transport supports loopback inside virtual machines.
      This patch implements loopback for virtio-vsock.
      
      Flow control is handled by the virtio-vsock protocol as usual.  The
      sending process stops transmitting on a connection when the peer's
      receive buffer space is exhausted.
      
      Cathy Avery <cavery@redhat.com> noticed this difference between VMCI and
      virtio-vsock when a test case using loopback failed.  Although loopback
      isn't the main point of AF_VSOCK, it is useful for testing and
      virtio-vsock must match VMCI semantics so that userspace programs run
      regardless of the underlying transport.
      
      My understanding is that loopback is not supported on the host side with
      VMCI.  Follow that by implementing it only in the guest driver, not the
      vhost host driver.
      
      Cc: Jorgen Hansen <jhansen@vmware.com>
      Reported-by: Cathy Avery <cavery@redhat.com>
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b9116823
  2. 23 Nov 2016, 1 commit
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · f9aa9dc7
      Committed by David S. Miller
      All conflicts were simple overlapping changes, except perhaps
      for the Thunder driver.
      
      That driver has a change_mtu method explicitly for sending
      a message to the hardware. If that fails, it returns an
      error.
      
      Normally a driver doesn't need an ndo_change_mtu method, because those
      are usually just range changes, which are now handled generically.
      But since this extra operation is needed in the Thunder driver, it has
      to stay.
      
      However, if the message send fails, we have to restore the MTU it had
      before the change, because the entire call chain expects that if
      an error is thrown by ndo_change_mtu then the MTU did not change.
      Therefore code is added to nicvf_change_mtu to remember the original
      MTU, and to restore it upon nicvf_update_hw_max_frs() failure.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f9aa9dc7
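      The restore-on-failure pattern described for the Thunder driver can be sketched in isolation: remember the old MTU, apply the new one, and roll back if notifying the hardware fails, so that callers of ndo_change_mtu can assume the MTU is unchanged on error. All names below are illustrative:

```c
/* Illustrative stand-in for the NIC state and its hardware call. */
struct nic_sketch { int mtu; int hw_accepts; };

static int update_hw_max_frs(struct nic_sketch *nic)
{
    /* models the message to the hardware that may fail */
    return nic->hw_accepts ? 0 : -1;
}

static int change_mtu_sketch(struct nic_sketch *nic, int new_mtu)
{
    int orig_mtu = nic->mtu;

    nic->mtu = new_mtu;
    if (update_hw_max_frs(nic)) {
        nic->mtu = orig_mtu;   /* an error must mean "MTU unchanged" */
        return -1;
    }
    return 0;
}
```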
  3. 22 Nov 2016, 17 commits