1. 21 Jan, 2017 11 commits
  2. 20 Jan, 2017 7 commits
  3. 19 Jan, 2017 22 commits
    • net: Remove usage of net_device last_rx member · 4a7c9726
      Tobias Klauser authored
      The network stack no longer uses the last_rx member of struct net_device
      since the bonding driver switched to use its own private last_rx in
      commit 9f242738 ("bonding: use last_arp_rx in slave_last_rx()").
      
      However, some drivers still (ab)use the field for their own purposes and
      some drivers just update it without actually using it.
      
      Previously, there was an accompanying comment for the last_rx member
      added in commit 4dc89133 ("net: add a comment on netdev->last_rx")
      which asked drivers not to update it, unless really needed. However,
      this comment was removed in commit f8ff080d ("bonding: remove
      useless updating of slave->dev->last_rx"), so some drivers added later
      on still did update last_rx.
      
      Remove all usage of last_rx and switch three drivers (sky2, atp and
      smc91c92_cs) which actually read and write it to use their own private
      copy in netdev_priv.
      
      Compile-tested with allyesconfig and allmodconfig on x86 and arm.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Mirko Lindner <mlindner@marvell.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Jay Vosburgh <jay.vosburgh@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
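The move of last_rx into driver-private state can be pictured with a small userspace model. This is only a sketch under stated assumptions: `toy_net_device`, `toy_driver_priv` and the `toy_*` helpers are hypothetical names, not kernel APIs, and the tick values are plain integers rather than jiffies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net_device: it has no last_rx member. */
struct toy_net_device {
    const char *name;
    void *priv;            /* models the area a driver reaches via netdev_priv() */
};

/* Hypothetical driver-private state that now owns the timestamp. */
struct toy_driver_priv {
    unsigned long last_rx; /* tick count of the most recent receive */
};

static struct toy_driver_priv *toy_netdev_priv(struct toy_net_device *dev)
{
    assert(dev != NULL && dev->priv != NULL);
    return dev->priv;
}

/* Receive path: record the time in the driver's private copy. */
static void toy_rx(struct toy_net_device *dev, unsigned long now)
{
    toy_netdev_priv(dev)->last_rx = now;
}

/* Watchdog: report whether nothing was received for longer than timeout. */
static int toy_rx_stalled(struct toy_net_device *dev, unsigned long now,
                          unsigned long timeout)
{
    return now - toy_netdev_priv(dev)->last_rx > timeout;
}
```

Only the driver reads and writes the field in this model, which mirrors the commit's point that the network stack itself no longer needs it.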
    • net: dsa: use cpu_switch instead of ds[0] · 9520ed8f
      Vivien Didelot authored
      Now that the DSA Ethernet switches are true Linux devices, the CPU
      switch is not necessarily the first one. If its address is higher than
      the second switch on the same MDIO bus, its index will be 1, not 0.
      
      Avoid any confusion by using dst->cpu_switch instead of dst->ds[0].
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: dsa: store CPU switch structure in the tree · b22de490
      Vivien Didelot authored
      Store a dsa_switch pointer to the CPU switch in the tree instead of only
      its index. This avoids the need to initialize it to -1.
      Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
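The two DSA commits above replace an index, and the assumption that the CPU switch is ds[0], with a pointer stored in the tree. A minimal userspace sketch of that idea follows; `toy_tree`, `toy_switch` and `toy_tree_add` are hypothetical names chosen for illustration and do not reflect the real DSA structures beyond the `ds[]` and `cpu_switch` members named in the commit messages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SWITCHES 4

/* Hypothetical switch: its index depends on probe order, not on its role. */
struct toy_switch {
    int index;
    bool has_cpu_port;
};

/* Hypothetical tree: a pointer to the CPU switch instead of its index. */
struct toy_tree {
    struct toy_switch *ds[TOY_MAX_SWITCHES];
    struct toy_switch *cpu_switch; /* NULL means "not found yet", no -1 needed */
};

static void toy_tree_add(struct toy_tree *dst, struct toy_switch *ds)
{
    assert(ds->index >= 0 && ds->index < TOY_MAX_SWITCHES);
    dst->ds[ds->index] = ds;
    /* Remember the first switch that owns a CPU port. */
    if (ds->has_cpu_port && !dst->cpu_switch)
        dst->cpu_switch = ds;
}
```

With a pointer, a zero-initialized tree already encodes "no CPU switch", and callers never have to guess which slot of `ds[]` holds it.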
    • net: ethernet: ti: davinci_cpdma: correct check on NULL in set rate · e33c2ef1
      Ivan Khoronzhuk authored
      Check "ch" for NULL first, then get ctlr.
      Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
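The ordering described in this fix can be sketched in plain C. The structures and the function below are hypothetical stand-ins, not the davinci_cpdma code; the rate limit check and the invariant that a channel always has a controller are assumptions added to make the example self-contained.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical controller and channel types. */
struct toy_ctlr {
    unsigned int max_rate;
};

struct toy_chan {
    struct toy_ctlr *ctlr;
    unsigned int rate;
};

static int toy_chan_set_rate(struct toy_chan *ch, unsigned int rate)
{
    struct toy_ctlr *ctlr;

    /* Validate the pointer before anything dereferences it. */
    if (!ch)
        return -EINVAL;

    ctlr = ch->ctlr;          /* safe: ch is known to be non-NULL here */
    assert(ctlr != NULL);     /* assumption of this sketch */

    if (rate > ctlr->max_rate)
        return -EINVAL;

    ch->rate = rate;
    return 0;
}
```

Initializing `ctlr` from `ch->ctlr` in the declaration, before the NULL test, is the pattern the commit removes.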
    • Merge branch 'vhost_net-batching' · e3e37e70
      David S. Miller authored
      Jason Wang says:
      
      ====================
      vhost_net tx batching
      
      This series tries to implement tx batching support for vhost. This is
      done by using MSG_MORE as a hint for the underlying socket. The backend
      (e.g. tap) can then batch the packets temporarily in a list and
      submit them all once the number of batched packets exceeds a limit.
      
      Tests show an obvious improvement for guest pktgen over
      mlx4(noqueue) on host:
      
                                           Mpps  -+%
              rx-frames = 0                0.91  +0%
              rx-frames = 4                1.00  +9.8%
              rx-frames = 8                1.00  +9.8%
              rx-frames = 16               1.01  +10.9%
              rx-frames = 32               1.07  +17.5%
              rx-frames = 48               1.07  +17.5%
              rx-frames = 64               1.08  +18.6%
              rx-frames = 64 (no MSG_MORE) 0.91  +0%
      
      Changes from V4:
      - stick to NAPI_POLL_WEIGHT for rx-frames if the user specifies a value
        greater than it.
      Changes from V3:
      - use ethtool instead of module parameter to control the maximum
        number of batched packets
      - avoid overhead when MSG_MORE was not set and no packet is queued
      Changes from V2:
      - remove useless queue limitation check (and we don't drop any packet now)
      Changes from V1:
      - drop NAPI handler since we don't use NAPI now
      - fix the issues that may exceed max pending of zerocopy
      - more improvement on available buffer detection
      - move the limitation of batched packets from vhost to tuntap
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tun: rx batching · 5503fcec
      Jason Wang authored
      We can only process 1 packet at a time during sendmsg(). This often
      leads to bad cache utilization under heavy load. So this patch tries to do
      some batching during rx before submitting packets to the host network
      stack. This is done by accepting MSG_MORE as a hint from the
      sendmsg() caller: if it is set, batch the packet temporarily in a
      linked list and submit them all once MSG_MORE is cleared.
      
      Tests were done by pktgen (burst=128) in guest over mlx4(noqueue) on host:
      
                                       Mpps  -+%
          rx-frames = 0                0.91  +0%
          rx-frames = 4                1.00  +9.8%
          rx-frames = 8                1.00  +9.8%
          rx-frames = 16               1.01  +10.9%
          rx-frames = 32               1.07  +17.5%
          rx-frames = 48               1.07  +17.5%
          rx-frames = 64               1.08  +18.6%
          rx-frames = 64 (no MSG_MORE) 0.91  +0%
      
      Users are allowed to change the per-device number of batched packets
      through ethtool -C rx-frames. NAPI_POLL_WEIGHT is used as the upper limit
      to prevent bh from being disabled for too long.
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
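The batching rule described above (hold packets while the caller signals "more", submit everything once the hint is cleared or the limit is reached) can be modeled in userspace. This is a hedged sketch, not the tun driver: `toy_tun`, `toy_pkt`, `toy_rx` and `TOY_MAX_BATCH` are hypothetical names, the `more` argument stands in for MSG_MORE, and "delivery" is just a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BATCH 64   /* stands in for the NAPI_POLL_WEIGHT cap */

struct toy_pkt {
    int id;
    struct toy_pkt *next;
};

struct toy_tun {
    struct toy_pkt *head, *tail; /* packets held back so far */
    unsigned int queued;
    unsigned int rx_batched;     /* user limit, as set via rx-frames */
    unsigned int delivered;      /* packets handed to the stack */
    unsigned int flushes;        /* number of batch submissions */
};

/* Submit every queued packet in one go. */
static void toy_flush(struct toy_tun *t)
{
    struct toy_pkt *p = t->head;

    if (!p)
        return;
    while (p) {
        struct toy_pkt *next = p->next;

        p->next = NULL;
        t->delivered++;          /* stands in for passing the packet up */
        p = next;
    }
    t->head = t->tail = NULL;
    t->queued = 0;
    t->flushes++;
}

/* Queue one packet; flush when the hint is cleared or the limit is hit. */
static void toy_rx(struct toy_tun *t, struct toy_pkt *p, bool more)
{
    unsigned int limit = t->rx_batched;

    if (limit > TOY_MAX_BATCH)
        limit = TOY_MAX_BATCH;

    assert(p != NULL);
    p->next = NULL;
    if (t->tail)
        t->tail->next = p;
    else
        t->head = p;
    t->tail = p;
    t->queued++;

    if (!more || t->queued >= limit)
        toy_flush(t);
}
```

With `rx_batched` set to 0 every packet is flushed immediately, which corresponds to the unbatched `rx-frames = 0` row in the table.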
    • vhost_net: tx batching · 0ed005ce
      Jason Wang authored
      This patch tries to utilize tuntap rx batching by peeking the tx
      virtqueue during transmission; if there are more available buffers in
      the virtqueue, set the MSG_MORE flag as a hint for the backend (e.g. tuntap)
      to batch the packets.
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost: better detection of available buffers · 275bf960
      Jason Wang authored
      This patch makes several tweaks to vhost_vq_avail_empty() for
      better performance:
      
      - check the cached avail index first, which can avoid a userspace memory access.
      - use unlikely() for the failure of the userspace access
      - check vq->last_avail_idx instead of the cached avail index as the last
        step.
      
      This patch is needed for batching support, which needs to peek whether
      or not there are still available buffers in the ring.
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
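The order of checks listed above can be illustrated with a userspace model. It is a sketch under assumptions: `toy_vq` and `toy_vq_avail_empty` are hypothetical, the guest's index is an ordinary pointer where NULL simulates a failed access, and `guest_reads` exists only to show when the expensive read happens.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_vq {
    uint16_t avail_idx;        /* cached copy of the guest's avail index */
    uint16_t last_avail_idx;   /* next entry the host will consume */
    const uint16_t *guest_idx; /* models guest memory; NULL simulates a fault */
    unsigned int guest_reads;  /* counts the expensive accesses */
};

static bool toy_vq_avail_empty(struct toy_vq *vq)
{
    assert(vq != NULL);

    /* 1. The cached index already shows pending buffers: no guest access. */
    if (vq->avail_idx != vq->last_avail_idx)
        return false;

    /* 2. Refresh the cache; treat a failed access as "not empty". */
    vq->guest_reads++;
    if (!vq->guest_idx)
        return false;
    vq->avail_idx = *vq->guest_idx;

    /* 3. Compare against last_avail_idx as the last step. */
    return vq->avail_idx == vq->last_avail_idx;
}
```

The first check is what makes repeated peeking cheap while the host is still behind the cached index.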
    • net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering · 1a8b6d76
      Mao Wenan authored
      Relaxed ordering (RO) is a feature of the 82599 NIC; enabling it can
      improve performance on some CPU architectures, such as SPARC.
      Currently the 82599 driver enables RO for only one CPU architecture
      (SPARC), which does not help other CPU architectures that also need
      the RO feature.
      This patch adds a common config option, CONFIG_ARCH_WANT_RELAX_ORDER, to
      enable the RO feature, and defines CONFIG_ARCH_WANT_RELAX_ORDER in the
      sparc Kconfig first.
      Signed-off-by: Mao Wenan <maowenan@huawei.com>
      Reviewed-by: Alexander Duyck <alexander.duyck@gmail.com>
      Reviewed-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipv6-simplify-rt6_fill_node' · 1e48aac1
      David S. Miller authored
      David Ahern says:
      
      ====================
      net: ipv6: simplify rt6_fill_node
      
      Remove a couple of unnecessary input arguments to rt6_fill_node.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: remove prefix arg to rt6_fill_node · f8cfe2ce
      David Ahern authored
      The prefix arg to rt6_fill_node is non-0 in only 1 path - rt6_dump_route
      where a user is requesting a prefix only dump. Simplify rt6_fill_node
      by removing the prefix arg and moving the prefix check to rt6_dump_route.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: ipv6: remove nowait arg to rt6_fill_node · fd61c6ba
      David Ahern authored
      All callers of rt6_fill_node pass 0 for nowait arg. Remove the arg and
      simplify rt6_fill_node accordingly.
      
      rt6_fill_node passes the nowait of 0 to ip6mr_get_route. Remove the
      nowait arg from it as well.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sctp-sender-side-stream-reconf-ssn-reset-request-chunk' · 1ce463dd
      David S. Miller authored
      Xin Long says:
      
      ====================
      sctp: add sender-side procedures for stream reconf ssn reset request chunk
      
      Patch 6/6 is to implement sender-side procedures for the Outgoing
      and Incoming SSN Reset Request Parameter described in rfc6525
      section 5.1.2 and 5.1.3
      
      Patches 1-5/6 are ahead of it to define some apis and asoc members
      for it.
      
      Note that with this patchset, asoc->reconf_enable has no chance yet to
      be set, until the patch "sctp: add get and set sockopt for reconf_enable"
      is applied in the future, as we cannot just enable it when sctp is not
      yet capable of processing reconf chunks.
      
      v1->v2:
        - put these into a smaller group.
        - rename some temporary variables in the code.
        - rename the titles of the commits and improve some changelogs.
      v2->v3:
        - re-split the patchset and make sure it has no dead code for review.
      v3->v4:
        - move sctp_make_reconf() into patch 1/6 to avoid kbuild warning.
        - drop unused struct sctp_strreset_req.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: implement sender-side procedures for SSN Reset Request Parameter · 7f9d68ac
      Xin Long authored
      This patch is to implement sender-side procedures for the Outgoing
      and Incoming SSN Reset Request Parameter described in rfc6525 section
      5.1.2 and 5.1.3.
      
      It also adds the sockopt SCTP_RESET_STREAMS from rfc6525 section 6.3.2
      for users.
      
      Note that the new asoc member strreset_outstanding is to make sure
      only one reconf request chunk is in flight, as rfc6525 section 5.1.1
      demands.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
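The role of strreset_outstanding, allowing only one reconf request in flight, can be sketched as a tiny state machine. Everything here is an assumption made for illustration: `toy_asoc` and the two functions are hypothetical, and the specific error codes are a choice of this sketch rather than something stated in the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical association state, reduced to the two fields of interest. */
struct toy_asoc {
    bool reconf_enable;                /* stream reconf allowed at all */
    unsigned int strreset_outstanding; /* requests awaiting a response */
};

/* Try to send a reset request; refuse while another one is pending. */
static int toy_send_reset_request(struct toy_asoc *asoc)
{
    assert(asoc != NULL);

    if (!asoc->reconf_enable)
        return -ENOPROTOOPT;   /* assumed error code for "not enabled" */
    if (asoc->strreset_outstanding)
        return -EINPROGRESS;   /* assumed error code for "already pending" */

    asoc->strreset_outstanding = 1;
    return 0;
}

/* A matching response arrived: the next request may now be sent. */
static void toy_reset_response(struct toy_asoc *asoc)
{
    if (asoc->strreset_outstanding)
        asoc->strreset_outstanding--;
}
```

A second request is rejected until the response for the first one has been processed.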
    • sctp: add sockopt SCTP_ENABLE_STREAM_RESET · 9fb657ae
      Xin Long authored
      This patch is to add sockopt SCTP_ENABLE_STREAM_RESET to get/set
      strreset_enable to indicate which reconf request type it supports,
      which is described in rfc6525 section 6.3.1.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add reconf_enable in asoc ep and netns · c28445c3
      Xin Long authored
      This patch is to add a reconf_enable field in each of asoc, ep and netns
      to indicate if they support stream reset.
      
      When initializing, asoc reconf_enable gets its default value from ep
      reconf_enable, which comes from netns reconf_enable by default.
      
      It is also to add reconf_capable in the asoc peer part to know if the peer
      supports reconf_enable; the value is set if ext params have reconf
      chunk support when processing the init chunk, just as rfc6525 section
      5.1.1 demands.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add stream reconf primitive · 7a090b04
      Xin Long authored
      This patch is to add a primitive based on the sctp primitive frame for
      sending stream reconf requests. It works like the other primitives,
      and creates a SCTP_CMD_REPLY command to send the request chunk out.
      
      sctp_primitive_RECONF would be the api to send a reconf request
      chunk.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add stream reconf timer · 7b9438de
      Xin Long authored
      This patch is to add a per transport timer based on sctp timer frame
      for stream reconf chunk retransmission. It would start after sending
      a reconf request chunk, and stop after receiving the response chunk.
      
      If the timer expires, besides retransmitting the reconf request chunk,
      it would also do the same things as the data RTO timer, such as increasing
      the appropriate error counts and performing threshold management, possibly
      destroying the asoc if sctp retransmission thresholds are exceeded, just
      as section 5.1.1 describes.
      
      This patch is also to add asoc strreset_chunk, which is used to save the
      reconf request chunk, so that it can be retransmitted, and to check if
      the response is really for this request by comparing the information
      inside with the response chunk as well.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sctp: add support for generating stream reconf ssn reset request chunk · cc16f00f
      Xin Long authored
      This patch is to add asoc strreset_outseq and strreset_inseq for
      saving the reconf request sequence, initialize them when creating
      the assoc and processing init, and also to define the Incoming and Outgoing
      SSN Reset Request Parameters described in rfc6525 section 4.1 and
      4.2. As they can be in the same chunk, as rfc6525 section 3.1-3
      describes, it makes them in one function.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'rework-inet_csk_get_port' · b16ed2b1
      David S. Miller authored
      Josef Bacik says:
      
      ====================
      Rework inet_csk_get_port
      
      V3->V4:
      -Removed the random include of addrconf.h that is no longer needed.
      
      V2->V3:
      -Dropped the fastsock from the tb and instead just carry the saddrs, family, and
       ipv6 only flag.
      -Reworked the helper functions to deal with this change so I could still use
       them when checking the fast path.
      -Killed tb->num_owners as per Eric's request.
      -Attached a reproducer to the bottom of this email.
      
      V1->V2:
      -Added a new patch 'inet: collapse ipv4/v6 rcv_saddr_equal functions into one'
       at Hannes' suggestion.
      -Dropped ->bind_conflict and just use the new helper.
      -Fixed a compile bug from the original ->bind_conflict patch.
      
      The original description of the series follows:
      
      At some point recently the guys working on our load balancer added the ability
      to use SO_REUSEPORT.  When they restarted their app with this option enabled
      they immediately hit a softlockup on what appeared to be the
      inet_bind_bucket->lock.  Eventually what all of our debugging and discussion led
      us to was the fact that the application comes up without SO_REUSEPORT, shuts
      down which creates around 100k twsk's, and then comes up and tries to open a
      bunch of sockets using SO_REUSEPORT, which meant traversing the inet_bind_bucket
      owners list under the lock.  Since this lock is needed for dealing with the
      twsk's and basically anything else related to connections we would softlockup,
      and sometimes not ever recover.
      
      To solve this problem I did what you see in Patch 5/5.  Once we have a
      SO_REUSEPORT socket on the tb->owners list we know that the socket has no
      conflicts with any of the other sockets on that list.  So we can add a copy of
      the sock_common (really all we need is the recv_saddr but it seemed ugly to copy
      just the ipv6, ipv4, and flag to indicate if we were ipv6 only in there so I've
      copied the whole common) in order to check subsequent SO_REUSEPORT sockets.  If
      they match the previous one then we can skip the expensive
      inet_csk_bind_conflict check.  This is what eliminated the soft lockup that we
      were seeing.
      
      Patches 1-4 are cleanups and re-workings.  For instance when we specify port ==
      0 we need to find an open port, but we would do two passes through
      inet_csk_bind_conflict every time we found a possible port.  We would also keep
      track of the smallest_port value in order to try and use it if we found no
      port our first run through.  This however made no sense as it would have had to
      fail the first pass through inet_csk_bind_conflict, so would not actually pass
      the second pass through either.  Finally I split the function into two functions
      in order to make it easier to read and to distinguish between the two behaviors.
      
      I have tested this on one of our load balancing boxes during peak traffic and it
      hasn't fallen over.  But this is not my area, so obviously feel free to point
      out where I'm being stupid and I'll get it fixed up and retested.  Thanks,
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: reset tb->fastreuseport when adding a reuseport sk · 637bc8bb
      Josef Bacik authored
      If we have non reuseport sockets on a tb we will set tb->fastreuseport to 0 and
      never set it again.  Which means that in the future if we end up adding a bunch
      of reuseport sk's to that tb we'll have to do the expensive scan every time.
      Instead add the ipv4/ipv6 saddr fields to the bind bucket, as well as the family
      so we know what comparison to make, and the ipv6 only setting so we can make
      sure to compare with new sockets appropriately.  Once one sk has made it onto
      the list we know that there are no potential bind conflicts on the owners list
      that match that sk's rcv_addr.  So copy the sk's information into our bind
      bucket and set tb->fastreuseport to FASTREUSESOCK_STRICT so we know we have to do
      an extra check for subsequent reuseport sockets and skip the expensive bind
      conflict check.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
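The fast path described above, remembering one reuseport socket's address in the bind bucket so that matching sockets can skip the full owners scan, can be modeled as follows. This is a simplified sketch: `toy_bucket`, `toy_sock`, `TOY_FAST_STRICT` and the functions are hypothetical, only a single 32-bit address is compared (the commit also stores the family and the ipv6-only flag), and the slow scan is a stub that never finds a conflict.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { TOY_FAST_NONE = 0, TOY_FAST_STRICT = 2 };

struct toy_sock {
    uint32_t rcv_saddr;
    bool reuseport;
};

struct toy_bucket {
    int fastreuseport;       /* TOY_FAST_STRICT: a checked reuseport sk is cached */
    uint32_t fast_rcv_saddr; /* address of that cached socket */
    unsigned int slow_scans; /* counts walks over the owners list */
};

/* Stub for the expensive conflict scan over every owner of the bucket. */
static bool toy_slow_conflict_scan(struct toy_bucket *tb, const struct toy_sock *sk)
{
    (void)sk;
    tb->slow_scans++;
    return false;            /* this sketch never reports a conflict */
}

static bool toy_bind(struct toy_bucket *tb, const struct toy_sock *sk)
{
    assert(tb != NULL && sk != NULL);

    /* Fast path: same address as a reuseport socket that already passed. */
    if (sk->reuseport && tb->fastreuseport == TOY_FAST_STRICT &&
        tb->fast_rcv_saddr == sk->rcv_saddr)
        return true;

    if (toy_slow_conflict_scan(tb, sk))
        return false;

    if (sk->reuseport) {
        /* Cache this socket so later matching sockets skip the scan. */
        tb->fastreuseport = TOY_FAST_STRICT;
        tb->fast_rcv_saddr = sk->rcv_saddr;
    } else {
        tb->fastreuseport = TOY_FAST_NONE;
    }
    return true;
}
```

After one reuseport socket has gone through the slow scan, further sockets with the same address bind without touching the owners list, which is the behavior the commit message is after.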
    • inet: split inet_csk_get_port into two functions · 289141b7
      Josef Bacik authored
      inet_csk_get_port does two different things: it either scans for an open port,
      or it tries to see if the specified port is available for use.  Since these two
      operations have different rules and are basically independent, let's split them
      into two different functions to make them both more readable.
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>