1. 05 5月, 2016 23 次提交
  2. 04 5月, 2016 17 次提交
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · cba65321
      David S. Miller 提交于
      Conflicts:
      	net/ipv4/ip_gre.c
      
      Minor conflicts between tunnel bug fixes in net and
      ipv6 tunnel cleanups in net-next.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cba65321
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7391daf2
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
       "Some straggler bug fixes:
      
         1) Batman-adv DAT must consider VLAN IDs when choosing candidate
            nodes, from Antonio Quartulli.
      
         2) Fix botched reference counting of vlan objects and neigh nodes in
            batman-adv, from Sven Eckelmann.
      
         3) netem can crash when it sees GSO packets, the fix is to segment
            then upon ->enqueue.  Fix from Neil Horman with help from Eric
            Dumazet.
      
         4) Fix VXLAN dependencies in mlx5 driver Kconfig, from Matthew
            Finlay.
      
         5) Handle VXLAN ops outside of rcu lock, via a workqueue, in mlx5,
            since it can sleep.  Fix also from Matthew Finlay.
      
         6) Check mdiobus_scan() return values properly in pxa168_eth and macb
            drivers.  From Sergei Shtylyov.
      
         7) If the netdevice doesn't support checksumming, disable
            segmentation.  From Alexandery Duyck.
      
         8) Fix races between RDS tcp accept and sending, from Sowmini
            Varadhan.
      
         9) In macb driver, probe MDIO bus before we register the netdev,
            otherwise we can try to open the device before it is really ready
            for that.  Fix from Florian Fainelli.
      
        10) Netlink attribute size for ILA "tunnels" not calculated properly,
            fix from Nicolas Dichtel"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        ipv6/ila: fix nlsize calculation for lwtunnel
        net: macb: Probe MDIO bus before registering netdev
        RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock.
        RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock
        vxlan: Add checksum check to the features check function
        net: Disable segmentation if checksumming is not supported
        net: mvneta: Remove superfluous SMP function call
        macb: fix mdiobus_scan() error check
        pxa168_eth: fix mdiobus_scan() error check
        net/mlx5e: Use workqueue for vxlan ops
        net/mlx5e: Implement a mlx5e workqueue
        net/mlx5: Kconfig: Fix MLX5_EN/VXLAN build issue
        net/mlx5: Unmap only the relevant IO memory mapping
        netem: Segment GSO packets on enqueue
        batman-adv: Fix reference counting of hardif_neigh_node object for neigh_node
        batman-adv: Fix reference counting of vlan object for tt_local_entry
        batman-adv: B.A.T.M.A.N V - make sure iface is reactivated upon NETDEV_UP event
        batman-adv: fix DAT candidate selection (must use vid)
      7391daf2
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 610603a5
      Linus Torvalds 提交于
      Pull fuse fixes from Miklos Szeredi:
       "Fix a regression and update the MAINTAINERS entry for fuse"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: update mailing list in MAINTAINERS
        fuse: Fix return value from fuse_get_user_pages()
      610603a5
    • N
      ipv6/ila: fix nlsize calculation for lwtunnel · 79e8dc8b
      Nicolas Dichtel 提交于
      The handler 'ila_fill_encap_info' adds one attribute: ILA_ATTR_LOCATOR.
      
      Fixes: 65d7ab8d ("net: Identifier Locator Addressing module")
      CC: Tom Herbert <tom@herbertland.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79e8dc8b
    • W
      ipv6: add new struct ipcm6_cookie · 26879da5
      Wei Wang 提交于
      In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local
      variables like hlimits, tclass, opt and dontfrag and pass them to corresponding
      functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames.
      This is not a good practice and makes it hard to add new parameters.
      This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in
      ipv4 and include the above mentioned variables. And we only pass the
      pointer to this structure to corresponding functions. This makes it easier
      to add new parameters in the future and makes the function cleaner.
      Signed-off-by: NWei Wang <weiwan@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26879da5
    • F
      net: macb: Probe MDIO bus before registering netdev · cf669660
      Florian Fainelli 提交于
      The current sequence makes us register for a network device prior to
      registering and probing the MDIO bus which could lead to some unwanted
      consequences, like a thread of execution calling into ndo_open before
      register_netdev() returns, while the MDIO bus is not ready yet.
      
      Rework the sequence to register for the MDIO bus, and therefore attach
      to a PHY prior to calling register_netdev(), which implies reworking the
      error path a bit.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf669660
    • D
      Merge branch 'rds-fixes' · b365d955
      David S. Miller 提交于
      Sowmini Varadhan says:
      
      ====================
      RDS: TCP: sychronization during connection startup
      
      This patch series ensures that the passive (accept) side of the
      TCP connection used for RDS-TCP is correctly synchronized with
      any concurrent active (connect) attempts for a given pair of peers.
      
      Patch 1 in the series makes sure that the t_sock in struct
      rds_tcp_connection is only reset after any threads in rds_tcp_xmit
      have completed (otherwise a null-ptr deref may be encountered).
      Patch 2 synchronizes rds_tcp_accept_one() with the rds_tcp*connect()
      path.
      
      v2: review comments from Santosh Shilimkar, other spelling corrections
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b365d955
    • S
      RDS: TCP: Synchronize accept() and connect() paths on t_conn_lock. · bd7c5f98
      Sowmini Varadhan 提交于
      An arbitration scheme for duelling SYNs is implemented as part of
      commit 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an
      outgoing socket in rds_tcp_accept_one()") which ensures that both nodes
      involved will arrive at the same arbitration decision. However, this
      needs to be synchronized with an outgoing SYN to be generated by
      rds_tcp_conn_connect(). This commit achieves the synchronization
      through the t_conn_lock mutex in struct rds_tcp_connection.
      
      The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring
      the t_conn_lock mutex.  A SYN is sent out only if the RDS connection is
      not already UP (an UP would indicate that rds_tcp_accept_one() has
      completed 3WH, so no SYN needs to be generated).
      
      Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after
      acquiring the t_conn_lock mutex. The only acceptable states (to
      allow continuation of the arbitration logic) are UP (i.e., outgoing SYN
      was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent
      outgoing SYN before we saw incoming SYN).
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd7c5f98
    • S
      RDS:TCP: Synchronize rds_tcp_accept_one with rds_send_xmit when resetting t_sock · eb192840
      Sowmini Varadhan 提交于
      There is a race condition between rds_send_xmit -> rds_tcp_xmit
      and the code that deals with resolution of duelling syns added
      by commit 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an
      outgoing socket in rds_tcp_accept_one()").
      
      Specifically, we may end up derefencing a null pointer in rds_send_xmit
      if we have the interleaving sequence:
                 rds_tcp_accept_one                  rds_send_xmit
      
                                                   conn is RDS_CONN_UP, so
          					 invoke rds_tcp_xmit
      
                                                   tc = conn->c_transport_data
              rds_tcp_restore_callbacks
                  /* reset t_sock */
          					 null ptr deref from tc->t_sock
      
      The race condition can be avoided without adding the overhead of
      additional locking in the xmit path: have rds_tcp_accept_one wait
      for rds_tcp_xmit threads to complete before resetting callbacks.
      The synchronization can be done in the same manner as rds_conn_shutdown().
      First set the rds_conn_state to something other than RDS_CONN_UP
      (so that new threads cannot get into rds_tcp_xmit()), then wait for
      RDS_IN_XMIT to be cleared in the conn->c_flags indicating that any
      threads in rds_tcp_xmit are done.
      
      Fixes: 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an
      outgoing socket in rds_tcp_accept_one()")
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eb192840
    • E
      net: add __sock_wfree() helper · 1d2077ac
      Eric Dumazet 提交于
      Hosts sending lot of ACK packets exhibit high sock_wfree() cost
      because of cache line miss to test SOCK_USE_WRITE_QUEUE
      
      We could move this flag close to sk_wmem_alloc but it is better
      to perform the atomic_sub_and_test() on a clean cache line,
      as it avoid one extra bus transaction.
      
      skb_orphan_partial() can also have a fast track for packets that either
      are TCP acks, or already went through another skb_orphan_partial()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d2077ac
    • D
      Merge branch 'tunnel-csum-and-sg-offloads' · 42c8819b
      David S. Miller 提交于
      Alexander Duyck says:
      
      ====================
      Fixes for tunnel checksum and segmentation offloads
      
      This patch series is a subset of patches I had submitted for net-next.  I
      plan to drop these two patches from the v3 of "Fix Tunnel features and
      enable GSO partial for several drivers" and I am instead submitting them
      for net since these are truly fixes and likely will need to be backported
      to stable branches.
      
      This series addresses 2 specific issues.  The first is that we could
      request TSO on a v4 inner header while not supporting checksum offload of
      the outer IPv6 header.  The second is that we could request an IPv6 inner
      checksum offload without validating that we could actually support an inner
      IPv6 checksum offload.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42c8819b
    • A
      vxlan: Add checksum check to the features check function · af67eb9e
      Alexander Duyck 提交于
      We need to perform an additional check on the inner headers to determine if
      we can offload the checksum for them.  Previously this check didn't occur
      so we would generate an invalid frame in the case of an IPv6 header
      encapsulated inside of an IPv4 tunnel.  To fix this I added a secondary
      check to vxlan_features_check so that we can verify that we can offload the
      inner checksum.
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af67eb9e
    • A
      net: Disable segmentation if checksumming is not supported · 996e8021
      Alexander Duyck 提交于
      In the case of the mlx4 and mlx5 driver they do not support IPv6 checksum
      offload for tunnels.  With this being the case we should disable GSO in
      addition to the checksum offload features when we find that a device cannot
      perform a checksum on a given packet type.
      Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      996e8021
    • D
      Merge branch 'tipc-next' · e34b1638
      David S. Miller 提交于
      Jon Maloy says:
      
      ====================
      tipc: redesign socket-level flow control
      
      The socket-level flow control in TIPC has long been due for a major
      overhaul. This series fixes this.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e34b1638
    • J
      tipc: redesign connection-level flow control · 10724cc7
      Jon Paul Maloy 提交于
      There are two flow control mechanisms in TIPC; one at link level that
      handles network congestion, burst control, and retransmission, and one
      at connection level which' only remaining task is to prevent overflow
      in the receiving socket buffer. In TIPC, the latter task has to be
      solved end-to-end because messages can not be thrown away once they
      have been accepted and delivered upwards from the link layer, i.e, we
      can never permit the receive buffer to overflow.
      
      Currently, this algorithm is message based. A counter in the receiving
      socket keeps track of number of consumed messages, and sends a dedicated
      acknowledge message back to the sender for each 256 consumed message.
      A counter at the sending end keeps track of the sent, not yet
      acknowledged messages, and blocks the sender if this number ever reaches
      512 unacknowledged messages. When the missing acknowledge arrives, the
      socket is then woken up for renewed transmission. This works well for
      keeping the message flow running, as it almost never happens that a
      sender socket is blocked this way.
      
      A problem with the current mechanism is that it potentially is very
      memory consuming. Since we don't distinguish between small and large
      messages, we have to dimension the socket receive buffer according
      to a worst-case of both. I.e., the window size must be chosen large
      enough to sustain a reasonable throughput even for the smallest
      messages, while we must still consider a scenario where all messages
      are of maximum size. Hence, the current fix window size of 512 messages
      and a maximum message size of 66k results in a receive buffer of 66 MB
      when truesize(66k) = 131k is taken into account. It is possible to do
      much better.
      
      This commit introduces an algorithm where we instead use 1024-byte
      blocks as base unit. This unit, always rounded upwards from the
      actual message size, is used when we advertise windows as well as when
      we count and acknowledge transmitted data. The advertised window is
      based on the configured receive buffer size in such a way that even
      the worst-case truesize/msgsize ratio always is covered. Since the
      smallest possible message size (from a flow control viewpoint) now is
      1024 bytes, we can safely assume this ratio to be less than four, which
      is the value we are now using.
      
      This way, we have been able to reduce the default receive buffer size
      from 66 MB to 2 MB with maintained performance.
      
      In order to keep this solution backwards compatible, we introduce a
      new capability bit in the discovery protocol, and use this throughout
      the message sending/reception path to always select the right unit.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      10724cc7
    • J
      tipc: propagate peer node capabilities to socket layer · 60020e18
      Jon Paul Maloy 提交于
      During neighbor discovery, nodes advertise their capabilities as a bit
      map in a dedicated 16-bit field in the discovery message header. This
      bit map has so far only be stored in the node structure on the peer
      nodes, but we now see the need to keep a copy even in the socket
      structure.
      
      This commit adds this functionality.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60020e18
    • J
      tipc: re-enable compensation for socket receive buffer double counting · 7c8bcfb1
      Jon Paul Maloy 提交于
      In the refactoring commit d570d864 ("tipc: enqueue arrived buffers
      in socket in separate function") we did by accident replace the test
      
      if (sk->sk_backlog.len == 0)
           atomic_set(&tsk->dupl_rcvcnt, 0);
      
      with
      
      if (sk->sk_backlog.len)
           atomic_set(&tsk->dupl_rcvcnt, 0);
      
      This effectively disables the compensation we have for the double
      receive buffer accounting that occurs temporarily when buffers are
      moved from the backlog to the socket receive queue. Until now, this
      has gone unnoticed because of the large receive buffer limits we are
      applying, but becomes indispensable when we reduce this buffer limit
      later in this series.
      
      We now fix this by inverting the mentioned condition.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c8bcfb1