1. 26 Aug, 2015 5 commits
    • RDS: always free recv frag as we free its ring entry · 43962dd7
      Committed by santosh.shilimkar@oracle.com
      We were still seeing rare occurrences of the WARN_ON(recv->r_frag) which
      indicates that the recv refill path was finding allocated frags in ring
      entries that were marked free. These were usually followed by OOM crashes.
      They only seem to be occurring in the presence of completion errors and
      connection resets.
      
      This patch ensures that we free the frag as we mark the ring entry free.
      This should stop the refill path from finding allocated frags in ring
      entries that were marked free.
      Reviewed-by: Ajaykumar Hotchandani <ajaykumar.hotchandani@oracle.com>
      Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
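The invariant described in this commit (a ring entry marked free never still holds a fragment) can be illustrated with a small Python model. This is only a sketch of the idea, not the RDS code: `RecvEntry`, `post_recv`, `release_entry` and the `free_frag` callback are hypothetical names introduced for illustration.

```python
class RecvEntry:
    """One receive-ring slot; `frag` stands in for the allocated fragment."""

    def __init__(self):
        self.frag = None
        self.in_use = False


def post_recv(entry, frag):
    # Refill path: a free slot must not still hold a fragment.
    # This plays the role of the WARN_ON(recv->r_frag) check.
    assert not entry.in_use and entry.frag is None
    entry.frag = frag
    entry.in_use = True


def release_entry(entry, free_frag):
    # Free the fragment at the same moment the slot is marked free,
    # so the two pieces of state cannot get out of step.
    if entry.frag is not None:
        free_frag(entry.frag)
        entry.frag = None
    entry.in_use = False
```

With release done this way, a later `post_recv()` on the same slot always finds it empty, regardless of whether the release happened on a normal completion or on an error path.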
    • RDS: restore return value in rds_cmsg_rdma_args() · 1d2e3f39
      Committed by santosh.shilimkar@oracle.com
      In rds_cmsg_rdma_args(), 'ret' holds the result of rds_pin_pages(), which
      returns the number of pinned pages on success. That same value is then
      returned to the caller of rds_cmsg_rdma_args() on success, which is not intended.
      
      Commit f4a3fc03 ("RDS: Clean up error handling in rds_cmsg_rdma_args")
      removed the 'ret = 0' line which broke RDS RDMA mode.
      
      Fix it by restoring the return value on rds_pin_pages() success
      keeping the clean-up in place.
      Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
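The bug pattern here (a helper's positive success value leaking out as the caller's return code) can be shown with a short Python model. It is a sketch under stated assumptions: `pin_pages` is a hypothetical stand-in for rds_pin_pages(), and the error value is chosen only for illustration.

```python
EINVAL = 22  # illustrative errno value


def pin_pages(requested):
    """Hypothetical stand-in: returns pages pinned, or a negative errno."""
    return requested if requested > 0 else -EINVAL


def cmsg_rdma_args(requested):
    ret = pin_pages(requested)
    if ret < 0:
        return ret            # error path: propagate the errno unchanged

    nr_pinned = ret           # keep the page count for later use
    ret = 0                   # the restored line: success is reported as 0

    # Further setup would use nr_pinned here.
    assert nr_pinned == requested
    return ret
```

Without the `ret = 0` line, a successful call pinning three pages would return 3, which a caller checking for a non-zero result would treat as a failure.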
    • tcp: refine pacing rate determination · 43e122b0
      Committed by Eric Dumazet
      When TCP pacing was added back in linux-3.12, we chose
      to apply a fixed ratio of 200 % against current rate,
      to allow probing for optimal throughput even during
      slow start phase, where cwnd can be doubled every other gRTT.
      
      At Google, we found it was better applying a different ratio
      while in Congestion Avoidance phase.
      This ratio was set to 120 %.
      
      We've used the normal tcp_in_slow_start() helper for a while,
      then tuned the condition to select the conservative ratio
      as soon as cwnd >= ssthresh/2 :
      
      - After cwnd reduction, it is safer to ramp up more slowly,
        as we approach optimal cwnd.
      - Initial ramp up (ssthresh == INFINITY) still allows doubling
        cwnd every other RTT.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
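The ratio selection described in this commit message can be sketched as a small Python model. The 200 % and 120 % ratios and the cwnd >= ssthresh/2 condition come from the message; the rate formula is a simplification (ratio × mss × cwnd / srtt), and the constant and function names are assumptions made for illustration.

```python
TCP_INFINITE_SSTHRESH = 0x7FFFFFFF  # assumed sentinel for "no ssthresh yet"

SS_RATIO = 200  # percent, used while ramping up
CA_RATIO = 120  # percent, used once cwnd >= ssthresh / 2


def pacing_ratio(cwnd, ssthresh):
    """Pick the conservative ratio as soon as cwnd >= ssthresh / 2."""
    return SS_RATIO if cwnd < ssthresh // 2 else CA_RATIO


def pacing_rate(mss, cwnd, ssthresh, srtt_sec):
    """Simplified pacing rate in bytes per second."""
    return pacing_ratio(cwnd, ssthresh) * mss * cwnd / (100 * srtt_sec)
```

With an infinite ssthresh (initial ramp up) every realistic cwnd stays below ssthresh/2, so the 200 % ratio applies. After a cwnd reduction, a connection with ssthresh = 100 switches to 120 % once cwnd reaches 50.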
    • xfrm: Use VRF master index if output device is enslaved · 4ec3b28c
      Committed by David Ahern
      Directs route lookups to the VRF table. Compiles out if NET_VRF is not
      enabled. With this patch, ipsec tunnels can be brought up successfully
      in VRFs, even with duplicate network configuration.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: fix slow start after idle vs TSO/GSO · 6f021c62
      Committed by Eric Dumazet
      Slow start after idle might reduce cwnd, but we perform this check
      after the first packet has been cooked and sent.
      
      With TSO/GSO, it means that we might send a full TSO packet
      even if cwnd should have been reduced to IW10.
      
      Moving the SSAI check into skb_entail() makes sense, because
      we slightly reduce the number of times this check is done,
      especially for large send() calls and TCP Small Queues callbacks from
      softirq context.
      
      As Neal pointed out, we also need to perform the check
      if/when receive window opens.
      
      Tested:
      
      The following packetdrill test demonstrates the problem:
      // Test of slow start after idle
      
      `sysctl -q net.ipv4.tcp_slow_start_after_idle=1`
      
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +0    setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +0    bind(3, ..., ...) = 0
      +0    listen(3, 1) = 0
      
      +0    < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      +0    > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
      +.100 < . 1:1(0) ack 1 win 511
      +0    accept(3, ..., ...) = 4
      +0    setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0
      
      +0    write(4, ..., 26000) = 26000
      +0    > . 1:5001(5000) ack 1
      +0    > . 5001:10001(5000) ack 1
      +0    %{ assert tcpi_snd_cwnd == 10 }%
      
      +.100 < . 1:1(0) ack 10001 win 511
      +0    %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }%
      +0    > . 10001:20001(10000) ack 1
      +0    > P. 20001:26001(6000) ack 1
      
      +.100 < . 1:1(0) ack 26001 win 511
      +0    %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }%
      
      +4 write(4, ..., 20000) = 20000
      // If slow start after idle works properly, we should send 5 MSS here (cwnd/2)
      +0    > . 26001:31001(5000) ack 1
      +0    %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }%
      +0    > . 31001:36001(5000) ack 1
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
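The cwnd values asserted in the packetdrill test (36 before the 4-second idle period, 10 after it) can be reproduced with a rough Python model of the restart rule. This is a sketch under assumptions: cwnd is halved once per elapsed RTO and never drops below the initial window (IW10), and the 0.3 s RTO is an assumed value that does not appear in the trace.

```python
def cwnd_after_idle(cwnd, idle_sec, rto_sec, init_cwnd=10):
    """Model of cwnd decay after an idle period longer than the RTO."""
    restart_cwnd = min(init_cwnd, cwnd)   # never raise cwnd on restart
    remaining = idle_sec - rto_sec
    while remaining > 0 and cwnd > restart_cwnd:
        cwnd //= 2                        # one halving per elapsed RTO
        remaining -= rto_sec
    return max(cwnd, restart_cwnd)
```

The point of the fix is the ordering: this reduction has to be applied before the first packet of the new write is built, otherwise a full TSO packet sized for the old cwnd can still leave the host.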
  2. 25 Aug, 2015 27 commits
  3. 24 Aug, 2015 8 commits
    • net: phy: add interrupt support for aquantia phy · 54cf7be9
      Committed by Shaohui Xie
      By implementing config_intr and ack_interrupt, the PHY can now support
      link connect/disconnect interrupts.
      Signed-off-by: Shaohui Xie <Shaohui.Xie@freescale.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'nfc-next-4.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next · d9893d13
      Committed by David S. Miller
      Samuel Ortiz says:
      
      ====================
      NFC 4.3 pull request
      
      This is the NFC pull request for 4.3.
      With this one we have:
      
      - A new driver for Samsung's S3FWRN5 NFC chipset. In order to
        properly support this driver, a few NCI core routines needed
        to be exported. Future drivers like Intel's Fields Peak will
        benefit from this.
      
      - SPI support as a physical transport for STM st21nfcb.
      
      - An additional netlink API for sending replies back to userspace
        from vendor commands.
      
      - 2 small fixes for TI's trf7970a
      
      - A few st-nci fixes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • route: fix breakage after moving lwtunnel state · 751a587a
      Committed by Jiri Benc
      __refcnt and related fields need to be in their own cacheline for performance
      reasons. Commit 61adedf3 ("route: move lwtunnel state to dst_entry")
      broke that on 32bit archs, causing BUILD_BUG_ON in dst_hold to be triggered.
      
      This patch fixes the breakage by moving the lwtunnel state to the end of
      dst_entry on 32bit archs. Unfortunately, this makes it share the cacheline
      with __refcnt and may affect performance, thus further patches may be
      needed.
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Fixes: 61adedf3 ("route: move lwtunnel state to dst_entry")
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge tag 'linux-can-next-for-4.3-20150820' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next · 31fbde99
      Committed by David S. Miller
      Marc Kleine-Budde says:
      
      ====================
      This is a pull request of two patches for net-next.
      
      The first patch is by Nik Nyby and fixes a typo in a function name. The
      second patch by Lucas Stach demotes register output to debug level.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'tipc-failover-fixes' · c5f98b56
      Committed by David S. Miller
      Jon Maloy says:
      
      ====================
      tipc: fix link failover/synch problems
      
      We fix three problems with the new link failover/synch implementation,
      which was introduced earlier in this release cycle. They are all related
      to situations where there is a very short interval between the disabling
      and enabling of interfaces.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: fix stale link problem during synchronization · 2be80c2d
      Committed by Jon Paul Maloy
      Recent changes to the link synchronization mean that we can now just
      drop packets arriving on the synchronizing link before the synch point
      is reached. This has led to significant simplifications to the
      implementation, but also turns out to have a flip side that we need
      to consider.
      
      Under unlucky circumstances, the two endpoints may end up
      repeatedly dropping each other's packets, while immediately
      asking for retransmission of the same packets, just to drop
      them once more. This pattern will eventually be broken when
      the synch point is reached on the other link, but before that,
      the endpoints may have arrived at the retransmission limit
      (stale counter) that indicates that the link should be broken.
      We see this happen on rare occasions.
      
      The fix for this is to not ask for retransmissions when a link is in
      state LINK_SYNCHING. The fact that the link has reached this state
      means that it has already received the first SYNCH packet, and that it
      knows the synch point. Hence, it doesn't need any more packets until the
      other link has reached the synch point, whereafter it can go ahead and
      ask for the missing packets.
      
      However, because of the reduced traffic on the synching link that
      follows this change, it may now take longer to discover that the
      synch point has been reached. We compensate for this by letting all
      packets, on any of the links, trigger a check for synchronization
      termination. This is possible because the packets themselves don't
      contain any information that is needed for discovering this condition.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
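The two rules introduced by this commit can be modelled in a few lines of Python. This is an illustrative sketch, not the TIPC implementation: the function names are hypothetical, and the synch-point comparison assumes plain integers and ignores sequence number wraparound.

```python
LINK_ESTABLISHED = "LINK_ESTABLISHED"
LINK_SYNCHING = "LINK_SYNCHING"


def should_request_retransmission(link_state, gap_detected):
    """Return True if a retransmission request should be sent for a gap."""
    if link_state == LINK_SYNCHING:
        # The synch point is already known and packets before it are
        # dropped anyway, so asking again only feeds the stale counter.
        return False
    return gap_detected


def synch_point_reached(other_link_next_expected, synch_point):
    """Check that any packet on any link may trigger.

    It relies only on link state, not on the contents of the packet.
    Assumption: integer comparison without wraparound handling.
    """
    return other_link_next_expected >= synch_point
```

Because the termination check needs nothing from the packet itself, it can be run on every arrival, which compensates for the reduced traffic on the synching link.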
    • tipc: interrupt link synchronization when a link goes down · 5ae2f8e6
      Committed by Jon Paul Maloy
      When we introduced the new link failover/synch mechanism
      in commit 6e498158
      ("tipc: move link synch and failover to link aggregation level"),
      we missed the case when the non-tunnel link goes down during the link
      synchronization period. In this case the tunnel link will remain in
      state LINK_SYNCHING, which leads to unpredictable behavior when
      the failover procedure is initiated.
      
      In this commit, we ensure that the node and remaining link go
      back to regular communication state (SELF_UP_PEER_UP/LINK_ESTABLISHED)
      when one of the parallel links goes down. We also ensure that we don't
      re-enter synch mode if subsequent SYNCH packets arrive on the remaining
      link.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: eliminate risk of premature link setup during failover · 17b20630
      Committed by Jon Paul Maloy
      When a link goes down, and there is still a working link towards its
      destination node, a failover is initiated, and the failed link is not
      allowed to re-establish until that procedure is finished. To ensure
      this, the concerned link endpoints are set to state LINK_FAILINGOVER,
      and the node endpoints to NODE_FAILINGOVER during the failover period.
      
      However, if the link reset is due to a disabled bearer, the
      corresponding link endpoint is deleted, and only the node endpoint knows
      about the ongoing failover. Now, if the disabled bearer is re-enabled
      during the failover period, the discovery mechanism may create a new
      link endpoint that is ready to be established, even though this is not
      permitted. This situation may cause both the ongoing failover and any
      subsequent link synchronization to fail.
      
      In this commit, we ensure that a newly created link goes directly to
      state LINK_FAILINGOVER if the corresponding node state is
      NODE_FAILINGOVER. This eliminates the problem described above.
      
      Furthermore, we tighten the criteria for which packets are allowed
      to end a failover state in the function tipc_node_check_state().
      By checking that the receiving link is up and running, instead of just
      checking that it is not in failover mode, we eliminate the risk that
      protocol packets from the re-created link may cause the failover to
      be prematurely terminated.
      Reviewed-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
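Both changes in this commit can be sketched as a small Python model. The state names LINK_FAILINGOVER and NODE_FAILINGOVER come from the message; everything else is an assumption made for illustration, including the function names, the LINK_RESET default for a fresh link, and which states count as "up and running".

```python
NODE_FAILINGOVER = "NODE_FAILINGOVER"
SELF_UP_PEER_UP = "SELF_UP_PEER_UP"

LINK_FAILINGOVER = "LINK_FAILINGOVER"
LINK_ESTABLISHED = "LINK_ESTABLISHED"
LINK_RESET = "LINK_RESET"          # assumed default state of a fresh link

UP_STATES = {LINK_ESTABLISHED}     # assumption: states treated as "up"


def initial_link_state(node_state):
    """A link created during a failover starts out in failover state."""
    if node_state == NODE_FAILINGOVER:
        return LINK_FAILINGOVER
    return LINK_RESET


def may_end_failover_old(rcv_link_state):
    # Previous criterion: the receiving link is merely not failing over.
    return rcv_link_state != LINK_FAILINGOVER


def may_end_failover_new(rcv_link_state):
    # Tightened criterion: the receiving link must be up and running.
    return rcv_link_state in UP_STATES
```

A re-created link that is still in a reset state passes the old test but fails the new one, so its protocol packets can no longer terminate the failover prematurely.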