- 26 8月, 2015 16 次提交
-
-
由 Mukesh Kacker 提交于
rds_send_queue_rm() allows for the "current datagram" being queued to exceed SO_SNDBUF thresholds by checking bytes queued without counting in length of current datagram. (Since sk_sndbuf is set to twice requested SO_SNDBUF value as a kernel heuristic this is usually fine!) If this "current datagram" squeezing past the threshold is itself many times the size of the sk_sndbuf threshold itself then even twice the SO_SNDBUF does not save us and it gets queued but cannot be transmitted. Threads block and deadlock and device becomes unusable. The check for this datagram not exceeding SNDBUF thresholds (EMSGSIZE) is not done on this datagram as that check is only done if queueing attempt fails. (Datagrams that follow this datagram fail queueing attempts, go through the check and eventually trip EMSGSIZE error but zero length datagrams silently fail!) This fix moves the check for datagrams exceeding SNDBUF limits before any processing or queueing is attempted and returns EMSGSIZE early in the rds_sndmsg() code. This change also ensures that all datagrams get checked for exceeding SNDBUF/sk_sndbuf size limits and the large datagrams that exceed those limits do not get to rds_send_queue_rm() code for processing. Signed-off-by: NMukesh Kacker <mukesh.kacker@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
rds_send_drop_to() is used during socket tear down to find all the messages on the socket and flush them . It can race with the acking code unless it takes the m_rs_lock on each and every message. This plugs a hole where we didn't take m_rs_lock on any message that didn't have the RDS_MSG_ON_CONN set. Taking m_rs_lock avoids double frees and other memory corruptions as the ack code trusts the message m_rs pointer on a socket that had actually been freed. We must take m_rs_lock to access m_rs. Because of lock nesting and rs access, we also need to acquire rs_lock. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Santosh Shilimkar 提交于
During connection resets, we are destroying the rdma id too soon. We can't destroy it when it is still in use. So lets move rdma_destroy_id() after we clear the rings. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
Fix the asserion level since its not fatal and can be hit in normal execution paths. There is no need to take the system down. We keep the WARN_ON() to detect the condition if we get here with bad pages. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
WR(Work Requests )always generate a WC(Work Completion) with signaled send. Default RDS ib code is setup for un-signaled completion. Since RDS connction is persistent, we can end up sending the data even after large-send when the remote end is not active(for any reason). By doing a signaled send at least once per large-send, we can at least detect the problem in work completion handler there by avoiding sending more data to inactive remote. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
rds_send_xmit() marks the rds message map flag after xmit_[rdma/atomic]() which is clearly wrong. We need to maintain the ownership between transport and rds. Also take care of error path. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
This helps to detect the accidental processes/apps trying to destroy the RDS socket which they are sharing with other processes/apps. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
Ensure we don't keep sending the data if the link is congested. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
If we get an ENOMEM during rds_ib_recv_refill, we might never come back and refill again later. Patch makes sure to kick krdsd into helping out. To achieve this we add RDS_RECV_REFILL flag and update in the refill path based on that so that at least some therad will keep posting receive buffers. Since krdsd and softirq both might race for refill, we decide to schedule on work queue based on ring_low instead of ring_empty. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
If the ip address tables hasn't changed, there is no need to remove them only to be added back again. Lets fix it. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
Destroy ib state early during shutdown. Otherwise we can get callbacks after the QP isn't really able to handle them. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
We were still seeing rare occurrences of the WARN_ON(recv->r_frag) which indicates that the recv refill path was finding allocated frags in ring entries that were marked free. These were usually followed by OOM crashes. They only seem to be occurring in the presence of completion errors and connection resets. This patch ensures that we free the frag as we mark the ring entry free. This should stop the refill path from finding allocated frags in ring entries that were marked free. Reviewed-by: NAjaykumar Hotchandani <ajaykumar.hotchandani@oracle.com> Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
In rds_cmsg_rdma_args() 'ret' is used by rds_pin_pages() which returns number of pinned pages on success. And the same value is returned to the caller of rds_cmsg_rdma_args() on success which is not intended. Commit f4a3fc03 ("RDS: Clean up error handling in rds_cmsg_rdma_args") removed the 'ret = 0' line which broke RDS RDMA mode. Fix it by restoring the return value on rds_pin_pages() success keeping the clean-up in place. Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When TCP pacing was added back in linux-3.12, we chose to apply a fixed ratio of 200 % against current rate, to allow probing for optimal throughput even during slow start phase, where cwnd can be doubled every other gRTT. At Google, we found it was better applying a different ratio while in Congestion Avoidance phase. This ratio was set to 120 %. We've used the normal tcp_in_slow_start() helper for a while, then tuned the condition to select the conservative ratio as soon as cwnd >= ssthresh/2 : - After cwnd reduction, it is safer to ramp up more slowly, as we approach optimal cwnd. - Initial ramp up (ssthresh == INFINITY) still allows doubling cwnd every other RTT. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Directs route lookups to VRF table. Compiles out if NET_VRF is not enabled. With this patch able to successfully bring up ipsec tunnels in VRFs, even with duplicate network configuration. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Acked-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
slow start after idle might reduce cwnd, but we perform this after first packet was cooked and sent. With TSO/GSO, it means that we might send a full TSO packet even if cwnd should have been reduced to IW10. Moving the SSAI check in skb_entail() makes sense, because we slightly reduce number of times this check is done, especially for large send() and TCP Small queue callbacks from softirq context. As Neal pointed out, we also need to perform the check if/when receive window opens. Tested: Following packetdrill test demonstrates the problem // Test of slow start after idle `sysctl -q net.ipv4.tcp_slow_start_after_idle=1` 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 65535 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6> +.100 < . 1:1(0) ack 1 win 511 +0 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_SOCKET, SO_SNDBUF, [200000], 4) = 0 +0 write(4, ..., 26000) = 26000 +0 > . 1:5001(5000) ack 1 +0 > . 5001:10001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10 }% +.100 < . 1:1(0) ack 10001 win 511 +0 %{ assert tcpi_snd_cwnd == 20, tcpi_snd_cwnd }% +0 > . 10001:20001(10000) ack 1 +0 > P. 20001:26001(6000) ack 1 +.100 < . 1:1(0) ack 26001 win 511 +0 %{ assert tcpi_snd_cwnd == 36, tcpi_snd_cwnd }% +4 write(4, ..., 20000) = 20000 // If slow start after idle works properly, we should send 5 MSS here (cwnd/2) +0 > . 26001:31001(5000) ack 1 +0 %{ assert tcpi_snd_cwnd == 10, tcpi_snd_cwnd }% +0 > . 31001:36001(5000) ack 1 Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 8月, 2015 24 次提交
-
-
由 David S. Miller 提交于
Taku Izumi says: ==================== FUJITSU Extended Socket network device driver This patchsets adds FUJITSU Extended Socket network device driver. Extended Socket network device is a shared memory based high-speed network interface between Extended Partitions of PRIMEQUEST 2000 E2 series. You can get some information about Extended Partition and Extended Socket by referring the following manual. http://globalsp.ts.fujitsu.com/dmsp/Publications/public/CA92344-0537.pdf 3.2.1 Extended Partitioning 3.2.2 Extended Socke v2.2 -> v3: - Fix up according to David's comment (No functional change) ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds implementation for ethtool support. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds implementation of handling IRQ of other receiver's receive cancellation request. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds epstop_task. This task is used to process other receiver's cancellation request. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds update_zone_task. Zoning information can be changed by user. This task is used to monitor if zoning information is changed or not. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds unshare_watch_task. Shared buffer's status can be changed into unshared. This task is used to monitor shared buffer's status. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds force_close_task. This task is used to close network device forcibly. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds interrupt_watch_task. This task is used to prevent delay of interrupts. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds net_device_ops.ndo_vlan_rx_add_vid and net_device_ops.ndo_vlan_rx_kill_vid callback. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds net_device_ops.ndo_tx_timeout callback. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds net_device_ops.ndo_change_mtu. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds net_device_ops.ndo_get_stats64 callback. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds NAPI polling function and receive related work. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds tx_stall_task. When receiver's buffer is full, sender stops its tx queue. This task is used to monitor receiver's status and when receiver's buffer is avairable, it resumes tx queue. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch add raise_intr_rxdata_task. Extended Socket Network Device is shared memory based, so someone's transmission denotes other's reception. In order to notify receivers, sender has to raise interruption of receivers. raise_intr_rxdata_task does this work. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds net_device_ops.ndo_start_xmit callback, which is called when sending packets. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds net_device_ops.ndo_open and .ndo_stop callback. These function is called when network device activation and deactivation. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds buffer address regist/unregistration routine. This function is mainly invoked when network device's activation (open) and deactivation (close) in order to retist/unregist shared buffer address. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds ES information acquisition routine. ES information can be retrieved issuing information request command. ES information includes which receiver is same zone. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch implements platform_driver's .probe and .remove routine, and also adds board specific private data structure. This driver registers net_device at platform_driver's .probe routine and unregisters net_device at its .remove routine. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds hardware cleanup routine to be invoked at driver's .remove routine. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds hardware initialization routine to be invoked at driver's .probe routine. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Taku Izumi 提交于
This patch adds the basic code of FUJITSU Extended Socket Network Device driver. When "PNP0C02" is found in ACPI DSDT, it evaluates "_STR" to check if "PNP0C02" is for Extended Socket device driver and retrieves ACPI resource information. Then creates platform_device. Signed-off-by: NTaku Izumi <izumi.taku@jp.fujitsu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Loganaden Velvindron 提交于
This BQL patch is based on work done by Tino Reichardt. Tested on 0000:05:00.0: 3Com PCI 3c905C Tornado at ffffc90000e6e000 by running Flent several times. Signed-off-by: NLoganaden Velvindron <logan@elandsys.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-