- 05 5月, 2016 8 次提交
-
-
由 Jiri Benc 提交于
For ipgre interfaces in collect metadata mode, receive also traffic with encapsulated Ethernet headers. The lwtunnel users are supposed to sort this out correctly. This allows to have mixed Ethernet + L3-only traffic on the same lwtunnel interface. This is the same way as VXLAN-GPE behaves. To keep backwards compatibility and prevent any surprises, gretap interfaces have priority in receiving packets with Ethernet headers. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
This will allow to make the pull dependent on the tunnel type. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
The call to gre_parse_header is either followed by iptunnel_pull_header, or in the case of ICMP error path, the actual header is not accessed at all. In the first case, iptunnel_pull_header will call pskb_may_pull anyway and it's pointless to do it twice. The only difference is what call will fail with what error code but the net effect is still the same in all call sites. In the second case, pskb_may_pull is pointless, as skb->data is at the outer IP header and not at the GRE header. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This change makes it so that we will strip the TSO_MANGLEID bit if TSO is not present. This way we will also handle ECN correctly of TSO is not present. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch addresses a possible issue that can occur if we get into any odd corner cases where we support TSO for a given protocol but not the checksum or scatter-gather offload. There are few drivers floating around that setup their tunnels this way and by enforcing the checksum piece we can avoid mangling any frames. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
In the event that the number of partial segments is equal to 1 we don't really need to perform partial segmentation offload. As such we should skip multiplying the MSS and instead just clear the partial_segs value since it will not provide any gain to advertise the frame as being GSO when it is a single frame. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
It's easier for gre_parse_header to return the header length instead of filing it into a parameter. That way, the callers that don't care about the header length can just check whether the returned value is lower than zero. In gre_err, the tunnel header must not be pulled. See commit b7f8fe25 ("gre: do not pull header in ICMP error processing") for details. This patch reduces the conflict between the mentioned commit and commit 95f5c64c ("gre: Move utility functions to common headers"). Signed-off-by: NJiri Benc <jbenc@redhat.com> Acked-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Under high rx pressure, it is possible tcp_sendmsg() never has a chance to allocate an skb and loop forever as sk_flush_backlog() would always return true. Fix this by calling sk_flush_backlog() only if one skb had been allocated and filled before last backlog check. Fixes: d41a69f1 ("tcp: make tcp_sendmsg() aware of socket backlog") Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 5月, 2016 11 次提交
-
-
由 Nicolas Dichtel 提交于
The handler 'ila_fill_encap_info' adds one attribute: ILA_ATTR_LOCATOR. Fixes: 65d7ab8d ("net: Identifier Locator Addressing module") CC: Tom Herbert <tom@herbertland.com> Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Wang 提交于
In the sendmsg function of UDP, raw, ICMP and l2tp sockets, we use local variables like hlimits, tclass, opt and dontfrag and pass them to corresponding functions like ip6_make_skb, ip6_append_data and xxx_push_pending_frames. This is not a good practice and makes it hard to add new parameters. This fix introduces a new struct ipcm6_cookie similar to ipcm_cookie in ipv4 and include the above mentioned variables. And we only pass the pointer to this structure to corresponding functions. This makes it easier to add new parameters in the future and makes the function cleaner. Signed-off-by: NWei Wang <weiwan@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sowmini Varadhan 提交于
An arbitration scheme for duelling SYNs is implemented as part of commit 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()") which ensures that both nodes involved will arrive at the same arbitration decision. However, this needs to be synchronized with an outgoing SYN to be generated by rds_tcp_conn_connect(). This commit achieves the synchronization through the t_conn_lock mutex in struct rds_tcp_connection. The rds_conn_state is checked in rds_tcp_conn_connect() after acquiring the t_conn_lock mutex. A SYN is sent out only if the RDS connection is not already UP (an UP would indicate that rds_tcp_accept_one() has completed 3WH, so no SYN needs to be generated). Similarly, the rds_conn_state is checked in rds_tcp_accept_one() after acquiring the t_conn_lock mutex. The only acceptable states (to allow continuation of the arbitration logic) are UP (i.e., outgoing SYN was SYN-ACKed by peer after it sent us the SYN) or CONNECTING (we sent outgoing SYN before we saw incoming SYN). Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sowmini Varadhan 提交于
There is a race condition between rds_send_xmit -> rds_tcp_xmit and the code that deals with resolution of duelling syns added by commit 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()"). Specifically, we may end up derefencing a null pointer in rds_send_xmit if we have the interleaving sequence: rds_tcp_accept_one rds_send_xmit conn is RDS_CONN_UP, so invoke rds_tcp_xmit tc = conn->c_transport_data rds_tcp_restore_callbacks /* reset t_sock */ null ptr deref from tc->t_sock The race condition can be avoided without adding the overhead of additional locking in the xmit path: have rds_tcp_accept_one wait for rds_tcp_xmit threads to complete before resetting callbacks. The synchronization can be done in the same manner as rds_conn_shutdown(). First set the rds_conn_state to something other than RDS_CONN_UP (so that new threads cannot get into rds_tcp_xmit()), then wait for RDS_IN_XMIT to be cleared in the conn->c_flags indicating that any threads in rds_tcp_xmit are done. Fixes: 241b2719 ("RDS-TCP: Reset tcp callbacks if re-using an outgoing socket in rds_tcp_accept_one()") Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Hosts sending lot of ACK packets exhibit high sock_wfree() cost because of cache line miss to test SOCK_USE_WRITE_QUEUE We could move this flag close to sk_wmem_alloc but it is better to perform the atomic_sub_and_test() on a clean cache line, as it avoid one extra bus transaction. skb_orphan_partial() can also have a fast track for packets that either are TCP acks, or already went through another skb_orphan_partial() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
In the case of the mlx4 and mlx5 driver they do not support IPv6 checksum offload for tunnels. With this being the case we should disable GSO in addition to the checksum offload features when we find that a device cannot perform a checksum on a given packet type. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
There are two flow control mechanisms in TIPC; one at link level that handles network congestion, burst control, and retransmission, and one at connection level which' only remaining task is to prevent overflow in the receiving socket buffer. In TIPC, the latter task has to be solved end-to-end because messages can not be thrown away once they have been accepted and delivered upwards from the link layer, i.e, we can never permit the receive buffer to overflow. Currently, this algorithm is message based. A counter in the receiving socket keeps track of number of consumed messages, and sends a dedicated acknowledge message back to the sender for each 256 consumed message. A counter at the sending end keeps track of the sent, not yet acknowledged messages, and blocks the sender if this number ever reaches 512 unacknowledged messages. When the missing acknowledge arrives, the socket is then woken up for renewed transmission. This works well for keeping the message flow running, as it almost never happens that a sender socket is blocked this way. A problem with the current mechanism is that it potentially is very memory consuming. Since we don't distinguish between small and large messages, we have to dimension the socket receive buffer according to a worst-case of both. I.e., the window size must be chosen large enough to sustain a reasonable throughput even for the smallest messages, while we must still consider a scenario where all messages are of maximum size. Hence, the current fix window size of 512 messages and a maximum message size of 66k results in a receive buffer of 66 MB when truesize(66k) = 131k is taken into account. It is possible to do much better. This commit introduces an algorithm where we instead use 1024-byte blocks as base unit. This unit, always rounded upwards from the actual message size, is used when we advertise windows as well as when we count and acknowledge transmitted data. The advertised window is based on the configured receive buffer size in such a way that even the worst-case truesize/msgsize ratio always is covered. Since the smallest possible message size (from a flow control viewpoint) now is 1024 bytes, we can safely assume this ratio to be less than four, which is the value we are now using. This way, we have been able to reduce the default receive buffer size from 66 MB to 2 MB with maintained performance. In order to keep this solution backwards compatible, we introduce a new capability bit in the discovery protocol, and use this throughout the message sending/reception path to always select the right unit. Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
During neighbor discovery, nodes advertise their capabilities as a bit map in a dedicated 16-bit field in the discovery message header. This bit map has so far only be stored in the node structure on the peer nodes, but we now see the need to keep a copy even in the socket structure. This commit adds this functionality. Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Paul Maloy 提交于
In the refactoring commit d570d864 ("tipc: enqueue arrived buffers in socket in separate function") we did by accident replace the test if (sk->sk_backlog.len == 0) atomic_set(&tsk->dupl_rcvcnt, 0); with if (sk->sk_backlog.len) atomic_set(&tsk->dupl_rcvcnt, 0); This effectively disables the compensation we have for the double receive buffer accounting that occurs temporarily when buffers are moved from the backlog to the socket receive queue. Until now, this has gone unnoticed because of the large receive buffer limits we are applying, but becomes indispensable when we reduce this buffer limit later in this series. We now fix this by inverting the mentioned condition. Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Julia Lawall 提交于
The vsock_transport structure is never modified, so declare it as const. Done with the help of Coccinelle. Signed-off-by: NJulia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
In presence of inelastic flows and stress, we can call fq_codel_drop() for every packet entering fq_codel qdisc. fq_codel_drop() is quite expensive, as it does a linear scan of 4 KB of memory to find a fat flow. Once found, it drops the oldest packet of this flow. Instead of dropping a single packet, try to drop 50% of the backlog of this fat flow, with a configurable limit of 64 packets per round. TCA_FQ_CODEL_DROP_BATCH_SIZE is the new attribute to make this limit configurable. With this strategy the 4 KB search is amortized to a single cache line per drop [1], so fq_codel_drop() no longer appears at the top of kernel profile in presence of few inelastic flows. [1] Assuming a 64byte cache line, and 1024 buckets Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NDave Taht <dave.taht@gmail.com> Cc: Jonathan Morton <chromatix99@gmail.com> Acked-by: NJesper Dangaard Brouer <brouer@redhat.com> Acked-by: Dave Taht Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 03 5月, 2016 20 次提交
-
-
由 Neil Horman 提交于
This was recently reported to me, and reproduced on the latest net kernel, when attempting to run netperf from a host that had a netem qdisc attached to the egress interface: [ 788.073771] ---------------------[ cut here ]--------------------------- [ 788.096716] WARNING: at net/core/dev.c:2253 skb_warn_bad_offload+0xcd/0xda() [ 788.129521] bnx2: caps=(0x00000001801949b3, 0x0000000000000000) len=2962 data_len=0 gso_size=1448 gso_type=1 ip_summed=3 [ 788.182150] Modules linked in: sch_netem kvm_amd kvm crc32_pclmul ipmi_ssif ghash_clmulni_intel sp5100_tco amd64_edac_mod aesni_intel lrw gf128mul glue_helper ablk_helper edac_mce_amd cryptd pcspkr sg edac_core hpilo ipmi_si i2c_piix4 k10temp fam15h_power hpwdt ipmi_msghandler shpchp acpi_power_meter pcc_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ahci ata_generic pata_acpi ttm libahci crct10dif_pclmul pata_atiixp tg3 libata crct10dif_common drm crc32c_intel ptp serio_raw bnx2 r8169 hpsa pps_core i2c_core mii dm_mirror dm_region_hash dm_log dm_mod [ 788.465294] CPU: 16 PID: 0 Comm: swapper/16 Tainted: G W ------------ 3.10.0-327.el7.x86_64 #1 [ 788.511521] Hardware name: HP ProLiant DL385p Gen8, BIOS A28 12/17/2012 [ 788.542260] ffff880437c036b8 f7afc56532a53db9 ffff880437c03670 ffffffff816351f1 [ 788.576332] ffff880437c036a8 ffffffff8107b200 ffff880633e74200 ffff880231674000 [ 788.611943] 0000000000000001 0000000000000003 0000000000000000 ffff880437c03710 [ 788.647241] Call Trace: [ 788.658817] <IRQ> [<ffffffff816351f1>] dump_stack+0x19/0x1b [ 788.686193] [<ffffffff8107b200>] warn_slowpath_common+0x70/0xb0 [ 788.713803] [<ffffffff8107b29c>] warn_slowpath_fmt+0x5c/0x80 [ 788.741314] [<ffffffff812f92f3>] ? ___ratelimit+0x93/0x100 [ 788.767018] [<ffffffff81637f49>] skb_warn_bad_offload+0xcd/0xda [ 788.796117] [<ffffffff8152950c>] skb_checksum_help+0x17c/0x190 [ 788.823392] [<ffffffffa01463a1>] netem_enqueue+0x741/0x7c0 [sch_netem] [ 788.854487] [<ffffffff8152cb58>] dev_queue_xmit+0x2a8/0x570 [ 788.880870] [<ffffffff8156ae1d>] ip_finish_output+0x53d/0x7d0 ... The problem occurs because netem is not prepared to handle GSO packets (as it uses skb_checksum_help in its enqueue path, which cannot manipulate these frames). The solution I think is to simply segment the skb in a simmilar fashion to the way we do in __dev_queue_xmit (via validate_xmit_skb), with some minor changes. When we decide to corrupt an skb, if the frame is GSO, we segment it, corrupt the first segment, and enqueue the remaining ones. tested successfully by myself on the latest net kernel, to which this applies Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> CC: Jamal Hadi Salim <jhs@mojatatu.com> CC: "David S. Miller" <davem@davemloft.net> CC: netem@lists.linux-foundation.org CC: eric.dumazet@gmail.com CC: stephen@networkplumber.org Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Locally generated TCP GSO packets having to go through a GRE/SIT/IPIP tunnel have to go through an expensive skb_unclone() Reallocating skb->head is a lot of work. Test should really check if a 'real clone' of the packet was done. TCP does not care if the original gso_type is changed while the packet travels in the stack. This adds skb_header_unclone() which is a variant of skb_clone() using skb_header_cloned() check instead of skb_cloned(). This variant can probably be used from other points. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
Add a new LINK_XSTATS_TYPE_BRIDGE attribute and implement the RTM_GETSTATS callbacks for IFLA_STATS_LINK_XSTATS (fill_linkxstats and get_linkxstats_size) in order to export the per-vlan stats. The paddings were added because soon these fields will be needed for per-port per-vlan stats (or something else if someone beats me to it) so avoiding at least a few more netlink attributes. Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
Add support for per-VLAN Tx/Rx statistics. Every global vlan context gets allocated a per-cpu stats which is then set in each per-port vlan context for quick access. The br_allowed_ingress() common function is used to account for Rx packets and the br_handle_vlan() common function is used to account for Tx packets. Stats accounting is performed only if the bridge-wide vlan_stats_enabled option is set either via sysfs or netlink. A struct hole between vlan_enabled and vlan_proto is used for the new option so it is in the same cache line. Currently it is binary (on/off) but it is intentionally restricted to exactly 0 and 1 since other values will be used in the future for different purposes (e.g. per-port stats). Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
Add callbacks to calculate the size and fill link extended statistics which can be split into multiple messages and are dumped via the new rtnl stats API (RTM_GETSTATS) with the IFLA_STATS_LINK_XSTATS attribute. Also add that attribute to the idx mask check since it is expected to be able to save state and resume dumping (e.g. future bridge per-vlan stats will be dumped via this attribute and callbacks). Each link type should nest its private attributes under the per-link type attribute. This allows to have any number of separated private attributes and to avoid one call to get the dev link type. Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
The new prividx argument allows the current dumping device to save a private state counter which would enable it to continue dumping from where it left off. And the idxattr is used to save the current idx user so multiple prividx using attributes can be requested at the same time as suggested by Roopa Prabhu. Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Changes in GREv6 transmit path: - Call gre_checksum, remove gre6_checksum - Rename ip6gre_xmit2 to __gre6_xmit - Call gre_build_header utility function - Call ip6_tnl_xmit common function - Call ip6_tnl_change_mtu, eliminate ip6gre_tunnel_change_mtu Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
A few generic changes to generalize tunnels in IPv6: - Export ip6_tnl_change_mtu so that it can be called by ip6_gre - Add tun_hlen to ip6_tnl structure. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Create common functions for both IPv4 and IPv6 GRE in transmit. These are put into gre.h. Common functions are for: - GRE checksum calculation. Move gre_checksum to gre.h. - Building a GRE header. Move GRE build_header and rename gre_build_header. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This patch renames ip6_tnl_xmit2 to ip6_tnl_xmit and exports it. Other users like GRE will be able to call this. The original ip6_tnl_xmit function is renamed to ip6_tnl_start_xmit (this is an ndo_start_xmit function). Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
- Create gre_rcv function. This calls gre_parse_header and ip6gre_rcv. - Call ip6_tnl_rcv. Doing this and using gre_parse_header eliminates most of the code in ip6gre_rcv. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Several of the GRE functions defined in net/ipv4/ip_gre.c are usable for IPv6 GRE implementation (that is they are protocol agnostic). These include: - GRE flag handling functions are move to gre.h - GRE build_header is moved to gre.h and renamed gre_build_header - parse_gre_header is moved to gre_demux.c and renamed gre_parse_header - iptunnel_pull_header is taken out of gre_parse_header. This is now done by caller. The header length is returned from gre_parse_header in an int* argument. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Some basic changes to make IPv6 tunnel receive path look more like IPv4 path: - Make ip6_tnl_rcv non-static so that GREv6 and others can call it - Make ip6_tnl_rcv look like ip_tunnel_rcv - Switch to gro_cells_receive - Make ip6_tnl_rcv non-static and export it Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Large sendmsg()/write() hold socket lock for the duration of the call, unless sk->sk_sndbuf limit is hit. This is bad because incoming packets are parked into socket backlog for a long time. Critical decisions like fast retransmit might be delayed. Receivers have to maintain a big out of order queue with additional cpu overhead, and also possible stalls in TX once windows are full. Bidirectional flows are particularly hurt since the backlog can become quite big if the copy from user space triggers IO (page faults) Some applications learnt to use sendmsg() (or sendmmsg()) with small chunks to avoid this issue. Kernel should know better, right ? Add a generic sk_flush_backlog() helper and use it right before a new skb is allocated. Typically we put 64KB of payload per skb (unless MSG_EOR is requested) and checking socket backlog every 64KB gives good results. As a matter of fact, tests with TSO/GSO disabled give very nice results, as we manage to keep a small write queue and smaller perceived rtt. Note that sk_flush_backlog() maintains socket ownership, so is not equivalent to a {release_sock(sk); lock_sock(sk);}, to ensure implicit atomicity rules that sendmsg() was giving to (possibly buggy) applications. In this simple implementation, I chose to not call tcp_release_cb(), but we might consider this later. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Socket backlog processing is a major latency source. With current TCP socket sk_rcvbuf limits, I have sampled __release_sock() holding cpu for more than 5 ms, and packets being dropped by the NIC once ring buffer is filled. All users are now ready to be called from process context, we can unblock BH and let interrupts be serviced faster. cond_resched_softirq() could be removed, as it has no more user. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
sctp_inq_push() will soon be called without BH being blocked when generic socket code flushes the socket backlog. It is very possible SCTP can be converted to not rely on BH, but this needs to be done by SCTP experts. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
UDP uses the generic socket backlog code, and this will soon be changed to not disable BH when protocol is called back. We need to use appropriate SNMP accessors. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
DCCP uses the generic backlog code, and this will soon be changed to not disable BH when protocol is called back. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
AFAIK, nothing in current TCP stack absolutely wants BH being disabled once socket is owned by a thread running in process context. As mentioned in my prior patch ("tcp: give prequeue mode some care"), processing a batch of packets might take time, better not block BH at all. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We want to to make TCP stack preemptible, as draining prequeue and backlog queues can take lot of time. Many SNMP updates were assuming that BH (and preemption) was disabled. Need to convert some __NET_INC_STATS() calls to NET_INC_STATS() and some __TCP_INC_STATS() to TCP_INC_STATS() Before using this_cpu_ptr(net->ipv4.tcp_sk) in tcp_v4_send_reset() and tcp_v4_send_ack(), we add an explicit preempt disabled section. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 5月, 2016 1 次提交
-
-
由 Jiri Benc 提交于
iptunnel_pull_header expects that IP header was already pulled; with this expectation, it pulls the tunnel header. This is not true in gre_err. Furthermore, ipv4_update_pmtu and ipv4_redirect expect that skb->data points to the IP header. We cannot pull the tunnel header in this path. It's just a matter of not calling iptunnel_pull_header - we don't need any of its effects. Fixes: bda7bb46 ("gre: Allow multiple protocol listener for gre protocol.") Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-