- 03 5月, 2016 6 次提交
-
-
由 Tom Herbert 提交于
A few generic changes to generalize tunnels in IPv6: - Export ip6_tnl_change_mtu so that it can be called by ip6_gre - Add tun_hlen to ip6_tnl structure. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Create common functions for both IPv4 and IPv6 GRE in transmit. These are put into gre.h. Common functions are for: - GRE checksum calculation. Move gre_checksum to gre.h. - Building a GRE header. Move GRE build_header and rename gre_build_header. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This patch renames ip6_tnl_xmit2 to ip6_tnl_xmit and exports it. Other users like GRE will be able to call this. The original ip6_tnl_xmit function is renamed to ip6_tnl_start_xmit (this is an ndo_start_xmit function). Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Several of the GRE functions defined in net/ipv4/ip_gre.c are usable for IPv6 GRE implementation (that is they are protocol agnostic). These include: - GRE flag handling functions are move to gre.h - GRE build_header is moved to gre.h and renamed gre_build_header - parse_gre_header is moved to gre_demux.c and renamed gre_parse_header - iptunnel_pull_header is taken out of gre_parse_header. This is now done by caller. The header length is returned from gre_parse_header in an int* argument. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Some basic changes to make IPv6 tunnel receive path look more like IPv4 path: - Make ip6_tnl_rcv non-static so that GREv6 and others can call it - Make ip6_tnl_rcv look like ip_tunnel_rcv - Switch to gro_cells_receive - Make ip6_tnl_rcv non-static and export it Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Large sendmsg()/write() hold socket lock for the duration of the call, unless sk->sk_sndbuf limit is hit. This is bad because incoming packets are parked into socket backlog for a long time. Critical decisions like fast retransmit might be delayed. Receivers have to maintain a big out of order queue with additional cpu overhead, and also possible stalls in TX once windows are full. Bidirectional flows are particularly hurt since the backlog can become quite big if the copy from user space triggers IO (page faults) Some applications learnt to use sendmsg() (or sendmmsg()) with small chunks to avoid this issue. Kernel should know better, right ? Add a generic sk_flush_backlog() helper and use it right before a new skb is allocated. Typically we put 64KB of payload per skb (unless MSG_EOR is requested) and checking socket backlog every 64KB gives good results. As a matter of fact, tests with TSO/GSO disabled give very nice results, as we manage to keep a small write queue and smaller perceived rtt. Note that sk_flush_backlog() maintains socket ownership, so is not equivalent to a {release_sock(sk); lock_sock(sk);}, to ensure implicit atomicity rules that sendmsg() was giving to (possibly buggy) applications. In this simple implementation, I chose to not call tcp_release_cb(), but we might consider this later. Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Alexei Starovoitov <ast@fb.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 02 5月, 2016 2 次提交
-
-
由 Sudarsana Reddy Kalluru 提交于
This patch adds the functionality and APIs needed for selftests. It adds the ability to configure the link-mode which is required for the implementation of loopback tests. It adds the APIs for clock test, register test, interrupt test and memory test. Signed-off-by: NSudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com> Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcelo Ricardo Leitner 提交于
Dave Miller pointed out that fb586f25 ("sctp: delay calls to sk_data_ready() as much as possible") may insert latency specially if the receiving application is running on another CPU and that it would be better if we signalled as early as possible. This patch thus basically inverts the logic on fb586f25 and signals it as early as possible, similar to what we had before. Fixes: fb586f25 ("sctp: delay calls to sk_data_ready() as much as possible") Reported-by: NDave Miller <davem@davemloft.net> Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 30 4月, 2016 5 次提交
-
-
由 Maor Gottlieb 提交于
Allocating CPU rmap and add entry for each IRQ. CPU rmap is used in aRFS to get the RX queue number of the RX completion interrupts. Signed-off-by: NMaor Gottlieb <maorg@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Maor Gottlieb 提交于
Currently, consumers of the flow steering infrastructure can't choose their own flow table levels and are limited to one flow table per level. This just waste levels. Instead, we introduce here the possibility to use multiple flow tables in a level. The user is free to connect these flow tables, while following the rule (FTEs in FT of level x could only point to FTs of level y where y > x). In addition this patch switch the order of the create/destroy flow tables of the NIC(vlan and main). Signed-off-by: NMaor Gottlieb <maorg@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Maor Gottlieb 提交于
This API is used for modifying the flow rule destination. This is needed for modifying the pointed flow table by the traffic type classifier rules to point on the aRFS tables. Signed-off-by: NMaor Gottlieb <maorg@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nikolay Aleksandrov 提交于
is_skb_forwardable is not supposed to change anything so constify its arguments Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Guillaume Nault 提交于
Define PPP device handler for use with rtnetlink. The only PPP specific attribute is IFLA_PPP_DEV_FD. It is mandatory and contains the file descriptor of the associated /dev/ppp instance (the file descriptor which would have been used for ioctl(PPPIOCNEWUNIT) in the ioctl-based API). The PPP device is removed when this file descriptor is released (same behaviour as with ioctl based PPP devices). PPP devices created with the rtnetlink API behave like the ones created with ioctl(PPPIOCNEWUNIT). In particular existing ioctls work the same way, no matter how the PPP device was created. The rtnl callbacks are also assigned to ioctl based PPP devices. This way, rtnl messages have the same effect on any PPP devices. The immediate effect is that all PPP devices, even ioctl-based ones, can now be removed with "ip link del". A minor difference still exists between ioctl and rtnl based PPP interfaces: in the device name, the number following the "ppp" prefix corresponds to the PPP unit number for ioctl based devices, while it is just an unrelated incrementing index for rtnl ones. Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 29 4月, 2016 4 次提交
-
-
由 Florian Fainelli 提交于
This patch overloads the DSA master netdev, aka CPU Ethernet MAC to also include switch-side statistics, which is useful for debugging purposes, when the switch is not properly connected to the Ethernet MAC (duplex mismatch, (RG)MII electrical issues etc.). We accomplish this by retaining the original copy of the master netdev's ethtool_ops, and just overload the 3 operations we care about: get_sset_count, get_strings and get_ethtool_stats so as to intercept these calls and call into the original master_netdev ethtool_ops, plus our own. We take this approach as opposed to providing a set of DSA helper functions that would retrive the CPU port's statistics, because the entire purpose of DSA is to allow unmodified Ethernet MAC drivers to be used as CPU conduit interfaces, therefore, statistics overlay in such drivers would simply not scale. The new ethtool -S <iface> output would therefore look like this now: <iface> statistics p<2 digits cpu port number>_<switch MIB counter names> Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Kazior 提交于
mac80211 (which will be the first user of the fq.h) recently started to support software A-MSDU aggregation. It glues skbuffs together into a single one so the backlog accounting needs to be more fine-grained. To avoid backlog sorting logic duplication split it up for re-use. Signed-off-by: NMichal Kazior <michal.kazior@tieto.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Martin KaFai Lau 提交于
This patch adds an eor bit to the TCP_SKB_CB. When MSG_EOR is passed to tcp_sendmsg, the eor bit will be set at the skb containing the last byte of the userland's msg. The eor bit will prevent data from appending to that skb in the future. The change in do_tcp_sendpages is to honor the eor set during the previous tcp_sendmsg(MSG_EOR) call. This patch handles the tcp_sendmsg case. The followup patches will handle other skb coalescing and fragment cases. One potential use case is to use MSG_EOR with SOF_TIMESTAMPING_TX_ACK to get a more accurate TCP ack timestamping on application protocol with multiple outgoing response messages (e.g. HTTP2). Packetdrill script for testing: ~~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 14600) = 14600 0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730 0.200 sendto(4, ..., 730, MSG_EOR, ..., ...) = 730 0.200 > . 1:7301(7300) ack 1 0.200 > P. 7301:14601(7300) ack 1 0.300 < . 1:1(0) ack 14601 win 257 0.300 > P. 14601:15331(730) ack 1 0.300 > P. 15331:16061(730) ack 1 0.400 < . 1:1(0) ack 16061 win 257 0.400 close(4) = 0 0.400 > F. 16061:16061(0) ack 1 0.400 < F. 1:1(0) ack 16062 win 257 0.400 > . 16062:16062(0) ack 2 Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Suggested-by: NEric Dumazet <edumazet@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Soheil Hassas Yeganeh 提交于
The SKBTX_ACK_TSTAMP flag is set in skb_shinfo->tx_flags when the timestamp of the TCP acknowledgement should be reported on error queue. Since accessing skb_shinfo is likely to incur a cache-line miss at the time of receiving the ack, the txstamp_ack bit was added in tcp_skb_cb, which is set iff the SKBTX_ACK_TSTAMP flag is set for an skb. This makes SKBTX_ACK_TSTAMP flag redundant. Remove the SKBTX_ACK_TSTAMP and instead use the txstamp_ack bit everywhere. Note that this frees one bit in shinfo->tx_flags. Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NMartin KaFai Lau <kafai@fb.com> Suggested-by: NWillem de Bruijn <willemb@google.com> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 4月, 2016 20 次提交
-
-
由 Eric Dumazet 提交于
I accidentally replaced BH disabling by preemption disabling in SNMP_ADD_STATS64() and SNMP_UPD_PO_STATS64() on 32bit builds. For 64bit stats on 32bit arch, we really need to disable BH, since the "struct u64_stats_sync syncp" might be manipulated both from process and BH contexts. Fixes: 6aef70a8 ("net: snmp: kill various STATS_USER() helpers") Reported-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Tested-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
SOCKWQ_ASYNC_WAITDATA is set/cleared in sk_wait_data() and equivalent functions, so that sock_wake_async() can send a SIGIO only when necessary. Since these atomic operations are really not needed unless socket expressed interest in FASYNC, we can omit them in most cases. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
SOCKWQ_ASYNC_NOSPACE is tested in sock_wake_async() so that a SIGIO signal is sent when needed. tcp_sendmsg() clears the bit. tcp_poll() sets the bit when stream is not writeable. We can avoid two atomic operations by first checking if socket is actually interested in the FASYNC business (most sockets in real applications do not use AIO, but select()/poll()/epoll()) This also removes one cache line miss to access sk->sk_wq->flags in tcp_sendmsg() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
There is nothing related to BH in SNMP counters anymore, since linux-3.0. Rename helpers to use __ prefix instead of _BH prefix, for contexts where preemption is disabled. This more closely matches convention used to update percpu variables. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
IPv6 ICMP stats are atomics anyway. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename IP6_UPD_PO_STATS_BH() to __IP6_UPD_PO_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename IP6_INC_STATS_BH() to __IP6_INC_STATS() and IP6_ADD_STATS_BH() to __IP6_ADD_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename NET_INC_STATS_BH() to __NET_INC_STATS() and NET_ADD_STATS_BH() to __NET_ADD_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename IP_UPD_PO_STATS_BH() to __IP_UPD_PO_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename IP_ADD_STATS_BH() to __IP_ADD_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename ICMP6_INC_STATS_BH() to __ICMP6_INC_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename IP_INC_STATS_BH() to __IP_INC_STATS(), to better express this is used in non preemptible context. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename SCTP_INC_STATS_BH() to __SCTP_INC_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Remove misleading _BH suffix. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename TCP_INC_STATS_BH() to __TCP_INC_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Not used anymore. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename UDP_INC_STATS_BH() to __UDP_INC_STATS(), and UDP6_INC_STATS_BH() to __UDP6_INC_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Rename ICMP_INC_STATS_BH() to __ICMP_INC_STATS() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
In the old days (before linux-3.0), SNMP counters were duplicated, one for user context, and one for BH context. After commit 8f0ea0fe ("snmp: reduce percpu needs by 50%") we have a single copy, and what really matters is preemption being enabled or disabled, since we use this_cpu_inc() or __this_cpu_inc() respectively. We therefore kill SNMP_INC_STATS_USER(), SNMP_ADD_STATS_USER(), NET_INC_STATS_USER(), NET_ADD_STATS_USER(), SCTP_INC_STATS_USER(), SNMP_INC_STATS64_USER(), SNMP_ADD_STATS64_USER(), TCP_ADD_STATS_USER(), UDP_INC_STATS_USER(), UDP6_INC_STATS_USER(), and XFRM_INC_STATS_USER() Following patches will rename __BH helpers to make clear their usage is not tied to BH being disabled. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
sd->input_queue_head is incremented for each processed packet in process_backlog(), and read from other cpus performing Out Of Order avoidance in get_rps_cpu() Moving this field in a separate cache line keeps it mostly hot for the cpu in process_backlog(), as other cpus will only read it. In a stress test, process_backlog() was consuming 6.80 % of cpu cycles, and the patch reduced the cost to 0.65 % Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 4月, 2016 3 次提交
-
-
由 Linus Torvalds 提交于
This is more prep-work for the upcoming pty changes. Still just code cleanup with no actual semantic changes. This removes a bunch pointless complexity by just having the slave pty side remember the dentry associated with the devpts slave rather than the inode. That allows us to remove all the "look up the dentry" code for when we want to remove it again. Together with moving the tty pointer from "inode->i_private" to "dentry->d_fsdata" and getting rid of pointless inode locking, this removes about 30 lines of code. Not only is the end result smaller, it's simpler and easier to understand. The old code, for example, depended on the d_find_alias() to not just find the dentry, but also to check that it is still hashed, which in turn validated the tty pointer in the inode. That is a _very_ roundabout way to say "invalidate the cached tty pointer when the dentry is removed". The new code just does dentry->d_fsdata = NULL; in devpts_pty_kill() instead, invalidating the tty pointer rather more directly and obviously. Don't do something complex and subtle when the obvious straightforward approach will do. The rest of the patch (ie apart from code deletion and the above tty pointer clearing) is just switching the calling convention to pass the dentry or file pointer around instead of the inode. Cc: Eric Biederman <ebiederm@xmission.com> Cc: Peter Anvin <hpa@zytor.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Peter Hurley <peter@hurleysoftware.com> Cc: Serge Hallyn <serge.hallyn@ubuntu.com> Cc: Willy Tarreau <w@1wt.eu> Cc: Aurelien Jarno <aurelien@aurel32.net> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jann Horn <jann@thejh.net> Cc: Greg KH <greg@kroah.com> Cc: Jiri Slaby <jslaby@suse.com> Cc: Florian Weimer <fw@deneb.enyo.de> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Saeed Mahameed 提交于
Now as rx-vlan offload can be disabled, packets can be received with vlan tag not stripped, which means is_first_ethertype_ip will return false, for that we need to check if the hardware reported csum OK so we will report CHECKSUM_UNNECESSARY for those packets. Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Gal Pressman 提交于
Use ethtool -K <interface> rxvlan <on/off> to enable/disable C-TAG vlan stripping by hardware. Signed-off-by: NGal Pressman <galp@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-