- 14 3月, 2016 2 次提交
-
-
由 David Howells 提交于
Replace all "unsigned" types with "unsigned int" types. Reported-by: NDavid Miller <davem@davemloft.net> Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexander Duyck 提交于
This patch updates the GRO handlers for GRE, VXLAN, GENEVE, and FOU so that we do not clear the flush bit until after we have called the next level GRO handler. Previously this was being cleared before parsing through the list of frames, however this resulted in several paths where either the bit needed to be reset but wasn't as in the case of FOU, or cases where it was being set as in GENEVE. By just deferring the clearing of the bit until after the next level protocol has been parsed we can avoid any unnecessary bit twiddling and avoid bugs. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 3月, 2016 4 次提交
-
-
由 Daniel Borkmann 提交于
This patch extends bpf_tunnel_key with a tunnel_label member, that maps to ip_tunnel_key's label so underlying backends like vxlan and geneve can propagate the label to udp_tunnel6_xmit_skb(), where it's being set in the IPv6 header. It allows for having 20 more bits to encode/decode flow related meta information programmatically. Tested with vxlan and geneve. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
This patch extends udp_tunnel6_xmit_skb() to pass in the IPv6 flow label from call sites. Currently, there's no such option and it's always set to zero when writing ip6_flow_hdr(). Add a label member to ip_tunnel_key, so that flow-based tunnels via collect metadata frontends can make use of it. vxlan and geneve will be converted to add flow label support separately. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Hemminger 提交于
This fixes a regression in the bridge ageing time caused by: commit c62987bb ("bridge: push bridge setting ageing_time down to switchdev") There are users of Linux bridge which use the feature that if ageing time is set to 0 it causes entries to never expire. See: https://www.linuxfoundation.org/collaborate/workgroups/networking/bridge For a pure software bridge, it is unnecessary for the code to have arbitrary restrictions on what values are allowable. Signed-off-by: NStephen Hemminger <stephen@networkplumber.org> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
Cast pointer to unsigned long instead of u64, to fix compilation warning on 32 bit arch, spotted by 0day build. Fixes: 5b33f488 ("net/flower: Introduce hardware offload support") Signed-off-by: NAmir Vadai <amir@vadai.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 3月, 2016 3 次提交
-
-
由 Amir Vadai 提交于
Will be used in a following patch to query if a key is being used, and what it's value in the target object. Acked-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NAmir Vadai <amir@vadai.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Amir Vadai 提交于
This patch is based on a patch made by John Fastabend. It adds support for offloading cls_flower. when NETIF_F_HW_TC is on: flags = 0 => Rule will be processed twice - by hardware, and if still relevant, by software. flags = SKIP_HW => Rull will be processed by software only If hardware fail/not capabale to apply the rule, operation will NOT fail. Filter will be processed by SW only. Acked-by: NJiri Pirko <jiri@mellanox.com> Suggested-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NAmir Vadai <amir@vadai.me> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neil Armstrong 提交于
The initial commit badly merged into the dsa_resume method instead of the dsa_remove_dst method. As consequence, the dst->master_netdev->dsa_ptr is not set to NULL on removal and re-bind of the dsa device fails with error -17. Fixes: b0dc635d ("net: dsa: cleanup resources upon module removal ") Signed-off-by: NNeil Armstrong <narmstrong@baylibre.com> Acked-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 3月, 2016 13 次提交
-
-
由 Willem de Bruijn 提交于
Replace link layer header validation check ll_header_truncate with more generic dev_validate_header. Validation based on hard_header_len incorrectly drops valid packets in variable length protocols, such as AX25. dev_validate_header calls header_ops.validate for such protocols to ensure correctness below hard_header_len. See also http://comments.gmane.org/gmane.linux.network/401064 Fixes 9c707762 ("packet: make packet_snd fail on len smaller than l2 header") Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Willem de Bruijn 提交于
As variable length protocol, AX25 fails link layer header validation tests based on a minimum length. header_ops.validate allows protocols to validate headers that are shorter than hard_header_len. Implement this callback for AX25. See also http://comments.gmane.org/gmane.linux.network/401064Signed-off-by: NWillem de Bruijn <willemb@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This patch adds receive timeout for message assembly on the attached TCP sockets. The timeout is set when a new messages is started and the whole message has not been received by TCP (not in the receive queue). If the completely message is subsequently received the timer is cancelled, if the timer expires the RX side is aborted. The timeout value is taken from the socket timeout (SO_RCVTIMEO) that is set on a TCP socket (i.e. set by get sockopt before attaching a TCP socket to KCM. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Message assembly is performed on the TCP socket. This is logically equivalent of an application that performs a peek on the socket to find out how much memory is needed for a receive buffer. The receive socket buffer also provides the maximum message size which is checked. The receive algorithm is something like: 1) Receive the first skbuf for a message (or skbufs if multiple are needed to determine message length). 2) Check the message length against the number of bytes in the TCP receive queue (tcp_inq()). - If all the bytes of the message are in the queue (incluing the skbuf received), then proceed with message assembly (it should complete with the tcp_read_sock) - Else, mark the psock with the number of bytes needed to complete the message. 3) In TCP data ready function, if the psock indicates that we are waiting for the rest of the bytes of a messages, check the number of queued bytes against that. - If there are still not enough bytes for the message, just return - Else, clear the waiting bytes and proceed to receive the skbufs. The message should now be received in one tcp_read_sock Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Implement kcm_sendpage. Set in sendpage to kcm_sendpage in both dgram and seqpacket ops. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Implement kcm_splice_read. This is supported only for seqpacket. Add kcm_seqpacket_ops and set splice read to kcm_splice_read. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This patch adds various counters for KCM. These include counters for messages and bytes received or sent, as well as counters for number of attached/unattached TCP sockets and other error or edge events. The statistics are exposed via a proc interface. /proc/net/kcm provides statistics per KCM socket and per psock (attached TCP sockets). /proc/net/kcm_stats provides aggregate statistics. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This module implements the Kernel Connection Multiplexor. Kernel Connection Multiplexor (KCM) is a facility that provides a message based interface over TCP for generic application protocols. With KCM an application can efficiently send and receive application protocol messages over TCP using datagram sockets. For more information see the included Documentation/networking/kcm.txt Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Create a common kernel function to get the number of bytes available on a TCP socket. This is based on code in INQ getsockopt and we now call the function for that getsockopt. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Add walking of fragments in __skb_splice_bits. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Add a new msg flag called MSG_BATCH. This flag is used in sendmsg to indicate that more messages will follow (i.e. a batch of messages is being sent). This is similar to MSG_MORE except that the following messages are not merged into one packet, they are sent individually. sendmmsg is updated so that each contained message except for the last one is marked as MSG_BATCH. MSG_BATCH is a performance optimization in cases where a socket implementation can benefit by transmitting packets in a batch. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
This patch allows setting MSG_EOR in each individual msghdr passed in sendmmsg. This allows a sendmmsg to send multiple messages when using SOCK_SEQPACKET. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Export it for cases where we want to create sockets by hand. Signed-off-by: NTom Herbert <tom@herbertland.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 3月, 2016 10 次提交
-
-
由 Michal Kubeček 提交于
One of our customers observed issues with FIB6 garbage collectors running in different network namespaces blocking each other, resulting in soft lockups (fib6_run_gc() initiated from timer runs always in forced mode). Now that FIB6 walkers are separated per namespace, there is no more need for instances of fib6_run_gc() in different namespaces blocking each other. There is still a call to icmp6_dst_gc() which operates on shared data but this function is protected by its own shared lock. Signed-off-by: NMichal Kubecek <mkubecek@suse.cz> Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Kubeček 提交于
The IPv6 FIB data structures are separated per network namespace but there is still only one global walkers list and one global walker list lock. This means changes in one namespace unnecessarily interfere with walkers in other namespaces. Replace the global list with per-netns lists (and give each its own lock). Signed-off-by: NMichal Kubecek <mkubecek@suse.cz> Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Kubeček 提交于
Global variable gc_args is only used in fib6_run_gc() and functions called from it. As fib6_run_gc() makes sure there is at most one instance of fib6_clean_all() running at any moment, we can replace gc_args with a local variable which will be needed once multiple instances (per netns) of garbage collector are allowed. Signed-off-by: NMichal Kubecek <mkubecek@suse.cz> Reviewed-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Kyeong Yoo 提交于
This fix is for dsmark similar to commit 3557619f ("net_sched: prio: use qdisc_dequeue_peeked") and makes use of qdisc_dequeue_peeked() instead of direct dequeue() call. First time, wrr peeks dsmark, which will then peek into sfq. sfq dequeues an skb and it's stored in sch->gso_skb. Next time, wrr tries to dequeue from dsmark, which will call sfq dequeue directly. This results skipping the previously peeked skb. So changed dsmark dequeue to call qdisc_dequeue_peeked() instead to use peeked skb if exists. Signed-off-by: NKyeong Yoo <kyeong.yoo@alliedtelesis.co.nz> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
The assumptions from commit 0c1d70af ("net: use dst_cache for vxlan device"), 468dfffc ("geneve: add dst caching support") and 3c1cb4d2 ("net/ipv4: add dst cache support for gre lwtunnels") on dst_cache usage when ip_tunnel_info is used is unfortunately not always valid as assumed. While it seems correct for ip_tunnel_info front-ends such as OVS, eBPF however can fill in ip_tunnel_info for consumers like vxlan, geneve or gre with different remote dsts, tos, etc, therefore they cannot be assumed as packet independent. Right now vxlan, geneve, gre would cache the dst for eBPF and every packet would reuse the same entry that was first created on the initial route lookup. eBPF doesn't store/cache the ip_tunnel_info, so each skb may have a different one. Fix it by adding a flag that checks the ip_tunnel_info. Also the !tos test in vxlan needs to be handeled differently in this context as it is currently inferred from ip_tunnel_info as well if present. ip_tunnel_dst_cache_usable() helper is added for the three tunnel cases, which checks if we can use dst cache. Fixes: 0c1d70af ("net: use dst_cache for vxlan device") Fixes: 468dfffc ("geneve: add dst caching support") Fixes: 3c1cb4d2 ("net/ipv4: add dst cache support for gre lwtunnels") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NPaolo Abeni <pabeni@redhat.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
After eBPF being able to programmatically access/manage tunnel key meta data via commit d3aa45ce ("bpf: add helpers to access tunnel metadata") and more recently also for IPv6 through c6c33454 ("bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key"), this work adds two complementary helpers to generically access their auxiliary tunnel options. Geneve and vxlan support this facility. For geneve, TLVs can be pushed, and for the vxlan case its GBP extension. I.e. setting tunnel key for geneve case only makes sense, if we can also read/write TLVs into it. In the GBP case, it provides the flexibility to easily map the group policy ID in combination with other helpers or maps. I chose to model this as two separate helpers, bpf_skb_{set,get}_tunnel_opt(), for a couple of reasons. bpf_skb_{set,get}_tunnel_key() is already rather complex by itself, and there may be cases for tunnel key backends where tunnel options are not always needed. If we would have integrated this into bpf_skb_{set,get}_tunnel_key() nevertheless, we are very limited with remaining helper arguments, so keeping compatibility on structs in case of passing in a flat buffer gets more cumbersome. Separating both also allows for more flexibility and future extensibility, f.e. options could be fed directly from a map, etc. Moreover, change geneve's xmit path to test only for info->options_len instead of TUNNEL_GENEVE_OPT flag. This makes it more consistent with vxlan's xmit path and allows for avoiding to specify a protocol flag in the API on xmit, so it can be protocol agnostic. Having info->options_len is enough information that is needed. Tested with vxlan and geneve. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Added by 9a628224 ("ip_tunnel: Add dont fragment flag."), allow to feed df flag into tunneling facilities (currently supported on TX by vxlan, geneve and gre) as a hint from eBPF's bpf_skb_set_tunnel_key() helper. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
They are only used here, so there's no reason they should not be static. Only the vlan push/pop protos are used in the test_bpf suite. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
When overwriting parts of the packet with bpf_skb_store_bytes() that were fed previously into skb->hash calculation, we should clear the current hash with skb_clear_hash(), so that a next skb_get_hash() call can determine the correct hash related to this skb. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
Commit 7d672345 ("bpf: add generic bpf_csum_diff helper") added a generic checksum diff helper that can feed bpf_l4_csum_replace() with a target __wsum diff that is to be applied to the L4 checksum. This facility is very flexible, can be cascaded, allows for adding, removing, or diffing data, or for calculating the pseudo header checksum from scratch, but it can also be reused for working with the IPv4 header checksum. Thus, analogous to bpf_l4_csum_replace(), add a case for header field value of 0 to change the checksum at a given offset through a new helper csum_replace_by_diff(). Also, in addition to that, this provides an easy to use interface for feeding precalculated diffs f.e. coming from a map. It nicely complements bpf_l3_csum_replace() that currently allows only for csum updates of 2 and 4 byte diffs. Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 3月, 2016 6 次提交
-
-
由 Eric Dumazet 提交于
If final packet (ACK) of 3WHS is lost, it appears we do not properly account the following incoming segment into tcpi_segs_in While we are at it, starts segs_in with one, to count the SYN packet. We do not yet count number of SYN we received for a request sock, we might add this someday. packetdrill script showing proper behavior after fix : // Tests tcpi_segs_in when 3rd packet (ACK) of 3WHS is lost 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +0 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop> +0 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK> +.020 < P. 1:1001(1000) ack 1 win 32792 +0 accept(3, ..., ...) = 4 +.000 %{ assert tcpi_segs_in == 2, 'tcpi_segs_in=%d' % tcpi_segs_in }% Fixes: 2efd055c ("tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Bill Sommerfeld 提交于
IPv4 interprets a negative return value from a protocol handler as a request to redispatch to a new protocol. In contrast, IPv6 interprets a negative value as an error, and interprets a positive value as a request for redispatch. UDP for IPv6 was unaware of this difference. Change __udp6_lib_rcv() to return a positive value for redispatch. Note that the socket's encap_rcv hook still needs to return a negative value to request dispatch, and in the case of IPv6 packets, adjust IP6CB(skb)->nhoff to identify the byte containing the next protocol. Signed-off-by: NBill Sommerfeld <wsommerfeld@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Richard Alpe 提交于
Make the c files less cluttered and enable netlink attributes to be shared between files. Signed-off-by: NRichard Alpe <richard.alpe@ericsson.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Acked-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Zhang Shengju 提交于
Currently, arp_rcv() always return zero on a packet delivery upcall. To make its behavior more compliant with the way this API should be used, this patch changes this to let it return NET_RX_SUCCESS when the packet is proper handled, and NET_RX_DROP otherwise. v1->v2: If sanity check is failed, call kfree_skb() instead of consume_skb(), then return the correct return value. Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Tang 提交于
This patch fixes the checkpatch.pl error to netlabel_domainhash.c: ERROR: do not initialise statics to NULL Signed-off-by: NWei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Wei Tang 提交于
This patch fixes the checkpatch.pl error to netlabel_unlabeled.c: ERROR: do not initialise statics to 0 or NULL Signed-off-by: NWei Tang <tangwei@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 3月, 2016 2 次提交
-
-
由 Jon Paul Maloy 提交于
Until now, we have kept a pre-allocated protocol message header aggregated into struct tipc_link. Apart from adding unnecessary footprint to the link instances, this requires extra code both to initialize and re-initialize it. We now remove this sub-optimization. This change also makes it possible to clean up the function tipc_build_proto_msg() and remove a couple of small functions that were accessing the mentioned header. In particular, we can replace all occurrences of the local function call link_own_addr(link) with the generic tipc_own_addr(net). Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
commit 4d5cfcba ('tipc: fix connection abort during subscription cancel'), removes the check for a valid subscription before calling tipc_nametbl_subscribe(). This will lead to a nullptr exception when we process a subscription cancel request. For a cancel request, a null subscription is passed to tipc_nametbl_subscribe() resulting in exception. In this commit, we call tipc_nametbl_subscribe() only for a valid subscription. Fixes: 4d5cfcba ('tipc: fix connection abort during subscription cancel') Reported-by: NAnders Widell <anders.widell@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Acked-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-