- 30 Nov 2017, 6 commits
-
-
By David Miller
XFRM bundle child chains look like this:

    xdst1 --> xdst2 --> xdst3 --> path_dst

All of xdstN are xfrm_dst objects and xdst->u.dst.xfrm is non-NULL. The final child pointer in the chain, here called 'path_dst', is some other kind of route such as an ipv4 or ipv6 one. The xfrm output path pops routes, one at a time, via the child pointer, until we hit one which has a dst->xfrm pointer which is NULL. We can easily preserve the above mechanisms with child sitting only in the xfrm_dst structure. All children in the chain before we break out of the xfrm_output() loop have dst->xfrm non-NULL and are therefore xfrm_dst objects. Since we break out of the loop when we find dst->xfrm NULL, we will not try to dereference 'dst' as if it were an xfrm_dst. Signed-off-by: David S. Miller <davem@davemloft.net>
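For illustration, a minimal standalone model of the bundle walk described above (simplified stand-in types, not the kernel's real struct dst_entry):

```c
#include <stdio.h>

struct dst_entry {
	struct dst_entry *child; /* next entry in the bundle chain */
	void *xfrm;              /* non-NULL only for xfrm_dst objects */
};

static void walk_bundle(struct dst_entry *dst)
{
	/* Every entry with ->xfrm set is an xfrm_dst, so following
	 * ->child is safe; stop at the first entry whose ->xfrm is
	 * NULL, the plain ipv4/ipv6 route at the end of the bundle. */
	while (dst && dst->xfrm) {
		printf("xfrm_dst %p, popping to child\n", (void *)dst);
		dst = dst->child;
	}
	printf("reached path route %p (xfrm == NULL)\n", (void *)dst);
}

int main(void)
{
	struct dst_entry path  = { .child = NULL,   .xfrm = NULL };
	struct dst_entry xdst2 = { .child = &path,  .xfrm = (void *)1 };
	struct dst_entry xdst1 = { .child = &xdst2, .xfrm = (void *)1 };

	walk_bundle(&xdst1);
	return 0;
}
```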
-
By David Miller
This will make a future change moving the dst->child pointer less invasive. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
By David Miller
Only IPSEC routes have a non-NULL dst->child pointer. And IPSEC routes are identified by a non-NULL dst->xfrm pointer. Signed-off-by: David S. Miller <davem@davemloft.net>
-
By David Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
By David Miller
Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
By David Miller
Delete it. Signed-off-by: David S. Miller <davem@davemloft.net> Reviewed-by: Eric Dumazet <edumazet@google.com>
-
- 29 Nov 2017, 2 commits
-
-
By Xin Long
Each stream sched ops is currently defined in a different .c file and added into the global ops table in another .c file, using extern declarations to tie this together. However, extern is not good coding style for this, and even makes C=2 builds report errors. This patch adds sctp_sched_ops_xxx_init for each stream sched ops in its own .c file, then registers them into the global ops by calling these functions when initializing the sctp module. Fixes: 637784ad ("sctp: introduce priority based stream scheduler") Fixes: ac1ed8b8 ("sctp: introduce round robin stream scheduler") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
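A sketch of the registration pattern this describes; the type layout and SCTP_SS_* numbering here are illustrative assumptions, and only the sctp_sched_ops_xxx_init/sctp_sched_ops_register naming follows the commit text:

```c
/* Illustrative types and names; the real SCTP code differs. */
enum sctp_sched { SCTP_SS_FCFS, SCTP_SS_PRIO, SCTP_SS_RR, SCTP_SS_MAX };

struct sctp_sched_ops {
	int (*init)(void *stream);
	/* ... other callbacks ... */
};

static struct sctp_sched_ops *sctp_sched_ops_table[SCTP_SS_MAX];

void sctp_sched_ops_register(enum sctp_sched sched,
			     struct sctp_sched_ops *ops)
{
	sctp_sched_ops_table[sched] = ops;
}

/* In stream_sched_prio.c: the ops can now stay static to the file,
 * with no extern declaration needed anywhere else. */
static struct sctp_sched_ops sctp_sched_prio = { .init = 0 /* ... */ };

void sctp_sched_ops_prio_init(void)
{
	sctp_sched_ops_register(SCTP_SS_PRIO, &sctp_sched_prio);
}
/* sctp module init then calls sctp_sched_ops_prio_init() and the
 * other schedulers' init functions once. */
```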
-
By Xin Long
The sctp_csum_xxx helpers don't really match the param types of the common csum APIs. Since sctp_csum_xxx is defined in sctp/checksum.h, many sparse errors occur with make C=2, not only with M=net/sctp but also with other modules that include this header file. This patch forces them to fit the csum APIs with the right types. Fixes: e6d8b64b ("net: sctp: fix and consolidate SCTP checksumming code") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
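A kernel-style sketch of the type-forcing idea; treating the helpers as thin crc32c wrappers with __force casts is an assumption based on the description, not a verbatim copy of the patch:

```c
/* Assumes kernel headers (<linux/types.h>, <linux/crc32c.h>). The
 * __force casts tell sparse the __wsum <-> __u32 conversions are
 * intentional, so the helpers present the same types as the generic
 * csum API and make C=2 stops warning. */
static inline __wsum sctp_csum_update(const void *buff, int len, __wsum sum)
{
	return (__force __wsum)crc32c((__force __u32)sum, buff, len);
}

static inline __wsum sctp_csum_combine(__wsum csum, __wsum csum2,
				       int offset, int len)
{
	return (__force __wsum)__crc32c_le_combine((__force __u32)csum,
						   (__force __u32)csum2, len);
}
```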
-
- 27 Nov 2017, 1 commit
-
-
By Johannes Berg
When connected to a QoS/WMM AP, mac80211 should use a QoS NDP for probing it instead of a regular non-QoS one; fix this. Change all the drivers to *not* allow QoS NDP for now, even though it looks like most of them should be OK with that. Signed-off-by: Johannes Berg <johannes.berg@intel.com>
-
- 24 Nov 2017, 1 commit
-
-
By Willem de Bruijn
Tuntap and similar devices can inject GSO packets. Accept type VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively. Processes are expected to use feature negotiation such as TUNSETOFFLOAD to detect supported offload types and refrain from injecting other packets. This process breaks down with live migration: guest kernels do not renegotiate flags, so destination hosts need to expose all features that the source host does. Partially revert the UFO removal from 182e0b6b~1..d9d30adf. This patch introduces nearly(*) no new code to simplify verification. It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP insertion and software UFO segmentation. It does not reinstate protocol stack support, hardware offload (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception of VIRTIO_NET_HDR_GSO_UDP packets in tuntap. To support SKB_GSO_UDP reappearing in the stack, also reinstate logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD by squashing in commit 93991221 ("net: skb_needs_check() removes CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6 ("net: avoid skb_warn_bad_offload false positives on UFO"). (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id, ipv6_proxy_select_ident is changed to return a __be32 and this is assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted at the end of the enum to minimize code churn.

Tested: Booted a v4.13 guest kernel with QEMU. On a host kernel before this patch `ethtool -k eth0` shows UFO disabled. After the patch, it is enabled, same as on a v4.13 host kernel. A UFO packet sent from the guest appears on the tap device:

  host:  nc -l -p -u 8000 &
         tcpdump -n -i tap0
  guest: dd if=/dev/zero of=payload.txt bs=1 count=2000
         nc -u 192.16.1.1 8000 < payload.txt

Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds, packets arriving fragmented: ./with_tap_pair.sh ./tap_send_ufo tap0 tap1 (from https://github.com/wdebruij/kerneltools/tree/master/tests)

Changes v1 -> v2:
  - simplified set_offload change (review comment)
  - documented test procedure

Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com> Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.") Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 19 Nov 2017, 1 commit
-
-
By Neal Cardwell
Fix the TLP scheduling logic so that when scheduling a TLP probe, we ensure that the estimated time at which an RTO would fire accounts for the fact that ACKs indicating forward progress should push back RTO times. After the following fix: df92c839 ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") we had an unintentional behavior change in the following kind of scenario: suppose the RTT variance has been very low recently. Then suppose we send out a flight of N packets and our RTT is 100ms:

  t=0:     send a flight of N packets
  t=100ms: receive an ACK for N-1 packets

The response before df92c839 was:
  -> schedule a TLP for now + RTO_interval

The response after df92c839 is:
  -> schedule a TLP for t=0 + RTO_interval

Since RTO_interval = srtt + RTT_variance, this means that we have scheduled a TLP timer at a point in the future that only accounts for RTT_variance. If the RTT_variance term is small, this means that the timer fires soon. Before df92c839 this would not happen, because in that code, when we receive an ACK for a prefix of the flight, we did:

1) Near the top of tcp_ack(), switch from TLP timer to RTO at write_queue_head->packet_tx_time + RTO_interval:
     if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE) tcp_rearm_rto(sk);
2) In tcp_clean_rtx_queue(), update the RTO to now + RTO_interval:
     if (flag & FLAG_ACKED) { tcp_rearm_rto(sk);
3) In tcp_ack() after tcp_fastretrans_alert(), switch from RTO to TLP at now + RTO_interval:
     if (icsk->icsk_pending == ICSK_TIME_RETRANS) tcp_schedule_loss_probe(sk);

In df92c839 we removed that 3-phase dance, and instead directly set the TLP timer once: we set the TLP timer in cases like this to write_queue_head->packet_tx_time + RTO_interval. So if the RTT variance is small, then this means that this is setting the TLP timer to fire quite soon. This means if the ACK for the tail of the flight takes longer than an RTT to arrive (often due to delayed ACKs), then the TLP timer fires too quickly. Fixes: df92c839 ("tcp: fix xmit timer to only be reset if data ACKed/SACKed") Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
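A small worked model of the arithmetic above, with illustrative numbers:

```c
#include <stdio.h>

int main(void)
{
	/* Illustrative values from the scenario (milliseconds). */
	int srtt = 100, rtt_var = 5;        /* low recent RTT variance   */
	int rto_interval = srtt + rtt_var;  /* RTO_interval per the text */
	int t_send = 0, t_ack = 100;        /* flight sent; partial ACK  */

	/* Behavior after df92c839: TLP armed at t_send + RTO_interval,
	 * leaving only ~rtt_var of headroom once the ACK arrives. */
	printf("buggy headroom after ACK: %d ms\n",
	       t_send + rto_interval - t_ack);

	/* Fixed behavior: forward progress pushes the projected RTO
	 * back, so the probe is a full RTO_interval away again. */
	printf("fixed headroom after ACK: %d ms\n", rto_interval);
	return 0;
}
```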
-
- 18 Nov 2017, 1 commit
-
-
By Xin Long
In sctp_setsockopt_maxseg, user_frag or frag_point can currently be set with val >= 8 and val <= SCTP_MAX_CHUNK_LEN, but both checks are incorrect. val >= 8 means frag_point can even be less than SCTP_DEFAULT_MINSEGMENT. Then in sctp_datamsg_from_user(), when its value is greater than the cookie echo len and we try to bundle with the cookie echo chunk, first_len will overflow. The worst case is when its value is equal to the cookie echo len: first_len becomes 0, and fragmentation later on goes into a dead loop. In Hangbin's syzkaller testing env, an oom was even triggered due to consecutive memory allocation in that loop. Besides, SCTP_MAX_CHUNK_LEN is the max size of the whole chunk; it should deduct the data header for the frag_point or user_frag check. This patch does a proper check, with SCTP_DEFAULT_MINSEGMENT subtracting the sctphdr and datahdr, and SCTP_MAX_CHUNK_LEN subtracting the datahdr, when setting frag_point via sockopt. It also improves the sctp_setsockopt_maxseg code. Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Reported-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
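A standalone sketch of the corrected bounds check; the header-size constants are illustrative, the real code uses sizeof() on the actual SCTP header structs:

```c
/* Illustrative constants (SCTP common header is 12 bytes, a DATA
 * chunk header is 16); the kernel derives these from struct sizes. */
#define SCTP_DEFAULT_MINSEGMENT 512
#define SCTP_MAX_CHUNK_LEN      ((1 << 16) - 1)
#define SCTPHDR_LEN             12
#define DATAHDR_LEN             16

static int sctp_maxseg_in_range(int val)
{
	int min = SCTP_DEFAULT_MINSEGMENT - SCTPHDR_LEN - DATAHDR_LEN;
	int max = SCTP_MAX_CHUNK_LEN - DATAHDR_LEN;

	/* val must leave room for bundling (e.g. with a cookie echo
	 * chunk) and must not exceed what one chunk can carry. */
	return val >= min && val <= max;
}
```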
-
- 16 Nov 2017, 3 commits
-
-
By Sasha Levin
Patch series "kmemcheck: kill kmemcheck", v2. As discussed at LSF/MM, kill kmemcheck. KASan is a replacement that is able to work without the limitations of kmemcheck (single CPU, slow). KASan is already upstream. We are also not aware of any users of kmemcheck (or users who don't consider KASan a suitable replacement). The only objection was that since KASAN wasn't supported by all GCC versions provided by distros at that time, we should hold off for 2 years and try again. Now that 2 years have passed, and all distros provide gcc that supports KASAN, kill kmemcheck again for the very same reasons. This patch (of 4): Remove kmemcheck annotations, and calls to kmemcheck, from the kernel. [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs] Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com Signed-off-by: Sasha Levin <alexander.levin@verizon.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Tim Hansen <devtimhansen@gmail.com> Cc: Vegard Nossum <vegardno@ifi.uio.no> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
By Alexey Dobriyan
Add sparse-checked slab_flags_t for struct kmem_cache::flags (SLAB_POISON, etc). SLAB is bloated temporarily by switching to "unsigned long", but only temporarily. Link: http://lkml.kernel.org/r/20171021100225.GA22428@avx2 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
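A minimal sketch of what such a sparse-checked type looks like; the fallback macro definitions are simplified stand-ins for the kernel's <linux/types.h> machinery:

```c
/* __bitwise/__force are sparse annotations; under a normal compile
 * they expand to nothing, so the type costs nothing at run time. */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#define __force   __attribute__((force))
#else
#define __bitwise
#define __force
#endif

typedef unsigned int __bitwise slab_flags_t;

/* Each flag is now a distinct slab_flags_t value; mixing one with a
 * plain integer makes sparse (make C=2) complain. */
#define SLAB_POISON ((__force slab_flags_t)0x00000800U)
```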
-
By Michal Kubecek
According to the description, the first argument of genlmsg_nlhdr() points to what genlmsg_put() returns, i.e. the beginning of the user header. Therefore we should only subtract the size of the genetlink header and the netlink message header, not the user header. This also means we don't need to pass the pointer to the genetlink family, and the same is true for genl_dump_check_consistent(), which is the only caller of genlmsg_nlhdr(). (Note that at the moment, these functions are only used for families which do not have a user header, so they are not affected.) Fixes: 670dc283 ("netlink: advertise incomplete dumps") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
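A sketch of the corrected pointer arithmetic; the header lengths are the usual aligned netlink sizes, and the _demo suffix marks this as an illustration rather than the exact in-tree helper:

```c
/* Simplified layouts; the real definitions live in netlink headers. */
#define NLMSG_HDRLEN 16 /* aligned sizeof(struct nlmsghdr)   */
#define GENL_HDRLEN   4 /* aligned sizeof(struct genlmsghdr) */

/* genlmsg_put() returns a pointer just past the genetlink header,
 * i.e. the start of the user header. To get back to the netlink
 * message header, step back over exactly those two headers; the
 * user header's own size must not be subtracted. */
static inline void *genlmsg_nlhdr_demo(void *user_hdr)
{
	return (char *)user_hdr - GENL_HDRLEN - NLMSG_HDRLEN;
}
```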
-
- 15 Nov 2017, 3 commits
-
-
By Eric Dumazet
syzbot easily found a regression added in our latest patches [1] No longer set tp->highest_sack to the head of the send queue since this is not logical and error prone. Only sack processing should maintain the pointer to an skb from rtx queue. We might in the future only remember the sequence instead of a pointer to skb, since rb-tree should allow a fast lookup. [1] BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1706 [inline] BUG: KASAN: use-after-free in tcp_ack+0x42bb/0x4fd0 net/ipv4/tcp_input.c:3537 Read of size 4 at addr ffff8801c154faa8 by task syz-executor4/12860 CPU: 0 PID: 12860 Comm: syz-executor4 Not tainted 4.14.0-next-20171113+ #41 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:17 [inline] dump_stack+0x194/0x257 lib/dump_stack.c:53 print_address_description+0x73/0x250 mm/kasan/report.c:252 kasan_report_error mm/kasan/report.c:351 [inline] kasan_report+0x25b/0x340 mm/kasan/report.c:409 __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429 tcp_highest_sack_seq include/net/tcp.h:1706 [inline] tcp_ack+0x42bb/0x4fd0 net/ipv4/tcp_input.c:3537 tcp_rcv_established+0x672/0x18a0 net/ipv4/tcp_input.c:5439 tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1468 sk_backlog_rcv include/net/sock.h:909 [inline] __release_sock+0x124/0x360 net/core/sock.c:2264 release_sock+0xa4/0x2a0 net/core/sock.c:2778 tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763 sock_sendmsg_nosec net/socket.c:632 [inline] sock_sendmsg+0xca/0x110 net/socket.c:642 ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2048 __sys_sendmsg+0xe5/0x210 net/socket.c:2082 SYSC_sendmsg net/socket.c:2093 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2089 entry_SYSCALL_64_fastpath+0x1f/0x96 RIP: 0033:0x452879 RSP: 002b:00007fc9761bfbe8 EFLAGS: 00000212 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 0000000000758020 RCX: 0000000000452879 RDX: 0000000000000000 RSI: 0000000020917fc8 RDI: 0000000000000015 RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000212 R12: 00000000006ee3a0 R13: 00000000ffffffff R14: 00007fc9761c06d4 R15: 0000000000000000 Allocated by task 12860: save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489 kmem_cache_alloc_node+0x144/0x760 mm/slab.c:3638 __alloc_skb+0xf1/0x780 net/core/skbuff.c:193 alloc_skb_fclone include/linux/skbuff.h:1023 [inline] sk_stream_alloc_skb+0x11d/0x900 net/ipv4/tcp.c:870 tcp_sendmsg_locked+0x1341/0x3b80 net/ipv4/tcp.c:1299 tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1461 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763 sock_sendmsg_nosec net/socket.c:632 [inline] sock_sendmsg+0xca/0x110 net/socket.c:642 SYSC_sendto+0x358/0x5a0 net/socket.c:1749 SyS_sendto+0x40/0x50 net/socket.c:1717 entry_SYSCALL_64_fastpath+0x1f/0x96 Freed by task 12860: save_stack+0x43/0xd0 mm/kasan/kasan.c:447 set_track mm/kasan/kasan.c:459 [inline] kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524 __cache_free mm/slab.c:3492 [inline] kmem_cache_free+0x77/0x280 mm/slab.c:3750 kfree_skbmem+0xdd/0x1d0 net/core/skbuff.c:603 __kfree_skb+0x1d/0x20 net/core/skbuff.c:642 sk_wmem_free_skb include/net/sock.h:1419 [inline] tcp_rtx_queue_unlink_and_free include/net/tcp.h:1682 [inline] tcp_clean_rtx_queue net/ipv4/tcp_input.c:3111 [inline] tcp_ack+0x1b17/0x4fd0 net/ipv4/tcp_input.c:3593 tcp_rcv_established+0x672/0x18a0 
net/ipv4/tcp_input.c:5439 tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1468 sk_backlog_rcv include/net/sock.h:909 [inline] __release_sock+0x124/0x360 net/core/sock.c:2264 release_sock+0xa4/0x2a0 net/core/sock.c:2778 tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763 sock_sendmsg_nosec net/socket.c:632 [inline] sock_sendmsg+0xca/0x110 net/socket.c:642 ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2048 __sys_sendmsg+0xe5/0x210 net/socket.c:2082 SYSC_sendmsg net/socket.c:2093 [inline] SyS_sendmsg+0x2d/0x50 net/socket.c:2089 entry_SYSCALL_64_fastpath+0x1f/0x96 The buggy address belongs to the object at ffff8801c154fa80 which belongs to the cache skbuff_fclone_cache of size 456 The buggy address is located 40 bytes inside of 456-byte region [ffff8801c154fa80, ffff8801c154fc48) The buggy address belongs to the page: page:ffffea00070553c0 count:1 mapcount:0 mapping:ffff8801c154f080 index:0x0 flags: 0x2fffc0000000100(slab) raw: 02fffc0000000100 ffff8801c154f080 0000000000000000 0000000100000006 raw: ffffea00070a5a20 ffffea0006a18360 ffff8801d9ca0500 0000000000000000 page dumped because: kasan: bad access detected Fixes: 737ff314 ("tcp: use sequence distance to detect reordering") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Stephen Hemminger
Make the default TCP congestion control a per-namespace value. This changes the default congestion control to a pointer to congestion ops (rather than being implicit as the first element of the available list). The congestion control setting of new namespaces is inherited from the current setting of the root namespace. Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
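A conceptual sketch of the inheritance described above; all field and function names here are assumptions, not the exact kernel identifiers:

```c
/* Illustrative only: a per-namespace pointer replaces the implicit
 * "first entry of the global list" default. */
struct tcp_congestion_ops { const char *name; /* ... callbacks ... */ };

struct netns_tcp_demo {
	struct tcp_congestion_ops *default_ca;
};

static struct netns_tcp_demo init_netns_tcp;

/* A new namespace copies the root namespace's current setting, so
 * later changes in the child do not affect other namespaces. */
static void tcp_ns_init_demo(struct netns_tcp_demo *net)
{
	net->default_ca = init_netns_tcp.default_ca;
}
```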
-
By Dmitry V. Levin
Move the inclusion of the private kernel header <net/tcp.h> from uapi/linux/tls.h to its only user, net/tls.h, to fix the following linux/tls.h userspace compilation error: /usr/include/linux/tls.h:41:21: fatal error: net/tcp.h: No such file or directory As up to this point uapi/linux/tls.h was totally unusable for userspace, clean up this header file further by moving other redundant includes to net/tls.h. Fixes: 3c4d7559 ("tls: kernel TLS support") Cc: <stable@vger.kernel.org> # v4.13+ Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 14 Nov 2017, 4 commits
-
-
By Ilya Lesokhin
Move tls_make_aad as it is going to be reused by the device offload code and rx path. Remove the unused recv parameter. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Ilya Lesokhin
Previously the TLS ulp context would leak if we attached a TLS ulp to a socket but did not use the TLS_TX setsockopt, or did use it but it failed. This patch solves the issue by overriding prot[TLS_BASE_TX].close and fixing tls_sk_proto_close to work properly when it is called with ctx->tx_conf == TLS_BASE_TX. This patch also removes ctx->free_resources, as we can use ctx->tx_conf to obtain the relevant information. Fixes: 3c4d7559 ('tls: kernel TLS support') Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Ilya Lesokhin
The tx configuration is now stored in ctx->tx_conf, and sk->sk_prot is updated through a function. This will simplify things when we add rx, and support for the different possible tx and rx cross configurations. Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Eric Dumazet
I had many reports that TSQ logic breaks wifi aggregation. The current logic is to allow up to 1 ms worth of bytes to be queued into qdisc and driver queues. But wifi aggregation needs a bigger budget to allow bigger rates to be discovered by the various TCP congestion control algorithms. This patch adds an extra socket field, allowing wifi drivers to select another log scale to derive the TCP Small Queues credit from the current pacing rate. The initial value is 10, meaning that this patch does not change current behavior. We expect wifi drivers to set this field to smaller values (tests have been done with values from 6 to 9). They would have to use the following template:

  if (skb->sk && skb->sk->sk_pacing_shift != MY_PACING_SHIFT)
      skb->sk->sk_pacing_shift = MY_PACING_SHIFT;

Ref: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1670041 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Johannes Berg <johannes.berg@intel.com> Cc: Toke Høiland-Jørgensen <toke@toke.dk> Cc: Kir Kolyshkin <kir@openvz.org> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
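A quick standalone illustration of how the shift maps to a TSQ byte budget at a fixed pacing rate:

```c
#include <stdio.h>

int main(void)
{
	unsigned long pacing_rate = 125000000UL; /* 1 Gbit/s in bytes/s */

	/* Budget = pacing_rate >> sk_pacing_shift, i.e. roughly
	 * 2^-shift seconds' worth of bytes: shift 10 is ~1 ms, and
	 * each step down doubles the allowed in-flight bytes. */
	for (int shift = 10; shift >= 6; shift--)
		printf("sk_pacing_shift=%2d -> %8lu bytes (~%lu us)\n",
		       shift, pacing_rate >> shift, 1000000UL >> shift);
	return 0;
}
```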
-
- 13 Nov 2017, 2 commits
-
-
By Florian Fainelli
Add a new type, DSA_TAG_PROTO_PREPEND, which adds support for the 4-byte Broadcom tag that we already support, but in a format where it is prepended to the packet instead of located between the MAC SA and the EtherType (DSA_TAG_PROTO_BRCM). Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Florian Fainelli
A number of drivers want to check whether the configured CPU port is a possible configuration for enabling tagging; pass down the CPU port number so they can verify that. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 11 Nov 2017, 3 commits
-
-
By Mat Martineau
ip6_frag_id was only used by UFO, which has been removed. ipv6_proxy_select_ident() only existed to set ip6_frag_id and has no in-tree callers. Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Yuchung Cheng
FACK loss detection has been disabled by default, and the successor RACK subsumed FACK and can handle reordering better. This patch removes FACK to simplify TCP loss recovery. Signed-off-by: Yuchung Cheng <ycheng@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com> Reviewed-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Jon Maloy
Currently, the TIPC RPS dissector is based only on the incoming packets' source node address, hence steering all traffic from a node to the same core. We have seen that this makes the links vulnerable to starvation and unnecessary resets when we turn down the link tolerance to very low values. To reduce the risk of this happening, we exempt probe and probe reply packets from the convergence to one core per source node. Instead, we do the opposite: we try to diverge those packets across as many cores as possible, by randomizing the flow selector key. To make such packets identifiable to the dissector, we add a new 'is_keepalive' bit to word 0 of the LINK_PROTOCOL header. This bit is set both for PROBE and PROBE_REPLY messages, and only for those. It should be noted that these packets are not part of any flow anyway, and only constitute a minuscule fraction of all packets sent across a link. Hence, there is no risk that this will affect overall performance. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 10 Nov 2017, 4 commits
-
-
By Manish Kurup
Using a spinlock in the VLAN action causes performance issues when the VLAN action is used on multiple cores. Rewrote the VLAN action to use RCU read locking for reads and updates instead. All functions now use an RCU dereferenced pointer to access the VLAN action context. Modified helper functions used by other modules to use the RCU as opposed to directly accessing the structure. Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Manish Kurup <manish.kurup@verizon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
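A generic kernel-style sketch of the RCU pattern this describes (assumes kernel headers such as <linux/rcupdate.h>), with simplified fields rather than the exact act_vlan structures:

```c
struct tcf_vlan_params {
	int action;
	u16 vid;
	struct rcu_head rcu;
};

struct tcf_vlan {
	struct tcf_vlan_params __rcu *vlan_p;
};

/* Datapath read side: a lock-free RCU read section per packet. */
static u16 tcf_vlan_get_vid(struct tcf_vlan *v)
{
	struct tcf_vlan_params *p;
	u16 vid;

	rcu_read_lock();
	p = rcu_dereference(v->vlan_p);
	vid = p->vid;
	rcu_read_unlock();
	return vid;
}

/* Control path (under RTNL): publish new params, then free the old
 * copy only after a grace period so in-flight readers stay safe. */
static void tcf_vlan_update(struct tcf_vlan *v, struct tcf_vlan_params *new)
{
	struct tcf_vlan_params *old;

	old = rtnl_dereference(v->vlan_p);
	rcu_assign_pointer(v->vlan_p, new);
	if (old)
		kfree_rcu(old, rcu);
}
```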
-
By Eric Dumazet
Note that when a new netns is created, it inherits its sysctl_tcp_rmem and sysctl_tcp_wmem from the initial netns. This change is needed so that we can refine TCP rcvbuf autotuning to take RTT into consideration. Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Eric Dumazet
As we want to gradually implement per-netns sysctl_rmem and sysctl_wmem on a per-protocol basis, add two new fields in struct proto, and two new helpers: sk_get_wmem0() and sk_get_rmem0(). The first user will be TCP. Then UDP and SCTP can be easily converted, while DECNET probably won't get this support. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
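A kernel-style sketch of what such a helper could look like; resolving per-netns vs. protocol-global storage via an offset field is an assumption based on the description:

```c
/* Assumes kernel headers; sk_get_rmem0() would mirror this with the
 * rmem fields. */
static inline long sk_get_wmem0(const struct sock *sk,
				const struct proto *proto)
{
	/* If the proto advertises a per-netns sysctl_wmem, read it out
	 * of this socket's netns at the recorded offset; otherwise
	 * fall back to the protocol-global array. */
	if (proto->sysctl_wmem_offset)
		return *(int *)((void *)sock_net(sk) +
				proto->sysctl_wmem_offset);
	return *proto->sysctl_wmem;
}
```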
-
By Andrew Lunn
When the host joins or leaves a multicast group, use switchdev to add an object to the hardware to forward traffic for the group to the host. Signed-off-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 09 Nov 2017, 4 commits
-
-
By Cong Wang
Instead of holding netns refcnt in tc actions, we can minimize the holding time by saving it in struct tcf_exts instead. This means we can just hold netns refcnt right before call_rcu() and release it after tcf_exts_destroy() is done. However, because on netns cleanup path we call tcf_proto_destroy() too, obviously we can not hold netns for a zero refcnt, in this case we have to do cleanup synchronously. It is fine for RCU too, the caller cleanup_net() already waits for a grace period. For other cases, refcnt is non-zero and we can safely grab it as normal and release it after we are done. This patch provides two new API for each filter to use: tcf_exts_get_net() and tcf_exts_put_net(). And all filters now can use the following pattern:

  void __destroy_filter() {
      tcf_exts_destroy();
      tcf_exts_put_net();   // <== release netns refcnt
      kfree();
  }

  void some_work() {
      rtnl_lock();
      __destroy_filter();
      rtnl_unlock();
  }

  void some_rcu_callback() {
      tcf_queue_work(some_work);
  }

  if (tcf_exts_get_net())   // <== hold netns refcnt
      call_rcu(some_rcu_callback);
  else
      __destroy_filter();

Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Cong Wang
This reverts commit ceffcc5e. If we hold that refcnt, the netns can never be destroyed until all actions are destroyed by the user; this breaks our netns design, which expects all actions to be destroyed when we destroy the whole netns. Cc: Lucas Bates <lucasb@mojatatu.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Vivien Didelot
This commit provides better scope for the DSA tree setup and teardown functions. It renames the "applied" bool to "setup" and prints a message when the tree is set up, as is done during teardown. At the same time, check dst->setup in dsa_tree_setup, where it is set to true. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Vivien Didelot
A DSA port has a dedicated CPU port assigned to it, stored in the cpu_dp member. It is not meant to be modified by a port, thus make it const. Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 08 Nov 2017, 2 commits
-
-
By Yi Yang
v16->v17
 - Fixed disputed check code: keep them in nsh_push and nsh_pop but also add them in __ovs_nla_copy_actions
v15->v16
 - Add csum recalculation for nsh_push, nsh_pop and set_nsh pointed out by Pravin
 - Move nsh key into the union with ipv4 and ipv6 and add check for nsh key in match_validate pointed out by Pravin
 - Add nsh check in validate_set and __ovs_nla_copy_actions
v14->v15
 - Check size in nsh_hdr_from_nlattr
 - Fixed four small issues pointed out by Jiri and Eric
v13->v14
 - Rename skb_push_nsh to nsh_push per Dave's comment
 - Rename skb_pop_nsh to nsh_pop per Dave's comment
v12->v13
 - Fix NSH header length check in set_nsh
v11->v12
 - Fix missing changes old comments pointed out
 - Fix new comments for v11
v10->v11
 - Fix the three disputable comments left for v9 but not fixed in v10.
v9->v10
 - Change struct ovs_key_nsh to
       struct ovs_nsh_key_base base;
       __be32 context[NSH_MD1_CONTEXT_SIZE];
 - Fix new comments for v9
v8->v9
 - Fix build error reported by daily intel build because nsh module isn't selected by openvswitch
v7->v8
 - Rework nested value and mask for OVS_KEY_ATTR_NSH
 - Change pop_nsh to adapt to nsh kernel module
 - Fix many issues per comments from Jiri Benc
v6->v7
 - Remove NSH GSO patches in v6 because Jiri Benc reworked it as another patch series and they have been merged.
 - Change it to adapt to the nsh kernel module added by the NSH GSO patch series
v5->v6
 - Fix the rest of the comments for v4.
 - Add NSH GSO support for VxLAN-gpe + NSH and Eth + NSH.
v4->v5
 - Fix many comments by Jiri Benc and Eric Garver for v4.
v3->v4
 - Add new NSH match field ttl
 - Update NSH header to the latest format, which will be the final format and won't change per its author's confirmation.
 - Fix comments for v3.
v2->v3
 - Change OVS_KEY_ATTR_NSH to a nested key to handle length-fixed attributes and length-variable attributes more flexibly.
 - Remove struct ovs_action_push_nsh completely
 - Add code to handle nested attribute for SET_MASKED
 - Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH to transfer NSH header data.
 - Fix comments and coding style issues by Jiri and Eric
v1->v2
 - Change encap_nsh and decap_nsh to push_nsh and pop_nsh
 - Dynamically allocate struct ovs_action_push_nsh for length-variable metadata.

OVS master and the 2.8 branch have merged the NSH userspace patch series; this patch enables NSH support in the kernel datapath so that OVS can support NSH in compat mode by porting this. Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Eric Garver <e@erig.me> Acked-by: Pravin Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
By Nogah Frankel
Add the ability to offload the RED qdisc by using ndo_setup_tc. There are four commands for RED offloading:
 * TC_RED_SET: handles set and change.
 * TC_RED_DESTROY: handles qdisc destroy.
 * TC_RED_STATS: updates the qdisc's counters (given as a reference).
 * TC_RED_XSTAT: returns red xstats.
Whether RED is being offloaded is determined every time the dump action is called, because a parent change of this qdisc could change its offload state but doesn't require any RED function to be called. Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 07 Nov 2017, 1 commit
-
-
By Pablo Neira Ayuso
This patch adds a new get operation to look up specific elements in a set via the netlink interface. You can also use it to check if an interval already exists. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 06 Nov 2017, 1 commit
-
-
By Florian Westphal
We currently call ->nlattr_tuple_size() once at register time and cache the result in l4proto->nla_size. nla_size is the only member that is written to; avoiding this would allow making the l4proto trackers const. We can use ->nlattr_tuple_size() at run time, and cache the result in the individual trackers instead. This is an intermediate step: the next patch removes the nlattr_size() callback and computes the size at compile time, then removes nla_size. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
-
- 05 Nov 2017, 1 commit
-
-
By Priyaranjan Jha
Currently TCP RACK loss detection does not work well if packets are being reordered beyond its static reordering window (min_rtt/4). Under such reordering it may falsely trigger loss recoveries and reduce TCP throughput significantly. This patch improves that by increasing and reducing the reordering window based on DSACK, which is now supported in major TCP implementations. It makes RACK's reo_wnd adaptive based on DSACK and the number of recoveries.

 - If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded by srtt), since there is a possibility that the spurious retransmission was due to a reordering delay longer than reo_wnd.
 - Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16) successful recoveries (accounts for full DSACK-based loss recovery undo). After that, reset it to the default (min_rtt/4).
 - At max, reo_wnd is incremented only once per rtt, so that the new DSACK on which we are reacting is due to the spurious retx (approx) after reo_wnd was updated the last time.
 - reo_wnd is tracked in terms of steps (of min_rtt/4), rather than an absolute value, to account for change in rtt.

In our internal testing, we observed a significant increase in throughput in scenarios where reordering exceeds min_rtt/4 (the previous static value). Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
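A standalone sketch of the adaptive window logic summarized above; field names and the exact update sites are assumptions based on the text:

```c
#include <stdio.h>

#define TCP_RACK_RECOVERY_THRESH 16

/* Illustrative state, tracking the window in steps of min_rtt/4. */
struct rack {
	unsigned int reo_wnd_steps;   /* window width in steps        */
	unsigned int reo_wnd_persist; /* recoveries left at this width */
};

/* On a DSACK (at most once per RTT): widen by one step and keep the
 * wider window for the next 16 successful recoveries. */
static void rack_dsack_seen(struct rack *r)
{
	r->reo_wnd_steps += 1;
	r->reo_wnd_persist = TCP_RACK_RECOVERY_THRESH;
}

/* Current window: steps * min_rtt/4, upper bounded by srtt; tracking
 * steps rather than microseconds lets the window follow RTT changes. */
static unsigned int rack_reo_wnd_us(const struct rack *r,
				    unsigned int min_rtt_us,
				    unsigned int srtt_us)
{
	unsigned int wnd = r->reo_wnd_steps * (min_rtt_us / 4);

	return wnd < srtt_us ? wnd : srtt_us;
}

int main(void)
{
	struct rack r = { .reo_wnd_steps = 1, .reo_wnd_persist = 0 };

	rack_dsack_seen(&r); /* reordering observed: 1 -> 2 steps */
	printf("reo_wnd = %u us\n", rack_reo_wnd_us(&r, 40000, 100000));
	return 0;
}
```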
-