- 27 4月, 2016 1 次提交
-
-
由 Nicolas Dichtel 提交于
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 26 4月, 2016 5 次提交
-
-
由 Michal Kazior 提交于
This works on the same implementation principle as codel*.h, i.e. there's a generic header with structures and macros and a implementation header carrying function definitions to include in given, e.g. driver or module. The fairness logic comes from net/sched/sch_fq_codel.c but is generalized so it is more flexible and easier to re-use. Signed-off-by: NMichal Kazior <michal.kazior@tieto.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Kazior 提交于
It was impossible to include codel.h for the purpose of having access to codel_params or codel_vars structure definitions and using them for embedding in other more complex structures. This splits allows codel.h itself to be treated like any other header file while codel_qdisc.h and codel_impl.h contain function definitions with logic that was previously in codel.h. This copies over copyrights and doesn't involve code changes other than adding a few additional include directives to net/sched/sch*codel.c. Signed-off-by: NMichal Kazior <michal.kazior@tieto.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michal Kazior 提交于
This strips out qdisc specific bits from the code and makes it slightly more reusable. Codel will be used by wireless/mac80211 in the future. Signed-off-by: NMichal Kazior <michal.kazior@tieto.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
Commit 751a587a ("route: fix breakage after moving lwtunnel state") moved lwtstate to the end of dst_entry for 32bit archs. This makes it share the cacheline with __refcnt which had an unkown effect on performance. For this reason, the pointer was kept in place for 64bit archs. However, later performance measurements showed this is of no concern. It turns out that every performance sensitive path that accesses lwtstate accesses also struct rtable or struct rt6_info which share the same cache line. Thus, to get rid of a few #ifdefs, move the field to the end of the struct also for 64bit. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Craig Gallek 提交于
d894ba18 ("soreuseport: fix ordering for mixed v4/v6 sockets") was merged as a bug fix to the net tree. Two conflicting changes were committed to net-next before the above fix was merged back to net-next: ca065d0c ("udp: no longer use SLAB_DESTROY_BY_RCU") 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under synflood") These changes switched the datastructure used for TCP and UDP sockets from hlist_nulls to hlist. This patch applies the necessary parts of the net tree fix to net-next which were not automatic as part of the merge. Fixes: 1602f49b ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") Signed-off-by: NCraig Gallek <kraig@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 25 4月, 2016 3 次提交
-
-
由 Eric Dumazet 提交于
Valdis reported tons of stack dumps caused by WARN_ON() in sock_owned_by_user() This test needs to be relaxed if/when lockdep disables itself. Note that other lockdep_sock_is_held() callers are all from rcu_dereference_protected() sections which already are disabled if/when lockdep has been disabled. Fixes: fafc4e1e ("sock: tigthen lockdep checks for sock_owned_by_user") Reported-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu> Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Linux TCP stack painfully segments all TSO/GSO packets before retransmits. This was fine back in the days when TSO/GSO were emerging, with their bugs, but we believe the dark age is over. Keeping big packets in write queues, but also in stack traversal has a lot of benefits. - Less memory overhead, because write queues have less skbs - Less cpu overhead at ACK processing. - Better SACK processing, as lot of studies mentioned how awful linux was at this ;) - Less cpu overhead to send the rtx packets (IP stack traversal, netfilter traversal, drivers...) - Better latencies in presence of losses. - Smaller spikes in fq like packet schedulers, as retransmits are not constrained by TCP Small Queues. 1 % packet losses are common today, and at 100Gbit speeds, this translates to ~80,000 losses per second. Losses are often correlated, and we see many retransmit events leading to 1-MSS train of packets, at the time hosts are already under stress. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Elad Raz 提交于
When using switchdev deferred operation (SWITCHDEV_F_DEFER), the operation is executed in different context and the application doesn't have any way to get the operation real status. Adding a completion callback fixes that. This patch adds fields to switchdev_attr and switchdev_obj "complete_priv" field which is used by the "complete" callback. Application can set a complete function which will be called once the operation executed. Signed-off-by: NElad Raz <eladr@mellanox.com> Signed-off-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NIdo Schimmel <idosch@mellanox.com> Acked-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 4月, 2016 6 次提交
-
-
由 Nicolas Dichtel 提交于
With this function, nla_data() is aligned on a 64-bit area. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
nla_data() is now aligned on a 64-bit area. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
nla_data() is now aligned on a 64-bit area. In fact, there is no user of this function. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
nla_data() is now aligned on a 64-bit area. The temporary function nla_put_be64_32bit() is removed in this patch. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
nla_data() is now aligned on a 64-bit area. A temporary version (nla_put_be64_32bit()) is added for nla_put_net64(). This function is removed in the next patch. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
nla_data() is now aligned on a 64-bit area. Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 4月, 2016 6 次提交
-
-
由 Johan Hedberg 提交于
Extend the set of possible HCI bus types with SPI and I2C. Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com> Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
-
由 Hannes Frederic Sowa 提交于
Equivalent to "vxlan: break dependency with netdev drivers", don't autoload geneve module in case the driver is loaded. Instead make the coupling weaker by using netdevice notifiers as proxy. Cc: Jesse Gross <jesse@kernel.org> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
Currently all drivers depend and autoload the vxlan module because how vxlan_get_rx_port is linked into them. Remove this dependency: By using a new event type in the netdevice notifier call chain we proxy the request from the drivers to flush and resetup the vxlan ports not directly via function call but by the already existing netdevice notifier call chain. I added a separate new event type, NETDEV_OFFLOAD_PUSH_VXLAN, to do so. We don't need to save those ids, as the event type field is an unsigned long and using specialized event types for this purpose seemed to be a more elegant way. This also comes in beneficial if in future we want to add offloading knobs for vxlan. Cc: Jesse Gross <jesse@kernel.org> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Martin KaFai Lau 提交于
After receiving sacks, tcp_shifted_skb() will collapse skbs if possible. tx_flags and tskey also have to be merged. This patch reuses the tcp_skb_collapse_tstamp() to handle them. BPF Output Before: ~~~~~ <no-output-due-to-missing-tstamp-event> BPF Output After: ~~~~~ <...>-2024 [007] d.s. 88.644374: : ee_data:14599 Packetdrill Script: ~~~~~ +0 `sysctl -q -w net.ipv4.tcp_min_tso_segs=10` +0 `sysctl -q -w net.ipv4.tcp_no_metrics_save=1` +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 +0 setsockopt(4, SOL_TCP, TCP_NODELAY, [1], 4) = 0 0.200 write(4, ..., 1460) = 1460 +0 setsockopt(4, SOL_SOCKET, 37, [2688], 4) = 0 0.200 write(4, ..., 13140) = 13140 0.200 > P. 1:1461(1460) ack 1 0.200 > . 1461:8761(7300) ack 1 0.200 > P. 8761:14601(5840) ack 1 0.300 < . 1:1(0) ack 1 win 257 <sack 1461:14601,nop,nop> 0.300 > P. 1:1461(1460) ack 1 0.400 < . 1:1(0) ack 14601 win 257 0.400 close(4) = 0 0.400 > F. 14601:14601(0) ack 1 0.500 < F. 1:1(0) ack 14602 win 257 0.500 > . 14602:14602(0) ack 2 Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Tested-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Nicolas Dichtel 提交于
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vivien Didelot 提交于
Having the tag protocol in dsa_switch_driver for setup time and in dsa_switch_tree for runtime is enough. Remove dsa_switch's one. Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 4月, 2016 1 次提交
-
-
由 David S. Miller 提交于
Netlink messages are appended, one object at a time, to the end of the SKB. Therefore we need to test skb_tail_pointer() not skb->data for alignment. Fixes: 35c58459 ("net: Add helpers for 64-bit aligning netlink attributes.") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 4月, 2016 3 次提交
-
-
由 Eric Dumazet 提交于
HAVE_EFFICIENT_UNALIGNED_ACCESS needs CONFIG_ prefix. Also add a comment in nla_align_64bit() explaining we have to add a padding if current skb->data is aligned, as it certainly can be confusing. Fixes: 35c58459 ("net: Add helpers for 64-bit aligning netlink attributes.") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Konstantin Khlebnikov 提交于
skb->sk could point to timewait or request socket which has no sk_classid. Detected as "BUG: KASAN: slab-out-of-bounds in cls_cgroup_classify". Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Suggested-by: NEric Dumazet <eric.dumazet@gmail.com> Suggested-by: NNicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 4月, 2016 2 次提交
-
-
由 Florian Westphal 提交于
nf_connlabel_set() takes the bit number that we would like to set. nf_connlabels_get() however took the number of bits that we want to support. So e.g. nf_connlabels_get(32) support bits 0 to 31, but not 32. This changes nf_connlabels_get() to take the highest bit that we want to set. Callers then don't have to cope with a potential integer wrap when using nf_connlabels_get(bit + 1) anymore. Current callers are fine, this change is only to make folloup nft ct label set support simpler. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Florian Westphal 提交于
Currently labels can only be set either by iptables connlabel match or via ctnetlink. Before adding nftables set support, clean up the clabel core and move helpers that nft will not need after all to the xtables module. Signed-off-by: NFlorian Westphal <fw@strlen.de> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 18 4月, 2016 1 次提交
-
-
由 Vivien Didelot 提交于
Change the dsa_switch_driver.probe function to return a const char *. Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: NAndrew Lunn <andrew@lunn.ch> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 4月, 2016 2 次提交
-
-
由 Alexander Duyck 提交于
This patch updates the IP tunnel core function iptunnel_handle_offloads so that we return an int and do not free the skb inside the function. This actually allows us to clean up several paths in several tunnels so that we can free the skb at one point in the path without having to have a secondary path if we are supporting tunnel offloads. In addition it should resolve some double-free issues I have found in the tunnels paths as I believe it is possible for us to end up triggering such an event in the case of fou or gue. Signed-off-by: NAlexander Duyck <aduyck@mirantis.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hannes Frederic Sowa 提交于
Due to the fact that the udp socket is destructed asynchronously in a work queue, we have some nondeterministic behavior during shutdown of vxlan tunnels and creating new ones. Fix this by keeping the destruction process synchronous in regards to the user space process so IFF_UP can be reliably set. udp_tunnel_sock_release destroys vs->sock->sk if reference counter indicates so. We expect to have the same lifetime of vxlan_sock and vxlan_sock->sock->sk even in fast paths with only rcu locks held. So only destruct the whole socket after we can be sure it cannot be found by searching vxlan_net->sock_list. Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jiri Benc <jbenc@redhat.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 4月, 2016 5 次提交
-
-
由 Xin Long 提交于
For some main variables in sctp.ko, we couldn't export it to other modules, so we have to define some api to access them. It will include sctp transport and endpoint's traversal. There are some transport traversal functions for sctp_diag, we can also use it for sctp_proc. cause they have the similar situation to traversal transport. v2->v3: - rhashtable_walk_init need the parameter gfp, because of recent upstrem update Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
sctp_diag will dump some important details of sctp's assoc or ep, we use sctp_info to describe them, sctp_get_sctp_info to get them, and export it to sctp_diag.ko. v2->v3: - we will not use list_for_each_safe in sctp_get_sctp_info, cause all the callers of it will use lock_sock. - fix the holes in struct sctp_info with __reserved* field. because sctp_diag is a new feature, and sctp_info is just for now, it may be changed in the future. Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcelo Ricardo Leitner 提交于
SCTP already serializes access to rcvbuf through its sock lock: sctp_recvmsg takes it right in the start and release at the end, while rx path will also take the lock before doing any socket processing. On sctp_rcv() it will check if there is an user using the socket and, if there is, it will queue incoming packets to the backlog. The backlog processing will do the same. Even timers will do such check and re-schedule if an user is using the socket. Simplifying this will allow us to remove sctp_skb_list_tail and get ride of some expensive lockings. The lists that it is used on are also mangled with functions like __skb_queue_tail and __skb_unlink in the same context, like on sctp_ulpq_tail_event() and sctp_clear_pd(). sctp_close() will also purge those while using only the sock lock. Therefore the lockings performed by sctp_skb_list_tail() are not necessary. This patch removes this function and replaces its calls with just skb_queue_splice_tail_init() instead. The biggest gain is at sctp_ulpq_tail_event(), because the events always contain a list, even if it's queueing a single skb and this was triggering expensive calls to spin_lock_irqsave/_irqrestore for every data chunk received. As SCTP will deliver each data chunk on a corresponding recvmsg, the more effective the change will be. Before this patch, with chunks with 30 bytes: netperf -t SCTP_STREAM -H 192.168.1.2 -cC -l 60 -- -m 30 -S 400000 400000 -s 400000 400000 on a 10Gbit link with 1500 MTU: SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 425984 425984 30 60.00 137.45 7.34 7.36 52.504 52.608 With it: SCTP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.1.1 () port 0 AF_INET Recv Send Send Utilization Service Demand Socket Socket Message Elapsed Send Recv Send Recv Size Size Size Time Throughput local remote local remote bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB 425984 425984 30 60.00 179.10 7.97 6.70 43.740 36.788 Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When removing sk_refcnt manipulation on synflood, I missed that using skb_set_owner_w() was racy, if sk->sk_wmem_alloc had already transitioned to 0. We should hold sk_refcnt instead, but this is a big deal under attack. (Doing so increase performance from 3.2 Mpps to 3.8 Mpps only) In this patch, I chose to not attach a socket to syncookies skb. Performance is now 5 Mpps instead of 3.2 Mpps. Following patch will remove last known false sharing in tcp_rcv_state_process() Fixes: 3b24d854 ("tcp/dccp: do not touch listener sk_refcnt under synflood") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Fixes: bf797471 ("devlink: add shared buffer configuration") Signed-off-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 4月, 2016 5 次提交
-
-
由 Craig Gallek 提交于
With the SO_REUSEPORT socket option, it is possible to create sockets in the AF_INET and AF_INET6 domains which are bound to the same IPv4 address. This is only possible with SO_REUSEPORT and when not using IPV6_V6ONLY on the AF_INET6 sockets. Prior to the commits referenced below, an incoming IPv4 packet would always be routed to a socket of type AF_INET when this mixed-mode was used. After those changes, the same packet would be routed to the most recently bound socket (if this happened to be an AF_INET6 socket, it would have an IPv4 mapped IPv6 address). The change in behavior occurred because the recent SO_REUSEPORT optimizations short-circuit the socket scoring logic as soon as they find a match. They did not take into account the scoring logic that favors AF_INET sockets over AF_INET6 sockets in the event of a tie. To fix this problem, this patch changes the insertion order of AF_INET and AF_INET6 addresses in the TCP and UDP socket lists when the sockets have SO_REUSEPORT set. AF_INET sockets will be inserted at the head of the list and AF_INET6 sockets with SO_REUSEPORT set will always be inserted at the tail of the list. This will force AF_INET sockets to always be considered first. Fixes: e32ea7e7 ("soreuseport: fast reuseport UDP socket selection") Fixes: 125e80b88687 ("soreuseport: fast reuseport TCP socket selection") Reported-by: NMaciej Żenczykowski <maze@google.com> Signed-off-by: NCraig Gallek <kraig@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Martin KaFai Lau 提交于
This patch adds a release_cb for UDPv6. It does a route lookup and updates sk->sk_dst_cache if it is needed. It picks up the left-over job from ip6_sk_update_pmtu() if the sk was owned by user during the pmtu update. It takes a rcu_read_lock to protect the __sk_dst_get() operations because another thread may do ip6_dst_store() without taking the sk lock (e.g. sendmsg). Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Reported-by: NWei Wang <weiwan@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Martin KaFai Lau 提交于
There is a case in connected UDP socket such that getsockopt(IPV6_MTU) will return a stale MTU value. The reproducible sequence could be the following: 1. Create a connected UDP socket 2. Send some datagrams out 3. Receive a ICMPV6_PKT_TOOBIG 4. No new outgoing datagrams to trigger the sk_dst_check() logic to update the sk->sk_dst_cache. 5. getsockopt(IPV6_MTU) returns the mtu from the invalid sk->sk_dst_cache instead of the newly created RTF_CACHE clone. This patch updates the sk->sk_dst_cache for a connected datagram sk during pmtu-update code path. Note that the sk->sk_v6_daddr is used to do the route lookup instead of skb->data (i.e. iph). It is because a UDP socket can become connected after sending out some datagrams in un-connected state. or It can be connected multiple times to different destinations. Hence, iph may not be related to where sk is currently connected to. It is done under '!sock_owned_by_user(sk)' condition because the user may make another ip6_datagram_connect() (i.e changing the sk->sk_v6_daddr) while dst lookup is happening in the pmtu-update code path. For the sock_owned_by_user(sk) == true case, the next patch will introduce a release_cb() which will update the sk->sk_dst_cache. Test: Server (Connected UDP Socket): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Route Details: [root@arch-fb-vm1 ~]# ip -6 r show | egrep '2fac' 2fac::/64 dev eth0 proto kernel metric 256 pref medium 2fac:face::/64 via 2fac::face dev eth0 metric 1024 pref medium A simple python code to create a connected UDP socket: import socket import errno HOST = '2fac::1' PORT = 8080 s = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM) s.bind((HOST, PORT)) s.connect(('2fac:face::face', 53)) print("connected") while True: try: data = s.recv(1024) except socket.error as se: if se.errno == errno.EMSGSIZE: pmtu = s.getsockopt(41, 24) print("PMTU:%d" % pmtu) break s.close() Python program output after getting a ICMPV6_PKT_TOOBIG: [root@arch-fb-vm1 ~]# python2 ~/devshare/kernel/tasks/fib6/udp-connect-53-8080.py connected PMTU:1300 Cache routes after recieving TOOBIG: [root@arch-fb-vm1 ~]# ip -6 r show table cache 2fac:face::face via 2fac::face dev eth0 metric 0 cache expires 463sec mtu 1300 pref medium Client (Send the ICMPV6_PKT_TOOBIG): ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ scapy is used to generate the TOOBIG message. Here is the scapy script I have used: >>> p=Ether(src='da:75:4d:36:ac:32', dst='52:54:00:12:34:66', type=0x86dd)/IPv6(src='2fac::face', dst='2fac::1')/ICMPv6PacketTooBig(mtu=1300)/IPv6(src='2fac:: 1',dst='2fac:face::face', nh='UDP')/UDP(sport=8080,dport=53) >>> sendp(p, iface='qemubr0') Fixes: 45e4fd26 ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Reported-by: NWei Wang <weiwan@google.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Wei Wang <weiwan@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
User needs to monitor shared buffer occupancy. For that, he issues a snapshot command in order to instruct hardware to catch current and maximal occupancy values, and clear command in order to clear the historical maximal values. Also port-pool and tc-pool-bind command response messages are extended to carry occupancy values. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Pirko 提交于
Define userspace API and drivers API for configuration of shared buffers. Four basic objects are defined: shared buffer - attributes are size, number of pools and TCs pool - chunk of sharedbuffer definition, it has some size and either static or dynamic threshold port pool threshold - to set per-port threshold for each pool port tc threshold bind - to bind port and TC to specified pool with threshold. Signed-off-by: NJiri Pirko <jiri@mellanox.com> Reviewed-by: NIdo Schimmel <idosch@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-