- 16 11月, 2018 1 次提交
-
-
由 Cong Wang 提交于
Currently netdev_rx_csum_fault() only shows a device name, we need more information about the skb for debugging csum failures. Sample output: ens3: hw csum failure dev features: 0x0000000000014b89 skb len=84 data_len=0 pkt_type=0 gso_size=0 gso_type=0 nr_frags=0 ip_summed=0 csum=0 csum_complete_sw=0 csum_valid=0 csum_level=0 Note, I use pr_err() just to be consistent with the existing one. Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 11月, 2018 4 次提交
-
-
由 Jakub Kicinski 提交于
RED qdisc's limit parameter changes the behaviour of the qdisc, for instance if it's set to 0 qdisc will drop all the packets. When replace operation happens and parameter is set to non-0 a new fifo qdisc will be instantiated and replace the old child qdisc which will be destroyed. Drivers need to know the parameter, even if they don't impose the actual limit to be able to reliably reconstruct the Qdisc hierarchy. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Drivers offloading Qdiscs should have reasonable certainty the offloaded behaviour matches the SW path. This is impossible if the driver does not know about all Qdiscs or when Qdiscs move and are reused. Send a graft notification from MQ. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Drivers offloading Qdiscs should have reasonable certainty the offloaded behaviour matches the SW path. This is impossible if the driver does not know about all Qdiscs or when Qdiscs move and are reused. Send a graft notification from RED. The drivers are expected to simply stop offloading the Qdisc, if a non-standard child is ever grafted onto it. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
Drivers are currently not notified when a Qdisc is grafted as root. This requires special casing Qdiscs added with parent = TC_H_ROOT in the driver. Also there is no notification sent to the driver when an existing Qdisc is grafted as root. Add this very simple notifications, drivers should now be able to track their Qdisc tree fully. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 11月, 2018 3 次提交
-
-
由 Xin Long 提交于
When socks' sk_reuseport is set, the same port and address are allowed to be bound into these socks who have the same uid. Note that the difference from sk_reuse is that it allows multiple socks to listen on the same port and address. Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This is a part of sk_reuseport support for sctp. It defines a helper sctp_bind_addrs_check() to check if the bind_addrs in two socks are matched. It will add sock_reuseport if they are completely matched, and return err if they are partly matched, and alloc sock_reuseport if all socks are not matched at all. It will work until sk_reuseport support is added in sctp_get_port_local() in the next patch. v1->v2: - use 'laddr->valid && laddr2->valid' check instead as Marcelo pointed in sctp_bind_addrs_check(). Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Xin Long 提交于
This is a part of sk_reuseport support for sctp, and it selects a sock by the hashkey of lport, paddr and dport by default. It will work until sk_reuseport support is added in sctp_get_port_local() in the next patch. v1->v2: - define lport as __be16 instead of __be32 as Marcelo pointed in __sctp_rcv_lookup_endpoint(). Acked-by: NNeil Horman <nhorman@tuxdriver.com> Signed-off-by: NXin Long <lucien.xin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 11月, 2018 10 次提交
-
-
由 Eric Dumazet 提交于
Similar to 80ba92fa ("codel: add ce_threshold attribute") After EDT adoption, it became easier to implement DCTCP-like CE marking. In many cases, queues are not building in the network fabric but on the hosts themselves. If packets leaving fq missed their Earliest Departure Time by XXX usec, we mark them with ECN CE. This gives a feedback (after one RTT) to the sender to slow down and find better operating mode. Example : tc qd replace dev eth0 root fq ce_threshold 2.5ms Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
FQ pacing guarantees that paced packets queued by one flow do not add head-of-line blocking for other flows. After TCP GSO conversion, increasing limit_output_bytes to 1 MB is safe, since this maps to 16 skbs at most in qdisc or device queues. (or slightly more if some drivers lower {gso_max_segs|size}) We still can queue at most 1 ms worth of traffic (this can be scaled by wifi drivers if they need to) Tested: # ethtool -c eth0 | egrep "tx-usecs:|tx-frames:" # 40 Gbit mlx4 NIC tx-usecs: 16 tx-frames: 16 # tc qdisc replace dev eth0 root fq # for f in {1..10};do netperf -P0 -H lpaa24,6 -o THROUGHPUT;done Before patch: 27711 26118 27107 27377 27712 27388 27340 27117 27278 27509 After patch: 37434 36949 36658 36998 37711 37291 37605 36659 36544 37349 Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tcp_tso_should_defer() first heuristic is to not defer if last send is "old enough". Its current implementation uses jiffies and its low granularity. TSO autodefer performance should not rely on kernel HZ :/ After EDT conversion, we have state variables in nanoseconds that can allow us to properly implement the heuristic. This patch increases TSO chunk sizes on medium rate flows, especially when receivers do not use GRO or similar aggregation. It also reduces bursts for HZ=100 or HZ=250 kernels, making TCP behavior more uniform. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tcp_tso_should_defer() last step tries to check if the probable next ACK packet is coming in less than half rtt. Problem is that the head->tstamp might be in the future, so we need to use signed arithmetics to avoid overflows. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Applications using MSG_EOR are giving a strong hint to TCP stack : Subsequent sendmsg() can not append more bytes to skbs having the EOR mark. Do not try to TSO defer suchs skbs, there is really no hope. Signed-off-by: NEric Dumazet <edumazet@google.com> Acked-by: NSoheil Hassas Yeganeh <soheil@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Yafang Shao 提交于
Bitwise operation is a little faster. So I replace after() with using the flag FLAG_SND_UNA_ADVANCED as it is already set before. In addtion, there's another similar improvement in tcp_cwnd_reduction(). Cc: Joe Perches <joe@perches.com> Suggested-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NYafang Shao <laoar.shao@gmail.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
If sch_fq is used at ingress, skbs that might have been timestamped by net_timestamp_set() if a packet capture is requesting timestamps could be delayed by arbitrary amount of time, since sch_fq time base is MONOTONIC. Fix this problem by moving code from sch_netem.c to act_mirred.c. Fixes: fb420d5d ("tcp/fq: move back to CLOCK_MONOTONIC") Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jon Maloy 提交于
When a link failure is detected locally, the link is reset, the flag link->in_session is set to false, and a RESET_MSG with the 'stopping' bit set is sent to the peer. The purpose of this bit is to inform the peer that this endpoint just is going down, and that the peer should handle the reception of this particular RESET message as a local failure. This forces the peer to accept another RESET or ACTIVATE message from this endpoint before it can re-establish the link. This again is necessary to ensure that link session numbers are properly exchanged before the link comes up again. If a failure is detected locally at the same time at the peer endpoint this will do the same, which is also a correct behavior. However, when receiving such messages, the endpoints will not distinguish between 'stopping' RESETs and ordinary ones when it comes to updating session numbers. Both endpoints will copy the received session number and set their 'in_session' flags to true at the reception, while they are still expecting another RESET from the peer before they can go ahead and re-establish. This is contradictory, since, after applying the validation check referred to below, the 'in_session' flag will cause rejection of all such messages, and the link will never come up again. We now fix this by not only handling received RESET/STOPPING messages as a local failure, but also by omitting to set a new session number and the 'in_session' flag in such cases. Fixes: 7ea817f4 ("tipc: check session number before accepting link protocol messages") Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 LUU Duc Canh 提交于
Currently, the broadcast retransmission algorithm is using the 'prev_retr' field in struct tipc_link to time stamp the latest broadcast retransmission occasion. This helps to restrict retransmission of individual broadcast packets to max once per 10 milliseconds, even though all other criteria for retransmission are met. We now move this time stamp to the control block of each individual packet, and remove other limiting criteria. This simplifies the retransmission algorithm, and eliminates any risk of logical errors in selecting which packets can be retransmitted. Acked-by: NYing Xue <ying.xue@windriver.com> Signed-off-by: NLUU Duc Canh <canh.d.luu@dektech.com.au> Signed-off-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Hurley 提交于
Currently drivers can register to receive TC block bind/unbind callbacks by implementing the setup_tc ndo in any of their given netdevs. However, drivers may also be interested in binds to higher level devices (e.g. tunnel drivers) to potentially offload filters applied to them. Introduce indirect block devs which allows drivers to register callbacks for block binds on other devices. The callback is triggered when the device is bound to a block, allowing the driver to register for rules applied to that block using already available functions. Freeing an indirect block callback will trigger an unbind event (if necessary) to direct the driver to remove any offloaded rules and unreg any block rule callbacks. It is the responsibility of the implementing driver to clean any registered indirect block callbacks before exiting, if the block it still active at such a time. Allow registering an indirect block dev callback for a device that is already bound to a block. In this case (if it is an ingress block), register and also trigger the callback meaning that any already installed rules can be replayed to the calling driver. Signed-off-by: NJohn Hurley <john.hurley@netronome.com> Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 11月, 2018 5 次提交
-
-
由 David S. Miller 提交于
Same change as made to sctp_intl_store_reasm(). To be fully correct, an iterator has an undefined value when something like skb_queue_walk() naturally terminates. This will actually matter when SKB queues are converted over to list_head. Formalize what this code ends up doing with the current implementation. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
To be fully correct, an iterator has an undefined value when something like skb_queue_walk() naturally terminates. This will actually matter when SKB queues are converted over to list_head. Formalize what this code ends up doing with the current implementation. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Eliminate the assumption that SKBs and SKB list heads can be cast to eachother in SKB list handling code. This change also appears to fix a bug since the list->next pointer is sampled outside of holding the SKB queue lock. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
It turns out I missed one VLAN_TAG_PRESENT in OVS code while rebasing. This fixes it. Fixes: 9df46aef ("OVS: remove use of VLAN_TAG_PRESENT") Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jakub Kicinski 提交于
TCA_FLOWER_KEY_ENC_OPTS and TCA_FLOWER_KEY_ENC_OPTS_MASK can only currently contain further nested attributes, which are parsed by hand, so the policy is never actually used resulting in a W=1 build warning: net/sched/cls_flower.c:492:1: warning: ‘enc_opts_policy’ defined but not used [-Wunused-const-variable=] enc_opts_policy[TCA_FLOWER_KEY_ENC_OPTS_MAX + 1] = { Add the validation anyway to avoid potential bugs when other attributes are added and to make the attribute structure slightly more clear. Validation will also set extact to point to bad attribute on error. Fixes: 0a6e7778 ("net/sched: allow flower to match tunnel options") Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Acked-by: NSimon Horman <simon.horman@netronome.com> Acked-by: NJiri Pirko <jiri@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 10 11月, 2018 3 次提交
-
-
由 Paolo Abeni 提交于
In the udp6 code path, we needed multiple tests to select the correct mib to be updated. Since we touch at least a counter at each iteration, it's convenient to use the recently introduced __UDPX_MIB() helper once and remove some code duplication. Signed-off-by: NPaolo Abeni <pabeni@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 배석진 提交于
Only first fragment has the sport/dport information, not the following ones. If we want consistent hash for all fragments, we need to ignore ports even for first fragment. This bug is visible for IPv6 traffic, if incoming fragments do not have a flow label, since skb_get_hash() will give different results for first fragment and following ones. It is also visible if any routing rule wants dissection and sport or dport. See commit 5e5d6fed ("ipv6: route: dissect flow in input path if fib rules need it") for details. [edumazet] rewrote the changelog completely. Fixes: 06635a35 ("flow_dissect: use programable dissector in skb_flow_dissect and friends") Signed-off-by: N배석진 <soukjin.bae@samsung.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Li RongQing 提交于
if skb is NULL pointer, and the following access of skb's skb_mstamp_ns will trigger panic, which is same as BUG_ON Signed-off-by: NLi RongQing <lirongqing@baidu.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 11月, 2018 14 次提交
-
-
由 Jakub Kicinski 提交于
To mirror software behaviour on offload more precisely inform the drivers about the state of the harddrop flag. Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: NJohn Hurley <john.hurley@netronome.com> Reviewed-by: NQuentin Monnet <quentin.monnet@netronome.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neal Cardwell 提交于
Recently, in commit ab408b6d ("tcp: switch tcp and sch_fq to new earliest departure time model"), the TCP BBR code switched to a new approach of using an explicit bbr_pacing_margin_percent for shaving a pacing rate "haircut", rather than the previous implict approach. Update an old comment to reflect the new approach. Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NSoheil Hassas Yeganeh <soheil@google.com> Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
This removes assumption than vlan_tci != 0 when tag is present. Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
This removes assumptions about VLAN_TAG_PRESENT bit. Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Cong Wang 提交于
__skb_checksum_complete_head() and __skb_checksum_complete() are both declared in skbuff.h, they fit better in skbuff.c than datagram.c. Cc: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ivan Khoronzhuk 提交于
It's redundancy for the drivers to hold the list of vlans when absolutely the same list exists in vlan core. In most cases it's needed only to traverse the vlan devices, their vids and sync some settings with h/w, so add API to simplify this. At least some of these drivers also can benefit: grep "for_each.*vid" -r drivers/net/ethernet/ drivers/net/ethernet/hisilicon/hns3/hns3_enet.c: drivers/net/ethernet/synopsys/dwc-xlgmac-hw.c: drivers/net/ethernet/qlogic/qlge/qlge_main.c: drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c: drivers/net/ethernet/via/via-rhine.c: drivers/net/ethernet/via/via-velocity.c: drivers/net/ethernet/intel/igb/igb_main.c: drivers/net/ethernet/intel/ice/ice_main.c: drivers/net/ethernet/intel/e1000/e1000_main.c: drivers/net/ethernet/intel/i40e/i40e_main.c: drivers/net/ethernet/intel/e1000e/netdev.c: drivers/net/ethernet/intel/igbvf/netdev.c: drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c: drivers/net/ethernet/intel/ixgb/ixgb_main.c: drivers/net/ethernet/intel/ixgbe/ixgbe_main.c: drivers/net/ethernet/amd/xgbe/xgbe-dev.c: drivers/net/ethernet/emulex/benet/be_main.c: drivers/net/ethernet/neterion/vxge/vxge-main.c: drivers/net/ethernet/adaptec/starfire.c: drivers/net/ethernet/brocade/bna/bnad.c: Reviewed-by: NGrygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Ivan Khoronzhuk 提交于
In order to avoid all table update, and only remove or add new address, the auxiliary function exists, named __hw_addr_sync_dev(). It allows end driver do nothing when nothing changed and add/rm when concrete address is firstly added or lastly removed. But it doesn't include cases when an address of real device or vlan was reused by other vlans or vlan/macval devices. For handaling events when address was reused/unreused the patch adds new auxiliary routine - __hw_addr_ref_sync_dev(). It allows to do nothing when nothing was changed and do updates only for an address being added/reused/deleted/unreused. Thus, clone address changes for vlans can be mirrored in the table. The function is exclusive with __hw_addr_sync_dev(). It's responsibility of the end driver to identify address vlan device, if it needs so. Signed-off-by: NIvan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Michał Mirosław 提交于
This is a minimal change to allow removing of VLAN_TAG_PRESENT. It leaves OVS unable to use CFI bit, as fixing this would need a deeper surgery involving userspace interface. Signed-off-by: NMichał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Barmann 提交于
When setting the SO_MARK socket option, if the mark changes, the dst needs to be reset so that a new route lookup is performed. This fixes the case where an application wants to change routing by setting a new sk_mark. If this is done after some packets have already been sent, the dst is cached and has no effect. Signed-off-by: NDavid Barmann <david.barmann@stackpath.com> Reviewed-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Multiple cpus might attempt to insert a new fragment in rhashtable, if for example RPS is buggy, as reported by 배석진 in https://patchwork.ozlabs.org/patch/994601/ We use rhashtable_lookup_get_insert_key() instead of rhashtable_insert_fast() to let cpus losing the race free their own inet_frag_queue and use the one that was inserted by another cpu. Fixes: 648700f7 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: N배석진 <soukjin.bae@samsung.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Li RongQing 提交于
if local is NULL pointer, and the following access of local's dev will trigger panic, which is same as BUG_ON Signed-off-by: NLi RongQing <lirongqing@baidu.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-