- 04 Jul, 2018 6 commits
-
-
Committed by Edward Cree
Just calls netif_receive_skb() in a loop. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Xin Long
A transport with an illegal flowlabel should not be allowed to send packets. Other transport protocols already deny this. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Xin Long
Struct sockaddr_in6 has the member sin6_flowinfo, which includes the ipv6 flowlabel, so setting the flowlabel should also be supported when adding a transport whose ipaddr comes from userspace. Note that addrinfo in sctp_sendmsg uses struct in6_addr for the secondary addrs, which doesn't contain sin6_flowinfo, so sin6_flowinfo needs to be copied from the primary addr. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Xin Long
spp_ipv6_flowlabel and spp_dscp are added in sctp_paddrparams in this patch so that users can set the sctp_sock/asoc/transport dscp and flowlabel with spp_flags SPP_IPV6_FLOWLABEL or SPP_DSCP via SCTP_PEER_ADDR_PARAMS, as described in section 8.1.12 of RFC 6458. As said in the last patch, it uses '| 0x100000' or '| 0x1' to mark that flowlabel or dscp is set, so that their values can be set to 0. Note that, to guarantee that an old app built with old kernel headers still works on the newer kernel, the param check in sctp_g/setsockopt_peer_addr_params() is also improved, following the approach of sctp_g/setsockopt_delayed_ack() and other sockopts that accept two types of params. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Xin Long
Like some other per-transport params, flowlabel and dscp are added in transport, asoc and sctp_sock. By default, a transport takes its value from the asoc, and the asoc takes it from the sctp_sock. flowlabel only works for ipv6 transports. Besides being passed down in sctp_xmit, they also need to be set in flow4/6 before the route lookup in get_dst. Note that it uses '& 0x100000' to check if flowlabel is set and '& 0x1' (tos 1st bit is unused) to check if dscp is set by users, so that they can be set to 0 by the sockopt in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
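The marker-bit scheme described in this commit message can be tried out in userspace. The following is a minimal sketch, not the kernel code: the two marker constants (0x100000 and 0x1) come from the commit text, while the helper names and the two value masks (a 20-bit flow label and the upper six bits of the TOS byte for DSCP) are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Marker bits quoted in the commit message. */
#define FLOWLABEL_SET_MASK 0x100000u /* bit just above the 20-bit label */
#define DSCP_SET_MASK      0x1u      /* low TOS bit, described as unused */

/* Value masks: assumptions for this sketch. */
#define FLOWLABEL_VAL_MASK 0xfffffu  /* 20-bit IPv6 flow label */
#define DSCP_VAL_MASK      0xfcu     /* upper six bits of the TOS byte */

/* Hypothetical helpers: store a user value together with its "set" marker. */
static uint32_t flowlabel_store(uint32_t label)
{
	return (label & FLOWLABEL_VAL_MASK) | FLOWLABEL_SET_MASK;
}

static bool flowlabel_is_set(uint32_t stored)
{
	return (stored & FLOWLABEL_SET_MASK) != 0;
}

static uint32_t flowlabel_value(uint32_t stored)
{
	return stored & FLOWLABEL_VAL_MASK;
}

static uint8_t dscp_store(uint8_t tos)
{
	return (uint8_t)((tos & DSCP_VAL_MASK) | DSCP_SET_MASK);
}

static bool dscp_is_set(uint8_t stored)
{
	return (stored & DSCP_SET_MASK) != 0;
}

static uint8_t dscp_value(uint8_t stored)
{
	return (uint8_t)(stored & DSCP_VAL_MASK);
}
```

Because the marker lives outside the value bits, a stored value of 0 with the marker set ("user asked for 0") stays distinguishable from a field that was never configured.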
-
Committed by Xin Long
This patch introduces __ip_queue_xmit(), through which callers can pass a tos param without having to set inet->tos. For ipv6, ip6_xmit() already allows passing a tclass parameter. It's needed when a transport protocol doesn't use inet->tos, like sctp's per-transport dscp, which will be added in the next patch. Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 02 Jul, 2018 9 commits
-
-
Committed by Roman Mashak
Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Yafang Shao
Currently trace_sock_exceed_buf_limit() only shows rmem info, but the wmem limit may also be hit. So expose wmem info in this tracepoint as well. Regarding memcg, I think it is better to introduce a new tracepoint (if that is needed), i.e. trace_memcg_limit_hit, rather than show memcg info in trace_sock_exceed_buf_limit. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Eric Biggers
The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags, not 'gfp_t'. So don't pass GFP_KERNEL to it. Fixes: bf355b8d ("ipv6: sr: add core files for SR HMAC support") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Sabrina Dubroca
Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower layer GRO handler cannot touch the skb anymore. Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols. This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE. Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Amritha Nambiar
Extend transmit queue sysfs attribute to configure Rx queue(s) map per Tx queue. By default no receive queues are configured for the Tx queue. - /sys/class/net/eth0/queues/tx-*/xps_rxqs Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Amritha Nambiar
This patch adds support to pick Tx queue based on the Rx queue(s) map configuration set by the admin through the sysfs attribute for each Tx queue. If the user configuration for receive queue(s) map does not apply, then the Tx queue selection falls back to CPU(s) map based selection and finally to hashing. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
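The fallback order in this commit message (Rx queue map, then CPU map, then hashing) can be sketched as a small lookup chain. This is a simplified userspace model under stated assumptions: the struct, the flat per-index tables, the `NO_QUEUE` sentinel and the modulo hash are all hypothetical and do not reflect the kernel's XPS data structures.

```c
#include <stdint.h>

#define NO_QUEUE (-1)

/* Hypothetical, simplified tables: entry i names the Tx queue configured
 * for Rx queue i (or CPU i), or NO_QUEUE when nothing was configured. */
struct txq_maps {
	const int *rxq_to_txq;
	int n_rxqs;
	const int *cpu_to_txq;
	int n_cpus;
	int n_txqs;
};

/* Bounds-checked lookup; a missing table or index counts as "no mapping". */
static int map_lookup(const int *map, int len, int idx)
{
	if (!map || idx < 0 || idx >= len)
		return NO_QUEUE;
	return map[idx];
}

/* Try the Rx queue map first, then the CPU map, and finally fall back to
 * spreading flows over all Tx queues by hash. */
static int pick_tx_queue(const struct txq_maps *m, int rx_queue, int cpu,
			 uint32_t flow_hash)
{
	int q = map_lookup(m->rxq_to_txq, m->n_rxqs, rx_queue);

	if (q == NO_QUEUE)
		q = map_lookup(m->cpu_to_txq, m->n_cpus, cpu);
	if (q == NO_QUEUE)
		q = (int)(flow_hash % (uint32_t)m->n_txqs);
	return q;
}
```

A socket with no recorded Rx queue can be modelled by passing a negative `rx_queue`, which skips straight to the CPU map.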
-
Committed by Amritha Nambiar
This patch adds a new field to sock_common 'skc_rx_queue_mapping' which holds the receive queue number for the connection. The Rx queue is marked in tcp_finish_connect() to allow a client app to do SO_INCOMING_NAPI_ID after a connect() call to get the right queue association for a socket. Rx queue is also marked in tcp_conn_request() to allow syn-ack to go on the right tx-queue associated with the queue on which syn is received. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Amritha Nambiar
Use static_key for XPS maps to reduce the cost of extra map checks, similar to how it is used for RPS and RFS. This includes static_key 'xps_needed' for XPS and another for 'xps_rxqs_needed' for XPS using Rx queues map. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Amritha Nambiar
Refactor XPS code to support Tx queue selection based on CPU(s) map or Rx queue(s) map. Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 01 Jul, 2018 1 commit
-
-
Committed by Ilpo Järvinen
If SACK is not enabled and the first cumulative ACK after the RTO retransmission covers more than the retransmitted skb, a spurious FRTO undo will trigger (assuming FRTO is enabled for that RTO). The reason is that any non-retransmitted segment acknowledged will set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is no indication that it would have been delivered for real (the scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK case so the check for that bit won't help like it does with SACK). Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo in tcp_process_loss. We need to use a stricter condition for the non-SACK case and check that none of the cumulatively ACKed segments were retransmitted to prove that progress is due to original transmissions. Only then keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in the non-SACK case. (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS to better indicate its purpose, but to keep this change minimal, that will be done in another patch.) Besides burstiness and congestion control violations, this problem can result in an RTO loop: when the loss recovery is prematurely undone, only new data will be transmitted (if available) and the next retransmission can occur only after a new RTO, which in case of multiple losses (that are not for consecutive packets) requires one RTO per loss to recover. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
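The stricter non-SACK rule can be modelled in a few lines. This is a userspace sketch of the condition only, not the code in tcp_clean_rtx_queue: FLAG_ORIG_SACK_ACKED is named in the commit message, whereas FLAG_RETRANS_DATA_ACKED, the numeric flag values, the `acked_seg` struct and both function names are assumptions for illustration. The SACK path, which relies on TCPCB_SACKED_ACKED, is deliberately not modelled.

```c
#include <stdbool.h>

/* Flag values are arbitrary in this sketch. */
#define FLAG_ORIG_SACK_ACKED    0x1 /* named in the commit message */
#define FLAG_RETRANS_DATA_ACKED 0x2 /* assumed helper flag */

/* Hypothetical record for one segment covered by a cumulative ACK. */
struct acked_seg {
	bool retransmitted;
};

/* Non-SACK case: collect flags for all cumulatively ACKed segments, then
 * keep FLAG_ORIG_SACK_ACKED only if none of them was retransmitted. */
static int nonsack_ack_flags(const struct acked_seg *segs, int n)
{
	int flag = 0;

	for (int i = 0; i < n; i++) {
		if (segs[i].retransmitted)
			flag |= FLAG_RETRANS_DATA_ACKED;
		else
			flag |= FLAG_ORIG_SACK_ACKED;
	}
	if (flag & FLAG_RETRANS_DATA_ACKED)
		flag &= ~FLAG_ORIG_SACK_ACKED;
	return flag;
}

/* FRTO undo may proceed only when progress is provably due to originals. */
static bool frto_undo_allowed(int flag)
{
	return (flag & FLAG_ORIG_SACK_ACKED) != 0;
}
```

With an ACK that covers one retransmitted and one original segment, the old behaviour would have left the "original ACKed" flag set; under the stricter rule it is cleared and no undo happens.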
-
- 30 Jun, 2018 20 commits
-
-
Committed by Roopa Prabhu
After commit f9d4b0c1 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule"), rule_exists got replaced by rule_find for existing rule lookup in both the add and del paths. While this is good for the delete path, it solves a few problems but opens up a few invalid key matches in the add path.
  $ip -4 rule add table main tos 10 fwmark 1
  $ip -4 rule add table main tos 10
  RTNETLINK answers: File exists
The problem here is that rule_find does not check if the key masks in the new and old rule are the same and hence ends up matching a more specific rule. Rule key masks cannot be easily compared today without an elaborate if-else block. It's best to introduce key masks for easier and more accurate rule comparison in the future. Until then, due to fear of regressions, this patch re-introduces the older loose rule_exists during add. It also fixes both rule_exists and rule_find to cover missing attributes. Fixes: f9d4b0c1 ("fib_rules: move common handling of newrule delrule msgs into fib_nl2rule") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
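The false "File exists" result can be reproduced with a toy model of rule matching. The sketch below is an assumption-laden illustration and is neither rule_find nor rule_exists: the `rule` struct, the `KEY_*` bits and both matching functions are hypothetical, and only two keys (tos and fwmark) are modelled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bitmask of which keys a rule actually specifies. */
#define KEY_TOS    0x1u
#define KEY_FWMARK 0x2u

struct rule {
	uint32_t keys;   /* which of the fields below are specified */
	uint8_t tos;
	uint32_t fwmark;
};

/* Compare only the keys that the query specifies. */
static bool query_keys_match(const struct rule *query, const struct rule *old)
{
	if ((query->keys & KEY_TOS) &&
	    (!(old->keys & KEY_TOS) || old->tos != query->tos))
		return false;
	if ((query->keys & KEY_FWMARK) &&
	    (!(old->keys & KEY_FWMARK) || old->fwmark != query->fwmark))
		return false;
	return true;
}

/* Lookup that ignores key masks: a less specific query also matches a
 * more specific existing rule. */
static bool match_ignoring_masks(const struct rule *query,
				 const struct rule *old)
{
	return query_keys_match(query, old);
}

/* Mask-aware lookup: both rules must specify exactly the same keys. */
static bool match_with_masks(const struct rule *query, const struct rule *old)
{
	return query->keys == old->keys && query_keys_match(query, old);
}
```

In this model the second command from the commit message ("tos 10") matches the first rule ("tos 10 fwmark 1") under the mask-ignoring lookup, which is the spurious duplicate the commit describes; the mask-aware lookup treats them as different rules.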
-
Committed by Cong Wang
As noticed by Eric, we need to switch to the helper dev_change_tx_queue_len() for the SIOCSIFTXQLEN call path too, otherwise we still miss dev_qdisc_change_tx_queue_len(). Fixes: 6a643ddb ("net: introduce helper dev_change_tx_queue_len()") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Vakul Garg
Calling skb_unclone() is expensive as it triggers a memcpy operation. Instead of calling skb_unclone() unconditionally, call it only when the skb has a shared frag_list. This improves tls rx throughput significantly. Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Suggested-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
This commit extends the existing TIPC socket diagnostics framework for information related to TIPC group communication. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
A peer node is considered down if it has no active links or if contact with the node has been lost. In the current implementation, a peer node instance is deleted only if a) the TIPC module is removed, or b) an application uses a netlink/iproute2 interface to delete a specific down node. Thus, a down node instance lives in the system forever, unless the application explicitly removes it. We fix this by deleting the nodes which are down for a specified amount of time (5 minutes). The existing node supervision timer is used to achieve this. Acked-by: Ying Xue <ying.xue@windriver.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Tung Nguyen
In single-link usage, the function tipc_node_timeout() still iterates over the whole link array to handle each link. Given that the maximum number of bearers is 3, there are 2 redundant iterations with lock grab/release. Since this function is executed very frequently, it makes sense to optimize it. This commit adds conditional checking to exit from the loop if the known number of configured links has already been accessed. Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
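The early-exit idea from this commit message can be shown with a counter of links still to be handled. This is a minimal sketch, not tipc_node_timeout(): the `link_slot` struct, the function name and the returned visit count are hypothetical, and locking is left out. Only the bearer limit of 3 comes from the commit text.

```c
#define MAX_BEARERS 3 /* maximum number of bearers, per the commit message */

/* Hypothetical per-bearer slot; `configured` is 0 for an empty slot. */
struct link_slot {
	int configured;
	int timeouts_handled;
};

/* Handle the timeout for each configured link and stop as soon as
 * `link_cnt` links have been handled. Returns how many slots were
 * visited, so the saved iterations can be observed. */
static int handle_link_timeouts(struct link_slot slots[MAX_BEARERS],
				int link_cnt)
{
	int remains = link_cnt;
	int visited = 0;

	for (int i = 0; remains > 0 && i < MAX_BEARERS; i++) {
		visited++;
		if (!slots[i].configured)
			continue;
		slots[i].timeouts_handled++;
		remains--;
	}
	return visited;
}
```

With a single link in the first slot the loop body runs once instead of three times, which matches the "2 redundant iterations" mentioned above.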
-
Committed by David Ahern
Sowmini reported that a recent commit broke prefix routes for linklocal addresses. The newly added modify_prefix_route attempts to add a new prefix route when the ifp priority does not match the route metric; however, the check needs to account for the default priority. In addition, the route add fails because the route already exists, and then the delete removes the one that exists. Flip the order to do the delete first. Fixes: 8308f3ff ("net/ipv6: Add support for specifying metric of connected routes") Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Tested-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Tung Nguyen
The function tipc_msg_extract() is using skb_clone() to clone inner messages from a message bundle buffer. Although this method is safe, it has the undesired effect that each buffer clone inherits the true-size of the bundling buffer. As a result, the buffer clone almost always ends up being copied anyway by the message validation function. This makes the cloning a sub-optimization. In this commit we take the consequence of this realization, and copy each inner message to a separately allocated buffer up front in the extraction function. As a bonus we can now eliminate the two cases where we had to copy re-routed packets that may potentially go out on the wire again. Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hans Wippel
This patch adds diag support for SMC-D. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hans Wippel
This patch ties together the previous SMC-D patches. It adds support for SMC-D to the listen and connect functions and, thus, enables SMC-D support in the SMC code. If a connection supports both SMC-R and SMC-D, SMC-D is preferred. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hans Wippel
The data transfer and CDC message headers differ in SMC-R and SMC-D. This patch adds support for the SMC-D data transfer to the existing SMC code. It consists of the following:
  * SMC-D CDC support
  * SMC-D tx support
  * SMC-D rx support
The CDC header is stored at the beginning of the receive buffer. Thus, a rx_offset variable is added for the CDC header offset within the buffer (0 for SMC-R). Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hans Wippel
There are two types of SMC: SMC-R and SMC-D. These types are signaled within the CLC messages during the CLC handshake. This patch adds support for and checks of the SMC type. Also, SMC-R and SMC-D need to exchange different information during the CLC handshake. So, this patch extends the current message formats to support the SMC-D header fields. The Proposal message can contain both SMC-R and SMC-D information. The Accept and Confirm messages contain either SMC-R or SMC-D information. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hans Wippel
SMC-D relies on PNETIDs to find usable SMC-D/ISM devices for an SMC connection. This patch adds SMC-D/ISM support to the current PNETID implementation. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Hans Wippel
SMC supports two variants: SMC-R and SMC-D. For data transport, SMC-R uses RDMA devices, SMC-D uses so-called Internal Shared Memory (ISM) devices. An ISM device only allows shared memory communication between SMC instances on the same machine. For example, this allows virtual machines on the same host to communicate via SMC without RDMA devices. This patch adds the base infrastructure for SMC-D and ISM devices to the existing SMC code. It contains the following:
  * ISM driver interface: This interface allows an ISM driver to register ISM devices in SMC. In the process, the driver provides a set of device ops for each device. SMC uses these ops to execute SMC specific operations on or transfer data over the device.
  * Core SMC-D link group, connection, and buffer support: Link groups, SMC connections and SMC buffers (in smc_core) are extended to support SMC-D.
  * SMC type checks: Some type checks are added to prevent using SMC-R specific code for SMC-D and vice versa.
To actually use SMC-D, additional changes to pnetid, CLC, CDC, etc. are required. These are added in follow-up patches. Signed-off-by: Hans Wippel <hwippel@linux.ibm.com> Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Ursula Braun
The SMC protocol requires sending a separate consumer cursor update if it cannot be piggybacked onto updates of the producer cursor. Currently the decision to send a separate consumer cursor update only considers the amount of data already received by the socket program. It does not consider the amount of data that has already arrived but has not yet been consumed by the receiver. Basing the decision on the difference between already confirmed and already arrived data (instead of the difference between already confirmed and already consumed data) may lead to a somewhat earlier consumer cursor update send in fast unidirectional traffic scenarios, and thus to better throughput. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Suggested-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Ursula Braun
s390 hardware supports the definition of a so-called Physical NETwork IDentifier (short PNETID) per network device port. These PNETIDs can be used to identify network devices that are attached to the same physical network (broadcast domain). On s390, try to use the PNETID of the ethernet device port used for initial connecting, and derive the IB device port used for SMC RDMA traffic. On platforms without PNETID support, fall back to the existing solution of a configured pnet table. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Ursula Braun
For SMC it is important to know the current port state of RoCE devices. Until now, monitoring of port states was triggered when a RoCE device was added to the pnet table. To support future alternatives to the pnet table, the monitoring of ports is made independent of the existence of a pnet table. It starts once the smc_ib_device is established. Due to this change smc_ib_remember_port_attr() is now a local function, and shuffling its location and the location of the functions it uses makes any forward references obsolete. The duplicate SMC_MAX_PORTS definition is also removed. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Michal Hocko
alloc_skb_with_frags uses __GFP_NORETRY for non-sleeping allocations, which is just a no-op and a little bit confusing. __GFP_NORETRY was added by ed98df33 ("net: use __GFP_NORETRY for high order allocations") to keep the OOM killer from being invoked. Yet this was not enough because fb05e7a8 ("net: don't wait for order-3 page allocation") didn't want excessive reclaim for non-costly orders, so it made the allocation completely NOWAIT while leaving __GFP_NORETRY in place, which is now redundant. Drop the pointless __GFP_NORETRY because this function is used as a copy&paste source for other places. Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Yafang Shao
When sk_rmem_alloc is larger than the receive buffer and we can't schedule more memory for it, the skb will be dropped. In that situation, if the skb was meant for the ofo queue, LINUX_MIB_TCPOFODROP is incremented to track it, whereas if the skb was meant for the receive queue, there is no record. So a new SNMP counter is introduced to track this behavior. LINUX_MIB_TCPRCVQDROP: Number of packets meant to be queued in rcv queue but dropped because socket rcvbuf limit hit. Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Yuchung Cheng
The Fast Open key could be stored with different endianness depending on the CPU. Previously, hosts of different endianness in a server farm using the same key config (sysctl value) would produce different cookies. This patch fixes it by always storing the key as little endian, which keeps the same API for LE hosts. Reported-by: Daniele Iamartino <danielei@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
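Storing a key in a fixed byte order, independent of the host CPU, can be illustrated in portable C. This is only a sketch of the idea: the function name is hypothetical, and treating the key as four 32-bit words is an assumption that the commit message does not state.

```c
#include <stddef.h>
#include <stdint.h>

#define KEY_WORDS 4 /* assumption: a 16-byte key handled as 32-bit words */

/* Serialize each word least-significant byte first. Shifts operate on
 * values, not on memory layout, so the output bytes are identical on
 * little-endian and big-endian hosts. */
static void key_words_to_le_bytes(const uint32_t words[KEY_WORDS],
				  uint8_t out[4 * KEY_WORDS])
{
	for (size_t i = 0; i < KEY_WORDS; i++) {
		out[4 * i + 0] = (uint8_t)(words[i] & 0xff);
		out[4 * i + 1] = (uint8_t)((words[i] >> 8) & 0xff);
		out[4 * i + 2] = (uint8_t)((words[i] >> 16) & 0xff);
		out[4 * i + 3] = (uint8_t)((words[i] >> 24) & 0xff);
	}
}
```

Two hosts that parse the same configured words and hash the resulting byte array would then feed identical input to the cookie computation, which is the property the commit is after.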
-
- 29 Jun, 2018 4 commits
-
-
Committed by Simon Horman
Allow setting tunnel options using the act_tunnel_key action. Options are expressed as class:type:data and multiple options may be listed using a comma delimiter.
  # ip link add name geneve0 type geneve dstport 0 external
  # tc qdisc add dev eth0 ingress
  # tc filter add dev eth0 protocol ip parent ffff: \
      flower indev eth0 \
      ip_proto udp \
      action tunnel_key \
      set src_ip 10.0.99.192 \
      dst_ip 10.0.99.193 \
      dst_port 6081 \
      id 11 \
      geneve_opts 0102:80:00800022,0102:80:00800022 \
      action mirred egress redirect dev geneve0
Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
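The class:type:data syntax with a comma delimiter can be parsed with a few lines of standard C. The sketch below is a userspace illustration of the textual format only, not the parser used by tc or the kernel: the struct, the function names, the 32-byte data limit and the "whole bytes only" rule are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one option; the 32-byte cap is arbitrary. */
struct opt_sketch {
	uint16_t opt_class;
	uint8_t type;
	uint8_t data[32];
	size_t data_len;
};

/* Parse one "class:type:data" item (all fields hex). Returns the number
 * of characters consumed, or -1 on malformed input. */
static int parse_one_opt(const char *s, struct opt_sketch *opt)
{
	unsigned int cls, type, byte;
	char hex[65];
	int used = 0;

	if (sscanf(s, "%4x:%2x:%64[0-9a-fA-F]%n", &cls, &type, hex, &used) != 3)
		return -1;
	if (strlen(hex) % 2 != 0)
		return -1; /* data must be whole bytes */
	opt->opt_class = (uint16_t)cls;
	opt->type = (uint8_t)type;
	opt->data_len = strlen(hex) / 2;
	for (size_t i = 0; i < opt->data_len; i++) {
		if (sscanf(hex + 2 * i, "%2x", &byte) != 1)
			return -1;
		opt->data[i] = (uint8_t)byte;
	}
	return used;
}

/* Parse a comma-separated list. Returns the option count, or -1 on error
 * or when more than `max` options are present. */
static int parse_opt_list(const char *s, struct opt_sketch *out, int max)
{
	int n = 0;

	while (n < max) {
		int used = parse_one_opt(s, &out[n]);

		if (used < 0)
			return -1;
		n++;
		s += used;
		if (*s == '\0')
			return n;
		if (*s != ',')
			return -1;
		s++;
	}
	return -1;
}
```

Applied to the `geneve_opts` value from the example command, this yields two options with class 0x0102, type 0x80 and four data bytes each.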
-
Committed by Pieter Jansen van Vuuren
Check the tunnel option type stored in tunnel flags when creating options for tunnels. This ensures that we do not set geneve, vxlan or erspan tunnel options on interfaces that are not associated with them. Make sure all users of the infrastructure set correct flags; for the BPF helper we have to set all bits to keep backward compatibility. Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Simon Horman
Add extended ack support for the tunnel key action by using NL_SET_ERR_MSG during validation of user input. Cc: Alexander Aring <aring@mojatatu.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Simon Horman
Metadata may be NULL for one of two reasons:
  * Missing user input
  * Failure to allocate the metadata dst
Disambiguate these cases by returning -EINVAL for the former and -ENOMEM for the latter rather than -EINVAL for both cases. This is in preparation for using extended ack to provide more information to users when parsing their input. Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
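The two failure reasons map naturally onto two distinct error codes. The following userspace sketch shows the pattern under stated assumptions: the `metadata_sketch` struct and `build_metadata()` are hypothetical stand-ins, and `malloc()` replaces the kernel's metadata dst allocation.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the tunnel metadata. */
struct metadata_sketch {
	unsigned int tunnel_id;
};

/* Returns 0 and stores a new object in *out on success.
 * -EINVAL: the user did not supply the required input.
 * -ENOMEM: the input was fine but the allocation failed. */
static int build_metadata(const unsigned int *user_id,
			  struct metadata_sketch **out)
{
	struct metadata_sketch *md;

	if (!user_id)
		return -EINVAL;

	md = malloc(sizeof(*md));
	if (!md)
		return -ENOMEM;

	md->tunnel_id = *user_id;
	*out = md;
	return 0;
}
```

Keeping the codes apart lets a caller attach an input-related message only to the -EINVAL case, which is the kind of follow-up the commit message anticipates with extended ack.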
-