- 21 3月, 2015 2 次提交
-
-
由 Eric Dumazet 提交于
One of the major issue for TCP is the SYNACK rtx handling, done by inet_csk_reqsk_queue_prune(), fired by the keepalive timer of a TCP_LISTEN socket. This function runs for awful long times, with socket lock held, meaning that other cpus needing this lock have to spin for hundred of ms. SYNACK are sent in huge bursts, likely to cause severe drops anyway. This model was OK 15 years ago when memory was very tight. We now can afford to have a timer per request sock. Timer invocations no longer need to lock the listener, and can be run from all cpus in parallel. With following patch increasing somaxconn width to 32 bits, I tested a listener with more than 4 million active request sockets, and a steady SYNFLOOD of ~200,000 SYN per second. Host was sending ~830,000 SYNACK per second. This is ~100 times more what we could achieve before this patch. Later, we will get rid of the listener hash and use ehash instead. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When request sock are put in ehash table, the whole notion of having a previous request to update dl_next is pointless. Also, following patch will get rid of big purge timer, so we want to delete a request sock without holding listener lock. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 3月, 2015 16 次提交
-
-
由 Jörg Thalheim 提交于
Signed-off-by: NJörg Thalheim <joerg@higgsboson.tk> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
When a networking device is taken down that has a non-trivial number of VLAN devices configured under it, we eat a full synchronize_net() for every such VLAN device. This is because of the call chain: NETDEV_DOWN notifier --> vlan_device_event() --> dev_change_flags() --> __dev_change_flags() --> __dev_close() --> __dev_close_many() --> dev_deactivate_many() --> synchronize_net() This is kind of rediculous because we already have infrastructure for batching doing operation X to a list of net devices so that we only incur one sync. So make use of that by exporting dev_close_many() and adjusting it's interfaace so that the caller can fully manage the batch list. Use this in vlan_device_event() and all the overhead goes away. Reported-by: NSalam Noureddine <noureddine@arista.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Similar to port id allow netdevices to specify port names and export the name via sysfs. Drivers can implement the netdevice operation to assist udev in having sane default names for the devices using the rule: $ cat /etc/udev/rules.d/80-net-setup-link.rules SUBSYSTEM=="net", ACTION=="add", ATTR{phys_port_name}!="", NAME="$attr{phys_port_name}" Use of phys_name versus phys_id was suggested-by Jiri Pirko. Signed-off-by: NDavid Ahern <dsahern@gmail.com> Acked-by: NJiri Pirko <jiri@resnulli.us> Acked-by: NScott Feldman <sfeldma@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Marcelo Ricardo Leitner 提交于
in favor of their inner __ ones, which doesn't grab rtnl. As these functions need to operate on a locked socket, we can't be grabbing rtnl by then. It's too late and doing so causes reversed locking. So this patch: - move rtnl handling to callers instead while already fixing some reversed locking situations, like on vxlan and ipvs code. - renames __ ones to not have the __ mark: __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop} Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
In order to be able to use sk_ehashfn() for request socks, we need to initialize their IPv6/IPv4 addresses. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We now always call __inet_hash_nolisten(), no need to pass it as an argument. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We can now use inet_hash() and __inet_hash() instead of private functions. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Intent is to converge IPv4 & IPv6 inet_hash functions to factorize code. IPv4 sockets initialize sk_rcv_saddr and sk_v6_daddr in this patch, thanks to new sk_daddr_set() and sk_rcv_saddr_set() helpers. __inet6_hash can now use sk_ehashfn() instead of a private inet6_sk_ehashfn() and will simply use __inet_hash() in a following patch. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Goal is to unify IPv4/IPv6 inet_hash handling, and use common helpers for all kind of sockets (full sockets, timewait and request sockets) inet_sk_ehashfn() becomes sk_ehashfn() but still only copes with IPv4 Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
const qualifiers ease code review by making clear which objects are not written in a function. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Or Gerlitz 提交于
Add the low-level device commands and definitions used for QP max-rate limiting. This is done through the following elements: - read rate-limit device caps in QUERY_DEV_CAP: number of different rates and the min/max rates in Kbs/Mbs/Gbs units - enhance the QP context struct to contain rate limit units and value - allow to do run time rate-limit setting to QPs through the update-qp firmware command - QP rate-limiting is disallowed for VFs Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
This adds a tx_maxrate attribute to the tx queue sysfs entry allowing for max-rate limiting. Along with DCB-ETS and BQL this provides another knob to tune queue performance. The limit units are Mbps. By default it is disabled. To disable the rate limitation after it has been set for a queue, it should be set to zero. Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Signed-off-by: NOr Gerlitz <ogerlitz@mellanox.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Brad Campbell 提交于
The TI CC2521 is an RF power amplifier that is designed to interface with the CC2520. Conveniently, it directly interfaces with the CC2520 and does not require any pins to be connected to a microcontroller/processor. Adding a CC2591 increases the CC2520's range, which is useful for border router and other wall-powered applications. Using the CC2591 with the CC2520 requires configuring the CC2520 GPIOs that are connected to the CC2591 to correctly set the CC2591 into TX and RX modes. Further, TI recommends that the CC2520_TXPOWER and CC2520_AGCCTRL1 registers are set differently to maximize the CC2591's performance. These settings are covered in TI Application Note AN065. This patch adds an optional `amplified` field to the cc2520 entry in the device tree. If present, the CC2520 will be configured to operate with a CC2591. The expected pin mapping is: CC2520 GPIO0 --> CC2591 EN CC2520 GPIO5 --> CC2591 PAEN Signed-off-by: NBrad Campbell <bradjc5@gmail.com> Acked-by: NVarka Bhadram <varkabhadram@gmail.com> Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
-
由 Herbert Xu 提交于
Now that nobody uses max_shift and min_shift, we can safely remove them. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
This patch adds the parameters max_size and min_size which are meant to replace max_shift and min_shift. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
Keeping both size and shift is silly. We only need one. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 3月, 2015 13 次提交
-
-
由 Eric Dumazet 提交于
While testing last patch series, I found req sock refcounting was wrong. We must set skc_refcnt to 1 for all request socks added in hashes, but also on request sockets created by FastOpen or syncookies. It is tricky because we need to defer this initialization so that future RCU lookups do not try to take a refcount on a not yet fully initialized request socket. Also get rid of ireq_refcnt alias. Signed-off-by: NEric Dumazet <edumazet@google.com> Fixes: 13854e5a ("inet: add proper refcounting to request sock") Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
The listener field in struct tcp_request_sock is a pointer back to the listener. We now have req->rsk_listener, so TCP only needs one boolean and not a full pointer. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Once we'll be able to lookup request sockets in ehash table, we'll need to get access to listener which created this request. This avoid doing a lookup to find the listener, which benefits for a more solid SO_REUSEPORT, and is needed once we no longer queue request sock into a listener private queue. Note that 'struct tcp_request_sock'->listener could be reduced to a single bit, as TFO listener should match req->rsk_listener. TFO will no longer need to hold a reference on the listener. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
inet_reqsk_alloc() is becoming fat and should not be inlined. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
listener socket can be used to set net pointer, and will be later used to hold a reference on listener. Add a const qualifier to first argument (struct request_sock_ops *), and factorize all write_pnet(&ireq->ireq_net, sock_net(sk)); Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
tcp_oow_rate_limited() is hardly used in fast path, there is no point inlining it. Signed-of-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
This big helper is called once from tcp_conn_request(), there is no point having it in an include. Compiler will inline it anyway. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
On 64bit arches, we can save 8 bytes in inet_request_sock by moving ir_mark to fill a hole. While we are at it, inet_request_mark() can get a const qualifier for listener socket. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
as a follow on to patch 70006af9 ("bpf: allow eBPF access skb fields") this patch allows 'protocol' and 'vlan_tci' fields to be accessible from extended BPF programs. The usage of 'protocol', 'vlan_present' and 'vlan_tci' fields is the same as corresponding SKF_AD_PROTOCOL, SKF_AD_VLAN_TAG_PRESENT and SKF_AD_VLAN_TAG accesses in classic BPF. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Johan Hedberg 提交于
The mgmt.c file should be reserved purely for HCI_CHANNEL_CONTROL. The mgmt_control() function in it is already completely generic and has a single user in hci_sock.c. This patch moves the function there and renames it a bit more appropriately to hci_mgmt_cmd() (as it's a command dispatcher). Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com> Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
-
由 Johan Hedberg 提交于
In order to make the mgmt command handling more generic we can't have a direct call to mgmt_init_hdev() from mgmt_control(). This patch adds a new callback to struct hci_mgmt_chan. And sets it to point to the mgmt_init_hdev() function for the HCI_CHANNEL_CONTROL instance. Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com> Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
-
由 Johan Hedberg 提交于
We'll need to have access to which HCI channel a socket is bound to, in order to manage pending mgmt commands in clean way. This patch adds a helper for the purpose. Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com> Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
-
由 Jakub Pawlowski 提交于
Some controllers allow both LE scan and BR/EDR inquiry to run at the same time, while others allow only one, LE SCAN or BR/EDR inquiry at given time. Since this is specific to each controller, add a new quirk setting that allows drivers to tell the core wether given controller can do both LE scan and BR/EDR inquiry at same time. Signed-off-by: NJakub Pawlowski <jpawlowski@google.com> Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
-
- 17 3月, 2015 3 次提交
-
-
由 Marcel Holtmann 提交于
When a different user requests a new set of local out-of-band data, then inform all previous users that the data has been updated. To limit the scope of users, the updates are limited to previous users. If a user has never requested out-of-band data, it will also not see the update. Signed-off-by: NMarcel Holtmann <marcel@holtmann.org> Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
-
由 Eric Dumazet 提交于
reqsk_put() is the generic function that should be used to release a refcount (and automatically call reqsk_free()) reqsk_free() might be called if refcount is known to be 0 or undefined. refcnt is set to one in inet_csk_reqsk_queue_add() As request socks are not yet in global ehash table, I added temporary debugging checks in reqsk_put() and reqsk_free() Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We have many places where we want to check if a socket is not a timewait or request socket. Use a helper to avoid hard coding this. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 3月, 2015 6 次提交
-
-
由 Marcel Holtmann 提交于
The LE Secure Connections Confirmation Value and LE Secure Connections Random Value contants are required for the out-of-band data and so just define them. Signed-off-by: NMarcel Holtmann <marcel@holtmann.org> Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
-
由 Marcel Holtmann 提交于
The HCI_CONN_REMOTE_OOB connection flag is used to indicate if the pairing initiator has provided out-of-band data. However since that value is no longer used in any decision making, just remove it. It is actually unclear what purpose the OOB data present field from the HCI IO Capability Response event serves in the first place. If either side provided out-of-band data, then that data will be used for pairing. Signed-off-by: NMarcel Holtmann <marcel@holtmann.org> Signed-off-by: NJohan Hedberg <johan.hedberg@intel.com>
-
由 Scott Feldman 提交于
Signed-off-by: NScott Feldman <sfeldma@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Scott Feldman 提交于
As discussed at netconf, introduce swdev_ops as first step to move switchdev ops from ndo to swdev. This will keep switchdev from cluttering up ndo ops space. Signed-off-by: NScott Feldman <sfeldma@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
introduce user accessible mirror of in-kernel 'struct sk_buff': struct __sk_buff { __u32 len; __u32 pkt_type; __u32 mark; __u32 queue_mapping; }; bpf programs can do: int bpf_prog(struct __sk_buff *skb) { __u32 var = skb->pkt_type; which will be compiled to bpf assembler as: dst_reg = *(u32 *)(src_reg + 4) // 4 == offsetof(struct __sk_buff, pkt_type) bpf verifier will check validity of access and will convert it to: dst_reg = *(u8 *)(src_reg + offsetof(struct sk_buff, __pkt_type_offset)) dst_reg &= 7 since skb->pkt_type is a bitfield. Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
This patch adds the possibility to obtain raw_smp_processor_id() in eBPF. Currently, this is only possible in classic BPF where commit da2033c2 ("filter: add SKF_AD_RXHASH and SKF_AD_CPU") has added facilities for this. Perhaps most importantly, this would also allow us to track per CPU statistics with eBPF maps, or to implement a poor-man's per CPU data structure through eBPF maps. Example function proto-type looks like: u32 (*smp_processor_id)(void) = (void *)BPF_FUNC_get_smp_processor_id; Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-