- 08 2月, 2016 20 次提交
-
-
由 Yuchung Cheng 提交于
The retransmission and F-RTO transmission currently happen inside recovery state processing (tcp_fastretrans_alert) but before congestion control. This refactoring moves the logic after both s.t. we can determine how much to send (cwnd) before deciding what to send. Signed-off-by: NYuchung Cheng <ycheng@google.com> Signed-off-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NEric Dumazet <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Herrmann 提交于
Remove a write-only stack variable from unix_attach_fds(). This is a left-over from the security fix in: commit 712f4aad Author: willy tarreau <w@1wt.eu> Date: Sun Jan 10 07:54:56 2016 +0100 unix: properly account for FDs passed over unix sockets Signed-off-by: NDavid Herrmann <dh.herrmann@gmail.com> Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David Ahern 提交于
Allows userspace to have direct access to VRF table association versus looking up master device and its table. Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paul Durrant 提交于
My recent patch to the Xen Project documents a protocol for 'dynamic multicast control' in netif.h. This extends the previous multicast control protocol to not require a shared ring reconnection to turn the feature off. Instead the backend watches the "request-multicast-control" key in xenstore and turns the feature off if the key value is written to zero. This patch adds support for dynamic multicast control in xen-netback. Signed-off-by: NPaul Durrant <paul.durrant@citrix.com> Cc: Ian Campbell <ian.campbell@citrix.com> Cc: Wei Liu <wei.liu2@citrix.com> Acked-by: NWei Liu <wei.liu2@citrix.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Sriharsha Basavapatna says: ==================== be2net patch-set v2 changes: Patch-4: Changed a tab to space in be.h Patches-6,7,8: Updated commit log summary line: benet --> be2net Hi David, The following patch set contains a few non-critical bug fixes. Please consider applying this to the net-next tree. Thanks. Patch-1 fixes be_set_phys_id() ethtool function to return an error code. Patch-2 fixes a warning when some commands fail for VFs. Patch-3 fixes be_vlan_rem_vid() to verify vlan being removed is in the list. Patch-4 improves SRIOV queue distribution logic. Patch-5 avoids running self test on VFs. Patch-6 fixes error recovery in Lancer to clean up after moving to ready state. Patch-7 adds retry logic to error recovery in case of recovery failures Patch-8 fixes time interval used in eq delay computation routine ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Padmanabh Ratnakar 提交于
Interrupt moderation parameters need to be recalculated only after a time interval of 1 ms. Interval calculation is wrong when there is a rollover of jiffies. Using recommended way of interval calculation using jiffies to fix this. Signed-off-by: NPadmanabh Ratnakar <padmanabh.ratnakar@broadcom.com> Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Padmanabh Ratnakar 提交于
Retry error recovery MAX_ERR_RECOVERY_RETRY_COUNT times in case of failure during error recovery. Signed-off-by: NPadmanabh Ratnakar <padmanabh.ratnakar@broadcom.com> Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Padmanabh Ratnakar 提交于
After error is detected, wait for adapter to move to ready state before destroying queues and cleanup of other resources. Also skip performing any cleanup for non-Lancer chips and move debug messages to correct routine. Signed-off-by: NPadmanabh Ratnakar <padmanabh.ratnakar@broadcom.com> Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Somnath Kotur 提交于
The CMD_SUBSYSTEM_LOWLEVEL cmds need DEV_CFG Privilege to run which VFs don't have by default. Self-tests need to be issued only for PFs. Signed-off-by: NSomnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sriharsha Basavapatna 提交于
The SRIOV resource distribution logic for RX/TX queue counts is not optimal when a small number of VFs are enabled. It does not take into account the VF's EQ count while computing the queue counts. Because of this, the VF gets a large number of queues, though it doesn't have sufficient EQs, resulting in wasted queue resources. And the PF gets a smaller share of queues though it has more EQs. Fix this by capping the VF queue count at its EQ count. Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sriharsha Basavapatna 提交于
The driver decrements its vlan count without checking if it is really present in its list. This results in an invalid vlan count and impacts subsequent vlan add/rem ops. The function be_vlan_rem_vid() should be updated to fix this. Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Suresh Reddy 提交于
The driver currently logs the message "VF is not privileged to issue opcode" by checking only the base_status field for UNAUTHORIZED_REQUEST. Add check to look for INSUFFICIENT_PRIVILEGES in the additional status field also as not all cmds fail with that base status. Signed-off-by: NSuresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Suresh Reddy 提交于
be_set_phys_id() returns 0 to ethtool when the command fails in the FW. This patch fixes the set_phys_id() to return -EIO in case the FW cmd fails. Signed-off-by: NSuresh Reddy <suresh.reddy@broadcom.com> Signed-off-by: NSriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Jiri Benc says: ==================== vxlan: cleanup headers and tx path This is first part of vxlan cleanup. It makes vxlan.h better organized and removes code duplication in the tx path. There's no change in functionality. This is in preparation for VXLAN-GPE support. Other sets will follow with cleanup of rx path, modifications of vxlan interface setup and cleanup/modification of fdb handling. The goal of these patchsets is to consolidate VXLAN extension handling (right now it's scattered all around the code) and allow vxlan to operate in L3 mode (i.e. without inner Ethernet headers). The full picture (all 30 patches) can be seen at: https://github.com/jbenc/linux-vxlan/commits/master Changelog: v2: added patch 5/6, fixing the udp sum problem spotted by Paolo Abeni ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
There's a lot of code duplication. Factor out the duplicate code to a new function shared between IPv4 and IPv6 xmit path. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
The flag for tx checksumming for tunneling over IPv4 and IPv6 is different. Decide whether to do tx checksumming in vxlan_xmit_one and pass it on as a separate flag. This will allow for tx path consolidation in the next patch. Unfortunately, gcc is not clever enough to see that udp_sum is always initialized and gives an uninitialized variable warning. Set it to false to silence the warning. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
The code for output route lookup is duplicated for ndo_start_xmit and ndo_fill_metadata_dst. Move it to a common function. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
RCO and GBP are VXLAN extensions, not specified in RFC 7348. Because of that, they need to be explicitly enabled when creating vxlan interface. By default, those extensions are not used and plain VXLAN header is sent and received. Reflect this in vxlan.h: first, the plain VXLAN header is defined. Following it, RCO is documented and defined, and likewise for GBP. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
VNI_HASH_BITS and VNI_HASH_SIZE are defined twice. Remove the extra definitions. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jiri Benc 提交于
include/net/vxlan.h is a kernel header, no need to prefix fixed size types with double underscore. Signed-off-by: NJiri Benc <jbenc@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 2月, 2016 1 次提交
-
-
由 Eric Dumazet 提交于
When we acknowledge a FIN, it is not enough to ack the sequence number and queue the skb into receive queue. We also have to call tcp_fin() to properly update socket state and send proper poll() notifications. It seems we also had the problem if we received a SYN packet with the FIN flag set, but it does not seem an urgent issue, as no known implementation can do that. Fixes: 61d2bcae ("tcp: fastopen: accept data/FIN present in SYNACK message") Signed-off-by: NEric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 06 2月, 2016 19 次提交
-
-
由 David S. Miller 提交于
Parthasarathy Bhuvaragan says: ==================== tipc: cleanups, fixes & improvements for topology server This series contains topology server cleanups, fixes and improvements. Cleanups in #1-#4: We remove duplicate data structures and aligin the rest of the code accordingly. Fixes in #5-#8: The bugs occur either during configuration or while running on SMP targets, which are race conditions that pop up under different situations. Improvements in #9-#10: Updates to decrease timer usage and improve readability. v2: Updated commit message in patch 6 based on feedback from Sergei Shtylyov sergei.shtylyov@cogentembedded.com ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
Until now, tipc_rcv and tipc_send workqueues in server are allocated with parameters WQ_UNBOUND & max_active = 1. This parameters passed to this function makes it equivalent to alloc_ordered_workqueue(). The later form is more explicit and can inherit future ordered_workqueue changes. In this commit we replace alloc_workqueue() with more readable alloc_ordered_workqueue(). Acked-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
Until now, we create timers even for the subscription requests with timeout = TIPC_WAIT_FOREVER. This can be improved by avoiding timer creation when the timeout is set to TIPC_WAIT_FOREVER. In this commit, we introduce a check to creates timers only when timeout != TIPC_WAIT_FOREVER. Acked-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
Until now, during subscription creation the mod_time() & tipc_subscrb_get() are called after releasing the subscriber spin lock. In a SMP system when performing a subscription creation, if the subscription timeout occurs simultaneously (the timer is scheduled to run on another CPU) then the timer thread might decrement the subscribers refcount before the create thread increments the refcount. This can be simulated by creating subscription with timeout=0 and sometimes the timeout occurs before the create request is complete. This leads to the following message: [30.702949] BUG: spinlock bad magic on CPU#1, kworker/u8:3/87 [30.703834] general protection fault: 0000 [#1] SMP [30.704826] CPU: 1 PID: 87 Comm: kworker/u8:3 Not tainted 4.4.0-rc8+ #18 [30.704826] Workqueue: tipc_rcv tipc_recv_work [tipc] [30.704826] task: ffff88003f878600 ti: ffff88003fae0000 task.ti: ffff88003fae0000 [30.704826] RIP: 0010:[<ffffffff8109196c>] [<ffffffff8109196c>] spin_dump+0x5c/0xe0 [...] [30.704826] Call Trace: [30.704826] [<ffffffff81091a16>] spin_bug+0x26/0x30 [30.704826] [<ffffffff81091b75>] do_raw_spin_lock+0xe5/0x120 [30.704826] [<ffffffff81684439>] _raw_spin_lock_bh+0x19/0x20 [30.704826] [<ffffffffa0096f10>] tipc_subscrb_rcv_cb+0x1d0/0x330 [tipc] [30.704826] [<ffffffffa00a37b1>] tipc_receive_from_sock+0xc1/0x150 [tipc] [30.704826] [<ffffffffa00a31df>] tipc_recv_work+0x3f/0x80 [tipc] [30.704826] [<ffffffff8106a739>] process_one_work+0x149/0x3c0 [30.704826] [<ffffffff8106aa16>] worker_thread+0x66/0x460 [30.704826] [<ffffffff8106a9b0>] ? process_one_work+0x3c0/0x3c0 [30.704826] [<ffffffff8106a9b0>] ? process_one_work+0x3c0/0x3c0 [30.704826] [<ffffffff8107029d>] kthread+0xed/0x110 [30.704826] [<ffffffff810701b0>] ? kthread_create_on_node+0x190/0x190 [30.704826] [<ffffffff81684bdf>] ret_from_fork+0x3f/0x70 In this commit, 1. we remove the check for the return code for mod_timer() 2. we protect tipc_subscrb_get() using the subscriber spin lock. We increment the subscriber's refcount as soon as we add the subscription to subscriber's subscription list. Acked-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
Until now, while creating a subscription the subscriber lock protects only the subscribers subscription list and not the nametable. The call to tipc_nametbl_subscribe() is outside the lock. However, at subscription timeout and cancel both the subscribers subscription list and the nametable are protected by the subscriber lock. This asymmetric locking mechanism leads to the following problem: In a SMP system, the timer can be fire on another core before the create request is complete. When the timer thread calls tipc_nametbl_unsubscribe() before create thread calls tipc_nametbl_subscribe(), we get a nullptr exception. This can be simulated by creating subscription with timeout=0 and sometimes the timeout occurs before the create request is complete. The following is the oops: [57.569661] BUG: unable to handle kernel NULL pointer dereference at (null) [57.577498] IP: [<ffffffffa02135aa>] tipc_nametbl_unsubscribe+0x8a/0x120 [tipc] [57.584820] PGD 0 [57.586834] Oops: 0002 [#1] SMP [57.685506] CPU: 14 PID: 10077 Comm: kworker/u40:1 Tainted: P OENX 3.12.48-52.27.1. 9688.1.PTF-default #1 [57.703637] Workqueue: tipc_rcv tipc_recv_work [tipc] [57.708697] task: ffff88064c7f00c0 ti: ffff880629ef4000 task.ti: ffff880629ef4000 [57.716181] RIP: 0010:[<ffffffffa02135aa>] [<ffffffffa02135aa>] tipc_nametbl_unsubscribe+0x8a/ 0x120 [tipc] [...] [57.812327] Call Trace: [57.814806] [<ffffffffa0211c77>] tipc_subscrp_delete+0x37/0x90 [tipc] [57.821357] [<ffffffffa0211e2f>] tipc_subscrp_timeout+0x3f/0x70 [tipc] [57.827982] [<ffffffff810618c1>] call_timer_fn+0x31/0x100 [57.833490] [<ffffffff81062709>] run_timer_softirq+0x1f9/0x2b0 [57.839414] [<ffffffff8105a795>] __do_softirq+0xe5/0x230 [57.844827] [<ffffffff81520d1c>] call_softirq+0x1c/0x30 [57.850150] [<ffffffff81004665>] do_softirq+0x55/0x90 [57.855285] [<ffffffff8105aa35>] irq_exit+0x95/0xa0 [57.860290] [<ffffffff815215b5>] smp_apic_timer_interrupt+0x45/0x60 [57.866644] [<ffffffff8152005d>] apic_timer_interrupt+0x6d/0x80 [57.872686] [<ffffffffa02121c5>] tipc_subscrb_rcv_cb+0x2a5/0x3f0 [tipc] [57.879425] [<ffffffffa021c65f>] tipc_receive_from_sock+0x9f/0x100 [tipc] [57.886324] [<ffffffffa021c826>] tipc_recv_work+0x26/0x60 [tipc] [57.892463] [<ffffffff8106fb22>] process_one_work+0x172/0x420 [57.898309] [<ffffffff8107079a>] worker_thread+0x11a/0x3c0 [57.903871] [<ffffffff81077114>] kthread+0xb4/0xc0 [57.908751] [<ffffffff8151f318>] ret_from_fork+0x58/0x90 In this commit, we do the following at subscription creation: 1. set the subscription's subscriber pointer before performing tipc_nametbl_subscribe(), as this value is required further in the call chain ex: by tipc_subscrp_send_event(). 2. move tipc_nametbl_subscribe() under the scope of subscriber lock Acked-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
Until now, the subscribers endianness for a subscription create/cancel request is determined as: swap = !(s->filter & (TIPC_SUB_PORTS | TIPC_SUB_SERVICE)) The checks are performed only for port/service subscriptions. The swap calculation is incorrect if the filter in the subscription cancellation request is set to TIPC_SUB_CANCEL (it's a malformed cancel request, as the corresponding subscription create filter is missing). Thus, the check if the request is for cancellation fails and the request is treated as a subscription create request. The subscription creation fails as the request is illegal, which terminates this connection. In this commit we determine the endianness by including TIPC_SUB_CANCEL, which will set swap correctly and the request is processed as a cancellation request. Acked-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
In 'commit 7fe8097c ("tipc: fix nullpointer bug when subscribing to events")', we terminate the connection if the subscription creation fails. In the same commit, the subscription creation result was based on the value of subscription pointer (set in the function) instead of the return code. Unfortunately, the same function also handles subscription cancellation request. For a subscription cancellation request, the subscription pointer cannot be set. Thus the connection is terminated during cancellation request. In this commit, we move the subcription cancel check outside of tipc_subscrp_create(). Hence, - tipc_subscrp_create() will create a subscripton - tipc_subscrb_rcv_cb() will subscribe or cancel a subscription. Fixes: 'commit 7fe8097c ("tipc: fix nullpointer bug when subscribing to events")' Acked-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
In this commit, we split tipc_subscrp_create() into two: 1. tipc_subscrp_create() creates a subscription 2. A new function tipc_subscrp_subscribe() adds the subscription to the subscriber subscription list, activates the subscription timer and subscribes to the nametable updates. In future commits, the purpose of tipc_subscrb_rcv_cb() will be to either subscribe or cancel a subscription. There is no functional change in this commit. Acked-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
Until now, struct tipc_subscriber has duplicate fields for type, upper and lower (as member of struct tipc_name_seq) at: 1. as member seq in struct tipc_subscription 2. as member seq in struct tipc_subscr, which is contained in struct tipc_event The former structure contains the type, upper and lower values in network byte order and the later contains the intact copy of the request. The struct tipc_subscription contains a field swap to determine if request needs network byte order conversion. Thus by using swap, we can convert the request when required instead of duplicating it. In this commit, 1. we remove the references to these elements as members of struct tipc_subscription and replace them with elements from struct tipc_subscr. 2. provide new functions to convert the user request into network byte order. Acked-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
Until now, struct tipc_subscription has duplicate timeout and filter attributes present: 1. directly as members of struct tipc_subscription 2. in struct tipc_subscr, which is contained in struct tipc_event In this commit, we remove the references to these elements as members of struct tipc_subscription and replace them with elements from struct tipc_subscr. Acked-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Parthasarathy Bhuvaragan 提交于
Until now, during subscription creation we set sub->timeout by converting the timeout request value in milliseconds to jiffies. This is followed by setting the timeout value in the timer if sub->timeout != TIPC_WAIT_FOREVER. For a subscription create request with a timeout value of TIPC_WAIT_FOREVER, msecs_to_jiffies(TIPC_WAIT_FOREVER) returns MAX_JIFFY_OFFSET (0xfffffffe). This is not equal to TIPC_WAIT_FOREVER (0xffffffff). In this commit, we remove this check. Acked-by: NYing Xue <ying.xue@windriver.com> Reviewed-by: NJon Maloy <jon.maloy@ericsson.com> Signed-off-by: NParthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Zhang Shengju 提交于
netdev_dbg() will add bond device name, it will be helpful if we print slave device name. Signed-off-by: NZhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rafał Miłecki 提交于
Chipsets with BCM4707 / BCM53018 ID require special handling at a few places in the code. It's likely there will be more IDs to check in the future. To simplify it add this trivial helper. Signed-off-by: NRafał Miłecki <zajec5@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Alexei Starovoitov says: ==================== bpf: introduce per-cpu maps We've started to use bpf to trace every packet and atomic add instruction (event JITed) started to show up in perf profile. The solution is to do per-cpu counters. For PERCPU_(HASH|ARRAY) map the existing bpf_map_lookup() helper returns per-cpu area which bpf programs can use to store and increment the counters. The BPF_MAP_LOOKUP_ELEM syscall command returns areas from all cpus and user process aggregates the counters. The usage example is in patch 6. The api turned out to be very easy to use from bpf program and from user space. Long term we were discussing to add 'bounded loop' instruction, so bpf programs can do aggregation within the program which may help some use cases. Right now user space aggregation of per-cpu counters fits the best. This patch set is new approach for per-cpu hash and array maps. I've reused the map tests written by Martin and Ming, but implementation and api is new. Old discussion here: http://thread.gmane.org/gmane.linux.kernel/2123800/focus=2126435 ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 tom.leiming@gmail.com 提交于
A sanity test for BPF_MAP_TYPE_PERCPU_ARRAY Signed-off-by: NMing Lei <tom.leiming@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Martin KaFai Lau 提交于
A sanity test for BPF_MAP_TYPE_PERCPU_HASH. Signed-off-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
The functions bpf_map_lookup_elem(map, key, value) and bpf_map_update_elem(map, key, value, flags) need to get/set values from all-cpus for per-cpu hash and array maps, so that user space can aggregate/update them as necessary. Example of single counter aggregation in user space: unsigned int nr_cpus = sysconf(_SC_NPROCESSORS_CONF); long values[nr_cpus]; long value = 0; bpf_lookup_elem(fd, key, values); for (i = 0; i < nr_cpus; i++) value += values[i]; The user space must provide round_up(value_size, 8) * nr_cpus array to get/set values, since kernel will use 'long' copy of per-cpu values to try to copy good counters atomically. It's a best-effort, since bpf programs and user space are racing to access the same memory. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alexei Starovoitov 提交于
Primary use case is a histogram array of latency where bpf program computes the latency of block requests or other events and stores histogram of latency into array of 64 elements. All cpus are constantly running, so normal increment is not accurate, bpf_xadd causes cache ping-pong and this per-cpu approach allows fastest collision-free counters. Signed-off-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-