- 17 11月, 2010 10 次提交
-
-
由 Eric Dumazet 提交于
Right now, fields in struct sock are not optimally ordered, because each path (RX softirq, TX completion, RX user, TX user) has to touch fields that are contained in many different cache lines. The really critical thing is to shrink number of cache lines that are used at RX softirq time : CPU handling softirqs for a device can receive many frames per second for many sockets. If load is too big, we can drop frames at NIC level. RPS or multiqueue cards can help, but better reduce latency if possible. This patch starts with UDP protocol, then additional patches will try to reduce latencies of other ones as well. At RX softirq time, fields of interest for UDP protocol are : (not counting ones in inet struct for the lookup) Read/Written: sk_refcnt (atomic increment/decrement) sk_rmem_alloc & sk_backlog.len (to check if there is room in queues) sk_receive_queue sk_backlog (if socket locked by user program) sk_rxhash sk_forward_alloc sk_drops Read only: sk_rcvbuf (sk_rcvqueues_full()) sk_filter sk_wq sk_policy[0] sk_flags Additional notes : - sk_backlog has one hole on 64bit arches. We can fill it to save 8 bytes. - sk_backlog is used only if RX sofirq handler finds the socket while locked by user. - sk_rxhash is written only once per flow. - sk_drops is written only if queues are full Final layout : [1] One section grouping all read/write fields, but placing rxhash and sk_backlog at the end of this section. [2] One section grouping all read fields in RX handler (sk_filter, sk_rcv_buf, sk_wq) [3] Section used by other paths I'll post a patch on its own to put sk_refcnt at the end of struct sock_common so that it shares same cache line than section [1] New offsets on 64bit arch : sizeof(struct sock)=0x268 offsetof(struct sock, sk_refcnt) =0x10 offsetof(struct sock, sk_lock) =0x48 offsetof(struct sock, sk_receive_queue)=0x68 offsetof(struct sock, sk_backlog)=0x80 offsetof(struct sock, sk_rmem_alloc)=0x80 offsetof(struct sock, sk_forward_alloc)=0x98 offsetof(struct sock, sk_rxhash)=0x9c offsetof(struct sock, sk_rcvbuf)=0xa4 offsetof(struct sock, sk_drops) =0xa0 offsetof(struct sock, sk_filter)=0xa8 offsetof(struct sock, sk_wq)=0xb0 offsetof(struct sock, sk_policy)=0xd0 offsetof(struct sock, sk_flags) =0xe0 Instead of : sizeof(struct sock)=0x270 offsetof(struct sock, sk_refcnt) =0x10 offsetof(struct sock, sk_lock) =0x50 offsetof(struct sock, sk_receive_queue)=0xc0 offsetof(struct sock, sk_backlog)=0x70 offsetof(struct sock, sk_rmem_alloc)=0xac offsetof(struct sock, sk_forward_alloc)=0x10c offsetof(struct sock, sk_rxhash)=0x128 offsetof(struct sock, sk_rcvbuf)=0x4c offsetof(struct sock, sk_drops) =0x16c offsetof(struct sock, sk_filter)=0x198 offsetof(struct sock, sk_wq)=0x88 offsetof(struct sock, sk_policy)=0x98 offsetof(struct sock, sk_flags) =0x130 Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
UDP sockets refcount is usually 2, unless an incoming frame is going to be queued in receive or backlog queue. Using atomic_inc_not_zero_hint() permits to reduce latency, because processor issues less memory transactions. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Now vlan are lockless, we dont need special ndo_select_queue() logic. dev_pick_tx() will do the multiqueue stuff on the real device transmit. Suggested-by: NJesse Gross <jesse@nicira.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
vlan is a stacked device, like tunnels. We should use the lockless mechanism we are using in tunnels and loopback. This patch completely removes locking in TX path. tx stat counters are added into existing percpu stat structure, renamed from vlan_rx_stats to vlan_pcpu_stats. Note : this partially reverts commit 2e59af3d (vlan: multiqueue vlan device) Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
macvlan is a stacked device, like tunnels. We should use the lockless mechanism we are using in tunnels and loopback. This patch completely removes locking in TX path. tx stat counters are added into existing percpu stat structure, renamed from rx_stats to pcpu_stats. Note : this reverts commit 2c114553 (macvlan: add multiqueue capability) Note : rx_errors converted to a 32bit counter, like tx_dropped, since they dont need 64bit range. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Ben Greear <greearb@candelatech.com> Cc: Ben Hutchings <bhutchings@solarflare.com> Acked-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Neil Horman 提交于
packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Version 4 of this patch. Change notes: 1) Removed extra memset. Didn't think kcalloc added a GFP_ZERO the way kzalloc did :) Summary: It was shown to me recently that systems under high load were driven very deep into swap when tcpdump was run. The reason this happened was because the AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space application to specify how many entries an AF_PACKET socket will have and how large each entry will be. It seems the default setting for tcpdump is to set the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5 allocation. Thats difficult under good circumstances, and horrid under memory pressure. I thought it would be good to make that a bit more usable. I was going to do a simple conversion of the ring buffer from contigous pages to iovecs, but unfortunately, the metadata which AF_PACKET places in these buffers can easily span a page boundary, and given that these buffers get mapped into user space, and the data layout doesn't easily allow for a change to padding between frames to avoid that, a simple iovec change is just going to break user space ABI consistency. So I've done this, I've added a three tiered mechanism to the af_packet set_ring socket option. It attempts to allocate memory in the following order: 1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without digging into swap 2) Using vmalloc 3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as needed to get the memory The effect is that we don't disturb the system as much when we're under load, while still being able to conduct tcpdumps effectively. Tested successfully by me. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NMaciej Żenczykowski <zenczykowski@gmail.com> Reported-by: NMaciej Żenczykowski <zenczykowski@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Using %pV reduces the number of printk calls and eliminates any possible message interleaving from other printk calls. Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Jan Engelhardt 提交于
The changed functions do not modify the NL messages and/or attributes at all. They should use const (similar to strchr), so that callers which have a const nlmsg/nlattr around can make use of them without casting. While at it, constify a data array. Signed-off-by: NJan Engelhardt <jengelh@medozas.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 John Fastabend 提交于
Fix ref count bug introduced by commit 2de79570 Author: Lorenzo Colitti <lorenzo@google.com> Date: Wed Oct 27 18:16:49 2010 +0000 ipv6: addrconf: don't remove address state on ifdown if the address is being kept Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr refcnt correctly with in6_ifa_put(). Reported-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 11月, 2010 30 次提交
-
-
由 David S. Miller 提交于
ERROR: "netif_get_vlan_features" [drivers/net/xen-netfront.ko] undefined! Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Vasanthy Kolluri 提交于
Fix data type of argument passed to pci_alloc_consistent and pci_free_consistent routines. Signed-off-by: NVasanthy Kolluri <vkolluri@cisco.com> Signed-off-by: NRoopa Prabhu <roprabhu@cisco.com> Signed-off-by: NDavid Wang <dwang2@cisco.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Alan Cox 提交于
Fallout from the TIOCGICOUNT work Signed-off-by: NAlan Cox <alan@linux.intel.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
br_port_get() renamed to br_port_get_rtnl() to make clear RTNL is held. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
The macro br_port_exists() is not enough protection when only RCU is being used. There is a tiny race where other CPU has cleared port handler hook, but is bridge port flag might still be set. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
Suggested by Eric's bridge RCU changes. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Add br_should_route_hook_t typedef, this is the only way we can get a clean RCU implementation for function pointer. Move route_hook to location where it is used. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Add modern __rcu annotatations to bridge multicast table. Use newer hlist macros to avoid direct access to hlist internals. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NSjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
-
由 Tom Herbert 提交于
This patch move RX queue allocation to alloc_netdev_mq and freeing of the queues to free_netdev (symmetric to TX queue allocation). Each kobject RX queue takes a reference to the queue's device so that the device can't be freed before all the kobjects have been released-- this obviates the need for reference counts specific to RX queues. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
TX queues are now allocated in alloc_netdev_mq and freed in free_netdev. Signed-off-by: NTom Herbert <therbert@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Timo Teräs 提交于
The GRE Key field is intended to be used for identifying an individual traffic flow within a tunnel. It is useful to be able to have XFRM policy selector matches to have different policies for different GRE tunnels. Signed-off-by: NTimo Teräs <timo.teras@iki.fi> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
net/8021q/vlanproc.c: In function 'vlandev_seq_show': net/8021q/vlanproc.c:283:20: warning: unused variable 'fmt' Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Christian Lamparter 提交于
This patch replaces the handcrafted sign extension cruft with a generic bitop function. Signed-off-by: NChristian Lamparter <chunkeey@googlemail.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Andreas Herrmann 提交于
This patch moves code out from wireless drivers where two different functions are defined in three code locations for the same purpose and provides a common function to sign extend a 32-bit value. Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Grazvydas Ignotas 提交于
Make use the newly added method to pass platform data for wl1251 too. This allows to eliminate some redundant code. Cc: Ohad Ben-Cohen <ohad@wizery.com> Signed-off-by: NGrazvydas Ignotas <notasas@gmail.com> Acked-by: NKalle Valo <kvalo@adurom.com> Acked-by: NLuciano Coelho <luciano.coelho@nokia.com> Acked-by: NTony Lindgren <tony@atomide.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Grazvydas Ignotas 提交于
Add runtime PM support, similar to how it's done for wl1271. This allows to power down the card when the driver is loaded but network is not in use. Cc: Ohad Ben-Cohen <ohad@wizery.com> Signed-off-by: NGrazvydas Ignotas <notasas@gmail.com> Acked-by: NKalle Valo <kvalo@adurom.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Grazvydas Ignotas 提交于
Call interface specific power callback before calling board specific one. Also allow that callback to fail. This is how it's done for wl1271 and will be used for runtime_pm support. Signed-off-by: NGrazvydas Ignotas <notasas@gmail.com> Acked-by: NKalle Valo <kvalo@adurom.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Wey-Yi Guy 提交于
For 6000 series devices and up, enable automatic update MAC's register for better power usage in PSP mode Signed-off-by: NWey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Shanyu Zhao 提交于
Disconnected antenna algorithm is used to find out which antennas are disconnected. It should be disabled for devices that support advanced bluetooth coexist. Signed-off-by: NShanyu Zhao <shanyu.zhao@intel.com> Signed-off-by: NWey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Shanyu Zhao 提交于
Disconnected antenna algorithm is seperated into its own function from chain noise calibration routine for better code management. Signed-off-by: NShanyu Zhao <shanyu.zhao@intel.com> Signed-off-by: NWey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Johannes Berg 提交于
When the HT information is changed due to BSS changes (like legacy stations joining) we need to recalculate HT RXON parameters. Signed-off-by: NJohannes Berg <johannes.berg@intel.com> Signed-off-by: NWey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-