- 14 11月, 2009 7 次提交
-
-
由 Eric Dumazet 提交于
When handling large number of netdevices, inet6_dump_addr() is very slow because it has O(N^2) complexity. Instead of scanning one single list, we can use the NETDEV_HASHENTRIES sub lists of the dev_index hash table, and RCU lookups. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Stephen Hemminger a écrit : > On Thu, 12 Nov 2009 15:11:36 +0100 > Eric Dumazet <eric.dumazet@gmail.com> wrote: > >> When handling large number of netdevices, inet_dump_ifaddr() >> is very slow because it has O(N^2) complexity. >> >> Instead of scanning one single list, we can use the NETDEV_HASHENTRIES >> sub lists of the dev_index hash table, and RCU lookups. >> >> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> > > You might be able to make RCU critical section smaller by moving > it into loop. > Indeed. But we dump at most one skb (<= 8192 bytes ?), so rcu_read_lock holding time is small, unless we meet many netdevices without addresses. I wonder if its really common... Thanks [PATCH net-next-2.6] ipv4: speedup inet_dump_ifaddr() When handling large number of netdevices, inet_dump_ifaddr() is very slow because it has O(N2) complexity. Instead of scanning one single list, we can use the NETDEV_HASHENTRIES sub lists of the dev_index hash table, and RCU lookups. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We need to use next_det_device_rcu() in RCU protected section. We also can avoid in_dev_get()/in_dev_put() overhead (code size mainly) in rcu_read_lock() sections. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Acked-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
No longer need read_lock(&dev_base_lock), use RCU instead. We also can avoid taking references on inet6_dev structs. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 William Allen Simpson 提交于
Define two symbols needed in both kernel and user space. Remove old (somewhat incorrect) kernel variant that wasn't used in most cases. Default should apply to both RMSS and SMSS (RFC2581). Replace numeric constants with defined symbols. Stand-alone patch, originally developed for TCPCT. Signed-off-by: William.Allen.Simpson@gmail.com Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Both vlan and macvlan devices usually don't use a qdisc and immediately queue packets to the underlying device. Propagate transmission state of the underlying device to the upper layers so they can react on congestion and/or inform the sending process. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Patrick McHardy 提交于
Currently the ->ndo_hard_start_xmit() callbacks are only permitted to return one of the NETDEV_TX codes. This prevents any kind of error propagation for virtual devices, like queue congestion of the underlying device in case of layered devices, or unreachability in case of tunnels. This patches changes the NET_XMIT codes to avoid clashes with the NETDEV_TX codes and changes the two callers of dev_hard_start_xmit() to expect either errno codes, NET_XMIT codes or NETDEV_TX codes as return value. In case of qdisc_restart(), all non NETDEV_TX codes are mapped to NETDEV_TX_OK since no error propagation is possible when using qdiscs. In case of dev_queue_xmit(), the error is propagated upwards. Signed-off-by: NPatrick McHardy <kaber@trash.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 12 11月, 2009 7 次提交
-
-
由 Arnd Bergmann 提交于
We have two implementations of the compat_ioctl handling for ATM, the one that we have had for ages in fs/compat_ioctl.c and the one added to net/atm/ioctl.c by David Woodhouse. Unfortunately, both versions are incomplete, and in practice we use a very confusing combination of the two. For ioctl numbers that have the same identifier on 32 and 64 bit systems, we go directly through the compat_ioctl socket operation, for those that differ, we do a conversion in fs/compat_ioctl.c. This patch moves both variants into the vcc_compat_ioctl() function, while preserving the current behaviour. It also kills off the COMPATIBLE_IOCTL definitions that we never use here. Doing it this way is clearly not a good solution, but I hope it is a step into the right direction, so that someone is able to clean up this mess for real. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnd Bergmann 提交于
Handling for SIOCSHWTSTAMP is broken on architectures with a split user/kernel address space like s390, because it passes a real user pointer while using set_fs(KERNEL_DS). A similar problem might arise the next time somebody adds code to dev_ifsioc. Split up dev_ifsioc into three separate functions for SIOCSHWTSTAMP, SIOC*IFMAP and all other numbers so we can get rid of set_fs in all potentially affected cases. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: Patrick Ohly <patrick.ohly@intel.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
There is no reason for this lock to be reader/writer since the reader only has lock held for a very brief period. The overhead of read_lock is more expensive than spinlock. Compile tested only, I am not a decnet user. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
Add missing locking in the case of auto binding to the default device. The address list might change while this code is looking at the list. Compile tested only, I am not a decnet user. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
The full_name_hash function does not produce well distributed values in the lower bits, so most code uses hash_32() to fold it. This is really a bug introduced when name hashing was added, back in 2.5 when I added name hashing. hash_32 is all that is needed since full_name_hash returns unsigned int which is only 32 bits on 64 bit platforms. Also, there is no point in using hash_32 on ifindex, because the is naturally sequential and usually well distributed. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Anton Vorontsov 提交于
NAPI drivers try to recycle SKBs in their polling routine, but we generally don't know the context in which the polling will be called, and the skb recycling itself may require IRQs to be enabled. This patch adds irqs_disabled() test to the skb_recycle_check() routine, so that we'll not let the drivers hit the skb recycling path with IRQs disabled. As a side effect, this patch actually disables skb recycling for some [broken] drivers. E.g. gianfar driver grabs an irqsave spinlock during TX ring processing, and then tries to recycle an skb, and that caused the following badness: nf_conntrack version 0.5.0 (1008 buckets, 4032 max) ------------[ cut here ]------------ Badness at kernel/softirq.c:143 NIP: c003e3c4 LR: c423a528 CTR: c003e344 ... NIP [c003e3c4] local_bh_enable+0x80/0xc4 LR [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack] Call Trace: [c15d1b60] [c003e32c] local_bh_disable+0x1c/0x34 (unreliable) [c15d1b70] [c423a528] destroy_conntrack+0xd4/0x13c [nf_conntrack] [c15d1b80] [c02c6370] nf_conntrack_destroy+0x3c/0x70 Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Reported by Stephen Rothwell: -------------------- Today's linux-next build (x86_64 allmodconfig) produced this warning: net/ipv6/addrconf.c: In function 'inet6_dump_ifinfo': net/ipv6/addrconf.c:3833: warning: unused variable 'err' Introduced by commit 84d2697d ("ipv6: speedup inet6_dump_ifinfo()"). -------------------- Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 11 11月, 2009 13 次提交
-
-
由 stephen hemminger 提交于
Use new function to avoid doing read_lock(). Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NOliver Hartkopp <oliver@hartkopp.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
This also needs to be optimized for large number of devices. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
When showing device statistics use RCU rather than read_lock(&dev_base_lock) Compile tested only. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
Use RCU to walk list of network devices in qdisc dump. This could be optimized for large number of devices. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 stephen hemminger 提交于
Do not need to use read_lock(&dev_base_lock), use RCU instead. Signed-off-by: NStephen Hemminger <shemminger@vyatta.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Brian Haley 提交于
Change udp6_portaddr_hash() to use ipv6_addr_v4mapped() inline instead of ipv6_addr_type(). Signed-off-by: NBrian Haley <brian.haley@hp.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Herbert Xu 提交于
This patch rearranges the SIT DF bit handling using the new IPIP DF code. The only externally visible effect should be the case where PMTU is enabled and the MTU is exactly 1280 bytes. In this case the previous code would send packets out with DF off while the new code would set the DF bit. This is inline with RFC 4213. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Thanks, Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Apparently, inet6_dump_addr() is not able to handle more than 64 ipv6 addresses per device. We must break from inner loops in case skb is full, or else cursor is put at the end of list. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When handling large number of netdevice, inet6_dump_ifinfo() is very slow because it has O(N^2) complexity. Instead of scanning one single list, we can use the 256 sub lists of the dev_index hash table, and RCU lookups. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Cyrill Gorcunov 提交于
Use guard DECLARE_SOCKADDR in a few more places which allow us to catch if the structure copied back is too big. Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
UDP bind() can be O(N^2) in some pathological cases. Thanks to secondary hash tables, we can make it O(N) Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rémi Denis-Courmont 提交于
Signed-off-by: NRémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Rémi Denis-Courmont 提交于
Signed-off-by: NRémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 11月, 2009 12 次提交
-
-
由 Yury Polyanskiy 提交于
This fixes the following bug in the current implementation of net/xfrm: SAD entries timeouts do not count the time spent by the machine in the suspended state. This leads to the connectivity problems because after resuming local machine thinks that the SAD entry is still valid, while it has already been expired on the remote server. The cause of this is very simple: the timeouts in the net/xfrm are bound to the old mod_timer() timers. This patch reassigns them to the CLOCK_REALTIME hrtimer. I have been using this version of the patch for a few months on my machines without any problems. Also run a few stress tests w/o any issues. This version of the patch uses tasklet_hrtimer by Peter Zijlstra (commit 9ba5f0). This patch is against 2.6.31.4. Please CC me. Signed-off-by: NYury Polyanskiy <polyanskiy@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnd Bergmann 提交于
This adds compat_ioctl support for SIOCWANDEV, which has always been missing. The definition of struct compat_ifreq was missing an ifru_settings fields that is needed to support SIOCWANDEV, so add that and clean up the whitespace damage in the struct definition. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Arnd Bergmann 提交于
SIOCGMIIPHY and SIOCGMIIREG return data through ifreq, so it needs to be converted on the way out as well. SIOCGIFPFLAGS is unused, but has the same problem in theory. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
When skb_clone() fails, we should increment sk_drops and SNMP counters. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
IPV6 UDP multicast rx path is a bit complex and can hold a spinlock for a long time. Using a small (32 or 64 entries) stack of socket pointers can help to perform expensive operations (skb_clone(), udp_queue_rcv_skb()) outside of the lock, in most cases. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
UDP multicast rx path is a bit complex and can hold a spinlock for a long time. Using a small (32 or 64 entries) stack of socket pointers can help to perform expensive operations (skb_clone(), udp_queue_rcv_skb()) outside of the lock, in most cases. It's also a base for a future RCU conversion of multicast recption. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NLucian Adrian Grijincu <lgrijincu@ixiacom.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We first locate the (local port) hash chain head If few sockets are in this chain, we proceed with previous lookup algo. If too many sockets are listed, we take a look at the secondary (port, address) hash chain. We choose the shortest chain and proceed with a RCU lookup on the elected chain. But, if we chose (port, address) chain, and fail to find a socket on given address, we must try another lookup on (port, in6addr_any) chain to find sockets not bound to a particular IP. -> No extra cost for typical setups, where the first lookup will probabbly be performed. RCU lookups everywhere, we dont acquire spinlock. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
We first locate the (local port) hash chain head If few sockets are in this chain, we proceed with previous lookup algo. If too many sockets are listed, we take a look at the secondary (port, address) hash chain we added in previous patch. We choose the shortest chain and proceed with a RCU lookup on the elected chain. But, if we chose (port, address) chain, and fail to find a socket on given address, we must try another lookup on (port, INADDR_ANY) chain to find socket not bound to a particular IP. -> No extra cost for typical setups, where the first lookup will probabbly be performed. RCU lookups everywhere, we dont acquire spinlock. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Extends udp_table to contain a secondary hash table. socket anchor for this second hash is free, because UDP doesnt use skc_bind_node : We define an union to hold both skc_bind_node & a new hlist_nulls_node udp_portaddr_node udp_lib_get_port() inserts sockets into second hash chain (additional cost of one atomic op) udp_lib_unhash() deletes socket from second hash chain (additional cost of one atomic op) Note : No spinlock lockdep annotation is needed, because lock for the secondary hash chain is always get after lock for primary hash chain. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Union sk_hash with two u16 hashes for udp (no extra memory taken) One 16 bits hash on (local port) value (the previous udp 'hash') One 16 bits hash on (local address, local port) values, initialized but not yet used. This second hash is using jenkin hash for better distribution. Because the 'port' is xored later, a partial hash is performed on local address + net_hash_mix(net) Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Dumazet 提交于
Adds a counter in udp_hslot to keep an accurate count of sockets present in chain. This will permit to upcoming UDP lookup algo to chose the shortest chain when secondary hash is added. Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stephen Rothwell 提交于
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 08 11月, 2009 1 次提交
-
-
由 Eric W. Biederman 提交于
There is no good reason to not support userspace specifying the network namespace during device creation, and it makes it easier to create a network device and pass it to a child network namespace with a well known name. We have to be careful to ensure that the target network namespace for the new device exists through the life of the call. To keep that logic clear I have factored out the network namespace grabbing logic into rtnl_link_get_net. In addtion we need to continue to pass the source network namespace to the rtnl_link_ops.newlink method so that we can find the base device source network namespace. Signed-off-by: NEric W. Biederman <ebiederm@aristanetworks.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
-