Commits · c6d14c84566d6b70ad9dc1618db0dec87cca9300 · openanolis / cloud-kernel

04 Nov, 2009 3 commits

net: Introduce for_each_netdev_rcu() iterator · c6d14c84

Committed by Eric Dumazet on Nov 04, 2009

Adds RCU management to the list of netdevices.

Convert some for_each_netdev() users to the RCU version, if
it can avoid read_lock-ing dev_base_lock.

I.e.:

	read_lock(&dev_base_lock);
	for_each_netdev(net, dev)
		some_action();
	read_unlock(&dev_base_lock);

becomes:

	rcu_read_lock();
	for_each_netdev_rcu(net, dev)
		some_action();
	rcu_read_unlock();
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c6d14c84

em_meta: avoid one dev_put() · d0075634

Committed by Eric Dumazet on Nov 04, 2009

Another RCU conversion to avoid one dev_hold()/dev_put() pair.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d0075634

Phonet: remove tautologies · 4b7673a0

Committed by Rémi Denis-Courmont on Nov 02, 2009

These checks don't make sense anymore since rtnl_notify() cannot fail.
Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4b7673a0

02 Nov, 2009 9 commits

ipv6: no more dev_put() in datagram_send_ctl() · 536b2e92

Committed by Eric Dumazet on Nov 02, 2009

Avoids touching device refcount in datagram_send_ctl(), thanks to RCU.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

536b2e92

ipv6: no more dev_put() in inet6_bind() · 16ba5e8e

Committed by Eric Dumazet on Nov 02, 2009

Avoids touching device refcount in inet6_bind(), thanks to RCU.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16ba5e8e

ip6tnl: less dev_put() calls · f1a28eab

Committed by Eric Dumazet on Nov 02, 2009

Using dev_get_by_index_rcu() in ip6_tnl_rcv_ctl() & ip6_tnl_xmit_ctl()
avoids touching device refcount.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1a28eab

packet: less dev_put() calls · 654d1f8a

Committed by Eric Dumazet on Nov 02, 2009

- packet_sendmsg_spkt() can use dev_get_by_name_rcu() to avoid touching device refcount.

- packet_getname_spkt() & packet_getname() can use dev_get_by_index_rcu() to
  avoid touching device refcount too.

tpacket_snd() & packet_snd() cannot use RCU yet because they can sleep when
allocating an skb.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

654d1f8a

net: RCU locking for simple ioctl() · 3710becf

Committed by Eric Dumazet on Nov 01, 2009

All ioctls implemented by dev_ifsioc_locked():
SIOCGIFFLAGS, SIOCGIFMETRIC, SIOCGIFMTU, SIOCGIFHWADDR,
SIOCGIFSLAVE, SIOCGIFMAP, SIOCGIFINDEX & SIOCGIFTXQLEN
can use the RCU lock instead of the dev_base_lock rwlock.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3710becf

icmp: icmp_send() can avoid a dev_put() · 685c7944

Committed by Eric Dumazet on Nov 01, 2009

We can avoid touching device refcount in icmp_send(),
using dev_get_by_index_rcu().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

685c7944

ipv4: inetdev_by_index() switch to RCU · c148fc2e

Committed by Eric Dumazet on Nov 01, 2009

Use dev_get_by_index_rcu() instead of __dev_get_by_index() and
the dev_base_lock rwlock.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c148fc2e

veth: Fix unregister_netdevice_queue for veth · 9fdce099

Committed by Eric W. Biederman on Oct 30, 2009

I tested the recent unregister many changes and got a weird,
nasty and seemingly unrelated kernel oops. Changing
unregister_netdevice_queue to use list_move_tail fixes
the problem for me.

ip link add type veth
rmmod veth

ls /sys/class/net/
showed one of the veth devices still present.

A subsequent ip link oopsed the box.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9fdce099
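
The difference between adding and moving a list entry can be illustrated with a small userspace sketch. This is not the kernel's list.h; it is a simplified re-implementation under the assumption that list_move_tail unlinks the entry from whatever list it is currently on before appending it. Why this cures the veth oops is not stated in the commit message; presumably a device could be queued for unregistration more than once.

```c
#include <stdbool.h>

/* Simplified circular doubly-linked list, modeled on the kernel's idea. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static bool list_empty(const struct list_head *h) { return h->next == h; }

/* Append e before head. Assumes e is NOT linked anywhere else; if it is,
 * its old neighbours keep pointing at it and that list is corrupted. */
static void list_add_tail(struct list_head *e, struct list_head *head)
{
    e->prev = head->prev;
    e->next = head;
    head->prev->next = e;
    head->prev = e;
}

/* Unlink e from the list it is currently on. */
static void list_del(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Unlink first, then append: safe even if e is already on a list
 * (including the destination list itself). */
static void list_move_tail(struct list_head *e, struct list_head *head)
{
    list_del(e);
    list_add_tail(e, head);
}
```

With an entry initialised by list_init(), list_move_tail() can be called repeatedly without leaving stale pointers behind, which is what makes it tolerant of double queueing.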

net: Introduce dev_get_by_name_rcu() · 72c9528b

Committed by Eric Dumazet on Oct 30, 2009

Some workloads hit dev_base_lock rwlock pretty hard.
We can use RCU lookups to avoid touching this rwlock
(and avoid touching netdevice refcount).

netdevices are already freed after an RCU grace period, so this patch
adds no penalty at device dismantle time.

However, it adds a synchronize_rcu() call in dev_change_name().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72c9528b

31 Oct, 2009 6 commits

RDS/IB+IW: Move recv processing to a tasklet · d521b63b

Committed by Andy Grover on Oct 30, 2009

Move receive processing from event handler to a tasklet.
This should help prevent the hangcheck timer from going off
when RDS is under heavy load.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d521b63b

RDS: Do not send congestion updates to loopback connections · 0514f8a9

Committed by Andy Grover on Oct 30, 2009

This issue was discovered by HP's Pradeep and fixed in OFED
1.3, but not fixed in later versions, since the fix's implementation
was not immediately applicable to the later code. This patch should
do the trick for 1.4+ codebases.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0514f8a9

RDS: Fix panic on unload · 433d308d

Committed by Andy Grover on Oct 30, 2009

Remove explicit destruction of passive connection when destroying
active end of the connection. The passive end is also on the
device's connection list, and will thus be cleaned up properly.
Panic was caused by trying to clean it up twice.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

433d308d

RDS: Fix potential race around rds_i[bw]_allocation · 86357b19

Committed by Andy Grover on Oct 30, 2009

"At rds_ib_recv_refill_one(), it first executes atomic_read(&rds_ib_allocation)
for if-condition checking, and then executes atomic_inc(&rds_ib_allocation)
if the condition was not satisfied.

However, if any other code which updates rds_ib_allocation executes
between these two atomic operation executions, it seems that it may
result in a race condition (especially when
rds_ib_allocation + 1 == rds_ib_sysctl_max_recv_allocation)."

This patch fixes this by using atomic_inc_unless to eliminate the
possibility of allocating more than rds_ib_sysctl_max_recv_allocation
and then decrementing the count if the allocation fails. It also
makes an identical change to the iwarp transport.
Reported-by: Shin Hong <hongshin@gmail.com>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

86357b19
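
The read-then-increment race and its fix can be sketched in userspace C11. This is an illustration only, not the RDS code: counter_add_unless() and try_alloc() are hypothetical names, and the compare-and-swap loop stands in for the kernel primitive the message calls atomic_inc_unless (which, as far as can be told, corresponds to the kernel's atomic_add_unless()).

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for an add-unless primitive: add delta to *v
 * unless *v currently equals limit. Returns true if the add happened.
 * The check and the update are one atomic step, so two threads can
 * never both pass the limit check for the last free slot. */
static bool counter_add_unless(atomic_int *v, int delta, int limit)
{
    int cur = atomic_load(v);
    while (cur != limit) {
        /* On failure cur is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(v, &cur, cur + delta))
            return true;
    }
    return false;
}

/* Hypothetical allocation path: reserve a slot first, then undo the
 * reservation if the actual allocation fails. alloc_succeeds simulates
 * the outcome of the real allocator. */
static bool try_alloc(atomic_int *allocated, int max, bool alloc_succeeds)
{
    if (!counter_add_unless(allocated, 1, max))
        return false;                    /* limit reached */
    if (!alloc_succeeds) {
        atomic_fetch_sub(allocated, 1);  /* give the slot back */
        return false;
    }
    return true;
}
```

Reserving before allocating means the counter can briefly include a slot whose allocation later fails, but it can never exceed the maximum, which is the property the patch is after.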

RDS: Add GET_MR_FOR_DEST sockopt · 244546f0

Committed by Andy Grover on Oct 30, 2009

RDS currently supports a GET_MR sockopt to establish a
memory region (MR) for a chunk of memory. However, the fastreg
method ties an MR to a particular destination. The GET_MR_FOR_DEST
sockopt allows the remote machine to be specified, and thus
support for fastreg (aka FRWRs).

Note that this patch does *not* do all of this - it simply
implements the new sockopt in terms of the old one, so applications
can begin to use the new sockopt in preparation for cutover to
FRWRs.
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

244546f0

net: Allow devices to specify a device specific sysfs group. · 0c509a6c

Committed by Eric W. Biederman on Oct 29, 2009

This isn't beautifully abstracted, but it is simple,
simplifies uses and so far is only needed for the bonding driver.
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0c509a6c

30 Oct, 2009 5 commits

net: use hlist_for_each_entry() · 0bd8d536

Committed by Eric Dumazet on Oct 30, 2009

Small cleanup of __dev_get_by_name() and __dev_get_by_index()
to use hlist_for_each_entry(): they'll look like their _rcu variants.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0bd8d536

vlan: cleanup multiple unregistrations · 29906f6a

Committed by Patrick McHardy on Oct 29, 2009

The temporary copy of the VLAN group is not necessary since the lower device
is already in the process of being unregistered; if it were necessary, the
memset of the global group would introduce a race condition.

With this removed, the changes to the original code are only a few lines, so
remove the new function and move the code back into vlan_device_event().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

29906f6a

ax25: unsigned cannot be less than 0 in ax25_ctl_ioctl() · 43ab8502

Committed by roel kluin on Oct 14, 2009

struct ax25_ctl_struct member `arg' is unsigned and cannot be less
than 0.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

43ab8502

gro: Change all receive functions to return GRO result codes · c7c4b3b6

Committed by Ben Hutchings on Oct 29, 2009

This will allow drivers to adjust their receive path dynamically
based on whether GRO is being applied successfully.

Currently all in-tree callers ignore the return values of these
functions and do not need to be changed.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7c4b3b6

gro: Name the GRO result enumeration type · 5b252f0c

Committed by Ben Hutchings on Oct 29, 2009

This clarifies which return and parameter types are GRO result codes
and not RX result codes.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b252f0c

29 Oct, 2009 17 commits

net: Fix 'Re: PACKET_TX_RING: packet size is too long' · b5dd884e

Committed by Gabor Gombas on Oct 29, 2009

Currently PACKET_TX_RING forces a certain amount of every frame to remain
unused. This probably originates from an early version of the
PACKET_TX_RING patch that in fact used the extra space when the (since
removed) CONFIG_PACKET_MMAP_ZERO_COPY option was enabled. The current
code does not make any use of this extra space.

This patch removes the extra space reservation and lets userspace make
use of the full frame size.
Signed-off-by: Gabor Gombas <gombasg@sztaki.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

b5dd884e

net,socket: introduce DECLARE_SOCKADDR helper to catch overflow at build time · 38bfd8f5

Committed by Cyrill Gorcunov on Oct 29, 2009

proto_ops->getname implies copying protocol-specific data
into a storage unit (particularly into __kernel_sockaddr_storage).
So when we implement new protocol support we should keep such
a detail in mind (which is easy to forget about).

Let's introduce a DECLARE_SOCKADDR helper which checks at build time
that the storage unit is not overflowed.

Eventually inet_getname is switched to use DECLARE_SOCKADDR
(to show an example of usage).
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

38bfd8f5
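
The idea of a build-time overflow check can be sketched in userspace C11 with static_assert. This is not the kernel's DECLARE_SOCKADDR macro; struct storage, struct my_sockaddr, CHECK_FITS and store_addr() are hypothetical names with illustrative sizes, and the sketch copies into the storage rather than declaring a typed pointer as the kernel helper does.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <string.h>

/* Hypothetical generic storage and protocol-specific address type. */
struct storage { unsigned char data[128]; };
struct my_sockaddr { unsigned short family; unsigned char addr[16]; };

/* Refuse to compile if the protocol type cannot fit in the storage. */
#define CHECK_FITS(type) \
    static_assert(sizeof(type) <= sizeof(struct storage), \
                  "address type overflows generic storage")

/* Copy a protocol address into the generic storage and return the
 * number of bytes written. The size check costs nothing at run time. */
static size_t store_addr(struct storage *st, const struct my_sockaddr *sa)
{
    CHECK_FITS(struct my_sockaddr);
    memcpy(st->data, sa, sizeof(*sa));
    return sizeof(*sa);
}
```

If someone later grows struct my_sockaddr beyond 128 bytes, the build fails at the CHECK_FITS line instead of the copy silently overrunning the buffer.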

net: Introduce dev_get_by_index_rcu() · fb699dfd

Committed by Eric Dumazet on Oct 19, 2009

Some workloads hit dev_base_lock rwlock pretty hard.
We can use RCU lookups to avoid touching this rwlock.

netdevices are already freed after an RCU grace period, so this patch
adds no penalty at device dismantle time.

dev_ifname() is converted to dev_get_by_index_rcu().
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fb699dfd

net: Cleanup redundant tests on unsigned · 65a1c4ff

Committed by roel kluin on Oct 23, 2009

optlen is unsigned so the `< 0' test is never true.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

65a1c4ff

net: Cleanup redundant tests on unsigned · 091bb8ab

Committed by roel kluin on Oct 23, 2009

If there is data, the unsigned skb->len is greater than 0.

rt.sigdigits is unsigned as well, so the test `>= 0' is
always true; the other part of the test catches wrapped
values.
Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

091bb8ab
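
Why the upper-bound test alone is enough for an unsigned field can be shown with a minimal sketch. in_range() is a hypothetical helper, not code from the patch, and the bound used in any call is chosen by the caller for illustration.

```c
#include <stdbool.h>

/* Hypothetical range check on an unsigned field: valid values are 0..max. */
static bool in_range(unsigned int value, unsigned int max)
{
    /* `value >= 0` would always be true for an unsigned type, so only
     * the upper bound is tested. A negative input converted to unsigned
     * wraps to a very large value and is rejected by this comparison. */
    return value <= max;
}
```

For example, passing (unsigned int)-1 yields UINT_MAX, which fails the `<= max` test for any smaller bound, so no separate lower-bound check is needed.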

Allow disabling of DSACK TCP option per route · dc343475

Committed by Gilad Ben-Yossef on Oct 28, 2009

Add and use the no-DSACK bit in the features field.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dc343475

Allow to turn off TCP window scale opt per route · 345cda2f

Committed by Gilad Ben-Yossef on Oct 28, 2009

Add and use the no-window-scale bit in the features field.

Note that this is not the same as setting a window scale of 0,
as would happen with a window limit on the route.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

345cda2f

Allow disabling TCP timestamp options per route · cda42ebd

Committed by Gilad Ben-Yossef on Oct 28, 2009

Implement querying and acting upon the no-timestamp bit in the features
field.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cda42ebd

Add the no SACK route option feature · 1aba721e

Committed by Gilad Ben-Yossef on Oct 28, 2009

Implement querying and acting upon the no-SACK bit in the features
field.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1aba721e

Allow tcp_parse_options to consult dst entry · 022c3f7d

Committed by Gilad Ben-Yossef on Oct 28, 2009

We need tcp_parse_options to be aware of dst_entry to
take into account per-dst_entry TCP option settings.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

022c3f7d

Only parse time stamp TCP option in time wait sock · f55017a9

Committed by Gilad Ben-Yossef on Oct 28, 2009

Since we only use tcp_parse_options here to check for the existence
of the TCP timestamp option in the header, it is better to call it with
the "established" flag on.
Signed-off-by: Gilad Ben-Yossef <gilad@codefidence.com>
Signed-off-by: Ori Finkelman <ori@comsleep.com>
Signed-off-by: Yony Amit <yony@comsleep.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f55017a9

ip6mr: Optimize multiple unregistration · c871e664