Commits · 941666c2e3e0f9f6a1cb5808d02352d445bd702c · openeuler / Kernel

09 Dec, 2010 1 commit

net: RCU conversion of dev_getbyhwaddr() and arp_ioctl() · 941666c2

Committed by Eric Dumazet on Dec 05, 2010

On Sunday 05 December 2010 at 09:19 +0100, Eric Dumazet wrote:

> Hmm..
>
> If somebody can explain why RTNL is held in arp_ioctl() (and therefore
> in arp_req_delete()), we might first remove RTNL use in arp_ioctl() so
> that your patch can be applied.
>
> Right now it is not good, because RTNL won't necessarily be held when you
> are going to call arp_invalidate()?

While doing this analysis, I found a refcount bug in llc. I'll send a
patch for net-2.6.

Meanwhile, here is the patch for net-next-2.6

Your patch then can be applied after mine.

Thanks

[PATCH] net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()

dev_getbyhwaddr() was called under RTNL.

Rename it to dev_getbyhwaddr_rcu() and change all its callers to now use
RCU locking instead of RTNL.

Change arp_ioctl() to use RCU instead of RTNL locking.

Note: this fixes a dev refcount bug in llc
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

941666c2

07 Dec, 2010 2 commits

net: arp: use assignment · ae9c416d

Committed by Changli Gao on Dec 01, 2010

arp_filter() is consulted only when dont_send is 0, so we can simply
assign the return value of arp_filter() to dont_send instead.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
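
The equivalence this commit relies on can be checked with a small userspace sketch. filter_stub(), send_decision_or() and send_decision_assign() are hypothetical stand-ins, not the kernel's ARP code; the sketch only shows that OR-ing a result into a flag known to be 0 gives the same value as assigning it.

```c
/* Hypothetical stand-in for arp_filter(): nonzero means "do not reply". */
static int filter_stub(int verdict)
{
	return verdict;
}

/* Old pattern: OR the filter result into a flag already known to be 0. */
int send_decision_or(int verdict)
{
	int dont_send = 0;

	if (!dont_send)
		dont_send |= filter_stub(verdict);
	return dont_send;
}

/* New pattern: plain assignment, equivalent because dont_send was 0. */
int send_decision_assign(int verdict)
{
	int dont_send = 0;

	if (!dont_send)
		dont_send = filter_stub(verdict);
	return dont_send;
}
```

Both helpers return the same value for any verdict, which is why the simpler assignment is safe in this position.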

ae9c416d

net: kill an RCU warning in inet_fill_link_af() · f7fce74e

Committed by Eric Dumazet on Dec 01, 2010

commits 9f0f7272 (ipv4: AF_INET link address family) and cf7afbfe
(rtnl: make link af-specific updates atomic) used incorrect
__in_dev_get_rcu() in RTNL protected contexts, triggering PROVE_RCU
warnings.

Switch to __in_dev_get_rtnl(), which is more appropriate, since we hold
RTNL.

Based on a report and initial patch from Amerigo Wang.
Reported-by: Amerigo Wang <amwang@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Reviewed-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7fce74e

03 Dec, 2010 1 commit

tcp: use TCP_BASE_MSS to set basic mss value · 97b1ce25

Committed by Shan Wei on Dec 01, 2010

TCP_BASE_MSS is defined, but not used.
Commit 5d424d5a introduced this macro, so use
it to initialize sysctl_tcp_base_mss.

commit 5d424d5a
Author: John Heffner <jheffner@psc.edu>
Date:   Mon Mar 20 17:53:41 2006 -0800

    [TCP]: MTU probing
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

97b1ce25

02 Dec, 2010 5 commits

timewait_sock: Create and use getpeer op. · ccb7c410

Committed by David S. Miller on Dec 01, 2010

The only thing AF-specific about remembering the timestamp
for a time-wait TCP socket is getting the peer.

Abstract that behind a new timewait_sock_ops vector.

Support for real IPV6 sockets is not filled in yet, but
curiously this makes timewait recycling start to work
for v4-mapped ipv6 sockets.
Signed-off-by: David S. Miller <davem@davemloft.net>

ccb7c410

inetpeer: Kill use of inet_peer_address_t typedef. · 8790ca17

Committed by David S. Miller on Dec 01, 2010

They are verboten these days.
Signed-off-by: David S. Miller <davem@davemloft.net>

8790ca17

ipip: add module alias for tunl0 tunnel device · 8afe7c8a

Committed by stephen hemminger on Nov 29, 2010

If ipip is built as a module the 'ip tunnel add' command would fail because
the ipip module was not being autoloaded. Adding an alias for
the tunl0 device name causes dev_load() to autoload it when needed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8afe7c8a

gre: add module alias for gre0 tunnel device · 4da6a738

Committed by stephen hemminger on Nov 29, 2010

If gre is built as a module the 'ip tunnel add' command would fail because
the ip_gre module was not being autoloaded. Adding an alias for
the gre0 device name causes dev_load() to autoload it when needed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4da6a738

gre: minor cleanups · 407d6fcb

Committed by stephen hemminger on Nov 29, 2010

Use strcpy() rather than sprintf() for the case where name is getting
generated.  Fix indentation.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

407d6fcb

01 Dec, 2010 7 commits

inet: Turn ->remember_stamp into ->get_peer in connection AF ops. · 3f419d2d

Committed by David S. Miller on Nov 29, 2010

Then we can make a completely generic tcp_remember_stamp()
that uses ->get_peer() as a helper, minimizing the AF specific
code and minimizing the eventual code duplication when we implement
the ipv6 side of TW recycling.
Signed-off-by: David S. Miller <davem@davemloft.net>

3f419d2d

ipv6: Add infrastructure to bind inet_peer objects to routes. · b3419363

Committed by David S. Miller on Nov 30, 2010

They are only allowed on cached ipv6 routes.
Signed-off-by: David S. Miller <davem@davemloft.net>

b3419363

inetpeer: Add v6 peers tree, abstract root properly. · 021e9299

Committed by David S. Miller on Nov 30, 2010

Add the ipv6 peer tree instance, and adapt remaining
direct references to 'v4_peers' as needed.
Signed-off-by: David S. Miller <davem@davemloft.net>

021e9299

inetpeer: Abstract address comparisons. · 02663045

Committed by David S. Miller on Nov 30, 2010

Now v4 and v6 addresses will both work properly.
Signed-off-by: David S. Miller <davem@davemloft.net>

02663045

inetpeer: Make inet_getpeer() take an inet_peer_address_t pointer. · b534ecf1

Committed by David S. Miller on Nov 30, 2010

And make an inet_getpeer_v4() helper, update callers.
Signed-off-by: David S. Miller <davem@davemloft.net>

b534ecf1

inetpeer: Introduce inet_peer_address_t. · 582a72da

Committed by David S. Miller on Nov 30, 2010

Currently only the v4 aspect is used, but this will change.
Signed-off-by: David S. Miller <davem@davemloft.net>

582a72da

inetpeer: Abstract out the tree root accesses. · 98158f5a

Committed by David S. Miller on Nov 30, 2010

Instead of directly accessing "peer", change the code to
operate using a "struct inet_peer_base *" pointer.

This will facilitate the addition of a separate tree for
ipv6 peer entries.
Signed-off-by: David S. Miller <davem@davemloft.net>

98158f5a

29 Nov, 2010 1 commit

net: add some KERN_CONT markers to continuation lines · a40c9f88

Committed by Uwe Kleine-König on Nov 23, 2010

Cc: netdev@vger.kernel.org
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

a40c9f88

28 Nov, 2010 1 commit

rtnl: make link af-specific updates atomic · cf7afbfe

Committed by Thomas Graf on Nov 22, 2010

As David pointed out correctly, updates to af-specific attributes
are currently not atomic. If multiple changes are requested and
one of them fails, previous updates may have been applied already,
leaving the link behind in an undefined state.

This patch splits the function parse_link_af() into two functions,
validate_link_af() and set_link_af(). validate_link_af() is placed
in validate_linkmsg() to check for errors as early as possible,
before any changes to the link have been made. set_link_af() is
called to commit the changes later.

This method is not fail proof. While it is currently sufficient
to make set_link_af() inerrable and thus 100% atomic, the
validation function will not be able to detect all error
scenarios in the future. There will likely always be errors
that depend on state which is, for example, not protected by
rtnl_mutex and thus may change between validation and setting.

Also, instead of silently ignoring unknown address families and
config blocks for address families which did not register a set
function, the errors EAFNOSUPPORT and EOPNOTSUPP, respectively, are
returned to avoid committing 4 out of 5 update requests without
notifying the user.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
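
The validate-then-commit split described above can be sketched in plain C. Everything here (struct change, struct link_cfg, the KEY_* constants and the three functions) is a hypothetical illustration of the pattern, not the rtnetlink code itself: all changes are checked first, and only a fully valid request is applied.

```c
#include <stddef.h>

enum { KEY_FORWARDING, KEY_MTU, KEY_MAX };

struct change { int key; int value; };
struct link_cfg { int values[KEY_MAX]; };

/* Phase 1: reject the whole request if any single change is invalid. */
int validate_changes(const struct change *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (c[i].key < 0 || c[i].key >= KEY_MAX || c[i].value < 0)
			return -1;
	return 0;
}

/* Phase 2: apply changes; cannot fail once validation has passed. */
void apply_changes(struct link_cfg *cfg, const struct change *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		cfg->values[c[i].key] = c[i].value;
}

/* Validate first, commit second, so a bad request changes nothing. */
int update_link(struct link_cfg *cfg, const struct change *c, size_t n)
{
	if (validate_changes(c, n) < 0)
		return -1;
	apply_changes(cfg, c, n);
	return 0;
}
```

As the commit message notes, this only stays atomic as long as the apply step cannot fail and the validated state cannot change in between.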

cf7afbfe

25 Nov, 2010 1 commit

xps: Improvements in TX queue selection · 3853b584

Committed by Tom Herbert on Nov 21, 2010

In dev_pick_tx, don't do work in calculating queue index or setting
the index in the sock unless the device has more than one queue.  This
allows the sock to be set only with a queue index of a multi-queue
device, which is desirable if devices are stacked like in a tunnel.

We also allow the mapping of a socket to queue to be changed.  To
maintain in-order packet transmission a flag (ooo_okay) has been
added to the sk_buff structure.  If a transport layer sets this flag
on a packet, the transmit queue can be changed for the socket.
Presumably, the transport would set this if there was no possibility
of creating OOO packets (for instance, there are no packets in flight
for the socket).  This patch includes the modification in TCP output
for setting this flag.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3853b584

23 Nov, 2010 1 commit

Net: ipv4: netfilter: Makefile: Remove deprecated kbuild goal definitions · 6b8ff8c5

Committed by Tracey Dent on Nov 21, 2010

Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.
Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b8ff8c5

19 Nov, 2010 2 commits

igmp: refine skb allocations · 57e1ab6e

Committed by Eric Dumazet on Nov 16, 2010

IGMP allocates MTU sized skbs. This may fail for large MTU (order-2
allocations), so add a fallback to try lower sizes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
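
A fallback of this kind might look roughly like the userspace sketch below. alloc_stub() is a hypothetical allocator that fails above a chosen limit, standing in for a large allocation failing under memory pressure; halving the request and the minimum size are assumptions made for illustration, not details taken from the patch.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator: requests above `limit` bytes are made to fail. */
static void *alloc_stub(size_t size, size_t limit)
{
	return size > limit ? NULL : malloc(size);
}

/* Try `want` bytes first, then halve the request until it succeeds or
 * drops below `min`. On success, *got holds the size actually allocated. */
void *alloc_with_fallback(size_t want, size_t min, size_t limit, size_t *got)
{
	size_t size = want;

	if (min == 0)
		min = 1;	/* guarantee the loop terminates */

	while (size >= min) {
		void *buf = alloc_stub(size, limit);

		if (buf) {
			*got = size;
			return buf;
		}
		size >>= 1;	/* assumed policy: retry with half the size */
	}
	*got = 0;
	return NULL;
}
```

The caller has to cope with a smaller buffer than it asked for, which is why the size actually obtained is reported back through got.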

57e1ab6e

bonding: IGMP handling cleanup · 866f3b25

Committed by Eric Dumazet on Nov 18, 2010

Instead of iterating in_dev->mc_list from bonding driver, it's better
to call a helper function provided by igmp.c
Details of implementation (locking) are private to igmp code.

ip_mc_rejoin_group(struct ip_mc_list *im) becomes
ip_mc_rejoin_groups(struct in_device *in_dev);
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

866f3b25

18 Nov, 2010 4 commits

net: ipv4: tcp_probe: cleanup snprintf() use · dda0b386

Committed by Vasiliy Kulikov on Nov 14, 2010

snprintf() returns the number of bytes that were copied if there is no overflow.
This code uses the return value as the number of copied bytes. Theoretically the format
string '%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n' may be expanded
up to 163 bytes. In reality tv.tv_sec is just a few bytes instead of 20, the 2 ports
are just 5 bytes each instead of 10, length is 5 bytes instead of 10. The rest
is untrusted input. Theoretically if tv_sec is big then copy_to_user() would
overflow tbuf.

tbuf was increased to fit in 163 bytes. snprintf() is used to follow return
value semantics.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
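
The pitfall is easy to reproduce in userspace: when output is truncated, C99 snprintf() returns the length the full string would have had, not the number of bytes stored. format_clamped() below is a hypothetical helper, not code from tcp_probe, showing one way to turn that return value into a count that is safe to copy. (The kernel also provides scnprintf(), which returns the number of characters actually written.)

```c
#include <stdio.h>
#include <stddef.h>

/* Format `text` into buf and return how many bytes (excluding the
 * terminating NUL) are really stored there. Hypothetical helper. */
size_t format_clamped(char *buf, size_t size, const char *text)
{
	int n = snprintf(buf, size, "%s", text);

	if (n < 0 || size == 0)
		return 0;
	/* snprintf() reports the untruncated length; clamp it. */
	if ((size_t)n >= size)
		return size - 1;
	return (size_t)n;
}
```

With an 8-byte buffer and a 10-character input, snprintf() itself returns 10 while only 7 characters plus the NUL are stored, so using the raw return value as a copy length would read past the buffer.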

dda0b386

net: use the macros defined for the members of flowi · 5811662b

Committed by Changli Gao on Nov 12, 2010

Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5811662b

ipv4: AF_INET link address family · 9f0f7272

Committed by Thomas Graf on Nov 16, 2010

Implements the AF_INET link address family exposing the per
device configuration settings via netlink using the attribute
IFLA_INET_CONF.

The format of IFLA_INET_CONF differs depending on the direction
the attribute is sent. The attribute sent by the kernel consists
of a u32 array, basically a 1:1 copy of in_device->cnf.data[].
The attribute expected by the kernel must consist of a sequence
of nested u32 attributes, each representing a change request,
e.g.
	[IFLA_INET_CONF] = {
		[IPV4_DEVCONF_FORWARDING] = 1,
		[IPV4_DEVCONF_NOXFRM] = 0,
	}

libnl userspace API documentation and example available from:
http://www.infradead.org/~tgr/libnl/doc-git/group__link__inet.html
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f0f7272

network: tcp_connect should return certain errors up the stack · ee586811

Committed by Eric Paris on Nov 16, 2010

The current tcp_connect code completely ignores errors from sending an skb.
This makes sense in many situations (like -ENOBUFS) but I want to be able to
immediately fail connections if they are denied by the SELinux netfilter hook.
Netfilter does not normally return ECONNREFUSED when it drops a packet so we
respect that error code as a final and fatal error that cannot be recovered.
Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ee586811

17 Nov, 2010 2 commits

xfrm: update flowi saddr in icmp_send if unset · 7d98ffd8

Committed by Ulrich Weber on Nov 05, 2010

Otherwise xfrm_lookup will fail to find the correct policy.
Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d98ffd8

udp: use atomic_inc_not_zero_hint · c31504dc

Committed by Eric Dumazet on Nov 15, 2010

A UDP socket's refcount is usually 2, unless an incoming frame is going to
be queued in the receive or backlog queue.

Using atomic_inc_not_zero_hint() reduces latency, because the
processor issues fewer memory transactions.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
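
The idea behind a hinted increment can be sketched with C11 atomics. This is a userspace approximation, not the kernel's implementation, and inc_not_zero_hint() is a hypothetical name: the caller supplies the value it expects the counter to hold, so the first compare-and-swap can be attempted without loading the counter first.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *v unless it is zero. `hint` is the value the caller expects
 * to find; a correct hint lets the first compare-and-swap succeed without
 * a preceding read of the counter. */
bool inc_not_zero_hint(atomic_int *v, int hint)
{
	int expected = hint;

	/* Without a useful hint, start from the current value. */
	if (expected == 0)
		expected = atomic_load(v);

	while (expected != 0) {
		int desired = expected + 1;

		/* On failure, `expected` is updated to the value observed. */
		if (atomic_compare_exchange_strong(v, &expected, desired))
			return true;
	}
	return false;
}
```

When the hint is wrong the loop simply retries with the observed value, so a bad hint costs an extra attempt but does not affect correctness.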

c31504dc

16 Nov, 2010 2 commits

xfrm: use gre key as flow upper protocol info · cc9ff19d

Committed by Timo Teräs on Nov 03, 2010

The GRE Key field is intended to be used for identifying an individual
traffic flow within a tunnel. It is useful to be able to match on it in
XFRM policy selectors, so that different GRE tunnels can have
different policies.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>

cc9ff19d

ipv4: Fix build with multicast disabled. · d9aa9380

Committed by David S. Miller on Nov 15, 2010

net/ipv4/igmp.c: In function 'ip_mc_inc_group':
net/ipv4/igmp.c:1228: error: implicit declaration of function 'for_each_pmc_rtnl'
net/ipv4/igmp.c:1228: error: expected ';' before '{' token
net/ipv4/igmp.c: In function 'ip_mc_unmap':
net/ipv4/igmp.c:1333: error: expected ';' before 'igmp_group_dropped'
 ...

Move for_each_pmc_rcu and for_each_pmc_rtnl macro definitions
outside of multicast ifdef protection.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

d9aa9380

13 Nov, 2010 2 commits

tcp: Don't change unlocked socket state in tcp_v4_err(). · 8f49c270

Committed by David S. Miller on Nov 12, 2010

Alexey Kuznetsov noticed a regression introduced by
commit f1ecd5d9
("Revert Backoff [v3]: Revert RTO on ICMP destination unreachable")

The RTO and timer modification code added to tcp_v4_err()
doesn't check sock_owned_by_user(), which if true means we
don't have exclusive access to the socket and therefore cannot
modify its critical state.

Just skip this new code block if sock_owned_by_user() is true
and eliminate the now superfluous sock_owned_by_user() code
block contained within.
Reported-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
CC: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

8f49c270

igmp: RCU conversion of in_dev->mc_list · 1d7138de

Committed by Eric Dumazet on Nov 12, 2010

in_dev->mc_list is protected by one rwlock (in_dev->mc_list_lock).

This can easily be converted to a RCU protection.

Writers hold RTNL, so mc_list_lock is removed, not replaced by a
spinlock.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cypher Wu <cypher.w@gmail.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d7138de

12 Nov, 2010 2 commits

ipv4: Make rt->fl.iif tests less obscure. · c7537967

Committed by David S. Miller on Nov 11, 2010

When we test rt->fl.iif against zero, we're seeing if it's
an output or an input route.

Make that explicit with some helper functions.
Signed-off-by: David S. Miller <davem@davemloft.net>

c7537967

net: get rid of rtable->idev · 72cdd1d9

Committed by Eric Dumazet on Nov 11, 2010

It seems the idev field in struct rtable has no special purpose other
than adding extra atomic ops.

We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).

The infiniband case is solved using dst.dev instead of idev->dev.

Removal of this field means routing without route cache is now using
shared data and percpu data, and the only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.

About 5% speedup on routing test.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

72cdd1d9

11 Nov, 2010 2 commits

tcp: Increase TCP_MAXSEG socket option minimum. · 7a1abd08

Committed by David S. Miller on Nov 10, 2010

As noted by Steve Chen, since commit
f5fff5dc ("tcp: advertise MSS
requested by user") we can end up with a situation where
tcp_select_initial_window() does a divide by a zero (or
even negative) mss value.

The problem is that sometimes we effectively subtract
TCPOLEN_TSTAMP_ALIGNED and/or TCPOLEN_MD5SIG_ALIGNED from the mss.

Fix this by increasing the minimum from 8 to 64.
Reported-by: Steve Chen <schen@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
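
The arithmetic behind the new minimum can be illustrated with a short sketch. The two option sizes are the values defined in the kernel's include/net/tcp.h; effective_mss() is a hypothetical helper that only models the subtraction the commit message describes, not the actual code path.

```c
/* Option sizes as defined in the kernel's include/net/tcp.h. */
#define TCPOLEN_TSTAMP_ALIGNED	12
#define TCPOLEN_MD5SIG_ALIGNED	20

/* Hypothetical helper: the MSS left after the aligned timestamp and/or
 * MD5 signature options are subtracted from a user-requested value. */
int effective_mss(int user_mss, int use_tstamp, int use_md5)
{
	int mss = user_mss;

	if (use_tstamp)
		mss -= TCPOLEN_TSTAMP_ALIGNED;
	if (use_md5)
		mss -= TCPOLEN_MD5SIG_ALIGNED;
	return mss;
}
```

With the old minimum of 8, timestamps alone leave 8 - 12 = -4, and a request of 12 leaves exactly 0. With the new minimum of 64, both options together still leave 64 - 32 = 32.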

7a1abd08

net: avoid limits overflow · 8d987e5c

Committed by Eric Dumazet on Nov 09, 2010

Robin Holt tried to boot a 16TB machine and found some limits were
reached: sysctl_tcp_mem[2], sysctl_udp_mem[2]

We can switch infrastructure to use "long" instead of "int", now
atomic_long_t primitives are available for free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Robin Holt <holt@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
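
A rough way to see why "int" runs out on such a machine: these limits are counted in pages, and 16 TB of memory is 2^32 pages if a 4 KiB page size is assumed (the page size is an assumption here, not stated in the commit). The helpers below are hypothetical and only demonstrate the magnitude involved.

```c
#include <stdint.h>

/* Number of pages in `bytes` of memory for a given page size. */
uint64_t pages_in(uint64_t bytes, uint64_t page_size)
{
	return bytes / page_size;
}

/* Returns 1 if the page count fits in a 32-bit signed int, else 0. */
int fits_in_int32(uint64_t pages)
{
	return pages <= (uint64_t)INT32_MAX;
}
```

A 1 TB machine gives 2^28 pages, which still fits in a 32-bit int, while 16 TB gives 2^32 pages, which does not. On 64-bit platforms a long holds the larger value without trouble.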

8d987e5c

10 Nov, 2010 2 commits

net/ipv4/tcp.c: Update WARN uses · 2af6fd8b

Committed by Joe Perches on Oct 30, 2010

Coalesce long formats.
Align arguments.
Remove KERN_<level>.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2af6fd8b

inet: fix ip_mc_drop_socket() · 18943d29

Committed by Eric Dumazet on Nov 08, 2010

Commit 8723e1b4 (inet: RCU changes in inetdev_by_index())
forgot one call site in ip_mc_drop_socket().

We should not decrease the idev refcount after the inetdev_by_index() call,
since the refcount is not increased anymore.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18943d29

05 Nov, 2010 2 commits

inet_diag: Make sure we actually run the same bytecode we audited. · 22e76c84

Committed by Nelson Elhage on Nov 03, 2010

We were using nlmsg_find_attr() to look up the bytecode by attribute when
auditing, but then just using the first attribute when actually running
bytecode. So, if we received a message with two attribute elements, where only
the second had type INET_DIAG_REQ_BYTECODE, we would validate and run different
bytecode strings.

Fix this by consistently using nlmsg_find_attr everywhere.
Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
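
The mismatch described above can be modelled with a toy attribute list. struct attr, the ATTR_* constants and find_attr() are hypothetical simplifications, not the netlink structures or nlmsg_find_attr() itself; the point is only that "first attribute" and "first attribute of the wanted type" can be different elements.

```c
#include <stddef.h>

enum { ATTR_OTHER = 1, ATTR_BYTECODE = 2 };

struct attr {
	int type;
	const char *payload;
};

/* Look up the first attribute of the wanted type, or NULL if absent. */
const struct attr *find_attr(const struct attr *attrs, size_t n, int type)
{
	for (size_t i = 0; i < n; i++)
		if (attrs[i].type == type)
			return &attrs[i];
	return NULL;
}
```

If a message carries { ATTR_OTHER, ATTR_BYTECODE } in that order, find_attr() returns the second element while attrs[0] is the first, so code that audits one and executes the other operates on different payloads. Using the same lookup in both places removes the gap.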

22e76c84

fib: fib_result_assign() should not change fib refcounts · 1f1b9c99

Committed by Eric Dumazet on Nov 04, 2010

After commit ebc0ffae (RCU conversion of fib_lookup()),
fib_result_assign() should not change fib refcounts anymore.

Thanks to Michael who did the bisection and bug report.
Reported-by: Michael Ellerman <michael@ellerman.id.au>
Tested-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f1b9c99
