Commits · cc6f02dd490dac4ad821d5077b934c9b37037cd0 · OpenHarmony / kernel_linux

14 Dec, 2010 1 commit

net: add limits to ip_default_ttl · 249fab77

Authored by Eric Dumazet on Dec 13, 2010

ip_default_ttl should be between 1 and 255
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

249fab77
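
The limit described in this commit can be illustrated with a small userspace sketch. This is not the kernel change itself: `set_default_ttl`, `TTL_MIN` and `TTL_MAX` are hypothetical names chosen for the example, and the only fact taken from the commit message is the valid range of 1 to 255.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical bounds mirroring the commit message: valid TTLs are 1..255. */
enum { TTL_MIN = 1, TTL_MAX = 255 };

/*
 * Store `requested` in *ttl only if it lies inside [TTL_MIN, TTL_MAX].
 * Out-of-range values leave *ttl untouched and yield -EINVAL.
 */
static int set_default_ttl(int *ttl, int requested)
{
	assert(ttl != NULL);
	if (requested < TTL_MIN || requested > TTL_MAX)
		return -EINVAL;
	*ttl = requested;
	return 0;
}
```

Rejecting the write, rather than silently clamping it, keeps the stored value predictable for whoever set it.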

13 Dec, 2010 2 commits

ipv4: Don't pre-seed hoplimit metric. · 323e126f

Authored by David S. Miller on Dec 12, 2010

Always go through a new ip4_dst_hoplimit() helper, just like ipv6.

This allowed several simplifications:

1) The interim dst_metric_hoplimit() can go as it's no longer
   used.

2) The sysctl_ip_default_ttl entry no longer needs to use
   ipv4_doint_and_flush, since the sysctl is not cached in
   routing cache metrics any longer.

3) ipv4_doint_and_flush no longer needs to be exported and
   therefore can be marked static.

When ipv4_doint_and_flush_strategy was removed some time ago,
the external declaration in ip.h was mistakenly left around
so kill that off too.

We have to move the sysctl_ip_default_ttl declaration into
ipv4's route cache definition header net/route.h, because
currently net/ip.h (where the declaration lives now) has
a back dependency on net/route.h.
Signed-off-by: David S. Miller <davem@davemloft.net>

323e126f

net: Abstract RTAX_HOPLIMIT metric accesses behind helper. · 5170ae82

Authored by David S. Miller on Dec 12, 2010

Signed-off-by: David S. Miller <davem@davemloft.net>

5170ae82

11 Dec, 2010 1 commit

xfrm: Traffic Flow Confidentiality for IPv4 ESP · d979e20f

Authored by Martin Willi on Dec 08, 2010

Add TFC padding to all packets smaller than the boundary configured
on the xfrm state. If the boundary is larger than the PMTU, limit
padding to the PMTU.
Signed-off-by: Martin Willi <martin@strongswan.org>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

d979e20f

10 Dec, 2010 2 commits

net: optimize INET input path further · 68835aba

Authored by Eric Dumazet on Nov 30, 2010

Followup of commit b178bb3d (net: reorder struct sock fields)

Optimize INET input path a bit further, by:

1) moving sk_refcnt close to sk_lock.

This reduces number of dirtied cache lines by one on 64bit arches (and
64 bytes cache line size).

2) moving inet_daddr & inet_rcv_saddr at the beginning of sk

(same cache line as hash / family / bound_dev_if / nulls_node)

This reduces the number of accessed cache lines in lookups by one, and
does not increase the size of inet and timewait socks.
inet and tw sockets now share the same placeholder for these fields.

Before patch :

offsetof(struct sock, sk_refcnt) = 0x10
offsetof(struct sock, sk_lock) = 0x40
offsetof(struct sock, sk_receive_queue) = 0x60
offsetof(struct inet_sock, inet_daddr) = 0x270
offsetof(struct inet_sock, inet_rcv_saddr) = 0x274

After patch :

offsetof(struct sock, sk_refcnt) = 0x44
offsetof(struct sock, sk_lock) = 0x48
offsetof(struct sock, sk_receive_queue) = 0x68
offsetof(struct inet_sock, inet_daddr) = 0x0
offsetof(struct inet_sock, inet_rcv_saddr) = 0x4

compute_score() (udp or tcp) now uses a single cache line per ignored
item, instead of two.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

68835aba

net: Abstract away all dst_entry metrics accesses. · defb3519

Authored by David S. Miller on Dec 08, 2010

Use helper functions to hide all direct accesses, especially writes,
to dst_entry metrics values.

This will allow us to:

1) More easily change how the metrics are stored.

2) Implement COW for metrics.

In particular this will help us put metrics into the inetpeer
cache if that is what we end up doing.  We can make the _metrics
member a pointer instead of an array, initially have it point
at the read-only metrics in the FIB, and then on the first set
grab an inetpeer entry and point the _metrics member there.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

defb3519
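
The copy-on-write idea mentioned in this commit message can be sketched in userspace C. This is only an illustration under stated assumptions: `struct entry`, `metric_get`, `metric_set` and `NMETRICS` are hypothetical names, and a private array inside the entry stands in for wherever the writable copy would eventually live (the message suggests an inetpeer entry, but leaves that open).

```c
#include <assert.h>
#include <string.h>

enum { NMETRICS = 4 };	/* hypothetical number of metrics */

struct entry {
	const unsigned *metrics;	/* points at shared read-only data or at own[] */
	unsigned own[NMETRICS];		/* private copy, filled on first write */
};

/* Readers never need to know which copy is in use. */
static unsigned metric_get(const struct entry *e, int i)
{
	assert(i >= 0 && i < NMETRICS);
	return e->metrics[i];
}

/* The first write copies the shared values, then redirects the pointer. */
static void metric_set(struct entry *e, int i, unsigned value)
{
	assert(i >= 0 && i < NMETRICS);
	if (e->metrics != e->own) {
		memcpy(e->own, e->metrics, sizeof(e->own));
		e->metrics = e->own;
	}
	e->own[i] = value;
}
```

Because every access goes through the two helpers, the storage behind `metrics` can change without touching any caller, which is the point of hiding direct accesses first.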

09 Dec, 2010 5 commits

tcp: protect sysctl_tcp_cookie_size reads · f1987257

Authored by Eric Dumazet on Dec 07, 2010

Make sure sysctl_tcp_cookie_size is read once in
tcp_cookie_size_check(), or we might return an illegal value to caller
if sysctl_tcp_cookie_size is changed by another cpu.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: William Allen Simpson <william.allen.simpson@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1987257

tcp: avoid a possible divide by zero · ad9f4f50

Authored by Eric Dumazet on Dec 07, 2010

sysctl_tcp_tso_win_divisor might be set to zero while one cpu runs in
tcp_tso_should_defer(). Make sure we don't allow a divide by zero by
reading sysctl_tcp_tso_win_divisor exactly once.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad9f4f50
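
The read-once pattern behind this fix can be sketched in userspace C. The names `win_divisor` and `window_share` are hypothetical, and the `volatile` qualifier is a simplification standing in for whatever single-read primitive the kernel code uses. The point is that the zero check and the division operate on the same local snapshot.

```c
#include <assert.h>

/* Hypothetical tunable that another thread or CPU may rewrite at any time. */
static volatile int win_divisor = 3;

/*
 * Return window / divisor, or the whole window if the divisor is zero.
 * The tunable is read exactly once, so it cannot become zero between
 * the check and the division.
 */
static int window_share(int window)
{
	int divisor = win_divisor;	/* single read into a local */

	assert(window >= 0);
	if (divisor == 0)
		return window;
	return window / divisor;
}
```

Checking `win_divisor` and then dividing by `win_divisor` again would be two separate reads, and the second one could observe a zero written in between.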

tcp: Replace time wait bucket msg by counter · 67631510

Authored by Tom Herbert on Dec 08, 2010

Rather than printing the message to the log, use a mib counter to keep
track of the count of occurrences of time wait bucket overflow.  Reduces
spam in logs.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

67631510

net: RCU conversion of dev_getbyhwaddr() and arp_ioctl() · 941666c2

Authored by Eric Dumazet on Dec 05, 2010

On Sunday, 05 December 2010 at 09:19 +0100, Eric Dumazet wrote:

> Hmm..
>
> If somebody can explain why RTNL is held in arp_ioctl() (and therefore
> in arp_req_delete()), we might first remove RTNL use in arp_ioctl() so
> that your patch can be applied.
>
> Right now it is not good, because RTNL won't necessarily be held when you
> are going to call arp_invalidate()?

While doing this analysis, I found a refcount bug in llc, I'll send a
patch for net-2.6

Meanwhile, here is the patch for net-next-2.6

Your patch then can be applied after mine.

Thanks

[PATCH] net: RCU conversion of dev_getbyhwaddr() and arp_ioctl()

dev_getbyhwaddr() was called under RTNL.

Rename it to dev_getbyhwaddr_rcu() and change all its callers to now use
RCU locking instead of RTNL.

Change arp_ioctl() to use RCU instead of RTNL locking.

Note: this fixes a dev refcount bug in llc
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

941666c2

tcp: Bug fix in initialization of receive window. · b1afde60

Authored by Nandita Dukkipati on Dec 03, 2010

The bug has to do with boundary checks on the initial receive window.
If the initial receive window falls between init_cwnd and the
receive window specified by the user, the initial window is incorrectly
brought down to init_cwnd. The correct behavior is to allow it to
remain unchanged.
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1afde60
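
One reading of the boundary check described in this commit can be sketched as follows. This is a hedged userspace illustration, not the kernel code: `clamp_initial_window` and its parameters are hypothetical, all values are plain byte counts, and `default_wnd` stands for the window derived from init_cwnd.

```c
#include <assert.h>

/*
 * Clamp the initial receive window.
 * If the user specified a window (user_wnd != 0), that is the only upper
 * bound; otherwise the default bound applies. A window that lies between
 * default_wnd and user_wnd is therefore left unchanged.
 */
static unsigned clamp_initial_window(unsigned rcv_wnd, unsigned user_wnd,
				     unsigned default_wnd)
{
	unsigned limit;

	assert(default_wnd > 0);
	limit = user_wnd ? user_wnd : default_wnd;
	return rcv_wnd < limit ? rcv_wnd : limit;
}
```

With `rcv_wnd = 5000`, `user_wnd = 8000` and `default_wnd = 4000`, the result stays at 5000. The behaviour the commit message calls incorrect would have reduced it to 4000.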

07 Dec, 2010 2 commits

net: arp: use assignment · ae9c416d

Authored by Changli Gao on Dec 01, 2010

arp_filter() is consulted only when dont_send is 0, so we can simply
assign the return value of arp_filter() to dont_send instead.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ae9c416d

net: kill an RCU warning in inet_fill_link_af() · f7fce74e

Authored by Eric Dumazet on Dec 01, 2010

commits 9f0f7272 (ipv4: AF_INET link address family) and cf7afbfe
(rtnl: make link af-specific updates atomic) used incorrect
__in_dev_get_rcu() in RTNL protected contexts, triggering PROVE_RCU
warnings.

Switch to __in_dev_get_rtnl(), which is more appropriate, since we hold
RTNL.

Based on a report and initial patch from Amerigo Wang.
Reported-by: Amerigo Wang <amwang@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Reviewed-by: WANG Cong <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f7fce74e

03 Dec, 2010 1 commit

tcp: use TCP_BASE_MSS to set basic mss value · 97b1ce25

Authored by Shan Wei on Dec 01, 2010

TCP_BASE_MSS is defined, but not used.
Commit 5d424d5a introduced this macro, so use
it to initialize sysctl_tcp_base_mss.

commit 5d424d5a
Author: John Heffner <jheffner@psc.edu>
Date:   Mon Mar 20 17:53:41 2006 -0800

    [TCP]: MTU probing
Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

97b1ce25

02 Dec, 2010 5 commits

timewait_sock: Create and use getpeer op. · ccb7c410

Authored by David S. Miller on Dec 01, 2010

The only thing AF-specific about remembering the timestamp
for a time-wait TCP socket is getting the peer.

Abstract that behind a new timewait_sock_ops vector.

Support for real IPV6 sockets is not filled in yet, but
curiously this makes timewait recycling start to work
for v4-mapped ipv6 sockets.
Signed-off-by: David S. Miller <davem@davemloft.net>

ccb7c410

inetpeer: Kill use of inet_peer_address_t typedef. · 8790ca17

Authored by David S. Miller on Dec 01, 2010

They are verboten these days.
Signed-off-by: David S. Miller <davem@davemloft.net>

8790ca17

ipip: add module alias for tunl0 tunnel device · 8afe7c8a

Authored by stephen hemminger on Nov 29, 2010

If ipip is built as a module the 'ip tunnel add' command would fail because
the ipip module was not being autoloaded. Adding an alias for
the tunl0 device name causes dev_load() to autoload it when needed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8afe7c8a

gre: add module alias for gre0 tunnel device · 4da6a738

Authored by stephen hemminger on Nov 29, 2010

If gre is built as a module the 'ip tunnel add' command would fail because
the ip_gre module was not being autoloaded. Adding an alias for
the gre0 device name causes dev_load() to autoload it when needed.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4da6a738

gre: minor cleanups · 407d6fcb

Authored by stephen hemminger on Nov 29, 2010

Use strcpy() rather than sprintf() for the case where name is getting
generated.  Fix indentation.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

407d6fcb

01 Dec, 2010 7 commits

inet: Turn ->remember_stamp into ->get_peer in connection AF ops. · 3f419d2d

Authored by David S. Miller on Nov 29, 2010

Then we can make a completely generic tcp_remember_stamp()
that uses ->get_peer() as a helper, minimizing the AF specific
code and minimizing the eventual code duplication when we implement
the ipv6 side of TW recycling.
Signed-off-by: David S. Miller <davem@davemloft.net>

3f419d2d

ipv6: Add infrastructure to bind inet_peer objects to routes. · b3419363

Authored by David S. Miller on Nov 30, 2010

They are only allowed on cached ipv6 routes.
Signed-off-by: David S. Miller <davem@davemloft.net>

b3419363

inetpeer: Add v6 peers tree, abstract root properly. · 021e9299

Authored by David S. Miller on Nov 30, 2010

Add the ipv6 peer tree instance, and adapt remaining
direct references to 'v4_peers' as needed.
Signed-off-by: David S. Miller <davem@davemloft.net>

021e9299

inetpeer: Abstract address comparisons. · 02663045

Authored by David S. Miller on Nov 30, 2010

Now v4 and v6 addresses will both work properly.
Signed-off-by: David S. Miller <davem@davemloft.net>

02663045

inetpeer: Make inet_getpeer() take an inet_peer_adress_t pointer. · b534ecf1

Authored by David S. Miller on Nov 30, 2010

And make an inet_getpeer_v4() helper, update callers.
Signed-off-by: David S. Miller <davem@davemloft.net>

b534ecf1

inetpeer: Introduce inet_peer_address_t. · 582a72da

Authored by David S. Miller on Nov 30, 2010

Currently only the v4 aspect is used, but this will change.
Signed-off-by: David S. Miller <davem@davemloft.net>

582a72da

inetpeer: Abstract out the tree root accesses. · 98158f5a

Authored by David S. Miller on Nov 30, 2010

Instead of directly accessing "peer", change the code to
operate using a "struct inet_peer_base *" pointer.

This will facilitate the addition of a separate tree for
ipv6 peer entries.
Signed-off-by: David S. Miller <davem@davemloft.net>

98158f5a

29 Nov, 2010 3 commits

inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners · b4ff3c90

Authored by Nagendra Tomar on Nov 26, 2010

inet sockets corresponding to passive connections are added to the bind hash
using __inet_inherit_port(). These sockets are later removed from the bind
hash using __inet_put_port(). These two functions are not exactly symmetrical.
__inet_put_port() decrements hashinfo->bsockets and tb->num_owners, whereas
__inet_inherit_port() does not increment them. This results in both of these
going to negative values.

This patch fixes this by calling inet_bind_hash() from __inet_inherit_port(),
which does the right thing.

'bsockets' and 'num_owners' were introduced by commit a9d8f911
(inet: Allowing more than 64k connections and heavily optimize bind(0))
Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4ff3c90

net: add some KERN_CONT markers to continuation lines · a40c9f88

Authored by Uwe Kleine-König on Nov 23, 2010

Cc: netdev@vger.kernel.org
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

a40c9f88

tcp: restrict net.ipv4.tcp_adv_win_scale (#20312) · 0147fc05

Authored by Alexey Dobriyan on Nov 22, 2010

tcp_win_from_space() does the following:

      if (sysctl_tcp_adv_win_scale <= 0)
              return space >> (-sysctl_tcp_adv_win_scale);
      else
              return space - (space >> sysctl_tcp_adv_win_scale);

"space" is int.

As per C99 6.5.7 (3) shifting int for 32 or more bits is
undefined behaviour.

Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
space >> 32 equals space and function returns 0.

Which means we busyloop in tcp_fixup_rcvbuf().

Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].

Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312

Steps to reproduce:

      echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
      wget www.kernel.org
      [softlockup]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0147fc05
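
The function quoted in this commit message can be reproduced as a standalone userspace sketch, with the scale passed as a parameter instead of read from the sysctl. The names `win_from_space` and `scale_is_valid` are hypothetical, and the assertion stands in for the range restriction that the commit applies at the sysctl level.

```c
#include <assert.h>

/* The range the commit restricts net.ipv4.tcp_adv_win_scale to. */
static int scale_is_valid(int scale)
{
	return scale >= -31 && scale <= 31;
}

/*
 * Same arithmetic as the snippet in the commit message.
 * Shifting a 32-bit int by 32 or more bits is undefined behaviour in C,
 * so the scale must be validated before it ever reaches the shifts.
 */
static int win_from_space(int space, int scale)
{
	assert(space >= 0);
	assert(scale_is_valid(scale));
	if (scale <= 0)
		return space >> (-scale);
	return space - (space >> scale);
}
```

For example, a scale of 2 reserves a quarter of the space, so 4096 bytes of space yield a window of 3072 bytes.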

28 Nov, 2010 2 commits

netns: Don't leak others' openreq-s in proc · 8475ef9f

Authored by Pavel Emelyanov on Nov 22, 2010

The /proc/net/tcp leaks openreq sockets from other namespaces.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8475ef9f

rtnl: make link af-specific updates atomic · cf7afbfe

Authored by Thomas Graf on Nov 22, 2010

As David pointed out correctly, updates to af-specific attributes
are currently not atomic. If multiple changes are requested and
one of them fails, previous updates may have been applied already,
leaving the link behind in an undefined state.

This patch splits the function parse_link_af() into two functions,
validate_link_af() and set_link_af(). validate_link_af() is placed
in validate_linkmsg() to check for errors as early as possible, before
any changes to the link have been made. set_link_af() is called to
commit the changes later.

This method is not fail-proof. While it is currently sufficient
to make set_link_af() inerrable and thus 100% atomic, the
validation function will not be able to detect all error
scenarios in the future; there will likely always be errors
depending on state which is, for example, not protected by rtnl_mutex
and thus may change between validation and setting.

Also, instead of silently ignoring unknown address families and
config blocks for address families which did not register a set
function, the errors EAFNOSUPPORT and EOPNOTSUPP, respectively, are
returned to avoid committing 4 out of 5 update requests without
notifying the user.
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

cf7afbfe
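
The validate-then-commit split described in this commit can be sketched in userspace C. Everything here is hypothetical scaffolding for illustration (`struct change`, `NCONF`, `validate_changes`, `commit_changes`, `apply_changes`); only the two-phase structure comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { NCONF = 4 };	/* hypothetical number of settings */

struct change {
	int index;	/* which setting to update */
	int value;	/* new value */
};

/* Phase 1: reject the whole request if any single change is invalid. */
static int validate_changes(const struct change *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (c[i].index < 0 || c[i].index >= NCONF)
			return -EINVAL;
	return 0;
}

/* Phase 2: cannot fail, because every change was checked beforehand. */
static void commit_changes(int *conf, const struct change *c, size_t n)
{
	for (size_t i = 0; i < n; i++)
		conf[c[i].index] = c[i].value;
}

/* Either all changes are applied or none are. */
static int apply_changes(int *conf, const struct change *c, size_t n)
{
	int err;

	assert(conf != NULL && c != NULL);
	err = validate_changes(c, n);
	if (err)
		return err;
	commit_changes(conf, c, n);
	return 0;
}
```

As the message itself notes, this is only atomic while the commit phase cannot fail and nothing changes between the two phases.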

25 Nov, 2010 2 commits

tcp: Make TCP_MAXSEG minimum more correct. · c39508d6

Authored by David S. Miller on Nov 24, 2010

Use TCP_MIN_MSS instead of constant 64.
Reported-by: Min Zhang <mzhang@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c39508d6

xps: Improvements in TX queue selection · 3853b584

Authored by Tom Herbert on Nov 21, 2010

In dev_pick_tx, don't do work in calculating queue index or setting
the index in the sock unless the device has more than one queue.  This
allows the sock to be set only with a queue index of a multi-queue
device, which is desirable if devices are stacked like in a tunnel.

We also allow the mapping of a socket to queue to be changed.  To
maintain in-order packet transmission, a flag (ooo_okay) has been
added to the sk_buff structure.  If a transport layer sets this flag
on a packet, the transmit queue can be changed for the socket.
Presumably, the transport would set this if there was no possibility
of creating OOO packets (for instance, there are no packets in flight
for the socket).  This patch includes the modification in TCP output
for setting this flag.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3853b584

23 Nov, 2010 1 commit

Net: ipv4: netfilter: Makefile: Remove deprecated kbuild goal definitions · 6b8ff8c5

Authored by Tracey Dent on Nov 21, 2010

Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.
Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6b8ff8c5

22 Nov, 2010 1 commit

net: allow GFP_HIGHMEM in __vmalloc() · 7a1c8e5a

Authored by Eric Dumazet on Nov 20, 2010

We forgot to use __GFP_HIGHMEM in several __vmalloc() calls.

In ceph, add the missing flag.

In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is
cleaner and allows using HIGHMEM pages as well.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a1c8e5a

19 Nov, 2010 2 commits

igmp: refine skb allocations · 57e1ab6e

Authored by Eric Dumazet on Nov 16, 2010

IGMP allocates MTU sized skbs. This may fail for large MTU (order-2
allocations), so add a fallback to try lower sizes.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

57e1ab6e
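
The fallback strategy described in this commit can be sketched in userspace C. This is an illustration under assumptions, not the kernel change: `alloc_with_fallback`, `alloc_fn`, `limited_alloc` and `fake_limit` are hypothetical names, the halving step and the minimum size are choices made for the example, and `limited_alloc` is a test double that simulates failure of large allocations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator signature; kernel code would allocate an skb. */
typedef void *(*alloc_fn)(size_t size);

/* Test double: pretends every request above fake_limit bytes fails. */
static size_t fake_limit = 1024;

static void *limited_alloc(size_t size)
{
	return size <= fake_limit ? malloc(size) : NULL;
}

/*
 * Try to allocate `want` bytes. On failure, halve the request and retry,
 * never going below `min_size`. On success, *got holds the size obtained.
 */
static void *alloc_with_fallback(alloc_fn alloc, size_t want,
				 size_t min_size, size_t *got)
{
	assert(alloc != NULL && got != NULL && min_size > 0);
	for (;;) {
		void *buf = alloc(want);

		if (buf != NULL) {
			*got = want;
			return buf;
		}
		if (want <= min_size)
			return NULL;	/* even the smallest size failed */
		want >>= 1;
		if (want < min_size)
			want = min_size;
	}
}
```

The caller must use the size reported in `*got`, not the size it asked for, when deciding how much data fits in the buffer.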

bonding: IGMP handling cleanup · 866f3b25

Authored by Eric Dumazet on Nov 18, 2010

Instead of iterating in_dev->mc_list from the bonding driver, it's better
to call a helper function provided by igmp.c.
Details of implementation (locking) are private to igmp code.

ip_mc_rejoin_group(struct ip_mc_list *im) becomes
ip_mc_rejoin_groups(struct in_device *in_dev);
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

866f3b25

18 Nov, 2010 3 commits

net: ipv4: tcp_probe: cleanup snprintf() use · dda0b386

Authored by Vasiliy Kulikov on Nov 14, 2010

snprintf() returns number of bytes that were copied if there is no overflow.
This code uses return value as number of copied bytes. Theoretically format
string '%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n' may be expanded
up to 163 bytes. In reality tv.tv_sec is just few bytes instead of 20, 2 ports
are just 5 bytes each instead of 10, length is 5 bytes instead of 10. The rest
is an untrusted input. Theoretically if tv_sec is big then copy_to_user() would
overflow tbuf.

tbuf was increased to fit in 163 bytes. snprintf() is used to follow return
value semantic.
Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dda0b386
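
The return-value pitfall this commit addresses can be shown with standard C `snprintf()`, which on truncation returns the length the output would have had, not the number of bytes stored. The helper `bounded_len` below is hypothetical and written for this example; it converts that return value into the number of characters actually present in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Given the return value of snprintf() and the buffer size it was called
 * with, return how many characters (excluding the terminating NUL) are
 * really in the buffer.
 */
static int bounded_len(int ret, size_t size)
{
	assert(size > 0);
	if (ret < 0)
		return 0;			/* encoding error: nothing usable */
	if ((size_t)ret >= size)
		return (int)(size - 1);		/* output was truncated */
	return ret;
}
```

Using the raw return value as a copy length after a truncated write would read past the end of the buffer, which is why the length has to be bounded before it is passed on.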

net: use the macros defined for the members of flowi · 5811662b

Authored by Changli Gao on Nov 12, 2010

Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5811662b

ipv4: AF_INET link address family · 9f0f7272

Authored by Thomas Graf on Nov 16, 2010

Implements the AF_INET link address family exposing the per
device configuration settings via netlink using the attribute
IFLA_INET_CONF.

The format of IFLA_INET_CONF differs depending on the direction
the attribute is sent. The attribute sent by the kernel consists
of a u32 array, basically a 1:1 copy of in_device->cnf.data[].
The attribute expected by the kernel must consist of a sequence
of nested u32 attributes, each representing a change request,
e.g.
	[IFLA_INET_CONF] = {
		[IPV4_DEVCONF_FORWARDING] = 1,
		[IPV4_DEVCONF_NOXFRM] = 0,
	}

libnl userspace API documentation and example available from:
http://www.infradead.org/~tgr/libnl/doc-git/group__link__inet.html
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

9f0f7272

OpenHarmony / kernel_linux · Last synced more than 3 years ago
