Commits · fc75fc8339e7727167443469027540b283daac71 · openeuler / Kernel

26 Dec, 2010 · 1 commit

ipv4: dont create routes on down devices · fc75fc83

Committed by Eric Dumazet on Dec 22, 2010

In ip_route_output_slow(), instead of allowing a route to be created on
a not UPed device, report -ENETUNREACH immediately.

# ip tunnel add mode ipip remote 10.16.0.164 local 10.16.0.72 dev eth0
# (Note : tunl1 is down)
# ping -I tunl1 10.1.2.3
PING 10.1.2.3 (10.1.2.3) from 192.168.18.5 tunl1: 56(84) bytes of data.
(nothing)
# ./a.out tunl1
# ip tunnel del tunl1
Message from syslogd@shelby at Dec 22 10:12:08 ...
  kernel: unregister_netdevice: waiting for tunl1 to become free. Usage count = 3

After patch:
# ping -I tunl1 10.1.2.3
connect: Network is unreachable
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc75fc83

24 Dec, 2010 · 2 commits

Revert "ipv4: Allow configuring subnets as local addresses" · e0584649

Committed by David S. Miller on Dec 23, 2010

This reverts commit 4465b469.

Conflicts:

	net/ipv4/fib_frontend.c

As reported by Ben Greear, this causes regressions:

> Change 4465b469 caused rules
> to stop matching the input device properly because the
> FLOWI_FLAG_MATCH_ANY_IIF is always defined in ip_dev_find().
>
> This breaks rules such as:
>
> ip rule add pref 512 lookup local
> ip rule del pref 0 lookup local
> ip link set eth2 up
> ip -4 addr add 172.16.0.102/24 broadcast 172.16.0.255 dev eth2
> ip rule add to 172.16.0.102 iif eth2 lookup local pref 10
> ip rule add iif eth2 lookup 10001 pref 20
> ip route add 172.16.0.0/24 dev eth2 table 10001
> ip route add unreachable 0/0 table 10001
>
> If you had a second interface 'eth0' that was on a different
> subnet, pinging a system on that interface would fail:
>
>   [root@ct503-60 ~]# ping 192.168.100.1
>   connect: Invalid argument
Reported-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0584649

tcp: fix listening_get_next() · 1bde5ac4

Committed by Eric Dumazet on Dec 23, 2010

Alexey Vlasov found /proc/net/tcp could sometimes loop and display
millions of sockets in LISTEN state.

In 2.6.29, when we converted TCP hash tables to RCU, we left two
sk_next() calls in listening_get_next().

We must instead use sk_nulls_next() to properly detect an end of chain.
Reported-by: Alexey Vlasov <renton@renton.name>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1bde5ac4
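
To illustrate why the end-of-chain test matters on a "nulls" list, here is a minimal userspace sketch in C. It assumes the usual nulls-list convention in which the terminator is not NULL but an odd value (low bit set) that encodes the bucket number. The names `struct node`, `make_nulls()`, `is_a_nulls()` and `count_chain()` are hypothetical and exist only for this example; this is not the kernel code.

```c
#include <stdint.h>

/* Simplified, hypothetical stand-in for a socket linked on a "nulls" list. */
struct node {
    struct node *next;
    int id;
};

/* Encode an end-of-chain marker: an odd value that carries the bucket number. */
static struct node *make_nulls(uintptr_t bucket)
{
    return (struct node *)((bucket << 1) | 1);
}

/* Real nodes are at least 2-byte aligned, so a set low bit can only be a marker. */
static int is_a_nulls(const struct node *p)
{
    return ((uintptr_t)p & 1) != 0;
}

/* Count entries, stopping at the nulls marker rather than at NULL. */
static int count_chain(const struct node *head)
{
    int n = 0;
    const struct node *p;

    for (p = head; !is_a_nulls(p); p = p->next)
        n++;
    return n;
}
```

A walker that only compares against NULL never sees the end of such a chain, because the marker is a non-NULL value; it would go on to treat the marker as a node.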

17 Dec, 2010 · 1 commit

net: fix nulls list corruptions in sk_prot_alloc · fcbdf09d

Committed by Octavian Purdila on Dec 16, 2010

Special care is taken inside sk_prot_alloc to avoid overwriting
skc_node/skc_nulls_node. We should also avoid overwriting
skc_bind_node/skc_portaddr_node.

The patch fixes the following crash:

 BUG: unable to handle kernel paging request at fffffffffffffff0
 IP: [<ffffffff812ec6dd>] udp4_lib_lookup2+0xad/0x370
 [<ffffffff812ecc22>] __udp4_lib_lookup+0x282/0x360
 [<ffffffff812ed63e>] __udp4_lib_rcv+0x31e/0x700
 [<ffffffff812bba45>] ? ip_local_deliver_finish+0x65/0x190
 [<ffffffff812bbbf8>] ? ip_local_deliver+0x88/0xa0
 [<ffffffff812eda35>] udp_rcv+0x15/0x20
 [<ffffffff812bba45>] ip_local_deliver_finish+0x65/0x190
 [<ffffffff812bbbf8>] ip_local_deliver+0x88/0xa0
 [<ffffffff812bb2cd>] ip_rcv_finish+0x32d/0x6f0
 [<ffffffff8128c14c>] ? netif_receive_skb+0x99c/0x11c0
 [<ffffffff812bb94b>] ip_rcv+0x2bb/0x350
 [<ffffffff8128c14c>] netif_receive_skb+0x99c/0x11c0
Signed-off-by: Leonard Crestez <lcrestez@ixiacom.com>
Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fcbdf09d

09 Dec, 2010 · 4 commits

tcp: protect sysctl_tcp_cookie_size reads · f1987257

Committed by Eric Dumazet on Dec 07, 2010

Make sure sysctl_tcp_cookie_size is read once in
tcp_cookie_size_check(), or we might return an illegal value to caller
if sysctl_tcp_cookie_size is changed by another cpu.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: William Allen Simpson <william.allen.simpson@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f1987257

tcp: avoid a possible divide by zero · ad9f4f50

Committed by Eric Dumazet on Dec 07, 2010

sysctl_tcp_tso_win_divisor might be set to zero while one cpu runs in
tcp_tso_should_defer(). Make sure we don't allow a divide by zero by
reading sysctl_tcp_tso_win_divisor exactly once.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad9f4f50
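
The "read exactly once" idea can be sketched in userspace C as follows. This is only an illustration under assumptions: `tso_win_divisor` is a hypothetical stand-in for the sysctl, `tso_chunk()` and its `fallback` parameter are invented for the example, and the `volatile` read is merely similar in spirit to a single snapshot of a value that another CPU may change at any time.

```c
/* Hypothetical stand-in for the sysctl; another thread may change it at any time. */
static volatile int tso_win_divisor = 3;

/*
 * Compute a chunk size from the smaller of the two windows.
 * The divisor is read once into a local, so the value that is tested
 * is guaranteed to be the value that is used in the division.
 */
static unsigned int tso_chunk(unsigned int send_win, unsigned int cwnd_bytes,
                              unsigned int fallback)
{
    int divisor = tso_win_divisor;  /* single read of the shared value */
    unsigned int win = send_win < cwnd_bytes ? send_win : cwnd_bytes;

    if (divisor > 0)
        return win / (unsigned int)divisor;
    return fallback;                /* divisor disabled: skip the division */
}
```

Testing the shared variable and then dividing by it in a second read is the racy pattern: the value could become zero between the two reads.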

tcp: Replace time wait bucket msg by counter · 67631510

Committed by Tom Herbert on Dec 08, 2010

Rather than printing the message to the log, use a mib counter to keep
track of the count of occurrences of time wait bucket overflow. This
reduces spam in logs.
Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

67631510

tcp: Bug fix in initialization of receive window. · b1afde60

Committed by Nandita Dukkipati on Dec 03, 2010

The bug has to do with boundary checks on the initial receive window.
If the initial receive window falls between init_cwnd and the
receive window specified by the user, the initial window is incorrectly
brought down to init_cwnd. The correct behavior is to allow it to
remain unchanged.
Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b1afde60

29 Nov, 2010 · 2 commits

inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners · b4ff3c90

Committed by Nagendra Tomar on Nov 26, 2010

inet sockets corresponding to passive connections are added to the bind hash
using __inet_inherit_port(). These sockets are later removed from the bind
hash using __inet_put_port(). These two functions are not exactly symmetrical.
__inet_put_port() decrements hashinfo->bsockets and tb->num_owners, whereas
__inet_inherit_port() does not increment them. This results in both of these
going to negative values.

This patch fixes this by calling inet_bind_hash() from __inet_inherit_port(),
which does the right thing.

'bsockets' and 'num_owners' were introduced by commit a9d8f911
(inet: Allowing more than 64k connections and heavily optimize bind(0))
Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4ff3c90

tcp: restrict net.ipv4.tcp_adv_win_scale (#20312) · 0147fc05

Committed by Alexey Dobriyan on Nov 22, 2010

tcp_win_from_space() does the following:

      if (sysctl_tcp_adv_win_scale <= 0)
              return space >> (-sysctl_tcp_adv_win_scale);
      else
              return space - (space >> sysctl_tcp_adv_win_scale);

"space" is int.

As per C99 6.5.7 (3), shifting an int by 32 or more bits is
undefined behaviour.

Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
space >> 32 equals space and the function returns 0.

Which means we busyloop in tcp_fixup_rcvbuf().

Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].

Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312

Steps to reproduce:

      echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
      wget www.kernel.org
      [softlockup]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0147fc05
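
As an illustration of the range restriction described above, here is a minimal userspace sketch in C. It models the quoted tcp_win_from_space() arithmetic and clamps the scale to [-31, 31] at the point of use. That clamp placement, and the names `clamp_scale()` and `win_from_space()`, are assumptions made for this example; the commit message only says that the sysctl itself is restricted to that range.

```c
/* Keep the shift count inside the range that is defined for a 32-bit int. */
static int clamp_scale(int scale)
{
    if (scale < -31)
        return -31;
    if (scale > 31)
        return 31;
    return scale;
}

/*
 * Hypothetical model of the window computation quoted in the commit message.
 * `space` is assumed to be non-negative.
 */
static int win_from_space(int space, int scale)
{
    scale = clamp_scale(scale);
    if (scale <= 0)
        return space >> (-scale);
    return space - (space >> scale);
}
```

With the clamp in place, a requested scale of 32 is treated as 31, so the shift is well defined and the function returns a non-zero window for a typical buffer size.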

28 Nov, 2010 · 1 commit

netns: Don't leak others' openreq-s in proc · 8475ef9f

Committed by Pavel Emelyanov on Nov 22, 2010

The /proc/net/tcp file leaks openreq sockets from other namespaces.
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8475ef9f

25 Nov, 2010 · 1 commit

tcp: Make TCP_MAXSEG minimum more correct. · c39508d6

Committed by David S. Miller on Nov 24, 2010

Use TCP_MIN_MSS instead of constant 64.
Reported-by: Min Zhang <mzhang@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c39508d6

22 Nov, 2010 · 1 commit

net: allow GFP_HIGHMEM in __vmalloc() · 7a1c8e5a

Committed by Eric Dumazet on Nov 20, 2010

We forgot to use __GFP_HIGHMEM in several __vmalloc() calls.

In ceph, add the missing flag.

In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is
cleaner and allows using HIGHMEM pages as well.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a1c8e5a

17 Nov, 2010 · 1 commit

xfrm: update flowi saddr in icmp_send if unset · 7d98ffd8

Committed by Ulrich Weber on Nov 05, 2010

otherwise xfrm_lookup will fail to find correct policy
Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7d98ffd8

13 Nov, 2010 · 1 commit

tcp: Don't change unlocked socket state in tcp_v4_err(). · 8f49c270

Committed by David S. Miller on Nov 12, 2010

Alexey Kuznetsov noticed a regression introduced by
commit f1ecd5d9
("Revert Backoff [v3]: Revert RTO on ICMP destination unreachable")

The RTO and timer modification code added to tcp_v4_err()
doesn't check sock_owned_by_user(), which if true means we
don't have exclusive access to the socket and therefore cannot
modify its critical state.

Just skip this new code block if sock_owned_by_user() is true
and eliminate the now superfluous sock_owned_by_user() code
block contained within.
Reported-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
CC: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

8f49c270

11 Nov, 2010 · 2 commits

tcp: Increase TCP_MAXSEG socket option minimum. · 7a1abd08

Committed by David S. Miller on Nov 10, 2010

As noted by Steve Chen, since commit
f5fff5dc ("tcp: advertise MSS
requested by user") we can end up with a situation where
tcp_select_initial_window() does a divide by a zero (or
even negative) mss value.

The problem is that sometimes we effectively subtract
TCPOLEN_TSTAMP_ALIGNED and/or TCPOLEN_MD5SIG_ALIGNED from the mss.

Fix this by increasing the minimum from 8 to 64.
Reported-by: Steve Chen <schen@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7a1abd08
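
The arithmetic behind the new minimum can be sketched as below. This is a userspace illustration only: `effective_mss()` is a hypothetical helper, and the two option lengths (12 and 20 bytes) are the values assumed here for TCPOLEN_TSTAMP_ALIGNED and TCPOLEN_MD5SIG_ALIGNED.

```c
/* Option lengths assumed for this illustration. */
#define TCPOLEN_TSTAMP_ALIGNED 12
#define TCPOLEN_MD5SIG_ALIGNED 20

/*
 * Hypothetical helper: the MSS that remains after the optional
 * timestamp and MD5 signature options are subtracted.
 */
static int effective_mss(int user_mss, int use_tstamp, int use_md5)
{
    int mss = user_mss;

    if (use_tstamp)
        mss -= TCPOLEN_TSTAMP_ALIGNED;
    if (use_md5)
        mss -= TCPOLEN_MD5SIG_ALIGNED;
    return mss;
}
```

Under these assumptions a user-supplied MSS of 8 ends up negative once both options are subtracted, and 12 ends up exactly zero with timestamps alone, while the new minimum of 64 leaves a positive remainder that is safe to divide by.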

net: avoid limits overflow · 8d987e5c

Committed by Eric Dumazet on Nov 09, 2010

Robin Holt tried to boot a 16TB machine and found some limits were
reached: sysctl_tcp_mem[2], sysctl_udp_mem[2]

We can switch infrastructure to use "long" instead of "int", now that
atomic_long_t primitives are available for free.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Robin Holt <holt@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

8d987e5c
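
A rough piece of arithmetic shows why an int-sized counter can be too small on such a machine. The sketch below assumes a 4 KiB page size and simply counts pages; the helper names `pages_for()` and `fits_in_int()` are hypothetical, and the real limits are derived differently, so treat this as an order-of-magnitude illustration rather than the kernel's computation.

```c
#include <limits.h>
#include <stdint.h>

/* Number of whole pages in `bytes` of memory, for a given page size. */
static int64_t pages_for(int64_t bytes, int64_t page_size)
{
    return bytes / page_size;
}

/* True when the value can be stored in a plain int without overflow. */
static int fits_in_int(int64_t v)
{
    return v >= INT_MIN && v <= INT_MAX;
}
```

With 16 TiB of memory and 4 KiB pages there are 2^32 pages, which does not fit in a 32-bit int, whereas a 64-bit long holds it comfortably.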

10 Nov, 2010 · 1 commit

inet: fix ip_mc_drop_socket() · 18943d29

Committed by Eric Dumazet on Nov 08, 2010

commit 8723e1b4 (inet: RCU changes in inetdev_by_index())
forgot one call site in ip_mc_drop_socket()

We should not decrease idev refcount after inetdev_by_index() call,
since refcount is not increased anymore.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18943d29

05 Nov, 2010 · 2 commits

inet_diag: Make sure we actually run the same bytecode we audited. · 22e76c84

Committed by Nelson Elhage on Nov 03, 2010

We were using nlmsg_find_attr() to look up the bytecode by attribute when
auditing, but then just using the first attribute when actually running
bytecode. So, if we received a message with two attribute elements, where only
the second had type INET_DIAG_REQ_BYTECODE, we would validate and run different
bytecode strings.

Fix this by consistently using nlmsg_find_attr everywhere.
Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

22e76c84

fib: fib_result_assign() should not change fib refcounts · 1f1b9c99

Committed by Eric Dumazet on Nov 04, 2010

After commit ebc0ffae (RCU conversion of fib_lookup()),
fib_result_assign()  should not change fib refcounts anymore.

Thanks to Michael who did the bisection and bug report.
Reported-by: Michael Ellerman <michael@ellerman.id.au>
Tested-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f1b9c99

03 Nov, 2010 · 2 commits

ipv4: netfilter: ip_tables: fix information leak to userland · b5f15ac4

Committed by Vasiliy Kulikov on Nov 03, 2010

Structure ipt_getinfo is copied to userland with the field "name"
that has the last elements uninitialized.  It leads to leaking of
contents of kernel stack memory.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

b5f15ac4

ipv4: netfilter: arp_tables: fix information leak to userland · 1a8b7a67

Committed by Vasiliy Kulikov on Nov 03, 2010

Structure arpt_getinfo is copied to userland with the field "name"
that has the last elements uninitialized.  It leads to leaking of
contents of kernel stack memory.
Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>

1a8b7a67
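
The class of bug in the two netfilter fixes above can be illustrated with a small userspace sketch. One common way to avoid leaking stale stack bytes is to zero the whole structure before filling it; whether the actual patches do exactly this is not stated in the commit messages. `struct getinfo_model`, `NAME_LEN` and `fill_info()` are hypothetical names for this example, not the real ipt_getinfo or arpt_getinfo layout.

```c
#include <string.h>

#define NAME_LEN 32  /* hypothetical size of the name field */

/* Simplified stand-in for a structure that is copied out to userland. */
struct getinfo_model {
    char name[NAME_LEN];
    unsigned int valid_hooks;
};

/*
 * Zero the whole structure first, so that the bytes of `name` after the
 * terminating NUL never carry leftover stack contents.
 */
static void fill_info(struct getinfo_model *info, const char *table,
                      unsigned int hooks)
{
    memset(info, 0, sizeof(*info));
    strncpy(info->name, table, sizeof(info->name) - 1);
    info->valid_hooks = hooks;
}
```

Without the memset, a short table name such as "filter" would leave the remaining bytes of the field holding whatever was previously on the stack.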

31 Oct, 2010 · 1 commit

ip_gre: fix fallback tunnel setup · 3285ee3b

Committed by Eric Dumazet on Oct 30, 2010

Before making the fallback tunnel visible to lookups, we should make
sure it is completely set up, once ipgre_tunnel_init() has been called
and the tstats per_cpu pointer allocated.

Move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from
ipgre_fb_tunnel_init() to ipgre_init_net()

Based on a patch from Pavel Emelyanov
Reported-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

3285ee3b

29 Oct, 2010 · 2 commits

netfilter: nf_nat: fix compiler warning with CONFIG_NF_CT_NETLINK=n · 64e46749

Committed by Patrick McHardy on Oct 29, 2010

net/ipv4/netfilter/nf_nat_core.c:52: warning: 'nf_nat_proto_find_get' defined but not used
net/ipv4/netfilter/nf_nat_core.c:66: warning: 'nf_nat_proto_put' defined but not used
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>

64e46749

fib: Fix fib zone and its hash leak on namespace stop · 4aa2c466

Committed by Pavel Emelyanov on Oct 28, 2010

When we stop a namespace we flush the table and free it, but the
added fn_zone-s (and their hashes if grown) are leaked and need to be freed.
The trie implementation releases all its stuff in the flushing code.

Shame on us - this bug exists since the very first make-fib-per-net
patches in 2.6.27 :(
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

4aa2c466

28 Oct, 2010 · 5 commits

tunnels: Fix tunnels change rcu protection · 74b0b85b

Committed by Pavel Emelyanov on Oct 27, 2010

After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug
was introduced into the SIOCCHGTUNNEL code.

The tunnel is first unlinked, then addresses change, then it is linked
back probably into another bucket. But while changing the parms, the
hash table is unlocked to readers and they can lookup the improper tunnel.

Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b
(gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and
94767632 (ip6tnl: get rid of ip6_tnl_lock).

The quick fix is to wait for quiescent state to pass after unlinking,
but if it is inappropriate I can invent something better, just let me
know.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

74b0b85b

inetpeer: __rcu annotations · b914c4ea

Committed by Eric Dumazet on Oct 25, 2010

Adds __rcu annotations to inetpeer
	(struct inet_peer)->avl_left
	(struct inet_peer)->avl_right

This is a tedious cleanup, but removes one smp_wmb() from link_to_pool()
since we now use more self documenting rcu_assign_pointer().

Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in
all cases where we don't need a memory barrier.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b914c4ea

tunnels: add __rcu annotations · b33eab08

Committed by Eric Dumazet on Oct 25, 2010

Add __rcu annotations to :
        (struct ip_tunnel)->prl
        (struct ip_tunnel_prl_entry)->next
        (struct xfrm_tunnel)->next
	struct xfrm_tunnel *tunnel4_handlers
	struct xfrm_tunnel *tunnel64_handlers

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b33eab08

net: add __rcu annotations to protocol · e0ad61ec

Committed by Eric Dumazet on Oct 25, 2010

Add __rcu annotations to :
        struct net_protocol *inet_protos
        struct net_protocol *inet6_protos

And use appropriate casts to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e0ad61ec

ipv4: add __rcu annotations to routes.c · 1c31720a

Committed by Eric Dumazet on Oct 25, 2010

Add __rcu annotations to :
        (struct dst_entry)->rt_next
        (struct rt_hash_bucket)->chain

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1c31720a

27 Oct, 2010 · 1 commit

fib_hash: fix rcu sparse and logical errors · ded85aa8

Committed by Eric Dumazet on Oct 26, 2010

While fixing CONFIG_SPARSE_RCU_POINTER errors, I had to fix accesses to
fz->fz_hash for real.

-	&fz->fz_hash[fn_hash(f->fn_key, fz)]
+	rcu_dereference(fz->fz_hash) + fn_hash(f->fn_key, fz)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ded85aa8

26 Oct, 2010 · 3 commits

ipv4: add __rcu annotations to ip_ra_chain · 43a951e9

Committed by Eric Dumazet on Oct 25, 2010

Add __rcu annotations to :
        (struct ip_ra_chain)->next
	struct ip_ra_chain *ip_ra_chain;

And use appropriate rcu primitives.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

43a951e9

net: add __rcu annotation to sk_filter · 0d7da9dd

Committed by Eric Dumazet on Oct 25, 2010

Add __rcu annotation to :
        (struct sock)->sk_filter

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0d7da9dd

tunnels: add _rcu annotations · 6f0bcf15

Committed by Eric Dumazet on Oct 24, 2010

(struct ip6_tnl)->next is rcu protected :
(struct ip_tunnel)->next is rcu protected :
(struct xfrm6_tunnel)->next is rcu protected :

add __rcu annotation and proper rcu primitives.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6f0bcf15

21 Oct, 2010 · 4 commits

nf_nat: restrict ICMP translation for embedded header · b0aeef30

Committed by Julian Anastasov on Oct 11, 2010

Skip ICMP translation of embedded protocol header
if NAT bits are not set. Needed for IPVS to see the original
embedded addresses because for IPVS traffic the IPS_SRC_NAT_BIT
and IPS_DST_NAT_BIT bits are not set. It happens when IPVS performs
DNAT for client packets after using nf_conntrack_alter_reply
to expect replies from real server.
Signed-off-by: Julian Anastasov <ja@ssi.bg>
Signed-off-by: Simon Horman <horms@verge.net.au>

b0aeef30

tproxy: fix hash locking issue when using port redirection in __inet_inherit_port() · 093d2823

Committed by Balazs Scheidler on Oct 21, 2010

When __inet_inherit_port() is called on a tproxy connection the wrong locks are
held for the inet_bind_bucket it is added to. __inet_inherit_port() made an
implicit assumption that the listener's port number (and thus its bind bucket)
is the same as the child socket's.
Unfortunately, if you're using the TPROXY target to redirect skbs to a
transparent proxy that assumption is not true anymore and things break.

This patch adds code to __inet_inherit_port() so that it can handle this case
by looking up or creating a new bind bucket for the child socket and updates
callers of __inet_inherit_port() to gracefully handle __inet_inherit_port()
failing.

Reported by and original patch from Stephen Buck <stephen.buck@exinda.com>.
See http://marc.info/?t=128169268200001&r=1&w=2 for the original discussion.
Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: Patrick McHardy <kaber@trash.net>

093d2823

fib: introduce fib_alias_accessed() helper · 9b0c290e

Committed by Eric Dumazet on Oct 20, 2010

Perf tools session at NFWS 2010 pointed out a false sharing on struct
fib_alias that can be avoided pretty easily, if we set FA_S_ACCESSED bit
only if needed (ie : not already set)
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

9b0c290e
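
The "set the bit only if needed" pattern can be sketched as follows. This is a userspace illustration under assumptions: `struct fib_alias_model` is a hypothetical stand-in for struct fib_alias, the flag value is assumed, and `alias_accessed()` returns whether a store happened purely so the behaviour can be observed.

```c
#define FA_S_ACCESSED 0x01  /* flag value assumed for illustration */

/* Hypothetical, reduced stand-in for struct fib_alias. */
struct fib_alias_model {
    unsigned char fa_state;
};

/*
 * Mark the alias as accessed, but write only when the bit is not set yet.
 * Skipping the redundant store keeps the cache line in shared state on
 * other CPUs instead of bouncing it on every lookup.
 * Returns 1 if a store was performed, 0 if it was skipped.
 */
static int alias_accessed(struct fib_alias_model *fa)
{
    if (!(fa->fa_state & FA_S_ACCESSED)) {
        fa->fa_state |= FA_S_ACCESSED;
        return 1;
    }
    return 0;
}
```

After the first lookup every later call only reads the field, which is the point of the helper: reads do not invalidate the line in other CPUs' caches.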

secmark: export secctx, drop secmark in procfs · 1ae4de0c

Committed by Eric Paris on Oct 13, 2010

The current secmark code exports a secmark= field which just indicates if
there is special labeling on a packet or not.  We drop this field as it
isn't particularly useful and instead export a new field secctx= which is
the actual human readable text label.
Signed-off-by: Eric Paris <eparis@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: James Morris <jmorris@namei.org>

1ae4de0c

20 Oct, 2010 · 1 commit

net: avoid RCU for NOCACHE dst · 27b75c95

Committed by Eric Dumazet on Oct 15, 2010

There is no point using RCU for dst we allocate for a very short time
(used once).

Change dst_release() to take DST_NOCACHE into account, but also change
skb_dst_set_noref() to force a refcount increment for such dst.

This is a _huge_ gain, because we don't waste memory to store xx thousand
of dsts. Instead of queueing them to RCU, we can free them instantly.

CPU caches can stay hot, re-using same memory blocks to hold temporary
dsts.

Note : remove unneeded smp_mb__before_atomic_dec(); in dst_release(),
since atomic_dec_return() implies a full memory barrier.

Stress test, 160.000.000 udp frames sent, IP route cache disabled
(DDOS).

Before:

real    0m38.091s
user    0m13.189s
sys     7m53.018s

After:

real	0m29.946s
user	0m12.157s
sys	7m40.605s

For reference, if IP route cache was enabled :

real	0m32.030s
user	0m10.521s
sys	8m15.243s
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27b75c95

19 Oct, 2010 · 1 commit

inet: RCU changes in inetdev_by_index() · 8723e1b4

Committed by Eric Dumazet on Oct 19, 2010

Convert inetdev_by_index() to not increment in_dev refcount.

Callers hold RCU or RTNL, and should not decrement in_dev refcount.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8723e1b4
