Commits · ab92bb2f679d66c7e12a6b1c0cdd76fe308f6546 · openeuler / Kernel

11 Jul, 2012 2 commits
tcp: Abstract back handling peer aliveness test into helper function. · ab92bb2f

Committed by David S. Miller on Jul 09, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

ab92bb2f

tcp: Move dynamnic metrics handling into seperate file. · 4aabd8ef

Committed by David S. Miller on Jul 09, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

4aabd8ef

06 Jul, 2012 1 commit

ipv4: Avoid overhead when no custom FIB rules are installed. · f4530fa5

Committed by David S. Miller on Jul 05, 2012

If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.

It's just pure overhead.

Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.

Also, move fib_num_tclassid_users into the ipv4 network namespace.
Signed-off-by: David S. Miller <davem@davemloft.net>

f4530fa5

05 Jul, 2012 8 commits

ipv4: defer fib_compute_spec_dst() call · bf5e53e3

Committed by Eric Dumazet on Jul 04, 2012

ip_options_compile() can avoid calling fib_compute_spec_dst()
by default, and perform the call only if needed.

David suggested adding a helper to make the call only once.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bf5e53e3

ipv4: No need to set generic neighbour pointer. · f187bc6e

Committed by David S. Miller on Jul 03, 2012

Nobody reads it any longer.
Signed-off-by: David S. Miller <davem@davemloft.net>

f187bc6e

net: Add optional SKB arg to dst_ops->neigh_lookup(). · f894cbf8

Committed by David S. Miller on Jul 02, 2012

Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).
Signed-off-by: David S. Miller <davem@davemloft.net>

f894cbf8

net: Do delayed neigh confirmation. · 5110effe

Committed by David S. Miller on Jul 02, 2012

When a dst_confirm() happens, mark the confirmation as pending in the
dst.  Then on the next packet out, when we have the neigh in-hand, do
the update.

This removes the dependency in dst_confirm() of dst's having an
attached neigh.

While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL.  So just fix those cases up.
Signed-off-by: David S. Miller <davem@davemloft.net>

5110effe
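
The deferred confirmation described in this commit can be pictured with a small userspace sketch. It is not the kernel code: `dst_sketch`, `neigh_sketch` and every function name below are hypothetical stand-ins, and the timestamp is passed in as a plain integer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's dst and neighbour. */
struct neigh_sketch {
        unsigned long confirmed;        /* time of the last confirmation */
};

struct dst_sketch {
        int pending_confirm;            /* set on confirm, consumed on output */
};

/* Confirmation no longer needs a neighbour: it only marks the dst. */
static void dst_confirm_sketch(struct dst_sketch *dst)
{
        assert(dst != NULL);            /* callers guarantee a non-NULL dst */
        dst->pending_confirm = 1;
}

/* On the next packet out the neighbour is in hand, so apply the update. */
static void neigh_output_sketch(struct dst_sketch *dst,
                                struct neigh_sketch *n, unsigned long now)
{
        if (dst->pending_confirm) {
                dst->pending_confirm = 0;
                n->confirmed = now;
        }
}

/* Runs output, confirm, output and returns the neighbour's timestamp. */
static unsigned long delayed_confirm_demo(unsigned long now)
{
        struct dst_sketch dst = { 0 };
        struct neigh_sketch n = { 0 };

        neigh_output_sketch(&dst, &n, now);     /* nothing pending: no change */
        assert(n.confirmed == 0);
        dst_confirm_sketch(&dst);
        neigh_output_sketch(&dst, &n, now + 1); /* pending flag is consumed */
        return n.confirmed;
}
```

In this sketch the confirm step never touches a neighbour, which mirrors the stated goal of removing the dependency on the dst having one attached.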

ipv4: Don't report neigh uptodate state in rtcache procfs. · 3c521f2b

Committed by David S. Miller on Jul 02, 2012

Soon routes will not have a cached neigh attached, nor will we
be able to necessarily go directly to a neigh from an arbitrary
route.
Signed-off-by: David S. Miller <davem@davemloft.net>

3c521f2b

ipv4: Make neigh lookups directly in output packet path. · a263b309

Committed by David S. Miller on Jul 02, 2012

Do not use the dst cached neigh, we'll be getting rid of that.
Signed-off-by: David S. Miller <davem@davemloft.net>

a263b309

ipv4: Fix crashes in ip_options_compile(). · 11604721

Committed by David S. Miller on Jul 04, 2012

The spec_dst uses should be guarded by skb_rtable() being non-NULL,
not just the SKB being non-NULL.
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

11604721

netfilter: nf_conntrack: generalize nf_ct_l4proto_net · 08911475

Committed by Pablo Neira Ayuso on Jun 29, 2012

This patch generalizes nf_ct_l4proto_net by splitting it into chunks and
moving the corresponding protocol part to where it really belongs to.

To clarify, note that we follow two different approaches to per-net support,
depending on whether it is a built-in or a run-time loadable protocol tracker.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Gao feng <gaofeng@cn.fujitsu.com>

08911475

30 Jun, 2012 1 commit

netlink: add netlink_kernel_cfg parameter to netlink_kernel_create · a31f2d17

Committed by Pablo Neira Ayuso on Jun 29, 2012

This patch adds the following structure:

struct netlink_kernel_cfg {
        unsigned int    groups;
        void            (*input)(struct sk_buff *skb);
        struct mutex    *cb_mutex;
};

That can be passed to netlink_kernel_create to set optional configurations
for netlink kernel sockets.

I've populated this structure by looking for NULL and zero parameters in the
existing code. The remaining parameters that always need to be set are still
left in the original interface.

That includes optional parameters for the netlink socket creation. This allows
easy extensibility of this interface in the future.

This patch also adapts all callers to use this new interface.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a31f2d17
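
A minimal userspace sketch of how a caller might fill in the structure from this commit. The struct layout is copied from the message; `demo_rcv`, `demo_cfg` and `options_set` are hypothetical names used only here, and the actual `netlink_kernel_create()` call is deliberately left out because its full signature is not shown in the message.

```c
#include <assert.h>
#include <stddef.h>

struct sk_buff;         /* opaque here; the real type lives in the kernel */
struct mutex;           /* opaque here as well */

/* Field layout copied from the commit message above. */
struct netlink_kernel_cfg {
        unsigned int    groups;
        void            (*input)(struct sk_buff *skb);
        struct mutex    *cb_mutex;
};

/* Hypothetical receive handler, named only for this sketch. */
static void demo_rcv(struct sk_buff *skb)
{
        (void)skb;
}

/* A caller names only the options it needs. */
static struct netlink_kernel_cfg demo_cfg(void)
{
        struct netlink_kernel_cfg cfg = {
                .input = demo_rcv,
        };

        /* Members not named in the initializer are zero-initialized. */
        assert(cfg.groups == 0 && cfg.cb_mutex == NULL);
        return cfg;
}

/* Hypothetical helper: counts how many optional settings were supplied. */
static int options_set(struct netlink_kernel_cfg cfg)
{
        return (cfg.groups != 0) + (cfg.input != NULL) + (cfg.cb_mutex != NULL);
}
```

Because unspecified members default to zero or NULL, adding a new optional field later would not require touching callers that do not use it, which is the extensibility the message refers to.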

29 Jun, 2012 4 commits

ipv4: Elide fib_validate_source() completely when possible. · 7a9bc9b8

Committed by David S. Miller on Jun 29, 2012

If rpfilter is off (or the SKB has an IPSEC path) and there are no
tclassid users, we don't have to do anything at all when
fib_validate_source() is invoked besides setting the itag to zero.

We monitor tclassid uses with a counter (modified only under RTNL and
marked __read_mostly) and we protect the fib_validate_source() real
work with a test against this counter and whether rpfilter is to be
done.

Having a way to know whether any tclassid processing is needed
also opens the door for future optimized rpfilter algorithms that do
not perform full FIB lookups.
Signed-off-by: David S. Miller <davem@davemloft.net>

7a9bc9b8
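
The fast-path test described above can be sketched in userspace as follows. All names (`tclassid_users`, `validate_source_sketch`, `validate_source_slow`) are hypothetical, and the slow path is only a placeholder that counts how often it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical counter standing in for the tclassid user count. */
static int tclassid_users;
/* Counts how often the expensive path ran, for illustration only. */
static int slow_path_calls;

/* Placeholder for the full validation (FIB lookup and so on). */
static int validate_source_slow(uint32_t *itag)
{
        slow_path_calls++;
        *itag = 0;      /* a real lookup could produce a non-zero tag */
        return 0;
}

/* Fast path: no rpfilter work and no tclassid users means just zero itag. */
static int validate_source_sketch(bool rpfilter, bool has_secpath,
                                  uint32_t *itag)
{
        bool need_rpf = has_secpath ? false : rpfilter;

        assert(itag != NULL);
        if (!need_rpf && tclassid_users == 0) {
                *itag = 0;
                return 0;
        }
        return validate_source_slow(itag);
}

/* Returns the number of slow-path calls after four representative inputs. */
static int validate_source_demo(void)
{
        uint32_t itag = 0xdeadbeef;

        slow_path_calls = 0;
        tclassid_users = 0;
        validate_source_sketch(false, false, &itag);    /* elided */
        validate_source_sketch(true, true, &itag);      /* IPsec path: elided */
        validate_source_sketch(true, false, &itag);     /* rpfilter on: slow */
        tclassid_users = 1;
        validate_source_sketch(false, false, &itag);    /* tclassid user: slow */
        return slow_path_calls;
}
```

Only two of the four calls in the demo reach the slow path, which is the saving the commit is after.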

ipv4: Remove extraneous assignment of dst->tclassid. · 3085a4b7

Committed by David S. Miller on Jun 28, 2012

We already set it several lines above.
Signed-off-by: David S. Miller <davem@davemloft.net>

3085a4b7

ipv4: Adjust in_dev handling in fib_validate_source() · 9e56e380

Committed by David S. Miller on Jun 28, 2012

Checking for in_dev being NULL is pointless.

In fact, all of our callers have in_dev precomputed already,
so just pass it in and remove the NULL checking.
Signed-off-by: David S. Miller <davem@davemloft.net>

9e56e380

ipv4: Fix bugs in fib_compute_spec_dst(). · a207a4b2

Committed by David S. Miller on Jun 28, 2012

Based upon feedback from Julian Anastasov.

1) Use route flags to determine multicast/broadcast, not the
   packet flags.

2) Leave saddr unspecified in flow key.

3) Adjust how we invoke inet_select_addr().  Pass ip_hdr(skb)->saddr as
   second arg, and if it was zeronet use link scope.

4) Use loopback as input interface in flow key.
Signed-off-by: David S. Miller <davem@davemloft.net>

a207a4b2

28 Jun, 2012 11 commits

ipv4: Kill rt->rt_spec_dst, no longer used. · 41347dcd

Committed by David S. Miller on Jun 28, 2012

Signed-off-by: David S. Miller <davem@davemloft.net>

41347dcd

ipv4: Create and use fib_compute_spec_dst() helper. · 35ebf65e

Committed by David S. Miller on Jun 28, 2012

The specific destination is the host we direct unicast replies to.
Usually this is the original packet source address, but if we are
responding to a multicast or broadcast packet we have to use something
different.

Specifically we must use the source address we would use if we were to
send a packet to the unicast source of the original packet.

The routing cache precomputes this value, but we want to remove that
precomputation because it creates a hard dependency on the expensive
rpfilter source address validation which we'd like to make cheaper.

There are only three places where this matters:

1) ICMP replies.

2) pktinfo CMSG

3) IP options

Now there will be no real users of rt->rt_spec_dst and we can simply
remove it altogether.
Signed-off-by: David S. Miller <davem@davemloft.net>

35ebf65e

ipv4: Show that ip_send_reply() is purely unicast routine. · 70e73416

Committed by David S. Miller on Jun 28, 2012

Rename it to ip_send_unicast_reply() and add explicit 'saddr'
argument.

This removed one of the few users of rt->rt_spec_dst.
Signed-off-by: David S. Miller <davem@davemloft.net>

70e73416

ipv4: Kill early demux method return value. · 160eb5a6

Committed by David S. Miller on Jun 27, 2012

It's completely unnecessary.
Signed-off-by: David S. Miller <davem@davemloft.net>

160eb5a6

Revert "ipv4: tcp: dont cache unconfirmed intput dst" · c10237e0

Committed by David S. Miller on Jun 27, 2012

This reverts commit c074da28.

This change has several unwanted side effects:

1) Sockets will cache the DST_NOCACHE route in sk->sk_rx_dst and we'll
   thus never create a real cached route.

2) All TCP traffic will use DST_NOCACHE and never use the routing
   cache at all.
Signed-off-by: David S. Miller <davem@davemloft.net>

c10237e0

net: skb_free_datagram_locked() doesnt drop all packets · 22911fc5

Committed by Eric Dumazet on Jun 27, 2012

dropwatch wrongly diagnoses all received UDP packets as drops.

This patch removes trace_kfree_skb() done in skb_free_datagram_locked().

Locations calling skb_free_datagram_locked() should do it on their own.

As a result, drops are accounted on the right function.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22911fc5

ipmr: Do not use RTA_PUT() macros · 92a395e5

Committed by Thomas Graf on Jun 26, 2012

Also fix a needless skb tailroom check for a 4-byte area
after each rtnexthop block.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

92a395e5

inet_diag: Do not use RTA_PUT() macros · 6e277ed5

Committed by Thomas Graf on Jun 26, 2012

Also, no need to trim on nlmsg_put() failure, nothing has been added
yet.  We also want to use nlmsg_end(), nlmsg_new() and nlmsg_free().
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

6e277ed5

ipv4: tcp: dont cache unconfirmed intput dst · c074da28

Committed by Eric Dumazet on Jun 26, 2012

DDoS synflood attacks hit the IP route cache badly.

On typical machines, this cache is allowed to hold up to 8 million dst
entries, 256 bytes each, for a total of 2GB of memory.

rt_garbage_collect() triggers and tries to cleanup things.

Eventually route cache is disabled but machine is under fire and might
OOM and crash.

This patch exploits the new TCP early demux, to set a nocache
boolean in case incoming TCP frame is for a not yet ESTABLISHED or
TIMEWAIT socket.

This 'nocache' boolean is then used in case dst entry is not found in
route cache, to create an unhashed dst entry (DST_NOCACHE)

SYN-cookie-ACK sent use a similar mechanism (ipv4: tcp: dont cache
output dst for syncookies), so after this patch, a machine is able to
absorb a DDOS synflood attack without polluting its IP route cache.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c074da28

netfilter: nf_ct_icmp: add icmp_kmemdup[_compat]_sysctl_table function · a9082b45

Committed by Gao feng on Jun 21, 2012

Split the sysctl function into smaller chunks to clean up the code and prepare
patches to reduce ifdef pollution.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

a9082b45

netfilter: nf_conntrack: prepare l4proto->init_net cleanup · f1caad27

Committed by Gao feng on Jun 21, 2012

l4proto->init contains quite redundant code. We can simplify this
by adding a new parameter, l3proto.

This patch prepares that code simplification.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

f1caad27

27 Jun, 2012 3 commits

netfilter: ipt_ULOG: Move away from NLMSG_PUT(). · c2bd4baf

Committed by David S. Miller on Jun 26, 2012

And use nlmsg_data() while we're here too.
Signed-off-by: David S. Miller <davem@davemloft.net>

c2bd4baf

inet_diag: Move away from NLMSG_PUT(). · d106352d

Committed by David S. Miller on Jun 26, 2012

And use nlmsg_data() while we're here too, and remove useless
casts.
Signed-off-by: David S. Miller <davem@davemloft.net>

d106352d

ipv4: Cache ip_error() routes even when not forwarding. · 251da413

Committed by David S. Miller on Jun 26, 2012

And account for the fact that, when we are not forwarding, we should
bump statistic counters rather than emit an ICMP response.

RP-filter rejected lookups are still not cached.

Since -EHOSTUNREACH and -ENETUNREACH can now no longer be seen in
ip_rcv_finish(), remove those checks.
Signed-off-by: David S. Miller <davem@davemloft.net>

251da413

26 Jun, 2012 1 commit

ipv4: Remove unnecessary code from rt_check_expire(). · df67e6c9

Committed by David S. Miller on Jun 26, 2012

IPv4 routing cache entries no longer use dst->expires, because the
metrics, PMTU, and redirect information are stored in the inetpeer
cache.
Signed-off-by: David S. Miller <davem@davemloft.net>

df67e6c9

24 Jun, 2012 1 commit

tcp: Fix bug in tcp socket early demux · 7011d085

Committed by Vijay Subramanian on Jun 23, 2012

The dest port for the call to __inet_lookup_established() in TCP early demux
code is passed with the wrong endian-ness. This causes the lookup to fail
leading to early demux not being used.
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7011d085
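
The class of bug fixed here can be reproduced in a few lines of userspace C. This is an illustration only, assuming a lookup that expects the destination port in host byte order; `est_sock`, `lookup_established` and the other names are hypothetical, not kernel symbols.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical table entry; the local port is stored in host byte order. */
struct est_sock {
        uint16_t local_port_host;
};

/* The lookup in this sketch expects a host-order port number. */
static bool lookup_established(const struct est_sock *sk, uint16_t hnum)
{
        assert(sk != NULL);
        return sk->local_port_host == hnum;
}

/* Buggy: passes the wire (network-order) value straight through. */
static bool demux_buggy(const struct est_sock *sk, uint16_t dest_wire)
{
        return lookup_established(sk, dest_wire);
}

/* Fixed: converts from network to host order before the lookup. */
static bool demux_fixed(const struct est_sock *sk, uint16_t dest_wire)
{
        return lookup_established(sk, ntohs(dest_wire));
}

/* Builds a socket on 'port' and looks it up using the wire-format port. */
static bool demo_lookup(uint16_t port, bool apply_fix)
{
        struct est_sock sk = { .local_port_host = port };
        uint16_t wire = htons(port);

        return apply_fix ? demux_fixed(&sk, wire) : demux_buggy(&sk, wire);
}
```

On a little-endian machine `demo_lookup(80, false)` misses while `demo_lookup(80, true)` hits; on a big-endian machine both hit, which is one reason this kind of mistake can go unnoticed.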

23 Jun, 2012 2 commits

ipv4: tcp: dont cache output dst for syncookies · 7586eceb

Committed by Eric Dumazet on Jun 20, 2012

Don't cache output dst for syncookies, as this adds pressure on IP route
cache and rcu subsystem for no gain.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hans Schillstrom <hans.schillstrom@ericsson.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

7586eceb

ipv4: Add sysctl knob to control early socket demux · 6648bd7e

Committed by Alexander Duyck on Jun 21, 2012

This change is meant to add a control for disabling early socket demux.
The main motivation behind this patch is to provide an option to disable
the feature as it adds an additional cost to routing that reduces overall
throughput by up to 5%.  For example one of my systems went from 12.1Mpps
to 11.6 after the early socket demux was added.  It looks like the reason
for the regression is that we are now having to perform two lookups, first
the one for an established socket, and then the one for the routing table.

By adding this patch and toggling the value for ip_early_demux to 0 I am
able to get back to the 12.1Mpps I was previously seeing.

[ Move local variables in ip_rcv_finish() down into the basic
  block in which they are actually used.  -DaveM ]
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6648bd7e

22 Jun, 2012 2 commits

netfilter: nfnetlink_queue: fix compilation with CONFIG_NF_NAT=m and CONFIG_NF_CT_NETLINK=y · d584a61a

Committed by Pablo Neira Ayuso on Jun 20, 2012

  LD      init/built-in.o
net/built-in.o:(.data+0x4408): undefined reference to `nf_nat_tcp_seq_adjust'
make: *** [vmlinux] Error 1

This patch adds a new pointer hook (nfq_ct_nat_hook), similar to other existing
ones in Netfilter, to solve our complicated configuration dependencies.
Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

d584a61a

tcp: Validate route interface in early demux. · fd62e09b

Committed by David S. Miller on Jun 21, 2012

Otherwise we might violate reverse path filtering.
Signed-off-by: David S. Miller <davem@davemloft.net>

fd62e09b

21 Jun, 2012 1 commit

inetpeer: inetpeer_invalidate_tree() cleanup · da557374

Committed by Eric Dumazet on Jun 20, 2012

No need to use cmpxchg() in inetpeer_invalidate_tree() since we hold
base lock.

Also use correct rcu annotations to remove sparse errors
(CONFIG_SPARSE_RCU_POINTER=y)

net/ipv4/inetpeer.c:144:19: error: incompatible types in comparison
expression (different address spaces)
net/ipv4/inetpeer.c:149:20: error: incompatible types in comparison
expression (different address spaces)
net/ipv4/inetpeer.c:595:10: error: incompatible types in comparison
expression (different address spaces)
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

da557374

20 Jun, 2012 2 commits

ipv4: Early TCP socket demux. · 41063e9d

Committed by David S. Miller on Jun 19, 2012

Input packet processing for local sockets involves two major demuxes.
One for the route and one for the socket.

But we can optimize this down to one demux for certain kinds of local
sockets.

Currently we only do this for established TCP sockets, but it could
at least in theory be expanded to other kinds of connections.

If a TCP socket is established then its identity is fully specified.

This means that whatever input route was used during the three-way
handshake must work equally well for the rest of the connection since
the keys will not change.

Once we move to established state, we cache the receive packet's input
route to use later.

Like the existing cached route in sk->sk_dst_cache used for output
packets, we have to check for route invalidations using dst->obsolete
and dst->ops->check().

Early demux occurs outside of a socket locked section, so when a route
invalidation occurs we defer the fixup of sk->sk_rx_dst until we are
actually inside of established state packet processing and thus have
the socket locked.
Signed-off-by: David S. Miller <davem@davemloft.net>

41063e9d
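
The split between the lockless early demux and the deferred fixup can be sketched as below. This is a simplified userspace model: the validity check is reduced to a single `obsolete` flag (the commit also mentions `dst->ops->check()`), and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; 'obsolete' is non-zero once the route is stale. */
struct dst_sketch {
        int obsolete;
};

struct sock_sketch {
        struct dst_sketch *rx_dst;      /* cached input route */
};

/* Early demux runs without the socket lock, so it never modifies the socket. */
static struct dst_sketch *early_demux_sketch(const struct sock_sketch *sk)
{
        struct dst_sketch *dst = sk->rx_dst;

        if (dst != NULL && dst->obsolete)
                return NULL;            /* caller does a full route lookup */
        return dst;
}

/* Established-state processing holds the socket lock: repair the cache. */
static void established_rcv_sketch(struct sock_sketch *sk,
                                   struct dst_sketch *pkt_dst)
{
        if (sk->rx_dst == NULL || sk->rx_dst->obsolete)
                sk->rx_dst = pkt_dst;
}

/* Walks through hit, invalidation, and fixup; returns the matched steps. */
static int early_demux_demo(void)
{
        struct dst_sketch old = { 0 }, fresh = { 0 };
        struct sock_sketch sk = { .rx_dst = &old };
        int steps = 0;

        steps += (early_demux_sketch(&sk) == &old);     /* cache hit */
        old.obsolete = 1;
        steps += (early_demux_sketch(&sk) == NULL);     /* stale: fall back */
        assert(sk.rx_dst == &old);                      /* not yet repaired */
        established_rcv_sketch(&sk, &fresh);
        steps += (early_demux_sketch(&sk) == &fresh);   /* hit after fixup */
        return steps;
}
```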

inet: Sanitize inet{,6} protocol demux. · f9242b6b

Committed by David S. Miller on Jun 19, 2012

Don't pretend that inet_protos[] and inet6_protos[] are hashes, they
are just straight arrays.  Remove all unnecessary hash masking.

Document MAX_INET_PROTOS.

Use RAW_HTABLE_SIZE when appropriate.
Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f9242b6b
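
A short userspace sketch of why no hash masking is needed when the table has one slot per protocol number. It assumes a 256-entry table indexed by an 8-bit protocol value; `MAX_PROTOS_SKETCH` and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PROTOS_SKETCH 256   /* one slot per possible 8-bit protocol */

static const char *protos_sketch[MAX_PROTOS_SKETCH];

/* The 8-bit index can never exceed the table, so no masking is applied. */
static int add_protocol_sketch(const char *name, uint8_t protocol)
{
        assert(name != NULL);
        if (protos_sketch[protocol] != NULL)
                return -1;              /* slot already taken */
        protos_sketch[protocol] = name;
        return 0;
}

/* Direct array lookup; returns NULL when no handler is registered. */
static const char *find_protocol_sketch(uint8_t protocol)
{
        return protos_sketch[protocol];
}
```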

18 Jun, 2012 1 commit

ipv4: Cap ADVMSS metric in the FIB rather than the routing cache. · 6fac2625

Committed by David S. Miller on Jun 17, 2012

It makes no sense to execute this limit test every time we create a
routing cache entry.

We can't simply error out on these things since we've silently
accepted and truncated them forever.
Signed-off-by: David S. Miller <davem@davemloft.net>

6fac2625

openeuler / Kernel: synced successfully almost 2 years ago
