提交 · b23dd4fe42b455af5c6e20966b7d6959fa8352ea · openanolis / cloud-kernel

03 3月, 2011 1 次提交
- D
  ipv4: Make output route lookup return rtable directly. · b23dd4fe
  由 David S. Miller 提交于 3月 02, 2011
```
Instead of on the stack.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  b23dd4fe
02 3月, 2011 1 次提交

xfrm: Handle blackhole route creation via afinfo. · 2774c131

由 David S. Miller 提交于 3月 01, 2011

That way we don't have to potentially do this in every xfrm_lookup()
caller.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2774c131

24 2月, 2011 1 次提交
- D
  xfrm: Const'ify address arguments to ->dst_lookup() · 5e6b930f
  由 David S. Miller 提交于 2月 24, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  5e6b930f
23 2月, 2011 2 次提交
- D
  xfrm: Mark flowi arg to ->fill_dst() const. · 0c7b3eef
  由 David S. Miller 提交于 2月 22, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  0c7b3eef
- D
  xfrm: Mark flowi arg to ->get_tos() const. · 05d84025
  由 David S. Miller 提交于 2月 22, 2011
```
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  05d84025
27 1月, 2011 1 次提交

net: Implement read-only protection and COW'ing of metrics. · 62fa8a84

由 David S. Miller 提交于 1月 26, 2011

Routing metrics are now copy-on-write.

Initially a route entry points it's metrics at a read-only location.
If a routing table entry exists, it will point there.  Else it will
point at the all zero metric place-holder called 'dst_default_metrics'.

The writeability state of the metrics is stored in the low bits of the
metrics pointer, we have two bits left to spare if we want to store
more states.

For the initial implementation, COW is implemented simply via kmalloc.
However future enhancements will change this to place the writable
metrics somewhere else, in order to increase sharing.  Very likely
this "somewhere else" will be the inetpeer cache.

Note also that this means that metrics updates may transiently fail
if we cannot COW the metrics successfully.

But even by itself, this patch should decrease memory usage and
increase cache locality especially for routing workloads.  In those
cases the read-only metric copies stay in place and never get written
to.

TCP workloads where metrics get updated, and those rare cases where
PMTU triggers occur, will take a very slight performance hit.  But
that hit will be alleviated when the long-term writable metrics
move to a more sharable location.

Since the metrics storage went from a u32 array of RTAX_MAX entries to
what is essentially a pointer, some retooling of the dst_entry layout
was necessary.

Most importantly, we need to preserve the alignment of the reference
count so that it doesn't share cache lines with the read-mostly state,
as per Eric Dumazet's alignment assertion checks.

The only non-trivial bit here is the move of the 'flags' member into
the writeable cacheline.  This is OK since we are always accessing the
flags around the same moment when we made a modification to the
reference count.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

62fa8a84

18 11月, 2010 1 次提交

net: use the macros defined for the members of flowi · 5811662b

由 Changli Gao 提交于 11月 12, 2010

Use the macros defined for the members of flowi to clean the code up.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5811662b

16 11月, 2010 1 次提交

xfrm: use gre key as flow upper protocol info · cc9ff19d

由 Timo Teräs 提交于 11月 03, 2010

The GRE Key field is intended to be used for identifying an individual
traffic flow within a tunnel. It is useful to be able to have XFRM
policy selector matches to have different policies for different
GRE tunnels.
Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cc9ff19d

12 11月, 2010 1 次提交

net: get rid of rtable->idev · 72cdd1d9

由 Eric Dumazet 提交于 11月 11, 2010

It seems idev field in struct rtable has no special purpose, but adding
extra atomic ops.

We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).

infiniband case is solved using dst.dev instead of idev->dev

Removal of this field means routing without route cache is now using
shared data, percpu data, and only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.

About 5% speedup on routing test.
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72cdd1d9

12 10月, 2010 1 次提交

net dst: use a percpu_counter to track entries · fc66f95c

由 Eric Dumazet 提交于 10月 08, 2010

struct dst_ops tracks number of allocated dst in an atomic_t field,
subject to high cache line contention in stress workload.

Switch to a percpu_counter, to reduce number of time we need to dirty a
central location. Place it on a separate cache line to avoid dirtying
read only fields.

Stress test :

(Sending 160.000.000 UDP frames,
IP route cache disabled, dual E5540 @2.53GHz,
32bit kernel, FIB_TRIE, SLUB/NUMA)

Before:

real    0m51.179s
user    0m15.329s
sys     10m15.942s

After:

real	0m45.570s
user	0m15.525s
sys	9m56.669s

With a small reordering of struct neighbour fields, subject of a
following patch, (to separate refcnt from other read mostly fields)

real	0m41.841s
user	0m15.261s
sys	8m45.949s
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fc66f95c

23 9月, 2010 1 次提交

xfrm4: strip ECN bits from tos field · 94e22389

由 Ulrich Weber 提交于 9月 22, 2010

otherwise ECT(1) bit will get interpreted as RTO_ONLINK
and routing will fail with XfrmOutBundleGenError.
Signed-off-by: NUlrich Weber <uweber@astaro.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94e22389

05 7月, 2010 1 次提交

xfrm: fix xfrm by MARK logic · 44b451f1

由 Peter Kosyh 提交于 7月 02, 2010

While using xfrm by MARK feature in
2.6.34 - 2.6.35 kernels, the mark
is always cleared in flowi structure via memset in
_decode_session4 (net/ipv4/xfrm4_policy.c), so
the policy lookup fails.
IPv6 code is affected by this bug too.
Signed-off-by: NPeter Kosyh <p.kosyh@gmail.com>
Acked-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

44b451f1

11 6月, 2010 1 次提交

net-next: remove useless union keyword · d8d1f30b

由 Changli Gao 提交于 6月 10, 2010

remove useless union keyword in rtable, rt6_info and dn_route.

Since there is only one member in a union, the union keyword isn't useful.
Signed-off-by: NChangli Gao <xiaosuo@gmail.com>
Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d8d1f30b

07 4月, 2010 1 次提交

xfrm: cache bundles instead of policies for outgoing flows · 80c802f3

由 Timo Teräs 提交于 4月 07, 2010

__xfrm_lookup() is called for each packet transmitted out of
system. The xfrm_find_bundle() does a linear search which can
kill system performance depending on how many bundles are
required per policy.

This modifies __xfrm_lookup() to store bundles directly in
the flow cache. If we did not get a hit, we just create a new
bundle instead of doing slow search. This means that we can now
get multiple xfrm_dst's for same flow (on per-cpu basis).
Signed-off-by: NTimo Teras <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

80c802f3

03 3月, 2010 1 次提交

ipsec: Fix bogus bundle flowi · 87c1e12b

由 Herbert Xu 提交于 3月 02, 2010

When I merged the bundle creation code, I introduced a bogus
flowi value in the bundle.  Instead of getting from the caller,
it was instead set to the flow in the route object, which is
totally different.

The end result is that the bundles we created never match, and
we instead end up with an ever growing bundle list.

Thanks to Jamal for find this problem.
Reported-by: NJamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Acked-by: NSteffen Klassert <steffen.klassert@secunet.com>
Acked-by: NJamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

87c1e12b

25 1月, 2010 1 次提交

netns xfrm: deal with dst entries in netns · d7c7544c

由 Alexey Dobriyan 提交于 1月 24, 2010

GC is non-existent in netns, so after you hit GC threshold, no new
dst entries will be created until someone triggers cleanup in init_net.

Make xfrm4_dst_ops and xfrm6_dst_ops per-netns.
This is not done in a generic way, because it woule waste
(AF_MAX - 2) * sizeof(struct dst_ops) bytes per-netns.

Reorder GC threshold initialization so it'd be done before registering
XFRM policies.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d7c7544c

12 11月, 2009 1 次提交

sysctl net: Remove unused binary sysctl code · f8572d8f

由 Eric W. Biederman 提交于 11月 05, 2009

Now that sys_sysctl is a compatiblity wrapper around /proc/sys
all sysctl strategy routines, and all ctl_name and strategy
entries in the sysctl tables are unused, and can be
revmoed.

In addition neigh_sysctl_register has been modified to no longer
take a strategy argument and it's callers have been modified not
to pass one.

Cc: "David Miller" <davem@davemloft.net>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: netdev@vger.kernel.org
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

f8572d8f

05 8月, 2009 1 次提交

xfrm4: fix build when SYSCTLs are disabled · f816700a

由 Randy Dunlap 提交于 8月 04, 2009

Fix build errors when SYSCTLs are not enabled:
(.init.text+0x5154): undefined reference to `net_ipv4_ctl_path'
(.init.text+0x5176): undefined reference to `register_net_sysctl_table'
xfrm4_policy.c:(.exit.text+0x573): undefined reference to `unregister_net_sysctl_table
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f816700a

31 7月, 2009 1 次提交

xfrm: select sane defaults for xfrm[4|6] gc_thresh · a33bc5c1

由 Neil Horman 提交于 7月 30, 2009

Choose saner defaults for xfrm[4|6] gc_thresh values on init

Currently, the xfrm[4|6] code has hard-coded initial gc_thresh values
(set to 1024).  Given that the ipv4 and ipv6 routing caches are sized
dynamically at boot time, the static selections can be non-sensical.
This patch dynamically selects an appropriate gc threshold based on
the corresponding main routing table size, using the assumption that
we should in the worst case be able to handle as many connections as
the routing table can.

For ipv4, the maximum route cache size is 16 * the number of hash
buckets in the route cache.  Given that xfrm4 starts garbage
collection at the gc_thresh and prevents new allocations at 2 *
gc_thresh, we set gc_thresh to half the maximum route cache size.

For ipv6, its a bit trickier.  there is no maximum route cache size,
but the ipv6 dst_ops gc_thresh is statically set to 1024.  It seems
sane to select a simmilar gc_thresh for the xfrm6 code that is half
the number of hash buckets in the v6 route cache times 16 (like the v4
code does).
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a33bc5c1

28 7月, 2009 1 次提交

xfrm: export xfrm garbage collector thresholds via sysctl · a44a4a00

由 Neil Horman 提交于 7月 27, 2009

Export garbage collector thresholds for xfrm[4|6]_dst_ops

Had a problem reported to me recently in which a high volume of ipsec
connections on a system began reporting ENOBUFS for new connections
eventually.

It seemed that after about 2000 connections we started being unable to
create more.  A quick look revealed that the xfrm code used a dst_ops
structure that limited the gc_thresh value to 1024, and always
dropped route cache entries after 2x the gc_thresh.

It seems the most direct solution is to export the gc_thresh values in
the xfrm[4|6] dst_ops as sysctls, like the main routing table does, so
that higher volumes of connections can be supported.  This patch has
been tested and allows the reporter to increase their ipsec connection
volume successfully.
Reported-by: NJoe Nall <joe@nall.com>
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>

ipv4/xfrm4_policy.c |   18 ++++++++++++++++++
ipv6/xfrm6_policy.c |   18 ++++++++++++++++++
2 files changed, 36 insertions(+)
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a44a4a00

04 7月, 2009 1 次提交

xfrm4: fix the ports decode of sctp protocol · c615c9f3

由 Wei Yongjun 提交于 7月 02, 2009

The SCTP pushed the skb data above the sctp chunk header, so the check
of pskb_may_pull(skb, xprth + 4 - skb->data) in _decode_session4() will
never return 0 because xprth + 4 - skb->data < 0, the ports decode of
sctp will always fail.
Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c615c9f3

01 2月, 2009 1 次提交

net: replace uses of __constant_{endian} · 09640e63

由 Harvey Harrison 提交于 2月 01, 2009

Base versions handle constant folding now.
Signed-off-by: NHarvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

09640e63

26 11月, 2008 3 次提交

netns xfrm: ->get_saddr in netns · fbda33b2

由 Alexey Dobriyan 提交于 11月 25, 2008

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbda33b2

netns xfrm: ->dst_lookup in netns · c5b3cf46

由 Alexey Dobriyan 提交于 11月 25, 2008

Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c5b3cf46

netns xfrm: dst garbage-collecting in netns · ddcfd796

由 Alexey Dobriyan 提交于 11月 25, 2008

Pass netns pointer to struct xfrm_policy_afinfo::garbage_collect()

	[This needs more thoughts on what to do with dst_ops]
	[Currently stub to init_net]
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ddcfd796

12 11月, 2008 1 次提交

net: remove struct dst_entry::entry_size · 6bb3ce25

由 Alexey Dobriyan 提交于 11月 11, 2008

Unused after kmem_cache_zalloc() conversion.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6bb3ce25

03 11月, 2008 1 次提交
- J
  net: clean up net/ipv4/ipip.c raw.c tcp.c tcp_minisocks.c tcp_yeah.c xfrm4_policy.c · 5a5f3a8d
  由 Jianjun Kong 提交于 11月 03, 2008
```
Signed-off-by: NJianjun Kong <jianjun@zeuux.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  5a5f3a8d
26 3月, 2008 1 次提交

[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. · c346dca1

由 YOSHIFUJI Hideaki 提交于 3月 25, 2008

Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

c346dca1

01 2月, 2008 1 次提交

[NET]: should explicitely initialize atomic_t field in struct dst_ops · e2422970

由 Eric Dumazet 提交于 1月 30, 2008

All but one struct dst_ops static initializations miss explicit
initialization of entries field.

As this field is atomic_t, we should use ATOMIC_INIT(0), and not
rely on atomic_t implementation.
Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e2422970

29 1月, 2008 11 次提交

[NETNS]: Add namespace parameter to __ip_route_output_key. · 611c183e

由 Denis V. Lunev 提交于 1月 22, 2008

This is only required to propagate it down to the
ip_route_output_slow.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

611c183e

[NETNS][DST] dst: pass the dst_ops as parameter to the gc functions · 569d3645

由 Daniel Lezcano 提交于 1月 18, 2008

The garbage collection function receive the dst_ops structure as
parameter. This is useful for the next incoming patchset because it
will need the dst_ops (there will be several instances) and the
network namespace pointer (contained in the dst_ops).

The protocols which do not take care of the namespaces will not be
impacted by this change (expect for the function signature), they do
just ignore the parameter.
Signed-off-by: NDaniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

569d3645

[XFRM] IPv6: Fix dst/routing check at transformation. · a1b05140

由 Masahide NAKAMURA 提交于 12月 20, 2007

IPv6 specific thing is wrongly removed from transformation at net-2.6.25.
This patch recovers it with current design.

o Update "path" of xfrm_dst since IPv6 transformation should
  care about routing changes. It is required by MIPv6 and
  off-link destined IPsec.
o Rename nfheader_len which is for non-fragment transformation used by
  MIPv6 to rt6i_nfheader_len as IPv6 name space.
Signed-off-by: NMasahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a1b05140

[IPSEC]: Added xfrm_decode_session_reverse and xfrmX_policy_check_reverse · d5422efe

由 Herbert Xu 提交于 12月 12, 2007

RFC 4301 requires us to relookup ICMP traffic that does not match any
policies using the reverse of its payload.  This patch adds the functions
xfrm_decode_session_reverse and xfrmX_policy_check_reverse so we can get
the reverse flow to perform such a lookup.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d5422efe

[NET]: Multiple namespaces in the all dst_ifdown routines. · 5a3e55d6

由 Denis V. Lunev 提交于 12月 07, 2007

Move dst entries to a namespace loopback to catch refcounting leaks.
Signed-off-by: NDenis V. Lunev <den@openvz.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5a3e55d6

[IPSEC]: Merge most of the output path · 862b82c6

由 Herbert Xu 提交于 11月 13, 2007

As part of the work on asynchrnous cryptographic operations, we need
to be able to resume from the spot where they occur.  As such, it
helps if we isolate them to one spot.

This patch moves most of the remaining family-specific processing into
the common output code.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

862b82c6

[IPSEC]: Merge common code into xfrm_bundle_create · 25ee3286

由 Herbert Xu 提交于 12月 11, 2007

Half of the code in xfrm4_bundle_create and xfrm6_bundle_create are
common.  This patch extracts that logic and puts it into
xfrm_bundle_create.  The rest of it are then accessed through afinfo.

As a result this fixes the problem with inter-family transforms where
we treat every xfrm dst in the bundle as if it belongs to the top
family.

This patch also fixes a long-standing error-path bug where we may free
the xfrm states twice.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

25ee3286

[IPSEC]: Move flow construction into xfrm_dst_lookup · 66cdb3ca

由 Herbert Xu 提交于 11月 13, 2007

This patch moves the flow construction from the callers of
xfrm_dst_lookup into that function.  It also changes xfrm_dst_lookup
so that it takes an xfrm state as its argument instead of explicit
addresses.

This removes any address-specific logic from the callers of
xfrm_dst_lookup which is needed to correctly support inter-family
transforms.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66cdb3ca

[IPSEC]: Make sure idev is consistent with dev in xfrm_dst · fff69388

由 Herbert Xu 提交于 11月 13, 2007

Previously we took the device from the bottom route and idev from the
top route.  This is bad because idev may well point to a different
device.  This patch changes it so that we get the idev from the device
directly.

It also makes it an error if either dev or idev is NULL.  This is
consistent with the rest of the routing code which also treats these
cases as errors.

I've removed the err initialisation in xfrm6_policy.c because it
achieves no purpose and hid a bug when an initial version of this
patch neglected to set err to -ENODEV (fortunately the IPv4 version
warned about it).
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fff69388

[IPSEC]: Set dst->input to dst_discard · 45ff5a3f

由 Herbert Xu 提交于 11月 13, 2007

The input function should never be invoked on IPsec dst objects.  This
is because we don't apply IPsec on input until after we've made the
routing decision.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

45ff5a3f

[IPSEC]: Only set neighbour on top xfrm dst · 8ce68ceb

由 Herbert Xu 提交于 11月 13, 2007

The neighbour field is only used by dst_confirm which only ever happens on
the top-most xfrm dst.  So it's a waste to duplicate for every other xfrm
dst.  This patch moves its setting out of the loop so that only the top one
gets set.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ce68ceb

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功