提交 · 7f502361531e9eecb396cf99bdc9e9a59f7ebd7f · openanolis / cloud-kernel

01 7月, 2014 1 次提交

ipv4: irq safe sk_dst_[re]set() and ipv4_sk_update_pmtu() fix · 7f502361

由 Eric Dumazet 提交于 6月 30, 2014

We have two different ways to handle changes to sk->sk_dst

First way (used by TCP) assumes socket lock is owned by caller, and use
no extra lock : __sk_dst_set() & __sk_dst_reset()

Another way (used by UDP) uses sk_dst_lock because socket lock is not
always taken. Note that sk_dst_lock is not softirq safe.

These ways are not inter changeable for a given socket type.

ipv4_sk_update_pmtu(), added in linux-3.8, added a race, as it used
the socket lock as synchronization, but users might be UDP sockets.

Instead of converting sk_dst_lock to a softirq safe version, use xchg()
as we did for sk_rx_dst in commit e47eb5df ("udp: ipv4: do not use
sk_dst_lock from softirq context")

In a follow up patch, we probably can remove sk_dst_lock, as it is
only used in IPv6.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Fixes: 9cb3a50c ("ipv4: Invalidate the socket cached route on pmtu events if possible")
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f502361

03 6月, 2014 1 次提交

inetpeer: get rid of ip_id_count · 73f156a6

由 Eric Dumazet 提交于 6月 02, 2014

Ideally, we would need to generate IP ID using a per destination IP
generator.

linux kernels used inet_peer cache for this purpose, but this had a huge
cost on servers disabling MTU discovery.

1) each inet_peer struct consumes 192 bytes

2) inetpeer cache uses a binary tree of inet_peer structs,
   with a nominal size of ~66000 elements under load.

3) lookups in this tree are hitting a lot of cache lines, as tree depth
   is about 20.

4) If server deals with many tcp flows, we have a high probability of
   not finding the inet_peer, allocating a fresh one, inserting it in
   the tree with same initial ip_id_count, (cf secure_ip_id())

5) We garbage collect inet_peer aggressively.

IP ID generation do not have to be 'perfect'

Goal is trying to avoid duplicates in a short period of time,
so that reassembly units have a chance to complete reassembly of
fragments belonging to one message before receiving other fragments
with a recycled ID.

We simply use an array of generators, and a Jenkin hash using the dst IP
as a key.

ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
belongs (it is only used from this file)

secure_ip_id() and secure_ipv6_id() no longer are needed.

Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
unnecessary decrement/increment of the number of segments.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73f156a6

23 5月, 2014 1 次提交

ipv4: initialise the itag variable in __mkroute_input · fbdc0ad0

由 Li RongQing 提交于 5月 22, 2014

the value of itag is a random value from stack, and may not be initiated by
fib_validate_source, which called fib_combine_itag if CONFIG_IP_ROUTE_CLASSID
is not set

This will make the cached dst uncertainty
Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fbdc0ad0

14 5月, 2014 1 次提交

net: Use fwmark reflection in PMTU discovery. · 1b3c61dc

由 Lorenzo Colitti 提交于 5月 13, 2014

Currently, routing lookups used for Path PMTU Discovery in
absence of a socket or on unmarked sockets use a mark of 0.
This causes PMTUD not to work when using routing based on
netfilter fwmark mangling and fwmark ip rules, such as:

  iptables -j MARK --set-mark 17
  ip rule add fwmark 17 lookup 100

This patch causes these route lookups to use the fwmark from the
received ICMP error when the fwmark_reflect sysctl is enabled.
This allows the administrator to make PMTUD work by configuring
appropriate fwmark rules to mark the inbound ICMP packets.

Black-box tested using user-mode linux by pointing different
fwmarks at routing tables egressing on different interfaces, and
using iptables mangling to mark packets inbound on each interface
with the interface's fwmark. ICMPv4 and ICMPv6 PMTU discovery
work as expected when mark reflection is enabled and fail when
it is disabled.
Signed-off-by: NLorenzo Colitti <lorenzo@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b3c61dc

17 4月, 2014 1 次提交

ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source() · 0d5edc68

由 Cong Wang 提交于 4月 15, 2014

In my special case, when a packet is redirected from veth0 to lo,
its skb->dev->ifindex would be LOOPBACK_IFINDEX. Meanwhile we
pass the hard-coded LOOPBACK_IFINDEX to fib_validate_source()
in ip_route_input_slow(). This would cause the following check
in fib_validate_source() fail:

            (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))

when rp_filter is disabeld on loopback. As suggested by Julian,
the caller should pass 0 here so that we will not end up by
calling __fib_validate_source().

Cc: Eric Biederman <ebiederm@xmission.com>
Cc: Julian Anastasov <ja@ssi.bg>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NCong Wang <cwang@twopensource.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0d5edc68

16 4月, 2014 1 次提交

ipv4: add a sock pointer to dst->output() path. · aad88724

由 Eric Dumazet 提交于 4月 15, 2014

In the dst->output() path for ipv4, the code assumes the skb it has to
transmit is attached to an inet socket, specifically via
ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the
provider of the packet is an AF_PACKET socket.

The dst->output() method gets an additional 'struct sock *sk'
parameter. This needs a cascade of changes so that this parameter can
be propagated from vxlan to final consumer.

Fixes: 8f646c92 ("vxlan: keep original skb ownership")
Reported-by: Nlucien xin <lucien.xin@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aad88724

14 4月, 2014 1 次提交

ipv4: return valid RTA_IIF on ip route get · 91146153

由 Julian Anastasov 提交于 4月 13, 2014

Extend commit 13378cad
("ipv4: Change rt->rt_iif encoding.") from 3.6 to return valid
RTA_IIF on 'ip route get ... iif DEVICE' instead of rt_iif 0
which is displayed as 'iif *'.

inet_iif is not appropriate to use because skb_iif is not set.
Use the skb->dev->ifindex instead.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91146153

08 4月, 2014 1 次提交

net: replace __this_cpu_inc in route.c with raw_cpu_inc · 3ed66e91

由 Christoph Lameter 提交于 4月 07, 2014

The RT_CACHE_STAT_INC macro triggers the new preemption checks
for __this_cpu ops.

I do not see any other synchronization that would allow the use of a
__this_cpu operation here however in commit dbd2915c ("[IPV4]:
RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of
raw_smp_processor_id() here because "we do not care" about races.  In
the past we agreed that the price of disabling interrupts here to get
consistent counters would be too high.  These counters may be inaccurate
due to race conditions.

The use of __this_cpu op improves the situation already from what commit
dbd2915c did since the single instruction emitted on x86 does not
allow the race to occur anymore.  However, non x86 platforms could still
experience a race here.

Trace:

  __this_cpu_add operation in preemptible [00000000] code: avahi-daemon/1193
  caller is __this_cpu_preempt_check+0x38/0x60
  CPU: 1 PID: 1193 Comm: avahi-daemon Tainted: GF            3.12.0-rc4+ #187
  Call Trace:
    check_preemption_disabled+0xec/0x110
    __this_cpu_preempt_check+0x38/0x60
    __ip_route_output_key+0x575/0x8c0
    ip_route_output_flow+0x27/0x70
    udp_sendmsg+0x825/0xa20
    inet_sendmsg+0x85/0xc0
    sock_sendmsg+0x9c/0xd0
    ___sys_sendmsg+0x37c/0x390
    __sys_sendmsg+0x49/0x90
    SyS_sendmsg+0x12/0x20
    tracesys+0xe1/0xe6
Signed-off-by: NChristoph Lameter <cl@linux.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NIngo Molnar <mingo@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ed66e91

25 3月, 2014 1 次提交

ipv4: remove ip_rt_dump from route.c · 0b8c7f6f

由 Li RongQing 提交于 3月 21, 2014

ip_rt_dump do nothing after IPv4 route caches removal, so we can remove it.
Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b8c7f6f

24 3月, 2014 1 次提交

ipv4: remove ipv4_ifdown_dst from route.c · 4a4eb21f

由 Li RongQing 提交于 3月 24, 2014

ipv4_ifdown_dst does nothing after IPv4 route caches removal,
so we can remove it.
Signed-off-by: NLi RongQing <roy.qing.li@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4a4eb21f

18 2月, 2014 1 次提交

ipv4: fix counter in_slow_tot · a6254864

由 Duan Jiong 提交于 2月 17, 2014

since commit 89aef892("ipv4: Delete routing cache."), the counter
in_slow_tot can't work correctly.

The counter in_slow_tot increase by one when fib_lookup() return successfully
in ip_route_input_slow(), but actually the dst struct maybe not be created and
cached, so we can increase in_slow_tot after the dst struct is created.
Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6254864

17 2月, 2014 1 次提交

ipv4: distinguish EHOSTUNREACH from the ENETUNREACH · cd0f0b95

由 Duan Jiong 提交于 2月 14, 2014

since commit 251da413("ipv4: Cache ip_error() routes even when not forwarding."),
the counter IPSTATS_MIB_INADDRERRORS can't work correctly, because the value of
err was always set to ENETUNREACH.
Signed-off-by: NDuan Jiong <duanj.fnst@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cd0f0b95

14 2月, 2014 1 次提交

net: remove unnecessary return's · 2045ceae

由 stephen hemminger 提交于 2月 12, 2014

One of my pet coding style peeves is the practice of
adding extra return; at the end of function.
Kill several instances of this in network code.

I suppose some coccinelle wizardy could do this automatically.
Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2045ceae

14 1月, 2014 1 次提交

ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing · f87c10a8

由 Hannes Frederic Sowa 提交于 1月 09, 2014

While forwarding we should not use the protocol path mtu to calculate
the mtu for a forwarded packet but instead use the interface mtu.

We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was
introduced for multicast forwarding. But as it does not conflict with
our usage in unicast code path it is perfect for reuse.

I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu
along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular
dependencies because of IPSKB_FORWARDED.

Because someone might have written a software which does probe
destinations manually and expects the kernel to honour those path mtus
I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone
can disable this new behaviour. We also still use mtus which are locked on a
route for forwarding.

The reason for this change is, that path mtus information can be injected
into the kernel via e.g. icmp_err protocol handler without verification
of local sockets. As such, this could cause the IPv4 forwarding path to
wrongfully emit fragmentation needed notifications or start to fragment
packets along a path.

Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED
won't be set and further fragmentation logic will use the path mtu to
determine the fragmentation size. They also recheck packet size with
help of path mtu discovery and report appropriate errors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: John Heffner <johnwheffner@gmail.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f87c10a8

21 11月, 2013 1 次提交

ipv4: fix race in concurrent ip_route_input_slow() · dcdfdf56

由 Alexei Starovoitov 提交于 11月 19, 2013

CPUs can ask for local route via ip_route_input_noref() concurrently.
if nh_rth_input is not cached yet, CPUs will proceed to allocate
equivalent DSTs on 'lo' and then will try to cache them in nh_rth_input
via rt_cache_route()
Most of the time they succeed, but on occasion the following two lines:
	orig = *p;
	prev = cmpxchg(p, orig, rt);
in rt_cache_route() do race and one of the cpus fails to complete cmpxchg.
But ip_route_input_slow() doesn't check the return code of rt_cache_route(),
so dst is leaking. dst_destroy() is never called and 'lo' device
refcnt doesn't go to zero, which can be seen in the logs as:
	unregister_netdevice: waiting for lo to become free. Usage count = 1
Adding mdelay() between above two lines makes it easily reproducible.
Fix it similar to nh_pcpu_rth_output case.

Fixes: d2d68ba9 ("ipv4: Cache input routes in fib_info nexthops.")
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dcdfdf56

06 11月, 2013 1 次提交

ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE · 482fc609

由 Hannes Frederic Sowa 提交于 11月 05, 2013

Sockets marked with IP_PMTUDISC_INTERFACE won't do path mtu discovery,
their sockets won't accept and install new path mtu information and they
will always use the interface mtu for outgoing packets. It is guaranteed
that the packet is not fragmented locally. But we won't set the DF-Flag
on the outgoing frames.

Florian Weimer had the idea to use this flag to ensure DNS servers are
never generating outgoing fragments. They may well be fragmented on the
path, but the server never stores or usees path mtu values, which could
well be forged in an attack.

(The root of the problem with path MTU discovery is that there is
no reliable way to authenticate ICMP Fragmentation Needed But DF Set
messages because they are sent from intermediate routers with their
source addresses, and the IMCP payload will not always contain sufficient
information to identify a flow.)

Recent research in the DNS community showed that it is possible to
implement an attack where DNS cache poisoning is feasible by spoofing
fragments. This work was done by Amir Herzberg and Haya Shulman:
<https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>

This issue was previously discussed among the DNS community, e.g.
<http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
without leading to fixes.

This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
regarding local fragmentation with UFO/CORK" for the enforcement of the
non-fragmentable checks. If other users than ip_append_page/data should
use this semantic too, we have to add a new flag to IPCB(skb)->flags to
suppress local fragmentation and check for this in ip_finish_output.

Many thanks to Florian Weimer for the idea and feedback while implementing
this patch.

Cc: David S. Miller <davem@davemloft.net>
Suggested-by: NFlorian Weimer <fweimer@redhat.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

482fc609

18 10月, 2013 1 次提交

ipv4: shrink rt_cache_stat · 0baf2b35

由 Eric Dumazet 提交于 10月 16, 2013

Half of the rt_cache_stat fields are no longer used after IP
route cache removal, lets shrink this per cpu area.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0baf2b35

08 10月, 2013 1 次提交

ipv4: fix ineffective source address selection · 0a7e2260

由 Jiri Benc 提交于 10月 04, 2013

When sending out multicast messages, the source address in inet->mc_addr is
ignored and rewritten by an autoselected one. This is caused by a typo in
commit 813b3b5d ("ipv4: Use caller's on-stack flowi as-is in output
route lookups").
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a7e2260

21 8月, 2013 1 次提交

ipv4: raise IP_MAX_MTU to theoretical limit · 734d2725

由 Eric Dumazet 提交于 8月 18, 2013

As discussed last year [1], there is no compelling reason
to limit IPv4 MTU to 0xFFF0, while real limit is 0xFFFF

[1] : http://marc.info/?l=linux-netdev&m=135607247609434&w=2

Willem raised this issue again because some of our internal
regression tests broke after lo mtu being set to 65536.

IP_MTU reports 0xFFF0, and the test attempts to send a RAW datagram of
mtu + 1 bytes, expecting the send() to fail, but it does not.

Alexey raised interesting points about TCP MSS, that should be addressed
in follow-up patches in TCP stack if needed, as someone could also set
an odd mtu anyway.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

734d2725

01 8月, 2013 1 次提交

net: split rt_genid for ipv4 and ipv6 · ca4c3fc2

由 fan.du 提交于 7月 30, 2013

Current net name space has only one genid for both IPv4 and IPv6, it has below
drawbacks:

- Add/delete an IPv4 address will invalidate all IPv6 routing table entries.
- Insert/remove XFRM policy will also invalidate both IPv4/IPv6 routing table
  entries even when the policy is only applied for one address family.

Thus, this patch attempt to split one genid for two to cater for IPv4 and IPv6
separately in a fine granularity.
Signed-off-by: NFan Du <fan.du@windriver.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca4c3fc2

29 6月, 2013 1 次提交

ipv4: use next hop exceptions also for input routes · 2ffae99d

由 Timo Teräs 提交于 6月 27, 2013

Commit d2d68ba9 (ipv4: Cache input routes in fib_info nexthops)
assmued that "locally destined, and routed packets, never trigger
PMTU events or redirects that will be processed by us".

However, it seems that tunnel devices do trigger PMTU events in certain
cases. At least ip_gre, ip6_gre, sit, and ipip do use the inner flow's
skb_dst(skb)->ops->update_pmtu to propage mtu information from the
outer flows. These can cause the inner flow mtu to be decreased. If
next hop exceptions are not consulted for pmtu, IP fragmentation will
not be done properly for these routes.

It also seems that we really need to have the PMTU information always
for netfilter TCPMSS clamp-to-pmtu feature to work properly.

So for the time being, cache separate copies of input routes for
each next hop exception.
Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
Reviewed-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2ffae99d

13 6月, 2013 1 次提交

net: Convert uses of typedef ctl_table to struct ctl_table · fe2c6338

由 Joe Perches 提交于 6月 11, 2013

Reduce the uses of this unnecessary typedef.

Done via perl script:

$ git grep --name-only -w ctl_table net | \
  xargs perl -p -i -e '\
	sub trim { my ($local) = @_; $local =~ s/(^\s+|\s+$)//g; return $local; } \
        s/\b(?<!struct\s)ctl_table\b(\s*\*\s*|\s+\w+)/"struct ctl_table " . trim($1)/ge'

Reflow the modified lines that now exceed 80 columns.
Signed-off-by: NJoe Perches <joe@perches.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fe2c6338

03 6月, 2013 3 次提交

ipv4: use separate genid for next hop exceptions · 5aad1de5

由 Timo Teräs 提交于 5月 27, 2013

commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
added the support to flush learned pmtu information.

However, using rt_genid is quite heavy as it is bumped on route
add/change and multicast events amongst other places. These can
happen quite often, especially if using dynamic routing protocols.

While this is ok with routes (as they are just recreated locally),
the pmtu information is learned from remote systems and the icmp
notification can come with long delays. It is worthy to have separate
genid to avoid excessive pmtu resets.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5aad1de5

ipv4: rate limit updating of next hop exceptions with same pmtu · f016229e

由 Timo Teräs 提交于 5月 27, 2013

The tunnel devices call update_pmtu for each packet sent, this causes
contention on the fnhe_lock. Ignore the pmtu update if pmtu is not
actually changed, and there is still plenty of time before the entry
expires.
Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f016229e

ipv4: properly refresh rtable entries on pmtu/redirect events · 387aa65a

由 Timo Teräs 提交于 5月 27, 2013

This reverts commit 05ab86c5 (xfrm4: Invalidate all ipv4 routes on
IPsec pmtu events). Flushing all cached entries is not needed.

Instead, invalidate only the related next hop dsts to recheck for
the added next hop exception where needed. This also fixes a subtle
race due to bumping generation id's before updating the pmtu.

Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NTimo Teräs <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

387aa65a

28 5月, 2013 1 次提交

ipv4: fix redirect handling for TCP packets · f96ef988

由 Michal Kubecek 提交于 5月 28, 2013

Unlike ipv4_redirect() and ipv4_sk_redirect(), ip_do_redirect()
doesn't call __build_flow_key() directly but via
ip_rt_build_flow_key() wrapper. This leads to __build_flow_key()
getting pointer to IPv4 header of the ICMP redirect packet
rather than pointer to the embedded IPv4 header of the packet
initiating the redirect.

As a result, handling of ICMP redirects initiated by TCP packets
is broken. Issue was introduced by

	4895c771 ("ipv4: Add FIB nexthop exceptions.")
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f96ef988

22 3月, 2013 1 次提交

rtnetlink: Remove passing of attributes into rtnl_doit functions · 661d2967

由 Thomas Graf 提交于 3月 21, 2013

With decnet converted, we can finally get rid of rta_buf and its
computations around it. It also gets rid of the minimal header
length verification since all message handlers do that explicitly
anyway.
Signed-off-by: NThomas Graf <tgraf@suug.ch>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

661d2967

20 2月, 2013 1 次提交

net: ipv4: fix waring -Wunused-variable · 082c7ca4

由 Gao feng 提交于 2月 19, 2013

the vars ip_rt_gc_timeout is used only when
CONFIG_SYSCTL is selected.

move these vars into CONFIG_SYSCTL.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

082c7ca4

19 2月, 2013 1 次提交

net: proc: change proc_net_fops_create to proc_create · d4beaa66

由 Gao feng 提交于 2月 18, 2013

Right now, some modules such as bonding use proc_create
to create proc entries under /proc/net/, and other modules
such as ipv4 use proc_net_fops_create.

It looks a little chaos.this patch changes all of
proc_net_fops_create to proc_create. we can remove
proc_net_fops_create after this patch.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d4beaa66

23 1月, 2013 1 次提交

ipv4: Fix route refcount on pmtu discovery · b44108db

由 Steffen Klassert 提交于 1月 22, 2013

git commit 9cb3a50c (ipv4: Invalidate the socket cached route on
pmtu events if possible) introduced a refcount problem. We don't
get a refcount on the route if we get it from__sk_dst_get(), but
we need one if we want to reuse this route because __sk_dst_set()
releases the refcount of the old route. This patch adds proper
refcount handling for that case. We introduce a 'new' flag to
indicate that we are going to use a new route and we release the
old route only if we replace it by a new one.
Reported-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b44108db

22 1月, 2013 1 次提交

ipv4: Invalidate the socket cached route on pmtu events if possible · 9cb3a50c

由 Steffen Klassert 提交于 1月 21, 2013

The route lookup in ipv4_sk_update_pmtu() might return a route
different from the route we cached at the socket. This is because
standart routes are per cpu, so each cpu has it's own struct rtable.
This means that we do not invalidate the socket cached route if the
NET_RX_SOFTIRQ is not served by the same cpu that the sending socket
uses. As a result, the cached route reused until we disconnect.

With this patch we invalidate the socket cached route if possible.
If the socket is owened by the user, we can't update the cached
route directly. A followup patch will implement socket release
callback functions for datagram sockets to handle this case.
Reported-by: NYurij M. Plotnikov <Yurij.Plotnikov@oktetlabs.ru>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9cb3a50c

17 1月, 2013 2 次提交

ipv4: Don't update the pmtu on mtu locked routes · fa1e492a

由 Steffen Klassert 提交于 1月 16, 2013

Routes with locked mtu should not use learned pmtu informations,
so do not update the pmtu on these routes.
Reported-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa1e492a

ipv4: Remove output route check in ipv4_mtu · 38d523e2

由 Steffen Klassert 提交于 1月 16, 2013

The output route check was introduced with git commit 261663b0
(ipv4: Don't use the cached pmtu informations for input routes)
during times when we cached the pmtu informations on the
inetpeer. Now the pmtu informations are back in the routes,
so this check is obsolete. It also had some unwanted side effects,
as reported by Timo Teras and Lukas Tribus.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Acked-by: NTimo Teräs <timo.teras@iki.fi>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

38d523e2

08 12月, 2012 1 次提交

ipv4/route/rtnl: get mcast attributes when dst is multicast · 8caaf7b6

由 Nicolas Dichtel 提交于 12月 04, 2012

Commit f1ce3062 (ipv4: Remove 'rt_dst' from 'struct rtable') removes the
call to ipmr_get_route(), which will get multicast parameters of the route.

I revert the part of the patch that remove this call. I think the goal was only
to get rid of rt_dst field.

The patch is only compiled-tested. My first idea was to remove ipmr_get_route()
because rt_fill_info() was the only user, but it seems the previous patch cleans
the code a bit too much ;-)
Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8caaf7b6

23 11月, 2012 1 次提交

ipv4: do not cache looped multicasts · 63617421

由 Julian Anastasov 提交于 11月 22, 2012

	Starting from 3.6 we cache output routes for
multicasts only when using route to 224/4. For local receivers
we can set RTCF_LOCAL flag depending on the membership but
in such case we use maddr and saddr which are not caching
keys as before. Additionally, we can not use same place to
cache routes that differ in RTCF_LOCAL flag value.

	Fix it by caching only RTCF_MULTICAST entries
without RTCF_LOCAL (send-only, no loopback). As a side effect,
we avoid unneeded lookup for fnhe when not caching because
multicasts are not redirected and they do not learn PMTU.

	Thanks to Maxime Bizon for showing the caching
problems in __mkroute_output for 3.6 kernels: different
RTCF_LOCAL flag in cache can lead to wrong ip_mc_output or
ip_output call and the visible problem is that traffic can
not reach local receivers via loopback.
Reported-by: NMaxime Bizon <mbizon@freebox.fr>
Tested-by: NMaxime Bizon <mbizon@freebox.fr>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

63617421

19 11月, 2012 1 次提交

net: Don't export sysctls to unprivileged users · 464dc801

由 Eric W. Biederman 提交于 11月 16, 2012

In preparation for supporting the creation of network namespaces
by unprivileged users, modify all of the per net sysctl exports
and refuse to allow them to unprivileged users.

This makes it safe for unprivileged users in general to access
per net sysctls, and allows sysctls to be exported to unprivileged
users on an individual basis as they are deemed safe.
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

464dc801

13 11月, 2012 1 次提交

xfrm: Fix the gc threshold value for ipv4 · 703fb94e

由 Steffen Klassert 提交于 11月 13, 2012

The xfrm gc threshold value depends on ip_rt_max_size. This
value was set to INT_MAX with the routing cache removal patch,
so we start doing garbage collecting when we have INT_MAX/2
IPsec routes cached. Fix this by going back to the static
threshold of 1024 routes.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>

703fb94e

19 10月, 2012 1 次提交

ipv4: Fix flushing of cached routing informations · 13d82bf5

由 Steffen Klassert 提交于 10月 17, 2012

Currently we can not flush cached pmtu/redirect informations via
the ipv4_sysctl_rtcache_flush sysctl. We need to check the rt_genid
of the old route and reset the nh exeption if the old route is
expired when we bind a new route to a nh exeption.
Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13d82bf5

11 10月, 2012 1 次提交

ipv4: fix route mark sparse warning · 68aaed54

由 stephen hemminger 提交于 10月 10, 2012

Sparse complains about RTA_MARK which is should be host order according
to include file and usage in iproute.

net/ipv4/route.c:2223:46: warning: incorrect type in argument 3 (different base types)
net/ipv4/route.c:2223:46: expected restricted __be32 [usertype] value
net/ipv4/route.c:2223:46: got unsigned int [unsigned] [usertype] flowic_mark
Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68aaed54

09 10月, 2012 1 次提交

ipv4: Add FLOWI_FLAG_KNOWN_NH · c92b9655

由 Julian Anastasov 提交于 10月 08, 2012

Add flag to request that output route should be
returned with known rt_gateway, in case we want to use
it as nexthop for neighbour resolving.

	The returned route can be cached as follows:

- in NH exception: because the cached routes are not shared
	with other destinations
- in FIB NH: when using gateway because all destinations for
	NH share same gateway

	As last option, to return rt_gateway!=0 we have to
set DST_NOCACHE.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c92b9655

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功