1. 14 5月, 2014 1 次提交
  2. 13 5月, 2014 1 次提交
  3. 09 5月, 2014 7 次提交
  4. 08 5月, 2014 4 次提交
    • S
      ipv4: fib_semantics: increment fib_info_cnt after fib_info allocation · aeefa1ec
      Sergey Popovich 提交于
      Increment fib_info_cnt in fib_create_info() right after successfuly
      alllocating fib_info structure, overwise fib_metrics allocation failure
      leads to fib_info_cnt incorrectly decremented in free_fib_info(), called
      on error path from fib_create_info().
      Signed-off-by: NSergey Popovich <popovich_sergei@mail.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      aeefa1ec
    • W
      net: clean up snmp stats code · 698365fa
      WANG Cong 提交于
      commit 8f0ea0fe (snmp: reduce percpu needs by 50%)
      reduced snmp array size to 1, so technically it doesn't have to be
      an array any more. What's more, after the following commit:
      
      	commit 933393f5
      	Date:   Thu Dec 22 11:58:51 2011 -0600
      
      	    percpu: Remove irqsafe_cpu_xxx variants
      
      	    We simply say that regular this_cpu use must be safe regardless of
      	    preemption and interrupt state.  That has no material change for x86
      	    and s390 implementations of this_cpu operations.  However, arches that
      	    do not provide their own implementation for this_cpu operations will
      	    now get code generated that disables interrupts instead of preemption.
      
      probably no arch wants to have SNMP_ARRAY_SZ == 2. At least after
      almost 3 years, no one complains.
      
      So, just convert the array to a single pointer and remove snmp_mib_init()
      and snmp_mib_free() as well.
      
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      698365fa
    • F
      net: ip: push gso skb forwarding handling down the stack · c7ba65d7
      Florian Westphal 提交于
      Doing the segmentation in the forward path has one major drawback:
      
      When using virtio, we may process gso udp packets coming
      from host network stack.  In that case, netfilter POSTROUTING
      will see one packet with udp header followed by multiple ip
      fragments.
      
      Delay the segmentation and do it after POSTROUTING invocation
      to avoid this.
      
      Fixes: fe6cc55f ("net: ip, ipv6: handle gso skbs in forwarding path")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7ba65d7
    • F
      net: ipv4: ip_forward: fix inverted local_df test · ca6c5d4a
      Florian Westphal 提交于
      local_df means 'ignore DF bit if set', so if its set we're
      allowed to perform ip fragmentation.
      
      This wasn't noticed earlier because the output path also drops such skbs
      (and emits needed icmp error) and because netfilter ip defrag did not
      set local_df until couple of days ago.
      
      Only difference is that DF-packets-larger-than MTU now discarded
      earlier (f.e. we avoid pointless netfilter postrouting trip).
      
      While at it, drop the repeated test ip_exceeds_mtu, checking it once
      is enough...
      
      Fixes: fe6cc55f ("net: ip, ipv6: handle gso skbs in forwarding path")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca6c5d4a
  5. 06 5月, 2014 2 次提交
  6. 05 5月, 2014 1 次提交
  7. 04 5月, 2014 2 次提交
  8. 03 5月, 2014 1 次提交
    • E
      tcp: fix cwnd limited checking to improve congestion control · e114a710
      Eric Dumazet 提交于
      Yuchung discovered tcp_is_cwnd_limited() was returning false in
      slow start phase even if the application filled the socket write queue.
      
      All congestion modules take into account tcp_is_cwnd_limited()
      before increasing cwnd, so this behavior limits slow start from
      probing the bandwidth at full speed.
      
      The problem is that even if write queue is full (aka we are _not_
      application limited), cwnd can be under utilized if TSO should auto
      defer or TCP Small queues decided to hold packets.
      
      So the in_flight can be kept to smaller value, and we can get to the
      point tcp_is_cwnd_limited() returns false.
      
      With TCP Small Queues and FQ/pacing, this issue is more visible.
      
      We fix this by having tcp_cwnd_validate(), which is supposed to track
      such things, take into account unsent_segs, the number of segs that we
      are not sending at the moment due to TSO or TSQ, but intend to send
      real soon. Then when we are cwnd-limited, remember this fact while we
      are processing the window of ACKs that comes back.
      
      For example, suppose we have a brand new connection with cwnd=10; we
      are in slow start, and we send a flight of 9 packets. By the time we
      have received ACKs for all 9 packets we want our cwnd to be 18.
      We implement this by setting tp->lsnd_pending to 9, and
      considering ourselves to be cwnd-limited while cwnd is less than
      twice tp->lsnd_pending (2*9 -> 18).
      
      This makes tcp_is_cwnd_limited() more understandable, by removing
      the GSO/TSO kludge, that tried to work around the issue.
      
      Note the in_flight parameter can be removed in a followup cleanup
      patch.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e114a710
  9. 01 5月, 2014 2 次提交
  10. 29 4月, 2014 1 次提交
  11. 27 4月, 2014 1 次提交
  12. 24 4月, 2014 1 次提交
    • N
      gre: add x-netns support · b57708ad
      Nicolas Dichtel 提交于
      This patch allows to switch the netns when packet is encapsulated or
      decapsulated. In other word, the encapsulated packet is received in a netns,
      where the lookup is done to find the tunnel. Once the tunnel is found, the
      packet is decapsulated and injecting into the corresponding interface which
      stands to another netns.
      
      When one of the two netns is removed, the tunnel is destroyed.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b57708ad
  13. 23 4月, 2014 1 次提交
  14. 21 4月, 2014 2 次提交
  15. 17 4月, 2014 3 次提交
    • N
      ip_tunnel: use the right netns in ioctl handler · 8c923ce2
      Nicolas Dichtel 提交于
      Because the netdevice may be in another netns than the i/o netns, we should
      use the i/o netns instead of dev_net(dev).
      
      The variable 'tunnel' was used only to get 'itn', hence to simplify code I
      remove it and use 't' instead.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8c923ce2
    • C
      ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source() · 0d5edc68
      Cong Wang 提交于
      In my special case, when a packet is redirected from veth0 to lo,
      its skb->dev->ifindex would be LOOPBACK_IFINDEX. Meanwhile we
      pass the hard-coded LOOPBACK_IFINDEX to fib_validate_source()
      in ip_route_input_slow(). This would cause the following check
      in fib_validate_source() fail:
      
                  (dev->ifindex != oif || !IN_DEV_TX_REDIRECTS(idev))
      
      when rp_filter is disabeld on loopback. As suggested by Julian,
      the caller should pass 0 here so that we will not end up by
      calling __fib_validate_source().
      
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0d5edc68
    • C
      ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif · 6a662719
      Cong Wang 提交于
      As suggested by Julian:
      
      	Simply, flowi4_iif must not contain 0, it does not
      	look logical to ignore all ip rules with specified iif.
      
      because in fib_rule_match() we do:
      
              if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
                      goto out;
      
      flowi4_iif should be LOOPBACK_IFINDEX by default.
      
      We need to move LOOPBACK_IFINDEX to include/net/flow.h:
      
      1) It is mostly used by flowi_iif
      
      2) Fix the following compile error if we use it in flow.h
      by the patches latter:
      
      In file included from include/linux/netfilter.h:277:0,
                       from include/net/netns/netfilter.h:5,
                       from include/net/net_namespace.h:21,
                       from include/linux/netdevice.h:43,
                       from include/linux/icmpv6.h:12,
                       from include/linux/ipv6.h:61,
                       from include/net/ipv6.h:16,
                       from include/linux/sunrpc/clnt.h:27,
                       from include/linux/nfs_fs.h:30,
                       from init/do_mounts.c:32:
      include/net/flow.h: In function ‘flowi4_init_output’:
      include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)
      
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a662719
  16. 16 4月, 2014 2 次提交
  17. 14 4月, 2014 2 次提交
  18. 13 4月, 2014 2 次提交
    • N
      vti: don't allow to add the same tunnel twice · 8d89dcdf
      Nicolas Dichtel 提交于
      Before the patch, it was possible to add two times the same tunnel:
      ip l a vti1 type vti remote 10.16.0.121 local 10.16.0.249 key 41
      ip l a vti2 type vti remote 10.16.0.121 local 10.16.0.249 key 41
      
      It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
      argument dev->type, which was set only later (when calling ndo_init handler
      in register_netdevice()). Let's set this type in the setup handler, which is
      called before newlink handler.
      
      Introduced by commit b9959fd3 ("vti: switch to new ip tunnel code").
      
      CC: Cong Wang <amwang@redhat.com>
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d89dcdf
    • N
      gre: don't allow to add the same tunnel twice · 5a455275
      Nicolas Dichtel 提交于
      Before the patch, it was possible to add two times the same tunnel:
      ip l a gre1 type gre remote 10.16.0.121 local 10.16.0.249
      ip l a gre2 type gre remote 10.16.0.121 local 10.16.0.249
      
      It was possible, because ip_tunnel_newlink() calls ip_tunnel_find() with the
      argument dev->type, which was set only later (when calling ndo_init handler
      in register_netdevice()). Let's set this type in the setup handler, which is
      called before newlink handler.
      
      Introduced by commit c5441932 ("GRE: Refactor GRE tunneling code.").
      
      CC: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a455275
  19. 12 4月, 2014 1 次提交
    • D
      net: Fix use after free by removing length arg from sk_data_ready callbacks. · 676d2369
      David S. Miller 提交于
      Several spots in the kernel perform a sequence like:
      
      	skb_queue_tail(&sk->s_receive_queue, skb);
      	sk->sk_data_ready(sk, skb->len);
      
      But at the moment we place the SKB onto the socket receive queue it
      can be consumed and freed up.  So this skb->len access is potentially
      to freed up memory.
      
      Furthermore, the skb->len can be modified by the consumer so it is
      possible that the value isn't accurate.
      
      And finally, no actual implementation of this callback actually uses
      the length argument.  And since nobody actually cared about it's
      value, lots of call sites pass arbitrary values in such as '0' and
      even '1'.
      
      So just remove the length argument from the callback, that way there
      is no confusion whatsoever and all of these use-after-free cases get
      fixed as a side effect.
      
      Based upon a patch by Eric Dumazet and his suggestion to audit this
      issue tree-wide.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      676d2369
  20. 08 4月, 2014 1 次提交
    • C
      net: replace __this_cpu_inc in route.c with raw_cpu_inc · 3ed66e91
      Christoph Lameter 提交于
      The RT_CACHE_STAT_INC macro triggers the new preemption checks
      for __this_cpu ops.
      
      I do not see any other synchronization that would allow the use of a
      __this_cpu operation here however in commit dbd2915c ("[IPV4]:
      RT_CACHE_STAT_INC() warning fix") Andrew justifies the use of
      raw_smp_processor_id() here because "we do not care" about races.  In
      the past we agreed that the price of disabling interrupts here to get
      consistent counters would be too high.  These counters may be inaccurate
      due to race conditions.
      
      The use of __this_cpu op improves the situation already from what commit
      dbd2915c did since the single instruction emitted on x86 does not
      allow the race to occur anymore.  However, non x86 platforms could still
      experience a race here.
      
      Trace:
      
        __this_cpu_add operation in preemptible [00000000] code: avahi-daemon/1193
        caller is __this_cpu_preempt_check+0x38/0x60
        CPU: 1 PID: 1193 Comm: avahi-daemon Tainted: GF            3.12.0-rc4+ #187
        Call Trace:
          check_preemption_disabled+0xec/0x110
          __this_cpu_preempt_check+0x38/0x60
          __ip_route_output_key+0x575/0x8c0
          ip_route_output_flow+0x27/0x70
          udp_sendmsg+0x825/0xa20
          inet_sendmsg+0x85/0xc0
          sock_sendmsg+0x9c/0xd0
          ___sys_sendmsg+0x37c/0x390
          __sys_sendmsg+0x49/0x90
          SyS_sendmsg+0x12/0x20
          tracesys+0xe1/0xe6
      Signed-off-by: NChristoph Lameter <cl@linux.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ed66e91
  21. 05 4月, 2014 1 次提交
    • T
      netfilter: Can't fail and free after table replacement · c58dd2dd
      Thomas Graf 提交于
      All xtables variants suffer from the defect that the copy_to_user()
      to copy the counters to user memory may fail after the table has
      already been exchanged and thus exposed. Return an error at this
      point will result in freeing the already exposed table. Any
      subsequent packet processing will result in a kernel panic.
      
      We can't copy the counters before exposing the new tables as we
      want provide the counter state after the old table has been
      unhooked. Therefore convert this into a silent error.
      
      Cc: Florian Westphal <fw@strlen.de>
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      c58dd2dd
  22. 29 3月, 2014 1 次提交