1. 09 Oct, 2013 6 commits
  2. 08 Oct, 2013 5 commits
    • E
      net: Update the sysctl permissions handler to test effective uid/gid · 88ba09df
      Eric W. Biederman committed
      On Tue, 20 Aug 2013 11:40:04 -0500 Eric Sandeen <sandeen@redhat.com> wrote:
      > This was brought up in a Red Hat bug (which may be marked private, I'm sorry):
      >
      > Bug 987055 - open O_WRONLY succeeds on some root owned files in /proc for process running with unprivileged EUID
      >
      > "On RHEL7 some of the files in /proc can be opened for writing by an unprivileged EUID."
      >
      > The flaw existed upstream as well last I checked.
      >
      > This commit in kernel v3.8 caused the regression:
      >
      > commit cff10976
      > Author: Eric W. Biederman <ebiederm@xmission.com>
      > Date:   Fri Nov 16 03:03:01 2012 +0000
      >
      >     net: Update the per network namespace sysctls to be available to the network namespace owner
      >
      >     - Allow anyone with CAP_NET_ADMIN rights in the user namespace of the
      >       network namespace to change sysctls.
      >     - Allow anyone with the uid of the user namespace root the same
      >       permissions over the network namespace sysctls as the global root.
      >     - Allow anyone with gid of the user namespace root group the same
      >       permissions over the network namespace sysctl as the global root group.
      >
      >     Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      >     Signed-off-by: David S. Miller <davem@davemloft.net>
      >
      > because it changed /proc/sys/net's special permission handler to test current_uid, not
      > current_euid; same for current_gid/current_egid.
      >
      > So in this case, root cannot drop privs via set[ug]id, and retains all privs
      > in this codepath.
      
      Modify the code to use current_euid() and in_egroup_p(), as is done
      in fs/proc/proc_sysctl.c:test_perm().
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
      Reported-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      88ba09df
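The difference between testing the real and the effective uid can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct cred_model`, `applicable_bits()`, `may_write_by_uid()` and `may_write_by_euid()` are hypothetical names, not kernel APIs, and group handling is left out entirely.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a task's credentials. */
struct cred_model {
    unsigned int uid;   /* real user id */
    unsigned int euid;  /* effective user id */
};

/* Pick the rwx bits that apply: owner bits when the tested id owns the
 * entry, "other" bits otherwise (group handling is omitted here). */
static unsigned int applicable_bits(unsigned int mode, unsigned int tested_id,
                                    unsigned int owner_id)
{
    return (tested_id == owner_id) ? (mode >> 6) & 7u : mode & 7u;
}

/* Behaviour described in the report: the real uid is tested. */
static bool may_write_by_uid(const struct cred_model *c, unsigned int mode,
                             unsigned int owner_id)
{
    return (applicable_bits(mode, c->uid, owner_id) & 2u) != 0;
}

/* Behaviour after the fix: the effective uid is tested. */
static bool may_write_by_euid(const struct cred_model *c, unsigned int mode,
                              unsigned int owner_id)
{
    return (applicable_bits(mode, c->euid, owner_id) & 2u) != 0;
}
```

For a process with real uid 0 and effective uid 1000, a root-owned entry with mode 0644 is writable under the first check and not writable under the second, which matches the symptom in the bug report.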
    • J
      ipv4: fix ineffective source address selection · 0a7e2260
      Jiri Benc committed
      When sending out multicast messages, the source address in inet->mc_addr is
      ignored and overwritten by an autoselected one. This is caused by a typo in
      commit 813b3b5d ("ipv4: Use caller's on-stack flowi as-is in output
      route lookups").
      Signed-off-by: Jiri Benc <jbenc@redhat.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0a7e2260
    • A
      net: fix unsafe set_memory_rw from softirq · d45ed4a4
      Alexei Starovoitov committed
      On an x86 system with net.core.bpf_jit_enable = 1, running
      
      sudo tcpdump -i eth1 'tcp port 22'
      
      causes the warning:
      [   56.766097]  Possible unsafe locking scenario:
      [   56.766097]
      [   56.780146]        CPU0
      [   56.786807]        ----
      [   56.793188]   lock(&(&vb->lock)->rlock);
      [   56.799593]   <Interrupt>
      [   56.805889]     lock(&(&vb->lock)->rlock);
      [   56.812266]
      [   56.812266]  *** DEADLOCK ***
      [   56.812266]
      [   56.830670] 1 lock held by ksoftirqd/1/13:
      [   56.836838]  #0:  (rcu_read_lock){.+.+..}, at: [<ffffffff8118f44c>] vm_unmap_aliases+0x8c/0x380
      [   56.849757]
      [   56.849757] stack backtrace:
      [   56.862194] CPU: 1 PID: 13 Comm: ksoftirqd/1 Not tainted 3.12.0-rc3+ #45
      [   56.868721] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
      [   56.882004]  ffffffff821944c0 ffff88080bbdb8c8 ffffffff8175a145 0000000000000007
      [   56.895630]  ffff88080bbd5f40 ffff88080bbdb928 ffffffff81755b14 0000000000000001
      [   56.909313]  ffff880800000001 ffff880800000000 ffffffff8101178f 0000000000000001
      [   56.923006] Call Trace:
      [   56.929532]  [<ffffffff8175a145>] dump_stack+0x55/0x76
      [   56.936067]  [<ffffffff81755b14>] print_usage_bug+0x1f7/0x208
      [   56.942445]  [<ffffffff8101178f>] ? save_stack_trace+0x2f/0x50
      [   56.948932]  [<ffffffff810cc0a0>] ? check_usage_backwards+0x150/0x150
      [   56.955470]  [<ffffffff810ccb52>] mark_lock+0x282/0x2c0
      [   56.961945]  [<ffffffff810ccfed>] __lock_acquire+0x45d/0x1d50
      [   56.968474]  [<ffffffff810cce6e>] ? __lock_acquire+0x2de/0x1d50
      [   56.975140]  [<ffffffff81393bf5>] ? cpumask_next_and+0x55/0x90
      [   56.981942]  [<ffffffff810cef72>] lock_acquire+0x92/0x1d0
      [   56.988745]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
      [   56.995619]  [<ffffffff817628f1>] _raw_spin_lock+0x41/0x50
      [   57.002493]  [<ffffffff8118f52a>] ? vm_unmap_aliases+0x16a/0x380
      [   57.009447]  [<ffffffff8118f52a>] vm_unmap_aliases+0x16a/0x380
      [   57.016477]  [<ffffffff8118f44c>] ? vm_unmap_aliases+0x8c/0x380
      [   57.023607]  [<ffffffff810436b0>] change_page_attr_set_clr+0xc0/0x460
      [   57.030818]  [<ffffffff810cfb8d>] ? trace_hardirqs_on+0xd/0x10
      [   57.037896]  [<ffffffff811a8330>] ? kmem_cache_free+0xb0/0x2b0
      [   57.044789]  [<ffffffff811b59c3>] ? free_object_rcu+0x93/0xa0
      [   57.051720]  [<ffffffff81043d9f>] set_memory_rw+0x2f/0x40
      [   57.058727]  [<ffffffff8104e17c>] bpf_jit_free+0x2c/0x40
      [   57.065577]  [<ffffffff81642cba>] sk_filter_release_rcu+0x1a/0x30
      [   57.072338]  [<ffffffff811108e2>] rcu_process_callbacks+0x202/0x7c0
      [   57.078962]  [<ffffffff81057f17>] __do_softirq+0xf7/0x3f0
      [   57.085373]  [<ffffffff81058245>] run_ksoftirqd+0x35/0x70
      
      The JITed filter memory cannot be reused because it is read-only,
      so use the original BPF insns memory to hold the work_struct.
      
      Defer the kfree of sk_filter until the JIT has finished freeing.
      
      Tested on x86_64 and i386.
      Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d45ed4a4
    • O
      ipv6: Allow the MTU of ipip6 tunnel to be set below 1280 · 582442d6
      Oussama Ghorbel committed
      The (inner) MTU of an ipip6 (IPv4-in-IPv6) tunnel cannot be set below 1280, which is the minimum MTU in IPv6.
      However, there should be no IPv6 on the tunnel interface at all, so the IPv6 rules should not apply.
      More info at https://bugzilla.kernel.org/show_bug.cgi?id=15530
      
      This patch checks the minimum MTU of an IPv6 tunnel according to these rules:
      - If the tunnel is configured in ipip6 mode, the minimum MTU is 68.
      - If the tunnel is configured in ip6ip6 or any other mode, the minimum MTU is 1280.
      Signed-off-by: Oussama Ghorbel <ou.ghorbel@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      582442d6
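The two MTU rules above can be written as a small lookup. This is a sketch, not the patch itself: `tunnel_min_mtu()` and `tunnel_mtu_valid()` are hypothetical helpers, and representing the tunnel mode by its inner protocol number (`IPPROTO_IPIP` for ipip6) is an assumption made for illustration.

```c
#include <netinet/in.h>   /* IPPROTO_IPIP, IPPROTO_IPV6 */

#define MIN_MTU_INNER_IPV4 68    /* minimum named in the commit for ipip6 mode */
#define MIN_MTU_INNER_IPV6 1280  /* IPv6 minimum MTU */

/* Smallest MTU allowed for a tunnel carrying the given inner protocol.
 * Only an IPv4-only inner payload gets the lower bound. */
static int tunnel_min_mtu(int inner_proto)
{
    return inner_proto == IPPROTO_IPIP ? MIN_MTU_INNER_IPV4
                                       : MIN_MTU_INNER_IPV6;
}

/* Returns nonzero if new_mtu is acceptable for this tunnel mode. */
static int tunnel_mtu_valid(int inner_proto, int new_mtu)
{
    return new_mtu >= tunnel_min_mtu(inner_proto);
}
```

With these helpers, an MTU of 1000 is accepted for `IPPROTO_IPIP` and rejected for `IPPROTO_IPV6`.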
    • M
      netif_set_xps_queue: make cpu mask const · 3573540c
      Michael S. Tsirkin committed
      virtio wants to pass in cpumask_of(cpu), so make the parameter
      const to avoid build warnings.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3573540c
  3. 05 Oct, 2013 1 commit
    • E
      tcp: do not forget FIN in tcp_shifted_skb() · 5e8a402f
      Eric Dumazet committed
      Yuchung found the following problem:
      
       There are bugs in the SACK processing code, in the merging part in
       tcp_shift_skb_data(), that incorrectly reset or ignore the sacked
       skb's FIN flag. When a receiver first SACKs the FIN sequence and later
       throws away the ofo queue (e.g., sack-reneging), the sender will stop
       retransmitting the FIN flag and hang forever.
      
      The following packetdrill test can be used to reproduce the bug.
      
      $ cat sack-merge-bug.pkt
      `sysctl -q net.ipv4.tcp_fack=0`
      
      // Establish a connection and send 10 MSS.
      0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
      +.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
      +.000 bind(3, ..., ...) = 0
      +.000 listen(3, 1) = 0
      
      +.050 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7>
      +.000 > S. 0:0(0) ack 1 <mss 1460,nop,nop,sackOK,nop,wscale 6>
      +.001 < . 1:1(0) ack 1 win 1024
      +.000 accept(3, ..., ...) = 4
      
      +.100 write(4, ..., 12000) = 12000
      +.000 shutdown(4, SHUT_WR) = 0
      +.000 > . 1:10001(10000) ack 1
      +.050 < . 1:1(0) ack 2001 win 257
      +.000 > FP. 10001:12001(2000) ack 1
      +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:11001,nop,nop>
      +.050 < . 1:1(0) ack 2001 win 257 <sack 10001:12002,nop,nop>
      // SACK reneg
      +.050 < . 1:1(0) ack 12001 win 257
      +0 %{ print "unacked: ",tcpi_unacked }%
      +5 %{ print "" }%
      
      First, a typo inverted the left and right operands of one OR operation; then
      the code forgot to advance end_seq if the merged skb carried FIN.
      
      The bug was added in 2.6.29 by commit 832d11c5
      ("tcp: Try to restore large SKBs while SACK processing").
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5e8a402f
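The two mistakes described in this commit (flags OR-ed in the wrong direction, and end_seq not advanced for FIN) can be shown on a toy segment model. This is a simplified sketch, not the kernel code: `struct seg_model`, `merge_into_prev()` and the flag constants are hypothetical, and real skbs carry far more state.

```c
#include <stdint.h>

#define FLAG_FIN 0x01u
#define FLAG_PSH 0x08u

/* Hypothetical, minimal description of one queued TCP segment. */
struct seg_model {
    uint32_t seq;      /* first sequence number covered */
    uint32_t end_seq;  /* one past the last sequence number covered */
    uint32_t len;      /* payload bytes */
    uint8_t  flags;
};

/* Merge all of 'next' into 'prev'; 'next' must directly follow 'prev'. */
static void merge_into_prev(struct seg_model *prev, const struct seg_model *next)
{
    prev->len     += next->len;
    prev->end_seq += next->len;     /* account for the shifted payload */
    prev->flags   |= next->flags;   /* the surviving segment inherits the flags */
    if (next->flags & FLAG_FIN)
        prev->end_seq++;            /* FIN occupies one sequence number */
}
```

If the OR were written the other way round (`next->flags |= prev->flags`), the surviving segment would lose FIN, and without the final increment its end_seq would stop one short of the FIN.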
  4. 04 Oct, 2013 1 commit
  5. 03 Oct, 2013 4 commits
    • F
      l2tp: fix kernel panic when using IPv4-mapped IPv6 addresses · e18503f4
      François Cachereul committed
      IPv4-mapped addresses cause a kernel panic.
      The patch just checks whether the IPv6 address is an IPv4-mapped
      address. If so, it uses the IPv4 API instead of IPv6.
      
      [  940.026915] general protection fault: 0000 [#1]
      [  940.026915] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core pppox ppp_generic slhc loop psmouse
      [  940.026915] CPU: 0 PID: 3184 Comm: memcheck-amd64- Not tainted 3.11.0+ #1
      [  940.026915] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
      [  940.026915] task: ffff880007130e20 ti: ffff88000737e000 task.ti: ffff88000737e000
      [  940.026915] RIP: 0010:[<ffffffff81333780>]  [<ffffffff81333780>] ip6_xmit+0x276/0x326
      [  940.026915] RSP: 0018:ffff88000737fd28  EFLAGS: 00010286
      [  940.026915] RAX: c748521a75ceff48 RBX: ffff880000c30800 RCX: 0000000000000000
      [  940.026915] RDX: ffff88000075cc4e RSI: 0000000000000028 RDI: ffff8800060e5a40
      [  940.026915] RBP: ffff8800060e5a40 R08: 0000000000000000 R09: ffff88000075cc90
      [  940.026915] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88000737fda0
      [  940.026915] R13: 0000000000000000 R14: 0000000000002000 R15: ffff880005d3b580
      [  940.026915] FS:  00007f163dc5e800(0000) GS:ffffffff81623000(0000) knlGS:0000000000000000
      [  940.026915] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  940.026915] CR2: 00000004032dc940 CR3: 0000000005c25000 CR4: 00000000000006f0
      [  940.026915] Stack:
      [  940.026915]  ffff88000075cc4e ffffffff81694e90 ffff880000c30b38 0000000000000020
      [  940.026915]  11000000523c4bac ffff88000737fdb4 0000000000000000 ffff880000c30800
      [  940.026915]  ffff880005d3b580 ffff880000c30b38 ffff8800060e5a40 0000000000000020
      [  940.026915] Call Trace:
      [  940.026915]  [<ffffffff81356cc3>] ? inet6_csk_xmit+0xa4/0xc4
      [  940.026915]  [<ffffffffa0038535>] ? l2tp_xmit_skb+0x503/0x55a [l2tp_core]
      [  940.026915]  [<ffffffff812b8d3b>] ? pskb_expand_head+0x161/0x214
      [  940.026915]  [<ffffffffa003e91d>] ? pppol2tp_xmit+0xf2/0x143 [l2tp_ppp]
      [  940.026915]  [<ffffffffa00292e0>] ? ppp_channel_push+0x36/0x8b [ppp_generic]
      [  940.026915]  [<ffffffffa00293fe>] ? ppp_write+0xaf/0xc5 [ppp_generic]
      [  940.026915]  [<ffffffff8110ead4>] ? vfs_write+0xa2/0x106
      [  940.026915]  [<ffffffff8110edd6>] ? SyS_write+0x56/0x8a
      [  940.026915]  [<ffffffff81378ac0>] ? system_call_fastpath+0x16/0x1b
      [  940.026915] Code: 00 49 8b 8f d8 00 00 00 66 83 7c 11 02 00 74 60 49
      8b 47 58 48 83 e0 fe 48 8b 80 18 01 00 00 48 85 c0 74 13 48 8b 80 78 02
      00 00 <48> ff 40 28 41 8b 57 68 48 01 50 30 48 8b 54 24 08 49 c7 c1 51
      [  940.026915] RIP  [<ffffffff81333780>] ip6_xmit+0x276/0x326
      [  940.026915]  RSP <ffff88000737fd28>
      [  940.057945] ---[ end trace be8aba9a61c8b7f3 ]---
      [  940.058583] Kernel panic - not syncing: Fatal exception in interrupt
      Signed-off-by: François CACHEREUL <f.cachereul@alphalink.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e18503f4
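The check this commit describes, whether an IPv6 address is really an IPv4-mapped one, has a userspace counterpart in the standard `IN6_IS_ADDR_V4MAPPED` macro. The sketch below is a userspace illustration only; `is_v4_mapped()` and `mapped_to_v4()` are hypothetical helpers and do not reflect how the l2tp code is structured.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>

/* Returns 1 if 'text' parses as an IPv4-mapped IPv6 address
 * (::ffff:a.b.c.d), 0 for any other IPv6 address, -1 if it does not parse. */
static int is_v4_mapped(const char *text)
{
    struct in6_addr a6;

    if (inet_pton(AF_INET6, text, &a6) != 1)
        return -1;
    return IN6_IS_ADDR_V4MAPPED(&a6) ? 1 : 0;
}

/* Extracts the embedded IPv4 address (the last four bytes), still in
 * network byte order. Only meaningful for IPv4-mapped input. */
static struct in_addr mapped_to_v4(const struct in6_addr *a6)
{
    struct in_addr a4;

    memcpy(&a4, &a6->s6_addr[12], sizeof(a4));
    return a4;
}
```

A caller would branch on `is_v4_mapped()` and hand the result of `mapped_to_v4()` to the IPv4 code path instead of treating the address as native IPv6.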
    • E
      net: do not call sock_put() on TIMEWAIT sockets · 80ad1d61
      Eric Dumazet committed
      Commit 3ab5aee7 ("net: Convert TCP & DCCP hash tables to use RCU /
      hlist_nulls") incorrectly used sock_put() on TIMEWAIT sockets.
      
      We should instead use inet_twsk_put().
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80ad1d61
    • A
      tcp: Always set options to 0 before calling tcp_established_options · 5843ef42
      Andi Kleen committed
      tcp_established_options assumes opts->options is 0 before it is called,
      because it performs a read-modify-write on it.
      
      For the tcp_current_mss() case the opts structure is not zeroed,
      so this can be done with uninitialized values.
      
      This is ok, because ->options is not read in this path.
      But it's still better to avoid the operation on the uninitialized
      field. This shuts up a static code analyzer, and presumably
      may help the optimizer.
      
      Cc: netdev@vger.kernel.org
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5843ef42
    • M
      unix_diag: fix info leak · 6865d1e8
      Mathias Krause committed
      When filling the netlink message we fail to wipe the pad field,
      and therefore leak one byte of heap memory to userland. Fix this by
      setting pad to 0.
      Signed-off-by: Mathias Krause <minipli@googlemail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      6865d1e8
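The leak pattern is easy to reproduce in miniature: a message structure with an explicit padding byte is filled field by field, and whatever was in memory before stays in the pad unless it is cleared. The sketch below assumes a hypothetical layout (`struct msg_model`) and helper (`fill_msg()`); it is not the unix_diag structure.

```c
#include <stdint.h>

/* Hypothetical message layout with an explicit padding byte. */
struct msg_model {
    uint8_t  family;
    uint8_t  type;
    uint8_t  state;
    uint8_t  pad;     /* must not carry stale memory to the receiver */
    uint32_t ino;
};

/* Fill every field, including the padding, so that no byte of the
 * destination buffer keeps its previous contents. */
static void fill_msg(struct msg_model *m, uint8_t family, uint8_t type,
                     uint8_t state, uint32_t ino)
{
    m->family = family;
    m->type   = type;
    m->state  = state;
    m->pad    = 0;      /* the one-line fix: wipe the pad */
    m->ino    = ino;
}
```

Dropping the `m->pad = 0` line leaves the function looking complete, yet one byte of the buffer's previous contents would still be copied out with the message.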
  6. 02 Oct, 2013 8 commits
    • M
      batman-adv: set up network coding packet handlers during module init · 6c519bad
      Matthias Schiffer committed
      batman-adv saves its table of packet handlers as a global state, so handlers
      must be set up only once (and setting them up a second time will fail).
      
      The recently-added network coding support tries to set up its handler each time
      a new softif is registered, which obviously fails when more than one softif is
      used (and as a consequence, the softif creation fails).
      
      Fix this by splitting up batadv_nc_init into batadv_nc_init (which is called
      only once) and batadv_nc_mesh_init (which is called for each softif); in
      addition batadv_nc_free is renamed to batadv_nc_mesh_free to keep naming
      consistent.
      Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
      Signed-off-by: Marek Lindner <mareklindner@neomailbox.ch>
      Signed-off-by: Antonio Quartulli <antonio@meshcoding.com>
      6c519bad
    • E
      pkt_sched: fq: rate limiting improvements · 0eab5eb7
      Eric Dumazet committed
      FQ rate limiting suffers from two problems, reported
      by Steinar:
      
      1) FQ enforces a delay when flow quantum is exhausted in order
      to reduce cpu overhead. But if packets are small, current
      delay computation is slightly wrong, and observed rates can
      be too high.
      
      Steinar had this problem because he disabled TSO and GSO,
      and default FQ quantum is 2*1514.
      
      (Of course, I hope the recent TSO auto-sizing changes will remove the
      need to disable TSO in the first place.)
      
      2) maxrate was not used for forwarded flows (skbs not attached
      to a socket)
      
      Tested:
      
      tc qdisc add dev eth0 root est 1sec 4sec fq maxrate 8Mbit
      netperf -H lpq84 -l 1000 &
      sleep 10 ; tc -s qdisc show dev eth0
      qdisc fq 8003: root refcnt 32 limit 10000p flow_limit 100p buckets 1024
       quantum 3028 initial_quantum 15140 maxrate 8000Kbit
       Sent 16819357 bytes 11258 pkt (dropped 0, overlimits 0 requeues 0)
       rate 7831Kbit 653pps backlog 7570b 5p requeues 0
        44 flows (43 inactive, 1 throttled), next packet delay 2977352 ns
        0 gc, 0 highprio, 5545 throttled
      
      lpq83:~# tcpdump -p -i eth0 host lpq84 -c 12
      09:02:52.079484 IP lpq83 > lpq84: . 1389536928:1389538376(1448) ack 3808678021 win 457 <nop,nop,timestamp 961812 572609068>
      09:02:52.079499 IP lpq83 > lpq84: . 1448:2896(1448) ack 1 win 457 <nop,nop,timestamp 961812 572609068>
      09:02:52.079906 IP lpq84 > lpq83: . ack 2896 win 16384 <nop,nop,timestamp 572609080 961812>
      09:02:52.082568 IP lpq83 > lpq84: . 2896:4344(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
      09:02:52.082581 IP lpq83 > lpq84: . 4344:5792(1448) ack 1 win 457 <nop,nop,timestamp 961815 572609071>
      09:02:52.083017 IP lpq84 > lpq83: . ack 5792 win 16384 <nop,nop,timestamp 572609083 961815>
      09:02:52.085678 IP lpq83 > lpq84: . 5792:7240(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
      09:02:52.085693 IP lpq83 > lpq84: . 7240:8688(1448) ack 1 win 457 <nop,nop,timestamp 961818 572609074>
      09:02:52.086117 IP lpq84 > lpq83: . ack 8688 win 16384 <nop,nop,timestamp 572609086 961818>
      09:02:52.088792 IP lpq83 > lpq84: . 8688:10136(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
      09:02:52.088806 IP lpq83 > lpq84: . 10136:11584(1448) ack 1 win 457 <nop,nop,timestamp 961821 572609077>
      09:02:52.089217 IP lpq84 > lpq83: . ack 11584 win 16384 <nop,nop,timestamp 572609090 961821>
      Reported-by: Steinar H. Gunderson <sesse@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0eab5eb7
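The numbers in the test output can be cross-checked with a little arithmetic: the time needed to send one quantum at the configured maxrate. The helper below is a hypothetical sketch of that calculation (`pacing_delay_ns()` is not a kernel function), and it does not reproduce the exact delay computation the patch changes.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Time needed to send 'bytes' at 'rate_bytes_per_sec', in nanoseconds.
 * Returns 0 when no rate is configured, to avoid dividing by zero. */
static uint64_t pacing_delay_ns(uint32_t bytes, uint64_t rate_bytes_per_sec)
{
    if (rate_bytes_per_sec == 0)
        return 0;
    return (uint64_t)bytes * NSEC_PER_SEC / rate_bytes_per_sec;
}
```

A maxrate of 8Mbit is 1,000,000 bytes per second, so one quantum of 3028 bytes takes 3,028,000 ns. That is in line with the "next packet delay 2977352 ns" reported by tc and with the roughly 3.1 ms spacing between packet pairs in the tcpdump trace.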
    • N
      ip6tnl: allow to use rtnl ops on fb tunnel · bb814094
      Nicolas Dichtel committed
      rtnl ops were introduced by c075b130 ("ip6tnl: advertise tunnel param via
      rtnl"), but I forgot to assign rtnl ops to fb tunnels.
      
      Now that it is done, we must remove the explicit call to
      unregister_netdevice_queue(), because the fallback tunnel is added to the queue
      in ip6_tnl_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
      is valid since commit 0bd87628 ("ip6tnl: add x-netns support")).
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      bb814094
    • N
      sit: allow to use rtnl ops on fb tunnel · 205983c4
      Nicolas Dichtel committed
      rtnl ops were introduced by ba3e3f50 ("sit: advertise tunnel param via
      rtnl"), but I forgot to assign rtnl ops to fb tunnels.
      
      Now that it is done, we must remove the explicit call to
      unregister_netdevice_queue(), because the fallback tunnel is added to the queue
      in sit_destroy_tunnels() when checking rtnl_link_ops of all netdevices (this
      is valid since commit 5e6700b3 ("sit: add support of x-netns")).
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      205983c4
    • S
      ip_tunnel: Remove double unregister of the fallback device · cfe4a536
      Steffen Klassert committed
      When queueing the netdevices for removal, we queue the
      fallback device twice in ip_tunnel_destroy(): first
      when we queue all netdevices in the namespace, and
      then again explicitly. Fix this by removing the explicit
      queueing of the fallback device.
      
      Bug was introduced when network namespace support was added
      with commit 6c742e71 ("ipip: add x-netns support").
      
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      cfe4a536
    • S
      ip_tunnel_core: Change __skb_push back to skb_push · 78a3694d
      Steffen Klassert committed
      Git commit 0e6fbc5b ("ip_tunnels: extend iptunnel_xmit()")
      moved the IP header installation to iptunnel_xmit() and
      changed skb_push() to __skb_push(). This makes possible
      bugs hard to track down, so change it back to skb_push().
      
      Cc: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      78a3694d
    • S
      ip_tunnel: Add fallback tunnels to the hash lists · 67013282
      Steffen Klassert committed
      Currently we cannot update the tunnel parameters of
      the fallback tunnels because we don't find them in the
      hash lists. Fix this by adding them on initialization.
      
      The bug was introduced with commit c5441932
      ("GRE: Refactor GRE tunneling code.")
      
      Cc: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      67013282
    • S
      ip_tunnel: Fix a memory corruption in ip_tunnel_xmit · 3e08f4a7
      Steffen Klassert committed
      We might extend the used area of an skb beyond the total
      headroom when we install the ipip header. Fix this by
      calling skb_cow_head() unconditionally.
      
      The bug was introduced with commit c5441932
      ("GRE: Refactor GRE tunneling code.")
      
      Cc: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3e08f4a7
  7. 01 Oct, 2013 8 commits
    • S
      ipv6 mcast: use in6_dev_put in timer handlers instead of __in6_dev_put · 9260d3e1
      Salam Noureddine committed
      It is possible for the timer handlers to run after the call to
      ipv6_mc_down so use in6_dev_put instead of __in6_dev_put in the
      handler function in order to do proper cleanup when the refcnt
      reaches 0. Otherwise, the refcnt can reach zero without the
      inet6_dev being destroyed and we end up leaking a reference to
      the net_device and see messages like the following,
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Tested on linux-3.4.43.
      Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9260d3e1
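The leak described here comes down to two styles of dropping a reference: one that only decrements the counter, and one that also runs the cleanup when the counter reaches zero. The sketch below models both with hypothetical names (`struct dev_model`, `put_no_cleanup()`, `put_with_cleanup()`); it is single-threaded and uses a plain int where real code would use an atomic counter.

```c
/* Hypothetical refcounted object; 'destroyed' stands in for the cleanup
 * that releases whatever the object itself holds. */
struct dev_model {
    int refcnt;
    int destroyed;
};

/* Bare decrement: safe only when the caller knows this cannot be the
 * last reference, because nothing runs when the count hits zero. */
static void put_no_cleanup(struct dev_model *d)
{
    d->refcnt--;
}

/* Full put: runs the cleanup when the last reference goes away. */
static void put_with_cleanup(struct dev_model *d)
{
    if (--d->refcnt == 0)
        d->destroyed = 1;
}
```

If a timer handler happens to hold the last reference and uses the bare decrement, the count reaches zero but the object is never destroyed, which is the situation the commit message describes.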
    • S
      ipv4 igmp: use in_dev_put in timer handlers instead of __in_dev_put · e2401654
      Salam Noureddine committed
      It is possible for the timer handlers to run after the call to
      ip_mc_down so use in_dev_put instead of __in_dev_put in the handler
      function in order to do proper cleanup when the refcnt reaches 0.
      Otherwise, the refcnt can reach zero without the in_device being
      destroyed and we end up leaking a reference to the net_device and
      see messages like the following,
      
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Tested on linux-3.4.43.
      Signed-off-by: Salam Noureddine <noureddine@aristanetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e2401654
    • H
      ipv6: gre: correct calculation of max_headroom · 3da812d8
      Hannes Frederic Sowa committed
      gre_hlen already accounts for sizeof(struct ipv6_hdr) + gre header,
      so initialize max_headroom to zero. Otherwise the
      
      	if (encap_limit >= 0) {
      		max_headroom += 8;
      		mtu -= 8;
      	}
      
      increments an uninitialized variable before max_headroom was reset.
      
      Found with coverity: 728539
      
      Cc: Dmitry Kozlov <xeb@mail.ru>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      3da812d8
    • E
      tcp: TSQ can use a dynamic limit · c9eeec26
      Eric Dumazet committed
      When TCP Small Queues was added, we used a sysctl to limit the amount of
      packets queued on Qdisc/device queues for a given TCP flow.
      
      The problem is that this limit is either too big for low rates or too small
      for high rates.
      
      Now that the TCP stack has rate estimation in sk->sk_pacing_rate, and TSO
      auto sizing, it can better control the number of packets in Qdisc/device
      queues.
      
      The new limit is two packets or at least 1 to 2 ms worth of packets.
      
      Low-rate flows benefit from this patch by having an even smaller
      number of packets in queues, allowing for faster recovery and
      better RTT estimations.
      
      High-rate flows benefit from this patch by being allowed more than 2 packets
      in flight, as we had reports this was a limiting factor to reach line
      rate. [ In particular if TX completion is delayed because of coalescing
      parameters ]
      
      Example for a single flow on a 10Gbps link controlled by FQ/pacing:
      
      14 packets in flight instead of 2
      
      $ tc -s -d qd
      qdisc fq 8001: dev eth0 root refcnt 32 limit 10000p flow_limit 100p
      buckets 1024 quantum 3028 initial_quantum 15140
       Sent 1168459366606 bytes 771822841 pkt (dropped 0, overlimits 0
      requeues 6822476)
       rate 9346Mbit 771713pps backlog 953820b 14p requeues 6822476
        2047 flow, 2046 inactive, 1 throttled, delay 15673 ns
        2372 gc, 0 highprio, 0 retrans, 9739249 throttled, 0 flows_plimit
      
      Note that sk_pacing_rate is currently set to twice the actual rate, but
      this might be refined in the future when a flow is in congestion
      avoidance.
      
      Additional change : skb->destructor should be set to tcp_wfree().
      
      A future patch (for linux 3.13+) might remove tcp_limit_output_bytes
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Acked-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9eeec26
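The rule "two packets or at least 1 to 2 ms worth of packets" can be sketched as the larger of two byte counts. The function below is an illustration under stated assumptions, not the code from the patch: `tsq_limit_bytes()` is a hypothetical name, and approximating one millisecond of traffic by shifting the bytes-per-second rate right by 10 (dividing by 1024) is a choice made for this example.

```c
#include <stdint.h>

/* Per-flow queue limit in bytes: at least two packets, or roughly one
 * millisecond of traffic at the pacing rate, whichever is larger. */
static uint64_t tsq_limit_bytes(uint32_t pkt_bytes,
                                uint64_t pacing_rate_bytes_per_sec)
{
    uint64_t two_packets = 2ULL * pkt_bytes;
    uint64_t about_1ms   = pacing_rate_bytes_per_sec >> 10; /* rate / 1024 */

    return two_packets > about_1ms ? two_packets : about_1ms;
}
```

At 1 Mbit/s (125,000 bytes per second) the two-packet floor dominates, while at 10 Gbit/s (1,250,000,000 bytes per second) the rate term allows about 1.2 MB in the queues. Because the commit notes that sk_pacing_rate is currently twice the actual rate, a 1 ms budget at the pacing rate corresponds to about 2 ms at the real sending rate.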
    • E
      pkt_sched: fq: qdisc dismantle fixes · 8d34ce10
      Eric Dumazet committed
      fq_reset() should drop all packets in the queue, including
      those of throttled flows.
      
      This patch moves code from fq_destroy() to fq_reset()
      to do the cleaning.
      
      fq_change() must stop calling fq_dequeue() if all remaining
      packets are from throttled flows.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d34ce10
    • E
      net: flow_dissector: fix thoff for IPPROTO_AH · b8678358
      Eric Dumazet committed
      In commit 8ed78166 ("flow_keys: include thoff into flow_keys for
      later usage"), we missed that existing code was using nhoff as a
      temporary variable that might not always contain the transport header
      offset.
      
      This is not a problem for TCP/UDP because port offset (@poff)
      is 0 for these protocols.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Nikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Acked-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b8678358
    • P
      ipv6: Fix preferred_lft not updating in some cases · c9d55d5b
      Paul Marks committed
      Consider the scenario where an IPv6 router is advertising a fixed
      preferred_lft of 1800 seconds, while the valid_lft begins at 3600
      seconds and counts down in realtime.
      
      A client should reset its preferred_lft to 1800 every time the RA is
      received, but a bug is causing Linux to ignore the update.
      
      The core problem is here:
        if (prefered_lft != ifp->prefered_lft) {
      
      Note that ifp->prefered_lft is an offset, so it doesn't decrease over
      time.  Thus, the comparison is always (1800 != 1800), which fails to
      trigger an update.
      
      The most direct solution would be to compute a "stored_prefered_lft",
      and use that value in the comparison.  But I think that trying to filter
      out unnecessary updates here is a premature optimization.  In order for
      the filter to apply, both of these would need to hold:
      
        - The advertised valid_lft and preferred_lft are both declining in
          real time.
        - No clock skew exists between the router & client.
      
      So in this patch, I've set "update_lft = 1" unconditionally, which
      allows the surrounding code to be greatly simplified.
      Signed-off-by: Paul Marks <pmarks@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9d55d5b
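The effect of comparing against a stored offset can be reproduced with a tiny model. In this sketch the lifetime is stored as an offset from a timestamp, as the commit describes; `struct addr_model`, `remaining_lft()`, `update_filtered()` and `update_always()` are hypothetical names, and times are plain seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical address record: the lifetime is an offset from tstamp. */
struct addr_model {
    uint32_t prefered_lft; /* seconds, counted from tstamp */
    uint64_t tstamp;       /* time (seconds) the lifetime was last set */
};

/* Seconds of preferred lifetime left at time 'now'. */
static uint32_t remaining_lft(const struct addr_model *a, uint64_t now)
{
    uint64_t elapsed = now - a->tstamp;

    return elapsed >= a->prefered_lft ? 0
                                      : (uint32_t)(a->prefered_lft - elapsed);
}

/* Filtered update: skipped when the advertised value equals the stored
 * offset, even though the remaining lifetime has been counting down. */
static bool update_filtered(struct addr_model *a, uint32_t advertised,
                            uint64_t now)
{
    if (advertised == a->prefered_lft)
        return false;
    a->prefered_lft = advertised;
    a->tstamp = now;
    return true;
}

/* Unconditional update: every advertisement restarts the lifetime. */
static void update_always(struct addr_model *a, uint32_t advertised,
                          uint64_t now)
{
    a->prefered_lft = advertised;
    a->tstamp = now;
}
```

With a stored offset of 1800 and an advertisement of 1800 arriving 600 seconds later, the filtered variant leaves 1200 seconds remaining, while the unconditional variant resets it to 1800.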
    • P
      ip_tunnel: Do not use stale inner_iph pointer. · d4a71b15
      Pravin B Shelar committed
      While sending a packet, skb_cow_head() can change the skb header, which
      invalidates the inner_iph pointer into the skb header. The following patch
      avoids using it. Found by code inspection.
      
      This bug was introduced by commit 0e6fbc5b ("ip_tunnels: extend
      iptunnel_xmit()").
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d4a71b15
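The same class of bug exists in userspace whenever a buffer can be reallocated while a pointer into it is still held. The sketch below is an analogy using `realloc()`, not a model of skb internals; `struct buf_model`, `buf_grow()` and `read_byte_after_grow()` are hypothetical names. The safe pattern it shows is to keep an offset and derive the pointer again after the buffer may have moved.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer. */
struct buf_model {
    unsigned char *data;
    size_t len;
};

/* Grow the buffer to new_len bytes, zero-filling the new tail. The data
 * may move, so any pointer into the old buffer is invalid afterwards.
 * Returns 0 on success, -1 on failure. */
static int buf_grow(struct buf_model *b, size_t new_len)
{
    unsigned char *p;

    if (new_len < b->len)
        return -1;
    p = realloc(b->data, new_len);
    if (!p)
        return -1;
    memset(p + b->len, 0, new_len - b->len);
    b->data = p;
    b->len = new_len;
    return 0;
}

/* Read one header byte after growing: the pointer is derived from the
 * offset only after the buffer has reached its final location. */
static int read_byte_after_grow(struct buf_model *b, size_t off,
                                size_t new_len, unsigned char *out)
{
    if (off >= b->len || buf_grow(b, new_len) != 0)
        return -1;
    *out = b->data[off];
    return 0;
}
```

Caching `b->data + off` before the call to `buf_grow()` and dereferencing it afterwards would be the userspace equivalent of using a stale header pointer.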
  8. 30 Sep, 2013 1 commit
  9. 29 Sep, 2013 3 commits
    • E
      net: net_secret should not depend on TCP · 9a3bab6b
      Eric Dumazet committed
      A host might need net_secret[] and never open a single socket.
      
      The problem was added in commit aebda156
      ("net: defer net_secret[] initialization").
      
      Based on a prior patch from Hannes Frederic Sowa.
      Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Hannes Frederic Sowa <hannes@strressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a3bab6b
    • E
      net: Delay default_device_exit_batch until no devices are unregistering v2 · 50624c93
      Eric W. Biederman committed
      There is currently no serialization between network namespaces exiting and
      network devices exiting, as the final part of netdev_run_todo does not
      happen under the rtnl_lock.  This is compounded by the fact that the
      only list of devices unregistering in netdev_run_todo is local to
      netdev_run_todo.
      
      This lack of serialization in extreme cases results in network devices
      unregistering in netdev_run_todo after the loopback device of their
      network namespace has been freed (making dst_ifdown unsafe), and after
      their network namespace has exited (making the NETDEV_UNREGISTER
      and NETDEV_UNREGISTER_FINAL callbacks unsafe).
      
      Add the missing serialization by a per network namespace count of how
      many network devices are unregistering and having a wait queue that is
      woken up whenever the count is decreased.  The count and wait queue
      allow default_device_exit_batch to wait until all of the unregistration
      activity for a network namespace has finished before proceeding to
      unregister the loopback device and then allowing the network namespace
      to exit.
      
      Only a single global wait queue is used because there is a single global
      lock, and there is a single waiter, per network namespace wait queues
      would be a waste of resources.
      
      The per network namespace count of unregistering devices gives a
      progress guarantee because the number of network devices unregistering
      in an exiting network namespace must ultimately drop to zero (assuming
      network device unregistration completes).
      
      The basic logic remains the same as in v1.  This patch is now half
      comment and half rtnl_lock_unregistering, an expanded version of
      wait_event that performs no extra work in the common case where no network
      devices are unregistering when we get to default_device_exit_batch.
      Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      50624c93
    • C
      IPv6 NAT: Do not drop DNATed 6to4/6rd packets · 7df37ff3
      Catalin(ux) M. BOIE committed
      When a router is doing DNAT for 6to4/6rd packets, the latest
      anti-spoofing commit 218774dc ("ipv6: add anti-spoofing checks for
      6to4 and 6rd") will drop them because the embedded IPv6 address does
      not match the IPv4 destination. This patch will allow them to pass by
      testing if we have an address that matches on the 6to4/6rd interface.  I
      have been hit by this problem using Fedora and IPV6TO4_IPV4ADDR.
      Also, log the dropped packets (with rate limit).
      Signed-off-by: Catalin(ux) M. BOIE <catab@embedromix.ro>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7df37ff3
  10. 27 Sep, 2013 3 commits