1. 21 10月, 2016 1 次提交
    • J
      ipv4/6: use core net MTU range checking · b96f9afe
      Jarod Wilson 提交于
      ipv4/ip_tunnel:
      - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len - t_hlen
      - preserve all ndo_change_mtu checks for now to prevent regressions
      
      ipv6/ip6_tunnel:
      - min_mtu = 68, max_mtu = 0xFFF8 - dev->hard_header_len
      - preserve all ndo_change_mtu checks for now to prevent regressions
      
      ipv6/ip6_vti:
      - min_mtu = 1280, max_mtu = 65535
      - remove redundant vti6_change_mtu
      
      ipv6/sit:
      - min_mtu = 1280, max_mtu = 0xFFF8 - t_hlen
      - remove redundant ipip6_tunnel_change_mtu
      
      CC: netdev@vger.kernel.org
      CC: "David S. Miller" <davem@davemloft.net>
      CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      CC: James Morris <jmorris@namei.org>
      CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      CC: Patrick McHardy <kaber@trash.net>
      Signed-off-by: NJarod Wilson <jarod@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b96f9afe
  2. 20 10月, 2016 1 次提交
  3. 17 10月, 2016 1 次提交
  4. 16 10月, 2016 1 次提交
    • T
      ila: Cache a route to translated address · 79ff2fc3
      Tom Herbert 提交于
      Add a dst_cache to ila_lwt structure. This holds a cached route for the
      translated address. In ila_output we now perform a route lookup after
      translation and if possible (destination in original route is full 128
      bits) we set the dst_cache. Subsequent calls to ila_output can then use
      the cache to avoid the route lookup.
      
      This eliminates the need to set the gateway on ILA routes as previously
      was being done. Now we can do something like:
      
      ./ip route add 3333::2000:0:0:2/128 encap ila 2222:0:0:2 \
          csum-mode neutral-map dev eth0  ## No via needed!
      Signed-off-by: NTom Herbert <tom@herbertland.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      79ff2fc3
  5. 13 10月, 2016 1 次提交
    • E
      ipv6: tcp: restore IP6CB for pktoptions skbs · 8ce48623
      Eric Dumazet 提交于
      Baozeng Ding reported following KASAN splat :
      
      BUG: KASAN: use-after-free in ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 at addr ffff880029c84ec8
      Read of size 1 by task poc/25548
      Call Trace:
       [<ffffffff82cf43c9>] dump_stack+0x12e/0x185 /lib/dump_stack.c:15
       [<     inline     >] print_address_description /mm/kasan/report.c:204
       [<ffffffff817ced3b>] kasan_report_error+0x48b/0x4b0 /mm/kasan/report.c:283
       [<     inline     >] kasan_report /mm/kasan/report.c:303
       [<ffffffff817ced9e>] __asan_report_load1_noabort+0x3e/0x40 /mm/kasan/report.c:321
       [<ffffffff85c71da1>] ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 /net/ipv6/datagram.c:687
       [<ffffffff85c734c3>] ip6_datagram_recv_ctl+0x33/0x40
       [<ffffffff85c0b07c>] do_ipv6_getsockopt.isra.4+0xaec/0x2150
       [<ffffffff85c0c7f6>] ipv6_getsockopt+0x116/0x230
       [<ffffffff859b5a12>] tcp_getsockopt+0x82/0xd0 /net/ipv4/tcp.c:3035
       [<ffffffff855fb385>] sock_common_getsockopt+0x95/0xd0 /net/core/sock.c:2647
       [<     inline     >] SYSC_getsockopt /net/socket.c:1776
       [<ffffffff855f8ba2>] SyS_getsockopt+0x142/0x230 /net/socket.c:1758
       [<ffffffff8685cdc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
      Memory state around the buggy address:
       ffff880029c84d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff880029c84e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      > ffff880029c84e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                    ^
       ffff880029c84f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff880029c84f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      He also provided a syzkaller reproducer.
      
      Issue is that ip6_datagram_recv_specific_ctl() expects to find IP6CB
      data that was moved at a different place in tcp_v6_rcv()
      
      This patch moves tcp_v6_restore_cb() up and calls it from
      tcp_v6_do_rcv() when np->pktoptions is set.
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ce48623
  6. 08 10月, 2016 1 次提交
  7. 03 10月, 2016 1 次提交
    • M
      ipv6 addrconf: remove addrconf_sysctl_hop_limit() · cb9e684e
      Maciej Żenczykowski 提交于
      This is an effective no-op in terms of user observable behaviour.
      
      By preventing the overwrite of non-null extra1/extra2 fields
      in addrconf_sysctl() we can enable the use of proc_dointvec_minmax().
      
      This allows us to eliminate the constant min/max (1..255) trampoline
      function that is addrconf_sysctl_hop_limit().
      
      This is nice because it simplifies the code, and allows future
      sysctls with constant min/max limits to also not require trampolines.
      
      We still can't eliminate the trampoline for mtu because it isn't
      actually a constant (it depends on other tunables of the device)
      and thus requires at-write-time logic to enforce range.
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Acked-by: NErik Kline <ek@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb9e684e
  8. 30 9月, 2016 3 次提交
  9. 26 9月, 2016 3 次提交
    • N
      ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 2cf75070
      Nikolay Aleksandrov 提交于
      Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
      instead of the previous dst_pid which was copied from in_skb's portid.
      Since the skb is new the portid is 0 at that point so the packets are sent
      to the kernel and we get scheduling while atomic or a deadlock (depending
      on where it happens) by trying to acquire rtnl two times.
      Also since this is RTM_GETROUTE, it can be triggered by a normal user.
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cf75070
    • L
      netfilter: nf_log: get rid of XT_LOG_* macros · 8cb2a7d5
      Liping Zhang 提交于
      nf_log is used by both nftables and iptables, so use XT_LOG_XXX macros
      here is not appropriate. Replace them with NF_LOG_XXX.
      Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      8cb2a7d5
    • L
      netfilter: nft_log: complete NFTA_LOG_FLAGS attr support · ff107d27
      Liping Zhang 提交于
      NFTA_LOG_FLAGS attribute is already supported, but the related
      NF_LOG_XXX flags are not exposed to the userspace. So we cannot
      explicitly enable log flags to log uid, tcp sequence, ip options
      and so on, i.e. such rule "nft add rule filter output log uid"
      is not supported yet.
      
      So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In
      order to keep consistent with other modules, change NF_LOG_MASK to
      refer to all supported log flags. On the other hand, add a new
      NF_LOG_DEFAULT_MASK to refer to the original default log flags.
      
      Finally, if user specify the unsupported log flags or NFTA_LOG_GROUP
      and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the
      userspace.
      Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ff107d27
  10. 25 9月, 2016 2 次提交
  11. 24 9月, 2016 1 次提交
  12. 21 9月, 2016 2 次提交
  13. 20 9月, 2016 2 次提交
  14. 19 9月, 2016 1 次提交
  15. 17 9月, 2016 1 次提交
  16. 13 9月, 2016 2 次提交
  17. 11 9月, 2016 6 次提交
  18. 10 9月, 2016 1 次提交
    • G
      ipv6: report NLM_F_CREATE and NLM_F_EXCL flags in RTM_NEWROUTE events · 73483c12
      Guillaume Nault 提交于
      Since commit 37a1d361 ("ipv6: include NLM_F_REPLACE in route
      replace notifications"), RTM_NEWROUTE notifications have their
      NLM_F_REPLACE flag set if the new route replaced a preexisting one.
      However, other flags aren't set.
      
      This patch reports the missing NLM_F_CREATE and NLM_F_EXCL flag bits.
      
      NLM_F_APPEND is not reported, because in ipv6 a NLM_F_CREATE request
      is interpreted as an append request (contrary to ipv4, "prepend" is not
      supported, so if NLM_F_EXCL is not set then NLM_F_APPEND is implicit).
      
      As a result, the possible flag combination can now be reported
      (iproute2's terminology into parentheses):
      
        * NLM_F_CREATE | NLM_F_EXCL: route didn't exist, exclusive creation
          ("add").
        * NLM_F_CREATE: route did already exist, new route added after
          preexisting ones ("append").
        * NLM_F_REPLACE: route did already exist, new route replaced the
          first preexisting one ("change").
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73483c12
  19. 09 9月, 2016 1 次提交
  20. 07 9月, 2016 3 次提交
    • W
      ipv6: addrconf: fix dev refcont leak when DAD failed · 751eb6b6
      Wei Yongjun 提交于
      In general, when DAD detected IPv6 duplicate address, ifp->state
      will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
      delayed work, the call tree should be like this:
      
      ndisc_recv_ns
        -> addrconf_dad_failure        <- missing ifp put
           -> addrconf_mod_dad_work
             -> schedule addrconf_dad_work()
               -> addrconf_dad_stop()  <- missing ifp hold before call it
      
      addrconf_dad_failure() called with ifp refcont holding but not put.
      addrconf_dad_work() call addrconf_dad_stop() without extra holding
      refcount. This will not cause any issue normally.
      
      But the race between addrconf_dad_failure() and addrconf_dad_work()
      may cause ifp refcount leak and netdevice can not be unregister,
      dmesg show the following messages:
      
      IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected!
      ...
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Cc: stable@vger.kernel.org
      Fixes: c15b1cca ("ipv6: move DAD and addrconf_verify processing
      to workqueue")
      Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      751eb6b6
    • D
      ipv6: release dst in ping_v6_sendmsg · 03c2778a
      Dave Jones 提交于
      Neither the failure or success paths of ping_v6_sendmsg release
      the dst it acquires.  This leads to a flood of warnings from
      "net/core/dst.c:288 dst_release" on older kernels that
      don't have 8bf4ada2 backported.
      
      That patch optimistically hoped this had been fixed post 3.10, but
      it seems at least one case wasn't, where I've seen this triggered
      a lot from machines doing unprivileged icmp sockets.
      
      Cc: Martin Lau <kafai@fb.com>
      Signed-off-by: NDave Jones <davej@codemonkey.org.uk>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03c2778a
    • L
      netfilter: nft_chain_route: re-route before skb is queued to userspace · d1a6cba5
      Liping Zhang 提交于
      Imagine such situation, user add the following nft rules, and queue
      the packets to userspace for further check:
        # ip rule add fwmark 0x0/0x1 lookup eth0
        # ip rule add fwmark 0x1/0x1 lookup eth1
        # nft add table filter
        # nft add chain filter output {type route hook output priority 0 \;}
        # nft add rule filter output mark set 0x1
        # nft add rule filter output queue num 0
      
      But after we reinject the skbuff, the packet will be sent via the
      wrong route, i.e. in this case, the packet will be routed via eth0
      table, not eth1 table. Because we skip to do re-route when verdict
      is NF_QUEUE, even if the mark was changed.
      
      Acctually, we should not touch sk_buff if verdict is NF_DROP or
      NF_STOLEN, and when re-route fails, return NF_DROP with error code.
      This is consistent with the mangle table in iptables.
      Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d1a6cba5
  21. 02 9月, 2016 4 次提交
  22. 31 8月, 2016 1 次提交
    • R
      net: lwtunnel: Handle fragmentation · 14972cbd
      Roopa Prabhu 提交于
      Today mpls iptunnel lwtunnel_output redirect expects the tunnel
      output function to handle fragmentation. This is ok but can be
      avoided if we did not do the mpls output redirect too early.
      ie we could wait until ip fragmentation is done and then call
      mpls output for each ip fragment.
      
      To make this work we will need,
      1) the lwtunnel state to carry encap headroom
      2) and do the redirect to the encap output handler on the ip fragment
      (essentially do the output redirect after fragmentation)
      
      This patch adds tunnel headroom in lwtstate to make sure we
      account for tunnel data in mtu calculations during fragmentation
      and adds new xmit redirect handler to redirect to lwtunnel xmit func
      after ip fragmentation.
      
      This includes IPV6 and some mtu fixes and testing from David Ahern.
      Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      14972cbd