1. 21 10月, 2016 4 次提交
  2. 17 10月, 2016 1 次提交
    • D
      net: Require exact match for TCP socket lookups if dif is l3mdev · a04a480d
      David Ahern 提交于
      Currently, socket lookups for l3mdev (vrf) use cases can match a socket
      that is bound to a port but not a device (ie., a global socket). If the
      sysctl tcp_l3mdev_accept is not set this leads to ack packets going out
      based on the main table even though the packet came in from an L3 domain.
      The end result is that the connection does not establish creating
      confusion for users since the service is running and a socket shows in
      ss output. Fix by requiring an exact dif to sk_bound_dev_if match if the
      skb came through an interface enslaved to an l3mdev device and the
      tcp_l3mdev_accept is not set.
      
      skb's through an l3mdev interface are marked by setting a flag in
      inet{6}_skb_parm. The IPv6 variant is already set; this patch adds the
      flag for IPv4. Using an skb flag avoids a device lookup on the dif. The
      flag is set in the VRF driver using the IP{6}CB macros. For IPv4, the
      inet_skb_parm struct is moved in the cb per commit 971f10ec, so the
      match function in the TCP stack needs to use TCP_SKB_CB. For IPv6, the
      move is done after the socket lookup, so IP6CB is used.
      
      The flags field in inet_skb_parm struct needs to be increased to add
      another flag. There is currently a 1-byte hole following the flags,
      so it can be expanded to u16 without increasing the size of the struct.
      
      Fixes: 193125db ("net: Introduce VRF device driver")
      Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a04a480d
  3. 14 10月, 2016 4 次提交
    • J
      IPv6: fix DESYNC_FACTOR · 76506a98
      Jiri Bohac 提交于
      The IPv6 temporary address generation uses a variable called DESYNC_FACTOR
      to prevent hosts updating the addresses at the same time. Quoting RFC 4941:
      
         ... The value DESYNC_FACTOR is a random value (different for each
         client) that ensures that clients don't synchronize with each other and
         generate new addresses at exactly the same time ...
      
      DESYNC_FACTOR is defined as:
      
         DESYNC_FACTOR -- A random value within the range 0 - MAX_DESYNC_FACTOR.
         It is computed once at system start (rather than each time it is used)
         and must never be greater than (TEMP_VALID_LIFETIME - REGEN_ADVANCE).
      
      First, I believe the RFC has a typo in it and meant to say: "and must
      never be greater than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE)"
      
      The reason is that at various places in the RFC, DESYNC_FACTOR is used in
      a calculation like (TEMP_PREFERRED_LIFETIME - DESYNC_FACTOR) or
      (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE - DESYNC_FACTOR). It needs to be
      smaller than (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE) for the result of
      these calculations to be larger than zero. It's never used in a
      calculation together with TEMP_VALID_LIFETIME.
      
      I already submitted an errata to the rfc-editor:
      https://www.rfc-editor.org/errata_search.php?rfc=4941
      
      The Linux implementation of DESYNC_FACTOR is very wrong:
      max_desync_factor is used in places DESYNC_FACTOR should be used.
      max_desync_factor is initialized to the RFC-recommended value for
      MAX_DESYNC_FACTOR (600) but the whole point is to get a _random_ value.
      
      And nothing ensures that the value used is not greater than
      (TEMP_PREFERRED_LIFETIME - REGEN_ADVANCE), which leads to underflows.  The
      effect can easily be observed when setting the temp_prefered_lft sysctl
      e.g. to 60. The preferred lifetime of the temporary addresses will be
      bogus.
      
      TEMP_PREFERRED_LIFETIME and REGEN_ADVANCE are not constants and can be
      influenced by these three sysctls: regen_max_retry, dad_transmits and
      temp_prefered_lft. Thus, the upper bound for desync_factor needs to be
      re-calculated each time a new address is generated and if desync_factor is
      larger than the new upper bound, a new random value needs to be
      re-generated.
      
      And since we already have max_desync_factor configurable per interface, we
      also need to calculate and store desync_factor per interface.
      Signed-off-by: NJiri Bohac <jbohac@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      76506a98
    • J
      IPv6: Drop the temporary address regen_timer · 9d6280da
      Jiri Bohac 提交于
      The randomized interface identifier (rndid) was periodically updated from
      the regen_timer timer. Simplify the code by updating the rndid only when
      needed by ipv6_try_regen_rndid().
      
      This makes the follow-up DESYNC_FACTOR fix much simpler.  Also it fixes a
      reference counting error in this error path, where an in6_dev_put was
      missing:
      		err = addrconf_sysctl_register(ndev);
      		if (err) {
      			ipv6_mc_destroy_dev(ndev);
      	-               del_timer(&ndev->regen_timer);
      			snmp6_unregister_dev(ndev);
      			goto err_release;
      Signed-off-by: NJiri Bohac <jbohac@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9d6280da
    • N
      ipv6: correctly add local routes when lo goes up · a220445f
      Nicolas Dichtel 提交于
      The goal of the patch is to fix this scenario:
       ip link add dummy1 type dummy
       ip link set dummy1 up
       ip link set lo down ; ip link set lo up
      
      After that sequence, the local route to the link layer address of dummy1 is
      not there anymore.
      
      When the loopback is set down, all local routes are deleted by
      addrconf_ifdown()/rt6_ifdown(). At this time, the rt6_info entry still
      exists, because the corresponding idev has a reference on it. After the rcu
      grace period, dst_rcu_free() is called, and thus ___dst_free(), which will
      set obsolete to DST_OBSOLETE_DEAD.
      
      In this case, init_loopback() is called before dst_rcu_free(), thus
      obsolete is still sets to something <= 0. So, the function doesn't add the
      route again. To avoid that race, let's check the rt6 refcnt instead.
      
      Fixes: 25fb6ca4 ("net IPv6 : Fix broken IPv6 routing table after loopback down-up")
      Fixes: a881ae1f ("ipv6: don't call addrconf_dst_alloc again when enable lo")
      Fixes: 33d99113 ("ipv6: reallocate addrconf router for ipv6 address when lo device up")
      Reported-by: NFrancesco Santoro <francesco.santoro@6wind.com>
      Reported-by: NSamuel Gauthier <samuel.gauthier@6wind.com>
      CC: Balakumaran Kannan <Balakumaran.Kannan@ap.sony.com>
      CC: Maruthi Thotad <Maruthi.Thotad@ap.sony.com>
      CC: Sabrina Dubroca <sd@queasysnail.net>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      CC: Weilong Chen <chenweilong@huawei.com>
      CC: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a220445f
    • V
      ip6_tunnel: fix ip6_tnl_lookup · 68d00f33
      Vadim Fedorenko 提交于
      The commit ea3dc960 ("ip6_tunnel: Add support for wildcard tunnel
      endpoints.") introduces support for wildcards in tunnels endpoints,
      but in some rare circumstances ip6_tnl_lookup selects wrong tunnel
      interface relying only on source or destination address of the packet
      and not checking presence of wildcard in tunnels endpoints. Later in
      ip6_tnl_rcv this packets can be dicarded because of difference in
      ipproto even if fallback device have proper ipproto configuration.
      
      This patch adds checks of wildcard endpoint in tunnel avoiding such
      behavior
      
      Fixes: ea3dc960 ("ip6_tunnel: Add support for wildcard tunnel endpoints.")
      Signed-off-by: NVadim Fedorenko <junk@yandex-team.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68d00f33
  4. 13 10月, 2016 1 次提交
    • E
      ipv6: tcp: restore IP6CB for pktoptions skbs · 8ce48623
      Eric Dumazet 提交于
      Baozeng Ding reported following KASAN splat :
      
      BUG: KASAN: use-after-free in ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 at addr ffff880029c84ec8
      Read of size 1 by task poc/25548
      Call Trace:
       [<ffffffff82cf43c9>] dump_stack+0x12e/0x185 /lib/dump_stack.c:15
       [<     inline     >] print_address_description /mm/kasan/report.c:204
       [<ffffffff817ced3b>] kasan_report_error+0x48b/0x4b0 /mm/kasan/report.c:283
       [<     inline     >] kasan_report /mm/kasan/report.c:303
       [<ffffffff817ced9e>] __asan_report_load1_noabort+0x3e/0x40 /mm/kasan/report.c:321
       [<ffffffff85c71da1>] ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 /net/ipv6/datagram.c:687
       [<ffffffff85c734c3>] ip6_datagram_recv_ctl+0x33/0x40
       [<ffffffff85c0b07c>] do_ipv6_getsockopt.isra.4+0xaec/0x2150
       [<ffffffff85c0c7f6>] ipv6_getsockopt+0x116/0x230
       [<ffffffff859b5a12>] tcp_getsockopt+0x82/0xd0 /net/ipv4/tcp.c:3035
       [<ffffffff855fb385>] sock_common_getsockopt+0x95/0xd0 /net/core/sock.c:2647
       [<     inline     >] SYSC_getsockopt /net/socket.c:1776
       [<ffffffff855f8ba2>] SyS_getsockopt+0x142/0x230 /net/socket.c:1758
       [<ffffffff8685cdc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
      Memory state around the buggy address:
       ffff880029c84d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff880029c84e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      > ffff880029c84e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                                    ^
       ffff880029c84f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
       ffff880029c84f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
      
      He also provided a syzkaller reproducer.
      
      Issue is that ip6_datagram_recv_specific_ctl() expects to find IP6CB
      data that was moved at a different place in tcp_v6_rcv()
      
      This patch moves tcp_v6_restore_cb() up and calls it from
      tcp_v6_do_rcv() when np->pktoptions is set.
      
      Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NBaozeng Ding <sploving1@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8ce48623
  5. 08 10月, 2016 1 次提交
  6. 03 10月, 2016 1 次提交
    • M
      ipv6 addrconf: remove addrconf_sysctl_hop_limit() · cb9e684e
      Maciej Żenczykowski 提交于
      This is an effective no-op in terms of user observable behaviour.
      
      By preventing the overwrite of non-null extra1/extra2 fields
      in addrconf_sysctl() we can enable the use of proc_dointvec_minmax().
      
      This allows us to eliminate the constant min/max (1..255) trampoline
      function that is addrconf_sysctl_hop_limit().
      
      This is nice because it simplifies the code, and allows future
      sysctls with constant min/max limits to also not require trampolines.
      
      We still can't eliminate the trampoline for mtu because it isn't
      actually a constant (it depends on other tunables of the device)
      and thus requires at-write-time logic to enforce range.
      Signed-off-by: NMaciej Żenczykowski <maze@google.com>
      Acked-by: NErik Kline <ek@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb9e684e
  7. 30 9月, 2016 3 次提交
  8. 26 9月, 2016 3 次提交
    • N
      ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route · 2cf75070
      Nikolay Aleksandrov 提交于
      Since the commit below the ipmr/ip6mr rtnl_unicast() code uses the portid
      instead of the previous dst_pid which was copied from in_skb's portid.
      Since the skb is new the portid is 0 at that point so the packets are sent
      to the kernel and we get scheduling while atomic or a deadlock (depending
      on where it happens) by trying to acquire rtnl two times.
      Also since this is RTM_GETROUTE, it can be triggered by a normal user.
      
      Here's the sleeping while atomic trace:
      [ 7858.212557] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620
      [ 7858.212748] in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/0
      [ 7858.212881] 2 locks held by swapper/0/0:
      [ 7858.213013]  #0:  (((&mrt->ipmr_expire_timer))){+.-...}, at: [<ffffffff810fbbf5>] call_timer_fn+0x5/0x350
      [ 7858.213422]  #1:  (mfc_unres_lock){+.....}, at: [<ffffffff8161e005>] ipmr_expire_process+0x25/0x130
      [ 7858.213807] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc7+ #179
      [ 7858.213934] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014
      [ 7858.214108]  0000000000000000 ffff88005b403c50 ffffffff813a7804 0000000000000000
      [ 7858.214412]  ffffffff81a1338e ffff88005b403c78 ffffffff810a4a72 ffffffff81a1338e
      [ 7858.214716]  000000000000026c 0000000000000000 ffff88005b403ca8 ffffffff810a4b9f
      [ 7858.215251] Call Trace:
      [ 7858.215412]  <IRQ>  [<ffffffff813a7804>] dump_stack+0x85/0xc1
      [ 7858.215662]  [<ffffffff810a4a72>] ___might_sleep+0x192/0x250
      [ 7858.215868]  [<ffffffff810a4b9f>] __might_sleep+0x6f/0x100
      [ 7858.216072]  [<ffffffff8165bea3>] mutex_lock_nested+0x33/0x4d0
      [ 7858.216279]  [<ffffffff815a7a5f>] ? netlink_lookup+0x25f/0x460
      [ 7858.216487]  [<ffffffff8157474b>] rtnetlink_rcv+0x1b/0x40
      [ 7858.216687]  [<ffffffff815a9a0c>] netlink_unicast+0x19c/0x260
      [ 7858.216900]  [<ffffffff81573c70>] rtnl_unicast+0x20/0x30
      [ 7858.217128]  [<ffffffff8161cd39>] ipmr_destroy_unres+0xa9/0xf0
      [ 7858.217351]  [<ffffffff8161e06f>] ipmr_expire_process+0x8f/0x130
      [ 7858.217581]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217785]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.217990]  [<ffffffff810fbc95>] call_timer_fn+0xa5/0x350
      [ 7858.218192]  [<ffffffff810fbbf5>] ? call_timer_fn+0x5/0x350
      [ 7858.218415]  [<ffffffff8161dfe0>] ? ipmr_net_init+0x180/0x180
      [ 7858.218656]  [<ffffffff810fde10>] run_timer_softirq+0x260/0x640
      [ 7858.218865]  [<ffffffff8166379b>] ? __do_softirq+0xbb/0x54f
      [ 7858.219068]  [<ffffffff816637c8>] __do_softirq+0xe8/0x54f
      [ 7858.219269]  [<ffffffff8107a948>] irq_exit+0xb8/0xc0
      [ 7858.219463]  [<ffffffff81663452>] smp_apic_timer_interrupt+0x42/0x50
      [ 7858.219678]  [<ffffffff816625bc>] apic_timer_interrupt+0x8c/0xa0
      [ 7858.219897]  <EOI>  [<ffffffff81055f16>] ? native_safe_halt+0x6/0x10
      [ 7858.220165]  [<ffffffff810d64dd>] ? trace_hardirqs_on+0xd/0x10
      [ 7858.220373]  [<ffffffff810298e3>] default_idle+0x23/0x190
      [ 7858.220574]  [<ffffffff8102a20f>] arch_cpu_idle+0xf/0x20
      [ 7858.220790]  [<ffffffff810c9f8c>] default_idle_call+0x4c/0x60
      [ 7858.221016]  [<ffffffff810ca33b>] cpu_startup_entry+0x39b/0x4d0
      [ 7858.221257]  [<ffffffff8164f995>] rest_init+0x135/0x140
      [ 7858.221469]  [<ffffffff81f83014>] start_kernel+0x50e/0x51b
      [ 7858.221670]  [<ffffffff81f82120>] ? early_idt_handler_array+0x120/0x120
      [ 7858.221894]  [<ffffffff81f8243f>] x86_64_start_reservations+0x2a/0x2c
      [ 7858.222113]  [<ffffffff81f8257c>] x86_64_start_kernel+0x13b/0x14a
      
      Fixes: 2942e900 ("[RTNETLINK]: Use rtnl_unicast() for rtnetlink unicasts")
      Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2cf75070
    • L
      netfilter: nf_log: get rid of XT_LOG_* macros · 8cb2a7d5
      Liping Zhang 提交于
      nf_log is used by both nftables and iptables, so use XT_LOG_XXX macros
      here is not appropriate. Replace them with NF_LOG_XXX.
      Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      8cb2a7d5
    • L
      netfilter: nft_log: complete NFTA_LOG_FLAGS attr support · ff107d27
      Liping Zhang 提交于
      NFTA_LOG_FLAGS attribute is already supported, but the related
      NF_LOG_XXX flags are not exposed to the userspace. So we cannot
      explicitly enable log flags to log uid, tcp sequence, ip options
      and so on, i.e. such rule "nft add rule filter output log uid"
      is not supported yet.
      
      So move NF_LOG_XXX macro definitions to the uapi/../nf_log.h. In
      order to keep consistent with other modules, change NF_LOG_MASK to
      refer to all supported log flags. On the other hand, add a new
      NF_LOG_DEFAULT_MASK to refer to the original default log flags.
      
      Finally, if user specify the unsupported log flags or NFTA_LOG_GROUP
      and NFTA_LOG_FLAGS are set at the same time, report EINVAL to the
      userspace.
      Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      ff107d27
  9. 25 9月, 2016 2 次提交
  10. 24 9月, 2016 1 次提交
  11. 21 9月, 2016 2 次提交
  12. 20 9月, 2016 2 次提交
  13. 19 9月, 2016 1 次提交
  14. 17 9月, 2016 1 次提交
  15. 13 9月, 2016 2 次提交
  16. 11 9月, 2016 6 次提交
  17. 10 9月, 2016 1 次提交
    • G
      ipv6: report NLM_F_CREATE and NLM_F_EXCL flags in RTM_NEWROUTE events · 73483c12
      Guillaume Nault 提交于
      Since commit 37a1d361 ("ipv6: include NLM_F_REPLACE in route
      replace notifications"), RTM_NEWROUTE notifications have their
      NLM_F_REPLACE flag set if the new route replaced a preexisting one.
      However, other flags aren't set.
      
      This patch reports the missing NLM_F_CREATE and NLM_F_EXCL flag bits.
      
      NLM_F_APPEND is not reported, because in ipv6 a NLM_F_CREATE request
      is interpreted as an append request (contrary to ipv4, "prepend" is not
      supported, so if NLM_F_EXCL is not set then NLM_F_APPEND is implicit).
      
      As a result, the possible flag combination can now be reported
      (iproute2's terminology into parentheses):
      
        * NLM_F_CREATE | NLM_F_EXCL: route didn't exist, exclusive creation
          ("add").
        * NLM_F_CREATE: route did already exist, new route added after
          preexisting ones ("append").
        * NLM_F_REPLACE: route did already exist, new route replaced the
          first preexisting one ("change").
      Signed-off-by: NGuillaume Nault <g.nault@alphalink.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73483c12
  18. 09 9月, 2016 1 次提交
  19. 07 9月, 2016 3 次提交
    • W
      ipv6: addrconf: fix dev refcont leak when DAD failed · 751eb6b6
      Wei Yongjun 提交于
      In general, when DAD detected IPv6 duplicate address, ifp->state
      will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
      delayed work, the call tree should be like this:
      
      ndisc_recv_ns
        -> addrconf_dad_failure        <- missing ifp put
           -> addrconf_mod_dad_work
             -> schedule addrconf_dad_work()
               -> addrconf_dad_stop()  <- missing ifp hold before call it
      
      addrconf_dad_failure() called with ifp refcont holding but not put.
      addrconf_dad_work() call addrconf_dad_stop() without extra holding
      refcount. This will not cause any issue normally.
      
      But the race between addrconf_dad_failure() and addrconf_dad_work()
      may cause ifp refcount leak and netdevice can not be unregister,
      dmesg show the following messages:
      
      IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected!
      ...
      unregister_netdevice: waiting for eth0 to become free. Usage count = 1
      
      Cc: stable@vger.kernel.org
      Fixes: c15b1cca ("ipv6: move DAD and addrconf_verify processing
      to workqueue")
      Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      751eb6b6
    • D
      ipv6: release dst in ping_v6_sendmsg · 03c2778a
      Dave Jones 提交于
      Neither the failure or success paths of ping_v6_sendmsg release
      the dst it acquires.  This leads to a flood of warnings from
      "net/core/dst.c:288 dst_release" on older kernels that
      don't have 8bf4ada2 backported.
      
      That patch optimistically hoped this had been fixed post 3.10, but
      it seems at least one case wasn't, where I've seen this triggered
      a lot from machines doing unprivileged icmp sockets.
      
      Cc: Martin Lau <kafai@fb.com>
      Signed-off-by: NDave Jones <davej@codemonkey.org.uk>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      03c2778a
    • L
      netfilter: nft_chain_route: re-route before skb is queued to userspace · d1a6cba5
      Liping Zhang 提交于
      Imagine such situation, user add the following nft rules, and queue
      the packets to userspace for further check:
        # ip rule add fwmark 0x0/0x1 lookup eth0
        # ip rule add fwmark 0x1/0x1 lookup eth1
        # nft add table filter
        # nft add chain filter output {type route hook output priority 0 \;}
        # nft add rule filter output mark set 0x1
        # nft add rule filter output queue num 0
      
      But after we reinject the skbuff, the packet will be sent via the
      wrong route, i.e. in this case, the packet will be routed via eth0
      table, not eth1 table. Because we skip to do re-route when verdict
      is NF_QUEUE, even if the mark was changed.
      
      Acctually, we should not touch sk_buff if verdict is NF_DROP or
      NF_STOLEN, and when re-route fails, return NF_DROP with error code.
      This is consistent with the mangle table in iptables.
      Signed-off-by: NLiping Zhang <liping.zhang@spreadtrum.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d1a6cba5