1. 18 11月, 2017 4 次提交
  2. 16 11月, 2017 7 次提交
    • E
      net/sctp: Always set scope_id in sctp_inet6_skb_msgname · 7c8a61d9
      Eric W. Biederman 提交于
      Alexandar Potapenko while testing the kernel with KMSAN and syzkaller
      discovered that in some configurations sctp would leak 4 bytes of
      kernel stack.
      
      Working with his reproducer I discovered that those 4 bytes that
      are leaked is the scope id of an ipv6 address returned by recvmsg.
      
      With a little code inspection and a shrewd guess I discovered that
      sctp_inet6_skb_msgname only initializes the scope_id field for link
      local ipv6 addresses to the interface index the link local address
      pertains to instead of initializing the scope_id field for all ipv6
      addresses.
      
      That is almost reasonable as scope_id's are meaniningful only for link
      local addresses.  Set the scope_id in all other cases to 0 which is
      not a valid interface index to make it clear there is nothing useful
      in the scope_id field.
      
      There should be no danger of breaking userspace as the stack leak
      guaranteed that previously meaningless random data was being returned.
      
      Fixes: 372f525b495c ("SCTP:  Resync with LKSCTP tree.")
      History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.gitReported-by: NAlexander Potapenko <glider@google.com>
      Tested-by: NAlexander Potapenko <glider@google.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c8a61d9
    • J
      tipc: enforce valid ratio between skb truesize and contents · d618d09a
      Jon Maloy 提交于
      The socket level flow control is based on the assumption that incoming
      buffers meet the condition (skb->truesize / roundup(skb->len) <= 4),
      where the latter value is rounded off upwards to the nearest 1k number.
      This does empirically hold true for the device drivers we know, but we
      cannot trust that it will always be so, e.g., in a system with jumbo
      frames and very small packets.
      
      We now introduce a check for this condition at packet arrival, and if
      we find it to be false, we copy the packet to a new, smaller buffer,
      where the condition will be true. We expect this to affect only a small
      fraction of all incoming packets, if at all.
      Acked-by: NYing Xue <ying.xue@windriver.com>
      Signed-off-by: NJon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d618d09a
    • A
      netfilter: add ifdef around ctnetlink_proto_size · 8252fcea
      Arnd Bergmann 提交于
      This function is no longer marked 'inline', so we now get a warning
      when it is unused:
      
      net/netfilter/nf_conntrack_netlink.c:536:15: error: 'ctnetlink_proto_size' defined but not used [-Werror=unused-function]
      
      We could mark it inline again, mark it __maybe_unused, or add an #ifdef
      around the definition. I'm picking the third approach here since that
      seems to be what the rest of the file has.
      
      Fixes: 5caaed15 ("netfilter: conntrack: don't cache nlattr_tuple_size result in nla_size")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8252fcea
    • M
      genetlink: fix genlmsg_nlhdr() · 0a833c29
      Michal Kubecek 提交于
      According to the description, first argument of genlmsg_nlhdr() points to
      what genlmsg_put() returns, i.e. beginning of user header. Therefore we
      should only subtract size of genetlink header and netlink message header,
      not user header.
      
      This also means we don't need to pass the pointer to genetlink family and
      the same is true for genl_dump_check_consistent() which is the only caller
      of genlmsg_nlhdr(). (Note that at the moment, these functions are only
      used for families which do not have user header so that they are not
      affected.)
      
      Fixes: 670dc283 ("netlink: advertise incomplete dumps")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Reviewed-by: NJohannes Berg <johannes@sipsolutions.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0a833c29
    • X
      sctp: check stream reset info len before making reconf chunk · 423852f8
      Xin Long 提交于
      Now when resetting stream, if both in and out flags are set, the info
      len can reach:
        sizeof(struct sctp_strreset_outreq) + SCTP_MAX_STREAM(65535) +
        sizeof(struct sctp_strreset_inreq)  + SCTP_MAX_STREAM(65535)
      even without duplicated stream no, this value is far greater than the
      chunk's max size.
      
      _sctp_make_chunk doesn't do any check for this, which would cause the
      skb it allocs is huge, syzbot even reported a crash due to this.
      
      This patch is to check stream reset info len before making reconf
      chunk and return EINVAL if the len exceeds chunk's capacity.
      
      Thanks Marcelo and Neil for making this clear.
      
      v1->v2:
        - move the check into sctp_send_reset_streams instead.
      
      Fixes: cc16f00f ("sctp: add support for generating stream reconf ssn reset request chunk")
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      423852f8
    • X
      sctp: use the right sk after waking up from wait_buf sleep · cea0cc80
      Xin Long 提交于
      Commit dfcb9f4f ("sctp: deny peeloff operation on asocs with threads
      sleeping on it") fixed the race between peeloff and wait sndbuf by
      checking waitqueue_active(&asoc->wait) in sctp_do_peeloff().
      
      But it actually doesn't work, as even if waitqueue_active returns false
      the waiting sndbuf thread may still not yet hold sk lock. After asoc is
      peeled off, sk is not asoc->base.sk any more, then to hold the old sk
      lock couldn't make assoc safe to access.
      
      This patch is to fix this by changing to hold the new sk lock if sk is
      not asoc->base.sk, meanwhile, also set the sk in sctp_sendmsg with the
      new sk.
      
      With this fix, there is no more race between peeloff and waitbuf, the
      check 'waitqueue_active' in sctp_do_peeloff can be removed.
      
      Thanks Marcelo and Neil for making this clear.
      
      v1->v2:
        fix it by changing to lock the new sock instead of adding a flag in asoc.
      Suggested-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cea0cc80
    • X
      sctp: do not free asoc when it is already dead in sctp_sendmsg · ca3af4dd
      Xin Long 提交于
      Now in sctp_sendmsg sctp_wait_for_sndbuf could schedule out without
      holding sock sk. It means the current asoc can be freed elsewhere,
      like when receiving an abort packet.
      
      If the asoc is just created in sctp_sendmsg and sctp_wait_for_sndbuf
      returns err, the asoc will be freed again due to new_asoc is not nil.
      An use-after-free issue would be triggered by this.
      
      This patch is to fix it by setting new_asoc with nil if the asoc is
      already dead when cpu schedules back, so that it will not be freed
      again in sctp_sendmsg.
      
      v1->v2:
        set new_asoc as nil in sctp_sendmsg instead of sctp_wait_for_sndbuf.
      Suggested-by: NNeil Horman <nhorman@tuxdriver.com>
      Reported-by: NDmitry Vyukov <dvyukov@google.com>
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca3af4dd
  3. 15 11月, 2017 8 次提交
    • E
      tcp: highest_sack fix · 50895b9d
      Eric Dumazet 提交于
      syzbot easily found a regression added in our latest patches [1]
      
      No longer set tp->highest_sack to the head of the send queue since
      this is not logical and error prone.
      
      Only sack processing should maintain the pointer to an skb from rtx queue.
      
      We might in the future only remember the sequence instead of a pointer to skb,
      since rb-tree should allow a fast lookup.
      
      [1]
      BUG: KASAN: use-after-free in tcp_highest_sack_seq include/net/tcp.h:1706 [inline]
      BUG: KASAN: use-after-free in tcp_ack+0x42bb/0x4fd0 net/ipv4/tcp_input.c:3537
      Read of size 4 at addr ffff8801c154faa8 by task syz-executor4/12860
      
      CPU: 0 PID: 12860 Comm: syz-executor4 Not tainted 4.14.0-next-20171113+ #41
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       print_address_description+0x73/0x250 mm/kasan/report.c:252
       kasan_report_error mm/kasan/report.c:351 [inline]
       kasan_report+0x25b/0x340 mm/kasan/report.c:409
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
       tcp_highest_sack_seq include/net/tcp.h:1706 [inline]
       tcp_ack+0x42bb/0x4fd0 net/ipv4/tcp_input.c:3537
       tcp_rcv_established+0x672/0x18a0 net/ipv4/tcp_input.c:5439
       tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1468
       sk_backlog_rcv include/net/sock.h:909 [inline]
       __release_sock+0x124/0x360 net/core/sock.c:2264
       release_sock+0xa4/0x2a0 net/core/sock.c:2778
       tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
       sock_sendmsg_nosec net/socket.c:632 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:642
       ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2048
       __sys_sendmsg+0xe5/0x210 net/socket.c:2082
       SYSC_sendmsg net/socket.c:2093 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2089
       entry_SYSCALL_64_fastpath+0x1f/0x96
      RIP: 0033:0x452879
      RSP: 002b:00007fc9761bfbe8 EFLAGS: 00000212 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000758020 RCX: 0000000000452879
      RDX: 0000000000000000 RSI: 0000000020917fc8 RDI: 0000000000000015
      RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000212 R12: 00000000006ee3a0
      R13: 00000000ffffffff R14: 00007fc9761c06d4 R15: 0000000000000000
      
      Allocated by task 12860:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
       kmem_cache_alloc_node+0x144/0x760 mm/slab.c:3638
       __alloc_skb+0xf1/0x780 net/core/skbuff.c:193
       alloc_skb_fclone include/linux/skbuff.h:1023 [inline]
       sk_stream_alloc_skb+0x11d/0x900 net/ipv4/tcp.c:870
       tcp_sendmsg_locked+0x1341/0x3b80 net/ipv4/tcp.c:1299
       tcp_sendmsg+0x2f/0x50 net/ipv4/tcp.c:1461
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
       sock_sendmsg_nosec net/socket.c:632 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:642
       SYSC_sendto+0x358/0x5a0 net/socket.c:1749
       SyS_sendto+0x40/0x50 net/socket.c:1717
       entry_SYSCALL_64_fastpath+0x1f/0x96
      
      Freed by task 12860:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:447
       set_track mm/kasan/kasan.c:459 [inline]
       kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
       __cache_free mm/slab.c:3492 [inline]
       kmem_cache_free+0x77/0x280 mm/slab.c:3750
       kfree_skbmem+0xdd/0x1d0 net/core/skbuff.c:603
       __kfree_skb+0x1d/0x20 net/core/skbuff.c:642
       sk_wmem_free_skb include/net/sock.h:1419 [inline]
       tcp_rtx_queue_unlink_and_free include/net/tcp.h:1682 [inline]
       tcp_clean_rtx_queue net/ipv4/tcp_input.c:3111 [inline]
       tcp_ack+0x1b17/0x4fd0 net/ipv4/tcp_input.c:3593
       tcp_rcv_established+0x672/0x18a0 net/ipv4/tcp_input.c:5439
       tcp_v4_do_rcv+0x2ab/0x7d0 net/ipv4/tcp_ipv4.c:1468
       sk_backlog_rcv include/net/sock.h:909 [inline]
       __release_sock+0x124/0x360 net/core/sock.c:2264
       release_sock+0xa4/0x2a0 net/core/sock.c:2778
       tcp_sendmsg+0x3a/0x50 net/ipv4/tcp.c:1462
       inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:763
       sock_sendmsg_nosec net/socket.c:632 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:642
       ___sys_sendmsg+0x75b/0x8a0 net/socket.c:2048
       __sys_sendmsg+0xe5/0x210 net/socket.c:2082
       SYSC_sendmsg net/socket.c:2093 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2089
       entry_SYSCALL_64_fastpath+0x1f/0x96
      
      The buggy address belongs to the object at ffff8801c154fa80
       which belongs to the cache skbuff_fclone_cache of size 456
      The buggy address is located 40 bytes inside of
       456-byte region [ffff8801c154fa80, ffff8801c154fc48)
      The buggy address belongs to the page:
      page:ffffea00070553c0 count:1 mapcount:0 mapping:ffff8801c154f080 index:0x0
      flags: 0x2fffc0000000100(slab)
      raw: 02fffc0000000100 ffff8801c154f080 0000000000000000 0000000100000006
      raw: ffffea00070a5a20 ffffea0006a18360 ffff8801d9ca0500 0000000000000000
      page dumped because: kasan: bad access detected
      
      Fixes: 737ff314 ("tcp: use sequence distance to detect reordering")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50895b9d
    • S
      Revert "xfrm: Fix stack-out-of-bounds read in xfrm_state_find." · 94802151
      Steffen Klassert 提交于
      This reverts commit c9f3f813.
      
      This commit breaks transport mode when the policy template
      has widlcard addresses configured, so revert it.
      Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com>
      94802151
    • G
      openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start · b74912a2
      Gustavo A. R. Silva 提交于
      It seems that the intention of the code is to null check the value
      returned by function genlmsg_put. But the current code is null
      checking the address of the pointer that holds the value returned
      by genlmsg_put.
      
      Fix this by properly null checking the value returned by function
      genlmsg_put in order to avoid a pontential null pointer dereference.
      
      Addresses-Coverity-ID: 1461561 ("Dereference before null check")
      Addresses-Coverity-ID: 1461562 ("Dereference null return value")
      Fixes: 96fbc13d ("openvswitch: Add meter infrastructure")
      Signed-off-by: NGustavo A. R. Silva <garsilva@embeddedor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b74912a2
    • S
      netem: remove unnecessary 64 bit modulus · 9b0ed891
      Stephen Hemminger 提交于
      Fix compilation on 32 bit platforms (where doing modulus operation
      with 64 bit requires extra glibc functions) by truncation.
      The jitter for table distribution is limited to a 32 bit value
      because random numbers are scaled as 32 bit value.
      
      Also fix some whitespace.
      
      Fixes: 99803171 ("netem: add uapi to express delay and jitter in nanoseconds")
      Reported-by: NRandy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b0ed891
    • S
      netem: use 64 bit divide by rate · bce552fd
      Stephen Hemminger 提交于
      Since times are now expressed in nanosecond, need to now do
      true 64 bit divide. Old code would truncate rate at 32 bits.
      Rename function to better express current usage.
      Signed-off-by: NStephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bce552fd
    • S
      tcp: Namespace-ify sysctl_tcp_default_congestion_control · 6670e152
      Stephen Hemminger 提交于
      Make default TCP default congestion control to a per namespace
      value. This changes default congestion control to a pointer to congestion ops
      (rather than implicit as first element of available lsit).
      
      The congestion control setting of new namespaces is inherited
      from the current setting of the root namespace.
      Signed-off-by: NStephen Hemminger <sthemmin@microsoft.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6670e152
    • K
      net: Protect iterations over net::fib_notifier_ops in fib_seq_sum() · 11bf284f
      Kirill Tkhai 提交于
      There is at least unlocked deletion of net->ipv4.fib_notifier_ops
      from net::fib_notifier_ops:
      
      ip_fib_net_exit()
        rtnl_unlock()
        fib4_notifier_exit()
          fib_notifier_ops_unregister(net->ipv4.notifier_ops)
            list_del_rcu(&ops->list)
      
      So fib_seq_sum() can't use rtnl_lock() only for protection.
      
      The possible solution could be to use rtnl_lock()
      in fib_notifier_ops_unregister(), but this adds
      a possible delay during net namespace creation,
      so we better use rcu_read_lock() till someone
      really needs the mutex (if that happens).
      Signed-off-by: NKirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      11bf284f
    • N
      ipv6: set all.accept_dad to 0 by default · 09400953
      Nicolas Dichtel 提交于
      With commits 35e015e1 and a2d3f3e3, the global 'accept_dad' flag
      is also taken into account (default value is 1). If either global or
      per-interface flag is non-zero, DAD will be enabled on a given interface.
      
      This is not backward compatible: before those patches, the user could
      disable DAD just by setting the per-interface flag to 0. Now, the
      user instead needs to set both flags to 0 to actually disable DAD.
      
      Restore the previous behaviour by setting the default for the global
      'accept_dad' flag to 0. This way, DAD is still enabled by default,
      as per-interface flags are set to 1 on device creation, but setting
      them to 0 is enough to disable DAD on a given interface.
      
      - Before 35e015e1f57a7 and a2d3f3e3:
                global    per-interface    DAD enabled
      [default]   1             1              yes
                  X             0              no
                  X             1              yes
      
      - After 35e015e1 and a2d3f3e3:
                global    per-interface    DAD enabled
      [default]   1             1              yes
                  0             0              no
                  0             1              yes
                  1             0              yes
      
      - After this fix:
                global    per-interface    DAD enabled
                  1             1              yes
                  0             0              no
      [default]   0             1              yes
                  1             0              yes
      
      Fixes: 35e015e1 ("ipv6: fix net.ipv6.conf.all interface DAD handlers")
      Fixes: a2d3f3e3 ("ipv6: fix net.ipv6.conf.all.accept_dad behaviour for real")
      CC: Stefano Brivio <sbrivio@redhat.com>
      CC: Matteo Croce <mcroce@redhat.com>
      CC: Erik Kline <ek@google.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09400953
  4. 14 11月, 2017 21 次提交