1. 08 Nov, 2018 4 commits
    • M
      net: provide a sysctl raw_l3mdev_accept for raw socket lookup with VRFs · 6897445f
      Mike Manning authored
      Add a sysctl raw_l3mdev_accept to control raw socket lookup in a manner
      similar to use of tcp_l3mdev_accept for stream and of udp_l3mdev_accept
      for datagram sockets. Have this default to enabled for backwards
      compatibility, so that the output device can still be specified with
      cmsg and IP_PKTINFO while using a socket that is not bound to the
      corresponding VRF. This allows, for example, older ping implementations
      to be run with the device specified but without being executed in the VRF.
      If the option is disabled, packets received in a VRF context are only
      handled by a raw socket bound to the VRF, and correspondingly packets
      in the default VRF are only handled by a socket not bound to any VRF.
      Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      net: ensure unbound datagram socket to be chosen when not in a VRF · 6da5b0f0
      Mike Manning authored
      Ensure an unbound datagram skt is chosen when not in a VRF. The check
      for a device match in compute_score() for UDP must also be performed
      when there is no bound device. For this, a failure is returned when
      there is no device match. This ensures that bound sockets are never
      selected, even if there is no unbound socket.
      
      Allow IPv6 packets to be sent over a datagram skt bound to a VRF. These
      packets are currently blocked, as flowi6_oif was set to that of the
      master vrf device, and the ipi6_ifindex is that of the slave device.
      Allow these packets to be sent by checking the device with ipi6_ifindex
      has the same L3 scope as that of the bound device of the skt, which is
      the master vrf device. Note that this check always succeeds if the skt
      is unbound.
      
      Even though the right datagram skt is now selected by compute_score(),
      a different skt is being returned that is bound to the wrong vrf. The
      difference between these and stream sockets is the handling of the skt
      option for SO_REUSEPORT. While the handling when adding a skt for reuse
      correctly checks that the bound device of the skt is a match, the skts
      in the hashslot are already incorrect. So for the same hash, a skt for
      the wrong vrf may be selected for the required port. The root cause is
      that the skt is immediately placed into a slot when it is created,
      but when the skt is then bound using SO_BINDTODEVICE, it remains in the
      same slot. The solution is to move the skt to the correct slot by
      forcing a rehash.
      Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      net: ensure unbound stream socket to be chosen when not in a VRF · e7819058
      Mike Manning authored
      The commit a04a480d ("net: Require exact match for TCP socket
      lookups if dif is l3mdev") only ensures that the correct socket is
      selected for packets in a VRF. However, there is no guarantee that
      the unbound socket will be selected for packets when not in a VRF.
      By checking for a device match in compute_score() also for the case
      when there is no bound device and attaching a score to this, the
      unbound socket is selected. And if a failure is returned when there
      is no device match, this ensures that bound sockets are never selected,
      even if there is no unbound socket.
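
The device-match rule behind these lookup changes can be sketched as a small userspace model. This is not the kernel's compute_score(): `compute_score_model`, its parameters and the score values are assumptions made for illustration, and the `*_l3mdev_accept` sysctls are ignored. Here `dif` is the index of the device the packet is seen on (the VRF master when the ingress device is enslaved) and `sdif` is the index of the enslaved ingress device, or 0 outside a VRF.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified device match. An unbound socket has bound_dev_if == 0, so it
 * can only match through sdif == 0, that is, when the packet is not in a VRF.
 */
static bool dev_match(int bound_dev_if, int dif, int sdif)
{
	return bound_dev_if == dif || bound_dev_if == sdif;
}

/* Returns -1 when the socket must not be selected, else a positive score. */
static int compute_score_model(int bound_dev_if, int dif, int sdif)
{
	int score = 1;	/* illustrative base score */

	assert(dif > 0 && sdif >= 0);

	/* The check runs for bound and unbound sockets alike. */
	if (!dev_match(bound_dev_if, dif, sdif))
		return -1;

	score += 4;	/* illustrative weight for a device match */
	return score;
}
```

With this rule a socket bound to a VRF is rejected for packets in the default VRF even when no unbound socket exists, and an unbound socket is rejected for packets received in a VRF.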
      Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • R
      net: allow binding socket in a VRF when there's an unbound socket · 3c82a21f
      Robert Shearman authored
      Change the inet socket lookup so that packets arriving on a device
      enslaved to an l3mdev no longer match unbound sockets. This is done by
      removing the wildcard for sockets without sk_bound_dev_if and relying
      instead on a check against the secondary device index. That index is 0
      when the input device is not enslaved to an l3mdev, so it matches an
      unbound socket, and it does not match when the input device is
      enslaved.
      
      Change the socket binding to take the l3mdev into account, so that an
      unbound socket does not conflict with sockets bound to an l3mdev, given
      the datapath isolation now guaranteed.
      Signed-off-by: Robert Shearman <rshearma@vyatta.att-mail.com>
      Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Tested-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 07 Nov, 2018 4 commits
  3. 06 Nov, 2018 6 commits
  4. 04 Nov, 2018 3 commits
  5. 03 Nov, 2018 10 commits
    • V
      netfilter: conntrack: fix calculation of next bucket number in early_drop · f393808d
      Vasily Khoruzhick authored
      If there's no entry to drop in bucket that corresponds to the hash,
      early_drop() should look for it in other buckets. But since it increments
      hash instead of bucket number, it actually looks in the same bucket 8
      times: hsize is 16k by default (14 bits) and hash is 32-bit value, so
      reciprocal_scale(hash, hsize) returns the same value for hash..hash+7 in
      most cases.
      
      Fix it by increasing bucket number instead of hash and rename _hash
      to bucket to avoid future confusion.
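
The arithmetic can be checked in userspace. The sketch below reproduces the multiply-and-shift used by reciprocal_scale(); with hsize = 16384 (2^14) the result is simply hash >> 18, so adding up to 7 to the hash rarely changes the bucket. The helper names `probe_by_hash` and `probe_by_bucket` are hypothetical and only contrast the two probe sequences.

```c
#include <assert.h>
#include <stdint.h>

/* Same arithmetic as the kernel helper: maps val into [0, ep_ro). */
static uint32_t reciprocal_scale(uint32_t val, uint32_t ep_ro)
{
	return (uint32_t)(((uint64_t)val * ep_ro) >> 32);
}

/* Buggy probe sequence: step the hash, then scale it. */
static uint32_t probe_by_hash(uint32_t hash, uint32_t hsize, unsigned int i)
{
	return reciprocal_scale(hash + i, hsize);
}

/* Fixed probe sequence: scale once, then step the bucket and wrap. */
static uint32_t probe_by_bucket(uint32_t hash, uint32_t hsize, unsigned int i)
{
	assert(hsize > 0);
	return (reciprocal_scale(hash, hsize) + i) % hsize;
}
```

For a hash whose low 18 bits are not within 7 of wrapping, all eight calls to `probe_by_hash` return the same bucket, while `probe_by_bucket` visits eight consecutive buckets.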
      
      Fixes: 3e86638e ("netfilter: conntrack: consider ct netns in early_drop logic")
      Cc: <stable@vger.kernel.org> # v4.7+
      Signed-off-by: Vasily Khoruzhick <vasilykh@arista.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • F
      netfilter: nft_compat: ebtables 'nat' table is normal chain type · e4844c9c
      Florian Westphal authored
      Unlike ip(6)tables, the ebtables nat table has no special properties.
      This bug causes 'ebtables -A' to fail when using a target such as
      'snat' (the ebt_snat target sets '.table = "nat"').  Targets that have
      no table restrictions work fine.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • P
      netfilter: nfnetlink_cttimeout: pass default timeout policy to obj_to_nlattr · 8866df92
      Pablo Neira Ayuso authored
      Otherwise, we hit a NULL pointer dereference since handlers always
      assume that the default timeout policy is passed.
      
        netlink: 24 bytes leftover after parsing attributes in process `syz-executor2'.
        kasan: CONFIG_KASAN_INLINE enabled
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        general protection fault: 0000 [#1] PREEMPT SMP KASAN
        CPU: 0 PID: 9575 Comm: syz-executor1 Not tainted 4.19.0+ #312
        Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
        RIP: 0010:icmp_timeout_obj_to_nlattr+0x77/0x170 net/netfilter/nf_conntrack_proto_icmp.c:297
      
      Fixes: c779e849 ("netfilter: conntrack: remove get_timeout() indirection")
      Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • P
      netfilter: conntrack: add nf_{tcp,udp,sctp,icmp,dccp,icmpv6,generic}_pernet() · a95a7774
      Pablo Neira Ayuso authored
      Expose these functions to access conntrack protocol tracker netns area,
      nfnetlink_cttimeout needs this.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • J
      netfilter: ipset: Fix calling ip_set() macro at dumping · 8a02bdd5
      Jozsef Kadlecsik authored
      When dumping, the ip_set() macro is called either with only
      ip_set_ref_lock held or with no lock or nfnl mutex held. Take this
      into account properly. Also, follow Pablo's suggestion to use
      rcu_dereference_raw(), since the ref_netlink protects the set.
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • T
      netfilter: xt_IDLETIMER: add sysfs filename checking routine · 54451f60
      Taehee Yoo authored
      When an IDLETIMER rule is added, a sysfs file is created under
      /sys/class/xt_idletimer/timers/
      Some label names must not be used, among them
      ".", "..", "power", "uevent" and "subsystem".
      A sysfs filename checking routine is therefore needed.
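
A label check of this kind might look like the following userspace sketch. It is not the routine added by the patch: `label_is_reserved` is a hypothetical helper, and the list holds only the names quoted above, which the message itself presents as incomplete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true when the label collides with a name sysfs already uses. */
static bool label_is_reserved(const char *label)
{
	/* Only the names quoted in the commit message; not exhaustive. */
	static const char *const reserved[] = {
		".", "..", "power", "uevent", "subsystem",
	};
	size_t i;

	assert(label != NULL);

	for (i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++)
		if (strcmp(label, reserved[i]) == 0)
			return true;
	return false;
}
```

A rule setup path would reject the rule with an error when the helper returns true, instead of letting sysfs report a duplicate filename.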
      
      test commands:
         %iptables -I INPUT -j IDLETIMER --timeout 1 --label "power"
      
      splat looks like:
      [95765.423132] sysfs: cannot create duplicate filename '/devices/virtual/xt_idletimer/timers/power'
      [95765.433418] CPU: 0 PID: 8446 Comm: iptables Not tainted 4.19.0-rc6+ #20
      [95765.449755] Call Trace:
      [95765.449755]  dump_stack+0xc9/0x16b
      [95765.449755]  ? show_regs_print_info+0x5/0x5
      [95765.449755]  sysfs_warn_dup+0x74/0x90
      [95765.449755]  sysfs_add_file_mode_ns+0x352/0x500
      [95765.449755]  sysfs_create_file_ns+0x179/0x270
      [95765.449755]  ? sysfs_add_file_mode_ns+0x500/0x500
      [95765.449755]  ? idletimer_tg_checkentry+0x3e5/0xb1b [xt_IDLETIMER]
      [95765.449755]  ? rcu_read_lock_sched_held+0x114/0x130
      [95765.449755]  ? __kmalloc_track_caller+0x211/0x2b0
      [95765.449755]  ? memcpy+0x34/0x50
      [95765.449755]  idletimer_tg_checkentry+0x4e2/0xb1b [xt_IDLETIMER]
      [ ... ]
      
      Fixes: 0902b469 ("netfilter: xtables: idletimer target implementation")
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • D
      rxrpc: Fix lockup due to no error backoff after ack transmit error · c7e86acf
      David Howells authored
      If the network becomes (partially) unavailable, say by disabling IPv6, the
      background ACK transmission routine can get itself into a tizzy by
      proposing immediate ACK retransmission.  Since we're in the call event
      processor, that happens immediately without returning to the workqueue
      manager.
      
      The condition should clear after a while when either the network comes back
      or the call times out.
      
      Fix this by:
      
       (1) When re-proposing an ACK on failed Tx, don't schedule it immediately.
           This will allow a certain amount of time to elapse before we try
           again.
      
       (2) Enforce a return to the workqueue manager after a certain number of
           iterations of the call processing loop.
      
       (3) Add a backoff delay that increases the delay on deferred ACKs by a
           jiffy per failed transmission to a limit of HZ.  The backoff delay is
           cleared on a successful return from kernel_sendmsg().
      
       (4) Cancel calls immediately if the opening sendmsg fails.  The layer
           above can arrange retransmission or rotate to another server.
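
Step (3) can be sketched as a small counter. This is a userspace model only: `struct call_model`, `note_tx_result`, `deferred_ack_delay` and the value of `HZ_MODEL` are assumptions for illustration, not the rxrpc code.

```c
#include <assert.h>
#include <stddef.h>

#define HZ_MODEL 250UL	/* hypothetical tick rate standing in for HZ */

struct call_model {
	unsigned long tx_backoff;	/* extra delay in jiffies */
};

/* Record the result of a transmission attempt (ret < 0 means failure). */
static void note_tx_result(struct call_model *call, int ret)
{
	assert(call != NULL);

	if (ret < 0) {
		/* One more jiffy per failed transmission, capped at HZ. */
		if (call->tx_backoff < HZ_MODEL)
			call->tx_backoff++;
	} else {
		/* A successful send clears the backoff. */
		call->tx_backoff = 0;
	}
}

/* Delay to apply to a deferred ACK, given its normal delay. */
static unsigned long deferred_ack_delay(const struct call_model *call,
					unsigned long base_delay)
{
	return base_delay + call->tx_backoff;
}
```

Because the backoff grows linearly and is capped, a persistent failure settles at about one retry per second rather than spinning in the call event processor.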
      
      Fixes: 248f219c ("rxrpc: Rewrite the data and ack handling code")
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net/ipv6: Add anycast addresses to a global hashtable · 2384d025
      Jeff Barnhill authored
      icmp6_send() function is expensive on systems with a large number of
      interfaces. Every time it’s called, it has to verify that the source
      address does not correspond to an existing anycast address by looping
      through every device and every anycast address on the device.  This can
      result in significant delays for a CPU when there are a large number of
      neighbors and ND timers are frequently timing out and calling
      neigh_invalidate().
      
      Add anycast addresses to a global hashtable to allow quick searching for
      matching anycast addresses.  This is based on inet6_addr_lst in addrconf.c.
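
The idea of replacing a walk over every device with a hashed lookup can be sketched in userspace as follows. Everything here is an assumption for illustration: the names, the 256-bucket table and the FNV-1a hash are not taken from the patch, and locking and network namespaces are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ACADDR_HASH_BITS 8	/* hypothetical table size: 256 buckets */
#define ACADDR_HASH_SIZE (1u << ACADDR_HASH_BITS)

struct acaddr_node {
	uint8_t addr[16];		/* IPv6 address in network byte order */
	struct acaddr_node *next;	/* chain within one bucket */
};

static struct acaddr_node *acaddr_table[ACADDR_HASH_SIZE];

/* FNV-1a over the 16 address bytes, reduced to a bucket index. */
static unsigned int acaddr_hash(const uint8_t *addr)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < 16; i++) {
		h ^= addr[i];
		h *= 16777619u;
	}
	return h & (ACADDR_HASH_SIZE - 1);
}

/* Link a caller-owned node into the global table. */
static void acaddr_insert(struct acaddr_node *node)
{
	unsigned int idx;

	assert(node != NULL);
	idx = acaddr_hash(node->addr);
	node->next = acaddr_table[idx];
	acaddr_table[idx] = node;
}

/* Search only the bucket the address hashes to. */
static bool acaddr_lookup(const uint8_t *addr)
{
	const struct acaddr_node *n;

	for (n = acaddr_table[acaddr_hash(addr)]; n != NULL; n = n->next)
		if (memcmp(n->addr, addr, 16) == 0)
			return true;
	return false;
}
```

The lookup cost then depends on the length of one bucket chain rather than on the number of interfaces and anycast addresses in the system.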
      Signed-off-by: Jeff Barnhill <0xeffeff@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      net: document skb parameter in function 'skb_gso_size_check' · 49682bfa
      Mathieu Malaterre authored
      Remove kernel-doc warning:
      
        net/core/skbuff.c:4953: warning: Function parameter or member 'skb' not described in 'skb_gso_size_check'
      Signed-off-by: Mathieu Malaterre <malat@debian.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • M
      iov_iter: Fix 9p virtio breakage · 2cbfdf4d
      Marc Zyngier authored
      When switching to the new iovec accessors, a negation got subtly
      dropped, leading to 9p being remarkably broken (here with kvmtool):
      
      [    7.430941] VFS: Mounted root (9p filesystem) on device 0:15.
      [    7.432080] devtmpfs: mounted
      [    7.432717] Freeing unused kernel memory: 1344K
      [    7.433658] Run /virt/init as init process
        Warning: unable to translate guest address 0x7e00902ff000 to host
        Warning: unable to translate guest address 0x7e00902fefc0 to host
        Warning: unable to translate guest address 0x7e00902ff000 to host
        Warning: unable to translate guest address 0x7e008febef80 to host
        Warning: unable to translate guest address 0x7e008febf000 to host
        Warning: unable to translate guest address 0x7e008febef00 to host
        Warning: unable to translate guest address 0x7e008febf000 to host
      [    7.436376] Kernel panic - not syncing: Requested init /virt/init failed (error -8).
      [    7.437554] CPU: 29 PID: 1 Comm: swapper/0 Not tainted 4.19.0-rc8-02267-g00e23707 #291
      [    7.439006] Hardware name: linux,dummy-virt (DT)
      [    7.439902] Call trace:
      [    7.440387]  dump_backtrace+0x0/0x148
      [    7.441104]  show_stack+0x14/0x20
      [    7.441768]  dump_stack+0x90/0xb4
      [    7.442425]  panic+0x120/0x27c
      [    7.443036]  kernel_init+0xa4/0x100
      [    7.443725]  ret_from_fork+0x10/0x18
      [    7.444444] SMP: stopping secondary CPUs
      [    7.445391] Kernel Offset: disabled
      [    7.446169] CPU features: 0x0,23000438
      [    7.446974] Memory Limit: none
      [    7.447645] ---[ end Kernel panic - not syncing: Requested init /virt/init failed (error -8). ]---
      
      Restoring the missing "!" brings the guest back to life.
      
      Fixes: 00e23707 ("iov_iter: Use accessor function")
      Reported-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  6. 02 Nov, 2018 3 commits
    • A
      missing bits of "iov_iter: Separate type from direction and use accessor functions" · 0e9b4a82
      Al Viro authored
      sunrpc patches from nfs tree conflict with calling conventions change done
      in iov_iter work.  Trivial fixup...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • C
      net: drop skb on failure in ip_check_defrag() · 7de414a9
      Cong Wang authored
      Most callers of pskb_trim_rcsum() simply drop the skb when
      it fails; however, ip_check_defrag() still passes the skb
      up the stack. This is suspicious.
      
      In ip_check_defrag(), after we learn the skb is an IP fragment,
      passing the skb to callers makes no sense, because callers expect
      fragments to be defragmented on success. So, dropping the skb when
      we can't defrag it is reasonable.
      
      Note, prior to commit 88078d98, this is not a big problem as
      checksum will be fixed up anyway. After it, the checksum is not
      correct on failure.
      
      Found this during code review.
      
      Fixes: 88078d98 ("net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends")
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      SUNRPC: Use atomic(64)_t for seq_send(64) · c3be6577
      Paul Burton authored
      The seq_send & seq_send64 fields in struct krb5_ctx are used as
      atomically incrementing counters. This is implemented using cmpxchg() &
      cmpxchg64() to implement what amount to custom versions of
      atomic_fetch_inc() & atomic64_fetch_inc().
      
      Besides the duplication, using cmpxchg64() has another major drawback in
      that some 32 bit architectures don't provide it. As such commit
      571ed1fd ("SUNRPC: Replace krb5_seq_lock with a lockless scheme")
      resulted in build failures for some architectures.
      
      Change seq_send to be an atomic_t and seq_send64 to be an atomic64_t,
      then use atomic(64)_* functions to manipulate the values. The atomic64_t
      type & associated functions are provided even on architectures which
      lack real 64 bit atomic memory access via CONFIG_GENERIC_ATOMIC64 which
      uses spinlocks to serialize access. This fixes the build failures for
      architectures lacking cmpxchg64().
      
      A potential alternative that was raised would be to provide cmpxchg64()
      on the 32 bit architectures that currently lack it, using spinlocks.
      However this would provide a version of cmpxchg64() with semantics a
      little different to the implementations on architectures with real 64
      bit atomics - the spinlock-based implementation would only work if all
      access to the memory used with cmpxchg64() is *always* performed using
      cmpxchg64(). That is not currently a requirement for users of
      cmpxchg64(), and making it one seems questionable. As such avoiding
      cmpxchg64() outside of architecture-specific code seems best,
      particularly in cases where atomic64_t seems like a better fit anyway.
      
      The CONFIG_GENERIC_ATOMIC64 implementation of atomic64_* functions will
      use spinlocks & so faces the same issue, but with the key difference
      that the memory backing an atomic64_t ought to always be accessed via
      the atomic64_* functions anyway making the issue moot.
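
The duplication mentioned above can be illustrated with C11 atomics in userspace. This is an analogy, not the SUNRPC code: `seq_fetch_inc_cmpxchg` and `seq_fetch_inc` are hypothetical names, and `<stdatomic.h>` stands in for the kernel's cmpxchg64() and atomic64_fetch_inc().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hand-rolled fetch-and-increment built from a compare-and-exchange loop. */
static uint64_t seq_fetch_inc_cmpxchg(_Atomic uint64_t *seq)
{
	uint64_t old;

	assert(seq != NULL);
	old = atomic_load(seq);
	/* On failure, old is reloaded with the current value and we retry. */
	while (!atomic_compare_exchange_weak(seq, &old, old + 1))
		;
	return old;
}

/* The same operation using the ready-made atomic primitive. */
static uint64_t seq_fetch_inc(_Atomic uint64_t *seq)
{
	assert(seq != NULL);
	return atomic_fetch_add(seq, 1);
}
```

Both functions return the value before the increment; the second leaves the choice of implementation, including any lock-based fallback, to the atomic type itself.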
      Signed-off-by: Paul Burton <paul.burton@mips.com>
      Fixes: 571ed1fd ("SUNRPC: Replace krb5_seq_lock with a lockless scheme")
      Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
      Cc: Anna Schumaker <anna.schumaker@netapp.com>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Jeff Layton <jlayton@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: linux-nfs@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
  7. 01 Nov, 2018 6 commits
    • D
      compat: Cleanup in_compat_syscall() callers · 98f76206
      Dmitry Safonov authored
      Now that in_compat_syscall() is consistent on all architectures and no
      longer reports true on native i686, the workarounds (ifdeffery and
      helpers) can be removed.
      Signed-off-by: Dmitry Safonov <dima@arista.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dmitry Safonov <0x7f454c46@gmail.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andy Lutomirsky <luto@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-efi@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Link: https://lkml.kernel.org/r/20181012134253.23266-3-dima@arista.com
    • J
      openvswitch: Fix push/pop ethernet validation · 46ebe283
      Jaime Caamaño Ruiz authored
      When there are both pop and push ethernet header actions among the
      actions to be applied to a packet, an unexpected EINVAL (Invalid
      argument) error is obtained. This is due to mac_proto not being reset
      correctly when those actions are validated.
      
      Reported-at:
      https://mail.openvswitch.org/pipermail/ovs-discuss/2018-October/047554.html
      Fixes: 91820da6 ("openvswitch: add Ethernet push and pop actions")
      Signed-off-by: Jaime Caamaño Ruiz <jcaamano@suse.com>
      Tested-by: Greg Rose <gvrose8192@gmail.com>
      Reviewed-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      netfilter: ipset: fix ip_set_list allocation failure · ed956f39
      Andrey Ryabinin authored
      ip_set_create() and ip_set_net_init() attempt to allocate physically
      contiguous memory for ip_set_list. If memory is fragmented, the
      allocations could easily fail:
      
              vzctl: page allocation failure: order:7, mode:0xc0d0
      
              Call Trace:
               dump_stack+0x19/0x1b
               warn_alloc_failed+0x110/0x180
               __alloc_pages_nodemask+0x7bf/0xc60
               alloc_pages_current+0x98/0x110
               kmalloc_order+0x18/0x40
               kmalloc_order_trace+0x26/0xa0
               __kmalloc+0x279/0x290
               ip_set_net_init+0x4b/0x90 [ip_set]
               ops_init+0x3b/0xb0
               setup_net+0xbb/0x170
               copy_net_ns+0xf1/0x1c0
               create_new_namespaces+0xf9/0x180
               copy_namespaces+0x8e/0xd0
               copy_process+0xb61/0x1a00
               do_fork+0x91/0x320
      
      Use kvcalloc() to fall back to 0-order allocations if a high-order
      page isn't available.
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • E
      netfilter: ipset: actually allow allowable CIDR 0 in hash:net,port,net · 886503f3
      Eric Westbrook authored
      Allow /0 as advertised for hash:net,port,net sets.
      
      For "hash:net,port,net", ipset(8) says that "either subnet
      is permitted to be a /0 should you wish to match port
      between all destinations."
      
      Make that statement true.
      
      Before:
      
          # ipset create cidrzero hash:net,port,net
          # ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0
          ipset v6.34: The value of the CIDR parameter of the IP address is invalid
      
          # ipset create cidrzero6 hash:net,port,net family inet6
          # ipset add cidrzero6 ::/0,12345,::/0
          ipset v6.34: The value of the CIDR parameter of the IP address is invalid
      
      After:
      
          # ipset create cidrzero hash:net,port,net
          # ipset add cidrzero 0.0.0.0/0,12345,0.0.0.0/0
          # ipset test cidrzero 192.168.205.129,12345,172.16.205.129
          192.168.205.129,tcp:12345,172.16.205.129 is in set cidrzero.
      
          # ipset create cidrzero6 hash:net,port,net family inet6
          # ipset add cidrzero6 ::/0,12345,::/0
          # ipset test cidrzero6 fe80::1,12345,ff00::1
          fe80::1,tcp:12345,ff00::1 is in set cidrzero6.
      
      See also:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=200897
        https://github.com/ewestbrook/linux/commit/df7ff6efb0934ab6acc11f003ff1a7580d6c1d9c
      Signed-off-by: Eric Westbrook <linux@westbrook.io>
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • S
      netfilter: ipset: list:set: Decrease refcount synchronously on deletion and replace · 439cd39e
      Stefano Brivio authored
      Commit 45040978 ("netfilter: ipset: Fix set:list type crash
      when flush/dump set in parallel") postponed decreasing set
      reference counters to the RCU callback.
      
      An 'ipset del' command can terminate before the RCU grace period
      has elapsed, and if sets are listed before then, the reference
      counter shown in userspace will be wrong:
      
       # ipset create h hash:ip; ipset create l list:set; ipset add l h
       # ipset del l h; ipset list h
       Name: h
       Type: hash:ip
       Revision: 4
       Header: family inet hashsize 1024 maxelem 65536
       Size in memory: 88
       References: 1
       Number of entries: 0
       Members:
       # sleep 1; ipset list h
       Name: h
       Type: hash:ip
       Revision: 4
       Header: family inet hashsize 1024 maxelem 65536
       Size in memory: 88
       References: 0
       Number of entries: 0
       Members:
      
      Fix this by making the reference count update synchronous again.
      
      As a result, when sets are listed, ip_set_name_byindex() might
      now fetch a set whose reference count is already zero. Instead
      of relying on the reference count to protect against concurrent
      set renaming, grab ip_set_ref_lock as reader and copy the name,
      while holding the same lock in ip_set_rename() as writer
      instead.
      Reported-by: Li Shuang <shuali@redhat.com>
      Fixes: 45040978 ("netfilter: ipset: Fix set:list type crash when flush/dump set in parallel")
      Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
      Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • J
      ixgbe/ixgbevf: fix XFRM_ALGO dependency · 48e01e00
      Jeff Kirsher authored
      Based on the original work from Arnd Bergmann.
      
      When XFRM_ALGO is not enabled, the new ixgbe IPsec code produces a
      link error:
      
      drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.o: In function `ixgbe_ipsec_vf_add_sa':
      ixgbe_ipsec.c:(.text+0x1266): undefined reference to `xfrm_aead_get_byname'
      
      Simply selecting XFRM_ALGO from here causes circular dependencies, so
      to fix it, we probably want this slightly more complex solution that is
      similar to what other drivers with XFRM offload do:
      
      A separate Kconfig symbol now controls whether we include the IPsec
      offload code. To keep the old behavior, this is left as 'default y'. The
      dependency in XFRM_OFFLOAD still causes a circular dependency but is
      not actually needed because this symbol is not user visible, so removing
      that dependency on top makes it all work.
      
      CC: Arnd Bergmann <arnd@arndb.de>
      CC: Shannon Nelson <shannon.nelson@oracle.com>
      Fixes: eda0333a ("ixgbe: add VF IPsec management")
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
  8. 31 Oct, 2018 2 commits
  9. 30 Oct, 2018 2 commits
    • I
      rtnetlink: Disallow FDB configuration for non-Ethernet device · da715775
      Ido Schimmel authored
      When an FDB entry is configured, the address is validated to have the
      length of an Ethernet address, but the device for which the address is
      configured can be of any type.
      
      The above can result in the use of uninitialized memory when the address
      is later compared against existing addresses since 'dev->addr_len' is
      used and it may be greater than ETH_ALEN, as with ip6tnl devices.
      
      Fix this by making sure that FDB entries are only configured for
      Ethernet devices.
      
      BUG: KMSAN: uninit-value in memcmp+0x11d/0x180 lib/string.c:863
      CPU: 1 PID: 4318 Comm: syz-executor998 Not tainted 4.19.0-rc3+ #49
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x14b/0x190 lib/dump_stack.c:113
        kmsan_report+0x183/0x2b0 mm/kmsan/kmsan.c:956
        __msan_warning+0x70/0xc0 mm/kmsan/kmsan_instr.c:645
        memcmp+0x11d/0x180 lib/string.c:863
        dev_uc_add_excl+0x165/0x7b0 net/core/dev_addr_lists.c:464
        ndo_dflt_fdb_add net/core/rtnetlink.c:3463 [inline]
        rtnl_fdb_add+0x1081/0x1270 net/core/rtnetlink.c:3558
        rtnetlink_rcv_msg+0xa0b/0x1530 net/core/rtnetlink.c:4715
        netlink_rcv_skb+0x36e/0x5f0 net/netlink/af_netlink.c:2454
        rtnetlink_rcv+0x50/0x60 net/core/rtnetlink.c:4733
        netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
        netlink_unicast+0x1638/0x1720 net/netlink/af_netlink.c:1343
        netlink_sendmsg+0x1205/0x1290 net/netlink/af_netlink.c:1908
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg net/socket.c:631 [inline]
        ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
        __sys_sendmsg net/socket.c:2152 [inline]
        __do_sys_sendmsg net/socket.c:2161 [inline]
        __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
        do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
        entry_SYSCALL_64_after_hwframe+0x63/0xe7
      RIP: 0033:0x440ee9
      Code: e8 cc ab 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
      48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
      ff 0f 83 bb 0a fc ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007fff6a93b518 EFLAGS: 00000213 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000440ee9
      RDX: 0000000000000000 RSI: 0000000020000240 RDI: 0000000000000003
      RBP: 0000000000000000 R08: 00000000004002c8 R09: 00000000004002c8
      R10: 00000000004002c8 R11: 0000000000000213 R12: 000000000000b4b0
      R13: 0000000000401ec0 R14: 0000000000000000 R15: 0000000000000000
      
      Uninit was created at:
        kmsan_save_stack_with_flags mm/kmsan/kmsan.c:256 [inline]
        kmsan_internal_poison_shadow+0xb8/0x1b0 mm/kmsan/kmsan.c:181
        kmsan_kmalloc+0x98/0x100 mm/kmsan/kmsan_hooks.c:91
        kmsan_slab_alloc+0x10/0x20 mm/kmsan/kmsan_hooks.c:100
        slab_post_alloc_hook mm/slab.h:446 [inline]
        slab_alloc_node mm/slub.c:2718 [inline]
        __kmalloc_node_track_caller+0x9e7/0x1160 mm/slub.c:4351
        __kmalloc_reserve net/core/skbuff.c:138 [inline]
        __alloc_skb+0x2f5/0x9e0 net/core/skbuff.c:206
        alloc_skb include/linux/skbuff.h:996 [inline]
        netlink_alloc_large_skb net/netlink/af_netlink.c:1189 [inline]
        netlink_sendmsg+0xb49/0x1290 net/netlink/af_netlink.c:1883
        sock_sendmsg_nosec net/socket.c:621 [inline]
        sock_sendmsg net/socket.c:631 [inline]
        ___sys_sendmsg+0xe70/0x1290 net/socket.c:2114
        __sys_sendmsg net/socket.c:2152 [inline]
        __do_sys_sendmsg net/socket.c:2161 [inline]
        __se_sys_sendmsg+0x2a3/0x3d0 net/socket.c:2159
        __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2159
        do_syscall_64+0xb8/0x100 arch/x86/entry/common.c:291
        entry_SYSCALL_64_after_hwframe+0x63/0xe7
      
      v2:
      * Make error message more specific (David)
      
      Fixes: 090096bf ("net: generic fdb support for drivers without ndo_fdb_<op>")
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Reported-and-tested-by: syzbot+3a288d5f5530b901310e@syzkaller.appspotmail.com
      Reported-and-tested-by: syzbot+d53ab4e92a1db04110ff@syzkaller.appspotmail.com
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • X
      sctp: check policy more carefully when getting pr status · 71335836
      Xin Long authored
      When getting pr_assocstatus and pr_streamstatus by sctp_getsockopt,
      it doesn't correctly process the case when policy is set with
      SCTP_PR_SCTP_ALL | SCTP_PR_SCTP_MASK. It even causes a
      slab-out-of-bounds in sctp_getsockopt_pr_streamstatus().
      
      This patch fixes it by returning -EINVAL for this case.
      
      Fixes: 0ac1077e ("sctp: get pr_assoc and pr_stream all status with SCTP_PR_SCTP_ALL")
      Reported-by: syzbot+5da0d0a72a9e7d791748@syzkaller.appspotmail.com
      Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>