1. 15 3月, 2019 1 次提交
    • G
      net: kernel hookers service for toa module · 33576e37
      George Zhang 提交于
      LVS fullnat will replace network traffic's source ip with its local ip,
      and thus the backend servers cannot obtain the real client ip.
      
      To solve this, LVS has introduced the tcp option address (TOA) to store
      the essential ip address information in the last tcp ack packet of the
      3-way handshake, and the backend servers need to retrieve it from the
      packet header.
      
      In this patch, we have introduced the sk_toa_data member in the sock
      structure to hold the TOA information. There used to be an in-tree
      module for TOA managing, whereas it has now been maintained as an
      standalone module.
      
      In this case, the toa module should register its hook function(s) using
      the provided interfaces in the hookers module.
      
      TOA in sock structure:
      
      	__be32 sk_toa_data[16];
      
      The hookers module only provides the sk_toa_data placeholder, and the
      toa module can use this variable through the layout it needs.
      
      Hook interfaces:
      
      The hookers module replaces the kernel's syn_recv_sock and getname
      handler with a stub that chains the toa module's hook function(s) to the
      original handling function. The hookers module allows hook functions to
      be installed and uninstalled in any order.
      
      toa module:
      
      The external toa module will be provided in separate RPM package.
      
      [xuyu@linux.alibaba.com: amend commit log]
      Signed-off-by: NGeorge Zhang <georgezhang@linux.alibaba.com>
      Signed-off-by: NXu Yu <xuyu@linux.alibaba.com>
      Reviewed-by: NCaspar Zhang <caspar@linux.alibaba.com>
      33576e37
  2. 07 2月, 2019 1 次提交
    • D
      ipvlan, l3mdev: fix broken l3s mode wrt local routes · fcc9c69a
      Daniel Borkmann 提交于
      [ Upstream commit d5256083f62e2720f75bb3c5a928a0afe47d6bc3 ]
      
      While implementing ipvlan l3 and l3s mode for kubernetes CNI plugin,
      I ran into the issue that while l3 mode is working fine, l3s mode
      does not have any connectivity to kube-apiserver and hence all pods
      end up in Error state as well. The ipvlan master device sits on
      top of a bond device and hostns traffic to kube-apiserver (also running
      in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
      where the latter is the address of the bond0. While in l3 mode, a
      curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
      works fine from hostns, neither of them do in case of l3s. In the
      latter only a curl to https://127.0.0.1:37573 appeared to work where
      for local addresses of bond0 I saw kernel suddenly starting to emit
      ARP requests to query HW address of bond0 which remained unanswered
      and neighbor entries in INCOMPLETE state. These ARP requests only
      happen while in l3s.
      
      Debugging this further, I found the issue is that l3s mode is piggy-
      backing on l3 master device, and in this case local routes are using
      l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
      f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev
      if relevant") and 5f02ce24 ("net: l3mdev: Allow the l3mdev to be
      a loopback"). I found that reverting them back into using the
      net->loopback_dev fixed ipvlan l3s connectivity and got everything
      working for the CNI.
      
      Now judging from 4fbae7d8 ("ipvlan: Introduce l3s mode") and the
      l3mdev paper in [0] the only sole reason why ipvlan l3s is relying
      on l3 master device is to get the l3mdev_ip_rcv() receive hook for
      setting the dst entry of the input route without adding its own
      ipvlan specific hacks into the receive path, however, any l3 domain
      semantics beyond just that are breaking l3s operation. Note that
      ipvlan also has the ability to dynamically switch its internal
      operation from l3 to l3s for all ports via ipvlan_set_port_mode()
      at runtime. In any case, l3 vs l3s soley distinguishes itself by
      'de-confusing' netfilter through switching skb->dev to ipvlan slave
      device late in NF_INET_LOCAL_IN before handing the skb to L4.
      
      Minimal fix taken here is to add a IFF_L3MDEV_RX_HANDLER flag which,
      if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
      without any additional l3mdev semantics on top. This should also have
      minimal impact since dev->priv_flags is already hot in cache. With
      this set, l3s mode is working fine and I also get things like
      masquerading pod traffic on the ipvlan master properly working.
      
        [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
      
      Fixes: f5a0aab8 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
      Fixes: 5f02ce24 ("net: l3mdev: Allow the l3mdev to be a loopback")
      Fixes: 4fbae7d8 ("ipvlan: Introduce l3s mode")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Martynas Pumputis <m@lambda.lt>
      Acked-by: NDavid Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      fcc9c69a
  3. 31 1月, 2019 1 次提交
    • I
      net: ipv4: Fix memory leak in network namespace dismantle · adbf7e58
      Ido Schimmel 提交于
      [ Upstream commit f97f4dd8b3bb9d0993d2491e0f22024c68109184 ]
      
      IPv4 routing tables are flushed in two cases:
      
      1. In response to events in the netdev and inetaddr notification chains
      2. When a network namespace is being dismantled
      
      In both cases only routes associated with a dead nexthop group are
      flushed. However, a nexthop group will only be marked as dead in case it
      is populated with actual nexthops using a nexthop device. This is not
      the case when the route in question is an error route (e.g.,
      'blackhole', 'unreachable').
      
      Therefore, when a network namespace is being dismantled such routes are
      not flushed and leaked [1].
      
      To reproduce:
      # ip netns add blue
      # ip -n blue route add unreachable 192.0.2.0/24
      # ip netns del blue
      
      Fix this by not skipping error routes that are not marked with
      RTNH_F_DEAD when flushing the routing tables.
      
      To prevent the flushing of such routes in case #1, add a parameter to
      fib_table_flush() that indicates if the table is flushed as part of
      namespace dismantle or not.
      
      Note that this problem does not exist in IPv6 since error routes are
      associated with the loopback device.
      
      [1]
      unreferenced object 0xffff888066650338 (size 56):
        comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 b0 1c 62 61 80 88 ff ff  ..........ba....
          e8 8b a1 64 80 88 ff ff 00 07 00 08 fe 00 00 00  ...d............
        backtrace:
          [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
          [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
          [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
          [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
          [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
          [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
          [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
          [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
          [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
          [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000003a8b605b>] 0xffffffffffffffff
      unreferenced object 0xffff888061621c88 (size 48):
        comm "ip", pid 1206, jiffies 4294786063 (age 26.235s)
        hex dump (first 32 bytes):
          6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b  kkkkkkkkkkkkkkkk
          6b 6b 6b 6b 6b 6b 6b 6b d8 8e 26 5f 80 88 ff ff  kkkkkkkk..&_....
        backtrace:
          [<00000000733609e3>] fib_table_insert+0x978/0x1500
          [<00000000856ed27d>] inet_rtm_newroute+0x129/0x220
          [<00000000fcdfc00a>] rtnetlink_rcv_msg+0x397/0xa20
          [<00000000cb85801a>] netlink_rcv_skb+0x132/0x380
          [<00000000ebc991d2>] netlink_unicast+0x4c0/0x690
          [<0000000014f62875>] netlink_sendmsg+0x929/0xe10
          [<00000000bac9d967>] sock_sendmsg+0xc8/0x110
          [<00000000223e6485>] ___sys_sendmsg+0x77a/0x8f0
          [<000000002e94f880>] __sys_sendmsg+0xf7/0x250
          [<00000000ccb1fa72>] do_syscall_64+0x14d/0x610
          [<00000000ffbe3dae>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
          [<000000003a8b605b>] 0xffffffffffffffff
      
      Fixes: 8cced9ef ("[NETNS]: Enable routing configuration in non-initial namespace.")
      Signed-off-by: NIdo Schimmel <idosch@mellanox.com>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      adbf7e58
  4. 23 1月, 2019 2 次提交
  5. 10 1月, 2019 2 次提交
    • D
      sock: Make sock->sk_stamp thread-safe · 60f05ddd
      Deepa Dinamani 提交于
      [ Upstream commit 3a0ed3e9619738067214871e9cb826fa23b2ddb9 ]
      
      Al Viro mentioned (Message-ID
      <20170626041334.GZ10672@ZenIV.linux.org.uk>)
      that there is probably a race condition
      lurking in accesses of sk_stamp on 32-bit machines.
      
      sock->sk_stamp is of type ktime_t which is always an s64.
      On a 32 bit architecture, we might run into situations of
      unsafe access as the access to the field becomes non atomic.
      
      Use seqlocks for synchronization.
      This allows us to avoid using spinlocks for readers as
      readers do not need mutual exclusion.
      
      Another approach to solve this is to require sk_lock for all
      modifications of the timestamps. The current approach allows
      for timestamps to have their own lock: sk_stamp_lock.
      This allows for the patch to not compete with already
      existing critical sections, and side effects are limited
      to the paths in the patch.
      
      The addition of the new field maintains the data locality
      optimizations from
      commit 9115e8cd ("net: reorganize struct sock for better data
      locality")
      
      Note that all the instances of the sk_stamp accesses
      are either through the ioctl or the syscall recvmsg.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60f05ddd
    • W
      ip: validate header length on virtual device xmit · cde81154
      Willem de Bruijn 提交于
      [ Upstream commit cb9f1b783850b14cbd7f87d061d784a666dfba1f ]
      
      KMSAN detected read beyond end of buffer in vti and sit devices when
      passing truncated packets with PF_PACKET. The issue affects additional
      ip tunnel devices.
      
      Extend commit 76c0ddd8 ("ip6_tunnel: be careful when accessing the
      inner header") and commit ccfec9e5 ("ip_tunnel: be careful when
      accessing the inner header").
      
      Move the check to a separate helper and call at the start of each
      ndo_start_xmit function in net/ipv4 and net/ipv6.
      
      Minor changes:
      - convert dev_kfree_skb to kfree_skb on error path,
        as dev_kfree_skb calls consume_skb which is not for error paths.
      - use pskb_network_may_pull even though that is pedantic here,
        as the same as pskb_may_pull for devices without llheaders.
      - do not cache ipv6 hdrs if used only once
        (unsafe across pskb_may_pull, was more relevant to earlier patch)
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cde81154
  6. 29 12月, 2018 1 次提交
  7. 17 12月, 2018 3 次提交
  8. 01 12月, 2018 1 次提交
  9. 23 11月, 2018 1 次提交
  10. 04 11月, 2018 1 次提交
  11. 18 10月, 2018 2 次提交
  12. 16 10月, 2018 2 次提交
    • X
      sctp: use the pmtu from the icmp packet to update transport pathmtu · d805397c
      Xin Long 提交于
      Other than asoc pmtu sync from all transports, sctp_assoc_sync_pmtu
      is also processing transport pmtu_pending by icmp packets. But it's
      meaningless to use sctp_dst_mtu(t->dst) as new pmtu for a transport.
      
      The right pmtu value should come from the icmp packet, and it would
      be saved into transport->mtu_info in this patch and used later when
      the pmtu sync happens in sctp_sendmsg_to_asoc or sctp_packet_config.
      
      Besides, without this patch, as pmtu can only be updated correctly
      when receiving a icmp packet and no place is holding sock lock, it
      will take long time if the sock is busy with sending packets.
      
      Note that it doesn't process transport->mtu_info in .release_cb(),
      as there is no enough information for pmtu update, like for which
      asoc or transport. It is not worth traversing all asocs to check
      pmtu_pending. So unlike tcp, sctp does this in tx path, for which
      mtu_info needs to be atomic_t.
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Acked-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d805397c
    • S
      ipv6: rate-limit probes for neighbourless routes · f547fac6
      Sabrina Dubroca 提交于
      When commit 27097255 ("[IPV6]: ROUTE: Add Router Reachability
      Probing (RFC4191).") introduced router probing, the rt6_probe() function
      required that a neighbour entry existed. This neighbour entry is used to
      record the timestamp of the last probe via the ->updated field.
      
      Later, commit 2152caea ("ipv6: Do not depend on rt->n in rt6_probe().")
      removed the requirement for a neighbour entry. Neighbourless routes skip
      the interval check and are not rate-limited.
      
      This patch adds rate-limiting for neighbourless routes, by recording the
      timestamp of the last probe in the fib6_info itself.
      
      Fixes: 2152caea ("ipv6: Do not depend on rt->n in rt6_probe().")
      Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
      Reviewed-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f547fac6
  13. 11 10月, 2018 3 次提交
  14. 03 10月, 2018 1 次提交
  15. 02 10月, 2018 1 次提交
    • E
      tcp/dccp: fix lockdep issue when SYN is backlogged · 1ad98e9d
      Eric Dumazet 提交于
      In normal SYN processing, packets are handled without listener
      lock and in RCU protected ingress path.
      
      But syzkaller is known to be able to trick us and SYN
      packets might be processed in process context, after being
      queued into socket backlog.
      
      In commit 06f877d6 ("tcp/dccp: fix other lockdep splats
      accessing ireq_opt") I made a very stupid fix, that happened
      to work mostly because of the regular path being RCU protected.
      
      Really the thing protecting ireq->ireq_opt is RCU read lock,
      and the pseudo request refcnt is not relevant.
      
      This patch extends what I did in commit 449809a6 ("tcp/dccp:
      block BH for SYN processing") by adding an extra rcu_read_{lock|unlock}
      pair in the paths that might be taken when processing SYN from
      socket backlog (thus possibly in process context)
      
      Fixes: 06f877d6 ("tcp/dccp: fix other lockdep splats accessing ireq_opt")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ad98e9d
  16. 30 9月, 2018 1 次提交
  17. 27 9月, 2018 1 次提交
    • M
      bonding: avoid possible dead-lock · d4859d74
      Mahesh Bandewar 提交于
      Syzkaller reported this on a slightly older kernel but it's still
      applicable to the current kernel -
      
      ======================================================
      WARNING: possible circular locking dependency detected
      4.18.0-next-20180823+ #46 Not tainted
      ------------------------------------------------------
      syz-executor4/26841 is trying to acquire lock:
      00000000dd41ef48 ((wq_completion)bond_dev->name){+.+.}, at: flush_workqueue+0x2db/0x1e10 kernel/workqueue.c:2652
      
      but task is already holding lock:
      00000000768ab431 (rtnl_mutex){+.+.}, at: rtnl_lock net/core/rtnetlink.c:77 [inline]
      00000000768ab431 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x412/0xc30 net/core/rtnetlink.c:4708
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #2 (rtnl_mutex){+.+.}:
             __mutex_lock_common kernel/locking/mutex.c:925 [inline]
             __mutex_lock+0x171/0x1700 kernel/locking/mutex.c:1073
             mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:1088
             rtnl_lock+0x17/0x20 net/core/rtnetlink.c:77
             bond_netdev_notify drivers/net/bonding/bond_main.c:1310 [inline]
             bond_netdev_notify_work+0x44/0xd0 drivers/net/bonding/bond_main.c:1320
             process_one_work+0xc73/0x1aa0 kernel/workqueue.c:2153
             worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
             kthread+0x35a/0x420 kernel/kthread.c:246
             ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
      
      -> #1 ((work_completion)(&(&nnw->work)->work)){+.+.}:
             process_one_work+0xc0b/0x1aa0 kernel/workqueue.c:2129
             worker_thread+0x189/0x13c0 kernel/workqueue.c:2296
             kthread+0x35a/0x420 kernel/kthread.c:246
             ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:415
      
      -> #0 ((wq_completion)bond_dev->name){+.+.}:
             lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
             flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
             drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
             destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
             __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
             bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
             register_netdevice+0x337/0x1100 net/core/dev.c:8410
             bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
             rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
             rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
             netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
             rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
             netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
             netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
             netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
             sock_sendmsg_nosec net/socket.c:622 [inline]
             sock_sendmsg+0xd5/0x120 net/socket.c:632
             ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
             __sys_sendmsg+0x11d/0x290 net/socket.c:2153
             __do_sys_sendmsg net/socket.c:2162 [inline]
             __se_sys_sendmsg net/socket.c:2160 [inline]
             __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
             do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
             entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      other info that might help us debug this:
      
      Chain exists of:
        (wq_completion)bond_dev->name --> (work_completion)(&(&nnw->work)->work) --> rtnl_mutex
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(rtnl_mutex);
                                     lock((work_completion)(&(&nnw->work)->work));
                                     lock(rtnl_mutex);
        lock((wq_completion)bond_dev->name);
      
       *** DEADLOCK ***
      
      1 lock held by syz-executor4/26841:
      
      stack backtrace:
      CPU: 1 PID: 26841 Comm: syz-executor4 Not tainted 4.18.0-next-20180823+ #46
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c9/0x2b4 lib/dump_stack.c:113
       print_circular_bug.isra.34.cold.55+0x1bd/0x27d kernel/locking/lockdep.c:1222
       check_prev_add kernel/locking/lockdep.c:1862 [inline]
       check_prevs_add kernel/locking/lockdep.c:1975 [inline]
       validate_chain kernel/locking/lockdep.c:2416 [inline]
       __lock_acquire+0x3449/0x5020 kernel/locking/lockdep.c:3412
       lock_acquire+0x1e4/0x4f0 kernel/locking/lockdep.c:3901
       flush_workqueue+0x30a/0x1e10 kernel/workqueue.c:2655
       drain_workqueue+0x2a9/0x640 kernel/workqueue.c:2820
       destroy_workqueue+0xc6/0x9d0 kernel/workqueue.c:4155
       __alloc_workqueue_key+0xef9/0x1190 kernel/workqueue.c:4138
       bond_init+0x269/0x940 drivers/net/bonding/bond_main.c:4734
       register_netdevice+0x337/0x1100 net/core/dev.c:8410
       bond_newlink+0x49/0xa0 drivers/net/bonding/bond_netlink.c:453
       rtnl_newlink+0xef4/0x1d50 net/core/rtnetlink.c:3099
       rtnetlink_rcv_msg+0x46e/0xc30 net/core/rtnetlink.c:4711
       netlink_rcv_skb+0x172/0x440 net/netlink/af_netlink.c:2454
       rtnetlink_rcv+0x1c/0x20 net/core/rtnetlink.c:4729
       netlink_unicast_kernel net/netlink/af_netlink.c:1317 [inline]
       netlink_unicast+0x5a0/0x760 net/netlink/af_netlink.c:1343
       netlink_sendmsg+0xa18/0xfc0 net/netlink/af_netlink.c:1908
       sock_sendmsg_nosec net/socket.c:622 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:632
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2115
       __sys_sendmsg+0x11d/0x290 net/socket.c:2153
       __do_sys_sendmsg net/socket.c:2162 [inline]
       __se_sys_sendmsg net/socket.c:2160 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2160
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457089
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f2df20a5c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f2df20a66d4 RCX: 0000000000457089
      RDX: 0000000000000000 RSI: 0000000020000180 RDI: 0000000000000003
      RBP: 0000000000930140 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d40b8 R14: 00000000004c8ad8 R15: 0000000000000001
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4859d74
  18. 26 9月, 2018 1 次提交
    • R
      cfg80211: fix reg_query_regdb_wmm kernel-doc · 0bcbf651
      Randy Dunlap 提交于
      Drop @ptr from kernel-doc for function reg_query_regdb_wmm().
      This function parameter was recently removed so update the
      kernel-doc to match that and remove the kernel-doc warnings.
      
      Removes 109 occurrences of this warning message:
      ../include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
      
      Fixes: 38cb87ee ("cfg80211: make wmm_rule part of the reg_rule structure")
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: Kalle Valo <kvalo@codeaurora.org>
      Cc: linux-wireless@vger.kernel.org
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      0bcbf651
  19. 19 9月, 2018 1 次提交
  20. 14 9月, 2018 1 次提交
  21. 29 8月, 2018 1 次提交
    • F
      netfilter: nf_tables: rework ct timeout set support · 0434ccdc
      Florian Westphal 提交于
      Using a private template is problematic:
      
      1. We can't assign both a zone and a timeout policy
         (zone assigns a conntrack template, so we hit problem 1)
      2. Using a template needs to take care of ct refcount, else we'll
         eventually free the private template due to ->use underflow.
      
      This patch reworks template policy to instead work with existing conntrack.
      
      As long as such conntrack has not yet been placed into the hash table
      (unconfirmed) we can still add the timeout extension.
      
      The only caveat is that we now need to update/correct ct->timeout to
      reflect the initial/new state, otherwise the conntrack entry retains the
      default 'new' timeout.
      
      Side effect of this change is that setting the policy must
      now occur from chains that are evaluated *after* the conntrack lookup
      has taken place.
      
      No released kernel contains the timeout policy feature yet, so this change
      should be ok.
      
      Changes since v2:
       - don't handle 'ct is confirmed case'
       - after previous patch, no need to special-case tcp/dccp/sctp timeout
         anymore
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      0434ccdc
  22. 28 8月, 2018 1 次提交
    • S
      cfg80211: make wmm_rule part of the reg_rule structure · 38cb87ee
      Stanislaw Gruszka 提交于
      Make wmm_rule be part of the reg_rule structure. This simplifies the
      code a lot at the cost of having bigger memory usage. However in most
      cases we have only few reg_rule's and when we do have many like in
      iwlwifi we do not save memory as it allocates a separate wmm_rule for
      each channel anyway.
      
      This also fixes a bug reported in various places where somewhere the
      pointers were corrupted and we ended up doing a null-dereference.
      
      Fixes: 230ebaa1 ("cfg80211: read wmm rules from regulatory database")
      Signed-off-by: NStanislaw Gruszka <sgruszka@redhat.com>
      [rephrase commit message slightly]
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      38cb87ee
  23. 23 8月, 2018 1 次提交
    • A
      net_sched: fix unused variable warning in stmmac · 191672ca
      Arnd Bergmann 提交于
      The new tcf_exts_for_each_action() macro doesn't reference its
      arguments when CONFIG_NET_CLS_ACT is disabled, which leads to
      a harmless warning in at least one driver:
      
      drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c: In function 'tc_fill_actions':
      drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c:64:6: error: unused variable 'i' [-Werror=unused-variable]
      
      Adding a cast to void lets us avoid this kind of warning.
      To be on the safe side, do it for all three arguments, not
      just the one that caused the warning.
      
      Fixes: 244cd96a ("net_sched: remove list_head from tc_action")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      191672ca
  24. 22 8月, 2018 4 次提交
  25. 17 8月, 2018 3 次提交
    • D
      tcp, ulp: add alias for all ulp modules · 037b0b86
      Daniel Borkmann 提交于
      Lets not turn the TCP ULP lookup into an arbitrary module loader as
      we only intend to load ULP modules through this mechanism, not other
      unrelated kernel modules:
      
        [root@bar]# cat foo.c
        #include <sys/types.h>
        #include <sys/socket.h>
        #include <linux/tcp.h>
        #include <linux/in.h>
      
        int main(void)
        {
            int sock = socket(PF_INET, SOCK_STREAM, 0);
            setsockopt(sock, IPPROTO_TCP, TCP_ULP, "sctp", sizeof("sctp"));
            return 0;
        }
      
        [root@bar]# gcc foo.c -O2 -Wall
        [root@bar]# lsmod | grep sctp
        [root@bar]# ./a.out
        [root@bar]# lsmod | grep sctp
        sctp                 1077248  4
        libcrc32c              16384  3 nf_conntrack,nf_nat,sctp
        [root@bar]#
      
      Fix it by adding module alias to TCP ULP modules, so probing module
      via request_module() will be limited to tcp-ulp-[name]. The existing
      modules like kTLS will load fine given tcp-ulp-tls alias, but others
      will fail to load:
      
        [root@bar]# lsmod | grep sctp
        [root@bar]# ./a.out
        [root@bar]# lsmod | grep sctp
        [root@bar]#
      
      Sockmap is not affected from this since it's either built-in or not.
      
      Fixes: 734942cc ("tcp: ULP infrastructure")
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NSong Liu <songliubraving@fb.com>
      Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
      037b0b86
    • F
      netfilter: nf_tables: fix register ordering · d209df3e
      Florian Westphal 提交于
      We must register nfnetlink ops last, as that exposes nf_tables to
      userspace.  Without this, we could theoretically get nfnetlink request
      before net->nft state has been initialized.
      
      Fixes: 99633ab2 ("netfilter: nf_tables: complete net namespace support")
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      d209df3e
    • T
      netfilter: nft_set: fix allocation size overflow in privsize callback. · 4ef360dd
      Taehee Yoo 提交于
      In order to determine allocation size of set, ->privsize is invoked.
      At this point, both desc->size and size of each data structure of set
      are used. desc->size means number of element that is given by user.
      desc->size is u32 type. so that upperlimit of set element is 4294967295.
      but return type of ->privsize is also u32. hence overflow can occurred.
      
      test commands:
         %nft add table ip filter
         %nft add set ip filter hash1 { type ipv4_addr \; size 4294967295 \; }
         %nft list ruleset
      
      splat looks like:
      [ 1239.202910] kasan: CONFIG_KASAN_INLINE enabled
      [ 1239.208788] kasan: GPF could be caused by NULL-ptr deref or user memory access
      [ 1239.217625] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [ 1239.219329] CPU: 0 PID: 1603 Comm: nft Not tainted 4.18.0-rc5+ #7
      [ 1239.229091] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
      [ 1239.229091] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
      [ 1239.229091] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246
      [ 1239.229091] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001
      [ 1239.229091] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410
      [ 1239.229091] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030
      [ 1239.229091] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0
      [ 1239.229091] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000
      [ 1239.229091] FS:  00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
      [ 1239.229091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1239.229091] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0
      [ 1239.229091] Call Trace:
      [ 1239.229091]  ? nft_hash_remove+0xf0/0xf0 [nf_tables_set]
      [ 1239.229091]  ? memset+0x1f/0x40
      [ 1239.229091]  ? __nla_reserve+0x9f/0xb0
      [ 1239.229091]  ? memcpy+0x34/0x50
      [ 1239.229091]  nf_tables_dump_set+0x9a1/0xda0 [nf_tables]
      [ 1239.229091]  ? __kmalloc_reserve.isra.29+0x2e/0xa0
      [ 1239.229091]  ? nft_chain_hash_obj+0x630/0x630 [nf_tables]
      [ 1239.229091]  ? nf_tables_commit+0x2c60/0x2c60 [nf_tables]
      [ 1239.229091]  netlink_dump+0x470/0xa20
      [ 1239.229091]  __netlink_dump_start+0x5ae/0x690
      [ 1239.229091]  nft_netlink_dump_start_rcu+0xd1/0x160 [nf_tables]
      [ 1239.229091]  nf_tables_getsetelem+0x2e5/0x4b0 [nf_tables]
      [ 1239.229091]  ? nft_get_set_elem+0x440/0x440 [nf_tables]
      [ 1239.229091]  ? nft_chain_hash_obj+0x630/0x630 [nf_tables]
      [ 1239.229091]  ? nf_tables_dump_obj_done+0x70/0x70 [nf_tables]
      [ 1239.229091]  ? nla_parse+0xab/0x230
      [ 1239.229091]  ? nft_get_set_elem+0x440/0x440 [nf_tables]
      [ 1239.229091]  nfnetlink_rcv_msg+0x7f0/0xab0 [nfnetlink]
      [ 1239.229091]  ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
      [ 1239.229091]  ? debug_show_all_locks+0x290/0x290
      [ 1239.229091]  ? sched_clock_cpu+0x132/0x170
      [ 1239.229091]  ? find_held_lock+0x39/0x1b0
      [ 1239.229091]  ? sched_clock_local+0x10d/0x130
      [ 1239.229091]  netlink_rcv_skb+0x211/0x320
      [ 1239.229091]  ? nfnetlink_bind+0x1d0/0x1d0 [nfnetlink]
      [ 1239.229091]  ? netlink_ack+0x7b0/0x7b0
      [ 1239.229091]  ? ns_capable_common+0x6e/0x110
      [ 1239.229091]  nfnetlink_rcv+0x2d1/0x310 [nfnetlink]
      [ 1239.229091]  ? nfnetlink_rcv_batch+0x10f0/0x10f0 [nfnetlink]
      [ 1239.229091]  ? netlink_deliver_tap+0x829/0x930
      [ 1239.229091]  ? lock_acquire+0x265/0x2e0
      [ 1239.229091]  netlink_unicast+0x406/0x520
      [ 1239.509725]  ? netlink_attachskb+0x5b0/0x5b0
      [ 1239.509725]  ? find_held_lock+0x39/0x1b0
      [ 1239.509725]  netlink_sendmsg+0x987/0xa20
      [ 1239.509725]  ? netlink_unicast+0x520/0x520
      [ 1239.509725]  ? _copy_from_user+0xa9/0xc0
      [ 1239.509725]  __sys_sendto+0x21a/0x2c0
      [ 1239.509725]  ? __ia32_sys_getpeername+0xa0/0xa0
      [ 1239.509725]  ? retint_kernel+0x10/0x10
      [ 1239.509725]  ? sched_clock_cpu+0x132/0x170
      [ 1239.509725]  ? find_held_lock+0x39/0x1b0
      [ 1239.509725]  ? lock_downgrade+0x540/0x540
      [ 1239.509725]  ? up_read+0x1c/0x100
      [ 1239.509725]  ? __do_page_fault+0x763/0x970
      [ 1239.509725]  ? retint_user+0x18/0x18
      [ 1239.509725]  __x64_sys_sendto+0x177/0x180
      [ 1239.509725]  do_syscall_64+0xaa/0x360
      [ 1239.509725]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 1239.509725] RIP: 0033:0x7f5a8f468e03
      [ 1239.509725] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb d0 0f 1f 84 00 00 00 00 00 83 3d 49 c9 2b 00 00 75 13 49 89 ca b8 2c 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 34 c3 48 83 ec 08 e8
      [ 1239.509725] RSP: 002b:00007ffd78d0b778 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
      [ 1239.509725] RAX: ffffffffffffffda RBX: 00007ffd78d0c890 RCX: 00007f5a8f468e03
      [ 1239.509725] RDX: 0000000000000034 RSI: 00007ffd78d0b7e0 RDI: 0000000000000003
      [ 1239.509725] RBP: 00007ffd78d0b7d0 R08: 00007f5a8f15c160 R09: 000000000000000c
      [ 1239.509725] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd78d0b7e0
      [ 1239.509725] R13: 0000000000000034 R14: 00007f5a8f9aff60 R15: 00005648040094b0
      [ 1239.509725] Modules linked in: nf_tables_set nf_tables nfnetlink ip_tables x_tables
      [ 1239.670713] ---[ end trace 39375adcda140f11 ]---
      [ 1239.676016] RIP: 0010:nft_hash_walk+0x1d2/0x310 [nf_tables_set]
      [ 1239.682834] Code: 84 d2 7f 10 4c 89 e7 89 44 24 38 e8 d8 5a 17 e0 8b 44 24 38 48 8d 7b 10 41 0f b6 0c 24 48 89 fa 48 89 fe 48 c1 ea 03 83 e6 07 <42> 0f b6 14 3a 40 38 f2 7f 1a 84 d2 74 16
      [ 1239.705108] RSP: 0018:ffff8801118cf358 EFLAGS: 00010246
      [ 1239.711115] RAX: 0000000000000000 RBX: 0000000000020400 RCX: 0000000000000001
      [ 1239.719269] RDX: 0000000000004082 RSI: 0000000000000000 RDI: 0000000000020410
      [ 1239.727401] RBP: ffff880114d5a988 R08: 0000000000007e94 R09: ffff880114dd8030
      [ 1239.735530] R10: ffff880114d5a988 R11: ffffed00229bb006 R12: ffff8801118cf4d0
      [ 1239.743658] R13: ffff8801118cf4d8 R14: 0000000000000000 R15: dffffc0000000000
      [ 1239.751785] FS:  00007f5a8fe0b700(0000) GS:ffff88011b600000(0000) knlGS:0000000000000000
      [ 1239.760993] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1239.767560] CR2: 00007f5a8ecc27b0 CR3: 000000010608e000 CR4: 00000000001006f0
      [ 1239.775679] Kernel panic - not syncing: Fatal exception
      [ 1239.776630] Kernel Offset: 0x1f000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      [ 1239.776630] Rebooting in 5 seconds..
      
      Fixes: 20a69341 ("netfilter: nf_tables: add netlink set API")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      4ef360dd
  26. 15 8月, 2018 1 次提交
  27. 13 8月, 2018 1 次提交
    • V
      ipv6: Add icmp_echo_ignore_all support for ICMPv6 · e6f86b0f
      Virgile Jarry 提交于
      Preventing the kernel from responding to ICMP Echo Requests messages
      can be useful in several ways. The sysctl parameter
      'icmp_echo_ignore_all' can be used to prevent the kernel from
      responding to IPv4 ICMP echo requests. For IPv6 pings, such
      a sysctl kernel parameter did not exist.
      
      Add the ability to prevent the kernel from responding to IPv6
      ICMP echo requests through the use of the following sysctl
      parameter : /proc/sys/net/ipv6/icmp/echo_ignore_all.
      Update the documentation to reflect this change.
      Signed-off-by: NVirgile Jarry <virgile@acceis.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6f86b0f