1. 20 8月, 2020 1 次提交
  2. 12 8月, 2020 1 次提交
  3. 08 8月, 2020 1 次提交
    • W
      mm, treewide: rename kzfree() to kfree_sensitive() · 453431a5
      Waiman Long 提交于
      As said by Linus:
      
        A symmetric naming is only helpful if it implies symmetries in use.
        Otherwise it's actively misleading.
      
        In "kzalloc()", the z is meaningful and an important part of what the
        caller wants.
      
        In "kzfree()", the z is actively detrimental, because maybe in the
        future we really _might_ want to use that "memfill(0xdeadbeef)" or
        something. The "zero" part of the interface isn't even _relevant_.
      
      The main reason that kzfree() exists is to clear sensitive information
      that should not be leaked to other future users of the same memory
      objects.
      
      Rename kzfree() to kfree_sensitive() to follow the example of the recently
      added kvfree_sensitive() and make the intention of the API more explicit.
      In addition, memzero_explicit() is used to clear the memory to make sure
      that it won't get optimized away by the compiler.
      
      The renaming is done by using the command sequence:
      
        git grep -w --name-only kzfree |\
        xargs sed -i 's/kzfree/kfree_sensitive/'
      
      followed by some editing of the kfree_sensitive() kerneldoc and adding
      a kzfree backward compatibility macro in slab.h.
      
      [akpm@linux-foundation.org: fs/crypto/inline_crypt.c needs linux/slab.h]
      [akpm@linux-foundation.org: fix fs/crypto/inline_crypt.c some more]
      Suggested-by: NJoe Perches <joe@perches.com>
      Signed-off-by: NWaiman Long <longman@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Serge E. Hallyn" <serge@hallyn.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
      Link: http://lkml.kernel.org/r/20200616154311.12314-3-longman@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      453431a5
  4. 06 8月, 2020 1 次提交
  5. 25 7月, 2020 5 次提交
  6. 23 7月, 2020 1 次提交
  7. 20 7月, 2020 4 次提交
  8. 14 7月, 2020 1 次提交
  9. 10 7月, 2020 1 次提交
  10. 08 7月, 2020 1 次提交
  11. 30 6月, 2020 1 次提交
  12. 25 6月, 2020 1 次提交
  13. 24 6月, 2020 1 次提交
  14. 19 6月, 2020 1 次提交
    • E
      net: increment xmit_recursion level in dev_direct_xmit() · 0ad6f6e7
      Eric Dumazet 提交于
      Back in commit f60e5990 ("ipv6: protect skb->sk accesses
      from recursive dereference inside the stack") Hannes added code
      so that IPv6 stack would not trust skb->sk for typical cases
      where packet goes through 'standard' xmit path (__dev_queue_xmit())
      
      Alas af_packet had a dev_direct_xmit() path that was not
      dealing yet with xmit_recursion level.
      
      Also change sk_mc_loop() to dump a stack once only.
      
      Without this patch, syzbot was able to trigger :
      
      [1]
      [  153.567378] WARNING: CPU: 7 PID: 11273 at net/core/sock.c:721 sk_mc_loop+0x51/0x70
      [  153.567378] Modules linked in: nfnetlink ip6table_raw ip6table_filter iptable_raw iptable_nat nf_nat nf_conntrack nf_defrag_ipv4 nf_defrag_ipv6 iptable_filter macsec macvtap tap macvlan 8021q hsr wireguard libblake2s blake2s_x86_64 libblake2s_generic udp_tunnel ip6_udp_tunnel libchacha20poly1305 poly1305_x86_64 chacha_x86_64 libchacha curve25519_x86_64 libcurve25519_generic netdevsim batman_adv dummy team bridge stp llc w1_therm wire i2c_mux_pca954x i2c_mux cdc_acm ehci_pci ehci_hcd mlx4_en mlx4_ib ib_uverbs ib_core mlx4_core
      [  153.567386] CPU: 7 PID: 11273 Comm: b159172088 Not tainted 5.8.0-smp-DEV #273
      [  153.567387] RIP: 0010:sk_mc_loop+0x51/0x70
      [  153.567388] Code: 66 83 f8 0a 75 24 0f b6 4f 12 b8 01 00 00 00 31 d2 d3 e0 a9 bf ef ff ff 74 07 48 8b 97 f0 02 00 00 0f b6 42 3a 83 e0 01 5d c3 <0f> 0b b8 01 00 00 00 5d c3 0f b6 87 18 03 00 00 5d c0 e8 04 83 e0
      [  153.567388] RSP: 0018:ffff95c69bb93990 EFLAGS: 00010212
      [  153.567388] RAX: 0000000000000011 RBX: ffff95c6e0ee3e00 RCX: 0000000000000007
      [  153.567389] RDX: ffff95c69ae50000 RSI: ffff95c6c30c3000 RDI: ffff95c6c30c3000
      [  153.567389] RBP: ffff95c69bb93990 R08: ffff95c69a77f000 R09: 0000000000000008
      [  153.567389] R10: 0000000000000040 R11: 00003e0e00026128 R12: ffff95c6c30c3000
      [  153.567390] R13: ffff95c6cc4fd500 R14: ffff95c6f84500c0 R15: ffff95c69aa13c00
      [  153.567390] FS:  00007fdc3a283700(0000) GS:ffff95c6ff9c0000(0000) knlGS:0000000000000000
      [  153.567390] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  153.567391] CR2: 00007ffee758e890 CR3: 0000001f9ba20003 CR4: 00000000001606e0
      [  153.567391] Call Trace:
      [  153.567391]  ip6_finish_output2+0x34e/0x550
      [  153.567391]  __ip6_finish_output+0xe7/0x110
      [  153.567391]  ip6_finish_output+0x2d/0xb0
      [  153.567392]  ip6_output+0x77/0x120
      [  153.567392]  ? __ip6_finish_output+0x110/0x110
      [  153.567392]  ip6_local_out+0x3d/0x50
      [  153.567392]  ipvlan_queue_xmit+0x56c/0x5e0
      [  153.567393]  ? ksize+0x19/0x30
      [  153.567393]  ipvlan_start_xmit+0x18/0x50
      [  153.567393]  dev_direct_xmit+0xf3/0x1c0
      [  153.567393]  packet_direct_xmit+0x69/0xa0
      [  153.567394]  packet_sendmsg+0xbf0/0x19b0
      [  153.567394]  ? plist_del+0x62/0xb0
      [  153.567394]  sock_sendmsg+0x65/0x70
      [  153.567394]  sock_write_iter+0x93/0xf0
      [  153.567394]  new_sync_write+0x18e/0x1a0
      [  153.567395]  __vfs_write+0x29/0x40
      [  153.567395]  vfs_write+0xb9/0x1b0
      [  153.567395]  ksys_write+0xb1/0xe0
      [  153.567395]  __x64_sys_write+0x1a/0x20
      [  153.567395]  do_syscall_64+0x43/0x70
      [  153.567396]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  153.567396] RIP: 0033:0x453549
      [  153.567396] Code: Bad RIP value.
      [  153.567396] RSP: 002b:00007fdc3a282cc8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [  153.567397] RAX: ffffffffffffffda RBX: 00000000004d32d0 RCX: 0000000000453549
      [  153.567397] RDX: 0000000000000020 RSI: 0000000020000300 RDI: 0000000000000003
      [  153.567398] RBP: 00000000004d32d8 R08: 0000000000000000 R09: 0000000000000000
      [  153.567398] R10: 0000000000000000 R11: 0000000000000246 R12: 00000000004d32dc
      [  153.567398] R13: 00007ffee742260f R14: 00007fdc3a282dc0 R15: 00007fdc3a283700
      [  153.567399] ---[ end trace c1d5ae2b1059ec62 ]---
      
      f60e5990 ("ipv6: protect skb->sk accesses from recursive dereference inside the stack")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0ad6f6e7
  15. 02 6月, 2020 1 次提交
  16. 30 5月, 2020 1 次提交
  17. 29 5月, 2020 9 次提交
  18. 08 5月, 2020 1 次提交
  19. 26 4月, 2020 1 次提交
  20. 03 4月, 2020 2 次提交
    • V
      net: core: enable SO_BINDTODEVICE for non-root users · c427bfec
      Vincent Bernat 提交于
      Currently, SO_BINDTODEVICE requires CAP_NET_RAW. This change allows a
      non-root user to bind a socket to an interface if it is not already
      bound. This is useful to allow an application to bind itself to a
      specific VRF for outgoing or incoming connections. Currently, an
      application wanting to manage connections through several VRF need to
      be privileged.
      
      Previously, IP_UNICAST_IF and IPV6_UNICAST_IF were added for
      Wine (76e21053 and c4062dfc) specifically for use by
      non-root processes. However, they are restricted to sendmsg() and not
      usable with TCP. Allowing SO_BINDTODEVICE would allow TCP clients to
      get the same privilege. As for TCP servers, outside the VRF use case,
      SO_BINDTODEVICE would only further restrict connections a server could
      accept.
      
      When an application is restricted to a VRF (with `ip vrf exec`), the
      socket is bound to an interface at creation and therefore, a
      non-privileged call to SO_BINDTODEVICE to escape the VRF fails.
      
      When an application bound a socket to SO_BINDTODEVICE and transmit it
      to a non-privileged process through a Unix socket, a tentative to
      change the bound device also fails.
      
      Before:
      
          >>> import socket
          >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
          PermissionError: [Errno 1] Operation not permitted
      
      After:
      
          >>> import socket
          >>> s=socket.socket(socket.AF_INET, socket.SOCK_STREAM)
          >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
          >>> s.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, b"dummy0")
          Traceback (most recent call last):
            File "<stdin>", line 1, in <module>
          PermissionError: [Errno 1] Operation not permitted
      Signed-off-by: NVincent Bernat <vincent@bernat.ch>
      Reviewed-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c427bfec
    • J
      net, sk_msg: Don't use RCU_INIT_POINTER on sk_user_data · 7a1ca972
      Jakub Sitnicki 提交于
      sparse reports an error due to use of RCU_INIT_POINTER helper to assign to
      sk_user_data pointer, which is not tagged with __rcu:
      
      net/core/sock.c:1875:25: error: incompatible types in comparison expression (different address spaces):
      net/core/sock.c:1875:25:    void [noderef] <asn:4> *
      net/core/sock.c:1875:25:    void *
      
      ... and rightfully so. sk_user_data is not always treated as a pointer to
      an RCU-protected data. When it is used to point at an RCU-protected object,
      we access it with __sk_user_data to inform sparse about it.
      
      In this case, when the child socket does not inherit sk_user_data from the
      parent, there is no reason to treat it as an RCU-protected pointer.
      
      Use a regular assignment to clear the pointer value.
      
      Fixes: f1ff5ce2 ("net, sk_msg: Clear sk_user_data pointer on clone if tagged")
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Link: https://lore.kernel.org/bpf/20200402125524.851439-1-jakub@cloudflare.com
      7a1ca972
  21. 31 3月, 2020 2 次提交
  22. 11 3月, 2020 1 次提交
    • S
      net: memcg: late association of sock to memcg · d752a498
      Shakeel Butt 提交于
      If a TCP socket is allocated in IRQ context or cloned from unassociated
      (i.e. not associated to a memcg) in IRQ context then it will remain
      unassociated for its whole life. Almost half of the TCPs created on the
      system are created in IRQ context, so, memory used by such sockets will
      not be accounted by the memcg.
      
      This issue is more widespread in cgroup v1 where network memory
      accounting is opt-in but it can happen in cgroup v2 if the source socket
      for the cloning was created in root memcg.
      
      To fix the issue, just do the association of the sockets at the accept()
      time in the process context and then force charge the memory buffer
      already used and reserved by the socket.
      Signed-off-by: NShakeel Butt <shakeelb@google.com>
      Reviewed-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d752a498
  23. 22 2月, 2020 1 次提交
    • J
      net, sk_msg: Clear sk_user_data pointer on clone if tagged · f1ff5ce2
      Jakub Sitnicki 提交于
      sk_user_data can hold a pointer to an object that is not intended to be
      shared between the parent socket and the child that gets a pointer copy on
      clone. This is the case when sk_user_data points at reference-counted
      object, like struct sk_psock.
      
      One way to resolve it is to tag the pointer with a no-copy flag by
      repurposing its lowest bit. Based on the bit-flag value we clear the child
      sk_user_data pointer after cloning the parent socket.
      
      The no-copy flag is stored in the pointer itself as opposed to externally,
      say in socket flags, to guarantee that the pointer and the flag are copied
      from parent to child socket in an atomic fashion. Parent socket state is
      subject to change while copying, we don't hold any locks at that time.
      
      This approach relies on an assumption that sk_user_data holds a pointer to
      an object aligned at least 2 bytes. A manual audit of existing users of
      rcu_dereference_sk_user_data helper confirms our assumption.
      
      Also, an RCU-protected sk_user_data is not likely to hold a pointer to a
      char value or a pathological case of "struct { char c; }". To be safe, warn
      when the flag-bit is set when setting sk_user_data to catch any future
      misuses.
      
      It is worth considering why clearing sk_user_data unconditionally is not an
      option. There exist users, DRBD, NVMe, and Xen drivers being among them,
      that rely on the pointer being copied when cloning the listening socket.
      
      Potentially we could distinguish these users by checking if the listening
      socket has been created in kernel-space via sock_create_kern, and hence has
      sk_kern_sock flag set. However, this is not the case for NVMe and Xen
      drivers, which create sockets without marking them as belonging to the
      kernel.
      Signed-off-by: NJakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NJohn Fastabend <john.fastabend@gmail.com>
      Acked-by: NMartin KaFai Lau <kafai@fb.com>
      Link: https://lore.kernel.org/bpf/20200218171023.844439-3-jakub@cloudflare.com
      f1ff5ce2