1. 27 Dec 2019, 4 commits
    • af_key: fix leaks in key_pol_get_resp and dump_sp. · 89e5d0f5
      Committed by Jeremy Sowden
      [ Upstream commit 7c80eb1c7e2b8420477fbc998971d62a648035d9 ]
      
      In both functions, if pfkey_xfrm_policy2msg failed we leaked the newly
      allocated sk_buff.  Free it on error.
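      The fix applies the usual rule that a function which allocates a buffer must release it on every error path before returning. Below is a minimal userspace sketch of that pattern. It assumes two hypothetical helpers, fill_msg() and get_resp(), which stand in for pfkey_xfrm_policy2msg() and key_pol_get_resp(); they are not the kernel functions.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct buf {
	size_t len;
	unsigned char data[64];
};

/* Hypothetical stand-in for pfkey_xfrm_policy2msg(): may fail. */
static int fill_msg(struct buf *b, int policy_ok)
{
	if (!policy_ok)
		return -EINVAL;
	memset(b->data, 0xab, sizeof(b->data));
	b->len = sizeof(b->data);
	return 0;
}

/* Hypothetical stand-in for key_pol_get_resp(): on success the caller
 * owns *out; on failure nothing is leaked and *out is untouched. */
static int get_resp(int policy_ok, struct buf **out)
{
	struct buf *b = malloc(sizeof(*b));
	int err;

	if (!b)
		return -ENOBUFS;

	err = fill_msg(b, policy_ok);
	if (err < 0) {
		free(b); /* the step the original code was missing */
		return err;
	}

	*out = b;
	return 0;
}
```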
      
      Fixes: 55569ce2 ("Fix conversion between IPSEC_MODE_xxx and XFRM_MODE_xxx.")
      Reported-by: syzbot+4f0529365f7f2208d9f0@syzkaller.appspotmail.com
      Signed-off-by: Jeremy Sowden <jeremy@azazel.net>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • xfrm: clean up xfrm protocol checks · 63277674
      Committed by Cong Wang
      mainline inclusion
      from mainline-5.1
      commit dbb2483b
      category: bugfix
      bugzilla: 14337
      CVE: NA
      
      -------------------------------------------------
      
      In commit 6a53b759 ("xfrm: check id proto in validate_tmpl()")
      I introduced a check for xfrm protocol, but according to Herbert
      IPSEC_PROTO_ANY should only be used as a wildcard for lookup, so
      it should be removed from validate_tmpl().
      
      And IPSEC_PROTO_ANY is expected to match only the 3 IPsec-specific
      protocols, which is why xfrm_state_flush() could still miss
      IPPROTO_ROUTING, leaving those entries in net->xfrm.state_all
      when the net namespace exits. Fix this by replacing
      IPSEC_PROTO_ANY with zero.
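      The difference between the two wildcards can be shown with a small model of the protocol-matching rule. The sketch below is based on my understanding of the kernel's xfrm_id_proto_match(): a user protocol of 0 matches everything, while IPSEC_PROTO_ANY (255) matches only AH, ESP and IPComp. The local enum names are illustrative, and the numeric values are the IANA protocol numbers.

```c
#include <stdbool.h>
#include <stdint.h>

/* IANA protocol numbers, defined locally to keep the sketch self-contained. */
enum {
	PROTO_ROUTING = 43,  /* IPPROTO_ROUTING */
	PROTO_ESP     = 50,  /* IPPROTO_ESP */
	PROTO_AH      = 51,  /* IPPROTO_AH */
	PROTO_COMP    = 108, /* IPPROTO_COMP */
	PROTO_ANY     = 255, /* IPSEC_PROTO_ANY */
};

/* 0 is a true wildcard; IPSEC_PROTO_ANY only covers the three IPsec
 * protocols, so a flush keyed on it skips IPPROTO_ROUTING states. */
static bool proto_match(uint8_t proto, uint8_t userproto)
{
	return userproto == 0 || proto == userproto ||
	       (userproto == PROTO_ANY &&
		(proto == PROTO_AH || proto == PROTO_ESP || proto == PROTO_COMP));
}
```

      Under this model, a flush that passes PROTO_ANY leaves routing-header states behind, and a flush that passes 0 removes them.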
      
      This patch also extracts the check from validate_tmpl() to
      xfrm_id_proto_valid() and uses it in parse_ipsecrequest().
      With this, no other protocols should be added into xfrm.
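      As I understand it, the extracted xfrm_id_proto_valid() is a whitelist switch. It accepts AH, ESP and IPComp, plus IPPROTO_ROUTING and IPPROTO_DSTOPTS when IPv6 is enabled. The sketch below models that rule in userspace and replaces the CONFIG_IPV6 build option with a hypothetical runtime flag, have_ipv6.

```c
#include <stdbool.h>
#include <stdint.h>

/* IANA protocol numbers, defined locally for the sketch. */
enum {
	PROTO_ROUTING = 43,
	PROTO_ESP     = 50,
	PROTO_AH      = 51,
	PROTO_DSTOPTS = 60,
	PROTO_COMP    = 108,
};

/* Accept only the protocols xfrm knows how to handle; everything else,
 * including the IPSEC_PROTO_ANY lookup wildcard, is rejected. */
static bool id_proto_valid(uint8_t proto, bool have_ipv6)
{
	switch (proto) {
	case PROTO_AH:
	case PROTO_ESP:
	case PROTO_COMP:
		return true;
	case PROTO_ROUTING:
	case PROTO_DSTOPTS:
		return have_ipv6; /* IPv6 extension headers */
	default:
		return false;
	}
}
```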
      
      Fixes: 6a53b759 ("xfrm: check id proto in validate_tmpl()")
      Reported-by: syzbot+0bf0519d6e0de15914fe@syzkaller.appspotmail.com
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Shijie Luo <luoshijie1@huawei.com>
      Reviewed-by: Yue Haibing <yuehaibing@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • xfrm: destroy xfrm_state synchronously on net exit path · d985029e
      Committed by Cong Wang
      mainline inclusion
      from mainline-5.0
      commit f75a2804
      category: bugfix
      bugzilla: 10919
      CVE: NA
      
      -------------------------------------------------
      
      xfrm_state_put() moves struct xfrm_state to the GC list
      and schedules the GC work to clean it up. On net exit call
      path, xfrm_state_flush() is called to clean up and
      xfrm_flush_gc() is called to wait for the GC work to complete
      before exit.
      
      However, this doesn't work because one of the ->destructor(),
      ipcomp_destroy(), schedules the same GC work again inside
      the GC work. It is hard to wait for such a nested async
      callback. This is also why syzbot still reports the following
      warning:
      
       WARNING: CPU: 1 PID: 33 at net/ipv6/xfrm6_tunnel.c:351 xfrm6_tunnel_net_exit+0x2cb/0x500 net/ipv6/xfrm6_tunnel.c:351
       ...
        ops_exit_list.isra.0+0xb0/0x160 net/core/net_namespace.c:153
        cleanup_net+0x51d/0xb10 net/core/net_namespace.c:551
        process_one_work+0xd0c/0x1ce0 kernel/workqueue.c:2153
        worker_thread+0x143/0x14a0 kernel/workqueue.c:2296
        kthread+0x357/0x430 kernel/kthread.c:246
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:352
      
      In fact, it is perfectly fine to bypass GC and destroy xfrm_state
      synchronously on net exit call path, because it is in process context
      and doesn't need a work struct to do any blocking work.
      
      This patch introduces xfrm_state_put_sync(), which simply bypasses
      GC and lets its callers decide whether to use this synchronous
      version. On net exit path, xfrm_state_fini() and
      xfrm6_tunnel_net_exit() use it. And, as ipcomp_destroy() itself is
      blocking, it can use xfrm_state_put_sync() directly too.
      
      Also rename xfrm_state_gc_destroy() to ___xfrm_state_destroy() to
      reflect this change.
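      The two release paths can be contrasted with a simplified reference-counting model. All names below are hypothetical. state_put() mirrors the deferred xfrm_state_put() by parking the object on a GC list. state_put_sync() mirrors xfrm_state_put_sync() by destroying the object at once, which is only appropriate where blocking is allowed. gc_work() stands in for the GC worker.

```c
#include <stdbool.h>
#include <stddef.h>

struct state {
	int refcnt;
	bool destroyed;
	struct state *gc_next;
};

static struct state *gc_list; /* stands in for the GC list */

static void destroy(struct state *x)
{
	x->destroyed = true; /* real code would release resources here */
}

/* Deferred release: the last put only queues the object. */
static void state_put(struct state *x)
{
	if (--x->refcnt == 0) {
		x->gc_next = gc_list;
		gc_list = x;
	}
}

/* Synchronous release: bypass GC and destroy immediately. */
static void state_put_sync(struct state *x)
{
	if (--x->refcnt == 0)
		destroy(x);
}

/* GC worker: drains the list, including entries queued while draining. */
static void gc_work(void)
{
	while (gc_list) {
		struct state *x = gc_list;

		gc_list = x->gc_next;
		destroy(x);
	}
}
```

      With the synchronous path, the exit code never has to wait for GC work that might reschedule itself.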
      
      Fixes: b48c05ab ("xfrm: Fix warning in xfrm6_tunnel_net_exit.")
      Reported-and-tested-by: syzbot+e9aebef558e3ed673934@syzkaller.appspotmail.com
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
    • af_key: unconditionally clone on broadcast · 7daeb62c
      Sean Tranchetti committed
      mainline inclusion
      from mainline-5.0
      commit fc2d5cfd
      category: bugfix
      bugzilla: 10929
      CVE: NA
      
      -------------------------------------------------
      
      Attempting to avoid cloning the skb when broadcasting by inflating
      the refcount with sock_hold/sock_put while under RCU lock is dangerous
      and violates RCU principles. It leads to subtle race conditions when
      attempting to free the SKB, as we may reference sockets that have
      already been freed by the stack.
      
      Unable to handle kernel paging request at virtual address 6b6b6b6b6b6c4b
      [006b6b6b6b6b6c4b] address between user and kernel address ranges
      Internal error: Oops: 96000004 [#1] PREEMPT SMP
      task: fffffff78f65b380 task.stack: ffffff8049a88000
      pc : sock_rfree+0x38/0x6c
      lr : skb_release_head_state+0x6c/0xcc
      Process repro (pid: 7117, stack limit = 0xffffff8049a88000)
      Call trace:
      	sock_rfree+0x38/0x6c
      	skb_release_head_state+0x6c/0xcc
      	skb_release_all+0x1c/0x38
      	__kfree_skb+0x1c/0x30
      	kfree_skb+0xd0/0xf4
      	pfkey_broadcast+0x14c/0x18c
      	pfkey_sendmsg+0x1d8/0x408
      	sock_sendmsg+0x44/0x60
      	___sys_sendmsg+0x1d0/0x2a8
      	__sys_sendmsg+0x64/0xb4
      	SyS_sendmsg+0x34/0x4c
      	el0_svc_naked+0x34/0x38
      Kernel panic - not syncing: Fatal exception
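      The fix's title describes giving every recipient its own clone of the message. The socket-refcount trick it replaces tied one buffer's lifetime to sockets that might already be gone. Below is a loose userspace analogy of cloning per recipient, using plain heap buffers instead of sk_buffs. struct msg, struct receiver and broadcast() are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct msg {
	size_t len;
	unsigned char *data;
};

/* Hypothetical receiver: owns every message it is handed. */
struct receiver {
	struct msg *inbox;
};

static struct msg *msg_clone(const struct msg *m)
{
	struct msg *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	c->data = malloc(m->len);
	if (!c->data) {
		free(c);
		return NULL;
	}
	memcpy(c->data, m->data, m->len);
	c->len = m->len;
	return c;
}

/* Every receiver gets an independent copy, so freeing one copy can
 * never touch state that belongs to another receiver. */
static int broadcast(const struct msg *m, struct receiver *rx, size_t n)
{
	int delivered = 0;

	for (size_t i = 0; i < n; i++) {
		struct msg *c = msg_clone(m);

		if (!c)
			continue; /* allocation failure drops this copy only */
		rx[i].inbox = c;
		delivered++;
	}
	return delivered;
}
```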
      Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: Sean Tranchetti <stranche@codeaurora.org>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
      Reviewed-by: Wenan Mao <maowenan@huawei.com>
      Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
  2. 29 Jun 2018, 1 commit
    • Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds committed
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 23 Jun 2018, 1 commit
  4. 26 May 2018, 1 commit
  5. 16 May 2018, 1 commit
  6. 09 Apr 2018, 1 commit
    • af_key: Always verify length of provided sadb_key · 4b66af2d
      Kevin Easton committed
      Key extensions (struct sadb_key) include a user-specified number of key
      bits.  The kernel uses that number to determine how much key data to copy
      out of the message in pfkey_msg2xfrm_state().
      
      The length of the sadb_key message must be verified to be long enough,
      even in the case of SADB_X_AALG_NULL.  Furthermore, the sadb_key_len value
      must be long enough to include both the key data and the struct sadb_key
      itself.
      
      Introduce a helper function verify_key_len(), and call it from
      parse_exthdrs() where other exthdr types are similarly checked for
      correctness.
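      The length rule can be expressed as arithmetic in 64-bit words. The header and the key material, rounded up to a whole byte and then to a whole word, must fit within sadb_key_len. The sketch below mirrors the RFC 2367 struct sadb_key layout in a local struct. It follows my understanding of the kernel check rather than copying it.

```c
#include <errno.h>
#include <stdint.h>

/* Local mirror of struct sadb_key: lengths in 64-bit words, key in bits. */
struct key_ext {
	uint16_t len;      /* sadb_key_len */
	uint16_t exttype;  /* sadb_key_exttype */
	uint16_t bits;     /* sadb_key_bits */
	uint16_t reserved; /* sadb_key_reserved */
};

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Words needed for the 8-byte header plus ceil(bits / 8) key bytes. */
static unsigned int key_words_needed(const struct key_ext *k)
{
	unsigned int key_bytes = DIV_ROUND_UP(k->bits, 8u);

	return DIV_ROUND_UP(sizeof(*k) + key_bytes, sizeof(uint64_t));
}

/* Reject a key extension too short for the data it claims to carry;
 * this applies even when the algorithm needs no key (bits == 0). */
static int verify_key_len(const struct key_ext *k)
{
	return key_words_needed(k) > (unsigned int)k->len ? -EINVAL : 0;
}
```

      For example, a 128-bit key needs 8 + 16 = 24 bytes, which is 3 words.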
      Signed-off-by: Kevin Easton <kevin@guarana.org>
      Reported-by: syzbot+5022a34ca5a3d49b84223653fab632dfb7b4cf37@syzkaller.appspotmail.com
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  7. 28 Mar 2018, 1 commit
  8. 28 Feb 2018, 1 commit
  9. 10 Jan 2018, 1 commit
  10. 30 Dec 2017, 2 commits
    • af_key: fix buffer overread in parse_exthdrs() · 4e765b49
      Eric Biggers committed
      If a message sent to a PF_KEY socket ended with an incomplete extension
      header (fewer than 4 bytes remaining), then parse_exthdrs() read past
      the end of the message, into uninitialized memory.  Fix it by returning
      -EINVAL in this case.
      
      Reproducer:
      
      	#include <linux/pfkeyv2.h>
      	#include <sys/socket.h>
      	#include <unistd.h>
      
      	int main()
      	{
      		int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
      		char buf[17] = { 0 };
      		struct sadb_msg *msg = (void *)buf;
      
      		msg->sadb_msg_version = PF_KEY_V2;
      		msg->sadb_msg_type = SADB_DELETE;
      		msg->sadb_msg_len = 2;
      
      		write(sock, buf, 17);
      	}
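      The fix amounts to checking that a whole extension header fits in the remaining bytes before reading it. The sketch below is a userspace walker over the extensions that follow the base header. It assumes the 4-byte RFC 2367 struct sadb_ext layout, mirrored locally, and is hypothetical rather than the kernel's parse_exthdrs().

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct sadb_ext. */
struct ext_hdr {
	uint16_t len;  /* total extension length in 64-bit words */
	uint16_t type; /* 0 is SADB_EXT_RESERVED */
};

/* Walk the extensions, never reading a header that is cut off. */
static int walk_exts(const unsigned char *p, size_t len, int *count)
{
	*count = 0;
	while (len > 0) {
		struct ext_hdr h;
		size_t ext_len;

		if (len < sizeof(h))
			return -EINVAL; /* truncated header: the case fixed here */
		memcpy(&h, p, sizeof(h));

		ext_len = (size_t)h.len * sizeof(uint64_t);
		/* A zero length would loop forever; an overlong one overreads. */
		if (ext_len < sizeof(uint64_t) || ext_len > len || h.type == 0)
			return -EINVAL;

		(*count)++;
		p += ext_len;
		len -= ext_len;
	}
	return 0;
}
```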
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
    • af_key: fix buffer overread in verify_address_len() · 06b335cb
      Eric Biggers committed
      If a message sent to a PF_KEY socket ended with one of the extensions
      that takes a 'struct sadb_address' but there were not enough bytes
      remaining in the message for the ->sa_family member of the 'struct
      sockaddr' which is supposed to follow, then verify_address_len() read
      past the end of the message, into uninitialized memory.  Fix it by
      returning -EINVAL in this case.
      
      This bug was found using syzkaller with KMSAN.
      
      Reproducer:
      
      	#include <linux/pfkeyv2.h>
      	#include <sys/socket.h>
      	#include <unistd.h>
      
      	int main()
      	{
      		int sock = socket(PF_KEY, SOCK_RAW, PF_KEY_V2);
      		char buf[24] = { 0 };
      		struct sadb_msg *msg = (void *)buf;
      		struct sadb_address *addr = (void *)(msg + 1);
      
      		msg->sadb_msg_version = PF_KEY_V2;
      		msg->sadb_msg_type = SADB_DELETE;
      		msg->sadb_msg_len = 3;
      		addr->sadb_address_len = 1;
      		addr->sadb_address_exttype = SADB_EXT_ADDRESS_SRC;
      
      		write(sock, buf, 24);
      	}
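      The missing check is a minimum length for the address extension. It must be long enough to cover its own header plus the sa_family field of the struct sockaddr that follows it. The sketch below computes that minimum in 64-bit words, using a locally mirrored RFC 2367 struct sadb_address. The helper names are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Local mirror of struct sadb_address; a struct sockaddr follows it. */
struct addr_ext {
	uint16_t len;      /* in 64-bit words */
	uint16_t exttype;
	uint8_t  proto;
	uint8_t  prefixlen;
	uint16_t reserved;
};

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Smallest length, in words, at which sa_family lies inside the extension. */
static size_t min_words_for_family(void)
{
	size_t end = offsetof(struct sockaddr, sa_family) + sizeof(sa_family_t);

	return DIV_ROUND_UP(sizeof(struct addr_ext) + end, sizeof(uint64_t));
}

/* Must pass before anything dereferences addr->sa_family. */
static int check_family_readable(const struct addr_ext *a)
{
	return (size_t)a->len < min_words_for_family() ? -EINVAL : 0;
}
```

      With the reproducer's sadb_address_len = 1, the extension is a single word that holds only the header, so it is rejected.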
      Reported-by: Alexander Potapenko <glider@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  11. 14 Nov 2017, 1 commit
  12. 15 Aug 2017, 1 commit
    • af_key: do not use GFP_KERNEL in atomic contexts · 36f41f8f
      Eric Dumazet committed
      pfkey_broadcast() might be called from non-process contexts,
      where we cannot use GFP_KERNEL [1].
      
      This patch partially reverts commit ba51b6be ("net: Fix RCU splat in
      af_key"), only keeping the GFP_ATOMIC forcing under rcu_read_lock()
      section.
      
      [1] : syzkaller reported :
      
      in_atomic(): 1, irqs_disabled(): 0, pid: 2932, name: syzkaller183439
      3 locks held by syzkaller183439/2932:
       #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff83b43888>] pfkey_sendmsg+0x4c8/0x9f0 net/key/af_key.c:3649
       #1:  (&pfk->dump_lock){+.+.+.}, at: [<ffffffff83b467f6>] pfkey_do_dump+0x76/0x3f0 net/key/af_key.c:293
       #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] spin_lock_bh include/linux/spinlock.h:304 [inline]
       #2:  (&(&net->xfrm.xfrm_policy_lock)->rlock){+...+.}, at: [<ffffffff83957632>] xfrm_policy_walk+0x192/0xa30 net/xfrm/xfrm_policy.c:1028
      CPU: 0 PID: 2932 Comm: syzkaller183439 Not tainted 4.13.0-rc4+ #24
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:52
       ___might_sleep+0x2b2/0x470 kernel/sched/core.c:5994
       __might_sleep+0x95/0x190 kernel/sched/core.c:5947
       slab_pre_alloc_hook mm/slab.h:416 [inline]
       slab_alloc mm/slab.c:3383 [inline]
       kmem_cache_alloc+0x24b/0x6e0 mm/slab.c:3559
       skb_clone+0x1a0/0x400 net/core/skbuff.c:1037
       pfkey_broadcast_one+0x4b2/0x6f0 net/key/af_key.c:207
       pfkey_broadcast+0x4ba/0x770 net/key/af_key.c:281
       dump_sp+0x3d6/0x500 net/key/af_key.c:2685
       xfrm_policy_walk+0x2f1/0xa30 net/xfrm/xfrm_policy.c:1042
       pfkey_dump_sp+0x42/0x50 net/key/af_key.c:2695
       pfkey_do_dump+0xaa/0x3f0 net/key/af_key.c:299
       pfkey_spddump+0x1a0/0x210 net/key/af_key.c:2722
       pfkey_process+0x606/0x710 net/key/af_key.c:2814
       pfkey_sendmsg+0x4d6/0x9f0 net/key/af_key.c:3650
      sock_sendmsg_nosec net/socket.c:633 [inline]
       sock_sendmsg+0xca/0x110 net/socket.c:643
       ___sys_sendmsg+0x755/0x890 net/socket.c:2035
       __sys_sendmsg+0xe5/0x210 net/socket.c:2069
       SYSC_sendmsg net/socket.c:2080 [inline]
       SyS_sendmsg+0x2d/0x50 net/socket.c:2076
       entry_SYSCALL_64_fastpath+0x1f/0xbe
      RIP: 0033:0x445d79
      RSP: 002b:00007f32447c1dc8 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 0000000000445d79
      RDX: 0000000000000000 RSI: 000000002023dfc8 RDI: 0000000000000008
      RBP: 0000000000000086 R08: 00007f32447c2700 R09: 00007f32447c2700
      R10: 00007f32447c2700 R11: 0000000000000202 R12: 0000000000000000
      R13: 00007ffe33edec4f R14: 00007f32447c29c0 R15: 0000000000000000
      
      Fixes: ba51b6be ("net: Fix RCU splat in af_key")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 19 Jul 2017, 1 commit
  14. 05 Jul 2017, 1 commit
  15. 01 Jul 2017, 3 commits
  16. 16 Jun 2017, 3 commits
    • networking: make skb_put & friends return void pointers · 4df864c1
      Johannes Berg committed
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions (skb_put, __skb_put and pskb_put) return void *
      and remove all the casts across the tree, adding a (u8 *) cast only
      where the unsigned char pointer was used directly, all done with the
      following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_put, __skb_put };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_put, __skb_put };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
      which actually doesn't cover pskb_put since there are only three
      users overall.
      
      A handful of stragglers were converted manually, notably a macro in
      drivers/isdn/i4l/isdn_bsdcomp.c and, oddly enough, one of the many
      instances in net/bluetooth/hci_sock.c. In the former file, I also
      had to fix one whitespace problem spatch introduced.
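      Returning void * matters because C converts it implicitly to any object pointer type, so the casts disappear. A (u8 *) cast remains only where the result is dereferenced directly. The sketch below uses a hypothetical pkt_put() in place of skb_put(). It assumes the caller guarantees room, much as the real skb_put() treats overflow as a fatal bug.

```c
#include <stddef.h>
#include <stdint.h>

struct pkt {
	unsigned char buf[128];
	size_t len;
};

/* Hypothetical skb_put()-like helper: reserve len bytes at the tail.
 * The caller must ensure the buffer has room. */
static void *pkt_put(struct pkt *p, size_t len)
{
	void *tail = p->buf + p->len;

	p->len += len;
	return tail;
}

struct hdr {
	uint8_t type;
	uint8_t flags;
};

static void build(struct pkt *p)
{
	struct hdr *h = pkt_put(p, sizeof(*h)); /* no (struct hdr *) cast */

	h->type = 1;
	h->flags = 0;
	*(uint8_t *)pkt_put(p, 1) = 0xff; /* direct store still needs a cast */
}
```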
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • networking: introduce and use skb_put_data() · 59ae1d12
      Johannes Berg committed
      A common pattern with skb_put() is to just want to memcpy()
      some data into the new space, introduce skb_put_data() for
      this.
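      The helper combines "reserve tail space" and "copy into it" in one call and returns the start of the copied region. The sketch below shows the same shape in userspace with a hypothetical pkt_put() / pkt_put_data() pair standing in for skb_put() and skb_put_data().

```c
#include <stddef.h>
#include <string.h>

struct pkt {
	unsigned char buf[128];
	size_t len;
};

/* Hypothetical skb_put() stand-in; the caller guarantees room. */
static void *pkt_put(struct pkt *p, size_t len)
{
	void *tail = p->buf + p->len;

	p->len += len;
	return tail;
}

/* skb_put_data()-style helper: reserve len bytes, then copy data in. */
static void *pkt_put_data(struct pkt *p, const void *data, size_t len)
{
	void *tail = pkt_put(p, len);

	memcpy(tail, data, len);
	return tail;
}
```

      A call such as pkt_put_data(&p, hdr, sizeof(*hdr)) replaces the two-step "p = skb_put(...); memcpy(p, ...)" pattern that the spatch rewrites.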
      
      An spatch similar to the one for skb_put_zero() converts many
      of the places using it:
      
          @@
          identifier p, p2;
          expression len, skb, data;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_data(skb, data, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_data(skb, data, len);
          )
          (
          p2 = (t2)p;
          -memcpy(p2, data, len);
          |
          -memcpy(p, data, len);
          )
      
          @@
          type t, t2;
          identifier p, p2;
          expression skb, data;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_data(skb, data, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_data(skb, data, sizeof(t));
          )
          (
          p2 = (t2)p;
          -memcpy(p2, data, sizeof(*p));
          |
          -memcpy(p, data, sizeof(*p));
          )
      
          @@
          expression skb, len, data;
          @@
          -memcpy(skb_put(skb, len), data, len);
          +skb_put_data(skb, data, len);
      
      (again, manually post-processed to retain some comments)
      Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • networking: convert many more places to skb_put_zero() · b080db58
      Johannes Berg committed
      There were many places that my previous spatch didn't find,
      as pointed out by yuan linyu in various patches.
      
      The following spatch found many more and also removes the
      now unnecessary casts:
      
          @@
          identifier p, p2;
          expression len;
          expression skb;
          type t, t2;
          @@
          (
          -p = skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          |
          -p = (t)skb_put(skb, len);
          +p = skb_put_zero(skb, len);
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, len);
          |
          -memset(p, 0, len);
          )
      
          @@
          type t, t2;
          identifier p, p2;
          expression skb;
          @@
          t *p;
          ...
          (
          -p = skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          |
          -p = (t *)skb_put(skb, sizeof(t));
          +p = skb_put_zero(skb, sizeof(t));
          )
          ... when != p
          (
          p2 = (t2)p;
          -memset(p2, 0, sizeof(*p));
          |
          -memset(p, 0, sizeof(*p));
          )
      
          @@
          expression skb, len;
          @@
          -memset(skb_put(skb, len), 0, len);
          +skb_put_zero(skb, len);
      
      Apply it to the tree (with one manual fixup to keep the
      comment in vxlan.c, which spatch removed.)
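      skb_put_zero() is the zeroing counterpart of skb_put_data(). It reserves tail space and clears it with memset(), which is the pattern the spatch above collapses. Below is a userspace sketch with hypothetical pkt_put() / pkt_put_zero() stand-ins.

```c
#include <stddef.h>
#include <string.h>

struct pkt {
	unsigned char buf[128];
	size_t len;
};

/* Hypothetical skb_put() stand-in; the caller guarantees room. */
static void *pkt_put(struct pkt *p, size_t len)
{
	void *tail = p->buf + p->len;

	p->len += len;
	return tail;
}

/* skb_put_zero()-style helper: reserve len bytes and zero them. */
static void *pkt_put_zero(struct pkt *p, size_t len)
{
	void *tail = pkt_put(p, len);

	memset(tail, 0, len);
	return tail;
}
```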
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 14 Jun 2017, 2 commits
  18. 12 Jun 2017, 1 commit
    • xfrm: move xfrm_garbage_collect out of xfrm_policy_flush · 138437f5
      Hangbin Liu committed
      Currently xfrm_policy_flush() forces garbage collection if any policy
      is removed. But during xfrm_net_exit(), we call flow_cache_fini()
      first, which sets fc->percpu to NULL. When we then call xfrm_policy_fini()
      -> xfrm_policy_flush() -> flow_cache_flush(), we get a NULL pointer
      dereference when check percpu_empty. The code path looks like:
      
      flow_cache_fini()
        - fc->percpu = NULL
      xfrm_policy_fini()
        - xfrm_policy_flush()
          - xfrm_garbage_collect()
            - flow_cache_flush()
              - flow_cache_percpu_empty()
                - fcp = per_cpu_ptr(fc->percpu, cpu)
      
      To reproduce, just add ipsec in netns and then remove the netns.
      
      v2:
      As Xin Long suggested, since only two other places need to call it, move
      xfrm_garbage_collect() out of xfrm_policy_flush().
      
      v3:
      Fix subject mismatch after v2 fix.
      
      Fixes: 35db0691 ("xfrm: do the garbage collection after flushing policy")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  19. 07 Jun 2017, 2 commits
  20. 08 May 2017, 1 commit
  21. 18 Apr 2017, 1 commit
    • af_key: Fix sadb_x_ipsecrequest parsing · 096f41d3
      Herbert Xu committed
      The parsing of sadb_x_ipsecrequest is broken in a number of ways.
      First of all we're not verifying sadb_x_ipsecrequest_len.  This
      is needed when the structure carries addresses at the end.  Worse,
      we don't even look at the length when we parse those optional
      addresses.
      
      The migration code had similar parsing code that's better but
      it also has some deficiencies.  The length is overcounted first
      of all as it includes the header itself.  It also fails to check
      the length before dereferencing the sa_family field.
      
      This patch fixes those problems in parse_sockaddr_pair and then
      uses it in parse_ipsecrequest.
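      The corrected order of checks is as follows. First confirm that the sa_family field itself lies inside the remaining length. Then confirm that the whole source/destination pair for that family fits. The sketch below expresses this in userspace. The helper names are hypothetical. The 8-byte padding of the pair size is an assumption modelled on PF_KEY's 64-bit alignment, and unknown families are simply rejected.

```c
#include <errno.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#define ALIGN8(n) (((n) + 7u) & ~(size_t)7u)

/* Bytes for a src/dst sockaddr pair, padded to 8; 0 if unsupported. */
static size_t sockaddr_pair_size(sa_family_t family)
{
	switch (family) {
	case AF_INET:
		return ALIGN8(2 * sizeof(struct sockaddr_in));
	case AF_INET6:
		return ALIGN8(2 * sizeof(struct sockaddr_in6));
	default:
		return 0;
	}
}

/* Validate the trailing address pair before anything dereferences it. */
static int check_sockaddr_pair(const unsigned char *p, size_t ext_len)
{
	size_t off = offsetof(struct sockaddr, sa_family);
	sa_family_t family;
	size_t need;

	if (ext_len < off + sizeof(family))
		return -EINVAL; /* cannot even read sa_family */
	memcpy(&family, p + off, sizeof(family));

	need = sockaddr_pair_size(family);
	if (need == 0 || ext_len < need)
		return -EINVAL; /* unknown family or truncated pair */
	return 0;
}
```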
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
  22. 03 Apr 2017, 1 commit
  23. 24 Mar 2017, 1 commit
  24. 18 Nov 2016, 1 commit
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan committed
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      thus is an unsigned entity. Using a negative value is an out-of-bounds
      access by definition.
      
      2)
      On x86_64, unsigned 32-bit data that are mixed with pointers
      via array indexing, or via offsets added to or subtracted from pointers,
      are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction, which isn't necessary if the variable is
      unsigned because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
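      Below is a userspace model of a net_generic()-style lookup that takes an unsigned id. It uses the same 1-based id / "id - 1" slot convention as the snippet above. struct gen_table and gen_lookup() are hypothetical, and the bounds check is added for the sketch only. Whether the compiler actually drops the sign extension depends on the target and the optimizer; the claim applies to x86-64 as described in the message.

```c
#include <stddef.h>

/* Hypothetical per-namespace pointer table; ids start at 1. */
struct gen_table {
	unsigned int len;
	void *ptr[8];
};

/* With an unsigned id there is no negative index to sign-extend; an
 * invalid id is just a large value, caught by the bounds check below. */
static void *gen_lookup(const struct gen_table *t, unsigned int id)
{
	if (id == 0 || id > t->len || id > 8)
		return NULL; /* sketch-only guard, not in the kernel helper */
	return t->ptr[id - 1];
}
```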
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation, with the register
      allocator being used differently. gcc decides that some variable
      needs to live in the new r8+ registers and every access now requires a REX
      prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
      used, which is longer than [r8].
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 23 Oct 2015, 1 commit
  26. 25 Aug 2015, 1 commit
    • net: Fix RCU splat in af_key · ba51b6be
      David Ahern committed
      Hit the following splat testing VRF change for ipsec:
      
      [  113.475692] ===============================
      [  113.476194] [ INFO: suspicious RCU usage. ]
      [  113.476667] 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED Not tainted
      [  113.477545] -------------------------------
      [  113.478013] /work/monster-14/dsa/kernel.git/include/linux/rcupdate.h:568 Illegal context switch in RCU read-side critical section!
      [  113.479288]
      [  113.479288] other info that might help us debug this:
      [  113.479288]
      [  113.480207]
      [  113.480207] rcu_scheduler_active = 1, debug_locks = 1
      [  113.480931] 2 locks held by setkey/6829:
      [  113.481371]  #0:  (&net->xfrm.xfrm_cfg_mutex){+.+.+.}, at: [<ffffffff814e9887>] pfkey_sendmsg+0xfb/0x213
      [  113.482509]  #1:  (rcu_read_lock){......}, at: [<ffffffff814e767f>] rcu_read_lock+0x0/0x6e
      [  113.483509]
      [  113.483509] stack backtrace:
      [  113.484041] CPU: 0 PID: 6829 Comm: setkey Not tainted 4.2.0-rc6-1+deb7u2+clUNRELEASED #3.2.65-1+deb7u2+clUNRELEASED
      [  113.485422] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
      [  113.486845]  0000000000000001 ffff88001d4c7a98 ffffffff81518af2 ffffffff81086962
      [  113.487732]  ffff88001d538480 ffff88001d4c7ac8 ffffffff8107ae75 ffffffff8180a154
      [  113.488628]  0000000000000b30 0000000000000000 00000000000000d0 ffff88001d4c7ad8
      [  113.489525] Call Trace:
      [  113.489813]  [<ffffffff81518af2>] dump_stack+0x4c/0x65
      [  113.490389]  [<ffffffff81086962>] ? console_unlock+0x3d6/0x405
      [  113.491039]  [<ffffffff8107ae75>] lockdep_rcu_suspicious+0xfa/0x103
      [  113.491735]  [<ffffffff81064032>] rcu_preempt_sleep_check+0x45/0x47
      [  113.492442]  [<ffffffff8106404d>] ___might_sleep+0x19/0x1c8
      [  113.493077]  [<ffffffff81064268>] __might_sleep+0x6c/0x82
      [  113.493681]  [<ffffffff81133190>] cache_alloc_debugcheck_before.isra.50+0x1d/0x24
      [  113.494508]  [<ffffffff81134876>] kmem_cache_alloc+0x31/0x18f
      [  113.495149]  [<ffffffff814012b5>] skb_clone+0x64/0x80
      [  113.495712]  [<ffffffff814e6f71>] pfkey_broadcast_one+0x3d/0xff
      [  113.496380]  [<ffffffff814e7b84>] pfkey_broadcast+0xb5/0x11e
      [  113.497024]  [<ffffffff814e82d1>] pfkey_register+0x191/0x1b1
      [  113.497653]  [<ffffffff814e9770>] pfkey_process+0x162/0x17e
      [  113.498274]  [<ffffffff814e9895>] pfkey_sendmsg+0x109/0x213
      
      In pfkey_sendmsg the net mutex is taken and then pfkey_broadcast takes
      the RCU lock.
      
      Since pfkey_broadcast takes the RCU lock the allocation argument is
      pointless since GFP_ATOMIC must be used between the rcu_read_{,un}lock.
      The one call outside of rcu can be done with GFP_KERNEL.
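      The allocation rule can be reduced to a tiny decision function. Inside an RCU read-side critical section nothing may sleep, so the atomic mode is forced. Outside it, the caller's requested mode applies. That second part reflects the later partial revert in 36f41f8f, which honours the caller's flag outside the RCU section instead of always using GFP_KERNEL. The enum below is a hypothetical stand-in for gfp_t.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for two gfp_t allocation modes. */
enum alloc_mode {
	ALLOC_MAY_SLEEP, /* like GFP_KERNEL: may block to reclaim memory */
	ALLOC_ATOMIC,    /* like GFP_ATOMIC: never sleeps */
};

/* Force a non-sleeping allocation inside rcu_read_lock()/unlock();
 * otherwise keep whatever the caller asked for. */
static enum alloc_mode effective_mode(enum alloc_mode requested,
				      bool in_rcu_read_section)
{
	return in_rcu_read_section ? ALLOC_ATOMIC : requested;
}
```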
      
      Fixes: 7f6b9dbd ("af_key: locking change")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 28 May 2015, 1 commit
  28. 11 May 2015, 1 commit
  29. 01 Apr 2015, 1 commit
  30. 03 Mar 2015, 1 commit