1. 08 7月, 2019 4 次提交
    • T
      gtp: fix Illegal context switch in RCU read-side critical section. · 3f167e19
      Taehee Yoo 提交于
      ipv4_pdp_add() is called in RCU read-side critical section.
      So GFP_KERNEL should not be used in the function.
      This patch make ipv4_pdp_add() to use GFP_ATOMIC instead of GFP_KERNEL.
      
      Test commands:
      gtp-link add gtp1 &
      gtp-tunnel add gtp1 v1 100 200 1.1.1.1 2.2.2.2
      
      Splat looks like:
      [  130.618881] =============================
      [  130.626382] WARNING: suspicious RCU usage
      [  130.626994] 5.2.0-rc6+ #50 Not tainted
      [  130.627622] -----------------------------
      [  130.628223] ./include/linux/rcupdate.h:266 Illegal context switch in RCU read-side critical section!
      [  130.629684]
      [  130.629684] other info that might help us debug this:
      [  130.629684]
      [  130.631022]
      [  130.631022] rcu_scheduler_active = 2, debug_locks = 1
      [  130.632136] 4 locks held by gtp-tunnel/1025:
      [  130.632925]  #0: 000000002b93c8b7 (cb_lock){++++}, at: genl_rcv+0x15/0x40
      [  130.634159]  #1: 00000000f17bc999 (genl_mutex){+.+.}, at: genl_rcv_msg+0xfb/0x130
      [  130.635487]  #2: 00000000c644ed8e (rtnl_mutex){+.+.}, at: gtp_genl_new_pdp+0x18c/0x1150 [gtp]
      [  130.636936]  #3: 0000000007a1cde7 (rcu_read_lock){....}, at: gtp_genl_new_pdp+0x187/0x1150 [gtp]
      [  130.638348]
      [  130.638348] stack backtrace:
      [  130.639062] CPU: 1 PID: 1025 Comm: gtp-tunnel Not tainted 5.2.0-rc6+ #50
      [  130.641318] Call Trace:
      [  130.641707]  dump_stack+0x7c/0xbb
      [  130.642252]  ___might_sleep+0x2c0/0x3b0
      [  130.642862]  kmem_cache_alloc_trace+0x1cd/0x2b0
      [  130.643591]  gtp_genl_new_pdp+0x6c5/0x1150 [gtp]
      [  130.644371]  genl_family_rcv_msg+0x63a/0x1030
      [  130.645074]  ? mutex_lock_io_nested+0x1090/0x1090
      [  130.645845]  ? genl_unregister_family+0x630/0x630
      [  130.646592]  ? debug_show_all_locks+0x2d0/0x2d0
      [  130.647293]  ? check_flags.part.40+0x440/0x440
      [  130.648099]  genl_rcv_msg+0xa3/0x130
      [ ... ]
      
      Fixes: 459aa660 ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f167e19
    • T
      gtp: remove duplicate code in gtp_dellink() · a635037a
      Taehee Yoo 提交于
      gtp_encap_disable() in gtp_dellink() is unnecessary because it will be
      called by unregister_netdevice().
      unregister_netdevice() internally calls gtp_dev_uninit() by ->ndo_uninit().
      And gtp_dev_uninit() calls gtp_encap_disable().
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a635037a
    • T
      gtp: fix use-after-free in gtp_encap_destroy() · 1788b856
      Taehee Yoo 提交于
      gtp_encap_destroy() is called twice.
      1. When interface is deleted.
      2. When udp socket is destroyed.
      either gtp->sk0 or gtp->sk1u could be freed by sock_put() in
      gtp_encap_destroy(). so, when gtp_encap_destroy() is called again,
      it would uses freed sk pointer.
      
      patch makes gtp_encap_destroy() to set either gtp->sk0 or gtp->sk1u to
      null. in addition, both gtp->sk0 and gtp->sk1u pointer are protected
      by rtnl_lock. so, rtnl_lock() is added.
      
      Test command:
         gtp-link add gtp1 &
         killall gtp-link
         ip link del gtp1
      
      Splat looks like:
      [   83.182767] BUG: KASAN: use-after-free in __lock_acquire+0x3a20/0x46a0
      [   83.184128] Read of size 8 at addr ffff8880cc7d5360 by task ip/1008
      [   83.185567] CPU: 1 PID: 1008 Comm: ip Not tainted 5.2.0-rc6+ #50
      [   83.188469] Call Trace:
      [ ... ]
      [   83.200126]  lock_acquire+0x141/0x380
      [   83.200575]  ? lock_sock_nested+0x3a/0xf0
      [   83.201069]  _raw_spin_lock_bh+0x38/0x70
      [   83.201551]  ? lock_sock_nested+0x3a/0xf0
      [   83.202044]  lock_sock_nested+0x3a/0xf0
      [   83.202520]  gtp_encap_destroy+0x18/0xe0 [gtp]
      [   83.203065]  gtp_encap_disable.isra.14+0x13/0x50 [gtp]
      [   83.203687]  gtp_dellink+0x56/0x170 [gtp]
      [   83.204190]  rtnl_delete_link+0xb4/0x100
      [ ... ]
      [   83.236513] Allocated by task 976:
      [   83.236925]  save_stack+0x19/0x80
      [   83.237332]  __kasan_kmalloc.constprop.3+0xa0/0xd0
      [   83.237894]  kmem_cache_alloc+0xd8/0x280
      [   83.238360]  sk_prot_alloc.isra.42+0x50/0x200
      [   83.238874]  sk_alloc+0x32/0x940
      [   83.239264]  inet_create+0x283/0xc20
      [   83.239684]  __sock_create+0x2dd/0x540
      [   83.240136]  __sys_socket+0xca/0x1a0
      [   83.240550]  __x64_sys_socket+0x6f/0xb0
      [   83.240998]  do_syscall_64+0x9c/0x450
      [   83.241466]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [   83.242061]
      [   83.242249] Freed by task 0:
      [   83.242616]  save_stack+0x19/0x80
      [   83.243013]  __kasan_slab_free+0x111/0x150
      [   83.243498]  kmem_cache_free+0x89/0x250
      [   83.244444]  __sk_destruct+0x38f/0x5a0
      [   83.245366]  rcu_core+0x7e9/0x1c20
      [   83.245766]  __do_softirq+0x213/0x8fa
      
      Fixes: 1e3a3abd ("gtp: make GTP sockets in gtp_newlink optional")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1788b856
    • T
      gtp: fix suspicious RCU usage · e198987e
      Taehee Yoo 提交于
      gtp_encap_enable_socket() and gtp_encap_destroy() are not protected
      by rcu_read_lock(). and it's not safe to write sk->sk_user_data.
      This patch make these functions to use lock_sock() instead of
      rcu_dereference_sk_user_data().
      
      Test commands:
          gtp-link add gtp1
      
      Splat looks like:
      [   83.238315] =============================
      [   83.239127] WARNING: suspicious RCU usage
      [   83.239702] 5.2.0-rc6+ #49 Not tainted
      [   83.240268] -----------------------------
      [   83.241205] drivers/net/gtp.c:799 suspicious rcu_dereference_check() usage!
      [   83.243828]
      [   83.243828] other info that might help us debug this:
      [   83.243828]
      [   83.246325]
      [   83.246325] rcu_scheduler_active = 2, debug_locks = 1
      [   83.247314] 1 lock held by gtp-link/1008:
      [   83.248523]  #0: 0000000017772c7f (rtnl_mutex){+.+.}, at: __rtnl_newlink+0x5f5/0x11b0
      [   83.251503]
      [   83.251503] stack backtrace:
      [   83.252173] CPU: 0 PID: 1008 Comm: gtp-link Not tainted 5.2.0-rc6+ #49
      [   83.253271] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   83.254562] Call Trace:
      [   83.254995]  dump_stack+0x7c/0xbb
      [   83.255567]  gtp_encap_enable_socket+0x2df/0x360 [gtp]
      [   83.256415]  ? gtp_find_dev+0x1a0/0x1a0 [gtp]
      [   83.257161]  ? memset+0x1f/0x40
      [   83.257843]  gtp_newlink+0x90/0xa21 [gtp]
      [   83.258497]  ? __netlink_ns_capable+0xc3/0xf0
      [   83.259260]  __rtnl_newlink+0xb9f/0x11b0
      [   83.260022]  ? rtnl_link_unregister+0x230/0x230
      [ ... ]
      
      Fixes: 1e3a3abd ("gtp: make GTP sockets in gtp_newlink optional")
      Signed-off-by: NTaehee Yoo <ap420073@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e198987e
  2. 31 5月, 2019 1 次提交
  3. 28 4月, 2019 1 次提交
    • J
      genetlink: optionally validate strictly/dumps · ef6243ac
      Johannes Berg 提交于
      Add options to strictly validate messages and dump messages,
      sometimes perhaps validating dump messages non-strictly may
      be required, so add an option for that as well.
      
      Since none of this can really be applied to existing commands,
      set the options everwhere using the following spatch:
      
          @@
          identifier ops;
          expression X;
          @@
          struct genl_ops ops[] = {
          ...,
           {
                  .cmd = X,
          +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
                  ...
           },
          ...
          };
      
      For new commands one should just not copy the .validate 'opt-out'
      flags and thus get strict validation.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef6243ac
  4. 22 3月, 2019 1 次提交
    • J
      genetlink: make policy common to family · 3b0f31f2
      Johannes Berg 提交于
      Since maxattr is common, the policy can't really differ sanely,
      so make it common as well.
      
      The only user that did in fact manage to make a non-common policy
      is taskstats, which has to be really careful about it (since it's
      still using a common maxattr!). This is no longer supported, but
      we can fake it using pre_doit.
      
      This reduces the size of e.g. nl80211.o (which has lots of commands):
      
         text	   data	    bss	    dec	    hex	filename
       398745	  14323	   2240	 415308	  6564c	net/wireless/nl80211.o (before)
       397913	  14331	   2240	 414484	  65314	net/wireless/nl80211.o (after)
      --------------------------------
         -832      +8       0    -824
      
      Which is obviously just 8 bytes for each command, and an added 8
      bytes for the new policy pointer. I'm not sure why the ops list is
      counted as .text though.
      
      Most of the code transformations were done using the following spatch:
          @ops@
          identifier OPS;
          expression POLICY;
          @@
          struct genl_ops OPS[] = {
          ...,
           {
          -	.policy = POLICY,
           },
          ...
          };
      
          @@
          identifier ops.OPS;
          expression ops.POLICY;
          identifier fam;
          expression M;
          @@
          struct genl_family fam = {
                  .ops = OPS,
                  .maxattr = M,
          +       .policy = POLICY,
                  ...
          };
      
      This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
      the cb->data as ops, which we want to change in a later genl patch.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3b0f31f2
  5. 21 7月, 2018 1 次提交
  6. 13 6月, 2018 1 次提交
    • K
      treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook 提交于
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:
              kmalloc_array(a * b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6da2ec56
  7. 28 3月, 2018 1 次提交
  8. 28 2月, 2018 1 次提交
  9. 02 8月, 2017 1 次提交
  10. 27 6月, 2017 2 次提交
  11. 16 6月, 2017 1 次提交
    • J
      networking: make skb_push & __skb_push return void pointers · d58ff351
      Johannes Berg 提交于
      It seems like a historic accident that these return unsigned char *,
      and in many places that means casts are required, more often than not.
      
      Make these functions return void * and remove all the casts across
      the tree, adding a (u8 *) cast only where the unsigned char pointer
      was used directly, all done with the following spatch:
      
          @@
          expression SKB, LEN;
          typedef u8;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - *(fn(SKB, LEN))
          + *(u8 *)fn(SKB, LEN)
      
          @@
          expression E, SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          type T;
          @@
          - E = ((T *)(fn(SKB, LEN)))
          + E = fn(SKB, LEN)
      
          @@
          expression SKB, LEN;
          identifier fn = { skb_push, __skb_push, skb_push_rcsum };
          @@
          - fn(SKB, LEN)[0]
          + *(u8 *)fn(SKB, LEN)
      
      Note that the last part there converts from push(...)[0] to the
      more idiomatic *(u8 *)push(...).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d58ff351
  12. 08 6月, 2017 1 次提交
    • D
      net: Fix inconsistent teardown and release of private netdev state. · cf124db5
      David S. Miller 提交于
      Network devices can allocate reasources and private memory using
      netdev_ops->ndo_init().  However, the release of these resources
      can occur in one of two different places.
      
      Either netdev_ops->ndo_uninit() or netdev->destructor().
      
      The decision of which operation frees the resources depends upon
      whether it is necessary for all netdev refs to be released before it
      is safe to perform the freeing.
      
      netdev_ops->ndo_uninit() presumably can occur right after the
      NETDEV_UNREGISTER notifier completes and the unicast and multicast
      address lists are flushed.
      
      netdev->destructor(), on the other hand, does not run until the
      netdev references all go away.
      
      Further complicating the situation is that netdev->destructor()
      almost universally does also a free_netdev().
      
      This creates a problem for the logic in register_netdevice().
      Because all callers of register_netdevice() manage the freeing
      of the netdev, and invoke free_netdev(dev) if register_netdevice()
      fails.
      
      If netdev_ops->ndo_init() succeeds, but something else fails inside
      of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
      it is not able to invoke netdev->destructor().
      
      This is because netdev->destructor() will do a free_netdev() and
      then the caller of register_netdevice() will do the same.
      
      However, this means that the resources that would normally be released
      by netdev->destructor() will not be.
      
      Over the years drivers have added local hacks to deal with this, by
      invoking their destructor parts by hand when register_netdevice()
      fails.
      
      Many drivers do not try to deal with this, and instead we have leaks.
      
      Let's close this hole by formalizing the distinction between what
      private things need to be freed up by netdev->destructor() and whether
      the driver needs unregister_netdevice() to perform the free_netdev().
      
      netdev->priv_destructor() performs all actions to free up the private
      resources that used to be freed by netdev->destructor(), except for
      free_netdev().
      
      netdev->needs_free_netdev is a boolean that indicates whether
      free_netdev() should be done at the end of unregister_netdevice().
      
      Now, register_netdevice() can sanely release all resources after
      ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
      and netdev->priv_destructor().
      
      And at the end of unregister_netdevice(), we invoke
      netdev->priv_destructor() and optionally call free_netdev().
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf124db5
  13. 26 5月, 2017 1 次提交
  14. 26 3月, 2017 2 次提交
    • J
      gtp: support SGSN-side tunnels · 91ed81f9
      Jonas Bonn 提交于
      The GTP-tunnel driver is explicitly GGSN-side as it searches for PDP
      contexts based on the incoming packets _destination_ address.  If we
      want to place ourselves on the SGSN side of the  tunnel, then we want
      to be identifying PDP contexts based on _source_ address.
      
      Let it be noted that in a "real" configuration this module would never
      be used:  the SGSN normally does not see IP packets as input.  The
      justification for this functionality is for PGW load-testing applications
      where the input to the SGSN is locally generally IP traffic.
      
      This patch adds a "role" argument at GTP-link creation time to specify
      whether we are on the GGSN or SGSN side of the tunnel; this flag is then
      used to determine which part of the IP packet to use in determining
      the PDP context.
      Signed-off-by: NJonas Bonn <jonas@southpole.se>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NHarald Welte <laforge@gnumonks.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      91ed81f9
    • J
      gtp: rename SGSN netlink attribute · ae6336b5
      Jonas Bonn 提交于
      This is a mostly cosmetic rename of the SGSN netlink attribute to
      the GTP link.  The justification for this is that we will be making
      the module support decapsulation of "downstream" SGSN packets, in
      which case the netlink parameter actually refers to the upstream GGSN
      peer.  Renaming the parameter makes the relationship clearer.
      
      The legacy name is maintained as a define in the header file in order
      to not break existing code.
      Signed-off-by: NJonas Bonn <jonas@southpole.se>
      Acked-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NHarald Welte <laforge@gnumonks.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae6336b5
  15. 14 3月, 2017 7 次提交
  16. 28 2月, 2017 1 次提交
  17. 30 1月, 2017 2 次提交
  18. 27 1月, 2017 3 次提交
  19. 18 12月, 2016 2 次提交
  20. 18 11月, 2016 1 次提交
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan 提交于
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into an zero based array and
      thus is unsigned entity. Using negative value is out-of-bound
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preffered to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is 3 byte instruction which isn't necessary if the variable is
      unsigned because x86_64 is zero extending by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a semmingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d03a00
  21. 28 10月, 2016 3 次提交
    • J
      genetlink: mark families as __ro_after_init · 56989f6d
      Johannes Berg 提交于
      Now genl_register_family() is the only thing (other than the
      users themselves, perhaps, but I didn't find any doing that)
      writing to the family struct.
      
      In all families that I found, genl_register_family() is only
      called from __init functions (some indirectly, in which case
      I've add __init annotations to clarifly things), so all can
      actually be marked __ro_after_init.
      
      This protects the data structure from accidental corruption.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      56989f6d
    • J
      genetlink: statically initialize families · 489111e5
      Johannes Berg 提交于
      Instead of providing macros/inline functions to initialize
      the families, make all users initialize them statically and
      get rid of the macros.
      
      This reduces the kernel code size by about 1.6k on x86-64
      (with allyesconfig).
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      489111e5
    • J
      genetlink: no longer support using static family IDs · a07ea4d9
      Johannes Berg 提交于
      Static family IDs have never really been used, the only
      use case was the workaround I introduced for those users
      that assumed their family ID was also their multicast
      group ID.
      
      Additionally, because static family IDs would never be
      reserved by the generic netlink code, using a relatively
      low ID would only work for built-in families that can be
      registered immediately after generic netlink is started,
      which is basically only the control family (apart from
      the workaround code, which I also had to add code for so
      it would reserve those IDs)
      
      Thus, anything other than GENL_ID_GENERATE is flawed and
      luckily not used except in the cases I mentioned. Move
      those workarounds into a few lines of code, and then get
      rid of GENL_ID_GENERATE entirely, making it more robust.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a07ea4d9
  22. 18 6月, 2016 1 次提交
  23. 13 5月, 2016 1 次提交