1. 10 Jul 2018, 1 commit
    • rhashtable: add restart routine in rhashtable_free_and_destroy() · 0026129c
      Authored by Taehee Yoo
      rhashtable_free_and_destroy() cancels the deferred rehash work and then
      walks the table, destroying its elements. At that point, however, some
      elements may still be sitting in future_tbl, and those elements are
      never destroyed.
      
      test case:
      nft_rhash_destroy() calls rhashtable_free_and_destroy() to destroy
      all elements of sets before destroying the sets and chains themselves.
      But rhashtable_free_and_destroy() doesn't destroy the elements of
      future_tbl, so a splat occurs.
      
      test script:
         %cat test.nft
         table ip aa {
      	   map map1 {
      		   type ipv4_addr : verdict;
      		   elements = {
      			   0 : jump a0,
      			   1 : jump a0,
      			   2 : jump a0,
      			   3 : jump a0,
      			   4 : jump a0,
      			   5 : jump a0,
      			   6 : jump a0,
      			   7 : jump a0,
      			   8 : jump a0,
      			   9 : jump a0,
      		}
      	   }
      	   chain a0 {
      	   }
         }
         flush ruleset
         table ip aa {
      	   map map1 {
      		   type ipv4_addr : verdict;
      		   elements = {
      			   0 : jump a0,
      			   1 : jump a0,
      			   2 : jump a0,
      			   3 : jump a0,
      			   4 : jump a0,
      			   5 : jump a0,
      			   6 : jump a0,
      			   7 : jump a0,
      			   8 : jump a0,
      			   9 : jump a0,
      		   }
      	   }
      	   chain a0 {
      	   }
         }
         flush ruleset
      
         %while :; do nft -f test.nft; done
      
      Splat looks like:
      [  200.795603] kernel BUG at net/netfilter/nf_tables_api.c:1363!
      [  200.806944] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
      [  200.812253] CPU: 1 PID: 1582 Comm: nft Not tainted 4.17.0+ #24
      [  200.820297] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
      [  200.830309] RIP: 0010:nf_tables_chain_destroy.isra.34+0x62/0x240 [nf_tables]
      [  200.838317] Code: 43 50 85 c0 74 26 48 8b 45 00 48 8b 4d 08 ba 54 05 00 00 48 c7 c6 60 6d 29 c0 48 c7 c7 c0 65 29 c0 4c 8b 40 08 e8 58 e5 fd f8 <0f> 0b 48 89 da 48 b8 00 00 00 00 00 fc ff
      [  200.860366] RSP: 0000:ffff880118dbf4d0 EFLAGS: 00010282
      [  200.866354] RAX: 0000000000000061 RBX: ffff88010cdeaf08 RCX: 0000000000000000
      [  200.874355] RDX: 0000000000000061 RSI: 0000000000000008 RDI: ffffed00231b7e90
      [  200.882361] RBP: ffff880118dbf4e8 R08: ffffed002373bcfb R09: ffffed002373bcfa
      [  200.890354] R10: 0000000000000000 R11: ffffed002373bcfb R12: dead000000000200
      [  200.898356] R13: dead000000000100 R14: ffffffffbb62af38 R15: dffffc0000000000
      [  200.906354] FS:  00007fefc31fd700(0000) GS:ffff88011b800000(0000) knlGS:0000000000000000
      [  200.915533] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  200.922355] CR2: 0000557f1c8e9128 CR3: 0000000106880000 CR4: 00000000001006e0
      [  200.930353] Call Trace:
      [  200.932351]  ? nf_tables_commit+0x26f6/0x2c60 [nf_tables]
      [  200.939525]  ? nf_tables_setelem_notify.constprop.49+0x1a0/0x1a0 [nf_tables]
      [  200.947525]  ? nf_tables_delchain+0x6e0/0x6e0 [nf_tables]
      [  200.952383]  ? nft_add_set_elem+0x1700/0x1700 [nf_tables]
      [  200.959532]  ? nla_parse+0xab/0x230
      [  200.963529]  ? nfnetlink_rcv_batch+0xd06/0x10d0 [nfnetlink]
      [  200.968384]  ? nfnetlink_net_init+0x130/0x130 [nfnetlink]
      [  200.975525]  ? debug_show_all_locks+0x290/0x290
      [  200.980363]  ? debug_show_all_locks+0x290/0x290
      [  200.986356]  ? sched_clock_cpu+0x132/0x170
      [  200.990352]  ? find_held_lock+0x39/0x1b0
      [  200.994355]  ? sched_clock_local+0x10d/0x130
      [  200.999531]  ? memset+0x1f/0x40
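
      The fix adds a restart point so that, after one bucket table has been
      emptied, the walk continues on that table's future_tbl. A paraphrased
      sketch of the reworked rhashtable_free_and_destroy() (not the verbatim
      upstream diff; the per-bucket free loop is elided to a comment):

          void rhashtable_free_and_destroy(struct rhashtable *ht,
                                           void (*free_fn)(void *ptr, void *arg),
                                           void *arg)
          {
                  struct bucket_table *tbl, *next_tbl;
                  unsigned int i;

                  cancel_work_sync(&ht->run_work);

                  mutex_lock(&ht->mutex);
                  tbl = rht_dereference(ht->tbl, ht);
          restart:
                  if (free_fn) {
                          for (i = 0; i < tbl->size; i++) {
                                  /* walk bucket i and call free_fn on
                                   * every element, as before */
                          }
                  }

                  /* New: a pending rehash may have moved elements into
                   * future_tbl; restart on that table instead of leaking
                   * its elements. */
                  next_tbl = rht_dereference(tbl->future_tbl, ht);
                  if (next_tbl) {
                          bucket_table_free(tbl);
                          tbl = next_tbl;
                          goto restart;
                  }

                  bucket_table_free(tbl);
                  mutex_unlock(&ht->mutex);
          }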
      
      V2:
       - free all tables, as requested by Herbert Xu
      Signed-off-by: Taehee Yoo <ap420073@gmail.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 03 Jul 2018, 1 commit
  3. 25 Apr 2018, 3 commits
  4. 01 Apr 2018, 1 commit
  5. 07 Mar 2018, 1 commit
  6. 11 Dec 2017, 3 commits
    • rhashtable: Call library function alloc_bucket_locks · 64e0cd0d
      Authored by Tom Herbert
      To allocate the array of bucket locks for the hash table, we now
      call the library function alloc_bucket_spinlocks. This function is
      based on the old alloc_bucket_locks in rhashtable and should
      produce the same effect.
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: Add rhashtable_walk_peek · 2db54b47
      Authored by Tom Herbert
      This function is like rhashtable_walk_next except that it only returns
      the current element in the iter and does not advance the iter.
      
      This patch also creates __rhashtable_walk_find_next. It finds the next
      element in the table when the entry cached in the iter is NULL or at
      the end of a slot. __rhashtable_walk_find_next is called from
      rhashtable_walk_next and rhashtable_walk_peek.
      
      end_of_table is a new field in the iter structure. It indicates that
      the end of the table was reached (walker.tbl being NULL is not a
      sufficient condition for end of table).
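
      A hedged usage sketch; the setup around the peek is illustrative and
      error handling is trimmed:

          struct rhashtable_iter iter;
          void *obj;

          rhashtable_walk_enter(&ht, &iter);
          rhashtable_walk_start(&iter);

          /* Look at the current element without moving the cursor... */
          obj = rhashtable_walk_peek(&iter);

          /* ...so a caller that could not consume it (say, a full netlink
           * skb) sees the same element again on its next pass. Only
           * rhashtable_walk_next() advances the iter. */
          obj = rhashtable_walk_next(&iter);

          rhashtable_walk_stop(&iter);
          rhashtable_walk_exit(&iter);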
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rhashtable: Change rhashtable_walk_start to return void · 97a6ec4a
      Authored by Tom Herbert
      Most callers of rhashtable_walk_start don't care about a resize event,
      which is indicated by a return value of -EAGAIN. So calls to
      rhashtable_walk_start are wrapped with code to ignore -EAGAIN. Something
      like this is common:
      
             ret = rhashtable_walk_start(rhiter);
             if (ret && ret != -EAGAIN)
                     goto out;
      
      Since zero and -EAGAIN are the only possible return values from the
      function, this check is pointless: the condition never evaluates to true.
      
      This patch changes rhashtable_walk_start to return void. This simplifies
      code for the callers that ignore -EAGAIN. For the few cases where the
      caller cares about the resize event, particularly where the table can be
      walked in multiple parts for a netlink or seq file dump, the function
      rhashtable_walk_start_check has been added; it returns -EAGAIN on a
      resize event.
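
      At a call site the change looks roughly like this (a sketch, reusing
      the rhiter example above):

          /* Before: filtering out -EAGAIN by hand, where the first branch
           * could never be taken anyway. */
          ret = rhashtable_walk_start(rhiter);
          if (ret && ret != -EAGAIN)
                  goto out;

          /* After: the common case is a plain void call... */
          rhashtable_walk_start(rhiter);

          /* ...and multi-part dumps that must notice a resize use the
           * checking variant instead. */
          ret = rhashtable_walk_start_check(rhiter);
          if (ret == -EAGAIN) {
                  /* table resized mid-walk; restart the dump */
          }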
      Signed-off-by: Tom Herbert <tom@quantonium.net>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 20 Sep 2017, 1 commit
  8. 11 Jul 2017, 1 commit
  9. 20 Jun 2017, 1 commit
  10. 09 May 2017, 1 commit
  11. 02 May 2017, 1 commit
  12. 28 Apr 2017, 1 commit
  13. 27 Apr 2017, 2 commits
  14. 19 Apr 2017, 1 commit
  15. 02 Mar 2017, 1 commit
  16. 27 Feb 2017, 2 commits
  17. 18 Feb 2017, 1 commit
    • rhashtable: Add nested tables · da20420f
      Authored by Herbert Xu
      This patch adds code that handles GFP_ATOMIC kmalloc failure on
      insertion.  As we cannot use vmalloc, we solve it by making our
      hash table nested.  That is, we allocate single pages at each level
      and reach our desired table size by nesting them.
      
      When a nested table is created, only a single page is allocated
      at the top-level.  Lower levels are allocated on demand during
      insertion.  Therefore for each insertion to succeed, only two
      (non-consecutive) pages are needed.
      
      After a nested table is created, a rehash will be scheduled in
      order to switch to a vmalloced table as soon as possible.  Also,
      the rehash code will never rehash into a nested table.  If we
      detect a nested table during a rehash, the rehash will be aborted
      and a new rehash will be scheduled.
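
      The layout can be pictured with the two-way union each nested level is
      built from (this mirrors union nested_table in lib/rhashtable.c; the
      fan-out arithmetic below assumes 4 KiB pages and 8-byte pointers and
      is illustrative):

          union nested_table {
                  union nested_table __rcu *table;    /* interior level */
                  struct rhash_head __rcu *bucket;    /* leaf bucket */
          };

          /* One page holds 4096 / 8 = 512 slots, so two levels already
           * address 512 * 512 = 262144 buckets while every allocation
           * stays a single page - hence "only two (non-consecutive)
           * pages" per successful insertion. */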
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 16 Feb 2017, 1 commit
  19. 14 Feb 2017, 1 commit
    • rhashtable: Add nested tables · 40137906
      Authored by Herbert Xu
      This patch adds code that handles GFP_ATOMIC kmalloc failure on
      insertion.  As we cannot use vmalloc, we solve it by making our
      hash table nested.  That is, we allocate single pages at each level
      and reach our desired table size by nesting them.
      
      When a nested table is created, only a single page is allocated
      at the top-level.  Lower levels are allocated on demand during
      insertion.  Therefore for each insertion to succeed, only two
      (non-consecutive) pages are needed.
      
      After a nested table is created, a rehash will be scheduled in
      order to switch to a vmalloced table as soon as possible.  Also,
      the rehash code will never rehash into a nested table.  If we
      detect a nested table during a rehash, the rehash will be aborted
      and a new rehash will be scheduled.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 20 Sep 2016, 1 commit
    • rhashtable: Add rhlist interface · ca26893f
      Authored by Herbert Xu
      The insecure_elasticity setting is an ugly wart brought out by
      users who need to insert duplicate objects (that is, distinct
      objects with identical keys) into the same table.
      
      In fact, those users have a much bigger problem.  Once those
      duplicate objects are inserted, they don't have an interface to
      find them (unless you count the walker interface which walks
      over the entire table).
      
      Some users have resorted to doing a manual walk over the hash
      table which is of course broken because they don't handle the
      potential existence of multiple hash tables.  The result is that
      they will break sporadically when they encounter a hash table
      resize/rehash.
      
      This patch provides a way out for those users, at the expense
      of an extra pointer per object.  Essentially each object is now
      a list of objects carrying the same key.  The hash table will
      only see the lists so nothing changes as far as rhashtable is
      concerned.
      
      To use this new interface, you need to insert a struct rhlist_head
      into your objects instead of struct rhash_head.  While the hash
      table is unchanged, for type-safety you'll need to use struct
      rhltable instead of struct rhashtable.  All the existing interfaces
      have been duplicated for rhlist, including the hash table walker.
      
      One missing feature is nulls marking because AFAIK the only potential
      user of it does not need duplicate objects.  Should anyone need
      this it shouldn't be too hard to add.
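
      A hedged sketch of how a user migrates to the new interface; the
      object type, params and consumer below are illustrative fragments:

          struct my_obj {
                  u32 key;
                  struct rhlist_head list;    /* was: struct rhash_head */
          };

          static struct rhltable ht;          /* was: struct rhashtable */

          /* insertion no longer needs insecure_elasticity */
          rhltable_init(&ht, &my_params);
          rhltable_insert(&ht, &obj->list, my_params);

          /* and duplicates are finally findable: walk every object that
           * carries the same key (assume struct my_obj *obj and u32 key
           * are in scope) */
          struct rhlist_head *pos, *head;

          rcu_read_lock();
          head = rhltable_lookup(&ht, &key, my_params);
          rhl_for_each_entry_rcu(obj, pos, head, list)
                  consume(obj);               /* consume(): illustrative */
          rcu_read_unlock();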
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 27 Aug 2016, 1 commit
  22. 26 Aug 2016, 1 commit
    • rhashtable: add rhashtable_lookup_get_insert_key() · 5ca8cc5b
      Authored by Pablo Neira Ayuso
      This patch modifies __rhashtable_insert_fast() so it returns the
      existing object that clashes with the one that you want to insert.
      If the object is successfully inserted, NULL is returned.
      Otherwise, you get an error via ERR_PTR().
      
      This patch adapts the existing callers of __rhashtable_insert_fast()
      so they handle this new logic, and it adds a new
      rhashtable_lookup_get_insert_key() interface to fetch this existing
      object.
      
      nf_tables needs this change to improve handling of EEXIST cases via
      honoring the NLM_F_EXCL flag and by checking if the data part of the
      mapping matches what we have.
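
      A hedged sketch of the resulting calling convention (my_elem, new and
      params are illustrative):

          struct my_elem *old;
          int err = 0;

          old = rhashtable_lookup_get_insert_key(&ht, &key, &new->node,
                                                 params);
          if (old == NULL) {
                  /* inserted; no element with this key existed */
          } else if (IS_ERR(old)) {
                  err = PTR_ERR(old);  /* e.g. -ENOMEM */
          } else {
                  /* 'old' is the clashing element: nf_tables can now
                   * honour NLM_F_EXCL or compare the mapping's data */
          }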
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
  23. 20 Aug 2016, 1 commit
    • rhashtable: Remove GFP flag from rhashtable_walk_init · 246779dd
      Authored by Herbert Xu
      The commit 8f6fd83c ("rhashtable:
      accept GFP flags in rhashtable_walk_init") added a GFP flag argument
      to rhashtable_walk_init because some users wish to use the walker
      in an unsleepable context.
      
      In fact we don't need to allocate memory in rhashtable_walk_init
      at all.  The walker is always paired with an iterator so we could
      just stash ourselves there.
      
      This patch does that by introducing a new enter function to replace
      the existing init function.  This way we don't have to churn all
      the existing users again.
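
      A hedged sketch of a walk after this change (note that
      rhashtable_walk_start still returned int at this point; it only became
      void later, in 97a6ec4a, listed above):

          struct rhashtable_iter iter;
          void *obj;
          int err;

          rhashtable_walk_enter(&ht, &iter);  /* no allocation, no GFP flag */

          err = rhashtable_walk_start(&iter);
          if (err && err != -EAGAIN)
                  goto exit;

          while ((obj = rhashtable_walk_next(&iter)) != NULL) {
                  if (IS_ERR(obj))
                          continue;   /* -EAGAIN: table resized mid-walk */
                  /* use obj */
          }

          rhashtable_walk_stop(&iter);
          exit:
          rhashtable_walk_exit(&iter);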
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  24. 16 Aug 2016, 1 commit
    • rhashtable: fix shift by 64 when shrinking · 12311959
      Authored by Vegard Nossum
      I got this:
      
          ================================================================================
          UBSAN: Undefined behaviour in ./include/linux/log2.h:63:13
          shift exponent 64 is too large for 64-bit type 'long unsigned int'
          CPU: 1 PID: 721 Comm: kworker/1:1 Not tainted 4.8.0-rc1+ #87
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
          Workqueue: events rht_deferred_worker
           0000000000000000 ffff88011661f8d8 ffffffff82344f50 0000000041b58ab3
           ffffffff84f98000 ffffffff82344ea4 ffff88011661f900 ffff88011661f8b0
           0000000000000001 ffff88011661f6b8 dffffc0000000000 ffffffff867f7640
          Call Trace:
           [<ffffffff82344f50>] dump_stack+0xac/0xfc
           [<ffffffff82344ea4>] ? _atomic_dec_and_lock+0xc4/0xc4
           [<ffffffff8242f5b8>] ubsan_epilogue+0xd/0x8a
           [<ffffffff82430c41>] __ubsan_handle_shift_out_of_bounds+0x255/0x29a
           [<ffffffff824309ec>] ? __ubsan_handle_out_of_bounds+0x180/0x180
           [<ffffffff84003436>] ? nl80211_req_set_reg+0x256/0x2f0
           [<ffffffff812112ba>] ? print_context_stack+0x8a/0x160
           [<ffffffff81200031>] ? amd_pmu_reset+0x341/0x380
           [<ffffffff823af808>] rht_deferred_worker+0x1618/0x1790
           [<ffffffff823af808>] ? rht_deferred_worker+0x1618/0x1790
           [<ffffffff823ae1f0>] ? rhashtable_jhash2+0x370/0x370
           [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970
           [<ffffffff8134c1cf>] process_one_work+0x79f/0x1970
           [<ffffffff8134c12d>] ? process_one_work+0x6fd/0x1970
           [<ffffffff8134ba30>] ? try_to_grab_pending+0x4c0/0x4c0
           [<ffffffff8134d564>] ? worker_thread+0x1c4/0x1340
           [<ffffffff8134d8ff>] worker_thread+0x55f/0x1340
           [<ffffffff845e904f>] ? __schedule+0x4df/0x1d40
           [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970
           [<ffffffff8134d3a0>] ? process_one_work+0x1970/0x1970
           [<ffffffff813642f7>] kthread+0x237/0x390
           [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280
           [<ffffffff845f8c93>] ? _raw_spin_unlock_irq+0x33/0x50
           [<ffffffff845f95df>] ret_from_fork+0x1f/0x40
           [<ffffffff813640c0>] ? __kthread_parkme+0x280/0x280
          ================================================================================
      
      roundup_pow_of_two() is undefined when called with an argument of 0, so
      let's avoid the call and just fall back to ht->p.min_size (which should
      never be smaller than HASH_MIN_SIZE).
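
      A paraphrased sketch of the guard in the shrink path (the helper name
      shrink_size() is illustrative; the real change lives in
      rhashtable_shrink() in lib/rhashtable.c):

          static unsigned int shrink_size(struct rhashtable *ht,
                                          unsigned int nelems)
          {
                  unsigned int size = 0;

                  /* roundup_pow_of_two(0) is undefined behaviour, so
                   * only compute it for a non-empty table */
                  if (nelems)
                          size = roundup_pow_of_two(nelems * 3 / 2);
                  if (size < ht->p.min_size)
                          size = ht->p.min_size;

                  return size;
          }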
      
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
      Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 15 Aug 2016, 1 commit
    • rhashtable: avoid large lock-array allocations · 4cf0b354
      Authored by Florian Westphal
      Sander reports the following splat after the netfilter nat bysrc table
      got converted to rhashtable:
      
      swapper/0: page allocation failure: order:3, mode:0x2084020(GFP_ATOMIC|__GFP_COMP)
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.8.0-rc1 [..]
       [<ffffffff811633ed>] warn_alloc_failed+0xdd/0x140
       [<ffffffff811638b1>] __alloc_pages_nodemask+0x3e1/0xcf0
       [<ffffffff811a72ed>] alloc_pages_current+0x8d/0x110
       [<ffffffff8117cb7f>] kmalloc_order+0x1f/0x70
       [<ffffffff811aec19>] __kmalloc+0x129/0x140
       [<ffffffff8146d561>] bucket_table_alloc+0xc1/0x1d0
       [<ffffffff8146da1d>] rhashtable_insert_rehash+0x5d/0xe0
       [<ffffffff819fcfff>] nf_nat_setup_info+0x2ef/0x400
      
      The failure happens when allocating the spinlock array.
      Even with GFP_KERNEL it's unlikely for such a large allocation
      to succeed.
      
      Thomas Graf pointed me at inet_ehash_locks_alloc(), so in addition
      to adding NOWARN for atomic allocations, this also makes the
      bucket-array sizing more conservative.
      
      In commit 095dc8e0 ("tcp: fix/cleanup inet_ehash_locks_alloc()"),
      Eric Dumazet says: "Budget 2 cache lines per cpu worth of 'spinlocks'".
      IOW, consider the size needed by a single spinlock when determining the
      number of locks per cpu. With 64 bytes per cacheline and 4 bytes per
      spinlock this gives 32 locks per cpu.
      
      Resulting size of the lock-array (sizeof(spinlock) == 4):
      
      cpus:    1   2   4   8   16   32   64
      old:    1k  1k  4k  8k  16k  16k  16k
      new:   128 256 512  1k   2k   4k   8k
      
      An 8k allocation should have a decent chance of success even
      with GFP_ATOMIC, and should not fail with GFP_KERNEL.
      
      With 72-byte spinlock (LOCKDEP):
      cpus :   1   2
      old:    9k 18k
      new:   ~2k ~4k
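
      The arithmetic behind the tables, as a sketch (the variable names are
      illustrative; the kernel does the equivalent when sizing the
      lock array in lib/rhashtable.c):

          /* budget two cache lines of spinlocks per cpu */
          unsigned int locks_per_cpu = 2 * L1_CACHE_BYTES / sizeof(spinlock_t);
                                          /* 2 * 64 / 4 = 32 without LOCKDEP */
          unsigned int nr_locks = min(max_requested,
                                      num_online_cpus() * locks_per_cpu);

          /* keep it a power of two so picking a lock stays "hash & mask" */
          nr_locks = rounddown_pow_of_two(nr_locks);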
      Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
      Suggested-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  26. 05 Apr 2016, 1 commit
  27. 19 Dec 2015, 1 commit
  28. 17 Dec 2015, 1 commit
    • rhashtable: Fix walker list corruption · c6ff5268
      Authored by Herbert Xu
      The commit ba7c95ea ("rhashtable:
      Fix sleeping inside RCU critical section in walk_stop") introduced
      a new spinlock for the walker list.  However, it did not convert
      all existing users of the list over to the new spin lock.  Some
      continued to use the old mutex for this purpose.  This obviously
      led to corruption of the list.
      
      The fix is to use the spin lock everywhere where we touch the list.
      
      This also allows us to do rcu_read_lock before we take the lock in
      rhashtable_walk_start.  With the old mutex this would've deadlocked
      but it's safe with the new spin lock.
      
      Fixes: ba7c95ea ("rhashtable: Fix sleeping inside RCU...")
      Reported-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 16 Dec 2015, 1 commit
  30. 09 Dec 2015, 1 commit
  31. 06 Dec 2015, 1 commit
  32. 05 Dec 2015, 2 commits
  33. 23 Sep 2015, 1 commit