1. 18 Dec, 2019 1 commit
  2. 24 Nov, 2019 1 commit
    •
      idr: Fix idr_get_next race with idr_remove · a16a3669
      Committed by Matthew Wilcox (Oracle)
      commit 5c089fd0c73411f2170ab795c9ffc16718c7d007 upstream.
      
      If the entry is deleted from the IDR between the call to
      radix_tree_iter_find() and rcu_dereference_raw(), idr_get_next()
      will return NULL, which will end the iteration prematurely.  We should
      instead continue to the next entry in the IDR.  This only happens if the
      iteration is protected by the RCU lock.  Most IDR users use a spinlock
      or semaphore to exclude simultaneous modifications.  It was noticed once
      the PID allocator was converted to use the IDR, as it uses the RCU lock,
      but there may be other users elsewhere in the kernel.
      
      We can't use the normal pattern of calling radix_tree_deref_retry()
      (which catches both a retry entry in a leaf node and a node entry in
      the root) as the IDR supports storing entries which are unaligned,
      which will trigger an infinite loop if they are encountered.  Instead,
      we have to explicitly check whether the entry is a retry entry.
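      The skip-instead-of-stop behaviour can be sketched in userspace C. This is a hypothetical model with a flat slot array and a made-up RETRY sentinel, not the kernel's radix-tree code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the fixed lookup loop -- not the kernel
 * code.  A slot holds a live entry, NULL (never allocated), or a "retry"
 * sentinel standing in for an entry deleted between radix_tree_iter_find()
 * and rcu_dereference_raw(). */
#define NSLOTS 8

static int retry_marker;
#define RETRY ((void *)&retry_marker)

/* Return the first live entry at index >= *nextid.  The bug was treating
 * a retry entry like "no more entries" and returning NULL; the fix is to
 * keep scanning, since the retry entry only means this one slot went away. */
static void *model_get_next(void **slots, int *nextid)
{
    for (int i = *nextid; i < NSLOTS; i++) {
        if (slots[i] == NULL || slots[i] == RETRY)
            continue;           /* fixed: continue past a retry entry */
        *nextid = i;
        return slots[i];
    }
    return NULL;                /* genuinely no more entries */
}

/* Demo: entries at indices 1 and 5; index 3 was concurrently deleted. */
static int demo_survives_retry(void)
{
    int a = 1, b = 2;
    void *slots[NSLOTS] = { NULL, &a, NULL, RETRY, NULL, &b, NULL, NULL };
    int next = 2;
    return model_get_next(slots, &next) == &b && next == 5;
}
```

      In the buggy version the iteration starting at index 2 would have ended at the RETRY slot; here it continues and finds the entry at index 5.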
      
      Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
      Reported-by: Brendan Gregg <bgregg@netflix.com>
      Tested-by: Brendan Gregg <bgregg@netflix.com>
      Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      
      a16a3669
  3. 22 Aug, 2018 4 commits
    •
      ida: Change ida_get_new_above to return the id · 1df89519
      Committed by Matthew Wilcox
      This calling convention makes more sense for the implementation as well
      as the callers.  It even shaves 32 bytes off the compiled code size.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      1df89519
    •
      ida: Remove old API · b03f8e43
      Committed by Matthew Wilcox
      Delete ida_pre_get(), ida_get_new(), ida_get_new_above() and ida_remove()
      from the public API.  Some of these functions still exist as internal
      helpers, but they should not be called by consumers.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      b03f8e43
    •
      ida: Add new API · 5ade60dd
      Committed by Matthew Wilcox
      Add ida_alloc(), ida_alloc_min(), ida_alloc_max(), ida_alloc_range()
      and ida_free().  The ida_alloc_max() and ida_alloc_range() functions
      differ from ida_simple_get() in that they take an inclusive 'max'
      parameter instead of an exclusive 'end' parameter.  Callers are about
      evenly split whether they'd like inclusive or exclusive parameters and
      'max' is easier to document than 'end'.
      
      Change the IDA allocation to first attempt to allocate a bit using
      existing memory, and only allocate memory afterwards.  Also change the
      behaviour of 'min' > INT_MAX from being a BUG() to returning -ENOSPC.
      
      Leave compatibility wrappers in place for ida_simple_get() and
      ida_simple_remove() to avoid changing all callers.
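      The inclusive-'max' convention can be illustrated with a toy userspace model (names like toy_ida_alloc_range are made up for illustration; this is not the kernel IDA):

```c
#include <assert.h>
#include <limits.h>

/* Toy model of the new calling convention -- not the kernel IDA.  The
 * point is that 'max' is inclusive (min <= id <= max), unlike the old
 * exclusive 'end', and that min > INT_MAX yields -ENOSPC, not a BUG(). */
#define TOY_ENOSPC 28
#define TOY_RANGE  64
static unsigned char toy_used[TOY_RANGE];

static int toy_ida_alloc_range(unsigned long min, unsigned long max)
{
    if (min > INT_MAX)
        return -TOY_ENOSPC;            /* was a BUG() before this change */
    for (unsigned long id = min; id <= max && id < TOY_RANGE; id++) {
        if (!toy_used[id]) {
            toy_used[id] = 1;
            return (int)id;            /* inclusive: id == max is legal */
        }
    }
    return -TOY_ENOSPC;
}
```

      With the old exclusive convention, asking for exactly one id meant passing end = id + 1; with an inclusive max, min == max asks for that single id directly.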
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      5ade60dd
    •
      ida: Lock the IDA in ida_destroy · 50d97d50
      Committed by Matthew Wilcox
      The user has no need to handle locking between ida_simple_get() and
      ida_simple_remove().  They shouldn't be forced to think about whether
      ida_destroy() might be called at the same time as any of their other
      IDA manipulation calls.  Improve the documentation while I'm in here.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      50d97d50
  4. 08 Jun, 2018 1 commit
  5. 27 Feb, 2018 1 commit
    •
      idr: Fix handling of IDs above INT_MAX · 4b0ad076
      Committed by Matthew Wilcox
      Khalid reported that the kernel selftests are currently failing:
      
      selftests: test_bpf.sh
      ========================================
      test_bpf: [FAIL]
      not ok 1..8 selftests:  test_bpf.sh [FAIL]
      
      He bisected it to 6ce711f2 ("idr: Make
      1-based IDRs more efficient").
      
      The root cause is doing a signed comparison in idr_alloc_u32() instead
      of an unsigned comparison.  I went looking for any similar problems and
      found a couple (which would each result in the failure to warn in two
      situations that aren't supposed to happen).
      
      I knocked up a few test-cases to prove that I was right and added them
      to the test-suite.
      Reported-by: Khalid Aziz <khalid.aziz@oracle.com>
      Tested-by: Khalid Aziz <khalid.aziz@oracle.com>
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      4b0ad076
  6. 22 Feb, 2018 1 commit
  7. 07 Feb, 2018 6 commits
    •
      idr: Make 1-based IDRs more efficient · 6ce711f2
      Committed by Matthew Wilcox
      About 20% of the IDR users in the kernel want the allocated IDs to start
      at 1.  The implementation currently searches all the way down the left
      hand side of the tree, finds no free ID other than ID 0, walks all the
      way back up, and then all the way down again.  This patch 'rebases' the
      ID so we fill the entire radix tree, rather than leave a gap at 0.
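      The rebasing idea can be sketched with a flat array standing in for the radix tree (purely illustrative; the real change manipulates the tree's internal indices):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the 'rebase' idea -- hypothetical, not the kernel code.
 * External ids start at 'base', but internal slot 0 is still the first
 * one used, so the backing structure has no permanently-empty left spine
 * to walk down and back up on every allocation. */
#define REBASE_SLOTS 16
static void *rebase_slots[REBASE_SLOTS];

static int rebased_alloc(void *ptr, unsigned int base)
{
    for (unsigned int i = 0; i < REBASE_SLOTS; i++) {
        if (!rebase_slots[i]) {
            rebase_slots[i] = ptr;
            return (int)(base + i); /* external id = base + internal index */
        }
    }
    return -1;
}

static int rebase_demo(void)
{
    static int obj;
    int first  = rebased_alloc(&obj, 1);
    int second = rebased_alloc(&obj, 1);
    /* ids are 1-based externally, yet internal slot 0 was consumed first */
    return first == 1 && second == 2 && rebase_slots[0] == &obj;
}
```

      The caller still sees ids starting at 1; only the internal placement changes, which is what removes the wasted tree walk.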
      
      Chris Wilson says: "I did the quick hack of allocating index 0 of the
      idr and that eradicated idr_get_free() from being at the top of the
      profiles for the many-object stress tests. This improvement will be
      much appreciated."
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      6ce711f2
    •
      idr: Warn if old iterators see large IDs · 72fd6c7b
      Committed by Matthew Wilcox
      Now that the IDR can be used to store large IDs, it is possible somebody
      might only partially convert their old code and use the iterators which
      can only handle IDs up to INT_MAX.  It's probably unwise to show them a
      truncated ID, so settle for spewing warnings to dmesg, and terminating
      the iteration.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      72fd6c7b
    •
      idr: Rename idr_for_each_entry_ext · 7a457577
      Committed by Matthew Wilcox
      In most places in the kernel where we need to distinguish functions by
      the type of their arguments, we use '_ul' as a suffix for the unsigned
      long variant, not '_ext'.  Also add kernel-doc.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      7a457577
    •
      idr: Remove idr_alloc_ext · 460488c5
      Committed by Matthew Wilcox
      It has no more users, so remove it.  Move idr_alloc() back into idr.c,
      move the guts of idr_alloc_cmn() into idr_alloc_u32(), remove the
      wrappers around idr_get_free_cmn() and rename it to idr_get_free().
      While there is now no interface to allocate IDs larger than a u32,
      the IDR internals remain ready to handle a larger ID should a need arise.
      
      These changes make it possible to provide the guarantee that, if the
      nextid pointer points into the object, the object's ID will be initialised
      before a concurrent lookup can find the object.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      460488c5
    •
      idr: Add idr_alloc_u32 helper · e096f6a7
      Committed by Matthew Wilcox
      All current users of idr_alloc_ext() actually want to allocate a u32
      and idr_alloc_u32() fits their needs better.
      
      Like idr_get_next(), it uses a 'nextid' argument which serves as both
      a pointer to the start ID and the assigned ID (instead of a separate
      minimum and pointer-to-assigned-ID argument).  It uses a 'max' argument
      rather than 'end' because the semantics that idr_alloc has for 'end'
      don't work well for unsigned types.
      
      Since idr_alloc_u32() returns an errno instead of the allocated ID, mark
      it as __must_check to help callers use it correctly.  Include copious
      kernel-doc.  Chris Mi <chrism@mellanox.com> has promised to contribute
      test-cases for idr_alloc_u32.
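      The calling convention can be modelled in userspace (toy_idr_alloc_u32 is a made-up stand-in, not the kernel function):

```c
#include <assert.h>

/* Hypothetical model of the idr_alloc_u32() convention: '*nextid' is the
 * minimum acceptable id on entry and the assigned id on success, 'max' is
 * inclusive, and the return value is 0 or a negative errno -- which is
 * why the real function is marked __must_check. */
#define U32_ENOSPC 28
#define U32_SLOTS  64
static void *u32_store[U32_SLOTS];

static int __attribute__((warn_unused_result))
toy_idr_alloc_u32(void *ptr, unsigned int *nextid, unsigned int max)
{
    for (unsigned int id = *nextid; id <= max && id < U32_SLOTS; id++) {
        if (!u32_store[id]) {
            u32_store[id] = ptr;
            *nextid = id;              /* assigned id reported in-place */
            return 0;
        }
    }
    return -U32_ENOSPC;
}

static int u32_demo(void)
{
    static int obj;
    unsigned int id = 10;              /* start searching at id 10 */
    if (toy_idr_alloc_u32(&obj, &id, 20) != 0)
        return 0;
    return id == 10;                   /* id was updated to the result */
}
```

      Because the id comes back through the pointer rather than the return value, ignoring the return value silently loses the error, hence the __must_check annotation on the real helper.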
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      e096f6a7
    •
      idr: Delete idr_replace_ext function · 234a4624
      Committed by Matthew Wilcox
      Changing idr_replace's 'id' argument to 'unsigned long' works for all
      callers.  Callers which passed a negative ID now get -ENOENT instead of
      -EINVAL.  No callers relied on this error value.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      234a4624
  8. 16 Nov, 2017 1 commit
    •
      mm, truncate: do not check mapping for every page being truncated · c7df8ad2
      Committed by Mel Gorman
      During truncation, the mapping has already been checked for shmem and
      dax so it's known that workingset_update_node is required.
      
      This patch avoids the checks on mapping for each page being truncated.
      In all other cases, a lookup helper is used to determine if
      workingset_update_node() needs to be called.  The one danger is that the
      API is slightly harder to use as calling workingset_update_node directly
      without checking for dax or shmem mappings could lead to surprises.
      However, the API rarely needs to be used and hopefully the comment is
      enough to give people the hint.
      
      sparsetruncate (tiny)
                                    4.14.0-rc4             4.14.0-rc4
                                   oneirq-v1r1        pickhelper-v1r1
      Min          Time      141.00 (   0.00%)      140.00 (   0.71%)
      1st-qrtle    Time      142.00 (   0.00%)      141.00 (   0.70%)
      2nd-qrtle    Time      142.00 (   0.00%)      142.00 (   0.00%)
      3rd-qrtle    Time      143.00 (   0.00%)      143.00 (   0.00%)
      Max-90%      Time      144.00 (   0.00%)      144.00 (   0.00%)
      Max-95%      Time      147.00 (   0.00%)      145.00 (   1.36%)
      Max-99%      Time      195.00 (   0.00%)      191.00 (   2.05%)
      Max          Time      230.00 (   0.00%)      205.00 (  10.87%)
      Amean        Time      144.37 (   0.00%)      143.82 (   0.38%)
      Stddev       Time       10.44 (   0.00%)        9.00 (  13.74%)
      Coeff        Time        7.23 (   0.00%)        6.26 (  13.41%)
      Best99%Amean Time      143.72 (   0.00%)      143.34 (   0.26%)
      Best95%Amean Time      142.37 (   0.00%)      142.00 (   0.26%)
      Best90%Amean Time      142.19 (   0.00%)      141.85 (   0.24%)
      Best75%Amean Time      141.92 (   0.00%)      141.58 (   0.24%)
      Best50%Amean Time      141.69 (   0.00%)      141.31 (   0.27%)
      Best25%Amean Time      141.38 (   0.00%)      140.97 (   0.29%)
      
      As you'd expect, the gain is marginal but it can be detected.  The
      differences in bonnie are all within the noise which is not surprising
      given the impact on the microbenchmark.
      
      radix_tree_update_node_t is a callback for some radix operations that
      optionally passes in a private field.  The only user of the callback is
      workingset_update_node and as it no longer requires a mapping, the
      private field is removed.
      
      Link: http://lkml.kernel.org/r/20171018075952.10627-3-mgorman@techsingularity.net
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c7df8ad2
  9. 04 Oct, 2017 1 commit
  10. 14 Sep, 2017 1 commit
    •
      idr: remove WARN_ON_ONCE() when trying to replace negative ID · a47f68d6
      Committed by Eric Biggers
      IDR only supports non-negative IDs.  There used to be a 'WARN_ON_ONCE(id <
      0)' in idr_replace(), but it was intentionally removed by commit
      2e1c9b28 ("idr: remove WARN_ON_ONCE() on negative IDs").
      
      Then it was added back by commit 0a835c4f ("Reimplement IDR and IDA
      using the radix tree").  However it seems that adding it back was a
      mistake, given that some users such as drm_gem_handle_delete()
      (DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(),
      allowing the WARN_ON_ONCE to be triggered.  drm_gem_handle_delete()
      actually just wants idr_replace() to return an error code if the ID is
      not allocated, including in the case where the ID is invalid (negative).
      
      So once again remove the bogus WARN_ON_ONCE().
      
      This bug was found by syzkaller, which encountered the following
      warning:
      
          WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157
          Kernel panic - not syncing: panic_on_warn set ...
      
          CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2
          Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
          Call Trace:
           fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
           do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
           do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
           do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
           do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
           invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930
          RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157
          RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297
          RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000
          RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78
          RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000
          R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff
          R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8
           drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297
           drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671
           drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729
           drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825
           vfs_ioctl fs/ioctl.c:45 [inline]
           do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
           SYSC_ioctl fs/ioctl.c:700 [inline]
           SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
           entry_SYSCALL_64_fastpath+0x1f/0xbe
      
      Here is a C reproducer:
      
          #include <fcntl.h>
          #include <stddef.h>
          #include <stdint.h>
          #include <sys/ioctl.h>
          #include <drm/drm.h>
      
          int main(void)
          {
                  int cardfd = open("/dev/dri/card0", O_RDONLY);
      
                  ioctl(cardfd, DRM_IOCTL_GEM_CLOSE,
                        &(struct drm_gem_close) { .handle = -1 } );
          }
      
      Link: http://lkml.kernel.org/r/20170906235306.20534-1-ebiggers3@gmail.com
      Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org> [v4.11+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a47f68d6
  11. 31 Aug, 2017 1 commit
  12. 14 Feb, 2017 4 commits
    •
      idr: Add missing __rcu annotations · 7e73eb0b
      Committed by Matthew Wilcox
      Where we use the radix tree iteration macros, we need to annotate 'slot'
      with __rcu.  Make sure we don't forget any new places in the future with
      the same CFLAGS check used for radix-tree.c.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      7e73eb0b
    •
      ida: Use exceptional entries for small IDAs · d37cacc5
      Committed by Matthew Wilcox
      We can use the root entry as a bitmap and save allocating a 128 byte
      bitmap for an IDA that contains only a few entries (30 on a 32-bit
      machine, 62 on a 64-bit machine).  This costs about 300 bytes of kernel
      text on x86-64, so as long as 3 IDAs fall into this category, this
      is a net win for memory consumption.
      
      Thanks to Rasmus Villemoes for his work documenting the problem and
      collecting statistics on IDAs.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      d37cacc5
    •
      ida: Move ida_bitmap to a percpu variable · 7ad3d4d8
      Committed by Matthew Wilcox
      When we preload the IDA, we allocate an IDA bitmap.  Instead of storing
      that preallocated bitmap in the IDA, we store it in a percpu variable.
      Generally there are more IDAs in the system than CPUs, so this cuts down
      on the number of preallocated bitmaps that are unused, and about half
      of the IDA users did not call ida_destroy() so they were leaking IDA
      bitmaps.
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      7ad3d4d8
    •
      Reimplement IDR and IDA using the radix tree · 0a835c4f
      Committed by Matthew Wilcox
      The IDR is very similar to the radix tree.  It has some functionality that
      the radix tree did not have (alloc next free, cyclic allocation, a
      callback-based for_each, destroy tree), which is readily implementable on
      top of the radix tree.  A few small changes were needed in order to use a
      tag to represent nodes with free space below them.  More extensive
      changes were needed to support storing NULL as a valid entry in an IDR.
      Plain radix trees still interpret NULL as a not-present entry.
      
      The IDA is reimplemented as a client of the newly enhanced radix tree.  As
      in the current implementation, it uses a bitmap at the last level of the
      tree.
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      0a835c4f
  13. 13 Dec, 2016 1 commit
  14. 07 Nov, 2015 1 commit
    •
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep... · d0164adc
      Committed by Mel Gorman
      mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd
      
      __GFP_WAIT has been used to identify atomic context in callers that hold
      spinlocks or are in interrupts.  They are expected to be high priority and
      have access to one of two watermarks lower than "min", which can be
      referred to as the "atomic reserve".  __GFP_HIGH users get access to the
      first lower watermark and can be called the "high priority reserve".
      
      Over time, callers had a requirement to not block when fallback options
      were available.  Some have abused __GFP_WAIT, leading to a situation where
      an optimistic allocation with a fallback option can access atomic
      reserves.
      
      This patch uses __GFP_ATOMIC to identify callers that are truly atomic,
      cannot sleep and have no alternative.  High priority users continue to use
      __GFP_HIGH.  __GFP_DIRECT_RECLAIM identifies callers that can sleep and
      are willing to enter direct reclaim.  __GFP_KSWAPD_RECLAIM identifies
      callers that want to wake kswapd for background reclaim.  __GFP_WAIT is
      redefined as a caller that is willing to enter direct reclaim and wake
      kswapd for background reclaim.
      
      This patch then converts a number of sites
      
      o __GFP_ATOMIC is used by callers that are high priority and have memory
        pools for those requests. GFP_ATOMIC uses this flag.
      
      o Callers that have a limited mempool to guarantee forward progress clear
        __GFP_DIRECT_RECLAIM but keep __GFP_KSWAPD_RECLAIM. bio allocations fall
        into this category where kswapd will still be woken but atomic reserves
        are not used as there is a one-entry mempool to guarantee progress.
      
      o Callers that are checking if they are non-blocking should use the
        helper gfpflags_allow_blocking() where possible. This is because
        checking for __GFP_WAIT as was done historically now can trigger false
        positives. Some exceptions like dm-crypt.c exist where the code intent
        is clearer if __GFP_DIRECT_RECLAIM is used instead of the helper due to
        flag manipulations.
      
      o Callers that built their own GFP flags instead of starting with GFP_KERNEL
        and friends now also need to specify __GFP_KSWAPD_RECLAIM.
      
      The first key hazard to watch out for is callers that removed __GFP_WAIT
      and were depending on access to atomic reserves for inconspicuous reasons.
      In some cases it may be appropriate for them to use __GFP_HIGH.
      
      The second key hazard is callers that assembled their own combination of
      GFP flags instead of starting with something like GFP_KERNEL.  They may
      now wish to specify __GFP_KSWAPD_RECLAIM.  It's almost certainly harmless
      if it's missed in most cases as other activity will wake kswapd.
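      The distinction the helper draws can be shown with a toy flag model (the bit values below are illustrative, not the kernel's):

```c
#include <assert.h>

/* Toy flag model -- bit values are made up for illustration.  It shows
 * why gfpflags_allow_blocking() replaces a raw __GFP_WAIT test: only
 * direct reclaim implies the caller may sleep; merely waking kswapd for
 * background reclaim does not block the caller. */
#define T_GFP_ATOMIC          0x1u
#define T_GFP_HIGH            0x2u
#define T_GFP_DIRECT_RECLAIM  0x4u
#define T_GFP_KSWAPD_RECLAIM  0x8u
#define T_GFP_KERNEL_SET (T_GFP_DIRECT_RECLAIM | T_GFP_KSWAPD_RECLAIM)
#define T_GFP_ATOMIC_SET (T_GFP_ATOMIC | T_GFP_HIGH | T_GFP_KSWAPD_RECLAIM)

static int toy_gfpflags_allow_blocking(unsigned int gfp)
{
    return !!(gfp & T_GFP_DIRECT_RECLAIM);
}
```

      A GFP_KERNEL-style mask allows blocking; a GFP_ATOMIC-style mask does not, even though both may wake kswapd.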
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Vitaly Wool <vitalywool@gmail.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d0164adc
  15. 13 Feb, 2015 1 commit
  16. 09 Sep, 2014 1 commit
  17. 09 Aug, 2014 1 commit
    •
      lib/idr.c: fix out-of-bounds pointer dereference · 93b7aca3
      Committed by Andrey Ryabinin
      I'm working on address sanitizer project for kernel.  Recently we
      started experiments with stack instrumentation, to detect out-of-bounds
      read/write bugs on stack.
      
      Just after booting I've hit out-of-bounds read on stack in idr_for_each
      (and in __idr_remove_all as well):
      
      	struct idr_layer **paa = &pa[0];
      
      	while (id >= 0 && id <= max) {
      		...
      		while (n < fls(id)) {
      			n += IDR_BITS;
      			p = *--paa; <--- here we are reading pa[-1] value.
      		}
      	}
      
      Despite the fact that after this dereference we exit the loop and never
      use p, such behaviour is undefined and should be avoided.
      
      Fix this by moving the pointer dereference to the beginning of the loop,
      right before we will use it.
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Alexey Preobrazhensky <preobr@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Konstantin Khlebnikov <koct9i@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      93b7aca3
  18. 07 Jun, 2014 6 commits
  19. 08 Apr, 2014 2 commits
  20. 17 Feb, 2014 1 commit
  21. 04 Jul, 2013 1 commit
  22. 30 Apr, 2013 1 commit
    •
      idr: introduce idr_alloc_cyclic() · 3e6628c4
      Committed by Jeff Layton
      As Tejun points out, there are several users of the IDR facility that
      attempt to use it in a cyclic fashion.  These users are likely to see
      -ENOSPC errors after the counter wraps one or more times however.
      
      This patchset adds a new idr_alloc_cyclic routine and converts several
      of these users to it.  Many of these users are in obscure parts of the
      kernel, and I don't have a good way to test some of them.  The change is
      pretty straightforward though, so hopefully it won't be an issue.
      
      There is one other cyclic user of idr_alloc that I didn't touch in
      ipc/util.c.  That one is doing some strange stuff that I didn't quite
      understand, but it looks like it should probably be converted later
      somehow.
      
      This patch:
      
      Thus spake Tejun Heo:
      
          Ooh, BTW, the cyclic allocation is broken.  It's prone to -ENOSPC
          after the first wraparound.  There are several cyclic users in the
          kernel and I think it probably would be best to implement cyclic
          support in idr.
      
      This patch does that by adding a new idr_alloc_cyclic function that such
      users in the kernel can use.  With this, there's no need for a caller to
      keep track of the last value used as that's now tracked internally.  This
      should prevent the ENOSPC problems that can hit when the "last allocated"
      counter exceeds INT_MAX.
      
      Later patches will convert existing cyclic users to the new interface.
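      The wraparound behaviour can be sketched in userspace over a tiny id space (cyc_alloc is a made-up model, not the kernel's idr_alloc_cyclic):

```c
#include <assert.h>
#include <stddef.h>

/* Userspace sketch of cyclic allocation -- not the kernel code.  The
 * internally-tracked cursor wraps back to the start instead of marching
 * past the end and reporting -ENOSPC while free ids remain. */
#define CYC_ENOSPC 28
#define CYC_N      8
static void *cyc_tbl[CYC_N];
static unsigned int cyc_cursor;

static int cyc_alloc(void *ptr)
{
    for (unsigned int tries = 0; tries < CYC_N; tries++) {
        unsigned int id = cyc_cursor;
        cyc_cursor = (cyc_cursor + 1) % CYC_N;  /* wrap, don't overflow */
        if (!cyc_tbl[id]) {
            cyc_tbl[id] = ptr;
            return (int)id;
        }
    }
    return -CYC_ENOSPC;                         /* genuinely full */
}

static int cyc_demo(void)
{
    static int obj;
    for (int i = 0; i < CYC_N; i++)             /* fill every slot */
        if (cyc_alloc(&obj) < 0)
            return 0;
    if (cyc_alloc(&obj) != -CYC_ENOSPC)         /* full table: -ENOSPC */
        return 0;
    cyc_tbl[2] = NULL;                          /* free id 2 */
    return cyc_alloc(&obj) == 2;                /* wraps around to find it */
}
```

      The broken pattern would give up once the cursor passed the end; here the freed id is found again after wraparound, and -ENOSPC means the space is genuinely exhausted.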
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Or Gerlitz <ogerlitz@mellanox.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Roland Dreier <roland@purestorage.com>
      Cc: Sridhar Samudrala <sri@us.ibm.com>
      Cc: Steve Wise <swise@opengridcomputing.com>
      Cc: Tom Tucker <tom@opengridcomputing.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3e6628c4
  23. 14 Mar, 2013 1 commit
    •
      idr: idr_alloc() shouldn't trigger lowmem warning when preloaded · 59bfbcf0
      Committed by Tejun Heo
      GFP_NOIO is often used for idr_alloc() inside a preloaded section, as the
      allocation mask doesn't really matter.  If the idr tree needs to be
      expanded, idr_alloc() first tries to allocate using the specified
      allocation mask and, if that fails, falls back to the preloaded buffer.
      This order prevents non-preloading idr_alloc() users from taking
      advantage of preloading ones by consuming the preload buffer without
      filling it, which would shift the burden of allocation to the preload
      users.
      
      Unfortunately, this allowed/expected-to-fail kmem_cache allocation ends up
      generating a spurious slab lowmem warning before the request is satisfied
      from the preload buffer.
      
      This patch makes idr_layer_alloc() add __GFP_NOWARN to the first
      kmem_cache attempt and try kmem_cache again w/o __GFP_NOWARN after
      allocation from the preload buffer fails, so that a lowmem warning is
      generated if not suppressed by the original @gfp_mask.
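      The try-quietly/preload/retry-loudly order can be sketched with a stub allocator (all names here are illustrative; the stub always fails, standing in for memory exhaustion):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the allocation order described above -- not the kernel code.
 * The first attempt is silenced with a NOWARN flag; only after the
 * preload buffer is also empty do we retry loudly, so a warning fires
 * just when the caller's own gfp mask would have allowed one. */
#define TOY_NOWARN 0x100u

static int toy_warnings;            /* counts "lowmem warning" emissions */

static void *stub_cache_alloc(unsigned int gfp)
{
    /* Stub: allocations always fail, simulating memory exhaustion. */
    if (!(gfp & TOY_NOWARN))
        toy_warnings++;             /* the spurious-warning site */
    return NULL;
}

static void *toy_layer_alloc(unsigned int gfp, void **preload)
{
    void *p = stub_cache_alloc(gfp | TOY_NOWARN);  /* quiet first try */
    if (p)
        return p;
    if (preload && *preload) {      /* fall back to the preload buffer */
        p = *preload;
        *preload = NULL;
        return p;
    }
    return stub_cache_alloc(gfp);   /* retry, now allowed to warn */
}

static int preload_demo(void)
{
    static int buf;
    void *preload = &buf;
    /* preloaded: request satisfied silently from the buffer */
    if (toy_layer_alloc(0, &preload) != &buf || toy_warnings != 0)
        return 0;
    /* not preloaded: failure may now warn, exactly once */
    return toy_layer_alloc(0, &preload) == NULL && toy_warnings == 1;
}
```

      A preloaded caller gets its layer with no warning at all; a genuine failure without a preload buffer still warns once, preserving the diagnostic.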
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: David Teigland <teigland@redhat.com>
      Tested-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      59bfbcf0