1. 26 1月, 2011 1 次提交
    • T
      radix_tree: radix_tree_gang_lookup_tag_slot() may never return · ac15ee69
      Toshiyuki Okajima 提交于
      Executed command: fsstress -d /mnt -n 600 -p 850
      
        crash> bt
        PID: 7947   TASK: ffff880160546a70  CPU: 0   COMMAND: "fsstress"
         #0 [ffff8800dfc07d00] machine_kexec at ffffffff81030db9
         #1 [ffff8800dfc07d70] crash_kexec at ffffffff810a7952
         #2 [ffff8800dfc07e40] oops_end at ffffffff814aa7c8
         #3 [ffff8800dfc07e70] die_nmi at ffffffff814aa969
         #4 [ffff8800dfc07ea0] do_nmi_callback at ffffffff8102b07b
         #5 [ffff8800dfc07f10] do_nmi at ffffffff814aa514
         #6 [ffff8800dfc07f50] nmi at ffffffff814a9d60
            [exception RIP: __lookup_tag+100]
            RIP: ffffffff812274b4  RSP: ffff88016056b998  RFLAGS: 00000287
            RAX: 0000000000000000  RBX: 0000000000000002  RCX: 0000000000000006
            RDX: 000000000000001d  RSI: ffff88016056bb18  RDI: ffff8800c85366e0
            RBP: ffff88016056b9c8   R8: ffff88016056b9e8   R9: 0000000000000000
            R10: 000000000000000e  R11: ffff8800c8536908  R12: 0000000000000010
            R13: 0000000000000040  R14: ffffffffffffffc0  R15: ffff8800c85366e0
            ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
        <NMI exception stack>
         #7 [ffff88016056b998] __lookup_tag at ffffffff812274b4
         #8 [ffff88016056b9d0] radix_tree_gang_lookup_tag_slot at ffffffff81227605
         #9 [ffff88016056ba20] find_get_pages_tag at ffffffff810fc110
        #10 [ffff88016056ba80] pagevec_lookup_tag at ffffffff81105e85
        #11 [ffff88016056baa0] write_cache_pages at ffffffff81104c47
        #12 [ffff88016056bbd0] generic_writepages at ffffffff81105014
        #13 [ffff88016056bbe0] do_writepages at ffffffff81105055
        #14 [ffff88016056bbf0] __filemap_fdatawrite_range at ffffffff810fb2cb
        #15 [ffff88016056bc40] filemap_write_and_wait_range at ffffffff810fb32a
        #16 [ffff88016056bc70] generic_file_direct_write at ffffffff810fb3dc
        #17 [ffff88016056bce0] __generic_file_aio_write at ffffffff810fcee5
        #18 [ffff88016056bda0] generic_file_aio_write at ffffffff810fd085
        #19 [ffff88016056bdf0] do_sync_write at ffffffff8114f9ea
        #20 [ffff88016056bf00] vfs_write at ffffffff8114fcf8
        #21 [ffff88016056bf30] sys_write at ffffffff81150691
        #22 [ffff88016056bf80] system_call_fastpath at ffffffff8100c0b2
      
      I think this root cause is the following:
      
       radix_tree_range_tag_if_tagged() always tags the root tag with settag
       if the root tag is set with iftag even if there are no iftag tags
       in the specified range (Of course, there are some iftag tags
       outside the specified range).
      
      ===============================================================================
      [[[Detailed description]]]
      
      (1) Why cannot radix_tree_gang_lookup_tag_slot() return forever?
      
      __lookup_tag():
       - Return with 0.
       - Return with the index which is not bigger than the old one as the
         input parameter.
      
      Therefore the following "while" repeats forever because the above
      conditions cause "ret" not to be updated and the cur_index cannot be
      changed into the bigger one.
      
      (So, radix_tree_gang_lookup_tag_slot() cannot return forever.)
      
      radix_tree_gang_lookup_tag_slot():
      1178         while (ret < max_items) {
      1179                 unsigned int slots_found;
      1180                 unsigned long next_index;       /* Index of next search */
      1181
      1182                 if (cur_index > max_index)
      1183                         break;
      1184                 slots_found = __lookup_tag(node, results + ret,
      1185                                 cur_index, max_items - ret, &next_index,
      tag);
      1186                 ret += slots_found;
      			// cannot update ret because slots_found == 0.
      			// so, this while loops forever.
      1187                 if (next_index == 0)
      1188                         break;
      1189                 cur_index = next_index;
      1190         }
      
      (2) Why does __lookup_tag() return with 0 and doesn't update the index?
      
      Assuming the following:
        - the one of the slot in radix_tree_node is NULL.
        - the one of the tag which corresponds to the slot sets with
          PAGECACHE_TAG_TOWRITE or other.
        - In a certain height(!=0), the corresponding index is 0.
      
      a) __lookup_tag() notices that the tag is set.
      
      1005 static unsigned int
      1006 __lookup_tag(struct radix_tree_node *slot, void ***results, unsigned long index,
      1007         unsigned int max_items, unsigned long *next_index, unsigned int tag)
      1008 {
      1009         unsigned int nr_found = 0;
      1010         unsigned int shift, height;
      1011
      1012         height = slot->height;
      1013         if (height == 0)
      1014                 goto out;
      1015         shift = (height-1) * RADIX_TREE_MAP_SHIFT;
      1016
      1017         while (height > 0) {
      1018                 unsigned long i = (index >> shift) & RADIX_TREE_MAP_MASK ;
      1019
      1020                 for (;;) {
      1021                         if (tag_get(slot, tag, i))
      1022                                 break;
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      * the index is not updated yet.
      
      b) __lookup_tag() notices that the slot is NULL.
      
      1023                         index &= ~((1UL << shift) - 1);
      1024                         index += 1UL << shift;
      1025                         if (index == 0)
      1026                                 goto out;       /* 32-bit wraparound */
      1027                         i++;
      1028                         if (i == RADIX_TREE_MAP_SIZE)
      1029                                 goto out;
      1030                 }
      1031                 height--;
      1032                 if (height == 0) {      /* Bottom level: grab some items */
      ...
      1055                 }
      1056                 shift -= RADIX_TREE_MAP_SHIFT;
      1057                 slot = rcu_dereference_raw(slot->slots[i]);
      1058                 if (slot == NULL)
      1059                         break;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      c) __lookup_tag() doesn't update the index and return with 0.
      
      1060         }
      1061 out:
      1062         *next_index = index;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      1063         return nr_found;
      1064 }
      
      (3) Why is the slot NULL even if the tag is set?
      
      Because radix_tree_range_tag_if_tagged() always sets the root tag with
      PAGECACHE_TAG_TOWRITE if the root tag is set with PAGECACHE_TAG_DIRTY,
      even if there is no tag which can be set with PAGECACHE_TAG_TOWRITE
      in the specified range (from *first_indexp to last_index). Of course,
      some PAGECACHE_TAG_DIRTY nodes must exist outside the specified range.
      (radix_tree_range_tag_if_tagged() is called only from tag_pages_for_writeback())
      
       640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root
      *root,
       641                 unsigned long *first_indexp, unsigned long last_index,
       642                 unsigned long nr_to_tag,
       643                 unsigned int iftag, unsigned int settag)
       644 {
       645         unsigned int height = root->height;
       646         struct radix_tree_path path[height];
       647         struct radix_tree_path *pathp = path;
       648         struct radix_tree_node *slot;
       649         unsigned int shift;
       650         unsigned long tagged = 0;
       651         unsigned long index = *first_indexp;
       652
       653         last_index = min(last_index, radix_tree_maxindex(height));
       654         if (index > last_index)
       655                 return 0;
       656         if (!nr_to_tag)
       657                 return 0;
       658         if (!root_tag_get(root, iftag)) {
       659                 *first_indexp = last_index + 1;
       660                 return 0;
       661         }
       662         if (height == 0) {
       663                 *first_indexp = last_index + 1;
       664                 root_tag_set(root, settag);
       665                 return 1;
       666         }
      ...
       733         root_tag_set(root, settag);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       734         *first_indexp = index;
       735
       736         return tagged;
       737 }
      
      As the result, there is no radix_tree_node which is set with
      PAGECACHE_TAG_TOWRITE but the root tag(radix_tree_root) is set with
      PAGECACHE_TAG_TOWRITE.
      
      [figure: inside radix_tree]
      (Please see the figure with typewriter font)
      ===========================================
                [roottag = DIRTY]
                       |             tag=0:NOTHING
               tag[0 0 0 1]              1:DIRTY
                  [x x x +]              2:WRITEBACK
                         |               3:DIRTY,WRITEBACK
                         p               4:TOWRITE
                   <--->                 5:DIRTY,TOWRITE ...
           specified range (index: 0 to 2)
      
      * There is no DIRTY tag within the specified range.
       (But there is a DIRTY tag outside that range.)
      
                  | | | | | | | | |
          after calling tag_pages_for_writeback()
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |                 p is "page".
               tag[0 0 0 1]              x is NULL.
                  [x x x +]              +- is a pointer to "page".
                         |
                         p
      
      * But TOWRITE tag is set on the root tag.
      ============================================
      
      After that, radix_tree_extend() via radix_tree_insert() is called
      when the page is added.
      This function sets the new radix_tree_node with PAGECACHE_TAG_TOWRITE
      to succeed the status of the root tag.
      
       246 static int radix_tree_extend(struct radix_tree_root *root, unsigned long
      index)
       247 {
       248         struct radix_tree_node *node;
       249         unsigned int height;
       250         int tag;
       251
       252         /* Figure out what the height should be.  */
       253         height = root->height + 1;
       254         while (index > radix_tree_maxindex(height))
       255                 height++;
       256
       257         if (root->rnode == NULL) {
       258                 root->height = height;
       259                 goto out;
       260         }
       261
       262         do {
       263                 unsigned int newheight;
       264                 if (!(node = radix_tree_node_alloc(root)))
       265                         return -ENOMEM;
       266
       267                 /* Increase the height.  */
       268                 node->slots[0] = radix_tree_indirect_to_ptr(root->rnode);
       269
       270                 /* Propagate the aggregated tag info into the new root */
       271                 for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
       272                         if (root_tag_get(root, tag))
       273                                 tag_set(node, tag, 0);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       274                 }
      
      ===========================================
                [roottag = DIRTY,TOWRITE]
                       |     :
               tag[0 0 0 1] [0 0 0 0]
                  [x x x +] [+ x x x]
                         |   |
                         p   p (new page)
      
                  | | | | | | | | |
          after calling radix_tree_insert
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |
               tag [5 0 0 0]    *  DIRTY and TOWRITE tags are
                   [+ + x x]       succeeded to the new node.
                    | |
        tag [0 0 0 1] [0 0 0 0]
            [x x x +] [+ x x x]
                   |   |
                   p   p
      ============================================
      
      After that, the index 3 page is released by remove_from_page_cache().
      Then we can make the situation that the tag is set with PAGECACHE_TAG_TOWRITE
      and that the slot which corresponds to the tag is NULL.
      ===========================================
                [roottag = DIRTY,TOWRITE]
                       |
               tag [5 0 0 0]
                   [+ + x x]
                    | |
        tag [0 0 0 1] [0 0 0 0]
            [x x x +] [+ x x x]
                   |   |
                   p   p
               (remove)
      
                  | | | | | | | | |
          after calling remove_page_cache
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |
               tag [4 0 0 0]      * Only DIRTY tag is cleared
                   [x + x x]        because no TOWRITE tag is existed
                      |             in the bottom node.
                      [0 0 0 0]
                      [+ x x x]
                       |
                       p
      ============================================
      
      To solve this problem
      
      Change to that radix_tree_tag_if_tagged() doesn't tag the root tag
      if it doesn't set any tags within the specified range.
      
      Like this.
      ============================================
       640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root
      *root,
       641                 unsigned long *first_indexp, unsigned long last_index,
       642                 unsigned long nr_to_tag,
       643                 unsigned int iftag, unsigned int settag)
       644 {
       650         unsigned long tagged = 0;
      ...
       733 	     if (tagged)
      ^^^^^^^^^^^^^^^^^^^^^^^^
       734            root_tag_set(root, settag);
       735         *first_indexp = index;
       736
       737         return tagged;
       738 }
      
      ============================================
      Signed-off-by: NToshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac15ee69
  2. 12 11月, 2010 1 次提交
    • N
      radix-tree: fix RCU bug · 27d20fdd
      Nick Piggin 提交于
      Salman Qazi describes the following radix-tree bug:
      
      In the following case, we get can get a deadlock:
      
      0.  The radix tree contains two items, one has the index 0.
      1.  The reader (in this case find_get_pages) takes the rcu_read_lock.
      2.  The reader acquires slot(s) for item(s) including the index 0 item.
      3.  The non-zero index item is deleted, and as a consequence the other item is
          moved to the root of the tree. The place where it used to be is queued for
          deletion after the readers finish.
      3b. The zero item is deleted, removing it from the direct slot, it remains in
          the rcu-delayed indirect node.
      4.  The reader looks at the index 0 slot, and finds that the page has 0 ref
          count
      5.  The reader looks at it again, hoping that the item will either be freed or
          the ref count will increase. This never happens, as the slot it is looking
          at will never be updated. Also, this slot can never be reclaimed because
          the reader is holding rcu_read_lock and is in an infinite loop.
      
      The fix is to re-use the same "indirect" pointer case that requires a slot
      lookup retry into a general "retry the lookup" bit.
      Signed-off-by: NNick Piggin <npiggin@kernel.dk>
      Reported-by: NSalman Qazi <sqazi@google.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27d20fdd
  3. 23 8月, 2010 2 次提交
    • D
      radix-tree: radix_tree_range_tag_if_tagged() can set incorrect tags · 144dcfc0
      Dave Chinner 提交于
      Commit ebf8aa44 ("radix-tree:
      omplement function radix_tree_range_tag_if_tagged") does not safely
      set tags on on intermediate tree nodes. The code walks down the tree
      setting tags before it has fully resolved the path to the leaf under
      the assumption there will be a leaf slot with the tag set in the
      range it is searching.
      
      Unfortunately, this is not a valid assumption - we can abort after
      setting a tag on an intermediate node if we overrun the number of
      tags we are allowed to set in a batch, or stop scanning because we
      we have passed the last scan index before we reach a leaf slot with
      the tag we are searching for set.
      
      As a result, we can leave the function with tags set on intemediate
      nodes which can be tripped over later by tag-based lookups. The
      result of these stale tags is that lookup may end prematurely or
      livelock because the lookup cannot make progress.
      
      The fix for the problem involves reocrding the traversal path we
      take to the leaf nodes, and only propagating the tags back up the
      tree once the tag is set in the leaf node slot. We are already
      recording the path for efficient traversal, so there is no
      additional overhead to do the intermediately node tag setting in
      this manner.
      
      This fixes a radix tree lookup livelock triggered by the new
      writeback sync livelock avoidance code introduced in commit
      f446daae ("mm: implement writeback
      livelock avoidance using page tagging").
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Acked-by: NJan Kara <jack@suse.cz>
      144dcfc0
    • D
      radix-tree: clear all tags in radix_tree_node_rcu_free · b6dd0865
      Dave Chinner 提交于
      Commit f446daae ("mm: implement
      writeback livelock avoidance using page tagging") introduced a new
      radix tree tag, increasing the number of tags in each node from 2 to
      3. It did not, however, fix up the code in
      radix_tree_node_rcu_free() that cleans up after radix_tree_shrink()
      and hence could leave stray tags set in the new tag array.
      
      The result is that the livelock avoidance code added in the the
      above commit would hit stale tags when doing tag based lookups,
      resulting in livelocks when trying to traverse the tree.
      
      Fix this problem in radix_tree_node_rcu_free() so it doesn't happen
      again in the future by using a loop to walk all the tags up to
      RADIX_TREE_MAX_TAGS to clear the stray tags radix_tree_shrink()
      leaves behind.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Acked-by: NNick Piggin <npiggin@kernel.dk>
      Acked-by: NJan Kara <jack@suse.cz>
      b6dd0865
  4. 21 8月, 2010 1 次提交
  5. 20 8月, 2010 1 次提交
  6. 10 8月, 2010 1 次提交
  7. 28 5月, 2010 1 次提交
  8. 10 4月, 2010 1 次提交
    • D
      radix_tree_tag_get() is not as safe as the docs make out [ver #2] · ce82653d
      David Howells 提交于
      radix_tree_tag_get() is not safe to use concurrently with radix_tree_tag_set()
      or radix_tree_tag_clear().  The problem is that the double tag_get() in
      radix_tree_tag_get():
      
      		if (!tag_get(node, tag, offset))
      			saw_unset_tag = 1;
      		if (height == 1) {
      			int ret = tag_get(node, tag, offset);
      
      may see the value change due to the action of set/clear.  RCU is no protection
      against this as no pointers are being changed, no nodes are being replaced
      according to a COW protocol - set/clear alter the node directly.
      
      The documentation in linux/radix-tree.h, however, says that
      radix_tree_tag_get() is an exception to the rule that "any function modifying
      the tree or tags (...) must exclude other modifications, and exclude any
      functions reading the tree".
      
      The problem is that the next statement in radix_tree_tag_get() checks that the
      tag doesn't vary over time:
      
      			BUG_ON(ret && saw_unset_tag);
      
      This has been seen happening in FS-Cache:
      
      	https://www.redhat.com/archives/linux-cachefs/2010-April/msg00013.html
      
      To this end, remove the BUG_ON() from radix_tree_tag_get() and note in various
      comments that the value of the tag may change whilst the RCU read lock is held,
      and thus that the return value of radix_tree_tag_get() may not be relied upon
      unless radix_tree_tag_set/clear() and radix_tree_delete() are excluded from
      running concurrently with it.
      Reported-by: NRomain DEGEZ <romain.degez@smartjog.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ce82653d
  9. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  10. 25 2月, 2010 1 次提交
    • P
      radix-tree: Disable RCU lockdep checking in radix tree · 2676a58c
      Paul E. McKenney 提交于
      Because the radix tree is used with many different locking
      designs, we cannot do any effective checking without changing
      the radix-tree APIs. It might make sense to do this later, but
      only if the RCU lockdep checking proves itself sufficiently
      valuable.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <1266887105-1528-10-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      2676a58c
  11. 20 11月, 2009 2 次提交
    • D
      FS-Cache: Don't delete pending pages from the page-store tracking tree · 285e728b
      David Howells 提交于
      Don't delete pending pages from the page-store tracking tree, but rather send
      them for another write as they've presumably been updated.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      285e728b
    • D
      FS-Cache: Use radix tree preload correctly in tracking of pages to be stored · b34df792
      David Howells 提交于
      __fscache_write_page() attempts to load the radix tree preallocation pool for
      the CPU it is on before calling radix_tree_insert(), as the insertion must be
      done inside a pair of spinlocks.
      
      Use of the preallocation pool, however, is contingent on the radix tree being
      initialised without __GFP_WAIT specified.  __fscache_acquire_cookie() was
      passing GFP_NOFS to INIT_RADIX_TREE() - but that includes __GFP_WAIT.
      
      The solution is to AND out __GFP_WAIT.
      
      Additionally, the banner comment to radix_tree_preload() is altered to make
      note of this prerequisite.  Possibly there should be a WARN_ON() too.
      
      Without this fix, I have seen the following recursive deadlock caused by
      radix_tree_insert() attempting to allocate memory inside the spinlocked
      region, which resulted in FS-Cache being called back into to release memory -
      which required the spinlock already held.
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.32-rc6-cachefs #24
      ---------------------------------------------
      nfsiod/7916 is trying to acquire lock:
       (&cookie->lock){+.+.-.}, at: [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache]
      
      but task is already holding lock:
       (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache]
      
      other info that might help us debug this:
      5 locks held by nfsiod/7916:
       #0:  (nfsiod){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2
       #1:  (&task->u.tk_work#2){+.+.+.}, at: [<ffffffff81048290>] worker_thread+0x19a/0x2e2
       #2:  (&cookie->lock){+.+.-.}, at: [<ffffffffa0076acc>] __fscache_write_page+0x15c/0x3f3 [fscache]
       #3:  (&object->lock#2){+.+.-.}, at: [<ffffffffa0076b07>] __fscache_write_page+0x197/0x3f3 [fscache]
       #4:  (&cookie->stores_lock){+.+...}, at: [<ffffffffa0076b0f>] __fscache_write_page+0x19f/0x3f3 [fscache]
      
      stack backtrace:
      Pid: 7916, comm: nfsiod Not tainted 2.6.32-rc6-cachefs #24
      Call Trace:
       [<ffffffff8105ac7f>] __lock_acquire+0x1649/0x16e3
       [<ffffffff81059ded>] ? __lock_acquire+0x7b7/0x16e3
       [<ffffffff8100e27d>] ? dump_trace+0x248/0x257
       [<ffffffff8105ad70>] lock_acquire+0x57/0x6d
       [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache]
       [<ffffffff8135467c>] _spin_lock+0x2c/0x3b
       [<ffffffffa0076872>] ? __fscache_uncache_page+0xdb/0x160 [fscache]
       [<ffffffffa0076872>] __fscache_uncache_page+0xdb/0x160 [fscache]
       [<ffffffffa0077eb7>] ? __fscache_check_page_write+0x0/0x71 [fscache]
       [<ffffffffa00b4755>] nfs_fscache_release_page+0x86/0xc4 [nfs]
       [<ffffffffa00907f0>] nfs_release_page+0x3c/0x41 [nfs]
       [<ffffffff81087ffb>] try_to_release_page+0x32/0x3b
       [<ffffffff81092c2b>] shrink_page_list+0x316/0x4ac
       [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70
       [<ffffffff8135451b>] ? _spin_unlock_irq+0x2b/0x31
       [<ffffffff81093153>] shrink_inactive_list+0x392/0x67c
       [<ffffffff81058a9b>] ? mark_held_locks+0x52/0x70
       [<ffffffff810934ca>] shrink_list+0x8d/0x8f
       [<ffffffff81093744>] shrink_zone+0x278/0x33c
       [<ffffffff81052c70>] ? ktime_get_ts+0xad/0xba
       [<ffffffff8109453b>] try_to_free_pages+0x22e/0x392
       [<ffffffff8109184c>] ? isolate_pages_global+0x0/0x212
       [<ffffffff8108e16b>] __alloc_pages_nodemask+0x3dc/0x5cf
       [<ffffffff810ae24a>] cache_alloc_refill+0x34d/0x6c1
       [<ffffffff811bcf74>] ? radix_tree_node_alloc+0x52/0x5c
       [<ffffffff810ae929>] kmem_cache_alloc+0xb2/0x118
       [<ffffffff811bcf74>] radix_tree_node_alloc+0x52/0x5c
       [<ffffffff811bcfd5>] radix_tree_insert+0x57/0x19c
       [<ffffffffa0076b53>] __fscache_write_page+0x1e3/0x3f3 [fscache]
       [<ffffffffa00b4248>] __nfs_readpage_to_fscache+0x58/0x11e [nfs]
       [<ffffffffa009bb77>] nfs_readpage_release+0x34/0x9b [nfs]
       [<ffffffffa009c0d9>] nfs_readpage_release_full+0x32/0x4b [nfs]
       [<ffffffffa0006cff>] rpc_release_calldata+0x12/0x14 [sunrpc]
       [<ffffffffa0006e2d>] rpc_free_task+0x59/0x61 [sunrpc]
       [<ffffffffa0006f03>] rpc_async_release+0x10/0x12 [sunrpc]
       [<ffffffff810482e5>] worker_thread+0x1ef/0x2e2
       [<ffffffff81048290>] ? worker_thread+0x19a/0x2e2
       [<ffffffff81352433>] ? thread_return+0x3e/0x101
       [<ffffffffa0006ef3>] ? rpc_async_release+0x0/0x12 [sunrpc]
       [<ffffffff8104bff5>] ? autoremove_wake_function+0x0/0x34
       [<ffffffff81058d25>] ? trace_hardirqs_on+0xd/0xf
       [<ffffffff810480f6>] ? worker_thread+0x0/0x2e2
       [<ffffffff8104bd21>] kthread+0x7a/0x82
       [<ffffffff8100beda>] child_rip+0xa/0x20
       [<ffffffff8100b87c>] ? restore_args+0x0/0x30
       [<ffffffff8104c2b9>] ? add_wait_queue+0x15/0x44
       [<ffffffff8104bca7>] ? kthread+0x0/0x82
       [<ffffffff8100bed0>] ? child_rip+0x0/0x20
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      b34df792
  12. 17 6月, 2009 2 次提交
  13. 07 1月, 2009 1 次提交
  14. 06 1月, 2009 1 次提交
  15. 27 7月, 2008 2 次提交
  16. 05 7月, 2008 1 次提交
  17. 13 6月, 2008 1 次提交
    • N
      radix-tree: fix small lockless radix-tree bug · 643b52b9
      Nick Piggin 提交于
      We shrink a radix tree when its root node has only one child, in the left
      most slot.  The child becomes the new root node.  To perform this
      operation in a manner compatible with concurrent lockless lookups, we
      atomically switch the root pointer from the parent to its child.
      
      However a concurrent lockless lookup may now have loaded a pointer to the
      parent (and is presently deciding what to do next).  For this reason, we
      also have to keep the parent node in a valid state after shrinking the
      tree, until the next RCU grace period -- otherwise this lookup with the
      parent pointer may not do the right thing.  Notably, we need to keep the
      child in the left most slot there in case that is requested by the lookup.
      
      This is all pretty standard RCU stuff.  It is worth repeating because in
      my eagerness to obey the radix tree node constructor scheme, I had broken
      it by zeroing the radix tree node before the grace period.
      
      What could happen is that a lookup can load the parent pointer, then
      decide it wants to follow the left most child slot, only to find the slot
      contained NULL due to the concurrent shrinker having zeroed the parent
      node before waiting for a grace period.  The lookup would return a false
      negative as a result.
      
      Fix it by doing that clearing in the RCU callback.  I would normally want
      to rip out the constructor entirely, but radix tree nodes are one of those
      places where they make sense (only few cachelines will be touched soon
      after allocation).
      
      This was never actually found in any lockless pagecache testing or by the
      test harness, but by seeing the odd problem with my scalable vmap rewrite.
       I have not tickled the test harness into reproducing it yet, but I'll
      keep working at it.
      
      Fortunately, it is not a problem anywhere lockless pagecache is used in
      mainline kernels (pagecache probe is not a guarantee, and brd does not
      have concurrent lookups and deletes).
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      643b52b9
  18. 28 4月, 2008 1 次提交
    • C
      Remove set_migrateflags() · 488514d1
      Christoph Lameter 提交于
      Migrate flags must be set on slab creation as agreed upon when the antifrag
      logic was reviewed.  Otherwise some slabs of a slabcache will end up in the
      unmovable and others in the reclaimable section depending on which flag was
      active when a new slab page was allocated.
      
      This likely slid in somehow when antifrag was merged. Remove it.
      
      The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
      SLAB_RECLAIM_ACCOUNT option is set.  The set_migrateflags() never had any
      effect there.
      
      Radix tree allocations are not directly reclaimable but they are allocated
      with __GFP_RECLAIMABLE set on each allocation.  We now set
      SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
      tree slabs are consistently placed in the reclaimable section.  Radix tree
      slabs will also be accounted as such.
      
      There is then no user left of set_migratepages. So remove it.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      488514d1
  19. 06 2月, 2008 1 次提交
    • N
      radix-tree: avoid atomic allocations for preloaded insertions · e2848a0e
      Nick Piggin 提交于
      Most pagecache (and some other) radix tree insertions have the great
      opportunity to preallocate a few nodes with relaxed gfp flags.  But the
      preallocation is squandered when it comes time to allocate a node, we
      default to first attempting a GFP_ATOMIC allocation -- that doesn't
      normally fail, but it can eat into atomic memory reserves that we don't
      need to be using.
      
      Another upshot of this is that it removes the sometimes highly contended
      zone->lock from underneath tree_lock.  Pagecache insertions are always
      performed with a radix tree preload, and after this change, such a
      situation will never fall back to kmem_cache_alloc within
      radix_tree_node_alloc.
      
      David Miller reports seeing this allocation fail on a highly threaded
      sparc64 system:
      
      [527319.459981] dd: page allocation failure. order:0, mode:0x20
      [527319.460403] Call Trace:
      [527319.460568]  [00000000004b71e0] __slab_alloc+0x1b0/0x6a8
      [527319.460636]  [00000000004b7bbc] kmem_cache_alloc+0x4c/0xa8
      [527319.460698]  [000000000055309c] radix_tree_node_alloc+0x20/0x90
      [527319.460763]  [0000000000553238] radix_tree_insert+0x12c/0x260
      [527319.460830]  [0000000000495cd0] add_to_page_cache+0x38/0xb0
      [527319.460893]  [00000000004e4794] mpage_readpages+0x6c/0x134
      [527319.460955]  [000000000049c7fc] __do_page_cache_readahead+0x170/0x280
      [527319.461028]  [000000000049cc88] ondemand_readahead+0x208/0x214
      [527319.461094]  [0000000000496018] do_generic_mapping_read+0xe8/0x428
      [527319.461152]  [0000000000497948] generic_file_aio_read+0x108/0x170
      [527319.461217]  [00000000004badac] do_sync_read+0x88/0xd0
      [527319.461292]  [00000000004bb5cc] vfs_read+0x78/0x10c
      [527319.461361]  [00000000004bb920] sys_read+0x34/0x60
      [527319.461424]  [0000000000406294] linux_sparc_syscall32+0x3c/0x40
      
      The calltrace is significant: __do_page_cache_readahead allocates a number
      of pages with GFP_KERNEL, and hence it should have reclaimed sufficient
      memory to satisfy GFP_ATOMIC allocations.  However after the list of pages
      goes to mpage_readpages, there can be significant intervals (including disk
      IO) before all the pages are inserted into the radix-tree.  So the reserves
      can easily be depleted at that point.  The patch is confirmed to fix the
      problem.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e2848a0e
  20. 17 10月, 2007 6 次提交
    • P
      avoid negative (and full-width) shifts in radix-tree.c · 430d275a
      Peter Lund 提交于
      Negative shifts are not allowed in C (the result is undefined).  Same thing
      with full-width shifts.
      
      It works on most platforms but not on the VAX with gcc 4.0.1 (it results in an
      "operand reserved" fault).
      
      Shifting by more than the width of the value on the left is also not
      allowed.  I think the extra '>> 1' tacked on at the end in the original
      code was an attempt to work around that.  Getting rid of that is an extra
      feature of this patch.
      
      Here's the chapter and verse, taken from the final draft of the C99
      standard ("6.5.7 Bitwise shift operators", paragraph 3):
      
        "The integer promotions are performed on each of the operands. The
        type of the result is that of the promoted left operand. If the
        value of the right operand is negative or is greater than or equal
        to the width of the promoted left operand, the behavior is
        undefined."
      
      Thank you to Jan-Benedict Glaw, Christoph Hellwig, Maciej Rozycki, Pekka
      Enberg, Andreas Schwab, and Christoph Lameter for review.  Special thanks
      to Andreas for spotting that my fix only removed half the undefined
      behaviour.
      Signed-off-by: NPeter Lund <firefly@vax64.dk>
      Christoph Lameter <clameter@sgi.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: "Maciej W. Rozycki" <macro@linux-mips.org>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Andreas Schwab <schwab@suse.de>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      430d275a
    • C
      Slab API: remove useless ctor parameter and reorder parameters · 4ba9b9d0
      Christoph Lameter 提交于
      Slab constructors currently have a flags parameter that is never used.  And
      the order of the arguments is opposite to other slab functions.  The object
      pointer is placed before the kmem_cache pointer.
      
      Convert
      
              ctor(void *object, struct kmem_cache *s, unsigned long flags)
      
      to
      
              ctor(struct kmem_cache *s, void *object)
      
      throughout the kernel
      
      [akpm@linux-foundation.org: coupla fixes]
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ba9b9d0
    • M
      Group short-lived and reclaimable kernel allocations · e12ba74d
      Mel Gorman 提交于
      This patch marks a number of allocations that are either short-lived such as
      network buffers or are reclaimable such as inode allocations.  When something
      like updatedb is called, long-lived and unmovable kernel allocations tend to
      be spread throughout the address space which increases fragmentation.
      
      This patch groups these allocations together as much as possible by adding a
      new MIGRATE_TYPE.  The MIGRATE_RECLAIMABLE type is for allocations that can be
      reclaimed on demand, but not moved.  i.e.  they can be migrated by deleting
      them and re-reading the information from elsewhere.
      Signed-off-by: NMel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e12ba74d
    • J
      fix the max path calculation in radix-tree.c · 26fb1589
      Jeff Moyer 提交于
      A while back, Nick Piggin introduced a patch to reduce the node memory
      usage for small files (commit cfd9b7df):
      
      -#define RADIX_TREE_MAP_SHIFT	6
      +#define RADIX_TREE_MAP_SHIFT	(CONFIG_BASE_SMALL ? 4 : 6)
      
      Unfortunately, he didn't take into account the fact that the
      calculation of the maximum path was based on an assumption of having
      to round up:
      
      #define RADIX_TREE_MAX_PATH (RADIX_TREE_INDEX_BITS/RADIX_TREE_MAP_SHIFT + 2)
      
      So, if CONFIG_BASE_SMALL is set, you will end up with a
      RADIX_TREE_MAX_PATH that is one greater than necessary.  The practical
      upshot of this is just a bit of wasted memory (one long in the
      height_to_maxindex array, an extra pre-allocated radix tree node per
      cpu, and extra stack usage in a couple of functions), but it seems
      worth getting right.
      
      It's also worth noting that I never build with CONFIG_BASE_SMALL.
      What I did to test this was duplicate the code in a small user-space
      program and check the results of the calculations for max path and the
      contents of the height_to_maxindex array.
      Signed-off-by: NJeff Moyer <jmoyer@redhat.com>
      Acked-by: NNick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26fb1589
    • N
      radix-tree: use indirect bit · c0bc9875
      Nick Piggin 提交于
      Rather than sign direct radix-tree pointers with a special bit, sign the
      indirect one that hangs off the root.  This means that, given a lookup_slot
      operation, the invalid result will be differentiated from the valid
      (previously, valid results could have the bit either set or clear).
      
      This does not affect slot lookups which occur under lock -- they can never
      return an invalid result.  Is needed in future for lockless pagecache.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c0bc9875
    • F
      radixtree: introduce radix_tree_next_hole() · 6df8ba4f
      Fengguang Wu 提交于
      Introduce radix_tree_next_hole(root, index, max_scan) to scan radix tree for
      the first hole.  It will be used in interleaved readahead.
      
      The implementation is dumb and obviously correct.  It can help debug(and
      document) the possible smart one in future.
      
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6df8ba4f
  21. 20 7月, 2007 1 次提交
    • P
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt 提交于
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      20c2df83
  22. 14 7月, 2007 1 次提交
  23. 10 5月, 2007 1 次提交
    • R
      Add suspend-related notifications for CPU hotplug · 8bb78442
      Rafael J. Wysocki 提交于
      Since nonboot CPUs are now disabled after tasks and devices have been
      frozen and the CPU hotplug infrastructure is used for this purpose, we need
      special CPU hotplug notifications that will help the CPU-hotplug-aware
      subsystems distinguish normal CPU hotplug events from CPU hotplug events
      related to a system-wide suspend or resume operation in progress.  This
      patch introduces such notifications and causes them to be used during
      suspend and resume transitions.  It also changes all of the
      CPU-hotplug-aware subsystems to take these notifications into consideration
      (for now they are handled in the same way as the corresponding "normal"
      ones).
      
      [oleg@tv-sign.ru: cleanups]
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: NOleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8bb78442
  24. 08 12月, 2006 3 次提交
    • I
      [PATCH] hotplug CPU: clean up hotcpu_notifier() use · 02316067
      Ingo Molnar 提交于
      There was lots of #ifdef noise in the kernel due to hotcpu_notifier(fn,
      prio) not correctly marking 'fn' as used in the !HOTPLUG_CPU case, and thus
      generating compiler warnings of unused symbols, hence forcing people to add
      #ifdefs.
      
      the compiler can skip truly unused functions just fine:
      
          text    data     bss     dec     hex filename
       1624412  728710 3674856 6027978  5bfaca vmlinux.before
       1624412  728710 3674856 6027978  5bfaca vmlinux.after
      
      [akpm@osdl.org: topology.c fix]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      02316067
    • N
      [PATCH] radix-tree: RCU lockless readside · 7cf9c2c7
      Nick Piggin 提交于
      Make radix tree lookups safe to be performed without locks.  Readers are
      protected against nodes being deleted by using RCU based freeing.  Readers
      are protected against new node insertion by using memory barriers to ensure
      the node itself will be properly written before it is visible in the radix
      tree.
      
      Each radix tree node keeps a record of their height (above leaf nodes).
      This height does not change after insertion -- when the radix tree is
      extended, higher nodes are only inserted in the top.  So a lookup can take
      the pointer to what is *now* the root node, and traverse down it even if
      the tree is concurrently extended and this node becomes a subtree of a new
      root.
      
      "Direct" pointers (tree height of 0, where root->rnode points directly to
      the data item) are handled by using the low bit of the pointer to signal
      whether rnode is a direct pointer or a pointer to a radix tree node.
      
      When a reader wants to traverse the next branch, they will take a copy of
      the pointer.  This pointer will be either NULL (and the branch is empty) or
      non-NULL (and will point to a valid node).
      
      [akpm@osdl.org: cleanups]
      [Lee.Schermerhorn@hp.com: bugfixes, comments, simplifications]
      [clameter@sgi.com: build fix]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Christoph Lameter <clameter@engr.sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      7cf9c2c7
    • C
      [PATCH] slab: remove kmem_cache_t · e18b890b
      Christoph Lameter 提交于
      Replace all uses of kmem_cache_t with struct kmem_cache.
      
      The patch was generated using the following script:
      
      	#!/bin/sh
      	#
      	# Replace one string by another in all the kernel sources.
      	#
      
      	set -e
      
      	for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
      		quilt add $file
      		sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
      		mv /tmp/$$ $file
      		quilt refresh
      	done
      
      The script was run like this
      
      	sh replace kmem_cache_t "struct kmem_cache"
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      e18b890b
  25. 11 10月, 2006 1 次提交
  26. 26 6月, 2006 1 次提交
  27. 23 6月, 2006 3 次提交
    • P
      [PATCH] buglet in radix_tree_tag_set · 4c91c364
      Peter Zijlstra 提交于
      The comment states: 'Setting a tag on a not-present item is a BUG.' Hence
      if 'index' is larger than the maxindex; the item _cannot_ be presen; it
      should also be a BUG.
      
      Also, this allows the following statement (assume a fresh tree):
      
        radix_tree_tag_set(root, 16, 1);
      
      to fail silently, but when preceded by:
      
        radix_tree_insert(root, 32, item);
      
      it would BUG, because the height has been extended by the insert.
      
      In neither case was 16 present.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      4c91c364
    • N
      [PATCH] radix-tree: small · cfd9b7df
      Nick Piggin 提交于
      Reduce radix tree node memory usage by about a factor of 4 for small files
      (< 64K).  There are pointer traversal and memory usage costs for large
      files with dense pagecache.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      cfd9b7df
    • N
      [PATCH] radix-tree: direct data · 612d6c19
      Nick Piggin 提交于
      The ability to have height 0 radix trees (a direct pointer to the data item
      rather than going through a full node->slot) quietly disappeared with
      old-2.6-bkcvs commit ffee171812d51652f9ba284302d9e5c5cc14bdfd.  On 64-bit
      machines this causes nearly 600 bytes to be used for every <= 4K file in
      pagecache.
      
      Re-introduce this feature, root tags stored in spare ->gfp_mask bits.
      
      Simplify radix_tree_delete's complex tag clearing arrangement (which would
      become even more complex) by just falling back to tag clearing functions
      (the pagecache radix-tree never uses this path anyway, so the icache
      savings will mean it's actually a speedup).
      
      On my 4GB G5, this saves 8MB RAM per kernel kernel source+object tree in
      pagecache.
      
      Pagecache lookup, insertion, and removal speed for small files will also be
      improved.
      
      This makes RCU radix tree harder, but it's worth it.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@osdl.org>
      Signed-off-by: NLinus Torvalds <torvalds@osdl.org>
      612d6c19