1. 26 1月, 2011 36 次提交
    • T
      console: rename acquire/release_console_sem() to console_lock/unlock() · ac751efa
      Torben Hohn 提交于
      The -rt patches change the console_semaphore to console_mutex.  As a
      result, a quite large chunk of the patches changes all
      acquire/release_console_sem() to acquire/release_console_mutex()
      
      This commit makes things use more neutral function names which dont make
      implications about the underlying lock.
      
      The only real change is the return value of console_trylock which is
      inverted from try_acquire_console_sem()
      
      This patch also paves the way to switching console_sem from a semaphore to
      a mutex.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert]
      Signed-off-by: NTorben Hohn <torbenh@gmx.de>
      Cc: Thomas Gleixner <tglx@tglx.de>
      Cc: Greg KH <gregkh@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac751efa
    • P
      squashfs: fix use of uninitialised variable in zlib & xz decompressors · 3689456b
      Phillip Lougher 提交于
      Fix potential use of uninitialised variable caused by recent
      decompressor code optimisations.
      
      In zlib_uncompress (zlib_wrapper.c) we have
      
      	int zlib_err, zlib_init = 0;
      	...
      	do {
      		...
      			if (avail == 0) {
      				offset = 0;
      				put_bh(bh[k++]);
      				continue;
      			}
      		...
      		zlib_err = zlib_inflate(stream, Z_SYNC_FLUSH);
      		...
      	} while (zlib_err == Z_OK);
      
      If continue is executed (avail == 0) then the while condition will be
      evaluated testing zlib_err, which is uninitialised first time around the
      loop.
      
      Fix this by getting rid of the 'if (avail == 0)' condition test, this
      edge condition should not be being handled in the decompressor code, and
      instead handle it generically in the caller code.
      
      Similarly for xz_wrapper.c.
      
      Incidentally, on most architectures (bar Mips and Parisc), no
      uninitialised variable warning is generated by gcc, this is because the
      while condition test on continue is optimised out and not performed
      (when executing continue zlib_err has not been changed since entering
      the loop, and logically if the while condition was true previously, then
      it's still true).
      Signed-off-by: NPhillip Lougher <phillip@lougher.demon.co.uk>
      Reported-by: NJesper Juhl <jj@chaosbits.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3689456b
    • T
      radix_tree: radix_tree_gang_lookup_tag_slot() may never return · ac15ee69
      Toshiyuki Okajima 提交于
      Executed command: fsstress -d /mnt -n 600 -p 850
      
        crash> bt
        PID: 7947   TASK: ffff880160546a70  CPU: 0   COMMAND: "fsstress"
         #0 [ffff8800dfc07d00] machine_kexec at ffffffff81030db9
         #1 [ffff8800dfc07d70] crash_kexec at ffffffff810a7952
         #2 [ffff8800dfc07e40] oops_end at ffffffff814aa7c8
         #3 [ffff8800dfc07e70] die_nmi at ffffffff814aa969
         #4 [ffff8800dfc07ea0] do_nmi_callback at ffffffff8102b07b
         #5 [ffff8800dfc07f10] do_nmi at ffffffff814aa514
         #6 [ffff8800dfc07f50] nmi at ffffffff814a9d60
            [exception RIP: __lookup_tag+100]
            RIP: ffffffff812274b4  RSP: ffff88016056b998  RFLAGS: 00000287
            RAX: 0000000000000000  RBX: 0000000000000002  RCX: 0000000000000006
            RDX: 000000000000001d  RSI: ffff88016056bb18  RDI: ffff8800c85366e0
            RBP: ffff88016056b9c8   R8: ffff88016056b9e8   R9: 0000000000000000
            R10: 000000000000000e  R11: ffff8800c8536908  R12: 0000000000000010
            R13: 0000000000000040  R14: ffffffffffffffc0  R15: ffff8800c85366e0
            ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
        <NMI exception stack>
         #7 [ffff88016056b998] __lookup_tag at ffffffff812274b4
         #8 [ffff88016056b9d0] radix_tree_gang_lookup_tag_slot at ffffffff81227605
         #9 [ffff88016056ba20] find_get_pages_tag at ffffffff810fc110
        #10 [ffff88016056ba80] pagevec_lookup_tag at ffffffff81105e85
        #11 [ffff88016056baa0] write_cache_pages at ffffffff81104c47
        #12 [ffff88016056bbd0] generic_writepages at ffffffff81105014
        #13 [ffff88016056bbe0] do_writepages at ffffffff81105055
        #14 [ffff88016056bbf0] __filemap_fdatawrite_range at ffffffff810fb2cb
        #15 [ffff88016056bc40] filemap_write_and_wait_range at ffffffff810fb32a
        #16 [ffff88016056bc70] generic_file_direct_write at ffffffff810fb3dc
        #17 [ffff88016056bce0] __generic_file_aio_write at ffffffff810fcee5
        #18 [ffff88016056bda0] generic_file_aio_write at ffffffff810fd085
        #19 [ffff88016056bdf0] do_sync_write at ffffffff8114f9ea
        #20 [ffff88016056bf00] vfs_write at ffffffff8114fcf8
        #21 [ffff88016056bf30] sys_write at ffffffff81150691
        #22 [ffff88016056bf80] system_call_fastpath at ffffffff8100c0b2
      
      I think this root cause is the following:
      
       radix_tree_range_tag_if_tagged() always tags the root tag with settag
       if the root tag is set with iftag even if there are no iftag tags
       in the specified range (Of course, there are some iftag tags
       outside the specified range).
      
      ===============================================================================
      [[[Detailed description]]]
      
      (1) Why cannot radix_tree_gang_lookup_tag_slot() return forever?
      
      __lookup_tag():
       - Return with 0.
       - Return with the index which is not bigger than the old one as the
         input parameter.
      
      Therefore the following "while" repeats forever because the above
      conditions cause "ret" not to be updated and the cur_index cannot be
      changed into the bigger one.
      
      (So, radix_tree_gang_lookup_tag_slot() cannot return forever.)
      
      radix_tree_gang_lookup_tag_slot():
      1178         while (ret < max_items) {
      1179                 unsigned int slots_found;
      1180                 unsigned long next_index;       /* Index of next search */
      1181
      1182                 if (cur_index > max_index)
      1183                         break;
      1184                 slots_found = __lookup_tag(node, results + ret,
      1185                                 cur_index, max_items - ret, &next_index,
      tag);
      1186                 ret += slots_found;
      			// cannot update ret because slots_found == 0.
      			// so, this while loops forever.
      1187                 if (next_index == 0)
      1188                         break;
      1189                 cur_index = next_index;
      1190         }
      
      (2) Why does __lookup_tag() return with 0 and doesn't update the index?
      
      Assuming the following:
        - the one of the slot in radix_tree_node is NULL.
        - the one of the tag which corresponds to the slot sets with
          PAGECACHE_TAG_TOWRITE or other.
        - In a certain height(!=0), the corresponding index is 0.
      
      a) __lookup_tag() notices that the tag is set.
      
      1005 static unsigned int
      1006 __lookup_tag(struct radix_tree_node *slot, void ***results, unsigned long index,
      1007         unsigned int max_items, unsigned long *next_index, unsigned int tag)
      1008 {
      1009         unsigned int nr_found = 0;
      1010         unsigned int shift, height;
      1011
      1012         height = slot->height;
      1013         if (height == 0)
      1014                 goto out;
      1015         shift = (height-1) * RADIX_TREE_MAP_SHIFT;
      1016
      1017         while (height > 0) {
      1018                 unsigned long i = (index >> shift) & RADIX_TREE_MAP_MASK ;
      1019
      1020                 for (;;) {
      1021                         if (tag_get(slot, tag, i))
      1022                                 break;
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      * the index is not updated yet.
      
      b) __lookup_tag() notices that the slot is NULL.
      
      1023                         index &= ~((1UL << shift) - 1);
      1024                         index += 1UL << shift;
      1025                         if (index == 0)
      1026                                 goto out;       /* 32-bit wraparound */
      1027                         i++;
      1028                         if (i == RADIX_TREE_MAP_SIZE)
      1029                                 goto out;
      1030                 }
      1031                 height--;
      1032                 if (height == 0) {      /* Bottom level: grab some items */
      ...
      1055                 }
      1056                 shift -= RADIX_TREE_MAP_SHIFT;
      1057                 slot = rcu_dereference_raw(slot->slots[i]);
      1058                 if (slot == NULL)
      1059                         break;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      c) __lookup_tag() doesn't update the index and return with 0.
      
      1060         }
      1061 out:
      1062         *next_index = index;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      1063         return nr_found;
      1064 }
      
      (3) Why is the slot NULL even if the tag is set?
      
      Because radix_tree_range_tag_if_tagged() always sets the root tag with
      PAGECACHE_TAG_TOWRITE if the root tag is set with PAGECACHE_TAG_DIRTY,
      even if there is no tag which can be set with PAGECACHE_TAG_TOWRITE
      in the specified range (from *first_indexp to last_index). Of course,
      some PAGECACHE_TAG_DIRTY nodes must exist outside the specified range.
      (radix_tree_range_tag_if_tagged() is called only from tag_pages_for_writeback())
      
       640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root
      *root,
       641                 unsigned long *first_indexp, unsigned long last_index,
       642                 unsigned long nr_to_tag,
       643                 unsigned int iftag, unsigned int settag)
       644 {
       645         unsigned int height = root->height;
       646         struct radix_tree_path path[height];
       647         struct radix_tree_path *pathp = path;
       648         struct radix_tree_node *slot;
       649         unsigned int shift;
       650         unsigned long tagged = 0;
       651         unsigned long index = *first_indexp;
       652
       653         last_index = min(last_index, radix_tree_maxindex(height));
       654         if (index > last_index)
       655                 return 0;
       656         if (!nr_to_tag)
       657                 return 0;
       658         if (!root_tag_get(root, iftag)) {
       659                 *first_indexp = last_index + 1;
       660                 return 0;
       661         }
       662         if (height == 0) {
       663                 *first_indexp = last_index + 1;
       664                 root_tag_set(root, settag);
       665                 return 1;
       666         }
      ...
       733         root_tag_set(root, settag);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       734         *first_indexp = index;
       735
       736         return tagged;
       737 }
      
      As the result, there is no radix_tree_node which is set with
      PAGECACHE_TAG_TOWRITE but the root tag(radix_tree_root) is set with
      PAGECACHE_TAG_TOWRITE.
      
      [figure: inside radix_tree]
      (Please see the figure with typewriter font)
      ===========================================
                [roottag = DIRTY]
                       |             tag=0:NOTHING
               tag[0 0 0 1]              1:DIRTY
                  [x x x +]              2:WRITEBACK
                         |               3:DIRTY,WRITEBACK
                         p               4:TOWRITE
                   <--->                 5:DIRTY,TOWRITE ...
           specified range (index: 0 to 2)
      
      * There is no DIRTY tag within the specified range.
       (But there is a DIRTY tag outside that range.)
      
                  | | | | | | | | |
          after calling tag_pages_for_writeback()
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |                 p is "page".
               tag[0 0 0 1]              x is NULL.
                  [x x x +]              +- is a pointer to "page".
                         |
                         p
      
      * But TOWRITE tag is set on the root tag.
      ============================================
      
      After that, radix_tree_extend() via radix_tree_insert() is called
      when the page is added.
      This function sets the new radix_tree_node with PAGECACHE_TAG_TOWRITE
      to succeed the status of the root tag.
      
       246 static int radix_tree_extend(struct radix_tree_root *root, unsigned long
      index)
       247 {
       248         struct radix_tree_node *node;
       249         unsigned int height;
       250         int tag;
       251
       252         /* Figure out what the height should be.  */
       253         height = root->height + 1;
       254         while (index > radix_tree_maxindex(height))
       255                 height++;
       256
       257         if (root->rnode == NULL) {
       258                 root->height = height;
       259                 goto out;
       260         }
       261
       262         do {
       263                 unsigned int newheight;
       264                 if (!(node = radix_tree_node_alloc(root)))
       265                         return -ENOMEM;
       266
       267                 /* Increase the height.  */
       268                 node->slots[0] = radix_tree_indirect_to_ptr(root->rnode);
       269
       270                 /* Propagate the aggregated tag info into the new root */
       271                 for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
       272                         if (root_tag_get(root, tag))
       273                                 tag_set(node, tag, 0);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       274                 }
      
      ===========================================
                [roottag = DIRTY,TOWRITE]
                       |     :
               tag[0 0 0 1] [0 0 0 0]
                  [x x x +] [+ x x x]
                         |   |
                         p   p (new page)
      
                  | | | | | | | | |
          after calling radix_tree_insert
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |
               tag [5 0 0 0]    *  DIRTY and TOWRITE tags are
                   [+ + x x]       succeeded to the new node.
                    | |
        tag [0 0 0 1] [0 0 0 0]
            [x x x +] [+ x x x]
                   |   |
                   p   p
      ============================================
      
      After that, the index 3 page is released by remove_from_page_cache().
      Then we can make the situation that the tag is set with PAGECACHE_TAG_TOWRITE
      and that the slot which corresponds to the tag is NULL.
      ===========================================
                [roottag = DIRTY,TOWRITE]
                       |
               tag [5 0 0 0]
                   [+ + x x]
                    | |
        tag [0 0 0 1] [0 0 0 0]
            [x x x +] [+ x x x]
                   |   |
                   p   p
               (remove)
      
                  | | | | | | | | |
          after calling remove_page_cache
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |
               tag [4 0 0 0]      * Only DIRTY tag is cleared
                   [x + x x]        because no TOWRITE tag is existed
                      |             in the bottom node.
                      [0 0 0 0]
                      [+ x x x]
                       |
                       p
      ============================================
      
      To solve this problem
      
      Change to that radix_tree_tag_if_tagged() doesn't tag the root tag
      if it doesn't set any tags within the specified range.
      
      Like this.
      ============================================
       640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root
      *root,
       641                 unsigned long *first_indexp, unsigned long last_index,
       642                 unsigned long nr_to_tag,
       643                 unsigned int iftag, unsigned int settag)
       644 {
       650         unsigned long tagged = 0;
      ...
       733 	     if (tagged)
      ^^^^^^^^^^^^^^^^^^^^^^^^
       734            root_tag_set(root, settag);
       735         *first_indexp = index;
       736
       737         return tagged;
       738 }
      
      ============================================
      Signed-off-by: NToshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac15ee69
    • V
      drivers/clocksource/tcb_clksrc.c: fix init sequence · 1817dc03
      Voss, Nikolaus 提交于
      setup_irq() was called before clockevents_register_device() which is
      needed by the irq handler.  Bug was reproducible by restarting the
      kernel using kexec (reliable crash).
      Signed-off-by: NNikolaus Voss <n.voss@weinmann.de>
      Cc: David Brownell <dbrownell@users.sourceforge.net>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1817dc03
    • K
      memcg: fix race at move_parent around compound_order() · 52dbb905
      KAMEZAWA Hiroyuki 提交于
      A fix up mem_cgroup_move_parent() which use compound_order() in
      asynchronous manner.  This compound_order() may return unknown value
      because we don't take lock.  Use PageTransHuge() and HPAGE_SIZE instead
      of it.
      
      Also clean up for mem_cgroup_move_parent().
       - remove unnecessary initialization of local variable.
       - rename charge_size -> page_size
       - remove unnecessary (wrong) comment.
       - added a comment about THP.
      
      Note:
       Current design take compound_page_lock() in caller of move_account().
       This should be revisited when we implement direct move_task of hugepage
       without splitting.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52dbb905
    • K
      memcg: bugfix check mem_cgroup_disabled() at split fixup · 3d37c4a9
      KAMEZAWA Hiroyuki 提交于
      mem_cgroup_disabled() should be checked at splitting.  If disabled, no
      heavy work is necesary.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3d37c4a9
    • K
      memcg: fix account leak at failure of memsw acconting · 01c88e2d
      KAMEZAWA Hiroyuki 提交于
      Commit 4b534334 ("memcg: clean up try_charge main loop") removes a
      cancel of charge at case: memory charge-> success.  mem+swap charge->
      failure.
      
      This leaks usage of memory.  Fix it.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: <stable@kernel.org>	[2.6.36+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01c88e2d
    • M
      mm: migration: clarify migrate_pages() comment · 28bd6578
      Minchan Kim 提交于
      Callers of migrate_pages should putback_lru_pages to return pages
      isolated to LRU or free list.  Now comment is rather confusing.  It says
      caller always have to call it.
      
      It is more clear to point out that the caller has to call it if
      migrate_pages's return value isn't zero.
      Signed-off-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      28bd6578
    • A
      mm: compaction: don't depend on HUGETLB_PAGE · 33a93877
      Andrea Arcangeli 提交于
      Commit 5d689240 ("thp: select CONFIG_COMPACTION if TRANSPARENT_HUGEPAGE
      enabled") causes this warning during the configuration process:
      
        warning: (TRANSPARENT_HUGEPAGE) selects COMPACTION which has unmet
        direct dependencies (EXPERIMENTAL && HUGETLB_PAGE && MMU)
      
      COMPACTION doesn't depend on HUGETLB_PAGE, it doesn't depend on THP
      either, it is also useful for regular alloc_pages(order > 0) including
      the very kernel stack during fork (THREAD_ORDER = 1).  It's always
      better to enable COMPACTION.
      
      The warning should be an error because we would end up with MIGRATION
      not selected, and COMPACTION wouldn't work without migration (despite it
      seems to build with an inline migrate_pages returning -ENOSYS).
      
      I'd also like to remove EXPERIMENTAL: compaction has been in the kernel
      for some releases (for full safety the default remains disabled which I
      think is enough).
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: NLuca Tettamanti <kronos.it@gmail.com>
      Tested-by: NLuca Tettamanti <kronos.it@gmail.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      33a93877
    • J
      mm/memcontrol.c: fix uninitialized variable use in mem_cgroup_move_parent() · 8dba474f
      Jesper Juhl 提交于
      In mm/memcontrol.c::mem_cgroup_move_parent() there's a path that jumps
      to the 'put_back' label
      
        	ret = __mem_cgroup_try_charge(NULL, gfp_mask, &parent, false, charge);
        	if (ret || !parent)
        		goto put_back;
      
      where we'll
      
        	if (charge > PAGE_SIZE)
        		compound_unlock_irqrestore(page, flags);
      
      but, we have not assigned anything to 'flags' at this point, nor have we
      called 'compound_lock_irqsave()' (which is what sets 'flags').  The
      'put_back' label should be moved below the call to
      compound_unlock_irqrestore() as per this patch.
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8dba474f
    • D
      mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator · 2ff754fa
      David Rientjes 提交于
      Commit 0e093d99 ("writeback: do not sleep on the congestion queue if
      there are no congested BDIs or if significant congestion is not being
      encountered in the current zone") uncovered a livelock in the page
      allocator that resulted in tasks infinitely looping trying to find
      memory and kswapd running at 100% cpu.
      
      The issue occurs because drain_all_pages() is called immediately
      following direct reclaim when no memory is freed and try_to_free_pages()
      returns non-zero because all zones in the zonelist do not have their
      all_unreclaimable flag set.
      
      When draining the per-cpu pagesets back to the buddy allocator for each
      zone, the zone->pages_scanned counter is cleared to avoid erroneously
      setting zone->all_unreclaimable later.  The problem is that no pages may
      actually be drained and, thus, the unreclaimable logic never fails
      direct reclaim so the oom killer may be invoked.
      
      This apparently only manifested after wait_iff_congested() was
      introduced and the zone was full of anonymous memory that would not
      congest the backing store.  The page allocator would infinitely loop if
      there were no other tasks waiting to be scheduled and clear
      zone->pages_scanned because of drain_all_pages() as the result of this
      change before kswapd could scan enough pages to trigger the reclaim
      logic.  Additionally, with every loop of the page allocator and in the
      reclaim path, kswapd would be kicked and would end up running at 100%
      cpu.  In this scenario, current and kswapd are all running continuously
      with kswapd incrementing zone->pages_scanned and current clearing it.
      
      The problem is even more pronounced when current swaps some of its
      memory to swap cache and the reclaimable logic then considers all active
      anonymous memory in the all_unreclaimable logic, which requires a much
      higher zone->pages_scanned value for try_to_free_pages() to return zero
      that is never attainable in this scenario.
      
      Before wait_iff_congested(), the page allocator would incur an
      unconditional timeout and allow kswapd to elevate zone->pages_scanned to
      a level that the oom killer would be called the next time it loops.
      
      The fix is to only attempt to drain pcp pages if there is actually a
      quantity to be drained.  The unconditional clearing of
      zone->pages_scanned in free_pcppages_bulk() need not be changed since
      other callers already ensure that draining will occur.  This patch
      ensures that free_pcppages_bulk() will actually free memory before
      calling into it from drain_all_pages() so zone->pages_scanned is only
      cleared if appropriate.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2ff754fa
    • D
      mm: fix deferred congestion timeout if preferred zone is not allowed · f33261d7
      David Rientjes 提交于
      Before 0e093d99 ("writeback: do not sleep on the congestion queue if
      there are no congested BDIs or if significant congestion is not being
      encountered in the current zone"), preferred_zone was only used for NUMA
      statistics, to determine the zoneidx from which to allocate from given
      the type requested, and whether to utilize memory compaction.
      
      wait_iff_congested(), though, uses preferred_zone to determine if the
      congestion wait should be deferred because its dirty pages are backed by
      a congested bdi.  This incorrectly defers the timeout and busy loops in
      the page allocator with various cond_resched() calls if preferred_zone
      is not allowed in the current context, usually consuming 100% of a cpu.
      
      This patch ensures preferred_zone is an allowed zone in the fastpath
      depending on whether current is constrained by its cpuset or nodes in
      its mempolicy (when the nodemask passed is non-NULL).  This is correct
      since the fastpath allocation always passes ALLOC_CPUSET when trying to
      allocate memory.  In the slowpath, this patch resets preferred_zone to
      the first zone of the allowed type when the allocation is not
      constrained by current's cpuset, i.e.  it does not pass ALLOC_CPUSET.
      
      This patch also ensures preferred_zone is from the set of allowed nodes
      when called from within direct reclaim since allocations are always
      constrained by cpusets in this context (it is blockable).
      
      Both of these uses of cpuset_current_mems_allowed are protected by
      get_mems_allowed().
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f33261d7
    • A
      pps: claim parallel port exclusively · 4f542e3d
      Alexander Gordeev 提交于
      Both pps_parport and pps_gen_parport are written in a way that they
      can't share a port with any other driver.  This can result in locking up
      the process that loads modules or even the whole kernel if the modules
      are compiled in.  Use PARPORT_FLAG_EXCL to indicate this.
      Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su>
      Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4f542e3d
    • R
      pps ktimer: remove noisy message · a783ac44
      Rodolfo Giometti 提交于
      Signed-off-by: NRodolfo Giometti <giometti@linux.it>
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Cc: Alexander Gordeev <lasaine@lvk.cs.msu.su>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a783ac44
    • A
      parport: make lockdep happy with waitlist_lock · cbeb4b7a
      Alexander Gordeev 提交于
      parport_unregister_device() should never be used when interrupts are
      enabled in hardware and irq handler is registered so there is no need to
      disable interrupts when using waitlist_lock.  But there is no way to
      explain this subtle semantics to lockdep analyzer.
      
      So disable interrupts here too to simplify things.  The price is
      negligible.
      Signed-off-by: NAlexander Gordeev <lasaine@lvk.cs.msu.su>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cbeb4b7a
    • F
      langwell_gpio: modify EOI handling following change of kernel irq subsystem · 0766d20f
      Feng Tang 提交于
      Latest kernel has many changes in IRQ subsystem and its interfaces, like
      adding "irq_eoi" for struct irq_chip, this patch is a follow up change
      for that.
      
      Also remove the unnecessary cast for a "void *".
      Signed-off-by: NFeng Tang <feng.tang@intel.com>
      Cc: Alek Du <alek.du@intel.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0766d20f
    • A
      leds: leds-pwm: return proper error if pwm_request failed · d8cc667b
      Axel Lin 提交于
      Return PTR_ERR(led_dat->pwm) instead of 0 if pwm_request failed
      Signed-off-by: NAxel Lin <axel.lin@gmail.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: Luotao Fu <l.fu@pengutronix.de>
      Cc: Reviewed-by: Dmitry Torokhov <dtor@mail.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d8cc667b
    • A
      mm/pgtable-generic.c: fix CONFIG_SWAP=n build · f95ba941
      Andrew Morton 提交于
      mips (and sparc32):
      
        In file included from arch/mips/include/asm/tlb.h:21,
                         from mm/pgtable-generic.c:9:
        include/asm-generic/tlb.h: In function `tlb_flush_mmu':
        include/asm-generic/tlb.h:76: error: implicit declaration of function `release_pages'
        include/asm-generic/tlb.h: In function `tlb_remove_page':
        include/asm-generic/tlb.h:105: error: implicit declaration of function `page_cache_release'
      
      free_pages_and_swap_cache() and free_page_and_swap_cache() are macros
      which call release_pages() and page_cache_release().  The obvious fix is
      to include pagemap.h in swap.h, where those macros are defined.  But that
      breaks sparc for weird reasons.
      
      So fix it within mm/pgtable-generic.c instead.
      Reported-by: NYoichi Yuasa <yuasa@linux-mips.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NSam Ravnborg <sam@ravnborg.org>
      Cc: Sergei Shtylyov <sshtylyov@mvista.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f95ba941
    • A
      thp: fix PARAVIRT x86 32bit noPAE · cacf061c
      Andrea Arcangeli 提交于
      This fixes TRANSPARENT_HUGEPAGE=y with PARAVIRT=y and HIGHMEM64=n.
      
      The #ifdef that this patch removes was erratically introduced to fix a
      build error for noPAE (where pmd.pmd doesn't exist).  So then the kernel
      built but it failed at runtime because set_pmd_at was a noop.  This will
      correct it by enabling set_pmd_at for noPAE mode too.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: Nwerner <w.landgraf@ru.ru>
      Reported-by: NMinchan Kim <minchan.kim@gmail.com>
      Tested-by: NMinchan Kim <minchan.kim@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cacf061c
    • L
      Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm · 6663050e
      Linus Torvalds 提交于
      * 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm:
        ALSA: AACI: fix timeout duration
        ALSA: AACI: fix timeout condition checking
        ARM: 6636/1: ep93xx: default multiplexed gpio ports to gpio mode
        ARM: 6637/1: Make the argument to virt_to_phys() "const volatile"
        ARM: twd: ensure timer reload is reprogrammed on entry to periodic mode
        ARM: 6635/2: Configure reference clock for Versatile Express timers
        ARM: versatile: name configuration options after actual board names
        ARM: realview: name configuration options after actual board names
        ARM: realview,vexpress: fix section mismatch warning for pen_release
        ARM: 6632/3: mmci: stop using the blockend interrupts
      6663050e
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 · 3af03655
      Linus Torvalds 提交于
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
        nilfs2: fix crash after one superblock became unavailable
      3af03655
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k · 01ed875a
      Linus Torvalds 提交于
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k:
        m68k/amiga: Fix "debug=mem"
        m68k/atari: Rename "scc" to "atari_scc"
        m68k: Uninline strchr()
      01ed875a
    • L
      Merge branch 'rmobile-fixes-for-linus' of... · 8348ad7c
      Linus Torvalds 提交于
      Merge branch 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6
      
      * 'rmobile-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
        ARM: mach-shmobile: AG5EVM LCDC / MIPI-DSI platform data
        ARM: mach-shmobile: sh73a0 CPGA fix for PLL CFG bit
        ARM: mach-shmobile: mackerel: clarify shdi/mmcif switch settings
        ARM: mach-shmobile: sh73a0 CPGA fix for IrDA MSTP
        ARM: mach-shmobile: sh73a0 CPGA fix for FRQCRA M3
        ARM: mach-shmobile: remove sh7367 on-chip set_irq_type()
        ARM: mach-shmobile: sh7372 INTCS MFIS2 interrupt update
        ARM: mach-shmobile: ag5evm: Add IrDA support
        ARM: mach-shmobile: clock-sh7372: fixup pllc2 set_rate
        mmc: sh_mmcif: Convert to __raw_xxx() I/O accessors.
        ARM: mach-shmobile: ag5evm requires GPIOLIB
        ARM: mach-shmobile: fix cpu_base of gic_init() on sh73a0
      8348ad7c
    • L
      Merge branch 'fbdev-fixes-for-linus' of... · 9948d378
      Linus Torvalds 提交于
      Merge branch 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6
      
      * 'fbdev-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/fbdev-2.6:
        mailmap: Add an entry for Axel Lin.
        video: fix some comments in drivers/video/console/vgacon.c
        drivers/video/bf537-lq035.c: Add missing IS_ERR test
        video: pxa168fb: remove a redundant pxa168fb_check_var call
        video: da8xx-fb: fix fb_probe error path
        video: pxa3xx-gcu: Return -EFAULT when copy_from_user() fails
        video: nuc900fb: properly free resources in nuc900fb_remove
        video: nuc900fb: fix compile error
      9948d378
    • L
      Merge branch 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6 · fd3830b3
      Linus Torvalds 提交于
      * 'sh-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
        sh: Fix build of sh7750 base boards
        sh: update INTC to clear IRQ sense valid flag
        sh: Fix sh build failure when CONFIG_SFC=m
        sh: fix MSIOF0 SPI on ecovec: it conflicts with VOU
        sh: support XZ-compressed kernel.
        sh: Fix up breakage from asm-generic/pgtable.h changes.
      fd3830b3
    • D
      KEYS: Fix __key_link_end() quota fixup on error · ceb73c12
      David Howells 提交于
      Fix __key_link_end()'s attempt to fix up the quota if an error occurs.
      
      There are two erroneous cases: Firstly, we always decrease the quota if
      the preallocated replacement keyring needs cleaning up, irrespective of
      whether or not we should (we may have replaced a pointer rather than
      adding another pointer).
      
      Secondly, we never clean up the quota if we added a pointer without the
      keyring storage being extended (we allocate multiple pointers at a time,
      even if we're not going to use them all immediately).
      
      We handle this by setting the bottom bit of the preallocation pointer in
      __key_link_begin() to indicate that the quota needs fixing up, which is
      then passed to __key_link() (which clears the whole thing) and
      __key_link_end().
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ceb73c12
    • A
      intel_scu_ipcutils: Fix the license tag · f5c66d70
      Alan Cox 提交于
      GPL V2 should be GPL v2
      Signed-off-by: NAlan Cox <alan@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f5c66d70
    • A
      Documentation: Fix kernel parameter ordering · 9cfe268e
      Alan Cox 提交于
      A B C D E ...
      Signed-off-by: NAlan Cox <alan@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9cfe268e
    • A
      intel_scu_ipc: fix signedness bug · ecb5646c
      Axel Lin 提交于
      busy_loop() returns negative error code, thus change err variable
      from u32 to int to properly propagate correct error code.
      
      Also remove unneeded initialization for err and i variables.
      Signed-off-by: NAxel Lin <axel.lin@gmail.com>
      Signed-off-by: NAlan Cox <alan@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ecb5646c
    • R
      Merge branch 'mmci' into fixes · 2f8e7285
      Russell King 提交于
      2f8e7285
    • R
      ALSA: AACI: fix timeout duration · 250c7a61
      Russell King 提交于
      Relying on the access time of peripherals is unreliable - it depends
      on the speed of the CPU and the bus.  On Versatile Express, these
      timeouts were expiring, causing the driver to fail.
      
      Add udelay(1) to ensure that they don't expire early, and adjust
      timeouts to give a reasonable margin over the response times.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      250c7a61
    • R
      ALSA: AACI: fix timeout condition checking · 69058cd6
      Russell King 提交于
      Ensure that a timeout coincident with the condition being waited for
      results in success rather than failure.  This helps avoid timeout
      conditions being inappropriately flagged.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      69058cd6
    • H
      ARM: 6636/1: ep93xx: default multiplexed gpio ports to gpio mode · fd015480
      Hartley Sweeten 提交于
      The EP93xx C and D GPIO ports are multiplexed with the Keypad Interface
      peripheral.  At power-up they default into non-GPIO mode with the Key
      Matrix controller enabled so these ports are unusable for GPIO.  Note
      that the Keypad Interface peripheral is only available in the EP9307,
      EP9312, and EP9315 processor variants.
      
      The keypad support will clear the DeviceConfig bits appropriately to
      enable the Keypad Interface when the driver is loaded.  And, when the
      driver is unloaded it will set the bits to return the ports to GPIO mode.
      
      To make these ports available for GPIO after power-up on all EP93xx
      processor variants, set the KEYS and GONK bits in the DeviceConfig
      register.
      
      Similarly, the E, G, and H ports are multiplexed with the IDE Interface
      peripheral.  At power-up these also default into non-GPIO mode.  Note
      that the IDE peripheral is only available in the EP9312 and EP9315
      processor variants.
      
      Since an IDE driver is not even available in mainline, set the EONIDE,
      GONIDE, and HONIDE bits in the DeviceConfig register so that these
      ports will be available for GPIO use after power-up.
      Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com>
      Acked-by: NRyan Mallon <ryan@bluewatersys.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      fd015480
    • C
      ARM: 6637/1: Make the argument to virt_to_phys() "const volatile" · 05b112ff
      Catalin Marinas 提交于
      Changing the virt_to_phys() argument to "const volatile void *" avoids
      compiler warnings in some situations where this function is used.
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Acked-by: NStephen Boyd <sboyd@codeaurora.org>
      Acked-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      05b112ff
    • R
      ARM: twd: ensure timer reload is reprogrammed on entry to periodic mode · 03399c1c
      Russell King 提交于
      Ensure that the twd timer reload value is reprogrammed each time we
      enter periodic mode.  This ensures that the reload value is always
      reset correctly.
      Tested-by: NSantosh Shilimkar <santosh.shilimkar@ti.com>
      Acked-by: NColin Cross <ccross@android.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      03399c1c
    • P
      ARM: 6635/2: Configure reference clock for Versatile Express timers · baaece22
      Pawel Moll 提交于
      Timers on Versatile Express mainboard are used as system clock/event
      sources. Driver assumes that they are clocked with 1MHz signal.
      Old V2M firmware apparently configured it by default, but on newer
      boards one can observe that "sleep 1" command takes over 30 seconds
      to finish, as the timers are fed with 32kHz instead...
      
      This patch performs required magic and also removes code clearing
      timer's control registers, as exactly the same operations are
      performed by the timer driver few jiffies later.
      Signed-off-by: NPawel Moll <pawel.moll@arm.com>
      Tested-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      baaece22
  2. 25 1月, 2011 4 次提交
    • R
      ARM: versatile: name configuration options after actual board names · e5310f61
      Russell King 提交于
      Update the option text to those which appear on the front of the
      appropriate board user guides.  This gives consistent board naming, and
      makes it obvious which option is for which platform.
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      e5310f61
    • R
      ARM: realview: name configuration options after actual board names · d2a1c9ad
      Russell King 提交于
      As no one seems to really know which configuration options tie up with
      which boards, I thought I'd do some investigation and try to work it
      out.  After discussion with some folk in linaro, I think I have this
      nailed.
      
      The names are updated to use the name on the front of the appropriate
      board user guide for the various baseboards, which I've taken to be
      the official name for each board.
      
      I haven't significantly updated the descriptions for the tiles as that
      is even less clear - as far as I can see on ARMs website, there is no
      Cortex-A9 tile for Realview EB - only ARM11MPCore, ARM1156T2F-S,
      ARM1176TZF-S and Cortex-R4F.  So exactly what this 'Multicore Cortex-A9
      Tile' is...
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      d2a1c9ad
    • R
      ARM: realview,vexpress: fix section mismatch warning for pen_release · ec15038f
      Russell King 提交于
      Fix two section mismatch warnings in the platform SMP bringup code for
      Realview and Versatile Express:
      
      WARNING: arch/arm/mach-realview/built-in.o(.text+0x8ac): Section mismatch in reference from the function write_pen_release() to the variable .cpuinit.data:pen_release
      The function write_pen_release() references
      the variable __cpuinitdata pen_release.
      This is often because write_pen_release lacks a __cpuinitdata
      annotation or the annotation of pen_release is wrong.
      
      WARNING: arch/arm/mach-vexpress/built-in.o(.text+0x7b4): Section mismatch in reference from the function write_pen_release() to the variable .cpuinit.data:pen_release
      The function write_pen_release() references
      the variable __cpuinitdata pen_release.
      This is often because write_pen_release lacks a __cpuinitdata
      annotation or the annotation of pen_release is wrong.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      ec15038f
    • P
      mailmap: Add an entry for Axel Lin. · 45f17984
      Paul Mundt 提交于
      Not all of Axel's patches have used a consistent casing, so fix it up
      here.
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      45f17984