1. 21 Jan, 2011 17 commits
    • M
      kernel/smp.c: consolidate writes in smp_call_function_interrupt() · 225c8e01
      Milton Miller committed
      We have to test the cpu mask in the interrupt handler before checking the
      refs, otherwise we can start to follow an entry before it's deleted and
      find it partially initialized for the next trip.  Presently we also clear
      the cpumask bit before executing the called function, which implies
      getting write access to the line.  After the function is called we then
      decrement refs, and if they go to zero we then unlock the structure.
      
      However, this implies getting write access to the call function data
      before and after the function is called.  If we can assert that no
      smp_call_function execution function is allowed to enable interrupts, then
      we can move both writes to after the function is called, hopefully allowing
      both writes with one cache line bounce.
      
      On a 256 thread system with a kernel compiled for 1024 threads, the time
      to execute the testcase in the "smp_call_function_many race" changelog was
      reduced by about 30-40ms out of about 545 ms.
      
      I decided to keep this as WARN because it's now a buggy function, even
      though the stack trace is of no value -- a simple printk would give us the
      information needed.
      
      Raw data:
      
      Without patch:
        ipi_test startup took 1219366ns complete 539819014ns total 541038380ns
        ipi_test startup took 1695754ns complete 543439872ns total 545135626ns
        ipi_test startup took 7513568ns complete 539606362ns total 547119930ns
        ipi_test startup took 13304064ns complete 533898562ns total 547202626ns
        ipi_test startup took 8668192ns complete 544264074ns total 552932266ns
        ipi_test startup took 4977626ns complete 548862684ns total 553840310ns
        ipi_test startup took 2144486ns complete 541292318ns total 543436804ns
        ipi_test startup took 21245824ns complete 530280180ns total 551526004ns
      
      With patch:
        ipi_test startup took 5961748ns complete 500859628ns total 506821376ns
        ipi_test startup took 8975996ns complete 495098924ns total 504074920ns
        ipi_test startup took 19797750ns complete 492204740ns total 512002490ns
        ipi_test startup took 14824796ns complete 487495878ns total 502320674ns
        ipi_test startup took 11514882ns complete 494439372ns total 505954254ns
        ipi_test startup took 8288084ns complete 502570774ns total 510858858ns
        ipi_test startup took 6789954ns complete 493388112ns total 500178066ns
      
      	#include <linux/module.h>
      	#include <linux/init.h>
      	#include <linux/sched.h> /* sched clock */
      
      	#define ITERATIONS 100
      
      	static void do_nothing_ipi(void *dummy)
      	{
      	}
      
      	static void do_ipis(struct work_struct *dummy)
      	{
      		int i;
      
      		for (i = 0; i < ITERATIONS; i++)
      			smp_call_function(do_nothing_ipi, NULL, 1);
      
      		printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
      	}
      
      	static struct work_struct work[NR_CPUS];
      
      	static int __init testcase_init(void)
      	{
      		int cpu;
      		u64 start, started, done;
      
      		start = local_clock();
      		for_each_online_cpu(cpu) {
      			INIT_WORK(&work[cpu], do_ipis);
      			schedule_work_on(cpu, &work[cpu]);
      		}
      		started = local_clock();
      		for_each_online_cpu(cpu)
      			flush_work(&work[cpu]);
      		done = local_clock();
      		pr_info("ipi_test startup took %lldns complete %lldns total %lldns\n",
      			started-start, done-started, done-start);
      
      		return 0;
      	}
      
      	static void __exit testcase_exit(void)
      	{
      	}
      
      	module_init(testcase_init)
      	module_exit(testcase_exit)
      	MODULE_LICENSE("GPL");
      	MODULE_AUTHOR("Anton Blanchard");
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      225c8e01
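A minimal userspace sketch of the write ordering described in this commit message, using C11 atomics in place of the kernel's cpumask and atomic_t helpers. The struct layout and the names `call_data`, `handle_entry` and `count_call` are assumptions made for illustration; this is not the code from kernel/smp.c.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the per-cpu call data (field names are assumptions). */
struct call_data {
	atomic_ulong cpumask;      /* one bit per CPU that still has to run func */
	atomic_int refs;           /* number of CPUs that have not finished yet */
	void (*func)(void *info);
	void *info;
};

/* Example callback: counts how many times it was invoked. */
static void count_call(void *info)
{
	++*(int *)info;
}

/* Returns true if this CPU ran the function for this entry. */
static bool handle_entry(struct call_data *d, int cpu)
{
	unsigned long bit = 1UL << cpu;

	/* Read-only tests first: no write access to the cache line yet. */
	if (!(atomic_load(&d->cpumask) & bit))
		return false;
	if (atomic_load(&d->refs) == 0)
		return false;

	/* Assumption from the commit message: func never enables interrupts. */
	d->func(d->info);

	/* Both writes now happen back to back, after the call. */
	if (!(atomic_fetch_and(&d->cpumask, ~bit) & bit)) {
		fprintf(stderr, "WARN: cpumask bit for cpu %d was already clear\n", cpu);
		return true;
	}
	atomic_fetch_sub(&d->refs, 1);
	return true;
}
```

Calling `handle_entry()` once per CPU named in the mask runs the callback once per CPU and leaves `refs` at zero; a second call for the same CPU returns false from the read-only test without touching the line.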
    • A
      kernel/smp.c: fix smp_call_function_many() SMP race · 6dc19899
      Anton Blanchard committed
      I noticed a failure where we hit the following WARN_ON in
      generic_smp_call_function_interrupt:
      
                      if (!cpumask_test_and_clear_cpu(cpu, data->cpumask))
                              continue;
      
                      data->csd.func(data->csd.info);
      
                      refs = atomic_dec_return(&data->refs);
                      WARN_ON(refs < 0);      <-------------------------
      
      We atomically tested and cleared our bit in the cpumask, and yet the
      number of cpus left (ie refs) was 0.  How can this be?
      
      It turns out commit 54fdade1
      ("generic-ipi: make struct call_function_data lockless") is at fault.  It
      removes locking from smp_call_function_many and in doing so creates a
      rather complicated race.
      
      The problem comes about because:
      
       - The smp_call_function_many interrupt handler walks call_function.queue
         without any locking.
       - We reuse a percpu data structure in smp_call_function_many.
       - We do not wait for any RCU grace period before starting the next
         smp_call_function_many.
      
      Imagine a scenario where CPU A does two smp_call_functions back to back,
      and CPU B does an smp_call_function in between.  We concentrate on how CPU
      C handles the calls:
      
      CPU A            CPU B                  CPU C              CPU D

      smp_call_function
                                              smp_call_function_interrupt
                                                walks call_function.queue,
                                                sees data from CPU A on list

                       smp_call_function

                                              smp_call_function_interrupt
                                                walks call_function.queue,
                                                sees (stale) CPU A on list
                                                                 smp_call_function int
                                                                 clears last ref on A
                                                                 list_del_rcu, unlock
      smp_call_function reuses
      percpu *data A
                                              data->cpumask sees and
                                              clears bit in cpumask
                                              might be using old or new fn!
                                              decrements refs below 0

      set data->refs (too late!)
      
      The important thing to note is since the interrupt handler walks a
      potentially stale call_function.queue without any locking, then another
      cpu can view the percpu *data structure at any time, even when the owner
      is in the process of initialising it.
      
      The following test case hits the WARN_ON 100% of the time on my PowerPC
      box (having 128 threads does help :)
      
      #include <linux/module.h>
      #include <linux/init.h>
      
      #define ITERATIONS 100
      
      static void do_nothing_ipi(void *dummy)
      {
      }
      
      static void do_ipis(struct work_struct *dummy)
      {
      	int i;
      
      	for (i = 0; i < ITERATIONS; i++)
      		smp_call_function(do_nothing_ipi, NULL, 1);
      
      	printk(KERN_DEBUG "cpu %d finished\n", smp_processor_id());
      }
      
      static struct work_struct work[NR_CPUS];
      
      static int __init testcase_init(void)
      {
      	int cpu;
      
      	for_each_online_cpu(cpu) {
      		INIT_WORK(&work[cpu], do_ipis);
      		schedule_work_on(cpu, &work[cpu]);
      	}
      
      	return 0;
      }
      
      static void __exit testcase_exit(void)
      {
      }
      
      module_init(testcase_init)
      module_exit(testcase_exit)
      MODULE_LICENSE("GPL");
      MODULE_AUTHOR("Anton Blanchard");
      
      I tried to fix it by ordering the read and the write of ->cpumask and
      ->refs.  In doing so I missed a critical case, but thankfully Paul McKenney
      was able to spot my bug :) To ensure we aren't viewing previous
      iterations, the interrupt handler needs to read ->refs then ->cpumask then
      ->refs _again_.
      
      Thanks to Milton Miller and Paul McKenney for helping to debug this issue.
      
      [miltonm@bga.com: add WARN_ON and BUG_ON, remove extra read of refs before initial read of mask that doesn't help (also noted by Peter Zijlstra), adjust comments, hopefully clarify scenario ]
      [miltonm@bga.com: remove excess tests]
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Milton Miller <miltonm@bga.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: <stable@kernel.org> [2.6.32+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6dc19899
    • J
      memcg: correctly order reading PCG_USED and pc->mem_cgroup · 713735b4
      Johannes Weiner committed
      The placement of the read-side barrier is confused: the writer first
      sets pc->mem_cgroup, then PCG_USED.  The read-side barrier has to be
      between testing PCG_USED and reading pc->mem_cgroup.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      713735b4
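The ordering rule in this commit message can be sketched in userspace with C11 atomics. The release store and acquire load below stand in for the kernel's write and read barriers, and the type layouts and the names `PCG_USED_BIT`, `pc_commit` and `pc_lookup` are assumptions for illustration, not the memcg code itself.

```c
#include <stdatomic.h>
#include <stddef.h>

#define PCG_USED_BIT 0x1UL              /* hypothetical flag value */

struct mem_cgroup { int id; };

struct page_cgroup {
	atomic_ulong flags;
	struct mem_cgroup *mem_cgroup;
};

/* Writer: publish the owner first, then set the USED flag. */
static void pc_commit(struct page_cgroup *pc, struct mem_cgroup *memcg)
{
	pc->mem_cgroup = memcg;
	/* Release ordering keeps the pointer store ahead of the flag store. */
	atomic_fetch_or_explicit(&pc->flags, PCG_USED_BIT, memory_order_release);
}

/* Reader: test the flag first, and only then read the owner. */
static struct mem_cgroup *pc_lookup(struct page_cgroup *pc)
{
	/* Acquire ordering sits between the flag test and the pointer read. */
	unsigned long flags = atomic_load_explicit(&pc->flags, memory_order_acquire);

	if (!(flags & PCG_USED_BIT))
		return NULL;
	return pc->mem_cgroup;
}
```

A reader that observes the flag is then guaranteed to see the pointer written before it; placing the barrier anywhere other than between the two reads would not give that guarantee.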
    • R
      backlight: fix 88pm860x_bl macro collision · 2550326a
      Randy Dunlap committed
      Fix collision with kernel-supplied #define:
      
        drivers/video/backlight/88pm860x_bl.c:24:1: warning: "CURRENT_MASK" redefined
        arch/x86/include/asm/page_64_types.h:6:1: warning: this is the location of the previous definition
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Haojian Zhuang <haojian.zhuang@marvell.com>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2550326a
    • J
      drivers/leds/ledtrig-gpio.c: make output match input, tighten input checking · cc587ece
      Janusz Krzysztofik committed
      Replicate changes made to drivers/leds/ledtrig-backlight.c.
      
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Richard Purdie <richard.purdie@linuxfoundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cc587ece
    • N
      MAINTAINERS: update Atmel AT91 entry · c1fc8675
      Nicolas Ferre committed
      Add two co-maintainers and update the entry with new information.
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Acked-by: Andrew Victor <linux@maxim.org.za>
      Acked-by: Jean-Christophe PLAGNIOL-VILLARD <plagnioj@jcrosoft.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c1fc8675
    • J
      mm: fix truncate_setsize() comment · 382e27da
      Jan Kara committed
      Contrary to what the comment says, truncate_setsize() should be called
      *before* the filesystem truncates blocks.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      382e27da
    • K
      memcg: fix rmdir, force_empty with THP · 987eba66
      KAMEZAWA Hiroyuki committed
      Now, when THP is enabled, memcg's rmdir() function is broken because
      move_account() is not supported for THP pages.
      
      This will cause an account leak or an -EBUSY failure at rmdir().
      This patch fixes the issue by supporting move_account() for THP pages.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      987eba66
    • K
      memcg: fix LRU accounting with THP · ece35ca8
      KAMEZAWA Hiroyuki committed
      The memory cgroup's LRU stat should take the size of pages into account
      because Transparent Hugepage inserts hugepages into the LRU.  If this
      number is wrong, memory reclaim will not work well.
      
      Note: only the head page of a THP huge page is linked into the LRU.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ece35ca8
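As a rough illustration of size-aware LRU accounting, the sketch below adds or removes `1 << order` base pages per LRU entry instead of a fixed 1. The helper names and the order value 9 (512 base pages per hugepage) are assumptions for this example and do not come from the patch.

```c
#define HPAGE_ORDER_SKETCH 9    /* assumed: a hugepage spans 2^9 = 512 base pages */

/* Number of base pages covered by one LRU entry of the given order. */
static unsigned long pages_in_entry(unsigned int order)
{
	return 1UL << order;
}

/* Account one entry being linked into the LRU; returns the new count. */
static unsigned long lru_count_add(unsigned long lru_count, unsigned int order)
{
	return lru_count + pages_in_entry(order);
}

/* Account one entry being unlinked from the LRU; returns the new count. */
static unsigned long lru_count_del(unsigned long lru_count, unsigned int order)
{
	return lru_count - pages_in_entry(order);
}
```

With this scheme, linking one hugepage head page moves the counter by 512 rather than 1, so the statistic continues to reflect the amount of memory that reclaim can actually find on the list.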
    • K
      memcg: fix USED bit handling at uncharge in THP · ca3e0214
      KAMEZAWA Hiroyuki committed
      Now, under THP:
      
      at charge:
        - PageCgroupUsed bit is set on every page_cgroup of a hugepage,
          i.e. it is set on 512 pages.
      at uncharge:
        - PageCgroupUsed bit is unset only on the head page.
      
      So, some pages will remain with the "Used" bit set.
      
      This patch fixes this by setting the Used bit only on the head page.
      Used bits for tail pages will be set at splitting if necessary.
      
      This patch adds this lock order:
         compound_lock() -> page_cgroup_move_lock().
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ca3e0214
    • K
      memcg: modify accounting function for supporting THP better · e401f176
      KAMEZAWA Hiroyuki committed
      mem_cgroup_charge_statistics() was designed for charging a single page,
      but now we have transparent hugepages.  To fix problems (in a following
      patch), the function needs to take the number of pages as an argument.
      
      The new function takes the following arguments:
        - type of page rather than 'pc'
        - size of the page which is accounted.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e401f176
    • D
      fs/direct-io.c: don't try to allocate more than BIO_MAX_PAGES in a bio · 20d9600c
      David Dillow committed
      When using devices that support max_segments > BIO_MAX_PAGES (256), direct
      IO tries to allocate a bio with more pages than allowed, which leads to an
      oops in dio_bio_alloc().  Clamp the request to the supported maximum, and
      change dio_bio_alloc() to reflect that bio_alloc() will always return a
      bio when called with __GFP_WAIT and a valid number of vectors.
      
      [akpm@linux-foundation.org: remove redundant BUG_ON()]
      Signed-off-by: David Dillow <dillowda@ornl.gov>
      Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      20d9600c
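The clamping step described here can be sketched as a plain function. `BIO_MAX_PAGES` and its value of 256 are taken from the commit message; the function name and its two parameters are assumptions for illustration, not the signature used in fs/direct-io.c.

```c
#define BIO_MAX_PAGES 256   /* limit quoted in the commit message */

/*
 * Pick how many pages to request for one bio: never more than the I/O
 * still needs, never more than the device accepts, and never more than
 * BIO_MAX_PAGES even if the device advertises a larger segment limit.
 */
static int dio_nr_pages_sketch(int pages_in_io, int device_max_vecs)
{
	int nr_pages = pages_in_io < device_max_vecs ? pages_in_io : device_max_vecs;

	return nr_pages < BIO_MAX_PAGES ? nr_pages : BIO_MAX_PAGES;
}
```

For a device that reports 512 segments and an I/O of 1000 pages, the request is capped at 256, which keeps it within what the bio allocator is able to satisfy.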
    • J
      mm: compaction: prevent division-by-zero during user-requested compaction · 82478fb7
      Johannes Weiner committed
      Up until 3e7d3449 ("mm: vmscan: reclaim order-0 and use compaction instead
      of lumpy reclaim"), compaction skipped calculating the fragmentation index
      of a zone when compaction was explicitly requested through the procfs
      knob.
      
      However, when compaction_suitable was introduced, it did not come with an
      extra check for order == -1, set on explicit compaction requests, and
      passed this order on to the fragmentation index calculation, where it
      overshifts the number of requested pages, leading to a division by zero.
      
      This patch makes sure that order == -1 is recognized as the flag it is
      rather than passing it along as valid order parameter.
      
      [akpm@linux-foundation.org: add comment, per Mel]
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      82478fb7
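A simplified sketch of why order == -1 has to be caught before the fragmentation index is computed. The index formula below is a reduced model that omits the zone watermark checks, and the function names, the parameter list and the threshold of 500 are assumptions for illustration rather than the code in mm/compaction.c.

```c
#define EXTFRAG_THRESHOLD_SKETCH 500    /* assumed threshold for this sketch */

enum compact_result { COMPACT_SKIPPED, COMPACT_CONTINUE };

/* Reduced model of a fragmentation index in the range -1000..1000. */
static int fragmentation_index_sketch(unsigned int order,
				      unsigned long free_pages,
				      unsigned long free_blocks_total,
				      unsigned long free_blocks_suitable)
{
	/* An order of (unsigned)-1 would overshift here and break the division. */
	unsigned long long requested = 1ULL << order;

	if (!free_blocks_total)
		return 0;
	if (free_blocks_suitable)
		return -1000;
	return 1000 - (int)((1000 + free_pages * 1000ULL / requested) /
			    free_blocks_total);
}

static enum compact_result compaction_suitable_sketch(int order,
						      unsigned long free_pages,
						      unsigned long free_blocks_total,
						      unsigned long free_blocks_suitable)
{
	int fragindex;

	/* order == -1 is a flag for explicit compaction, not a real order. */
	if (order == -1)
		return COMPACT_CONTINUE;

	fragindex = fragmentation_index_sketch((unsigned int)order, free_pages,
					       free_blocks_total,
					       free_blocks_suitable);
	if (fragindex >= 0 && fragindex <= EXTFRAG_THRESHOLD_SKETCH)
		return COMPACT_SKIPPED;
	return COMPACT_CONTINUE;
}
```

The early return means the flag value never reaches the shift, so the index is only ever computed for a real allocation order.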
    • J
    • T
      memblock: fix memblock_is_region_memory() · abb65272
      Tomi Valkeinen committed
      memblock_is_region_memory() uses reserved memblocks to search for the
      given region, while it should use the memory memblocks.
      
      I encountered the problem with OMAP's framebuffer ram allocation.
      Normally the ram is allocated dynamically, and this function is not
      called.  However, if we want to pass the framebuffer from the bootloader
      to the kernel (to retain the boot image), this function is used to check
      the validity of the kernel parameters for the framebuffer ram area.
      Signed-off-by: Tomi Valkeinen <tomi.valkeinen@nokia.com>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      abb65272
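To illustrate the difference between searching the memory list and the reserved list, here is a small userspace model. The types, the example region tables and the function names are all assumptions for this sketch; the real memblock structures differ.

```c
struct region { unsigned long base, size; };
struct region_list { const struct region *regions; int cnt; };

/* Example data (hypothetical): one RAM bank and one small reserved area. */
static const struct region example_mem[] = { { 0x1000, 0x4000 } };
static const struct region example_rsv[] = { { 0x2000, 0x0100 } };
static const struct region_list example_memory = { example_mem, 1 };
static const struct region_list example_reserved = { example_rsv, 1 };

/* Binary search over sorted, non-overlapping regions; -1 if addr is in none. */
static int region_search(const struct region_list *list, unsigned long addr)
{
	int left = 0, right = list->cnt;

	while (left < right) {
		int mid = left + (right - left) / 2;
		const struct region *r = &list->regions[mid];

		if (addr < r->base)
			right = mid;
		else if (addr >= r->base + r->size)
			left = mid + 1;
		else
			return mid;
	}
	return -1;
}

/* Is [base, base + size) fully contained in one region of the list? */
static int is_region_in_list(const struct region_list *list,
			     unsigned long base, unsigned long size)
{
	int idx = region_search(list, base);

	if (idx < 0)
		return 0;
	return list->regions[idx].base <= base &&
	       list->regions[idx].base + list->regions[idx].size >= base + size;
}
```

Asking whether 0x3000..0x4000 is memory succeeds against `example_memory` but fails against `example_reserved`, which is the kind of wrong answer the commit message describes for the framebuffer check.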
    • J
      thp: keep highpte mapped until it is no longer needed · 453c7192
      Johannes Weiner committed
      Two users reported THP-related crashes on 32-bit x86 machines.  Their oops
      reports indicated an invalid pte, and subsequent code inspection showed
      that the highpte is actually used after unmap.
      
      The fix is to unmap the pte only after all operations against it are
      finished.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: Ilya Dryomov <idryomov@gmail.com>
      Reported-by: werner <w.landgraf@ru.ru>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Tested-by: Ilya Dryomov <idryomov@gmail.com>
      Tested-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      453c7192
    • D
      kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT · 6a108a14
      David Rientjes committed
      The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
      is used to configure any non-standard kernel with a much larger scope than
      only small devices.
      
      This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
      references to the option throughout the kernel.  A new CONFIG_EMBEDDED
      option is added that automatically selects CONFIG_EXPERT when enabled and
      can be used in the future to isolate options that should only be
      considered for embedded systems (RISC architectures, SLOB, etc).
      
      Calling the option "EXPERT" more accurately represents its intention: only
      expert users who understand the impact of the configuration changes they
      are making should enable it.
      Reviewed-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: David Woodhouse <david.woodhouse@intel.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Greg KH <gregkh@suse.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Robin Holt <holt@sgi.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a108a14
  2. 20 Jan, 2011 5 commits
  3. 19 Jan, 2011 18 commits