1. 07 2月, 2010 1 次提交
  2. 12 1月, 2010 1 次提交
    • A
      lib: add support for LZO-compressed kernels · 7dd65feb
      Albin Tonnerre 提交于
      This patch series adds generic support for creating and extracting
      LZO-compressed kernel images, as well as support for using such images on
      the x86 and ARM architectures, and support for creating and using
      LZO-compressed initrd and initramfs images.
      
      Russell King said:
      
      : Testing on a Cortex A9 model:
      : - lzo decompressor is 65% of the time gzip takes to decompress a kernel
      : - lzo kernel is 9% larger than a gzip kernel
      :
      : which I'm happy to say confirms your figures when comparing the two.
      :
      : However, when comparing your new gzip code to the old gzip code:
      : - new is 99% of the size of the old code
      : - new takes 42% of the time to decompress than the old code
      :
      : What this means is that for a proper comparison, the results get even better:
      : - lzo is 7.5% larger than the old gzip'd kernel image
      : - lzo takes 28% of the time that the old gzip code took
      :
      : So the expense seems definitely worth the effort.  The only reason I
      : can think of ever using gzip would be if you needed the additional
      : compression (eg, because you have limited flash to store the image.)
      :
      : I would argue that the default for ARM should therefore be LZO.
      
      This patch:
      
      The lzo compressor is worse than gzip at compression, but faster at
      extraction.  Here are some figures for an ARM board I'm working on:
      
      Uncompressed size: 3.24Mo
      gzip  1.61Mo 0.72s
      lzo   1.75Mo 0.48s
      
      So for a compression ratio that is still relatively close to gzip, it's
      much faster to extract, at least in that case.
      
      This part contains:
       - Makefile routine to support lzo compression
       - Fixes to the existing lzo compressor so that it can be used in
         compressed kernels
       - wrapper around the existing lzo1x_decompress, as it only extracts one
         block at a time, while we need to extract a whole file here
       - config dialog for kernel compression
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: cleanup]
      Signed-off-by: NAlbin Tonnerre <albin.tonnerre@free-electrons.com>
      Tested-by: NWu Zhangjin <wuzhangjin@gmail.com>
      Acked-by: N"H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: NRussell King <rmk@arm.linux.org.uk>
      Acked-by: NRussell King <rmk@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7dd65feb
  3. 17 12月, 2009 1 次提交
  4. 16 12月, 2009 3 次提交
  5. 12 12月, 2009 2 次提交
  6. 09 12月, 2009 1 次提交
  7. 03 12月, 2009 1 次提交
  8. 02 12月, 2009 2 次提交
  9. 26 11月, 2009 1 次提交
    • M
      timers, init: Limit the number of per cpu calibration bootup messages · feae3203
      Mike Travis 提交于
      Limit the number of per cpu calibration messages by only
      printing out results for the first cpu to boot.
      
      Also, don't print "CPUx is down" as this is expected, and we
      don't need 4096 reminders... ;-)
      Signed-off-by: NMike Travis <travis@sgi.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Roland Dreier <rdreier@cisco.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
      Cc: Jack Steiner <steiner@sgi.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <20091118002219.889552000@alcatraz.americas.sgi.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      feae3203
  10. 20 11月, 2009 1 次提交
    • D
      SLOW_WORK: Allow the work items to be viewed through a /proc file · 8fba10a4
      David Howells 提交于
      Allow the executing and queued work items to be viewed through a /proc file
      for debugging purposes.  The contents look something like the following:
      
          THR PID   ITEM ADDR        FL MARK  DESC
          === ===== ================ == ===== ==========
            0  3005 ffff880023f52348  a 952ms FSC: OBJ17d3: LOOK
            1  3006 ffff880024e33668  2 160ms FSC: OBJ17e5 OP60d3b: Write1/Store fl=2
            2  3165 ffff8800296dd180  a 424ms FSC: OBJ17e4: LOOK
            3  4089 ffff8800262c8d78  a 212ms FSC: OBJ17ea: CRTN
            4  4090 ffff88002792bed8  2 388ms FSC: OBJ17e8 OP60d36: Write1/Store fl=2
            5  4092 ffff88002a0ef308  2 388ms FSC: OBJ17e7 OP60d2e: Write1/Store fl=2
            6  4094 ffff88002abaf4b8  2 132ms FSC: OBJ17e2 OP60d4e: Write1/Store fl=2
            7  4095 ffff88002bb188e0  a 388ms FSC: OBJ17e9: CRTN
          vsq     - ffff880023d99668  1 308ms FSC: OBJ17e0 OP60f91: Write1/EnQ fl=2
          vsq     - ffff8800295d1740  1 212ms FSC: OBJ16be OP4d4b6: Write1/EnQ fl=2
          vsq     - ffff880025ba3308  1 160ms FSC: OBJ179a OP58dec: Write1/EnQ fl=2
          vsq     - ffff880024ec83e0  1 160ms FSC: OBJ17ae OP599f2: Write1/EnQ fl=2
          vsq     - ffff880026618e00  1 160ms FSC: OBJ17e6 OP60d33: Write1/EnQ fl=2
          vsq     - ffff880025a2a4b8  1 132ms FSC: OBJ16a2 OP4d583: Write1/EnQ fl=2
          vsq     - ffff880023cbe6d8  9 212ms FSC: OBJ17eb: LOOK
          vsq     - ffff880024d37590  9 212ms FSC: OBJ17ec: LOOK
          vsq     - ffff880027746cb0  9 212ms FSC: OBJ17ed: LOOK
          vsq     - ffff880024d37ae8  9 212ms FSC: OBJ17ee: LOOK
          vsq     - ffff880024d37cb0  9 212ms FSC: OBJ17ef: LOOK
          vsq     - ffff880025036550  9 212ms FSC: OBJ17f0: LOOK
          vsq     - ffff8800250368e0  9 212ms FSC: OBJ17f1: LOOK
          vsq     - ffff880025036aa8  9 212ms FSC: OBJ17f2: LOOK
      
      In the 'THR' column, executing items show the thread they're occupying and
      queued threads indicate which queue they're on.  'PID' shows the process ID of
      a slow-work thread that's executing something.  'FL' shows the work item flags.
      'MARK' indicates how long since an item was queued or began executing.  Lastly,
      the 'DESC' column permits the owner of an item to give some information.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      8fba10a4
  11. 14 11月, 2009 1 次提交
    • T
      locking: Make inlining decision Kconfig based · 6beb0009
      Thomas Gleixner 提交于
      commit 892a7c67 (locking: Allow arch-inlined spinlocks) implements the
      selection of which lock functions are inlined based on defines in
      arch/.../spinlock.h: #define __always_inline__LOCK_FUNCTION
      
      Despite of the name __always_inline__* the lock functions can be built
      out of line depending on config options. Also if the arch does not set
      some inline defines the generic code might set them; again depending on
      config options.
      
      This makes it unnecessary hard to figure out when and which lock
      functions are inlined. Aside of that it makes it way harder and
      messier for -rt to manipulate the lock functions.
      
      Convert the inlining decision to CONFIG switches. Each lock function
      is inlined depending on CONFIG_INLINE_*. The configs implement the
      existing dependencies. The architecture code can select ARCH_INLINE_*
      to signal that it wants the corresponding lock function inlined.
      ARCH_INLINE_* is necessary as Kconfig ignores "depends on"
      restrictions when a config element is selected.
      
      No functional change.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <20091109151428.504477141@linutronix.de>
      Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: NIngo Molnar <mingo@elte.hu>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      6beb0009
  12. 11 11月, 2009 1 次提交
    • E
      sysctl: Reduce sys_sysctl to a compatibility wrapper around /proc/sys · 26a7034b
      Eric W. Biederman 提交于
      To simply maintenance and to be able to remove all of the binary
      sysctl support from various subsystems I have rewritten the binary
      sysctl code as a compatibility wrapper around proc/sys.
      
      The code is built around a hard coded table based on the table
      in sysctl_check.c that lists all of our current binary sysctls
      and provides enough information to convert from the sysctl
      binary input into into ascii and back again.  New in this
      patch is the realization that the only dynamic entries
      that need to be handled have ifname as the asscii string
      and ifindex as their ctl_name.
      
      When a sys_sysctl is called the code now looks in the
      translation table converting the binary name to the
      path under /proc where the value is to be found.  Opens
      that file, and calls into a format conversion wrapper
      that calls fop->read and then fop->write as appropriate.
      
      Since in practice the practically no one uses or tests
      sys_sysctl rewritting the code to be beautiful is a little
      silly.  The redeeming merit of this work is it allows us to
      rip out all of the binary sysctl syscall support from
      everywhere else in the tree.  Allowing us to remove
      a lot of dead (after this patch) and barely maintained code.
      
      In addition it becomes much easier to optimize the sysctl
      implementation for being the backing store of /proc/sys,
      without having to worry about sys_sysctl.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      26a7034b
  13. 02 11月, 2009 1 次提交
  14. 27 10月, 2009 1 次提交
  15. 26 10月, 2009 1 次提交
    • P
      rcu: "Tiny RCU", The Bloatwatch Edition · 9b1d82fa
      Paul E. McKenney 提交于
      This patch is a version of RCU designed for !SMP provided for a
      small-footprint RCU implementation.  In particular, the
      implementation of synchronize_rcu() is extremely lightweight and
      high performance. It passes rcutorture testing in each of the
      four relevant configurations (combinations of NO_HZ and PREEMPT)
      on x86.  This saves about 1K bytes compared to old Classic RCU
      (which is no longer in mainline), and more than three kilobytes
      compared to Hierarchical RCU (updated to 2.6.30):
      
      	CONFIG_TREE_RCU:
      
      	   text	   data	    bss	    dec	    filename
      	    183       4       0     187     kernel/rcupdate.o
      	   2783     520      36    3339     kernel/rcutree.o
      				   3526 Total (vs 4565 for v7)
      
      	CONFIG_TREE_PREEMPT_RCU:
      
      	   text	   data	    bss	    dec	    filename
      	    263       4       0     267     kernel/rcupdate.o
      	   4594     776      52    5422     kernel/rcutree.o
      	   			   5689 Total (6155 for v7)
      
      	CONFIG_TINY_RCU:
      
      	   text	   data	    bss	    dec	    filename
      	     96       4       0     100     kernel/rcupdate.o
      	    734      24       0     758     kernel/rcutiny.o
      	    			    858 Total (vs 848 for v7)
      
      The above is for x86.  Your mileage may vary on other platforms.
      Further compression is possible, but is being procrastinated.
      
      Changes from v7 (http://lkml.org/lkml/2009/10/9/388)
      
      o	Apply Lai Jiangshan's review comments (aside from
      might_sleep() 	in synchronize_sched(), which is covered by SMP builds).
      
      o	Fix up expedited primitives.
      
      Changes from v6 (http://lkml.org/lkml/2009/9/23/293).
      
      o	Forward ported to put it into the 2.6.33 stream.
      
      o	Added lockdep support.
      
      o	Make lightweight rcu_barrier.
      
      Changes from v5 (http://lkml.org/lkml/2009/6/23/12).
      
      o	Ported to latest pre-2.6.32 merge window kernel.
      
      	- Renamed rcu_qsctr_inc() to rcu_sched_qs().
      	- Renamed rcu_bh_qsctr_inc() to rcu_bh_qs().
      	- Provided trivial rcu_cpu_notify().
      	- Provided trivial exit_rcu().
      	- Provided trivial rcu_needs_cpu().
      	- Fixed up the rcu_*_enter/exit() functions in linux/hardirq.h.
      
      o	Removed the dependence on EMBEDDED, with a view to making
      	TINY_RCU default for !SMP at some time in the future.
      
      o	Added (trivial) support for expedited grace periods.
      
      Changes from v4 (http://lkml.org/lkml/2009/5/2/91) include:
      
      o	Squeeze the size down a bit further by removing the
      	->completed field from struct rcu_ctrlblk.
      
      o	This permits synchronize_rcu() to become the empty function.
      	Previous concerns about rcutorture were unfounded, as
      	rcutorture correctly handles a constant value from
      	rcu_batches_completed() and rcu_batches_completed_bh().
      
      Changes from v3 (http://lkml.org/lkml/2009/3/29/221) include:
      
      o	Changed rcu_batches_completed(), rcu_batches_completed_bh()
      	rcu_enter_nohz(), rcu_exit_nohz(), rcu_nmi_enter(), and
      	rcu_nmi_exit(), to be static inlines, as suggested by David
      	Howells.  Doing this saves about 100 bytes from rcutiny.o.
      	(The numbers between v3 and this v4 of the patch are not directly
      	comparable, since they are against different versions of Linux.)
      
      Changes from v2 (http://lkml.org/lkml/2009/2/3/333) include:
      
      o	Fix whitespace issues.
      
      o	Change short-circuit "||" operator to instead be "+" in order
      to 	fix performance bug noted by "kraai" on LWN.
      
      		(http://lwn.net/Articles/324348/)
      
      Changes from v1 (http://lkml.org/lkml/2009/1/13/440) include:
      
      o	This version depends on EMBEDDED as well as !SMP, as suggested
      	by Ingo.
      
      o	Updated rcu_needs_cpu() to unconditionally return zero,
      	permitting the CPU to enter dynticks-idle mode at any time.
      	This works because callbacks can be invoked upon entry to
      	dynticks-idle mode.
      
      o	Paul is now OK with this being included, based on a poll at
      the 	Kernel Miniconf at linux.conf.au, where about ten people said
      	that they cared about saving 900 bytes on single-CPU systems.
      
      o	Applies to both mainline and tip/core/rcu.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NJosh Triplett <josh@joshtriplett.org>
      Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: avi@redhat.com
      Cc: mtosatti@redhat.com
      LKML-Reference: <12565226351355-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9b1d82fa
  16. 06 10月, 2009 1 次提交
    • P
      perf_event: Provide vmalloc() based mmap() backing · 906010b2
      Peter Zijlstra 提交于
      Some architectures such as Sparc, ARM and MIPS (basically
      everything with flush_dcache_page()) need to deal with dcache
      aliases by carefully placing pages in both kernel and user maps.
      
      These architectures typically have to use vmalloc_user() for this.
      
      However, on other architectures, vmalloc() is not needed and has
      the downsides of being more restricted and slower than regular
      allocations.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NDavid Miller <davem@davemloft.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1254830228.21044.272.camel@laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      906010b2
  17. 24 9月, 2009 3 次提交
    • A
      headers: utsname.h redux · 2bcd57ab
      Alexey Dobriyan 提交于
      * remove asm/atomic.h inclusion from linux/utsname.h --
         not needed after kref conversion
       * remove linux/utsname.h inclusion from files which do not need it
      
      NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
      due to some personality stuff it _is_ needed -- cowardly leave ELF-related
      headers and files alone.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2bcd57ab
    • R
      cpumask: remove unused cpu_mask_all · 72d78d05
      Rusty Russell 提交于
      It's only defined for NR_CPUS > BITS_PER_LONG; cpu_all_mask is always
      defined (and const).
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      72d78d05
    • P
      rcu: Clean up code based on review feedback from Josh Triplett, part 2 · 1eba8f84
      Paul E. McKenney 提交于
      These issues identified during an old-fashioned face-to-face code
      review extending over many hours.
      
      o	Add comments for tricky parts of code, and correct comments
      	that have passed their sell-by date.
      
      o	Get rid of the vestiges of rcu_init_sched(), which is no
      	longer needed now that PREEMPT_RCU is gone.
      
      o	Move the #include of rcutree_plugin.h to the end of
      	rcutree.c, which means that, rather than having a random
      	collection of forward declarations, the new set of forward
      	declarations document the set of plugins.  The new home for
      	this #include also allows __rcu_init_preempt() to move into
      	rcutree_plugin.h.
      
      o	Fix rcu_preempt_check_callbacks() to be static.
      Suggested-by: NJosh Triplett <josh@joshtriplett.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      LKML-Reference: <12537246443924-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Peter Zijlstra <peterz@infradead.org>
      1eba8f84
  18. 22 9月, 2009 1 次提交
  19. 21 9月, 2009 2 次提交
    • I
      perf: Tidy up after the big rename · 57c0c15b
      Ingo Molnar 提交于
       - provide compatibility Kconfig entry for existing PERF_COUNTERS .config's
      
       - provide courtesy copy of old perf_counter.h, for user-space projects
      
       - small indentation fixups
      
       - fix up MAINTAINERS
      
       - fix small x86 printout fallout
      
       - fix up small PowerPC comment fallout (use 'counter' as in register)
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      57c0c15b
    • I
      perf: Do the big rename: Performance Counters -> Performance Events · cdd6c482
      Ingo Molnar 提交于
      Bye-bye Performance Counters, welcome Performance Events!
      
      In the past few months the perfcounters subsystem has grown out its
      initial role of counting hardware events, and has become (and is
      becoming) a much broader generic event enumeration, reporting, logging,
      monitoring, analysis facility.
      
      Naming its core object 'perf_counter' and naming the subsystem
      'perfcounters' has become more and more of a misnomer. With pending
      code like hw-breakpoints support the 'counter' name is less and
      less appropriate.
      
      All in one, we've decided to rename the subsystem to 'performance
      events' and to propagate this rename through all fields, variables
      and API names. (in an ABI compatible fashion)
      
      The word 'event' is also a bit shorter than 'counter' - which makes
      it slightly more convenient to write/handle as well.
      
      Thanks goes to Stephane Eranian who first observed this misnomer and
      suggested a rename.
      
      User-space tooling and ABI compatibility is not affected - this patch
      should be function-invariant. (Also, defconfigs were not touched to
      keep the size down.)
      
      This patch has been generated via the following script:
      
        FILES=$(find * -type f | grep -vE 'oprofile|[^K]config')
      
        sed -i \
          -e 's/PERF_EVENT_/PERF_RECORD_/g' \
          -e 's/PERF_COUNTER/PERF_EVENT/g' \
          -e 's/perf_counter/perf_event/g' \
          -e 's/nb_counters/nb_events/g' \
          -e 's/swcounter/swevent/g' \
          -e 's/tpcounter_event/tp_event/g' \
          $FILES
      
        for N in $(find . -name perf_counter.[ch]); do
          M=$(echo $N | sed 's/perf_counter/perf_event/g')
          mv $N $M
        done
      
        FILES=$(find . -name perf_event.*)
      
        sed -i \
          -e 's/COUNTER_MASK/REG_MASK/g' \
          -e 's/COUNTER/EVENT/g' \
          -e 's/\<event\>/event_id/g' \
          -e 's/counter/event/g' \
          -e 's/Counter/Event/g' \
          $FILES
      
      ... to keep it as correct as possible. This script can also be
      used by anyone who has pending perfcounters patches - it converts
      a Linux kernel tree over to the new naming. We tried to time this
      change to the point in time where the amount of pending patches
      is the smallest: the end of the merge window.
      
      Namespace clashes were fixed up in a preparatory patch - and some
      stylistic fallout will be fixed up in a subsequent patch.
      
      ( NOTE: 'counters' are still the proper terminology when we deal
        with hardware registers - and these sed scripts are a bit
        over-eager in renaming them. I've undone some of that, but
        in case there's something left where 'counter' would be
        better than 'event' we can undo that on an individual basis
        instead of touching an otherwise nicely automated patch. )
      Suggested-by: NStephane Eranian <eranian@google.com>
      Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Reviewed-by: NArjan van de Ven <arjan@linux.intel.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cdd6c482
  20. 20 9月, 2009 1 次提交
  21. 19 9月, 2009 1 次提交
  22. 18 9月, 2009 1 次提交
  23. 16 9月, 2009 1 次提交
    • K
      Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev · 2b2af54a
      Kay Sievers 提交于
      Devtmpfs lets the kernel create a tmpfs instance called devtmpfs
      very early at kernel initialization, before any driver-core device
      is registered. Every device with a major/minor will provide a
      device node in devtmpfs.
      
      Devtmpfs can be changed and altered by userspace at any time,
      and in any way needed - just like today's udev-mounted tmpfs.
      Unmodified udev versions will run just fine on top of it, and will
      recognize an already existing kernel-created device node and use it.
      The default node permissions are root:root 0600. Proper permissions
      and user/group ownership, meaningful symlinks, all other policy still
      needs to be applied by userspace.
      
      If a node is created by devtmps, devtmpfs will remove the device node
      when the device goes away. If the device node was created by
      userspace, or the devtmpfs created node was replaced by userspace, it
      will no longer be removed by devtmpfs.
      
      If it is requested to auto-mount it, it makes init=/bin/sh work
      without any further userspace support. /dev will be fully populated
      and dynamic, and always reflect the current device state of the kernel.
      With the commonly used dynamic device numbers, it solves the problem
      where static devices nodes may point to the wrong devices.
      
      It is intended to make the initial bootup logic simpler and more robust,
      by de-coupling the creation of the inital environment, to reliably run
      userspace processes, from a complex userspace bootstrap logic to provide
      a working /dev.
      Signed-off-by: NKay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NJan Blunck <jblunck@suse.de>
      Tested-By: NHarald Hoyer <harald@redhat.com>
      Tested-By: NScott James Remnant <scott@ubuntu.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
      2b2af54a
  24. 04 9月, 2009 1 次提交
  25. 02 9月, 2009 1 次提交
  26. 29 8月, 2009 1 次提交
  27. 27 8月, 2009 1 次提交
    • T
      init: Move sched_clock_init after late_time_init · fa84e9ee
      Thomas Gleixner 提交于
      Some architectures initialize clocks and timers in late_time_init and
      x86 wants to do the same to avoid FIXMAP hackery for calibrating the
      TSC. That would result in undefined sched_clock readout and wreckaged
      printk timestamps again. We probably have those already on archs which
      do all their time/clock setup in late_time_init.
      
      There is no harm to move that after late_time_init except that a few
      more boot timestamps are stale. The scheduler is not active at that
      point so no real wreckage is expected.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      LKML-Reference: <new-submission>
      Cc: linux-arch@vger.kernel.org
      fa84e9ee
  28. 23 8月, 2009 2 次提交
    • P
      rcu: Remove CONFIG_PREEMPT_RCU · 6b3ef48a
      Paul E. McKenney 提交于
      Now that CONFIG_TREE_PREEMPT_RCU is in place, there is no
      further need for CONFIG_PREEMPT_RCU.  Remove it, along with
      whatever subtle bugs it may (or may not) contain.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <125097461396-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6b3ef48a
    • P
      rcu: Merge preemptable-RCU functionality into hierarchical RCU · f41d911f
      Paul E. McKenney 提交于
      Create a kernel/rcutree_plugin.h file that contains definitions
      for preemptable RCU (or, under the #else branch of the #ifdef,
      empty definitions for the classic non-preemptable semantics).
      These definitions fit into plugins defined in kernel/rcutree.c
      for this purpose.
      
      This variant of preemptable RCU uses a new algorithm whose
      read-side expense is roughly that of classic hierarchical RCU
      under CONFIG_PREEMPT. This new algorithm's update-side expense
      is similar to that of classic hierarchical RCU, and, in absence
      of read-side preemption or blocking, is exactly that of classic
      hierarchical RCU.  Perhaps more important, this new algorithm
      has a much simpler implementation, saving well over 1,000 lines
      of code compared to mainline's implementation of preemptable
      RCU, which will hopefully be retired in favor of this new
      algorithm.
      
      The simplifications are obtained by maintaining per-task
      nesting state for running tasks, and using a simple
      lock-protected algorithm to handle accounting when tasks block
      within RCU read-side critical sections, making use of lessons
      learned while creating numerous user-level RCU implementations
      over the past 18 months.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: akpm@linux-foundation.org
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josht@linux.vnet.ibm.com
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      LKML-Reference: <12509746134003-git-send-email->
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f41d911f
  29. 21 8月, 2009 1 次提交
    • I
      tracing: Fix too large stack usage in do_one_initcall() · 4a683bf9
      Ingo Molnar 提交于
      One of my testboxes triggered this nasty stack overflow crash
      during SCSI probing:
      
      [    5.874004] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
      [    5.875004] device: 'sda': device_add
      [    5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
      [    5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
      [    5.878004] *pde = 00000000
      [    5.878004] Thread overran stack, or stack corrupted
      [    5.878004] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      [    5.878004] last sysfs file:
      [    5.878004]
      [    5.878004] Pid: 1, comm: swapper Not tainted (2.6.31-rc6-tip-01272-g9919e28-dirty #5685)
      [    5.878004] EIP: 0060:[<b1008321>] EFLAGS: 00010083 CPU: 0
      [    5.878004] EIP is at print_context_stack+0x81/0x110
      [    5.878004] EAX: cf8a3000 EBX: cf8a3fe4 ECX: 00000049 EDX: 00000000
      [    5.878004] ESI: b1cfce84 EDI: 00000000 EBP: cf8a3018 ESP: cf8a2ff4
      [    5.878004]  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
      [    5.878004] Process swapper (pid: 1, ti=cf8a2000 task=cf8a8000 task.ti=cf8a3000)
      [    5.878004] Stack:
      [    5.878004]  b1004867 fffff000 cf8a3ffc
      [    5.878004] Call Trace:
      [    5.878004]  [<b1004867>] ? kernel_thread_helper+0x7/0x10
      [    5.878004] BUG: unable to handle kernel NULL pointer dereference at 00000a0c
      [    5.878004] IP: [<b1008321>] print_context_stack+0x81/0x110
      [    5.878004] *pde = 00000000
      [    5.878004] Thread overran stack, or stack corrupted
      [    5.878004] Oops: 0000 [#2] PREEMPT SMP DEBUG_PAGEALLOC
      
      The oops did not reveal any more details about the real stack
      that we have and the system got into an infinite loop of
      recursive pagefaults.
      
      So i booted with CONFIG_STACK_TRACER=y and the 'stacktrace' boot
      parameter. The box did not crash (timings/conditions probably
      changed a tiny bit to trigger the catastrophic crash), but the
      /debug/tracing/stack_trace file was rather revealing:
      
              Depth    Size   Location    (72 entries)
              -----    ----   --------
        0)     3704      52   __change_page_attr+0xb8/0x290
        1)     3652      24   __change_page_attr_set_clr+0x43/0x90
        2)     3628      60   kernel_map_pages+0x108/0x120
        3)     3568      40   prep_new_page+0x7d/0x130
        4)     3528      84   get_page_from_freelist+0x106/0x420
        5)     3444     116   __alloc_pages_nodemask+0xd7/0x550
        6)     3328      36   allocate_slab+0xb1/0x100
        7)     3292      36   new_slab+0x1c/0x160
        8)     3256      36   __slab_alloc+0x133/0x2b0
        9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
       10)     3216     108   create_object+0x28/0x250
       11)     3108      40   kmemleak_alloc+0x81/0xc0
       12)     3068      24   kmem_cache_alloc+0x162/0x1d0
       13)     3044      52   scsi_pool_alloc_command+0x29/0x70
       14)     2992      20   scsi_host_alloc_command+0x22/0x70
       15)     2972      24   __scsi_get_command+0x1b/0x90
       16)     2948      28   scsi_get_command+0x35/0x90
       17)     2920      24   scsi_setup_blk_pc_cmnd+0xd4/0x100
       18)     2896     128   sd_prep_fn+0x332/0xa70
       19)     2768      36   blk_peek_request+0xe7/0x1d0
       20)     2732      56   scsi_request_fn+0x54/0x520
       21)     2676      12   __generic_unplug_device+0x2b/0x40
       22)     2664      24   blk_execute_rq_nowait+0x59/0x80
       23)     2640     172   blk_execute_rq+0x6b/0xb0
       24)     2468      32   scsi_execute+0xe0/0x140
       25)     2436      64   scsi_execute_req+0x152/0x160
       26)     2372      60   scsi_vpd_inquiry+0x6c/0x90
       27)     2312      44   scsi_get_vpd_page+0x112/0x160
       28)     2268      52   sd_revalidate_disk+0x1df/0x320
       29)     2216      92   rescan_partitions+0x98/0x330
       30)     2124      52   __blkdev_get+0x309/0x350
       31)     2072       8   blkdev_get+0xf/0x20
       32)     2064      44   register_disk+0xff/0x120
       33)     2020      36   add_disk+0x6e/0xb0
       34)     1984      44   sd_probe_async+0xfb/0x1d0
       35)     1940      44   __async_schedule+0xf4/0x1b0
       36)     1896       8   async_schedule+0x12/0x20
       37)     1888      60   sd_probe+0x305/0x360
       38)     1828      44   really_probe+0x63/0x170
       39)     1784      36   driver_probe_device+0x5d/0x60
       40)     1748      16   __device_attach+0x49/0x50
       41)     1732      32   bus_for_each_drv+0x5b/0x80
       42)     1700      24   device_attach+0x6b/0x70
       43)     1676      16   bus_attach_device+0x47/0x60
       44)     1660      76   device_add+0x33d/0x400
       45)     1584      52   scsi_sysfs_add_sdev+0x6a/0x2c0
       46)     1532     108   scsi_add_lun+0x44b/0x460
       47)     1424     116   scsi_probe_and_add_lun+0x182/0x4e0
       48)     1308      36   __scsi_add_device+0xd9/0xe0
       49)     1272      44   ata_scsi_scan_host+0x10b/0x190
       50)     1228      24   async_port_probe+0x96/0xd0
       51)     1204      44   __async_schedule+0xf4/0x1b0
       52)     1160       8   async_schedule+0x12/0x20
       53)     1152      48   ata_host_register+0x171/0x1d0
       54)     1104      60   ata_pci_sff_activate_host+0xf3/0x230
       55)     1044      44   ata_pci_sff_init_one+0xea/0x100
       56)     1000      48   amd_init_one+0xb2/0x190
       57)      952       8   local_pci_probe+0x13/0x20
       58)      944      32   pci_device_probe+0x68/0x90
       59)      912      44   really_probe+0x63/0x170
       60)      868      36   driver_probe_device+0x5d/0x60
       61)      832      20   __driver_attach+0x89/0xa0
       62)      812      32   bus_for_each_dev+0x5b/0x80
       63)      780      12   driver_attach+0x1e/0x20
       64)      768      72   bus_add_driver+0x14b/0x2d0
       65)      696      36   driver_register+0x6e/0x150
       66)      660      20   __pci_register_driver+0x53/0xc0
       67)      640       8   amd_init+0x14/0x16
       68)      632     572   do_one_initcall+0x2b/0x1d0
       69)       60      12   do_basic_setup+0x56/0x6a
       70)       48      20   kernel_init+0x84/0xce
       71)       28      28   kernel_thread_helper+0x7/0x10
      
      There's a lot of fat functions on that stack trace, but
      the largest of all is do_one_initcall(). This is due to
      the boot trace entry variables being on the stack.
      
      Fixing this is relatively easy, initcalls are fundamentally
      serialized, so we can move the local variables to file scope.
      
      Note that this large stack footprint was present for a
      couple of months already - what pushed my system over
      the edge was the addition of kmemleak to the call-chain:
      
        6)     3328      36   allocate_slab+0xb1/0x100
        7)     3292      36   new_slab+0x1c/0x160
        8)     3256      36   __slab_alloc+0x133/0x2b0
        9)     3220       4   kmem_cache_alloc+0x1bb/0x1d0
       10)     3216     108   create_object+0x28/0x250
       11)     3108      40   kmemleak_alloc+0x81/0xc0
       12)     3068      24   kmem_cache_alloc+0x162/0x1d0
       13)     3044      52   scsi_pool_alloc_command+0x29/0x70
      
      This pushes the total to ~3800 bytes, only a tiny bit
      more was needed to corrupt the on-kernel-stack thread_info.
      
      The fix reduces the stack footprint from 572 bytes
      to 28 bytes.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <srostedt@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@kernel.org>
      LKML-Reference: <new-submission>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      4a683bf9
  30. 14 8月, 2009 1 次提交
  31. 05 8月, 2009 1 次提交
  32. 02 8月, 2009 1 次提交