1. 26 Jul 2008 (8 commits)
  2. 22 Jul 2008 (2 commits)
  3. 21 Jul 2008 (1 commit)
    • initrd: Fix virtual/physical mix-up in overwrite test · fb6624eb
      Geert Uytterhoeven committed
      On recent kernels, I get the following error when using an initrd:
      
      | initrd overwritten (0x00b78000 < 0x07668000) - disabling it.
      
      My Amiga 4000 has 12 MiB of RAM at physical address 0x07400000 (virtual
      0x00000000).
      The initrd is located at the end of RAM: 0x00b78000 - 0x00c00000 (virtual).
      The overwrite test compares the (virtual) initrd location to the (physical)
      first available memory location, which fails.
      
      This patch converts initrd_start to a page frame number, so it can safely be
      compared with min_low_pfn (a sketch of the resulting check follows this entry).
      
      Before the introduction of discontiguous memory support on m68k
      (12d810c1), min_low_pfn was just left
      untouched by the m68k-specific code (zero, I guess), and everything worked
      fine.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fb6624eb
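      A minimal sketch of the check this implies, assuming the common helpers
      virt_to_page() and page_to_pfn(); the wrapper function is illustrative and
      the surrounding init/main.c context is omitted:

      	#include <linux/mm.h>		/* virt_to_page(), page_to_pfn() */
      	#include <linux/bootmem.h>	/* min_low_pfn */

      	/* Compare like with like: initrd_start is a virtual address,
      	 * min_low_pfn is a page frame number, so convert before comparing. */
      	static int initrd_would_be_overwritten(unsigned long initrd_start)
      	{
      		unsigned long initrd_pfn =
      			page_to_pfn(virt_to_page((void *)initrd_start));

      		return initrd_pfn < min_low_pfn;
      	}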
  4. 18 Jul 2008 (1 commit)
    • cpu hotplug, sched: Introduce cpu_active_map and redo sched domain management (take 2) · e761b772
      Max Krasnyansky committed
      This is based on Linus' idea of creating a cpu_active_map that prevents
      the scheduler load balancer from migrating tasks to a CPU that is going
      down.

      It allows us to simplify the domain management code and avoid unnecessary
      domain rebuilds during cpu hotplug event handling.

      Please ignore the cpusets part for now. It needs some more work in order
      to avoid crazy lock nesting, although I did simplify and unify the domain
      reinitialization logic. We now simply call partition_sched_domains() in
      all cases, which means we're using the exact same code paths as in the
      cpusets case, and hence the tests below cover cpusets too.
      Cpuset changes to make rebuild_sched_domains() callable from various
      contexts are in a separate patch (right after this one).
      
      This not only boots but also easily handles
      	while true; do make clean; make -j 8; done
      and
      	while true; do on-off-cpu 1; done
      at the same time.
      (on-off-cpu 1 simply does the echo 0/1 > /sys/.../cpu1/online thing.)

      Surprisingly, the box (dual-core Core2) is quite usable. In fact I'm typing
      this on it right now in gnome-terminal and things are moving just fine.

      Also, this is running with most of the debug features enabled (lockdep,
      mutex debugging, etc.); no BUG_ONs or lockdep complaints so far.
      
      I believe I addressed all of Dmitry's comments on Linus' original version.
      I changed both the fair and the rt balancer to mask out non-active CPUs,
      and replaced cpu_is_offline() with !cpu_active() in the main scheduler
      code where it made sense (to me); see the sketch after this entry.
      Signed-off-by: Max Krasnyanskiy <maxk@qualcomm.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Acked-by: Gregory Haskins <ghaskins@novell.com>
      Cc: dmitry.adamushko@gmail.com
      Cc: pj@sgi.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e761b772
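      A hedged sketch of the kind of test the balancer now applies before picking
      a destination CPU; the wrapper below is illustrative, only cpu_active() and
      cpu_is_offline() are real kernel predicates:

      	#include <linux/cpumask.h>

      	/* Illustrative helper: a CPU that is still online but already
      	 * cleared from cpu_active_map must not receive migrated tasks. */
      	static int can_migrate_to(int dest_cpu)
      	{
      		return cpu_active(dest_cpu);	/* stricter than !cpu_is_offline(dest_cpu) */
      	}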
  5. 15 Jul 2008 (1 commit)
  6. 26 Jun 2008 (1 commit)
    • Add generic helpers for arch IPI function calls · 3d442233
      Jens Axboe committed
      This adds kernel/smp.c, which contains helpers for IPI function calls. In
      addition to supporting the existing smp_call_function() in a more efficient
      manner, it also adds a more scalable variant called smp_call_function_single()
      for calling a given function on a single CPU only (usage is sketched after
      this entry).
      
      The core of this is based on the x86-64 patch from Nick Piggin, lots of
      changes since then. "Alan D. Brunelle" <Alan.Brunelle@hp.com> has
      contributed lots of fixes and suggestions as well. Also thanks to
      Paul E. McKenney <paulmck@linux.vnet.ibm.com> for reviewing RCU usage
      and getting rid of the data allocation fallback deadlock.
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
      3d442233
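      A usage sketch (the functions here are illustrative), written against the
      present-day four-argument form of smp_call_function_single(); the signature
      at the time of this commit still carried an extra retry/nonatomic argument
      that was dropped shortly afterwards:

      	#include <linux/smp.h>
      	#include <linux/kernel.h>

      	/* Runs on the target CPU, in IPI context, so it must not sleep. */
      	static void say_hello(void *info)
      	{
      		printk(KERN_INFO "hello from CPU %d\n", smp_processor_id());
      	}

      	static void poke_cpu_one(void)
      	{
      		/* Run say_hello() on CPU 1 and wait for it to complete. */
      		smp_call_function_single(1, say_hello, NULL, 1);
      	}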
  7. 24 Jun 2008 (2 commits)
    • x86: use cpu_khz for loops_per_jiffy calculation, cleanup · f3f3149f
      Alok Kataria committed
      As suggested by Ingo, remove all references to the TSC from init/calibrate.c.

      The TSC is x86-specific, and using "tsc" in variable names in a generic file
      should be avoided. lpj_tsc is now called lpj_fine, since it is related to
      fine-tuning of the lpj value. Also, tsc_rate_* is now called timer_rate_*.
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Daniel Hecht <dhecht@vmware.com>
      Cc: Tim Mann <mann@vmware.com>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Sahil Rihan <srihan@vmware.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      f3f3149f
    • x86: use cpu_khz for loops_per_jiffy calculation · 3da757da
      Alok Kataria committed
      On the x86 platform we can use the value of tsc_khz computed during TSC
      calibration to calculate the loops_per_jiffy value (see the sketch after
      this entry). It is very important to keep the error in lpj values to a
      minimum, as any error there may result in a kernel panic in check_timer.
      In a virtualization environment, on a highly overloaded host, the guest
      delay calibration may sometimes result in errors beyond the ~50% that
      timer_irq_works can handle, resulting in the guest panicking.

      This also makes some formatting changes to the lpj setup code so that a
      single printk prints the bogomips value.

      We do this only for the boot processor, because the APs can have
      different base frequencies, or the BIOS might boot an AP at a different
      frequency.
      Signed-off-by: Alok N Kataria <akataria@vmware.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Daniel Hecht <dhecht@vmware.com>
      Cc: Tim Mann <mann@vmware.com>
      Cc: Zach Amsden <zach@vmware.com>
      Cc: Sahil Rihan <srihan@vmware.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      3da757da
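      The arithmetic behind the change, as a standalone sketch only meant to show
      the relation lpj = tsc_khz * 1000 / HZ (the real x86 code computes this in
      the TSC calibration path):

      	#include <linux/types.h>
      	#include <linux/jiffies.h>	/* HZ */

      	/* With a trusted TSC frequency there is nothing to measure:
      	 * e.g. tsc_khz = 2000000 (2 GHz) and HZ = 250 gives 8,000,000
      	 * loops per jiffy.  Sketch only; kernel code would use do_div()
      	 * for the 64-bit division on 32-bit builds. */
      	static unsigned long lpj_from_tsc_khz(unsigned long tsc_khz)
      	{
      		return (unsigned long)(((u64)tsc_khz * 1000) / HZ);
      	}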
  8. 26 May 2008 (1 commit)
    • Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LIST · 73531905
      Sam Ravnborg committed
      init/Kconfig contains a list of configs that are searched
      for if 'make *config' is used with no .config present.
      Extend this list to look at the config identified by
      ARCH_DEFCONFIG.

      With this change we now try the defconfig targets last.

      This fixes a regression reported by
      Linus Torvalds <torvalds@linux-foundation.org>.
      Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      73531905
  9. 25 May 2008 (1 commit)
  10. 19 May 2008 (1 commit)
    • rcu: add call_rcu_sched() · 4446a36f
      Paul E. McKenney committed
      Fourth cut of the patch to provide call_rcu_sched().  This is, again, to
      synchronize_sched() as call_rcu() is to synchronize_rcu() (usage is sketched
      after this entry).
      
      Should be fine for experimental and -rt use, but not ready for inclusion.
      With some luck, I will be able to tell Andrew to come out of hiding on
      the next round.
      
      Passes multi-day rcutorture sessions with concurrent CPU hotplugging.
      
      Fixes since the first version include a bug that could result in
      indefinite blocking (spotted by Gautham Shenoy), better resiliency
      against CPU-hotplug operations, and other minor fixes.
      
      Fixes since the second version include reworking grace-period detection
      to avoid deadlocks that could happen when running concurrently with
      CPU hotplug, adding Mathieu's fix to avoid the softlockup messages,
      as well as Mathieu's fix to allow use earlier in boot.
      
      Fixes since the third version include a wrong-CPU bug spotted by
      Andrew, getting rid of the obsolete synchronize_kernel API that somehow
      snuck back in, merging spin_unlock() and local_irq_restore() in a
      few places, commenting the code that checks for quiescent states based
      on interrupting from user-mode execution or the idle loop, removing
      some inline attributes, and some code-style changes.
      
      Known/suspected shortcomings:
      
      o	I still do not entirely trust the sleep/wakeup logic.  Next step
      	will be to use a private snapshot of the CPU online mask in
      	rcu_sched_grace_period() -- if the CPU wasn't there at the start
      	of the grace period, we don't need to hear from it.  And the
      	bit about accounting for changes in online CPUs inside of
      	rcu_sched_grace_period() is ugly anyway.
      
      o	It might be good for rcu_sched_grace_period() to invoke
      	resched_cpu() when a given CPU wasn't responding quickly,
      	but resched_cpu() is declared static...
      
      This patch also fixes a long-standing bug in the earlier preemptable-RCU
      implementation of synchronize_rcu() that could result in loss of
      concurrent external changes to a task's CPU affinity mask.  I still cannot
      remember who reported this...
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      4446a36f
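      A usage sketch of the new primitive as introduced here (the struct and
      callback names are illustrative): call_rcu_sched() queues a callback that
      runs after every CPU has passed through a schedule()-based quiescent state.

      	#include <linux/kernel.h>
      	#include <linux/rcupdate.h>
      	#include <linux/slab.h>

      	struct foo {
      		int data;
      		struct rcu_head rcu;	/* embedded callback head */
      	};

      	static void foo_reclaim(struct rcu_head *head)
      	{
      		kfree(container_of(head, struct foo, rcu));
      	}

      	static void foo_release(struct foo *f)
      	{
      		/* Asynchronous counterpart of synchronize_sched(), just as
      		 * call_rcu() is the asynchronous counterpart of
      		 * synchronize_rcu(). */
      		call_rcu_sched(&f->rcu, foo_reclaim);
      	}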
  11. 16 May 2008 (3 commits)
  12. 15 May 2008 (1 commit)
  13. 13 May 2008 (1 commit)
  14. 09 May 2008 (1 commit)
  15. 07 May 2008 (1 commit)
  16. 06 May 2008 (3 commits)
  17. 05 May 2008 (1 commit)
    • Make forced module loading optional · 826e4506
      Linus Torvalds committed
      The kernel module loader used to be much too happy to allow loading of
      modules for the wrong kernel version by default.  For example, if you
      had MODVERSIONS enabled, but tried to load a module with no version
      info, it would happily load it and taint the kernel - whether it was
      likely to actually work or not!
      
      Generally, such forced module loading should be considered a really
      really bad idea, so make it conditional on a new config option
      (MODULE_FORCE_LOAD), and make it default to off.
      
      If somebody really wants to force module loads, that's their problem,
      but we should not encourage it.  Especially as it happened to me by
      mistake (i.e. regular unversioned Fedora modules getting loaded), causing
      lots of strange behavior.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      826e4506
  18. 02 May 2008 (1 commit)
  19. 30 Apr 2008 (3 commits)
    • infrastructure to debug (dynamic) objects · 3ac7fe5a
      Thomas Gleixner committed
      We can see an ever-repeating problem pattern with objects of any kind in the
      kernel:

      1) freeing of active objects
      2) reinitialization of active objects

      Both problems can be hard to debug because the crash happens at a point where
      we have no chance to decode the root cause anymore.  One problem spot is
      kernel timers, where the detection of the problem often happens in interrupt
      context and usually causes the machine to panic.
      
      While working on a timer related bug report I had to hack specialized code
      into the timer subsystem to get a reasonable hint for the root cause.  This
      debug hack was fine for temporary use, but far from a mergeable solution due
      to the intrusiveness into the timer code.
      
      The code further lacked the ability to detect and report the root cause
      instantly and keep the system operational.
      
      Keeping the system operational is important to get hold of the debug
      information without special debugging aids like serial consoles and special
      knowledge of the bug reporter.
      
      The problems described above are not restricted to timers, but timers tend to
      expose it usually in a full system crash.  Other objects are less explosive,
      but the symptoms caused by such mistakes can be even harder to debug.
      
      Instead of creating specialized debugging code for the timer subsystem, a
      generic infrastructure is created which allows developers to verify their
      code and provides an easy-to-enable debug facility for users in case of
      trouble.
      
      The debugobjects core code keeps track of operations on static and dynamic
      objects by inserting them into a hashed list, sanity checking them on
      object operations, and providing additional checks whenever kernel memory
      is freed.
      
      The tracked object operations (the corresponding calls are sketched after
      this entry) are:
      - initializing an object
      - adding an object to a subsystem list
      - deleting an object from a subsystem list

      Each operation is sanity checked before it is executed, and the
      subsystem-specific code can provide a fixup function which makes it possible
      to prevent damage from the operation.  When a sanity check triggers, a
      warning message and a stack trace are printed.

      The list of operations can be extended if the need arises.  For now it is
      limited to the requirements of the first user (timers).
      
      The core code enqueues the objects into hash buckets.  The hash index is
      generated from the address of the object to simplify the lookup for the check
      on kfree/vfree.  Each bucket has its own spinlock to avoid contention on a
      global lock.
      
      The debug code can be compiled in without being active.  The runtime overhead
      is minimal and could be optimized by asm alternatives.  A kernel command line
      option enables the debugging code.
      
      Thanks to Ingo Molnar for review, suggestions and cleanup patches.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3ac7fe5a
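      A hedged sketch of how a subsystem would annotate the three tracked
      operations with debugobjects calls; the object type is illustrative, the
      descriptor setup is simplified, and the optional fixup callbacks are left
      out:

      	#include <linux/debugobjects.h>

      	static struct debug_obj_descr my_obj_debug_descr = {
      		.name = "my_obj",	/* fixup callbacks omitted in this sketch */
      	};

      	struct my_obj {
      		int state;
      	};

      	static void my_obj_init(struct my_obj *obj)
      	{
      		debug_object_init(obj, &my_obj_debug_descr);	/* object initialized */
      		obj->state = 0;
      	}

      	static void my_obj_add(struct my_obj *obj)
      	{
      		debug_object_activate(obj, &my_obj_debug_descr);	/* added to a subsystem list */
      	}

      	static void my_obj_del(struct my_obj *obj)
      	{
      		debug_object_deactivate(obj, &my_obj_debug_descr);	/* deleted from the list */
      	}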
    • Deprecate find_task_by_pid() · 5cd20455
      Pavel Emelyanov committed
      There are some places that are known to operate on tasks'
      global pids only:

      * the rest_init() call (called at boot)
      * kgdb's getthread
      * create_kthread() (since the kthread is run in the init namespace)

      So use find_task_by_pid_ns(..., &init_pid_ns) there (see the sketch after
      this entry) and schedule find_task_by_pid() for removal.
      
      [sukadev@us.ibm.com: Fix warning in kernel/pid.c]
      Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5cd20455
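      A sketch of the replacement pattern; the wrapper is illustrative, and real
      callers must hold rcu_read_lock() or tasklist_lock around the lookup:

      	#include <linux/sched.h>
      	#include <linux/pid_namespace.h>	/* init_pid_ns */

      	/* Callers that really mean a global pid now say so explicitly. */
      	static struct task_struct *find_global_task(pid_t nr)
      	{
      		return find_task_by_pid_ns(nr, &init_pid_ns);
      	}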
    • signals: fix /sbin/init protection from unwanted signals · fae5fa44
      Oleg Nesterov committed
      The global init has a lot of long-standing problems with unhandled fatal
      signals.

      	- The "is_global_init(current)" check in get_signal_to_deliver()
      	  protects only the main thread. A sub-thread can dequeue a fatal
      	  signal and shut down the whole thread group except the main thread.
      	  If it dequeues SIGSTOP, /sbin/init will be stopped, which is not
      	  right either. Note that we can't use is_global_init(->group_leader);
      	  this breaks exec and can't solve the other problems we have.

      	- Even if afterwards ignored, a fatal signal sets SIGNAL_GROUP_EXIT
      	  on delivery. This breaks exec, has other bad implications, and is
      	  just wrong.
      
      Introduce the new SIGNAL_UNKILLABLE flag to fix these problems (a simplified
      sketch of the resulting check follows this entry).  It also helps to solve
      some other problems addressed by the subsequent patches.

      Currently we use this flag for the global init only, but it could also be
      used by kthreads and (perhaps) by sub-namespace inits.
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Roland McGrath <roland@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fae5fa44
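      A simplified illustration of the kind of check the flag enables in signal
      delivery; this is not the literal kernel code, and the helper name is
      hypothetical.  sig_kernel_only() keeps SIGKILL/SIGSTOP generated by the
      kernel effective:

      	#include <linux/sched.h>
      	#include <linux/signal.h>

      	/* Sketch: with the flag on the shared signal_struct, the protection
      	 * covers every thread in init's group, not just the group leader as
      	 * the old is_global_init(current) test did. */
      	static int should_drop_fatal_signal(struct signal_struct *sig, int signr)
      	{
      		return (sig->flags & SIGNAL_UNKILLABLE) && !sig_kernel_only(signr);
      	}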
  20. 29 Apr 2008 (6 commits)
    • idr: create idr_layer_cache at boot time · 199f0ca5
      Akinobu Mita committed
      Avoid a possible kmem_cache_create() failure by creating idr_layer_cache
      unconditionally at boot time rather than creating it on demand the first
      time idr_init() is called (see the sketch after this entry).

      This change also enables us to eliminate the check every time idr_init() is
      called.
      
      [akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
      [akpm@linux-foundation.org: fix alpha build]
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      199f0ca5
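      The boot-time creation essentially amounts to the following sketch, based on
      the idr implementation of that era (when struct idr_layer still existed);
      SLAB_PANIC turns an allocation failure into an immediate, visible boot
      failure instead of a later idr_init() error:

      	#include <linux/idr.h>
      	#include <linux/init.h>
      	#include <linux/slab.h>

      	static struct kmem_cache *idr_layer_cache;

      	/* Called once from early init code instead of lazily from the
      	 * first idr_init() call. */
      	void __init idr_init_cache(void)
      	{
      		idr_layer_cache = kmem_cache_create("idr_layer_cache",
      						    sizeof(struct idr_layer),
      						    0, SLAB_PANIC, NULL);
      	}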
    • sysctl: allow embedded targets to disable sysctl_check.c · 88f458e4
      Holger Schurig committed
      Disable sysctl_check.c for embedded targets. This saves about 11 kB
      in .text and another 11 kB in .data on a PXA255 embedded platform.
      Signed-off-by: Holger Schurig <hs4233@mail.mn-solutions.de>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      88f458e4
    • cgroups: add an owner to the mm_struct · cf475ad2
      Balbir Singh committed
      Remove the mem_cgroup member from mm_struct and instead add an owner (see
      the sketch after this entry).
      
      This approach was suggested by Paul Menage.  The advantage is that, once
      mm->owner is known, the cgroup can be determined using the subsystem id.
      It also allows several control groups that are virtually grouped by
      mm_struct to exist independently of the memory controller, i.e. without
      adding a mem_cgroup-style member for each controller to mm_struct.
      
      A new config option CONFIG_MM_OWNER is added and the memory resource
      controller selects this config option.
      
      This patch also adds cgroup callbacks to notify subsystems when mm->owner
      changes.  The mm_cgroup_changed callback is called with the task_lock() of
      the new task held and is called just prior to changing the mm->owner.
      
      I am indebted to Paul Menage for the several reviews of this patchset and
      helping me make it lighter and simpler.
      
      This patch was tested on a powerpc box, it was compiled with both the
      MM_OWNER config turned on and off.
      
      After the thread group leader exits, it is moved to init_css_set by
      cgroup_exit(); thus all future charges from running threads will be
      redirected to the init_css_set's subsystem.
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelianov <xemul@openvz.org>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: Hirokazu Takahashi <taka@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Reviewed-by: Paul Menage <menage@google.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cf475ad2
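      An illustrative sketch of what the new field buys a controller; the helper
      is hypothetical, mm->owner only exists when CONFIG_MM_OWNER is set, and real
      code pairs the dereference with rcu_read_lock():

      	#include <linux/mm_types.h>
      	#include <linux/rcupdate.h>
      	#include <linux/sched.h>

      	/* From an mm, recover the owning task; from the task, a subsystem
      	 * can then look up its own cgroup state instead of mm_struct having
      	 * to carry a mem_cgroup pointer directly. */
      	static struct task_struct *mm_owner_task(struct mm_struct *mm)
      	{
      		return rcu_dereference(mm->owner);	/* caller holds rcu_read_lock() */
      	}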
    • cgroups: implement device whitelist · 08ce5f16
      Serge E. Hallyn committed
      Implement a cgroup to track and enforce open and mknod restrictions on device
      files.  A device cgroup associates a device access whitelist with each cgroup.
       A whitelist entry has 4 fields.  'type' is a (all), c (char), or b (block).
      'all' means it applies to all types and all major and minor numbers.  Major
      and minor are either an integer or * for all.  Access is a composition of r
      (read), w (write), and m (mknod).
      
      The root device cgroup starts with rwm to 'all'.  A child devcg gets a copy of
      the parent.  Admins can then remove devices from the whitelist or add new
      entries.  A child cgroup can never receive a device access which is denied
      to its parent.  However, when a device access is removed from a parent, it
      will not also be removed from the child(ren).
      
      An entry is added using devices.allow, and removed using
      devices.deny.  For instance
      
      	echo 'c 1:3 mr' > /cgroups/1/devices.allow
      
      allows cgroup 1 to read and mknod the device usually known as
      /dev/null.  Doing
      
      	echo a > /cgroups/1/devices.deny
      
      will remove the default 'a *:* mrw' entry.
      
      CAP_SYS_ADMIN is needed to change permissions or move another task to a new
      cgroup.  A cgroup may not be granted more permissions than the cgroup's parent
      has.  Any task can move itself between cgroups.  This won't be sufficient, but
      we can decide the best way to adequately restrict movement later.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Acked-by: James Morris <jmorris@namei.org>
      Looks-good-to: Pavel Emelyanov <xemul@openvz.org>
      Cc: Daniel Hokka Zakrisson <daniel@hozac.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      08ce5f16
    • CGroup API files: make CGROUP_DEBUG default to off · 418d7d87
      Paul Menage committed
      The cgroup debug subsystem isn't generally useful for users.  It should
      default to "n".
      Signed-off-by: Paul Menage <menage@google.com>
      Cc: "Li Zefan" <lizf@cn.fujitsu.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "YAMAMOTO Takashi" <yamamoto@valinux.co.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      418d7d87
    • directly use kmalloc() and kfree() in init/initramfs.c · 3265e66b
      Thomas Petazzoni committed
      Instead of using the malloc() and free() wrappers needed by the
      lib/inflate.c code for allocations, simply use kmalloc() and kfree() in the
      initramfs code (see the sketch after this entry).  This is needed for a
      further lib/inflate.c-related cleanup patch that will remove the malloc()
      and free() functions.

      Take that opportunity to remove the useless kmalloc() return-value cast.
      
      Based on work done by Matt Mackall.
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Cc: Jan Engelhardt <jengelh@computergmbh.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3265e66b
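      A before/after sketch of the substitution; the buffer and function names are
      illustrative, not the actual identifiers in init/initramfs.c:

      	#include <linux/init.h>
      	#include <linux/slab.h>

      	static char *header_buf;	/* illustrative name */

      	static void __init alloc_header_buf(size_t len)
      	{
      		/* was: header_buf = (char *) malloc(len);  -- inflate wrapper
      		 * plus a useless cast, since kmalloc() returns void *       */
      		header_buf = kmalloc(len, GFP_KERNEL);
      	}

      	static void __init free_header_buf(void)
      	{
      		kfree(header_buf);	/* was: free(header_buf) */
      	}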