1. 03 Nov 2011, 6 commits
    • Revert "perf: Add PM notifiers to fix CPU hotplug races" · 4536e4d1
      Committed by Linus Torvalds
      This reverts commit 144060fe.
      
      It causes a resume regression for Andi on his Acer Aspire 1830T post
      3.1.  The screen just stays black after wakeup.
      
      Also, it really looks like the wrong way to suspend and resume perf
      events: I think they should be done as part of the CPU suspend and
      resume, rather than as a notifier that does smp_call_function().
      Reported-by: Andi Kleen <andi@firstfloor.org>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: replace ss->id_lock with a rwlock · c1e2ee2d
      Committed by Andrew Bresticker
      While back-porting Johannes Weiner's patch "mm: memcg-aware global
      reclaim" for an internal effort, we noticed a significant performance
      regression during page-reclaim heavy workloads due to high contention of
      the ss->id_lock.  This lock protects the idr map and serializes calls to
      idr_get_next() in css_get_next() (which is used during the memcg hierarchy
      walk).
      
      Since idr_get_next() is just doing a look up, we need only serialize it
      with respect to idr_remove()/idr_get_new().  By making the ss->id_lock a
      rwlock, contention is greatly reduced and performance improves.
      
      Tested: cat a 256m file from a ramdisk in a 128m container 50 times on
      each core (one file + container per core) in parallel on a NUMA machine.
      Result is the time for the test to complete in 1 of the containers.
      Both kernels included Johannes' memcg-aware global reclaim patches.
      
      Before rwlock patch: 1710.778s
      After rwlock patch: 152.227s
      Signed-off-by: Andrew Bresticker <abrestic@google.com>
      Cc: Paul Menage <menage@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ying Han <yinghan@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
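      A minimal sketch of the locking change, with the struct reduced to the
      fields involved (illustrative, not the full patch):

      /* ss->id_lock becomes an rwlock_t instead of a spinlock_t. */
      struct cgroup_subsys {
              /* ... */
              rwlock_t id_lock;       /* was: spinlock_t id_lock; */
              struct idr idr;
      };

      /* Lookup path (css_get_next): readers may now run in parallel. */
      read_lock(&ss->id_lock);
      tmp = idr_get_next(&ss->idr, &tmpid);
      read_unlock(&ss->id_lock);

      /* Update paths (idr_get_new()/idr_remove()) remain exclusive. */
      write_lock(&ss->id_lock);
      idr_remove(&ss->idr, id);
      write_unlock(&ss->id_lock);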
    • sysctl: add support for poll() · f1ecf068
      Committed by Lucas De Marchi
      Adding support for poll() in sysctl fs allows userspace to receive
      notifications of changes in sysctl entries.  This adds an infrastructure
      to allow files in sysctl fs to be pollable and implements it for hostname
      and domainname.
      
      [akpm@linux-foundation.org: s/declare/define/ for definitions]
      Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
      Cc: Greg KH <gregkh@suse.de>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
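      A hedged userspace example of the new capability.  It assumes, as this
      patch implements for hostname, that a change is signalled to pollers as
      POLLERR|POLLPRI, and that the file must be read once before polling and
      re-read after each wakeup:

      #include <fcntl.h>
      #include <poll.h>
      #include <stdio.h>
      #include <unistd.h>

      int main(void)
      {
              char buf[256];
              ssize_t n;
              int fd = open("/proc/sys/kernel/hostname", O_RDONLY);
              struct pollfd pfd = { .fd = fd, .events = POLLERR | POLLPRI };

              if (fd < 0)
                      return 1;
              read(fd, buf, sizeof(buf));        /* prime: read before polling */
              while (poll(&pfd, 1, -1) > 0) {
                      lseek(fd, 0, SEEK_SET);    /* rewind, fetch the new value */
                      n = read(fd, buf, sizeof(buf) - 1);
                      if (n > 0) {
                              buf[n] = '\0';
                              printf("hostname is now: %s", buf);
                      }
              }
              return 0;
      }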
    • cpusets: avoid looping when storing to mems_allowed if one node remains set · 89e8a244
      Committed by David Rientjes
      {get,put}_mems_allowed() exist so that general kernel code may locklessly
      access a task's set of allowable nodes without having the chance that a
      concurrent write will cause the nodemask to be empty on configurations
      where MAX_NUMNODES > BITS_PER_LONG.
      
      This could incur a significant delay, however, especially in low memory
      conditions because the page allocator is blocking and reclaim requires
      get_mems_allowed() itself.  It is not atypical to see writes to
      cpuset.mems take over 2 seconds to complete, for example.  In low memory
      conditions, this is problematic because it's one of the most important
      times to change cpuset.mems in the first place!
      
      The only way a task's set of allowable nodes may change is through
      cpusets, by writing to cpuset.mems or by attaching the task to a
      different cpuset.  The store is done by first setting all the new
      nodes, ensuring that generic code is not reading the nodemask with
      get_mems_allowed() at the same time, and then clearing all the old
      nodes.  This prevents the possibility that a reader will see an empty
      nodemask at the same time the writer is storing a new nodemask.
      
      If at least one node remains unchanged, though, it's possible to simply
      set all new nodes and then clear all the old nodes.  Changing a task's
      nodemask is protected by cgroup_mutex so it's guaranteed that two threads
      are not changing the same task's nodemask at the same time, so the
      nodemask is guaranteed to be stored before another thread changes it and
      determines whether a node remains set or not.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Miao Xie <miaox@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Paul Menage <paul@paulmenage.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
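      A sketch of the idea (illustrative, not the exact mainline diff): when
      the old and new masks share at least one node, the two-step store can
      never expose an empty mask, so no reader synchronization is needed.

      if (nodes_intersects(*oldmems, *newmems)) {
              /* Step 1: old | new -- every old node is still set. */
              nodes_or(tsk->mems_allowed, *oldmems, *newmems);
              /* Step 2: drop the old-only nodes, leaving the new mask. */
              tsk->mems_allowed = *newmems;
      } else {
              /* Disjoint masks: must still wait out get_mems_allowed()
               * readers before clearing the old nodes. */
      }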
    • cgroups: don't attach task to subsystem if migration failed · 77ceab8e
      Committed by Ben Blum
      If a task has exited to the point it has called cgroup_exit() already,
      then we can't migrate it to another cgroup anymore.
      
      This can happen when we are attaching a task to a new cgroup between the
      call to ->can_attach_task() on subsystems and the migration that is
      eventually tried in cgroup_task_migrate().
      
      In this case cgroup_task_migrate() returns -ESRCH and we don't want to
      attach the task to the subsystems because the attachment to the new cgroup
      itself failed.
      
      Fix this by only calling ->attach_task() on the subsystems if the cgroup
      migration succeeded.
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Acked-by: Paul Menage <paul@paulmenage.org>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
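      A simplified sketch of the fix (control flow only; the surrounding loop
      over the threadgroup is omitted):

      retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
      if (!retval) {
              /* Only tell subsystems about tasks that actually moved. */
              for_each_subsys(root, ss)
                      if (ss->attach_task)
                              ss->attach_task(cgrp, tsk);
      }
      /* A task that already ran cgroup_exit() returns -ESRCH above and
       * is simply skipped. */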
    • cgroups: more safe tasklist locking in cgroup_attach_proc · 33ef6b69
      Committed by Ben Blum
      Fix unstable tasklist locking in cgroup_attach_proc.
      
      According to this thread - https://lkml.org/lkml/2011/7/27/243 - RCU is
      not sufficient to guarantee the tasklist is stable w.r.t.  de_thread and
      exit.  Taking tasklist_lock for reading, instead of rcu_read_lock, ensures
      proper exclusion.
      Signed-off-by: Ben Blum <bblum@andrew.cmu.edu>
      Acked-by: Paul Menage <paul@paulmenage.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
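      A sketch of the locking change (simplified from cgroup_attach_proc):

      /* Take tasklist_lock for reading instead of rcu_read_lock(), so
       * de_thread() and exit cannot change the list underneath us. */
      read_lock(&tasklist_lock);
      if (!thread_group_leader(leader)) {
              /* Raced with de_thread(); the caller retries. */
              read_unlock(&tasklist_lock);
              goto retry;
      }
      tsk = leader;
      do {
              /* ... collect each thread of the group ... */
      } while_each_thread(leader, tsk);
      read_unlock(&tasklist_lock);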
  2. 01 Nov 2011, 11 commits
    • kgdb: follow rename pack_hex_byte() to hex_byte_pack() · 50e1499f
      Committed by Andy Shevchenko
      There is no functional change.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
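      For reference, the renamed helper keeps the old body; only call sites
      change mechanically (pack_hex_byte(buf, b) becomes hex_byte_pack(buf, b)):

      static inline char *hex_byte_pack(char *buf, u8 byte)
      {
              *buf++ = hex_asc_hi(byte);
              *buf++ = hex_asc_lo(byte);
              return buf;
      }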
    • printk: remove bounds checking for log_prefix · ae29bc92
      Committed by William Douglas
      Currently log_prefix is testing that the first character of the log level
      and facility is less than '0' and greater than '9' (which is always
      false).
      
      Since the code being updated works because strtoul() bombs out (endp
      isn't updated) and 0 is returned anyway, just remove the check; the
      behavior of the function does not change.
      Signed-off-by: William Douglas <william.douglas@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • printk: fix bounds checking for log_prefix · 48e41899
      Committed by William Douglas
      Currently log_prefix is testing that the first character of the log level
      and facility is less than '0' and greater than '9' (which is always
      false).  It should instead be testing whether the character is less than
      '0' or greater than '9'.  This patch makes that change.
      
      The code being changed worked because strtoul bombs out (endp isn't
      updated) and 0 is returned anyway.
      Signed-off-by: William Douglas <william.douglas@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
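      Paraphrased, the predicate fix in log_prefix() looks like this (variable
      name illustrative):

      /* Broken: no character is both below '0' and above '9'. */
      if (c < '0' && c > '9')         /* always false */
              return 0;

      /* Fixed: reject anything outside '0'..'9'. */
      if (c < '0' || c > '9')
              return 0;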
    • printk: add console_suspend module parameter · 134620f7
      Committed by Yanmin Zhang
      We are enabling some power features on Medfield.  To test suspend-to-RAM
      conveniently, we need to turn console_suspend_enabled on and off
      frequently.

      Add a module parameter, so users can change it via:
      /sys/module/printk/parameters/console_suspend
      Signed-off-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • printk: add module parameter ignore_loglevel to control ignore_loglevel · 0eca6b7c
      Committed by Yanmin Zhang
      We are enabling some power features on Medfield.  To test suspend-to-RAM
      conveniently, we need to turn ignore_loglevel on and off frequently
      without rebooting.

      Add a module parameter, so users can change it via:
      /sys/module/printk/parameters/ignore_loglevel
      Signed-off-by: Yanmin Zhang <yanmin.zhang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
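      Both of these commits use the same one-line mechanism; a sketch for
      kernel/printk.c (parameter types assumed to match the variables'
      existing declarations):

      /* console_suspend_enabled and ignore_loglevel already exist;
       * module_param{,_named}() publishes them, writable by root, under
       * /sys/module/printk/parameters/. */
      module_param_named(console_suspend, console_suspend_enabled,
                         int, S_IRUGO | S_IWUSR);
      MODULE_PARM_DESC(console_suspend,
                       "suspend console during suspend and hibernate operations");

      module_param(ignore_loglevel, int, S_IRUGO | S_IWUSR);
      MODULE_PARM_DESC(ignore_loglevel,
                       "ignore loglevel setting and print all kernel messages");

      Usage then is e.g.:

      echo 1 > /sys/module/printk/parameters/ignore_loglevel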
    • kernel/sysctl.c: add cap_last_cap to /proc/sys/kernel · 73efc039
      Committed by Dan Ballard
      Userspace needs to know the highest valid capability of the running
      kernel, which right now cannot reliably be retrieved from the header
      files alone.  Because this value cannot be determined properly,
      libraries compiled against newer header files but run on older kernels
      assume capabilities are available which actually aren't.  libcap-ng is
      one example, and we ran into the same problem with systemd too.
      
      Now the capability is exported in /proc/sys/kernel/cap_last_cap.
      
      [akpm@linux-foundation.org: make cap_last_cap const, per Ulrich]
      Signed-off-by: Dan Ballard <dan@mindstab.net>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Lennart Poettering <lennart@poettering.net>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Ulrich Drepper <drepper@akkadia.org>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
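      A small reader for the new file (hedged: assumes the value is printed
      as a plain decimal integer):

      #include <stdio.h>

      int main(void)
      {
              FILE *f = fopen("/proc/sys/kernel/cap_last_cap", "r");
              int last_cap;

              if (!f || fscanf(f, "%d", &last_cap) != 1)
                      return 1;
              fclose(f);
              /* A library can now refuse capabilities above last_cap
               * instead of guessing from its compile-time headers. */
              printf("highest valid capability: %d\n", last_cap);
              return 0;
      }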
    • watchdog: move watchdog_*_all_cpus under CONFIG_SYSCTL · 4ff81951
      Committed by Vasily Averin
      Fix the following compilation warnings in case of disabled CONFIG_SYSCTL:

      kernel/watchdog.c:483:13: warning: `watchdog_enable_all_cpus' defined but not used
      kernel/watchdog.c:500:13: warning: `watchdog_disable_all_cpus' defined but not used

      These functions are static and are used only in the sysctl handler, so
      move them inside #ifdef CONFIG_SYSCTL too.
      Signed-off-by: Vasily Averin <vvs@sw.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
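      The shape of the fix, sketched:

      #ifdef CONFIG_SYSCTL
      static void watchdog_enable_all_cpus(void)
      {
              /* ... */
      }

      static void watchdog_disable_all_cpus(void)
      {
              /* ... */
      }

      /* the sysctl handler, their only caller, already lives here */
      #endif /* CONFIG_SYSCTL */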
    • stop_machine: make stop_machine safe and efficient to call early · f445027e
      Committed by Jeremy Fitzhardinge
      Make stop_machine() safe to call early in boot, before SMP has been set
      up, by simply calling the callback function directly if there's only one
      CPU online.
      
      [ Fixes from AKPM:
         - add comment
         - local_irq_flags, not save_flags
         - also call hard_irq_disable() for systems which need it
      
        Tejun suggested using an explicit flag rather than just looking at
        the online cpu count. ]
      
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Acked-by: Tejun Heo <htejun@gmail.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
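      A sketch of the early-boot fast path (close to the committed code;
      the normal path is abbreviated):

      static bool stop_machine_initialized;  /* set once stopper threads exist */

      int __stop_machine(int (*fn)(void *), void *data, const struct cpumask *cpus)
      {
              if (!stop_machine_initialized) {
                      /*
                       * Called early in boot: there is only one CPU, so
                       * just run fn() directly with interrupts hard-disabled.
                       */
                      unsigned long flags;
                      int ret;

                      WARN_ON_ONCE(num_online_cpus() != 1);
                      local_irq_save(flags);
                      hard_irq_disable();
                      ret = (*fn)(data);
                      local_irq_restore(flags);
                      return ret;
              }
              /* ... normal stop_machine() path ... */
      }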
    • mm: distinguish between mlocked and pinned pages · bc3e53f6
      Committed by Christoph Lameter
      Some kernel components pin user space memory (infiniband and perf) (by
      increasing the page count) and account that memory as "mlocked".
      
      The difference between mlocking and pinning is:
      
      A. mlocked pages are marked with PG_mlocked and are exempt from
         swapping. Page migration may move them around though.
         They are kept on a special LRU list.
      
      B. Pinned pages cannot be moved because something needs to
         directly access physical memory. They may not be on any
         LRU list.
      
      I recently saw an mlocked process where mm->locked_vm became
      bigger than the virtual size of the process (!) because some
      memory was accounted for twice:

      Once when the page was mlocked and once when the Infiniband
      layer increased the refcount because it needed to pin the RDMA
      memory.

      This patch introduces a separate counter for pinned pages and
      accounts them separately.
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Cc: Mike Marciniszyn <infinipath@qlogic.com>
      Cc: Roland Dreier <roland@kernel.org>
      Cc: Sean Hefty <sean.hefty@intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
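      The accounting split, sketched (field names as introduced here):

      struct mm_struct {
              /* ... */
              unsigned long locked_vm;   /* mlocked pages, PG_mlocked */
              unsigned long pinned_vm;   /* pages with an elevated refcount */
      };

      /* e.g. in the Infiniband memory-registration path: */
      mm->pinned_vm += npages;           /* was: mm->locked_vm += npages; */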
    • oom: remove oom_disable_count · c9f01245
      Committed by David Rientjes
      This removes mm->oom_disable_count entirely since it's unnecessary and
      currently buggy.  The counter was intended to be per-process but it's
      currently decremented in the exit path for each thread that exits, causing
      it to underflow.
      
      The count was originally intended to prevent oom killing threads that
      share memory with threads that cannot be killed since it doesn't lead to
      future memory freeing.  The counter could be fixed to represent all
      threads sharing the same mm, but it's better to remove the count since:
      
       - it is possible that the OOM_DISABLE thread sharing memory with the
         victim is waiting on that thread to exit and will actually cause
         future memory freeing, and
      
       - there is no guarantee that a thread is disabled from oom killing just
         because another thread sharing its mm is oom disabled.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Ying Han <yinghan@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Cross Memory Attach · fcf63409
      Committed by Christopher Yeoh
      The basic idea behind cross memory attach is to allow MPI programs doing
      intra-node communication to do a single copy of the message rather than a
      double copy of the message via shared memory.
      
      The following patch attempts to achieve this by allowing a destination
      process, given an address and size from a source process, to copy memory
      directly from the source process into its own address space via a system
      call.  There is also a symmetrical ability to copy from the current
      process's address space into a destination process's address space.
      
      - Use of /proc/pid/mem has been considered, but there are issues with
        using it:
        - Does not allow for specifying iovecs for both src and dest;
          assuming preadv or pwritev was implemented, either the area read
          from or written to would need to be contiguous.
        - Currently mem_read allows only processes which are currently
          ptrace'ing the target, and are still able to ptrace the target, to
          read from the target.  This check could possibly be moved to the
          open call, but it's not clear exactly what race this restriction is
          stopping (the reason appears to have been lost).
        - Having to send the fd of /proc/self/mem via SCM_RIGHTS on a unix
          domain socket is a bit ugly from a userspace point of view,
          especially when you may have hundreds if not (eventually) thousands
          of processes that all need to do this with each other.
        - Doesn't allow for some future uses of the interface we would like
          to consider adding (see below).
        - Interestingly, reading from /proc/pid/mem currently actually
          involves two copies!  (But this could be fixed pretty easily.)
      
      As mentioned previously, use of vmsplice instead was considered, but it
      has problems.  Since you need the reader and writer working
      co-operatively, if the pipe is not drained then you block, which
      requires some wrapping to do non-blocking on the send side or polling
      on the receive side.  In all-to-all communication it requires ordering,
      otherwise you can deadlock.  And in the example of many MPI tasks
      writing to one MPI task, vmsplice serialises the copying.
      
      There are some cases of MPI collectives where even a single-copy
      interface does not get us the performance gain we could have.  For
      example, in an MPI_Reduce, rather than copy the data from the source we
      would like to instead use it directly in a math op (say the reduce is
      doing a sum), as this would save us doing a copy.  We don't need to
      keep a copy of the data from the source.  I haven't implemented this,
      but I think this interface could do all of it in the future through the
      use of the flags - e.g. you could specify the math operation and type,
      and the kernel, rather than just copying the data, would apply the
      specified operation between the source and destination and store the
      result in the destination.
      
      Although we don't have a "second user" of the interface yet (though
      I've had some nibbles from people who may be interested in using it for
      intra-process messaging which is not MPI), this interface is something
      which hardware vendors are already doing for their custom drivers to
      implement fast local communication.  So in addition to being useful for
      OpenMPI, it would mean the driver maintainers don't have to fix things
      up when the mm changes.
      
      There was some discussion about how much faster a true zero copy would
      go. Here's a link back to the email with some testing I did on that:
      
      http://marc.info/?l=linux-mm&m=130105930902915&w=2
      
      There is a basic man page for the proposed interface here:
      
      http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt
      
      This has been implemented for x86 and powerpc; other architectures
      should mainly (I think) just need to add syscall numbers for
      process_vm_readv and process_vm_writev.  There are 32-bit compatibility
      versions for 64-bit kernels.
      
      For arch maintainers there are some simple tests to be able to quickly
      verify that the syscalls are working correctly here:
      
      http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz
      Signed-off-by: Chris Yeoh <yeohc@au1.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: <linux-man@vger.kernel.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
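      A hedged usage example for the new syscall, following the interface in
      the man page draft above (it assumes a libc that exposes the
      process_vm_readv() wrapper; otherwise syscall(2) is needed):

      #define _GNU_SOURCE
      #include <stdio.h>
      #include <stdlib.h>
      #include <sys/uio.h>
      #include <unistd.h>

      int main(int argc, char *argv[])
      {
              pid_t pid;
              unsigned long remote_addr;
              char buf[32];
              struct iovec local, remote;
              ssize_t n;

              if (argc < 3)
                      return 1;
              pid = (pid_t)atoi(argv[1]);
              remote_addr = strtoul(argv[2], NULL, 0);

              local.iov_base = buf;
              local.iov_len = sizeof(buf);
              remote.iov_base = (void *)remote_addr; /* address in the other process */
              remote.iov_len = sizeof(buf);

              /* Copy 32 bytes straight out of pid's address space. */
              n = process_vm_readv(pid, &local, 1, &remote, 1, 0);
              if (n < 0) {
                      perror("process_vm_readv");
                      return 1;
              }
              printf("read %zd bytes\n", n);
              return 0;
      }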
  3. 31 Oct 2011, 1 commit
  4. 30 Oct 2011, 5 commits
    • [S390] sparse: fix sparse warnings about missing prototypes · 638ad34a
      Committed by Martin Schwidefsky
      Add prototypes and includes for functions used in different modules.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [S390] kdump: Add infrastructure for unmapping crashkernel memory · 558df720
      Committed by Michael Holzheu
      This patch introduces a mechanism that allows architecture backends to
      remove page tables for the crashkernel memory. This can protect the loaded
      kdump kernel from being overwritten by broken kernel code.  Two new
      functions crash_map_reserved_pages() and crash_unmap_reserved_pages() are
      added that can be implemented by architecture code.  The
      crash_map_reserved_pages() function is called before and
      crash_unmap_reserved_pages() after the crashkernel segments are loaded.  The
      functions are also called in crash_shrink_memory() to create/remove page
      tables when the crashkernel memory size is reduced.
      
      To support architectures that have large pages this patch also introduces
      a new define KEXEC_CRASH_MEM_ALIGN. The crashkernel start and size must
      always be aligned with KEXEC_CRASH_MEM_ALIGN.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
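      The generic side of the infrastructure, sketched (weak no-ops that an
      architecture such as s390 overrides):

      /* kernel/kexec.c */
      void __weak crash_map_reserved_pages(void)
      {
      }

      void __weak crash_unmap_reserved_pages(void)
      {
      }

      /* callers bracket the crashkernel segment load: */
      crash_map_reserved_pages();
      /* ... copy kdump segments into the reserved region ... */
      crash_unmap_reserved_pages();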
    • [S390] kdump: Initialize vmcoreinfo note at startup · fa8ff292
      Committed by Michael Holzheu
      Currently the vmcoreinfo note is only initialized in case of kdump. On s390
      it is possible to create kernel dumps with other dump mechanisms than kdump
      (e.g. via hypervisor dump or stand-alone dump tools). For those dumps it
      would also be desirable to include the vmcoreinfo data. To accomplish this,
      with this patch the vmcoreinfo ELF note is always initialized, not only in
      case of a (kdump) crash. On s390 we will add an ABI defined pointer at
      a well known address to vmcoreinfo so that dump analysis tools are able to
      find this information.
      
      In particular on s390 we have a tool named zgetdump.  With this tool it
      is possible to convert dump formats on the fly using fuse.  E.g. you can
      mount an s390 stand-alone dump as an ELF dump.  When this is done, the
      tool finds the vmcoreinfo in the stand-alone dump via the well-known
      ABI-defined address and creates the respective VMCOREINFO ELF note in
      the output ELF dump.  This can then be used e.g. by makedumpfile for
      dump filtering.  No more need for a vmlinux file with debug information.
      
      So this will look like the following:
      $ zgetdump --mount standalone.dump -f elf /mnt
      $ ls /mnt
        dump.elf
      $ readelf -n /mnt/dump.elf
      $ ...
        VMCOREINFO            0x00000474      Unknown note type: (0x00000000)
      $ makedumpfile -c -d 31 /mnt/dump.elf dump.kdump
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • [S390] kdump: Add size to elfcorehdr kernel parameter · d3bf3795
      Committed by Michael Holzheu
      Currently only the address of the pre-allocated ELF header is passed with
      the elfcorehdr= kernel parameter. In order to reserve memory for the header
      in the 2nd kernel also the size is required. Current kdump architecture
      backends use different methods to do that, e.g. x86 uses the memmap= kernel
      parameter. On s390 there is no easy way to transfer this information.
      Therefore the elfcorehdr kernel parameter is extended to also pass the size.
      This now can also be used as standard mechanism by all future kdump
      architecture backends.
      
      The syntax of the kernel parameter is extended as follows:
      
      elfcorehdr=[size[KMG]@]offset[KMG]
      
      This change is backward compatible because elfcorehdr=offset is still allowed.
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
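      A sketch of the extended parser (close to the generic code; plain
      "elfcorehdr=offset" keeps working because the '@' branch is only taken
      when a size prefix is present):

      static int __init setup_elfcorehdr(char *arg)
      {
              char *end;

              if (!arg)
                      return -EINVAL;
              elfcorehdr_addr = memparse(arg, &end);
              if (*end == '@') {
                      elfcorehdr_size = elfcorehdr_addr;
                      elfcorehdr_addr = memparse(end + 1, &end);
              }
              return end > arg ? 0 : -EINVAL;
      }
      early_param("elfcorehdr", setup_elfcorehdr);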
    • [S390] kdump: Add KEXEC_CRASH_CONTROL_MEMORY_LIMIT · 3d214fae
      Committed by Michael Holzheu
      On s390 there is a different KEXEC_CONTROL_MEMORY_LIMIT for the normal and
      the kdump kexec case. Therefore this patch introduces a new macro
      KEXEC_CRASH_CONTROL_MEMORY_LIMIT. This is set to
      KEXEC_CONTROL_MEMORY_LIMIT for all architectures that do not define
      KEXEC_CRASH_CONTROL_MEMORY_LIMIT.
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
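      The fallback define, as described (sketch of include/linux/kexec.h):

      #ifndef KEXEC_CRASH_CONTROL_MEMORY_LIMIT
      #define KEXEC_CRASH_CONTROL_MEMORY_LIMIT KEXEC_CONTROL_MEMORY_LIMIT
      #endif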
  5. 26 Oct 2011, 2 commits
  6. 24 Oct 2011, 1 commit
  7. 18 Oct 2011, 1 commit
    • cputimer: Cure lock inversion · bcd5cff7
      Committed by Peter Zijlstra
      There's a lock inversion between the cputimer->lock and rq->lock;
      notably the two callchains involved are:
      
       update_rlimit_cpu()
         sighand->siglock
         set_process_cpu_timer()
           cpu_timer_sample_group()
             thread_group_cputimer()
               cputimer->lock
               thread_group_cputime()
                 task_sched_runtime()
                   ->pi_lock
                   rq->lock
      
       scheduler_tick()
         rq->lock
         task_tick_fair()
           update_curr()
             account_group_exec_runtime()
               cputimer->lock
      
      Where the first one is enabling a CLOCK_PROCESS_CPUTIME_ID timer, and
      the second one is keeping it up-to-date.
      
      This problem was introduced by e8abccb7 ("posix-cpu-timers: Cure
      SMP accounting oddities").
      
      Cure the problem by removing the cputimer->lock and rq->lock nesting;
      this leaves concurrent enablers doing duplicate work, but the time
      wasted should be of the same order as that otherwise wasted spinning on
      the lock, and the greater-than assignment filter ensures we preserve
      monotonicity.
      Reported-by: Dave Jones <davej@redhat.com>
      Reported-by: Simon Kirby <sim@hostway.ca>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: stable@kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Link: http://lkml.kernel.org/r/1318928713.21167.4.camel@twins
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
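      A sketch of the cure, close to the committed code (lock primitives
      abbreviated): the group time is sampled before cputimer->lock is taken,
      so rq->lock is never nested inside it.

      void thread_group_cputimer(struct task_struct *tsk, struct task_cputime *times)
      {
              struct thread_group_cputimer *cputimer = &tsk->signal->cputimer;
              struct task_cputime sum;
              unsigned long flags;

              if (!cputimer->running) {
                      /*
                       * Concurrent enablers may each sample (duplicate
                       * work); update_gt_cputime()'s greater-than filter
                       * keeps the result monotonic.
                       */
                      thread_group_cputime(tsk, &sum);
                      raw_spin_lock_irqsave(&cputimer->lock, flags);
                      cputimer->running = 1;
                      update_gt_cputime(&cputimer->cputime, &sum);
              } else
                      raw_spin_lock_irqsave(&cputimer->lock, flags);
              *times = cputimer->cputime;
              raw_spin_unlock_irqrestore(&cputimer->lock, flags);
      }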
  8. 17 Oct 2011, 13 commits