1. 14 7月, 2011 1 次提交
    • G
      KVM: Steal time implementation · c9aaa895
      Glauber Costa 提交于
      To implement steal time, we need the hypervisor to pass the guest
      information about how much time was spent running other processes
      outside the VM, while the vcpu had meaningful work to do - halt
      time does not count.
      
      This information is acquired through the run_delay field of
      delayacct/schedstats infrastructure, that counts time spent in a
      runqueue but not running.
      
      Steal time is a per-cpu information, so the traditional MSR-based
      infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the
      memory area address containing information about steal time
      
      This patch contains the hypervisor part of the steal time infrasructure,
      and can be backported independently of the guest portion.
      
      [avi, yongjie: export delayacct_on, to avoid build failures in some configs]
      Signed-off-by: NGlauber Costa <glommer@redhat.com>
      Tested-by: NEric B Munson <emunson@mgebm.net>
      CC: Rik van Riel <riel@redhat.com>
      CC: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Anthony Liguori <aliguori@us.ibm.com>
      Signed-off-by: NYongjie Ren <yongjie.ren@intel.com>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      c9aaa895
  2. 12 7月, 2011 1 次提交
    • A
      KVM: Add compat ioctl for KVM_SET_SIGNAL_MASK · 1dda606c
      Alexander Graf 提交于
      KVM has an ioctl to define which signal mask should be used while running
      inside VCPU_RUN. At least for big endian systems, this mask is different
      on 32-bit and 64-bit systems (though the size is identical).
      
      Add a compat wrapper that converts the mask to whatever the kernel accepts,
      allowing 32-bit kvm user space to set signal masks.
      
      This patch fixes qemu with --enable-io-thread on ppc64 hosts when running
      32-bit user land.
      Signed-off-by: NAlexander Graf <agraf@suse.de>
      Signed-off-by: NAvi Kivity <avi@redhat.com>
      1dda606c
  3. 07 7月, 2011 1 次提交
  4. 28 6月, 2011 1 次提交
    • V
      taskstats: don't allow duplicate entries in listener mode · 26c4caea
      Vasiliy Kulikov 提交于
      Currently a single process may register exit handlers unlimited times.
      It may lead to a bloated listeners chain and very slow process
      terminations.
      
      Eg after 10KK sent TASKSTATS_CMD_ATTR_REGISTER_CPUMASKs ~300 Mb of
      kernel memory is stolen for the handlers chain and "time id" shows 2-7
      seconds instead of normal 0.003.  It makes it possible to exhaust all
      kernel memory and to eat much of CPU time by triggerring numerous exits
      on a single CPU.
      
      The patch limits the number of times a single process may register
      itself on a single CPU to one.
      
      One little issue is kept unfixed - as taskstats_exit() is called before
      exit_files() in do_exit(), the orphaned listener entry (if it was not
      explicitly deregistered) is kept until the next someone's exit() and
      implicit deregistration in send_cpu_listeners().  So, if a process
      registered itself as a listener exits and the next spawned process gets
      the same pid, it would inherit taskstats attributes.
      Signed-off-by: NVasiliy Kulikov <segooon@gmail.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26c4caea
  5. 22 6月, 2011 3 次提交
    • J
      alarmtimers: Return -ENOTSUPP if no RTC device is present · 1c6b39ad
      John Stultz 提交于
      Toralf Förster and Richard Weinberger noted that if there is
      no RTC device, the alarm timers core prints out an annoying
      "ALARM timers will not wake from suspend" message.
      
      This warning has been removed in a previous patch, however
      the issue still remains:  The original idea was to support
      alarm timers even if there was no rtc device, as long as the
      system didn't go into suspend.
      
      However, after further consideration, communicating to the application
      that alarmtimers are not fully functional seems like the better
      solution.
      
      So this patch makes it so we return -ENOTSUPP to any posix _ALARM
      clockid calls if there is no backing RTC device on the system.
      
      Further this changes the behavior where when there is no rtc device
      we will check for one on clock_getres, clock_gettime, timer_create,
      and timer_nsleep instead of on suspend.
      
      CC: Toralf Förster <toralf.foerster@gmx.de>
      CC: Richard Weinberger <richard@nod.at
      CC: Peter Zijlstra <peterz@infradead.org>
      CC: Thomas Gleixner <tglx@linutronix.de>
      Reported-by: NToralf Förster <toralf.foerster@gmx.de>
      Reported by: Richard Weinberger <richard@nod.at>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      1c6b39ad
    • J
      alarmtimers: Handle late rtc module loading · c008ba58
      John Stultz 提交于
      The alarmtimers code currently picks a rtc device to use at
      late init time. However, if your rtc driver is loaded as a module,
      it may be registered after the alarmtimers late init code, leaving
      the alarmtimers nonfunctional.
      
      This patch moves the the rtcdevice selection to when we actually try
      to use it, allowing us to make use of rtc modules that may have been
      loaded at any point since bootup.
      
      CC: Thomas Gleixner <tglx@linutronix.de>
      CC: Meelis Roos <mroos@ut.ee>
      Reported-by: NMeelis Roos <mroos@ut.ee>
      Signed-off-by: NJohn Stultz <john.stultz@linaro.org>
      c008ba58
    • M
      PM: Free memory bitmaps if opening /dev/snapshot fails · 8440f4b1
      Michal Kubecek 提交于
      When opening /dev/snapshot device, snapshot_open() creates memory
      bitmaps which are freed in snapshot_release(). But if any of the
      callbacks called by pm_notifier_call_chain() returns NOTIFY_BAD, open()
      fails, snapshot_release() is never called and bitmaps are not freed.
      Next attempt to open /dev/snapshot then triggers BUG_ON() check in
      create_basic_memory_bitmaps(). This happens e.g. when vmwatchdog module
      is active on s390x.
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Cc: stable@kernel.org
      8440f4b1
  6. 18 6月, 2011 1 次提交
    • D
      KEYS/DNS: Fix ____call_usermodehelper() to not lose the session keyring · 87966996
      David Howells 提交于
      ____call_usermodehelper() now erases any credentials set by the
      subprocess_inf::init() function.  The problem is that commit
      17f60a7d ("capabilites: allow the application of capability limits
      to usermode helpers") creates and commits new credentials with
      prepare_kernel_cred() after the call to the init() function.  This wipes
      all keyrings after umh_keys_init() is called.
      
      The best way to deal with this is to put the init() call just prior to
      the commit_creds() call, and pass the cred pointer to init().  That
      means that umh_keys_init() and suchlike can modify the credentials
      _before_ they are published and potentially in use by the rest of the
      system.
      
      This prevents request_key() from working as it is prevented from passing
      the session keyring it set up with the authorisation token to
      /sbin/request-key, and so the latter can't assume the authority to
      instantiate the key.  This causes the in-kernel DNS resolver to fail
      with ENOKEY unconditionally.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: NEric Paris <eparis@redhat.com>
      Tested-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      87966996
  7. 17 6月, 2011 3 次提交
  8. 16 6月, 2011 3 次提交
  9. 15 6月, 2011 5 次提交
  10. 10 6月, 2011 1 次提交
  11. 09 6月, 2011 1 次提交
    • S
      tracing: Fix regression in printk_formats file · db5e7ecc
      Steven Rostedt 提交于
      The fix to fix the printk_formats of modules broke the
      printk_formats of trace_printks in the kernel.
      
      The update of what to show via the seq_file was only updated
      if the passed in fmt was NULL, which happens only on the first
      iteration. The result was showing the first format every time
      instead of iterating through the available formats.
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      db5e7ecc
  12. 08 6月, 2011 2 次提交
  13. 07 6月, 2011 3 次提交
  14. 04 6月, 2011 1 次提交
  15. 03 6月, 2011 6 次提交
  16. 31 5月, 2011 4 次提交
  17. 30 5月, 2011 1 次提交
    • L
      mm: Fix boot crash in mm_alloc() · 6345d24d
      Linus Torvalds 提交于
      Thomas Gleixner reports that we now have a boot crash triggered by
      CONFIG_CPUMASK_OFFSTACK=y:
      
          BUG: unable to handle kernel NULL pointer dereference at   (null)
          IP: [<c11ae035>] find_next_bit+0x55/0xb0
          Call Trace:
           [<c11addda>] cpumask_any_but+0x2a/0x70
           [<c102396b>] flush_tlb_mm+0x2b/0x80
           [<c1022705>] pud_populate+0x35/0x50
           [<c10227ba>] pgd_alloc+0x9a/0xf0
           [<c103a3fc>] mm_init+0xec/0x120
           [<c103a7a3>] mm_alloc+0x53/0xd0
      
      which was introduced by commit de03c72c ("mm: convert
      mm->cpu_vm_cpumask into cpumask_var_t"), and is due to wrong ordering of
      mm_init() vs mm_init_cpumask
      
      Thomas wrote a patch to just fix the ordering of initialization, but I
      hate the new double allocation in the fork path, so I ended up instead
      doing some more radical surgery to clean it all up.
      Reported-by: NThomas Gleixner <tglx@linutronix.de>
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6345d24d
  18. 29 5月, 2011 1 次提交
    • T
      idle governor: Avoid lock acquisition to read pm_qos before entering idle · 333c5ae9
      Tim Chen 提交于
      Thanks to the reviews and comments by Rafael, James, Mark and Andi.
      Here's version 2 of the patch incorporating your comments and also some
      update to my previous patch comments.
      
      I noticed that before entering idle state, the menu idle governor will
      look up the current pm_qos target value according to the list of qos
      requests received.  This look up currently needs the acquisition of a
      lock to access the list of qos requests to find the qos target value,
      slowing down the entrance into idle state due to contention by multiple
      cpus to access this list.  The contention is severe when there are a lot
      of cpus waking and going into idle.  For example, for a simple workload
      that has 32 pair of processes ping ponging messages to each other, where
      64 cpu cores are active in test system, I see the following profile with
      37.82% of cpu cycles spent in contention of pm_qos_lock:
      
      -     37.82%          swapper  [kernel.kallsyms]          [k]
      _raw_spin_lock_irqsave
         - _raw_spin_lock_irqsave
            - 95.65% pm_qos_request
                 menu_select
                 cpuidle_idle_call
               - cpu_idle
                    99.98% start_secondary
      
      A better approach will be to cache the updated pm_qos target value so
      reading it does not require lock acquisition as in the patch below.
      With this patch the contention for pm_qos_lock is removed and I saw a
      2.2X increase in throughput for my message passing workload.
      
      cc: stable@kernel.org
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJames Bottomley <James.Bottomley@suse.de>
      Acked-by: Nmark gross <markgross@thegnar.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      333c5ae9
  19. 28 5月, 2011 1 次提交