1. 19 Jul, 2014 1 commit
  2. 22 May, 2014 1 commit
  3. 08 Apr, 2014 1 commit
    • A
      mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE · a0715cc2
      Alex Thorlton committed
      Add VM_INIT_DEF_MASK, to allow us to set the default flags for VMs.  It
      also adds a prctl control which allows us to set the THP disable bit in
      mm->def_flags so that VMs will pick up the setting as they are created.
      Signed-off-by: Alex Thorlton <athorlton@sgi.com>
      Suggested-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
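The prctl control described in this commit can be exercised from user space roughly as follows. This is a minimal sketch, assuming the control is exposed through the uapi constants `PR_SET_THP_DISABLE` and `PR_GET_THP_DISABLE` from `linux/prctl.h` (the commit title spells it `PRCTL_THP_DISABLE`). The helper names `thp_disable_for_self` and `thp_is_disabled` are hypothetical and exist only for illustration.

```c
#include <sys/prctl.h>

/* Fallbacks for older userspace headers; values match linux/prctl.h. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/*
 * Hypothetical helper: ask the kernel to set the THP-disable flag for the
 * calling process, so that mappings created afterwards pick it up.
 * Returns 0 on success, -1 on failure (errno is set by prctl).
 */
static int thp_disable_for_self(void)
{
    return prctl(PR_SET_THP_DISABLE, 1, 0, 0, 0) == 0 ? 0 : -1;
}

/*
 * Hypothetical helper: query the flag.
 * Returns 1 if set, 0 if clear, -1 on error.
 */
static int thp_is_disabled(void)
{
    return prctl(PR_GET_THP_DISABLE, 0, 0, 0, 0);
}
```

A wrapper program could call `thp_disable_for_self()` before it starts its real work, which is one way to opt a workload out of THP without touching the system-wide setting.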
  4. 23 Feb, 2014 1 commit
  5. 24 Jan, 2014 2 commits
  6. 13 Nov, 2013 1 commit
  7. 31 Aug, 2013 1 commit
  8. 10 Jul, 2013 2 commits
  9. 04 Jul, 2013 3 commits
  10. 13 Jun, 2013 1 commit
    • R
      reboot: rigrate shutdown/reboot to boot cpu · cf7df378
      Robin Holt committed
      We recently noticed that reboot of a 1024 cpu machine takes approx 16
      minutes of just stopping the cpus.  The slowdown was tracked to commit
      f96972f2 ("kernel/sys.c: call disable_nonboot_cpus() in
      kernel_restart()").
      
      The current implementation does all the work of hot removing the cpus
      before halting the system.  We are switching to just migrating to the
      boot cpu and then continuing with shutdown/reboot.
      
      This also has the effect of not breaking x86's command line parameter
      for specifying the reboot cpu.  Note, this code was shamelessly copied
      from arch/x86/kernel/reboot.c with bits removed pertaining to the
      reboot_cpu command line parameter.
      Signed-off-by: Robin Holt <holt@sgi.com>
      Tested-by: Shawn Guo <shawn.guo@linaro.org>
      Cc: "Srivatsa S. Bhat" <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Russ Anderson <rja@sgi.com>
      Cc: Robin Holt <holt@sgi.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 01 May, 2013 2 commits
    • A
      kernel/sys.c: make prctl(PR_SET_MM) generally available · 52b36941
      Amnon Shiloh committed
      The purpose of this patch is to allow privileged processes to set
      their own per-memory memory-region fields:
      
            start_code, end_code, start_data, end_data, start_brk, brk,
            start_stack, arg_start, arg_end, env_start, env_end.
      
      This functionality is needed by any application or package that needs to
      reconstruct Linux processes, that is, to start them in any way other than
      by means of an "execve()" from an executable file.  This includes:
      
      1. Restoring processes from a checkpoint-file (by all potential
         user-level checkpointing packages, not only CRIU's).
      2. Restarting processes on another node after process migration.
      3. Starting duplicated copies of a running process (for reliability
         and high-availability).
      4. Starting a process from an executable format that is not supported
         by Linux, thus requiring a "manual execve" by a user-level utility.
      5. Similarly, starting a process from a networked and/or encrypted
         executable that, for confidentiality, licensing or other reasons,
         may not be written to the local file-systems.
      
      The code that does that was already included in the Linux kernel by the
      CRIU group, in the form of "prctl(PR_SET_MM)", but prior to this was
      enclosed within their private "#ifdef CONFIG_CHECKPOINT_RESTORE", which is
      normally disabled.  The patch removes those ifdefs.
      Signed-off-by: Amnon Shiloh <u3557@miso.sublimeip.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
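As a rough illustration of the interface this commit makes generally available, the sketch below wraps `prctl(PR_SET_MM, ...)` for a single field and maps each option to the field name listed in the commit message. The helper names `set_mm_field` and `mm_field_name` are hypothetical. The call is still expected to fail with `EPERM` for a process that lacks the required privilege, and with `EINVAL` for addresses the kernel rejects.

```c
#include <errno.h>
#include <sys/prctl.h>

/*
 * Hypothetical helper: try to set one mm field of the calling process.
 * 'field' is one of the PR_SET_MM_* options from linux/prctl.h.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_mm_field(int field, unsigned long value)
{
    if (prctl(PR_SET_MM, field, value, 0, 0) == -1)
        return -errno;
    return 0;
}

/* Hypothetical helper: name of the mm field that a PR_SET_MM_* option sets. */
static const char *mm_field_name(int field)
{
    switch (field) {
    case PR_SET_MM_START_CODE:  return "start_code";
    case PR_SET_MM_END_CODE:    return "end_code";
    case PR_SET_MM_START_DATA:  return "start_data";
    case PR_SET_MM_END_DATA:    return "end_data";
    case PR_SET_MM_START_BRK:   return "start_brk";
    case PR_SET_MM_BRK:         return "brk";
    case PR_SET_MM_START_STACK: return "start_stack";
    case PR_SET_MM_ARG_START:   return "arg_start";
    case PR_SET_MM_ARG_END:     return "arg_end";
    case PR_SET_MM_ENV_START:   return "env_start";
    case PR_SET_MM_ENV_END:     return "env_end";
    default:                    return "unknown";
    }
}
```

A restore tool would typically call `set_mm_field()` once per field after it has rebuilt the address space, and treat any negative return as a reason to abort the restore.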
    • S
      kernel/timer.c: move some non timer related syscalls to kernel/sys.c · 4a22f166
      Stephen Rothwell committed
      Andrew Morton noted:
      
      	akpm3:/usr/src/25> grep SYSCALL kernel/timer.c
      	SYSCALL_DEFINE1(alarm, unsigned int, seconds)
      	SYSCALL_DEFINE0(getpid)
      	SYSCALL_DEFINE0(getppid)
      	SYSCALL_DEFINE0(getuid)
      	SYSCALL_DEFINE0(geteuid)
      	SYSCALL_DEFINE0(getgid)
      	SYSCALL_DEFINE0(getegid)
      	SYSCALL_DEFINE0(gettid)
      	SYSCALL_DEFINE1(sysinfo, struct sysinfo __user *, info)
      	COMPAT_SYSCALL_DEFINE1(sysinfo, struct compat_sysinfo __user *, info)
      
      	Only one of those should be in kernel/timer.c.  Who wrote this thing?
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  12. 09 Apr, 2013 1 commit
    • H
      PM / reboot: call syscore_shutdown() after disable_nonboot_cpus() · 6f389a8f
      Huacai Chen committed
      As commit 40dc166c (PM / Core: Introduce struct syscore_ops for core
      subsystems PM) says, syscore_ops operations should be carried out with
      one CPU on-line and interrupts disabled. However, after commit f96972f2
      (kernel/sys.c: call disable_nonboot_cpus() in kernel_restart()),
      syscore_shutdown() is called before disable_nonboot_cpus(), which
      breaks that rule. We have a MIPS machine with an 8259A PIC, and an
      external timer (HPET) is attached to the 8259A. Because the 8259A is
      shut down too early (by syscore_shutdown()), disable_nonboot_cpus()
      runs without timer interrupts, so it hangs and the reboot fails. This
      patch calls syscore_shutdown() a little later (after
      disable_nonboot_cpus()) to avoid the reboot failure, which is the same
      order that poweroff uses.
      
      For consistency, add disable_nonboot_cpus() to kernel_halt().
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  13. 23 Mar, 2013 1 commit
    • O
      poweroff: change orderly_poweroff() to use schedule_work() · 2ca067ef
      Oleg Nesterov committed
      David said:
      
          Commit 6c0c0d4d ("poweroff: fix bug in orderly_poweroff()")
          apparently fixes one bug in orderly_poweroff(), but introduces
          another.  The comments on orderly_poweroff() claim it can be called
          from any context - and indeed we call it from interrupt context in
          arch/powerpc/platforms/pseries/ras.c for example.  But since that
          commit this is no longer safe, since call_usermodehelper_fns() is not
          safe in interrupt context without the UMH_NO_WAIT option.
      
      orderly_poweroff() can be used from any context but UMH_WAIT_EXEC is
      sleepable.  Move the "force" logic into __orderly_poweroff() and change
      orderly_poweroff() to use the global poweroff_work which simply calls
      __orderly_poweroff().
      
      While at it, remove the unneeded "int argc" and change argv_split() to
      use GFP_KERNEL.
      
      We use the global "bool poweroff_force" to pass the argument; this can
      obviously affect the previous request if it is pending/running.  So we
      only allow the "false => true" transition, assuming that the pending
      "true" should succeed anyway.  If schedule_work() fails after that, we
      know that work->func() was not called yet, so it must see the new value.
      
      This means that orderly_poweroff() becomes async even if we do not run
      the command, and it always succeeds: schedule_work() can only fail if
      the work is already pending.  We can export __orderly_poweroff() and
      change the non-atomic callers which want the old semantics.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reported-by: David Gibson <david@gibson.dropbear.id.au>
      Cc: Lucas De Marchi <lucas.demarchi@profusion.mobi>
      Cc: Feng Hong <hongfeng@marvell.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
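The "false => true" rule for the shared force flag can be modelled in a few lines of ordinary C. This is a user-space sketch of the reasoning only, not the kernel code: `fake_schedule_work`, `request_poweroff`, `run_pending_work` and the `work_pending` flag are hypothetical stand-ins for `schedule_work()`, `orderly_poweroff()`, the work function and the workqueue's pending bit.

```c
#include <stdbool.h>

/* Mirrors the global "bool poweroff_force" named in the commit message. */
static bool poweroff_force;

/* Stand-in for the workqueue's "work is already pending" state. */
static bool work_pending;

/* Stand-in for schedule_work(): returns false if the work is already pending. */
static bool fake_schedule_work(void)
{
    if (work_pending)
        return false;
    work_pending = true;
    return true;
}

/*
 * Stand-in for orderly_poweroff(): only the false => true transition of the
 * shared flag is allowed, so a later non-forced request can never downgrade
 * a pending forced one. Returns whether the work was newly queued.
 */
static bool request_poweroff(bool force)
{
    if (force)
        poweroff_force = true;
    return fake_schedule_work();
}

/*
 * Stand-in for the work function: clears the pending state, then reads the
 * flag. Returns the force value the worker observed.
 */
static bool run_pending_work(void)
{
    work_pending = false;
    return poweroff_force;
}
```

In this model a second request that arrives while the work is still pending is not queued again, yet the worker still sees its `force` value, which is the property the commit message argues for.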
  14. 04 Mar, 2013 1 commit
  15. 28 Feb, 2013 1 commit
  16. 23 Feb, 2013 1 commit
  17. 22 Feb, 2013 2 commits
  18. 27 Dec, 2012 1 commit
  19. 29 Nov, 2012 1 commit
    • F
      cputime: Rename thread_group_times to thread_group_cputime_adjusted · e80d0a1a
      Frederic Weisbecker committed
      We have thread_group_cputime() and thread_group_times(). The naming
      doesn't provide enough information about the difference between
      these two APIs.
      
      To lower the confusion, rename thread_group_times() to
      thread_group_cputime_adjusted(). This name better suggests that
      it's a version of thread_group_cputime() that does some stabilization
      on the raw cputime values, i.e. here: scaling on top of CFS runtime
      stats and bounding the lower value for monotonicity.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
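The two stabilization steps named in this commit message (scaling on top of the scheduler runtime, then bounding the lower value for monotonicity) can be sketched as below. This is a simplified illustration under stated assumptions, not the kernel implementation: the struct and function names are hypothetical, plain 64-bit arithmetic is used, and overflow of `rtime * utime` for very large inputs is ignored.

```c
#include <stdint.h>

/* Hypothetical holder for the previously reported values. */
struct prev_cputime_sketch {
    uint64_t utime;
    uint64_t stime;
};

/*
 * utime/stime: raw tick-based samples; rtime: precise scheduler runtime.
 * Writes the adjusted values to *ut and *st and remembers them in *prev.
 */
static void cputime_adjust_sketch(uint64_t utime, uint64_t stime,
                                  uint64_t rtime,
                                  struct prev_cputime_sketch *prev,
                                  uint64_t *ut, uint64_t *st)
{
    uint64_t total = utime + stime;
    uint64_t scaled_utime;
    uint64_t rest;

    /* Step 1: split rtime in the same ratio as the raw utime/stime samples. */
    if (total)
        scaled_utime = rtime * utime / total;
    else
        scaled_utime = rtime;

    /* Step 2: never report less than what was reported before. */
    if (scaled_utime > prev->utime)
        prev->utime = scaled_utime;

    rest = rtime > prev->utime ? rtime - prev->utime : 0;
    if (rest > prev->stime)
        prev->stime = rest;

    *ut = prev->utime;
    *st = prev->stime;
}
```

Keeping the previous values in a caller-owned struct is what lets repeated calls stay monotonic even when the raw sample ratio shifts between calls.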
  20. 20 Oct, 2012 2 commits
  21. 06 Oct, 2012 2 commits
    • H
      poweroff: fix bug in orderly_poweroff() · 6c0c0d4d
      hongfeng committed
      orderly_poweroff() tries to power off the platform in two steps:
      
      step 1: Call a user space application to power off
      step 2: If the user space poweroff fails, do a forced power off if the
              force param is set.
      
      The bug is that step 1 always succeeds with param UMH_NO_WAIT, which
      defeats the design goal of orderly_poweroff().
      
      We have two choices here:
      UMH_WAIT_EXEC, which means wait for the exec, but not the process;
      UMH_WAIT_PROC, which means wait for the process to complete.
      We need to trade off the two choices:
      
      If using UMH_WAIT_EXEC, there is a potential issue noted by Serge E.
      Hallyn: The exec will have started, but may for whatever (very unlikely)
      reason fail.
      
      If using UMH_WAIT_PROC, there is a potential issue noted by Eric W.
      Biederman: If the caller is not running in a kernel thread then we can
      easily get into a case where the user space caller will block waiting for
      us when we are waiting for the user space caller.
      
      Thanks to them for their excellent ideas. Based on the above discussion,
      we finally chose UMH_WAIT_EXEC, which is much safer: if the user
      application really fails, we just blame the application itself, which
      seems the better choice here.
      Signed-off-by: Feng Hong <hongfeng@marvell.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • S
      kernel/sys.c: call disable_nonboot_cpus() in kernel_restart() · f96972f2
      Shawn Guo committed
      As kernel_power_off() calls disable_nonboot_cpus(), we may also want to
      have kernel_restart() call disable_nonboot_cpus().  Doing so can help
      machines that require the boot cpu to be the last cpu alive during
      reboot to survive a kernel restart.
      
      This fixes one reboot issue seen on imx6q (Cortex-A9 Quad).  The machine
      requires that the restart routine be run on the primary cpu rather than
      secondary ones.  Otherwise, the secondary core running the restart
      routine will fail to come online after reboot.
      Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. 27 Sep, 2012 2 commits
  23. 31 Jul, 2012 2 commits
  24. 12 Jul, 2012 1 commit
  25. 21 Jun, 2012 1 commit
  26. 08 Jun, 2012 4 commits
  27. 01 Jun, 2012 1 commit
    • C
      c/r: prctl: add ability to set new mm_struct::exe_file · b32dfe37
      Cyrill Gorcunov committed
      When we do a restore, we would like to have a way to set up the former
      mm_struct::exe_file so that /proc/pid/exe points to the original
      executable file the process had at checkpoint time.
      
      For this the PR_SET_MM_EXE_FILE code is introduced.  This option takes a
      file descriptor which will be set as a source for new /proc/$pid/exe
      symlink.
      
      Note that it allows changing /proc/$pid/exe if there are no
      VM_EXECUTABLE vmas present for the current process, simply because this
      feature is specific to C/R and mm::num_exe_file_vmas becomes meaningless
      after that.
      
      To minimize the number of transitions the /proc/pid/exe symlink might
      have, this feature is implemented in a one-shot manner.  Thus, once
      changed, the symlink can't be changed again.  This should help sysadmins
      monitor the symlinks of all processes running in a system.
      
      In particular, one could make a snapshot of processes and raise an alarm
      if there are unexpected changes of /proc/pid/exe's in a system.
      
      Note -- this feature is available only if CONFIG_CHECKPOINT_RESTORE is
      set, and the caller must have the CAP_SYS_RESOURCE capability granted;
      otherwise the request to change the symlink will be rejected.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
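From user space, the PR_SET_MM_EXE_FILE option described above takes an open file descriptor as its argument. The following is a minimal sketch, assuming the caller opens the target executable itself. The helper name `set_exe_link` is hypothetical, and the call is expected to fail for callers without the required capability or when the kernel's preconditions are not met.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/prctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: point /proc/self/exe at 'path'.
 * Returns 0 on success or a negative errno value on failure.
 */
static int set_exe_link(const char *path)
{
    int rc = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -errno;

    /* The file descriptor is passed as the third prctl argument. */
    if (prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, fd, 0, 0) == -1)
        rc = -errno;    /* save errno before close() can overwrite it */

    close(fd);
    return rc;
}
```

Because the feature is described as one-shot, a restore tool would call this once, late in the restore, and treat a failure as fatal rather than retrying.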