1. 09 8月, 2014 5 次提交
    • V
      kexec: implementation of new syscall kexec_file_load · cb105258
      Vivek Goyal 提交于
      Previous patch provided the interface definition and this patch prvides
      implementation of new syscall.
      
      Previously segment list was prepared in user space.  Now user space just
      passes kernel fd, initrd fd and command line and kernel will create a
      segment list internally.
      
      This patch contains generic part of the code.  Actual segment preparation
      and loading is done by arch and image specific loader.  Which comes in
      next patch.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cb105258
    • V
      kexec: new syscall kexec_file_load() declaration · f0895685
      Vivek Goyal 提交于
      This is the new syscall kexec_file_load() declaration/interface.  I have
      reserved the syscall number only for x86_64 so far.  Other architectures
      (including i386) can reserve syscall number when they enable the support
      for this new syscall.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0895685
    • V
      kexec: use common function for kimage_normal_alloc() and kimage_crash_alloc() · 255aedd9
      Vivek Goyal 提交于
      kimage_normal_alloc() and kimage_crash_alloc() are doing lot of similar
      things and differ only little.  So instead of having two separate
      functions create a common function kimage_alloc_init() and pass it the
      "flags" argument which tells whether it is normal kexec or kexec_on_panic.
       And this function should be able to deal with both the cases.
      
      This consolidation also helps later where we can use a common function
      kimage_file_alloc_init() to handle normal and crash cases for new file
      based kexec syscall.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      255aedd9
    • V
      kexec: move segment verification code in a separate function · dabe7862
      Vivek Goyal 提交于
      Previously do_kimage_alloc() will allocate a kimage structure, copy
      segment list from user space and then do the segment list sanity
      verification.
      
      Break down this function in 3 parts.  do_kimage_alloc_init() to do actual
      allocation and basic initialization of kimage structure.
      copy_user_segment_list() to copy segment list from user space and
      sanity_check_segment_list() to verify the sanity of segment list as passed
      by user space.
      
      In later patches, I need to only allocate kimage and not copy segment list
      from user space.  So breaking down in smaller functions enables re-use of
      code at other places.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dabe7862
    • V
      kexec: rename unusebale_pages to unusable_pages · 7d3e2bca
      Vivek Goyal 提交于
      Let's use the more common "unusable".
      
      This patch was originally written and posted by Boris. I am including it
      in this patch series.
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Greg Kroah-Hartman <greg@kroah.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: WANG Chao <chaowang@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7d3e2bca
  2. 31 7月, 2014 2 次提交
  3. 24 6月, 2014 1 次提交
    • P
      kexec: save PG_head_mask in VMCOREINFO · b3acc56b
      Petr Tesarik 提交于
      To allow filtering of huge pages, makedumpfile must be able to identify
      them in the dump.  This can be done by checking the appropriate page
      flag, so communicate its value to makedumpfile through the VMCOREINFO
      interface.
      
      There's only one small catch.  Depending on how many page flags are
      available on a given architecture, this bit can be called PG_head or
      PG_compound.
      
      I sent a similar patch back in 2012, but Eric Biederman did not like
      using an #ifdef.  So, this time I'm adding a common symbol
      (PG_head_mask) instead.
      
      See https://lkml.org/lkml/2012/11/28/91 for the previous version.
      Signed-off-by: NPetr Tesarik <ptesarik@suse.cz>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3acc56b
  4. 07 6月, 2014 1 次提交
  5. 28 5月, 2014 1 次提交
    • S
      powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode · 011e4b02
      Srivatsa S. Bhat 提交于
      If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
      (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
      get the following messages during boot:
      
      [    0.089866] POWER8 performance monitor hardware support registered
      [    0.089985] power8-pmu: PMAO restore workaround active.
      [    5.095419] Processor 1 is stuck.
      [   10.097933] Processor 2 is stuck.
      [   15.100480] Processor 3 is stuck.
      [   20.102982] Processor 4 is stuck.
      [   25.105489] Processor 5 is stuck.
      [   30.108005] Processor 6 is stuck.
      [   35.110518] Processor 7 is stuck.
      [   40.113369] Processor 9 is stuck.
      [   45.115879] Processor 10 is stuck.
      [   50.118389] Processor 11 is stuck.
      [   55.120904] Processor 12 is stuck.
      [   60.123425] Processor 13 is stuck.
      [   65.125970] Processor 14 is stuck.
      [   70.128495] Processor 15 is stuck.
      [   75.131316] Processor 17 is stuck.
      
      Note that only the sibling threads are stuck, while the primary threads (0, 8,
      16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
      that kexec tries to wakeup (bring online) the sibling threads of all the cores,
      before performing kexec:
      
      [ 9464.131231] Starting new kernel
      [ 9464.148507] kexec: Waking offline cpu 1.
      [ 9464.148552] kexec: Waking offline cpu 2.
      [ 9464.148600] kexec: Waking offline cpu 3.
      [ 9464.148636] kexec: Waking offline cpu 4.
      [ 9464.148671] kexec: Waking offline cpu 5.
      [ 9464.148708] kexec: Waking offline cpu 6.
      [ 9464.148743] kexec: Waking offline cpu 7.
      [ 9464.148779] kexec: Waking offline cpu 9.
      [ 9464.148815] kexec: Waking offline cpu 10.
      [ 9464.148851] kexec: Waking offline cpu 11.
      [ 9464.148887] kexec: Waking offline cpu 12.
      [ 9464.148922] kexec: Waking offline cpu 13.
      [ 9464.148958] kexec: Waking offline cpu 14.
      [ 9464.148994] kexec: Waking offline cpu 15.
      [ 9464.149030] kexec: Waking offline cpu 17.
      
      Instrumenting this piece of code revealed that the cpu_up() operation actually
      fails with -EBUSY. Thus, only the primary threads of all the cores are online
      during kexec, and hence this is a sure-shot receipe for disaster, as explained
      in commit e8e5c215 (powerpc/kexec: Fix orphaned offline CPUs across kexec),
      as well as in the comment above wake_offline_cpus().
      
      It turns out that cpu_up() was returning -EBUSY because the variable
      'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
      by migrate_to_reboot_cpu() inside kernel_kexec().
      
      Now, migrate_to_reboot_cpu() was originally written with the assumption that
      any further code will not need to perform CPU hotplug, since we are anyway in
      the reboot path. However, kexec is clearly not such a case, since we depend on
      onlining CPUs, atleast on powerpc.
      
      So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
      kexec path, to fix this regression in kexec on powerpc.
      
      Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
      can catch such issues more easily in the future.
      
      Fixes: c97102ba (kexec: migrate to reboot cpu)
      Cc: stable@vger.kernel.org
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      011e4b02
  6. 08 4月, 2014 1 次提交
  7. 04 4月, 2014 1 次提交
    • P
      kernel: audit/fix non-modular users of module_init in core code · c96d6660
      Paul Gortmaker 提交于
      Code that is obj-y (always built-in) or dependent on a bool Kconfig
      (built-in or absent) can never be modular.  So using module_init as an
      alias for __initcall can be somewhat misleading.
      
      Fix these up now, so that we can relocate module_init from init.h into
      module.h in the future.  If we don't do this, we'd have to add module.h
      to obviously non-modular code, and that would be a worse thing.
      
      The audit targets the following module_init users for change:
       kernel/user.c                  obj-y
       kernel/kexec.c                 bool KEXEC (one instance per arch)
       kernel/profile.c               bool PROFILING
       kernel/hung_task.c             bool DETECT_HUNG_TASK
       kernel/sched/stats.c           bool SCHEDSTATS
       kernel/user_namespace.c        bool USER_NS
      
      Note that direct use of __initcall is discouraged, vs.  one of the
      priority categorized subgroups.  As __initcall gets mapped onto
      device_initcall, our use of subsys_initcall (which makes sense for these
      files) will thus change this registration from level 6-device to level
      4-subsys (i.e.  slightly earlier).  However no observable impact of that
      difference has been observed during testing.
      
      Also, two instances of missing ";" at EOL are fixed in kexec.
      Signed-off-by: NPaul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c96d6660
  8. 06 3月, 2014 1 次提交
  9. 28 1月, 2014 1 次提交
  10. 24 1月, 2014 1 次提交
    • K
      kexec: add sysctl to disable kexec_load · 7984754b
      Kees Cook 提交于
      For general-purpose (i.e.  distro) kernel builds it makes sense to build
      with CONFIG_KEXEC to allow end users to choose what kind of things they
      want to do with kexec.  However, in the face of trying to lock down a
      system with such a kernel, there needs to be a way to disable kexec_load
      (much like module loading can be disabled).  Without this, it is too easy
      for the root user to modify kernel memory even when CONFIG_STRICT_DEVMEM
      and modules_disabled are set.  With this change, it is still possible to
      load an image for use later, then disable kexec_load so the image (or lack
      of image) can't be altered.
      
      The intention is for using this in environments where "perfect"
      enforcement is hard.  Without a verified boot, along with verified
      modules, and along with verified kexec, this is trying to give a system a
      better chance to defend itself (or at least grow the window of
      discoverability) against attack in the face of a privilege escalation.
      
      In my mind, I consider several boot scenarios:
      
      1) Verified boot of read-only verified root fs loading fd-based
         verification of kexec images.
      2) Secure boot of writable root fs loading signed kexec images.
      3) Regular boot loading kexec (e.g. kcrash) image early and locking it.
      4) Regular boot with no control of kexec image at all.
      
      1 and 2 don't exist yet, but will soon once the verified kexec series has
      landed.  4 is the state of things now.  The gap between 2 and 4 is too
      large, so this change creates scenario 3, a middle-ground above 4 when 2
      and 1 are not possible for a system.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7984754b
  11. 19 12月, 2013 1 次提交
  12. 08 12月, 2013 1 次提交
  13. 14 10月, 2013 1 次提交
  14. 12 9月, 2013 1 次提交
  15. 01 5月, 2013 2 次提交
  16. 30 4月, 2013 3 次提交
  17. 18 4月, 2013 3 次提交
  18. 28 2月, 2013 6 次提交
  19. 30 1月, 2013 1 次提交
    • Y
      x86: Add Crash kernel low reservation · 0212f915
      Yinghai Lu 提交于
      During kdump kernel's booting stage, it need to find low ram for
      swiotlb buffer when system does not support intel iommu/dmar remapping.
      
      kexed-tools is appending memmap=exactmap and range from /proc/iomem
      with "Crash kernel", and that range is above 4G for 64bit after boot
      protocol 2.12.
      
      We need to add another range in /proc/iomem like "Crash kernel low",
      so kexec-tools could find that info and append to kdump kernel
      command line.
      
      Try to reserve some under 4G if the normal "Crash kernel" is above 4G.
      
      User could specify the size with crashkernel_low=XX[KMG].
      
      -v2: fix warning that is found by Fengguang's test robot.
      -v3: move out get_mem_size change to another patch, to solve compiling
           warning that is found by Borislav Petkov <bp@alien8.de>
      -v4: user must specify crashkernel_low if system does not support
           intel or amd iommu.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Rob Landley <rob@landley.net>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      0212f915
  20. 06 10月, 2012 1 次提交
  21. 31 7月, 2012 1 次提交
  22. 29 3月, 2012 3 次提交
  23. 30 1月, 2012 1 次提交
    • R
      PM / Sleep: Introduce "late suspend" and "early resume" of devices · cf579dfb
      Rafael J. Wysocki 提交于
      The current device suspend/resume phases during system-wide power
      transitions appear to be insufficient for some platforms that want
      to use the same callback routines for saving device states and
      related operations during runtime suspend/resume as well as during
      system suspend/resume.  In principle, they could point their
      .suspend_noirq() and .resume_noirq() to the same callback routines
      as their .runtime_suspend() and .runtime_resume(), respectively,
      but at least some of them require device interrupts to be enabled
      while the code in those routines is running.
      
      It also makes sense to have device suspend-resume callbacks that will
      be executed with runtime PM disabled and with device interrupts
      enabled in case someone needs to run some special code in that
      context during system-wide power transitions.
      
      Apart from this, .suspend_noirq() and .resume_noirq() were introduced
      as a workaround for drivers using shared interrupts and failing to
      prevent their interrupt handlers from accessing suspended hardware.
      It appears to be better not to use them for other porposes, or we may
      have to deal with some serious confusion (which seems to be happening
      already).
      
      For the above reasons, introduce new device suspend/resume phases,
      "late suspend" and "early resume" (and analogously for hibernation)
      whose callback will be executed with runtime PM disabled and with
      device interrupts enabled and whose callback pointers generally may
      point to runtime suspend/resume routines.
      Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
      Reviewed-by: NMark Brown <broonie@opensource.wolfsonmicro.com>
      Reviewed-by: NKevin Hilman <khilman@ti.com>
      cf579dfb