1. 31 Jul, 2014 1 commit
    • kexec: export free_huge_page to VMCOREINFO · 8f1d26d0
      Committed by Atsushi Kumagai
      PG_head_mask was added into VMCOREINFO to filter huge pages in b3acc56b
      ("kexec: save PG_head_mask in VMCOREINFO"), but makedumpfile still needs
      another symbol to filter *hugetlbfs* pages.
      
      If a user wants to filter user pages, makedumpfile tries to exclude them by
      checking whether the page is anonymous, but hugetlbfs pages
      aren't anonymous even though they are also user pages.
      
      We know it's possible to detect them in the same way as PageHuge(),
      so we need the start address of free_huge_page():
      
          int PageHuge(struct page *page)
          {
                  if (!PageCompound(page))
                          return 0;
      
                  page = compound_head(page);
                  return get_compound_page_dtor(page) == free_huge_page;
          }
      
      For that reason, this patch makes free_huge_page() public
      so that it can be exported to VMCOREINFO.

      Signed-off-by: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
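
Note on 8f1d26d0: the PageHuge() logic quoted above can be mirrored in a dump-filtering tool once both PG_head_mask and the address of free_huge_page() are available. The following is only a minimal user-space sketch under stated assumptions. The struct layouts, field names and the function is_hugetlbfs_head() are hypothetical and are not makedumpfile's actual code, and only head pages are handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a compound head page as read from a dump. */
struct dumped_page {
    uint64_t flags;         /* page flags word */
    uint64_t compound_dtor; /* destructor address recorded for the compound page */
};

/* Values a tool would take from VMCOREINFO; field names are illustrative. */
struct vmcoreinfo_values {
    uint64_t pg_head_mask;        /* exported by b3acc56b */
    uint64_t free_huge_page_addr; /* exported by 8f1d26d0 */
};

/* Mirrors PageHuge() for a head page: compound, and destructor is free_huge_page. */
static bool is_hugetlbfs_head(const struct dumped_page *page,
                              const struct vmcoreinfo_values *vi)
{
    assert(page != NULL && vi != NULL);
    if (!(page->flags & vi->pg_head_mask))
        return false; /* not a compound head page */
    return page->compound_dtor == vi->free_huge_page_addr;
}
```

A compound page whose destructor is some other function (for example a transparent huge page) fails the second comparison, which is why the mask alone is not enough to single out hugetlbfs pages.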
  2. 24 Jun, 2014 1 commit
    • kexec: save PG_head_mask in VMCOREINFO · b3acc56b
      Committed by Petr Tesarik
      To allow filtering of huge pages, makedumpfile must be able to identify
      them in the dump.  This can be done by checking the appropriate page
      flag, so communicate its value to makedumpfile through the VMCOREINFO
      interface.
      
      There's only one small catch.  Depending on how many page flags are
      available on a given architecture, this bit can be called PG_head or
      PG_compound.
      
      I sent a similar patch back in 2012, but Eric Biederman did not like
      using an #ifdef.  So, this time I'm adding a common symbol
      (PG_head_mask) instead.
      
      See https://lkml.org/lkml/2012/11/28/91 for the previous version.

      Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Alexey Kardashevskiy <aik@ozlabs.ru>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 07 Jun, 2014 1 commit
  4. 28 May, 2014 1 commit
    • powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode · 011e4b02
      Committed by Srivatsa S. Bhat
      If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
      (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
      get the following messages during boot:
      
      [    0.089866] POWER8 performance monitor hardware support registered
      [    0.089985] power8-pmu: PMAO restore workaround active.
      [    5.095419] Processor 1 is stuck.
      [   10.097933] Processor 2 is stuck.
      [   15.100480] Processor 3 is stuck.
      [   20.102982] Processor 4 is stuck.
      [   25.105489] Processor 5 is stuck.
      [   30.108005] Processor 6 is stuck.
      [   35.110518] Processor 7 is stuck.
      [   40.113369] Processor 9 is stuck.
      [   45.115879] Processor 10 is stuck.
      [   50.118389] Processor 11 is stuck.
      [   55.120904] Processor 12 is stuck.
      [   60.123425] Processor 13 is stuck.
      [   65.125970] Processor 14 is stuck.
      [   70.128495] Processor 15 is stuck.
      [   75.131316] Processor 17 is stuck.
      
      Note that only the sibling threads are stuck, while the primary threads (0, 8,
      16 etc.) boot just fine. Looking closer at the previous step of kexec, we observe
      that kexec tries to wake up (bring online) the sibling threads of all the cores
      before performing kexec:
      
      [ 9464.131231] Starting new kernel
      [ 9464.148507] kexec: Waking offline cpu 1.
      [ 9464.148552] kexec: Waking offline cpu 2.
      [ 9464.148600] kexec: Waking offline cpu 3.
      [ 9464.148636] kexec: Waking offline cpu 4.
      [ 9464.148671] kexec: Waking offline cpu 5.
      [ 9464.148708] kexec: Waking offline cpu 6.
      [ 9464.148743] kexec: Waking offline cpu 7.
      [ 9464.148779] kexec: Waking offline cpu 9.
      [ 9464.148815] kexec: Waking offline cpu 10.
      [ 9464.148851] kexec: Waking offline cpu 11.
      [ 9464.148887] kexec: Waking offline cpu 12.
      [ 9464.148922] kexec: Waking offline cpu 13.
      [ 9464.148958] kexec: Waking offline cpu 14.
      [ 9464.148994] kexec: Waking offline cpu 15.
      [ 9464.149030] kexec: Waking offline cpu 17.
      
      Instrumenting this piece of code revealed that the cpu_up() operation actually
      fails with -EBUSY. Thus, only the primary threads of all the cores are online
      during kexec, and hence this is a sure-shot recipe for disaster, as explained
      in commit e8e5c215 (powerpc/kexec: Fix orphaned offline CPUs across kexec),
      as well as in the comment above wake_offline_cpus().
      
      It turns out that cpu_up() was returning -EBUSY because the variable
      'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
      by migrate_to_reboot_cpu() inside kernel_kexec().
      
      Now, migrate_to_reboot_cpu() was originally written with the assumption that
      no further code will need to perform CPU hotplug, since we are in
      the reboot path anyway. However, kexec is clearly not such a case, since we depend on
      onlining CPUs, at least on powerpc.
      
      So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
      kexec path, to fix this regression in kexec on powerpc.
      
      Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
      can catch such issues more easily in the future.
      
      Fixes: c97102ba (kexec: migrate to reboot cpu)
      Cc: stable@vger.kernel.org
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
  5. 08 Apr, 2014 1 commit
  6. 04 Apr, 2014 1 commit
    • kernel: audit/fix non-modular users of module_init in core code · c96d6660
      Committed by Paul Gortmaker
      Code that is obj-y (always built-in) or dependent on a bool Kconfig
      (built-in or absent) can never be modular.  So using module_init as an
      alias for __initcall can be somewhat misleading.
      
      Fix these up now, so that we can relocate module_init from init.h into
      module.h in the future.  If we don't do this, we'd have to add module.h
      to obviously non-modular code, and that would be a worse thing.
      
      The audit targets the following module_init users for change:
       kernel/user.c                  obj-y
       kernel/kexec.c                 bool KEXEC (one instance per arch)
       kernel/profile.c               bool PROFILING
       kernel/hung_task.c             bool DETECT_HUNG_TASK
       kernel/sched/stats.c           bool SCHEDSTATS
       kernel/user_namespace.c        bool USER_NS
      
      Note that direct use of __initcall is discouraged in favor of one of the
      priority categorized subgroups.  As __initcall gets mapped onto
      device_initcall, our use of subsys_initcall (which makes sense for these
      files) will thus change this registration from level 6-device to level
      4-subsys (i.e.  slightly earlier).  However, no impact of that
      difference has been observed during testing.
      
      Also, two instances of missing ";" at EOL are fixed in kexec.

      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
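
Note on c96d6660: the effect of moving a registration from level 6 (device) to level 4 (subsys) can be pictured with a small user-space model. This is not kernel code. The struct, the function initcall_order() and the entry names used with it are hypothetical, and the only facts taken from the commit message are the two level numbers.

```c
#include <assert.h>
#include <stddef.h>

/* The two levels named in the commit message; 7 is the highest level modeled. */
enum { LEVEL_SUBSYS = 4, LEVEL_DEVICE = 6, LEVEL_MAX = 7 };

/* Hypothetical registration record: a level and a label for the init function. */
struct initcall_model {
    int level;
    const char *name;
};

/*
 * Fill order[] with the names in the sequence they would run:
 * lower levels first, registration order within a level.
 */
static void initcall_order(const struct initcall_model *calls, size_t n,
                           const char **order)
{
    size_t pos = 0;

    assert(calls != NULL && order != NULL);
    for (int level = 0; level <= LEVEL_MAX; level++)
        for (size_t i = 0; i < n; i++)
            if (calls[i].level == level)
                order[pos++] = calls[i].name;
}
```

In this model an entry registered at LEVEL_SUBSYS always runs before one registered at LEVEL_DEVICE, regardless of registration order, which is the "slightly earlier" shift the message describes.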
  7. 06 Mar, 2014 1 commit
  8. 28 Jan, 2014 1 commit
  9. 24 Jan, 2014 1 commit
    • kexec: add sysctl to disable kexec_load · 7984754b
      Committed by Kees Cook
      For general-purpose (i.e.  distro) kernel builds it makes sense to build
      with CONFIG_KEXEC to allow end users to choose what kind of things they
      want to do with kexec.  However, in the face of trying to lock down a
      system with such a kernel, there needs to be a way to disable kexec_load
      (much like module loading can be disabled).  Without this, it is too easy
      for the root user to modify kernel memory even when CONFIG_STRICT_DEVMEM
      and modules_disabled are set.  With this change, it is still possible to
      load an image for use later, then disable kexec_load so the image (or lack
      of image) can't be altered.
      
      The intention is for using this in environments where "perfect"
      enforcement is hard.  Without a verified boot, along with verified
      modules, and along with verified kexec, this is trying to give a system a
      better chance to defend itself (or at least grow the window of
      discoverability) against attack in the face of a privilege escalation.
      
      In my mind, I consider several boot scenarios:
      
      1) Verified boot of read-only verified root fs loading fd-based
         verification of kexec images.
      2) Secure boot of writable root fs loading signed kexec images.
      3) Regular boot loading kexec (e.g. kcrash) image early and locking it.
      4) Regular boot with no control of kexec image at all.
      
      1 and 2 don't exist yet, but will soon once the verified kexec series has
      landed.  4 is the state of things now.  The gap between 2 and 4 is too
      large, so this change creates scenario 3, a middle-ground above 4 when 2
      and 1 are not possible for a system.

      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Rik van Riel <riel@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
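
Note on 7984754b: scenario 3 (load an image early, then lock it) relies on a toggle that can be switched on but not back off. The sketch below models that behavior in user space. It assumes the toggle is one-way, which is what "can't be altered" implies. The function names are hypothetical, and the error codes are a plausible choice rather than a statement of what the kernel returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Attempt to write the toggle; only the value 1 is accepted (one-way switch). */
static int set_load_disabled(int *toggle, int value)
{
    assert(toggle != NULL);
    if (value != 1)
        return -EINVAL; /* clearing the toggle is refused in this model */
    *toggle = 1;
    return 0;
}

/* Model of a load request: refused once the toggle has been set. */
static int try_kexec_load(const int *toggle)
{
    assert(toggle != NULL);
    return *toggle ? -EPERM : 0;
}
```

The intended sequence is: load the image while try_kexec_load() still succeeds, call set_load_disabled(&toggle, 1), and from then on every further load attempt fails until the next boot.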
  10. 19 Dec, 2013 1 commit
  11. 08 Dec, 2013 1 commit
  12. 14 Oct, 2013 1 commit
  13. 12 Sep, 2013 1 commit
  14. 01 May, 2013 2 commits
  15. 30 Apr, 2013 3 commits
  16. 18 Apr, 2013 3 commits
  17. 28 Feb, 2013 6 commits
  18. 30 Jan, 2013 1 commit
    • x86: Add Crash kernel low reservation · 0212f915
      Committed by Yinghai Lu
      During the kdump kernel's boot stage, it needs to find low RAM for the
      swiotlb buffer when the system does not support Intel IOMMU/DMAR remapping.
      
      kexec-tools appends memmap=exactmap and the range from /proc/iomem
      labeled "Crash kernel", and that range is above 4G for 64-bit after boot
      protocol 2.12.
      
      We need to add another range in /proc/iomem, such as "Crash kernel low",
      so that kexec-tools can find that info and append it to the kdump kernel
      command line.
      
      Try to reserve some memory under 4G if the normal "Crash kernel" is above 4G.
      
      The user can specify the size with crashkernel_low=XX[KMG].
      
      -v2: fix warning that is found by Fengguang's test robot.
      -v3: move out get_mem_size change to another patch, to solve compiling
           warning that is found by Borislav Petkov <bp@alien8.de>
      -v4: user must specify crashkernel_low if system does not support
           intel or amd iommu.

      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Rob Landley <rob@landley.net>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
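
Note on 0212f915: a value of the form XX[KMG], as accepted by crashkernel_low=, is a number with an optional binary-scale suffix. The following user-space sketch shows one way to decode such a string. The function parse_size_model() is hypothetical and only similar in spirit to the kernel's own parser. It does no error reporting and ignores anything after the suffix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Decode "72M", "512K", "1G" or a plain byte count into bytes. */
static uint64_t parse_size_model(const char *s)
{
    char *end;
    uint64_t value;

    assert(s != NULL);
    value = strtoull(s, &end, 0);

    /* Each suffix multiplies by 1024 once more than the one below it. */
    switch (*end) {
    case 'G': case 'g':
        value <<= 10;
        /* fall through */
    case 'M': case 'm':
        value <<= 10;
        /* fall through */
    case 'K': case 'k':
        value <<= 10;
        break;
    default:
        break; /* no suffix: value is already in bytes */
    }
    return value;
}
```

With this model, a command line containing crashkernel_low=72M would request 72 * 1024 * 1024 bytes below 4G.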
  19. 06 Oct, 2012 1 commit
  20. 31 Jul, 2012 1 commit
  21. 29 Mar, 2012 3 commits
  22. 30 Jan, 2012 1 commit
    • PM / Sleep: Introduce "late suspend" and "early resume" of devices · cf579dfb
      Committed by Rafael J. Wysocki
      The current device suspend/resume phases during system-wide power
      transitions appear to be insufficient for some platforms that want
      to use the same callback routines for saving device states and
      related operations during runtime suspend/resume as well as during
      system suspend/resume.  In principle, they could point their
      .suspend_noirq() and .resume_noirq() to the same callback routines
      as their .runtime_suspend() and .runtime_resume(), respectively,
      but at least some of them require device interrupts to be enabled
      while the code in those routines is running.
      
      It also makes sense to have device suspend-resume callbacks that will
      be executed with runtime PM disabled and with device interrupts
      enabled in case someone needs to run some special code in that
      context during system-wide power transitions.
      
      Apart from this, .suspend_noirq() and .resume_noirq() were introduced
      as a workaround for drivers using shared interrupts and failing to
      prevent their interrupt handlers from accessing suspended hardware.
      It appears to be better not to use them for other purposes, or we may
      have to deal with some serious confusion (which seems to be happening
      already).
      
      For the above reasons, introduce new device suspend/resume phases,
      "late suspend" and "early resume" (and analogously for hibernation)
      whose callback will be executed with runtime PM disabled and with
      device interrupts enabled and whose callback pointers generally may
      point to runtime suspend/resume routines.

      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Reviewed-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
      Reviewed-by: Kevin Hilman <khilman@ti.com>
  23. 13 Jan, 2012 3 commits
    • kdump: crashk_res init check for /sys/kernel/kexec_crash_size · bec013c4
      Committed by Michael Holzheu
      Currently it is possible to set the crash_size via the sysfs
      /sys/kernel/kexec_crash_size even if no crash kernel memory has been
      defined with the "crashkernel" parameter.  In this case "crashk_res" is
      not initialized and crashk_res.start = crashk_res.end = 0.  Unfortunately
      resource_size(&crashk_res) returns 1 in this case.  This breaks the s390
      implementation of crash_(un)map_reserved_pages().
      
      To fix the problem the correct "old_size" is now calculated in
      crash_shrink_memory().  "old_size" is set to "0" if crashk_res is not
      initialized.  With this change crash_shrink_memory() will do nothing when
      "crashk_res" is not initialized.  It will return "0" for "echo 0 >
      /sys/kernel/kexec_crash_size" and -EINVAL for "echo [not zero] >
      /sys/kernel/kexec_crash_size".
      
      In addition to that this patch also simplifies the "ret = -EINVAL" vs.
      "ret = 0" logic as suggested by Simon Horman.

      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Reviewed-by: Dave Young <dyoung@redhat.com>
      Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
      Reviewed-by: Simon Horman <horms@verge.net.au>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
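
Note on bec013c4: the surprising "size 1" comes from the inclusive end address. A resource covering start..end has size end - start + 1, so start = end = 0 yields 1 rather than 0. The user-space sketch below reproduces that arithmetic and the special case the commit describes. The struct and function names are hypothetical stand-ins, and the return value 1 from shrink_check() is a model-only marker meaning "shrinking would proceed".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a resource with an inclusive end address. */
struct resource_model {
    uint64_t start;
    uint64_t end;
};

/* Generic size of an inclusive range: 1 even when start == end == 0. */
static uint64_t resource_size_model(const struct resource_model *res)
{
    assert(res != NULL);
    return res->end - res->start + 1;
}

/* Size as the fix computes it: an uninitialized resource counts as 0 bytes. */
static uint64_t crash_old_size(const struct resource_model *res)
{
    assert(res != NULL);
    return res->end == 0 ? 0 : res->end - res->start + 1;
}

/* 0 if nothing to do, -EINVAL if asked to grow, 1 if a shrink would proceed. */
static int shrink_check(const struct resource_model *res, uint64_t new_size)
{
    uint64_t old_size = crash_old_size(res);

    if (new_size >= old_size)
        return new_size == old_size ? 0 : -EINVAL;
    return 1;
}
```

For an uninitialized resource this gives 0 for a requested size of 0 and -EINVAL for any other value, matching the behavior the message describes for the two echo commands.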
    • kdump: add missing RAM resource in crash_shrink_memory() · 6480e5a0
      Committed by Michael Holzheu
      When crashkernel memory is shrunk using /sys/kernel/kexec_crash_size, no
      RAM resource is currently created for the memory that is returned to the system.
      
      Example:
      
        $ cat /proc/iomem
        00000000-bfffffff : System RAM
          00000000-005b7ac3 : Kernel code
          005b7ac4-009743bf : Kernel data
          009bb000-00a85c33 : Kernel bss
        c0000000-cfffffff : Crash kernel
        d0000000-ffffffff : System RAM
      
        $ echo 0 > /sys/kernel/kexec_crash_size
        $ cat /proc/iomem
        00000000-bfffffff : System RAM
          00000000-005b7ac3 : Kernel code
          005b7ac4-009743bf : Kernel data
          009bb000-00a85c33 : Kernel bss
                                         <<-- System RAM is missing here
        d0000000-ffffffff : System RAM
      
      One result of this bug is that the memory chunk can never be set offline
      using memory hotplug.  With this patch I insert a new "System RAM"
      resource for the released memory.  The example above then looks like the
      following:
      
        $ echo 0 > /sys/kernel/kexec_crash_size
        $ cat /proc/iomem
        00000000-bfffffff : System RAM
          00000000-005b7ac3 : Kernel code
          005b7ac4-009743bf : Kernel data
          009bb000-00a85c33 : Kernel bss
        c0000000-cfffffff : System RAM   <<-- new resource
        d0000000-ffffffff : System RAM
      
      And now I can set chunk c0000000-cfffffff offline.

      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kexec: remove KMSG_DUMP_KEXEC · a3dd3323
      Committed by WANG Cong
      KMSG_DUMP_KEXEC is useless because we already save kernel messages inside
      /proc/vmcore, and it is unsafe to allow modules to do other things in a
      crash dump scenario.
      
      [akpm@linux-foundation.org: fix powerpc build]
      Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
      Reported-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Jarod Wilson <jarod@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. 09 Dec, 2011 1 commit
  25. 30 Oct, 2011 2 commits
    • [S390] kdump: Add infrastructure for unmapping crashkernel memory · 558df720
      Committed by Michael Holzheu
      This patch introduces a mechanism that allows architecture backends to
      remove page tables for the crashkernel memory. This can protect the loaded
      kdump kernel from being overwritten by broken kernel code.  Two new
      functions crash_map_reserved_pages() and crash_unmap_reserved_pages() are
      added that can be implemented by architecture code.  The
      crash_map_reserved_pages() function is called before and
      crash_unmap_reserved_pages() after the crashkernel segments are loaded.  The
      functions are also called in crash_shrink_memory() to create/remove page
      tables when the crashkernel memory size is reduced.
      
      To support architectures that have large pages this patch also introduces
      a new define KEXEC_CRASH_MEM_ALIGN. The crashkernel start and size must
      always be aligned with KEXEC_CRASH_MEM_ALIGN.
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
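
Note on 558df720: the requirement that the crashkernel start and size be aligned to KEXEC_CRASH_MEM_ALIGN can be illustrated with ordinary power-of-two alignment arithmetic. The sketch below is a user-space model only. MODEL_CRASH_MEM_ALIGN (1 MiB) is an assumed value chosen for illustration, not the value any architecture defines, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed alignment for illustration; the real value is architecture-defined. */
#define MODEL_CRASH_MEM_ALIGN (1ULL << 20)

/* True if value is a multiple of align (align must be a power of two). */
static bool is_aligned(uint64_t value, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value & (align - 1)) == 0;
}

/* Round value up to the next multiple of align (power of two). */
static uint64_t align_up(uint64_t value, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}

/* A region is acceptable only if both its start and its size are aligned. */
static bool crash_region_ok(uint64_t start, uint64_t size, uint64_t align)
{
    return is_aligned(start, align) && is_aligned(size, align);
}
```

On an architecture that maps the crashkernel area with large pages, rounding a requested size with align_up() before shrinking keeps the remaining region on a large-page boundary.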
    • [S390] kdump: Initialize vmcoreinfo note at startup · fa8ff292
      Committed by Michael Holzheu
      Currently the vmcoreinfo note is only initialized in case of kdump. On s390
      it is possible to create kernel dumps with other dump mechanisms than kdump
      (e.g. via hypervisor dump or stand-alone dump tools). For those dumps it
      would also be desirable to include the vmcoreinfo data. To accomplish this,
      with this patch the vmcoreinfo ELF note is always initialized, not only in
      case of a (kdump) crash. On s390 we will add an ABI defined pointer at
      a well known address to vmcoreinfo so that dump analysis tools are able to
      find this information.
      
      In particular on s390 we have a tool named zgetdump. With this tool it is
      possible to convert dump formats on the fly using fuse. E.g. you can mount a
      s390 stand-alone dump as ELF dump. When this is done, the tool finds the
      vmcoreinfo in the stand-alone dump via the well known ABI defined address and
      it creates the respective VMCOREINFO ELF note in the output ELF dump. This then
      can be used e.g. by makedumpfile for dump filtering.  No more need for a
      vmlinux file with debug information.
      
      So this will look like the following:
      $ zgetdump --mount standalone.dump -f elf /mnt
      $ ls /mnt
        dump.elf
      $ readelf -n /mnt/dump.elf
      $ ...
        VMCOREINFO            0x00000474      Unknown note type: (0x00000000)
      $ makedumpfile -c -d 31 /mnt/dump.elf dump.kdump
      
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>