1. 13 10月, 2014 1 次提交
    • R
      ima: added support for new kernel cmdline parameter ima_template_fmt · c2426d2a
      Roberto Sassu 提交于
      This patch allows users to provide a custom template format through the
      new kernel command line parameter 'ima_template_fmt'. If the supplied
      format is not valid, IMA uses the default template descriptor.
      
      Changelog:
       - v3:
         - added check for 'fields' and 'num_fields' in
           template_desc_init_fields() (suggested by Mimi Zohar)
      
       - v2:
         - using template_desc_init_fields() to validate a format string
           (Roberto Sassu)
         - updated documentation by stating that only the chosen template
           descriptor is initialized (Roberto Sassu)
      
       - v1:
         - simplified code of ima_template_fmt_setup()
           (Roberto Sassu, suggested by Mimi Zohar)
      Signed-off-by: NRoberto Sassu <roberto.sassu@polito.it>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      c2426d2a
  2. 18 9月, 2014 1 次提交
    • D
      ima: provide 'ima_appraise=log' kernel option · 2faa6ef3
      Dmitry Kasatkin 提交于
      The kernel boot parameter "ima_appraise" currently defines 'off',
      'enforce' and 'fix' modes.  When designing a policy and labeling
      the system, access to files are either blocked in the default
      'enforce' mode or automatically fixed in the 'fix' mode.  It is
      beneficial to be able to run the system in a logging only mode,
      without fixing it, in order to properly analyze the system. This
      patch adds a 'log' mode to run the system in a permissive mode and
      log the appraisal results.
      Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      2faa6ef3
  3. 17 7月, 2014 4 次提交
    • D
      KEYS: validate certificate trust only with builtin keys · 32c4741c
      Dmitry Kasatkin 提交于
      Instead of allowing public keys, with certificates signed by any
      key on the system trusted keyring, to be added to a trusted keyring,
      this patch further restricts the certificates to those signed only by
      builtin keys on the system keyring.
      
      This patch defines a new option 'builtin' for the kernel parameter
      'keys_ownerid' to allow trust validation using builtin keys.
      
      Simplified Mimi's "KEYS: define an owner trusted keyring" patch
      
      Changelog v7:
      - rename builtin_keys to use_builtin_keys
      Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      32c4741c
    • D
      KEYS: validate certificate trust only with selected key · ffb70f61
      Dmitry Kasatkin 提交于
      Instead of allowing public keys, with certificates signed by any
      key on the system trusted keyring, to be added to a trusted keyring,
      this patch further restricts the certificates to those signed by a
      particular key on the system keyring.
      
      This patch defines a new kernel parameter 'ca_keys' to identify the
      specific key which must be used for trust validation of certificates.
      
      Simplified Mimi's "KEYS: define an owner trusted keyring" patch.
      
      Changelog:
      - support for builtin x509 public keys only
      - export "asymmetric_keyid_match"
      - remove ifndefs MODULE
      - rename kernel boot parameter from keys_ownerid to ca_keys
      Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      ffb70f61
    • D
      ima: introduce multi-page collect buffers · 6edf7a89
      Dmitry Kasatkin 提交于
      Use of multiple-page collect buffers reduces:
      1) the number of block IO requests
      2) the number of asynchronous hash update requests
      
      Second is important for HW accelerated hashing, because significant
      amount of time is spent for preparation of hash update operation,
      which includes configuring acceleration HW, DMA engine, etc...
      Thus, HW accelerators are more efficient when working on large
      chunks of data.
      
      This patch introduces usage of multi-page collect buffers. Buffer size
      can be specified using 'ahash_bufsize' module parameter. Default buffer
      size is 4096 bytes.
      
      Changes in v3:
      - kernel parameter replaced with module parameter
      Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      6edf7a89
    • D
      ima: use ahash API for file hash calculation · 3bcced39
      Dmitry Kasatkin 提交于
      Async hash API allows the use of HW acceleration for hash calculation.
      It may give significant performance gain and/or reduce power consumption,
      which might be very beneficial for battery powered devices.
      
      This patch introduces hash calculation using ahash API. ahash performance
      depends on the data size and the particular HW. Depending on the specific
      system, shash performance may be better.
      
      This patch defines 'ahash_minsize' module parameter, which is used to
      define the minimal file size to use with ahash.  If this minimum file size
      is not set or the file is smaller than defined by the parameter, shash will
      be used.
      
      Changes in v3:
      - kernel parameter replaced with module parameter
      - pr_crit replaced with pr_crit_ratelimited
      - more comment changes - Mimi
      
      Changes in v2:
      - ima_ahash_size became as ima_ahash
      - ahash pre-allocation moved out from __init code to be able to use
        ahash crypto modules. Ahash allocated once on the first use.
      - hash calculation falls back to shash if ahash allocation/calculation fails
      - complex initialization separated from variable declaration
      - improved comments
      Signed-off-by: NDmitry Kasatkin <d.kasatkin@samsung.com>
      Signed-off-by: NMimi Zohar <zohar@linux.vnet.ibm.com>
      3bcced39
  4. 15 7月, 2014 1 次提交
  5. 24 6月, 2014 2 次提交
    • A
      kernel/watchdog.c: print traces for all cpus on lockup detection · ed235875
      Aaron Tomlin 提交于
      A 'softlockup' is defined as a bug that causes the kernel to loop in
      kernel mode for more than a predefined period to time, without giving
      other tasks a chance to run.
      
      Currently, upon detection of this condition by the per-cpu watchdog
      task, debug information (including a stack trace) is sent to the system
      log.
      
      On some occasions, we have observed that the "victim" rather than the
      actual "culprit" (i.e.  the owner/holder of the contended resource) is
      reported to the user.  Often this information has proven to be
      insufficient to assist debugging efforts.
      
      To avoid loss of useful debug information, for architectures which
      support NMI, this patch makes it possible to improve soft lockup
      reporting.  This is accomplished by issuing an NMI to each cpu to obtain
      a stack trace.
      
      If NMI is not supported we just revert back to the old method.  A sysctl
      and boot-time parameter is available to toggle this feature.
      
      [dzickus@redhat.com: add CONFIG_SMP in certain areas]
      [akpm@linux-foundation.org: additional CONFIG_SMP=n optimisations]
      [mq@suse.cz: fix warning]
      Signed-off-by: NAaron Tomlin <atomlin@redhat.com>
      Signed-off-by: NDon Zickus <dzickus@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Mateusz Guzik <mguzik@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: NJan Moskyto Matejka <mq@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed235875
    • P
      rcu: Reduce overhead of cond_resched() checks for RCU · 4a81e832
      Paul E. McKenney 提交于
      Commit ac1bea85 (Make cond_resched() report RCU quiescent states)
      fixed a problem where a CPU looping in the kernel with but one runnable
      task would give RCU CPU stall warnings, even if the in-kernel loop
      contained cond_resched() calls.  Unfortunately, in so doing, it introduced
      performance regressions in Anton Blanchard's will-it-scale "open1" test.
      The problem appears to be not so much the increased cond_resched() path
      length as an increase in the rate at which grace periods complete, which
      increased per-update grace-period overhead.
      
      This commit takes a different approach to fixing this bug, mainly by
      moving the RCU-visible quiescent state from cond_resched() to
      rcu_note_context_switch(), and by further reducing the check to a
      simple non-zero test of a single per-CPU variable.  However, this
      approach requires that the force-quiescent-state processing send
      resched IPIs to the offending CPUs.  These will be sent only once
      the grace period has reached an age specified by the boot/sysfs
      parameter rcutree.jiffies_till_sched_qs, or once the grace period
      reaches an age halfway to the point at which RCU CPU stall warnings
      will be emitted, whichever comes first.
      Reported-by: NDave Hansen <dave.hansen@intel.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Christoph Lameter <cl@gentwo.org>
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Reviewed-by: NJosh Triplett <josh@joshtriplett.org>
      [ paulmck: Made rcu_momentary_dyntick_idle() as suggested by the
        ktest build robot.  Also fixed smp_mb() comment as noted by
        Oleg Nesterov. ]
      
      Merge with e552592e (Reduce overhead of cond_resched() checks for RCU)
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      4a81e832
  6. 17 6月, 2014 2 次提交
  7. 07 6月, 2014 1 次提交
  8. 05 6月, 2014 2 次提交
    • P
      init/main.c: add initcall_blacklist kernel parameter · 7b0b73d7
      Prarit Bhargava 提交于
      When a module is built into the kernel the module_init() function
      becomes an initcall.  Sometimes debugging through dynamic debug can
      help, however, debugging built in kernel modules is typically done by
      changing the .config, recompiling, and booting the new kernel in an
      effort to determine exactly which module caused a problem.
      
      This patchset can be useful stand-alone or combined with initcall_debug.
      There are cases where some initcalls can hang the machine before the
      console can be flushed, which can make initcall_debug output inaccurate.
      Having the ability to skip initcalls can help further debugging of these
      scenarios.
      
      Usage: initcall_blacklist=<list of comma separated initcalls>
      
      ex) added "initcall_blacklist=sgi_uv_sysfs_init" as a kernel parameter and
      the log contains:
      
      	blacklisting initcall sgi_uv_sysfs_init
      	...
      	...
      	initcall sgi_uv_sysfs_init blacklisted
      
      ex) added "initcall_blacklist=foo_bar,sgi_uv_sysfs_init" as a kernel parameter
      and the log contains:
      
      	blacklisting initcall foo_bar
      	blacklisting initcall sgi_uv_sysfs_init
      	...
      	...
      	initcall sgi_uv_sysfs_init blacklisted
      
      [akpm@linux-foundation.org: tweak printk text]
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Richard Weinberger <richard.weinberger@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Rob Landley <rob@landley.net>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b0b73d7
    • A
      cma: add placement specifier for "cma=" kernel parameter · 5ea3b1b2
      Akinobu Mita 提交于
      Currently, "cma=" kernel parameter is used to specify the size of CMA,
      but we can't specify where it is located.  We want to locate CMA below
      4GB for devices only supporting 32-bit addressing on 64-bit systems
      without iommu.
      
      This enables to specify the placement of CMA by extending "cma=" kernel
      parameter.
      
      Examples:
       1. locate 64MB CMA below 4GB by "cma=64M@0-4G"
       2. locate 64MB CMA exact at 512MB by "cma=64M@512M"
      
      Note that the DMA contiguous memory allocator on x86 assumes that
      page_address() works for the pages to allocate.  So this change requires
      to limit end address of contiguous memory area upto max_pfn_mapped to
      prevent from locating it on highmem area by the argument of
      dma_contiguous_reserve().
      Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ea3b1b2
  9. 01 6月, 2014 1 次提交
    • L
      ACPI: Fix x86 regression related to early mapping size limitation · 4fc0a7e8
      Lv Zheng 提交于
      The following warning message is triggered:
       WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:136 __early_ioremap+0x11f/0x1f2()
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper Not tainted 3.15.0-rc1-00017-g86dfc6f3-dirty #298
       Hardware name: Intel Corporation S2600CP/S2600CP, BIOS SE5C600.86B.99.99.x036.091920111209 09/19/2011
        0000000000000009 ffffffff81b75c40 ffffffff817c627b 0000000000000000
        ffffffff81b75c78 ffffffff81067b5d 000000000000007b 8000000000000563
        00000000b96b20dc 0000000000000001 ffffffffff300e0c ffffffff81b75c88
       Call Trace:
        [<ffffffff817c627b>] dump_stack+0x45/0x56
        [<ffffffff81067b5d>] warn_slowpath_common+0x7d/0xa0
        [<ffffffff81067c3a>] warn_slowpath_null+0x1a/0x20
        [<ffffffff81d4b9d5>] __early_ioremap+0x11f/0x1f2
        [<ffffffff81d4bc5b>] early_ioremap+0x13/0x15
        [<ffffffff81d2b8f3>] __acpi_map_table+0x13/0x18
        [<ffffffff817b8d1a>] acpi_os_map_memory+0x26/0x14e
        [<ffffffff813ff018>] acpi_tb_acquire_table+0x42/0x70
        [<ffffffff813ff086>] acpi_tb_validate_table+0x27/0x37
        [<ffffffff813ff0e5>] acpi_tb_verify_table+0x22/0xd8
        [<ffffffff813ff6a8>] acpi_tb_install_non_fixed_table+0x60/0x1c9
        [<ffffffff81d61024>] acpi_tb_parse_root_table+0x218/0x26a
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d610cd>] acpi_initialize_tables+0x57/0x59
        [<ffffffff81d5f25d>] acpi_table_init+0x1b/0x99
        [<ffffffff81d2bca0>] acpi_boot_table_init+0x1e/0x85
        [<ffffffff81d23043>] setup_arch+0x99d/0xcc6
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d1bbbe>] start_kernel+0x8b/0x415
        [<ffffffff81d1b120>] ? early_idt_handlers+0x120/0x120
        [<ffffffff81d1b5ee>] x86_64_start_reservations+0x2a/0x2c
        [<ffffffff81d1b72e>] x86_64_start_kernel+0x13e/0x14d
       ---[ end trace 11ae599a1898f4e7 ]---
      when installing the following table during early stage:
       ACPI: SSDT 0x00000000B9638018 07A0C4 (v02 INTEL  S2600CP  00004000 INTL 20100331)
      The regression is caused by the size limitation of the x86 early IO mapping.
      
      The root cause is:
       1. ACPICA doesn't split IO memory mapping and table mapping;
       2. Linux x86 OSL implements acpi_os_map_memory() using a size limited fix-map
          mechanism during early boot stage, which is more suitable for only IO
          mappings.
      
      This patch fixes this issue by utilizing acpi_gbl_verify_table_checksum to
      disable the table mapping during early stage and enabling it again for the
      late stage. In this way, the normal code path is not affected. Then after
      the code related to the root cause is cleaned up, the early checksum
      verification can be easily re-enabled.
      
      A new boot parameter - acpi_force_table_verification is introduced for
      the platforms that require the checksum verification to stop loading bad
      tables.
      
      This fix also covers the checksum verification for the table overrides. Now
      large tables can also be overridden using the initrd override mechanism.
      Signed-off-by: NLv Zheng <lv.zheng@intel.com>
      Reported-and-tested-by: NYuanhan Liu <yuanhan.liu@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4fc0a7e8
  10. 28 5月, 2014 1 次提交
  11. 26 5月, 2014 1 次提交
    • R
      PM / sleep: Introduce command line argument for sleep state enumeration · 0399d4db
      Rafael J. Wysocki 提交于
      On some systems the platform doesn't support neither
      PM_SUSPEND_MEM nor PM_SUSPEND_STANDBY, so PM_SUSPEND_FREEZE is the
      only available system sleep state.  However, some user space frameworks
      only use the "mem" and (sometimes) "standby" sleep state labels, so
      the users of those systems need to modify user space in order to be
      able to use system suspend at all and that is not always possible.
      
      For this reason, add a new kernel command line argument,
      relative_sleep_states, allowing the users of those systems to change
      the way in which the kernel assigns labels to system sleep states.
      Namely, for relative_sleep_states=1, the "mem", "standby" and "freeze"
      labels will enumerate the available system sleem states from the
      deepest to the shallowest, respectively, so that "mem" is always
      present in /sys/power/state and the other state strings may or may
      not be presend depending on what is supported by the platform.
      
      Update system sleep states documentation to reflect this change.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      0399d4db
  12. 14 5月, 2014 1 次提交
  13. 12 5月, 2014 1 次提交
  14. 07 5月, 2014 1 次提交
    • H
      ACPI / video: change acpi-video brightness_switch_enabled default to 0 · 886129a8
      Hans de Goede 提交于
      acpi-video is unique in that it not only generates brightness up/down
      keypresses, but also (sometimes) actively changes the brightness itself.
      
      This presents an inconsistent kernel interface to userspace, basically there
      are 2 different scenarios, depending on the laptop model:
      
       1) On some laptops a brightness up/down keypress means: show a brightness osd
       with the current brightness, iow it is a brightness has changed notification.
      
       2) Where as on (a lot of) other laptops it means a brightness up/down key was
       pressed, deal with it.
      
      Most of the desktop environments interpret any press as in scenario 2, and
      change the brightness up / down as a response to the key events, causing it
      to be changed twice, once by acpi-video and once by the DE.
      
      With the new default for video.use_native_backlight we will be moving even
      more laptops over to behaving as in scenario 2. Making the remaining laptops
      even more of a weird exception. Also note that it is hard to detect scenario
      1 properly in userspace, and AFAIK none of the DE-s deals with it.
      
      Therefor this commit changes the default of brightness_switch_enabled to 0
      making its behavior consistent with all the other backlight drivers.
      Signed-off-by: NHans de Goede <hdegoede@redhat.com>
      Reviewed-by: NAaron Lu <aaron.lu@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      886129a8
  15. 25 4月, 2014 2 次提交
  16. 21 4月, 2014 1 次提交
  17. 17 4月, 2014 1 次提交
  18. 08 4月, 2014 1 次提交
  19. 07 4月, 2014 1 次提交
  20. 26 3月, 2014 3 次提交
  21. 21 3月, 2014 1 次提交
  22. 14 3月, 2014 1 次提交
  23. 26 2月, 2014 1 次提交
    • K
      x86, kaslr: randomize module base load address · e2b32e67
      Kees Cook 提交于
      Randomize the load address of modules in the kernel to make kASLR
      effective for modules.  Modules can only be loaded within a particular
      range of virtual address space.  This patch adds 10 bits of entropy to
      the load address by adding 1-1024 * PAGE_SIZE to the beginning range
      where modules are loaded.
      
      The single base offset was chosen because randomizing each module
      load ends up wasting/fragmenting memory too much. Prior approaches to
      minimizing fragmentation while doing randomization tend to result in
      worse entropy than just doing a single base address offset.
      
      Example kASLR boot without this change, with a single module loaded:
      ---[ Modules ]---
      0xffffffffc0000000-0xffffffffc0001000           4K     ro     GLB x  pte
      0xffffffffc0001000-0xffffffffc0002000           4K     ro     GLB NX pte
      0xffffffffc0002000-0xffffffffc0004000           8K     RW     GLB NX pte
      0xffffffffc0004000-0xffffffffc0200000        2032K                   pte
      0xffffffffc0200000-0xffffffffff000000        1006M                   pmd
      ---[ End Modules ]---
      
      Example kASLR boot after this change, same module loaded:
      ---[ Modules ]---
      0xffffffffc0000000-0xffffffffc0200000           2M                   pmd
      0xffffffffc0200000-0xffffffffc03bf000        1788K                   pte
      0xffffffffc03bf000-0xffffffffc03c0000           4K     ro     GLB x  pte
      0xffffffffc03c0000-0xffffffffc03c1000           4K     ro     GLB NX pte
      0xffffffffc03c1000-0xffffffffc03c3000           8K     RW     GLB NX pte
      0xffffffffc03c3000-0xffffffffc0400000         244K                   pte
      0xffffffffc0400000-0xffffffffff000000        1004M                   pmd
      ---[ End Modules ]---
      Signed-off-by: NAndy Honig <ahonig@google.com>
      Link: http://lkml.kernel.org/r/20140226005916.GA27083@www.outflux.netSigned-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      e2b32e67
  24. 13 2月, 2014 1 次提交
    • L
      ACPICA: Add boot option to disable auto return object repair · 4dde507f
      Lv Zheng 提交于
      Sometimes, there might be bugs caused by unexpected AML which is compliant
      to the Windows but not compliant to the Linux implementation.
      
      There is a predefined validation mechanism implemented in ACPICA to repair
      the unexpected AML evaluation results that are caused by the unexpected
      AMLs.  For example, BIOS may return misorder _CST result and the repair
      mechanism can make an ascending order on the returned _CST package object
      based on the C-state type.
      This mechanism is quite useful to implement an AML interpreter with better
      compliance with the real world where Windows is the de-facto standard and
      BIOS codes are only tested on one platform thus not compliant to the
      ACPI specification.
      
      But if a compliance issue hasn't been figured out yet, it will be
      difficult for developers to identify if the unexpected evaluation result
      is caused by this mechanism or by the AML interpreter.
      For example, _PR0 is expected to be a control method, but BIOS may use
      Package: "Name(_PR0, Package(1) {P1PR})".
      This boot option can disable the predefined validation mechanism so that
      developers can make sure the root cause comes from the parser/executer.
      
      This patch adds a new kernel parameter to disable this feature.
      
      A build test has been made on a Dell Inspiron mini 1100 (i386 z530)
      machine when this patch is applied and the corresponding boot test is
      performed w/ or w/o the new kernel parameter specified.
      
      References: https://bugzilla.kernel.org/show_bug.cgi?id=67901Tested-by: NFabian Wehning <fabian.wehning@googlemail.com>
      Signed-off-by: NLv Zheng <lv.zheng@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4dde507f
  25. 07 2月, 2014 1 次提交
  26. 24 1月, 2014 2 次提交
  27. 18 1月, 2014 1 次提交
  28. 16 1月, 2014 2 次提交
    • P
      ACPI / memhotplug: add parameter to disable memory hotplug · 00159a20
      Prarit Bhargava 提交于
      When booting a kexec/kdump kernel on a system that has specific memory
      hotplug regions the boot will fail with warnings like:
      
       swapper/0: page allocation failure: order:9, mode:0x84d0
       CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.10.0-65.el7.x86_64 #1
       Hardware name: QCI QSSC-S4R/QSSC-S4R, BIOS QSSC-S4R.QCI.01.00.S013.032920111005 03/29/2011
        0000000000000000 ffff8800341bd8c8 ffffffff815bcc67 ffff8800341bd950
        ffffffff8113b1a0 ffff880036339b00 0000000000000009 00000000000084d0
        ffff8800341bd950 ffffffff815b87ee 0000000000000000 0000000000000200
       Call Trace:
        [<ffffffff815bcc67>] dump_stack+0x19/0x1b
        [<ffffffff8113b1a0>] warn_alloc_failed+0xf0/0x160
        [<ffffffff815b87ee>] ?  __alloc_pages_direct_compact+0xac/0x196
        [<ffffffff8113f14f>] __alloc_pages_nodemask+0x7ff/0xa00
        [<ffffffff815b417c>] vmemmap_alloc_block+0x62/0xba
        [<ffffffff815b41e9>] vmemmap_alloc_block_buf+0x15/0x3b
        [<ffffffff815b1ff6>] vmemmap_populate+0xb4/0x21b
        [<ffffffff815b461d>] sparse_mem_map_populate+0x27/0x35
        [<ffffffff815b400f>] sparse_add_one_section+0x7a/0x185
        [<ffffffff815a1e9f>] __add_pages+0xaf/0x240
        [<ffffffff81047359>] arch_add_memory+0x59/0xd0
        [<ffffffff815a21d9>] add_memory+0xb9/0x1b0
        [<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
        [<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
        [<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
        [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
        [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
        [<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
        [<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
        [<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
        [<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6
        [<ffffffff81a1fd58>] ? acpi_sleep_proc_init+0x2a/0x2a
        [<ffffffff810020e2>] do_one_initcall+0xe2/0x190
        [<ffffffff819e20c4>] kernel_init_freeable+0x17c/0x207
        [<ffffffff819e18d0>] ? do_early_param+0x88/0x88
        [<ffffffff8159fea0>] ? rest_init+0x80/0x80
        [<ffffffff8159feae>] kernel_init+0xe/0x180
        [<ffffffff815cca2c>] ret_from_fork+0x7c/0xb0
        [<ffffffff8159fea0>] ? rest_init+0x80/0x80
       Mem-Info:
       Node 0 DMA per-cpu:
       CPU    0: hi:    0, btch:   1 usd:   0
       Node 0 DMA32 per-cpu:
       CPU    0: hi:   42, btch:   7 usd:   0
       active_anon:0 inactive_anon:0 isolated_anon:0
        active_file:0 inactive_file:0 isolated_file:0
        unevictable:0 dirty:0 writeback:0 unstable:0
        free:872 slab_reclaimable:13 slab_unreclaimable:1880
        mapped:0 shmem:0 pagetables:0 bounce:0
        free_cma:0
      
      because the system has run out of memory at boot time.  This occurs
      because of the following sequence in the boot:
      
      Main kernel boots and sets E820 map.  The second kernel is booted with a
      map generated by the kdump service using memmap= and memmap=exactmap.
      These parameters are added to the kernel parameters of the kexec/kdump
      kernel.   The kexec/kdump kernel has limited memory resources so as not
      to severely impact the main kernel.
      
      The system then panics and the kdump/kexec kernel boots (which is a
      completely new kernel boot).  During this boot ACPI is initialized and the
      kernel (as can be seen above) traverses the ACPI namespace and finds an
      entry for a memory device to be hotadded.
      
      ie)
      
        [<ffffffff815a1e9f>] __add_pages+0xaf/0x240
        [<ffffffff81047359>] arch_add_memory+0x59/0xd0
        [<ffffffff815a21d9>] add_memory+0xb9/0x1b0
        [<ffffffff81333b9c>] acpi_memory_device_add+0x18d/0x26d
        [<ffffffff81309a01>] acpi_bus_device_attach+0x7d/0xcd
        [<ffffffff8132379d>] acpi_ns_walk_namespace+0xc8/0x17f
        [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
        [<ffffffff81309984>] ? acpi_bus_type_and_status+0x90/0x90
        [<ffffffff81323c8c>] acpi_walk_namespace+0x95/0xc5
        [<ffffffff8130a6d6>] acpi_bus_scan+0x8b/0x9d
        [<ffffffff81a2019a>] acpi_scan_init+0x63/0x160
        [<ffffffff81a1ffb5>] acpi_init+0x25d/0x2a6
      
      At this point the kernel adds page table information and the the kexec/kdump
      kernel runs out of memory.
      
      This can also be reproduced by using the memmap=exactmap and mem=X
      parameters on the main kernel and booting.
      
      This patchset resolves the problem by adding a kernel parameter,
      acpi_no_memhotplug, to disable ACPI memory hotplug.
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      00159a20
    • H
      x86, apic, kexec: Add disable_cpu_apicid kernel parameter · 151e0c7d
      HATAYAMA Daisuke 提交于
      Add disable_cpu_apicid kernel parameter. To use this kernel parameter,
      specify an initial APIC ID of the corresponding CPU you want to
      disable.
      
      This is mostly used for the kdump 2nd kernel to disable BSP to wake up
      multiple CPUs without causing system reset or hang due to sending INIT
      from AP to BSP.
      
      Kdump users first figure out initial APIC ID of the BSP, CPU0 in the
      1st kernel, for example from /proc/cpuinfo and then set up this kernel
      parameter for the 2nd kernel using the obtained APIC ID.
      
      However, doing this procedure at each boot time manually is awkward,
      which should be automatically done by user-land service scripts, for
      example, kexec-tools on fedora/RHEL distributions.
      
      This design is more flexible than disabling BSP in kernel boot time
      automatically in that in kernel boot time we have no choice but
      referring to ACPI/MP table to obtain initial APIC ID for BSP, meaning
      that the method is not applicable to the systems without such BIOS
      tables.
      
      One assumption behind this design is that users get initial APIC ID of
      the BSP in still healthy state and so BSP is uniquely kept in
      CPU0. Thus, through the kernel parameter, only one initial APIC ID can
      be specified.
      
      In a comparison with disabled_cpu_apicid, we use read_apic_id(), not
      boot_cpu_physical_apicid, because on some platforms, the variable is
      modified to the apicid reported as BSP through MP table and this
      function is executed with the temporarily modified
      boot_cpu_physical_apicid. As a result, disabled_cpu_apicid kernel
      parameter doesn't work well for apicids of APs.
      
      Fixing the wrong handling of boot_cpu_physical_apicid requires some
      reviews and tests beyond some platforms and it could take some
      time. The fix here is a kind of workaround to focus on the main topic
      of this patch.
      Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Link: http://lkml.kernel.org/r/20140115064458.1545.38775.stgit@localhost6.localdomain6Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      151e0c7d
  29. 14 1月, 2014 1 次提交