1. 27 Oct, 2013 2 commits
    • ima: enable support for larger default filedata hash algorithms · e7a2ad7e
      Committed by Mimi Zohar
      The IMA measurement list contains two hashes - a template data hash
      and a filedata hash.  The template data hash is committed to the TPM,
      which is limited, by the TPM v1.2 specification, to 20 bytes.  The
      filedata hash is defined as 20 bytes as well.
      
      Now that support for variable length measurement list templates was
      added, the filedata hash is not limited to 20 bytes.  This patch adds
      Kconfig support for defining larger default filedata hash algorithms
      and replacing the builtin default with one specified on the kernel
      command line.
      
      <uapi/linux/hash_info.h> contains a list of hash algorithms.  The
      Kconfig default hash algorithm is a subset of this list, but any hash
      algorithm included in the list can be specified at boot, using the
      'ima_hash=' kernel command line option.
      
      Changelog v2:
      - update Kconfig
      
      Changelog:
      - support hashes that are configured
      - use generic HASH_ALGO_ definitions
      - add Kconfig support
      - hash_setup must be called only once (Dmitry)
      - removed trailing whitespaces (Roberto Sassu)
      Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
      Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
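The selection logic this commit describes (a Kconfig default that 'ima_hash=' may replace at boot) can be modelled roughly as below. This is a Python sketch, not kernel code: `HASH_ALGOS` is an assumed, abbreviated stand-in for the list in <uapi/linux/hash_info.h>, `KCONFIG_DEFAULT` and `select_ima_hash` are hypothetical names, and falling back to the default on an unknown name is an assumption.

```python
# Simplified model of choosing the default IMA filedata hash at boot.
# HASH_ALGOS is an assumed, abbreviated stand-in for the list in
# <uapi/linux/hash_info.h>; KCONFIG_DEFAULT stands in for the Kconfig choice.
HASH_ALGOS = ("md5", "sha1", "sha256", "sha512")
KCONFIG_DEFAULT = "sha1"


def select_ima_hash(cmdline, default=KCONFIG_DEFAULT):
    """Return the hash algorithm named by the first ima_hash= option.

    Only the first occurrence is honoured, a loose reading of the changelog
    note that hash_setup must be called only once.  Falling back to the
    default on an unknown name is an assumption of this sketch.
    """
    for token in cmdline.split():
        if token.startswith("ima_hash="):
            name = token[len("ima_hash="):]
            return name if name in HASH_ALGOS else default
    return default
```

For example, `select_ima_hash("ro quiet ima_hash=sha256")` picks "sha256", while a command line without the option keeps the configured default.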
    • ima: define kernel parameter 'ima_template=' to change configured default · 9b9d4ce5
      Committed by Roberto Sassu
      This patch allows users to specify from the kernel command line the
      template descriptor, among those defined, that will be used to generate
      and display measurement entries. If a user specifies an invalid template,
      IMA reverts to the template descriptor set in the kernel configuration.
      Signed-off-by: Roberto Sassu <roberto.sassu@polito.it>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
  2. 05 Sep, 2013 1 commit
  3. 24 Aug, 2013 1 commit
  4. 05 Aug, 2013 1 commit
    • vt: make the default color configurable · 3855ae1c
      Committed by Clemens Ladisch
      The virtual console has (undocumented) module parameters to set the
      colors for italic and underlined text, but the default text color was
      hardcoded for some reason.  This made it impossible to change the color
      for startup messages, or to set the default for new virtual consoles.
      Add a module parameter for that, and document the entire bunch.
      
      Any hacker who thinks that a command prompt on a "black screen with
      white font" is not suspicious enough can now use the kernel parameter
      vt.color=10 to get a nice, evil green.
      Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
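To see why vt.color=10 yields green, the value can be read as a VGA-style text attribute byte. The sketch below is Python, not kernel code, and assumes that layout (low nibble foreground, bits 4-6 background); `VGA_COLORS` and `describe_color` are hypothetical names introduced for illustration.

```python
# Decode a VGA-style text attribute byte: low nibble = foreground colour,
# bits 4-6 = background colour.  That vt.color follows this layout is an
# assumption of this sketch.
VGA_COLORS = (
    "black", "blue", "green", "cyan", "red", "magenta", "brown", "light grey",
    "dark grey", "light blue", "light green", "light cyan",
    "light red", "light magenta", "yellow", "white",
)


def describe_color(attr):
    """Return (foreground, background) colour names for an attribute byte."""
    foreground = VGA_COLORS[attr & 0x0F]
    background = VGA_COLORS[(attr >> 4) & 0x07]
    return foreground, background
```

Under that assumption, `describe_color(10)` gives light green on black, and `describe_color(0x07)` gives light grey on black.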
  5. 23 Jul, 2013 2 commits
    • ACPI: Add facility to remove all _OSI strings · 741d8128
      Committed by Lv Zheng
      This patch changes the "acpi_osi=" boot parameter implementation so
      that:
      1. "acpi_osi=!" can be used to disable all _OSI OS vendor strings by
         default.  It is meaningless to specify "acpi_osi=!" multiple
         times as it can only affect the default state of the target _OSI
         strings.
      2. "acpi_osi=!*" can be used to remove all _OSI OS vendor strings
         and all _OSI feature group strings.  It is useful to specify
         "acpi_osi=!*" multiple times through kernel command line to
         override the current state of the target _OSI strings.
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Reviewed-by: Zhang Rui <rui.zhang@intel.com>
      Acked-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
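The difference between "acpi_osi=!" (changes only the default state of vendor strings) and "acpi_osi=!*" (overrides the current state of everything) can be illustrated with a toy model. This Python sketch is an interpretation of the commit text, not the kernel implementation: the string tables are short assumed examples, and `OsiModel` and `resolve` are hypothetical names.

```python
# Toy model of resolving a sequence of acpi_osi= values.
# The string tables are short, assumed examples, not the real ACPICA lists.
FEATURE_STRINGS = {"Module Device", "Processor Device"}
VENDOR_STRINGS = {"Linux", "FreeBSD", "Windows", "Windows 2006"}


class OsiModel:
    def __init__(self):
        self.vendor_default = True    # cleared by "!" and "!*"
        self.feature_default = True   # cleared by "!*" only
        self.explicit = {}            # per-string overrides

    def apply(self, value):
        if value == "!":
            # Only the default state changes; explicit overrides survive.
            self.vendor_default = False
        elif value == "!*":
            # Everything is removed, including earlier explicit overrides.
            self.vendor_default = False
            self.feature_default = False
            self.explicit.clear()
        elif value.startswith("!"):
            self.explicit[value[1:]] = False
        elif value:
            self.explicit[value] = True

    def supported(self, name):
        if name in self.explicit:
            return self.explicit[name]
        if name in VENDOR_STRINGS:
            return self.vendor_default
        if name in FEATURE_STRINGS:
            return self.feature_default
        return False


def resolve(values):
    """Apply acpi_osi= values in command line order and return the model."""
    model = OsiModel()
    for value in values:
        model.apply(value)
    return model
```

In this model `resolve(["Windows", "!"])` still reports "Windows" as supported, because "!" touches only the default, whereas `resolve(["Windows", "!*"])` does not.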
    • ACPI: Add facility to disable all _OSI OS vendor strings · 5dc17986
      Committed by Lv Zheng
      This patch introduces the "acpi_osi=!" command line option to force
      Linux to reply "UNSUPPORTED" to all of the _OSI strings.  This patch is
      based on an ACPICA enhancement - the new API acpi_update_interfaces().
      
      The _OSI object provides the platform with the ability to query OSPM
      to determine the set of ACPI related interfaces, behaviors, or
      features that the operating system supports.  The argument passed to
      _OSI is a string such as one of the following:
      1. Feature Group String, examples include
         Module Device
         Processor Device
         3.0 _SCP Extensions
         Processor Aggregator Device
         ...
      2. OS Vendor String, examples include
         Linux
         FreeBSD
         Windows
         ...
      
      AML code in the ACPI namespace is written in the following style to
      determine OSPM interfaces / features:
          Method(OSCK)
          {
              if (CondRefOf(_OSI, Local0))
              {
                  if (\_OSI("Windows"))
                  {
                      Return (One)
                  }
                  if (\_OSI("Windows 2006"))
                  {
                      Return (Ones)
                  }
                  Return (Zero)
              }
              Return (Zero)
          }
      
      There is a debugging facility implemented in Linux.  Users can pass
      "acpi_osi=" boot parameters to the kernel to tune the _OSI evaluation
      result so that certain AML codes can be executed.  Current
      implementation includes:
      1. 'acpi_osi=' - this makes CondRefOf(_OSI, Local0) TRUE
      2. 'acpi_osi="Windows"' - this makes \_OSI("Windows") TRUE
      3. 'acpi_osi="!Windows"' - this makes \_OSI("Windows") FALSE
      The function that implements this feature is also used as a quirk
      mechanism in the Linux ACPI subsystem.
      
      When _OSI is evaluated by AML code, ACPICA replies "SUPPORTED"
      to all Windows operating system vendor strings.  This is because
      Windows operating systems return "SUPPORTED" if the argument to the
      _OSI method specifies an earlier version of Windows.  Please refer to
      the following MSDN document:
      
      How to Identify the Windows Version in ACPI by Using _OSI
      http://msdn.microsoft.com/en-us/library/hardware/gg463275.aspx
      
      This makes it difficult for developers to feed a specific Windows
      operating system vendor string to the BIOS code for debugging
      purposes: multiple acpi_osi="!xxx" parameters have to be specified on
      the command line to force Linux to reply "UNSUPPORTED" to the Windows
      OS vendor strings listed in the AML code.
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Reviewed-by: Zhang Rui <rui.zhang@intel.com>
      Acked-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  6. 10 Jul, 2013 1 commit
  7. 21 Jun, 2013 1 commit
  8. 20 Jun, 2013 2 commits
    • integrity: move integrity_audit_msg() · d726d8d7
      Committed by Mimi Zohar
      This patch moves the integrity_audit_msg() function and definition to
      security/integrity/, the parent directory, renames the 'ima_audit'
      boot command line option to 'integrity_audit', and fixes the Kconfig
      help text to reflect the actual code.
      
      Changelog:
      - Fixed ifdef inclusion of integrity_audit_msg() (Fengguang Wu)
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
    • tracing: Disable tracing on warning · de7edd31
      Committed by Steven Rostedt (Red Hat)
      Add a traceoff_on_warning option, available both on the kernel command
      line and as a sysctl. When set, any WARN*() function that is hit will cause
      the tracing_on variable to be cleared, which disables writing to the
      ring buffer.
      
      This is useful especially when tracing a bug with function tracing. When
      a warning is hit, the print caused by the warning can flood the trace with
      the functions that produce the output for the warning. This can make the
      resulting trace useless by either hiding where the bug happened, or worse,
      by overflowing the buffer and losing the trace of the bug totally.
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
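The effect described in this commit can be illustrated with a small model of a fixed-size ring buffer. This is a Python sketch, not the ftrace implementation: `TraceModel`, its methods, and the `noise` parameter (the number of functions traced while the warning is printed) are hypothetical.

```python
from collections import deque


class TraceModel:
    """Toy ring buffer that can stop recording when a warning fires."""

    def __init__(self, capacity, traceoff_on_warning=False):
        self.buffer = deque(maxlen=capacity)  # oldest entries are overwritten
        self.tracing_on = True
        self.traceoff_on_warning = traceoff_on_warning

    def trace(self, event):
        # Writes are dropped once tracing_on has been cleared.
        if self.tracing_on:
            self.buffer.append(event)

    def warn(self, noise=100):
        if self.traceoff_on_warning:
            self.tracing_on = False
        # Printing the warning itself runs many functions that a function
        # tracer would otherwise record.
        for i in range(noise):
            self.trace(f"print_fn_{i}")
```

With the option set, an event traced just before `warn()` is still in the buffer afterwards; without it, the warning's own output pushes that event out.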
  9. 15 Jun, 2013 1 commit
  10. 22 May, 2013 1 commit
    • libata: Add atapi_dmadir force flag · 966fbe19
      Committed by Vincent Pelletier
      Some devices require DMADIR to be enabled, but are not detected as such
      by atapi_id_dmadir.  One such example is "Asus Serillel 2"
      SATA-host-to-PATA-device bridge: the bridge itself requires DMADIR,
      even if the bridged device does not.
      
      As atapi_dmadir module parameter can cause problems with some devices
      (as per Tejun Heo's memory), enabling it globally may not be possible
      depending on the hardware.
      
      This patch adds atapi_dmadir in the form of a "force" horkage value,
      allowing global, per-bus and per-device control.
      Signed-off-by: Vincent Pelletier <plr.vincent@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
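The "global, per-bus and per-device control" comes from the ID prefix of a libata.force value. The Python sketch below parses the `[ID:]VAL[,[ID:]VAL...]` form with ID being `PORT[.DEVICE]`; `parse_libata_force` is a hypothetical helper, and letting an entry without an ID inherit the previous entry's ID is an assumption of this sketch.

```python
def parse_libata_force(value):
    """Parse '[ID:]VAL[,[ID:]VAL...]' where ID is PORT[.DEVICE].

    Returns a list of (port, device, val) tuples; None means "not restricted".
    An entry without an ID reuses the most recent ID (assumed behaviour).
    """
    entries = []
    port = device = None
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        # Split on the last ':' so that the ID, if present, is separated
        # from the value.
        ident, sep, val = item.rpartition(":")
        if sep:
            port_str, _, dev_str = ident.partition(".")
            port = int(port_str)
            device = int(dev_str) if dev_str else None
        entries.append((port, device, val))
    return entries
```

For instance, "atapi_dmadir" alone applies everywhere, "1:atapi_dmadir" targets port 1, and "1.00:atapi_dmadir" targets device 0 on port 1.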
  11. 17 May, 2013 1 commit
  12. 15 May, 2013 3 commits
    • xen/tmem: Don't use self[ballooning|shrinking] if frontswap is off. · 37d46e15
      Committed by Konrad Rzeszutek Wilk
      There is no point. We would just squeeze the guest to put more and
      more pages in the swap disk without any purpose.
      
      The only time it makes sense to use the selfballooning and shrinking
      is when frontswap is being utilized.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/tmem: Remove the boot options and fold them in the tmem.X parameters. · 2ca62b04
      Committed by Konrad Rzeszutek Wilk
      Whether tmem is built-in or a module, the user can influence it on
      the command line with tmem.<some option> instead of a variety of
      separate options such as "nocleancache" and "nofrontswap".  The
      others, "noselfballooning", "selfballooning" and "noselfshrink",
      live in a different driver, xen-selfballoon.c, and the patches:
      
       xen/tmem: Remove the usage of 'noselfshrink' and use 'tmem.selfshrink' bool instead.
       xen/tmem: Remove the usage of 'noselfballoon','selfballoon' and use 'tmem.selfballon' bool instead.
      
      remove them.
      
      Also add documentation.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • workqueues: Introduce new flag WQ_POWER_EFFICIENT for power oriented workqueues · cee22a15
      Committed by Viresh Kumar
      Workqueues can be performance or power-oriented. Currently, most workqueues are
      bound to the CPU they were created on. This gives good performance (due to cache
      effects) at the cost of potentially waking up otherwise idle cores (idle from
      the scheduler's perspective, which may or may not be physically idle) just to
      process some work. To save power, we can allow the work to be rescheduled on a
      core that is already awake.
      
      Workqueues created with the WQ_UNBOUND flag will allow some power savings.
      However, we don't change the default behaviour of the system.  To enable
      power-saving behaviour, a new config option CONFIG_WQ_POWER_EFFICIENT needs to
      be turned on. This option can also be overridden by the
      workqueue.power_efficient boot parameter.
      
      tj: Updated config description and comments.  Renamed
          CONFIG_WQ_POWER_EFFICIENT to CONFIG_WQ_POWER_EFFICIENT_DEFAULT.
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
      Signed-off-by: Tejun Heo <tj@kernel.org>
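How the config default, the boot parameter and the new flag interact can be sketched as follows. This is an illustrative Python model, not the workqueue code: the flag values are placeholders, and `power_efficient_enabled` and `effective_flags` are hypothetical names.

```python
# Placeholder bit values for illustration; only their distinctness matters.
WQ_UNBOUND = 1 << 1
WQ_POWER_EFFICIENT = 1 << 7


def power_efficient_enabled(config_default, boot_param=None):
    """The boot parameter, when given, overrides the config default."""
    return config_default if boot_param is None else boot_param


def effective_flags(flags, power_efficient):
    """Treat a WQ_POWER_EFFICIENT workqueue as unbound when enabled."""
    if (flags & WQ_POWER_EFFICIENT) and power_efficient:
        flags |= WQ_UNBOUND
    return flags
```

With power-saving behaviour disabled, a WQ_POWER_EFFICIENT workqueue keeps its per-CPU binding, so the default behaviour of the system is unchanged.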
  13. 28 Apr, 2013 1 commit
  14. 20 Apr, 2013 1 commit
  15. 19 Apr, 2013 2 commits
    • nohz: Ensure full dynticks CPUs are RCU nocbs · d1e43fa5
      Committed by Frederic Weisbecker
      We need full dynticks CPUs to also be RCU nocb CPUs so
      that we don't have to keep the tick to handle RCU
      callbacks.

      Make sure the range passed to the nohz_full= boot
      parameter is a subset of rcu_nocbs=.
      
      The CPUs that fail to meet this requirement will be
      excluded from the nohz_full range. This is checked
      early in boot time, before any CPU has the opportunity
      to stop its tick.
      Suggested-by: Steven Rostedt <rostedt@goodmis.org>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
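The subset check amounts to intersecting two CPU masks. A minimal Python sketch, assuming both parameters are given as CPU lists such as "1-3,5"; `parse_cpulist` and `restrict_nohz_full` are hypothetical helpers, not kernel functions.

```python
def parse_cpulist(text):
    """Parse a CPU list such as '1-3,5' into a set of CPU numbers."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def restrict_nohz_full(nohz_full, rcu_nocbs):
    """Return (kept, excluded): the CPUs kept in the nohz_full range and
    those dropped because they are not RCU nocb CPUs."""
    requested = parse_cpulist(nohz_full)
    kept = requested & parse_cpulist(rcu_nocbs)
    return kept, requested - kept
```

With nohz_full=1-3,5 and rcu_nocbs=1-2,5-7, CPU 3 is excluded from the full dynticks range.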
    • nohz: Force boot CPU outside full dynticks range · 0453b435
      Committed by Frederic Weisbecker
      The timekeeping job must be able to run early on boot
      because there may be some pre-SMP (and thus pre-initcalls)
      components that rely on it. The IO-APIC is one such user,
      as it tests the timer health by watching jiffies progression.
      
      Given that it happens before we know the initial online
      set, we can't rely on it to select a timekeeper. We need
      one before SMP time otherwise we simply crash on boot.
      
      To fix this and keep things simple for now, force the boot CPU
      outside of the full dynticks range in any case and do this early
      on kernel parameter parsing time.
      
      We might want a trickier solution later, especially for aSMP
      architectures that need to assign housekeeping tasks to arbitrary
      low power CPUs.
      
      But it's still first pass KISS time for now.
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
  16. 18 Apr, 2013 3 commits
  17. 17 Apr, 2013 2 commits
  18. 16 Apr, 2013 1 commit
    • nohz: Switch from "extended nohz" to "full nohz" based naming · c5bfece2
      Committed by Frederic Weisbecker
      "Extended nohz" was used as a naming base for the full dynticks
      API and Kconfig symbols. It reflects the fact the system tries
      to stop the tick in more places than just idle.
      
      But that "extended" name is a bit opaque and vague. Renaming it to
      "full" makes it clearer what the system tries to do under this
      config: try to shut down the tick anytime it can. The various
      constraints that prevent that from happening shouldn't be considered
      as fundamental properties of this feature but rather technical
      issues that may be solved in the future.
      Reported-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
  19. 11 Apr, 2013 1 commit
  20. 02 Apr, 2013 1 commit
    • workqueue: update sysfs interface to reflect NUMA awareness and a kernel param to disable NUMA affinity · d55262c4
      Committed by Tejun Heo
      
      Unbound workqueues are now NUMA aware.  Let's add some control knobs
      and update sysfs interface accordingly.
      
      * Add kernel param workqueue.numa_disable which disables NUMA affinity
        globally.
      
      * Replace sysfs file "pool_id" with "pool_ids" which contains
        node:pool_id pairs.  This change is userland-visible but "pool_id"
        hasn't seen a release yet, so this is okay.
      
      * Add a new sysfs file "numa" which can toggle NUMA affinity on
        individual workqueues.  This is implemented as attrs->no_numa which
        is special in that it isn't part of a pool's attributes.  It only
        affects how apply_workqueue_attrs() picks which pools to use.
      
      After the "pool_ids" change, first_pwq() doesn't have any users left.
      Removed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
  21. 27 Mar, 2013 2 commits
  22. 26 Mar, 2013 2 commits
  23. 21 Mar, 2013 1 commit
    • nohz: Basic full dynticks interface · a831881b
      Committed by Frederic Weisbecker
      For extreme use cases such as Real Time or HPC, having
      the ability to shut down the tick when a single task runs
      on a CPU is a desired feature:

      * Reducing the number of interrupts improves throughput
      for CPU-bound tasks. The CPU is less distracted from its
      real job, from both the execution time and the cache points
      of view.

      * This also improves latency response as we have fewer critical
      sections.
      
      Start by introducing a very simple interface to define
      full dynticks CPUs: use a cpumask defined by a boot time option
      through the "nohz_extended=" kernel parameter. CPUs that
      are part of this range will have their tick shut down
      whenever possible: provided they run a single task and
      they don't do kernel activity that requires the periodic
      tick. These details will be documented later in
      Documentation/*
      
      An online CPU must be kept outside this range to handle the
      timekeeping.
      Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Gilad Ben Yossef <gilad@benyossef.com>
      Cc: Hakan Akkan <hakanakkan@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Kevin Hilman <khilman@linaro.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
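The conditions stated in this commit can be condensed into two predicates. This Python sketch only restates the commit text; `can_stop_tick` and `mask_is_valid` are hypothetical names and the real decision involves more state than is modelled here.

```python
def can_stop_tick(cpu, nohz_mask, runnable_tasks, needs_periodic_tick):
    """A CPU in the range may stop its tick only while it runs a single task
    and no kernel activity on it requires the periodic tick."""
    return cpu in nohz_mask and runnable_tasks == 1 and not needs_periodic_tick


def mask_is_valid(nohz_mask, online_cpus):
    """At least one online CPU must stay outside the range to handle
    timekeeping."""
    return bool(set(online_cpus) - set(nohz_mask))
```

For example, with a range of CPUs 1-3 on a four-CPU system, CPU 0 remains available for timekeeping, so the range is acceptable.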
  24. 15 Mar, 2013 1 commit
    • tracing: Add alloc_snapshot kernel command line parameter · 55034cd6
      Committed by Steven Rostedt (Red Hat)
      If debugging the kernel, and the developer wants to use
      tracing_snapshot() in places where tracing_snapshot_alloc() may
      be difficult (or more likely, the developer is lazy and doesn't
      want to bother with tracing_snapshot_alloc() at all), then adding
      
        alloc_snapshot
      
      to the kernel command line will tell ftrace to allocate
      the snapshot buffer (if configured) when it allocates the main
      tracing buffer.
      
      I also noticed that ring_buffer_expanded and tracing_selftest_disabled
      had inconsistent use of boolean "true" and "false" with "0" and "1".
      I cleaned that up too.
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  25. 03 Mar, 2013 2 commits
    • metag: Basic documentation · fdabf525
      Committed by James Hogan
      Add basic metag documentation. This includes an outline description of
      the ABIs (including syscall ABI) and calling conventions, similar to the
      one in Documentation/frv/.
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: Rob Landley <rob@landley.net>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: linux-doc@vger.kernel.org
    • x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Committed by Yinghai Lu
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things:

      1. numa_init is called several times, NOT just for srat, so the
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         calls can not simply be removed.  The sequence to consider is: numaq,
         srat, amd, dummy, and the fallback path has to keep working.
      
      2. acpi_numa_init was simply split to create early_parse_srat.
         a. early_parse_srat is NOT called for ia64, so ia64 is broken.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
            is still left in numa_init, so it just clears the result from
            early_parse_srat.  It should be moved before that.
         c. it breaks ACPI_TABLE_OVERIDE, as the acpi table scan is moved
            early, before the override from INITRD is settled.
      
      3. the patch TITLE is totally misleading: there is NO x86 in the title,
         but it changes critical x86 code. As a result the x86 maintainers
         did not pay attention and find the problem early. Those patches
         really should be routed via tip/x86/mm.
      
      4. after that commit, the following ranges can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If a node is hotpluggable, the memory-related ranges like the page table
      and vmemmap could be on that node without problem and should be on that
      node.

      We have workaround patches that could fix some of the problems, but some
      can not be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that make sure the kernel puts the page table
      and vmemmap on local node ram instead of pushing them down to node0.  We
      also need to find a way to put other kernel-used ram on local node ram.
      Reported-by: Tim Gardner <tim.gardner@canonical.com>
      Reported-by: Don Morris <don.morris@hp.com>
      Bisected-by: Don Morris <don.morris@hp.com>
      Tested-by: Don Morris <don.morris@hp.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  26. 26 Feb, 2013 2 commits
  27. 24 Feb, 2013 1 commit
    • acpi, memory-hotplug: support getting hotplug info from SRAT · 01a178a9
      Committed by Tang Chen
      We now provide an option for users who don't want to specify physical
      memory address in kernel commandline.
      
               /*
                * For movablemem_map=acpi:
                *
                * SRAT:                |_____| |_____| |_________| |_________| ......
                * node id:                0       1         1           2
                * hotpluggable:           n       y         y           n
                * movablemem_map:              |_____| |_________|
                *
                * Using movablemem_map, we can prevent memblock from allocating memory
                * on ZONE_MOVABLE at boot time.
                */
      
      So the user just specifies movablemem_map=acpi, and the kernel will use
      the hotpluggable info in SRAT to determine which memory ranges should be
      set as ZONE_MOVABLE.

      If all the memory ranges in SRAT are hotpluggable, then no memory can be
      used by the kernel.  But before parsing SRAT, memblock has already reserved
      some memory ranges for other purposes, such as for the kernel image.  We
      cannot prevent the kernel from using this memory, so we need to exclude
      these ranges even if the memory is hotpluggable.

      Furthermore, there could be several memory ranges in the single node
      which the kernel resides in.  We may skip one range that has memory
      reserved by memblock, but if the rest of the memory is too small, then the
      kernel will fail to boot.  So, make the whole node which the kernel
      resides in un-hotpluggable.  Then the kernel has enough memory to use.

      NOTE: Using this option will reduce NUMA performance because the
            whole node will be set as ZONE_MOVABLE, and the kernel cannot use
            memory on it.  Users who don't want to lose NUMA performance
            should not use it.
      
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: use strcmp()]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
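The rule described for movablemem_map=acpi (hotpluggable SRAT ranges become movable, except on the node the kernel resides in) can be sketched as below. This is a Python illustration of the commit text only: `SratRange`, `movable_ranges` and the addresses are hypothetical, and the real code works on memblock regions rather than tuples.

```python
from typing import NamedTuple


class SratRange(NamedTuple):
    start: int          # inclusive start address
    end: int            # exclusive end address
    node: int           # NUMA node id
    hotpluggable: bool  # hotpluggable flag from SRAT


def movable_ranges(srat, kernel_node):
    """Return the (start, end) ranges to treat as ZONE_MOVABLE.

    The whole node the kernel resides in is treated as un-hotpluggable,
    even if SRAT marks its ranges hotpluggable.
    """
    return [
        (r.start, r.end)
        for r in srat
        if r.hotpluggable and r.node != kernel_node
    ]


# Layout mirroring the diagram in the commit message (addresses invented).
EXAMPLE_SRAT = [
    SratRange(0x0000, 0x1000, 0, False),
    SratRange(0x1000, 0x2000, 1, True),
    SratRange(0x2000, 0x4000, 1, True),
    SratRange(0x4000, 0x6000, 2, False),
]
```

With the kernel on node 0, `movable_ranges(EXAMPLE_SRAT, 0)` yields the two node 1 ranges, matching the movablemem_map row of the diagram.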