1. 12 3月, 2013 6 次提交
  2. 04 3月, 2013 5 次提交
  3. 03 3月, 2013 1 次提交
    • Y
      x86, ACPI, mm: Revert movablemem_map support · 20e6926d
      Yinghai Lu 提交于
      Tim found:
      
        WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80()
        Hardware name: S2600CP
        sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
        smpboot: Booting Node   1, Processors  #1
        Modules linked in:
        Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1
        Call Trace:
          set_cpu_sibling_map+0x279/0x449
          start_secondary+0x11d/0x1e5
      
      Don Morris reproduced on a HP z620 workstation, and bisected it to
      commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock
      is ready")
      
      It turns out movable_map has some problems, and it breaks several things
      
      1. numa_init is called several times, NOT just for srat. so those
      	nodes_clear(numa_nodes_parsed)
      	memset(&numa_meminfo, 0, sizeof(numa_meminfo))
         can not be just removed.  Need to consider sequence is: numaq, srat, amd, dummy.
         and make fall back path working.
      
      2. simply split acpi_numa_init to early_parse_srat.
         a. that early_parse_srat is NOT called for ia64, so you break ia64.
         b.  for (i = 0; i < MAX_LOCAL_APIC; i++)
      	     set_apicid_to_node(i, NUMA_NO_NODE)
           still left in numa_init. So it will just clear result from early_parse_srat.
           it should be moved before that....
         c.  it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved
             early before override from INITRD is settled.
      
      3. that patch TITLE is total misleading, there is NO x86 in the title,
         but it changes critical x86 code. It caused x86 guys did not
         pay attention to find the problem early. Those patches really should
         be routed via tip/x86/mm.
      
      4. after that commit, following range can not use movable ram:
        a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed?
        b. initrd... it will be freed after booting, so it could be on movable...
        c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G
      	anymore.
        d. init_mem_mapping: can not put page table high anymore.
        e. initmem_init: vmemmap can not be high local node anymore. That is
           not good.
      
      If node is hotplugable, the mem related range like page table and
      vmemmap could be on the that node without problem and should be on that
      node.
      
      We have workaround patch that could fix some problems, but some can not
      be fixed.
      
      So just remove that offending commit and related ones including:
      
       f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to
          protect movablecore_map in memblock_overlaps_region().")
      
       01a178a9 ("acpi, memory-hotplug: support getting hotplug info from
          SRAT")
      
       27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to
          the end of node")
      
       e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is
          ready")
      
       fb06bc8e ("page_alloc: bootmem limit with movablecore_map")
      
       42f47e27 ("page_alloc: make movablemem_map have higher priority")
      
       6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep
          movable limit for nodes")
      
       34b71f1e ("page_alloc: add movable_memmap kernel parameter")
      
       4d59a751 ("x86: get pg_data_t's memory from other node")
      
      Later we should have patches that will make sure kernel put page table
      and vmemmap on local node ram instead of push them down to node0.  Also
      need to find way to put other kernel used ram to local node ram.
      Reported-by: NTim Gardner <tim.gardner@canonical.com>
      Reported-by: NDon Morris <don.morris@hp.com>
      Bisected-by: NDon Morris <don.morris@hp.com>
      Tested-by: NDon Morris <don.morris@hp.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20e6926d
  4. 26 2月, 2013 2 次提交
  5. 24 2月, 2013 4 次提交
    • T
      acpi, memory-hotplug: parse SRAT before memblock is ready · e8d19552
      Tang Chen 提交于
      On linux, the pages used by kernel could not be migrated.  As a result,
      if a memory range is used by kernel, it cannot be hot-removed.  So if we
      want to hot-remove memory, we should prevent kernel from using it.
      
      The way now used to prevent this is specify a memory range by
      movablemem_map boot option and set it as ZONE_MOVABLE.
      
      But when the system is booting, memblock will allocate memory, and
      reserve the memory for kernel.  And before we parse SRAT, and know the
      node memory ranges, memblock is working.  And it may allocate memory in
      ranges to be set as ZONE_MOVABLE.  This memory can be used by kernel,
      and never be freed.
      
      So, let's parse SRAT before memblock is called first.  And it is early
      enough.
      
      The first call of memblock_find_in_range_node() is in:
      
        setup_arch()
          |-->setup_real_mode()
      
      so, this patch add a function early_parse_srat() to parse SRAT, and call
      it before setup_real_mode() is called.
      
      NOTE:
      
      1) early_parse_srat() is called before numa_init(), and has initialized
         numa_meminfo.  So DO NOT clear numa_nodes_parsed in numa_init() and DO
         NOT zero numa_meminfo in numa_init(), otherwise we will lose memory
         numa info.
      
      2) I don't know why using count of memory affinities parsed from SRAT
         as a return value in original acpi_numa_init().  So I add a static
         variable srat_mem_cnt to remember this count and use it as the return
         value of the new acpi_numa_init()
      
      [mhocko@suse.cz: parse SRAT before memblock is ready fix]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NWen Congyang <wency@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8d19552
    • W
      cpu-hotplug, memory-hotplug: try offlining the node when hotremoving a cpu · 76bba142
      Wen Congyang 提交于
      The node will be offlined when all memory/cpu on the node is hotremoved.
      So we should try offline the node when hotremoving a cpu on the node.
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Len Brown <lenb@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      76bba142
    • T
      memory-hotplug: remove sysfs file of node · 60a5a19e
      Tang Chen 提交于
      Introduce a new function try_offline_node() to remove sysfs file of node
      when all memory sections of this node are removed.  If some memory
      sections of this node are not removed, this function does nothing.
      Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      60a5a19e
    • R
      ACPI / PM: Take unusual configurations of power resources into account · b5d667eb
      Rafael J. Wysocki 提交于
      Commit d2e5f0c1 (ACPI / PCI: Rework the setup and cleanup of device
      wakeup) moved the initial disabling of system wakeup for PCI devices
      into a place where it can actually work and that exposed a hidden old
      issue with crap^Wunusual system designs where the same power
      resources are used for both wakeup power and device power control at
      run time.
      
      Namely, say there is one power resource such that the ACPI power
      state D0 of a PCI device depends on that power resource (i.e. the
      device is in D0 when that power resource is "on") and it is used
      as a wakeup power resource for the same device.  Then, calling
      acpi_pci_sleep_wake(pci_dev, false) for the device in question will
      cause the reference counter of that power resource to drop to 0,
      which in turn will cause it to be turned off.  As a result, the
      device will go into D3cold at that point, although it should have
      stayed in D0.
      
      As it turns out, that happens to USB controllers on some laptops
      and USB becomes unusable on those machines as a result, which is
      a major regression from v3.8.
      
      To fix this problem, (1) increment the reference counters of wakup
      power resources during their initialization if they are "on"
      initially, (2) prevent acpi_disable_wakeup_device_power() from
      decrementing the reference counters of wakeup power resources that
      were not enabled for wakeup power previously, and (3) prevent
      acpi_enable_wakeup_device_power() from incrementing the reference
      counters of wakeup power resources that already are enabled for
      wakeup power.
      
      In addition to that, if it is impossible to determine the initial
      states of wakeup power resources, avoid enabling wakeup for devices
      whose wakeup power depends on those power resources.
      Reported-by: NDave Jones <davej@redhat.com>
      Reported-by: NFabio Baltieri <fabio.baltieri@linaro.org>
      Tested-by: NFabio Baltieri <fabio.baltieri@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      b5d667eb
  6. 23 2月, 2013 1 次提交
  7. 22 2月, 2013 1 次提交
  8. 17 2月, 2013 1 次提交
  9. 16 2月, 2013 1 次提交
  10. 13 2月, 2013 10 次提交
    • R
      ACPI / hotplug: Fix concurrency issues and memory leaks · 3757b948
      Rafael J. Wysocki 提交于
      This changeset is aimed at fixing a few different but related
      problems in the ACPI hotplug infrastructure.
      
      First of all, since notify handlers may be run in parallel with
      acpi_bus_scan(), acpi_bus_trim() and acpi_bus_hot_remove_device()
      and some of them are installed for ACPI handles that have no struct
      acpi_device objects attached (i.e. before those objects are created),
      those notify handlers have to take acpi_scan_lock to prevent races
      from taking place (e.g. a struct acpi_device is found to be present
      for the given ACPI handle, but right after that it is removed by
      acpi_bus_trim() running in parallel to the given notify handler).
      Moreover, since some of them call acpi_bus_scan() and
      acpi_bus_trim(), this leads to the conclusion that acpi_scan_lock
      should be acquired by the callers of these two funtions rather by
      these functions themselves.
      
      For these reasons, make all notify handlers that can handle device
      addition and eject events take acpi_scan_lock and remove the
      acpi_scan_lock locking from acpi_bus_scan() and acpi_bus_trim().
      Accordingly, update all of their users to make sure that they
      are always called under acpi_scan_lock.
      
      Furthermore, since eject operations are carried out asynchronously
      with respect to the notify events that trigger them, with the help
      of acpi_bus_hot_remove_device(), even if notify handlers take the
      ACPI scan lock, it still is possible that, for example,
      acpi_bus_trim() will run between acpi_bus_hot_remove_device() and
      the notify handler that scheduled its execution and that
      acpi_bus_trim() will remove the device node passed to
      acpi_bus_hot_remove_device() for ejection.  In that case, the struct
      acpi_device object obtained by acpi_bus_hot_remove_device() will be
      invalid and not-so-funny things will ensue.  To protect agaist that,
      make the users of acpi_bus_hot_remove_device() run get_device() on
      ACPI device node objects that are about to be passed to it and make
      acpi_bus_hot_remove_device() run put_device() on them and check if
      their ACPI handles are not NULL (make acpi_device_unregister() clear
      the device nodes' ACPI handles for that check to work).
      
      Finally, observe that acpi_os_hotplug_execute() actually can fail,
      in which case its caller ought to free memory allocated for the
      context object to prevent leaks from happening.  It also needs to
      run put_device() on the device node that it ran get_device() on
      previously in that case.  Modify the code accordingly.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      3757b948
    • R
      ACPI / scan: Full transition to D3cold in acpi_device_unregister() · 0aa120a0
      Rafael J. Wysocki 提交于
      In order to drop reference counts of all power resources used by an
      ACPI device node being removed, acpi_device_unregister() calls
      acpi_power_transition(device, ACPI_STATE_D3_COLD), which effectively
      transitions the device node into D3cold if it uses any power
      resources.  However, for some device nodes it may not be appropriate
      to remove power from them entirely before putting them into D3hot
      before.  On the other hand, executing _PS3 for devices that don't
      use power resources before removing them shouldn't really hurt.
      In fact, that is done by acpi_bus_hot_remove_device(), but this is
      not the right place to do it, because the bus trimming may have
      caused power to be removed from the device node in question already
      before.
      
      For these reasons, make acpi_device_unregister() carry out full
      power-off transition for all device nodes supporting that and remove
      the direct evaluation of _PS3 from acpi_bus_hot_remove_device().
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      0aa120a0
    • R
      ACPI / scan: Make acpi_bus_hot_remove_device() acquire the scan lock · f058cdf4
      Rafael J. Wysocki 提交于
      The ACPI scan lock has been introduced to prevent acpi_bus_scan()
      and acpi_bus_trim() from running in parallel with each other for
      overlapping ACPI namespace scopes.  However, it is not sufficient
      to do that, because if acpi_bus_scan() is run (for an overlapping
      namespace scope) right after the acpi_bus_trim() in
      acpi_bus_hot_remove_device(), the subsequent eject will remove
      devices without removing the corresponding struct acpi_device
      objects (and possibly companion "physical" device objects).
      Therefore acpi_bus_hot_remove_device() has to acquire the scan
      lock before carrying out the bus trimming and hold it through
      the evaluation of _EJ0, so make that happen.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      f058cdf4
    • R
      ACPI: Drop the container.h header file · 87d4a4da
      Rafael J. Wysocki 提交于
      The include/acpi/container.h only contains a definition of a
      structure that is not used any more, so drop it entirely.
      
      Similar change was proposed earlier by Toshi Kani.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      87d4a4da
    • R
      ACPI / scan: Make container driver use struct acpi_scan_handler · 737f1a9f
      Rafael J. Wysocki 提交于
      Make the ACPI container driver use struct acpi_scan_handler for
      representing the object used to initialize ACPI containers and remove
      the ACPI driver structure used previously and the data structures
      created by it, since in fact they were not used for any purpose.
      
      This simplifies the code and reduces the kernel's memory footprint by
      avoiding the registration of a struct device_driver object with the
      driver core and creation of its sysfs directory which is unnecessary.
      
      In addition to that, make the namespace walk callback used for
      installing the notify handlers for ACPI containers more
      straightforward.
      
      This change includes fixes from Toshi Kani.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Tested-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reviewed-by: NToshi Kani <toshi.kani@hp.com>
      Tested-by: NToshi Kani <toshi.kani@hp.com>
      737f1a9f
    • R
      ACPI / scan: Remove useless #ifndef from acpi_eject_store() · 38475b3b
      Rafael J. Wysocki 提交于
      Since the FORCE_EJECT symbol is never defined, the
      #ifndef FORCE_EJECT in acpi_eject_store() is always true, so drop it.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Tested-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reviewed-by: NToshi Kani <toshi.kani@hp.com>
      Tested-by: NToshi Kani <toshi.kani@hp.com>
      38475b3b
    • T
      ACPI: Unbind ACPI drv when probe failed · 5f27ee8e
      Toshi Kani 提交于
      When acpi_device_install_notify_handler() failed in acpi_device_probe(),
      it calls acpi_drv->ops.remove() and fails the probe.  However, the ACPI
      driver is left bound to the acpi_device.  Fix it by clearing the driver
      and driver_data fields.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      5f27ee8e
    • T
      ACPI: sysfs eject support for ACPI scan handlers · ce7685ad
      Toshi Kani 提交于
      Changed sysfs eject, acpi_eject_store(), so that it doesn't return
      error codes for devices nodes with ACPI scan handlers attached and
      no ACPI drivers.
      
      [rjw: Changelog]
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      ce7685ad
    • R
      ACPI / scan: Follow priorities of IDs when matching scan handlers · 87b85b3c
      Rafael J. Wysocki 提交于
      The IDs of ACPI device nodes stored in their pnp.ids member arrays
      are sorted by decreasing priority (i.e. the highest-priority ID is
      the first entry).  This means that when matching scan handlers to
      device nodes, the namespace scanning code should walk the list of
      scan handlers for each device node ID instead of walking the list
      of device node IDs for each handler (the latter causes the first
      handler matching any of the device node IDs to be chosen, although
      there may be another handler matching an ID of a higher priority
      which should be preferred).  Make the code follow this observation.
      
      This change has been suggested and justified by Toshi Kani.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      87b85b3c
    • J
      ACPI / PCI: pci_slot: replace printk(KERN_xxx) with pr_xxx() · 73ce873a
      Jiang Liu 提交于
      Trivial changes to replace printk(KERN_xxx) with pr_xxx().
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      73ce873a
  11. 10 2月, 2013 2 次提交
    • L
      x86 idle: remove mwait_idle() and "idle=mwait" cmdline param · 69fb3676
      Len Brown 提交于
      mwait_idle() is a C1-only idle loop intended to be more efficient
      than HLT, starting on Pentium-4 HT-enabled processors.
      
      But mwait_idle() has been replaced by the more general
      mwait_idle_with_hints(), which handles both C1 and deeper C-states.
      ACPI processor_idle and intel_idle use only mwait_idle_with_hints(),
      and no longer use mwait_idle().
      
      Here we simplify the x86 native idle code by removing mwait_idle(),
      and the "idle=mwait" bootparam used to invoke it.
      
      Since Linux 3.0 there has been a boot-time warning when "idle=mwait"
      was invoked saying it would be removed in 2012.  This removal
      was also noted in the (now removed:-) feature-removal-schedule.txt.
      
      After this change, kernels configured with
      (CONFIG_ACPI=n && CONFIG_INTEL_IDLE=n) when run on hardware
      that supports MWAIT will simply use HLT.  If MWAIT is desired
      on those systems, cpuidle and the cpuidle drivers above
      can be enabled.
      Signed-off-by: NLen Brown <len.brown@intel.com>
      Cc: x86@kernel.org
      69fb3676
    • Z
      ACPI: enable ACPI SCI during suspend · 89a22dad
      Zhang Rui 提交于
      Enable ACPI SCI during suspend so that SCI can be used
      as wake events for PM_SUSPEND_FREEZE.
      
      For S3/S4 transition,
      We disable all GPEs in suspend_ops->prepare_late() to
      fix a problem that GPEs may trigger SCI  before
      arch_suspend_disable_irqs() is run.
      So it is safe to leave the SCI enabled until
      arch_suspend_irq_disable() is run.
      Signed-off-by: NZhang Rui <rui.zhang@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      89a22dad
  12. 09 2月, 2013 4 次提交
  13. 06 2月, 2013 1 次提交
  14. 03 2月, 2013 1 次提交
    • R
      ACPI / PM: Handle missing _PSC in acpi_bus_update_power() · 511d5c42
      Rafael J. Wysocki 提交于
      If _PS0 is defined for an ACPI device node, but _PSC isn't and
      the device node doesn't use power resources for power management,
      acpi_bus_update_power() will fail to update the power state of it,
      because acpi_device_get_power() returns ACPI_STATE_UNKNOWN in that
      case.
      
      To handle that situation make acpi_bus_update_power() follow
      acpi_bus_init_power() and try to force the given device node into
      power state D0.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      511d5c42