1. 26 4月, 2013 3 次提交
    • N
      powerpc/pseries: Use ARRAY_SIZE to iterate over firmware_features_table array · 43c0ea60
      Nathan Fontenot 提交于
      When iterating over the entries in firmware_features_table we only need
      to go over the actual number of entries in the array instead of declaring
      it to be bigger and checking to make sure there is a valid entry in every
      slot.
      
      This patch removes the FIRMWARE_MAX_FEATURES #define and replaces the
      array looping with the use of ARRAY_SIZE().
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      43c0ea60
    • N
      powerpc/pseries: Correct buffer parsing in update_dt_node() · 2e9b7b02
      Nathan Fontenot 提交于
      Correct parsing of the buffer returned from ibm,update-properties. The first
      element is a length and the path to the property which is slightly different
      from the list of properties in the buffer so we need to specifically
      handle this.
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      2e9b7b02
    • N
      powerpc/pseries: Expose pseries devicetree_update() · 762ec157
      Nathan Fontenot 提交于
      Newer firmware on Power systems can transparently reassign platform resources
      (CPU and Memory) in use. For instance, if a processor or memory unit is
      predicted to fail, the platform may transparently move the processing to an
      equivalent unused processor or the memory state to an equivalent unused
      memory unit. However, reassigning resources across NUMA boundaries may alter
      the performance of the partition. When such reassignment is necessary, the
      Platform Resource Reassignment Notification (PRRN) option provides a
      mechanism to inform the Linux kernel of changes to the NUMA affinity of
      its platform resources.
      
      When rtasd receives a PRRN event, it needs to make a series of RTAS
      calls (ibm,update-nodes and ibm,update-properties) to retrieve the
      updated device tree information. These calls are already handled in the
      pseries_devicetree_update() routine used in partition migration.
      
      This patch exposes pseries_devicetree_update() to make it accessible
      to other pseries routines, this patch also updates pseries_devicetree_update()
      to take a 32-bit scope parameter. The scope value, which was previously hard
      coded to 1 for partition migration, is used for the RTAS calls
      ibm,update-nodes/properties to update the device tree.
      Signed-off-by: NNathan Fontenot <nfont@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      762ec157
  2. 18 4月, 2013 2 次提交
    • N
      powerpc/pseries: close DDW race between functions of adapter · 61435690
      Nishanth Aravamudan 提交于
      Given a PCI device with multiple functions in a DDW capable slot, the
      following situation can be encountered: When the first function sets a
      64-bit DMA mask, enable_ddw() will be called and we can fail to properly
      configure DDW (the most common reason being the new DMA window's size is
      not large enough to map all of an LPAR's memory). With the recent
      changes to DDW, we remove the base window in order to determine if the
      new window is of sufficient size to cover an LPAR's memory. We correctly
      replace the base window if we find that not to be the case. However,
      once we go through and re-configured 32-bit DMA via the IOMMU, the next
      function of the adapter will go through the same process. And since DDW
      is a characteristic of the slot itself, we are most likely going to fail
      again. But to determine we are going to fail the second slot, we again
      remove the base window -- but that is now in-use by the first
      function/driver, which might be issuing I/O already.
      
      To close this window, keep a list of all the failed struct device_nodes
      that have failed to configure DDW. If the current device_node is in that
      list, just fail out immediately and fall back to 32-bit DMA without
      doing any DDW manipulation.
      Signed-off-by: NNishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      61435690
    • L
      powerpc: Use VPA subfunction macros instead of numbers for vpa calls · bb18b3a4
      Li Zhong 提交于
      Use macros in vpa calls.
      Signed-off-by: NLi Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      bb18b3a4
  3. 08 4月, 2013 1 次提交
  4. 05 3月, 2013 1 次提交
  5. 23 2月, 2013 1 次提交
  6. 08 2月, 2013 2 次提交
    • N
      pseries/iommu: Remove DDW on kexec · 14b6f00f
      Nishanth Aravamudan 提交于
      pseries/iommu: remove DDW on kexec
      
      We currently insert a property in the device-tree when we successfully
      configure DDW for a given slot. This was meant to be an optimization to
      speed up kexec/kdump, so that we don't need to make the RTAS calls again
      to re-configured DDW in the new kernel.
      
      However, we end up tripping a plpar_tce_stuff failure on kexec/kdump
      because we unconditionally parse the ibm,dma-window property for the
      node at bus/dev setup time. This property contains the 32-bit DMA window
      LIOBN, which is distinct from the DDW window's. We pass that LIOBN (via
      iommu_table_init -> iommu_table_clear -> tce_free ->
      tce_freemulti_pSeriesLP) to plpar_tce_stuff, which fails because that
      32-bit window is no longer present after
      25ebc45b ("powerpc/pseries/iommu: remove
      default window before attempting DDW manipulation").
      
      I believe the simplest, easiest-to-maintain fix is to just change our
      initcall to, rather than detecting and updating the new kernel's DDW
      knowledge, just remove all DDW configurations. When the drivers
      re-initialize, we will set everything back up as it was before.
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      14b6f00f
    • N
      pseries/iommu: Restore_default_window does not use liobn parameter · a1dabade
      Nishanth Aravamudan 提交于
      The parameter is unused, and complicates a following fix. Just remove
      it.
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a1dabade
  7. 29 1月, 2013 2 次提交
    • N
      pseries/iommu: Ensure TCEs are cleared with non-huge DDW · 71cf1def
      Nishanth Aravamudan 提交于
      There are now two kinds of DMA windows that might be presented by
      PowerVM DDW support -- huge windows (that can map all of system memory
      regardless of the LPAR configuration) and non-huge windows (which
      can't). They are implemented slightly differently in PowerVM, and thus
      have different characteristics. The most obvious is that slot isolate
      doesn't clear the TCEs/window for us with non-huge windows. Thus, when a
      DLPAR operation occurs on a slot using a non-huge window, TCEs are still
      present (the notifier chain doesn't currently remove them explicitly)
      and the DLPAR fails. Fix this by calling remove_ddw() first, which will
      unmap the DDW TCEs.
      
      Note: a corresponding change to drmgr is needed to actually successfully
      DLPAR, such that the device-tree update (which causes the notifier chain
      to fire) occurs before slot isolate.
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      71cf1def
    • N
      pseries/iommu: Fix iteration in DDW TCE clearrange · 22b38298
      Nishanth Aravamudan 提交于
      tce_clearrange_multi_pSeriesLP is attempting to iterate over all TCEs in
      a given range. However, is it not advancing the dma_offset value passed
      to plpar_tce_stuff via the next value. This prevents DLPAR from
      completing, because TCEs are still present at slot isolation time.
      Signed-off-by: NNishanth Aravamudan <nacc@us.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      22b38298
  8. 28 1月, 2013 1 次提交
    • F
      cputime: Generic on-demand virtual cputime accounting · abf917cd
      Frederic Weisbecker 提交于
      If we want to stop the tick further idle, we need to be
      able to account the cputime without using the tick.
      
      Virtual based cputime accounting solves that problem by
      hooking into kernel/user boundaries.
      
      However implementing CONFIG_VIRT_CPU_ACCOUNTING require
      low level hooks and involves more overhead. But we already
      have a generic context tracking subsystem that is required
      for RCU needs by archs which plan to shut down the tick
      outside idle.
      
      This patch implements a generic virtual based cputime
      accounting that relies on these generic kernel/user hooks.
      
      There are some upsides of doing this:
      
      - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
      if context tracking is already built (already necessary for RCU in full
      tickless mode).
      
      - We can rely on the generic context tracking subsystem to dynamically
      (de)activate the hooks, so that we can switch anytime between virtual
      and tick based accounting. This way we don't have the overhead
      of the virtual accounting when the tick is running periodically.
      
      And one downside:
      
      - There is probably more overhead than a native virtual based cputime
      accounting. But this relies on hooks that are already set anyway.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      abf917cd
  9. 10 1月, 2013 10 次提交
  10. 04 1月, 2013 1 次提交
    • G
      POWERPC: drivers: remove __dev* attributes. · cad5cef6
      Greg Kroah-Hartman 提交于
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      __devinitconst, and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cad5cef6
  11. 27 11月, 2012 1 次提交
    • J
      cpuidle: Measure idle state durations with monotonic clock · a474a515
      Julius Werner 提交于
      Many cpuidle drivers measure their time spent in an idle state by
      reading the wallclock time before and after idling and calculating the
      difference. This leads to erroneous results when the wallclock time gets
      updated by another processor in the meantime, adding that clock
      adjustment to the idle state's time counter.
      
      If the clock adjustment was negative, the result is even worse due to an
      erroneous cast from int to unsigned long long of the last_residency
      variable. The negative 32 bit integer will zero-extend and result in a
      forward time jump of roughly four billion milliseconds or 1.3 hours on
      the idle state residency counter.
      
      This patch changes all affected cpuidle drivers to either use the
      monotonic clock for their measurements or make use of the generic time
      measurement wrapper in cpuidle.c, which was already working correctly.
      Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
      should always already be disabled before entering the idle function, and
      not get reenabled until the generic wrapper has performed its second
      measurement). It also removes the erroneous cast, making sure that
      negative residency values are applied correctly even though they should
      not appear anymore.
      Signed-off-by: NJulius Werner <jwerner@chromium.org>
      Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Tested-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: NLen Brown <len.brown@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a474a515
  12. 26 11月, 2012 1 次提交
  13. 23 11月, 2012 1 次提交
  14. 17 11月, 2012 1 次提交
  15. 15 11月, 2012 8 次提交
  16. 18 10月, 2012 3 次提交
    • D
      cpuidle/powerpc: Fix snooze state problem in the cpuidle design on pseries. · 83dac594
      Deepthi Dharwar 提交于
      Earlier without cpuidle framework on pseries, the native arch
      idle routine comprised of both snooze and nap
      states.  smt_snooze_delay variable was used to delay
      the idle process entry to deeper idle state like  nap.
      With the coming of cpuidle, this arch specific idle was replaced
      by two different idle routines, one for supporting snooze and other
      for nap. This enabled addition of more
      low level idle states on pseries in the future.
      
      On adopting the generic cpuidle framework for POWER systems,
      the decision of which idle state to choose from,  given a predicted
      idle time is taken by the menu governor based on
      target_residency and  exit_latency of the idle states.
      target_residency is the minimum time to be resident in that idle state.
      Exit_latency is time taken to exit out of idle state.
      Deeper the idle state, both the target residency and exit latency
      would be higher.
      
      In the current design, smt_snooze_delay is used as target_residency
      for the  snooze state which is incorrect, as it is not the
      minimum but the maximum duration to be in snooze state.
      This would  result in the governor in taking bad decision,
      as presently target_residency of nap < target_residency of snooze
      inspite of nap being deeper idle state.
      
      This patch aims to fix this problem by replacing the smt_snooze_delay loop
      in snooze state, with the need_resched()  as the governor is aware of
      entry and exit of various idle transitions based on which
      next idle time prediction.
      
      The governor is intelligent enough to determine the idle state the needs to
      be transitioned to and maintains a whole of heuristics including
      io load, previous idle states predictions etc for the same, based on
      which idle state entry decision is taken.
      
      With this fix, of setting target_residency of snooze to 0
      					     nap to smt_snooze_delay
      if the predicted idle time is less
      than smt_snooze_delay (target_residency of nap)
      value governor would pick snooze state, else nap. This adhers to the
      previous native idle design.
      Signed-off-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      83dac594
    • D
      cpuidle/powerpc: Fix smt_snooze_delay functionality. · 8ea959a1
      Deepthi Dharwar 提交于
      smt_snooze_delay was designed to  delay idle loop's nap entry
      in the native idle code before it got  ported over to use as part of
      the cpuidle framework.
      
      A -ve value  assigned to smt_snooze_delay should result in
      busy looping, in other words disabling the entry to nap state.
      
      	- https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-May/082450.html
      
      This particular functionality can be achieved currently by
      echo 1 > /sys/devices/system/cpu/cpu*/state1/disable
      but it is broken when one assigns -ve value to  the smt_snooze_delay
      variable either via sysfs entry or ppc64_cpu util.
      
      This patch aims to fix this, by disabling nap state when smt_snooze_delay
      variable is set to -ve value.
      Signed-off-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8ea959a1
    • D
      cpuidle/powerpc: Fix target residency initialisation in pseries cpuidle · 817deb05
      Deepthi Dharwar 提交于
      Remove the redundant target residency initialisation in pseries_cpuidle_driver_init().
      This is currently over-writing the residency time updated as part of the static
      table, resulting in  all the idle states having the same target
      residency of 100us which is incorrect. This may result in the menu governor making
      wrong state decisions.
      Signed-off-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      817deb05
  17. 12 10月, 2012 1 次提交