1. 28 1月, 2013 1 次提交
    • F
      cputime: Generic on-demand virtual cputime accounting · abf917cd
      Frederic Weisbecker 提交于
      If we want to stop the tick further idle, we need to be
      able to account the cputime without using the tick.
      
      Virtual based cputime accounting solves that problem by
      hooking into kernel/user boundaries.
      
      However implementing CONFIG_VIRT_CPU_ACCOUNTING require
      low level hooks and involves more overhead. But we already
      have a generic context tracking subsystem that is required
      for RCU needs by archs which plan to shut down the tick
      outside idle.
      
      This patch implements a generic virtual based cputime
      accounting that relies on these generic kernel/user hooks.
      
      There are some upsides of doing this:
      
      - This requires no arch code to implement CONFIG_VIRT_CPU_ACCOUNTING
      if context tracking is already built (already necessary for RCU in full
      tickless mode).
      
      - We can rely on the generic context tracking subsystem to dynamically
      (de)activate the hooks, so that we can switch anytime between virtual
      and tick based accounting. This way we don't have the overhead
      of the virtual accounting when the tick is running periodically.
      
      And one downside:
      
      - There is probably more overhead than a native virtual based cputime
      accounting. But this relies on hooks that are already set anyway.
      Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      abf917cd
  2. 04 1月, 2013 1 次提交
    • G
      POWERPC: drivers: remove __dev* attributes. · cad5cef6
      Greg Kroah-Hartman 提交于
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitdata,
      __devinitconst, and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cad5cef6
  3. 27 11月, 2012 1 次提交
    • J
      cpuidle: Measure idle state durations with monotonic clock · a474a515
      Julius Werner 提交于
      Many cpuidle drivers measure their time spent in an idle state by
      reading the wallclock time before and after idling and calculating the
      difference. This leads to erroneous results when the wallclock time gets
      updated by another processor in the meantime, adding that clock
      adjustment to the idle state's time counter.
      
      If the clock adjustment was negative, the result is even worse due to an
      erroneous cast from int to unsigned long long of the last_residency
      variable. The negative 32 bit integer will zero-extend and result in a
      forward time jump of roughly four billion milliseconds or 1.3 hours on
      the idle state residency counter.
      
      This patch changes all affected cpuidle drivers to either use the
      monotonic clock for their measurements or make use of the generic time
      measurement wrapper in cpuidle.c, which was already working correctly.
      Some superfluous CLIs/STIs in the ACPI code are removed (interrupts
      should always already be disabled before entering the idle function, and
      not get reenabled until the generic wrapper has performed its second
      measurement). It also removes the erroneous cast, making sure that
      negative residency values are applied correctly even though they should
      not appear anymore.
      Signed-off-by: NJulius Werner <jwerner@chromium.org>
      Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Tested-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
      Acked-by: NLen Brown <len.brown@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a474a515
  4. 26 11月, 2012 1 次提交
  5. 23 11月, 2012 1 次提交
  6. 17 11月, 2012 1 次提交
  7. 15 11月, 2012 8 次提交
  8. 18 10月, 2012 3 次提交
    • D
      cpuidle/powerpc: Fix snooze state problem in the cpuidle design on pseries. · 83dac594
      Deepthi Dharwar 提交于
      Earlier without cpuidle framework on pseries, the native arch
      idle routine comprised of both snooze and nap
      states.  smt_snooze_delay variable was used to delay
      the idle process entry to deeper idle state like  nap.
      With the coming of cpuidle, this arch specific idle was replaced
      by two different idle routines, one for supporting snooze and other
      for nap. This enabled addition of more
      low level idle states on pseries in the future.
      
      On adopting the generic cpuidle framework for POWER systems,
      the decision of which idle state to choose from,  given a predicted
      idle time is taken by the menu governor based on
      target_residency and  exit_latency of the idle states.
      target_residency is the minimum time to be resident in that idle state.
      Exit_latency is time taken to exit out of idle state.
      Deeper the idle state, both the target residency and exit latency
      would be higher.
      
      In the current design, smt_snooze_delay is used as target_residency
      for the  snooze state which is incorrect, as it is not the
      minimum but the maximum duration to be in snooze state.
      This would  result in the governor in taking bad decision,
      as presently target_residency of nap < target_residency of snooze
      inspite of nap being deeper idle state.
      
      This patch aims to fix this problem by replacing the smt_snooze_delay loop
      in snooze state, with the need_resched()  as the governor is aware of
      entry and exit of various idle transitions based on which
      next idle time prediction.
      
      The governor is intelligent enough to determine the idle state the needs to
      be transitioned to and maintains a whole of heuristics including
      io load, previous idle states predictions etc for the same, based on
      which idle state entry decision is taken.
      
      With this fix, of setting target_residency of snooze to 0
      					     nap to smt_snooze_delay
      if the predicted idle time is less
      than smt_snooze_delay (target_residency of nap)
      value governor would pick snooze state, else nap. This adhers to the
      previous native idle design.
      Signed-off-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      83dac594
    • D
      cpuidle/powerpc: Fix smt_snooze_delay functionality. · 8ea959a1
      Deepthi Dharwar 提交于
      smt_snooze_delay was designed to  delay idle loop's nap entry
      in the native idle code before it got  ported over to use as part of
      the cpuidle framework.
      
      A -ve value  assigned to smt_snooze_delay should result in
      busy looping, in other words disabling the entry to nap state.
      
      	- https://lists.ozlabs.org/pipermail/linuxppc-dev/2010-May/082450.html
      
      This particular functionality can be achieved currently by
      echo 1 > /sys/devices/system/cpu/cpu*/state1/disable
      but it is broken when one assigns -ve value to  the smt_snooze_delay
      variable either via sysfs entry or ppc64_cpu util.
      
      This patch aims to fix this, by disabling nap state when smt_snooze_delay
      variable is set to -ve value.
      Signed-off-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8ea959a1
    • D
      cpuidle/powerpc: Fix target residency initialisation in pseries cpuidle · 817deb05
      Deepthi Dharwar 提交于
      Remove the redundant target residency initialisation in pseries_cpuidle_driver_init().
      This is currently over-writing the residency time updated as part of the static
      table, resulting in  all the idle states having the same target
      residency of 100us which is incorrect. This may result in the menu governor making
      wrong state decisions.
      Signed-off-by: NDeepthi Dharwar <deepthi@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      817deb05
  9. 12 10月, 2012 1 次提交
  10. 11 10月, 2012 2 次提交
  11. 09 10月, 2012 1 次提交
    • Y
      memory-hotplug: suppress "Trying to free nonexistent resource... · d760afd4
      Yasuaki Ishimatsu 提交于
      memory-hotplug: suppress "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>" warning
      
      When our x86 box calls __remove_pages(), release_mem_region() shows many
      warnings.  And x86 box cannot unregister iomem_resource.
      
        "Trying to free nonexistent resource <XXXXXXXXXXXXXXXX-YYYYYYYYYYYYYYYY>"
      
      release_mem_region() has been changed to be called in each
      PAGES_PER_SECTION by commit de7f0cba ("memory hotplug: release
      memory regions in PAGES_PER_SECTION chunks").  Because powerpc registers
      iomem_resource in each PAGES_PER_SECTION chunk.  But when I hot add
      memory on x86 box, iomem_resource is register in each _CRS not
      PAGES_PER_SECTION chunk.  So x86 box unregisters iomem_resource.
      
      The patch fixes the problem.
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Cc: Nathan Fontenot <nfont@austin.ibm.com>
      Cc: Badari Pulavarty <pbadari@us.ibm.com>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d760afd4
  12. 27 9月, 2012 1 次提交
    • A
      powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get · 7844663a
      Aneesh Kumar K.V 提交于
      Acked-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      
      =====================================
      [ BUG: bad unlock balance detected! ]
      3.6.0-rc5-00338-gcaa1d631-dirty #6 Not tainted
      -------------------------------------
      swapper/0/1 is trying to release lock (eeh_mutex) at:
      [<c000000000058218>] .eeh_add_to_parent_pe+0x318/0x410
      but there are no more locks to release!
      
      other info that might help us debug this:
      no locks held by swapper/0/1.
      
      stack backtrace:
      Call Trace:
      [c00000003e483870] [c000000000013310] .show_stack+0x70/0x1c0 (unreliable)
      [c00000003e483920] [c0000000000d8310] .print_unlock_inbalance_bug+0x110/0x120
      [c00000003e4839b0] [c0000000000d9a50] .lock_release+0x1d0/0x240
      [c00000003e483a60] [c000000000778064] .__mutex_unlock_slowpath+0xb4/0x250
      [c00000003e483b10] [c000000000058218] .eeh_add_to_parent_pe+0x318/0x410
      [c00000003e483bc0] [c00000000005a118] .pseries_eeh_of_probe+0x258/0x2f0
      [c00000003e483cc0] [c000000000032528] .traverse_pci_devices+0xa8/0x150
      [c00000003e483d70] [c000000000aa7288] .eeh_init+0xd4/0x140
      [c00000003e483e00] [c00000000000abc4] .do_one_initcall+0x64/0x1e0
      [c00000003e483ec0] [c000000000a90418] .kernel_init+0x1e8/0x2bc
      [c00000003e483f90] [c00000000002048c] .kernel_thread+0x54/0x70
      EEH: PCI Enhanced I/O Error Handling Enabled
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7844663a
  13. 18 9月, 2012 5 次提交
    • G
      powerpc/eeh: Fix crash on converting OF node to edev · 1e38b714
      Gavin Shan 提交于
      The kernel crash was reported by Alexy. He was testing some feature
      with private kernel, in which Alexy added some code in pci_pm_reset()
      to read the CSR after writting it. The bug could be reproduced on
      Fiber Channel card (Fibre Channel: Emulex Corporation Saturn-X:
      LightPulse Fibre Channel Host Adapter (rev 03)) by the following
      commands.
      
      	# echo 1 > /sys/devices/pci0004:01/0004:01:00.0/reset
      	# rmmod lpfc
      	# modprobe lpfc
      
      The history behind the test case is that those additional config
      space reading operations in pci_pm_reset() would cause EEH error,
      but we didn't detect EEH error until "modprobe lpfc". For the case,
      all the PCI devices on PCI bus (0004:01) were removed and added after
      PE reset. Then the EEH devices would be figured out again based on
      the OF nodes. Unfortunately, there were some child OF nodes under
      PCI device (0004:01:00.0), but they didn't have attached PCI_DN since
      they're invisible from PCI domain. However, we were still trying to
      convert OF node to EEH device without checking on the attached PCI_DN.
      Eventually, it caused the kernel crash as follows:
      
      Unable to handle kernel paging request for data at address 0x00000030
      Faulting instruction address: 0xc00000000004d888
      cpu 0x0: Vector: 300 (Data Access) at [c000000fc797b950]
          pc: c00000000004d888: .eeh_add_device_tree_early+0x78/0x140
          lr: c00000000004d880: .eeh_add_device_tree_early+0x70/0x140
          sp: c000000fc797bbd0
         msr: 8000000000009032
         dar: 30
       dsisr: 40000000
        current = 0xc000000fc78d9f70
        paca    = 0xc00000000edb0000   softe: 0        irq_happened: 0x00
          pid   = 2951, comm = eehd
      enter ? for help
      [c000000fc797bc50] c00000000004d848 .eeh_add_device_tree_early+0x38/0x140
      [c000000fc797bcd0] c00000000004d848 .eeh_add_device_tree_early+0x38/0x140
      [c000000fc797bd50] c000000000051b54 .pcibios_add_pci_devices+0x34/0x190
      [c000000fc797bde0] c00000000004fb10 .eeh_reset_device+0x100/0x160
      [c000000fc797be70] c0000000000502dc .eeh_handle_event+0x19c/0x300
      [c000000fc797bf00] c000000000050570 .eeh_event_handler+0x130/0x1a0
      [c000000fc797bf90] c000000000020138 .kernel_thread+0x54/0x70
      
      The patch changes of_node_to_eeh_dev() and just returns NULL if the
      passed OF node doesn't have attached PCI_DN.
      
      Cc: stable@vger.kernel.org
      Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      1e38b714
    • G
      powerpc/eeh: Lock module while handling EEH event · feadf7c0
      Gavin Shan 提交于
      The EEH core is talking with the PCI device driver to determine the
      action (purely reset, or PCI device removal). During the period, the
      driver might be unloaded and in turn causes kernel crash as follows:
      
      EEH: Detected PCI bus error on PHB#4-PE#10000
      EEH: This PCI device has failed 3 times in the last hour
      lpfc 0004:01:00.0: 0:2710 PCI channel disable preparing for reset
      Unable to handle kernel paging request for data at address 0x00000490
      Faulting instruction address: 0xd00000000e682c90
      cpu 0x1: Vector: 300 (Data Access) at [c000000fc75ffa20]
          pc: d00000000e682c90: .lpfc_io_error_detected+0x30/0x240 [lpfc]
          lr: d00000000e682c8c: .lpfc_io_error_detected+0x2c/0x240 [lpfc]
          sp: c000000fc75ffca0
         msr: 8000000000009032
         dar: 490
       dsisr: 40000000
        current = 0xc000000fc79b88b0
        paca    = 0xc00000000edb0380	 softe: 0	 irq_happened: 0x00
          pid   = 3386, comm = eehd
      enter ? for help
      [c000000fc75ffca0] c000000fc75ffd30 (unreliable)
      [c000000fc75ffd30] c00000000004fd3c .eeh_report_error+0x7c/0xf0
      [c000000fc75ffdc0] c00000000004ee00 .eeh_pe_dev_traverse+0xa0/0x180
      [c000000fc75ffe70] c00000000004ffd8 .eeh_handle_event+0x68/0x300
      [c000000fc75fff00] c0000000000503a0 .eeh_event_handler+0x130/0x1a0
      [c000000fc75fff90] c000000000020138 .kernel_thread+0x54/0x70
      1:mon>
      
      The patch increases the reference of the corresponding driver modules
      while EEH core does the negotiation with PCI device driver so that the
      corresponding driver modules can't be unloaded during the period and
      we're safe to refer the callbacks.
      
      Cc: stable@vger.kernel.org
      Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      feadf7c0
    • G
      powerpc/eeh: Global mutex to protect PE tree · ea81245c
      Gavin Shan 提交于
      We have missed lots of situations where the PE hierarchy tree need
      protection through the EEH global mutex. The patch fixes that for
      those public APIs implemented in eeh_pe.c. The only exception is
      eeh_pe_restore_bars() because it calls eeh_pe_dev_traverse(), which
      has been protected by the mutex.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      ea81245c
    • G
      powerpc/eeh: Remove EEH PE for normal PCI hotplug · 20ee6a97
      Gavin Shan 提交于
      Function eeh_rmv_from_parent_pe() could be called by the path of
      either normal PCI hotplug, or EEH recovery. For the former case,
      we need purge the corresponding PE on removal of the associated
      PE bus.
      
      The patch tries to cover that by passing more information to function
      pcibios_remove_pci_devices() so that we know if the corresponding PE
      needs to be purged or be marked as "invalid".
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      20ee6a97
    • G
      powerpc/eeh: Introduce EEH_PE_INVALID type PE · 5efc3ad7
      Gavin Shan 提交于
      When EEH error happens on the PE whose PCI devices don't have
      attached drivers. In function eeh_handle_event(), the default
      value PCI_ERS_RESULT_NONE will be returned after iterating all
      drivers of those PCI devices belonging to the PE. Actually, we
      don't have installed drivers for the PCI devices. Under the
      circumstance, we will remove the corresponding PCI bus of the PE,
      including the associated EEH devices and PE instance. However,
      we still need the information stored in the PE instance to do PE
      reset after that. So it's unsafe to free the PE instance.
      
      The patch introduces EEH_PE_INVALID type PE to address the issue.
      When the PCI bus and the corresponding attached EEH devices are
      removed, we will mark the PE as EEH_PE_INVALID. At later point,
      the PE will be changed to EEH_PE_DEVICE or EEH_PE_BUS when the
      corresponding EEH devices are attached again.
      Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      5efc3ad7
  14. 17 9月, 2012 1 次提交
  15. 10 9月, 2012 12 次提交