1. 30 Oct, 2009 1 commit
  2. 14 Oct, 2009 1 commit
    • powerpc: Fix hypervisor TLB batching · b6dcde5c
      Anton Blanchard committed
      Profiling of a page fault scalability microbenchmark shows flush_hash_range
      is not calling the batch hpte invalidate hcall (H_BULK_REMOVE).
      
      It turns out we have a duplicate firmware feature for hcall-bulk and the
      current setup code stops after finding the first match. This meant we never
      batch and always do individual invalidates.
      
      The patch below removes the duplicate and shifts FW_FEATURE_CMO to close
      the gap. With the patch applied the single threaded page fault rate improves
      from 217169 to 238755 per second on a POWER5 test box, a 10% improvement.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
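The failure mode described in this commit (a duplicate table entry combined with a lookup that stops at the first match) can be illustrated with a small Python model. This is only a sketch, not the kernel code: the bit values, the table layout, and the name `FW_FEATURE_BULK` for the duplicate entry are assumptions made for illustration.

```python
# Hypothetical feature bits; the real kernel constants have different values.
FW_FEATURE_BULK = 1 << 0         # assumed name for the duplicate entry
FW_FEATURE_BULK_REMOVE = 1 << 1  # the bit the batching code is assumed to test

# Before the fix: two table entries share the firmware string "hcall-bulk".
BUGGY_TABLE = [
    (FW_FEATURE_BULK, "hcall-bulk"),
    (FW_FEATURE_BULK_REMOVE, "hcall-bulk"),
]

# After the fix: the duplicate is gone.
FIXED_TABLE = [
    (FW_FEATURE_BULK_REMOVE, "hcall-bulk"),
]


def scan_features(table, firmware_strings):
    """Return the feature mask for the strings reported by firmware."""
    features = 0
    for name in firmware_strings:
        for bit, table_name in table:
            if name == table_name:
                features |= bit
                break  # setup stops after the first match
    return features


buggy = scan_features(BUGGY_TABLE, ["hcall-bulk"])
fixed = scan_features(FIXED_TABLE, ["hcall-bulk"])
print(bool(buggy & FW_FEATURE_BULK_REMOVE))  # batching never enabled
print(bool(fixed & FW_FEATURE_BULK_REMOVE))  # batching enabled
```

With the duplicate present, the second entry is unreachable, so the bit that gates batching is never set even though firmware advertises the feature.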
  3. 02 Oct, 2009 1 commit
  4. 24 Sep, 2009 2 commits
  5. 23 Sep, 2009 1 commit
  6. 11 Sep, 2009 1 commit
    • powerpc: Fix bug where perf_counters breaks oprofile · a6dbf93a
      Paul Mackerras committed
      Currently there is a bug where if you use oprofile on a pSeries
      machine, then use perf_counters, then use oprofile again, oprofile
      will not work correctly; it will lose the PMU configuration the next
      time the hypervisor does a partition context switch, and thereafter
      won't count anything.
      
      Maynard Johnson identified the sequence causing the problem:
      - oprofile setup calls ppc_enable_pmcs(), which calls
        pseries_lpar_enable_pmcs, which tells the hypervisor that we want
        to use the PMU, and sets the "PMU in use" flag in the lppaca.
        This flag tells the hypervisor whether it needs to save and restore
        the PMU config.
      - The perf_counter code sets and clears the "PMU in use" flag directly
        as it context-switches the PMU between tasks, and leaves it clear
        when it finishes.
      - oprofile setup, called for a new oprofile run, calls ppc_enable_pmcs,
        which does nothing because it has already been called.  In particular
        it doesn't set the "PMU in use" flag.
      
      This fixes the problem by arranging for ppc_enable_pmcs to always set
      the "PMU in use" flag.  It makes the perf_counter code call
      ppc_enable_pmcs also rather than calling the lower-level function
      directly, and removes the setting of the "PMU in use" flag from
      pseries_lpar_enable_pmcs, since that is now done in its caller.
      
      This also removes the declaration of pasemi_enable_pmcs because it
      isn't defined anywhere.
      Reported-by: Maynard Johnson <mpjohn@us.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
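The oprofile, perf_counters, oprofile sequence identified above can be replayed with a small Python model. It is a sketch under stated assumptions, not the kernel implementation: the class layout, the `pmu_in_use` attribute, and the `perf_counter_session` helper are hypothetical stand-ins for the lppaca flag and the perf_counter code paths.

```python
class Machine:
    """Toy model of one partition; `fixed` selects pre- or post-patch behaviour."""

    def __init__(self, fixed):
        self.fixed = fixed
        self.pmu_in_use = 0        # stands in for the "PMU in use" flag in the lppaca
        self.pmcs_enabled = False  # one-shot guard inside ppc_enable_pmcs

    def pseries_lpar_enable_pmcs(self):
        # Tells the hypervisor we want the PMU (the hcall itself is not modelled).
        if not self.fixed:
            self.pmu_in_use = 1    # old code: flag set in the low-level function

    def ppc_enable_pmcs(self):
        if self.fixed:
            self.pmu_in_use = 1    # new code: flag set on every call
        if self.pmcs_enabled:
            return                 # already called once: nothing more to do
        self.pmcs_enabled = True
        self.pseries_lpar_enable_pmcs()

    def perf_counter_session(self):
        # perf_counter sets and clears the flag as it context-switches the PMU,
        # and leaves it clear when it finishes.
        if self.fixed:
            self.ppc_enable_pmcs()
        else:
            self.pseries_lpar_enable_pmcs()
        self.pmu_in_use = 1
        self.pmu_in_use = 0


def flag_after_second_oprofile_run(fixed):
    m = Machine(fixed)
    m.ppc_enable_pmcs()        # first oprofile setup
    m.perf_counter_session()   # perf_counters used in between
    m.ppc_enable_pmcs()        # second oprofile setup
    return m.pmu_in_use


print(flag_after_second_oprofile_run(fixed=False))
print(flag_after_second_oprofile_run(fixed=True))
```

In the pre-patch model the second oprofile setup returns early and leaves the flag clear, which matches the symptom of the hypervisor no longer saving and restoring the PMU configuration.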
  7. 10 Sep, 2009 1 commit
  8. 02 Sep, 2009 2 commits
  9. 20 Aug, 2009 1 commit
  10. 08 Jul, 2009 2 commits
  11. 26 Jun, 2009 1 commit
  12. 17 Jun, 2009 1 commit
  13. 09 Jun, 2009 1 commit
  14. 02 Jun, 2009 1 commit
    • powerpc: Convert RTAS event scan from kernel thread to workqueue · f8729e85
      Anton Blanchard committed
      RTAS event scan has to run across all cpus. Right now we use a kernel
      thread and set_cpus_allowed but in doing so we wake up the previous cpu
      unnecessarily.
      
      Some ftrace output shows this:
      
      previous cpu (2):
      [002]  7.022331: sched_switch: task swapper:0 [140] ==> rtasd:194 [120]
      [002]  7.022338: sched_switch: task rtasd:194 [120] ==> migration/2:9 [0]
      [002]  7.022344: sched_switch: task migration/2:9 [0] ==> swapper:0 [140]
      
      next cpu (3):
      [003]  7.022345: sched_switch: task swapper:0 [140] ==> rtasd:194 [120]
      [003]  7.022371: sched_switch: task rtasd:194 [120] ==> swapper:0 [140]
      
      We can use schedule_delayed_work_on and avoid the unnecessary wakeup.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
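The workqueue approach amounts to a work item that scans on its own CPU and then queues itself on the next online CPU. The Python sketch below models only that traversal order; the helper names are hypothetical, and the loop stands in for repeated calls to schedule_delayed_work_on rather than reproducing the kernel's scheduling.

```python
def next_online_cpu(cpu, online):
    """Return the lowest online CPU above `cpu`, or None when the pass is done."""
    later = sorted(c for c in online if c > cpu)
    return later[0] if later else None


def run_scan_pass(online, scan_cpu):
    """Model one pass of the event scan across all online CPUs."""
    visited = []
    cpu = min(online)
    while cpu is not None:
        scan_cpu(cpu)          # the work item runs on `cpu` only
        visited.append(cpu)
        # Stands in for schedule_delayed_work_on(next_cpu, ...): the next work
        # item is queued directly on the target CPU, so no thread has to wake
        # on the previous CPU just to be migrated away from it.
        cpu = next_online_cpu(cpu, online)
    return visited


print(run_scan_pass({0, 2, 3}, lambda cpu: None))
```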
  15. 21 May, 2009 2 commits
  16. 28 Apr, 2009 1 commit
    • irq: change ->set_affinity() to return status · d5dedd45
      Yinghai Lu committed
      According to Ingo, set_affinity() in irq_chip should return int,
      because that way we can handle failure cases much more cleanly in
      the genirq layer.
      
      v2: fix two typos
      
      [ Impact: extend API ]
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: linux-arch@vger.kernel.org
      LKML-Reference: <49F654E9.4070809@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 15 Apr, 2009 2 commits
    • powerpc: pseries/dtl.c should include asm/firmware.h · b71a0c29
      Sachin Sant committed
      A randconfig build on powerpc failed with:
      
      dtl.c: In function 'dtl_init':
      dtl.c:238: error: implicit declaration of function 'firmware_has_feature'
      dtl.c:238: error: 'FW_FEATURE_SPLPAR' undeclared (first use in this function)
      
      We need firmware.h for these definitions.
      Signed-off-by: Sachin Sant <sachinp@in.ibm.com>
      Signed-off-by: Jeremy Kerr <jk@ozlabs.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • powerpc/pseries: Set error_state to pci_channel_io_normal in eeh_report_reset() · c58dc575
      Mike Mason committed
      While adding native EEH support to Emulex and QLogic drivers, it was
      discovered that dev->error_state was set to pci_channel_io_normal too
      late in the recovery process. These drivers rely on error_state to
      determine if they can access the device in their slot_reset callback,
      so error_state needs to be set to pci_channel_io_normal in
      eeh_report_reset(). Below is a detailed explanation (courtesy of Richard
      Lary) as to why this is necessary.
      
      Background:
      PCI MMIO or DMA accesses to a frozen slot generate additional EEH
      errors. If the number of additional EEH errors exceeds EEH_MAX_FAILS the
      adapter will be shut down. To avoid triggering excessive EEH errors and
      an undesirable adapter shutdown, some drivers use the
      pci_channel_offline(dev) wrapper function, which returns a Boolean value
      based on pci_dev->error_state, to determine if PCI MMIO or
      DMA accesses are safe. If the wrapper returns TRUE, drivers must not
      make PCI MMIO or DMA accesses to their hardware.
      
      The pci_dev structure member error_state reflects one of three values,
      1) pci_channel_io_normal, 2) pci_channel_io_frozen, 3)
      pci_channel_io_perm_failure.  Function pci_channel_offline(dev) returns
      TRUE if error_state is pci_channel_io_frozen or pci_channel_io_perm_failure.
      
      The EEH driver sets pci_dev->error_state to pci_channel_io_frozen at the
      point where the PCI slot is frozen. Currently, the EEH driver restores
      dev->error_state to pci_channel_io_normal in eeh_report_resume() before
      calling the driver's resume callback. However, when the EEH driver calls
      the driver's slot_reset() callback from eeh_report_reset(), error_state
      incorrectly indicates that the channel is still pci_channel_io_frozen.
      
      Waiting until eeh_report_resume() to restore dev->error_state to
      pci_channel_io_normal is too late for the Emulex and QLogic FC drivers and
      any other drivers designed to use common code paths in these
      two cases: i) paths called after the driver's slot_reset() callback and
      ii) paths called after the PCI slot is frozen but before the driver's
      slot_reset() callback is called. Case i) covers all driver paths executed to
      reinitialize the hardware after a reset. Case ii) covers all code paths
      executed by driver kernel threads that run asynchronously to the main
      driver thread, such as interrupt handlers and worker threads that process
      driver work queues.
      
      Emulex and QLogic FC drivers are designed with common code paths which
      require that pci_channel_offline(dev) reflect the true state of the
      hardware. The state transitions that the hardware takes from Normal
      Operations to Slot Frozen to Reset to Normal Operations are documented
      in the Power Architecture™ Platform Requirements+ (PAPR+), Table 75,
      "PE State Control".
      
      PAPR defines the following 3 states:
      
      0 -- Not reset, Not EEH stopped, MMIO load/store allowed, DMA allowed
           (Normal Operations)
      1 -- Reset, Not EEH stopped, MMIO load/store disabled, DMA disabled
      2 -- Not reset, EEH stopped, MMIO load/store disabled, DMA disabled
           (Slot Frozen)
      
      An EEH error places the slot in state 2 (Slot Frozen) and the adapter driver
      is notified that an EEH error was detected. If the adapter driver
      returns PCI_ERS_RESULT_NEED_RESET, the EEH driver calls
      eeh_reset_device() to place the slot into state 1 (Reset), and
      eeh_reset_device() completes by placing the slot into state 0 (Normal
      Operations). Upon return from eeh_reset_device(), the EEH driver calls
      eeh_report_reset(), which then calls the adapter's slot_reset callback. At
      the time the adapter's slot_reset callback is called, the true state of
      the hardware is Normal Operations, and this should be accurately reflected by
      setting dev->error_state to pci_channel_io_normal.
      
      The current implementation of the EEH driver does not do so and requires
      this change to correct the deficiency.
      Signed-off-by: Mike Mason <mmlnx@us.ibm.com>
      Acked-by: Linas Vepstas <linasvepstas@gmail.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
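The timing problem described in this commit can be modelled in a few lines of Python. This is a sketch only: the enum and the `recover` helper are hypothetical, and the recovery sequence is reduced to the two points where a driver callback would consult pci_channel_offline().

```python
from enum import Enum


class ChannelState(Enum):
    # Mirrors the three error_state values named in the commit message.
    NORMAL = 1
    FROZEN = 2
    PERM_FAILURE = 3


def pci_channel_offline(state):
    """TRUE when the driver must not touch the device via MMIO or DMA."""
    return state in (ChannelState.FROZEN, ChannelState.PERM_FAILURE)


def recover(set_normal_in_report_reset):
    """Return what the slot_reset and resume callbacks see as 'offline'."""
    state = ChannelState.FROZEN            # EEH error: slot frozen

    # eeh_reset_device(): hardware goes Reset -> Normal Operations.
    # eeh_report_reset(): with the patch, error_state is updated here.
    if set_normal_in_report_reset:
        state = ChannelState.NORMAL
    offline_in_slot_reset = pci_channel_offline(state)

    # eeh_report_resume(): error_state is restored before the resume callback.
    state = ChannelState.NORMAL
    offline_in_resume = pci_channel_offline(state)

    return offline_in_slot_reset, offline_in_resume


print(recover(set_normal_in_report_reset=False))
print(recover(set_normal_in_report_reset=True))
```

Without the change, the slot_reset callback is told the channel is offline even though the hardware is already back in Normal Operations, so shared driver code paths would refuse to reinitialize the adapter.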
  18. 27 Mar, 2009 1 commit
  19. 24 Mar, 2009 2 commits
  20. 11 Mar, 2009 3 commits
  21. 23 Feb, 2009 2 commits
    • powerpc/pseries: Implement a quota system for MSIs · 448e2ca0
      Michael Ellerman committed
      There are hardware limitations on the number of available MSIs,
      which firmware expresses using a property named "ibm,pe-total-#msi".
      This property tells us how many MSIs are available for devices below
      the point in the PCI tree where we find the property.
      
      For old firmwares which don't have the property, we assume there are
      8 MSIs available per "partitionable endpoint" (PE). The PE can be
      found using existing EEH code, which uses the methods described in
      PAPR. For our purposes we want the parent of the node that's
      identified using this method.
      
      When a driver requests n MSIs for a device, we first establish where
      the "ibm,pe-total-#msi" property above that device is, or we find the
      PE if the property is not found. In both cases we call this node
      the "pe_dn".
      
      We then count all non-bridge devices below the pe_dn, to establish
      how many devices in total may need MSIs. The quota is then simply the
      total available divided by the number of devices. If the request is
      less than or equal to the quota, the request is fine and we're done.
      
      If the request is greater than the quota, we try to determine if there
      are any "spare" MSIs which we can give to this device. Spare MSIs are
      found by looking for other devices which can never use their full
      quota, because their "req#msi(-x)" property is less than the quota.
      
      If we find any spare, we divide the spares by the number of devices
      that could request more than their quota. This ensures the spare
      MSIs are spread evenly amongst all over-quota requestors.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
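The quota arithmetic described above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the kernel code: the function name, the dictionary of per-device "req#msi(-x)" values, and the use of integer division are illustrative choices, and locating the pe_dn is not modelled at all.

```python
def msi_quota(total, req_msi_by_device, request):
    """Return how many MSIs a device requesting `request` may have.

    total             -- MSIs available below the pe_dn
    req_msi_by_device -- "req#msi(-x)" value of every non-bridge device
                         below the pe_dn (hypothetical representation)
    """
    num_devices = len(req_msi_by_device)
    if num_devices == 0:
        return 0

    quota = total // num_devices
    if request <= quota:
        return request  # within quota: the request is fine

    # Spare MSIs come from devices that can never use their full quota.
    spare = sum(quota - r for r in req_msi_by_device.values() if r < quota)
    # Devices that could ask for more than their quota share the spares.
    over_quota = sum(1 for r in req_msi_by_device.values() if r > quota)
    if spare and over_quota:
        quota += spare // over_quota

    return min(request, quota)


# Example: 8 MSIs shared by four devices, two of which need only one each.
devices = {"a": 1, "b": 1, "c": 8, "d": 8}
print(msi_quota(8, devices, request=8))
print(msi_quota(8, devices, request=2))
```

In the example the base quota is 2. Devices "a" and "b" each leave one MSI spare, and the two spares are split between "c" and "d", so a request for 8 is reduced to 3.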
    • powerpc/pseries: Return req#msi(-x) if request is larger · d523cc37
      Michael Ellerman committed
      If a driver asks for more MSIs than the device's "req#msi(-x)" property allows,
      we currently return -ENOSPC. This doesn't give the driver any chance to
      make a new request with a number that might work.
      
      So if "req#msi(-x)" is less than the request, return its value. To be
      100% safe, make sure we return an error if req_msi == 0.
      Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
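A minimal Python sketch of the new return convention, and of how a driver might use it, is shown below. The function names are hypothetical, and the specific error code returned for req_msi == 0 is an assumption; the commit message only says that an error is returned in that case.

```python
import errno


def check_req_msi(req_msi, nvec):
    """Model of the check: 0 on success, >0 as a hint, <0 on error."""
    if req_msi == 0:
        return -errno.ENOENT  # assumed error code; never hand back 0 as a hint
    if req_msi < nvec:
        return req_msi        # tell the caller how many it could ask for
    return 0


def request_with_retry(req_msi, nvec):
    """Hypothetical driver-side loop: retry once with the suggested count."""
    rc = check_req_msi(req_msi, nvec)
    if rc > 0:
        nvec = rc
        rc = check_req_msi(req_msi, nvec)
    return nvec if rc == 0 else rc


print(check_req_msi(req_msi=4, nvec=8))
print(request_with_retry(req_msi=4, nvec=8))
```

Returning the supported count instead of -ENOSPC lets the caller shrink its request in a single retry rather than guessing.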
  22. 11 Feb, 2009 6 commits
  23. 10 Feb, 2009 1 commit
  24. 28 Jan, 2009 1 commit
  25. 13 Jan, 2009 2 commits