1. 25 Jul, 2008 13 commits
    • powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O · 22e1a4dd
      Nathan Fontenot committed
      Verify memory entitlement updates can be handled by vio.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: vio bus support for CMO · a90ab95a
      Robert Jennings committed
      This is a large patch but the normal code path is not affected.  For
      non-pSeries platforms the code is ifdef'ed out and for non-CMO enabled
      pSeries systems this does not affect the normal code path.  Devices that
      do not perform DMA operations do not need modification with this patch.
      The function get_desired_dma was renamed from get_io_entitlement for
      clarity.
      
      Overview
      
      Cooperative Memory Overcommitment (CMO) allows for a set of OS partitions
      to be run with less RAM than the aggregate needs of the group of
      partitions.  The firmware will balance memory between the partitions
      and page in/out memory as needed.  Based on the number and type of IO
      adapters present, each partition is allocated an amount of memory for
      DMA operations and this allocation will be guaranteed to the partition;
      this is referred to as the partition's 'entitlement'.
      
      Partitions running in a CMO environment can only have virtual IO devices
      present.  The VIO bus layer will manage the IO entitlement for the system.
      Accounting, at a system and per-device level, is tracked in the VIO bus
      code and exposed via sysfs.  A set of dma_ops functions are added to
      the bus to allow for this accounting.
      
      Bus initialization
      
      At initialization, the bus will calculate the minimum needs of the system
      based on providing each device present with a standard minimum entitlement
      along with a spare allocation for the bus to handle hotplug events.
      If the minimum needs cannot be met, the system boot will be halted.
      
      Device changes
      
      The significant changes for devices while running under CMO are that the
      devices must specify how much dedicated IO entitlement they desire and
      must also handle DMA mapping errors that can occur due to constrained
      IO memory.  The virtual IO drivers are modified to silence errors when
      DMA mappings fail for CMO and handle these failures gracefully.
      
      Each device will be guaranteed a minimum entitlement that can always
      be mapped.  Devices will specify how much entitlement they desire and
      the VIO bus will attempt to provide for this.  Devices can change their
      desired entitlement level at any point in time to address particular needs
      (via vio_cmo_set_dev_desired()), not just at device probe time.
      
      VIO bus changes
      
      The system will have a particular entitlement level available from which
      it can provide memory to the devices.  The bus defines two pools of memory
      within this entitlement, the reserve and excess pools.  Each device is
      provided with its own entitlement, no less than a system-defined minimum
      entitlement and no greater than what the device has specified as its
      desired entitlement.  The entitlement provided to devices comes from the
      reserve pool.  The reserve pool can also contain a spare allocation as
      large as the system-defined minimum entitlement, which is used for device
      hotplug events.  Any entitlement not needed to fulfill the needs of the
      reserve pool is placed in the excess pool.  Each device is guaranteed
      that it can map up to its entitled level; additional mappings are possible
      as long as there is unmapped memory in the excess pool.
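The mapping rule described above can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the kernel's VIO bus code: the structures and the functions `cmo_try_map()` and `cmo_unmap()` are hypothetical names, sizes are plain byte counts, and integer overflow is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device accounting; names are illustrative only. */
struct cmo_dev {
    size_t entitled;   /* guaranteed amount, drawn from the reserve pool */
    size_t allocated;  /* amount currently mapped by this device */
};

/* Hypothetical bus-wide state for the excess pool. */
struct cmo_bus {
    size_t excess_size; /* entitlement not needed by the reserve pool */
    size_t excess_used; /* excess memory currently mapped by all devices */
};

/* Portion of an allocation that lies above the device's entitlement. */
static size_t over_entitlement(size_t allocated, size_t entitled)
{
    return allocated > entitled ? allocated - entitled : 0;
}

/*
 * Account for a mapping of `size` bytes.  Mappings up to the device's
 * entitlement always succeed; anything beyond it must fit in the unused
 * part of the excess pool.  Returns false to model a DMA mapping failure.
 */
static bool cmo_try_map(struct cmo_bus *bus, struct cmo_dev *dev, size_t size)
{
    size_t before = over_entitlement(dev->allocated, dev->entitled);
    size_t after = over_entitlement(dev->allocated + size, dev->entitled);
    size_t need_excess = after - before;

    assert(bus->excess_used <= bus->excess_size);
    if (need_excess > bus->excess_size - bus->excess_used)
        return false;

    bus->excess_used += need_excess;
    dev->allocated += size;
    return true;
}

/* Release `size` bytes, returning any excess-pool share first. */
static void cmo_unmap(struct cmo_bus *bus, struct cmo_dev *dev, size_t size)
{
    size_t before = over_entitlement(dev->allocated, dev->entitled);

    assert(size <= dev->allocated);
    dev->allocated -= size;
    bus->excess_used -= before - over_entitlement(dev->allocated, dev->entitled);
}
```

In this model a device with an entitlement of 50 can always map 50 bytes; a further mapping succeeds only while the excess pool still has room for the part above the entitlement.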
      
      Bus probe
      
      As the system starts, each device is given an entitlement equal only
      to the system defined minimum entitlement.  The reserve pool is equal
      to the sum of these entitlements, plus a spare allocation.  The VIO bus
      also tracks the aggregate desired entitlement of all the devices.  If the
      system desired entitlement is greater than the size of the reserve pool,
      when devices unmap IO memory it will be reserved and a balance operation
      will be scheduled for some time in the future.
      
      Entitlement balancing
      
      The balance function tries to fairly distribute entitlement between the
      devices in the system with the goal of providing each device with its
      desired amount of entitlement.  Devices using more than what would be
      ideal will have their entitled set-point adjusted; this will effectively
      set a goal for lower IO memory usage as future mappings can fail and
      deallocations will trigger a balance operation to distribute the newly
      unmapped memory.  A fair distribution of entitlement can take several
      balance operations to achieve.  Entitlement changes and device DLPAR
      events will alter the state of CMO and will trigger balance operations.
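One possible fair-share scheme for a single balance pass is sketched below. It is an assumption-laden illustration, not the algorithm in the patch: `struct bal_dev` and `cmo_balance()` are hypothetical, and the sketch ignores current IO memory usage, which the real balance function has to take into account.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of a device for balancing purposes. */
struct bal_dev {
    size_t desired;   /* entitlement the device asked for */
    size_t entitled;  /* entitlement assigned by this pass */
};

/*
 * Give every device the minimum entitlement, then share what is left in
 * equal rounds among the devices that are still below their desired
 * level.  Returns the amount left over (a candidate for the excess pool).
 */
static size_t cmo_balance(struct bal_dev *devs, size_t n,
                          size_t avail, size_t min_ent)
{
    size_t i;

    assert(avail >= n * min_ent); /* minimum needs must be met */
    for (i = 0; i < n; i++)
        devs[i].entitled = min_ent;
    avail -= n * min_ent;

    for (;;) {
        size_t needy = 0;
        size_t chunk;

        for (i = 0; i < n; i++)
            if (devs[i].entitled < devs[i].desired)
                needy++;
        if (needy == 0 || avail == 0)
            break;

        /* Equal share for this round, at least one unit to make progress. */
        chunk = avail / needy;
        if (chunk == 0)
            chunk = 1;

        for (i = 0; i < n && avail > 0; i++) {
            size_t want = devs[i].desired > devs[i].entitled ?
                          devs[i].desired - devs[i].entitled : 0;
            size_t give = want < chunk ? want : chunk;

            if (give > avail)
                give = avail;
            devs[i].entitled += give;
            avail -= give;
        }
    }
    return avail;
}
```

With three devices desiring 10, 100 and 40 units, a minimum of 10 and 120 units available, this sketch ends with entitlements of 10, 70 and 40 and nothing left over.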
      
      Hotplug events
      
      The VIO bus allows for changes in system entitlement at run-time via
      'vio_cmo_entitlement_update()'.  When devices are added the hotplug
      device event will be preceded by a system entitlement increase and this
      is reversed when devices are removed.
      
      The following changes are made to the VIO bus layer for CMO:
       * add IO memory accounting per device structure.
       * add IO memory entitlement query function to driver structure.
       * during vio bus probe, if CMO is enabled, check that driver has
         memory entitlement query function defined.  Fail if function not defined.
       * fail to register driver if io entitlement function not defined.
       * create set of dma_ops at vio level for CMO that will track allocations
         and return DMA failures once entitlement is reached.  Entitlement will
         be limited by overall system entitlement.  Devices will have a reserved
         quantity of memory that is guaranteed, the rest can be used as available.
       * expose entitlement, current allocation, desired allocation, and the
         allocation error counter for devices to the user through sysfs
       * provide mechanism for changing a device's desired entitlement at run time
         for devices as an exported function and sysfs tunable
       * track any DMA failures for entitled IO memory for each vio device.
       * check entitlement against available system entitlement on device add
       * track entitlement metrics (high water mark, current usage)
       * provide function to reset high water mark
       * provide minimum and desired entitlement numbers at a bus level
       * provide drivers with a minimum guaranteed entitlement
       * balance available entitlement between devices to satisfy their needs
       * handle system entitlement changes and device hotplug
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: iommu enablement for CMO · 6490c490
      Robert Jennings committed
      To support Cooperative Memory Overcommitment (CMO), we need to check
      for failure from some of the tce hcalls.
      
      These changes for the pseries platform affect the powerpc architecture;
      patches for the other affected platforms are included in this patch.
      
      pSeries platform IOMMU code changes:
       * platform TCE functions must handle H_NOT_ENOUGH_RESOURCES errors and
         return an error.
      
      Architecture IOMMU code changes:
       * Calls to ppc_md.tce_build need to check return values and return
         DMA_MAPPING_ERROR for transient errors.
      
      Architecture changes:
       * struct machdep_calls for tce_build*_pSeriesLP functions need to change
         to indicate failure.
       * all other platforms will need updates to iommu functions to match the new
         calling semantics; they will return 0 on success.  The other platforms
         default configs have been built, but no further testing was performed.
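The new calling convention described above (a build hook that reports failure, and a caller that turns that failure into a mapping error) can be sketched in user-space C. Everything here is a stand-in: the `sketch_` names, the error sentinel value and the fake return codes are assumptions for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the mapping error value returned to DMA API users. */
#define SKETCH_DMA_MAPPING_ERROR (~(uint64_t)0)

/* Stand-ins for hypervisor return codes; the values are made up. */
enum sketch_hcall_rc {
    SKETCH_H_SUCCESS = 0,
    SKETCH_H_NOT_ENOUGH_RESOURCES = 1
};

/* Hook type: returns 0 on success and non-zero on failure. */
typedef int (*sketch_tce_build_fn)(uint64_t index, uint64_t npages);

/* A platform whose build hook cannot fail simply returns 0. */
static int sketch_tce_build_ok(uint64_t index, uint64_t npages)
{
    (void)index;
    (void)npages;
    return 0;
}

/* A constrained platform: pretend the firmware refused the request. */
static int sketch_tce_build_constrained(uint64_t index, uint64_t npages)
{
    enum sketch_hcall_rc rc = SKETCH_H_NOT_ENOUGH_RESOURCES;

    (void)index;
    (void)npages;
    return rc == SKETCH_H_SUCCESS ? 0 : -1;
}

/* Caller side: check the hook's result instead of assuming success. */
static uint64_t sketch_iommu_map(sketch_tce_build_fn build,
                                 uint64_t index, uint64_t npages)
{
    assert(build != NULL);
    if (build(index, npages) != 0)
        return SKETCH_DMA_MAPPING_ERROR; /* transient failure */
    return index << 12; /* fake bus address: table index times 4 KiB */
}
```

A driver using such an interface would compare the returned address with the error sentinel and retry or back off rather than treat the failure as fatal.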
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Olof Johansson <olof@lixom.net>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: Add CMO paging statistics · ffa5abbd
      Brian King committed
      With the addition of Cooperative Memory Overcommitment (CMO) support
      for IBM Power Systems, two fields have been added to the VPA to report
      paging statistics.  Add support in lparcfg to report them to userspace.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: Split retrieval of processor entitlement data into a helper routine · 398778f7
      Robert Jennings committed
      Split the retrieval of processor entitlement data returned in the H_GET_PPP
      hcall into its own helper routine.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: Add memory entitlement capabilities to /proc/ppc64/lparcfg · dfc3403f
      Nathan Fontenot committed
      Update /proc/ppc64/lparcfg to display Cooperative Memory
      Overcommitment statistics as reported by the H_GET_MPP hcall.  This
      also updates the lparcfg interface to allow setting memory entitlement
      and weight.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: Split processor entitlement retrieval and gathering to helper routines · 11529396
      Nathan Fontenot committed
      Split the retrieval and setting of processor entitlement and weight into
      helper routines.  This also removes the printing of the raw values
      returned from h_get_ppp, since the values are already parsed and printed.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc/pseries: Remove extraneous error reporting for hcall failures in lparcfg · 545500b3
      Nathan Fontenot committed
      Remove the extraneous error reporting used when an hcall made from lparcfg fails.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Fix compile error with binutils 2.15 · 80c60bf9
      Segher Boessenkool committed
      My previous patch to fix compilation with binutils-2.17 causes
      a "file truncated" build error from ld with binutils 2.15 (and
      possibly older), and a warning with 2.16 and 2.17.
      
      This fixes it.
      Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
      Acked-by: Chuck Meade <chuckmeade@mindspring.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: BookE hardware watchpoint support · d6a61bfc
      Luis Machado committed
      This patch implements support for HW-based watchpoints via the
      DBSR_DAC (Data Address Compare) facility of the BookE processors.
      
      It does so by interfacing with the existing DABR breakpoint code
      and adding the necessary bits and pieces for the new bits to
      be properly set or cleared.
      Signed-off-by: Luis Machado <luisgpm@br.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Fallout from sysdev API changes · 00bf6e90
      Stephen Rothwell committed
      A struct sysdev_attribute * parameter was added to the show routine by
      commit 4a0b2b4d "sysdev: Pass the
      attribute to the low level sysdev show/store function".
      
      This eliminates a warning:
      
      arch/powerpc/kernel/sysfs.c:538: warning: initialization from incompatible pointer type
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • powerpc: Enable AT_BASE_PLATFORM aux vector · 9115d134
      Nathan Lynch committed
      Stash the first platform string matched by identify_cpu() in
      powerpc_base_platform, and supply that to the ELF loader for the value
      of AT_BASE_PLATFORM.
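From user space, an auxiliary vector entry like this one can typically be read with glibc's `getauxval()`. The sketch below assumes a Linux system with glibc 2.16 or later; the helper name `base_platform()` is hypothetical. On kernels or architectures that do not supply the entry, `getauxval()` returns 0 and the helper returns NULL.

```c
#include <elf.h>       /* AT_BASE_PLATFORM */
#include <stddef.h>
#include <sys/auxv.h>  /* getauxval() */

/*
 * Return the base platform string supplied by the kernel in the
 * auxiliary vector, or NULL when the entry is absent.
 */
static const char *base_platform(void)
{
    unsigned long value = getauxval(AT_BASE_PLATFORM);

    return value != 0 ? (const char *)value : NULL;
}
```

A caller would check the result for NULL before printing or comparing the string.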
      Signed-off-by: Nathan Lynch <ntl@pobox.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures · 27ac792c
      Andrea Righi committed
      On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
      boundary. For example:
      
      	u64 val = PAGE_ALIGN(size);
      
      always returns a value < 4GB even if size is greater than 4GB.
      
      The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
      example):
      
      #define PAGE_SHIFT      12
      #define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
      #define PAGE_MASK       (~(PAGE_SIZE-1))
      ...
      #define PAGE_ALIGN(addr)       (((addr)+PAGE_SIZE-1)&PAGE_MASK)
      
      The "~" is performed on a 32-bit value, so any value greater than 4GB
      that is ANDed with PAGE_MASK is truncated to the 32-bit boundary.
      Using the ALIGN() macro seems to be the right way, because it uses
      typeof(addr) for the mask.
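The truncation can be reproduced on any host with a small sketch in which `uint32_t` stands in for `unsigned long` on a 32-bit architecture. The `SK_` macros and the two helper functions are illustrative names, not the kernel's definitions, and `__typeof__` is a GCC/Clang extension.

```c
#include <stdint.h>

#define SK_PAGE_SHIFT 12
/* uint32_t models a 32-bit `unsigned long`. */
#define SK_PAGE_SIZE  ((uint32_t)1 << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

/* Old style: the mask is only 32 bits wide, so it is zero-extended
 * before the AND and clears the upper half of a 64-bit operand. */
#define SK_PAGE_ALIGN_OLD(addr) (((addr) + SK_PAGE_SIZE - 1) & SK_PAGE_MASK)

/* ALIGN()-style: the mask is computed in the type of the argument. */
#define SK_ALIGN(x, a) \
    (((x) + ((__typeof__(x))(a) - 1)) & ~((__typeof__(x))(a) - 1))
#define SK_PAGE_ALIGN_NEW(addr) SK_ALIGN(addr, SK_PAGE_SIZE)

static uint64_t page_align_old(uint64_t size)
{
    return SK_PAGE_ALIGN_OLD(size);
}

static uint64_t page_align_new(uint64_t size)
{
    return SK_PAGE_ALIGN_NEW(size);
}
```

For a size of 4GB plus one byte (0x100000001), `page_align_old()` yields 0x1000 while `page_align_new()` yields 0x100001000.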
      
      Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h
      into include/linux/mm.h.
      
      See also lkml discussion: http://lkml.org/lkml/2008/6/11/237
      
      [akpm@linux-foundation.org: fix drivers/media/video/uvc/uvc_queue.c]
      [akpm@linux-foundation.org: fix v850]
      [akpm@linux-foundation.org: fix powerpc]
      [akpm@linux-foundation.org: fix arm]
      [akpm@linux-foundation.org: fix mips]
      [akpm@linux-foundation.org: fix drivers/media/video/pvrusb2/pvrusb2-dvb.c]
      [akpm@linux-foundation.org: fix drivers/mtd/maps/uclinux.c]
      [akpm@linux-foundation.org: fix powerpc]
      Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 24 Jul, 2008 1 commit
    • kgdb, powerpc: arch specific powerpc kgdb support · 17ce452f
      Jason Wessel committed
      This patch removes the old kgdb remnants from ARCH=powerpc and
      implements the new style arch specific stub for the common kgdb core
      interface.
      
      It is possible to have xmon and kgdb in the same kernel, but you
      cannot use both at the same time because there is only one set of
      debug hooks.
      
      The arch specific kgdb implementation saves the previous state of the
      debug hooks and restores them if you unconfigure the kgdb I/O driver.
      Kgdb should have no impact on a kernel that has no kgdb I/O driver
      configured.
      Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
  3. 22 Jul, 2008 6 commits
  4. 19 Jul, 2008 1 commit
    • nohz: prevent tick stop outside of the idle loop · b8f8c3cf
      Thomas Gleixner committed
      Jack Ren and Eric Miao tracked down the following long standing
      problem in the NOHZ code:
      
      	scheduler switch to idle task
      	enable interrupts
      
      Window starts here
      
      	----> interrupt happens (does not set NEED_RESCHED)
      	      	irq_exit() stops the tick
      
      	----> interrupt happens (does set NEED_RESCHED)
      
      	return from schedule()
      	
      	cpu_idle(): preempt_disable();
      
      Window ends here
      
      The interrupts can happen at any point inside the race window. The
      first interrupt stops the tick, the second one causes the scheduler to
      rerun and switch away from idle again and we end up with the tick
      disabled.
      
      The fact that it needs two interrupts where the first one does not set
      NEED_RESCHED and the second one does made the bug obscure and extremely
      hard to reproduce and analyse. Kudos to Jack and Eric.
      
      Solution: Limit the NOHZ functionality to the idle loop to make sure
      that we cannot run into such a situation ever again.
      
      cpu_idle()
      {
      	preempt_disable();
      
      	while(1) {
      		 tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we
      		 			          are in the idle loop
      
      		 while (!need_resched())
      		       halt();
      
      		 tick_nohz_restart_sched_tick(); <- disables NOHZ mode
      		 preempt_enable_no_resched();
      		 schedule();
      		 preempt_disable();
      	}
      }
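One way to enforce "only inside the idle loop" is a per-CPU flag that the idle loop sets and that the interrupt-exit path must see before it may stop the tick. The user-space model below is a sketch under that assumption; the `sk_` names and the structure are hypothetical, and the real functions do considerably more work.

```c
#include <stdbool.h>

/* Hypothetical per-CPU tick state for this model. */
struct sk_tick_state {
    bool inidle;        /* set while the CPU is inside the idle loop */
    bool tick_stopped;  /* true while the periodic tick is stopped */
};

/*
 * Called with inidle != 0 from the idle loop and with inidle == 0 from
 * interrupt exit.  An interrupt that arrives outside the idle loop must
 * leave the tick running.
 */
static void sk_tick_nohz_stop_sched_tick(struct sk_tick_state *ts, int inidle)
{
    if (inidle)
        ts->inidle = true;
    else if (!ts->inidle)
        return; /* not in the idle loop: do nothing */

    ts->tick_stopped = true;
}

/* Called when leaving the idle loop: restart the tick, clear the flag. */
static void sk_tick_nohz_restart_sched_tick(struct sk_tick_state *ts)
{
    ts->inidle = false;
    ts->tick_stopped = false;
}
```

In this model, the interrupt in the race window calls the stop function with the flag still clear, so the tick keeps running until the idle loop itself asks for it to be stopped.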
      
      In hindsight we should have done this forever, but ... 
      
      /me grabs a large brown paperbag.
      
      Debugged-by: Jack Ren <jack.ren@marvell.com>
      Debugged-by: eric miao <eric.y.miao@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  5. 17 Jul, 2008 2 commits
  6. 15 Jul, 2008 5 commits
  7. 14 Jul, 2008 2 commits
    • generic-ipi: powerpc/generic-ipi tree build failure · 7798ed0f
      Stephen Rothwell committed
      Today's linux-next build (powerpc allmodconfig) failed like this:
      
      ERROR: ".save_stack_trace" [tests/backtracetest.ko] undefined!
      
      But save_stack_trace is exported in arch/powerpc/kernel/stacktrace.c
      
      I couldn't figure it out until I noticed these earlier warnings:
      
      arch/powerpc/kernel/stacktrace.c:47: warning: data definition has no type or storage class
      arch/powerpc/kernel/stacktrace.c:47: warning: type defaults to 'int' in declaration of 'EXPORT_SYMBOL_GPL'
      arch/powerpc/kernel/stacktrace.c:47: warning: parameter names (without types) in function declaration
      
      I applied the patch below.
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: <linuxppc-dev@ozlabs.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • powerpc/booke: don't reinitialize time base · ddb107e9
      Kumar Gala committed
      For some reason long ago I decided that we should zero out the time base
      when we calibrate the decrementer.  The problem is that this can be
      harmful in SMP systems where the firmware has already synchronized the
      time bases on the various cores.
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
  8. 10 Jul, 2008 1 commit
  9. 09 Jul, 2008 9 commits