1. 20 8月, 2008 3 次提交
    • B
      powerpc: Fix vio_bus_probe oops on probe error · cd5aeb9f
      Brian King 提交于
      When CMO is enabled and booted on a non CMO system and the VIO
      device's probe function fails, an oops can result since
      vio_cmo_bus_remove is called when it should not.  This fixes it by
      avoiding the vio_cmo_bus_remove call on platforms that don't implement
      CMO.
      
      cpu 0x0: Vector: 300 (Data Access) at [c00000000e13b3d0]
          pc: c000000000020d34: .vio_cmo_bus_remove+0xc0/0x1f4
          lr: c000000000020ca4: .vio_cmo_bus_remove+0x30/0x1f4
          sp: c00000000e13b650
         msr: 8000000000009032
         dar: 0
       dsisr: 40000000
        current = 0xc00000000e0566c0
        paca    = 0xc0000000006f9b80
          pid   = 2428, comm = modprobe
      enter ? for help
      [c00000000e13b6e0] c000000000021d94 .vio_bus_probe+0x2f8/0x33c
      [c00000000e13b7a0] c00000000029fc88 .driver_probe_device+0x13c/0x200
      [c00000000e13b830] c00000000029fdac .__driver_attach+0x60/0xa4
      [c00000000e13b8c0] c00000000029f050 .bus_for_each_dev+0x80/0xd8
      [c00000000e13b980] c00000000029f9ec .driver_attach+0x28/0x40
      [c00000000e13ba00] c00000000029f630 .bus_add_driver+0xd4/0x284
      [c00000000e13baa0] c0000000002a01bc .driver_register+0xc4/0x198
      [c00000000e13bb50] c00000000002168c .vio_register_driver+0x40/0x5c
      [c00000000e13bbe0] d0000000003b3f1c .ibmvfc_module_init+0x70/0x109c [ibmvfc]
      [c00000000e13bc70] c0000000000acf08 .sys_init_module+0x184c/0x1a10
      [c00000000e13be30] c000000000008748 syscall_exit+0x0/0x40
      Signed-off-by: NBrian King <brking@linux.vnet.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      cd5aeb9f
    • J
      powerpc/ibmebus: Restore "name" sysfs attribute on ibmebus devices · 4589f1fe
      Joachim Fenkes 提交于
      Recent of_platform changes made of_bus_type_init() overwrite the bus
      type's .dev_attrs list, meaning that the "name" attribute that ibmebus
      devices previously had is no longer present.  This is a user-visible
      regression which breaks the userspace eHCA support, since the eHCA
      userspace driver relies on the name attribute to check for valid
      adapters.
      
      This fixes it by providing the "name" attribute in the generic OF
      device code instead.  Tested on POWER.
      Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      4589f1fe
    • M
      powerpc: Fix /dev/oldmem interface for kdump · 7230ced4
      Michael Ellerman 提交于
      A change to __ioremap() broke reading /dev/oldmem because we're no
      longer able to ioremap pfn 0 (d177c207, "[PATCH] powerpc: IOMMU: don't
      ioremap null addresses").
      
      We actually don't need to ioremap for anything that's part of the linear
      mapping, so just read it directly.
      
      Also make sure we're only reading one page or less at a time.
      Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
      Signed-off-by: NSachin Sant <sachinp@in.ibm.com>
      Signed-off-by: NPaul Mackerras <paulus@samba.org>
      7230ced4
  2. 18 8月, 2008 5 次提交
  3. 15 8月, 2008 1 次提交
  4. 11 8月, 2008 2 次提交
  5. 04 8月, 2008 1 次提交
  6. 30 7月, 2008 4 次提交
  7. 28 7月, 2008 14 次提交
  8. 27 7月, 2008 2 次提交
    • A
      SL*B: drop kmem cache argument from constructor · 51cc5068
      Alexey Dobriyan 提交于
      Kmem cache passed to constructor is only needed for constructors that are
      themselves multiplexeres.  Nobody uses this "feature", nor does anybody uses
      passed kmem cache in non-trivial way, so pass only pointer to object.
      
      Non-trivial places are:
      	arch/powerpc/mm/init_64.c
      	arch/powerpc/mm/hugetlbpage.c
      
      This is flag day, yes.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Matt Mackall <mpm@selenic.com>
      [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
      [akpm@linux-foundation.org: fix mm/slab.c]
      [akpm@linux-foundation.org: fix ubifs]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      51cc5068
    • H
      kexec jump · 3ab83521
      Huang Ying 提交于
      This patch provides an enhancement to kexec/kdump.  It implements the
      following features:
      
      - Backup/restore memory used by the original kernel before/after
        kexec.
      
      - Save/restore CPU state before/after kexec.
      
      The features of this patch can be used as a general method to call program in
      physical mode (paging turning off).  This can be used to call BIOS code under
      Linux.
      
      kexec-tools needs to be patched to support kexec jump. The patches and
      the precompiled kexec can be download from the following URL:
      
             source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
             patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
             binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10
      
      Usage example of calling some physical mode code and return:
      
      1. Compile and install patched kernel with following options selected:
      
      CONFIG_X86_32=y
      CONFIG_KEXEC=y
      CONFIG_PM=y
      CONFIG_KEXEC_JUMP=y
      
      2. Build patched kexec-tool or download the pre-built one.
      
      3. Build some physical mode executable named such as "phy_mode"
      
      4. Boot kernel compiled in step 1.
      
      5. Load physical mode executable with /sbin/kexec. The shell command
         line can be as follow:
      
         /sbin/kexec --load-preserve-context --args-none phy_mode
      
      6. Call physical mode executable with following shell command line:
      
         /sbin/kexec -e
      
      Implementation point:
      
      To support jumping without reserving memory.  One shadow backup page (source
      page) is allocated for each page used by kexeced code image (destination
      page).  When do kexec_load, the image of kexeced code is loaded into source
      pages, and before executing, the destination pages and the source pages are
      swapped, so the contents of destination pages are backupped.  Before jumping
      to the kexeced code image and after jumping back to the original kernel, the
      destination pages and the source pages are swapped too.
      
      C ABI (calling convention) is used as communication protocol between
      kernel and called code.
      
      A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
      indicate that the loaded kernel image is used for jumping back.
      
      Now, only the i386 architecture is supported.
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3ab83521
  9. 26 7月, 2008 3 次提交
    • N
      powerpc: Fix boot problem due to AT_BASE_PLATFORM change · fc532f81
      Nathan Lynch 提交于
      Commit 9115d134 ("powerpc: Enable
      AT_BASE_PLATFORM aux vector") broke boot on 32-bit powerpc systems; we
      have to use PTRRELOC to initialize powerpc_base_platform this early in
      boot.
      
      Bug reported by Jon Smirl.
      Signed-off-by: NNathan Lynch <ntl@pobox.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      fc532f81
    • K
      powerpc: clean up the Book-E HW watchpoint support · 0b21bb49
      Kumar Gala 提交于
      * CONFIG_BOOKE is selected by CONFIG_44x so we dont need both
      * Fixed a few comments
      * Go back to only using DBCR0_IDM to determine if we are using
        debug resources.
      Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
      0b21bb49
    • S
      kprobes: improve kretprobe scalability with hashed locking · ef53d9c5
      Srinivasa D S 提交于
      Currently list of kretprobe instances are stored in kretprobe object (as
      used_instances,free_instances) and in kretprobe hash table.  We have one
      global kretprobe lock to serialise the access to these lists.  This causes
      only one kretprobe handler to execute at a time.  Hence affects system
      performance, particularly on SMP systems and when return probe is set on
      lot of functions (like on all systemcalls).
      
      Solution proposed here gives fine-grain locks that performs better on SMP
      system compared to present kretprobe implementation.
      
      Solution:
      
       1) Instead of having one global lock to protect kretprobe instances
          present in kretprobe object and kretprobe hash table.  We will have
          two locks, one lock for protecting kretprobe hash table and another
          lock for kretporbe object.
      
       2) We hold lock present in kretprobe object while we modify kretprobe
          instance in kretprobe object and we hold per-hash-list lock while
          modifying kretprobe instances present in that hash list.  To prevent
          deadlock, we never grab a per-hash-list lock while holding a kretprobe
          lock.
      
       3) We can remove used_instances from struct kretprobe, as we can
          track used instances of kretprobe instances using kretprobe hash
          table.
      
      Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
      with return probes set on all systemcalls looks like this.
      
      cacheline              non-cacheline             Un-patched kernel
      aligned patch 	       aligned patch
      ===============================================================================
      real    9m46.784s       9m54.412s                  10m2.450s
      user    40m5.715s       40m7.142s                  40m4.273s
      sys     2m57.754s       2m58.583s                  3m17.430s
      ===========================================================
      
      Time duration for kernel compilation ("make -j 8) on the same system, when
      kernel is not probed.
      =========================
      real    9m26.389s
      user    40m8.775s
      sys     2m7.283s
      =========================
      Signed-off-by: NSrinivasa DS <srinivasa@in.ibm.com>
      Signed-off-by: NJim Keniston <jkenisto@us.ibm.com>
      Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ef53d9c5
  10. 25 7月, 2008 5 次提交
    • N
      powerpc/pseries: Remove kmalloc call in handling writes to lparcfg · 16c14b46
      Nathan Fontenot 提交于
      There are only 4 valid name=value pairs for writes to
      /proc/ppc64/lparcfg.  Current code allocates a buffer to copy
      this information in from the user.  Since the longest name=value
      pair will easily fit into a buffer of 64 characters, simply
      put the buffer on the stack instead of allocating the buffer.
      Signed-off-by: NNathan Fotenot <nfont@austin.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      16c14b46
    • N
      powerpc/pseries: Update arch vector to indicate support for CMO · 8391e42a
      Nathan Fontenot 提交于
      Update the architecture vector to indicate that Cooperative Memory
      Overcommitment is supported if CONFIG_PPC_SMLPAR is set.
      Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      8391e42a
    • N
      powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O · 22e1a4dd
      Nathan Fontenot 提交于
      Verify memory entitlement updates can be handled by vio.
      Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      22e1a4dd
    • R
      powerpc/pseries: vio bus support for CMO · a90ab95a
      Robert Jennings 提交于
      This is a large patch but the normal code path is not affected.  For
      non-pSeries platforms the code is ifdef'ed out and for non-CMO enabled
      pSeries systems this does not affect the normal code path.  Devices that
      do not perform DMA operations do not need modification with this patch.
      The function get_desired_dma was renamed from get_io_entitlement for
      clarity.
      
      Overview
      
      Cooperative Memory Overcommitment (CMO) allows for a set of OS partitions
      to be run with less RAM than the aggregate needs of the group of
      partitions.  The firmware will balance memory between the partitions
      and page in/out memory as needed.  Based on the number and type of IO
      adpaters preset each partition is allocated an amount of memory for
      DMA operations and this allocation will be guaranteed to the partition;
      this is referred to as the partition's 'entitlement'.
      
      Partitions running in a CMO environment can only have virtual IO devices
      present.  The VIO bus layer will manage the IO entitlement for the system.
      Accounting, at a system and per-device level, is tracked in the VIO bus
      code and exposed via sysfs.  A set of dma_ops functions are added to
      the bus to allow for this accounting.
      
      Bus initialization
      
      At initialization, the bus will calculate the minimum needs of the system
      based on providing each device present with a standard minimum entitlement
      along with a spare allocation for the bus to handle hotplug events.
      If the minimum needs can not be met the system boot will be halted.
      
      Device changes
      
      The significant changes for devices while running under CMO are that the
      devices must specify how much dedicated IO entitlement they desire and
      must also handle DMA mapping errors that can occur due to constrained
      IO memory.  The virtual IO drivers are modified to silence errors when
      DMA mappings fail for CMO and handle these failures gracefully.
      
      Each devices will be guaranteed a minimum entitlement that can always
      be mapped.  Devices will specify how much entitlement they desire and
      the VIO bus will attempt to provide for this.  Devices can change their
      desired entitlement level at any point in time to address particular needs
      (via vio_cmo_set_dev_desired()), not just at device probe time.
      
      VIO bus changes
      
      The system will have a particular entitlement level available from which
      it can provide memory to the devices.  The bus defines two pools of memory
      within this entitlement, the reserved and excess pools.  Each device is
      provided with it's own entitlement no less than a system defined minimum
      entitlement and no greater than what the device has specified as it's
      desired entitlement.  The entitlement provided to devices comes from the
      reserve pool.  The reserve pool can also contain a spare allocation as
      large as the system defined minimum entitlement which is used for device
      hotplug events.  Any entitlement not needed to fulfill the needs of a
      reserve pool is placed in the excess pool.  Each device is guaranteed
      that it can map up to it's entitled level; additional mapping are possible
      as long as there is unmapped memory in the excess pool.
      
      Bus probe
      
      As the system starts, each device is given an entitlement equal only
      to the system defined minimum entitlement.  The reserve pool is equal
      to the sum of these entitlements, plus a spare allocation.  The VIO bus
      also tracks the aggregate desired entitlement of all the devices.  If the
      system desired entitlement is greater than the size of the reserve pool,
      when devices unmap IO memory it will be reserved and a balance operation
      will be scheduled for some time in the future.
      
      Entitlement balancing
      
      The balance function tries to fairly distribute entitlement between the
      devices in the system with the goal of providing each device with it's
      desired amount of entitlement.  Devices using more than what would be
      ideal will have their entitled set-point adjusted; this will effectively
      set a goal for lower IO memory usage as future mappings can fail and
      deallocations will trigger a balance operation to distribute the newly
      unmapped memory.  A fair distribution of entitlement can take several
      balance operations to achieve.  Entitlement changes and device DLPAR
      events will alter the state of CMO and will trigger balance operations.
      
      Hotplug events
      
      The VIO bus allows for changes in system entitlement at run-time via
      'vio_cmo_entitlement_update()'.  When devices are added the hotplug
      device event will be preceded by a system entitlement increase and this
      is reversed when devices are removed.
      
      The following changes are made that the VIO bus layer for CMO:
       * add IO memory accounting per device structure.
       * add IO memory entitlement query function to driver structure.
       * during vio bus probe, if CMO is enabled, check that driver has
         memory entitlement query function defined.  Fail if function not defined.
       * fail to register driver if io entitlement function not defined.
       * create set of dma_ops at vio level for CMO that will track allocations
         and return DMA failures once entitlement is reached.  Entitlement will
         limited by overall system entitlement.  Devices will have a reserved
         quantity of memory that is guaranteed, the rest can be used as available.
       * expose entitlement, current allocation, desired allocation, and the
         allocation error counter for devices to the user through sysfs
       * provide mechanism for changing a device's desired entitlement at run time
         for devices as an exported function and sysfs tunable
       * track any DMA failures for entitled IO memory for each vio device.
       * check entitlement against available system entitlement on device add
       * track entitlement metrics (high water mark, current usage)
       * provide function to reset high water mark
       * provide minimum and desired entitlement numbers at a bus level
       * provide drivers with a minimum guaranteed entitlement
       * balance available entitlement between devices to satisfy their needs
       * handle system entitlement changes and device hotplug
      Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a90ab95a
    • R
      powerpc/pseries: iommu enablement for CMO · 6490c490
      Robert Jennings 提交于
      To support Cooperative Memory Overcommitment (CMO), we need to check
      for failure from some of the tce hcalls.
      
      These changes for the pseries platform affect the powerpc architecture;
      patches for the other affected platforms are included in this patch.
      
      pSeries platform IOMMU code changes:
       * platform TCE functions must handle H_NOT_ENOUGH_RESOURCES errors and
         return an error.
      
      Architecture IOMMU code changes:
       * Calls to ppc_md.tce_build need to check return values and return
         DMA_MAPPING_ERROR for transient errors.
      
      Architecture changes:
       * struct machdep_calls for tce_build*_pSeriesLP functions need to change
         to indicate failure.
       * all other platforms will need updates to iommu functions to match the new
         calling semantics; they will return 0 on success.  The other platforms
         default configs have been built, but no further testing was performed.
      Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: NOlof Johansson <olof@lixom.net>
      Acked-by: NPaul Mackerras <paulus@samba.org>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      6490c490