- 20 8月, 2008 3 次提交
-
-
由 Brian King 提交于
When CMO is enabled and booted on a non CMO system and the VIO device's probe function fails, an oops can result since vio_cmo_bus_remove is called when it should not. This fixes it by avoiding the vio_cmo_bus_remove call on platforms that don't implement CMO. cpu 0x0: Vector: 300 (Data Access) at [c00000000e13b3d0] pc: c000000000020d34: .vio_cmo_bus_remove+0xc0/0x1f4 lr: c000000000020ca4: .vio_cmo_bus_remove+0x30/0x1f4 sp: c00000000e13b650 msr: 8000000000009032 dar: 0 dsisr: 40000000 current = 0xc00000000e0566c0 paca = 0xc0000000006f9b80 pid = 2428, comm = modprobe enter ? for help [c00000000e13b6e0] c000000000021d94 .vio_bus_probe+0x2f8/0x33c [c00000000e13b7a0] c00000000029fc88 .driver_probe_device+0x13c/0x200 [c00000000e13b830] c00000000029fdac .__driver_attach+0x60/0xa4 [c00000000e13b8c0] c00000000029f050 .bus_for_each_dev+0x80/0xd8 [c00000000e13b980] c00000000029f9ec .driver_attach+0x28/0x40 [c00000000e13ba00] c00000000029f630 .bus_add_driver+0xd4/0x284 [c00000000e13baa0] c0000000002a01bc .driver_register+0xc4/0x198 [c00000000e13bb50] c00000000002168c .vio_register_driver+0x40/0x5c [c00000000e13bbe0] d0000000003b3f1c .ibmvfc_module_init+0x70/0x109c [ibmvfc] [c00000000e13bc70] c0000000000acf08 .sys_init_module+0x184c/0x1a10 [c00000000e13be30] c000000000008748 syscall_exit+0x0/0x40 Signed-off-by: NBrian King <brking@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Joachim Fenkes 提交于
Recent of_platform changes made of_bus_type_init() overwrite the bus type's .dev_attrs list, meaning that the "name" attribute that ibmebus devices previously had is no longer present. This is a user-visible regression which breaks the userspace eHCA support, since the eHCA userspace driver relies on the name attribute to check for valid adapters. This fixes it by providing the "name" attribute in the generic OF device code instead. Tested on POWER. Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Michael Ellerman 提交于
A change to __ioremap() broke reading /dev/oldmem because we're no longer able to ioremap pfn 0 (d177c207, "[PATCH] powerpc: IOMMU: don't ioremap null addresses"). We actually don't need to ioremap for anything that's part of the linear mapping, so just read it directly. Also make sure we're only reading one page or less at a time. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NSachin Sant <sachinp@in.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 18 8月, 2008 5 次提交
-
-
由 Christoph Hellwig 提交于
Use the generic compat_sys_old_readdir instead of the powerpc one which is almost the same except for the almost complete lack of error handling. Note that we can't just use SYSCALL() in systbl.h because the native syscall is named old_readdir, not sys_old_readdir. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Paul Collins 提交于
Commit 163f6876 missed one, resulting in the following compile error: AS arch/powerpc/kernel/misc_32.o arch/powerpc/kernel/misc_32.S: Assembler messages: arch/powerpc/kernel/misc_32.S:902: Error: unsupported relocation against KEXEC_CONTROL_CODE_SIZE make[2]: *** [arch/powerpc/kernel/misc_32.o] Error 1 make[1]: *** [arch/powerpc/kernel] Error 2 make: *** [vmlinux] Error 2 I grepped arch/ and found no further instances. Signed-off-by: NPaul Collins <paul@ondioline.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Steven Rostedt 提交于
Doing some various "make randconfig", I came across an error when CONFIG_BUG was not set: arch/powerpc/kernel/module.c: In function 'module_find_bug': arch/powerpc/kernel/module.c:111: error: increment of pointer to unknown structure arch/powerpc/kernel/module.c:111: error: arithmetic on pointer to an incomplete type arch/powerpc/kernel/module.c:112: error: dereferencing pointer to incomplete type Looking further into this, I found that module_find_bug, defined in powerpc arch code, is not called anywhere, so this just removes it. There is a static module_find_bug in lib/bug.c but that is a separate issue. Signed-off-by: NSteven Rostedt <srostedt@redhat.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Robert Jennings 提交于
Add a field in lparcfg output to indicate whether the kernel is running on a dedicated or shared memory lpar. Added fields to show the paging space pool IDs and the CMO page size. Submitted-by: NRobert Jennings <rcj@linux.vnet.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Rocky Craig 提交于
The intent of "flush_tlbs" is to invalidate all TLB entries by doing a TLB invalidate instruction for all pages in the address range 0 to 0x00400000. A loop counter is set up at the high value and decremented by page size. However, the loop is only done once as the sense of the conditional branch at the loop end does not match the setup/decrement. This fixes it to do the whole range by correcting the branch condition. Signed-off-by: NRocky Craig <rocky.craig@hp.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 15 8月, 2008 1 次提交
-
-
由 Huang Ying 提交于
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control page is used for not only code on some platform. For example in kexec jump, it is used for data and stack too. [akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion] Signed-off-by: NHuang Ying <ying.huang@intel.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 8月, 2008 2 次提交
-
-
由 Benjamin Herrenschmidt 提交于
When we have an ISA memory hole (ie, a PCI window that allows us to generate PCI memory cycles at low PCI address) mixed with other resources using a different CPU <=> PCI mapping, we must not keep the ISA hole in the bridge resource list. If we do, things might start trying to allocate device resources in there and will get the PCI addresses wrong. This fixes it by arranging to remove the ISA memory hole resource in this case. This fixes various cases of PCMCIA breakage on PowerBooks using the MPC106 "grackle" bridge. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Nathan Fontenot 提交于
The kernel copy of the rtas args struct contains the return value(s) for the specified rtas call. These are copied back to user space with the assumption that every value has been set by the rtas call, which turns out to be not always true. Thus userspace can see random values and think the call failed when in fact it succeeded, but for some reason didn't set one of the return values. This fixes the problem by zeroing out the return value fields of the rtas args struct before processing the rtas call. Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 04 8月, 2008 1 次提交
-
-
由 Kumar Gala 提交于
Now that arch/ppc is gone and CONFIG_PPC_MERGE is always set, remove the dead code associated with !CONFIG_PPC_MERGE from arch/powerpc and include/asm-powerpc. Signed-off-by: NKumar Gala <galak@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 30 7月, 2008 4 次提交
-
-
由 Michael Neuling 提交于
In PTRACE_GET/SETVSRREGS, we should be using the thread we are ptracing rather than current. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Neuling 提交于
Fix cut-and-paste error in the size setting for ptrace buffers for VSX. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Michael Neuling 提交于
Fix bug where PTRACE_GET/SETVSRREGS are not connected for 32 bit processes. Signed-off-by: NMichael Neuling <mikey@neuling.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Fontenot 提交于
The code to handle writes to /proc/ppc64/lparcfg incorrectly assumes that the return code from the helper routines to update processor or memory entitlement return a hcall return value. It then assumes any non-hcall return value is bad and sets the return code for the write to be -EIO. The update_[mp]pp routines can return values other than a hcall return value. This patch removes the automatic setting of any return code that is not an hcall return value from these routines to -EIO. Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 28 7月, 2008 14 次提交
-
-
由 Benjamin Herrenschmidt 提交于
This removes the non-working code in legacy_serial that tried to handle the powermac SCC ports, and instead add a (now working) function to the powermac platform code to find the default serial console if any. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Lynch 提交于
Collect cache information from the OF device tree and display it in the cpu hierarchy in sysfs. This is intended to be compatible at the userspace level with x86's implementation[1], hence some of the funny attribute names. The arrangement of cache info is not immediately intuitive, but (again) it's for compatibility's sake. The cache attributes exposed are: type (Data, Instruction, or Unified) level (1, 2, 3...) size coherency_line_size number_of_sets ways_of_associativity All of these can be derived on platforms that follow the OF PowerPC Processor binding. The code "publishes" only those attributes for which it is able to determine values; attributes for values which cannot be determined are not created at all. [1] arch/x86/kernel/cpu/intel_cacheinfo.c BenH: Turned some printk's into pr_debug, added better NULL checking in a couple of places. Signed-off-by: NNathan Lynch <ntl@pobox.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Lynch 提交于
Existing Open Firmware practice is to report each processor core as a separate node in the device tree. Report the value of the "reg" OF property corresponding to a logical CPU's device node as the core_id attribute in /sys/devices/system/cpu/cpu*/topology/core_id. Signed-off-by: NNathan Lynch <ntl@pobox.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Lynch 提交于
Implement the notion of "core siblings" for powerpc. This makes /sys/devices/system/cpu/cpu*/topology/core_siblings present sensible values, indicating online CPUs which share an L2 cache. BenH: Made cpu_to_l2cache() use of_find_node_by_phandle() instead of IBM-specific open coded search Signed-off-by: NNathan Lynch <ntl@pobox.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Stephen Rothwell 提交于
arch/powerpc/kernel/vio.c:533: error: too few arguments to function 'dma_mapping_error' Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Roland McGrath 提交于
This adds TIF_NOTIFY_RESUME support for powerpc. When set, we call tracehook_notify_resume() on the way to user mode. This overloads do_signal() to do the work, but changes its arguments to it has the TIF_* bits handy in a register and drops the useless first argument that was always zero. Signed-off-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Roland McGrath 提交于
This changes powerpc syscall tracing to use the new tracehook.h entry points. There is no change, only cleanup. In addition, the assembly changes allow do_syscall_trace_enter() to abort the syscall without losing the information about the original r0 value. Signed-off-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Roland McGrath 提交于
This makes the powerpc signal handling code call tracehook_signal_handler() after a handler is set up. This means that using PTRACE_SINGLESTEP to enter a signal handler will report to ptrace on the first instruction of the handler, instead of the second. This is consistent with what x86 and other machines do, and what users and debuggers want. BenH: Fixed up the test for the trap value. Signed-off-by: NRoland McGrath <roland@redhat.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Lynch 提交于
Rather doing one initialization pass over all the per-cpu cpu_sibling_maps at boot, update the maps at cpu online/offline time. This is a behavior change -- the thread_siblings attribute now reflects only online siblings, whereas it would display offline siblings before. The new behavior matches that of x86, and is arguably more useful. Signed-off-by: NNathan Lynch <ntl@pobox.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Lynch 提交于
It is called only in cpu online paths. (caught by CONFIG_DEBUG_SECTION_MISMATCH=y) Signed-off-by: NNathan Lynch <ntl@pobox.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Lynch 提交于
This piece of code is broken for >2 threads, and possibly in some other subtle ways (such as comparing a value obtained from an "ibm,ppc-interrupt-server#s" property to a value obtained from a "reg" property) and doesn't seem to have any useful purpose in the first place other than a dubious warning in case NR_CPUS is too small, which probably isn't the right place to do so. Signed-off-by: NNathan Lynch <ntl@pobox.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Lynch 提交于
arch/powerpc/kernel/vio.c:1034: warning: function declaration isnât a prototype arch/powerpc/kernel/vio.c:1035: warning: function declaration isnât a prototype Signed-off-by: NNathan Lynch <ntl@pobox.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Kumar Gala 提交于
* CONFIG_BOOKE is selected by CONFIG_44x so we dont need both * Fixed a few comments * Go back to only using DBCR0_IDM to determine if we are using debug resources. Signed-off-by: NKumar Gala <galak@kernel.crashing.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Huang Weiyi 提交于
Removed duplicated include file <linux/module.h> in arch/powerpc/kernel/stacktrace.c. Signed-off-by: NHuang Weiyi <weiyi.huang@gmail.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 27 7月, 2008 2 次提交
-
-
由 Alexey Dobriyan 提交于
Kmem cache passed to constructor is only needed for constructors that are themselves multiplexeres. Nobody uses this "feature", nor does anybody uses passed kmem cache in non-trivial way, so pass only pointer to object. Non-trivial places are: arch/powerpc/mm/init_64.c arch/powerpc/mm/hugetlbpage.c This is flag day, yes. Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Acked-by: NPekka Enberg <penberg@cs.helsinki.fi> Acked-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Jon Tollefson <kniht@linux.vnet.ibm.com> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Matt Mackall <mpm@selenic.com> [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c] [akpm@linux-foundation.org: fix mm/slab.c] [akpm@linux-foundation.org: fix ubifs] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Huang Ying 提交于
This patch provides an enhancement to kexec/kdump. It implements the following features: - Backup/restore memory used by the original kernel before/after kexec. - Save/restore CPU state before/after kexec. The features of this patch can be used as a general method to call program in physical mode (paging turning off). This can be used to call BIOS code under Linux. kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 Usage example of calling some physical mode code and return: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_KEXEC=y CONFIG_PM=y CONFIG_KEXEC_JUMP=y 2. Build patched kexec-tool or download the pre-built one. 3. Build some physical mode executable named such as "phy_mode" 4. Boot kernel compiled in step 1. 5. Load physical mode executable with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context --args-none phy_mode 6. Call physical mode executable with following shell command line: /sbin/kexec -e Implementation point: To support jumping without reserving memory. One shadow backup page (source page) is allocated for each page used by kexeced code image (destination page). When do kexec_load, the image of kexeced code is loaded into source pages, and before executing, the destination pages and the source pages are swapped, so the contents of destination pages are backupped. Before jumping to the kexeced code image and after jumping back to the original kernel, the destination pages and the source pages are swapped too. C ABI (calling convention) is used as communication protocol between kernel and called code. A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to indicate that the loaded kernel image is used for jumping back. Now, only the i386 architecture is supported. Signed-off-by: NHuang Ying <ying.huang@intel.com> Acked-by: NVivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 7月, 2008 3 次提交
-
-
由 Nathan Lynch 提交于
Commit 9115d134 ("powerpc: Enable AT_BASE_PLATFORM aux vector") broke boot on 32-bit powerpc systems; we have to use PTRRELOC to initialize powerpc_base_platform this early in boot. Bug reported by Jon Smirl. Signed-off-by: NNathan Lynch <ntl@pobox.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Kumar Gala 提交于
* CONFIG_BOOKE is selected by CONFIG_44x so we dont need both * Fixed a few comments * Go back to only using DBCR0_IDM to determine if we are using debug resources. Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
-
由 Srinivasa D S 提交于
Currently list of kretprobe instances are stored in kretprobe object (as used_instances,free_instances) and in kretprobe hash table. We have one global kretprobe lock to serialise the access to these lists. This causes only one kretprobe handler to execute at a time. Hence affects system performance, particularly on SMP systems and when return probe is set on lot of functions (like on all systemcalls). Solution proposed here gives fine-grain locks that performs better on SMP system compared to present kretprobe implementation. Solution: 1) Instead of having one global lock to protect kretprobe instances present in kretprobe object and kretprobe hash table. We will have two locks, one lock for protecting kretprobe hash table and another lock for kretporbe object. 2) We hold lock present in kretprobe object while we modify kretprobe instance in kretprobe object and we hold per-hash-list lock while modifying kretprobe instances present in that hash list. To prevent deadlock, we never grab a per-hash-list lock while holding a kretprobe lock. 3) We can remove used_instances from struct kretprobe, as we can track used instances of kretprobe instances using kretprobe hash table. Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system with return probes set on all systemcalls looks like this. cacheline non-cacheline Un-patched kernel aligned patch aligned patch =============================================================================== real 9m46.784s 9m54.412s 10m2.450s user 40m5.715s 40m7.142s 40m4.273s sys 2m57.754s 2m58.583s 3m17.430s =========================================================== Time duration for kernel compilation ("make -j 8) on the same system, when kernel is not probed. ========================= real 9m26.389s user 40m8.775s sys 2m7.283s ========================= Signed-off-by: NSrinivasa DS <srinivasa@in.ibm.com> Signed-off-by: NJim Keniston <jkenisto@us.ibm.com> Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Masami Hiramatsu <mhiramat@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 25 7月, 2008 5 次提交
-
-
由 Nathan Fontenot 提交于
There are only 4 valid name=value pairs for writes to /proc/ppc64/lparcfg. Current code allocates a buffer to copy this information in from the user. Since the longest name=value pair will easily fit into a buffer of 64 characters, simply put the buffer on the stack instead of allocating the buffer. Signed-off-by: NNathan Fotenot <nfont@austin.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Fontenot 提交于
Update the architecture vector to indicate that Cooperative Memory Overcommitment is supported if CONFIG_PPC_SMLPAR is set. Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com> Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Nathan Fontenot 提交于
Verify memory entitlement updates can be handled by vio. Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com> Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Robert Jennings 提交于
This is a large patch but the normal code path is not affected. For non-pSeries platforms the code is ifdef'ed out and for non-CMO enabled pSeries systems this does not affect the normal code path. Devices that do not perform DMA operations do not need modification with this patch. The function get_desired_dma was renamed from get_io_entitlement for clarity. Overview Cooperative Memory Overcommitment (CMO) allows for a set of OS partitions to be run with less RAM than the aggregate needs of the group of partitions. The firmware will balance memory between the partitions and page in/out memory as needed. Based on the number and type of IO adpaters preset each partition is allocated an amount of memory for DMA operations and this allocation will be guaranteed to the partition; this is referred to as the partition's 'entitlement'. Partitions running in a CMO environment can only have virtual IO devices present. The VIO bus layer will manage the IO entitlement for the system. Accounting, at a system and per-device level, is tracked in the VIO bus code and exposed via sysfs. A set of dma_ops functions are added to the bus to allow for this accounting. Bus initialization At initialization, the bus will calculate the minimum needs of the system based on providing each device present with a standard minimum entitlement along with a spare allocation for the bus to handle hotplug events. If the minimum needs can not be met the system boot will be halted. Device changes The significant changes for devices while running under CMO are that the devices must specify how much dedicated IO entitlement they desire and must also handle DMA mapping errors that can occur due to constrained IO memory. The virtual IO drivers are modified to silence errors when DMA mappings fail for CMO and handle these failures gracefully. Each devices will be guaranteed a minimum entitlement that can always be mapped. Devices will specify how much entitlement they desire and the VIO bus will attempt to provide for this. Devices can change their desired entitlement level at any point in time to address particular needs (via vio_cmo_set_dev_desired()), not just at device probe time. VIO bus changes The system will have a particular entitlement level available from which it can provide memory to the devices. The bus defines two pools of memory within this entitlement, the reserved and excess pools. Each device is provided with it's own entitlement no less than a system defined minimum entitlement and no greater than what the device has specified as it's desired entitlement. The entitlement provided to devices comes from the reserve pool. The reserve pool can also contain a spare allocation as large as the system defined minimum entitlement which is used for device hotplug events. Any entitlement not needed to fulfill the needs of a reserve pool is placed in the excess pool. Each device is guaranteed that it can map up to it's entitled level; additional mapping are possible as long as there is unmapped memory in the excess pool. Bus probe As the system starts, each device is given an entitlement equal only to the system defined minimum entitlement. The reserve pool is equal to the sum of these entitlements, plus a spare allocation. The VIO bus also tracks the aggregate desired entitlement of all the devices. If the system desired entitlement is greater than the size of the reserve pool, when devices unmap IO memory it will be reserved and a balance operation will be scheduled for some time in the future. Entitlement balancing The balance function tries to fairly distribute entitlement between the devices in the system with the goal of providing each device with it's desired amount of entitlement. Devices using more than what would be ideal will have their entitled set-point adjusted; this will effectively set a goal for lower IO memory usage as future mappings can fail and deallocations will trigger a balance operation to distribute the newly unmapped memory. A fair distribution of entitlement can take several balance operations to achieve. Entitlement changes and device DLPAR events will alter the state of CMO and will trigger balance operations. Hotplug events The VIO bus allows for changes in system entitlement at run-time via 'vio_cmo_entitlement_update()'. When devices are added the hotplug device event will be preceded by a system entitlement increase and this is reversed when devices are removed. The following changes are made that the VIO bus layer for CMO: * add IO memory accounting per device structure. * add IO memory entitlement query function to driver structure. * during vio bus probe, if CMO is enabled, check that driver has memory entitlement query function defined. Fail if function not defined. * fail to register driver if io entitlement function not defined. * create set of dma_ops at vio level for CMO that will track allocations and return DMA failures once entitlement is reached. Entitlement will limited by overall system entitlement. Devices will have a reserved quantity of memory that is guaranteed, the rest can be used as available. * expose entitlement, current allocation, desired allocation, and the allocation error counter for devices to the user through sysfs * provide mechanism for changing a device's desired entitlement at run time for devices as an exported function and sysfs tunable * track any DMA failures for entitled IO memory for each vio device. * check entitlement against available system entitlement on device add * track entitlement metrics (high water mark, current usage) * provide function to reset high water mark * provide minimum and desired entitlement numbers at a bus level * provide drivers with a minimum guaranteed entitlement * balance available entitlement between devices to satisfy their needs * handle system entitlement changes and device hotplug Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Robert Jennings 提交于
To support Cooperative Memory Overcommitment (CMO), we need to check for failure from some of the tce hcalls. These changes for the pseries platform affect the powerpc architecture; patches for the other affected platforms are included in this patch. pSeries platform IOMMU code changes: * platform TCE functions must handle H_NOT_ENOUGH_RESOURCES errors and return an error. Architecture IOMMU code changes: * Calls to ppc_md.tce_build need to check return values and return DMA_MAPPING_ERROR for transient errors. Architecture changes: * struct machdep_calls for tce_build*_pSeriesLP functions need to change to indicate failure. * all other platforms will need updates to iommu functions to match the new calling semantics; they will return 0 on success. The other platforms default configs have been built, but no further testing was performed. Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com> Acked-by: NOlof Johansson <olof@lixom.net> Acked-by: NPaul Mackerras <paulus@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-