提交 · cd5aeb9f6cf7ada6baa218e01b4299e201497cde · openeuler / raspberrypi-kernel

20 8月, 2008 3 次提交

powerpc: Fix vio_bus_probe oops on probe error · cd5aeb9f

由 Brian King 提交于 8月 13, 2008

When CMO is enabled and booted on a non CMO system and the VIO
device's probe function fails, an oops can result since
vio_cmo_bus_remove is called when it should not.  This fixes it by
avoiding the vio_cmo_bus_remove call on platforms that don't implement
CMO.

cpu 0x0: Vector: 300 (Data Access) at [c00000000e13b3d0]
    pc: c000000000020d34: .vio_cmo_bus_remove+0xc0/0x1f4
    lr: c000000000020ca4: .vio_cmo_bus_remove+0x30/0x1f4
    sp: c00000000e13b650
   msr: 8000000000009032
   dar: 0
 dsisr: 40000000
  current = 0xc00000000e0566c0
  paca    = 0xc0000000006f9b80
    pid   = 2428, comm = modprobe
enter ? for help
[c00000000e13b6e0] c000000000021d94 .vio_bus_probe+0x2f8/0x33c
[c00000000e13b7a0] c00000000029fc88 .driver_probe_device+0x13c/0x200
[c00000000e13b830] c00000000029fdac .__driver_attach+0x60/0xa4
[c00000000e13b8c0] c00000000029f050 .bus_for_each_dev+0x80/0xd8
[c00000000e13b980] c00000000029f9ec .driver_attach+0x28/0x40
[c00000000e13ba00] c00000000029f630 .bus_add_driver+0xd4/0x284
[c00000000e13baa0] c0000000002a01bc .driver_register+0xc4/0x198
[c00000000e13bb50] c00000000002168c .vio_register_driver+0x40/0x5c
[c00000000e13bbe0] d0000000003b3f1c .ibmvfc_module_init+0x70/0x109c [ibmvfc]
[c00000000e13bc70] c0000000000acf08 .sys_init_module+0x184c/0x1a10
[c00000000e13be30] c000000000008748 syscall_exit+0x0/0x40
Signed-off-by: NBrian King <brking@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

cd5aeb9f

powerpc/ibmebus: Restore "name" sysfs attribute on ibmebus devices · 4589f1fe

由 Joachim Fenkes 提交于 8月 06, 2008

Recent of_platform changes made of_bus_type_init() overwrite the bus
type's .dev_attrs list, meaning that the "name" attribute that ibmebus
devices previously had is no longer present.  This is a user-visible
regression which breaks the userspace eHCA support, since the eHCA
userspace driver relies on the name attribute to check for valid
adapters.

This fixes it by providing the "name" attribute in the generic OF
device code instead.  Tested on POWER.
Signed-off-by: NJoachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

4589f1fe

powerpc: Fix /dev/oldmem interface for kdump · 7230ced4

由 Michael Ellerman 提交于 7月 31, 2008

A change to __ioremap() broke reading /dev/oldmem because we're no
longer able to ioremap pfn 0 (d177c207, "[PATCH] powerpc: IOMMU: don't
ioremap null addresses").

We actually don't need to ioremap for anything that's part of the linear
mapping, so just read it directly.

Also make sure we're only reading one page or less at a time.
Signed-off-by: NMichael Ellerman <michael@ellerman.id.au>
Signed-off-by: NSachin Sant <sachinp@in.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

7230ced4

18 8月, 2008 5 次提交

powerpc: Use generic compat_sys_old_readdir · 50d0b176

由 Christoph Hellwig 提交于 8月 17, 2008

Use the generic compat_sys_old_readdir instead of the powerpc one which
is almost the same except for the almost complete lack of error
handling.

Note that we can't just use SYSCALL() in systbl.h because the native
syscall is named old_readdir, not sys_old_readdir.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

50d0b176

powerpc/kexec: Fix up KEXEC_CONTROL_CODE_SIZE missed during conversion · d9178f4c

由 Paul Collins 提交于 8月 16, 2008

Commit 163f6876 missed one, resulting in
the following compile error:

  AS      arch/powerpc/kernel/misc_32.o
arch/powerpc/kernel/misc_32.S: Assembler messages:
arch/powerpc/kernel/misc_32.S:902: Error: unsupported relocation against KEXEC_CONTROL_CODE_SIZE
make[2]: *** [arch/powerpc/kernel/misc_32.o] Error 1
make[1]: *** [arch/powerpc/kernel] Error 2
make: *** [vmlinux] Error 2

I grepped arch/ and found no further instances.
Signed-off-by: NPaul Collins <paul@ondioline.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

d9178f4c

powerpc: Remove dead module_find_bug code · b9754568

由 Steven Rostedt 提交于 8月 16, 2008

Doing some various "make randconfig", I came across an error when
CONFIG_BUG was not set:

arch/powerpc/kernel/module.c: In function 'module_find_bug':
arch/powerpc/kernel/module.c:111: error: increment of pointer to unknown structure
arch/powerpc/kernel/module.c:111: error: arithmetic on pointer to an incomplete type
arch/powerpc/kernel/module.c:112: error: dereferencing pointer to incomplete type

Looking further into this, I found that module_find_bug, defined in
powerpc arch code, is not called anywhere, so this just removes it.

There is a static module_find_bug in lib/bug.c but that is a separate issue.
Signed-off-by: NSteven Rostedt <srostedt@redhat.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

b9754568

powerpc: Add CMO enabled flag and paging space data to lparcfg · ac22429d

由 Robert Jennings 提交于 8月 16, 2008

Add a field in lparcfg output to indicate whether the kernel is
running on a dedicated or shared memory lpar.  Added fields to show
the paging space pool IDs and the CMO page size.
Submitted-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

ac22429d

powerpc: Fix TLB invalidation on boot on 32-bit · 9acd57ca

由 Rocky Craig 提交于 8月 14, 2008

The intent of "flush_tlbs" is to invalidate all TLB entries by doing a
TLB invalidate instruction for all pages in the address range 0 to
0x00400000.  A loop counter is set up at the high value and
decremented by page size.  However, the loop is only done once as the
sense of the conditional branch at the loop end does not match the
setup/decrement.  This fixes it to do the whole range by correcting
the branch condition.
Signed-off-by: NRocky Craig <rocky.craig@hp.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

9acd57ca

15 8月, 2008 1 次提交

kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE · 163f6876

由 Huang Ying 提交于 8月 15, 2008

Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because control
page is used for not only code on some platform.  For example in kexec
jump, it is used for data and stack too.

[akpm@linux-foundation.org: unbreak powerpc and arm, finish conversion]
Signed-off-by: NHuang Ying <ying.huang@intel.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

163f6876

11 8月, 2008 2 次提交

powerpc/pci: Don't keep ISA memory hole resources in the tree · 8db13a0e

由 Benjamin Herrenschmidt 提交于 7月 31, 2008

When we have an ISA memory hole (ie, a PCI window that allows us to
generate PCI memory cycles at low PCI address) mixed with other
resources using a different CPU <=> PCI mapping, we must not keep
the ISA hole in the bridge resource list.

If we do, things might start trying to allocate device resources
in there and will get the PCI addresses wrong.

This fixes it by arranging to remove the ISA memory hole resource in
this case.  This fixes various cases of PCMCIA breakage on PowerBooks
using the MPC106 "grackle" bridge.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

8db13a0e

powerpc: Zero fill the return values of rtas argument buffer · b79998fc

由 Nathan Fontenot 提交于 7月 31, 2008

The kernel copy of the rtas args struct contains the return
value(s) for the specified rtas call.  These are copied back
to user space with the assumption that every value has been
set by the rtas call, which turns out to be not always true.
Thus userspace can see random values and think the call failed
when in fact it succeeded, but for some reason didn't set one
of the return values.

This fixes the problem by zeroing out the return value fields
of the rtas args struct before processing the rtas call.
Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

b79998fc

04 8月, 2008 1 次提交

powerpc: Remove use of CONFIG_PPC_MERGE · 9c4cb825

由 Kumar Gala 提交于 8月 02, 2008

Now that arch/ppc is gone and CONFIG_PPC_MERGE is always set, remove
the dead code associated with !CONFIG_PPC_MERGE from arch/powerpc
and include/asm-powerpc.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

9c4cb825

30 7月, 2008 4 次提交

powerpc: Don't use the wrong thread_struct for ptrace get/set VSX regs · 7d2a175b

由 Michael Neuling 提交于 7月 29, 2008

In PTRACE_GET/SETVSRREGS, we should be using the thread we are
ptracing rather than current.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7d2a175b

powerpc: Fix ptrace buffer size for VSX · 1ac42ef8

由 Michael Neuling 提交于 7月 29, 2008

Fix cut-and-paste error in the size setting for ptrace buffers for VSX.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

1ac42ef8

powerpc: Correctly hookup PTRACE_GET/SETVSRREGS for 32 bit processes · 33b3f03d

由 Michael Neuling 提交于 7月 29, 2008

Fix bug where PTRACE_GET/SETVSRREGS are not connected for 32 bit processes.
Signed-off-by: NMichael Neuling <mikey@neuling.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

33b3f03d

powerpc: Allow non-hcall return values for lparcfg writes · 9ee07f91

由 Nathan Fontenot 提交于 7月 26, 2008

The code to handle writes to /proc/ppc64/lparcfg incorrectly
assumes that the return code from the helper routines to update
processor or memory entitlement return a hcall return value. It
then assumes any non-hcall return value is bad and sets the return
code for the write to be -EIO.

The update_[mp]pp routines can return values other than a hcall
return value. This patch removes the automatic setting of any
return code that is not an hcall return value from these routines
to -EIO.
Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

9ee07f91

28 7月, 2008 14 次提交

powerpc/powermac: Fixup default serial port device for pmac_zilog · 025d7917

由 Benjamin Herrenschmidt 提交于 7月 28, 2008

This removes the non-working code in legacy_serial that tried to handle
the powermac SCC ports, and instead add a (now working) function to the
powermac platform code to find the default serial console if any.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

025d7917

powerpc: Show processor cache information in sysfs · 124c27d3

由 Nathan Lynch 提交于 7月 27, 2008

Collect cache information from the OF device tree and display it in
the cpu hierarchy in sysfs.  This is intended to be compatible at the
userspace level with x86's implementation[1], hence some of the funny
attribute names.  The arrangement of cache info is not immediately
intuitive, but (again) it's for compatibility's sake.

The cache attributes exposed are:

type (Data, Instruction, or Unified)
level (1, 2, 3...)
size
coherency_line_size
number_of_sets
ways_of_associativity

All of these can be derived on platforms that follow the OF PowerPC
Processor binding.  The code "publishes" only those attributes for
which it is able to determine values; attributes for values which
cannot be determined are not created at all.

[1] arch/x86/kernel/cpu/intel_cacheinfo.c

BenH: Turned some printk's into pr_debug, added better NULL checking
in a couple of places.
Signed-off-by: NNathan Lynch <ntl@pobox.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

124c27d3

powerpc: Make core id information available to userspace · e9efed3b

由 Nathan Lynch 提交于 7月 27, 2008

Existing Open Firmware practice is to report each processor core as a
separate node in the device tree. Report the value of the "reg" OF
property corresponding to a logical CPU's device node as the core_id
attribute in /sys/devices/system/cpu/cpu*/topology/core_id.
Signed-off-by: NNathan Lynch <ntl@pobox.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

e9efed3b

powerpc: Make core sibling information available to userspace · 440a0857

由 Nathan Lynch 提交于 7月 27, 2008

Implement the notion of "core siblings" for powerpc.  This makes
/sys/devices/system/cpu/cpu*/topology/core_siblings present sensible
values, indicating online CPUs which share an L2 cache.

BenH: Made cpu_to_l2cache() use of_find_node_by_phandle() instead
of IBM-specific open coded search
Signed-off-by: NNathan Lynch <ntl@pobox.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

440a0857

powerpc/vio: More fallout from dma_mapping_error API change · 0764bf63

由 Stephen Rothwell 提交于 7月 28, 2008

arch/powerpc/kernel/vio.c:533: error: too few arguments to function 'dma_mapping_error'
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

0764bf63

powerpc: Add TIF_NOTIFY_RESUME support for tracehook · 7d6d637d

由 Roland McGrath 提交于 7月 27, 2008

This adds TIF_NOTIFY_RESUME support for powerpc.  When set,
we call tracehook_notify_resume() on the way to user mode.
This overloads do_signal() to do the work, but changes its
arguments to it has the TIF_* bits handy in a register and
drops the useless first argument that was always zero.
Signed-off-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7d6d637d

powerpc: Make syscall tracing use tracehook.h helpers · 4f72c427

由 Roland McGrath 提交于 7月 27, 2008

This changes powerpc syscall tracing to use the new tracehook.h entry
points.  There is no change, only cleanup.

In addition, the assembly changes allow do_syscall_trace_enter() to
abort the syscall without losing the information about the original
r0 value.
Signed-off-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

4f72c427

powerpc: Call tracehook_signal_handler() when setting up signal frames · 6558ba2b

由 Roland McGrath 提交于 7月 27, 2008

This makes the powerpc signal handling code call tracehook_signal_handler()
after a handler is set up. This means that using PTRACE_SINGLESTEP to
enter a signal handler will report to ptrace on the first instruction of
the handler, instead of the second. This is consistent with what x86 and
other machines do, and what users and debuggers want.

BenH: Fixed up the test for the trap value.
Signed-off-by: NRoland McGrath <roland@redhat.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

6558ba2b

powerpc: Update cpu_sibling_maps dynamically · e2075f79

由 Nathan Lynch 提交于 7月 27, 2008

Rather doing one initialization pass over all the per-cpu
cpu_sibling_maps at boot, update the maps at cpu online/offline time.

This is a behavior change -- the thread_siblings attribute now
reflects only online siblings, whereas it would display offline
siblings before.  The new behavior matches that of x86, and is
arguably more useful.
Signed-off-by: NNathan Lynch <ntl@pobox.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

e2075f79

powerpc: register_cpu_online should be __cpuinit · 9ba1984e

由 Nathan Lynch 提交于 7月 27, 2008

It is called only in cpu online paths.

(caught by CONFIG_DEBUG_SECTION_MISMATCH=y)
Signed-off-by: NNathan Lynch <ntl@pobox.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

9ba1984e

powerpc: kill useless SMT code in prom_hold_cpus · 7d2f6075

由 Nathan Lynch 提交于 7月 27, 2008

This piece of code is broken for >2 threads, and possibly in some
other subtle ways (such as comparing a value obtained from an
"ibm,ppc-interrupt-server#s" property to a value obtained from a
"reg" property) and doesn't seem to have any useful purpose in the
first place other than a dubious warning in case NR_CPUS is too
small, which probably isn't the right place to do so.
Signed-off-by: NNathan Lynch <ntl@pobox.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

7d2f6075

powerpc: Fix vio build warnings · b9fa49a9

由 Nathan Lynch 提交于 7月 26, 2008

arch/powerpc/kernel/vio.c:1034: warning: function declaration isnât a prototype
arch/powerpc/kernel/vio.c:1035: warning: function declaration isnât a prototype
Signed-off-by: NNathan Lynch <ntl@pobox.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

b9fa49a9

powerpc/booke: Clean up the hardware watchpoint support · 2325f0a0

由 Kumar Gala 提交于 7月 26, 2008

* CONFIG_BOOKE is selected by CONFIG_44x so we dont need both
* Fixed a few comments
* Go back to only using DBCR0_IDM to determine if we are using
  debug resources.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

2325f0a0

powerpc: Removed duplicated include in stacktrace.c · d3b06023

由 Huang Weiyi 提交于 7月 24, 2008

Removed duplicated include file <linux/module.h> in
arch/powerpc/kernel/stacktrace.c.
Signed-off-by: NHuang Weiyi <weiyi.huang@gmail.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

d3b06023

27 7月, 2008 2 次提交

SL*B: drop kmem cache argument from constructor · 51cc5068

由 Alexey Dobriyan 提交于 7月 25, 2008

Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres.  Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
Acked-by: NChristoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

51cc5068

kexec jump · 3ab83521

由 Huang Ying 提交于 7月 25, 2008

This patch provides an enhancement to kexec/kdump.  It implements the
following features:

- Backup/restore memory used by the original kernel before/after
  kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call program in
physical mode (paging turning off).  This can be used to call BIOS code under
Linux.

kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be download from the following URL:

       source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
       patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
       binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and return:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y
CONFIG_PM=y
CONFIG_KEXEC_JUMP=y

2. Build patched kexec-tool or download the pre-built one.

3. Build some physical mode executable named such as "phy_mode"

4. Boot kernel compiled in step 1.

5. Load physical mode executable with /sbin/kexec. The shell command
   line can be as follow:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call physical mode executable with following shell command line:

   /sbin/kexec -e

Implementation point:

To support jumping without reserving memory.  One shadow backup page (source
page) is allocated for each page used by kexeced code image (destination
page).  When do kexec_load, the image of kexeced code is loaded into source
pages, and before executing, the destination pages and the source pages are
swapped, so the contents of destination pages are backupped.  Before jumping
to the kexeced code image and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.

Now, only the i386 architecture is supported.
Signed-off-by: NHuang Ying <ying.huang@intel.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ab83521

26 7月, 2008 3 次提交

powerpc: Fix boot problem due to AT_BASE_PLATFORM change · fc532f81

由 Nathan Lynch 提交于 7月 25, 2008

Commit 9115d134 ("powerpc: Enable
AT_BASE_PLATFORM aux vector") broke boot on 32-bit powerpc systems; we
have to use PTRRELOC to initialize powerpc_base_platform this early in
boot.

Bug reported by Jon Smirl.
Signed-off-by: NNathan Lynch <ntl@pobox.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

fc532f81

powerpc: clean up the Book-E HW watchpoint support · 0b21bb49

由 Kumar Gala 提交于 7月 25, 2008

* CONFIG_BOOKE is selected by CONFIG_44x so we dont need both
* Fixed a few comments
* Go back to only using DBCR0_IDM to determine if we are using
  debug resources.
Signed-off-by: NKumar Gala <galak@kernel.crashing.org>

0b21bb49

kprobes: improve kretprobe scalability with hashed locking · ef53d9c5

由 Srinivasa D S 提交于 7月 25, 2008

Currently list of kretprobe instances are stored in kretprobe object (as
used_instances,free_instances) and in kretprobe hash table.  We have one
global kretprobe lock to serialise the access to these lists.  This causes
only one kretprobe handler to execute at a time.  Hence affects system
performance, particularly on SMP systems and when return probe is set on
lot of functions (like on all systemcalls).

Solution proposed here gives fine-grain locks that performs better on SMP
system compared to present kretprobe implementation.

Solution:

 1) Instead of having one global lock to protect kretprobe instances
    present in kretprobe object and kretprobe hash table.  We will have
    two locks, one lock for protecting kretprobe hash table and another
    lock for kretporbe object.

 2) We hold lock present in kretprobe object while we modify kretprobe
    instance in kretprobe object and we hold per-hash-list lock while
    modifying kretprobe instances present in that hash list.  To prevent
    deadlock, we never grab a per-hash-list lock while holding a kretprobe
    lock.

 3) We can remove used_instances from struct kretprobe, as we can
    track used instances of kretprobe instances using kretprobe hash
    table.

Time duration for kernel compilation ("make -j 8") on a 8-way ppc64 system
with return probes set on all systemcalls looks like this.

cacheline              non-cacheline             Un-patched kernel
aligned patch 	       aligned patch
===============================================================================
real    9m46.784s       9m54.412s                  10m2.450s
user    40m5.715s       40m7.142s                  40m4.273s
sys     2m57.754s       2m58.583s                  3m17.430s
===========================================================

Time duration for kernel compilation ("make -j 8) on the same system, when
kernel is not probed.
=========================
real    9m26.389s
user    40m8.775s
sys     2m7.283s
=========================
Signed-off-by: NSrinivasa DS <srinivasa@in.ibm.com>
Signed-off-by: NJim Keniston <jkenisto@us.ibm.com>
Acked-by: NAnanth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ef53d9c5

25 7月, 2008 5 次提交

powerpc/pseries: Remove kmalloc call in handling writes to lparcfg · 16c14b46

由 Nathan Fontenot 提交于 7月 24, 2008

There are only 4 valid name=value pairs for writes to
/proc/ppc64/lparcfg.  Current code allocates a buffer to copy
this information in from the user.  Since the longest name=value
pair will easily fit into a buffer of 64 characters, simply
put the buffer on the stack instead of allocating the buffer.
Signed-off-by: NNathan Fotenot <nfont@austin.ibm.com>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

16c14b46

powerpc/pseries: Update arch vector to indicate support for CMO · 8391e42a

由 Nathan Fontenot 提交于 7月 24, 2008

Update the architecture vector to indicate that Cooperative Memory
Overcommitment is supported if CONFIG_PPC_SMLPAR is set.
Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

8391e42a

powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O · 22e1a4dd

由 Nathan Fontenot 提交于 7月 24, 2008

Verify memory entitlement updates can be handled by vio.
Signed-off-by: NNathan Fontenot <nfont@austin.ibm.com>
Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

22e1a4dd

powerpc/pseries: vio bus support for CMO · a90ab95a

由 Robert Jennings 提交于 7月 24, 2008

This is a large patch but the normal code path is not affected.  For
non-pSeries platforms the code is ifdef'ed out and for non-CMO enabled
pSeries systems this does not affect the normal code path.  Devices that
do not perform DMA operations do not need modification with this patch.
The function get_desired_dma was renamed from get_io_entitlement for
clarity.

Overview

Cooperative Memory Overcommitment (CMO) allows for a set of OS partitions
to be run with less RAM than the aggregate needs of the group of
partitions.  The firmware will balance memory between the partitions
and page in/out memory as needed.  Based on the number and type of IO
adpaters preset each partition is allocated an amount of memory for
DMA operations and this allocation will be guaranteed to the partition;
this is referred to as the partition's 'entitlement'.

Partitions running in a CMO environment can only have virtual IO devices
present.  The VIO bus layer will manage the IO entitlement for the system.
Accounting, at a system and per-device level, is tracked in the VIO bus
code and exposed via sysfs.  A set of dma_ops functions are added to
the bus to allow for this accounting.

Bus initialization

At initialization, the bus will calculate the minimum needs of the system
based on providing each device present with a standard minimum entitlement
along with a spare allocation for the bus to handle hotplug events.
If the minimum needs can not be met the system boot will be halted.

Device changes

The significant changes for devices while running under CMO are that the
devices must specify how much dedicated IO entitlement they desire and
must also handle DMA mapping errors that can occur due to constrained
IO memory.  The virtual IO drivers are modified to silence errors when
DMA mappings fail for CMO and handle these failures gracefully.

Each devices will be guaranteed a minimum entitlement that can always
be mapped.  Devices will specify how much entitlement they desire and
the VIO bus will attempt to provide for this.  Devices can change their
desired entitlement level at any point in time to address particular needs
(via vio_cmo_set_dev_desired()), not just at device probe time.

VIO bus changes

The system will have a particular entitlement level available from which
it can provide memory to the devices.  The bus defines two pools of memory
within this entitlement, the reserved and excess pools.  Each device is
provided with it's own entitlement no less than a system defined minimum
entitlement and no greater than what the device has specified as it's
desired entitlement.  The entitlement provided to devices comes from the
reserve pool.  The reserve pool can also contain a spare allocation as
large as the system defined minimum entitlement which is used for device
hotplug events.  Any entitlement not needed to fulfill the needs of a
reserve pool is placed in the excess pool.  Each device is guaranteed
that it can map up to it's entitled level; additional mapping are possible
as long as there is unmapped memory in the excess pool.

Bus probe

As the system starts, each device is given an entitlement equal only
to the system defined minimum entitlement.  The reserve pool is equal
to the sum of these entitlements, plus a spare allocation.  The VIO bus
also tracks the aggregate desired entitlement of all the devices.  If the
system desired entitlement is greater than the size of the reserve pool,
when devices unmap IO memory it will be reserved and a balance operation
will be scheduled for some time in the future.

Entitlement balancing

The balance function tries to fairly distribute entitlement between the
devices in the system with the goal of providing each device with it's
desired amount of entitlement.  Devices using more than what would be
ideal will have their entitled set-point adjusted; this will effectively
set a goal for lower IO memory usage as future mappings can fail and
deallocations will trigger a balance operation to distribute the newly
unmapped memory.  A fair distribution of entitlement can take several
balance operations to achieve.  Entitlement changes and device DLPAR
events will alter the state of CMO and will trigger balance operations.

Hotplug events

The VIO bus allows for changes in system entitlement at run-time via
'vio_cmo_entitlement_update()'.  When devices are added the hotplug
device event will be preceded by a system entitlement increase and this
is reversed when devices are removed.

The following changes are made that the VIO bus layer for CMO:
 * add IO memory accounting per device structure.
 * add IO memory entitlement query function to driver structure.
 * during vio bus probe, if CMO is enabled, check that driver has
   memory entitlement query function defined.  Fail if function not defined.
 * fail to register driver if io entitlement function not defined.
 * create set of dma_ops at vio level for CMO that will track allocations
   and return DMA failures once entitlement is reached.  Entitlement will
   limited by overall system entitlement.  Devices will have a reserved
   quantity of memory that is guaranteed, the rest can be used as available.
 * expose entitlement, current allocation, desired allocation, and the
   allocation error counter for devices to the user through sysfs
 * provide mechanism for changing a device's desired entitlement at run time
   for devices as an exported function and sysfs tunable
 * track any DMA failures for entitled IO memory for each vio device.
 * check entitlement against available system entitlement on device add
 * track entitlement metrics (high water mark, current usage)
 * provide function to reset high water mark
 * provide minimum and desired entitlement numbers at a bus level
 * provide drivers with a minimum guaranteed entitlement
 * balance available entitlement between devices to satisfy their needs
 * handle system entitlement changes and device hotplug
Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

a90ab95a

powerpc/pseries: iommu enablement for CMO · 6490c490

由 Robert Jennings 提交于 7月 24, 2008

To support Cooperative Memory Overcommitment (CMO), we need to check
for failure from some of the tce hcalls.

These changes for the pseries platform affect the powerpc architecture;
patches for the other affected platforms are included in this patch.

pSeries platform IOMMU code changes:
 * platform TCE functions must handle H_NOT_ENOUGH_RESOURCES errors and
   return an error.

Architecture IOMMU code changes:
 * Calls to ppc_md.tce_build need to check return values and return
   DMA_MAPPING_ERROR for transient errors.

Architecture changes:
 * struct machdep_calls for tce_build*_pSeriesLP functions need to change
   to indicate failure.
 * all other platforms will need updates to iommu functions to match the new
   calling semantics; they will return 0 on success.  The other platforms
   default configs have been built, but no further testing was performed.
Signed-off-by: NRobert Jennings <rcj@linux.vnet.ibm.com>
Acked-by: NOlof Johansson <olof@lixom.net>
Acked-by: NPaul Mackerras <paulus@samba.org>
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>

6490c490