Commits · c2c9f115741453715d6b4da1cd2de65af8c7ad86 · openanolis / cloud-kernel

16 Dec, 2009 · 40 commits

x86: uv: update XPC to handle updated BIOS interface · c2c9f115

Committed by Robin Holt on Dec 15, 2009

The UV BIOS has moved the location of some of their pointers to the
"partition reserved page" from memory into a uv hub MMR.  The GRU does not
support bcopy operations from MMR space so we need to special case the MMR
addresses using VLOAD operations.

Additionally, the BIOS call for registering a message queue watchlist has
removed the 'blade' value and eliminated the structure that was being
passed in.  This is also reflected in this patch.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

X86: uv: implement a gru_read_gpa kernel function · 289750d1

Committed by Robin Holt on Dec 15, 2009

The BIOS has decided to store a pointer to the partition reserved page in
a scratch MMR.  The GRU is only able to read an MMR using a vload
instruction.  This patch implements gru_read_gpa() to perform such reads.

Signed-off-by: Robin Holt <holt@sgi.com>
Signed-off-by: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: uv: introduce uv_gpa_is_mmr · fae419f2

Committed by Robin Holt on Dec 15, 2009

Provide a mechanism for determining if a global physical address is
pointing to a UV hub MMR.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: uv: xpc needs to provide an abstraction for uv_gpa · 68212893

Committed by Robin Holt on Dec 15, 2009

Provide an SGI SN2/UV agnostic method for converting a global physical
address into a socket physical address.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x86: uv: introduce a means to translate from gpa -> socket_paddr · 729d69e6

Committed by Robin Holt on Dec 15, 2009

The UV BIOS has been updated to implement some of our interface
functionality differently than originally expected.  These patches update
the kernel to match the BIOS implementation and include a few minor fixes
for bugs that prevented us from doing significant testing on real hardware.

This patch:

For SGI UV systems, translate from a global physical address back to a
socket physical address.  This does nothing to ensure the socket physical
address is actually addressable by the kernel.  That is the responsibility
of the user of the function.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

direct-io: cleanup blockdev_direct_IO locking · 5fe878ae

Committed by Christoph Hellwig on Dec 15, 2009

Currently the locking in blockdev_direct_IO is a mess: we have three
different locking types and very confusing checks for some of them.  The
most complicated one is DIO_OWN_LOCKING for reads, which happens to not
actually be used.

This patch gets rid of DIO_OWN_LOCKING: as mentioned above, the read
case is unused anyway, and the write side is almost identical to
DIO_NO_LOCKING.  The difference is that DIO_NO_LOCKING always sets the
create argument for the get_blocks callback to zero, but we can easily
move that to the actual get_blocks callbacks.  There are four users of the
DIO_NO_LOCKING mode: gfs already ignores the create argument and thus is
fine with the new version; ocfs2 only errors out if create were ever set,
and we can remove this dead code now; the block device code only ever uses
create for an error message if we are fully beyond the device, which can
never happen; and last but not least, XFS will need the new behaviour for
writes.

Now we can replace the lock_type variable with a flags one, where no flag
means the DIO_NO_LOCKING behaviour and DIO_LOCKING is kept as the first
flag.  Separate out the check for not allowing to fill holes into a
separate flag, although for now both flags always get set at the same
time.

Also revamp the documentation of the locking scheme to actually make
sense.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Zach Brown <zach.brown@oracle.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alex Elder <aelder@sgi.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dio: don't zero out the pages array inside struct dio · 23aee091

Committed by Jeff Moyer on Dec 15, 2009

Intel reported a performance regression caused by the following commit:

commit 848c4dd5
Author: Zach Brown <zach.brown@oracle.com>
Date:   Mon Aug 20 17:12:01 2007 -0700

    dio: zero struct dio with kzalloc instead of manually

    This patch uses kzalloc to zero all of struct dio rather than
    manually trying to track which fields we rely on being zero.  It
    passed aio+dio stress testing and some bug regression testing on
    ext3.

    This patch was introduced by Linus in the conversation that lead up
    to Badari's minimal fix to manually zero .map_bh.b_state in commit:

      6a648fa7

    It makes the code a bit smaller.  Maybe a couple fewer cachelines to
    load, if we're lucky:

       text    data     bss     dec     hex filename
    3285925  568506 1304616 5159047  4eb887 vmlinux
    3285797  568506 1304616 5158919  4eb807 vmlinux.patched

    I was unable to measure a stable difference in the number of cpu
    cycles spent in blockdev_direct_IO() when pushing aio+dio 256K reads
    at ~340MB/s.

    So the resulting intent of the patch isn't a performance gain but to
    avoid exposing ourselves to the risk of finding another field like
    .map_bh.b_state where we rely on zeroing but don't enforce it in the
    code.

Zach surmised that zeroing out the page array was what caused most of
the problem, and suggested the approach taken in the attached patch for
resolving the issue.  Intel re-tested with this patch and saw a 0.6%
performance gain (the original regression was 0.5%).

[akpm@linux-foundation.org: add comment]
Signed-off-by: Jeff Moyer <jmoyer@redhat.com>
Acked-by: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
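
The approach described in this commit, clearing only the part of the
structure that precedes the page array, can be sketched in user space
roughly as follows.  This is a minimal sketch, not the kernel code: the
structure layout, DIO_PAGES, struct dio_sketch and dio_alloc() are all
hypothetical stand-ins, and the array is assumed to be written before it
is ever read.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DIO_PAGES 64 /* hypothetical array size */

/* Hypothetical stand-in for struct dio: fields that must start out
 * zeroed come first, the large scratch array comes last. */
struct dio_sketch {
    int flags;
    unsigned long block_in_file;
    void *pages[DIO_PAGES]; /* assumed to be filled in before use */
};

/* Allocate without zeroing, then clear everything up to (but not
 * including) the pages array.  Returns NULL on allocation failure. */
struct dio_sketch *dio_alloc(void)
{
    struct dio_sketch *dio = malloc(sizeof(*dio));

    if (dio)
        memset(dio, 0, offsetof(struct dio_sketch, pages));
    return dio;
}
```

Placing the array last is what makes a single memset() over the leading
bytes sufficient; any field that relies on being zero has to sit before it.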

aio: remove unused field · fac046ad

Committed by Shaohua Li on Dec 15, 2009

The reason is unknown, but it appears the ki_wait field of iocb never gets
used.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Benjamin LaHaise <bcrl@kvack.org>
Cc: Zach Brown <zach.brown@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

FS-Cache: Avoid maybe-used-uninitialised warning on variable · ea58ceb5

Committed by David Howells on Dec 15, 2009

Andrew Morton's compiler sees the following warning in FS-Cache:

fs/fscache/object-list.c: In function 'fscache_objlist_lookup':
fs/fscache/object-list.c:94: warning: 'obj' may be used uninitialized in this function

which my compiler doesn't.  This is a false positive as obj can only be
used in the comparison against minobj if minobj has been set to something
other than NULL, but for that to happen, obj has to be first set to
something.

Deal with this by preclearing obj too.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

kexec: permit reduction of the reserved memory size · 06a7f711

Committed by Amerigo Wang on Dec 15, 2009

Implement shrinking of the memory reserved for the crash kernel when more
has been reserved than is needed.

For example, if you have already reserved 128M but now want only 100M,
you can do:

# echo $((100*1024*1024)) > /sys/kernel/kexec_crash_size

Note, you can only do this before loading the crash kernel.

Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@redhat.com>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

parport_pc.c: use correct length in strncmp · 1f2c19f8

Committed by Joe Perches on Dec 15, 2009

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

dma-mapping: fix off-by-one error in dma_capable() · ac2b3e67

Committed by Jan Beulich on Dec 15, 2009

dma_mask is, when interpreted as an address, the last valid byte, and hence
the comparison must also be done using the last valid byte of the buffer in
question.

Also fix the open-coded instances in lib/swiotlb.c.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Becky Bruce <beckyb@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
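
The off-by-one described in this commit can be illustrated with a small
stand-alone check.  This is only a sketch under stated assumptions: the
function name dma_capable_sketch() and its flat (mask, addr, size)
signature are hypothetical, and it assumes size > 0 and that addr + size
does not wrap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* mask is the highest addressable byte, e.g. 0xffffffff for a device
 * limited to 32-bit DMA. */
bool dma_capable_sketch(uint64_t mask, uint64_t addr, size_t size)
{
    assert(size > 0);
    /* Compare the last byte of the buffer, not one past its end. */
    return addr + size - 1 <= mask;
}
```

With a 32-bit mask, a 4 KiB buffer at 0xfffff000 ends exactly at the last
valid byte.  A comparison of addr + size against the mask would reject it,
while the comparison above accepts it.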

edac: i5100 add 6 ranks per channel · bbead210

Committed by Nils Carlson on Dec 15, 2009

Add support for 6 ranks per channel to the i5100 chipset. I have tested
the patch as far as possible with correctible errors and things appear
good. The DIMM mapping is correct for our board, but boards may differ.

Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Acked-by: Arthur Jones <ajones@riverbed.com>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

edac: i5100 add scrubbing · 295439f2

Committed by Nils Carlson on Dec 15, 2009

Add scrubbing to the i5100 chipset. The i5100 chipset only supports one
scrubbing rate, which is not constant but dependent on memory load. The
rate returned by this driver is an estimate based on some experimentation,
but is substantially closer to the truth than the speed supplied in the
documentation.

Also, scrubbing is done once, and then a done-bit is set. This means that
to accomplish continuous scrubbing a re-enabling mechanism must be used.
I have created the simplest possible such mechanism in the form of a
work-queue which will check every five minutes. This interval is quite
arbitrary but should be sufficient for all sizes of system memory.

Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

edac: i5100 clean controller to channel terms · b18dfd05

Committed by Nils Carlson on Dec 15, 2009

The i5100 driver uses the word controller instead of channel in a lot of
places.  This patch simply cleans up that terminology.

Signed-off-by: Nils Carlson <nils.carlson@ludd.ltu.se>
Signed-off-by: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pid: reduce code size by using a pointer to iterate over array · 417e3152

Committed by André Goddard Rosa on Dec 15, 2009

It decreases code size by 16 bytes on my gcc 4.4.1 on Core 2:
  text    data     bss     dec     hex filename
  4314    2216       8    6538    198a kernel/pid.o-BEFORE
  4298    2216       8    6522    197a kernel/pid.o-AFTER

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

pid: tighten pidmap spinlock critical section by removing kfree() · 7be6d991

Committed by André Goddard Rosa on Dec 15, 2009

Avoid calling kfree() under pidmap spinlock, calling it afterwards.

Normally kfree() is fast, but sometimes it can be slow, so avoid
calling it under the spinlock if we can do it.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
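
The pattern this commit describes, deciding under the lock but freeing
after it is released, can be sketched in user space as below.  This is a
minimal sketch rather than the kernel code: the C11 atomic_flag spinlock,
the map_page slot and get_map_page() are hypothetical stand-ins for the
pidmap lock and page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

static atomic_flag map_lock = ATOMIC_FLAG_INIT; /* toy spinlock */
static void *map_page;                          /* installed at most once */

static void lock(void)
{
    while (atomic_flag_test_and_set_explicit(&map_lock, memory_order_acquire))
        ; /* spin */
}

static void unlock(void)
{
    atomic_flag_clear_explicit(&map_lock, memory_order_release);
}

/* Install a freshly allocated page unless another caller got there
 * first.  Returns the installed page, or NULL if none could be set. */
void *get_map_page(size_t size)
{
    void *page, *installed;

    assert(size > 0);
    page = calloc(1, size);

    lock();
    if (!map_page) {
        map_page = page;
        page = NULL; /* ownership moved to the shared slot */
    }
    installed = map_page;
    unlock();

    free(page); /* free(NULL) is a no-op; runs outside the lock */
    return installed;
}
```

Setting the local pointer to NULL once the page is installed lets the
free() call run unconditionally after the unlock, so the critical section
contains only the test and the assignment.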

elf: kill USE_ELF_CORE_DUMP · 698ba7b5

Committed by Christoph Hellwig on Dec 15, 2009

Currently all architectures but microblaze unconditionally define
USE_ELF_CORE_DUMP.  The microblaze omission seems like an error to me, so
let's kill this ifdef and make sure we are the same everywhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: <linux-arch@vger.kernel.org>
Cc: Michal Simek <michal.simek@petalogix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

drivers/char/ipmi: Use KCS_IDLE_STATE · d1da96aa

Committed by Julia Lawall on Dec 15, 2009

KCS_IDLE and KCS_IDLE_STATE have the same value, but in this function the
constants ending in _STATE are compared to the state variable.

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc: HARD_MSGMAX should be higher not lower on 64bit · 9cf18e1d

Committed by Amerigo Wang on Dec 15, 2009

HARD_MSGMAX is currently lower on 64bit than on 32bit, even though 64bit
machines usually have more memory than 32bit machines.

Making it higher on 64bit seems reasonable, while keeping the original
number on 32bit.

Acked-by: Serge E. Hallyn <serue@us.ibm.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc: remove unreachable code in sem.c · e5cc9c7b

Committed by Amerigo Wang on Dec 15, 2009

This line is unreachable, remove it.

[akpm@linux-foundation.org: remove unneeded initialisation of `err']
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: optimize single sops when semval is zero · d987f8b2

Committed by Manfred Spraul on Dec 15, 2009

If multiple simple decrements on the same semaphore are pending, then the
current code scans all decrement operations, even if the semaphore value
is already 0.

The patch optimizes that: if the semaphore value is 0, then there is no
need to scan the q->alter entries.

Note that this is a common case: It happens if 100 decrements by one are
pending and now an increment by one increases the semaphore value from 0
to 1.  Without this patch, all 100 entries are scanned.  With the patch,
only one entry is scanned, then woken up.  Then the new rule triggers and
the scanning is aborted, without looking at the remaining 99 tasks.

With this patch, single sop increment/decrement by 1 are now O(1).
(same as with Nick's patch)

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
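
The early exit described in this commit can be modelled with a toy queue
of pending decrements.  This is a simplified sketch, not the code in
ipc/sem.c: struct pending_dec and wake_pending() are hypothetical, the
queue is a plain array, and only simple decrements are modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pending "decrement by amount" request. */
struct pending_dec {
    int amount;
    int woken;
};

/* Try to satisfy pending decrements in queue order.  Returns how many
 * queue entries were examined. */
size_t wake_pending(int *semval, struct pending_dec *q, size_t n)
{
    size_t scanned = 0;
    size_t i;

    assert(semval != NULL && *semval >= 0);
    for (i = 0; i < n; i++) {
        /* The shortcut: at zero, no further decrement can succeed. */
        if (*semval == 0)
            break;
        scanned++;
        if (!q[i].woken && q[i].amount <= *semval) {
            *semval -= q[i].amount;
            q[i].woken = 1;
        }
    }
    return scanned;
}
```

When the value has just gone from 0 to 1 and every waiter wants a
decrement by one, the first entry is woken, the value drops back to 0, and
the loop stops without visiting the remaining entries.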

ipc/sem.c: optimize single semop operations · 636c6be8

Committed by Manfred Spraul on Dec 15, 2009

sysv sem has the concept of semaphore arrays that consist of multiple
semaphores.  Atomic operations that affect multiple semaphores are
supported.

The patch optimizes single semaphore operation calls that affect only one
semaphore: It's not necessary to scan all pending operations, it is
sufficient to scan the per-semaphore list.

The idea is from Nick Piggin's version of an ipc sem improvement, the
implementation is different: The code tries to keep as much common code as
possible.

As a result, the patch is simpler, but optimizes fewer cases.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: add a per-semaphore pending list · b97e820f

Committed by Manfred Spraul on Dec 15, 2009

Based on Nick's findings:

sysv sem has the concept of semaphore arrays that consist of multiple
semaphores.  Atomic operations that affect multiple semaphores are
supported.

The patch is the first step for optimizing simple, single semaphore
operations: In addition to the global list of all pending operations, a
2nd, per-semaphore list with the simple operations is added.

Note: this patch does not make sense by itself; the new list is not yet
used anywhere.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: optimize if semops fail · b6e90822

Committed by Manfred Spraul on Dec 15, 2009

Reduce the amount of scanning of the list of pending semaphore operations:
If try_atomic_semop failed, then no changes were applied.  Thus no need to
restart.

Additionally, this patch corrects an incorrect comment: It's possible to
wait for arbitrary semaphore values (do a dec by <x>, wait-for-zero, inc
by <x> in one atomic operation).

Both changes are from Nick Piggin, the patch is the result of a different
split of the individual changes.

Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: sem preempt improve · d4212093

Committed by Nick Piggin on Dec 15, 2009

The strange sysv semaphore wakeup scheme has a kind of busy-wait lock
involved, which could deadlock if preemption is enabled during the "lock".

It is an implementation detail (due to a spinlock being held) that this is
actually the case. However if "spinlocks" are made preemptible, or if the
sem lock is changed to a sleeping lock for example, then the wakeup would
become buggy. So this might be a bugfix for -rt kernels.

Imagine waker being preempted by wakee and never clearing IN_WAKEUP -- if
wakee has higher RT priority then there is a priority inversion deadlock.
Even if there is not a priority inversion to cause a deadlock, then there
is still time wasted spinning.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ipc/sem.c: sem use list operations · 9cad200c

Committed by Nick Piggin on Dec 15, 2009

Replace the handcoded list operations in update_queue() with the standard
list_for_each_entry macros.

list_for_each_entry_safe() must be used, because list entries can
disappear immediately upon the wakeup event.

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
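
The reason a "safe" iteration is needed when entries can vanish during the
walk can be shown with a plain singly linked list.  This is an
illustrative sketch only: it does not use the kernel's list.h macros, and
struct waiter and wake_ready() are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct waiter {
    int ready;
    struct waiter *next;
};

/* Remove and free every ready waiter; returns the number removed.
 * The next pointer is saved before the current node can be freed,
 * which is the idea behind the _safe iteration variant. */
size_t wake_ready(struct waiter **head)
{
    struct waiter **link;
    struct waiter *cur, *next;
    size_t removed = 0;

    assert(head != NULL);
    link = head;
    for (cur = *head; cur; cur = next) {
        next = cur->next; /* saved before cur may vanish */
        if (cur->ready) {
            *link = next; /* unlink */
            free(cur);
            removed++;
        } else {
            link = &cur->next;
        }
    }
    return removed;
}
```

A loop that read cur->next in its increment step would touch freed memory
as soon as one entry had been removed.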

ipc/sem.c: sem optimise undo list search · bf17bb71

Committed by Nick Piggin on Dec 15, 2009

Around a month ago, there was some discussion about an improvement of the
sysv sem algorithm: Most (at least: some important) users only use simple
semaphore operations, therefore it's worthwhile to optimize this use case.

This patch:

Move last looked up sem_undo struct to the head of the task's undo list.
Attempt to move common entries to the front of the list so search time is
reduced.  This reduces lookup_undo on oprofile of problematic SAP workload
by 30% (see patch 4 for a description of SAP workload).

Signed-off-by: Nick Piggin <npiggin@suse.de>
Signed-off-by: Manfred Spraul <manfred@colorfullife.com>
Cc: Pierre Peiffer <peifferp@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
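
The move-to-front heuristic described in this commit can be sketched on a
plain singly linked list.  This is a minimal sketch, not the ipc/sem.c
code: struct undo and lookup_undo_mtf() are hypothetical names, and the
kernel's list primitives and locking are left out.

```c
#include <assert.h>
#include <stddef.h>

struct undo {
    int semid;
    struct undo *next;
};

/* Find the entry for semid; on a hit, move it to the head of the list
 * so that repeated lookups of the same id finish early. */
struct undo *lookup_undo_mtf(struct undo **head, int semid)
{
    struct undo **link;
    struct undo *un;

    assert(head != NULL);
    link = head;
    for (un = *head; un; link = &un->next, un = un->next) {
        if (un->semid != semid)
            continue;
        if (un != *head) {
            *link = un->next; /* unlink from current position */
            un->next = *head; /* relink at the front */
            *head = un;
        }
        return un;
    }
    return NULL;
}
```

Entries that are looked up often drift to the front, so the average search
gets shorter without any extra bookkeeping.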

ipc ns: fix memory leak (idr) · 7d6feeb2

Committed by Serge E. Hallyn on Dec 15, 2009

We have apparently had a memory leak since 7ca7e564 "ipc: store ipcs into
IDRs" in 2007.  The idrs, of which three exist for each ipc namespace, are
never freed.

This patch simply frees them when the ipcns is freed.  I don't believe any
idr_remove() calls are done from rcu (and could therefore be delayed until
after this idr_destroy()), so the patch should be safe.  Some quick testing
showed no harm, and the memory leak was fixed.

Caught by kmemleak.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

signals: check ->group_stop_count after tracehook_get_signal() · 1be53963