提交 · 96a388de5dc53a8b234b3fd41f3ae2cedc9ffd42 · xiphi1978 / linux

11 10月, 2007 3 次提交

i386/x86_64: move headers to include/asm-x86 · 96a388de

由 Thomas Gleixner 提交于 10月 11, 2007

Move the headers to include/asm-x86 and fixup the
header install make rules
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

96a388de

x86_64: remove unused header file: · c339dd68

由 Thomas Gleixner 提交于 10月 11, 2007

Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NIngo Molnar <mingo@elte.hu>

c339dd68

T
i386: remove module.h include from termios.h · b7d33128
由 Thomas Gleixner 提交于 10月 11, 2007
```
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
```
b7d33128

09 10月, 2007 1 次提交

mm: set_page_dirty_balance() vs ->page_mkwrite() · a200ee18

由 Peter Zijlstra 提交于 10月 08, 2007

All the current page_mkwrite() implementations also set the page dirty. Which
results in the set_page_dirty_balance() call to _not_ call balance, because the
page is already found dirty.

This allows us to dirty a _lot_ of pages without ever hitting
balance_dirty_pages().  Not good (tm).

Force a balance call if ->page_mkwrite() was successful.
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a200ee18

08 10月, 2007 2 次提交

[ROSE]: Fix rose.ko oops on unload · 891e6a93

由 Alexey Dobriyan 提交于 10月 07, 2007

Commit a3d38402 aka
"[AX.25]: Fix unchecked rose_add_loopback_neigh uses"
transformed rose_loopback_neigh var into statically allocated one.
However, on unload it will be kfree's which can't work.

Steps to reproduce:

	modprobe rose
	rmmod rose

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
 printing eip:
c014c664
*pde = 00000000
Oops: 0000 [#1]
PREEMPT DEBUG_PAGEALLOC
Modules linked in: rose ax25 fan ufs loop usbhid rtc snd_intel8x0 snd_ac97_codec ehci_hcd ac97_bus uhci_hcd thermal usbcore button processor evdev sr_mod cdrom
CPU:    0
EIP:    0060:[<c014c664>]    Not tainted VLI
EFLAGS: 00210086   (2.6.23-rc9 #3)
EIP is at kfree+0x48/0xa1
eax: 00000556   ebx: c1734aa0   ecx: f6a5e000   edx: f7082000
esi: 00000000   edi: f9a55d20   ebp: 00200287   esp: f6a5ef28
ds: 007b   es: 007b   fs: 0000  gs: 0033  ss: 0068
Process rmmod (pid: 1823, ti=f6a5e000 task=f7082000 task.ti=f6a5e000)
Stack: f9a55d20 f9a5200c 00000000 00000000 00000000 f6a5e000 f9a5200c f9a55a00 
       00000000 bf818cf0 f9a51f3f f9a55a00 00000000 c0132c60 65736f72 00000000 
       f69f9630 f69f9528 c014244a f6a4e900 00200246 f7082000 c01025e6 00000000 
Call Trace:
 [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
 [<f9a5200c>] rose_rt_free+0x1d/0x49 [rose]
 [<f9a51f3f>] rose_exit+0x4c/0xd5 [rose]
 [<c0132c60>] sys_delete_module+0x15e/0x186
 [<c014244a>] remove_vma+0x40/0x45
 [<c01025e6>] sysenter_past_esp+0x8f/0x99
 [<c012bacf>] trace_hardirqs_on+0x118/0x13b
 [<c01025b6>] sysenter_past_esp+0x5f/0x99
 =======================
Code: 05 03 1d 80 db 5b c0 8b 03 25 00 40 02 00 3d 00 40 02 00 75 03 8b 5b 0c 8b 73 10 8b 44 24 18 89 44 24 04 9c 5d fa e8 77 df fd ff <8b> 56 08 89 f8 e8 84 f4 fd ff e8 bd 32 06 00 3b 5c 86 60 75 0f 
EIP: [<c014c664>] kfree+0x48/0xa1 SS:ESP 0068:f6a5ef28
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

891e6a93

Don't do load-average calculations at even 5-second intervals · 0c2043ab

由 Linus Torvalds 提交于 10月 07, 2007

It turns out that there are a few other five-second timers in the
kernel, and if the timers get in sync, the load-average can get
artificially inflated by events that just happen to coincide.

So just offset the load average calculation it by a timer tick.

Noticed by Anders Boström, for whom the coincidence started triggering
on one of his machines with the JBD jiffies rounding code (JBD is one of
the subsystems that also end up using a 5-second timer by default).
Tested-by: NAnders Boström <anders@bostrom.dyndns.org>
Cc: Chuck Ebbert <cebbert@redhat.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0c2043ab

05 10月, 2007 1 次提交

Remove unnecessary cast in prefetch() · 4ecbca85

由 Serge Belyshev 提交于 10月 04, 2007

It is ok to call prefetch() function with NULL argument, as specifically
commented in include/linux/prefetch.h.  But in standard C, it is invalid
to dereference NULL pointer (see C99 standard 6.5.3.2 paragraph 4 and
note #84).

prefetch() has a memory reference for its argument.

Newer gcc versions (4.3 and above) will use that to conclude that "x"
argument is non-null and thus wreaking havok everywhere prefetch() was
inlined.

Fixed by removing cast and changing asm constraint.

[ It seems in theory gcc 4.2 could miscompile this too; although no
  cases known.  In 2.6.24 we should probably switch to
  __builtin_prefetch() instead, but this is a simpler fix for now.
				-- AK ]
Signed-off-by: NSerge Belyshev <belyshev@depni.sinp.msu.ru>
Signed-off-by: NAndi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4ecbca85

04 10月, 2007 2 次提交

M
Blackfin arch: fix PORT_J BUG for BF537/6 EMAC driver reported by Kalle Pokki <kalle.pokki@iki.fi> · cda6a20b
由 Michael Hennerich 提交于 10月 04, 2007
```
Cc: Kalle Pokki <kalle.pokki@iki.fi>
Signed-off-by: NMichael Hennerich <michael.hennerich@analog.com>
Signed-off-by: NBryan Wu <bryan.wu@analog.com>
```
cda6a20b

Blackfin arch: gpio pinmux and resource allocation API required by BF537 on... · c58c2140

由 Michael Hennerich 提交于 10月 04, 2007

Blackfin arch: gpio pinmux and resource allocation API required by BF537 on chip ethernet mac driver
Signed-off-by: NMichael Hennerich <michael.hennerich@analog.com>
Signed-off-by: NBryan Wu <bryan.wu@analog.com>

c58c2140

03 10月, 2007 2 次提交

[MIPS] Terminally fix local_{dec,sub}_if_positive · 9ea0f043

由 Ralf Baechle 提交于 10月 03, 2007

They contain 64-bit instructions so wouldn't work on 32-bit kernels or
32-bit hardware. Since there are no users, blow them away. They
probably were only ever created because there are atomic_sub_if_positive
and atomic_dec_if_positive which exist only for sake of semaphores.
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

9ea0f043

R
[MIPS] Type proof reimplementation of cmpxchg. · fef74705
由 Ralf Baechle 提交于 10月 01, 2007
```
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
```
fef74705

01 10月, 2007 1 次提交

[MIPS] Fix value of O_TRUNC · ca074a33

由 Ralf Baechle 提交于 9月 30, 2007

A "cleanup" almost two years ago deleted the old definition from
<asm/fcntl.h>, so asm-generic/fcntl.h defaulted it to the the same
value as FASYNC ...   which happened to be the wrong thing.
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

ca074a33

30 9月, 2007 1 次提交

i386: remove bogus comment about memory barrier · 4827bbb0

由 Nick Piggin 提交于 9月 29, 2007

The comment being removed by this patch is incorrect and misleading.

In the following situation:

	1. load  ...
	2. store 1 -> X
	3. wmb
	4. rmb
	5. load  a <- Y
	6. store ...

4 will only ensure ordering of 1 with 5.
3 will only ensure ordering of 2 with 6.

Further, a CPU with strictly in-order stores will still only provide that
2 and 6 are ordered (effectively, it is the same as a weakly ordered CPU
with wmb after every store).

In all cases, 5 may still be executed before 2 is visible to other CPUs!

The additional piece of the puzzle that mb() provides is the store/load
ordering, which fundamentally cannot be achieved with any combination of
rmb()s and wmb()s.

This can be an unexpected result if one expected any sort of global ordering
guarantee to barriers (eg. that the barriers themselves are sequentially
consistent with other types of barriers).  However sfence or lfence barriers
need only provide an ordering partial ordering of memory operations -- Consider
that wmb may be implemented as nothing more than inserting a special barrier
entry in the store queue, or, in the case of x86, it can be a noop as the store
queue is in order. And an rmb may be implemented as a directive to prevent
subsequent loads only so long as their are no previous outstanding loads (while
there could be stores still in store queues).

I can actually see the occasional load/store being reordered around lfence on
my core2. That doesn't prove my above assertions, but it does show the comment
is wrong (unless my program is -- can send it out by request).

So:
   mb() and smp_mb() always have and always will require a full mfence
   or lock prefixed instruction on x86.  And we should remove this comment.
Signed-off-by: NNick Piggin <npiggin@suse.de>
Cc: Paul McKenney <paulmck@us.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4827bbb0

29 9月, 2007 2 次提交

[TCP]: Fix MD5 signature handling on big-endian. · f8ab18d2

由 David S. Miller 提交于 9月 28, 2007

Based upon a report and initial patch by Peter Lieven.

tcp4_md5sig_key and tcp6_md5sig_key need to start with
the exact same members as tcp_md5sig_key.  Because they
are both cast to that type by tcp_v{4,6}_md5_do_lookup().

Unfortunately tcp{4,6}_md5sig_key use a u16 for the key
length instead of a u8, which is what tcp_md5sig_key
uses.  This just so happens to work by accident on
little-endian, but on big-endian it doesn't.

Instead of casting, just place tcp_md5sig_key as the first member of
the address-family specific structures, adjust the access sites, and
kill off the ugly casts.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8ab18d2

[MIPS] Fix CONFIG_BUILD_ELF64 kernels with symbols in CKSEG0. · 9ae6399f

由 Ralf Baechle 提交于 9月 11, 2007

The __pa() for those did assume that all symbols have XKPHYS values and
the math fails for any other address range.
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

9ae6399f

28 9月, 2007 1 次提交

[MIPS] Fix CONFIG_BUILD_ELF64 kernels with symbols in CKSEG0. · 7d809ba3

由 Ralf Baechle 提交于 9月 11, 2007

The __pa() for those did assume that all symbols have XKPHYS values and
the math fails for any other address range.
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

7d809ba3

27 9月, 2007 4 次提交

Revert "[PATCH] x86-64: fix x86_64-mm-sched-clock-share" · ff0ce684

由 Linus Torvalds 提交于 9月 26, 2007

This reverts commit 184c44d2.

As noted by Dave Jones:
   "Linus, please revert the above cset.  It doesn't seem to be
    necessary (it was added to fix a miscompile in 'make allnoconfig'
    which doesn't seem to be repeatable with it reverted) and actively
   breaks the ARM SA1100 framebuffer driver."
Requested-by: NDave Jones <davej@redhat.com>
Cc: Russell King <rmk+lkml@arm.linux.org.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andi Kleen <ak@suse.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ff0ce684

Revert "x86-64: Disable local APIC timer use on AMD systems with C1E" · f7f847b0

由 Linus Torvalds 提交于 9月 26, 2007

This reverts commit e66485d7, since
Rafael Wysocki noticed that the change only works for his in -mm, not in
mainline (and that both "noapictimer" _and_ "apicmaintimer" are broken
on his hardware, but that's apparently not a regression, just a symptom
of the same issue that causes the automatic apic timer disable to not
work).

It turns out that it really doesn't work correctly on x86-64, since
x86-64 doesn't use the generic clock events for timers yet.

Thanks to Rafal for testing, and here's the ugly details on x86-64 as
per Thomas:

  "I just looked into the code and the logic vs.  noapictimer on SMP is
   completely broken.

   On i386 the noapictimer option not only disables the local APIC
   timer, it also registers the CPUs for broadcasting via IPI on SMP
   systems.

   The x86-64 code uses the broadcast only when the local apic timer is
   active, i.e.  "noapictimer" is not on the command line.  This defeats
   the whole purpose of "noapictimer".  It should be there to make boxen
   work, where the local APIC timer actually has a hardware problem,
   e.g.  the nx6325.

   The current implementation of x86_64 only fixes the ACPI c-states
   related problem where the APIC timer stops in C3(2), nothing else.

   On nx6325 and other AMD X2 equipped systems which have the C1E
   enabled we run into the following:

   PIT keeps jiffies (and the system) running, but the local APIC timer
   interrupts can get out of sync due to this C1E effect.

   I don't think this is a critical problem, but it is wrong
   nevertheless.

   I think it's safe to revert the C1E patch and postpone the fix to the
   clock events conversion."

On further reflection, Thomas noted:

   "It's even worse than I thought on the first check:

    "noapictimer" on the command line of an SMP box prevents _ONLY_ the
    boot CPU apic timer from being used.  But the secondary CPU is still
    unconditionally setting up the APIC timer and uses the non
    calibrated variable calibration_result, which is of course 0, to
    setup the APIC timer.  Wreckage guaranteed."

so we'll just have to wait for the x86 merge to hopefully fix this up
for x86-64.
Tested-and-requested-by: NRafael J. Wysocki <rjw@sisk.pl>
Acked-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f7f847b0

x86-64: Disable local APIC timer use on AMD systems with C1E · e66485d7

由 Thomas Gleixner 提交于 9月 25, 2007

commit 3556ddfa titled

 [PATCH] x86-64: Disable local APIC timer use on AMD systems with C1E

solves a problem with AMD dual core laptops e.g. HP nx6325 (Turion 64
X2) with C1E enabled:

When both cores go into idle at the same time, then the system switches
into C1E state, which is basically the same as C3. This stops the local
apic timer.

This was debugged right after the dyntick merge on i386 and despite the
patch title it fixes only the 32 bit path.

x86_64 is still missing this fix. It seems that mainline is not really
affected by this issue, as the PIT is running and keeps jiffies
incrementing, but that's just waiting for trouble.

-mm suffers from this problem due to the x86_64 high resolution timer
patches.

This is a quick and dirty port of the i386 code to x86_64.

I spent quite a time with Rafael to debug the -mm / hrt wreckage until
someone pointed us to this. I really had forgotten that we debugged this
half a year ago already.

Sigh, is it just me or is there something yelling arch/x86 into my ear?
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e66485d7

fix sctp_del_bind_addr() last argument type · 78bd8fbb

由 Al Viro 提交于 9月 26, 2007

It gets pointer to fastcall function, expects a pointer to normal
one and calls the sucker.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

78bd8fbb

26 9月, 2007 3 次提交

SCTP : Add paramters validity check for ASCONF chunk · 6f4c618d

由 Wei Yongjun 提交于 9月 19, 2007

If ADDIP is enabled, when an ASCONF chunk is received with ASCONF
paramter length set to zero, this will cause infinite loop.
By the way, if an malformed ASCONF chunk is received, will cause
processing to access memory without verifying.

This is because of not check the validity of parameters in ASCONF chunk.
This patch fixed this.
Signed-off-by: NWei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>

6f4c618d

SCTP: Clean up OOTB handling and fix infinite loop processing · ece25dfa

由 Vlad Yasevich 提交于 9月 07, 2007

While processing OOTB chunks as well as chunks with an invalid
length of 0, it was possible to SCTP to get wedged inside an
infinite loop because we didn't catch the condition correctly,
or didn't mark the packet for discard correctly.
This work is based on original findings and work by
Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>

ece25dfa

ACPI: CONFIG_ACPI_SLEEP=n power off regression in 2.6.23-rc8 (NOT in rc7) · 853298bc

由 Alexey Starikovskiy 提交于 9月 25, 2007

Signed-off-by: NAlexey Starikovskiy <astarikovskiy@suse.de>
Acked-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLen Brown <len.brown@intel.com>

853298bc

25 9月, 2007 1 次提交

[MIPS] SMTC: Make ack_bad_irq() safe with no IM backstop. · 1146fe30

由 Ralf Baechle 提交于 9月 21, 2007

Issue reported and original patch by Kevin Kissel, cleaner (imho)
implementation by me.
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

1146fe30

23 9月, 2007 2 次提交

ACPI: disable lower idle C-states across suspend/resume · b04e7bdb

由 Thomas Gleixner 提交于 9月 22, 2007

device_suspend() calls ACPI suspend functions, which seems to have undesired
side effects on lower idle C-states. It took me some time to realize that
especially the VAIO BIOSes (both Andrews jinxed UP and my elfstruck SMP one)
show this effect. I'm quite sure that other bug reports against suspend/resume
about turning the system into a brick have the same root cause.

After fishing in the dark for quite some time, I realized that removing the ACPI
processor module before suspend (this removes the lower C-state functionality)
made the problem disappear. Interestingly enough the propability of having a
bricked box is influenced by various factors (interrupts, size of the ram image,
...). Even adding a bunch of printks in the wrong places made the problem go
away. The previous periodic tick implementation simply pampered over the
problem, which explains why the dyntick / clockevents changes made this more
prominent.

We avoid complex functionality during the boot process and we have to do the
same during suspend/resume. It is a similar scenario and equaly fragile.

Add suspend / resume functions to the ACPI processor code and disable the lower
idle C-states across suspend/resume. Fall back to the default idle
implementation (halt) instead.
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Tested-by: NAndrew Morton <akpm@linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b04e7bdb

Blackfin arch: add some missing syscall · 0b95f22b

由 Bryan Wu 提交于 9月 23, 2007

When compiling the Blackfin kernel, checksyscalls.pl will report lots of missing syscalls warnings.
This patch will add some missing syscalls which make sense on Blackfin arch

After appling this patch, toolchain should be rebuilt. Then recompiling the kernel with the new
toolchain.
Signed-off-by: NBryan Wu <bryan.wu@analog.com>

0b95f22b

03 10月, 2007 1 次提交

Binfmt_flat: Add minimum support for the Blackfin relocations · f9720205

由 Bernd Schmidt 提交于 10月 03, 2007

Add minimum support for the Blackfin relocations, since we don't have
enough space in each reloc.  The idea is to store a value with one
relocation so that subsequent ones can access it.

Actually, this patch is required for Blackfin.  Currently if BINFMT_FLAT is
enabled, git-tree kernel will fail to compile.
Signed-off-by: NBernd Schmidt <bernd.schmidt@analog.com>
Signed-off-by: NBryan Wu <bryan.wu@analog.com>
Cc: David McCullough <davidm@snapgear.com>
Cc: Greg Ungerer <gerg@snapgear.com>
Cc: Miles Bader <miles.bader@necel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

f9720205

22 9月, 2007 1 次提交

Revert "x86_64: Quicklist support for x86_64" · da8f153e

由 Linus Torvalds 提交于 9月 21, 2007

This reverts commit 34feb2c8.

Suresh Siddha points out that this one breaks the fundamental
requirement that you cannot free page table pages before the TLB caches
are flushed.  The quicklists do not give the same kinds of guarantees
that the mmu_gather structure does, at least not in NUMA configurations.
Requested-by: NSuresh Siddha <suresh.b.siddha@intel.com>
Acked-by: NAndi Kleen <ak@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <clameter@sgi.com>
Cc: Asit Mallick <asit.k.mallick@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

da8f153e

21 9月, 2007 1 次提交

signalfd simplification · b8fceee1

由 Davide Libenzi 提交于 9月 20, 2007

This simplifies signalfd code, by avoiding it to remain attached to the
sighand during its lifetime.

In this way, the signalfd remain attached to the sighand only during
poll(2) (and select and epoll) and read(2).  This also allows to remove
all the custom "tsk == current" checks in kernel/signal.c, since
dequeue_signal() will only be called by "current".

I think this is also what Ben was suggesting time ago.

The external effect of this, is that a thread can extract only its own
private signals and the group ones.  I think this is an acceptable
behaviour, in that those are the signals the thread would be able to
fetch w/out signalfd.
Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b8fceee1

20 9月, 2007 5 次提交

sched: add /proc/sys/kernel/sched_compat_yield · 1799e35d

由 Ingo Molnar 提交于 9月 19, 2007

add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield()
more agressive, by moving the yielding task to the last position
in the rbtree.

with sched_compat_yield=0:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  2539 mingo     20   0  1576  252  204 R   50  0.0   0:02.03 loop_yield
  2541 mingo     20   0  1576  244  196 R   50  0.0   0:02.05 loop

with sched_compat_yield=1:

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
  2584 mingo     20   0  1576  248  196 R   99  0.0   0:52.45 loop
  2582 mingo     20   0  1576  256  204 R    0  0.0   0:00.00 loop_yield
Signed-off-by: NIngo Molnar <mingo@elte.hu>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>

1799e35d

[MIPS] cpu-bugs64.c: GCC 3.3 constraint workaround · 09abbcff

由 Maciej W. Rozycki 提交于 9月 17, 2007

Add a workaround to address warnings generated on the "n" constraint by
GCC 3.3 and below.
Signed-off-by: NMaciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: NRalf Baechle <ralf@linux-mips.org>

09abbcff

Fix NUMA Memory Policy Reference Counting · 480eccf9

由 Lee Schermerhorn 提交于 9月 18, 2007

This patch proposes fixes to the reference counting of memory policy in the
page allocation paths and in show_numa_map().  Extracted from my "Memory
Policy Cleanups and Enhancements" series as stand-alone.

Shared policy lookup [shmem] has always added a reference to the policy,
but this was never unrefed after page allocation or after formatting the
numa map data.

Default system policy should not require additional ref counting, nor
should the current task's task policy.  However, show_numa_map() calls
get_vma_policy() to examine what may be [likely is] another task's policy.
The latter case needs protection against freeing of the policy.

This patch adds a reference count to a mempolicy returned by
get_vma_policy() when the policy is a vma policy or another task's
mempolicy.  Again, shared policy is already reference counted on lookup.  A
matching "unref" [__mpol_free()] is performed in alloc_page_vma() for
shared and vma policies, and in show_numa_map() for shared and another
task's mempolicy.  We can call __mpol_free() directly, saving an admittedly
inexpensive inline NULL test, because we know we have a non-NULL policy.

Handling policy ref counts for hugepages is a bit trickier.
huge_zonelist() returns a zone list that might come from a shared or vma
'BIND policy.  In this case, we should hold the reference until after the
huge page allocation in dequeue_hugepage().  The patch modifies
huge_zonelist() to return a pointer to the mempolicy if it needs to be
unref'd after allocation.

Kernel Build [16cpu, 32GB, ia64] - average of 10 runs:

		w/o patch	w/ refcount patch
	    Avg	  Std Devn	   Avg	  Std Devn
Real:	 100.59	    0.38	 100.63	    0.43
User:	1209.60	    0.37	1209.91	    0.31
System:   81.52	    0.42	  81.64	    0.34
Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: NAndi Kleen <ak@suse.de>
Cc: Christoph Lameter <clameter@sgi.com>
Acked-by: NMel Gorman <mel@csn.ul.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

480eccf9

Fix user namespace exiting OOPs · 28f300d2

由 Pavel Emelyanov 提交于 9月 18, 2007

It turned out, that the user namespace is released during the do_exit() in
exit_task_namespaces(), but the struct user_struct is released only during the
put_task_struct(), i.e.  MUCH later.

On debug kernels with poisoned slabs this will cause the oops in
uid_hash_remove() because the head of the chain, which resides inside the
struct user_namespace, will be already freed and poisoned.

Since the uid hash itself is required only when someone can search it, i.e.
when the namespace is alive, we can safely unhash all the user_struct-s from
it during the namespace exiting.  The subsequent free_uid() will complete the
user_struct destruction.

For example simple program

   #include <sched.h>

   char stack[2 * 1024 * 1024];

   int f(void *foo)
   {
   	return 0;
   }

   int main(void)
   {
   	clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0);
   	return 0;
   }

run on kernel with CONFIG_USER_NS turned on will oops the
kernel immediately.

This was spotted during OpenVZ kernel testing.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NAlexey Dobriyan <adobriyan@openvz.org>
Acked-by: N"Serge E. Hallyn" <serue@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

28f300d2

Convert uid hash to hlist · 735de223

由 Pavel Emelyanov 提交于 9月 18, 2007

Surprisingly, but (spotted by Alexey Dobriyan) the uid hash still uses
list_heads, thus occupying twice as much place as it could.  Convert it to
hlist_heads.
Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
Signed-off-by: NAlexey Dobriyan <adobriyan@openvz.org>
Acked-by: NSerge Hallyn <serue@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

735de223

19 9月, 2007 1 次提交

[POWERPC] Fix timekeeping on PowerPC 601 · c27da339

由 Benjamin Herrenschmidt 提交于 9月 19, 2007

Recent changes to the timekeeping code broke support for the PowerPC 601
processor which doesn't have the usual timebase facility but a slightly
different thing called (yuck) the RTC.

This fixes it, boot tested on an old 601 based PowerMac 7200.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NPaul Mackerras <paulus@samba.org>

c27da339

17 9月, 2007 5 次提交

Fix non-ISA link error in drivers/scsi/advansys.c · fa890d58

由 Matthew Wilcox 提交于 9月 16, 2007

When CONFIG_ISA is disabled, the isa_driver support will not be compiled
in.  Define stubs so that we don't get link-time errors.
Signed-off-by: NMatthew Wilcox <matthew@wil.cx>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa890d58

[NET] skbuff: Add skb_cow_head · d9cc2048

由 Herbert Xu 提交于 9月 16, 2007

This patch adds an optimised version of skb_cow that avoids the copy if
the header can be modified even if the rest of the payload is cloned.

This can be used in encapsulating paths where we only need to modify the
header. As it is, this can be used in PPPOE and bridging.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d9cc2048

[SCTP]: Convert bind_addr_list locking to RCU · 559cf710

由 Vlad Yasevich 提交于 9月 16, 2007

Since the sctp_sockaddr_entry is now RCU enabled as part of
the patch to synchronize sctp_localaddr_list, it makes sense to
change all handling of these entries to RCU.  This includes the
sctp_bind_addrs structure and it's list of bound addresses.

This list is currently protected by an external rw_lock and that
looks like an overkill.  There are only 2 writers to the list:
bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
These are already seriealized via the socket lock, so they will
not step on each other.  These are also relatively rare, so we
should be good with RCU.

The readers are varied and they are easily converted to RCU.
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NSridhar Samdurala <sri@us.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

559cf710

[SCTP]: Add RCU synchronization around sctp_localaddr_list · 29303547

由 Vlad Yasevich 提交于 9月 16, 2007

sctp_localaddr_list is modified dynamically via NETDEV_UP
and NETDEV_DOWN events, but there is not synchronization
between writer (even handler) and readers.  As a result,
the readers can access an entry that has been freed and
crash the sytem.
Signed-off-by: NVlad Yasevich <vladislav.yasevich@hp.com>
Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: NSridhar Samdurala <sri@us.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

29303547

[SPARC64]: Fix lockdep, particularly on SMP. · 301feb65

由 David S. Miller 提交于 9月 16, 2007

As noted by Al Viro, when we try to call prom_set_trap_table()
in the SMP trampoline code we try to take the PROM call spinlock
which doesn't work because the current thread pointer isn't
valid yet and lockdep depends upon that being correct.

Furthermore, we cannot set the current thread pointer register
because it can't be properly dereferenced until we return from
prom_set_trap_table().  Kernel TLB misses only work after that
call.

So do the PROM call to set the trap table directly instead of
going through the OBP library C code, and thus avoid the lock
altogether.

These calls are guarenteed to be serialized fully.

Since there are now no calls to the prom_set_trap_table{_sun4v}()
library functions, they can be deleted.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

301feb65