提交 · 60a0d23386eab0559ad32ae50b200cc58545f327 · openeuler / raspberrypi-kernel

15 11月, 2007 40 次提交

hibernate: fix lockdep report · 60a0d233

由 Johannes Berg 提交于 11月 14, 2007

Lockdep reports a circular locking dependency in the hibernate code
because
 - during system boot hibernate code (from an initcall) locks pm_mutex
   and then a sysfs buffer mutex via name_to_dev_t
 - during regular operation hibernate code locks pm_mutex under a
   sysfs buffer mutex because it's called from sysfs methods.

The deadlock can never happen because during initcall invocation nothing
can write to sysfs yet. This removes the lockdep report by marking the
initcall locking as being in a different class.
Signed-off-by: NJohannes Berg <johannes@sipsolutions.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Alan Stern <stern@rowland.harvard.edu>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

60a0d233

__do_IRQ does not check IRQ_DISABLED when IRQ_PER_CPU is set · c642b839

由 Russ Anderson 提交于 11月 14, 2007

In __do_IRQ(), the normal case is that IRQ_DISABLED is checked and if set
the handler (handle_IRQ_event()) is not called.

Earlier in __do_IRQ(), if IRQ_PER_CPU is set the code does not check
IRQ_DISABLED and calls the handler even though IRQ_DISABLED is set.  This
behavior seems unintentional.

One user encountering this behavior is the CPE handler (in
arch/ia64/kernel/mca.c).  When the CPE handler encounters too many CPEs
(such as a solid single bit error), it sets up a polling timer and disables
the CPE interrupt (to avoid excessive overhead logging the stream of single
bit errors).  disable_irq_nosync() is called which sets IRQ_DISABLED.  The
IRQ_PER_CPU flag was previously set (in ia64_mca_late_init()).  The net
result is the CPE handler gets called even though it is marked disabled.

If the behavior of not checking IRQ_DISABLED when IRQ_PER_CPU is set is
intentional, it would be worthy of a comment describing the intended
behavior.  disable_irq_nosync() does call chip->disable() to provide a
chipset specifiec interface for disabling the interrupt, which avoids this
issue when used.
Signed-off-by: NRuss Anderson <rja@sgi.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c642b839

pidns: Place under CONFIG_EXPERIMENTAL · 57d5f66b

由 Eric W. Biederman 提交于 11月 14, 2007

This is my trivial patch to swat innumerable little bugs with a single
blow.

After some intensive review (my apologies for not having gotten to this
sooner) what we have looks like a good base to build on with the current
pid namespace code but it is not complete, and it is still much to simple
to find issues where the kernel does the wrong thing outside of the initial
pid namespace.

Until the dust settles and we are certain we have the ABI and the
implementation is as correct as humanly possible let's keep process ID
namespaces behind CONFIG_EXPERIMENTAL.

Allowing us the option of fixing any ABI or other bugs we find as long as
they are minor.

Allowing users of the kernel to avoid those bugs simply by ensuring their
kernel does not have support for multiple pid namespaces.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Cedric Le Goater <clg@fr.ibm.com>
Cc: Adrian Bunk <bunk@kernel.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Kir Kolyshkin <kir@swsoft.com>
Cc: Kirill Korotaev <dev@sw.ru>
Cc: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

57d5f66b

vmstat: fix section mismatch warning · 42614fcd

由 Randy Dunlap 提交于 11月 14, 2007

Mark start_cpu_timer() as __cpuinit instead of __devinit.
Fixes this section warning:

WARNING: vmlinux.o(.text+0x60e53): Section mismatch: reference to .init.text:start_cpu_timer (between 'vmstat_cpuup_callback' and 'vmstat_show')
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Acked-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

42614fcd

gbefb: fix section mismatch warnings · 579d6d93

由 Randy Dunlap 提交于 11月 14, 2007

Make 'default_mode' and 'default_var' be __initdata.
Fixes these section warnings:

WARNING: vmlinux.o(.data+0x128e0): Section mismatch: reference to .init.data:default_mode_CRT (between 'default_mode' and 'default_var')
WARNING: vmlinux.o(.data+0x128e4): Section mismatch: reference to .init.data:default_var_CRT (between 'default_var' and 'dev_attr_size')
Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

579d6d93

mark sys_open/sys_read exports unused · cb51f973

由 Arjan van de Ven 提交于 11月 14, 2007

sys_open / sys_read were used in the early 1.2 days to load firmware from
disk inside drivers.  Since 2.0 or so this was deprecated behavior, but
several drivers still were using this.  Since a few years we have a
request_firmware() API that implements this in a nice, consistent way.
Only some old ISA sound drivers (pre-ALSA) still straggled along for some
time....  however with commit c2b1239a the
last user is now gone.

This is a good thing, since using sys_open / sys_read etc for firmware is a
very buggy to dangerous thing to do; these operations put an fd in the
process file descriptor table....  which then can be tampered with from
other threads for example.  For those who don't want the firmware loader,
filp_open()/vfs_read are the better APIs to use, without this security
issue.

The patch below marks sys_open and sys_read unused now that they're
really not used anymore, and for deletion in the 2.6.25 timeframe.
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cb51f973

fix param_sysfs_builtin name length check · 22800a28

由 Jan Kiszka 提交于 11月 14, 2007

Commit faf8c714 caused a regression:
parameter names longer than MAX_KBUILD_MODNAME will now be rejected,
although we just need to keep the module name part that short.  This patch
restores the old behaviour while still avoiding that memchr is called with
its length parameter larger than the total string length.
Signed-off-by: NJan Kiszka <jan.kiszka@web.de>
Cc: Dave Young <hidave.darkstar@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

22800a28

proc: simplify and correct proc_flush_task · 9fcc2d15

由 Eric W. Biederman 提交于 11月 14, 2007

Currently we special case when we have only the initial pid namespace.
Unfortunately in doing so the copied case for the other namespaces was
broken so we don't properly flush the thread directories :(

So this patch removes the unnecessary special case (removing a usage of
proc_mnt) and corrects the flushing of the thread directories.
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Pavel Emelyanov <xemul@openvz.org>
Cc: Sukadev Bhattiprolu <sukadev@us.ibm.com>
Cc: Kirill Korotaev <dev@sw.ru>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9fcc2d15

mips: undo locking on error path returns · c0f2a9d7

由 Roel Kluin 提交于 11月 14, 2007

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: NRoel Kluin <12o3l@tiscali.nl>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c0f2a9d7

cris gpio: undo locks before returning · 5c6ff79d

由 Roel Kluin 提交于 11月 14, 2007

Signed-off-by: NRoel Kluin <12o3l@tiscali.nl>
Cc: Mikael Starvik <starvik@axis.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5c6ff79d

rd: fix data corruption on memory pressure · 5d0360ee

由 Christian Borntraeger 提交于 11月 14, 2007

We have seen ramdisk based install systems, where some pages of mapped
libraries and programs were suddendly zeroed under memory pressure. This
should not happen, as the ramdisk avoids freeing its pages by keeping them
dirty all the time.

It turns out that there is a case, where the VM makes a ramdisk page clean,
without telling the ramdisk driver. On memory pressure shrink_zone runs
and it starts to run shrink_active_list. There is a check for
buffer_heads_over_limit, and if true, pagevec_strip is called.
pagevec_strip calls try_to_release_page. If the mapping has no releasepage
callback, try_to_free_buffers is called. try_to_free_buffers has now a
special logic for some file systems to make a dirty page clean, if all
buffers are clean. Thats what happened in our test case.

The simplest solution is to provide a noop-releasepage callback for the
ramdisk driver. This avoids try_to_free_buffers for ramdisk pages.
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NNick Piggin <npiggin@suse.de>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5d0360ee

tle62x0 driver stops ignoring read errors · 822bd5aa

由 David Brownell 提交于 11月 14, 2007

The tle62x0 driver was ignoring all read errors. This patch makes it
pass such errors up the stack, instead of returning bogus data.
Signed-off-by: NDavid Brownell <dbrownell@users.sourceforge.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

822bd5aa

fuse_file_alloc(): fix NULL dereferences · 8744969a

由 Adrian Bunk 提交于 11月 14, 2007

Fix obvious NULL dereferences spotted by the Coverity checker.
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Acked-by: NMiklos Szeredi <miklos@szeredi.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8744969a

fix mm/util.c:krealloc() · be21f0ab

由 Adrian Bunk 提交于 11月 14, 2007

Commit ef8b4520 added one NULL check for
"p" in krealloc(), but that doesn't seem to be enough since there
doesn't seem to be any guarantee that memcpy(ret, NULL, 0) works
(spotted by the Coverity checker).

For making it clearer what happens this patch also removes the pointless
min().
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Acked-by: NChristoph Lameter <clameter@sgi.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be21f0ab

sunrpc/xprtrdma/transport.c: fix use-after-free · d5cd9787

由 Adrian Bunk 提交于 11月 14, 2007

Fix an obvious use-after-free spotted by the Coverity checker.
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d5cd9787

serial: only use PNP IRQ if it's valid · e02f5f52

由 Bjorn Helgaas 提交于 11月 14, 2007

"Luming Yu" <luming.yu@gmail.com> says:

  There is a "ttyS1 irq is -1" problem observed on tiger4 which cause the
  serial port broken.

  It is because that there is __no__ ACPI IRQ resource assigned for the
  serial port.  So the value of the IRQ for the port is never changed since it
  got initialized to -1.

If PNP supplies a valid IRQ, use it.  Otherwise, leave port.irq == 0, which
means "no IRQ" to the serial core.
Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Yu Luming <luming.yu@intel.com>
Acked-by: NMatthew Wilcox <matthew@wil.cx>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e02f5f52

i5000_edac: no need to __stringify() KBUILD_BASENAME · 57510c2f

由 Darrick J. Wong 提交于 11月 14, 2007

The i5000_edac driver's PCI registration structure has the name
""i5000_edac"" (with extra set of double-quotes) which is probably not
intentional.  Get rid of __stringify.
Signed-off-by: NDarrick J. Wong <djwong@us.ibm.com>
Cc: Doug Thompson <norsk5@yahoo.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

57510c2f

rtc: fall back to requesting only the ports we actually use · 9626f1f1

由 Bjorn Helgaas 提交于 11月 14, 2007

Firmware like PNPBIOS or ACPI can report the address space consumed by the
RTC.  The actual space consumed may be less than the size (RTC_IO_EXTENT)
assumed by the RTC driver.

The PNP core doesn't request resources yet, but I'd like to make it do so.
If/when it does, the RTC_IO_EXTENT request may fail, which prevents the RTC
driver from loading.

Since we only use the RTC index and data registers at RTC_PORT(0) and
RTC_PORT(1), we can fall back to requesting just enough space for those.

If the PNP core requests resources, this results in typical I/O port usage
like this:

    0070-0073 : 00:06		<-- PNP device 00:06 responds to 70-73
      0070-0071 : rtc		<-- RTC driver uses only 70-71

instead of the current:

    0070-0077 : rtc		<-- RTC_IO_EXTENT == 8
Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9626f1f1

rtc: release correct region in error path · 4c06be10

由 Bjorn Helgaas 提交于 11月 14, 2007

The misc_register() error path always released an I/O port region,
even if the region was memory-mapped (only mips uses memory-mapped RTC,
as far as I can see).
Signed-off-by: NBjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Acked-by: NRalf Baechle <ralf@linux-mips.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4c06be10

reiserfs: don't drop PG_dirty when releasing sub-page-sized dirty file · c06a018f

由 Fengguang Wu 提交于 11月 14, 2007

This is not a new problem in 2.6.23-git17.  2.6.22/2.6.23 is buggy in the
same way.

Reiserfs could accumulate dirty sub-page-size files until umount time.
They cannot be synced to disk by pdflush routines or explicit `sync'
commands.  Only `umount' can do the trick.

The direct cause is: the dirty page's PG_dirty is wrongly _cleared_.
Call trace:
	 [<ffffffff8027e920>] cancel_dirty_page+0xd0/0xf0
	 [<ffffffff8816d470>] :reiserfs:reiserfs_cut_from_item+0x660/0x710
	 [<ffffffff8816d791>] :reiserfs:reiserfs_do_truncate+0x271/0x530
	 [<ffffffff8815872d>] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
	 [<ffffffff8815d3d0>] :reiserfs:reiserfs_file_release+0x1e0/0x340
	 [<ffffffff802a187c>] __fput+0xcc/0x1b0
	 [<ffffffff802a1ba6>] fput+0x16/0x20
	 [<ffffffff8029e676>] filp_close+0x56/0x90
	 [<ffffffff8029fe0d>] sys_close+0xad/0x110
	 [<ffffffff8020c41e>] system_call+0x7e/0x83

Fix the bug by removing the cancel_dirty_page() call. Tests show that
it causes no bad behaviors on various write sizes.

=== for the patient ===
Here are more detailed demonstrations of the problem.

1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
   and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.

------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# cat > /test/tiny
[T1] hi
[T2] root /home/wfg#

------------------------------ screen 1 ------------------------------
[T1] root /home/wfg# echo /test/tiny > /proc/filecache
[T1] root /home/wfg# cat /proc/filecache
     # file /test/tiny
     # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
     # idx   len     state   refcnt
     0       1       ___UD__Bd_      2
[T2] root /home/wfg# cat /proc/filecache
     # file /test/tiny
     # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
     # idx   len     state   refcnt
     0       1       ___U___Bd_      2

2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied.

------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# echo hi > /tmp/hi
[T1] root /home/wfg# cp /tmp/hi /dev/stdin /test
[T2] hi
[T3] root /home/wfg#

------------------------------ screen 1 ------------------------------
[T1] root /proc/4397# cd /proc/`pidof cp`
[T1] root /proc/4713# cat io
     rchar: 8396
     wchar: 3
     syscr: 20
     syscw: 1
     read_bytes: 0
     write_bytes: 20480
     cancelled_write_bytes: 4096
[T2] root /proc/4713# cat io
     rchar: 8399
     wchar: 6
     syscr: 21
     syscw: 2
     read_bytes: 0
     write_bytes: 24576
     cancelled_write_bytes: 4096

//Question: the 'write_bytes' is a bit more than expected ;-)
Tested-by: NMaxim Levitsky <maximlevitsky@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: NFengguang Wu <wfg@mail.ustc.edu.cn>
Reviewed-by: NChris Mason <chris.mason@oracle.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c06a018f

I/OAT: Add support for version 2 of ioatdma device · 7bb67c14

由 Shannon Nelson 提交于 11月 14, 2007

Add support for version 2 of the ioatdma device.  This device handles
the descriptor chain and DCA services slightly differently:
 - Instead of moving the dma descriptors between a busy and an idle chain,
   this new version uses a single circular chain so that we don't have
   rewrite the next_descriptor pointers as we add new requests, and the
   device doesn't need to re-read the last descriptor.
 - The new device has the DCA tags defined internally instead of needing
   them defined statically.
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Cc: "Williams, Dan J" <dan.j.williams@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7bb67c14

Linux Kernel Markers: fix samples to follow format string standard · cc9f2f8f

由 Mathieu Desnoyers 提交于 11月 14, 2007

Add the field names to marker example format string.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cc9f2f8f

Linux Kernel Markers: document format string · 5f9468ce

由 Mathieu Desnoyers 提交于 11月 14, 2007

Describes the format string standard further: Use of field names before the
type specifiers..
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5f9468ce

Linux Kernel Markers: fix marker mutex not taken upon module load · 314de8a9

由 Mathieu Desnoyers 提交于 11月 14, 2007

Upon module load, we must take the markers mutex.  It implies that the marker
mutex must be nested inside the module mutex.

It implies changing the nesting order : now the marker mutex nests inside the
module mutex.  Make the necessary changes to reverse the order in which the
mutexes are taken.

Includes some cleanup from Dave Hansen <haveblue@us.ibm.com>.
Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

314de8a9

Fixes to the BFS filesystem driver · f433dc56

由 Dmitri Vorobiev 提交于 11月 14, 2007

I found a few bugs in the BFS driver.  Detailed description of the bugs as
well as the steps to reproduce the errors are given in the kernel bugzilla.
 Please follow these links for more information:

http://bugzilla.kernel.org/show_bug.cgi?id=9363
http://bugzilla.kernel.org/show_bug.cgi?id=9364
http://bugzilla.kernel.org/show_bug.cgi?id=9365
http://bugzilla.kernel.org/show_bug.cgi?id=9366

This patch fixes the bugs described above.  Besides, the patch introduces
coding style changes to make the BFS driver conform to the requirements
specified for Linux kernel code.  Finally, I made a few cosmetic changes
such as removal of trivial debug output.

Also, the patch removes the fields `si_lf_ioff' and `si_lf_sblk' of the
in-core superblock structure.  These fields are initialized but never
actually used.

If you are wondering why I need BFS, here is the answer: I am using this
driver in the context of Linux kernel classes I am teaching in the Moscow
State University and in the International Institute of Information
Technology in Pune, India.
Signed-off-by: NDmitri Vorobiev <dmitri.vorobiev@gmail.com>
Cc: Tigran Aivazian <tigran@veritas.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f433dc56

revert "Task Control Groups: example CPU accounting subsystem" · cfb52856

由 Andrew Morton 提交于 11月 14, 2007

Revert 62d0df64.

This was originally intended as a simple initial example of how to create a
control groups subsystem; it wasn't intended for mainline, but I didn't make
this clear enough to Andrew.

The CFS cgroup subsystem now has better functionality for the per-cgroup usage
accounting (based directly on CFS stats) than the "usage" status file in this
patch, and the "load" status file is rather simplistic - although having a
per-cgroup load average report would be a useful feature, I don't believe this
patch actually provides it.  If it gets into the final 2.6.24 we'd probably
have to support this interface for ever.

Cc: Paul Menage <menage@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cfb52856

hugetlb: fix i_blocks accounting · 45c682a6

由 Ken Chen 提交于 11月 14, 2007

For administrative purpose, we want to query actual block usage for
hugetlbfs file via fstat.  Currently, hugetlbfs always return 0.  Fix that
up since kernel already has all the information to track it properly.
Signed-off-by: NKen Chen <kenchen@google.com>
Acked-by: NAdam Litke <agl@us.ibm.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Cc: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

45c682a6

mm/hugetlb.c: make a function static · 8cde045c

由 Adrian Bunk 提交于 11月 14, 2007

return_unused_surplus_pages() can become static.
Signed-off-by: NAdrian Bunk <bunk@kernel.org>
Acked-by: NAdam Litke <agl@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8cde045c

hugetlb: enforce quotas during reservation for shared mappings · 90d8b7e6

由 Adam Litke 提交于 11月 14, 2007

When a MAP_SHARED mmap of a hugetlbfs file succeeds, huge pages are reserved
to guarantee no problems will occur later when instantiating pages.  If quotas
are in force, page instantiation could fail due to a race with another process
or an oversized (but approved) shared mapping.

To prevent these scenarios, debit the quota for the full reservation amount up
front and credit the unused quota when the reservation is released.
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

90d8b7e6

hugetlb: allow bulk updating in hugetlb_*_quota() · 9a119c05

由 Adam Litke 提交于 11月 14, 2007

Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to
allow bulk updating of the sbinfo->free_blocks counter.  This will be used by
the next patch in the series.
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9a119c05

hugetlb: debit quota in alloc_huge_page · 2fc39cec

由 Adam Litke 提交于 11月 14, 2007

Now that quota is credited by free_huge_page(), calls to hugetlb_get_quota()
seem out of place.  The alloc/free API is unbalanced because we handle the
hugetlb_put_quota() but expect the caller to open-code hugetlb_get_quota().
Move the get inside alloc_huge_page to clean up this disparity.

This patch has been kept apart from the previous patch because of the somewhat
dodgy ERR_PTR() use herein.  Moving the quota logic means that
alloc_huge_page() has two failure modes.  Quota failure must result in a
SIGBUS while a standard allocation failure is OOM.  Unfortunately, ERR_PTR()
doesn't like the small positive errnos we have in VM_FAULT_* so they must be
negated before they are used.

Does anyone take issue with the way I am using PTR_ERR.  If so, what are your
thoughts on how to clean this up (without needing an if,else if,else block at
each alloc_huge_page() callsite)?
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2fc39cec

hugetlb: fix quota management for private mappings · c79fb75e

由 Adam Litke 提交于 11月 14, 2007

The hugetlbfs quota management system was never taught to handle MAP_PRIVATE
mappings when that support was added.  Currently, quota is debited at page
instantiation and credited at file truncation.  This approach works correctly
for shared pages but is incomplete for private pages.  In addition to
hugetlb_no_page(), private pages can be instantiated by hugetlb_cow(); but
this function does not respect quotas.

Private huge pages are treated very much like normal, anonymous pages.  They
are not "backed" by the hugetlbfs file and are not stored in the mapping's
radix tree.  This means that private pages are invisible to
truncate_hugepages() so that function will not credit the quota.

This patch (based on a prototype provided by Ken Chen) moves quota crediting
for all pages into free_huge_page().  page->private is used to store a pointer
to the mapping to which this page belongs.  This is used to credit quota on
the appropriate hugetlbfs instance.
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c79fb75e

hugetlb: split alloc_huge_page into private and shared components · 348ea204

由 Adam Litke 提交于 11月 14, 2007

Hugetlbfs implements a quota system which can limit the amount of memory that
can be used by the filesystem.  Before allocating a new huge page for a file,
the quota is checked and debited.  The quota is then credited when truncating
the file.  I found a few bugs in the code for both MAP_PRIVATE and MAP_SHARED
mappings.  Before detailing the problems and my proposed solutions, we should
agree on a definition of quotas that properly addresses both private and
shared pages.  Since the purpose of quotas is to limit total memory
consumption on a per-filesystem basis, I argue that all pages allocated by the
fs (private and shared) should be charged against quota.

Private Mappings
================

The current code will debit quota for private pages sometimes, but will never
credit it.  At a minimum, this causes a leak in the quota accounting which
renders the accounting essentially useless as it is.  Shared pages have a one
to one mapping with a hugetlbfs file and are easy to account by debiting on
allocation and crediting on truncate.  Private pages are anonymous in nature
and have a many to one relationship with their hugetlbfs files (due to copy on
write).  Because private pages are not indexed by the mapping's radix tree,
thier quota cannot be credited at file truncation time.  Crediting must be
done when the page is unmapped and freed.

Shared Pages
============

I discovered an issue concerning the interaction between the MAP_SHARED
reservation system and quotas.  Since quota is not checked until page
instantiation, an over-quota mmap/reservation will initially succeed.  When
instantiating the first over-quota page, the program will receive SIGBUS.
This is inconsistent since the reservation is supposed to be a guarantee.  The
solution is to debit the full amount of quota at reservation time and credit
the unused portion when the reservation is released.

This patch series brings quotas back in line by making the following
modifications:
 * Private pages
   - Debit quota in alloc_huge_page()
   - Credit quota in free_huge_page()
 * Shared pages
   - Debit quota for entire reservation at mmap time
   - Credit quota for instantiated pages in free_huge_page()
   - Credit quota for unused reservation at munmap time

This patch:

The shared page reservation and dynamic pool resizing features have made the
allocation of private vs.  shared huge pages quite different.  By splitting
out the private/shared-specific portions of the process into their own
functions, readability is greatly improved.  alloc_huge_page now calls the
proper helper and performs common operations.

[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Cc: Ken Chen <kenchen@google.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

348ea204

raid5: fix unending write sequence · 6c55be8b

由 Dan Williams 提交于 11月 14, 2007

<debug output from Joel's system>
handling stripe 7629696, state=0x14 cnt=1, pd_idx=2 ops=0:0:0
check 5: state 0x6 toread 0000000000000000 read 0000000000000000 write fffff800ffcffcc0 written 0000000000000000
check 4: state 0x6 toread 0000000000000000 read 0000000000000000 write fffff800fdd4e360 written 0000000000000000
check 3: state 0x1 toread 0000000000000000 read 0000000000000000 write 0000000000000000 written 0000000000000000
check 2: state 0x1 toread 0000000000000000 read 0000000000000000 write 0000000000000000 written 0000000000000000
check 1: state 0x6 toread 0000000000000000 read 0000000000000000 write fffff800ff517e40 written 0000000000000000
check 0: state 0x6 toread 0000000000000000 read 0000000000000000 write fffff800fd4cae60 written 0000000000000000
locked=4 uptodate=2 to_read=0 to_write=4 failed=0 failed_num=0
for sector 7629696, rmw=0 rcw=0
</debug>

These blocks were prepared to be written out, but were never handled in
ops_run_biodrain(), so they remain locked forever.  The operations flags
are all clear which means handle_stripe() thinks nothing else needs to be
done.

This state suggests that the STRIPE_OP_PREXOR bit was sampled 'set' when it
should not have been.  This patch cleans up cases where the code looks at
sh->ops.pending when it should be looking at the consistent stack-based
snapshot of the operations flags.

Report from Joel:
	Resync done. Patch fix this bug.
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Tested-by: NJoel Bertrand <joel.bertrand@systella.fr>
Cc: <stable@kernel.org>
Cc: Neil Brown <neilb@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6c55be8b

hugetlb: follow_hugetlb_page() for write access · 5b23dbe8

由 Adam Litke 提交于 11月 14, 2007

When calling get_user_pages(), a write flag is passed in by the caller to
indicate if write access is required on the faulted-in pages.  Currently,
follow_hugetlb_page() ignores this flag and always faults pages for
read-only access.  This can cause data corruption because a device driver
that calls get_user_pages() with write set will not expect COW faults to
occur on the returned pages.

This patch passes the write flag down to follow_hugetlb_page() and makes
sure hugetlb_fault() is called with the right write_access parameter.

[ezk@cs.sunysb.edu: build fix]
Signed-off-by: NAdam Litke <agl@us.ibm.com>
Reviewed-by: NKen Chen <kenchen@google.com>
Cc: David Gibson <hermes@gibson.dropbear.id.au>
Cc: William Lee Irwin III <wli@holomorphy.com>
Cc: Badari Pulavarty <pbadari@us.ibm.com>
Signed-off-by: NErez Zadok <ezk@cs.sunysb.edu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5b23dbe8

atmel_serial build warnings begone · 19cd7537

由 David Brownell 提交于 11月 14, 2007

Remove annoying build warnings about unused variables in atmel_serial,
which afflict both AT91 and AVR32 builds.
Signed-off-by: NDavid Brownell <dbrownell@users.sourceforge.net>
Acked-by: NHaavard Skinnemoen <hskinnemoen@atmel.com>
Cc: Andrew Victor <andrew@sanpeople.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

19cd7537

dmaengine: fix broken device refcounting · 348badf1

由 Haavard Skinnemoen 提交于 11月 14, 2007

When a DMA device is unregistered, its reference count is decremented twice
for each channel: Once dma_class_dev_release() and once in
dma_chan_cleanup().  This may result in the DMA device driver's remove()
function completing before all channels have been cleaned up, causing lots
of use-after-free fun.

Fix it by incrementing the device's reference count twice for each
channel during registration.

[dan.j.williams@intel.com: kill unnecessary client refcounting]
Signed-off-by: NHaavard Skinnemoen <hskinnemoen@atmel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>
Signed-off-by: NShannon Nelson <shannon.nelson@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

348badf1

drivers/misc: Move misplaced pci_dev_put's · 90d8dabf

由 Julia Lawall 提交于 11月 14, 2007

Move pci_dev_put outside the loops in which it occurs.  Within the loop,
pci_dev_put is done implicitly by pci_get_device.

The problem was detected using the following semantic patch, and corrected
by hand.

@@
expression dev;
expression E;
@@

- pci_dev_put(dev)
   ... when != dev = E
- pci_get_device(...,dev)
Signed-off-by: NJulia Lawall <julia@diku.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

90d8dabf

paride: pf driver fixes · e62aa046

由 Ondrej Zary 提交于 11月 14, 2007

The pf driver for parallel port floppy drives seems to be broken. At least
with Imation SuperDisk with EPAT chip, the driver calls pi_connect() and
pi_disconnect after each transferred sector. At least with EPAT, this
operation is very expensive - causes drive recalibration. Thus, transferring
even a single byte (dd if=/dev/pf0 of=/dev/null bs=1 count=1) takes 20
seconds, making the driver useless.

The pf_next_buf() function seems to be broken as it returns 1 always (except
when pf_run is non-zero), causing the loop in do_pf_read_drq (and
do_pf_write_drq) to be executed only once.

The following patch fixes this problem. It also fixes swapped descriptions in
pf_lock() function and removes DBMSG macro, which seems useless.
Signed-off-by: NOndrej Zary <linux@rainbow-software.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e62aa046

spi: fix error paths on txx9spi_probe · ba0a7f39

由 Atsushi Nemoto 提交于 11月 14, 2007

Some error paths in txx9spi_probe wrongly return 0.  This patch fixes them by
using the devres interfaces.
Signed-off-by: NAtsushi Nemoto <anemo@mba.ocn.ne.jp>
Acked-by: NDavid Brownell <david-b@pacbell.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ba0a7f39