提交 · fef2c9bc1b54c0261324a96e948c0b849796e896 · openanolis / cloud-kernel

23 3月, 2011 40 次提交

kernel/watchdog.c: allow hardlockup to panic by default · fef2c9bc

由 Don Zickus 提交于 3月 22, 2011

When a cpu is considered stuck, instead of limping along and just printing
a warning, it is sometimes preferred to just panic, let kdump capture the
vmcore and reboot.  This gets the machine back into a stable state quickly
while saving the info that got it into a stuck state to begin with.

Add a Kconfig option to allow users to set the hardlockup to panic
by default.  Also add in a 'nmi_watchdog=nopanic' to override this.

[akpm@linux-foundation.org: fix strncmp length]
Signed-off-by: NDon Zickus <dzickus@redhat.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fef2c9bc

calibrate: retry with wider bounds when converge seems to fail · b1b5f65e

由 Phil Carmody 提交于 3月 22, 2011

Systems with unmaskable interrupts such as SMIs may massively
underestimate loops_per_jiffy, and fail to converge anywhere near the real
value.  A case seen on x86_64 was an initial estimate of 256<<12, which
converged to 511<<12 where the real value should have been over 630<<12.
This admitedly requires bypassing the TSC calibration (lpj_fine), and a
failure to settle in the direct calibration too, but is physically
possible.  This failure does not depend on my previous calibration
optimisation, but by luck is easy to fix with the optimisation in place
with a trivial retry loop.

In the context of the optimised converging method, as we can no longer
trust the starting estimate, enlarge the search bounds exponentially so
that the number of retries is logarithmically bounded.

[akpm@linux-foundation.org: mention x86_64 SMIs in comment]
Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: NStephen Boyd <sboyd@codeaurora.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b1b5f65e

calibrate: home in on correct lpj value more quickly · 191e5688

由 Phil Carmody 提交于 3月 22, 2011

Binary chop with a jiffy-resync on each step to find an upper bound is
slow, so just race in a tight-ish loop to find an underestimate.

If done with lots of individual steps, sometimes several hundreds of
iterations would be required, which would impose a significant overhead,
and make the initial estimate very low.  By taking slowly increasing steps
there will be less overhead.

E.g.  an x86_64 2.67GHz could have fitted in 613 individual small delays,
but in reality should have been able to fit in a single delay 644 times
longer, so underestimated by 31 steps.  To reach the equivalent of 644
small delays with the accelerating scheme now requires about 130
iterations, so has <1/4th of the overhead, and can therefore be expected
to underestimate by only 7 steps.

As now we have a better initial estimate we can binary chop over a smaller
range.  With the loop overhead in the initial estimate kept low, and the
step sizes moderate, we won't have under-estimated by much, so chose as
tight a range as we can.
Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: NStephen Boyd <sboyd@codeaurora.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

191e5688

calibrate: extract fall-back calculation into own helper · 71c696b1

由 Phil Carmody 提交于 3月 22, 2011

The motivation for this patch series is that currently our OMAP calibrates
itself using the trial-and-error binary chop fallback that some other
architectures no longer need to perform.  This is a lengthy process,
taking 0.2s in an environment where boot time is of great interest.

Patch 2/4 has two optimisations.  Firstly, it replaces the initial
repeated- doubling to find the relevant power of 2 with a tight loop that
just does as much as it can in a jiffy.  Secondly, it doesn't binary chop
over an entire power of 2 range, it choses a much smaller range based on
how much it squeezed in, and failed to squeeze in, during the first stage.
 Both are significant optimisations, and bring our calibration down from
23 jiffies to 5, and, in the process, often arrive at a more accurate lpj
value.

The 'bands' and 'sub-logarithmic' growth may look over-engineered, but
they only cost a small level of inaccuracy in the initial guess (for all
architectures) in order to avoid the very large inaccuracies that appeared
during testing (on x86_64 architectures, and presumably others with less
metronomic operation).  Note that due to the existence of the TSC and
other timers, the x86_64 will not typically use this fallback routine, but
I wanted to code defensively, able to cope with all kinds of processor
behaviours and kernel command line options.

Patch 3/4 is an additional trap for the nightmare scenario where the
initial estimate is very inaccurate, possibly due to things like SMIs.
It simply retries with a larger bound.

Stephen said:

I tried this patch set out on an MSM7630.
:
: Before:
:
: Calibrating delay loop... 681.57 BogoMIPS (lpj=3407872)
:
: After:
:
: Calibrating delay loop... 680.75 BogoMIPS (lpj=3403776)
:
: But the really good news is calibration time dropped from ~247ms to ~56ms.
:  Sadly we won't be able to benefit from this should my udelay patches make
: it into ARM because we would be using calibrate_delay_direct() instead (at
: least on machines who choose to).  Can we somehow reapply the logic behind
: this to calibrate_delay_direct()?  That would be even better, but this is
: definitely a boot time improvement.
:
: Or maybe we could just replace calibrate_delay_direct() with this fallback
: calculation?  If __delay() is a thin wrapper around read_current_timer()
: it should work just as well (plus patch 3 makes it handle SMIs).  I'll try
: that out.

This patch:

... so that it can be modified more clinically.

This is almost entirely cosmetic. The only change to the operation
is that the global variable is only set once after the estimation is
completed, rather than taking on all the intermediate values. However,
there are no readers of that variable, so this change is unimportant.
Signed-off-by: NPhil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Tested-by: NStephen Boyd <sboyd@codeaurora.org>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

71c696b1

sys_unshare: remove the dead CLONE_THREAD/SIGHAND/VM code · 9bfb23fc

由 Oleg Nesterov 提交于 3月 22, 2011

Cleanup: kill the dead code which does nothing but complicates the code
and confuses the reader.

sys_unshare(CLONE_THREAD/SIGHAND/VM) is not really implemented, and I
doubt very much it will ever work.  At least, nobody even tried since the
original 99d1419d ("unshare system call -v5: system call
handler function") was applied more than 4 years ago.

And the code is not consistent.  unshare_thread() always fails
unconditionally, while unshare_sighand() and unshare_vm() pretend to work
if there is nothing to unshare.

Remove unshare_thread(), unshare_sighand(), unshare_vm() helpers and
related variables and add a simple CLONE_THREAD | CLONE_SIGHAND| CLONE_VM
check into check_unshare_flags().

Also, move the "CLONE_NEWNS needs CLONE_FS" check from
check_unshare_flags() to sys_unshare().  This looks more consistent and
matches the similar do_sysvsem check in sys_unshare().

Note: with or without this patch "atomic_read(mm->mm_users) > 1" can give
a false positive due to get_task_mm().
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NRoland McGrath <roland@redhat.com>
Cc: Janak Desai <janak@us.ibm.com>
Cc: Daniel Lezcano <daniel.lezcano@free.fr>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9bfb23fc

kernel/cpu.c: fix many errors related to style. · 4d51985e

由 Michael Rodriguez 提交于 3月 22, 2011

Change the printk() calls to have the KERN_INFO/KERN_ERROR stuff, and
fixes other coding style errors.  Not _all_ of them are gone, though.

[akpm@linux-foundation.org: revert the bits I disagree with]
Signed-off-by: NMichael Rodriguez <dkingston02@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4d51985e

smp: move smp setup functions to kernel/smp.c · 34db18a0

由 Amerigo Wang 提交于 3月 22, 2011

Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter
setup functions from init/main.c to kenrel/smp.c, saves some #ifdef
CONFIG_SMP.
Signed-off-by: NWANG Cong <amwang@redhat.com>
Cc: Rakib Mullick <rakib.mullick@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

34db18a0

include/linux/err.h: add a function to cast error-pointers to a return value · fa9ee9c4

由 Uwe Kleine-König 提交于 3月 22, 2011

PTR_RET() can be used if you have an error-pointer and are only interested
in the eventual error value, but not the pointer.  Yields the usual 0 for
no error, -ESOMETHING otherwise.
Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fa9ee9c4

move x86 specific oops=panic to generic code · d404ab0a

由 Olaf Hering 提交于 3月 22, 2011

The oops=panic cmdline option is not x86 specific, move it to generic code.
Update documentation.
Signed-off-by: NOlaf Hering <olaf@aepfle.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d404ab0a

drivers/misc/pch_phub.c: add MODULE_DEVICE_TABLE · b2595142

由 Axel Lin 提交于 3月 22, 2011

The device table is required to load modules based on modaliases.
Signed-off-by: NAxel Lin <axel.lin@gmail.com>
Cc: Masayuki Ohtak <masa-korg@dsn.okisemi.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b2595142

drivers/misc/ep93xx_pwm.c: world-writable sysfs files · deb187e7

由 Vasiliy Kulikov 提交于 3月 22, 2011

Don't allow everybody to change device settings.
Signed-off-by: NVasiliy Kulikov <segoon@openwall.com>
Acked-by: NHartley Sweeten <hartleys@visionengravers.com>
Cc: Matthieu Crapet <mcrapet@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

deb187e7

drivers/misc/atmel_tclib.c: fix a memory leak · a844b43c

由 Axel Lin 提交于 3月 22, 2011

request_mem_region() will call kzalloc to allocate memory for struct
resource.  release_resource() unregisters the resource but does not free
the allocated memory, thus use release_mem_region() instead to fix the
memory leak.
Signed-off-by: NAxel Lin <axel.lin@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a844b43c

drivers/misc/hmc6352.c: fix wrong return value checking for i2c_master_recv() · 6f7d485e

由 Axel Lin 提交于 3月 22, 2011

i2c_master_recv() returns negative errno, or else the number of bytes
read.  Thus i2c_master_recv(client, i2c_data, 2) returns 2 instead of 1 in
success case.

[akpm@linux-foundation.org: make `ret' signed]
Signed-off-by: NAxel Lin <axel.lin@gmail.com>
Cc: Kalhan Trisal <kalhan.trisal@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6f7d485e

drivers/misc/apds9802als.c: put the device into runtime suspend after resume()/probe() is handled · 4e673599

由 Hong Liu 提交于 3月 22, 2011

Put the device into runtime suspend after resume()/probe() is handled by
the PM core and the device core code.  No need to manually add them in
each single driver.  And correct the runtime state in remove().
Signed-off-by: NHong Liu <hong.liu@intel.com>
Signed-off-by: NAlan Cox <alan@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4e673599

ST SPEAr: PCIE gadget suppport · b9500546

由 Pratyush Anand 提交于 3月 22, 2011

This is a configurable gadget.  can be configured by configfs interface.
Any IP available at PCIE bus can be programmed to be used by host
controller.It supoorts both INTX and MSI.

By default, the gadget is configured for INTX and SYSRAM1 is mapped to
BAR0 with size 0x1000
Signed-off-by: NPratyush Anand <pratyush.anand@st.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Viresh Kumar <viresh.kumar@st.com>
Cc: Shiraz Hashim <shiraz.hashim@st.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b9500546

drivers/misc/bmp085.c: free initmem memory · 45bff2ea

由 Shubhrajyoti Datta 提交于 3月 22, 2011

Free the memory that is used only at init
Signed-off-by: NShubhrajyoti Datta <shubhrajyoti@ti.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

45bff2ea

bh1780gli: convert to dev pm ops · 4a7de634

由 Shubhrajyoti Datta 提交于 3月 22, 2011

Signed-off-by: NShubhrajyoti Datta <shubhrajyoti@ti.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4a7de634

fs.h: remove 8 bytes of padding from block_device on 64bit builds · f3ccfcda

由 Richard Kennedy 提交于 3月 22, 2011

Re-ordering struct block_inode to remove 8 bytes of padding on 64 bit
builds, which also shrinks bdev_inode by 8 bytes (776 -> 768) allowing it
to fit into one fewer cache lines.
Signed-off-by: NRichard Kennedy <richard@rsk.demon.co.uk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f3ccfcda

include/linux/compiler-gcc*.h: unify macro definitions · c837fb37

由 Borislav Petkov 提交于 3月 22, 2011

Unify identical gcc3.x and gcc4.x macros.
Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c837fb37

fs: use appropriate printk priority levels · 80cdc6da

由 Mandeep Singh Baines 提交于 3月 22, 2011

printk()s without a priority level default to KERN_WARNING.  To reduce
noise at KERN_WARNING, this patch set the priority level appriopriately
for unleveled printks()s.  This should be useful to folks that look at
dmesg warnings closely.
Signed-off-by: NMandeep Singh Baines <msb@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

80cdc6da

add the common dma_addr_t typedef to include/linux/types.h · 3e50594e

由 FUJITA Tomonori 提交于 3月 22, 2011

All architectures can use the common dma_addr_t typedef now. We can
remove the arch specific dma_addr_t.
Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3e50594e

um: remove file pointer from ioctl · 8a06dc4d

由 Richard Weinberger 提交于 3月 22, 2011

Commit 6caa76b7 ("tty: now phase out the ioctl file pointer for good")
removed the ioctl file pointer.  User Mode Linux's line driver uses this
ioctl and needs a signature update too.
Signed-off-by: NRichard Weinberger <richard@nod.at>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <greg@kroah.com>
Cc: Jeff Dike <jdike@addtoit.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8a06dc4d

uml: kernels on {i386,x86_64} produce bad coredumps · 13e165ba

由 Paul Pluzhnikov 提交于 3月 22, 2011

One of our users reported that when a user-level program SIGSEGVs under
UML kernel, the resulting core dump is not very usable.

I have reproduced that with the latest kernel:

  make ARCH=um defconfig; make ARCH=um

Run the resulting kernel, then "inside" run this program:

#include <pthread.h>

void *fn(void *p)
{
 abort();
}

int main()
{
 pthread_t tid;
 pthread_create(&tid, 0, fn, 0);
 pthread_join(tid, 0);
 return 0;
}

Analyze the coredump with GDB. Here is what you'll see:

sudo gdb -q -ex 'set solib-absolute-prefix ../root_fs' -ex 'file ../root_fs/var/tmp/mt-abort' -ex 'core ../root_fs/var/tmp/core.762'
Reading symbols from /usr/local/google/root_fs/var/tmp/mt-abort...done.
[New Thread 763]
[New Thread 762]
Core was generated by `./mt-abort'.
Program terminated with signal 6, Aborted.
#0  0x0000000040255250 in raise () from ../root_fs/lib64/libc.so.6
(gdb) info thread
  2 Thread 762  0x0000000000000000 in ?? ()
* 1 Thread 763  0x0000000040255250 in raise () from ../root_fs/lib64/libc.so.6

Note that thread#2 looks funny.

(gdb) thread 2
[Switching to thread 2 (Thread 762)]#0  0x0000000000000000 in ?? ()
(gdb) info reg
rax            0x0      0
rbx            0x0      0
rcx            0x0      0
rdx            0x0      0
rsi            0x0      0
rdi            0x0      0
rbp            0x0      0x0
rsp            0x0      0x0
r8             0x0      0
r9             0x0      0
r10            0x0      0
r11            0x0      0
r12            0x0      0
r13            0x0      0
r14            0x0      0
r15            0x0      0
rip            0x0      0
eflags         0x0      [ ]
cs             0x0      0
ss             0x0      0
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0

Examining the core shows that NT_PRSTATUS notes for all threads other than
the one that crashed are zeroed out.

I believe this is happening because neither ELF_CORE_COPY_TASK_REGS nor
task_pt_regs are defined under ARCH=um, and so elf_core_copy_task_regs()
becomes a no-op.

Attached patch fixes this for SUBARCH={x86_64,i386}.
Signed-off-by: NPaul Pluzhnikov <ppluzhnikov@google.com>
Cc: Jeff Dike <jdike@addtoit.com>
Acked-by: NWANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

13e165ba

mm: simplify code of swap.c · 3dd7ae8e

由 Shaohua Li 提交于 3月 22, 2011

Clean up code and remove duplicate code. Next patch will use
pagevec_lru_move_fn introduced here too.
Signed-off-by: NShaohua Li <shaohua.li@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hiroyuki Kamezawa <kamezawa.hiroyuki@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3dd7ae8e

shmem: let shared anonymous be nonlinear again · bee4c36a

由 Hugh Dickins 提交于 3月 22, 2011

Up to 2.6.22, you could use remap_file_pages(2) on a tmpfs file or a
shared mapping of /dev/zero or a shared anonymous mapping.  In 2.6.23 we
disabled it by default, but set VM_CAN_NONLINEAR to enable it on safe
mappings.  We made sure to set it in shmem_mmap() for tmpfs files, but
missed it in shmem_zero_setup() for the others.  Fix that at last.
Reported-by: NKenny Simpson <theonetruekenny@yahoo.com>
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bee4c36a

mm/memblock: properly handle overlaps and fix error path · 8f7a6605

由 Benjamin Herrenschmidt 提交于 3月 22, 2011

Currently memblock_reserve() or memblock_free() don't handle overlaps of
any kind.  There is some special casing for coalescing exactly adjacent
regions but that's about it.

This is annoying because typically memblock_reserve() is used to mark
regions passed by the firmware as reserved and we all know how much we can
trust our firmwares...

Also, with the current code, if we do something it doesn't handle right
such as trying to memblock_reserve() a large range spanning multiple
existing smaller reserved regions for example, or doing overlapping
reservations, it can silently corrupt the internal region array, causing
odd errors much later on, such as allocations returning reserved regions
etc...

This patch rewrites the underlying functions that add or remove a region
to the arrays.  The new code is a lot more robust as it fully handles
overlapping regions.  It's also, imho, simpler than the previous
implementation.

In addition, while doing so, I found a bug where if we fail to double the
array while adding a region, we would remove the last region of the array
rather than the region we just allocated.  This fixes it too.
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: NYinghai Lu <yinghai@kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8f7a6605

mm/page_alloc.c: use list_move() instead of list_del()/list_add() combination · 84be48d8

由 Kirill A. Shutemov 提交于 3月 22, 2011

Signed-off-by: NKirill A. Shutemov <kirill@shutemov.name>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

84be48d8

vmalloc: remove confusing comment on vwrite() · a42931bf

由 Namhyung Kim 提交于 3月 22, 2011

KM_USER1 is never used for vwrite() path so the caller doesn't need to
guarantee it is not used.  Only the caller should guarantee is KM_USER0
and it is commented already.
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a42931bf

writeback: make mapping->writeback_index to point to the last written page · cf15b07c

由 Jun'ichi Nomura 提交于 3月 22, 2011

For range-cyclic writeback (e.g.  kupdate), the writeback code sets a
continuation point of the next writeback to mapping->writeback_index which
is set the page after the last written page.  This happens so that we
evenly write the whole file even if pages in it get continuously
redirtied.

However, in some cases, sequential writer is writing in the middle of the
page and it just redirties the last written page by continuing from that.
For example with an application which uses a file as a big ring buffer we
see:

[1st writeback session]
       ...
       flush-8:0-2743  4571: block_bio_queue: 8,0 W 94898514 + 8
       flush-8:0-2743  4571: block_bio_queue: 8,0 W 94898522 + 8
       flush-8:0-2743  4571: block_bio_queue: 8,0 W 94898530 + 8
       flush-8:0-2743  4571: block_bio_queue: 8,0 W 94898538 + 8
       flush-8:0-2743  4571: block_bio_queue: 8,0 W 94898546 + 8
     kworker/0:1-11    4571: block_rq_issue: 8,0 W 0 () 94898514 + 40
>>     flush-8:0-2743  4571: block_bio_queue: 8,0 W 94898554 + 8
>>     flush-8:0-2743  4571: block_rq_issue: 8,0 W 0 () 94898554 + 8

[2nd writeback session after 35sec]
       flush-8:0-2743  4606: block_bio_queue: 8,0 W 94898562 + 8
       flush-8:0-2743  4606: block_bio_queue: 8,0 W 94898570 + 8
       flush-8:0-2743  4606: block_bio_queue: 8,0 W 94898578 + 8
       ...
     kworker/0:1-11    4606: block_rq_issue: 8,0 W 0 () 94898562 + 640
     kworker/0:1-11    4606: block_rq_issue: 8,0 W 0 () 94899202 + 72
       ...
       flush-8:0-2743  4606: block_bio_queue: 8,0 W 94899962 + 8
       flush-8:0-2743  4606: block_bio_queue: 8,0 W 94899970 + 8
       flush-8:0-2743  4606: block_bio_queue: 8,0 W 94899978 + 8
       flush-8:0-2743  4606: block_bio_queue: 8,0 W 94899986 + 8
       flush-8:0-2743  4606: block_bio_queue: 8,0 W 94899994 + 8
     kworker/0:1-11    4606: block_rq_issue: 8,0 W 0 () 94899962 + 40
>>     flush-8:0-2743  4606: block_bio_queue: 8,0 W 94898554 + 8
>>     flush-8:0-2743  4606: block_rq_issue: 8,0 W 0 () 94898554 + 8

So we seeked back to 94898554 after we wrote all the pages at the end of
the file.

This extra seek seems unnecessary.  If we continue writeback from the last
written page, we can avoid it and do not cause harm to other cases.  The
original intent of even writeout over the whole file is preserved and if
the page does not get redirtied pagevec_lookup_tag() just skips it.

As an exceptional case, when I/O error happens, set done_index to the next
page as the comment in the code suggests.
Tested-by: NWu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cf15b07c

mm: remove inline from scan_swap_map() · 24b8ff7c

由 Cesar Eduardo Barros 提交于 3月 22, 2011

scan_swap_map() is a large function (224 lines), with several loops and a
complex control flow involving several gotos.

Given all that, it is a bit silly that it is marked as inline.  The
compiler agrees with me: on a x86-64 compile, it did not inline the
function.

Remove the "inline" and let the compiler decide instead.
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

24b8ff7c

sys_swapon: separate final enabling of the swapfile · 40531542

由 Cesar Eduardo Barros 提交于 3月 22, 2011

The block in sys_swapon which does the final adjustments to the
swap_info_struct and to swap_list is the same as the block which
re-inserts it again at sys_swapoff on failure of try_to_unuse(). Move
this code to a separate function, and use it both in sys_swapon and
sys_swapoff.
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Tested-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NEric B Munson <emunson@mgebm.net>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

40531542

sys_swapoff: change order to match sys_swapon · c6a2b64b

由 Cesar Eduardo Barros 提交于 3月 22, 2011

The block in sys_swapon which does the final adjustments to the
swap_info_struct and to swap_list is the same as the block which
re-inserts it again at sys_swapoff on failure of try_to_unuse(), except
for the order of the operations within the lock. Since the order should
not matter, arbitrarily change sys_swapoff to match sys_swapon, in
preparation to making both share the same code.
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Tested-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NEric B Munson <emunson@mgebm.net>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c6a2b64b

sys_swapon: move printk outside lock · c69dbfb8

由 Cesar Eduardo Barros 提交于 3月 22, 2011

The block in sys_swapon which does the final adjustments to the
swap_info_struct and to swap_list is the same as the block which
re-inserts it again at sys_swapoff on failure of try_to_unuse(). To be
able to make both share the same code, move the printk() call in the
middle of it to just after it.
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Tested-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NEric B Munson <emunson@mgebm.net>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c69dbfb8

sys_swapon: remove nr_good_pages variable · 9c8100ef

由 Cesar Eduardo Barros 提交于 3月 22, 2011

It still exists within setup_swap_map_and_extents(), but after it
nr_good_pages == p->pages.
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Tested-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NEric B Munson <emunson@mgebm.net>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9c8100ef

sys_swapon: simplify error flow in setup_swap_map_and_extents() · bdb8e3f6

由 Cesar Eduardo Barros 提交于 3月 22, 2011

Since there is no cleanup to do, there is no reason to jump to a label.
Return directly instead.
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Tested-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NEric B Munson <emunson@mgebm.net>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bdb8e3f6

sys_swapon: separate parsing of bad blocks and extents · 915d4d7b

由 Cesar Eduardo Barros 提交于 3月 22, 2011

Move the code which parses the bad block list and the extents to a
separate function. Only code movement, no functional changes.

This change uses the fact that, after the success path, nr_good_pages ==
p->pages.
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Tested-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NEric B Munson <emunson@mgebm.net>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

915d4d7b

sys_swapon: call swap_cgroup_swapon() earlier · 1421ef3c

由 Cesar Eduardo Barros 提交于 3月 22, 2011

The call to swap_cgroup_swapon is in the middle of loading the swap map
and extents. As it only does memory allocation and does not depend on
the swapfile layout (map/extents), it can be called earlier (or later).

Move it to just after the allocation of swap_map, since it is
conceptually similar (allocates a map).
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Tested-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NEric B Munson <emunson@mgebm.net>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1421ef3c

sys_swapon: simplify error flow in read_swap_header() · 38719025

由 Cesar Eduardo Barros 提交于 3月 22, 2011

Since there is no cleanup to do, there is no reason to jump to a label.
Return directly instead.
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Tested-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NEric B Munson <emunson@mgebm.net>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

38719025

sys_swapon: separate parsing of swapfile header · ca8bd38b

由 Cesar Eduardo Barros 提交于 3月 22, 2011

Move the code which parses and checks the swapfile header (except for
the bad block list) to a separate function. Only code movement, no
functional changes.
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Tested-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NEric B Munson <emunson@mgebm.net>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ca8bd38b

sys_swapon: move setting of swapfilepages near use · 5de771e4

由 Cesar Eduardo Barros 提交于 3月 22, 2011

There is no reason I can see to read inode->i_size long before it is
needed. Move its read to just before it is needed, to reduce the
variable lifetime.
Signed-off-by: NCesar Eduardo Barros <cesarb@cesarb.net>
Tested-by: NEric B Munson <emunson@mgebm.net>
Acked-by: NEric B Munson <emunson@mgebm.net>
Reviewed-by: NJesper Juhl <jj@chaosbits.net>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5de771e4

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功