提交 · 081794564e3000e602de290d1121792c33b475a4 · openeuler / raspberrypi-kernel

13 1月, 2012 40 次提交

dma-buf: Documentation update for Kconfig select · 08179456

由 Sumit Semwal 提交于 1月 13, 2012

As per Linus' comment, dma-buf Kconfig entry shouldn't have an option
text, but should be selected by the subsystems that use it.

Add this information in the documentation as well.
Signed-off-by: NSumit Semwal <sumit.semwal@ti.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

08179456

nouveau: Support Optimus models for vga_switcheroo · d099230c

由 Peter Lekensteyn 提交于 12月 17, 2011

Newer nVidia cards with Optimus do not support/use the DSM switching functions.
Instead, it require a DSM function to be called prior to bringing a device into
D3 state. No other _DSM calls are necessary before/after enabling/disabling a
device. Switching between discrete and integrated GPU is not supported by
this Optimus _DSM call, therefore return on the switching method.
Signed-off-by: NPeter Lekensteyn <lekensteyn@gmail.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

d099230c

nouveau: properly check for _DSM function support · 9075e85f

由 Peter Lekensteyn 提交于 12月 17, 2011

According to the ACPI spec version 4, section 9.14.1, _DSM functions
must return a value with the first bit enabled if any DSM functions are
supported for the given UUID and revision ID. For a given function index n
to be marked supported, bit n must be enabled.
Signed-off-by: NPeter Lekensteyn <lekensteyn@gmail.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

9075e85f

dma-buf: drop option text so users don't select it. · 3b32a592

由 Dave Airlie 提交于 1月 13, 2012

This is going to be used by other subsystems so they should select it.
Signed-off-by: NDave Airlie <airlied@redhat.com>

3b32a592

radeon: Call pci_clear_master() instead of open-coding it. · 642ce525

由 Michel Dänzer 提交于 1月 12, 2012

Reported-by: NBen Hutchings <ben@decadent.org.uk>
Signed-off-by: NMichel Dänzer <michel.daenzer@amd.com>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

642ce525

gma500: Discard modes that don't fit in stolen memory · 9f821c67

由 Alan Cox 提交于 1月 12, 2012

[This fixes a crash on boot if the system is plugged into an HDTV so it's
 probably appropriate to push even though it didn't make the window. We could
 be cleverer about this but the simple version seems to be the safe one]

From: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>

At the moment we cannot allocate more than stolen memory size for framebuffers.
To get around that issues we discard modes that doesn't fit. This is a temporary
solution until we can freely allocate framebuffer memory.

[Currently the framebuffer needs to be linear in kernel space due to limits
 in the kernel fb layer - AC]
Signed-off-by: NPatrik Jakobsson <patrik.r.jakobsson@gmail.com>
Signed-off-by: NAlan Cox <alan@linux.intel.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

9f821c67

drm: bump DRM_CONNECTOR_MAX_ENCODER from 2 to 3 · afe887df

由 Ben Skeggs 提交于 1月 12, 2012

There exists at least one NVIDIA GPU (Quadro NVS 300) that has a DMS-59
connector which is capable of supporting DisplayPort, TMDS and VGA on
a single connector.

We need to bump the allowed encoder limit to support all three configs.
Signed-off-by: NBen Skeggs <bskeggs@redhat.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

afe887df

drm/radeon/kms: Fix module parameter description format · 27d4d052

由 Jean Delvare 提交于 11月 30, 2011

Module parameter descriptions don't take a trailing \n, otherwise it
breaks formatting of modinfo's output. Also add missing space after
comma.
Signed-off-by: NJean Delvare <jdelvare@suse.de>
Cc: David Airlie <airlied@linux.ie>
Reviewed-by: NAlex Deucher <alexander.deucher@amd.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

27d4d052

drm/radeon/kms/ni: fix packet2 handling for VM IB parser · 0b41da60

由 Alex Deucher 提交于 1月 12, 2012

Packet2 is only one dword.
Signed-off-by: NAlex Deucher <alexander.deucher@amd.com>
Reviewed-by: NJerome Glisse <jglisse@redhat.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

0b41da60

ttm/dma: Remove the WARN() which is not useful. · 0e113315

由 Konrad Rzeszutek Wilk 提交于 1月 12, 2012

. It was useful during development, but now on a production system
we can get this (if the user forgot to upload the firmware):

[drm] radeon: irq initialized.
[drm] GART: num cpu pages 131072, num gpu pages 131072
[drm] radeon: ib pool ready.
[drm] Loading SUMO Microcode
r600_cp: Failed to load firmware "radeon/SUMO_pfp.bin"
atl1c 0000:03:00.0: version 1.0.1.0-NAPI.213057] [drm:evergreen_startup] *ERROR* Failed to load firmware!
radeon 0000:00:01.0: disabling GPU acceleration
88] radeon 0000:00:01.0: ffff8801bb782400 unpin not necessary
------------[ cut here ]------------
WARNING: at /home/konrad/linux-linus/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c:956 ttm_dma_unpopulate+0x79/0x300 [ttm]()
Hardware name: System Product Name
Modules linked in: e1000e atl1c radeon(+) ahci libahci libata scsi_mod fbcon tileblit font ttm bitblit softcursor drm_kms_helper wmi xen_blkfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea xenfs xen_privcmd
Pid: 1600, comm: modprobe Not tainted 3.2.0-06100-ge343a895 #1
Call Trace:
 [<ffffffff8108973a>] warn_slowpath_common+0x7a/0xb0
 [<ffffffff81089785>] warn_slowpath_null+0x15/0x20
 [<ffffffffa0060309>] ttm_dma_unpopulate+0x79/0x300 [ttm]
 [<ffffffffa01341c0>] radeon_ttm_tt_unpopulate+0x120/0x130 [radeon]
 [<ffffffffa0056e0c>] ttm_tt_destroy+0x2c/0x70 [ttm]
 [<ffffffffa0057a4e>] ttm_bo_cleanup_memtype_use+0x3e/0x80 [ttm]
 [<ffffffffa00595a1>] ttm_bo_release+0x251/0x280 [ttm]
 [<ffffffffa0059610>] ttm_bo_unref+0x40/0x60 [ttm]
 [<ffffffffa0134d02>] radeon_bo_unref+0x42/0x80 [radeon]
 [<ffffffffa0186dfb>] radeon_sa_bo_manager_fini+0x6b/0x80 [radeon]
 [<ffffffffa0146b8f>] radeon_ib_pool_fini+0x6f/0x90 [radeon]
 [<ffffffffa014be49>] r100_ib_fini+0x19/0x20 [radeon]
 [<ffffffffa017b47e>] evergreen_init+0x1ee/0x2d0 [radeon]

The big WARN() has nothing to do with the culprit - which is that
the firmware was not loaded. So lets remove the WARN() from the TTM DMA code.
Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: NJerome Glisse <jglisse@redhat.com>
Signed-off-by: NDave Airlie <airlied@redhat.com>

0e113315

Merge branch 'akpm' (aka "Andrew's patch-bomb, take two") · 09946950

由 Linus Torvalds 提交于 1月 12, 2012

Andrew explains:

 - various misc stuff

 - Most of the rest of MM: memcg, threaded hugepages, others.

 - cpumask

 - kexec

 - kdump

 - some direct-io performance tweaking

 - radix-tree optimisations

 - new selftests code

   A note on this: often people will develop a new userspace-visible
   feature and will develop userspace code to exercise/test that
   feature.  Then they merge the patch and the selftest code dies.
   Sometimes we paste it into the changelog.  Sometimes the code gets
   thrown into Documentation/(!).

   This saddens me.  So this patch creates a bare-bones framework which
   will henceforth allow me to ask people to include their test apps in
   the kernel tree so we can keep them alive.  Then when people enhance
   or fix the feature, I can ask them to update the test app too.

   The infrastruture is terribly trivial at present - let's see how it
   evolves.

 - checkpoint/restart feature work.

   A note on this: this is a project by various mad Russians to perform
   c/r mainly from userspace, with various oddball helper code added
   into the kernel where the need is demonstrated.

   So rather than some large central lump of code, what we have is
   little bits and pieces popping up in various places which either
   expose something new or which permit something which is normally
   kernel-private to be modified.

   The overall project is an ongoing thing.  I've judged that the size
   and scope of the thing means that we're more likely to be successful
   with it if we integrate the support into mainline piecemeal rather
   than allowing it all to develop out-of-tree.

   However I'm less confident than the developers that it will all
   eventually work! So what I'm asking them to do is to wrap each piece
   of new code inside CONFIG_CHECKPOINT_RESTORE.  So if it all
   eventually comes to tears and the project as a whole fails, it should
   be a simple matter to go through and delete all trace of it.

This lot pretty much wraps up the -rc1 merge for me.

* akpm: (96 commits)
  unlzo: fix input buffer free
  ramoops: update parameters only after successful init
  ramoops: fix use of rounddown_pow_of_two()
  c/r: prctl: add PR_SET_MM codes to set up mm_struct entries
  c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4
  c/r: introduce CHECKPOINT_RESTORE symbol
  selftests: new x86 breakpoints selftest
  selftests: new very basic kernel selftests directory
  radix_tree: take radix_tree_path off stack
  radix_tree: remove radix_tree_indirect_to_ptr()
  dio: optimize cache misses in the submission path
  vfs: cache request_queue in struct block_device
  fs/direct-io.c: calculate fs_count correctly in get_more_blocks()
  drivers/parport/parport_pc.c: fix warnings
  panic: don't print redundant backtraces on oops
  sysctl: add the kernel.ns_last_pid control
  kdump: add udev events for memory online/offline
  include/linux/crash_dump.h needs elf.h
  kdump: fix crash_kexec()/smp_send_stop() race in panic()
  kdump: crashk_res init check for /sys/kernel/kexec_crash_size
  ...

09946950

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 7c17d86a

由 Linus Torvalds 提交于 1月 12, 2012

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (69 commits)
  pptp: Accept packet with seq zero
  RDS: Remove some unused iWARP code
  net: fsl: fec: handle 10Mbps speed in RMII mode
  drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c: add missing iounmap
  drivers/net/ethernet/tundra/tsi108_eth.c: add missing iounmap
  ksz884x: fix mtu for VLAN
  net_sched: sfq: add optional RED on top of SFQ
  dp83640: Fix NOHZ local_softirq_pending 08 warning
  gianfar: Fix invalid TX frames returned on error queue when time stamping
  gianfar: Fix missing sock reference when processing TX time stamps
  phylib: introduce mdiobus_alloc_size()
  net: decrement memcg jump label when limit, not usage, is changed
  net: reintroduce missing rcu_assign_pointer() calls
  inet_diag: Rename inet_diag_req_compat into inet_diag_req
  inet_diag: Rename inet_diag_req into inet_diag_req_v2
  bond_alb: don't disable softirq under bond_alb_xmit
  mac80211: fix rx->key NULL pointer dereference in promiscuous mode
  nl80211: fix old station flags compatibility
  mdio-octeon: use an unique MDIO bus name.
  mdio-gpio: use an unique MDIO bus name.
  ...

7c17d86a

unlzo: fix input buffer free · 35f15268

由 Sascha Hauer 提交于 1月 12, 2012

unlzo modifies the pointer to in_buf, so we have to free the original
buffer, not the modified pointer.
Signed-off-by: NSascha Hauer <s.hauer@pengutronix.de>
Cc: Lasse Collin <lasse.collin@tukaani.org>
Cc: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

35f15268

ramoops: update parameters only after successful init · c755201e

由 Kees Cook 提交于 1月 12, 2012

If a platform device exists on the system, but ramoops fails to attach to
it, the module parameters are overridden before ramoops can fall back and
try to use passed module parameters.  Move update to end of init routine.
Signed-off-by: NKees Cook <keescook@chromium.org>
Cc: Marco Stornelli <marco.stornelli@gmail.com>
Cc: Sergiu Iordache <sergiu@chromium.org>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c755201e

ramoops: fix use of rounddown_pow_of_two() · fdb59507

由 Marco Stornelli 提交于 1月 12, 2012

The return value of rounddown_pow_of_two wasn't evaluated, so the
operation was a no-op.
Signed-off-by: NMarco Stornelli <marco.stornelli@gmail.com>
Reported-by: NAndrew Morton <akpm@linux-foundation.org>
Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fdb59507

c/r: prctl: add PR_SET_MM codes to set up mm_struct entries · 028ee4be

由 Cyrill Gorcunov 提交于 1月 12, 2012

When we restore a task we need to set up text, data and data heap sizes
from userspace to the values a task had at checkpoint time.  This patch
adds auxilary prctl codes for that.

While most of them have a statistical nature (their values are involved
into calculation of /proc/<pid>/statm output) the start_brk and brk values
are used to compute an allowed size of program data segment expansion.
Which means an arbitrary changes of this values might be dangerous
operation.  So to restrict access the following requirements applied to
prctl calls:

 - The process has to have CAP_SYS_ADMIN capability granted.
 - For all opcodes except start_brk/brk members an appropriate
   VMA area must exist and should fit certain VMA flags,
   such as:
   - code segment must be executable but not writable;
   - data segment must not be executable.

start_brk/brk values must not intersect with data segment and must not
exceed RLIMIT_DATA resource limit.

Still the main guard is CAP_SYS_ADMIN capability check.

Note the kernel should be compiled with CONFIG_CHECKPOINT_RESTORE support
otherwise these prctl calls will return -EINVAL.

[akpm@linux-foundation.org: cache current->mm in a local, saving 200 bytes text]
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

028ee4be

c/r: procfs: add start_data, end_data, start_brk members to /proc/$pid/stat v4 · b3f7f573

由 Cyrill Gorcunov 提交于 1月 12, 2012

The mm->start_code/end_code, mm->start_data/end_data, mm->start_brk are
involved into calculation of program text/data segment sizes (which might
be seen in /proc/<pid>/statm) and into brk() call final address.

For restore we need to know all these values.  While
mm->start_code/end_code already present in /proc/$pid/stat, the rest
members are not, so this patch brings them in.

The restore procedure of these members is addressed in another patch using
prctl().
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b3f7f573

c/r: introduce CHECKPOINT_RESTORE symbol · 067bce1a

由 Cyrill Gorcunov 提交于 1月 12, 2012

For checkpoint/restore we need auxilary features being compiled into the
kernel, such as additional prctl codes, /proc/<pid>/map_files and etc...
but same time these features are not mandatory for a regular kernel so
CHECKPOINT_RESTORE config symbol should bring a way to disable them all at
once if one wish to get rid of additional functionality.
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Reviewed-by: NKees Cook <keescook@chromium.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

067bce1a

selftests: new x86 breakpoints selftest · 85bbddc3

由 Frederic Weisbecker 提交于 1月 12, 2012

Bring a first selftest in the relevant directory.  This tests several
combinations of breakpoints and watchpoints in x86, as well as icebp traps
and int3 traps.  Given the amount of breakpoint regressions we raised
after we merged the generic breakpoint infrastructure, such selftest
became necessary and can still serve today as a basis for new patches that
touch the do_debug() path.
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

85bbddc3

selftests: new very basic kernel selftests directory · 274343ad

由 Frederic Weisbecker 提交于 1月 12, 2012

Bring a new kernel selftests directory in tools/testing/selftests.  To
add a new selftest, create a subdirectory with the sources and a
makefile that creates a target named "run_test" then add the
subdirectory name to the TARGET var in tools/testing/selftests/Makefile
and tools/testing/selftests/run_tests script.

This can help centralizing and maintaining any useful selftest that
developers usually tend to let rust in peace on some random server.
Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

274343ad

radix_tree: take radix_tree_path off stack · e2bdb933

由 Hugh Dickins 提交于 1月 12, 2012

Down, down in the deepest depths of GFP_NOIO page reclaim, we have
shrink_page_list() calling __remove_mapping() calling __delete_from_
swap_cache() or __delete_from_page_cache().

You would not expect those to need much stack, but in fact they call
radix_tree_delete(): which declares a 192-byte radix_tree_path array on
its stack (to record the node,offsets it visits when descending, in case
it needs to ascend to update them). And if any tag is still set [1],
that calls radix_tree_tag_clear(), which declares a further such
192-byte radix_tree_path array on the stack. (At least we have
interrupts disabled here, so won't then be pushing registers too.)

That was probably a good choice when most users were 32-bit (array of
half the size), and adding fields to radix_tree_node would have bloated
it unnecessarily. But nowadays many are 64-bit, and each
radix_tree_node contains a struct rcu_head, which is only used when
freeing; whereas the radix_tree_path info is only used for updating the
tree (deleting, clearing tags or setting tags if tagged) when a lock
must be held, of no interest when accessing the tree locklessly.

So add a parent pointer to the radix_tree_node, in union with the
rcu_head, and remove all uses of the radix_tree_path. There would be
space in that union to save the offset when descending as before (we can
argue that a lock must already be held to exclude other users), but
recalculating it when ascending is both easy (a constant shift and a
constant mask) and uncommon, so it seems better just to do that.

Two little optimizations: no need to decrement height when descending,
adjusting shift is enough; and once radix_tree_tag_if_tagged() has set
tag on a node and its ancestors, it need not ascend from that node
again.

perf on the radix tree test harness reports radix_tree_insert() as 2%
slower (now having to set parent), but radix_tree_delete() 24% faster.
Surely that's an exaggeration from rtth's artificially low map shift 3,
but forcing it back to 6 still rates radix_tree_delete() 8% faster.

[1] Can a pagecache tag (dirty, writeback or towrite) actually still be
set at the time of radix_tree_delete()? Perhaps not if the filesystem is
well-behaved. But although I've not tracked any stack overflow down to
this cause, I have observed a curious case in which a dirty tag is set
and left set on tmpfs: page migration's migrate_page_copy() happens to
use __set_page_dirty_nobuffers() to set PageDirty on the newpage, and
that sets PAGECACHE_TAG_DIRTY as a side-effect - harmless to a
filesystem which doesn't use tags, except for this stack depth issue.
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nai Xia <nai.xia@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e2bdb933

radix_tree: remove radix_tree_indirect_to_ptr() · 928da837

由 Xiao Guangrong 提交于 1月 12, 2012

It is not used anymore, remove it
Signed-off-by: NXiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Acked-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

928da837

dio: optimize cache misses in the submission path · 65dd2aa9

由 Andi Kleen 提交于 1月 12, 2012

Some investigation of a transaction processing workload showed that a
major consumer of cycles in __blockdev_direct_IO is the cache miss while
accessing the block size.  This is because it has to walk the chain from
block_dev to gendisk to queue.

The block size is needed early on to check alignment and sizes.  It's only
done if the check for the inode block size fails.  But the costly block
device state is unconditionally fetched.

- Reorganize the code to only fetch block dev state when actually
  needed.

Then do a prefetch on the block dev early on in the direct IO path.  This
is worth it, because there is substantial code run before we actually
touch the block dev now.

- I also added some unlikelies to make it clear the compiler that block
  device fetch code is not normally executed.

This gave a small, but measurable improvement on a large database
benchmark (about 0.3%)

[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: using prefetch requires including prefetch.h]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

65dd2aa9

vfs: cache request_queue in struct block_device · 87192a2a

由 Andi Kleen 提交于 1月 12, 2012

This makes it possible to get from the inode to the request_queue with one
less cache miss.  Used in followon optimization.

The livetime of the pointer is the same as the gendisk.

This assumes that the queue will always stay the same in the gendisk while
it's visible to block_devices.  I think that's safe correct?
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NJeff Moyer <jmoyer@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

87192a2a

fs/direct-io.c: calculate fs_count correctly in get_more_blocks() · ae55e1aa

由 Tao Ma 提交于 1月 12, 2012

In get_more_blocks(), we use dio_count to calcuate fs_count and do some
tricky things to increase fs_count if dio_count isn't aligned.  But
actually it still has some corner cases that can't be coverd.  See the
following example:

	dio_write foo -s 1024 -w 4096

(direct write 4096 bytes at offset 1024).  The same goes if the offset
isn't aligned to fs_blocksize.

In this case, the old calculation counts fs_count to be 1, but actually we
will write into 2 different blocks (if fs_blocksize=4096).  The old code
just works, since it will call get_block twice (and may have to allocate
and create extents twice for filesystems like ext4).  So we'd better call
get_block just once with the proper fs_count.
Signed-off-by: NTao Ma <boyu.mt@taobao.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ae55e1aa

drivers/parport/parport_pc.c: fix warnings · 45dac90f

由 Andrew Morton 提交于 1月 12, 2012

drivers/parport/parport_pc.c: In function '__check_irq':
drivers/parport/parport_pc.c:3415: warning: return from incompatible pointer type
drivers/parport/parport_pc.c: In function '__check_dma':
drivers/parport/parport_pc.c:3417: warning: return from incompatible pointer type
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

45dac90f

panic: don't print redundant backtraces on oops · 6e6f0a1f

由 Andi Kleen 提交于 1月 12, 2012

When an oops causes a panic and panic prints another backtrace it's pretty
common to have the original oops data be scrolled away on a 80x50 screen.

The second backtrace is quite redundant and not needed anyways.

So don't print the panic backtrace when oops_in_progress is true.

[akpm@linux-foundation.org: add comment]
Signed-off-by: NAndi Kleen <ak@linux.intel.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6e6f0a1f

sysctl: add the kernel.ns_last_pid control · b8f566b0

由 Pavel Emelyanov 提交于 1月 12, 2012

The sysctl works on the current task's pid namespace, getting and setting
its last_pid field.

Writing is allowed for CAP_SYS_ADMIN-capable tasks thus making it possible
to create a task with desired pid value.  This ability is required badly
for the checkpoint/restore in userspace.

This approach suits all the parties for now.
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b8f566b0

kdump: add udev events for memory online/offline · f5138e42

由 Michael Holzheu 提交于 1月 12, 2012

Currently no udev events for memory hotplug "online" and "offline" are
generated:

  # udevadm monitor
  # echo offline > /sys/devices/system/memory/memory4/state
  ==> No event

When kdump is loaded, kexec detects the current memory configuration and
stores it in the pre-allocated ELF core header.  Therefore, for kdump it
is necessary to reload the kdump kernel with kexec when the memory
configuration changes (e.g.  for online/offline hotplug memory).

In order to do this automatically, udev rules should be used.  This kernel
patch adds udev events for "online" and "offline".  Together with this
kernel patch, the following udev rules for online/offline have to be added
to "/etc/udev/rules.d/98-kexec.rules":

  SUBSYSTEM=="memory", ACTION=="online", PROGRAM="/etc/init.d/kdump restart"
  SUBSYSTEM=="memory", ACTION=="offline", PROGRAM="/etc/init.d/kdump restart"

[sfr@canb.auug.org.au: fixups for class to subsystem conversion]
Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Kay Sievers <kay.sievers@vrfy.org>
Cc: Dave Hansen <haveblue@us.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f5138e42

include/linux/crash_dump.h needs elf.h · 1f536b9e

由 Fabio Estevam 提交于 1月 12, 2012

Building an ARM target we get the following warnings:

  CC      arch/arm/kernel/setup.o
  In file included from arch/arm/kernel/setup.c:39:
  arch/arm/include/asm/elf.h:102:1: warning: "vmcore_elf64_check_arch" redefined
  In file included from arch/arm/kernel/setup.c:24:
  include/linux/crash_dump.h:30:1: warning: this is the location of the previous definition

Quoting Russell King:

"linux/crash_dump.h makes no attempt to include asm/elf.h, but it depends
on stuff in asm/elf.h to determine how stuff inside this file is defined
at parse time.

So, if asm/elf.h is included after linux/crash_dump.h or not at all, you
get a different result from the situation where asm/elf.h is included
before."

So add elf.h header to crash_dump.h to avoid this problem.

The original discussion about this can be found at:
http://www.spinics.net/lists/arm-kernel/msg154113.htmlSigned-off-by: NFabio Estevam <fabio.estevam@freescale.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: <stable@vger.kernel.org>	[3.2.1]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1f536b9e

kdump: fix crash_kexec()/smp_send_stop() race in panic() · 93e13a36

由 Michael Holzheu 提交于 1月 12, 2012

When two CPUs call panic at the same time there is a possible race
condition that can stop kdump.  The first CPU calls crash_kexec() and the
second CPU calls smp_send_stop() in panic() before crash_kexec() finished
on the first CPU.  So the second CPU stops the first CPU and therefore
kdump fails:

1st CPU:
  panic()->crash_kexec()->mutex_trylock(&kexec_mutex)-> do kdump

2nd CPU:
  panic()->crash_kexec()->kexec_mutex already held by 1st CPU
       ->smp_send_stop()-> stop 1st CPU (stop kdump)

This patch fixes the problem by introducing a spinlock in panic that
allows only one CPU to process crash_kexec() and the subsequent panic
code.

All other CPUs call the weak function panic_smp_self_stop() that stops the
CPU itself.  This function can be overloaded by architecture code.  For
example "tile" can use their lower-power "nap" instruction for that.
Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Acked-by: NChris Metcalf <cmetcalf@tilera.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

93e13a36

kdump: crashk_res init check for /sys/kernel/kexec_crash_size · bec013c4

由 Michael Holzheu 提交于 1月 12, 2012

Currently it is possible to set the crash_size via the sysfs
/sys/kernel/kexec_crash_size even if no crash kernel memory has been
defined with the "crashkernel" parameter.  In this case "crashk_res" is
not initialized and crashk_res.start = crashk_res.end = 0.  Unfortunately
resource_size(&crashk_res) returns 1 in this case.  This breaks the s390
implementation of crash_(un)map_reserved_pages().

To fix the problem the correct "old_size" is now calculated in
crash_shrink_memory().  "old_size is set to "0" if crashk_res is not
initialized.  With this change crash_shrink_memory() will do nothing, when
"crashk_res" is not initialized.  It will return "0" for "echo 0 >
/sys/kernel/kexec_crash_size" and -EINVAL for "echo [not zero] >
/sys/kernel/kexec_crash_size".

In addition to that this patch also simplifies the "ret = -EINVAL" vs.
"ret = 0" logic as suggested by Simon Horman.
Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Reviewed-by: NDave Young <dyoung@redhat.com>
Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com>
Reviewed-by: NSimon Horman <horms@verge.net.au>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bec013c4

kdump: add missing RAM resource in crash_shrink_memory() · 6480e5a0

由 Michael Holzheu 提交于 1月 12, 2012

When shrinking crashkernel memory using /sys/kernel/kexec_crash_size for
the newly added memory no RAM resource is created at the moment.

Example:

  $ cat /proc/iomem
  00000000-bfffffff : System RAM
    00000000-005b7ac3 : Kernel code
    005b7ac4-009743bf : Kernel data
    009bb000-00a85c33 : Kernel bss
  c0000000-cfffffff : Crash kernel
  d0000000-ffffffff : System RAM

  $ echo 0 > /sys/kernel/kexec_crash_size
  $ cat /proc/iomem
  00000000-bfffffff : System RAM
    00000000-005b7ac3 : Kernel code
    005b7ac4-009743bf : Kernel data
    009bb000-00a85c33 : Kernel bss
                                   <<-- here is System RAM missing
  d0000000-ffffffff : System RAM

One result of this bug is that the memory chunk can never be set offline
using memory hotplug.  With this patch I insert a new "System RAM"
resource for the released memory.  Then the upper example looks like the
following:

  $ echo 0 > /sys/kernel/kexec_crash_size
  $ cat /proc/iomem
  00000000-bfffffff : System RAM
    00000000-005b7ac3 : Kernel code
    005b7ac4-009743bf : Kernel data
    009bb000-00a85c33 : Kernel bss
  c0000000-cfffffff : System RAM   <<-- new rescoure
  d0000000-ffffffff : System RAM

And now I can set chunk c0000000-cfffffff offline.
Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6480e5a0

kexec: remove KMSG_DUMP_KEXEC · a3dd3323

由 WANG Cong 提交于 1月 12, 2012

KMSG_DUMP_KEXEC is useless because we already save kernel messages inside
/proc/vmcore, and it is unsafe to allow modules to do other stuffs in a
crash dump scenario.

[akpm@linux-foundation.org: fix powerpc build]
Signed-off-by: NWANG Cong <xiyou.wangcong@gmail.com>
Reported-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NVivek Goyal <vgoyal@redhat.com>
Acked-by: NJarod Wilson <jarod@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a3dd3323

cpumask: update setup_node_to_cpumask_map() comments · 9512938b

由 Wanlong Gao 提交于 1月 12, 2012

node_to_cpumask() has been replaced by cpumask_of_node(), and wholly
removed since commit 29c337a0 ("cpumask: remove obsolete node_to_cpumask
now everyone uses cpumask_of_node").

So update the comments for setup_node_to_cpumask_map().
Signed-off-by: NWanlong Gao <gaowanlong@cn.fujitsu.com>
Acked-by: NRusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9512938b

mm/vmalloc.c: eliminate extra loop in pcpu_get_vm_areas error path · f1db7afd

由 Kautuk Consul 提交于 1月 12, 2012

If either of the vas or vms arrays are not properly kzalloced, then the
code jumps to the err_free label.

The err_free label runs a loop to check and free each of the array members
of the vas and vms arrays which is not required for this situation as none
of the array members have been allocated till this point.

Eliminate the extra loop we have to go through by introducing a new label
err_free2 and then jumping to it.

[akpm@linux-foundation.org: remove now-unneeded tests]
Signed-off-by: NKautuk Consul <consul.kautuk@gmail.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f1db7afd

mm: rearrange putback_inactive_pages · 3f79768f

由 Hugh Dickins 提交于 1月 12, 2012

There is sometimes confusion between the global putback_lru_pages() in
migrate.c and the static putback_lru_pages() in vmscan.c: rename the
latter putback_inactive_pages(): it helps shrink_inactive_list() rather as
move_active_pages_to_lru() helps shrink_active_list().

Remove unused scan_control arg from putback_inactive_pages() and from
update_isolated_counts().  Move clear_active_flags() inside
update_isolated_counts().  Move NR_ISOLATED accounting up into
shrink_inactive_list() itself, so the balance is clearer.

Do the spin_lock_irq() before calling putback_inactive_pages() and
spin_unlock_irq() after return from it, so that it better matches
update_isolated_counts() and move_active_pages_to_lru().
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3f79768f

mm: remove isolate_pages() · f626012d

由 Hugh Dickins 提交于 1月 12, 2012

The isolate_pages() level in vmscan.c offers little but indirection: merge
it into isolate_lru_pages() as the compiler does, and use the names
nr_to_scan and nr_scanned in each case.
Signed-off-by: NHugh Dickins <hughd@google.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f626012d

mm: remove del_page_from_lru, add page_off_lru · 1c1c53d4

由 Hugh Dickins 提交于 1月 12, 2012

del_page_from_lru() repeats del_page_from_lru_list(), also working out
which LRU the page was on, clearing the relevant bits. Decouple those
functions: remove del_page_from_lru() and add page_off_lru().
Signed-off-by: NHugh Dickins <hughd@google.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1c1c53d4

mm: enum lru_list lru · 4111304d

由 Hugh Dickins 提交于 1月 12, 2012

Mostly we use "enum lru_list lru": change those few "l"s to "lru"s.
Signed-off-by: NHugh Dickins <hughd@google.com>
Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4111304d