提交 · 40e575b1d0b34b38519d361c10bdf8e0c688957b · openeuler / Kernel

06 7月, 2012 3 次提交

remoteproc: remove the get_by_name/put API · 40e575b1

由 Ohad Ben-Cohen 提交于 7月 02, 2012

Remove rproc_get_by_name() and rproc_put(), and the associated
remoteproc infrastructure that supports it (i.e. klist and friends),
because:

1. No one uses them
2. Using them is highly discouraged, and any potential user
   will be deeply scrutinized and encouraged to move.

If a user, that absolutely can't live with the direct boot/shutdown
model, does show up one day, then bringing this functionality back
is going to be trivial.

At this point though, keeping this functionality around is way too
much of a maintenance burden.

Cc: Sjur Brændeland <sjur.brandeland@stericsson.com>
Cc: Loic Pallardy <loic.pallardy@stericsson.com>
Cc: Ludovic BARRE <ludovic.barre@stericsson.com>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Fernando Guzman Lugo <fernando.lugo@ti.com>
Cc: Suman Anna <s-anna@ti.com>
Cc: Mark Grosen <mgrosen@ti.com>
Acked-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NOhad Ben-Cohen <ohad@wizery.com>

40e575b1

remoteproc: remove the now-redundant kref · 7183a2a7

由 Ohad Ben-Cohen 提交于 5月 30, 2012

Now that every rproc instance contains a device, we don't need a
kref anymore to maintain the refcount of the rproc instances:
that's what device are good with!

This patch removes the now-redundant kref, and switches to
{get, put}_device instead of kref_{get, put}.

We also don't need the kref's release function anymore, and instead,
we just utilize the class's release handler (which is now responsible
for all memory de-allocations).

Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Fernando Guzman Lugo <fernando.lugo@ti.com>
Signed-off-by: NOhad Ben-Cohen <ohad@wizery.com>

7183a2a7

remoteproc: maintain a generic child device for each rproc · b5ab5e24

由 Ohad Ben-Cohen 提交于 5月 30, 2012

For each registered rproc, maintain a generic remoteproc device whose
parent is the low level platform-specific device (commonly a pdev, but
it may certainly be any other type of device too).

With this in hand, the resulting device hierarchy might then look like:

omap-rproc.0
 |
 - remoteproc0  <---- new !
    |
    - virtio0
    |
    - virtio1
       |
       - rpmsg0
       |
       - rpmsg1
       |
       - rpmsg2

Where:
- omap-rproc.0 is the low level device that's bound to the
  driver which invokes rproc_register()
- remoteproc0 is the result of this patch, and will be added by the
  remoteproc framework when rproc_register() is invoked
- virtio0 and virtio1 are vdevs that are registered by remoteproc
  when it realizes that they are supported by the firmware
  of the physical remote processor represented by omap-rproc.0
- rpmsg0, rpmsg1 and rpmsg2 are rpmsg devices that represent rpmsg
  channels, and are registerd by the rpmsg bus when it gets notified
  about their existence

Technically, this patch:
- changes 'struct rproc' to contain this generic remoteproc.x device
- creates a new "remoteproc" type, to which this new generic remoteproc.x
  device belong to.
- adds a super simple enumeration method for the indices of the
  remoteproc.x devices
- updates all dev_* messaging to use the generic remoteproc.x device
  instead of the low level platform-specific device
- updates all dma_* allocations to use the parent of remoteproc.x (where
  the platform-specific memory pools, most commonly CMA, are to be found)

Adding this generic device has several merits:
- we can now add remoteproc runtime PM support simply by hooking onto the
  new "remoteproc" type
- all remoteproc log messages will now carry a common name prefix
  instead of having a platform-specific one
- having a device as part of the rproc struct makes it possible to simplify
  refcounting (see subsequent patch)

Thanks to Stephen Boyd <sboyd@codeaurora.org> for suggesting and
discussing these ideas in one of the remoteproc review threads and
to Fernando Guzman Lugo <fernando.lugo@ti.com> for trying them out
with the (upcoming) runtime PM support for remoteproc.

Cc: Fernando Guzman Lugo <fernando.lugo@ti.com>
Reviewed-by: NStephen Boyd <sboyd@codeaurora.org>
Signed-off-by: NOhad Ben-Cohen <ohad@wizery.com>

b5ab5e24

16 6月, 2012 1 次提交

swap: fix shmem swapping when more than 8 areas · 9b15b817

由 Hugh Dickins 提交于 6月 15, 2012

Minchan Kim reports that when a system has many swap areas, and tmpfs
swaps out to the ninth or more, shmem_getpage_gfp()'s attempts to read
back the page cannot locate it, and the read fails with -ENOMEM.

Whoops. Yes, I blindly followed read_swap_header()'s pte_to_swp_entry(
swp_entry_to_pte()) technique for determining maximum usable swap
offset, without stopping to realize that that actually depends upon the
pte swap encoding shifting swap offset to the higher bits and truncating
it there. Whereas our radix_tree swap encoding leaves offset in the
lower bits: it's swap "type" (that is, index of swap area) that was
truncated.

Fix it by reducing the SWP_TYPE_SHIFT() in swapops.h, and removing the
broken radix_to_swp_entry(swp_to_radix_entry()) from read_swap_header().

This does not reduce the usable size of a swap area any further, it
leaves it as claimed when making the original commit: no change from 3.0
on x86_64, nor on i386 without PAE; but 3.0's 512GB is reduced to 128GB
per swapfile on i386 with PAE. It's not a change I would have risked
five years ago, but with x86_64 supported for ten years, I believe it's
appropriate now.

Hmm, and what if some architecture implements its swap pte with offset
encoded below type? That would equally break the maximum usable swap
offset check. Happily, they all follow the same tradition of encoding
offset above type, but I'll prepare a check on that for next.
Reported-and-Reviewed-and-Tested-by: NMinchan Kim <minchan@kernel.org>
Signed-off-by: NHugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org [3.1, 3.2, 3.3, 3.4]
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b15b817

14 6月, 2012 1 次提交

USB: add NO_D3_DURING_SLEEP flag and revert · c2fb8a3f

由 Alan Stern 提交于 6月 13, 2012

This patch (as1558) fixes a problem affecting several ASUS computers:
The machine crashes or corrupts memory when going into suspend if the
ehci-hcd driver is bound to any controllers.  Users have been forced
to unbind or unload ehci-hcd before putting their systems to sleep.

After extensive testing, it was determined that the machines don't
like going into suspend when any EHCI controllers are in the PCI D3
power state.  Presumably this is a firmware bug, but there's nothing
we can do about it except to avoid putting the controllers in D3
during system sleep.

The patch adds a new flag to indicate whether the problem is present,
and avoids changing the controller's power state if the flag is set.
Runtime suspend is unaffected; this matters only for system suspend.
However as a side effect, the controller will not respond to remote
wakeup requests while the system is asleep.  Hence USB wakeup is not
functional -- but of course, this is already true in the current state
of affairs.

A similar patch has already been applied as commit
151b6128 (USB: EHCI: fix crash during
suspend on ASUS computers).  The patch supersedes that one and reverts
it.  There are two differences:

	The old patch added the flag at the USB level; this patch
	adds it at the PCI level.

	The old patch applied to all chipsets with the same vendor,
	subsystem vendor, and product IDs; this patch makes an
	exception for a known-good system (based on DMI information).
Signed-off-by: NAlan Stern <stern@rowland.harvard.edu>
Tested-by: NDâniel Fraga <fragabr@gmail.com>
Tested-by: NAndrey Rahmatullin <wrar@wrar.name>
Tested-by: NSteven Rostedt <rostedt@goodmis.org>
Cc: stable <stable@vger.kernel.org>
Reviewed-by: NRafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c2fb8a3f

10 6月, 2012 1 次提交

net: Make linux/tcp.h C++ friendly (trivial) · 8876d6b5

由 Paul Pluzhnikov 提交于 6月 09, 2012

I originally sent this patch to <trivial@kernel.org>, but Jiri Kosina did
not feel that this is fully appropriate for the trivial tree.

Using linux/tcp.h from C++ results in:

cat t.cc
#include <linux/tcp.h>
int main() { }

g++ -c t.cc

In file included from t.cc:1:
/usr/include/linux/tcp.h:72: error: '__u32 __fswab32(__u32)' cannot appear in a constant-expression
/usr/include/linux/tcp.h:72: error: a function call cannot appear in a constant-expression
...

Attached trivial patch fixes this problem.

Tested:
- the t.cc above compiles with g++ and
- the following program generates the same output before/after
  the patch:

#include <linux/tcp.h>
#include <stdio.h>

int main ()
{
#define P(a) printf("%s: %08x\n", #a, (int)a)
 P(TCP_FLAG_CWR);
 P(TCP_FLAG_ECE);
 P(TCP_FLAG_URG);
 P(TCP_FLAG_ACK);
 P(TCP_FLAG_PSH);
 P(TCP_FLAG_RST);
 P(TCP_FLAG_SYN);
 P(TCP_FLAG_FIN);
 P(TCP_RESERVED_BITS);
 P(TCP_DATA_OFFSET);
#undef P
 return 0;
}
Signed-off-by: NPaul Pluzhnikov <ppluzhnikov@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8876d6b5

08 6月, 2012 5 次提交

T
vga_switcheroo: Fix error without CONFIG_VGA_SWITCHEROO · 505cff00
由 Takashi Iwai 提交于 6月 08, 2012
```
Fix a typo that is built only when CONFIG_VGA_SWITCHEROO=n.
Signed-off-by: NTakashi Iwai <tiwai@suse.de>
```
505cff00

vga_switcheroo: Add a helper function to get the client state · c8e9cf7b

由 Takashi Iwai 提交于 6月 07, 2012

Add vga_switcheroo_get_client_state() to get the current state of the
client.  This is necessary to determine the proper initial state of
audio clients in HD-audio driver.
Acked-by: NDave Airlie <airlied@redhat.com>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>

c8e9cf7b

module_param: stop double-calling parameters. · ae82fdb1

由 Rusty Russell 提交于 6月 08, 2012

Commit 026cee00 "params:
<level>_initcall-like kernel parameters" set old-style module
parameters to level 0.  And we call those level 0 calls where we used
to, early in start_kernel().

We also loop through the initcall levels and call the levelled
module_params before the corresponding initcall.  Unfortunately level
0 is early_init(), so we call the standard module_param calls twice.

(Turns out most things don't care, but at least ubi.mtd does).

Change the level to -1 for standard module_param calls.
Reported-by: NBenoît Thébaudeau <benoit.thebaudeau@advansee.com>
Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
Cc: stable@kernel.org

ae82fdb1

c/r: prctl: add ability to get clear_tid_address · 300f786b

由 Cyrill Gorcunov 提交于 6月 07, 2012

Zero is written at clear_tid_address when the process exits.  This
functionality is used by pthread_join().

We already have sys_set_tid_address() to change this address for the
current task but there is no way to obtain it from user space.

Without the ability to find this address and dump it we can't restore
pthread'ed apps which call pthread_join() once they have been restored.

This patch introduces the PR_GET_TID_ADDRESS prctl option which allows
the current process to obtain own clear_tid_address.

This feature is available iif CONFIG_CHECKPOINT_RESTORE is set.

[akpm@linux-foundation.org: fix prctl numbering]
Signed-off-by: NAndrew Vagin <avagin@openvz.org>
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Cc: Pedro Alves <palves@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

300f786b

c/r: prctl: update prctl_set_mm_exe_file() after mm->num_exe_file_vmas removal · bafb282d

由 Konstantin Khlebnikov 提交于 6月 07, 2012

A fix for commit b32dfe37 ("c/r: prctl: add ability to set new
mm_struct::exe_file").

After removing mm->num_exe_file_vmas kernel keeps mm->exe_file until
final mmput(), it never becomes NULL while task is alive.

We can check for other mapped files in mm instead of checking
mm->num_exe_file_vmas, and mark mm with flag MMF_EXE_FILE_CHANGED in
order to forbid second changing of mm->exe_file.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
Reviewed-by: NCyrill Gorcunov <gorcunov@openvz.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bafb282d

07 6月, 2012 2 次提交

netfilter: xt_HMARK: fix endianness and provide consistent hashing · d1992b16

由 Hans Schillstrom 提交于 5月 17, 2012

This patch addresses two issues:

a) Fix usage of u32 and __be32 that causes endianess warnings via sparse.
b) Ensure consistent hashing in a cluster that is composed of big and
   little endian systems. Thus, we obtain the same hash mark in an
   heterogeneous cluster.
Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NHans Schillstrom <hans@schillstrom.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

d1992b16

rcu: Precompute RCU_FAST_NO_HZ timer offsets · aa9b1630

由 Paul E. McKenney 提交于 5月 10, 2012

When a CPU is entering dyntick-idle mode, tick_nohz_stop_sched_tick()
calls rcu_needs_cpu() see if RCU needs that CPU, and, if not, computes the
next wakeup time based on the timer wheels. Only later, when actually
entering the idle loop, rcu_prepare_for_idle() will be invoked. In some
cases, rcu_prepare_for_idle() will post timers to wake the CPU back up.
But all for naught: The next wakeup time for the CPU has already been
computed, and posting a timer afterwards does not force that wakeup
time to be recomputed. This means that rcu_prepare_for_idle()'s have
no effect.

This is not a problem on a busy system because something else will wake
up the CPU soon enough. However, on lightly loaded systems, the CPU
might stay asleep for a considerable length of time. If that CPU has
a callback that the rest of the system is waiting on, the system might
run very slowly or (in theory) even hang.

This commit avoids this problem by having rcu_needs_cpu() give
tick_nohz_stop_sched_tick() an estimate of when RCU will need the CPU
to wake back up, which tick_nohz_stop_sched_tick() takes into account
when programming the CPU's wakeup time. An alternative approach is
for rcu_prepare_for_idle() to use hrtimers instead of normal timers,
but timers are much more efficient than are hrtimers for frequently
and repeatedly posting and cancelling a given timer, which is exactly
what RCU_FAST_NO_HZ does.
Reported-by: NPascal Chapperon <pascal.chapperon@wanadoo.fr>
Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
Tested-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Tested-by: NPascal Chapperon <pascal.chapperon@wanadoo.fr>

aa9b1630

06 6月, 2012 4 次提交

perf: Limit callchains to 127 · 0b0d9cf6

由 Arun Sharma 提交于 4月 20, 2012

Stack depth of 255 seems excessive, given that copy_from_user_nmi()
could be slow.
Signed-off-by: NArun Sharma <asharma@fb.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1334961696-19580-3-git-send-email-asharma@fb.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

0b0d9cf6

sched: Fix domain iteration · c1174876

由 Peter Zijlstra 提交于 5月 31, 2012

Weird topologies can lead to asymmetric domain setups. This needs
further consideration since these setups are typically non-minimal
too.

For now, make it work by adding an extra mask selecting which CPUs
are allowed to iterate up.

The topology that triggered it is the one from David Rientjes:

	10 20 20 30
	20 10 20 20
	20 20 10 20
	30 20 20 10

resulting in boxes that wouldn't even boot.
Reported-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/n/tip-3p86l9cuaqnxz7uxsojmz5rm@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

c1174876

NFS: Fix a commit bug · 9bce008b

由 Trond Myklebust 提交于 6月 05, 2012

The new commit code fails to copy the verifier into the wb_verf field
of _all_ the nfs_page structures; it only copies it into the first entry.
The consequence is that most requests end up failing to match in
nfs_commit_release.

Fix is to copy the verifier into the req->wb_verf field in
nfs_write_completion.
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: Fred Isaman <iisaman@netapp.com>

9bce008b

radix-tree: fix contiguous iterator · fffaee36

由 Konstantin Khlebnikov 提交于 6月 05, 2012

This patch fixes bug in macro radix_tree_for_each_contig().

If radix_tree_next_slot() sees NULL in next slot it returns NULL, but following
radix_tree_next_chunk() switches iterating into next chunk. As result iterating
becomes non-contiguous and breaks vfs "splice" and all its users.
Signed-off-by: NKonstantin Khlebnikov <khlebnikov@openvz.org>
Reported-and-bisected-by: NHans de Bruin <jmdebruin@xmsnet.nl>
Reported-and-bisected-by: NOndrej Zary <linux@rainbow-software.org>
Reported-bisected-and-tested-by: NToralf Förster <toralf.foerster@gmx.de>
Link: https://lkml.org/lkml/2012/6/5/64
Cc: stable <stable@vger.kernel.org> # 3.4.x
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fffaee36

05 6月, 2012 1 次提交

NFSv4: Fix an Oops in the open recovery code · 1549210f

由 Trond Myklebust 提交于 6月 05, 2012

The open recovery code does not need to request a new value for the
mdsthreshold, and so does not allocate a struct nfs4_threshold.
The problem is that encode_getfattr_open() will still request an
mdsthreshold, and so we end up Oopsing in decode_attr_mdsthreshold.

This patch fixes encode_getfattr_open so that it doesn't request an
mdsthreshold when the caller isn't asking for one. It also fixes
decode_attr_mdsthreshold so that it errors if the server returns
an mdsthreshold that we didn't ask for (instead of Oopsing).
Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
Cc: Andy Adamson <andros@netapp.com>

1549210f

04 6月, 2012 3 次提交

i2c: Add generic I2C multiplexer using pinctrl API · ae58d1e4

由 Stephen Warren 提交于 5月 18, 2012

This is useful for SoCs whose I2C module's signals can be routed to
different sets of pins at run-time, using the pinctrl API.

                                 +-----+  +-----+
                                 | dev |  | dev |
    +------------------------+   +-----+  +-----+
    | SoC                    |      |        |
    |                   /----|------+--------+
    |   +---+   +------+     | child bus A, on first set of pins
    |   |I2C|---|Pinmux|     |
    |   +---+   +------+     | child bus B, on second set of pins
    |                   \----|------+--------+--------+
    |                        |      |        |        |
    +------------------------+  +-----+  +-----+  +-----+
                                | dev |  | dev |  | dev |
                                +-----+  +-----+  +-----+
Signed-off-by: NStephen Warren <swarren@nvidia.com>
Acked-by: NLinus Walleij <linus.walleij@linaro.org>
Acked-by: NRob Herring <rob.herring@calxeda.com>
Signed-off-by: NWolfram Sang <w.sang@pengutronix.de>

ae58d1e4

Revert "mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks" · 68e3e926

由 Linus Torvalds 提交于 6月 03, 2012

This reverts commit 5ceb9ce6.

That commit seems to be the cause of the mm compation list corruption
issues that Dave Jones reported.  The locking (or rather, absense
there-of) is dubious, as is the use of the 'page' variable once it has
been found to be outside the pageblock range.

So revert it for now, we can re-visit this for 3.6.  If we even need to:
as Minchan Kim says, "The patch wasn't a bug fix and even test workload
was very theoretical".
Reported-and-tested-by: NDave Jones <davej@redhat.com>
Acked-by: NHugh Dickins <hughd@google.com>
Acked-by: NKOSAKI Motohiro <kosaki.motohiro@gmail.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

68e3e926

vfs: move inode stat information closer together · 2f9d3df8

由 Linus Torvalds 提交于 6月 03, 2012

The comment above it says "Stat data, not accessed from path walking",
but in fact some of inode fields we use for the common stat data was way
down at the end of the inode, causing unnecessary cache misses for the
common stat operations.

The inode structure is pretty big, and this can change padding depending
on field width, but at least on the common 64-bit configurations this
doesn't change the size.  Some of our inode layout has historically been
to tro to avoid unnecessary padding fields, but cache locality is at
least as important for layout, if not more.

Noticed by looking at kernel profiles, and noticing that the "i_blkbits"
access stood out like a sore thumb.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2f9d3df8

03 6月, 2012 1 次提交

tty: Revert the tty locking series, it needs more work · f309532b

由 Linus Torvalds 提交于 6月 02, 2012

This reverts the tty layer change to use per-tty locking, because it's
not correct yet, and fixing it will require some more deep surgery.

The main revert is d29f3ef3 ("tty_lock: Localise the lock"), but
there are several smaller commits that built upon it, they also get
reverted here. The list of reverted commits is:

  fde86d31 - tty: add lockdep annotations
  8f6576ad - tty: fix ldisc lock inversion trace
  d3ca8b64 - pty: Fix lock inversion
  b1d679af - tty: drop the pty lock during hangup
  abcefe5f - tty/amiserial: Add missing argument for tty_unlock()
  fd11b42e - cris: fix missing tty arg in wait_event_interruptible_tty call
  d29f3ef3 - tty_lock: Localise the lock

The revert had a trivial conflict in the 68360serial.c staging driver
that got removed in the meantime.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f309532b

02 6月, 2012 9 次提交

new helper: signal_delivered() · efee984c

由 Al Viro 提交于 4月 28, 2012

Does block_sigmask() + tracehook_signal_handler();  called when
sigframe has been successfully built.  All architectures converted
to it; block_sigmask() itself is gone now (merged into this one).

I'm still not too happy with the signature, but that's a separate
story (IMO we need a structure that would contain signal number +
siginfo + k_sigaction, so that get_signal_to_deliver() would fill one,
signal_delivered(), handle_signal() and probably setup...frame() -
take one).
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

efee984c

most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set · 77097ae5

由 Al Viro 提交于 4月 27, 2012

Only 3 out of 63 do not.  Renamed the current variant to __set_current_blocked(),
added set_current_blocked() that will exclude unblockable signals, switched
open-coded instances to it.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

77097ae5

A
set_restore_sigmask() is never called without SIGPENDING (and never should be) · edd63a27
由 Al Viro 提交于 4月 27, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
edd63a27

new helper: sigmask_to_save() · b7f9a11a

由 Al Viro 提交于 5月 02, 2012

replace boilerplate "should we use ->saved_sigmask or ->blocked?"
with calls of obvious inlined helper...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

b7f9a11a

new helper: restore_saved_sigmask() · 51a7b448

由 Al Viro 提交于 5月 21, 2012

first fruits of ..._restore_sigmask() helpers: now we can take
boilerplate "signal didn't have a handler, clear RESTORE_SIGMASK
and restore the blocked mask from ->saved_mask" into a common
helper.  Open-coded instances switched...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

51a7b448

new helpers: {clear,test,test_and_clear}_restore_sigmask() · 4ebefe3e

由 Al Viro 提交于 4月 26, 2012

helpers parallel to set_restore_sigmask(), used in the next commits
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4ebefe3e

HAVE_RESTORE_SIGMASK is defined on all architectures now · 754421c8

由 Al Viro 提交于 4月 26, 2012

Everyone either defines it in arch thread_info.h or has TIF_RESTORE_SIGMASK
and picks default set_restore_sigmask() in linux/thread_info.h.  Kill the
ifdefs, slap #error in linux/thread_info.h to catch breakage when new ones
get merged.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

754421c8

vfs: retry last component if opening stale dentry · 16b1c1cd

由 Miklos Szeredi 提交于 5月 21, 2012

NFS optimizes away d_revalidates for last component of open.  This means that
open itself can find the dentry stale.

This patch allows the filesystem to return EOPENSTALE and the VFS will retry the
lookup on just the last component if possible.

If the lookup was done using RCU mode, including the last component, then this
is not possible since the parent dentry is lost.  In this case fall back to
non-RCU lookup.  Currently this is not used since NFS will always leave RCU
mode.
Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

16b1c1cd

fs: introduce inode operation ->update_time · c3b2da31

由 Josef Bacik 提交于 3月 26, 2012

Btrfs has to make sure we have space to allocate new blocks in order to modify
the inode, so updating time can fail.  We've gotten around this by having our
own file_update_time but this is kind of a pain, and Christoph has indicated he
would like to make xfs do something different with atime updates.  So introduce
->update_time, where we will deal with i_version an a/m/c time updates and
indicate which changes need to be made.  The normal version just does what it
has always done, updates the time and marks the inode dirty, and then
filesystems can choose to do something different.

I've gone through all of the users of file_update_time and made them check for
errors with the exception of the fault code since it's complicated and I wasn't
quite sure what to do there, also Jan is going to be pushing the file time
updates into page_mkwrite for those who have it so that should satisfy btrfs and
make it not a big deal to check the file_update_time() return code in the
generic fault path. Thanks,
Signed-off-by: NJosef Bacik <josef@redhat.com>

c3b2da31

01 6月, 2012 9 次提交

A
switch aio and shm to do_mmap_pgoff(), make do_mmap() static · e3fc629d
由 Al Viro 提交于 5月 30, 2012
```
after all, 0 bytes and 0 pages is the same thing...
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
e3fc629d
A
take security_mmap_file() outside of ->mmap_sem · 8b3ec681
由 Al Viro 提交于 5月 30, 2012
```
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
8b3ec681

c/r: prctl: add ability to set new mm_struct::exe_file · b32dfe37

由 Cyrill Gorcunov 提交于 5月 31, 2012

When we do restore we would like to have a way to setup a former
mm_struct::exe_file so that /proc/pid/exe would point to the original
executable file a process had at checkpoint time.

For this the PR_SET_MM_EXE_FILE code is introduced.  This option takes a
file descriptor which will be set as a source for new /proc/$pid/exe
symlink.

Note it allows to change /proc/$pid/exe if there are no VM_EXECUTABLE
vmas present for current process, simply because this feature is a special
to C/R and mm::num_exe_file_vmas become meaningless after that.

To minimize the amount of transition the /proc/pid/exe symlink might have,
this feature is implemented in one-shot manner.  Thus once changed the
symlink can't be changed again.  This should help sysadmins to monitor the
symlinks over all process running in a system.

In particular one could make a snapshot of processes and ring alarm if
there unexpected changes of /proc/pid/exe's in a system.

Note -- this feature is available iif CONFIG_CHECKPOINT_RESTORE is set and
the caller must have CAP_SYS_RESOURCE capability granted, otherwise the
request to change symlink will be rejected.
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b32dfe37

c/r: prctl: extend PR_SET_MM to set up more mm_struct entries · fe8c7f5c

由 Cyrill Gorcunov 提交于 5月 31, 2012

During checkpoint we dump whole process memory to a file and the dump
includes process stack memory.  But among stack data itself, the stack
carries additional parameters such as command line arguments, environment
data and auxiliary vector.

So when we do restore procedure and once we've restored stack data itself
we need to setup mm_struct::arg_start/end, env_start/end, so restored
process would be able to find command line arguments and environment data
it had at checkpoint time.  The same applies to auxiliary vector.

For this reason additional PR_SET_MM_(ARG_START | ARG_END | ENV_START |
ENV_END | AUXV) codes are introduced.
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Acked-by: NKees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Vagin <avagin@openvz.org>
Cc: Serge Hallyn <serge.hallyn@canonical.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fe8c7f5c

syscalls, x86: add __NR_kcmp syscall · d97b46a6

由 Cyrill Gorcunov 提交于 5月 31, 2012

While doing the checkpoint-restore in the user space one need to determine
whether various kernel objects (like mm_struct-s of file_struct-s) are
shared between tasks and restore this state.

The 2nd step can be solved by using appropriate CLONE_ flags and the
unshare syscall, while there's currently no ways for solving the 1st one.

One of the ways for checking whether two tasks share e.g.  mm_struct is to
provide some mm_struct ID of a task to its proc file, but showing such
info considered to be not that good for security reasons.

Thus after some debates we end up in conclusion that using that named
'comparison' syscall might be the best candidate.  So here is it --
__NR_kcmp.

It takes up to 5 arguments - the pids of the two tasks (which
characteristics should be compared), the comparison type and (in case of
comparison of files) two file descriptors.

Lookups for pids are done in the caller's PID namespace only.

At moment only x86 is supported and tested.

[akpm@linux-foundation.org: fix up selftests, warnings]
[akpm@linux-foundation.org: include errno.h]
[akpm@linux-foundation.org: tweak comment text]
Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Andrey Vagin <avagin@openvz.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Vasiliy Kulikov <segoon@openwall.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Valdis.Kletnieks@vt.edu
Cc: Michal Marek <mmarek@suse.cz>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d97b46a6

aio/vfs: cleanup of rw_copy_check_uvector() and compat_rw_copy_check_uvector() · ac34ebb3

由 Christopher Yeoh 提交于 5月 31, 2012

A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after
changes made to support CMA in an earlier patch.

Rather than having an additional check_access parameter to these
functions, the first paramater type is overloaded to allow the caller to
specify CHECK_IOVEC_ONLY which means check that the contents of the iovec
are valid, but do not check the memory that they point to. This is used
by process_vm_readv/writev where we need to validate that a iovec passed
to the syscall is valid but do not want to check the memory that it points
to at this point because it refers to an address space in another process.
Signed-off-by: NChris Yeoh <yeohc@au1.ibm.com>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ac34ebb3

eventfd: change int to __u64 in eventfd_signal() · ee62c6b2

由 Sha Zhengju 提交于 5月 31, 2012

eventfd_ctx->count is an __u64 counter which is allowed to reach
ULLONG_MAX.  eventfd_write() adds a __u64 value to "count", but the kernel
side eventfd_signal() only adds an int value to it.  Make them consistent.

[akpm@linux-foundation.org: update interface documentation]
Signed-off-by: NSha Zhengju <handai.szj@taobao.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ee62c6b2

rapidio: add DMA engine support for RIO data transfers · e42d98eb

由 Alexandre Bounine 提交于 5月 31, 2012

Adds DMA Engine framework support into RapidIO subsystem.

Uses DMA Engine DMA_SLAVE interface to generate data transfers to/from
remote RapidIO target devices.

Introduces RapidIO-specific wrapper for prep_slave_sg() interface with an
extra parameter to pass target specific information.

Uses scatterlist to describe local data buffer.  Address flat data buffer
on a remote side.
Signed-off-by: NAlexandre Bounine <alexandre.bounine@idt.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Acked-by: NVinod Koul <vinod.koul@linux.intel.com>
Cc: Li Yang <leoli@freescale.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e42d98eb

mqueue: separate mqueue default value from maximum value · cef0184c

由 KOSAKI Motohiro 提交于 5月 31, 2012

Commit b231cca4 ("message queues: increase range limits") changed
mqueue default value when attr parameter is specified NULL from hard
coded value to fs.mqueue.{msg,msgsize}_max sysctl value.

This made large side effect.  When user need to use two mqueue
applications 1) using !NULL attr parameter and it require big message
size and 2) using NULL attr parameter and only need small size message,
app (1) require to raise fs.mqueue.msgsize_max and app (2) consume large
memory size even though it doesn't need.

Doug Ledford propsed to switch back it to static hard coded value.
However it also has a compatibility problem.  Some applications might
started depend on the default value is tunable.

The solution is to separate default value from maximum value.
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>
Acked-by: NDoug Ledford <dledford@redhat.com>
Acked-by: NJoe Korty <joe.korty@ccur.com>
Cc: Amerigo Wang <amwang@redhat.com>
Acked-by: NSerge E. Hallyn <serue@us.ibm.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: Manfred Spraul <manfred@colorfullife.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cef0184c

openeuler / Kernel 接近 2 年 前同步成功

openeuler / Kernel
接近 2 年前同步成功