Commits · 0a01f2cc390e10633a54f72c608cc3fe19a50c3d · openeuler / Kernel

19 Nov, 2012 · 4 commits

pidns: Make the pidns proc mount/umount logic obvious. · 0a01f2cc

Authored by Eric W. Biederman on Aug 01, 2012

Track the number of pids in the proc hash table.  When the number of
pids goes to 0 schedule work to unmount the kernel mount of proc.

Move the mount of proc into alloc_pid when we allocate the pid for
init.

Remove the surprising calls of pid_ns_release_proc in fork and
proc_flush_task.  Those code paths really shouldn't know about proc
namespace implementation details and people have demonstrated several
times that finding and understanding those code paths is difficult and
non-obvious.

Because of the call path, detach_pid is always called with the
rtnl_lock held, so free_pid is not allowed to sleep and the work of
unmounting proc is moved to a work queue.  This has the side benefit
of not blocking the entire world waiting for the unnecessary
rcu_barrier in deactivate_locked_super.
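The counting scheme described above can be sketched in user-space C. This is a simplified model and not the kernel code: `struct pidns_model`, `model_alloc_pid`, `model_free_pid`, and `model_run_deferred_work` are hypothetical names, a boolean flag stands in for work queued on a work queue, and "first pid allocated" is assumed to approximate "the pid for init".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a pid namespace; not the kernel struct. */
struct pidns_model {
    unsigned int nr_hashed;  /* pids currently hashed in this namespace */
    bool proc_mounted;       /* the internal proc mount exists */
    bool unmount_pending;    /* stands in for work queued on a work queue */
};

/* Allocating the first pid (assumed to be init) creates the proc mount. */
static void model_alloc_pid(struct pidns_model *ns)
{
    assert(ns != NULL);
    if (ns->nr_hashed == 0)
        ns->proc_mounted = true;
    ns->nr_hashed++;
}

/* Freeing a pid must not sleep, so the unmount is only requested here. */
static void model_free_pid(struct pidns_model *ns)
{
    assert(ns != NULL);
    if (ns->nr_hashed > 0 && --ns->nr_hashed == 0)
        ns->unmount_pending = true;
}

/* Runs later, in a context that is allowed to sleep. */
static void model_run_deferred_work(struct pidns_model *ns)
{
    assert(ns != NULL);
    if (ns->unmount_pending) {
        ns->proc_mounted = false;
        ns->unmount_pending = false;
    }
}
```

In this model neither the fork path nor the task-flush path needs to know about proc: the only places that touch the mount are pid allocation and the deferred work.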

In the process of making the code clear and obvious this fixes a bug
reported by Gao feng <gaofeng@cn.fujitsu.com> where we would leak a
mount of proc during clone(CLONE_NEWPID|CLONE_NEWNET) if copy_pid_ns
succeeded and copy_net_ns failed.
Acked-by: "Serge E. Hallyn" <serge@hallyn.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>

pidns: Use task_active_pid_ns where appropriate · 17cf22c3

Authored by Eric W. Biederman on Mar 02, 2010

The expressions tsk->nsproxy->pid_ns and task_active_pid_ns
aka ns_of_pid(task_pid(tsk)) should have the same number of
cache line misses with the practical difference that
ns_of_pid(task_pid(tsk)) is released later in a process's life.

Furthermore by using task_active_pid_ns it becomes trivial
to write an unshare implementation for the pid namespace.

So I have used task_active_pid_ns everywhere I can.

In fork since the pid has not yet been attached to the
process I use ns_of_pid, to achieve the same effect.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

procfs: Don't cache a pid in the root inode. · ae06c7c8

Authored by Eric W. Biederman on Jul 10, 2010

Now that we have s_fs_info pointing to our pid namespace
the original reason for the proc root inode having a struct
pid is gone.

Caching a pid in the root inode has led to some complicated
code.  Now that we don't need the struct pid, just remove it.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

procfs: Use the proc generic infrastructure for proc/self. · e656d8a6

Authored by Eric W. Biederman on Jul 10, 2010

I had visions at one point of splitting proc into two filesystems. If
that had happened, proc/self, being the part of proc that actually deals
with pids, would have been a nice cleanup. As it is, proc/self requires
a lot of unnecessary infrastructure for a single file.

The only user visible change is that a mounted /proc for a pid namespace
that is dead now shows a broken proc symlink, instead of being completely
invisible. I don't think anyone will notice or care.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

15 Nov, 2012 · 2 commits

userns: Support fuse interacting with multiple user namespaces · 499dcf20

Authored by Eric W. Biederman on Feb 07, 2012

Use kuid_t and kgid_t in struct fuse_conn and struct fuse_mount_data.

The connection between a fuse filesystem and a fuse daemon is
established when a fuse filesystem is mounted and provided with a file
descriptor the fuse daemon created by opening /dev/fuse.

For now restrict the communication of uids and gids between the fuse
filesystem and the fuse daemon to the initial user namespace.  Enforce
this by verifying the file descriptor passed to the mount of fuse was
opened in the initial user namespace.  Ensuring the mount happens in
the initial user namespace is not necessary as mounts from non-initial
user namespaces are not yet allowed.

In fuse_req_init_context convert the current fsuid and fsgid into the
initial user namespace for the request that will be sent to the fuse
daemon.

In fuse_fill_attr convert the uid and gid passed from the fuse daemon
from the initial user namespace into kuids and kgids.

In iattr_to_fattr called from fuse_setattr convert kuids and kgids
into the uids and gids in the initial user namespace before passing
them to the fuse filesystem.

In fuse_change_attributes_common, called from fuse_dentry_revalidate,
fuse_permission, fuse_getattr, fuse_setattr, and fuse_iget, convert
the uid and gid from the fuse daemon into a kuid and a kgid to store
on the fuse inode.

By default fuse mounts are restricted to tasks whose uid, suid, and
euid match the fuse user_id and whose gid, sgid, and egid match
the fuse group_id.  Convert the user_id and group_id mount options
into kuids and kgids at mount time, and use uid_eq and gid_eq to
compare them in fuse_allow_task.

Cc: Miklos Szeredi <miklos@szeredi.hu>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

userns: Support autofs4 interacting with multiple user namespaces · 45634cd8

Authored by Eric W. Biederman on Feb 07, 2012

Use kuid_t and kgid_t in struct autofs_info and struct autofs_wait_queue.

When creating directories and symlinks default the uid and gid of
the mount requester to the global root uid and gid.  autofs4_wait
will update these fields when a mount is requested.

When generating autofsv5 packets report the uid and gid of the mount
requester in the user namespace of the process that opened the pipe,
reporting unmapped uids and gids as overflowuid and overflowgid.

In autofs_dev_ioctl_requester return the uid and gid of the last mount
requester converted into the calling process's user namespace.  When the
uid or gid does not map, return overflowuid or overflowgid as appropriate,
allowing failure to find a mount requester to be distinguished from
failure to map a mount requester.
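The "map, or fall back to the overflow id" behaviour can be sketched in user-space C. This is a minimal model under stated assumptions, not the kernel's implementation: `struct model_id_extent` and the `model_from_kid*` helpers are hypothetical names, the map has a single extent, and 65534 is used as the overflow id because that is the kernel's default overflowuid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_INVALID_ID  ((uint32_t)-1)
#define MODEL_OVERFLOW_ID 65534u  /* default overflowuid/overflowgid */

/* One extent of an id map, in the spirit of /proc/<pid>/uid_map. */
struct model_id_extent {
    uint32_t ns_first;      /* first id as seen inside the namespace */
    uint32_t kernel_first;  /* first kernel-internal id it maps to */
    uint32_t count;         /* number of consecutive ids covered */
};

/* Map a kernel-internal id into the namespace, or return MODEL_INVALID_ID. */
static uint32_t model_from_kid(const struct model_id_extent *map, uint32_t kid)
{
    assert(map != NULL);
    if (kid >= map->kernel_first && kid - map->kernel_first < map->count)
        return map->ns_first + (kid - map->kernel_first);
    return MODEL_INVALID_ID;
}

/* Same, but report unmapped ids as the overflow id instead of failing. */
static uint32_t model_from_kid_munged(const struct model_id_extent *map,
                                      uint32_t kid)
{
    uint32_t id = model_from_kid(map, kid);

    return id == MODEL_INVALID_ID ? MODEL_OVERFLOW_ID : id;
}
```

Because an unmapped requester still yields a well-defined id, a caller can reserve an error return for the separate case where no requester was found at all.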

The uid and gid mount options specifying the user and group of the
root autofs inode are converted into kuid and kgid as they are parsed
defaulting to the current uid and current gid of the process that
mounts autofs.

Mounting of autofs for the present remains confined to processes in
the initial user namespace.

Cc: Ian Kent <raven@themaw.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>

29 Oct, 2012 · 1 commit

Lock splice_read and splice_write functions · 1a25b1c4

Authored by Mikulas Patocka on Oct 15, 2012

Functions generic_file_splice_read and generic_file_splice_write access
the pagecache directly. For block devices these functions must be locked
so that block size is not changed while they are in progress.

This patch is an additional fix for commit b87570f5 ("Fix a crash
when block device is read and block size is changed at the same time")
that locked aio_read, aio_write and mmap against block size change.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

27 Oct, 2012 · 1 commit

VFS: don't do protected {sym,hard}links by default · 561ec64a

Authored by Linus Torvalds on Oct 26, 2012

In commit 800179c9 ("This adds symlink and hardlink restrictions to
the Linux VFS"), the new link protections were enabled by default, in
the hope that no actual application would care, despite it being
technically against legacy UNIX (and documented POSIX) behavior.

However, it does turn out to break some applications.  It's rare, and
it's unfortunate, but it's unacceptable to break existing systems, so
we'll have to default to legacy behavior.

In particular, it has broken the way AFD distributes files, see

  http://www.dwd.de/AFD/

along with some legacy scripts.

Distributions can end up setting this at initrd time or in system
scripts: if you have security problems due to link attacks during your
early boot sequence, you have bigger problems than some kernel sysctl
setting. Do:

	echo 1 > /proc/sys/fs/protected_symlinks
	echo 1 > /proc/sys/fs/protected_hardlinks

to re-enable the link protections.

Alternatively, we may at some point introduce a kernel config option
that sets these kinds of "more secure but not traditional" behavioural
options automatically.
Reported-by: Nick Bowler <nbowler@elliptictech.com>
Reported-by: Holger Kiehl <Holger.Kiehl@dwd.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: stable@kernel.org # v3.6
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

26 Oct, 2012 · 13 commits

fs/compat_ioctl.c: VIDEO_SET_SPU_PALETTE missing error check · 12176503

Authored by Kees Cook on Oct 25, 2012

The compat ioctl for VIDEO_SET_SPU_PALETTE was missing an error check
while converting ioctl arguments.  This could lead to leaking kernel
stack contents into userspace.

Patch extracted from existing fix in grsecurity.
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: David Miller <davem@davemloft.net>
Cc: Brad Spengler <spender@grsecurity.net>
Cc: PaX Team <pageexec@freemail.hu>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

freezer: exec should clear PF_NOFREEZE along with PF_KTHREAD · b40a7959

Authored by Oleg Nesterov on Oct 25, 2012

flush_old_exec() clears PF_KTHREAD but forgets about PF_NOFREEZE.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

Btrfs: do not bug when we fail to commit the transaction · c37b2b62

Authored by Josef Bacik on Oct 22, 2012

We BUG if we fail to commit the transaction when creating a snapshot, which
is just obnoxious.  Remove the BUG_ON().  Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: fix memory leak when cloning root's node · 7bfdcf7f

Authored by Liu Bo on Oct 25, 2012

After cloning root's node, we forgot to dec the src's ref
which can lead to a memory leak.
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <chris.mason@fusionio.com>

Btrfs: Use btrfs_update_inode_fallback when creating a snapshot · be6aef60

Authored by Josef Bacik on Oct 22, 2012

On a really full file system I was getting ENOSPC back from
btrfs_update_inode when trying to update the parent inode when creating a
snapshot. Just use the fallback method so we can update the inode and not
have to worry about having a delayed ref. Thanks,
Signed-off-by: Josef Bacik <jbacik@fusionio.com>

Btrfs: Send: preserve ownership (uid and gid) also for symlinks. · e2d044fe

Authored by Alex Lyakas on Oct 17, 2012

This patch also requires a change in the user-space part of "receive".
We need to use "lchown" instead of "chown". We will do this in the
following patch.
Signed-off-by: Alex Lyakas <alex.btrfs@zadarastorage.com>

Btrfs: fix deadlock caused by the nested chunk allocation · 671415b7

Authored by Miao Xie on Oct 16, 2012

Steps to reproduce:
 # mkfs.btrfs -m raid1 <disk1> <disk2>
 # btrfstune -S 1 <disk1>
 # mount <disk1> <mnt>
 # btrfs device add <disk3> <disk4> <mnt>
 # mount -o remount,rw <mnt>
 # dd if=/dev/zero of=<mnt>/tmpfile bs=1M count=1
 Deadlock happened.

The deadlock is caused by nested chunk allocation. When we wrote the data
into the filesystem, we had to allocate a data chunk because there was
none in the filesystem. At the end of the data chunk allocation, we had
to insert the metadata of the data chunk into the extent tree, but
there was no raid1 chunk, so we tried to lock the chunk allocation mutex to
allocate a new chunk. Since we already held that mutex, we deadlocked.

By rights, we should have allocated the raid1 chunk when we added the second
device, because the profile of the seed filesystem is raid1 and we had two
devices. But in fact we did not, because the last step of the first device
insertion didn't commit the transaction. So when we added the second device,
we didn't cow the tree, and just inserted the related metadata into the leaves
which were generated by the first device insertion, and its profile was dup.

So, I fix this problem by committing the transaction at the end of the first
device insertion.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>

btrfs: Return EINVAL when length to trim is less than FSB · e515c18b

Authored by Lukas Czerner on Oct 16, 2012

Currently, if the len argument in btrfs_ioctl_fitrim() is smaller than
one FSB (file system block), we continue and finally return 0 bytes
discarded. However, if the length to discard is smaller than a file system
block, we should really return EINVAL.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>

Btrfs: fix memory leak in btrfs_quota_enable() · 5b7ff5b3

Authored by Tsutomu Itoh on Oct 16, 2012

We should free quota_root before returning from the error
handling code.
Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com>

Btrfs: send correct rdev and mode in btrfs-send · d79e5043

Authored by Arne Jansen on Oct 15, 2012

When sending a device file, the stream was missing the mode. Also the
rdev was encoded wrongly.
Signed-off-by: Arne Jansen <sensille@gmx.net>

Btrfs: extended inode refs support for send mechanism · 96b5bd77

Authored by Jan Schmidt on Oct 15, 2012

This adds support for the new extended inode refs to btrfs send.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: Fix wrong error handling code · 84167d19

Authored by Stefan Behrens on Oct 11, 2012

gcc says "warning: comparison of unsigned expression >= 0 is always
true" because i is an unsigned long. And gcc is right this time.
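The class of bug behind that warning can be sketched in user-space C. This is an illustration only, not the Btrfs code: `model_unwind` is a hypothetical helper, and `size_t` stands in for the unsigned long index mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Undo the first `done` steps in reverse order and return how many were
 * undone. With an unsigned index, "for (i = done - 1; i >= 0; i--)" can
 * never stop through its condition, because i >= 0 is always true and
 * the index wraps around instead of going negative. Testing before the
 * decrement avoids that.
 */
static size_t model_unwind(int *slots, size_t done)
{
    size_t undone = 0;
    size_t i;

    assert(slots != NULL || done == 0);
    for (i = done; i-- > 0; ) {
        slots[i] = 0;
        undone++;
    }
    return undone;
}
```

The `i-- > 0` form visits indexes done-1 down to 0 and also handles `done == 0` without touching the array.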
Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>

Fix a sign bug causing invalid memory access in the ino_paths ioctl. · 661bec6b

Authored by Gabriel de Perthuis on Oct 10, 2012

To see the problem, create many hardlinks to the same file (120 should do it),
then look up paths by inode with:

  ls -i
  btrfs inspect inode-resolve -v $ino /mnt/btrfs

I noticed the memory layout of the fspath->val data had some irregularities
(some unnecessary gaps that stop appearing about halfway),
so I'm not sure there aren't any bugs left in it.


25 Oct, 2012 · 1 commit

sysfs: sysfs_pathname/sysfs_add_one: Use strlcat() instead of strcat() · 66081a72

Authored by Geert Uytterhoeven on Sep 29, 2012

The warning check for duplicate sysfs entries can cause a buffer overflow
when printing the warning, as strcat() doesn't check buffer sizes.
Use strlcat() instead.
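What a bounded append buys over strcat() can be sketched in user-space C. The helper below is a hypothetical stand-in with strlcat()-like semantics (`model_bounded_append` is not a kernel or libc function); it is written out here because strlcat() is not available in every C library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Append src to the NUL-terminated string in dst without writing past
 * dst[size - 1]. Returns the length the full result would have had, so
 * a return value >= size signals truncation.
 */
static size_t model_bounded_append(char *dst, const char *src, size_t size)
{
    size_t dlen, slen, room;

    assert(dst != NULL && src != NULL && size > 0);
    dlen = strlen(dst);
    slen = strlen(src);
    assert(dlen < size);        /* dst must already fit in the buffer */

    room = size - dlen - 1;     /* bytes left, excluding the NUL */
    if (slen < room)
        room = slen;
    memcpy(dst + dlen, src, room);
    dst[dlen + room] = '\0';
    return dlen + slen;
}
```

With an 8-byte buffer holding "/sys/a", appending "/long" stores only "/" and returns 11, so the caller can detect the truncation instead of overflowing the buffer.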

Since strlcat() doesn't return a pointer to the passed buffer, unlike
strcat(), I had to convert the nested concatenation in sysfs_add_one() to
an admittedly more obscure comma operator construct, to avoid emitting code
for the concatenation if CONFIG_BUG is disabled.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

24 Oct, 2012 · 6 commits

LOCKD: Clear ln->nsm_clnt only when ln->nsm_users is zero · e498daa8

Authored by Trond Myklebust on Oct 24, 2012

The current code is clearing it in all cases _except_ when zero.
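The intended "clear only on the last put" rule can be sketched in user-space C. This is a simplified model and not the lockd code: `struct model_lockd_net` and `model_nsm_put` are hypothetical names that only borrow the field names from the subject line, and the locking that the real code needs is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-net state: a user count and the shared client. */
struct model_lockd_net {
    unsigned int nsm_users;
    void *nsm_clnt;
};

/*
 * Drop one user. Only when the count reaches zero is the client pointer
 * cleared and handed back so the caller can shut it down; otherwise the
 * pointer stays in place for the remaining users and NULL is returned.
 */
static void *model_nsm_put(struct model_lockd_net *ln)
{
    void *to_shutdown = NULL;

    assert(ln != NULL);
    if (ln->nsm_users > 0 && --ln->nsm_users == 0) {
        to_shutdown = ln->nsm_clnt;
        ln->nsm_clnt = NULL;
    }
    return to_shutdown;
}
```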
Reported-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org

LOCKD: fix races in nsm_client_get · a4ee8d97

Authored by Trond Myklebust on Oct 23, 2012

Commit e9406db2 (lockd: per-net
NSM client creation and destruction helpers introduced) contains
a nasty race on initialisation of the per-net NSM client because
it doesn't check whether or not the client is set after grabbing
the nsm_create_mutex.
Reported-by: Nix <nix@esperi.org.uk>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: stable@vger.kernel.org

Btrfs: comment for loop in tree_mod_log_insert_move · 01763a2e

Authored by Jan Schmidt on Oct 23, 2012

Emphasize the way tree_mod_log_insert_move avoids adding
MOD_LOG_KEY_REMOVE_WHILE_MOVING operations, depending on the direction of
the move operation.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: fix extent buffer reference for tree mod log roots · d6381084

Authored by Jan Schmidt on Oct 23, 2012

In get_old_root we grab a lock on the extent buffer before we obtain a
reference on that buffer. That order is changed now.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: determine level of old roots · 5b6602e7

Authored by Jan Schmidt on Oct 23, 2012

In btrfs_find_all_roots' termination condition, we compare the level of the
old buffer we got from btrfs_search_old_slot to the level of the current
root node. We'd better compare it to the level of the rewound root node.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: tree mod log's old roots could still be part of the tree · 834328a8

Authored by Jan Schmidt on Oct 23, 2012

Tree mod log treated old root buffers as always empty buffers when starting
the rewind operations. However, the old root may still be part of the
current tree at a lower level, with still some valid entries.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

23 Oct, 2012 · 3 commits

Btrfs: fix a tree mod logging issue for root replacement operations · ba1bfbd5

Authored by Jan Schmidt on Oct 22, 2012

Avoid the implicit free by tree_mod_log_set_root_pointer, which is wrong in
two places. Where needed, we call tree_mod_log_free_eb explicitly now.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

Btrfs: don't put removals from push_node_left into tree mod log twice · 57911b8b

Authored by Jan Schmidt on Oct 19, 2012

Independent of the check (push_items < src_items), tree_mod_log_eb_copy did
log the removal of the old data entries from the source buffer. Therefore,
we must not call tree_mod_log_eb_move if the check evaluates to true, as
that would log the removal twice, finally resulting in (rewound) buffers
with wrong values for header_nritems.
Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>

ext4: Avoid underflow in ext4_trim_fs() · 5de35e8d

Authored by Lukas Czerner on Oct 22, 2012

Currently, if the len argument in ext4_trim_fs() is smaller than one block,
the 'end' variable underflows. Avoid that by returning EINVAL if len is
smaller than a file system block.
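The wrap-around can be reproduced in user-space C. This is a hedged sketch rather than the ext4 code: `model_trim_end` and its parameters are hypothetical, and it assumes the last block of the range is computed as start + (len >> blocksize_bits) - 1.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute the last block of a trim range. Returns 0 and stores the
 * result in *end, or -EINVAL if the requested length is shorter than
 * one file system block.
 */
static int model_trim_end(uint64_t start_block, uint64_t len_bytes,
                          unsigned int blocksize_bits, uint64_t *end)
{
    uint64_t nblocks;

    assert(end != NULL && blocksize_bits < 64);
    nblocks = len_bytes >> blocksize_bits;
    if (nblocks == 0)
        return -EINVAL;  /* otherwise start + 0 - 1 wraps when start is 0 */

    *end = start_block + nblocks - 1;
    return 0;
}
```

With 4096-byte blocks (blocksize_bits of 12), a length of 4095 bytes is rejected, while 8192 bytes starting at block 0 gives a last block of 1.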

Also remove useless unlikely().
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

22 Oct, 2012 · 2 commits

char_dev: pin parent kobject · 2f0157f1

Authored by Dmitry Torokhov on Oct 21, 2012

In certain cases (for example when a cdev structure is embedded into
another object whose lifetime is controlled by a separate kobject) it is
beneficial to tie the lifetime of that other object to the lifetime of the
character device, so that the related object is not freed until after the
char_dev object is freed.

To achieve this, let's pin the kobject's parent when doing cdev_add() and
unpin it when the last reference to the cdev structure is being released.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

ext4: Checksum the block bitmap properly with bigalloc enabled · 79f1ba49

Authored by Tao Ma on Oct 22, 2012

In mke2fs, we only checksum the whole bitmap block and it is right.
While in the kernel, we use EXT4_BLOCKS_PER_GROUP to indicate the
size of the checksummed bitmap, which is wrong when we enable bigalloc.
The right size should be EXT4_CLUSTERS_PER_GROUP and this patch fixes
it.

Also, as every caller of ext4_block_bitmap_csum_set and
ext4_block_bitmap_csum_verify passes in EXT4_BLOCKS_PER_GROUP(sb)/8,
we'd better remove this parameter and set it in the function itself.
Signed-off-by: Tao Ma <boyu.mt@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Reviewed-by: Lukas Czerner <lczerner@redhat.com>
Cc: stable@vger.kernel.org

20 Oct, 2012 · 1 commit

hold task->mempolicy while numa_maps scans. · 9e781440

Authored by KAMEZAWA Hiroyuki on Oct 19, 2012

  /proc/<pid>/numa_maps scans vmas and shows mempolicy under
  mmap_sem. It sometimes accesses task->mempolicy, which can
  be freed without mmap_sem, and numa_maps can show some
  garbage while scanning.

This patch takes a reference on task->mempolicy when reading
numa_maps, before calling get_vma_policy(). This way, task->mempolicy
will not be freed until numa_maps reaches its end.

V2->V3
  -  updated comments to be more verbose.
  -  removed task_lock() in numa_maps code.
V1->V2
  -  access task->mempolicy only once and remember it, because kernel/exit.c
     can overwrite it.
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

19 Oct, 2012 · 1 commit

fs, xattr: fix bug when removing a name not in xattr list · 43385846

Authored by David Rientjes on Oct 17, 2012

Commit 38f38657 ("xattr: extract simple_xattr code from tmpfs") moved
some code from tmpfs but introduced a subtle bug along the way.

If the name passed to simple_xattr_remove() does not exist in the list of
xattrs, then it is possible to call kfree(new_xattr) when new_xattr is
actually initialized to itself on the stack via uninitialized_var().

This causes a BUG() since the memory was not allocated via the slab
allocator and was not bypassed through to the page allocator because it
was too large.

Initialize the local variable to NULL so the kfree() never takes place.
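Why a NULL initializer makes the unconditional free safe can be sketched in user-space C. This is an illustration under stated assumptions and not the simple_xattr code: `struct model_xattr` and `model_xattr_remove` are hypothetical, and free() stands in for kfree(), which likewise does nothing when passed NULL.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical singly linked list of named attributes. */
struct model_xattr {
    struct model_xattr *next;
    char *name;
};

/*
 * Remove the entry called `name`. `victim` starts as NULL, so when the
 * name is not found the final free() is a harmless no-op rather than a
 * free of an indeterminate pointer.
 */
static int model_xattr_remove(struct model_xattr **head, const char *name)
{
    struct model_xattr *victim = NULL;
    struct model_xattr **pp;
    int ret = -ENODATA;

    assert(head != NULL && name != NULL);
    for (pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (strcmp((*pp)->name, name) == 0) {
            victim = *pp;
            *pp = victim->next;  /* unlink from the list */
            ret = 0;
            break;
        }
    }
    if (victim != NULL)
        free(victim->name);
    free(victim);                /* free(NULL) is defined to do nothing */
    return ret;
}
```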
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Aristeu Rozanski <aris@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

17 Oct, 2012 · 5 commits

jfs: Fix FITRIM argument handling · 4e7a4b01

Authored by Lukas Czerner on Oct 16, 2012

Currently, when 'range->start' is beyond the end of the file system,
nothing is done and that fact is ignored, whereas we should return
EINVAL. The same problem occurs when 'range.len' is smaller than a file
system block.

Fix this by adding checks for such conditions and returning EINVAL
appropriately.
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Acked-by: Tino Reichardt <milky-kernel@mcmilk.de>
Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>

NLM: nlm_lookup_file() may return NLMv4-specific error codes · cd0b16c1

Authored by Trond Myklebust on Oct 13, 2012

If the filehandle is stale, or open access is denied for some reason,
nlm_fopen() may return one of the NLMv4-specific error codes nlm4_stale_fh
or nlm4_failed. These get passed right through nlm_lookup_file(),
and so when nlmsvc_retrieve_args() calls the latter, it needs to filter
the result through the cast_status() machinery.

Failure to do so will trigger the BUG_ON() in encode_nlm_stat...
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Reported-by: Larry McVoy <lm@bitmover.com>
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

mm, mempolicy: fix printing stack contents in numa_maps · 32f8516a

Authored by David Rientjes on Oct 16, 2012

When reading /proc/pid/numa_maps, it's possible to return the contents of
the stack where the mempolicy string should be printed if the policy gets
freed from beneath us.

This happens because mpol_to_str() may return an error, and the
stack-allocated buffer is then printed without ever having been written to.

There are two possible error conditions in mpol_to_str():

 - if the buffer allocated is insufficient for the string to be stored,
   and

 - if the mempolicy has an invalid mode.

The first error condition is not triggered in any of the callers to
mpol_to_str(): at least 50 bytes is always allocated on the stack and this
is sufficient for the string to be written.  A future patch should convert
this into BUILD_BUG_ON() since we know the maximum strlen possible, but
that's not -rc material.

The second error condition is possible if a race occurs in dropping a
reference to a task's mempolicy causing it to be freed during the read().
The slab poison value is then used for the mode and mpol_to_str() returns
-EINVAL.

This race is only possible because get_vma_policy() believes that
mm->mmap_sem protects task->mempolicy, which isn't true.  The exit path
does not hold mm->mmap_sem when dropping the reference or setting
task->mempolicy to NULL: it uses task_lock(task) instead.

Thus, it's required for the caller of a task mempolicy to hold
task_lock(task) while grabbing the mempolicy and reading it.  Callers with
a vma policy store their mempolicy earlier and can simply increment the
reference count so it's guaranteed not to be freed.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

fix a leak in replace_fd() users · 45525b26

Authored by Al Viro on Oct 16, 2012

replace_fd() began with "eats a reference, tries to insert into
descriptor table" semantics; at some point I'd switched it to
much saner current behaviour ("try to insert into descriptor
table, grabbing a new reference if inserted; caller should do
fput() in any case"), but forgot to update the callers.
Mea culpa...

[Spotted by Pavel Roskin, who has a really weird system with pipe-fed
coredumps as part of what he considers a normal boot ;-)]
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

NFSv4: Fix the return value for nfs_callback_start_svc · e9b7e917

Authored by Trond Myklebust on Oct 16, 2012

returning PTR_ERR(cb_info->task) just after we have set it to
NULL looks like a typo...
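The ordering problem can be sketched in user-space C. This is an illustration only, not the NFS callback code: the `model_*` helpers imitate the kernel's ERR_PTR()/PTR_ERR()/IS_ERR() conventions, and `struct model_cb_info` and `model_start` are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* User-space imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *model_err_ptr(long err) { return (void *)(intptr_t)err; }
static long model_ptr_err(const void *p) { return (long)(intptr_t)p; }
static int model_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

struct model_cb_info {
    void *task;
};

/*
 * Record the result of starting a thread. The error code is read out
 * before the pointer is cleared; reading it afterwards would always
 * yield 0, which looks like success to the caller.
 */
static int model_start(struct model_cb_info *info, void *task_or_err)
{
    int ret = 0;

    assert(info != NULL);
    info->task = task_or_err;
    if (model_is_err(info->task)) {
        ret = (int)model_ptr_err(info->task);  /* save the error first */
        info->task = NULL;                     /* then clear the pointer */
    }
    return ret;
}
```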
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
