提交 · de04e76935ad5985d318fbce298a17e9dd2092b7 · openeuler / Kernel

28 9月, 2016 10 次提交

fs/aio.c: eliminate redundant loads in put_aio_ring_file · de04e769

由 Rasmus Villemoes 提交于 9月 15, 2016

Using a local variable we can prevent gcc from reloading
aio_ring_file->f_inode->i_mapping twice, eliminating 2x2 dependent
loads.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

de04e769

fs/internal.h: add const to ns_dentry_operations declaration · be218aa2

由 Rasmus Villemoes 提交于 9月 15, 2016

The actual definition in fs/nsfs.c is already const.
Signed-off-by: NRasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

be218aa2

compat: remove compat_printk() · 9dcfcda5

由 Arnd Bergmann 提交于 9月 21, 2016

After 7e8e385a ("x86/compat: Remove sys32_vm86_warning"), this
function has become unused, so we can remove it as well.

Link: http://lkml.kernel.org/r/20160617142903.3070388-1-arnd@arndb.deSigned-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>

9dcfcda5

fs/buffer.c: make __getblk_slow() static · 0026ba40

由 Eric Biggers 提交于 9月 12, 2016

__getblk_slow() was exported to modules in commit 3b5e6454
("fs/buffer.c: support buffer cache allocations with gfp modifiers").
This seems to have been a mistake, as no users were introduced nor was
the function declared in a header.  Change it back to 'static'.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

0026ba40

proc: unsigned file descriptors · 771187d6

由 Alexey Dobriyan 提交于 9月 02, 2016

Make struct proc_inode::fd unsigned.

This allows better code generation on x86_64 (less sign extensions).
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

771187d6

fs/file: more unsigned file descriptors · 9b80a184

由 Alexey Dobriyan 提交于 9月 02, 2016

Propagate unsignedness for grand total of 149 bytes:

	$ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
	add/remove: 0/0 grow/shrink: 0/10 up/down: 0/-149 (-149)
	function                                     old     new   delta
	set_close_on_exec                             99      98      -1
	put_files_struct                             201     200      -1
	get_close_on_exec                             59      58      -1
	do_prlimit                                   498     497      -1
	do_execveat_common.isra                     1662    1661      -1
	__close_fd                                   178     173      -5
	do_dup2                                      219     204     -15
	seq_show                                     685     660     -25
	__alloc_fd                                   384     357     -27
	dup_fd                                       718     646     -72

It mostly comes from converting "unsigned int" to "long" for bit operations.
Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9b80a184

fs: compat: remove redundant check of nr_segs · 85e7340f

由 Shawn Lin 提交于 8月 22, 2016

nr_segs should never be less than zero as its type
is unsigned long, so let's remove this check.
Signed-off-by: NShawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

85e7340f

cachefiles: Fix attempt to read i_blocks after deleting file [ver #2] · a818101d

由 David Howells 提交于 8月 09, 2016

An NULL-pointer dereference happens in cachefiles_mark_object_inactive()
when it tries to read i_blocks so that it can tell the cachefilesd daemon
how much space it's making available.

The problem is that cachefiles_drop_object() calls
cachefiles_mark_object_inactive() after calling cachefiles_delete_object()
because the object being marked active staves off attempts to (re-)use the
file at that filename until after it has been deleted.  This means that
d_inode is NULL by the time we come to try to access it.

To fix the problem, have the caller of cachefiles_mark_object_inactive()
supply the number of blocks freed up.

Without this, the following oops may occur:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000098
IP: [<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
...
CPU: 11 PID: 527 Comm: kworker/u64:4 Tainted: G          I    ------------   3.10.0-470.el7.x86_64 #1
Hardware name: Hewlett-Packard HP Z600 Workstation/0B54h, BIOS 786G4 v03.19 03/11/2011
Workqueue: fscache_object fscache_object_work_func [fscache]
task: ffff880035edaf10 ti: ffff8800b77c0000 task.ti: ffff8800b77c0000
RIP: 0010:[<ffffffffa06c5cc1>] cachefiles_mark_object_inactive+0x61/0xb0 [cachefiles]
RSP: 0018:ffff8800b77c3d70  EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff8800bf6cc400 RCX: 0000000000000034
RDX: 0000000000000000 RSI: ffff880090ffc710 RDI: ffff8800bf761ef8
RBP: ffff8800b77c3d88 R08: 2000000000000000 R09: 0090ffc710000000
R10: ff51005d2ff1c400 R11: 0000000000000000 R12: ffff880090ffc600
R13: ffff8800bf6cc520 R14: ffff8800bf6cc400 R15: ffff8800bf6cc498
FS:  0000000000000000(0000) GS:ffff8800bb8c0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000098 CR3: 00000000019ba000 CR4: 00000000000007e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Stack:
 ffff880090ffc600 ffff8800bf6cc400 ffff8800867df140 ffff8800b77c3db0
 ffffffffa06c48cb ffff880090ffc600 ffff880090ffc180 ffff880090ffc658
 ffff8800b77c3df0 ffffffffa085d846 ffff8800a96b8150 ffff880090ffc600
Call Trace:
 [<ffffffffa06c48cb>] cachefiles_drop_object+0x6b/0xf0 [cachefiles]
 [<ffffffffa085d846>] fscache_drop_object+0xd6/0x1e0 [fscache]
 [<ffffffffa085d615>] fscache_object_work_func+0xa5/0x200 [fscache]
 [<ffffffff810a605b>] process_one_work+0x17b/0x470
 [<ffffffff810a6e96>] worker_thread+0x126/0x410
 [<ffffffff810a6d70>] ? rescuer_thread+0x460/0x460
 [<ffffffff810ae64f>] kthread+0xcf/0xe0
 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140
 [<ffffffff81695418>] ret_from_fork+0x58/0x90
 [<ffffffff810ae580>] ? kthread_create_on_node+0x140/0x140

The oopsing code shows:

	callq  0xffffffff810af6a0 <wake_up_bit>
	mov    0xf8(%r12),%rax
	mov    0x30(%rax),%rax
	mov    0x98(%rax),%rax   <---- oops here
	lock add %rax,0x130(%rbx)

where this is:

	d_backing_inode(object->dentry)->i_blocks

Fixes: a5b3a80b (CacheFiles: Provide read-and-reset release counters for cachefilesd)
Reported-by: NJianhong Yin <jiyin@redhat.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NJeff Layton <jlayton@redhat.com>
Reviewed-by: NSteve Dickson <steved@redhat.com>
cc: stable@vger.kernel.org
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

a818101d

A
cifs: don't use memcpy() to copy struct iov_iter · fc56b983
由 Al Viro 提交于 9月 21, 2016
```
it's not 70s anymore.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
```
fc56b983

get rid of separate multipage fault-in primitives · 4bce9f6e

由 Al Viro 提交于 9月 17, 2016

* the only remaining callers of "short" fault-ins are just as happy with generic
variants (both in lib/iov_iter.c); switch them to multipage variants, kill the
"short" ones
* rename the multipage variants to now available plain ones.
* get rid of compat macro defining iov_iter_fault_in_multipage_readable by
expanding it in its only user.
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

4bce9f6e

22 9月, 2016 2 次提交

btrfs: ensure that file descriptor used with subvol ioctls is a dir · 325c50e3

由 Jeff Mahoney 提交于 9月 21, 2016

If the subvol/snapshot create/destroy ioctls are passed a regular file
with execute permissions set, we'll eventually Oops while trying to do
inode->i_op->lookup via lookup_one_len.

This patch ensures that the file descriptor refers to a directory.

Fixes: cb8e7090 (Btrfs: Fix subvolume creation locking rules)
Fixes: 76dda93c (Btrfs: add snapshot/subvolume destroy ioctl)
Cc: <stable@vger.kernel.org> #v2.6.29+
Signed-off-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

325c50e3

Btrfs: handle quota reserve failure properly · 1e5ec2e7

由 Josef Bacik 提交于 9月 15, 2016

btrfs/022 was spitting a warning for the case that we exceed the quota.  If we
fail to make our quota reservation we need to clean up our data space
reservation.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Tested-by: NJeff Mahoney <jeffm@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

1e5ec2e7

21 9月, 2016 2 次提交

fs/proc/kcore.c: Add bounce buffer for ktext data · df04abfd

由 Jiri Olsa 提交于 9月 08, 2016

We hit hardened usercopy feature check for kernel text access by reading
kcore file:

  usercopy: kernel memory exposure attempt detected from ffffffff8179a01f (<kernel text>) (4065 bytes)
  kernel BUG at mm/usercopy.c:75!

Bypassing this check for kcore by adding bounce buffer for ktext data.
Reported-by: NSteve Best <sbest@redhat.com>
Fixes: f5509cc1 ("mm: Hardened usercopy")
Suggested-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

df04abfd

fs/proc/kcore.c: Make bounce buffer global for read · f5beeb18

由 Jiri Olsa 提交于 9月 08, 2016

Next patch adds bounce buffer for ktext area, so it's
convenient to have single bounce buffer for both
vmalloc/module and ktext cases.
Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NJiri Olsa <jolsa@kernel.org>
Acked-by: NKees Cook <keescook@chromium.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f5beeb18

20 9月, 2016 10 次提交

Revert "ocfs2: bump up o2cb network protocol version" · 63b52c49

由 Junxiao Bi 提交于 9月 19, 2016

This reverts commit 38b52efd ("ocfs2: bump up o2cb network protocol
version").

This commit made rolling upgrade fail.  When one node is upgraded to new
version with this commit, the remaining nodes will fail to establish
connections to it, then the application like VMs on the remaining nodes
can't be live migrated to the upgraded one.  This will cause an outage.
Since negotiate hb timeout behavior didn't change without this commit,
so revert it.

Fixes: 38b52efd ("ocfs2: bump up o2cb network protocol version")
Link: http://lkml.kernel.org/r/1471396924-10375-1-git-send-email-junxiao.bi@oracle.comSigned-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

63b52c49

ocfs2: fix start offset to ocfs2_zero_range_for_truncate() · d21c353d

由 Ashish Samant 提交于 9月 19, 2016

If we punch a hole on a reflink such that following conditions are met:

1. start offset is on a cluster boundary
2. end offset is not on a cluster boundary
3. (end offset is somewhere in another extent) or
   (hole range > MAX_CONTIG_BYTES(1MB)),

we dont COW the first cluster starting at the start offset.  But in this
case, we were wrongly passing this cluster to
ocfs2_zero_range_for_truncate() to zero out.  This will modify the
cluster in place and zero it in the source too.

Fix this by skipping this cluster in such a scenario.

To reproduce:

1. Create a random file of say 10 MB
     xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
2. Reflink  it
     reflink -f 10MBfile reflnktest
3. Punch a hole at starting at cluster boundary  with range greater that
1MB. You can also use a range that will put the end offset in another
extent.
     fallocate -p -o 0 -l 1048615 reflnktest
4. sync
5. Check the  first cluster in the source file. (It will be zeroed out).
    dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C

Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.comSigned-off-by: NAshish Samant <ashish.samant@oracle.com>
Reported-by: NSaar Maoz <saar.maoz@oracle.com>
Reviewed-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Eric Ren <zren@suse.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d21c353d

ocfs2: fix double unlock in case retry after free truncate log · 3bb8b653

由 Joseph Qi 提交于 9月 19, 2016

If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
free truncate log and then retry.  Since ocfs2_try_to_free_truncate_log
will lock/unlock global bitmap inode, we have to unlock it before
calling this function.  But when retry reserve and it fails with no
global bitmap inode lock taken, it will unlock again in error handling
branch and BUG.

This issue also exists if no need retry and then ocfs2_inode_lock fails.
So fix it.

Fixes: 2070ad1a ("ocfs2: retry on ENOSPC if sufficient space in truncate log")
Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.comSigned-off-by: NJoseph Qi <joseph.qi@huawei.com>
Signed-off-by: NJiufei Xue <xuejiufei@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3bb8b653

fanotify: fix list corruption in fanotify_get_response() · 96d41019

由 Jan Kara 提交于 9月 19, 2016

fanotify_get_response() calls fsnotify_remove_event() when it finds that
group is being released from fanotify_release() (bypass_perm is set).

However the event it removes need not be only in the group's notification
queue but it can have already moved to access_list (userspace read the
event before closing the fanotify instance fd) which is protected by a
different lock. Thus when fsnotify_remove_event() races with
fanotify_release() operating on access_list, the list can get corrupted.

Fix the problem by moving all the logic removing permission events from
the lists to one place - fanotify_release().

Fixes: 5838d444 ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz>
Reported-by: NMiklos Szeredi <mszeredi@redhat.com>
Tested-by: NMiklos Szeredi <mszeredi@redhat.com>
Reviewed-by: NMiklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

96d41019

fsnotify: add a way to stop queueing events on group shutdown · 12703dbf

由 Jan Kara 提交于 9月 19, 2016

Implement a function that can be called when a group is being shutdown
to stop queueing new events to the group. Fanotify will use this.

Fixes: 5838d444 ("fanotify: fix double free of pending permission events")
Link: http://lkml.kernel.org/r/1473797711-14111-2-git-send-email-jack@suse.czSigned-off-by: NJan Kara <jack@suse.cz>
Reviewed-by: NMiklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

12703dbf

ocfs2: fix trans extend while free cached blocks · d5bf1418

由 Junxiao Bi 提交于 9月 19, 2016

The root cause of this issue is the same with the one fixed by the last
patch, but this time credits for allocator inode and group descriptor
may not be consumed before trans extend.

The following error was caught:

WARNING: CPU: 0 PID: 2037 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront fb_sys_fops sysimgblt sysfillrect syscopyarea xen_netfront parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
CPU: 0 PID: 2037 Comm: rm Tainted: G W 4.1.12-37.6.3.el6uek.bug24573128v2.x86_64 #2
Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
Call Trace:
dump_stack+0x48/0x5c
warn_slowpath_common+0x95/0xe0
warn_slowpath_null+0x1a/0x20
start_this_handle+0x4c3/0x510 [jbd2]
jbd2__journal_restart+0x161/0x1b0 [jbd2]
jbd2_journal_restart+0x13/0x20 [jbd2]
ocfs2_extend_trans+0x74/0x220 [ocfs2]
ocfs2_free_cached_blocks+0x16b/0x4e0 [ocfs2]
ocfs2_run_deallocs+0x70/0x270 [ocfs2]
ocfs2_commit_truncate+0x474/0x6f0 [ocfs2]
ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
ocfs2_evict_inode+0x28/0x60 [ocfs2]
evict+0xab/0x1a0
iput_final+0xf6/0x190
iput+0xc8/0xe0
do_unlinkat+0x1b7/0x310
SyS_unlinkat+0x22/0x40
system_call_fastpath+0x12/0x71
---[ end trace a62437cb060baa71 ]---
JBD2: rm wants too many credits (149 > 128)

Link: http://lkml.kernel.org/r/1473674623-11810-2-git-send-email-junxiao.bi@oracle.comSigned-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: NJoseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d5bf1418

ocfs2: fix trans extend while flush truncate log · 2b0ad008

由 Junxiao Bi 提交于 9月 19, 2016

Every time, ocfs2_extend_trans() included a credit for truncate log
inode, but as that inode had been managed by jbd2 running transaction
first time, it will not consume that credit until
jbd2_journal_restart().

Since total credits to extend always included the un-consumed ones,
there will be more and more un-consumed credit, at last
jbd2_journal_restart() will fail due to credit number over the half of
max transction credit.

The following error was caught when unlinking a large file with many
extents:

  ------------[ cut here ]------------
  WARNING: CPU: 0 PID: 13626 at fs/jbd2/transaction.c:269 start_this_handle+0x4c3/0x510 [jbd2]()
  Modules linked in: ocfs2 nfsd lockd grace nfs_acl auth_rpcgss sunrpc autofs4 ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sd_mod sg ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ppdev xen_kbdfront xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea parport_pc parport pcspkr i2c_piix4 i2c_core acpi_cpufreq ext4 jbd2 mbcache xen_blkfront floppy pata_acpi ata_generic ata_piix dm_mirror dm_region_hash dm_log dm_mod
  CPU: 0 PID: 13626 Comm: unlink Tainted: G        W       4.1.12-37.6.3.el6uek.x86_64 #2
  Hardware name: Xen HVM domU, BIOS 4.4.4OVM 02/11/2016
  Call Trace:
    dump_stack+0x48/0x5c
    warn_slowpath_common+0x95/0xe0
    warn_slowpath_null+0x1a/0x20
    start_this_handle+0x4c3/0x510 [jbd2]
    jbd2__journal_restart+0x161/0x1b0 [jbd2]
    jbd2_journal_restart+0x13/0x20 [jbd2]
    ocfs2_extend_trans+0x74/0x220 [ocfs2]
    ocfs2_replay_truncate_records+0x93/0x360 [ocfs2]
    __ocfs2_flush_truncate_log+0x13e/0x3a0 [ocfs2]
    ocfs2_remove_btree_range+0x458/0x7f0 [ocfs2]
    ocfs2_commit_truncate+0x1b3/0x6f0 [ocfs2]
    ocfs2_truncate_for_delete+0xbd/0x380 [ocfs2]
    ocfs2_wipe_inode+0x136/0x6a0 [ocfs2]
    ocfs2_delete_inode+0x2a2/0x3e0 [ocfs2]
    ocfs2_evict_inode+0x28/0x60 [ocfs2]
    evict+0xab/0x1a0
    iput_final+0xf6/0x190
    iput+0xc8/0xe0
    do_unlinkat+0x1b7/0x310
    SyS_unlink+0x16/0x20
    system_call_fastpath+0x12/0x71
  ---[ end trace 28aa7410e69369cf ]---
  JBD2: unlink wants too many credits (251 > 128)

Link: http://lkml.kernel.org/r/1473674623-11810-1-git-send-email-junxiao.bi@oracle.comSigned-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: NJoseph Qi <joseph.qi@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2b0ad008

ipc/shm: fix crash if CONFIG_SHMEM is not set · 31b4beb4

由 Kirill A. Shutemov 提交于 9月 19, 2016

Commit c01d5b30 ("shmem: get_unmapped_area align huge page") makes
use of shm_get_unmapped_area() in shm_file_operations() unconditional to
CONFIG_MMU.

As Tony Battersby pointed this can lead NULL-pointer dereference on
machine with CONFIG_MMU=y and CONFIG_SHMEM=n.  In this case ipc/shm is
backed by ramfs which doesn't provide f_op->get_unmapped_area for
configurations with MMU.

The solution is to provide dummy f_op->get_unmapped_area for ramfs when
CONFIG_MMU=y, which just call current->mm->get_unmapped_area().

Fixes: c01d5b30 ("shmem: get_unmapped_area align huge page")
Link: http://lkml.kernel.org/r/20160912102704.140442-1-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: NTony Battersby <tonyb@cybernetics.com>
Tested-by: NTony Battersby <tonyb@cybernetics.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: <stable@vger.kernel.org>	[4.7.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

31b4beb4

autofs: use dentry flags to block walks during expire · 7cbdb4a2

由 Ian Kent 提交于 9月 19, 2016

Somewhere along the way the autofs expire operation has changed to hold
a spin lock over expired dentry selection.  The autofs indirect mount
expired dentry selection is complicated and quite lengthy so it isn't
appropriate to hold a spin lock over the operation.

Commit 47be6184 ("fs/dcache.c: avoid soft-lockup in dput()") added a
might_sleep() to dput() causing a WARN_ONCE() about this usage to be
issued.

But the spin lock doesn't need to be held over this check, the autofs
dentry info.  flags are enough to block walks into dentrys during the
expire.

I've left the direct mount expire as it is (for now) because it is much
simpler and quicker than the indirect mount expire and adding spin lock
release and re-aquires would do nothing more than add overhead.

Fixes: 47be6184 ("fs/dcache.c: avoid soft-lockup in dput()")
Link: http://lkml.kernel.org/r/20160912014017.1773.73060.stgit@pluto.themaw.netSigned-off-by: NIan Kent <raven@themaw.net>
Reported-by: NTakashi Iwai <tiwai@suse.de>
Tested-by: NTakashi Iwai <tiwai@suse.de>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: NeilBrown <neilb@suse.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7cbdb4a2

ocfs2/dlm: fix race between convert and migration · e6f0c6e6

由 Joseph Qi 提交于 9月 19, 2016

Commit ac7cf246 ("ocfs2/dlm: fix race between convert and recovery")
checks if lockres master has changed to identify whether new master has
finished recovery or not.  This will introduce a race that right after
old master does umount ( means master will change), a new convert
request comes.

In this case, it will reset lockres state to DLM_RECOVERING and then
retry convert, and then fail with lockres->l_action being set to
OCFS2_AST_INVALID, which will cause inconsistent lock level between
ocfs2 and dlm, and then finally BUG.

Since dlm recovery will clear lock->convert_pending in
dlm_move_lockres_to_recovery_list, we can use it to correctly identify
the race case between convert and recovery.  So fix it.

Fixes: ac7cf246 ("ocfs2/dlm: fix race between convert and recovery")
Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.comSigned-off-by: NJoseph Qi <joseph.qi@huawei.com>
Signed-off-by: NJun Piao <piaojun@huawei.com>
Cc: Mark Fasheh <mfasheh@suse.de>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e6f0c6e6

16 9月, 2016 4 次提交

configfs: Return -EFBIG from configfs_write_bin_file. · 42857cf5

由 Phil Turnbull 提交于 9月 15, 2016

The check for writing more than cb_max_size bytes does not 'goto out' so
it is a no-op which allows users to vmalloc an arbitrary amount.

Fixes: 03607ace ("configfs: implement binary attributes")
Cc: stable@kernel.org
Signed-off-by: NPhil Turnbull <phil.turnbull@oracle.com>
Signed-off-by: NChristoph Hellwig <hch@lst.de>

42857cf5

aio: mark AIO pseudo-fs noexec · 22f6b4d3

由 Jann Horn 提交于 9月 16, 2016

This ensures that do_mmap() won't implicitly make AIO memory mappings
executable if the READ_IMPLIES_EXEC personality flag is set.  Such
behavior is problematic because the security_mmap_file LSM hook doesn't
catch this case, potentially permitting an attacker to bypass a W^X
policy enforced by SELinux.

I have tested the patch on my machine.

To test the behavior, compile and run this:

    #define _GNU_SOURCE
    #include <unistd.h>
    #include <sys/personality.h>
    #include <linux/aio_abi.h>
    #include <err.h>
    #include <stdlib.h>
    #include <stdio.h>
    #include <sys/syscall.h>

    int main(void) {
        personality(READ_IMPLIES_EXEC);
        aio_context_t ctx = 0;
        if (syscall(__NR_io_setup, 1, &ctx))
            err(1, "io_setup");

        char cmd[1000];
        sprintf(cmd, "cat /proc/%d/maps | grep -F '/[aio]'",
            (int)getpid());
        system(cmd);
        return 0;
    }

In the output, "rw-s" is good, "rwxs" is bad.
Signed-off-by: NJann Horn <jann@thejh.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

22f6b4d3

vfs: cap dedupe request structure size at PAGE_SIZE · b71dbf10

由 Darrick J. Wong 提交于 9月 14, 2016

Kirill A Shutemov reports that the kernel doesn't try to cap dest_count
in any way, and uses the number to allocate kernel memory.  This causes
high order allocation warnings in the kernel log if someone passes in a
big enough value.  We should clamp the allocation at PAGE_SIZE to avoid
stressing the VM.

The two existing users of the dedupe ioctl never send more than 120
requests, so we can safely clamp dest_range at PAGE_SIZE, because with
4k pages we can handle up to 127 dedupe candidates.  Given the max
extent length of 16MB, we can end up doing 2GB of IO which is plenty.

[ Note: the "offsetof()" can't overflow, because 'count' is just a
  16-bit integer.  That's not obvious in the limited context of the
  patch, so I'm noting it here because it made me go look.  - Linus ]
Reported-by: N"Kirill A. Shutemov" <kirill@shutemov.name>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b71dbf10

vfs: fix return type of ioctl_file_dedupe_range · 5297e0f0

由 Darrick J. Wong 提交于 9月 14, 2016

All the VFS functions in the dedupe ioctl path return int status, so
the ioctl handler ought to as well.

Found by Coverity, CID 1350952.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5297e0f0

12 9月, 2016 1 次提交

NFSv4.1: Fix the CREATE_SESSION slot number accounting · b519d408

由 Trond Myklebust 提交于 9月 11, 2016

Ensure that we conform to the algorithm described in RFC5661, section
18.36.4 for when to bump the sequence id. In essence we do it for all
cases except when the RPC call timed out, or in case of the server returning
NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
Cc: stable@vger.kernel.org

b519d408

10 9月, 2016 7 次提交

fscrypto: require write access to mount to set encryption policy · ba63f23d

由 Eric Biggers 提交于 9月 08, 2016

Since setting an encryption policy requires writing metadata to the
filesystem, it should be guarded by mnt_want_write/mnt_drop_write.
Otherwise, a user could cause a write to a frozen or readonly
filesystem.  This was handled correctly by f2fs but not by ext4.  Make
fscrypt_process_policy() handle it rather than relying on the filesystem
to get it right.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Acked-by: NJaegeuk Kim <jaegeuk@kernel.org>

ba63f23d

Move check for prefix path to within cifs_get_root() · 348c1bfa

由 Sachin Prabhu 提交于 7月 29, 2016

Signed-off-by: NSachin Prabhu <sprabhu@redhat.com>
Tested-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

348c1bfa

Compare prepaths when comparing superblocks · c1d8b24d

由 Sachin Prabhu 提交于 7月 29, 2016

The patch
fs/cifs: make share unaccessible at root level mountable
makes use of prepaths when any component of the underlying path is
inaccessible.

When mounting 2 separate shares having different prepaths but are other
wise similar in other respects, we end up sharing superblocks when we
shouldn't be doing so.
Signed-off-by: NSachin Prabhu <sprabhu@redhat.com>
Tested-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

c1d8b24d

Fix memory leaks in cifs_do_mount() · 4214ebf4

由 Sachin Prabhu 提交于 7月 29, 2016

Fix memory leaks introduced by the patch
fs/cifs: make share unaccessible at root level mountable

Also move allocation of cifs_sb->prepath to cifs_setup_cifs_sb().
Signed-off-by: NSachin Prabhu <sprabhu@redhat.com>
Tested-by: NAurelien Aptel <aaptel@suse.com>
Signed-off-by: NSteve French <smfrench@gmail.com>

4214ebf4

fscrypto: only allow setting encryption policy on directories · 002ced4b

由 Eric Biggers 提交于 9月 08, 2016

The FS_IOC_SET_ENCRYPTION_POLICY ioctl allowed setting an encryption
policy on nondirectory files.  This was unintentional, and in the case
of nonempty regular files did not behave as expected because existing
data was not actually encrypted by the ioctl.

In the case of ext4, the user could also trigger filesystem errors in
->empty_dir(), e.g. due to mismatched "directory" checksums when the
kernel incorrectly tried to interpret a regular file as a directory.

This bug affected ext4 with kernels v4.8-rc1 or later and f2fs with
kernels v4.6 and later.  It appears that older kernels only permitted
directories and that the check was accidentally lost during the
refactoring to share the file encryption code between ext4 and f2fs.

This patch restores the !S_ISDIR() check that was present in older
kernels.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

002ced4b

fscrypto: add authorization check for setting encryption policy · 163ae1c6

由 Eric Biggers 提交于 9月 08, 2016

On an ext4 or f2fs filesystem with file encryption supported, a user
could set an encryption policy on any empty directory(*) to which they
had readonly access.  This is obviously problematic, since such a
directory might be owned by another user and the new encryption policy
would prevent that other user from creating files in their own directory
(for example).

Fix this by requiring inode_owner_or_capable() permission to set an
encryption policy.  This means that either the caller must own the file,
or the caller must have the capability CAP_FOWNER.

(*) Or also on any regular file, for f2fs v4.6 and later and ext4
    v4.8-rc1 and later; a separate bug fix is coming for that.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Cc: stable@vger.kernel.org # 4.1+; check fs/{ext4,f2fs}
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

163ae1c6

mm: fix show_smap() for zone_device-pmd ranges · ca120cf6

由 Dan Williams 提交于 9月 03, 2016

Attempting to dump /proc/<pid>/smaps for a process with pmd dax mappings
currently results in the following VM_BUG_ONs:

 kernel BUG at mm/huge_memory.c:1105!
 task: ffff88045f16b140 task.stack: ffff88045be14000
 RIP: 0010:[<ffffffff81268f9b>]  [<ffffffff81268f9b>] follow_trans_huge_pmd+0x2cb/0x340
 [..]
 Call Trace:
  [<ffffffff81306030>] smaps_pte_range+0xa0/0x4b0
  [<ffffffff814c2755>] ? vsnprintf+0x255/0x4c0
  [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
  [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
  [<ffffffff81307656>] show_smap+0xa6/0x2b0

 kernel BUG at fs/proc/task_mmu.c:585!
 RIP: 0010:[<ffffffff81306469>]  [<ffffffff81306469>] smaps_pte_range+0x499/0x4b0
 Call Trace:
  [<ffffffff814c2795>] ? vsnprintf+0x255/0x4c0
  [<ffffffff8123c46e>] __walk_page_range+0x1fe/0x4d0
  [<ffffffff8123c8a2>] walk_page_vma+0x62/0x80
  [<ffffffff81307696>] show_smap+0xa6/0x2b0

These locations are sanity checking page flags that must be set for an
anonymous transparent huge page, but are not set for the zone_device
pages associated with dax mappings.

Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

ca120cf6

06 9月, 2016 2 次提交

btrfs: introduce tickets_id to determine whether asynchronous metadata reclaim work makes progress · ce129655

由 Wang Xiaoguang 提交于 9月 02, 2016

In btrfs_async_reclaim_metadata_space(), we use ticket's address to
determine whether asynchronous metadata reclaim work is making progress.

	ticket = list_first_entry(&space_info->tickets,
				  struct reserve_ticket, list);
	if (last_ticket == ticket) {
		flush_state++;
	} else {
		last_ticket = ticket;
		flush_state = FLUSH_DELAYED_ITEMS_NR;
		if (commit_cycles)
			commit_cycles--;
	}

But indeed it's wrong, we should not rely on local variable's address to
do this check, because addresses may be same. In my test environment, I
dd one 168MB file in a 256MB fs, found that for this file, every time
wait_reserve_ticket() called, local variable ticket's address is same,

For above codes, assume a previous ticket's address is addrA, last_ticket
is addrA. Btrfs_async_reclaim_metadata_space() finished this ticket and
wake up it, then another ticket is added, but with the same address addrA,
now last_ticket will be same to current ticket, then current ticket's flush
work will start from current flush_state, not initial FLUSH_DELAYED_ITEMS_NR,
which may result in some enospc issues(I have seen this in my test machine).
Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ce129655

Btrfs: remove root_log_ctx from ctx list before btrfs_sync_log returns · cbd60aa7

由 Chris Mason 提交于 9月 06, 2016

We use a btrfs_log_ctx structure to pass information into the
tree log commit, and get error values out.  It gets added to a per
log-transaction list which we walk when things go bad.

Commit d1433deb added an optimization to skip waiting for the log
commit, but didn't take root_log_ctx out of the list.  This
patch makes sure we remove things before exiting.
Signed-off-by: NChris Mason <clm@fb.com>
Fixes: d1433deb
cc: stable@vger.kernel.org # 3.15+

cbd60aa7

05 9月, 2016 2 次提交

btrfs: do not decrease bytes_may_use when replaying extents · ed7a6948

由 Wang Xiaoguang 提交于 8月 26, 2016

When replaying extents, there is no need to update bytes_may_use
in btrfs_alloc_logged_file_extent(), otherwise it'll trigger a
WARN_ON about bytes_may_use.

Fixes: ("btrfs: update btrfs_space_info's bytes_may_use timely")
Signed-off-by: NWang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Reviewed-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ed7a6948

ceph: do not modify fi->frag in need_reset_readdir() · 0f5aa88a

由 Nicolas Iooss 提交于 8月 28, 2016

Commit f3c4ebe6 ("ceph: using hash value to compose dentry offset")
modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
comparison operator with an assignment one.

This looks like a typo which is reported by clang when building the
kernel with some warning flags:

    fs/ceph/dir.c:600:22: error: using the result of an assignment as a
    condition without parentheses [-Werror,-Wparentheses]
            } else if (fi->frag |= fpos_frag(new_pos)) {
                       ~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
    fs/ceph/dir.c:600:22: note: place parentheses around the assignment
    to silence this warning
            } else if (fi->frag |= fpos_frag(new_pos)) {
                                ^
                       (                             )
    fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
    assignment into an inequality comparison
            } else if (fi->frag |= fpos_frag(new_pos)) {
                                ^~
                                !=

Fixes: f3c4ebe6 ("ceph: using hash value to compose dentry offset")
Signed-off-by: NNicolas Iooss <nicolas.iooss_linux@m4x.org>
Signed-off-by: NIlya Dryomov <idryomov@gmail.com>

0f5aa88a

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功