提交 · 8aa6f1c5bd7043734fff1961a4795da9cc5d0f50 · openanolis / cloud-kernel

07 5月, 2014 40 次提交

f2fs: fix to truncate inline data in inode page when setattr · 8aa6f1c5

由 Chao Yu 提交于 4月 29, 2014

Previous we do not truncate inline data in inode page when setattr, so following
case could still read the inline data which has already truncated:

1.write inline data
2.ftruncate size to 0
3.ftruncate size to max inline data size
4.read from offset 0

This patch introduces truncate_inline_data() to fix this problem.

change log from v1:
 o fix a bug and do not truncate first page data after truncate inline data.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

8aa6f1c5

f2fs: readahead multi pages of directory for performance · 817202d9

由 Chao Yu 提交于 4月 28, 2014

We have no so such readahead mechanism in ->iterate() path as the one in
->read() path, it cause low performance when we read large directory.
This patch add readahead in f2fs_readdir() for better performance.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

817202d9

f2fs: set errno when f2fs_iget failed in recover_dentry · 5c1f9927

由 Chao Yu 提交于 4月 28, 2014

We should set the error number correctly when we fail in recover_dentry(), so
the recover flow could stop for the reason as error number shows instead of
continuing.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

5c1f9927

f2fs: consider fallocated space for SEEK_DATA · 7f7670fe

由 Jaegeuk Kim 提交于 4月 28, 2014

If an amount of data are allocated though fallocate and user writes a couple of
data among the space, f2fs should return the data offset made by user when
SEEK_DATA is requested.

For example, (N: NEW_ADDR by fallocate, X: NEW_ADDR by user)
1) fallocate 0 ~ 10MB
f -> N N N N N N N N N N N N ... N

2) write 4KB at 5MB offset
f -> N N N N N X N N N N N N ... N

3) SEEK_DATA from 0 should return 5MB offset

So, this patch adds a routine to search the first dirty page to handle that.
Then, the SEEK_DATA flow skips NEW_ADDR offsets until any dirty page is found.
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

7f7670fe

f2fs: return i_size if the hole is outside of i_size · fe369bc8

由 Jaegeuk Kim 提交于 4月 28, 2014

When SEEK_HOLE is requeted, it should return i_size if the hole position is
found outside of i_size.
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

fe369bc8

f2fs: introduce f2fs_seek_block to support SEEK_{DATA, HOLE} in llseek · 267378d4

由 Chao Yu 提交于 4月 23, 2014

In This patch we introduce f2fs_seek_block to support SEEK_{DATA,HOLE} of
lseek(2).

change log from v1:
 o fix bug when lseek from middle of page and fix wrong calculation of
PGOFS_OF_NEXT_DNODE macro.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

267378d4

f2fs: introduce help function {create,destroy}_flush_cmd_control · 2163d198

由 Gu Zheng 提交于 4月 27, 2014

Introduce help function {create,destroy}_flush_cmd_control to clean up
the create/destory flush merge operation.
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

2163d198

f2fs: introduce struct flush_cmd_control to wrap the flush_merge fields · a688b9d9

由 Gu Zheng 提交于 4月 27, 2014

Split the flush_merge fields from sm_i, and use the new struct flush_cmd_control
to wrap it, so that we can igonre these fileds if flush_merge is disable, and
it alse can the structs more neat.
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

a688b9d9

f2fs: introduce help macro ADDRS_PER_PAGE() · 6403eb1f

由 Chao Yu 提交于 4月 26, 2014

Introduce help macro ADDRS_PER_PAGE() to get the number of address pointers in
direct node or inode.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

6403eb1f

f2fs: submit bio at the reclaim path · 2aea39ec

由 Jaegeuk Kim 提交于 4月 24, 2014

If f2fs_write_data_page is called through the reclaim path, we should submit
the bio right away.

This patch resolves the following issue that Marc Dietrich reported.
"It took me a while to bisect a problem which causes my ARM (tegra2) netbook to
frequently stall for 5-10 seconds when I enable EXA acceleration (opentegra
experimental ddx)."
And this patch fixes that.
Reported-by: NMarc Dietrich <marvin24@gmx.de>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

2aea39ec

f2fs: return errors right after checking them · 916decbf

由 Jaegeuk Kim 提交于 4月 23, 2014

This patch adds two error conditions early in the setxattr operations.
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

916decbf

f2fs: pass flags field to setxattr functions · c02745ef

由 Jaegeuk Kim 提交于 4月 23, 2014

This patch passes the "flags" field to the low level setxattr functions
to use XATTR_REPLACE in the following patches.
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

c02745ef

f2fs: clean up long variable names · e1123268

由 Jaegeuk Kim 提交于 4月 23, 2014

This patch includes simple clean-ups to reduce unnecessary long variable names.
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

e1123268

f2fs: handle inline data independently in f2fs_bmap · 454ae7e5

由 Chao Yu 提交于 4月 22, 2014

We'd better handle inline data case independently in f2fs_bmap().
It can reduce our handling time in f2fs_bmap().
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

454ae7e5

f2fs: adjust free mem size to flush dentry blocks · 6fb03f3a

由 Jaegeuk Kim 提交于 4月 16, 2014

If so many dirty dentry blocks are cached, not reached to the flush condition,
we should fall into livelock in balance_dirty_pages.
So, let's consider the mem size for the condition.
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

6fb03f3a

f2fs: avoid BUG_ON when mouting corrupted image having garbage blocks · e8271fa3

由 Jaegeuk Kim 提交于 4月 18, 2014

If the disk has some garbage blocks, F2FS is able to face with BUG_ON when
recovering direct node blocks.
This patch detects the error case and avoids that prior to reaching BUG_ON.

Alexey Khoroshilov addressed the potential security issues as follows.
"An ability to trigger a BUG_ON assert by mounting a crafted image is
usually considered as a local denial of service [1-3]. As far as I
understand, the reason is that some kernel data may become inconsistent
that can lead to further problems.

[1] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-3353
[2] http://www.openwall.com/lists/oss-security/2011/06/24/4
[3] http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2928
etc."
Reported-by: NAndrey Tsyvarev <tsyvarev@ispras.ru>
Cc: Alexey Khoroshilov <khoroshilov@ispras.ru>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

e8271fa3

f2fs: add available_nids to fix handling max_nid correctly · 7ee0eeab

由 Jaegeuk Kim 提交于 4月 18, 2014

This patch introduces available_nids for alloc_nids() and fixes max_nid for
build_free_nids() and scan_nat_pages().
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Reviewed-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

7ee0eeab

f2fs: add static to get_max_meta_blks · b49ad51e

由 Fabian Frederick 提交于 4月 17, 2014

inline get_max_meta_blks is only used in checkpoint.c
Use standard static inline format.

Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

b49ad51e

f2fs: introduce raw_nat_from_node_info() to simplfy codes · 94dac22e

由 Chao Yu 提交于 4月 17, 2014

This patch introduce raw_nat_from_node_info() to simplfy some codes, and also
use exist function node_info_from_raw_nat() to do the same job.
Signed-off-by: NChao Yu <chao2.yu@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

94dac22e

f2fs: add the flush_merge handle in the remount flow · 876dc59e

由 Gu Zheng 提交于 4月 11, 2014

Add the *remount* handle of flush_merge option, so that the users
can enable flush_merge in the runtime, such as the underlying device
handles the cache_flush command relatively slowly.
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

876dc59e

f2fs: atomically set inode->i_flags in f2fs_set_inode_flags() · 8abfb36a

由 Zhang Zhen 提交于 4月 15, 2014

Use set_mask_bits() to atomically set i_flags instead of clearing out the
S_IMMUTABLE, S_APPEND, etc. flags and then setting them from the
FS_IMMUTABLE_FL, FS_APPEND_FL, etc. flags, since this opens up a race
where an immutable file has the immutable flag cleared for a brief
window of time.
Signed-off-by: NZhang Zhen <zhenzhang.zhang@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

8abfb36a

f2fs: make recover_inline_xattr() static · b156d542

由 Jingoo Han 提交于 4月 15, 2014

Make recover_inline_xattr() static, because this function is
used only in this file.
Signed-off-by: NJingoo Han <jg1.han@samsung.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

b156d542

f2fs: remove costly dirty_dir_inode operations · ed57c27f

由 Jaegeuk Kim 提交于 4月 15, 2014

This patch removes list opeations in handling dirty dir inodes.
Previously, F2FS traverses whole the list of dirty dir inodes to check whether
there is an existing inode or not, resulting in heavy CPU overheads.

So this patch removes such the traverse operations by adding FI_DIRTY_DIR to
indicate the inode lies on the list or not.
Through this simple flag, we can remove redundant operations gracefully.
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

ed57c27f

J
f2fs: fix to unlock f2fs_lock at the omitted error case · 15c6e3aa
由 Jaegeuk Kim 提交于 4月 16, 2014
```
If it occurs an error, we should call f2fs_unlock_op.
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>
```
15c6e3aa

f2fs: call redirty_page_for_writepage · 76f60268

由 Jaegeuk Kim 提交于 4月 15, 2014

This patch replace some general codes with redirty_page_for_writepage, which
can be enabled after consideration on additional procedure like counting dirty
pages appropriately.
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

76f60268

f2fs: avoid to conduct roll-forward due to the remained garbage blocks · 1e87a78d

由 Jaegeuk Kim 提交于 4月 15, 2014

The f2fs always scans the next chain of direct node blocks.
But some garbage blocks are able to be remained due to no discard support or
SSR triggers.
This occasionally wreaks recovering wrong inodes that were used or BUG_ONs
due to reallocating node ids as follows.

When mount this f2fs image:
http://linuxtesting.org/downloads/f2fs_fault_image.zip
BUG_ON is triggered in f2fs driver (messages below are generated on
kernel 3.13.2; for other kernels output is similar):

kernel BUG at fs/f2fs/node.c:215!
 Call Trace:
 [<ffffffffa032ebad>] recover_inode_page+0x1fd/0x3e0 [f2fs]
 [<ffffffff811446e7>] ? __lock_page+0x67/0x70
 [<ffffffff81089990>] ? autoremove_wake_function+0x50/0x50
 [<ffffffffa0337788>] recover_fsync_data+0x1398/0x15d0 [f2fs]
 [<ffffffff812b9e5c>] ? selinux_d_instantiate+0x1c/0x20
 [<ffffffff811cb20b>] ? d_instantiate+0x5b/0x80
 [<ffffffffa0321044>] f2fs_fill_super+0xb04/0xbf0 [f2fs]
 [<ffffffff811b861e>] ? mount_bdev+0x7e/0x210
 [<ffffffff811b8769>] mount_bdev+0x1c9/0x210
 [<ffffffffa0320540>] ? validate_superblock+0x210/0x210 [f2fs]
 [<ffffffffa031cf8d>] f2fs_mount+0x1d/0x30 [f2fs]
 [<ffffffff811b9497>] mount_fs+0x47/0x1c0
 [<ffffffff81166e00>] ? __alloc_percpu+0x10/0x20
 [<ffffffff811d4032>] vfs_kern_mount+0x72/0x110
 [<ffffffff811d6763>] do_mount+0x493/0x910
 [<ffffffff811615cb>] ? strndup_user+0x5b/0x80
 [<ffffffff811d6c70>] SyS_mount+0x90/0xe0
 [<ffffffff8166f8d9>] system_call_fastpath+0x16/0x1b

Found by Linux File System Verification project (linuxtesting.org).
Reported-by: NAndrey Tsyvarev <tsyvarev@ispras.ru>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

1e87a78d

f2fs: enable flush_merge only in f2fs is not read-only · b270ad6f

由 Gu Zheng 提交于 4月 11, 2014

Enable flush_merge only in f2fs is not read-only, so does the mount
option show.
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

b270ad6f

f2fs: use __GFP_ZERO to avoid appending set-NULL · 197d4647

由 Gu Zheng 提交于 4月 11, 2014

Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

197d4647

f2fs: put the bio when issue_flush completed · a4ed23f2

由 Gu Zheng 提交于 4月 11, 2014

Put the bio when the flush cmd issued, it also can fix the following
kmemleak:
unreferenced object 0xffff8800270c73c0 (size 200):
  comm "f2fs_flush-7:0", pid 27161, jiffies 4312127988 (age 988.503s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 40 07 81 19 01 88 ff ff  ........@.......
    01 00 00 00 00 00 00 f0 11 14 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff81559866>] kmemleak_alloc+0x72/0x96
    [<ffffffff81156f7e>] slab_post_alloc_hook+0x28/0x2a
    [<ffffffff811595b1>] kmem_cache_alloc+0xec/0x157
    [<ffffffff8111924d>] mempool_alloc_slab+0x15/0x17
    [<ffffffff81119513>] mempool_alloc+0x71/0x138
    [<ffffffff81193548>] bio_alloc_bioset+0x93/0x18c
    [<ffffffffa040f857>] issue_flush_thread+0x8d/0x145 [f2fs]
    [<ffffffff8107ac16>] kthread+0xba/0xc2
    [<ffffffff81571b2c>] ret_from_fork+0x7c/0xb0
    [<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
Signed-off-by: NJaegeuk Kim <jaegeuk.kim@samsung.com>

a4ed23f2

Merge branch 'akpm' (incoming from Andrew) · 38583f09

由 Linus Torvalds 提交于 5月 06, 2014

Merge misc fixes from Andrew Morton:
 "13 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  agp: info leak in agpioc_info_wrap()
  fs/affs/super.c: bugfix / double free
  fanotify: fix -EOVERFLOW with large files on 64-bit
  slub: use sysfs'es release mechanism for kmem_cache
  revert "mm: vmscan: do not swap anon pages just because free+file is low"
  autofs: fix lockref lookup
  mm: filemap: update find_get_pages_tag() to deal with shadow entries
  mm/compaction: make isolate_freepages start at pageblock boundary
  MAINTAINERS: zswap/zbud: change maintainer email address
  mm/page-writeback.c: fix divide by zero in pos_ratio_polynom
  hugetlb: ensure hugepage access is denied if hugepages are not supported
  slub: fix memcg_propagate_slab_attrs
  drivers/rtc/rtc-pcf8523.c: fix month definition

38583f09

agp: info leak in agpioc_info_wrap() · 3ca9e5d3

由 Dan Carpenter 提交于 5月 06, 2014

On 64 bit systems the agp_info struct has a 4 byte hole between
->agp_mode and ->aper_base.  We need to clear it to avoid disclosing
stack information to userspace.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ca9e5d3

fs/affs/super.c: bugfix / double free · d353efd0

由 Fabian Frederick 提交于 5月 06, 2014

Commit 842a859d ("affs: use ->kill_sb() to simplify ->put_super()
and failure exits of ->mount()") adds .kill_sb which frees sbi but
doesn't remove sbi free in case of parse_options error causing double
free+random crash.
Signed-off-by: NFabian Frederick <fabf@skynet.be>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>	[3.14.x]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d353efd0

fanotify: fix -EOVERFLOW with large files on 64-bit · 1e2ee49f

由 Will Woods 提交于 5月 06, 2014

On 64-bit systems, O_LARGEFILE is automatically added to flags inside
the open() syscall (also openat(), blkdev_open(), etc). Userspace
therefore defines O_LARGEFILE to be 0 - you can use it, but it's a
no-op. Everything should be O_LARGEFILE by default.

But: when fanotify does create_fd() it uses dentry_open(), which skips
all that. And userspace can't set O_LARGEFILE in fanotify_init()
because it's defined to 0. So if fanotify gets an event regarding a
large file, the read() will just fail with -EOVERFLOW.

This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit
systems, using the same test as open()/openat()/etc.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821Signed-off-by: NWill Woods <wwoods@redhat.com>
Acked-by: NEric Paris <eparis@redhat.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1e2ee49f

slub: use sysfs'es release mechanism for kmem_cache · 41a21285

由 Christoph Lameter 提交于 5月 06, 2014

debugobjects warning during netfilter exit:

    ------------[ cut here ]------------
    WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0()
    ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
    Modules linked in:
    CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G        W 3.11.0-next-20130906-sasha #3984
    Workqueue: netns cleanup_net
    Call Trace:
      dump_stack+0x52/0x87
      warn_slowpath_common+0x8c/0xc0
      warn_slowpath_fmt+0x46/0x50
      debug_print_object+0x8d/0xb0
      __debug_check_no_obj_freed+0xa5/0x220
      debug_check_no_obj_freed+0x15/0x20
      kmem_cache_free+0x197/0x340
      kmem_cache_destroy+0x86/0xe0
      nf_conntrack_cleanup_net_list+0x131/0x170
      nf_conntrack_pernet_exit+0x5d/0x70
      ops_exit_list+0x5e/0x70
      cleanup_net+0xfb/0x1c0
      process_one_work+0x338/0x550
      worker_thread+0x215/0x350
      kthread+0xe7/0xf0
      ret_from_fork+0x7c/0xb0

Also during dcookie cleanup:

    WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0()
    ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
    Modules linked in:
    CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408
    Call Trace:
      dump_stack (lib/dump_stack.c:52)
      warn_slowpath_common (kernel/panic.c:430)
      warn_slowpath_fmt (kernel/panic.c:445)
      debug_print_object (lib/debugobjects.c:262)
      __debug_check_no_obj_freed (lib/debugobjects.c:697)
      debug_check_no_obj_freed (lib/debugobjects.c:726)
      kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717)
      kmem_cache_destroy (mm/slab_common.c:363)
      dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343)
      event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153)
      __fput (fs/file_table.c:217)
      ____fput (fs/file_table.c:253)
      task_work_run (kernel/task_work.c:125 (discriminator 1))
      do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751)
      int_signal (arch/x86/kernel/entry_64.S:807)

Sysfs has a release mechanism.  Use that to release the kmem_cache
structure if CONFIG_SYSFS is enabled.

Only slub is changed - slab currently only supports /proc/slabinfo and
not /sys/kernel/slab/*.  We talked about adding that and someone was
working on it.

[akpm@linux-foundation.org: fix CONFIG_SYSFS=n build]
[akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more]
Signed-off-by: NChristoph Lameter <cl@linux.com>
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Tested-by: NSasha Levin <sasha.levin@oracle.com>
Acked-by: NGreg KH <greg@kroah.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

41a21285

revert "mm: vmscan: do not swap anon pages just because free+file is low" · 62376251

由 Johannes Weiner 提交于 5月 06, 2014

This reverts commit 0bf1457f ("mm: vmscan: do not swap anon pages
just because free+file is low") because it introduced a regression in
mostly-anonymous workloads, where reclaim would become ineffective and
trap every allocating task in direct reclaim.

The problem is that there is a runaway feedback loop in the scan balance
between file and anon, where the balance tips heavily towards a tiny
thrashing file LRU and anonymous pages are no longer being looked at.
The commit in question removed the safe guard that would detect such
situations and respond with forced anonymous reclaim.

This commit was part of a series to fix premature swapping in loads with
relatively little cache, and while it made a small difference, the cure
is obviously worse than the disease.  Revert it.
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reported-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NChristian Borntraeger <borntraeger@de.ibm.com>
Acked-by: NRafael Aquini <aquini@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: <stable@kernel.org>		[3.12+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

62376251

autofs: fix lockref lookup · 6b6751f7

由 Ian Kent 提交于 5月 06, 2014

autofs needs to be able to see private data dentry flags for its dentrys
that are being created but not yet hashed and for its dentrys that have
been rmdir()ed but not yet freed.  It needs to do this so it can block
processes in these states until a status has been returned to indicate
the given operation is complete.

It does this by keeping two lists, active and expring, of dentrys in
this state and uses ->d_release() to keep them stable while it checks
the reference count to determine if they should be used.

But with the recent lockref changes dentrys being freed sometimes don't
transition to a reference count of 0 before being freed so autofs can
occassionally use a dentry that is invalid which can lead to a panic.
Signed-off-by: NIan Kent <raven@themaw.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6b6751f7

mm: filemap: update find_get_pages_tag() to deal with shadow entries · 139b6a6f

由 Johannes Weiner 提交于 5月 06, 2014

Dave Jones reports the following crash when find_get_pages_tag() runs
into an exceptional entry:

  kernel BUG at mm/filemap.c:1347!
  RIP: find_get_pages_tag+0x1cb/0x220
  Call Trace:
    find_get_pages_tag+0x36/0x220
    pagevec_lookup_tag+0x21/0x30
    filemap_fdatawait_range+0xbe/0x1e0
    filemap_fdatawait+0x27/0x30
    sync_inodes_sb+0x204/0x2a0
    sync_inodes_one_sb+0x19/0x20
    iterate_supers+0xb2/0x110
    sys_sync+0x44/0xb0
    ia32_do_call+0x13/0x13

  1343                         /*
  1344                          * This function is never used on a shmem/tmpfs
  1345                          * mapping, so a swap entry won't be found here.
  1346                          */
  1347                         BUG();

After commit 0cd6144a ("mm + fs: prepare for non-page entries in
page cache radix trees") this comment and BUG() are out of date because
exceptional entries can now appear in all mappings - as shadows of
recently evicted pages.

However, as Hugh Dickins notes,

  "it is truly surprising for a PAGECACHE_TAG_WRITEBACK (and probably
   any other PAGECACHE_TAG_*) to appear on an exceptional entry.

   I expect it comes down to an occasional race in RCU lookup of the
   radix_tree: lacking absolute synchronization, we might sometimes
   catch an exceptional entry, with the tag which really belongs with
   the unexceptional entry which was there an instant before."

And indeed, not only is the tree walk lockless, the tags are also read
in chunks, one radix tree node at a time.  There is plenty of time for
page reclaim to swoop in and replace a page that was already looked up
as tagged with a shadow entry.

Remove the BUG() and update the comment.  While reviewing all other
lookup sites for whether they properly deal with shadow entries of
evicted pages, update all the comments and fix memcg file charge moving
to not miss shmem/tmpfs swapcache pages.

Fixes: 0cd6144a ("mm + fs: prepare for non-page entries in page cache radix trees")
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Reported-by: NDave Jones <davej@redhat.com>
Acked-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

139b6a6f

mm/compaction: make isolate_freepages start at pageblock boundary · 49e068f0

由 Vlastimil Babka 提交于 5月 06, 2014

The compaction freepage scanner implementation in isolate_freepages()
starts by taking the current cc->free_pfn value as the first pfn.  In a
for loop, it scans from this first pfn to the end of the pageblock, and
then subtracts pageblock_nr_pages from the first pfn to obtain the first
pfn for the next for loop iteration.

This means that when cc->free_pfn starts at offset X rather than being
aligned on pageblock boundary, the scanner will start at offset X in all
scanned pageblock, ignoring potentially many free pages.  Currently this
can happen when

 a) zone's end pfn is not pageblock aligned, or

 b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE
    enabled and a hole spanning the beginning of a pageblock

This patch fixes the problem by aligning the initial pfn in
isolate_freepages() to pageblock boundary.  This also permits replacing
the end-of-pageblock alignment within the for loop with a simple
pageblock_nr_pages increment.
Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
Reported-by: NHeesub Shin <heesub.shin@samsung.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: NRik van Riel <riel@redhat.com>
Cc: Dongjun Shin <d.j.shin@samsung.com>
Cc: Sunghwan Yun <sunghwan.yun@samsung.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

49e068f0

MAINTAINERS: zswap/zbud: change maintainer email address · 0e3b7e54

由 Seth Jennings 提交于 5月 06, 2014

sjenning@linux.vnet.ibm.com is no longer a viable entity.
Signed-off-by: NSeth Jennings <sjennings@variantweb.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0e3b7e54

mm/page-writeback.c: fix divide by zero in pos_ratio_polynom · d5c9fde3

由 Rik van Riel 提交于 5月 06, 2014

It is possible for "limit - setpoint + 1" to equal zero, after getting
truncated to a 32 bit variable, and resulting in a divide by zero error.

Using the fully 64 bit divide functions avoids this problem.  It also
will cause pos_ratio_polynom() to return the correct value when
(setpoint - limit) exceeds 2^32.

Also uninline pos_ratio_polynom, at Andrew's request.
Signed-off-by: NRik van Riel <riel@redhat.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Luiz Capitulino <lcapitulino@redhat.com>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d5c9fde3

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功