Commits · 5d2b42ede71c9da0bf4248fd2d409918fb065b5f · openanolis / cloud-kernel

30 Aug, 2016 · 13 commits

f2fs: fix a bug when using namehash to locate dentry bucket · 5d2b42ed

Committed by Shuoran Liu on Aug 25, 2016

In the following scenario:

1) we don't have the key and are doing a lookup for an encrypted file, and
2) the encrypted filename is a big name,

we should use fname->hash as the name hash value instead of the value
calculated from fname->disk_name, because in that case
fname->disk_name is empty.
Signed-off-by: Shuoran Liu <liushuoran@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
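To make the bucket-selection rule concrete, here is a minimal C sketch. The struct `fname_sketch`, its `have_key`/`big_name` flags and `toy_dentry_hash()` are hypothetical stand-ins (the toy FNV-1a hash is not the hash f2fs actually uses); the only point illustrated is that a no-key lookup of a big name must use the precomputed hash, because `disk_name` is empty.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the lookup name f2fs works with. */
struct fname_sketch {
	int have_key;                   /* encryption key available? */
	int big_name;                   /* encrypted name is a "big name" */
	uint32_t hash;                  /* hash carried by the no-key name */
	const unsigned char *disk_name; /* empty (NULL/0) in the no-key case */
	size_t disk_len;
};

/* Toy FNV-1a hash standing in for the real f2fs dentry hash. */
static uint32_t toy_dentry_hash(const unsigned char *name, size_t len)
{
	uint32_t h = 2166136261u;

	for (size_t i = 0; i < len; i++) {
		h ^= name[i];
		h *= 16777619u;
	}
	return h;
}

/* Choose the hash used to pick the dentry bucket. */
static uint32_t lookup_namehash(const struct fname_sketch *fname)
{
	/* No key and a big name: disk_name is empty, so use fname->hash. */
	if (!fname->have_key && fname->big_name)
		return fname->hash;
	return toy_dentry_hash(fname->disk_name, fname->disk_len);
}
```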

f2fs: fix to preallocate block only aligned to 4K · dfd02e4d

Committed by Chao Yu on Aug 20, 2016

In write_begin(), we skip checking the dnode block for preallocation
when the whole block is to be updated, since its block was already
preallocated in f2fs_preallocate_blocks(). For a partially updated
block, write_begin() still locks its node and does the preallocation
itself, so f2fs_preallocate_blocks() should not preallocate that block.

Previously, however, the number of blocks to preallocate was
calculated incorrectly; fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
[Jaegeuk Kim: fix a bug]
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
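The block arithmetic can be sketched as follows, assuming a 4 KiB block size: only blocks fully covered by `[pos, pos + count)` are preallocated up front, while partial head and tail blocks are left to write_begin(). The helper name `prealloc_range()` is hypothetical; this is not the actual f2fs_preallocate_blocks() code.

```c
#include <stdint.h>

#define BLKSIZE 4096ULL   /* assumed f2fs block size */

/* Compute the run of blocks fully covered by a write of `count` bytes at
 * `pos`. Partial head/tail blocks are excluded; write_begin() handles them. */
static void prealloc_range(uint64_t pos, uint64_t count,
			   uint64_t *first_blk, uint64_t *nr_blks)
{
	uint64_t start = (pos + BLKSIZE - 1) / BLKSIZE;  /* round up */
	uint64_t end = (pos + count) / BLKSIZE;          /* round down */

	*first_blk = start;
	*nr_blks = end > start ? end - start : 0;
}
```

For example, an 8192-byte write at offset 100 fully covers only block 1, so one block is preallocated and blocks 0 and 2 are left to write_begin().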

f2fs: fix non static symbol warning · 6a7a3aed

Committed by Wei Yongjun on Aug 23, 2016

Fixes the following sparse warning:

fs/f2fs/data.c:969:12: warning:
 symbol 'f2fs_grab_bio' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove unnecessary initialization · 69494229

Committed by Sheng Yong on Aug 23, 2016

`flags` is used to save the value from userspace, so there is no need to
initialize it; besides, FS_FL_USER_VISIBLE is the mask for getflags.
Signed-off-by: Sheng Yong <shengyong1@huawei.com>
Acked-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: remove redundant judgement condition in available_free_memory · 5f8eaf1f

Committed by Chao Yu on Aug 21, 2016

In available_free_memory(), there are two identical conditions used for
checking NAT excess; remove one of them.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: check return value of write_checkpoint during fstrim · e9328353

Committed by Chao Yu on Aug 21, 2016

During fstrim, if any of the multiple write_checkpoint() calls fails,
break off and return the error number to the caller.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
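Since fstrim issues one checkpoint per batch, the error handling amounts to stopping at the first failing batch. A minimal sketch, in which `trim_in_batches()`, the `checkpoint_fn` callback type and the demo callback are all hypothetical stand-ins for write_checkpoint() and its caller:

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for write_checkpoint(): 0 on success, -errno on error. */
typedef int (*checkpoint_fn)(uint64_t first_seg, uint64_t last_seg, void *ctx);

/* Trim segments [start, end] in batches of `batch` (> 0) segments, writing
 * one checkpoint per batch and stopping at the first failure.
 * Assumes end + batch does not overflow. */
static int trim_in_batches(uint64_t start, uint64_t end, uint64_t batch,
			   checkpoint_fn cp, void *ctx)
{
	for (uint64_t cur = start; cur <= end; cur += batch) {
		uint64_t last = cur + batch - 1;
		int err;

		if (last > end)
			last = end;
		err = cp(cur, last, ctx);
		if (err)
			return err;   /* break off and report the error */
	}
	return 0;
}

/* Demo callback: counts calls and fails with -EIO from segment fail_at on. */
struct demo_ctx { uint64_t fail_at; int calls; };

static int demo_checkpoint(uint64_t first_seg, uint64_t last_seg, void *p)
{
	struct demo_ctx *c = p;

	(void)last_seg;
	c->calls++;
	return first_seg >= c->fail_at ? -EIO : 0;
}
```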

f2fs: fix to do f2fs_balance_fs in f2fs_map_blocks correctly · 58383bef

Committed by Chao Yu on Aug 20, 2016

If we preallocate blocks with f2fs_reserve_blocks() in f2fs_map_blocks(),
we should call f2fs_balance_fs() to check and reclaim space; fix it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: avoid unneeded loop in build_sit_entries · d600af23

Committed by Chao Yu on Aug 19, 2016

When building each SIT entry in the cache, we first load it from the
SIT page and then check all entries in the SIT journal; if an updated
entry is found in the journal, the cached entry is overwritten with the
journaled one.

Actually, most of these checks are unneeded, since we only need to
update the cached entries with the journaled entries in a batch, so
change the flow as below for efficiency:
1. load all SIT entries into the cache from the SIT pages;
2. update the SIT entries from the journal.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
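A rough C sketch of the two-phase flow, using hypothetical simplified types (`sit_sketch` keeps only a valid-block count). Phase 2 touches only the journaled segments, so the cost drops from roughly one journal scan per segment to a single pass over each array.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified SIT entry: only the valid-block count (hypothetical layout). */
struct sit_sketch { uint16_t vblocks; };

/* One journaled update: a segment number plus its newer entry. */
struct sit_journal_sketch { uint32_t segno; struct sit_sketch entry; };

static void build_sit_cache(struct sit_sketch *cache,
			    const struct sit_sketch *pages, size_t nsegs,
			    const struct sit_journal_sketch *jrnl, size_t njrnl)
{
	/* Phase 1: load every entry from the on-disk SIT pages. */
	for (size_t i = 0; i < nsegs; i++)
		cache[i] = pages[i];

	/* Phase 2: overwrite only the journaled entries, in one batch. */
	for (size_t j = 0; j < njrnl; j++)
		if (jrnl[j].segno < nsegs)
			cache[jrnl[j].segno] = jrnl[j].entry;
}
```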

f2fs: clean up foreground GC flow · 43ced84e

Committed by Chao Yu on Aug 19, 2016

To clean up the foreground GC code, this patch checks the valid block
count of a GCed section directly, instead of checking the count in each
segment of the section one by one.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: set dirty state for filesystem only when updating meta data · 7c4abcbe

Committed by Chao Yu on Aug 18, 2016

We don't guarantee the integrity of user data after a checkpoint, since
we only guarantee metadata integrity for filesystem consistency.

For this reason, we only need to mark the fs dirty when metadata is
updated, so that we can skip writing a checkpoint when only non-metadata
has been updated.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
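The policy can be modelled with a single dirty flag. The types and function names below are hypothetical and only illustrate that data-only updates leave the next checkpoint skippable.

```c
#include <stdbool.h>

/* Hypothetical superblock state; not f2fs's struct f2fs_sb_info. */
struct sb_sketch {
	bool dirty;        /* metadata changed since the last checkpoint */
	int checkpoints;   /* number of checkpoints written so far */
};

enum update_kind { UPDATE_DATA, UPDATE_META };

static void note_update(struct sb_sketch *sb, enum update_kind kind)
{
	/* Only metadata updates need a checkpoint for fs consistency. */
	if (kind == UPDATE_META)
		sb->dirty = true;
}

static void maybe_checkpoint(struct sb_sketch *sb)
{
	if (!sb->dirty)
		return;            /* nothing the checkpoint must protect */
	sb->checkpoints++;
	sb->dirty = false;
}
```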

f2fs: skip new checkpoint when doing fstrim without fs change · 58cce381

Committed by Yunlei He on Aug 18, 2016

This patch allows fstrim to run without a checkpoint if there has been
no fs change.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: add discard info to sys entry of f2fs status · f83a2584

Committed by Yunlei He on Aug 18, 2016

This patch adds the discard block count to the sys entry of f2fs status.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: reduce batch size of fstrim · 2d9e9c32

Committed by Jaegeuk Kim on Aug 11, 2016

This is to reduce the batch size of fstrim to avoid long latency.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

25 Aug, 2016 · 2 commits

f2fs: do not use discard_map for hard disks · 3e025740

Committed by Jaegeuk Kim on Aug 02, 2016

We don't need to keep discard_map if the disk does not support the discard command.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

f2fs: not allow to write illegal blkaddr · bb413d6a

Committed by Yunlei He on Jul 28, 2016

We came across the following error:

[build_nat_area_bitmap:1710] nid[0x    1718] addr[0x         1c18ddc] ino[0x    1718]
[build_nat_area_bitmap:1710] nid[0x    1719] addr[0x         1c193d5] ino[0x    1719]
[build_nat_area_bitmap:1710] nid[0x    171a] addr[0x         1c1736e] ino[0x    171a]
[build_nat_area_bitmap:1710] nid[0x    171b] addr[0x        58b3ee8f] ino[0x815f92ed]
[build_nat_area_bitmap:1710] nid[0x    171c] addr[0x         fcdc94b] ino[0x49366377]
[build_nat_area_bitmap:1710] nid[0x    171d] addr[0x        7cd2facf] ino[0xb3c55300]
[build_nat_area_bitmap:1710] nid[0x    171e] addr[0x        bd4e25d0] ino[0x77c34c09]

... ...

[build_nat_area_bitmap:1710] nid[0x    1718] addr[0x         1c18ddc] ino[0x    1718]
[build_nat_area_bitmap:1710] nid[0x    1719] addr[0x         1c193d5] ino[0x    1719]
[build_nat_area_bitmap:1710] nid[0x    171a] addr[0x         1c1736e] ino[0x    171a]
[build_nat_area_bitmap:1710] nid[0x    171b] addr[0x        58b3ee8f] ino[0x815f92ed]
[build_nat_area_bitmap:1710] nid[0x    171c] addr[0x         fcdc94b] ino[0x49366377]
[build_nat_area_bitmap:1710] nid[0x    171d] addr[0x        7cd2facf] ino[0xb3c55300]
[build_nat_area_bitmap:1710] nid[0x    171e] addr[0x        bd4e25d0] ino[0x77c34c09]

A NAT block may be overwritten (stepped on) by a data block, so this
patch forbids writing if the blkaddr is illegal.
Signed-off-by: Yunlei He <heyunlei@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
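A minimal sketch of such a guard, assuming the legal target range for these writes is a half-open interval `[main_blkaddr, max_blkaddr)`. The struct and function names are hypothetical, and the real bounds come from the f2fs on-disk layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout bounds; illustrative values only. */
struct layout_sketch {
	uint32_t main_blkaddr;   /* first block a data write may target */
	uint32_t max_blkaddr;    /* one past the last valid block */
};

/* Refuse any target outside the legal range, e.g. an address that would
 * land on a NAT block. */
static bool blkaddr_writable(const struct layout_sketch *l, uint32_t blkaddr)
{
	return blkaddr >= l->main_blkaddr && blkaddr < l->max_blkaddr;
}
```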

19 Aug, 2016 · 4 commits

f2fs: avoid potential deadlock in f2fs_move_file_range · 20a3d61d

Committed by Chao Yu on Aug 04, 2016

Thread A			Thread B
- inode_lock fileA
				- inode_lock fileB
				 - inode_lock fileA
 - inode_lock fileB

We may encounter the potential deadlock above when moving file ranges
concurrently. This patch fixes the issue by using inode_trylock()
instead.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
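The back-off pattern can be sketched in user space with POSIX mutexes standing in for inode locks (an assumption of this sketch; the kernel uses inode_lock()/inode_trylock()). The second lock is only tried, so the ABBA cycle shown above cannot form. Returning -EBUSY on contention is this sketch's choice.

```c
#include <errno.h>
#include <pthread.h>

/* Lock two "inodes" without risking ABBA deadlock: take the first normally,
 * then only try the second; on contention, release and back off. */
static int lock_two_sketch(pthread_mutex_t *src, pthread_mutex_t *dst)
{
	pthread_mutex_lock(src);
	if (src != dst && pthread_mutex_trylock(dst) != 0) {
		pthread_mutex_unlock(src);
		return -EBUSY;
	}
	return 0;
}

static void unlock_two_sketch(pthread_mutex_t *src, pthread_mutex_t *dst)
{
	if (src != dst)
		pthread_mutex_unlock(dst);
	pthread_mutex_unlock(src);
}
```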

f2fs: allow copying file range only in between regular files · fe8494bf

Committed by Chao Yu on Aug 04, 2016

Allow copying a data range only if both input files are regular files;
otherwise, deny it.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>

Revert "f2fs: move i_size_write in f2fs_write_end" · 3024c9a1

Committed by Chao Yu on Aug 06, 2016

This reverts commit a2ee0a30.

When testing with generic/032 of the xfstests suite, the following
failure is reported:

generic/032 8s ... [failed, exit status 1] - output mismatch (see results/generic/032.out.bad)
    --- tests/generic/032.out	2015-01-11 16:52:27.643681072 +0800
    +++ results/generic/032.out.bad	2016-08-06 13:44:43.861330500 +0800
    @@ -1,5 +1,5 @@
     QA output created by 032
    -100 iterations
    -0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
    -*
    -0100000
    +1: [768..775]: unwritten
    +Unwritten extents found!
    ...
    (Run 'diff -u tests/generic/032.out results/generic/032.out.bad'  to see the entire diff)
Ran: generic/032
Failures: generic/032
Failed 1 of 1 tests

In write_end(), we should update the inode's i_size before unlocking the
page; otherwise, we will lose newly written data in the following race:

Thread A			Thread B
- write_end
 - unlock page
				- writepages
				 - lock_page
				  - writepage
				  if page is out-of-range of file size,
				  we will skip writing the page.
 - update i_size
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
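A toy model of the ordering, assuming 4 KiB pages and a writeback that skips any page starting at or beyond i_size. The types and helpers are hypothetical simplifications, not f2fs code.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SZ 4096ULL   /* assumed page size */

struct inode_sketch { uint64_t i_size; };

/* Writeback side: a page starting at or beyond EOF is skipped. */
static bool writepage_would_write(const struct inode_sketch *inode,
				  uint64_t index)
{
	return index * PAGE_SZ < inode->i_size;
}

/* Fixed write_end order: extend i_size first, then unlock the page. Returns
 * what a writeback racing in right after the unlock would decide. */
static bool write_end_then_race(struct inode_sketch *inode, uint64_t index,
				uint64_t new_size)
{
	if (new_size > inode->i_size)
		inode->i_size = new_size;              /* 1. update i_size */
	/* 2. unlock_page() would happen here. */
	return writepage_would_write(inode, index);    /* 3. racing writepage */
}
```

With the reverted (buggy) order, the racing writepage runs while i_size still holds the old value, so writepage_would_write() returns false and the freshly written page is skipped.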

Revert "f2fs: use percpu_rw_semaphore" · b873b798

Committed by Jaegeuk Kim on Aug 04, 2016

LKP reported -36.3% regression of fsmark.files_per_sec due to this patch.
I've confirmed that fxmark [1] also shows a slight regression for DWAL.

[1] https://github.com/sslab-gatech/fxmark

This reverts commit ec795418.

17 Aug, 2016 · 12 commits

xfs: remove OWN_AG rmap when allocating a block from the AGFL · a03f1a66

Committed by Darrick J. Wong on Aug 17, 2016

When we're really tight on space, xfs_alloc_ag_vextent_small() can
allocate a block from the AGFL and give it to the caller.  Since the
caller is never the AGFL-fixing method, we must remove the OWN_AG
reverse mapping because it will clash with whatever rmap the caller
wants to set up.  This bug was discovered by running generic/299
repeatedly.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: (re-)implement FIEMAP_FLAG_XATTR · 1d4795e7

Committed by Christoph Hellwig on Aug 17, 2016

Use a special read-only iomap_ops implementation to support fiemap on
the attr fork.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: simplify xfs_file_iomap_begin · b95a2127

Committed by Christoph Hellwig on Aug 17, 2016

We'll never get nimap == 0 for a successful return from xfs_bmapi_read,
so don't try to handle it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

iomap: mark ->iomap_end as optional · f20ac7ab

Committed by Christoph Hellwig on Aug 17, 2016

No need to implement it for read-only mappings.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
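Making ->iomap_end optional just means the caller checks the pointer before calling it. A cut-down sketch with hypothetical simplified signatures (the real struct iomap_ops callbacks take more arguments, such as the inode and a struct iomap):

```c
#include <stddef.h>

/* Cut-down, hypothetical version of an iomap ops table. */
struct iomap_ops_sketch {
	int (*iomap_begin)(long pos, long len, void *ctx);
	/* Optional: may be NULL, e.g. for read-only mappings. */
	int (*iomap_end)(long pos, long len, long written, void *ctx);
};

static int apply_sketch(const struct iomap_ops_sketch *ops, long pos, long len,
			void *ctx)
{
	long written;
	int ret = ops->iomap_begin(pos, len, ctx);

	if (ret)
		return ret;
	written = len;   /* the actor would process the mapping here */
	if (ops->iomap_end)
		ret = ops->iomap_end(pos, len, written, ctx);
	return ret;
}

/* Demo read-only user: counts begin calls and provides no ->iomap_end. */
static int ro_begin(long pos, long len, void *ctx)
{
	(void)pos;
	(void)len;
	(*(int *)ctx)++;
	return 0;
}

static const struct iomap_ops_sketch ro_ops = { ro_begin, NULL };
```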

iomap: prepare iomap_fiemap for attribute mappings · ac2dc058

Committed by Dave Chinner on Aug 17, 2016

By passing through an -ENOENT, similar to the old XFS implementation of
FIEMAP_FLAG_XATTR.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

iomap: fiemap should honor the FIEMAP_FLAG_SYNC flag · 8896b8f6

Committed by Dave Chinner on Aug 17, 2016

The flag is checked as supported, but then we do an unconditional
sync of the file, regardless of whether the flag is set or not. Make
the sync conditional on having the FIEMAP_FLAG_SYNC flag set.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
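A sketch of the corrected behaviour: the flush runs only when FIEMAP_FLAG_SYNC is set. The flush callback stands in for the kernel's writeback call and, like the function names, is hypothetical; the fallback flag value matches <linux/fiemap.h>.

```c
#include <stdint.h>

#ifndef FIEMAP_FLAG_SYNC
#define FIEMAP_FLAG_SYNC 0x00000001   /* "sync file data before map" */
#endif

/* Hypothetical hook standing in for the kernel's writeback-and-wait call. */
typedef int (*flush_fn)(void *ctx);

static int fiemap_prepare_sketch(uint32_t fi_flags, flush_fn flush, void *ctx)
{
	/* Flush dirty data only when the caller asked for it. */
	if (fi_flags & FIEMAP_FLAG_SYNC)
		return flush(ctx);
	return 0;
}

/* Demo flush that just counts how often it ran. */
static int counting_flush(void *ctx)
{
	(*(int *)ctx)++;
	return 0;
}
```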

iomap: remove superflous pagefault_disable from iomap_write_actor · 274c8874

Committed by Christoph Hellwig on Aug 17, 2016

iov_iter_copy_from_user_atomic disables page faults internally, no need to
do it around the call.  This also brings the iomap code in line with
the original filemap version.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

iomap: remove superflous mark_page_accessed from iomap_write_actor · 97dd8c9e

Committed by Christoph Hellwig on Aug 17, 2016

This catches up with commit 2457ae ("mm: non-atomically mark page
accessed during page cache allocation where possible"), which
moved the initial access marking into the pagecache allocator.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: store rmapbt block count in the AGF · f32866fd

Committed by Darrick J. Wong on Aug 17, 2016

Track the number of blocks used for the rmapbt in the AGF.  When we
get to the AG reservation code we need this counter to quickly
make our reservation during mount.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: don't invalidate whole file on DAX read/write · 8b2180b3

Committed by Dave Chinner on Aug 17, 2016

When we do DAX IO, we try to invalidate the entire page cache held
on the file. This is incorrect as it will trash the entire mapping
tree that now tracks dirty state in exceptional entries in the radix
tree slots.

What we are trying to do is remove cached pages (e.g. from reads
into holes) that sit in the radix tree over the range we are about
to write to. Hence we should just limit the invalidation to the
range we are about to overwrite.
Reported-by: Jan Kara <jack@suse.cz>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <david@fromorbit.com>
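The narrowed range is plain page arithmetic. A sketch assuming 4 KiB pages; the resulting pair is the kind of inclusive [first, last] page-index range one would pass to invalidate_inode_pages2_range(), though the helper itself is hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12   /* assumed 4 KiB pages */

/* Page-cache indices overlapped by an I/O of `count` (> 0) bytes at `pos`;
 * only this range, not the whole mapping, needs invalidating. */
static void dax_invalidate_range(uint64_t pos, uint64_t count,
				 uint64_t *first, uint64_t *last)
{
	uint64_t end = pos + count - 1;   /* last byte touched */

	*first = pos >> PAGE_SHIFT_SKETCH;
	*last = end >> PAGE_SHIFT_SKETCH;
}
```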

xfs: fix bogus space reservation in xfs_iomap_write_allocate · 0af32fb4

Committed by Christoph Hellwig on Aug 17, 2016

The space reservation was added without an explanation in commit

    "Add error reporting calls in error paths that return EFSCORRUPTED"

back in 2003.  There is no reason to reserve disk blocks in the
transaction when allocating blocks for delalloc space as we already
reserved the space when creating the delalloc extent.

With this fix we stop running out of the reserved pool in
generic/229, which has happened for a long time with small blocksize
file systems, and has increased in severity with the new buffered
write path.

[ dchinner: we still need to pass the block reservation into
  xfs_bmapi_write() to ensure we don't deadlock during AG selection.
  See commit dbd5c8c9 ("xfs: pass total block res. as total
  xfs_bmapi_write() parameter") for more details on why this is
  necessary. ]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

xfs: don't assert fail on non-async buffers on ioacct decrement · 4dd3fd71

Committed by Brian Foster on Aug 17, 2016

The buffer I/O accounting mechanism tracks async buffers under I/O. As
an optimization, the buffer I/O count is incremented only once on the
first async I/O for a given hold cycle of a buffer and decremented once
the buffer is released to the LRU (or freed).

xfs_buf_ioacct_dec() has an ASSERT() check for an XBF_ASYNC buffer, but
we have one or two corner cases where a buffer can be submitted for I/O
multiple times via different methods in a single hold cycle. If an async
I/O occurs first, the I/O count is incremented. If a sync I/O occurs
before the hold count drops, XBF_ASYNC is cleared by the time the I/O
count is decremented.

Remove the async assert check from xfs_buf_ioacct_dec() as this is a
perfectly valid scenario. For the purposes of I/O accounting, we really
only care about the buffer async state at I/O submission time.
Discovered-and-analyzed-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
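A minimal model of accounting that no longer depends on the async flag at decrement time. The struct and its `in_flight` flag are hypothetical simplifications of the XFS buffer state; the point is that the increment happens once per hold cycle on the first async submission, and the decrement only checks whether the buffer was counted.

```c
#include <stdbool.h>

/* Hypothetical buffer state. */
struct buf_sketch {
	bool async;       /* like XBF_ASYNC; may change between submissions */
	bool in_flight;   /* set when this hold cycle has been counted */
};

static void ioacct_inc(struct buf_sketch *bp, int *count)
{
	/* Count once per hold cycle, on the first async submission. */
	if (bp->async && !bp->in_flight) {
		bp->in_flight = true;
		(*count)++;
	}
}

static void ioacct_dec(struct buf_sketch *bp, int *count)
{
	/* No assert on bp->async: a later sync I/O may have cleared it. */
	if (bp->in_flight) {
		bp->in_flight = false;
		(*count)--;
	}
}
```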

13 Aug, 2016 · 1 commit

nfsd: don't return an unhashed lock stateid after taking mutex · dd257933

Committed by Jeff Layton on Aug 11, 2016

nfsd4_lock will take the st_mutex before working with the stateid it
gets, but between the time when we drop the cl_lock and take the mutex,
the stateid could become unhashed (à la FREE_STATEID). If that happens
the lock stateid returned to the client will be forgotten.

Fix this by first moving the st_mutex acquisition into
lookup_or_create_lock_state. Then, have it check to see if the lock
stateid is still hashed after taking the mutex. If it's not, then put
the stateid and try the find/create again.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org # feb9dad5 nfsd: Always lock state exclusively.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
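The lookup, lock, recheck loop can be sketched with pthreads. `stid_sketch`, the lookup callback and the demo pool are hypothetical; the real code also puts its stateid reference before retrying.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stateid; `hashed` mimics visibility in the server's tables. */
struct stid_sketch {
	pthread_mutex_t st_mutex;
	bool hashed;
};

typedef struct stid_sketch *(*find_or_create_fn)(void *ctx);

/* Return a stateid that is still hashed, with st_mutex held. If it was
 * unhashed between lookup and locking, drop it and look up again. */
static struct stid_sketch *lookup_or_create_locked(find_or_create_fn find,
						     void *ctx)
{
	for (;;) {
		struct stid_sketch *st = find(ctx);

		pthread_mutex_lock(&st->st_mutex);
		if (st->hashed)
			return st;   /* safe to hand back to the client */
		/* Unhashed while we waited (e.g. by FREE_STATEID): retry. */
		pthread_mutex_unlock(&st->st_mutex);
	}
}

/* Demo lookup: first returns an unhashed stateid, then a hashed one. */
struct demo_pool { struct stid_sketch objs[2]; int calls; };

static struct stid_sketch *demo_find(void *ctx)
{
	struct demo_pool *p = ctx;

	return &p->objs[p->calls++ == 0 ? 0 : 1];
}
```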

12 Aug, 2016 · 2 commits

proc, meminfo: use correct helpers for calculating LRU sizes in meminfo · 2f95ff90

Committed by Mel Gorman on Aug 11, 2016

meminfo_proc_show() and si_mem_available() are using the wrong helpers
for calculating the size of the LRUs. The user-visible impact is that
there appears to be an abnormally high number of unevictable pages.

Link: http://lkml.kernel.org/r/20160805105805.GR2799@techsingularity.net
Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nfsd: Fix race between FREE_STATEID and LOCK · 42691398

Committed by Chuck Lever on Aug 11, 2016

When running LTP's nfslock01 test, the Linux client can send a LOCK
and a FREE_STATEID request at the same time. The outcome is:

Frame 324    R OPEN stateid [2,O]

Frame 115004 C LOCK lockowner_is_new stateid [2,O] offset 672000 len 64
Frame 115008 R LOCK stateid [1,L]
Frame 115012 C WRITE stateid [0,L] offset 672000 len 64
Frame 115016 R WRITE NFS4_OK
Frame 115019 C LOCKU stateid [1,L] offset 672000 len 64
Frame 115022 R LOCKU NFS4_OK
Frame 115025 C FREE_STATEID stateid [2,L]
Frame 115026 C LOCK lockowner_is_new stateid [2,O] offset 672128 len 64
Frame 115029 R FREE_STATEID NFS4_OK
Frame 115030 R LOCK stateid [3,L]
Frame 115034 C WRITE stateid [0,L] offset 672128 len 64
Frame 115038 R WRITE NFS4ERR_BAD_STATEID

In other words, the server returns stateid L in a successful LOCK
reply, but it has already released it. Subsequent uses of stateid L
fail.

To address this, protect the generation check in nfsd4_free_stateid
with the st_mutex. This should guarantee that only one of two
outcomes occurs: either LOCK returns a fresh valid stateid, or
FREE_STATEID returns NFS4ERR_LOCKS_HELD.
Reported-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Fix-suggested-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

11 Aug, 2016 · 1 commit

nfsd: fix dentry refcounting on create · 502aa0a5

Committed by Josef Bacik on Aug 10, 2016

b44061d0 introduced a dentry ref counting bug.  Previously we were
grabbing one ref to dchild in nfsd_create(), but with the creation of
nfsd_create_locked() we have a ref for dchild from the lookup in
nfsd_create(), and then another ref in nfsd_create_locked().  The ref
from the lookup in nfsd_create() is never dropped and results in
dentries still in use at unmount.
Signed-off-by: Josef Bacik <jbacik@fb.com>
Fixes: b44061d0 "nfsd: reorganize nfsd_create"
Reported-by: kernel test robot <xiaolong.ye@intel.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

10 Aug, 2016 · 2 commits

mm, writeback: flush plugged IO in wakeup_flusher_threads() · 51350ea0

Committed by Konstantin Khlebnikov on Aug 04, 2016

I've found a funny live-lock between raid10 barriers during resync and
memory controller hard limits. Inside mpage_readpages(), the task holds on
to its plugged bio, which blocks the barrier in raid10. Its memory cgroup
has no free memory, so the task goes into reclaim, but all reclaimable
pages are dirty and cannot be written because raid10 is rebuilding and
stuck on the barrier.

The usual flush of such plugged IO in schedule() never happens, because
the caller doesn't go to sleep.

The lock is 'live' because changing the memory limit or killing the tasks
that hold the stuck bio unblocks all progress.

That was what happened in 3.18.x, but I see no difference in the upstream
logic. Theoretically, this might happen even without a memory cgroup.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Jens Axboe <axboe@fb.com>

mm: memcontrol: only mark charged pages with PageKmemcg · c4159a75

Committed by Vladimir Davydov on Aug 08, 2016

To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg,
which sets page->_mapcount to -512.  Currently, we set/clear PageKmemcg
in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated
with __GFP_ACCOUNT, including those that aren't actually charged to any
cgroup, i.e. allocated from the root cgroup context.  To avoid overhead
in case cgroups are not used, we only do that if memcg_kmem_enabled() is
true.  The latter is set iff there are kmem-enabled memory cgroups
(online or offline).  The root cgroup is not considered kmem-enabled.

As a result, if a page is allocated with __GFP_ACCOUNT for the root
cgroup when there are kmem-enabled memory cgroups and is freed after all
kmem-enabled memory cgroups were removed, e.g.

  # no memory cgroups have been created yet, create one
  mkdir /sys/fs/cgroup/memory/test
  # run something allocating pages with __GFP_ACCOUNT, e.g.
  # a program using pipe
  dmesg | tail
  # remove the memory cgroup
  rmdir /sys/fs/cgroup/memory/test

we'll get a bad page state bug complaining about page->_mapcount != -1:

  BUG: Bad page state in process swapper/0  pfn:1fd945c
  page:ffffea007f651700 count:0 mapcount:-511 mapping:          (null) index:0x0
  flags: 0x1000000000000000()

To avoid that, let's mark with PageKmemcg only those pages that are
actually charged to and hence pin a non-root memory cgroup.

Fixes: 4949148a ("mm: charge/uncharge kmemcg from generic page allocator paths")
Reported-and-tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
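A toy model of the fixed marking rule, using the -512 marker value from the text. The names are hypothetical, and in this sketch the free path only clears the marker while kmem accounting is enabled, to mirror the scenario described above.

```c
#include <stdbool.h>

#define MAPCOUNT_NONE   (-1)    /* ordinary page */
#define MAPCOUNT_KMEMCG (-512)  /* PageKmemcg marker value from the text */

struct page_sketch { int mapcount; };

/* Fixed allocation path: mark only pages really charged to (and therefore
 * pinning) a non-root memory cgroup. */
static void alloc_mark(struct page_sketch *pg, bool charged_to_nonroot)
{
	pg->mapcount = charged_to_nonroot ? MAPCOUNT_KMEMCG : MAPCOUNT_NONE;
}

/* Free path: clear the marker only while kmem accounting is enabled.
 * Returns false for what would be a "bad page state". */
static bool free_ok(struct page_sketch *pg, bool kmem_enabled)
{
	if (kmem_enabled && pg->mapcount == MAPCOUNT_KMEMCG)
		pg->mapcount = MAPCOUNT_NONE;   /* uncharge and clear marker */
	return pg->mapcount == MAPCOUNT_NONE;
}
```

Under the old rule, a root-context page could carry the -512 marker; freeing it after all kmem-enabled cgroups are gone leaves the marker in place, which corresponds to the bad page state shown above.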

09 Aug, 2016 · 2 commits

ceph: initialize pathbase in the !dentry case in encode_caps_cb() · 4eacd4cb

Committed by Ilya Dryomov on Aug 09, 2016

pathbase is the base inode; set it to 0 if we've got no path.

Coverity-id: 146348
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

ceph: fix null pointer dereference in ceph_flush_snaps() · e4d2b16a

Committed by Yan, Zheng on Aug 04, 2016

Signed-off-by: Yan, Zheng <zyan@redhat.com>

08 Aug, 2016 · 1 commit

block: rename bio bi_rw to bi_opf · 1eff9d32

Committed by Jens Axboe on Aug 05, 2016

Since commit 63a4cc24, bio->bi_rw contains flags in the lower
portion and the op code in the higher portions. This means that
old code that relies on manually setting bi_rw is most likely
going to be broken. Instead of letting that brokenness linger,
rename the member, to force old and out-of-tree code to break
at compile time instead of at runtime.

No intended functional changes in this commit.
Signed-off-by: Jens Axboe <axboe@fb.com>
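A sketch of packing an op and flags into one 32-bit field, assuming for illustration that 3 op bits sit at the top and flags occupy the rest. The names and constants are hypothetical, not the block layer's definitions.

```c
#include <stdint.h>

/* Illustrative layout: op in the top OP_BITS bits, flags below it. */
#define OP_BITS    3u
#define OP_SHIFT   (32u - OP_BITS)
#define FLAGS_MASK ((1u << OP_SHIFT) - 1u)

struct bio_sketch { uint32_t bi_opf; };

/* op must fit in OP_BITS; flags are masked to the low bits. */
static void set_op_attrs(struct bio_sketch *bio, uint32_t op, uint32_t flags)
{
	bio->bi_opf = (op << OP_SHIFT) | (flags & FLAGS_MASK);
}

static uint32_t get_op(const struct bio_sketch *bio)
{
	return bio->bi_opf >> OP_SHIFT;
}

static uint32_t get_flags(const struct bio_sketch *bio)
{
	return bio->bi_opf & FLAGS_MASK;
}
```

In this layout, old code that assigns a bare flag value to the whole field would silently set the op to 0; renaming the member turns such code into a compile error instead.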
