提交 · 254133f5d0c2df8476495d331f0fa69ff1a374c1 · openeuler / Kernel

07 4月, 2017 1 次提交

xfs: fold __xfs_trans_roll into xfs_trans_roll · 254133f5

由 Christoph Hellwig 提交于 4月 06, 2017

No one cares about the low-level helper anymore.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

254133f5

04 4月, 2017 15 次提交

xfs: report realtime space information via the rtbitmap · 4c934c7d

由 Darrick J. Wong 提交于 3月 28, 2017

Use the realtime bitmap to return free space information via getfsmap.
Eventually this will be superseded by the realtime rmapbt code.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

4c934c7d

xfs: have getfsmap fall back to the freesp btrees when rmap is not present · a1cae728

由 Darrick J. Wong 提交于 3月 28, 2017

If the reverse-mapping btree isn't available, fall back to the
free space btrees to provide partial reverse mapping information.
The online scrub tool can make use of even partial information to
speed up the data block scan.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

a1cae728

xfs: implement the GETFSMAP ioctl · e89c0413

由 Darrick J. Wong 提交于 3月 28, 2017

Introduce a new ioctl that uses the reverse mapping btree to return
information about the physical layout of the filesystem.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

e89c0413

xfs: add a couple of queries to iterate free extents in the rtbitmap · fb3c3de2

由 Darrick J. Wong 提交于 3月 28, 2017

Add _query_range and _query_all functions to the realtime bitmap
allocator.  These two functions are similar in usage to the btree
functions with the same name and will be used for getfsmap and scrub.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

fb3c3de2

xfs: create a function to query all records in a btree · e9a2599a

由 Darrick J. Wong 提交于 3月 28, 2017

Create a helper function that will query all records in a btree.
This will be used by the online repair functions to examine every
record in a btree to rebuild a second btree.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

e9a2599a

xfs: provide a query_range function for freespace btrees · 2d520bfa

由 Darrick J. Wong 提交于 3月 28, 2017

Implement a query_range function for the bnobt and cntbt.  This will
be used for getfsmap fallback if there is no rmapbt and by the online
scrub and repair code.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

2d520bfa

xfs: plumb in needed functions for range querying of the freespace btrees · 08438b1e

由 Darrick J. Wong 提交于 3月 28, 2017

Plumb in the pieces (init_high_key, diff_two_keys) necessary to call
query_range on the free space btrees.  Remove the debugging asserts
so that we can make queries starting from block 0.

While we're at it, merge the redundant "if (btnum ==" hunks.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NBrian Foster <bfoster@redhat.com>

08438b1e

xfs: fix over-copying of getbmap parameters from userspace · be6324c0

由 Darrick J. Wong 提交于 4月 03, 2017

In xfs_ioc_getbmap, we should only copy the fields of struct getbmap
from userspace, or else we end up copying random stack contents into the
kernel.  struct getbmap is a strict subset of getbmapx, so a partial
structure copy should work fine.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

be6324c0

xfs: Remove obsolete declaration of xfs_buf_get_empty · 422e5b53

由 Nikolay Borisov 提交于 3月 28, 2017

This function has been removed ever since at least 3.12-era. No need to
keep its declaration in the header so nuke it.
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

422e5b53

xfs: fix up inode validation failure message · bc593eeb

由 Eric Sandeen 提交于 3月 28, 2017

"xfs_iread: validation failed for inode 96 failed"

One "failed" seems like enough.
Signed-off-by: NEric Sandeen <sandeen@redhat.com>
Reviewed-by: NAlex Elder <elder@linaro.org>
Reviewed-by: NBill O'Donnell <billodo@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

bc593eeb

xfs: remove the ISUNWRITTEN macro · 63fbb4c1

由 Christoph Hellwig 提交于 3月 28, 2017

Opencoding the trivial checks makes it much easier to read (and grep..).
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

63fbb4c1

xfs: factor out a xfs_bmap_is_real_extent helper · 9c4f29d3

由 Christoph Hellwig 提交于 3月 28, 2017

This checks for all the non-normal extent types, including handling both
encodings of delayed allocations.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

9c4f29d3

xfs: use dedicated log worker wq to avoid deadlock with cil wq · 696a5620

由 Brian Foster 提交于 3月 28, 2017

The log covering background task used to be part of the xfssyncd
workqueue. That workqueue was removed as of commit 5889608d ("xfs:
syncd workqueue is no more") and the associated work item scheduled
to the xfs-log wq. The latter is used for log buffer I/O completion.

Since xfs_log_worker() can invoke a log flush, a deadlock is
possible between the xfs-log and xfs-cil workqueues. Consider the
following codepath from xfs_log_worker():

xfs_log_worker()
  xfs_log_force()
    _xfs_log_force()
      xlog_cil_force()
        xlog_cil_force_lsn()
          xlog_cil_push_now()
            flush_work()

The above is in xfs-log wq context and blocked waiting on the
completion of an xfs-cil work item. Concurrently, the cil push in
progress can end up blocked here:

xlog_cil_push_work()
  xlog_cil_push()
    xlog_write()
      xlog_state_get_iclog_space()
        xlog_wait(&log->l_flush_wait, ...)

The above is in xfs-cil context waiting on log buffer I/O
completion, which executes in xfs-log wq context. In this scenario
both workqueues are deadlocked waiting on eachother.

Add a new workqueue specifically for the high level log covering and
ail pushing worker, as was the case prior to commit 5889608d.
Diagnosed-by: NDavid Jeffery <djeffery@redhat.com>
Signed-off-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

696a5620

xfs: fix kernel memory exposure problems · f1c0e202

由 Darrick J. Wong 提交于 4月 03, 2017

Fix a memory exposure problems in inumbers where we allocate an array of
structures with holes, fail to zero the holes, then blindly copy the
kernel memory contents (junk and all) into userspace.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>

f1c0e202

xfs: Honor FALLOC_FL_KEEP_SIZE when punching ends of files · 105664df

由 Calvin Owens 提交于 3月 31, 2017

When punching past EOF on XFS, fallocate(mode=PUNCH_HOLE|KEEP_SIZE) will
round the file size up to the nearest multiple of PAGE_SIZE:

  calvinow@vm-disks/generic-xfs-1 ~$ dd if=/dev/urandom of=test bs=2048 count=1
  calvinow@vm-disks/generic-xfs-1 ~$ stat test
    Size: 2048            Blocks: 8          IO Block: 4096   regular file
  calvinow@vm-disks/generic-xfs-1 ~$ fallocate -n -l 2048 -o 2048 -p test
  calvinow@vm-disks/generic-xfs-1 ~$ stat test
    Size: 4096            Blocks: 8          IO Block: 4096   regular file

Commit 3c2bdc91 ("xfs: kill xfs_zero_remaining_bytes") replaced
xfs_zero_remaining_bytes() with calls to iomap helpers. The new helpers
don't enforce that [pos,offset) lies strictly on [0,i_size) when being
called from xfs_free_file_space(), so by "leaking" these ranges into
xfs_zero_range() we get this buggy behavior.

Fix this by reintroducing the checks xfs_zero_remaining_bytes() did
against i_size at the bottom of xfs_free_file_space().
Reported-by: NAaron Gao <gzh@fb.com>
Fixes: 3c2bdc91 ("xfs: kill xfs_zero_remaining_bytes")
Cc: Christoph Hellwig <hch@lst.de>
Cc: Brian Foster <bfoster@redhat.com>
Cc: <stable@vger.kernel.org> # 4.8+
Signed-off-by: NCalvin Owens <calvinowens@fb.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>

105664df

29 3月, 2017 1 次提交

xfs: rework the inline directory verifiers · 005c5db8

由 Darrick J. Wong 提交于 3月 28, 2017

The inline directory verifiers should be called on the inode fork data,
which means after iformat_local on the read side, and prior to
ifork_flush on the write side.  This makes the fork verifier more
consistent with the way buffer verifiers work -- i.e. they will operate
on the memory buffer that the code will be reading and writing directly.

Furthermore, revise the verifier function to return -EFSCORRUPTED so
that we don't flood the logs with corruption messages and assert
notices.  This has been a particular problem with xfs/348, which
triggers the XFS_WANT_CORRUPTED_RETURN assertions, which halts the
kernel when CONFIG_XFS_DEBUG=y.  Disk corruption isn't supposed to do
that, at least not in a verifier.
Reviewed-by: NBrian Foster <bfoster@redhat.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
---
v2: get the inode d_ops the proper way
v3: describe the bug that this patch fixes; no code changes

005c5db8

26 3月, 2017 2 次提交

T
ext4: fix two spelling nits · d67d64f4
由 Theodore Ts'o 提交于 3月 25, 2017
```
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
```
d67d64f4

ext4: lock the xattr block before checksuming it · dac7a4b4

由 Theodore Ts'o 提交于 3月 25, 2017

We must lock the xattr block before calculating or verifying the
checksum in order to avoid spurious checksum failures.

https://bugzilla.kernel.org/show_bug.cgi?id=193661Reported-by: NColin Ian King <colin.king@canonical.com>
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

dac7a4b4

20 3月, 2017 5 次提交

f2fs: combine nat_bits and free_nid_bitmap cache · 7041d5d2

由 Chao Yu 提交于 3月 08, 2017

Both nat_bits cache and free_nid_bitmap cache provide same functionality
as a intermediate cache between free nid cache and disk, but with
different granularity of indicating free nid range, and different
persistence policy. nat_bits cache provides better persistence ability,
and free_nid_bitmap provides better granularity.

In this patch we combine advantage of both caches, so finally policy of
the intermediate cache would be:
- init: load free nid status from nat_bits into free_nid_bitmap
- lookup: scan free_nid_bitmap before load NAT blocks
- update: update free_nid_bitmap in real-time
- persistence: udpate and persist nat_bits in checkpoint

This patch also resolves performance regression reported by lkp-robot.

commit:
  4ac91242 ("f2fs: introduce free nid bitmap")
  d00030cf9cd0bb96fdccc41e33d3c91dcbb672ba ("f2fs: use __set{__clear}_bit_le")
  1382c0f3f9d3f936c8bc42ed1591cf7a593ef9f7 ("f2fs: combine nat_bits and free_nid_bitmap cache")

4ac91242 d00030cf9cd0bb96fdccc41e33 1382c0f3f9d3f936c8bc42ed15
---------------- -------------------------- --------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
     77863 ±  0%      +2.1%      79485 ±  1%     +50.8%     117404 ±  0%  aim7.jobs-per-min
    231.63 ±  0%      -2.0%     227.01 ±  1%     -33.6%     153.80 ±  0%  aim7.time.elapsed_time
    231.63 ±  0%      -2.0%     227.01 ±  1%     -33.6%     153.80 ±  0%  aim7.time.elapsed_time.max
    896604 ±  0%      -0.8%     889221 ±  3%     -20.2%     715260 ±  1%  aim7.time.involuntary_context_switches
      2394 ±  1%      +4.6%       2503 ±  1%      +3.7%       2481 ±  2%  aim7.time.maximum_resident_set_size
      6240 ±  0%      -1.5%       6145 ±  1%     -14.1%       5360 ±  1%  aim7.time.system_time
   1111357 ±  3%      +1.9%    1132509 ±  2%      -6.2%    1041932 ±  2%  aim7.time.voluntary_context_switches
...
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Tested-by: NXiaolong Ye <xiaolong.ye@intel.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

7041d5d2

f2fs: skip scanning free nid bitmap of full NAT blocks · 586d1492

由 Chao Yu 提交于 3月 01, 2017

This patch adds to account free nids for each NAT blocks, and while
scanning all free nid bitmap, do check count and skip lookuping in
full NAT block.
Signed-off-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

586d1492

f2fs: use __set{__clear}_bit_le · 23380b85

由 Jaegeuk Kim 提交于 3月 07, 2017

This patch uses __set{__clear}_bit_le for highter speed.
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

23380b85

f2fs: declare static functions · 9f7e4a2c

由 Jaegeuk Kim 提交于 3月 10, 2017

This is to avoid build warning reported by kbuild test robot.
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

9f7e4a2c

f2fs: don't overwrite node block by SSR · 720037f9

由 Jaegeuk Kim 提交于 3月 06, 2017

This patch fixes that SSR can overwrite previous warm node block consisting of
a node chain since the last checkpoint.

Fixes: 5b6c6be2 ("f2fs: use SSR for warm node as well")
Reviewed-by: NChao Yu <yuchao0@huawei.com>
Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>

720037f9

18 3月, 2017 8 次提交

btrfs: add missing memset while reading compressed inline extents · e1699d2d

由 Zygo Blaxell 提交于 3月 10, 2017

This is a story about 4 distinct (and very old) btrfs bugs.

Commit c8b97818 ("Btrfs: Add zlib compression support") added
three data corruption bugs for inline extents (bugs #1-3).

Commit 93c82d57 ("Btrfs: zero page past end of inline file items")
fixed bug #1:  uncompressed inline extents followed by a hole and more
extents could get non-zero data in the hole as they were read.  The fix
was to add a memset in btrfs_get_extent to zero out the hole.

Commit 166ae5a4 ("btrfs: fix inline compressed read err corruption")
fixed bug #2:  compressed inline extents which contained non-zero bytes
might be replaced with zero bytes in some cases.  This patch removed an
unhelpful memset from uncompress_inline, but the case where memset is
required was missed.

There is also a memset in the decompression code, but this only covers
decompressed data that is shorter than the ram_bytes from the extent
ref record.  This memset doesn't cover the region between the end of the
decompressed data and the end of the page.  It has also moved around a
few times over the years, so there's no single patch to refer to.

This patch fixes bug #3:  compressed inline extents followed by a hole
and more extents could get non-zero data in the hole as they were read
(i.e. bug #3 is the same as bug #1, but s/uncompressed/compressed/).
The fix is the same:  zero out the hole in the compressed case too,
by putting a memset back in uncompress_inline, but this time with
correct parameters.

The last and oldest bug, bug #0, is the cause of the offending inline
extent/hole/extent pattern.  Bug #0 is a subtle and mostly-harmless quirk
of behavior somewhere in the btrfs write code.  In a few special cases,
an inline extent and hole are allowed to persist where they normally
would be combined with later extents in the file.

A fast reproducer for bug #0 is presented below.  A few offending extents
are also created in the wild during large rsync transfers with the -S
flag.  A Linux kernel build (git checkout; make allyesconfig; make -j8)
will produce a handful of offending files as well.  Once an offending
file is created, it can present different content to userspace each
time it is read.

Bug #0 is at least 4 and possibly 8 years old.  I verified every vX.Y
kernel back to v3.5 has this behavior.  There are fossil records of this
bug's effects in commits all the way back to v2.6.32.  I have no reason
to believe bug #0 wasn't present at the beginning of btrfs compression
support in v2.6.29, but I can't easily test kernels that old to be sure.

It is not clear whether bug #0 is worth fixing.  A fix would likely
require injecting extra reads into currently write-only paths, and most
of the exceptional cases caused by bug #0 are already handled now.

Whether we like them or not, bug #0's inline extents followed by holes
are part of the btrfs de-facto disk format now, and we need to be able
to read them without data corruption or an infoleak.  So enough about
bug #0, let's get back to bug #3 (this patch).

An example of on-disk structure leading to data corruption found in
the wild:

        item 61 key (606890 INODE_ITEM 0) itemoff 9662 itemsize 160
                inode generation 50 transid 50 size 47424 nbytes 49141
                block group 0 mode 100644 links 1 uid 0 gid 0
                rdev 0 flags 0x0(none)
        item 62 key (606890 INODE_REF 603050) itemoff 9642 itemsize 20
                inode ref index 3 namelen 10 name: DB_File.so
        item 63 key (606890 EXTENT_DATA 0) itemoff 8280 itemsize 1362
                inline extent data size 1341 ram 4085 compress(zlib)
        item 64 key (606890 EXTENT_DATA 4096) itemoff 8227 itemsize 53
                extent data disk byte 5367308288 nr 20480
                extent data offset 0 nr 45056 ram 45056
                extent compression(zlib)

Different data appears in userspace during each read of the 11 bytes
between 4085 and 4096.  The extent in item 63 is not long enough to
fill the first page of the file, so a memset is required to fill the
space between item 63 (ending at 4085) and item 64 (beginning at 4096)
with zero.

Here is a reproducer from Liu Bo, which demonstrates another method
of creating the same inline extent and hole pattern:

Using 'page_poison=on' kernel command line (or enable
CONFIG_PAGE_POISONING) run the following:

	# touch foo
	# chattr +c foo
	# xfs_io -f -c "pwrite -W 0 1000" foo
	# xfs_io -f -c "falloc 4 8188" foo
	# od -x foo
	# echo 3 >/proc/sys/vm/drop_caches
	# od -x foo

This produce the following on my box:

Correct output:  file contains 1000 data bytes followed
by zeros:

	0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
	*
	0001740 cdcd cdcd cdcd cdcd 0000 0000 0000 0000
	0001760 0000 0000 0000 0000 0000 0000 0000 0000
	*
	0020000

Actual output:  the data after the first 1000 bytes
will be different each run:

	0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
	*
	0001740 cdcd cdcd cdcd cdcd 6c63 7400 635f 006d
	0001760 5f74 6f43 7400 435f 0053 5f74 7363 7400
	0002000 435f 0056 5f74 6164 7400 645f 0062 5f74
	(...)
Signed-off-by: NZygo Blaxell <ce3g8jdj@umail.furryterror.org>
Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
Reviewed-by: NChris Mason <clm@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

e1699d2d

Btrfs: fix regression in lock_delalloc_pages · 49d4a334

由 Liu Bo 提交于 3月 06, 2017

The bug is a regression after commit
(da2c7009 "btrfs: teach __process_pages_contig about PAGE_LOCK operation")
and commit
(76c0021d "Btrfs: use helper to simplify lock/unlock pages").

So if the dirty pages which are under writeback got truncated partially
before we lock the dirty pages, we couldn't find all pages mapping to the
delalloc range, and the bug didn't return an error so it kept going on and
found that the delalloc range got truncated and got to unlock the dirty
pages, and then the ASSERT could caught the error, and showed

-----------------------------------------------------------------------------
assertion failed: page_ops & PAGE_LOCK, file: fs/btrfs/extent_io.c, line: 1716
-----------------------------------------------------------------------------

This fixes the bug by returning the proper -EAGAIN.

Cc: David Sterba <dsterba@suse.com>
Reported-by: NDave Jones <davej@codemonkey.org.uk>
Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

49d4a334

pNFS/flexfiles: never nfs4_mark_deviceid_unavailable · da066f3f

由 Weston Andros Adamson 提交于 3月 09, 2017

The flexfiles layout should never mark a device unavailable.

Move nfs4_mark_deviceid_unavailable out of nfs4_pnfs_ds_connect and call
directly from files layout where it's still needed.

The flexfiles driver still handles marked devices in error paths, but will
now print a rate limited warning.
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

da066f3f

pNFS: return status from nfs4_pnfs_ds_connect · a33e4b03

由 Weston Andros Adamson 提交于 3月 09, 2017

The nfs4_pnfs_ds_connect path can call rpc_create which can fail or it
can wait on another context to reach the same failure.

This checks that the rpc_create succeeded and returns the error to the
caller.

When an error is returned, both the files and flexfiles layouts will return
NULL from _prepare_ds(). The flexfiles layout will also return the layout
with the error NFS4ERR_NXIO.
Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

a33e4b03

NFSv4.1 respect server's max size in CREATE_SESSION · 03385332

由 Olga Kornievskaia 提交于 3月 08, 2017

Currently client doesn't respect max sizes server returns in CREATE_SESSION.
nfs4_session_set_rwsize() gets called and server->rsize, server->wsize are 0
so they never get set to the sizes returned by the server.
Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

03385332

NFS prevent double free in async nfs4_exchange_id · 63513232

由 Olga Kornievskaia 提交于 3月 13, 2017

Since rpc_task is async, the release function should be called which
will free the impl_id, scope, and owner.

Trond pointed at 2 more problems:
-- use of client pointer after free in the nfs4_exchangeid_release() function
-- cl_count mismatch if rpc_run_task() isn't run

Fixes: 8d89bd70 ("NFS setup async exchange_id")
Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
Cc: stable@vger.kernel.org # 4.9
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

63513232

nfs: make nfs4_cb_sv_ops static · 05fae7bb

由 Jason Yan 提交于 3月 10, 2017

Fixes the following sparse warning:

fs/nfs/callback.c:235:21: warning: symbol 'nfs4_cb_sv_ops' was not
declared. Should it be static?
Signed-off-by: NJason Yan <yanaijie@huawei.com>
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

05fae7bb

NFS: fix the fault nrequests decreasing for nfs_inode COPY · 38a33101

由 Kinglong Mee 提交于 3月 09, 2017

The nfs_commit_file for NFSv4.2's COPY operation goes through
the commit path for normal WRITE, but without increase nrequests,
so, the nrequests decreased in nfs_commit_release_pages is fault.
After that, the nrequests will be wrong.

[ 5670.299881] ------------[ cut here ]------------
[ 5670.300295] WARNING: CPU: 0 PID: 27656 at fs/nfs/inode.c:127 nfs_clear_inode+0x66/0x90 [nfs]
[ 5670.300558] Modules linked in: nfsv4(E) nfs(E) fscache(E) tun bridge stp llc fuse ip_set nfnetlink vmw_vsock_vmci_transport vsock snd_seq_midi snd_seq_midi_event ppdev f2fs coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_ens1371 intel_rapl_perf gameport snd_ac97_codec vmw_balloon ac97_bus snd_seq snd_pcm joydev snd_rawmidi snd_timer snd_seq_device snd soundcore nfit parport_pc parport acpi_cpufreq tpm_tis tpm_tis_core tpm i2c_piix4 vmw_vmci shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c vmwgfx drm_kms_helper ttm drm e1000 crc32c_intel mptspi scsi_transport_spi serio_raw mptscsih mptbase ata_generic pata_acpi fjes [last unloaded: fscache]
[ 5670.302925] CPU: 0 PID: 27656 Comm: umount.nfs4 Tainted: G        W   E   4.11.0-rc1+ #519
[ 5670.303292] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
[ 5670.304094] Call Trace:
[ 5670.304510]  dump_stack+0x63/0x86
[ 5670.304917]  __warn+0xcb/0xf0
[ 5670.305276]  warn_slowpath_null+0x1d/0x20
[ 5670.305661]  nfs_clear_inode+0x66/0x90 [nfs]
[ 5670.306093]  nfs4_evict_inode+0x61/0x70 [nfsv4]
[ 5670.306480]  evict+0xbb/0x1c0
[ 5670.306888]  dispose_list+0x4d/0x70
[ 5670.307233]  evict_inodes+0x178/0x1a0
[ 5670.307579]  generic_shutdown_super+0x44/0xf0
[ 5670.307985]  nfs_kill_super+0x21/0x40 [nfs]
[ 5670.308325]  deactivate_locked_super+0x43/0x70
[ 5670.308698]  deactivate_super+0x5a/0x60
[ 5670.309036]  cleanup_mnt+0x3f/0x90
[ 5670.309407]  __cleanup_mnt+0x12/0x20
[ 5670.309837]  task_work_run+0x80/0xa0
[ 5670.310162]  exit_to_usermode_loop+0x89/0x90
[ 5670.310497]  syscall_return_slowpath+0xaa/0xb0
[ 5670.310875]  entry_SYSCALL_64_fastpath+0xa7/0xa9
[ 5670.311197] RIP: 0033:0x7f1bb3617fe7
[ 5670.311545] RSP: 002b:00007ffecbabb828 EFLAGS: 00000206 ORIG_RAX: 00000000000000a6
[ 5670.311906] RAX: 0000000000000000 RBX: 0000000001dca1f0 RCX: 00007f1bb3617fe7
[ 5670.312239] RDX: 000000000000000c RSI: 0000000000000001 RDI: 0000000001dc83c0
[ 5670.312653] RBP: 0000000001dc83c0 R08: 0000000000000001 R09: 0000000000000000
[ 5670.312998] R10: 0000000000000755 R11: 0000000000000206 R12: 00007ffecbabc66a
[ 5670.313335] R13: 0000000001dc83a0 R14: 0000000000000000 R15: 0000000000000000
[ 5670.313758] ---[ end trace bf4bfe7764e4eb40 ]---

Cc: linux-kernel@vger.kernel.org
Fixes: 67911c8f ("NFS: Add nfs_commit_file()")
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>

38a33101

17 3月, 2017 8 次提交

kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file() · 966fa72a

由 Vaibhav Jain 提交于 3月 14, 2017

Recently started seeing a kernel oops when a module tries removing a
memory mapped sysfs bin_attribute. On closer investigation the root
cause seems to be kernfs_release_file() trying to call
kernfs_op.release() callback that's NULL for such sysfs
bin_attributes. The oops occurs when kernfs_release_file() is called from
kernfs_drain_open_files() to cleanup any open handles with active
memory mappings.

The patch fixes this by checking for flag KERNFS_HAS_RELEASE before
calling kernfs_release_file() in function kernfs_drain_open_files().

On ppc64-le arch with cxl module the oops back-trace is of the
form below:
[  861.381126] Unable to handle kernel paging request for instruction fetch
[  861.381360] Faulting instruction address: 0x00000000
[  861.381428] Oops: Kernel access of bad area, sig: 11 [#1]
....
[  861.382481] NIP: 0000000000000000 LR: c000000000362c60 CTR:
0000000000000000
....
Call Trace:
[c000000f1680b750] [c000000000362c34] kernfs_drain_open_files+0x104/0x1d0 (unreliable)
[c000000f1680b790] [c00000000035fa00] __kernfs_remove+0x260/0x2c0
[c000000f1680b820] [c000000000360da0] kernfs_remove_by_name_ns+0x60/0xe0
[c000000f1680b8b0] [c0000000003638f4] sysfs_remove_bin_file+0x24/0x40
[c000000f1680b8d0] [c00000000062a164] device_remove_bin_file+0x24/0x40
[c000000f1680b8f0] [d000000009b7b22c] cxl_sysfs_afu_remove+0x144/0x170 [cxl]
[c000000f1680b940] [d000000009b7c7e4] cxl_remove+0x6c/0x1a0 [cxl]
[c000000f1680b990] [c00000000052f694] pci_device_remove+0x64/0x110
[c000000f1680b9d0] [c0000000006321d4] device_release_driver_internal+0x1f4/0x2b0
[c000000f1680ba20] [c000000000525cb0] pci_stop_bus_device+0xa0/0xd0
[c000000f1680ba60] [c000000000525e80] pci_stop_and_remove_bus_device+0x20/0x40
[c000000f1680ba90] [c00000000004a6c4] pci_hp_remove_devices+0x84/0xc0
[c000000f1680bad0] [c00000000004a688] pci_hp_remove_devices+0x48/0xc0
[c000000f1680bb10] [c0000000009dfda4] eeh_reset_device+0xb0/0x290
[c000000f1680bbb0] [c000000000032b4c] eeh_handle_normal_event+0x47c/0x530
[c000000f1680bc60] [c000000000032e64] eeh_handle_event+0x174/0x350
[c000000f1680bd10] [c000000000033228] eeh_event_handler+0x1e8/0x1f0
[c000000f1680bdc0] [c0000000000d384c] kthread+0x14c/0x190
[c000000f1680be30] [c00000000000b5a0] ret_from_kernel_thread+0x5c/0xbc

Fixes: f83f3c51 ("kernfs: fix locking around kernfs_ops->release() callback")
Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

966fa72a

D
afs: Don't wait for page writeback with the page lock held · c5051c7b
由 David Howells 提交于 3月 16, 2017
```
Drop the page lock before waiting for page writeback.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
```
c5051c7b

afs: ->writepage() shouldn't call clear_page_dirty_for_io() · 65a15109

由 David Howells 提交于 3月 16, 2017

The ->writepage() op shouldn't call clear_page_dirty_for_io() as that has
already been called by the caller.

Fix afs_writepage() by moving the call out of
afs_write_back_from_locked_page() to afs_writepages_region() where it is
needed.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

65a15109

afs: Fix abort on signal while waiting for call completion · 954cd6dc

由 David Howells 提交于 3月 16, 2017

Fix the way in which a call that's in progress and being waited for is
aborted in the case that EINTR is detected. We should be sending
RX_USER_ABORT rather than RX_CALL_DEAD as the abort code.

Note that since the only two ways out of the loop are if the call completes
or if a signal happens, the kill-the-call clause after the loop has
finished can only happen in the case of EINTR. This means that we only
have one abort case to deal with, not two, and the "KWC" case can never
happen and so can be deleted.

Note further that simply aborting the call isn't necessarily the best thing
here since at this point: the request has been entirely sent and it's
likely the server will do the operation anyway - whether we abort it or
not. In future, we should punt the handling of the remainder of the call
off to a background thread.
Reported-by: NMarc Dionne <marc.c.dionne@auristor.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>

954cd6dc

afs: Fix an off-by-one error in afs_send_pages() · 445783d0

由 David Howells 提交于 3月 16, 2017

afs_send_pages() should only put the call into the AFS_CALL_AWAIT_REPLY
state if it has sent all the pages - but the check it makes is incorrect
and sometimes it will finish the loop early.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

445783d0

afs: Fix afs_kill_pages() · 7286a35e

由 David Howells 提交于 3月 16, 2017

Fix afs_kill_pages() in two ways:

 (1) If a writeback has been partially flushed, then if we try and kill the
     pages it contains, some of them may no longer be undergoing writeback
     and end_page_writeback() will assert.

     Fix this by checking to see whether the page in question is actually
     undergoing writeback before ending that writeback.

 (2) The loop that scans for pages to kill doesn't increase the first page
     index, and so the loop may not terminate, but it will try to process
     the same pages over and over again.

     Fix this by increasing the first page index to one after the last page
     we processed.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

7286a35e

afs: Fix page leak in afs_write_begin() · 6d06b0d2

由 David Howells 提交于 3月 16, 2017

afs_write_begin() leaks a ref and a lock on a page if afs_fill_page()
fails. Fix the leak by unlocking and releasing the page in the error path.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

6d06b0d2

afs: Don't set PG_error on local EINTR or ENOMEM when filling a page · 68ae849d

由 David Howells 提交于 3月 16, 2017

Don't set PG_error on a page if we get local EINTR or ENOMEM when filling a
page for writing.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

68ae849d

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功