Commits · 96bcad506457cfa0c26680446eedefb616c6b079 · openanolis / cloud-kernel

23 Jun, 2015 2 commits

nfsd: fput rd_file from XDR encode context · 96bcad50

Authored by Christoph Hellwig on Jun 18, 2015

Remove the hack where we fput the read-specific file in generic code.
Instead we can do it in nfsd4_encode_read as that gets called for all
error cases as well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: take struct file setup fully into nfs4_preprocess_stateid_op · af90f707

Authored by Christoph Hellwig on Jun 18, 2015

This patch changes nfs4_preprocess_stateid_op so it always returns
a valid struct file if it has been asked for that.  For that we
now allocate a temporary struct file for special stateids, and check
permissions if we got the file structure from the stateid.  This
ensures that all callers will get their handling of special stateids
right, and avoids code duplication.

There is a little wart in here because the read code needs to know
if we allocated a file structure so that it can copy around the
read-ahead parameters.  In the long run we should probably aim to
cache full file structures used with special stateids instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

20 Jun, 2015 3 commits

nfsd: refactor nfs4_preprocess_stateid_op · a0649b2d

Authored by Christoph Hellwig on Jun 18, 2015

Split out two self contained helpers to make the function more readable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: clean up raparams handling · e749a462

Authored by Christoph Hellwig on Jun 18, 2015

Refactor the raparam hash helpers to just deal with the raparms,
and keep opening/closing files separate from that.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: use swap() in sort_pacl_range() · 97b1f9aa

Authored by Fabian Frederick on Jun 12, 2015

Use kernel.h macro definition.

Thanks to Julia Lawall for Coccinelle scripting support.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

05 Jun, 2015 7 commits

rpcrdma: Merge svcrdma and xprtrdma modules into one · ffe1f0df

Authored by Chuck Lever on Jun 04, 2015

Bi-directional RPC support means code in svcrdma.ko invokes a bit of
code in xprtrdma.ko, and vice versa. To avoid loader/linker loops,
merge the server and client side modules together into a single
module.

When backchannel capabilities are added, the combined module will
register all needed transport capabilities so that Upper Layer
consumers automatically have everything needed to create a
bi-directional transport connection.

Module aliases are added for backwards compatibility with user
space, which still may expect svcrdma.ko or xprtrdma.ko to be
present.

This commit reverts commit 2e8c12e1 ("xprtrdma: add separate
Kconfig options for NFSoRDMA client and server support") and
provides a single CONFIG option for enabling the new module.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

svcrdma: Add a separate "max data segs" macro for svcrdma · 0380a3f3

Authored by Chuck Lever on Jun 04, 2015

The server and client maximum are architecturally independent.
Allow changing one without affecting the other.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL · b7e0b9a9

Authored by Chuck Lever on Jun 04, 2015

At the 2015 LSF/MM, it was requested that memory allocation
call sites that request GFP_KERNEL allocations in a loop should be
annotated with __GFP_NOFAIL.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

svcrdma: Keep rpcrdma_msg fields in network byte-order · 30b7e246

Authored by Chuck Lever on Jun 04, 2015

Fields in struct rpcrdma_msg are __be32. Don't byte-swap these
fields when decoding RPC calls and then swap them back for the
reply. For the most part, they can be left alone.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
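
The idea of leaving wire fields in network byte order can be sketched in user space as follows. This is only an illustration: `struct wire_hdr`, `WIRE_VERSION` and both helpers are hypothetical names, and `htonl()` stands in for the kernel's `cpu_to_be32()`.

```c
#include <arpa/inet.h>  /* htonl(): user-space stand-in for cpu_to_be32() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical wire header; every field stays in network byte order. */
struct wire_hdr {
	uint32_t xid;   /* big-endian, echoed back unchanged in the reply */
	uint32_t vers;  /* big-endian */
	uint32_t type;  /* big-endian */
};

#define WIRE_VERSION 1u  /* illustrative value, host byte order */

/* Compare against a swapped constant instead of swapping the field. */
static bool hdr_version_ok(const struct wire_hdr *h)
{
	return h->vers == htonl(WIRE_VERSION);
}

/* Build a reply header: xid and vers are copied verbatim, with no round
 * trip through host byte order. Only the new value gets converted. */
static void hdr_init_reply(struct wire_hdr *reply, const struct wire_hdr *call,
			   uint32_t type_host_order)
{
	assert(reply && call);
	reply->xid = call->xid;
	reply->vers = call->vers;
	reply->type = htonl(type_host_order);
}
```

Converting the constant rather than the field means the conversion can be folded at compile time, and fields that are merely echoed back are never touched.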

svcrdma: Fix byte-swapping in svc_rdma_sendto.c · 70747c25

Authored by Chuck Lever on Jun 04, 2015

In send_write_chunks(), we have:

	for (xdr_off = rqstp->rq_res.head[0].iov_len, chunk_no = 0;
	     xfer_len && chunk_no < arg_ary->wc_nchunks;
	     chunk_no++) {
		 . . .
	}

Note that arg_ary->wc_nchunks is in network byte-order. For the
comparison to work correctly, both have to be in native byte-order.

In send_reply_chunks, we have:

	write_len = min(xfer_len, htonl(ch->rs_length));

xfer_len is in native byte-order, and ch->rs_length is in
network byte-order. be32_to_cpu() is the correct byte swap
for ch->rs_length.

As an additional clean up, replace ntohl() with be32_to_cpu() in
a few other places.

This appears to address a problem with large rsize hangs while
using PHYSICAL memory registration. I suspect that is the only
registration mode that uses more than one chunk element.

BugLink: https://bugzilla.linux-nfs.org/show_bug.cgi?id=248
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
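
The loop-bound problem described above can be illustrated with a small user-space sketch. It is not the svcrdma code: `struct chunk_array`, `walk_chunks()` and `raw_bound()` are hypothetical, and `ntohl()` stands in for `be32_to_cpu()`.

```c
#include <arpa/inet.h>  /* ntohl(): user-space stand-in for be32_to_cpu() */
#include <assert.h>
#include <stdint.h>

/* Hypothetical chunk array as it arrives off the wire. */
struct chunk_array {
	uint32_t nchunks_be;  /* chunk count, network byte order */
};

/* Count loop iterations when the bound is converted to host order first. */
static uint32_t walk_chunks(const struct chunk_array *ary, uint32_t xfer_len,
			    uint32_t chunk_len)
{
	uint32_t nchunks = ntohl(ary->nchunks_be);  /* convert once, up front */
	uint32_t chunk_no;

	assert(chunk_len > 0);
	for (chunk_no = 0; xfer_len && chunk_no < nchunks; chunk_no++)
		xfer_len -= (xfer_len < chunk_len) ? xfer_len : chunk_len;
	return chunk_no;
}

/* The value an unconverted comparison would use as its bound. */
static uint32_t raw_bound(const struct chunk_array *ary)
{
	return ary->nchunks_be;
}
```

On a little-endian host a wire count of 2 reads as 33554432 when left unconverted, so a comparison against the raw field no longer limits the loop to the chunks that are actually present.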

nfsd: Update callback sequnce id only CB_SEQUENCE success · 276f03e3

Authored by Kinglong Mee on Jun 02, 2015

When testing pnfs layout, nfsd got error NFS4ERR_SEQ_MISORDERED.
It is caused by the nfs client returning NFS4ERR_DELAY before validate_seqid(),
so the client does not update the sequence id, but nfsd does update it.

According to RFC5661 20.9.3,
" If CB_SEQUENCE returns an error, then the state of the slot
(sequence ID, cached reply) MUST NOT change. "
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
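
The rule quoted from RFC 5661 can be modelled in a few lines. This is a minimal sketch, assuming a slot that holds nothing but a sequence id; `struct cb_slot` and `cb_sequence_done()` are hypothetical names, not the nfsd functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one backchannel slot. */
struct cb_slot {
	uint32_t seq_nr;  /* sequence id to use for the next CB_SEQUENCE */
};

/* Apply the result of a CB_SEQUENCE operation; status 0 means success. */
static void cb_sequence_done(struct cb_slot *slot, int status)
{
	assert(slot);
	if (status == 0)
		slot->seq_nr++;  /* the client accepted and processed the call */
	/* on any error the slot stays exactly as it was */
}
```

With this shape, a reply of -10008 (NFS4ERR_DELAY) leaves both sides agreeing on the sequence id for the retry.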

nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying · 4399396e

Authored by Kinglong Mee on Jun 02, 2015

nfsd enters an infinite loop and prints a message every 10 seconds:

May 31 18:33:52 test-server kernel: Error sending entire callback!
May 31 18:34:01 test-server kernel: Error sending entire callback!

This is caused by a cb_layoutreturn getting error -10008
(NFS4ERR_DELAY), the client crashing, and then nfsd entering the
infinite loop:

  bc_sendto --> call_timeout --> nfsd4_cb_done --> nfsd4_cb_layout_done
  with error -10008 --> rpc_delay(task, HZ/100) --> bc_sendto ...

Reproduced using xfstests 074 with nfs client's kdump on,
CONFIG_DEFAULT_HUNG_TASK_TIMEOUT set, and client's blkmapd down:

1. nfs client's write operation will get the layout of file,
   and then send getdeviceinfo,
2. but layout segment is not recorded by client because blkmapd is down,
3. client writes data by sending WRITE to server,
4. nfs server recalls the layout of the file before WRITE,
5. network error causes the client to reset the session and return NFS4ERR_DELAY,
6. so the client's WRITE operation is waiting for the reply.
   If the task hangs for 120s, the client will crash.
7. so the next bc_sendto will fail with TIMEOUT,
   and cb_status is NFS4ERR_DELAY.
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
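
The stale-status problem in step 7 can be sketched as below. This is a deliberately simplified model, not the nfsd implementation: `struct cb_call`, `cb_prepare()` and `cb_decode()` are hypothetical, and it assumes the status is only written when a reply is actually decoded.

```c
#include <assert.h>
#include <stddef.h>

#define ERR_DELAY (-10008)  /* NFS4ERR_DELAY, as quoted in the commit message */

/* Hypothetical model of one callback; cb_status is the name from the title. */
struct cb_call {
	int cb_status;
};

/* Runs before every (re)transmission: forget the previous attempt's result. */
static void cb_prepare(struct cb_call *cb)
{
	assert(cb);
	cb->cb_status = 0;
}

/* reply == NULL models an attempt that timed out, so nothing was decoded. */
static void cb_decode(struct cb_call *cb, const int *reply)
{
	if (reply)
		cb->cb_status = *reply;
}
```

Without the reset in the prepare step, a timed-out retry would still carry the earlier NFS4ERR_DELAY and the completion handler would keep asking for another delay-and-retry round.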

04 Jun, 2015 2 commits

svcrdma: Remove svc_rdma_xdr_decode_deferred_req() · da7049f8

Authored by Chuck Lever on May 26, 2015

svc_rdma_xdr_decode_deferred_req() indexes an array with an
un-byte-swapped value off the wire. Fortunately this function
isn't used anywhere, so simply remove it.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

SUNRPC: Move EXPORT_SYMBOL for svc_process · 3f87d5d6

Authored by Chuck Lever on May 26, 2015

Clean up.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

01 Jun, 2015 1 commit

uapi/nfs: Add NFSv4.1 ACL definitions · e2b836cf

Authored by Andreas Gruenbacher on Apr 22, 2015

Add the ACL related protocol definitions which were added in the NFSv4.1
specification.

(But we're not using them yet.)
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

29 May, 2015 4 commits

nfsd: Remove dead declarations · 2f6b3879

Authored by Andreas Gruenbacher on Apr 24, 2015

Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: work around a gcc-5.1 warning · 6ac75368

Authored by Arnd Bergmann on May 12, 2015

gcc-5.0 warns about a potential uninitialized variable use in nfsd:

fs/nfsd/nfs4state.c: In function 'nfsd4_process_open2':
fs/nfsd/nfs4state.c:3781:3: warning: 'old_deny_bmap' may be used uninitialized in this function [-Wmaybe-uninitialized]
   reset_union_bmap_deny(old_deny_bmap, stp);
   ^
fs/nfsd/nfs4state.c:3760:16: note: 'old_deny_bmap' was declared here
  unsigned char old_deny_bmap;
                ^

This is a false positive, the code path that is warned about cannot
actually be reached.

This adds an initialization for the variable to make the warning go
away.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: Checking for acl support does not require fetching any acls · 0c9d65e7

Authored by Andreas Gruenbacher on Apr 24, 2015

Whether or not a file system supports acls can be determined with
IS_POSIXACL(inode) and does not require trying to fetch any acls; the code for
computing the supported_attrs and aclsupport attributes can be simplified.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: Disable NFSv2 timestamp workaround for NFSv3+ · cc265089

Authored by Andreas Gruenbacher on May 09, 2015

NFSv2 can set the atime and/or mtime of a file to specific timestamps but not
to the server's current time.  To implement the equivalent of utimes("file",
NULL), it uses a heuristic.

NFSv3 and later do support setting the atime and/or mtime to the server's
current time directly.  The NFSv2 heuristic is still enabled, and causes
timestamps to be set wrong sometimes.

Fix this by moving the heuristic into the NFSv2 specific code.  We can leave it
out of the create code path: the owner can always set timestamps arbitrarily,
and the workaround would never trigger.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

07 May, 2015 1 commit

nfsd: stop READDIRPLUS returning inconsistent attributes · 43b0e7ea

Authored by NeilBrown on May 03, 2015

The NFSv3 READDIRPLUS gets some of the returned attributes from the
readdir, and some from an inode returned from a new lookup.  The two
objects could be different thanks to intervening renames.

The attributes in READDIRPLUS are optional, so let's just skip them if
we notice this case.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
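
One way to picture "skip the attributes if the two objects differ" is the sketch below. It assumes the mismatch is detected by comparing inode numbers, which the message does not spell out; all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified views of the two sources of information. */
struct dir_entry {
	uint64_t ino;   /* inode number reported by readdir */
};

struct looked_up {
	uint64_t ino;   /* inode number found by the fresh lookup */
	uint32_t mode;
};

struct entry_plus {
	uint64_t fileid;
	bool has_attrs;  /* the attributes are optional in the reply */
	uint32_t mode;
};

/* Fill one reply entry; attributes are attached only if both sources agree. */
static void encode_entry_plus(struct entry_plus *out, const struct dir_entry *de,
			      const struct looked_up *lu)
{
	assert(out && de);
	out->fileid = de->ino;
	out->has_attrs = false;
	out->mode = 0;
	if (lu && lu->ino == de->ino) {
		out->has_attrs = true;
		out->mode = lu->mode;
	}
}
```

A client that receives an entry without attributes can fetch them itself, so omitting them is the safe fallback after a racing rename.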

05 May, 2015 9 commits

Documentation: remove overloads-avoided counter from knfsd-stats.txt · 72faedae

Authored by Scott Mayhew on Apr 29, 2015

The 'overloads-avoided' counter itself was removed several years ago by
commit 78c210ef (Revert "knfsd: avoid overloading the CPU scheduler with
enormous load averages").
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: remove nfsd_close · fd891454

Authored by Christoph Hellwig on Apr 28, 2015

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: skip CB_NULL probes for 4.1 or later · 4bd9e9b7

Authored by Christoph Hellwig on Apr 30, 2015

With sessions in v4.1 or later we don't need to manually probe the backchannel
connection, so we can declare it up instantly after setting up the RPC client.

Note that we really should split nfsd4_run_cb_work in the long run, this is
just the least intrusive fix for now.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: fix callback restarts · cba5f62b

Authored by Christoph Hellwig on Apr 30, 2015

Checking the rpc_client pointer is not a reliable way to detect
backchannel changes: cl_cb_client is changed only after shutting down
the rpc client, so the condition cl_cb_client = tk_client will always be
true.

Check the RPC_TASK_KILLED flag instead, and rewrite the code to avoid
the buggy cl_callbacks list and fix the lifetime rules due to double
calls of the ->prepare callback operations method for this retry case.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

nfsd: split transport vs operation errors for callbacks · ef2a1b3e

Authored by Christoph Hellwig on Apr 30, 2015

We must only increment the sequence id if the client has seen and responded
to a request. If we failed to deliver it to the client we must resend with
the same sequence id. So, just like the client, track errors at the transport
level differently from those returned in the XDR.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

svcrpc: fix potential GSSX_ACCEPT_SEC_CONTEXT decoding failures · 9507271d

Authored by Scott Mayhew on Apr 28, 2015

In an environment where the KDC is running Active Directory, the
exported composite name field returned in the context could be large
enough to span a page boundary.  Attaching a scratch buffer to the
decoding xdr_stream helps deal with those cases.

The case where we saw this was actually due to behavior that's been
fixed in newer gss-proxy versions, but we're fixing it here too.
Signed-off-by: Scott Mayhew <smayhew@redhat.com>
Cc: stable@vger.kernel.org
Reviewed-by: Simo Sorce <simo@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
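
Why a scratch buffer helps when a field spans a page boundary can be shown with a toy two-segment stream. This is a simplified model under stated assumptions, not the SUNRPC xdr_stream API: `struct stream` and `stream_decode()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stream whose data continues from seg[0] into seg[1]. */
struct stream {
	const unsigned char *seg[2];
	size_t len[2];
	size_t pos;              /* offset into the two segments taken together */
	unsigned char *scratch;  /* optional bounce buffer, may be NULL */
	size_t scratch_len;
};

/* Return a pointer to nbytes of contiguous data, or NULL if unavailable. */
static const unsigned char *stream_decode(struct stream *s, size_t nbytes)
{
	size_t total = s->len[0] + s->len[1];
	const unsigned char *p;

	assert(s->pos <= total);
	if (nbytes > total - s->pos)
		return NULL;                      /* not enough data left */
	if (s->pos + nbytes <= s->len[0]) {
		p = s->seg[0] + s->pos;           /* entirely in the first segment */
	} else if (s->pos >= s->len[0]) {
		p = s->seg[1] + (s->pos - s->len[0]);  /* entirely in the second */
	} else {
		size_t first = s->len[0] - s->pos;

		/* The item straddles the boundary: it must be copied somewhere. */
		if (!s->scratch || nbytes > s->scratch_len)
			return NULL;
		memcpy(s->scratch, s->seg[0] + s->pos, first);
		memcpy(s->scratch + first, s->seg[1], nbytes - first);
		p = s->scratch;
	}
	s->pos += nbytes;
	return p;
}
```

In this model a straddling read fails outright until a scratch buffer is attached, which mirrors the decoding failure the commit describes for large composite names.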

nfsd: fix pNFS return on close semantics · 8287f009

Authored by Sachin Bhamare on Apr 27, 2015

For the sake of forgetful clients, the server should return the layouts
to the file system on 'last close' of a file (assuming that there are no
delegations outstanding to that particular client) or on delegreturn
(assuming that there are no opens on a file from that particular
client).

In theory the information is all there in current data structures, but
it's not efficiently available; nfs4_file->fi_ref includes references on
the file across all clients, but we need a per-(client, file) count.
Walking through lots of stateid's to calculate this on each close or
delegreturn would be painful.

This patch introduces infrastructure to maintain per-client opens and
delegation counters on a per-file basis.

[hch: ported to the mainline pNFS support, merged various fixes from Jeff]
Signed-off-by: Sachin Bhamare <sachin.bhamare@primarydata.com>
Signed-off-by: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
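
The per-(client, file) bookkeeping described above might look roughly like this. It is a sketch with hypothetical names (`struct client_file` and its helpers), and it ignores locking and lookup entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters kept for one (client, file) pair. */
struct client_file {
	unsigned int opens;
	unsigned int delegations;
};

/* True when this client holds neither opens nor delegations on the file. */
static bool client_file_idle(const struct client_file *cf)
{
	return cf->opens == 0 && cf->delegations == 0;
}

/* Drop one open; returns true if layouts should now be returned. */
static bool client_file_close(struct client_file *cf)
{
	assert(cf->opens > 0);
	cf->opens--;
	return client_file_idle(cf);
}

/* Drop one delegation; returns true if layouts should now be returned. */
static bool client_file_delegreturn(struct client_file *cf)
{
	assert(cf->delegations > 0);
	cf->delegations--;
	return client_file_idle(cf);
}
```

Keeping the two counters next to each other makes the "last close" and "delegreturn" checks constant-time instead of a walk over every stateid.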

nfsd: fix the check for confirmed openowner in nfs4_preprocess_stateid_op · ebe9cb3b

Authored by Christoph Hellwig on Apr 28, 2015

If we find a non-confirmed openowner we jump to exit the function, but do
not set an error value.  Fix this by factoring out a helper to do the
check and properly set the error from nfsd4_validate_stateid.

Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
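
The bug class here is a jump to the exit label while the status variable still says "success". A minimal sketch of the fixed shape, with hypothetical names and illustrative status codes rather than the real nfsd ones:

```c
#include <assert.h>
#include <stdbool.h>

enum { ST_OK = 0, ST_BAD_STATEID = 1 };  /* illustrative status codes */

/* Hypothetical stand-in for an openowner. */
struct openowner {
	bool confirmed;
};

/* Helper: the check and the error it implies live in one place. */
static int check_openowner_confirmed(const struct openowner *oo)
{
	assert(oo);
	return oo->confirmed ? ST_OK : ST_BAD_STATEID;
}

static int preprocess_stateid(const struct openowner *oo)
{
	int status;

	status = check_openowner_confirmed(oo);
	if (status)
		goto out;  /* status already carries the error */
	/* ... further validation would go here ... */
out:
	return status;
}
```

Because the helper returns the status directly, there is no path that reaches `out` with a failed check and a zero return value.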

nfsd/blocklayout: pretend we can send deviceid notifications · 40cdc7a5

Authored by Christoph Hellwig on Apr 27, 2015

Commit df52699e ("NFSv4.1: Don't cache deviceids that have no
notifications") causes the Linux NFS client to stop caching deviceid's
unless a server pretends to support deviceid notifications.  While this
behavior is stupid and the language around this area in rfc5661 is a
mess clarified by an erratum that I submitted, Trond insists on this
behavior.  Not caching deviceids degrades block layout performance
massively as a GETDEVICEINFO is fairly expensive.

So add this hack to make the Linux client happy again.

Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

04 May, 2015 8 commits

Linux 4.1-rc2 · 5ebe6afa

Authored by Linus Torvalds on May 03, 2015

Merge tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 8663da2c

Authored by Linus Torvalds on May 03, 2015

Pull ext4 fixes from Ted Ts'o:
 "Some miscellaneous bug fixes and some final on-disk and ABI changes
  for ext4 encryption which provide better security and performance"

* tag 'for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix growing of tiny filesystems
  ext4: move check under lock scope to close a race.
  ext4: fix data corruption caused by unwritten and delayed extents
  ext4 crypto: remove duplicated encryption mode definitions
  ext4 crypto: do not select from EXT4_FS_ENCRYPTION
  ext4 crypto: add padding to filenames before encrypting
  ext4 crypto: simplify and speed up filename encryption

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 101a6fd3

Authored by Linus Torvalds on May 03, 2015

Pull drm fixes from Dave Airlie:
 "One intel fix, one rockchip fix, and a bunch of radeon fixes for some
  regressions from audio rework and vm stability"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
  drm/i915/chv: Implement WaDisableShadowRegForCpd
  drm/radeon: fix userptr return value checking (v2)
  drm/radeon: check new address before removing old one
  drm/radeon: reset BOs address after clearing it.
  drm/radeon: fix lockup when BOs aren't part of the VM on release
  drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
  drm/radeon: adjust pll when audio is not enabled
  drm/radeon: only enable audio streams if the monitor supports it
  drm/radeon: only mark audio as connected if the monitor supports it (v3)
  drm/radeon/audio: don't enable packets until the end
  drm/radeon: drop dce6_dp_enable
  drm/radeon: fix ordering of AVI packet setup
  drm/radeon: Use drm_calloc_ab for CS relocs
  drm/rockchip: fix error check when getting irq
  MAINTAINERS: add entry for Rockchip drm drivers

Merge tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel into drm-fixes · 71aee819

Authored by Dave Airlie on May 04, 2015

Just a single intel fix
* tag 'drm-intel-fixes-2015-04-30' of git://anongit.freedesktop.org/drm-intel:
  drm/i915/chv: Implement WaDisableShadowRegForCpd

Merge branch 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip into drm-fixes · df9ebeb2

Authored by Dave Airlie on May 04, 2015

one fix and maintainers update
* 'drm-next0420' of https://github.com/markyzq/kernel-drm-rockchip:
  drm/rockchip: fix error check when getting irq
  MAINTAINERS: add entry for Rockchip drm drivers

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 61f06db0

Authored by Linus Torvalds on May 03, 2015

Pull SCSI fixes from James Bottomley:
 "This is three logical fixes (as 5 patches).

  The 3ware class of drivers were causing an oops with multiqueue by
  tearing down the command mappings after completing the command (where
  the variables in the command used to tear down the mapping were
  no-longer valid).  There's also a fix for the qnap iscsi target which
  was choking on us sending it commands that were too long and a fix for
  the reworked aha1542 allocating GFP_KERNEL under a lock"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  3w-9xxx: fix command completion race
  3w-xxxx: fix command completion race
  3w-sas: fix command completion race
  aha1542: Allocate memory before taking a lock
  SCSI: add 1024 max sectors black list flag

Merge branch 'next' of git://git.infradead.org/users/vkoul/slave-dma · 33332224

Authored by Linus Torvalds on May 03, 2015

Pull slave dmaengine fixes from Vinod Koul:
 "Here are the fixes in dmaengine subsystem for rc2:

   - privatecnt fix for slave dma request API by Christopher

   - warn fix for PM ifdef in usb-dmac by Geert

   - fix hardware dependency for xgene by Jean"

* 'next' of git://git.infradead.org/users/vkoul/slave-dma:
  dmaengine: increment privatecnt when using dma_get_any_slave_channel
  dmaengine: xgene: Set hardware dependency
  dmaengine: usb-dmac: Protect PM-only functions to kill warning

Merge tag 'powerpc-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux · 180d89f6

Authored by Linus Torvalds on May 03, 2015

Pull powerpc fixes from Michael Ellerman:
 - build fix for SMP=n in book3s_xics.c
 - fix for Daniel's pci_controller_ops on powernv.
 - revert the TM syscall abort patch for now.
 - CPU affinity fix from Nathan.
 - two EEH fixes from Gavin.
 - fix for CR corruption from Sam.
 - selftest build fix.

* tag 'powerpc-4.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
  powerpc/powernv: Restore non-volatile CRs after nap
  powerpc/eeh: Delay probing EEH device during hotplug
  powerpc/eeh: Fix race condition in pcibios_set_pcie_reset_state()
  powerpc/pseries: Correct cpu affinity for dlpar added cpus
  selftests/powerpc: Fix the pmu install rule
  Revert "powerpc/tm: Abort syscalls in active transactions"
  powerpc/powernv: Fix early pci_controller_ops loading.
  powerpc/kvm: Fix SMP=n build error in book3s_xics.c

03 May, 2015 3 commits

ext4: fix growing of tiny filesystems · 2c869b26

Authored by Jan Kara on May 02, 2015

The estimate of necessary transaction credits in ext4_flex_group_add()
is too pessimistic. It reserves credit for sb, resize inode, and resize
inode dindirect block for each group added in a flex group although they
are always the same block and thus it is enough to account them only
once. Also the number of modified GDT block is overestimated since we
fit EXT4_DESC_PER_BLOCK(sb) descriptors in one block.

Make the estimation more precise. That reduces the number of requested
credits enough that we can grow a 20 MB filesystem (which has a 1 MB
journal, 79 reserved GDT blocks, and flex group size 16 by default).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Eric Sandeen <sandeen@redhat.com>
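
The difference between charging shared blocks per group and charging them once can be shown with a rough arithmetic model. Both functions are hypothetical; in particular the per-group factor of 4 in the first one and the exact terms in the second are assumptions made for illustration, and the real ext4_flex_group_add() may differ.

```c
#include <assert.h>

/* Pessimistic model: a fixed cost is charged again for every group added. */
static unsigned int credits_per_group(unsigned int groups,
				      unsigned int reserved_gdb)
{
	return groups * 4 + reserved_gdb;
}

/* Tighter model: shared blocks counted once, GDT blocks by their capacity. */
static unsigned int credits_shared(unsigned int groups,
				   unsigned int reserved_gdb,
				   unsigned int desc_per_block)
{
	unsigned int credit = 3;  /* sb, resize inode, resize inode dindirect */

	assert(desc_per_block > 0);
	/* one GDT block for alignment slack plus ceil(groups / desc_per_block) */
	credit += 1 + (groups + desc_per_block - 1) / desc_per_block;
	credit += reserved_gdb;   /* reserved GDT dindirect blocks */
	return credit;
}
```

With 16 groups, 79 reserved GDT blocks and an assumed 32 descriptors per block, the first model asks for 143 credits and the second for 84.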

ext4: move check under lock scope to close a race. · 280227a7

Authored by Davide Italiano on May 02, 2015

fallocate() checks that the file is extent-based and returns
EOPNOTSUPP if it is not. Other tasks can convert the file between
indirect and extent mapping, so it is only safe to check after
grabbing the inode mutex.
Signed-off-by: Davide Italiano <dccitaliano@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
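
The check-then-lock race can be sketched in user space with a pthread mutex standing in for the inode mutex. `struct inode_model` and `model_fallocate()` are hypothetical and only model the ordering, not ext4 itself.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical inode: the flag may be flipped by other threads under lock. */
struct inode_model {
	pthread_mutex_t lock;
	bool extent_based;
	long size;
};

/* The format check happens only after the lock is held, so the answer
 * cannot change between the check and the work that depends on it. */
static int model_fallocate(struct inode_model *ino, long new_size)
{
	int ret = 0;

	assert(ino && new_size >= 0);
	pthread_mutex_lock(&ino->lock);
	if (!ino->extent_based) {
		ret = -EOPNOTSUPP;
		goto out;
	}
	if (new_size > ino->size)
		ino->size = new_size;
out:
	pthread_mutex_unlock(&ino->lock);
	return ret;
}
```

Had the flag been tested before taking the lock, a concurrent conversion could slip in between the test and the size update.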

ext4: fix data corruption caused by unwritten and delayed extents · d2dc317d

Authored by Lukas Czerner on May 02, 2015

Currently it is possible to lose a whole file system block's worth of data
when we hit a specific interaction between unwritten and delayed extents
in the extent status tree.

The problem is that once we insert a delayed extent into the extent status
tree, the only way to get rid of it is to write out the delayed buffer.
However, there is a limitation in the extent status tree implementation:
when an unwritten extent is inserted and it contains even a single
delayed block, the whole unwritten extent is marked as delayed.

At this point there is no way to get rid of the delayed extents,
because there are no delayed buffers to write out. So when we write
into said unwritten extent we convert it to written, but it still
remains delayed.

When we try to write into that block later, ext4_da_map_blocks() will set
the buffer new and delayed and map it to an invalid block, which causes
the rest of the block to be zeroed, losing already written data.

For now we can fix this by simply not allowing delayed status to be set on
a written extent in the extent status tree. Also add WARN_ON() to make
sure that we notice if this happens in the future.

This problem can be easily reproduced by running the following xfs_io.

xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
          -c "falloc 0 131072" \
          -c "pwrite -S 0xbb 65536 2048" \
          -c "fsync" /mnt/test/fff

echo 3 > /proc/sys/vm/drop_caches
xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

In theory this can also be reproduced at random by running fsx, but
that is not very reliable; on machines with a bigger page size (like
ppc) it is seen more often (especially with xfstest generic/127).
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org

openanolis / cloud-kernel synced successfully about 1 year ago
