提交 · 1ec8c0c47f63b4af66e3615618f842267ac88b4a · openeuler / raspberrypi-kernel

01 4月, 2015 7 次提交

nfsd: Remove duplicate macro define for max sec label length · 1ec8c0c4

由 Kinglong Mee 提交于 3月 28, 2015

NFS4_MAXLABELLEN has defined for sec label max length, use it directly.
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

1ec8c0c4

nfsd: allow setting acls with unenforceable DENYs · b14f4f7e

由 J. Bruce Fields 提交于 3月 26, 2015

We've been refusing ACLs that DENY permissions that we can't effectively
deny.  (For example, we can't deny permission to read attributes.)

Andreas points out that any DENY of Window's "read", "write", or
"modify" permissions would trigger this.  That would be annoying.

So maybe we should be a little less paranoid, and ignore entirely the
permissions that are meaningless to us.
Reported-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

b14f4f7e

nfsd: NFSD_FAULT_INJECTION depends on DEBUG_FS · 629b8729

由 Chengyu Song 提交于 3月 25, 2015

NFSD_FAULT_INJECTION depends on DEBUG_FS, otherwise the debugfs_create_*
interface may return unexpected error -ENODEV, and cause system crash.
Signed-off-by: NChengyu Song <csong84@gatech.edu>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

629b8729

nfsd: remove unused status arg to nfsd4_cleanup_open_state · 42297899

由 Jeff Layton 提交于 3月 23, 2015

Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

42297899

nfsd: remove bogus setting of status in nfsd4_process_open2 · fc26c386

由 Jeff Layton 提交于 3月 23, 2015

status is always reset after this (and it doesn't make much sense there
anyway).
Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

fc26c386

NFSD: Use correct reply size calculating function · beaca234

由 Kinglong Mee 提交于 3月 15, 2015

ALLOCATE/DEALLOCATE only reply one status value to client,
so, using nfsd4_only_status_rsize for reply size calculating.
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NAnna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

beaca234

NFSD: Using path_equal() for checking two paths · b77a4b2e

由 Kinglong Mee 提交于 3月 15, 2015

Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

b77a4b2e

31 3月, 2015 1 次提交

nfsd: require an explicit option to enable pNFS · f3f03330

由 Christoph Hellwig 提交于 3月 30, 2015

Turns out sending out layouts to any client is a bad idea if they
can't get at the storage device, so require explicit admin action
to enable pNFS.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

f3f03330

26 3月, 2015 5 次提交

NFSD: Fix bad update of layout in nfsd4_return_file_layout · 7890203d

由 Kinglong Mee 提交于 3月 22, 2015

With return layout as, (seg is return layout, lo is record layout)
seg->offset <= lo->offset and layout_end(seg) < layout_end(lo),
nfsd should update lo's offset to seg's end,
and,
seg->offset > lo->offset and layout_end(seg) >= layout_end(lo),
nfsd should update lo's end to seg's offset.

Fixes: 9cf514cc ("nfsd: implement pNFS operations")
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

7890203d

NFSD: Take care the return value from nfsd4_encode_stateid · 376675da

由 Kinglong Mee 提交于 3月 22, 2015

Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

376675da

NFSD: Printk blocklayout length and offset as format 0x%llx · 85369523

由 Kinglong Mee 提交于 3月 22, 2015

When testing pnfs with nfsd_debug on, nfsd print a negative number
of layout length and foff in nfsd4_block_proc_layoutget as,
"GET: -xxxx:-xxx 2"
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

85369523

nfsd: return correct lockowner when there is a race on hash insert · 340f0ba1

由 J. Bruce Fields 提交于 3月 23, 2015

alloc_init_lock_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.

Noticed by inspection after Jeff Layton fixed the same bug for open
owners.  Depending on client behavior, this one may be trickier to
trigger in practice.

Fixes: c58c6610 "nfsd: Protect adding/removing lock owners using client_lock"
Cc: <stable@vger.kernel.org>
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Acked-by: NJeff Layton <jeff.layton@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

340f0ba1

nfsd: return correct openowner when there is a race to put one in the hash · c5952338

由 Jeff Layton 提交于 3月 23, 2015

alloc_init_open_stateowner can return an already freed entry if there is
a race to put openowners in the hashtable.

In commit 7ffb5880, we changed it so that we allocate and initialize
an openowner, and then check to see if a matching one got stuffed into
the hashtable in the meantime. If it did, then we free the one we just
allocated and take a reference on the one already there. There is a bug
here though. The code will then return the pointer to the one that was
allocated (and has now been freed).

This wasn't evident before as this race almost never occurred. The Linux
kernel client used to serialize requests for a single openowner.  That
has changed now with v4.0 kernels, and this race can now easily occur.

Fixes: 7ffb5880
Cc: <stable@vger.kernel.org> # v3.17+
Cc: Trond Myklebust <trond.myklebust@primarydata.com>
Reported-by: NChristoph Hellwig <hch@infradead.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

c5952338

21 3月, 2015 5 次提交

NFSD: Put exports after nfsd4_layout_verify fail · a1420384

由 Kinglong Mee 提交于 3月 15, 2015

Fix commit 9cf514cc (nfsd: implement pNFS operations).
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

a1420384

NFSD: Error out when register_shrinker() fail · a68465c9

由 Kinglong Mee 提交于 3月 19, 2015

If register_shrinker() failed, nfsd will cause a NULL pointer access as,

[ 9250.875465] nfsd: last server has exited, flushing export cache
[ 9251.427270] BUG: unable to handle kernel NULL pointer dereference at           (null)
[ 9251.427393] IP: [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0
[ 9251.427579] PGD 13e4d067 PUD 13e4c067 PMD 0
[ 9251.427633] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 9251.427706] Modules linked in: ip6t_rpfilter ip6t_REJECT bnep bluetooth xt_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw btrfs xfs microcode ppdev serio_raw pcspkr xor libcrc32c raid6_pq e1000 parport_pc parport i2c_piix4 i2c_core nfsd(OE-) auth_rpcgss nfs_acl lockd sunrpc(E) ata_generic pata_acpi
[ 9251.428240] CPU: 0 PID: 1557 Comm: rmmod Tainted: G           OE 3.16.0-rc2+ #22
[ 9251.428366] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
[ 9251.428496] task: ffff880000849540 ti: ffff8800136f4000 task.ti: ffff8800136f4000
[ 9251.428593] RIP: 0010:[<ffffffff8136fc29>]  [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0
[ 9251.428696] RSP: 0018:ffff8800136f7ea0  EFLAGS: 00010207
[ 9251.428751] RAX: 0000000000000000 RBX: ffffffffa0116d48 RCX: dead000000200200
[ 9251.428814] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0116d48
[ 9251.428876] RBP: ffff8800136f7ea0 R08: ffff8800136f4000 R09: 0000000000000001
[ 9251.428939] R10: 8080808080808080 R11: 0000000000000000 R12: ffffffffa011a5a0
[ 9251.429002] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000018ac090
[ 9251.429064] FS:  00007fb9acef0740(0000) GS:ffff88003fa00000(0000) knlGS:0000000000000000
[ 9251.429164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9251.429221] CR2: 0000000000000000 CR3: 0000000031a17000 CR4: 00000000001407f0
[ 9251.429306] Stack:
[ 9251.429410]  ffff8800136f7eb8 ffffffff8136fcdd ffffffffa0116d20 ffff8800136f7ed0
[ 9251.429511]  ffffffff8118a0f2 0000000000000000 ffff8800136f7ee0 ffffffffa00eb765
[ 9251.429610]  ffff8800136f7ef0 ffffffffa010e93c ffff8800136f7f78 ffffffff81104ac2
[ 9251.429709] Call Trace:
[ 9251.429755]  [<ffffffff8136fcdd>] list_del+0xd/0x30
[ 9251.429896]  [<ffffffff8118a0f2>] unregister_shrinker+0x22/0x40
[ 9251.430037]  [<ffffffffa00eb765>] nfsd_reply_cache_shutdown+0x15/0x90 [nfsd]
[ 9251.430106]  [<ffffffffa010e93c>] exit_nfsd+0x9/0x6cd [nfsd]
[ 9251.430192]  [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200
[ 9251.430280]  [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90
[ 9251.430395]  [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b
[ 9251.430457] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
[ 9251.430691] RIP  [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0
[ 9251.430755]  RSP <ffff8800136f7ea0>
[ 9251.430805] CR2: 0000000000000000
[ 9251.431033] ---[ end trace 080f3050d082b4ea ]---
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

a68465c9

NFSD: Take care the return value from nfsd4_decode_stateid · db59c0ef

由 Kinglong Mee 提交于 3月 19, 2015

Return status after nfsd4_decode_stateid failed.
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

db59c0ef

NFSD: Check layout type when returning client layouts · 6f8f28ec

由 Kinglong Mee 提交于 3月 19, 2015

According to RFC5661:
" When lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used
   to identify the file system and all layouts matching the client ID,
   the fsid of the file system, lora_layout_type, and lora_iomode are
   returned.  When lr_returntype is LAYOUTRETURN4_ALL, all layouts
   matching the client ID, lora_layout_type, and lora_iomode are
   returned and the current filehandle is not used. "

When returning client layouts, always check layout type.
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

6f8f28ec

NFSD: restore trace event lost in mismerge · 715a03d2

由 Kinglong Mee 提交于 3月 20, 2015

31ef83dc "nfsd: add trace events" had a typo that dropped a trace
event and replaced it by an incorrect recursive call to
nfsd4_cb_layout_fail.  133d5582 "Subject: nfsd: don't recursively
call nfsd4_cb_layout_fail" fixed the crash, this restores the
tracepoint.
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

715a03d2

20 3月, 2015 1 次提交

Subject: nfsd: don't recursively call nfsd4_cb_layout_fail · 133d5582

由 Christoph Hellwig 提交于 3月 05, 2015

Due to a merge error when creating c5c707f9 ("nfsd: implement pNFS
layout recalls"), we recursively call nfsd4_cb_layout_fail from itself,
leading to stack overflows.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Fixes:  c5c707f9 ("nfsd: implement pNFS layout recalls")
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
---
 fs/nfsd/nfs4layouts.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
index 3c1bfa1..1028a06 100644
--- a/fs/nfsd/nfs4layouts.c
+++ b/fs/nfsd/nfs4layouts.c
@@ -587,8 +587,6 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)

 	rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str, sizeof(addr_str));

-	nfsd4_cb_layout_fail(ls);
-
 	printk(KERN_WARNING
 		"nfsd: client %s failed to respond to layout recall. "
 		"  Fencing..\n", addr_str);
--
1.9.1

133d5582

14 3月, 2015 1 次提交

locks: fix generic_delete_lease tracepoint to use victim pointer · a9b1b455

由 Jeff Layton 提交于 3月 14, 2015

It's possible that "fl" won't point at a valid lock at this point, so
use "victim" instead which is either a valid lock or NULL.
Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>

a9b1b455

13 3月, 2015 3 次提交

fanotify: fix event filtering with FAN_ONDIR set · b3c1030d

由 Suzuki K. Poulose 提交于 3月 12, 2015

With FAN_ONDIR set, the user can end up getting events, which it hasn't
marked.  This was revealed with fanotify04 testcase failure on
Linux-4.0-rc1, and is a regression from 3.19, revealed with 66ba93c0
("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask").

   # /opt/ltp/testcases/bin/fanotify04
   [ ... ]
  fanotify04    7  TPASS  :  event generated properly for type 100000
  fanotify04    8  TFAIL  :  fanotify04.c:147: got unexpected event 30
  fanotify04    9  TPASS  :  No event as expected

The testcase sets the adds the following marks : FAN_OPEN | FAN_ONDIR for
a fanotify on a dir.  Then does an open(), followed by close() of the
directory and expects to see an event FAN_OPEN(0x20).  However, the
fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)).  This happens due to
the flaw in the check for event_mask in fanotify_should_send_event() which
does:

	if (event_mask & marks_mask & ~marks_ignored_mask)
		return true;

where, event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE),
       marks_mask == (FAN_ONDIR | FAN_OPEN),
       marks_ignored_mask == 0

Fix this by masking the outgoing events to the user, as we already take
care of FAN_ONDIR and FAN_EVENT_ON_CHILD.
Signed-off-by: NSuzuki K. Poulose <suzuki.poulose@arm.com>
Tested-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
Reviewed-by: NJan Kara <jack@suse.cz>
Cc: Eric Paris <eparis@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b3c1030d

nilfs2: fix deadlock of segment constructor during recovery · 283ee148

由 Ryusuke Konishi 提交于 3月 12, 2015

According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
during recovery at mount time.  The code path that caused the deadlock was
as follows:

  nilfs_fill_super()
    load_nilfs()
      nilfs_salvage_orphan_logs()
        * Do roll-forwarding, attach segment constructor for recovery,
          and kick it.

        nilfs_segctor_thread()
          nilfs_segctor_thread_construct()
           * A lock is held with nilfs_transaction_lock()
             nilfs_segctor_do_construct()
               nilfs_segctor_drop_written_files()
                 iput()
                   iput_final()
                     write_inode_now()
                       writeback_single_inode()
                         __writeback_single_inode()
                           do_writepages()
                             nilfs_writepage()
                               nilfs_construct_dsync_segment()
                                 nilfs_transaction_lock() --> deadlock

This can happen if commit 7ef3ff2f ("nilfs2: fix deadlock of segment
constructor over I_SYNC flag") is applied and roll-forward recovery was
performed at mount time.  The roll-forward recovery can happen if datasync
write is done and the file system crashes immediately after that.  For
instance, we can reproduce the issue with the following steps:

 < nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
 # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
 # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
 count=1 && reboot -nfh
 < the system will immediately reboot >
 # mount -t nilfs2 /dev/sdb1 /nilfs

The deadlock occurs because iput() can run segment constructor through
writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags.  The
above commit changed segment constructor so that it calls iput()
asynchronously for inodes with i_nlink == 0, but that change was
imperfect.

This fixes the another deadlock by deferring iput() in segment constructor
even for the case that mount is not finished, that is, for the case that
MS_ACTIVE flag is not set.
Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Reported-by: NYuxuan Shui <yshuiv7@gmail.com>
Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

283ee148

ocfs2: make append_dio an incompat feature · 18d585f0

由 Mark Fasheh 提交于 3月 12, 2015

It turns out that making this feature ro_compat isn't quite enough to
prevent accidental corruption on mount from older kernels.  Ocfs2 (like
other file systems) will process orphaned inodes even when the user mounts
in 'ro' mode.  So for the case of a filesystem not knowing the append_dio
feature, mounting the filesystem could result in orphaned-for-dio files
being deleted, which we clearly don't want.

So instead, turn this into an incompat flag.

Btw, this is kind of my fault - initially I asked that we add a flag to
cover the feature and even suggested that we use an ro flag.  It wasn't
until I was looking through our commits for v4.0-rc1 that I realized we
actually want this to be incompat.
Signed-off-by: NMark Fasheh <mfasheh@suse.de>
Cc: Joseph Qi <joseph.qi@huawei.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

18d585f0

06 3月, 2015 3 次提交

Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref. · dd9ef135

由 Quentin Casasnovas 提交于 3月 03, 2015

Improper arithmetics when calculting the address of the extended ref could
lead to an out of bounds memory read and kernel panic.
Signed-off-by: NQuentin Casasnovas <quentin.casasnovas@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.cz>
cc: stable@vger.kernel.org # v3.7+
Signed-off-by: NChris Mason <clm@fb.com>

dd9ef135

Btrfs: fix data loss in the fast fsync path · 3a8b36f3

由 Filipe Manana 提交于 3月 01, 2015

When using the fast file fsync code path we can miss the fact that new
writes happened since the last file fsync and therefore return without
waiting for the IO to finish and write the new extents to the fsync log.

Here's an example scenario where the fsync will miss the fact that new
file data exists that wasn't yet durably persisted:

1. fs_info->last_trans_committed == N - 1 and current transaction is
   transaction N (fs_info->generation == N);

2. do a buffered write;

3. fsync our inode, this clears our inode's full sync flag, starts
   an ordered extent and waits for it to complete - when it completes
   at btrfs_finish_ordered_io(), the inode's last_trans is set to the
   value N (via btrfs_update_inode_fallback -> btrfs_update_inode ->
   btrfs_set_inode_last_trans);

4. transaction N is committed, so fs_info->last_trans_committed is now
   set to the value N and fs_info->generation remains with the value N;

5. do another buffered write, when this happens btrfs_file_write_iter
   sets our inode's last_trans to the value N + 1 (that is
   fs_info->generation + 1 == N + 1);

6. transaction N + 1 is started and fs_info->generation now has the
   value N + 1;

7. transaction N + 1 is committed, so fs_info->last_trans_committed
   is set to the value N + 1;

8. fsync our inode - because it doesn't have the full sync flag set,
   we only start the ordered extent, we don't wait for it to complete
   (only in a later phase) therefore its last_trans field has the
   value N + 1 set previously by btrfs_file_write_iter(), and so we
   have:

       inode->last_trans <= fs_info->last_trans_committed
           (N + 1)              (N + 1)

   Which made us not log the last buffered write and exit the fsync
   handler immediately, returning success (0) to user space and resulting
   in data loss after a crash.

This can actually be triggered deterministically and the following excerpt
from a testcase I made for xfstests triggers the issue. It moves a dummy
file across directories and then fsyncs the old parent directory - this
is just to trigger a transaction commit, so moving files around isn't
directly related to the issue but it was chosen because running 'sync' for
example does more than just committing the current transaction, as it
flushes/waits for all file data to be persisted. The issue can also happen
at random periods, since the transaction kthread periodicaly commits the
current transaction (about every 30 seconds by default).
The body of the test is:

  _scratch_mkfs >> $seqres.full 2>&1
  _init_flakey
  _mount_flakey

  # Create our main test file 'foo', the one we check for data loss.
  # By doing an fsync against our file, it makes btrfs clear the 'needs_full_sync'
  # bit from its flags (btrfs inode specific flags).
  $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 8K" \
                  -c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io

  # Now create one other file and 2 directories. We will move this second file
  # from one directory to the other later because it forces btrfs to commit its
  # currently open transaction if we fsync the old parent directory. This is
  # necessary to trigger the data loss bug that affected btrfs.
  mkdir $SCRATCH_MNT/testdir_1
  touch $SCRATCH_MNT/testdir_1/bar
  mkdir $SCRATCH_MNT/testdir_2

  # Make sure everything is durably persisted.
  sync

  # Write more 8Kb of data to our file.
  $XFS_IO_PROG -c "pwrite -S 0xbb 8K 8K" $SCRATCH_MNT/foo | _filter_xfs_io

  # Move our 'bar' file into a new directory.
  mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar

  # Fsync our first directory. Because it had a file moved into some other
  # directory, this made btrfs commit the currently open transaction. This is
  # a condition necessary to trigger the data loss bug.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1

  # Now fsync our main test file. If the fsync succeeds, we expect the 8Kb of
  # data we wrote previously to be persisted and available if a crash happens.
  # This did not happen with btrfs, because of the transaction commit that
  # happened when we fsynced the parent directory.
  $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo

  # Simulate a crash/power loss.
  _load_flakey_table $FLAKEY_DROP_WRITES
  _unmount_flakey

  _load_flakey_table $FLAKEY_ALLOW_WRITES
  _mount_flakey

  # Now check that all data we wrote before are available.
  echo "File content after log replay:"
  od -t x1 $SCRATCH_MNT/foo

  status=0
  exit

The expected golden output for the test, which is what we get with this
fix applied (or when running against ext3/4 and xfs), is:

  wrote 8192/8192 bytes at offset 0
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  wrote 8192/8192 bytes at offset 8192
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  File content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0020000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
  *
  0040000

Without this fix applied, the output shows the test file does not have
the second 8Kb extent that we successfully fsynced:

  wrote 8192/8192 bytes at offset 0
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  wrote 8192/8192 bytes at offset 8192
  XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
  File content after log replay:
  0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
  *
  0020000

So fix this by skipping the fsync only if we're doing a full sync and
if the inode's last_trans is <= fs_info->last_trans_committed, or if
the inode is already in the log. Also remove setting the inode's
last_trans in btrfs_file_write_iter since it's useless/unreliable.

Also because btrfs_file_write_iter no longer sets inode->last_trans to
fs_info->generation + 1, don't set last_trans to 0 if we bail out and don't
bail out if last_trans is 0, otherwise something as simple as the following
example wouldn't log the second write on the last fsync:

  1. write to file

  2. fsync file

  3. fsync file
       |--> btrfs_inode_in_log() returns true and it set last_trans to 0

  4. write to file
       |--> btrfs_file_write_iter() no longers sets last_trans, so it
            remained with a value of 0
  5. fsync
       |--> inode->last_trans == 0, so it bails out without logging the
            second write

A test case for xfstests will be sent soon.

CC: <stable@vger.kernel.org>
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

3a8b36f3

Btrfs: remove extra run_delayed_refs in update_cowonly_root · f5c0a122

由 Josef Bacik 提交于 3月 02, 2015

This got added with my dirty_bgs patch, it's not needed.  Thanks,
Signed-off-by: NJosef Bacik <jbacik@fb.com>
Signed-off-by: NChris Mason <clm@fb.com>

f5c0a122

05 3月, 2015 1 次提交

locks: fix fasync_struct memory leak in lease upgrade/downgrade handling · 0164bf02

由 Jeff Layton 提交于 3月 04, 2015

Commit 8634b51f (locks: convert lease handling to file_lock_context)
introduced a regression in the handling of lease upgrade/downgrades.

In the event that we already have a lease on a file and are going to
either upgrade or downgrade it, we skip doing any list insertion or
deletion and simply re-call lm_setup on the existing lease.

As of commit 8634b51f however, we end up calling lm_setup on the
lease that was passed in, instead of on the existing lease. This causes
us to leak the fasync_struct that was allocated in the event that there
was not already an existing one (as it always appeared that there
wasn't one).

Fixes: 8634b51f (locks: convert lease handling to file_lock_context)
Reported-and-Tested-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>

0164bf02

04 3月, 2015 4 次提交

NFSv4.1: Clear the old state by our client id before establishing a new lease · e11259f9

由 Trond Myklebust 提交于 3月 03, 2015

If the call to exchange-id returns with the EXCHGID4_FLAG_CONFIRMED_R flag
set, then that means our lease was established by a previous mount instance.
Ensure that we detect this situation, and that we clear the state held by
that mount.
Reported-by: NJorge Mora <Jorge.Mora@netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

e11259f9

NFSv4: Fix a race in NFSv4.1 server trunking discovery · 48d66b97

由 Trond Myklebust 提交于 3月 03, 2015

We do not want to allow a race with another NFS mount to cause
nfs41_walk_client_list() to establish a lease on our nfs_client before
we're done checking for trunking.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

48d66b97

NFS: Don't write enable new pages while an invalidation is proceeding · ef070dcb

由 Trond Myklebust 提交于 3月 03, 2015

nfs_vm_page_mkwrite() should wait until the page cache invalidation
is finished. This is the second patch in a 2 patch series to deprecate
the NFS client's reliance on nfs_release_page() in the context of
nfs_invalidate_mapping().
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ef070dcb

NFS: Fix a regression in the read() syscall · 874f9463

由 Trond Myklebust 提交于 3月 02, 2015

When invalidating the page cache for a regular file, we want to first
sync all dirty data to disk and then call invalidate_inode_pages2().
The latter relies on nfs_launder_page() and nfs_release_page() to deal
respectively with dirty pages, and unstable written pages.

When commit 95905446 ("NFS: avoid deadlocks with loop-back mounted
NFS filesystems.") changed the behaviour of nfs_release_page(), then it
made it possible for invalidate_inode_pages2() to fail with an EBUSY.
Unfortunately, that error is then propagated back to read().

Let's therefore work around the problem for now by protecting the call
to sync the data and invalidate_inode_pages2() so that they are atomic
w.r.t. the addition of new writes.
Later on, we can revisit whether or not we still need nfs_launder_page()
and nfs_release_page().
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

874f9463

03 3月, 2015 9 次提交

eCryptfs: don't pass fs-specific ioctl commands through · 6d65261a

由 Tyler Hicks 提交于 2月 24, 2015

eCryptfs can't be aware of what to expect when after passing an
arbitrary ioctl command through to the lower filesystem. The ioctl
command may trigger an action in the lower filesystem that is
incompatible with eCryptfs.

One specific example is when one attempts to use the Btrfs clone
ioctl command when the source file is in the Btrfs filesystem that
eCryptfs is mounted on top of and the destination fd is from a new file
created in the eCryptfs mount. The ioctl syscall incorrectly returns
success because the command is passed down to Btrfs which thinks that it
was able to do the clone operation. However, the result is an empty
eCryptfs file.

This patch allows the trim, {g,s}etflags, and {g,s}etversion ioctl
commands through and then copies up the inode metadata from the lower
inode to the eCryptfs inode to catch any changes made to the lower
inode's metadata. Those five ioctl commands are mostly common across all
filesystems but the whitelist may need to be further pruned in the
future.

https://bugzilla.kernel.org/show_bug.cgi?id=93691
https://launchpad.net/bugs/1305335Signed-off-by: NTyler Hicks <tyhicks@canonical.com>
Cc: Rocko <rockorequin@hotmail.com>
Cc: Colin Ian King <colin.king@canonical.com>
Cc: stable@vger.kernel.org # v2.6.36+: c43f7b8f eCryptfs: Handle ioctl calls with unlocked and compat functions

6d65261a

NFSv4: Ensure we skip delegations that are already being returned · ec3ca4e5

由 Trond Myklebust 提交于 2月 26, 2015

In nfs_client_return_marked_delegations() and nfs_delegation_reap_unclaimed()
we want to optimise the loop traversal by skipping delegations that are
already in the process of being returned.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ec3ca4e5

NFSv4: Pin the superblock while we're returning the delegation · 9f0f8e12

由 Trond Myklebust 提交于 2月 26, 2015

This patch ensures that the superblock doesn't go ahead and disappear
underneath us while the state manager thread is returning delegations.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

9f0f8e12

NFSv4: Ensure we honour NFS_DELEGATION_RETURNING in nfs_inode_set_delegation() · ade04647

由 Trond Myklebust 提交于 2月 27, 2015

Ensure that nfs_inode_set_delegation() doesn't inadvertently detach a
delegation that is already in the process of being returned.
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

ade04647

T
NFSv4: Ensure that we don't reap a delegation that is being returned · b04b22f4
由 Trond Myklebust 提交于 2月 26, 2015
```
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
```
b04b22f4

NFS: Fix stateid used for NFS v4 closes · 369d6b7f

由 Anna Schumaker 提交于 3月 02, 2015

After 566fcec6 the client uses the "current stateid" from the
nfs4_state structure to close a file.  This could potentially contain a
delegation stateid, which is disallowed by the protocol and causes
servers to return NFS4ERR_BAD_STATEID.  This patch restores the
(correct) behavior of sending the open stateid to close a file.
Reported-by: NOlga Kornievskaia <kolga@netapp.com>
Fixes: 566fcec6 (NFSv4: Fix an atomicity problem in CLOSE)
Signed-off-by: NAnna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>

369d6b7f

Btrfs: incremental send, don't rename a directory too soon · 84471e24

由 Filipe Manana 提交于 2月 28, 2015

There's one more case where we can't issue a rename operation for a
directory as soon as we process it. We used to delay directory renames
only if they have some ancestor directory with a higher inode number
that got renamed too, but there's another case where we need to delay
the rename too - when a directory A is renamed to the old name of a
directory B but that directory B has its rename delayed because it
has now (in the send root) an ancestor with a higher inode number that
was renamed. If we don't delay the directory rename in this case, the
receiving end of the send stream will attempt to rename A to the old
name of B before B got renamed to its new name, which results in a
"directory not empty" error. So fix this by delaying directory renames
for this case too.

Steps to reproduce:

  $ mkfs.btrfs -f /dev/sdb
  $ mount /dev/sdb /mnt

  $ mkdir /mnt/a
  $ mkdir /mnt/b
  $ mkdir /mnt/c
  $ touch /mnt/a/file

  $ btrfs subvolume snapshot -r /mnt /mnt/snap1

  $ mv /mnt/c /mnt/x
  $ mv /mnt/a /mnt/x/y
  $ mv /mnt/b /mnt/a

  $ btrfs subvolume snapshot -r /mnt /mnt/snap2

  $ btrfs send /mnt/snap1 -f /tmp/1.send
  $ btrfs send -p /mnt/snap1 /mnt/snap2 -f /tmp/2.send

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt2
  $ btrfs receive /mnt2 -f /tmp/1.send
  $ btrfs receive /mnt2 -f /tmp/2.send
  ERROR: rename b -> a failed. Directory not empty

A test case for xfstests follows soon.
Reported-by: NAmes Cornish <ames@cornishes.net>
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

84471e24

btrfs: fix lost return value due to variable shadowing · 1932b7be

由 David Sterba 提交于 2月 24, 2015

A block-local variable stores error code but btrfs_get_blocks_direct may
not return it in the end as there's a ret defined in the function scope.

CC: <stable@vger.kernel.org>	# 3.6+
Fixes: d187663e ("Btrfs: lock extents as we map them in DIO")
Signed-off-by: NDavid Sterba <dsterba@suse.cz>
Signed-off-by: NChris Mason <clm@fb.com>

1932b7be

Btrfs: do not ignore errors from btrfs_lookup_xattr in do_setxattr · 5cdf83ed

由 Filipe Manana 提交于 2月 23, 2015

The return value from btrfs_lookup_xattr() can be a pointer encoding an
error, therefore deal with it. This fixes commit 5f5bc6b1
("Btrfs: make xattr replace operations atomic").
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NChris Mason <clm@fb.com>

5cdf83ed