1. 01 4月, 2015 7 次提交
  2. 31 3月, 2015 1 次提交
  3. 26 3月, 2015 5 次提交
  4. 21 3月, 2015 5 次提交
    • K
      NFSD: Put exports after nfsd4_layout_verify fail · a1420384
      Kinglong Mee 提交于
      Fix commit 9cf514cc (nfsd: implement pNFS operations).
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      a1420384
    • K
      NFSD: Error out when register_shrinker() fail · a68465c9
      Kinglong Mee 提交于
      If register_shrinker() failed, nfsd will cause a NULL pointer access as,
      
      [ 9250.875465] nfsd: last server has exited, flushing export cache
      [ 9251.427270] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [ 9251.427393] IP: [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0
      [ 9251.427579] PGD 13e4d067 PUD 13e4c067 PMD 0
      [ 9251.427633] Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
      [ 9251.427706] Modules linked in: ip6t_rpfilter ip6t_REJECT bnep bluetooth xt_conntrack cfg80211 rfkill ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw btrfs xfs microcode ppdev serio_raw pcspkr xor libcrc32c raid6_pq e1000 parport_pc parport i2c_piix4 i2c_core nfsd(OE-) auth_rpcgss nfs_acl lockd sunrpc(E) ata_generic pata_acpi
      [ 9251.428240] CPU: 0 PID: 1557 Comm: rmmod Tainted: G           OE 3.16.0-rc2+ #22
      [ 9251.428366] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
      [ 9251.428496] task: ffff880000849540 ti: ffff8800136f4000 task.ti: ffff8800136f4000
      [ 9251.428593] RIP: 0010:[<ffffffff8136fc29>]  [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0
      [ 9251.428696] RSP: 0018:ffff8800136f7ea0  EFLAGS: 00010207
      [ 9251.428751] RAX: 0000000000000000 RBX: ffffffffa0116d48 RCX: dead000000200200
      [ 9251.428814] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffffa0116d48
      [ 9251.428876] RBP: ffff8800136f7ea0 R08: ffff8800136f4000 R09: 0000000000000001
      [ 9251.428939] R10: 8080808080808080 R11: 0000000000000000 R12: ffffffffa011a5a0
      [ 9251.429002] R13: 0000000000000800 R14: 0000000000000000 R15: 00000000018ac090
      [ 9251.429064] FS:  00007fb9acef0740(0000) GS:ffff88003fa00000(0000) knlGS:0000000000000000
      [ 9251.429164] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 9251.429221] CR2: 0000000000000000 CR3: 0000000031a17000 CR4: 00000000001407f0
      [ 9251.429306] Stack:
      [ 9251.429410]  ffff8800136f7eb8 ffffffff8136fcdd ffffffffa0116d20 ffff8800136f7ed0
      [ 9251.429511]  ffffffff8118a0f2 0000000000000000 ffff8800136f7ee0 ffffffffa00eb765
      [ 9251.429610]  ffff8800136f7ef0 ffffffffa010e93c ffff8800136f7f78 ffffffff81104ac2
      [ 9251.429709] Call Trace:
      [ 9251.429755]  [<ffffffff8136fcdd>] list_del+0xd/0x30
      [ 9251.429896]  [<ffffffff8118a0f2>] unregister_shrinker+0x22/0x40
      [ 9251.430037]  [<ffffffffa00eb765>] nfsd_reply_cache_shutdown+0x15/0x90 [nfsd]
      [ 9251.430106]  [<ffffffffa010e93c>] exit_nfsd+0x9/0x6cd [nfsd]
      [ 9251.430192]  [<ffffffff81104ac2>] SyS_delete_module+0x162/0x200
      [ 9251.430280]  [<ffffffff81013b69>] ? do_notify_resume+0x59/0x90
      [ 9251.430395]  [<ffffffff816f2369>] system_call_fastpath+0x16/0x1b
      [ 9251.430457] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
      [ 9251.430691] RIP  [<ffffffff8136fc29>] __list_del_entry+0x29/0xd0
      [ 9251.430755]  RSP <ffff8800136f7ea0>
      [ 9251.430805] CR2: 0000000000000000
      [ 9251.431033] ---[ end trace 080f3050d082b4ea ]---
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      a68465c9
    • K
      NFSD: Take care the return value from nfsd4_decode_stateid · db59c0ef
      Kinglong Mee 提交于
      Return status after nfsd4_decode_stateid failed.
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      db59c0ef
    • K
      NFSD: Check layout type when returning client layouts · 6f8f28ec
      Kinglong Mee 提交于
      According to RFC5661:
      " When lr_returntype is LAYOUTRETURN4_FSID, the current filehandle is used
         to identify the file system and all layouts matching the client ID,
         the fsid of the file system, lora_layout_type, and lora_iomode are
         returned.  When lr_returntype is LAYOUTRETURN4_ALL, all layouts
         matching the client ID, lora_layout_type, and lora_iomode are
         returned and the current filehandle is not used. "
      
      When returning client layouts, always check layout type.
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      6f8f28ec
    • K
      NFSD: restore trace event lost in mismerge · 715a03d2
      Kinglong Mee 提交于
      31ef83dc "nfsd: add trace events" had a typo that dropped a trace
      event and replaced it by an incorrect recursive call to
      nfsd4_cb_layout_fail.  133d5582 "Subject: nfsd: don't recursively
      call nfsd4_cb_layout_fail" fixed the crash, this restores the
      tracepoint.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      715a03d2
  5. 20 3月, 2015 1 次提交
    • C
      Subject: nfsd: don't recursively call nfsd4_cb_layout_fail · 133d5582
      Christoph Hellwig 提交于
      Due to a merge error when creating c5c707f9 ("nfsd: implement pNFS
      layout recalls"), we recursively call nfsd4_cb_layout_fail from itself,
      leading to stack overflows.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Fixes:  c5c707f9 ("nfsd: implement pNFS layout recalls")
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      ---
       fs/nfsd/nfs4layouts.c | 2 --
       1 file changed, 2 deletions(-)
      
      diff --git a/fs/nfsd/nfs4layouts.c b/fs/nfsd/nfs4layouts.c
      index 3c1bfa1..1028a06 100644
      --- a/fs/nfsd/nfs4layouts.c
      +++ b/fs/nfsd/nfs4layouts.c
      @@ -587,8 +587,6 @@ nfsd4_cb_layout_fail(struct nfs4_layout_stateid *ls)
      
       	rpc_ntop((struct sockaddr *)&clp->cl_addr, addr_str, sizeof(addr_str));
      
      -	nfsd4_cb_layout_fail(ls);
      -
       	printk(KERN_WARNING
       		"nfsd: client %s failed to respond to layout recall. "
       		"  Fencing..\n", addr_str);
      --
      1.9.1
      133d5582
  6. 14 3月, 2015 1 次提交
  7. 13 3月, 2015 3 次提交
    • S
      fanotify: fix event filtering with FAN_ONDIR set · b3c1030d
      Suzuki K. Poulose 提交于
      With FAN_ONDIR set, the user can end up getting events, which it hasn't
      marked.  This was revealed with fanotify04 testcase failure on
      Linux-4.0-rc1, and is a regression from 3.19, revealed with 66ba93c0
      ("fanotify: don't set FAN_ONDIR implicitly on a marks ignored mask").
      
         # /opt/ltp/testcases/bin/fanotify04
         [ ... ]
        fanotify04    7  TPASS  :  event generated properly for type 100000
        fanotify04    8  TFAIL  :  fanotify04.c:147: got unexpected event 30
        fanotify04    9  TPASS  :  No event as expected
      
      The testcase sets the adds the following marks : FAN_OPEN | FAN_ONDIR for
      a fanotify on a dir.  Then does an open(), followed by close() of the
      directory and expects to see an event FAN_OPEN(0x20).  However, the
      fanotify returns (FAN_OPEN|FAN_CLOSE_NOWRITE(0x10)).  This happens due to
      the flaw in the check for event_mask in fanotify_should_send_event() which
      does:
      
      	if (event_mask & marks_mask & ~marks_ignored_mask)
      		return true;
      
      where, event_mask == (FAN_ONDIR | FAN_CLOSE_NOWRITE),
             marks_mask == (FAN_ONDIR | FAN_OPEN),
             marks_ignored_mask == 0
      
      Fix this by masking the outgoing events to the user, as we already take
      care of FAN_ONDIR and FAN_EVENT_ON_CHILD.
      Signed-off-by: NSuzuki K. Poulose <suzuki.poulose@arm.com>
      Tested-by: NLino Sanfilippo <LinoSanfilippo@gmx.de>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b3c1030d
    • R
      nilfs2: fix deadlock of segment constructor during recovery · 283ee148
      Ryusuke Konishi 提交于
      According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
      during recovery at mount time.  The code path that caused the deadlock was
      as follows:
      
        nilfs_fill_super()
          load_nilfs()
            nilfs_salvage_orphan_logs()
              * Do roll-forwarding, attach segment constructor for recovery,
                and kick it.
      
              nilfs_segctor_thread()
                nilfs_segctor_thread_construct()
                 * A lock is held with nilfs_transaction_lock()
                   nilfs_segctor_do_construct()
                     nilfs_segctor_drop_written_files()
                       iput()
                         iput_final()
                           write_inode_now()
                             writeback_single_inode()
                               __writeback_single_inode()
                                 do_writepages()
                                   nilfs_writepage()
                                     nilfs_construct_dsync_segment()
                                       nilfs_transaction_lock() --> deadlock
      
      This can happen if commit 7ef3ff2f ("nilfs2: fix deadlock of segment
      constructor over I_SYNC flag") is applied and roll-forward recovery was
      performed at mount time.  The roll-forward recovery can happen if datasync
      write is done and the file system crashes immediately after that.  For
      instance, we can reproduce the issue with the following steps:
      
       < nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
       # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
       # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k
       count=1 && reboot -nfh
       < the system will immediately reboot >
       # mount -t nilfs2 /dev/sdb1 /nilfs
      
      The deadlock occurs because iput() can run segment constructor through
      writeback_single_inode() if MS_ACTIVE flag is not set on sb->s_flags.  The
      above commit changed segment constructor so that it calls iput()
      asynchronously for inodes with i_nlink == 0, but that change was
      imperfect.
      
      This fixes the another deadlock by deferring iput() in segment constructor
      even for the case that mount is not finished, that is, for the case that
      MS_ACTIVE flag is not set.
      Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Reported-by: NYuxuan Shui <yshuiv7@gmail.com>
      Tested-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      283ee148
    • M
      ocfs2: make append_dio an incompat feature · 18d585f0
      Mark Fasheh 提交于
      It turns out that making this feature ro_compat isn't quite enough to
      prevent accidental corruption on mount from older kernels.  Ocfs2 (like
      other file systems) will process orphaned inodes even when the user mounts
      in 'ro' mode.  So for the case of a filesystem not knowing the append_dio
      feature, mounting the filesystem could result in orphaned-for-dio files
      being deleted, which we clearly don't want.
      
      So instead, turn this into an incompat flag.
      
      Btw, this is kind of my fault - initially I asked that we add a flag to
      cover the feature and even suggested that we use an ro flag.  It wasn't
      until I was looking through our commits for v4.0-rc1 that I realized we
      actually want this to be incompat.
      Signed-off-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      18d585f0
  8. 06 3月, 2015 3 次提交
    • Q
      Btrfs:__add_inode_ref: out of bounds memory read when looking for extended ref. · dd9ef135
      Quentin Casasnovas 提交于
      Improper arithmetics when calculting the address of the extended ref could
      lead to an out of bounds memory read and kernel panic.
      Signed-off-by: NQuentin Casasnovas <quentin.casasnovas@oracle.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      cc: stable@vger.kernel.org # v3.7+
      Signed-off-by: NChris Mason <clm@fb.com>
      dd9ef135
    • F
      Btrfs: fix data loss in the fast fsync path · 3a8b36f3
      Filipe Manana 提交于
      When using the fast file fsync code path we can miss the fact that new
      writes happened since the last file fsync and therefore return without
      waiting for the IO to finish and write the new extents to the fsync log.
      
      Here's an example scenario where the fsync will miss the fact that new
      file data exists that wasn't yet durably persisted:
      
      1. fs_info->last_trans_committed == N - 1 and current transaction is
         transaction N (fs_info->generation == N);
      
      2. do a buffered write;
      
      3. fsync our inode, this clears our inode's full sync flag, starts
         an ordered extent and waits for it to complete - when it completes
         at btrfs_finish_ordered_io(), the inode's last_trans is set to the
         value N (via btrfs_update_inode_fallback -> btrfs_update_inode ->
         btrfs_set_inode_last_trans);
      
      4. transaction N is committed, so fs_info->last_trans_committed is now
         set to the value N and fs_info->generation remains with the value N;
      
      5. do another buffered write, when this happens btrfs_file_write_iter
         sets our inode's last_trans to the value N + 1 (that is
         fs_info->generation + 1 == N + 1);
      
      6. transaction N + 1 is started and fs_info->generation now has the
         value N + 1;
      
      7. transaction N + 1 is committed, so fs_info->last_trans_committed
         is set to the value N + 1;
      
      8. fsync our inode - because it doesn't have the full sync flag set,
         we only start the ordered extent, we don't wait for it to complete
         (only in a later phase) therefore its last_trans field has the
         value N + 1 set previously by btrfs_file_write_iter(), and so we
         have:
      
             inode->last_trans <= fs_info->last_trans_committed
                 (N + 1)              (N + 1)
      
         Which made us not log the last buffered write and exit the fsync
         handler immediately, returning success (0) to user space and resulting
         in data loss after a crash.
      
      This can actually be triggered deterministically and the following excerpt
      from a testcase I made for xfstests triggers the issue. It moves a dummy
      file across directories and then fsyncs the old parent directory - this
      is just to trigger a transaction commit, so moving files around isn't
      directly related to the issue but it was chosen because running 'sync' for
      example does more than just committing the current transaction, as it
      flushes/waits for all file data to be persisted. The issue can also happen
      at random periods, since the transaction kthread periodicaly commits the
      current transaction (about every 30 seconds by default).
      The body of the test is:
      
        _scratch_mkfs >> $seqres.full 2>&1
        _init_flakey
        _mount_flakey
      
        # Create our main test file 'foo', the one we check for data loss.
        # By doing an fsync against our file, it makes btrfs clear the 'needs_full_sync'
        # bit from its flags (btrfs inode specific flags).
        $XFS_IO_PROG -f -c "pwrite -S 0xaa 0 8K" \
                        -c "fsync" $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Now create one other file and 2 directories. We will move this second file
        # from one directory to the other later because it forces btrfs to commit its
        # currently open transaction if we fsync the old parent directory. This is
        # necessary to trigger the data loss bug that affected btrfs.
        mkdir $SCRATCH_MNT/testdir_1
        touch $SCRATCH_MNT/testdir_1/bar
        mkdir $SCRATCH_MNT/testdir_2
      
        # Make sure everything is durably persisted.
        sync
      
        # Write more 8Kb of data to our file.
        $XFS_IO_PROG -c "pwrite -S 0xbb 8K 8K" $SCRATCH_MNT/foo | _filter_xfs_io
      
        # Move our 'bar' file into a new directory.
        mv $SCRATCH_MNT/testdir_1/bar $SCRATCH_MNT/testdir_2/bar
      
        # Fsync our first directory. Because it had a file moved into some other
        # directory, this made btrfs commit the currently open transaction. This is
        # a condition necessary to trigger the data loss bug.
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir_1
      
        # Now fsync our main test file. If the fsync succeeds, we expect the 8Kb of
        # data we wrote previously to be persisted and available if a crash happens.
        # This did not happen with btrfs, because of the transaction commit that
        # happened when we fsynced the parent directory.
        $XFS_IO_PROG -c "fsync" $SCRATCH_MNT/foo
      
        # Simulate a crash/power loss.
        _load_flakey_table $FLAKEY_DROP_WRITES
        _unmount_flakey
      
        _load_flakey_table $FLAKEY_ALLOW_WRITES
        _mount_flakey
      
        # Now check that all data we wrote before are available.
        echo "File content after log replay:"
        od -t x1 $SCRATCH_MNT/foo
      
        status=0
        exit
      
      The expected golden output for the test, which is what we get with this
      fix applied (or when running against ext3/4 and xfs), is:
      
        wrote 8192/8192 bytes at offset 0
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        wrote 8192/8192 bytes at offset 8192
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        File content after log replay:
        0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
        *
        0020000 bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb bb
        *
        0040000
      
      Without this fix applied, the output shows the test file does not have
      the second 8Kb extent that we successfully fsynced:
      
        wrote 8192/8192 bytes at offset 0
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        wrote 8192/8192 bytes at offset 8192
        XXX Bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
        File content after log replay:
        0000000 aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa aa
        *
        0020000
      
      So fix this by skipping the fsync only if we're doing a full sync and
      if the inode's last_trans is <= fs_info->last_trans_committed, or if
      the inode is already in the log. Also remove setting the inode's
      last_trans in btrfs_file_write_iter since it's useless/unreliable.
      
      Also because btrfs_file_write_iter no longer sets inode->last_trans to
      fs_info->generation + 1, don't set last_trans to 0 if we bail out and don't
      bail out if last_trans is 0, otherwise something as simple as the following
      example wouldn't log the second write on the last fsync:
      
        1. write to file
      
        2. fsync file
      
        3. fsync file
             |--> btrfs_inode_in_log() returns true and it set last_trans to 0
      
        4. write to file
             |--> btrfs_file_write_iter() no longers sets last_trans, so it
                  remained with a value of 0
        5. fsync
             |--> inode->last_trans == 0, so it bails out without logging the
                  second write
      
      A test case for xfstests will be sent soon.
      
      CC: <stable@vger.kernel.org>
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3a8b36f3
    • J
      Btrfs: remove extra run_delayed_refs in update_cowonly_root · f5c0a122
      Josef Bacik 提交于
      This got added with my dirty_bgs patch, it's not needed.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      f5c0a122
  9. 05 3月, 2015 1 次提交
    • J
      locks: fix fasync_struct memory leak in lease upgrade/downgrade handling · 0164bf02
      Jeff Layton 提交于
      Commit 8634b51f (locks: convert lease handling to file_lock_context)
      introduced a regression in the handling of lease upgrade/downgrades.
      
      In the event that we already have a lease on a file and are going to
      either upgrade or downgrade it, we skip doing any list insertion or
      deletion and simply re-call lm_setup on the existing lease.
      
      As of commit 8634b51f however, we end up calling lm_setup on the
      lease that was passed in, instead of on the existing lease. This causes
      us to leak the fasync_struct that was allocated in the event that there
      was not already an existing one (as it always appeared that there
      wasn't one).
      
      Fixes: 8634b51f (locks: convert lease handling to file_lock_context)
      Reported-and-Tested-by: NDaniel Wagner <daniel.wagner@bmw-carit.de>
      Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
      0164bf02
  10. 04 3月, 2015 4 次提交
  11. 03 3月, 2015 9 次提交