1. 17 Mar, 2016 3 commits
    • nfs4: nfs4_ff_layout_prepare_ds should return NULL if connection failed · 849dc324
      Jeff Layton committed
      I hit the following oops out of the blue while testing with flexfiles:
      
      BUG: unable to handle kernel NULL pointer dereference at 00000000000000e8
      IP: [<ffffffffa048f6b8>] nfs4_ff_find_or_create_ds_client+0x48/0x50 [nfs_layout_flexfiles]
      PGD 44031067 PUD 5062d067 PMD 0
      Oops: 0000 [#1] SMP
      Modules linked in: nfsv3 nfs_layout_flexfiles tun rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache dcdbas nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw bonding ipmi_devintf ipmi_msghandler snd_hda_codec_generic virtio_balloon ppdev snd_hda_intel snd_hda_controller snd_hda_codec iosf_mbi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel snd_hda_core parport_pc snd_hwdep parport snd_seq snd_seq_device snd_pcm snd_timer acpi_cpufreq
       snd soundcore i2c_piix4 xfs libcrc32c joydev virtio_net virtio_console qxl drm_kms_helper ttm crc32c_intel drm virtio_pci serio_raw ata_generic virtio_ring virtio pata_acpi
      CPU: 0 PID: 19138 Comm: test5 Not tainted 4.1.9-100.pd.90.el7.x86_64 #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014
      task: ffff88007b70cf00 ti: ffff88004cc44000 task.ti: ffff88004cc44000
      RIP: 0010:[<ffffffffa048f6b8>]  [<ffffffffa048f6b8>] nfs4_ff_find_or_create_ds_client+0x48/0x50 [nfs_layout_flexfiles]
      RSP: 0018:ffff88004cc47890  EFLAGS: 00010246
      RAX: 0000000000000003 RBX: ffff880050932300 RCX: ffff88006978f488
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003e0e8540
      RBP: ffff88004cc47908 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff88007ff8c758 R11: 0000000000000005 R12: ffff88003e0e8540
      R13: 0000000000000000 R14: ffff88006978f488 R15: ffff88004431cc80
      FS:  00007fea40c7c740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000000000e8 CR3: 0000000044318000 CR4: 00000000000406f0
      Stack:
       ffffffffa048c934 ffff880050932310 0000000100000001 ffff88006978f510
       ffff88006978f3c8 ffff88003e56cd90 ffff88004cc479d0 00000020a052aff0
       000000000004b000 ffff88004cc47908 ffff880050932300 ffff88004cc479d0
      Call Trace:
       [<ffffffffa048c934>] ? ff_layout_write_pagelist+0x64/0x220 [nfs_layout_flexfiles]
       [<ffffffffa057a3bf>] pnfs_generic_pg_writepages+0xaf/0x1b0 [nfsv4]
       [<ffffffffa051ab57>] nfs_pageio_doio+0x27/0x60 [nfs]
       [<ffffffffa051bfe4>] nfs_pageio_complete_mirror+0x54/0xa0 [nfs]
       [<ffffffffa051c7ad>] nfs_pageio_complete+0x2d/0x90 [nfs]
       [<ffffffffa052032d>] nfs_writepage_locked+0x8d/0xe0 [nfs]
       [<ffffffff811e4630>] ? page_referenced_one+0x1a0/0x1a0
       [<ffffffffa05210e7>] nfs_wb_single_page+0xf7/0x190 [nfs]
       [<ffffffffa05108d1>] nfs_launder_page+0x41/0x90 [nfs]
       [<ffffffff811b8930>] invalidate_inode_pages2_range+0x340/0x3a0
       [<ffffffff811b89a7>] invalidate_inode_pages2+0x17/0x20
       [<ffffffffa0513e1e>] nfs_release+0x9e/0xb0 [nfs]
       [<ffffffffa050fa1d>] nfs_file_release+0x3d/0x60 [nfs]
       [<ffffffff8122481c>] __fput+0xdc/0x1e0
       [<ffffffff8122496e>] ____fput+0xe/0x10
       [<ffffffff810bde67>] task_work_run+0xa7/0xe0
       [<ffffffff810af735>] get_signal+0x565/0x600
       [<ffffffff811a9815>] ? __filemap_fdatawrite_range+0x65/0x90
       [<ffffffff810144a7>] do_signal+0x37/0x730
       [<ffffffffa0569921>] ? nfs4_file_fsync+0x81/0x150 [nfsv4]
       [<ffffffff81254dbb>] ? vfs_fsync_range+0x3b/0xb0
       [<ffffffff811446a6>] ? __audit_syscall_exit+0x1e6/0x280
       [<ffffffff81014bff>] do_notify_resume+0x5f/0xa0
       [<ffffffff8178ec3c>] int_signal+0x12/0x17
      Code: 48 8b 40 70 8b 00 83 f8 03 74 20 83 f8 04 75 13 55 48 89 ce 48 89 d7 48 89 e5 e8 14 0f 0e 00 5d c3 66 90 0f 0b 66 0f 1f 44 00 00 <48> 8b 82 e8 00 00 00 c3 66 66 66 66 90 55 48 89 e5 41 57 41 56
      RIP  [<ffffffffa048f6b8>] nfs4_ff_find_or_create_ds_client+0x48/0x50 [nfs_layout_flexfiles]
       RSP <ffff88004cc47890>
      CR2: 00000000000000e8
      
      When the DS connection attempt fails, nfs4_ff_layout_prepare_ds marks it
      for the error but then just returns the ds as if it were usable. The
      comments though say:
      
        /* Upon return, either ds is connected, or ds is NULL */
      
      Ensure that we set the return pointer to NULL in the event that the
      connection attempt fails.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
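The contract quoted from the comment ("either ds is connected, or ds is NULL") can be illustrated with a small userspace sketch. Everything here is hypothetical: `demo_ds`, `demo_mirror` and `demo_prepare_ds` are stand-ins invented for illustration, not the kernel's structures or the actual patch, and `connect_ok` merely simulates the outcome of the connection attempt.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a data server handle; not the kernel's struct. */
struct demo_ds {
    bool connected;
};

/* Hypothetical stand-in for the mirror that owns the data server. */
struct demo_mirror {
    struct demo_ds *ds;
    bool marked_in_error;
};

/*
 * Upon return, either the data server is connected, or the result is NULL.
 * connect_ok simulates whether the connection attempt succeeds.
 */
static struct demo_ds *demo_prepare_ds(struct demo_mirror *mirror, bool connect_ok)
{
    struct demo_ds *ds = mirror->ds;

    if (ds == NULL)
        return NULL;
    if (ds->connected)
        return ds;

    ds->connected = connect_ok;
    if (!ds->connected) {
        /* Record the failure and do not hand back an unusable ds. */
        mirror->marked_in_error = true;
        ds = NULL;
    }
    return ds;
}
```

With this shape, a caller that checks the return value for NULL never dereferences a data server whose connection failed.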
    • nfs: remove nfs_inode_dio_wait · 95d9f6c3
      Christoph Hellwig committed
      Just call inode_dio_wait directly instead of through a pointless wrapper.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • nfs: remove nfs4_file_fsync · 4ff79bc7
      Christoph Hellwig committed
      The only difference from nfs_file_fsync is the call to pnfs_sync_inode.  But
      pnfs_sync_inode is just an inline that calls a pNFS layout driver method
      if CONFIG_PNFS is defined, and thus can be called just fine from the core
      NFS module.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  2. 28 Feb, 2016 14 commits
  3. 23 Feb, 2016 2 commits
  4. 20 Feb, 2016 4 commits
    • fs/pnode.c: treat zero mnt_group_id-s as unequal · 7ae8fd03
      Maxim Patlasov committed
      propagate_one(m) calculates "type" argument for copy_tree() like this:
      
      >    if (m->mnt_group_id == last_dest->mnt_group_id) {
      >        type = CL_MAKE_SHARED;
      >    } else {
      >        type = CL_SLAVE;
      >        if (IS_MNT_SHARED(m))
      >            type |= CL_MAKE_SHARED;
      >    }
      
      The "type" argument then governs clone_mnt() behavior with respect to flags
      and mnt_master of the new mount. When we iterate through a slave group, it is
      possible that both the current "m" and "last_dest" are not shared (although
      both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison
      above erroneously makes the new mount shared and sets its mnt_master to
      last_source->mnt_master. The patch fixes the problem by handling zero
      mnt_group_id-s as though they are unequal.
      
      A similar problem exists in the implementation of the "else" clause above
      when we have to ascend upward in the master/slave tree by calling:
      
      >    last_source = last_source->mnt_master;
      >    last_dest = last_source->mnt_parent;
      
      the proper number of times. The last step is governed by the
      "n->mnt_group_id != last_dest->mnt_group_id" condition, which may lie if
      both are zero. The patch fixes this case in the same way as the former one.
      
      [AV: don't open-code an obvious helper...]
      Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
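The idea of "treating zero mnt_group_id-s as unequal" can be written as a small comparison helper. The sketch below is an assumption-laden illustration: `demo_mount` is a reduced, hypothetical record holding only the field discussed in the message, and `demo_peers` is a name chosen here, not necessarily the helper the patch ended up using.

```c
#include <stdbool.h>

/* Hypothetical, reduced mount record; only the field discussed above. */
struct demo_mount {
    int mnt_group_id;   /* 0 means "not in any peer group" */
};

/* Two mounts count as peers only if they share a non-zero group id. */
static bool demo_peers(const struct demo_mount *a, const struct demo_mount *b)
{
    return a->mnt_group_id == b->mnt_group_id && a->mnt_group_id != 0;
}
```

Replacing a bare `==` on the ids with such a helper means two unshared mounts (both with id 0) are no longer mistaken for members of the same group.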
    • affs_do_readpage_ofs(): just use kmap_atomic() around memcpy() · 0bacbe52
      Al Viro committed
      It forgets kunmap() on a failure exit, but there's really no point keeping
      the page kmapped at all - after all, what we are doing is a bunch of memcpy()
      into the parts of page, so kmap_atomic()/kunmap_atomic() just around those
      memcpy() is enough.
      Spotted-by: Insu Yun <wuninsu@gmail.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • xattr handlers: plug a lock leak in simple_xattr_list · 0e9a7da5
      Mateusz Guzik committed
      The code could leak xattrs->lock on error.
      
      Problem introduced with 786534b9 "tmpfs: listxattr should
      include POSIX ACL xattrs".
      Signed-off-by: Mateusz Guzik <mguzik@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • fs: allow no_seek_end_llseek to actually seek · 2feb55f8
      Wouter van Kesteren committed
      The user-visible impact of the issue is for example that without this
      patch sensors-detect breaks when trying to seek in /dev/cpu/0/cpuid.
      
      '~0ULL' is an 'unsigned long long' that, when converted to a loff_t
      (which is signed), gets turned into -1. Later, in vfs_setpos, we have
      'if (offset > maxsize)', which makes it always return EINVAL.
      
      Fixes: b25472f9 ("new helpers: no_seek_end_llseek{,_size}()")
      Signed-off-by: Wouter van Kesteren <woutershep@gmail.com>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
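The signedness problem described in this message can be reproduced in userspace. The sketch below is a simplified model, not kernel code: `int64_t` stands in for `loff_t`, `demo_check_offset` only imitates the quoted 'if (offset > maxsize)' test, and the conversion of `~0ULL` to a signed type is implementation-defined in C (it yields -1 on the usual two's complement targets).

```c
#include <errno.h>
#include <stdint.h>

/* Simplified bounds check modeled on the 'if (offset > maxsize)' test. */
static int demo_check_offset(int64_t offset, int64_t maxsize)
{
    if (offset < 0)
        return -EINVAL;
    if (offset > maxsize)
        return -EINVAL;
    return 0;
}

/* What an all-ones limit becomes once stored in a signed 64-bit type. */
static int64_t demo_all_ones_as_signed(void)
{
    /* Implementation-defined conversion; -1 on two's complement targets. */
    return (int64_t)~0ULL;
}
```

Passing `demo_all_ones_as_signed()` as `maxsize` rejects every non-negative offset, whereas a genuinely signed maximum such as `INT64_MAX` (used here purely as an illustrative limit) accepts them.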
  5. 19 Feb, 2016 4 commits
    • ext4: fix crashes in dioread_nolock mode · 74dae427
      Jan Kara committed
      Competing overwrite DIO in dioread_nolock mode will just overwrite
      pointer to io_end in the inode. This may result in data corruption or
      extent conversion happening from IO completion interrupt because we
      don't properly set buffer_defer_completion() when unlocked DIO races
      with locked DIO to unwritten extent.
      
      Since unlocked DIO doesn't need io_end for anything, just avoid
      allocating it and corrupting pointer from inode for locked DIO.
      A cleaner fix would be to avoid these games with io_end pointer from the
      inode but that requires more intrusive changes so we leave that for
      later.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix bh->b_state corruption · ed8ad838
      Jan Kara committed
      ext4 can update bh->b_state non-atomically in _ext4_get_block() and
      ext4_da_get_block_prep(). Usually this is fine since bh is just a
      temporary storage for mapping information on stack but in some cases it
      can be fully living bh attached to a page. In such case non-atomic
      update of bh->b_state can race with an atomic update which then gets
      lost. Usually when we are mapping bh and thus updating bh->b_state
      non-atomically, nobody else touches the bh and so things work out fine
      but there is one case to especially worry about: ext4_finish_bio() uses
      BH_Uptodate_Lock on the first bh in the page to synchronize handling of
      PageWriteback state. So when blocksize < pagesize, we can be atomically
      modifying bh->b_state of a buffer that actually isn't under IO and thus
      can race e.g. with delalloc trying to map that buffer. The result is
      that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
      corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.
      
      Fix the problem by always updating bh->b_state bits atomically.
      
      CC: stable@vger.kernel.org
      Reported-by: Nikolay Borisov <kernel@kyup.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
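The race described above arises when a plain read-modify-write of a flags word overlaps with an atomic update of a different bit in the same word. One common way to update a subset of bits atomically is a compare-and-exchange loop, sketched here with C11 atomics. The flag names and layout (`DEMO_MAPPED`, `DEMO_NEW`, `DEMO_LOCK`) are hypothetical and do not match the real buffer_head bits, and this is not claimed to be the code of the actual fix.

```c
#include <stdatomic.h>

/* Hypothetical flag layout; the real buffer_head bits differ. */
#define DEMO_MAPPED   (1UL << 0)
#define DEMO_NEW      (1UL << 1)
#define DEMO_LOCK     (1UL << 2)   /* owned by another code path */
#define DEMO_MAP_MASK (DEMO_MAPPED | DEMO_NEW)

/*
 * Replace only the bits in DEMO_MAP_MASK. If another thread changes the
 * word between the load and the exchange, the exchange fails, 'old' is
 * refreshed with the current value, and the loop retries.
 */
static void demo_update_state(atomic_ulong *state, unsigned long flags)
{
    unsigned long old = atomic_load(state);
    unsigned long new;

    do {
        new = (old & ~DEMO_MAP_MASK) | (flags & DEMO_MAP_MASK);
    } while (!atomic_compare_exchange_weak(state, &old, new));
}
```

Because the new value is always derived from the most recently observed word, a concurrent set or clear of `DEMO_LOCK` is never overwritten by a stale copy.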
    • fsnotify: turn fsnotify reaper thread into a workqueue job · 0918f1c3
      Jeff Layton committed
      We don't require a dedicated thread for fsnotify cleanup.  Switch it
      over to a workqueue job instead that runs on the system_unbound_wq.
      
      In the interest of not thrashing the queued job too often when there are
      a lot of marks being removed, we delay the reaper job slightly when
      queueing it, to allow several to gather on the list.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Tested-by: Eryu Guan <guaneryu@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread" · 13d34ac6
      Jeff Layton committed
      This reverts commit c510eff6 ("fsnotify: destroy marks with
      call_srcu instead of dedicated thread").
      
      Eryu reported that he was seeing some OOM kills kick in when running a
      testcase that adds and removes inotify marks on a file in a tight loop.
      
      The above commit changed the code to use call_srcu to clean up the
      marks.  While that does (in principle) work, the srcu callback job is
      limited to cleaning up entries in small batches and only once per jiffy.
      It's easily possible to overwhelm that machinery with too many call_srcu
      callbacks, and Eryu's reproducer did just that.
      
      There's also another potential problem with using call_srcu here.  While
      you can obviously sleep while holding the srcu_read_lock, the callbacks
      run under local_bh_disable, so you can't sleep there.
      
      It's possible when putting the last reference to the fsnotify_mark that
      we'll end up putting a chain of references including the fsnotify_group,
      uid, and associated keys.  While I don't see any obvious ways that that
      could occur, it's probably still best to avoid using call_srcu here
      after all.
      
      This patch reverts the above patch.  A later patch will take a different
      approach to eliminating the dedicated thread here.
      Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
      Reported-by: Eryu Guan <guaneryu@gmail.com>
      Tested-by: Eryu Guan <guaneryu@gmail.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 18 Feb, 2016 3 commits
    • pnfs/blocklayout: fix a memory leak when using vmalloc_to_page · c8975706
      Kinglong Mee committed
      unreferenced object 0xffffc90000abf000 (size 16900):
        comm "fsync02", pid 15765, jiffies 4297431627 (age 423.772s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 a0 c2 19 00 88 ff ff  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff8174d54e>] kmemleak_alloc+0x4e/0xb0
          [<ffffffff811b9b91>] __vmalloc_node_range+0x231/0x280
          [<ffffffff811b9c2a>] __vmalloc+0x4a/0x50
          [<ffffffffa02c9ec1>] ext_tree_prepare_commit+0x231/0x2e0 [blocklayoutdriver]
          [<ffffffffa02c700e>] bl_prepare_layoutcommit+0xe/0x10 [blocklayoutdriver]
          [<ffffffffa0596a6c>] pnfs_layoutcommit_inode+0x29c/0x330 [nfsv4]
          [<ffffffffa0596b13>] pnfs_generic_sync+0x13/0x20 [nfsv4]
          [<ffffffffa0585188>] nfs4_file_fsync+0x58/0x150 [nfsv4]
          [<ffffffff81228e5b>] vfs_fsync_range+0x4b/0xb0
          [<ffffffff81228f1d>] do_fsync+0x3d/0x70
          [<ffffffff812291d0>] SyS_fsync+0x10/0x20
          [<ffffffff81757def>] entry_SYSCALL_64_fastpath+0x12/0x76
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      v2: add missing include header
      Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • nfs4: fix stateid handling for the NFS v4.2 operations · 4bdf87eb
      Christoph Hellwig committed
      The newly added NFS v4.2 operations (ALLOCATE, DEALLOCATE, SEEK and CLONE)
      use a helper called nfs42_set_rw_stateid to select a stateid that is sent
      to the server.  But they don't set the inode and state fields in the
      nfs4_exception structure, and thus don't partake in the stateid recovery
      protocol.  Because of this they will simply return errors instead of trying
      to recover a stateid when the server returns a BAD_STATEID error.
      
      Additionally CLONE has the problem that it operates on two files and thus
      two stateids, and thus needs to call the exception handler twice to
      recover stateids.
      
      While we're at it, stop grabbing an additional reference to the open
      context in all these operations - having the file open guarantees that
      the open context won't go away.
      
      All this can be reproduced with the generic/168 and generic/170 tests in
      xfstests, which stress the CLONE stateid handling.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • NFSv4: Fix a dentry leak on alias use · d9dfd8d7
      Benjamin Coddington committed
      In the case where d_add_unique() finds an appropriate alias to use it will
      have already incremented the reference count.  An additional dget() to swap
      the open context's dentry is unnecessary and will leak a reference.
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Fixes: 275bb307 ("NFSv4: Move dentry instantiation into the NFSv4-...")
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  7. 17 Feb, 2016 2 commits
    • writeback: initialize inode members that track writeback history · 3d65ae46
      Tahsin Erdogan committed
      inode struct members that track cgroup writeback information
      should be reinitialized when inode gets allocated from
      kmem_cache. Otherwise, their values remain and get used by the
      new inode.
      Signed-off-by: Tahsin Erdogan <tahsin@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Fixes: d10c8095 ("writeback: implement foreign cgroup inode bdi_writeback switching")
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • writeback: keep superblock pinned during cgroup writeback association switches · 5ff8eaac
      Tejun Heo committed
      If cgroup writeback is in use, an inode is associated with a cgroup
      for writeback.  If the inode's main dirtier changes to another cgroup,
      the association gets updated asynchronously.  Nothing was pinning the
      superblock while such switches are in progress and superblock could go
      away while async switching is pending or in progress leading to
      crashes like the following.
      
       kernel BUG at fs/jbd2/transaction.c:319!
       invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
       CPU: 1 PID: 29158 Comm: kworker/1:10 Not tainted 4.5.0-rc3 #51
       Hardware name: Google Google, BIOS Google 01/01/2011
       Workqueue: events inode_switch_wbs_work_fn
       task: ffff880213dbbd40 ti: ffff880209264000 task.ti: ffff880209264000
       RIP: 0010:[<ffffffff803e6922>]  [<ffffffff803e6922>] start_this_handle+0x382/0x3e0
       RSP: 0018:ffff880209267c30  EFLAGS: 00010202
       ...
       Call Trace:
        [<ffffffff803e6be4>] jbd2__journal_start+0xf4/0x190
        [<ffffffff803cfc7e>] __ext4_journal_start_sb+0x4e/0x70
        [<ffffffff803b31ec>] ext4_evict_inode+0x12c/0x3d0
        [<ffffffff8035338b>] evict+0xbb/0x190
        [<ffffffff80354190>] iput+0x130/0x190
        [<ffffffff80360223>] inode_switch_wbs_work_fn+0x343/0x4c0
        [<ffffffff80279819>] process_one_work+0x129/0x300
        [<ffffffff80279b16>] worker_thread+0x126/0x480
        [<ffffffff8027ed14>] kthread+0xc4/0xe0
        [<ffffffff809771df>] ret_from_fork+0x3f/0x70
      
      Fix it by bumping s_active while cgroup association switching is in
      flight.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Tahsin Erdogan <tahsin@google.com>
      Link: http://lkml.kernel.org/g/CAAeU0aNCq7LGODvVGRU-oU_o-6enii5ey0p1c26D1ZzYwkDc5A@mail.gmail.com
      Fixes: d10c8095 ("writeback: implement foreign cgroup inode bdi_writeback switching")
      Cc: stable@vger.kernel.org #v4.5+
      Signed-off-by: Jens Axboe <axboe@fb.com>
  8. 16 Feb, 2016 4 commits
    • ext4: fix memleak in ext4_readdir() · c906f38e
      Kirill Tkhai committed
      When ext4_bread() fails, fname_crypto_str remains
      allocated after return. Fix that.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      CC: Dmitry Monakhov <dmonakhov@virtuozzo.com>
    • Btrfs: fix direct IO requests not reporting IO error to user space · 1636d1d7
      Filipe Manana committed
      If a bio for a direct IO request fails, we were not setting the error in
      the parent bio (the main DIO bio), making us not return the error to
      user space in btrfs_direct_IO(), that is, it made __blockdev_direct_IO()
      return the number of bytes issued for IO and not the error a bio created
      and submitted by btrfs_submit_direct() got from the block layer.
      This essentially happens because when we call:
      
         dio_end_io(dio_bio, bio->bi_error);
      
      It does not set dio_bio->bi_error to the value of the second argument.
      So just add this missing assignment in endio callbacks, just as we do in
      the error path at btrfs_submit_direct() when we fail to clone the dio bio
      or allocate its private object. This follows the convention of what is
      done with other similar APIs such as bio_endio() where the caller is
      responsible for setting the bi_error field in the bio it passes as an
      argument to bio_endio().
      
      This was detected by the new generic test cases in xfstests: 271, 272,
      276 and 278, which essentially set up a dm error target, then load the
      error table, do a direct IO write and unload the error table. They
      expect the write to fail with -EIO, which was not getting reported
      when testing against btrfs.
      
      Cc: stable@vger.kernel.org  # 4.3+
      Fixes: 4246a0b6 ("block: add a bi_error field to struct bio")
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
    • pNFS: Always set NFS_LAYOUT_RETURN_REQUESTED with lo->plh_return_iomode · e0fa0d01
      Trond Myklebust committed
      When setting the layout return mode, we must always also set the
      NFS_LAYOUT_RETURN_REQUESTED flag to ensure that we send a layoutreturn.
      Otherwise pnfs_error_mark_layout_for_return() could set the mode, but
      fail to send the layoutreturn because another is already in flight.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • pNFS: Fix pnfs_mark_matching_lsegs_return() · 2f215968
      Trond Myklebust committed
      We don't need to schedule a layoutreturn if the layout segment can
      be freed immediately.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  9. 12 Feb, 2016 4 commits
    • ext4: remove unused parameter "newblock" in convert_initialized_extent() · 56263b4c
      Eryu Guan committed
      The "newblock" parameter is not used in convert_initialized_extent(),
      remove it.
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: don't read blocks from disk after extents being swapped · bcff2488
      Eryu Guan committed
      I notice ext4/307 fails occasionally on ppc64 host, reporting md5
      checksum mismatch after moving data from original file to donor file.
      
      The reason is that move_extent_per_page() calls __block_write_begin()
      and block_commit_write() to write saved data from original inode blocks
      to donor inode blocks, but __block_write_begin() not only maps buffer
      heads but also reads block content from disk if the size is not block
      size aligned.  At this time the physical block number in the mapped buffer
      head is pointing to the donor file, not the original file, and that
      results in reading the wrong data into the page, which gets written to disk
      in the following block_commit_write call.
      
      This also can be reproduced by the following script on 1k block size ext4
      on x86_64 host:
      
          mnt=/mnt/ext4
          donorfile=$mnt/donor
          testfile=$mnt/testfile
          e4compact=~/xfstests/src/e4compact
      
          rm -f $donorfile $testfile
      
          # reserve space for donor file, written by 0xaa and sync to disk to
          # avoid EBUSY on EXT4_IOC_MOVE_EXT
          xfs_io -fc "pwrite -S 0xaa 0 1m" -c "fsync" $donorfile
      
          # create test file written by 0xbb
          xfs_io -fc "pwrite -S 0xbb 0 1023" -c "fsync" $testfile
      
          # compute initial md5sum
          md5sum $testfile | tee md5sum.txt
          # drop cache, force e4compact to read data from disk
          echo 3 > /proc/sys/vm/drop_caches
      
          # test defrag
          echo "$testfile" | $e4compact -i -v -f $donorfile
          # check md5sum
          md5sum -c md5sum.txt
      
      Fix it by creating & mapping buffer heads only but not reading blocks
      from disk, because all the data in page is guaranteed to be up-to-date
      in mext_page_mkuptodate().
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Eryu Guan <guaneryu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
    • ext4: fix potential integer overflow · 46901760
      Insu Yun committed
      Since sizeof(ext_new_group_data) > sizeof(ext_new_flex_group_data),
      an integer overflow could happen.
      Therefore, the integer overflow sanitization needs to be fixed.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Insu Yun <wuninsu@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
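The general point of this message is that an overflow guard must be computed against the size of the element type that is actually multiplied, since a guard based on a smaller structure lets some counts through that still wrap. A generic userspace sketch of such a guard follows. `demo_alloc_array` is a hypothetical helper written for illustration and is not the ext4 code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate 'count' elements of 'elem_size' bytes each, refusing any
 * request whose total size would wrap around SIZE_MAX. The check uses
 * the same elem_size that is multiplied below.
 */
static void *demo_alloc_array(size_t count, size_t elem_size)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return NULL;
    return malloc(count * elem_size);
}
```

Deriving the bound from the very `elem_size` used in the multiplication keeps the check and the allocation from drifting apart when the structures change size.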
    • ext4: add a line break for proc mb_groups display · 802cf1f9
      Huaitong Han committed
      This patch adds a line break for proc mb_groups display.
      Signed-off-by: Huaitong Han <huaitong.han@intel.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>