1. 19 1月, 2018 5 次提交
  2. 17 1月, 2018 9 次提交
  3. 22 12月, 2017 2 次提交
    • A
      gfs2: Trim the ordered write list in gfs2_ordered_write() · 1f23bc78
      Abhi Das 提交于
      We iterate through the entire ordered writes list in
      gfs2_ordered_write() to write out inodes. It's a good
      place to try and shrink the list by throwing out inodes
      that don't have any pages.
      Signed-off-by: NAbhi Das <adas@redhat.com>
      Acked-by: NSteven Whitehouse <swhiteho@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      1f23bc78
    • B
      GFS2: Reduce code redundancy writing log headers · 588bff95
      Bob Peterson 提交于
      Before this patch, there was a lot of code redundancy between functions
      log_write_header (which uses bio) and clean_journal (which uses
      buffer_head). This patch reduces the redundancy to simplify the code
      and make log header writing more consistent. We want more consistency
      and reduced redundancy because we plan to add a bunch of new fields
      to improve performance (by eliminating the local statfs and quota files)
      improve metadata integrity (by adding new crcs and such) and for better
      debugging (by adding new fields to track when and where metadata was
      pushed through the journals.) We don't want to duplicate setting these
      new fields, nor allow for human error in the process.
      
      This reduction in code redundancy is accomplished by introducing a new
      helper function, gfs2_write_log_header which uses bio rather than bh.
      That simplifies recovery function clean_journal() to use the new helper
      function and iomap rather than redundancy and block_map (and eventually
      we can maybe remove block_map). It also reduces our dependency on
      buffer_heads.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      588bff95
  4. 18 12月, 2017 2 次提交
  5. 17 12月, 2017 1 次提交
  6. 16 12月, 2017 3 次提交
    • L
      Revert "mm: replace p??_write with pte_access_permitted in fault + gup paths" · f6f37321
      Linus Torvalds 提交于
      This reverts commits 5c9d2d5c, c7da82b8, and e7fe7b5c.
      
      We'll probably need to revisit this, but basically we should not
      complicate the get_user_pages_fast() case, and checking the actual page
      table protection key bits will require more care anyway, since the
      protection keys depend on the exact state of the VM in question.
      
      Particularly when doing a "remote" page lookup (ie in somebody elses VM,
      not your own), you need to be much more careful than this was.  Dave
      Hansen says:
      
       "So, the underlying bug here is that we now a get_user_pages_remote()
        and then go ahead and do the p*_access_permitted() checks against the
        current PKRU. This was introduced recently with the addition of the
        new p??_access_permitted() calls.
      
        We have checks in the VMA path for the "remote" gups and we avoid
        consulting PKRU for them. This got missed in the pkeys selftests
        because I did a ptrace read, but not a *write*. I also didn't
        explicitly test it against something where a COW needed to be done"
      
      It's also not entirely clear that it makes sense to check the protection
      key bits at this level at all.  But one possible eventual solution is to
      make the get_user_pages_fast() case just abort if it sees protection key
      bits set, which makes us fall back to the regular get_user_pages() case,
      which then has a vma and can do the check there if we want to.
      
      We'll see.
      
      Somewhat related to this all: what we _do_ want to do some day is to
      check the PAGE_USER bit - it should obviously always be set for user
      pages, but it would be a good check to have back.  Because we have no
      generic way to test for it, we lost it as part of moving over from the
      architecture-specific x86 GUP implementation to the generic one in
      commit e585513b ("x86/mm/gup: Switch GUP to the generic
      get_user_page_fast() implementation").
      
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: "Jérôme Glisse" <jglisse@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f6f37321
    • S
      nfs: don't wait on commit in nfs_commit_inode() if there were no commit requests · dc4fd9ab
      Scott Mayhew 提交于
      If there were no commit requests, then nfs_commit_inode() should not
      wait on the commit or mark the inode dirty, otherwise the following
      BUG_ON can be triggered:
      
      [ 1917.130762] kernel BUG at fs/inode.c:578!
      [ 1917.130766] Oops: Exception in kernel mode, sig: 5 [#1]
      [ 1917.130768] SMP NR_CPUS=2048 NUMA pSeries
      [ 1917.130772] Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi blocklayoutdriver rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache sunrpc sg nx_crypto pseries_rng ip_tables xfs libcrc32c sd_mod crc_t10dif crct10dif_generic crct10dif_common ibmvscsi scsi_transport_srp ibmveth scsi_tgt dm_mirror dm_region_hash dm_log dm_mod
      [ 1917.130805] CPU: 2 PID: 14923 Comm: umount.nfs4 Tainted: G               ------------ T 3.10.0-768.el7.ppc64 #1
      [ 1917.130810] task: c0000005ecd88040 ti: c00000004cea0000 task.ti: c00000004cea0000
      [ 1917.130813] NIP: c000000000354178 LR: c000000000354160 CTR: c00000000012db80
      [ 1917.130816] REGS: c00000004cea3720 TRAP: 0700   Tainted: G               ------------ T  (3.10.0-768.el7.ppc64)
      [ 1917.130820] MSR: 8000000100029032 <SF,EE,ME,IR,DR,RI>  CR: 22002822  XER: 20000000
      [ 1917.130828] CFAR: c00000000011f594 SOFTE: 1
      GPR00: c000000000354160 c00000004cea39a0 c0000000014c4700 c0000000018cc750
      GPR04: 000000000000c750 80c0000000000000 0600000000000000 04eeb76bea749a03
      GPR08: 0000000000000034 c0000000018cc758 0000000000000001 d000000005e619e8
      GPR12: c00000000012db80 c000000007b31200 0000000000000000 0000000000000000
      GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
      GPR24: 0000000000000000 c000000000dfc3ec 0000000000000000 c0000005eefc02c0
      GPR28: d0000000079dbd50 c0000005b94a02c0 c0000005b94a0250 c0000005b94a01c8
      [ 1917.130867] NIP [c000000000354178] .evict+0x1c8/0x350
      [ 1917.130871] LR [c000000000354160] .evict+0x1b0/0x350
      [ 1917.130873] Call Trace:
      [ 1917.130876] [c00000004cea39a0] [c000000000354160] .evict+0x1b0/0x350 (unreliable)
      [ 1917.130880] [c00000004cea3a30] [c0000000003558cc] .evict_inodes+0x13c/0x270
      [ 1917.130884] [c00000004cea3af0] [c000000000327d20] .kill_anon_super+0x70/0x1e0
      [ 1917.130896] [c00000004cea3b80] [d000000005e43e30] .nfs_kill_super+0x20/0x60 [nfs]
      [ 1917.130900] [c00000004cea3c00] [c000000000328a20] .deactivate_locked_super+0xa0/0x1b0
      [ 1917.130903] [c00000004cea3c80] [c00000000035ba54] .cleanup_mnt+0xd4/0x180
      [ 1917.130907] [c00000004cea3d10] [c000000000119034] .task_work_run+0x114/0x150
      [ 1917.130912] [c00000004cea3db0] [c00000000001ba6c] .do_notify_resume+0xcc/0x100
      [ 1917.130916] [c00000004cea3e30] [c00000000000a7b0] .ret_from_except_lite+0x5c/0x60
      [ 1917.130919] Instruction dump:
      [ 1917.130921] 7fc3f378 486734b5 60000000 387f00a0 38800003 4bdcb365 60000000 e95f00a0
      [ 1917.130927] 694a0060 7d4a0074 794ad182 694a0001 <0b0a0000> 892d02a4 2f890000 40de0134
      Signed-off-by: NScott Mayhew <smayhew@redhat.com>
      Cc: stable@vger.kernel.org # 4.5+
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      dc4fd9ab
    • S
      nfs: fix a deadlock in nfs client initialization · c156618e
      Scott Mayhew 提交于
      The following deadlock can occur between a process waiting for a client
      to initialize in while walking the client list during nfsv4 server trunking
      detection and another process waiting for the nfs_clid_init_mutex so it
      can initialize that client:
      
      Process 1                               Process 2
      ---------                               ---------
      spin_lock(&nn->nfs_client_lock);
      list_add_tail(&CLIENTA->cl_share_link,
              &nn->nfs_client_list);
      spin_unlock(&nn->nfs_client_lock);
                                              spin_lock(&nn->nfs_client_lock);
                                              list_add_tail(&CLIENTB->cl_share_link,
                                                      &nn->nfs_client_list);
                                              spin_unlock(&nn->nfs_client_lock);
                                              mutex_lock(&nfs_clid_init_mutex);
                                              nfs41_walk_client_list(clp, result, cred);
                                              nfs_wait_client_init_complete(CLIENTA);
      (waiting for nfs_clid_init_mutex)
      
      Make sure nfs_match_client() only evaluates clients that have completed
      initialization in order to prevent that deadlock.
      
      This patch also fixes v4.0 trunking behavior by not marking the client
      NFS_CS_READY until the clientid has been confirmed.
      Signed-off-by: NScott Mayhew <smayhew@redhat.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      c156618e
  7. 15 12月, 2017 3 次提交
  8. 14 12月, 2017 1 次提交
  9. 13 12月, 2017 3 次提交
    • A
      gfs2: Add a crc field to resource group headers · 850d2d91
      Andrew Price 提交于
      Add the rg_crc field to store a crc32 of the gfs2_rgrp structure. This
      allows us to check resource group headers' integrity and removes the
      requirement to check them against the rindex entries in fsck. If this
      field is found to be zero, it should be ignored (or updated with an
      accurate value).
      Signed-off-by: NAndrew Price <anprice@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      850d2d91
    • A
      gfs2: Add rindex fields to rgrp headers · 166725d9
      Andrew Price 提交于
      Add rg_data0, rg_data and rg_bitbytes to struct gfs2_rgrp. The fields
      are identical to their counterparts in struct gfs2_rindex and are
      intended to reduce the use of the rindex. For now the fields are only
      written back as the in-memory equivalents in struct gfs2_rgrpd are set
      using values from the rindex. However, they are needed at this point so
      that userspace can make use of them, allowing a migration away from the
      rindex over time.
      
      The new fields take up previously reserved space which was explicitly
      zeroed on write so, in clusters with mixed kernels, these fields could
      get zeroed after being set and this should not be treated as an error.
      Signed-off-by: NAndrew Price <anprice@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      166725d9
    • A
      gfs2: Add a next-resource-group pointer to resource groups · 65adc273
      Andrew Price 提交于
      Add a new rg_skip field to struct gfs2_rgrp, replacing __pad. The
      rg_skip field has the following meaning:
      
      - If rg_skip is zero, it is considered unset and not useful.
      - If rg_skip is non-zero, its value will be the number of blocks between
        this rgrp's address and the next rgrp's address. This can be used as a
        hint by fsck.gfs2 when rebuilding a bad rindex, for example.
      
      This will provide less dependency on the rindex in future, and allow
      tools such as fsck.gfs2 to iterate the resource groups without keeping
      the rindex around.
      
      The field is updated in gfs2_rgrp_out() so that existing file systems
      will have it set. This means that any resource groups that aren't ever
      written will not be updated. The final rgrp is a special case as there
      is no next rgrp, so it will always have a rg_skip of 0 (unless the fs is
      extended).
      
      Before this patch, gfs2_rgrp_out() zeroes the __pad field explicitly, so
      the rg_skip field can get set back to 0 in cases where nodes with and
      without this patch are mixed in a cluster. In some cases, the field may
      bounce between being set by one node and then zeroed by another which
      may harm performance slightly, e.g. when two nodes create many small
      files. In testing this situation is rare but it becomes more likely as
      the filesystem fills up and there are fewer resource groups to choose
      from. The problem goes away when all nodes are running with this patch.
      Dipping into the space currently occupied by the rg_reserved field would
      have resulted in the same problem as it is also explicitly zeroed, so
      unfortunately there is no other way around it.
      Signed-off-by: NAndrew Price <anprice@redhat.com>
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      65adc273
  10. 12 12月, 2017 1 次提交
    • C
      ext4: fix crash when a directory's i_size is too small · 9d5afec6
      Chandan Rajendra 提交于
      On a ppc64 machine, when mounting a fuzzed ext2 image (generated by
      fsfuzzer) the following call trace is seen,
      
      VFS: brelse: Trying to free free buffer
      WARNING: CPU: 1 PID: 6913 at /root/repos/linux/fs/buffer.c:1165 .__brelse.part.6+0x24/0x40
      .__brelse.part.6+0x20/0x40 (unreliable)
      .ext4_find_entry+0x384/0x4f0
      .ext4_lookup+0x84/0x250
      .lookup_slow+0xdc/0x230
      .walk_component+0x268/0x400
      .path_lookupat+0xec/0x2d0
      .filename_lookup+0x9c/0x1d0
      .vfs_statx+0x98/0x140
      .SyS_newfstatat+0x48/0x80
      system_call+0x58/0x6c
      
      This happens because the directory that ext4_find_entry() looks up has
      inode->i_size that is less than the block size of the filesystem. This
      causes 'nblocks' to have a value of zero. ext4_bread_batch() ends up not
      reading any of the directory file's blocks. This renders the entries in
      bh_use[] array to continue to have garbage data. buffer_uptodate() on
      bh_use[0] can then return a zero value upon which brelse() function is
      invoked.
      
      This commit fixes the bug by returning -ENOENT when the directory file
      has no associated blocks.
      Reported-by: NAbdul Haleem <abdhalee@linux.vnet.ibm.com>
      Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      9d5afec6
  11. 11 12月, 2017 7 次提交
  12. 10 12月, 2017 1 次提交
  13. 09 12月, 2017 2 次提交
    • D
      xfs: make iomap_begin functions trim iomaps consistently · b7e0b6ff
      Darrick J. Wong 提交于
      Historically, the XFS iomap_begin function only returned mappings for
      exactly the range queried, i.e. it doesn't do XFS_BMAPI_ENTIRE lookups.
      The current vfs iomap consumers are only set up to deal with trimmed
      mappings.  xfs_xattr_iomap_begin does BMAPI_ENTIRE lookups, which is
      inconsistent with the current iomap usage.  Remove the flag so that both
      iomap_begin functions behave the same way.
      
      FWIW this also fixes a behavioral regression in xattr FIEMAP that was
      introduced in 4.8 wherein attr fork extents are no longer trimmed like
      they used to be.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      b7e0b6ff
    • C
      xfs: remove "no-allocation" reservations for file creations · f59cf5c2
      Christoph Hellwig 提交于
      If we create a new file we will need an inode, and usually some metadata
      in the parent direction.  Aiming for everything to go well despite the
      lack of a reservation leads to dirty transactions cancelled under a heavy
      create/delete load.  This patch removes those nospace transactions, which
      will lead to slightly earlier ENOSPC on some workloads, but instead
      prevent file system shutdowns due to cancelling dirty transactions for
      others.
      
      A customer could observe assertations failures and shutdowns due to
      cancelation of dirty transactions during heavy NFS workloads as shown
      below:
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728125] XFS: Assertion failed: error != -ENOSPC, file: fs/xfs/xfs_inode.c, line: 1262
      
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728222] Call Trace:
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728246]  [<ffffffff81795daf>] dump_stack+0x63/0x81
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728262]  [<ffffffff810a1a5a>] warn_slowpath_common+0x8a/0xc0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728264]  [<ffffffff810a1b8a>] warn_slowpath_null+0x1a/0x20
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728285]  [<ffffffffa01bf403>] asswarn+0x33/0x40 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728308]  [<ffffffffa01bb07e>] xfs_create+0x7be/0x7d0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728329]  [<ffffffffa01b6ffb>] xfs_generic_create+0x1fb/0x2e0 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728348]  [<ffffffffa01b7114>] xfs_vn_mknod+0x14/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728366]  [<ffffffffa01b7153>] xfs_vn_create+0x13/0x20 [xfs]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728380]  [<ffffffff81231de5>] vfs_create+0xd5/0x140
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728390]  [<ffffffffa045ddb9>] do_nfsd_create+0x499/0x610 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728396]  [<ffffffffa0465fa5>] nfsd3_proc_create+0x135/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728401]  [<ffffffffa04561e3>] nfsd_dispatch+0xc3/0x210 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728416]  [<ffffffffa03bfa43>] svc_process_common+0x453/0x6f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728423]  [<ffffffffa03bfdf3>] svc_process+0x113/0x1f0 [sunrpc]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728427]  [<ffffffffa0455bcf>] nfsd+0x10f/0x180 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728432]  [<ffffffffa0455ac0>] ? nfsd_destroy+0x80/0x80 [nfsd]
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728438]  [<ffffffff810c0d58>] kthread+0xd8/0xf0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728441]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728451]  [<ffffffff8179d962>] ret_from_fork+0x42/0x70
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728453]  [<ffffffff810c0c80>] ? kthread_create_on_node+0x1b0/0x1b0
      2017-05-30 21:17:06 kernel: WARNING: [ 2670.728454] ---[ end trace f9822c842fec81d4 ]---
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728477] XFS (sdb): Internal error xfs_trans_cancel at line 983 of file fs/xfs/xfs_trans.c.  Caller xfs_create+0x4ee/0x7d0 [xfs]
      
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728684] XFS (sdb): Corruption of in-memory data detected. Shutting down filesystem
      2017-05-30 21:17:06 kernel: ALERT: [ 2670.728685] XFS (sdb): Please umount the filesystem and rectify the problem(s)
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      f59cf5c2