1. 06 Jun 2014, 12 commits
  2. 15 May 2014, 9 commits
  3. 13 May 2014, 5 commits
  4. 07 May 2014, 8 commits
    • D
      xfs: fix directory readahead offset off-by-one · 8cfcc3e5
      Authored by Dave Chinner
      Directory readahead can throw loud, scary but harmless warnings
      when multiblock directories are in use and a specific pattern of
      discontiguous blocks is found in the directory. That is, if a hole
      follows a discontiguous block, it will throw a warning like:
      
      XFS (dm-1): xfs_da_do_buf: bno 637 dir: inode 34363923462
      XFS (dm-1): [00] br_startoff 637 br_startblock 1917954575 br_blockcount 1 br_state 0
      XFS (dm-1): [01] br_startoff 638 br_startblock -2 br_blockcount 1 br_state 0
      
      And dump a stack trace.
      
      This is because the readahead offset increment loop does a double
      increment of the block index - it does an increment for the loop
      iteration as well as increase the loop counter by the number of
      blocks in the extent. As a result, the readahead offset does not get
      incremented correctly for discontiguous blocks and hence can ask for
      readahead of a directory block from an offset part way through a
      directory block.  If that directory block is followed by a hole, it
      will trigger a mapping warning like the above.
      
      The bad readahead will be ignored, though, because the main
      directory block read loop uses the correct mapping offsets rather
      than the readahead offset and so will ignore the bad readahead
      altogether.
      
      Fix the warning by ensuring that the readahead offset is correctly
      incremented.
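      The double increment can be sketched in miniature (a toy model,
      not the actual XFS readahead code; the struct and function names
      below are invented for illustration):

```c
#include <assert.h>

/* Toy model of the readahead index walk: each extent maps
 * `blockcount` blocks, and the index must advance by exactly
 * that many blocks per extent. */
struct ext { int startoff; int blockcount; };

/* Buggy walk: the for-loop increment adds 1 on top of the
 * per-extent advance, so the index drifts past extent
 * boundaries and lands partway through a directory block. */
static int walk_buggy(const struct ext *e, int nexts)
{
	int idx = 0;
	int i;

	for (i = 0; i < nexts; i++, idx++)	/* extra idx++ here */
		idx += e[i].blockcount;
	return idx;
}

/* Fixed walk: advance only by each extent's block count. */
static int walk_fixed(const struct ext *e, int nexts)
{
	int idx = 0;
	int i;

	for (i = 0; i < nexts; i++)
		idx += e[i].blockcount;
	return idx;
}
```

      With two single-block extents the fixed walk ends at block 2,
      while the buggy walk ends at block 4, asking for readahead at an
      offset no extent covers.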
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      8cfcc3e5
    • D
      xfs: don't sleep in xlog_cil_force_lsn on shutdown · ac983517
      Authored by Dave Chinner
      Reports of a shutdown hang when fsyncing a directory have surfaced,
      such as this:
      
      [ 3663.394472] Call Trace:
      [ 3663.397199]  [<ffffffff815f1889>] schedule+0x29/0x70
      [ 3663.402743]  [<ffffffffa01feda5>] xlog_cil_force_lsn+0x185/0x1a0 [xfs]
      [ 3663.416249]  [<ffffffffa01fd3af>] _xfs_log_force_lsn+0x6f/0x2f0 [xfs]
      [ 3663.429271]  [<ffffffffa01a339d>] xfs_dir_fsync+0x7d/0xe0 [xfs]
      [ 3663.435873]  [<ffffffff811df8c5>] do_fsync+0x65/0xa0
      [ 3663.441408]  [<ffffffff811dfbc0>] SyS_fsync+0x10/0x20
      [ 3663.447043]  [<ffffffff815fc7d9>] system_call_fastpath+0x16/0x1b
      
      If we trigger a shutdown in xlog_cil_push() from xlog_write(), we
      will never wake waiters on the current push sequence number, so
      anything waiting in xlog_cil_force_lsn() for that push sequence
      number to come up will not get woken and hence stall the shutdown.
      
      Fix this by ensuring we call wake_up_all(&cil->xc_commit_wait) in
      the push abort handling, in the log shutdown code when waking all
      waiters, and adding a shutdown check in the sequence completion wait
      loops to ensure they abort when a wakeup due to a shutdown occurs.
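      The shape of the added check can be sketched as a wait-loop
      predicate (a simplified model with invented names, not the real
      xlog_cil_force_lsn() code):

```c
#include <assert.h>

/* Simplified model of the CIL force wait state. */
struct cil_state {
	unsigned long	completed_seq;	/* last push sequence completed */
	int		shutdown;	/* log has been shut down */
};

/* Before the fix the waiter slept until completed_seq reached the
 * wanted sequence; on shutdown nothing advances the sequence, so the
 * sleep never ended. Also checking the shutdown flag lets the waiter
 * abort instead of stalling the shutdown. */
static int wait_should_finish(const struct cil_state *s, unsigned long want)
{
	return s->shutdown || s->completed_seq >= want;
}
```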
      Reported-by: Boris Ranto <branto@redhat.com>
      Reported-by: Eric Sandeen <esandeen@redhat.com>
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      ac983517
    • D
      xfs: truncate_setsize should be outside transactions · 49abc3a8
      Authored by Dave Chinner
      truncate_setsize() removes pages from the page cache, and hence
      requires page locks to be held. It is not valid to lock a page
      cache page inside a transaction context, as we can hold page
      locks when we reserve space for a transaction. If we do, then we
      expose an ABBA deadlock between log space reservation and page
      locks.
      
      That is, both the write path and writeback lock a page, then
      start a transaction for block allocation, which means they can
      block waiting for a log reservation with the page lock held. If
      we hold a log reservation and then do something that locks a
      page (e.g. truncate_setsize in xfs_setattr_size), we can block
      on a page that is itself locked and waiting for a log
      reservation. If the transaction that is waiting for the page
      lock is the only active transaction in the system that can free
      log space via a commit, then writeback will never make progress
      and log space will never free up.
      
      This issue with xfs_setattr_size() was introduced back in 2010 by
      commit fa9b227e ("xfs: new truncate sequence") which moved the page
      cache truncate from outside the transaction context (what was
      xfs_itruncate_data()) to inside the transaction context as a call to
      truncate_setsize().
      
      The reason truncate_setsize() was located in this place was that
      we shouldn't change the file size until after we are in the
      transaction context, so that the operation will either succeed
      or shut down the filesystem on failure. However,
      block_truncate_page() already modifies the file contents before
      we enter the transaction context, so we can't really fulfill
      this guarantee in any way. Hence we may as well ensure that on
      success or failure, the in-memory inode and data are truncated
      away and that the application cleans up the mess appropriately.
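      The ordering rule the fix restores can be sketched like this
      (purely illustrative step names, not the real xfs_setattr_size()
      code):

```c
#include <assert.h>

/* Both the write path and the fixed setattr path must take page
 * locks strictly before blocking on log space, so there is one
 * global lock order and no ABBA deadlock. */
enum step { PAGE_LOCK = 1, LOG_RESERVATION = 2 };

/* Fixed setattr path: truncate the page cache (taking and dropping
 * page locks) first, then reserve log space. */
static void setattr_size_order(enum step out[2])
{
	out[0] = PAGE_LOCK;		/* truncate_setsize() */
	out[1] = LOG_RESERVATION;	/* transaction reservation */
}

/* Write path order, for comparison. */
static void write_path_order(enum step out[2])
{
	out[0] = PAGE_LOCK;
	out[1] = LOG_RESERVATION;
}
```

      Because both paths now agree on the order, neither can hold a
      log reservation while waiting on a page the other holds.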
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      49abc3a8
    • F
      fs/affs/super.c: bugfix / double free · d353efd0
      Authored by Fabian Frederick
      Commit 842a859d ("affs: use ->kill_sb() to simplify ->put_super()
      and failure exits of ->mount()") adds a .kill_sb that frees sbi,
      but it does not remove the sbi free in the parse_options error
      path, causing a double free and a random crash.
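      The pattern is easy to model (illustrative code, not the real
      affs source): once ->kill_sb() owns the free, the fill_super
      error path must not free too.

```c
#include <assert.h>
#include <stdlib.h>

struct sbi { int opts; };

/* Error path in fill_super: with a ->kill_sb() that frees sbi,
 * this path must leave the pointer alone. Freeing here and again
 * in kill_sb() is the double free the patch removes. */
static void fill_super_error(struct sbi **sbi, int kill_sb_frees)
{
	if (!kill_sb_frees) {
		free(*sbi);
		*sbi = NULL;
	}
}

static void kill_sb(struct sbi **sbi)
{
	free(*sbi);
	*sbi = NULL;
}
```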
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[3.14.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d353efd0
    • W
      fanotify: fix -EOVERFLOW with large files on 64-bit · 1e2ee49f
      Authored by Will Woods
      On 64-bit systems, O_LARGEFILE is automatically added to flags inside
      the open() syscall (also openat(), blkdev_open(), etc).  Userspace
      therefore defines O_LARGEFILE to be 0 - you can use it, but it's a
      no-op.  Everything should be O_LARGEFILE by default.
      
      But: when fanotify does create_fd() it uses dentry_open(), which skips
      all that.  And userspace can't set O_LARGEFILE in fanotify_init()
      because it's defined to 0.  So if fanotify gets an event regarding a
      large file, the read() will just fail with -EOVERFLOW.
      
      This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit
      systems, using the same test as open()/openat()/etc.
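      A minimal sketch of the idea (the flag value and helper name
      below are illustrative; the kernel's actual test also consults
      the task personality):

```c
#include <assert.h>

#define SKETCH_O_LARGEFILE 0100000	/* illustrative octal value */

/* On a 64-bit build, fold O_LARGEFILE into the event file flags
 * the same way open()/openat() do, so reads of large files from
 * the fanotify fd don't fail with -EOVERFLOW. */
static unsigned int fixup_event_f_flags(unsigned int flags)
{
	if (sizeof(long) == 8)
		flags |= SKETCH_O_LARGEFILE;
	return flags;
}
```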
      
      Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821
      Signed-off-by: Will Woods <wwoods@redhat.com>
      Acked-by: Eric Paris <eparis@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1e2ee49f
    • I
      autofs: fix lockref lookup · 6b6751f7
      Authored by Ian Kent
      autofs needs to be able to see private data dentry flags for its
      dentries that are being created but not yet hashed, and for its
      dentries that have been rmdir()ed but not yet freed. It needs to
      do this so it can block processes in these states until a status
      has been returned to indicate the given operation is complete.

      It does this by keeping two lists, active and expiring, of
      dentries in this state, and uses ->d_release() to keep them
      stable while it checks the reference count to determine if they
      should be used.

      But with the recent lockref changes, dentries being freed
      sometimes don't transition to a reference count of 0 before
      being freed, so autofs can occasionally use a dentry that is
      invalid, which can lead to a panic.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6b6751f7
    • N
      hugetlb: ensure hugepage access is denied if hugepages are not supported · 457c1b27
      Authored by Nishanth Aravamudan
      Currently, I am seeing the following when I `mount -t hugetlbfs
      /none /dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`.
      This appears to be because hugetlbfs is not correctly setting
      itself up in this state:
      
        Unable to handle kernel paging request for data at address 0x00000031
        Faulting instruction address: 0xc000000000245710
        Oops: Kernel access of bad area, sig: 11 [#1]
        SMP NR_CPUS=2048 NUMA pSeries
        ....
      
      In KVM guests on Power, in a guest not backed by hugepages, we see the
      following:
      
        AnonHugePages:         0 kB
        HugePages_Total:       0
        HugePages_Free:        0
        HugePages_Rsvd:        0
        HugePages_Surp:        0
        Hugepagesize:         64 kB
      
      HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
      are not supported at boot-time, but this is only checked in
      hugetlb_init().  Extract the check to a helper function, and use it in a
      few relevant places.
      
      This does make hugetlbfs not supported (not registered at all) in this
      environment.  I believe this is fine, as there are no valid hugepages
      and that won't change at runtime.
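      The extracted predicate amounts to something like this (the
      function name is a stand-in for illustration, keying off
      HPAGE_SHIFT as the commit describes):

```c
#include <assert.h>

/* One central check: HPAGE_SHIFT == 0 means the boot-time
 * configuration has no hugepage support, so every entry point
 * (mount, open, mmap) can deny hugepage access consistently
 * instead of oopsing later. */
static int hugepages_supported_sketch(unsigned int hpage_shift)
{
	return hpage_shift != 0;
}
```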
      
      [akpm@linux-foundation.org: use pr_info(), per Mel]
      [akpm@linux-foundation.org: fix build when HPAGE_SHIFT is undefined]
      Signed-off-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      457c1b27
    • C
      posix_acl: handle NULL ACL in posix_acl_equiv_mode · 50c6e282
      Authored by Christoph Hellwig
      Various filesystems don't bother checking for a NULL ACL in
      posix_acl_equiv_mode, and thus can dereference a NULL pointer
      when they get passed one. This usually happens from the NFS
      server, as the ACL tools never pass a NULL ACL, but instead pass
      one representing the mode bits.

      Instead of adding boilerplate to all filesystems, put this check
      into one place, which will allow us to remove the check from the
      other filesystems later on.
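      The centralised check is essentially this (simplified signature
      for illustration; the real function takes a struct posix_acl *
      and a umode_t *):

```c
#include <assert.h>
#include <stddef.h>

/* Sketch: a NULL ACL is by definition representable by the mode
 * bits alone, so return 0 ("equivalent") up front instead of
 * letting every filesystem dereference the NULL pointer. */
static int acl_equiv_mode_sketch(const void *acl, unsigned int *mode_p)
{
	if (acl == NULL)
		return 0;
	/* ... walk the ACL entries and compare against *mode_p ... */
	return 1;
}
```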
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reported-by: Ben Greear <greearb@candelatech.com>
      Reported-by: Marco Munderloh <munderl@tnt.uni-hannover.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      50c6e282
  5. 06 May 2014, 2 commits
    • D
      xfs: remote attribute overwrite causes transaction overrun · 8275cdd0
      Authored by Dave Chinner
      Commit e461fcb1 ("xfs: remote attribute lookups require the value
      length") passes the remote attribute length in the xfs_da_args
      structure on lookup so that CRC calculations and validity checking
      can be performed correctly by related code. This, unfortunately, has
      the side effect of changing the args->valuelen parameter in cases
      where it shouldn't.
      
      That is, when we replace a remote attribute, the incoming
      replacement stores the value and length in args->value and
      args->valuelen, but then the lookup which finds the existing remote
      attribute overwrites args->valuelen with the length of the remote
      attribute being replaced. Hence when we go to create the new
      attribute, we create it of the size of the existing remote
      attribute, not the size it is supposed to be. When the new attribute
      is much smaller than the old attribute, this results in a
      transaction overrun and an ASSERT() failure on a debug kernel:
      
      XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 331
      
      Fix this by keeping the remote attribute value length separate
      from the attribute value length in the xfs_da_args structure.
      This enables us to pass the length of the remote attribute to be
      removed without overwriting the new attribute's length.

      Also, ensure that when we save remote block contexts for a later
      rename we zero the original state variables, so that we don't
      confuse the state of the attribute to be removed with the state
      of the new attribute that we just added. [Spotted by Brian
      Foster.]
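      The fix amounts to splitting one field into two. The struct
      below is a stand-in, not the real xfs_da_args, though the field
      roles follow the commit's description:

```c
#include <assert.h>

struct da_args_sketch {
	int valuelen;		/* length of the new value being set */
	int rmtvaluelen;	/* length of the existing remote value */
};

/* Lookup records what it found in the separate remote-length
 * field, so the incoming replacement's valuelen is no longer
 * clobbered and the new attribute is created at the right size. */
static void lookup_found_remote(struct da_args_sketch *args, int found_len)
{
	args->rmtvaluelen = found_len;
}
```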
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      8275cdd0
    • B
      xfs: initialize default acls for ->tmpfile() · d540e43b
      Authored by Brian Foster
      The current tmpfile handler does not initialize default ACLs. Doing so
      within xfs_vn_tmpfile() makes it roughly equivalent to xfs_vn_mknod(),
      which is already used as a common create handler.
      
      xfs_vn_mknod() does not currently have a mechanism to determine whether
      to link the file into the namespace. Therefore, further abstract
      xfs_vn_mknod() into a new xfs_generic_create() handler with a tmpfile
      parameter. This new handler calls xfs_create_tmpfile() and d_tmpfile()
      on the dentry when called via ->tmpfile().
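      The refactor's shape, reduced to a sketch (invented names; only
      the handler name xfs_generic_create comes from the commit):

```c
#include <assert.h>

/* One common create path with a tmpfile flag: all callers share
 * the inode setup (including default ACL initialisation), and only
 * non-tmpfile callers link the result into the namespace. */
static int generic_create_sketch(int tmpfile, int *linked)
{
	/* ... allocate inode, initialise default ACLs ... */
	*linked = !tmpfile;	/* d_tmpfile() vs a normal link */
	return 0;
}
```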
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      d540e43b
  6. 05 May 2014, 4 commits
    • F
      xfs: Fix wrong error codes being returned · b28fd7b5
      Authored by Tuomas Tynkkynen
      xfs_{compat_,}attrmulti_by_handle could return an errno with incorrect
      sign in some cases. While at it, make sure ENOMEM is returned instead of
      E2BIG if kmalloc fails.
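      The convention being restored, in miniature (illustrative
      helper, not the actual ioctl code): kernel-internal paths
      return negative errno values, and an allocation failure is
      ENOMEM, not E2BIG.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static int copy_attr_sketch(const void *kbuf)
{
	if (kbuf == NULL)
		return -ENOMEM;	/* allocation failed: not E2BIG,
				 * and with the negative sign */
	/* ... copy the attribute value ... */
	return 0;
}
```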
      Signed-off-by: Tuomas Tynkkynen <tuomas.tynkkynen@iki.fi>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      b28fd7b5
    • D
      xfs: remove dquot hints · 3c353375
      Authored by Dave Chinner
      Group and project quota hints are currently stored on the user
      dquot. If we are attaching quotas to the inode, then the group and
      project dquots are stored as hints on the user dquot to save having
      to look them up again later.
      
      The thing is, the hints are not used for that inode for the rest of
      the life of the inode - the dquots are attached directly to the
      inode itself - so the only time the hints are used is when an inode
      first has dquots attached.
      
      When the hints on the user dquot don't match the dquots being
      attached to the inode, they are then removed and replaced with
      the new hints. If a user is concurrently modifying files in
      different group and/or project contexts, then this leads to
      thrashing of the hints attached to the user dquot.
      
      If user quotas are not enabled, then hints are never even used.
      
      So, if the hints are used to avoid the cost of the lookup, is
      the cost of the lookup significant enough to justify the hint
      infrastructure? Maybe it was once, when there was a global quota
      manager shared between all XFS filesystems and it was hash table
      based.
      
      However, lookups are now much simpler, requiring only a single
      lock and a radix tree lookup local to the filesystem, and no
      hash or LRU manipulations to be made. Hence the cost of lookup
      is much lower than when hints were implemented. Benchmarks bear
      that out, too, with there being no difference in performance
      when doing file creation workloads as a single user with user,
      group and project quotas enabled - the hints do not make the
      code go any faster. In fact, removing the hints shows a 2-3%
      reduction in the time it takes to create 50 million inodes....
      
      So, let's just get rid of the hints and the complexity around them.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      3c353375
    • E
      xfs: bulletproof xfs_qm_scall_trunc_qfiles() · f58522c5
      Authored by Eric Sandeen
      Coverity noticed that if we sent junk into
      xfs_qm_scall_trunc_qfiles(), we could get back an
      uninitialized error value.  So sanitize the flags we
      will accept, and initialize error anyway for good measure.
      
      (This bug may have been introduced via c61a9e39).
      
      Should resolve Coverity CID 1163872.
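      The defensive shape of the fix can be sketched as follows (the
      flag values are illustrative, not the real XFS quota constants):

```c
#include <assert.h>
#include <errno.h>

#define QF_USER		0x1
#define QF_GROUP	0x2
#define QF_PROJ		0x4
#define QF_ALL		(QF_USER | QF_GROUP | QF_PROJ)

/* Initialise the error up front and reject unknown flag bits, so
 * junk input can never leak an uninitialised error to the caller. */
static int trunc_qfiles_sketch(unsigned int flags)
{
	int error = -EINVAL;

	if (flags == 0 || (flags & ~QF_ALL))
		return error;
	/* ... truncate the selected quota files ... */
	error = 0;
	return error;
}
```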
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Jie Liu <jeff.liu@oracle.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      f58522c5
    • E
      xfs: fix Q_XQUOTARM ioctl · 9da93f9b
      Authored by Eric Sandeen
      The Q_XQUOTARM quotactl was not working properly, because
      we weren't passing around proper flags.  The xfs_fs_set_xstate()
      ioctl handler used the same flags for Q_XQUOTAON/OFF as
      well as for Q_XQUOTARM, but Q_XQUOTAON/OFF look for
      XFS_UQUOTA_ACCT, XFS_UQUOTA_ENFD, XFS_GQUOTA_ACCT etc,
      i.e. quota type + state, while Q_XQUOTARM looks only for
      the type of quota, i.e. XFS_DQ_USER, XFS_DQ_GROUP etc.
      
      Unfortunately these flag spaces overlap a bit, so we
      got semi-random results for Q_XQUOTARM; i.e. the value
      for XFS_DQ_USER == XFS_UQUOTA_ACCT, etc.  yeargh.
      
      Add a new quotactl op vector specifically for the QUOTARM
      operation, since it operates with a different flag space.
      
      This has been broken more or less forever, AFAICT.
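      The overlap can be shown with two illustrative flag namespaces
      (the values are invented, but the aliasing mirrors the clash the
      commit describes between the type and state flag spaces):

```c
#include <assert.h>

/* Type namespace (what Q_XQUOTARM passes). */
enum { DQ_USER = 0x0001, DQ_GROUP = 0x0002 };

/* State namespace (what Q_XQUOTAON/OFF pass): same bit values. */
enum { UQUOTA_ACCT = 0x0001, UQUOTA_ENFD = 0x0002 };

/* A handler written for state flags misreads a type flag, which is
 * why QUOTARM needed its own op vector with its own flag space. */
static int state_handler_sees_acct(unsigned int flags)
{
	return (flags & UQUOTA_ACCT) != 0;
}
```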
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      
      9da93f9b