1. 08 Aug 2014, 4 commits
    • take fs_pin stuff to fs/* · efb170c2
      Al Viro authored
      Add a new field to fs_pin - kill(pin).  That's what umount and r/o remount
      will be calling for all pins attached to the vfsmount and superblock,
      respectively.  It is called after bumping the refcount, so the pin won't go
      away under us; dropping the refcount is the responsibility of the instance.
      All the generic stuff is moved to fs/fs_pin.c; the next step will rip all
      knowledge of kernel/acct.c out of fs/super.c and fs/namespace.c.  After
      that - death to mnt_pin(); it was intended to be usable as a generic
      mechanism for code that wants to attach objects to a vfsmount, so that they
      would not make the sucker busy and would get killed on umount.  We never
      got it right; it remained acct.c-specific all along.  Now it's very close
      to being killable.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      efb170c2
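      A minimal userspace C sketch of the shape this change gives fs_pin: a
      kill() callback that the generic umount / r/o-remount path invokes for each
      pin after taking a reference, leaving the refcount drop to the instance.
      The struct layout and function names below are simplified illustrations,
      not the actual kernel definitions.

        #include <stdio.h>

        /* Simplified stand-in for the kernel's fs_pin. */
        struct fs_pin {
                int count;                     /* refcount; dropped by the instance */
                void (*kill)(struct fs_pin *); /* called by umount / r/o remount    */
        };

        /* What the generic code does for each pin attached to a vfsmount or
         * superblock: bump the refcount so the pin cannot vanish underneath us,
         * then call ->kill(); the instance drops the reference itself. */
        static void pin_kill(struct fs_pin *p)
        {
                p->count++;
                p->kill(p);
        }

        static void acct_pin_kill(struct fs_pin *p)
        {
                /* instance-specific teardown (e.g. closing the acct file) ... */
                p->count--;     /* ... then drop the reference taken above */
                printf("pin killed, count=%d\n", p->count);
        }

        int main(void)
        {
                struct fs_pin pin = { .count = 1, .kill = acct_pin_kill };
                pin_kill(&pin);
                return 0;
        }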
    • drop ->s_umount around acct_auto_close() · 0aec09d0
      Al Viro authored
      just repeat the frozen check after regaining it, and check that sb
      is still alive.  If several threads hit acct_auto_close() at the
      same time, acct_auto_close() will survive that just fine.  And we
      really don't want to play with writes and closing the file with
      ->s_umount held exclusive - it's a deadlock country.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      0aec09d0
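      A userspace analogue (using pthreads) of the locking pattern described
      above: drop the lock before work that could deadlock against it, then
      retake it and re-validate the state that was relied on.  All names here are
      invented for illustration; the kernel code uses the ->s_umount rw_semaphore
      and the superblock's frozen/alive state.

        #include <pthread.h>
        #include <stdbool.h>
        #include <stdio.h>

        static pthread_rwlock_t s_umount = PTHREAD_RWLOCK_INITIALIZER;
        static bool sb_alive  = true;
        static bool sb_frozen = false;

        static void remount_ro_path(void)
        {
                pthread_rwlock_wrlock(&s_umount);

                /* Writing to and closing a file with the lock held exclusively
                 * is deadlock country, so drop it around that work ... */
                pthread_rwlock_unlock(&s_umount);
                /* acct_auto_close()-like work happens here, unlocked */
                pthread_rwlock_wrlock(&s_umount);

                /* ... and repeat the checks after regaining the lock, since
                 * the world may have changed while it was dropped. */
                if (!sb_alive || sb_frozen) {
                        pthread_rwlock_unlock(&s_umount);
                        return;
                }
                /* continue with the r/o remount under the lock */
                pthread_rwlock_unlock(&s_umount);
        }

        int main(void)
        {
                remount_ro_path();
                puts("done");
                return 0;
        }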
    • acct: get rid of acct_list · 215752fc
      Al Viro authored
      Put these suckers on per-vfsmount and per-superblock lists instead.
      Note: right now it's still acct_lock for everything, but that's
      going to change.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      215752fc
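      A simplified userspace sketch of the data-structure change: instead of one
      global acct_list, each accounting entry is linked onto the vfsmount and the
      superblock it belongs to, so umount and r/o remount can find it without a
      global walk.  Struct and field names are illustrative; the kernel uses
      hlist_head/hlist_node in the real struct mount and struct super_block.

        #include <stdio.h>

        struct acct_entry {
                struct acct_entry *next_on_mnt; /* link on the vfsmount's list   */
                struct acct_entry *next_on_sb;  /* link on the superblock's list */
                const char *filename;
        };

        struct mount_stub      { struct acct_entry *acct_list; };
        struct superblock_stub { struct acct_entry *acct_list; };

        static void acct_attach(struct acct_entry *a,
                                struct mount_stub *mnt,
                                struct superblock_stub *sb)
        {
                a->next_on_mnt = mnt->acct_list;  mnt->acct_list = a;
                a->next_on_sb  = sb->acct_list;   sb->acct_list  = a;
        }

        int main(void)
        {
                struct mount_stub      mnt = { 0 };
                struct superblock_stub sb  = { 0 };
                struct acct_entry a = { .filename = "/var/log/pacct" };

                acct_attach(&a, &mnt, &sb);
                printf("on mnt: %s\n", mnt.acct_list->filename);
                return 0;
        }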
    • acct: switch to __kernel_write() · ed44724b
      Al Viro authored
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      ed44724b
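      For context, a kernel-style sketch (not compilable on its own; the
      acct-specific names are guesses rather than the real code) of what the
      switch looks like: __kernel_write() writes a kernel buffer to a file
      without the old set_fs(KERNEL_DS) dance around ->write().

        #include <linux/fs.h>

        /* illustrative record type; the real code writes accounting records */
        struct acct_record { char bytes[64]; };

        static void write_acct_record(struct file *file, struct acct_record *rec)
        {
                loff_t pos = file->f_pos;

                /* before: set_fs(KERNEL_DS) + ->write() with a kernel pointer;
                 * after: __kernel_write() handles the kernel buffer directly */
                if (__kernel_write(file, (const char *)rec, sizeof(*rec), &pos)
                    == sizeof(*rec))
                        file->f_pos = pos;
        }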
  2. 01 Aug 2014, 2 commits
  3. 30 Jul 2014, 1 commit
  4. 24 Jul 2014, 6 commits
  5. 23 Jul 2014, 1 commit
    • NFSD: Fix crash encoding lock reply on 32-bit · f98bac5a
      Kinglong Mee authored
      Commit 8c7424cf ("nfsd4: don't try to encode conflicting owner if low
      on space") forgot to free conf->data in nfsd4_encode_lockt, and to free
      it before assigning conf->data to NULL in nfsd4_encode_lock_denied,
      causing a leak.
      
      Worse, kfree() can be called on an uninitialized pointer in the case of
      a successful lock (or one that fails for a reason other than a conflict).
      
      (Note that lock->lk_denied.ld_owner.data appears as though it should be
      zero here, until you notice that it's one arm of a union, the other arm
      of which is written to in the successful case by the
      
      	memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
      	                                sizeof(stateid_t));
      
      in nfsd4_lock().  In the 32-bit case this overwrites ld_owner.data.)
      Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
      Fixes: 8c7424cf "nfsd4: don't try to encode conflicting owner if low on space"
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      f98bac5a
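      A small userspace C illustration of the union aliasing described above: on
      a 32-bit layout the successful-lock memcpy() overlaps the pointer arm of
      the union, so freeing that pointer unconditionally passes garbage to the
      allocator.  The structure layout and names below are simplified, not the
      real nfsd definitions.

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        typedef struct { unsigned char bytes[16]; } stateid_t;

        struct lock_denied { unsigned int len; char *data; };

        struct nfsd4_lock {
                union {
                        stateid_t          lk_resp_stateid; /* written on success  */
                        struct lock_denied lk_denied;       /* written on conflict */
                } u;
        };

        int main(void)
        {
                struct nfsd4_lock lock;         /* deliberately not zeroed */
                stateid_t sid = { { 0xab } };

                /* Successful lock: the stateid arm is filled in, which (on a
                 * 32-bit layout) overlaps and clobbers lk_denied.data ... */
                memcpy(&lock.u.lk_resp_stateid, &sid, sizeof(stateid_t));

                /* ... so freeing lk_denied.data here would hand garbage to free().
                 * The fix is to free data only on the conflict path and to reset
                 * it to NULL after freeing, so later code sees a safe value. */
                lock.u.lk_denied.data = NULL;
                free(lock.u.lk_denied.data);    /* free(NULL) is a no-op */
                puts("ok");
                return 0;
        }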
  6. 22 Jul 2014, 2 commits
  7. 20 Jul 2014, 2 commits
    • btrfs: test for valid bdev before kobj removal in btrfs_rm_device · 0bfaa9c5
      Eric Sandeen authored
      Commit 99994cde ("btrfs: dev delete should remove sysfs entry") added a
      btrfs_kobj_rm_device() call, which dereferences device->bdev...
      right after we check whether device->bdev might be NULL.
      
      I don't honestly know if it's possible to have a NULL device->bdev
      here, but assuming that it is (given the test), we need to move
      the kobject removal to be under that test.
      
      (Coverity spotted this)
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      0bfaa9c5
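      A userspace sketch of the ordering fix: the sysfs/kobject removal
      dereferences device->bdev, so it must sit under the same NULL test as the
      rest of the bdev-dependent teardown.  Types and function names here are
      stand-ins, not the real btrfs code.

        #include <stdio.h>
        #include <stddef.h>

        struct bdev_stub   { int id; };
        struct device_stub { struct bdev_stub *bdev; };

        static void kobj_rm_device(struct device_stub *dev)
        {
                /* dereferences dev->bdev internally */
                printf("removing sysfs entry for bdev %d\n", dev->bdev->id);
        }

        static void rm_device(struct device_stub *dev)
        {
                /* Calling kobj_rm_device() unconditionally before this test
                 * would dereference a NULL bdev; keep it under the check. */
                if (dev->bdev) {
                        kobj_rm_device(dev);
                        /* other bdev-dependent teardown */
                }
        }

        int main(void)
        {
                struct device_stub no_bdev = { .bdev = NULL };
                rm_device(&no_bdev);    /* safe with the check in place */
                puts("ok");
                return 0;
        }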
    • Btrfs: fix abnormal long waiting in fsync · 98ce2ded
      Liu Bo authored
      xfstests generic/127 detected this problem.
      
      Since commit 7fc34a62, fsync only flushes data within the passed range.
      This is the cause of the above problem: btrfs's fsync has a stage called
      'sync log' which waits for all the ordered extents it has recorded to
      finish.
      
      In xfstests/generic/127, with mixed operations such as truncate, fallocate,
      punch hole, and mapwrite, we get some pre-allocated extents; mapwrite will
      mmap and then msync.  I found that msync waits for quite a long time
      (about 20s in my case).  With ftrace, it turns out that the earlier
      fallocate calls btrfs_wait_ordered_range() to flush dirty pages, but since
      the range of dirty pages may be larger than the range
      btrfs_wait_ordered_range() covers, some ordered extents are created without
      their corresponding pages getting flushed.  They are left in memory until
      we fsync, which runs into the 'sync log' stage, and fsync just waits for
      the system writeback thread to flush those pages and finish the ordered
      extents, so the latency is inevitable.
      
      This adds a flush similar to btrfs_start_ordered_extent() in
      btrfs_wait_logged_extents() to fix that.
      Reviewed-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      98ce2ded
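      A kernel-style sketch of the idea (not the actual btrfs code; the struct
      and most names are simplified stand-ins): before waiting for a logged
      ordered extent to finish, explicitly start writeback for its file range,
      the way btrfs_start_ordered_extent() does, instead of waiting for the
      background flusher to get around to it.

        #include <linux/fs.h>

        struct ordered_extent_stub {
                struct inode *inode;
                u64 file_offset;
                u64 len;
        };

        static void wait_logged_extent(struct ordered_extent_stub *oe)
        {
                /* kick writeback for exactly the dirty range backing this
                 * extent, so we are not at the mercy of background writeback */
                filemap_fdatawrite_range(oe->inode->i_mapping,
                                         oe->file_offset,
                                         oe->file_offset + oe->len - 1);

                /* ... then wait for the ordered extent to complete (omitted) */
        }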
  8. 18 Jul 2014, 10 commits
  9. 16 Jul 2014, 1 commit
  10. 15 Jul 2014, 4 commits
    • xfs: null unused quota inodes when quota is on · 03e01349
      Dave Chinner authored
      When quota is on, it is expected that unused quota inodes have a
      value of NULLFSINO.  The changes to support a separate project quota
      in 3.12 broke this rule for filesystems that do not have a project
      quota inode enabled, as the code now refuses to write the group quota
      inode if neither group nor project quotas are enabled.  This regression
      was introduced by commit d892d586 ("xfs: Start using pquotaino from the
      superblock").
      
      In this case, we should be writing NULLFSINO rather than nothing to
      ensure that we leave the group quota inode in a valid state while
      quotas are enabled.
      
      Failure to do so doesn't cause a current kernel to break - the
      separate project quota inode support introduced translation code to
      always treat a zero inode as NULLFSINO.  This was introduced by commit
      01026297 ("xfs: Initialize all quota inodes to be NULLFSINO"), which is
      also in 3.12, but older kernels do not do this and hence taking a
      filesystem back to an older kernel can result in quotas failing
      initialisation at mount time.  When that happens, we see this in
      dmesg:
      
      [ 1649.215390] XFS (sdb): Mounting Filesystem
      [ 1649.316894] XFS (sdb): Failed to initialize disk quotas.
      [ 1649.316902] XFS (sdb): Ending clean mount
      
      By ensuring that we write NULLFSINO to quota inodes that aren't
      active, we avoid this problem. We have to be really careful when
      determining if the quota inodes are active or not, because we don't
      want to write a NULLFSINO if the quota inodes are active and we
      simply aren't updating them.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      03e01349
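      A simplified userspace sketch of the rule described above (field names and
      the helper are illustrative, not the xfs_qm code): when quotas are enabled
      on the filesystem but a particular quota inode is not in use, the on-disk
      field is written as NULLFSINO rather than skipped, so older kernels still
      see a valid value at mount time.  The real code additionally has to be
      careful not to overwrite an active inode it simply isn't updating.

        #include <stdint.h>
        #include <stdio.h>

        #define NULLFSINO ((uint64_t)-1)   /* XFS's "no inode" sentinel */

        struct sb_stub { uint64_t sb_gquotino; };

        static void sync_gquotino(struct sb_stub *sb, int gquota_active,
                                  uint64_t gquota_ino)
        {
                if (gquota_active)
                        sb->sb_gquotino = gquota_ino;  /* in use: real inode   */
                else
                        sb->sb_gquotino = NULLFSINO;   /* unused: the sentinel */
        }

        int main(void)
        {
                struct sb_stub sb = { 0 };

                sync_gquotino(&sb, 0, 0);
                printf("sb_gquotino = %#llx\n", (unsigned long long)sb.sb_gquotino);
                return 0;
        }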
    • xfs: refine the allocation stack switch · cf11da9c
      Dave Chinner authored
      The allocation stack switch at xfs_bmapi_allocate() has served its
      purpose, but is no longer a sufficient solution to the stack usage
      problem we have in the XFS allocation path.
      
      Whilst the kernel stack size is now 16k, that is not a valid reason
      for undoing all our "keep stack usage down" modifications. What it
      does allow us to do is have the freedom to refine and perfect the
      modifications knowing that if we get it wrong it won't blow up in
      our faces - we have a safety net now.
      
      This is important because we still have the issue of older kernels
      having smaller stacks; they are still supported and are
      demonstrating a wide range of different stack overflows.  Red Hat
      has several open bugs for allocation based stack overflows from
      directory modifications and direct IO block allocation and these
      problems still need to be solved. If we can solve them upstream,
      then distros won't need to bake their own unique solutions.
      
      To that end, I've observed that every allocation based stack
      overflow report has had a specific characteristic - it has happened
      during or directly after a bmap btree block split. That event
      requires a new block to be allocated to the tree, and so we
      effectively stack one allocation stack on top of another, and that's
      when we get into trouble.
      
      A further observation is that bmap btree block splits are much rarer
      than writeback allocation - over a range of different workloads I've
      observed the ratio of bmap btree inserts to splits ranges from 100:1
      (xfstests run) to 10000:1 (local VM image server with sparse files
      that range in the hundreds of thousands to millions of extents).
      Either way, bmap btree split events are much, much rarer than
      allocation events.
      
      Finally, we have to move the kswapd state to the allocation workqueue
      work when allocation is done on behalf of kswapd. This is proving to
      cause significant perturbation in performance under memory pressure
      and appears to be generating allocation deadlock warnings under some
      workloads, so avoiding the use of a workqueue for the majority of
      kswapd writeback allocation will minimise the impact of such
      behaviour.
      
      Hence it makes sense to move the stack switch to xfs_btree_split()
      and only do it for bmap btree splits. Stack switches during
      allocation will be much rarer, so there won't be significant
      performance overhead caused by switching stacks.  The worst-case
      stack from all allocation paths will be split, not just writeback.
      And the majority of memory allocations will be done in the correct
      context (e.g. kswapd) without causing additional latency, and so we
      simplify the memory reclaim interactions between processes,
      workqueues and kswapd.
      
      The worst stack I've been able to generate with this patch in place
      is 5600 bytes deep. It's very revealing because we exit XFS at:
      
      37)     1768      64   kmem_cache_alloc+0x13b/0x170
      
      about 1800 bytes of stack consumed, and the remaining 3800 bytes
      (and 36 functions) is memory reclaim, swap and the IO stack. And
      this occurs in the inode allocation from an open(O_CREAT) syscall,
      not writeback.
      
      The amount of stack being used is much less than I've previously been
      able to generate - fs_mark testing has been able to generate stack
      usage of around 7k without too much trouble; with this patch it's
      only just getting to 5.5k. This is primarily because the metadata
      allocation paths (e.g. directory blocks) are no longer causing
      double splits on the same stack, and hence now stack tracing is
      showing swapping being the worst stack consumer rather than XFS.
      
      Performance of fs_mark inode create workloads is unchanged.
      Performance of fs_mark async fsync workloads is consistently good
      with context switches reduced by around 150,000/s (30%).
      Performance of dbench, streaming IO and postmark is unchanged.
      Allocation deadlock warnings have not been seen on the workloads
      that generated them since adding this patch.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      cf11da9c
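      A kernel-style sketch of the stack-switch pattern this commit moves into
      xfs_btree_split(): run the deep allocation step on a workqueue worker
      (which starts on a fresh stack) and wait for it to complete, rather than
      recursing further on the caller's already-deep stack.  Everything except
      the standard workqueue/completion APIs is an illustrative name, not the
      actual XFS code.

        #include <linux/workqueue.h>
        #include <linux/completion.h>

        struct split_args {
                struct work_struct work;
                struct completion  done;
                int                result;
                /* the real code carries the btree cursor and friends here */
        };

        static void split_worker(struct work_struct *work)
        {
                struct split_args *args =
                        container_of(work, struct split_args, work);

                /* the deep allocation path runs here, on the worker's stack */
                args->result = 0;
                complete(&args->done);
        }

        static int split_with_fresh_stack(struct workqueue_struct *wq)
        {
                struct split_args args;

                INIT_WORK_ONSTACK(&args.work, split_worker);
                init_completion(&args.done);
                queue_work(wq, &args.work);
                wait_for_completion(&args.done);
                destroy_work_on_stack(&args.work);
                return args.result;
        }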
    • Revert "xfs: block allocation work needs to be kswapd aware" · aa182e64
      Dave Chinner authored
      This reverts commit 1f6d6482.
      
      This commit resulted in regressions in performance in low
      memory situations where kswapd was doing writeback of delayed
      allocation blocks. It resulted in significant parallelism of the
      kswapd work, and with the special kswapd flags it meant that hundreds of
      active allocations could dip into kswapd-specific memory reserves and
      avoid being throttled.  This caused a large amount of performance
      variation, as well as random OOM-killer invocations that didn't
      previously exist.
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Dave Chinner <david@fromorbit.com>
      aa182e64
    • aio: protect reqs_available updates from changes in interrupt handlers · 263782c1
      Benjamin LaHaise authored
      As of commit f8567a38 it is now possible to
      have put_reqs_available() called from irq context.  While put_reqs_available()
      operates on per-cpu data, it did not protect itself from interrupts on the
      same CPU.  This led to aio_complete() corrupting the available io requests
      count when run under heavy O_DIRECT workloads, as reported by Robert Elliott.
      Fix this by disabling irqs around the per-cpu batch updates of reqs_available.
      
      Many thanks to Robert and folks for testing and tracking this down.
      Reported-by: Robert Elliott <Elliott@hp.com>
      Tested-by: Robert Elliott <Elliott@hp.com>
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Jens Axboe <axboe@kernel.dk>, Christoph Hellwig <hch@infradead.org>
      Cc: stable@vger.kernel.org
      263782c1
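      A kernel-style sketch of the fix described above (struct and function
      names are simplified stand-ins for the ones in fs/aio.c): since the
      per-CPU batch counter can now also be touched from interrupt context,
      the process-context update has to run with interrupts disabled on the
      local CPU.

        #include <linux/percpu.h>
        #include <linux/irqflags.h>

        struct kioctx_cpu_stub {
                unsigned reqs_available;
        };

        static DEFINE_PER_CPU(struct kioctx_cpu_stub, kioctx_cpu);

        static void put_reqs_available_stub(unsigned nr)
        {
                struct kioctx_cpu_stub *kcpu;
                unsigned long flags;

                local_irq_save(flags);      /* keep aio_complete() off this CPU  */
                kcpu = this_cpu_ptr(&kioctx_cpu);
                kcpu->reqs_available += nr; /* now safe against irq interleaving */
                local_irq_restore(flags);
        }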
  11. 14 Jul 2014, 3 commits
  12. 13 Jul 2014, 4 commits