1. 03 1月, 2014 8 次提交
    • S
      GFS2: Use range based functions for rgrp sync/invalidation · 7005c3e4
      Steven Whitehouse 提交于
      Each rgrp header is represented as a single extent on disk, so we
      can calculate the position within the address space, since we are
      using address spaces mapped 1:1 to the disk. This means that it
      is possible to use the range based versions of filemap_fdatawrite/wait
      and for invalidating the page cache.
      
      Our eventual intent is to then be able to merge the address spaces
      used for rgrps into a single address space, rather than to have
      one for each glock, saving memory and reducing complexity.
      
      Since during umount, the rgrp structures are disposed of before
      the glocks, we need to store the extent information in the glock
      so that is is available for a final invalidation. This patch uses
      a field which is otherwise unused in rgrp glocks to do that, so
      that we do not have to expand the size of a glock.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      7005c3e4
    • S
      GFS2: Remove test which is always true · 7de41d36
      Steven Whitehouse 提交于
      Since gfs2_inplace_reserve() is always called with a valid
      alloc parms structure, there is no need to test for this
      within the function itself - and in any case, after we've
      all ready dereferenced it anyway.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      7de41d36
    • S
      GFS2: Remove gfs2_quota_change_host structure · 7aed98fb
      Steven Whitehouse 提交于
      There is only one place this is used, when reading in the quota
      changes at mount time. It is not really required and much
      simpler to just convert the fields from the on-disk structure
      as required.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      7aed98fb
    • S
      GFS2: Clean up releasepage · e4f29206
      Steven Whitehouse 提交于
      For historical reasons, we drop and retake the log lock in ->releasepage()
      however, since there is no reason why we cannot hold the log lock over
      the whole function, this allows some simplification. In particular,
      pinning a buffer is only ever done under the log lock, so it is possible
      here to remove the test for pinned buffers in the second loop, since it
      is impossible for that to happen (it is also tested in the first loop).
      
      As a result, two tests made later in the second loop become constants
      and can also be reduced to the only possible branch. So the net result
      is to remove various bits of unreachable code and make this more
      readable.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      e4f29206
    • B
      GFS2: Implement a "rgrp has no extents longer than X" scheme · 5ea5050c
      Bob Peterson 提交于
      With the preceding patch, we started accepting block reservations
      smaller than the ideal size, which requires a lot more parsing of the
      bitmaps. To reduce the amount of bitmap searching, this patch
      implements a scheme whereby each rgrp keeps track of the point
      at this multi-block reservations will fail.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      5ea5050c
    • B
      GFS2: Drop inadequate rgrps from the reservation tree · 1330edbe
      Bob Peterson 提交于
      This is just basically a resend of a patch I posted earlier.
      It didn't change from its original, except in diff offsets, etc:
      
      This patch fixes a bug in the GFS2 block allocation code. The problem
      starts if a process already has a multi-block reservation, but for
      some reason, another process disqualifies it from further allocations.
      For example, the other process might set on the GFS2_RDF_ERROR bit.
      The process holding the reservation jumps to label skip_rgrp, but
      that label comes after the code that removes the reservation from the
      tree. Therefore, the no longer usable reservation is not removed from
      the rgrp's reservations tree; it's lost. Eventually, the lost reservation
      causes the count of reserved blocks to get off, and eventually that
      causes a BUG_ON(rs->rs_rbm.rgd->rd_reserved < rs->rs_free) to trigger.
      This patch moves the call to after label skip_rgrp so that the
      disqualified reservation is properly removed from the tree, thus keeping
      the rgrp rd_reserved count sane.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      1330edbe
    • B
      GFS2: If requested is too large, use the largest extent in the rgrp · 5ce13431
      Bob Peterson 提交于
      Here is a second try at a patch I posted earlier, which also implements
      suggestions Steve made:
      
      Before this patch, GFS2 would keep searching through all the rgrps
      until it found one that had a chunk of free blocks big enough to
      satisfy the size hint, which is based on the file write size,
      regardless of whether the chunk was big enough to perform the write.
      However, when doing big writes there may not be a large enough
      chunk of free blocks in any rgrp, due to file system fragmentation.
      The largest chunk may be big enough to satisfy the write request,
      but it may not meet the ideal reservation size from the "size hint".
      The writes would slow to a crawl because every write would search
      every rgrp, then finally give up and default to a single-block write.
      In my case, performance would drop from 425MB/s to 18KB/s, or 24000
      times slower.
      
      This patch basically makes it so that if we can't find a contiguous
      chunk of blocks big enough to satisfy the sizehint, we'll use the
      largest chunk of blocks we found that will still contain the write.
      It does so by keeping track of the largest run of blocks within the
      rgrp.
      Signed-off-by: NBob Peterson <rpeterso@redhat.com>
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      5ce13431
    • J
      epoll: do not take the nested ep->mtx on EPOLL_CTL_DEL · 4ff36ee9
      Jason Baron 提交于
      The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
      That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
      epoll_ctl(b, EPOLL_CTL_DEL, a, x).  The deadlock was introduced with
      commmit 67347fe4 ("epoll: do not take global 'epmutex' for simple
      topologies").
      
      The acquistion of the ep->mtx for the destination 'ep' was added such
      that a concurrent EPOLL_CTL_ADD operation would see the correct state of
      the ep (Specifically, the check for '!list_empty(&f.file->f_ep_links')
      
      However, by simply not acquiring the lock, we do not serialize behind
      the ep->mtx from the add path, and thus may perform a full path check
      when if we had waited a little longer it may not have been necessary.
      However, this is a transient state, and performing the full loop
      checking in this case is not harmful.
      
      The important point is that we wouldn't miss doing the full loop
      checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
      its operating upon.  The reason we don't need to do lock ordering in the
      add path, is that we are already are holding the global 'epmutex'
      whenever we do the double lock.  Further, the original posting of this
      patch, which was tested for the intended performance gains, did not
      perform this additional locking.
      Signed-off-by: NJason Baron <jbaron@akamai.com>
      Cc: Nathan Zimmer <nzimmer@sgi.com>
      Cc: Eric Wong <normalperson@yhbt.net>
      Cc: Nelson Elhage <nelhage@nelhage.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ff36ee9
  2. 02 1月, 2014 1 次提交
  3. 28 12月, 2013 3 次提交
  4. 23 12月, 2013 1 次提交
    • L
      aio: clean up and fix aio_setup_ring page mapping · 3dc9acb6
      Linus Torvalds 提交于
      Since commit 36bc08cc ("fs/aio: Add support to aio ring pages
      migration") the aio ring setup code has used a special per-ring backing
      inode for the page allocations, rather than just using random anonymous
      pages.
      
      However, rather than remembering the pages as it allocated them, it
      would allocate the pages, insert them into the file mapping (dirty, so
      that they couldn't be free'd), and then forget about them.  And then to
      look them up again, it would mmap the mapping, and then use
      "get_user_pages()" to get back an array of the pages we just created.
      
      Now, not only is that incredibly inefficient, it also leaked all the
      pages if the mmap failed (which could happen due to excessive number of
      mappings, for example).
      
      So clean it all up, making it much more straightforward.  Also remove
      some left-overs of the previous (broken) mm_populate() usage that was
      removed in commit d6c355c7 ("aio: fix race in ring buffer page
      lookup introduced by page migration support") but left the pointless and
      now misleading MAP_POPULATE flag around.
      Tested-and-acked-by: NBenjamin LaHaise <bcrl@kvack.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3dc9acb6
  5. 22 12月, 2013 2 次提交
    • B
      aio/migratepages: make aio migrate pages sane · 8e321fef
      Benjamin LaHaise 提交于
      The arbitrary restriction on page counts offered by the core
      migrate_page_move_mapping() code results in rather suspicious looking
      fiddling with page reference counts in the aio_migratepage() operation.
      To fix this, make migrate_page_move_mapping() take an extra_count parameter
      that allows aio to tell the code about its own reference count on the page
      being migrated.
      
      While cleaning up aio_migratepage(), make it validate that the old page
      being passed in is actually what aio_migratepage() expects to prevent
      misbehaviour in the case of races.
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      8e321fef
    • B
      aio: fix kioctx leak introduced by "aio: Fix a trinity splat" · 1881686f
      Benjamin LaHaise 提交于
      e34ecee2 reworked the percpu reference
      counting to correct a bug trinity found.  Unfortunately, the change lead
      to kioctxes being leaked because there was no final reference count to
      put.  Add that reference count back in to fix things.
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      Cc: stable@vger.kernel.org
      1881686f
  6. 21 12月, 2013 1 次提交
  7. 20 12月, 2013 3 次提交
    • T
      ext4: add explicit casts when masking cluster sizes · f5a44db5
      Theodore Ts'o 提交于
      The missing casts can cause the high 64-bits of the physical blocks to
      be lost.  Set up new macros which allows us to make sure the right
      thing happen, even if at some point we end up supporting larger
      logical block numbers.
      
      Thanks to the Emese Revfy and the PaX security team for reporting this
      issue.
      Reported-by: NPaX Team <pageexec@freemail.hu>
      Reported-by: Emese Revfy <re.emese@gmail.com>                                 
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      f5a44db5
    • S
      GFS2: Wait for async DIO in glock state changes · 582d2f7a
      Steven Whitehouse 提交于
      We need to wait for any outstanding DIO to complete in a couple
      of situations. Firstly, in case we are changing out of deferred
      mode (in inode_go_sync) where GLF_DIRTY will not be set. That
      call could be prefixed with a test for gl_state == LM_ST_DEFERRED
      but it doesn't seem worth it bearing in mind that the test for
      outstanding DIO is very quick anyway, in the usual case that there
      is none.
      
      The second case is in inode_go_lock which will catch the cases
      where we have a cached EX lock, but where we grant deferred locks
      against it so that there is no glock state transistion. We only
      need to wait if the state is not deferred, since DIO is valid
      anyway in that state.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      582d2f7a
    • S
      GFS2: Fix incorrect invalidation for DIO/buffered I/O · dfd11184
      Steven Whitehouse 提交于
      In patch 209806ab we allowed
      local deferred locks to be granted against a cached exclusive
      lock. That opened up a corner case which this patch now
      fixes.
      
      The solution to the problem is to check whether we have cached
      pages each time we do direct I/O and if so to unmap, flush
      and invalidate those pages. Since the glock state machine
      normally does that for us, mostly the code will be a no-op.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      dfd11184
  8. 18 12月, 2013 1 次提交
    • J
      ext4: fix deadlock when writing in ENOSPC conditions · 34cf865d
      Jan Kara 提交于
      Akira-san has been reporting rare deadlocks of his machine when running
      xfstests test 269 on ext4 filesystem. The problem turned out to be in
      ext4_da_reserve_metadata() and ext4_da_reserve_space() which called
      ext4_should_retry_alloc() while holding i_data_sem. Since
      ext4_should_retry_alloc() can force a transaction commit, this is a
      lock ordering violation and leads to deadlocks.
      
      Fix the problem by just removing the retry loops. These functions should
      just report ENOSPC to the caller (e.g. ext4_da_write_begin()) and that
      function must take care of retrying after dropping all necessary locks.
      Reported-and-tested-by: NAkira Fujita <a-fujita@rs.jp.nec.com>
      Reviewed-by: NZheng Liu <wenqing.lz@taobao.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      34cf865d
  9. 17 12月, 2013 8 次提交
    • D
      xfs: abort metadata writeback on permanent errors · ac8809f9
      Dave Chinner 提交于
      If we are doing aysnc writeback of metadata, we can get write errors
      but have nobody to report them to. At the moment, we simply attempt
      to reissue the write from io completion in the hope that it's a
      transient error.
      
      When it's not a transient error, the buffer is stuck forever in
      this loop, and we cannot break out of it. Eventually, unmount will
      hang because the AIL cannot be emptied and everything goes downhill
      from them.
      
      To solve this problem, only retry the write IO once before aborting
      it. We don't throw the buffer away because some transient errors can
      last minutes (e.g.  FC path failover) or even hours (thin
      provisioned devices that have run out of backing space) before they
      go away. Hence we really want to keep trying until we can't try any
      more.
      
      Because the buffer was not cleaned, however, it does not get removed
      from the AIL and hence the next pass across the AIL will start IO on
      it again. As such, we still get the "retry forever" semantics that
      we currently have, but we allow other access to the buffer in the
      mean time. Meanwhile the filesystem can continue to modify the
      buffer and relog it, so the IO errors won't hang the log or the
      filesystem.
      
      Now when we are pushing the AIL, we can see all these "permanent IO
      error" buffers and we can issue a warning about failures before we
      retry the IO. We can also catch these buffers when unmounting an
      issue a corruption warning, too.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      ac8809f9
    • D
      xfs: swalloc doesn't align allocations properly · 33177f05
      Dave Chinner 提交于
      When swalloc is specified as a mount option, allocations are
      supposed to be aligned to the stripe width rather than the stripe
      unit of the underlying filesystem. However, it does not do this.
      
      What the implementation does is round up the allocation size to a
      stripe width, hence ensuring that all allocations span a full stripe
      width. It does not, however, ensure that that allocation is aligned
      to a stripe width, and hence the allocations can span multiple
      underlying stripes and so still see RMW cycles for things like
      direct IO on MD RAID.
      
      So, if the swalloc mount option is set, change the allocation
      alignment in xfs_bmap_btalloc() to use the stripe width rather than
      the stripe unit.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      33177f05
    • C
      xfs: remove xfsbdstrat error · 83a0adc3
      Christoph Hellwig 提交于
      The xfsbdstrat helper is a small but useless wrapper for xfs_buf_iorequest that
      handles the case of a shut down filesystem.  Most of the users have private,
      uncached buffers that can just be freed in this case, but the complex error
      handling in xfs_bioerror_relse messes up the case when it's called without
      a locked buffer.
      
      Remove xfsbdstrat and opencode the error handling in the callers.  All but
      one can simply return an error and don't need to deal with buffer state,
      and the one caller that cares about the buffer state could do with a major
      cleanup as well, but we'll defer that to later.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      83a0adc3
    • D
      xfs: align initial file allocations correctly · 6e708bcf
      Dave Chinner 提交于
      The function xfs_bmap_isaeof() is used to indicate that an
      allocation is occurring at or past the end of file, and as such
      should be aligned to the underlying storage geometry if possible.
      
      Commit 27a3f8f2 ("xfs: introduce xfs_bmap_last_extent") changed the
      behaviour of this function for empty files - it turned off
      allocation alignment for this case accidentally. Hence large initial
      allocations from direct IO are not getting correctly aligned to the
      underlying geometry, and that is cause write performance to drop in
      alignment sensitive configurations.
      
      Fix it by considering allocation into empty files as requiring
      aligned allocation again.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit f9b395a8)
      6e708bcf
    • J
      xfs: fix infinite loop by detaching the group/project hints from user dquot · 718cc6f8
      Jie Liu 提交于
      xfs_quota(8) will hang up if trying to turn group/project quota off
      before the user quota is off, this could be 100% reproduced by:
        # mount -ouquota,gquota /dev/sda7 /xfs
        # mkdir /xfs/test
        # xfs_quota -xc 'off -g' /xfs <-- hangs up
        # echo w > /proc/sysrq-trigger
        # dmesg
      
        SysRq : Show Blocked State
        task                        PC stack   pid father
        xfs_quota       D 0000000000000000     0 27574   2551 0x00000000
        [snip]
        Call Trace:
        [<ffffffff81aaa21d>] schedule+0xad/0xc0
        [<ffffffff81aa327e>] schedule_timeout+0x35e/0x3c0
        [<ffffffff8114b506>] ? mark_held_locks+0x176/0x1c0
        [<ffffffff810ad6c0>] ? call_timer_fn+0x2c0/0x2c0
        [<ffffffffa0c25380>] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
        [<ffffffff81aa3306>] schedule_timeout_uninterruptible+0x26/0x30
        [<ffffffffa0c26155>] xfs_qm_dquot_walk+0x235/0x260 [xfs]
        [<ffffffffa0c059d8>] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
        [<ffffffffa0c05805>] ? xfs_perag_get+0x5/0x2d0 [xfs]
        [<ffffffffa0b7707e>] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
        [<ffffffffa0c22280>] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
        [<ffffffffa0b7709f>] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
        [<ffffffffa0c261e6>] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
        [<ffffffffa0c2497a>] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
        [<ffffffffa0c2b8f6>] xfs_fs_set_xstate+0x136/0x180 [xfs]
        [<ffffffff8136cf7a>] do_quotactl+0x53a/0x6b0
        [<ffffffff812fba4b>] ? iput+0x5b/0x90
        [<ffffffff8136d257>] SyS_quotactl+0x167/0x1d0
        [<ffffffff814cf2ee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
        [<ffffffff81abcd19>] system_call_fastpath+0x16/0x1b
      
      It's fine if we turn user quota off at first, then turn off other
      kind of quotas if they are enabled since the group/project dquot
      refcount is decreased to zero once the user quota if off. Otherwise,
      those dquots refcount is non-zero due to the user dquot might refer
      to them as hint(s).  Hence, above operation cause an infinite loop
      at xfs_qm_dquot_walk() while trying to purge dquot cache.
      
      This problem has been around since Linux 3.4, it was introduced by:
        [ b84a3a96 xfs: remove the per-filesystem list of dquots ]
      
      Originally we will release the group dquot pointers because the user
      dquots maybe carrying around as a hint via xfs_qm_detach_gdquots().
      However, with above change, there is no such work to be done before
      purging group/project dquot cache.
      
      In order to solve this problem, this patch introduces a special routine
      xfs_qm_dqpurge_hints(), and it would release the group/project dquot
      pointers the user dquots maybe carrying around as a hint, and then it
      will proceed to purge the user dquot cache if requested.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit df8052e7)
      718cc6f8
    • J
      xfs: fix assertion failure at xfs_setattr_nonsize · 5c227278
      Jie Liu 提交于
      For CRC enabled v5 super block, change a file's ownership can simply
      trigger an ASSERT failure at xfs_setattr_nonsize() if both group and
      project quota are enabled, i.e,
      
      [  305.337609] XFS: Assertion failed: !XFS_IS_PQUOTA_ON(mp), file: fs/xfs/xfs_iops.c, line: 621
      [  305.339250] Kernel BUG at ffffffffa0a7fa32 [verbose debug info unavailable]
      [  305.383939] Call Trace:
      [  305.385536]  [<ffffffffa0a7d95a>] xfs_setattr_nonsize+0x69a/0x720 [xfs]
      [  305.387142]  [<ffffffffa0a7dea9>] xfs_vn_setattr+0x29/0x70 [xfs]
      [  305.388727]  [<ffffffff811ca388>] notify_change+0x1a8/0x350
      [  305.390298]  [<ffffffff811ac39d>] chown_common+0xfd/0x110
      [  305.391868]  [<ffffffff811ad6bf>] SyS_fchownat+0xaf/0x110
      [  305.393440]  [<ffffffff811ad760>] SyS_lchown+0x20/0x30
      [  305.394995]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
      [  305.399870] RIP  [<ffffffffa0a7fa32>] assfail+0x22/0x30 [xfs]
      
      This fix adjust the assertion to check if the super block support both
      quota inodes or not.
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 5a01dd54)
      5c227278
    • J
      xfs: fix false assertion at xfs_qm_vop_create_dqattach · 30d161c9
      Jie Liu 提交于
      After the previous fix, there still has another ASSERT failure if turning
      off any type of quota while fsstress is running at the same time.
      
      Backtrace in this case:
      
      [   50.867897] XFS: Assertion failed: XFS_IS_GQUOTA_ON(mp), file: fs/xfs/xfs_qm.c, line: 2118
      [   50.867924] ------------[ cut here ]------------
      ... <snip>
      [   50.867957] Kernel BUG at ffffffffa0b55a32 [verbose debug info unavailable]
      [   50.867999] invalid opcode: 0000 [#1] SMP
      [   50.869407] Call Trace:
      [   50.869446]  [<ffffffffa0bc408a>] xfs_qm_vop_create_dqattach+0x19a/0x2d0 [xfs]
      [   50.869512]  [<ffffffffa0b9cc45>] xfs_create+0x5c5/0x6a0 [xfs]
      [   50.869564]  [<ffffffffa0b5307c>] xfs_vn_mknod+0xac/0x1d0 [xfs]
      [   50.869615]  [<ffffffffa0b531d6>] xfs_vn_mkdir+0x16/0x20 [xfs]
      [   50.869655]  [<ffffffff811becd5>] vfs_mkdir+0x95/0x130
      [   50.869689]  [<ffffffff811bf63a>] SyS_mkdirat+0xaa/0xe0
      [   50.869723]  [<ffffffff811bf689>] SyS_mkdir+0x19/0x20
      [   50.869757]  [<ffffffff8170f7dd>] system_call_fastpath+0x1a/0x1f
      [   50.869793] Code: 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 <snip>
      [   50.870003] RIP  [<ffffffffa0b55a32>] assfail+0x22/0x30 [xfs]
      [   50.870050]  RSP <ffff88002941fd60>
      [   50.879251] ---[ end trace c93a2b342341c65b ]---
      
      We're hitting the ASSERT(XFS_IS_*QUOTA_ON(mp)) in xfs_qm_vop_create_dqattach(),
      however the assertion itself is not right IMHO.  While performing quota off, we
      firstly clear the XFS_*QUOTA_ACTIVE bit(s) from struct xfs_mount without taking
      any special locks, see xfs_qm_scall_quotaoff().  Hence there is no guarantee
      that the desired quota is still active.
      Signed-off-by: NJie Liu <jeff.liu@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit 37eb9706)
      30d161c9
    • M
      xfs: fix memory leak in xfs_dir2_node_removename · 3a8c9208
      Mark Tinguely 提交于
      Fix the leak of kernel memory in xfs_dir2_node_removename()
      when xfs_dir2_leafn_remove() returns an error code.
      Signed-off-by: NMark Tinguely <tinguely@sgi.com>
      Reviewed-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NBen Myers <bpm@sgi.com>
      
      (cherry picked from commit ef701600)
      3a8c9208
  10. 14 12月, 2013 5 次提交
  11. 13 12月, 2013 2 次提交
    • J
      procfs: also fix proc_reg_get_unmapped_area() for !MMU case · ae5758a1
      Jan Beulich 提交于
      Commit fad1a86e ("procfs: call default get_unmapped_area on
      MMU-present architectures"), as its title says, took care of only the
      MMU case, leaving the !MMU side still in the regressed state (returning
      -EIO in all cases where pde->proc_fops->get_unmapped_area is NULL).
      
      From the fad1a86e changelog:
      
       "Commit c4fe2448 ("sparc: fix PCI device proc file mmap(2)") added
        proc_reg_get_unmapped_area in proc_reg_file_ops and
        proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
        get_unmapped_area method is not defined for the target procfs file, which
        causes regression of mmap on /proc/vmcore.
      
        To address this issue, like get_unmapped_area(), call default
        current->mm->get_unmapped_area on MMU-present architectures if
        pde->proc_fops->get_unmapped_area, i.e.  the one in actual file operation
        in the procfs file, is not defined"
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: <stable@vger.kernel.org>	[3.12.x]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ae5758a1
    • W
      dcache: allow word-at-a-time name hashing with big-endian CPUs · a5c21dce
      Will Deacon 提交于
      When explicitly hashing the end of a string with the word-at-a-time
      interface, we have to be careful which end of the word we pick up.
      
      On big-endian CPUs, the upper-bits will contain the data we're after, so
      ensure we generate our masks accordingly (and avoid hashing whatever
      random junk may have been sitting after the string).
      
      This patch adds a new dcache helper, bytemask_from_count, which creates
      a mask appropriate for the CPU endianness.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a5c21dce
  12. 12 12月, 2013 5 次提交
    • D
      Btrfs: fix access_ok() check in btrfs_ioctl_send() · 700ff4f0
      Dan Carpenter 提交于
      The closing parenthesis is in the wrong place.  We want to check
      "sizeof(*arg->clone_sources) * arg->clone_sources_count" instead of
      "sizeof(*arg->clone_sources * arg->clone_sources_count)".
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Reviewed-by: NJie Liu <jeff.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      cc: stable@vger.kernel.org
      700ff4f0
    • W
      Btrfs: make sure we cleanup all reloc roots if error happens · 467bb1d2
      Wang Shilong 提交于
      I hit an oops when merging reloc roots fails, the reason is that
      new reloc roots may be added and we should make sure we cleanup
      all reloc roots.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      467bb1d2
    • W
      Btrfs: skip building backref tree for uuid and quota tree when doing balance relocation · 66463748
      Wang Shilong 提交于
      Quota tree and UUID Tree is only cowed, they can not be snapshoted.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      66463748
    • W
      Btrfs: fix an oops when doing balance relocation · c974c464
      Wang Shilong 提交于
      I hit an oops when inserting reloc root into @reloc_root_tree(it can be
      easily triggered when forcing cow for relocation root)
      
      [  866.494539]  [<ffffffffa0499579>] btrfs_init_reloc_root+0x79/0xb0 [btrfs]
      [  866.495321]  [<ffffffffa044c240>] record_root_in_trans+0xb0/0x110 [btrfs]
      [  866.496109]  [<ffffffffa044d758>] btrfs_record_root_in_trans+0x48/0x80 [btrfs]
      [  866.496908]  [<ffffffffa0494da8>] select_reloc_root+0xa8/0x210 [btrfs]
      [  866.497703]  [<ffffffffa0495c8a>] do_relocation+0x16a/0x540 [btrfs]
      
      This is because reloc root inserted into @reloc_root_tree is not within one
      transaction,reloc root may be cowed and root block bytenr will be reused then
      oops happens.We should update reloc root in @reloc_root_tree when cow reloc
      root node, fix it.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: NMiao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c974c464
    • F
      Btrfs: don't miss skinny extent items on delayed ref head contention · 639eefc8
      Filipe David Borba Manana 提交于
      Currently extent-tree.c:btrfs_lookup_extent_info() can miss the lookup
      of skinny extent items. This can happen when the execution flow is the
      following:
      
      * We do an extent tree lookup and fail to find a skinny extent item;
      
      * As a result, we attempt to see if a non-skinny extent item exists,
        either by looking at previous item in the leaf or by doing another
        full extent tree search;
      
      * We have a transaction and then we check for a matching delayed ref
        head in the transaction's delayed refs rbtree;
      
      * We find such delayed ref head and then we try to lock it with a
        call to mutex_trylock();
      
      * The lock was contended so we jump to the label "again", which repeats
        the extent tree search but for a non-skinny extent item, because we set
        previously metadata variable to 0 and the search key to look for a
        non-skinny extent-item;
      
      * After the jump (and after releasing the transaction's delayed refs
        lock), a skinny extent item might have been added to the extent tree
        but we will miss it because metadata is set to 0 and the search key
        is set for a non-skinny extent-item.
      
      The fix here is to not reset metadata to 0 and to jump to the initial search
      key setup if the delayed ref head is contended, instead of jumping directly
      to the extent tree search label ("again").
      
      This issue was found while investigating the issue reported at Bugzilla 64961.
      
      David Sterba suspected this function was missing extent items, and that
      this could be caused by the last change to this function, which was made
      in the following patch:
      
          [PATCH] Btrfs: optimize btrfs_lookup_extent_info()
          (commit 74be9510)
      
      But in fact this issue already existed before, because after failing to find
      a skinny extent item, the code set the search key for a non-skinny extent
      item, and on contention of a matching delayed ref head it would not search
      the extent tree for a skinny extent item anymore.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      639eefc8