1. 02 2月, 2015 9 次提交
  2. 22 1月, 2015 3 次提交
    • D
      xfs: sanitise sb_bad_features2 handling · 074e427b
      Dave Chinner 提交于
      We currently have to ensure that every time we update sb_features2
      that we update sb_bad_features2. Now that we log and format the
      superblock in it's entirety we actually don't have to care because
      we can simply update the sb_bad_features2 when we format it into the
      buffer. This removes the need for anything but the mount and
      superblock formatting code to care about sb_bad_features2, and
      hence removes the possibility that we forget to update bad_features2
      when necessary in the future.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      074e427b
    • D
      xfs: consolidate superblock logging functions · 61e63ecb
      Dave Chinner 提交于
      We now have several superblock loggin functions that are identical
      except for the transaction reservation and whether it shoul dbe a
      synchronous transaction or not. Consolidate these all into a single
      function, a single reserveration and a sync flag and call it
      xfs_sync_sb().
      
      Also, xfs_mod_sb() is not really a modification function - it's the
      operation of logging the superblock buffer. hence change the name of
      it to reflect this.
      
      Note that we have to change the mp->m_update_flags that are passed
      around at mount time to a boolean simply to indicate a superblock
      update is needed.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      61e63ecb
    • D
      xfs: remove bitfield based superblock updates · 4d11a402
      Dave Chinner 提交于
      When we log changes to the superblock, we first have to write them
      to the on-disk buffer, and then log that. Right now we have a
      complex bitfield based arrangement to only write the modified field
      to the buffer before we log it.
      
      This used to be necessary as a performance optimisation because we
      logged the superblock buffer in every extent or inode allocation or
      freeing, and so performance was extremely important. We haven't done
      this for years, however, ever since the lazy superblock counters
      pulled the superblock logging out of the transaction commit
      fast path.
      
      Hence we have a bunch of complexity that is not necessary that makes
      writing the in-core superblock to disk much more complex than it
      needs to be. We only need to log the superblock now during
      management operations (e.g. during mount, unmount or quota control
      operations) so it is not a performance critical path anymore.
      
      As such, remove the complex field based logging mechanism and
      replace it with a simple conversion function similar to what we use
      for all other on-disk structures.
      
      This means we always log the entirity of the superblock, but again
      because we rarely modify the superblock this is not an issue for log
      bandwidth or CPU time. Indeed, if we do log the superblock
      frequently, delayed logging will minimise the impact of this
      overhead.
      
      [Fixed gquota/pquota inode sharing regression noticed by bfoster.]
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      4d11a402
  3. 19 12月, 2014 6 次提交
    • J
      ocfs2: fix journal commit deadlock · 136f49b9
      Junxiao Bi 提交于
      For buffer write, page lock will be got in write_begin and released in
      write_end, in ocfs2_write_end_nolock(), before it unlock the page in
      ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), this will ask
      for the read lock of journal->j_trans_barrier.  Holding page lock and
      ask for journal->j_trans_barrier breaks the locking order.
      
      This will cause a deadlock with journal commit threads, ocfs2cmt will
      get write lock of journal->j_trans_barrier first, then it wakes up
      kjournald2 to do the commit work, at last it waits until done.  To
      commit journal, kjournald2 needs flushing data first, it needs get the
      cache page lock.
      
      Since some ocfs2 cluster locks are holding by write process, this
      deadlock may hung the whole cluster.
      
      unlock pages before ocfs2_run_deallocs() can fix the locking order, also
      put unlock before ocfs2_commit_trans() to make page lock is unlocked
      before j_trans_barrier to preserve unlocking order.
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: NWengang Wang <wen.gang.wang@oracle.com>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      136f49b9
    • J
      ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker · 1e589581
      Joseph Qi 提交于
      Commit ac4fef4d ("ocfs2/dlm: do not purge lockres that is queued for
      assert master") may have the following possible race case:
      
        dlm_dispatch_assert_master       dlm_wq
        ========================================================================
        queue_work(dlm->quedlm_worker,
            &dlm->dispatched_work);
                                       dispatch work,
                                       dlm_lockres_drop_inflight_worker
                                       *BUG_ON(res->inflight_assert_workers == 0)*
        dlm_lockres_grab_inflight_worker
        inflight_assert_workers++
      
      So ensure inflight_assert_workers to be increased first.
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Signed-off-by: NXue jiufei <xuejiufei@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1e589581
    • J
      ocfs2: reflink: fix slow unlink for refcounted file · f62f12b3
      Junxiao Bi 提交于
      When running ocfs2 test suite multiple nodes reflink stress test, for a
      4 nodes cluster, every unlink() for refcounted file needs about 700s.
      
      The slow unlink is caused by the contention of refcount tree lock since
      all nodes are unlink files using the same refcount tree.  When the
      unlinking file have many extents(over 1600 in our test), most of the
      extents has refcounted flag set.  In ocfs2_commit_truncate(), it will
      execute the following call trace for every extents.  This means it needs
      get and released refcount tree lock about 1600 times.  And when several
      nodes are do this at the same time, the performance will be very low.
      
        ocfs2_remove_btree_range()
        --  ocfs2_lock_refcount_tree()
        ----  ocfs2_refcount_lock()
        ------  __ocfs2_cluster_lock()
      
      ocfs2_refcount_lock() is costly, move it to ocfs2_commit_truncate() to
      do lock/unlock once can improve a lot performance.
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Wengang <wen.gang.wang@oracle.com>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f62f12b3
    • P
      fs/proc/meminfo.c: include cma info in proc/meminfo · 47f8f929
      Pintu Kumar 提交于
      This patch include CMA info (CMATotal, CMAFree) in /proc/meminfo.
      Currently, in a CMA enabled system, if somebody wants to know the total
      CMA size declared, there is no way to tell, other than the dmesg or
      /var/log/messages logs.
      
      With this patch we are showing the CMA info as part of meminfo, so that it
      can be determined at any point of time.  This will be populated only when
      CMA is enabled.
      
      Below is the sample output from a ARM based device with RAM:512MB and CMA:16MB.
      
        MemTotal:         471172 kB
        MemFree:          111712 kB
        MemAvailable:     271172 kB
        .
        .
        .
        CmaTotal:          16384 kB
        CmaFree:            6144 kB
      
      This patch also fix below checkpatch errors that were found during these changes.
      
        ERROR: space required after that ',' (ctx:ExV)
        199: FILE: fs/proc/meminfo.c:199:
        +       ,atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10)
                ^
      
        ERROR: space required after that ',' (ctx:ExV)
        202: FILE: fs/proc/meminfo.c:202:
        +       ,K(global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
                ^
      
        ERROR: space required after that ',' (ctx:ExV)
        206: FILE: fs/proc/meminfo.c:206:
        +       ,K(totalcma_pages)
                ^
      
        total: 3 errors, 0 warnings, 2 checks, 236 lines checked
      Signed-off-by: NPintu Kumar <pintu.k@samsung.com>
      Signed-off-by: NVishnu Pratap Singh <vishnu.ps@samsung.com>
      Acked-by: NMichal Nazarewicz <mina86@mina86.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      47f8f929
    • S
      hfsplus: fix longname handling · 89ac9b4d
      Sougata Santra 提交于
      Longname is not correctly handled by hfsplus driver.  If an attempt to
      create a longname(>255) file/directory is made, it succeeds by creating a
      file/directory with HFSPLUS_MAX_STRLEN and incorrect catalog key.  Thus
      leaving the volume in an inconsistent state.  This patch fixes this issue.
      
      Although lookup is always called first to create a negative entry, so just
      doing a check in lookup would probably fix this issue.  I choose to
      propagate error to other iops as well.
      
      Please NOTE: I have factored out hfsplus_cat_build_key_with_cnid from
      hfsplus_cat_build_key, to avoid unncessary branching.
      
      Thanks a lot.
      
        TEST:
        ------
        dir="TEST_DIR"
        cdir=`pwd`
        name255="_123456789_123456789_123456789_123456789_123456789_123456789\
        _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
        _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
        _123456789_123456789_123456789_123456789_123456789_1234"
        name256="${name255}5"
      
        mkdir $dir
        cd $dir
        touch $name255
        rm -f $name255
        touch $name256
        ls -la
        cd $cdir
        rm -rf $dir
      
        RESULT:
        -------
        [sougata@ultrabook tmp]$ cdir=`pwd`
        [sougata@ultrabook tmp]$
        name255="_123456789_123456789_123456789_123456789_123456789_123456789\
         > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
         > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
         > _123456789_123456789_123456789_123456789_123456789_1234"
        [sougata@ultrabook tmp]$ name256="${name255}5"
        [sougata@ultrabook tmp]$
        [sougata@ultrabook tmp]$ mkdir $dir
        [sougata@ultrabook tmp]$ cd $dir
        [sougata@ultrabook TEST_DIR]$ touch $name255
        [sougata@ultrabook TEST_DIR]$ rm -f $name255
        [sougata@ultrabook TEST_DIR]$ touch $name256
        [sougata@ultrabook TEST_DIR]$ ls -la
        ls: cannot access
        _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234:
        No such file or directory
        total 0
        drwxrwxr-x 1 sougata sougata 3 Feb 20 19:56 .
        drwxrwxrwx 1 root    root    6 Feb 20 19:56 ..
        -????????? ? ?       ?       ?            ?
        _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
        [sougata@ultrabook TEST_DIR]$ cd $cdir
        [sougata@ultrabook tmp]$ rm -rf $dir
        rm: cannot remove `TEST_DIR': Directory not empty
      
      -ENAMETOOLONG returned from hfsplus_asc2uni was not propaged to iops.
      This allowed hfsplus to create files/directories with HFSPLUS_MAX_STRLEN
      and incorrect keys, leaving the FS in an inconsistent state.  This patch
      fixes this issue.
      Signed-off-by: NSougata Santra <sougata@tuxera.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Vyacheslav Dubeyko <slava@dubeyko.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      89ac9b4d
    • E
      mnt: Fix a memory stomp in umount · c297abfd
      Eric W. Biederman 提交于
      While reviewing the code of umount_tree I realized that when we append
      to a preexisting unmounted list we do not change pprev of the former
      first item in the list.
      
      Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
      the former first item of the list will stomp unmounted.first leaving
      it set to some random mount point which we are likely to free soon.
      
      This isn't likely to hit, but if it does I don't know how anyone could
      track it down.
      
      [ This happened because we don't have all the same operations for
        hlist's as we do for normal doubly-linked lists. In particular,
        list_splice() is easy on our standard doubly-linked lists, while
        hlist_splice() doesn't exist and needs both start/end entries of the
        hlist.  And commit 38129a13 incorrectly open-coded that missing
        hlist_splice().
      
        We should think about making these kinds of "mindless" conversions
        easier to get right by adding the missing hlist helpers   - Linus ]
      
      Fixes: 38129a13 switch mnt_hash to hlist
      Cc: stable@vger.kernel.org
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c297abfd
  4. 18 12月, 2014 20 次提交
  5. 17 12月, 2014 2 次提交