1. 05 6月, 2014 2 次提交
  2. 04 4月, 2014 6 次提交
    • J
      ocfs2: avoid system inode ref confusion by adding mutex lock · 43b10a20
      jiangyiwen 提交于
      The following case may lead to the same system inode ref in confusion.
      
      A thread                            B thread
      ocfs2_get_system_file_inode
      ->get_local_system_inode
      ->_ocfs2_get_system_file_inode
                                          because of *arr == NULL,
                                          ocfs2_get_system_file_inode
                                          ->get_local_system_inode
                                          ->_ocfs2_get_system_file_inode
      gets first ref thru
      _ocfs2_get_system_file_inode,
      gets second ref thru igrab and
      set *arr = inode
                                          at the moment, B thread also gets
                                          two refs, so lead to one more
                                          inode ref.
      
      So add mutex lock to avoid multi thread set two inode ref once at the
      same time.
      Signed-off-by: Njiangyiwen <jiangyiwen@huawei.com>
      Reviewed-by: NJoseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      43b10a20
    • G
      ocfs2: revert iput deferring code in ocfs2_drop_dentry_lock · 8ed6b237
      Goldwyn Rodrigues 提交于
      The following patches are reverted in this patch because these patches
      caused performance regression in the remote unlink() calls.
      
        ea455f8a - ocfs2: Push out dropping of dentry lock to ocfs2_wq
        f7b1aa69 - ocfs2: Fix deadlock on umount
        5fd13189 - ocfs2: Don't oops in ocfs2_kill_sb on a failed mount
      
      Previous patches in this series removed the possible deadlocks from
      downconvert thread so the above patches shouldn't be needed anymore.
      
      The regression is caused because these patches delay the iput() in case
      of dentry unlocks.  This also delays the unlocking of the open lockres.
      The open lockresource is required to test if the inode can be wiped from
      disk or not.  When the deleting node does not get the open lock, it
      marks it as orphan (even though it is not in use by another
      node/process) and causes a journal checkpoint.  This delays operations
      following the inode eviction.  This also moves the inode to the orphaned
      inode which further causes more I/O and a lot of unneccessary orphans.
      
      The following script can be used to generate the load causing issues:
      
        declare -a create
        declare -a remove
        declare -a iterations=(1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384)
        unique="`mktemp -u XXXXX`"
        script="/tmp/idontknow-${unique}.sh"
        cat <<EOF > "${script}"
        for n in {1..8}; do mkdir -p test/dir\${n}
          eval touch test/dir\${n}/foo{1.."\$1"}
        done
        EOF
        chmod 700 "${script}"
      
        function fcreate ()
        {
          exec 2>&1 /usr/bin/time --format=%E "${script}" "$1"
        }
      
        function fremove ()
        {
          exec 2>&1 /usr/bin/time --format=%E ssh node2 "cd `pwd`; rm -Rf test*"
        }
      
        function fcp ()
        {
          exec 2>&1 /usr/bin/time --format=%E ssh node3 "cd `pwd`; cp -R test test.new"
        }
      
        echo -------------------------------------------------
        echo "| # files | create #s | copy #s | remove #s |"
        echo -------------------------------------------------
        for ((x=0; x < ${#iterations[*]} ; x++)) do
          create[$x]="`fcreate ${iterations[$x]}`"
          copy[$x]="`fcp ${iterations[$x]}`"
          remove[$x]="`fremove`"
          printf "| %8d | %9s | %9s | %9s |\n" ${iterations[$x]} ${create[$x]} ${copy[$x]} ${remove[$x]}
        done
        rm "${script}"
        echo "------------------------"
      Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ed6b237
    • J
      ocfs2: implement delayed dropping of last dquot reference · e3a767b6
      Jan Kara 提交于
      We cannot drop last dquot reference from downconvert thread as that
      creates the following deadlock:
      
      NODE 1                                  NODE2
      holds dentry lock for 'foo'
      holds inode lock for GLOBAL_BITMAP_SYSTEM_INODE
                                              dquot_initialize(bar)
                                                ocfs2_dquot_acquire()
                                                  ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                                                  ...
      downconvert thread (triggered from another
      node or a different process from NODE2)
        ocfs2_dentry_post_unlock()
          ...
          iput(foo)
            ocfs2_evict_inode(foo)
              ocfs2_clear_inode(foo)
                dquot_drop(inode)
                  ...
      	    ocfs2_dquot_release()
                    ocfs2_inode_lock(USER_QUOTA_SYSTEM_INODE)
                     - blocks
                                                  finds we need more space in
                                                  quota file
                                                  ...
                                                  ocfs2_extend_no_holes()
                                                    ocfs2_inode_lock(GLOBAL_BITMAP_SYSTEM_INODE)
                                                      - deadlocks waiting for
                                                        downconvert thread
      
      We solve the problem by postponing dropping of the last dquot reference to
      a workqueue if it happens from the downconvert thread.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Reviewed-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e3a767b6
    • D
      ocfs2: improve fsync efficiency and fix deadlock between aio_write and sync_file · 2931cdcb
      Darrick J. Wong 提交于
      Currently, ocfs2_sync_file grabs i_mutex and forces the current journal
      transaction to complete.  This isn't terribly efficient, since sync_file
      really only needs to wait for the last transaction involving that inode
      to complete, and this doesn't require i_mutex.
      
      Therefore, implement the necessary bits to track the newest tid
      associated with an inode, and teach sync_file to wait for that instead
      of waiting for everything in the journal to commit.  Furthermore, only
      issue the flush request to the drive if jbd2 hasn't already done so.
      
      This also eliminates the deadlock between ocfs2_file_aio_write() and
      ocfs2_sync_file().  aio_write takes i_mutex then calls
      ocfs2_aiodio_wait() to wait for unaligned dio writes to finish.
      However, if that dio completion involves calling fsync, then we can get
      into trouble when some ocfs2_sync_file tries to take i_mutex.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NMark Fasheh <mfasheh@suse.de>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2931cdcb
    • J
      ocfs2: remove unused variable uuid_net_key in ocfs2_initialize_super · a75fe48c
      joyce.xue 提交于
      Variable uuid_net_key in ocfs2_initialize_super() is not used.  Clean it
      up.
      Signed-off-by: Njoyce.xue <xuejiufei@huawei.com>
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Acked-by: NMark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a75fe48c
    • W
      ocfs2: change ip_unaligned_aio to of type mutex from atomit_t · c18ceab0
      Wengang Wang 提交于
      There is a problem that waitqueue_active() may check stale data thus miss
      a wakeup of threads waiting on ip_unaligned_aio.
      
      The valid value of ip_unaligned_aio is only 0 and 1 so we can change it to
      be of type mutex thus the above prolem is avoid.  Another benifit is that
      mutex which works as FIFO is fairer than wake_up_all().
      Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c18ceab0
  3. 13 3月, 2014 1 次提交
    • T
      fs: push sync_filesystem() down to the file system's remount_fs() · 02b9984d
      Theodore Ts'o 提交于
      Previously, the no-op "mount -o mount /dev/xxx" operation when the
      file system is already mounted read-write causes an implied,
      unconditional syncfs().  This seems pretty stupid, and it's certainly
      documented or guaraunteed to do this, nor is it particularly useful,
      except in the case where the file system was mounted rw and is getting
      remounted read-only.
      
      However, it's possible that there might be some file systems that are
      actually depending on this behavior.  In most file systems, it's
      probably fine to only call sync_filesystem() when transitioning from
      read-write to read-only, and there are some file systems where this is
      not needed at all (for example, for a pseudo-filesystem or something
      like romfs).
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Cc: Jan Kara <jack@suse.cz>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Anders Larsen <al@alarsen.net>
      Cc: Phillip Lougher <phillip@squashfs.org.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Petr Vandrovec <petr@vandrovec.name>
      Cc: xfs@oss.sgi.com
      Cc: linux-btrfs@vger.kernel.org
      Cc: linux-cifs@vger.kernel.org
      Cc: samba-technical@lists.samba.org
      Cc: codalist@coda.cs.cmu.edu
      Cc: linux-ext4@vger.kernel.org
      Cc: linux-f2fs-devel@lists.sourceforge.net
      Cc: fuse-devel@lists.sourceforge.net
      Cc: cluster-devel@redhat.com
      Cc: linux-mtd@lists.infradead.org
      Cc: jfs-discussion@lists.sourceforge.net
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-nilfs@vger.kernel.org
      Cc: linux-ntfs-dev@lists.sourceforge.net
      Cc: ocfs2-devel@oss.oracle.com
      Cc: reiserfs-devel@vger.kernel.org
      02b9984d
  4. 22 1月, 2014 3 次提交
  5. 13 11月, 2013 1 次提交
  6. 25 9月, 2013 1 次提交
  7. 29 8月, 2013 1 次提交
  8. 04 7月, 2013 1 次提交
  9. 07 3月, 2013 1 次提交
  10. 22 2月, 2013 1 次提交
  11. 03 10月, 2012 1 次提交
  12. 21 3月, 2012 3 次提交
  13. 07 1月, 2012 1 次提交
  14. 04 1月, 2012 1 次提交
    • A
      vfs: fix the stupidity with i_dentry in inode destructors · 6b520e05
      Al Viro 提交于
      Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
      it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
      the cost of taking it into inode_init_always() will be negligible for pipes
      and sockets and negative for everything else.  Not to mention the removal of
      boilerplate code from ->destroy_inode() instances...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      6b520e05
  15. 17 11月, 2011 1 次提交
  16. 28 7月, 2011 1 次提交
    • M
      ocfs2: serialize unaligned aio · a11f7e63
      Mark Fasheh 提交于
      Fix a corruption that can happen when we have (two or more) outstanding
      aio's to an overlapping unaligned region.  Ext4
      (e9e3bcec) and xfs recently had to fix
      similar issues.
      
      In our case what happens is that we can have an outstanding aio on a region
      and if a write comes in with some bytes overlapping the original aio we may
      decide to read that region into a page before continuing (typically because
      of buffered-io fallback).  Since we have no ordering guarantees with the
      aio, we can read stale or bad data into the page and then write it back out.
      
      If the i/o is page and block aligned, then we avoid this issue as there
      won't be any need to read data from disk.
      
      I took the same approach as Eric in the ext4 patch and introduced some
      serialization of unaligned async direct i/o.  I don't expect this to have an
      effect on the most common cases of AIO.  Unaligned aio will be slower
      though, but that's far more acceptable than data corruption.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: NJoel Becker <jlbec@evilplan.org>
      a11f7e63
  17. 25 7月, 2011 1 次提交
  18. 04 6月, 2011 1 次提交
    • A
      more conservative S_NOSEC handling · 9e1f1de0
      Al Viro 提交于
      Caching "we have already removed suid/caps" was overenthusiastic as merged.
      On network filesystems we might have had suid/caps set on another client,
      silently picked by this client on revalidate, all of that *without* clearing
      the S_NOSEC flag.
      
      AFAICS, the only reasonably sane way to deal with that is
      	* new superblock flag; unless set, S_NOSEC is not going to be set.
      	* local block filesystems set it in their ->mount() (more accurately,
      mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than
      local block ones clear it)
      	* if any network filesystem (or a cluster one) wants to use S_NOSEC,
      it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when
      inode attribute changes are picked from other clients.
      
      It's not an earth-shattering hole (anybody that can set suid on another client
      will almost certainly be able to write to the file before doing that anyway),
      but it's a bug that needs fixing.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      9e1f1de0
  19. 01 6月, 2011 1 次提交
  20. 27 5月, 2011 1 次提交
    • D
      ocfs2: add cleancache support · 1cfd8bd0
      Dan Magenheimer 提交于
      This eighth patch of eight in this cleancache series "opts-in"
      cleancache for ocfs2.  Clustered filesystems must explicitly enable
      cleancache by calling cleancache_init_shared_fs anytime an instance
      of the filesystem is mounted.  Ocfs2 is currently the only user of
      the clustered filesystem interface but nevertheless, the cleancache
      hooks in the VFS layer are sufficient for ocfs2 including the matching
      cleancache_flush_fs hook which must be called on unmount.
      
      Details and a FAQ can be found in Documentation/vm/cleancache.txt
      
      [v8: trivial merge conflict update]
      [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
      Signed-off-by: NDan Magenheimer <dan.magenheimer@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Reviewed-by: NJeremy Fitzhardinge <jeremy@goop.org>
      Reviewed-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Andreas Dilger <adilger@sun.com>
      Cc: Ted Tso <tytso@mit.edu>
      Cc: Nitin Gupta <ngupta@vflare.org>
      1cfd8bd0
  21. 24 5月, 2011 1 次提交
  22. 31 3月, 2011 1 次提交
  23. 23 2月, 2011 1 次提交
  24. 21 2月, 2011 1 次提交
  25. 07 3月, 2011 1 次提交
    • T
      ocfs2: Remove EXIT from masklog. · c1e8d35e
      Tao Ma 提交于
      mlog_exit is used to record the exit status of a function.
      But because it is added in so many functions, if we enable it,
      the system logs get filled up quickly and cause too much I/O.
      So actually no one can open it for a production system or even
      for a test.
      
      This patch just try to remove it or change it. So:
      1. if all the error paths already use mlog_errno, it is just removed.
         Otherwise, it will be replaced by mlog_errno.
      2. if it is used to print some return value, it is replaced with
         mlog(0,...).
      mlog_exit_ptr is changed to mlog(0.
      All those mlog(0,...) will be replaced with trace events later.
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      c1e8d35e
  26. 21 2月, 2011 1 次提交
    • T
      ocfs2: Remove ENTRY from masklog. · ef6b689b
      Tao Ma 提交于
      ENTRY is used to record the entry of a function.
      But because it is added in so many functions, if we enable it,
      the system logs get filled up quickly and cause too much I/O.
      So actually no one can open it for a production system or even
      for a test.
      
      So for mlog_entry_void, we just remove it.
      for mlog_entry(...), we replace it with mlog(0,...), and they
      will be replace by trace event later.
      Signed-off-by: NTao Ma <boyu.mt@taobao.com>
      ef6b689b
  27. 20 2月, 2011 1 次提交
  28. 01 2月, 2011 1 次提交
    • T
      ocfs2: use system_wq instead of ocfs2_quota_wq · 316873c9
      Tejun Heo 提交于
      ocfs2_quota_wq is not depended upon during memory reclaim and, with
      cmwq, there's no reason to use a dedicated workqueue.  Drop
      ocfs2_quota_wq and use system_wq instead.  dqi_sync_work is already
      sync canceled on quota disable and no further synchronization is
      necessary.
      
      This change makes ocfs2_quota_setup/shutdown() noops.  Both functions
      removed.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      316873c9
  29. 13 1月, 2011 2 次提交
    • A
      switch ocfs2, close races · ba87167c
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ba87167c
    • J
      quota: Fix deadlock during path resolution · f00c9e44
      Jan Kara 提交于
      As Al Viro pointed out path resolution during Q_QUOTAON calls to quotactl
      is prone to deadlocks. We hold s_umount semaphore for reading during the
      path resolution and resolution itself may need to acquire the semaphore
      for writing when e. g. autofs mountpoint is passed.
      
      Solve the problem by performing the resolution before we get hold of the
      superblock (and thus s_umount semaphore). The whole thing is complicated
      by the fact that some filesystems (OCFS2) ignore the path argument. So to
      distinguish between filesystem which want the path and which do not we
      introduce new .quota_on_meta callback which does not get the path. OCFS2
      then uses this callback instead of old .quota_on.
      
      CC: Al Viro <viro@ZenIV.linux.org.uk>
      CC: Christoph Hellwig <hch@lst.de>
      CC: Ted Ts'o <tytso@mit.edu>
      CC: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: NJan Kara <jack@suse.cz>
      f00c9e44