1. 03 Oct, 2012 1 commit
  2. 21 Mar, 2012 3 commits
  3. 07 Jan, 2012 1 commit
  4. 04 Jan, 2012 1 commit
    • vfs: fix the stupidity with i_dentry in inode destructors · 6b520e05
      Authored by Al Viro
      Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
      it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
      the cost of taking it into inode_init_always() will be negligible for pipes
      and sockets and negative for everything else.  Not to mention the removal of
      boilerplate code from ->destroy_inode() instances...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
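The effect of moving the list-head initialization from the once-only constructor into the per-allocation path can be sketched in userspace C. Everything here (`toy_inode`, `toy_alloc`, `toy_destroy`, the one-slot cache) is a hypothetical stand-in, not the kernel's code; it only shows why a destructor no longer has to re-initialize `i_dentry` before the object is reused.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal doubly linked list head, modelled on the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_is_empty(const struct list_head *h) { return h->next == h; }

static void list_add_node(struct list_head *n, struct list_head *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

/* Hypothetical stand-in for struct inode; only the field of interest. */
struct toy_inode {
    struct list_head i_dentry;
    int in_use;
};

/* A one-slot object cache, so a freed object is always the next one reused. */
static struct toy_inode slot;

/* Runs on every allocation (the role played by inode_init_always()). */
static struct toy_inode *toy_alloc(void)
{
    assert(!slot.in_use);
    slot.in_use = 1;
    init_list_head(&slot.i_dentry); /* moved here from the once-only constructor */
    return &slot;
}

/* Destructor without the INIT_LIST_HEAD boilerplate. */
static void toy_destroy(struct toy_inode *inode)
{
    inode->in_use = 0;
}
```

Even if an object is released with a non-empty list, the next `toy_alloc()` hands it back with an empty `i_dentry`, which is the property the destructors used to restore by hand.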
  5. 17 Nov, 2011 1 commit
  6. 28 Jul, 2011 1 commit
    • ocfs2: serialize unaligned aio · a11f7e63
      Authored by Mark Fasheh
      Fix a corruption that can happen when we have (two or more) outstanding
      aio's to an overlapping unaligned region.  Ext4 (e9e3bcec) and xfs
      recently had to fix similar issues.
      
      In our case what happens is that we can have an outstanding aio on a region
      and if a write comes in with some bytes overlapping the original aio we may
      decide to read that region into a page before continuing (typically because
      of buffered-io fallback).  Since we have no ordering guarantees with the
      aio, we can read stale or bad data into the page and then write it back out.
      
      If the i/o is page and block aligned, then we avoid this issue as there
      won't be any need to read data from disk.
      
      I took the same approach as Eric in the ext4 patch and introduced some
      serialization of unaligned async direct i/o.  I don't expect this to have an
      effect on the most common cases of AIO.  Unaligned aio will be slower
      though, but that's far more acceptable than data corruption.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <jlbec@evilplan.org>
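As a rough illustration of the idea in the message above, the sketch below classifies an I/O as unaligned and lets only one unaligned request proceed at a time. It is a simplified assumption, not the ocfs2 patch: it checks block alignment only (the message also mentions page alignment), and `io_is_unaligned`, `struct aio_gate`, `gate_try_start` and `gate_end` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True if the byte range [offset, offset + len) does not start and end on a
 * block boundary. blocksize is assumed to be a power of two.
 */
static bool io_is_unaligned(uint64_t offset, uint64_t len, uint32_t blocksize)
{
    uint64_t mask = (uint64_t)blocksize - 1;

    return (offset & mask) != 0 || ((offset + len) & mask) != 0;
}

/* Hypothetical per-inode gate: at most one unaligned aio in flight. */
struct aio_gate {
    unsigned unaligned_in_flight;
};

/* Returns true if the request may start now; a real caller would wait otherwise. */
static bool gate_try_start(struct aio_gate *g, bool unaligned)
{
    if (!unaligned)
        return true;            /* aligned i/o is never serialized */
    if (g->unaligned_in_flight > 0)
        return false;           /* another unaligned aio is outstanding */
    g->unaligned_in_flight++;
    return true;
}

/* Called from the completion path of a request that was allowed to start. */
static void gate_end(struct aio_gate *g, bool unaligned)
{
    if (unaligned)
        g->unaligned_in_flight--;
}
```

Aligned requests bypass the gate entirely, which matches the expectation that the common AIO cases are unaffected.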
  7. 25 Jul, 2011 1 commit
  8. 04 Jun, 2011 1 commit
    • more conservative S_NOSEC handling · 9e1f1de0
      Authored by Al Viro
      Caching "we have already removed suid/caps" was overenthusiastic as merged.
      On network filesystems we might have had suid/caps set on another client,
      silently picked by this client on revalidate, all of that *without* clearing
      the S_NOSEC flag.
      
      AFAICS, the only reasonably sane way to deal with that is
      	* new superblock flag; unless set, S_NOSEC is not going to be set.
      	* local block filesystems set it in their ->mount() (more accurately,
      mount_bdev() does, so does btrfs ->mount(), users of mount_bdev() other than
      local block ones clear it)
      	* if any network filesystem (or a cluster one) wants to use S_NOSEC,
      it'll need to set MS_NOSEC in sb->s_flags *AND* take care to clear S_NOSEC when
      inode attribute changes are picked from other clients.
      
      It's not an earth-shattering hole (anybody that can set suid on another client
      will almost certainly be able to write to the file before doing that anyway),
      but it's a bug that needs fixing.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
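The rule described above can be sketched as a pair of small helpers. This is an illustrative model only: the flag values, `struct nosec_sb`, `struct nosec_inode` and both function names are assumptions, chosen to mirror the message rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values; the kernel's actual constants differ. */
#define TOY_MS_NOSEC (1u << 0)  /* superblock opted in to S_NOSEC caching */
#define TOY_S_NOSEC  (1u << 1)  /* inode: "no suid/caps left to strip" */

struct nosec_sb {
    unsigned s_flags;
};

struct nosec_inode {
    struct nosec_sb *sb;
    unsigned i_flags;
};

/*
 * Called after a write found nothing to strip. The answer is cached only
 * if the superblock opted in; otherwise the check is repeated every time.
 */
static void nosec_cache_if_allowed(struct nosec_inode *inode)
{
    if (inode->sb->s_flags & TOY_MS_NOSEC)
        inode->i_flags |= TOY_S_NOSEC;
}

/*
 * A network or cluster filesystem that opts in must call this whenever it
 * picks up inode attribute changes made by another client.
 */
static void nosec_attrs_changed_remotely(struct nosec_inode *inode)
{
    inode->i_flags &= ~TOY_S_NOSEC;
}

static bool nosec_is_cached(const struct nosec_inode *inode)
{
    return (inode->i_flags & TOY_S_NOSEC) != 0;
}
```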
  9. 01 Jun, 2011 1 commit
  10. 27 May, 2011 1 commit
    • ocfs2: add cleancache support · 1cfd8bd0
      Authored by Dan Magenheimer
      This eighth patch of eight in this cleancache series "opts-in"
      cleancache for ocfs2.  Clustered filesystems must explicitly enable
      cleancache by calling cleancache_init_shared_fs anytime an instance
      of the filesystem is mounted.  Ocfs2 is currently the only user of
      the clustered filesystem interface but nevertheless, the cleancache
      hooks in the VFS layer are sufficient for ocfs2 including the matching
      cleancache_flush_fs hook which must be called on unmount.
      
      Details and a FAQ can be found in Documentation/vm/cleancache.txt
      
      [v8: trivial merge conflict update]
      [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Reviewed-by: Jeremy Fitzhardinge <jeremy@goop.org>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Rik Van Riel <riel@redhat.com>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Andreas Dilger <adilger@sun.com>
      Cc: Ted Tso <tytso@mit.edu>
      Cc: Nitin Gupta <ngupta@vflare.org>
  11. 24 May, 2011 1 commit
  12. 31 Mar, 2011 1 commit
  13. 23 Feb, 2011 1 commit
  14. 21 Feb, 2011 1 commit
  15. 07 Mar, 2011 1 commit
    • ocfs2: Remove EXIT from masklog. · c1e8d35e
      Authored by Tao Ma
      mlog_exit is used to record the exit status of a function.
      But because it is added in so many functions, if we enable it,
      the system logs get filled up quickly and cause too much I/O.
      So actually no one can open it for a production system or even
      for a test.
      
      This patch removes it or replaces it:
      1. If all the error paths already use mlog_errno, it is just removed.
         Otherwise, it is replaced by mlog_errno.
      2. If it is used to print some return value, it is replaced with
         mlog(0,...).
      mlog_exit_ptr is changed to mlog(0,...).
      All those mlog(0,...) calls will be replaced with trace events later.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
  16. 21 Feb, 2011 1 commit
    • ocfs2: Remove ENTRY from masklog. · ef6b689b
      Authored by Tao Ma
      ENTRY is used to record the entry of a function.
      But because it is added in so many functions, if we enable it,
      the system logs get filled up quickly and cause too much I/O.
      So actually no one can open it for a production system or even
      for a test.
      
      So for mlog_entry_void, we just remove it.
      For mlog_entry(...), we replace it with mlog(0,...), and those calls
      will be replaced by trace events later.
      Signed-off-by: Tao Ma <boyu.mt@taobao.com>
  17. 20 Feb, 2011 1 commit
  18. 01 Feb, 2011 1 commit
    • ocfs2: use system_wq instead of ocfs2_quota_wq · 316873c9
      Authored by Tejun Heo
      ocfs2_quota_wq is not depended upon during memory reclaim and, with
      cmwq, there's no reason to use a dedicated workqueue.  Drop
      ocfs2_quota_wq and use system_wq instead.  dqi_sync_work is already
      sync canceled on quota disable and no further synchronization is
      necessary.
      
      This change makes ocfs2_quota_setup/shutdown() noops.  Both functions
      are removed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
  19. 13 Jan, 2011 2 commits
    • switch ocfs2, close races · ba87167c
      Authored by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • quota: Fix deadlock during path resolution · f00c9e44
      Authored by Jan Kara
      As Al Viro pointed out, path resolution during Q_QUOTAON calls to quotactl
      is prone to deadlocks. We hold the s_umount semaphore for reading during the
      path resolution, and the resolution itself may need to acquire the semaphore
      for writing when, e.g., an autofs mountpoint is passed.
      
      Solve the problem by performing the resolution before we get hold of the
      superblock (and thus the s_umount semaphore). The whole thing is complicated
      by the fact that some filesystems (OCFS2) ignore the path argument. So to
      distinguish between filesystems that want the path and those that do not, we
      introduce a new .quota_on_meta callback which does not get the path. OCFS2
      then uses this callback instead of the old .quota_on.
      
      CC: Al Viro <viro@ZenIV.linux.org.uk>
      CC: Christoph Hellwig <hch@lst.de>
      CC: Ted Ts'o <tytso@mit.edu>
      CC: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
  20. 07 Jan, 2011 1 commit
    • fs: icache RCU free inodes · fa0d7e3d
      Authored by Nick Piggin
      RCU free the struct inode. This will allow:
      
      - Subsequent store-free path walking patch. The inode must be consulted for
        permissions when walking, so an RCU inode reference is a must.
      - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
        to take i_lock no longer need to take sb_inode_list_lock to walk the list in
        the first place. This will simplify and optimize locking.
      - Could remove some nested trylock loops in dcache code
      - Could potentially simplify things a bit in VM land. Do not need to take the
        page lock to follow page->mapping.
      
      The downside of this is the performance cost of using RCU. In a simple
      creat/unlink microbenchmark, performance drops by about 10% due to the inability to
      reuse cache-hot slab objects. As iterations increase and RCU freeing starts
      kicking over, this increases to about 20%.
      
      In cases where inode lifetimes are longer (ie. many inodes may be allocated
      during the average life span of a single inode), a lot of this cache reuse is
      not applicable, so the regression caused by this patch is smaller.
      
      The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
      however this adds some complexity to list walking and store-free path walking,
      so I prefer to implement this at a later date, if it is shown to be a win in
      real situations. I haven't found a regression in any non-micro benchmark so I
      doubt it will be a problem.
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
  21. 18 Nov, 2010 1 commit
  22. 29 Oct, 2010 1 commit
  23. 12 Oct, 2010 2 commits
  24. 08 Oct, 2010 1 commit
    • ocfs2: Add support for heartbeat=global mount option · 2c442719
      Authored by Sunil Mushran
      Adds support for heartbeat=global mount option. It ensures that the heartbeat
      mode passed matches the one enabled on disk.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
  25. 10 Oct, 2010 1 commit
    • ocfs2: Add an incompat feature flag OCFS2_FEATURE_INCOMPAT_CLUSTERINFO · 98f486f2
      Authored by Sunil Mushran
      OCFS2_FEATURE_INCOMPAT_CLUSTERINFO allows us to use sb->s_cluster_info for
      both userspace and o2cb cluster stacks. It also allows us to extend cluster
      info to include stack flags.
      
      This patch also adds stackflags to sb->s_cluster_info. It also introduces a
      clusterinfo flag OCFS2_CLUSTER_O2CB_GLOBAL_HEARTBEAT to denote the enabled
      global heartbeat mode.
      
      This incompat flag can be set/cleared using tunefs.ocfs2 --fs-features. The
      clusterinfo flag is set/cleared using tunefs.ocfs2 --update-cluster-stack.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
  26. 05 Oct, 2010 2 commits
    • BKL: Remove BKL from OCFS2 · 60056794
      Authored by Arnd Bergmann
      The BKL in ocfs2/dlmfs is used in put_super, fill_super and remount_fs,
      all three of which are protected by the superblock's s_umount rw_semaphore.
      
      The use in ocfs2_control_open is evidently unrelated and the function
      is protected by ocfs2_control_lock.
      
      Therefore it is safe to remove the BKL entirely.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
    • BKL: Explicitly add BKL around get_sb/fill_super · db719222
      Authored by Jan Blunck
      This patch is a preparation necessary to remove the BKL from do_new_mount().
      It explicitly adds calls to lock_kernel()/unlock_kernel() around
      get_sb/fill_super operations for filesystems that still use the BKL.
      
      I've read through all the code formerly covered by the BKL inside
      do_kern_mount() and have satisfied myself that it doesn't need the BKL
      any more.
      
      do_kern_mount() is already called without the BKL when mounting the rootfs
      and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
      from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
      through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
      afs_mntpt_follow_link(). Both of the latter functions are actually the
      filesystem's follow_link inode operation. vfs_kern_mount() calls the
      specified get_sb function and lets the filesystem do its job by calling
      the given fill_super function.
      
      Therefore I think it is safe to push down the BKL from the VFS to the
      low-level filesystems' get_sb/fill_super operations.
      
      [arnd: do not add the BKL to those file systems that already
             don't use it elsewhere]
      Signed-off-by: Jan Blunck <jblunck@infradead.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Christoph Hellwig <hch@infradead.org>
  27. 10 Sep, 2010 2 commits
    • ocfs2: Cache system inodes of other slots. · b4d693fc
      Authored by Tao Ma
      During orphan scan, if we are slot 0 and we are replaying
      orphan_dir:0001, the general process for every file in this
      dir is:
      1. We iget orphan_dir:0001. Since there is no inode for it,
         we have to create an inode and read it from the disk.
      2. Do the normal work, such as delete_inode, and remove the file
         from the dir if that is allowed.
      3. Call iput on orphan_dir:0001 when we are done. In this case,
         since we have no dcache for this inode, i_count will
         reach 0, and VFS will have to call clear_inode. In
         ocfs2_clear_inode we checkpoint the inode, which lets
         ocfs2_cmt and journald begin to work.
      4. We loop back to 1 for the next file.
      
      So for every deleted file, we have to read the orphan dir from
      the disk and checkpoint the journal. This is very time consuming
      and causes a lot of journal checkpoint I/O.
      A better solution is to hold another reference to these
      inodes in ocfs2_super. Then, if there is no other race among
      nodes (which would make dlmglue checkpoint the inode), clear_inode
      won't be called in step 3, and in step 1 we only need to
      read the inode the first time. This is a big win for us.
      
      So this patch will try to cache system inodes of other slots so
      that we will have one more reference for these inodes and avoid
      the extra inode read and journal checkpoint.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
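The cost described in the steps above, and the effect of holding one extra reference, can be modelled with a toy reference count. This is a sketch under stated assumptions, not ocfs2 code: `toy_iget`, `toy_iput`, `toy_orphan_scan` and the two counters are hypothetical, a "disk read" is counted whenever the count rises from zero, and a "checkpoint" whenever it falls back to zero. In the real patch the extra reference would live in ocfs2_super rather than being dropped at the end of the scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for another slot's orphan directory inode. */
struct toy_sys_inode {
    int refcount;
};

static struct toy_sys_inode orphan_dir;
static int disk_reads;   /* times the inode had to be read from disk */
static int checkpoints;  /* times dropping the last reference forced a checkpoint */

static struct toy_sys_inode *toy_iget(void)
{
    if (orphan_dir.refcount == 0)
        disk_reads++;            /* no in-memory inode: read it from disk */
    orphan_dir.refcount++;
    return &orphan_dir;
}

static void toy_iput(struct toy_sys_inode *inode)
{
    assert(inode->refcount > 0);
    if (--inode->refcount == 0)
        checkpoints++;           /* last reference gone: clear_inode path */
}

/* Process nfiles orphans, optionally pinning the directory for the whole scan. */
static void toy_orphan_scan(int nfiles, bool keep_cached_ref)
{
    struct toy_sys_inode *cached = keep_cached_ref ? toy_iget() : NULL;

    for (int i = 0; i < nfiles; i++) {
        struct toy_sys_inode *dir = toy_iget();
        /* per-file orphan processing would happen here */
        toy_iput(dir);
    }
    if (cached)
        toy_iput(cached);
}

static void toy_reset(void)
{
    orphan_dir.refcount = 0;
    disk_reads = 0;
    checkpoints = 0;
}
```

Without the pinned reference, every file costs one read and one checkpoint; with it, the whole scan costs one of each.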
    • OCFS2: Allow huge (> 16 TiB) volumes to mount · 3bdb8efd
      Authored by Patrick J. LoPresti
      The OCFS2 developers have already done all of the hard work to allow
      volumes larger than 16 TiB.  But there is still a "sanity check" in
      fs/ocfs2/super.c that prevents the mounting of such volumes, even when
      the cluster size and journal options would allow it.
      
      This patch replaces that sanity check with a more sophisticated one to
      mount a huge volume provided that (a) it is addressable by the raw
      word/address size of the system (borrowing a test from ext4); (b) the
      volume is using JBD2; and (c) the JBD2_FEATURE_INCOMPAT_64BIT flag is
      set on the journal.
      
      I factored out the sanity check into its own function.  I also moved it
      from ocfs2_initialize_super() down to ocfs2_check_volume(); any earlier,
      and the journal will not have been initialized yet.
      
      This patch is one of a pair, and it depends on the other ("JBD2: Allow
      feature checks before journal recovery").
      
      I have tested this patch on small volumes, huge volumes, and huge
      volumes without 64-bit block support in the journal.  All of them appear
      to work or to fail gracefully, as appropriate.
      Signed-off-by: Patrick LoPresti <lopresti@gmail.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
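The three conditions (a) to (c) can be sketched as a single predicate. This is a simplified model and not the code in fs/ocfs2/super.c: `struct toy_volume`, `toy_volume_mountable` and the `sector_bits` parameter are hypothetical, "huge" is taken to mean more blocks than fit in 32 bits, and the addressability test only checks that every 512-byte sector number fits in `sector_bits` bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of the volume properties the check looks at. */
struct toy_volume {
    uint64_t blocks;          /* total number of filesystem blocks */
    unsigned blocksize_bits;  /* log2 of the block size, assumed >= 9 */
    bool uses_jbd2;           /* (b) the journal is JBD2 */
    bool journal_64bit;       /* (c) the journal has the 64-bit feature set */
};

/*
 * sector_bits is the width of the system's sector number type
 * (1..64). Volumes whose block count fits in 32 bits pass as before.
 */
static bool toy_volume_mountable(const struct toy_volume *v, unsigned sector_bits)
{
    uint64_t max_sectors, max_blocks;

    if (v->blocks <= UINT32_MAX)
        return true;

    /* (a) every 512-byte sector of the volume must be addressable */
    max_sectors = sector_bits >= 64 ? UINT64_MAX
                                    : (UINT64_C(1) << sector_bits) - 1;
    max_blocks = max_sectors >> (v->blocksize_bits - 9);
    if (v->blocks > max_blocks)
        return false;

    /* (b) and (c): the journal must be able to record 64-bit block numbers */
    return v->uses_jbd2 && v->journal_64bit;
}
```

With 4 KiB blocks, a volume of 2^33 blocks (32 TiB) passes only when the sector type is wide enough and the journal advertises 64-bit support; otherwise the mount is refused rather than risking truncated block numbers.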
  28. 10 Aug, 2010 1 commit
  29. 17 Jun, 2010 1 commit
  30. 24 May, 2010 4 commits
  31. 22 May, 2010 1 commit