1. 05 Oct, 2010 1 commit
    • BKL: Explicitly add BKL around get_sb/fill_super · db719222
      Committed by Jan Blunck
      This patch is a preparation necessary to remove the BKL from do_new_mount().
      It explicitly adds calls to lock_kernel()/unlock_kernel() around
      get_sb/fill_super operations for filesystems that still use the BKL.
      
      I've read through all the code formerly covered by the BKL inside
      do_kern_mount() and have satisfied myself that it doesn't need the BKL
      any more.
      
      do_kern_mount() is already called without the BKL when mounting the rootfs
      and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
      from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
      through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
      afs_mntpt_follow_link(). Both of the latter functions are actually the filesystem's
      follow_link inode operation. vfs_kern_mount() calls the specified
      get_sb function and lets the filesystem do its job by calling the given
      fill_super function.
      
      Therefore I think it is safe to push down the BKL from the VFS to the
      low-level filesystems' get_sb/fill_super operations.
      
      [arnd: do not add the BKL to those file systems that already
             don't use it elsewhere]
      Signed-off-by: Jan Blunck <jblunck@infradead.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Matthew Wilcox <matthew@wil.cx>
      Cc: Christoph Hellwig <hch@infradead.org>
      db719222
  2. 10 Aug, 2010 1 commit
  3. 17 Jun, 2010 1 commit
  4. 24 May, 2010 4 commits
  5. 22 May, 2010 1 commit
  6. 11 May, 2010 1 commit
    • ocfs2: Wrap signal blocking in void functions. · e4b963f1
      Committed by Joel Becker
      ocfs2 sometimes needs to block signals around dlm operations, but it
      currently does it with sigprocmask().  Even worse, it's checking the
      error code of sigprocmask().  The in-kernel sigprocmask() can only error
      if you get the SIG_* argument wrong.  We don't.
      
      Wrap the sigprocmask() calls with ocfs2_[un]block_signals().  These
      functions are void, but they will BUG() if somehow sigprocmask() returns
      an error.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      e4b963f1
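The wrapper pattern described in this commit can be sketched in userspace terms. This is only an illustrative analogue, not the kernel code: the names `block_signals` and `unblock_signals` are hypothetical, the blocked set is an arbitrary subset chosen for the example, and an `AssertionError` stands in for BUG(). Unlike the void kernel helpers, the blocking function here returns the old mask instead of writing it through a pointer.

```python
import signal

# Illustrative subset of signals to block (an assumption for this sketch).
BLOCKED = {signal.SIGINT, signal.SIGTERM}

def block_signals():
    """Block the chosen signals for this thread and return the previous mask."""
    try:
        return signal.pthread_sigmask(signal.SIG_BLOCK, BLOCKED)
    except (OSError, ValueError) as err:
        # Stand-in for BUG(): the arguments are fixed, so a failure is a bug.
        raise AssertionError("pthread_sigmask(SIG_BLOCK) failed") from err

def unblock_signals(old_mask):
    """Restore the mask saved by block_signals(); returns nothing."""
    try:
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    except (OSError, ValueError) as err:
        raise AssertionError("pthread_sigmask(SIG_SETMASK) failed") from err

old_mask = block_signals()
# SIG_BLOCK with an empty set changes nothing and returns the current mask.
mask_during = signal.pthread_sigmask(signal.SIG_BLOCK, [])
unblock_signals(old_mask)
mask_after = signal.pthread_sigmask(signal.SIG_BLOCK, [])
```

Callers never check a return code for errors, which is the point of the change: the only way the underlying call can fail is a programming mistake, so the wrapper treats failure as fatal.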
  7. 06 May, 2010 6 commits
    • ocfs2: Make nointr a default mount option · 4b37fcb7
      Committed by Sunil Mushran
      OCFS2 has never really supported intr. This patch acknowledges this reality
      and makes nointr the default mount option. In a later patch, we intend to
      support intr.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      4b37fcb7
    • ocfs2: Add dir_resv_level mount option · 83f92318
      Committed by Mark Fasheh
      The default behavior for directory reservations stays the same, but we add a
      mount option so people can tweak the size of directory reservations
      according to their workloads.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      83f92318
    • ocfs2: increase the default size of local alloc windows · 6b82021b
      Committed by Mark Fasheh
      I have observed that the current size of 8M gives us pretty poor
      fragmentation on multi-threaded workloads which do lots of writes.
      
      Generally, I can increase the size of local alloc windows and observe a
      marked decrease in fragmentation, even up to and beyond window sizes of 512
      megabytes. This makes sense for a couple reasons - larger local alloc means
      more room for reservation windows. On multi-node workloads the larger local
      alloc helps as well because we don't have to do window slides as often.
      
      Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no
      longer used and the comment above it was out of date.
      
      To test fragmentation, I used a workload which launched 4 threads that did
      4k writes into a series of about 140 alternating files.
      
      With resv_level=2, and a 4k/4k file system I observed the following average
      fragmentation for various localalloc= parameters:
      
      localalloc=	avg. fragmentation
      	8		48
      	32		16
      	64		10
      	120		7
      
      On larger cluster sizes, the difference is more dramatic.
      
      The new default size tops out at 256M, which we'll only get for cluster
      sizes of 32K and above.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      6b82021b
    • ocfs2: clean up localalloc mount option size parsing · 73c8a800
      Committed by Mark Fasheh
      This patch pulls the local alloc sizing code into localalloc.c and provides
      a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged
      except that I correctly calculate the maximum local alloc size. The old code
      in ocfs2_parse_options() calculated the max size as:
      
      ocfs2_local_alloc_size(sb) * 8
      
      which is correct, in bits. Unfortunately though the option passed in is in
      megabytes. Ultimately, this bug made no real difference - the shrink code
      would catch a too-large size and bring it down to something reasonable.
      Still, it's less than efficient as-is.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      73c8a800
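The unit mismatch described in this commit can be made concrete with a small calculation. The sketch below assumes that ocfs2_local_alloc_size(sb) yields the bitmap size in bytes and that each bit tracks one cluster; the function name and the numbers are hypothetical and chosen only for illustration.

```python
def max_local_alloc_megabytes(bitmap_bytes, cluster_size):
    """Largest local alloc window, in megabytes, that a bitmap of
    `bitmap_bytes` bytes can describe when each bit tracks one cluster."""
    max_bits = bitmap_bytes * 8           # one bit per cluster
    max_bytes = max_bits * cluster_size   # clusters -> bytes
    return max_bytes // (1024 * 1024)     # bytes -> megabytes

# Hypothetical values, for illustration only.
bitmap_bytes = 256
cluster_size = 4096

bits = bitmap_bytes * 8   # what the old code compared against
megabytes = max_local_alloc_megabytes(bitmap_bytes, cluster_size)
```

With these made-up values the bit count is far larger than the true limit in megabytes, so comparing a localalloc= value (given in megabytes) against the bit count would accept sizes the bitmap cannot actually cover.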
    • ocfs2: use allocation reservations during file write · 4fe370af
      Committed by Mark Fasheh
      Add a per-inode reservations structure and pass it through to the
      reservations code.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      4fe370af
    • ocfs2: allocation reservations · d02f00cc
      Committed by Mark Fasheh
      This patch improves Ocfs2 allocation policy by allowing an inode to
      reserve a portion of the local alloc bitmap for itself. The reserved
      portion (allocation window) is advisory in that other allocation
      windows might steal it if the local alloc bitmap becomes
      full. Otherwise, the reservations are honored and guaranteed to be
      free. When the local alloc window is moved to a different portion of
      the bitmap, existing reservations are discarded.
      
      Reservation windows are represented internally by a red-black
      tree. Within that tree, each node represents the reservation window of
      one inode. An LRU of active reservations is also maintained. When new
      data is written, we allocate it from the inode's window. When all bits
      in a window are exhausted, we allocate a new one as close to the
      previous one as possible. Should we not find free space, an existing
      reservation is pulled off the LRU and cannibalized.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      d02f00cc
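The reservation scheme described above can be modelled in a few lines. This is a toy sketch under stated assumptions, not the ocfs2 implementation: the class `ReservationMap` and its methods are hypothetical, a sorted scan replaces the red-black tree, and the insertion order of an `OrderedDict` stands in for the LRU (a fuller model would move a window to the end each time it is used).

```python
from collections import OrderedDict

class ReservationMap:
    """Toy model: non-overlapping windows over a bitmap of `nbits` bits."""

    def __init__(self, nbits):
        self.nbits = nbits
        # inode id -> (start, length); oldest entry first, acting as the LRU.
        self.windows = OrderedDict()

    def _find_gap(self, length, goal):
        """Return the first free start >= goal that fits `length`, else None."""
        start = goal
        for w_start, w_len in sorted(self.windows.values()):
            if w_start + w_len <= start:
                continue                 # window ends before the candidate
            if start + length <= w_start:
                break                    # candidate fits before this window
            start = w_start + w_len      # skip past the overlapping window
        return start if start + length <= self.nbits else None

    def reserve(self, inode, length, goal=0):
        """Give `inode` a window of `length` bits, close after `goal` if possible."""
        self.windows.pop(inode, None)            # drop the exhausted window
        start = self._find_gap(length, goal)
        if start is None:
            start = self._find_gap(length, 0)    # retry from the beginning
        if start is None and self.windows:
            # No free space: cannibalize the least recently used reservation.
            _victim, (start, v_len) = self.windows.popitem(last=False)
            length = min(length, v_len)
        if start is None:
            return None
        self.windows[inode] = (start, length)
        return (start, length)

m = ReservationMap(16)
first = m.reserve(1, 8)    # takes bits 0..7
second = m.reserve(2, 8)   # takes bits 8..15
third = m.reserve(3, 4)    # bitmap full, so inode 1's window is cannibalized
```

The third call shows the advisory nature of a window: once no gap is left, the oldest reservation is given up so the new writer can proceed.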
  8. 13 Apr, 2010 2 commits
  9. 27 Feb, 2010 1 commit
  10. 26 Jan, 2010 1 commit
  11. 30 Oct, 2009 1 commit
    • ocfs2: return f_fsid info in ocfs2_statfs() · 837711f8
      Committed by Coly Li
      Currently the f_fsid of struct kstatfs returned from ocfs2_statfs() is
      undefined (vfs layer fills in 0 as default). Since in some conditions,
      f_fsid value might be used in a (f_fsid, ino) pair to uniquely identify
      a file, ocfs2 should return a unique defined f_fsid value from
      ocfs2_statfs().
      
      Because uuid_str is the same on big or little endian machines, it's
      endian consistent to use osb->uuid_str to generate f_fsid value.
      Signed-off-by: Coly Li <coly.li@suse.de>
      Cc: Sunil Mushran <sunil.mushran@oracle.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      837711f8
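The endianness argument can be illustrated with a short sketch. The commit message does not say how the two f_fsid words are derived from the UUID string, so the choice below (a CRC32 over each half of the string) is an assumption made for illustration, and `fsid_from_uuid_str` is a hypothetical name. The relevant point is that the input is a byte string, so the result does not depend on the byte order of the host.

```python
import zlib

def fsid_from_uuid_str(uuid_str):
    """Derive two 32-bit values from the two halves of a hex UUID string.

    The string is hashed byte by byte, so big and little endian hosts
    compute the same pair for the same volume.
    """
    raw = uuid_str.encode("ascii")
    half = len(raw) // 2
    return (zlib.crc32(raw[:half]) & 0xFFFFFFFF,
            zlib.crc32(raw[half:]) & 0xFFFFFFFF)

# Made-up UUID strings, for illustration only.
fsid_a = fsid_from_uuid_str("0123456789ABCDEF0123456789ABCDEF")
fsid_b = fsid_from_uuid_str("1123456789ABCDEF0123456789ABCDEF")
```

A consumer can then combine such a pair with an inode number to tell apart files that live on different volumes.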
  12. 29 Oct, 2009 4 commits
  13. 02 Oct, 2009 1 commit
  14. 24 Sep, 2009 1 commit
  15. 23 Sep, 2009 2 commits
    • ocfs2: __ocfs2_abort() should not enable panic for local mounts · a2f2ddbf
      Committed by Sunil Mushran
      In a clustered setup, we have to panic the box on journal abort. This is
      because we don't have the facility to go hard readonly. With hard ro, another
      node would detect node failure and initiate recovery.
      
      Having said that, we shouldn't force panic if the volume is mounted locally.
      This patch defers the handling to the mount option, errors.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      a2f2ddbf
    • ocfs2: Add refcount tree lock mechanism. · 374a263e
      Committed by Tao Ma
      Implement locking around struct ocfs2_refcount_tree.  This protects
      all read/write operations on refcount trees.  ocfs2_refcount_tree
      has its own lock and its own caching_info, protecting buffers among
      multiple nodes.
      
      Callers must call ocfs2_lock_refcount_tree() before operating on
      the tree and unlock it afterwards.
      
      ocfs2_refcount_trees are referenced by the block number of the
      refcount tree root block, so we create an rb-tree on the ocfs2_super
      to look them up.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      374a263e
  16. 22 Sep, 2009 1 commit
  17. 05 Sep, 2009 5 commits
  18. 18 Aug, 2009 1 commit
  19. 24 Jul, 2009 1 commit
  20. 22 Jul, 2009 1 commit
    • ocfs2: Fix deadlock on umount · f7b1aa69
      Committed by Jan Kara
      In commit ea455f8a, we moved the dentry lock
      put process into ocfs2_wq. This causes problems during umount because ocfs2_wq
      can drop references to inodes while they are being invalidated by
      invalidate_inodes() causing all sorts of nasty things (invalidate_inodes()
      ending in an infinite loop, "Busy inodes after umount" messages etc.).
      
      We fix the problem by stopping ocfs2_wq from doing any further releasing of
      inode references on the superblock being unmounted, waiting until it finishes
      the current round of releasing, and finally cleaning up all the references in
      dentry_lock_list from ocfs2_put_super().
      
      The issue was tracked down by Tao Ma <tao.ma@oracle.com>.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      f7b1aa69
  21. 09 Jul, 2009 1 commit
  22. 23 Jun, 2009 2 commits