1. 14 10月, 2008 12 次提交
    • T
      ocfs2: Add extent tree operation for xattr value btrees · f56654c4
      Tao Ma 提交于
      Add some thin wrappers around ocfs2_insert_extent() for each of the 3
      different btree types, ocfs2_inode_insert_extent(),
      ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The
      last is for the xattr index btree, which will be used in a followup patch.
      
      All the old callers in file.c etc will call ocfs2_dinode_insert_extent(),
      while the other two handle the xattr issue. And the init of extent tree are
      handled by these functions.
      
      When storing xattr value which is too large, we will allocate some clusters
      for it and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In
      order to re-use the b-tree operation code, a new parameter named "private"
      is added into ocfs2_extent_tree and it is used to indicate the root of
      ocfs2_exent_list. The reason is that we can't deduce the root from the
      buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse,
      in any place in an ocfs2_xattr_bucket.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      f56654c4
    • T
      ocfs2: Add helper function in uptodate.c for removing xattr clusters · ac11c827
      Tao Ma 提交于
      The old uptodate only handles the issue of removing one buffer_head from
      ocfs2 inode's buffer cache. With xattr clusters, we may need to remove
      multiple buffer_head's at a time.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      ac11c827
    • T
      ocfs2: Add the basic xattr disk layout in ocfs2_fs.h · 5a7bc8eb
      Tao Ma 提交于
      Ocfs2 uses a very flexible structure for storing extended attributes on
      disk. Small amount of attributes are stored directly in the inode block - up
      to 256 bytes worth. If that fills up, attributes are also stored in an
      external block, linked to from the inode block. That block can in turn
      expand to a btree, capable of storing large numbers of attributes.
      
      Individual attribute values are stored inline if they're small enough
      (currently about 80 bytes, this can be changed though), and otherwise are
      expanded to a btree. The theoretical limit to the size of an individual
      attribute is about the same as an inode, though the kernel's upper bound on
      the size of an attributes data is far smaller.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      5a7bc8eb
    • T
      ocfs2: Make high level btree extend code generic · 0eb8d47e
      Tao Ma 提交于
      Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic
      function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls
      ocfs2_do_cluster_allocation() now, but the latter can be used for other
      btree types as well.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      0eb8d47e
    • T
      ocfs2: Abstract ocfs2_extent_tree in b-tree operations. · e7d4cb6b
      Tao Ma 提交于
      In the old extent tree operation, we take the hypothesis that we
      are using the ocfs2_extent_list in ocfs2_dinode as the tree root.
      As xattr will also use ocfs2_extent_list to store large value
      for a xattr entry, we refactor the tree operation so that xattr
      can use it directly.
      
      The refactoring includes 4 steps:
      1. Abstract set/get of last_eb_blk and update_clusters since they may
         be stored in different location for dinode and xattr.
      2. Add a new structure named ocfs2_extent_tree to indicate the
         extent tree the operation will work on.
      3. Remove all the use of fe_bh and di, use root_bh and root_el in
         extent tree instead. So now all the fe_bh is replaced with
         et->root_bh, el with root_el accordingly.
      4. Make ocfs2_lock_allocators generic. Now it is limited to be only used
         in file extend allocation. But the whole function is useful when we want
         to store large EAs.
      
      Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used
      for anything other than truncate inode data btrees.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      e7d4cb6b
    • T
      ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode. · 811f933d
      Tao Ma 提交于
      ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and
      ocfs2_reserve_new_metadata() are all useful for extent tree operations. But
      they are all limited to an inode btree because they use a struct
      ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list
      (the part of an ocfs2_dinode they actually use) so that the xattr btree code
      can use these functions.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      811f933d
    • T
      ocfs2: Modify ocfs2_num_free_extents for future xattr usage. · 231b87d1
      Tao Ma 提交于
      ocfs2_num_free_extents() is used to find the number of free extent records
      in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to
      use this for extended attribute trees in the future, so genericize the
      interface the take a buffer head. A future patch will allow that buffer_head
      to contain any structure rooting an ocfs2 btree.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      231b87d1
    • M
      ocfs2: track local alloc state via debugfs · 9a8ff578
      Mark Fasheh 提交于
      A per-mount debugfs file, "local_alloc" is created which when read will
      expose live state of the nodes local alloc file. Performance impact is
      minimal, only a bit of memory overhead per mount point. Still, the code is
      hidden behind CONFIG_OCFS2_FS_STATS. This feature will help us debug
      local alloc performance problems on a live system.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      9a8ff578
    • M
      ocfs2: throttle back local alloc when low on disk space · 9c7af40b
      Mark Fasheh 提交于
      Ocfs2's local allocator disables itself for the duration of a mount point
      when it has trouble allocating a large enough area from the primary bitmap.
      That can cause performance problems, especially for disks which were only
      temporarily full or fragmented. This patch allows for the allocator to
      shrink it's window first, before being disabled. Later, it can also be
      re-enabled so that any performance drop is minimized.
      
      To do this, we allow the value of osb->local_alloc_bits to be shrunk when
      needed. The default value is recorded in a mostly read-only variable so that
      we can re-initialize when required.
      
      Locking had to be updated so that we could protect changes to
      local_alloc_bits. Mostly this involves protecting various local alloc values
      with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which
      is used when the local allocator is has shrunk, but is not disabled. If the
      available space dips below 1 megabyte, the local alloc file is disabled. In
      either case, local alloc is re-enabled 30 seconds after the event, or when
      an appropriate amount of bits is seen in the primary bitmap.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      9c7af40b
    • M
      ocfs2: Track local alloc bits internally · ebcee4b5
      Mark Fasheh 提交于
      Do this instead of tracking absolute local alloc size. This avoids
      needless re-calculatiion of bits from bytes in localalloc.c. Additionally,
      the value is now in a more natural unit for internal file system bitmap
      work.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      ebcee4b5
    • M
      ocfs2: POSIX file locks support · 53da4939
      Mark Fasheh 提交于
      This is actually pretty easy since fs/dlm already handles the bulk of the
      work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the
      underlying lock manager, so I only had to add the right calls.
      
      Cluster-aware POSIX locks ("plocks") can be turned off by the same means at
      UNIX locks - mount with 'noflocks', or create a local-only Ocfs2 volume.
      Internally, the file system uses two sets of file_operations, depending on
      whether cluster aware plocks is required. This turns out to be easier than
      implementing local-only versions of ->lock.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      53da4939
    • S
      vfs: Use const for kernel parser table · a447c093
      Steven Whitehouse 提交于
      This is a much better version of a previous patch to make the parser
      tables constant. Rather than changing the typedef, we put the "const" in
      all the various places where its required, allowing the __initconst
      exception for nfsroot which was the cause of the previous trouble.
      
      This was posted for review some time ago and I believe its been in -mm
      since then.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      Cc: Alexander Viro <aviro@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a447c093
  2. 04 10月, 2008 1 次提交
    • M
      ocfs2: fiemap support · 00dc417f
      Mark Fasheh 提交于
      Plug ocfs2 into ->fiemap. Some portions of ocfs2_get_clusters() had to be
      refactored so that the extent cache can be skipped in favor of going
      directly to the on-disk records. This makes it easier for us to determine
      which extent is the last one in the btree. Also, I'm not sure we want to be
      caching fiemap lookups anyway as they're not directly related to data
      read/write.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
      Cc: ocfs2-devel@oss.oracle.com
      Cc: linux-fsdevel@vger.kernel.org
      00dc417f
  3. 10 9月, 2008 1 次提交
    • T
      ocfs2: Fix a bug in direct IO read. · 0e116227
      Tao Ma 提交于
      ocfs2 will become read-only if we try to read the bytes which pass
      the end of i_size. This can be easily reproduced by following steps:
      1. mkfs a ocfs2 volume with bs=4k cs=4k and nosparse.
      2. create a small file(say less than 100 bytes) and we will create the file
         which is allocated 1 cluster.
      3. read 8196 bytes from the kernel using O_DIRECT which exceeds the limit.
      4. The ocfs2 volume becomes read-only and dmesg shows:
      OCFS2: ERROR (device sda13): ocfs2_direct_IO_get_blocks:
      Inode 66010 has a hole at block 1
      File system is now read-only due to the potential of on-disk corruption.
      Please run fsck.ocfs2 once the file system is unmounted.
      
      So suppress the ERROR message.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      0e116227
  4. 25 8月, 2008 1 次提交
  5. 23 8月, 2008 5 次提交
  6. 01 8月, 2008 4 次提交
  7. 27 7月, 2008 2 次提交
  8. 18 7月, 2008 2 次提交
  9. 17 7月, 2008 1 次提交
    • C
      [PATCH] ocfs2: fix oops in mmap_truncate testing · c0420ad2
      Coly Li 提交于
      This patch fixes a mmap_truncate bug which was found by ocfs2 test suite.
      
      In an ocfs2 cluster more than 1 node, run program mmap_truncate, which races
      mmap writes and truncates from multiple processes. While the test is
      running, a stat from another node forces writeout, causing an oops in
      ocfs2_get_block() because it sees a buffer to write which isn't allocated.
      
      This patch fixed the bug by clear dirty and uptodate bits in buffer, leave
      the buffer unmapped and return.
      
      Fix is suggested by Mark Fasheh, and I code up the patch.
      Signed-off-by: NColy Li <coyli@suse.de>
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      c0420ad2
  10. 15 7月, 2008 9 次提交
  11. 11 7月, 2008 1 次提交
    • M
      ocfs2: Fix flags in ocfs2_file_lock · e988cf1c
      Mark Fasheh 提交于
      The stack-glue merge changed the way we use flags in dlmglue in that we now
      use the fs/dlm equivalents. Unfortunately, a merge error left the new flock
      code only partially updated. This took a while to show up though, because
      the lock level constants are actually identical between o2dlm and fs/dlm.
      The *_CONVERT and *_NOQUEUE flags have different values though, which is
      eventually causing a crash in flags_to_o2dlm().
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      e988cf1c
  12. 08 7月, 2008 1 次提交