1. 06 May, 2010 16 commits
    •
      ocfs2/trivial: Code cleanup for allocation reservation. · 3e4218df
      Tao Ma committed
      Two tiny cleanups for allocation reservation.
      1. Remove some extra code in ocfs2_local_alloc_find_clear_bits.
      2. Remove an unused variable in ocfs2_find_resv_lhs.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    •
      ocfs2: make ocfs2_adjust_resv_from_alloc simple. · b065556a
      Tao Ma committed
      When we allocate some bits from the reservation, we always
      allocate from r_start (see ocfs2_resmap_resv_bits).
      So there should be no reason to check between r_start
      and start. And I don't think we will change this behaviour
      later by allocating from some bits after r_start.  Why not make
      ocfs2_adjust_resv_from_alloc simple for now?
      
      The only chance we have to adjust the reservation is when we haven't
      reached the end. With this patch, the function is more readable.
      
      Note:
      btw, this patch also fixes an original bug in the function
      which I haven't found before.
      	if (end < ocfs2_resv_end(resv))
      		rhs = end - ocfs2_resv_end(resv);
      This code is of course buggy. ;)
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Acked-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
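The simplification described in this commit can be sketched roughly as follows. This is a minimal illustration rather than the kernel code: `struct resv_window`, `resv_end()` and `adjust_resv_from_alloc()` are hypothetical stand-ins, and the sketch assumes, as the message states, that an allocation always begins at `r_start`. It also shows the corrected direction of the subtraction that the note calls out.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a reservation window. */
struct resv_window {
    unsigned int r_start; /* first bit of the window */
    unsigned int r_len;   /* number of bits in the window; 0 means empty */
};

/* Last bit covered by the window (inclusive). */
static unsigned int resv_end(const struct resv_window *resv)
{
    return resv->r_start + resv->r_len - 1;
}

/*
 * Trim the window after bits [start, end] were allocated from it.
 * The allocation is assumed to begin at r_start, so only the tail
 * to the right of 'end' can remain.
 */
static void adjust_resv_from_alloc(struct resv_window *resv,
                                   unsigned int start, unsigned int end)
{
    unsigned int old_end = resv_end(resv);

    assert(resv->r_len > 0);
    assert(start == resv->r_start);
    assert(end >= start && end <= old_end);

    if (end < old_end) {
        /* Remaining length is old_end - end, not end - old_end. */
        resv->r_len = old_end - end;
        resv->r_start = end + 1;
    } else {
        /* The whole window was consumed. */
        resv->r_start = 0;
        resv->r_len = 0;
    }
}
```

With unsigned arithmetic, `end - old_end` would wrap around to a huge value whenever `end < old_end`, which is why the original expression could not have produced a sensible length.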
    •
      ocfs2: Make nointr a default mount option · 4b37fcb7
      Sunil Mushran committed
      OCFS2 has never really supported intr. This patch acknowledges this reality
      and makes nointr the default mount option. In a later patch, we intend to
      support intr.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    •
      ocfs2/dlm: Make o2dlm domain join/leave messages KERN_NOTICE · 5c80d4c9
      Sunil Mushran committed
      o2dlm join and leave messages are more than informational as they are
      required for debugging locking issues. This patch changes them from
      KERN_INFO to KERN_NOTICE.
      Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    •
      o2net: log socket state changes · 23fd9abd
      Srinivas Eeda committed
      This patch logs socket state changes that lead to socket shutdown.
      Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    •
      ocfs2: print node # when tcp fails · a5196ec5
      Wengang Wang committed
      Print the node number of a peer node if sending it a message failed.
      Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    •
      ocfs2: Add dir_resv_level mount option · 83f92318
      Mark Fasheh committed
      The default behavior for directory reservations stays the same, but we add a
      mount option so people can tweak the size of directory reservations
      according to their workloads.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    •
      ocfs2: change default reservation window sizes · b07f8f24
      Mark Fasheh committed
      The default reservation size of 4 (32-bit windows) is a bit too ambitious.
      Scale it back to 16 bits (resv_level=2). I have been testing various sizes
      on a 4-node cluster which runs a mixed workload that is heavily threaded.
      With a 256MB local alloc, I get *roughly* the following levels of average file
      fragmentation:
      
      resv_level=0	70%
      resv_level=1	21%
      resv_level=2	23%
      resv_level=3	24%
      resv_level=4	60%
      resv_level=5	did not test
      resv_level=6	60%
      
      resv_level=2 seemed like a good compromise between not letting windows be
      too small, but not so big that heavier workloads will immediately suffer
      without tuning.
      
      This patch also changes the behavior of directory reservations - they now
      track file reservations.  The previous compromise of giving directory
      windows only 8 bits wound up fragmenting more at some window sizes because
      file allocations had smaller unused windows to poach from.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    •
      ocfs2: increase the default size of local alloc windows · 6b82021b
      Mark Fasheh committed
      I have observed that the current size of 8M gives us pretty poor
      fragmentation on multi-threaded workloads which do lots of writes.
      
      Generally, I can increase the size of local alloc windows and observe a
      marked decrease in fragmentation, even up and beyond window sizes of 512
      megabytes. This makes sense for a couple of reasons - larger local alloc means
      more room for reservation windows. On multi-node workloads the larger local
      alloc helps as well because we don't have to do window slides as often.
      
      Also, I removed the OCFS2_DEFAULT_LOCAL_ALLOC_SIZE constant as it is no
      longer used and the comment above it was out of date.
      
      To test fragmentation, I used a workload which launched 4 threads that did
      4k writes into a series of about 140 alternating files.
      
      With resv_level=2, and a 4k/4k file system I observed the following average
      fragmentation for various localalloc= parameters:
      
      localalloc=	avg. fragmentation
      	8		48
      	32		16
      	64		10
      	120		7
      
      On larger cluster sizes, the difference is more dramatic.
      
      The new default size tops out at 256M, which we'll only get for cluster
      sizes of 32K and above.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    •
      ocfs2: clean up localalloc mount option size parsing · 73c8a800
      Mark Fasheh committed
      This patch pulls the local alloc sizing code into localalloc.c and provides
      a callout to it from ocfs2_fill_super(). Behavior is essentially unchanged
      except that I correctly calculate the maximum local alloc size. The old code
      in ocfs2_parse_options() calculated the max size as:
      
      ocfs2_local_alloc_size(sb) * 8
      
      which is correct, in bits. Unfortunately though the option passed in is in
      megabytes. Ultimately, this bug made no real difference - the shrink code
      would catch a too-large size and bring it down to something reasonable.
      Still, it's less than efficient as-is.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
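The unit mismatch described above (a maximum computed in bits, compared against an option given in megabytes) can be illustrated with a small conversion sketch. The helper names below are hypothetical and are not taken from the patch. The sketch assumes one bitmap bit per cluster and cluster sizes between 4K and 1M (`clustersize_bits` from 12 to 20).

```c
#include <assert.h>

/* Convert megabytes to clusters; each cluster is one bit in the bitmap. */
static unsigned int megabytes_to_clusters(unsigned int megs,
                                          unsigned int clustersize_bits)
{
    assert(clustersize_bits >= 12 && clustersize_bits <= 20);
    return megs << (20 - clustersize_bits);
}

/* Convert clusters back to whole megabytes (rounding down). */
static unsigned int clusters_to_megabytes(unsigned int clusters,
                                          unsigned int clustersize_bits)
{
    assert(clustersize_bits >= 12 && clustersize_bits <= 20);
    return clusters >> (20 - clustersize_bits);
}

/*
 * Largest localalloc= value, in megabytes, that a bitmap of
 * 'bitmap_bytes' bytes can describe. The byte count times 8 is a
 * number of bits, so it must be converted before it is compared
 * with a megabyte value.
 */
static unsigned int max_local_alloc_megabytes(unsigned int bitmap_bytes,
                                              unsigned int clustersize_bits)
{
    unsigned int max_bits = bitmap_bytes * 8;

    return clusters_to_megabytes(max_bits, clustersize_bits);
}
```

For example, under these assumptions a 1024-byte bitmap holds 8192 bits, which with 4K clusters corresponds to 32 megabytes, not 8192.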
    •
      ocfs2: remove ocfs2_local_alloc_in_range() · a57c8fd2
      Mark Fasheh committed
      Inodes are always allocated from the global bitmap now so we don't need this
      any more. Also, the existing implementation bounces reservations around
      needlessly.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    •
      ocfs2: allocate btree internal block groups from the global bitmap · 33d5d380
      Mark Fasheh committed
      Otherwise, the need for a very large contiguous allocation tends to
      wreak havoc on many inode allocation reservations on the local alloc, thus
      ruining any chances for contiguousness.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    •
      ocfs2: use allocation reservations for directory data · e3b4a97d
      Mark Fasheh committed
      Use the reservations system for unindexed dir tree allocations. We don't
      bother with the indexed tree as reads from it are mostly random anyway.
      Directory reservations are marked separately, to allow the reservations code
      a chance to optimize their window sizes. This patch allocates only 8 bits
      for directory windows as they generally are not expected to grow as quickly
      as file data. Future improvements to dir window sizing can trivially be
      made.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    •
      ocfs2: use allocation reservations during file write · 4fe370af
      Mark Fasheh committed
      Add a per-inode reservations structure and pass it through to the
      reservations code.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    •
      ocfs2: allocation reservations · d02f00cc
      Mark Fasheh committed
      This patch improves Ocfs2 allocation policy by allowing an inode to
      reserve a portion of the local alloc bitmap for itself. The reserved
      portion (allocation window) is advisory in that other allocation
      windows might steal it if the local alloc bitmap becomes
      full. Otherwise, the reservations are honored and guaranteed to be
      free. When the local alloc window is moved to a different portion of
      the bitmap, existing reservations are discarded.
      
      Reservation windows are represented internally by a red-black
      tree. Within that tree, each node represents the reservation window of
      one inode. An LRU of active reservations is also maintained. When new
      data is written, we allocate it from the inode's window. When all bits
      in a window are exhausted, we allocate a new one as close to the
      previous one as possible. Should we not find free space, an existing
      reservation is pulled off the LRU and cannibalized.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
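The search for room for a new window, as described above, might look roughly like the sketch below. It is only an illustration under stated assumptions: a sorted array stands in for the red-black tree, the LRU fallback is left to the caller, and `struct window` and `find_free_window()` are hypothetical names that do not come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reservation window: bits [start, start + len). */
struct window {
    unsigned int start;
    unsigned int len;
};

/*
 * Find the first run of at least 'wanted' unreserved bits in a bitmap
 * of 'total_bits' bits. 'wins' holds 'count' existing windows, sorted
 * by start and non-overlapping. Returns 1 and stores the run's first
 * bit in *found_start on success, or 0 if no gap is large enough.
 */
static int find_free_window(const struct window *wins, size_t count,
                            unsigned int total_bits, unsigned int wanted,
                            unsigned int *found_start)
{
    unsigned int cursor = 0; /* first bit not yet known to be reserved */
    size_t i;

    assert(wanted > 0);

    for (i = 0; i < count; i++) {
        assert(wins[i].start >= cursor); /* sorted, non-overlapping */
        if (wins[i].start - cursor >= wanted) {
            *found_start = cursor;
            return 1;
        }
        cursor = wins[i].start + wins[i].len;
    }

    /* Check the tail after the last window. */
    if (cursor <= total_bits && total_bits - cursor >= wanted) {
        *found_start = cursor;
        return 1;
    }

    /* No gap: a caller could now cannibalize the oldest LRU window. */
    return 0;
}
```

A tree gives the same ordered walk with cheaper insertion and removal. The array is used here only to keep the sketch short.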
    •
      ocfs2: Make ocfs2_journal_dirty() void. · ec20cec7
      Joel Becker committed
      jbd[2]_journal_dirty_metadata() only returns 0.  It's been returning 0
      since before the kernel moved to git.  There is no point in checking
      this error.
      
      ocfs2_journal_dirty() has been faithfully returning the status since the
      beginning.  All over ocfs2, we have blocks of code checking this
      can't-fail status.  In the past few years, we've tried to avoid adding these
      checks, because they are pointless.  But anyone who looks at our code
      assumes they are needed.
      
      Finally, ocfs2_journal_dirty() is made a void function.  All error
      checking is removed from other files.  We'll BUG_ON() the status of
      jbd2_journal_dirty_metadata() just in case they change it someday.  They
      won't.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
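The pattern described here, turning a status-returning wrapper into a void function that traps on an impossible error, can be sketched in userspace C. The names are hypothetical stand-ins: `lower_level_dirty_metadata()` plays the role of the routine that only ever returns 0, and `assert()` stands in for the kernel's `BUG_ON()`.

```c
#include <assert.h>

static int dirty_calls; /* counts calls, for demonstration only */

/* Hypothetical stand-in for a lower-level routine that only returns 0. */
static int lower_level_dirty_metadata(void)
{
    dirty_calls++;
    return 0;
}

/*
 * Void wrapper: callers no longer carry error-handling branches for a
 * status that cannot be nonzero.
 */
static void journal_dirty(void)
{
    int status = lower_level_dirty_metadata();

    /* Stand-in for BUG_ON(status): trap if the contract ever changes. */
    assert(status == 0);
    (void)status; /* avoid an unused-variable warning under NDEBUG */
}
```

Callers then simply write `journal_dirty();` with no `if (status < 0)` block after it, which is the cleanup the commit applies across the other files.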
  2. 24 March, 2010 1 commit
    •
      ocfs2: Clear undo bits when local alloc is freed · b4414eea
      Mark Fasheh committed
      When the local alloc file changes windows, unused bits are freed back to the
      global bitmap. By definition, those bits cannot be in use by any file. Also,
      the local alloc will never have been able to allocate those bits if they
      were part of a previous truncate. Therefore it makes sense that we should
      clear unused local alloc bits in the undo buffer so that they can be used
      immediately.
      
      [ Modified to call it ocfs2_release_clusters() -- Joel ]
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
  3. 20 March, 2010 2 commits
    •
      ocfs2: Init meta_ac properly in ocfs2_create_empty_xattr_block. · b2317968
      Tao Ma committed
      You can't store a pointer that you haven't filled in yet and expect it
      to work.
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
    •
      ocfs2: Fix the update of name_offset when removing xattrs · dfe4d3d6
      Tao Ma committed
      When replacing an xattr's value, in some cases we wipe its name/value
      first and then re-add it. The wipe is done by
      ocfs2_xa_block_wipe_namevalue() when the xattr is in the inode or
      block. We currently adjust name_offset for all the entries which have
      (offset < name_offset). This does not adjust the entry we're replacing.
      Since we are replacing the entry, we don't adjust the total entry count.
      When we calculate a new namevalue location, we trust the entry's
      now-wrong offset in ocfs2_xa_get_free_start().  The solution is to
      also adjust the name_offset for the replaced entry, allowing
      ocfs2_xa_get_free_start() to calculate the new namevalue location
      correctly.
      
      The following script can trigger a kernel panic easily.
      
      echo 'y'|mkfs.ocfs2 --fs-features=local,xattr -b 4K $DEVICE
      mount -t ocfs2 $DEVICE $MNT_DIR
      FILE=$MNT_DIR/$RANDOM
      for((i=0;i<76;i++))
      do
      string_76="a$string_76"
      done
      string_78="aa$string_76"
      string_82="aaaa$string_78"
      
      touch $FILE
      setfattr -n 'user.test1234567890' -v $string_76 $FILE
      setfattr -n 'user.test1234567890' -v $string_78 $FILE
      setfattr -n 'user.test1234567890' -v $string_82 $FILE
      Signed-off-by: Tao Ma <tao.ma@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
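The offset bookkeeping behind this fix can be modelled in a few lines. This is a deliberately simplified sketch and not the kernel code: it tracks only an array of name offsets, assumes name/value pairs are packed toward the end of the block, and uses hypothetical function names. Using `<=` in the comparison is one assumed way of including the replaced entry in the adjustment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Free space ends at the smallest recorded name offset, or at the end
 * of the block when there are no entries.
 */
static uint16_t get_free_start(const uint16_t *name_offsets, size_t count,
                               uint16_t block_size)
{
    uint16_t free_start = block_size;
    size_t i;

    for (i = 0; i < count; i++) {
        if (name_offsets[i] < free_start)
            free_start = name_offsets[i];
    }
    return free_start;
}

/*
 * Account for wiping the pair at 'wiped_offset' ('wiped_size' bytes):
 * every pair stored below it slides up by 'wiped_size'. Comparing with
 * <= instead of < also moves the entry being replaced, so its stale
 * offset no longer drags the free-start calculation down.
 */
static void adjust_after_wipe(uint16_t *name_offsets, size_t count,
                              uint16_t wiped_offset, uint16_t wiped_size)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (name_offsets[i] <= wiped_offset)
            name_offsets[i] = (uint16_t)(name_offsets[i] + wiped_size);
    }
}
```

In this model, with offsets {4000, 3900} and the 100-byte pair at 3900 wiped, a strict `<` comparison would leave the replaced entry at 3900, so the free start would be computed as 3900 even though the remaining data now begins at 4000.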
  4. 19 March, 2010 1 commit
    •
      ocfs2: Always try for maximum bits with new local alloc windows · b22b63eb
      Mark Fasheh committed
      What we were doing before was to ask for the current window size as the
      maximum allocation. This had the effect of limiting the amount of allocation
      we could get for the local alloc during times when the window size was
      shrunk due to fragmentation. In some cases, that could actually *increase*
      fragmentation by artificially limiting the number of bits we can accept. So
      while we still want to ask for a minimum number of bits equal to window
      size, there is no reason why we should limit the number of bits the local
      alloc should accept. Hence always allow the maximum number of local alloc
      bits.
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
  5. 18 March, 2010 4 commits
  6. 13 March, 2010 1 commit
  7. 08 March, 2010 1 commit
  8. 07 March, 2010 1 commit
  9. 05 March, 2010 7 commits
    •
      dquot: cleanup dquot initialize routine · 871a2931
      Christoph Hellwig committed
      Get rid of the initialize dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_initialize helper to __dquot_initialize
      and vfs_dq_init to dquot_initialize to have a consistent namespace.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    •
      dquot: move dquot initialization responsibility into the filesystem · 907f4554
      Christoph Hellwig committed
      Currently various places in the VFS call vfs_dq_init directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the initialization.  For most metadata operations
      this is a straightforward move into the methods, but for truncate and
      open it's a bit more complicated.
      
      For truncate we currently only call vfs_dq_init for the sys_truncate case
      because open already takes care of it for ftruncate and open(O_TRUNC) - the
      new code causes an additional vfs_dq_init for those which is harmless.
      
      For open the initialization is moved from do_filp_open into the open method,
      which means it happens slightly earlier now, and only for regular files.
      The latter is fine because we don't need to initialize it for operations
      on special files, and we already do it as part of the namespace operations
      for directories.
      
      Add a dquot_file_open helper that filesystems that support generic quotas
      can use to fill in ->open.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    •
      dquot: cleanup dquot drop routine · 9f754758
      Christoph Hellwig committed
      Get rid of the drop dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_drop helper to __dquot_drop
      and vfs_dq_drop to dquot_drop to have a consistent namespace.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    •
      dquot: move dquot drop responsibility into the filesystem · 257ba15c
      Christoph Hellwig committed
      Currently clear_inode calls vfs_dq_drop directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the drop inside the ->clear_inode
      superblock operation.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    •
      dquot: cleanup dquot transfer routine · b43fa828
      Christoph Hellwig committed
      Get rid of the transfer dquot operation - it is now always called from
      the filesystem and if a filesystem really needs its own (which none
      currently does) it can just call into its own routine directly.
      
      Rename the now static low-level dquot_transfer helper to __dquot_transfer
      and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
      and make the new dquot_transfer return a normal negative errno value
      which all callers expect.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    •
      dquot: cleanup inode allocation / freeing routines · 63936dda
      Christoph Hellwig committed
      Get rid of the alloc_inode and free_inode dquot operations - they are
      always called from the filesystem and if a filesystem really needs
      its own (which none currently does) it can just call into its
      own routine directly.
      
      Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
      call the low-level dquot_alloc_inode / dquot_free_inode routines
      directly, which now lose the number argument which is always 1.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
    •
      dquot: cleanup space allocation / freeing routines · 5dd4056d
      Christoph Hellwig committed
      Get rid of the alloc_space, free_space, reserve_space, claim_space and
      release_rsv dquot operations - they are always called from the filesystem
      and if a filesystem really needs its own (which none currently does)
      it can just call into its own routine directly.
      
      Move shared logic into the common __dquot_alloc_space,
      dquot_claim_space_nodirty and __dquot_free_space low-level methods,
      and rationalize the wrappers around it to move as much code as
      possible into the common block for CONFIG_QUOTA vs not.  Also rename
      all these helpers to be named dquot_* instead of vfs_dq_*.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jan Kara <jack@suse.cz>
  10. 03 March, 2010 1 commit
  11. 28 February, 2010 3 commits
  12. 27 February, 2010 2 commits