  1. 02 Feb, 2013 2 commits
  2. 29 Jan, 2013 7 commits
    • GFS2: Use ->writepages for ordered writes · 45138990
      Steven Whitehouse committed
      Instead of using a list of buffers to write ahead of the journal
      flush, this now uses a list of inodes and calls ->writepages
      via filemap_fdatawrite() in order to achieve the same thing. For
      most use cases this results in a shorter ordered write list,
      as well as much larger i/os being issued.
      
      The ordered write list is sorted by inode number before writing
      in order to retain the disk block ordering between inodes as
      per the previous code.
      
      The previous ordered write code made assumptions about how to write
      out the disk blocks that conflicted with mpage_writepages(). With
      this updated version we can also use mpage_writepages() for GFS2's
      ordered write writepages implementation, so we will also send
      larger i/os from writeback.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
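The sorting step described in this commit message can be illustrated in user-space C. This is only a sketch under stated assumptions: `struct fake_inode`, `cmp_ino` and `sort_ordered_list` are hypothetical names, and an array stands in for the kernel's list of inodes. It shows the idea of ordering the pending inodes by inode number before writing them out, not the actual GFS2 code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inode on the ordered write list. */
struct fake_inode {
    uint64_t ino;   /* inode number, the sort key */
};

/* Compare by inode number; avoids subtraction so large values cannot wrap. */
static int cmp_ino(const void *a, const void *b)
{
    const struct fake_inode *x = a;
    const struct fake_inode *y = b;

    if (x->ino < y->ino)
        return -1;
    return x->ino > y->ino;
}

/* Sort the pending list so that writes are issued in ascending inode order. */
void sort_ordered_list(struct fake_inode *list, size_t n)
{
    assert(list != NULL || n == 0);
    qsort(list, n, sizeof(*list), cmp_ino);
}
```

After the sort, a caller would walk the array front to back and start writeback for each entry in turn.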
    • GFS2: Clean up freeze code · d564053f
      Steven Whitehouse committed
      The freeze code has not been looked at a lot recently. Upstream has
      moved on, and this is an attempt to catch us back up again. There
      is a vfs level interface for the freeze code which can be called
      from our (obsolete, but kept for backward compatibility purposes)
      sysfs freeze interface. This means freezing this way vs. doing it
      from the ioctl should now work in identical fashion.
      
      As a result of this, the freeze function is only called once
      and we can drop our own special purpose code for counting the
      number of freezes.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Merge gfs2_attach_bufdata() into trans.c · c76c4d96
      Steven Whitehouse committed
      The locking in gfs2_attach_bufdata() was type specific (data/meta)
      which made the function rather confusing. This patch moves the core
      of gfs2_attach_bufdata() into trans.c renaming it gfs2_alloc_bufdata()
      and moving the locking into gfs2_trans_add_data()/gfs2_trans_add_meta().
      
      As a result all of the locking related to adding data and metadata to
      the journal is now in these two functions. This should help to clarify
      what is going on, and give us some opportunities to simplify in
      some cases.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Copy gfs2_trans_add_bh into new data/meta functions · 767f433f
      Steven Whitehouse committed
      This patch copies the body of gfs2_trans_add_bh into the two newly
      added gfs2_trans_add_data and gfs2_trans_add_meta functions. We can
      then move the .lo_add functions from lops.c into trans.c and call
      them directly.
      
      As a result of this, we no longer need to use the .lo_add functions
      at all, so that is removed from the log operations structure.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Split gfs2_trans_add_bh() into two · 350a9b0a
      Steven Whitehouse committed
      There is little common content in gfs2_trans_add_bh() between the data
      and meta classes by the time that the functions which it calls are
      taken into account. The intent here is to split this into two
      separate functions. Stage one is to introduce gfs2_trans_add_data()
      and gfs2_trans_add_meta() and update the callers accordingly.
      
      Later patches will then pull in the content of gfs2_trans_add_bh()
      and its dependent functions in order to clean up the code in this
      area.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Merge revoke adding functions · 75f2b879
      Steven Whitehouse committed
      This moves the lo_add function for revokes into trans.c, removing
      a function call and making the code easier to read.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Separate LRU scanning from shrinker · 2a005855
      Steven Whitehouse committed
      This breaks out the LRU scanning function from the shrinker in
      preparation for adding other callers to the LRU scanner.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  3. 28 Jan, 2013 1 commit
  4. 02 Jan, 2013 4 commits
    • GFS2: Reset rd_last_alloc when it reaches the end of the rgrp · 13d2eb01
      Bob Peterson committed
      In function rg_mblk_search, it's searching for multiple blocks in
      a given state (e.g. "free"). If there's an active block reservation
      its goal is the next free block of that. If the resource group
      contains the dinode's goal block, that's used for the search. But
      if neither is the case, it uses the rgrp's last allocated block.
      That way, consecutive allocations appear after one another on media.
      The problem comes in when you hit the end of the rgrp; it would never
      start over and search from the beginning. This became a problem,
      since if you deleted all the files and data from the rgrp, it would
      never start over and find free blocks. So it had to keep searching
      further out on the media to allocate blocks. This patch resets the
      rd_last_alloc after it does an unsuccessful search at the end of
      the rgrp.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
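The behaviour this commit message describes can be modelled with a toy allocator. The sketch below is an assumption-laden illustration, not the rgrp code: `struct fake_rgrp` and `alloc_block` are hypothetical, and one boolean per block replaces the real bitmaps. It shows a forward search from the last allocated position that resets that position after a failed search, so the next attempt starts from the beginning of the group.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified resource group: one flag per block. */
struct fake_rgrp {
    bool   used[8];     /* true if the block is allocated */
    size_t last_alloc;  /* where the next search starts */
};

/*
 * Search forward from last_alloc for a free block. On success, mark it
 * used, remember the position and return its index. On failure, reset
 * last_alloc so the next attempt scans from the start, and return -1.
 */
int alloc_block(struct fake_rgrp *rg)
{
    size_t n;

    assert(rg != NULL);
    n = sizeof(rg->used) / sizeof(rg->used[0]);

    for (size_t i = rg->last_alloc; i < n; i++) {
        if (!rg->used[i]) {
            rg->used[i] = true;
            rg->last_alloc = i;
            return (int)i;
        }
    }
    rg->last_alloc = 0;     /* the reset that the patch adds, in spirit */
    return -1;
}
```

Without the reset, blocks freed near the start of the group would never be found again by this search.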
    • GFS2: Stop looking for free blocks at end of rgrp · 15bd50ad
      Bob Peterson committed
      This patch adds a return code check after calling function
      gfs2_rbm_from_block while determining the free extent size.
      That way, when the end of an rgrp is reached, it won't try
      to process unaligned blocks after the end.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix race in gfs2_rs_alloc · f1213cac
      Abhijith Das committed
      QE aio tests uncovered a race condition in gfs2_rs_alloc where it's possible
      to come out of the function with a valid ip->i_res allocation but it gets
      freed before use resulting in a NULL ptr dereference.
      
      This patch moves the initial short-circuit check for non-NULL ip->i_res
      inside the mutex lock. With this patch, I was able to successfully run the
      reproducer test multiple times.
      
      Resolves: rhbz#878476
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Initialize hex string to '0' · ec148752
      Nathan Straz committed
      When generating the DLM lock name, a value of 0 would skip
      the loop and leave the string unchanged.  This left locks with
      a value of 0 unlabeled.  Initializing the string to '0' fixes this.
      Signed-off-by: Nathan Straz <nstraz@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
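The zero-value case in this commit message is easy to reproduce in isolation. The following is a minimal sketch, assuming the digits are written right to left into a caller-provided, pre-padded field; `reverse_hex` is a hypothetical name and the padding convention is an assumption, not taken from the commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write `value` as lowercase hex digits ending at `last` and moving
 * towards lower addresses. The caller is assumed to have pre-filled the
 * field with padding and to have left room for all digits.
 */
void reverse_hex(char *last, uint64_t value)
{
    static const char digits[] = "0123456789abcdef";

    assert(last != NULL);
    *last = '0';    /* covers value == 0, where the loop never runs */
    while (value) {
        *last-- = digits[value & 0x0f];
        value >>= 4;
    }
}
```

For a non-zero value the first loop iteration overwrites the initial '0', so the extra store only changes the result when the value is 0.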
  5. 18 Dec, 2012 1 commit
  6. 12 Dec, 2012 1 commit
    • mm: redefine address_space.assoc_mapping · 252aa6f5
      Rafael Aquini committed
      Overhaul struct address_space.assoc_mapping, renaming it to
      address_space.private_data and redefining its type as void*.  By this
      approach we consistently name the .private_* elements from struct
      address_space as well as allow extended usage for address_space
      association with other data structures through ->private_data.
      
      Also, all users of old ->assoc_mapping element are converted to reflect
      its new name and type change (->private_data).
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 21 Nov, 2012 1 commit
    • GFS2: Set gl_object during inode create · 1e2d9d44
      Bob Peterson committed
      This patch fixes a cluster coherency problem that occurs when one
      node creates a file, does several writes, then a different node
      tries to write to the same file. When the inode's glock is demoted,
      the inode wasn't synced to the media properly because the gl_object
      wasn't set. Later, the flush daemon noticed the uncommitted data
      and tried to flush it, only to discover the glock was no longer locked
      properly in exclusive mode. That caused an assert withdraw.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  8. 16 Nov, 2012 2 commits
  9. 15 Nov, 2012 2 commits
  10. 14 Nov, 2012 1 commit
    • GFS2: skip dlm_unlock calls in unmount · fb6791d1
      David Teigland committed
      When unmounting, gfs2 does a full dlm_unlock operation on every
      cached lock.  This can create a very large amount of work and can
      take a long time to complete.  However, the vast majority of these
      dlm unlock operations are unnecessary because after all the unlocks
      are done, gfs2 leaves the dlm lockspace, which automatically clears
      the locks of the leaving node, without unlocking each one individually.
      So, gfs2 can skip explicit dlm unlocks, and use dlm_release_lockspace to
      remove the locks implicitly.  The one exception is when the lock's lvb is
      being used.  In this case, dlm_unlock is called because it may update the
      lvb of the resource.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  11. 13 Nov, 2012 4 commits
  12. 07 Nov, 2012 13 commits
    • GFS2: Add Orlov allocator · 9dbe9610
      Steven Whitehouse committed
      Just like ext3, this works on the root directory and any directory
      with the +T flag set. Also, just like ext3, any subdirectory created
      in one of the just mentioned cases will be allocated to a random
      resource group (GFS2 equivalent of a block group).
      
      If you are creating a set of directories, each of which will contain a
      job running on a different node, then by setting +T on the parent
      directory before creating the subdirectories, each will land up in a
      different resource group, and thus resource group contention between
      nodes will be kept to a minimum.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Use proper allocation context for new inodes · c9aecf73
      Steven Whitehouse committed
      Rather than using the parent directory's allocation context, this
      patch allocates the new inode earlier in the process and then uses
      it to contain all the information required. As a result, we can now
      use the new inode's own allocation context to allocate it rather
      than having to use the parent directory's context. This gives us a
      lot more flexibility in where the inode is placed on disk.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Add test for resource group congestion status · bcd97c06
      Steven Whitehouse committed
      This patch uses information gathered by the recent glock statistics
      patch in order to derive a boolean verdict on the congestion
      status of a resource group. This is then used when making decisions
      on which resource group to choose during block allocation.
      
      The aim is to avoid resource groups which are heavily contended
      by other nodes, while still ensuring locality of access wherever
      possible.
      
      Once a reservation has been made in a particular resource group
      we continue to use that resource group until a new reservation is
      required. This should help to ensure that we do not change resource
      groups too often.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Rename glops go_xmote_th to go_sync · 06dfc306
      Bob Peterson committed
      [Editorial: This is a nit, but has been a minor irritation for a long time:]
      
      This patch renames glops structure item for go_xmote_th to go_sync.
      The functionality is unchanged; it's just for readability.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Speed up gfs2_rbm_from_block · a68a0a35
      Bob Peterson committed
      This patch is a rewrite of function gfs2_rbm_from_block. Rather than
      looping to find the right bitmap, the code now does a few simple
      math calculations.
      
      I compared the performance of both algorithms side by side and the new
      algorithm is noticeably faster. Sample instrumentation output from a
      "fast" machine:
      
      5 million calls: millisec spent: Orig: 166 New: 113
      5 million calls: millisec spent: Orig: 189 New: 114
      
      In addition, I ran postmark (on a somewhat slower CPU) before and after
      the new algorithm was put in place and postmark showed a decent
      improvement:
      
      Before the new algorithm:
      -------------------------
      Time:
      	645 seconds total
      	584 seconds of transactions (171 per second)
      
      Files:
      	150087 created (232 per second)
      		Creation alone: 100000 files (2083 per second)
      		Mixed with transactions: 50087 files (85 per second)
      	49995 read (85 per second)
      	49991 appended (85 per second)
      	150087 deleted (232 per second)
      		Deletion alone: 100174 files (7705 per second)
      		Mixed with transactions: 49913 files (85 per second)
      
      Data:
      	273.42 megabytes read (434.08 kilobytes per second)
      	852.13 megabytes written (1.32 megabytes per second)
      
      With the new algorithm:
      -----------------------
      Time:
      	599 seconds total
      	530 seconds of transactions (188 per second)
      
      Files:
      	150087 created (250 per second)
      		Creation alone: 100000 files (1886 per second)
      		Mixed with transactions: 50087 files (94 per second)
      	49995 read (94 per second)
      	49991 appended (94 per second)
      	150087 deleted (250 per second)
      		Deletion alone: 100174 files (6260 per second)
      		Mixed with transactions: 49913 files (94 per second)
      
      Data:
      	273.42 megabytes read (467.42 kilobytes per second)
      	852.13 megabytes written (1.42 megabytes per second)
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
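The commit message does not show the code, so the following is only a guess at the general idea of replacing a loop over bitmaps with arithmetic. It assumes a hypothetical layout in which the first bitmap covers a different number of blocks than the later ones; `struct fake_layout`, `bitmap_index_loop` and `bitmap_index_math` are illustrative names, and range checking against the end of the resource group is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: the first bitmap covers `first` blocks, every
 * later bitmap covers `rest` blocks. */
struct fake_layout {
    uint32_t first;
    uint32_t rest;
};

/* Loop style: walk the bitmaps until the remaining offset fits. */
uint32_t bitmap_index_loop(const struct fake_layout *l, uint32_t off)
{
    uint32_t idx = 0;
    uint32_t size;

    assert(l != NULL && l->rest > 0);
    size = l->first;
    while (off >= size) {
        off -= size;
        size = l->rest;
        idx++;
    }
    return idx;
}

/* Arithmetic style: one comparison, one subtraction, one division. */
uint32_t bitmap_index_math(const struct fake_layout *l, uint32_t off)
{
    assert(l != NULL && l->rest > 0);
    if (off < l->first)
        return 0;
    return 1 + (off - l->first) / l->rest;
}
```

Both functions return the same index for any offset; the second does a constant amount of work regardless of how many bitmaps precede the target.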
    • GFS2: Review bug traps in glops.c · 8eae1ca0
      Steven Whitehouse committed
      Two of the bug traps here could really be warnings. The others are
      converted from BUG() to GLOCK_BUG_ON() since we'll most likely
      need to know the glock state in order to debug any issues which
      arise. As a result of this, __dump_glock has to be renamed and
      is no longer static.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Test bufdata with buffer locked and gfs2_log_lock held · 96e5d1d3
      Benjamin Marzinski committed
      In gfs2_trans_add_bh(), gfs2 was testing if there was a bd attached to the
      buffer without having the gfs2_log_lock held. It was then assuming it would
      stay attached for the rest of the function. However, without either the log
      lock being held or the buffer locked, __gfs2_ail_flush() could detach bd at any
      time.  This patch moves the locking before the test.  If there isn't a bd
      already attached, gfs2 can safely allocate one and attach it before locking.
      There is no way that the newly allocated bd could be on the ail list,
      and thus no way for __gfs2_ail_flush() to detach it.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Don't call file_accessed() with a shared glock · 3d162688
      Benjamin Marzinski committed
      file_accessed() was being called by gfs2_mmap() with a shared glock. If it
      needed to update the atime, it was crashing because it dirtied the inode in
      gfs2_dirty_inode() without holding an exclusive lock. gfs2_dirty_inode()
      checked if the caller was already holding a glock, but it didn't make sure that
      the glock was in the exclusive state. Now, instead of calling file_accessed()
      while holding the shared lock in gfs2_mmap(), file_accessed() is called after
      grabbing and releasing the glock to update the inode.  If file_accessed() needs
      to update the atime, it will grab an exclusive lock in gfs2_dirty_inode().
      
      gfs2_dirty_inode() now also checks to make sure that if the calling process has
      already locked the glock, it has an exclusive lock.
      Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix FITRIM argument handling · 076f0faa
      Lukas Czerner committed
      The current implementation in gfs2 treats the FITRIM arguments as if
      they were in file system block units, which is wrong. The FITRIM
      arguments (fstrim_range.start, fstrim_range.len and
      fstrim_range.minlen) are actually in bytes.
      
      Moreover, checks were missing for the start argument being beyond the
      end of the file system, the len argument being smaller than a file
      system block, and the minlen argument being bigger than the biggest
      resource group.
      
      This commit converts the FITRIM arguments to file system blocks and
      also adds the checks mentioned above.
      
      All the problems were recognised by xfstests 251 and 260.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
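The conversion and the three checks named in this commit message can be sketched as follows. This is not the GFS2 implementation: `struct trim_range` and `trim_bytes_to_blocks` are hypothetical, the block size is assumed to be a power of two given as a shift, and returning -EINVAL for a failed check is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the three byte-valued FITRIM arguments. */
struct trim_range {
    uint64_t start;
    uint64_t len;
    uint64_t minlen;
};

/*
 * Convert a byte-based range to file system blocks.
 * bsize_shift: log2 of the block size; fs_blocks: file system size in
 * blocks; max_rgrp_blocks: size of the largest resource group in blocks.
 * Returns 0 and fills `out`, or -EINVAL if one of the checks fails.
 */
int trim_bytes_to_blocks(const struct trim_range *in, struct trim_range *out,
                         unsigned int bsize_shift, uint64_t fs_blocks,
                         uint64_t max_rgrp_blocks)
{
    uint64_t start, len, minlen;

    assert(in != NULL && out != NULL && bsize_shift < 64);

    start = in->start >> bsize_shift;
    len = in->len >> bsize_shift;
    minlen = in->minlen >> bsize_shift;

    if (start >= fs_blocks)         /* start beyond the end of the fs */
        return -EINVAL;
    if (len == 0)                   /* len smaller than one block */
        return -EINVAL;
    if (minlen > max_rgrp_blocks)   /* minlen bigger than the largest rgrp */
        return -EINVAL;

    out->start = start;
    out->len = len;
    out->minlen = minlen;
    return 0;
}
```

With 4096-byte blocks (shift 12), a request starting at byte 8192 with a length of 40960 bytes becomes a start of block 2 and a length of 10 blocks.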
    • GFS2: Require user to provide argument for FITRIM · 3a238ade
      Lukas Czerner committed
      When the fstrim_range argument is not provided by the user in the FITRIM
      ioctl, we should just return EFAULT rather than promote bad behaviour by
      filling in the structure in the kernel. Let the user deal with it.
      Signed-off-by: Lukas Czerner <lczerner@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Clean up some unused assignments · 73738a77
      Andrew Price committed
      Cleans up two cases where variables were assigned values but then never
      used again.
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix possible null pointer deref in gfs2_rs_alloc · cd0ed19f
      Andrew Price committed
      Despite the return value from kmem_cache_zalloc() being checked, the
      error wasn't being returned until after a possible null pointer
      dereference. This patch returns the error immediately, allowing the
      removal of the error variable.
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: Fix an unchecked error from gfs2_rs_alloc · aaaf68c5
      Andrew Price committed
      Check the return value of gfs2_rs_alloc(ip) and avoid a possible null
      pointer dereference.
      Signed-off-by: Andrew Price <anprice@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  13. 10 Oct, 2012 1 commit
    • tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking · 35c2a7f4
      Hugh Dickins committed
      Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
      	u64 inum = fid->raw[2];
      which is unhelpfully reported as at the end of shmem_alloc_inode():
      
      BUG: unable to handle kernel paging request at ffff880061cd3000
      IP: [<ffffffff812190d0>] shmem_alloc_inode+0x40/0x40
      Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Call Trace:
       [<ffffffff81488649>] ? exportfs_decode_fh+0x79/0x2d0
       [<ffffffff812d77c3>] do_handle_open+0x163/0x2c0
       [<ffffffff812d792c>] sys_open_by_handle_at+0xc/0x10
       [<ffffffff83a5f3f8>] tracesys+0xe1/0xe6
      
      Right, tmpfs is being stupid to access fid->raw[2] before validating that
      fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
      fall at the end of a page, and the next page not be present.
      
      But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
      careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
      could oops in the same way: add the missing fh_len checks to those.
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
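The class of bug fixed here, reading `raw[2]` before confirming that the handle is long enough, can be shown with a small user-space sketch. `fh_to_inum` is a hypothetical helper, not a kernel function; it assumes `fh_len` counts 32-bit words and that the inode number is split across `raw[1]` (low half) and `raw[2]` (high half).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a 64-bit inode number from a file handle made of 32-bit words.
 * Returns 0 and stores the result in *inum, or -1 if the handle is too
 * short to contain raw[2]. The length is validated before any access.
 */
int fh_to_inum(const uint32_t *raw, int fh_len, uint64_t *inum)
{
    assert(raw != NULL && inum != NULL);

    if (fh_len < 3)     /* raw[2] is not covered by the handle */
        return -1;

    *inum = ((uint64_t)raw[2] << 32) | raw[1];
    return 0;
}
```

The important property is the ordering: no element of `raw` is touched until the length check has passed, so a short buffer at the end of a page is never read past its end.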