1. 17 Feb, 2014 1 commit
  2. 10 Feb, 2014 1 commit
  3. 07 Feb, 2014 1 commit
    • GFS2: Add meta readahead field in directory entries · 44aaada9
      Steven Whitehouse committed
      The intent of this new field in the directory entry is to
      allow a subsequent lookup to know how many blocks, which
      are contiguous with the inode, contain metadata which relates
      to the inode. This will then allow the issuing of a single
      read to read these blocks, rather than reading the inode
      first, and then issuing a second read for the metadata.
      
      This only works under some fairly strict conditions. Since
      we do not have back pointers from inodes to directory entries,
      we must ensure that the blocks referenced in this way will
      always belong to the inode.
      
      This rules out being able to use this system for indirect
      blocks, as these can change as a result of truncate/rewrite.
      
      So the idea here is to restrict this to xattr blocks only
      for the time being. For most inodes, that means only a
      single block. Also, when using ACLs and/or SELinux or
      other LSMs, these will be added at inode creation time
      so that they will be contiguous with the inode on disk and
      also will almost always be needed when we read the inode in
      for permissions checks.
      
      Once an xattr block for an inode is allocated, it will never
      change until the inode is deallocated.
      
      This patch adds the new field; a further patch will add the
      readahead in due course.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
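A minimal sketch of how a later lookup might use such a field to issue one read instead of two. The structure `dirent_sketch`, its `rahead` member, the helper `plan_inode_read()` and the 4096-byte block size are all assumptions made for illustration; they are not taken from the patch, which only adds the field.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BLOCK_SIZE 4096u /* assumed block size, illustration only */

/* Hypothetical directory entry: only the fields this sketch needs. */
struct dirent_sketch {
    uint64_t inode_block; /* block address of the inode */
    uint16_t rahead;      /* metadata blocks contiguous with the inode */
};

/* One read request described as a starting block and a length in bytes. */
struct read_req {
    uint64_t start_block;
    uint32_t length;
};

/*
 * Build a single read that covers the inode block plus the contiguous
 * metadata blocks announced by the directory entry. With rahead == 0
 * this degenerates to reading only the inode block.
 */
static struct read_req plan_inode_read(const struct dirent_sketch *de)
{
    struct read_req req;

    req.start_block = de->inode_block;
    req.length = (1u + de->rahead) * SKETCH_BLOCK_SIZE;
    return req;
}
```

With one xattr block recorded in the entry, the planned read spans two blocks starting at the inode's address.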
  4. 06 Feb, 2014 2 commits
    • GFS2: Lock i_mutex and use a local gfs2_holder for fallocate · a0846a53
      Bob Peterson committed
      This patch causes GFS2 to lock the i_mutex during fallocate. It
      also switches from using a dinode's inode glock to using a local
      holder like the other GFS2 i_operations.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    • GFS2: journal data writepages update · 774016b2
      Steven Whitehouse committed
      GFS2 has carried what is more or less a copy of
      write_cache_pages() for some time. It seems that this
      copy has slipped behind the core code over time. This
      patch brings it back up to date, and in addition adds the
      tracepoint which would otherwise be missing.
      
      We could go further, and eliminate some or all of the
      code duplication here. The issue is that if we do that,
      then the function we need to split out from the existing
      write_cache_pages(), which will look a lot like
      gfs2_jdata_write_pagevec(), would land up putting quite a
      lot of extra variables on the stack. I know that has been
      a problem in the past in the writeback code path, which
      is why I've hesitated to do it here.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
  5. 04 Feb, 2014 1 commit
    • GFS2: Allocate block for xattr at inode alloc time, if required · b2c8b3ea
      Steven Whitehouse committed
      This is another step towards improving the allocation of xattr
      blocks at inode allocation time. Here we take advantage of
      Christoph's recent work on ACLs to allocate a block for the
      xattrs early if we know that we will be adding ACLs to the
      inode later on. The advantage of that is that it is much
      more likely that we'll get a contiguous run of two blocks
      where the first is the inode and the second is the xattr block.
      
      We still have to fall back to the original system in case we
      don't get the requested two contiguous blocks, or in case the
      ACLs are too large to fit into the block.
      
      Future patches will move more of the ACL setting code further
      up the gfs2_inode_create() function. Also, I'd like to be
      able to do the same thing with the xattrs from LSMs in
      due course, too. That way we should be able to slowly reduce
      the number of independent transactions, at least in the
      most common cases.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
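The "ask for two contiguous blocks, fall back to the original system" idea can be sketched with a toy free map. This is only an illustration under assumed names (`alloc_result`, `alloc_inode_blocks()`, a boolean free map); GFS2's real allocator works on resource groups and is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Result of an allocation attempt in this sketch. */
struct alloc_result {
    long inode_block; /* -1 if nothing could be allocated */
    long xattr_block; /* -1 if the xattr block must be allocated later */
};

/*
 * Try to claim two adjacent free blocks (inode followed by xattr block).
 * If no such pair exists, fall back to a single block for the inode.
 * free_map[i] is true when block i is free; claimed blocks are marked used.
 */
static struct alloc_result alloc_inode_blocks(bool *free_map, size_t nblocks,
                                              bool want_xattr)
{
    struct alloc_result res = { -1, -1 };
    size_t i;

    if (want_xattr) {
        for (i = 0; i + 1 < nblocks; i++) {
            if (free_map[i] && free_map[i + 1]) {
                free_map[i] = free_map[i + 1] = false;
                res.inode_block = (long)i;
                res.xattr_block = (long)(i + 1);
                return res;
            }
        }
    }
    /* Fallback: a single block; the xattr block is handled separately. */
    for (i = 0; i < nblocks; i++) {
        if (free_map[i]) {
            free_map[i] = false;
            res.inode_block = (long)i;
            return res;
        }
    }
    return res;
}
```

When the map is fragmented, the caller sees `xattr_block == -1` and would take the older, separate allocation path.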
  6. 03 Feb, 2014 3 commits
    • GFS2: Plug on AIL flush · 885bceca
      Steven Whitehouse committed
      When we flush the AIL list, we are writing out what is
      likely to be a lot of small I/Os, possibly in an order
      which is not ideal performance-wise. Since this is done by calling
      filemap_fdatawrite() for each individual inode's address space, there
      is no overall plugging going on.
      
      In addition to that, we do not always wait for AIL I/O when we flush
      it, so it is possible for things to get left behind on the queue.
      By adding explicit plugging here, we reduce the chances of this
      being an issue. A quick test using the AIL flush tracepoint shows a
      small but measurable improvement.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
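The effect of plugging can be modelled in userspace: requests are collected while the plug is held and are sorted and merged only when it is released. In the kernel this is done with blk_start_plug() and blk_finish_plug(); the `plug_sketch` type and the `plug_*` helpers below are hypothetical stand-ins, not the kernel API and not code from the patch.

```c
#include <assert.h>
#include <stdlib.h>

#define PLUG_MAX 64

/* Hypothetical userspace stand-in for a plugged request queue. */
struct plug_sketch {
    unsigned long blocks[PLUG_MAX];
    size_t count;
};

static void plug_start(struct plug_sketch *p) { p->count = 0; }

/* Queue a request instead of submitting it at once. Returns 0 on success. */
static int plug_add(struct plug_sketch *p, unsigned long block)
{
    if (p->count == PLUG_MAX)
        return -1;
    p->blocks[p->count++] = block;
    return 0;
}

static int cmp_block(const void *a, const void *b)
{
    unsigned long x = *(const unsigned long *)a;
    unsigned long y = *(const unsigned long *)b;
    return (x > y) - (x < y);
}

/*
 * "Unplug": sort the batch by block number, then count how many
 * submissions remain once adjacent blocks are merged into one I/O.
 */
static size_t plug_finish(struct plug_sketch *p)
{
    size_t i, ios = 0;

    qsort(p->blocks, p->count, sizeof(p->blocks[0]), cmp_block);
    for (i = 0; i < p->count; i++)
        if (i == 0 || p->blocks[i] != p->blocks[i - 1] + 1)
            ios++;
    p->count = 0;
    return ios;
}
```

Four small writes to blocks 12, 10, 11 and 30 leave the plug as two I/Os in this model, which is the kind of batching the explicit plug is meant to enable.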
    • hpfs: optimize quad buffer loading · 1c0b8a7a
      Mikulas Patocka committed
      HPFS needs to load 4 consecutive 512-byte sectors when accessing the
      directory nodes or bitmaps.  We can't switch to 2048-byte block size
      because files are allocated in the units of 512-byte sectors.
      
      Previously, the driver would allocate a 2048-byte area using kmalloc,
      copy the data from four buffers to this area and eventually copy them
      back if they were modified.
      
      In the current implementation of the buffer cache, buffers are allocated
      in the pagecache.  That means that 4 consecutive 512-byte buffers are
      stored in consecutive areas in the kernel address space.  So, we don't
      need to allocate extra memory and copy the content of the buffers there.
      
      This patch optimizes the code to avoid copying the buffers.  It checks
      if the four buffers are stored in contiguous memory - if they are not,
      it falls back to allocating a 2048-byte area and copying data there.
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
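The contiguity check with a copying fallback can be sketched in plain C. The names `quad_view`, `quad_map()` and `quad_unmap()` are invented for this example and malloc() stands in for kmalloc(); the driver's own functions and data structures are not shown in the commit message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SECTOR_SIZE 512
#define QUAD_SIZE (4 * SECTOR_SIZE)

/* Hypothetical handle for four sectors viewed as one 2048-byte area. */
struct quad_view {
    unsigned char *data; /* contiguous 2048-byte view */
    int copied;          /* 1 if data is a malloc'ed copy */
};

/*
 * Return a contiguous view of four 512-byte buffers. If they already sit
 * back to back in memory, point at the first one; otherwise fall back to
 * allocating 2048 bytes and copying. Returns 0 on success, -1 on failure.
 */
static int quad_map(unsigned char *buf[4], struct quad_view *qv)
{
    int i;

    if (buf[1] == buf[0] + SECTOR_SIZE &&
        buf[2] == buf[1] + SECTOR_SIZE &&
        buf[3] == buf[2] + SECTOR_SIZE) {
        qv->data = buf[0];
        qv->copied = 0;
        return 0;
    }
    qv->data = malloc(QUAD_SIZE);
    if (!qv->data)
        return -1;
    for (i = 0; i < 4; i++)
        memcpy(qv->data + i * SECTOR_SIZE, buf[i], SECTOR_SIZE);
    qv->copied = 1;
    return 0;
}

/* Release the view; copy modified data back only if a copy was made. */
static void quad_unmap(unsigned char *buf[4], struct quad_view *qv, int dirty)
{
    int i;

    if (!qv->copied)
        return;
    if (dirty)
        for (i = 0; i < 4; i++)
            memcpy(buf[i], qv->data + i * SECTOR_SIZE, SECTOR_SIZE);
    free(qv->data);
}
```

In the common contiguous case no memory is allocated and nothing is copied in either direction.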
    • hpfs: remember free space · 2cbe5c76
      Mikulas Patocka committed
      Previously, hpfs scanned all bitmaps each time the user asked for free
      space using statfs.  This patch changes it so that hpfs scans the
      bitmaps only once, remembers the free space, and on the next invocation
      of statfs returns the value instantly.
      
      New versions of wine are hammering on the statfs syscall very heavily,
      making some games unplayable when they're stored on hpfs, with load
      times in minutes.
      
      This should be backported to the stable kernels because it fixes a
      user-visible problem (excessive level load times in wine).
      Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
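A minimal sketch of the "scan once, then remember" pattern. The structure `fs_sketch`, the sentinel `FREE_UNKNOWN` and all function names are assumptions for illustration. Keeping the remembered value in step on allocation is also an assumption about how such a cache would stay correct, not a description of the patch.

```c
#include <assert.h>
#include <stddef.h>

#define FREE_UNKNOWN ((unsigned long)-1) /* sentinel: not counted yet */

/* Hypothetical per-filesystem state for this sketch. */
struct fs_sketch {
    const unsigned char *bitmap; /* one bit per sector, 1 = free */
    size_t nbits;
    unsigned long cached_free;   /* FREE_UNKNOWN until the first scan */
    unsigned long scans;         /* how many full scans were needed */
};

/* Expensive path: walk the whole bitmap and count free sectors. */
static unsigned long scan_free(struct fs_sketch *fs)
{
    unsigned long n = 0;
    size_t i;

    for (i = 0; i < fs->nbits; i++)
        if (fs->bitmap[i / 8] & (1u << (i % 8)))
            n++;
    fs->scans++;
    return n;
}

/* statfs-like query: scan once, then answer from the remembered value. */
static unsigned long free_sectors(struct fs_sketch *fs)
{
    if (fs->cached_free == FREE_UNKNOWN)
        fs->cached_free = scan_free(fs);
    return fs->cached_free;
}

/* Assumed bookkeeping: adjust the remembered value when sectors are used. */
static void note_alloc(struct fs_sketch *fs, unsigned long n)
{
    if (fs->cached_free != FREE_UNKNOWN)
        fs->cached_free -= n;
}
```

Repeated queries cost one comparison after the first call, which is what makes a statfs-heavy workload cheap.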
  7. 02 Feb, 2014 1 commit
  8. 01 Feb, 2014 5 commits
  9. 31 Jan, 2014 5 commits
  10. 30 Jan, 2014 7 commits
  11. 29 Jan, 2014 13 commits
    • Btrfs: fix spin_unlock in check_ref_cleanup · cf93da7b
      Chris Mason committed
      Our goto out should have gone a little farther.
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: setup inode location during btrfs_init_inode_locked · 90d3e592
      Chris Mason committed
      We have a race during inode init because BTRFS_I(inode)->location is set up
      after the inode hash table lock is dropped.  btrfs_find_actor uses the location
      field, so our search might not find an existing inode in the hash table if we
      race with the inode init code.
      
      This commit sets up the location field sooner.  Also, the find actor now
      uses only the location objectid to match inodes.  For inode hashing, we just
      need a unique and stable test; it doesn't have to reflect the inode numbers we
      show to userland.
      Signed-off-by: Chris Mason <clm@fb.com>
      CC: stable@vger.kernel.org
    • Btrfs: don't use ram_bytes for uncompressed inline items · 514ac8ad
      Chris Mason committed
      If we truncate an uncompressed inline item, ram_bytes isn't updated to reflect
      the new size.  The fix uses the size directly from the item header when
      reading uncompressed inlines, and also fixes truncate to update the
      size as it goes.
      Reported-by: Jens Axboe <axboe@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
      CC: stable@vger.kernel.org
    • Btrfs: fix btrfs_search_slot_for_read backwards iteration · 23c6bf6a
      Filipe David Borba Manana committed
      If the current path's leaf slot is 0, we search for the previous
      leaf (via btrfs_prev_leaf) and set the new path's leaf slot to a
      value corresponding to the number of items - 1 of the former leaf.
      Fix this by using the slot set by btrfs_prev_leaf, decrementing it
      by 1 if it's equal to the leaf's number of items.
      
      btrfs_search_slot_for_read() is used for backward iteration in
      particular by the send feature, which could miss items when the input
      leaf has fewer items than its previous leaf.
      
      This could be reproduced by running btrfs/007 from xfstests in a loop.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Chris Mason <clm@fb.com>
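The slot adjustment described in this commit reduces to a small rule, sketched here with plain integers. Both helper names are hypothetical, and the "before" variant is a reading of the commit message rather than a copy of the old code.

```c
#include <assert.h>

/*
 * After stepping to the previous leaf, keep the slot the step chose;
 * only when it points one past the last item (slot == nritems) pull it
 * back by one so that it names the last valid item.
 */
static int clamp_slot_after_prev_leaf(int slot, int nritems)
{
    if (slot == nritems)
        slot--;
    return slot;
}

/*
 * The behaviour before the fix, as described in the message: the slot was
 * derived from the item count of the leaf we came from, not the leaf we
 * moved to. Kept for comparison only.
 */
static int slot_from_former_leaf(int former_leaf_nritems)
{
    return former_leaf_nritems - 1;
}
```

If the leaf we came from holds 3 items and the previous leaf holds 10, the old rule yields slot 2 and skips slots 3 to 9, while the fixed rule yields slot 9.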
    • Btrfs: do not export ulist functions · 49fc647a
      Wang Shilong committed
      Nothing other than Btrfs uses the ulist functions, so don't
      export them.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: rework ulist with list+rb_tree · 4c7a6f74
      Wang Shilong committed
      The current ulist implementation is causing real problems. Several
      developers have tried to improve it, and here are some of my ideas:
      
       1. use list+rb_tree instead of array+rb_tree
      
       2. add cur_list to the iterator rather than to the ulist structure.
      
       3. add a seqnum to every node when it is added; this is
       used for a self-check when iterating over nodes.
      
      I noticed Zach Brown's earlier comments: the long-term plan is to get rid of
      the ulist implementation. For now, however, we at least need to remove the
      array from ulist.
      
      Cc: Liu Bo <bo.li.liu@oracle.com>
      Cc: Josef Bacik <jbacik@fb.com>
      Cc: Zach Brown <zab@redhat.com>
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: fix memory leaks on walking backrefs failure · f05c4746
      Wang Shilong committed
      When walking backrefs, we may iterate over every inode's extents
      and add/merge them into the ulist, and the caller will free the memory
      from the ulist.
      
      However, if we fail to allocate memory for an inode's extent element,
      or if ulist_add() fails to allocate memory, the already allocated
      memory is not added to the ulist, so the caller never frees it
      and the memory leaks.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: fix send file hole detection leading to data corruption · bf54f412
      Filipe David Borba Manana committed
      There was a case where file hole detection was incorrect and it would
      cause an incremental send to overwrite a section of a file with zeroes.
      
      This happened when, between the last leaf we processed which
      contained a file extent item for our current inode and the leaf we
      are currently at (which also has a file extent item for our current inode),
      there are only leaves containing exclusively file extent items for our current
      inode, and none of them was updated since the previous send operation.
      The file hole detection code would incorrectly consider the file range
      covered by these leaves as a hole.
      
      A test case for xfstests will follow soon.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: add a reschedule point in btrfs_find_all_roots() · bca1a290
      Wang Shilong committed
      I can easily trigger the following warnings when enabling quota
      in my virtual machine (running openSUSE). The steps are to first create
      a subvolume full of fragmented extents, and then create many snapshots
      (500 in my test case).
      
      [ 2362.808459] BUG: soft lockup - CPU#0 stuck for 22s! [btrfs-qgroup-re:1970]
      
      [ 2362.809023] task: e4af8450 ti: e371c000 task.ti: e371c000
      [ 2362.809026] EIP: 0060:[<fa38f4ae>] EFLAGS: 00000246 CPU: 0
      [ 2362.809049] EIP is at __merge_refs+0x5e/0x100 [btrfs]
      [ 2362.809051] EAX: 00000000 EBX: cfadbcf0 ECX: 00000000 EDX: cfadbcb0
      [ 2362.809052] ESI: dd8d3370 EDI: e371dde0 EBP: e371dd6c ESP: e371dd5c
      [ 2362.809054]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      [ 2362.809055] CR0: 80050033 CR2: ac454d50 CR3: 009a9000 CR4: 001407d0
      [ 2362.809099] Stack:
      [ 2362.809100]  00000001 e371dde0 dfcc6890 f29f8000 e371de28 fa39016d 00000011 00000001
      [ 2362.809105]  99bfc000 00000000 93928000 00000000 00000001 00000050 e371dda8 00000001
      [ 2362.809109]  f3a31000 f3413000 00000001 e371ddb8 000040a8 00000202 00000000 00000023
      [ 2362.809113] Call Trace:
      [ 2362.809136]  [<fa39016d>] find_parent_nodes+0x34d/0x1280 [btrfs]
      [ 2362.809156]  [<fa391172>] btrfs_find_all_roots+0xb2/0x110 [btrfs]
      [ 2362.809174]  [<fa3934a8>] btrfs_qgroup_rescan_worker+0x358/0x7a0 [btrfs]
      [ 2362.809180]  [<c024d0ce>] ? lock_timer_base.isra.39+0x1e/0x40
      [ 2362.809199]  [<fa3648df>] worker_loop+0xff/0x470 [btrfs]
      [ 2362.809204]  [<c027a88a>] ? __wake_up_locked+0x1a/0x20
      [ 2362.809221]  [<fa3647e0>] ? btrfs_queue_worker+0x2b0/0x2b0 [btrfs]
      [ 2362.809225]  [<c025ebbc>] kthread+0x9c/0xb0
      [ 2362.809229]  [<c06b487b>] ret_from_kernel_thread+0x1b/0x30
      [ 2362.809233]  [<c025eb20>] ? kthread_create_on_node+0x110/0x110
      
      By adding a reschedule point at the end of btrfs_find_all_roots(), I no longer
      hit these warnings.
      
      Cc: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: make send's file extent item search more efficient · 7fdd29d0
      Filipe David Borba Manana committed
      Instead of looking for a file extent item, processing it, releasing the path
      and doing a btree search for the next file extent item, just process all
      file extent items in a leaf without intermediate btree searches. This way
      we save CPU and we're not blocking other tasks or affecting concurrency on
      the btree, because send's paths use the commit root and skip btree node/leaf
      locking.
      Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • Btrfs: fix to catch all errors when resolving indirect ref · 95def2ed
      Wang Shilong committed
      We can only tolerate ENOENT here; for other errors, we should
      return directly.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
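The error-handling rule stated here (skip on ENOENT, return on anything else) can be sketched as a loop over refs. The callback type `resolve_fn`, the helper `resolve_all()` and the toy resolver are hypothetical; the real resolver works on btrfs backref structures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical resolver callback: 0 on success, negative errno on failure. */
typedef int (*resolve_fn)(int ref);

/*
 * Resolve every ref. A missing target (-ENOENT) is tolerated and skipped;
 * any other error stops the loop and is returned to the caller.
 */
static int resolve_all(const int *refs, size_t n, resolve_fn resolve,
                       size_t *resolved)
{
    size_t i;

    *resolved = 0;
    for (i = 0; i < n; i++) {
        int err = resolve(refs[i]);

        if (err == -ENOENT)
            continue;
        if (err)
            return err;
        (*resolved)++;
    }
    return 0;
}

/* Toy resolver: ref 2 is missing, ref 3 hits an allocation failure. */
static int demo_resolve(int ref)
{
    if (ref == 2)
        return -ENOENT;
    if (ref == 3)
        return -ENOMEM;
    return 0;
}
```

A missing ref only reduces the resolved count, whereas an allocation failure is propagated immediately instead of being silently ignored.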
    • Btrfs: fix protection between walking backrefs and root deletion · 538f72cd
      Wang Shilong committed
      There is a race condition between resolving an indirect ref and root deletion.
      We should guarantee that the root cannot be destroyed, to avoid accessing
      a broken tree here.
      
      Here we fix it by holding @subvol_srcu, and we release it as soon
      as we hold the root node lock.
      Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>
    • btrfs: fix warning while merging two adjacent extents · 3c9665df
      Gui Hecheng committed
      When we have two adjacent extents in relink_extent_backref,
      we try to merge them. When we use btrfs_search_slot to locate the
      slot for the current extent, we shouldn't set "ins_len = 1",
      because we will merge it into the previous extent rather than
      insert a new item. Otherwise, we may happen to create a new leaf
      in btrfs_search_slot and path->slots[0] will be 0. Then we try to
      fetch the previous item using "path->slots[0]--", and it will cause
      a warning as follows:
      
      	[  145.713385] WARNING: CPU: 3 PID: 1796 at fs/btrfs/extent_io.c:5043 map_private_extent_buffer+0xd4/0xe0
      	[  145.713387] btrfs bad mapping eb start 53370886 len 4096, wanted 167772306 8
      	...
      	[  145.713462]  [<ffffffffa034b1f4>] map_private_extent_buffer+0xd4/0xe0
      	[  145.713476]  [<ffffffffa030097a>] ? btrfs_free_path+0x2a/0x40
      	[  145.713485]  [<ffffffffa0340864>] btrfs_get_token_64+0x64/0xf0
      	[  145.713498]  [<ffffffffa033472c>] relink_extent_backref+0x41c/0x820
      	[  145.713508]  [<ffffffffa0334d69>] btrfs_finish_ordered_io+0x239/0xa80
      
      I encountered this warning when running defrag on a filesystem created by
      mkfs.btrfs with option -M, while reads/writes and snapshots
      were running in the background.
      Signed-off-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fb.com>
      Signed-off-by: Chris Mason <clm@fb.com>