1. 09 Oct 2012, 2 commits
  2. 04 Oct 2012, 1 commit
  3. 02 Oct 2012, 15 commits
    • M
      Btrfs: fix unnecessary warning when the fragments make the space alloc fail · 962197ba
      Miao Xie committed
      When we write data in compressed mode into a btrfs filesystem whose free
      space is badly fragmented, the kernel reports:
      	BTRFS warning (device xxx): Aborting unused transaction.

      The reason is:
      We cannot find a contiguous free-space region large enough to hold the
      compressed data; the free space is fragmented and the compressed data
      cannot be split, so the kernel prints the message above.

      In fact, btrfs handles this case well: it falls back to uncompressed IO,
      splits the uncompressed data into smaller pieces, and stores them in the
      fragmented free space. So we shouldn't print the warning at all.
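      A rough sketch of the control flow involved (the helper names below are
      illustrative only, not the actual btrfs functions touched by the patch):

      	/* Illustrative sketch: if no single free extent is large enough
      	 * for the compressed data, quietly fall back to uncompressed COW,
      	 * which may split the range across several smaller extents. */
      	ret = write_range_compressed(inode, start, end);
      	if (ret == -ENOSPC)
      		ret = write_range_uncompressed(inode, start, end);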
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      962197ba
    • J
      Btrfs: create a pinned em when writing to a prealloc range in DIO · 69ffb543
      Josef Bacik committed
      Wade Cline reported a problem where he was getting garbage and warnings when
      writing to a preallocated range via O_DIRECT.  This is because we weren't
      creating our normal pinned extent_map for the range we were writing to,
      which was causing all sorts of issues.  This patch fixes the problem and
      makes his testcase much happier.  Thanks,
      Reported-by: Wade Cline <clinew@linux.vnet.ibm.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      69ffb543
    • M
      Btrfs: fix corrupted metadata in the snapshot · 8407aa46
      Miao Xie committed
      When we delete an inode, we remove all of its delayed items, including the
      delayed inode update, and then truncate all the related metadata. If there
      is a lot of metadata, we end the current transaction and start a new one to
      truncate the remaining metadata. This leaves an inode item whose link count
      is > 0, and possibly some directory index items, in the fs/file tree after
      the current transaction ends. In other words, the metadata in this fs/file
      tree is inconsistent. If we create a snapshot of this tree now, the new
      snapshot contains an inode with corrupted metadata, and we won't continue
      dropping the remaining metadata because its link count is not 0.
      
      We fix this problem by updating the inode item before the current transaction ends.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      8407aa46
    • D
      btrfs: polish names of kmem caches · 837e1972
      David Sterba committed
      Use case:
      
        watch 'grep btrfs < /proc/slabinfo'
      
      Consistent names make it easy to watch all btrfs caches in one go.
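      For illustration, a slab cache created with a consistent "btrfs_" prefix
      (the cache shown is real, but treat the exact flags as an assumption):

      	/* A uniformly prefixed name shows up next to the other btrfs
      	 * caches when grepping /proc/slabinfo. */
      	btrfs_inode_cachep = kmem_cache_create("btrfs_inode",
      			sizeof(struct btrfs_inode), 0,
      			SLAB_RECLAIM_ACCOUNT | SLAB_MEM_SPREAD,
      			init_once);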
      Signed-off-by: David Sterba <dsterba@suse.cz>
      837e1972
    • L
      Btrfs: use flag EXTENT_DEFRAG for snapshot-aware defrag · 9e8a4a8b
      Liu Bo committed
      We use the flag EXTENT_DEFRAG to mark which ranges belong to a defragment
      operation, so that we can implement snapshot-aware defrag:

      We set the EXTENT_DEFRAG flag when dirtying the extents that need to be
      defragmented, so that later the writeback thread can differentiate between
      normal writeback and writeback started by defragmentation.
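      A sketch of how such a bit can be used (illustrative; the tagging helper
      name and the exact call sites are assumptions based on the description):

      	/* set_extent_defrag() here stands for whatever helper tags the
      	 * dirtied range in the inode's io_tree; treat the name as assumed. */
      	set_extent_defrag(&BTRFS_I(inode)->io_tree, page_start, page_end,
      			  &cached_state, GFP_NOFS);

      	/* Later, during writeback, the tag tells defrag IO apart. */
      	if (test_range_bit(&BTRFS_I(inode)->io_tree, start, end,
      			   EXTENT_DEFRAG, 1, NULL)) {
      		/* this writeback was started by defragmentation */
      	}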
      Original-Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      9e8a4a8b
    • M
      Btrfs: add a new "type" field into the block reservation structure · 66d8f3dd
      Miao Xie committed
      Sometimes we need to choose the reservation method according to the type of
      the block reservation, such as the reservation for the delayed inode update.
      At the moment we identify the type only by comparing the address of the
      reservation against the well-known reservation variables, which is very ugly
      for a temporary reservation because we have to compare it with all of the
      common reservation variables. So we add a new "type" field that records the
      type of the reservation.
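      A minimal sketch of the idea (the field and the enum values are
      illustrative; the real names in the patch may differ):

      	/* A type tag carried in the reservation itself replaces pointer
      	 * comparisons against every global reservation variable. */
      	struct btrfs_block_rsv {
      		u64 size;
      		u64 reserved;
      		unsigned short type;	/* e.g. BLOCK_RSV_GLOBAL,
      					 * BLOCK_RSV_DELALLOC, BLOCK_RSV_TEMP */
      	};

      	/* before: if (rsv == &fs_info->delalloc_block_rsv) ... */
      	if (rsv->type == BLOCK_RSV_DELALLOC)
      		use_delalloc_reservation_policy(rsv);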
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      66d8f3dd
    • S
      Btrfs: do not take cleanup_work_sem in btrfs_run_delayed_iputs() · ac14aed6
      Sage Weil committed
      Josef has suggested that this is not necessary.  Removing it also avoids
      this lockdep splat (after the new sb_internal locking stuff was added):
      
      [  604.090449] ======================================================
      [  604.114819] [ INFO: possible circular locking dependency detected ]
      [  604.139262] 3.6.0-rc2-ceph-00144-g463b030 #1 Not tainted
      [  604.162193] -------------------------------------------------------
      [  604.186139] btrfs-cleaner/6669 is trying to acquire lock:
      [  604.209555]  (sb_internal#2){.+.+..}, at: [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
      [  604.257100]
      [  604.257100] but task is already holding lock:
      [  604.300366]  (&fs_info->cleanup_work_sem){.+.+..}, at: [<ffffffffa0048002>] btrfs_run_delayed_iputs+0x72/0x130 [btrfs]
      [  604.352989]
      [  604.352989] which lock already depends on the new lock.
      [  604.352989]
      [  604.427104]
      [  604.427104] the existing dependency chain (in reverse order) is:
      [  604.478493]
      [  604.478493] -> #1 (&fs_info->cleanup_work_sem){.+.+..}:
      [  604.529313]        [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
      [  604.559621]        [<ffffffff81632b69>] down_read+0x39/0x4e
      [  604.589382]        [<ffffffffa004db98>] btrfs_lookup_dentry+0x218/0x550 [btrfs]
      [  604.596161] btrfs: unlinked 1 orphans
      [  604.675002]        [<ffffffffa006aadd>] create_subvol+0x62d/0x690 [btrfs]
      [  604.708859]        [<ffffffffa006d666>] btrfs_mksubvol.isra.52+0x346/0x3a0 [btrfs]
      [  604.772466]        [<ffffffffa006d7f2>] btrfs_ioctl_snap_create_transid+0x132/0x190 [btrfs]
      [  604.842245]        [<ffffffffa006d8ae>] btrfs_ioctl_snap_create+0x5e/0x80 [btrfs]
      [  604.912852]        [<ffffffffa00708ae>] btrfs_ioctl+0x138e/0x1990 [btrfs]
      [  604.951888]        [<ffffffff8118e9b8>] do_vfs_ioctl+0x98/0x560
      [  604.989961]        [<ffffffff8118ef11>] sys_ioctl+0x91/0xa0
      [  605.026628]        [<ffffffff8163d569>] system_call_fastpath+0x16/0x1b
      [  605.064404]
      [  605.064404] -> #0 (sb_internal#2){.+.+..}:
      [  605.126832]        [<ffffffff810b25e8>] __lock_acquire+0x1ac8/0x1b90
      [  605.163671]        [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
      [  605.200228]        [<ffffffff8117dac6>] __sb_start_write+0xc6/0x1b0
      [  605.236818]        [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
      [  605.274029]        [<ffffffffa00431a3>] btrfs_start_transaction+0x13/0x20 [btrfs]
      [  605.340520]        [<ffffffffa004ccfa>] btrfs_evict_inode+0x19a/0x330 [btrfs]
      [  605.378720]        [<ffffffff811972c8>] evict+0xb8/0x1c0
      [  605.416057]        [<ffffffff811974d5>] iput+0x105/0x210
      [  605.452373]        [<ffffffffa0048082>] btrfs_run_delayed_iputs+0xf2/0x130 [btrfs]
      [  605.521627]        [<ffffffffa003b5e1>] cleaner_kthread+0xa1/0x120 [btrfs]
      [  605.560520]        [<ffffffff810791ee>] kthread+0xae/0xc0
      [  605.598094]        [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
      [  605.636499]
      [  605.636499] other info that might help us debug this:
      [  605.636499]
      [  605.736504]  Possible unsafe locking scenario:
      [  605.736504]
      [  605.801931]        CPU0                    CPU1
      [  605.835126]        ----                    ----
      [  605.867093]   lock(&fs_info->cleanup_work_sem);
      [  605.898594]                                lock(sb_internal#2);
      [  605.931954]                                lock(&fs_info->cleanup_work_sem);
      [  605.965359]   lock(sb_internal#2);
      [  605.994758]
      [  605.994758]  *** DEADLOCK ***
      [  605.994758]
      [  606.075281] 2 locks held by btrfs-cleaner/6669:
      [  606.104528]  #0:  (&fs_info->cleaner_mutex){+.+...}, at: [<ffffffffa003b5d5>] cleaner_kthread+0x95/0x120 [btrfs]
      [  606.165626]  #1:  (&fs_info->cleanup_work_sem){.+.+..}, at: [<ffffffffa0048002>] btrfs_run_delayed_iputs+0x72/0x130 [btrfs]
      [  606.231297]
      [  606.231297] stack backtrace:
      [  606.287723] Pid: 6669, comm: btrfs-cleaner Not tainted 3.6.0-rc2-ceph-00144-g463b030 #1
      [  606.347823] Call Trace:
      [  606.376184]  [<ffffffff8162a77c>] print_circular_bug+0x1fb/0x20c
      [  606.409243]  [<ffffffff810b25e8>] __lock_acquire+0x1ac8/0x1b90
      [  606.441343]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
      [  606.474583]  [<ffffffff810b2c82>] lock_acquire+0xa2/0x140
      [  606.505934]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
      [  606.539429]  [<ffffffff8132babd>] ? do_raw_spin_unlock+0x5d/0xb0
      [  606.571719]  [<ffffffff8117dac6>] __sb_start_write+0xc6/0x1b0
      [  606.603498]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
      [  606.637405]  [<ffffffffa0042b84>] ? start_transaction+0x124/0x430 [btrfs]
      [  606.670165]  [<ffffffff81172e75>] ? kmem_cache_alloc+0xb5/0x160
      [  606.702144]  [<ffffffffa0042b84>] start_transaction+0x124/0x430 [btrfs]
      [  606.735562]  [<ffffffffa00256a6>] ? block_rsv_add_bytes+0x56/0x80 [btrfs]
      [  606.769861]  [<ffffffffa00431a3>] btrfs_start_transaction+0x13/0x20 [btrfs]
      [  606.804575]  [<ffffffffa004ccfa>] btrfs_evict_inode+0x19a/0x330 [btrfs]
      [  606.838756]  [<ffffffff81634c6b>] ? _raw_spin_unlock+0x2b/0x40
      [  606.872010]  [<ffffffff811972c8>] evict+0xb8/0x1c0
      [  606.903800]  [<ffffffff811974d5>] iput+0x105/0x210
      [  606.935416]  [<ffffffffa0048082>] btrfs_run_delayed_iputs+0xf2/0x130 [btrfs]
      [  606.970510]  [<ffffffffa003b5d5>] ? cleaner_kthread+0x95/0x120 [btrfs]
      [  607.005648]  [<ffffffffa003b5e1>] cleaner_kthread+0xa1/0x120 [btrfs]
      [  607.040724]  [<ffffffffa003b540>] ? btrfs_destroy_delayed_refs.isra.102+0x220/0x220 [btrfs]
      [  607.104740]  [<ffffffff810791ee>] kthread+0xae/0xc0
      [  607.137119]  [<ffffffff810b379d>] ? trace_hardirqs_on+0xd/0x10
      [  607.169797]  [<ffffffff8163e744>] kernel_thread_helper+0x4/0x10
      [  607.202472]  [<ffffffff81635430>] ? retint_restore_args+0x13/0x13
      [  607.235884]  [<ffffffff81079140>] ? flush_kthread_work+0x1a0/0x1a0
      [  607.268731]  [<ffffffff8163e740>] ? gs_change+0x13/0x13
      Signed-off-by: Sage Weil <sage@inktank.com>
      ac14aed6
    • J
      Btrfs: add hole punching · 2aaa6655
      Josef Bacik committed
      This patch adds hole punching via fallocate.  Thanks,
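      From user space the new capability is exercised through fallocate(2); a
      minimal example (assumes fd is an open, writable file descriptor):

      	#define _GNU_SOURCE
      	#include <fcntl.h>

      	/* Punch a 1 MiB hole at offset 4 MiB.  FALLOC_FL_PUNCH_HOLE must be
      	 * combined with FALLOC_FL_KEEP_SIZE, so the file length is unchanged
      	 * and the deallocated range reads back as zeroes. */
      	int ret = fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
      			    4 * 1024 * 1024, 1 * 1024 * 1024);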
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      2aaa6655
    • J
      Btrfs: remove unused hint byte argument for btrfs_drop_extents · 2671485d
      Josef Bacik committed
      I audited all users of btrfs_drop_extents and found that nobody actually uses
      the hint_byte argument.  I'm sure it was used for something at some point but
      it's not used now, and the way the pinning works the disk bytenr would never
      be immediately useful anyway, so let's just remove it.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      2671485d
    • L
      Btrfs: fix a bug in checking whether a inode is already in log · 46d8bc34
      Liu Bo committed
      This is based on Josef's "Btrfs: turbo charge fsync".

      The current code checks whether an inode is in the log by comparing the
      root's last_log_commit to the inode's last_sub_trans[2].

      The problem is that this root->last_log_commit is shared among inodes.

      Say we have N inodes to be logged: after the first inode, the root's
      last_log_commit is updated and the remaining N-1 inodes are skipped.

      This fixes the bug by keeping a local copy of the root's last_log_commit
      inside each inode; this local copy is maintained independently.
      
      [1]: we regard each log transaction as a subset of btrfs's transaction,
      i.e. sub_trans
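      A sketch of the corrected check (simplified; the field names follow the
      description above, and the real helper in the patch may differ slightly):

      	/* Compare against the per-inode copy of last_log_commit instead of
      	 * the shared root value, so logging one inode no longer makes every
      	 * other inode look as if it were already in the log. */
      	static inline int inode_in_log(struct btrfs_inode *inode, u64 generation)
      	{
      		return inode->logged_trans == generation &&
      		       inode->last_sub_trans <= inode->last_log_commit;
      	}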
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      46d8bc34
    • M
      Btrfs: fix wrong orphan count of the fs/file tree · 321f0e70
      Miao Xie committed
      If we add a new orphan item, we should increase the atomic counter,
      not decrease it. Fix it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      321f0e70
    • L
      Btrfs: improve fsync by filtering extents that we want · 4e2f84e6
      Liu Bo committed
      This is based on Josef's "Btrfs: turbo charge fsync".

      Josef's patch above performs very well in the random sync write test,
      because we won't have too many extents to merge.

      However, it does not perform well on this test:
      dd if=/dev/zero of=foobar bs=4k count=12500 oflag=sync

      The reason is that when we do sequential sync writes, we merge the current
      extent with just the previous one, so we accumulate ever larger extents to
      log:

      A(4k) --> AA(8k) --> AAA(12k) --> AAAA(16k) ...

      So we have to flush more and more checksums into the log tree, which is the
      bottleneck according to my tests.

      But we can avoid this by telling fsync exactly which extents need to be
      logged.
      
      With this, I did the above dd sync write test (size=50m),
      
               w/o (orig)   w/ (josef's)   w/ (this)
      SATA      104KB/s       109KB/s       121KB/s
      ramdisk   1.5MB/s       1.5MB/s       10.7MB/s (613%)
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      4e2f84e6
    • J
      Btrfs: do not needlessly restart the transaction for enospc · ca7e70f5
      Josef Bacik committed
      We will stop and restart a transaction every time we move to a different leaf
      when truncating a file.  This is for enospc reasons, but really we could
      probably get away with doing this a little better by actually working until we
      hit an ENOSPC.  So add a ->failfast flag to the block_rsv and set it when we do
      truncates which will fail as soon as the block rsv runs out of space, and then
      at that point we can stop and restart the transaction and refill the block rsv
      and carry on.  This will make rm'ing of a file with lots of extents a bit
      faster.  Thanks,
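      A simplified sketch of the resulting truncate loop (control flow only;
      reservation refilling and error handling omitted):

      	rsv->failfast = 1;	/* reservation failures return -ENOSPC at once */
      	while (1) {
      		ret = btrfs_truncate_inode_items(trans, root, inode,
      						 inode->i_size,
      						 BTRFS_EXTENT_DATA_KEY);
      		if (ret != -ENOSPC)
      			break;	/* finished, or hit a real error */
      		/* Out of reserved space: restart the transaction, refill the
      		 * block reservation, and continue where we left off. */
      		btrfs_end_transaction(trans, root);
      		trans = btrfs_start_transaction(root, 2);
      	}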
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      ca7e70f5
    • J
      Btrfs: turbo charge fsync · 5dc562c5
      Josef Bacik committed
      At least for the vm workload.  Currently on fsync we will
      
      1) Truncate all items in the log tree for the given inode if they exist
      
      and
      
      2) Copy all items for a given inode into the log
      
      The problem with this is that for things like VMs you can have lots of
      extents from the fragmented writing behavior, and worse yet you may have
      only modified a few extents, not the entire thing.  This patch fixes this
      problem by tracking which transid modified our extent, and then when we do
      the tree logging we find all of the extents we've modified in our current
      transaction, sort them and commit them.  We also only truncate up to the
      xattrs of the inode and copy that stuff in normally, and then just drop any
      extents in the range we have that exist in the log already.  Here are some
      numbers of a 50 meg fio job that does random writes and fsync()s after every
      write
      
      		Original	Patched
      SATA drive	82KB/s		140KB/s
      Fusion drive	431KB/s		2532KB/s
      
      So around 2-6 times faster depending on your hardware.  There are a few
      corner cases, for example if you truncate at all we have to do it the old
      way since there is no way to be sure what is in the log is ok.  This
      probably could be done smarter, but if you write-fsync-truncate-write-fsync
      you deserve what you get.  All this work is in RAM of course so if your
      inode gets evicted from cache and you read it in and fsync it we'll do it
      the slow way if we are still in the same transaction that we last modified
      the inode in.
      
      The biggest cool part of this is that it requires no changes to the recovery
      code, so if you fsync with this patch and crash and load an old kernel, it
      will run the recovery and be a-ok.  I have tested this pretty thoroughly
      with an fsync tester and everything comes back fine, as well as xfstests.
      Thanks,
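      Conceptually the change looks roughly like the sketch below (the list and
      field names are illustrative; see the patch for the real data structures):

      	/* Each extent map remembers the transaction that created it; on
      	 * fsync, walk only the extents modified in the current transaction
      	 * and log those, instead of copying every item for the inode. */
      	list_for_each_entry_safe(em, tmp, &tree->modified_extents, list) {
      		if (em->generation <= inode->last_log_commit)
      			continue;	/* already present in the log */
      		log_one_extent(trans, inode, em);
      	}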
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      5dc562c5
    • J
      Btrfs: update last trans if we don't update the inode · 7c735313
      Josef Bacik committed
      There is a completely impossible situation to hit where you can preallocate
      a file, fsync it, write into the preallocated region, have the transaction
      commit twice and then fsync and then immediately lose power and lose all of
      the contents of the write.  This patch fixes this just so I feel better
      about the situation and because it is lightweight, we just update the
      last_trans when we finish an ordered IO and we don't update the inode
      itself.  This way we are completely safe and I feel better.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      7c735313
  4. 29 Aug 2012, 6 commits
    • L
      Btrfs: fix ordered extent leak when failing to start a transaction · d280e5be
      Liu Bo committed
      We cannot just return an error before freeing the ordered extent and
      releasing the reserved space when we fail to start a transaction.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      d280e5be
    • L
      Btrfs: fix a dio write regression · 24c03fa5
      Liu Bo committed
      This bug was introduced by commit 3b8bde746f6f9bd36a9f05f5f3b6e334318176a9
      (Btrfs: lock extents as we map them in DIO).

      In a dio write, we should unlock the section we didn't do IO on in case we
      fall back to buffered write.  But we need to not only unlock the section
      but also clean up the reserved space for it.

      This bug was found while running xfstests 133; with this fix, 133 no longer
      complains.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
      24c03fa5
    • J
      Btrfs: fix enospc problems when deleting a subvol · 5a24e84c
      Josef Bacik committed
      Subvol delete is a special kind of awful where we use the global reserve to
      cover the ENOSPC requirements.  The problem is that once we're done removing
      everything we do a btrfs_update_inode(), which by default will try to do the
      delayed update, and that uses its own reserve.  There will be no space in
      this reserve and we'll return ENOSPC.  So instead use
      btrfs_update_inode_fallback(), which just falls back to updating the inode
      item in the case of enospc.  This is fine because the global reserve covers
      the space requirements for this.  With this patch I can now delete a subvol
      on a problem image Dave Sterba sent me.  Thanks,
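      A sketch of the fallback idea (simplified; treat it as an illustration of
      the description above rather than the exact helper):

      	/* Try the normal (delayed) inode update first; if it cannot reserve
      	 * space, update the inode item directly, which the already-charged
      	 * global reserve can cover. */
      	ret = btrfs_update_inode(trans, root, inode);
      	if (ret == -ENOSPC)
      		ret = btrfs_update_inode_item(trans, root, inode);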
      Reported-by: David Sterba <dave@jikos.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      5a24e84c
    • J
      Btrfs: barrier before waitqueue_active · 66657b31
      Josef Bacik committed
      We need a barrier before calling waitqueue_active(), otherwise we will miss
      wakeups.  So in places that do atomic_dec(); then atomic_read(), use
      atomic_dec_return(), which implies a memory barrier (see memory-barriers.txt),
      and add an explicit memory barrier everywhere else that needs one.
      Thanks,
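      The pattern looks roughly like this (a sketch of the general idiom, not a
      specific hunk from the patch):

      	/* waitqueue_active() is an unlocked check; without a barrier the
      	 * waker's update can be reordered against the waiter's
      	 * prepare_to_wait()/re-check sequence and the wakeup can be lost. */
      	if (atomic_dec_return(&counter) == 0) {	/* implies a full barrier */
      		if (waitqueue_active(&wq))
      			wake_up(&wq);
      	}

      	/* where the update has no implicit barrier, add one explicitly: */
      	writers--;
      	smp_mb();	/* pairs with the barrier in prepare_to_wait() */
      	if (waitqueue_active(&wq))
      		wake_up(&wq);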
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      66657b31
    • J
      Btrfs: don't allocate a seperate csums array for direct reads · c329861d
      Josef Bacik committed
      We've been allocating a big array for csums instead of storing them in the
      io_tree, as we do for buffered reads, because previously we were locking the
      entire range and so didn't have an extent state for each sector of the
      range.  But now that we do the range locking as we map the buffers, we can
      limit the mapping length to sectorsize and use the private part of the
      io_tree for our csums.  This allows us to avoid an extra memory allocation
      for direct reads, which could incur latency.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      c329861d
    • J
      Btrfs: lock extents as we map them in DIO · eb838e73
      Josef Bacik committed
      A deadlock in xfstests 113 was uncovered by commit

      d187663e

      This is because we would not return EIOCBQUEUED for short AIO reads; instead
      we'd wait for the DIO to complete and then return the amount of data we
      transferred, which would allow our code to unlock the remaining amount.  But
      with this change that no longer happens, so if we have a short AIO read (for
      example if we try to read past EOF), we could leave the section from EOF to
      the end of where we tried to read locked.  Fixing this is tricky since there
      is no clear way to know exactly how much data DIO truly submitted for IO, so
      to make this less hard on ourselves and less cumbersome we need to lock the
      extents as we try to map them, and then unlock any areas we didn't actually
      map.  This makes us completely safe from deadlocks and reliance on a
      particular behavior of the DIO code.  This also lays the groundwork for
      allowing us to use the normal csum storage method for reads, which means we
      can remove an allocation.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      eb838e73
  5. 04 Aug 2012, 1 commit
  6. 31 Jul 2012, 1 commit
    • J
      btrfs: Convert to new freezing mechanism · b2b5ef5c
      Jan Kara committed
      We convert btrfs_file_aio_write() to use the new freeze check.  We also add
      proper freeze protection to btrfs_page_mkwrite(), and we add freeze
      protection to the transaction mechanism to avoid starting transactions on a
      frozen filesystem.  At a minimum this is necessary to stop iput() of an
      unlinked file from changing a frozen filesystem during truncation.

      The checks in cleaner_kthread() and transaction_kthread() can be safely
      removed, since btrfs_freeze() will lock the mutexes and thus block those
      threads (and they shouldn't have anything to do anyway).
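      The freeze-protection pairs look roughly like this (a simplified sketch of
      the VFS helpers involved; exact placement follows the description above):

      	/* A transaction counts as an internal writer, so starting one on a
      	 * frozen filesystem blocks until the filesystem is thawed. */
      	sb_start_intwrite(root->fs_info->sb);	/* when starting a transaction */
      	/* ... transactional work ... */
      	sb_end_intwrite(root->fs_info->sb);	/* when the transaction ends */

      	/* Page faults that dirty pages take the page-fault level instead,
      	 * e.g. in btrfs_page_mkwrite(): */
      	sb_start_pagefault(inode->i_sb);
      	/* ... dirty the page ... */
      	sb_end_pagefault(inode->i_sb);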
      
      CC: linux-btrfs@vger.kernel.org
      CC: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      b2b5ef5c
  7. 26 Jul 2012, 1 commit
    • A
      Btrfs: introduce subvol uuids and times · 8ea05e3a
      Alexander Block committed
      This patch introduces uuids for subvolumes. Each subvolume has its own
      uuid. If it was snapshotted, it also contains parent_uuid. If it was
      received, it also contains received_uuid.

      It also introduces the subvolume times ctime/otime/stime/rtime. The first
      two are comparable to the times found in inodes: otime is the
      origin/creation time and ctime is the change time. stime/rtime are only
      valid on received subvolumes: stime is the time of the subvolume when it
      was sent, and rtime is the time of the subvolume when it was received.

      In addition to the times, we have a transid for each time. They are updated
      at the same places as the times.

      btrfs receive uses stransid and rtransid to find out whether a received
      subvolume changed in the meantime.

      If an older kernel mounts a filesystem with the extended fields, all fields
      become invalid. The next mount with a new kernel will detect this and reset
      the fields.
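      The extended fields look roughly like the sketch below (names and layout
      abridged from the description; treat it as an illustration, not the exact
      on-disk structure):

      	/* Per-subvolume identity and timestamps, as described above. */
      	struct subvol_info_sketch {
      		u8 uuid[16];		/* this subvolume */
      		u8 parent_uuid[16];	/* set when snapshotted */
      		u8 received_uuid[16];	/* set by receive */
      		u64 ctransid, otransid, stransid, rtransid;
      		struct btrfs_timespec ctime, otime, stime, rtime;
      	};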
      Signed-off-by: Alexander Block <ablock84@googlemail.com>
      Reviewed-by: David Sterba <dave@jikos.cz>
      Reviewed-by: Arne Jansen <sensille@gmx.net>
      Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Reviewed-by: Alex Lyakas <alex.bolshoy.btrfs@gmail.com>
      8ea05e3a
  8. 24 Jul 2012, 6 commits
  9. 14 Jul 2012, 3 commits
  10. 03 Jul 2012, 2 commits
    • L
      Btrfs: fix wrong check during log recovery · 6bf02314
      Liu Bo committed
      When we're evicting an inode during log recovery, we need to ensure that the
      inode is no longer in the orphan state, which means the inode's runtime
      flags must _not_ have BTRFS_INODE_HAS_ORPHAN_ITEM set.  The BUG_ON was
      triggered because of a wrong check of these flags.
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      6bf02314
    • J
      Btrfs: fix dio write vs buffered read race · c3473e83
      Josef Bacik committed
      Miao pointed out there's a problem with mixing dio writes and buffered
      reads.  If a read happens between us invalidating the page range and
      actually locking the extent, it can bring pages into the page cache.  Then
      once the write finishes, anybody who reads again will just find uptodate
      pages and read stale data.  So we need to lock the extent and check for
      uptodate bits in the range; if there are uptodate bits we need to unlock
      and invalidate again.  This keeps the race from happening, since we hold
      the extent locked until we create the ordered extent, and the read side
      always waits for ordered extents.  There was also a race in how we updated
      i_size: previously we relied on the generic DIO code to adjust i_size after
      the DIO had completed, but that happens outside of the extent lock, which
      means reads could come in and not see the updated i_size.  So instead move
      this work into where we create the extents; this way the ordered i_size
      update works properly in the endio handlers.  Thanks,
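      A sketch of the retry loop described above (simplified, error handling
      omitted; the helper signatures reflect the extent_io API of that era and
      should be treated as approximate):

      	/* Keep the range locked across the dio write; if buffered readers
      	 * raced in and left uptodate pages behind, drop the lock, invalidate
      	 * them and try again. */
      	while (1) {
      		lock_extent_bits(&BTRFS_I(inode)->io_tree, lockstart, lockend,
      				 0, &cached_state);
      		if (!test_range_bit(&BTRFS_I(inode)->io_tree, lockstart,
      				    lockend, EXTENT_UPTODATE, 0, cached_state))
      			break;	/* no stale pages left in the range */
      		unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart,
      				     lockend, &cached_state, GFP_NOFS);
      		invalidate_inode_pages2_range(inode->i_mapping,
      					      lockstart >> PAGE_CACHE_SHIFT,
      					      lockend >> PAGE_CACHE_SHIFT);
      	}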
      Signed-off-by: Josef Bacik <josef@redhat.com>
      c3473e83
  11. 21 Jun 2012, 1 commit
  12. 15 Jun 2012, 1 commit