1. 29 8月, 2012 3 次提交
    • S
      Btrfs: remove superblock writing after fatal error · 68ce9682
      Stefan Behrens 提交于
      With commit acce952b, btrfs was changed to flag the filesystem with
      BTRFS_SUPER_FLAG_ERROR and switch to read-only mode after a fatal
      error happened like a write I/O errors of all mirrors.
      In such situations, on unmount, the superblock is written in
      btrfs_error_commit_super(). This is done with the intention to be able
      to evaluate the error flag on the next mount. A warning is printed
      in this case during the next mount and the log tree is ignored.
      
      The issue is that it is possible that the superblock points to a root
      that was not written (due to write I/O errors).
      The result is that the filesystem cannot be mounted. btrfsck also does
      not start and all the other btrfs-progs tools fail to start as well.
      However, mount -o recovery is working well and does the right things
      to recover the filesystem (i.e., don't use the log root, clear the
      free space cache and use the next mountable root that is stored in the
      root backup array).
      
      This patch removes the writing of the superblock when
      BTRFS_SUPER_FLAG_ERROR is set, and removes the handling of the error
      flag in the mount function.
      
      These lines can be used to reproduce the issue (using /dev/sdm):
      SCRATCH_DEV=/dev/sdm
      SCRATCH_MNT=/mnt
      echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup create foo
      ls -alLF /dev/mapper/foo
      mkfs.btrfs /dev/mapper/foo
      mount /dev/mapper/foo $SCRATCH_MNT
      echo bar > $SCRATCH_MNT/foo
      sync
      echo 0 25165824 error | dmsetup reload foo
      dmsetup resume foo
      ls -alF $SCRATCH_MNT
      touch $SCRATCH_MNT/1
      ls -alF $SCRATCH_MNT
      sleep 35
      echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup reload foo
      dmsetup resume foo
      sleep 1
      umount $SCRATCH_MNT
      btrfsck /dev/mapper/foo
      dmsetup remove foo
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      68ce9682
    • J
      Btrfs: barrier before waitqueue_active · 66657b31
      Josef Bacik 提交于
      We need a barrir before calling waitqueue_active otherwise we will miss
      wakeups.  So in places that do atomic_dec(); then atomic_read() use
      atomic_dec_return() which imply a memory barrier (see memory-barriers.txt)
      and then add an explicit memory barrier everywhere else that need them.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fusionio.com>
      66657b31
    • A
      Btrfs: fix deadlock in wait_for_more_refs · 1fa11e26
      Arne Jansen 提交于
      Commit a168650c introduced a waiting mechanism to prevent busy waiting in
      btrfs_run_delayed_refs. This can deadlock with btrfs_run_ordered_operations,
      where a tree_mod_seq is held while waiting for the io to complete, while
      the end_io calls btrfs_run_delayed_refs.
      This whole mechanism is unnecessary. If not enough runnable refs are
      available to satisfy count, just return as count is more like a guideline
      than a strict requirement.
      In case we have to run all refs, commit transaction makes sure that no
      other threads are working in the transaction anymore, so we just assert
      here that no refs are blocked.
      Signed-off-by: NArne Jansen <sensille@gmx.net>
      Signed-off-by: NChris Mason <chris.mason@fusionio.com>
      1fa11e26
  2. 26 7月, 2012 1 次提交
    • A
      Btrfs: introduce subvol uuids and times · 8ea05e3a
      Alexander Block 提交于
      This patch introduces uuids for subvolumes. Each
      subvolume has it's own uuid. In case it was snapshotted,
      it also contains parent_uuid. In case it was received,
      it also contains received_uuid.
      
      It also introduces subvolume ctime/otime/stime/rtime. The
      first two are comparable to the times found in inodes. otime
      is the origin/creation time and ctime is the change time.
      stime/rtime are only valid on received subvolumes.
      stime is the time of the subvolume when it was
      sent. rtime is the time of the subvolume when it was
      received.
      
      Additionally to the times, we have a transid for each
      time. They are updated at the same place as the times.
      
      btrfs receive uses stransid and rtransid to find out
      if a received subvolume changed in the meantime.
      
      If an older kernel mounts a filesystem with the
      extented fields, all fields become invalid. The next
      mount with a new kernel will detect this and reset the
      fields.
      Signed-off-by: NAlexander Block <ablock84@googlemail.com>
      Reviewed-by: NDavid Sterba <dave@jikos.cz>
      Reviewed-by: NArne Jansen <sensille@gmx.net>
      Reviewed-by: NJan Schmidt <list.btrfs@jan-o-sch.net>
      Reviewed-by: NAlex Lyakas <alex.bolshoy.btrfs@gmail.com>
      8ea05e3a
  3. 24 7月, 2012 3 次提交
  4. 12 7月, 2012 1 次提交
  5. 10 7月, 2012 3 次提交
  6. 03 7月, 2012 2 次提交
  7. 21 6月, 2012 1 次提交
  8. 15 6月, 2012 8 次提交
  9. 30 5月, 2012 6 次提交
    • S
      Btrfs: read device stats on mount, write modified ones during commit · 733f4fbb
      Stefan Behrens 提交于
      The device statistics are written into the device tree with each
      transaction commit. Only modified statistics are written.
      When a filesystem is mounted, the device statistics for each involved
      device are read from the device tree and used to initialize the
      counters.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      733f4fbb
    • S
      Btrfs: add device counters for detected IO and checksum errors · 442a4f63
      Stefan Behrens 提交于
      The goal is to detect when drives start to get an increased error rate,
      when drives should be replaced soon. Therefore statistic counters are
      added that count IO errors (read, write and flush). Additionally, the
      software detected errors like checksum errors and corrupted blocks are
      counted.
      Signed-off-by: NStefan Behrens <sbehrens@giantdisaster.de>
      442a4f63
    • A
      btrfs: Drop unused function btrfs_abort_devices() · d07eb911
      Asias He 提交于
      1) This function is not used anywhere.
      
      2) Using the blk_abort_queue() to abort the queue seems not correct.
      blk_abort_queue() is used for timeout handling (block/blk-timeout.c).
      
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: linux-btrfs@vger.kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NAsias He <asias@redhat.com>
      d07eb911
    • J
      Btrfs: fix how we deal with the orphan block rsv · 8a35d95f
      Josef Bacik 提交于
      Ceph was hitting this race where we would remove an inode from the per-root
      orphan list before we would release the space we had reserved for the inode.
      We actually don't need a list or anything, we just need to make sure the
      root doesn't try to free up the orphan reserve until after the inodes have
      released their reservations.  So use an atomic counter instead of a list on
      the root and only decrement the counter after we've released our
      reservation.  I've tested this as well as several others and we no longer
      see the warnings that you would see while running ceph.  Thanks,
      Btrfs: fix how we deal with the orphan block rsv
      
      Ceph was hitting this race where we would remove an inode from the per-root
      orphan list before we would release the space we had reserved for the inode.
      We actually don't need a list or anything, we just need to make sure the
      root doesn't try to free up the orphan reserve until after the inodes have
      released their reservations.  So use an atomic counter instead of a list on
      the root and only decrement the counter after we've released our
      reservation.  I've tested this as well as several others and we no longer
      see the warnings that you would see while running ceph.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      8a35d95f
    • J
      Btrfs: convert the inode bit field to use the actual bit operations · 72ac3c0d
      Josef Bacik 提交于
      Miao pointed this out while I was working on an orphan problem that messing
      with a bitfield where different ranges are protected by different locks
      doesn't work out right.  Turns out we've been doing this forever where we
      have different parts of the bit field protected by either no lock at all or
      different locks which could cause all sorts of weird problems including the
      issue I was hitting.  So instead make a runtime_flags thing that we use the
      normal bit operations on that are all atomic so we can keep having our
      no/different locking for the different flags and then make force_compress
      it's own thing so it can be treated normally.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      72ac3c0d
    • J
      Btrfs: finish ordered extents in their own thread · 5fd02043
      Josef Bacik 提交于
      We noticed that the ordered extent completion doesn't really rely on having
      a page and that it could be done independantly of ending the writeback on a
      page.  This patch makes us not do the threaded endio stuff for normal
      buffered writes and direct writes so we can end page writeback as soon as
      possible (in irq context) and only start threads to do the ordered work when
      it is actually done.  Compression needs to be reworked some to take
      advantage of this as well, but atm it has to do a find_get_page in its endio
      handler so it must be done in its own thread.  This makes direct writes
      quite a bit faster.  Thanks,
      Signed-off-by: NJosef Bacik <josef@redhat.com>
      5fd02043
  10. 26 5月, 2012 2 次提交
  11. 06 5月, 2012 1 次提交
    • C
      Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919
      Chris Mason 提交于
      verify_parent_transid needs to lock the extent range to make
      sure no IO is underway, and so it can safely clear the
      uptodate bits if our checks fail.
      
      But, a few callers are using it with spinlocks held.  Most
      of the time, the generation numbers are going to match, and
      we don't want to switch to a blocking lock just for the error
      case.  This adds an atomic flag to verify_parent_transid,
      and changes it to return EAGAIN if it needs to block to
      properly verifiy things.
      Signed-off-by: NChris Mason <chris.mason@oracle.com>
      b9fab919
  12. 19 4月, 2012 2 次提交
  13. 06 4月, 2012 1 次提交
  14. 30 3月, 2012 1 次提交
  15. 29 3月, 2012 3 次提交
  16. 27 3月, 2012 2 次提交