1. 30 May, 2012 22 commits
    • S
      Btrfs: add device counters for detected IO and checksum errors · 442a4f63
      Stefan Behrens committed
      The goal is to detect when drives start to get an increased error rate,
      when drives should be replaced soon. Therefore statistic counters are
      added that count IO errors (read, write and flush). Additionally, the
      software detected errors like checksum errors and corrupted blocks are
      counted.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
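The per-device counters described in this commit can be pictured with a small user-space sketch. All names here (`demo_device`, `demo_dev_stat_inc`, the `DEMO_STAT_*` indices) are hypothetical and chosen for illustration; the kernel's actual identifiers and storage are not shown in this log.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical counter indices, one per error class named in the commit message. */
enum demo_dev_stat {
    DEMO_STAT_READ_ERRS,
    DEMO_STAT_WRITE_ERRS,
    DEMO_STAT_FLUSH_ERRS,
    DEMO_STAT_CHECKSUM_ERRS,
    DEMO_STAT_CORRUPTION_ERRS,
    DEMO_STAT_MAX
};

/* Hypothetical device structure holding one atomic counter per error class. */
struct demo_device {
    atomic_uint stats[DEMO_STAT_MAX];
};

void demo_dev_stat_init(struct demo_device *dev)
{
    for (int i = 0; i < DEMO_STAT_MAX; i++)
        atomic_init(&dev->stats[i], 0);
}

/* Count one detected error of the given class. */
void demo_dev_stat_inc(struct demo_device *dev, enum demo_dev_stat which)
{
    assert(which < DEMO_STAT_MAX);
    atomic_fetch_add(&dev->stats[which], 1);
}

unsigned int demo_dev_stat_read(struct demo_device *dev, enum demo_dev_stat which)
{
    assert(which < DEMO_STAT_MAX);
    return atomic_load(&dev->stats[which]);
}
```

A monitoring tool could then compare the counters over time and flag a drive whose error rate starts to climb.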
    • A
      btrfs: Drop unused function btrfs_abort_devices() · d07eb911
      Asias He committed
      1) This function is not used anywhere.
      
      2) Using blk_abort_queue() to abort the queue does not seem correct.
      blk_abort_queue() is used for timeout handling (block/blk-timeout.c).
      
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: linux-btrfs@vger.kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Asias He <asias@redhat.com>
    • M
      Btrfs: fix the same inode id problem when doing auto defragment · 762f2263
      Miao Xie committed
      Two files in different subvolumes may have the same inode id, so
      the rb-tree which is used to manage the defragment objects must take it
      into account. This patch fixes this problem.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
    • J
      Btrfs: fall back to non-inline if we don't have enough space · 2adcac1a
      Josef Bacik committed
      If cow_file_range_inline fails with ENOSPC we abort the transaction which
      isn't very nice.  This really shouldn't be happening anyways but there's no
      sense in making it a horrible error when we can easily just go allocate
      normal data space for this stuff.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: fix how we deal with the orphan block rsv · 8a35d95f
      Josef Bacik committed
      Ceph was hitting this race where we would remove an inode from the per-root
      orphan list before we would release the space we had reserved for the inode.
      We actually don't need a list or anything, we just need to make sure the
      root doesn't try to free up the orphan reserve until after the inodes have
      released their reservations.  So use an atomic counter instead of a list on
      the root and only decrement the counter after we've released our
      reservation.  I've tested this as well as several others and we no longer
      see the warnings that you would see while running ceph.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
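The ordering this commit relies on (release the inode's reservation first, decrement the counter afterwards) can be sketched in user-space C11. The structure and function names below are hypothetical stand-ins, not the kernel's, and the byte accounting is simplified.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical root: a counter replaces the per-root orphan list. */
struct demo_root {
    atomic_int orphan_inodes;   /* inodes that still hold a reservation */
    atomic_long reserved_bytes; /* space reserved on behalf of those inodes */
    bool orphan_rsv_freed;
};

void demo_root_init(struct demo_root *root)
{
    atomic_init(&root->orphan_inodes, 0);
    atomic_init(&root->reserved_bytes, 0);
    root->orphan_rsv_freed = false;
}

void demo_orphan_add(struct demo_root *root, long bytes)
{
    assert(bytes > 0);
    atomic_fetch_add(&root->orphan_inodes, 1);
    atomic_fetch_add(&root->reserved_bytes, bytes);
}

void demo_orphan_del(struct demo_root *root, long bytes)
{
    /* Release the reservation first, and only then drop the count. */
    atomic_fetch_sub(&root->reserved_bytes, bytes);
    atomic_fetch_sub(&root->orphan_inodes, 1);
}

/* The root may free its orphan reserve only once no inode is counted. */
bool demo_try_free_orphan_rsv(struct demo_root *root)
{
    if (atomic_load(&root->orphan_inodes) != 0)
        return false;
    root->orphan_rsv_freed = true;
    return true;
}
```

Because the counter reaches zero only after every reservation has been released, the root cannot free the reserve while an inode still holds space in it.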
    • J
      Btrfs: convert the inode bit field to use the actual bit operations · 72ac3c0d
      Josef Bacik committed
      Miao pointed this out while I was working on an orphan problem that messing
      with a bitfield where different ranges are protected by different locks
      doesn't work out right.  Turns out we've been doing this forever where we
      have different parts of the bit field protected by either no lock at all or
      different locks which could cause all sorts of weird problems including the
      issue I was hitting.  So instead make a runtime_flags thing that we use the
      normal bit operations on that are all atomic so we can keep having our
      no/different locking for the different flags and then make force_compress
      its own thing so it can be treated normally.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
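The difference between a C bit field and atomic bit operations on a single flags word can be illustrated in user space with C11 atomics. This is only a sketch: the flag numbers and the `demo_*` helpers are hypothetical, and the kernel uses its own `set_bit()`/`clear_bit()`/`test_bit()` helpers rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag numbers; the real ones live in the Btrfs headers. */
enum { DEMO_FLAG_A = 0, DEMO_FLAG_B = 1, DEMO_FLAG_C = 2 };

struct demo_inode {
    atomic_ulong runtime_flags; /* every bit is updated with an atomic read-modify-write */
    unsigned int force_compress; /* a plain field of its own, as in the commit message */
};

void demo_set_bit(struct demo_inode *inode, unsigned int bit)
{
    assert(bit < 8 * sizeof(unsigned long));
    atomic_fetch_or(&inode->runtime_flags, 1UL << bit);
}

void demo_clear_bit(struct demo_inode *inode, unsigned int bit)
{
    assert(bit < 8 * sizeof(unsigned long));
    atomic_fetch_and(&inode->runtime_flags, ~(1UL << bit));
}

bool demo_test_bit(struct demo_inode *inode, unsigned int bit)
{
    assert(bit < 8 * sizeof(unsigned long));
    return (atomic_load(&inode->runtime_flags) >> bit) & 1UL;
}

/* Sets the bit and reports whether it was already set. */
bool demo_test_and_set_bit(struct demo_inode *inode, unsigned int bit)
{
    assert(bit < 8 * sizeof(unsigned long));
    unsigned long old = atomic_fetch_or(&inode->runtime_flags, 1UL << bit);
    return (old >> bit) & 1UL;
}
```

With a plain bit field, updating one flag rewrites the whole storage unit, so two writers holding different locks can silently undo each other's changes. Atomic operations on one word avoid that without requiring a common lock.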
    • J
      Btrfs: merge contigous regions when loading free space cache · cd023e7b
      Josef Bacik committed
      When we write out the free space cache we will write out everything that is
      in our in memory tree, and then we will just walk the pinned extents tree
      and write anything we see there.  The problem with this is that during
      normal operations the pinned extents will be merged back into the free space
      tree normally, and then we can allocate space from the merged areas and
      commit them to the tree log.  If we crash and replay the tree log we will
      crash again because the tree log will try to free up space from what looks
      like 2 separate but contiguous entries, since one entry is from the original
      free space cache and the other was a pinned extent that was merged back.  To
      fix this we just need to walk the free space tree after we load it and merge
      contiguous entries back together.  This will keep the tree log stuff from
      breaking and it will make the allocator behave more nicely.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
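The merge step described here, walking the loaded entries and joining neighbours that touch, can be sketched on a sorted array. The real free space cache is kept in a tree; the array, the `demo_extent` type, and the function name are simplifying assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical free-space entry: a byte range [offset, offset + bytes). */
struct demo_extent {
    uint64_t offset;
    uint64_t bytes;
};

/*
 * Merge entries that are exactly adjacent. The array must be sorted by
 * offset. The merge happens in place and the new entry count is returned.
 */
size_t demo_merge_contiguous(struct demo_extent *e, size_t n)
{
    assert(e != NULL || n == 0);
    if (n == 0)
        return 0;

    size_t out = 0; /* index of the entry currently being extended */
    for (size_t i = 1; i < n; i++) {
        if (e[out].offset + e[out].bytes == e[i].offset)
            e[out].bytes += e[i].bytes; /* neighbours touch: grow the entry */
        else
            e[++out] = e[i];            /* gap: start a new entry */
    }
    return out + 1;
}
```

After the merge, a range that was split between an original cache entry and a former pinned extent appears as one entry, which is the state the tree log replay expects.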
    • L
      Btrfs: do not do balance in readonly mode · 9ba1f6e4
      Liu Bo committed
      In normal cases, we would not be allowed to do balance in RO mode.
      However, when we're using a seeding device and adding another device to sprout,
      things will change:
      
      $ mkfs.btrfs /dev/sdb7
      $ btrfstune -S 1 /dev/sdb7
      $ mount /dev/sdb7 /mnt/btrfs -o ro
      $ btrfs fi bal /mnt/btrfs   -----------------------> fail.
      $ btrfs dev add /dev/sdb8 /mnt/btrfs
      $ btrfs fi bal /mnt/btrfs   -----------------------> works!
      
      It should not be designed as an exception, and we'd better add another check for
      mnt flags.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
    • L
      Btrfs: use fastpath in extent state ops as much as possible · d1ac6e41
      Liu Bo committed
      Fully utilize our extent state's new helper functions to use
      fastpath as much as possible.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
    • L
      Btrfs: fix wrong error returned by adding a device · f8c5d0b4
      Liu Bo committed
      Reproduce:
      $ mkfs.btrfs /dev/sdb7
      $ mount /dev/sdb7 /mnt/btrfs -o ro
      $ btrfs dev add /dev/sdb8 /mnt/btrfs
      ERROR: error adding the device '/dev/sdb8' - Invalid argument
      
      Since we mount with readonly options, and /dev/sdb7 is not a seeding one,
      a readonly notification is preferred.
      Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: finish ordered extents in their own thread · 5fd02043
      Josef Bacik committed
      We noticed that the ordered extent completion doesn't really rely on having
      a page and that it could be done independently of ending the writeback on a
      page.  This patch makes us not do the threaded endio stuff for normal
      buffered writes and direct writes so we can end page writeback as soon as
      possible (in irq context) and only start threads to do the ordered work when
      it is actually done.  Compression needs to be reworked some to take
      advantage of this as well, but atm it has to do a find_get_page in its endio
      handler so it must be done in its own thread.  This makes direct writes
      quite a bit faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: do not check delalloc when updating disk_i_size · 4e899152
      Josef Bacik committed
      We are checking delalloc to see if it is ok to update the i_size.  There are
      2 cases where it stops us from updating:
      
      1) If there is delalloc between our current disk_i_size and this ordered
      extent
      
      2) If there is delalloc between our current ordered extent and the next
      ordered extent
      
      These tests are racy however since we can set delalloc for these ranges at
      any time.  Also for the first case if we notice there is delalloc between
      disk_i_size and our ordered extent we will not update disk_i_size and assume
      that when that delalloc bit gets written out it will update everything
      properly.  However if we crash before that we will have file extents outside
      of our i_size, which is not good, so this test is dangerous as well as racy.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: avoid buffer overrun in mount option handling · f60d16a8
      Jim Meyering committed
      There is an off-by-one error: allocating room for a maximal result
      string but without room for a trailing NUL.  That can lead to
      returning a transformed string that is not NUL-terminated, and
      then to a caller reading beyond end of the malloc'd buffer.
      
      Rewrite to s/kzalloc/kmalloc/, remove unwarranted use of strncpy
      (the result is guaranteed to fit), remove dead strlen at end, and
      change a few variable names and comments.
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Jim Meyering <meyering@redhat.com>
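The class of bug fixed here, sizing a buffer for the transformed string but forgetting the trailing NUL, can be shown with a generic user-space helper. The function below is hypothetical and is not the kernel routine; it replaces one option substring with another and allocates exactly the result length plus one byte.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly allocated copy of args in which the first occurrence of
 * `from` is replaced by `to`, or NULL if `from` is absent or allocation fails.
 */
char *demo_replace_option(const char *args, const char *from, const char *to)
{
    assert(args != NULL && from != NULL && to != NULL);

    const char *hit = strstr(args, from);
    if (hit == NULL)
        return NULL;

    size_t head = (size_t)(hit - args);
    size_t from_len = strlen(from);
    size_t to_len = strlen(to);
    size_t tail = strlen(hit + from_len);

    /* The "+ 1" is the room for the trailing NUL that an off-by-one omits. */
    char *out = malloc(head + to_len + tail + 1);
    if (out == NULL)
        return NULL;

    /* The lengths are known, so memcpy is enough; no strncpy is needed. */
    memcpy(out, args, head);
    memcpy(out + head, to, to_len);
    memcpy(out + head + to_len, hit + from_len, tail + 1); /* copies the NUL too */
    return out;
}
```

The caller owns the returned buffer and releases it with `free()`.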
    • J
      Btrfs: NUL-terminate path buffer in DEV_INFO ioctl result · a27202fb
      Jim Meyering committed
      A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer
      would not be NUL-terminated in the DEV_INFO ioctl result buffer.
      Signed-off-by: Jim Meyering <meyering@redhat.com>
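A fixed-size result buffer stays NUL-terminated if the copy leaves the last byte alone and that byte is then set explicitly. The sketch below uses a hypothetical `DEMO_PATH_MAX` as a stand-in for `BTRFS_DEVICE_PATH_NAME_MAX` and a made-up structure; it is not the ioctl code itself.

```c
#include <assert.h>
#include <string.h>

#define DEMO_PATH_MAX 16 /* stand-in for BTRFS_DEVICE_PATH_NAME_MAX */

/* Hypothetical result structure with a fixed-size path field. */
struct demo_dev_info {
    char path[DEMO_PATH_MAX];
};

void demo_fill_path(struct demo_dev_info *di, const char *name)
{
    assert(di != NULL && name != NULL);
    /* Copy at most size - 1 bytes, then terminate unconditionally. */
    strncpy(di->path, name, sizeof(di->path) - 1);
    di->path[sizeof(di->path) - 1] = '\0';
}
```

A name of `DEMO_PATH_MAX` bytes or longer is truncated, but the buffer handed back to the caller always ends in a NUL byte.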
    • J
      Btrfs: avoid buffer overrun in btrfs_printk · f07c9a79
      Jim Meyering committed
      The buffer read-overrun would be triggered by a printk format
      starting with <N>, where N is a single digit.  NUL-terminate
      after strncpy.  Use memcpy, not strncpy, since we know the
      string we're copying fits in the destination buffer and
      contains no NUL byte.
      Signed-off-by: Jim Meyering <meyering@redhat.com>
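The pattern described in this message, copying a three-byte `<N>` prefix with memcpy and terminating it by hand, might look roughly like the following. The helper name and signature are hypothetical; only the `<N>` prefix format and the memcpy-plus-NUL approach come from the commit message.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * If fmt starts with "<N>" (N a single digit), copy that prefix into lvl
 * and return a pointer to the rest of the format. Otherwise lvl is set to
 * the empty string and fmt is returned unchanged.
 */
const char *demo_split_level(const char *fmt, char lvl[4])
{
    assert(fmt != NULL && lvl != NULL);

    if (fmt[0] == '<' && isdigit((unsigned char)fmt[1]) && fmt[2] == '>') {
        memcpy(lvl, fmt, 3); /* exactly three bytes, none of them NUL */
        lvl[3] = '\0';       /* terminate explicitly */
        return fmt + 3;
    }
    lvl[0] = '\0';
    return fmt;
}
```

The `&&` chain stops at the first mismatch, so the check never reads past the end of a shorter format string.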
    • D
      Fix minor type issues · 2eec6c81
      Daniel J Blueman committed
      Address some minor type issues identified by sparse checker.
      Signed-off-by: Daniel J Blueman <daniel@quora.org>
    • S
      btrfs: allow changing 'thread_pool' size at remount time · 0d2450ab
      Sergei Trofimovich committed
      Running 'mount -oremount,thread_pool=2 /' had no effect:
      
      The maximum number of worker threads is specified in 2 places:
      - in 'struct btrfs_fs_info::thread_pool_size'
      - in each worker struct: 'struct btrfs_workers::max_workers'
      
      'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'.
      
      Fix it by pushing new maximum value to all created worker structures
      as well.
      
      Cc: Josef Bacik <josef@redhat.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
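The fix amounts to updating both places that hold the limit. A minimal sketch, assuming a hypothetical `demo_fs_info` with a small fixed array of worker pools (the real structures and the set of pools differ):

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NR_POOLS 3 /* hypothetical number of worker pools */

struct demo_workers {
    int max_workers; /* per-pool limit, mirrors btrfs_workers::max_workers */
};

struct demo_fs_info {
    int thread_pool_size; /* filesystem-wide setting from the mount option */
    struct demo_workers workers[DEMO_NR_POOLS];
};

/* Apply a new thread_pool value at remount time. */
void demo_resize_thread_pool(struct demo_fs_info *fs, int new_size)
{
    assert(fs != NULL && new_size > 0);

    fs->thread_pool_size = new_size;
    /* Without this loop only the top-level field would change. */
    for (size_t i = 0; i < DEMO_NR_POOLS; i++)
        fs->workers[i].max_workers = new_size;
}
```

This sketch gives every pool the same limit; whether the kernel treats all pools identically is not stated in the commit message.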
    • J
      Btrfs: do not do filemap_write_and_wait_range in fsync · 0885ef5b
      Josef Bacik committed
      We already do the btrfs_wait_ordered_range which will do this for us, so
      just remove this call so we don't call it twice.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: remove useless waiting and extra filemap work · 551ebb2d
      Josef Bacik committed
      In btrfs_wait_ordered_range we have been calling filemap_fdata_write() twice
      because compression does strange things and then waiting.  Then we look up
      ordered extents and if we find any we will always schedule_timeout(); once
      and then loop back around and do it all again.  We will even check to see if
      there are delalloc pages on this range and loop again.  So this patch gets
      rid of the multiple fdata_write() calls and just does
      filemap_write_and_wait().  In the case of compression we will still find the
      ordered extents and start those individually if we need to so that is ok,
      but in the normal buffered case we avoid all this weird overhead.
      
      Then in the case of the schedule_timeout(1), we don't need it.  All callers
      either 1) don't care, they just want to make sure what they just wrote makes
      it to disk or 2) are doing the lock()->lookup ordered->unlock->flush thing
      in which case it will lock and check for ordered extents _anyway_ so get
      back to them as quickly as possible.  The delalloc check is simply not
      needed, this only catches the case where we write to the file again since
      doing the filemap_write_and_wait() and if the caller truly cares about that
      it will take care of everything itself.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: fix compile warnings in extent_io.c · d7dbe9e7
      Josef Bacik committed
      These warnings are bogus since we will always have at least one page in an
      eb, but to make the compiler happy just set ret = 0 in these two cases.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: cache no acl on new inodes · 30f8fe3e
      Josef Bacik committed
      When running compilebench I noticed we were spending some time looking up
      acls on new inodes, which shouldn't be happening since there were no acls.
      This is because when we init acls on the inode after creating them we don't
      cache the fact there are no acls if there aren't any.  Doing this adds a
      little bit of a bump to my compilebench runs.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • J
      Btrfs: use i_version instead of our own sequence · 0c4d2d95
      Josef Bacik committed
      We've been keeping around the inode sequence number in hopes that somebody
      would use it, but nobody uses it and people actually use i_version which
      serves the same purpose, so use i_version where we used the incore inode's
      sequence number and that way the sequence is updated properly across the
      board, and not just in file write.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
  2. 11 May, 2012 7 commits
  3. 07 May, 2012 4 commits
  4. 06 May, 2012 6 commits
    • C
      Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919
      Chris Mason committed
      verify_parent_transid needs to lock the extent range to make
      sure no IO is underway, and so it can safely clear the
      uptodate bits if our checks fail.
      
      But, a few callers are using it with spinlocks held.  Most
      of the time, the generation numbers are going to match, and
      we don't want to switch to a blocking lock just for the error
      case.  This adds an atomic flag to verify_parent_transid,
      and changes it to return EAGAIN if it needs to block to
      properly verify things.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha · 03cb00b3
      Linus Torvalds committed
      Pull alpha fixes from Matt Turner:
       "My alpha tree is back up (after taking quite some time to get my GPG
        key signed).  It contains just some simple fixes."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
        alpha: silence 'const' warning in sys_marvel.c
        alpha: include module.h to fix modpost on Tsunami
        alpha: properly define get/set_rtc_time on Marvel/SMP
        alpha: VGA_HOSE depends on VGA_CONSOLE
    • J
      TTY: pdc_cons, fix regression in close · 49a5f3cf
      Jiri Slaby committed
      The test in pdc_console_tty_close '!tty->count' was always wrong
      because tty->count is decremented after tty->ops->close is called and
      thus can never be zero. Hence the 'then' branch was never executed and
      the timer never deleted.
      
      This did not matter until commit 5dd5bc40 ("TTY: pdc_cons, use
      tty_port").  There we needed to set TTY in tty_port to NULL, but this
      never happened due to the bug above.
      
      So change the test to really trigger at the last close by changing the
      condition to 'tty->count == 1'.
      
      Well, the driver should not touch tty->count at all.  It should use
      tty_port->count and track the open count there itself.
      Signed-off-by: Jiri Slaby <jslaby@suse.cz>
      Reported-and-tested-by: Mikulas Patocka <mpatocka@redhat.com>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • L
      Merge tag 'sound-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 1c2f9548
      Linus Torvalds committed
      Pull sound fixes from Takashi Iwai:
       "As good as nothing exciting here; just a few trivial fixes for various
        ASoC stuff."
      
      * tag 'sound-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ASoC: omap-pcm: Free dma buffers in case of error.
        ASoC: s3c2412-i2s: Fix dai registration
        ASoC: wm8350: Don't use locally allocated codec struct
        ASoC: tlv312aic23: unbreak resume
        ASoC: bf5xx-ssm2602: Set DAI format
        ASoC: core: check of_property_count_strings failure
        ASoC: dt: sgtl5000.txt: Add description for 'reg' field
        ASoC: wm_hubs: Make sure we don't disable differential line outputs
    • L
      Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux · 59068e36
      Linus Torvalds committed
      Pull an ACPI patch from Len Brown:
       "It fixes a D3 issue new in 3.4-rc1."
      
      By Lin Ming via Len Brown:
      * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
        ACPI: Fix D3hot v D3cold confusion
    • S
      init: don't try mounting device as nfs root unless type fully matches · 377485f6
      Sasha Levin committed
      Currently, we'll try mounting any device whose major device number is
      UNNAMED_MAJOR as NFS root.  This would happen for non-NFS devices as
      well (such as 9p devices) but it wouldn't cause any issues since
      mounting the device as NFS would fail quickly and the code proceeded to
      doing the proper mount:
      
             [  101.522716] VFS: Unable to mount root fs via NFS, trying floppy.
             [  101.534499] VFS: Mounted root (9p filesystem) on device 0:18.
      
      Commit 6829a048102a ("NFS: Retry mounting NFSROOT") introduced retries
      when mounting NFS root, which means that now we don't immediately fail
      and instead it takes an additional 90+ seconds until we stop retrying,
      which has revealed the issue this patch fixes.
      
      This meant that it would take an additional 90 seconds to boot when
      we're not using a device type which gets detected in order before NFS.
      
      This patch modifies the NFS type check to require device type to be
      'Root_NFS' instead of requiring the device to have an UNNAMED_MAJOR
      major.  This makes boot process cleaner since we now won't go through
      the NFS mounting code at all when the device isn't an NFS root
      ("/dev/nfs").
      Signed-off-by: Sasha Levin <levinsasha928@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 05 May, 2012 1 commit