1. 30 May, 2012 12 commits
    • Btrfs: finish ordered extents in their own thread · 5fd02043
      Josef Bacik committed
      We noticed that the ordered extent completion doesn't really rely on having
      a page and that it could be done independently of ending the writeback on a
      page.  This patch makes us not do the threaded endio stuff for normal
      buffered writes and direct writes so we can end page writeback as soon as
      possible (in irq context) and only start threads to do the ordered work when
      it is actually done.  Compression needs to be reworked some to take
      advantage of this as well, but atm it has to do a find_get_page in its endio
      handler so it must be done in its own thread.  This makes direct writes
      quite a bit faster.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: do not check delalloc when updating disk_i_size · 4e899152
      Josef Bacik committed
      We are checking delalloc to see if it is ok to update the i_size.  There are
      2 cases where it stops us from updating:
      
      1) If there is delalloc between our current disk_i_size and this ordered
      extent
      
      2) If there is delalloc between our current ordered extent and the next
      ordered extent
      
      These tests are racy however since we can set delalloc for these ranges at
      any time.  Also for the first case if we notice there is delalloc between
      disk_i_size and our ordered extent we will not update disk_i_size and assume
      that when that delalloc bit gets written out it will update everything
      properly.  However if we crash before that we will have file extents outside
      of our i_size, which is not good, so this test is dangerous as well as racy.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: avoid buffer overrun in mount option handling · f60d16a8
      Jim Meyering committed
      There is an off-by-one error: allocating room for a maximal result
      string but without room for a trailing NUL.  That can lead to
      returning a transformed string that is not NUL-terminated, and
      then to a caller reading beyond end of the malloc'd buffer.
      
      Rewrite to s/kzalloc/kmalloc/, remove unwarranted use of strncpy
      (the result is guaranteed to fit), remove dead strlen at end, and
      change a few variable names and comments.
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Jim Meyering <meyering@redhat.com>
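The off-by-one described in this commit is easy to reproduce outside the kernel. The following is a minimal userspace sketch, not the btrfs code the commit touched: `join_option()` is a hypothetical helper, and `malloc` stands in for `kmalloc`. The buffer is sized for both pieces plus the trailing NUL, and `memcpy` is used because the lengths are already known.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: build "<key>=<value>" in a
 * freshly allocated buffer. The caller frees the result.
 */
char *join_option(const char *key, const char *value)
{
    size_t klen = strlen(key);
    size_t vlen = strlen(value);
    /* key + '=' + value + trailing NUL; dropping the final +1 is the bug */
    size_t size = klen + 1 + vlen + 1;
    char *out = malloc(size);

    if (!out)
        return NULL;

    /* The lengths are known, so memcpy is enough; no strncpy needed. */
    memcpy(out, key, klen);
    out[klen] = '=';
    memcpy(out + klen + 1, value, vlen);
    out[size - 1] = '\0';

    /* Sanity check: the string fills the buffer exactly, NUL included. */
    assert(strlen(out) == size - 1);
    return out;
}
```

A caller would use it as `char *s = join_option("subvolid", "0");` and release the result with `free(s)`.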
    • Btrfs: NUL-terminate path buffer in DEV_INFO ioctl result · a27202fb
      Jim Meyering committed
      A device with name of length BTRFS_DEVICE_PATH_NAME_MAX or longer
      would not be NUL-terminated in the DEV_INFO ioctl result buffer.
      Signed-off-by: Jim Meyering <meyering@redhat.com>
    • Btrfs: avoid buffer overrun in btrfs_printk · f07c9a79
      Jim Meyering committed
      The buffer read-overrun would be triggered by a printk format
      starting with <N>, where N is a single digit.  NUL-terminate
      after strncpy.  Use memcpy, not strncpy, since we know the
      string we're copying fits in the destination buffer and
      contains no NUL byte.
      Signed-off-by: Jim Meyering <meyering@redhat.com>
    • Fix minor type issues · 2eec6c81
      Daniel J Blueman committed
      Address some minor type issues identified by the sparse checker.
      Signed-off-by: Daniel J Blueman <daniel@quora.org>
    • btrfs: allow changing 'thread_pool' size at remount time · 0d2450ab
      Sergei Trofimovich committed
      Changing 'mount -oremount,thread_pool=2 /' didn't have any effect:

      the maximum number of worker threads is specified in 2 places:
      - in 'struct btrfs_fs_info::thread_pool_size'
      - in each worker struct: 'struct btrfs_workers::max_workers'
      
      'mount -oremount' updated only 'btrfs_fs_info::thread_pool_size'.
      
      Fix it by pushing new maximum value to all created worker structures
      as well.
      
      Cc: Josef Bacik <josef@redhat.com>
      Cc: Chris Mason <chris.mason@oracle.com>
      Reviewed-by: Josef Bacik <josef@redhat.com>
      Signed-off-by: Sergei Trofimovich <slyfox@gentoo.org>
    • Btrfs: do not do filemap_write_and_wait_range in fsync · 0885ef5b
      Josef Bacik committed
      We already do the btrfs_wait_ordered_range which will do this for us, so
      just remove this call so we don't call it twice.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: remove useless waiting and extra filemap work · 551ebb2d
      Josef Bacik committed
      In btrfs_wait_ordered_range we have been calling filemap_fdata_write() twice
      because compression does strange things and then waiting.  Then we look up
      ordered extents and if we find any we will always schedule_timeout(); once
      and then loop back around and do it all again.  We will even check to see if
      there are delalloc pages on this range and loop again.  So this patch gets
      rid of the multiple fdata_write() calls and just does
      filemap_write_and_wait().  In the case of compression we will still find the
      ordered extents and start those individually if we need to so that is ok,
      but in the normal buffered case we avoid all this weird overhead.
      
      Then in the case of the schedule_timeout(1), we don't need it.  All callers
      either 1) don't care, they just want to make sure what they just wrote makes
      it to disk or 2) are doing the lock()->lookup ordered->unlock->flush thing
      in which case it will lock and check for ordered extents _anyway_ so get
      back to them as quickly as possible.  The delalloc check is simply not
      needed, this only catches the case where we write to the file again since
      doing the filemap_write_and_wait() and if the caller truly cares about that
      it will take care of everything itself.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: fix compile warnings in extent_io.c · d7dbe9e7
      Josef Bacik committed
      These warnings are bogus since we will always have at least one page in an
      eb, but to make the compiler happy just set ret = 0 in these two cases.
      Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: cache no acl on new inodes · 30f8fe3e
      Josef Bacik committed
      When running compilebench I noticed we were spending some time looking up
      acls on new inodes, which shouldn't be happening since there were no acls.
      This is because when we init acls on the inode after creating them we don't
      cache the fact there are no acls if there aren't any.  Doing this adds a
      little bit of a bump to my compilebench runs.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
    • Btrfs: use i_version instead of our own sequence · 0c4d2d95
      Josef Bacik committed
      We've been keeping around the inode sequence number in hopes that somebody
      would use it, but nobody uses it and people actually use i_version which
      serves the same purpose, so use i_version where we used the incore inode's
      sequence number and that way the sequence is updated properly across the
      board, and not just in file write.  Thanks,
      Signed-off-by: Josef Bacik <josef@redhat.com>
  2. 11 May, 2012 7 commits
  3. 06 May, 2012 1 commit
    • Btrfs: avoid sleeping in verify_parent_transid while atomic · b9fab919
      Chris Mason committed
      verify_parent_transid needs to lock the extent range to make
      sure no IO is underway, and so it can safely clear the
      uptodate bits if our checks fail.
      
      But, a few callers are using it with spinlocks held.  Most
      of the time, the generation numbers are going to match, and
      we don't want to switch to a blocking lock just for the error
      case.  This adds an atomic flag to verify_parent_transid,
      and changes it to return EAGAIN if it needs to block to
      properly verify things.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
  4. 05 May, 2012 5 commits
  5. 04 May, 2012 5 commits
  6. 02 May, 2012 3 commits
  7. 01 May, 2012 2 commits
  8. 30 April, 2012 2 commits
    • autofs: make the autofsv5 packet file descriptor use a packetized pipe · 64f371bc
      Linus Torvalds committed
      The autofs packet size has had a very unfortunate size problem on x86:
      because the alignment of 'u64' differs in 32-bit and 64-bit modes, and
      because the packet data was not 8-byte aligned, the size of the autofsv5
      packet structure differed between 32-bit and 64-bit modes despite
      looking otherwise identical (300 vs 304 bytes respectively).
      
      We first fixed that up by making the 64-bit compat mode know about this
      problem in commit a32744d4 ("autofs: work around unhappy compat
      problem on x86-64"), and that made a 32-bit 'systemd' work happily on a
      64-bit kernel because everything then worked the same way as on a 32-bit
      kernel.
      
      But it turned out that 'automount' had actually known and worked around
      this problem in user space, so fixing the kernel to do the proper 32-bit
      compatibility handling actually *broke* 32-bit automount on a 64-bit
      kernel, because it knew that the packet sizes were wrong and expected
      those incorrect sizes.
      
      As a result, we ended up reverting that compatibility mode fix, and
      thus breaking systemd again, in commit fcbf94b9.
      
      With both automount and systemd doing a single read() system call, and
      verifying that they get *exactly* the size they expect but using
      different sizes, it seemed that fixing one of them inevitably broke the
      other.  At one point, a patch I seriously considered applying
      from Michael Tokarev did a "strcmp()" to see if it was automount that
      was doing the operation.  Ugly, ugly.
      
      However, a prettier solution exists now thanks to the packetized pipe
      mode.  By marking the communication pipe as being packetized (by simply
      setting the O_DIRECT flag), we can always just write the bigger packet
      size, and if user-space does a smaller read, it will just get that
      partial end result and the extra alignment padding will simply be thrown
      away.
      
      This makes both automount and systemd happy, since they now get the size
      they asked for, and the kernel side of autofs simply no longer needs to
      care - it could pad out the packet arbitrarily.
      
      Of course, if there is some *other* user of autofs (please, please,
      please tell me it ain't so - and we haven't heard of any) that tries to
      read the packets with multiple reads, that other user will now be
      broken - the whole point of the packetized mode is that one system call
      gets exactly one packet, and you cannot read a packet in pieces.
      Tested-by: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Thomas Meyer <thomas@m3y3r.de>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
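The 300 vs 304 byte difference comes purely from tail padding. The sketch below reproduces it with two illustrative structs; the field names and grouping are assumptions chosen to give the same arithmetic, not the kernel's autofs definition. The `aligned` typedefs are a GCC/Clang extension used here to model the 4-byte alignment of 'u64' on 32-bit x86 and the 8-byte alignment on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit integer aligned as on 32-bit x86 (4 bytes). */
typedef uint64_t __attribute__((aligned(4))) u64_align4;
/* 64-bit integer aligned as on x86-64 (8 bytes). */
typedef uint64_t __attribute__((aligned(8))) u64_align8;

/* Illustrative packet as a 32-bit build would lay it out. */
struct packet_32 {
    uint32_t head[4];   /* 16 bytes */
    u64_align4 ino;     /*  8 bytes, at offset 16 */
    uint32_t ids[5];    /* 20 bytes */
    char name[256];     /* 256 bytes, ends at offset 300 */
};

/* The same fields as a 64-bit build would lay them out. */
struct packet_64 {
    uint32_t head[4];
    u64_align8 ino;
    uint32_t ids[5];
    char name[256];     /* ends at 300; struct is padded to a multiple of 8 */
};

/* Every field sits at the same offset; only the total size differs. */
void check_layout(void)
{
    assert(offsetof(struct packet_32, ino) == offsetof(struct packet_64, ino));
    assert(offsetof(struct packet_32, name) == offsetof(struct packet_64, name));
}
```

Because the fields line up and only the trailing padding differs, a reader that asks for the smaller size loses nothing meaningful when the extra bytes are discarded.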
    • pipes: add a "packetized pipe" mode for writing · 9883035a
      Linus Torvalds committed
      The actual internal pipe implementation is already really about
      individual packets (called "pipe buffers"), and this simply exposes that
      as a special packetized mode.
      
      When we are in the packetized mode (marked by O_DIRECT as suggested by
      Alan Cox), a write() on a pipe will not merge the new data with previous
      writes, so each write will get a pipe buffer of its own.  The pipe
      buffer is then marked with the PIPE_BUF_FLAG_PACKET flag, which in turn
      will tell the reader side to break the read at that boundary (and throw
      away any partial packet contents that do not fit in the read buffer).
      
      End result: as long as you do writes less than PIPE_BUF in size (so that
      the pipe doesn't have to split them up), you can now treat the pipe as a
      packet interface, where each read() system call will read one packet at
      a time.  You can just use a sufficiently big read buffer (PIPE_BUF is
      sufficient, since bigger than that doesn't guarantee atomicity anyway),
      and the return value of the read() will naturally give you the size of
      the packet.
      
      NOTE! We do not support zero-sized packets, and zero-sized reads and
      writes to a pipe continue to be no-ops.  Also note that big packets will
      currently be split at write time, but that the size at which that
      happens is not really specified (except that it's bigger than PIPE_BUF).
      Currently that limit is the system page size, but we might want to
      explicitly support bigger packets some day.
      
      The main user for this is going to be the autofs packet interface,
      allowing us to stop having to care so deeply about exact packet sizes
      (which have had bugs with 32/64-bit compatibility modes).  But user
      space can create packetized pipes with "pipe2(fd, O_DIRECT)", which will
      fail with an EINVAL on kernels that do not support this interface.
      Tested-by: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Cc: Ian Kent <raven@themaw.net>
      Cc: Thomas Meyer <thomas@m3y3r.de>
      Cc: stable@kernel.org  # needed for systemd/autofs interaction fix
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
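From user space, the packet mode described in this commit can be exercised roughly as follows. This is a minimal sketch assuming a Linux kernel that accepts O_DIRECT in pipe2(); `packet_pipe_demo()` is a hypothetical name. It writes two packets, reads the first with a buffer that is too small, and checks that the next read starts at the second packet.

```c
#define _GNU_SOURCE /* for pipe2() and O_DIRECT */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Returns 0 if the pipe behaved as a packet interface, -1 on any failure
 * (including pipe2() failing with EINVAL on kernels without this mode).
 */
int packet_pipe_demo(void)
{
    int fds[2];
    char buf[4];
    int ret = -1;

    /* The read buffer is deliberately smaller than the first packet. */
    assert(sizeof(buf) < 8);

    if (pipe2(fds, O_DIRECT) != 0)
        return -1;

    /* Two writes; each one becomes its own packet. */
    if (write(fds[1], "AAAAAAAA", 8) != 8 || write(fds[1], "BB", 2) != 2)
        goto out;

    /* Short read: returns 4 bytes, and the rest of packet 1 is dropped. */
    if (read(fds[0], buf, sizeof(buf)) != 4 || memcmp(buf, "AAAA", 4) != 0)
        goto out;

    /* The next read starts at packet 2, not at the leftover bytes. */
    if (read(fds[0], buf, sizeof(buf)) != 2 || memcmp(buf, "BB", 2) != 0)
        goto out;

    ret = 0;
out:
    close(fds[0]);
    close(fds[1]);
    return ret;
}
```

The short first read is the behaviour autofs relies on: a reader expecting the smaller packet size gets exactly that many bytes, and the padding never shows up in a later read.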
  9. 29 April, 2012 1 commit
  10. 28 April, 2012 2 commits
    • Revert "autofs: work around unhappy compat problem on x86-64" · fcbf94b9
      Linus Torvalds committed
      This reverts commit a32744d4.
      
      While that commit was technically the right thing to do, and made the
      x86-64 compat mode work identically to native 32-bit mode (and thus
      fixing the problem with a 32-bit systemd install on a 64-bit kernel), it
      turns out that the automount binaries had workarounds for this compat
      problem.
      
      Now, the workarounds are disgusting: doing a "uname()" to find out the
      architecture of the kernel, and then comparing it for the 64-bit cases
      and fixing up the size of the read() in automount for those.  And they
      were confused: it's not actually a generic 64-bit issue at all, it's
      very much tied to just x86-64, which has different alignment for a
      'u64' in 64-bit mode than in 32-bit mode.
      
      But the end result is that fixing the compat layer actually breaks the
      case of a 32-bit automount on an x86-64 kernel.
      
      There are various approaches to fix this (including just doing a
      "strcmp()" on current->comm and comparing it to "automount"), but I
      think that I will do the one that teaches pipes about a special "packet
      mode", which will allow user space to not have to care too deeply about
      the padding at the end of the autofs packet.
      
      That change will make the compat workaround unnecessary, so let's revert
      it first, and get automount working again in compat mode.  The
      packetized pipes will then fix autofs for systemd.
      Reported-and-requested-by: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Ian Kent <raven@themaw.net>
      Cc: stable@kernel.org # for 3.3
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Btrfs: reduce lock contention during extent insertion · dc7fdde3
      Chris Mason committed
      We're spending huge amounts of time on lock contention during
      end_io processing because we unconditionally assume we are overwriting
      an existing extent in the file for each IO.
      
      This checks to see if we are outside i_size, and if so, it uses a
      less expensive readonly search of the btree to look for existing
      extents.
      Signed-off-by: Chris Mason <chris.mason@oracle.com>