1. 22 5月, 2010 6 次提交
  2. 17 5月, 2010 3 次提交
  3. 29 4月, 2010 3 次提交
  4. 28 4月, 2010 2 次提交
  5. 27 4月, 2010 4 次提交
    • T
      block: implement bd_claiming and claiming block · 6b4517a7
      Tejun Heo 提交于
      Currently, device claiming for exclusive open is done after low level
      open - disk->fops->open() - has completed successfully.  This means
      that exclusive open attempts while a device is already exclusively
      open will fail only after disk->fops->open() is called.
      
      cdrom driver issues commands during open() which means that O_EXCL
      open attempt can unintentionally inject commands to in-progress
      command stream for burning thus disturbing burning process.  In most
      cases, this doesn't cause problems because the first command to be
      issued is TUR which most devices can process in the middle of burning.
      However, depending on how a device replies to TUR during burning,
      cdrom driver may end up issuing further commands.
      
      This can't be resolved trivially by moving bd_claim() before doing
      actual open() because that means an open attempt which will end up
      failing could interfere other legit O_EXCL open attempts.
      ie. unconfirmed open attempts can fail others.
      
      This patch resolves the problem by introducing claiming block which is
      started by bd_start_claiming() and terminated either by bd_claim() or
      bd_abort_claiming().  bd_claim() from inside a claiming block is
      guaranteed to succeed and once a claiming block is started, other
      bd_start_claiming() or bd_claim() attempts block till the current
      claiming block is terminated.
      
      bd_claim() can still be used standalone although now it always
      synchronizes against claiming blocks, so the existing users will keep
      working without any change.
      
      blkdev_open() and open_bdev_exclusive() are converted to use claiming
      blocks so that exclusive open attempts from these functions don't
      interfere with the existing exclusive open.
      
      This problem was discovered while investigating bko#15403.
      
        https://bugzilla.kernel.org/show_bug.cgi?id=15403
      
      The burning problem itself can be resolved by updating userspace
      probing tools to always open w/ O_EXCL.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NMatthias-Christian Ott <ott@mirix.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      6b4517a7
    • T
      block: factor out bd_may_claim() · 1a3cbbc5
      Tejun Heo 提交于
      Factor out bd_may_claim() from bd_claim(), add comments and apply a
      couple of cosmetic edits.  This is to prepare for further updates to
      claim path.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      1a3cbbc5
    • N
      nfsd4: bug in read_buf · 2bc3c117
      Neil Brown 提交于
      When read_buf is called to move over to the next page in the pagelist
      of an NFSv4 request, it sets argp->end to essentially a random
      number, certainly not an address within the page which argp->p now
      points to.  So subsequent calls to READ_BUF will think there is much
      more than a page of spare space (the cast to u32 ensures an unsigned
      comparison) so we can expect to fall off the end of the second
      page.
      
      We never encountered thsi in testing because typically the only
      operations which use more than two pages are write-like operations,
      which have their own decoding logic.  Something like a getattr after a
      write may cross a page boundary, but it would be very unusual for it to
      cross another boundary after that.
      
      Cc: stable@kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      2bc3c117
    • D
      xfs: more swap extent fixes for dynamic fork offsets · dd77ef92
      Dave Chinner 提交于
      A new xfsqa test (226) with a prototype xfs_fsr change to try to
      handle dynamic fork offsets better triggers an assertion failure
      where the inode data fork is in btree format, yet there is room in
      the inode for it to be in extent format. The two inodes look like:
      
      before: ino 0x101 (target), num_extents 11, Max in-fork extents 6, broot size 40, fork offset 96
      before: ino 0x115 (temp),  num_extents 5, Max in-fork extents 3, broot size 40, fork offset 56
      after: ino 0x101 (target), num_extents 5, Max in-fork extents 6, broot size 40, fork offset 96
      after: ino 0x115 (temp), num_extents 11, Max in-fork extents 3, broot size 40, fork offset 56
      
      Basically the target inode ends up with 5 extents in btree format,
      but it had space for 6 extents in extent format, so ends up
      incorrect. Notably here the broot size is the same, and that is
      where the kernel code is going wrong - the btree root will fit, so
      it lets the swap go ahead.
      
      The check should not allow the swap to take place if the number of
      extents while in btree format is less than the number of extents
      that can fit in the inode in extent format. Adding that check will
      prevent this swap and corruption from occurring.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      dd77ef92
  6. 26 4月, 2010 1 次提交
  7. 25 4月, 2010 7 次提交
    • J
      Catch filesystems lacking s_bdi · 5129a469
      Jörn Engel 提交于
      noop_backing_dev_info is used only as a flag to mark filesystems that
      don't have any backing store, like tmpfs, procfs, spufs, etc.
      Signed-off-by: NJoern Engel <joern@logfs.org>
      
      Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
      to the noop_backing_dev_info is not legal and will not result in
      them being flushed, but we already catch this condition in
      __mark_inode_dirty() when checking for a registered bdi.
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      5129a469
    • P
      squashfs: fix potential buffer over-run on 4K block file systems · e0d1f700
      Phillip Lougher 提交于
      Sizing the buffer based on block size is incorrect, leading
      to a potential buffer over-run on 4K block size file systems
      (because the metadata block size is always 8K).  This bug
      doesn't seem have triggered because 4K block size file systems
      are not default, and also because metadata blocks after
      compression tend to be less than 4K.
      Signed-off-by: NPhillip Lougher <phillip@lougher.demon.co.uk>
      e0d1f700
    • P
      squashfs: add missing buffer free · 370ec3d1
      Phillip Lougher 提交于
      Signed-off-by: NPhillip Lougher <phillip@lougher.demon.co.uk>
      370ec3d1
    • P
      squashfs: fix warn_on when root inode is corrupted · 1cb08e97
      Phillip Lougher 提交于
      Fix warn_on triggered by mounting a fsfuzzer corrupted file system, where
      the root inode has been corrupted.
      Signed-off-by: NPhillip Lougher <phillip@lougher.demon.co.uk>
      Reported-by: NSteve Grubb <sgrubb@redhat.com>
      1cb08e97
    • A
      fs/block_dev.c: fix performance regression in O_DIRECT|O_SYNC writes to block devices · b8af67e2
      Anton Blanchard 提交于
      We are seeing a large regression in database performance on recent
      kernels.  The database opens a block device with O_DIRECT|O_SYNC and a
      number of threads write to different regions of the file at the same time.
      
      A simple test case is below.  I haven't defined DEVICE since getting it
      wrong will destroy your data :) On an 3 disk LVM with a 64k chunk size we
      see about 17MB/sec and only a few threads in IO wait:
      
      procs  -----io---- -system-- -----cpu------
       r  b     bi    bo   in   cs us sy id wa st
       0  3      0 16170  656 2259  0  0 86 14  0
       0  2      0 16704  695 2408  0  0 92  8  0
       0  2      0 17308  744 2653  0  0 86 14  0
       0  2      0 17933  759 2777  0  0 89 10  0
      
      Most threads are blocking in vfs_fsync_range, which has:
      
              mutex_lock(&mapping->host->i_mutex);
              err = fop->fsync(file, dentry, datasync);
              if (!ret)
                      ret = err;
              mutex_unlock(&mapping->host->i_mutex);
      
      commit 148f948b (vfs: Introduce new
      helpers for syncing after writing to O_SYNC file or IS_SYNC inode) offers
      some explanation of what is going on:
      
          Use these new helpers for syncing from generic VFS functions. This makes
          O_SYNC writes to block devices acquire i_mutex for syncing. If we really
          care about this, we can make block_fsync() drop the i_mutex and reacquire
          it before it returns.
      
      Thanks Jan for such a good commit message!  As well as dropping i_mutex,
      Christoph suggests we should remove the call to sync_blockdev():
      
      > sync_blockdev is an overcomplicated alias for filemap_write_and_wait on
      > the block device inode, which is exactly what we did just before calling
      > into ->fsync
      
      The patch below incorporates both suggestions. With it the testcase improves
      from 17MB/s to 68M/sec:
      
      procs  -----io---- -system-- -----cpu------
       r  b     bi    bo   in   cs us sy id wa st
       0  7      0 65536 1000 3878  0  0 70 30  0
       0 34      0 69632 1016 3921  0  1 46 53  0
       0 57      0 69632 1000 3921  0  0 55 45  0
       0 53      0 69640  754 4111  0  0 81 19  0
      
      Testcase:
      
      #define _GNU_SOURCE
      #include <stdio.h>
      #include <pthread.h>
      #include <unistd.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/types.h>
      #include <sys/stat.h>
      #include <fcntl.h>
      
      #define NR_THREADS 64
      #define BUFSIZE (64 * 1024)
      
      #define DEVICE "/dev/mapper/XXXXXX"
      
      #define ALIGN(VAL, SIZE) (((VAL)+(SIZE)-1) & ~((SIZE)-1))
      
      static int fd;
      
      static void *doit(void *arg)
      {
      	unsigned long offset = (long)arg;
      	char *b, *buf;
      
      	b = malloc(BUFSIZE + 1024);
      	buf = (char *)ALIGN((unsigned long)b, 1024);
      	memset(buf, 0, BUFSIZE);
      
      	while (1)
      		pwrite(fd, buf, BUFSIZE, offset);
      }
      
      int main(int argc, char *argv[])
      {
      	int flags = O_RDWR|O_DIRECT;
      	int i;
      	unsigned long offset = 0;
      
      	if (argc > 1 && !strcmp(argv[1], "O_SYNC"))
      		flags |= O_SYNC;
      
      	fd = open(DEVICE, flags);
      	if (fd == -1) {
      		perror("open");
      		exit(1);
      	}
      
      	for (i = 0; i < NR_THREADS-1; i++) {
      		pthread_t tid;
      		pthread_create(&tid, NULL, doit, (void *)offset);
      		offset += BUFSIZE;
      	}
      	doit((void *)offset);
      
      	return 0;
      }
      Signed-off-by: NAnton Blanchard <anton@samba.org>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b8af67e2
    • J
      reiserfs: fix corruption during shrinking of xattrs · fb2162df
      Jeff Mahoney 提交于
      Commit 48b32a35 ("reiserfs: use generic
      xattr handlers") introduced a problem that causes corruption when extended
      attributes are replaced with a smaller value.
      
      The issue is that the reiserfs_setattr to shrink the xattr file was moved
      from before the write to after the write.
      
      The root issue has always been in the reiserfs xattr code, but was papered
      over by the fact that in the shrink case, the file would just be expanded
      again while the xattr was written.
      
      The end result is that the last 8 bytes of xattr data are lost.
      
      This patch fixes it to use new_size.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=14826Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reported-by: NChristian Kujau <lists@nerdbynature.de>
      Tested-by: NChristian Kujau <lists@nerdbynature.de>
      Cc: Edward Shishkin <edward.shishkin@gmail.com>
      Cc: Jethro Beekman <kernel@jbeekman.nl>
      Cc: Greg Surbey <gregsurbey@hotmail.com>
      Cc: Marco Gatti <marco.gatti@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fb2162df
    • J
      reiserfs: fix permissions on .reiserfs_priv · cac36f70
      Jeff Mahoney 提交于
      Commit 677c9b2e ("reiserfs: remove
      privroot hiding in lookup") removed the magic from the lookup code to hide
      the .reiserfs_priv directory since it was getting loaded at mount-time
      instead.  The intent was that the entry would be hidden from the user via
      a poisoned d_compare, but this was faulty.
      
      This introduced a security issue where unprivileged users could access and
      modify extended attributes or ACLs belonging to other users, including
      root.
      
      This patch resolves the issue by properly hiding .reiserfs_priv.  This was
      the intent of the xattr poisoning code, but it appears to have never
      worked as expected.  This is fixed by using d_revalidate instead of
      d_compare.
      
      This patch makes -oexpose_privroot a no-op.  I'm fine leaving it this way.
      The effort involved in working out the corner cases wrt permissions and
      caching outweigh the benefit of the feature.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Acked-by: NEdward Shishkin <edward.shishkin@gmail.com>
      Reported-by: NMatt McCutchen <matt@mattmccutchen.net>
      Tested-by: NMatt McCutchen <matt@mattmccutchen.net>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cac36f70
  8. 24 4月, 2010 1 次提交
  9. 23 4月, 2010 1 次提交
  10. 22 4月, 2010 9 次提交
  11. 21 4月, 2010 3 次提交