1. 05 4月, 2008 1 次提交
    • L
      Be more careful about marking buffers dirty · 1be62dc1
      Linus Torvalds 提交于
      Mikulas Patocka noted that the optimization where we check if a buffer
      was already dirty (and we avoid re-dirtying it) was not really SMP-safe.
      
      Since the read of the old status was not synchronized with anything, an
      aggressive CPU re-ordering of memory accesses might have moved that read
      up to before the data was even written to the buffer, and another CPU
      that cleaned it again, causing the newly dirty state to never actually
      hit the disk.
      
      Admittedly this would probably never trigger in practice, but it's still
      wrong.
      
      Mikulas sent a patch that fixed the problem, but I dislike the subtlety
      of the whole optimization, so this is an alternate fix that is more
      explicit about the particular SMP ordering for the optimization, and
      separates out the speculative reads of the buffer state into its own
      conditional (and makes the memory barrier only happen if we are likely
      to actually hit the optimized case in the first place).
      
      I considered removing the optimization entirely, but Andrew argued for
      it's continued existence. I'm a push-over.
      
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1be62dc1
  2. 04 4月, 2008 2 次提交
  3. 03 4月, 2008 1 次提交
  4. 02 4月, 2008 1 次提交
  5. 31 3月, 2008 3 次提交
  6. 29 3月, 2008 2 次提交
    • S
      afs: prevent double cell registration · 5214b729
      Sven Schnelle 提交于
      kafs doesn't check if the cell already exists - so if you do an echo "add
      newcell.org 1.2.3.4" >/proc/fs/afs/cells it will try to create this cell
      again.  kobject will also complain about a double registration.  To prevent
      such problems, return -EEXIST in that case.
      Signed-off-by: NSven Schnelle <svens@stackframe.org>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5214b729
    • D
      vfs: fix data leak in nobh_write_end() · 5b41e74a
      Dmitri Monakhov 提交于
      Current nobh_write_end() implementation ignore partial writes(copied < len)
      case if page was fully mapped and simply mark page as Uptodate, which is
      totally wrong because area [pos+copied, pos+len) wasn't updated explicitly in
      previous write_begin call.  It simply contains garbage from pagecache and
      result in data leakage.
      
      #TEST_CASE_BEGIN:
      ~~~~~~~~~~~~~~~~
      In fact issue triggered by classical testcase
      	open("/mnt/test", O_RDWR|O_CREAT|O_TRUNC, 0666) = 3
      	ftruncate(3, 409600)                    = 0
      	writev(3, [{"a", 1}, {NULL, 4095}], 2)  = 1
      ##TESTCASE_SOURCE:
      ~~~~~~~~~~~~~~~~~
      #include <stdio.h>
      #include <stdlib.h>
      #include <fcntl.h>
      #include <sys/uio.h>
      #include <sys/mman.h>
      #include <errno.h>
      int main(int argc, char **argv)
      {
      	int fd,  ret;
      	void* p;
      	struct iovec iov[2];
      	fd = open(argv[1], O_RDWR|O_CREAT|O_TRUNC, 0666);
      	ftruncate(fd, 409600);
      	iov[0].iov_base="a";
      	iov[0].iov_len=1;
      	iov[1].iov_base=NULL;
      	iov[1].iov_len=4096;
      	ret = writev(fd, iov, sizeof(iov)/sizeof(struct iovec));
      	printf("writev  = %d, err = %d\n", ret, errno);
      	return 0;
      }
      ##TESTCASE RESULT:
      ~~~~~~~~~~~~~~~~~~
      [root@ts63 ~]# mount | grep mnt2
      /dev/mapper/test on /mnt2 type ext2 (rw,nobh)
      [root@ts63 ~]#  /tmp/writev /mnt2/test
      writev  = 1, err = 0
      [root@ts63 ~]# hexdump -C /mnt2/test
      
      00000000  61 65 62 6f 6f 74 00 00  f0 b9 b4 59 3a 00 00 00  |aeboot.....Y:...|
      00000010  20 00 00 00 00 00 00 00  21 00 00 00 00 00 00 00  | .......!.......|
      00000020  df df df df df df df df  df df df df df df df df  |................|
      00000030  3a 00 00 00 2a 00 00 00  21 00 00 00 00 00 00 00  |:...*...!.......|
      00000040  60 c0 8c 00 00 00 00 00  40 4a 8d 00 00 00 00 00  |`.......@J......|
      00000050  00 00 00 00 00 00 00 00  41 00 00 00 00 00 00 00  |........A.......|
      00000060  74 69 6d 65 20 64 64 20  69 66 3d 2f 64 65 76 2f  |time dd if=/dev/|
      00000070  6c 6f 6f 70 30 20 20 6f  66 3d 2f 64 65 76 2f 6e  |loop0  of=/dev/n|
      skip..
      00000f50  00 00 00 00 00 00 00 00  31 00 00 00 00 00 00 00  |........1.......|
      00000f60  6d 6b 66 73 2e 65 78 74  33 20 2f 64 65 76 2f 76  |mkfs.ext3 /dev/v|
      00000f70  7a 76 67 2f 74 65 73 74  20 2d 62 34 30 39 36 00  |zvg/test -b4096.|
      00000f80  a0 fe 8c 00 00 00 00 00  21 00 00 00 00 00 00 00  |........!.......|
      00000f90  23 31 32 30 35 39 35 30  34 30 34 00 3a 00 00 00  |#1205950404.:...|
      00000fa0  20 00 8d 00 00 00 00 00  21 00 00 00 00 00 00 00  | .......!.......|
      00000fb0  d0 cf 8c 00 00 00 00 00  10 d0 8c 00 00 00 00 00  |................|
      00000fc0  00 00 00 00 00 00 00 00  41 00 00 00 00 00 00 00  |........A.......|
      00000fd0  6d 6f 75 6e 74 20 2f 64  65 76 2f 76 7a 76 67 2f  |mount /dev/vzvg/|
      00000fe0  74 65 73 74 20 20 2f 76  7a 20 2d 6f 20 64 61 74  |test  /vz -o dat|
      00000ff0  61 3d 77 72 69 74 65 62  61 63 6b 00 00 00 00 00  |a=writeback.....|
      00001000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
      
      As you can see file's page contains garbage from pagecache instead of zeros.
      #TEST_CASE_END
      
      Attached patch:
      - Add sanity check BUG_ON in order to prevent incorrect usage by caller,
        This is function invariant because page can has buffers and in no zero
        *fadata pointer at the same time.
      - Always attach buffers to page is it is partial write case.
      - Always switch back to generic_write_end if page has buffers.
        This is reasonable because if page already has buffer then generic_write_begin
        was called previously.
      Signed-off-by: NDmitri Monakhov <dmonakhov@openvz.org>
      Reviewed-by: NNick Piggin <npiggin@suse.de>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b41e74a
  7. 28 3月, 2008 5 次提交
  8. 25 3月, 2008 1 次提交
  9. 23 3月, 2008 2 次提交
  10. 21 3月, 2008 1 次提交
    • A
      [NET]: Fix permissions of /proc/net · 4f42c288
      Andre Noll 提交于
      commit e9720acd ([NET]: Make /proc/net a symlink on /proc/self/net (v3))
      broke ganglia and probably other applications that read /proc/net/dev.
      
      This is due to the change of permissions of /proc/net that was
      introduced in that commit.
      
      Before: dr-xr-xr-x 5 root root 0 Mar 19 11:30 /proc/net
      After: dr-xr--r-- 5 root root 0 Mar 19 11:29 /proc/self/net
      
      This patch restores the permissions to the old value which makes
      ganglia happy again.
      
      Pavel Emelyanov says:
      
      	This also broke the postfix, as it was reported in bug #10286
      	and described in detail by Benjamin.
      Signed-off-by: NAndre Noll <maan@systemlinux.org>
      Acked-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f42c288
  11. 20 3月, 2008 12 次提交
    • A
      fs/ufs/balloc.c: fix sparc64 printk warning · 9df13039
      Andrew Morton 提交于
      fs/ufs/balloc.c: In function `ufs_change_blocknr':
      fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 2)
      fs/ufs/balloc.c:317: warning: long long unsigned int format, long unsigned int arg (arg 3)
      
      sector_t is u64 and we don't know what type the architecture uses to implement
      u64.
      
      Cc: Evgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9df13039
    • D
      zisofs: fix readpage() outside i_size · 08ca0db8
      Dave Young 提交于
      A read request outside i_size will be handled in do_generic_file_read().  So
      we just return 0 to avoid getting -EIO as normal reading, let
      do_generic_file_read do the rest.
      
      At the same time we need unlock the page to avoid system stuck.
      
      Fixes http://bugzilla.kernel.org/show_bug.cgi?id=10227Signed-off-by: NDave Young <hidave.darkstar@gmail.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Report-by: NChristian Perle <chris@linuxinfotag.de>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      08ca0db8
    • R
      fs: fix kernel-doc notation warnings · a6b91919
      Randy Dunlap 提交于
      Fix kernel-doc notation warnings in fs/.
      
      Warning(mmotm-2008-0314-1449//fs/super.c:560): missing initial short description on line:
       *	mark_files_ro
      Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
       *	lease_get_mtime
      Warning(mmotm-2008-0314-1449//fs/locks.c:1277): missing initial short description on line:
       *	lease_get_mtime
      Warning(mmotm-2008-0314-1449//fs/namei.c:1368): missing initial short description on line:
       * lookup_one_len:  filesystem helper to lookup single pathname component
      Warning(mmotm-2008-0314-1449//fs/buffer.c:3221): missing initial short description on line:
       * bh_uptodate_or_lock: Test whether the buffer is uptodate
      Warning(mmotm-2008-0314-1449//fs/buffer.c:3240): missing initial short description on line:
       * bh_submit_read: Submit a locked buffer for reading
      Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:30): missing initial short description on line:
       * writeback_acquire: attempt to get exclusive writeback access to a device
      Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:47): missing initial short description on line:
       * writeback_in_progress: determine whether there is writeback in progress
      Warning(mmotm-2008-0314-1449//fs/fs-writeback.c:58): missing initial short description on line:
       * writeback_release: relinquish exclusive writeback access against a device.
      Warning(mmotm-2008-0314-1449//include/linux/jbd.h:351): contents before sections
      Warning(mmotm-2008-0314-1449//include/linux/jbd.h:561): contents before sections
      Warning(mmotm-2008-0314-1449//fs/jbd/transaction.c:1935): missing initial short description on line:
       * void journal_invalidatepage()
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a6b91919
    • M
      eCryptfs: Swap dput() and mntput() · 5366dc9f
      Michael Halcrow 提交于
      ecryptfs_d_release() is doing a mntput before doing the dput.  This patch
      moves the dput before the mntput.
      
      Thanks to Rajouri Jammu for reporting this.
      Signed-off-by: NMichael Halcrow <mhalcrow@us.ibm.com>
      Cc: Rajouri Jammu <rajouri.jammu@gmail.com>
      Cc: Eric Sandeen <sandeen@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5366dc9f
    • D
      jbd2: correctly unescape journal data blocks · d0025676
      Duane Griffin 提交于
      Fix a long-standing typo (predating git) that will cause data corruption if a
      journal data block needs unescaping.  At the moment the wrong buffer head's
      data is being unescaped.
      
      To test this case mount a filesystem with data=journal, start creating and
      deleting a bunch of files containing only JBD2_MAGIC_NUMBER (0xc03b3998), then
      pull the plug on the device.  Without this patch the files will contain zeros
      instead of the correct data after recovery.
      Signed-off-by: NDuane Griffin <duaneg@dghda.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0025676
    • D
      jbd: correctly unescape journal data blocks · 439aeec6
      Duane Griffin 提交于
      Fix a long-standing typo (predating git) that will cause data corruption if a
      journal data block needs unescaping.  At the moment the wrong buffer head's
      data is being unescaped.
      
      To test this case mount a filesystem with data=journal, start creating and
      deleting a bunch of files containing only JFS_MAGIC_NUMBER (0xc03b3998), then
      pull the plug on the device.  Without this patch the files will contain zeros
      instead of the correct data after recovery.
      Signed-off-by: NDuane Griffin <duaneg@dghda.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      439aeec6
    • D
      ROMFS: Fix up an error in iget removal · 4ebf8984
      David Howells 提交于
      Fix up an error in iget removal in which romfs_lookup() making a successful
      call to romfs_iget() continues through the negative/error handling (previously
      the successful case jumped around the negative/error handling case):
      
       (1) inode is initialised to NULL at the top of the function, eliminating the
           need for specific negative-inode handling.  This means the positive
           success handling now flows straight through.
      
       (2) Rename the labels to be clearer about what they mean.
      
      Also make romfs_lookup()'s result variable of type long so as to avoid
      32-bit/64-bit conversions with PTR_ERR() and friends.
      
      Based upon a report and patch from Adam Richter.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Acked-by: N"Adam J. Richter" <adam@yggdrasil.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4ebf8984
    • J
      ext3: fix wrong gfp type under transaction · c587f0c0
      Josef Bacik 提交于
      There are several places where we make allocations with GFP_KERNEL while under
      a transaction, which could lead to an assertion panic or lockup if under
      memory pressure.  This patch switches these problem areas to use GFP_NOFS to
      keep these problems from happening.
      Signed-off-by: NJosef Bacik <jbacik@redhat.com>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c587f0c0
    • J
      quota: add possibly missing iput() when quotaon and quotaoff races · 87cb055b
      Jan Kara 提交于
      We should always put inode we have reference to, even if quota was reenabled
      in the mean time.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      87cb055b
    • R
      jbd: fix jbd kernel-doc notation · 0cf01f66
      Randy Dunlap 提交于
      Fix kernel-doc notation in jbd.
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0cf01f66
    • Q
      aio: bad AIO race in aio_complete() leads to process hang · 6cb2a210
      Quentin Barnes 提交于
      My group ran into a AIO process hang on a 2.6.24 kernel with the process
      sleeping indefinitely in io_getevents(2) waiting for the last wakeup to come
      and it never would.
      
      We ran the tests on x86_64 SMP.  The hang only occurred on a Xeon box
      ("Clovertown") but not a Core2Duo ("Conroe").  On the Xeon, the L2 cache isn't
      shared between all eight processors, but is L2 is shared between between all
      two processors on the Core2Duo we use.
      
      My analysis of the hang is if you go down to the second while-loop
      in read_events(), what happens on processor #1:
      	1) add_wait_queue_exclusive() adds thread to ctx->wait
      	2) aio_read_evt() to check tail
      	3) if aio_read_evt() returned 0, call [io_]schedule() and sleep
      
      In aio_complete() with processor #2:
      	A) info->tail = tail;
      	B) waitqueue_active(&ctx->wait)
      	C) if waitqueue_active() returned non-0, call wake_up()
      
      The way the code is written, step 1 must be seen by all other processors
      before processor 1 checks for pending events in step 2 (that were recorded by
      step A) and step A by processor 2 must be seen by all other processors
      (checked in step 2) before step B is done.
      
      The race I believed I was seeing is that steps 1 and 2 were
      effectively swapped due to the __list_add() being delayed by the L2
      cache not shared by some of the other processors.  Imagine:
      proc 2: just before step A
      proc 1, step 1: adds to ctx->wait, but is not visible by other processors yet
      proc 1, step 2: checks tail and sees no pending events
      proc 2, step A: updates tail
      proc 1, step 3: calls [io_]schedule() and sleeps
      proc 2, step B: checks ctx->wait, but sees no one waiting, skips wakeup
                      so proc 1 sleeps indefinitely
      
      My patch adds a memory barrier between steps A and B.  It ensures that the
      update in step 1 gets seen on processor 2 before continuing.  If processor 1
      was just before step 1, the memory barrier makes sure that step A (update
      tail) gets seen by the time processor 1 makes it to step 2 (check tail).
      
      Before the patch our AIO process would hang virtually 100% of the time.  After
      the patch, we have yet to see the process ever hang.
      Signed-off-by: NQuentin Barnes <qbarnes+linux@yahoo-inc.com>
      Reviewed-by: NZach Brown <zach.brown@oracle.com>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: <stable@kernel.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      [ We should probably disallow that "if (waitqueue_active()) wake_up()"
        coding pattern, because it's so often buggy wrt memory ordering ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6cb2a210
    • F
      nfs: don't ignore return value from nfs_pageio_add_request · f8512ad0
      Fred Isaman 提交于
      Ignoring the return value from nfs_pageio_add_request can cause deadlocks.
      
      In read path:
        call nfs_pageio_add_request from readpage_async_filler
        assume at this point that there are requests already in desc, that
          can't be merged with the current request.
        so nfs_pageio_doio is fired up to clear out desc.
        assume something goes wrong in setting up the io, so desc->pg_error is set.
        This causes nfs_pageio_add_request to return 0, *WITHOUT* adding the original
          request.
        BUT, since return code is ignored, readpage_async_filler assumes it has
          been added, and does nothing further, leaving page locked.
        do_generic_mapping_read will eventually call lock_page, resulting in deadlock
      
      In write path:
        page is marked dirty by generic_perform_write
        nfs_writepages is called
        call nfs_pageio_add_request from nfs_page_async_flush
        assume at this point that there are requests already in desc, that
          can't be merged with the current request.
        so nfs_pageio_doio is fired up to clear out desc.
        assume something goes wrong in setting up the io, so desc->pg_error is set.
        This causes nfs_page_async_flush to return 0, *WITHOUT* adding the original
          request, yet marking the request as locked (PG_BUSY) and in writeback,
          clearing dirty marks.
        The next time a write is done to the page, deadlock will result as
          nfs_write_end calls nfs_update_request
      Signed-off-by: NFred Isaman <iisaman@citi.umich.edu>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      f8512ad0
  12. 19 3月, 2008 7 次提交
  13. 18 3月, 2008 2 次提交