1. 22 Feb 2013, 4 commits
    • ocfs2: remove kfree() redundant null checks · d787ab09
      Tim Gardner committed
      smatch analysis indicates a number of redundant NULL checks before
      calling kfree(), eg:
      
        fs/ocfs2/alloc.c:6138 ocfs2_begin_truncate_log_recovery() info:
         redundant null check on *tl_copy calling kfree()
      
        fs/ocfs2/alloc.c:6755 ocfs2_zero_range_for_truncate() info:
         redundant null check on pages calling kfree()
      
      etc....
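      The pattern being removed is simply the pre-check; since kfree(NULL) is a
      no-op, the call can be made unconditionally. A minimal sketch, using tl_copy
      from the report above as the example variable:

        /* before: the NULL test adds nothing, kfree(NULL) is a no-op */
        if (tl_copy)
                kfree(tl_copy);

        /* after */
        kfree(tl_copy);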
      
      [akpm@linux-foundation.org: revert dubious change in ocfs2_begin_truncate_log_recovery()]
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Acked-by: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • configfs: move the dereference below the NULL test · 49deb4bc
      Wei Yongjun committed
      The dereference should be moved below the NULL test.
      
      spatch with a semantic match is used to find this.
      (http://coccinelle.lip6.fr/)
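      The shape of the bug the semantic patch catches is dereferencing a pointer
      and only then checking it for NULL. A hypothetical illustration (struct foo
      and its field are made up for this sketch, not the actual configfs code):

        struct foo { int len; };        /* hypothetical, for illustration only */

        static int example(struct foo *p)
        {
                int len = p->len;       /* dereference happens here ... */

                if (!p)                 /* ... so this NULL test comes too late */
                        return -1;

                /* fix: perform the NULL test before touching *p */
                return len;
        }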
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/block_dev.c: page cache wrongly left invalidated after revalidate_disk() · 7630b661
      MITSUNARI Shigeo committed
      We found that bdev->bd_invalidated was left set once revalidate_disk()
      is called, which results in a page cache flush every time that device is
      opened.

      Specifically, we found this problem with MD block devices.  Once we resize
      an MD device, mdadm --monitor flushes all page cache for that
      device every 60 or 1000 seconds when it opens the device.

      This bug has existed since at least 3.2.0 up to the latest kernel (3.6.2).  A
      patch is attached.
      
      The following steps will reproduce the problem.
      
      1. prepare a block device (e.g. /dev/sdb).
      
      2. create two partitions:
      
         sudo parted /dev/sdb
         mklabel gpt
         mkpart primary 0% 50%
         mkpart primary 50% 100%
      
      3. create a md device.
      
         sudo mdadm -C /dev/md/hoge -l 1 -n 2 -e 1.2 --assume-clean --auto=md --symlink=no /dev/sdb1 /dev/sdb2
      
      4. create file system and mount it
      
         sudo mkfs.ext3 /dev/md/hoge
         sudo mkdir /mnt/test
         sudo mount /dev/md/hoge /mnt/test
      
      5. try to resize the device
      
         sudo mdadm -G /dev/md/hoge --size=max
      
      6. create a file to fill the page cache.

        sudo dd if=/dev/urandom of=/mnt/test/data bs=1M count=10

      and verify the current state of the cache with the free command.
      
      7. The mdadm monitor will open the md device every 1000 seconds and you
         will find that all page cache for the device is cleared.
      
      The timing can be reduced by the following steps.
      
      a) kill mdadm and restart it with the --delay option
      
         /sbin/mdadm --monitor --delay=30 --pid-file /var/run/mdadm/monitor.pid --daemonise --scan --syslog
      
      or open the md device directly.
      
         sudo dd if=/dev/md/hoge of=/dev/null bs=4096 count=1
      Signed-off-by: MITSUNARI Shigeo <herumi@nifty.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • inotify: remove broken mask checks causing unmount to be EINVAL · 676a0675
      Jim Somerville committed
      Running the command:
      
      	inotifywait -e unmount /mnt/disk
      
      immediately aborts with a -EINVAL return code.  This is however a valid
      parameter.  This abort occurs only if unmount is the sole event
      parameter.  If other event parameters are supplied, then the unmount
      event wait will work.
      
      The problem was introduced by commit 44b350fc ("inotify: Fix mask
      checks").  In that commit, it states:
      
      	The mask checks in inotify_update_existing_watch() and
      	inotify_new_watch() are useless because inotify_arg_to_mask()
      	sets FS_IN_IGNORED and FS_EVENT_ON_CHILD bits anyway.
      
      But instead of removing the useless checks, it did this:
      
      	        mask = inotify_arg_to_mask(arg);
      	-       if (unlikely(!mask))
      	+       if (unlikely(!(mask & IN_ALL_EVENTS)))
      	                return -EINVAL;
      
      The problem is that IN_ALL_EVENTS doesn't include IN_UNMOUNT, and other
      parts of the code keep IN_UNMOUNT separate from IN_ALL_EVENTS.  So the
      check should be:
      
      	if (unlikely(!(mask & (IN_ALL_EVENTS | IN_UNMOUNT))))
      
      But inotify_arg_to_mask(arg) always sets the IN_UNMOUNT bit in the mask
      anyway, so the check is always going to pass and thus should simply be
      removed.  Also note that inotify_arg_to_mask completely controls what
      mask bits get set from arg, there's no way for invalid bits to get
      enabled there.
      
      Let's fix it by simply removing the useless broken checks.
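      For reference, a minimal userspace reproducer of the inotifywait case above
      (a sketch, not part of the patch); with the broken check in place, the
      inotify_add_watch() call fails with EINVAL:

        #include <stdio.h>
        #include <sys/inotify.h>

        int main(int argc, char **argv)
        {
                const char *path = argc > 1 ? argv[1] : "/mnt/disk";
                int fd, wd;

                fd = inotify_init();
                if (fd < 0) {
                        perror("inotify_init");
                        return 1;
                }

                /* unmount as the sole requested event */
                wd = inotify_add_watch(fd, path, IN_UNMOUNT);
                if (wd < 0)
                        perror("inotify_add_watch");    /* EINVAL before the fix */
                else
                        printf("watch added, wd=%d\n", wd);
                return 0;
        }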
      Signed-off-by: Jim Somerville <Jim.Somerville@windriver.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Jerome Marchand <jmarchan@redhat.com>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: <stable@vger.kernel.org>		[2.6.37+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 20 Feb 2013, 1 commit
  3. 19 Feb 2013, 2 commits
  4. 18 Feb 2013, 3 commits
    • umount oops when removing blocklayoutdriver first · 5a12cca6
      fanchaoting committed
      When the pnfs client uses the block layout, it is possible to remove the
      blocklayoutdriver module first.  If we umount later, this can cause an oops
      in unset_pnfs_layoutdriver(), because nfss->pnfs_curr_ld->clear_layoutdriver
      is no longer valid.
      
      To reproduce it:
       modprobe  blocklayoutdriver
       mount -t nfs4 -o minorversion=1 pnfsip:/ /mnt/
       rmmod blocklayoutdriver
       umount /mnt
      
      then you will see the following oops:
      
      CPU 0
      Pid: 17023, comm: umount.nfs4 Tainted: GF          O 3.7.0-rc6-pnfs #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
      RIP: 0010:[<ffffffffa04cfe6d>]  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
      RSP: 0018:ffff8800022d9e48  EFLAGS: 00010286
      RAX: ffffffffa04a1b00 RBX: ffff88000b013800 RCX: 0000000000000001
      RDX: ffffffff81ae8ee0 RSI: ffff880001ee94b8 RDI: ffff88000b013800
      RBP: ffff8800022d9e58 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff880001ee9400
      R13: ffff8800105978c0 R14: 00007fff25846c08 R15: 0000000001bba550
      FS:  00007f45ae7f0700(0000) GS:ffff880012c00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: ffffffffa04a1b38 CR3: 0000000002c0c000 CR4: 00000000000006f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process umount.nfs4 (pid: 17023, threadinfo ffff8800022d8000, task ffff880006e48aa0)
      Stack:
      ffff8800105978c0 ffff88000b013800 ffff8800022d9e78 ffffffffa04cd0ce
      ffff8800022d9e78 ffff88000b013800 ffff8800022d9ea8 ffffffffa04755a7
      ffff8800022d9ea8 ffff880002f96400 ffff88000b013800 ffff880002f96400
      Call Trace:
      [<ffffffffa04cd0ce>] nfs4_destroy_server+0x1e/0x30 [nfsv4]
      [<ffffffffa04755a7>] nfs_free_server+0xb7/0x150 [nfs]
      [<ffffffffa047d4d5>] nfs_kill_super+0x35/0x40 [nfs]
      [<ffffffff81178d35>] deactivate_locked_super+0x45/0x70
      [<ffffffff8117986a>] deactivate_super+0x4a/0x70
      [<ffffffff81193ee2>] mntput_no_expire+0xd2/0x130
      [<ffffffff81194d62>] sys_umount+0x72/0xe0
      [<ffffffff8154af59>] system_call_fastpath+0x16/0x1b
      Code: 06 e1 b8 ea ff ff ff eb 9e 0f 1f 44 00 00 55 48 89 e5 53 48 83 ec 08 66 66 66 66 90 48 8b 87 80 03 00 00 48 89 fb 48 85 c0 74 29 <48> 8b 40 38 48 85 c0 74 02 ff d0 48 8b 03 3e ff 48 04 0f 94 c2
      RIP  [<ffffffffa04cfe6d>] unset_pnfs_layoutdriver+0x1d/0x70 [nfsv4]
      RSP <ffff8800022d9e48>
      CR2: ffffffffa04a1b38
      ---[ end trace 29f75aaedda058bf ]---
      
      Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
    • nfs: remove kfree() redundant null checks · 96aa1549
      Tim Gardner committed
      smatch analysis:
      
      fs/nfs/getroot.c:130 nfs_get_root() info: redundant null
       check on name calling kfree()
      
      fs/nfs/unlink.c:272 nfs_async_unlink() info: redundant null
       check on devname_garbage calling kfree()
      
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: linux-nfs@vger.kernel.org
      Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • NFSv4.1: Don't decode skipped layoutgets · 085b7a45
      Weston Andros Adamson committed
      layoutget's prepare hook can call rpc_exit with status = NFS4_OK (0).
      Because of this, nfs4_proc_layoutget can't depend on a 0 status to mean
      that the RPC was successfully sent, received and parsed.
      
      To fix this, use the result's len member to see if parsing took place.
      
      This fixes the following OOPS -- calling xdr_init_decode() with a buffer length
      0 doesn't set the stream's 'p' member and ends up using uninitialized memory
      in filelayout_decode_layout.
      
      BUG: unable to handle kernel paging request at 0000000000008050
      IP: [<ffffffff81282e78>] memcpy+0x18/0x120
      PGD 0
      Oops: 0000 [#1] SMP
      last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:01.0/irq
      CPU 1
      Modules linked in: nfs_layout_nfsv41_files nfs lockd fscache auth_rpcgss nfs_acl autofs4 sunrpc ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 dm_mirror dm_region_hash dm_log dm_mod ppdev parport_pc parport snd_ens1371 snd_rawmidi snd_ac97_codec ac97_bus snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc e1000 microcode vmware_balloon i2c_piix4 i2c_core sg shpchp ext4 mbcache jbd2 sr_mod cdrom sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptspi mptscsih mptbase scsi_transport_spi [last unloaded: speedstep_lib]
      
      Pid: 1665, comm: flush-0:22 Not tainted 2.6.32-356-test-2 #2 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
      RIP: 0010:[<ffffffff81282e78>]  [<ffffffff81282e78>] memcpy+0x18/0x120
      RSP: 0018:ffff88003dfab588  EFLAGS: 00010206
      RAX: ffff88003dc42000 RBX: ffff88003dfab610 RCX: 0000000000000009
      RDX: 000000003f807ff0 RSI: 0000000000008050 RDI: ffff88003dc42000
      RBP: ffff88003dfab5b0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000024
      R13: ffff88003dc42000 R14: ffff88003f808030 R15: ffff88003dfab6a0
      FS:  0000000000000000(0000) GS:ffff880003420000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: 0000000000008050 CR3: 000000003bc92000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process flush-0:22 (pid: 1665, threadinfo ffff88003dfaa000, task ffff880037f77540)
      Stack:
      ffffffffa0398ac1 ffff8800397c5940 ffff88003dfab610 ffff88003dfab6a0
      <d> ffff88003dfab5d0 ffff88003dfab680 ffffffffa01c150b ffffea0000d82e70
      <d> 000000508116713b 0000000000000000 0000000000000000 0000000000000000
      Call Trace:
      [<ffffffffa0398ac1>] ? xdr_inline_decode+0xb1/0x120 [sunrpc]
      [<ffffffffa01c150b>] filelayout_decode_layout+0xeb/0x350 [nfs_layout_nfsv41_files]
      [<ffffffffa01c17fc>] filelayout_alloc_lseg+0x8c/0x3c0 [nfs_layout_nfsv41_files]
      [<ffffffff8150e6ce>] ? __wait_on_bit+0x7e/0x90
      Signed-off-by: Weston Andros Adamson <dros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
  5. 15 Feb 2013, 5 commits
    • xfs: xfs_bmap_add_attrfork_local is too generic · 1e82379b
      Dave Chinner committed
      When we are converting local data to an extent format as a result of
      adding an attribute, the type of data contained in the local fork
      determines the behaviour that needs to occur.
      
      xfs_bmap_add_attrfork_local() already handles the directory data
      case specially by using S_ISDIR() and calling out to
      xfs_dir2_sf_to_block(), but with verifiers we now need to handle
      each different type of metadata specially and different metadata
      formats require different verifiers (and eventually block header
      initialisation).
      
      There is only a single place where we add an attribute fork to
      the inode, but that is in the attribute code and it knows nothing
      about the specific contents of the data fork. It is only the case of
      local data that is the issue here, so adding code to handle this
      case in the attribute-specific code is wrong. Hence we are really
      stuck trying to detect the data fork contents in
      xfs_bmap_add_attrfork_local() and performing the correct callout
      there.
      
      Luckily the current cases can be determined by S_IS* macros, and we
      can push the work off to data specific callouts, but each of those
      callouts does a lot of work in common with
      xfs_bmap_local_to_extents(). The only reason that this fails for
      symlinks right now is that xfs_bmap_local_to_extents() assumes
      the data fork contains extent data, and so attaches a bmap extent
      data verifier to the buffer and simply copies the data fork
      information straight into it.
      
      To fix this, allow us to pass a "formatting" callback into
      xfs_bmap_local_to_extents() which is responsible for setting the
      buffer type, initialising it and copying the data fork contents over
      to the new buffer. This allows callers to specify how they want to
      format the new buffer (which is necessary for the upcoming CRC
      enabled metadata blocks) and hence make xfs_bmap_local_to_extents()
      useful for any type of data fork content.
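      As a loose, self-contained illustration of the callback idea (a userspace
      sketch with made-up names, not the actual XFS prototypes): the conversion
      routine stays generic, and the caller supplies a formatter that knows how to
      initialise and fill the new buffer for its particular content type.

        #include <stdio.h>
        #include <string.h>

        typedef void (*format_fn)(char *dst, const char *src, size_t len);

        /* caller-supplied formatter: here, copy a symlink target verbatim */
        static void format_symlink(char *dst, const char *src, size_t len)
        {
                memcpy(dst, src, len);
        }

        /* generic conversion: creates the "block" but defers its layout to fmt */
        static void local_to_extents(const char *local, size_t len, format_fn fmt)
        {
                char block[256] = { 0 };

                fmt(block, local, len);
                printf("new block contains: %s\n", block);
        }

        int main(void)
        {
                const char *target = "some/link/target";

                local_to_extents(target, strlen(target), format_symlink);
                return 0;
        }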
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: remove log force from xfs_buf_trylock() · fa5566e4
      Brian Foster committed
      The trylock log force invoked via xfs_buf_item_push() can attempt
      to acquire xa_lock, thus leading to a recursion bug when called
      with xa_lock held.
      
      This log force was originally added to xfs_buf_trylock() to address
      xfsaild stalls due to pinned and stale buffers. Since the addition
      of this behavior, the log item pushing code had been reworked to
      detect and track pinned items to inform xfsaild to issue a log
      force itself when necessary. As such, the log force on trylock
      failure is redundant and safe to remove.
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: recheck buffer pinned status after push trylock failure · 5337fe9b
      Brian Foster committed
      The buffer pinned check and trylock sequence in xfs_buf_item_push()
      can race with an active transaction on marking the buffer pinned.
      This can result in the buffer becoming pinned and stale after the
      initial check and the trylock failure, but before the check in
      xfs_buf_trylock() that issues a log force. If the log force is
      issued from this context, a spinlock recursion occurs on xa_lock.
      
      Prepare xfs_buf_item_push() to handle the race by detecting a
      pinned buffer after the trylock failure so xfsaild issues a log
      force from a safe context. This, along with various previous fixes,
      renders the log force in xfs_buf_trylock() redundant.
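      In rough outline the push-side handling amounts to the following (a sketch
      only; the xfs_buf_ispinned() helper and the XFS_ITEM_* return codes named
      here follow the usual xfsaild conventions and are not quoted from the patch):

        /* in xfs_buf_item_push(), roughly: */
        if (!xfs_buf_trylock(bp)) {
                if (xfs_buf_ispinned(bp))
                        return XFS_ITEM_PINNED; /* let xfsaild force the log */
                return XFS_ITEM_LOCKED;
        }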
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • xfs: limit speculative prealloc size on sparse files · a1e16c26
      Dave Chinner committed
      Speculative preallocation based on the current file size works well
      for contiguous files, but is sub-optimal for sparse files where the
      EOF preallocation can fill holes and result in large amounts of
      zeros being written when it is not necessary.
      
      The algorithm is modified to prevent EOF speculative preallocation
      from triggering larger allocations on IO patterns of
      truncate-to-zero, seek, write, seek, write, ..., which result in
      non-sparse files for large files.  This, unfortunately, is the way cp
      now behaves when copying sparse files and so needs to be fixed.
      
      What this code does is look at the existing extent adjacent
      to the current EOF and, if it determines that it is a hole, disable
      speculative preallocation altogether.  To prevent the next write from
      doing a large prealloc, it takes the size of subsequent
      preallocations from the current size of the existing EOF extent.
      IOWs, if you leave a hole in the file, it resets preallocation
      behaviour to the same as if it were a zero-size file.
      
      Example new behaviour:
      
      $ xfs_io -f -c "pwrite 0 31m" \
                  -c "pwrite 33m 1m" \
                  -c "pwrite 128m 1m" \
                  -c "fiemap -v" /mnt/scratch/blah
      wrote 32505856/32505856 bytes at offset 0
      31 MiB, 7936 ops; 0.0000 sec (1.608 GiB/sec and 421432.7439 ops/sec)
      wrote 1048576/1048576 bytes at offset 34603008
      1 MiB, 256 ops; 0.0000 sec (1.462 GiB/sec and 383233.5329 ops/sec)
      wrote 1048576/1048576 bytes at offset 134217728
      1 MiB, 256 ops; 0.0000 sec (1.719 GiB/sec and 450704.2254 ops/sec)
      /mnt/scratch/blah:
       EXT: FILE-OFFSET      BLOCK-RANGE      TOTAL FLAGS
         0: [0..65535]:      96..65631        65536   0x0
         1: [65536..67583]:  hole              2048
         2: [67584..69631]:  67680..69727      2048   0x0
         3: [69632..262143]: hole             192512
         4: [262144..264191]: 262240..264287    2048   0x1
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
    • NFSv4.1: Fix bulk recall and destroy of layouts · fd9a8d71
      Trond Myklebust committed
      The current code in pnfs_destroy_all_layouts() assumes that removing
      the layout from the server->layouts list is sufficient to make it
      invisible to other processes. This ignores the fact that most
      users access the layout through the nfs_inode->layout...
      There is further breakage due to lack of reference counting of the
      layouts, meaning that the whole thing Oopses at the drop of a hat.
      
      The code in initiate_bulk_draining() is almost correct, and can be
      used as a model for pnfs_destroy_all_layouts(), so move that
      code to pnfs.c, and refactor the code to allow us to choose between
      a single filesystem bulk recall, and a recall of all layouts.
      Also note that initiate_bulk_draining() currently calls iput() while
      holding locks. Fix that too.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
  6. 13 Feb 2013, 2 commits
  7. 12 Feb 2013, 7 commits
  8. 11 Feb 2013, 3 commits
    • fs/9p: Fix atomic_open · b6f4bee0
      M. Mohan Kumar committed
      Return EEXIST if the requested file already exists; without this patch the
      open call will always succeed even if the file exists and the user specified
      O_CREAT|O_EXCL.

      The following test code can be used to verify this patch.  Without this
      patch, running it on a 9p mount will always print 'test case
      failed'.
      
      #include <errno.h>
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>

      int main(void)
      {
              int fd;

              /* first create the file */
              fd = open("./file", O_CREAT|O_WRONLY, 0644);
              if (fd < 0) {
                      perror("open");
                      return -1;
              }
              close(fd);

              /* Now opening the same file with O_CREAT|O_EXCL should fail */
              fd = open("./file", O_CREAT|O_EXCL, 0644);
              if (fd < 0 && errno == EEXIST)
                      printf("test case pass\n");
              else
                      printf("test case failed\n");
              if (fd >= 0)
                      close(fd);
              return 0;
      }
      Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
    • fs/9p: Don't use O_TRUNC flag in TOPEN and TLOPEN request · 03f0e022
      Aneesh Kumar K.V committed
      We do the truncate via a setattr request, hence don't pass the O_TRUNC flag in
      the open request.  Without this patch we end up sending a zero-sized write
      request to the server when we try to truncate.  Some servers (VirtFS) were not
      handling that properly.
      Reported-by: M. Mohan Kumar <mohan@in.ibm.com>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
    • locking in fs/9p ->readdir() · 7ffdea7e
      Al Viro committed
      	... is really excessive.  First of all, ->readdir() is serialized by
      file->f_path.dentry->d_inode->i_mutex; playing with file->f_path.dentry->d_lock
      is not buying you anything.  Moreover, rdir->mutex is pointless for exactly
      the same reason - you'll never see contention on it.
      
      	While we are at it, there's no point in having rdir->buf a pointer -
      you have it point just past the end of rdir, so it might as well be a flex
      array (and no, it's not a gccism).
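      As a sketch of the flexible-array-member shape being suggested (field names
      here loosely follow the driver's struct p9_rdir; this is an illustration,
      not the patch itself), the buffer storage simply follows the struct in a
      single allocation:

        struct p9_rdir {
                int head;
                int tail;
                uint8_t buf[];          /* C99 flexible array member */
        };

        /* one allocation carries both the struct and its buffer */
        rdir = kzalloc(sizeof(*rdir) + buflen, GFP_KERNEL);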
      
      	Absolutely untested patch follows:
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
  9. 08 Feb 2013, 1 commit
  10. 07 Feb 2013, 3 commits
  11. 06 Feb 2013, 7 commits
    • Btrfs: fix EDQUOT handling in btrfs_delalloc_reserve_metadata · eb6b88d9
      Jan Schmidt committed
      When btrfs_qgroup_reserve returned a failure, we were missing a counter
      operation for BTRFS_I(inode)->outstanding_extents++, leading to warning
      messages about outstanding extents and space_info->bytes_may_use != 0.
      Additionally, the error handling code didn't take into account that we
      dropped the inode lock which might require more cleanup.
      
      Luckily, all the cleanup code we need is already there and can be shared
      with reserve_metadata_bytes, which is exactly what this patch does.
      Reported-by: Lev Vainblat <lev@zadarastorage.com>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • Btrfs: fix possible stale data exposure · 59fe4f41
      Josef Bacik committed
      We specifically do not update the disk i_size if there are ordered extents
      outstanding for any area between the current disk_i_size and our ordered
      extent so that we do not expose stale data.  The problem is the check we
      have only checks if the ordered extent starts at or after the current
      disk_i_size, which doesn't take into account an ordered extent that starts
      before the current disk_i_size and ends past the disk_i_size.  Fix this by
      checking if the extent ends past the disk_i_size.  Thanks,
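      Schematically, the corrected test looks at where the ordered extent ends
      rather than where it starts (field names here are illustrative, not quoted
      from the code):

        /* an ordered extent blocks the disk_i_size update if it *ends*
         * past disk_i_size, even when it starts before it */
        if (ordered->file_offset + ordered->len > disk_i_size)
                /* defer the on-disk i_size update */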
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix missing i_size update · 5d1f4020
      Josef Bacik committed
      If we have an ordered extent before the ordered extent we are currently
      completing that is after the current disk_i_size we will put our i_size
      update into that ordered extent so that we do not expose stale data.  The
      problem is that if our disk i_size is updated past the previous ordered
      extent we won't update the i_size with the pending i_size update.  So check
      the pending i_size update and if it's above the current disk i_size we need
      to go ahead and try to update.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix race between snapshot deletion and getting inode · 6f1c3605
      Liu Bo committed
      While running a snapshot test script created by Mitch and David,
      the race between autodefrag and snapshot deletion can lead to
      corruption of the dead_root list, so that we can get a crash in
      btrfs_clean_old_snapshots().

      And besides autodefrag, scrub also does the same thing, i.e. read
      the root first and then get the inode.
      
      Here is the story (take autodefrag as an example):
      (1) when we delete a snapshot or subvolume, it will set its root's
      refs to zero and do an iput() on its own inode, and if this inode happens
      to be the only active in-memory one in the root's inode rbtree, it will add
      itself to the global dead_roots list for later cleanup.

      (2) after (1), the autodefrag thread may read another inode for defrag
      and that inode is just in the deleted snapshot/subvolume, but all of this
      is done without checking if the root is still valid (refs > 0).  So the end
      result is adding the deleted snapshot/subvolume's root to the global
      dead_roots list AGAIN.
      
      Fortunately, we already have an srcu lock to avoid the race, i.e. subvol_srcu.

      So all we need to do is take the lock to protect 'read root and get inode',
      since we synchronize to wait for the rcu grace period before adding something
      to the global dead_roots list.
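      In shape, the protected section looks roughly like the following (a sketch
      of the idea only; the actual root lookup and inode fetch are elided rather
      than quoted from the patch):

        int idx;

        idx = srcu_read_lock(&fs_info->subvol_srcu);
        /* ... look up the root and grab the inode here; the srcu read-side
         * section keeps the root from being torn down and re-added to
         * dead_roots underneath us ... */
        srcu_read_unlock(&fs_info->subvol_srcu, idx);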
      Reported-by: Mitch Harder <mitch.harder@sabayonlinux.org>
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix missing release of the space/qgroup reservation in start_transaction() · 843fcf35
      Miao Xie committed
      When we fail to start a transaction, we need to release the reserved free space
      and qgroup space, fix it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix wrong sync_writers decrement in btrfs_file_aio_write() · 0a3404dc
      Miao Xie committed
      If the checks at the beginning of btrfs_file_aio_write() fail, we needn't
      decrease ->sync_writers, because we have not increased it. Fix it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: do not merge logged extents if we've removed them from the tree · 222c81dc
      Josef Bacik committed
      You can run into this problem where, if somebody is fsyncing and writing out
      the existing extents, you will have removed the extent map from the em tree,
      but it's still valid for the current fsync, so we go ahead and write it.  The
      problem is that we unconditionally try to merge it back into the em tree, but if
      we've removed it from the em tree that will cause use-after-free problems.
      Fix this to only merge if we are still part of the tree.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
  12. 05 Feb 2013, 2 commits
    • nilfs2: fix very long mount time issue · a9bae189
      Vyacheslav Dubeyko committed
      There exists a situation where GC can run in the background alone without
      any other filesystem activity for a significant time.

      The nilfs_clean_segments() method calls nilfs_segctor_construct(), which
      updates the superblocks when the NILFS_SC_SUPER_ROOT and
      THE_NILFS_DISCONTINUED flags are set.  But when GC is working alone,
      nilfs_clean_segments() is called with THE_NILFS_DISCONTINUED flag unset.
      As a result, the superblocks are not updated during all this time,
      and in the case of SPOR they keep very old values for the last super
      root placement.
      
      SYMPTOMS:
      
      Trying to mount a NILFS2 volume after SPOR in such an environment results in
      a very long mount time (it can take several hours in some
      cases).
      
      REPRODUCING PATH:
      
      1. Use an external USB HDD, disable automount, and do not perform
         any additional filesystem activity on the NILFS2 volume.

      2. Generate a temporary file of about 100 - 500 GB (for example,
         dd if=/dev/zero of=<file_name> bs=1073741824 count=200).  The size of
         the file determines how long GC will run.

      3. Delete the file.

      4. Start GC manually with the command "nilfs-clean -p 0".  When GC is
         started this way, the superblocks are updated only once, at the end.
         So, to simulate SPOR, wait some time (15 -
         40 minutes) and simply switch off the USB HDD manually.

      5. Switch on the USB HDD again and try to mount the NILFS2 volume.  As a
         result, the NILFS2 volume will take a very long time to mount.
      
      REPRODUCIBILITY: 100%
      
      FIX:
      
      This patch adds a check for whether the superblocks need to be updated and
      sets the THE_NILFS_DISCONTINUED flag before the nilfs_clean_segments() call.
      Reported-by: Sergey Alexandrov <splavgm@gmail.com>
      Signed-off-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Tested-by: Vyacheslav Dubeyko <slava@dubeyko.com>
      Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dlm: check the write size from user · d4b0bcf3
      David Teigland committed
      Return EINVAL from write if the size is larger than
      allowed.  Do this before allocating kernel memory for
      the bogus size, which could lead to OOM.
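      In shape, the check is simply an upper bound applied before any allocation
      is attempted (a sketch; the limit name and allocation flags are illustrative,
      not the actual dlm code):

        /* reject bogus sizes before attempting to allocate for them */
        if (count > MAX_WRITE_SIZE)     /* MAX_WRITE_SIZE: illustrative name */
                return -EINVAL;

        kbuf = kzalloc(count + 1, GFP_NOFS);
        if (!kbuf)
                return -ENOMEM;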
      Reported-by: Sasha Levin <levinsasha928@gmail.com>
      Tested-by: Jana Saout <jana@saout.de>
      Signed-off-by: David Teigland <teigland@redhat.com>