1. 28 Sep, 2012 1 commit
  2. 23 Sep, 2012 2 commits
    • close the race in nlmsvc_free_block() · c5aa1e55
      Authored by Al Viro
      we need to grab mutex before the reference counter reaches 0
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
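The rule "grab the mutex before the reference counter reaches 0" can be sketched in userspace C. This is a minimal illustration of the general pattern, not the kernel's code: `struct block`, `list_lock`, `dec_unless_one()` and `block_put()` are hypothetical names, and pthread mutexes plus C11 atomics stand in for the kernel primitives.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a lock block; not the kernel's structure. */
struct block {
    atomic_int refcount;
    bool on_list;               /* protected by list_lock */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

/* Decrement the counter unless it is 1; returns true if it decremented. */
static bool dec_unless_one(atomic_int *cnt)
{
    int old = atomic_load(cnt);

    while (old != 1) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(cnt, &old, old - 1))
            return true;
    }
    return false;
}

/*
 * Drop one reference. Returns true if this was the last reference and the
 * block was unlinked, in which case the caller may free it.
 */
static bool block_put(struct block *b)
{
    assert(atomic_load(&b->refcount) > 0);

    if (dec_unless_one(&b->refcount))
        return false;               /* fast path: not the last reference */

    /* The count is still at least 1 here, so lookups cannot see 0. */
    pthread_mutex_lock(&list_lock);
    if (atomic_fetch_sub(&b->refcount, 1) != 1) {
        pthread_mutex_unlock(&list_lock);
        return false;               /* a new reference appeared meanwhile */
    }
    b->on_list = false;             /* unlink while holding the lock */
    pthread_mutex_unlock(&list_lock);
    return true;
}
```

The point of the ordering is that the final decrement happens only with the lock held, so a concurrent lookup that takes the same lock can never find an object whose count has already dropped to zero.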
    • do_add_mount()/umount -l races · 156cacb1
      Authored by Al Viro
      normally we deal with lock_mount()/umount races by checking that
      mountpoint to be is still in our namespace after lock_mount() has
      been done.  However, do_add_mount() skips that check when called
      with MNT_SHRINKABLE in flags (i.e. from finish_automount()).  The
      reason is that ->mnt_ns may be a temporary namespace created exactly
      to contain automounts a-la NFS4 referral handling.  It's not the
      namespace of the caller, though, so check_mnt() would fail here.
      We still need to check that ->mnt_ns is non-NULL in that case,
      though.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  3. 22 Sep, 2012 2 commits
    • debugfs: fix u32_array race in format_array_alloc · e05e279e
      Authored by Linus Torvalds
      The format_array_alloc() function is fundamentally racy, in that it
      prints the array twice: once to figure out how much space to allocate
      for the buffer, and the second time to actually print out the data.
      
      If any of the array contents changes in between, the allocation size may
      be wrong, and the end result may be truncated in odd ways.
      
      Just don't do it.  Allocate a maximum-sized array up-front, and just
      format the array contents once.  The only user of the u32_array
      interfaces is the Xen spinlock statistics code, and it has 31 entries in
      the arrays, so the maximum size really isn't that big, and the end
      result is much simpler code without the bug.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
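The "allocate a maximum-sized array up-front and format once" approach can be sketched as follows. This is a userspace illustration under stated assumptions, not the debugfs code: `format_u32_array_once()` and `U32_FIELD_MAX` are hypothetical names, and the output layout (space-separated values ending in a newline) is chosen for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Worst case per element: 10 decimal digits of a u32 plus one separator. */
#define U32_FIELD_MAX 11

/*
 * Format n values into a freshly allocated buffer in a single pass.
 * Values are separated by spaces and the last one is followed by a newline.
 * Returns NULL on allocation failure; the caller frees the result.
 */
static char *format_u32_array_once(const uint32_t *vals, size_t n)
{
    size_t cap = n * U32_FIELD_MAX + 1;     /* +1 for the terminating NUL */
    size_t used = 0;
    char *buf;

    assert(vals != NULL || n == 0);

    buf = malloc(cap);
    if (!buf)
        return NULL;
    buf[0] = '\0';

    for (size_t i = 0; i < n; i++) {
        char sep = (i + 1 == n) ? '\n' : ' ';
        int len = snprintf(buf + used, cap - used, "%" PRIu32 "%c",
                           vals[i], sep);

        /* Cannot trigger with the bound above; kept as a guard. */
        if (len < 0 || (size_t)len >= cap - used)
            break;
        used += (size_t)len;
    }
    return buf;
}
```

Because the buffer is sized for the worst case, the array is read exactly once, and a concurrent change to its contents can no longer make the allocation too small.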
    • debugfs: fix race in u32_array_read and allocate array at open · 36048853
      Authored by David Rientjes
      u32_array_read() is racy when multiple threads read from a file with a
      seek position of zero, i.e. when two or more simultaneous reads are
      occurring after the non-seekable files are created.  It is possible that
      file->private_data is double-freed because the threads race between
      
      	kfree(file->private_data);
      
      and
      
      	file->private_data = NULL;
      
      The fix is to only do format_array_alloc() when the file is opened and
      free it when it is closed.
      
      Note that because the file has always been non-seekable, you can't open
      it and read it multiple times anyway, so the data has always been
      generated just once.  The difference is that now it is generated at open
      time rather than at the time of the first read, and that avoids the
      race.
      Reported-by: Dave Jones <davej@redhat.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Tested-by: Raghavendra <raghavendra.kt@linux.vnet.ibm.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
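The lifetime rule described in this commit (generate the data at open, free it at close, never free it in read) might look like this in a userspace sketch. `struct file_sketch` and the `sketch_*` functions are hypothetical stand-ins, not the debugfs file operations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct file. */
struct file_sketch {
    char *private_data;     /* formatted text, owned from open to release */
};

/* Generate the text once, when the file is opened. */
static int sketch_open(struct file_sketch *f, const char *formatted)
{
    size_t len = strlen(formatted) + 1;

    f->private_data = malloc(len);
    if (!f->private_data)
        return -1;
    memcpy(f->private_data, formatted, len);
    return 0;
}

/* Reads only copy out of the buffer; they never free or replace it. */
static size_t sketch_read(const struct file_sketch *f, char *dst,
                          size_t len, size_t *pos)
{
    size_t total, n;

    assert(f->private_data != NULL);
    total = strlen(f->private_data);
    if (*pos >= total)
        return 0;
    n = total - *pos;
    if (n > len)
        n = len;
    memcpy(dst, f->private_data + *pos, n);
    *pos += n;
    return n;
}

/* The buffer is freed exactly once, when the file is closed. */
static void sketch_release(struct file_sketch *f)
{
    free(f->private_data);
    f->private_data = NULL;
}
```

With no kfree-then-clear sequence left in the read path, two simultaneous readers have nothing to race on.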
  4. 19 Sep, 2012 3 commits
    • xfs: stop the sync worker before xfs_unmountfs · 0ba6e536
      Authored by Ben Myers
      Cancel work of the xfs_sync_worker before teardown of the log in
      xfs_unmountfs.  This prevents occasional crashes on unmount like so:
      
      PID: 21602  TASK: ee9df060  CPU: 0   COMMAND: "kworker/0:3"
       #0 [c5377d28] crash_kexec at c0292c94
       #1 [c5377d80] oops_end at c07090c2
       #2 [c5377d98] no_context at c06f614e
       #3 [c5377dbc] __bad_area_nosemaphore at c06f6281
       #4 [c5377df4] bad_area_nosemaphore at c06f629b
       #5 [c5377e00] do_page_fault at c070b0cb
       #6 [c5377e7c] error_code (via page_fault) at c070892c
          EAX: f300c6a8  EBX: f300c6a8  ECX: 000000c0  EDX: 000000c0  EBP: c5377ed0
          DS:  007b      ESI: 00000000  ES:  007b      EDI: 00000001  GS:  ffffad20
          CS:  0060      EIP: c0481ad0  ERR: ffffffff  EFLAGS: 00010246
       #7 [c5377eb0] atomic64_read_cx8 at c0481ad0
       #8 [c5377ebc] xlog_assign_tail_lsn_locked at f7cc7c6e [xfs]
       #9 [c5377ed4] xfs_trans_ail_delete_bulk at f7ccd520 [xfs]
      #10 [c5377f0c] xfs_buf_iodone at f7ccb602 [xfs]
      #11 [c5377f24] xfs_buf_do_callbacks at f7cca524 [xfs]
      #12 [c5377f30] xfs_buf_iodone_callbacks at f7cca5da [xfs]
      #13 [c5377f4c] xfs_buf_iodone_work at f7c718d0 [xfs]
      #14 [c5377f58] process_one_work at c024ee4c
      #15 [c5377f98] worker_thread at c024f43d
      #16 [c5377fbc] kthread at c025326b
      #17 [c5377fe8] kernel_thread_helper at c070e834
      
      PID: 26653  TASK: e79143b0  CPU: 3   COMMAND: "umount"
       #0 [cde0fda0] __schedule at c0706595
       #1 [cde0fe28] schedule at c0706b89
       #2 [cde0fe30] schedule_timeout at c0705600
       #3 [cde0fe94] __down_common at c0706098
       #4 [cde0fec8] __down at c0706122
       #5 [cde0fed0] down at c025936f
       #6 [cde0fee0] xfs_buf_lock at f7c7131d [xfs]
       #7 [cde0ff00] xfs_freesb at f7cc2236 [xfs]
       #8 [cde0ff10] xfs_fs_put_super at f7c80f21 [xfs]
       #9 [cde0ff1c] generic_shutdown_super at c0333d7a
      #10 [cde0ff38] kill_block_super at c0333e0f
      #11 [cde0ff48] deactivate_locked_super at c0334218
      #12 [cde0ff58] deactivate_super at c033495d
      #13 [cde0ff68] mntput_no_expire at c034bc13
      #14 [cde0ff7c] sys_umount at c034cc69
      #15 [cde0ffa0] sys_oldumount at c034ccd4
      #16 [cde0ffb0] system_call at c0707e66
      
      commit 11159a05 added this to xfs_log_unmount and needs to be cleaned up
      at a later date.
      Signed-off-by: Ben Myers <bpm@sgi.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Mark Tinguely <tinguely@sgi.com>
    • cifs: fix return value in cifsConvertToUTF16 · c73f6939
      Authored by Jeff Layton
      This function returns the wrong value, which causes the callers to get
      the length of the resulting pathname wrong when it contains non-ASCII
      characters.
      
      This seems to fix https://bugzilla.samba.org/show_bug.cgi?id=6767
      
      Cc: <stable@vger.kernel.org>
      Reported-by: Baldvin Kovacs <baldvin.kovacs@gmail.com>
      Reported-and-Tested-by: Nicolas Lefebvre <nico.lefebvre@gmail.com>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
    • vfs: dcache: use DCACHE_DENTRY_KILLED instead of DCACHE_DISCONNECTED in d_kill() · b161dfa6
      Authored by Miklos Szeredi
      IBM reported a soft lockup after applying the fix for the rename_lock
      deadlock.  Commit c83ce989 ("VFS: Fix the nfs sillyrename regression
      in kernel 2.6.38") was found to be the culprit.
      
      The nfs sillyrename fix used DCACHE_DISCONNECTED to indicate that the
      dentry was killed.  This flag can be set on non-killed dentries too,
      which results in infinite retries when trying to traverse the dentry
      tree.
      
      This patch introduces a separate flag: DCACHE_DENTRY_KILLED, which is
      only set in d_kill() and makes try_to_ascend() test only this flag.
      
      IBM reported successful test results with this patch.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
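Why a dedicated flag matters can be shown with a small sketch. The bit values and the names `SKETCH_DISCONNECTED`, `SKETCH_DENTRY_KILLED`, `struct dentry_sketch` and `must_retry_walk()` are hypothetical; they illustrate the idea only and are not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative bit values only; the kernel's flag values may differ. */
#define SKETCH_DISCONNECTED   0x0004u   /* may be set on live dentries too */
#define SKETCH_DENTRY_KILLED  0x0008u   /* set only when a dentry is killed */

struct dentry_sketch {
    unsigned int flags;
};

/* Killing a dentry sets the dedicated flag and nothing else. */
static void sketch_d_kill(struct dentry_sketch *d)
{
    assert(d != NULL);
    d->flags |= SKETCH_DENTRY_KILLED;
}

/* The tree walker restarts only when the dentry really was killed. */
static bool must_retry_walk(const struct dentry_sketch *d)
{
    return (d->flags & SKETCH_DENTRY_KILLED) != 0;
}
```

A dentry that is merely disconnected no longer forces a retry, which is what removes the endless retry loop described above.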
  5. 18 Sep, 2012 1 commit
    • fs/proc: fix potential unregister_sysctl_table hang · 6bf61045
      Authored by Francesco Ruggeri
      The unregister_sysctl_table() function hangs if all references to its
      ctl_table_header structure are not dropped.
      
      This can happen sometimes because of a leak in proc_sys_lookup():
      proc_sys_lookup() gets a reference to the table via lookup_entry(), but
      it does not release it when a subsequent call to sysctl_follow_link()
      fails.
      
      This patch fixes this leak by making sure the reference is always
      dropped on return.
      
      See also commit 076c3eed ("sysctl: Rewrite proc_sys_lookup
      introducing find_entry and lookup_entry") which reorganized this code in
      3.4.
      
      Tested in Linux 3.4.4.
      Signed-off-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
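A leak of this kind (a reference taken early and not released on a later failure) is commonly avoided with a single exit path. The sketch below assumes hypothetical helpers `get_ref()`, `put_ref()` and `follow_step()`; it shows the shape of the fix, not the actual proc_sys_lookup() code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted table header. */
struct header_sketch {
    int refs;
};

static struct header_sketch *get_ref(struct header_sketch *h)
{
    h->refs++;
    return h;
}

static void put_ref(struct header_sketch *h)
{
    assert(h->refs > 0);
    h->refs--;
}

/* Hypothetical follow-up step that can fail (stands in for link following). */
static int follow_step(int should_fail)
{
    return should_fail ? -2 : 0;
}

static int lookup_sketch(struct header_sketch *h, int should_fail)
{
    struct header_sketch *ref = get_ref(h);
    int err;

    err = follow_step(should_fail);
    if (err)
        goto out;   /* an early "return err" here would leak the reference */

    /* ... use ref to build the result ... */

out:
    put_ref(ref);   /* dropped on every return path */
    return err;
}
```

If the count never returns to its starting value, anything that waits for all references to be dropped will wait forever, which matches the hang described in the commit message.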
  6. 15 Sep, 2012 5 commits
  7. 13 Sep, 2012 3 commits
  8. 12 Sep, 2012 1 commit
  9. 07 Sep, 2012 3 commits
  10. 06 Sep, 2012 1 commit
  11. 05 Sep, 2012 5 commits
  12. 04 Sep, 2012 1 commit
  13. 03 Sep, 2012 1 commit
    • fuse: mark variables uninitialized · 381bf7ca
      Authored by Daniel Mack
      gcc 4.6.3 complains about uninitialized variables in fs/fuse/control.c:
      
        CC      fs/fuse/control.o
      fs/fuse/control.c: In function 'fuse_conn_congestion_threshold_write':
      fs/fuse/control.c:165:29: warning: 'val' may be used uninitialized in this function [-Wuninitialized]
      fs/fuse/control.c: In function 'fuse_conn_max_background_write':
      fs/fuse/control.c:128:23: warning: 'val' may be used uninitialized in this function [-Wuninitialized]
      
      fuse_conn_limit_write() will always return non-zero unless the &val
      is modified, so the warning is misleading. Let the compiler know
      about it by marking 'val' with 'uninitialized_var'.
      Signed-off-by: Daniel Mack <zonque@gmail.com>
      Cc: Brian Foster <bfoster@redhat.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
  14. 31 Aug, 2012 2 commits
    • cuse: kill connection on initialization error · 8d39d801
      Authored by Miklos Szeredi
      Luca Risolia reported that a CUSE daemon will continue to run even if
      initialization of the emulated device fails for some reason (e.g. the device
      number is already registered by another driver).
      
      This patch disconnects the fuse device on error, which will make the userspace
      CUSE daemon exit, albeit without indication about what the problem was.
      Reported-by: Luca Risolia <luca.risolia@studio.unibo.it>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • cuse: fix fuse_conn_kill() · bbd99797
      Authored by Miklos Szeredi
      fuse_conn_kill() removed fc->entry and called fuse_ctl_remove_conn() and
      fuse_bdi_destroy(), none of which is appropriate for cuse cleanup.
      
      The fuse_ctl_remove_conn() decrements the nlink on the control filesystem, which
      is totally bogus.  The others are harmless but unnecessary.
      
      So move these out from fuse_conn_kill() to fuse_put_super() where they belong.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
  15. 30 Aug, 2012 1 commit
    • xfs: fix race while discarding buffers [V4] · 6fb8a90a
      Authored by Carlos Maiolino
      While xfs_buftarg_shrink() is freeing buffers from the dispose list (filled with
      buffers from lru list), there is a possibility to have xfs_buf_stale() racing
      with it, and removing buffers from dispose list before xfs_buftarg_shrink() does
      it.
      
      This happens because xfs_buftarg_shrink() handles the dispose list without
      locking and the test condition in xfs_buf_stale() checks for the buffer being in
      *any* list:
      
      if (!list_empty(&bp->b_lru))
      
      If the buffer happens to be on the dispose list, this causes the buffer counter
      of the lru list (btp->bt_lru_nr) to be decremented twice (once in
      xfs_buftarg_shrink() and again in xfs_buf_stale()), which corrupts the
      accounting of the lru list.
      
      This may cause xfs_buftarg_shrink() to return a wrong value to the memory
      shrinker shrink_slab(). The accounting error may also cause an underflowed
      value to be returned: since the counter is lower than the current number of
      items in the lru list, a decrement may happen when the counter is 0, causing
      an underflow on the counter.
      
      The fix uses a new flag field (and a new buffer flag) to serialize buffer
      handling during the shrink process. The new flag field has been designed to use
      btp->bt_lru_lock/unlock instead of xfs_buf_lock/unlock mechanism.
      
      dchinner, sandeen, aquini and aris also deserve credits for this.
      Signed-off-by: Carlos Maiolino <cmaiolino@redhat.com>
      Reviewed-by: Ben Myers <bpm@sgi.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Ben Myers <bpm@sgi.com>
  16. 29 Aug, 2012 8 commits
    • Btrfs: fix that repair code is spuriously executed for transid failures · 256dd1bb
      Authored by Stefan Behrens
      If verify_parent_transid() fails for all mirrors, the current code
      calls repair_io_failure() anyway which means:
      - that the disk block is rewritten without repairing anything and
      - that a kernel log message is printed which misleadingly claims
        that a read error was corrected.
      
      This is an example:
      parent transid verify failed on 615015833600 wanted 110423 found 110424
      parent transid verify failed on 615015833600 wanted 110423 found 110424
      btrfs read error corrected: ino 1 off 615015833600 (dev /dev/...)
      
      It is wrong to ignore the results from verify_parent_transid() and to
      call repair_eb_io_failure() when the verification of the transids failed.
      This commit fixes the issue.
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
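The intended rule (repair only when some mirror actually delivered a block that passed verification) can be sketched as below. `struct read_result` and `read_with_mirrors()` are hypothetical, and each mirror's verification outcome is passed in as a boolean rather than computed from real I/O.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of reading one block across its mirrors (hypothetical names). */
struct read_result {
    bool ok;            /* some mirror returned a block that verified */
    bool want_repair;   /* an earlier mirror failed and a good copy exists */
};

/*
 * Try mirrors in order until one verifies. A repair is requested only if
 * a good copy was found after at least one failure; when every mirror
 * fails there is nothing correct to write back.
 */
static struct read_result read_with_mirrors(const bool *mirror_verifies,
                                            int num_mirrors)
{
    struct read_result r = { false, false };
    bool saw_failure = false;

    assert(num_mirrors >= 0);

    for (int i = 0; i < num_mirrors; i++) {
        if (mirror_verifies[i]) {
            r.ok = true;
            break;
        }
        saw_failure = true;
    }
    r.want_repair = r.ok && saw_failure;
    return r;
}
```

In the all-mirrors-fail case the sketch reports an error and requests no repair, so no block is rewritten and no "read error corrected" message would be justified.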
    • Btrfs: fix ordered extent leak when failing to start a transaction · d280e5be
      Authored by Liu Bo
      We cannot just return an error before freeing the ordered extent and releasing the
      reserved space when we fail to start a transaction.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix a dio write regression · 24c03fa5
      Authored by Liu Bo
      This bug is introduced by commit 3b8bde746f6f9bd36a9f05f5f3b6e334318176a9
      (Btrfs: lock extents as we map them in DIO).
      
      In dio write, we should unlock the section which we didn't do IO on in case
      we fall back to buffered write.  But we need to not only unlock the section
      but also clean up the reserved space for the section.
      
      This bug was found while running xfstests 133; with this patch, 133 no longer complains.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: fix deadlock with freeze and sync V2 · bd7de2c9
      Authored by Josef Bacik
      We can deadlock with freeze right now because we unconditionally start a
      transaction in our ->sync_fs() call.  To fix this just check and see if we
      have a running transaction to commit.  This saves us from the deadlock
      because at this point we'll have the umount sem for the sb so we're safe
      from freezes coming in after we've done our check.  With this patch the
      freeze xfstests no longer deadlocks.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: revert checksum error statistic which can cause a BUG() · 5ee0844d
      Authored by Stefan Behrens
      Commit 442a4f63 added btrfs device
      statistic counters for detected IO and checksum errors to Linux 3.5.
      The statistic part that counts checksum errors in
      end_bio_extent_readpage() can cause a BUG() in a subfunction:
      "kernel BUG at fs/btrfs/volumes.c:3762!"
      That part is reverted with the current patch.
      However, the counting of checksum errors in the scrub context remains
      active, and the counting of detected IO errors (read, write or flush
      errors) in all contexts remains active.
      
      Cc: stable <stable@vger.kernel.org> # 3.5
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Chris Mason <chris.mason@oracle.com>
    • Btrfs: remove superblock writing after fatal error · 68ce9682
      Authored by Stefan Behrens
      With commit acce952b, btrfs was changed to flag the filesystem with
      BTRFS_SUPER_FLAG_ERROR and switch to read-only mode after a fatal
      error such as write I/O errors on all mirrors.
      In such situations, on unmount, the superblock is written in
      btrfs_error_commit_super(). This is done with the intention to be able
      to evaluate the error flag on the next mount. A warning is printed
      in this case during the next mount and the log tree is ignored.
      
      The issue is that it is possible that the superblock points to a root
      that was not written (due to write I/O errors).
      The result is that the filesystem cannot be mounted. btrfsck also does
      not start and all the other btrfs-progs tools fail to start as well.
      However, mount -o recovery is working well and does the right things
      to recover the filesystem (i.e., don't use the log root, clear the
      free space cache and use the next mountable root that is stored in the
      root backup array).
      
      This patch removes the writing of the superblock when
      BTRFS_SUPER_FLAG_ERROR is set, and removes the handling of the error
      flag in the mount function.
      
      These lines can be used to reproduce the issue (using /dev/sdm):
      SCRATCH_DEV=/dev/sdm
      SCRATCH_MNT=/mnt
      echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup create foo
      ls -alLF /dev/mapper/foo
      mkfs.btrfs /dev/mapper/foo
      mount /dev/mapper/foo $SCRATCH_MNT
      echo bar > $SCRATCH_MNT/foo
      sync
      echo 0 25165824 error | dmsetup reload foo
      dmsetup resume foo
      ls -alF $SCRATCH_MNT
      touch $SCRATCH_MNT/1
      ls -alF $SCRATCH_MNT
      sleep 35
      echo 0 25165824 linear $SCRATCH_DEV 0 | dmsetup reload foo
      dmsetup resume foo
      sleep 1
      umount $SCRATCH_MNT
      btrfsck /dev/mapper/foo
      dmsetup remove foo
      Signed-off-by: Stefan Behrens <sbehrens@giantdisaster.de>
      Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
    • Btrfs: allow delayed refs to be merged · ae1e206b
      Authored by Josef Bacik
      Daniel Blueman reported a bug with fio+balance on a ramdisk setup.
      Basically what happens is the balance relocates a tree block which will drop
      the implicit refs for all of its children and adds a full backref.  Once the
      block is relocated we have to add the implicit refs back, so when we cow the
      block again we add the implicit refs for its children back.  The problem
      comes when the original drop ref doesn't get run before we add the implicit
      refs back.  The delayed ref stuff will specifically prefer ADD operations
      over DROP to keep us from freeing up an extent that will have references to
      it, so we try to add the implicit ref before it is actually removed and we
      panic.  This worked fine before because the add would have just canceled the
      drop out and we would have been fine.  But the backref walking work needs to
      be able to freeze the delayed ref stuff in time so we have this ever
      increasing sequence number that gets attached to all new delayed ref updates
      which makes us not merge refs and we run into this issue.
      
      So to fix this we need to merge delayed refs.  So every time we run a
      clustered ref we need to try and merge all of its delayed refs.  The backref
      walking stuff locks the delayed ref head before processing, so if we have it
      locked we are safe to merge any refs inside of the sequence number.  If
      there is no sequence number we can merge all refs.  Doing this not only
      fixes our bug but keeps the delayed ref code from adding and removing
      useless refs and batching together multiple refs into one search instead of
      one search per delayed ref, which will really help our commit times.  I ran
      this with Daniel's test and 276 and I haven't seen any problems.  Thanks,
      Reported-by: Daniel J Blueman <daniel@quora.org>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
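The merging idea (an ADD and a DROP for the same extent cancel out unless a sequence number forbids it) can be sketched roughly as follows. Everything here is an assumption made for illustration: `struct delayed_ref_sketch`, `merge_refs()`, the `hold_seq` parameter, and in particular the rule that only updates with a sequence number below `hold_seq` may be merged. The real btrfs data structures and boundary condition may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ref_action { REF_ADD = 1, REF_DROP = -1 };

/* Hypothetical queued reference update for a single extent. */
struct delayed_ref_sketch {
    enum ref_action action;
    uint64_t seq;           /* sequence number attached when queued */
};

/*
 * Fold every mergeable update into one net reference change.
 * hold_seq == 0 means no backref walker holds a sequence number, so all
 * updates may merge. Otherwise only updates with seq < hold_seq merge
 * (an assumption of this sketch). Updates that cannot merge are compacted
 * to the front of refs[] and their count is stored in *left.
 */
static int merge_refs(struct delayed_ref_sketch *refs, size_t n,
                      uint64_t hold_seq, size_t *left)
{
    int net = 0;
    size_t kept = 0;

    assert(left != NULL);

    for (size_t i = 0; i < n; i++) {
        if (hold_seq == 0 || refs[i].seq < hold_seq)
            net += (int)refs[i].action;
        else
            refs[kept++] = refs[i];
    }
    *left = kept;
    return net;
}
```

With no sequence number held, a DROP followed by an ADD yields a net change of zero, so neither operation has to touch the extent tree; that is the cancellation the message says used to happen before sequence numbers were introduced.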
    • Btrfs: fix enospc problems when deleting a subvol · 5a24e84c
      Authored by Josef Bacik
      Subvol delete is a special kind of awful where we use the global reserve to
      cover the ENOSPC requirements.  The problem is once we're done removing
      everything we do a btrfs_update_inode(), which by default will try to do the
      delayed update stuff which will use its own reserve.  There will be no
      space in this reserve and we'll return ENOSPC.  So instead use
      btrfs_update_inode_fallback() which will just fall back to updating the inode
      item in the case of enospc.  This is fine because the global reserve covers
      the space requirements for this.  With this patch I can now delete a subvol
      on a problem image Dave Sterba sent me.  Thanks,
      Reported-by: David Sterba <dave@jikos.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
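The fallback behaviour can be sketched in plain C. The names `struct inode_sketch`, `update_delayed()`, `update_direct()` and `update_with_fallback()` are hypothetical, and the reserve is modelled as a simple counter; this shows only the control flow, not the btrfs implementation.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical inode that records which update path was taken. */
struct inode_sketch {
    int delayed_updates;
    int direct_updates;
};

/* Delayed path: needs space from its own reserve and may fail. */
static int update_delayed(struct inode_sketch *in, long *reserve, long cost)
{
    assert(cost >= 0);
    if (*reserve < cost)
        return -ENOSPC;
    *reserve -= cost;
    in->delayed_updates++;
    return 0;
}

/* Direct path: assumed to be covered by space reserved elsewhere. */
static int update_direct(struct inode_sketch *in)
{
    in->direct_updates++;
    return 0;
}

/* Try the delayed path first; on ENOSPC update the item directly. */
static int update_with_fallback(struct inode_sketch *in, long *reserve,
                                long cost)
{
    int ret = update_delayed(in, reserve, cost);

    if (ret == -ENOSPC)
        return update_direct(in);
    return ret;
}
```

Only ENOSPC triggers the fallback; any other error from the delayed path is returned unchanged.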