1. 07 May, 2013 20 commits
    • J
      Btrfs: compare relevant parts of delayed tree refs · 41b0fc42
      Josef Bacik committed
      A user reported a panic while running a balance.  He was relocating a block,
      which added a reference to the relocation tree.  Relocation then walked
      through the relocation tree, dropped that reference and freed that block, and
      then walked down a snapshot which referenced the same block and added another
      ref to the block.  The problem is that this all happened in the same
      transaction, so the parent block was freed when we dropped our reference,
      became immediately available for allocation, and was then used _again_ to add
      a reference for the same block from a different snapshot.  This resulted in
      something like this in the delayed ref tree:
      
      add ref to 90234880, parent=2067398656, ref_root 1766, level 1
      del ref to 90234880, parent=2067398656, ref_root 18446744073709551608, level 1
      add ref to 90234880, parent=2067398656, ref_root 1767, level 1
      
      As you can see, the ref_roots don't match: when we inc the ref we use the
      header owner, which is the original tree the block belonged to, instead of
      the data reloc tree, and when we remove the extent we use the reloc tree
      objectid.  But none of this matters, since it is a shared reference, which
      means only the parent matters.  When the delayed refs run, all the increments
      are added first and then all the drops are done, to make sure that we don't
      delete the ref if we net a positive ref count.  But tree blocks aren't allowed
      to have multiple refs from the same block, so this panics when it tries to add
      the second ref.  We need the add and the drop to cancel each other out in
      memory so we only do the final add.
      
      So to fix this we need to adjust how the delayed refs are added to the tree.
      Only the ref_root matters when it is a normal backref, and only the parent
      matters when it is a shared backref.  So make our decision based on what ref
      type we have.  This allows us to keep the ref_root in memory in case anybody
      wants to use it for something else, and it allows the delayed refs to be merged
      properly so we don't end up with this panic.
      
      With this patch the user's image no longer panics on mount, and it has a clean
      fsck after a normal mount/umount cycle.  Thanks,
      
      Cc: stable@vger.kernel.org
      Reported-by: Roman Mamedov <rm@romanrm.ru>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      41b0fc42
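The comparison rule described in this commit message can be sketched as follows. This is a minimal illustration, not the kernel code: the type and function names (`struct tree_ref`, `comp_tree_refs`, `cmp_u64`) are hypothetical simplifications. It compares only `root` for a normal backref and only `parent` for a shared backref, so the add and the drop from the example above compare as equal and can be merged.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the delayed tree ref types. */
enum ref_type {
    TREE_BLOCK_REF,   /* normal backref: only the owning root matters */
    SHARED_BLOCK_REF  /* shared backref: only the parent block matters */
};

struct tree_ref {
    enum ref_type type;
    uint64_t root;    /* ref_root; kept in memory but not always compared */
    uint64_t parent;  /* bytenr of the parent block */
};

/* Three-way comparison of two u64 values: -1, 0 or 1. */
static int cmp_u64(uint64_t a, uint64_t b)
{
    return (a > b) - (a < b);
}

/* Compare only the field that is relevant for the given ref type. */
int comp_tree_refs(const struct tree_ref *a, const struct tree_ref *b)
{
    assert(a->type == b->type); /* assumption: callers compare like with like */
    if (a->type == TREE_BLOCK_REF)
        return cmp_u64(a->root, b->root);
    return cmp_u64(a->parent, b->parent);
}
```

With the values from the log above, the shared refs with ref_root 1766 and 18446744073709551608 share parent 2067398656, so they compare as equal under this rule even though their roots differ.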
    • J
      Btrfs: fix infinite loop when we abort on mount · cf79ffb5
      Josef Bacik committed
      While testing my enospc log code I managed to abort a transaction during mount,
      which put me into an infinite loop.  There are two causes.  First, we don't
      reset trans_no_join if we abort during transaction commit, which forces
      anybody trying to start a transaction to loop endlessly waiting for it to be
      set to 0.  But this is still just a symptom.  The second issue is that we don't
      set the fs state to error during errors on mount.  This is because we don't
      want to do the flip read only thing during mount, but we still really want to
      set the fs state to an error to keep us from even getting to the trans_no_join
      check.  So fix both of these things: make sure to reset trans_no_join if we
      abort during a commit, and make sure we set the fs state to error whether or
      not we're mounting.  This should keep us from getting into this infinite loop
      again.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      cf79ffb5
    • W
      Btrfs: fix a warning when disabling quota · c9a9dbf2
      Wang Shilong committed
      Steps to reproduce:
      	mkfs.btrfs <disk>
      	mount <disk> <mnt>
      	btrfs quota enable <mnt>
      	btrfs sub create <mnt>/subv
      
      	i=1
      	while [ $i -le 10000 ]
      	do
      		dd if=/dev/zero of=<mnt>/subv/data_$i bs=1K count=1
      		i=$(($i+1))
      		if [ $i -eq 500 ]
      		then
      			btrfs quota disable <mnt>
      		fi
      	done
      	dmesg
      This warn_on() is unnecessary and is easily triggered, so just remove it.
      Signed-off-by: Wang Shilong <wangsl-fnst@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      c9a9dbf2
    • L
      Btrfs: pass NULL instead of 0 · 6b67a320
      Liu Bo committed
      set_extent_bit()'s (u64 *failed_start) expects NULL not 0.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      6b67a320
    • D
      btrfs: make subvol creation/deletion killable in the early stages · 5c50c9b8
      David Sterba committed
      The subvolume ioctls block on the parent directory mutex that can be
      held by other concurrent snapshot activity for a long time. Give the
      user at least some chance to get out of this situation by allowing
      a kill signal to be sent.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      5c50c9b8
    • D
      94ef7280
    • D
      btrfs: make orphan cleanup less verbose · 4884b476
      David Sterba committed
      The messages
      
        btrfs: unlinked 123 orphans
        btrfs: truncated 456 orphans
      
      are not useful to regular users and raise questions whether there are
      problems with the filesystem.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      4884b476
    • D
      btrfs: deprecate subvolrootid mount option · 5e2a4b25
      David Sterba committed
      This mount option was a workaround when subvol= assumed a path relative
      to the default subvolume, not the toplevel one. This was fixed a long time
      ago and subvolrootid has no effect.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      5e2a4b25
    • S
      Btrfs: Include the device in most error printk()s · c2cf52eb
      Simon Kirby committed
      With more than one btrfs volume mounted, it can be very difficult to find
      out which volume is hitting an error. btrfs_error() will print this, but
      it is currently rigged as more of a fatal error handler, while many of
      the printk()s are currently for debugging and yet-unhandled cases.
      
      This patch just changes the functions where the device information is
      already available. Some cases remain where the root or fs_info is not
      passed to the function emitting the error.
      
      This may introduce some confusion with volumes backed by multiple devices
      emitting errors referring to the primary device in the set instead of the
      one on which the error occurred.
      
      Use btrfs_printk(fs_info, format, ...) rather than writing the device
      string every time, and introduce macro wrappers ala XFS for brevity.
      Since the function already cannot be used for continuations, print a
      newline as part of the btrfs_printk() message rather than at each caller.
      Signed-off-by: Simon Kirby <sim@hostway.ca>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      c2cf52eb
    • D
      btrfs: update kconfig title · aa825914
      David Sterba committed
      The Kconfig title does not make much sense after the cleanup of the
      CONFIG_EXPERIMENTAL option; align the wording with other filesystems.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      aa825914
    • D
      btrfs: clean snapshots one by one · 9d1a2a3a
      David Sterba committed
      Each time pick one dead root from the list and let the caller know whether
      it needs to continue. This should improve responsiveness during
      umount and balance, which at some point wait for all currently
      queued dead roots to be cleaned.
      
      A new dead root is added to the end of the list, so the snapshots
      disappear in the order of deletion.
      
      The snapshot cleaning work is now done only from the cleaner thread and the
      others wake it if needed.
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      9d1a2a3a
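The "one dead root per call" pattern described in this commit message can be sketched as a FIFO queue. This is only an illustration under assumptions: `struct dead_root`, `add_dead_root` and `clean_one_dead_root` are hypothetical names, and the actual cleanup work is reduced to reporting an id.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a deleted snapshot root awaiting cleanup. */
struct dead_root {
    int id;
    struct dead_root *next;
};

/* FIFO list: new dead roots are appended at the tail. */
struct dead_root_list {
    struct dead_root *head;
    struct dead_root *tail;
};

void add_dead_root(struct dead_root_list *list, struct dead_root *root)
{
    assert(list != NULL && root != NULL);
    root->next = NULL;
    if (list->tail)
        list->tail->next = root;
    else
        list->head = root;
    list->tail = root;
}

/*
 * Clean at most one dead root per call. Returns 1 if a root was cleaned
 * (the caller may call again) and 0 if the list was empty. The id of the
 * cleaned root is stored in *cleaned_id.
 */
int clean_one_dead_root(struct dead_root_list *list, int *cleaned_id)
{
    struct dead_root *root = list->head;

    if (!root)
        return 0;
    list->head = root->next;
    if (!list->head)
        list->tail = NULL;
    *cleaned_id = root->id; /* real code would drop the snapshot here */
    return 1;
}
```

Because each call handles a single entry, a caller such as a cleaner thread can check for a pending umount or other work between calls instead of being tied up until the whole list is drained.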
    • Z
      6841ebee
    • Z
    • L
      Btrfs: share stop worker code · 7abadb64
      Liu Bo committed
      Share the identical code for stopping workers.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      7abadb64
    • J
      Btrfs: add a incompatible format change for smaller metadata extent refs · 3173a18f
      Josef Bacik committed
      We currently store the first key of the tree block inside the reference for the
      tree block in the extent tree.  This takes up quite a bit of space.  Make a new
      key type for metadata which holds the level as the offset and completely removes
      storing the btrfs_tree_block_info inside the extent ref.  This reduces the size
      from 51 bytes to 33 bytes per extent reference for each tree block.  In practice
      this results in a 30-35% decrease in the size of our extent tree, which means we
      COW less and can keep more of the extent tree in memory which makes our heavy
      metadata operations go much faster.  This is not an automatic format change, you
      must enable it at mkfs time or with btrfstune.  This patch deals with having
      metadata stored as either the old format or the new format so it is easy to
      convert.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      3173a18f
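The 51-byte and 33-byte figures quoted in this commit message can be reproduced with a little arithmetic. The structs below are simplified, assumed copies of the on-disk layouts (an extent item, a tree block info and one inline ref), not the kernel headers, and the helper names are hypothetical. `__attribute__((packed))` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed on-disk layouts; every struct is packed (no padding). */
struct disk_key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
} __attribute__((packed));

struct extent_item {
    uint64_t refs;
    uint64_t generation;
    uint64_t flags;
} __attribute__((packed));

struct tree_block_info {
    struct disk_key key;  /* first key of the tree block */
    uint8_t level;
} __attribute__((packed));

struct extent_inline_ref {
    uint8_t  type;
    uint64_t offset;
} __attribute__((packed));

static_assert(sizeof(struct disk_key) == 17, "disk key must be packed");

/* Old format: extent item + tree block info + one inline ref. */
size_t old_tree_block_item_size(void)
{
    return sizeof(struct extent_item) + sizeof(struct tree_block_info) +
           sizeof(struct extent_inline_ref);
}

/* New format: the tree block info is dropped; the level lives in the key offset. */
size_t new_metadata_item_size(void)
{
    return sizeof(struct extent_item) + sizeof(struct extent_inline_ref);
}
```

Under these assumptions the saving is the 18-byte tree block info, that is 24 + 18 + 9 = 51 bytes before and 24 + 9 = 33 bytes after.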
    • L
      Btrfs: use helper to cleanup tree roots · be283b2e
      Liu Bo committed
      free_root_pointers() has been introduced to clean up all of the tree roots,
      so just use it instead.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      be283b2e
    • L
      Btrfs: cleanup unused arguments of btrfs_csum_data · b0496686
      Liu Bo committed
      Argument 'root' is no longer used in btrfs_csum_data().
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      b0496686
    • D
      btrfs: clean up transaction abort messages · 08748810
      David Sterba committed
      The transaction abort stacktrace is printed only once per module
      lifetime, but we'd like to see it each time it happens per mounted
      filesystem.  Introduce a fs_state flag that records it.
      
      Tweak the messages around abort:
      * add error number to the first abort
      * print the exact negative errno from btrfs_decode_error
      * clean up btrfs_decode_error and callers
      * no dots at the end of the messages
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      08748810
    • D
      btrfs: merge save_error_info helpers into one · bbece8a3
      David Sterba committed
      Signed-off-by: David Sterba <dsterba@suse.cz>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      bbece8a3
    • J
      Btrfs: add some free space cache tests · 74255aa0
      Josef Bacik committed
      We keep hitting bugs in the tree log replay because btrfs_remove_free_space
      doesn't account for some corner case.  So add a bunch of tests to try and fully
      test btrfs_remove_free_space since the only time it is called is during tree log
      replay.  These tests all finish successfully, so as we find more of these bugs
      we need to add to these tests to make sure we don't regress in fixing things.
      I've hidden the tests behind a Kconfig option, but they take no time to run so
      all btrfs developers should have this turned on all the time.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      74255aa0
  2. 30 April, 2013 1 commit
  3. 26 April, 2013 1 commit
  4. 19 April, 2013 1 commit
  5. 18 April, 2013 3 commits
  6. 14 April, 2013 1 commit
  7. 13 April, 2013 1 commit
    • J
      Btrfs: make sure nbytes are right after log replay · 4bc4bee4
      Josef Bacik committed
      While trying to track down a tree log replay bug I noticed that fsck was always
      complaining about nbytes not being right for our fsynced file.  That is because
      the new fsync stuff doesn't wait for ordered extents to complete, so the inode's
      nbytes are not necessarily updated properly when we log it.  So to fix this we
      need to set nbytes to whatever it is on the inode that is on disk, so when we
      replay the extents we can just add the bytes that are being added as we replay
      the extent.  This makes it work for the case that we have the wrong nbytes or
      the case that we logged everything and nbytes is actually correct.  With this
      I'm no longer getting nbytes errors out of btrfsck.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
      4bc4bee4
  8. 12 April, 2013 1 commit
    • T
      kthread: Prevent unpark race which puts threads on the wrong cpu · f2530dc7
      Thomas Gleixner committed
      The smpboot threads rely on the park/unpark mechanism which binds per
      cpu threads on a particular core. Though the functionality is racy:
      
      CPU0                    CPU1                    CPU2
      unpark(T)                                       wake_up_process(T)
        clear(SHOULD_PARK)    T runs
                              leave parkme() due to !SHOULD_PARK
        bind_to(CPU2)         BUG_ON(wrong CPU)
      
      We cannot let the tasks move themselves to the target CPU as one of
      those tasks is actually the migration thread itself, which requires
      that it starts running on the target cpu right away.
      
      The solution to this problem is to prevent wakeups in park mode which
      are not from unpark(). That way we can guarantee that the association
      of the task to the target cpu is working correctly.
      
      Add a new task state (TASK_PARKED) which prevents other wakeups and
      use this state explicitly for the unpark wakeup.
      
      Peter noticed: Also, since the task state is visible to userspace and
      all the parked tasks are still in the PID space, it's a good hint in ps
      and friends that these tasks aren't really there for the moment.
      
      The migration thread has another related issue.
      
      CPU0                     CPU1
      Bring up CPU2
      create_thread(T)
      park(T)
       wait_for_completion()
                               parkme()
                               complete()
      sched_set_stop_task()
                               schedule(TASK_PARKED)
      
      The sched_set_stop_task() call is issued while the task is on the
      runqueue of CPU1 and that confuses the hell out of the stop_task class
      on that cpu. So we need the same synchronization before
      sched_set_stop_task().
      Reported-by: Dave Jones <davej@redhat.com>
      Reported-and-tested-by: Dave Hansen <dave@sr71.net>
      Reported-and-tested-by: Borislav Petkov <bp@alien8.de>
      Acked-by: Peter Ziljstra <peterz@infradead.org>
      Cc: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Cc: dhillf@gmail.com
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1304091635430.21884@ionos
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      f2530dc7
  9. 11 April, 2013 2 commits
  10. 10 April, 2013 4 commits
  11. 06 April, 2013 3 commits
    • T
      NFSv4/4.1: Fix bugs in nfs4[01]_walk_client_list · 7b1f1fd1
      Trond Myklebust committed
      It is unsafe to use list_for_each_entry_safe() here, because
      when we drop the nn->nfs_client_lock, we pin the _current_ list
      entry and ensure that it stays in the list, but we don't do the
      same for the _next_ list entry. Use of list_for_each_entry() is
      therefore the correct thing to do.
      
      Also fix the refcounting in nfs41_walk_client_list().
      
      Finally, ensure that the nfs_client has finished being initialised
      and, in the case of NFSv4.1, that the session is set up.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Bryan Schumaker <bjschuma@netapp.com>
      Cc: stable@vger.kernel.org [>= 3.7]
      7b1f1fd1
    • T
      NFSv4: Fix a memory leak in nfs4_discover_server_trunking · b193d59a
      Trond Myklebust committed
      When we assign a new rpc_client to clp->cl_rpcclient, we need to destroy
      the old one.
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org [>=3.7]
      b193d59a
    • B
      GFS2: Issue discards in 512b sectors · b2c87cae
      Bob Peterson committed
      This patch changes GFS2's discard issuing code so that it calls
      function sb_issue_discard rather than blkdev_issue_discard. The
      code was calling blkdev_issue_discard and specifying the correct
      sector offset and sector size, but blkdev_issue_discard expects
      these values to be in terms of 512 byte sectors, even if the native
      sector size for the device is different. Calling sb_issue_discard
      with the BLOCK size instead ensures the correct block-to-512b-sector
      translation. I verified that "minlen" is specified in blocks, so
      comparing it to a number of blocks is correct.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      b2c87cae
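The block-to-512b-sector translation mentioned in this commit message can be sketched as a shift by the difference between the filesystem block size and the 512-byte sector size. This is an illustrative sketch only: `struct discard_range` and `blocks_to_sectors` are hypothetical names, and the block size is assumed to be a power of two given as a bit count (for example 12 for 4096-byte blocks).

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* 512-byte sectors */

/* Hypothetical result type: a discard range in 512-byte sectors. */
struct discard_range {
    uint64_t sector;
    uint64_t nr_sects;
};

/*
 * Convert a range given in filesystem blocks into 512-byte sectors.
 * blocksize_bits is log2 of the filesystem block size in bytes.
 */
struct discard_range blocks_to_sectors(uint64_t block, uint64_t nr_blocks,
                                       unsigned int blocksize_bits)
{
    struct discard_range r;
    unsigned int shift;

    assert(blocksize_bits >= SECTOR_SHIFT);
    shift = blocksize_bits - SECTOR_SHIFT;
    r.sector = block << shift;
    r.nr_sects = nr_blocks << shift;
    return r;
}
```

For 4096-byte blocks the shift is 3, so a discard of 8 blocks starting at block 100 covers 64 sectors starting at sector 800. Passing block numbers straight through without this shift would discard the wrong region.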
  12. 04 April, 2013 2 commits