1. 22 10月, 2015 1 次提交
    • J
      Btrfs: add fragment=* debug mount option · d0bd4560
      Josef Bacik 提交于
      In tracking down these weird bitmap problems it was helpful to artificially
      create an extremely fragmented file system.  These mount options let us either
      fragment data or metadata or both.  With these options I could reproduce all
      sorts of weird latencies and hangs that occur under extreme fragmentation and
      get them fixed.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d0bd4560
  2. 29 9月, 2015 1 次提交
    • A
      Btrfs: __btrfs_std_error() logic should be consistent w/out CONFIG_PRINTK defined · 57d816a1
      Anand Jain 提交于
      error handling logic behaves differently with or without
      CONFIG_PRINTK defined, since there are two copies of the same
      function which a bit of different logic
      
      One, when CONFIG_PRINTK is defined, code is
      
      __btrfs_std_error(..)
      {
      ::
             save_error_info(fs_info);
             if (sb->s_flags & MS_BORN)
                     btrfs_handle_error(fs_info);
      }
      
      and two when CONFIG_PRINTK is not defined, the code is
      
      __btrfs_std_error(..)
      {
      ::
             if (sb->s_flags & MS_BORN) {
                     save_error_info(fs_info);
                     btrfs_handle_error(fs_info);
              }
      }
      
      I doubt if this was intentional ? and appear to have caused since
      we maintain two copies of the same function and they got diverged
      with commits.
      
      Now to decide which logic is correct reviewed changes as below,
      
       533574c6
      Commit added two copies of this function
      
       cf79ffb5
      Commit made change to only one copy of the function and to the
      copy when CONFIG_PRINTK is defined.
      
      To fix this, instead of maintaining two copies of same function
      approach, maintain single function, and just put the extra
      portion of the code under CONFIG_PRINTK define.
      
      This patch just does that. And keeps code of with CONFIG_PRINTK
      defined.
      Signed-off-by: NAnand Jain <anand.jain@oracle.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      57d816a1
  3. 10 9月, 2015 1 次提交
    • F
      Btrfs: remove unnecessary locking of cleaner_mutex to avoid deadlock · 85e0a0f2
      Filipe Manana 提交于
      After commmit e44163e1 ("btrfs: explictly delete unused block groups
      in close_ctree and ro-remount"), added in the 4.3 merge window, we have
      calls to btrfs_delete_unused_bgs() while holding the cleaner_mutex.
      This can cause a deadlock with a concurrent block group relocation (when
      a filesystem balance or shrink operation is in progress for example)
      because btrfs_delete_unused_bgs() locks delete_unused_bgs_mutex and the
      relocation path locks first delete_unused_bgs_mutex and then it locks
      cleaner_mutex, resulting in a classic ABBA deadlock:
      
               CPU 0                                        CPU 1
      
      lock fs_info->cleaner_mutex
      
                                                 __btrfs_balance() || btrfs_shrink_device()
                                                   lock fs_info->delete_unused_bgs_mutex
                                                   btrfs_relocate_chunk()
                                                     btrfs_relocate_block_group()
                                                       lock fs_info->cleaner_mutex
      btrfs_delete_unused_bgs()
        lock fs_info->delete_unused_bgs_mutex
      
      Fix this by not taking the cleaner_mutex before calling
      btrfs_delete_unused_bgs() because it's no longer needed after
      commit 67c5e7d4 ("Btrfs: fix race between balance and unused block
      group deletion"). The mutex fs_info->delete_unused_bgs_mutex, the
      spinlock fs_info->unused_bgs_lock and a block group's spinlock are
      enough to get correct serialization between tasks running relocation
      and unused block group deletion (as well as between multiple tasks
      concurrently calling btrfs_delete_unused_bgs()).
      
      This issue was discussed (in the mailing list) during the review of
      the patch titled "btrfs: explictly delete unused block groups in
      close_ctree and ro-remount" and it was agreed that acquiring the
      cleaner mutex had to be dropped after the patch titled
      "Btrfs: fix race between balance and unused block group deletion"
      got merged (both patches were submitted at about the same time, but
      one landed in kernel 4.2 and the other in the 4.3 merge window).
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      85e0a0f2
  4. 09 8月, 2015 1 次提交
    • C
      Btrfs: add support for blkio controllers · da2f0f74
      Chris Mason 提交于
      This attaches accounting information to bios as we submit them so the
      new blkio controllers can throttle on btrfs filesystems.
      
      Not much is required, we're just associating bios with blkcgs during clone,
      calling wbc_init_bio()/wbc_account_io() during writepages submission,
      and attaching the bios to the current context during direct IO.
      
      Finally if we are splitting bios during btrfs_map_bio, this attaches
      accounting information to the split.
      
      The end result is able to throttle nicely on single disk filesystems.  A
      little more work is required for multi-device filesystems.
      Signed-off-by: NChris Mason <clm@fb.com>
      da2f0f74
  5. 06 8月, 2015 1 次提交
  6. 29 7月, 2015 2 次提交
  7. 03 6月, 2015 8 次提交
    • O
      Btrfs: show subvol= and subvolid= in /proc/mounts · c8d3fe02
      Omar Sandoval 提交于
      Now that we're guaranteed to have a meaningful root dentry, we can just
      export seq_dentry() and use it in btrfs_show_options(). The subvolume ID
      is easy to get and can also be useful, so put that in there, too.
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NOmar Sandoval <osandov@osandov.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c8d3fe02
    • O
      Btrfs: unify subvol= and subvolid= mounting · 05dbe683
      Omar Sandoval 提交于
      Currently, mounting a subvolume with subvolid= takes a different code
      path than mounting with subvol=. This isn't really a big deal except for
      the fact that mounts done with subvolid= or the default subvolume don't
      have a dentry that's connected to the dentry tree like in the subvol=
      case. To unify the code paths, when given subvolid= or using the default
      subvolume ID, translate it into a subvolume name by walking
      ROOT_BACKREFs in the root tree and INODE_REFs in the filesystem trees.
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NOmar Sandoval <osandov@osandov.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      05dbe683
    • O
      Btrfs: fail on mismatched subvol and subvolid mount options · bb289b7b
      Omar Sandoval 提交于
      There's nothing to stop a user from passing both subvol= and subvolid=
      to mount, but if they don't refer to the same subvolume, someone is
      going to be surprised at some point. Error out on this case, but allow
      users to pass in both if they do match (which they could, for example,
      get out of /proc/mounts).
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NOmar Sandoval <osandov@osandov.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      bb289b7b
    • O
      Btrfs: clean up error handling in mount_subvol() · fa330659
      Omar Sandoval 提交于
      In preparation for new functionality in mount_subvol(), give it
      ownership of subvol_name and tidy up the error paths.
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NOmar Sandoval <osandov@osandov.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      fa330659
    • O
      Btrfs: remove all subvol options before mounting top-level · e6e4dbe8
      Omar Sandoval 提交于
      Currently, setup_root_args() substitutes 's/subvol=[^,]*/subvolid=0/'.
      But, this means that if the user passes both a subvol and subvolid for
      some reason, we won't actually mount the top-level when we recursively
      mount. For example, consider:
      
      mkfs.btrfs -f /dev/sdb
      mount /dev/sdb /mnt
      btrfs subvol create /mnt/subvol1 # subvolid=257
      btrfs subvol create /mnt/subvol2 # subvolid=258
      umount /mnt
      mount -osubvol=/subvol1,subvolid=258 /dev/sdb /mnt
      
      In the final mount, subvol=/subvol1,subvolid=258 becomes
      subvolid=0,subvolid=258, and the last option takes precedence, so we
      mount subvol2 and try to look up subvol1 inside of it, which fails.
      
      So, instead, do a thorough scan through the argument list and remove any
      subvol= and subvolid= options, then append subvolid=0 to the end. This
      implicitly makes subvol= take precedence over subvolid=, but we're about
      to add a stricter check for that. This also makes setup_root_args() more
      generic, which we'll need soon.
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NOmar Sandoval <osandov@osandov.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      e6e4dbe8
    • O
      Btrfs: lock superblock before remounting for rw subvol · 773cd04e
      Omar Sandoval 提交于
      Since commit 0723a047 ("btrfs: allow mounting btrfs subvolumes with
      different ro/rw options"), when mounting a subvolume read/write when
      another subvolume has previously been mounted read-only, we first do a
      remount. However, this should be done with the superblock locked, as per
      sync_filesystem():
      
      	/*
      	 * We need to be protected against the filesystem going from
      	 * r/o to r/w or vice versa.
      	 */
      	WARN_ON(!rwsem_is_locked(&sb->s_umount));
      
      This WARN_ON can easily be hit with:
      
      mkfs.btrfs -f /dev/vdb
      mount /dev/vdb /mnt
      btrfs subvol create /mnt/vol1
      btrfs subvol create /mnt/vol2
      umount /mnt
      mount -oro,subvol=/vol1 /dev/vdb /mnt
      mount -orw,subvol=/vol2 /dev/vdb /mnt2
      
      Fixes: 0723a047 ("btrfs: allow mounting btrfs subvolumes with different ro/rw options")
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NOmar Sandoval <osandov@osandov.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      773cd04e
    • D
      btrfs: add 'cold' compiler annotations to all error handling functions · c0d19e2b
      David Sterba 提交于
      The annotated functios will be placed into .text.unlikely section. The
      annotation also hints compiler to move the code out of the hot paths,
      and may implicitly mark if-statement leading to that block as unlikely.
      
      This is a heuristic, the impact on the generated code is not
      significant.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      c0d19e2b
    • D
      btrfs: report exact callsite where transaction abort occurs · 1a9a8a71
      David Sterba 提交于
      WARN is called from a single location and all bugreports say that's in
      super.c __btrfs_abort_transaction. This is slightly confusing as we'd
      rather want to know the exact callsite. Whereas this information is
      printed in the syslog below the stacktrace, this requires further look
      and we usually see only the headline from WARNING.
      
      Moving the WARN into the macro has to inline some code and increases
      code by a few kilobytes:
      
        text    data     bss     dec     hex filename
      835481   20305   14120  869906   d4612 btrfs.ko.before
      842883   20305   14120  877308   d62fc btrfs.ko.after
      
      The delta is +7k (130+ calls), measured on 3.19 x86_64, distro config.
      The increase is not small and could lead to worse icache use. The code
      is on error/exit paths that can be recognized by compiler as cold and
      moved out of the way so the impact is speculated to be low, if
      measurable at all.
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      1a9a8a71
  8. 16 4月, 2015 1 次提交
  9. 27 3月, 2015 2 次提交
    • J
      btrfs: cleanup orphans while looking up default subvolume · 727b9784
      Jeff Mahoney 提交于
      Orphans in the fs tree are cleaned up via open_ctree and subvolume
      orphans are cleaned via btrfs_lookup_dentry -- except when a default
      subvolume is in use.  The name for the default subvolume uses a manual
      lookup that doesn't trigger orphan cleanup and needs to trigger it
      manually as well. This doesn't apply to the remount case since the
      subvolumes are cleaned up by walking the root radix tree.
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      727b9784
    • T
      btrfs: explicitly set control file's private_data · d8620958
      Tom Van Braeckel 提交于
      The private_data member of the Btrfs control device file
      (/dev/btrfs-control) is used to hold the current transaction and needs
      to be initialized to NULL to signify that no transaction is in progress.
      
      We explicitly set the control file's private_data to NULL to be
      independent of whatever value the misc subsystem initializes it to.
      
      Backstory:
      ----------
      
      The misc subsystem (which is used by /dev/btrfs-control) initializes
      a file's private_data to point to the misc device when a driver has
      registered a custom open file operation and initializes it to NULL
      when a custom open file operation has *not* been provided.
      
      This subtle quirk is confusing, to the point where kernel code registers
      *empty* file open operations to have private_data point to the misc
      device structure.
      
      And it leads to bugs, where the addition or removal of a custom open
      file operation surprisingly changes the initial contents of a file's
      private_data structure.
      
      To simplify things in the misc subsystem, a patch [1] has been proposed
      to *always* set private_data to point to the misc device instead of
      only doing this when a custom open file operation has been registered.
      
      But before we can fix this in the misc subsystem itself, we need to
      modify the (few) drivers that rely on this very subtle behavior.
      
      [1] https://lkml.org/lkml/2014/12/4/939Signed-off-by: NMartin Kepplinger <martink@posteo.de>
      Signed-off-by: NTom Van Braeckel <tomvanbraeckel@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d8620958
  10. 04 3月, 2015 1 次提交
  11. 21 2月, 2015 1 次提交
  12. 22 1月, 2015 1 次提交
  13. 21 1月, 2015 1 次提交
    • Q
      btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock. · a53f4f8e
      Qu Wenruo 提交于
      Commit 6b5fe46d (btrfs: do commit in sync_fs if there are pending
      changes) will call btrfs_start_transaction() in sync_fs(), to handle
      some operations needed to be done in next transaction.
      
      However this can cause deadlock if the filesystem is frozen, with the
      following sys_r+w output:
      [  143.255932] Call Trace:
      [  143.255936]  [<ffffffff816c0e09>] schedule+0x29/0x70
      [  143.255939]  [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
      [  143.255971]  [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0
      [btrfs]
      [  143.255992]  [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20
      [btrfs]
      [  143.256003]  [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
      [  143.256007]  [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
      [  143.256011]  [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
      [  143.256014]  [<ffffffff811f7d75>] sys_sync+0x55/0x90
      [  143.256017]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
      [  143.256111] Call Trace:
      [  143.256114]  [<ffffffff816c0e09>] schedule+0x29/0x70
      [  143.256119]  [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
      [  143.256123]  [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
      [  143.256131]  [<ffffffff811caae8>] thaw_super+0x28/0xc0
      [  143.256135]  [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
      [  143.256187]  [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
      [  143.256213]  [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
      
      The reason is like the following:
      (Holding s_umount)
      VFS sync_fs staff:
      |- btrfs_sync_fs()
         |- btrfs_start_transaction()
            |- sb_start_intwrite()
            (Waiting thaw_fs to unfreeze)
      					VFS thaw_fs staff:
      					thaw_fs()
      					(Waiting sync_fs to release
      					 s_umount)
      
      So deadlock happens.
      This can be easily triggered by fstest/generic/068 with inode_cache
      mount option.
      
      The fix is to check if the fs is frozen, if the fs is frozen, just
      return and waiting for the next transaction.
      
      Cc: David Sterba <dsterba@suse.cz>
      Reported-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      [enhanced comment, changed to SB_FREEZE_WRITE]
      Signed-off-by: NDavid Sterba <dsterba@suse.cz>
      Signed-off-by: NChris Mason <clm@fb.com>
      a53f4f8e
  14. 20 1月, 2015 1 次提交
  15. 03 12月, 2014 1 次提交
    • F
      Btrfs: make btrfs_abort_transaction consider existence of new block groups · c92f6be3
      Filipe Manana 提交于
      If the transaction handle doesn't have used blocks but has created new block
      groups make sure we turn the fs into readonly mode too. This is because the
      new block groups didn't get all their metadata persisted into the chunk and
      device trees, and therefore if a subsequent transaction starts, allocates
      space from the new block groups, writes data or metadata into that space,
      commits successfully and then after we unmount and mount the filesystem
      again, the same space can be allocated again for a new block group,
      resulting in file data or metadata corruption.
      
      Example where we don't abort the transaction when we fail to finish the
      chunk allocation (add items to the chunk and device trees) and later a
      future transaction where the block group is removed fails because it can't
      find the chunk item in the chunk tree:
      
      [25230.404300] WARNING: CPU: 0 PID: 7721 at fs/btrfs/super.c:260 __btrfs_abort_transaction+0x50/0xfc [btrfs]()
      [25230.404301] BTRFS: Transaction aborted (error -28)
      [25230.404302] Modules linked in: btrfs dm_flakey nls_utf8 fuse xor raid6_pq ntfs vfat msdos fat xfs crc32c_generic libcrc32c ext3 jbd ext2 dm_mod nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc loop psmouse i2c_piix4 i2ccore parport_pc parport processor button pcspkr serio_raw thermal_sys evdev microcode ext4 crc16 jbd2 mbcache sr_mod cdrom ata_generic sg sd_mod crc_t10dif crct10dif_generic crct10dif_common virtio_scsi floppy e1000 ata_piix libata virtio_pci virtio_ring scsi_mod virtio [last unloaded: btrfs]
      [25230.404325] CPU: 0 PID: 7721 Comm: xfs_io Not tainted 3.17.0-rc5-btrfs-next-1+ #1
      [25230.404326] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [25230.404328]  0000000000000000 ffff88004581bb08 ffffffff813e7a13 ffff88004581bb50
      [25230.404330]  ffff88004581bb40 ffffffff810423aa ffffffffa049386a 00000000ffffffe4
      [25230.404332]  ffffffffa05214c0 000000000000240c ffff88010fc8f800 ffff88004581bba8
      [25230.404334] Call Trace:
      [25230.404338]  [<ffffffff813e7a13>] dump_stack+0x45/0x56
      [25230.404342]  [<ffffffff810423aa>] warn_slowpath_common+0x7f/0x98
      [25230.404351]  [<ffffffffa049386a>] ? __btrfs_abort_transaction+0x50/0xfc [btrfs]
      [25230.404353]  [<ffffffff8104240b>] warn_slowpath_fmt+0x48/0x50
      [25230.404362]  [<ffffffffa049386a>] __btrfs_abort_transaction+0x50/0xfc [btrfs]
      [25230.404374]  [<ffffffffa04a8c43>] btrfs_create_pending_block_groups+0x10c/0x135 [btrfs]
      [25230.404387]  [<ffffffffa04b77fd>] __btrfs_end_transaction+0x7e/0x2de [btrfs]
      [25230.404398]  [<ffffffffa04b7a6d>] btrfs_end_transaction+0x10/0x12 [btrfs]
      [25230.404408]  [<ffffffffa04a3d64>] btrfs_check_data_free_space+0x111/0x1f0 [btrfs]
      [25230.404421]  [<ffffffffa04c53bd>] __btrfs_buffered_write+0x160/0x48d [btrfs]
      [25230.404425]  [<ffffffff811a9268>] ? cap_inode_need_killpriv+0x2d/0x37
      [25230.404429]  [<ffffffff810f6501>] ? get_page+0x1a/0x2b
      [25230.404441]  [<ffffffffa04c7c95>] btrfs_file_write_iter+0x321/0x42f [btrfs]
      [25230.404443]  [<ffffffff8110f5d9>] ? handle_mm_fault+0x7f3/0x846
      [25230.404446]  [<ffffffff813e98c5>] ? mutex_unlock+0x16/0x18
      [25230.404449]  [<ffffffff81138d68>] new_sync_write+0x7c/0xa0
      [25230.404450]  [<ffffffff81139401>] vfs_write+0xb0/0x112
      [25230.404452]  [<ffffffff81139c9d>] SyS_pwrite64+0x66/0x84
      [25230.404454]  [<ffffffff813ebf52>] system_call_fastpath+0x16/0x1b
      [25230.404455] ---[ end trace 5aa5684fdf47ab38 ]---
      [25230.404458] BTRFS warning (device sdc): btrfs_create_pending_block_groups:9228: Aborting unused transaction(No space left).
      [25288.084814] BTRFS: error (device sdc) in btrfs_free_chunk:2509: errno=-2 No such entry (Failed lookup while freeing chunk.)
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c92f6be3
  16. 21 11月, 2014 2 次提交
  17. 12 11月, 2014 2 次提交
  18. 28 10月, 2014 1 次提交
  19. 08 10月, 2014 1 次提交
  20. 06 10月, 2014 1 次提交
    • Q
      btrfs: Make btrfs handle security mount options internally to avoid losing security label. · f667aef6
      Qu Wenruo 提交于
      [BUG]
      Originally when mount btrfs with "-o subvol=" mount option, btrfs will
      lose all security lable.
      And if the btrfs fs is mounted somewhere else, due to the lost of
      security lable, SELinux will refuse to mount since the same super block
      is being mounted using different security lable.
      
      [REPRODUCER]
      With SELinux enabled:
       #mkfs -t btrfs /dev/sda5
       #mount -o context=system_u:object_r:nfs_t:s0 /dev/sda5 /mnt/btrfs
       #btrfs subvolume create /mnt/btrfs/subvol
       #mount -o subvol=subvol,context=system_u:object_r:nfs_t:s0 /dev/sda5
        /mnt/test
      
      kernel message:
      SELinux: mount invalid.  Same superblock, different security settings
      for (dev sda5, type btrfs)
      
      [REASON]
      This happens because btrfs will call vfs_kern_mount() and then
      mount_subtree() to handle subvolume name lookup.
      First mount will cut off all the security lables and when it comes to
      the second vfs_kern_mount(), it has no security label now.
      
      [FIX]
      This patch will makes btrfs behavior much more like nfs,
      which has the type flag FS_BINARY_MOUNTDATA,
      making btrfs handles the security label internally.
      So security label will be set in the real mount time and won't lose
      label when use with "subvol=" mount option.
      Reported-by: NEryu Guan <guaneryu@gmail.com>
      Signed-off-by: NQu Wenruo <quwenruo@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      f667aef6
  21. 02 10月, 2014 4 次提交
  22. 18 9月, 2014 4 次提交
  23. 15 8月, 2014 1 次提交