1. 27 6月, 2014 2 次提交
    • J
      ext4: Fix hole punching for files with indirect blocks · a93cd4cf
      Jan Kara 提交于
      Hole punching code for files with indirect blocks wrongly computed
      number of blocks which need to be cleared when traversing the indirect
      block tree. That could result in punching more blocks than actually
      requested and thus effectively cause a data loss. For example:
      
      fallocate -n -p 10240000 4096
      
      will punch the range 10240000 - 12632064 instead of the range 1024000 -
      10244096. Fix the calculation.
      
      CC: stable@vger.kernel.org
      Fixes: 8bad6fc8Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      a93cd4cf
    • J
      ext4: Fix block zeroing when punching holes in indirect block files · 77ea2a4b
      Jan Kara 提交于
      free_holes_block() passed local variable as a block pointer
      to ext4_clear_blocks(). Thus ext4_clear_blocks() zeroed out this local
      variable instead of proper place in inode / indirect block. We later
      zero out proper place in inode / indirect block but don't dirty the
      inode / buffer again which can lead to subtle issues (some changes e.g.
      to inode can be lost).
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      77ea2a4b
  2. 26 6月, 2014 2 次提交
  3. 16 6月, 2014 1 次提交
    • J
      ext4: Fix buffer double free in ext4_alloc_branch() · c5c7b8dd
      Jan Kara 提交于
      Error recovery in ext4_alloc_branch() calls ext4_forget() even for
      buffer corresponding to indirect block it did not allocate. This leads
      to brelse() being called twice for that buffer (once from ext4_forget()
      and once from cleanup in ext4_ind_map_blocks()) leading to buffer use
      count misaccounting. Eventually (but often much later because there
      are other users of the buffer) we will see messages like:
      VFS: brelse: Trying to free free buffer
      
      Another manifestation of this problem is an error:
      JBD2 unexpected failure: jbd2_journal_revoke: !buffer_revoked(bh);
      inconsistent data on disk
      
      The fix is easy - don't forget buffer we did not allocate. Also add an
      explanatory comment because the indexing at ext4_alloc_branch() is
      somewhat subtle.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Cc: stable@vger.kernel.org
      c5c7b8dd
  4. 14 6月, 2014 7 次提交
    • E
      btrfs: fix error handling in create_pending_snapshot · 47a306a7
      Eric Sandeen 提交于
      fcebe456 cut and pasted some code to a later point
      in create_pending_snapshot(), but didn't switch
      to the appropriate error handling for this stage
      of the function.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      47a306a7
    • E
      btrfs: fix use of uninit "ret" in end_extent_writepage() · 3e2426bd
      Eric Sandeen 提交于
      If this condition in end_extent_writepage() is false:
      
      	if (tree->ops && tree->ops->writepage_end_io_hook)
      
      we will then test an uninitialized "ret" at:
      
      	ret = ret < 0 ? ret : -EIO;
      
      The test for ret is for the case where ->writepage_end_io_hook
      failed, and we'd choose that ret as the error; but if
      there is no ->writepage_end_io_hook, nothing sets ret.
      
      Initializing ret to 0 should be sufficient; if
      writepage_end_io_hook wasn't set, (!uptodate) means
      non-zero err was passed in, so we choose -EIO in that case.
      Signed-of-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      3e2426bd
    • E
      btrfs: free ulist in qgroup_shared_accounting() error path · d7372780
      Eric Sandeen 提交于
      If tmp = ulist_alloc(GFP_NOFS) fails, we return without
      freeing the previously allocated qgroups = ulist_alloc(GFP_NOFS)
      and cause a memory leak.
      Signed-off-by: NEric Sandeen <sandeen@redhat.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      d7372780
    • F
      Btrfs: fix qgroups sanity test crash or hang · b050f9f6
      Filipe Manana 提交于
      Often when running the qgroups sanity test, a crash or a hang happened.
      This is because the extent buffer the test uses for the root node doesn't
      have an header level explicitly set, making it have a random level value.
      This is a problem when it's not zero for the btrfs_search_slot() calls
      the test ends up doing, resulting in crashes or hangs such as the following:
      
      [ 6454.127192] Btrfs loaded, debug=on, assert=on, integrity-checker=on
      (...)
      [ 6454.127760] BTRFS: selftest: Running qgroup tests
      [ 6454.127964] BTRFS: selftest: Running test_test_no_shared_qgroup
      [ 6454.127966] BTRFS: selftest: Qgroup basic add
      [ 6480.152005] BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:5383]
      [ 6480.152005] Modules linked in: btrfs(+) xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc i2c_piix4 i2c_core pcspkr evbug psmouse serio_raw e1000 [last unloaded: btrfs]
      [ 6480.152005] irq event stamp: 188448
      [ 6480.152005] hardirqs last  enabled at (188447): [<ffffffff8168ef5c>] restore_args+0x0/0x30
      [ 6480.152005] hardirqs last disabled at (188448): [<ffffffff81698e6a>] apic_timer_interrupt+0x6a/0x80
      [ 6480.152005] softirqs last  enabled at (188446): [<ffffffff810516cf>] __do_softirq+0x1cf/0x450
      [ 6480.152005] softirqs last disabled at (188441): [<ffffffff81051c25>] irq_exit+0xb5/0xc0
      [ 6480.152005] CPU: 0 PID: 5383 Comm: modprobe Not tainted 3.15.0-rc8-fdm-btrfs-next-33+ #4
      [ 6480.152005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 6480.152005] task: ffff8802146125a0 ti: ffff8800d0d00000 task.ti: ffff8800d0d00000
      [ 6480.152005] RIP: 0010:[<ffffffff81349a63>]  [<ffffffff81349a63>] __write_lock_failed+0x13/0x20
      [ 6480.152005] RSP: 0018:ffff8800d0d038e8  EFLAGS: 00000287
      [ 6480.152005] RAX: 0000000000000000 RBX: ffffffff8168ef5c RCX: 000005deb8525852
      [ 6480.152005] RDX: 0000000000000000 RSI: 0000000000001d45 RDI: ffff8802105000b8
      [ 6480.152005] RBP: ffff8800d0d038e8 R08: fffffe12710f63db R09: ffffffffa03196fb
      [ 6480.152005] R10: ffff8802146125a0 R11: ffff880214612e28 R12: ffff8800d0d03858
      [ 6480.152005] R13: 0000000000000000 R14: ffff8800d0d00000 R15: ffff8802146125a0
      [ 6480.152005] FS:  00007f14ff804700(0000) GS:ffff880215e00000(0000) knlGS:0000000000000000
      [ 6480.152005] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      [ 6480.152005] CR2: 00007fff4df0dac8 CR3: 00000000d1796000 CR4: 00000000000006f0
      [ 6480.152005] Stack:
      [ 6480.152005]  ffff8800d0d03908 ffffffff810ae967 0000000000000001 ffff8802105000b8
      [ 6480.152005]  ffff8800d0d03938 ffffffff8168e57e ffffffffa0319c16 0000000000000007
      [ 6480.152005]  ffff880210500000 ffff880210500100 ffff8800d0d039b8 ffffffffa0319c16
      [ 6480.152005] Call Trace:
      [ 6480.152005]  [<ffffffff810ae967>] do_raw_write_lock+0x47/0xa0
      [ 6480.152005]  [<ffffffff8168e57e>] _raw_write_lock+0x5e/0x80
      [ 6480.152005]  [<ffffffffa0319c16>] ? btrfs_tree_lock+0x116/0x270 [btrfs]
      [ 6480.152005]  [<ffffffffa0319c16>] btrfs_tree_lock+0x116/0x270 [btrfs]
      [ 6480.152005]  [<ffffffffa02b2acb>] btrfs_lock_root_node+0x3b/0x50 [btrfs]
      [ 6480.152005]  [<ffffffffa02b81a6>] btrfs_search_slot+0x916/0xa20 [btrfs]
      [ 6480.152005]  [<ffffffff811a727f>] ? create_object+0x23f/0x300
      [ 6480.152005]  [<ffffffffa02b9958>] btrfs_insert_empty_items+0x78/0xd0 [btrfs]
      [ 6480.152005]  [<ffffffffa036041a>] insert_normal_tree_ref.constprop.4+0xa2/0x19a [btrfs]
      [ 6480.152005]  [<ffffffffa03605c3>] test_no_shared_qgroup+0xb1/0x1ca [btrfs]
      [ 6480.152005]  [<ffffffff8108cad6>] ? local_clock+0x16/0x30
      [ 6480.152005]  [<ffffffffa035ef8e>] btrfs_test_qgroups+0x1ae/0x1d7 [btrfs]
      [ 6480.152005]  [<ffffffffa03a69d2>] ? ftrace_define_fields_btrfs_space_reservation+0xfd/0xfd [btrfs]
      [ 6480.152005]  [<ffffffffa03a6a86>] init_btrfs_fs+0xb4/0x153 [btrfs]
      [ 6480.152005]  [<ffffffff81000352>] do_one_initcall+0x102/0x150
      [ 6480.152005]  [<ffffffff8103d223>] ? set_memory_nx+0x43/0x50
      [ 6480.152005]  [<ffffffff81682668>] ? set_section_ro_nx+0x6d/0x74
      [ 6480.152005]  [<ffffffff810d91cc>] load_module+0x1cdc/0x2630
      (...)
      
      Therefore initialize the extent buffer as an empty leaf (level 0).
      
      Issue easy to reproduce when btrfs is built as a module via:
      
          $ for ((i = 1; i <= 1000000; i++)); do rmmod btrfs; modprobe btrfs; done
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      b050f9f6
    • S
      btrfs: prevent RCU warning when dereferencing radix tree slot · f1e3c289
      Sasha Levin 提交于
      Mark the dereference as protected by lock. Not doing so triggers
      an RCU warning since the radix tree assumed that RCU is in use.
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      f1e3c289
    • W
      Btrfs: fix unfinished readahead thread for raid5/6 degraded mounting · 5fbc7c59
      Wang Shilong 提交于
      Steps to reproduce:
      
       # mkfs.btrfs -f /dev/sd[b-f] -m raid5 -d raid5
       # mkfs.ext4 /dev/sdc --->corrupt one of btrfs device
       # mount /dev/sdb /mnt -o degraded
       # btrfs scrub start -BRd /mnt
      
      This is because readahead would skip missing device, this is not true
      for RAID5/6, because REQ_GET_READ_MIRRORS return 1 for RAID5/6 block
      mapping. If expected data locates in missing device, readahead thread
      would not call __readahead_hook() which makes event @rc->elems=0
      wait forever.
      
      Fix this problem by checking return value of btrfs_map_block(),we
      can only skip missing device safely if there are several mirrors.
      Signed-off-by: NWang Shilong <wangsl.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      5fbc7c59
    • G
      btrfs: new ioctl TREE_SEARCH_V2 · cc68a8a5
      Gerhard Heift 提交于
      This new ioctl call allows the user to supply a buffer of varying size in which
      a tree search can store its results. This is much more flexible if you want to
      receive items which are larger than the current fixed buffer of 3992 bytes or
      if you want to fetch more items at once. Items larger than this buffer are for
      example some of the type EXTENT_CSUM.
      Signed-off-by: NGerhard Heift <Gerhard@Heift.Name>
      Signed-off-by: NChris Mason <clm@fb.com>
      Acked-by: NDavid Sterba <dsterba@suse.cz>
      cc68a8a5
  5. 13 6月, 2014 6 次提交
  6. 12 6月, 2014 8 次提交
  7. 11 6月, 2014 4 次提交
  8. 10 6月, 2014 10 次提交
    • M
      NFS: populate ->net in mount data when remounting · a914722f
      Mateusz Guzik 提交于
      Otherwise the kernel oopses when remounting with IPv6 server because
      net is dereferenced in dev_get_by_name.
      
      Use net ns of current thread so that dev_get_by_name does not operate on
      foreign ns. Changing the address is prohibited anyway so this should not
      affect anything.
      Signed-off-by: NMateusz Guzik <mguzik@redhat.com>
      Cc: linux-nfs@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      a914722f
    • W
      pnfs: fix lockup caused by pnfs_generic_pg_test · c5e20cb7
      Weston Andros Adamson 提交于
      end_offset and req_offset both return u64 - avoid casting to u32
      until it's needed, when it's less than the (u32) size returned by
      nfs_generic_pg_test.
      
      Also, fix the comments in pnfs_generic_pg_test.
      
      Running the cthon04 special tests caused this lockup in the
      "write/read at 2GB, 4GB edges" test when running against a file layout server:
      
      BUG: soft lockup - CPU#0 stuck for 22s! [bigfile2:823]
      Modules linked in: nfs_layout_nfsv41_files rpcsec_gss_krb5 nfsv4 nfs fscache ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 ip6table_mangle ip6table_filter ip6_tables iptable_nat nf_nat_ipv4 nf_nat iptable_mangle ppdev crc32c_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd serio_raw e1000 shpchp i2c_piix4 i2c_core parport_pc parport nfsd auth_rpcgss oid_registry exportfs nfs_acl lockd sunrpc btrfs xor zlib_deflate raid6_pq mptspi scsi_transport_spi mptscsih mptbase ata_generic floppy autofs4
      irq event stamp: 205958
      hardirqs last  enabled at (205957): [<ffffffff814a62dc>] restore_args+0x0/0x30
      hardirqs last disabled at (205958): [<ffffffff814ad96a>] apic_timer_interrupt+0x6a/0x80
      softirqs last  enabled at (205956): [<ffffffff8103ffb2>] __do_softirq+0x1ea/0x2ab
      softirqs last disabled at (205951): [<ffffffff8104026d>] irq_exit+0x44/0x9a
      CPU: 0 PID: 823 Comm: bigfile2 Not tainted 3.15.0-rc1-branch-pgio_plus+ #3
      Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
      task: ffff8800792ec480 ti: ffff880078c4e000 task.ti: ffff880078c4e000
      RIP: 0010:[<ffffffffa02ce51f>]  [<ffffffffa02ce51f>] nfs_page_group_unlock+0x3e/0x4b [nfs]
      RSP: 0018:ffff880078c4fab0  EFLAGS: 00000202
      RAX: 0000000000000fff RBX: ffff88006bf83300 RCX: 0000000000000000
      RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88006bf83300
      RBP: ffff880078c4fab8 R08: 0000000000000001 R09: 0000000000000000
      R10: ffffffff8249840c R11: 0000000000000000 R12: 0000000000000035
      R13: ffff88007ffc72d8 R14: 0000000000000001 R15: 0000000000000000
      FS:  00007f45f11b7740(0000) GS:ffff88007f200000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3a8cb632d0 CR3: 000000007931c000 CR4: 00000000001407f0
      Stack:
       ffff88006bf832c0 ffff880078c4fb00 ffffffffa02cec22 ffff880078c4fad8
       00000fff810f9d99 ffff880078c4fca0 ffff88006bf832c0 ffff88006bf832c0
       ffff880078c4fca0 ffff880078c4fd60 ffff880078c4fb28 ffffffffa02cee34
      Call Trace:
       [<ffffffffa02cec22>] __nfs_pageio_add_request+0x298/0x34f [nfs]
       [<ffffffffa02cee34>] nfs_pageio_add_request+0x1f/0x42 [nfs]
       [<ffffffffa02d1722>] nfs_do_writepage+0x1b5/0x1e4 [nfs]
       [<ffffffffa02d1764>] nfs_writepages_callback+0x13/0x25 [nfs]
       [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs]
       [<ffffffff810eb32d>] write_cache_pages+0x254/0x37f
       [<ffffffffa02d1751>] ? nfs_do_writepage+0x1e4/0x1e4 [nfs]
       [<ffffffff8149cf9e>] ? printk+0x54/0x56
       [<ffffffff810eacca>] ? __set_page_dirty_nobuffers+0x22/0xe9
       [<ffffffffa016d864>] ? put_rpccred+0x38/0x101 [sunrpc]
       [<ffffffffa02d1ae1>] nfs_writepages+0xb4/0xf8 [nfs]
       [<ffffffff810ec59c>] do_writepages+0x21/0x2f
       [<ffffffff810e36e8>] __filemap_fdatawrite_range+0x55/0x57
       [<ffffffff810e374a>] filemap_write_and_wait_range+0x2d/0x5b
       [<ffffffffa030ba0a>] nfs4_file_fsync+0x3a/0x98 [nfsv4]
       [<ffffffff8114ee3c>] vfs_fsync_range+0x18/0x20
       [<ffffffff810e40c2>] generic_file_aio_write+0xa7/0xbd
       [<ffffffffa02c5c6b>] nfs_file_write+0xf0/0x170 [nfs]
       [<ffffffff81129215>] do_sync_write+0x59/0x78
       [<ffffffff8112956c>] vfs_write+0xab/0x107
       [<ffffffff81129c8b>] SyS_write+0x49/0x7f
       [<ffffffff814acd12>] system_call_fastpath+0x16/0x1b
      Reported-by: NAnna Schumaker <Anna.Schumaker@netapp.com>
      Signed-off-by: NWeston Andros Adamson <dros@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      c5e20cb7
    • L
      Btrfs: fix scrub_print_warning to handle skinny metadata extents · 6eda71d0
      Liu Bo 提交于
      The skinny extents are intepreted incorrectly in scrub_print_warning(),
      and end up hitting the BUG() in btrfs_extent_inline_ref_size.
      Reported-by: NKonstantinos Skarlatos <k.skarlatos@gmail.com>
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      6eda71d0
    • F
      Btrfs: make fsync work after cloning into a file · 7ffbb598
      Filipe Manana 提交于
      When cloning into a file, we were correctly replacing the extent
      items in the target range and removing the extent maps. However
      we weren't replacing the extent maps with new ones that point to
      the new extents - as a consequence, an incremental fsync (when the
      inode doesn't have the full sync flag) was a NOOP, since it relies
      on the existence of extent maps in the modified list of the inode's
      extent map tree, which was empty. Therefore add new extent maps to
      reflect the target clone range.
      
      A test case for xfstests follows.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      7ffbb598
    • L
      Btrfs: use right type to get real comparison · cd857dd6
      Liu Bo 提交于
      We want to make sure the point is still within the extent item, not to verify
      the memory it's pointing to.
      Signed-off-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      cd857dd6
    • J
      Btrfs: don't check nodes for extent items · 8a56457f
      Josef Bacik 提交于
      The backref code was looking at nodes as well as leaves when we tried to
      populate extent item entries.  This is not good, and although we go away with it
      for the most part because we'd skip where disk_bytenr != random_memory,
      sometimes random_memory would match and suddenly boom.  This fixes that problem.
      Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      8a56457f
    • F
      Btrfs: don't release invalid page in btrfs_page_exists_in_range() · 6fdef6d4
      Filipe Manana 提交于
      In inode.c:btrfs_page_exists_in_range(), if the page we got from
      the radix tree is an exception entry, which can't be retried, we
      exit the loop with a non-NULL page and then call page_cache_release
      against it, which is not ok since it's not a valid page. This could
      also make us return true when we shouldn't.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      6fdef6d4
    • F
      Btrfs: make sure we retry if page is a retriable exception · 809f9016
      Filipe Manana 提交于
      In inode.c:btrfs_page_exists_in_range(), if the page we get from the
      radix tree is an exception which should make us retry, set page to
      NULL in order to really retry, because otherwise we don't get another
      loop iteration executed (page != NULL makes the while loop exit).
      This also was making us call page_cache_release after exiting the loop,
      which isn't correct because page doesn't point to a valid page, and
      possibly return true from the function when we shouldn't.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      809f9016
    • F
      Btrfs: make sure we retry if we couldn't get the page · 91405151
      Filipe Manana 提交于
      In inode.c:btrfs_page_exists_in_range(), if we can't get the page
      we need to retry. However we weren't retrying because we weren't
      setting page to NULL, which makes the while loop exit immediately
      and will make us call page_cache_release after exiting the loop
      which is incorrect because our page get didn't succeed. This could
      also make us return true when we shouldn't.
      Signed-off-by: NFilipe David Borba Manana <fdmanana@gmail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      91405151
    • G
      btrfs: replace EINVAL with EOPNOTSUPP for dev_replace raid56 · c81d5767
      Gui Hecheng 提交于
      To return EOPNOTSUPP is more user friendly than to return EINVAL,
      and then user-space tool will show that the dev_replace operation
      for raid56 is not currently supported rather than showing that
      there is an invalid argument.
      Signed-off-by: NGui Hecheng <guihc.fnst@cn.fujitsu.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      c81d5767