1. 21 7月, 2018 1 次提交
  2. 17 7月, 2018 1 次提交
    • Q
      btrfs: scrub: Don't use inode page cache in scrub_handle_errored_block() · 665d4953
      Qu Wenruo 提交于
      In commit ac0b4145 ("btrfs: scrub: Don't use inode pages for device
      replace") we removed the branch of copy_nocow_pages() to avoid
      corruption for compressed nodatasum extents.
      
      However above commit only solves the problem in scrub_extent(), if
      during scrub_pages() we failed to read some pages,
      sctx->no_io_error_seen will be non-zero and we go to fixup function
      scrub_handle_errored_block().
      
      In scrub_handle_errored_block(), for sctx without csum (no matter if
      we're doing replace or scrub) we go to scrub_fixup_nodatasum() routine,
      which does the similar thing with copy_nocow_pages(), but does it
      without the extra check in copy_nocow_pages() routine.
      
      So for test cases like btrfs/100, where we emulate read errors during
      replace/scrub, we could corrupt compressed extent data again.
      
      This patch will fix it just by avoiding any "optimization" for
      nodatasum, just falls back to the normal fixup routine by try read from
      any good copy.
      
      This also solves WARN_ON() or dead lock caused by lame backref iteration
      in scrub_fixup_nodatasum() routine.
      
      The deadlock or WARN_ON() won't be triggered before commit ac0b4145
      ("btrfs: scrub: Don't use inode pages for device replace") since
      copy_nocow_pages() have better locking and extra check for data extent,
      and it's already doing the fixup work by try to read data from any good
      copy, so it won't go scrub_fixup_nodatasum() anyway.
      
      This patch disables the faulty code and will be removed completely in a
      followup patch.
      
      Fixes: ac0b4145 ("btrfs: scrub: Don't use inode pages for device replace")
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      665d4953
  3. 15 7月, 2018 4 次提交
  4. 13 7月, 2018 2 次提交
  5. 06 7月, 2018 10 次提交
    • L
      Fix up non-directory creation in SGID directories · 0fa3ecd8
      Linus Torvalds 提交于
      sgid directories have special semantics, making newly created files in
      the directory belong to the group of the directory, and newly created
      subdirectories will also become sgid.  This is historically used for
      group-shared directories.
      
      But group directories writable by non-group members should not imply
      that such non-group members can magically join the group, so make sure
      to clear the sgid bit on non-directories for non-members (but remember
      that sgid without group execute means "mandatory locking", just to
      confuse things even more).
      Reported-by: NJann Horn <jannh@google.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0fa3ecd8
    • S
      cifs: Fix stack out-of-bounds in smb{2,3}_create_lease_buf() · 729c0c9d
      Stefano Brivio 提交于
      smb{2,3}_create_lease_buf() store a lease key in the lease
      context for later usage on a lease break.
      
      In most paths, the key is currently sourced from data that
      happens to be on the stack near local variables for oplock in
      SMB2_open() callers, e.g. from open_shroot(), whereas
      smb2_open_file() properly allocates space on its stack for it.
      
      The address of those local variables holding the oplock is then
      passed to create_lease_buf handlers via SMB2_open(), and 16
      bytes near oplock are used. This causes a stack out-of-bounds
      access as reported by KASAN on SMB2.1 and SMB3 mounts (first
      out-of-bounds access is shown here):
      
      [  111.528823] BUG: KASAN: stack-out-of-bounds in smb3_create_lease_buf+0x399/0x3b0 [cifs]
      [  111.530815] Read of size 8 at addr ffff88010829f249 by task mount.cifs/985
      [  111.532838] CPU: 3 PID: 985 Comm: mount.cifs Not tainted 4.18.0-rc3+ #91
      [  111.534656] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [  111.536838] Call Trace:
      [  111.537528]  dump_stack+0xc2/0x16b
      [  111.540890]  print_address_description+0x6a/0x270
      [  111.542185]  kasan_report+0x258/0x380
      [  111.544701]  smb3_create_lease_buf+0x399/0x3b0 [cifs]
      [  111.546134]  SMB2_open+0x1ef8/0x4b70 [cifs]
      [  111.575883]  open_shroot+0x339/0x550 [cifs]
      [  111.591969]  smb3_qfs_tcon+0x32c/0x1e60 [cifs]
      [  111.617405]  cifs_mount+0x4f3/0x2fc0 [cifs]
      [  111.674332]  cifs_smb3_do_mount+0x263/0xf10 [cifs]
      [  111.677915]  mount_fs+0x55/0x2b0
      [  111.679504]  vfs_kern_mount.part.22+0xaa/0x430
      [  111.684511]  do_mount+0xc40/0x2660
      [  111.698301]  ksys_mount+0x80/0xd0
      [  111.701541]  do_syscall_64+0x14e/0x4b0
      [  111.711807]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  111.713665] RIP: 0033:0x7f372385b5fa
      [  111.715311] Code: 48 8b 0d 99 78 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 66 78 2c 00 f7 d8 64 89 01 48
      [  111.720330] RSP: 002b:00007ffff27049d8 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5
      [  111.722601] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f372385b5fa
      [  111.724842] RDX: 000055c2ecdc73b2 RSI: 000055c2ecdc73f9 RDI: 00007ffff270580f
      [  111.727083] RBP: 00007ffff2705804 R08: 000055c2ee976060 R09: 0000000000001000
      [  111.729319] R10: 0000000000000000 R11: 0000000000000206 R12: 00007f3723f4d000
      [  111.731615] R13: 000055c2ee976060 R14: 00007f3723f4f90f R15: 0000000000000000
      
      [  111.735448] The buggy address belongs to the page:
      [  111.737420] page:ffffea000420a7c0 count:0 mapcount:0 mapping:0000000000000000 index:0x0
      [  111.739890] flags: 0x17ffffc0000000()
      [  111.741750] raw: 0017ffffc0000000 0000000000000000 dead000000000200 0000000000000000
      [  111.744216] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
      [  111.746679] page dumped because: kasan: bad access detected
      
      [  111.750482] Memory state around the buggy address:
      [  111.752562]  ffff88010829f100: 00 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 00
      [  111.754991]  ffff88010829f180: 00 00 f2 f2 00 00 00 00 00 00 00 00 00 00 00 00
      [  111.757401] >ffff88010829f200: 00 00 00 00 00 f1 f1 f1 f1 01 f2 f2 f2 f2 f2 f2
      [  111.759801]                                               ^
      [  111.762034]  ffff88010829f280: f2 02 f2 f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00
      [  111.764486]  ffff88010829f300: f2 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  111.766913] ==================================================================
      
      Lease keys are however already generated and stored in fid data
      on open and create paths: pass them down to the lease context
      creation handlers and use them.
      Suggested-by: NAurélien Aptel <aaptel@suse.com>
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      Fixes: b8c32dbb ("CIFS: Request SMB2.1 leases")
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      729c0c9d
    • P
      cifs: Fix infinite loop when using hard mount option · 7ffbe655
      Paulo Alcantara 提交于
      For every request we send, whether it is SMB1 or SMB2+, we attempt to
      reconnect tcon (cifs_reconnect_tcon or smb2_reconnect) before carrying
      out the request.
      
      So, while server->tcpStatus != CifsNeedReconnect, we wait for the
      reconnection to succeed on wait_event_interruptible_timeout(). If it
      returns, that means that either the condition was evaluated to true, or
      timeout elapsed, or it was interrupted by a signal.
      
      Since we're not handling the case where the process woke up due to a
      received signal (-ERESTARTSYS), the next call to
      wait_event_interruptible_timeout() will _always_ fail and we end up
      looping forever inside either cifs_reconnect_tcon() or smb2_reconnect().
      
      Here's an example of how to trigger that:
      
      $ mount.cifs //foo/share /mnt/test -o
      username=foo,password=foo,vers=1.0,hard
      
      (break connection to server before executing bellow cmd)
      $ stat -f /mnt/test & sleep 140
      [1] 2511
      
      $ ps -aux -q 2511
      USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      root      2511  0.0  0.0  12892  1008 pts/0    S    12:24   0:00 stat -f
      /mnt/test
      
      $ kill -9 2511
      
      (wait for a while; process is stuck in the kernel)
      $ ps -aux -q 2511
      USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
      root      2511 83.2  0.0  12892  1008 pts/0    R    12:24  30:01 stat -f
      /mnt/test
      
      By using 'hard' mount point means that cifs.ko will keep retrying
      indefinitely, however we must allow the process to be killed otherwise
      it would hang the system.
      Signed-off-by: NPaulo Alcantara <palcantara@suse.de>
      Cc: stable@vger.kernel.org
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      7ffbe655
    • S
      cifs: Fix slab-out-of-bounds in send_set_info() on SMB2 ACE setting · f46ecbd9
      Stefano Brivio 提交于
      A "small" CIFS buffer is not big enough in general to hold a
      setacl request for SMB2, and we end up overflowing the buffer in
      send_set_info(). For instance:
      
       # mount.cifs //127.0.0.1/test /mnt/test -o username=test,password=test,nounix,cifsacl
       # touch /mnt/test/acltest
       # getcifsacl /mnt/test/acltest
       REVISION:0x1
       CONTROL:0x9004
       OWNER:S-1-5-21-2926364953-924364008-418108241-1000
       GROUP:S-1-22-2-1001
       ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff
       ACL:S-1-22-2-1001:ALLOWED/0x0/R
       ACL:S-1-22-2-1001:ALLOWED/0x0/R
       ACL:S-1-5-21-2926364953-924364008-418108241-1000:ALLOWED/0x0/0x1e01ff
       ACL:S-1-1-0:ALLOWED/0x0/R
       # setcifsacl -a "ACL:S-1-22-2-1004:ALLOWED/0x0/R" /mnt/test/acltest
      
      this setacl will cause the following KASAN splat:
      
      [  330.777927] BUG: KASAN: slab-out-of-bounds in send_set_info+0x4dd/0xc20 [cifs]
      [  330.779696] Write of size 696 at addr ffff88010d5e2860 by task setcifsacl/1012
      
      [  330.781882] CPU: 1 PID: 1012 Comm: setcifsacl Not tainted 4.18.0-rc2+ #2
      [  330.783140] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
      [  330.784395] Call Trace:
      [  330.784789]  dump_stack+0xc2/0x16b
      [  330.786777]  print_address_description+0x6a/0x270
      [  330.787520]  kasan_report+0x258/0x380
      [  330.788845]  memcpy+0x34/0x50
      [  330.789369]  send_set_info+0x4dd/0xc20 [cifs]
      [  330.799511]  SMB2_set_acl+0x76/0xa0 [cifs]
      [  330.801395]  set_smb2_acl+0x7ac/0xf30 [cifs]
      [  330.830888]  cifs_xattr_set+0x963/0xe40 [cifs]
      [  330.840367]  __vfs_setxattr+0x84/0xb0
      [  330.842060]  __vfs_setxattr_noperm+0xe6/0x370
      [  330.843848]  vfs_setxattr+0xc2/0xd0
      [  330.845519]  setxattr+0x258/0x320
      [  330.859211]  path_setxattr+0x15b/0x1b0
      [  330.864392]  __x64_sys_setxattr+0xc0/0x160
      [  330.866133]  do_syscall_64+0x14e/0x4b0
      [  330.876631]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  330.878503] RIP: 0033:0x7ff2e507db0a
      [  330.880151] Code: 48 8b 0d 89 93 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 bc 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 56 93 2c 00 f7 d8 64 89 01 48
      [  330.885358] RSP: 002b:00007ffdc4903c18 EFLAGS: 00000246 ORIG_RAX: 00000000000000bc
      [  330.887733] RAX: ffffffffffffffda RBX: 000055d1170de140 RCX: 00007ff2e507db0a
      [  330.890067] RDX: 000055d1170de7d0 RSI: 000055d115b39184 RDI: 00007ffdc4904818
      [  330.892410] RBP: 0000000000000001 R08: 0000000000000000 R09: 000055d1170de7e4
      [  330.894785] R10: 00000000000002b8 R11: 0000000000000246 R12: 0000000000000007
      [  330.897148] R13: 000055d1170de0c0 R14: 0000000000000008 R15: 000055d1170de550
      
      [  330.901057] Allocated by task 1012:
      [  330.902888]  kasan_kmalloc+0xa0/0xd0
      [  330.904714]  kmem_cache_alloc+0xc8/0x1d0
      [  330.906615]  mempool_alloc+0x11e/0x380
      [  330.908496]  cifs_small_buf_get+0x35/0x60 [cifs]
      [  330.910510]  smb2_plain_req_init+0x4a/0xd60 [cifs]
      [  330.912551]  send_set_info+0x198/0xc20 [cifs]
      [  330.914535]  SMB2_set_acl+0x76/0xa0 [cifs]
      [  330.916465]  set_smb2_acl+0x7ac/0xf30 [cifs]
      [  330.918453]  cifs_xattr_set+0x963/0xe40 [cifs]
      [  330.920426]  __vfs_setxattr+0x84/0xb0
      [  330.922284]  __vfs_setxattr_noperm+0xe6/0x370
      [  330.924213]  vfs_setxattr+0xc2/0xd0
      [  330.926008]  setxattr+0x258/0x320
      [  330.927762]  path_setxattr+0x15b/0x1b0
      [  330.929592]  __x64_sys_setxattr+0xc0/0x160
      [  330.931459]  do_syscall_64+0x14e/0x4b0
      [  330.933314]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  330.936843] Freed by task 0:
      [  330.938588] (stack is not available)
      
      [  330.941886] The buggy address belongs to the object at ffff88010d5e2800
       which belongs to the cache cifs_small_rq of size 448
      [  330.946362] The buggy address is located 96 bytes inside of
       448-byte region [ffff88010d5e2800, ffff88010d5e29c0)
      [  330.950722] The buggy address belongs to the page:
      [  330.952789] page:ffffea0004357880 count:1 mapcount:0 mapping:ffff880108fdca80 index:0x0 compound_mapcount: 0
      [  330.955665] flags: 0x17ffffc0008100(slab|head)
      [  330.957760] raw: 0017ffffc0008100 dead000000000100 dead000000000200 ffff880108fdca80
      [  330.960356] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
      [  330.963005] page dumped because: kasan: bad access detected
      
      [  330.967039] Memory state around the buggy address:
      [  330.969255]  ffff88010d5e2880: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  330.971833]  ffff88010d5e2900: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [  330.974397] >ffff88010d5e2980: 00 00 00 00 00 00 00 00 fc fc fc fc fc fc fc fc
      [  330.976956]                                            ^
      [  330.979226]  ffff88010d5e2a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  330.981755]  ffff88010d5e2a80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  330.984225] ==================================================================
      
      Fix this by allocating a regular CIFS buffer in
      smb2_plain_req_init() if the request command is SMB2_SET_INFO.
      Reported-by: NJianhong Yin <jiyin@redhat.com>
      Fixes: 366ed846 ("cifs: Use smb 2 - 3 and cifsacl mount options setacl function")
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: NStefano Brivio <sbrivio@redhat.com>
      Reviewed-and-tested-by: NAurelien Aptel <aaptel@suse.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      f46ecbd9
    • P
      cifs: Fix memory leak in smb2_set_ea() · 6aa0c114
      Paulo Alcantara 提交于
      This patch fixes a memory leak when doing a setxattr(2) in SMB2+.
      Signed-off-by: NPaulo Alcantara <palcantara@suse.de>
      Cc: stable@vger.kernel.org
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Reviewed-by: NAurelien Aptel <aaptel@suse.com>
      6aa0c114
    • R
      cifs: fix SMB1 breakage · 81f39f95
      Ronnie Sahlberg 提交于
      SMB1 mounting broke in commit 35e2cc1b
      ("cifs: Use correct packet length in SMB2_TRANSFORM header")
      Fix it and also rename smb2_rqst_len to smb_rqst_len
      to make it less unobvious that the function is also called from
      CIFS/SMB1
      
      Good job by Paulo reviewing and cleaning up Ronnie's original patch.
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NPaulo Alcantara <palcantara@suse.de>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      81f39f95
    • P
      cifs: Fix validation of signed data in smb2 · 8de8c460
      Paulo Alcantara 提交于
      Fixes: c713c877 ("cifs: push rfc1002 generation down the stack")
      
      We failed to validate signed data returned by the server because
      __cifs_calc_signature() now expects to sign the actual data in iov but
      we were also passing down the rfc1002 length.
      
      Fix smb3_calc_signature() to calculate signature of rfc1002 length prior
      to passing only the actual data iov[1-N] to __cifs_calc_signature(). In
      addition, there are a few cases where no rfc1002 length is passed so we
      make sure there's one (iov_len == 4).
      Signed-off-by: NPaulo Alcantara <palcantara@suse.de>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      8de8c460
    • P
      cifs: Fix validation of signed data in smb3+ · 27c32b49
      Paulo Alcantara 提交于
      Fixes: c713c877 ("cifs: push rfc1002 generation down the stack")
      
      We failed to validate signed data returned by the server because
      __cifs_calc_signature() now expects to sign the actual data in iov but
      we were also passing down the rfc1002 length.
      
      Fix smb3_calc_signature() to calculate signature of rfc1002 length prior
      to passing only the actual data iov[1-N] to __cifs_calc_signature(). In
      addition, there are a few cases where no rfc1002 length is passed so we
      make sure there's one (iov_len == 4).
      Signed-off-by: NPaulo Alcantara <palcantara@suse.de>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      27c32b49
    • L
      cifs: Fix use after free of a mid_q_entry · 696e420b
      Lars Persson 提交于
      With protocol version 2.0 mounts we have seen crashes with corrupt mid
      entries. Either the server->pending_mid_q list becomes corrupt with a
      cyclic reference in one element or a mid object fetched by the
      demultiplexer thread becomes overwritten during use.
      
      Code review identified a race between the demultiplexer thread and the
      request issuing thread. The demultiplexer thread seems to be written
      with the assumption that it is the sole user of the mid object until
      it calls the mid callback which either wakes the issuer task or
      deletes the mid.
      
      This assumption is not true because the issuer task can be woken up
      earlier by a signal. If the demultiplexer thread has proceeded as far
      as setting the mid_state to MID_RESPONSE_RECEIVED then the issuer
      thread will happily end up calling cifs_delete_mid while the
      demultiplexer thread still is using the mid object.
      
      Inserting a delay in the cifs demultiplexer thread widens the race
      window and makes reproduction of the race very easy:
      
      		if (server->large_buf)
      			buf = server->bigbuf;
      
      +		usleep_range(500, 4000);
      
      		server->lstrp = jiffies;
      
      To resolve this I think the proper solution involves putting a
      reference count on the mid object. This patch makes sure that the
      demultiplexer thread holds a reference until it has finished
      processing the transaction.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NLars Persson <larper@axis.com>
      Acked-by: NPaulo Alcantara <palcantara@suse.de>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      696e420b
    • L
      autofs: rename 'autofs' module back to 'autofs4' · d02d21ea
      Linus Torvalds 提交于
      It turns out that systemd has a bug: it wants to load the autofs module
      early because of some initialization ordering with udev, and it doesn't
      do that correctly.  Everywhere else it does the proper "look up module
      name" that does the proper alias resolution, but in that early code, it
      just uses a hardcoded "autofs4" for the module name.
      
      The result of that is that as of commit a2225d93 ("autofs: remove
      left-over autofs4 stubs"), you get
      
          systemd[1]: Failed to insert module 'autofs4': No such file or directory
      
      in the system logs, and a lack of module loading.  All this despite the
      fact that we had very clearly marked 'autofs4' as an alias for this
      module.
      
      What's so ridiculous about this is that literally everything else does
      the module alias handling correctly, including really old versions of
      systemd (that just used 'modprobe' to do this), and even all the other
      systemd module loading code.
      
      Only that special systemd early module load code is broken, hardcoding
      the module names for not just 'autofs4', but also "ipv6", "unix",
      "ip_tables" and "virtio_rng".  Very annoying.
      
      Instead of creating an _additional_ separate compatibility 'autofs4'
      module, just rely on the fact that everybody else gets this right, and
      just call the module 'autofs4' for compatibility reasons, with 'autofs'
      as the alias name.
      
      That will allow the systemd people to fix their bugs, adding the proper
      alias handling, and maybe even fix the name of the module to be just
      "autofs" (so that they can _test_ the alias handling).  And eventually,
      we can revert this silly compatibility hack.
      
      See also
      
          https://github.com/systemd/systemd/issues/9501
          https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=902946
      
      for the systemd bug reports upstream and in the Debian bug tracker
      respectively.
      
      Fixes: a2225d93 ("autofs: remove left-over autofs4 stubs")
      Reported-by: NBen Hutchings <ben@decadent.org.uk>
      Reported-by: NMichael Biebl <biebl@debian.org>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d02d21ea
  6. 04 7月, 2018 1 次提交
  7. 03 7月, 2018 1 次提交
  8. 29 6月, 2018 1 次提交
    • L
      Revert changes to convert to ->poll_mask() and aio IOCB_CMD_POLL · a11e1d43
      Linus Torvalds 提交于
      The poll() changes were not well thought out, and completely
      unexplained.  They also caused a huge performance regression, because
      "->poll()" was no longer a trivial file operation that just called down
      to the underlying file operations, but instead did at least two indirect
      calls.
      
      Indirect calls are sadly slow now with the Spectre mitigation, but the
      performance problem could at least be largely mitigated by changing the
      "->get_poll_head()" operation to just have a per-file-descriptor pointer
      to the poll head instead.  That gets rid of one of the new indirections.
      
      But that doesn't fix the new complexity that is completely unwarranted
      for the regular case.  The (undocumented) reason for the poll() changes
      was some alleged AIO poll race fixing, but we don't make the common case
      slower and more complex for some uncommon special case, so this all
      really needs way more explanations and most likely a fundamental
      redesign.
      
      [ This revert is a revert of about 30 different commits, not reverted
        individually because that would just be unnecessarily messy  - Linus ]
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a11e1d43
  9. 28 6月, 2018 4 次提交
    • F
      Btrfs: fix mount failure when qgroup rescan is in progress · e4e7ede7
      Filipe Manana 提交于
      If a power failure happens while the qgroup rescan kthread is running,
      the next mount operation will always fail. This is because of a recent
      regression that makes qgroup_rescan_init() incorrectly return -EINVAL
      when we are mounting the filesystem (through btrfs_read_qgroup_config()).
      This causes the -EINVAL error to be returned regardless of any qgroup
      flags being set instead of returning the error only when neither of
      the flags BTRFS_QGROUP_STATUS_FLAG_RESCAN nor BTRFS_QGROUP_STATUS_FLAG_ON
      are set.
      
      A test case for fstests follows up soon.
      
      Fixes: 9593bf49 ("btrfs: qgroup: show more meaningful qgroup_rescan_init error message")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NQu Wenruo <wqu@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      e4e7ede7
    • C
      Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion · 717beb96
      Chris Mason 提交于
      The vm_fault_t conversion commit introduced a ret2 variable for tracking
      the integer return values from internal btrfs functions.  It was
      sometimes returning VM_FAULT_LOCKED for pages that were actually invalid
      and had been removed from the radix.  Something like this:
      
          ret2 = btrfs_delalloc_reserve_space() // returns zero on success
      
          lock_page(page)
          if (page->mapping != inode->i_mapping)
      	goto out_unlock;
      
      ...
      
      out_unlock:
          if (!ret2) {
      	    ...
      	    return VM_FAULT_LOCKED;
          }
      
      This ends up triggering this WARNING in btrfs_destroy_inode()
          WARN_ON(BTRFS_I(inode)->block_rsv.size);
      
      xfstests generic/095 was able to reliably reproduce the errors.
      
      Since out_unlock: is only used for errors, this fix moves it below the
      if (!ret2) check we use to return VM_FAULT_LOCKED for success.
      
      Fixes: a528a241 (btrfs: change return type of btrfs_page_mkwrite to vm_fault_t)
      Signed-off-by: NChris Mason <clm@fb.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      717beb96
    • Q
      btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf · 6f7de19e
      Qu Wenruo 提交于
      Commit ff3d27a0 ("btrfs: qgroup: Finish rescan when hit the last leaf
      of extent tree") added a new exit for rescan finish.
      
      However after finishing quota rescan, we set
      fs_info->qgroup_rescan_progress to (u64)-1 before we exit through the
      original exit path.
      While we missed that assignment of (u64)-1 in the new exit path.
      
      The end result is, the quota status item doesn't have the same value.
      (-1 vs the last bytenr + 1)
      Although it doesn't affect quota accounting, it's still better to keep
      the original behavior.
      Reported-by: NMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
      Fixes: ff3d27a0 ("btrfs: qgroup: Finish rescan when hit the last leaf of extent tree")
      Signed-off-by: NQu Wenruo <wqu@suse.com>
      Reviewed-by: NMisono Tomohiro <misono.tomohiro@jp.fujitsu.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      6f7de19e
    • C
      proc: add proc_seq_release · 877f919e
      Chunyu Hu 提交于
      kmemleak reported some memory leak on reading proc files. After adding
      some debug lines, find that proc_seq_fops is using seq_release as
      release handler, which won't handle the free of 'private' field of
      seq_file, while in fact the open handler proc_seq_open could create
      the private data with __seq_open_private when state_size is greater
      than zero. So after reading files created with proc_create_seq_private,
      such as /proc/timer_list and /proc/vmallocinfo, the private mem of a
      seq_file is not freed. Fix it by adding the paired proc_seq_release
      as the default release handler of proc_seq_ops instead of seq_release.
      
      Fixes: 44414d82 ("proc: introduce proc_create_seq_private")
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      CC: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NChunyu Hu <chuhu@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      877f919e
  10. 27 6月, 2018 1 次提交
  11. 25 6月, 2018 8 次提交
    • D
      xfs: fix fdblocks accounting w/ RMAPBT per-AG reservation · d8cb5e42
      Darrick J. Wong 提交于
      In __xfs_ag_resv_init we incorrectly calculate the amount by which to
      decrease fdblocks when reserving blocks for the rmapbt.  Because rmapbt
      allocations do not decrease fdblocks, we must decrease fdblocks by the
      entire size of the requested reservation in order to achieve our goal of
      always having enough free blocks to satisfy an rmapbt expansion.
      
      This is in contrast to the refcountbt/finobt, which /do/ subtract from
      fdblocks whenever they allocate a block.  For this allocation type we
      preserve the existing behavior where we decrease fdblocks only by the
      requested reservation minus the size of the existing tree.
      
      This fixes the problem where the available block counts reported by
      statfs change across a remount if there had been an rmapbt size change
      since mount time.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      d8cb5e42
    • D
      xfs: ensure post-EOF zeroing happens after zeroing part of a file · e53c4b59
      Darrick J. Wong 提交于
      If a user asks us to zero_range part of a file, the end of the range is
      EOF, and not aligned to a page boundary, invoke writeback of the EOF
      page to ensure that the post-EOF part of the page is zeroed.  This
      ensures that we don't expose stale memory contents via mmap, if in a
      clumsy manner.
      
      Found by running generic/127 when it runs zero_range and mapread at EOF
      one after the other.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      e53c4b59
    • D
      xfs: fix off-by-one error in xfs_rtalloc_query_range · a3a374bf
      Darrick J. Wong 提交于
      In commit 8ad560d2 ("xfs: strengthen rtalloc query range checks")
      we strengthened the input parameter checks in the rtbitmap range query
      function, but introduced an off-by-one error in the process.  The call
      to xfs_rtfind_forw deals with the high key being rextents, but we clamp
      the high key to rextents - 1.  This causes the returned results to stop
      one block short of the end of the rtdev, which is incorrect.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      a3a374bf
    • D
      xfs: fix uninitialized field in rtbitmap fsmap backend · 232d0a24
      Darrick J. Wong 提交于
      Initialize the extent count field of the high key so that when we use
      the high key to synthesize an 'unknown owner' record (i.e. used space
      record) at the end of the queried range we have a field with which to
      compute rm_blockcount.  This is not strictly necessary because the
      synthesizer never uses the rm_blockcount field, but we can shut up the
      static code analysis anyway.
      
      Coverity-id: 1437358
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      232d0a24
    • D
      xfs: recheck reflink state after grabbing ILOCK_SHARED for a write · 5bd88d15
      Darrick J. Wong 提交于
      The reflink iflag could have changed since the earlier unlocked check,
      so if we got ILOCK_SHARED for a write and but we're now a reflink inode
      we have to switch to ILOCK_EXCL and relock.
      
      This helps us avoid blowing lock assertions in things like generic/166:
      
      XFS: Assertion failed: xfs_isilocked(ip, XFS_ILOCK_EXCL), file: fs/xfs/xfs_reflink.c, line: 383
      WARNING: CPU: 1 PID: 24707 at fs/xfs/xfs_message.c:104 assfail+0x25/0x30 [xfs]
      Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
      CPU: 1 PID: 24707 Comm: xfs_io Not tainted 4.18.0-rc1-djw #1
      Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:assfail+0x25/0x30 [xfs]
      Code: ff 0f 0b c3 90 66 66 66 66 90 48 89 f1 41 89 d0 48 c7 c6 e8 ef 1b a0 48 89 fa 31 ff e8 54 f9 ff ff 80 3d fd ba 0f 00 00 75 03 <0f> 0b c3 0f 0b 66 0f 1f 44 00 00 66 66 66 66 90 48 63 f6 49 89 f9
      RSP: 0018:ffffc90006423ad8 EFLAGS: 00010246
      RAX: 0000000000000000 RBX: ffff880030b65e80 RCX: 0000000000000000
      RDX: 00000000ffffffc0 RSI: 000000000000000a RDI: ffffffffa01b0447
      RBP: ffffc90006423c10 R08: 0000000000000000 R09: 0000000000000000
      R10: ffff88003d43fc30 R11: f000000000000000 R12: ffff880077cda000
      R13: 0000000000000000 R14: ffffc90006423c30 R15: ffffc90006423bf9
      FS:  00007feba8986800(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000000138ab58 CR3: 000000003d40a000 CR4: 00000000000006a0
      Call Trace:
       xfs_reflink_allocate_cow+0x24c/0x3d0 [xfs]
       xfs_file_iomap_begin+0x6d2/0xeb0 [xfs]
       ? iomap_to_fiemap+0x80/0x80
       iomap_apply+0x5e/0x130
       iomap_dio_rw+0x2e0/0x400
       ? iomap_to_fiemap+0x80/0x80
       ? xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
       xfs_file_dio_aio_write+0x133/0x4a0 [xfs]
       xfs_file_write_iter+0x7b/0xb0 [xfs]
       __vfs_write+0x16f/0x1f0
       vfs_write+0xc8/0x1c0
       ksys_pwrite64+0x74/0x90
       do_syscall_64+0x56/0x180
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      5bd88d15
    • D
      xfs: don't allow insert-range to shift extents past the maximum offset · f62cb48e
      Darrick J. Wong 提交于
      Zorro Lang reports that generic/485 blows an assert on a filesystem with
      512 byte blocks.  The test tries to fallocate a post-eof extent at the
      maximum file size and calls insert range to shift the extents right by
      two blocks.  On a 512b block filesystem this causes startoff to overflow
      the 54-bit startoff field, leading to the assert.
      
      Therefore, always check the rightmost extent to see if it would overflow
      prior to invoking the insert range machinery.
      
      Reported-by: zlang@redhat.com
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=200137Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      f62cb48e
    • D
      xfs: don't trip over negative free space in xfs_reserve_blocks · aafe12ce
      Darrick J. Wong 提交于
      If we somehow end up with a filesystem that has fewer free blocks than
      the blocks set aside to avoid ENOSPC deadlocks, it's possible that the
      free space calculation in xfs_reserve_blocks will spit out a negative
      number (because percpu_counter_sum returns s64).  We fail to notice
      this negative number and set fdblks_delta to it.  Now we increment
      fdblocks(!) and the unsigned type of m_resblks means that we end up
      setting a ridiculously huge m_resblks reservation.
      
      Avoid this comedy of errors by detecting the negative free space and
      returning -ENOSPC.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      aafe12ce
    • D
      xfs: allow empty transactions while frozen · 10ee2526
      Darrick J. Wong 提交于
      In commit e89c0413 ("xfs: implement the GETFSMAP ioctl") we
      created the ability to obtain empty transactions.  These transactions
      have no log or block reservations and therefore can't modify anything.
      Since they're also NO_WRITECOUNT they can run while the fs is frozen,
      so we don't need to WARN_ON about that usage.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: NAllison Henderson <allison.henderson@oracle.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      10ee2526
  12. 22 6月, 2018 6 次提交
    • F
      Btrfs: fix return value on rename exchange failure · c5b4a50b
      Filipe Manana 提交于
      If we failed during a rename exchange operation after starting/joining a
      transaction, we would end up replacing the return value, stored in the
      local 'ret' variable, with the return value from btrfs_end_transaction().
      So this could end up returning 0 (success) to user space despite the
      operation having failed and aborted the transaction, because if there are
      multiple tasks having a reference on the transaction at the time
      btrfs_end_transaction() is called by the rename exchange, that function
      returns 0 (otherwise it returns -EIO and not the original error value).
      So fix this by not overwriting the return value on error after getting
      a transaction handle.
      
      Fixes: cdd1fedf ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT")
      CC: stable@vger.kernel.org # 4.9+
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      c5b4a50b
    • D
      xfs: xfs_iflush_abort() can be called twice on cluster writeback failure · e53946db
      Dave Chinner 提交于
      When a corrupt inode is detected during xfs_iflush_cluster, we can
      get a shutdown ASSERT failure like this:
      
      XFS (pmem1): Metadata corruption detected at xfs_symlink_shortform_verify+0x5c/0xa0, inode 0x86627 data fork
      XFS (pmem1): Unmount and run xfs_repair
      XFS (pmem1): xfs_do_force_shutdown(0x8) called from line 3372 of file fs/xfs/xfs_inode.c.  Return address = ffffffff814f4116
      XFS (pmem1): Corruption of in-memory data detected.  Shutting down filesystem
      XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8a88
      XFS (pmem1): xfs_do_force_shutdown(0x1) called from line 222 of file fs/xfs/libxfs/xfs_defer.c.  Return address = ffffffff814a8ef9
      XFS (pmem1): Please umount the filesystem and rectify the problem(s)
      XFS: Assertion failed: xfs_isiflocked(ip), file: fs/xfs/xfs_inode.h, line: 258
      .....
      Call Trace:
       xfs_iflush_abort+0x10a/0x110
       xfs_iflush+0xf3/0x390
       xfs_inode_item_push+0x126/0x1e0
       xfsaild+0x2c5/0x890
       kthread+0x11c/0x140
       ret_from_fork+0x24/0x30
      
      Essentially, xfs_iflush_abort() has been called twice on the
      original inode that that was flushed. This happens because the
      inode has been flushed to teh buffer successfully via
      xfs_iflush_int(), and so when another inode is detected as corrupt
      in xfs_iflush_cluster, the buffer is marked stale and EIO, and
      iodone callbacks are run on it.
      
      Running the iodone callbacks walks across the original inode and
      calls xfs_iflush_abort() on it. When xfs_iflush_cluster() returns
      to xfs_iflush(), it runs the error path for that function, and that
      calls xfs_iflush_abort() on the inode a second time, leading to the
      above assert failure as the inode is not flush locked anymore.
      
      This bug has been there a long time.
      
      The simple fix would be to just avoid calling xfs_iflush_abort() in
      xfs_iflush() if we've got a failure from xfs_iflush_cluster().
      However, xfs_iflush_cluster() has magic delwri buffer handling that
      means it may or may not have run IO completion on the buffer, and
      hence sometimes we have to call xfs_iflush_abort() from
      xfs_iflush(), and sometimes we shouldn't.
      
      After reading through all the error paths and the delwri buffer
      code, it's clear that the error handling in xfs_iflush_cluster() is
      unnecessary. If the buffer is delwri, it leaves it on the delwri
      list so that when the delwri list is submitted it sees a shutdown
      fliesystem in xfs_buf_submit() and that marks the buffer stale, EIO
      and runs IO completion. i.e. exactly what xfs+iflush_cluster() does
      when it's not a delwri buffer. Further, marking a buffer stale
      clears the _XBF_DELWRI_Q flag on the buffer, which means when
      submission of the buffer occurs, it just skips over it and releases
      it.
      
      IOWs, the error handling in xfs_iflush_cluster doesn't need to care
      if the buffer is already on a the delwri queue or not - it just
      needs to mark the buffer stale, EIO and run completions. That means
      we can just use the easy fix for xfs_iflush() to avoid the double
      abort.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e53946db
    • D
      xfs: More robust inode extent count validation · 23fcb334
      Dave Chinner 提交于
      When the inode is in extent format, it can't have more extents that
      fit in the inode fork. We don't currenty check this, and so this
      corruption goes unnoticed by the inode verifiers. This can lead to
      crashes operating on invalid in-memory structures.
      
      Attempts to access such a inode will now error out in the verifier
      rather than allowing modification operations to proceed.
      Reported-by: NWen Xu <wen.xu@gatech.edu>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      [darrick: fix a typedef, add some braces and breaks to shut up compiler warnings]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      23fcb334
    • C
      xfs: simplify xfs_bmap_punch_delalloc_range · e2ac8363
      Christoph Hellwig 提交于
      Instead of using xfs_bmapi_read to find delalloc extents and then punch
      them out using xfs_bunmapi, opencode the loop to iterate over the extents
      and call xfs_bmap_del_extent_delay directly.  This both simplifies the
      code and reduces the number of extent tree lookups required.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Reviewed-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      e2ac8363
    • L
      btrfs: fix invalid-free in btrfs_extent_same · 22883ddc
      Lu Fengqi 提交于
      If this condition ((BTRFS_I(src)->flags & BTRFS_INODE_NODATASUM) !=
      		   (BTRFS_I(dst)->flags & BTRFS_INODE_NODATASUM))
      is hit, we will go to free the uninitialized cmp.src_pages and
      cmp.dst_pages.
      
      Fixes: 67b07bd4 ("Btrfs: reuse cmp workspace in EXTENT_SAME ioctl")
      Signed-off-by: NLu Fengqi <lufq.fnst@cn.fujitsu.com>
      Reviewed-by: NDavid Sterba <dsterba@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      22883ddc
    • F
      Btrfs: fix physical offset reported by fiemap for inline extents · f0986318
      Filipe Manana 提交于
      Commit 9d311e11 ("Btrfs: fiemap: pass correct bytenr when
      fm_extent_count is zero") introduced a regression where we no longer
      report 0 as the physical offset for inline extents (and other extents
      with a special block_start value). This is because it always sets the
      variable used to report the physical offset ("disko") as em->block_start
      plus some offset, and em->block_start has the value 18446744073709551614
      ((u64) -2) for inline extents.
      
      This made the btrfs test 004 (from fstests) often fail, for example, for
      a file with an inline extent we have the following items in the subvolume
      tree:
      
          item 101 key (418 INODE_ITEM 0) itemoff 11029 itemsize 160
                 generation 25 transid 38 size 1525 nbytes 1525
                 block group 0 mode 100666 links 1 uid 0 gid 0 rdev 0
                 sequence 0 flags 0x2(none)
                 atime 1529342058.461891730 (2018-06-18 18:14:18)
                 ctime 1529342058.461891730 (2018-06-18 18:14:18)
                 mtime 1529342058.461891730 (2018-06-18 18:14:18)
                 otime 1529342055.869892885 (2018-06-18 18:14:15)
          item 102 key (418 INODE_REF 264) itemoff 11016 itemsize 13
                 index 25 namelen 3 name: fc7
          item 103 key (418 EXTENT_DATA 0) itemoff 9470 itemsize 1546
                 generation 38 type 0 (inline)
                 inline extent data size 1525 ram_bytes 1525 compression 0 (none)
      
      Then when test 004 invoked fiemap against the file it got a non-zero
      physical offset:
      
       $ filefrag -v /mnt/p0/d4/d7/fc7
       Filesystem type is: 9123683e
       File size of /mnt/p0/d4/d7/fc7 is 1525 (1 block of 4096 bytes)
        ext:     logical_offset:        physical_offset: length:   expected: flags:
          0:        0..    4095: 18446744073709551614..      4093:   4096:             last,not_aligned,inline,eof
       /mnt/p0/d4/d7/fc7: 1 extent found
      
      This resulted in the test failing like this:
      
      btrfs/004 49s ... [failed, exit status 1]- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad)
          --- tests/btrfs/004.out	2016-08-23 10:17:35.027012095 +0100
          +++ /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad	2018-06-18 18:15:02.385872155 +0100
          @@ -1,3 +1,10 @@
           QA output created by 004
           *** test backref walking
          -*** done
          +./tests/btrfs/004: line 227: [: 7.55578637259143e+22: integer expression expected
          +ERROR: 7.55578637259143e+22 is not a valid numeric value.
          +unexpected output from
          +	/home/fdmanana/git/hub/btrfs-progs/btrfs inspect-internal logical-resolve -s 65536 -P 7.55578637259143e+22 /home/fdmanana/btrfs-tests/scratch_1
          ...
          (Run 'diff -u tests/btrfs/004.out /home/fdmanana/git/hub/xfstests/results//btrfs/004.out.bad'  to see the entire diff)
      Ran: btrfs/004
      
      The large number in scientific notation reported as an invalid numeric
      value is the result from the filter passed to perl which multiplies the
      physical offset by the block size reported by fiemap.
      
      So fix this by ensuring the physical offset is always set to 0 when we
      are processing an extent with a special block_start value.
      
      Fixes: 9d311e11 ("Btrfs: fiemap: pass correct bytenr when fm_extent_count is zero")
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Signed-off-by: NDavid Sterba <dsterba@suse.com>
      f0986318