1. 07 Apr, 2019: 1 commit
    • block: remove CONFIG_LBDAF · 72deb455
      Committed by Christoph Hellwig
      Currently support for 64-bit sector_t and blkcnt_t is optional on 32-bit
      architectures.  These types are required to support block device and/or
      file sizes larger than 2 TiB, and have generally defaulted to on for
      a long time.  Enabling the option only increases the i386 tinyconfig
      size by 145 bytes, and many data structures already always use
      64-bit values for their in-core and on-disk data structures anyway,
      so there should not be a large change in dynamic memory usage either.
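      The 2 TiB limit follows from sector arithmetic. Here is a minimal sketch, assuming 512-byte sectors (the unit sector_t counts in) and using hypothetical helper names. A 32-bit sector count covers at most 2^32 sectors, which is 2^32 × 2^9 = 2^41 bytes.

```c
#include <stdint.h>

/* Assumption: sector_t counts 512-byte sectors. */
#define SKETCH_SECTOR_SIZE 512ULL

/* Bytes addressable with a 32-bit sector count: 2^32 * 2^9 = 2^41 = 2 TiB. */
static uint64_t max_bytes_32bit_sectors(void)
{
    return ((uint64_t)UINT32_MAX + 1) * SKETCH_SECTOR_SIZE;
}

/* With a 32-bit sector_t, the sector after the last one wraps to 0. */
static uint32_t next_sector_32bit(uint32_t sector)
{
    return sector + 1;
}
```

      Any device or file that needs more than UINT32_MAX sectors therefore needs a 64-bit sector_t.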
      
      Dropping this option removes a somewhat weird non-default config that
      has caused various bugs or compiler warnings when actually used.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 30 Mar, 2019: 5 commits
    • fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links · 23da9588
      Committed by YueHaibing
      Syzkaller reports:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN PTI
      CPU: 1 PID: 5373 Comm: syz-executor.0 Not tainted 5.0.0-rc8+ #3
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:put_links+0x101/0x440 fs/proc/proc_sysctl.c:1599
      Code: 00 0f 85 3a 03 00 00 48 8b 43 38 48 89 44 24 20 48 83 c0 38 48 89 c2 48 89 44 24 28 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 fe 02 00 00 48 8b 74 24 20 48 c7 c7 60 2a 9d 91
      RSP: 0018:ffff8881d828f238 EFLAGS: 00010202
      RAX: dffffc0000000000 RBX: ffff8881e01b1140 RCX: ffffffff8ee98267
      RDX: 0000000000000007 RSI: ffffc90001479000 RDI: ffff8881e01b1178
      RBP: dffffc0000000000 R08: ffffed103ee27259 R09: ffffed103ee27259
      R10: 0000000000000001 R11: ffffed103ee27258 R12: fffffffffffffff4
      R13: 0000000000000006 R14: ffff8881f59838c0 R15: dffffc0000000000
      FS:  00007f072254f700(0000) GS:ffff8881f7100000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fff8b286668 CR3: 00000001f0542002 CR4: 00000000007606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       drop_sysctl_table+0x152/0x9f0 fs/proc/proc_sysctl.c:1629
       get_subdir fs/proc/proc_sysctl.c:1022 [inline]
       __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
       br_netfilter_init+0xbc/0x1000 [br_netfilter]
       do_one_initcall+0xfa/0x5ca init/main.c:887
       do_init_module+0x204/0x5f6 kernel/module.c:3460
       load_module+0x66b2/0x8570 kernel/module.c:3808
       __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902
       do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007f072254ec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003
      RBP: 00007f072254ec70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007f072254f6bc
      R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      Modules linked in: br_netfilter(+) dvb_usb_dibusb_mc_common dib3000mc dibx000_common dvb_usb_dibusb_common dvb_usb_dw2102 dvb_usb classmate_laptop palmas_regulator cn videobuf2_v4l2 v4l2_common snd_soc_bd28623 mptbase snd_usb_usx2y snd_usbmidi_lib snd_rawmidi wmi libnvdimm lockd sunrpc grace rc_kworld_pc150u rc_core rtc_da9063 sha1_ssse3 i2c_cros_ec_tunnel adxl34x_spi adxl34x nfnetlink lib80211 i5500_temp dvb_as102 dvb_core videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops udc_core lnbp22 leds_lp3952 hid_roccat_ryos s1d13xxxfb mtd vport_geneve openvswitch nf_conncount nf_nat_ipv6 nsh geneve udp_tunnel ip6_udp_tunnel snd_soc_mt6351 sis_agp phylink snd_soc_adau1761_spi snd_soc_adau1761 snd_soc_adau17x1 snd_soc_core snd_pcm_dmaengine ac97_bus snd_compress snd_soc_adau_utils snd_soc_sigmadsp_regmap snd_soc_sigmadsp raid_class hid_roccat_konepure hid_roccat_common hid_roccat c2port_duramar2150 core mdio_bcm_unimac iptable_security iptable_raw iptable_mangle
       iptable_nat nf_nat_ipv4 nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim devlink vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel joydev mousedev ide_pci_generic piix aesni_intel aes_x86_64 ide_core crypto_simd atkbd cryptd glue_helper serio_raw ata_generic pata_acpi i2c_piix4 floppy sch_fq_codel ip_tables x_tables ipv6 [last unloaded: lm73]
      Dumping ftrace buffer:
         (ftrace buffer empty)
      ---[ end trace 770020de38961fd0 ]---
      
      A new dir entry can be created in get_subdir() with its 'header->parent'
      set to NULL.  Only after insert_header() succeeds is it set to 'dir';
      otherwise 'header->parent' is set to NULL and drop_sysctl_table() is
      called.  However, the error-handling path of get_subdir() also calls
      drop_sysctl_table() on 'new->header' regardless of the value of its
      parent pointer.  drop_sysctl_table() then calls put_links(), which
      triggers a NULL pointer dereference when it accesses members of
      header->parent.

      In fact there are multiple error paths that call drop_sysctl_table()
      there: upon failure of insert_links() we also call drop_sysctl_table(),
      and even in the successful case of __register_sysctl_table() we still
      always call drop_sysctl_table().  This patch fixes it.
      
      Link: http://lkml.kernel.org/r/20190314085527.13244-1-yuehaibing@huawei.com
      Fixes: 0e47c99d ("sysctl: Replace root_list with links between sysctl_table_sets")
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Acked-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: <stable@vger.kernel.org>    [3.4+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: fs_parser: fix printk format warning · 26203278
      Committed by Randy Dunlap
      Fix printk format warning (seen on i386 builds) by using ptrdiff format
      specifier (%t):
      
        fs/fs_parser.c:413:6: warning: format `%lu' expects argument of type `long unsigned int', but argument 3 has type `int' [-Wformat=]
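      The difference of two pointers has type ptrdiff_t, which is int on i386 but long on x86_64. A fixed %lu therefore cannot match both. Below is a userspace sketch using the standard C99 "t" length modifier. It is not the patched fs_parser.c code, and format_index() is a hypothetical helper.

```c
#include <stddef.h>
#include <stdio.h>

/* Print the index of 'elem' within the array starting at 'base'.
 * Pointer subtraction yields ptrdiff_t, and "%td" is its matching
 * conversion on every architecture. */
static int format_index(char *buf, size_t len, const int *base, const int *elem)
{
    ptrdiff_t idx = elem - base;
    return snprintf(buf, len, "%td", idx);
}
```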
      
      Link: http://lkml.kernel.org/r/19432668-ffd3-fbb2-af4f-1c8e48f6cc81@infradead.org
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/proc/kcore.c: make kcore_modules static · eebf3648
      Committed by YueHaibing
      Fix sparse warning:
      
        fs/proc/kcore.c:591:19: warning:
         symbol 'kcore_modules' was not declared. Should it be static?
      
      Link: http://lkml.kernel.org/r/20190320135417.13272-1-yuehaibing@huawei.com
      Signed-off-by: YueHaibing <yuehaibing@huawei.com>
      Acked-by: Mukesh Ojha <mojha@codeaurora.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix inode bh swapping mixup in ocfs2_reflink_inodes_lock · e6a9467e
      Committed by Darrick J. Wong
      ocfs2_reflink_inodes_lock() can swap the inode1/inode2 variables so that
      we always grab cluster locks in order of increasing inode number.
      
      Unfortunately, we forget to swap the inode record buffer head pointers
      when we've done this, which leads to incorrect bookkeeping when we're
      trying to make the two inodes have the same refcount tree.
      
      This has the effect of causing filesystem shutdowns if you're trying to
      reflink data from inode 100 into inode 97, where inode 100 already has a
      refcount tree attached and inode 97 doesn't.  The reflink code decides
      to copy the refcount tree pointer from 100 to 97, but uses inode 97's
      inode record to open the tree root (which it doesn't have) and blows up.
      This issue causes filesystem shutdowns and metadata corruption!
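      The fix restores one invariant: each inode stays paired with its own inode-record buffer head whenever the pair is reordered. Below is a minimal sketch using hypothetical stand-in structs. These are not the ocfs2 types, and this is not the actual patch.

```c
/* Hypothetical stand-ins for struct inode and the inode-record
 * struct buffer_head; only the fields needed for the illustration. */
struct inode_sketch { unsigned long ino; };
struct bh_sketch { unsigned long owner_ino; };

/* Put the lower-numbered inode first so locks are taken in increasing
 * inode order. Each buffer head is swapped together with its inode;
 * swapping only the inode pointers reproduces the mixup described above. */
static void order_inode_pair(struct inode_sketch **i1, struct bh_sketch **b1,
                             struct inode_sketch **i2, struct bh_sketch **b2)
{
    if ((*i1)->ino > (*i2)->ino) {
        struct inode_sketch *ti = *i1;
        struct bh_sketch *tb = *b1;

        *i1 = *i2;
        *i2 = ti;
        *b1 = *b2;
        *b2 = tb;
    }
}
```

      With inode 100 as the source and inode 97 as the target, the pair is reordered so that inode 97 comes first. Inode 97 still carries its own buffer head, so the refcount tree root is looked up in the correct inode record.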
      
      Link: http://lkml.kernel.org/r/20190312214910.GK20533@magnolia
      Fixes: 29ac8e85 ("ocfs2: implement the VFS clone_range, copy_range, and dedupe_range features")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/open.c: allow opening only regular files during execve() · 73601ea5
      Committed by Tetsuo Handa
      syzbot is hitting a lockdep warning [1] due to trying to open a fifo
      during an execve() operation.  But we don't need to open non-regular
      files during an execve() operation, because the only files we need are
      the executable file itself and interpreter programs like /bin/sh and
      ld-linux.so.2.
      
      Since the manpage for execve(2) says that execve() returns EACCES when
      the file or a script interpreter is not a regular file, the manpage for
      uselib(2) says that uselib() can return EACCES, and we use FMODE_EXEC
      when opening for execve()/uselib(), we can bail out if a non-regular
      file is requested with FMODE_EXEC set.
      
      Since this deadlock, followed by khungtaskd warnings, is trivially
      reproducible by a local unprivileged user, and syzbot's frequent crashes
      due to it delay finding other bugs, let's work around this deadlock
      until we get a chance to find a better solution.
      
      [1] https://syzkaller.appspot.com/bug?id=b5095bfec44ec84213bac54742a82483aad578ce
      
      Link: http://lkml.kernel.org/r/1552044017-7890-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
      Reported-by: syzbot <syzbot+e93a80c1bb7c5c56e522461c149f8bf55eab1b2b@syzkaller.appspotmail.com>
      Fixes: 8924feff ("splice: lift pipe_lock out of splice_to_pipe()")
      Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <stable@vger.kernel.org>	[4.9+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 28 Mar, 2019: 2 commits
    • afs: Fix StoreData op marshalling · 8c7ae38d
      Committed by David Howells
      The marshalling of AFS.StoreData, AFS.StoreData64 and YFS.StoreData64 calls
      generated by ->setattr() ops for the purpose of expanding a file is
      incorrect due to older documentation incorrectly describing the way the RPC
      'FileLength' parameter is meant to work.
      
      The older documentation says that this is the length the file is meant to
      end up at the end of the operation; however, it was never implemented this
      way in any of the servers, but rather the file is truncated down to this
      before the write operation is effected, and never expanded to it (and,
      indeed, it was renamed to 'TruncPos' in 2014).
      
      Fix this by setting the position parameter to the new file length and doing
      a zero-length write there.
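      A toy model of the server-side semantics described above shows why a zero-length write at the new length expands the file. In the model, the file is truncated down to 'TruncPos' if it is longer, and then the write is applied, which may extend it. The model is an illustrative assumption, not the AFS/YFS wire format or server code, and store_data_result() is hypothetical.

```c
#include <stdint.h>

/* Toy model of server-side StoreData semantics as described above: the
 * file is truncated down to 'trunc_pos' if it is longer (never expanded
 * to it), then 'len' bytes are written at 'pos', which may extend it.
 * Returns the resulting file size. */
static uint64_t store_data_result(uint64_t size, uint64_t trunc_pos,
                                  uint64_t pos, uint64_t len)
{
    if (size > trunc_pos)
        size = trunc_pos;
    if (pos + len > size)
        size = pos + len;
    return size;
}
```

      Under this model, the old marshalling (FileLength = 0x140008, nothing written past the start) leaves an empty file at size 0. Passing 0x140008 as the write position instead yields a file of 0x140008 bytes.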
      
      The bug causes Xwayland to SIGBUS due to unexpected non-expansion of a file
      it then mmaps.  This can be tested by giving the following test program a
      filename in an AFS directory:
      
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <unistd.h>
      	#include <fcntl.h>
      	#include <sys/mman.h>
      	int main(int argc, char *argv[])
      	{
      		char *p;
      		int fd;
      		if (argc != 2) {
      			fprintf(stderr,
      				"Format: test-trunc-mmap <file>\n");
      			exit(2);
      		}
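      		/* O_CREAT needs a mode argument; omitting it is undefined behavior. */
      		fd = open(argv[1], O_RDWR | O_CREAT | O_TRUNC, 0666);
      		if (fd < 0) {
      			perror(argv[1]);
      			exit(1);
      		}
      		/* Expand the file; on AFS this goes through ->setattr(). */
      		if (ftruncate(fd, 0x140008) == -1) {
      			perror("ftruncate");
      			exit(1);
      		}
      		p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
      			 MAP_SHARED, fd, 0);
      		if (p == MAP_FAILED) {
      			perror("mmap");
      			exit(1);
      		}
      		/* Raises SIGBUS if the file was not actually expanded. */
      		p[0] = 'a';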
      		if (munmap(p, 4096) < 0) {
      			perror("munmap");
      			exit(1);
      		}
      		if (close(fd) < 0) {
      			perror("close");
      			exit(1);
      		}
      		exit(0);
      	}
      
      Fixes: 31143d5d ("AFS: implement basic file write support")
      Reported-by: Jonathan Billings <jsbillin@umich.edu>
      Tested-by: Jonathan Billings <jsbillin@umich.edu>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ceph: fix use-after-free on symlink traversal · daf5cc27
      Committed by Al Viro
      Free the symlink body after the same RCU delay we have for freeing the
      struct inode itself, so that traversal during RCU pathwalk doesn't step
      into freed memory.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
  4. 26 Mar, 2019: 3 commits
    • xfs: serialize unaligned dio writes against all other dio writes · 2032a8a2
      Committed by Brian Foster
      XFS applies stricter serialization constraints to unaligned
      direct writes to accommodate things like direct I/O layer zeroing,
      unwritten extent conversion, etc. Unaligned submissions acquire the
      exclusive iolock and wait for in-flight dio to complete to ensure
      multiple submissions do not race on the same block and cause data
      corruption.
      
      This generally works in the case of an aligned dio followed by an
      unaligned dio, but the serialization is lost if I/Os occur in the
      opposite order. If an unaligned write is submitted first and
      immediately followed by an overlapping, aligned write, the latter
      submits without the typical unaligned serialization barriers because
      there is no indication of an unaligned dio still in-flight. This can
      lead to unpredictable results.
      
      To provide proper unaligned dio serialization, require that such
      direct writes are always the only dio allowed in-flight at one time
      for a particular inode. We already acquire the exclusive iolock and
      drain pending dio before submitting the unaligned dio. Wait once
      more after the dio submission to hold the iolock across the I/O and
      prevent further submissions until the unaligned I/O completes. This
      is heavy-handed, but consistent with the current pre-submission
      serialization for unaligned direct writes.
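      An "unaligned" write can be expressed as a block-mask test on the start and end of the write. Below is a minimal sketch, assuming a power-of-two filesystem block size. dio_write_is_unaligned() is a hypothetical helper, not the XFS function, and the locking steps appear only as a comment.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a direct write of 'count' bytes at 'pos' starts or ends off a
 * filesystem block boundary. 'blocksize' must be a power of two.
 * A caller that gets 'true' would take the exclusive iolock, drain
 * in-flight dio, submit the write and, per this patch, wait for it to
 * complete before dropping the lock. */
static bool dio_write_is_unaligned(uint64_t pos, uint64_t count,
                                   uint64_t blocksize)
{
    uint64_t mask = blocksize - 1;

    return (pos & mask) != 0 || ((pos + count) & mask) != 0;
}
```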
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Allison Henderson <allison.henderson@oracle.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • io_uring: offload write to async worker in case of -EAGAIN · 9bf7933f
      Committed by Roman Penyaev
      In the case of a direct write, -EAGAIN will be returned if the page cache
      was previously populated.  To avoid immediately completing the request
      with an -EAGAIN error, the write has to be offloaded to the async worker,
      as io_read() does.
      Signed-off-by: Roman Penyaev <rpenyaev@suse.de>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-block@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • io_uring: fix big-endian compat signal mask handling · 9e75ad5d
      Committed by Arnd Bergmann
      On big-endian architectures, the signal masks are different
      between 32-bit and 64-bit tasks, so we have to use a different
      function for reading them from user space.

      io_cqring_wait() initially got this wrong and always interpreted
      the mask as a native structure.  This is OK on x86 and most arm64,
      but not on s390, ppc64be, mips64be, sparc64 and parisc.
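      The difference comes from word order. A 32-bit task's mask is two 32-bit words, with signals 1-32 in the first word, while a 64-bit task's mask is one 64-bit word. Reinterpreting the compat words as a native 64-bit value only works on a little-endian host. The userspace sketch below contrasts the endian-independent conversion with the native reinterpretation; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Convert a 32-bit task's signal mask, stored as two 32-bit words with
 * signals 1-32 in words[0] and 33-64 in words[1], into a 64-bit mask.
 * The shift makes this correct on both little- and big-endian hosts. */
static uint64_t compat_sigmask_to_u64(const uint32_t words[2])
{
    return (uint64_t)words[0] | ((uint64_t)words[1] << 32);
}

/* The approach the commit describes as wrong: reinterpret the same 8 bytes
 * as one native uint64_t. This matches compat_sigmask_to_u64() only on
 * little-endian hosts. */
static uint64_t native_reinterpret(const uint32_t words[2])
{
    uint64_t v;
    memcpy(&v, words, sizeof v);
    return v;
}
```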
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 25 Mar, 2019: 2 commits
    • xfs: prohibit fstrim in norecovery mode · ed79dac9
      Committed by Darrick J. Wong
      The xfs fstrim implementation uses the free space btrees to find free
      space that can be discarded.  If we haven't recovered the log, the bnobt
      will be stale and we absolutely *cannot* use stale metadata to zap the
      underlying storage.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Eric Sandeen <sandeen@redhat.com>
    • locks: wake any locks blocked on request before deadlock check · 945ab8f6
      Committed by Jeff Layton
      Andreas reported that he was seeing the tdbtorture test fail in some
      cases with -EDEADLK when it hadn't before. Some debugging showed that
      deadlock detection was sometimes discovering the caller's lock request
      itself in a dependency chain.
      
      While we remove the request from the blocked_lock_hash prior to
      reattempting to acquire it, any locks that are blocked on that request
      will still be present in the hash and will still have their fl_blocker
      pointer set to the current request.
      
      This causes posix_locks_deadlock to find a deadlock dependency chain
      when it shouldn't, as a lock request cannot block itself.
      
      We are going to end up waking all of those blocked locks anyway when we
      go to reinsert the request back into the blocked_lock_hash, so just do
      it prior to checking for deadlocks. This ensures that any lock blocked
      on the current request will no longer be part of any blocked request
      chain.
      
      URL: https://bugzilla.kernel.org/show_bug.cgi?id=202975
      Fixes: 5946c431 ("fs/locks: allow a lock request to block other requests.")
      Cc: stable@vger.kernel.org
      Reported-by: Andreas Schneider <asn@redhat.com>
      Signed-off-by: Neil Brown <neilb@suse.com>
      Signed-off-by: Jeff Layton <jlayton@kernel.org>
  6. 24 Mar, 2019: 3 commits
  7. 23 Mar, 2019: 11 commits
    • ext4: cleanup bh release code in ext4_ind_remove_space() · 5e86bdda
      Committed by zhangyi (F)
      Currently, we release each indirect buffer as soon as we are done with
      it in ext4_ind_remove_space(), so brelse() and BUFFER_TRACE() calls
      appear everywhere.  This seems fragile and hard to read, and we may
      forget to release a buffer some day.  This patch cleans up the code by
      moving the code that releases the buffers to the end of the function.
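      The pattern adopted here is the usual C single-exit cleanup: record each acquired buffer, then release all of them in one place at the end, whichever path was taken. Below is a minimal userspace sketch. It uses malloc()/free() as hypothetical stand-ins for acquiring a buffer head and for brelse(), which, like release_buf() below, ignores NULL.

```c
#include <stdlib.h>

#define SKETCH_MAX_DEPTH 4

/* Counts releases so the behavior can be checked. */
static int released_count;

/* Stand-in for brelse(): releasing NULL is a no-op. */
static void release_buf(void *bh)
{
    if (bh) {
        free(bh);
        released_count++;
    }
}

/* Acquire one buffer per depth level (failing on purpose at level
 * 'fail_at', or never if fail_at < 0), then release every acquired buffer
 * in a single block at the end, whichever path was taken. */
static int remove_space_sketch(int depth, int fail_at)
{
    void *bufs[SKETCH_MAX_DEPTH] = { NULL };
    int err = 0;

    for (int i = 0; i < depth && i < SKETCH_MAX_DEPTH; i++) {
        if (i == fail_at) {
            err = -1;
            goto out;
        }
        bufs[i] = malloc(16);
        if (!bufs[i]) {
            err = -1;
            goto out;
        }
    }
    /* ... work with the buffers would happen here ... */
out:
    for (int i = 0; i < SKETCH_MAX_DEPTH; i++)
        release_buf(bufs[i]);
    return err;
}
```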
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • ext4: brelse all indirect buffer in ext4_ind_remove_space() · 674a2b27
      Committed by zhangyi (F)
      All indirect buffers obtained by ext4_find_shared() should be released
      regardless of whether the branch is freed.  But currently we forget to
      release the lower-depth indirect buffers when removing space from the
      same higher-depth indirect block.  This leads to a buffer leak and,
      furthermore, may lead to quota information corruption when using old
      quota; consider the following case.
      
       - Create and mount an empty ext4 filesystem without the extent and
         quota features,
       - Run quotacheck and enable the user & group quota,
       - Create some files, write some data to them, and then punch holes in
         some of them; this may trigger the buffer leak described above,
       - Disable quota and run quotacheck again; it will create two new
         aquota files and write the checked quota information to them, which
         may reuse the freed indirect block (whose buffer and page cache were
         not freed) as a data block,
       - Enable quota again; this invokes
         vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
         buffers and page cache.  Unfortunately, because the buffer of the
         quota data block is still referenced, the quota code cannot read
         up-to-date quota info from the device, leading to quota information
         corruption.
      
      This problem can be reproduced by xfstests generic/231 on ext3 file
      system or ext4 file system without extent and quota features.
      
      This patch fixes the problem by releasing the missing indirect buffers
      in ext4_ind_remove_space().
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: stable@kernel.org
    • x86/gart: Exclude GART aperture from kcore · ffc8599a
      Committed by Kairui Song
      On machines where the GART aperture is mapped over physical RAM,
      /proc/kcore contains the GART aperture range. Accessing the GART range via
      /proc/kcore results in a kernel crash.
      
      vmcore used to have the same issue, until it was fixed by commit
      2a3e83c6 ("x86/gart: Exclude GART aperture from vmcore"), which
      leveraged the existing hook infrastructure in vmcore to make
      /proc/vmcore return zeroes when reading the aperture region, so that
      the actual memory is never read.
      
      Apply the same workaround to kcore: first implement the same hook
      infrastructure for kcore, then reuse the hook functions introduced in
      the previous vmcore fix, with some minor adjustments: rename some
      functions for more general use, and simplify the hook infrastructure a
      bit, as there are no module users yet.
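      The hook approach amounts to a read path that asks a callback whether each address may be touched and substitutes zeroes where it may not. Below is a simplified userspace sketch over a simulated memory image. It works byte by byte rather than page by page, and excluded_range, addr_is_readable() and read_with_hook() are hypothetical names, not the kcore/vmcore functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excluded physical range, [start, end). */
struct excluded_range {
    uint64_t start;
    uint64_t end;
};

/* Hypothetical hook: nonzero if 'addr' may be read, 0 if it lies inside
 * the excluded range (e.g. a GART aperture mapped over RAM). */
static int addr_is_readable(const struct excluded_range *r, uint64_t addr)
{
    return !(addr >= r->start && addr < r->end);
}

/* Copy 'len' bytes starting at 'addr' out of a simulated memory image,
 * substituting zeroes for every byte the hook rejects instead of touching it. */
static void read_with_hook(const struct excluded_range *r, const uint8_t *mem,
                           uint64_t addr, uint8_t *out, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint64_t a = addr + i;
        out[i] = addr_is_readable(r, a) ? mem[a] : 0;
    }
}
```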
      Suggested-by: Baoquan He <bhe@redhat.com>
      Signed-off-by: Kairui Song <kasong@redhat.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Jiri Bohac <jbohac@suse.cz>
      Acked-by: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Dave Young <dyoung@redhat.com>
      Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
    • cifs: update internal module version number · cf7d624f
      Committed by Steve French
      To 2.19
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • SMB3: Fix SMB3.1.1 guest mounts to Samba · 8c11a607
      Committed by Steve French
      Work around a problem with Samba responses to SMB3.1.1
      null user (guest) mounts.  The server doesn't set the
      expected flag in the session setup response, so we have
      to do a check similar to the one in smb3_validate_negotiate,
      where we also check if the user is a null user (but not for sec=krb5,
      since the username might not be passed in on mount in the Kerberos case).
      
      Note that the commit below tightened the conditions and forced signing
      for the SMB2-TreeConnect commands as per MS-SMB2.
      However, this should only apply to normal user sessions and not for
      cases where there is no user (even if server forgets to set the flag
      in the response) since we don't have anything useful to sign with.
      This is especially important now that the more secure SMB3.1.1 protocol
      is in the default dialect list.
      
      An earlier patch ("cifs: allow guest mounts to work for smb3.11") fixed
      the guest mounts to Windows.
      
      Fixes: 6188f28b ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares")
      Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: Paulo Alcantara <palcantara@suse.de>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: Fix slab-out-of-bounds when tracing SMB tcon · 68ddb496
      Committed by Paulo Alcantara (SUSE)
      This patch fixes the following KASAN report:
      
      [  779.044746] BUG: KASAN: slab-out-of-bounds in string+0xab/0x180
      [  779.044750] Read of size 1 at addr ffff88814f327968 by task trace-cmd/2812
      
      [  779.044756] CPU: 1 PID: 2812 Comm: trace-cmd Not tainted 5.1.0-rc1+ #62
      [  779.044760] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.0-0-ga698c89-prebuilt.qemu.org 04/01/2014
      [  779.044761] Call Trace:
      [  779.044769]  dump_stack+0x5b/0x90
      [  779.044775]  ? string+0xab/0x180
      [  779.044781]  print_address_description+0x6c/0x23c
      [  779.044787]  ? string+0xab/0x180
      [  779.044792]  ? string+0xab/0x180
      [  779.044797]  kasan_report.cold.3+0x1a/0x32
      [  779.044803]  ? string+0xab/0x180
      [  779.044809]  string+0xab/0x180
      [  779.044816]  ? widen_string+0x160/0x160
      [  779.044822]  ? vsnprintf+0x5bf/0x7f0
      [  779.044829]  vsnprintf+0x4e7/0x7f0
      [  779.044836]  ? pointer+0x4a0/0x4a0
      [  779.044841]  ? seq_buf_vprintf+0x79/0xc0
      [  779.044848]  seq_buf_vprintf+0x62/0xc0
      [  779.044855]  trace_seq_printf+0x113/0x210
      [  779.044861]  ? trace_seq_puts+0x110/0x110
      [  779.044867]  ? trace_raw_output_prep+0xd8/0x110
      [  779.044876]  trace_raw_output_smb3_tcon_class+0x9f/0xc0
      [  779.044882]  print_trace_line+0x377/0x890
      [  779.044888]  ? tracing_buffers_read+0x300/0x300
      [  779.044893]  ? ring_buffer_read+0x58/0x70
      [  779.044899]  s_show+0x6e/0x140
      [  779.044906]  seq_read+0x505/0x6a0
      [  779.044913]  vfs_read+0xaf/0x1b0
      [  779.044919]  ksys_read+0xa1/0x130
      [  779.044925]  ? kernel_write+0xa0/0xa0
      [  779.044931]  ? __do_page_fault+0x3d5/0x620
      [  779.044938]  do_syscall_64+0x63/0x150
      [  779.044944]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [  779.044949] RIP: 0033:0x7f62c2c2db31
      [  779.044955] Code: fe ff ff 48 8d 3d 17 9e 09 00 48 83 ec 08 e8 96 02 02 00 66 0f 1f 44 00 00 8b 05 fa fc 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89
      [  779.044958] RSP: 002b:00007ffd6e116678 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
      [  779.044964] RAX: ffffffffffffffda RBX: 0000560a38be9260 RCX: 00007f62c2c2db31
      [  779.044966] RDX: 0000000000002000 RSI: 00007ffd6e116710 RDI: 0000000000000003
      [  779.044969] RBP: 00007f62c2ef5420 R08: 0000000000000000 R09: 0000000000000003
      [  779.044972] R10: ffffffffffffffa8 R11: 0000000000000246 R12: 00007ffd6e116710
      [  779.044975] R13: 0000000000002000 R14: 0000000000000d68 R15: 0000000000002000
      
      [  779.044981] Allocated by task 1257:
      [  779.044987]  __kasan_kmalloc.constprop.5+0xc1/0xd0
      [  779.044992]  kmem_cache_alloc+0xad/0x1a0
      [  779.044997]  getname_flags+0x6c/0x2a0
      [  779.045003]  user_path_at_empty+0x1d/0x40
      [  779.045008]  do_faccessat+0x12a/0x330
      [  779.045012]  do_syscall_64+0x63/0x150
      [  779.045017]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  779.045019] Freed by task 1257:
      [  779.045023]  __kasan_slab_free+0x12e/0x180
      [  779.045029]  kmem_cache_free+0x85/0x1b0
      [  779.045034]  filename_lookup.part.70+0x176/0x250
      [  779.045039]  do_faccessat+0x12a/0x330
      [  779.045043]  do_syscall_64+0x63/0x150
      [  779.045048]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      [  779.045052] The buggy address belongs to the object at ffff88814f326600
      which belongs to the cache names_cache of size 4096
      [  779.045057] The buggy address is located 872 bytes to the right of
      4096-byte region [ffff88814f326600, ffff88814f327600)
      [  779.045058] The buggy address belongs to the page:
      [  779.045062] page:ffffea00053cc800 count:1 mapcount:0 mapping:ffff88815b191b40 index:0x0 compound_mapcount: 0
      [  779.045067] flags: 0x200000000010200(slab|head)
      [  779.045075] raw: 0200000000010200 dead000000000100 dead000000000200 ffff88815b191b40
      [  779.045081] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
      [  779.045083] page dumped because: kasan: bad access detected
      
      [  779.045085] Memory state around the buggy address:
      [  779.045089]  ffff88814f327800: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045093]  ffff88814f327880: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045097] >ffff88814f327900: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045099]                                                           ^
      [  779.045103]  ffff88814f327980: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045107]  ffff88814f327a00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      [  779.045109] ==================================================================
      [  779.045110] Disabling lock debugging due to kernel taint
      
      Correctly assign the tree name string for the smb3_tcon event.
      Signed-off-by: Paulo Alcantara (SUSE) <paulo@paulo.ac>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • cifs: allow guest mounts to work for smb3.11 · e71ab2aa
      Committed by Ronnie Sahlberg
      Fix Guest/Anonymous sessions so that they work with SMB 3.11.
      
      The commit noted below tightened the conditions and forced signing for
      the SMB2-TreeConnect commands as per MS-SMB2.
      However, this should only apply to normal user sessions and not to
      Guest/Anonymous sessions.
      
      Fixes: 6188f28b ("Tree connect for SMB3.1.1 must be signed for non-encrypted shares")
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      CC: Stable <stable@vger.kernel.org>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • fix incorrect error code mapping for OBJECTID_NOT_FOUND · 85f9987b
      Committed by Steve French
      It was mapped to EIO, which can be confusing when user space
      queries the object GUID of an object for which the server file
      system doesn't support object GUIDs (or hasn't saved one).
      
      As Amir Goldstein suggested, this is similar to ENOATTR
      (equivalent to ENODATA in the Linux errno definitions), so
      change the NT STATUS code mapping for OBJECTID_NOT_FOUND
      to ENODATA.
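      A minimal sketch of a table-driven NT STATUS to errno mapping that
      includes the new entry. The struct, table and map_nt_status() are
      hypothetical (the real cifs table has a different layout and many
      more entries); the status values are those documented in MS-ERREF,
      and unmapped codes fall back to -EIO.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* NT status codes as documented in MS-ERREF. */
#define STATUS_SUCCESS               0x00000000u
#define STATUS_OBJECT_NAME_NOT_FOUND 0xC0000034u
#define STATUS_OBJECTID_NOT_FOUND    0xC00002F0u

struct status_map {
	uint32_t status;
	int posix_error;
};

/* Illustrative subset only. */
static const struct status_map map_table[] = {
	{ STATUS_SUCCESS, 0 },
	{ STATUS_OBJECT_NAME_NOT_FOUND, -ENOENT },
	/* Previously -EIO; "no object ID" now reads as "no such attribute". */
	{ STATUS_OBJECTID_NOT_FOUND, -ENODATA },
};

static int map_nt_status(uint32_t status)
{
	for (size_t i = 0; i < sizeof(map_table) / sizeof(map_table[0]); i++)
		if (map_table[i].status == status)
			return map_table[i].posix_error;
	return -EIO; /* fallback for codes with no specific mapping */
}
```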
      Signed-off-by: Steve French <stfrench@microsoft.com>
      CC: Amir Goldstein <amir73il@gmail.com>
    • X
      cifs: fix that return -EINVAL when do dedupe operation · b073a080
      Xiaoli Feng committed
      The dedupe_file_range operation has been combined into
      remap_file_range, but cifs_remap_file_range always rejects dedupe
      operations with -EINVAL instead of reporting them as unsupported.
      
      Example to test:
      Before this patch:
        # dd if=/dev/zero of=cifs/file bs=1M count=1
        # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file
        XFS_IOC_FILE_EXTENT_SAME: Invalid argument
      
      After this patch:
        # dd if=/dev/zero of=cifs/file bs=1M count=1
        # xfs_io -c "dedupe cifs/file 4k 64k 4k" cifs/file
        XFS_IOC_FILE_EXTENT_SAME: Operation not supported
      
      Effect on xfstests:
      generic/091
      generic/112
      generic/127
      generic/263
      These tests report the error "do_copy_range:: Invalid
      argument" instead of "FIDEDUPERANGE: Invalid argument".
      They still fail because of two remaining bugs:
      https://bugzilla.kernel.org/show_bug.cgi?id=202935
      https://bugzilla.kernel.org/show_bug.cgi?id=202785
      Signed-off-by: Xiaoli Feng <fengxiaoli0714@gmail.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
    • L
      CIFS: Fix an issue with re-sending rdata when transport returning -EAGAIN · 0b0dfd59
      Long Li committed
      When sending an rdata, the transport may return -EAGAIN. In this
      case we should re-obtain credits, because the session may have been
      reconnected.
      
      Change in v2: adjust_credits before re-sending
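      A simulated sketch of the retry pattern: on -EAGAIN, credits are
      re-obtained from the (possibly reconnected) connection before the
      request is re-sent. All structures and functions here are
      hypothetical stand-ins, and the connection "generation" counter is
      an assumption used to model credits being tied to one session.

```c
#include <errno.h>

/* Hypothetical connection state; not the cifs client's structures. */
struct conn {
	unsigned int credits;    /* credits the server has granted */
	unsigned int generation; /* incremented on every reconnect */
	int eagain_budget;       /* simulation knob: sends that fail with -EAGAIN */
};

struct request {
	unsigned int credits_needed;
	unsigned int generation; /* connection generation the credits came from */
};

/* Take credits for one request from the current connection. */
static int obtain_credits(struct conn *c, struct request *rq)
{
	if (c->credits < rq->credits_needed)
		return -EAGAIN;
	c->credits -= rq->credits_needed;
	rq->generation = c->generation;
	return 0;
}

/* Simulated transport: may "reconnect" and fail with -EAGAIN, and
 * rejects requests whose credits belong to an older connection. */
static int transport_send(struct conn *c, const struct request *rq)
{
	if (rq->generation != c->generation)
		return -EINVAL;
	if (c->eagain_budget > 0) {
		c->eagain_budget--;
		c->generation++; /* reconnect: old credits are void */
		c->credits = 8;  /* fresh grant after reconnect */
		return -EAGAIN;
	}
	return 0;
}

/* Re-obtain credits before every re-send. */
static int send_with_retry(struct conn *c, struct request *rq)
{
	int rc = obtain_credits(c, rq);

	while (rc == 0 && (rc = transport_send(c, rq)) == -EAGAIN)
		rc = obtain_credits(c, rq);
	return rc;
}
```

      In this model, re-sending with the original credits after a
      simulated reconnect fails with -EINVAL, which is the situation the
      patch avoids.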
      Signed-off-by: Long Li <longli@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
    • L
      CIFS: Fix an issue with re-sending wdata when transport returning -EAGAIN · d53e292f
      Long Li committed
      When sending a wdata, the transport may return -EAGAIN. In this
      case we should re-obtain credits, because the session may have been
      reconnected.
      
      Change in v2: adjust_credits before re-sending
      Signed-off-by: Long Li <longli@microsoft.com>
      Signed-off-by: Steve French <stfrench@microsoft.com>
      Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
  8. 21 Mar 2019, 1 commit
    • F
      Btrfs: fix assertion failure on fsync with NO_HOLES enabled · 0ccc3876
      Filipe Manana committed
      Back in commit a89ca6f2 ("Btrfs: fix fsync after truncate when
      no_holes feature is enabled") I added an assertion that is triggered when
      an inline extent is found to assert that the length of the (uncompressed)
      data the extent represents is the same as the i_size of the inode, since
      that is true most of the time and I couldn't find, or didn't remember,
      any exception at that time. Later on the assertion was expanded twice to
      deal with a case of a compressed inline extent representing a range that
      matches the sector size followed by an expanding truncate, and another
      case where fallocate can update the i_size of the inode without adding
      or updating existing extents (if the fallocate range falls entirely within
      the first block of the file). These two expansion/fixes of the assertion
      were done by commit 7ed586d0 ("Btrfs: fix assertion on fsync of
      regular file when using no-holes feature") and commit 6399fb5a
      ("Btrfs: fix assertion failure during fsync in no-holes mode").
      However, these missed the case where a falloc expands the i_size of an
      inode to exactly the sector size and an inline extent exists, for example:
      
       $ mkfs.btrfs -f -O no-holes /dev/sdc
       $ mount /dev/sdc /mnt
      
       $ xfs_io -f -c "pwrite -S 0xab 0 1096" /mnt/foobar
       wrote 1096/1096 bytes at offset 0
       1 KiB, 1 ops; 0.0002 sec (4.448 MiB/sec and 4255.3191 ops/sec)
      
       $ xfs_io -c "falloc 1096 3000" /mnt/foobar
       $ xfs_io -c "fsync" /mnt/foobar
       Segmentation fault
      
       $ dmesg
       [701253.602385] assertion failed: len == i_size || (len == fs_info->sectorsize && btrfs_file_extent_compression(leaf, extent) != BTRFS_COMPRESS_NONE) || (len < i_size && i_size < fs_info->sectorsize), file: fs/btrfs/tree-log.c, line: 4727
       [701253.602962] ------------[ cut here ]------------
       [701253.603224] kernel BUG at fs/btrfs/ctree.h:3533!
       [701253.603503] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
       [701253.603774] CPU: 2 PID: 7192 Comm: xfs_io Tainted: G        W         5.0.0-rc8-btrfs-next-45 #1
       [701253.604054] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626ccb91-prebuilt.qemu-project.org 04/01/2014
       [701253.604650] RIP: 0010:assfail.constprop.23+0x18/0x1a [btrfs]
       (...)
       [701253.605591] RSP: 0018:ffffbb48c186bc48 EFLAGS: 00010286
       [701253.605914] RAX: 00000000000000de RBX: ffff921d0a7afc08 RCX: 0000000000000000
       [701253.606244] RDX: 0000000000000000 RSI: ffff921d36b16868 RDI: ffff921d36b16868
       [701253.606580] RBP: ffffbb48c186bcf0 R08: 0000000000000000 R09: 0000000000000000
       [701253.606913] R10: 0000000000000003 R11: 0000000000000000 R12: ffff921d05d2de18
       [701253.607247] R13: ffff921d03b54000 R14: 0000000000000448 R15: ffff921d059ecf80
       [701253.607769] FS:  00007f14da906700(0000) GS:ffff921d36b00000(0000) knlGS:0000000000000000
       [701253.608163] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       [701253.608516] CR2: 000056087ea9f278 CR3: 00000002268e8001 CR4: 00000000003606e0
       [701253.608880] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       [701253.609250] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
       [701253.609608] Call Trace:
       [701253.609994]  btrfs_log_inode+0xdfb/0xe40 [btrfs]
       [701253.610383]  btrfs_log_inode_parent+0x2be/0xa60 [btrfs]
       [701253.610770]  ? do_raw_spin_unlock+0x49/0xc0
       [701253.611150]  btrfs_log_dentry_safe+0x4a/0x70 [btrfs]
       [701253.611537]  btrfs_sync_file+0x3b2/0x440 [btrfs]
       [701253.612010]  ? do_sysinfo+0xb0/0xf0
       [701253.612552]  do_fsync+0x38/0x60
       [701253.612988]  __x64_sys_fsync+0x10/0x20
       [701253.613360]  do_syscall_64+0x60/0x1b0
       [701253.613733]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
       [701253.614103] RIP: 0033:0x7f14da4e66d0
       (...)
       [701253.615250] RSP: 002b:00007fffa670fdb8 EFLAGS: 00000246 ORIG_RAX: 000000000000004a
       [701253.615647] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f14da4e66d0
       [701253.616047] RDX: 000056087ea9c260 RSI: 000056087ea9c260 RDI: 0000000000000003
       [701253.616450] RBP: 0000000000000001 R08: 0000000000000020 R09: 0000000000000010
       [701253.616854] R10: 000000000000009b R11: 0000000000000246 R12: 000056087ea9c260
       [701253.617257] R13: 000056087ea9c240 R14: 0000000000000000 R15: 000056087ea9dd10
       (...)
       [701253.619941] ---[ end trace e088d74f132b6da5 ]---
      
      Updating the assertion again to allow for this particular case would result
      in a meaningless assertion, plus there is currently no risk of logging
      content that would result in any corruption after a log replay if the size
      of the data encoded in an inline extent is greater than the inode's i_size
      (which is not currently possible either with or without compression),
      therefore just remove the assertion.
      
      CC: stable@vger.kernel.org # 4.4+
      Signed-off-by: Filipe Manana <fdmanana@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
  9. 20 Mar 2019, 1 commit
  10. 19 Mar 2019, 11 commits
    • D
      xfs: always init bma in xfs_bmapi_write · 4b0bce30
      Darrick J. Wong committed
      Always init the tp/ip fields of bma in xfs_bmapi_write so that the
      bmapi_finish at the bottom never trips over null transaction or inode
      pointers.
      
      Coverity-id: 1443964
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • D
      xfs: fix btree scrub checking with regards to root-in-inode · a72e9d8d
      Darrick J. Wong committed
      In xchk_btree_check_owner, we can be passed a null buffer pointer.  This
      should only happen for the root of a root-in-inode btree type, but we
      should program defensively in case the btree cursor state ever gets
      screwed up and we get a null buffer anyway.
      
      Coverity-id: 1438713
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • D
      xfs: dabtree scrub needs to range-check level · 228de124
      Darrick J. Wong committed
      Make sure scrub's dabtree iterator function checks that we're not
      going deeper in the stack than our cursor permits.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • N
      btrfs: Avoid possible qgroup_rsv_size overflow in btrfs_calculate_inode_block_rsv_size · 139a5617
      Nikolay Borisov committed
      qgroup_rsv_size is calculated as the product of
      outstanding_extent * fs_info->nodesize. The product is calculated with
      32-bit precision since both variables are defined as u32, yet
      qgroup_rsv_size expects a 64-bit result.
      
      Avoid a possible multiplication overflow by casting outstanding_extent
      to u64. In the worst case (64K nodesize) such an overflow would require
      at least 65536 extents, which is quite large, so it is unlikely to
      happen in practice.
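      A standalone illustration of the overflow, using uint32_t/uint64_t
      in place of the kernel's u32/u64; the two function names are
      hypothetical. With a 64K nodesize, 65536 extents give a product of
      2^32, which wraps to 0 in 32-bit arithmetic.

```c
#include <stdint.h>

/* Buggy: both operands are 32-bit, so the multiply wraps modulo 2^32
 * before the result is widened to the 64-bit return type. */
static uint64_t rsv_size_buggy(uint32_t outstanding_extents, uint32_t nodesize)
{
	return outstanding_extents * nodesize;
}

/* Fixed: widen one operand first so the multiply is done in 64 bits. */
static uint64_t rsv_size_fixed(uint32_t outstanding_extents, uint32_t nodesize)
{
	return (uint64_t)outstanding_extents * nodesize;
}
```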
      
      Fixes-coverity-id: 1435101
      Fixes: ff6bc37e ("btrfs: qgroup: Use independent and accurate per inode qgroup rsv")
      CC: stable@vger.kernel.org # 4.19+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • N
      btrfs: Fix bound checking in qgroup_trace_new_subtree_blocks · 7ff2c2a1
      Nikolay Borisov committed
      If 'cur_level' is 7, the bound check at the top of the function
      passes. Later on, it's possible to dereference
      ds_path->nodes[cur_level+1], which is an out-of-bounds access.
      
      The correct check is cur_level >= BTRFS_MAX_LEVEL - 1.
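      A sketch of the off-by-one, assuming the old check only rejected
      levels outside [0, BTRFS_MAX_LEVEL); the helper names are
      hypothetical, and BTRFS_MAX_LEVEL is 8 as in the kernel headers.

```c
#include <stdbool.h>

#define BTRFS_MAX_LEVEL 8 /* same value as the kernel's definition */

/* Assumed shape of the old check: only rejects levels past the array. */
static bool level_ok_old(int cur_level)
{
	return cur_level >= 0 && cur_level < BTRFS_MAX_LEVEL;
}

/* Corrected check: also reject BTRFS_MAX_LEVEL - 1, so the later
 * access to nodes[cur_level + 1] stays inside nodes[BTRFS_MAX_LEVEL]. */
static bool level_ok_new(int cur_level)
{
	return !(cur_level < 0 || cur_level >= BTRFS_MAX_LEVEL - 1);
}
```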
      
      Fixes-coverty-id: 1440918
      Fixes-coverty-id: 1440911
      Fixes: ea49f3e7 ("btrfs: qgroup: Introduce function to find all new tree blocks of reloc tree")
      CC: stable@vger.kernel.org # 4.20+
      Reviewed-by: Qu Wenruo <wqu@suse.com>
      Signed-off-by: Nikolay Borisov <nborisov@suse.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • J
      fanotify: Allow copying of file handle to userspace · b2d22b6b
      Jan Kara committed
      When a file handle is embedded inside fanotify_event and usercopy checks
      are enabled, we get a warning like:
      
      Bad or missing usercopy whitelist? Kernel memory exposure attempt detected
      from SLAB object 'fanotify_event' (offset 40, size 8)!
      WARNING: CPU: 1 PID: 7649 at mm/usercopy.c:78 usercopy_warn+0xeb/0x110
      mm/usercopy.c:78
      
      Annotate fanotify_event properly to mark that copying the file handle
      to userspace is fine.
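      A simplified model of the check behind this warning: with hardened
      usercopy, bytes of a slab object may only be copied to userspace
      from a whitelisted (useroffset, usersize) region declared for its
      cache, e.g. via kmem_cache_create_usercopy(). The struct and
      usercopy_allowed() are illustrative, and the exact annotation this
      commit uses is not reproduced here. An 8-byte copy at offset 40, as
      in the warning, is rejected unless the cache whitelists that range.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified per-cache whitelist: only bytes in
 * [useroffset, useroffset + usersize) of an object may be copied out. */
struct cache_whitelist {
	size_t useroffset;
	size_t usersize; /* 0 means nothing is whitelisted */
};

/* True if copying n bytes starting at offset stays inside the
 * whitelisted region; arranged so that offset + n cannot overflow. */
static bool usercopy_allowed(const struct cache_whitelist *wl,
			     size_t offset, size_t n)
{
	if (offset < wl->useroffset)
		return false;
	if (offset - wl->useroffset > wl->usersize)
		return false;
	return n <= wl->usersize - (offset - wl->useroffset);
}
```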
      
      Reported-by: syzbot+2c49971e251e36216d1f@syzkaller.appspotmail.com
      Fixes: a8b13aa2 ("fanotify: enable FAN_REPORT_FID init flag")
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
    • N
      NFS: fix mount/umount race in nlmclnt. · 4a9be28c
      NeilBrown committed
      If the last NFSv3 unmount from a given host races with a mount from the
      same host, we can destroy an nlm_host that is still in use.
      
      Specifically nlmclnt_lookup_host() can increment h_count on
      an nlm_host that nlmclnt_release_host() has just successfully called
      refcount_dec_and_test() on.
      Once nlmclnt_lookup_host() drops the mutex, nlm_destroy_host_lock()
      will be called to destroy the nlm_host, which is now in use again.
      
      The cause of the problem is that the dec_and_test happens outside the
      locked region.  This is easily fixed by using
      refcount_dec_and_mutex_lock().
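      A userspace sketch of the semantics of refcount_dec_and_mutex_lock(),
      using C11 atomics and a POSIX mutex; dec_and_mutex_lock() is a
      hypothetical name. Only the final decrement, the one that may reach
      zero, is performed under the lock, so a lookup that takes the same
      lock cannot revive the object between that decrement and its
      destruction.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Drop one reference. Returns true, with *lock held, when the count
 * reached zero; the caller then destroys the object and unlocks.
 */
static bool dec_and_mutex_lock(atomic_int *ref, pthread_mutex_t *lock)
{
	int old = atomic_load(ref);

	/* Fast path: not the last reference, drop it without the lock. */
	while (old > 1) {
		if (atomic_compare_exchange_weak(ref, &old, old - 1))
			return false;
	}

	/* Possibly the last reference: decide under the lock. */
	pthread_mutex_lock(lock);
	if (atomic_fetch_sub(ref, 1) == 1)
		return true;
	pthread_mutex_unlock(lock);
	return false;
}
```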
      
      Fixes: 8ea6ecc8 ("lockd: Create client-side nlm_host cache")
      Cc: stable@vger.kernel.org (v2.6.38+)
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • A
      btrfs: raid56: properly unmap parity page in finish_parity_scrub() · 3897b6f0
      Andrea Righi committed
      Parity page is incorrectly unmapped in finish_parity_scrub(), triggering
      a reference counter bug on i386, i.e.:
      
       [ 157.662401] kernel BUG at mm/highmem.c:349!
       [ 157.666725] invalid opcode: 0000 [#1] SMP PTI
      
      The reason is that kunmap(p_page) was left out entirely, so p_page
      was never unmapped, and the loop unmapping the rbio pages was
      iterating over the wrong number of stripes: unmapping should be done
      with nr_data instead of rbio->real_stripes.
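      A sketch of the balanced map/unmap shape described above, with
      hypothetical fake_kmap()/fake_kunmap() counters standing in for
      kmap()/kunmap(); it models only the RAID5 case with a single parity
      page and is not the btrfs code itself.

```c
/* Hypothetical stand-ins for kmap()/kunmap() that only count live
 * mappings, so an unbalanced cleanup is easy to detect. */
static int live_mappings;

static void *fake_kmap(void *page)
{
	live_mappings++;
	return page;
}

static void fake_kunmap(void *page)
{
	(void)page;
	live_mappings--;
}

/*
 * nr_data data pages and the parity page p_page are mapped, so the
 * unmap loop runs over nr_data (not real_stripes, which also counts
 * the parity stripes) and p_page gets its own explicit unmap.
 */
static void parity_scrub_sketch(void **data_pages, int nr_data, void *p_page)
{
	for (int i = 0; i < nr_data; i++)
		(void)fake_kmap(data_pages[i]);
	(void)fake_kmap(p_page);

	/* ... parity would be computed and compared here ... */

	for (int i = 0; i < nr_data; i++)
		fake_kunmap(data_pages[i]);
	fake_kunmap(p_page); /* the unmap the original code was missing */
}
```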
      
      Test case to reproduce the bug:
      
       - create a raid5 btrfs filesystem:
         # mkfs.btrfs -m raid5 -d raid5 /dev/sdb /dev/sdc /dev/sdd /dev/sde
      
       - mount it:
         # mount /dev/sdb /mnt
      
       - run btrfs scrub in a loop:
         # while :; do btrfs scrub start -BR /mnt; done
      
      BugLink: https://bugs.launchpad.net/bugs/1812845
      Fixes: 5a6ac9ea ("Btrfs, raid56: support parity scrub on raid56")
      CC: stable@vger.kernel.org # 4.4+
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • C
      NFS: Fix nfs4_lock_state refcounting in nfs4_alloc_{lock,unlock}data() · 3028efe0
      Catalin Marinas committed
      Commit 7b587e1a ("NFS: use locks_copy_lock() to copy locks.")
      changed the lock copying from memcpy() to the dedicated
      locks_copy_lock() function. The latter correctly increments the
      nfs4_lock_state.ls_count via nfs4_fl_copy_lock(); however, this refcount
      has already been incremented in nfs4_alloc_{lock,unlock}data().
      Kmemleak subsequently reports an unreferenced nfs4_lock_state object as
      below (arm64 platform):
      
      unreferenced object 0xffff8000fce0b000 (size 256):
        comm "systemd-sysuser", pid 1608, jiffies 4294892825 (age 32.348s)
        hex dump (first 32 bytes):
          20 57 4c fb 00 80 ff ff 20 57 4c fb 00 80 ff ff   WL..... WL.....
          00 57 4c fb 00 80 ff ff 01 00 00 00 00 00 00 00  .WL.............
        backtrace:
          [<000000000d15010d>] kmem_cache_alloc+0x178/0x208
          [<00000000d7c1d264>] nfs4_set_lock_state+0x124/0x1f0
          [<000000009c867628>] nfs4_proc_lock+0x90/0x478
          [<000000001686bd74>] do_setlk+0x64/0xe8
          [<00000000e01500d4>] nfs_lock+0xe8/0x1f0
          [<000000004f387d8d>] vfs_lock_file+0x18/0x40
          [<00000000656ab79b>] do_lock_file_wait+0x68/0xf8
          [<00000000f17c4a4b>] fcntl_setlk+0x224/0x280
          [<0000000052a242c6>] do_fcntl+0x418/0x730
          [<000000004f47291a>] __arm64_sys_fcntl+0x84/0xd0
          [<00000000d6856e01>] el0_svc_common+0x80/0xf0
          [<000000009c4bd1df>] el0_svc_handler+0x2c/0x80
          [<00000000b1a0d479>] el0_svc+0x8/0xc
          [<0000000056c62a0f>] 0xffffffffffffffff
      
      This patch removes the original refcount_inc(&lsp->ls_count) that was
      paired with the memcpy() lock copying.
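      A sketch of the refcount imbalance with a hypothetical lock_state
      type: copy_lock() stands in for locks_copy_lock(), which takes its
      own reference, so the additional get left over from the memcpy()
      days produces a reference that is never dropped.

```c
#include <stdlib.h>

/* Hypothetical refcounted lock state, standing in for nfs4_lock_state. */
struct lock_state {
	int ls_count;
};

static void lock_state_get(struct lock_state *lsp) { lsp->ls_count++; }

/* Returns 1 when the last reference was dropped and the object freed. */
static int lock_state_put(struct lock_state *lsp)
{
	if (--lsp->ls_count == 0) {
		free(lsp);
		return 1;
	}
	return 0;
}

/* Copy helper that, like the real one, takes its own reference. */
static struct lock_state *copy_lock(struct lock_state *lsp)
{
	lock_state_get(lsp);
	return lsp;
}

/* Before the fix: an explicit get plus the copy helper's get gives two
 * references, but completion drops only one, so the object leaks. */
static struct lock_state *alloc_lockdata_buggy(struct lock_state *lsp)
{
	lock_state_get(lsp);
	return copy_lock(lsp);
}

/* After the fix: rely solely on the copy helper's reference. */
static struct lock_state *alloc_lockdata(struct lock_state *lsp)
{
	return copy_lock(lsp);
}
```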
      
      Fixes: 7b587e1a ("NFS: use locks_copy_lock() to copy locks.")
      Cc: <stable@vger.kernel.org> # 5.0.x-
      Cc: NeilBrown <neilb@suse.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
    • J
      block: add BIO_NO_PAGE_REF flag · 399254aa
      Jens Axboe committed
      If bio_iov_iter_get_pages() is called on an iov_iter that is flagged
      with NO_REF, then we don't need to add a page reference for the pages
      that we add.
      
      Add BIO_NO_PAGE_REF to track this in the bio, so IO completion knows
      not to drop a reference to these pages.
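      A sketch of the idea, with hypothetical page, bio and flag
      definitions (the flag bit is not the kernel's value): the bio records
      at submission time whether page references were taken, and
      completion consults that flag instead of unconditionally dropping
      references.

```c
#include <stdbool.h>

/* Hypothetical stand-ins: a page with a refcount and a bio with flags. */
struct page_sketch {
	int refcount;
};

#define FLAG_NO_PAGE_REF (1u << 0) /* illustrative bit only */

struct bio_sketch {
	unsigned int flags;
	struct page_sketch *pages[4];
	int nr_pages;
};

/* Adding pages: take a reference unless the iterator said not to. */
static void bio_add_pages(struct bio_sketch *bio, struct page_sketch **pages,
			  int n, bool iter_no_ref)
{
	if (iter_no_ref)
		bio->flags |= FLAG_NO_PAGE_REF;
	for (int i = 0; i < n && bio->nr_pages < 4; i++) {
		if (!iter_no_ref)
			pages[i]->refcount++;
		bio->pages[bio->nr_pages++] = pages[i];
	}
}

/* Completion: only drop references that were actually taken. */
static void bio_complete(struct bio_sketch *bio)
{
	if (bio->flags & FLAG_NO_PAGE_REF)
		return;
	for (int i = 0; i < bio->nr_pages; i++)
		bio->pages[i]->refcount--;
}
```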
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • J
      iov_iter: add ITER_BVEC_FLAG_NO_REF flag · 875f1d07
      Jens Axboe committed
      For ITER_BVEC, if we're holding on to kernel pages, the caller
      doesn't need to grab a reference to the bvec pages and drop that
      same reference on IO completion. This is essentially safe for any
      ITER_BVEC, but some use cases end up reusing pages and unconditionally
      dropping a page reference on completion. An example of that is
      sendfile(2), which ends up being a splice_in + splice_out on the
      pipe pages.
      
      Add a flag that tells us it's fine to not grab a page reference
      to the bvec pages, since that caller knows not to drop a reference
      when it's done with the pages.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>