1. 04 5月, 2019 1 次提交
  2. 02 5月, 2019 24 次提交
    • A
      Fix aio_poll() races · e9e47779
      Al Viro 提交于
      commit af5c72b1fc7a00aa484e90b0c4e0eeb582545634 upstream.
      
      aio_poll() has to cope with several unpleasant problems:
      	* requests that might stay around indefinitely need to
      be made visible for io_cancel(2); that must not be done to
      a request already completed, though.
      	* in cases when ->poll() has placed us on a waitqueue,
      wakeup might have happened (and request completed) before ->poll()
      returns.
      	* worse, in some early wakeup cases request might end
      up re-added into the queue later - we can't treat "woken up and
      currently not in the queue" as "it's not going to stick around
      indefinitely"
      	* ... moreover, ->poll() might have decided not to
      put it on any queues to start with, and that needs to be distinguished
      from the previous case
      	* ->poll() might have tried to put us on more than one queue.
      Only the first will succeed for aio poll, so we might end up missing
      wakeups.  OTOH, we might very well notice that only after the
      wakeup hits and request gets completed (all before ->poll() gets
      around to the second poll_wait()).  In that case it's too late to
      decide that we have an error.
      
      req->woken was an attempt to deal with that.  Unfortunately, it was
      broken.  What we need to keep track of is not that wakeup has happened -
      the thing might come back after that.  It's that async reference is
      already gone and won't come back, so we can't (and needn't) put the
      request on the list of cancellables.
      
      The easiest case is "request hadn't been put on any waitqueues"; we
      can tell by seeing NULL apt.head, and in that case there won't be
      anything async.  We should either complete the request ourselves
      (if vfs_poll() reports anything of interest) or return an error.
      
      In all other cases we get exclusion with wakeups by grabbing the
      queue lock.
      
      If request is currently on queue and we have something interesting
      from vfs_poll(), we can steal it and complete the request ourselves.
      
      If it's on queue and vfs_poll() has not reported anything interesting,
      we either put it on the cancellable list, or, if we know that it
      hadn't been put on all queues ->poll() wanted it on, we steal it and
      return an error.
      
      If it's _not_ on queue, it's either been already dealt with (in which
      case we do nothing), or there's aio_poll_complete_work() about to be
      executed.  In that case we either put it on the cancellable list,
      or, if we know it hadn't been put on all queues ->poll() wanted it on,
      simulate what cancel would've done.
      
      It's a lot more convoluted than I'd like it to be.  Single-consumer APIs
      suck, and unfortunately aio is not an exception...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9e47779
    • A
      aio: store event at final iocb_put() · aab66dfb
      Al Viro 提交于
      commit 2bb874c0d873d13bd9b9b9c6d7b7c4edab18c8b4 upstream.
      
      Instead of having aio_complete() set ->ki_res.{res,res2}, do that
      explicitly in its callers, drop the reference (as aio_complete()
      used to do) and delay the rest until the final iocb_put().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aab66dfb
    • A
      aio: keep io_event in aio_kiocb · c20202c5
      Al Viro 提交于
      commit a9339b7855094ba11a97e8822ae038135e879e79 upstream.
      
      We want to separate forming the resulting io_event from putting it
      into the ring buffer.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c20202c5
    • A
      aio: fold lookup_kiocb() into its sole caller · 592ea630
      Al Viro 提交于
      commit 833f4154ed560232120bc475935ee1d6a20e159f upstream.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      592ea630
    • L
      pin iocb through aio. · c7f2525a
      Linus Torvalds 提交于
      commit b53119f13a04879c3bf502828d99d13726639ead upstream.
      
      aio_poll() is not the only case that needs file pinned; worse, while
      aio_read()/aio_write() can live without pinning iocb itself, the
      proof is rather brittle and can easily break on later changes.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c7f2525a
    • L
      aio: simplify - and fix - fget/fput for io_submit() · d6b2615f
      Linus Torvalds 提交于
      commit 84c4e1f89fefe70554da0ab33be72c9be7994379 upstream.
      
      Al Viro root-caused a race where the IOCB_CMD_POLL handling of
      fget/fput() could cause us to access the file pointer after it had
      already been freed:
      
       "In more details - normally IOCB_CMD_POLL handling looks so:
      
         1) io_submit(2) allocates aio_kiocb instance and passes it to
            aio_poll()
      
         2) aio_poll() resolves the descriptor to struct file by req->file =
            fget(iocb->aio_fildes)
      
         3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
            aio_kiocb to 2 (bumps by 1, that is).
      
         4) aio_poll() calls vfs_poll(). After sanity checks (basically,
            "poll_wait() had been called and only once") it locks the queue.
            That's what the extra reference to iocb had been for - we know we
            can safely access it.
      
         5) With queue locked, we check if ->woken has already been set to
            true (by aio_poll_wake()) and, if it had been, we unlock the
            queue, drop a reference to aio_kiocb and bugger off - at that
            point it's a responsibility to aio_poll_wake() and the stuff
            called/scheduled by it. That code will drop the reference to file
            in req->file, along with the other reference to our aio_kiocb.
      
         6) otherwise, we see whether we need to wait. If we do, we unlock the
            queue, drop one reference to aio_kiocb and go away - eventual
            wakeup (or cancel) will deal with the reference to file and with
            the other reference to aio_kiocb
      
         7) otherwise we remove ourselves from waitqueue (still under the
            queue lock), so that wakeup won't get us. No async activity will
            be happening, so we can safely drop req->file and iocb ourselves.
      
        If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
        won't get freed under us, so we can do all the checks and locking
        safely. And we don't touch ->file if we detect that case.
      
        However, vfs_poll() most certainly *does* touch the file it had been
        given. So wakeup coming while we are still in ->poll() might end up
        doing fput() on that file. That case is not too rare, and usually we
        are saved by the still present reference from descriptor table - that
        fput() is not the final one.
      
        But if another thread closes that descriptor right after our fget()
        and wakeup does happen before ->poll() returns, we are in trouble -
        final fput() done while we are in the middle of a method:
      
      Al also wrote a patch to take an extra reference to the file descriptor
      to fix this, but I instead suggested we just streamline the whole file
      pointer handling by submit_io() so that the generic aio submission code
      simply keeps the file pointer around until the aio has completed.
      
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6b2615f
    • M
      aio: initialize kiocb private in case any filesystems expect it. · 2afa01cd
      Mike Marshall 提交于
      commit ec51f8ee1e63498e9f521ec0e5a6d04622bb2c67 upstream.
      
      A recent optimization had left private uninitialized.
      
      Fixes: 2bc4ca9bb600 ("aio: don't zero entire aio_kiocb aio_get_req()")
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Marshall <hubcap@omnibond.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2afa01cd
    • J
      aio: abstract out io_event filler helper · a812f7b6
      Jens Axboe 提交于
      commit 875736bb3f3ded168469f6a14df7a938416a99d5 upstream.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a812f7b6
    • J
      aio: split out iocb copy from io_submit_one() · d384f8b8
      Jens Axboe 提交于
      commit 88a6f18b950e2e4dce57d31daa151105f4f3dcff upstream.
      
      In preparation of handing in iocbs in a different fashion as well. Also
      make it clear that the iocb being passed in isn't modified, by marking
      it const throughout.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d384f8b8
    • J
      aio: use iocb_put() instead of open coding it · 4d677689
      Jens Axboe 提交于
      commit 71ebc6fef0f53459f37fb39e1466792232fa52ee upstream.
      
      Replace the percpu_ref_put() + kmem_cache_free() with a call to
      iocb_put() instead.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d677689
    • J
      aio: don't zero entire aio_kiocb aio_get_req() · ef529eea
      Jens Axboe 提交于
      commit 2bc4ca9bb600cbe36941da2b2a67189fc4302a04 upstream.
      
      It's 192 bytes, fairly substantial. Most items don't need to be cleared,
      especially not upfront. Clear the ones we do need to clear, and leave
      the other ones for setup when the iocb is prepared and submitted.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef529eea
    • C
      aio: separate out ring reservation from req allocation · 730198c8
      Christoph Hellwig 提交于
      commit 432c79978c33ecef91b1b04cea6936c20810da29 upstream.
      
      This is in preparation for certain types of IO not needing a ring
      reserveration.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      730198c8
    • J
      aio: use assigned completion handler · b3373253
      Jens Axboe 提交于
      commit bc9bff61624ac33b7c95861abea1af24ee7a94fc upstream.
      
      We know this is a read/write request, but in preparation for
      having different kinds of those, ensure that we call the assigned
      handler instead of assuming it's aio_complete_rq().
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3373253
    • C
      aio: clear IOCB_HIPRI · 9101cbe7
      Christoph Hellwig 提交于
      commit 154989e45fd8de9bfb52bbd6e5ea763e437e54c5 upstream.
      
      No one is going to poll for aio (yet), so we must clear the HIPRI
      flag, as we would otherwise send it down the poll queues, where no
      one will be polling for completions.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      
      IOCB_HIPRI, not RWF_HIPRI.
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9101cbe7
    • T
      NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family. · 94ad68a6
      Tetsuo Handa 提交于
      commit 7c2bd9a39845bfb6d72ddb55ce737650271f6f96 upstream.
      
      syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This
      is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family
      (which is embedded into user-visible "struct nfs_mount_data" structure)
      despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6)
      bytes of AF_INET6 address to rpc_sockaddr2uaddr().
      
      Since "struct nfs_mount_data" structure is user-visible, we can't change
      "struct nfs_mount_data" to use "struct sockaddr_storage". Therefore,
      assuming that everybody is using AF_INET family when passing address via
      "struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.
      
      [1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75cReported-by: Nsyzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com>
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94ad68a6
    • Y
      fs/proc/proc_sysctl.c: Fix a NULL pointer dereference · 4d476a00
      YueHaibing 提交于
      commit 89189557b47b35683a27c80ee78aef18248eefb4 upstream.
      
      Syzkaller report this:
      
        sysctl could not get directory: /net//bridge -12
        kasan: CONFIG_KASAN_INLINE enabled
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        general protection fault: 0000 [#1] SMP KASAN PTI
        CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G         C        5.1.0-rc3+ #8
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
        RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline]
        RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline]
        RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline]
        RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459
        Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48
        RSP: 0018:ffff8881bb507778 EFLAGS: 00010206
        RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a
        RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568
        RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4
        R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558
        R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
        FS:  00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
         erase_entry fs/proc/proc_sysctl.c:178 [inline]
         erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207
         start_unregistering fs/proc/proc_sysctl.c:331 [inline]
         drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631
         get_subdir fs/proc/proc_sysctl.c:1022 [inline]
         __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
         br_netfilter_init+0x68/0x1000 [br_netfilter]
         do_one_initcall+0xbc/0x47d init/main.c:901
         do_init_module+0x1b5/0x547 kernel/module.c:3456
         load_module+0x6405/0x8c10 kernel/module.c:3804
         __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
         do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle
         iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter]
        Dumping ftrace buffer:
           (ftrace buffer empty)
        ---[ end trace 68741688d5fbfe85 ]---
      
      commit 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer
      dereference in put_links") forgot to handle start_unregistering() case,
      while header->parent is NULL, it calls erase_header() and as seen in the
      above syzkaller call trace, accessing &header->parent->root will trigger
      a NULL pointer dereference.
      
      As that commit explained, there is also no need to call
      start_unregistering() if header->parent is NULL.
      
      Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com
      Fixes: 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links")
      Fixes: 0e47c99d ("sysctl: Replace root_list with links between sysctl_table_sets")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d476a00
    • T
      nfsd: Don't release the callback slot unless it was actually held · b4d4b5e4
      Trond Myklebust 提交于
      commit e6abc8caa6deb14be2a206253f7e1c5e37e9515b upstream.
      
      If there are multiple callbacks queued, waiting for the callback
      slot when the callback gets shut down, then they all currently
      end up acting as if they hold the slot, and call
      nfsd4_cb_sequence_done() resulting in interesting side-effects.
      
      In addition, the 'retry_nowait' path in nfsd4_cb_sequence_done()
      causes a loop back to nfsd4_cb_prepare() without first freeing the
      slot, which causes a deadlock when nfsd41_cb_get_slot() gets called
      a second time.
      
      This patch therefore adds a boolean to track whether or not the
      callback did pick up the slot, so that it can do the right thing
      in these 2 cases.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b4d4b5e4
    • Y
      ceph: fix ci->i_head_snapc leak · 950eec81
      Yan, Zheng 提交于
      commit 37659182bff1eeaaeadcfc8f853c6d2b6dbc3f47 upstream.
      
      We missed two places that i_wrbuffer_ref_head, i_wr_ref, i_dirty_caps
      and i_flushing_caps may change. When they are all zeros, we should free
      i_head_snapc.
      
      Cc: stable@vger.kernel.org
      Link: https://tracker.ceph.com/issues/38224Reported-and-tested-by: NLuis Henriques <lhenriques@suse.com>
      Signed-off-by: N"Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      950eec81
    • J
      ceph: ensure d_name stability in ceph_dentry_hash() · 246d2bf3
      Jeff Layton 提交于
      commit 76a495d666e5043ffc315695f8241f5e94a98849 upstream.
      
      Take the d_lock here to ensure that d_name doesn't change.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      246d2bf3
    • J
      ceph: only use d_name directly when parent is locked · 8d693ef0
      Jeff Layton 提交于
      commit 1bcb344086f3ecf8d6705f6d708441baa823beb3 upstream.
      
      Ben reported tripping the BUG_ON in create_request_message during some
      performance testing. Analysis of the vmcore showed that the length of
      the r_dentry->d_name string changed after we allocated the buffer, but
      before we encoded it.
      
      build_dentry_path returns pointers to d_name in the common case of
      non-snapped dentries, but this optimization isn't safe unless the parent
      directory is locked. When it isn't, have the code make a copy of the
      d_name while holding the d_lock.
      
      Cc: stable@vger.kernel.org
      Reported-by: NBen England <bengland@redhat.com>
      Signed-off-by: NJeff Layton <jlayton@kernel.org>
      Reviewed-by: N"Yan, Zheng" <zyan@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8d693ef0
    • J
      tracing: Fix buffer_ref pipe ops · cffeb9c8
      Jann Horn 提交于
      commit b987222654f84f7b4ca95b3a55eca784cb30235b upstream.
      
      This fixes multiple issues in buffer_pipe_buf_ops:
      
       - The ->steal() handler must not return zero unless the pipe buffer has
         the only reference to the page. But generic_pipe_buf_steal() assumes
         that every reference to the pipe is tracked by the page's refcount,
         which isn't true for these buffers - buffer_pipe_buf_get(), which
         duplicates a buffer, doesn't touch the page's refcount.
         Fix it by using generic_pipe_buf_nosteal(), which refuses every
         attempted theft. It should be easy to actually support ->steal, but the
         only current users of pipe_buf_steal() are the virtio console and FUSE,
         and they also only use it as an optimization. So it's probably not worth
         the effort.
       - The ->get() and ->release() handlers can be invoked concurrently on pipe
         buffers backed by the same struct buffer_ref. Make them safe against
         concurrency by using refcount_t.
       - The pointers stored in ->private were only zeroed out when the last
         reference to the buffer_ref was dropped. As far as I know, this
         shouldn't be necessary anyway, but if we do it, let's always do it.
      
      Link: http://lkml.kernel.org/r/20190404215925.253531-1-jannh@google.com
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Fixes: 73a757e6 ("ring-buffer: Return reader page back into existing ring buffer")
      Signed-off-by: NJann Horn <jannh@google.com>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      cffeb9c8
    • F
      cifs: do not attempt cifs operation on smb2+ rename error · ee231063
      Frank Sorenson 提交于
      commit 652727bbe1b17993636346716ae5867627793647 upstream.
      
      A path-based rename returning EBUSY will incorrectly try opening
      the file with a cifs (NT Create AndX) operation on an smb2+ mount,
      which causes the server to force a session close.
      
      If the mount is smb2+, skip the fallback.
      Signed-off-by: NFrank Sorenson <sorenson@redhat.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Stable <stable@vger.kernel.org>
      Reviewed-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee231063
    • R
      cifs: fix memory leak in SMB2_read · d5bf783a
      Ronnie Sahlberg 提交于
      commit 05fd5c2c61732152a6bddc318aae62d7e436629b upstream.
      
      Commit 088aaf17aa79300cab14dbee2569c58cfafd7d6e introduced a leak where
      if SMB2_read() returned an error we would return without freeing the
      request buffer.
      
      Cc: Stable <stable@vger.kernel.org>
      Signed-off-by: NRonnie Sahlberg <lsahlber@redhat.com>
      Reviewed-by: NPavel Shilovsky <pshilov@microsoft.com>
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d5bf783a
    • D
      ext4: fix some error pointer dereferences · 8766cc7d
      Dan Carpenter 提交于
      [ Upstream commit 7159a986b4202343f6cca3bb8079ecace5816fd6 ]
      
      We can't pass error pointers to brelse().
      
      Fixes: fb265c9cb49e ("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases")
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8766cc7d
  3. 27 4月, 2019 5 次提交
  4. 20 4月, 2019 10 次提交
    • C
      f2fs: fix to dirty inode for i_mode recovery · 48b0309f
      Chao Yu 提交于
      [ Upstream commit ca597bddedd94906cd761d8be6a3ad21292725de ]
      
      As Seulbae Kim reported in bugzilla:
      
      https://bugzilla.kernel.org/show_bug.cgi?id=202637
      
      We didn't recover permission field correctly after sudden power-cut,
      the reason is in setattr we didn't add inode into global dirty list
      once i_mode is changed, so latter checkpoint triggered by fsync will
      not flush last i_mode into disk, result in this problem, fix it.
      Reported-by: NSeulbae Kim <seulbae@gatech.edu>
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      48b0309f
    • S
      cifs: fallback to older infolevels on findfirst queryinfo retry · e9603cff
      Steve French 提交于
      [ Upstream commit 3b7960caceafdfc2cdfe2850487f8d091eb41144 ]
      
      In cases where queryinfo fails, we have cases in cifs (vers=1.0)
      where with backupuid mounts we retry the query info with findfirst.
      This doesn't work to some NetApp servers which don't support
      WindowsXP (and later) infolevel 261 (SMB_FIND_FILE_ID_FULL_DIR_INFO)
      so in this case use other info levels (in this case it will usually
      be level 257, SMB_FIND_FILE_DIRECTORY_INFO).
      
      (Also fixes some indentation)
      
      See kernel bugzilla 201435
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      e9603cff
    • S
      f2fs: cleanup dirty pages if recover failed · 8722566b
      Sheng Yong 提交于
      [ Upstream commit 26b5a079197c8cb6725565968b7fd3299bd1877b ]
      
      During recover, we will try to create new dentries for inodes with
      dentry_mark. But if the parent is missing (e.g. killed by fsck),
      recover will break. But those recovered dirty pages are not cleanup.
      This will hit f2fs_bug_on:
      
      [   53.519566] F2FS-fs (loop0): Found nat_bits in checkpoint
      [   53.539354] F2FS-fs (loop0): recover_inode: ino = 5, name = file, inline = 3
      [   53.539402] F2FS-fs (loop0): recover_dentry: ino = 5, name = file, dir = 0, err = -2
      [   53.545760] F2FS-fs (loop0): Cannot recover all fsync data errno=-2
      [   53.546105] F2FS-fs (loop0): access invalid blkaddr:4294967295
      [   53.546171] WARNING: CPU: 1 PID: 1798 at fs/f2fs/checkpoint.c:163 f2fs_is_valid_blkaddr+0x26c/0x320
      [   53.546174] Modules linked in:
      [   53.546183] CPU: 1 PID: 1798 Comm: mount Not tainted 4.19.0-rc2+ #1
      [   53.546186] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      [   53.546191] RIP: 0010:f2fs_is_valid_blkaddr+0x26c/0x320
      [   53.546195] Code: 85 bb 00 00 00 48 89 df 88 44 24 07 e8 ad a8 db ff 48 8b 3b 44 89 e1 48 c7 c2 40 03 72 a9 48 c7 c6 e0 01 72 a9 e8 84 3c ff ff <0f> 0b 0f b6 44 24 07 e9 8a 00 00 00 48 8d bf 38 01 00 00 e8 7c a8
      [   53.546201] RSP: 0018:ffff88006c067768 EFLAGS: 00010282
      [   53.546208] RAX: 0000000000000000 RBX: ffff880068844200 RCX: ffffffffa83e1a33
      [   53.546211] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88006d51e590
      [   53.546215] RBP: 0000000000000005 R08: ffffed000daa3cb3 R09: ffffed000daa3cb3
      [   53.546218] R10: 0000000000000001 R11: ffffed000daa3cb2 R12: 00000000ffffffff
      [   53.546221] R13: ffff88006a1f8000 R14: 0000000000000200 R15: 0000000000000009
      [   53.546226] FS:  00007fb2f3646840(0000) GS:ffff88006d500000(0000) knlGS:0000000000000000
      [   53.546229] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   53.546234] CR2: 00007f0fd77f0008 CR3: 00000000687e6002 CR4: 00000000000206e0
      [   53.546237] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   53.546240] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [   53.546242] Call Trace:
      [   53.546248]  f2fs_submit_page_bio+0x95/0x740
      [   53.546253]  read_node_page+0x161/0x1e0
      [   53.546271]  ? truncate_node+0x650/0x650
      [   53.546283]  ? add_to_page_cache_lru+0x12c/0x170
      [   53.546288]  ? pagecache_get_page+0x262/0x2d0
      [   53.546292]  __get_node_page+0x200/0x660
      [   53.546302]  f2fs_update_inode_page+0x4a/0x160
      [   53.546306]  f2fs_write_inode+0x86/0xb0
      [   53.546317]  __writeback_single_inode+0x49c/0x620
      [   53.546322]  writeback_single_inode+0xe4/0x1e0
      [   53.546326]  sync_inode_metadata+0x93/0xd0
      [   53.546330]  ? sync_inode+0x10/0x10
      [   53.546342]  ? do_raw_spin_unlock+0xed/0x100
      [   53.546347]  f2fs_sync_inode_meta+0xe0/0x130
      [   53.546351]  f2fs_fill_super+0x287d/0x2d10
      [   53.546367]  ? vsnprintf+0x742/0x7a0
      [   53.546372]  ? f2fs_commit_super+0x180/0x180
      [   53.546379]  ? up_write+0x20/0x40
      [   53.546385]  ? set_blocksize+0x5f/0x140
      [   53.546391]  ? f2fs_commit_super+0x180/0x180
      [   53.546402]  mount_bdev+0x181/0x200
      [   53.546406]  mount_fs+0x94/0x180
      [   53.546411]  vfs_kern_mount+0x6c/0x1e0
      [   53.546415]  do_mount+0xe5e/0x1510
      [   53.546420]  ? fs_reclaim_release+0x9/0x30
      [   53.546424]  ? copy_mount_string+0x20/0x20
      [   53.546428]  ? fs_reclaim_acquire+0xd/0x30
      [   53.546435]  ? __might_sleep+0x2c/0xc0
      [   53.546440]  ? ___might_sleep+0x53/0x170
      [   53.546453]  ? __might_fault+0x4c/0x60
      [   53.546468]  ? _copy_from_user+0x95/0xa0
      [   53.546474]  ? memdup_user+0x39/0x60
      [   53.546478]  ksys_mount+0x88/0xb0
      [   53.546482]  __x64_sys_mount+0x5d/0x70
      [   53.546495]  do_syscall_64+0x65/0x130
      [   53.546503]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
      [   53.547639] ---[ end trace b804d1ea2fec893e ]---
      
      So if recover fails, we need to drop all recovered data.
      Signed-off-by: NSheng Yong <shengyong1@huawei.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      8722566b
    • C
      f2fs: fix to do sanity check with current segment number · 14b18321
      Chao Yu 提交于
      [ Upstream commit 042be0f849e5fc24116d0afecfaf926eed5cac63 ]
      
      https://bugzilla.kernel.org/show_bug.cgi?id=200219
      
      Reproduction way:
      - mount image
      - run poc code
      - umount image
      
      F2FS-fs (loop1): Bitmap was wrongly set, blk:15364
      ------------[ cut here ]------------
      kernel BUG at /home/yuchao/git/devf2fs/segment.c:2061!
      invalid opcode: 0000 [#1] PREEMPT SMP
      CPU: 2 PID: 17686 Comm: umount Tainted: G        W  O      4.18.0-rc2+ #39
      Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
      EIP: update_sit_entry+0x459/0x4e0 [f2fs]
      Code: e8 1c b5 fd ff 0f 0b 0f 0b 8b 45 e4 c7 44 24 08 9c 7a 6c f8 c7 44 24 04 bc 4a 6c f8 89 44 24 0c 8b 06 89 04 24 e8 f7 b4 fd ff <0f> 0b 8b 45 e4 0f b6 d2 89 54 24 10 c7 44 24 08 60 7a 6c f8 c7 44
      EAX: 00000032 EBX: 000000f8 ECX: 00000002 EDX: 00000001
      ESI: d7177000 EDI: f520fe68 EBP: d6477c6c ESP: d6477c34
      DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00010282
      CR0: 80050033 CR2: b7fbe000 CR3: 2a99b3c0 CR4: 000406f0
      Call Trace:
       f2fs_allocate_data_block+0x124/0x580 [f2fs]
       do_write_page+0x78/0x150 [f2fs]
       f2fs_do_write_node_page+0x25/0xa0 [f2fs]
       __write_node_page+0x2bf/0x550 [f2fs]
       f2fs_sync_node_pages+0x60e/0x6d0 [f2fs]
       ? sync_inode_metadata+0x2f/0x40
       ? f2fs_write_checkpoint+0x28f/0x7d0 [f2fs]
       ? up_write+0x1e/0x80
       f2fs_write_checkpoint+0x2a9/0x7d0 [f2fs]
       ? mark_held_locks+0x5d/0x80
       ? _raw_spin_unlock_irq+0x27/0x50
       kill_f2fs_super+0x68/0x90 [f2fs]
       deactivate_locked_super+0x3d/0x70
       deactivate_super+0x40/0x60
       cleanup_mnt+0x39/0x70
       __cleanup_mnt+0x10/0x20
       task_work_run+0x81/0xa0
       exit_to_usermode_loop+0x59/0xa7
       do_fast_syscall_32+0x1f5/0x22c
       entry_SYSENTER_32+0x53/0x86
      EIP: 0xb7f95c51
      Code: c1 1e f7 ff ff 89 e5 8b 55 08 85 d2 8b 81 64 cd ff ff 74 02 89 02 5d c3 8b 0c 24 c3 8b 1c 24 c3 90 51 52 55 89 e5 0f 34 cd 80 <5d> 5a 59 c3 90 90 90 90 8d 76 00 58 b8 77 00 00 00 cd 80 90 8d 76
      EAX: 00000000 EBX: 0871ab90 ECX: bfb2cd00 EDX: 00000000
      ESI: 00000000 EDI: 0871ab90 EBP: 0871ab90 ESP: bfb2cd7c
      DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000246
      Modules linked in: f2fs(O) crc32_generic bnep rfcomm bluetooth ecdh_generic snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq pcbc joydev aesni_intel snd_seq_device aes_i586 snd_timer crypto_simd snd cryptd soundcore mac_hid serio_raw video i2c_piix4 parport_pc ppdev lp parport hid_generic psmouse usbhid hid e1000 [last unloaded: f2fs]
      ---[ end trace d423f83982cfcdc5 ]---
      
      The reason is, different log headers using the same segment, once
      one log's next block address is used by another log, it will cause
      panic as above.
      
      Main area: 24 segs, 24 secs 24 zones
        - COLD  data: 0, 0, 0
        - WARM  data: 1, 1, 1
        - HOT   data: 20, 20, 20
        - Dir   dnode: 22, 22, 22
        - File   dnode: 22, 22, 22
        - Indir nodes: 21, 21, 21
      
      So this patch adds sanity check to detect such condition to avoid
      this issue.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      14b18321
    • D
      9p locks: add mount option for lock retry interval · 4369f8a3
      Dinu-Razvan Chis-Serban 提交于
      [ Upstream commit 5e172f75e51e3de1b4274146d9b990f803cb5c2a ]
      
      The default P9_LOCK_TIMEOUT can be too long for some users exporting
      a local file system to a guest VM (30s), make this configurable at
      mount time.
      
      Link: http://lkml.kernel.org/r/1536295827-3181-1-git-send-email-asmadeus@codewreck.org
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195727Signed-off-by: NDinu-Razvan Chis-Serban <justcsdr@gmail.com>
      Signed-off-by: NDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      4369f8a3
    • G
      9p: do not trust pdu content for stat item size · db77c789
      Gertjan Halkes 提交于
      [ Upstream commit 2803cf4379ed252894f046cb8812a48db35294e3 ]
      
      v9fs_dir_readdir() could deadloop if a struct was sent with a size set
      to -2
      
      Link: http://lkml.kernel.org/r/1536134432-11997-1-git-send-email-asmadeus@codewreck.org
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=88021Signed-off-by: NGertjan Halkes <gertjan@google.com>
      Signed-off-by: NDominique Martinet <dominique.martinet@cea.fr>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      db77c789
    • C
      f2fs: fix to avoid NULL pointer dereference on se->discard_map · f9368366
      Chao Yu 提交于
      [ Upstream commit 7d20c8abb2edcf962ca857d51f4d0f9cd4b19053 ]
      
      https://bugzilla.kernel.org/show_bug.cgi?id=200951
      
      These is a NULL pointer dereference issue reported in bugzilla:
      
      Hi,
      in the setup there is a SATA SSD connected to a SATA-to-USB bridge.
      
      The disc is "Samsung SSD 850 PRO 256G" which supports TRIM.
      There are four partitions:
       sda1: FAT  /boot
       sda2: F2FS /
       sda3: F2FS /home
       sda4: F2FS
      
      The bridge is ASMT1153e which uses the "uas" driver.
      There is no TRIM pass-through, so, when mounting it reports:
       mounting with "discard" option, but the device does not support discard
      
      The USB host is USB3.0 and UASP capable. It is the one on RK3399.
      
      Given this everything works fine, except there is no TRIM support.
      
      In order to enable TRIM a new UDEV rule is added [1]:
       /etc/udev/rules.d/10-sata-bridge-trim.rules:
       ACTION=="add|change", ATTRS{idVendor}=="174c", ATTRS{idProduct}=="55aa", SUBSYSTEM=="scsi_disk", ATTR{provisioning_mode}="unmap"
      After reboot any F2FS write hangs forever and dmesg reports:
       Unable to handle kernel NULL pointer dereference
      
      Also tested on a x86_64 system: works fine even with TRIM enabled.
       same disc
       same bridge
       different usb host controller
       different cpu architecture
       not root filesystem
      
      Regards,
        Vicenç.
      
      [1] Post #5 in https://bbs.archlinux.org/viewtopic.php?id=236280
      
       Unable to handle kernel NULL pointer dereference at virtual address 000000000000003e
       Mem abort info:
         ESR = 0x96000004
         Exception class = DABT (current EL), IL = 32 bits
         SET = 0, FnV = 0
         EA = 0, S1PTW = 0
       Data abort info:
         ISV = 0, ISS = 0x00000004
         CM = 0, WnR = 0
       user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000626e3122
       [000000000000003e] pgd=0000000000000000
       Internal error: Oops: 96000004 [#1] SMP
       Modules linked in: overlay snd_soc_hdmi_codec rc_cec dw_hdmi_i2s_audio dw_hdmi_cec snd_soc_simple_card snd_soc_simple_card_utils snd_soc_rockchip_i2s rockchip_rga snd_soc_rockchip_pcm rockchipdrm videobuf2_dma_sg v4l2_mem2mem rtc_rk808 videobuf2_memops analogix_dp videobuf2_v4l2 videobuf2_common dw_hdmi dw_wdt cec rc_core videodev drm_kms_helper media drm rockchip_thermal rockchip_saradc realtek drm_panel_orientation_quirks syscopyarea sysfillrect sysimgblt fb_sys_fops dwmac_rk stmmac_platform stmmac pwm_bl squashfs loop crypto_user gpio_keys hid_kensington
       CPU: 5 PID: 957 Comm: nvim Not tainted 4.19.0-rc1-1-ARCH #1
       Hardware name: Sapphire-RK3399 Board (DT)
       pstate: 00000005 (nzcv daif -PAN -UAO)
       pc : update_sit_entry+0x304/0x4b0
       lr : update_sit_entry+0x108/0x4b0
       sp : ffff00000ca13bd0
       x29: ffff00000ca13bd0 x28: 000000000000003e
       x27: 0000000000000020 x26: 0000000000080000
       x25: 0000000000000048 x24: ffff8000ebb85cf8
       x23: 0000000000000253 x22: 00000000ffffffff
       x21: 00000000000535f2 x20: 00000000ffffffdf
       x19: ffff8000eb9e6800 x18: ffff8000eb9e6be8
       x17: 0000000007ce6926 x16: 000000001c83ffa8
       x15: 0000000000000000 x14: ffff8000f602df90
       x13: 0000000000000006 x12: 0000000000000040
       x11: 0000000000000228 x10: 0000000000000000
       x9 : 0000000000000000 x8 : 0000000000000000
       x7 : 00000000000535f2 x6 : ffff8000ebff3440
       x5 : ffff8000ebff3440 x4 : ffff8000ebe3a6c8
       x3 : 00000000ffffffff x2 : 0000000000000020
       x1 : 0000000000000000 x0 : ffff8000eb9e5800
       Process nvim (pid: 957, stack limit = 0x0000000063a78320)
       Call trace:
        update_sit_entry+0x304/0x4b0
        f2fs_invalidate_blocks+0x98/0x140
        truncate_node+0x90/0x400
        f2fs_remove_inode_page+0xe8/0x340
        f2fs_evict_inode+0x2b0/0x408
        evict+0xe0/0x1e0
        iput+0x160/0x260
        do_unlinkat+0x214/0x298
        __arm64_sys_unlinkat+0x3c/0x68
        el0_svc_handler+0x94/0x118
        el0_svc+0x8/0xc
       Code: f9400800 b9488400 36080140 f9400f01 (387c4820)
       ---[ end trace a0f21a307118c477 ]---
      
      The reason is it is possible to enable discard flag on block queue via
      UDEV, but during mount, f2fs will initialize se->discard_map only if
      this flag is set, once the flag is set after mount, f2fs may dereference
      NULL pointer on se->discard_map.
      
      So this patch does below changes to fix this issue:
      - initialize and update se->discard_map all the time.
      - don't clear DISCARD option if device has no QUEUE_FLAG_DISCARD flag
      during mount.
      - don't issue small discard on zoned block device.
      - introduce some functions to enhance the readability.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Tested-by: NVicente Bergas <vicencb@gmail.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      f9368366
    • D
      ext4: prohibit fstrim in norecovery mode · 6fd66bec
      Darrick J. Wong 提交于
      [ Upstream commit 18915b5873f07e5030e6fb108a050fa7c71c59fb ]
      
      The ext4 fstrim implementation uses the block bitmaps to find free space
      that can be discarded.  If we haven't replayed the journal, the bitmaps
      will be stale and we absolutely *cannot* use stale metadata to zap the
      underlying storage.
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NTheodore Ts'o <tytso@mit.edu>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      6fd66bec
    • K
      x86/gart: Exclude GART aperture from kcore · 83e3e89d
      Kairui Song 提交于
      [ Upstream commit ffc8599aa9763f39f6736a79da4d1575e7006f9a ]
      
      On machines where the GART aperture is mapped over physical RAM,
      /proc/kcore contains the GART aperture range. Accessing the GART range via
      /proc/kcore results in a kernel crash.
      
      vmcore used to have the same issue, until it was fixed with commit
      2a3e83c6 ("x86/gart: Exclude GART aperture from vmcore")', leveraging
      existing hook infrastructure in vmcore to let /proc/vmcore return zeroes
      when attempting to read the aperture region, and so it won't read from the
      actual memory.
      
      Apply the same workaround for kcore. First implement the same hook
      infrastructure for kcore, then reuse the hook functions introduced in the
      previous vmcore fix. Just with some minor adjustment, rename some functions
      for more general usage, and simplify the hook infrastructure a bit as there
      is no module usage yet.
      Suggested-by: NBaoquan He <bhe@redhat.com>
      Signed-off-by: NKairui Song <kasong@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Reviewed-by: NJiri Bohac <jbohac@suse.cz>
      Acked-by: NBaoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Omar Sandoval <osandov@fb.com>
      Cc: Dave Young <dyoung@redhat.com>
      Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.comSigned-off-by: NSasha Levin <sashal@kernel.org>
      83e3e89d
    • S
      fix incorrect error code mapping for OBJECTID_NOT_FOUND · 40276e4e
      Steve French 提交于
      [ Upstream commit 85f9987b236cf46e06ffdb5c225cf1f3c0acb789 ]
      
      It was mapped to EIO which can be confusing when user space
      queries for an object GUID for an object for which the server
      file system doesn't support (or hasn't saved one).
      
      As Amir Goldstein suggested this is similar to ENOATTR
      (equivalently ENODATA in Linux errno definitions) so
      changing NT STATUS code mapping for OBJECTID_NOT_FOUND
      to ENODATA.
      Signed-off-by: NSteve French <stfrench@microsoft.com>
      CC: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      40276e4e