1. 07 Jul, 2020 4 commits
    • xfs: fix reflink quota reservation accounting error · 83895227
      Darrick J. Wong committed
      Quota reservations are supposed to account for the blocks that might be
      allocated due to a bmap btree split.  Reflink doesn't do this, so fix
      this to make the quota accounting more accurate before we start
      rearranging things.
      
      Fixes: 862bb360 ("xfs: reflink extents from one file to another")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • xfs: don't eat an EIO/ENOSPC writeback error when scrubbing data fork · eb0efe50
      Darrick J. Wong committed
      The data fork scrubber calls filemap_write_and_wait to flush dirty pages
      and delalloc reservations out to disk prior to checking the data fork's
      extent mappings.  Unfortunately, this means that scrub can consume the
      EIO/ENOSPC errors that would otherwise have stayed around in the address
      space until (we hope) the writer application calls fsync to persist data
      and collect errors.  The end result is that programs that wrote to a
      file might never see the error code and proceed as if nothing were
      wrong.
      
      xfs_scrub is not in a position to notify file writers about the
      writeback failure, and it's only here to check metadata, not file
      contents.  Therefore, if writeback fails, we should stuff the error code
      back into the address space so that an fsync by the writer application
      can pick that up.
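
For illustration only (this is not the patch itself), the idea can be sketched in user space. The sketch assumes a much simplified model in which an address space holds one sticky error that a flush reports and clears; every `toy_*` name is hypothetical. In the kernel, the helper that records an error on an address space is mapping_set_error(), and the real error tracking is more involved than this.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for an address space's pending writeback error. */
struct toy_mapping {
    int wb_err;   /* 0, or a negative errno left behind by failed writeback */
};

/* Models a flush that reports the pending error and clears it. */
static int toy_write_and_wait(struct toy_mapping *m)
{
    int err = m->wb_err;

    m->wb_err = 0;
    return err;
}

/* Models recording an error on the mapping so a later caller can see it. */
static void toy_set_error(struct toy_mapping *m, int err)
{
    assert(err <= 0);   /* errors are 0 or negative errno values */
    if (err)
        m->wb_err = err;
}

/* Scrub-side flush: observe the error, then put it back for the writer. */
static int toy_scrub_flush(struct toy_mapping *m)
{
    int err = toy_write_and_wait(m);

    toy_set_error(m, err);
    return err;
}

/* Writer-side fsync: collects the error and clears it. */
static int toy_fsync(struct toy_mapping *m)
{
    return toy_write_and_wait(m);
}
```

With a mapping whose `wb_err` is `-EIO`, `toy_scrub_flush()` returns `-EIO` and a following `toy_fsync()` still returns `-EIO`; without the `toy_set_error()` call the writer's fsync would see 0.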
      
      Fixes: 99d9d8d0 ("xfs: scrub inode block mappings")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Dave Chinner <dchinner@redhat.com>
    • xfs: preserve rmapbt swapext block reservation from freed blocks · f74681ba
      Brian Foster committed
      The rmapbt extent swap algorithm remaps individual extents between
      the source inode and the target to trigger reverse mapping metadata
      updates. If either inode straddles a format or other bmap allocation
      boundary, the individual unmap and map cycles can trigger repeated
      bmap block allocations and frees as the extent count bounces back
      and forth across the boundary. While net block usage is bound across
      the swap operation, this behavior can prematurely exhaust the
      transaction block reservation because it continuously drains as the
      transaction rolls. Each allocation accounts against the reservation
      and each free returns to global free space on transaction roll.
      
      The previous workaround to this problem attempted to detect this
      boundary condition and provide surplus block reservation to
      accommodate it. This is insufficient because more remaps can occur
      than the extent counts imply, for example when the start offset
      boundaries are not aligned between the two inodes.
      
      To address this problem more generically and dynamically, add a
      transaction accounting mode that returns freed blocks to the
      transaction reservation instead of the superblock counters on
      transaction roll and use it when the rmapbt based algorithm is
      active. This allows the chain of remap transactions to preserve the
      block reservation based on its own frees and prevent premature
      exhaustion regardless of the remap pattern. Note that this is only
      safe for superblocks with lazy sb accounting, but the latter is
      required for v5 supers and the rmap feature depends on v5.
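
As a rough illustration of the accounting difference (a toy model, not the XFS implementation), the sketch below assumes each remap cycle allocates one bmap block, frees one, and rolls the transaction. The `toy_*` names and the `keep_frees` flag are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy transaction chain; all fields are illustrative. */
struct toy_tx {
    long resv;        /* blocks still reserved to the transaction chain */
    long freed;       /* blocks freed since the last roll */
    bool keep_frees;  /* hypothetical mode: frees go back to the reservation */
};

/* Stand-in for the global (superblock) free space counter. */
static long toy_global_free;

/* Each allocation is charged against the reservation. */
static bool toy_alloc(struct toy_tx *tx)
{
    if (tx->resv == 0)
        return false;   /* reservation exhausted */
    tx->resv--;
    return true;
}

static void toy_free(struct toy_tx *tx)
{
    tx->freed++;
}

/* On roll, freed blocks go either to the reservation or to global space. */
static void toy_roll(struct toy_tx *tx)
{
    assert(tx->freed >= 0);
    if (tx->keep_frees)
        tx->resv += tx->freed;
    else
        toy_global_free += tx->freed;
    tx->freed = 0;
}

/* Bounce across the boundary; returns how many cycles completed. */
static int toy_bounce(struct toy_tx *tx, int cycles)
{
    int done = 0;

    for (int i = 0; i < cycles; i++) {
        if (!toy_alloc(tx))
            break;
        toy_free(tx);
        toy_roll(tx);
        done++;
    }
    return done;
}
```

Starting from a reservation of 2 blocks, `toy_bounce()` stops after 2 cycles when frees go to the global counter, but completes any number of cycles when `keep_frees` is set, even though net block usage is zero in both cases.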
      
      Fixes: b3fed434 ("xfs: account format bouncing into rmapbt swapext tx reservation")
      Root-caused-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Brian Foster <bfoster@redhat.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
    • xfs: Couple of typo fixes in comments · 06734e3c
      Keyur Patel committed
      ./xfs/libxfs/xfs_inode_buf.c:56: unnecssary ==> unnecessary
      ./xfs/libxfs/xfs_inode_buf.c:59: behavour ==> behaviour
      ./xfs/libxfs/xfs_inode_buf.c:206: unitialized ==> uninitialized
      Signed-off-by: Keyur Patel <iamkeyur96@gmail.com>
      Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
  2. 05 Jul, 2020 1 commit
    • io_uring: fix regression with always ignoring signals in io_cqring_wait() · b7db41c9
      Jens Axboe committed
      When switching to TWA_SIGNAL for task_work notifications, we also made
      any signal based condition in io_cqring_wait() return -ERESTARTSYS.
      This breaks applications that rely on using signals to abort someone
      waiting for events.
      
      Check if we have a signal pending because of queued task_work, and
      repeat the signal check once we've run the task_work. This provides a
      reliable way of telling the two apart.
      
      Additionally, only use TWA_SIGNAL if we are using an eventfd. If not,
      we don't have the dependency situation described in the original commit,
      and we can get by with just using TWA_RESUME like we previously did.
      
      Fixes: ce593a6c ("io_uring: use signal based task_work running")
      Cc: stable@vger.kernel.org # v5.7
      Reported-by: Andres Freund <andres@anarazel.de>
      Tested-by: Andres Freund <andres@anarazel.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 04 Jul, 2020 1 commit
  4. 03 Jul, 2020 5 commits
    • gfs2: The freeze glock should never be frozen · c860f8ff
      Bob Peterson committed
      Before this patch, some gfs2 code locked the freeze glock with the
      LM_FLAG_NOEXP (do not freeze) flag, and some did not. We never want to
      freeze the freeze glock, so this patch makes it use LM_FLAG_NOEXP
      consistently.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: When freezing gfs2, use GL_EXACT and not GL_NOCACHE · 623ba664
      Bob Peterson committed
      Before this patch, the freeze code in gfs2 specified GL_NOCACHE in
      several places. That's wrong because we always want to know the state
      of whether the file system is frozen.
      
      There was also a problem with freeze/thaw transitioning the glock from
      frozen (EX) to thawed (SH) because gfs2 will normally grant glocks in EX
      to processes that request it in SH mode, unless GL_EXACT is specified.
      Therefore, the freeze/thaw code, which tried to reacquire the glock in
      SH mode would get the glock in EX mode, and miss the transition from EX
      to SH. That made it think the thaw had completed normally, but since the
      glock was still cached in EX, other nodes could not freeze again.
      
      This patch removes the GL_NOCACHE flag to allow the freeze glock to be
      cached. It also adds the GL_EXACT flag so the glock is fully transitioned
      from EX to SH, thereby allowing future freeze operations.
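
The missed transition can be pictured with a toy model (an assumption-laden sketch, not gfs2 code): a lock cached in EX satisfies an SH request without changing state unless the caller asks for an exact match. The `toy_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_state { TOY_UN, TOY_SH, TOY_EX };

/* Toy lock with a counter so state changes can be observed. */
struct toy_glock {
    enum toy_state held;
    int transitions;
};

/*
 * Request the lock in state "want". Without "exact", a lock already
 * cached in EX is considered good enough for an SH request, so no
 * state transition happens.
 */
static enum toy_state toy_acquire(struct toy_glock *gl,
                                  enum toy_state want, bool exact)
{
    assert(want != TOY_UN);   /* only SH and EX requests are modeled */
    if (!exact && gl->held == TOY_EX && want == TOY_SH)
        return gl->held;
    if (gl->held != want) {
        gl->held = want;
        gl->transitions++;
    }
    return gl->held;
}
```

In this model, a thaw that requests SH without `exact` leaves the lock in EX with zero transitions, which corresponds to the case where other nodes could not freeze again; with `exact` the lock actually moves from EX to SH.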
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: read-only mounts should grab the sd_freeze_gl glock · b780cc61
      Bob Peterson committed
      Before this patch, only read-write mounts would grab the freeze
      glock in read-only mode, as part of gfs2_make_fs_rw. So the freeze
      glock was never initialized. That meant requests to freeze, which
      request the glock in EX, were granted without any state transition.
      That meant you could mount a gfs2 file system, which is currently
      frozen on a different cluster node, in read-only mode.
      
      This patch makes read-only mounts lock the freeze glock in SH mode,
      which will block for file systems that are frozen on another node.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: freeze should work on read-only mounts · 541656d3
      Bob Peterson committed
      Before this patch, function freeze_go_sync, called when promoting
      the freeze glock, was testing for the SDF_JOURNAL_LIVE superblock flag.
      That's only set for read-write mounts. Read-only mounts don't use a
      journal, so the bit is never set, so the freeze never happened.
      
      This patch removes the check for SDF_JOURNAL_LIVE for freeze requests
      but still checks it when deciding whether to flush a journal.
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
    • gfs2: eliminate GIF_ORDERED in favor of list_empty · 7542486b
      Bob Peterson committed
      In several places, we used the GIF_ORDERED inode flag to determine
      if an inode was on the ordered writes list. However, since we always
      held the sd_ordered_lock spin_lock during the manipulation, we can
      just as easily check list_empty(&ip->i_ordered) instead.
      This allows us to keep more than one ordered writes list to make
      journal writing improvements.
      
      This patch eliminates GIF_ORDERED in favor of checking list_empty.
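
A small user-space sketch of why the flag is redundant (illustrative only; the `toy_*` list helpers mimic the shape of the kernel's list API but are hypothetical here). It assumes removal always re-initializes the node, so an empty node reliably means "not on any list".

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list node. */
struct toy_list {
    struct toy_list *next, *prev;
};

static void toy_list_init(struct toy_list *l)
{
    l->next = l;
    l->prev = l;
}

static bool toy_list_empty(const struct toy_list *l)
{
    return l->next == l;
}

/* Insert an initialized, unlinked item right after head. */
static void toy_list_add(struct toy_list *item, struct toy_list *head)
{
    assert(toy_list_empty(item));
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Unlink the item and re-initialize it so it reads as empty again. */
static void toy_list_del_init(struct toy_list *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
    toy_list_init(item);
}

struct toy_inode {
    struct toy_list i_ordered;   /* linkage on an ordered-writes list */
};

/* Membership test that needs no separate flag. */
static bool toy_is_ordered(const struct toy_inode *ip)
{
    return !toy_list_empty(&ip->i_ordered);
}
```

Because the test looks only at the inode's own linkage, it gives the same answer regardless of which list head the inode was added to, which is what makes more than one ordered writes list possible.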
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
  5. 02 Jul, 2020 8 commits
  6. 01 Jul, 2020 1 commit
    • io_uring: use signal based task_work running · ce593a6c
      Jens Axboe committed
      Since 5.7, we've been using task_work to trigger async running of
      requests in the context of the original task. This generally works
      great, but there's a case where if the task is currently blocked
      in the kernel waiting on a condition to become true, it won't process
      task_work. Even though the task is woken, it just checks whatever
      condition it's waiting on, and goes back to sleep if it's still false.
      
      This is a problem if that very condition only becomes true when that
      task_work is run. An example of that is the task registering an eventfd
      with io_uring, and it's now blocked waiting on an eventfd read. That
      read could depend on a completion event, and that completion event
      won't get triggered until task_work has been run.
      
      Use the TWA_SIGNAL notification for task_work, so that we ensure that
      the task always runs the work when queued.
      
      Cc: stable@vger.kernel.org # v5.7
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  7. 30 Jun, 2020 6 commits
    • gfs2: Don't sleep during glock hash walk · 34244d71
      Andreas Gruenbacher committed
      In flush_delete_work, instead of flushing each individual pending
      delayed work item, cancel and re-queue them for immediate execution.
      The waiting isn't needed here because we're already waiting for all
      queued work items to complete in gfs2_flush_delete_work.  This makes the
      code more efficient, but more importantly, it avoids sleeping during a
      rhashtable walk, inside rcu_read_lock().
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: fix trans slab error when withdraw occurs inside log_flush · 58e08e8d
      Bob Peterson committed
      Log flush operations (gfs2_log_flush()) can target a specific transaction.
      But if the function encounters errors (e.g. io errors) and withdraws,
      the transaction was only freed if it was queued to one of the ail lists.
      If the withdraw occurred before the transaction was queued to the ail1
      list, function ail_drain never freed it. The result was:
      
      BUG gfs2_trans: Objects remaining in gfs2_trans on __kmem_cache_shutdown()
      
      This patch makes log_flush() add the targeted transaction to the ail1
      list so that function ail_drain() will find and free it properly.
      
      Cc: stable@vger.kernel.org # v5.7+
      Signed-off-by: Bob Peterson <rpeterso@redhat.com>
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • gfs2: Don't return NULL from gfs2_inode_lookup · 5902f4dd
      Andreas Gruenbacher committed
      Callers expect gfs2_inode_lookup to return an inode pointer or ERR_PTR(error).
      Commit b66648ad caused it to return NULL instead of ERR_PTR(-ESTALE) in
      some cases.  Fix that.
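
To show why NULL is a problem for such callers, here is a user-space sketch of the error-pointer convention. The `toy_*` helpers are hypothetical re-creations modeled on the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and `toy_lookup` is only a stand-in for a lookup with a generation check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in the top of the pointer range. */
static inline void *toy_err_ptr(long err)
{
    assert(err < 0 && err >= -TOY_MAX_ERRNO);
    return (void *)(intptr_t)err;
}

static inline long toy_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

/* True only for encoded errors; NULL is NOT an error pointer. */
static inline int toy_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_inode {
    unsigned int generation;
};

/* Return the inode, or an encoded -ESTALE if the generation differs. */
static struct toy_inode *toy_lookup(struct toy_inode *found,
                                    unsigned int want_gen)
{
    if (found->generation != want_gen)
        return toy_err_ptr(-ESTALE);
    return found;
}
```

A caller that checks only `toy_is_err()` before dereferencing is safe with this version; had the lookup returned NULL, the check would pass and the caller would dereference a NULL pointer.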
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Fixes: b66648ad ("gfs2: Move inode generation number check into gfs2_inode_lookup")
      Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
    • nfsd: fix nfsdfs inode reference count leak · bf265401
      J. Bruce Fields committed
      I don't understand this code well, but I'm seeing a warning about a
      still-referenced inode on unmount, and every other similar filesystem
      does a dput() here.
      
      Fixes: e8a79fb1 ("nfsd: add nfsd/clients directory")
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: fix nfsdfs reference count loop · 681370f4
      J. Bruce Fields committed
      We don't drop the reference on the nfsdfs filesystem with
      mntput(nn->nfsd_mnt) until nfsd_exit_net(), but that won't be called
      until the nfsd module's unloaded, and we can't unload the module as long
      as there's a reference on nfsdfs.  So this prevents module unloading.
      
      Fixes: 2c830dd7 ("nfsd: persist nfsd filesystem across mounts")
      Reported-and-Tested-by: Luo Xiaogang <lxgrxd@163.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • Revert "fs: Do not check if there is a fsnotify watcher on pseudo inodes" · b6509f6a
      Mel Gorman committed
      This reverts commit e9c15bad ("fs: Do not check if there is a
      fsnotify watcher on pseudo inodes"). The commit intended to eliminate
      fsnotify-related overhead for pseudo inodes but it is broken in
      concept. inotify can receive events of pipe files under /proc/X/fd and
      chromium relies on close and open events for sandboxing. Maxim Levitsky
      reported the following
      
        Chromium starts as a white rectangle, shows few white rectangles that
        resemble its notifications and then crashes.
      
        The stdout output from chromium:
      
        [mlevitsk@starship ~]$chromium-freeworld
        mesa: for the   --simplifycfg-sink-common option: may only occur zero or one times!
        mesa: for the   --global-isel-abort option: may only occur zero or one times!
        [3379:3379:0628/135151.440930:ERROR:browser_switcher_service.cc(238)] XXX Init()
        ../../sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.cc:**CRASHING**:seccomp-bpf failure in syscall 0072
        Received signal 11 SEGV_MAPERR 0000004a9048
      
      Crashes are not universal but even if chromium does not crash, it certainly
      does not work properly. While filtering just modify and access might be
      safe, the benefit is not worth the risk hence the revert.
      Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
      Fixes: e9c15bad ("fs: Do not check if there is a fsnotify watcher on pseudo inodes")
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 29 Jun, 2020 5 commits
  9. 28 Jun, 2020 1 commit
  10. 26 Jun, 2020 7 commits
    • NFSv4 fix CLOSE not waiting for direct IO completion · d03727b2
      Olga Kornievskaia committed
      Figuring out the root cause for the REMOVE/CLOSE race and
      suggesting the solution was done by Neil Brown.
      
      Currently what happens is that direct IO calls hold a reference
      on the open context which is decremented as an asynchronous task
      in nfs_direct_complete(). Before the reference is decremented,
      control is returned to the application, which is free to close the
      file. When the close is processed, it decrements its reference
      on the open context, but since direct IO still holds one, it doesn't
      send a CLOSE on the wire. It returns control to the application,
      which is free to do other operations; for instance, it can delete
      the file. Direct IO then finally releases its reference, triggering
      an asynchronous CLOSE that races with the REMOVE. On the server,
      the REMOVE can be processed before the CLOSE, failing the REMOVE with
      EACCES as the file is still open.
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Suggested-by: Neil Brown <neilb@suse.com>
      CC: stable@vger.kernel.org
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • pNFS/flexfiles: Fix list corruption if the mirror count changes · 8b040137
      Trond Myklebust committed
      If the mirror count changes in the new layout we pick up inside
      ff_layout_pg_init_write(), then we can end up adding the
      request to the wrong mirror and corrupting the mirror->pg_list.
      
      Fixes: d600ad1f ("NFS41: pop some layoutget errors to application")
      Cc: stable@vger.kernel.org
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • nfs: Fix memory leak of export_path · 4659ed7c
      Tom Rix committed
      The try_location function is called within a loop by nfs_follow_referral.
      try_location calls nfs4_pathname_string to create the export_path.
      nfs4_pathname_string allocates the memory. export_path is stored in the
      nfs_fs_context/fs_context structure similarly as hostname and source.
      But whereas the ctx hostname and source are freed before assignment,
      export_path is not.  So if there are multiple loops, the new export_path
      will overwrite the old without the old being freed.
      
      So call kfree for export_path.
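
The leak pattern and its fix can be sketched in user space as follows. This is an illustration under simplifying assumptions, not the NFS code: `toy_ctx`, `toy_dup`, `toy_free_str` and `toy_try_location` are hypothetical, and a counter stands in for a leak checker.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy context that owns one heap-allocated path string. */
struct toy_ctx {
    char *export_path;
};

/* Number of strings currently allocated and not yet freed. */
static int toy_live_allocs;

static char *toy_dup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p) {
        memcpy(p, s, n);
        toy_live_allocs++;
    }
    return p;
}

static void toy_free_str(char *p)
{
    if (p) {
        free(p);
        toy_live_allocs--;
    }
}

/* Called once per candidate location: free the old path before replacing it. */
static void toy_try_location(struct toy_ctx *ctx, const char *path)
{
    assert(ctx != NULL && path != NULL);
    toy_free_str(ctx->export_path);   /* without this, each loop leaks one string */
    ctx->export_path = toy_dup(path);
}
```

After calling `toy_try_location()` several times on the same context, exactly one allocation remains live; dropping the `toy_free_str()` call would leave one stale string per extra iteration.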
      Signed-off-by: Tom Rix <trix@redhat.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • ocfs2: fix value of OCFS2_INVALID_SLOT · 9277f833
      Junxiao Bi committed
      In the ocfs2 disk layout, the slot number is 16 bits, but in the ocfs2
      implementation it is 32 bits.  Usually this does not cause any issue,
      because the slot number is converted from u16 to u32.  However,
      OCFS2_INVALID_SLOT was defined as -1, so when an invalid slot number was
      read from disk, its value was (u16)-1, which was then converted to u32.
      As a result, the following check in get_local_system_inode is always
      skipped:
      
       static struct inode **get_local_system_inode(struct ocfs2_super *osb,
                                                     int type,
                                                     u32 slot)
       {
       	BUG_ON(slot == OCFS2_INVALID_SLOT);
      	...
       }
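
The integer conversion behind this can be checked with a short user-space sketch. The `TOY_*` macros and helper functions are hypothetical; the second macro only illustrates the kind of 16-bit definition the commit title implies, and is not quoted from the patch.

```c
#include <assert.h>   /* only needed by callers that check the results */
#include <stdint.h>

#define TOY_INVALID_SLOT_OLD  (-1)              /* int: becomes 0xFFFFFFFF as u32 */
#define TOY_INVALID_SLOT_NEW  ((uint16_t)-1)    /* 16-bit: stays 0xFFFF as u32 */

/* Compare a 16-bit on-disk slot, widened to 32 bits, with the old value. */
static int toy_matches_old(uint16_t disk_slot)
{
    uint32_t slot = disk_slot;                  /* 0xFFFF widens to 0x0000FFFF */

    return slot == (uint32_t)TOY_INVALID_SLOT_OLD;
}

/* Same comparison against the 16-bit definition. */
static int toy_matches_new(uint16_t disk_slot)
{
    uint32_t slot = disk_slot;

    return slot == TOY_INVALID_SLOT_NEW;
}
```

For an on-disk value of 0xFFFF, `toy_matches_old()` returns 0 because 0x0000FFFF is compared with 0xFFFFFFFF, while `toy_matches_new()` returns 1.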
      
      Link: http://lkml.kernel.org/r/20200616183829.87211-5-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix panic on nfs server over ocfs2 · e5a15e17
      Junxiao Bi committed
      The following kernel panic was captured when running an nfs server over
      ocfs2.  At that time, ocfs2_test_inode_bit() was checking whether the
      inode located at "blkno" 5, the ocfs2 root inode, was valid.  Its
      "suballoc_slot" was OCFS2_INVALID_SLOT (65535) and it was allocated from
      //global_inode_alloc, but the code wrongly assumed that it came from a
      per-slot inode allocator, which caused an array overflow and triggered
      the kernel panic.
      
        BUG: unable to handle kernel paging request at 0000000000001088
        IP: [<ffffffff816f6898>] _raw_spin_lock+0x18/0xf0
        PGD 1e06ba067 PUD 1e9e7d067 PMD 0
        Oops: 0002 [#1] SMP
        CPU: 6 PID: 24873 Comm: nfsd Not tainted 4.1.12-124.36.1.el6uek.x86_64 #2
        Hardware name: Huawei CH121 V3/IT11SGCA1, BIOS 3.87 02/02/2018
        RIP: _raw_spin_lock+0x18/0xf0
        RSP: e02b:ffff88005ae97908  EFLAGS: 00010206
        RAX: ffff88005ae98000 RBX: 0000000000001088 RCX: 0000000000000000
        RDX: 0000000000020000 RSI: 0000000000000009 RDI: 0000000000001088
        RBP: ffff88005ae97928 R08: 0000000000000000 R09: ffff880212878e00
        R10: 0000000000007ff0 R11: 0000000000000000 R12: 0000000000001088
        R13: ffff8800063c0aa8 R14: ffff8800650c27d0 R15: 000000000000ffff
        FS:  0000000000000000(0000) GS:ffff880218180000(0000) knlGS:ffff880218180000
        CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 0000000000001088 CR3: 00000002033d0000 CR4: 0000000000042660
        Call Trace:
          igrab+0x1e/0x60
          ocfs2_get_system_file_inode+0x63/0x3a0 [ocfs2]
          ocfs2_test_inode_bit+0x328/0xa00 [ocfs2]
          ocfs2_get_parent+0xba/0x3e0 [ocfs2]
          reconnect_path+0xb5/0x300
          exportfs_decode_fh+0xf6/0x2b0
          fh_verify+0x350/0x660 [nfsd]
          nfsd4_putfh+0x4d/0x60 [nfsd]
          nfsd4_proc_compound+0x3d3/0x6f0 [nfsd]
          nfsd_dispatch+0xe0/0x290 [nfsd]
          svc_process_common+0x412/0x6a0 [sunrpc]
          svc_process+0x123/0x210 [sunrpc]
          nfsd+0xff/0x170 [nfsd]
          kthread+0xcb/0xf0
          ret_from_fork+0x61/0x90
        Code: 83 c2 02 0f b7 f2 e8 18 dc 91 ff 66 90 eb bf 0f 1f 40 00 55 48 89 e5 41 56 41 55 41 54 53 0f 1f 44 00 00 48 89 fb ba 00 00 02 00 <f0> 0f c1 17 89 d0 45 31 e4 45 31 ed c1 e8 10 66 39 d0 41 89 c6
        RIP   _raw_spin_lock+0x18/0xf0
        CR2: 0000000000001088
        ---[ end trace 7264463cd1aac8f9 ]---
        Kernel panic - not syncing: Fatal exception
      
      Link: http://lkml.kernel.org/r/20200616183829.87211-4-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: load global_inode_alloc · 7569d3c7
      Junxiao Bi committed
      Set global_inode_alloc as OCFS2_FIRST_ONLINE_SYSTEM_INODE, that will
      make it load during mount.  It can be used to test whether some
      global/system inodes are valid.  One use case is that nfsd will test
      whether root inode is valid.
      
      Link: http://lkml.kernel.org/r/20200616183829.87211-3-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: avoid inode removal while nfsd is accessing it · 4cd9973f
      Junxiao Bi committed
      Patch series "ocfs2: fix nfsd over ocfs2 issues", v2.
      
      This is a series of patches to fix issues with nfsd over ocfs2.  Patch 1
      avoids an inode being removed while nfsd accesses it; patches 2 & 3 fix
      a panic issue.
      
      This patch (of 4):
      
      When nfsd gets a file dentry from a handle, or the parent dentry of some
      dentry, a cluster lock is used to keep the inode from being removed on
      another node, but it could still be removed on the local node, so use a
      rw lock to avoid this.
      
      Link: http://lkml.kernel.org/r/20200616183829.87211-1-junxiao.bi@oracle.com
      Link: http://lkml.kernel.org/r/20200616183829.87211-2-junxiao.bi@oracle.com
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 25 Jun, 2020 1 commit
    • io_uring: fix current->mm NULL dereference on exit · d60b5fbc
      Pavel Begunkov committed
      Don't reissue requests from io_iopoll_reap_events(), the task may not
      have mm, which ends up with a NULL dereference. It's better to kill
      everything off on exit anyway.
      
      [  677.734670] RIP: 0010:io_iopoll_complete+0x27e/0x630
      ...
      [  677.734679] Call Trace:
      [  677.734695]  ? __send_signal+0x1f2/0x420
      [  677.734698]  ? _raw_spin_unlock_irqrestore+0x24/0x40
      [  677.734699]  ? send_signal+0xf5/0x140
      [  677.734700]  io_iopoll_getevents+0x12f/0x1a0
      [  677.734702]  io_iopoll_reap_events.part.0+0x5e/0xa0
      [  677.734703]  io_ring_ctx_wait_and_kill+0x132/0x1c0
      [  677.734704]  io_uring_release+0x20/0x30
      [  677.734706]  __fput+0xcd/0x230
      [  677.734707]  ____fput+0xe/0x10
      [  677.734709]  task_work_run+0x67/0xa0
      [  677.734710]  do_exit+0x35d/0xb70
      [  677.734712]  do_group_exit+0x43/0xa0
      [  677.734713]  get_signal+0x140/0x900
      [  677.734715]  do_signal+0x37/0x780
      [  677.734717]  ? enqueue_hrtimer+0x41/0xb0
      [  677.734718]  ? recalibrate_cpu_khz+0x10/0x10
      [  677.734720]  ? ktime_get+0x3e/0xa0
      [  677.734721]  ? lapic_next_deadline+0x26/0x30
      [  677.734723]  ? tick_program_event+0x4d/0x90
      [  677.734724]  ? __hrtimer_get_next_event+0x4d/0x80
      [  677.734726]  __prepare_exit_to_usermode+0x126/0x1c0
      [  677.734741]  prepare_exit_to_usermode+0x9/0x40
      [  677.734742]  idtentry_exit_cond_rcu+0x4c/0x60
      [  677.734743]  sysvec_reschedule_ipi+0x92/0x160
      [  677.734744]  ? asm_sysvec_reschedule_ipi+0xa/0x20
      [  677.734745]  asm_sysvec_reschedule_ipi+0x12/0x20
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>