1. 02 5月, 2019 40 次提交
    • D
      x86/retpolines: Disable switch jump tables when retpolines are enabled · e923c6b7
      Daniel Borkmann 提交于
      commit a9d57ef15cbe327fe54416dd194ee0ea66ae53a4 upstream.
      
      Commit ce02ef06fcf7 ("x86, retpolines: Raise limit for generating indirect
      calls from switch-case") raised the limit under retpolines to 20 switch
      cases where gcc would only then start to emit jump tables, and therefore
      effectively disabling the emission of slow indirect calls in this area.
      
      After this has been brought to attention to gcc folks [0], Martin Liska
      has then fixed gcc to align with clang by avoiding to generate switch jump
      tables entirely under retpolines. This is taking effect in gcc starting
      from stable version 8.4.0. Given kernel supports compilation with older
      versions of gcc where the fix is not being available or backported anymore,
      we need to keep the extra KBUILD_CFLAGS around for some time and generally
      set the -fno-jump-tables to align with what more recent gcc is doing
      automatically today.
      
      More than 20 switch cases are not expected to be fast-path critical, but
      it would still be good to align with gcc behavior for versions < 8.4.0 in
      order to have consistency across supported gcc versions. vmlinux size is
      slightly growing by 0.27% for older gcc. This flag is only set to work
      around affected gcc, no change for clang.
      
        [0] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86952Suggested-by: NMartin Liska <mliska@suse.cz>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Björn Töpel<bjorn.topel@intel.com>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: H.J. Lu <hjl.tools@gmail.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: David S. Miller <davem@davemloft.net>
      Link: https://lkml.kernel.org/r/20190325135620.14882-1-daniel@iogearbox.netSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e923c6b7
    • D
      x86, retpolines: Raise limit for generating indirect calls from switch-case · 6cfcff3c
      Daniel Borkmann 提交于
      commit ce02ef06fcf7a399a6276adb83f37373d10cbbe1 upstream.
      
      From networking side, there are numerous attempts to get rid of indirect
      calls in fast-path wherever feasible in order to avoid the cost of
      retpolines, for example, just to name a few:
      
        * 283c16a2dfd3 ("indirect call wrappers: helpers to speed-up indirect calls of builtin")
        * aaa5d90b395a ("net: use indirect call wrappers at GRO network layer")
        * 028e0a476684 ("net: use indirect call wrappers at GRO transport layer")
        * 356da6d0cde3 ("dma-mapping: bypass indirect calls for dma-direct")
        * 09772d92 ("bpf: avoid retpoline for lookup/update/delete calls on maps")
        * 10870dd89e95 ("netfilter: nf_tables: add direct calls for all builtin expressions")
        [...]
      
      Recent work on XDP from Björn and Magnus additionally found that manually
      transforming the XDP return code switch statement with more than 5 cases
      into if-else combination would result in a considerable speedup in XDP
      layer due to avoidance of indirect calls in CONFIG_RETPOLINE enabled
      builds. On i40e driver with XDP prog attached, a 20-26% speedup has been
      observed [0]. Aside from XDP, there are many other places later in the
      networking stack's critical path with similar switch-case
      processing. Rather than fixing every XDP-enabled driver and locations in
      stack by hand, it would be good to instead raise the limit where gcc would
      emit expensive indirect calls from the switch under retpolines and stick
      with the default as-is in case of !retpoline configured kernels. This would
      also have the advantage that for archs where this is not necessary, we let
      compiler select the underlying target optimization for these constructs and
      avoid potential slow-downs by if-else hand-rewrite.
      
      In case of gcc, this setting is controlled by case-values-threshold which
      has an architecture global default that selects 4 or 5 (latter if target
      does not have a case insn that compares the bounds) where some arch back
      ends like arm64 or s390 override it with their own target hooks, for
      example, in gcc commit db7a90aa0de5 ("S/390: Disable prediction of indirect
      branches") the threshold pretty much disables jump tables by limit of 20
      under retpoline builds.  Comparing gcc's and clang's default code
      generation on x86-64 under O2 level with retpoline build results in the
      following outcome for 5 switch cases:
      
      * gcc with -mindirect-branch=thunk-inline -mindirect-branch-register:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400be0 <+0>:     cmp    $0x4,%edi
         0x0000000000400be3 <+3>:     ja     0x400c35 <dispatch+85>
         0x0000000000400be5 <+5>:     lea    0x915f8(%rip),%rdx        # 0x4921e4
         0x0000000000400bec <+12>:    mov    %edi,%edi
         0x0000000000400bee <+14>:    movslq (%rdx,%rdi,4),%rax
         0x0000000000400bf2 <+18>:    add    %rdx,%rax
         0x0000000000400bf5 <+21>:    callq  0x400c01 <dispatch+33>
         0x0000000000400bfa <+26>:    pause
         0x0000000000400bfc <+28>:    lfence
         0x0000000000400bff <+31>:    jmp    0x400bfa <dispatch+26>
         0x0000000000400c01 <+33>:    mov    %rax,(%rsp)
         0x0000000000400c05 <+37>:    retq
         0x0000000000400c06 <+38>:    nopw   %cs:0x0(%rax,%rax,1)
         0x0000000000400c10 <+48>:    jmpq   0x400c90 <fn_3>
         0x0000000000400c15 <+53>:    nopl   (%rax)
         0x0000000000400c18 <+56>:    jmpq   0x400c70 <fn_2>
         0x0000000000400c1d <+61>:    nopl   (%rax)
         0x0000000000400c20 <+64>:    jmpq   0x400c50 <fn_1>
         0x0000000000400c25 <+69>:    nopl   (%rax)
         0x0000000000400c28 <+72>:    jmpq   0x400c40 <fn_0>
         0x0000000000400c2d <+77>:    nopl   (%rax)
         0x0000000000400c30 <+80>:    jmpq   0x400cb0 <fn_4>
         0x0000000000400c35 <+85>:    push   %rax
         0x0000000000400c36 <+86>:    callq  0x40dd80 <abort>
        End of assembler dump.
      
      * clang with -mretpoline emitting search tree:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400b30 <+0>:     cmp    $0x1,%edi
         0x0000000000400b33 <+3>:     jle    0x400b44 <dispatch+20>
         0x0000000000400b35 <+5>:     cmp    $0x2,%edi
         0x0000000000400b38 <+8>:     je     0x400b4d <dispatch+29>
         0x0000000000400b3a <+10>:    cmp    $0x3,%edi
         0x0000000000400b3d <+13>:    jne    0x400b52 <dispatch+34>
         0x0000000000400b3f <+15>:    jmpq   0x400c50 <fn_3>
         0x0000000000400b44 <+20>:    test   %edi,%edi
         0x0000000000400b46 <+22>:    jne    0x400b5c <dispatch+44>
         0x0000000000400b48 <+24>:    jmpq   0x400c20 <fn_0>
         0x0000000000400b4d <+29>:    jmpq   0x400c40 <fn_2>
         0x0000000000400b52 <+34>:    cmp    $0x4,%edi
         0x0000000000400b55 <+37>:    jne    0x400b66 <dispatch+54>
         0x0000000000400b57 <+39>:    jmpq   0x400c60 <fn_4>
         0x0000000000400b5c <+44>:    cmp    $0x1,%edi
         0x0000000000400b5f <+47>:    jne    0x400b66 <dispatch+54>
         0x0000000000400b61 <+49>:    jmpq   0x400c30 <fn_1>
         0x0000000000400b66 <+54>:    push   %rax
         0x0000000000400b67 <+55>:    callq  0x40dd20 <abort>
        End of assembler dump.
      
        For sake of comparison, clang without -mretpoline:
      
        # gdb -batch -ex 'disassemble dispatch' ./c-switch
        Dump of assembler code for function dispatch:
         0x0000000000400b30 <+0>:	cmp    $0x4,%edi
         0x0000000000400b33 <+3>:	ja     0x400b57 <dispatch+39>
         0x0000000000400b35 <+5>:	mov    %edi,%eax
         0x0000000000400b37 <+7>:	jmpq   *0x492148(,%rax,8)
         0x0000000000400b3e <+14>:	jmpq   0x400bf0 <fn_0>
         0x0000000000400b43 <+19>:	jmpq   0x400c30 <fn_4>
         0x0000000000400b48 <+24>:	jmpq   0x400c10 <fn_2>
         0x0000000000400b4d <+29>:	jmpq   0x400c20 <fn_3>
         0x0000000000400b52 <+34>:	jmpq   0x400c00 <fn_1>
         0x0000000000400b57 <+39>:	push   %rax
         0x0000000000400b58 <+40>:	callq  0x40dcf0 <abort>
        End of assembler dump.
      
      Raising the cases to a high number (e.g. 100) will still result in similar
      code generation pattern with clang and gcc as above, in other words clang
      generally turns off jump table emission by having an extra expansion pass
      under retpoline build to turn indirectbr instructions from their IR into
      switch instructions as a built-in -mno-jump-table lowering of a switch (in
      this case, even if IR input already contained an indirect branch).
      
      For gcc, adding --param=case-values-threshold=20 as in similar fashion as
      s390 in order to raise the limit for x86 retpoline enabled builds results
      in a small vmlinux size increase of only 0.13% (before=18,027,528
      after=18,051,192). For clang this option is ignored due to i) not being
      needed as mentioned and ii) not having above cmdline
      parameter. Non-retpoline-enabled builds with gcc continue to use the
      default case-values-threshold setting, so nothing changes here.
      
      [0] https://lore.kernel.org/netdev/20190129095754.9390-1-bjorn.topel@gmail.com/
          and "The Path to DPDK Speeds for AF_XDP", LPC 2018, networking track:
        - http://vger.kernel.org/lpc_net2018_talks/lpc18_pres_af_xdp_perf-v3.pdf
        - http://vger.kernel.org/lpc_net2018_talks/lpc18_paper_af_xdp_perf-v2.pdfSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
      Acked-by: NBjörn Töpel <bjorn.topel@intel.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: netdev@vger.kernel.org
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Magnus Karlsson <magnus.karlsson@intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: https://lkml.kernel.org/r/20190221221941.29358-1-daniel@iogearbox.netSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6cfcff3c
    • A
      Fix aio_poll() races · e9e47779
      Al Viro 提交于
      commit af5c72b1fc7a00aa484e90b0c4e0eeb582545634 upstream.
      
      aio_poll() has to cope with several unpleasant problems:
      	* requests that might stay around indefinitely need to
      be made visible for io_cancel(2); that must not be done to
      a request already completed, though.
      	* in cases when ->poll() has placed us on a waitqueue,
      wakeup might have happened (and request completed) before ->poll()
      returns.
      	* worse, in some early wakeup cases request might end
      up re-added into the queue later - we can't treat "woken up and
      currently not in the queue" as "it's not going to stick around
      indefinitely"
      	* ... moreover, ->poll() might have decided not to
      put it on any queues to start with, and that needs to be distinguished
      from the previous case
      	* ->poll() might have tried to put us on more than one queue.
      Only the first will succeed for aio poll, so we might end up missing
      wakeups.  OTOH, we might very well notice that only after the
      wakeup hits and request gets completed (all before ->poll() gets
      around to the second poll_wait()).  In that case it's too late to
      decide that we have an error.
      
      req->woken was an attempt to deal with that.  Unfortunately, it was
      broken.  What we need to keep track of is not that wakeup has happened -
      the thing might come back after that.  It's that async reference is
      already gone and won't come back, so we can't (and needn't) put the
      request on the list of cancellables.
      
      The easiest case is "request hadn't been put on any waitqueues"; we
      can tell by seeing NULL apt.head, and in that case there won't be
      anything async.  We should either complete the request ourselves
      (if vfs_poll() reports anything of interest) or return an error.
      
      In all other cases we get exclusion with wakeups by grabbing the
      queue lock.
      
      If request is currently on queue and we have something interesting
      from vfs_poll(), we can steal it and complete the request ourselves.
      
      If it's on queue and vfs_poll() has not reported anything interesting,
      we either put it on the cancellable list, or, if we know that it
      hadn't been put on all queues ->poll() wanted it on, we steal it and
      return an error.
      
      If it's _not_ on queue, it's either been already dealt with (in which
      case we do nothing), or there's aio_poll_complete_work() about to be
      executed.  In that case we either put it on the cancellable list,
      or, if we know it hadn't been put on all queues ->poll() wanted it on,
      simulate what cancel would've done.
      
      It's a lot more convoluted than I'd like it to be.  Single-consumer APIs
      suck, and unfortunately aio is not an exception...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e9e47779
    • A
      aio: store event at final iocb_put() · aab66dfb
      Al Viro 提交于
      commit 2bb874c0d873d13bd9b9b9c6d7b7c4edab18c8b4 upstream.
      
      Instead of having aio_complete() set ->ki_res.{res,res2}, do that
      explicitly in its callers, drop the reference (as aio_complete()
      used to do) and delay the rest until the final iocb_put().
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      aab66dfb
    • A
      aio: keep io_event in aio_kiocb · c20202c5
      Al Viro 提交于
      commit a9339b7855094ba11a97e8822ae038135e879e79 upstream.
      
      We want to separate forming the resulting io_event from putting it
      into the ring buffer.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c20202c5
    • A
      aio: fold lookup_kiocb() into its sole caller · 592ea630
      Al Viro 提交于
      commit 833f4154ed560232120bc475935ee1d6a20e159f upstream.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      592ea630
    • L
      pin iocb through aio. · c7f2525a
      Linus Torvalds 提交于
      commit b53119f13a04879c3bf502828d99d13726639ead upstream.
      
      aio_poll() is not the only case that needs file pinned; worse, while
      aio_read()/aio_write() can live without pinning iocb itself, the
      proof is rather brittle and can easily break on later changes.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c7f2525a
    • L
      aio: simplify - and fix - fget/fput for io_submit() · d6b2615f
      Linus Torvalds 提交于
      commit 84c4e1f89fefe70554da0ab33be72c9be7994379 upstream.
      
      Al Viro root-caused a race where the IOCB_CMD_POLL handling of
      fget/fput() could cause us to access the file pointer after it had
      already been freed:
      
       "In more details - normally IOCB_CMD_POLL handling looks so:
      
         1) io_submit(2) allocates aio_kiocb instance and passes it to
            aio_poll()
      
         2) aio_poll() resolves the descriptor to struct file by req->file =
            fget(iocb->aio_fildes)
      
         3) aio_poll() sets ->woken to false and raises ->ki_refcnt of that
            aio_kiocb to 2 (bumps by 1, that is).
      
         4) aio_poll() calls vfs_poll(). After sanity checks (basically,
            "poll_wait() had been called and only once") it locks the queue.
            That's what the extra reference to iocb had been for - we know we
            can safely access it.
      
         5) With queue locked, we check if ->woken has already been set to
            true (by aio_poll_wake()) and, if it had been, we unlock the
            queue, drop a reference to aio_kiocb and bugger off - at that
            point it's a responsibility to aio_poll_wake() and the stuff
            called/scheduled by it. That code will drop the reference to file
            in req->file, along with the other reference to our aio_kiocb.
      
         6) otherwise, we see whether we need to wait. If we do, we unlock the
            queue, drop one reference to aio_kiocb and go away - eventual
            wakeup (or cancel) will deal with the reference to file and with
            the other reference to aio_kiocb
      
         7) otherwise we remove ourselves from waitqueue (still under the
            queue lock), so that wakeup won't get us. No async activity will
            be happening, so we can safely drop req->file and iocb ourselves.
      
        If wakeup happens while we are in vfs_poll(), we are fine - aio_kiocb
        won't get freed under us, so we can do all the checks and locking
        safely. And we don't touch ->file if we detect that case.
      
        However, vfs_poll() most certainly *does* touch the file it had been
        given. So wakeup coming while we are still in ->poll() might end up
        doing fput() on that file. That case is not too rare, and usually we
        are saved by the still present reference from descriptor table - that
        fput() is not the final one.
      
        But if another thread closes that descriptor right after our fget()
        and wakeup does happen before ->poll() returns, we are in trouble -
        final fput() done while we are in the middle of a method:
      
      Al also wrote a patch to take an extra reference to the file descriptor
      to fix this, but I instead suggested we just streamline the whole file
      pointer handling by submit_io() so that the generic aio submission code
      simply keeps the file pointer around until the aio has completed.
      
      Fixes: bfe4037e ("aio: implement IOCB_CMD_POLL")
      Acked-by: NAl Viro <viro@zeniv.linux.org.uk>
      Reported-by: syzbot+503d4cc169fcec1cb18c@syzkaller.appspotmail.com
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d6b2615f
    • M
      aio: initialize kiocb private in case any filesystems expect it. · 2afa01cd
      Mike Marshall 提交于
      commit ec51f8ee1e63498e9f521ec0e5a6d04622bb2c67 upstream.
      
      A recent optimization had left private uninitialized.
      
      Fixes: 2bc4ca9bb600 ("aio: don't zero entire aio_kiocb aio_get_req()")
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Marshall <hubcap@omnibond.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2afa01cd
    • J
      aio: abstract out io_event filler helper · a812f7b6
      Jens Axboe 提交于
      commit 875736bb3f3ded168469f6a14df7a938416a99d5 upstream.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a812f7b6
    • J
      aio: split out iocb copy from io_submit_one() · d384f8b8
      Jens Axboe 提交于
      commit 88a6f18b950e2e4dce57d31daa151105f4f3dcff upstream.
      
      In preparation of handing in iocbs in a different fashion as well. Also
      make it clear that the iocb being passed in isn't modified, by marking
      it const throughout.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d384f8b8
    • J
      aio: use iocb_put() instead of open coding it · 4d677689
      Jens Axboe 提交于
      commit 71ebc6fef0f53459f37fb39e1466792232fa52ee upstream.
      
      Replace the percpu_ref_put() + kmem_cache_free() with a call to
      iocb_put() instead.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d677689
    • J
      aio: don't zero entire aio_kiocb aio_get_req() · ef529eea
      Jens Axboe 提交于
      commit 2bc4ca9bb600cbe36941da2b2a67189fc4302a04 upstream.
      
      It's 192 bytes, fairly substantial. Most items don't need to be cleared,
      especially not upfront. Clear the ones we do need to clear, and leave
      the other ones for setup when the iocb is prepared and submitted.
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ef529eea
    • C
      aio: separate out ring reservation from req allocation · 730198c8
      Christoph Hellwig 提交于
      commit 432c79978c33ecef91b1b04cea6936c20810da29 upstream.
      
      This is in preparation for certain types of IO not needing a ring
      reserveration.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      730198c8
    • J
      aio: use assigned completion handler · b3373253
      Jens Axboe 提交于
      commit bc9bff61624ac33b7c95861abea1af24ee7a94fc upstream.
      
      We know this is a read/write request, but in preparation for
      having different kinds of those, ensure that we call the assigned
      handler instead of assuming it's aio_complete_rq().
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b3373253
    • C
      aio: clear IOCB_HIPRI · 9101cbe7
      Christoph Hellwig 提交于
      commit 154989e45fd8de9bfb52bbd6e5ea763e437e54c5 upstream.
      
      No one is going to poll for aio (yet), so we must clear the HIPRI
      flag, as we would otherwise send it down the poll queues, where no
      one will be polling for completions.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      
      IOCB_HIPRI, not RWF_HIPRI.
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9101cbe7
    • E
      rxrpc: fix race condition in rxrpc_input_packet() · 920ecc72
      Eric Dumazet 提交于
      commit 032be5f19a94de51093851757089133dcc1e92aa upstream.
      
      After commit 5271953c ("rxrpc: Use the UDP encap_rcv hook"),
      rxrpc_input_packet() is directly called from lockless UDP receive
      path, under rcu_read_lock() protection.
      
      It must therefore use RCU rules :
      
      - udp_sk->sk_user_data can be cleared at any point in this function.
        rcu_dereference_sk_user_data() is what we need here.
      
      - Also, since sk_user_data might have been set in rxrpc_open_socket()
        we must observe a proper RCU grace period before kfree(local) in
        rxrpc_lookup_local()
      
      v4: @local can be NULL in xrpc_lookup_local() as reported by kbuild test robot <lkp@intel.com>
              and Julia Lawall <julia.lawall@lip6.fr>, thanks !
      
      v3,v2 : addressed David Howells feedback, thanks !
      
      syzbot reported :
      
      kasan: CONFIG_KASAN_INLINE enabled
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] PREEMPT SMP KASAN
      CPU: 0 PID: 19236 Comm: syz-executor703 Not tainted 5.1.0-rc6 #79
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      RIP: 0010:__lock_acquire+0xbef/0x3fb0 kernel/locking/lockdep.c:3573
      Code: 00 0f 85 a5 1f 00 00 48 81 c4 10 01 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1 ea 03 <80> 3c 02 00 0f 85 4a 21 00 00 49 81 7d 00 20 54 9c 89 0f 84 cf f4
      RSP: 0018:ffff88809d7aef58 EFLAGS: 00010002
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000026 RSI: 0000000000000000 RDI: 0000000000000001
      RBP: ffff88809d7af090 R08: 0000000000000001 R09: 0000000000000001
      R10: ffffed1015d05bc7 R11: ffff888089428600 R12: 0000000000000000
      R13: 0000000000000130 R14: 0000000000000001 R15: 0000000000000001
      FS:  00007f059044d700(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000004b6040 CR3: 00000000955ca000 CR4: 00000000001406f0
      Call Trace:
       lock_acquire+0x16f/0x3f0 kernel/locking/lockdep.c:4211
       __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
       _raw_spin_lock_irqsave+0x95/0xcd kernel/locking/spinlock.c:152
       skb_queue_tail+0x26/0x150 net/core/skbuff.c:2972
       rxrpc_reject_packet net/rxrpc/input.c:1126 [inline]
       rxrpc_input_packet+0x4a0/0x5536 net/rxrpc/input.c:1414
       udp_queue_rcv_one_skb+0xaf2/0x1780 net/ipv4/udp.c:2011
       udp_queue_rcv_skb+0x128/0x730 net/ipv4/udp.c:2085
       udp_unicast_rcv_skb.isra.0+0xb9/0x360 net/ipv4/udp.c:2245
       __udp4_lib_rcv+0x701/0x2ca0 net/ipv4/udp.c:2301
       udp_rcv+0x22/0x30 net/ipv4/udp.c:2482
       ip_protocol_deliver_rcu+0x60/0x8f0 net/ipv4/ip_input.c:208
       ip_local_deliver_finish+0x23b/0x390 net/ipv4/ip_input.c:234
       NF_HOOK include/linux/netfilter.h:289 [inline]
       NF_HOOK include/linux/netfilter.h:283 [inline]
       ip_local_deliver+0x1e9/0x520 net/ipv4/ip_input.c:255
       dst_input include/net/dst.h:450 [inline]
       ip_rcv_finish+0x1e1/0x300 net/ipv4/ip_input.c:413
       NF_HOOK include/linux/netfilter.h:289 [inline]
       NF_HOOK include/linux/netfilter.h:283 [inline]
       ip_rcv+0xe8/0x3f0 net/ipv4/ip_input.c:523
       __netif_receive_skb_one_core+0x115/0x1a0 net/core/dev.c:4987
       __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5099
       netif_receive_skb_internal+0x117/0x660 net/core/dev.c:5202
       napi_frags_finish net/core/dev.c:5769 [inline]
       napi_gro_frags+0xade/0xd10 net/core/dev.c:5843
       tun_get_user+0x2f24/0x3fb0 drivers/net/tun.c:1981
       tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2027
       call_write_iter include/linux/fs.h:1866 [inline]
       do_iter_readv_writev+0x5e1/0x8e0 fs/read_write.c:681
       do_iter_write fs/read_write.c:957 [inline]
       do_iter_write+0x184/0x610 fs/read_write.c:938
       vfs_writev+0x1b3/0x2f0 fs/read_write.c:1002
       do_writev+0x15e/0x370 fs/read_write.c:1037
       __do_sys_writev fs/read_write.c:1110 [inline]
       __se_sys_writev fs/read_write.c:1107 [inline]
       __x64_sys_writev+0x75/0xb0 fs/read_write.c:1107
       do_syscall_64+0x103/0x610 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 5271953c ("rxrpc: Use the UDP encap_rcv hook")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Acked-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      920ecc72
    • T
      net/rds: Check address length before reading address family · 5a228d5d
      Tetsuo Handa 提交于
      commit dd3ac9a684358b8c1d5c432ca8322aaf5e4f28ee upstream.
      
      syzbot is reporting uninitialized value at rds_connect() [1] and
      rds_bind() [2]. This is because syzbot is passing ulen == 0 whereas
      these functions expect that it is safe to access sockaddr->family field
      in order to determine minimal address length for validation.
      
      [1] https://syzkaller.appspot.com/bug?id=f4e61c010416c1e6f0fa3ffe247561b60a50ad71
      [2] https://syzkaller.appspot.com/bug?id=a4bf9e41b7e055c3823fdcd83e8c58ca7270e38fReported-by: Nsyzbot <syzbot+0049bebbf3042dbd2e8f@syzkaller.appspotmail.com>
      Reported-by: Nsyzbot <syzbot+915c9f99f3dbc4bd6cd1@syzkaller.appspotmail.com>
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      5a228d5d
    • Y
      net: netrom: Fix error cleanup path of nr_proto_init · e30203e4
      YueHaibing 提交于
      commit d3706566ae3d92677b932dd156157fd6c72534b1 upstream.
      
      Syzkaller report this:
      
      BUG: unable to handle kernel paging request at fffffbfff830524b
      PGD 237fe8067 P4D 237fe8067 PUD 237e64067 PMD 1c9716067 PTE 0
      Oops: 0000 [#1] SMP KASAN PTI
      CPU: 1 PID: 4465 Comm: syz-executor.0 Not tainted 5.0.0+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:__list_add_valid+0x21/0xe0 lib/list_debug.c:23
      Code: 8b 0c 24 e9 17 fd ff ff 90 55 48 89 fd 48 8d 7a 08 53 48 89 d3 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 48 83 ec 08 <80> 3c 02 00 0f 85 8b 00 00 00 48 8b 53 08 48 39 f2 75 35 48 89 f2
      RSP: 0018:ffff8881ea2278d0 EFLAGS: 00010282
      RAX: dffffc0000000000 RBX: ffffffffc1829250 RCX: 1ffff1103d444ef4
      RDX: 1ffffffff830524b RSI: ffffffff85659300 RDI: ffffffffc1829258
      RBP: ffffffffc1879250 R08: fffffbfff0acb269 R09: fffffbfff0acb269
      R10: ffff8881ea2278f0 R11: fffffbfff0acb268 R12: ffffffffc1829250
      R13: dffffc0000000000 R14: 0000000000000008 R15: ffffffffc187c830
      FS:  00007fe0361df700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: fffffbfff830524b CR3: 00000001eb39a001 CR4: 00000000007606e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       __list_add include/linux/list.h:60 [inline]
       list_add include/linux/list.h:79 [inline]
       proto_register+0x444/0x8f0 net/core/sock.c:3375
       nr_proto_init+0x73/0x4b3 [netrom]
       ? 0xffffffffc1628000
       ? 0xffffffffc1628000
       do_one_initcall+0xbc/0x47d init/main.c:887
       do_init_module+0x1b5/0x547 kernel/module.c:3456
       load_module+0x6405/0x8c10 kernel/module.c:3804
       __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fe0361dec58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000020000100 RDI: 0000000000000003
      RBP: 00007fe0361dec70 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fe0361df6bc
      R13: 00000000004bcefa R14: 00000000006f6fb0 R15: 0000000000000004
      Modules linked in: netrom(+) ax25 fcrypt pcbc af_alg arizona_ldo1 v4l2_common videodev media v4l2_dv_timings hdlc ide_cd_mod snd_soc_sigmadsp_regmap snd_soc_sigmadsp intel_spi_platform intel_spi mtd spi_nor snd_usbmidi_lib usbcore lcd ti_ads7950 hi6421_regulator snd_soc_kbl_rt5663_max98927 snd_soc_hdac_hdmi snd_hda_ext_core snd_hda_core snd_soc_rt5663 snd_soc_core snd_pcm_dmaengine snd_compress snd_soc_rl6231 mac80211 rtc_rc5t583 spi_slave_time leds_pwm hid_gt683r hid industrialio_triggered_buffer kfifo_buf industrialio ir_kbd_i2c rc_core led_class_flash dwc_xlgmac snd_ymfpci gameport snd_mpu401_uart snd_rawmidi snd_ac97_codec snd_pcm ac97_bus snd_opl3_lib snd_timer snd_seq_device snd_hwdep snd soundcore iptable_security iptable_raw iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan
       bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ide_pci_generic piix aesni_intel aes_x86_64 crypto_simd cryptd glue_helper ide_core psmouse input_leds i2c_piix4 serio_raw intel_agp intel_gtt ata_generic agpgart pata_acpi parport_pc rtc_cmos parport floppy sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: rxrpc]
      Dumping ftrace buffer:
         (ftrace buffer empty)
      CR2: fffffbfff830524b
      ---[ end trace 039ab24b305c4b19 ]---
      
      If nr_proto_init failed, it may forget to call proto_unregister,
      tiggering this issue.This patch rearrange code of nr_proto_init
      to avoid such issues.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e30203e4
    • X
      tipc: check link name with right length in tipc_nl_compat_link_set · a0cb0faa
      Xin Long 提交于
      commit 8c63bf9ab4be8b83bd8c34aacfd2f1d2c8901c8a upstream.
      
      A similar issue as fixed by Patch "tipc: check bearer name with right
      length in tipc_nl_compat_bearer_enable" was also found by syzbot in
      tipc_nl_compat_link_set().
      
      The length to check with should be 'TLV_GET_DATA_LEN(msg->req) -
      offsetof(struct tipc_link_config, name)'.
      
      Reported-by: syzbot+de00a87b8644a582ae79@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a0cb0faa
    • X
      tipc: check bearer name with right length in tipc_nl_compat_bearer_enable · f21fae80
      Xin Long 提交于
      commit 6f07e5f06c8712acc423485f657799fc8e11e56c upstream.
      
      Syzbot reported the following crash:
      
      BUG: KMSAN: uninit-value in memchr+0xce/0x110 lib/string.c:961
        memchr+0xce/0x110 lib/string.c:961
        string_is_valid net/tipc/netlink_compat.c:176 [inline]
        tipc_nl_compat_bearer_enable+0x2c4/0x910 net/tipc/netlink_compat.c:401
        __tipc_nl_compat_doit net/tipc/netlink_compat.c:321 [inline]
        tipc_nl_compat_doit+0x3aa/0xaf0 net/tipc/netlink_compat.c:354
        tipc_nl_compat_handle net/tipc/netlink_compat.c:1162 [inline]
        tipc_nl_compat_recv+0x1ae7/0x2750 net/tipc/netlink_compat.c:1265
        genl_family_rcv_msg net/netlink/genetlink.c:601 [inline]
        genl_rcv_msg+0x185f/0x1a60 net/netlink/genetlink.c:626
        netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
        genl_rcv+0x63/0x80 net/netlink/genetlink.c:637
        netlink_unicast_kernel net/netlink/af_netlink.c:1310 [inline]
        netlink_unicast+0xf3e/0x1020 net/netlink/af_netlink.c:1336
        netlink_sendmsg+0x127f/0x1300 net/netlink/af_netlink.c:1917
        sock_sendmsg_nosec net/socket.c:622 [inline]
        sock_sendmsg net/socket.c:632 [inline]
      
      Uninit was created at:
        __alloc_skb+0x309/0xa20 net/core/skbuff.c:208
        alloc_skb include/linux/skbuff.h:1012 [inline]
        netlink_alloc_large_skb net/netlink/af_netlink.c:1182 [inline]
        netlink_sendmsg+0xb82/0x1300 net/netlink/af_netlink.c:1892
        sock_sendmsg_nosec net/socket.c:622 [inline]
        sock_sendmsg net/socket.c:632 [inline]
      
      It was triggered when the bearer name size < TIPC_MAX_BEARER_NAME,
      it would check with a wrong len/TLV_GET_DATA_LEN(msg->req), which
      also includes priority and disc_domain length.
      
      This patch is to fix it by checking it with a right length:
      'TLV_GET_DATA_LEN(msg->req) - offsetof(struct tipc_bearer_config, name)'.
      
      Reported-by: syzbot+8b707430713eb46e1e45@syzkaller.appspotmail.com
      Signed-off-by: NXin Long <lucien.xin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f21fae80
    • Y
      fm10k: Fix a potential NULL pointer dereference · 9b9b0df4
      Yue Haibing 提交于
      commit 01ca667133d019edc9f0a1f70a272447c84ec41f upstream.
      
      Syzkaller report this:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN PTI
      CPU: 0 PID: 4378 Comm: syz-executor.0 Tainted: G         C        5.0.0+ #5
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
      RIP: 0010:__lock_acquire+0x95b/0x3200 kernel/locking/lockdep.c:3573
      Code: 00 0f 85 28 1e 00 00 48 81 c4 08 01 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 4c 89 ea 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 cc 24 00 00 49 81 7d 00 e0 de 03 a6 41 bc 00 00
      RSP: 0018:ffff8881e3c07a40 EFLAGS: 00010002
      RAX: dffffc0000000000 RBX: 0000000000000000 RCX: 0000000000000000
      RDX: 0000000000000010 RSI: 0000000000000000 RDI: 0000000000000080
      RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
      R10: ffff8881e3c07d98 R11: ffff8881c7f21f80 R12: 0000000000000001
      R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000001
      FS:  00007fce2252e700(0000) GS:ffff8881f2400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007fffc7eb0228 CR3: 00000001e5bea002 CR4: 00000000007606f0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      PKRU: 55555554
      Call Trace:
       lock_acquire+0xff/0x2c0 kernel/locking/lockdep.c:4211
       __mutex_lock_common kernel/locking/mutex.c:925 [inline]
       __mutex_lock+0xdf/0x1050 kernel/locking/mutex.c:1072
       drain_workqueue+0x24/0x3f0 kernel/workqueue.c:2934
       destroy_workqueue+0x23/0x630 kernel/workqueue.c:4319
       __do_sys_delete_module kernel/module.c:1018 [inline]
       __se_sys_delete_module kernel/module.c:961 [inline]
       __x64_sys_delete_module+0x30c/0x480 kernel/module.c:961
       do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x462e99
      Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
      RSP: 002b:00007fce2252dc58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0
      RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000020000140
      RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00007fce2252e6bc
      R13: 00000000004bcca9 R14: 00000000006f6b48 R15: 00000000ffffffff
      
      If alloc_workqueue fails, it should return -ENOMEM, otherwise may
      trigger this NULL pointer dereference while unloading drivers.
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Fixes: 0a38c17a ("fm10k: Remove create_workqueue")
      Signed-off-by: NYue Haibing <yuehaibing@huawei.com>
      Tested-by: NAndrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9b9b0df4
    • F
      netfilter: ebtables: CONFIG_COMPAT: drop a bogus WARN_ON · f7dc13d6
      Florian Westphal 提交于
      commit 7caa56f006e9d712b44f27b32520c66420d5cbc6 upstream.
      
      It means userspace gave us a ruleset where there is some other
      data after the ebtables target but before the beginning of the next rule.
      
      Fixes: 81e675c2 ("netfilter: ebtables: add CONFIG_COMPAT support")
      Reported-by: syzbot+659574e7bcc7f7eb4df7@syzkaller.appspotmail.com
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7dc13d6
    • T
      NFS: Forbid setting AF_INET6 to "struct sockaddr_in"->sin_family. · 94ad68a6
      Tetsuo Handa 提交于
      commit 7c2bd9a39845bfb6d72ddb55ce737650271f6f96 upstream.
      
      syzbot is reporting uninitialized value at rpc_sockaddr2uaddr() [1]. This
      is because syzbot is setting AF_INET6 to "struct sockaddr_in"->sin_family
      (which is embedded into user-visible "struct nfs_mount_data" structure)
      despite nfs23_validate_mount_data() cannot pass sizeof(struct sockaddr_in6)
      bytes of AF_INET6 address to rpc_sockaddr2uaddr().
      
      Since "struct nfs_mount_data" structure is user-visible, we can't change
      "struct nfs_mount_data" to use "struct sockaddr_storage". Therefore,
      assuming that everybody is using AF_INET family when passing address via
      "struct nfs_mount_data"->addr, reject if its sin_family is not AF_INET.
      
      [1] https://syzkaller.appspot.com/bug?id=599993614e7cbbf66bc2656a919ab2a95fb5d75cReported-by: Nsyzbot <syzbot+047a11c361b872896a4f@syzkaller.appspotmail.com>
      Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      94ad68a6
    • L
      sched/deadline: Correctly handle active 0-lag timers · 245a94a0
      luca abeni 提交于
      commit 1b02cd6a2d7f3e2a6a5262887d2cb2912083e42f upstream.
      
      syzbot reported the following warning:
      
         [ ] WARNING: CPU: 4 PID: 17089 at kernel/sched/deadline.c:255 task_non_contending+0xae0/0x1950
      
      line 255 of deadline.c is:
      
      	WARN_ON(hrtimer_active(&dl_se->inactive_timer));
      
      in task_non_contending().
      
      Unfortunately, in some cases (for example, a deadline task
      continuosly blocking and waking immediately) it can happen that
      a task blocks (and task_non_contending() is called) while the
      0-lag timer is still active.
      
      In this case, the safest thing to do is to immediately decrease
      the running bandwidth of the task, without trying to re-arm the 0-lag timer.
      Signed-off-by: Nluca abeni <luca.abeni@santannapisa.it>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NJuri Lelli <juri.lelli@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: chengjian (D) <cj.chengjian@huawei.com>
      Link: https://lkml.kernel.org/r/20190325131530.34706-1-luca.abeni@santannapisa.itSigned-off-by: NIngo Molnar <mingo@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      245a94a0
    • T
      binder: fix handling of misaligned binder object · 6bf7d3c5
      Todd Kjos 提交于
      commit 26528be6720bb40bc8844e97ee73a37e530e9c5e upstream.
      
      Fixes crash found by syzbot:
      kernel BUG at drivers/android/binder_alloc.c:LINE! (2)
      
      Reported-and-tested-by: syzbot+55de1eb4975dec156d8f@syzkaller.appspotmail.com
      Signed-off-by: NTodd Kjos <tkjos@google.com>
      Reviewed-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Cc: stable <stable@vger.kernel.org> # 5.0, 4.19, 4.14
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      6bf7d3c5
    • T
      workqueue: Try to catch flush_work() without INIT_WORK(). · 8c37f7c2
      Tetsuo Handa 提交于
      commit 4d43d395fed124631ca02356c711facb90185175 upstream.
      
      syzbot found a flush_work() caller who forgot to call INIT_WORK()
      because that work_struct was allocated by kzalloc() [1]. But the message
      
        INFO: trying to register non-static key.
        the code is fine but needs lockdep annotation.
        turning off the locking correctness validator.
      
      by lock_map_acquire() is failing to tell that INIT_WORK() is missing.
      
      Since flush_work() without INIT_WORK() is a bug, and INIT_WORK() should
      set ->func field to non-zero, let's warn if ->func field is zero.
      
      [1] https://syzkaller.appspot.com/bug?id=a5954455fcfa51c29ca2ab55b203076337e1c770Signed-off-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c37f7c2
    • Y
      fs/proc/proc_sysctl.c: Fix a NULL pointer dereference · 4d476a00
      YueHaibing 提交于
      commit 89189557b47b35683a27c80ee78aef18248eefb4 upstream.
      
      Syzkaller report this:
      
        sysctl could not get directory: /net//bridge -12
        kasan: CONFIG_KASAN_INLINE enabled
        kasan: GPF could be caused by NULL-ptr deref or user memory access
        general protection fault: 0000 [#1] SMP KASAN PTI
        CPU: 1 PID: 7027 Comm: syz-executor.0 Tainted: G         C        5.1.0-rc3+ #8
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
        RIP: 0010:__write_once_size include/linux/compiler.h:220 [inline]
        RIP: 0010:__rb_change_child include/linux/rbtree_augmented.h:144 [inline]
        RIP: 0010:__rb_erase_augmented include/linux/rbtree_augmented.h:186 [inline]
        RIP: 0010:rb_erase+0x5f4/0x19f0 lib/rbtree.c:459
        Code: 00 0f 85 60 13 00 00 48 89 1a 48 83 c4 18 5b 5d 41 5c 41 5d 41 5e 41 5f c3 48 89 f2 48 b8 00 00 00 00 00 fc ff df 48 c1 ea 03 <80> 3c 02 00 0f 85 75 0c 00 00 4d 85 ed 4c 89 2e 74 ce 4c 89 ea 48
        RSP: 0018:ffff8881bb507778 EFLAGS: 00010206
        RAX: dffffc0000000000 RBX: ffff8881f224b5b8 RCX: ffffffff818f3f6a
        RDX: 000000000000000a RSI: 0000000000000050 RDI: ffff8881f224b568
        RBP: 0000000000000000 R08: ffffed10376a0ef4 R09: ffffed10376a0ef4
        R10: 0000000000000001 R11: ffffed10376a0ef4 R12: ffff8881f224b558
        R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
        FS:  00007f3e7ce13700(0000) GS:ffff8881f7300000(0000) knlGS:0000000000000000
        CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
        CR2: 00007fd60fbe9398 CR3: 00000001cb55c001 CR4: 00000000007606e0
        DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
        DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
        PKRU: 55555554
        Call Trace:
         erase_entry fs/proc/proc_sysctl.c:178 [inline]
         erase_header+0xe3/0x160 fs/proc/proc_sysctl.c:207
         start_unregistering fs/proc/proc_sysctl.c:331 [inline]
         drop_sysctl_table+0x558/0x880 fs/proc/proc_sysctl.c:1631
         get_subdir fs/proc/proc_sysctl.c:1022 [inline]
         __register_sysctl_table+0xd65/0x1090 fs/proc/proc_sysctl.c:1335
         br_netfilter_init+0x68/0x1000 [br_netfilter]
         do_one_initcall+0xbc/0x47d init/main.c:901
         do_init_module+0x1b5/0x547 kernel/module.c:3456
         load_module+0x6405/0x8c10 kernel/module.c:3804
         __do_sys_finit_module+0x162/0x190 kernel/module.c:3898
         do_syscall_64+0x9f/0x450 arch/x86/entry/common.c:290
         entry_SYSCALL_64_after_hwframe+0x49/0xbe
        Modules linked in: br_netfilter(+) backlight comedi(C) hid_sensor_hub max3100 ti_ads8688 udc_core fddi snd_mona leds_gpio rc_streamzap mtd pata_netcell nf_log_common rc_winfast udp_tunnel snd_usbmidi_lib snd_usb_toneport snd_usb_line6 snd_rawmidi snd_seq_device snd_hwdep videobuf2_v4l2 videobuf2_common videodev media videobuf2_vmalloc videobuf2_memops rc_gadmei_rm008z 8250_of smm665 hid_tmff hid_saitek hwmon_vid rc_ati_tv_wonder_hd_600 rc_core pata_pdc202xx_old dn_rtmsg as3722 ad714x_i2c ad714x snd_soc_cs4265 hid_kensington panel_ilitek_ili9322 drm drm_panel_orientation_quirks ipack cdc_phonet usbcore phonet hid_jabra hid extcon_arizona can_dev industrialio_triggered_buffer kfifo_buf industrialio adm1031 i2c_mux_ltc4306 i2c_mux ipmi_msghandler mlxsw_core snd_soc_cs35l34 snd_soc_core snd_pcm_dmaengine snd_pcm snd_timer ac97_bus snd_compress snd soundcore gpio_da9055 uio ecdh_generic mdio_thunder of_mdio fixed_phy libphy mdio_cavium iptable_security iptable_raw iptable_mangle
         iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter bpfilter ip6_vti ip_vti ip_gre ipip sit tunnel4 ip_tunnel hsr veth netdevsim vxcan batman_adv cfg80211 rfkill chnl_net caif nlmon dummy team bonding vcan bridge stp llc ip6_gre gre ip6_tunnel tunnel6 tun joydev mousedev ppdev tpm kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel ide_pci_generic piix aes_x86_64 crypto_simd cryptd ide_core glue_helper input_leds psmouse intel_agp intel_gtt serio_raw ata_generic i2c_piix4 agpgart pata_acpi parport_pc parport floppy rtc_cmos sch_fq_codel ip_tables x_tables sha1_ssse3 sha1_generic ipv6 [last unloaded: br_netfilter]
        Dumping ftrace buffer:
           (ftrace buffer empty)
        ---[ end trace 68741688d5fbfe85 ]---
      
      commit 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer
      dereference in put_links") forgot to handle start_unregistering() case,
      while header->parent is NULL, it calls erase_header() and as seen in the
      above syzkaller call trace, accessing &header->parent->root will trigger
      a NULL pointer dereference.
      
      As that commit explained, there is also no need to call
      start_unregistering() if header->parent is NULL.
      
      Link: http://lkml.kernel.org/r/20190409153622.28112-1-yuehaibing@huawei.com
      Fixes: 23da9588037e ("fs/proc/proc_sysctl.c: fix NULL pointer dereference in put_links")
      Fixes: 0e47c99d ("sysctl: Replace root_list with links between sysctl_table_sets")
      Signed-off-by: NYueHaibing <yuehaibing@huawei.com>
      Reported-by: NHulk Robot <hulkci@huawei.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d476a00
    • A
      intel_th: gth: Fix an off-by-one in output unassigning · bce00f41
      Alexander Shishkin 提交于
      commit 91d3f8a629849968dc91d6ce54f2d46abf4feb7f upstream.
      
      Commit 9ed3f22223c3 ("intel_th: Don't reference unassigned outputs")
      fixes a NULL dereference for all masters except the last one ("256+"),
      which keeps the stale pointer after the output driver had been unassigned.
      
      Fix the off-by-one.
      Signed-off-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com>
      Fixes: 9ed3f22223c3 ("intel_th: Don't reference unassigned outputs")
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bce00f41
    • L
      slip: make slhc_free() silently accept an error pointer · 9c8c39ba
      Linus Torvalds 提交于
      commit baf76f0c58aec435a3a864075b8f6d8ee5d1f17e upstream.
      
      This way, slhc_free() accepts what slhc_init() returns, whether that is
      an error or not.
      
      In particular, the pattern in sl_alloc_bufs() is
      
              slcomp = slhc_init(16, 16);
              ...
              slhc_free(slcomp);
      
      for the error handling path, and rather than complicate that code, just
      make it ok to always free what was returned by the init function.
      
      That's what the code used to do before commit 4ab42d78 ("ppp, slip:
      Validate VJ compression slot parameters completely") when slhc_init()
      just returned NULL for the error case, with no actual indication of the
      details of the error.
      
      Reported-by: syzbot+45474c076a4927533d2e@syzkaller.appspotmail.com
      Fixes: 4ab42d78 ("ppp, slip: Validate VJ compression slot parameters completely")
      Acked-by: NBen Hutchings <ben@decadent.org.uk>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      9c8c39ba
    • K
      USB: Consolidate LPM checks to avoid enabling LPM twice · f41d2de6
      Kai-Heng Feng 提交于
      commit d7a6c0ce8d26412903c7981503bad9e1cc7c45d2 upstream.
      
      USB Bluetooth controller QCA ROME (0cf3:e007) sometimes stops working
      after S3:
      [ 165.110742] Bluetooth: hci0: using NVM file: qca/nvm_usb_00000302.bin
      [ 168.432065] Bluetooth: hci0: Failed to send body at 4 of 1953 (-110)
      
      After some experiments, I found that disabling LPM can workaround the
      issue.
      
      On some platforms, the USB power is cut during S3, so the driver uses
      reset-resume to resume the device. During port resume, LPM gets enabled
      twice, by usb_reset_and_verify_device() and usb_port_resume().
      
      Consolidate all checks into new LPM helpers to make sure LPM only gets
      enabled once.
      
      Fixes: de68bab4 ("usb: Don't enable USB 2.0 Link PM by default.”)
      Signed-off-by: NKai-Heng Feng <kai.heng.feng@canonical.com>
      Cc: stable <stable@vger.kernel.org> # after much soaking
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f41d2de6
    • K
      USB: Add new USB LPM helpers · 50cda889
      Kai-Heng Feng 提交于
      commit 7529b2574a7aaf902f1f8159fbc2a7caa74be559 upstream.
      
      Use new helpers to make LPM enabling/disabling more clear.
      
      This is a preparation to subsequent patch.
      Signed-off-by: NKai-Heng Feng <kai.heng.feng@canonical.com>
      Cc: stable <stable@vger.kernel.org> # after much soaking
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      50cda889
    • M
      drm/vc4: Fix compilation error reported by kbuild test bot · 8c700e90
      Maarten Lankhorst 提交于
      commit 462ce5d963f18b71c63f6b7730a35a2ee5273540 upstream.
      
      A pointer to crtc was missing, resulting in the following build error:
      drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse: sparse: incorrect type in argument 1 (different base types)
      drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse:    expected struct drm_crtc *crtc
      drivers/gpu/drm/vc4/vc4_crtc.c:1045:44: sparse:    got struct drm_crtc_state *state
      drivers/gpu/drm/vc4/vc4_crtc.c:1045:39: sparse: sparse: not enough arguments for function vc4_crtc_destroy_state
      Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Reported-by: Nkbuild test robot <lkp@intel.com>
      Cc: Eric Anholt <eric@anholt.net>
      Link: https://patchwork.freedesktop.org/patch/msgid/2b6ed5e6-81b0-4276-8860-870b54ca3262@linux.intel.com
      Fixes: d08106796a78 ("drm/vc4: Fix memory leak during gpu reset.")
      Cc: <stable@vger.kernel.org> # v4.6+
      Acked-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      8c700e90
    • D
      Revert "drm/i915/fbdev: Actually configure untiled displays" · 2bc7ce32
      Dave Airlie 提交于
      commit 9fa246256e09dc30820524401cdbeeaadee94025 upstream.
      
      This reverts commit d179b88deb3bf6fed4991a31fd6f0f2cad21fab5.
      
      This commit is documented to break userspace X.org modesetting driver in certain configurations.
      
      The X.org modesetting userspace driver is broken. No fixes are available yet. In order for this patch to be applied it either needs a config option or a workaround developed.
      
      This has been reported a few times, saying it's a userspace problem is clearly against the regression rules.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109806Signed-off-by: NDave Airlie <airlied@redhat.com>
      Cc: <stable@vger.kernel.org> # v3.19+
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2bc7ce32
    • M
      drm/vc4: Fix memory leak during gpu reset. · 2c892ef0
      Maarten Lankhorst 提交于
      commit d08106796a78a4273e39e1bbdf538dc4334b2635 upstream.
      
      __drm_atomic_helper_crtc_destroy_state does not free memory, it only
      cleans it up. Fix this by calling the functions own destroy function.
      
      Fixes: 6d6e5003 ("drm/vc4: Allocate the right amount of space for boot-time CRTC state.")
      Cc: Eric Anholt <eric@anholt.net>
      Cc: <stable@vger.kernel.org> # v4.6+
      Reviewed-by: NEric Anholt <eric@anholt.net>
      Signed-off-by: NMaarten Lankhorst <maarten.lankhorst@linux.intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20190301125627.7285-2-maarten.lankhorst@linux.intel.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2c892ef0
    • M
      powerpc/mm/radix: Make Radix require HUGETLB_PAGE · 087341c0
      Michael Ellerman 提交于
      commit 8adddf349fda0d3de2f6bb41ddf838cbf36a8ad2 upstream.
      
      Joel reported weird crashes using skiroot_defconfig, in his case we
      jumped into an NX page:
      
        kernel tried to execute exec-protected page (c000000002bff4f0) - exploit attempt? (uid: 0)
        BUG: Unable to handle kernel instruction fetch
        Faulting instruction address: 0xc000000002bff4f0
      
      Looking at the disassembly, we had simply branched to that address:
      
        c000000000c001bc  49fff335    bl     c000000002bff4f0
      
      But that didn't match the original kernel image:
      
        c000000000c001bc  4bfff335    bl     c000000000bff4f0 <kobject_get+0x8>
      
      When STRICT_KERNEL_RWX is enabled, and we're using the radix MMU, we
      call radix__change_memory_range() late in boot to change page
      protections. We do that both to mark rodata read only and also to mark
      init text no-execute. That involves walking the kernel page tables,
      and clearing _PAGE_WRITE or _PAGE_EXEC respectively.
      
      With radix we may use hugepages for the linear mapping, so the code in
      radix__change_memory_range() uses eg. pmd_huge() to test if it has
      found a huge mapping, and if so it stops the page table walk and
      changes the PMD permissions.
      
      However if the kernel is built without HUGETLBFS support, pmd_huge()
      is just a #define that always returns 0. That causes the code in
      radix__change_memory_range() to incorrectly interpret the PMD value as
      a pointer to a PTE page rather than as a PTE at the PMD level.
      
      We can see this using `dv` in xmon which also uses pmd_huge():
      
        0:mon> dv c000000000000000
        pgd  @ 0xc000000001740000
        pgdp @ 0xc000000001740000 = 0x80000000ffffb009
        pudp @ 0xc0000000ffffb000 = 0x80000000ffffa009
        pmdp @ 0xc0000000ffffa000 = 0xc00000000000018f   <- this is a PTE
        ptep @ 0xc000000000000100 = 0xa64bb17da64ab07d   <- kernel text
      
      The end result is we treat the value at 0xc000000000000100 as a PTE
      and clear _PAGE_WRITE or _PAGE_EXEC, potentially corrupting the code
      at that address.
      
      In Joel's specific case we cleared the sign bit in the offset of the
      branch, causing a backward branch to turn into a forward branch which
      caused us to branch into a non-executable page. However the exact
      nature of the crash depends on kernel version, compiler version, and
      other factors.
      
      We need to fix radix__change_memory_range() to not use accessors that
      depend on HUGETLBFS, but we also have radix memory hotplug code that
      uses pmd_huge() etc that will also need fixing. So for now just
      disallow the broken combination of Radix with HUGETLBFS disabled.
      
      The only defconfig we have that is affected is skiroot_defconfig, so
      turn on HUGETLBFS there so that it still gets Radix.
      
      Fixes: 566ca99a ("powerpc/mm/radix: Add dummy radix_enabled()")
      Cc: stable@vger.kernel.org # v4.7+
      Reported-by: NJoel Stanley <joel@jms.id.au>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      087341c0
    • A
      ARM: 8857/1: efi: enable CP15 DMB instructions before cleaning the cache · 478afe34
      Ard Biesheuvel 提交于
      commit e17b1af96b2afc38e684aa2f1033387e2ed10029 upstream.
      
      The EFI stub is entered with the caches and MMU enabled by the
      firmware, and once the stub is ready to hand over to the decompressor,
      we clean and disable the caches.
      
      The cache clean routines use CP15 barrier instructions, which can be
      disabled via SCTLR. Normally, when using the provided cache handling
      routines to enable the caches and MMU, this bit is enabled as well.
      However, but since we entered the stub with the caches already enabled,
      this routine is not executed before we call the cache clean routines,
      resulting in undefined instruction exceptions if the firmware never
      enabled this bit.
      
      So set the bit explicitly in the EFI entry code, but do so in a way that
      guarantees that the resulting code can still run on v6 cores as well
      (which are guaranteed to have CP15 barriers enabled)
      
      Cc: <stable@vger.kernel.org> # v4.9+
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      478afe34
    • A
      dmaengine: sh: rcar-dmac: Fix glitch in dmaengine_tx_status · 129c337c
      Achim Dahlhoff 提交于
      commit 6e7da74775348d96e2d7efaf3f91410e18c481ef upstream.
      
      The tx_status poll in the rcar_dmac driver reads the status register
      which indicates which chunk is busy (DMACHCRB). Afterwards the point
      inside the chunk is read from DMATCRB. It is possible that the chunk
      has changed between the two reads. The result is a non-monotonous
      increase of the residue. Fix this by introducing a 'safe read' logic.
      
      Fixes: 73a47bd0 ("dmaengine: rcar-dmac: use TCRB instead of TCR for residue")
      Signed-off-by: NAchim Dahlhoff <Achim.Dahlhoff@de.bosch.com>
      Signed-off-by: NDirk Behme <dirk.behme@de.bosch.com>
      Reviewed-by: NYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Cc: <stable@vger.kernel.org> # v4.16+
      Signed-off-by: NVinod Koul <vkoul@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      129c337c
    • D
      dmaengine: sh: rcar-dmac: With cyclic DMA residue 0 is valid · 0f00e1c5
      Dirk Behme 提交于
      commit 907bd68a2edc491849e2fdcfe52c4596627bca94 upstream.
      
      Having a cyclic DMA, a residue 0 is not an indication of a completed
      DMA. In case of cyclic DMA make sure that dma_set_residue() is called
      and with this a residue of 0 is forwarded correctly to the caller.
      
      Fixes: 3544d287 ("dmaengine: rcar-dmac: use result of updated get_residue in tx_status")
      Signed-off-by: NDirk Behme <dirk.behme@de.bosch.com>
      Signed-off-by: NAchim Dahlhoff <Achim.Dahlhoff@de.bosch.com>
      Signed-off-by: NHiroyuki Yokoyama <hiroyuki.yokoyama.vx@renesas.com>
      Signed-off-by: NYao Lihua <ylhuajnu@outlook.com>
      Reviewed-by: NYoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
      Reviewed-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: <stable@vger.kernel.org> # v4.8+
      Signed-off-by: NVinod Koul <vkoul@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      0f00e1c5
    • A
      vfio/type1: Limit DMA mappings per container · f7b467ad
      Alex Williamson 提交于
      commit 492855939bdb59c6f947b0b5b44af9ad82b7e38c upstream.
      
      Memory backed DMA mappings are accounted against a user's locked
      memory limit, including multiple mappings of the same memory.  This
      accounting bounds the number of such mappings that a user can create.
      However, DMA mappings that are not backed by memory, such as DMA
      mappings of device MMIO via mmaps, do not make use of page pinning
      and therefore do not count against the user's locked memory limit.
      These mappings still consume memory, but the memory is not well
      associated to the process for the purpose of oom killing a task.
      
      To add bounding on this use case, we introduce a limit to the total
      number of concurrent DMA mappings that a user is allowed to create.
      This limit is exposed as a tunable module option where the default
      value of 64K is expected to be well in excess of any reasonable use
      case (a large virtual machine configuration would typically only make
      use of tens of concurrent mappings).
      
      This fixes CVE-2019-3882.
      Reviewed-by: NEric Auger <eric.auger@redhat.com>
      Tested-by: NEric Auger <eric.auger@redhat.com>
      Reviewed-by: NPeter Xu <peterx@redhat.com>
      Reviewed-by: NCornelia Huck <cohuck@redhat.com>
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f7b467ad