1. 21 3月, 2019 2 次提交
  2. 17 3月, 2019 1 次提交
  3. 16 3月, 2019 1 次提交
    • J
      filemap: kill page_cache_read usage in filemap_fault · a75d4c33
      Josef Bacik 提交于
      Patch series "drop the mmap_sem when doing IO in the fault path", v6.
      
      Now that we have proper isolation in place with cgroups2 we have started
      going through and fixing the various priority inversions.  Most are all
      gone now, but this one is sort of weird since it's not necessarily a
      priority inversion that happens within the kernel, but rather because of
      something userspace does.
      
      We have giant applications that we want to protect, and parts of these
      giant applications do things like watch the system state to determine how
      healthy the box is for load balancing and such.  This involves running
      'ps' or other such utilities.  These utilities will often walk
      /proc/<pid>/whatever, and these files can sometimes need to
      down_read(&task->mmap_sem).  Not usually a big deal, but we noticed when
      we are stress testing that sometimes our protected application has latency
      spikes trying to get the mmap_sem for tasks that are in lower priority
      cgroups.
      
      This is because any down_write() on a semaphore essentially turns it into
      a mutex, so even if we currently have it held for reading, any new readers
      will not be allowed on to keep from starving the writer.  This is fine,
      except a lower priority task could be stuck doing IO because it has been
      throttled to the point that its IO is taking much longer than normal.  But
      because a higher priority group depends on this completing it is now stuck
      behind lower priority work.
      
      In order to avoid this particular priority inversion we want to use the
      existing retry mechanism to stop from holding the mmap_sem at all if we
      are going to do IO.  This already exists in the read case sort of, but
      needed to be extended for more than just grabbing the page lock.  With
      io.latency we throttle at submit_bio() time, so the readahead stuff can
      block and even page_cache_read can block, so all these paths need to have
      the mmap_sem dropped.
      
      The other big thing is ->page_mkwrite.  btrfs is particularly shitty here
      because we have to reserve space for the dirty page, which can be a very
      expensive operation.  We use the same retry method as the read path, and
      simply cache the page and verify the page is still setup properly the next
      pass through ->page_mkwrite().
      
      I've tested these patches with xfstests and there are no regressions.
      
      This patch (of 3):
      
      If we do not have a page at filemap_fault time we'll do this weird forced
      page_cache_read thing to populate the page, and then drop it again and
      loop around and find it.  This makes for 2 ways we can read a page in
      filemap_fault, and it's not really needed.  Instead add a FGP_FOR_MMAP
      flag so that pagecache_get_page() will return a unlocked page that's in
      pagecache.  Then use the normal page locking and readpage logic already in
      filemap_fault.  This simplifies the no page in page cache case
      significantly.
      
      [akpm@linux-foundation.org: fix comment text]
      [josef@toxicpanda.com: don't unlock null page in FGP_FOR_MMAP case]
        Link: http://lkml.kernel.org/r/20190312201742.22935-1-josef@toxicpanda.com
      Link: http://lkml.kernel.org/r/20181211173801.29535-2-josef@toxicpanda.comSigned-off-by: NJosef Bacik <josef@toxicpanda.com>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a75d4c33
  4. 15 3月, 2019 1 次提交
  5. 14 3月, 2019 1 次提交
  6. 13 3月, 2019 20 次提交
    • D
      tracing: kdb: Fix ftdump to not sleep · 31b265b3
      Douglas Anderson 提交于
      As reported back in 2016-11 [1], the "ftdump" kdb command triggers a
      BUG for "sleeping function called from invalid context".
      
      kdb's "ftdump" command wants to call ring_buffer_read_prepare() in
      atomic context.  A very simple solution for this is to add allocation
      flags to ring_buffer_read_prepare() so kdb can call it without
      triggering the allocation error.  This patch does that.
      
      Note that in the original email thread about this, it was suggested
      that perhaps the solution for kdb was to either preallocate the buffer
      ahead of time or create our own iterator.  I'm hoping that this
      alternative of adding allocation flags to ring_buffer_read_prepare()
      can be considered since it means I don't need to duplicate more of the
      core trace code into "trace_kdb.c" (for either creating my own
      iterator or re-preparing a ring allocator whose memory was already
      allocated).
      
      NOTE: another option for kdb is to actually figure out how to make it
      reuse the existing ftrace_dump() function and totally eliminate the
      duplication.  This sounds very appealing and actually works (the "sr
      z" command can be seen to properly dump the ftrace buffer).  The
      downside here is that ftrace_dump() fully consumes the trace buffer.
      Unless that is changed I'd rather not use it because it means "ftdump
      | grep xyz" won't be very useful to search the ftrace buffer since it
      will throw away the whole trace on the first grep.  A future patch to
      dump only the last few lines of the buffer will also be hard to
      implement.
      
      [1] https://lkml.kernel.org/r/20161117191605.GA21459@google.com
      
      Link: http://lkml.kernel.org/r/20190308193205.213659-1-dianders@chromium.orgReported-by: NBrian Norris <briannorris@chromium.org>
      Signed-off-by: NDouglas Anderson <dianders@chromium.org>
      Signed-off-by: NSteven Rostedt (VMware) <rostedt@goodmis.org>
      31b265b3
    • C
      f2fs: print more parameters in trace_f2fs_map_blocks · 76630f20
      Chao Yu 提交于
      for better map_blocks trace.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      76630f20
    • C
      f2fs: trace f2fs_ioc_shutdown · 559e87c4
      Chao Yu 提交于
      This patch supports to trace f2fs_ioc_shutdown.
      Signed-off-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      559e87c4
    • Z
      f2fs: correct spelling mistake · 68b79cdc
      Zeng Guangyue 提交于
      correct spelling mistake for "nunmber"
      Signed-off-by: NZeng Guangyue <zengguangyue@hisilicon.com>
      Reviewed-by: NChao Yu <yuchao0@huawei.com>
      Signed-off-by: NJaegeuk Kim <jaegeuk@kernel.org>
      68b79cdc
    • A
      dt-bindings: clock: imx8mq: Fix numbering overlaps and gaps · 010d5166
      Abel Vesa 提交于
      IMX8MQ_CLK_USB_PHY_REF changes from 163 to 153, this way removing the gap.
      All the following clock ids are now decreased by 10 to keep the numbering
      right. Doing this, the IMX8MQ_CLK_CSI2_CORE is not overlapped with
      IMX8MQ_CLK_GPT1 anymore. IMX8MQ_CLK_GPT1_ROOT changes from 193 to 183 and
      all the following ids are updated accordingly.
      Reported-by: NPatrick Wildt <patrick@blueri.se>
      Fixes: 1cf3817b ("dt-bindings: Add binding for i.MX8MQ CCM")
      Signed-off-by: NAbel Vesa <abel.vesa@nxp.com>
      Signed-off-by: NStephen Boyd <sboyd@kernel.org>
      010d5166
    • O
      fix null pointer deref in tracepoints in back channel · f87b543a
      Olga Kornievskaia 提交于
      Backchannel doesn't have the rq_task->tk_clientid pointer set.
      
      Otherwise can lead to the following oops:
      ocalhost login: [  111.385319] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [  111.388073] #PF error: [normal kernel read fault]
      [  111.389452] PGD 80000000290d8067 P4D 80000000290d8067 PUD 75f25067 PMD 0
      [  111.391224] Oops: 0000 [#1] SMP PTI
      [  111.392151] CPU: 0 PID: 3533 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
      [  111.393787] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
      [  111.396340] RIP: 0010:trace_event_raw_event_xprt_enq_xmit+0x6f/0xf0 [sunrpc]
      [  111.397974] Code: 00 00 00 48 89 ee 48 89 e7 e8 bd 0a 85 d7 48 85 c0 74 4a 41 0f b7 94 24 e0 00 00 00 48 89 e7 89 50 08 49 8b 94 24 a8 00 00 00 <8b> 52 04 89 50 0c 49 8b 94 24 c0 00 00 00 8b 92 a8 00 00 00 0f ca
      [  111.402215] RSP: 0018:ffffb98743263cf8 EFLAGS: 00010286
      [  111.403406] RAX: ffffa0890fc3bc88 RBX: 0000000000000003 RCX: 0000000000000000
      [  111.405057] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffb98743263cf8
      [  111.406656] RBP: ffffa0896f5368f0 R08: 0000000000000246 R09: 0000000000000000
      [  111.408437] R10: ffffe19b01c01500 R11: 0000000000000000 R12: ffffa08977d28a00
      [  111.410210] R13: 0000000000000004 R14: ffffa089315303f0 R15: ffffa08931530000
      [  111.411856] FS:  0000000000000000(0000) GS:ffffa0897bc00000(0000) knlGS:0000000000000000
      [  111.413699] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  111.415068] CR2: 0000000000000004 CR3: 000000002ac90004 CR4: 00000000001606f0
      [  111.416745] Call Trace:
      [  111.417339]  xprt_request_enqueue_transmit+0x2b6/0x4a0 [sunrpc]
      [  111.418709]  ? rpc_task_need_encode+0x40/0x40 [sunrpc]
      [  111.419957]  call_bc_transmit+0xd5/0x170 [sunrpc]
      [  111.421067]  __rpc_execute+0x7e/0x3f0 [sunrpc]
      [  111.422177]  rpc_run_bc_task+0x78/0xd0 [sunrpc]
      [  111.423212]  bc_svc_process+0x281/0x340 [sunrpc]
      [  111.424325]  nfs41_callback_svc+0x130/0x1c0 [nfsv4]
      [  111.425430]  ? remove_wait_queue+0x60/0x60
      [  111.426398]  kthread+0xf5/0x130
      [  111.427155]  ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
      [  111.428388]  ? kthread_bind+0x10/0x10
      [  111.429270]  ret_from_fork+0x1f/0x30
      
      localhost login: [  467.462259] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004
      [  467.464411] #PF error: [normal kernel read fault]
      [  467.465445] PGD 80000000728c1067 P4D 80000000728c1067 PUD 728c0067 PMD 0
      [  467.466980] Oops: 0000 [#1] SMP PTI
      [  467.467759] CPU: 0 PID: 3517 Comm: NFSv4 callback Not tainted 5.0.0-rc7+ #1
      [  467.469393] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/02/2015
      [  467.471840] RIP: 0010:trace_event_raw_event_xprt_transmit+0x7c/0xf0 [sunrpc]
      [  467.473392] Code: f6 48 85 c0 74 4b 49 8b 94 24 98 00 00 00 48 89 e7 0f b7 92 e0 00 00 00 89 50 08 49 8b 94 24 98 00 00 00 48 8b 92 a8 00 00 00 <8b> 52 04 89 50 0c 41 8b 94 24 a8 00 00 00 0f ca 89 50 10 41 8b 94
      [  467.477605] RSP: 0018:ffffabe7434fbcd0 EFLAGS: 00010282
      [  467.478793] RAX: ffff99720fc3bce0 RBX: 0000000000000003 RCX: 0000000000000000
      [  467.480409] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffabe7434fbcd0
      [  467.482011] RBP: ffff99726f631948 R08: 0000000000000246 R09: 0000000000000000
      [  467.483591] R10: 0000000070000000 R11: 0000000000000000 R12: ffff997277dfcc00
      [  467.485226] R13: 0000000000000000 R14: 0000000000000000 R15: ffff99722fecdca8
      [  467.486830] FS:  0000000000000000(0000) GS:ffff99727bc00000(0000) knlGS:0000000000000000
      [  467.488596] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  467.489931] CR2: 0000000000000004 CR3: 00000000270e6006 CR4: 00000000001606f0
      [  467.491559] Call Trace:
      [  467.492128]  xprt_transmit+0x303/0x3f0 [sunrpc]
      [  467.493143]  ? rpc_task_need_encode+0x40/0x40 [sunrpc]
      [  467.494328]  call_bc_transmit+0x49/0x170 [sunrpc]
      [  467.495379]  __rpc_execute+0x7e/0x3f0 [sunrpc]
      [  467.496451]  rpc_run_bc_task+0x78/0xd0 [sunrpc]
      [  467.497467]  bc_svc_process+0x281/0x340 [sunrpc]
      [  467.498507]  nfs41_callback_svc+0x130/0x1c0 [nfsv4]
      [  467.499751]  ? remove_wait_queue+0x60/0x60
      [  467.500686]  kthread+0xf5/0x130
      [  467.501438]  ? nfs_callback_authenticate+0x50/0x50 [nfsv4]
      [  467.502640]  ? kthread_bind+0x10/0x10
      [  467.503454]  ret_from_fork+0x1f/0x30
      Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      f87b543a
    • K
      Drop flex_arrays · 586187d7
      Kent Overstreet 提交于
      All existing users have been converted to generic radix trees
      
      Link: http://lkml.kernel.org/r/20181217131929.11727-8-kent.overstreet@gmail.comSigned-off-by: NKent Overstreet <kent.overstreet@gmail.com>
      Acked-by: NDave Hansen <dave.hansen@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      586187d7
    • K
      sctp: convert to genradix · 2075e50c
      Kent Overstreet 提交于
      This also makes sctp_stream_alloc_(out|in) saner, in that they no longer
      allocate new flex_arrays/genradixes, they just preallocate more
      elements.
      
      This code does however have a suspicious lack of locking.
      
      Link: http://lkml.kernel.org/r/20181217131929.11727-7-kent.overstreet@gmail.comSigned-off-by: NKent Overstreet <kent.overstreet@gmail.com>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2075e50c
    • K
      generic radix trees · ba20ba2e
      Kent Overstreet 提交于
      Very simple radix tree implementation that supports storing arbitrary
      size entries, up to PAGE_SIZE - upcoming patches will convert existing
      flex_array users to genradixes.  The new genradix code has a much
      simpler API and implementation, and doesn't have a hard limit on the
      number of elements like flex_array does.
      
      Link: http://lkml.kernel.org/r/20181217131929.11727-5-kent.overstreet@gmail.comSigned-off-by: NKent Overstreet <kent.overstreet@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Paul Moore <paul@paul-moore.com>
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ba20ba2e
    • M
      memblock: remove memblock_{set,clear}_region_flags · fe145124
      Mike Rapoport 提交于
      The memblock API provides dedicated helpers to set or clear a flag on a
      memory region, e.g.  memblock_{mark,clear}_hotplug().
      
      The memblock_{set,clear}_region_flags() functions are used only by the
      memblock internal function that adjusts the region flags.  Drop these
      functions and use open-coded implementation instead.
      
      Link: http://lkml.kernel.org/r/1549455025-17706-2-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fe145124
    • M
      memblock: drop memblock_alloc_*_nopanic() variants · 26fb3dae
      Mike Rapoport 提交于
      As all the memblock allocation functions return NULL in case of error
      rather than panic(), the duplicates with _nopanic suffix can be removed.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-22-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Petr Mladek <pmladek@suse.com>		[printk]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26fb3dae
    • M
      memblock: make memblock_find_in_range_node() and choose_memblock_flags() static · c366ea89
      Mike Rapoport 提交于
      These functions are not used outside memblock.  Make them static.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-12-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c366ea89
    • M
      memblock: refactor internal allocation functions · 92d12f95
      Mike Rapoport 提交于
      Currently, memblock has several internal functions with overlapping
      functionality.  They all call memblock_find_in_range_node() to find free
      memory and then reserve the allocated range and mark it with kmemleak.
      However, there is difference in the allocation constraints and in
      fallback strategies.
      
      The allocations returning physical address first attempt to find free
      memory on the specified node within mirrored memory regions, then retry
      on the same node without the requirement for memory mirroring and
      finally fall back to all available memory.
      
      The allocations returning virtual address start with clamping the
      allowed range to memblock.current_limit, attempt to allocate from the
      specified node from regions with mirroring and with user defined minimal
      address.  If such allocation fails, next attempt is done with node
      restriction lifted.  Next, the allocation is retried with minimal
      address reset to zero and at last without the requirement for mirrored
      regions.
      
      Let's consolidate various fallbacks handling and make them more
      consistent for physical and virtual variants.  Most of the fallback
      handling is moved to memblock_alloc_range_nid() and it now handles node
      and mirror fallbacks.
      
      The memblock_alloc_internal() uses memblock_alloc_range_nid() to get a
      physical address of the allocated range and converts it to virtual
      address.
      
      The fallback for allocation below the specified minimal address remains
      in memblock_alloc_internal() because memblock_alloc_range_nid() is used
      by CMA with exact requirement for lower bounds.
      
      The memblock_phys_alloc_nid() function is completely dropped as it is not
      used anywhere outside memblock and its only usage can be replaced by a
      call to memblock_alloc_range_nid().
      
      [rppt@linux.ibm.com: fix parameter order in memblock_phys_alloc_try_nid()]
        Link: http://lkml.kernel.org/r/20190203113915.GC8620@rapoport-lnx
      Link: http://lkml.kernel.org/r/1548057848-15136-11-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Tested-by: NMichael Ellerman <mpe@ellerman.id.au>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      92d12f95
    • M
      memblock: drop memblock_alloc_base() · 0ba9e6ed
      Mike Rapoport 提交于
      The memblock_alloc_base() function tries to allocate a memory up to the
      limit specified by its max_addr parameter and panics if the allocation
      fails.  Replace its usage with memblock_phys_alloc_range() and make the
      callers check the return value and panic in case of error.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-10-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>		[powerpc]
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ba9e6ed
    • M
      memblock: drop __memblock_alloc_base() · 42b46aef
      Mike Rapoport 提交于
      The __memblock_alloc_base() function tries to allocate a memory up to
      the limit specified by its max_addr parameter.  Depending on the value
      of this parameter, the __memblock_alloc_base() can is replaced with the
      appropriate memblock_phys_alloc*() variant.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-9-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Acked-by: NRob Herring <robh@kernel.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42b46aef
    • M
      memblock: memblock_phys_alloc(): don't panic · ecc3e771
      Mike Rapoport 提交于
      Make the memblock_phys_alloc() function an inline wrapper for
      memblock_phys_alloc_range() and update the memblock_phys_alloc() callers
      to check the returned value and panic in case of error.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-8-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ecc3e771
    • M
      memblock: emphasize that memblock_alloc_range() returns a physical address · 8a770c2a
      Mike Rapoport 提交于
      Rename memblock_alloc_range() to memblock_phys_alloc_range() to
      emphasize that it returns a physical address.
      
      While on it, remove the 'enum memblock_flags' parameter from this
      function as its only user anyway sets it to MEMBLOCK_NONE, which is the
      default for the most of memblock allocations.
      
      Link: http://lkml.kernel.org/r/1548057848-15136-6-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a770c2a
    • M
      memblock: drop memblock_alloc_base_nid() · 53d818d2
      Mike Rapoport 提交于
      memblock_alloc_base_nid() is a oneliner wrapper for
      memblock_alloc_range_nid() without any side effect.
      
      Replace it's usage by the direct calls to memblock_alloc_range_nid().
      
      Link: http://lkml.kernel.org/r/1548057848-15136-5-git-send-email-rppt@linux.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      53d818d2
    • N
      mm: refactor readahead defines in mm.h · b5420237
      Nikolay Borisov 提交于
      All users of VM_MAX_READAHEAD actually convert it to kbytes and then to
      pages. Define the macro explicitly as (SZ_128K / PAGE_SIZE). This
      simplifies the expression in every filesystem. Also rename the macro to
      VM_READAHEAD_PAGES to properly convey its meaning. Finally remove unused
      VM_MIN_READAHEAD
      
      [akpm@linux-foundation.org: fix fs/io_uring.c, per Stephen]
      Link: http://lkml.kernel.org/r/20181221144053.24318-1-nborisov@suse.comSigned-off-by: NNikolay Borisov <nborisov@suse.com>
      Reviewed-by: NMatthew Wilcox <willy@infradead.org>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Cc: Dominique Martinet <asmadeus@codewreck.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Chris Mason <clm@fb.com>
      Cc: Josef Bacik <josef@toxicpanda.com>
      Cc: David Sterba <dsterba@suse.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b5420237
    • S
      mm/hmm: convert to use vm_fault_t · b57e622e
      Souptick Joarder 提交于
      Convert to use vm_fault_t type as return type for fault handler.
      
      kbuild reported warning during testing of
      *mm-create-the-new-vm_fault_t-type.patch* available in below link -
      https://patchwork.kernel.org/patch/10752741/
      
        kernel/memremap.c:46:34: warning: incorrect type in return expression
                                 (different base types)
        kernel/memremap.c:46:34: expected restricted vm_fault_t
        kernel/memremap.c:46:34: got int
      
      This patch has fixed the warnings and also hmm_devmem_fault() is
      converted to return vm_fault_t to avoid further warnings.
      
      [sfr@canb.auug.org.au: drm/nouveau/dmem: update for struct hmm_devmem_ops member change]
        Link: http://lkml.kernel.org/r/20190220174407.753d94e5@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20190110145900.GA1317@jordon-HP-15-Notebook-PCSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Reviewed-by: NJérôme Glisse <jglisse@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b57e622e
  7. 12 3月, 2019 2 次提交
    • R
      PM / wakeup: Drop wakeup_source_drop() · 623217a0
      Rafael J. Wysocki 提交于
      After commit d856f39ac1cc ("PM / wakeup: Rework wakeup source timer
      cancellation") wakeup_source_drop() is a trivial wrapper around
      __pm_relax() and it has no users except for wakeup_source_destroy()
      and wakeup_source_trash() which also has no users, so drop it along
      with the latter and make wakeup_source_destroy() call __pm_relax()
      directly.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      623217a0
    • A
      y2038: fix socket.h header inclusion · a623a7a1
      Arnd Bergmann 提交于
      Referencing the __kernel_long_t type caused some user space applications
      to stop compiling when they had not already included linux/posix_types.h,
      e.g.
      
      s/multicast.c -o ext/sockets/multicast.lo
      In file included from /builddir/build/BUILD/php-7.3.3/main/php.h:468,
                       from /builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:27:
      /builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c: In function 'zm_startup_sockets':
      /builddir/build/BUILD/php-7.3.3/ext/sockets/sockets.c:776:40: error: '__kernel_long_t' undeclared (first use in this function)
        776 |  REGISTER_LONG_CONSTANT("SO_SNDTIMEO", SO_SNDTIMEO, CONST_CS | CONST_PERSISTENT);
      
      It is safe to include that header here, since it only contains kernel
      internal types that do not conflict with other user space types.
      
      It's still possible that some related build failures remain, but those
      are likely to be for code that is not already y2038 safe.
      Reported-by: NLaura Abbott <labbott@redhat.com>
      Fixes: a9beb86a ("sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a623a7a1
  8. 11 3月, 2019 2 次提交
  9. 10 3月, 2019 2 次提交
    • E
      ip: fix ip_mc_may_pull() return value · 083b78a9
      Eric Dumazet 提交于
      ip_mc_may_pull() must return 0 if there is a problem, not an errno.
      
      syzbot reported :
      
      BUG: KASAN: use-after-free in br_ip4_multicast_igmp3_report net/bridge/br_multicast.c:947 [inline]
      BUG: KASAN: use-after-free in br_multicast_ipv4_rcv net/bridge/br_multicast.c:1631 [inline]
      BUG: KASAN: use-after-free in br_multicast_rcv+0x3cd8/0x4440 net/bridge/br_multicast.c:1741
      Read of size 4 at addr ffff88820a4084ee by task syz-executor.2/11183
      
      CPU: 1 PID: 11183 Comm: syz-executor.2 Not tainted 5.0.0+ #14
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x172/0x1f0 lib/dump_stack.c:113
       print_address_description.cold+0x7c/0x20d mm/kasan/report.c:187
       kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
       __asan_report_load4_noabort+0x14/0x20 mm/kasan/generic_report.c:131
       br_ip4_multicast_igmp3_report net/bridge/br_multicast.c:947 [inline]
       br_multicast_ipv4_rcv net/bridge/br_multicast.c:1631 [inline]
       br_multicast_rcv+0x3cd8/0x4440 net/bridge/br_multicast.c:1741
       br_handle_frame_finish+0xa3a/0x14c0 net/bridge/br_input.c:108
       br_nf_hook_thresh+0x2ec/0x380 net/bridge/br_netfilter_hooks.c:1005
       br_nf_pre_routing_finish+0x8e2/0x1750 net/bridge/br_netfilter_hooks.c:410
       NF_HOOK include/linux/netfilter.h:289 [inline]
       NF_HOOK include/linux/netfilter.h:283 [inline]
       br_nf_pre_routing+0x7e7/0x13a0 net/bridge/br_netfilter_hooks.c:506
       nf_hook_entry_hookfn include/linux/netfilter.h:119 [inline]
       nf_hook_slow+0xbf/0x1f0 net/netfilter/core.c:511
       nf_hook include/linux/netfilter.h:244 [inline]
       NF_HOOK include/linux/netfilter.h:287 [inline]
       br_handle_frame+0x95b/0x1450 net/bridge/br_input.c:305
       __netif_receive_skb_core+0xa96/0x3040 net/core/dev.c:4902
       __netif_receive_skb_one_core+0xa8/0x1a0 net/core/dev.c:4971
       __netif_receive_skb+0x2c/0x1c0 net/core/dev.c:5083
       netif_receive_skb_internal+0x117/0x660 net/core/dev.c:5186
       netif_receive_skb+0x6e/0x5a0 net/core/dev.c:5261
      
      Fixes: ba5ea614 ("bridge: simplify ip_mc_check_igmp() and ipv6_mc_check_mld() calls")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: Nsyzbot <syzkaller@googlegroups.com>
      Cc: Linus Lüssing <linus.luessing@c0d3.blue>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      083b78a9
    • G
      net: keep refcount warning in reqsk_free() · 1039c6e1
      Guillaume Nault 提交于
      As Eric Dumazet said, "We do not have a way to tell if the req was ever
      inserted in a hash table, so better play safe.".
      Let's remove this comment, so that nobody will be tempted to drop the
      WARN_ON_ONCE() line.
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1039c6e1
  10. 09 3月, 2019 3 次提交
    • P
      net: add missing documentation in linux/skbuff.h · 161e6137
      Pedro Tammela 提交于
      This patch adds missing documentation for some inline functions on
      linux/skbuff.h. The patch is incomplete and a lot more can be added,
      just wondering if it's of interest of the netdev developers.
      
      Also fixed some whitespaces.
      Signed-off-by: NPedro Tammela <pctammela@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      161e6137
    • B
      bpf: fix warning about using plain integer as NULL · 71b91a50
      Bo YU 提交于
      Sparse warning below:
      
      sudo make C=2 CF=-D__CHECK_ENDIAN__ M=net/bpf/
      CHECK   net/bpf//test_run.c
      net/bpf//test_run.c:19:77: warning: Using plain integer as NULL pointer
      ./include/linux/bpf-cgroup.h:295:77: warning: Using plain integer as NULL pointer
      
      Fixes: 8bad74f9 ("bpf: extend cgroup bpf core to allow multiple cgroup storage types")
      Acked-by: NYonghong Song <yhs@fb.com>
      Signed-off-by: NBo YU <tsu.yubo@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      71b91a50
    • D
      rxrpc: Fix client call connect/disconnect race · 930c9f91
      David Howells 提交于
      rxrpc_disconnect_client_call() reads the call's connection ID protocol
      value (call->cid) as part of that function's variable declarations.  This
      is bad because it's not inside the locked section and so may race with
      someone granting use of the channel to the call.
      
      This manifests as an assertion failure (see below) where the call in the
      presumed channel (0 because call->cid wasn't set when we read it) doesn't
      match the call attached to the channel we were actually granted (if 1, 2 or
      3).
      
      Fix this by moving the read and dependent calculations inside of the
      channel_lock section.  Also, only set the channel number and pointer
      variables if cid is not zero (ie. unset).
      
      This problem can be induced by injecting an occasional error in
      rxrpc_wait_for_channel() before the call to schedule().
      
      Make two further changes also:
      
       (1) Add a trace for wait failure in rxrpc_connect_call().
      
       (2) Drop channel_lock before BUG'ing in the case of the assertion failure.
      
      The failure causes a trace akin to the following:
      
      rxrpc: Assertion failed - 18446612685268945920(0xffff8880beab8c00) == 18446612685268621312(0xffff8880bea69800) is false
      ------------[ cut here ]------------
      kernel BUG at net/rxrpc/conn_client.c:824!
      ...
      RIP: 0010:rxrpc_disconnect_client_call+0x2bf/0x99d
      ...
      Call Trace:
       rxrpc_connect_call+0x902/0x9b3
       ? wake_up_q+0x54/0x54
       rxrpc_new_client_call+0x3a0/0x751
       ? rxrpc_kernel_begin_call+0x141/0x1bc
       ? afs_alloc_call+0x1b5/0x1b5
       rxrpc_kernel_begin_call+0x141/0x1bc
       afs_make_call+0x20c/0x525
       ? afs_alloc_call+0x1b5/0x1b5
       ? __lock_is_held+0x40/0x71
       ? lockdep_init_map+0xaf/0x193
       ? lockdep_init_map+0xaf/0x193
       ? __lock_is_held+0x40/0x71
       ? yfs_fs_fetch_data+0x33b/0x34a
       yfs_fs_fetch_data+0x33b/0x34a
       afs_fetch_data+0xdc/0x3b7
       afs_read_dir+0x52d/0x97f
       afs_dir_iterate+0xa0/0x661
       ? iterate_dir+0x63/0x141
       iterate_dir+0xa2/0x141
       ksys_getdents64+0x9f/0x11b
       ? filldir+0x111/0x111
       ? do_syscall_64+0x3e/0x1a0
       __x64_sys_getdents64+0x16/0x19
       do_syscall_64+0x7d/0x1a0
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Fixes: 45025bce ("rxrpc: Improve management and caching of client connection objects")
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Reviewed-by: NMarc Dionne <marc.dionne@auristor.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      930c9f91
  11. 08 3月, 2019 5 次提交
    • P
      netfilter: nf_tables: fix set double-free in abort path · 40ba1d9b
      Pablo Neira Ayuso 提交于
      The abort path can cause a double-free of an anonymous set.
      Added-and-to-be-aborted rule looks like this:
      
      udp dport { 137, 138 } drop
      
      The to-be-aborted transaction list looks like this:
      
      newset
      newsetelem
      newsetelem
      rule
      
      This gets walked in reverse order, so first pass disables the rule, the
      set elements, then the set.
      
      After synchronize_rcu(), we then destroy those in same order: rule, set
      element, set element, newset.
      
      Problem is that the anonymous set has already been bound to the rule, so
      the rule (lookup expression destructor) already frees the set, when then
      cause use-after-free when trying to delete the elements from this set,
      then try to free the set again when handling the newset expression.
      
      Rule releases the bound set in first place from the abort path, this
      causes the use-after-free on set element removal when undoing the new
      element transactions. To handle this, skip new element transaction if
      set is bound from the abort path.
      
      This is still causes the use-after-free on set element removal.  To
      handle this, remove transaction from the list when the set is already
      bound.
      
      Joint work with Florian Westphal.
      
      Fixes: f6ac8585 ("netfilter: nf_tables: unbind set in rule from commit path")
      Bugzilla: https://bugzilla.netfilter.org/show_bug.cgi?id=1325Acked-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      40ba1d9b
    • L
      include/linux/relay.h: fix percpu annotation in struct rchan · 62461ac2
      Luc Van Oostenryck 提交于
      The percpu member of this structure is declared as:
      	struct ... ** __percpu member;
      So its type is:
      	__percpu pointer to pointer to struct ...
      
      But looking at how it's used, its type should be:
      	pointer to __percpu pointer to struct ...
      and it should thus be declared as:
      	struct ... * __percpu *member;
      
      So fix the placement of '__percpu' in the definition of this
      structures.
      
      This silents a few Sparse's warnings like:
      	warning: incorrect type in initializer (different address spaces)
      	  expected void const [noderef] <asn:3> *__vpp_verify
      	  got struct sched_domain **
      
      Link: http://lkml.kernel.org/r/20190118144902.79065-1-luc.vanoostenryck@gmail.com
      Fixes: 017c59c0 ("relay: Use per CPU constructs for the relay channel buffer pointers")
      Signed-off-by: NLuc Van Oostenryck <luc.vanoostenryck@gmail.com>
      Cc: Jens Axboe <axboe@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      62461ac2
    • S
      mm: create the new vm_fault_t type · 3d353901
      Souptick Joarder 提交于
      Page fault handlers are supposed to return VM_FAULT codes, but some
      drivers/file systems mistakenly return error numbers.  Now that all
      drivers/file systems have been converted to use the vm_fault_t return
      type, change the type definition to no longer be compatible with 'int'.
      By making it an unsigned int, the function prototype becomes
      incompatible with a function which returns int.  Sparse will detect any
      attempts to return a value which is not a VM_FAULT code.
      
      VM_FAULT_SET_HINDEX and VM_FAULT_GET_HINDEX values are changed to avoid
      conflict with other VM_FAULT codes.
      
      [jrdr.linux@gmail.com: fix warnings]
        Link: http://lkml.kernel.org/r/20190109183742.GA24326@jordon-HP-15-Notebook-PC
      Link: http://lkml.kernel.org/r/20190108183041.GA12137@jordon-HP-15-Notebook-PCSigned-off-by: NSouptick Joarder <jrdr.linux@gmail.com>
      Reviewed-by: NWilliam Kucharski <william.kucharski@oracle.com>
      Reviewed-by: NMike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: NMatthew Wilcox <willy@infradead.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3d353901
    • D
      lib/lzo: separate lzo-rle from lzo · 45ec975e
      Dave Rodgman 提交于
      To prevent any issues with persistent data, separate lzo-rle from lzo so
      that it is treated as a separate algorithm, and lzo is still available.
      
      Link: http://lkml.kernel.org/r/20190205155944.16007-3-dave.rodgman@arm.comSigned-off-by: NDave Rodgman <dave.rodgman@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com>
      Cc: Matt Sealey <matt.sealey@arm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <nitingupta910@gmail.com>
      Cc: Richard Purdie <rpurdie@openedhand.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      45ec975e
    • D
      lib/lzo: implement run-length encoding · 5ee4014a
      Dave Rodgman 提交于
      Patch series "lib/lzo: run-length encoding support", v5.
      
      Following on from the previous lzo-rle patchset:
      
        https://lkml.org/lkml/2018/11/30/972
      
      This patchset contains only the RLE patches, and should be applied on
      top of the non-RLE patches ( https://lkml.org/lkml/2019/2/5/366 ).
      
      Previously, some questions were raised around the RLE patches.  I've
      done some additional benchmarking to answer these questions.  In short:
      
       - RLE offers significant additional performance (data-dependent)
      
       - I didn't measure any regressions that were clearly outside the noise
      
      One concern with this patchset was around performance - specifically,
      measuring RLE impact separately from Matt Sealey's patches (CTZ & fast
      copy).  I have done some additional benchmarking which I hope clarifies
      the benefits of each part of the patchset.
      
      Firstly, I've captured some memory via /dev/fmem from a Chromebook with
      many tabs open which is starting to swap, and then split this into 4178
      4k pages.  I've excluded the all-zero pages (as zram does), and also the
      no-zero pages (which won't tell us anything about RLE performance).
      This should give a realistic test dataset for zram.  What I found was
      that the data is VERY bimodal: 44% of pages in this dataset contain 5%
      or fewer zeros, and 44% contain over 90% zeros (30% if you include the
      no-zero pages).  This supports the idea of special-casing zeros in zram.
      
      Next, I've benchmarked four variants of lzo on these pages (on 64-bit
      Arm at max frequency): baseline LZO; baseline + Matt Sealey's patches
      (aka MS); baseline + RLE only; baseline + MS + RLE.  Numbers are for
      weighted roundtrip throughput (the weighting reflects that zram does
      more compression than decompression).
      
        https://drive.google.com/file/d/1VLtLjRVxgUNuWFOxaGPwJYhl_hMQXpHe/view?usp=sharing
      
      Matt's patches help in all cases for Arm (and no effect on Intel), as
      expected.
      
      RLE also behaves as expected: with few zeros present, it makes no
      difference; above ~75%, it gives a good improvement (50 - 300 MB/s on
      top of the benefit from Matt's patches).
      
      Best performance is seen with both MS and RLE patches.
      
      Finally, I have benchmarked the same dataset on an x86-64 device.  Here,
      the MS patches make no difference (as expected); RLE helps, similarly as
      on Arm.  There were no definite regressions; allowing for observational
      error, 0.1% (3/4178) of cases had a regression > 1 standard deviation,
      of which the largest was 4.6% (1.2 standard deviations).  I think this
      is probably within the noise.
      
        https://drive.google.com/file/d/1xCUVwmiGD0heEMx5gcVEmLBI4eLaageV/view?usp=sharing
      
      One point to note is that the graphs show RLE appears to help very
      slightly with no zeros present! This is because the extra code causes
      the clang optimiser to change code layout in a way that happens to have
      a significant benefit.  Taking baseline LZO and adding a do-nothing line
      like "__builtin_prefetch(out_len);" immediately before the "goto next"
      has the same effect.  So this is a real, but basically spurious effect -
      it's small enough not to upset the overall findings.
      
      This patch (of 3):
      
      When using zram, we frequently encounter long runs of zero bytes.  This
      adds a special case which identifies runs of zeros and encodes them
      using run-length encoding.
      
      This is faster for both compression and decompresion.  For high-entropy
      data which doesn't hit this case, impact is minimal.
      
      Compression ratio is within a few percent in all cases.
      
      This modifies the bitstream in a way which is backwards compatible
      (i.e., we can decompress old bitstreams, but old versions of lzo cannot
      decompress new bitstreams).
      
      Link: http://lkml.kernel.org/r/20190205155944.16007-2-dave.rodgman@arm.comSigned-off-by: NDave Rodgman <dave.rodgman@arm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Markus F.X.J. Oberhumer <markus@oberhumer.com>
      Cc: Matt Sealey <matt.sealey@arm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <nitingupta910@gmail.com>
      Cc: Richard Purdie <rpurdie@openedhand.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Sonny Rao <sonnyrao@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5ee4014a