1. 07 Feb 2020, 2 commits
  2. 06 Feb 2020, 1 commit
    • skbuff: fix a data race in skb_queue_len() · 86b18aaa
      Qian Cai committed
      sk_buff.qlen can be accessed concurrently as noticed by KCSAN,
      
       BUG: KCSAN: data-race in __skb_try_recv_from_queue / unix_dgram_sendmsg
      
       read to 0xffff8a1b1d8a81c0 of 4 bytes by task 5371 on cpu 96:
        unix_dgram_sendmsg+0x9a9/0xb70 include/linux/skbuff.h:1821
      				 net/unix/af_unix.c:1761
        ____sys_sendmsg+0x33e/0x370
        ___sys_sendmsg+0xa6/0xf0
        __sys_sendmsg+0x69/0xf0
        __x64_sys_sendmsg+0x51/0x70
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
       write to 0xffff8a1b1d8a81c0 of 4 bytes by task 1 on cpu 99:
        __skb_try_recv_from_queue+0x327/0x410 include/linux/skbuff.h:2029
        __skb_try_recv_datagram+0xbe/0x220
        unix_dgram_recvmsg+0xee/0x850
        ____sys_recvmsg+0x1fb/0x210
        ___sys_recvmsg+0xa2/0xf0
        __sys_recvmsg+0x66/0xf0
        __x64_sys_recvmsg+0x51/0x70
        do_syscall_64+0x91/0xb47
        entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Since only the read side is lockless, load tearing could introduce a
      logic bug in unix_recvq_full(). Fix it by adding lockless variants of
      skb_queue_len() and unix_recvq_full() that use READ_ONCE() on the read
      side and WRITE_ONCE() on the write side, similar to commit d7d16a89
      ("net: add skb_queue_empty_lockless()"). A sketch of the resulting
      helpers follows below.
      Signed-off-by: Qian Cai <cai@lca.pw>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      86b18aaa
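      A minimal sketch of the lockless pairing described above, assuming the
      new helper keeps the skb_queue_len_lockless() naming (illustrative, not
      the verbatim patch):

        /* Lockless read of the queue length. */
        static inline __u32 skb_queue_len_lockless(const struct sk_buff_head *list)
        {
                return READ_ONCE(list->qlen);
        }

        /* The write side pairs with it under the queue lock, e.g. in
         * __skb_unlink():  WRITE_ONCE(list->qlen, list->qlen - 1);
         */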
  3. 05 Feb 2020, 3 commits
    • bonding/alb: properly access headers in bond_alb_xmit() · 38f88c45
      Eric Dumazet committed
      syzbot managed to send an IPX packet through bond_alb_xmit()
      and af_packet and triggered a use-after-free.
      
      First, bond_alb_xmit() was using the ipx_hdr() helper to reach the
      IPX header, but ipx_hdr() used the transport offset instead of the
      network offset. In the particular syzbot report the transport offset
      was 0xFFFF.
      
      This patch removes ipx_hdr() since it was only (mis)used from bonding.
      
      Then we need to make sure IPv4/IPv6/IPX headers are pulled
      in skb->head before dereferencing anything.
      
      BUG: KASAN: use-after-free in bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
      Read of size 2 at addr ffff8801ce56dfff by task syz-executor.2/18108
       (if (ipx_hdr(skb)->ipx_checksum != IPX_NO_CHECKSUM) ...)
      
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       [<ffffffff8441fc42>] __dump_stack lib/dump_stack.c:17 [inline]
       [<ffffffff8441fc42>] dump_stack+0x14d/0x20b lib/dump_stack.c:53
       [<ffffffff81a7dec4>] print_address_description+0x6f/0x20b mm/kasan/report.c:282
       [<ffffffff81a7e0ec>] kasan_report_error mm/kasan/report.c:380 [inline]
       [<ffffffff81a7e0ec>] kasan_report mm/kasan/report.c:438 [inline]
       [<ffffffff81a7e0ec>] kasan_report.cold+0x8c/0x2a0 mm/kasan/report.c:422
       [<ffffffff81a7dc4f>] __asan_report_load_n_noabort+0xf/0x20 mm/kasan/report.c:469
       [<ffffffff82c8c00a>] bond_alb_xmit+0x153a/0x1590 drivers/net/bonding/bond_alb.c:1452
       [<ffffffff82c60c74>] __bond_start_xmit drivers/net/bonding/bond_main.c:4199 [inline]
       [<ffffffff82c60c74>] bond_start_xmit+0x4f4/0x1570 drivers/net/bonding/bond_main.c:4224
       [<ffffffff83baa558>] __netdev_start_xmit include/linux/netdevice.h:4525 [inline]
       [<ffffffff83baa558>] netdev_start_xmit include/linux/netdevice.h:4539 [inline]
       [<ffffffff83baa558>] xmit_one net/core/dev.c:3611 [inline]
       [<ffffffff83baa558>] dev_hard_start_xmit+0x168/0x910 net/core/dev.c:3627
       [<ffffffff83bacf35>] __dev_queue_xmit+0x1f55/0x33b0 net/core/dev.c:4238
       [<ffffffff83bae3a8>] dev_queue_xmit+0x18/0x20 net/core/dev.c:4278
       [<ffffffff84339189>] packet_snd net/packet/af_packet.c:3226 [inline]
       [<ffffffff84339189>] packet_sendmsg+0x4919/0x70b0 net/packet/af_packet.c:3252
       [<ffffffff83b1ac0c>] sock_sendmsg_nosec net/socket.c:673 [inline]
       [<ffffffff83b1ac0c>] sock_sendmsg+0x12c/0x160 net/socket.c:684
       [<ffffffff83b1f5a2>] __sys_sendto+0x262/0x380 net/socket.c:1996
       [<ffffffff83b1f700>] SYSC_sendto net/socket.c:2008 [inline]
       [<ffffffff83b1f700>] SyS_sendto+0x40/0x60 net/socket.c:2004
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      38f88c45
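      The second part of the fix follows the usual pattern of pulling protocol
      headers into the linear area before touching them; a hedged sketch (the
      drop label and surrounding flow are illustrative, not the literal diff):

        /* Make sure the whole IPX header is in skb->head before reading it,
         * instead of trusting a possibly bogus offset. */
        if (!pskb_may_pull(skb, skb_network_offset(skb) + sizeof(struct ipxhdr)))
                goto drop;      /* hypothetical error path */
        /* only now is it safe to look at ipx_checksum etc. */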
    • net: dsa: microchip: Platform data shan't include kernel.h · 8b7a07c7
      Andy Shevchenko committed
      Replace with appropriate types.h.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8b7a07c7
    • net: dsa: b53: Platform data shan't include kernel.h · e22e0790
      Andy Shevchenko committed
      Replace with appropriate types.h.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      e22e0790
  4. 04 Feb 2020, 26 commits
    • asm-generic: Make dma-contiguous.h a mandatory include/asm header · def3f7ce
      Michal Simek committed
      dma-contiguous.h is generic for all architectures except arm32, which
      has its own version.
      
      A similar change was done for msi.h by commit a1b39bae
      ("asm-generic: Make msi.h a mandatory include/asm header").
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Link: https://lore.kernel.org/linux-arm-kernel/20200117080446.GA8980@lst.de/T/#m92bb56b04161057635d4142e1b3b9b6b0a70122e
      Signed-off-by: Michal Simek <michal.simek@xilinx.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Paul Walmsley <paul.walmsley@sifive.com> # for arch/riscv
      def3f7ce
    • include/linux/cpumask.h: don't calculate length of the input string · 190535f7
      Yury Norov committed
      The new design of the inner bitmap_parse() makes it possible to avoid
      calculating the length of a null-terminated string.
      
      Link: http://lkml.kernel.org/r/20200102043031.30357-8-yury.norov@gmail.com
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Amritha Nambiar <amritha.nambiar@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: "Tobin C . Harding" <tobin@kernel.org>
      Cc: Vineet Gupta <vineet.gupta1@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      190535f7
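      A sketch of what the simplification amounts to, assuming the reworked
      bitmap_parse() treats UINT_MAX as "parse up to the NUL terminator"
      (illustrative, not the verbatim diff):

        static inline int cpumask_parse(const char *buf, struct cpumask *dstp)
        {
                /* No more strchrnul()/strlen() dance to find the end of buf. */
                return bitmap_parse(buf, UINT_MAX, cpumask_bits(dstp), nr_cpumask_bits);
        }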
    • lib: rework bitmap_parse() · 2d626158
      Yury Norov committed
      bitmap_parse() is inefficient and full of opaque variables and open-coded
      parts, which makes it hard to understand and use.  This rework includes:
      
      - remove the bitmap_shift_left() call from the loop.  Currently it makes
        the complexity of the algorithm O(nbits^2).  In the suggested approach
        the input string is parsed in reverse direction, so no shifts are
        needed;
      
      - relax the requirement of a single comma and no whitespace between
        chunks.  This is considered useful in scripting, and it aligns with
        bitmap_parselist();
      
      - split bitmap_parse() into small readable helpers;
      
      - make an explicit calculation of the end of the input line at the
        beginning, so users of bitmap_parse() don't have to do it themselves.
      
      Link: http://lkml.kernel.org/r/20200102043031.30357-6-yury.norov@gmail.com
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Cc: Amritha Nambiar <amritha.nambiar@intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: "Tobin C . Harding" <tobin@kernel.org>
      Cc: Vineet Gupta <vineet.gupta1@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      2d626158
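      A short usage sketch of the reworked parser, reusing the "bit 42" example
      from the cover letter (hedged: DECLARE_BITMAP()/bitmap_parse() are existing
      API, and UINT_MAX as the length is assumed to mean "NUL-terminated"):

        DECLARE_BITMAP(mask, 64);

        /* "400, 0": 0x400 is the upper 32-bit chunk, so only bit 42 is set.
         * Whitespace between chunks is now tolerated. */
        int err = bitmap_parse("400, 0", UINT_MAX, mask, 64);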
    • bitops: more BITS_TO_* macros · 0bddc1bd
      Yury Norov committed
      Introduce BITS_TO_U64, BITS_TO_U32 and BITS_TO_BYTES, as they are handy in
      the following patches (BITS_TO_U32 specifically).  Reimplement the tools/
      version of the macros according to the kernel implementation.
      
      Also fix indentation for BITS_PER_TYPE definition.
      
      Link: http://lkml.kernel.org/r/20200102043031.30357-3-yury.norov@gmail.com
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Amritha Nambiar <amritha.nambiar@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Cc: "Tobin C . Harding" <tobin@kernel.org>
      Cc: Vineet Gupta <vineet.gupta1@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0bddc1bd
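      The new macros follow the existing BITS_TO_LONGS() pattern; a hedged
      sketch of their shape:

        #define BITS_PER_TYPE(type)     (sizeof(type) * BITS_PER_BYTE)
        #define BITS_TO_U64(nr)         DIV_ROUND_UP(nr, BITS_PER_TYPE(u64))
        #define BITS_TO_U32(nr)         DIV_ROUND_UP(nr, BITS_PER_TYPE(u32))
        #define BITS_TO_BYTES(nr)       DIV_ROUND_UP(nr, BITS_PER_TYPE(char))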
    • lib/string: add strnchrnul() · 0bee0cec
      Yury Norov committed
      Patch series "lib: rework bitmap_parse", v5.
      
      Similarly to the recently revisited bitmap_parselist(), bitmap_parse() is
      inefficient and overcomplicated.  This series reworks it, aligns its
      interface with bitmap_parselist() and makes it simpler to use.
      
      The series also adds a test for the function and fixes its usage in
      cpumask_parse() according to the new design - it drops the calculation of
      the length of the input string.
      
      bitmap_parse() takes the array of numbers to be put into the map in
      big-endian order, which is the reverse of the natural little-endian order
      of bitmaps.  For example, to construct a bitmap with only bit 42 set, we
      have to pass the line '400,0'.  The current implementation reads chunks
      one by one from the beginning ('400' before '0') and shifts the bitmap
      after each successfully parsed chunk, which makes the complexity of the
      whole process O(n^2).  We can parse in the reverse direction ('0' before
      '400') and avoid shifting, but that requires reverse parsing helpers.
      
      This patch (of 7):
      
      The new function works like strchrnul(), but on a length-limited string.
      
      Link: http://lkml.kernel.org/r/20200102043031.30357-2-yury.norov@gmail.com
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Cc: Amritha Nambiar <amritha.nambiar@intel.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: "Tobin C . Harding" <tobin@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Miklos Szeredi <mszeredi@redhat.com>
      Cc: Vineet Gupta <vineet.gupta1@synopsys.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0bee0cec
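      A minimal sketch of strnchrnul() consistent with the description above
      (it behaves like strchrnul() but stops after count bytes; illustrative,
      not necessarily the exact lib/string.c body):

        char *strnchrnul(const char *s, size_t count, int c)
        {
                while (count-- && *s && *s != (char)c)
                        s++;
                return (char *)s;
        }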
    • proc: convert everything to "struct proc_ops" · 97a32539
      Alexey Dobriyan committed
      The most notable change is the DEFINE_SHOW_ATTRIBUTE macro split in
      seq_file.h.
      
      The conversion rule is:
      
      	llseek		=> proc_lseek
      	unlocked_ioctl	=> proc_ioctl
      
      	xxx		=> proc_xxx
      
      	delete ".owner = THIS_MODULE" line
      
      [akpm@linux-foundation.org: fix drivers/isdn/capi/kcapi_proc.c]
      [sfr@canb.auug.org.au: fix kernel/sched/psi.c]
        Link: http://lkml.kernel.org/r/20200122180545.36222f50@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20191225172546.GB13378@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      97a32539
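      Applied to a hypothetical "foo" proc file, the conversion rule above looks
      roughly like this (the foo_* names are made up for illustration):

        static const struct proc_ops foo_proc_ops = {
                .proc_open      = foo_proc_open,        /* was .open */
                .proc_read      = seq_read,             /* was .read */
                .proc_lseek     = seq_lseek,            /* was .llseek */
                .proc_release   = single_release,       /* was .release */
                /* ".owner = THIS_MODULE" is simply dropped */
        };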
    • proc: decouple proc from VFS with "struct proc_ops" · d56c0d45
      Alexey Dobriyan committed
      Currently the core /proc code uses "struct file_operations" for custom
      hooks; however, the VFS doesn't call them directly.  Every time the VFS
      expands the file_operations hook set, the /proc code bloats for no reason.
      
      Introduce "struct proc_ops", which contains only those hooks that /proc
      allows callers to hook into (open, release, read, write, ioctl, mmap,
      poll).  It doesn't contain a module pointer either.
      
      Save ~184 bytes per usage:
      
      	add/remove: 26/26 grow/shrink: 1/4 up/down: 1922/-6674 (-4752)
      	Function                                     old     new   delta
      	sysvipc_proc_ops                               -      72     +72
      				...
      	config_gz_proc_ops                             -      72     +72
      	proc_get_inode                               289     339     +50
      	proc_reg_get_unmapped_area                   110     107      -3
      	close_pdeo                                   227     224      -3
      	proc_reg_open                                289     284      -5
      	proc_create_data                              60      53      -7
      	rt_cpu_seq_fops                              256       -    -256
      				...
      	default_affinity_proc_fops                   256       -    -256
      	Total: Before=5430095, After=5425343, chg -0.09%
      
      Link: http://lkml.kernel.org/r/20191225172228.GA13378@avx2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d56c0d45
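      For reference, a sketch of the new structure's shape, reconstructed from
      the hook list above (the exact field order and full set may differ):

        struct proc_ops {
                int     (*proc_open)(struct inode *, struct file *);
                ssize_t (*proc_read)(struct file *, char __user *, size_t, loff_t *);
                ssize_t (*proc_write)(struct file *, const char __user *, size_t, loff_t *);
                loff_t  (*proc_lseek)(struct file *, loff_t, int);
                int     (*proc_release)(struct inode *, struct file *);
                __poll_t (*proc_poll)(struct file *, struct poll_table_struct *);
                long    (*proc_ioctl)(struct file *, unsigned int, unsigned long);
                int     (*proc_mmap)(struct file *, struct vm_area_struct *);
                /* no .owner: the module pointer is gone */
        };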
    • asm-generic/tlb: provide MMU_GATHER_TABLE_FREE · 0d6e24d4
      Peter Zijlstra committed
      As described in the comment, the correct order for freeing pages is:
      
       1) unhook page
       2) TLB invalidate page
       3) free page
      
      This order equally applies to page directories.
      
      Currently there are two correct options:
      
       - use tlb_remove_page(), when all page directories are full pages and
         there are no further constraints placed by things like software
         walkers (HAVE_FAST_GUP).
      
       - use MMU_GATHER_RCU_TABLE_FREE and tlb_remove_table() when the
         architecture does not do IPI based TLB invalidate and has
         HAVE_FAST_GUP (or software TLB fill).
      
      This however leaves architectures that don't have page based directories
      but don't need RCU in a bind.  For those, provide MMU_GATHER_TABLE_FREE,
      which provides the independent batching for directories without the
      additional RCU freeing.
      
      Link: http://lkml.kernel.org/r/20200116064531.483522-10-aneesh.kumar@linux.ibm.com
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0d6e24d4
    • asm-generic/tlb: rename HAVE_MMU_GATHER_NO_GATHER · 580a586c
      Peter Zijlstra committed
      Towards a more consistent naming scheme.
      
      Link: http://lkml.kernel.org/r/20200116064531.483522-9-aneesh.kumar@linux.ibm.com
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      580a586c
    • asm-generic/tlb: rename HAVE_MMU_GATHER_PAGE_SIZE · 3af4bd03
      Peter Zijlstra committed
      Towards a more consistent naming scheme.
      
      Link: http://lkml.kernel.org/r/20200116064531.483522-8-aneesh.kumar@linux.ibm.com
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3af4bd03
    • asm-generic/tlb: rename HAVE_RCU_TABLE_FREE · ff2e6d72
      Peter Zijlstra committed
      Towards a more consistent naming scheme.
      
      [akpm@linux-foundation.org: fix sparc64 Kconfig]
      Link: http://lkml.kernel.org/r/20200116064531.483522-7-aneesh.kumar@linux.ibm.com
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ff2e6d72
    • asm-gemeric/tlb: remove stray function declarations · 491a49ff
      Peter Zijlstra committed
      We removed the actual functions a while ago.
      
      Link: http://lkml.kernel.org/r/20200116064531.483522-5-aneesh.kumar@linux.ibm.com
      Fixes: 1808d65b ("asm-generic/tlb: Remove arch_tlb*_mmu()")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      491a49ff
    • asm-generic/tlb: avoid potential double flush · 0758cd83
      Peter Zijlstra committed
      Aneesh reported that:
      
      	tlb_flush_mmu()
      	  tlb_flush_mmu_tlbonly()
      	    tlb_flush()			<-- #1
      	  tlb_flush_mmu_free()
      	    tlb_table_flush()
      	      tlb_table_invalidate()
      		tlb_flush_mmu_tlbonly()
      		  tlb_flush()		<-- #2
      
      does two TLBIs when tlb->fullmm, because __tlb_reset_range() will not
      clear tlb->end in that case.
      
      Observe that any caller to __tlb_adjust_range() also sets at least one of
      the tlb->freed_tables || tlb->cleared_p* bits, and those are
      unconditionally cleared by __tlb_reset_range().
      
      Change the condition for actually issuing TLBI to having one of those bits
      set, as opposed to having tlb->end != 0.
      
      Link: http://lkml.kernel.org/r/20200116064531.483522-4-aneesh.kumar@linux.ibm.com
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reported-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0758cd83
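      In code terms the change is roughly the following guard in
      tlb_flush_mmu_tlbonly() (a sketch based on the description above; the
      exact list of cleared_* fields depends on the tree):

        static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
        {
                /* Only flush if something was queued; testing tlb->end alone
                 * re-issues the TLBI for the fullmm case. */
                if (!(tlb->freed_tables || tlb->cleared_ptes || tlb->cleared_pmds ||
                      tlb->cleared_puds || tlb->cleared_p4ds))
                        return;

                tlb_flush(tlb);
                __tlb_reset_range(tlb);
        }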
    • mm/mmu_gather: invalidate TLB correctly on batch allocation failure and flush · 0ed13259
      Peter Zijlstra committed
      Architectures for which we have hardware walkers of the Linux page tables
      should flush the TLB on mmu_gather batch allocation failures and batch
      flush.  Some architectures, like POWER, support multiple translation modes
      (hash and radix), and in the case of POWER only the radix translation mode
      needs the above TLBI.  This is because for hash translation mode the
      kernel wants to avoid this extra flush, since there are no hardware
      walkers of the Linux page tables.  With radix translation, the hardware
      also walks the Linux page tables and, with that, the kernel needs to make
      sure to TLB invalidate the page walk cache before page table pages are
      freed.
      
      More details in commit d86564a2 ("mm/tlb, x86/mm: Support invalidating
      TLB caches for RCU_TABLE_FREE")
      
      The changes to sparc are to make sure we keep the old behavior since we
      are now removing HAVE_RCU_TABLE_NO_INVALIDATE.  The default value of
      tlb_needs_table_invalidate is to always force an invalidate, and sparc can
      avoid the table invalidate.  Hence we define tlb_needs_table_invalidate to
      false for the sparc architecture.
      
      Link: http://lkml.kernel.org/r/20200116064531.483522-3-aneesh.kumar@linux.ibm.com
      Fixes: a46cc7a9 ("powerpc/mm/radix: Improve TLB/PWC flushes")
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>	[powerpc]
      Cc: <stable@vger.kernel.org>	[4.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0ed13259
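      The opt-out described above presumably boils down to something like this
      (a sketch; only sparc overrides the default):

        /* asm-generic default: always invalidate before freeing table pages */
        #ifndef tlb_needs_table_invalidate
        #define tlb_needs_table_invalidate() (true)
        #endif

        /* sparc64, which has no hardware walker of the Linux page tables,
         * can define tlb_needs_table_invalidate() to (false). */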
    • x86: mm: avoid allocating struct mm_struct on the stack · e47690d7
      Steven Price committed
      struct mm_struct is quite large (~1664 bytes) and so allocating on the
      stack may cause problems as the kernel stack size is small.
      
      Since ptdump_walk_pgd_level_core() was only allocating the structure so
      that it could modify the pgd argument we can instead introduce a pgd
      override in struct mm_walk and pass this down the call stack to where it
      is needed.
      
      Since the correct mm_struct is now being passed down, it is also
      unnecessary to take the mmap_sem semaphore, because ptdump_walk_pgd()
      will take the semaphore on the real mm.
      
      [steven.price@arm.com: restore missed arm64 changes]
        Link: http://lkml.kernel.org/r/20200108145710.34314-1-steven.price@arm.com
      Link: http://lkml.kernel.org/r/20200108145710.34314-1-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e47690d7
    • mm: ptdump: reduce level numbers by 1 in note_page() · f8f0d0b6
      Steven Price committed
      Rather than having to increment the 'depth' number by 1 in ptdump_hole(),
      let's change the meaning of 'level' in note_page(), since that makes the
      code simpler.
      
      Note that for x86, the level numbers were previously increased by 1 in
      commit 45dcd209 ("x86/mm/dump_pagetables: Fix printout of p4d level")
      and the comment "Bit 7 has a different meaning" was not updated, so this
      change also makes the code match the comment again.
      
      Link: http://lkml.kernel.org/r/20191218162402.45610-24-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f8f0d0b6
    • mm: add generic ptdump · 30d621f6
      Steven Price committed
      Add a generic version of page table dumping that architectures can opt
      in to.
      
      Link: http://lkml.kernel.org/r/20191218162402.45610-20-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      30d621f6
    • mm: pagewalk: add 'depth' parameter to pte_hole · b7a16c7a
      Steven Price committed
      The pte_hole() callback is called at multiple levels of the page tables.
      Code dumping the kernel page tables needs to know at what depth the
      missing entry is.  Add this as an extra parameter to pte_hole().  When the
      depth isn't known (e.g.  processing a vma), -1 is passed.
      
      The depth that is reported is the actual level where the entry is missing
      (ignoring any folding that is in place), i.e.  any levels where
      PTRS_PER_P?D is set to 1 are ignored.
      
      Note that depth starts at 0 for a PGD so that PUD/PMD/PTE retain their
      natural numbers as levels 2/3/4.
      
      Link: http://lkml.kernel.org/r/20191218162402.45610-16-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Tested-by: Zong Li <zong.li@sifive.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b7a16c7a
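      The callback in struct mm_walk_ops then presumably gains the new argument
      along these lines (a sketch; depth 0 is a PGD, -1 means "unknown"):

        int (*pte_hole)(unsigned long addr, unsigned long next,
                        int depth, struct mm_walk *walk);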
    • mm: pagewalk: allow walking without vma · 488ae6a2
      Steven Price committed
      Since 48684a65: "mm: pagewalk: fix misbehavior of walk_page_range for
      vma(VM_PFNMAP)", page_table_walk() will report any kernel area as a hole,
      because it lacks a vma.
      
      This means each arch has re-implemented page table walking when needed,
      for example in the per-arch ptdump walker.
      
      Remove the requirement to have a vma in the generic code and add a new
      function walk_page_range_novma() which ignores the VMAs and simply walks
      the page tables.
      
      Link: http://lkml.kernel.org/r/20191218162402.45610-13-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      488ae6a2
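      The new entry point is presumably shaped like this (a sketch of the
      signature; it walks mm's page tables directly and never consults VMAs):

        int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
                                  unsigned long end,
                                  const struct mm_walk_ops *ops,
                                  void *private);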
    • mm: pagewalk: add p4d_entry() and pgd_entry() · 3afc4236
      Steven Price committed
      pgd_entry() and pud_entry() were removed by commit 0b1fbfe5
      ("mm/pagewalk: remove pgd_entry() and pud_entry()") because there were no
      users.  We're about to add users so reintroduce them, along with
      p4d_entry() as we now have 5 levels of tables.
      
      Note that commit a00cc7d9 ("mm, x86: add support for PUD-sized
      transparent hugepages") already re-added pud_entry() but with different
      semantics to the other callbacks.  This commit reverts the semantics back
      to match the other callbacks.
      
      To support hmm.c which now uses the new semantics of pud_entry() a new
      member ('action') of struct mm_walk is added which allows the callbacks to
      either descend (ACTION_SUBTREE, the default), skip (ACTION_CONTINUE) or
      repeat the callback (ACTION_AGAIN).  hmm.c is then updated to call
      pud_trans_huge_lock() itself and make use of the splitting/retry logic of
      the core code.
      
      After this change pud_entry() is called for all entries, not just
      transparent huge pages.
      
      [arnd@arndb.de: fix unused variable warning]
       Link: http://lkml.kernel.org/r/20200107204607.1533842-1-arnd@arndb.de
      Link: http://lkml.kernel.org/r/20191218162402.45610-12-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3afc4236
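      The new 'action' member mentioned above presumably takes values along
      these lines (names reconstructed from the description):

        enum page_walk_action {
                ACTION_SUBTREE = 0,     /* descend into the lower level (default) */
                ACTION_CONTINUE,        /* skip this subtree and keep walking */
                ACTION_AGAIN,           /* re-run the callback, e.g. after a split */
        };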
    • mm: add generic p?d_leaf() macros · 93fab1b2
      Steven Price committed
      Patch series "Generic page walk and ptdump", v17.
      
      Many architectures currently have a debugfs file for dumping the kernel
      page tables.  Currently each architecture has to implement custom
      functions for this because the details of walking the page tables used by
      the kernel are different between architectures.
      
      This series extends the capabilities of walk_page_range() so that it can
      deal with the page tables of the kernel (which have no VMAs and can
      contain larger huge pages than exist for user space).  A generic PTDUMP
      implementation is then built on top of the new functionality of
      walk_page_range(), and finally arm64 and x86 are switched to using it,
      removing the custom table walkers.
      
      To enable a generic page table walker to walk the unusual mappings of the
      kernel we need to implement a set of functions which let us know when the
      walker has reached the leaf entry.  After a suggestion from Will Deacon
      I've chosen the name p?d_leaf() as this (hopefully) describes the purpose
      (and is a new name so has no historic baggage).  Some architectures have
      p?d_large macros but this is easily confused with "large pages".
      
      This series ends with a generic PTDUMP implementation for arm64 and x86.
      
      Mostly this is a clean up and there should be very little functional
      change.  The exceptions are:
      
      * arm64 PTDUMP debugfs now displays pages which aren't present (patch 22).
      
      * arm64 has the ability to efficiently process KASAN pages (which
        previously only x86 implemented).  This means that the combination of
        KASAN and DEBUG_WX is now useable.
      
      This patch (of 23):
      
      Exposing the pud/pgd levels of the page tables to walk_page_range() means
      we may come across the exotic large mappings that come with large areas of
      contiguous memory (such as the kernel's linear map).
      
      For architectures that don't provide all p?d_leaf() macros, provide
      generic do-nothing defaults that are suitable where there cannot be leaf
      pages at that level.  Further patches will add implementations for
      individual architectures.
      
      The name p?d_leaf() is chosen to minimize the confusion with existing uses
      of "large" pages and "huge" pages, which do not necessarily mean that the
      entry is a leaf (for example it may be a set of contiguous entries that
      only take 1 TLB slot).  For the purpose of walking the page tables we
      don't need to know how it will be represented in the TLB, but we do need
      to know for sure if it is a leaf of the tree.
      
      Link: http://lkml.kernel.org/r/20191218162402.45610-2-steven.price@arm.com
      Signed-off-by: Steven Price <steven.price@arm.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Liang, Kan" <kan.liang@linux.intel.com>
      Cc: Albert Ou <aou@eecs.berkeley.edu>
      Cc: Alexandre Ghiti <alex@ghiti.fr>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <jhogan@kernel.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      93fab1b2
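      The generic do-nothing defaults are presumably of this form, one per
      level (a sketch; architectures that can have leaf entries at a level
      override the corresponding macro):

        #ifndef pgd_leaf
        #define pgd_leaf(x)     0
        #endif
        #ifndef p4d_leaf
        #define p4d_leaf(x)     0
        #endif
        #ifndef pud_leaf
        #define pud_leaf(x)     0
        #endif
        #ifndef pmd_leaf
        #define pmd_leaf(x)     0
        #endif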
    • mm: remove __krealloc · 1c948715
      Florian Westphal committed
      Since 5.5-rc1 the last user of this function is gone, so remove the
      functionality.
      
      See commit
      2ad9d774 ("netfilter: conntrack: free extension area immediately")
      for details.
      
      Link: http://lkml.kernel.org/r/20191212223442.22141-1-fw@strlen.de
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1c948715
    • mm/memory_hotplug: drop valid_start/valid_end from test_pages_in_a_zone() · 92917998
      David Hildenbrand committed
      The callers are only interested in the actual zone; they don't care about
      boundaries.  Return the zone instead, to simplify.
      
      Link: http://lkml.kernel.org/r/20200110183308.11849-1-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      92917998
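      The simplified interface presumably ends up looking like this (a sketch:
      the zone is returned on success, with the valid_start/valid_end output
      parameters gone):

        struct zone *test_pages_in_a_zone(unsigned long start_pfn,
                                          unsigned long end_pfn);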
    • mm: factor out next_present_section_nr() · 4c605881
      David Hildenbrand committed
      Let's move it to the header and use the shorter variant from
      mm/page_alloc.c (the original one will also check
      "__highest_present_section_nr + 1", which is not necessary).  While at
      it, make the section_nr in next_pfn() const.
      
      In next_pfn(), we now return section_nr_to_pfn(-1) instead of -1 once we
      exceed __highest_present_section_nr, which doesn't make a difference in
      the caller as it is big enough (>= all sane end_pfn).
      
      Link: http://lkml.kernel.org/r/20200113144035.10848-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "Jin, Zhi" <zhi.jin@intel.com>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4c605881
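      The factored-out helper is presumably along these lines (a sketch based on
      the description; it stops as soon as __highest_present_section_nr is
      exceeded):

        static inline unsigned long next_present_section_nr(unsigned long section_nr)
        {
                while (++section_nr <= __highest_present_section_nr) {
                        if (present_section_nr(section_nr))
                                return section_nr;
                }
                return -1;
        }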
    • mm/page_alloc.c: initialize memmap of unavailable memory directly · 4b094b78
      David Hildenbrand committed
      Let's make sure that all memory holes are actually marked PageReserved(),
      that page_to_pfn() produces reliable results, and that these pages are not
      detected as "mmap" pages due to the mapcount.
      
      E.g., booting an x86-64 QEMU guest with 4160 MB:
      
      [    0.010585] Early memory node ranges
      [    0.010586]   node   0: [mem 0x0000000000001000-0x000000000009efff]
      [    0.010588]   node   0: [mem 0x0000000000100000-0x00000000bffdefff]
      [    0.010589]   node   0: [mem 0x0000000100000000-0x0000000143ffffff]
      
      max_pfn is 0x144000.
      
      Before this change:
      
      [root@localhost ~]# ./page-types -r -a 0x144000,
                   flags      page-count       MB  symbolic-flags                     long-symbolic-flags
      0x0000000000000800           16384       64  ___________M_______________________________        mmap
                   total           16384       64
      
      After this change:
      
      [root@localhost ~]# ./page-types -r -a 0x144000,
                   flags      page-count       MB  symbolic-flags                     long-symbolic-flags
      0x0000000100000000           16384       64  ___________________________r_______________        reserved
                   total           16384       64
      
      IOW, especially the unavailable physical memory ("memory hole") in the
      last section would not get properly marked PageReserved() and is indicated
      to be "mmap" memory.
      
      Drop the trace of that function from include/linux/mm.h - nobody else
      needs it, and rename it accordingly.
      
      Note: The fake zone/node might not be covered by the zone/node span.  This
      is not an urgent issue (for now, we had the same node/zone due to the
      zeroing).  We'll need a clean way to mark memory holes (e.g., using a page
      type PageHole() if possible or a fake ZONE_INVALID) and eventually stop
      marking these memory holes PageReserved().
      
      Link: http://lkml.kernel.org/r/20191211163201.17179-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Bob Picco <bob.picco@oracle.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Sistare <steven.sistare@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b094b78
    • platform/chrome: cros_ec: Match implementation with headers · 034dbec1
      Enric Balletbo i Serra committed
      The 'cros_ec' core driver is the common interface for the cros_ec
      transport drivers to do the shared operations to register, unregister,
      suspend, resume and handle_event. The interface is provided by including
      the header 'include/linux/platform_data/cros_ec_proto.h'; however, instead
      of having the implementation of these functions in cros_ec_proto.c, it is
      in 'cros_ec.c', which is a different kernel module. Apart from being bad
      practice, this can cause confusion by allowing the users of the cros_ec
      protocol to call these functions.
      
      The register, unregister, suspend, resume and handle_event functions
      *should* only be called by the different transport drivers (i2c, spi, lpc,
      etc.), so make this a bit less confusing by moving these functions from
      the public in-kernel space to a private include in platform/chrome. Then
      the interface for the cros_ec module and for the cros_ec_proto module is
      clean.
      Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com>
      Signed-off-by: Benson Leung <bleung@chromium.org>
      034dbec1
  5. 03 Feb 2020, 1 commit
  6. 01 Feb 2020, 7 commits