1. 03 Oct, 2020 (3 commits)
  2. 08 Aug, 2020 (2 commits)
    •
      mm/sparse: cleanup the code surrounding memory_present() · c89ab04f
      Authored by Mike Rapoport
      After removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP we have two equivalent
      functions that call memory_present() for each region in memblock.memory:
      sparse_memory_present_with_active_regions() and memblocks_present().
      
      Moreover, all architectures have a call to either of these functions
      preceding the call to sparse_init(), and in most cases they are called
      one after the other.
      
      Mark the regions from memblock.memory as present during sparse_init() by
      making sparse_init() call memblocks_present(), make the memblocks_present()
      and memory_present() functions static, and remove the redundant
      sparse_memory_present_with_active_regions() function.
      
      Also remove the no longer required HAVE_MEMORY_PRESENT configuration option.
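      
      A minimal sketch of the consolidated flow, assuming the pre-5.9 memblock
      iteration helpers (for_each_memblock() and friends):
      
      	static void __init memblocks_present(void)
      	{
      		struct memblock_region *reg;
      
      		/* Mark every region in memblock.memory as present. */
      		for_each_memblock(memory, reg)
      			memory_present(memblock_get_region_node(reg),
      				       memblock_region_memory_base_pfn(reg),
      				       memblock_region_memory_end_pfn(reg));
      	}
      
      	void __init sparse_init(void)
      	{
      		/* Callers no longer need their own memory_present() pass. */
      		memblocks_present();
      		/* ... the rest of sparse_init() is unchanged ... */
      	}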
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20200712083130.22919-1-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      mm/sparsemem: enable vmem_altmap support in vmemmap_populate_basepages() · 1d9cfee7
      Authored by Anshuman Khandual
      Patch series "arm64: Enable vmemmap mapping from device memory", v4.
      
      This series enables vmemmap backing memory allocation from device memory
      ranges on arm64.  But before that, it enables vmemmap_populate_basepages()
      and vmemmap_alloc_block_buf() to accommodate struct vmem_altmap based
      allocation requests.
      
      This patch (of 3):
      
      vmemmap_populate_basepages() is used across platforms to allocate backing
      memory for the vmemmap mapping.  This is used as a standard default choice
      or as a fallback when the intended huge page allocation fails.  It simply
      creates the entire vmemmap mapping with base pages (PAGE_SIZE).
      
      On arm64 platforms, vmemmap_populate_basepages() is called instead of the
      platform specific vmemmap_populate() when ARM64_SWAPPER_USES_SECTION_MAPS
      is not enabled, as is the case for the ARM64_16K_PAGES and ARM64_64K_PAGES
      configs.
      
      At present vmemmap_populate_basepages() does not support allocating from
      a driver defined struct vmem_altmap while trying to create the vmemmap
      mapping for a device memory range.  This prevents the ARM64_16K_PAGES and
      ARM64_64K_PAGES configs on arm64 from supporting device memory with
      vmem_altmap requests.
      
      This enables vmem_altmap support in vmemmap_populate_basepages(), unlocking
      device memory allocation for vmemmap mapping on arm64 platforms with 16K or
      64K base page configs.
      
      Each architecture should evaluate and decide on subscribing to device
      memory based base page allocation through vmemmap_populate_basepages().
      Hence let's keep it disabled on all archs in order to preserve the
      existing semantics.  A subsequent patch enables it on arm64.
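      
      The core of the change, sketched from the description above (the altmap
      parameter is threaded down from vmemmap_populate_basepages(); a NULL
      altmap preserves the old behaviour):
      
      	pte_t * __meminit vmemmap_pte_populate(pmd_t *pmd, unsigned long addr,
      					       int node, struct vmem_altmap *altmap)
      	{
      		pte_t *pte = pte_offset_kernel(pmd, addr);
      		if (pte_none(*pte)) {
      			pte_t entry;
      			void *p;
      
      			/* Allocate from the device-provided altmap when one
      			 * is given, else from regular memory as before. */
      			if (altmap)
      				p = altmap_alloc_block_buf(PAGE_SIZE, altmap);
      			else
      				p = vmemmap_alloc_block_buf(PAGE_SIZE, node);
      			if (!p)
      				return NULL;
      			entry = pfn_pte(__pa(p) >> PAGE_SHIFT, PAGE_KERNEL);
      			set_pte_at(&init_mm, addr, pte, entry);
      		}
      		return pte;
      	}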
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Jia He <justin.he@arm.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Will Deacon <will@kernel.org>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hsin-Yi Wang <hsinyi@chromium.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Steve Capper <steve.capper@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Link: http://lkml.kernel.org/r/1594004178-8861-1-git-send-email-anshuman.khandual@arm.com
      Link: http://lkml.kernel.org/r/1594004178-8861-2-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 31 Jul, 2020 (1 commit)
  4. 25 Jul, 2020 (3 commits)
    •
      riscv: Parse all memory blocks to remove unusable memory · fa5a1983
      Authored by Atish Patra
      Currently, the maximum physical memory allowed is equal to -PAGE_OFFSET.
      That is why we remove any memory blocks spanning beyond that size. However,
      this is done only for the memblock containing the Linux kernel, which does
      not work if there are multiple memblocks.
      
      Process all memory blocks to figure out how much memory needs to be removed,
      and remove it at the end instead of updating the memblock list in place.
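      
      A condensed sketch of the reworked setup_bootmem() logic, assuming
      memblock regions are sorted by base address (as memblock guarantees):
      
      	phys_addr_t mem_start = 0, total_mem = 0, mem_size;
      	struct memblock_region *reg;
      
      	/* Sum up all memory blocks instead of looking only at the
      	 * block that contains the kernel image. */
      	for_each_memblock(memory, reg) {
      		if (!total_mem)
      			mem_start = reg->base;
      		total_mem += reg->size;
      	}
      
      	/* Cap usable memory at -PAGE_OFFSET and trim the excess in one
      	 * go, after the walk, rather than editing the list in place. */
      	mem_size = min(total_mem, (phys_addr_t)-PAGE_OFFSET);
      	if (mem_size < total_mem)
      		memblock_remove(mem_start + mem_size, total_mem - mem_size);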
      Signed-off-by: Atish Patra <atish.patra@wdc.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
    •
      RISC-V: Do not rely on initrd_start/end computed during early dt parsing · 4400231c
      Authored by Atish Patra
      Currently, initrd_start/end are computed during early_init_dt_scan
      but used during arch_setup. We will get the following panic if initrd is used
      and CONFIG_DEBUG_VIRTUAL is turned on.
      
      [    0.000000] ------------[ cut here ]------------
      [    0.000000] kernel BUG at arch/riscv/mm/physaddr.c:33!
      [    0.000000] Kernel BUG [#1]
      [    0.000000] Modules linked in:
      [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-rc4-00015-ged0b226fed02 #886
      [    0.000000] epc: ffffffe0002058d2 ra : ffffffe0000053f0 sp : ffffffe001001f40
      [    0.000000]  gp : ffffffe00106e250 tp : ffffffe001009d40 t0 : ffffffe00107ee28
      [    0.000000]  t1 : 0000000000000000 t2 : ffffffe000a2e880 s0 : ffffffe001001f50
      [    0.000000]  s1 : ffffffe0001383e8 a0 : ffffffe00c087e00 a1 : 0000000080200000
      [    0.000000]  a2 : 00000000010bf000 a3 : ffffffe00106f3c8 a4 : ffffffe0010bf000
      [    0.000000]  a5 : ffffffe000000000 a6 : 0000000000000006 a7 : 0000000000000001
      [    0.000000]  s2 : ffffffe00106f068 s3 : ffffffe00106f070 s4 : 0000000080200000
      [    0.000000]  s5 : 0000000082200000 s6 : 0000000000000000 s7 : 0000000000000000
      [    0.000000]  s8 : 0000000080011010 s9 : 0000000080012700 s10: 0000000000000000
      [    0.000000]  s11: 0000000000000000 t3 : 000000000001fe30 t4 : 000000000001fe30
      [    0.000000]  t5 : 0000000000000000 t6 : ffffffe00107c471
      [    0.000000] status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
      [    0.000000] random: get_random_bytes called from print_oops_end_marker+0x22/0x46 with crng_init=0
      
      To avoid the error, compute initrd_start/end from phys_initrd_start/size
      in the setup code itself. This also improves the initrd placement by
      aligning the start and size with the page size.
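      
      A hedged sketch of the reworked setup_initrd(); the exact bounds checks
      in the real patch are omitted:
      
      	static void __init setup_initrd(void)
      	{
      		phys_addr_t start;
      		unsigned long size;
      
      		if (!phys_initrd_size)
      			return;
      
      		/* Align the reserved region to the page size. */
      		start = round_down(phys_initrd_start, PAGE_SIZE);
      		size = round_up(phys_initrd_size +
      				(phys_initrd_start - start), PAGE_SIZE);
      		memblock_reserve(start, size);
      
      		/* __va() is safe here: memory setup is complete, so
      		 * CONFIG_DEBUG_VIRTUAL no longer trips. */
      		initrd_start = (unsigned long)__va(start);
      		initrd_end = initrd_start + size;
      		initrd_below_start_ok = 1;
      	}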
      
      Fixes: 76d2a049 ("RISC-V: Init and Halt Code")
      Signed-off-by: Atish Patra <atish.patra@wdc.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
    •
      RISC-V: Set maximum number of mapped pages correctly · d0d8aae6
      Authored by Atish Patra
      Currently, the maximum number of mapped pages is set to the PFN calculated
      from the size of the memblock containing the kernel. This works only as
      long as that memblock spans the entire memory. It will be set to a wrong
      value if there are multiple memblocks defined in the kernel
      (e.g. with efi runtime services).
      
      Set the maximum value to the PFN calculated from the DRAM size.
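      
      The gist of the change in setup_bootmem(), assuming the memblock view of
      DRAM is complete at this point:
      
      	/* Derive the limit from the end of all DRAM rather than from
      	 * the size of the single memblock containing the kernel. */
      	set_max_mapnr(PFN_DOWN(memblock_end_of_DRAM()));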
      Signed-off-by: Atish Patra <atish.patra@wdc.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
  5. 10 Jul, 2020 (1 commit)
  6. 10 Jun, 2020 (3 commits)
    •
      RISC-V: Don't mark init section as non-executable · 4e0f9e3a
      Authored by Anup Patel
      The head text section (i.e. _start, secondary_start_sbi, etc) and the
      init section fall under same page table level-1 mapping.
      
      Currently, runtime CPU hotplug is broken because we are marking the
      init section as non-executable, which in turn marks the head text
      section as non-executable.
      
      Investigating other architectures further, it seems that marking the
      init section as non-executable is redundant, because the init section
      pages are poisoned and freed anyway.
      
      To fix broken runtime CPU hotplug, we simply remove the code marking
      the init section as non-executable.
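      
      The fix deletes riscv's free_initmem() override (approximate
      reconstruction; the generic free_initmem(), which only poisons and frees
      the init pages, takes over):
      
      	/* Removed: the set_memory_nx() here also flipped the shared
      	 * level-1 mapping covering the head text to non-executable. */
      	void free_initmem(void)
      	{
      		unsigned long init_begin = (unsigned long)__init_begin;
      		unsigned long init_end = (unsigned long)__init_end;
      
      		set_memory_nx(init_begin, (init_end - init_begin) >> PAGE_SHIFT);
      		free_initmem_default(POISON_FREE_INITMEM);
      	}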
      
      Fixes: d27c3c90 ("riscv: add STRICT_KERNEL_RWX support")
      Cc: stable@vger.kernel.org
      Signed-off-by: Anup Patel <anup.patel@wdc.com>
      Reviewed-by: Zong Li <zong.li@sifive.com>
      Reviewed-by: Atish Patra <atish.patra@wdc.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
    •
      mm: consolidate pte_index() and pte_offset_*() definitions · 974b9b2c
      Authored by Mike Rapoport
      All architectures define pte_index() as
      
      	(address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)
      
      and all architectures define pte_offset_kernel() as an entry in the array
      of PTEs indexed by the pte_index().
      
      For most architectures the pte_offset_kernel() implementation relies
      on the availability of pmd_page_vaddr() that converts a PMD entry value to
      the virtual address of the page containing the PTEs array.
      
      Let's move x86 definitions of the PTE accessors to the generic place in
      <linux/pgtable.h> and then simply drop the respective definitions from the
      other architectures.
      
      The architectures that didn't provide pmd_page_vaddr() are updated to have
      that defined.
      
      The generic implementation of pte_offset_kernel() can be overridden by an
      architecture and alpha makes use of this because it has special ordering
      requirements for its version of pte_offset_kernel().
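      
      The consolidated generic accessors in <linux/pgtable.h> end up looking
      roughly like this; an architecture can pre-define pte_offset_kernel to
      override the generic version, as alpha does:
      
      	static inline unsigned long pte_index(unsigned long address)
      	{
      		return (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
      	}
      
      	#ifndef pte_offset_kernel
      	static inline pte_t *pte_offset_kernel(pmd_t *pmd, unsigned long address)
      	{
      		return (pte_t *)pmd_page_vaddr(*pmd) + pte_index(address);
      	}
      	#define pte_offset_kernel pte_offset_kernel
      	#endif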
      
      [rppt@linux.ibm.com: v2]
        Link: http://lkml.kernel.org/r/20200514170327.31389-11-rppt@kernel.org
      [rppt@linux.ibm.com: update]
        Link: http://lkml.kernel.org/r/20200514170327.31389-12-rppt@kernel.org
      [rppt@linux.ibm.com: update]
        Link: http://lkml.kernel.org/r/20200514170327.31389-13-rppt@kernel.org
      [akpm@linux-foundation.org: fix x86 warning]
      [sfr@canb.auug.org.au: fix powerpc build]
        Link: http://lkml.kernel.org/r/20200607153443.GB738695@linux.ibm.com
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-10-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Authored by Mike Rapoport
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definitions of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically boil
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
      
      For architectures that really need a custom version there is always the
      possibility to override the generic version with the usual ifdef magic.
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point in explicitly including <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are removed with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 04 Jun, 2020 (2 commits)
    •
      riscv: support DEBUG_WX · b422d28b
      Authored by Zong Li
      Support DEBUG_WX to check whether there are mappings with both write and
      execute permission at the same time.
      
      [akpm@linux-foundation.org: replace macros with C]
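      
      A sketch of the C helpers this adds on riscv, reconstructed from the
      description (ptdump_check_wx() walks the kernel page tables and warns
      about W+X mappings):
      
      	#ifdef CONFIG_DEBUG_WX
      	extern void ptdump_check_wx(void);
      
      	static inline void debug_checkwx(void)
      	{
      		ptdump_check_wx();
      	}
      	#else
      	static inline void debug_checkwx(void)
      	{
      	}
      	#endif
      
      mark_rodata_ro() then ends with a debug_checkwx() call, so the check
      runs once the final kernel permissions are in place.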
      Signed-off-by: Zong Li <zong.li@sifive.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Link: http://lkml.kernel.org/r/282e266311bced080bc6f7c255b92f87c1eb65d6.1587455584.git.zong.li@sifive.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    •
      mm: use free_area_init() instead of free_area_init_nodes() · 9691a071
      Authored by Mike Rapoport
      free_area_init() has effectively become a wrapper for
      free_area_init_nodes() and there is no point in keeping it.  Still, the
      free_area_init() name is shorter and more general, as it does not imply
      the necessity of initializing multiple nodes.
      
      Rename free_area_init_nodes() to free_area_init(), update the callers and
      drop the old version of free_area_init().
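      
      A typical caller after the rename; riscv's zone setup looks roughly like
      this (simplified here to ZONE_NORMAL only):
      
      	void __init paging_init(void)
      	{
      		unsigned long max_zone_pfns[MAX_NR_ZONES] = { 0 };
      
      		max_zone_pfns[ZONE_NORMAL] = max_low_pfn;
      		/* formerly free_area_init_nodes(max_zone_pfns) */
      		free_area_init(max_zone_pfns);
      	}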
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200412194859.12663-6-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 21 May, 2020 (1 commit)
  9. 19 May, 2020 (1 commit)
    •
      riscv: Allow device trees to be built into the kernel · 2d268251
      Authored by Palmer Dabbelt
      Some systems don't provide a useful device tree to the kernel on boot.
      Chasing around bootloaders for these systems is a headache, so instead
      let's just keep a device tree table in the kernel, keyed by the SOC's
      unique identifier, that contains the relevant DTB.
      
      This is only implemented for M mode right now. While we could implement
      this via the SBI calls that allow access to these identifiers, we don't
      have any systems that need this right now.
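      
      A hedged sketch of the lookup, assuming a linker-section table of
      entries; table_start and table_end here are hypothetical stand-ins for
      the actual linker symbols:
      
      	struct soc_builtin_dtb {
      		unsigned long vendor_id;
      		unsigned long arch_id;
      		unsigned long imp_id;
      		void *(*dtb_func)(void);
      	};
      
      	const void *soc_lookup_builtin_dtb(void)
      	{
      		const struct soc_builtin_dtb *s;
      
      		/* Return the built-in DTB whose IDs match the CSRs
      		 * identifying the running SOC (M mode only). */
      		for (s = table_start; s < table_end; s++) {
      			if (csr_read(CSR_MVENDORID) == s->vendor_id &&
      			    csr_read(CSR_MARCHID) == s->arch_id &&
      			    csr_read(CSR_MIMPID) == s->imp_id)
      				return s->dtb_func();
      		}
      		return NULL;
      	}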
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
  10. 06 May, 2020 (1 commit)
  11. 05 May, 2020 (1 commit)
    •
      riscv: set max_pfn to the PFN of the last page · c749bb2d
      Authored by Vincent Chen
      The current max_pfn equals zero. Because of new sanity checks in the
      v5.6 kernel, I found this prevents users from getting some page
      information through /proc, such as kpagecount. The following messages
      are displayed by the stress-ng test suite with the command "stress-ng
      --verbose --physpage 1 -t 1" on a HiFive Unleashed board.
      
       # stress-ng --verbose --physpage 1 -t 1
       stress-ng: debug: [109] 4 processors online, 4 processors configured
       stress-ng: info: [109] dispatching hogs: 1 physpage
       stress-ng: debug: [109] cache allocate: reducing cache level from L3 (too high) to L0
       stress-ng: debug: [109] get_cpu_cache: invalid cache_level: 0
       stress-ng: info: [109] cache allocate: using built-in defaults as no suitable cache found
       stress-ng: debug: [109] cache allocate: default cache size: 2048K
       stress-ng: debug: [109] starting stressors
       stress-ng: debug: [109] 1 stressor spawned
       stress-ng: debug: [110] stress-ng-physpage: started [110] (instance 0)
       stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd34de000 in /proc/kpagecount, errno=0 (Success)
       stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success)
       ...
       stress-ng: error: [110] stress-ng-physpage: cannot read page count for address 0x3fd32db078 in /proc/kpagecount, errno=0 (Success)
       stress-ng: debug: [110] stress-ng-physpage: exited [110] (instance 0)
       stress-ng: debug: [109] process [110] terminated
       stress-ng: info: [109] successful run completed in 1.00s
       #
      
      After applying this patch, the kernel can pass the test.
      
       # stress-ng --verbose --physpage 1 -t 1
       stress-ng: debug: [104] 4 processors online, 4 processors configured
       stress-ng: info: [104] dispatching hogs: 1 physpage
       stress-ng: info: [104] cache allocate: using defaults, can't determine cache details from sysfs
       stress-ng: debug: [104] cache allocate: default cache size: 2048K
       stress-ng: debug: [104] starting stressors
       stress-ng: debug: [104] 1 stressor spawned
       stress-ng: debug: [105] stress-ng-physpage: started [105] (instance 0)
       stress-ng: debug: [105] stress-ng-physpage: exited [105] (instance 0)
       stress-ng: debug: [104] process [105] terminated
       stress-ng: info: [104] successful run completed in 1.01s
       #
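      
      The fix itself is small; a sketch of the change in setup_bootmem(),
      matching the commit title:
      
      	/* Set max_pfn to the PFN of the last page of DRAM instead of
      	 * leaving it zero. */
      	max_pfn = PFN_DOWN(memblock_end_of_DRAM());
      	max_low_pfn = max_pfn;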
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
      Reviewed-by: Anup Patel <anup@brainfault.org>
      Reviewed-by: Yash Shah <yash.shah@sifive.com>
      Tested-by: Yash Shah <yash.shah@sifive.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
  12. 27 Mar, 2020 (1 commit)
  13. 05 Mar, 2020 (1 commit)
  14. 03 Jan, 2020 (1 commit)
  15. 23 Nov, 2019 (1 commit)
  16. 18 Nov, 2019 (1 commit)
    •
      riscv: add nommu support · 6bd33e1e
      Authored by Christoph Hellwig
      The kernel runs in M-mode without using page tables, and thus can run
      bare metal without help from additional firmware.
      
      Most of the patch is just stubbing out code not needed without page
      tables, but there is an interesting detail in the signals implementation:
      
       - The normal RISC-V syscall ABI only implements rt_sigreturn as VDSO
         entry point, but the ELF VDSO is not supported for nommu Linux.
         We instead copy the code to call the syscall onto the stack.
      
      In addition to enabling the nommu code, a new defconfig for a small
      kernel image that can run in nommu mode on qemu is also provided. To run
      a kernel in qemu you can use the following command line:
      
      qemu-system-riscv64 -smp 2 -m 64 -machine virt -nographic \
      	-kernel arch/riscv/boot/loader \
      	-drive file=rootfs.ext2,format=raw,id=hd0 \
      	-device virtio-blk-device,drive=hd0
      
      Contains contributions from Damien Le Moal <Damien.LeMoal@wdc.com>.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Anup Patel <anup@brainfault.org>
      [paul.walmsley@sifive.com: updated to apply; add CONFIG_MMU guards
       around PCI_IOBASE definition to fix build issues; fixed checkpatch
       issues; move the PCI_IO_* and VMEMMAP address space macros along
       with the others; resolve sparse warning]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
  17. 13 Nov, 2019 (1 commit)
  18. 28 Oct, 2019 (2 commits)
  19. 24 Oct, 2019 (1 commit)
  20. 02 Oct, 2019 (1 commit)
    •
      riscv: Fix memblock reservation for device tree blob · 922b0375
      Authored by Albert Ou
      This fixes an error with how the FDT blob is reserved in memblock.
      An incorrect physical address calculation exposed the FDT header to
      unintended corruption, which typically manifested with of_fdt_raw_init()
      faulting during late boot after fdt_totalsize() returned a wrong value.
      Systems with smaller physical memory sizes more frequently trigger this
      issue, as the kernel is more likely to allocate from the DMA32 zone
      where bbl places the DTB after the kernel image.
      
      Commit 671f9a3e ("RISC-V: Setup initial page tables in two stages")
      changed the mapping of the DTB to reside in the fixmap area.
      Consequently, early_init_fdt_reserve_self() can no longer be used in
      setup_bootmem(), since it relies on __pa() to derive a physical address,
      which does not work for dtb_early_va, a fixmap address that is no longer
      a valid kernel logical address.
      
      The reserved[0x1] region shows the effect of the pointer underflow
      resulting from the __pa(initial_boot_params) offset subtraction:
      
      [    0.000000] MEMBLOCK configuration:
      [    0.000000]  memory size = 0x000000001fe00000 reserved size = 0x0000000000a2e514
      [    0.000000]  memory.cnt  = 0x1
      [    0.000000]  memory[0x0]     [0x0000000080200000-0x000000009fffffff], 0x000000001fe00000 bytes flags: 0x0
      [    0.000000]  reserved.cnt  = 0x2
      [    0.000000]  reserved[0x0]   [0x0000000080200000-0x0000000080c2dfeb], 0x0000000000a2dfec bytes flags: 0x0
      [    0.000000]  reserved[0x1]   [0xfffffff080100000-0xfffffff080100527], 0x0000000000000528 bytes flags: 0x0
      
      With the fix applied:
      
      [    0.000000] MEMBLOCK configuration:
      [    0.000000]  memory size = 0x000000001fe00000 reserved size = 0x0000000000a2e514
      [    0.000000]  memory.cnt  = 0x1
      [    0.000000]  memory[0x0]     [0x0000000080200000-0x000000009fffffff], 0x000000001fe00000 bytes flags: 0x0
      [    0.000000]  reserved.cnt  = 0x2
      [    0.000000]  reserved[0x0]   [0x0000000080200000-0x0000000080c2dfeb], 0x0000000000a2dfec bytes flags: 0x0
      [    0.000000]  reserved[0x1]   [0x0000000080e00000-0x0000000080e00527], 0x0000000000000528 bytes flags: 0x0
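      
      The fix itself, reconstructed from the description above (in
      setup_bootmem(); dtb_early_pa holds the DTB's physical address saved
      during setup_vm()):
      
      	/*
      	 * Avoid early_init_fdt_reserve_self(): it uses __pa(), which
      	 * does not work for dtb_early_va now that it is a fixmap address.
      	 */
      	memblock_reserve(dtb_early_pa, fdt_totalsize(dtb_early_va));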
      
      Fixes: 671f9a3e ("RISC-V: Setup initial page tables in two stages")
      Signed-off-by: Albert Ou <aou@eecs.berkeley.edu>
      Tested-by: Bin Meng <bmeng.cn@gmail.com>
      Reviewed-by: Anup Patel <anup@brainfault.org>
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
  21. 31 Aug, 2019 (2 commits)
  22. 10 Jul, 2019 (1 commit)
    •
      RISC-V: Setup initial page tables in two stages · 671f9a3e
      Authored by Anup Patel
      Currently, setup_vm() does the initial page table setup in one shot,
      very early, before enabling the MMU. Due to this, setup_vm() has to map
      all possible kernel virtual addresses, since it does not know the size
      and location of RAM. This means we have kernel mappings for non-existent
      RAM, and any buggy driver (or kernel) code doing out-of-bounds access to
      RAM will not fault, causing nondeterministic behaviour.
      
      Further, setup_vm() creates PMD mappings (i.e. 2M mappings) for
      RV64 systems. This means that for PAGE_OFFSET=0xffffffe000000000 (i.e.
      MAXPHYSMEM_128GB=y), setup_vm() will require 129 pages (i.e.
      516 KB) of memory for initial page tables, which is never freed. The
      memory required for initial page tables will further increase if
      we choose a lower value of PAGE_OFFSET (e.g. 0xffffff0000000000).
      
      This patch implements two-staged initial page table setup, as follows:
      1. Early (i.e. setup_vm()): This stage maps the kernel image and DTB in
      an early page table (i.e. early_pg_dir). The early_pg_dir will be used
      only by the boot HART, so it can be freed as part of the init memory
      free-up.
      2. Final (i.e. setup_vm_final()): This stage maps all possible RAM
      banks in the final page table (i.e. swapper_pg_dir). The boot HART
      will start using swapper_pg_dir at the end of setup_vm_final(). All
      non-boot HARTs directly use the swapper_pg_dir created by the boot HART.
      
      We have the following advantages with this new approach:
      1. Kernel mappings for non-existent RAM no longer exist.
      2. Memory consumed by initial page tables is now independent of the
      chosen PAGE_OFFSET.
      3. Memory consumed by initial page tables on an RV64 system is 2 pages
      (i.e. 8 KB), which is significantly reduced, and these pages will be
      freed as part of the init memory free-up.
      
      The patch also provides a foundation for implementing strict kernel
      mappings where we protect kernel text and rodata using PTE permissions.
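      
      A structural sketch of the two stages, with the function bodies
      condensed to comments (CSR names as in the riscv asm/csr.h of that era):
      
      	/* Stage 1: MMU off. Map only the kernel image and the DTB via
      	 * early_pg_dir, owned by the boot HART. */
      	asmlinkage void __init setup_vm(uintptr_t dtb_pa)
      	{
      		/* create_pgd_mapping(early_pg_dir, ...) for kernel + FDT */
      	}
      
      	/* Stage 2: memblock now knows the real RAM banks. Map them all
      	 * into swapper_pg_dir, then switch satp over to it. */
      	static void __init setup_vm_final(void)
      	{
      		/* for each memblock region: create_pgd_mapping(...) */
      		csr_write(CSR_SATP, PFN_DOWN(__pa(swapper_pg_dir)) | SATP_MODE);
      		local_flush_tlb_all();
      	}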
      Suggested-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Anup Patel <anup.patel@wdc.com>
      [paul.walmsley@sifive.com: updated to apply; fixed a checkpatch warning]
      Signed-off-by: Paul Walmsley <paul.walmsley@sifive.com>
  23. 04 Jul, 2019 (1 commit)
  24. 02 Jul, 2019 (1 commit)
  25. 05 Jun, 2019 (1 commit)
  26. 15 May, 2019 (1 commit)
  27. 11 Apr, 2019 (1 commit)
  28. 27 Mar, 2019 (1 commit)
  29. 21 Feb, 2019 (2 commits)