1. 01 Nov, 2019 · 1 commit
  2. 09 Aug, 2019 · 1 commit
  3. 19 Jul, 2019 · 3 commits
    • mm/memory_hotplug: allow arch_remove_memory() without CONFIG_MEMORY_HOTREMOVE · 80ec922d
      Authored by David Hildenbrand
      We want to improve error handling while adding memory by allowing to use
      arch_remove_memory() and __remove_pages() even if
      CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
      
      	arch_add_memory()
      	rc = do_something();
      	if (rc) {
      		arch_remove_memory();
      	}
      
      We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
      quite some dependencies for memory offlining.
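      For illustration, a minimal caller-side sketch of such a rollback, assuming the arch_add_memory()/arch_remove_memory() signatures of the kernel around v5.2/v5.3; do_something() is a stand-in for any follow-up step (e.g. creating memory block devices), so treat the details as an approximation rather than the patch itself:

      	/* Sketch only: signatures are assumptions based on ~v5.2 code. */
      	static int add_memory_with_rollback(int nid, u64 start, u64 size,
      					    struct mhp_restrictions *restrictions)
      	{
      		int rc;

      		rc = arch_add_memory(nid, start, size, restrictions);
      		if (rc)
      			return rc;

      		rc = do_something();	/* e.g. create memory block devices */
      		if (rc)			/* rollback no longer needs CONFIG_MEMORY_HOTREMOVE */
      			arch_remove_memory(nid, start, size, restrictions->altmap);
      		return rc;
      	}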
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-7-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      80ec922d
    • s390x/mm: implement arch_remove_memory() · 18c86506
      Authored by David Hildenbrand
      Will come in handy when wanting to handle errors after
      arch_add_memory().
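      A rough sketch of the shape such an s390 implementation takes, built from the generic helpers available at the time; this is an approximation, not a quote of the patch:

      	void arch_remove_memory(int nid, u64 start, u64 size,
      				struct vmem_altmap *altmap)
      	{
      		unsigned long start_pfn = start >> PAGE_SHIFT;
      		unsigned long nr_pages = size >> PAGE_SHIFT;
      		struct zone *zone;

      		zone = page_zone(pfn_to_page(start_pfn));
      		__remove_pages(zone, start_pfn, nr_pages, altmap);
      		vmem_remove_mapping(start, size);	/* drop the identity mapping again */
      	}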
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-4-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      18c86506
    • s390x/mm: fail when an altmap is used for arch_add_memory() · 973de24a
      Authored by David Hildenbrand
      ZONE_DEVICE is not yet supported, fail if an altmap is passed, so we
      don't forget arch_add_memory()/arch_remove_memory() when unlocking
      support.
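      In other words, arch_add_memory() simply rejects any altmap-backed request up front; roughly (a sketch, not a verbatim quote of the patch — the helper name for the unchanged path is hypothetical):

      	int arch_add_memory(int nid, u64 start, u64 size,
      			    struct mhp_restrictions *restrictions)
      	{
      		/* ZONE_DEVICE is not supported yet: refuse altmap-backed requests */
      		if (WARN_ON_ONCE(restrictions->altmap))
      			return -EINVAL;

      		return s390_do_add_memory(nid, start, size, restrictions); /* hypothetical name for the existing path */
      	}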
      
      Link: http://lkml.kernel.org/r/20190527111152.16324-3-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Suggested-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chintan Pandya <cpandya@codeaurora.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Jun Yao <yaojun8558363@gmail.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: "mike.travis@hpe.com" <mike.travis@hpe.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      973de24a
  4. 17 Jul, 2019 · 1 commit
    • dma-direct: Force unencrypted DMA under SME for certain DMA masks · 9087c375
      Authored by Tom Lendacky
      If a device doesn't support DMA to a physical address that includes the
      encryption bit (currently bit 47, so 48-bit DMA), then the DMA must
      occur to unencrypted memory. SWIOTLB is used to satisfy that requirement
      if an IOMMU is not active (enabled or configured in passthrough mode).
      
      However, commit fafadcd1 ("swiotlb: don't dip into swiotlb pool for
      coherent allocations") modified the coherent allocation support in
      SWIOTLB to use the DMA direct coherent allocation support. When an IOMMU
      is not active, this resulted in dma_alloc_coherent() failing for devices
      that didn't support DMA addresses that included the encryption bit.
      
      Addressing this requires changes to the force_dma_unencrypted() function
      in kernel/dma/direct.c. Since the function is now non-trivial and
      SME/SEV specific, update the DMA direct support to add an arch override
      for the force_dma_unencrypted() function. The arch override is selected
      when CONFIG_AMD_MEM_ENCRYPT is set. The arch override function resides in
      the arch/x86/mm/mem_encrypt.c file and forces unencrypted DMA when either
      SEV is active or SME is active and the device does not support DMA to
      physical addresses that include the encryption bit.
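      The resulting x86 override is essentially the following sketch (mask handling paraphrased from the description above; treat details as an approximation):

      	/* arch/x86/mm/mem_encrypt.c: arch override selected by CONFIG_AMD_MEM_ENCRYPT */
      	bool force_dma_unencrypted(struct device *dev)
      	{
      		/* Under SEV, all DMA must target unencrypted memory. */
      		if (sev_active())
      			return true;

      		/*
      		 * Under SME, force unencrypted DMA when the device cannot reach
      		 * addresses that include the encryption bit.
      		 */
      		if (sme_active()) {
      			u64 dma_enc_mask = DMA_BIT_MASK(__ffs64(sme_me_mask));
      			u64 dma_dev_mask = min_not_zero(dev->coherent_dma_mask,
      							dev->bus_dma_mask);

      			if (dma_dev_mask <= dma_enc_mask)
      				return true;
      		}

      		return false;
      	}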
      
      Fixes: fafadcd1 ("swiotlb: don't dip into swiotlb pool for coherent allocations")
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      [hch: moved the force_dma_unencrypted declaration to dma-mapping.h,
            fold the s390 fix from Halil Pasic]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      9087c375
  5. 15 Jun, 2019 · 1 commit
  6. 15 May, 2019 · 3 commits
    • mm/memory_hotplug: make __remove_pages() and arch_remove_memory() never fail · ac5c9426
      Authored by David Hildenbrand
      All callers of arch_remove_memory() ignore errors.  And we should really
      try to remove any errors from the memory removal path.  No more errors are
      reported from __remove_pages().  BUG() in s390x code in case
      arch_remove_memory() is triggered.  We may implement that properly later.
      WARN in case powerpc code failed to remove the section mapping, which is
      better than ignoring the error completely right now.
      
      Link: http://lkml.kernel.org/r/20190409100148.24703-5-david@redhat.com
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Oscar Salvador <osalvador@suse.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Stefan Agner <stefan@agner.ch>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Arun KS <arunks@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Mathieu Malaterre <malat@debian.org>
      Cc: Andrew Banman <andrew.banman@hpe.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Mike Travis <mike.travis@hpe.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ac5c9426
    • mm, memory_hotplug: provide a more generic restrictions for memory hotplug · 940519f0
      Authored by Michal Hocko
      arch_add_memory, __add_pages take a want_memblock which controls whether
      the newly added memory should get the sysfs memblock user API (e.g.
      ZONE_DEVICE users do not want/need this interface).  Some callers even
      want to control where do we allocate the memmap from by configuring
      altmap.
      
      Add a more generic hotplug context for arch_add_memory and __add_pages.
      struct mhp_restrictions contains flags which contains additional features
      to be enabled by the memory hotplug (MHP_MEMBLOCK_API currently) and
      altmap for alternative memmap allocator.
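      A sketch of that context structure as described above (field comments paraphrased):

      	/*
      	 * Restrictions for the memory hotplug:
      	 * flags:  MHP_ feature flags (currently only MHP_MEMBLOCK_API)
      	 * altmap: alternative allocator for the memmap array
      	 */
      	struct mhp_restrictions {
      		unsigned long flags;
      		struct vmem_altmap *altmap;
      	};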
      
      This patch shouldn't introduce any functional change.
      
      [akpm@linux-foundation.org: build fix]
      Link: http://lkml.kernel.org/r/20190408082633.2864-3-osalvador@suse.de
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Oscar Salvador <osalvador@suse.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      940519f0
    • initramfs: poison freed initrd memory · f94f7434
      Authored by Christoph Hellwig
      Various architectures including x86 poison the freed initrd memory.  Do
      the same in the generic free_initrd_mem implementation and switch a few
      more architectures that are identical to the generic code over to it now.
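      The generic implementation then amounts to a small weak helper along these lines (sketch):

      	#ifdef CONFIG_BLK_DEV_INITRD
      	void __weak free_initrd_mem(unsigned long start, unsigned long end)
      	{
      		/* poison the range before handing it back to the page allocator */
      		free_reserved_area((void *)start, (void *)end, POISON_FREE_INITMEM,
      				   "initrd");
      	}
      	#endif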
      
      Link: http://lkml.kernel.org/r/20190213174621.29297-9-hch@lst.de
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Steven Price <steven.price@arm.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f94f7434
  7. 29 Apr, 2019 · 1 commit
    • locking/lockdep: check for freed initmem in static_obj() · 7a5da02d
      Authored by Gerald Schaefer
      The following warning occurred on s390:
      WARNING: CPU: 0 PID: 804 at kernel/locking/lockdep.c:1025 lockdep_register_key+0x30/0x150
      
      This is because the check in static_obj() assumes that all memory within
      [_stext, _end] belongs to static objects, which at least for s390 isn't
      true. The init section is also part of this range, and freeing it allows
      the buddy allocator to allocate memory from it. We have virt == phys for
      the kernel on s390, so that such allocations would then have addresses
      within the range [_stext, _end].
      
      To fix this, introduce arch_is_kernel_initmem_freed(), similar to
      arch_is_kernel_text/data(), and add it to the checks in static_obj().
      This will always return 0 on architectures that do not define
      arch_is_kernel_initmem_freed. On s390, it will return 1 if initmem has
      been freed and the address is in the range [__init_begin, __init_end].
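      A sketch of the hook, following the description above (the generic fallback always returns 0; the s390 variant and the initmem_freed flag are written here as assumptions, not quoted from the patch):

      	/* generic fallback: architectures that do not opt in keep the old behaviour */
      	#ifndef arch_is_kernel_initmem_freed
      	static inline int arch_is_kernel_initmem_freed(unsigned long addr)
      	{
      		return 0;
      	}
      	#endif

      	/* s390 variant (sketch): report addresses inside the freed init section */
      	static inline int arch_is_kernel_initmem_freed(unsigned long addr)
      	{
      		if (!initmem_freed)
      			return 0;
      		return addr >= (unsigned long)__init_begin &&
      		       addr < (unsigned long)__init_end;
      	}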
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      7a5da02d
  8. 29 Dec, 2018 · 2 commits
  9. 31 Oct, 2018 · 2 commits
  10. 09 Oct, 2018 · 2 commits
    • s390/kasan: free early identity mapping structures · 135ff163
      Authored by Vasily Gorbik
      Kasan initialization code is changed to populate persistent shadow
      first, save allocator position into pgalloc_freeable and proceed with
      early identity mapping creation. This way early identity mapping paging
      structures could be freed at once after switching to swapper_pg_dir
      when early identity mapping is not needed anymore.
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      135ff163
    • s390/kasan: add initialization code and enable it · 42db5ed8
      Authored by Vasily Gorbik
      Kasan needs 1/8 of kernel virtual address space to be reserved as the
      shadow area. And eventually it requires the shadow memory offset to be
      known at compile time (passed to the compiler when full instrumentation
      is enabled).  Any value picked as the shadow area offset for 3-level
      paging would eat up identity mapping on 4-level paging (with 1PB
      shadow area size). So, the kernel sticks to 3-level paging when kasan
      is enabled. 3TB border is picked as the shadow offset.  The memory
      layout is adjusted so, that physical memory border does not exceed
      KASAN_SHADOW_START and vmemmap does not go below KASAN_SHADOW_END.
      
      Due to the fact that on s390 paging is set up very late and to cover
      more code with kasan instrumentation, temporary identity mapping and
      final shadow memory are set up early. The shadow memory mapping is
      later carried over to init_mm.pgd during paging_init.
      
      For the needs of paging structures allocation and shadow memory
      population a primitive allocator is used, which simply chops off
      memory blocks from the end of the physical memory.
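      A sketch of such a bump-down allocator; the names used here (pgalloc_pos, kasan_early_alloc) are illustrative assumptions, not necessarily the identifiers used by the patch:

      	static unsigned long pgalloc_pos __initdata;	/* current end of usable physical memory */

      	static void * __init kasan_early_alloc(unsigned long size)
      	{
      		/* chop the next block off the end of physical memory */
      		pgalloc_pos -= round_up(size, PAGE_SIZE);
      		memset((void *)pgalloc_pos, 0, size);
      		return (void *)pgalloc_pos;
      	}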
      
      Kasan currently doesn't track vmemmap and vmalloc areas.
      
      Current memory layout (for 3-level paging, 2GB physical memory).
      
      ---[ Identity Mapping ]---
      0x0000000000000000-0x0000000000100000
      ---[ Kernel Image Start ]---
      0x0000000000100000-0x0000000002b00000
      ---[ Kernel Image End ]---
      0x0000000002b00000-0x0000000080000000        2G <- physical memory border
      0x0000000080000000-0x0000030000000000     3070G PUD I
      ---[ Kasan Shadow Start ]---
      0x0000030000000000-0x0000030010000000      256M PMD RW X  <- shadow for 2G memory
      0x0000030010000000-0x0000037ff0000000   523776M PTE RO NX <- kasan zero ro page
      0x0000037ff0000000-0x0000038000000000      256M PMD RW X  <- shadow for 2G modules
      ---[ Kasan Shadow End ]---
      0x0000038000000000-0x000003d100000000      324G PUD I
      ---[ vmemmap Area ]---
      0x000003d100000000-0x000003e080000000
      ---[ vmalloc Area ]---
      0x000003e080000000-0x000003ff80000000
      ---[ Modules Area ]---
      0x000003ff80000000-0x0000040000000000        2G
      Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      42db5ed8
  11. 09 Jan, 2018 · 2 commits
  12. 14 Nov, 2017 · 1 commit
    • s390: remove all code using the access register mode · 0aaba41b
      Authored by Martin Schwidefsky
      The vdso code for the getcpu() and the clock_gettime() call use the access
      register mode to access the per-CPU vdso data page with the current code.
      
      An alternative to the complicated AR mode is to use the secondary space
      mode. This makes the vdso faster and quite a bit simpler. The downside is
      that the uaccess code has to be changed quite a bit.
      
      Which instructions are used depends on the machine and what kind of uaccess
      operation is requested. The instruction dictates which ASCE value needs
      to be loaded into %cr1 and %cr7.
      
      The different cases:
      
      * User copy with MVCOS for z10 and newer machines
        The MVCOS instruction can copy between the primary space (aka user) and
        the home space (aka kernel) directly. For set_fs(KERNEL_DS) the kernel
        ASCE is loaded into %cr1. For set_fs(USER_DS) the user space is already
        loaded in %cr1.
      
      * User copy with MVCP/MVCS for older machines
        To be able to execute the MVCP/MVCS instructions the kernel needs to
        switch to primary mode. The control register %cr1 has to be set to the
        kernel ASCE and %cr7 to either the kernel ASCE or the user ASCE dependent
        on set_fs(KERNEL_DS) vs set_fs(USER_DS).
      
      * Data access in the user address space for strnlen / futex
        To use "normal" instruction with data from the user address space the
        secondary space mode is used. The kernel needs to switch to primary mode,
        %cr1 has to contain the kernel ASCE and %cr7 either the user ASCE or the
        kernel ASCE, dependent on set_fs.
      
      To load a new value into %cr1 or %cr7 is an expensive operation, the kernel
      tries to be lazy about it. E.g. for multiple user copies in a row with
      MVCP/MVCS the replacement of the vdso ASCE in %cr7 with the user ASCE is
      done only once. On return to user space a CPU bit is checked that loads the
      vdso ASCE again.
      
      To enable and disable the data access via the secondary space two new
      functions are added, enable_sacf_uaccess and disable_sacf_uaccess. The fact
      that a context is in secondary space uaccess mode is stored in the
      mm_segment_t value for the task. The code of an interrupt may use set_fs
      as long as it returns to the previous state it got with get_fs with another
      call to set_fs. The code in finish_arch_post_lock_switch simply has to do a
      set_fs with the current mm_segment_t value for the task.
      
      For CPUs with MVCOS:
      
      CPU running in                        | %cr1 ASCE | %cr7 ASCE |
      --------------------------------------|-----------|-----------|
      user space                            |  user     |  vdso     |
      kernel, USER_DS, normal-mode          |  user     |  vdso     |
      kernel, USER_DS, normal-mode, lazy    |  user     |  user     |
      kernel, USER_DS, sacf-mode            |  kernel   |  user     |
      kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
      kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
      kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |
      
      For CPUs without MVCOS:
      
      CPU running in                        | %cr1 ASCE | %cr7 ASCE |
      --------------------------------------|-----------|-----------|
      user space                            |  user     |  vdso     |
      kernel, USER_DS, normal-mode          |  user     |  vdso     |
      kernel, USER_DS, normal-mode lazy     |  kernel   |  user     |
      kernel, USER_DS, sacf-mode            |  kernel   |  user     |
      kernel, KERNEL_DS, normal-mode        |  kernel   |  vdso     |
      kernel, KERNEL_DS, normal-mode, lazy  |  kernel   |  kernel   |
      kernel, KERNEL_DS, sacf-mode          |  kernel   |  kernel   |
      
      The lines with "lazy" refer to the state after a copy via the secondary
      space with a delayed reload of %cr1 and %cr7.
      
      There are three hardware address spaces that can cause a DAT exception,
      primary, secondary and home space. The exception can be related to
      four different fault types: user space fault, vdso fault, kernel fault,
      and the gmap faults.
      
      Dependent on the set_fs state and normal vs. sacf mode there are a number
      of fault combinations:
      
      1) user address space fault via the primary ASCE
      2) gmap address space fault via the primary ASCE
      3) kernel address space fault via the primary ASCE for machines with
         MVCOS and set_fs(KERNEL_DS)
      4) vdso address space faults via the secondary ASCE with an invalid
         address while running in secondary space in problem state
      5) user address space fault via the secondary ASCE for user-copy
         based on the secondary space mode, e.g. futex_ops or strnlen_user
      6) kernel address space fault via the secondary ASCE for user-copy
         with secondary space mode with set_fs(KERNEL_DS)
      7) kernel address space fault via the primary ASCE for user-copy
         with secondary space mode with set_fs(USER_DS) on machines without
         MVCOS.
      8) kernel address space fault via the home space ASCE
      
      Replace user_space_fault() with a new function get_fault_type() that
      can distinguish all four different fault types.
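      Sketched, the classification looks roughly like this; the enum names follow the four fault types above, while the body is heavily abbreviated and omits the gmap and sacf sub-cases:

      	enum fault_type {
      		KERNEL_FAULT,
      		USER_FAULT,
      		VDSO_FAULT,
      		GMAP_FAULT,
      	};

      	/* abbreviated sketch: classify a DAT exception by the ASCE that caused it */
      	static enum fault_type get_fault_type(struct pt_regs *regs)
      	{
      		unsigned long trans_exc_code = regs->int_parm_long & 3;

      		if (trans_exc_code == 0)
      			return USER_FAULT;	/* primary ASCE (gmap/KERNEL_DS cases omitted) */
      		if (trans_exc_code == 2)
      			return VDSO_FAULT;	/* secondary ASCE (sacf user-copy cases omitted) */
      		return KERNEL_FAULT;		/* home space ASCE */
      	}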
      
      With these changes the futex atomic ops from the kernel and the
      strnlen_user will get a little bit slower, as well as the old style
      uaccess with MVCP/MVCS. All user accesses based on MVCOS will be as
      fast as before. On the positive side, the user space vdso code is a
      lot faster and Linux ceases to use the complicated AR mode.
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      0aaba41b
  13. 09 Nov, 2017 · 1 commit
    • s390: avoid undefined behaviour · ead7a22e
      Authored by Heiko Carstens
      At a couple of places smatch emits warnings like this:
      
          arch/s390/mm/vmem.c:409 vmem_map_init() warn:
              right shifting more than type allows
      
      In fact shifting a signed type right is undefined. Avoid this and add
      an unsigned long cast. The shifted values are always positive.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      ead7a22e
  14. 02 Nov, 2017 · 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Authored by Greg Kroah-Hartman
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
        - file had no licensing information in it.
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side by side results from of the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there was new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      b2441318
  15. 09 Aug, 2017 · 1 commit
  16. 26 Jul, 2017 · 1 commit
  17. 25 Jul, 2017 · 1 commit
    • s390/mm: tag normal pages vs pages used in page tables · c9b5ad54
      Authored by Martin Schwidefsky
      The ESSA instruction has a new option that allows to tag pages that
      are not used as a page table. Without the tag the hypervisor has to
      assume that any guest page could be used in a page table inside the
      guest. This forces the hypervisor to flush all guest TLB entries
      whenever a host page table entry is invalidated. With the tag
      the host can skip the TLB flush if the page is tagged as normal page.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      c9b5ad54
  18. 07 Jul, 2017 · 3 commits
    • mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory · 3d79a728
      Authored by Michal Hocko
      arch_add_memory gets for_device argument which then controls whether we
      want to create memblocks for created memory sections.  Simplify the
      logic by telling whether we want memblocks directly rather than going
      through pointless negation.  This also makes the api easier to
      understand because it is clear what we want rather than nothing telling
      for_device which can mean anything.
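      After the change the prototypes read roughly as follows (sketch, based on the description above):

      	/* callers now state directly whether they want memory block devices */
      	extern int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock);
      	extern int __add_pages(int nid, unsigned long phys_start_pfn,
      			       unsigned long nr_pages, bool want_memblock);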
      
      This shouldn't introduce any functional change.
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-13-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3d79a728
    • mm, memory_hotplug: do not associate hotadded memory to zones until online · f1dd2cd1
      Authored by Michal Hocko
      The current memory hotplug implementation relies on having all the
      struct pages associate with a zone/node during the physical hotplug
      phase (arch_add_memory->__add_pages->__add_section->__add_zone).  In the
      vast majority of cases this means that they are added to ZONE_NORMAL.
      This has been so since 9d99aaa3 ("[PATCH] x86_64: Support memory
      hotadd without sparsemem") and it wasn't a big deal back then because
      movable onlining didn't exist yet.
      
      Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
      onlining 511c2aba ("mm, memory-hotplug: dynamic configure movable
      memory and portion memory") and then things got more complicated.
      Rather than reconsidering the zone association which was no longer
      needed (because the memory hotplug already depended on SPARSEMEM) a
      convoluted semantic of zone shifting has been developed.  Only the
      currently last memblock or the one adjacent to the zone_movable can be
      onlined movable.  This essentially means that the online type changes as
      the new memblocks are added.
      
      Let's simulate memory hot online manually
        $ echo 0x100000000 > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory32/valid_zones
        Normal Movable
      
        $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
      
        $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal
        /sys/devices/system/memory/memory34/valid_zones:Normal Movable
      
        $ echo online_movable > /sys/devices/system/memory/memory34/state
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Movable Normal
      
      This is an awkward semantic because an udev event is sent as soon as the
      block is onlined and an udev handler might want to online it based on
      some policy (e.g.  association with a node) but it will inherently race
      with new blocks showing up.
      
      This patch changes the physical online phase to not associate pages with
      any zone at all.  All the pages are just marked reserved and wait for
      the onlining phase to be associated with the zone as per the online
      request.  There are only two requirements
      
      	- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap
      
      	- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses
      
      the latter one is not an inherent requirement and can be changed in the
      future.  It preserves the current behavior and made the code slightly
      simpler.  This is subject to change in future.
      
      This means that the same physical online steps as above will lead to the
      following state: Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Movable
      
      Implementation:
      The current move_pfn_range is reimplemented to check the above
      requirements (allow_online_pfn_range) and then updates the respective
      zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
      pfn range with the zone/node.  __add_pages is updated to not require the
      zone and only initializes sections in the range.  This allowed to
      simplify the arch_add_memory code (s390 could get rid of quite some of
      code).
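      Conceptually the two phases now look like this sketch (helper names as introduced above; not a verbatim excerpt):

      	/* physical hotplug phase: only create sections, no zone association yet */
      	__add_pages(nid, start_pfn, nr_pages, want_memblock);

      	/* online phase: pick ZONE_NORMAL or ZONE_MOVABLE as requested, then link pages */
      	if (!allow_online_pfn_range(nid, start_pfn, nr_pages, online_type))
      		return -EINVAL;
      	zone = move_pfn_range(online_type, nid, start_pfn, nr_pages);
      		/* calls move_pfn_range_to_zone() to associate the pages with zone/node */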
      
      devm_memremap_pages is the only user of arch_add_memory which relies on
      the zone association because it only hooks into the memory hotplug only
      half way.  It uses it to associate the new memory with ZONE_DEVICE but
      doesn't allow it to be {on,off}lined via sysfs.  This means that this
      particular code path has to call move_pfn_range_to_zone explicitly.
      
      The original zone shifting code is kept in place and will be removed in
      the follow up patch for an easier review.
      
      Please note that this patch also changes the original behavior when
      offlining a memory block adjacent to another zone (Normal vs.  Movable)
      used to allow to change its movable type.  This will be handled later.
      
      [richard.weiyang@gmail.com: simplify zone_intersects()]
        Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
      [richard.weiyang@gmail.com: remove duplicate call for set_page_links]
        Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
      [akpm@linux-foundation.org: remove unused local `i']
      Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f1dd2cd1
    • mm, memory_hotplug: get rid of is_zone_device_section · 1b862aec
      Authored by Michal Hocko
      Device memory hotplug hooks into regular memory hotplug only half way.
      It needs memory sections to track struct pages but there is no
      need/desire to associate those sections with memory blocks and export
      them to the userspace via sysfs because they cannot be onlined anyway.
      
      This is currently expressed by for_device argument to arch_add_memory
      which then makes sure to associate the given memory range with
      ZONE_DEVICE.  register_new_memory then relies on is_zone_device_section
      to distinguish special memory hotplug from the regular one.  While this
      works now, later patches in this series want to move __add_zone outside
      of arch_add_memory path so we have to come up with something else.
      
      Add want_memblock down the __add_pages path and use it to control
      whether the section->memblock association should be done.
      arch_add_memory then just trivially want memblock for everything but
      for_device hotplug.
      
      remove_memory_section doesn't need is_zone_device_section either.  We
      can simply skip all the memblock specific cleanup if there is no
      memblock for the given section.
      
      This shouldn't introduce any functional change.
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      1b862aec
  19. 12 Jun, 2017 · 3 commits
  20. 09 May, 2017 · 1 commit
  21. 17 Feb, 2017 · 1 commit
  22. 08 Feb, 2017 · 1 commit
    • s390: add no-execute support · 57d7f939
      Authored by Martin Schwidefsky
      Bit 0x100 of a page table, segment table of region table entry
      can be used to disallow code execution for the virtual addresses
      associated with the entry.
      
      There is one tricky bit, the system call to return from a signal
      is part of the signal frame written to the user stack. With a
      non-executable stack this would stop working. To avoid breaking
      things the protection fault handler checks the opcode that caused
      the fault for 0x0a77 (sys_sigreturn) and 0x0aad (sys_rt_sigreturn)
      and injects a system call. This is preferable to the alternative
      solution with a stub function in the vdso because it works for
      vdso=off and statically linked binaries as well.
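      A sketch of that check in the protection fault handler (the opcode values match the ones named above; the emulation details are abbreviated and approximate):

      	/* 0x0a77 = svc 119 (sys_sigreturn), 0x0aad = svc 173 (sys_rt_sigreturn) */
      	static int signal_return(struct pt_regs *regs)
      	{
      		u16 instruction;

      		if (__get_user(instruction, (u16 __user *) regs->psw.addr))
      			return -EFAULT;
      		if (instruction != 0x0a77 && instruction != 0x0aad)
      			return -EACCES;
      		/* a signal return hit the non-executable stack: inject the system
      		 * call instead of delivering a protection fault (details omitted) */
      		set_pt_regs_flag(regs, PIF_SYSCALL);
      		return 0;
      	}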
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      57d7f939
  23. 25 Dec, 2016 · 1 commit
  24. 24 Oct, 2016 · 1 commit
    • s390/mm: fix zone calculation in arch_add_memory() · 4a654294
      Authored by Gerald Schaefer
      Standby (hotplug) memory should be added to ZONE_MOVABLE on s390. After
      commit 199071f1 "s390/mm: make arch_add_memory() NUMA aware",
      arch_add_memory() used memblock_end_of_DRAM() to find out the end of
      ZONE_NORMAL and the beginning of ZONE_MOVABLE. However, commit 7f36e3e5
      "memory-hotplug: add hot-added memory ranges to memblock before allocate
      node_data for a node." moved the call of memblock_add_node() before
      the call of arch_add_memory() in add_memory_resource(), and thus changed
      the return value of memblock_end_of_DRAM() when called in
      arch_add_memory(). As a result, arch_add_memory() will think that all
      memory blocks should be added to ZONE_NORMAL.
      
      Fix this by changing the logic in arch_add_memory() so that it will
      manually iterate over all zones of a given node to find out which zone
      a memory block should be added to.
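      Sketched, the fixed logic walks the node's zones instead of trusting memblock_end_of_DRAM() (abbreviated, not a verbatim excerpt):

      	for (i = 0; i < MAX_NR_ZONES; i++) {
      		struct zone *zone = NODE_DATA(nid)->node_zones + i;

      		if (zone_idx(zone) != ZONE_MOVABLE) {
      			/* add whatever part of the block fits an existing zone */
      		} else {
      			/* the remainder of the standby block goes to ZONE_MOVABLE */
      		}
      	}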
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      4a654294
  25. 13 Jun, 2016 · 3 commits
    • s390: add proper __ro_after_init support · d07a980c
      Authored by Heiko Carstens
      On s390 __ro_after_init is currently mapped to __read_mostly which
      means that data marked as __ro_after_init will not be protected.
      
      Reason for this is that the common code __ro_after_init implementation
      is x86 centric: the ro_after_init data section was added to rodata,
      since x86 enables write protection to kernel text and rodata very
      late. On s390 we have write protection for these sections enabled with
      the initial page tables. So adding the ro_after_init data section to
      rodata does not work on s390.
      
      In order to make __ro_after_init work properly on s390 move the
      ro_after_init data, right behind rodata. Unlike the rodata section it
      will be marked read-only later after all init calls happened.
      
      This s390 specific implementation adds new __start_ro_after_init and
      __end_ro_after_init labels. Everything in between will be marked
      read-only after the init calls happened. In addition to the
      __ro_after_init data move also the exception table there, since from a
      practical point of view it fits the __ro_after_init requirements.
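      For reference, a small generic usage example (not part of the patch; the setup name is illustrative): data marked this way stays writable through the init calls and is remapped read-only afterwards.

      	static bool feature_enabled __ro_after_init;

      	static int __init feature_setup(char *str)
      	{
      		feature_enabled = true;	/* fine: init calls have not finished yet */
      		return 1;
      	}
      	__setup("myfeature", feature_setup);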
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      d07a980c
    • s390/mm: simplify the TLB flushing code · 64f31d58
      Authored by Martin Schwidefsky
      ptep_flush_lazy and pmdp_flush_lazy use mm->context.attach_count to
      decide between a lazy TLB flush vs an immediate TLB flush. The field
      contains two 16-bit counters, the number of CPUs that have the mm
      attached and can create TLB entries for it and the number of CPUs in
      the middle of a page table update.
      
      The __tlb_flush_asce, ptep_flush_direct and pmdp_flush_direct functions
      use the attach counter and a mask check with mm_cpumask(mm) to decide
      between a local flush local of the current CPU and a global flush.
      
      For all these functions the decision between lazy vs immediate and
      local vs global TLB flush can be based on CPU masks. There are two
      masks:  the mm->context.cpu_attach_mask with the CPUs that are actively
      using the mm, and the mm_cpumask(mm) with the CPUs that have used the
      mm since the last full flush. The decision between lazy vs immediate
      flush is based on the mm->context.cpu_attach_mask, to decide between
      local vs global flush the mm_cpumask(mm) is used.
      
      With this patch all checks will use the CPU masks, the old counter
      mm->context.attach_count with its two 16-bit values is turned into a
      single counter mm->context.flush_count that keeps track of the number
      of CPUs with incomplete page table updates. The sole user of this
      counter is finish_arch_post_lock_switch() which waits for the end of
      all page table updates.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      64f31d58
    • s390/mm: align swapper_pg_dir to 16k · 0ccb32c9
      Authored by Heiko Carstens
      The segment/region table that is part of the kernel image must be
      properly aligned to 16k in order to make the crdte inline assembly
      work.
      Otherwise it will calculate a wrong segment/region table start address
      and access incorrect memory locations if the swapper_pg_dir is not
      aligned to 16k.
      
      Therefore define BSS_FIRST_SECTIONS in order to put the swapper_pg_dir
      at the beginning of the bss section and also align the bss section to
      16k just like other architectures did.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      0ccb32c9
  26. 21 Apr, 2016 · 1 commit
    • s390/mm: fix asce_bits handling with dynamic pagetable levels · 723cacbd
      Authored by Gerald Schaefer
      There is a race with multi-threaded applications between context switch and
      pagetable upgrade. In switch_mm() a new user_asce is built from mm->pgd and
      mm->context.asce_bits, w/o holding any locks. A concurrent mmap with a
      pagetable upgrade on another thread in crst_table_upgrade() could already
      have set new asce_bits, but not yet the new mm->pgd. This would result in a
      corrupt user_asce in switch_mm(), and eventually in a kernel panic from a
      translation exception.
      
      Fix this by storing the complete asce instead of just the asce_bits, which
      can then be read atomically from switch_mm(), so that it either sees the
      old value or the new value, but no mixture. Both cases are OK. Having the
      old value would result in a page fault on access to the higher level memory,
      but the fault handler would see the new mm->pgd, if it was a valid access
      after the mmap on the other thread has completed. So as worst-case scenario
      we would have a page fault loop for the racing thread until the next time
      slice.
      
      Also remove dead code and simplify the upgrade/downgrade path, there are no
      upgrades from 2 levels, and only downgrades from 3 levels for compat tasks.
      There are also no concurrent upgrades, because the mmap_sem is held with
      down_write() in do_mmap, so the flush and table checks during upgrade can
      be removed.
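      Sketched, the fix builds and stores the complete ASCE in the mm context so that switch_mm() only performs a single-word read; constants are shown for the default 3-level case and should be treated as an approximation:

      	/* pagetable upgrade/downgrade: publish the complete ASCE in one store */
      	mm->context.asce = __pa(mm->pgd) | _ASCE_TABLE_LENGTH |
      			   _ASCE_USER_BITS | _ASCE_TYPE_REGION3;

      	/* switch_mm(): either the old or the new value is seen, never a mixture */
      	S390_lowcore.user_asce = next->context.asce;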
      Reported-by: Michael Munday <munday@ca.ibm.com>
      Reviewed-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      723cacbd