1. 01 October 2018, 2 commits
  2. 25 September 2018, 2 commits
    • arm64/mm: use fixmap to modify swapper_pg_dir · 2330b7ca
      Committed by Jun Yao
      Once swapper_pg_dir is in the rodata section, it will not be possible to
      modify it directly, but we will need to modify it in some cases.
      
      To enable this, we can use the fixmap when deliberately modifying
      swapper_pg_dir. As the pgd is only transiently mapped, this provides
      some resilience against illicit modification of the pgd, e.g. by a
      Kernel Space Mirror Attack (KSMA).
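      
      A minimal sketch of the shape this takes (condensed from the patch;
      locking and TLB maintenance details are simplified):
      
          static DEFINE_SPINLOCK(swapper_pgdir_lock);
      
          void set_swapper_pgd(pgd_t *pgdp, pgd_t pgd)
          {
                  pgd_t *fixmap_pgdp;
      
                  spin_lock(&swapper_pgdir_lock);
                  /* Transiently map the swapper pgd page via the fixmap. */
                  fixmap_pgdp = pgd_set_fixmap(__pa_symbol(pgdp));
                  WRITE_ONCE(*fixmap_pgdp, pgd);
                  /* Tear the mapping down again once the update is visible. */
                  pgd_clear_fixmap();
                  spin_unlock(&swapper_pgdir_lock);
          }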
      Signed-off-by: Jun Yao <yaojun8558363@gmail.com>
      Reviewed-by: James Morse <james.morse@arm.com>
      [Mark: simplify ifdeffery, commit message]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      2330b7ca
    • arm64/mm: Separate boot-time page tables from swapper_pg_dir · 2b5548b6
      Committed by Jun Yao
      Since the address of swapper_pg_dir is fixed for a given kernel image,
      it is an attractive target for manipulation via an arbitrary write. To
      mitigate this we'd like to make it read-only by moving it into the
      rodata section.
      
      We require that swapper_pg_dir is at a fixed offset from tramp_pg_dir
      and reserved_ttbr0, so these will also need to move into rodata.
      However, swapper_pg_dir is allocated along with some transient page
      tables used for boot which we do not want to move into rodata.
      
      As a step towards this, this patch separates the boot-time page tables
      into a new init_pg_dir, and reduces swapper_pg_dir to the single page it
      needs to be. This allows us to retain the relationship between
      swapper_pg_dir, tramp_pg_dir, and reserved_ttbr0, while cleanly
      separating these from the boot-time page tables.
      
      The init_pg_dir holds all of the pgd/pud/pmd/pte levels needed during
      boot, and all of these levels will be freed when we switch to the
      swapper_pg_dir, which is initialized by the existing code in
      paging_init(). Since we start off on the init_pg_dir, we no longer need
      to allocate a transient page table in paging_init() in order to ensure
      that swapper_pg_dir isn't live while we initialize it.
      
      There should be no functional change as a result of this patch.
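      
      A rough sketch of the resulting layout, as carved out in the kernel
      linker script (illustrative; exact symbols and sizes are per the patch):
      
          /* vmlinux.lds.S: boot-time tables live after BSS ... */
          init_pg_dir = .;
          . += INIT_DIR_SIZE;
          init_pg_end = .;
      
          /* ... while swapper_pg_dir shrinks to a single page. */
          swapper_pg_dir = .;
          . += PAGE_SIZE;
          swapper_pg_end = .;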
      Signed-off-by: Jun Yao <yaojun8558363@gmail.com>
      Reviewed-by: James Morse <james.morse@arm.com>
      [Mark: place init_pg_dir after BSS, fold mm changes, commit message]
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      2b5548b6
  3. 21 September 2018, 1 commit
    • arm64: Kconfig: Remove ARCH_HAS_HOLES_MEMORYMODEL · 8a695a58
      Committed by James Morse
      include/linux/mmzone.h describes ARCH_HAS_HOLES_MEMORYMODEL as
      relevant when parts of the memmap have been free()d. This would
      happen on systems where memory is smaller than a sparsemem-section,
      and the extra struct pages are expensive. pfn_valid() on these
      systems returns true for the whole sparsemem-section, so an extra
      memmap_valid_within() check is needed.
      
      On arm64 we have nomap memory, so we always provide pfn_valid() to test
      for nomap pages. This means ARCH_HAS_HOLES_MEMORYMODEL's extra checks
      are already rolled up into pfn_valid().
      
      Remove it.
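      
      For reference, arm64's pfn_valid() at this point is essentially a
      memblock query, which already accounts for nomap regions (simplified
      sketch, omitting the PFN range check discussed elsewhere in this log):
      
          int pfn_valid(unsigned long pfn)
          {
                  /* nomap regions are absent from memblock's memory map */
                  return memblock_is_map_memory(pfn << PAGE_SHIFT);
          }
          EXPORT_SYMBOL(pfn_valid);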
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: James Morse <james.morse@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      8a695a58
  4. 18 September 2018, 1 commit
    • arm64: mm: Support Common Not Private translations · 5ffdfaed
      Committed by Vladimir Murzin
      Common Not Private (CNP) is a feature of the ARMv8.2 extension which
      allows translation table entries to be shared between different PEs in
      the same inner shareable domain, so the hardware can use this fact to
      optimise the caching of such entries in the TLB.
      
      CNP occupies one bit in TTBRx_ELy and VTTBR_EL2, which advertises to
      the hardware that the translation table entries pointed to by this
      TTBR are the same as on every PE in the same inner shareable domain
      whose equivalent TTBR also has the CNP bit set. If the CNP bit is set
      but the TTBRs do not point at the same translation table entries for
      a given ASID and VMID, the system is misconfigured and the results of
      translations are UNPREDICTABLE.
      
      For the kernel, we postpone setting CNP until all CPUs are up, and
      rely on the cpufeature framework to 1) patch the code which is
      sensitive to CNP and 2) update TTBR1_EL1 with the CNP bit set.
      TTBR1_EL1 can be reprogrammed as a result of hibernation or cpuidle
      (via __enable_mmu). For these two cases we restore the CNP bit via
      __cpu_suspend_exit().
      
      There are a few cases where we need to care about changes to TTBR0_EL1:
        - a switch to the idmap
        - software-emulated PAN
      
      We rule out the latter via Kconfig options, and for the former we make
      sure that CNP is set for non-zero ASIDs only.
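      
      A hedged sketch of what "TTBR1_EL1 with the CNP bit set" amounts to
      (CNP is bit 0 of the TTBR; the helper name here is illustrative):
      
          #define TTBR_CNP_BIT    (UL(1) << 0)
      
          /* Advertise that this PE's TTBR1 tables are shared in the domain. */
          static inline void ttbr1_set_cnp(phys_addr_t pgd_phys)
          {
                  u64 ttbr = phys_to_ttbr(pgd_phys) | TTBR_CNP_BIT;
      
                  write_sysreg(ttbr, ttbr1_el1);
                  isb();
          }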
      Reviewed-by: James Morse <james.morse@arm.com>
      Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
      [catalin.marinas@arm.com: default y for CONFIG_ARM64_CNP]
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      5ffdfaed
  5. 15 September 2018, 1 commit
  6. 10 September 2018, 1 commit
  7. 18 August 2018, 2 commits
  8. 17 August 2018, 1 commit
    • arm64: mm: check for upper PAGE_SHIFT bits in pfn_valid() · 5ad356ea
      Committed by Greg Hackmann
      ARM64's pfn_valid() shifts away the upper PAGE_SHIFT bits of the input
      before seeing if the PFN is valid.  This leads to false positives when
      some of the upper bits are set, but the lower bits match a valid PFN.
      
      For example, the following userspace code looks up a bogus entry in
      /proc/kpageflags:
      
          int pagemap = open("/proc/self/pagemap", O_RDONLY);
          int pageflags = open("/proc/kpageflags", O_RDONLY);
          uint64_t pfn, val;
      
          lseek64(pagemap, [...], SEEK_SET);
          read(pagemap, &pfn, sizeof(pfn));
          if (pfn & (1UL << 63)) {        /* valid PFN */
              pfn &= ((1UL << 55) - 1);   /* clear flag bits */
              pfn |= (1UL << 55);
              lseek64(pageflags, pfn * sizeof(uint64_t), SEEK_SET);
              read(pageflags, &val, sizeof(val));
          }
      
      On ARM64 this causes the userspace process to crash with SIGSEGV rather
      than reading (1 << KPF_NOPAGE).  kpageflags_read() treats the offset as
      valid, and stable_page_flags() will try to access an address between the
      user and kernel address ranges.
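      
      The fix amounts to rejecting any PFN whose upper bits are lost in the
      shift, roughly:
      
          int pfn_valid(unsigned long pfn)
          {
                  phys_addr_t addr = pfn << PAGE_SHIFT;
      
                  /* A pfn with high bits set cannot round-trip the shift. */
                  if ((addr >> PAGE_SHIFT) != pfn)
                          return 0;
                  return memblock_is_map_memory(addr);
          }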
      
      Fixes: c1cc1552 ("arm64: MMU initialisation")
      Cc: stable@vger.kernel.org
      Signed-off-by: Greg Hackmann <ghackmann@google.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      5ad356ea
  9. 09 August 2018, 1 commit
  10. 02 August 2018, 1 commit
    • mm: do not initialize TLB stack vma's with vma_init() · 8b11ec1b
      Committed by Linus Torvalds
      Commit 2c4541e2 ("mm: use vma_init() to initialize VMAs on stack and
      data segments") tried to initialize various left-over ad-hoc vma's
      "properly", but actually made things worse for the temporary vma's used
      for TLB flushing.
      
      vma_init() doesn't actually initialize all of the vma, just a few
      fields, so doing something like
      
         -       struct vm_area_struct vma = { .vm_mm = tlb->mm, };
         +       struct vm_area_struct vma;
         +
         +       vma_init(&vma, tlb->mm);
      
      was actually very bad: instead of having a nicely initialized vma with
      every field but "vm_mm" zeroed, you'd have an entirely uninitialized vma
      with only a couple of fields initialized.  And they weren't even fields
      that the code in question mostly cared about.
      
      The flush_tlb_range() function takes a "struct vma" rather than a
      "struct mm_struct", because a few architectures actually care about what
      kind of range it is - being able to only do an ITLB flush if it's a
      range that doesn't have data accesses enabled, for example.  And all the
      normal users already have the vma for doing the range invalidation.
      
      But a few people want to call flush_tlb_range() with a range they just
      made up, so they also end up using a made-up vma.  x86 just has a
      special "flush_tlb_mm_range()" function for this, but other
      architectures (arm and ia64) do the "use fake vma" thing instead, and
      thus got caught up in the vma_init() changes.
      
      At the same time, the TLB flushing code really doesn't care about most
      other fields in the vma, so vma_init() is just unnecessary and
      pointless.
      
      This fixes things by having an explicit "this is just an initializer for
      the TLB flush" initializer macro, which is used by the arm/arm64/ia64
      people who mis-use this interface with just a dummy vma.
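      
      The initializer macro is tiny; it fills in just the fields the TLB
      code actually reads, leaving the rest zero-initialized (sketch):
      
          #define TLB_FLUSH_VMA(mm, flags)  { .vm_mm = (mm), .vm_flags = (flags) }
      
          /* usage: a stack vma that is zero apart from what the flush needs */
          struct vm_area_struct vma = TLB_FLUSH_VMA(tlb->mm, 0);
      
          flush_tlb_range(&vma, start, end);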
      
      Fixes: 2c4541e2 ("mm: use vma_init() to initialize VMAs on stack and data segments")
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Kirill Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8b11ec1b
  11. 27 July 2018, 2 commits
  12. 25 July 2018, 1 commit
    • arm64: fix vmemmap BUILD_BUG_ON() triggering on !vmemmap setups · 7b0eb6b4
      Committed by Johannes Weiner
      Arnd reports the following arm64 randconfig build error with the PSI
      patches that add another page flag:
      
        /git/arm-soc/arch/arm64/mm/init.c: In function 'mem_init':
        /git/arm-soc/include/linux/compiler.h:357:38: error: call to
        '__compiletime_assert_618' declared with attribute error: BUILD_BUG_ON
        failed: sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT)
      
      The additional page flag causes other information stored in
      page->flags to get bumped into their own struct page member:
      
        #if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_CPUPID_SHIFT <=
        BITS_PER_LONG - NR_PAGEFLAGS
        #define LAST_CPUPID_WIDTH LAST_CPUPID_SHIFT
        #else
        #define LAST_CPUPID_WIDTH 0
        #endif
      
        #if defined(CONFIG_NUMA_BALANCING) && LAST_CPUPID_WIDTH == 0
        #define LAST_CPUPID_NOT_IN_PAGE_FLAGS
        #endif
      
      which in turn causes the struct page size to exceed the size set in
      STRUCT_PAGE_MAX_SHIFT. This value is an estimate used to size the
      VMEMMAP page array according to address space and struct page size.
      
      However, the check is performed - and triggers here - on a !VMEMMAP
      config, which consumes an additional 22 page bits for the sparse
      section id. When VMEMMAP is enabled, those bits are returned, cpupid
      doesn't need its own member, and the page passes the VMEMMAP check.
      
      Restrict that check to the situation it was meant to check: that we
      are sizing the VMEMMAP page array correctly.
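      
      Concretely, the check in arm64's mem_init() gets wrapped so it only
      applies to vmemmap configs, along these lines:
      
          #ifdef CONFIG_SPARSEMEM_VMEMMAP
                  /* Only meaningful when struct page size feeds vmemmap sizing. */
                  BUILD_BUG_ON(sizeof(struct page) > (1 << STRUCT_PAGE_MAX_SHIFT));
          #endif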
      
      Says Arnd:
      
          Further experiments show that the build error already existed before,
          but was only triggered with larger values of CONFIG_NR_CPU and/or
          CONFIG_NODES_SHIFT that might be used in actual configurations but
          not in randconfig builds.
      
          With longer CPU and node masks, I could recreate the problem with
          kernels as old as linux-4.7 when arm64 NUMA support got added.
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Tested-by: Arnd Bergmann <arnd@arndb.de>
      Cc: stable@vger.kernel.org
      Fixes: 1a2db300 ("arm64, numa: Add NUMA support for arm64 platforms.")
      Fixes: 3e1907d5 ("arm64: mm: move vmemmap region right below the linear region")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      7b0eb6b4
  13. 12 July 2018, 1 commit
  14. 06 July 2018, 4 commits
  15. 05 July 2018, 1 commit
    • ioremap: Update pgtable free interfaces with addr · 785a19f9
      Committed by Chintan Pandya
      The following kernel panic was observed on an ARM64 platform due to a
      stale TLB entry.
      
       1. ioremap with 4K size, a valid pte page table is set.
       2. iounmap it, its pte entry is set to 0.
       3. ioremap the same address with 2M size, update its pmd entry with
          a new value.
       4. CPU may hit an exception because the old pmd entry is still in TLB,
          which leads to a kernel panic.
      
      Commit b6bdb751 ("mm/vmalloc: add interfaces to free unmapped page
      table") has addressed this panic by falling back to pte mappings in the
      above case on ARM64.
      
      To support pmd mappings in all cases, TLB purge needs to be performed
      in this case on ARM64.
      
      Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
      so that TLB purge can be added later in separate patches.
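      
      The interface change itself is just the extra parameter (the TLB
      flushing lands in the follow-up patches):
      
          /* ioremap page-table teardown hooks, now told which VA is dying */
          int pud_free_pmd_page(pud_t *pudp, unsigned long addr);
          int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr);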
      
      [toshi.kani@hpe.com: merge changes, rewrite patch description]
      Fixes: 28ee90fe ("x86/mm: implement free pmd/pte page interfaces")
      Signed-off-by: Chintan Pandya <cpandya@codeaurora.org>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: mhocko@suse.com
      Cc: akpm@linux-foundation.org
      Cc: hpa@zytor.com
      Cc: linux-mm@kvack.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: stable@vger.kernel.org
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <stable@vger.kernel.org>
      Link: https://lkml.kernel.org/r/20180627141348.21777-3-toshi.kani@hpe.com
      785a19f9
  16. 02 July 2018, 1 commit
  17. 23 June 2018, 1 commit
  18. 19 June 2018, 1 commit
  19. 15 June 2018, 1 commit
  20. 13 June 2018, 1 commit
    • treewide: kzalloc() -> kcalloc() · 6396bb22
      Committed by Kees Cook
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a, b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kcalloc(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
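      
      A typical instance of the conversion (illustrative; kcalloc() fails
      the allocation on multiplication overflow instead of silently
      wrapping):
      
          /* before: open-coded size multiplication can overflow */
          buf = kzalloc(n * sizeof(*buf), GFP_KERNEL);
      
          /* after: overflow-checked, zeroed array allocation */
          buf = kcalloc(n, sizeof(*buf), GFP_KERNEL);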
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2-factor products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: Kees Cook <keescook@chromium.org>
      6396bb22
  21. 24 May 2018, 1 commit
  22. 23 May 2018, 3 commits
    • arm64: Unify kernel fault reporting · c870f14e
      Committed by Mark Rutland
      In do_page_fault(), we handle some kernel faults early, and simply
      die() with a message. For faults handled later, we dump the faulting
      address, decode the ESR, walk the page tables, and perform a number of
      steps to ensure that this data is reported.
      
      Let's unify the handling of fatal kernel faults with a new
      die_kernel_fault() helper, handling all of these details. This is
      largely the same as the existing logic in __do_kernel_fault(), except
      that addresses are consistently padded to 16 hex characters, as would be
      expected for a 64-bit address.
      
      The messages currently logged in do_page_fault are adjusted to fit into
      the die_kernel_fault() message template.
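      
      A condensed sketch of the helper's shape:
      
          static void die_kernel_fault(const char *msg, unsigned long addr,
                                       unsigned int esr, struct pt_regs *regs)
          {
                  bust_spinlocks(1);
                  pr_alert("Unable to handle kernel %s at virtual address %016lx\n",
                           msg, addr);
                  mem_abort_decode(esr);  /* decode the ESR */
                  show_pte(addr);         /* walk the page tables */
                  die("Oops", regs, esr);
                  bust_spinlocks(0);
                  do_exit(SIGKILL);
          }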
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      c870f14e
    • arm64: make is_permission_fault() name clearer · 969e61ba
      Committed by Mark Rutland
      The naming of is_permission_fault() makes it sound like it should return
      true for permission faults from EL0, but by design, it only does so for
      faults from EL1.
      
      Let's make this clear by putting el1 in the name, as we do for
      is_el1_instruction_abort().
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      969e61ba
    • arm64: fault: Don't leak data in ESR context for user fault on kernel VA · cc198460
      Committed by Peter Maydell
      If userspace faults on a kernel address, handing them the raw ESR
      value on the sigframe as part of the delivered signal can leak data
      useful to attackers who are using information about the underlying hardware
      fault type (e.g. translation vs permission) as a mechanism to defeat KASLR.
      
      However there are also legitimate uses for the information provided
      in the ESR -- notably the GCC and LLVM sanitizers use this to report
      whether wild pointer accesses by the application are reads or writes
      (since a wild write is a more serious bug than a wild read), so we
      don't want to drop the ESR information entirely.
      
      For faulting addresses in the kernel, sanitize the ESR. We choose
      to present userspace with the illusion that there is nothing mapped
      in the kernel's part of the address space at all, by reporting all
      faults as level 0 translation faults taken to EL1.
      
      These fields are safe to pass through to userspace as they depend
      only on the instruction that userspace used to provoke the fault:
       EC IL (always)
       ISV CM WNR (for all data aborts)
      All the other fields in ESR except DFSC are architecturally RES0
      for an L0 translation fault taken to EL1, so can be zeroed out
      without confusing userspace.
      
      The illusion is not entirely perfect, as there is a tiny wrinkle
      where we will report an alignment fault that was not due to the memory
      type (for instance a LDREX to an unaligned address) as a translation
      fault, whereas if you do this on real unmapped memory the alignment
      fault takes precedence. This is not likely to trip anybody up in
      practice, as the only users we know of for the ESR information who
      care about the behaviour for kernel addresses only really want to
      know about the WnR bit.
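      
      In outline, the sanitization masks the ESR down to the
      instruction-derived fields and forces the fault status code to a
      level 0 translation fault (condensed sketch; the exact masks here are
      an assumption based on the description above):
      
          switch (ESR_ELx_EC(esr)) {
          case ESR_ELx_EC_DABT_LOW:       /* data abort from EL0 */
                  esr &= ESR_ELx_EC_MASK | ESR_ELx_IL |
                         ESR_ELx_ISV | ESR_ELx_CM | ESR_ELx_WNR;
                  esr |= ESR_ELx_FSC_FAULT;       /* L0 translation fault */
                  break;
          case ESR_ELx_EC_IABT_LOW:       /* instruction abort from EL0 */
                  esr &= ESR_ELx_EC_MASK | ESR_ELx_IL;
                  esr |= ESR_ELx_FSC_FAULT;
                  break;
          }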
      Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      cc198460
  23. 15 May 2018, 1 commit
    • arm64: Increase ARCH_DMA_MINALIGN to 128 · ebc7e21e
      Committed by Catalin Marinas
      This patch increases the ARCH_DMA_MINALIGN to 128 so that it covers the
      currently known Cache Writeback Granule (CTR_EL0.CWG) on arm64 and moves
      the fallback in cache_line_size() from L1_CACHE_BYTES to this constant.
      In addition, it warns (and taints) if the CWG is larger than
      ARCH_DMA_MINALIGN as this is not safe with non-coherent DMA.
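      
      The change is essentially (sketch):
      
          #define ARCH_DMA_MINALIGN       (128)
      
          static inline int cache_line_size(void)
          {
                  u32 cwg = cache_type_cwg();
      
                  /* fall back to the worst-case CWG, not L1_CACHE_BYTES */
                  return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
          }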
      
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      ebc7e21e
  24. 08 May 2018, 1 commit
  25. 01 May 2018, 1 commit
  26. 25 April 2018, 1 commit
    • signal: Ensure every siginfo we send has all bits initialized · 3eb0f519
      Committed by Eric W. Biederman
      Call clear_siginfo to ensure every stack allocated siginfo is properly
      initialized before being passed to the signal sending functions.
      
      Note: It is not safe to depend on C initializers to initialize struct
      siginfo on the stack because C is allowed to skip holes when
      initializing a structure.
      
      The initialization of struct siginfo in tracehook_report_syscall_exit
      was moved from the helper user_single_step_siginfo into
      tracehook_report_syscall_exit itself, to make it clear that the local
      variable siginfo gets fully initialized.
      
      In a few cases the scope of struct siginfo has been reduced to make it
      clear that the siginfo is not used on other paths in the function
      in which it is declared.
      
      Instances of using memset to initialize siginfo have been replaced
      with calls to clear_siginfo() for clarity.
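      
      The pattern, in short (clear_siginfo() zeroes the whole structure,
      padding and union holes included; the SIGSEGV fields are an example):
      
          struct siginfo info;
      
          clear_siginfo(&info);           /* zero every byte, holes included */
          info.si_signo = SIGSEGV;
          info.si_errno = 0;
          info.si_code  = SEGV_MAPERR;
          info.si_addr  = (void __user *)addr;
          force_sig_info(info.si_signo, &info, current);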
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      3eb0f519
  27. 24 April 2018, 1 commit
  28. 17 April 2018, 1 commit
    • arm64: kasan: avoid pfn_to_nid() before page array is initialized · 800cb2e5
      Committed by Mark Rutland
      In arm64's kasan_init(), we use pfn_to_nid() to find the NUMA node a
      span of memory is in, hoping to allocate shadow from the same NUMA node.
      However, at this point, the page array has not been initialized, and
      thus this is bogus.
      
      Since commit:
      
        f165b378 ("mm: uninitialized struct page poisoning sanity")
      
      ... accessing fields of the page array results in a boot time Oops(),
      highlighting this problem:
      
      [    0.000000] Unable to handle kernel paging request at virtual address dfff200000000000
      [    0.000000] Mem abort info:
      [    0.000000]   ESR = 0x96000004
      [    0.000000]   Exception class = DABT (current EL), IL = 32 bits
      [    0.000000]   SET = 0, FnV = 0
      [    0.000000]   EA = 0, S1PTW = 0
      [    0.000000] Data abort info:
      [    0.000000]   ISV = 0, ISS = 0x00000004
      [    0.000000]   CM = 0, WnR = 0
      [    0.000000] [dfff200000000000] address between user and kernel address ranges
      [    0.000000] Internal error: Oops: 96000004 [#1] PREEMPT SMP
      [    0.000000] Modules linked in:
      [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.16.0-07317-gf165b378 #42
      [    0.000000] Hardware name: ARM Juno development board (r1) (DT)
      [    0.000000] pstate: 80000085 (Nzcv daIf -PAN -UAO)
      [    0.000000] pc : __asan_load8+0x8c/0xa8
      [    0.000000] lr : __dump_page+0x3c/0x3b8
      [    0.000000] sp : ffff2000099b7ca0
      [    0.000000] x29: ffff2000099b7ca0 x28: ffff20000a1762c0
      [    0.000000] x27: ffff7e0000000000 x26: ffff2000099dd000
      [    0.000000] x25: ffff200009a3f960 x24: ffff200008f9c38c
      [    0.000000] x23: ffff20000a9d3000 x22: ffff200009735430
      [    0.000000] x21: fffffffffffffffe x20: ffff7e0001e50420
      [    0.000000] x19: ffff7e0001e50400 x18: 0000000000001840
      [    0.000000] x17: ffffffffffff8270 x16: 0000000000001840
      [    0.000000] x15: 0000000000001920 x14: 0000000000000004
      [    0.000000] x13: 0000000000000000 x12: 0000000000000800
      [    0.000000] x11: 1ffff0012d0f89ff x10: ffff10012d0f89ff
      [    0.000000] x9 : 0000000000000000 x8 : ffff8009687c5000
      [    0.000000] x7 : 0000000000000000 x6 : ffff10000f282000
      [    0.000000] x5 : 0000000000000040 x4 : fffffffffffffffe
      [    0.000000] x3 : 0000000000000000 x2 : dfff200000000000
      [    0.000000] x1 : 0000000000000005 x0 : 0000000000000000
      [    0.000000] Process swapper (pid: 0, stack limit = 0x        (ptrval))
      [    0.000000] Call trace:
      [    0.000000]  __asan_load8+0x8c/0xa8
      [    0.000000]  __dump_page+0x3c/0x3b8
      [    0.000000]  dump_page+0xc/0x18
      [    0.000000]  kasan_init+0x2e8/0x5a8
      [    0.000000]  setup_arch+0x294/0x71c
      [    0.000000]  start_kernel+0xdc/0x500
      [    0.000000] Code: aa0403e0 9400063c 17ffffee d343fc00 (38e26800)
      [    0.000000] ---[ end trace 67064f0e9c0cc338 ]---
      [    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
      [    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
      
      Let's fix this by using early_pfn_to_nid(), as other architectures do in
      their kasan init code. Note that early_pfn_to_nid acquires the nid from
      the memblock array, which we iterate over in kasan_init(), so this
      should be fine.
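      
      The fix is a one-line substitution in arm64's kasan_init(), roughly:
      
          /* before: pfn_to_nid() touches the uninitialized page array */
          kasan_map_populate(kimg_shadow_start, kimg_shadow_end,
                             pfn_to_nid(virt_to_pfn(lm_alias(_text))));
      
          /* after: read the nid from memblock, which is already set up */
          kasan_map_populate(kimg_shadow_start, kimg_shadow_end,
                             early_pfn_to_nid(virt_to_pfn(lm_alias(_text))));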
      Signed-off-by: Mark Rutland <mark.rutland@arm.com>
      Fixes: 39d114dd ("arm64: add KASAN support")
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      800cb2e5
  29. 12 April 2018, 1 commit
    • exec: pass stack rlimit into mm layout functions · 8f2af155
      Committed by Kees Cook
      Patch series "exec: Pin stack limit during exec".
      
      Attempts to solve problems with the stack limit changing during exec
      continue to be frustrated[1][2].  In addition to the specific issues
      around the Stack Clash family of flaws, Andy Lutomirski pointed out[3]
      other places during exec where the stack limit is used and is assumed to
      be unchanging.  Given the many places it gets used and the fact that it
      can be manipulated/raced via setrlimit() and prlimit(), I think the only
      way to handle this is to move away from the "current" view of the stack
      limit and instead attach it to the bprm, and plumb this down into the
      functions that need to know the stack limits.  This series implements
      the approach.
      
      [1] 04e35f44 ("exec: avoid RLIMIT_STACK races with prlimit()")
      [2] 779f4e1c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"")
      [3] to security@kernel.org, "Subject: existing rlimit races?"
      
      This patch (of 3):
      
      Since it is possible that the stack rlimit can change externally during
      exec (either via another thread calling setrlimit() or another process
      calling prlimit()), provide a way to pass the rlimit down into the
      per-architecture mm layout functions so that the rlimit can stay in the
      bprm structure instead of sitting in the signal structure until exec is
      finalized.
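      
      Concretely, the per-architecture hook grows an rlimit parameter
      (sketch of the interface change):
      
          /* before */
          void arch_pick_mmap_layout(struct mm_struct *mm);
      
          /* after: exec passes down the stack rlimit captured in the bprm */
          void arch_pick_mmap_layout(struct mm_struct *mm,
                                     struct rlimit *rlim_stack);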
      
      Link: http://lkml.kernel.org/r/1518638796-20819-2-git-send-email-keescook@chromium.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Cc: Brad Spengler <spender@grsecurity.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      8f2af155
  30. 27 March 2018, 2 commits
    • Revert "arm64: Revert L1_CACHE_SHIFT back to 6 (64-byte cache line size)" · 3f251cf0
      Committed by Will Deacon
      This reverts commit 1f85b42a.
      
      The internal dma-direct.h API has changed in -next, which collides with
      us trying to use it to manage non-coherent DMA devices on systems with
      unreasonably large cache writeback granules.
      
      This isn't at all trivial to resolve, so revert our changes for now and
      we can revisit this after the merge window. Effectively, this just
      restores our behaviour back to that of 4.16.
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      3f251cf0
    • arm64: Delay enabling hardware DBM feature · 05abb595
      Committed by Suzuki K Poulose
      We enable the hardware DBM bit on capable CPUs very early in the
      boot via __cpu_setup. This doesn't give us the flexibility to
      optionally disable the feature, and clearing the bit is somewhat
      costly, as the TLB can cache the settings. Instead, we delay
      enabling the feature until the CPU is brought up into the kernel.
      We use the feature capability mechanism to handle it.
      
      The hardware DBM is a non-conflicting feature, i.e. the kernel can
      safely run with a mix of CPUs, some using the feature and others
      not. So it is safe for a late CPU to have this capability and
      enable it, even if the active CPUs don't.
      
      To get this handled properly by the infrastructure, we
      unconditionally set the capability and only enable it on CPUs which
      really have the feature. Also, we print the feature detection from
      the "matches" callback to make sure we don't mislead the user when
      none of the CPUs could use the feature.
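      
      A condensed sketch of the cpufeature entry this adds (fields
      abridged):
      
          {
                  .desc = "Hardware dirty bit management",
                  .type = ARM64_CPUCAP_WEAK_LOCAL_CPU_FEATURE,
                  .capability = ARM64_HW_DBM,
                  .sys_reg = SYS_ID_AA64MMFR1_EL1,
                  .field_pos = ID_AA64MMFR1_HADBS_SHIFT,
                  .min_field_value = 2,
                  .matches = has_hw_dbm,           /* logs detection per CPU */
                  .cpu_enable = cpu_enable_hw_dbm, /* sets TCR_EL1.HD if capable */
          },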
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Reviewed-by: Dave Martin <dave.martin@arm.com>
      Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      05abb595