1. 16 8月, 2017 11 次提交
    • M
      arm64: add basic VMAP_STACK support · e3067861
      Mark Rutland 提交于
      This patch enables arm64 to be built with vmap'd task and IRQ stacks.
      
      As vmap'd stacks are mapped at page granularity, stacks must be a multiple of
      PAGE_SIZE. This means that a 64K page kernel must use stacks of at least 64K in
      size.
      
      To minimize the increase in Image size, IRQ stacks are dynamically allocated at
      boot time, rather than embedding the boot CPU's IRQ stack in the kernel image.
      
      This patch was co-authored by Ard Biesheuvel and Mark Rutland.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      e3067861
    • M
      arm64: use an irq stack pointer · f60fe78f
      Mark Rutland 提交于
      We allocate our IRQ stacks using a percpu array. This allows us to generate our
      IRQ stack pointers with adr_this_cpu, but bloats the kernel Image with the boot
      CPU's IRQ stack. Additionally, these are packed with other percpu variables,
      and aren't guaranteed to have guard pages.
      
      When we enable VMAP_STACK we'll want to vmap our IRQ stacks also, in order to
      provide guard pages and to permit more stringent alignment requirements. Doing
      so will require that we use a percpu pointer to each IRQ stack, rather than
      allocating a percpu IRQ stack in the kernel image.
      
      This patch updates our IRQ stack code to use a percpu pointer to the base of
      each IRQ stack. This will allow us to change the way the stack is allocated
      with minimal changes elsewhere. In some cases we may try to backtrace before
      the IRQ stack pointers are initialised, so on_irq_stack() is updated to account
      for this.
      
      In testing with cyclictest, there was no measureable difference between using
      adr_this_cpu (for irq_stack) and ldr_this_cpu (for irq_stack_ptr) in the IRQ
      entry path.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      f60fe78f
    • A
      arm64: assembler: allow adr_this_cpu to use the stack pointer · 8ea41b11
      Ard Biesheuvel 提交于
      Given that adr_this_cpu already requires a temp register in addition
      to the destination register, tweak the instruction sequence so that sp
      may be used as well.
      
      This will simplify switching to per-cpu stacks in subsequent patches. While
      this limits the range of adr_this_cpu, to +/-4GiB, we don't currently use
      adr_this_cpu in modules, and this is not problematic for the main kernel image.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      [Mark: add more commit text]
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      8ea41b11
    • M
      arm64: factor out entry stack manipulation · b11e5759
      Mark Rutland 提交于
      In subsequent patches, we will detect stack overflow in our exception
      entry code, by verifying the SP after it has been decremented to make
      space for the exception regs.
      
      This verification code is small, and we can minimize its impact by
      placing it directly in the vectors. To avoid redundant modification of
      the SP, we also need to move the initial decrement of the SP into the
      vectors.
      
      As a preparatory step, this patch introduces kernel_ventry, which
      performs this decrement, and updates the entry code accordingly.
      Subsequent patches will fold SP verification into kernel_ventry.
      
      There should be no functional change as a result of this patch.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      [Mark: turn into prep patch, expand commit msg]
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      b11e5759
    • M
      efi/arm64: add EFI_KIMG_ALIGN · 170976bc
      Mark Rutland 提交于
      The EFI stub is intimately coupled with the kernel, and takes advantage
      of this by relocating the kernel at a weaker alignment than the
      documented boot protocol mandates.
      
      However, it does so by assuming it can align the kernel to the segment
      alignment, and assumes that this is 64K. In subsequent patches, we'll
      have to consider other details to determine this de-facto alignment
      constraint.
      
      This patch adds a new EFI_KIMG_ALIGN definition that will track the
      kernel's de-facto alignment requirements. Subsequent patches will modify
      this as required.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      170976bc
    • M
      arm64: move SEGMENT_ALIGN to <asm/memory.h> · 8018ba4e
      Mark Rutland 提交于
      Currently we define SEGMENT_ALIGN directly in our vmlinux.lds.S.
      
      This is unfortunate, as the EFI stub currently open-codes the same
      number, and in future we'll want to fiddle with this.
      
      This patch moves the definition to our <asm/memory.h>, where it can be
      used by both vmlinux.lds.S and the EFI stub code.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      8018ba4e
    • M
      arm64: clean up irq stack definitions · f60ad4ed
      Mark Rutland 提交于
      Before we add yet another stack to the kernel, it would be nice to
      ensure that we consistently organise stack definitions and related
      helper functions.
      
      This patch moves the basic IRQ stack defintions to <asm/memory.h> to
      live with their task stack counterparts. Helpers used for unwinding are
      moved into <asm/stacktrace.h>, where subsequent patches will add helpers
      for other stacks. Includes are fixed up accordingly.
      
      This patch is a pure refactoring -- there should be no functional
      changes as a result of this patch.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      f60ad4ed
    • M
      arm64: clean up THREAD_* definitions · dbc9344a
      Mark Rutland 提交于
      Currently we define THREAD_SIZE and THREAD_SIZE_ORDER separately, with
      the latter dependent on particular CONFIG_ARM64_*K_PAGES definitions.
      This is somewhat opaque, and will get in the way of future modifications
      to THREAD_SIZE.
      
      This patch cleans this up, defining both in terms of a common
      THREAD_SHIFT, and using PAGE_SHIFT to calculate THREAD_SIZE_ORDER,
      rather than using a number of definitions dependent on config symbols.
      Subsequent patches will make use of this to alter the stack size used in
      some configurations.
      
      At the same time, these are moved into <asm/memory.h>, which will avoid
      circular include issues in subsequent patches. To ensure that existing
      code isn't adversely affected, <asm/thread_info.h> is updated to
      transitively include these definitions.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      dbc9344a
    • M
      arm64: factor out PAGE_* and CONT_* definitions · b6531456
      Mark Rutland 提交于
      Some headers rely on PAGE_* definitions from <asm/page.h>, but cannot
      include this due to potential circular includes. For example, a number
      of definitions in <asm/memory.h> rely on PAGE_SHIFT, and <asm/page.h>
      includes <asm/memory.h>.
      
      This requires users of these definitions to include both headers, which
      is fragile and error-prone.
      
      This patch ameliorates matters by moving the basic definitions out to a
      new header, <asm/page-def.h>. Both <asm/page.h> and <asm/memory.h> are
      updated to include this, avoiding this fragility, and avoiding the
      possibility of circular include dependencies.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      b6531456
    • A
      arm64: kernel: remove {THREAD,IRQ_STACK}_START_SP · 34be98f4
      Ard Biesheuvel 提交于
      For historical reasons, we leave the top 16 bytes of our task and IRQ
      stacks unused, a practice used to ensure that the SP can always be
      masked to find the base of the current stack (historically, where
      thread_info could be found).
      
      However, this is not necessary, as:
      
      * When an exception is taken from a task stack, we decrement the SP by
        S_FRAME_SIZE and stash the exception registers before we compare the
        SP against the task stack. In such cases, the SP must be at least
        S_FRAME_SIZE below the limit, and can be safely masked to determine
        whether the task stack is in use.
      
      * When transitioning to an IRQ stack, we'll place a dummy frame onto the
        IRQ stack before enabling asynchronous exceptions, or executing code
        we expect to trigger faults. Thus, if an exception is taken from the
        IRQ stack, the SP must be at least 16 bytes below the limit.
      
      * We no longer mask the SP to find the thread_info, which is now found
        via sp_el0. Note that historically, the offset was critical to ensure
        that cpu_switch_to() found the correct stack for new threads that
        hadn't yet executed ret_from_fork().
      
      Given that, this initial offset serves no purpose, and can be removed.
      This brings us in-line with other architectures (e.g. x86) which do not
      rely on this masking.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      [Mark: rebase, kill THREAD_START_SP, commit msg additions]
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      34be98f4
    • M
      arm64: remove __die()'s stack dump · c5bc503c
      Mark Rutland 提交于
      Our __die() implementation tries to dump the stack memory, in addition
      to a backtrace, which is problematic.
      
      For contemporary 16K stacks, this can be a lot of data, which can take a
      long time to dump, and can push other useful context out of the kernel's
      printk ringbuffer (and/or a user's scrollback buffer on an attached
      console).
      
      Additionally, the code implicitly assumes that the SP is on the task's
      stack, and tries to dump everything between the SP and the highest task
      stack address. When the SP points at an IRQ stack (or is corrupted),
      this makes the kernel attempt to dump vast amounts of VA space. With
      vmap'd stacks, this may result in erroneous accesses to peripherals.
      
      This patch removes the memory dump, leaving us to rely on the backtrace,
      and other means of dumping stack memory such as kdump.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Reviewed-by: NWill Deacon <will.deacon@arm.com>
      Tested-by: NLaura Abbott <labbott@redhat.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      c5bc503c
  2. 09 8月, 2017 2 次提交
    • A
      arm64: unwind: remove sp from struct stackframe · 31e43ad3
      Ard Biesheuvel 提交于
      The unwind code sets the sp member of struct stackframe to
      'frame pointer + 0x10' unconditionally, without regard for whether
      doing so produces a legal value. So let's simply remove it now that
      we have stopped using it anyway.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      31e43ad3
    • A
      arm64: unwind: reference pt_regs via embedded stack frame · 73267498
      Ard Biesheuvel 提交于
      As it turns out, the unwind code is slightly broken, and probably has
      been for a while. The problem is in the dumping of the exception stack,
      which is intended to dump the contents of the pt_regs struct at each
      level in the call stack where an exception was taken and routed to a
      routine marked as __exception (which means its stack frame is right
      below the pt_regs struct on the stack).
      
      'Right below the pt_regs struct' is ill defined, though: the unwind
      code assigns 'frame pointer + 0x10' to the .sp member of the stackframe
      struct at each level, and dump_backtrace() happily dereferences that as
      the pt_regs pointer when encountering an __exception routine. However,
      the actual size of the stack frame created by this routine (which could
      be one of many __exception routines we have in the kernel) is not known,
      and so frame.sp is pretty useless to figure out where struct pt_regs
      really is.
      
      So it seems the only way to ensure that we can find our struct pt_regs
      when walking the stack frames is to put it at a known fixed offset of
      the stack frame pointer that is passed to such __exception routines.
      The simplest way to do that is to put it inside pt_regs itself, which is
      the main change implemented by this patch. As a bonus, doing this allows
      us to get rid of a fair amount of cruft related to walking from one stack
      to the other, which is especially nice since we intend to introduce yet
      another stack for overflow handling once we add support for vmapped
      stacks. It also fixes an inconsistency where we only add a stack frame
      pointing to ELR_EL1 if we are executing from the IRQ stack but not when
      we are executing from the task stack.
      
      To consistly identify exceptions regs even in the presence of exceptions
      taken from entry code, we must check whether the next frame was created
      by entry text, rather than whether the current frame was crated by
      exception text.
      
      To avoid backtracing using PCs that fall in the idmap, or are controlled
      by userspace, we must explcitly zero the FP and LR in startup paths, and
      must ensure that the frame embedded in pt_regs is zeroed upon entry from
      EL0. To avoid these NULL entries showin in the backtrace, unwind_frame()
      is updated to avoid them.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      [Mark: compare current frame against .entry.text, avoid bogus PCs]
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      73267498
  3. 08 8月, 2017 5 次提交
    • A
      arm64: unwind: disregard frame.sp when validating frame pointer · c7365330
      Ard Biesheuvel 提交于
      Currently, when unwinding the call stack, we validate the frame pointer
      of each frame against frame.sp, whose value is not clearly defined, and
      which makes it more difficult to link stack frames together across
      different stacks. It is far better to simply check whether the frame
      pointer itself points into a valid stack.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      c7365330
    • M
      arm64: unwind: avoid percpu indirection for irq stack · 09668372
      Mark Rutland 提交于
      Our IRQ_STACK_PTR() and on_irq_stack() helpers both take a cpu argument,
      used to generate a percpu address. In all cases, they are passed
      {raw_,}smp_processor_id(), so this parameter is redundant.
      
      Since {raw_,}smp_processor_id() use a percpu variable internally, this
      approach means we generate a percpu offset to find the current cpu, then
      use this to index an array of percpu offsets, which we then use to find
      the current CPU's IRQ stack pointer. Thus, most of the work is
      redundant.
      
      Instead, we can consistently use raw_cpu_ptr() to generate the CPU's
      irq_stack pointer by simply adding the percpu offset to the irq_stack
      address, which is simpler in both respects.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      09668372
    • M
      arm64: move non-entry code out of .entry.text · ed84b4e9
      Mark Rutland 提交于
      Currently, cpu_switch_to and ret_from_fork both live in .entry.text,
      though neither form the critical path for an exception entry.
      
      In subsequent patches, we will require that code in .entry.text is part
      of the critical path for exception entry, for which we can assume
      certain properties (e.g. the presence of exception regs on the stack).
      
      Neither cpu_switch_to nor ret_from_fork will meet these requirements, so
      we must move them out of .entry.text. To ensure that neither are kprobed
      after being moved out of .entry.text, we must explicitly blacklist them,
      requiring a new NOKPROBE() asm helper.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      ed84b4e9
    • M
      arm64: consistently use bl for C exception entry · 2d0e751a
      Mark Rutland 提交于
      In most cases, our exception entry assembly branches to C handlers with
      a BL instruction, but in cases where we do not expect to return, we use
      B instead.
      
      While this is correct today, it means that backtraces for fatal
      exceptions miss the entry assembly (as the LR is stale at the point we
      call C code), while non-fatal exceptions have the entry assembly in the
      LR. In subsequent patches, we will need the LR to be set in these cases
      in order to backtrace reliably.
      
      This patch updates these sites to use a BL, ensuring consistency, and
      preparing for backtrace rework. An ASM_BUG() is added after each of
      these new BLs, which both catches unexpected returns, and ensures that
      the LR value doesn't point to another function label.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      2d0e751a
    • M
      arm64: Add ASM_BUG() · db44e9c5
      Mark Rutland 提交于
      Currently. we can only use BUG() from C code, though there are
      situations where we would like an equivalent mechanism in assembly code.
      
      This patch refactors our BUG() definition such that it can be used in
      either C or assembly, in the form of a new ASM_BUG().
      
      The refactoring requires the removal of escape sequences, such as '\n'
      and '\t', but these aren't strictly necessary as we can use ';' to
      terminate assembler statements.
      
      The low-level assembly is factored out into <asm/asm-bug.h>, with
      <asm/bug.h> retained as the C wrapper.
      Signed-off-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Dave Martin <dave.martin@arm.com>
      Cc: James Morse <james.morse@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      db44e9c5
  4. 28 7月, 2017 1 次提交
  5. 26 7月, 2017 1 次提交
  6. 25 7月, 2017 1 次提交
  7. 21 7月, 2017 1 次提交
    • P
      arm64/numa: Drop duplicate message · ece4b206
      Punit Agrawal 提交于
      When booting linux on a system without CONFIG_NUMA enabled, the
      following messages are printed during boot -
      
      NUMA: Faking a node at [mem 0x0000000000000000-0x00000083ffffffff]
      NUMA: Adding memblock [0x8000000000 - 0x8000e7ffff] on node 0
      NUMA: Adding memblock [0x8000e80000 - 0x83f65cffff] on node 0
      NUMA: Adding memblock [0x83f65d0000 - 0x83f665ffff] on node 0
      NUMA: Adding memblock [0x83f6660000 - 0x83f676ffff] on node 0
      NUMA: Adding memblock [0x83f6770000 - 0x83f678ffff] on node 0
      NUMA: Adding memblock [0x83f6790000 - 0x83fb82ffff] on node 0
      NUMA: Adding memblock [0x83fb830000 - 0x83fbc0ffff] on node 0
      NUMA: Adding memblock [0x83fbc10000 - 0x83fbdfffff] on node 0
      NUMA: Adding memblock [0x83fbe00000 - 0x83fbffffff] on node 0
      NUMA: Adding memblock [0x83fc000000 - 0x83fffbffff] on node 0
      NUMA: Adding memblock [0x83fffc0000 - 0x83fffdffff] on node 0
      NUMA: Adding memblock [0x83fffe0000 - 0x83ffffffff] on node 0
      NUMA: Initmem setup node 0 [mem 0x8000000000-0x83ffffffff]
      NUMA: NODE_DATA [mem 0x83fffec500-0x83fffedfff]
      
      The information is then duplicated by core kernel messages right after
      the above output.
      
      Early memory node ranges
        node   0: [mem 0x0000008000000000-0x0000008000e7ffff]
        node   0: [mem 0x0000008000e80000-0x00000083f65cffff]
        node   0: [mem 0x00000083f65d0000-0x00000083f665ffff]
        node   0: [mem 0x00000083f6660000-0x00000083f676ffff]
        node   0: [mem 0x00000083f6770000-0x00000083f678ffff]
        node   0: [mem 0x00000083f6790000-0x00000083fb82ffff]
        node   0: [mem 0x00000083fb830000-0x00000083fbc0ffff]
        node   0: [mem 0x00000083fbc10000-0x00000083fbdfffff]
        node   0: [mem 0x00000083fbe00000-0x00000083fbffffff]
        node   0: [mem 0x00000083fc000000-0x00000083fffbffff]
        node   0: [mem 0x00000083fffc0000-0x00000083fffdffff]
        node   0: [mem 0x00000083fffe0000-0x00000083ffffffff]
      Initmem setup node 0 [mem 0x0000008000000000-0x00000083ffffffff]
      
      Remove the duplication of memblock layout information printed during
      boot by dropping the messages from arm64 numa initialisation.
      Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      ece4b206
  8. 20 7月, 2017 6 次提交
  9. 13 7月, 2017 3 次提交
  10. 11 7月, 2017 2 次提交
  11. 10 7月, 2017 1 次提交
  12. 07 7月, 2017 4 次提交
    • P
      mm/hugetlb: add size parameter to huge_pte_offset() · 7868a208
      Punit Agrawal 提交于
      A poisoned or migrated hugepage is stored as a swap entry in the page
      tables.  On architectures that support hugepages consisting of
      contiguous page table entries (such as on arm64) this leads to ambiguity
      in determining the page table entry to return in huge_pte_offset() when
      a poisoned entry is encountered.
      
      Let's remove the ambiguity by adding a size parameter to convey
      additional information about the requested address.  Also fixup the
      definition/usage of huge_pte_offset() throughout the tree.
      
      Link: http://lkml.kernel.org/r/20170522133604.11392-4-punit.agrawal@arm.comSigned-off-by: NPunit Agrawal <punit.agrawal@arm.com>
      Acked-by: NSteve Capper <steve.capper@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com> (odd fixer:METAG ARCHITECTURE)
      Cc: Ralf Baechle <ralf@linux-mips.org> (supporter:MIPS)
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Rich Felker <dalias@libc.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7868a208
    • S
      arm64: hugetlb: remove spurious calls to huge_ptep_offset() · f0b38d65
      Steve Capper 提交于
      We don't need to call huge_ptep_offset as our accessors are already
      supplied with the pte_t *.  This patch removes those spurious calls.
      
      [punit.agrawal@arm.com: resolve rebase conflicts due to patch re-ordering]
      Link: http://lkml.kernel.org/r/20170524115409.31309-3-punit.agrawal@arm.comSigned-off-by: NSteve Capper <steve.capper@arm.com>
      Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com>
      Cc: David Woods <dwoods@mellanox.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f0b38d65
    • S
      arm64: hugetlb: refactor find_num_contig() · bb9dd3df
      Steve Capper 提交于
      Patch series "Support for contiguous pte hugepages", v4.
      
      This patchset updates the hugetlb code to fix issues arising from
      contiguous pte hugepages (such as on arm64).  Compared to v3, This
      version addresses a build failure on arm64 by including two cleanup
      patches.  Other than the arm64 cleanups, the rest are generic code
      changes.  The remaining arm64 support based on these patches will be
      posted separately.  The patches are based on v4.12-rc2.  Previous
      related postings can be found at [0], [1], [2], and [3].
      
      The patches fall into three categories -
      
      * Patch 1-2 - arm64 cleanups required to greatly simplify changing
        huge_pte_offset() prototype in Patch 5.
      
        Catalin, Will - are you happy for these patches to go via mm?
      
      * Patches 3-4 address issues with gup
      
      * Patches 5-8 relate to passing a size argument to hugepage helpers to
        disambiguate the size of the referred page. These changes are
        required to enable arch code to properly handle swap entries for
        contiguous pte hugepages.
      
        The changes to huge_pte_offset() (patch 5) touch multiple
        architectures but I've managed to minimise these changes for the
        other affected functions - huge_pte_clear() and set_huge_pte_at().
      
      These patches gate the enabling of contiguous hugepages support on arm64
      which has been requested for systems using !4k page granule.
      
      The ARM64 architecture supports two flavours of hugepages -
      
      * Block mappings at the pud/pmd level
      
        These are regular hugepages where a pmd or a pud page table entry
        points to a block of memory. Depending on the PAGE_SIZE in use the
        following size of block mappings are supported -
      
                PMD	PUD
                ---	---
        4K:      2M	 1G
        16K:    32M
        64K:   512M
      
        For certain applications/usecases such as HPC and large enterprise
        workloads, folks are using 64k page size but the minimum hugepage size
        of 512MB isn't very practical.
      
      To overcome this ...
      
      * Using the Contiguous bit
      
        The architecture provides a contiguous bit in the translation table
        entry which acts as a hint to the mmu to indicate that it is one of a
        contiguous set of entries that can be cached in a single TLB entry.
      
        We use the contiguous bit in Linux to increase the mapping size at the
        pmd and pte (last) level.
      
        The number of supported contiguous entries varies by page size and
        level of the page table.
      
        Using the contiguous bit allows additional hugepage sizes -
      
                 CONT PTE    PMD    CONT PMD    PUD
                 --------    ---    --------    ---
          4K:         64K     2M         32M     1G
          16K:         2M    32M          1G
          64K:         2M   512M         16G
      
        Of these, 64K with 4K and 2M with 64K pages have been explicitly
        requested by a few different users.
      
      Entries with the contiguous bit set are required to be modified all
      together - which makes things like memory poisoning and migration
      impossible to do correctly without knowing the size of hugepage being
      dealt with - the reason for adding size parameter to a few of the
      hugepage helpers in this series.
      
      This patch (of 8):
      
      As we regularly check for contiguous pte's in the huge accessors, remove
      this extra check from find_num_contig.
      
      [punit.agrawal@arm.com: resolve rebase conflicts due to patch re-ordering]
      Link: http://lkml.kernel.org/r/20170524115409.31309-2-punit.agrawal@arm.comSigned-off-by: NSteve Capper <steve.capper@arm.com>
      Signed-off-by: NPunit Agrawal <punit.agrawal@arm.com>
      Cc: David Woods <dwoods@mellanox.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bb9dd3df
    • A
      mm/hugetlb: clean up ARCH_HAS_GIGANTIC_PAGE · e1073d1e
      Aneesh Kumar K.V 提交于
      This moves the #ifdef in C code to a Kconfig dependency.  Also we move
      the gigantic_page_supported() function to be arch specific.
      
      This allows architectures to conditionally enable runtime allocation of
      gigantic huge page.  Architectures like ppc64 supports different
      gigantic huge page size (16G and 1G) based on the translation mode
      selected.  This provides an opportunity for ppc64 to enable runtime
      allocation only w.r.t 1G hugepage.
      
      No functional change in this patch.
      
      Link: http://lkml.kernel.org/r/1494995292-4443-1-git-send-email-aneesh.kumar@linux.vnet.ibm.comSigned-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e1073d1e
  13. 04 7月, 2017 1 次提交
  14. 03 7月, 2017 1 次提交