1. 10 10月, 2017 7 次提交
    • M
      x86/hyperv: Fix hypercalls with extended CPU ranges for TLB flushing · ab7ff471
      Marcelo Henrique Cerri 提交于
      Do not consider the fixed size of hv_vp_set when passing the variable
      header size to hv_do_rep_hypercall().
      
      The Hyper-V hypervisor specification states that for a hypercall with a
      variable header only the size of the variable portion should be supplied
      via the input control.
      
      For HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX/LIST_EX calls that means the
      fixed portion of hv_vp_set should not be considered.
      
      That fixes random failures of some applications that are unexpectedly
      killed with SIGBUS or SIGSEGV.
      Signed-off-by: NMarcelo Henrique Cerri <marcelo.cerri@canonical.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: Josh Poulson <jopoulso@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Simon Xiao <sixiao@microsoft.com>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Fixes: 628f54cc ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls")
      Link: http://lkml.kernel.org/r/1507210469-29065-1-git-send-email-marcelo.cerri@canonical.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ab7ff471
    • V
      x86/hyperv: Don't use percpu areas for pcpu_flush/pcpu_flush_ex structures · 60d73a7c
      Vitaly Kuznetsov 提交于
      hv_do_hypercall() does virt_to_phys() translation and with some configs
      (CONFIG_SLAB) this doesn't work for percpu areas, we pass wrong memory to
      hypervisor and get #GP. We could use working slow_virt_to_phys() instead
      but doing so kills the performance.
      
      Move pcpu_flush/pcpu_flush_ex structures out of percpu areas and
      allocate memory on first call. The additional level of indirection gives
      us a small performance penalty, in future we may consider introducing
      hypercall functions which avoid virt_to_phys() conversion and cache
      physical addresses of pcpu_flush/pcpu_flush_ex structures somewhere.
      Reported-by: NSimon Xiao <sixiao@microsoft.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: Jork Loeser <Jork.Loeser@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devel@linuxdriverproject.org
      Link: http://lkml.kernel.org/r/20171005113924.28021-1-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      60d73a7c
    • V
      x86/hyperv: Clear vCPU banks between calls to avoid flushing unneeded vCPUs · a3b74243
      Vitaly Kuznetsov 提交于
      hv_flush_pcpu_ex structures are not cleared between calls for performance
      reasons (they're variable size up to PAGE_SIZE each) but we must clear
      hv_vp_set.bank_contents part of it to avoid flushing unneeded vCPUs. The
      rest of the structure is formed correctly.
      
      To do the clearing in an efficient way stash the maximum possible vCPU
      number (this may differ from Linux CPU id).
      Reported-by: NJork Loeser <Jork.Loeser@microsoft.com>
      Signed-off-by: NVitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Dexuan Cui <decui@microsoft.com>
      Cc: Haiyang Zhang <haiyangz@microsoft.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Hemminger <sthemmin@microsoft.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: devel@linuxdriverproject.org
      Link: http://lkml.kernel.org/r/20171006154854.18092-1-vkuznets@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      a3b74243
    • J
      x86/unwind: Disable unwinder warnings on 32-bit · d4a2d031
      Josh Poimboeuf 提交于
      x86-32 doesn't have stack validation, so in most cases it doesn't make
      sense to warn about bad frame pointers.
      Reported-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/a69658760800bf281e6353248c23e0fa0acf5230.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d4a2d031
    • J
      x86/unwind: Align stack pointer in unwinder dump · 99bd28a4
      Josh Poimboeuf 提交于
      When printing the unwinder dump, the stack pointer could be unaligned,
      for one of two reasons:
      
      - stack corruption; or
      
      - GCC created an unaligned stack.
      
      There's no way for the unwinder to tell the difference between the two,
      so we have to assume one or the other.  GCC unaligned stacks are very
      rare, and have only been spotted before GCC 5.  Presumably, if we're
      doing an unwinder stack dump, stack corruption is more likely than a
      GCC unaligned stack.  So always align the stack before starting the
      dump.
      Reported-and-tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/2f540c515946ab09ed267e1a1d6421202a0cce08.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      99bd28a4
    • J
      x86/unwind: Use MSB for frame pointer encoding on 32-bit · 5c99b692
      Josh Poimboeuf 提交于
      On x86-32, Tetsuo Handa and Fengguang Wu reported unwinder warnings
      like:
      
        WARNING: kernel stack regs at f60bb9c8 in swapper:1 has bad 'bp' value 0ba00000
      
      And also there were some stack dumps with a bunch of unreliable '?'
      symbols after an apic_timer_interrupt symbol, meaning the unwinder got
      confused when it tried to read the regs.
      
      The cause of those issues is that, with GCC 4.8 (and possibly older),
      there are cases where GCC misaligns the stack pointer in a leaf function
      for no apparent reason:
      
        c124a388 <acpi_rs_move_data>:
        c124a388:       55                      push   %ebp
        c124a389:       89 e5                   mov    %esp,%ebp
        c124a38b:       57                      push   %edi
        c124a38c:       56                      push   %esi
        c124a38d:       89 d6                   mov    %edx,%esi
        c124a38f:       53                      push   %ebx
        c124a390:       31 db                   xor    %ebx,%ebx
        c124a392:       83 ec 03                sub    $0x3,%esp
        ...
        c124a3e3:       83 c4 03                add    $0x3,%esp
        c124a3e6:       5b                      pop    %ebx
        c124a3e7:       5e                      pop    %esi
        c124a3e8:       5f                      pop    %edi
        c124a3e9:       5d                      pop    %ebp
        c124a3ea:       c3                      ret
      
      If an interrupt occurs in such a function, the regs on the stack will be
      unaligned, which breaks the frame pointer encoding assumption.  So on
      32-bit, use the MSB instead of the LSB to encode the regs.
      
      This isn't an issue on 64-bit, because interrupts align the stack before
      writing to it.
      Reported-and-tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/279a26996a482ca716605c7dbc7f2db9d8d91e81.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5c99b692
    • J
      x86/unwind: Fix dereference of untrusted pointer · 62dd86ac
      Josh Poimboeuf 提交于
      Tetsuo Handa and Fengguang Wu reported a panic in the unwinder:
      
        BUG: unable to handle kernel NULL pointer dereference at 000001f2
        IP: update_stack_state+0xd4/0x340
        *pde = 00000000
      
        Oops: 0000 [#1] PREEMPT SMP
        CPU: 0 PID: 18728 Comm: 01-cpu-hotplug Not tainted 4.13.0-rc4-00170-gb09be676 #592
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
        task: bb0b53c0 task.stack: bb3ac000
        EIP: update_stack_state+0xd4/0x340
        EFLAGS: 00010002 CPU: 0
        EAX: 0000a570 EBX: bb3adccb ECX: 0000f401 EDX: 0000a570
        ESI: 00000001 EDI: 000001ba EBP: bb3adc6b ESP: bb3adc3f
         DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
        CR0: 80050033 CR2: 000001f2 CR3: 0b3a7000 CR4: 00140690
        DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
        DR6: fffe0ff0 DR7: 00000400
        Call Trace:
         ? unwind_next_frame+0xea/0x400
         ? __unwind_start+0xf5/0x180
         ? __save_stack_trace+0x81/0x160
         ? save_stack_trace+0x20/0x30
         ? __lock_acquire+0xfa5/0x12f0
         ? lock_acquire+0x1c2/0x230
         ? tick_periodic+0x3a/0xf0
         ? _raw_spin_lock+0x42/0x50
         ? tick_periodic+0x3a/0xf0
         ? tick_periodic+0x3a/0xf0
         ? debug_smp_processor_id+0x12/0x20
         ? tick_handle_periodic+0x23/0xc0
         ? local_apic_timer_interrupt+0x63/0x70
         ? smp_trace_apic_timer_interrupt+0x235/0x6a0
         ? trace_apic_timer_interrupt+0x37/0x3c
         ? strrchr+0x23/0x50
        Code: 0f 95 c1 89 c7 89 45 e4 0f b6 c1 89 c6 89 45 dc 8b 04 85 98 cb 74 bc 88 4d e3 89 45 f0 83 c0 01 84 c9 89 04 b5 98 cb 74 bc 74 3b <8b> 47 38 8b 57 34 c6 43 1d 01 25 00 00 02 00 83 e2 03 09 d0 83
        EIP: update_stack_state+0xd4/0x340 SS:ESP: 0068:bb3adc3f
        CR2: 00000000000001f2
        ---[ end trace 0d147fd4aba8ff50 ]---
        Kernel panic - not syncing: Fatal exception in interrupt
      
      On x86-32, after decoding a frame pointer to get a regs address,
      regs_size() dereferences the regs pointer when it checks regs->cs to see
      if the regs are user mode.  This is dangerous because it's possible that
      what looks like a decoded frame pointer is actually a corrupt value, and
      we don't want the unwinder to make things worse.
      
      Instead of calling regs_size() on an unsafe pointer, just assume they're
      kernel regs to start with.  Later, once it's safe to access the regs, we
      can do the user mode check and corresponding safety check for the
      remaining two regs.
      Reported-and-tested-by: NTetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Byungchul Park <byungchul.park@lge.com>
      Cc: LKP <lkp@01.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 5ed8d8bb ("x86/unwind: Move common code into update_stack_state()")
      Link: http://lkml.kernel.org/r/7f95b9a6993dec7674b3f3ab3dcd3294f7b9644d.1507597785.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      62dd86ac
  2. 09 10月, 2017 2 次提交
  3. 04 10月, 2017 2 次提交
  4. 02 10月, 2017 9 次提交
    • L
      Linux 4.14-rc3 · 9e66317d
      Linus Torvalds 提交于
      9e66317d
    • L
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 368f8998
      Linus Torvalds 提交于
      Pull x86 fixes from Thomas Gleixner:
       "This contains the following fixes and improvements:
      
         - Avoid dereferencing an unprotected VMA pointer in the fault signal
           generation code
      
         - Fix inline asm call constraints for GCC 4.4
      
         - Use existing register variable to retrieve the stack pointer
           instead of forcing the compiler to create another indirect access
           which results in excessive extra 'mov %rsp, %<dst>' instructions
      
         - Disable branch profiling for the memory encryption code to prevent
           an early boot crash
      
         - Fix a sparse warning caused by casting the __user annotation in
           __get_user_asm_u64() away
      
         - Fix an off by one error in the loop termination of the error patch
           in the x86 sysfs init code
      
         - Add missing CPU IDs to various Intel specific drivers to enable the
           functionality on recent hardware
      
         - More (init) constification in the numachip code"
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        x86/asm: Use register variable to get stack pointer value
        x86/mm: Disable branch profiling in mem_encrypt.c
        x86/asm: Fix inline asm call constraints for GCC 4.4
        perf/x86/intel/uncore: Correct num_boxes for IIO and IRP
        perf/x86/intel/rapl: Add missing CPU IDs
        perf/x86/msr: Add missing CPU IDs
        perf/x86/intel/cstate: Add missing CPU IDs
        x86: Don't cast away the __user in __get_user_asm_u64()
        x86/sysfs: Fix off-by-one error in loop termination
        x86/mm: Fix fault error path using unsafe vma pointer
        x86/numachip: Add const and __initconst to numachip2_clockevent
      368f8998
    • L
      Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · c42ed9f9
      Linus Torvalds 提交于
      Pull timer fixes from Thomas Gleixner:
       "This adds a new timer wheel function which is required for the
        conversion of the timer callback function from the 'unsigned long
        data' argument to 'struct timer_list *timer'. This conversion has two
        benefits:
      
         1) It makes struct timer_list smaller
      
         2) Many callers hand in a pointer to the timer or to the structure
            containing the timer, which happens via type casting both at setup
            and in the callback. This change gets rid of the typecasts.
      
        Once the conversion is complete, which is planned for 4.15, the old
        setup function and the intermediate typecast in the new setup function
        go away along with the data field in struct timer_list.
      
        Merging this now into mainline allows a smooth queueing of the actual
        conversion in the affected maintainer trees without creating
        dependencies"
      
      * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        um/time: Fixup namespace collision
        timer: Prepare to change timer callback argument type
      c42ed9f9
    • L
      Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 82513545
      Linus Torvalds 提交于
      Pull smp/hotplug fixes from Thomas Gleixner:
       "This addresses the fallout of the new lockdep mechanism which covers
        completions in the CPU hotplug code.
      
        The lockdep splats are false positives, but there is no way to
        annotate that reliably. The solution is to split the completions for
        CPU up and down, which requires some reshuffling of the failure
        rollback handling as well"
      
      * 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        smp/hotplug: Hotplug state fail injection
        smp/hotplug: Differentiate the AP completion between up and down
        smp/hotplug: Differentiate the AP-work lockdep class between up and down
        smp/hotplug: Callback vs state-machine consistency
        smp/hotplug: Rewrite AP state machine core
        smp/hotplug: Allow external multi-instance rollback
        smp/hotplug: Add state diagram
      82513545
    • L
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 7e103ace
      Linus Torvalds 提交于
      Pull scheduler fixes from Thomas Gleixner:
       "The scheduler pull request comes with the following updates:
      
         - Prevent a divide by zero issue by validating the input value of
           sysctl_sched_time_avg
      
         - Make task state printing consistent all over the place and have
           explicit state characters for IDLE and PARKED so they wont be
           displayed as 'D' state which confuses tools"
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        sched/sysctl: Check user input value of sysctl_sched_time_avg
        sched/debug: Add explicit TASK_PARKED printing
        sched/debug: Ignore TASK_IDLE for SysRq-W
        sched/debug: Add explicit TASK_IDLE printing
        sched/tracing: Use common task-state helpers
        sched/tracing: Fix trace_sched_switch task-state printing
        sched/debug: Remove unused variable
        sched/debug: Convert TASK_state to hex
        sched/debug: Implement consistent task-state printing
      7e103ace
    • L
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1c6f705b
      Linus Torvalds 提交于
      Pull perf fixes from Thomas Gleixner:
      
       - Prevent a division by zero in the perf aux buffer handling
      
       - Sync kernel headers with perf tool headers
      
       - Fix a build failure in the syscalltbl code
      
       - Make the debug messages of perf report --call-graph work correctly
      
       - Make sure that all required perf files are in the MANIFEST for
         container builds
      
       - Fix the atrr.exclude kernel handling so it respects the
         perf_event_paranoid and the user permissions
      
       - Make perf test on s390x work correctly
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        perf/aux: Only update ->aux_wakeup in non-overwrite mode
        perf test: Fix vmlinux failure on s390x part 2
        perf test: Fix vmlinux failure on s390x
        perf tools: Fix syscalltbl build failure
        perf report: Fix debug messages with --call-graph option
        perf evsel: Fix attr.exclude_kernel setting for default cycles:p
        tools include: Sync kernel ABI headers with tooling headers
        perf tools: Get all of tools/{arch,include}/ in the MANIFEST
      1c6f705b
    • L
      Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 1de47f3c
      Linus Torvalds 提交于
      Pull  locking fixes from Thomas Gleixner:
       "Two fixes for locking:
      
         - Plug a hole the pi_stat->owner serialization which was changed
           recently and failed to fixup two usage sites.
      
         - Prevent reordering of the rwsem_has_spinner() check vs the
           decrement of rwsem count in up_write() which causes a missed
           wakeup"
      
      * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        locking/rwsem-xadd: Fix missed wakeup due to reordering of load
        futex: Fix pi_state->owner serialization
      1de47f3c
    • L
      Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 3d9d62b9
      Linus Torvalds 提交于
      Pull irq fixes from Thomas Gleixner:
      
       - Add a missing NULL pointer check in free_irq()
      
       - Fix a memory leak/memory corruption in the generic irq chip
      
       - Add missing rcu annotations for radix tree access
      
       - Use ffs instead of fls when extracting data from a chip register in
         the MIPS GIC irq driver
      
       - Fix the unmasking of IPI interrupts in the MIPS GIC driver so they
         end up at the target CPU and not at CPU0
      
      * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        irq/generic-chip: Don't replace domain's name
        irqdomain: Add __rcu annotations to radix tree accessors
        irqchip/mips-gic: Use effective affinity to unmask
        irqchip/mips-gic: Fix shifts to extract register fields
        genirq: Check __free_irq() return value for NULL
      3d9d62b9
    • L
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 156069f8
      Linus Torvalds 提交于
      Pull objtool fixes from Thomas Gleixner:
       "Two small fixes for objtool:
      
         - Support frame pointer setup via 'lea (%rsp), %rbp' which was not
           yet supported and caused build warnings
      
         - Disable unreacahble warnings for GCC4.4 and older to avoid false
           positives caused by the compiler itself"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        objtool: Support unoptimized frame pointer setup
        objtool: Skip unreachable warnings for GCC 4.4 and older
      156069f8
  5. 01 10月, 2017 2 次提交
  6. 30 9月, 2017 16 次提交
  7. 29 9月, 2017 2 次提交
    • W
      arm64: fault: Route pte translation faults via do_translation_fault · 760bfb47
      Will Deacon 提交于
      We currently route pte translation faults via do_page_fault, which elides
      the address check against TASK_SIZE before invoking the mm fault handling
      code. However, this can cause issues with the path walking code in
      conjunction with our word-at-a-time implementation because
      load_unaligned_zeropad can end up faulting in kernel space if it reads
      across a page boundary and runs into a page fault (e.g. by attempting to
      read from a guard region).
      
      In the case of such a fault, load_unaligned_zeropad has registered a
      fixup to shift the valid data and pad with zeroes, however the abort is
      reported as a level 3 translation fault and we dispatch it straight to
      do_page_fault, despite it being a kernel address. This results in calling
      a sleeping function from atomic context:
      
        BUG: sleeping function called from invalid context at arch/arm64/mm/fault.c:313
        in_atomic(): 0, irqs_disabled(): 0, pid: 10290
        Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
        [...]
        [<ffffff8e016cd0cc>] ___might_sleep+0x134/0x144
        [<ffffff8e016cd158>] __might_sleep+0x7c/0x8c
        [<ffffff8e016977f0>] do_page_fault+0x140/0x330
        [<ffffff8e01681328>] do_mem_abort+0x54/0xb0
        Exception stack(0xfffffffb20247a70 to 0xfffffffb20247ba0)
        [...]
        [<ffffff8e016844fc>] el1_da+0x18/0x78
        [<ffffff8e017f399c>] path_parentat+0x44/0x88
        [<ffffff8e017f4c9c>] filename_parentat+0x5c/0xd8
        [<ffffff8e017f5044>] filename_create+0x4c/0x128
        [<ffffff8e017f59e4>] SyS_mkdirat+0x50/0xc8
        [<ffffff8e01684e30>] el0_svc_naked+0x24/0x28
        Code: 36380080 d5384100 f9400800 9402566d (d4210000)
        ---[ end trace 2d01889f2bca9b9f ]---
      
      Fix this by dispatching all translation faults to do_translation_faults,
      which avoids invoking the page fault logic for faults on kernel addresses.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: NAnkit Jain <ankijain@codeaurora.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      760bfb47
    • W
      arm64: mm: Use READ_ONCE when dereferencing pointer to pte table · f069faba
      Will Deacon 提交于
      On kernels built with support for transparent huge pages, different CPUs
      can access the PMD concurrently due to e.g. fast GUP or page_vma_mapped_walk
      and they must take care to use READ_ONCE to avoid value tearing or caching
      of stale values by the compiler. Unfortunately, these functions call into
      our pgtable macros, which don't use READ_ONCE, and compiler caching has
      been observed to cause the following crash during ext4 writeback:
      
      PC is at check_pte+0x20/0x170
      LR is at page_vma_mapped_walk+0x2e0/0x540
      [...]
      Process doio (pid: 2463, stack limit = 0xffff00000f2e8000)
      Call trace:
      [<ffff000008233328>] check_pte+0x20/0x170
      [<ffff000008233758>] page_vma_mapped_walk+0x2e0/0x540
      [<ffff000008234adc>] page_mkclean_one+0xac/0x278
      [<ffff000008234d98>] rmap_walk_file+0xf0/0x238
      [<ffff000008236e74>] rmap_walk+0x64/0xa0
      [<ffff0000082370c8>] page_mkclean+0x90/0xa8
      [<ffff0000081f3c64>] clear_page_dirty_for_io+0x84/0x2a8
      [<ffff00000832f984>] mpage_submit_page+0x34/0x98
      [<ffff00000832fb4c>] mpage_process_page_bufs+0x164/0x170
      [<ffff00000832fc8c>] mpage_prepare_extent_to_map+0x134/0x2b8
      [<ffff00000833530c>] ext4_writepages+0x484/0xe30
      [<ffff0000081f6ab4>] do_writepages+0x44/0xe8
      [<ffff0000081e5bd4>] __filemap_fdatawrite_range+0xbc/0x110
      [<ffff0000081e5e68>] file_write_and_wait_range+0x48/0xd8
      [<ffff000008324310>] ext4_sync_file+0x80/0x4b8
      [<ffff0000082bd434>] vfs_fsync_range+0x64/0xc0
      [<ffff0000082332b4>] SyS_msync+0x194/0x1e8
      
      This is because page_vma_mapped_walk loads the PMD twice before calling
      pte_offset_map: the first time without READ_ONCE (where it gets all zeroes
      due to a concurrent pmdp_invalidate) and the second time with READ_ONCE
      (where it sees a valid table pointer due to a concurrent pmd_populate).
      However, the compiler inlines everything and caches the first value in
      a register, which is subsequently used in pte_offset_phys which returns
      a junk pointer that is later dereferenced when attempting to access the
      relevant pte.
      
      This patch fixes the issue by using READ_ONCE in pte_offset_phys to ensure
      that a stale value is not used. Whilst this is a point fix for a known
      failure (and simple to backport), a full fix moving all of our page table
      accessors over to {READ,WRITE}_ONCE and consistently using READ_ONCE in
      page_vma_mapped_walk is in the works for a future kernel release.
      
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Timur Tabi <timur@codeaurora.org>
      Cc: <stable@vger.kernel.org>
      Fixes: f27176cf ("mm: convert page_mkclean_one() to use page_vma_mapped_walk()")
      Tested-by: NRichard Ruigrok <rruigrok@codeaurora.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      f069faba