1. 09 8月, 2017 5 次提交
  2. 07 8月, 2017 3 次提交
    • J
      arm64: Decode information from ESR upon mem faults · 1f9b8936
      Julien Thierry 提交于
      When receiving unhandled faults from the CPU, description is very sparse.
      Adding information about faults decoded from ESR.
      
      Added defines to esr.h corresponding ESR fields. Values are based on ARM
      Archtecture Reference Manual (DDI 0487B.a), section D7.2.28 ESR_ELx, Exception
      Syndrome Register (ELx) (pages D7-2275 to D7-2280).
      
      New output is of the form:
      [   77.818059] Mem abort info:
      [   77.820826]   Exception class = DABT (current EL), IL = 32 bits
      [   77.826706]   SET = 0, FnV = 0
      [   77.829742]   EA = 0, S1PTW = 0
      [   77.832849] Data abort info:
      [   77.835713]   ISV = 0, ISS = 0x00000070
      [   77.839522]   CM = 0, WnR = 1
      Signed-off-by: NJulien Thierry <julien.thierry@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      [catalin.marinas@arm.com: fix "%lu" in a pr_alert() call]
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      1f9b8936
    • D
      arm64: Abstract syscallno manipulation · 17c28958
      Dave Martin 提交于
      The -1 "no syscall" value is written in various ways, shared with
      the user ABI in some places, and generally obscure.
      
      This patch attempts to make things a little more consistent and
      readable by replacing all these uses with a single #define.  A
      couple of symbolic helpers are provided to clarify the intent
      further.
      
      Because the in-syscall check in do_signal() is changed from >= 0 to
      != NO_SYSCALL by this patch, different behaviour may be observable
      if syscallno is set to values less than -1 by a tracer.  However,
      this is not different from the behaviour that is already observable
      if a tracer sets syscallno to a value >= __NR_(compat_)syscalls.
      
      It appears that this can cause spurious syscall restarting, but
      that is not a new behaviour either, and does not appear harmful.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      17c28958
    • D
      arm64: syscallno is secretly an int, make it official · 35d0e6fb
      Dave Martin 提交于
      The upper 32 bits of the syscallno field in thread_struct are
      handled inconsistently, being sometimes zero extended and sometimes
      sign-extended.  In fact, only the lower 32 bits seem to have any
      real significance for the behaviour of the code: it's been OK to
      handle the upper bits inconsistently because they don't matter.
      
      Currently, the only place I can find where those bits are
      significant is in calling trace_sys_enter(), which may be
      unintentional: for example, if a compat tracer attempts to cancel a
      syscall by passing -1 to (COMPAT_)PTRACE_SET_SYSCALL at the
      syscall-enter-stop, it will be traced as syscall 4294967295
      rather than -1 as might be expected (and as occurs for a native
      tracer doing the same thing).  Elsewhere, reads of syscallno cast
      it to an int or truncate it.
      
      There's also a conspicuous amount of code and casting to bodge
      around the fact that although semantically an int, syscallno is
      stored as a u64.
      
      Let's not pretend any more.
      
      In order to preserve the stp x instruction that stores the syscall
      number in entry.S, this patch special-cases the layout of struct
      pt_regs for big endian so that the newly 32-bit syscallno field
      maps onto the low bits of the stored value.  This is not beautiful,
      but benchmarking of the getpid syscall on Juno suggests indicates a
      minor slowdown if the stp is split into an stp x and stp w.
      Signed-off-by: NDave Martin <Dave.Martin@arm.com>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      35d0e6fb
  3. 05 8月, 2017 1 次提交
  4. 04 8月, 2017 5 次提交
  5. 03 8月, 2017 6 次提交
    • W
      KVM: nVMX: Fix interrupt window request with "Acknowledge interrupt on exit" · 6550c4df
      Wanpeng Li 提交于
      ------------[ cut here ]------------
       WARNING: CPU: 5 PID: 2288 at arch/x86/kvm/vmx.c:11124 nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
       CPU: 5 PID: 2288 Comm: qemu-system-x86 Not tainted 4.13.0-rc2+ #7
       RIP: 0010:nested_vmx_vmexit+0xd64/0xd70 [kvm_intel]
      Call Trace:
        vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
        ? vmx_check_nested_events+0x131/0x1f0 [kvm_intel]
        kvm_arch_vcpu_ioctl_run+0x5dd/0x1be0 [kvm]
        ? vmx_vcpu_load+0x1be/0x220 [kvm_intel]
        ? kvm_arch_vcpu_load+0x62/0x230 [kvm]
        kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? kvm_vcpu_ioctl+0x340/0x700 [kvm]
        ? __fget+0xfc/0x210
        do_vfs_ioctl+0xa4/0x6a0
        ? __fget+0x11d/0x210
        SyS_ioctl+0x79/0x90
        do_syscall_64+0x8f/0x750
        ? trace_hardirqs_on_thunk+0x1a/0x1c
        entry_SYSCALL64_slow_path+0x25/0x25
      
      This can be reproduced by booting L1 guest w/ 'noapic' grub parameter, which
      means that tells the kernel to not make use of any IOAPICs that may be present
      in the system.
      
      Actually external_intr variable in nested_vmx_vmexit() is the req_int_win
      variable passed from vcpu_enter_guest() which means that the L0's userspace
      requests an irq window. I observed the scenario (!kvm_cpu_has_interrupt(vcpu) &&
      L0's userspace reqeusts an irq window) is true, so there is no interrupt which
      L1 requires to inject to L2, we should not attempt to emualte "Acknowledge
      interrupt on exit" for the irq window requirement in this scenario.
      
      This patch fixes it by not attempt to emulate "Acknowledge interrupt on exit"
      if there is no L1 requirement to inject an interrupt to L2.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      [Added code comment to make it obvious that the behavior is not correct.
       We should do a userspace exit with open interrupt window instead of the
       nested VM exit.  This patch still improves the behavior, so it was
       accepted as a (temporary) workaround.]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      6550c4df
    • D
      KVM: nVMX: mark vmcs12 pages dirty on L2 exit · c9f04407
      David Matlack 提交于
      The host physical addresses of L1's Virtual APIC Page and Posted
      Interrupt descriptor are loaded into the VMCS02. The CPU may write
      to these pages via their host physical address while L2 is running,
      bypassing address-translation-based dirty tracking (e.g. EPT write
      protection). Mark them dirty on every exit from L2 to prevent them
      from getting out of sync with dirty tracking.
      
      Also mark the virtual APIC page and the posted interrupt descriptor
      dirty when KVM is virtualizing posted interrupt processing.
      Signed-off-by: NDavid Matlack <dmatlack@google.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      c9f04407
    • D
      kvm: nVMX: don't flush VMCS12 during VMXOFF or VCPU teardown · 8ca44e88
      David Matlack 提交于
      According to the Intel SDM, software cannot rely on the current VMCS to be
      coherent after a VMXOFF or shutdown. So this is a valid way to handle VMCS12
      flushes.
      
      24.11.1 Software Use of Virtual-Machine Control Structures
      ...
        If a logical processor leaves VMX operation, any VMCSs active on
        that logical processor may be corrupted (see below). To prevent
        such corruption of a VMCS that may be used either after a return
        to VMX operation or on another logical processor, software should
        execute VMCLEAR for that VMCS before executing the VMXOFF instruction
        or removing power from the processor (e.g., as part of a transition
        to the S3 and S4 power states).
      ...
      
      This fixes a "suspicious rcu_dereference_check() usage!" warning during
      kvm_vm_release() because nested_release_vmcs12() calls
      kvm_vcpu_write_guest_page() without holding kvm->srcu.
      Signed-off-by: NDavid Matlack <dmatlack@google.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      8ca44e88
    • P
      KVM: nVMX: do not pin the VMCS12 · 9f744c59
      Paolo Bonzini 提交于
      Since the current implementation of VMCS12 does a memcpy in and out
      of guest memory, we do not need current_vmcs12 and current_vmcs12_page
      anymore.  current_vmptr is enough to read and write the VMCS12.
      
      And David Matlack noted:
      
        This patch also fixes dirty tracking (memslot->dirty_bitmap) of the
        VMCS12 page by using kvm_write_guest. nested_release_page() only marks
        the struct page dirty.
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Reviewed-by: NDavid Hildenbrand <david@redhat.com>
      [Added David Matlack's note and nested_release_page_clean() fix.]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      9f744c59
    • L
      KVM: X86: init irq->level in kvm_pv_kick_cpu_op · ebd28fcb
      Longpeng(Mike) 提交于
      'lapic_irq' is a local variable and its 'level' field isn't
      initialized, so 'level' is random, it doesn't matter but
      makes UBSAN unhappy:
      
      UBSAN: Undefined behaviour in .../lapic.c:...
      load of value 10 is not a valid value for type '_Bool'
      ...
      Call Trace:
       [<ffffffff81f030b6>] dump_stack+0x1e/0x20
       [<ffffffff81f03173>] ubsan_epilogue+0x12/0x55
       [<ffffffff81f03b96>] __ubsan_handle_load_invalid_value+0x118/0x162
       [<ffffffffa1575173>] kvm_apic_set_irq+0xc3/0xf0 [kvm]
       [<ffffffffa1575b20>] kvm_irq_delivery_to_apic_fast+0x450/0x910 [kvm]
       [<ffffffffa15858ea>] kvm_irq_delivery_to_apic+0xfa/0x7a0 [kvm]
       [<ffffffffa1517f4e>] kvm_emulate_hypercall+0x62e/0x760 [kvm]
       [<ffffffffa113141a>] handle_vmcall+0x1a/0x30 [kvm_intel]
       [<ffffffffa114e592>] vmx_handle_exit+0x7a2/0x1fa0 [kvm_intel]
      ...
      Signed-off-by: NLongpeng(Mike) <longpeng2@huawei.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      ebd28fcb
    • W
      KVM: X86: Fix loss of pending INIT due to race · f4ef1910
      Wanpeng Li 提交于
      When SMP VM start, AP may lost INIT because of receiving INIT between
      kvm_vcpu_ioctl_x86_get/set_vcpu_events.
      
             vcpu 0                             vcpu 1
                                         kvm_vcpu_ioctl_x86_get_vcpu_events
                                           events->smi.latched_init = 0
        send INIT to vcpu1
          set vcpu1's pending_events
                                         kvm_vcpu_ioctl_x86_set_vcpu_events
                                            if (events->smi.latched_init == 0)
                                              clear INIT in pending_events
      
      This patch fixes it by just update SMM related flags if we are in SMM.
      
      Thanks Peng Hao for the report and original commit message.
      Reported-by: NPeng Hao <peng.hao2@zte.com.cn>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      f4ef1910
  6. 02 8月, 2017 5 次提交
    • G
      ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge · d7a65c49
      Gregory CLEMENT 提交于
      The number of pins in South Bridge is 30 and not 29. There is a fix for
      the driver for the pinctrl, but a fix is also need at device tree level
      for the GPIO.
      
      Fixes: afda007f ("ARM64: dts: marvell: Add pinctrl nodes for Armada
      3700")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NGregory CLEMENT <gregory.clement@free-electrons.com>
      d7a65c49
    • S
      powerpc/83xx/mpc832x_rdb: fix of_irq_to_resource() error check · f29bb786
      Sergei Shtylyov 提交于
      of_irq_to_resource() has recently been fixed to return negative error #'s
      along with 0 in case of failure, however the Freescale MPC832x RDB board
      code still only regards 0 as a failure indication -- fix it up.
      
      Fixes: 7a4228bb ("of: irq: use of_irq_get() in of_irq_to_resource()")
      Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
      Acked-by: NScott Wood <oss@buserror.net>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      f29bb786
    • W
      KVM: async_pf: make rcu irq exit if not triggered from idle task · 337c017c
      Wanpeng Li 提交于
       WARNING: CPU: 5 PID: 1242 at kernel/rcu/tree_plugin.h:323 rcu_note_context_switch+0x207/0x6b0
       CPU: 5 PID: 1242 Comm: unity-settings- Not tainted 4.13.0-rc2+ #1
       RIP: 0010:rcu_note_context_switch+0x207/0x6b0
       Call Trace:
        __schedule+0xda/0xba0
        ? kvm_async_pf_task_wait+0x1b2/0x270
        schedule+0x40/0x90
        kvm_async_pf_task_wait+0x1cc/0x270
        ? prepare_to_swait+0x22/0x70
        do_async_page_fault+0x77/0xb0
        ? do_async_page_fault+0x77/0xb0
        async_page_fault+0x28/0x30
       RIP: 0010:__d_lookup_rcu+0x90/0x1e0
      
      I encounter this when trying to stress the async page fault in L1 guest w/
      L2 guests running.
      
      Commit 9b132fbe (Add rcu user eqs exception hooks for async page
      fault) adds rcu_irq_enter/exit() to kvm_async_pf_task_wait() to exit cpu
      idle eqs when needed, to protect the code that needs use rcu.  However,
      we need to call the pair even if the function calls schedule(), as seen
      from the above backtrace.
      
      This patch fixes it by informing the RCU subsystem exit/enter the irq
      towards/away from idle for both n.halted and !n.halted.
      
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Reviewed-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      337c017c
    • P
      KVM: nVMX: fixes to nested virt interrupt injection · b96fb439
      Paolo Bonzini 提交于
      There are three issues in nested_vmx_check_exception:
      
      1) it is not taking PFEC_MATCH/PFEC_MASK into account, as reported
      by Wanpeng Li;
      
      2) it should rebuild the interruption info and exit qualification fields
      from scratch, as reported by Jim Mattson, because the values from the
      L2->L0 vmexit may be invalid (e.g. if an emulated instruction causes
      a page fault, the EPT misconfig's exit qualification is incorrect).
      
      3) CR2 and DR6 should not be written for exception intercept vmexits
      (CR2 only for AMD).
      
      This patch fixes the first two and adds a comment about the last,
      outlining the fix.
      
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      b96fb439
    • P
      KVM: nVMX: do not fill vm_exit_intr_error_code in prepare_vmcs12 · 7313c698
      Paolo Bonzini 提交于
      Do this in the caller of nested_vmx_vmexit instead.
      
      nested_vmx_check_exception was doing a vmwrite to the vmcs02's
      VM_EXIT_INTR_ERROR_CODE field, so that prepare_vmcs12 would move
      the field to vmcs12->vm_exit_intr_error_code.  However that isn't
      possible on pre-Haswell machines.  Moving the vmcs12 write to the
      callers fixes it.
      Reported-by: NJim Mattson <jmattson@google.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      [Changed nested_vmx_reflect_vmexit() return type to (int)1 from (bool)1,
       thanks to fengguang.wu@intel.com]
      Signed-off-by: NRadim Krčmář <rkrcmar@redhat.com>
      7313c698
  7. 01 8月, 2017 2 次提交
  8. 31 7月, 2017 4 次提交
    • B
      parisc: Define CONFIG_CPU_BIG_ENDIAN · 74ad3d28
      Babu Moger 提交于
      While working on enabling queued rwlock on SPARC, found this following
      code in include/asm-generic/qrwlock.h which uses CONFIG_CPU_BIG_ENDIAN
      to clear a byte.
      
      static inline u8 *__qrwlock_write_byte(struct qrwlock *lock)
       {
      	return (u8 *)lock + 3 * IS_BUILTIN(CONFIG_CPU_BIG_ENDIAN);
       }
      
      Problem is many of the fixed big endian architectures don't define
      CPU_BIG_ENDIAN and clears the wrong byte.
      
      Define CPU_BIG_ENDIAN for parisc architecture to fix it.
      Signed-off-by: NBabu Moger <babu.moger@oracle.com>
      Signed-off-by: NHelge Deller <deller@gmx.de>
      74ad3d28
    • N
      powerpc/64s: Fix stack setup in watchdog soft_nmi_common() · cc491f1d
      Nicholas Piggin 提交于
      The watchdog soft-NMI exception stack setup loads a stack pointer
      twice, which is an obvious error. It ends up using the system reset
      interrupt (true-NMI) stack, which is also a bug because the watchdog
      could be preempted by a system reset interrupt that overwrites the
      NMI stack.
      
      Change the soft-NMI to use the "emergency stack". The current kernel
      stack is not used, because of the longer-term goal to prevent
      asynchronous stack access using soft-disable.
      
      Fixes: 2104180a ("powerpc/64s: implement arch-specific hardlockup watchdog")
      Signed-off-by: NNicholas Piggin <npiggin@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cc491f1d
    • H
      parisc: Increase thread and stack size to 32kb · 8f8201df
      Helge Deller 提交于
      Since kernel 4.11 the thread and irq stacks on parisc randomly overflow
      the default size of 16k. The reason why stack usage suddenly grew is yet
      unknown.
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Cc: stable@vger.kernel.org # 4.11+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      8f8201df
    • J
      parisc: Handle vma's whose context is not current in flush_cache_range · 13d57093
      John David Anglin 提交于
      In testing James' patch to drivers/parisc/pdc_stable.c, I hit the BUG
      statement in flush_cache_range() during a system shutdown:
      
      kernel BUG at arch/parisc/kernel/cache.c:595!
      CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
      Workqueue: events free_ioctx
      
       IAOQ[0]: flush_cache_range+0x144/0x148
       IAOQ[1]: flush_cache_page+0x0/0x1a8
       RP(r2): flush_cache_range+0xec/0x148
      Backtrace:
       [<00000000402910ac>] unmap_page_range+0x84/0x880
       [<00000000402918f4>] unmap_single_vma+0x4c/0x60
       [<0000000040291a18>] zap_page_range_single+0x110/0x160
       [<0000000040291c34>] unmap_mapping_range+0x174/0x1a8
       [<000000004026ccd8>] truncate_pagecache+0x50/0xa8
       [<000000004026cd84>] truncate_setsize+0x54/0x70
       [<000000004033d534>] put_aio_ring_file+0x44/0xb0
       [<000000004033d5d8>] aio_free_ring+0x38/0x140
       [<000000004033d714>] free_ioctx+0x34/0xa8
       [<00000000401b0028>] process_one_work+0x1b8/0x4d0
       [<00000000401b04f4>] worker_thread+0x1b4/0x648
       [<00000000401b9128>] kthread+0x1b0/0x208
       [<0000000040150020>] end_fault_vector+0x20/0x28
       [<0000000040639518>] nf_ip_reroute+0x50/0xa8
       [<0000000040638ed0>] nf_ip_route+0x10/0x78
       [<0000000040638c90>] xfrm4_mode_tunnel_input+0x180/0x1f8
      
      CPU: 2 PID: 6532 Comm: kworker/2:0 Not tainted 4.13.0-rc2+ #1
      Workqueue: events free_ioctx
      Backtrace:
       [<0000000040163bf0>] show_stack+0x20/0x38
       [<0000000040688480>] dump_stack+0xa8/0x120
       [<0000000040163dc4>] die_if_kernel+0x19c/0x2b0
       [<0000000040164d0c>] handle_interruption+0xa24/0xa48
      
      This patch modifies flush_cache_range() to handle non current contexts.
      In as much as this occurs infrequently, the simplest approach is to
      flush the entire cache when this happens.
      Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # 4.9+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      13d57093
  9. 30 7月, 2017 1 次提交
    • R
      cpufreq: x86: Make scaling_cur_freq behave more as expected · 4815d3c5
      Rafael J. Wysocki 提交于
      After commit f8475cef "x86: use common aperfmperf_khz_on_cpu() to
      calculate KHz using APERF/MPERF" the scaling_cur_freq policy attribute
      in sysfs only behaves as expected on x86 with APERF/MPERF registers
      available when it is read from at least twice in a row.  The value
      returned by the first read may not be meaningful, because the
      computations in there use cached values from the previous iteration
      of aperfmperf_snapshot_khz() which may be stale.
      
      To prevent that from happening, modify arch_freq_get_on_cpu() to
      call aperfmperf_snapshot_khz() twice, with a short delay between
      these calls, if the previous invocation of aperfmperf_snapshot_khz()
      was too far back in the past (specifically, more that 1s ago).
      
      Also, as pointed out by Doug Smythies, aperf_delta is limited now
      and the multiplication of it by cpu_khz won't overflow, so simplify
      the s->khz computations too.
      
      Fixes: f8475cef "x86: use common aperfmperf_khz_on_cpu() to calculate KHz using APERF/MPERF"
      Reported-by: NDoug Smythies <dsmythies@telus.net>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4815d3c5
  10. 29 7月, 2017 1 次提交
  11. 28 7月, 2017 7 次提交
    • A
      powerpc/powernv/pci: Return failure for some uses of dma_set_mask() · 253fd51e
      Alistair Popple 提交于
      Commit 8e3f1b1d ("powerpc/powernv/pci: Enable 64-bit devices to access
      >4GB DMA space") introduced the ability for PCI device drivers to request a
      DMA mask between 64 and 32 bits and actually get a mask greater than
      32-bits. However currently if certain machine configuration dependent
      conditions are not meet the code silently falls back to a 32-bit mask.
      
      This makes it hard for device drivers to detect which mask they actually
      got. Instead we should return an error when the request could not be
      fulfilled which allows drivers to either fallback or implement other
      workarounds as documented in DMA-API-HOWTO.txt.
      Signed-off-by: NAlistair Popple <alistair@popple.id.au>
      Acked-by: NRussell Currey <ruscur@russell.cc>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      253fd51e
    • M
      powerpc/boot: Fix 64-bit boot wrapper build with non-biarch compiler · 65c5ec11
      Michael Ellerman 提交于
      Historically the boot wrapper was always built 32-bit big endian, even
      for 64-bit kernels. That was because old firmwares didn't necessarily
      support booting a 64-bit image. Because of that arch/powerpc/boot/Makefile
      uses CROSS32CC for compilation.
      
      However when we added 64-bit little endian support, we also added
      support for building the boot wrapper 64-bit. However we kept using
      CROSS32CC, because in most cases it is just CC and everything works.
      
      However if the user doesn't specify CROSS32_COMPILE (which no one ever
      does AFAIK), and CC is *not* biarch (32/64-bit capable), then CROSS32CC
      becomes just "gcc". On native systems that is probably OK, but if we're
      cross building it definitely isn't, leading to eg:
      
        gcc ... -m64 -mlittle-endian -mabi=elfv2 ... arch/powerpc/boot/cpm-serial.c
        gcc: error: unrecognized argument in option ‘-mabi=elfv2’
        gcc: error: unrecognized command line option ‘-mlittle-endian’
        make: *** [zImage] Error 2
      
      To fix it, stop using CROSS32CC, because we may or may not be building
      32-bit. Instead setup a BOOTCC, which defaults to CC, and only use
      CROSS32_COMPILE if it's set and we're building for 32-bit.
      
      Fixes: 147c0516 ("powerpc/boot: Add support for 64bit little endian wrapper")
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NCyril Bur <cyrilbur@gmail.com>
      65c5ec11
    • M
      powerpc/smp: Call smp_ops->setup_cpu() directly on the boot CPU · 7b7622bb
      Michael Ellerman 提交于
      In smp_cpus_done() we need to call smp_ops->setup_cpu() for the boot
      CPU, which means it has to run *on* the boot CPU.
      
      In the past we ensured it ran on the boot CPU by changing the CPU
      affinity mask of current directly. That was removed in commit
      6d11b87d ("powerpc/smp: Replace open coded task affinity logic"),
      and replaced with a work queue call.
      
      Unfortunately using a work queue leads to a lockdep warning, now that
      the CPU hotplug lock is a regular semaphore:
      
        ======================================================
        WARNING: possible circular locking dependency detected
        ...
        kworker/0:1/971 is trying to acquire lock:
         (cpu_hotplug_lock.rw_sem){++++++}, at: [<c000000000100974>] apply_workqueue_attrs+0x34/0xa0
      
        but task is already holding lock:
         ((&wfc.work)){+.+.+.}, at: [<c0000000000fdb2c>] process_one_work+0x25c/0x800
        ...
             CPU0                    CPU1
             ----                    ----
        lock((&wfc.work));
                                     lock(cpu_hotplug_lock.rw_sem);
                                     lock((&wfc.work));
        lock(cpu_hotplug_lock.rw_sem);
      
      Although the deadlock can't happen in practice, because
      smp_cpus_done() only runs in early boot before CPU hotplug is allowed,
      lockdep can't tell that.
      
      Luckily in commit 8fb12156 ("init: Pin init task to the boot CPU,
      initially") tglx changed the generic code to pin init to the boot CPU
      to begin with. The unpinning of init from the boot CPU happens in
      sched_init_smp(), which is called after smp_cpus_done().
      
      So smp_cpus_done() is always called on the boot CPU, which means we
      don't need the work queue call at all - and the lockdep warning goes
      away.
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      7b7622bb
    • W
      arm64: mmu: Place guard page after mapping of kernel image · 92bbd16e
      Will Deacon 提交于
      The vast majority of virtual allocations in the vmalloc region are followed
      by a guard page, which can help to avoid overruning on vma into another,
      which may map a read-sensitive device.
      
      This patch adds a guard page to the end of the kernel image mapping (i.e.
      following the data/bss segments).
      
      Cc: Mark Rutland <mark.rutland@arm.com>
      Reviewed-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      92bbd16e
    • M
      x86/boot: Disable the address-of-packed-member compiler warning · 20c6c189
      Matthias Kaehlcke 提交于
      The clang warning 'address-of-packed-member' is disabled for the general
      kernel code, also disable it for the x86 boot code.
      
      This suppresses a bunch of warnings like this when building with clang:
      
      ./arch/x86/include/asm/processor.h:535:30: warning: taking address of
        packed member 'sp0' of class or structure 'x86_hw_tss' may result in an
        unaligned pointer value [-Waddress-of-packed-member]
          return this_cpu_read_stable(cpu_tss.x86_tss.sp0);
                                      ^~~~~~~~~~~~~~~~~~~
      ./arch/x86/include/asm/percpu.h:391:59: note: expanded from macro
        'this_cpu_read_stable'
          #define this_cpu_read_stable(var)       percpu_stable_op("mov", var)
                                                                          ^~~
      ./arch/x86/include/asm/percpu.h:228:16: note: expanded from macro
        'percpu_stable_op'
          : "p" (&(var)));
                   ^~~
      Signed-off-by: NMatthias Kaehlcke <mka@chromium.org>
      Cc: Doug Anderson <dianders@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20170725215053.135586-1-mka@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      20c6c189
    • G
      powerpc/tm: Fix saving of TM SPRs in core dump · cd63f3cf
      Gustavo Romero 提交于
      Currently flush_tmregs_to_thread() does not save the TM SPRs (TFHAR,
      TFIAR, TEXASR) to the thread struct, unless the process is currently
      inside a suspended transaction.
      
      If the process is core dumping, and the TM SPRs have changed since the
      last time the process was context switched, then we will save stale
      values of the TM SPRs to the core dump.
      
      Fix it by saving the live register state to the thread struct in that
      case.
      
      Fixes: 08e1c01d ("powerpc/ptrace: Enable support for TM SPR state")
      Cc: stable@vger.kernel.org # v4.8+
      Signed-off-by: NGustavo Romero <gromero@linux.vnet.ibm.com>
      Reviewed-by: NCyril Bur <cyrilbur@gmail.com>
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      cd63f3cf
    • O
      powerpc/mm: Fix pmd/pte_devmap() on non-leaf entries · c9c98bc5
      Oliver O'Halloran 提交于
      The Radix MMU translation tree as defined in ISA v3.0 contains two
      different types of entry, directories and leaves. Leaves are
      identified by _PAGE_PTE being set.
      
      The formats of the two entries are different, with the directory
      entries containing no spare bits for use by software. In particular
      the bit we use for _PAGE_DEVMAP is not reserved for software, and is
      part of the NLB (Next Level Base) field, essentially the address of
      the next level in the tree.
      
      Note that the Linux pte_t is not == _PAGE_PTE. A huge page pmd
      entry (or devmap!) is also a leaf and so has _PAGE_PTE set, even
      though we use a pmd_t for it in Linux.
      
      The fix is to ensure that the pmd/pte_devmap() confirm they are
      looking at a leaf entry (_PAGE_PTE) as well as checking _PAGE_DEVMAP.
      
      Fixes: ebd31197 ("powerpc/mm: Add devmap support for ppc64")
      Signed-off-by: NOliver O'Halloran <oohall@gmail.com>
      Tested-by: NLaurent Vivier <lvivier@redhat.com>
      Tested-by: NJose Ricardo Ziviani <joserz@linux.vnet.ibm.com>
      Reviewed-by: NSuraj Jitindar Singh <sjitindarsingh@gmail.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      [mpe: Add a comment in the code and flesh out change log]
      Signed-off-by: NMichael Ellerman <mpe@ellerman.id.au>
      c9c98bc5