1. 02 Jul, 2021 1 commit
    • mm: define default value for FIRST_USER_ADDRESS · fac7757e
      Anshuman Khandual committed
      Currently most platforms define FIRST_USER_ADDRESS as 0UL, duplicating the
      same code all over.  Instead just define a generic default value (i.e. 0UL)
      for FIRST_USER_ADDRESS and let the platforms override it when required.  This
      makes the code much cleaner and shorter.
      
      The default FIRST_USER_ADDRESS here would be skipped in <linux/pgtable.h>
      when the given platform overrides its value via <asm/pgtable.h>.
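
The override pattern described above can be sketched as follows. This is a minimal user-space illustration, assuming the generic header guards its default with `#ifndef`; the helper `demo_first_user_address()` and the commented-out override value are hypothetical and only there for demonstration.

```c
#include <assert.h>

/*
 * Stand-in for <asm/pgtable.h>: a platform that needs a non-zero value
 * would define it here. Uncomment to simulate such a platform
 * (the value is purely illustrative).
 */
/* #define FIRST_USER_ADDRESS 0x1000UL */

/* Stand-in for the generic fallback in <linux/pgtable.h>. */
#ifndef FIRST_USER_ADDRESS
#define FIRST_USER_ADDRESS 0UL
#endif

/* Hypothetical helper so the effective value can be inspected. */
static unsigned long demo_first_user_address(void)
{
	return FIRST_USER_ADDRESS;
}
```

With no platform override in place, `demo_first_user_address()` yields the generic default; defining the macro before the guard makes the guard skip the default.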
      
      Link: https://lkml.kernel.org/r/1620615725-24623-1-git-send-email-anshuman.khandual@arm.com
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Acked-by: Guo Ren <guoren@kernel.org>			[csky]
      Acked-by: Stafford Horne <shorne@gmail.com>		[openrisc]
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>	[RISC-V]
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chris Zankel <chris@zankel.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 19 Jun, 2021 1 commit
    • riscv: Ensure BPF_JIT_REGION_START aligned with PMD size · 3a02764c
      Jisheng Zhang committed
      Andreas reported that commit fc850476 ("riscv: bpf: Avoid breaking W^X")
      breaks booting with one kind of defconfig. I reproduced a kernel panic
      with that defconfig:
      
      [    0.138553] Unable to handle kernel paging request at virtual address ffffffff81201220
      [    0.139159] Oops [#1]
      [    0.139303] Modules linked in:
      [    0.139601] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.13.0-rc5-default+ #1
      [    0.139934] Hardware name: riscv-virtio,qemu (DT)
      [    0.140193] epc : __memset+0xc4/0xfc
      [    0.140416]  ra : skb_flow_dissector_init+0x1e/0x82
      [    0.140609] epc : ffffffff8029806c ra : ffffffff8033be78 sp : ffffffe001647da0
      [    0.140878]  gp : ffffffff81134b08 tp : ffffffe001654380 t0 : ffffffff81201158
      [    0.141156]  t1 : 0000000000000002 t2 : 0000000000000154 s0 : ffffffe001647dd0
      [    0.141424]  s1 : ffffffff80a43250 a0 : ffffffff81201220 a1 : 0000000000000000
      [    0.141654]  a2 : 000000000000003c a3 : ffffffff81201258 a4 : 0000000000000064
      [    0.141893]  a5 : ffffffff8029806c a6 : 0000000000000040 a7 : ffffffffffffffff
      [    0.142126]  s2 : ffffffff81201220 s3 : 0000000000000009 s4 : ffffffff81135088
      [    0.142353]  s5 : ffffffff81135038 s6 : ffffffff8080ce80 s7 : ffffffff80800438
      [    0.142584]  s8 : ffffffff80bc6578 s9 : 0000000000000008 s10: ffffffff806000ac
      [    0.142810]  s11: 0000000000000000 t3 : fffffffffffffffc t4 : 0000000000000000
      [    0.143042]  t5 : 0000000000000155 t6 : 00000000000003ff
      [    0.143220] status: 0000000000000120 badaddr: ffffffff81201220 cause: 000000000000000f
      [    0.143560] [<ffffffff8029806c>] __memset+0xc4/0xfc
      [    0.143859] [<ffffffff8061e984>] init_default_flow_dissectors+0x22/0x60
      [    0.144092] [<ffffffff800010fc>] do_one_initcall+0x3e/0x168
      [    0.144278] [<ffffffff80600df0>] kernel_init_freeable+0x1c8/0x224
      [    0.144479] [<ffffffff804868a8>] kernel_init+0x12/0x110
      [    0.144658] [<ffffffff800022de>] ret_from_exception+0x0/0xc
      [    0.145124] ---[ end trace f1e9643daa46d591 ]---
      
      After some investigation, I think I found the root cause: commit
      2bfc6cd8 ("move kernel mapping outside of linear mapping") moves
      BPF JIT region after the kernel:
      
      | #define BPF_JIT_REGION_START	PFN_ALIGN((unsigned long)&_end)
      
      &_end is unlikely to be aligned to the PMD size, so the start of the BPF JIT
      region shares one PMD-sized mapping with part of the kernel .data section.
      Because the kernel is mapped at PMD granularity, when
      bpf_jit_binary_lock_ro() is called to make the first BPF JIT program ROX,
      part of the kernel .data section becomes read-only too. A later write to
      that part of .data, for example by memset, then triggers a store page
      fault in the MMU.
      
      To fix the issue, we need to ensure the BPF JIT region is PMD size
      aligned. This patch achieves this by restoring the BPF JIT region
      to its original position, i.e. the 128MB before the kernel .text section.
      The modification to kasan_init.c is inspired by Alexandre.
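
The alignment problem can be illustrated with a little address arithmetic. This is a sketch under assumptions: 4 KiB pages, 2 MiB PMD mappings, and a made-up `_end` address in the test; all `demo_` names are hypothetical. It only shows why page alignment is not enough. The patch itself does not align the region upward; it moves the region back in front of the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE UINT64_C(0x1000)    /* 4 KiB, assumed */
#define DEMO_PMD_SIZE  UINT64_C(0x200000)  /* 2 MiB, assumed */

/* Round addr up to a power-of-two boundary. */
static uint64_t demo_align_up(uint64_t addr, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return (addr + size - 1) & ~(size - 1);
}

/* Round addr down to a power-of-two boundary. */
static uint64_t demo_align_down(uint64_t addr, uint64_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	return addr & ~(size - 1);
}

/* True when both addresses fall inside the same PMD-sized mapping. */
static int demo_same_pmd(uint64_t a, uint64_t b)
{
	return demo_align_down(a, DEMO_PMD_SIZE) ==
	       demo_align_down(b, DEMO_PMD_SIZE);
}
```

A region start that is only page aligned can land in the same PMD-sized mapping as the last bytes of kernel data, so changing permissions at PMD granularity affects both. A PMD-aligned start cannot share a mapping that way.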
      
      Fixes: fc850476 ("riscv: bpf: Avoid breaking W^X")
      Reported-by: Andreas Schwab <schwab@linux-m68k.org>
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
  3. 11 Jun, 2021 1 commit
  4. 23 May, 2021 1 commit
    • riscv: kexec: Fix W=1 build warnings · bab0d47c
      Jisheng Zhang committed
      Fixes the following W=1 build warning(s):
      
      In file included from include/linux/kexec.h:28,
                       from arch/riscv/kernel/machine_kexec.c:7:
      arch/riscv/include/asm/kexec.h:45:1: warning: ‘extern’ is not at beginning of declaration [-Wold-style-declaration]
         45 | const extern unsigned char riscv_kexec_relocate[];
            | ^~~~~
      arch/riscv/include/asm/kexec.h:46:1: warning: ‘extern’ is not at beginning of declaration [-Wold-style-declaration]
         46 | const extern unsigned int riscv_kexec_relocate_size;
            | ^~~~~
      arch/riscv/kernel/machine_kexec.c:125:6: warning: no previous prototype for ‘machine_shutdown’ [-Wmissing-prototypes]
        125 | void machine_shutdown(void)
            |      ^~~~~~~~~~~~~~~~
      arch/riscv/kernel/machine_kexec.c:147:1: warning: no previous prototype for ‘machine_crash_shutdown’ [-Wmissing-prototypes]
        147 | machine_crash_shutdown(struct pt_regs *regs)
            | ^~~~~~~~~~~~~~~~~~~~~~
      arch/riscv/kernel/machine_kexec.c:23: warning: Function parameter or member 'image' not described in 'kexec_image_info'
      arch/riscv/kernel/machine_kexec.c:53: warning: Function parameter or member 'image' not described in 'machine_kexec_prepare'
      arch/riscv/kernel/machine_kexec.c:114: warning: Function parameter or member 'image' not described in 'machine_kexec_cleanup'
      arch/riscv/kernel/machine_kexec.c:148: warning: Function parameter or member 'regs' not described in 'machine_crash_shutdown'
      arch/riscv/kernel/machine_kexec.c:167: warning: Function parameter or member 'image' not described in 'machine_kexec'
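
The first two kinds of warning have a simple shape, sketched below with hypothetical `demo_` names rather than the real kexec symbols: the storage-class specifier `extern` comes first in a declaration, and a non-static function has a prototype in scope before its definition. The byte values are placeholders. This is an illustration of the pattern, not the content of the patch.

```c
#include <assert.h>

/* 'extern' leads the declaration, which avoids -Wold-style-declaration. */
extern const unsigned char demo_relocate[];
extern const unsigned int demo_relocate_size;

/* A prototype in scope before the definition avoids -Wmissing-prototypes. */
void demo_shutdown(void);

/* Definitions matching the declarations above (placeholder contents). */
const unsigned char demo_relocate[] = { 0x13, 0x00, 0x00, 0x00 };
const unsigned int demo_relocate_size = sizeof(demo_relocate);

/* Counter used only so the call is observable in this sketch. */
static int demo_shutdown_calls;

void demo_shutdown(void)
{
	demo_shutdown_calls++;
}
```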
      Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
  5. 07 May, 2021 1 commit
  6. 01 May, 2021 2 commits
  7. 26 Apr, 2021 12 commits
  8. 02 Apr, 2021 1 commit
    • riscv: evaluate put_user() arg before enabling user access · 285a76bb
      Ben Dooks committed
      The <asm/uaccess.h> header has a problem with put_user(a, ptr) if
      'a' is not a simple variable, for example a function call. This can lead
      the compiler to produce code like this:
      
      1:	enable_user_access()
      2:	evaluate 'a' into register 'r'
      3:	put 'r' to 'ptr'
      4:	disable_user_access()
      
      The issue is that 'a' is now being evaluated with the user memory
      protections disabled. So we try to force the evaluation by assigning
      'x' to __val at the start, and hoping the compiler barriers in
      enable_user_access() do the job of ordering step 2 before step 1.
      
      This has shown up in a bug where 'a' sleeps and thus schedules out
      and loses the SR_SUM flag. This change isn't sufficient to fully fix that,
      but should reduce the window of opportunity. The first instance of this
      we found is in schedule_tail(), where the code does:
      
      $ less -N kernel/sched/core.c
      
      4263  if (current->set_child_tid)
      4264         put_user(task_pid_vnr(current), current->set_child_tid);
      
      Here, the task_pid_vnr(current) is called within the block that has
      enabled the user memory access. This can be made worse with KASAN
      which makes task_pid_vnr() a rather large call with plenty of
      opportunity to sleep.
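
The ordering problem can be modelled in user space. In this sketch a boolean stands in for the SR_SUM state and `demo_value()` stands in for an argument with side effects such as task_pid_vnr(current); both macros and every `demo_` name are hypothetical and are not the kernel's put_user() implementation.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_user_access_enabled;   /* stand-in for the SR_SUM state */
static int demo_evaluated_while_enabled; /* evaluations inside the window */

static void demo_enable_user_access(void)  { demo_user_access_enabled = true; }
static void demo_disable_user_access(void) { demo_user_access_enabled = false; }

/* An argument with side effects; records whether the window was open. */
static int demo_value(void)
{
	if (demo_user_access_enabled)
		demo_evaluated_while_enabled++;
	return 42;
}

/* Problematic shape: 'x' is expanded, and so evaluated, inside the window. */
#define demo_put_user_late(x, ptr) do {		\
	demo_enable_user_access();		\
	*(ptr) = (x);				\
	demo_disable_user_access();		\
} while (0)

/* Intended shape: evaluate 'x' into a temporary, then open the window. */
#define demo_put_user_early(x, ptr) do {	\
	int demo_val_ = (x);			\
	assert(!demo_user_access_enabled);	\
	demo_enable_user_access();		\
	*(ptr) = demo_val_;			\
	demo_disable_user_access();		\
} while (0)
```

Calling `demo_put_user_late(demo_value(), &slot)` bumps the counter, while `demo_put_user_early(demo_value(), &slot)` does not, because the argument has already been evaluated when access is enabled.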
      Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
      Reported-by: syzbot+e74b94fe601ab9552d69@syzkaller.appspotmail.com
      Suggested-by: Arnd Bergman <arnd@arndb.de>
      
      --
      Changes since v1:
      - fixed formatting and updated the patch description with more info
      
      Changes since v2:
      - fixed commenting on __put_user() (schwab@linux-m68k.org)
      
      Change since v3:
      - fixed RFC in patch title. Should be ready to merge.
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
  9. 28 Mar, 2021 1 commit
  10. 10 Mar, 2021 7 commits
  11. 23 Feb, 2021 3 commits
  12. 19 Feb, 2021 2 commits
    • RISC-V: Implement ASID allocator · 65d4b9c5
      Anup Patel committed
      Currently, we do a local TLB flush on every MM switch. This is very harsh on
      performance because we are forcing page table walks after every MM switch.
      
      This patch implements an ASID allocator for assigning an ASID to an MM
      context. The number of ASIDs is limited in HW, so we create a logical entity
      named CONTEXTID for assigning to an MM context. The lower bits of CONTEXTID
      are the ASID and the upper bits are the VERSION number. The number of usable
      ASID bits supported by HW is detected at boot time by writing 1s to the ASID
      bits in the SATP CSR.
      
      We allocate a new CONTEXTID on the first MM switch for an MM context, where
      the ASID is allocated from an ASID bitmap and the VERSION is provided by an
      atomic counter. At the time of allocating a new CONTEXTID, if we run out of
      available ASIDs then we:
      1. Flush the ASID bitmap
      2. Increment the current VERSION atomic counter
      3. Re-allocate an ASID from the ASID bitmap
      4. Flush the TLB on all CPUs
      5. Try CONTEXTID re-assignment on all CPUs
      
      Please note that we don't use ASID #0 because it is used at boot time by
      all CPUs for the initial MM context. Also, a newly created context is always
      assigned CONTEXTID #0 (i.e. VERSION #0 and ASID #0), which is an invalid
      context in our implementation.
      
      Using the above approach, we have virtually infinite CONTEXTIDs on top of a
      limited number of HW ASIDs. This approach is inspired by the ASID allocator
      used for Linux ARM/ARM64, but we have adapted it for RISC-V. Overall, this
      ASID allocator helps us reduce the rate of local TLB flushes on every CPU,
      thereby increasing performance.
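
The CONTEXTID scheme can be sketched as a toy single-CPU model. It assumes only 2 ASID bits so that a rollover happens quickly, uses a byte array in place of the bitmap, and counts flushes instead of performing them; all `demo_` names are hypothetical. Step 5 (re-assigning contexts that are live on other CPUs) is deliberately left out, so this is a simplification of the approach, not the kernel code.

```c
#include <assert.h>
#include <string.h>

#define DEMO_ASID_BITS 2UL                      /* assumed tiny, for demonstration */
#define DEMO_NUM_ASIDS (1UL << DEMO_ASID_BITS)
#define DEMO_ASID_MASK (DEMO_NUM_ASIDS - 1)

/* VERSION lives in the bits above the ASID; start at VERSION #1. */
static unsigned long demo_version = DEMO_NUM_ASIDS;
static unsigned char demo_asid_used[DEMO_NUM_ASIDS];
static unsigned long demo_tlb_flushes;

/* Allocate a CONTEXTID = VERSION | ASID; ASID #0 is never handed out. */
static unsigned long demo_new_context(void)
{
	unsigned long asid;

	for (;;) {
		for (asid = 1; asid < DEMO_NUM_ASIDS; asid++) {
			if (!demo_asid_used[asid]) {
				demo_asid_used[asid] = 1;
				return demo_version | asid;
			}
		}
		/* Out of ASIDs: clear the map, bump VERSION, flush TLBs. */
		memset(demo_asid_used, 0, sizeof(demo_asid_used));
		demo_version += DEMO_NUM_ASIDS;
		demo_tlb_flushes++;
	}
}

/* CONTEXTID #0 is invalid; older VERSIONs are stale after a rollover. */
static int demo_context_is_current(unsigned long cntx)
{
	assert(demo_version != 0);
	return cntx != 0 && (cntx & ~DEMO_ASID_MASK) == demo_version;
}
```

With 2 ASID bits there are three usable ASIDs, so the fourth allocation triggers one rollover and every CONTEXTID handed out earlier becomes stale.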
      
      This patch is tested on the QEMU virt machine, Spike and the SiFive Unleashed
      board. On the QEMU virt machine, we see some (approx. 3-5%) performance
      improvement with the SW-emulated TLBs provided by QEMU. Unfortunately,
      the ASID bits of the SATP CSR are not implemented on Spike and the SiFive
      Unleashed board, so we don't see any change in performance there. On real HW
      having all ASID bits implemented, the performance gains will be much
      greater due to improved sharing of the TLB among different processes.
      Signed-off-by: Anup Patel <anup.patel@wdc.com>
      Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
    • RISC-V: remove unneeded semicolon · 3449831d
      Chengyang Fan committed
      Remove a superfluous semicolon after function definition.
      Signed-off-by: Chengyang Fan <cy.fan@huawei.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
  13. 03 Feb, 2021 2 commits
  14. 15 Jan, 2021 5 commits
    • riscv: Add dump stack in show_regs · 091b9450
      Kefeng Wang committed
      Like commit 1149aad1 ("arm64: Add dump_backtrace() in show_regs"),
      dump the stack in riscv show_regs as common code expects.
      Reviewed-by: Atish Patra <atish.patra@wdc.com>
      Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
    • riscv: Enable per-task stack canaries · fea2fed2
      Guo Ren committed
      This enables the use of per-task stack canary values if GCC has
      support for emitting the stack canary reference relative to the
      value of tp, which holds the task struct pointer in the riscv
      kernel.
      
      After comparing the arm64 and x86 implementations, arm64's seems more
      flexible and readable. The key point is how gcc gets the offset of
      stack_canary from gs/sp_el0.
      
      x86: Uses a fixed offset from gs; not flexible.
      
      struct fixed_percpu_data {
      	/*
      	 * GCC hardcodes the stack canary as %gs:40.  Since the
      	 * irq_stack is the object at %gs:0, we reserve the bottom
      	 * 48 bytes of the irq stack for the canary.
      	 */
      	char            gs_base[40]; // :(
      	unsigned long   stack_canary;
      };
      
      arm64: Use -mstack-protector-guard-offset & guard-reg
      	gcc options:
      	-mstack-protector-guard=sysreg
      	-mstack-protector-guard-reg=sp_el0
      	-mstack-protector-guard-offset=xxx
      
      riscv: Use -mstack-protector-guard-offset & guard-reg
      	gcc options:
      	-mstack-protector-guard=tls
      	-mstack-protector-guard-reg=tp
      	-mstack-protector-guard-offset=xxx
      
       GCC's implementation has been merged:
       commit c931e8d5a96463427040b0d11f9c4352ac22b2b0
       Author: Cooper Qu <cooper.qu@linux.alibaba.com>
       Date:   Mon Jul 13 16:15:08 2020 +0800
      
           RISC-V: Add support for TLS stack protector canary access
      
      In the end, gcc inserts the following code (the lines marked with *) before
      the function returns:
      
      *  0xffffffe00020b396 <+120>:   ld      a5,1008(tp) # 0x3f0
      *  0xffffffe00020b39a <+124>:   xor     a5,a5,a4
      *  0xffffffe00020b39c <+126>:   mv      a0,s5
      *  0xffffffe00020b39e <+128>:   bnez    a5,0xffffffe00020b61c <_do_fork+766>
         0xffffffe00020b3a2 <+132>:   ld      ra,136(sp)
         0xffffffe00020b3a4 <+134>:   ld      s0,128(sp)
         0xffffffe00020b3a6 <+136>:   ld      s1,120(sp)
         0xffffffe00020b3a8 <+138>:   ld      s2,112(sp)
         0xffffffe00020b3aa <+140>:   ld      s3,104(sp)
         0xffffffe00020b3ac <+142>:   ld      s4,96(sp)
         0xffffffe00020b3ae <+144>:   ld      s5,88(sp)
         0xffffffe00020b3b0 <+146>:   ld      s6,80(sp)
         0xffffffe00020b3b2 <+148>:   ld      s7,72(sp)
         0xffffffe00020b3b4 <+150>:   addi    sp,sp,144
         0xffffffe00020b3b6 <+152>:   ret
         ...
      *  0xffffffe00020b61c <+766>:   auipc   ra,0x7f8
      *  0xffffffe00020b620 <+770>:   jalr    -1764(ra) # 0xffffffe000a02f38 <__stack_chk_fail>
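
The starred instructions amount to: load the canary at a fixed offset from the task pointer, XOR it with the copy saved at function entry, and branch to the failure path if the result is non-zero. A user-space sketch of that check follows, assuming a made-up `struct demo_task` layout; the real offset is whatever -mstack-protector-guard-offset is set to, and all `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical task structure; only the canary field matters here. */
struct demo_task {
	unsigned long other_state[4];
	unsigned long stack_canary;
};

/* Mirrors 'ld a5,OFFSET(tp)': read the canary at a fixed offset from tp. */
static unsigned long demo_load_canary(const struct demo_task *tp)
{
	unsigned long value;

	memcpy(&value,
	       (const unsigned char *)tp + offsetof(struct demo_task, stack_canary),
	       sizeof(value));
	return value;
}

/* Mirrors 'xor' + 'bnez': a non-zero XOR means the saved copy was clobbered. */
static int demo_canary_intact(const struct demo_task *tp, unsigned long saved)
{
	assert(tp != NULL);
	return (demo_load_canary(tp) ^ saved) == 0;
}
```

Because the canary is read through the task pointer, each task can carry its own value, which is the point of the per-task scheme.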
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Signed-off-by: Cooper Qu <cooper.qu@linux.alibaba.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
    • riscv: Add support for function error injection · ee55ff80
      Guo Ren committed
      Inspired by the commit 42d038c4 ("arm64: Add support for function
      error injection"), this patch supports function error injection for
      riscv.
      
      This patch mainly supports two functions: one is regs_set_return_value(),
      which is used to overwrite the return value; the other is
      override_function_with_return(), which overrides the probed
      function's return and jumps to its caller.
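
The two operations can be modelled on a cut-down register frame. This sketch assumes the return value lives in a0 and that resuming at the saved return address (ra) skips the probed function body; `struct demo_regs` and the `demo_` functions are hypothetical stand-ins, not the kernel's struct pt_regs or the functions added by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical subset of a saved register frame. */
struct demo_regs {
	unsigned long epc; /* where execution resumes */
	unsigned long ra;  /* return address of the probed function */
	unsigned long a0;  /* first argument / return value register */
};

/* Overwrite the value the caller will see as the return value. */
static void demo_set_return_value(struct demo_regs *regs, unsigned long rc)
{
	assert(regs != NULL);
	regs->a0 = rc;
}

/* Resume at the caller instead of running the probed function. */
static void demo_override_function_with_return(struct demo_regs *regs)
{
	assert(regs != NULL);
	regs->epc = regs->ra;
}
```

Applied at the entry of a probed function, the pair makes the call appear to return immediately with the injected error code.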
      
      Test log:
       cd /sys/kernel/debug/fail_function
       echo sys_clone > inject
       echo 100 > probability
       echo 1 > interval
       ls /
      [  313.176875] FAULT_INJECTION: forcing a failure.
      [  313.176875] name fail_function, interval 1, probability 100, space 0, times 1
      [  313.184357] CPU: 0 PID: 87 Comm: sh Not tainted 5.8.0-rc5-00007-g6a758cc #117
      [  313.187616] Call Trace:
      [  313.189100] [<ffffffe0002036b6>] walk_stackframe+0x0/0xc2
      [  313.191626] [<ffffffe00020395c>] show_stack+0x40/0x4c
      [  313.193927] [<ffffffe000556c60>] dump_stack+0x7c/0x96
      [  313.194795] [<ffffffe0005522e8>] should_fail+0x140/0x142
      [  313.195923] [<ffffffe000299ffc>] fei_kprobe_handler+0x2c/0x5a
      [  313.197687] [<ffffffe0009e2ec4>] kprobe_breakpoint_handler+0xb4/0x18a
      [  313.200054] [<ffffffe00020357e>] do_trap_break+0x36/0xca
      [  313.202147] [<ffffffe000201bca>] ret_from_exception+0x0/0xc
      [  313.204556] [<ffffffe000201bbc>] ret_from_syscall+0x0/0x2
      -sh: can't fork: Invalid argument
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
    • riscv: Add uprobes supported · 74784081
      Guo Ren committed
      This patch adds support for uprobes on the riscv architecture.
      
      Just like kprobes, it supports single-stepping and simulating instructions.
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
    • riscv: Add kprobes supported · c22b0bcb
      Guo Ren committed
      This patch enables "kprobe & kretprobe" to work with the ftrace
      interface. It utilizes a software breakpoint as the single-step
      mechanism.
      
      Some instructions which can't be single-step executed must be
      simulated in kernel execution slot, such as: branch, jal, auipc,
      la ...
      
      Some instructions should be rejected for probing and we use a
      blacklist to filter, such as: ecall, ebreak, ...
      
      We use ebreak & c.ebreak to replace the original instruction, and the
      kprobe handler prepares an executable memory slot for out-of-line
      execution with a copy of the original instruction being probed.
      In the execution slot we add an ebreak behind the original instruction to
      simulate a single-step mechanism.
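
The slot layout described above can be sketched in user space: copy the probed instruction into a buffer and place an ebreak right after it, so that executing the slot traps again once the copied instruction has run. The instruction encodings are the standard RISC-V ones; the `demo_` helpers and the buffer handling are hypothetical and leave out everything a real kprobe needs (executable memory, cache maintenance, simulation of pc-relative instructions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_EBREAK UINT32_C(0x00100073) /* 32-bit ebreak encoding */
#define DEMO_ECALL  UINT32_C(0x00000073) /* 32-bit ecall encoding */

/* Low two bits == 0b11 mark a 32-bit instruction, otherwise 16-bit. */
static size_t demo_insn_len(uint16_t first_parcel)
{
	return (first_parcel & 0x3) == 0x3 ? 4 : 2;
}

/* Tiny stand-in for the reject list: never probe ecall or ebreak. */
static int demo_insn_rejected(uint32_t insn)
{
	return insn == DEMO_ECALL || insn == DEMO_EBREAK;
}

/* Fill the slot with the original instruction followed by an ebreak. */
static size_t demo_build_slot(uint8_t *slot, size_t slot_size,
			      const uint8_t *insn, size_t insn_len)
{
	uint32_t brk = DEMO_EBREAK;

	assert(slot_size >= insn_len + sizeof(brk));
	memcpy(slot, insn, insn_len);
	memcpy(slot + insn_len, &brk, sizeof(brk));
	return insn_len + sizeof(brk);
}
```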
      
      The patch is based on packi's work [1] and csky's work [2].
       - The kprobes_trampoline.S is all from packi's patch
       - The single-step mechanism is newly designed for riscv without a hw
         single-step trap
       - The simulation code is from csky
       - Frankly, all the code refers to other archs' implementations
      
       [1] https://lore.kernel.org/linux-riscv/20181113195804.22825-1-me@packi.ch/
       [2] https://lore.kernel.org/linux-csky/20200403044150.20562-9-guoren@kernel.org/
      Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
      Co-developed-by: Patrick Stählin <me@packi.ch>
      Signed-off-by: Patrick Stählin <me@packi.ch>
      Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
      Tested-by: Zong Li <zong.li@sifive.com>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Cc: Patrick Stählin <me@packi.ch>
      Cc: Palmer Dabbelt <palmerdabbelt@google.com>
      Cc: Björn Töpel <bjorn.topel@gmail.com>
      Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>