1. 29 3月, 2019 1 次提交
  2. 17 3月, 2019 1 次提交
  3. 05 3月, 2019 1 次提交
    • L
      get rid of legacy 'get_ds()' function · 736706be
      Linus Torvalds 提交于
      Every in-kernel use of this function defined it to KERNEL_DS (either as
      an actual define, or as an inline function).  It's an entirely
      historical artifact, and long long long ago used to actually read the
      segment selector valueof '%ds' on x86.
      
      Which in the kernel is always KERNEL_DS.
      
      Inspired by a patch from Jann Horn that just did this for a very small
      subset of users (the ones in fs/), along with Al who suggested a script.
      I then just took it to the logical extreme and removed all the remaining
      gunk.
      
      Roughly scripted with
      
         git grep -l '(get_ds())' -- :^tools/ | xargs sed -i 's/(get_ds())/(KERNEL_DS)/'
         git grep -lw 'get_ds' -- :^tools/ | xargs sed -i '/^#define get_ds()/d'
      
      plus manual fixups to remove a few unusual usage patterns, the couple of
      inline function cases and to fix up a comment that had become stale.
      
      The 'get_ds()' function remains in an x86 kvm selftest, since in user
      space it actually does something relevant.
      Inspired-by: NJann Horn <jannh@google.com>
      Inspired-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      736706be
  4. 26 2月, 2019 3 次提交
  5. 20 2月, 2019 7 次提交
  6. 14 2月, 2019 2 次提交
  7. 08 2月, 2019 1 次提交
  8. 07 2月, 2019 3 次提交
    • M
      arm: KVM: Add missing kvm_stage2_has_pmd() helper · 309a2056
      Marc Zyngier 提交于
      Fixup 32bit by providing the now required helper.
      
      Cc: Suzuki Poulose <suzuki.poulose@arm.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      309a2056
    • M
      arm/arm64: KVM: Allow a VCPU to fully reset itself · 358b28f0
      Marc Zyngier 提交于
      The current kvm_psci_vcpu_on implementation will directly try to
      manipulate the state of the VCPU to reset it.  However, since this is
      not done on the thread that runs the VCPU, we can end up in a strangely
      corrupted state when the source and target VCPUs are running at the same
      time.
      
      Fix this by factoring out all reset logic from the PSCI implementation
      and forwarding the required information along with a request to the
      target VCPU.
      Reviewed-by: NAndrew Jones <drjones@redhat.com>
      Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NChristoffer Dall <christoffer.dall@arm.com>
      358b28f0
    • A
      y2038: rename old time and utime syscalls · d33c577c
      Arnd Bergmann 提交于
      The time, stime, utime, utimes, and futimesat system calls are only
      used on older architectures, and we do not provide y2038 safe variants
      of them, as they are replaced by clock_gettime64, clock_settime64,
      and utimensat_time64.
      
      However, for consistency it seems better to have the 32-bit architectures
      that still use them call the "time32" entry points (leaving the
      traditional handlers for the 64-bit architectures), like we do for system
      calls that now require two versions.
      
      Note: We used to always define __ARCH_WANT_SYS_TIME and
      __ARCH_WANT_SYS_UTIME and only set __ARCH_WANT_COMPAT_SYS_TIME and
      __ARCH_WANT_SYS_UTIME32 for compat mode on 64-bit kernels. Now this is
      reversed: only 64-bit architectures set __ARCH_WANT_SYS_TIME/UTIME, while
      we need __ARCH_WANT_SYS_TIME32/UTIME32 for 32-bit architectures and compat
      mode. The resulting asm/unistd.h changes look a bit counterintuitive.
      
      This is only a cleanup patch and it should not change any behavior.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      d33c577c
  9. 06 2月, 2019 2 次提交
  10. 02 2月, 2019 8 次提交
    • R
      ARM: avoid Cortex-A9 livelock on tight dmb loops · 5388a5b8
      Russell King 提交于
      machine_crash_nonpanic_core() does this:
      
      	while (1)
      		cpu_relax();
      
      because the kernel has crashed, and we have no known safe way to deal
      with the CPU.  So, we place the CPU into an infinite loop which we
      expect it to never exit - at least not until the system as a whole is
      reset by some method.
      
      In the absence of erratum 754327, this code assembles to:
      
      	b	.
      
      In other words, an infinite loop.  When erratum 754327 is enabled,
      this becomes:
      
      1:	dmb
      	b	1b
      
      It has been observed that on some systems (eg, OMAP4) where, if a
      crash is triggered, the system tries to kexec into the panic kernel,
      but fails after taking the secondary CPU down - placing it into one
      of these loops.  This causes the system to livelock, and the most
      noticable effect is the system stops after issuing:
      
      	Loading crashdump kernel...
      
      to the system console.
      
      The tested as working solution I came up with was to add wfe() to
      these infinite loops thusly:
      
      	while (1) {
      		cpu_relax();
      		wfe();
      	}
      
      which, without 754327 builds to:
      
      1:	wfe
      	b	1b
      
      or with 754327 is enabled:
      
      1:	dmb
      	wfe
      	b	1b
      
      Adding "wfe" does two things depending on the environment we're running
      under:
      - where we're running on bare metal, and the processor implements
        "wfe", it stops us spinning endlessly in a loop where we're never
        going to do any useful work.
      - if we're running in a VM, it allows the CPU to be given back to the
        hypervisor and rescheduled for other purposes (maybe a different VM)
        rather than wasting CPU cycles inside a crashed VM.
      
      However, in light of erratum 794072, Will Deacon wanted to see 10 nops
      as well - which is reasonable to cover the case where we have erratum
      754327 enabled _and_ we have a processor that doesn't implement the
      wfe hint.
      
      So, we now end up with:
      
      1:      wfe
              b       1b
      
      when erratum 754327 is disabled, or:
      
      1:      dmb
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              wfe
              b       1b
      
      when erratum 754327 is enabled.  We also get the dmb + 10 nop
      sequence elsewhere in the kernel, in terminating loops.
      
      This is reasonable - it means we get the workaround for erratum
      794072 when erratum 754327 is enabled, but still relinquish the dead
      processor - either by placing it in a lower power mode when wfe is
      implemented as such or by returning it to the hypervisior, or in the
      case where wfe is a no-op, we use the workaround specified in erratum
      794072 to avoid the problem.
      
      These as two entirely orthogonal problems - the 10 nops addresses
      erratum 794072, and the wfe is an optimisation that makes the system
      more efficient when crashed either in terms of power consumption or
      by allowing the host/other VMs to make use of the CPU.
      
      I don't see any reason not to use kexec() inside a VM - it has the
      potential to provide automated recovery from a failure of the VMs
      kernel with the opportunity for saving a crashdump of the failure.
      A panic() with a reboot timeout won't do that, and reading the
      libvirt documentation, setting on_reboot to "preserve" won't either
      (the documentation states "The preserve action for an on_reboot event
      is treated as a destroy".)  Surely it has to be a good thing to
      avoiding having CPUs spinning inside a VM that is doing no useful
      work.
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      5388a5b8
    • R
      ARM: smp: remove arch-provided "pen_release" · 6213f70e
      Russell King 提交于
      Consolidating the "pen_release" stuff amongst the various SoC
      implementations gives credence to having a CPU holding pen for
      secondary CPUs.  However, this is far from the truth.
      
      Many SoC implementations cargo-cult copied various bits of the pen
      release implementation from the initial Realview/Versatile Express
      implementation without understanding what it was or why it existed.
      The reason it existed is because these are _development_ platforms,
      and some board firmware is unable to individually control the
      startup of secondary CPUs.  Moreover, they do not have a way to
      power down or reset secondary CPUs for hot-unplug.  Hence, the
      pen_release implementation was designed for ARM Ltd's development
      platforms to provide a working implementation, even though it is
      very far from what is required.
      
      It was decided a while back to reduce the duplication by consolidating
      the "pen_release" variable, but this only made the situation worse -
      we have ended up with several implementations that read this variable
      but do not write it - again, showing the cargo-cult mentality at work,
      lack of proper review of new code, and in some cases a lack of testing.
      
      While it would be preferable to remove pen_release entirely from the
      kernel, this is not possible without help from the SoC maintainers,
      which seems to be lacking.  However, I want to remove pen_release from
      arch code to remove the credence that having it gives.
      
      This patch removes pen_release from the arch code entirely, adding
      private per-SoC definitions for it instead, and explicitly stating
      that write_pen_release() is cargo-cult copied and should not be
      copied any further.  Rename write_pen_release() in a similar fashion
      as well.
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      6213f70e
    • D
      ARM: 8824/1: fix a migrating irq bug when hotplug cpu · 1b5ba350
      Dietmar Eggemann 提交于
      Arm TC2 fails cpu hotplug stress test.
      
      This issue was tracked down to a missing copy of the new affinity
      cpumask for the vexpress-spc interrupt into struct
      irq_common_data.affinity when the interrupt is migrated in
      migrate_one_irq().
      
      Fix it by replacing the arm specific hotplug cpu migration with the
      generic irq code.
      
      This is the counterpart implementation to commit 217d453d ("arm64:
      fix a migrating irq bug when hotplug cpu").
      
      Tested with cpu hotplug stress test on Arm TC2 (multi_v7_defconfig plus
      CONFIG_ARM_BIG_LITTLE_CPUFREQ=y and CONFIG_ARM_VEXPRESS_SPC_CPUFREQ=y).
      The vexpress-spc interrupt (irq=22) on this board is affine to CPU0.
      Its affinity cpumask now changes correctly e.g. from 0 to 1-4 when
      CPU0 is hotplugged out.
      Suggested-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NDietmar Eggemann <dietmar.eggemann@arm.com>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      1b5ba350
    • V
      ARM: 8830/1: NOMMU: Toggle only bits in EXC_RETURN we are really care of · 72cd4064
      Vladimir Murzin 提交于
      ARMv8M introduces support for Security extension to M class, among
      other things it affects exception handling, especially, encoding of
      EXC_RETURN.
      
      The new bits have been added:
      
      Bit [6]	Secure or Non-secure stack
      Bit [5]	Default callee register stacking
      Bit [0]	Exception Secure
      
      which conflicts with hard-coded value of EXC_RETURN:
      
      In fact, we only care of few bits:
      
      Bit [3]	 Mode (0 - Handler, 1 - Thread)
      Bit [2]	 Stack pointer selection (0 - Main, 1 - Process)
      
      We can toggle only those bits and left other bits as they were on
      exception entry.
      
      It is basically, what patch does - saves EXC_RETURN when we do
      transition form Thread to Handler mode (it is first svc), so later
      saved value is used instead of EXC_RET_THREADMODE_PROCESSSTACK.
      Signed-off-by: NVladimir Murzin <vladimir.murzin@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      72cd4064
    • S
      ARM: 8829/1: spinlock: use unified assembler language syntax · eb7ff902
      Stefan Agner 提交于
      Convert the conditional infix to a postfix to make sure this inline
      assembly is unified syntax. Since gcc assumes non-unified syntax
      when emitting ARM instructions, make sure to define the syntax as
      unified.
      
      This allows to use LLVM's integrated assembler.
      Signed-off-by: NStefan Agner <stefan@agner.ch>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      eb7ff902
    • S
      ARM: 8828/1: uaccess: use unified assembler language syntax · 32fdb046
      Stefan Agner 提交于
      Convert the conditional infix to a postfix to make sure this inline
      assembly is unified syntax. Since gcc assumes non-unified syntax
      when emitting ARM instructions, make sure to define the syntax as
      unified.
      
      This allows to use LLVM's integrated assembler.
      Signed-off-by: NStefan Agner <stefan@agner.ch>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      32fdb046
    • V
      ARM: 8823/1: Implement pgprot_device() · 58ca3382
      Vincent Whitchurch 提交于
      This is used when mmapping the PCI resource* files in sys.  Because ARM
      currently lacks an implementation of pgprot_device(), it falls back to
      pgprot_uncached() (Strongly Ordered), but we should be able to use
      Device memory instead.
      
      Doing this speeds up large writes to the resource files by about 40% on
      one of my systems.  It also ensures that mmaps on these resources use
      the same memory type as ioremap().
      Signed-off-by: NVincent Whitchurch <vincent.whitchurch@axis.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      58ca3382
    • G
      ARM: 8822/1: smp_twd: Remove legacy TWD registration · fec9eac6
      Geert Uytterhoeven 提交于
      As of commit 7484c727 ("ARM: realview: delete the RealView board
      files"), the ARM Timer and Watchdog Unit is instantiated from DT only.
      Moreover, the driver is selected from ARCH_MULTIPLATFORM platforms only,
      which implies OF, TIMER_OF, and COMMON_CLK.
      
      Hence remove all unused legacy infrastructure from the driver.
      Signed-off-by: NGeert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: NLinus Walleij <linus.walleij@linaro.org>
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
      fec9eac6
  11. 26 1月, 2019 1 次提交
    • A
      ARM: add migrate_pages() system call · 78594b95
      Arnd Bergmann 提交于
      The migrate_pages system call has an assigned number on all architectures
      except ARM. When it got added initially in commit d80ade7b ("ARM:
      Fix warning: #warning syscall migrate_pages not implemented"), it was
      intentionally left out based on the observation that there are no 32-bit
      ARM NUMA systems.
      
      However, there are now arm64 NUMA machines that can in theory run 32-bit
      kernels (actually enabling NUMA there would require additional work)
      as well as 32-bit user space on 64-bit kernels, so that argument is no
      longer very strong.
      
      Assigning the number lets us use the system call on 64-bit kernels as well
      as providing a more consistent set of syscalls across architectures.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
      78594b95
  12. 24 1月, 2019 1 次提交
  13. 05 1月, 2019 1 次提交
    • J
      mm: treewide: remove unused address argument from pte_alloc functions · 4cf58924
      Joel Fernandes (Google) 提交于
      Patch series "Add support for fast mremap".
      
      This series speeds up the mremap(2) syscall by copying page tables at
      the PMD level even for non-THP systems.  There is concern that the extra
      'address' argument that mremap passes to pte_alloc may do something
      subtle architecture related in the future that may make the scheme not
      work.  Also we find that there is no point in passing the 'address' to
      pte_alloc since its unused.  This patch therefore removes this argument
      tree-wide resulting in a nice negative diff as well.  Also ensuring
      along the way that the enabled architectures do not do anything funky
      with the 'address' argument that goes unnoticed by the optimization.
      
      Build and boot tested on x86-64.  Build tested on arm64.  The config
      enablement patch for arm64 will be posted in the future after more
      testing.
      
      The changes were obtained by applying the following Coccinelle script.
      (thanks Julia for answering all Coccinelle questions!).
      Following fix ups were done manually:
      * Removal of address argument from  pte_fragment_alloc
      * Removal of pte_alloc_one_fast definitions from m68k and microblaze.
      
      // Options: --include-headers --no-includes
      // Note: I split the 'identifier fn' line, so if you are manually
      // running it, please unsplit it so it runs for you.
      
      virtual patch
      
      @pte_alloc_func_def depends on patch exists@
      identifier E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      type T2;
      @@
      
       fn(...
      - , T2 E2
       )
       { ... }
      
      @pte_alloc_func_proto_noarg depends on patch exists@
      type T1, T2, T3, T4;
      identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1, T2);
      + T3 fn(T1);
      |
      - T3 fn(T1, T2, T4);
      + T3 fn(T1, T2);
      )
      
      @pte_alloc_func_proto depends on patch exists@
      identifier E1, E2, E4;
      type T1, T2, T3, T4;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1 E1, T2 E2);
      + T3 fn(T1 E1);
      |
      - T3 fn(T1 E1, T2 E2, T4 E4);
      + T3 fn(T1 E1, T2 E2);
      )
      
      @pte_alloc_func_call depends on patch exists@
      expression E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
       fn(...
      -,  E2
       )
      
      @pte_alloc_macro depends on patch exists@
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      identifier a, b, c;
      expression e;
      position p;
      @@
      
      (
      - #define fn(a, b, c) e
      + #define fn(a, b) e
      |
      - #define fn(a, b) e
      + #define fn(a) e
      )
      
      Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.comSigned-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Suggested-by: NKirill A. Shutemov <kirill@shutemov.name>
      Acked-by: NKirill A. Shutemov <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4cf58924
  14. 04 1月, 2019 1 次提交
    • L
      Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds 提交于
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  15. 21 12月, 2018 1 次提交
  16. 20 12月, 2018 2 次提交
  17. 18 12月, 2018 4 次提交