1. 07 Feb 2019, 1 commit
  2. 26 Jan 2019, 1 commit
    • arch: add split IPC system calls where needed · 0d6040d4
      Committed by Arnd Bergmann
      The IPC system call handling is highly inconsistent across architectures:
      some use sys_ipc(), some use separate calls, and some use both.  We also
      have some architectures that require passing IPC_64 in the flags, and
      others that set it implicitly.
      
      For the addition of a y2038 safe semtimedop() system call, I chose to only
      support the separate entry points, but that requires first supporting
      the regular ones with their own syscall numbers.
      
      IPC_64 is now implied by the new semctl/shmctl/msgctl system
      calls, even on the architectures that require passing it with the ipc()
      multiplexer.
      
      I'm not adding the new semtimedop() or semop() on 32-bit architectures,
      those will get implemented using the new semtimedop_time64() version
      that gets added along with the other time64 calls.
      Three 64-bit architectures (powerpc, s390 and sparc) get semtimedop().
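      
      As a hedged sketch, this is roughly what a split entry point looks like
      from user space on a 64-bit architecture that has semtimedop(); the
      sem_op_timed() wrapper is made up for illustration, and syscall numbers
      vary per architecture:
      
      	/* sketch: invoking the direct semtimedop() entry point, assuming
      	 * this architecture defines __NR_semtimedop */
      	#include <sys/syscall.h>
      	#include <sys/sem.h>
      	#include <time.h>
      	#include <unistd.h>
      
      	static long sem_op_timed(int semid, struct sembuf *sops,
      				 unsigned int nsops, const struct timespec *timeout)
      	{
      		return syscall(__NR_semtimedop, semid, sops, nsops, timeout);
      	}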
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
  3. 06 Jan 2019, 1 commit
    • jump_label: move 'asm goto' support test to Kconfig · e9666d10
      Committed by Masahiro Yamada
      Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
      
      The jump label is controlled by HAVE_JUMP_LABEL, which is defined
      like this:
      
        #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
        # define HAVE_JUMP_LABEL
        #endif
      
      We can improve this by testing 'asm goto' support in Kconfig and then
      making JUMP_LABEL depend on CC_HAS_ASM_GOTO.
      
      The ugly #ifdef HAVE_JUMP_LABEL blocks go away, and CONFIG_JUMP_LABEL
      matches the kernel's real capability.
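      
      For reference, the Kconfig-side test only needs to prove that a
      translation unit using 'asm goto' compiles; a minimal sketch of such a
      probe (the kernel's actual test script may differ):
      
      	/* minimal 'asm goto' probe: compilers without support reject
      	 * this unit, which a $(success,...) compile test can key off */
      	int main(void)
      	{
      		asm goto("" : : : : have_goto);
      		return 0;
      	have_goto:
      		return 1;
      	}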
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
  4. 04 Jan 2019, 1 commit
    • Remove 'type' argument from access_ok() function · 96d4f267
      Committed by Linus Torvalds
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
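      
      The typical conversion is purely mechanical, e.g.:
      
      	/* before */
      	if (!access_ok(VERIFY_WRITE, buf, len))
      		return -EFAULT;
      
      	/* after: the 'type' argument is gone */
      	if (!access_ok(buf, len))
      		return -EFAULT;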
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 19 Dec 2018, 1 commit
    • Revert "x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs" · e769742d
      Committed by Ingo Molnar
      This reverts commit 5bdcd510.
      
      The macro-based workarounds for GCC's inlining bugs caused regressions: distcc
      and other distro build setups broke, and the fixes are neither easy nor will
      they solve regressions on already existing installations.
      
      So we are reverting this patch and the 8 followup patches.
      
      What makes this revert easier is that GCC9 will likely include the new 'asm inline'
      syntax that makes inlining of assembly blocks a lot more robust.
      
      This is a superior method to any macro-based hackery - and might even be
      backported to GCC 8, which would make all modern distros get the inlining
      fixes as well.
      
      Many thanks to Masahiro Yamada and others for helping sort out these problems.
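      
      For reference, a sketch of the syntax difference (the instructions here
      are purely illustrative):
      
      	static inline void big_asm_block(void)
      	{
      		/* plain asm: GCC estimates the inlining cost from the
      		 * string's apparent line count, so large blocks defeat
      		 * inlining of the containing function */
      		asm ("nop\n\t"
      		     "nop\n\t"
      		     "nop");
      
      		/* 'asm inline' (GCC 9+): the block counts as minimum
      		 * size for inlining decisions, whatever its line count */
      		asm inline ("nop\n\t"
      			    "nop\n\t"
      			    "nop");
      	}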
      Reported-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Richard Biener <rguenther@suse.de>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Segher Boessenkool <segher@kernel.crashing.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  6. 15 Dec 2018, 1 commit
  7. 08 Dec 2018, 1 commit
  8. 06 Dec 2018, 1 commit
    • kprobes/x86: Blacklist non-attachable interrupt functions · a50480cb
      Committed by Andrea Righi
      These interrupt functions are already non-attachable by kprobes.
      Blacklist them explicitly so that they can show up in
      /sys/kernel/debug/kprobes/blacklist and tools like BCC can use this
      additional information.
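      
      A hedged sketch of the mechanism; the symbol name below is illustrative,
      not one taken from this patch:
      
      	#include <linux/kprobes.h>
      
      	void example_interrupt_entry(void)
      	{
      		/* ... entry work that kprobes must not instrument ... */
      	}
      	/* puts the symbol on the kprobes blacklist, so it shows up in
      	 * /sys/kernel/debug/kprobes/blacklist */
      	NOKPROBE_SYMBOL(example_interrupt_entry);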
      Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yonghong Song <yhs@fb.com>
      Link: http://lkml.kernel.org/r/20181206095648.GA8249@Dell
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  9. 05 Dec 2018, 2 commits
    • x86/vdso: Remove a stale/misleading comment from the linker script · 29434801
      Committed by Sean Christopherson
      Once upon a time, vdso2c aggressively stripped data from the vDSO
      image when generating the final userspace image.  This included
      stripping the .altinstructions and .altinstr_replacement sections.
      Eventually, the stripping process reverted to "objdump -S" and no
      longer removed the aforementioned sections, but the comment remained.
      
      Keeping the .alt* sections at the end of the PT_LOAD segment is no
      longer necessary, but there's no harm in doing so and it's a helpful
      reminder that they don't need to be included in the final vDSO image,
      i.e. someone may want to take another stab at zapping/stripping the
      unneeded sections.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: da861e18 ("x86, vdso: Get rid of the fake section mechanism")
      Link: http://lkml.kernel.org/r/20181204212600.28090-3-sean.j.christopherson@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/vdso: Remove obsolete "fake section table" reservation · 24b7c77b
      Committed by Sean Christopherson
      At one point the vDSO image was manually stripped down by vdso2c in an
      attempt to minimize the size of the image mapped into userspace.  Part
      of that stripping process involved building a fake section table so as
      not to break userspace processes that parse the section table.  Memory
      for the fake section table was reserved in the .rodata section so that
      vdso2c could simply copy the entire PT_LOAD segment into the userspace
      image after building the fake table.
      
      Eventually, the entire fake section table approach was dropped in favor
      of stripping the vdso "the old-fashioned way", i.e. via objdump -S.
      But the reservation in .rodata for the fake table was left behind.
      Remove the reservation along with a few other related defines and
      section entries.
      
      Removing the fake section table placeholder zaps a whopping 0x340 bytes
      from the 64-bit vDSO image, which drops the current image's size to
      under 4k, i.e. reduces the effective size of the userspace vDSO mapping
      by a full page.
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: da861e18 ("x86, vdso: Get rid of the fake section mechanism")
      Link: http://lkml.kernel.org/r/20181204212600.28090-2-sean.j.christopherson@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  10. 03 Dec 2018, 1 commit
    • x86: Fix various typos in comments · a97673a1
      Committed by Ingo Molnar
      Go over arch/x86/ and fix common typos in comments,
      and a typo in an actual function argument name.
      
      No change in functionality intended.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  11. 22 Nov 2018, 1 commit
  12. 27 Oct 2018, 1 commit
  13. 17 Oct 2018, 2 commits
  14. 14 Oct 2018, 1 commit
  15. 08 Oct 2018, 5 commits
    • x86/fsgsbase/64: Clean up various details · ec3a9418
      Committed by Ingo Molnar
      So:
      
       - use 'extern' consistently for APIs
      
       - fix weird header guard
      
       - clarify code comments
      
       - reorder APIs by type
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Markus T Metzger <markus.t.metzger@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1537312139-5580-2-git-send-email-chang.seok.bae@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/segments: Introduce the 'CPUNODE' naming to better document the segment limit CPU/node NR trick · 22245bdf
      Committed by Ingo Molnar
      We have a special segment descriptor entry in the GDT, whose sole purpose is to
      encode the CPU and node numbers in its limit (size) field. There are user-space
      instructions that allow the reading of the limit field, which gives us a really
      fast way to read the CPU and node IDs from the vDSO for example.
      
      But the naming of related functionality does not make this clear, at all:
      
      	VDSO_CPU_SIZE
      	VDSO_CPU_MASK
      	__CPU_NUMBER_SEG
      	GDT_ENTRY_CPU_NUMBER
      	vdso_encode_cpu_node
      	vdso_read_cpu_node
      
      There are a number of problems:
      
       - The 'VDSO_CPU_SIZE' name doesn't really make it clear that this is a
         number of bits, nor which 'CPU' it refers to, i.e. that this is about
         a GDT entry whose limit encodes the CPU and node number.
      
       - Furthermore, the 'CPU_NUMBER' naming is actively misleading as well,
         because the segment limit encodes not just the CPU number but the
         node ID as well ...
      
      So use a better nomenclature all around: name everything related to this trick
      'CPUNODE', to make it clear that this is something special, add _BITS to
      make it clear that these are bit counts, and propagate this to every
      affected name:
      
      	VDSO_CPU_SIZE         =>  VDSO_CPUNODE_BITS
      	VDSO_CPU_MASK         =>  VDSO_CPUNODE_MASK
      	__CPU_NUMBER_SEG      =>  __CPUNODE_SEG
      	GDT_ENTRY_CPU_NUMBER  =>  GDT_ENTRY_CPUNODE
      	vdso_encode_cpu_node  =>  vdso_encode_cpunode
      	vdso_read_cpu_node    =>  vdso_read_cpunode
      
      This, beyond being less confusing, also makes it easier to grep for all related
      functionality:
      
        $ git grep -i cpunode arch/x86
      
      Also, while at it, fix "return is not a function" style sloppiness in vdso_encode_cpunode().
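      
      A sketch of the renamed read helper, using the identifiers from the table
      above; the in-tree code prefers RDPID with an LSL fallback via an
      alternative, so take this as illustrative only:
      
      	static inline void vdso_read_cpunode(unsigned *cpu, unsigned *node)
      	{
      		unsigned int p;
      
      		/* LSL loads the segment limit, which encodes the CPU
      		 * number in the low bits and the node above them */
      		asm ("lsl %1, %0" : "=r" (p) : "r" (__CPUNODE_SEG));
      		if (cpu)
      			*cpu = p & VDSO_CPUNODE_MASK;
      		if (node)
      			*node = p >> VDSO_CPUNODE_BITS;
      	}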
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Markus T Metzger <markus.t.metzger@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Link: http://lkml.kernel.org/r/1537312139-5580-2-git-send-email-chang.seok.bae@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/vdso: Initialize the CPU/node NR segment descriptor earlier · b2e2ba57
      Committed by Chang S. Bae
      Currently the CPU/node NR segment descriptor (GDT_ENTRY_CPU_NUMBER) is
      initialized relatively late during CPU init, from the vCPU code, which
      has a number of disadvantages, such as requiring hotplug CPU notifiers
      and SMP cross-calls.
      
      Instead just initialize it much earlier, directly in cpu_init().
      
      This reduces complexity and increases robustness.
      
      [ mingo: Wrote new changelog. ]
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Markus T Metzger <markus.t.metzger@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1537312139-5580-9-git-send-email-chang.seok.bae@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/vdso: Introduce helper functions for CPU and node number · ffebbaed
      Committed by Chang S. Bae
      Clean up the CPU/node number related code a bit, to make it more apparent
      how we are encoding/extracting the CPU and node fields from the
      segment limit.
      
      No change in functionality intended.
      
      [ mingo: Wrote new changelog. ]
      Suggested-by: Andy Lutomirski <luto@kernel.org>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Markus T Metzger <markus.t.metzger@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Rik van Riel <riel@surriel.com>
      Link: http://lkml.kernel.org/r/1537312139-5580-8-git-send-email-chang.seok.bae@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/segments/64: Rename the GDT PER_CPU entry to CPU_NUMBER · c4755613
      Committed by Chang S. Bae
      The old 'per CPU' naming was misleading: 64-bit kernels don't use this
      GDT entry for per CPU data, but to store the CPU (and node) ID.
      
      [ mingo: Wrote new changelog. ]
      Suggested-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Markus T Metzger <markus.t.metzger@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Rik van Riel <riel@surriel.com>
      Link: http://lkml.kernel.org/r/1537312139-5580-7-git-send-email-chang.seok.bae@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  16. 06 Oct 2018, 2 commits
  17. 05 Oct 2018, 10 commits
    • x86/vdso: Remove "memory" clobbers in the vDSO syscall fallbacks · 89fe0a1f
      Committed by Andy Lutomirski
      When a vDSO clock function falls back to the syscall, no special
      barriers or ordering are needed, and the syscall fallbacks don't
      clobber any memory that is not explicitly listed in the asm
      constraints.  Remove the "memory" clobber.
      
      This causes minor changes to the generated code, but otherwise has
      no obvious performance impact.  I think it's nice to have, though,
      since it may help the optimizer in the future.
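      
      A sketch of the change on the 64-bit clock_gettime() fallback (the
      constraint form follows the asm-constraints fix further down this log):
      
      	/* before: "memory" clobber, although the outputs already
      	 * describe everything the syscall writes */
      	asm ("syscall" : "=a" (ret), "=m" (*ts) :
      	     "0" (__NR_clock_gettime), "D" (clock), "S" (ts) :
      	     "memory", "rcx", "r11");
      
      	/* after */
      	asm ("syscall" : "=a" (ret), "=m" (*ts) :
      	     "0" (__NR_clock_gettime), "D" (clock), "S" (ts) :
      	     "rcx", "r11");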
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/3a7438f5fb2422ed881683d2ccffd7f987b2dc44.1538689401.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/vdso: Add CLOCK_TAI support · 315f28fa
      Committed by Thomas Gleixner
      With the storage array in place it's now trivial to support CLOCK_TAI in
      the vdso. Extend the base time storage array and add the update code.
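      
      A sketch of the update side, assuming the basetime[] array and vgtod_ts
      fields introduced earlier in this series:
      
      	base = &vdata->basetime[CLOCK_TAI];
      	base->sec = tk->xtime_sec + (s64)tk->tai_offset;
      	base->nsec = tk->tkr_mono.xtime_nsec;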
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Matt Rickard <matt@softrans.com.au>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Juergen Gross <jgross@suse.com>
      Link: https://lkml.kernel.org/r/20180917130707.823878601@linutronix.de
    • x86/vdso: Move cycle_last handling into the caller · 3e89bf35
      Committed by Thomas Gleixner
      Dereferencing gtod->cycle_last all over the place and doing the cycles <
      last comparison in the vclock read functions generates horrible code. Doing
      it at the call site is much better and gains a few cycles both for TSC and
      pvclock.
      
      Caveat: This adds the comparison to the hyperv vclock as well, but I have
      no way to test that.
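      
      A sketch of the readout loop with the comparison done once at the call
      site, roughly as in do_hres() after this change:
      
      	do {
      		seq = gtod_read_begin(gtod);
      		cycles = vgetcyc(gtod->vclock_mode);
      		ns = base->nsec;
      		last = gtod->cycle_last;
      		if (cycles > last)
      			ns += (cycles - last) * gtod->mult;
      		ns >>= gtod->shift;
      		sec = base->sec;
      	} while (unlikely(gtod_read_retry(gtod, seq)));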
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Rickard <matt@softrans.com.au>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Juergen Gross <jgross@suse.com>
      Link: https://lkml.kernel.org/r/20180917130707.741440803@linutronix.de
    • x86/vdso: Simplify the invalid vclock case · 4f72adc5
      Committed by Thomas Gleixner
      The code flow for the vclocks is convoluted: vclocks can be invalidated
      separately from the vsyscall_gtod_data sequence, so that fact has to be
      stored in a separate variable. That's inefficient.
      
      Restructure the code so the vclock readout returns cycles and the
      conversion to nanoseconds is handled at the call site.
      
      If the clock gets invalidated or vclock is already VCLOCK_NONE, return
      U64_MAX as the cycle value, which is invalid for all clocks and leave the
      sequence loop immediately in that case by calling the fallback function
      directly.
      
      This allows removing the gettimeofday fallback, as it now uses the
      clock_gettime() fallback and does the nanoseconds to microseconds
      conversion in the same way as it does when the vclock is functional. It
      does not make a difference whether the division by 1000 happens in the
      kernel fallback or in userspace.
      
      Generates way better code and gains a few cycles back.
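      
      A sketch of the invalid-vclock test this enables; valid TSC-based cycle
      counts never have the top bit set, so a sign test covers U64_MAX:
      
      	cycles = vgetcyc(gtod->vclock_mode);
      	if (unlikely((s64)cycles < 0))
      		return vdso_fallback_gettime(clk, ts);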
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Rickard <matt@softrans.com.au>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Juergen Gross <jgross@suse.com>
      Link: https://lkml.kernel.org/r/20180917130707.657928937@linutronix.de
    • x86/vdso: Replace the clockid switch case · f3e83938
      Committed by Thomas Gleixner
      Now that the time getter functions use the clockid as index into the
      storage array for the base time access, the switch case can be replaced.
      
      - Check for clockid >= MAX_CLOCKS and for negative clockid (CPU/FD) first
        and call the fallback function right away.
      
      - After establishing that clockid is < MAX_CLOCKS, convert the clockid to a
        bitmask.
      
      - Check for the supported high resolution and coarse functions by ANDing
        the bitmask of supported clocks and checking whether a bit is set.
      
      This completely avoids jump tables, reduces the number of conditionals and
      makes the VDSO extensible for other clock ids.
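      
      A sketch of the resulting switch-free dispatch, assuming VGTOD_HRES and
      VGTOD_COARSE are the bitmasks of clock ids each path supports:
      
      	notrace int __vdso_clock_gettime(clockid_t clock, struct timespec *ts)
      	{
      		unsigned int msk;
      
      		/* negative (CPU/FD) and out-of-range ids fall back */
      		if (unlikely((unsigned int) clock >= MAX_CLOCKS))
      			return vdso_fallback_gettime(clock, ts);
      
      		msk = 1U << clock;
      		if (likely(msk & VGTOD_HRES)) {
      			return do_hres(clock, ts);
      		} else if (msk & VGTOD_COARSE) {
      			do_coarse(clock, ts);
      			return 0;
      		}
      		return vdso_fallback_gettime(clock, ts);
      	}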
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Rickard <matt@softrans.com.au>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Juergen Gross <jgross@suse.com>
      Link: https://lkml.kernel.org/r/20180917130707.574315796@linutronix.de
    • x86/vdso: Collapse coarse functions · 6deec5bd
      Committed by Thomas Gleixner
      do_realtime_coarse() and do_monotonic_coarse() are now the same except for
      the storage array index. Hand the index in as an argument and collapse the
      functions.
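      
      A sketch of the collapsed function; the clock id picks the base time
      entry instead of two near-identical bodies:
      
      	notrace static void do_coarse(clockid_t clk, struct timespec *ts)
      	{
      		struct vgtod_ts *base = &gtod->basetime[clk];
      		unsigned int seq;
      
      		do {
      			seq = gtod_read_begin(gtod);
      			ts->tv_sec = base->sec;
      			ts->tv_nsec = base->nsec;
      		} while (unlikely(gtod_read_retry(gtod, seq)));
      	}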
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Rickard <matt@softrans.com.au>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Juergen Gross <jgross@suse.com>
      Link: https://lkml.kernel.org/r/20180917130707.490733779@linutronix.de
    • x86/vdso: Collapse high resolution functions · e9a62f76
      Committed by Thomas Gleixner
      do_realtime() and do_monotonic() are now the same except for the storage
      array index. Hand the index in as an argument and collapse the functions.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Rickard <matt@softrans.com.au>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Juergen Gross <jgross@suse.com>
      Link: https://lkml.kernel.org/r/20180917130707.407955860@linutronix.de
    • x86/vdso: Introduce and use vgtod_ts · 49116f20
      Committed by Thomas Gleixner
      It's desirable to support more clocks in the VDSO, e.g. CLOCK_TAI. This
      results either in indirect calls due to the larger switch case, which then
      require retpolines, or, when the compiler is forced to avoid jump tables,
      in even more conditionals.
      
      To avoid both variants, which are bad for performance, the high resolution
      functions and the coarse grained functions will be collapsed into one for
      each. That requires storing the clock specific base time in an array.
      
      Introduce struct vgtod_ts for storage and convert the data store, the
      update function and the individual clock functions over to use it.
      
      The new storage no longer uses gtod_long_t for seconds depending on a 32-
      or 64-bit compile, because this needs to be the full 64-bit value even on
      32-bit once a y2038-safe function is added. There is no point in keeping
      the distinction alive in the internal representation.
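      
      A sketch of the new storage; the VGTOD_BASES count is assumed here to
      cover the supported clock bases:
      
      	struct vgtod_ts {
      		u64	sec;
      		u64	nsec;
      	};
      
      	struct vsyscall_gtod_data {
      		/* ... sequence counter, clocksource fields ... */
      		struct vgtod_ts	basetime[VGTOD_BASES];
      	};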
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Rickard <matt@softrans.com.au>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Juergen Gross <jgross@suse.com>
      Link: https://lkml.kernel.org/r/20180917130707.324679401@linutronix.de
    • x86/vdso: Use unsigned int consistently for vsyscall_gtod_data::seq · 77e9c678
      Committed by Thomas Gleixner
      The sequence count in vgtod_data is unsigned int, but the call sites use
      unsigned long, which is a pointless exercise. Fix the call sites and
      replace bare 'unsigned' with 'unsigned int' while at it.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Rickard <matt@softrans.com.au>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Juergen Gross <jgross@suse.com>
      Link: https://lkml.kernel.org/r/20180917130707.236250416@linutronix.de
    • x86/vdso: Enforce 64bit clocksource · a51e996d
      Committed by Thomas Gleixner
      All VDSO clock sources are TSC based and use CLOCKSOURCE_MASK(64). There is
      no point in masking with all FF. Get rid of it and enforce the mask in the
      sanity checker.
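      
      A sketch of the check in the sanity checker; a vclock is only usable when
      the clocksource really covers the full 64 bits:
      
      	if (tk->tkr_mono.mask != CLOCKSOURCE_MASK(64))
      		vclock_mode = VCLOCK_NONE;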
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Matt Rickard <matt@softrans.com.au>
      Cc: Stephen Boyd <sboyd@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Florian Weimer <fweimer@redhat.com>
      Cc: "K. Y. Srinivasan" <kys@microsoft.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: devel@linuxdriverproject.org
      Cc: virtualization@lists.linux-foundation.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Juergen Gross <jgross@suse.com>
      Link: https://lkml.kernel.org/r/20180917130707.151963007@linutronix.de
  18. 04 Oct 2018, 1 commit
  19. 03 Oct 2018, 1 commit
  20. 02 Oct 2018, 1 commit
    • x86/vdso: Fix asm constraints on vDSO syscall fallbacks · 715bd9d1
      Committed by Andy Lutomirski
      The syscall fallbacks in the vDSO have incorrect asm constraints.
      They are not marked as writing to their outputs -- instead, they are
      marked as clobbering "memory", which is useless.  In particular, gcc
      is smart enough to know that the timespec parameter hasn't escaped,
      so a memory clobber doesn't clobber it.  And passing a pointer as an
      asm *input* does not tell gcc that the pointed-to value is changed.
      
      Add in the fact that the asm instructions weren't volatile, and gcc
      was free to omit them entirely unless their sole output (the return
      value) is used.  Which it is (phew!), but that stops happening with
      some upcoming patches.
      
      As a trivial example, the following code:
      
      void test_fallback(struct timespec *ts)
      {
      	vdso_fallback_gettime(CLOCK_MONOTONIC, ts);
      }
      
      compiles to:
      
      00000000000000c0 <test_fallback>:
        c0:   c3                      retq
      
      To add insult to injury, the RCX and R11 clobbers on 64-bit
      builds were missing.
      
      The "memory" clobber is also unnecessary -- no ordering with respect to
      other memory operations is needed, but that's going to be fixed in a
      separate not-for-stable patch.
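      
      A sketch of the corrected 64-bit fallback, consistent with the
      description above (outputs declared so the asm cannot be elided or its
      stores dropped; "memory" retained here and removed by the later
      non-stable patch):
      
      	notrace static long vdso_fallback_gettime(long clock,
      						  struct timespec *ts)
      	{
      		long ret;
      
      		asm ("syscall" : "=a" (ret), "=m" (*ts) :
      		     "0" (__NR_clock_gettime), "D" (clock), "S" (ts) :
      		     "memory", "rcx", "r11");	/* syscall clobbers rcx/r11 */
      		return ret;
      	}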
      
      Fixes: 2aae950b ("x86_64: Add vDSO for x86-64 with gettimeofday/clock_gettime/getcpu")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: https://lkml.kernel.org/r/2c0231690551989d2fafa60ed0e7b5cc8b403908.1538422295.git.luto@kernel.org
  21. 21 Sep 2018, 1 commit
  22. 13 Sep 2018, 1 commit
    • x86/pti/64: Remove the SYSCALL64 entry trampoline · bf904d27
      Committed by Andy Lutomirski
      The SYSCALL64 trampoline has a couple of nice properties:
      
       - The usual sequence of SWAPGS followed by two GS-relative accesses to
         set up RSP is somewhat slow because the GS-relative accesses need
         to wait for SWAPGS to finish.  The trampoline approach allows
         RIP-relative accesses to set up RSP, which avoids the stall.
      
       - The trampoline avoids any percpu access before CR3 is set up,
         which means that no percpu memory needs to be mapped in the user
         page tables.  This prevents using Meltdown to read any percpu memory
         outside the cpu_entry_area and prevents using timing leaks
         to directly locate the percpu areas.
      
      The downsides of using a trampoline may outweigh the upsides, however.
      It adds an extra non-contiguous I$ cache line to system calls, and it
      forces an indirect jump to transfer control back to the normal kernel
      text after CR3 is set up.  The latter is because x86 lacks a 64-bit
      direct jump instruction that could jump from the trampoline to the entry
      text.  With retpolines enabled, the indirect jump is extremely slow.
      
      Change the code to map the percpu TSS into the user page tables to allow
      the non-trampoline SYSCALL64 path to work under PTI.  This does not add a
      new direct information leak, since the TSS is readable by Meltdown from the
      cpu_entry_area alias regardless.  It does allow a timing attack to locate
      the percpu area, but KASLR is more or less a lost cause against local
      attack on CPUs vulnerable to Meltdown regardless.  As far as I'm concerned,
      on current hardware, KASLR is only useful to mitigate remote attacks that
      try to attack the kernel without first gaining RCE against a vulnerable
      user process.
      
      On Skylake, with CONFIG_RETPOLINE=y and KPTI on, this reduces syscall
      overhead from ~237ns to ~228ns.
      
      There is a possible alternative approach: Move the trampoline within 2G of
      the entry text and make a separate copy for each CPU.  This would allow a
      direct jump to rejoin the normal entry path. There are pros and cons to
      this approach:
      
       + It avoids a pipeline stall
      
       - It executes from an extra page and reads from another extra page during
         the syscall. The latter is because it needs to use a relative
         addressing mode to find sp1 -- it's the same *cacheline*, but accessed
         using an alias, so it's an extra TLB entry.
      
       - Slightly more memory. This would be one page per CPU for a simple
         implementation and 64-ish bytes per CPU or one page per node for a more
         complex implementation.
      
       - More code complexity.
      
      The current approach is chosen for simplicity and because the alternative
      does not provide a significant enough benefit to be worth the complexity.
      
      [ tglx: Added the alternative discussion to the changelog ]
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: https://lkml.kernel.org/r/8c7c6e483612c3e4e10ca89495dc160b1aa66878.1536015544.git.luto@kernel.org
  23. 08 Sep 2018, 2 commits