1. 06 Jan 2020, 2 commits
  2. 30 Dec 2019, 1 commit
  3. 16 Dec 2019, 1 commit
  4. 13 Dec 2019, 2 commits
    • powerpc/shared: Use static key to detect shared processor · 656c21d6
      Srikar Dronamraju authored
      With the shared-processor static key available, is_shared_processor()
      can return without having to query the lppaca structure.
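      A minimal sketch of the idea, assuming the key is named shared_processor
      and is enabled once from pseries platform setup (names follow the patch
      description and are not verified against the tree):

        /* Sketch only: key name and setup site are assumptions. */
        DEFINE_STATIC_KEY_FALSE(shared_processor);

        static inline bool is_shared_processor(void)
        {
                /* No lppaca dereference on this path any more. */
                return static_branch_unlikely(&shared_processor);
        }

        static void __init detect_shared_processor(void)
        {
                /* Called once, e.g. during pseries platform setup. */
                if (lppaca_shared_proc(get_lppaca()))
                        static_branch_enable(&shared_processor);
        }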
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Phil Auld <pauld@redhat.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20191213035036.6913-2-mpe@ellerman.id.au
    • powerpc/vcpu: Assume dedicated processors as non-preempt · 14c73bd3
      Srikar Dronamraju authored
      With commit 247f2f6f ("sched/core: Don't schedule threads on
      pre-empted vCPUs"), the scheduler avoids scheduling tasks on preempted
      vCPUs at wakeup. This leads to a wrong choice of CPU, which in turn
      leads to larger wakeup latencies and, eventually, to performance
      regressions in latency-sensitive benchmarks like soltp, schbench etc.
      
      On powerpc, vcpu_is_preempted() only looks at yield_count. If the
      yield_count is odd, the vCPU is assumed to be preempted. However,
      yield_count is incremented whenever the LPAR enters CEDE state (idle),
      so any CPU that has entered CEDE state is assumed to be preempted.

      Even if the vCPU of a dedicated LPAR is preempted/donated, it should
      have right of first use, since the LPAR is supposed to own the vCPU.
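      A hedged sketch of the resulting check, reusing the shared_processor
      static key from the companion patch; the exact guard and placement are
      assumptions:

        static inline bool vcpu_is_preempted(int cpu)
        {
                /* Dedicated processors are never reported as preempted. */
                if (!is_shared_processor())
                        return false;

                /* Shared processors: an odd yield_count means the vCPU is out. */
                return !!(be32_to_cpu(lppaca_of(cpu).yield_count) & 1);
        }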
      
      On a Power9 System with 32 cores:
        # lscpu
        Architecture:        ppc64le
        Byte Order:          Little Endian
        CPU(s):              128
        On-line CPU(s) list: 0-127
        Thread(s) per core:  8
        Core(s) per socket:  1
        Socket(s):           16
        NUMA node(s):        2
        Model:               2.2 (pvr 004e 0202)
        Model name:          POWER9 (architected), altivec supported
        Hypervisor vendor:   pHyp
        Virtualization type: para
        L1d cache:           32K
        L1i cache:           32K
        L2 cache:            512K
        L3 cache:            10240K
        NUMA node0 CPU(s):   0-63
        NUMA node1 CPU(s):   64-127
      
        # perf stat -a -r 5 ./schbench
        v5.4                               v5.4 + patch
        Latency percentiles (usec)         Latency percentiles (usec)
              50.0000th: 45                      50.0th: 45
              75.0000th: 62                      75.0th: 63
              90.0000th: 71                      90.0th: 74
              95.0000th: 77                      95.0th: 78
              *99.0000th: 91                     *99.0th: 82
              99.5000th: 707                     99.5th: 83
              99.9000th: 6920                    99.9th: 86
              min=0, max=10048                   min=0, max=96
        Latency percentiles (usec)         Latency percentiles (usec)
              50.0000th: 45                      50.0th: 46
              75.0000th: 61                      75.0th: 64
              90.0000th: 72                      90.0th: 75
              95.0000th: 79                      95.0th: 79
              *99.0000th: 691                    *99.0th: 83
              99.5000th: 3972                    99.5th: 85
              99.9000th: 8368                    99.9th: 91
              min=0, max=16606                   min=0, max=117
        Latency percentiles (usec)         Latency percentiles (usec)
              50.0000th: 45                      50.0th: 46
              75.0000th: 61                      75.0th: 64
              90.0000th: 71                      90.0th: 75
              95.0000th: 77                      95.0th: 79
              *99.0000th: 106                    *99.0th: 83
              99.5000th: 2364                    99.5th: 84
              99.9000th: 7480                    99.9th: 90
              min=0, max=10001                   min=0, max=95
        Latency percentiles (usec)         Latency percentiles (usec)
              50.0000th: 45                      50.0th: 47
              75.0000th: 62                      75.0th: 65
              90.0000th: 72                      90.0th: 75
              95.0000th: 78                      95.0th: 79
              *99.0000th: 93                     *99.0th: 84
              99.5000th: 108                     99.5th: 85
              99.9000th: 6792                    99.9th: 90
              min=0, max=17681                   min=0, max=117
        Latency percentiles (usec)         Latency percentiles (usec)
              50.0000th: 46                      50.0th: 45
              75.0000th: 62                      75.0th: 64
              90.0000th: 73                      90.0th: 75
              95.0000th: 79                      95.0th: 79
              *99.0000th: 113                    *99.0th: 82
              99.5000th: 2724                    99.5th: 83
              99.9000th: 6184                    99.9th: 93
              min=0, max=9887                    min=0, max=111
      
         Performance counter stats for 'system wide' (5 runs):
      
        context-switches    43,373  ( +-  0.40% )   44,597 ( +-  0.55% )
        cpu-migrations       1,211  ( +-  5.04% )      220 ( +-  6.23% )
        page-faults         15,983  ( +-  5.21% )   15,360 ( +-  3.38% )
      
      Waiman Long suggested using static_keys.
      
      Fixes: 247f2f6f ("sched/core: Don't schedule threads on pre-empted vCPUs")
      Cc: stable@vger.kernel.org # v4.18+
      Reported-by: Parth Shah <parth@linux.ibm.com>
      Reported-by: Ihor Pasichnyk <Ihor.Pasichnyk@ibm.com>
      Tested-by: Juri Lelli <juri.lelli@redhat.com>
      Acked-by: Waiman Long <longman@redhat.com>
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Acked-by: Phil Auld <pauld@redhat.com>
      Reviewed-by: Vaidyanathan Srinivasan <svaidy@linux.ibm.com>
      Tested-by: Parth Shah <parth@linux.ibm.com>
      [mpe: Move the key and setting of the key to pseries/setup.c]
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20191213035036.6913-1-mpe@ellerman.id.au
  5. 05 Dec 2019, 3 commits
    • arch: sembuf.h: make uapi asm/sembuf.h self-contained · 0fb9dc28
      Masahiro Yamada authored
      Userspace cannot compile <asm/sembuf.h> due to some missing type
      definitions.  For example, building it for x86 fails as follows:
      
          CC      usr/include/asm/sembuf.h.s
        In file included from <command-line>:32:0:
        usr/include/asm/sembuf.h:17:20: error: field `sem_perm' has incomplete type
          struct ipc64_perm sem_perm; /* permissions .. see ipc.h */
                            ^~~~~~~~
        usr/include/asm/sembuf.h:24:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t sem_otime; /* last semop time */
          ^~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:25:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused1;
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:26:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t sem_ctime; /* last change time */
          ^~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:27:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused2;
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:29:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t sem_nsems; /* no. of semaphores in array */
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:30:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused3;
          ^~~~~~~~~~~~~~~~
        usr/include/asm/sembuf.h:31:2: error: unknown type name `__kernel_ulong_t'
          __kernel_ulong_t __unused4;
          ^~~~~~~~~~~~~~~~
      
      It is just a matter of a missing include directive.
      
      Include <asm/ipcbuf.h> to make it self-contained, and add it to
      the compile-test coverage.
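      With the include in place, a userspace translation unit like the one
      below should build standalone against the exported headers (an
      illustrative check, not part of the patch):

        /* Illustrative only: verifies that <asm/sembuf.h> is self-contained. */
        #include <asm/sembuf.h>

        int main(void)
        {
                struct semid64_ds ds;
                return sizeof(ds) ? 0 : 1;
        }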
      
      Link: http://lkml.kernel.org/r/20191030063855.9989-3-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arch: msgbuf.h: make uapi asm/msgbuf.h self-contained · 9ef0e004
      Masahiro Yamada authored
      Userspace cannot compile <asm/msgbuf.h> due to some missing type
      definitions.  For example, building it for x86 fails as follows:
      
          CC      usr/include/asm/msgbuf.h.s
        In file included from usr/include/asm/msgbuf.h:6:0,
                         from <command-line>:32:
        usr/include/asm-generic/msgbuf.h:25:20: error: field `msg_perm' has incomplete type
          struct ipc64_perm msg_perm;
                            ^~~~~~~~
        usr/include/asm-generic/msgbuf.h:27:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t msg_stime; /* last msgsnd time */
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:28:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t msg_rtime; /* last msgrcv time */
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:29:2: error: unknown type name `__kernel_time_t'
          __kernel_time_t msg_ctime; /* last change time */
          ^~~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:41:2: error: unknown type name `__kernel_pid_t'
          __kernel_pid_t msg_lspid; /* pid of last msgsnd */
          ^~~~~~~~~~~~~~
        usr/include/asm-generic/msgbuf.h:42:2: error: unknown type name `__kernel_pid_t'
          __kernel_pid_t msg_lrpid; /* last receive pid */
          ^~~~~~~~~~~~~~
      
      It is just a matter of a missing include directive.
      
      Include <asm/ipcbuf.h> to make it self-contained, and add it to
      the compile-test coverage.
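      The same kind of standalone check applies here (illustrative only, not
      part of the patch):

        /* Illustrative only: verifies that <asm/msgbuf.h> is self-contained. */
        #include <asm/msgbuf.h>

        int main(void)
        {
                struct msqid64_ds ds;
                return sizeof(ds.msg_perm) ? 0 : 1;
        }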
      
      Link: http://lkml.kernel.org/r/20191030063855.9989-2-yamada.masahiro@socionext.com
      Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc/archrandom: fix arch_get_random_seed_int() · b6afd123
      Ard Biesheuvel authored
      Commit 01c9348c
      
        powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*
      
      updated arch_get_random_[int|long]() to be NOPs, and moved the hardware
      RNG backing to arch_get_random_seed_[int|long]() instead. However, it
      failed to take into account that arch_get_random_int() was implemented
      in terms of arch_get_random_long(), and so we ended up with a version
      of the former that is essentially a NOP as well.
      
      Fix this by calling arch_get_random_seed_long() from
      arch_get_random_seed_int() instead.
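      The resulting helper looks roughly like the sketch below; the exact
      return type and error handling in the tree may differ:

        static inline int arch_get_random_seed_int(unsigned int *v)
        {
                unsigned long val;
                int rc;

                /* Delegate to the seed variant, which is backed by the HW RNG. */
                rc = arch_get_random_seed_long(&val);
                if (rc)
                        *v = val;

                return rc;
        }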
      
      Fixes: 01c9348c ("powerpc: Use hardware RNG for arch_get_random_seed_* not arch_get_random_*")
      Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20191204115015.18015-1-ardb@kernel.org
  6. 04 Dec 2019, 1 commit
  7. 02 Dec 2019, 1 commit
  8. 28 Nov 2019, 5 commits
    • KVM: PPC: Book3S HV: Support reset of secure guest · 22945688
      Bharata B Rao authored
      Add support for resetting a secure guest via a new ioctl, KVM_PPC_SVM_OFF.
      This ioctl will be issued by QEMU during reset and includes the
      following steps:
      
      - Release all device pages of the secure guest.
      - Ask UV to terminate the guest via UV_SVM_TERMINATE ucall
      - Unpin the VPA pages so that they can be migrated back to secure
        side when guest becomes secure again. This is required because
        pinned pages can't be migrated.
      - Reinit the partition scoped page tables
      
      After these steps, the guest is ready to issue the UV_ESM call once
      again to switch to secure mode.
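      For illustration, the userspace side reduces to a single vm-level ioctl;
      the snippet below is a hypothetical sketch, not QEMU's actual reset path:

        /* Hypothetical caller-side sketch. */
        #include <sys/ioctl.h>
        #include <linux/kvm.h>

        static int reset_secure_guest(int kvm_vm_fd)
        {
                /* Tears down SVM state so the guest can issue UV_ESM again. */
                return ioctl(kvm_vm_fd, KVM_PPC_SVM_OFF, 0);
        }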
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      	[Implementation of uv_svm_terminate() and its call from
      	guest shutdown path]
      Signed-off-by: Ram Pai <linuxram@us.ibm.com>
      	[Unpinning of VPA pages]
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Handle memory plug/unplug to secure VM · c3262257
      Bharata B Rao authored
      Register the new memslot with UV during plug and unregister
      the memslot during unplug. In addition, release all the
      device pages during unplug.
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Radix changes for secure guest · 008e359c
      Bharata B Rao authored
      - After the guest becomes secure, when we handle a page fault for a page
        belonging to the SVM in the HV, send that page to the UV via UV_PAGE_IN.
      - Whenever a page is unmapped on the HV side, inform the UV via
        UV_PAGE_INVAL (a sketch of this hook follows below).
      - Ensure that the routines that walk the secondary page tables of the
        guest don't do so for a secure VM. For a secure guest, the active
        secondary page tables are in secure memory, and the secondary page
        tables in the HV are freed when the guest becomes secure.
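      A hedged sketch of the unmap-side hook named in the second point above;
      the guard helper and call site are assumptions for illustration:

        /*
         * Sketch: when the HV unmaps a guest page, notify the Ultravisor.
         * kvmppc_is_guest_secure() is a hypothetical guard helper.
         */
        static void notify_uv_on_unmap(struct kvm *kvm, unsigned long gpa)
        {
                if (kvmppc_is_guest_secure(kvm))
                        uv_page_inval(kvm->arch.lpid, gpa, PAGE_SHIFT);
        }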
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Shared pages support for secure guests · 60f0a643
      Bharata B Rao authored
      A secure guest will share some of its pages with the hypervisor (e.g.
      virtio bounce buffers). Support sharing of pages between the hypervisor
      and the ultravisor.

      A shared page is reachable via both the HV and UV side page tables. Once
      a secure page is converted to a shared page, the device page that
      represents the secure page is unmapped from the HV side page tables.
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
    • KVM: PPC: Book3S HV: Support for running secure guests · ca9f4942
      Bharata B Rao authored
      A pseries guest can be run as secure guest on Ultravisor-enabled
      POWER platforms. On such platforms, this driver will be used to manage
      the movement of guest pages between the normal memory managed by
      hypervisor (HV) and secure memory managed by Ultravisor (UV).
      
      HV is informed about the guest's transition to secure mode via hcalls:
      
      H_SVM_INIT_START: Initiate securing a VM
      H_SVM_INIT_DONE: Conclude securing a VM
      
      As part of H_SVM_INIT_START, register all existing memslots with
      the UV. H_SVM_INIT_DONE call by UV informs HV that transition of
      the guest to secure mode is complete.
      
      These two states (transition to secure mode STARTED and transition
      to secure mode COMPLETED) are recorded in kvm->arch.secure_guest.
      Setting these states will cause the assembly code that enters the
      guest to call the UV_RETURN ucall instead of trying to enter the
      guest directly.
      
      Migration of pages between the normal and secure memory of a secure
      guest is implemented in the H_SVM_PAGE_IN and H_SVM_PAGE_OUT hcalls.
      
      H_SVM_PAGE_IN: Move the content of a normal page to secure page
      H_SVM_PAGE_OUT: Move the content of a secure page to normal page
      
      Private ZONE_DEVICE memory equal to the amount of secure memory
      available in the platform for running secure guests is created.
      Whenever a page belonging to the guest becomes secure, a page from
      this private device memory is used to represent and track that secure
      page on the HV side. The movement of pages between normal and secure
      memory is done via migrate_vma_pages() using UV_PAGE_IN and
      UV_PAGE_OUT ucalls.
      
      In order to prevent the device private pages (that correspond to pages
      of the secure guest) from participating in KSM merging, H_SVM_PAGE_IN
      calls ksm_madvise() under the read version of mmap_sem. However,
      ksm_madvise() needs to be called under the write lock. Hence we call
      kvmppc_svm_page_in() with mmap_sem held for writing, and it then
      downgrades to a read lock after calling ksm_madvise(), as sketched below.
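      A hedged sketch of that locking pattern; function and variable names are
      simplified and are not the exact upstream code:

        static int svm_page_in_locked(struct kvm *kvm, struct vm_area_struct *vma)
        {
                int ret;

                down_write(&kvm->mm->mmap_sem);
                ret = ksm_madvise(vma, vma->vm_start, vma->vm_end,
                                  MADV_UNMERGEABLE, &vma->vm_flags);
                downgrade_write(&kvm->mm->mmap_sem);    /* write -> read */
                if (!ret) {
                        /*
                         * The actual page-in (migrate_vma_pages() plus the
                         * UV_PAGE_IN ucall) runs here, under the read lock.
                         */
                }
                up_read(&kvm->mm->mmap_sem);
                return ret;
        }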
      
      [paulus@ozlabs.org - roll in patch "KVM: PPC: Book3S HV: Take write
       mmap_sem when calling ksm_madvise"]
      Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
  9. 27 Nov 2019, 2 commits
  10. 25 Nov 2019, 1 commit
  11. 21 Nov 2019, 2 commits
  12. 19 Nov 2019, 3 commits
  13. 18 Nov 2019, 5 commits
  14. 15 Nov 2019, 4 commits
    • y2038: syscalls: change remaining timeval to __kernel_old_timeval · 75d319c0
      Arnd Bergmann authored
      All of the remaining syscalls that pass a timeval (gettimeofday, utime,
      futimesat) can trivially be changed to pass a __kernel_old_timeval
      instead, which has a compatible layout, but avoids ambiguity with
      the timeval type in user space.
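      For reference, __kernel_old_timeval keeps the old timeval layout under an
      unambiguous, kernel-defined name:

        struct __kernel_old_timeval {
                __kernel_long_t tv_sec;         /* seconds */
                __kernel_long_t tv_usec;        /* microseconds */
        };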
      Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • y2038: stat: avoid 'time_t' in 'struct stat' · 1bf883c1
      Arnd Bergmann authored
      The time_t definition may differ between user space and kernel space,
      so replace time_t with an unambiguous 'long' on mips and sparc.
      
      The same structures also contain 'off_t', which has the same problem,
      so replace that as well on those two architectures and powerpc.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • y2038: ipc: remove __kernel_time_t reference from headers · caf5e32d
      Arnd Bergmann authored
      There are two structures based on time_t that conflict between libc and
      kernel: timeval and timespec. Both are now renamed to __kernel_old_timeval
      and __kernel_old_timespec.
      
      For time_t, the old typedef is still __kernel_time_t. There is nothing
      wrong with that name, but it would be nice to not use that going forward
      as this type is used almost only in deprecated interfaces because of
      the y2038 overflow.
      
      In the IPC headers (msgbuf.h, sembuf.h, shmbuf.h), __kernel_time_t is only
      used for the 64-bit variants, which are not deprecated.
      
      Change these to a plain 'long', which is the same type as __kernel_time_t
      on all 64-bit architectures anyway, to reduce the number of users of the
      old type.
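      A hedged sketch of the resulting 64-bit msgbuf layout (simplified; the
      real header keeps the split 32-bit time fields under an #if):

        struct msqid64_ds {
                struct ipc64_perm msg_perm;
                long              msg_stime;    /* last msgsnd time */
                long              msg_rtime;    /* last msgrcv time */
                long              msg_ctime;    /* last change time */
                unsigned long     msg_cbytes;   /* current number of bytes on queue */
                unsigned long     msg_qnum;     /* number of messages in queue */
                unsigned long     msg_qbytes;   /* max number of bytes on queue */
                __kernel_pid_t    msg_lspid;    /* pid of last msgsnd */
                __kernel_pid_t    msg_lrpid;    /* last receive pid */
                unsigned long     __unused4;
                unsigned long     __unused5;
        };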
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • y2038: vdso: powerpc: avoid timespec references · 176ed98c
      Arnd Bergmann authored
      As a preparation to stop using 'struct timespec' in the kernel,
      change the powerpc vdso implementation:
      
      - split up the vdso data definition to have equivalent members for
        seconds and nanoseconds instead of an xtime structure (sketched below)

      - use timespec64 as an intermediate for the xtime update

      - change the asm-offsets definition to be based on the appropriate
        fixed-length types
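      A hedged, simplified sketch of the split; the field names are assumptions
      based on the description above, and the real vdso data page carries many
      more fields:

        struct vdso_data_sketch {
                unsigned long long stamp_xtime_sec;     /* was: struct timespec stamp_xtime */
                unsigned long long stamp_xtime_nsec;
        };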
      
      This is only a temporary fix for changing the types; in order to
      actually support a 64-bit safe vdso32 version of clock_gettime(), the
      entire powerpc vdso should be replaced with the generic lib/vdso/
      implementation. If that happens first, this patch becomes obsolete.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  15. 14 Nov 2019, 2 commits
    • KVM: PPC: Book3S HV: Flush link stack on guest exit to host kernel · af2e8c68
      Michael Ellerman authored
      On some systems that are vulnerable to Spectre v2, it is up to
      software to flush the link stack (return address stack), in order to
      protect against Spectre-RSB.
      
      When exiting from a guest we do some housekeeping and then potentially
      exit to C code which is several stack frames deep in the host kernel.
      We will then execute a series of returns without preceding calls,
      opening up the possibility that the guest could have poisoned the link
      stack, directing speculative execution of the host to a gadget of some
      sort.
      
      To prevent this we add a flush of the link stack on exit from a guest.
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • powerpc/book3s64: Fix link stack flush on context switch · 39e72bf9
      Michael Ellerman authored
      In commit ee13cb24 ("powerpc/64s: Add support for software count
      cache flush"), I added support for software to flush the count
      cache (indirect branch cache) on context switch if firmware told us
      that was the required mitigation for Spectre v2.
      
      As part of that code we also added a software flush of the link
      stack (return address stack), which protects against Spectre-RSB
      between user processes.
      
      That is all correct for CPUs that activate that mitigation, which is
      currently Power9 Nimbus DD2.3.
      
      What I got wrong is that on older CPUs, where firmware has disabled
      the count cache, we also need to flush the link stack on context
      switch.
      
      To fix it we create a new feature bit which is not set by firmware,
      which tells us we need to flush the link stack. We set that when
      firmware tells us that either of the existing Spectre v2 mitigations
      are enabled.
      
      Then we adjust the patching code so that if we see that feature bit we
      enable the link stack flush. If we're also told to flush the count
      cache in software then we fall through and do that also.
      
      On the older CPUs we don't need to do the software count cache flush;
      firmware has disabled it, so in that case we patch in an early return
      after the link stack flush.
      
      The naming of some of the functions is awkward after this patch,
      because they're called "count cache" but they also do link stack. But
      we'll fix that up in a later commit to ease backporting.
      
      This is the fix for CVE-2019-18660.
      Reported-by: Anthony Steinhauser <asteinhauser@google.com>
      Fixes: ee13cb24 ("powerpc/64s: Add support for software count cache flush")
      Cc: stable@vger.kernel.org # v4.4+
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  16. 13 Nov 2019, 5 commits