1. 07 Aug, 2018: 18 commits
  2. 31 Jul, 2018: 3 commits
  3. 30 Jul, 2018: 15 commits
  4. 24 Jul, 2018: 4 commits
    •
      powerpc/powernv: implement opal_put_chars_atomic · 17cc1dd4
      Committed by Nicholas Piggin
      The RAW console does not need writes to be atomic, so relax
      opal_put_chars to be able to do partial writes, and implement an
      _atomic variant which does not take a spinlock. This API is used
      in xmon, so the less locking that is used, the better chance there
      is that a crash can be debugged.
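
      A minimal user-space sketch of the locking split described above. The fixed-size
      buffer, the names, and the pthread mutex are all illustrative assumptions; the
      real opal_put_chars writes to the OPAL console under a kernel spinlock.

```c
#include <pthread.h>
#include <string.h>
#include <sys/types.h>

/* Simplified model of the API split; buffer size and names are
 * illustrative, not the real OPAL code. */
#define CONBUF_SIZE 16

static char conbuf[CONBUF_SIZE];
static size_t conlen;
static pthread_mutex_t conlock = PTHREAD_MUTEX_INITIALIZER;

/* Core write: may consume fewer bytes than requested (a partial
 * write), so callers are expected to check the return value. */
static ssize_t __put_chars(const char *data, size_t len)
{
	size_t space = CONBUF_SIZE - conlen;
	size_t n = len < space ? len : space;

	memcpy(conbuf + conlen, data, n);
	conlen += n;
	return (ssize_t)n;
}

/* Locked variant: safe for concurrent callers. */
ssize_t put_chars(const char *data, size_t len)
{
	ssize_t written;

	pthread_mutex_lock(&conlock);
	written = __put_chars(data, len);
	pthread_mutex_unlock(&conlock);
	return written;
}

/* _atomic variant: takes no lock, so a debugger such as xmon can
 * still produce output even if another CPU crashed while holding
 * conlock; the cost is that concurrent output may interleave. */
ssize_t put_chars_atomic(const char *data, size_t len)
{
	return __put_chars(data, len);
}
```

      The _atomic variant trades ordering for availability: output from concurrent
      callers may interleave, but a CPU that crashed while holding the lock can no
      longer deadlock the debugger's output path.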
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    •
      powerpc/powernv: Implement and use opal_flush_console · d2a2262e
      Committed by Nicholas Piggin
      A new console flushing firmware API was introduced to replace event
      polling loops, and implemented in opal-kmsg with affddff6
      ("powerpc/powernv: Add a kmsg_dumper that flushes console output on
      panic"), to flush the console in the panic path.
      
      The OPAL console driver has other situations where interrupts are off
      and it needs to flush the console synchronously. These still use a
      polling loop.
      
      So move the opal-kmsg flush code to opal_flush_console, and use the
      new function in opal-kmsg and opal_put_chars.
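
      The flush pattern described above can be sketched as a small model. The
      function names and the exact return-code values here are assumptions standing
      in for the real opal_console_flush firmware call; OPAL's convention of 0 for
      success and negative codes for busy/partial progress is what the loop relies on.

```c
/* Illustrative return codes modeled on OPAL's convention; the exact
 * values are assumptions for this sketch. */
enum {
	OPAL_SUCCESS = 0,
	OPAL_BUSY    = -2,	/* no progress made, try again */
	OPAL_PARTIAL = -3,	/* some progress made, keep flushing */
};

static int pending = 3;	/* fake amount of buffered console data */

/* Stand-in for the firmware flush call: drains one unit per call. */
static int fake_console_flush(void)
{
	if (pending == 0)
		return OPAL_SUCCESS;
	pending--;
	return OPAL_PARTIAL;
}

/* Synchronous flush: poll until the firmware reports the console is
 * fully drained. This shape works with interrupts off because it
 * never sleeps, only retries. */
int flush_console(void)
{
	int rc;

	do {
		rc = fake_console_flush();
	} while (rc == OPAL_BUSY || rc == OPAL_PARTIAL);

	return rc;
}
```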
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reviewed-by: Russell Currey <ruscur@russell.cc>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    •
      powerpc/64: enhance memcmp() with VMX instruction for long bytes comparison · d58badfb
      Committed by Simon Guo
      This patch adds VMX primitives to do memcmp() when the compare size
      is equal to or greater than 4K bytes. The KSM feature can benefit from this.
      
      Test result with the following test program:
      ------
      # cat tools/testing/selftests/powerpc/stringloops/memcmp.c
      #include <malloc.h>
      #include <stdlib.h>
      #include <string.h>
      #include <time.h>
      #include "utils.h"
      #define SIZE (1024 * 1024 * 900)
      #define ITERATIONS 40
      
      int test_memcmp(const void *s1, const void *s2, size_t n);
      
      static int testcase(void)
      {
              char *s1;
              char *s2;
              unsigned long i;
      
              s1 = memalign(128, SIZE);
              if (!s1) {
                      perror("memalign");
                      exit(1);
              }
      
              s2 = memalign(128, SIZE);
              if (!s2) {
                      perror("memalign");
                      exit(1);
              }
      
              for (i = 0; i < SIZE; i++)  {
                      s1[i] = i & 0xff;
                      s2[i] = i & 0xff;
              }
              for (i = 0; i < ITERATIONS; i++) {
                      int ret = test_memcmp(s1, s2, SIZE);

                      if (ret) {
                              printf("return %d at[%ld]! should have returned zero\n", ret, i);
                              abort();
                      }
              }
      
              return 0;
      }
      
      int main(void)
      {
              return test_harness(testcase, "memcmp");
      }
      ------
      Without this patch (but with the first patch "powerpc/64: Align bytes
      before fall back to .Lshort in powerpc64 memcmp()." in the series):
      	4.726728762 seconds time elapsed                                          ( +-  3.54%)
      With VMX patch:
      	4.234335473 seconds time elapsed                                          ( +-  2.63%)
      	There is a ~10% improvement.
      
      Testing with an unaligned and different-offset version (making s1 and
      s2 shift by a random offset within 16 bytes) can achieve an improvement
      higher than 10%.

      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    •
      powerpc: add vcmpequd/vcmpequb ppc instruction macro · f1ecbaf4
      Committed by Simon Guo
      Some old toolchains don't know about instructions like vcmpequd.
      
      This patch adds .long macros for vcmpequd and vcmpequb, in
      preparation for optimizing ppc64 memcmp() with VMX instructions.
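
      A C sketch of how such a .long macro can hand-encode a VMX compare for
      assemblers that don't know the mnemonic. The field layout (primary opcode 4,
      VRT/VRA/VRB register fields, extended opcode in the low bits) and the
      extended-opcode values below are my reading of the Power ISA, not a copy of
      the kernel's ppc-opcode.h — treat them as assumptions.

```c
#include <stdint.h>

/* VX-form field placement; in assembly these would expand under a
 * .long directive instead of producing a C integer constant. */
#define PPC_OP(op)	((uint32_t)(op) << 26)
#define VRT(r)		((uint32_t)(r) << 21)
#define VRA(r)		((uint32_t)(r) << 16)
#define VRB(r)		((uint32_t)(r) << 11)

#define XO_VCMPEQUB	6u	/* assumed extended opcode */
#define XO_VCMPEQUD	199u	/* assumed extended opcode */

#define VCMPEQUB(vrt, vra, vrb) \
	(PPC_OP(4) | VRT(vrt) | VRA(vra) | VRB(vrb) | XO_VCMPEQUB)
#define VCMPEQUD(vrt, vra, vrb) \
	(PPC_OP(4) | VRT(vrt) | VRA(vra) | VRB(vrb) | XO_VCMPEQUD)
```

      Under these assumptions, VCMPEQUB(0, 0, 0) evaluates to 0x10000006 and
      VCMPEQUD(0, 0, 0) to 0x100000c7. The record-form (Rc=1) variants that a
      memcmp loop would actually branch on set an additional bit, which this
      sketch omits.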
      Signed-off-by: Simon Guo <wei.guo.simon@gmail.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>