1. 02 Sep 2020 (1 commit)
  2. 27 Aug 2020 (2 commits)
  3. 24 Aug 2020 (1 commit)
  4. 21 Aug 2020 (1 commit)
  5. 20 Aug 2020 (1 commit)
  6. 17 Aug 2020 (2 commits)
  7. 30 Jul 2020 (1 commit)
  8. 27 Jul 2020 (1 commit)
    • powerpc/64s/hash: Fix hash_preload running with interrupts enabled · 909adfc6
      Authored by Nicholas Piggin
      Commit 2f92447f ("powerpc/book3s64/hash: Use the pte_t address from the
      caller") removed the local_irq_disable from hash_preload, but it was
      required for more than just the page table walk: the hash pte busy bit is
      effectively a lock which may be taken in interrupt context, and the local
      update flag test must not be preempted before it's used.
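      
      The shape of the fix, roughly (a sketch only, not the exact diff; the
      argument list is assumed from the surrounding code):
      
      static void hash_preload(struct mm_struct *mm, pte_t *ptep, unsigned long ea,
                               bool is_exec, unsigned long trap)
      {
              unsigned long flags;
      
              /* Disable interrupts across the whole preload so a perf interrupt
               * cannot take a hash fault while the hash PTE busy bit is held or
               * while the local-update flag is being tested. */
              local_irq_save(flags);
              /* ... page table walk, local-update test, __hash_page_xx() ... */
              local_irq_restore(flags);
      }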
      
      This solves apparent lockups with perf interrupting __hash_page_64K. If
      get_perf_callchain then also takes a hash fault on the same page while it
      is already locked, it will loop forever taking hash faults, which looks like
      this:
      
        cpu 0x49e: Vector: 100 (System Reset) at [c00000001a4f7d70]
            pc: c000000000072dc8: hash_page_mm+0x8/0x800
            lr: c00000000000c5a4: do_hash_page+0x24/0x38
            sp: c0002ac1cc69ac70
           msr: 8000000000081033
          current = 0xc0002ac1cc602e00
          paca    = 0xc00000001de1f280   irqmask: 0x03   irq_happened: 0x01
            pid   = 20118, comm = pread2_processe
        Linux version 5.8.0-rc6-00345-g1fad14f18bc6
        49e:mon> t
        [c0002ac1cc69ac70] c00000000000c5a4 do_hash_page+0x24/0x38 (unreliable)
        --- Exception: 300 (Data Access) at c00000000008fa60 __copy_tofrom_user_power7+0x20c/0x7ac
        [link register   ] c000000000335d10 copy_from_user_nofault+0xf0/0x150
        [c0002ac1cc69af70] c00032bf9fa3c880 (unreliable)
        [c0002ac1cc69afa0] c000000000109df0 read_user_stack_64+0x70/0xf0
        [c0002ac1cc69afd0] c000000000109fcc perf_callchain_user_64+0x15c/0x410
        [c0002ac1cc69b060] c000000000109c00 perf_callchain_user+0x20/0x40
        [c0002ac1cc69b080] c00000000031c6cc get_perf_callchain+0x25c/0x360
        [c0002ac1cc69b120] c000000000316b50 perf_callchain+0x70/0xa0
        [c0002ac1cc69b140] c000000000316ddc perf_prepare_sample+0x25c/0x790
        [c0002ac1cc69b1a0] c000000000317350 perf_event_output_forward+0x40/0xb0
        [c0002ac1cc69b220] c000000000306138 __perf_event_overflow+0x88/0x1a0
        [c0002ac1cc69b270] c00000000010cf70 record_and_restart+0x230/0x750
        [c0002ac1cc69b620] c00000000010d69c perf_event_interrupt+0x20c/0x510
        [c0002ac1cc69b730] c000000000027d9c performance_monitor_exception+0x4c/0x60
        [c0002ac1cc69b750] c00000000000b2f8 performance_monitor_common_virt+0x1b8/0x1c0
        --- Exception: f00 (Performance Monitor) at c0000000000cb5b0 pSeries_lpar_hpte_insert+0x0/0x160
        [link register   ] c0000000000846f0 __hash_page_64K+0x210/0x540
        [c0002ac1cc69ba50] 0000000000000000 (unreliable)
        [c0002ac1cc69bb00] c000000000073ae0 update_mmu_cache+0x390/0x3a0
        [c0002ac1cc69bb70] c00000000037f024 wp_page_copy+0x364/0xce0
        [c0002ac1cc69bc20] c00000000038272c do_wp_page+0xdc/0xa60
        [c0002ac1cc69bc70] c0000000003857bc handle_mm_fault+0xb9c/0x1b60
        [c0002ac1cc69bd50] c00000000006c434 __do_page_fault+0x314/0xc90
        [c0002ac1cc69be20] c00000000000c5c8 handle_page_fault+0x10/0x2c
        --- Exception: 300 (Data Access) at 00007fff8c861fe8
        SP (7ffff6b19660) is in userspace
      
      Fixes: 2f92447f ("powerpc/book3s64/hash: Use the pte_t address from the caller")
      Reported-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
      Reported-by: Anton Blanchard <anton@ozlabs.org>
      Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
      Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200727060947.10060-1-npiggin@gmail.com
  9. 22 Jul 2020 (7 commits)
  10. 21 Jul 2020 (1 commit)
  11. 16 Jul 2020 (3 commits)
  12. 18 Jun 2020 (2 commits)
  13. 10 Jun 2020 (1 commit)
    • mm: don't include asm/pgtable.h if linux/mm.h is already included · e31cf2f4
      Authored by Mike Rapoport
      Patch series "mm: consolidate definitions of page table accessors", v2.
      
      The low level page table accessors (pXY_index(), pXY_offset()) are
      duplicated across all architectures and sometimes more than once.  For
      instance, we have 31 definitions of pgd_offset() for 25 supported
      architectures.
      
      Most of these definitions are actually identical and typically it boils
      down to, e.g.
      
      static inline unsigned long pmd_index(unsigned long address)
      {
              return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1);
      }
      
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      
      These definitions can be shared among 90% of the arches provided
      XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined.
      
      For architectures that really need a custom version there is always the
      possibility to override the generic version with the usual ifdef magic.
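      
      As a rough illustration (a sketch, with the guard convention assumed), the
      generic header defines an accessor only when the architecture has not
      already claimed it:
      
      #ifndef pmd_offset
      static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
      {
              return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address);
      }
      /* an arch that needs its own version defines pmd_offset first */
      #define pmd_offset pmd_offset
      #endif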
      
      These patches introduce include/linux/pgtable.h that replaces
      include/asm-generic/pgtable.h and add the definitions of the page table
      accessors to the new header.
      
      This patch (of 12):
      
      The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the
      functions involving page table manipulations, e.g.  pte_alloc() and
      pmd_alloc().  So, there is no point to explicitly include <asm/pgtable.h>
      in the files that include <linux/mm.h>.
      
      The include statements in such cases are removed with a simple loop:
      
      	for f in $(git grep -l "include <linux/mm.h>") ; do
      		sed -i -e '/include <asm\/pgtable.h>/ d' $f
      	done
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Cain <bcain@codeaurora.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ley Foon Tan <ley.foon.tan@intel.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Nick Hu <nickhu@andestech.com>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vincent Chen <deanbo422@gmail.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. 09 Jun 2020 (1 commit)
    • mm/gup.c: convert to use get_user_{page|pages}_fast_only() · dadbb612
      Authored by Souptick Joarder
      The API __get_user_pages_fast() is renamed to get_user_pages_fast_only() to
      align with pin_user_pages_fast_only().
      
      As part of this, the write parameter is dropped.  Instead, the caller
      passes FOLL_WRITE to get_user_pages_fast_only().  This does not change any
      existing functionality of the API.
      
      All the callers are changed to pass FOLL_WRITE.
      
      Also introduce get_user_page_fast_only(), and use it in a few places
      that hard-code nr_pages to 1.
      
      Updated the documentation of the API.
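      
      A before/after sketch of a typical caller conversion (illustrative only;
      addr, nr and page are placeholder variables):
      
      /* before: the write intent was a separate int parameter */
      nr = __get_user_pages_fast(addr, 1, 1, &page);
      
      /* after: the intent is carried in gup_flags */
      nr = get_user_pages_fast_only(addr, 1, FOLL_WRITE, &page);
      
      /* single-page callers can use the new helper, which returns bool */
      if (get_user_page_fast_only(addr, FOLL_WRITE, &page))
              put_page(page);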
      Signed-off-by: Souptick Joarder <jrdr.linux@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Paul Mackerras <paulus@ozlabs.org>		[arch/powerpc/kvm]
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Michal Suchanek <msuchanek@suse.de>
      Link: http://lkml.kernel.org/r/1590396812-31277-1-git-send-email-jrdr.linux@gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. 28 May 2020 (3 commits)
    • powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show processor details · 60beb65d
      Authored by Kajol Jain
      To expose system dependent parameters like the total number of sockets and
      the number of chips per socket, this patch adds two sysfs files: "sockets"
      and "chips" are added under /sys/devices/hv_24x7/interface/ of the
      "hv_24x7" pmu.
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200525104308.9814-4-kjain@linux.ibm.com
    • powerpc/hv-24x7: Add rtas call in hv-24x7 driver to get processor details · 8ba21426
      Authored by Kajol Jain
      For hv_24x7 socket/chip level events, the specific chip-id for which the
      data is requested has to be supplied as part of the pmu event.  But the
      number of chips and sockets in the system is not exposed anywhere.
      
      This patch implements read_24x7_sys_info() to get system parameter values
      like the number of sockets, cores per chip and chips per socket.  An
      rtas_call with the token "PROCESSOR_MODULE_INFO" is used to get these
      values; a rough sketch of the call follows below.
      
      A subsequent patch exports these values via sysfs.
      
      The patch also makes these parameters default to 1.
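      
      A heavily simplified sketch of that call (the buffer parsing is omitted,
      the variable names are assumptions, and PROCESSOR_MODULE_INFO is the
      system parameter token this patch introduces):
      
      static void read_24x7_sys_info(void)
      {
              int call_status;
      
              /* defaults, used when the RTAS call is unavailable or fails */
              phys_sockets = 1;
              phys_chipspersocket = 1;
              phys_coresperchip = 1;
      
              spin_lock(&rtas_data_buf_lock);
              call_status = rtas_call(rtas_token("ibm,get-system-parameter"), 3, 1,
                                      NULL, PROCESSOR_MODULE_INFO,
                                      __pa(rtas_data_buf), RTAS_DATA_BUF_SIZE);
              if (call_status == 0) {
                      /* parse sockets / chips per socket / cores per chip
                       * out of rtas_data_buf here */
              }
              spin_unlock(&rtas_data_buf_lock);
      }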
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200525104308.9814-3-kjain@linux.ibm.com
    • powerpc/perf/hv-24x7: Fix inconsistent output values incase multiple hv-24x7 events run · b4ac18ee
      Authored by Kajol Jain
      Commit 2b206ee6 ("powerpc/perf/hv-24x7: Display change in counter
      values") made the driver print the _change_ in the counter value rather
      than the raw value for 24x7 counters.  In case of transactions, the event
      count is set to 0 at the beginning of the transaction.  It also sets
      the event's prev_count to the raw value at the time of initialization.
      Because of setting the event count to 0, we see some weird behaviour
      whenever multiple 24x7 events are run at a time.
      
      For example:
      
      command#: ./perf stat -e "{hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/,
      			   hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/}"
      	  		   -C 0 -I 1000 sleep 100
      
           1.000121704                120 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
           1.000121704                  5 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
           2.000357733                  8 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
           2.000357733                 10 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
           3.000495215 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
           3.000495215 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
           4.000641884                 56 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
           4.000641884 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
           5.000791887 18,446,744,073,709,551,616 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
      
      These large values show up when the -I (interval) option is used.
      
      Because the event count is reset to 0, the overall event count no longer
      increases monotonically across intervals: a new delta can be smaller than
      the previous count.  When the intervals are printed, this yields a negative
      value, which shows up as the huge unsigned numbers above.
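      
      A tiny standalone illustration of the wraparound (hypothetical counts):
      
      #include <stdint.h>
      #include <stdio.h>
      
      int main(void)
      {
              uint64_t prev = 120;  /* cumulative count printed for interval 1 */
              uint64_t now  = 8;    /* next read, after the driver reset the count */
      
              /* the interval value is an unsigned difference, so a "shrinking"
               * count wraps around to an enormous number */
              printf("%llu\n", (unsigned long long)(now - prev));
              return 0;
      }
      
      This prints 18446744073709551504, i.e. a value of the same magnitude as the
      numbers in the output above.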
      
      This patch removes the part where the event count is set to 0 in
      h_24x7_event_read().  There is not much impact, since event->hw.prev_count
      is still set to the raw value at initialization time in order to print the
      change value.
      
      With this patch, on a power9 platform:
      
      command#: ./perf stat -e "{hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/,
      		           hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/}"
      			   -C 0 -I 1000 sleep 100
      
           1.000117685                 93 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
           1.000117685                  1 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
           2.000349331                 98 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
           2.000349331                  2 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
           3.000495900                131 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
           3.000495900                  4 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
           4.000645920                204 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
           4.000645920                 61 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=1/
           4.284169997                 22 hv_24x7/PM_MCS01_128B_RD_DISP_PORT01,chip=0/
      Suggested-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
      Tested-by: Madhavan Srinivasan <maddy@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200525104308.9814-2-kjain@linux.ibm.com
  16. 26 May 2020 (1 commit)
  17. 18 May 2020 (2 commits)
  18. 05 May 2020 (1 commit)
    • powerpc/perf/callchain: Use __get_user_pages_fast in read_user_stack_slow · 15759cb0
      Authored by Aneesh Kumar K.V
      read_user_stack_slow() is called with interrupts soft-disabled, and it
      copies the contents of the page that is found mapped at a specific
      address.  To convert the userspace address to a pfn, the kernel now uses a
      lockless page table walk.
      
      The kernel needs to make sure the pfn value read remains stable and is not
      released and reused for another process while the contents are read from
      the page.  This can only be achieved by holding a page reference.
      
      One of the first approaches I tried was to check the pte value after the kernel
      copies the contents from the page. But as shown below we can still get it wrong
      
      CPU0                           CPU1
      pte = READ_ONCE(*ptep);
                                     pte_clear(pte);
                                     put_page(page);
                                     page = alloc_page();
                                     memcpy(page_address(page), "secret password", nr);
      memcpy(buf, kaddr + offset, nb);
                                     put_page(page);
                                     handle_mm_fault()
                                     page = alloc_page();
                                     set_pte(pte, page);
      if (pte_val(pte) != pte_val(*ptep))
      
      Hence switch to __get_user_pages_fast.
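      
      A minimal sketch of the resulting shape (the exact signature and the
      page-crossing handling in the real arch/powerpc code differ):
      
      static int read_user_stack_slow(const void __user *ptr, void *buf, int nb)
      {
              unsigned long addr = (unsigned long)ptr;
              unsigned long offset = addr & (PAGE_SIZE - 1);
              struct page *page;
      
              /* lockless, IRQ-safe lookup that takes a page reference on success */
              if (__get_user_pages_fast(addr, 1, 0, &page) != 1)
                      return -EFAULT;
      
              memcpy(buf, page_address(page) + offset, nb);
              put_page(page);  /* the reference kept the pfn stable across the copy */
              return 0;
      }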
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Link: https://lore.kernel.org/r/20200505071729.54912-8-aneesh.kumar@linux.ibm.com
  19. 16 Apr 2020 (1 commit)
    • powerpc/perf: open access for CAP_PERFMON privileged process · ff467583
      Authored by Alexey Budankov
      Open access to monitoring for CAP_PERFMON privileged processes.  Providing
      the access under the CAP_PERFMON capability alone, without the rest of the
      CAP_SYS_ADMIN credentials, reduces the chances of the credentials being
      misused and makes the operation more secure.
      
      CAP_PERFMON implements the principle of least privilege for performance
      monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39
      principle of least privilege: A security design principle that states
      that a process or program be granted only those privileges (e.g.,
      capabilities) necessary to accomplish its legitimate function, and only
      for the time that such privileges are actually required)
      
      For backward compatibility reasons, access to monitoring remains open to
      CAP_SYS_ADMIN privileged processes, but using CAP_SYS_ADMIN for secure
      monitoring is discouraged in favour of the CAP_PERFMON capability.
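      
      The access check drivers move to looks roughly like this (a sketch around
      the perfmon_capable() helper introduced together with CAP_PERFMON):
      
      static inline bool perfmon_capable(void)
      {
              /* CAP_PERFMON is sufficient on its own; CAP_SYS_ADMIN is still
               * honoured for backward compatibility */
              return capable(CAP_PERFMON) || capable(CAP_SYS_ADMIN);
      }
      
      /* callers replace open-coded CAP_SYS_ADMIN checks with: */
      if (!perfmon_capable())
              return -EACCES;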
      Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com>
      Reviewed-by: James Morris <jamorris@linux.microsoft.com>
      Acked-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Igor Lubashev <ilubashe@akamai.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Serge Hallyn <serge@hallyn.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: intel-gfx@lists.freedesktop.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-man@vger.kernel.org
      Cc: linux-security-module@vger.kernel.org
      Cc: selinux@vger.kernel.org
      Link: http://lore.kernel.org/lkml/ac98cd9f-b59e-673c-c70d-180b3e7695d2@linux.intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  20. 02 Apr 2020 (5 commits)
  21. 11 Feb 2020 (1 commit)
    • perf/core: Add new branch sample type for HW index of raw branch records · bbfd5e4f
      Authored by Kan Liang
      The low level index is the index in the underlying hardware buffer of
      the most recently captured taken branch, which is always saved in
      branch_entries[0].  It is very useful for reconstructing the call stack.
      For example, in Intel LBR call stack mode, the depth of the reconstructed
      LBR call stack is limited to the number of LBR registers.  With the low
      level index information, the perf tool may stitch the stacks of two
      samples, so the reconstructed LBR call stack can break the HW limitation.
      
      Add a new branch sample type to retrieve the low level index of raw branch
      records.  The low level index is between -1 (unknown) and the max depth,
      which can be retrieved from /sys/devices/cpu/caps/branches.
      
      Only when the new branch sample type is set is the low level index
      information dumped into the PERF_SAMPLE_BRANCH_STACK output.
      The perf tool should check attr.branch_sample_type and apply the
      corresponding format for PERF_SAMPLE_BRANCH_STACK samples.
      Otherwise, some use cases may break.  For example, users may parse a
      perf.data file that includes the new branch sample type with an old version
      of the perf tool (without the check) and get incorrect information without
      any warning.
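      
      An illustrative user-side setup (PERF_SAMPLE_BRANCH_HW_INDEX is the bit this
      commit adds; the surrounding attr fields are only an example):
      
      struct perf_event_attr attr = {
              .type               = PERF_TYPE_HARDWARE,
              .config             = PERF_COUNT_HW_CPU_CYCLES,
              .sample_type        = PERF_SAMPLE_BRANCH_STACK,
              .branch_sample_type = PERF_SAMPLE_BRANCH_CALL_STACK |
                                    PERF_SAMPLE_BRANCH_USER |
                                    PERF_SAMPLE_BRANCH_HW_INDEX,
      };
      
      /* with the bit set, each BRANCH_STACK sample carries an extra hardware
       * index value (-1 if unknown) that parsers must account for */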
      Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      Link: https://lkml.kernel.org/r/20200127165355.27495-2-kan.liang@linux.intel.com
  22. 27 Jan 2020 (1 commit)