1. 05 Jul, 2016 5 commits
  2. 14 Jun, 2016 1 commit
  3. 17 May, 2016 3 commits
    • A
      perf core: Add perf_callchain_store_context() helper · 3e4de4ec
      Arnaldo Carvalho de Melo committed
      We need different helpers to account for how many contexts we have in
      the sample and for real addresses, so add this one now as a prep patch to
      ease review.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-q964tnyuqrxw5gld18vizs3c@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • A
      perf core: Add a 'nr' field to perf_event_callchain_context · 3b1fff08
      Arnaldo Carvalho de Melo committed
      We will use it to count how many addresses are in the entry->ip[] array,
      excluding PERF_CONTEXT_{KERNEL,USER,etc} entries, so that we can really
      return the number of entries specified by the user via the relevant
      sysctl, kernel.perf_event_max_contexts, or via the per event
      perf_event_attr.sample_max_stack knob.
      
      This way we keep the perf_sample->ip_callchain->nr meaning, that is the
      number of entries, be it real addresses or PERF_CONTEXT_ entries, while
      honouring the max_stack knobs, i.e. the end result will be max_stack
      entries if we have at least that many entries in a given stack trace.
      
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-s8teto51tdqvlfhefndtat9r@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
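The accounting described in the two commits above can be sketched in C. This is a minimal illustration, not the kernel implementation: `struct chain`, `struct chain_ctx`, `store_context()`, `store_address()`, `MAX_ENTRIES` and the `CTX_*` marker values are hypothetical names chosen for this example. The idea is that context markers and real addresses share one array, while a separate counter tracks only real addresses so the max-stack limit applies to them alone.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker values standing in for PERF_CONTEXT_{KERNEL,USER}. */
#define CTX_KERNEL ((uint64_t)-128)
#define CTX_USER   ((uint64_t)-512)
#define MAX_ENTRIES 16

struct chain {
	uint64_t nr;               /* slots used: addresses plus markers */
	uint64_t ip[MAX_ENTRIES];
};

struct chain_ctx {
	struct chain *entry;
	uint32_t max_stack;        /* limit on real addresses only */
	uint32_t nr;               /* real addresses stored so far */
};

/* Store a context marker; it does not count against max_stack. */
static int store_context(struct chain_ctx *ctx, uint64_t marker)
{
	assert(ctx && ctx->entry);
	if (ctx->entry->nr >= MAX_ENTRIES)
		return -1;
	ctx->entry->ip[ctx->entry->nr++] = marker;
	return 0;
}

/* Store a real address; refuse once max_stack addresses are present. */
static int store_address(struct chain_ctx *ctx, uint64_t ip)
{
	assert(ctx && ctx->entry);
	if (ctx->nr >= ctx->max_stack || ctx->entry->nr >= MAX_ENTRIES)
		return -1;
	ctx->entry->ip[ctx->entry->nr++] = ip;
	ctx->nr++;
	return 0;
}
```

With `max_stack` set to 2, one marker followed by three addresses leaves `entry->nr` at 3 (one marker plus two addresses) and `ctx->nr` at 2; the third address is rejected.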
    • A
      perf core: Pass max stack as a perf_callchain_entry context · cfbcf468
      Arnaldo Carvalho de Melo committed
      This makes perf_callchain_{user,kernel}() receive the max stack
      as context for the perf_callchain_entry, instead of accessing
      the global sysctl_perf_event_max_stack.
      
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: He Kuang <hekuang@huawei.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: Milian Wolff <milian.wolff@kdab.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: Wang Nan <wangnan0@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/n/tip-kolmn1yo40p7jhswxwrc7rrd@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  4. 01 May, 2016 1 commit
  5. 27 Apr, 2016 2 commits
  6. 21 Apr, 2016 1 commit
  7. 10 Mar, 2016 7 commits
    • M
      powerpc/perf: Fix misleading comment in pmao_restore_workaround() · 58bffb5b
      Madhavan Srinivasan committed
      The current comment in pmao_restore_workaround() regarding
      hard_irq_disable() is wrong. It should say to hard *disable* interrupts
      instead of *enable*. Fix it.
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • S
      powerpc/perf/24x7: Eliminate domain suffix in event names · 8f69dc70
      Sukadev Bhattiprolu committed
      The Physical Core events of the 24x7 PMU can be monitored across various
      domains (physical core, vcpu home core, vcpu home node etc). For each of
      these core events, we currently create multiple events in sysfs, one for
      each domain the event can be monitored in. These events are distinguished
      by their suffixes like __PHYS_CORE, __VCPU_HOME_CORE etc.
      
      Rather than creating multiple such entries, we could make the 'domain'
      index a required parameter and let the user specify a value for it
      (like they currently specify the core index).
      
      	$ cat /sys/bus/event_source/devices/hv_24x7/events/HPM_CCYC
      	domain=?,offset=0x98,core=?,lpar=0x0
      
      	$ perf stat -C 0 -e hv_24x7/HPM_CCYC,domain=2,core=1/ true
      
      (the 'domain=?' and 'core=?' in sysfs tell perf tool to enforce them as
      required parameters).
      
      This simplifies the interface and allows users to identify events by the
      name specified in the catalog (users can determine the domain index by
      referring to '/sys/bus/event_source/devices/hv_24x7/interface/domains').
      
      Eliminating the event suffix eliminates several functions and simplifies
      code.
      
      Note that Physical Chip events can only be monitored in the chip domain
      so those events have the domain set to 1 (rather than =?) and users don't
      need to specify the domain index for the Chip events.
      
      	$ cat /sys/bus/event_source/devices/hv_24x7/events/PM_XLINK_CYCLES
      	domain=1,offset=0x230,chip=?,lpar=0x0
      
      	$ perf stat -C 0 -e hv_24x7/PM_XLINK_CYCLES,chip=1/ true
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
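The "=?" convention in the sysfs event strings above lends itself to a small check. The following C sketch is only an illustration and is not how the perf tool parses events: `param_is_required()` is a hypothetical helper that reports whether a named parameter appears as `name=?` in a comma-separated event string (without a trailing newline).

```c
#include <assert.h>
#include <string.h>

/*
 * Return 1 if `name` appears in the comma-separated `spec` as "name=?",
 * meaning the user must supply a value; return 0 otherwise.
 */
static int param_is_required(const char *spec, const char *name)
{
	assert(spec && name);

	size_t len = strlen(name);
	const char *p = spec;

	while (*p) {
		const char *end = strchr(p, ',');
		size_t tok = end ? (size_t)(end - p) : strlen(p);

		/* A required parameter is exactly "<name>=?". */
		if (tok == len + 2 && strncmp(p, name, len) == 0 &&
		    p[len] == '=' && p[len + 1] == '?')
			return 1;
		if (!end)
			break;
		p = end + 1;
	}
	return 0;
}
```

For the core event string shown above, `domain` and `core` are reported as required while `offset` and `lpar` are not; for the chip event string, only `chip` is required.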
    • S
      powerpc/perf/hv-24x7: Display domain indices in sysfs · d34171e8
      Sukadev Bhattiprolu committed
      To help users determine domains, display the domain indices used by the
      kernel in sysfs.
      
      	$ cat /sys/bus/event_source/devices/hv_24x7/interface/domains
      	1: Physical Chip
      	2: Physical Core
      	3: VCPU Home Core
      	4: VCPU Home Chip
      	5: VCPU Home Node
      	6: VCPU Remote Node
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • S
      powerpc/perf/hv-24x7: Display change in counter values · 2b206ee6
      Sukadev Bhattiprolu committed
      For 24x7 counters, perf displays the raw value of the 24x7 counter, which
      is a monotonically increasing value.
      
      	perf stat -C 0 -e \
      		'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
      		sleep 1
      
       Performance counter stats for 'CPU(s) 0':
      
           9,105,403,170      hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/
      
             0.000425751 seconds time elapsed
      
      In the typical usage of 'perf stat' this counter value is not as useful
      as the _change_ in the counter value over the duration of the application.
      
      Have h_24x7_event_init() set the event's prev_count to the raw value of
      the 24x7 counter at the time of initialization. When the application
      terminates, hv_24x7_event_read() will compute the change in value and
      report to the perf tool. Similarly, for the transaction interface, clear
      the event count to 0 at the beginning of the transaction.
      
      	perf stat -C 0 -e \
      		'hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/' \
      		sleep 1
      
       Performance counter stats for 'CPU(s) 0':
      
                 245,758      hv_24x7/HPM_0THRD_NON_IDLE_CCYC__PHYS_CORE,core=1/
      
             1.006366383 seconds time elapsed
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
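The prev_count approach described in the commit above can be sketched as follows. This is a simplified illustration under assumed names, not the driver code: `struct counter_event`, `event_init()` and `event_read()` are hypothetical, and the raw counter value is passed in by the caller instead of being fetched from the hypervisor.

```c
#include <assert.h>
#include <stdint.h>

struct counter_event {
	uint64_t prev_count;  /* raw counter value at the last snapshot */
	uint64_t count;       /* accumulated change since initialization */
};

/* Record the raw value at init so later reads report only the change. */
static void event_init(struct counter_event *ev, uint64_t raw_now)
{
	assert(ev);
	ev->prev_count = raw_now;
	ev->count = 0;
}

/* Add the change since the previous snapshot and return the running total. */
static uint64_t event_read(struct counter_event *ev, uint64_t raw_now)
{
	assert(ev);
	ev->count += raw_now - ev->prev_count;
	ev->prev_count = raw_now;
	return ev->count;
}
```

Because the subtraction is done in unsigned 64-bit arithmetic, a monotonically increasing raw counter yields the elapsed change on each read rather than the absolute value.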
    • S
      powerpc/perf/hv-24x7: Fix usage with chip events. · e5a5886d
      Sukadev Bhattiprolu committed
      24x7 counters can belong to different domains (core, chip, virtual CPU
      etc). For events in the 'chip' domain, sysfs entry currently looks like:
      
      	$ cd /sys/bus/event_source/devices/hv_24x7/events
      	$ cat PM_XLINK_CYCLES__PHYS_CHIP
      	domain=0x1,offset=0x230,core=?,lpar=0x0
      
      where the required parameter, 'core=?' is specified with perf as:
      
      	perf stat -C 0 -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,core=1/ \
      		/bin/true
      
      This is inconsistent in that 'core' is a required parameter for a chip
      event.  Instead, have the sysfs entry display 'chip=?' for chip
      events:
      
      	$ cd /sys/bus/event_source/devices/hv_24x7/events
      	$ cat PM_XLINK_CYCLES__PHYS_CHIP
      	domain=0x1,offset=0x230,chip=?,lpar=0x0
      
      We also need to add a 'chip' entry in the sysfs format directory:
      
      	$ ls /sys/bus/event_source/devices/hv_24x7/format
      	chip  core  domain  lpar  offset  vcpu
      	^^^^
      	(new)
      
      so the perf tool can automatically check usage and format the chip
      parameter correctly:
      
      	$ perf stat -C 0 -v -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP/ \
      		/bin/true
      	Required parameter 'chip' not specified
      	invalid or unsupported event: 'hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP/'
      
      	$ perf stat -C 0 -v -e hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/ \
      		/bin/true
      	hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/: 0 6628908 6628908
      
      	 Performance counter stats for 'CPU(s) 0':
      
      	         0      hv_24x7/PM_XLINK_CYCLES__PHYS_CHIP,chip=1/
      
      	    0.006606970 seconds time elapsed
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • S
      powerpc/perf: Export Power8 generic and cache events to sysfs · e0728b50
      Sukadev Bhattiprolu committed
      Power8 supports a large number of events in each subsystem so when a
      user runs:
      
      	perf stat -e branch-instructions sleep 1
      	perf stat -e L1-dcache-loads sleep 1
      
      it is not clear as to which PMU events were monitored.
      
      Export the generic hardware and cache perf events for Power8 to sysfs,
      so users can precisely determine the PMU event monitored by the generic
      event.
      
      E.g.:
      	$ cat /sys/bus/event_source/devices/cpu/events/branch-instructions
      	event=0x10068
      
      	$ cat /sys/bus/event_source/devices/cpu/events/L1-dcache-loads
      	event=0x100ee
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
    • S
      powerpc/perf: Remove PME_ prefix for power7 events · d4969e24
      Sukadev Bhattiprolu committed
      We used the PME_ prefix earlier to avoid some macro/variable name
      collisions.  We have since changed the way we define/use the event
      macros so we no longer need the prefix.
      
      By dropping the prefix, we keep the event macros consistent with
      their official names.
      Reported-by: Michael Ellerman <ellerman@au1.ibm.com>
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  8. 01 Mar, 2016 1 commit
  9. 16 Feb, 2016 1 commit
  10. 10 Feb, 2016 1 commit
  11. 28 Jan, 2016 1 commit
    • M
      powerpc/perf: Remove PPMU_HAS_SSLOT flag for Power8 · 370f06c8
      Madhavan Srinivasan committed
      Commit 7a786832 ("powerpc/perf: Add an explict flag indicating
      presence of SLOT field") introduced the PPMU_HAS_SSLOT flag to remove
      the assumption that MMCRA[SLOT] was present when PPMU_ALT_SIPR was not
      set.
      
      That commit's changelog also mentions that Power8 does not support
      MMCRA[SLOT]. However, when the Power8 PMU support was merged, it
      erroneously included the PPMU_HAS_SSLOT flag.
      
      So remove PPMU_HAS_SSLOT from the Power8 flags.
      
      mpe: On systems where MMCRA[SLOT] exists, the field occupies bits 37:39
      (IBM numbering). On Power8 bit 37 is reserved, and 38:39 overlap with
      the high bits of the Threshold Event Counter Mantissa. I am not aware of
      any published events which use the threshold counting mechanism, which
      would cause the mantissa bits to be set. So in practice this bug is
      unlikely to trigger.
      
      Fixes: e05b9b9e ("powerpc/perf: Power8 PMU support")
      Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
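The IBM bit numbering used in the note above counts from the most significant bit, so bits 37:39 of a 64-bit register sit at a right shift of 63 - 39 = 24 with a 3-bit mask. The C sketch below shows that conversion; `ibm_field()` is a hypothetical helper written for this example and is not taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract bits first..last of a 64-bit value using IBM numbering,
 * where bit 0 is the most significant bit and bit 63 the least.
 */
static uint64_t ibm_field(uint64_t reg, unsigned first, unsigned last)
{
	assert(first <= last && last <= 63);

	unsigned width = last - first + 1;
	unsigned shift = 63 - last;   /* distance from the least significant bit */
	uint64_t mask = (width == 64) ? ~0ULL : ((1ULL << width) - 1);

	return (reg >> shift) & mask;
}
```

Calling `ibm_field(mmcra, 37, 39)` on a value with bits 24 to 26 (counting from the least significant bit) set returns 7, which is the range the note says overlaps with reserved and mantissa bits on Power8.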
  12. 20 Oct, 2015 1 commit
  13. 12 Oct, 2015 1 commit
    • A
      powerpc/mm: Differentiate between hugetlb and THP during page walk · 891121e6
      Aneesh Kumar K.V committed
      We need to properly identify whether a hugepage is an explicit or
      a transparent hugepage in follow_huge_addr(). We used to depend
      on the hugepage shift argument to do that, but in some cases that can
      give wrong results. For example:
      
      On finding a transparent hugepage we set hugepage shift to PMD_SHIFT.
      But we can end up clearing the thp pte, via pmdp_huge_get_and_clear.
      We do prevent reusing the pfn page via the usage of
      kick_all_cpus_sync(). But that happens after we updated the pte to 0.
      Hence in follow_huge_addr() we can find hugepage shift set, but the
      transparent huge page check fails for a thp pte.
      
      NOTE: We fixed a variant of this race against thp split in commit
      691e95fd ("powerpc/mm/thp: Make page table walk safe against thp split/collapse")
      
      Without this patch, we may hit the BUG_ON(flags & FOLL_GET) in
      follow_page_mask occasionally.
      
      In the long term, we may want to switch ppc64 64k page size config to
      enable CONFIG_ARCH_WANT_GENERAL_HUGETLB.
      Reported-by: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  14. 13 Sep, 2015 3 commits
    • S
      perf/core: Drop PERF_EVENT_TXN · 8f3e5684
      Sukadev Bhattiprolu committed
      We currently use PERF_EVENT_TXN flag to determine if we are in the middle
      of a transaction. If in a transaction, we defer the schedulability checks
      from pmu->add() operation to the pmu->commit() operation.
      
      Now that we have "transaction types" (PERF_PMU_TXN_ADD, PERF_PMU_TXN_READ)
      we can use the type to determine if we are in a transaction and drop the
      PERF_EVENT_TXN flag.
      
      When PERF_EVENT_TXN is dropped, the cpuhw->group_flag on some architectures
      becomes unused, so drop that field as well.
      
      This is an extension of the Powerpc patch from Peter Zijlstra to s390,
      Sparc and x86 architectures.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1441336073-22750-11-git-send-email-sukadev@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • S
      powerpc, perf/powerpc/hv-24x7: Use PMU_TXN_READ interface · 88a48613
      Sukadev Bhattiprolu committed
      The 24x7 counters in Powerpc allow monitoring a large number of counters
      simultaneously. They also allow reading several counters in a single
      HCALL so we can get a more consistent snapshot of the system.
      
      Use the PMU's transaction interface to monitor and read several event
      counters at once. The idea is that users can group several 24x7 events
      into a single group of events. We use the following logic to submit
      the group of events to the PMU and read the values:
      
      	pmu->start_txn()		// Initialize before first event
      
      	for each event in group
      		pmu->read(event);	// Queue each event to be read
      
      	pmu->commit_txn()		// Read/update all queued counters
      
      The ->commit_txn() also updates the event counts in the respective
      perf_event objects.  The perf subsystem can then directly get the
      event counts from the perf_event and can avoid submitting a new
      ->read() request to the PMU.
      
      Thanks to input from Peter Zijlstra.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1441336073-22750-10-git-send-email-sukadev@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
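The start/read/commit sequence in the commit above amounts to queueing reads and servicing them with one batched request. The C sketch below illustrates that pattern under assumed names; it is not the hv-24x7 driver code. `struct txn`, `txn_start()`, `txn_queue_read()`, `txn_commit()` and `fake_fetch()` are all hypothetical, and `fake_fetch()` merely stands in for a single hypervisor call that returns several counter values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_QUEUED 8

struct event {
	uint32_t counter_id;
	uint64_t count;
};

struct txn {
	int active;
	size_t nr;
	struct event *queued[MAX_QUEUED];
};

/* Type of a function that fetches n counter values in one request. */
typedef void (*batch_read_fn)(const uint32_t *ids, uint64_t *values, size_t n);

static int fake_fetch_calls;

/* Fake backend for illustration: value = id * 100, and count the calls. */
static void fake_fetch(const uint32_t *ids, uint64_t *values, size_t n)
{
	fake_fetch_calls++;
	for (size_t i = 0; i < n; i++)
		values[i] = (uint64_t)ids[i] * 100;
}

/* Begin a read transaction with an empty queue. */
static void txn_start(struct txn *t)
{
	t->active = 1;
	t->nr = 0;
}

/* Queue an event instead of reading it immediately. */
static int txn_queue_read(struct txn *t, struct event *ev)
{
	assert(t->active);
	if (t->nr >= MAX_QUEUED)
		return -1;
	t->queued[t->nr++] = ev;
	return 0;
}

/* Issue one batched request and store each result in its event. */
static int txn_commit(struct txn *t, batch_read_fn fetch)
{
	uint32_t ids[MAX_QUEUED];
	uint64_t values[MAX_QUEUED];

	assert(t->active);
	for (size_t i = 0; i < t->nr; i++)
		ids[i] = t->queued[i]->counter_id;
	fetch(ids, values, t->nr);
	for (size_t i = 0; i < t->nr; i++)
		t->queued[i]->count = values[i];
	t->active = 0;
	return 0;
}
```

Queueing two events and committing updates both counts while the backend is called only once, which mirrors the consistent-snapshot motivation given in the commit message.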
    • S
      perf/core: Add a 'flags' parameter to the PMU transactional interfaces · fbbe0701
      Sukadev Bhattiprolu committed
      Currently, the PMU interface allows reading only one counter at a time.
      But some PMUs, like the 24x7 counters in Power, support reading several
      counters at once. To leverage this functionality, extend the transaction
      interface to support a "transaction type".
      
      The first type, PERF_PMU_TXN_ADD, refers to the existing transactions,
      i.e. used to _schedule_ all the events on the PMU as a group. A second
      transaction type, PERF_PMU_TXN_READ, will be used in a follow-on patch,
      by the 24x7 counters to read several counters at once.
      
      Extend the transaction interfaces to the PMU to accept a 'txn_flags'
      parameter and use this parameter to ignore any transactions that are
      not of type PERF_PMU_TXN_ADD.
      
      Thanks to Peter Zijlstra for his input.
      Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
      [peterz: s390 compile fix]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1441336073-22750-3-git-send-email-sukadev@linux.vnet.ibm.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  15. 27 Jul, 2015 1 commit
  16. 25 Jul, 2015 2 commits
  17. 08 Jul, 2015 1 commit
  18. 02 Jun, 2015 1 commit
    • A
      powerpc/perf: Fix book3s kernel to userspace backtraces · 72e349f1
      Anton Blanchard committed
      When we take a PMU exception or a software event we call
      perf_read_regs(). This overloads regs->result with a boolean that
      describes if we should use the sampled instruction address register
      (SIAR) or the regs.
      
      If the exception is in kernel, we start with the kernel regs and
      backtrace through the kernel stack. At this point we switch to the
      userspace regs and backtrace the user stack with perf_callchain_user().
      
      Unfortunately these regs have not got the perf_read_regs() treatment,
      so regs->result could be anything. If it is non zero,
      perf_instruction_pointer() decides to use the SIAR, and we get issues
      like this:
      
      0.11%  qemu-system-ppc  [kernel.kallsyms]        [k] _raw_spin_lock_irqsave
             |
             ---_raw_spin_lock_irqsave
                |
                |--52.35%-- 0
                |          |
                |          |--46.39%-- __hrtimer_start_range_ns
                |          |          kvmppc_run_core
                |          |          kvmppc_vcpu_run_hv
                |          |          kvmppc_vcpu_run
                |          |          kvm_arch_vcpu_ioctl_run
                |          |          kvm_vcpu_ioctl
                |          |          do_vfs_ioctl
                |          |          sys_ioctl
                |          |          system_call
                |          |          |
                |          |          |--67.08%-- _raw_spin_lock_irqsave <--- hi mum
                |          |          |          |
                |          |          |           --100.00%-- 0x7e714
                |          |          |                     0x7e714
      
      Notice the bogus _raw_spin_lock_irqsave when we transition from kernel
      (system_call) to userspace (0x7e714). We inserted what was in the SIAR.
      
      Add a check in regs_use_siar() to check that the regs in question
      are from a PMU exception. With this fix the backtrace makes sense:
      
           0.47%  qemu-system-ppc  [kernel.vmlinux]         [k] _raw_spin_lock_irqsave
                  |
                  ---_raw_spin_lock_irqsave
                     |
                     |--53.83%-- 0
                     |          |
                     |          |--44.73%-- hrtimer_try_to_cancel
                     |          |          kvmppc_start_thread
                     |          |          kvmppc_run_core
                     |          |          kvmppc_vcpu_run_hv
                     |          |          kvmppc_vcpu_run
                     |          |          kvm_arch_vcpu_ioctl_run
                     |          |          kvm_vcpu_ioctl
                     |          |          do_vfs_ioctl
                     |          |          sys_ioctl
                     |          |          system_call
                     |          |          __ioctl
                     |          |          0x7e714
                     |          |          0x7e714
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  19. 17 Apr, 2015 1 commit
    • A
      powerpc/mm/thp: Make page table walk safe against thp split/collapse · 691e95fd
      Aneesh Kumar K.V committed
      We can disable a THP split or a hugepage collapse by disabling irq.
      We do send an IPI to all the cpus in the early part of split/collapse,
      and disabling local irq ensures we don't make progress with
      split/collapse. If the THP is getting split we return NULL from
      find_linux_pte_or_hugepte(). For all the current callers it should be ok.
      We need to be careful if we want to use the returned pte_t pointer outside
      the irq disabled region. With respect to THP split, the pfn remains the same,
      but a hugepage collapse will result in a pfn change. There are a
      few steps we can take to avoid a hugepage collapse. One way is to take a page
      reference inside the irq disabled region. Another option is to take
      mmap_sem so that a parallel collapse will not happen. We can also
      disable collapse by taking pmd_lock. Another method, used by the kvm
      subsystem, is to check whether we had an mmu_notifier update in between
      using mmu_notifier_retry().
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
  20. 14 Apr, 2015 2 commits
  21. 11 Apr, 2015 3 commits