1. 15 6月, 2011 8 次提交
    • J
      tracing, function_graph: Merge overhead and duration display functions · ffeb80fc
      Jiri Olsa 提交于
      Functions print_graph_overhead() and print_graph_duration() displays
      data for one field - DURATION.
      
      I merged them into single function print_graph_duration(),
      and added a way to display the empty parts of the field.
      
      This way the print_graph_irq() function can use this column to display
      the IRQ signs if needed and the DURATION field details stays inside
      the print_graph_duration() function.
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-3-git-send-email-jolsa@redhat.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      ffeb80fc
    • J
      tracing, function_graph: Remove dependency of abstime and duration fields on latency · 321e68b0
      Jiri Olsa 提交于
      The display of absolute time and duration fields is based on the
      latency field. This was added during the irqsoff/wakeup tracers
      graph support changes.
      
      It's causing confusion in what fields will be displayed for the
      function_graph tracer itself. So I'm removing this depency, and
      adding absolute time and duration fields to the preemptirqsoff
      preemptoff irqsoff wakeup tracers.
      
      With following commands:
      	# echo function_graph > ./current_tracer
      	# cat trace
      
      This is what it looked like before:
      # tracer: function_graph
      #
      #     TIME        CPU  DURATION                  FUNCTION CALLS
      #      |          |     |   |                     |   |   |   |
       0)   0.068 us    |          } /* page_add_file_rmap */
       0)               |          _raw_spin_unlock() {
      ...
      
      This is what it looks like now:
      # tracer: function_graph
      #
      # CPU  DURATION                  FUNCTION CALLS
      # |     |   |                     |   |   |   |
       0)   0.068 us    |                } /* add_preempt_count */
       0)   0.993 us    |              } /* vfsmount_lock_local_lock */
      ...
      
      For preemptirqsoff preemptoff irqsoff wakeup tracers,
      this is what it looked like before:
      SNIP
      #                       _-----=> irqs-off
      #                      / _----=> need-resched
      #                     | / _---=> hardirq/softirq
      #                     || / _--=> preempt-depth
      #                     ||| / _-=> lock-depth
      #                     |||| /
      # CPU  TASK/PID       |||||  DURATION                  FUNCTION CALLS
      # |     |    |        |||||   |   |                     |   |   |   |
       1)    <idle>-0    |  d..1  0.000 us    |  acpi_idle_enter_simple();
      ...
      
      This is what it looks like now:
      SNIP
      #
      #                                       _-----=> irqs-off
      #                                      / _----=> need-resched
      #                                     | / _---=> hardirq/softirq
      #                                     || / _--=> preempt-depth
      #                                     ||| /
      #     TIME        CPU  TASK/PID       ||||  DURATION                  FUNCTION CALLS
      #      |          |     |    |        ||||   |   |                     |   |   |   |
         19.847735 |   1)    <idle>-0    |  d..1  0.000 us    |  acpi_idle_enter_simple();
      ...
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Link: http://lkml.kernel.org/r/1307113131-10045-2-git-send-email-jolsa@redhat.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      321e68b0
    • P
      async: Fixed an include coding style issue · 84c15027
      Paul McQuade 提交于
      Added <linux/atomic.h>,<linux/ktime.h> and Removed <asm/atomic.h>.
      Added KERN_DEBUG to printk() functions.
      Acked-by: NArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NPaul McQuade <tungstentide@gmail.com>
      Link: http://lkml.kernel.org/r/4DE596B4.7030904@gmail.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      84c15027
    • P
      ftrace: Fixed an include coding style issue · bd38c0e6
      Paul McQuade 提交于
      Removed <asm/ftrace.h> because <linux/ftrace.h> was already declared.
      Braces of struct's coding style fixed.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: NPaul McQuade <tungstentide@gmail.com>
      Link: http://lkml.kernel.org/r/4DE59711.3090900@gmail.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      bd38c0e6
    • S
      tracing: Add disable_on_free option · cf30cf67
      Steven Rostedt 提交于
      Add a trace option to disable tracing on free. When this option is
      set, a write into the free_buffer file will not only shrink the
      ring buffer down to zero, but it will also disable tracing.
      
      Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      cf30cf67
    • V
      tracing: Add a proc file to stop tracing and free buffer · 4f271a2a
      Vaibhav Nagarnaik 提交于
      The proc file entry buffer_size_kb is used to set the size of tracing
      buffer. The memory to expand the buffer size is kernel memory. Consider
      a use case where tracing is handled by a user space utility, which acts
      as a gate keeper for tracing requests. In an OOM condition, tracing is
      considered a low priority task and if the utility gets killed the ring
      buffer memory cannot be released back to the kernel.
      
      This patch adds a proc file called "free_buffer" whose purpose is to
      stop tracing and free up the ring buffer when it is closed.
      
      The user space process can then set the desired size in buffer_size_kb
      file and open the fd to the "free_buffer" file. Under OOM condition, if
      the process gets killed, the kernel closes the file descriptor. The
      release handler stops the tracing and releases the kernel memory
      automatically.
      
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Link: http://lkml.kernel.org/r/1308012717-11148-1-git-send-email-vnagarnaik@google.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      4f271a2a
    • V
      tracing: Use NUMA allocation for per-cpu ring buffer pages · 7ea59064
      Vaibhav Nagarnaik 提交于
      The tracing ring buffer is a group of per-cpu ring buffers where
      allocation and logging is done on a per-cpu basis. The events that are
      generated on a particular CPU are logged in the corresponding buffer.
      This is to provide wait-free writes between CPUs and good NUMA node
      locality while accessing the ring buffer.
      
      However, the allocation routines consider NUMA locality only for buffer
      page metadata and not for the actual buffer page. This causes the pages
      to be allocated on the NUMA node local to the CPU where the allocation
      routine is running at the time.
      
      This patch fixes the problem by using a NUMA node specific allocation
      routine so that the pages are allocated from a NUMA node local to the
      logging CPU.
      
      I tested with the getuid_microbench from autotest. It is a simple binary
      that calls getuid() in a loop and measures the average time for the
      syscall to complete. The following command was used to test:
      $ getuid_microbench 1000000
      
      Compared the numbers found on kernel with and without this patch and
      found that logging latency decreases by 30-50 ns/call.
      tracing with non-NUMA allocation - 569 ns/call
      tracing with NUMA allocation     - 512 ns/call
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Link: http://lkml.kernel.org/r/1304470602-20366-1-git-send-email-vnagarnaik@google.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      7ea59064
    • V
      tracing: Schedule a delayed work to call wakeup() · e7e2ee89
      Vaibhav Nagarnaik 提交于
      In using syscall tracing by concurrent processes, the wakeup() that is
      called in the event commit function causes contention on the spin lock
      of the waitqueue. I enabled sys_enter_getuid and sys_exit_getuid
      tracepoints, and by running getuid_microbench from autotest in parallel
      I found that the contention causes exponential latency increase in the
      tracing path.
      
      The autotest binary getuid_microbench calls getuid() in a tight loop for
      the given number of iterations and measures the average time required to
      complete a single invocation of syscall.
      
      The patch schedules a delayed work after 2 ms once an event commit calls
      to wake up the trace wait_queue. This removes the delay caused by
      contention on spin lock in wakeup() and amortizes the wakeup() calls
      scheduled over the 2 ms period.
      
      In the following example, the script enables the sys_enter_getuid and
      sys_exit_getuid tracepoints and runs the getuid_microbench in parallel
      with the given number of processes. The output clearly shows the latency
      increase caused by contentions.
      
      $ ~/getuid.sh 1
      1000000 calls in 0.720974253 s (720.974253 ns/call)
      
      $ ~/getuid.sh 2
      1000000 calls in 1.166457554 s (1166.457554 ns/call)
      1000000 calls in 1.168933765 s (1168.933765 ns/call)
      
      $ ~/getuid.sh 3
      1000000 calls in 1.783827516 s (1783.827516 ns/call)
      1000000 calls in 1.795553270 s (1795.553270 ns/call)
      1000000 calls in 1.796493376 s (1796.493376 ns/call)
      
      $ ~/getuid.sh 4
      1000000 calls in 4.483041796 s (4483.041796 ns/call)
      1000000 calls in 4.484165388 s (4484.165388 ns/call)
      1000000 calls in 4.484850762 s (4484.850762 ns/call)
      1000000 calls in 4.485643576 s (4485.643576 ns/call)
      
      $ ~/getuid.sh 5
      1000000 calls in 6.497521653 s (6497.521653 ns/call)
      1000000 calls in 6.502000236 s (6502.000236 ns/call)
      1000000 calls in 6.501709115 s (6501.709115 ns/call)
      1000000 calls in 6.502124100 s (6502.124100 ns/call)
      1000000 calls in 6.502936358 s (6502.936358 ns/call)
      
      After the patch, the latencies scale better.
      1000000 calls in 0.728720455 s (728.720455 ns/call)
      
      1000000 calls in 0.842782857 s (842.782857 ns/call)
      1000000 calls in 0.883803135 s (883.803135 ns/call)
      
      1000000 calls in 0.902077764 s (902.077764 ns/call)
      1000000 calls in 0.902838202 s (902.838202 ns/call)
      1000000 calls in 0.908896885 s (908.896885 ns/call)
      
      1000000 calls in 0.932523515 s (932.523515 ns/call)
      1000000 calls in 0.958009672 s (958.009672 ns/call)
      1000000 calls in 0.986188020 s (986.188020 ns/call)
      1000000 calls in 0.989771102 s (989.771102 ns/call)
      
      1000000 calls in 0.933518391 s (933.518391 ns/call)
      1000000 calls in 0.958897947 s (958.897947 ns/call)
      1000000 calls in 1.031038897 s (1031.038897 ns/call)
      1000000 calls in 1.089516025 s (1089.516025 ns/call)
      1000000 calls in 1.141998347 s (1141.998347 ns/call)
      Signed-off-by: NVaibhav Nagarnaik <vnagarnaik@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Michael Rubin <mrubin@google.com>
      Cc: David Sharp <dhsharp@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1305059241-7629-1-git-send-email-vnagarnaik@google.comSigned-off-by: NSteven Rostedt <rostedt@goodmis.org>
      e7e2ee89
  2. 07 6月, 2011 1 次提交
  3. 04 6月, 2011 3 次提交
  4. 03 6月, 2011 12 次提交
  5. 02 6月, 2011 6 次提交
    • A
      perf evlist: Don't die if sample_{id_all|type} is invalid · c2a70653
      Arnaldo Carvalho de Melo 提交于
      Fixes two more cases where the python binding would not load:
      
      . Not finding die(), which it shouldn't anyway, not good to just stop the
        world because some particular perf.data file is invalid, just propagate
        the error to the caller.
      
      . Not finding perf_sample_size: fix it by moving it from event.c to evsel,
        where it belongs, as most cases are moving to operate on an evsel object.o
      
      One of the fixed problems:
      
      [root@emilia ~]# python
      >>> import perf
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: perf_sample_size
      >>>
      [root@emilia ~]#
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-1hkj7b2cvgbfnoizsekjb6c9@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      c2a70653
    • A
      perf python: Use exception to propagate errors · 5c6970af
      Arnaldo Carvalho de Melo 提交于
      We were using pr_debug to tell the user about not being able to parse a sample
      where we should really use the python way of reporting errors: exceptions.
      
      Fixes this problem:
      
      [root@emilia ~]# python
      >>> import perf
      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
      ImportError: /home/acme/git/build/perf/python/perf.so: undefined symbol: eprintf
      >>>
      [root@emilia ~]
      
      As we want to keep the objects linked in the python binding (and in the future
      in a shared library) minimal.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-m9dba9kaluas0kq8r58z191c@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      5c6970af
    • A
      perf evlist: Remove dependency on debug routines · bccdaba0
      Arnaldo Carvalho de Melo 提交于
      So far we avoided having to link debug.o in the python binding, keep it
      that way by not using ui__warning() in evlist.c.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/n/tip-4wtew8hd3g7ejnlehtspys2t@git.kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      bccdaba0
    • L
      Revert "mm: fail GFP_DMA allocations when ZONE_DMA is not configured" · 1fa7b6a2
      Linus Torvalds 提交于
      This reverts commit a197b59a.
      
      As rmk says:
       "Commit a197b59a (mm: fail GFP_DMA allocations when ZONE_DMA is not
        configured) is causing regressions on ARM with various drivers which
        use GFP_DMA.
      
        The behaviour up until now has been to silently ignore that flag when
        CONFIG_ZONE_DMA is not enabled, and to allocate from the normal zone.
        However, as a result of the above commit, such allocations now fail
        which causes drivers to fail.  These are regressions compared to the
        previous kernel version."
      
      so just revert it.
      Requested-by: NRussell King <linux@arm.linux.org.uk>
      Acked-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1fa7b6a2
    • L
      Merge git://git.infradead.org/iommu-2.6 · f0f52a94
      Linus Torvalds 提交于
      * git://git.infradead.org/iommu-2.6:
        intel-iommu: Fix off-by-one in RMRR setup
        intel-iommu: Add domain check in domain_remove_one_dev_info
        intel-iommu: Remove Host Bridge devices from identity mapping
        intel-iommu: Use coherent DMA mask when requested
        intel-iommu: Dont cache iova above 32bit
        intel-iommu: Speed up processing of the identity_mapping function
        intel-iommu: Check for identity mapping candidate using system dma mask
        intel-iommu: Only unlink device domains from iommu
        intel-iommu: Enable super page (2MiB, 1GiB, etc.) support
        intel-iommu: Flush unmaps at domain_exit
        intel-iommu: Remove obsolete comment from detect_intel_iommu
        intel-iommu: fix VT-d PMR disable for TXT on S3 resume
      f0f52a94
    • L
      block: fix mismerge of the DISK_EVENT_MEDIA_CHANGE removal · 0f48f260
      Linus Torvalds 提交于
      Jens' back-merge commit 698567f3 ("Merge commit 'v2.6.39' into
      for-2.6.40/core") was incorrectly done, and re-introduced the
      DISK_EVENT_MEDIA_CHANGE lines that had been removed earlier in commits
      
       - 9fd097b1 ("block: unexport DISK_EVENT_MEDIA_CHANGE for
         legacy/fringe drivers")
      
       - 7eec77a1 ("ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd
         and ide-cd")
      
      because of conflicts with the "g->flags" updates near-by by commit
      d4dc210f ("block: don't block events on excl write for non-optical
      devices")
      
      As a result, we re-introduced the hanging behavior due to infinite disk
      media change reports.
      
      Tssk, tssk, people! Don't do back-merges at all, and *definitely* don't
      do them to hide merge conflicts from me - especially as I'm likely
      better at merging them than you are, since I do so many merges.
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Jens Axboe <jaxboe@fusionio.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0f48f260
  6. 01 6月, 2011 10 次提交
    • L
      Merge git://git.infradead.org/mtd-2.6 · 3f303103
      Linus Torvalds 提交于
      * git://git.infradead.org/mtd-2.6:
        mtd: fix physmap.h warnings
      3f303103
    • D
      intel-iommu: Fix off-by-one in RMRR setup · 70e535d1
      David Woodhouse 提交于
      We were mapping an extra byte (and hence usually an extra page):
      iommu_prepare_identity_map() expects to be given an 'end' argument which
      is the last byte to be mapped; not the first byte *not* to be mapped.
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      70e535d1
    • M
      intel-iommu: Add domain check in domain_remove_one_dev_info · 8519dc44
      Mike Habeck 提交于
      The comment in domain_remove_one_dev_info() states "No need to compare
      PCI domain; it has to be the same". But for the si_domain that isn't
      going to be true, as it consists of all the PCI devices that are
      identity mapped thus multiple PCI domains can be in si_domain.  The
      code needs to validate the PCI domain too.
      Signed-off-by: NMike Habeck <habeck@sgi.com>
      Signed-off-by: NMike Travis <travis@sgi.com>
      Cc: stable@kernel.org
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      8519dc44
    • M
      intel-iommu: Remove Host Bridge devices from identity mapping · 825507d6
      Mike Travis 提交于
      When using the 1:1 (identity) PCI DMA remapping, PCI Host Bridge devices
      that do not use the IOMMU causes a kernel panic.  Fix that by not
      inserting those devices into the si_domain.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Reviewed-by: NMike Habeck <habeck@sgi.com>
      Cc: stable@kernel.org
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      825507d6
    • M
      intel-iommu: Use coherent DMA mask when requested · c681d0ba
      Mike Travis 提交于
      The __intel_map_single function is not honoring the passed in DMA mask.
      This results in not using the coherent DMA mask when called from
      intel_alloc_coherent().
      Signed-off-by: NMike Travis <travis@sgi.com>
      Acked-by: NChris Wright <chrisw@sous-sol.org>
      Reviewed-by: NMike Habeck <habeck@sgi.com>
      Cc: stable@kernel.org
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      c681d0ba
    • C
      intel-iommu: Dont cache iova above 32bit · 1c9fc3d1
      Chris Wright 提交于
      Mike Travis and Mike Habeck reported an issue where iova allocation
      would return a range that was larger than a device's dma mask.
      
      https://lkml.org/lkml/2011/3/29/423
      
      The dmar initialization code will reserve all PCI MMIO regions and copy
      those reservations into a domain specific iova tree.  It is possible for
      one of those regions to be above the dma mask of a device.  It is typical
      to allocate iovas with a 32bit mask (despite device's dma mask possibly
      being larger) and cache the result until it exhausts the lower 32bit
      address space.  Freeing the iova range that is >= the last iova in the
      lower 32bit range when there is still an iova above the 32bit range will
      corrupt the cached iova by pointing it to a region that is above 32bit.
      If that region is also larger than the device's dma mask, a subsequent
      allocation will return an unusable iova and cause dma failure.
      
      Simply don't cache an iova that is above the 32bit caching boundary.
      Reported-by: NMike Travis <travis@sgi.com>
      Reported-by: NMike Habeck <habeck@sgi.com>
      Cc: stable@kernel.org
      Acked-by: NMike Travis <travis@sgi.com>
      Tested-by: NMike Habeck <habeck@sgi.com>
      Signed-off-by: NChris Wright <chrisw@sous-sol.org>
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      1c9fc3d1
    • M
      intel-iommu: Speed up processing of the identity_mapping function · cb452a40
      Mike Travis 提交于
      When there are a large count of PCI devices, and the pass through
      option for iommu is set, much time is spent in the identity_mapping
      function hunting though the iommu domains to check if a specific
      device is "identity mapped".
      
      Speed up the function by checking the cached info to see if
      it's mapped to the static identity domain.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Reviewed-by: NMike Habeck <habeck@sgi.com>
      Cc: stable@kernel.org
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      cb452a40
    • C
      intel-iommu: Check for identity mapping candidate using system dma mask · 8fcc5372
      Chris Wright 提交于
      The identity mapping code appears to make the assumption that if the
      devices dma_mask is greater than 32bits the device can use identity
      mapping.  But that is not true: take the case where we have a 40bit
      device in a 44bit architecture. The device can potentially receive a
      physical address that it will truncate and cause incorrect addresses
      to be used.
      
      Instead check to see if the device's dma_mask is large enough
      to address the system's dma_mask.
      Signed-off-by: NMike Travis <travis@sgi.com>
      Reviewed-by: NMike Habeck <habeck@sgi.com>
      Cc: stable@kernel.org
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      8fcc5372
    • A
      intel-iommu: Only unlink device domains from iommu · 9b4554b2
      Alex Williamson 提交于
      Commit a97590e5 added unlinking domains from iommus to reciprocate the
      iommu from domains unlinking that was already done.  We actually want
      to only do this for device domains and never for the static
      identity map domain or VM domains.  The SI domain is special and
      never freed, while VM domain->id lives in their own special address
      space, separate from iommu->domain_ids.
      
      In the current code, a VM can get domain->id zero, then mark that
      domain unused when unbound from pci-stub.  This leads to DMAR
      write faults when the device is re-bound to the host driver.
      Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      9b4554b2
    • Y
      intel-iommu: Enable super page (2MiB, 1GiB, etc.) support · 6dd9a7c7
      Youquan Song 提交于
      There are no externally-visible changes with this. In the loop in the
      internal __domain_mapping() function, we simply detect if we are mapping:
        - size >= 2MiB, and
        - virtual address aligned to 2MiB, and
        - physical address aligned to 2MiB, and
        - on hardware that supports superpages.
      
      (and likewise for larger superpages).
      
      We automatically use a superpage for such mappings. We never have to
      worry about *breaking* superpages, since we trust that we will always
      *unmap* the same range that was mapped. So all we need to do is ensure
      that dma_pte_clear_range() will also cope with superpages.
      
      Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
      it can return a PTE at the appropriate level rather than always
      extending the page tables all the way down to level 1. Again, this is
      simplified by the fact that we should never encounter existing small
      pages when we're creating a mapping; any old mapping that used the same
      virtual range will have been entirely removed and its obsolete page
      tables freed.
      
      Provide an 'intel_iommu=sp_off' argument on the command line as a
      chicken bit. Not that it should ever be required.
      
      ==
      
      The original commit seen in the iommu-2.6.git was Youquan's
      implementation (and completion) of my own half-baked code which I'd
      typed into an email. Followed by half a dozen subsequent 'fixes'.
      
      I've taken the unusual step of rewriting history and collapsing the
      original commits in order to keep the main history simpler, and make
      life easier for the people who are going to have to backport this to
      older kernels. And also so I can give it a more coherent commit comment
      which (hopefully) gives a better explanation of what's going on.
      
      The original sequence of commits leading to identical code was:
      
      Youquan Song (3):
            intel-iommu: super page support
            intel-iommu: Fix superpage alignment calculation error
            intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()
      
      David Woodhouse (4):
            intel-iommu: Precalculate superpage support for dmar_domain
            intel-iommu: Fix hardware_largepage_caps()
            intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
            intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages
      Signed-off-by: NYouquan Song <youquan.song@intel.com>
      Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
      6dd9a7c7