1. 19 Jul 2017, 1 commit
    • sparc64: Prevent perf from running during super critical sections · fc290a11
      Authored by Rob Gardner
      This fixes another cause of random segfaults and bus errors that may
      occur while running perf with the callgraph option.
      
      Critical sections beginning with spin_lock_irqsave() raise the interrupt
      level to PIL_NORMAL_MAX (14) and intentionally do not block performance
      counter interrupts, which arrive at PIL_NMI (15).
      
      But some sections of code are "super critical" with respect to perf
      because the perf_callchain_user() path accesses user space and may cause
      TLB activity as well as faults as it unwinds the user stack.
      
      One particular critical section occurs in switch_mm:
      
              spin_lock_irqsave(&mm->context.lock, flags);
              ...
              load_secondary_context(mm);
              tsb_context_switch(mm);
              ...
              spin_unlock_irqrestore(&mm->context.lock, flags);
      
      If a perf interrupt arrives in between load_secondary_context() and
      tsb_context_switch(), then perf_callchain_user() could execute with
      the context ID of one process, but with an active TSB for a different
      process. When the user stack is accessed, it is very likely to
      incur a TLB miss, since the h/w context ID has been changed. The TLB
      will then be reloaded with a translation from the TSB for one process,
      but using a context ID for another process. This exposes memory from
      one process to another, and since it is a mapping for stack memory,
      this usually causes the new process to crash quickly.
      
      This super critical section needs more protection than is provided
      by spin_lock_irqsave() since perf interrupts must not be allowed in.
      
      Since __tsb_context_switch already goes through the trouble of
      disabling interrupts completely, we fix this by moving the secondary
      context load down into this better protected region.
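
      As a sketch of the resulting shape (stub names are ours; the real
      code is sparc assembler): both MMU updates now execute inside a
      single region where even PIL 15 perf interrupts are blocked.

        /* Illustrative stubs, not the kernel's interfaces. */
        struct mm_ctx { unsigned long hw_context; void *tsb; };

        static void load_secondary_context(struct mm_ctx *mm) { /* write context reg */ }
        static void load_tsb_config(struct mm_ctx *mm)        { /* write TSB registers */ }
        static void disable_all_interrupts(void)              { /* blocks PIL_NMI too */ }
        static void restore_interrupts(void)                  { }

        /* After the fix a perf NMI can no longer observe the new
         * context ID paired with the old process's TSB. */
        static void tsb_context_switch_sketch(struct mm_ctx *mm)
        {
                disable_all_interrupts();
                load_secondary_context(mm);
                load_tsb_config(mm);
                restore_interrupts();
        }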
      
      Orabug: 25577560
      Signed-off-by: Dave Aldridge <david.j.aldridge@oracle.com>
      Signed-off-by: Rob Gardner <rob.gardner@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 07 Jun 2017, 1 commit
    • sparc64: mm: fix copy_tsb to correctly copy huge page TSBs · 654f4807
      Authored by Mike Kravetz
      When a TSB grows beyond its current capacity, a new TSB is allocated
      and copy_tsb is called to copy entries from the old TSB to the new.
      A hash shift based on page size is used to calculate the index of an
      entry in the TSB.  copy_tsb has hard coded PAGE_SHIFT in these
      calculations.  However, for huge page TSBs the value REAL_HPAGE_SHIFT
      should be used.  As a result, when copy_tsb is called for a huge page
      TSB the entries are placed at the incorrect index in the newly
      allocated TSB.  When doing a hardware table walk, the MMU does not
      match these entries and we end up in the TSB miss handling code.
      This code will then create and write an entry to the correct index
      in the TSB.  We take a performance hit for the table walk miss and
      recreation of these entries.
      
      Pass a new parameter to copy_tsb that is the page size shift to be
      used when copying the TSB.
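
      To illustrate why the shift matters (a standalone demo, ours, not
      kernel code): the same virtual address indexes to different TSB
      slots under the two shifts, so huge page entries copied with
      PAGE_SHIFT land where the hardware walk will never look.

        #include <stdio.h>

        #define PAGE_SHIFT       13   /* 8K base pages on sparc64 */
        #define REAL_HPAGE_SHIFT 22   /* 4M hardware huge page granule */

        static unsigned long tsb_index(unsigned long vaddr,
                                       unsigned long pgsz_shift,
                                       unsigned long nentries)
        {
                return (vaddr >> pgsz_shift) & (nentries - 1);
        }

        int main(void)
        {
                unsigned long va = 0x40400000UL;  /* arbitrary huge-page VA */

                printf("index with PAGE_SHIFT:       %lu\n", tsb_index(va, PAGE_SHIFT, 512));
                printf("index with REAL_HPAGE_SHIFT: %lu\n", tsb_index(va, REAL_HPAGE_SHIFT, 512));
                return 0;
        }
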
      Suggested-by: Anthony Yznaga <anthony.yznaga@oracle.com>
      Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 24 Feb 2017, 1 commit
    • sparc64: Multi-page size support · c7d9f77d
      Authored by Nitin Gupta
      Add support for using multiple hugepage sizes simultaneously
      on mainline.  Currently, support for 256M pages has been added,
      which can be used along with 8M pages.
      
      Page tables are set like this (e.g. for a 256M page):
          VA + (8M * x) -> PA + (8M * x) (sz bit = 256M) where x in [0, 31]
      
      and TSB is set similarly:
          VA + (4M * x) -> PA + (4M * x) (sz bit = 256M) where x in [0, 63]
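
      Worked numbers for the replication above (demo arithmetic, not
      kernel code): one 256M page is fabricated from 32 page-table
      mappings of 8M each and 64 TSB entries of 4M each, matching the
      ranges of x above.

        #include <stdio.h>

        #define MB (1024UL * 1024UL)

        int main(void)
        {
                unsigned long hpage = 256 * MB;

                printf("page table entries per 256M page: %lu\n", hpage / (8 * MB));  /* 32 */
                printf("TSB entries per 256M page:        %lu\n", hpage / (4 * MB));  /* 64 */
                return 0;
        }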
      
      - Testing
      
      Tested on Sonoma (which supports 256M pages) by running stream
      benchmark instances in parallel: one instance uses 8M pages and
      another uses 256M pages, consuming 48G each.
      
      Boot params used:
      
      default_hugepagesz=256M hugepagesz=256M hugepages=300 hugepagesz=8M
      hugepages=10000
      Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 28 Jul 2016, 1 commit
  5. 19 Oct 2014, 1 commit
    • sparc64: Fix corrupted thread fault code. · 84bd6d8b
      Authored by David S. Miller
      Every path that ends up at do_sparc64_fault() must install a valid
      FAULT_CODE_* bitmask in the per-thread fault code byte.
      
      Two paths leading to the label winfix_trampoline (which expects the
      FAULT_CODE_* mask in register %g4) were not doing so:
      
      1) For pre-hypervisor TLB protection violation traps, if we took
         the 'winfix_trampoline' path we wouldn't have %g4 initialized
         with the FAULT_CODE_* value yet.  Resulting in using the
         TLB_TAG_ACCESS register address value instead.
      
      2) In the TSB miss path, when we notice that we are going to use a
         hugepage mapping, but we haven't allocated the hugepage TSB yet, we
         still have to take the window fixup case into consideration and
         in that particular path we leave %g4 not setup properly.
      
      Errors of this sort were largely invisible previously, but after
      commit 4ccb9272 ("sparc64: sun4v TLB
      error power off events") we now have a fault_code mask bit
      (FAULT_CODE_BAD_RA) that triggers due to this bug.
      
      FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS
      (see #1 above) and thus we get seemingly random bus errors triggered
      for user processes.
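
      A compact sketch of the invariant the fix restores (types and mask
      values here are illustrative, not the kernel's): every trap path
      must store a genuine FAULT_CODE_* mask into the per-thread byte
      before reaching do_sparc64_fault(), never a leftover register value.

        /* Illustrative values; the real masks live in kernel headers. */
        #define FAULT_CODE_WRITE  0x01
        #define FAULT_CODE_ITLB   0x04
        #define FAULT_CODE_BAD_RA 0x20

        struct thread_stub { unsigned char fault_code; };

        static void enter_fault_path(struct thread_stub *t, unsigned char code)
        {
                /* The bug: two paths skipped this store, so stale bits
                 * (sometimes including FAULT_CODE_BAD_RA) were later
                 * interpreted as a fault code. */
                t->fault_code = code;
        }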
      
      Fixes: 4ccb9272 ("sparc64: sun4v TLB error power off events")
      Reported-by: Meelis Roos <mroos@linux.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 13 Nov 2013, 1 commit
    • sparc64: Move from 4MB to 8MB huge pages. · 37b3a8ff
      Authored by David S. Miller
      The impetus for this is that we would like to move to 64-bit PMDs and
      PGDs, but that would result in only supporting a 42-bit address space
      with the current page table layout.  It'd be nice to support at least
      43-bits.
      
      The reason we'd end up with only 42-bits after making PMDs and PGDs
      64-bit is that we only use half-page sized PTE tables in order to make
      PMDs line up to 4MB, the hardware huge page size we use.
      
      So what we do here is we make huge pages 8MB, and fabricate them using
      4MB hw TLB entries.
      
      Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in
      places that really need to operate on hardware 4MB pages.
      
      Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT,
      PGD_SHIFT, and the build time CPP test as needed.  Use a CPP test to
      make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up.
      
      This makes the pgtable cache completely unused, so remove the code
      managing it and the state used in mm_context_t.  Now we have fewer
      spinlocks taken in the page table allocation path.
      
      The technique we use to fabricate the 8MB pages is to transfer bit 22
      from the missing virtual address into the PTEs physical address field.
      That takes care of the transparent huge pages case.
      
      For hugetlb, we fill things in at the PTE level and that code already
      puts the sub huge page physical bits into the PTEs, based upon the
      offset, so there is nothing special we need to do.  It all just works
      out.
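
      The bit transfer is simple enough to show numerically (standalone
      demo, ours): each half of an 8MB huge page is mapped by a 4MB
      hardware TLB entry, and the half is selected by copying bit 22 of
      the faulting virtual address into the physical address field.

        #include <stdio.h>

        int main(void)
        {
                unsigned long pte_paddr = 0x80000000UL;  /* PA of the 8MB region */
                unsigned long vaddr     = 0x10400000UL;  /* fault in the upper 4MB half */

                unsigned long hw_paddr = pte_paddr | (vaddr & (1UL << 22));

                printf("hardware 4MB mapping PA: 0x%lx\n", hw_paddr);  /* 0x80400000 */
                return 0;
        }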
      
      So, a small amount of complexity in the THP case, but this code is
      about to get much simpler when we move the 64-bit PMDs as we can move
      away from the fancy 32-bit huge PMD encoding and just put a real PTE
      value in there.
      
      With bug fixes and help from Bob Picco.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. 21 Feb 2013, 1 commit
    • sparc64: Fix tsb_grow() in atomic context. · 0fbebed6
      Authored by David S. Miller
      If our first THP installation for an MM is via the set_pmd_at() done
      during khugepaged's collapsing we'll end up in tsb_grow() trying to do
      a GFP_KERNEL allocation with several locks held.
      
      Simply using GFP_ATOMIC in this situation is not the best option
      because we really can't have this fail, so we'd really like to keep
      this an order 0 GFP_KERNEL allocation if possible.
      
      Also, doing the TSB allocation from khugepaged is a really bad idea
      because we'll allocate it potentially from the wrong NUMA node in that
      context.
      
      So what we do is defer the hugepage TSB allocation until the first TLB
      miss we take on a hugepage.  This is slightly tricky because we have
      to handle two unusual cases:
      
      1) Taking the first hugepage TLB miss in the window trap handler.
         We'll call the winfix_trampoline when that is detected.
      
      2) An initial TSB allocation via TLB miss races with a hugetlb
         fault on another cpu running the same MM.  We handle this by
         unconditionally loading the TSB we see into the current cpu
         even if it's non-NULL at hugetlb_setup time.
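
      A C sketch of the case 2 handling above (stub names and types,
      ours): even if another cpu won the allocation race, the loser must
      still point its own MMU at the winner's TSB.

        struct mm_stub { void *huge_tsb; };

        static void *alloc_tsb(void)        { static char buf[8192]; return buf; }
        static void load_tsb_reg(void *tsb) { (void)tsb; /* program the MMU TSB register */ }

        static void hugetlb_setup_sketch(struct mm_stub *mm)
        {
                if (!mm->huge_tsb)
                        mm->huge_tsb = alloc_tsb();   /* first huge-page TLB miss */

                /* Unconditional: even if another cpu raced us and installed
                 * huge_tsb first, this cpu must still load it, because the
                 * current mm may be running here right now. */
                load_tsb_reg(mm->huge_tsb);
        }
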
      Reported-by: Meelis Roos <mroos@ut.ee>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 09 Oct 2012, 1 commit
    • sparc64: Support transparent huge pages. · 9e695d2e
      Authored by David Miller
      This is relatively easy since PMD's now cover exactly 4MB of memory.
      
      Our PMD entries are 32-bits each, so we use a special encoding.  The
      lowest bit, PMD_ISHUGE, determines the interpretation.  This is possible
      because sparc64's page tables are purely software entities so we can use
      whatever encoding scheme we want.  We just have to make the TLB miss
      assembler page table walkers aware of the layout.
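
      The low-bit check is compact enough to show; a minimal C sketch
      (the bit value follows the text above, everything else is ours):

        #include <stdint.h>

        #define PMD_ISHUGE 0x00000001U   /* lowest bit, per the description above */

        static int pmd_is_huge(uint32_t pmd)
        {
                /* 1: PMD encodes a 4MB huge mapping directly;
                 * 0: PMD points to a table of normal PTEs. */
                return pmd & PMD_ISHUGE;
        }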
      
      set_pmd_at() works much like set_pte_at() but it has to operate in two
      regimes.  In the first we are transitioning to a huge page from a
      table of non-huge PTEs, so we have to queue up TLB flushes
      based upon what mappings are valid in the PTE table.  In the second regime
      we are going from huge-page to non-huge-page, and in that case we need
      only queue up a single TLB flush to push out the huge page mapping.
      
      We still have 5 bits remaining in the huge PMD encoding so we can very
      likely support any new pieces of THP state tracking that might get added
      in the future.
      
      With lots of help from Johannes Weiner.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 20 Feb 2010, 1 commit
    • sparc64: Fix sun4u execute bit check in TSB I-TLB load. · 1f474646
      Authored by David S. Miller
      Thanks to a testcase and report from Brad Spengler:
      
      --------------------
      #include <stdio.h>
      
      typedef int (* _wee)(void);
      
      int main(void)
      {
              char buf[8] = { '\x81', '\xc7', '\xe0', '\x08', '\x81', '\xe8',
                              '\x00', '\x00' };
              _wee wee;
              printf("%p\n", &buf);
              wee = (_wee)&buf;
              wee();
      
              return 0;
      }
      --------------------
      
      TSB I-tlb load code tries to use andcc to check the _PAGE_EXEC_4U bit,
      but that's bit 12 so it gets sign extended all the way up to bit 63
      and the test nearly always passes as a result.
      
      Use sethi to fix the bug.
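
      The arithmetic is easy to reproduce in plain C (demo, ours): a
      13-bit immediate with bit 12 set sign-extends to nearly all ones,
      while sethi materializes the constant in bits 31:10 with no sign
      extension, so the test only passes for a genuinely executable PTE.

        #include <stdio.h>
        #include <stdint.h>

        int main(void)
        {
                uint64_t imm  = 0x1000;                       /* _PAGE_EXEC_4U: bit 12 */
                int64_t  sext = (int64_t)(imm << 51) >> 51;   /* 13-bit sign extension */

                printf("intended mask:  0x%016llx\n", (unsigned long long)imm);
                printf("effective mask: 0x%016llx\n", (unsigned long long)sext);
                /* prints 0xfffffffffffff000: andcc with this matches nearly any PTE */
                return 0;
        }
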
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 05 Dec 2008, 2 commits
  11. 24 Apr 2008, 1 commit
  12. 20 Mar 2007, 1 commit
    • [SPARC64]: store-init needs trailing membar. · 24d559ca
      Authored by David S. Miller
      The manual says that it is required and we actually have crash reports
      where loads see stale data due to not having membars here.
      
      In one case the networking does:
      
      	memset(skb, 0, offsetof(struct sk_buff, truesize));
      
      and then some code later checks skb->nohdr for zero, but it's still
      the value that was there before the memset().
      
      Note that arch/sparc64/lib/xor.S already got this right.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. 01 Jul 2006, 1 commit
  14. 22 Mar 2006, 1 commit
  15. 20 Mar 2006, 25 commits
    • [SPARC64]: Optimized TSB table initialization. · bb8646d8
      Authored by David S. Miller
      We only need to write an invalid tag every 16 bytes,
      so taking advantage of this can save many instructions
      compared to the simple memset() call we make now.
      
      A prefetching version is implemented for sun4u
      and a block-init store version is implemented for Niagara.
      
      The next trick is to be able to perform an init and
      a copy_tsb() in parallel when growing a TSB table.
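
      A C-level sketch of the trick (the real versions are assembler,
      with prefetch on sun4u and block-init stores on Niagara): a TSB
      entry is a 16-byte tag/data pair and only the tag needs to read
      as invalid, so one store per entry replaces a full memset().

        #include <stdint.h>
        #include <stddef.h>

        static void tsb_init_sketch(uint64_t *tsb, size_t nbytes, uint64_t invalid_tag)
        {
                /* Entry = 8-byte tag + 8-byte data.  Only the tag word
                 * must be written; the data word is a don't-care. */
                for (size_t i = 0; i < nbytes / sizeof(uint64_t); i += 2)
                        tsb[i] = invalid_tag;
        }
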
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Fix and re-enable dynamic TSB sizing. · 7a1ac526
      Authored by David S. Miller
      This is good for up to a 50% performance improvement in some test cases.
      The problem has been the race conditions, and hopefully I've plugged
      them all up here.
      
      1) There was a serious race in switch_mm() wrt. lazy TLB
         switching to and from kernel threads.
      
         We could erroneously skip a tsb_context_switch() and thus
         use a stale TSB across a TSB grow event.
      
         There is a big comment now in that function describing
         exactly how it can happen.
      
      2) All code paths that do something with the TSB need to be
         guarded with the mm->context.lock spinlock.  This makes
         page table flushing paths properly synchronize with both
         TSB growing and TLB context changes.
      
      3) TSB growing events are moved to the end of successful fault
         processing.  Previously it was in update_mmu_cache() but
         that is deadlock prone.  At the end of do_sparc64_fault()
         we hold no spinlocks that could deadlock the TSB grow
         sequence.  We also have dropped the address space semaphore.
      
      While we're here, add prefetching to the copy_tsb() routine
      and put it in assembler into the tsb.S file.  This piece of
      code is quite time critical.
      
      There are some small negative side effects to this code which
      can be improved upon.  In particular we grab the mm->context.lock
      even for the tsb insert done by update_mmu_cache() now and that's
      a bit excessive.  We can get rid of that locking, and the same
      lock taking in flush_tsb_user(), by disabling PSTATE_IE around
      the whole operation including the capturing of the tsb pointer
      and tsb_nentries value.  That would work because anyone growing
      the TSB won't free up the old TSB until all cpus respond to the
      TSB change cross call.
      
      I'm not quite so confident in that optimization to put it in
      right now, but eventually we might be able to and the description
      is here for reference.
      
      This code seems very solid now.  It passes several parallel GCC
      bootstrap builds, and our favorite "nut cruncher" stress test which is
      a full "make -j8192" build of a "make allmodconfig" kernel.  That puts
      about 256 processes on each cpu's run queue, makes lots of process cpu
      migrations occur, causes lots of page table and TLB flushing activity,
      incurs many context version number changes, and it swaps the machine
      real far out to disk even though there is 16GB of ram on this test
      system. :-)
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Simplify TSB insert checks. · 74ae9987
      Authored by David S. Miller
      Don't try to avoid putting non-base page sized entries
      into the user TSB.  It actually costs us more to check
      this than it helps.
      
      Eventually we'll have a multiple TSB scheme for user
      processes.  Once a process starts using larger pages,
      we'll allocate and use such a TSB.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Fix _PAGE_EXEC handling. · 45f791eb
      Authored by David S. Miller
      First of all, use the known _PAGE_EXEC_{4U,4V} value instead
      of loading _PAGE_EXEC from memory.  We either know which one
      to use by context, or we can code patch the test.
      
      Next, we need to check executability of a PTE in the generic
      TSB miss handler.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: More TLB/TSB handling fixes. · 8b234274
      Authored by David S. Miller
      The SUN4V convention with non-shared TSBs is that the context
      bit of the TAG is clear.  So we have to choose an "invalid"
      bit and initialize new TSBs appropriately.  Otherwise a zero
      TAG looks "valid".
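
      A sketch of the resulting initialization (the bit position here is
      illustrative; the point is only that the chosen bit can never
      appear in a real tag, so a freshly initialized entry never matches):

        #include <stdint.h>
        #include <stddef.h>

        #define TSB_TAG_INVALID_BIT 46   /* illustrative never-valid tag bit */

        static void tsb_set_all_invalid(uint64_t *tsb, size_t nentries)
        {
                const uint64_t invalid = (uint64_t)1 << TSB_TAG_INVALID_BIT;

                for (size_t i = 0; i < nentries; i++)
                        tsb[i * 2] = invalid;   /* tag word of each 16-byte entry */
        }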
      
      Make sure, for the window fixup cases, that we use the right
      global registers and that we don't potentially trample on
      the live global registers in etrap/rtrap handling (%g2 and
      %g6) and that we put the missing virtual address properly
      in %g5.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Fix some SUN4V TLB handling bugs. · 6c8927c9
      Authored by David S. Miller
      1) Add error return checking for TLB load hypervisor
         calls.
      
      2) Don't fallthru to dtlb tsb miss handler from itlb tsb
         miss handler, oops.
      
      3) On window fixups, propagate fault information to fixup
         handler correctly.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Do not write garbage into %pstate in tsb_context_switch(). · a7b31bac
      Authored by David S. Miller
      For SUN4V, we were clobbering %o5 to do the hypervisor call.
      This clobbers the saved %pstate value and we end up writing
      garbage into that register as a result.  Oops.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Deal with PTE layout differences in SUN4V. · c4bce90e
      Authored by David S. Miller
      Yes, you heard it right, they changed the PTE layout for
      SUN4V.  Ho hum...
      
      This is the simple and inefficient way to support this.
      It'll get optimized, don't worry.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Simplify sun4v TLB handling using macros. · 36a68e77
      Authored by David S. Miller
      There was also a bug in sun4v_itlb_miss, it loaded the
      MMU Fault Status base into %g3 instead of %g2.
      
      This pointed out a fast path for TSB miss processing,
      since we have %g2 with the MMU Fault Status base, we
      can use that to quickly load up the PGD phys address.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Fix hypervisor call arg passing. · 164c220f
      Authored by David S. Miller
      Function goes in %o5, args go in %o0 --> %o5.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • 618e9ed9
    • [SPARC64]: Implement sun4v TSB miss handlers. · aa9143b9
      Authored by David S. Miller
      When we register a TSB with the hypervisor, so that it or hardware can
      handle TLB misses and do the TSB walk for us, the hypervisor traps
      down to these handlers when it incurs a TSB miss.
      
      Processing is simple, we load the missing virtual address and context,
      and do a full page table walk.
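
      A sketch of such a walk, collapsed to two levels for brevity
      (layout and VA split are illustrative, ours; the real walker is
      hand-written assembler):

        #include <stdint.h>

        typedef uint64_t pte_t;

        struct mm_stub {
                pte_t **pgd;   /* top level of the page table tree */
        };

        static pte_t walk_sketch(struct mm_stub *mm, unsigned long va)
        {
                pte_t *ptes = mm->pgd[(va >> 23) & 0x3ff];   /* illustrative VA split */
                if (!ptes)
                        return 0;                            /* no translation: real fault */
                return ptes[(va >> 13) & 0x3ff];             /* 8K base page PTE */
        }
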
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Initial sun4v TLB miss handling infrastructure. · d257d5da
      Authored by David S. Miller
      Things are a little tricky because, unlike sun4u, we have
      to:
      
      1) do a hypervisor trap to do the TLB load.
      2) do the TSB lookup calculations by hand
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Sanitize %pstate writes for sun4v. · 45fec05f
      Authored by David S. Miller
      If we're just switching between different alternate global
      sets, nop it out on sun4v.  Also, get rid of all of the
      alternate global save/restore in the OBP CIF trampoline code.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Refine register window trap handling. · 314ef685
      Authored by David S. Miller
      When saving and restoring trap state, do the window spill/fill
      handling inline so that we never trap deeper than 2 trap levels.
      This is important for chips like Niagara.
      
      The window fixup code is massively simplified, and many more
      improvements are now possible.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Add explicit register args to trap state loading macros. · ffe483d5
      Authored by David S. Miller
      This, as well as making the code cleaner, allows a simplification in
      the TSB miss handling path.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Access TSB with physical addresses when possible. · 517af332
      Authored by David S. Miller
      This way we don't need to lock the TSB into the TLB.
      The trick is that every TSB load/store is registered into
      a special instruction patch section.  The default uses
      virtual addresses, and the patch instructions use physical
      address load/stores.
      
      We can't do this on all chips because only cheetah+ and later
      have the physical variant of the atomic quad load.
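
      A sketch of the patch-section mechanism (record layout is ours):
      each TSB access instruction registers itself along with a
      physical-ASI replacement, and boot code rewrites the instructions
      in place on cpus that have the physical atomic quad load.

        #include <stdint.h>

        struct tsb_insn_patch {
                uint32_t *insn;        /* location of the virtual-address access */
                uint32_t  phys_insn;   /* replacement using the physical ASI */
        };

        static void apply_tsb_patches(struct tsb_insn_patch *p, int n, int have_phys_quad)
        {
                if (!have_phys_quad)
                        return;                          /* pre-cheetah+: keep virtual accesses */

                for (int i = 0; i < n; i++)
                        p[i].insn[0] = p[i].phys_insn;   /* plus an I-cache flush in reality */
        }
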
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Fix too early reference to %g6 · 9bc657b2
      Authored by David S. Miller
      %g6 is not necessarily set to current_thread_info()
      at sparc64_realfault_common.  So store the fault
      code and address after we invoke etrap and %g6 is
      properly set up.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Kill PROM locked TLB entry preservation code. · 3487d1d4
      Authored by David S. Miller
      It is totally unnecessary complexity.  After we take over
      the trap table, we handle all PROM tlb misses fully.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Use sparc64_highest_unlocked_tlb_ent in __tsb_context_switch() · 6b6d0172
      Authored by David S. Miller
      Instead of an ugly hard-coded value.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • b70c0fa1
    • [SPARC64]: Add infrastructure for dynamic TSB sizing. · 98c5584c
      Authored by David S. Miller
      This also cleans up tsb_context_switch().  The assembler
      routine is now __tsb_context_switch() and the former is
      an inline function that picks out the bits from the mm_struct
      and passes them into the assembler code as arguments.
      
      setup_tsb_params() computes the locked TLB entry to map the
      TSB.  Later when we support using the physical address quad
      load instructions of Cheetah+ and later, we'll simply use
      the physical address for the TSB register value and set
      the map virtual and PTE both to zero.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: TSB refinements. · 09f94287
      Authored by David S. Miller
      Move {init_new,destroy}_context() out of line.
      
      Do not put huge pages into the TSB, only base page size translations.
      There are some clever things we could do here, but for now let's be
      correct instead of fancy.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SPARC64]: Eliminate all usage of hard-coded trap globals. · 56fb4df6
      Authored by David S. Miller
      UltraSPARC has special sets of global registers which are switched to
      for certain trap types.  There is one set for MMU related traps, one
      set for Interrupt Vector processing, and another set (called the
      Alternate globals) for all other trap types.
      
      For what seems like forever we've hard coded the values in some of
      these trap registers.  Some examples include:
      
      1) Interrupt Vector global %g6 holds the current processor's interrupt
         work struct where received interrupts are managed for IRQ handler
         dispatch.
      
      2) MMU global %g7 holds the base of the page tables of the currently
         active address space.
      
      3) Alternate global %g6 held the current_thread_info() value.
      
      Such hardcoding has resulted in some serious issues in many areas.
      There are some code sequences where having another register available
      would help clean up the implementation.  Taking traps such as
      cross-calls from the OBP firmware requires some trick code sequences
      wherein we have to save away and restore all of the special sets of
      global registers when we enter/exit OBP.
      
      We were also using the IMMU TSB register on SMP to hold the per-cpu
      area base address, which doesn't work any longer now that we actually
      use the TSB facility of the cpu.
      
      The implementation is pretty straightforward.  One tricky bit is
      getting the current processor ID as that is different on different cpu
      variants.  We use a stub with a fancy calling convention which we
      patch at boot time.  The calling convention is that the stub is
      branched to and the (PC - 4) to return to is in register %g1.  The cpu
      number is left in %g6.  This stub can be invoked by using the
      __GET_CPUID macro.
      
      We use an array of per-cpu trap state to store the current thread and
      physical address of the current address space's page tables.  The
      TRAP_LOAD_THREAD_REG loads %g6 with the current thread from this
      table, it uses __GET_CPUID and also clobbers %g1.
      
      TRAP_LOAD_IRQ_WORK is used by the interrupt vector processing to load
      the current processor's IRQ software state into %g6.  It also uses
      __GET_CPUID and clobbers %g1.
      
      Finally, TRAP_LOAD_PGD_PHYS loads the physical address base of the
      current address space's page tables into %g7, it clobbers %g1 and uses
      __GET_CPUID.
      
      Many refinements are possible, as well as some tuning, with this stuff
      in place.
      Signed-off-by: David S. Miller <davem@davemloft.net>