- 19 7月, 2017 1 次提交
-
-
由 Rob Gardner 提交于
This fixes another cause of random segfaults and bus errors that may occur while running perf with the callgraph option. Critical sections beginning with spin_lock_irqsave() raise the interrupt level to PIL_NORMAL_MAX (14) and intentionally do not block performance counter interrupts, which arrive at PIL_NMI (15). But some sections of code are "super critical" with respect to perf because the perf_callchain_user() path accesses user space and may cause TLB activity as well as faults as it unwinds the user stack. One particular critical section occurs in switch_mm: spin_lock_irqsave(&mm->context.lock, flags); ... load_secondary_context(mm); tsb_context_switch(mm); ... spin_unlock_irqrestore(&mm->context.lock, flags); If a perf interrupt arrives in between load_secondary_context() and tsb_context_switch(), then perf_callchain_user() could execute with the context ID of one process, but with an active TSB for a different process. When the user stack is accessed, it is very likely to incur a TLB miss, since the h/w context ID has been changed. The TLB will then be reloaded with a translation from the TSB for one process, but using a context ID for another process. This exposes memory from one process to another, and since it is a mapping for stack memory, this usually causes the new process to crash quickly. This super critical section needs more protection than is provided by spin_lock_irqsave() since perf interrupts must not be allowed in. Since __tsb_context_switch already goes through the trouble of disabling interrupts completely, we fix this by moving the secondary context load down into this better protected region. Orabug: 25577560 Signed-off-by: NDave Aldridge <david.j.aldridge@oracle.com> Signed-off-by: NRob Gardner <rob.gardner@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 07 6月, 2017 1 次提交
-
-
由 Mike Kravetz 提交于
When a TSB grows beyond its current capacity, a new TSB is allocated and copy_tsb is called to copy entries from the old TSB to the new. A hash shift based on page size is used to calculate the index of an entry in the TSB. copy_tsb has hard coded PAGE_SHIFT in these calculations. However, for huge page TSBs the value REAL_HPAGE_SHIFT should be used. As a result, when copy_tsb is called for a huge page TSB the entries are placed at the incorrect index in the newly allocated TSB. When doing hardware table walk, the MMU does not match these entries and we end up in the TSB miss handling code. This code will then create and write an entry to the correct index in the TSB. We take a performance hit for the table walk miss and recreation of these entries. Pass a new parameter to copy_tsb that is the page size shift to be used when copying the TSB. Suggested-by: NAnthony Yznaga <anthony.yznaga@oracle.com> Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 2月, 2017 1 次提交
-
-
由 Nitin Gupta 提交于
Add support for using multiple hugepage sizes simultaneously on mainline. Currently, support for 256M has been added which can be used along with 8M pages. Page tables are set like this (e.g. for 256M page): VA + (8M * x) -> PA + (8M * x) (sz bit = 256M) where x in [0, 31] and TSB is set similarly: VA + (4M * x) -> PA + (4M * x) (sz bit = 256M) where x in [0, 63] - Testing Tested on Sonoma (which supports 256M pages) by running stream benchmark instances in parallel: one instance uses 8M pages and another uses 256M pages, consuming 48G each. Boot params used: default_hugepagesz=256M hugepagesz=256M hugepages=300 hugepagesz=8M hugepages=10000 Signed-off-by: NNitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 28 7月, 2016 1 次提交
-
-
由 David S. Miller 提交于
On pre-Niagara systems, we fetch the fault address on data TLB exceptions from the TLB_TAG_ACCESS register. But this register also contains the context ID assosciated with the fault in the low 13 bits of the register value. This propagates into current_thread_info()->fault_address and can cause trouble later on. So clear the low 13-bits out of the TLB_TAG_ACCESS value in the cases where it matters. Reported-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 19 10月, 2014 1 次提交
-
-
由 David S. Miller 提交于
Every path that ends up at do_sparc64_fault() must install a valid FAULT_CODE_* bitmask in the per-thread fault code byte. Two paths leading to the label winfix_trampoline (which expects the FAULT_CODE_* mask in register %g4) were not doing so: 1) For pre-hypervisor TLB protection violation traps, if we took the 'winfix_trampoline' path we wouldn't have %g4 initialized with the FAULT_CODE_* value yet. Resulting in using the TLB_TAG_ACCESS register address value instead. 2) In the TSB miss path, when we notice that we are going to use a hugepage mapping, but we haven't allocated the hugepage TSB yet, we still have to take the window fixup case into consideration and in that particular path we leave %g4 not setup properly. Errors on this sort were largely invisible previously, but after commit 4ccb9272 ("sparc64: sun4v TLB error power off events") we now have a fault_code mask bit (FAULT_CODE_BAD_RA) that triggers due to this bug. FAULT_CODE_BAD_RA triggers because this bit is set in TLB_TAG_ACCESS (see #1 above) and thus we get seemingly random bus errors triggered for user processes. Fixes: 4ccb9272 ("sparc64: sun4v TLB error power off events") Reported-by: NMeelis Roos <mroos@linux.ee> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 11月, 2013 1 次提交
-
-
由 David S. Miller 提交于
The impetus for this is that we would like to move to 64-bit PMDs and PGDs, but that would result in only supporting a 42-bit address space with the current page table layout. It'd be nice to support at least 43-bits. The reason we'd end up with only 42-bits after making PMDs and PGDs 64-bit is that we only use half-page sized PTE tables in order to make PMDs line up to 4MB, the hardware huge page size we use. So what we do here is we make huge pages 8MB, and fabricate them using 4MB hw TLB entries. Facilitate this by providing a "REAL_HPAGE_SHIFT" which is used in places that really need to operate on hardware 4MB pages. Use full pages (512 entries) for PTE tables, and adjust PMD_SHIFT, PGD_SHIFT, and the build time CPP test as needed. Use a CPP test to make sure REAL_HPAGE_SHIFT and the _PAGE_SZHUGE_* we use match up. This makes the pgtable cache completely unused, so remove the code managing it and the state used in mm_context_t. Now we have less spinlocks taken in the page table allocation path. The technique we use to fabricate the 8MB pages is to transfer bit 22 from the missing virtual address into the PTEs physical address field. That takes care of the transparent huge pages case. For hugetlb, we fill things in at the PTE level and that code already puts the sub huge page physical bits into the PTEs, based upon the offset, so there is nothing special we need to do. It all just works out. So, a small amount of complexity in the THP case, but this code is about to get much simpler when we move the 64-bit PMDs as we can move away from the fancy 32-bit huge PMD encoding and just put a real PTE value in there. With bug fixes and help from Bob Picco. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 21 2月, 2013 1 次提交
-
-
由 David S. Miller 提交于
If our first THP installation for an MM is via the set_pmd_at() done during khugepaged's collapsing we'll end up in tsb_grow() trying to do a GFP_KERNEL allocation with several locks held. Simply using GFP_ATOMIC in this situation is not the best option because we really can't have this fail, so we'd really like to keep this an order 0 GFP_KERNEL allocation if possible. Also, doing the TSB allocation from khugepaged is a really bad idea because we'll allocate it potentially from the wrong NUMA node in that context. So what we do is defer the hugepage TSB allocation until the first TLB miss we take on a hugepage. This is slightly tricky because we have to handle two unusual cases: 1) Taking the first hugepage TLB miss in the window trap handler. We'll call the winfix_trampoline when that is detected. 2) An initial TSB allocation via TLB miss races with a hugetlb fault on another cpu running the same MM. We handle this by unconditionally loading the TSB we see into the current cpu even if it's non-NULL at hugetlb_setup time. Reported-by: NMeelis Roos <mroos@ut.ee> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 09 10月, 2012 1 次提交
-
-
由 David Miller 提交于
This is relatively easy since PMD's now cover exactly 4MB of memory. Our PMD entries are 32-bits each, so we use a special encoding. The lowest bit, PMD_ISHUGE, determines the interpretation. This is possible because sparc64's page tables are purely software entities so we can use whatever encoding scheme we want. We just have to make the TLB miss assembler page table walkers aware of the layout. set_pmd_at() works much like set_pte_at() but it has to operate in two page from a table of non-huge PTEs, so we have to queue up TLB flushes based upon what mappings are valid in the PTE table. In the second regime we are going from huge-page to non-huge-page, and in that case we need only queue up a single TLB flush to push out the huge page mapping. We still have 5 bits remaining in the huge PMD encoding so we can very likely support any new pieces of THP state tracking that might get added in the future. With lots of help from Johannes Weiner. Signed-off-by: NDavid S. Miller <davem@davemloft.net> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 2月, 2010 1 次提交
-
-
由 David S. Miller 提交于
Thanks to testcase and report from Brad Spengler: -------------------- #include <stdio.h> typedef int (* _wee)(void); int main(void) { char buf[8] = { '\x81', '\xc7', '\xe0', '\x08', '\x81', '\xe8', '\x00', '\x00' }; _wee wee; printf("%p\n", &buf); wee = (_wee)&buf; wee(); return 0; } -------------------- TSB I-tlb load code tries to use andcc to check the _PAGE_EXEC_4U bit, but that's bit 12 so it gets sign extended all the way up to bit 63 and the test nearly always passes as a result. Use sethi to fix the bug. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 05 12月, 2008 2 次提交
-
-
由 Sam Ravnborg 提交于
o Move all files from sparc64/kernel/ to sparc/kernel - rename as appropriate o Update sparc/Makefile to the changes o Update sparc/kernel/Makefile to include the sparc64 files NOTE: This commit changes link order on sparc64! Link order had to change for either of sparc32 and sparc64. And assuming sparc64 see more testing than sparc32 change link order on sparc64 where issues will be caught faster. Signed-off-by: NSam Ravnborg <sam@ravnborg.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
The kernel always executes in the TSO memory model now, so none of this stuff is necessary any more. With helpful feedback from Nick Piggin. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 24 4月, 2008 1 次提交
-
-
由 David S. Miller 提交于
Now that we indicate the "restart system call" in the trap type field of pt_regs->magic, we don't need to set the %l6 boolean in all of the trap return paths. And we therefore don't need to pass it to do_notify_resume(). Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 3月, 2007 1 次提交
-
-
由 David S. Miller 提交于
The manual says that it is required and we actually have crash reports where loads see stale data due to not having membars here. In one case the networking does: memset(skb, 0, offsetof(struct sk_buff, truesize)); and then some code later checks skb->nohdr for zero, but it's still the value that was there before the memset(). Note that arch/sparc64/lib/xor.S already got this right. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 01 7月, 2006 1 次提交
-
-
由 Jörn Engel 提交于
Signed-off-by: NJörn Engel <joern@wohnheim.fh-wedel.de> Signed-off-by: NAdrian Bunk <bunk@stusta.de>
-
- 22 3月, 2006 1 次提交
-
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 20 3月, 2006 25 次提交
-
-
由 David S. Miller 提交于
We only need to write an invalid tag every 16 bytes, so taking advantage of this can save many instructions compared to the simple memset() call we make now. A prefetching implementation is implemented for sun4u and a block-init store version if implemented for Niagara. The next trick is to be able to perform an init and a copy_tsb() in parallel when growing a TSB table. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This is good for up to %50 performance improvement of some test cases. The problem has been the race conditions, and hopefully I've plugged them all up here. 1) There was a serious race in switch_mm() wrt. lazy TLB switching to and from kernel threads. We could erroneously skip a tsb_context_switch() and thus use a stale TSB across a TSB grow event. There is a big comment now in that function describing exactly how it can happen. 2) All code paths that do something with the TSB need to be guarded with the mm->context.lock spinlock. This makes page table flushing paths properly synchronize with both TSB growing and TLB context changes. 3) TSB growing events are moved to the end of successful fault processing. Previously it was in update_mmu_cache() but that is deadlock prone. At the end of do_sparc64_fault() we hold no spinlocks that could deadlock the TSB grow sequence. We also have dropped the address space semaphore. While we're here, add prefetching to the copy_tsb() routine and put it in assembler into the tsb.S file. This piece of code is quite time critical. There are some small negative side effects to this code which can be improved upon. In particular we grab the mm->context.lock even for the tsb insert done by update_mmu_cache() now and that's a bit excessive. We can get rid of that locking, and the same lock taking in flush_tsb_user(), by disabling PSTATE_IE around the whole operation including the capturing of the tsb pointer and tsb_nentries value. That would work because anyone growing the TSB won't free up the old TSB until all cpus respond to the TSB change cross call. I'm not quite so confident in that optimization to put it in right now, but eventually we might be able to and the description is here for reference. This code seems very solid now. It passes several parallel GCC bootstrap builds, and our favorite "nut cruncher" stress test which is a full "make -j8192" build of a "make allmodconfig" kernel. That puts about 256 processes on each cpu's run queue, makes lots of process cpu migrations occur, causes lots of page table and TLB flushing activity, incurs many context version number changes, and it swaps the machine real far out to disk even though there is 16GB of ram on this test system. :-) Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Don't try to avoid putting non-base page sized entries into the user TSB. It actually costs us more to check this than it helps. Eventually we'll have a multiple TSB scheme for user processes. Once a process starts using larger pages, we'll allocate and use such a TSB. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
First of all, use the known _PAGE_EXEC_{4U,4V} value instead of loading _PAGE_EXEC from memory. We either know which one to use by context, or we can code patch the test. Next, we need to check executability of a PTE in the generic TSB miss handler. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
The SUN4V convention with non-shared TSBs is that the context bit of the TAG is clear. So we have to choose an "invalid" bit and initialize new TSBs appropriately. Otherwise a zero TAG looks "valid". Make sure, for the window fixup cases, that we use the right global registers and that we don't potentially trample on the live global registers in etrap/rtrap handling (%g2 and %g6) and that we put the missing virtual address properly in %g5. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
1) Add error return checking for TLB load hypervisor calls. 2) Don't fallthru to dtlb tsb miss handler from itlb tsb miss handler, oops. 3) On window fixups, propagate fault information to fixup handler correctly. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
For SUN4V, we were clobbering %o5 to do the hypervisor call. This clobbers the saved %pstate value and we end up writing garbage into that register as a result. Oops. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Yes, you heard it right, they changed the PTE layout for SUN4V. Ho hum... This is the simple and inefficient way to support this. It'll get optimized, don't worry. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
There was also a bug in sun4v_itlb_miss, it loaded the MMU Fault Status base into %g3 instead of %g2. This pointed out a fast path for TSB miss processing, since we have %g2 with the MMU Fault Status base, we can use that to quickly load up the PGD phys address. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Function goes in %o5, args go in %o0 --> %o5. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
When we register a TSB with the hypervisor, so that it or hardware can handle TLB misses and do the TSB walk for us, the hypervisor traps down to these trap when it incurs a TSB miss. Processing is simple, we load the missing virtual address and context, and do a full page table walk. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Things are a little tricky because, unlike sun4u, we have to: 1) do a hypervisor trap to do the TLB load. 2) do the TSB lookup calculations by hand Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
If we're just switching between different alternate global sets, nop it out on sun4v. Also, get rid of all of the alternate global save/restore in the OBP CIF trampoline code. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
When saving and restoing trap state, do the window spill/fill handling inline so that we never trap deeper than 2 trap levels. This is important for chips like Niagara. The window fixup code is massively simplified, and many more improvements are now possible. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This, as well as making the code cleaner, allows a simplification in the TSB miss handling path. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This way we don't need to lock the TSB into the TLB. The trick is that every TSB load/store is registered into a special instruction patch section. The default uses virtual addresses, and the patch instructions use physical address load/stores. We can't do this on all chips because only cheetah+ and later have the physical variant of the atomic quad load. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
%g6 is not necessarily set to current_thread_info() at sparc64_realfault_common. So store the fault code and address after we invoke etrap and %g6 is properly set up. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
It is totally unnecessary complexity. After we take over the trap table, we handle all PROM tlb misses fully. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Instead of ugly hard-coded value. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
This also cleans up tsb_context_switch(). The assembler routine is now __tsb_context_switch() and the former is an inline function that picks out the bits from the mm_struct and passes it into the assembler code as arguments. setup_tsb_parms() computes the locked TLB entry to map the TSB. Later when we support using the physical address quad load instructions of Cheetah+ and later, we'll simply use the physical address for the TSB register value and set the map virtual and PTE both to zero. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
Move {init_new,destroy}_context() out of line. Do not put huge pages into the TSB, only base page size translations. There are some clever things we could do here, but for now let's be correct instead of fancy. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 David S. Miller 提交于
UltraSPARC has special sets of global registers which are switched to for certain trap types. There is one set for MMU related traps, one set of Interrupt Vector processing, and another set (called the Alternate globals) for all other trap types. For what seems like forever we've hard coded the values in some of these trap registers. Some examples include: 1) Interrupt Vector global %g6 holds current processors interrupt work struct where received interrupts are managed for IRQ handler dispatch. 2) MMU global %g7 holds the base of the page tables of the currently active address space. 3) Alternate global %g6 held the current_thread_info() value. Such hardcoding has resulted in some serious issues in many areas. There are some code sequences where having another register available would help clean up the implementation. Taking traps such as cross-calls from the OBP firmware requires some trick code sequences wherein we have to save away and restore all of the special sets of global registers when we enter/exit OBP. We were also using the IMMU TSB register on SMP to hold the per-cpu area base address, which doesn't work any longer now that we actually use the TSB facility of the cpu. The implementation is pretty straight forward. One tricky bit is getting the current processor ID as that is different on different cpu variants. We use a stub with a fancy calling convention which we patch at boot time. The calling convention is that the stub is branched to and the (PC - 4) to return to is in register %g1. The cpu number is left in %g6. This stub can be invoked by using the __GET_CPUID macro. We use an array of per-cpu trap state to store the current thread and physical address of the current address space's page tables. The TRAP_LOAD_THREAD_REG loads %g6 with the current thread from this table, it uses __GET_CPUID and also clobbers %g1. TRAP_LOAD_IRQ_WORK is used by the interrupt vector processing to load the current processor's IRQ software state into %g6. It also uses __GET_CPUID and clobbers %g1. Finally, TRAP_LOAD_PGD_PHYS loads the physical address base of the current address space's page tables into %g7, it clobbers %g1 and uses __GET_CPUID. Many refinements are possible, as well as some tuning, with this stuff in place. Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-