- 05 8月, 2010 2 次提交
-
-
由 Benjamin Herrenschmidt 提交于
This introduce memblock.current_limit which is used to limit allocations from memblock_alloc() or memblock_alloc_base(..., MEMBLOCK_ALLOC_ACCESSIBLE). The old MEMBLOCK_ALLOC_ANYWHERE changes value from 0 to ~(u64)0 and can still be used with memblock_alloc_base() to allocate really anywhere. It is -no-longer- cropped to MEMBLOCK_REAL_LIMIT which disappears. Note to archs: I'm leaving the default limit to MEMBLOCK_ALLOC_ANYWHERE. I strongly recommend that you ensure that you set an appropriate limit during boot in order to guarantee that an memblock_alloc() at any time results in something that is accessible with a simple __va(). The reason is that a subsequent patch will introduce the ability for the array to resize itself by reallocating itself. The MEMBLOCK core will honor the current limit when performing those allocations. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Benjamin Herrenschmidt 提交于
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 04 8月, 2010 2 次提交
-
-
由 Benjamin Herrenschmidt 提交于
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Benjamin Herrenschmidt 提交于
Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 23 7月, 2010 2 次提交
-
-
由 Benjamin Herrenschmidt 提交于
This adds some debug output to our MMU hash code to print out some useful debug data if the hypervisor refuses the insertion (which should normally never happen). Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> ---
-
由 Benjamin Herrenschmidt 提交于
Instead of adding _PAGE_PRESENT to the access permission mask in each low level routine independently, we add it once from hash_page(). We also move the preliminary access check (the racy one before the PTE is locked) up so it applies to the huge page case. This duplicates code in __hash_page_huge() which we'll remove in a subsequent patch to fix a race in there. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 14 7月, 2010 1 次提交
-
-
由 Yinghai Lu 提交于
via following scripts FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/lmb/memblock/g' \ -e 's/LMB/MEMBLOCK/g' \ $FILES for N in $(find . -name lmb.[ch]); do M=$(echo $N | sed 's/lmb/memblock/g') mv $N $M done and remove some wrong change like lmbench and dlmb etc. also move memblock.c from lib/ to mm/ Suggested-by: NIngo Molnar <mingo@elte.hu> Acked-by: N"H. Peter Anvin" <hpa@zytor.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NYinghai Lu <yinghai@kernel.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 18 12月, 2009 2 次提交
-
-
由 David Gibson 提交于
Commit d28513bc ("Fix bug in pagetable cache cleanup with CONFIG_PPC_SUBPAGE_PROT"), itself a fix for breakage caused by an earlier clean up patch of mine, contains a stupid bug. I changed the parameters of the subpage_protection() function, but failed to update one of the callers. This patch fixes it, and replaces a void * with a typed pointer so that the compiler will warn on such an error in future. Signed-off-by: NDavid Gibson <dwg@au1.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Sachin P. Sant 提交于
This time without the funny characters. Fix following build errors generated with DEBUG=1 cc1: warnings being treated as errors arch/powerpc/mm/hash_utils_64.c: In function 'htab_dt_scan_page_sizes': arch/powerpc/mm/hash_utils_64.c:343: error: format '%04x' expects type 'unsigned int', but argument 4 has type 'long unsigned int' arch/powerpc/mm/hash_utils_64.c:343: error: format '%08x' expects type 'unsigned int', but argument 5 has type 'long unsigned int' arch/powerpc/mm/hash_utils_64.c: In function 'htab_initialize': arch/powerpc/mm/hash_utils_64.c:666: error: format '%x' expects type 'unsigned int', but argument 4 has type 'long unsigned int' ... SNIP ... Signed-off-by: NSachin Sant <sachinp@in.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 08 12月, 2009 1 次提交
-
-
由 David Gibson 提交于
Commit a0668cdc cleans up the handling of kmem_caches for allocating various levels of pagetables. Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to the latter's cleverly hidden technique of adding some extra allocation space to the top level page directory to store the extra information it needs. Since that extra allocation really doesn't fit into the cleaned up page directory allocating scheme, this patch alters CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct subpage_prot_table as part of the mm_context_t. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 02 12月, 2009 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
This reverts commit c045256d. It breaks build when CONFIG_PPC_SUBPAGE_PROT is not set. I will commit a fixed version separately Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 27 11月, 2009 1 次提交
-
-
由 David Gibson 提交于
Commit a0668cdc cleans up the handling of kmem_caches for allocating various levels of pagetables. Unfortunately, it conflicts badly with CONFIG_PPC_SUBPAGE_PROT, due to the latter's cleverly hidden technique of adding some extra allocation space to the top level page directory to store the extra information it needs. Since that extra allocation really doesn't fit into the cleaned up page directory allocating scheme, this patch alters CONFIG_PPC_SUBPAGE_PROT to instead allocate its struct subpage_prot_table as part of the mm_context_t. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 05 11月, 2009 1 次提交
-
-
由 Alexander Graf 提交于
We want to be able to build KVM as a module. To enable us doing so, we need some more exports from core Linux parts. This patch exports all functions and variables that are required for KVM. Signed-off-by: NAlexander Graf <agraf@suse.de> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 30 10月, 2009 3 次提交
-
-
由 David Gibson 提交于
The hugepage arch code provides a number of hook functions/macros which mirror the functionality of various normal page pte access functions. Various changes in the normal page accessors (in particular BenH's recent changes to the handling of lazy icache flushing and PAGE_EXEC) have caused the hugepage versions to get out of sync with the originals. In some cases, this is a bug, at least on some MMU types. One of the reasons that some hooks were not identical to the normal page versions, is that the fact we're dealing with a hugepage needed to be passed down do use the correct dcache-icache flush function. This patch makes the main flush_dcache_icache_page() function hugepage aware (by checking for the PageCompound flag). That in turn means we can make set_huge_pte_at() just a call to set_pte_at() bringing it back into sync. As a bonus, this lets us remove the hash_huge_page_do_lazy_icache() function, replacing it with a call to the hash_page_do_lazy_icache() function it was based on. Some other hugepage pte access hooks - huge_ptep_get_and_clear() and huge_ptep_clear_flush() - are not so easily unified, but this patch at least brings them back into sync with the current versions of the corresponding normal page functions. Signed-off-by: NDavid Gibson <dwg@au1.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 David Gibson 提交于
This patch simplifies the logic used to initialize hugepages on powerpc. The somewhat oddly named set_huge_psize() is renamed to add_huge_page_size() and now does all necessary verification of whether it's given a valid hugepage sizes (instead of just some) and instantiates the generic hstate structure (but no more). hugetlbpage_init() now steps through the available pagesizes, checks if they're valid for hugepages by calling add_huge_page_size() and initializes the kmem_caches for the hugepage pagetables. This means we can now eliminate the mmu_huge_psizes array, since we no longer need to pass the sizing information for the pagetable caches from set_huge_psize() into hugetlbpage_init() Determination of the default huge page size is also moved from the hash code into the general hugepage code. Signed-off-by: NDavid Gibson <dwg@au1.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 David Gibson 提交于
Currently each available hugepage size uses a slightly different pagetable layout: that is, the bottem level table of pointers to hugepages is a different size, and may branch off from the normal page tables at a different level. Every hugepage aware path that needs to walk the pagetables must therefore look up the hugepage size from the slice info first, and work out the correct way to walk the pagetables accordingly. Future hardware is likely to add more possible hugepage sizes, more layout options and more mess. This patch, therefore reworks the handling of hugepage pagetables to reduce this complexity. In the new scheme, instead of having to consult the slice mask, pagetable walking code can check a flag in the PGD/PUD/PMD entries to see where to branch off to hugepage pagetables, and the entry also contains the information (eseentially hugepage shift) necessary to then interpret that table without recourse to the slice mask. This scheme can be extended neatly to handle multiple levels of self-describing "special" hugepage pagetables, although for now we assume only one level exists. This approach means that only the pagetable allocation path needs to know how the pagetables should be set out. All other (hugepage) pagetable walking paths can just interpret the structure as they go. There already was a flag bit in PGD/PUD/PMD entries for hugepage directory pointers, but it was only used for debug. We alter that flag bit to instead be a 0 in the MSB to indicate a hugepage pagetable pointer (normally it would be 1 since the pointer lies in the linear mapping). This means that asm pagetable walking can test for (and punt on) hugepage pointers with the same test that checks for unpopulated page directory entries (beq becomes bge), since hugepage pointers will always be positive, and normal pointers always negative. While we're at it, we get rid of the confusing (and grep defeating) #defining of hugepte_shift to be the same thing as mmu_huge_psizes. Signed-off-by: NDavid Gibson <dwg@au1.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 22 4月, 2009 1 次提交
-
-
由 Michael Ellerman 提交于
early_init_mmu_secondary() is called at CPU hotplug time, so it must be marked as __cpuinit, not __init. Caused by 757c74d2 ("powerpc/mm: Introduce early_init_mmu() on 64-bit"). Tested-by: NSachin Sant <sachinp@in.ibm.com> Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 24 3月, 2009 2 次提交
-
-
由 Benjamin Herrenschmidt 提交于
This moves some MMU related init code out of setup_64.c into hash_utils_64.c and calls it early_init_mmu() and early_init_mmu_secondary(). This will make it easier to plug in a new MMU type. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
由 Rusty Russell 提交于
Makes code futureproof against the impending change to mm->cpu_vm_mask. It's also a chance to use the new cpumask_ ops which take a pointer (the older ones are deprecated, but there's no hurry for arch code). Signed-off-by: NRusty Russell <rusty@rustcorp.com.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 23 2月, 2009 1 次提交
-
-
由 Anton Blanchard 提交于
At the moment we size the hashtable based on 4kB pages / 2, even on a 64kB kernel. This results in a hashtable that is much larger than it needs to be. Grab the real page size and size the hashtable based on that Note: This only has effect on non hypervisor machines. Signed-off-by: NAnton Blanchard <anton@samba.org> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 22 10月, 2008 1 次提交
-
-
由 Jon Tollefson 提交于
If mem= is used on the boot command line to limit memory then the memory block where a 16G page resides may not be available. Thanks to Michael Ellerman for finding the problem. Signed-off-by: NJon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 14 10月, 2008 1 次提交
-
-
由 David Gibson 提交于
The typesafe version of the powerpc pagetable handling (with USE_STRICT_MM_TYPECHECKS defined) has bitrotted again. This patch makes a bunch of small fixes to get it back to building status. It's still not enabled by default as gcc still generates worse code with it for some reason. Signed-off-by: NDavid Gibson <david@gibson.dropbear.id.au> Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 16 9月, 2008 1 次提交
-
-
由 Paul Mackerras 提交于
This implements CONFIG_RELOCATABLE for 64-bit by making the kernel as a position-independent executable (PIE) when it is set. This involves processing the dynamic relocations in the image in the early stages of booting, even if the kernel is being run at the address it is linked at, since the linker does not necessarily fill in words in the image for which there are dynamic relocations. (In fact the linker does fill in such words for 64-bit executables, though not for 32-bit executables, so in principle we could avoid calling relocate() entirely when we're running a 64-bit kernel at the linked address.) The dynamic relocations are processed by a new function relocate(addr), where the addr parameter is the virtual address where the image will be run. In fact we call it twice; once before calling prom_init, and again when starting the main kernel. This means that reloc_offset() returns 0 in prom_init (since it has been relocated to the address it is running at), which necessitated a few adjustments. This also changes __va and __pa to use an equivalent definition that is simpler. With the relocatable kernel, PAGE_OFFSET and MEMORY_START are constants (for 64-bit) whereas PHYSICAL_START is a variable (and KERNELBASE ideally should be too, but isn't yet). With this, relocatable kernels still copy themselves down to physical address 0 and run there. Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 03 9月, 2008 1 次提交
-
-
由 Paul Mackerras 提交于
Commit bc033b63 ("powerpc/mm: Fix attribute confusion with htab_bolt_mapping()") moved the check for whether we should make pages of the linear mapping executable from htab_bolt_mapping into its callers, including htab_initialize. A side-effect of this is that the decision is now made once for each contiguous section in the LMB array rather than for each page individually. This can often mean that the whole of the linear mapping ends up being executable. This reverts to the previous behaviour, where individual pages are checked for being part of the kernel text or not, by moving the check back down into htab_bolt_mapping. Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 20 8月, 2008 1 次提交
-
-
由 Tony Breeds 提交于
htab_dt_scan_hugepage_blocks is only used when CONFIG_HUGETLB_PAGE is defined, so guard the declaration likewise. Signed-off-by: NTony Breeds <tony@bakeyournoodle.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 11 8月, 2008 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
The function htab_bolt_mapping() is used to create permanent mappings in the MMU hash table, for example, in order to create the linear mapping of vmemmap. It's also used by early boot ioremap (before mem_init_done). However, the way ioremap uses it is incorrect as it passes it the protection flags in the "linux PTE" form while htab_bolt_mapping() expects them in the hash table format. This is made more confusing by the fact that some of those flags are actually in the same position in both cases. This fixes it all by making htab_bolt_mapping() take normal linux protection flags instead, and use a little helper to convert them to htab flags. Callers can now use the usual PAGE_* definitions safely. Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> arch/powerpc/include/asm/mmu-hash64.h | 2 - arch/powerpc/mm/hash_utils_64.c | 65 ++++++++++++++++++++-------------- arch/powerpc/mm/init_64.c | 9 +--- 3 files changed, 44 insertions(+), 32 deletions(-) Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 25 7月, 2008 2 次提交
-
-
由 Jon Tollefson 提交于
Instead of using the variable mmu_huge_psize to keep track of the huge page size we use an array of MMU_PAGE_* values. For each supported huge page size we need to know the hugepte_shift value and have a pgtable_cache. The hstate or an mmu_huge_psizes index is passed to functions so that they know which huge page size they should use. The hugepage sizes 16M and 64K are setup(if available on the hardware) so that they don't have to be set on the boot cmd line in order to use them. The number of 16G pages have to be specified at boot-time though (e.g. hugepagesz=16G hugepages=5). Signed-off-by: NJon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jon Tollefson 提交于
The 16G huge pages have to be reserved in the HMC prior to boot. The location of the pages are placed in the device tree. This patch adds code to scan the device tree during very early boot and save these page locations until hugetlbfs is ready for them. Acked-by: NAdam Litke <agl@us.ibm.com> Signed-off-by: NJon Tollefson <kniht@linux.vnet.ibm.com> Signed-off-by: NNick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 7月, 2008 1 次提交
-
-
由 Paul Mackerras 提交于
At present, if we have a kernel with a 64kB page size, and some process maps something that has to be mapped with 4kB pages (such as a cache-inhibited mapping on POWER5+, or the eHCA infiniband queue-pair pages), we change the process to use 4kB pages everywhere. This hurts the performance of HPC programs that access eHCA from userspace. With this patch, the kernel will only demote the slice(s) containing the eHCA or cache-inhibited mappings, leaving the remaining slices able to use 64kB hardware pages. This also changes the slice_get_unmapped_area code so that it is willing to place a 64k-page mapping into (or across) a 4k-page slice if there is no better alternative, i.e. if the program specified MAP_FIXED or if there is not sufficient space available in slices that are either empty or already have 64k-page mappings in them. Signed-off-by: NPaul Mackerras <paulus@samba.org> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
-
- 15 5月, 2008 1 次提交
-
-
由 Benjamin Herrenschmidt 提交于
This changes vmemmap to use a different region (region 0xf) of the address space, and to configure the page size of that region dynamically at boot. The problem with the current approach of always using 16M pages is that it's not well suited to machines that have small amounts of memory such as small partitions on pseries, or PS3's. In fact, on the PS3, failure to allocate the 16M page backing vmmemmap tends to prevent hotplugging the HV's "additional" memory, thus limiting the available memory even more, from my experience down to something like 80M total, which makes it really not very useable. The logic used by my match to choose the vmemmap page size is: - If 16M pages are available and there's 1G or more RAM at boot, use that size. - Else if 64K pages are available, use that - Else use 4K pages I've tested on a POWER6 (16M pages) and on an iSeries POWER3 (4K pages) and it seems to work fine. Note that I intend to change the way we organize the kernel regions & SLBs so the actual region will change from 0xf back to something else at one point, as I simplify the SLB miss handler, but that will be for a later patch. Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 14 5月, 2008 2 次提交
-
-
由 Michael Ellerman 提交于
... instead of having extern declarations in a .c file. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
由 Michael Ellerman 提交于
Make two vmemmap helpers static in init_64.c Make stab variables static in stab.c Make psize defs static in hash_utils_64.c Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 07 4月, 2008 1 次提交
-
-
由 Stephen Rothwell 提交于
This eliminates a warning in builds that don't define CONFIG_MEMORY_HOTPLUG. Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 01 4月, 2008 1 次提交
-
-
由 Badari Pulavarty 提交于
If the platform doesn't support hpte_removebolted(), gracefully return failure rather than success. Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 24 3月, 2008 1 次提交
-
-
由 Paul Mackerras 提交于
On pSeries, the hypervisor doesn't let us map in the eHEA ethernet adapter using 64k pages, and thus the ehea driver will fail if 64k pages are configured. This works around the problem by always using 4k pages for ioremap on pSeries (but not on other platforms). A better fix would be to check whether the partition could ever have an eHEA adapter, and only force 4k pages if it could, but this will do for 2.6.25. This is based on an earlier patch by Tony Breeds. Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 13 3月, 2008 1 次提交
-
-
由 Michael Ellerman 提交于
My recent hack to allocate the hash table under 1GB on cell was poorly tested, *cough*. It turns out on blades with large amounts of memory we fail to allocate the hash table at all. This is because RTAS has been instantiated just below 768MB, and 0-x MB are used by the kernel, leaving no areas that are both large enough and also naturally-aligned. For the cell IOMMU hack the page tables must be under 2GB, so use that as the limit instead. This has been tested on real hardware and boots happily. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 26 2月, 2008 1 次提交
-
-
由 Badari Pulavarty 提交于
For memory remove, we need to clean up htab mappings for the section of the memory we are removing. This implements support for removing htab bolted mappings for pSeries logical partitions. Other sub-archs may need to implement similar functionality for hotplug memory remove to work on them. Signed-off-by: NBadari Pulavarty <pbadari@us.ibm.com> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 14 2月, 2008 1 次提交
-
-
由 David S. Miller 提交于
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 31 1月, 2008 1 次提交
-
-
由 Michael Ellerman 提交于
In order to support the fixed IOMMU mapping (in a subsequent patch), we need the hash table to be inside the IOMMUs DMA window. This is usually 2G, but let's make sure the hash table is under 1G as that will satisfy the IOMMU requirements and also means the hash table will be on node 0. Signed-off-by: NMichael Ellerman <michael@ellerman.id.au> Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NPaul Mackerras <paulus@samba.org>
-
- 24 1月, 2008 1 次提交
-
-
由 Paul Mackerras 提交于
Using 64k pages on 64-bit PowerPC systems makes life difficult for emulators that are trying to emulate an ISA, such as x86, which use a smaller page size, since the emulator can no longer use the MMU and the normal system calls for controlling page protections. Of course, the emulator can emulate the MMU by checking and possibly remapping the address for each memory access in software, but that is pretty slow. This provides a facility for such programs to control the access permissions on individual 4k sub-pages of 64k pages. The idea is that the emulator supplies an array of protection masks to apply to a specified range of virtual addresses. These masks are applied at the level where hardware PTEs are inserted into the hardware page table based on the Linux PTEs, so the Linux PTEs are not affected. Note that this new mechanism does not allow any access that would otherwise be prohibited; it can only prohibit accesses that would otherwise be allowed. This new facility is only available on 64-bit PowerPC and only when the kernel is configured for 64k pages. The masks are supplied using a new subpage_prot system call, which takes a starting virtual address and length, and a pointer to an array of protection masks in memory. The array has a 32-bit word per 64k page to be protected; each 32-bit word consists of 16 2-bit fields, for which 0 allows any access (that is otherwise allowed), 1 prevents write accesses, and 2 or 3 prevent any access. Implicit in this is that the regions of the address space that are protected are switched to use 4k hardware pages rather than 64k hardware pages (on machines with hardware 64k page support). In fact the whole process is switched to use 4k hardware pages when the subpage_prot system call is used, but this could be improved in future to switch only the affected segments. The subpage protection bits are stored in a 3 level tree akin to the page table tree. The top level of this tree is stored in a structure that is appended to the top level of the page table tree, i.e., the pgd array. Since it will often only be 32-bit addresses (below 4GB) that are protected, the pointers to the first four bottom level pages are also stored in this structure (each bottom level page contains the protection bits for 1GB of address space), so the protection bits for addresses below 4GB can be accessed with one fewer loads than those for higher addresses. Signed-off-by: NPaul Mackerras <paulus@samba.org>
-