1. 18 11月, 2012 5 次提交
    • Y
      x86, mm: Remove early_memremap workaround for page table accessing on 64bit · 973dc4f3
      Yinghai Lu 提交于
      We try to put page table high to make room for kdump, and at that time
      those ranges are not mapped yet, and have to use ioremap to access it.
      
      Now after patch that pre-map page table top down.
      	x86, mm: setup page table in top-down
      We do not need that workaround anymore.
      
      Just use __va to return directly mapping address.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1353123563-3103-23-git-send-email-yinghai@kernel.orgAcked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      973dc4f3
    • Y
      x86, mm: setup page table in top-down · 8d57470d
      Yinghai Lu 提交于
      Get pgt_buf early from BRK, and use it to map PMD_SIZE from top at first.
      Then use mapped pages to map more ranges below, and keep looping until
      all pages get mapped.
      
      alloc_low_page will use page from BRK at first, after that buffer is used
      up, will use memblock to find and reserve pages for page table usage.
      
      Introduce min_pfn_mapped to make sure find new pages from mapped ranges,
      that will be updated when lower pages get mapped.
      
      Also add step_size to make sure that don't try to map too big range with
      limited mapped pages initially, and increase the step_size when we have
      more mapped pages on hand.
      
      We don't need to call pagetable_reserve anymore, reserve work is done
      in alloc_low_page() directly.
      
      At last we can get rid of calculation and find early pgt related code.
      
      -v2: update to after fix_xen change,
           also use MACRO for initial pgt_buf size and add comments with it.
      -v3: skip big reserved range in memblock.reserved near end.
      -v4: don't need fix_xen change now.
      -v5: add changelog about moving about reserving pagetable to alloc_low_page.
      Suggested-by: N"H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1353123563-3103-22-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      8d57470d
    • Y
      x86, mm: Don't clear page table if range is ram · eceb3632
      Yinghai Lu 提交于
      After we add code use buffer in BRK to pre-map buf for page table in
      following patch:
      	x86, mm: setup page table in top-down
      it should be safe to remove early_memmap for page table accessing.
      Instead we get panic with that.
      
      It turns out that we clear the initial page table wrongly for next range
      that is separated by holes.
      And it only happens when we are trying to map ram range one by one.
      
      We need to check if the range is ram before clearing page table.
      
      We change the loop structure to remove the extra little loop and use
      one loop only, and in that loop will caculate next at first, and check if
      [addr,next) is covered by E820_RAM.
      
      -v2: E820_RESERVED_KERN is treated as E820_RAM. EFI one change some E820_RAM
           to that, so next kernel by kexec will know that range is used already.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1353123563-3103-20-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      eceb3632
    • Y
      x86, mm: Align start address to correct big page size · 960ddb4f
      Yinghai Lu 提交于
      We are going to use buffer in BRK to map small range just under memory top,
      and use those new mapped ram to map ram range under it.
      
      The ram range that will be mapped at first could be only page aligned,
      but ranges around it are ram too, we could use bigger page to map it to
      avoid small page size.
      
      We will adjust page_size_mask in following patch:
      	x86, mm: Use big page size for small memory range
      to use big page size for small ram range.
      
      Before that patch, this patch will make sure start address to be
      aligned down according to bigger page size, otherwise entry in page
      page will not have correct value.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1353123563-3103-18-git-send-email-yinghai@kernel.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      960ddb4f
    • J
      x86, mm: Only direct map addresses that are marked as E820_RAM · 66520ebc
      Jacob Shin 提交于
      Currently direct mappings are created for [ 0 to max_low_pfn<<PAGE_SHIFT )
      and [ 4GB to max_pfn<<PAGE_SHIFT ), which may include regions that are not
      backed by actual DRAM. This is fine for holes under 4GB which are covered
      by fixed and variable range MTRRs to be UC. However, we run into trouble
      on higher memory addresses which cannot be covered by MTRRs.
      
      Our system with 1TB of RAM has an e820 that looks like this:
      
       BIOS-e820: [mem 0x0000000000000000-0x00000000000983ff] usable
       BIOS-e820: [mem 0x0000000000098400-0x000000000009ffff] reserved
       BIOS-e820: [mem 0x00000000000d0000-0x00000000000fffff] reserved
       BIOS-e820: [mem 0x0000000000100000-0x00000000c7ebffff] usable
       BIOS-e820: [mem 0x00000000c7ec0000-0x00000000c7ed7fff] ACPI data
       BIOS-e820: [mem 0x00000000c7ed8000-0x00000000c7ed9fff] ACPI NVS
       BIOS-e820: [mem 0x00000000c7eda000-0x00000000c7ffffff] reserved
       BIOS-e820: [mem 0x00000000fec00000-0x00000000fec0ffff] reserved
       BIOS-e820: [mem 0x00000000fee00000-0x00000000fee00fff] reserved
       BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved
       BIOS-e820: [mem 0x0000000100000000-0x000000e037ffffff] usable
       BIOS-e820: [mem 0x000000e038000000-0x000000fcffffffff] reserved
       BIOS-e820: [mem 0x0000010000000000-0x0000011ffeffffff] usable
      
      and so direct mappings are created for huge memory hole between
      0x000000e038000000 to 0x0000010000000000. Even though the kernel never
      generates memory accesses in that region, since the page tables mark
      them incorrectly as being WB, our (AMD) processor ends up causing a MCE
      while doing some memory bookkeeping/optimizations around that area.
      
      This patch iterates through e820 and only direct maps ranges that are
      marked as E820_RAM, and keeps track of those pfn ranges. Depending on
      the alignment of E820 ranges, this may possibly result in using smaller
      size (i.e. 4K instead of 2M or 1G) page tables.
      
      -v2: move changes from setup.c to mm/init.c, also use for_each_mem_pfn_range
      	instead.  - Yinghai Lu
      -v3: add calculate_all_table_space_size() to get correct needed page table
      	size. - Yinghai Lu
      -v4: fix add_pfn_range_mapped() to get correct max_low_pfn_mapped when
           mem map does have hole under 4g that is found by Konard on xen
           domU with 8g ram. - Yinghai
      Signed-off-by: NJacob Shin <jacob.shin@amd.com>
      Link: http://lkml.kernel.org/r/1353123563-3103-16-git-send-email-yinghai@kernel.orgSigned-off-by: NYinghai Lu <yinghai@kernel.org>
      Reviewed-by: NPekka Enberg <penberg@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      66520ebc
  2. 24 10月, 2012 1 次提交
  3. 18 5月, 2012 1 次提交
    • J
      x86-64: Fix accounting in kernel_physical_mapping_init() · 20167d34
      Jan Beulich 提交于
      When finding a present and acceptable 2M/1G mapping, the number
      of pages mapped this way shouldn't be incremented (as it was
      already incremented when the earlier part of the mapping was
      established). Instead, last_map_addr needs to be updated in this
      case.
      
      Further, address increments were wrong in one place each in both
      phys_pmd_init() and phys_pud_init() (lacking the aligning down
      to the respective page boundary).
      
      As we're now doing the same calculation several times, fold it
      into a single instance using a local variable (matching how
      kernel_physical_mapping_init() itself does it at the PGD level).
      
      Observed during code inspection, not because of an actual
      problem.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/4FB3C27202000078000841A0@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      20167d34
  4. 29 3月, 2012 1 次提交
  5. 11 11月, 2011 5 次提交
  6. 15 7月, 2011 1 次提交
    • T
      x86: Use HAVE_MEMBLOCK_NODE_MAP · 0608f70c
      Tejun Heo 提交于
      From 5732e1247898d67cbf837585150fe9f68974671d Mon Sep 17 00:00:00 2001
      From: Tejun Heo <tj@kernel.org>
      Date: Thu, 14 Jul 2011 11:22:16 +0200
      
      Convert x86 to HAVE_MEMBLOCK_NODE_MAP.  The only difference in memory
      handling is that allocations can't no longer cross node boundaries
      whether they're node affine or not, which shouldn't matter at all.
      
      This conversion will enable further simplification of boot memory
      handling.
      
      -v2: Fix build failure on !NUMA configurations discovered by hpa.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Link: http://lkml.kernel.org/r/20110714094423.GG3455@htj.dyndns.org
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      0608f70c
  7. 12 7月, 2011 1 次提交
  8. 17 5月, 2011 1 次提交
  9. 02 5月, 2011 1 次提交
    • T
      x86-64, NUMA: Simplify hotadd memory handling · 9688678a
      Tejun Heo 提交于
      The only special handling NUMA needs to do for hotadd memory is
      determining the node for the hotadd memory given the address of it and
      there's nothing specific to specific config method used.
      
      srat_64.c does somewhat elaborate error checking on
      ACPI_SRAT_MEM_HOT_PLUGGABLE regions, remembers them and implements
      memory_add_physaddr_to_nid() which determines the node for given
      hotadd address.
      
      This is almost completely redundant.  All the information is already
      available to the generic NUMA code which already performs all the
      sanity checking and merging.  All that's necessary is not using
      __initdata from numa_meminfo and providing a function which uses it to
      map address to node.
      
      Drop the specific implementation from srat_64.c and add generic
      memory_add_physaddr_to_nid() in numa_64.c, which is enabled if
      CONFIG_MEMORY_HOTPLUG is set.  Other than dropping the code, srat_64.c
      doesn't need any change as it already calls numa_add_memblk() for hot
      pluggable regions which is enough.
      
      While at it, change CONFIG_MEMORY_HOTPLUG_SPARSE in srat_64.c to
      CONFIG_MEMORY_HOTPLUG, for NUMA on x86-64, the two are always the
      same.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      9688678a
  10. 24 3月, 2011 3 次提交
  11. 20 3月, 2011 1 次提交
    • Y
      x86: Cleanup highmap after brk is concluded · e5f15b45
      Yinghai Lu 提交于
      Now cleanup_highmap actually is in two steps: one is early in head64.c
      and only clears above _end; a second one is in init_memory_mapping() and
      tries to clean from _brk_end to _end.
      It should check if those boundaries are PMD_SIZE aligned but currently
      does not.
      Also init_memory_mapping() is called several times for numa or memory
      hotplug, so we really should not handle initial kernel mappings there.
      
      This patch moves cleanup_highmap() down after _brk_end is settled so
      we can do everything in one step.
      Also we honor max_pfn_mapped in the implementation of cleanup_highmap.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      e5f15b45
  12. 10 3月, 2011 1 次提交
  13. 04 3月, 2011 1 次提交
    • T
      x86-64, NUMA: Revert NUMA affine page table allocation · f8911250
      Tejun Heo 提交于
      This patch reverts NUMA affine page table allocation added by commit
      1411e0ec (x86-64, numa: Put pgtable to local node memory).
      
      The commit made an undocumented change where the kernel linear mapping
      strictly follows intersection of e820 memory map and NUMA
      configuration.  If the physical memory configuration has holes or NUMA
      nodes are not properly aligned, this leads to using unnecessarily
      smaller mapping size which leads to increased TLB pressure.  For
      details,
      
        http://thread.gmane.org/gmane.linux.kernel/1104672
      
      Patches to fix the problem have been proposed but the underlying code
      needs more cleanup and the approach itself seems a bit heavy handed
      and it has been determined to revert the feature for now and come back
      to it in the next developement cycle.
      
        http://thread.gmane.org/gmane.linux.kernel/1105959
      
      As init_memory_mapping_high() callsites have been consolidated since
      the commit, reverting is done manually.  Also, the RED-PEN comment in
      arch/x86/mm/init.c is not restored as the problem no longer exists
      with memblock based top-down early memory allocation.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      f8911250
  14. 24 2月, 2011 1 次提交
    • Y
      x86: Rename e820_table_* to pgt_buf_* · d1b19426
      Yinghai Lu 提交于
      e820_table_{start|end|top}, which are used to buffer page table
      allocation during early boot, are now derived from memblock and don't
      have much to do with e820.  Change the names so that they reflect what
      they're used for.
      
      This patch doesn't introduce any behavior change.
      
      -v2: Ingo found that earlier patch "x86: Use early pre-allocated page
           table buffer top-down" caused crash on 32bit and needed to be
           dropped.  This patch was updated to reflect the change.
      
      -tj: Updated commit description.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      d1b19426
  15. 16 2月, 2011 2 次提交
    • T
      x86, NUMA: Move *_numa_init() invocations into initmem_init() · d8fc3afc
      Tejun Heo 提交于
      There's no reason for these to live in setup_arch().  Move them inside
      initmem_init().
      
      - v2: x86-32 initmem_init() weren't updated breaking 32bit builds.
        Fixed.  Found by Ankita.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ankita Garg <ankita@in.ibm.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      d8fc3afc
    • T
      x86, NUMA: Drop @start/last_pfn from initmem_init() · 86ef4dbf
      Tejun Heo 提交于
      initmem_init() extensively accesses and modifies global data
      structures and the parameters aren't even followed depending on which
      path is being used.  Drop @start/last_pfn and let it deal with
      @max_pfn directly.  This is in preparation for further NUMA init
      cleanups.
      
      - v2: x86-32 initmem_init() weren't updated breaking 32bit builds.
        Fixed.  Found by Yinghai.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Cc: Shaohui Zheng <shaohui.zheng@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      86ef4dbf
  16. 04 2月, 2011 1 次提交
  17. 30 12月, 2010 2 次提交
    • Y
      x86-64, numa: Put pgtable to local node memory · 1411e0ec
      Yinghai Lu 提交于
      Introduce init_memory_mapping_high(), and use it with 64bit.
      
      It will go with every memory segment above 4g to create page table to the
      memory range itself.
      
      before this patch all page tables was on one node.
      
      with this patch, one RED-PEN is killed
      
      debug out for 8 sockets system after patch
      [    0.000000] initial memory mapped : 0 - 20000000
      [    0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff]
      [    0.000000]  0000000000 - 007f600000 page 2M
      [    0.000000]  007f600000 - 007f750000 page 4k
      [    0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff]
      [    0.000000] RAMDISK: 7bc84000 - 7f745000
      ....
      [    0.000000] Adding active range (0, 0x10, 0x95) 0 entries of 3200 used
      [    0.000000] Adding active range (0, 0x100, 0x7f750) 1 entries of 3200 used
      [    0.000000] Adding active range (0, 0x100000, 0x1080000) 2 entries of 3200 used
      [    0.000000] Adding active range (1, 0x1080000, 0x2080000) 3 entries of 3200 used
      [    0.000000] Adding active range (2, 0x2080000, 0x3080000) 4 entries of 3200 used
      [    0.000000] Adding active range (3, 0x3080000, 0x4080000) 5 entries of 3200 used
      [    0.000000] Adding active range (4, 0x4080000, 0x5080000) 6 entries of 3200 used
      [    0.000000] Adding active range (5, 0x5080000, 0x6080000) 7 entries of 3200 used
      [    0.000000] Adding active range (6, 0x6080000, 0x7080000) 8 entries of 3200 used
      [    0.000000] Adding active range (7, 0x7080000, 0x8080000) 9 entries of 3200 used
      [    0.000000] init_memory_mapping: [0x00000100000000-0x0000107fffffff]
      [    0.000000]  0100000000 - 1080000000 page 2M
      [    0.000000] kernel direct mapping tables up to 1080000000 @ [0x107ffbd000-0x107fffffff]
      [    0.000000]     memblock_x86_reserve_range: [0x107ffc2000-0x107fffffff]          PGTABLE
      [    0.000000] init_memory_mapping: [0x00001080000000-0x0000207fffffff]
      [    0.000000]  1080000000 - 2080000000 page 2M
      [    0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff]
      [    0.000000]     memblock_x86_reserve_range: [0x207ffc0000-0x207fffffff]          PGTABLE
      [    0.000000] init_memory_mapping: [0x00002080000000-0x0000307fffffff]
      [    0.000000]  2080000000 - 3080000000 page 2M
      [    0.000000] kernel direct mapping tables up to 3080000000 @ [0x307ff3d000-0x307fffffff]
      [    0.000000]     memblock_x86_reserve_range: [0x307ffc0000-0x307fffffff]          PGTABLE
      [    0.000000] init_memory_mapping: [0x00003080000000-0x0000407fffffff]
      [    0.000000]  3080000000 - 4080000000 page 2M
      [    0.000000] kernel direct mapping tables up to 4080000000 @ [0x407fefd000-0x407fffffff]
      [    0.000000]     memblock_x86_reserve_range: [0x407ffc0000-0x407fffffff]          PGTABLE
      [    0.000000] init_memory_mapping: [0x00004080000000-0x0000507fffffff]
      [    0.000000]  4080000000 - 5080000000 page 2M
      [    0.000000] kernel direct mapping tables up to 5080000000 @ [0x507febd000-0x507fffffff]
      [    0.000000]     memblock_x86_reserve_range: [0x507ffc0000-0x507fffffff]          PGTABLE
      [    0.000000] init_memory_mapping: [0x00005080000000-0x0000607fffffff]
      [    0.000000]  5080000000 - 6080000000 page 2M
      [    0.000000] kernel direct mapping tables up to 6080000000 @ [0x607fe7d000-0x607fffffff]
      [    0.000000]     memblock_x86_reserve_range: [0x607ffc0000-0x607fffffff]          PGTABLE
      [    0.000000] init_memory_mapping: [0x00006080000000-0x0000707fffffff]
      [    0.000000]  6080000000 - 7080000000 page 2M
      [    0.000000] kernel direct mapping tables up to 7080000000 @ [0x707fe3d000-0x707fffffff]
      [    0.000000]     memblock_x86_reserve_range: [0x707ffc0000-0x707fffffff]          PGTABLE
      [    0.000000] init_memory_mapping: [0x00007080000000-0x0000807fffffff]
      [    0.000000]  7080000000 - 8080000000 page 2M
      [    0.000000] kernel direct mapping tables up to 8080000000 @ [0x807fdfc000-0x807fffffff]
      [    0.000000]     memblock_x86_reserve_range: [0x807ffbf000-0x807fffffff]          PGTABLE
      [    0.000000] Initmem setup node 0 [0000000000000000-000000107fffffff]
      [    0.000000]   NODE_DATA [0x0000107ffbd000-0x0000107ffc1fff]
      [    0.000000] Initmem setup node 1 [0000001080000000-000000207fffffff]
      [    0.000000]   NODE_DATA [0x0000207ffbb000-0x0000207ffbffff]
      [    0.000000] Initmem setup node 2 [0000002080000000-000000307fffffff]
      [    0.000000]   NODE_DATA [0x0000307ffbb000-0x0000307ffbffff]
      [    0.000000] Initmem setup node 3 [0000003080000000-000000407fffffff]
      [    0.000000]   NODE_DATA [0x0000407ffbb000-0x0000407ffbffff]
      [    0.000000] Initmem setup node 4 [0000004080000000-000000507fffffff]
      [    0.000000]   NODE_DATA [0x0000507ffbb000-0x0000507ffbffff]
      [    0.000000] Initmem setup node 5 [0000005080000000-000000607fffffff]
      [    0.000000]   NODE_DATA [0x0000607ffbb000-0x0000607ffbffff]
      [    0.000000] Initmem setup node 6 [0000006080000000-000000707fffffff]
      [    0.000000]   NODE_DATA [0x0000707ffbb000-0x0000707ffbffff]
      [    0.000000] Initmem setup node 7 [0000007080000000-000000807fffffff]
      [    0.000000]   NODE_DATA [0x0000807ffba000-0x0000807ffbefff]
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4D1933D1.9020609@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      1411e0ec
    • Y
      x86-64, mm: Put early page table high · 4b239f45
      Yinghai Lu 提交于
      While dubug kdump, found current kernel will have problem with crashkernel=512M.
      
      It turns out that initial mapping is to 512M, and later initial mapping to 4G
      (acutally is 2040M in my platform), will put page table near 512M.
      then initial mapping to 128g will be near 2g.
      
      before this patch:
      [    0.000000] initial memory mapped : 0 - 20000000
      [    0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff]
      [    0.000000]  0000000000 - 007f600000 page 2M
      [    0.000000]  007f600000 - 007f750000 page 4k
      [    0.000000] kernel direct mapping tables up to 7f750000 @ [0x1fffc000-0x1fffffff]
      [    0.000000]     memblock_x86_reserve_range: [0x1fffc000-0x1fffdfff]          PGTABLE
      [    0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff]
      [    0.000000]  0100000000 - 2080000000 page 2M
      [    0.000000] kernel direct mapping tables up to 2080000000 @ [0x7bc01000-0x7bc83fff]
      [    0.000000]     memblock_x86_reserve_range: [0x7bc01000-0x7bc7efff]          PGTABLE
      [    0.000000] RAMDISK: 7bc84000 - 7f745000
      [    0.000000] crashkernel reservation failed - No suitable area found.
      
      after patch:
      [    0.000000] initial memory mapped : 0 - 20000000
      [    0.000000] init_memory_mapping: [0x00000000000000-0x0000007f74ffff]
      [    0.000000]  0000000000 - 007f600000 page 2M
      [    0.000000]  007f600000 - 007f750000 page 4k
      [    0.000000] kernel direct mapping tables up to 7f750000 @ [0x7f74c000-0x7f74ffff]
      [    0.000000]     memblock_x86_reserve_range: [0x7f74c000-0x7f74dfff]          PGTABLE
      [    0.000000] init_memory_mapping: [0x00000100000000-0x0000207fffffff]
      [    0.000000]  0100000000 - 2080000000 page 2M
      [    0.000000] kernel direct mapping tables up to 2080000000 @ [0x207ff7d000-0x207fffffff]
      [    0.000000]     memblock_x86_reserve_range: [0x207ff7d000-0x207fffafff]          PGTABLE
      [    0.000000] RAMDISK: 7bc84000 - 7f745000
      [    0.000000]     memblock_x86_reserve_range: [0x17000000-0x36ffffff]     CRASH KERNEL
      [    0.000000] Reserving 512MB of memory at 368MB for crashkernel (System RAM: 133120MB)
      
      It means with the patch, page table for [0, 2g) will need 2g, instead of under 512M,
      page table for [4g, 128g) will be near 128g, instead of under 2g.
      
      That would good, if we have lots of memory above 4g, like 1024g, or 2048g or 16T, will not put
      related page table under 2g. that would be have chance to fill the under 2g if 1G or 2M page is
      not used.
      
      the code change will use add map_low_page() and update unmap_low_page() for 64bit, and use them
      to get access the corresponding high memory for page table setting.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4D0C0734.7060900@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      4b239f45
  18. 22 11月, 2010 1 次提交
    • L
      x86: Resume trampoline must be executable · 691513f7
      Lin Ming 提交于
      commit 5bd5a452(x86: Add NX protection for kernel data) marked the
      trampoline area NX - which unsurprisingly breaks resume and cpu
      hotplug.
      
      Revert the portion of that commit, which touches the trampoline.
      
      Originally-from: Lin Ming <ming.m.lin@intel.com>
      LKML-Reference: <1290410581.2405.24.camel@minggr.sh.intel.com>
      Cc: Matthieu Castet <castet.matthieu@free.fr>
      Cc: Siarhei Liakh <sliakh.lkml@gmail.com>
      Cc: Xuxian Jiang <jiang@cs.ncsu.edu>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Tested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      691513f7
  19. 18 11月, 2010 1 次提交
    • M
      x86: Add NX protection for kernel data · 5bd5a452
      Matthieu Castet 提交于
      This patch expands functionality of CONFIG_DEBUG_RODATA to set main
      (static) kernel data area as NX.
      
      The following steps are taken to achieve this:
      
       1. Linker script is adjusted so .text always starts and ends on a page bound
       2. Linker script is adjusted so .rodata always start and end on a page boundary
       3. NX is set for all pages from _etext through _end in mark_rodata_ro.
       4. free_init_pages() sets released memory NX in arch/x86/mm/init.c
       5. bios rom is set to x when pcibios is used.
      
      The results of patch application may be observed in the diff of kernel page
      table dumps:
      
      pcibios:
      
       -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
       ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
        0x00000000-0xc0000000           3G                           pmd
        ---[ Kernel Mapping ]---
       -0xc0000000-0xc0100000           1M     RW             GLB x  pte
       +0xc0000000-0xc00a0000         640K     RW             GLB NX pte
       +0xc00a0000-0xc0100000         384K     RW             GLB x  pte
       -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
       +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
       +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
       -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
       +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
        0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
        0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
        0xf7bfe000-0xf7c00000           8K                           pte
      
      No pcibios:
      
       -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
       ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
        0x00000000-0xc0000000           3G                           pmd
        ---[ Kernel Mapping ]---
       -0xc0000000-0xc0100000           1M     RW             GLB x  pte
       +0xc0000000-0xc0100000           1M     RW             GLB NX pte
       -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
       +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
       +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
       -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
       +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
        0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
        0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
        0xf7bfe000-0xf7c00000           8K                           pte
      
      The patch has been originally developed for Linux 2.6.34-rc2 x86 by
      Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.
      
       -v1:  initial patch for 2.6.30
       -v2:  patch for 2.6.31-rc7
       -v3:  moved all code into arch/x86, adjusted credits
       -v4:  fixed ifdef, removed credits from CREDITS
       -v5:  fixed an address calculation bug in mark_nxdata_nx()
       -v6:  added acked-by and PT dump diff to commit log
       -v7:  minor adjustments for -tip
       -v8:  rework with the merge of "Set first MB as RW+NX"
      Signed-off-by: NSiarhei Liakh <sliakh.lkml@gmail.com>
      Signed-off-by: NXuxian Jiang <jiang@cs.ncsu.edu>
      Signed-off-by: NMatthieu CASTET <castet.matthieu@free.fr>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <4CE2F82E.60601@free.fr>
      [ minor cleanliness edits ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5bd5a452
  20. 28 10月, 2010 1 次提交
  21. 20 10月, 2010 2 次提交
  22. 23 9月, 2010 1 次提交
  23. 03 9月, 2010 1 次提交
  24. 28 8月, 2010 4 次提交