1. 06 12月, 2011 1 次提交
  2. 18 3月, 2011 1 次提交
  3. 10 3月, 2011 1 次提交
  4. 03 2月, 2011 1 次提交
  5. 18 11月, 2010 2 次提交
    • M
      x86: Add NX protection for kernel data · 5bd5a452
      Matthieu Castet 提交于
      This patch expands functionality of CONFIG_DEBUG_RODATA to set main
      (static) kernel data area as NX.
      
      The following steps are taken to achieve this:
      
       1. Linker script is adjusted so .text always starts and ends on a page bound
       2. Linker script is adjusted so .rodata always start and end on a page boundary
       3. NX is set for all pages from _etext through _end in mark_rodata_ro.
       4. free_init_pages() sets released memory NX in arch/x86/mm/init.c
       5. bios rom is set to x when pcibios is used.
      
      The results of patch application may be observed in the diff of kernel page
      table dumps:
      
      pcibios:
      
       -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
       ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
        0x00000000-0xc0000000           3G                           pmd
        ---[ Kernel Mapping ]---
       -0xc0000000-0xc0100000           1M     RW             GLB x  pte
       +0xc0000000-0xc00a0000         640K     RW             GLB NX pte
       +0xc00a0000-0xc0100000         384K     RW             GLB x  pte
       -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
       +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
       +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
       -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
       +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
        0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
        0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
        0xf7bfe000-0xf7c00000           8K                           pte
      
      No pcibios:
      
       -- data_nx_pt_before.txt       2009-10-13 07:48:59.000000000 -0400
       ++ data_nx_pt_after.txt        2009-10-13 07:26:46.000000000 -0400
        0x00000000-0xc0000000           3G                           pmd
        ---[ Kernel Mapping ]---
       -0xc0000000-0xc0100000           1M     RW             GLB x  pte
       +0xc0000000-0xc0100000           1M     RW             GLB NX pte
       -0xc0100000-0xc03d7000        2908K     ro             GLB x  pte
       +0xc0100000-0xc0318000        2144K     ro             GLB x  pte
       +0xc0318000-0xc03d7000         764K     ro             GLB NX pte
       -0xc03d7000-0xc0600000        2212K     RW             GLB x  pte
       +0xc03d7000-0xc0600000        2212K     RW             GLB NX pte
        0xc0600000-0xf7a00000         884M     RW         PSE GLB NX pmd
        0xf7a00000-0xf7bfe000        2040K     RW             GLB NX pte
        0xf7bfe000-0xf7c00000           8K                           pte
      
      The patch has been originally developed for Linux 2.6.34-rc2 x86 by
      Siarhei Liakh <sliakh.lkml@gmail.com> and Xuxian Jiang <jiang@cs.ncsu.edu>.
      
       -v1:  initial patch for 2.6.30
       -v2:  patch for 2.6.31-rc7
       -v3:  moved all code into arch/x86, adjusted credits
       -v4:  fixed ifdef, removed credits from CREDITS
       -v5:  fixed an address calculation bug in mark_nxdata_nx()
       -v6:  added acked-by and PT dump diff to commit log
       -v7:  minor adjustments for -tip
       -v8:  rework with the merge of "Set first MB as RW+NX"
      Signed-off-by: NSiarhei Liakh <sliakh.lkml@gmail.com>
      Signed-off-by: NXuxian Jiang <jiang@cs.ncsu.edu>
      Signed-off-by: NMatthieu CASTET <castet.matthieu@free.fr>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <4CE2F82E.60601@free.fr>
      [ minor cleanliness edits ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      5bd5a452
    • M
      x86: Fix improper large page preservation · 64edc8ed
      matthieu castet 提交于
      This patch fixes a bug in try_preserve_large_page() which may
      result in improper large page preservation and improper
      application of page attributes to the memory area outside of the
      original change request.
      
      More specifically, the problem manifests itself when set_memory_*()
      is called for several pages at the beginning of the large page and
      try_preserve_large_page() erroneously concludes that the change can
      be applied to whole large page.
      
      The fix consists of 3 parts:
      
        1. Addition of "required" protection attributes in
           static_protections(), so .data and .bss can be guaranteed to
           stay "RW"
      
        2. static_protections() is now called for every small
           page within large page to determine compatibility of new
           protection attributes (instead of just small pages within the
           requested range).
      
        3. Large page can be preserved only if attribute change is
           large-page-aligned and covers whole large page.
      
       -v1: Try_preserve_large_page() patch for Linux 2.6.34-rc2
       -v2: Replaced pfn check with address check for kernel rw-data
      Signed-off-by: NSiarhei Liakh <sliakh.lkml@gmail.com>
      Signed-off-by: NXuxian Jiang <jiang@cs.ncsu.edu>
      Reviewed-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Kees Cook <kees.cook@canonical.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <4CE2F7F3.8030809@free.fr>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      64edc8ed
  6. 06 4月, 2010 1 次提交
  7. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  8. 23 2月, 2010 1 次提交
    • S
      x86_64, cpa: Don't work hard in preserving kernel 2M mappings when using 4K already · 281ff33b
      Suresh Siddha 提交于
      We currently enforce the !RW mapping for the kernel mapping that maps
      holes between different text, rodata and data sections. However, kernel
      identity mappings will have different RWX permissions to the pages mapping to
      text and to the pages padding (which are freed) the text, rodata sections.
      Hence kernel identity mappings will be broken to smaller pages. For 64-bit,
      kernel text and kernel identity mappings are different, so we can enable
      protection checks that come with CONFIG_DEBUG_RODATA, as well as retain 2MB
      large page mappings for kernel text.
      
      Konrad reported a boot failure with the Linux Xen paravirt guest because of
      this. In this paravirt guest case, the kernel text mapping and the kernel
      identity mapping share the same page-table pages. Thus forcing the !RW mapping
      for some of the kernel mappings also cause the kernel identity mappings to be
      read-only resulting in the boot failure. Linux Xen paravirt guest also
      uses 4k mappings and don't use 2M mapping.
      
      Fix this issue and retain large page performance advantage for native kernels
      by not working hard and not enforcing !RW for the kernel text mapping,
      if the current mapping is already using small page mapping.
      Reported-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <1266522700.2909.34.camel@sbs-t61.sc.intel.com>
      Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: stable@kernel.org	[2.6.32, 2.6.33]
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      281ff33b
  9. 17 11月, 2009 1 次提交
    • H
      x86, pageattr: Make set_memory_(x|nx) aware of NX support · 583140af
      H. Peter Anvin 提交于
      Make set_memory_x/set_memory_nx directly aware of if NX is supported
      in the system or not, rather than requiring that every caller assesses
      that support independently.
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Tim Starling <tstarling@wikimedia.org>
      Cc: Hannes Eder <hannes@hanneseder.net>
      LKML-Reference: <1258154897-6770-4-git-send-email-hpa@zytor.com>
      Acked-by: NKees Cook <kees.cook@canonical.com>
      583140af
  10. 03 11月, 2009 2 次提交
  11. 28 10月, 2009 1 次提交
    • S
      tracing: allow to change permissions for text with dynamic ftrace enabled · 883242dd
      Steven Rostedt 提交于
      The commit 74e08179
      x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA
      prevents text sections from becoming read/write using set_memory_rw.
      
      The dynamic ftrace changes all text pages to read/write just before
      converting the calls to tracing to nops, and vice versa.
      
      I orginally just added a flag to allow this transaction when ftrace
      did the change, but I also found that when the CPA testing was running
      it would remove the read/write as well, and ftrace does not do the text
      conversion on boot up, and the CPA changes caused the dynamic tracer
      to fail on self tests.
      
      The current solution I have is to simply not to prevent
      change_page_attr from setting the RW bit for kernel text pages.
      Reported-by: NIngo Molnar <mingo@elte.hu>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      883242dd
  12. 20 10月, 2009 1 次提交
    • S
      x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA · 74e08179
      Suresh Siddha 提交于
      CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel
      text/rodata/data to small 4KB pages as they are mapped with different
      attributes (text as RO, RODATA as RO and NX etc).
      
      On x86_64, preserve the large page mappings for kernel text/rodata/data
      boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the
      RODATA section to be hugepage aligned and having same RWX attributes
      for the 2MB page boundaries
      
      Extra Memory pages padding the sections will be freed during the end of the boot
      and the kernel identity mappings will have different RWX permissions compared to
      the kernel text mappings.
      
      Kernel identity mappings to these physical pages will be mapped with smaller
      pages but large page mappings are still retained for kernel text,rodata,data
      mappings.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      74e08179
  13. 12 9月, 2009 1 次提交
    • E
      agp/intel: Fix the pre-9xx chipset flush. · e517a5e9
      Eric Anholt 提交于
      Ever since we enabled GEM, the pre-9xx chipsets (particularly 865) have had
      serious stability issues.  Back in May a wbinvd was added to the DRM to
      work around much of the problem.  Some failure remained -- easily visible
      by dragging a window around on an X -retro desktop, or by looking at bugzilla.
      
      The chipset flush was on the right track -- hitting the right amount of
      memory, and it appears to be the only way to flush on these chipsets, but the
      flush page was mapped uncached.  As a result, the writes trying to clear the
      writeback cache ended up bypassing the cache, and not flushing anything!  The
      wbinvd would flush out other writeback data and often cause the data we wanted
      to get flushed, but not always.  By removing the setting of the page to UC
      and instead just clflushing the data we write to try to flush it, we get the
      desired behavior with no wbinvd.
      
      This exports clflush_cache_range(), which was laying around and happened to
      basically match the code I was otherwise going to copy from the DRM.
      Signed-off-by: NEric Anholt <eric@anholt.net>
      Signed-off-by: NBrice Goglin <Brice.Goglin@ens-lyon.org>
      Cc: stable@kernel.org
      e517a5e9
  14. 10 9月, 2009 1 次提交
  15. 14 8月, 2009 1 次提交
  16. 04 8月, 2009 1 次提交
  17. 31 7月, 2009 1 次提交
    • P
      x86, pat: Fix set_memory_wc related corruption · bdc6340f
      Pallipadi, Venkatesh 提交于
      Changeset 3869c4aa
      that went in after 2.6.30-rc1 was a seemingly small change to _set_memory_wc()
      to make it complaint with SDM requirements. But, introduced a nasty bug, which
      can result in crash and/or strange corruptions when set_memory_wc is used.
      One such crash reported here
      http://lkml.org/lkml/2009/7/30/94
      
      Actually, that changeset introduced two bugs.
      * change_page_attr_set() takes &addr as first argument and can the addr value
        might have changed on return, even for single page change_page_attr_set()
        call. That will make the second change_page_attr_set() in this routine
        operate on unrelated addr, that can eventually cause strange corruptions
        and bad page state crash.
      * The second change_page_attr_set() call, before setting _PAGE_CACHE_WC, should
        clear the earlier _PAGE_CACHE_UC_MINUS, as otherwise cache attribute will not
        be WC (will be UC instead).
      
      The patch below fixes both these problems. Sending a single patch to fix both
      the problems, as the change is to the same line of code. The change to have a
      addr_copy is not very clean. But, it is simpler than making more changes
      through various routines in pageattr.c.
      
      A huge thanks to Jerome for reporting this problem and providing a simple test
      case that helped us root cause the problem.
      Reported-by: NJerome Glisse <glisse@freedesktop.org>
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      LKML-Reference: <20090730214319.GA1889@linux-os.sc.intel.com>
      Acked-by: NDave Airlie <airlied@redhat.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      bdc6340f
  18. 04 7月, 2009 1 次提交
    • T
      x86,percpu: generalize lpage first chunk allocator · 8c4bfc6e
      Tejun Heo 提交于
      Generalize and move x86 setup_pcpu_lpage() into
      pcpu_lpage_first_chunk().  setup_pcpu_lpage() now is a simple wrapper
      around the generalized version.  Other than taking size parameters and
      using arch supplied callbacks to allocate/free/map memory,
      pcpu_lpage_first_chunk() is identical to the original implementation.
      
      This simplifies arch code and will help converting more archs to
      dynamic percpu allocator.
      
      While at it, factor out pcpu_calc_fc_sizes() which is common to
      pcpu_embed_first_chunk() and pcpu_lpage_first_chunk().
      
      [ Impact: code reorganization and generalization ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      8c4bfc6e
  19. 22 6月, 2009 2 次提交
    • T
      x86: fix pageattr handling for lpage percpu allocator and re-enable it · e59a1bb2
      Tejun Heo 提交于
      lpage allocator aliases a PMD page for each cpu and returns whatever
      is unused to the page allocator.  When the pageattr of the recycled
      pages are changed, this makes the two aliases point to the overlapping
      regions with different attributes which isn't allowed and known to
      cause subtle data corruption in certain cases.
      
      This can be handled in simliar manner to the x86_64 highmap alias.
      pageattr code should detect if the target pages have PMD alias and
      split the PMD alias and synchronize the attributes.
      
      pcpur allocator is updated to keep the allocated PMD pages map sorted
      in ascending address order and provide pcpu_lpage_remapped() function
      which binary searches the array to determine whether the given address
      is aliased and if so to which address.  pageattr is updated to use
      pcpu_lpage_remapped() to detect the PMD alias and split it up as
      necessary from cpa_process_alias().
      
      Jan Beulich spotted the original problem and incorrect usage of vaddr
      instead of laddr for lookup.
      
      With this, lpage percpu allocator should work correctly.  Re-enable
      it.
      
      [ Impact: fix subtle lpage pageattr bug and re-enable lpage ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NJan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      e59a1bb2
    • T
      x86: reorganize cpa_process_alias() · 992f4c1c
      Tejun Heo 提交于
      Reorganize cpa_process_alias() so that new alias condition can be
      added easily.
      
      Jan Beulich spotted problem in the original cleanup thread which
      incorrectly assumed the two existing conditions were mutially
      exclusive.
      
      [ Impact: code reorganization ]
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Jan Beulich <JBeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      992f4c1c
  20. 15 6月, 2009 1 次提交
  21. 27 5月, 2009 1 次提交
  22. 23 5月, 2009 2 次提交
  23. 10 4月, 2009 3 次提交
  24. 30 3月, 2009 1 次提交
  25. 20 3月, 2009 3 次提交
  26. 15 3月, 2009 1 次提交
    • J
      x86: add brk allocation for very, very early allocations · 93dbda7c
      Jeremy Fitzhardinge 提交于
      Impact: new interface
      
      Add a brk()-like allocator which effectively extends the bss in order
      to allow very early code to do dynamic allocations.  This is better than
      using statically allocated arrays for data in subsystems which may never
      get used.
      
      The space for brk allocations is in the bss ELF segment, so that the
      space is mapped properly by the code which maps the kernel, and so
      that bootloaders keep the space free rather than putting a ramdisk or
      something into it.
      
      The bss itself, delimited by __bss_stop, ends before the brk area
      (__brk_base to __brk_limit).  The kernel text, data and bss is reserved
      up to __bss_stop.
      
      Any brk-allocated data is reserved separately just before the kernel
      pagetable is built, as that code allocates from unreserved spaces
      in the e820 map, potentially allocating from any unused brk memory.
      Ultimately any unused memory in the brk area is used in the general
      kernel memory pool.
      
      Initially the brk space is set to 1MB, which is probably much larger
      than any user needs (the largest current user is i386 head_32.S's code
      to build the pagetables to map the kernel, which can get fairly large
      with a big kernel image and no PSE support).  So long as the system
      has sufficient memory for the bootloader to reserve the kernel+1MB brk,
      there are no bad effects resulting from an over-large brk.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      93dbda7c
  27. 12 3月, 2009 1 次提交
  28. 21 2月, 2009 1 次提交
    • I
      x86, pat: add large-PAT check to split_large_page() · 7a5714e0
      Ingo Molnar 提交于
      Impact: future-proof the split_large_page() function
      
      Linus noticed that split_large_page() is not safe wrt. the
      PAT bit: it is bit 12 on the 1GB and 2MB page table level
      (_PAGE_BIT_PAT_LARGE), and it is bit 7 on the 4K page
      table level (_PAGE_BIT_PAT).
      
      Currently it is not a problem because we never set
      _PAGE_BIT_PAT_LARGE on any of the large-page mappings - but
      should this happen in the future the split_large_page() would
      silently lift bit 12 into the lowlevel 4K pte and would start
      corrupting the physical page frame offset. Not fun.
      
      So add a debug warning, to make sure if something ever sets
      the PAT bit then this function gets updated too.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7a5714e0
  29. 20 2月, 2009 1 次提交
    • I
      x86: use the right protections for split-up pagetables · 07a66d7c
      Ingo Molnar 提交于
      Steven Rostedt found a bug in where in his modified kernel
      ftrace was unable to modify the kernel text, due to the PMD
      itself having been marked read-only as well in
      split_large_page().
      
      The fix, suggested by Linus, is to not try to 'clone' the
      reference protection of a huge-page, but to use the standard
      (and permissive) page protection bits of KERNPG_TABLE.
      
      The 'cloning' makes sense for the ptes but it's a confused and
      incorrect concept at the page table level - because the
      pagetable entry is a set of all ptes and hence cannot
      'clone' any single protection attribute - the ptes can be any
      mixture of protections.
      
      With the permissive KERNPG_TABLE, even if the pte protections
      get changed after this point (due to ftrace doing code-patching
      or other similar activities like kprobes), the resulting combined
      protections will still be correct and the pte's restrictive
      (or permissive) protections will control it.
      
      Also update the comment.
      
      This bug was there for a long time but has not caused visible
      problems before as it needs a rather large read-only area to
      trigger. Steve possibly hacked his kernel with some really
      large arrays or so. Anyway, the bug is definitely worth fixing.
      
      [ Huang Ying also experienced problems in this area when writing
        the EFI code, but the real bug in split_large_page() was not
        realized back then. ]
      Reported-by: NSteven Rostedt <rostedt@goodmis.org>
      Reported-by: NHuang Ying <ying.huang@intel.com>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      07a66d7c
  30. 13 2月, 2009 1 次提交
  31. 12 2月, 2009 1 次提交
  32. 21 1月, 2009 1 次提交
    • S
      x86: fix page attribute corruption with cpa() · a1e46212
      Suresh Siddha 提交于
      Impact: fix sporadic slowdowns and warning messages
      
      This patch fixes a performance issue reported by Linus on his
      Nehalem system. While Linus reverted the PAT patch (commit
      58dab916) which exposed the issue,
      existing cpa() code can potentially still cause wrong(page attribute
      corruption) behavior.
      
      This patch also fixes the "WARNING: at arch/x86/mm/pageattr.c:560" that
      various people reported.
      
      In 64bit kernel, kernel identity mapping might have holes depending
      on the available memory and how e820 reports the address range
      covering the RAM, ACPI, PCI reserved regions. If there is a 2MB/1GB hole
      in the address range that is not listed by e820 entries, kernel identity
      mapping will have a corresponding hole in its 1-1 identity mapping.
      
      If cpa() happens on the kernel identity mapping which falls into these holes,
      existing code fails like this:
      
      	__change_page_attr_set_clr()
      		__change_page_attr()
      			returns 0 because of if (!kpte). But doesn't
      			set cpa->numpages and cpa->pfn.
      		cpa_process_alias()
      			uses uninitialized cpa->pfn (random value)
      			which can potentially lead to changing the page
      			attribute of kernel text/data, kernel identity
      			mapping of RAM pages etc. oops!
      
      This bug was easily exposed by another PAT patch which was doing
      cpa() more often on kernel identity mapping holes (physical range between
      max_low_pfn_mapped and 4GB), where in here it was setting the
      cache disable attribute(PCD) for kernel identity mappings aswell.
      
      Fix cpa() to handle the kernel identity mapping holes. Retain
      the WARN() for cpa() calls to other not present address ranges
      (kernel-text/data, ioremap() addresses)
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NVenkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      a1e46212