1. 23 Oct 2008, 1 commit
  2. 13 Oct 2008, 2 commits
  3. 14 Sep 2008, 1 commit
  4. 21 Aug 2008, 1 commit
    • i386: vmalloc size fix · e621bd18
      Authored by Dave Young
      Booting the kernel with vmalloc=[any size <= 16m] oopses on my PC (i386, 1G memory).
      
      BUG_ON in arch/x86/mm/init_32.c triggered:
      BUG_ON((unsigned long)high_memory > VMALLOC_START);
      
      It's due to the hole between high_memory and the vmalloc area.
      
      In include/asm-x86/pgtable_32.h:
      #define VMALLOC_OFFSET	(8 * 1024 * 1024)
      #define VMALLOC_START	(((unsigned long)high_memory + 2 * VMALLOC_OFFSET - 1) \
      			 & ~(VMALLOC_OFFSET - 1))
      
      There are several related points:
      1. MAXMEM:
       (-__PAGE_OFFSET - __VMALLOC_RESERVE)
      This also counts the space after VMALLOC_END, so I set it to
      (VMALLOC_END - PAGE_OFFSET - __VMALLOC_RESERVE) instead.
      
      2. VMALLOC_OFFSET is not accounted for in __VMALLOC_RESERVE;
      fixed by adding VMALLOC_OFFSET to it.
      
      3. VMALLOC_START:
       (((unsigned long)high_memory + 2 * VMALLOC_OFFSET - 1) & ~(VMALLOC_OFFSET - 1))
      so the gap above high_memory is not always 8M; it can be bigger than 8M.
      I set it to ((unsigned long)high_memory + VMALLOC_OFFSET) instead.
      
      4. VMALLOC_RESERVE is an unused macro, so remove it (the fixed layout is sketched below).
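      
      A minimal sketch of the resulting layout, following the four points above
      (illustrative, not a verbatim diff; the 128M base value of
      __VMALLOC_RESERVE is the usual default, assumed here):
      
      #define VMALLOC_OFFSET		(8 * 1024 * 1024)
      /* (3) start the vmalloc area a fixed 8M above high_memory */
      #define VMALLOC_START		((unsigned long)high_memory + VMALLOC_OFFSET)
      /* (2) the reserve now accounts for the VMALLOC_OFFSET hole as well */
      #define __VMALLOC_RESERVE	((128 << 20) + VMALLOC_OFFSET)
      /* (1) MAXMEM measured against VMALLOC_END, not the whole address space */
      #define MAXMEM			(VMALLOC_END - PAGE_OFFSET - __VMALLOC_RESERVE)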
      Signed-off-by: Dave Young <hidave.darkstar@gmail.com>
      Cc: akpm@linux-foundation.org
      Cc: hidave.darkstar@gmail.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  5. 23 Jul 2008, 2 commits
  6. 08 Jul 2008, 4 commits
  7. 19 Jun 2008, 3 commits
    • x86, MM: virtual address debug, v2 · a1bf9631
      Authored by Jiri Slaby
      I've removed the test from phys_to_nid and made __phys_addr an
      out-of-line function only when debugging is enabled (on x86_32).
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Cc: tglx@linutronix.de
      Cc: hpa@zytor.com
      Cc: Mike Travis <travis@sgi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: <x86@kernel.org>
      Cc: linux-mm@kvack.org
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • MM: virtual address debug · 59ea7463
      Authored by Jiri Slaby
      Add some (configurable) expensive sanity checking to catch wrong address
      translations on x86.
      
      - create a linux/mmdebug.h file that can be included from asm
        headers without creating unsolvable include loops
      - __phys_addr on x86_32 became a function in ioremap.c, since
        PAGE_OFFSET, is_vmalloc_addr and the non-constant VMALLOC_* are
        undefined if declared in page_32.h
      - add __phys_addr_const for initializing doublefault_tss.__cr3
      
      Tested on 386, 386pae, x86_64 and x86_64 numa=fake=2.
      
      Contains Andi's patch enabling NUMA virtual address debugging; a sketch of the x86_32 check follows below.
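      
      A minimal sketch of the x86_32 check, assuming the names described above
      (VIRTUAL_BUG_ON comes from the new linux/mmdebug.h; the exact condition
      in the committed code may differ):
      
      /* out-of-line in ioremap.c so the non-constant VMALLOC_* and
       * PAGE_OFFSET are visible */
      unsigned long __phys_addr(unsigned long x)
      {
      	/* a lowmem virtual address must lie above PAGE_OFFSET and
      	 * outside the vmalloc area, or "x - PAGE_OFFSET" is wrong */
      	VIRTUAL_BUG_ON(x < PAGE_OFFSET);
      	VIRTUAL_BUG_ON(is_vmalloc_addr((void *)x));
      	return x - PAGE_OFFSET;
      }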
      Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: set PAE PHYSICAL_MASK_SHIFT to 44 bits · ad524d46
      Authored by Jeremy Fitzhardinge
      When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
      potentially have the same number of physical address bits as the
      64-bit host ("Enhanced Legacy PAE Paging").  This means, in theory,
      we could have up to 52 bits of physical address in a pte.
      
      The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
      This means that it can only represent physical addresses up to 32+12=44
      bits wide.  Rather than widening pfns everywhere, just set 2^44 as the
      Linux x86_32-PAE architectural limit for physical address size.
      
      This is a bugfix for two cases:
      1. running a 32-bit PAE kernel on a machine with
        more than 64GB RAM.
      2. running a 32-bit PAE Xen guest on a host machine with
        more than 64GB RAM
      
      In both cases, a pte may need more than 36 bits of physical address,
      and masking it to 36 bits causes fairly severe havoc (see the sketch below).
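      
      A minimal sketch of the limit, with an illustrative derived mask (macro
      names beyond __PHYSICAL_MASK_SHIFT are assumptions, not the committed
      code):
      
      /* 32-bit pfn << PAGE_SHIFT (12) => at most 32 + 12 = 44 usable bits */
      #define __PHYSICAL_MASK_SHIFT	44	/* was 36, the classic PAE limit */
      /* mask a pte's address bits without truncating bits 36..43 */
      #define PHYSICAL_PAGE_MASK \
      	(((1ULL << __PHYSICAL_MASK_SHIFT) - 1) & ~((1ULL << PAGE_SHIFT) - 1))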
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  8. 10 Jun 2008, 1 commit
    • x86: set PAE PHYSICAL_MASK_SHIFT to 44 bits · ce8e37cd
      Authored by Jeremy Fitzhardinge
      When a 64-bit x86 processor runs in 32-bit PAE mode, a pte can
      potentially have the same number of physical address bits as the
      64-bit host ("Enhanced Legacy PAE Paging").  This means, in theory,
      we could have up to 52 bits of physical address in a pte.
      
      The 32-bit kernel uses a 32-bit unsigned long to represent a pfn.
      This means that it can only represent physical addresses up to 32+12=44
      bits wide.  Rather than widening pfns everywhere, just set 2^44 as the
      Linux x86_32-PAE architectural limit for physical address size.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  9. 13 May 2008, 1 commit
  10. 17 Apr 2008, 1 commit
  11. 10 Feb 2008, 1 commit
    • x86: construct 32-bit boot time page tables in native format · 551889a6
      Authored by Ian Campbell
      Specifically, the boot-time page tables in a CONFIG_X86_PAE=y kernel
      are now in PAE format.
      
      early_ioremap is updated to use the standard page table accessors.
      
      Clear any mappings beyond max_low_pfn from the boot page tables in
      native_pagetable_setup_start, because the initial mappings can extend
      beyond the end of physical memory and into the vmalloc area (sketched below).
      
      Derived from patches by Eric Biederman and H. Peter Anvin.
      
      [ jeremy@goop.org: PAE swapper_pg_dir needs to be page-sized fix ]
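      
      A minimal sketch of the clearing step, using the standard page-table
      walkers (the helper name and loop bounds are assumptions, not the
      committed code):
      
      /* drop boot-time mappings above the last low-memory page so they
       * cannot overlap the vmalloc area */
      static void __init clear_boot_mappings_above_lowmem(void)
      {
      	unsigned long va = PAGE_OFFSET + (max_low_pfn << PAGE_SHIFT);
      
      	for (va = ALIGN(va, PMD_SIZE); va < FIXADDR_TOP; va += PMD_SIZE) {
      		pgd_t *pgd = pgd_offset_k(va);
      		pud_t *pud;
      		pmd_t *pmd;
      
      		if (pgd_none(*pgd))
      			continue;
      		pud = pud_offset(pgd, va);
      		if (pud_none(*pud))
      			continue;
      		pmd = pmd_offset(pud, va);
      		if (!pmd_none(*pmd))
      			pmd_clear(pmd);	/* unmap one 2M/4M chunk */
      	}
      }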
      Signed-off-by: Ian Campbell <ijc@hellion.org.uk>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Mika Penttilä <mika.penttila@kolumbus.fi>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  12. 09 Feb 2008, 2 commits
    • x86: fix pgtable_t build breakage · 3bf8f5a9
      Authored by Ingo Molnar
      Commit 2f569afd ("CONFIG_HIGHPTE vs.
      sub-page page tables") caused some build breakage due to pgtable_t only
      getting declared in the CONFIG_X86_PAE case.
      
      Move the declaration outside the PAE section.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • CONFIG_HIGHPTE vs. sub-page page tables · 2f569afd
      Authored by Martin Schwidefsky
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of them is dirty and reference bit tracking.
      To avoid wasting memory the standard pte table allocation should return
      1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
      
      Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
      the s390 version for pte_alloc_one cannot return a pointer to a struct
      page.  Trouble is that with the CONFIG_HIGHPTE feature on x86,
      pte_alloc_one cannot return a pointer to a pte either, since that would
      require more than 32 bits for the return value of pte_alloc_one (and the
      pte * would not be accessible anyway, since it's not kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
      To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate, a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.  A sketch of the new contract follows below.
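      
      A minimal sketch of the new contract, assuming the common case where
      pgtable_t is a struct page * (function bodies are illustrative):
      
      typedef struct page *pgtable_t;	/* s390 later redefines this as pte_t * */
      
      static pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
      {
      	struct page *page = alloc_page(GFP_KERNEL | __GFP_ZERO);
      
      	if (page)
      		pgtable_page_ctor(page);	/* ptl init + NR_PAGETABLE accounting */
      	return page;
      }
      
      static void pte_free(struct mm_struct *mm, pgtable_t page)
      {
      	pgtable_page_dtor(page);	/* undo the ctor before freeing */
      	__free_page(page);
      }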
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 30 Jan 2008, 9 commits
  14. 11 Oct 2007, 1 commit
  15. 22 Jul 2007, 1 commit
  16. 18 Jul 2007, 1 commit
    • Add __GFP_MOVABLE for callers to flag allocations from high memory that may be migrated · 769848c0
      Authored by Mel Gorman
      It is often known at allocation time whether a page may be migrated or not.
      This patch adds a flag called __GFP_MOVABLE and a new mask called
      GFP_HIGH_MOVABLE.  Allocations using __GFP_MOVABLE can either be migrated
      using the page migration mechanism or reclaimed by syncing with backing
      storage and discarding.
      
      An API function very similar to alloc_zeroed_user_highpage() is added for
      __GFP_MOVABLE allocations, called alloc_zeroed_user_highpage_movable()
      (see the sketch below).  The flags used by alloc_zeroed_user_highpage()
      are not changed, because that would change the semantics of an existing
      API.  After this patch is applied there are no in-kernel users of
      alloc_zeroed_user_highpage(), so it should probably be marked deprecated.
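      
      A minimal sketch of a caller using the new flag (the helper and the
      alloc_page_vma call are illustrative, not part of the patch):
      
      /* a user page known to be migratable or reclaimable gets __GFP_MOVABLE */
      static struct page *alloc_movable_user_page(struct vm_area_struct *vma,
      					    unsigned long vaddr)
      {
      	return alloc_page_vma(GFP_HIGHUSER | __GFP_MOVABLE, vma, vaddr);
      }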
      
      Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
      shmem.c to keep all flag modifications to inode->mapping in the
      shmem_dir_alloc() helper function.  This clean-up suggestion is courtesy of
      Hugh Dickins.
      
      Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
      concept.  Credit to Hugh Dickins for catching issues with the shmem swap vector
      and ramfs allocations.
      
      [akpm@linux-foundation.org: build fix]
      [hugh@veritas.com: __GFP_ZERO cleanup]
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. 03 May 2007, 2 commits
    • [PATCH] i386: PARAVIRT: Add pagetable accessors to pack and unpack pagetable entries · 3dc494e8
      Authored by Jeremy Fitzhardinge
      Add a set of accessors to pack, unpack and modify page table entries
      (at all levels).  This allows a paravirt implementation to control the
      contents of pgd/pmd/pte entries.  For example, Xen uses this to
      convert the (pseudo-)physical address into a machine address when
      populating a pagetable entry, and to convert back to a pseudo-physical
      address when an entry is read.  A sketch of the hook pair follows below.
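      
      A minimal sketch of the pack/unpack hook pair, with illustrative member
      names (the real structure carries equivalent hooks for pmd and pgd
      entries as well):
      
      struct paravirt_ops {
      	/* pack: build the in-memory pte representation from a raw value */
      	pte_t (*make_pte)(unsigned long long val);
      	/* unpack: recover the raw value from an installed pte */
      	unsigned long long (*pte_val)(pte_t pte);
      	/* ... pmd/pgd equivalents ... */
      };
      
      /* a Xen backend would translate pseudo-physical to machine frames here */
      static inline pte_t paravirt_make_pte(unsigned long long val)
      {
      	return paravirt_ops.make_pte(val);
      }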
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: Ingo Molnar <mingo@elte.hu>
    • [PATCH] i386: Make COMPAT_VDSO runtime selectable · 1dbf527c
      Authored by Jeremy Fitzhardinge
      Now that relocation of the VDSO for COMPAT_VDSO users is done at
      runtime rather than compile time, it is possible to enable/disable
      compat mode at runtime.
      
      This patch allows you to enable COMPAT_VDSO mode with "vdso=2" on the
      kernel command line, or via sysctl.  (Switching on a running system
      shouldn't be done lightly; any process which was relying on the compat
      VDSO will be upset if it goes away.)
      
      The COMPAT_VDSO config option still exists, but if enabled it just
      makes vdso_enabled default to VDSO_COMPAT.
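      
      A minimal sketch of the command-line hook, assuming the usual __setup()
      pattern (the parsing body is illustrative):
      
      /* "vdso=0" disables the vDSO, "vdso=1" selects the vma-backed vDSO,
       * "vdso=2" selects the fixmap-based compat vDSO */
      static int __init vdso_setup(char *s)
      {
      	vdso_enabled = simple_strtoul(s, NULL, 0);
      	return 1;
      }
      __setup("vdso=", vdso_setup);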
      
      From: Hugh Dickins <hugh@veritas.com>
      
      Fix oops from i386-make-compat_vdso-runtime-selectable.patch.
      
      Even mingetty at system startup finds it easy to trigger an oops
      while reading /proc/PID/maps: though it has a good hold on the mm
      itself, that cannot stop exit_mm() from resetting tsk->mm to NULL.
      
      (It is usually show_map()'s call to get_gate_vma() which oopses,
      and I expect we could change that to check priv->tail_vma instead;
      but no matter, even m_start()'s call just after get_task_mm() is racy.)
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: "Jan Beulich" <JBeulich@novell.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
  18. 27 Jan 2007, 1 commit
    • [PATCH] Fix CONFIG_COMPAT_VDSO · a1f3bb9a
      Authored by Roland McGrath
      I wouldn't mind if CONFIG_COMPAT_VDSO went away entirely.  But if it's there,
      it should work properly.  Currently it's quite haphazard: both the real vma
      and the fixmap are mapped, both are put in the two different AT_* slots,
      sysenter returns to the vma address rather than the fixmap address, and
      core dumps are yet another story.
      
      This patch makes CONFIG_COMPAT_VDSO disable the real vma and use the fixmap
      area consistently.  This makes it actually compatible with what the old vdso
      implementation did.
      Signed-off-by: Roland McGrath <roland@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  19. 07 Dec 2006, 3 commits
  20. 28 Jun 2006, 1 commit
    • [PATCH] vdso: randomize the i386 vDSO by moving it into a vma · e6e5494c
      Authored by Ingo Molnar
      Move the i386 VDSO down into a vma and thus randomize it.
      
      Besides the security implications, this feature also helps debuggers, which
      can COW a vma-backed VDSO just like a normal DSO and can thus do
      single-stepping and other debugging features.
      
      It's good for hypervisors (Xen, VMware) too, which typically live in the
      same high-mapped address space as the VDSO: whenever the VDSO is used they
      take lots of guest pagefaults and have to fix up those guest accesses,
      which slows things down instead of speeding them up (the primary purpose
      of the VDSO).
      
      There's a new CONFIG_COMPAT_VDSO (default=y) option, which provides support
      for older glibcs that still rely on a prelinked high-mapped VDSO.  Newer
      distributions (using glibc 2.3.3 or later) can turn this option off.  Turning
      it off is also recommended for security reasons: attackers cannot use the
      predictable high-mapped VDSO page as syscall trampoline anymore.
      
      There is a new vdso=[0|1] boot option as well, and a runtime
      /proc/sys/vm/vdso_enabled sysctl switch, that allows the VDSO to be turned
      on/off.
      
      (This version of the VDSO-randomization patch also has working ELF
      coredumping, the previous patch crashed in the coredumping code.)
      
      This code is a combined work of the exec-shield VDSO randomization
      code and Gerd Hoffmann's hypervisor-centric VDSO patch.  Rusty Russell
      started this patch and I completed it.
      
      [akpm@osdl.org: cleanups]
      [akpm@osdl.org: compile fix]
      [akpm@osdl.org: compile fix 2]
      [akpm@osdl.org: compile fix 3]
      [akpm@osdl.org: revert MAXMEM change]
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Arjan van de Ven <arjan@infradead.org>
      Cc: Gerd Hoffmann <kraxel@suse.de>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Andi Kleen <ak@muc.de>
      Cc: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
  21. 27 Apr 2006, 1 commit