1. 30 Jan, 2008 3 commits
  2. 15 Jan, 2008 1 commit
    • x86: fix boot crash on HIGHMEM4G && SPARSEMEM · 23be8c7d
      Ingo Molnar committed
      Denys Fedoryshchenko reported a bootup crash when he upgraded
      his system from 3GB to 4GB RAM:
      
         http://lkml.org/lkml/2008/1/7/9
      
      The bug is due to HIGHMEM4G && SPARSEMEM kernels making pfn_to_page()
      return an invalid pointer when the pfn is in a memory hole. The
      256 MB PCI aperture at the end of RAM was not mapped by sparsemem,
      and hence the pfn was not valid. But set_highmem_pages_init() iterated
      over this range without checking the pfn's validity first.
      
      This bug was probably present in the sparsemem code ever since sparsemem
      was introduced in v2.6.13. It was masked by HIGHMEM64G using
      larger memory regions in sparsemem_32.h:
      
       #ifdef CONFIG_X86_PAE
       #define SECTION_SIZE_BITS       30
       #define MAX_PHYSADDR_BITS       36
       #define MAX_PHYSMEM_BITS        36
       #else
       #define SECTION_SIZE_BITS       26
       #define MAX_PHYSADDR_BITS       32
       #define MAX_PHYSMEM_BITS        32
       #endif
      
      which creates 1GB sparsemem regions instead of 64MB sparsemem regions.
      So in practice we only ever created true sparsemem holes on x86 with
      HIGHMEM4G - but that was rarely used by distros.
      
      ( btw., we could probably save 2MB of mem_map[]s on X86_PAE if we reduced
        the sparsemem region size to 256 MB. )
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      23be8c7d
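The section-size arithmetic and the missing validity check described in this commit can be sketched in user space. This is a toy model under stated assumptions, not the kernel code: `toy_section_size`, `toy_pfn_valid`, `toy_count_init_pages` and the `present_mask` bitmap are hypothetical names invented for illustration, and 4 KB pages are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KB pages, as on i386 */

/* Bytes covered by one sparsemem section for a given SECTION_SIZE_BITS. */
static uint64_t toy_section_size(unsigned int section_size_bits)
{
    return 1ULL << section_size_bits;
}

/*
 * Toy stand-in for pfn_valid(): bit N of present_mask says whether
 * section N is mapped. Real sparsemem uses a section table instead.
 */
static bool toy_pfn_valid(uint64_t pfn, unsigned int section_size_bits,
                          uint64_t present_mask)
{
    uint64_t nr = pfn >> (section_size_bits - TOY_PAGE_SHIFT);

    return nr < 64 && ((present_mask >> nr) & 1);
}

/*
 * Walk [start_pfn, end_pfn) and count only the pfns that pass the
 * validity check, i.e. skip pfns that fall into a memory hole.
 */
static uint64_t toy_count_init_pages(uint64_t start_pfn, uint64_t end_pfn,
                                     unsigned int section_size_bits,
                                     uint64_t present_mask)
{
    uint64_t pfn, count = 0;

    assert(section_size_bits > TOY_PAGE_SHIFT);
    for (pfn = start_pfn; pfn < end_pfn; pfn++)
        if (toy_pfn_valid(pfn, section_size_bits, present_mask))
            count++;
    return count;
}
```

With `SECTION_SIZE_BITS` set to 26, each section covers 64 MB (16384 pages), so a hole of that size shows up as an unmapped section; with 30 bits the section is 1 GB and the same hole is hidden inside a mapped section.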
  3. 18 Oct, 2007 1 commit
    • x86: fix CONFIG_PAGEALLOC related boot hangs/OOMs · 509a80c4
      Ingo Molnar committed
      If CONFIG_PAGEALLOC is enabled then X86_FEATURE_PSE is disabled and all
      the kernel physical RAM pagetables are set up as 4K pages. This is
      needed so that CONFIG_PAGEALLOC can do fine-grained mapping and unmapping
      of pages.
      
      As a side effect though, the total size of memory allocated as kernel
      pagetables increases significantly. All these pagetables are allocated
      via alloc_bootmem_low_pages(), straight out of the lowmem DMA pool. If
      the system has enough RAM and a large kernel image then almost all of
      the 16 MB lowmem DMA pool is allocated to the image and to pagetables,
      leaving no space for __GFP_DMA allocations.
      
      This results in drivers failing and the bootup hanging:
      
       swapper invoked oom-killer: gfp_mask=0x80d1, order=0, oomkilladj=0
        [<4015059f>] out_of_memory+0x17f/0x1c0
        [<40151f3c>] __alloc_pages+0x37c/0x3a0
        [<40168cd7>] slob_new_page+0x37/0x50
        [<40168dff>] slob_alloc+0x10f/0x190
        [<40169010>] __kmalloc_node+0x80/0x90
        [<405a17e3>] scsi_host_alloc+0x33/0x2c0
        [<405a1a82>] scsi_register+0x12/0x60
        [<40d5889e>] aha1542_detect+0x9e/0x940
        [<405c5ba5>] ultrastor_detect+0x265/0x5f0
        [<401352f5>] getnstimeofday+0x35/0xf0
        [<40d58751>] init_this_scsi_driver+0x41/0xf0
        [<40d0b856>] kernel_init+0x136/0x310
        [<40d58710>] init_this_scsi_driver+0x0/0xf0
        [<40d0b720>] kernel_init+0x0/0x310
        [<40105547>] kernel_thread_helper+0x7/0x10
        =======================
      
      The fix is to first allocate from above the DMA pool, and if that fails
      (for example because the machine has less than 16 MB of RAM),
      allocate from the DMA pool as a fallback.
      
      With this fix applied I was able to boot a PAGEALLOC=y kernel that would
      hang before.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      509a80c4
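The allocation order described in the fix (try above the DMA pool first, fall back to the DMA pool) can be sketched as follows. This is a minimal model, assuming two simple page counters; `struct toy_pool`, `enum toy_source` and `toy_alloc_pagetable_page` are hypothetical and do not correspond to the bootmem API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool: just a count of free pages. */
struct toy_pool {
    unsigned long free_pages;
};

enum toy_source {
    TOY_ALLOC_FAILED,
    TOY_FROM_ABOVE_DMA,
    TOY_FROM_DMA
};

/*
 * Take one pagetable page, preferring memory above the DMA pool so
 * that the scarce DMA pages stay available for __GFP_DMA users.
 */
static enum toy_source toy_alloc_pagetable_page(struct toy_pool *above_dma,
                                                struct toy_pool *dma)
{
    assert(above_dma != NULL && dma != NULL);

    if (above_dma->free_pages > 0) {
        above_dma->free_pages--;
        return TOY_FROM_ABOVE_DMA;
    }
    /* Fallback, e.g. on a machine with less than 16 MB of RAM. */
    if (dma->free_pages > 0) {
        dma->free_pages--;
        return TOY_FROM_DMA;
    }
    return TOY_ALLOC_FAILED;
}
```

The DMA pool is only touched once the preferred pool is empty, which is the property the commit relies on to keep `__GFP_DMA` allocations working.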
  4. 17 Oct, 2007 2 commits
    • remove dead code in pgtable_cache_init · 4f817847
      Jeremy Fitzhardinge committed
      The conversion from using a slab cache to quicklist left some residual
      dead code.
      
      I note that in the conversion it now always allocates a whole page for
      the pgd, rather than the 32 bytes needed for a PAE pgd.  Was this
      intended?
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: William Lee Irwin III <wli@holomorphy.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      4f817847
    • fix memory hot remove not configured case. · 48e94196
      KAMEZAWA Hiroyuki committed
      The arch-dependent code around CONFIG_MEMORY_HOTREMOVE is currently a mess.
      This patch cleans it up. This is against 2.6.23-rc6-mm1.
      
       - fix compile failure on ia64 in the CONFIG_MEMORY_HOTPLUG && !CONFIG_MEMORY_HOTREMOVE case.
       - For !CONFIG_MEMORY_HOTREMOVE, add a generic no-op remove_memory(),
         which returns -EINVAL.
       - remove remove_pages(), which was only used in powerpc.
       - remove the no-op remove_memory() in i386, sh, sparc64, x86_64.
      
       - only powerpc returned -ENOSYS from its no-op memory hot remove; change it
         to return -EINVAL.
      
      Note:
      Currently, only ia64 supports CONFIG_MEMORY_HOTREMOVE. I welcome other
      archs if there are requirements and testers.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      48e94196
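The generic no-op described in the list above can be sketched like this. It is an illustration only: `TOY_MEMORY_HOTREMOVE` and `toy_remove_memory` are hypothetical stand-ins for the config symbol and the real `remove_memory()`, whose actual prototype is not shown on this page.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#ifdef TOY_MEMORY_HOTREMOVE
/* An architecture that supports hot remove would provide the real thing. */
int toy_remove_memory(uint64_t start, uint64_t size);
#else
/*
 * Generic fallback when hot remove is not configured: do nothing and
 * report -EINVAL, so that every architecture returns the same error.
 */
static inline int toy_remove_memory(uint64_t start, uint64_t size)
{
    (void)start;
    (void)size;
    return -EINVAL;
}
#endif
```

Keeping the stub in one generic place is what lets the per-architecture no-op copies be deleted.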
  5. 11 Oct, 2007 2 commits
  6. 30 Jul, 2007 1 commit
  7. 27 Jul, 2007 1 commit
  8. 25 Jul, 2007 1 commit
  9. 23 Jul, 2007 1 commit
    • x86: Fix alternatives and kprobes to remap write-protected kernel text · 19d36ccd
      Andi Kleen committed
      Re-enable kprobes and alternative patching when the kernel text is write
      protected by DEBUG_RODATA.
      
      Add a general utility function to change write protected text.  The new
      function remaps the code using vmap to write it and takes care of CPU
      synchronization.  It also does CLFLUSH to make icache recovery faster.
      
      There are some limitations on when the function can be used, see the
      comment.
      
      This is a newer version that also changes the paravirt_ops code.
      text_poke also supports multi byte patching now.
      
      Contains bug fixes from Zach Amsden and suggestions from Mathieu
      Desnoyers.
      
      Cc: Jan Beulich <jbeulich@novell.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
      Cc: Zach Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      19d36ccd
  10. 22 Jul, 2007 1 commit
  11. 20 Jul, 2007 1 commit
    • mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt committed
      Slab destructors were no longer supported after Christoph's c59def9f
      change. They've been BUGs for both slab and slub, and slob never
      supported them either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      20c2df83
  12. 18 Jul, 2007 2 commits
  13. 22 Jun, 2007 1 commit
  14. 13 May, 2007 1 commit
  15. 09 May, 2007 1 commit
  16. 07 May, 2007 1 commit
    • Revert "[PATCH] x86: __pa and __pa_symbol address space separation" · e3ebadd9
      Linus Torvalds committed
      This was broken.  It adds complexity, for no good reason.  Rather than
      separate __pa() and __pa_symbol(), we should deprecate __pa_symbol(),
      and preferably __pa() too - and just use "virt_to_phys()" instead, which
      is more readable and has nicer semantics.
      
      However, right now, just undo the separation, and make __pa_symbol() be
      the exact same as __pa().  That fixes the bugs this patch introduced,
      and we can do the fairly obvious cleanups later.
      
      Do the new __phys_addr() function (which is now the actual workhorse for
      the unified __pa()/__pa_symbol()) as a real external function, that way
      all the potential issues with compile/link-time optimizations of
      constant symbol addresses go away, and we can also, if we choose to, add
      more sanity-checking of the argument.
      
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Vivek Goyal <vgoyal@in.ibm.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e3ebadd9
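The idea of routing both macros through one real function that can sanity-check its argument might look roughly like this. It is a sketch under loud assumptions: a 32-bit direct mapping at a hypothetical `TOY_PAGE_OFFSET` of 0xC0000000, and invented names `toy_phys_addr`, `toy_pa` and `toy_pa_symbol`; the kernel's own `__phys_addr()` is an out-of-line function whose body is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: kernel direct mapping starts at 3 GB (default i386 split). */
#define TOY_PAGE_OFFSET 0xC0000000u

/*
 * Single workhorse for address translation. Being an ordinary function
 * (rather than a macro on a constant symbol) leaves room for argument
 * checks such as the assert below.
 */
static uint32_t toy_phys_addr(uint32_t vaddr)
{
    assert(vaddr >= TOY_PAGE_OFFSET);
    return vaddr - TOY_PAGE_OFFSET;
}

/* After the revert both spellings mean exactly the same thing. */
#define toy_pa(x)        toy_phys_addr(x)
#define toy_pa_symbol(x) toy_pa(x)
```

For example, `toy_pa(0xC0100000u)` yields 0x00100000 in this model, and `toy_pa_symbol()` gives the identical result.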
  17. 03 May, 2007 4 commits
    • [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing · 5311ab62
      Jeremy Fitzhardinge committed
      Normally when running in PAE mode, the 4th PMD maps the kernel address space,
      which can be shared among all processes (since they all need the same kernel
      mappings).
      
      Xen, however, does not allow guests to have the kernel pmd shared between page
      tables, so parameterize pgtable.c to allow both modes of operation.
      
      There are several side-effects of this.  One is that vmalloc will update the
      kernel address space mappings, and those updates need to be propagated into
      all processes if the kernel mappings are not intrinsically shared.  In the
      non-PAE case, this is done by maintaining a pgd_list of all processes; this
      list is used when all process pagetables must be updated.  pgd_list is
      threaded via otherwise unused entries in the page structure for the pgd, which
      means that the pgd must be page-sized for this to work.
      
      Normally the PAE pgd is only 4x64-bit entries (32 bytes) large, but Xen
      requires the PAE pgd to be page aligned anyway, so this patch forces the
      pgd to be page aligned+sized when the kernel pmd is unshared, to
      accommodate both these requirements.
      
      Also, since there may be several distinct kernel pmds (if the user/kernel
      split is below 3G), there's no point in allocating them from a slab cache;
      they're just allocated with get_free_page and initialized appropriately.  (Of
      course they could be cached if there is just a single kernel pmd - which is the
      default with a 3G user/kernel split - but it doesn't seem worthwhile to add
      yet another case into this code).
      
      [ Many thanks to wli for review comments. ]
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: William Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      5311ab62
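The size choice discussed in this commit can be worked through in a few lines. This is only a sketch: `toy_pae_pgd_alloc_size` and the `TOY_` constants are hypothetical, and the model assumes 4 KB pages and 8-byte PAE entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE        4096u
#define TOY_PAE_PTRS_PER_PGD 4u /* a PAE pgd has four top-level entries */

/*
 * A PAE pgd strictly needs 4 entries of 8 bytes each. When the kernel
 * pmd is not shared, the pgd must be page sized so that it can be
 * threaded onto pgd_list through its struct page.
 */
static size_t toy_pae_pgd_alloc_size(bool shared_kernel_pmd)
{
    size_t minimal = TOY_PAE_PTRS_PER_PGD * sizeof(uint64_t);

    assert(minimal <= TOY_PAGE_SIZE);
    return shared_kernel_pmd ? minimal : TOY_PAGE_SIZE;
}
```

In this model the shared case needs 32 bytes and the unshared case a full 4096-byte page.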
    • [PATCH] i386: PARAVIRT: Hooks to set up initial pagetable · b239fb25
      Jeremy Fitzhardinge committed
      This patch introduces paravirt_ops hooks to control how the kernel's
      initial pagetable is set up.
      
      In the case of a native boot, the very early bootstrap code creates a
      simple non-PAE pagetable to map the kernel and physical memory.  When
      the VM subsystem is initialized, it creates a proper pagetable which
      respects the PAE mode, large pages, etc.
      
      When booting under a hypervisor, there are many possibilities for what
      paging environment the hypervisor establishes for the guest kernel, so
      the construction of the kernel's pagetable depends on the hypervisor.
      
      In the case of Xen, the hypervisor boots the kernel with a fully
      constructed pagetable, which is already using PAE if necessary.  Also,
      Xen requires particular care when constructing pagetables to make sure
      all pagetables are always mapped read-only.
      
      In order to make this easier, the kernel's initial pagetable construction
      has been changed to only allocate and initialize a pagetable page if
      there's no page already present in the pagetable.  This allows the Xen
      paravirt backend to make a copy of the hypervisor-provided pagetable,
      allowing the kernel to establish any more mappings it needs while
      keeping the existing ones.
      
      A slightly subtle point which is worth highlighting here is that Xen
      requires all kernel mappings to share the same pte_t pages between all
      pagetables, so that updating a kernel page's mapping in one pagetable
      is reflected in all other pagetables.  This makes it possible to
      allocate a page and attach it to a pagetable without having to
      explicitly enumerate that page's mapping in all pagetables.
      
      And:
      
      From: "Eric W. Biederman" <ebiederm@xmission.com>
      
      If we don't set the leaf page table entries it is quite possible that
      we will inherit an incorrect page table entry from the initial boot
      page table setup in head.S.  So we need to redo the effort here,
      so we pick up PSE, PGE and the like.
      
      Hypervisors like Xen require that their page tables be read-only,
      which is slightly incompatible with our low identity mappings. However,
      I discussed this with Jeremy and he has modified the Xen early set_pte
      function to avoid problems in this area.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Acked-by: William Irwin <bill.irwin@oracle.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      b239fb25
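The "allocate only if nothing is present yet" rule described above can be illustrated with a toy two-level table. This is a simplified user-space sketch: `toy_one_page_table_init`, the pointer-valued slot and `calloc` are assumptions made for illustration and differ from the kernel's bootmem-based code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PTRS_PER_PT 1024 /* entries per lower-level table (non-PAE) */

/*
 * 'slot' is an upper-level entry that either points at an existing
 * lower-level table (for example one provided by a hypervisor) or is
 * NULL. An existing table is reused untouched; a zeroed one is
 * allocated only when the slot is empty.
 */
static uint32_t *toy_one_page_table_init(uint32_t **slot)
{
    assert(slot != NULL);

    if (*slot == NULL)
        *slot = calloc(TOY_PTRS_PER_PT, sizeof(uint32_t)); /* NULL on failure */
    return *slot;
}
```

Because a populated slot is returned as is, mappings that were already established survive, and the caller can add further entries to the same table.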
    • [PATCH] x86: tighten kernel image page access rights · 6fb14755
      Jan Beulich committed
      On x86-64, kernel memory freed after init can be entirely unmapped instead
      of just getting 'poisoned' by overwriting with a debug pattern.
      
      On i386 and x86-64 (under CONFIG_DEBUG_RODATA), kernel text and bug table
      can also be write-protected.
      
      Compared to the first version, this one prevents re-creating deleted
      mappings in the kernel image range on x86-64, if those got removed
      previously. This, together with the original changes, prevents temporarily
      having inconsistent mappings when cacheability attributes are being
      changed on such pages (e.g. from AGP code). While on i386 such duplicate
      mappings don't exist, the same change is done there, too, both for
      consistency and because checking pte_present() before using various other
      pte_XXX functions is a requirement anyway. At once, i386 code gets
      adjusted to use pte_huge() instead of open coding this.
      
      AK: split out cpa() changes
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      6fb14755
    • [PATCH] x86: __pa and __pa_symbol address space separation · 0dbf7028
      Vivek Goyal committed
      Currently __pa_symbol is for use with symbols in the kernel address
      map and __pa is for use with pointers into the physical memory map.
      But the code is implemented so you can usually interchange the two.
      
      __pa, which is much more common, can be implemented much more cheaply
      if it doesn't have to worry about any other kernel address
      spaces.  This is especially true with a relocatable kernel as
      __pa_symbol needs to perform an extra variable read to resolve
      the address.
      
      There is a third macro that is added for the vsyscall data,
      __pa_vsymbol, for finding the physical addresses of vsyscall pages.
      
      Most of this patch is simply sorting through the references to
      __pa or __pa_symbol and using the proper one.  A little of
      it is continuing to use a physical address when we have it
      instead of recalculating it several times.
      
      swapper_pgd is now NULL.  leave_mm now uses init_mm.pgd
      and init_mm.pgd is initialized at boot (instead of compile time)
      to the physmem virtual mapping of init_level4_pgd.  The
      physical address changed.
      
      Except for the EMPTY_ZERO page all of the remaining references
      to __pa_symbol appear to be during kernel initialization.  So this
      should reduce the cost of __pa in the common case, even on a relocated
      kernel.
      
      As this is technically a semantic change we need to be on the lookout
      for anything I missed.  But it works for me (tm).
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      0dbf7028
  18. 13 Feb, 2007 1 commit
    • [PATCH] MM: page allocation hooks for VMI backend · c119ecce
      Zachary Amsden committed
      The VMI backend uses explicit page type notification to track shadow page
      tables.  The allocation of page table roots is especially tricky.  We need to
      clone the root for non-PAE mode while it is protected under the pgd lock to
      correctly copy the shadow.
      
      We don't need to allocate pgds in PAE mode (PDPs in Intel terminology), as
      they only have 4 entries and are cached entirely by the processor, which
      makes shadowing them rather simple.
      
      For base page table level allocation, pmd_populate provides the exact hook
      point we need.  Also, we need to allocate pages when splitting a large page,
      and we must release pages before returning the page to any free pool.
      
      Despite being required with these slightly odd semantics for VMI, Xen also
      uses these hooks to determine the exact moment when page tables are created or
      released.
      
      AK: All nops for other architectures
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      c119ecce
  19. 11 Jan, 2007 1 commit
  20. 23 Dec, 2006 1 commit
  21. 08 Dec, 2006 1 commit
  22. 07 Dec, 2006 1 commit
  23. 04 Oct, 2006 1 commit
  24. 01 Oct, 2006 1 commit
    • [PATCH] paravirt: update pte hook · 789e6ac0
      Zachary Amsden committed
      Add a pte_update_hook which notifies about pte changes that have been made
      without using the set_pte / clear_pte interfaces.  This allows shadow mode
      hypervisors which do not trap on page table access to maintain synchronized
      shadows.
      
      It also turns out, there was one pte update in PAE mode that wasn't using any
      accessor interface at all for setting NX protection.  Considering it is PAE
      specific, and the accessor is i386 specific, I didn't want to add a generic
      encapsulation of this behavior yet.
      Signed-off-by: Zachary Amsden <zach@vmware.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
      Cc: Andi Kleen <ak@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      789e6ac0
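The notification pattern this commit describes (change a pte directly, then tell the backend) can be sketched with a function pointer. Everything here is hypothetical: `struct toy_paravirt_ops`, `toy_clear_flag` and the counter only model the idea and are not the kernel's paravirt interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t toy_pte_t;

/* Hypothetical backend table with a single update hook. */
struct toy_paravirt_ops {
    void (*pte_update)(toy_pte_t *ptep);
};

static unsigned int toy_shadow_syncs; /* how often the shadow was resynced */

/* Native hardware needs no notification. */
static void toy_native_pte_update(toy_pte_t *ptep)
{
    (void)ptep;
}

/* A shadow-mode backend would resynchronize its copy here. */
static void toy_shadow_pte_update(toy_pte_t *ptep)
{
    (void)ptep;
    toy_shadow_syncs++;
}

/*
 * Clear a flag bit in place, bypassing any set_pte-style accessor,
 * then notify the backend that the entry changed behind its back.
 */
static void toy_clear_flag(const struct toy_paravirt_ops *ops,
                           toy_pte_t *ptep, toy_pte_t flag)
{
    assert(ops != NULL && ops->pte_update != NULL);
    *ptep &= ~flag;
    ops->pte_update(ptep);
}
```

On native hardware the hook costs an empty call, while a shadowing hypervisor gets exactly one notification per direct modification.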
  25. 26 Sep, 2006 4 commits
    • [PATCH] x86: make __FIXADDR_TOP variable to allow it to make space for a hypervisor · 052e7994
      Jeremy Fitzhardinge committed
      Make __FIXADDR_TOP a variable, so that it can be set to not get in the way of
      address space a hypervisor may want to reserve.
      
      Original patch by Gerd Hoffmann <kraxel@suse.de>
      Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: Chris Wright <chrisw@sous-sol.org>
      Cc: Gerd Hoffmann <kraxel@suse.de>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      052e7994
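Why a variable top matters can be seen from how fixmap addresses are derived from it. The following is a sketch under assumptions: 32-bit addresses, 4 KB pages, a default top of 0xFFFFF000, and hypothetical helpers `toy_reserve_top_address` and `toy_fix_to_virt` that only imitate the shape of the kernel's helpers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)

/* A variable instead of a compile-time constant (assumed default). */
static uint32_t toy_fixaddr_top = 0xFFFFF000u;

/*
 * Move the top of the fixmap area down so that 'reserve' bytes at the
 * very top of the address space are left free for a hypervisor.
 */
static void toy_reserve_top_address(uint32_t reserve)
{
    assert((reserve & (TOY_PAGE_SIZE - 1)) == 0);
    toy_fixaddr_top = 0u - reserve - TOY_PAGE_SIZE;
}

/* Fixmap slots grow downwards from the top, one page per index. */
static uint32_t toy_fix_to_virt(uint32_t idx)
{
    return toy_fixaddr_top - (idx << TOY_PAGE_SHIFT);
}
```

After reserving 64 MB in this model, slot 0 moves from 0xFFFFF000 to 0xFBFFF000 and all other slots shift down with it.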
    • [PATCH] reduce MAX_NR_ZONES: remove two strange uses of MAX_NR_ZONES · 776ed98b
      Christoph Lameter committed
      I keep seeing zones on various platforms that are never used and wonder why we
      compile support for them into the kernel.  Counters show up for HIGHMEM and
      DMA32 that are always zero.
      
      This patch allows the removal of ZONE_DMA32 for non x86_64 architectures and
      it will get rid of ZONE_HIGHMEM for arches not using highmem (like 64 bit
      architectures).  If an arch does not define CONFIG_HIGHMEM then ZONE_HIGHMEM
      will not be defined.  Similarly if an arch does not define CONFIG_ZONE_DMA32
      then ZONE_DMA32 will not be defined.
      
      No current architecture uses all the 4 zones (DMA,DMA32,NORMAL,HIGH) that we
      have now.  The patchset will reduce the number of zones for all platforms.
      
      On many platforms that do not have DMA32 or HIGHMEM this will reduce the
      number of zones by 50%.  For example, ia64 only uses DMA and NORMAL.
      
      Large amounts of memory can be saved for larger systems that may have a few
      hundred NUMA nodes.
      
      With ZONE_DMA32 and ZONE_HIGHMEM support optional MAX_NR_ZONES will be 2 for
      many non i386 platforms and even for i386 without CONFIG_HIGHMEM set.
      
      Tested on ia64, x86_64 and on i386 with and without highmem.
      
      The patchset consists of 11 patches that are following this message.
      
      One could go even further than this patchset and also make ZONE_DMA optional
      because some platforms do not need a separate DMA zone and can do DMA to all
      of memory.  This could reduce MAX_NR_ZONES to 1.  Such a patchset will
      hopefully follow soon.
      
      This patch:
      
      Fix strange uses of MAX_NR_ZONES
      
      Sometimes we use MAX_NR_ZONES - x to refer to a zone.  Make that explicit.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@osdl.org>
      Signed-off-by: Linus Torvalds <torvalds@osdl.org>
      776ed98b
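How optional zones shrink the zone count can be shown with a conditional enum. This is an illustrative sketch: the `TOY_` names and config macros are hypothetical stand-ins, not the definitions from the patchset.

```c
#include <assert.h>

/*
 * Zones that an architecture does not configure are simply not
 * defined, so the final enumerator counts only the zones in use.
 */
enum toy_zone_type {
    TOY_ZONE_DMA,
#ifdef TOY_CONFIG_ZONE_DMA32
    TOY_ZONE_DMA32,
#endif
    TOY_ZONE_NORMAL,
#ifdef TOY_CONFIG_HIGHMEM
    TOY_ZONE_HIGHMEM,
#endif
    TOY_MAX_NR_ZONES
};

/*
 * Refer to a zone by name instead of by "TOY_MAX_NR_ZONES - x", which
 * silently changes meaning once zones become optional.
 */
static int toy_is_normal_zone(enum toy_zone_type zone)
{
    assert(zone < TOY_MAX_NR_ZONES);
    return zone == TOY_ZONE_NORMAL;
}
```

Compiled with neither macro defined, `TOY_MAX_NR_ZONES` evaluates to 2, matching the DMA plus NORMAL case mentioned for ia64.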
    • [PATCH] i386: Replace i386 open-coded cmdline parsing with · 1a3f239d
      Rusty Russell committed
      This patch replaces the open-coded early commandline parsing
      throughout the i386 boot code with the generic mechanism (already used
      by ppc, powerpc, ia64 and s390).  The code was inconsistent with
      whether it deletes the option from the cmdline or not, meaning some of
      these will get passed through the environment into init.
      
      This transformation is mainly mechanical, but there are some notable
      parts:
      
      1) Grammar: s/linux never set's it up/linux never sets it up/
      
      2) Remove hacked-in earlyprintk= option scanning.  When someone
         actually implements CONFIG_EARLY_PRINTK, then they can use
         early_param().
      [AK: actually it is implemented, but I'm adding the early_param for it in the next
      x86-64 patch]
      
      3) Move declaration of generic_apic_probe() from setup.c into asm/apic.h
      
      4) Various parameters now moved into their appropriate files (thanks Andi).
      
      5) All parse functions which examine arg need to check for NULL,
         except one where it has subtle humor value.
      
      AK: readded acpi_sci handling which was completely dropped
      AK: moved some more variables into acpi/boot.c
      
      Cc: len.brown@intel.com
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Andi Kleen <ak@suse.de>
      1a3f239d
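A table-driven parser in the spirit of the generic mechanism, including the NULL check from point 5, could look like this. It is a user-space sketch with hypothetical names (`struct toy_early_param`, `toy_do_early_param`, `toy_parse_mem`); the kernel's `early_param()` registers handlers differently, and its exact internals are not shown on this page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* One table entry: option name plus the function that parses its value. */
struct toy_early_param {
    const char *name;
    int (*setup)(const char *arg);
};

static unsigned long toy_mem_limit_mb;

/* Handlers must cope with a missing value ("mem" without "=..."). */
static int toy_parse_mem(const char *arg)
{
    if (arg == NULL)
        return -1;
    toy_mem_limit_mb = strtoul(arg, NULL, 10);
    return 0;
}

static const struct toy_early_param toy_params[] = {
    { "mem", toy_parse_mem },
};

/*
 * Dispatch one "name" or "name=value" token. Returns the handler's
 * result, or 1 if no handler is registered for the name.
 */
static int toy_do_early_param(const char *token)
{
    const char *eq;
    size_t i, len;

    assert(token != NULL);
    eq = strchr(token, '=');
    len = eq ? (size_t)(eq - token) : strlen(token);

    for (i = 0; i < sizeof(toy_params) / sizeof(toy_params[0]); i++) {
        const struct toy_early_param *p = &toy_params[i];

        if (strlen(p->name) == len && strncmp(token, p->name, len) == 0)
            return p->setup(eq ? eq + 1 : NULL);
    }
    return 1;
}
```

Calling `toy_do_early_param("mem=512")` stores 512 in `toy_mem_limit_mb`, while a bare `"mem"` reaches the handler with a NULL argument and is rejected instead of crashing.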
    • [PATCH] i386: initialize end-of-memory variables as early as possible · ba9c231f
      Jan Beulich committed
      Move initialization of all memory end variables to as early as
      possible, so that dependent code doesn't need to check whether these
      variables have already been set.
      
      Change the range check in kunmap_atomic to actually make use of this
      so that the no-mapping-established path (under CONFIG_DEBUG_HIGHMEM)
      gets used only when the address is inside the lowmem area (and BUG()
      otherwise).
      Signed-off-by: Jan Beulich <jbeulich@novell.com>
      Signed-off-by: Andi Kleen <ak@suse.de>
      ba9c231f
  26. 02 Jul, 2006 1 commit
  27. 01 Jul, 2006 1 commit
  28. 28 Jun, 2006 2 commits