1. 03 5月, 2010 1 次提交
    • I
      x86: Fix parse_reservetop() build failure on certain configs · 56f0e74c
      Ingo Molnar 提交于
      Commit e67a807f ("x86: Fix 'reservetop=' functionality") added a
      fixup_early_ioremap() call to parse_reservetop() and declared it
      in io.h.
      
      But asm/io.h was only included indirectly - and on some configs
      not at all, causing a build failure on those configs.
      
      Cc: Liang Li <liang.li@windriver.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Wang Chen <wangchen@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      56f0e74c
  2. 30 4月, 2010 1 次提交
    • L
      x86: Fix 'reservetop=' functionality · e67a807f
      Liang Li 提交于
      When specifying the 'reservetop=0xbadc0de' kernel parameter,
      the kernel will stop booting due to a early_ioremap bug that
      relates to commit 8827247f.
      
      The root cause of boot failure problem is the value of
      'slot_virt[i]' was initialized in setup_arch->early_ioremap_init().
      But later in setup_arch, the function 'parse_early_param' will
      modify 'FIXADDR_TOP' when 'reservetop=0xbadc0de' being specified.
      
      The simplest fix might be use __fix_to_virt(idx0) to get updated
      value of 'FIXADDR_TOP' in '__early_ioremap' instead of reference
      old value from slot_virt[slot] directly.
      
      Changelog since v0:
      
      -v1: When reservetop being handled then FIXADDR_TOP get
           adjusted, Hence check prev_map then re-initialize slot_virt and
           PMD based on new FIXADDR_TOP.
      
      -v2: place fixup_early_ioremap hence call early_ioremap_init in
           reserve_top_address  to re-initialize slot_virt and
           corresponding PMD when parse_reservertop
      
      -v3: move fixup_early_ioremap out of reserve_top_address to make
           sure other clients of reserve_top_address like xen/lguest won't
           broken
      Signed-off-by: NLiang Li <liang.li@windriver.com>
      Tested-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Acked-by: NYinghai Lu <yinghai@kernel.org>
      Acked-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Wang Chen <wangchen@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      LKML-Reference: <1272621711-8683-1-git-send-email-liang.li@windriver.com>
      [ fixed three small cleanliness details in fixup_early_ioremap() ]
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e67a807f
  3. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  4. 19 3月, 2009 1 次提交
  5. 03 3月, 2009 1 次提交
  6. 28 2月, 2009 1 次提交
  7. 21 8月, 2008 1 次提交
    • D
      i386: vmalloc size fix · e621bd18
      Dave Young 提交于
      Booting kernel with vmalloc=[any size<=16m] will oops on my pc (i386/1G memory).
      
      BUG_ON in arch/x86/mm/init_32.c triggered:
      BUG_ON((unsigned long)high_memory > VMALLOC_START);
      
      It's due to the vm area hole.
      
      In include/asm-x86/pgtable_32.h:
      #define VMALLOC_OFFSET	(8 * 1024 * 1024)
      #define VMALLOC_START	(((unsigned long)high_memory + 2 * VMALLOC_OFFSET - 1) \
      			 & ~(VMALLOC_OFFSET - 1))
      
      There's several related point:
      1. MAXMEM :
       (-__PAGE_OFFSET - __VMALLOC_RESERVE).
      The space after VMALLOC_END is included as well, I set it to
      (VMALLOC_END - PAGE_OFFSET - __VMALLOC_RESERVE)
      
      2. VMALLOC_OFFSET is not considered in __VMALLOC_RESERVE
      fixed by adding VMALLOC_OFFSET to it.
      
      3. VMALLOC_START :
       (((unsigned long)high_memory + 2 * VMALLOC_OFFSET - 1) & ~(VMALLOC_OFFSET - 1))
      So it's not always 8M, bigger than 8M possible.
      I set it to ((unsigned long)high_memory + VMALLOC_OFFSET)
      
      4. the VMALLOC_RESERVE is an unused macro, so remove it here.
      Signed-off-by: NDave Young <hidave.darkstar@gmail.com>
      Cc: akpm@linux-foundation.org
      Cc: hidave.darkstar@gmail.com
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      e621bd18
  8. 27 7月, 2008 1 次提交
  9. 09 7月, 2008 1 次提交
  10. 08 7月, 2008 1 次提交
  11. 20 6月, 2008 2 次提交
  12. 07 5月, 2008 1 次提交
    • H
      x86: fix PAE pmd_bad bootup warning · aeed5fce
      Hugh Dickins 提交于
      Fix warning from pmd_bad() at bootup on a HIGHMEM64G HIGHPTE x86_32.
      
      That came from 9fc34113 x86: debug pmd_bad();
      but we understand now that the typecasting was wrong for PAE in the previous
      version: pagetable pages above 4GB looked bad and stopped Arjan from booting.
      
      And revert that cded932b x86: fix pmd_bad
      and pud_bad to support huge pages.  It was the wrong way round: we shouldn't
      weaken every pmd_bad and pud_bad check to let huge pages slip through - in
      part they check that we _don't_ have a huge page where it's not expected.
      
      Put the x86 pmd_bad() and pud_bad() definitions back to what they have long
      been: they can be improved (x86_32 should use PTE_MASK, to stop PAE thinking
      junk in the upper word is good; and x86_64 should follow x86_32's stricter
      comparison, to stop thinking any subset of required bits is good); but that
      should be a later patch.
      
      Fix Hans' good observation that follow_page() will never find pmd_huge()
      because that would have already failed the pmd_bad test: test pmd_huge in
      between the pmd_none and pmd_bad tests.  Tighten x86's pmd_huge() check?
      No, once it's a hugepage entry, it can get quite far from a good pmd: for
      example, PROT_NONE leaves it with only ACCESSED of the KERN_PGTABLE bits.
      
      However... though follow_page() contains this and another test for huge
      pages, so it's nice to keep it working on them, where does it actually get
      called on a huge page?  get_user_pages() checks is_vm_hugetlb_page(vma) to
      to call alternative hugetlb processing, as does unmap_vmas() and others.
      Signed-off-by: NHugh Dickins <hugh@veritas.com>
      Earlier-version-tested-by: NIngo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jeff Chua <jeff.chua.linux@gmail.com>
      Cc: Hans Rosenfeld <hans.rosenfeld@amd.com>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aeed5fce
  13. 25 4月, 2008 3 次提交
  14. 20 4月, 2008 1 次提交
  15. 17 4月, 2008 2 次提交
  16. 12 3月, 2008 1 次提交
    • T
      x86: remove quicklists · 985a34bd
      Thomas Gleixner 提交于
      quicklists cause a serious memory leak on 32-bit x86,
      as documented at:
      
        http://bugzilla.kernel.org/show_bug.cgi?id=9991
      
      the reason is that the quicklist pool is a special-purpose
      cache that grows out of proportion. It is not accounted for
      anywhere and users have no way to even realize that it's
      the quicklists that are causing RAM usage spikes. It was
      supposed to be a relatively small pool, but as demonstrated
      by KOSAKI Motohiro, they can grow as large as:
      
        Quicklists:    1194304 kB
      
      given how much trouble this code has caused historically,
      and given that Andrew objected to its introduction on x86
      (years ago), the best option at this point is to remove them.
      
      [ any performance benefits of caching constructed pgds should
        be implemented in a more generic way (possibly within the page
        allocator), while still allowing constructed pages to be
        allocated by other workloads. ]
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      985a34bd
  17. 09 2月, 2008 1 次提交
    • M
      CONFIG_HIGHPTE vs. sub-page page tables. · 2f569afd
      Martin Schwidefsky 提交于
      Background: I've implemented 1K/2K page tables for s390.  These sub-page
      page tables are required to properly support the s390 virtualization
      instruction with KVM.  The SIE instruction requires that the page tables
      have 256 page table entries (pte) followed by 256 page status table entries
      (pgste).  The pgstes are only required if the process is using the SIE
      instruction.  The pgstes are updated by the hardware and by the hypervisor
      for a number of reasons, one of them is dirty and reference bit tracking.
      To avoid wasting memory the standard pte table allocation should return
      1K/2K (31/64 bit) and 2K/4K if the process is using SIE.
      
      Problem: Page size on s390 is 4K, page table size is 1K or 2K.  That means
      the s390 version for pte_alloc_one cannot return a pointer to a struct
      page.  Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one
      cannot return a pointer to a pte either, since that would require more than
      32 bit for the return value of pte_alloc_one (and the pte * would not be
      accessible since its not kmapped).
      
      Solution: The only solution I found to this dilemma is a new typedef: a
      pgtable_t.  For s390 pgtable_t will be a (pte *) - to be introduced with a
      later patch.  For everybody else it will be a (struct page *).  The
      additional problem with the initialization of the ptl lock and the
      NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and
      a destructor pgtable_page_dtor.  The page table allocation and free
      functions need to call these two whenever a page table page is allocated or
      freed.  pmd_populate will get a pgtable_t instead of a struct page pointer.
       To get the pgtable_t back from a pmd entry that has been installed with
      pmd_populate a new function pmd_pgtable is added.  It replaces the pmd_page
      call in free_pte_range and apply_to_pte_range.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2f569afd
  18. 06 2月, 2008 1 次提交
  19. 04 2月, 2008 2 次提交
  20. 01 2月, 2008 1 次提交
  21. 30 1月, 2008 5 次提交
  22. 18 10月, 2007 2 次提交
  23. 17 10月, 2007 1 次提交
  24. 11 10月, 2007 2 次提交
  25. 22 7月, 2007 1 次提交
  26. 13 5月, 2007 1 次提交
  27. 03 5月, 2007 2 次提交
    • J
      [PATCH] i386: PARAVIRT: Allow paravirt backend to choose kernel PMD sharing · 5311ab62
      Jeremy Fitzhardinge 提交于
      Normally when running in PAE mode, the 4th PMD maps the kernel address space,
      which can be shared among all processes (since they all need the same kernel
      mappings).
      
      Xen, however, does not allow guests to have the kernel pmd shared between page
      tables, so parameterize pgtable.c to allow both modes of operation.
      
      There are several side-effects of this.  One is that vmalloc will update the
      kernel address space mappings, and those updates need to be propagated into
      all processes if the kernel mappings are not intrinsically shared.  In the
      non-PAE case, this is done by maintaining a pgd_list of all processes; this
      list is used when all process pagetables must be updated.  pgd_list is
      threaded via otherwise unused entries in the page structure for the pgd, which
      means that the pgd must be page-sized for this to work.
      
      Normally the PAE pgd is only 4x64 byte entries large, but Xen requires the PAE
      pgd to page aligned anyway, so this patch forces the pgd to be page
      aligned+sized when the kernel pmd is unshared, to accomodate both these
      requirements.
      
      Also, since there may be several distinct kernel pmds (if the user/kernel
      split is below 3G), there's no point in allocating them from a slab cache;
      they're just allocated with get_free_page and initialized appropriately.  (Of
      course the could be cached if there is just a single kernel pmd - which is the
      default with a 3G user/kernel split - but it doesn't seem worthwhile to add
      yet another case into this code).
      
      [ Many thanks to wli for review comments. ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NWilliam Lee Irwin III <wli@holomorphy.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: Christoph Lameter <clameter@sgi.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      5311ab62
    • J
      [PATCH] i386: Relocate VDSO ELF headers to match mapped location with COMPAT_VDSO · d4f7a2c1
      Jeremy Fitzhardinge 提交于
      Some versions of libc can't deal with a VDSO which doesn't have its
      ELF headers matching its mapped address.  COMPAT_VDSO maps the VDSO at
      a specific system-wide fixed address.  Previously this was all done at
      build time, on the grounds that the fixed VDSO address is always at
      the top of the address space.  However, a hypervisor may reserve some
      of that address space, pushing the fixmap address down.
      
      This patch does the adjustment dynamically at runtime, depending on
      the runtime location of the VDSO fixmap.
      
      [ Patch has been through several hands: Jan Beulich wrote the orignal
        version; Zach reworked it, and Jeremy converted it to relocate phdrs
        as well as sections. ]
      Signed-off-by: NJeremy Fitzhardinge <jeremy@xensource.com>
      Signed-off-by: NAndi Kleen <ak@suse.de>
      Cc: Zachary Amsden <zach@vmware.com>
      Cc: "Jan Beulich" <JBeulich@novell.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Andi Kleen <ak@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Roland McGrath <roland@redhat.com>
      d4f7a2c1
  28. 13 2月, 2007 1 次提交