1. 20 Mar, 2011 1 commit
    • x86: Cleanup highmap after brk is concluded · e5f15b45
      Yinghai Lu committed
      cleanup_highmap currently happens in two steps: the first is early in head64.c
      and only clears above _end; the second is in init_memory_mapping() and
      tries to clean from _brk_end to _end.
      It should check if those boundaries are PMD_SIZE aligned but currently
      does not.
      Also init_memory_mapping() is called several times for numa or memory
      hotplug, so we really should not handle initial kernel mappings there.
      
      This patch moves cleanup_highmap() down after _brk_end is settled so
      we can do everything in one step.
      Also we honor max_pfn_mapped in the implementation of cleanup_highmap.
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      LKML-Reference: <alpine.DEB.2.00.1103171739050.3382@kaball-desktop>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  2. 14 Oct, 2010 1 commit
  3. 28 Aug, 2010 2 commits
    • x86, memblock: Replace e820_/_early string with memblock_ · a9ce6bc1
      Yinghai Lu committed
      1. Include linux/memblock.h directly, so that e820.h references can be reduced later.
      2. This patch was done mainly by sed scripts.
      
      -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
    • x86: Use memblock to replace early_res · 72d7c3b3
      Yinghai Lu committed
      1. replace find_e820_area with memblock_find_in_range
      2. replace reserve_early with memblock_x86_reserve_range
      3. replace free_early with memblock_x86_free_range.
      4. NO_BOOTMEM will switch to use memblock too.
      5. use _e820, _early wrap in the patch, in following patch, will
         replace them all
      6. because memblock_x86_free_range supports partial free, we can remove some special care
      7. Need to make sure that memblock_find_in_range() is called after memblock_x86_fill()
         so adjust some calling later in setup.c::setup_arch()
         -- corruption_check and mptable_update
      
      -v2: Move reserve_brk() early
          Do it before fill_memblock_area, to avoid overlap between brk and
          memblock_find_in_range(). That could happen when we have more than 128 RAM
          entries in the E820 tables, and memblock_x86_fill() could use
          memblock_find_in_range() to find a new place for the
          memblock.memory.region array.
          We also don't need to use extend_brk() after fill_memblock_area(),
          so move reserve_brk() early, before fill_memblock_area().
      -v3: Move find_smp_config early
          To make sure memblock_find_in_range() does not find the wrong place if the BIOS
          doesn't put the mptable in the right place.
      -v4: Treat RESERVED_KERN as RAM in memblock.memory; those ranges are already in
          memblock.reserved.
          Use __NOT_KEEP_MEMBLOCK to make sure memblock related code could be freed later.
      -v5: The generic __memblock_find_in_range() goes from high to low, and for 32bit
          active_region does include high pages, so we
          need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped()
      -v6: Use current_limit instead
      -v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L
      -v8: Set memblock_can_resize early to handle EFI with more RAM entries
      -v9: update after kmemleak changes in mainline
      Suggested-by: David S. Miller <davem@davemloft.net>
      Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  4. 30 Mar, 2010 1 commit
    • x86: Make sure free_init_pages() frees pages on page boundary · c967da6a
      Yinghai Lu committed
      When CONFIG_NO_BOOTMEM=y, the kernel can use memory more efficiently, or
      in a more compact fashion.
      
      Example:
      
       Allocated new RAMDISK: 00ec2000 - 0248ce57
       Move RAMDISK from 000000002ea04000 - 000000002ffcee56 to 00ec2000 - 0248ce56
      
      The new RAMDISK's end is not page aligned.
      Last page could be shared with other users.
      
      When free_init_pages() is called for initrd or .init, the page
      could be freed and we could corrupt other data.
      
      code segment in free_init_pages():
      
       |        for (; addr < end; addr += PAGE_SIZE) {
       |                ClearPageReserved(virt_to_page(addr));
       |                init_page_count(virt_to_page(addr));
       |                memset((void *)(addr & ~(PAGE_SIZE-1)),
       |                        POISON_FREE_INITMEM, PAGE_SIZE);
       |                free_page(addr);
       |                totalram_pages++;
       |        }
      
      The last half page could then be handed out as one whole free page.
      
      So page align the boundaries.
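
      Editor's sketch (not part of the patch): a small user-space model of why
      the unaligned loop above touches one page too many, using the RAMDISK
      range from the example. The SKETCH_ names and the 4 KB page size are
      assumptions made for illustration; the kernel's own helpers differ.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Round an address up to the next page boundary. */
unsigned long sketch_page_align_up(unsigned long addr)
{
	return (addr + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
}

/* Round an address down to the start of its page. */
unsigned long sketch_page_align_down(unsigned long addr)
{
	return addr & SKETCH_PAGE_MASK;
}

/*
 * Pages visited by a loop of the form
 *   for (addr = begin; addr < end; addr += PAGE_SIZE)
 * when `end` is not page aligned: the partial last page is counted too.
 */
size_t sketch_pages_touched_unaligned(unsigned long begin, unsigned long end)
{
	size_t n = 0;

	for (unsigned long addr = begin; addr < end; addr += SKETCH_PAGE_SIZE)
		n++;
	return n;
}

/* Pages that lie entirely inside [begin, end) and are safe to free. */
size_t sketch_pages_fully_covered(unsigned long begin, unsigned long end)
{
	unsigned long b = sketch_page_align_up(begin);
	unsigned long e = sketch_page_align_down(end);

	assert(begin <= end);
	return e > b ? (e - b) / SKETCH_PAGE_SIZE : 0;
}
```

      For the range 00ec2000 - 0248ce57 the two counts differ by exactly one
      page, which is the page that may be shared with another user.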
      
      -v2: make the original initramdisk to be aligned, according to
           Johannes, otherwise we have the chance to lose one page.
           we still need to keep initrd_end not aligned, otherwise it could
           confuse decompressor.
      -v3: change to WARN_ON instead, suggested by Johannes.
      -v4: use PAGE_ALIGN, suggested by Johannes.
           We may fix that macro name later to PAGE_ALIGN_UP, and PAGE_ALIGN_DOWN
           Add comments about assuming ramdisk start is aligned
           in relocate_initrd(), change to re get ramdisk_image instead of save it
           to make diff smaller. Add warning for wrong range, suggested by Johannes.
      -v6: remove one WARN()
           We need to align beginning in free_init_pages()
           do not copy more than ramdisk_size, noticed by Johannes
      Reported-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Tested-by: Stanislaw Gruszka <sgruszka@redhat.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      LKML-Reference: <1269830604-26214-3-git-send-email-yinghai@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  5. 11 Dec, 2009 1 commit
    • x86: Use find_e820() instead of hard coded trampoline address · 893f38d1
      Yinghai Lu committed
      Jens found the following crash/regression:
      
      [    0.000000] found SMP MP-table at [ffff8800000fdd80] fdd80
      [    0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 0-fff BIOS data page
      
      and
      
      [    0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 6000-7fff TRAMPOLINE
      
      and bisected it to b24c2a92 ("x86: Move find_smp_config()
      earlier and avoid bootmem usage").
      
      It turns out the BIOS is using the first 64k for mptable,
      without reserving it.
      
      So try to find a good range for the real-mode trampoline instead of
      hard coding it, in case some BIOS tries to use that range for something.
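
      Editor's sketch (not part of the patch): one way to search for a free
      range instead of hard coding an address. It is a user-space model; the
      struct, the function names and the inclusive [start, end] convention
      (matching the "0-fff" style of the panic messages) are assumptions for
      illustration and are not the kernel's find_e820 interface.

```c
#include <assert.h>
#include <stddef.h>

/* An inclusive [start, end] reservation, like the ranges in the log. */
struct sketch_res {
	unsigned long start;
	unsigned long end;
};

#define SKETCH_NOT_FOUND (~0UL)

/* Round addr up to a power-of-two alignment. */
unsigned long sketch_align_up(unsigned long addr, unsigned long align)
{
	return (addr + align - 1) & ~(align - 1);
}

/*
 * Return the lowest aligned address in [start, limit) where `size` bytes
 * fit without touching any reservation, or SKETCH_NOT_FOUND.
 */
unsigned long sketch_find_free_range(const struct sketch_res *res, size_t nres,
				     unsigned long start, unsigned long limit,
				     unsigned long size, unsigned long align)
{
	unsigned long addr = sketch_align_up(start, align);
	size_t i;

	assert(size > 0 && align > 0 && (align & (align - 1)) == 0);
again:
	if (addr >= limit || size > limit - addr)
		return SKETCH_NOT_FOUND;
	for (i = 0; i < nres; i++) {
		if (addr <= res[i].end && addr + size - 1 >= res[i].start) {
			/* Overlap: skip past this reservation and rescan. */
			addr = sketch_align_up(res[i].end + 1, align);
			goto again;
		}
	}
	return addr;
}
```

      With the two reservations from the panic messages (0-fff and 12-f011),
      a search for 0x2000 bytes starting at the old 0x6000 address moves on to
      the first range that is actually free.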
      Reported-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Yinghai Lu <yinghai@kernel.org>
      Tested-by: Jens Axboe <jens.axboe@oracle.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      LKML-Reference: <4B21630A.6000308@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  6. 31 Aug, 2009 1 commit
    • x86: Add early platform detection · 47a3d5da
      Thomas Gleixner committed
      Platforms like Moorestown require early setup and want to avoid the
      call to reserve_ebda_region. The x86_init override is too late when
      the MRST detection happens in setup_arch. Move the default i386
      x86_init overrides and the call to reserve_ebda_region into a separate
      function which is called as the default of a switch case depending on
      the hardware_subarch id in boot params. This allows us to add a case
      for MRST and let MRST have its own early setup function.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  7. 27 Aug, 2009 1 commit
  8. 15 Mar, 2009 1 commit
    • x86: add brk allocation for very, very early allocations · 93dbda7c
      Jeremy Fitzhardinge committed
      Impact: new interface
      
      Add a brk()-like allocator which effectively extends the bss in order
      to allow very early code to do dynamic allocations.  This is better than
      using statically allocated arrays for data in subsystems which may never
      get used.
      
      The space for brk allocations is in the bss ELF segment, so that the
      space is mapped properly by the code which maps the kernel, and so
      that bootloaders keep the space free rather than putting a ramdisk or
      something into it.
      
      The bss itself, delimited by __bss_stop, ends before the brk area
      (__brk_base to __brk_limit).  The kernel text, data and bss are reserved
      up to __bss_stop.
      
      Any brk-allocated data is reserved separately just before the kernel
      pagetable is built, as that code allocates from unreserved spaces
      in the e820 map, potentially allocating from any unused brk memory.
      Ultimately any unused memory in the brk area is used in the general
      kernel memory pool.
      
      Initially the brk space is set to 1MB, which is probably much larger
      than any user needs (the largest current user is i386 head_32.S's code
      to build the pagetables to map the kernel, which can get fairly large
      with a big kernel image and no PSE support).  So long as the system
      has sufficient memory for the bootloader to reserve the kernel+1MB brk,
      there are no bad effects resulting from an over-large brk.
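
      Editor's sketch (not part of the patch): a user-space model of a
      brk()-like bump allocator over a fixed, pre-reserved area. The static
      array stands in for the region between __brk_base and __brk_limit; the
      SKETCH_ names, the 64 KB size and returning NULL on exhaustion are
      assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_BRK_SIZE (64 * 1024)	/* the commit reserves 1MB; smaller here */

/* Stand-in for the region between __brk_base and __brk_limit. */
static _Alignas(16) unsigned char sketch_brk_area[SKETCH_BRK_SIZE];
static size_t sketch_brk_end;	/* offset of the first unused byte */

/*
 * Hand out `size` zeroed bytes at a power-of-two alignment, or NULL when
 * the area is exhausted. Nothing is ever freed individually.
 */
void *sketch_extend_brk(size_t size, size_t align)
{
	size_t mask = align - 1;
	size_t start;

	assert(align != 0 && (align & mask) == 0);
	start = (sketch_brk_end + mask) & ~mask;
	if (start > SKETCH_BRK_SIZE || size > SKETCH_BRK_SIZE - start)
		return NULL;
	sketch_brk_end = start + size;
	return memset(sketch_brk_area + start, 0, size);
}

/* Bytes consumed so far; the remainder could go back to the general pool. */
size_t sketch_brk_used(void)
{
	return sketch_brk_end;
}
```

      Only the bytes below the current end need to be reserved separately,
      which mirrors the description above of reserving brk-allocated data and
      releasing the unused remainder.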
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  9. 20 Jan, 2009 1 commit
  10. 16 Jan, 2009 7 commits
    • x86: misc clean up after the percpu update · 004aa322
      Tejun Heo committed
      Do the following cleanups:
      
      * kill x86_64_init_pda() which now is equivalent to pda_init()
      
      * use per_cpu_offset() instead of cpu_pda() when initializing
        initial_gs
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: make pda a percpu variable · b12d8db8
      Tejun Heo committed
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      As pda is now allocated in percpu area, it can easily be made a proper
      percpu variable.  Make it so by defining per cpu symbol from linker
      script and declaring it in C code for SMP and simply defining it for
      UP.  This change cleans up code and brings SMP and UP closer a bit.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: merge 64 and 32 SMP percpu handling · 9939ddaf
      Tejun Heo committed
      Now that pda is allocated as part of percpu, percpu doesn't need to be
      accessed through pda.  Unify x86_64 SMP percpu access with x86_32 SMP
      one.  Other than the segment register, operand size and the base of
      percpu symbols, they behave identically now.
      
      This patch replaces now unnecessary pda->data_offset with a dummy
      field which is necessary to keep stack_canary at its place.  This
      patch also moves per_cpu_offset initialization out of init_gdt() into
      setup_per_cpu_areas().  Note that this change also necessitates
      explicit per_cpu_offset initializations in voyager_smp.c.
      
      With this change, x86_OP_percpu()'s are as efficient on x86_64 as on
      x86_32 and also x86_64 can use assembly PER_CPU macros.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fold pda into percpu area on SMP · 1a51e3a0
      Tejun Heo committed
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      Currently pdas and percpu areas are allocated separately.  %gs points
      to local pda and percpu area can be reached using pda->data_offset.
      This patch folds pda into percpu area.
      
      Due to a strange gcc requirement, pda needs to be at the beginning of
      the percpu area so that pda->stack_canary is at %gs:40.  To achieve
      this, a new percpu output section macro - PERCPU_VADDR_PREALLOC() - is
      added and used to reserve pda sized chunk at the start of the percpu
      area.
      
      After this change, for boot cpu, %gs first points to pda in the
      data.init area and later during setup_per_cpu_areas() gets updated to
      point to the actual pda.  This means that setup_per_cpu_areas() needs
      to reload %gs for CPU0 while clearing pda area for other cpus as cpu0
      already has modified it when control reaches setup_per_cpu_areas().
      
      This patch also removes now unnecessary get_local_pda() and its call
      sites.
      
      A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
      per cpu area" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: use static _cpu_pda array · c8f3329a
      Tejun Heo committed
      _cpu_pda array first uses statically allocated storage in data.init
      and then switches to allocated bootmem to conserve space.  However,
      after folding pda area into percpu area, _cpu_pda array will be
      removed completely.  Drop the reallocation part to simplify the code
      for soon-to-follow changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    •
      x86: load pointer to pda into %gs while brining up a CPU · f32ff538
      Tejun Heo committed
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      CPU startup code in head_64.S loaded address of a zero page into %gs
      for temporary use till pda is loaded, but the address of the actual pda is
      available at that point.  Load the real address directly instead.
      
      This will help unifying percpu and pda handling later on.
      
      This patch is mostly taken from Mike Travis' "x86_64: Fold pda into
      per cpu area" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86: make percpu symbols zerobased on SMP · 3e5d8f97
      Tejun Heo committed
      [ Based on original patch from Christoph Lameter and Mike Travis. ]
      
      This patch makes percpu symbols zerobased on x86_64 SMP by adding
      PERCPU_VADDR() to vmlinux.lds.h which helps setting explicit vaddr on
      the percpu output section and using it in vmlinux_64.lds.S.  A new
      PHDR is added as existing ones cannot contain sections near address
      zero.  PERCPU_VADDR() also adds a new symbol __per_cpu_load which
      always points to the vaddr of the loaded percpu data.init region.
      
      The following adjustments have been made to accommodate the address
      change.
      
      * code to locate percpu gdt_page in head_64.S is updated to add the
        load address to the gdt_page offset.
      
      * __per_cpu_load is used in places where access to the init data area
        is necessary.
      
      * pda->data_offset is initialized soon after C code is entered as zero
        value doesn't work anymore.
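
      Editor's sketch (not part of the patch): a user-space model of
      zero-based per-CPU addressing, where a variable's "address" is only an
      offset into the per-CPU section and each CPU adds its own base. The
      struct layout and all sketch_ names are assumptions for illustration;
      sketch_per_cpu_load merely plays the role described for __per_cpu_load.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_NR_CPUS 4

/* Layout of the per-CPU section; field offsets start at zero. */
struct sketch_percpu_section {
	long counter;
	int  online;
};

/* Initial image of the section, copied once for every CPU. */
static const struct sketch_percpu_section sketch_per_cpu_load = {
	.counter = 100,
	.online  = 0,
};

static struct sketch_percpu_section sketch_areas[SKETCH_NR_CPUS];
static unsigned char *sketch_per_cpu_offset[SKETCH_NR_CPUS];

/* Give every CPU its own copy of the initial image and record its base. */
void sketch_setup_per_cpu_areas(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		memcpy(&sketch_areas[cpu], &sketch_per_cpu_load,
		       sizeof(sketch_per_cpu_load));
		sketch_per_cpu_offset[cpu] = (unsigned char *)&sketch_areas[cpu];
	}
}

/* Address of `counter` for one CPU: that CPU's base plus the zero-based offset. */
long *sketch_counter_ptr(int cpu)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	return (long *)(sketch_per_cpu_offset[cpu] +
			offsetof(struct sketch_percpu_section, counter));
}
```

      Because the offsets start at zero, a base of zero no longer refers to
      any valid copy, which is why the bases have to be set up before the
      first access.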
      
      This patch is mostly taken from Mike Travis' "x86_64: Base percpu
      variables at zero" patch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  11. 03 Jan, 2009 1 commit
  12. 08 Dec, 2008 1 commit
    • x86: change static allocation of trampoline area · 3e1e9002
      Rafael J. Wysocki committed
      Impact: fix trampoline sizing bug, save space
      
      While debugging a suspend-to-RAM related issue it occurred to me that
      if the trampoline code had grown past 4 KB, we would have been
      allocating too little memory for it, since the 4 KB size of the
      trampoline is hardcoded into arch/x86/kernel/e820.c .  Change that
      by making the kernel compute the trampoline size and allocate as much
      memory as necessary.
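
      Editor's sketch (not part of the patch): computing a reservation size
      from the blob's start and end markers instead of assuming 4 KB. The
      function name and the 4 KB page size are assumptions for illustration;
      the kernel uses its own symbols and macros for this.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL

/*
 * Size to reserve for a blob that spans [start, end): its length rounded
 * up to whole pages, rather than a fixed single page.
 */
size_t sketch_trampoline_size(const unsigned char *start, const unsigned char *end)
{
	size_t len;

	assert(end >= start);
	len = (size_t)(end - start);
	return (len + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

      A blob of 4097 bytes then reserves two pages, the case a hardcoded
      4 KB would have under-allocated.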
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  13. 29 Sep, 2008 1 commit
  14. 25 Sep, 2008 1 commit
  15. 15 Aug, 2008 1 commit
  16. 16 Jul, 2008 1 commit
  17. 08 Jul, 2008 4 commits
    • x86: move reserve_setup_data to setup.c · 28bb2237
      Yinghai Lu committed
      Ying Huang would like setup_data to be reserved, but not included in the
      no save range.
      
      Here we try to modify the e820 table to reserve that range early.
      Also add it to early_res in case the bootloader messes up the ramdisk.
      
      Other solutions would be
      1. add early_res_to_highmem...
      2. early_res_to_e820...
      but they could wrongly reserve memory of another type if early_res holds some
      resource that was reserved early and is no longer needed, but was not removed from
      early_res in time, like the RAMDISK (already handled).
      Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com>
      Cc: andi@firstfloor.org
      Tested-by: Huang, Ying <ying.huang@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, 64-bit: split x86_64_start_kernel · f97013fd
      Jeremy Fitzhardinge committed
      Split x86_64_start_kernel() into two pieces:
      
         The first essentially cleans up after head_64.S.  It clears the
         bss, zaps low identity mappings, sets up some early exception
         handlers.
      
         The second part preserves the boot data, reserves the kernel's
         text/data/bss, pagetables and ramdisk, and then starts the kernel
         proper.
      
      This split is so that Xen can call the second part to do the setup it
      needs done.  It doesn't need any of the first part's setup, because it
      doesn't boot via head_64.S, and it's redundant or actively damaging.
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: xen-devel <xen-devel@lists.xensource.com>
      Cc: Stephen Tweedie <sct@redhat.com>
      Cc: Eduardo Habkost <ehabkost@redhat.com>
      Cc: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: leave initial __cpu_pda array in place until cpus are booted · 5deb0b2a
      Mike Travis committed
      Ingo Molnar wrote:
      ...
      > they crashed after about 3 randconfig iterations with:
      >
      >   early res: 4 [8000-afff] PGTABLE
      >   early res: 5 [b000-b87f] MEMNODEMAP
      > PANIC: early exception 0e rip 10:ffffffff8077a150 error 2 cr2 37
      > Pid: 0, comm: swapper Not tainted 2.6.25-sched-devel.git-x86-latest.git #14
      >
      > Call Trace:
      >  [<ffffffff81466196>] early_idt_handler+0x56/0x6a
      >  [<ffffffff8077a150>] ? numa_set_node+0x30/0x60
      >  [<ffffffff8077a129>] ? numa_set_node+0x9/0x60
      >  [<ffffffff8147a543>] numa_init_array+0x93/0xf0
      >  [<ffffffff8147b039>] acpi_scan_nodes+0x3b9/0x3f0
      >  [<ffffffff8147a496>] numa_initmem_init+0x136/0x150
      >  [<ffffffff8146da5f>] setup_arch+0x48f/0x700
      >  [<ffffffff802566ea>] ? clockevents_register_notifier+0x3a/0x50
      >  [<ffffffff81466a87>] start_kernel+0xd7/0x440
      >  [<ffffffff81466422>] x86_64_start_kernel+0x222/0x280
      ...
      Here's the fixup...  This one should follow the previous patches.
      
      Thanks,
      Mike
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: remove static boot_cpu_pda array v2 · 3461b0af
      Mike Travis committed
        * Remove the boot_cpu_pda array and pointer table from the data section.
          Allocate the pointer table and array during init.  do_boot_cpu()
          will reallocate the pda in node local memory and if the cpu is being
          brought up before the bootmem array is released (after_bootmem = 0),
          then it will free the initial pda.  This will happen for all cpus
          present at system startup.
      
          This removes 512k + 32k bytes from the data section.
      
      For inclusion into sched-devel/latest tree.
      
      Based on:
      	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
          +   sched-devel/latest  .../mingo/linux-2.6-sched-devel.git
      Signed-off-by: Mike Travis <travis@sgi.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  18. 05 Jun, 2008 2 commits
  19. 27 Apr, 2008 1 commit
  20. 26 Apr, 2008 1 commit
  21. 20 Apr, 2008 1 commit
  22. 17 Apr, 2008 5 commits
  23. 19 Feb, 2008 1 commit
    • x86: zap invalid and unused pmds in early boot · 31eedd82
      Thomas Gleixner committed
      The early boot code maps KERNEL_TEXT_SIZE (currently 40MB) starting
      from __START_KERNEL_map. The kernel itself only needs _text to _end
      mapped in the high alias. On relocatable kernels the ASM setup code
      adjusts the compile time created high mappings to the relocation. This
      creates invalid pmd entries for negative offsets:
      
      0xffffffff80000000 -> pmd entry: ffffffffff2001e3
      It points outside of the physical address space and is marked present.
      
      This starts at the virtual address __START_KERNEL_map and goes up to
      the point where the first valid physical address (0x0) is mapped.
      
      Zap the mappings before _text and after _end right away in early
      boot. This also removes the invalid entries.
      
      Furthermore it simplifies the range check for high aliases.
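
      Editor's sketch (not part of the patch): a user-space model of zapping
      the 2MB entries that fall outside the kernel image. Page-table entries
      are modelled as plain 64-bit values in an array; the SKETCH_ names, the
      requirement that `text` is 2MB aligned and the returned count are
      assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PMD_SIZE      (2ULL << 20)		/* each entry maps 2MB */
#define SKETCH_PTRS_PER_PMD  512
#define SKETCH_MAP_BASE      0xffffffff80000000ULL	/* __START_KERNEL_map */

/*
 * Clear every populated entry whose 2MB slot lies before `text` or after
 * the 2MB-aligned end of the image. Returns the number of entries kept.
 */
size_t sketch_cleanup_highmap(uint64_t *pmd, uint64_t text, uint64_t end)
{
	uint64_t vaddr = SKETCH_MAP_BASE;
	uint64_t last = ((end + SKETCH_PMD_SIZE - 1) & ~(SKETCH_PMD_SIZE - 1)) - 1;
	size_t kept = 0;

	/* This sketch assumes the image starts on a 2MB boundary. */
	assert((text & (SKETCH_PMD_SIZE - 1)) == 0);

	for (size_t i = 0; i < SKETCH_PTRS_PER_PMD; i++, vaddr += SKETCH_PMD_SIZE) {
		if (pmd[i] == 0)
			continue;	/* nothing mapped here */
		if (vaddr < text || vaddr > last)
			pmd[i] = 0;	/* zap the stale mapping */
		else
			kept++;
	}
	return kept;
}
```

      Rounding the end up to a 2MB boundary keeps the entry that still
      contains the tail of the image, so only whole unused slots are cleared.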
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  24. 02 Feb, 2008 1 commit
  25. 30 Jan, 2008 1 commit