1. 13 12月, 2011 1 次提交
    • M
      x86, efi: EFI boot stub support · 291f3632
      Matt Fleming 提交于
      There is currently a large divide between kernel development and the
      development of EFI boot loaders. The idea behind this patch is to give
      the kernel developers full control over the EFI boot process. As
      H. Peter Anvin put it,
      
      "The 'kernel carries its own stub' approach been very successful in
      dealing with BIOS, and would make a lot of sense to me for EFI as
      well."
      
      This patch introduces an EFI boot stub that allows an x86 bzImage to
      be loaded and executed by EFI firmware. The bzImage appears to the
      firmware as an EFI application. Luckily there are enough free bits
      within the bzImage header so that it can masquerade as an EFI
      application, thereby coercing the EFI firmware into loading it and
      jumping to its entry point. The beauty of this masquerading approach
      is that both BIOS and EFI boot loaders can still load and run the same
      bzImage, thereby allowing a single kernel image to work in any boot
      environment.
      
      The EFI boot stub supports multiple initrds, but they must exist on
      the same partition as the bzImage. Command-line arguments for the
      kernel can be appended after the bzImage name when run from the EFI
      shell, e.g.
      
      Shell> bzImage console=ttyS0 root=/dev/sdb initrd=initrd.img
      
      v7:
       - Fix checkpatch warnings.
      
      v6:
      
       - Try to allocate initrd memory just below hdr->inird_addr_max.
      
      v5:
      
       - load_options_size is UTF-16, which needs dividing by 2 to convert
         to the corresponding ASCII size.
      
      v4:
      
       - Don't read more than image->load_options_size
      
      v3:
      
       - Fix following warnings when compiling CONFIG_EFI_STUB=n
      
         arch/x86/boot/tools/build.c: In function ‘main’:
         arch/x86/boot/tools/build.c:138:24: warning: unused variable ‘pe_header’
         arch/x86/boot/tools/build.c:138:15: warning: unused variable ‘file_sz’
      
       - As reported by Matthew Garrett, some Apple machines have GOPs that
         don't have hardware attached. We need to weed these out by
         searching for ones that handle the PCIIO protocol.
      
       - Don't allocate memory if no initrds are on cmdline
       - Don't trust image->load_options_size
      
      Maarten Lankhorst noted:
       - Don't strip first argument when booted from efibootmgr
       - Don't allocate too much memory for cmdline
       - Don't update cmdline_size, the kernel considers it read-only
       - Don't accept '\n' for initrd names
      
      v2:
      
       - File alignment was too large, was 8192 should be 512. Reported by
         Maarten Lankhorst on LKML.
       - Added UGA support for graphics
       - Use VIDEO_TYPE_EFI instead of hard-coded number.
       - Move linelength assignment until after we've assigned depth
       - Dynamically fill out AddressOfEntryPoint in tools/build.c
       - Don't use magic number for GDT/TSS stuff. Requested by Andi Kleen
       - The bzImage may need to be relocated as it may have been loaded at
         a high address address by the firmware. This was required to get my
         macbook booting because the firmware loaded it at 0x7cxxxxxx, which
         triggers this error in decompress_kernel(),
      
      	if (heap > ((-__PAGE_OFFSET-(128<<20)-1) & 0x7fffffff))
      		error("Destination address too large");
      
      Cc: Mike Waychison <mikew@google.com>
      Cc: Matthew Garrett <mjg@redhat.com>
      Tested-by: NHenrik Rydberg <rydberg@euromail.se>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/r/1321383097.2657.9.camel@mfleming-mobl1.ger.corp.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      291f3632
  2. 10 12月, 2011 3 次提交
  3. 09 12月, 2011 3 次提交
    • Y
      thp: add compound tail page _mapcount when mapped · b6999b19
      Youquan Song 提交于
      With the 3.2-rc kernel, IOMMU 2M pages in KVM works.  But when I tried
      to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
      failed to be used.
      
      The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
      page calls gup_huge_pmd.  If compound pages are used and the page is a
      tail page, gup_huge_pmd() increases _mapcount to record tail page are
      mapped while gup_huge_pud does not do that.
      
      So when the mapped page is relesed, it will result in kernel oops
      because the page is not marked mapped.
      
      This patch add tail process for compound page in 1GB huge page which
      keeps the same process as 2M page.
      
      Reproduce like:
      1. Add grub boot option: hugepagesz=1G hugepages=8
      2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
      3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
      	-net none -device pci-assign,host=07:00.1
      
        kernel BUG at mm/swap.c:114!
        invalid opcode: 0000 [#1] SMP
        Call Trace:
          put_page+0x15/0x37
          kvm_release_pfn_clean+0x31/0x36
          kvm_iommu_put_pages+0x94/0xb1
          kvm_iommu_unmap_memslots+0x80/0xb6
          kvm_assign_device+0xba/0x117
          kvm_vm_ioctl_assigned_device+0x301/0xa47
          kvm_vm_ioctl+0x36c/0x3a2
          do_vfs_ioctl+0x49e/0x4e4
          sys_ioctl+0x5a/0x7c
          system_call_fastpath+0x16/0x1b
        RIP  put_compound_page+0xd4/0x168
      Signed-off-by: NYouquan Song <youquan.song@intel.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6999b19
    • M
      x86, efi: Calling __pa() with an ioremap()ed address is invalid · e8c71062
      Matt Fleming 提交于
      If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
      in ->attribute we currently call set_memory_uc(), which in turn
      calls __pa() on a potentially ioremap'd address.
      
      On CONFIG_X86_32 this is invalid, resulting in the following
      oops on some machines:
      
        BUG: unable to handle kernel paging request at f7f22280
        IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210
        [...]
      
        Call Trace:
         [<c104f8ca>] ? page_is_ram+0x1a/0x40
         [<c1025aff>] reserve_memtype+0xdf/0x2f0
         [<c1024dc9>] set_memory_uc+0x49/0xa0
         [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa
         [<c19216d4>] start_kernel+0x291/0x2f2
         [<c19211c7>] ? loglevel+0x1b/0x1b
         [<c19210bf>] i386_start_kernel+0xbf/0xc8
      
      A better approach to this problem is to map the memory region
      with the correct attributes from the start, instead of modifying
      it after the fact. The uncached case can be handled by
      ioremap_nocache() and the cached by ioremap_cache().
      
      Despite first impressions, it's not possible to use
      ioremap_cache() to map all cached memory regions on
      CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
      don't like being mapped into the vmalloc space, as detailed in
      the following bug report,
      
      	https://bugzilla.redhat.com/show_bug.cgi?id=748516
      
      Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
      regions are covered by the direct kernel mapping table on
      CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
      regions via the direct kernel mapping with the initial call to
      init_memory_mapping() in setup_arch(), whereas previously these
      regions wouldn't be mapped if they were after the last E820_RAM
      region until efi_ioremap() was called. Doing it this way allows
      us to delete efi_ioremap() completely.
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg@redhat.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Huang Ying <huang.ying.caritas@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      e8c71062
    • M
      x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked · 2ded6e6a
      Mark Langsdorf 提交于
      When HPET is operating in RTC mode, the TN_ENABLE bit on timer1
      controls whether the HPET or the RTC delivers interrupts to irq8. When
      the system goes into suspend, the RTC driver sends a signal to the
      HPET driver so that the HPET releases control of irq8, allowing the
      RTC to wake the system from suspend. The switchover is accomplished by
      a write to the HPET configuration registers which currently only
      occurs while servicing the HPET interrupt.
      
      On some systems, I have seen the system suspend before an HPET
      interrupt occurs, preventing the write to the HPET configuration
      register and leaving the HPET in control of the irq8. As the HPET is
      not active during suspend, it does not generate a wake signal and RTC
      alarms do not work.
      
      This patch forces the HPET driver to immediately transfer control of
      the irq8 channel to the RTC instead of waiting until the next
      interrupt event.
      Signed-off-by: NMark Langsdorf <mark.langsdorf@amd.com>
      Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.comTested-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      2ded6e6a
  4. 06 12月, 2011 5 次提交
  5. 05 12月, 2011 15 次提交
  6. 04 12月, 2011 1 次提交
    • K
      xen/pm_idle: Make pm_idle be default_idle under Xen. · e5fd47bf
      Konrad Rzeszutek Wilk 提交于
      The idea behind commit d91ee586 ("cpuidle: replace xen access to x86
      pm_idle and default_idle") was to have one call - disable_cpuidle()
      which would make pm_idle not be molested by other code.  It disallows
      cpuidle_idle_call to be set to pm_idle (which is excellent).
      
      But in the select_idle_routine() and idle_setup(), the pm_idle can still
      be set to either: amd_e400_idle, mwait_idle or default_idle.  This
      depends on some CPU flags (MWAIT) and in AMD case on the type of CPU.
      
      In case of mwait_idle we can hit some instances where the hypervisor
      (Amazon EC2 specifically) sets the MWAIT and we get:
      
        Brought up 2 CPUs
        invalid opcode: 0000 [#1] SMP
      
        Pid: 0, comm: swapper Not tainted 3.1.0-0.rc6.git0.3.fc16.x86_64 #1
        RIP: e030:[<ffffffff81015d1d>]  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
        ...
        Call Trace:
         [<ffffffff8100e2ed>] cpu_idle+0xae/0xe8
         [<ffffffff8149ee78>] cpu_bringup_and_idle+0xe/0x10
        RIP  [<ffffffff81015d1d>] mwait_idle+0x6f/0xb4
         RSP <ffff8801d28ddf10>
      
      In the case of amd_e400_idle we don't get so spectacular crashes, but we
      do end up making an MSR which is trapped in the hypervisor, and then
      follow it up with a yield hypercall.  Meaning we end up going to
      hypervisor twice instead of just once.
      
      The previous behavior before v3.0 was that pm_idle was set to
      default_idle regardless of select_idle_routine/idle_setup.
      
      We want to do that, but only for one specific case: Xen.  This patch
      does that.
      
      Fixes RH BZ #739499 and Ubuntu #881076
      Reported-by: NStefan Bader <stefan.bader@canonical.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e5fd47bf
  7. 22 11月, 2011 1 次提交
  8. 20 11月, 2011 1 次提交
  9. 17 11月, 2011 6 次提交
  10. 14 11月, 2011 4 次提交