1. 27 12月, 2011 7 次提交
    • M
      [S390] kernel: Fix smp_switch_to_ipl_cpu() stack frame setup · 3931723f
      Michael Holzheu 提交于
      Currently, when smp_switch_to_ipl_cpu() is done, the backchain in the dump
      analysis tool crash looks like the following:
      
       #0 [1f746e70] __machine_kexec at 11dd92
       #1 [1f746eb8] smp_restart_cpu at 11820e
       #0 [00907eb0] cpu_idle at 10602e
       #1 [00907ef8] start_kernel at 979a08
      
      It would be good to see the registers of the interrupted function.
      To achieve this, the backchain on the new stack has to be set to zero.
      This looks then like the following:
      
       #0 [1f746e70] __machine_kexec at 11dd8e
       #1 [1f746eb8] smp_restart_cpu at 11820a
       PSW:  0706000180000000 00000000005c6fe6 (vtime_stop_cpu+134)
       GPRS: 0000000000000000 00000000005c6fe6 0000000001ad0228 0000000001ad0248
             0000000000907f08 0000000001ad0b40 0000000000979344 0000000000000000
             00000000009c0000 00000000009c0010 00000000009ab024 0000000001ad0200
             0000000001ad0238 00000000005cc9d8 000000000010602e 0000000000907e68
       #0 [00907eb0] cpu_idle at 10602e
       #1 [00907ef8] start_kernel at 979a08
      
      In addition to this, now also the correct PSW is stored in the pt_regs
      structure that is located at the start of the panic stack.
      Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      3931723f
    • M
      [S390] add support for physical memory > 4TB · 14045ebf
      Martin Schwidefsky 提交于
      The kernel address space of a 64 bit kernel currently uses a three level
      page table and the vmemmap array has a fixed address and a fixed maximum
      size. A three level page table is good enough for systems with less than
      3.8TB of memory, for bigger systems four page table levels need to be
      used. Each page table level costs a bit of performance, use 3 levels for
      normal systems and 4 levels only for the really big systems.
      To avoid bloating sparse.o too much set MAX_PHYSMEM_BITS to 46 for a
      maximum of 64TB of memory.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      14045ebf
    • M
    • M
      [S390] Rework create_mem_hole() function · 44e5ddc4
      Michael Holzheu 提交于
      This patch makes the create_mem_hole() function more readable and
      fixes some minor bugs (e.g. off-by-one problems).
      Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      44e5ddc4
    • C
      [S390] kvm: fix sleeping function ... at mm/page_alloc.c:2260 · c86cce2a
      Christian Borntraeger 提交于
      commit cc772456
          [S390] fix list corruption in gmap reverse mapping
      
      added a potential dead lock:
      
      BUG: sleeping function called from invalid context at mm/page_alloc.c:2260
      in_atomic(): 1, irqs_disabled(): 0, pid: 1108, name: qemu-system-s39
      3 locks held by qemu-system-s39/1108:
       #0:  (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm]
       #1:  (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298
       #2:  (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298
      CPU: 0 Not tainted 3.1.3 #45
      Process qemu-system-s39 (pid: 1108, task: 00000004f8b3cb30, ksp: 00000004fd5978d0)
      00000004fd5979a0 00000004fd597920 0000000000000002 0000000000000000
             00000004fd5979c0 00000004fd597938 00000004fd597938 0000000000617e96
             0000000000000000 00000004f8b3cf58 0000000000000000 0000000000000000
             000000000000000d 000000000000000c 00000004fd597988 0000000000000000
             0000000000000000 0000000000100a18 00000004fd597920 00000004fd597960
      Call Trace:
      ([<0000000000100926>] show_trace+0xee/0x144)
       [<0000000000131f3a>] __might_sleep+0x12a/0x158
       [<0000000000217fb4>] __alloc_pages_nodemask+0x224/0xadc
       [<0000000000123086>] gmap_alloc_table+0x46/0x114
       [<000000000012395c>] gmap_map_segment+0x268/0x298
       [<000003e00486b014>] kvm_arch_commit_memory_region+0x44/0x6c [kvm]
       [<000003e004866414>] __kvm_set_memory_region+0x3b0/0x4a4 [kvm]
       [<000003e004866554>] kvm_set_memory_region+0x4c/0x6c [kvm]
       [<000003e004867c7a>] kvm_vm_ioctl+0x14a/0x314 [kvm]
       [<0000000000292100>] do_vfs_ioctl+0x94/0x588
       [<0000000000292688>] SyS_ioctl+0x94/0xac
       [<000000000061e124>] sysc_noemu+0x22/0x28
       [<000003fffcd5e7ca>] 0x3fffcd5e7ca
      3 locks held by qemu-system-s39/1108:
       #0:  (&kvm->slots_lock){+.+.+.}, at: [<000003e004866542>] kvm_set_memory_region+0x3a/0x6c [kvm]
       #1:  (&mm->mmap_sem){++++++}, at: [<0000000000123790>] gmap_map_segment+0x9c/0x298
       #2:  (&(&mm->page_table_lock)->rlock){+.+.+.}, at: [<00000000001237a8>] gmap_map_segment+0xb4/0x298
      
      Fix this by freeing the lock on the alloc path. This is ok, since the
      gmap table is never freed until we call gmap_free, so the table we are
      walking cannot go.
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      c86cce2a
    • M
      [S390] Check for NULL termination in command line setup · 1fb81057
      Michael Holzheu 提交于
      The current code in setup_boot_command_line() uses a heuristic to
      detect an EBCDIC command line. It checks if any of the bytes in
      the command line has bit one (0x80) set. In that case it is assumed
      that we have an EBCDIC string and the complete command line is
      converted.
      
      On s390 there are cases where the boot loader provides a kernel
      command line that is NULL terminated, but has random data after
      the NULL termination. In that case, setup_boot_command_line()
      might misinterpret an ASCII string for an EBCDIC string. A
      subsequent string conversion can then damage the ASCII string.
      
      This patch solves the problem by checking for NULL termination.
      If no EBCDIC character has been found until the the NULL
      termination has been found, we now assume that we have an ASCII
      string.
      Signed-off-by: NMichael Holzheu <holzheu@linux.vnet.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      1fb81057
    • H
      [S390] irq: fix accounting of external call/emergency signal · 272f01bf
      Heiko Carstens 提交于
      Mask the extint_code parameter of the smp external interrupt handler
      to get the interruption code. Otherwise emergency call interrupts
      erroneously might be accounted as emergency signal interrupts.
      Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      272f01bf
  2. 26 12月, 2011 5 次提交
  3. 25 12月, 2011 1 次提交
    • J
      KVM: x86: Prevent starting PIT timers in the absence of irqchip support · 0924ab2c
      Jan Kiszka 提交于
      User space may create the PIT and forgets about setting up the irqchips.
      In that case, firing PIT IRQs will crash the host:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000128
      IP: [<ffffffffa10f6280>] kvm_set_irq+0x30/0x170 [kvm]
      ...
      Call Trace:
       [<ffffffffa11228c1>] pit_do_work+0x51/0xd0 [kvm]
       [<ffffffff81071431>] process_one_work+0x111/0x4d0
       [<ffffffff81071bb2>] worker_thread+0x152/0x340
       [<ffffffff81075c8e>] kthread+0x7e/0x90
       [<ffffffff815a4474>] kernel_thread_helper+0x4/0x10
      
      Prevent this by checking the irqchip mode before starting a timer. We
      can't deny creating the PIT if the irqchips aren't set up yet as
      current user land expects this order to work.
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      0924ab2c
  4. 23 12月, 2011 1 次提交
    • D
      sparc64: Fix MSIQ HV call ordering in pci_sun4v_msiq_build_irq(). · 7cc85833
      David S. Miller 提交于
      This silently was working for many years and stopped working on
      Niagara-T3 machines.
      
      We need to set the MSIQ to VALID before we can set it's state to IDLE.
      
      On Niagara-T3, setting the state to IDLE first was causing HV_EINVAL
      errors.  The hypervisor documentation says, rather ambiguously, that
      the MSIQ must be "initialized" before one can set the state.
      
      I previously understood this to mean merely that a successful setconf()
      operation has been performed on the MSIQ, which we have done at this
      point.  But it seems to also mean that it has been set VALID too.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7cc85833
  5. 20 12月, 2011 3 次提交
  6. 16 12月, 2011 2 次提交
    • U
      ARM: unwinder: fix bisection to find origin in .idx section · ddf5a25c
      Uwe Kleine-König 提交于
      The bisection implemented in unwind_find_origin() stopped to early.  If
      there is only a single entry left to check the original code just took
      the end point as origin which might be wrong.
      
      This was introduced in commit de66a979 ("ARM: 7187/1: fix unwinding
      for XIP kernels").
      Reported-and-tested-by: NNick Bowler <nbowler@elliptictech.com>
      Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ddf5a25c
    • I
      xen: only limit memory map to maximum reservation for domain 0. · d3db7281
      Ian Campbell 提交于
      d312ae87 "xen: use maximum reservation to limit amount of usable RAM"
      clamped the total amount of RAM to the current maximum reservation. This is
      correct for dom0 but is not correct for guest domains. In order to boot a guest
      "pre-ballooned" (e.g. with memory=1G but maxmem=2G) in order to allow for
      future memory expansion the guest must derive max_pfn from the e820 provided by
      the toolstack and not the current maximum reservation (which can reflect only
      the current maximum, not the guest lifetime max). The existing algorithm
      already behaves this correctly if we do not artificially limit the maximum
      number of pages for the guest case.
      
      For a guest booted with maxmem=512, memory=128 this results in:
       [    0.000000] BIOS-provided physical RAM map:
       [    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
       [    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
      -[    0.000000]  Xen: 0000000000100000 - 0000000008100000 (usable)
      -[    0.000000]  Xen: 0000000008100000 - 0000000020800000 (unusable)
      +[    0.000000]  Xen: 0000000000100000 - 0000000020800000 (usable)
      ...
       [    0.000000] NX (Execute Disable) protection: active
       [    0.000000] DMI not present or invalid.
       [    0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
       [    0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
      -[    0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000
      +[    0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000
       [    0.000000] initial memory mapped : 0 - 027ff000
       [    0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
      -[    0.000000] init_memory_mapping: 0000000000000000-0000000008100000
      -[    0.000000]  0000000000 - 0008100000 page 4k
      -[    0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000
      +[    0.000000] init_memory_mapping: 0000000000000000-0000000020800000
      +[    0.000000]  0000000000 - 0020800000 page 4k
      +[    0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000
       [    0.000000] xen: setting RW the range 27e8000 - 27ff000
       [    0.000000] 0MB HIGHMEM available.
      -[    0.000000] 129MB LOWMEM available.
      -[    0.000000]   mapped low ram: 0 - 08100000
      -[    0.000000]   low ram: 0 - 08100000
      +[    0.000000] 520MB LOWMEM available.
      +[    0.000000]   mapped low ram: 0 - 20800000
      +[    0.000000]   low ram: 0 - 20800000
      
      With this change "xl mem-set <domain> 512M" will successfully increase the
      guest RAM (by reducing the balloon).
      
      There is no change for dom0.
      Reported-and-Tested-by: NGeorge Shuklin <george.shuklin@gmail.com>
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Cc: stable@kernel.org
      Reviewed-by: NDavid Vrabel <david.vrabel@citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d3db7281
  7. 15 12月, 2011 1 次提交
  8. 14 12月, 2011 1 次提交
  9. 13 12月, 2011 2 次提交
  10. 12 12月, 2011 1 次提交
  11. 10 12月, 2011 1 次提交
    • M
      x86, efi: Make efi_call_phys_{prelog,epilog} CONFIG_RELOCATABLE-aware · 6d3e32e6
      Matt Fleming 提交于
      efi_call_phys_prelog() sets up a 1:1 mapping of the physical address
      range in swapper_pg_dir. Instead of replacing then restoring entries
      in swapper_pg_dir we should be using initial_page_table which already
      contains the 1:1 mapping.
      
      It's safe to blindly switch back to swapper_pg_dir in the epilog
      because the physical EFI routines are only called before
      efi_enter_virtual_mode(), e.g. before any user processes have been
      forked. Therefore, we don't need to track which pgd was in %cr3 when
      we entered the prelog.
      
      The previous code actually contained a bug because it assumed that the
      kernel was loaded at a physical address within the first 8MB of ram,
      usually at 0x100000. However, this isn't the case with a
      CONFIG_RELOCATABLE=y kernel which could have been loaded anywhere in
      the physical address space.
      
      Also delete the ancient (and bogus) comments about the page table
      being restored after the lock is released. There is no locking.
      
      Cc: Matthew Garrett <mjg@redhat.com>
      Cc: Darrent Hart <dvhart@linux.intel.com>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Link: http://lkml.kernel.org/r/1323346250.3894.74.camel@mfleming-mobl1.ger.corp.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      6d3e32e6
  12. 09 12月, 2011 7 次提交
    • Y
      thp: add compound tail page _mapcount when mapped · b6999b19
      Youquan Song 提交于
      With the 3.2-rc kernel, IOMMU 2M pages in KVM works.  But when I tried
      to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
      failed to be used.
      
      The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
      page calls gup_huge_pmd.  If compound pages are used and the page is a
      tail page, gup_huge_pmd() increases _mapcount to record tail page are
      mapped while gup_huge_pud does not do that.
      
      So when the mapped page is relesed, it will result in kernel oops
      because the page is not marked mapped.
      
      This patch add tail process for compound page in 1GB huge page which
      keeps the same process as 2M page.
      
      Reproduce like:
      1. Add grub boot option: hugepagesz=1G hugepages=8
      2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
      3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
      	-net none -device pci-assign,host=07:00.1
      
        kernel BUG at mm/swap.c:114!
        invalid opcode: 0000 [#1] SMP
        Call Trace:
          put_page+0x15/0x37
          kvm_release_pfn_clean+0x31/0x36
          kvm_iommu_put_pages+0x94/0xb1
          kvm_iommu_unmap_memslots+0x80/0xb6
          kvm_assign_device+0xba/0x117
          kvm_vm_ioctl_assigned_device+0x301/0xa47
          kvm_vm_ioctl+0x36c/0x3a2
          do_vfs_ioctl+0x49e/0x4e4
          sys_ioctl+0x5a/0x7c
          system_call_fastpath+0x16/0x1b
        RIP  put_compound_page+0xd4/0x168
      Signed-off-by: NYouquan Song <youquan.song@intel.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6999b19
    • S
      arm/imx: fix power button on imx51 babbage board · 847a2ee7
      Shawn Guo 提交于
      Since commit 6571534b (plat-mxc: iomux-v3.h: implicitly enable
      pull-up/down when that's desired) was in, the power button on imx51
      babbage board stopped working because it's pulled up by mistake.
      The patch removes the pull-up setting from the pad configuration for
      that gpio to make the power button back to work.
      Signed-off-by: NShawn Guo <shawn.guo@linaro.org>
      Signed-off-by: NSascha Hauer <s.hauer@pengutronix.de>
      847a2ee7
    • R
      ARM: imx: fix cpufreq build errors · 300a47b4
      Richard Zhao 提交于
        CC      arch/arm/plat-mxc/cpufreq.o
      arch/arm/plat-mxc/cpufreq.c:203: error: expected declaration specifiers or '...' before string constant
      arch/arm/plat-mxc/cpufreq.c:203: warning: data definition has no type or storage class
      arch/arm/plat-mxc/cpufreq.c:203: warning: type defaults to 'int' in declaration of 'MODULE_AUTHOR'
      arch/arm/plat-mxc/cpufreq.c:203: warning: function declaration isn't a prototype
      arch/arm/plat-mxc/cpufreq.c:204: error: expected declaration specifiers or '...' before string constant
      arch/arm/plat-mxc/cpufreq.c:204: warning: data definition has no type or storage class
      arch/arm/plat-mxc/cpufreq.c:204: warning: type defaults to 'int' in declaration of 'MODULE_DESCRIPTION'
      arch/arm/plat-mxc/cpufreq.c:204: warning: function declaration isn't a prototype
      arch/arm/plat-mxc/cpufreq.c:205: error: expected declaration specifiers or '...' before string constant
      arch/arm/plat-mxc/cpufreq.c:205: warning: data definition has no type or storage class
      arch/arm/plat-mxc/cpufreq.c:205: warning: type defaults to 'int' in declaration of 'MODULE_LICENSE'
      arch/arm/plat-mxc/cpufreq.c:205: warning: function declaration isn't a prototype
      make[1]: *** [arch/arm/plat-mxc/cpufreq.o] Error 1
      make: *** [arch/arm/plat-mxc] Error 2
      Signed-off-by: NRichard Zhao <richard.zhao@freescale.com>
      Signed-off-by: NRichard Zhao <richard.zhao@linaro.org>
      Signed-off-by: NSascha Hauer <s.hauer@pengutronix.de>
      300a47b4
    • D
    • J
      MXC PWM: should active during DOZE/WAIT/DBG mode · c0d96aed
      Jason Chen 提交于
      Signed-off-by: NJason Chen <jason.chen@linaro.org>
      Signed-off-by: NSascha Hauer <s.hauer@pengutronix.de>
      Cc: stable@kernel.org
      c0d96aed
    • M
      x86, efi: Calling __pa() with an ioremap()ed address is invalid · e8c71062
      Matt Fleming 提交于
      If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
      in ->attribute we currently call set_memory_uc(), which in turn
      calls __pa() on a potentially ioremap'd address.
      
      On CONFIG_X86_32 this is invalid, resulting in the following
      oops on some machines:
      
        BUG: unable to handle kernel paging request at f7f22280
        IP: [<c10257b9>] reserve_ram_pages_type+0x89/0x210
        [...]
      
        Call Trace:
         [<c104f8ca>] ? page_is_ram+0x1a/0x40
         [<c1025aff>] reserve_memtype+0xdf/0x2f0
         [<c1024dc9>] set_memory_uc+0x49/0xa0
         [<c19334d0>] efi_enter_virtual_mode+0x1c2/0x3aa
         [<c19216d4>] start_kernel+0x291/0x2f2
         [<c19211c7>] ? loglevel+0x1b/0x1b
         [<c19210bf>] i386_start_kernel+0xbf/0xc8
      
      A better approach to this problem is to map the memory region
      with the correct attributes from the start, instead of modifying
      it after the fact. The uncached case can be handled by
      ioremap_nocache() and the cached by ioremap_cache().
      
      Despite first impressions, it's not possible to use
      ioremap_cache() to map all cached memory regions on
      CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
      don't like being mapped into the vmalloc space, as detailed in
      the following bug report,
      
      	https://bugzilla.redhat.com/show_bug.cgi?id=748516
      
      Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
      regions are covered by the direct kernel mapping table on
      CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
      regions via the direct kernel mapping with the initial call to
      init_memory_mapping() in setup_arch(), whereas previously these
      regions wouldn't be mapped if they were after the last E820_RAM
      region until efi_ioremap() was called. Doing it this way allows
      us to delete efi_ioremap() completely.
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Matthew Garrett <mjg@redhat.com>
      Cc: Zhang Rui <rui.zhang@intel.com>
      Cc: Huang Ying <huang.ying.caritas@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.orgSigned-off-by: NIngo Molnar <mingo@elte.hu>
      e8c71062
    • M
      x86, hpet: Immediately disable HPET timer 1 if rtc irq is masked · 2ded6e6a
      Mark Langsdorf 提交于
      When HPET is operating in RTC mode, the TN_ENABLE bit on timer1
      controls whether the HPET or the RTC delivers interrupts to irq8. When
      the system goes into suspend, the RTC driver sends a signal to the
      HPET driver so that the HPET releases control of irq8, allowing the
      RTC to wake the system from suspend. The switchover is accomplished by
      a write to the HPET configuration registers which currently only
      occurs while servicing the HPET interrupt.
      
      On some systems, I have seen the system suspend before an HPET
      interrupt occurs, preventing the write to the HPET configuration
      register and leaving the HPET in control of the irq8. As the HPET is
      not active during suspend, it does not generate a wake signal and RTC
      alarms do not work.
      
      This patch forces the HPET driver to immediately transfer control of
      the irq8 channel to the RTC instead of waiting until the next
      interrupt event.
      Signed-off-by: NMark Langsdorf <mark.langsdorf@amd.com>
      Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.comTested-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: NAndreas Herrmann <andreas.herrmann3@amd.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      2ded6e6a
  13. 08 12月, 2011 5 次提交
  14. 07 12月, 2011 1 次提交
  15. 06 12月, 2011 2 次提交