1. 28 Aug, 2012 5 commits
  2. 23 Aug, 2012 4 commits
  3. 22 Aug, 2012 8 commits
    • KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended · 35f2d16b
      Takuya Yoshikawa committed
      Although the possible race described in
      
        commit 85b70591
        KVM: MMU: fix shrinking page from the empty mmu
      
      was correct, the real cause of that issue was a more trivial bug of
      mmu_shrink() introduced by
      
        commit 19526396
        KVM: MMU: do not iterate over all VMs in mmu_shrink()
      
      Here is the bug:
      
      	if (kvm->arch.n_used_mmu_pages > 0) {
      		if (!nr_to_scan--)
      			break;
      		continue;
      	}
      
      We skip VMs whose n_used_mmu_pages is not zero and try to shrink others:
      in other words we try to shrink empty ones by mistake.
      
      This patch reverses the logic so that mmu_shrink() can free pages from
      the first VM whose n_used_mmu_pages is not zero.  Note that we also add
      comments explaining the role of nr_to_scan which is not practically
      important now, hoping this will be improved in the future.

      Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
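The reversed check described in the commit above can be sketched in plain C. This is a minimal model, not the kernel code: `struct vm_model` and `pick_vm_to_shrink()` are hypothetical names, and only the `n_used_mmu_pages` field and the `nr_to_scan` budget come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kvm; only the field named in the commit. */
struct vm_model {
	unsigned int n_used_mmu_pages;
};

/*
 * Return the index of the first VM that has MMU pages to free, or -1.
 * nr_to_scan bounds how many VMs we are willing to look at.
 */
static int pick_vm_to_shrink(const struct vm_model *vms, size_t nr_vms,
			     int nr_to_scan)
{
	assert(vms != NULL || nr_vms == 0);

	for (size_t i = 0; i < nr_vms; i++) {
		/* Give up once the scan budget is spent. */
		if (!nr_to_scan--)
			break;
		/* Reversed test: skip VMs that have nothing to free. */
		if (!vms[i].n_used_mmu_pages)
			continue;
		return (int)i;
	}
	return -1;
}
```

With the buggy condition (`> 0` instead of `== 0`), the loop would skip exactly the VMs that have pages to free, which is the behavior the commit describes.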
    • KVM: introduce readonly memslot · 4d8b81ab
      Xiao Guangrong committed
      In the current code, if we map a read-only memory region from host to
      guest and the page is not currently mapped in the host, we get a fault
      pfn and async is not allowed, so the VM crashes.
      
      We introduce a read-only memory region to map ROM/ROMD to the guest.
      Read access to a readonly memslot succeeds; write access to a readonly
      memslot causes a KVM_EXIT_MMIO exit.

      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: introduce gfn_to_pfn_memslot_atomic · 037d92dc
      Xiao Guangrong committed
      It can be used instead of hva_to_pfn_atomic

      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • KVM: x86: fix possible infinite loop caused by reexecute_instruction · 8e3d9d06
      Xiao Guangrong committed
      Currently, we re-execute all unhandleable instructions if they do not
      access MMIO. However, this does not work if the host maps read-only
      memory into the guest. If the instruction tries to write to this kind
      of memory, it faults again when the guest retries it, and we enter an
      infinite loop: retry instruction -> write #PF -> emulation fail ->
      retry instruction -> ...
      
      Fix it by retrying the instruction only when it faults on writable
      memory.

      Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
    • x86/alternatives: Fix p6 nops on non-modular kernels · cb09cad4
      Avi Kivity committed
      Probably a leftover from the early days of self-patching, p6nops
      are marked __initconst_or_module, which causes them to be
      discarded in a non-modular kernel.  If something later triggers
      patching, it will overwrite kernel code with garbage.

      Reported-by: Tomas Racek <tracek@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      Cc: Michael Tokarev <mjt@tls.msk.ru>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: qemu-devel@nongnu.org
      Cc: Anthony Liguori <anthony@codemonkey.ws>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Alan Cox <alan@linux.intel.com>
      Link: http://lkml.kernel.org/r/5034AE84.90708@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fixup_irq: Use cpu_online_mask instead of cpu_all_mask · 2530cd4f
      Liu, Chuansheng committed
      When one CPU is going down and this CPU is the last one in irq
      affinity, current code is setting cpu_all_mask as the new
      affinity for that irq.
      
      But for some systems (such as in Medfield Android mobile) the
      firmware sends the interrupt to each CPU in the irq affinity
      mask, averaged, and cpu_all_mask includes all potential CPUs,
      i.e. offline ones as well.
      
      So replace cpu_all_mask with cpu_online_mask.

      Signed-off-by: liu chuansheng <chuansheng.liu@intel.com>
      Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/27240C0AC20F114CBF8149A2696CBE4A137286@SHSMSX101.ccr.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
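The mask selection described in the commit above can be sketched with plain bit masks. This is a simplified model, assuming at most 64 CPUs with one bit each; `cpumask_model_t` and `fixup_affinity()` are hypothetical names and not the kernel's cpumask API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical CPU mask: bit n set means CPU n is in the mask. */
typedef uint64_t cpumask_model_t;

/*
 * Pick the affinity for an irq after a CPU has gone offline.
 * online_mask must already exclude the CPU that is going down.
 */
static cpumask_model_t fixup_affinity(cpumask_model_t affinity,
				      cpumask_model_t online_mask)
{
	assert(online_mask != 0);

	/* At least one CPU in the affinity is still online: keep it. */
	if (affinity & online_mask)
		return affinity;
	/* Otherwise fall back to online CPUs only, never offline ones. */
	return online_mask;
}
```

Falling back to a mask of all possible CPUs instead would include offline CPUs, which is the problem the commit reports on systems where interrupts are delivered to every CPU in the mask.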
    • x86/spinlocks: Fix comment in spinlock.h · 83be4ffa
      Richard Weinberger committed
      This comment is no longer true.  We support up to 2^16 CPUs
      because __ticket_t is a u16 if NR_CPUS is larger than 256.

      Signed-off-by: Richard Weinberger <richard@nod.at>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
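The 2^16 figure in the commit above follows from the width of the ticket type. A small sketch of that arithmetic, assuming a configuration with more than 256 CPUs; `NR_CPUS_MODEL`, `ticket_model_t` and `ticket_capacity()` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical build-time CPU count; assumed large for this example. */
#ifndef NR_CPUS_MODEL
#define NR_CPUS_MODEL 4096
#endif

/* Small configurations fit in 8 bits, larger ones need 16 bits. */
#if NR_CPUS_MODEL < 256
typedef uint8_t ticket_model_t;
#else
typedef uint16_t ticket_model_t;
#endif

/* Number of distinct ticket values the chosen type can represent. */
static unsigned long ticket_capacity(void)
{
	assert(sizeof(ticket_model_t) < sizeof(unsigned long));
	return 1UL << (8 * sizeof(ticket_model_t));
}
```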
    • mm: hugetlbfs: correctly populate shared pmd · eb48c071
      Michal Hocko committed
      Each page mapped in a process's address space must be correctly
      accounted for in _mapcount.  Normally the rules for this are
      straightforward but hugetlbfs page table sharing is different.  The page
      table pages at the PMD level are reference counted while the mapcount
      remains the same.
      
      If this accounting is wrong, it causes bugs like this one reported by
      Larry Woodman:
      
        kernel BUG at mm/filemap.c:135!
        invalid opcode: 0000 [#1] SMP
        CPU 22
        Modules linked in: bridge stp llc sunrpc binfmt_misc dcdbas microcode pcspkr acpi_pad acpi]
        Pid: 18001, comm: mpitest Tainted: G        W    3.3.0+ #4 Dell Inc. PowerEdge R620/07NDJ2
        RIP: 0010:[<ffffffff8112cfed>]  [<ffffffff8112cfed>] __delete_from_page_cache+0x15d/0x170
        Process mpitest (pid: 18001, threadinfo ffff880428972000, task ffff880428b5cc20)
        Call Trace:
          delete_from_page_cache+0x40/0x80
          truncate_hugepages+0x115/0x1f0
          hugetlbfs_evict_inode+0x18/0x30
          evict+0x9f/0x1b0
          iput_final+0xe3/0x1e0
          iput+0x3e/0x50
          d_kill+0xf8/0x110
          dput+0xe2/0x1b0
          __fput+0x162/0x240
      
      During fork(), copy_hugetlb_page_range() detects if huge_pte_alloc()
      shared page tables with the check dst_pte == src_pte.  The logic is if
      the PMD page is the same, they must be shared.  This assumes that the
      sharing is between the parent and child.  However, if the sharing is
      with a different process entirely then this check fails as in this
      diagram:
      
        parent
          |
          ------------>pmd
                       src_pte----------> data page
                                              ^
        other--------->pmd--------------------|
                        ^
        child-----------|
                       dst_pte
      
      For this situation to occur, it must be possible for Parent and Other to
      have faulted and failed to share page tables with each other.  This is
      possible due to the following style of race.
      
        PROC A                                          PROC B
        copy_hugetlb_page_range                         copy_hugetlb_page_range
          src_pte == huge_pte_offset                      src_pte == huge_pte_offset
          !src_pte so no sharing                          !src_pte so no sharing
      
        (time passes)
      
        hugetlb_fault                                   hugetlb_fault
          huge_pte_alloc                                  huge_pte_alloc
            huge_pmd_share                                 huge_pmd_share
              LOCK(i_mmap_mutex)
              find nothing, no sharing
              UNLOCK(i_mmap_mutex)
                                                            LOCK(i_mmap_mutex)
                                                            find nothing, no sharing
                                                            UNLOCK(i_mmap_mutex)
            pmd_alloc                                       pmd_alloc
            LOCK(instantiation_mutex)
            fault
            UNLOCK(instantiation_mutex)
                                                        LOCK(instantiation_mutex)
                                                        fault
                                                        UNLOCK(instantiation_mutex)
      
      These two processes are now pointing to the same data page but are not
      sharing page tables because the opportunity was missed.  When either
      process later forks, the src_pte == dst_pte check is potentially insufficient.
      As the check falls through, the wrong PTE information is copied in
      (harmless but wrong) and the mapcount is bumped for a page mapped by a
      shared page table leading to the BUG_ON.
      
      This patch addresses the issue by moving pmd_alloc into huge_pmd_share
      which guarantees that the shared pud is populated in the same critical
      section as pmd.  This also means that huge_pte_offset test in
      huge_pmd_share is serialized correctly now which in turn means that the
      success of the sharing will be higher as the racing tasks see the pud
      and pmd populated together.
      
      Race identified and changelog written mostly by Mel Gorman.
      
      [akpm@linux-foundation.org: attempt to make the huge_pmd_share() comment comprehensible, clean up coding style]
      Reported-by: Larry Woodman <lwoodman@redhat.com>
      Tested-by: Larry Woodman <lwoodman@redhat.com>
      Reviewed-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Michal Hocko <mhocko@suse.cz>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Ken Chen <kenchen@google.com>
      Cc: Cong Wang <xiyou.wangcong@gmail.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
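The effect of doing the lookup and the allocation in one critical section, as the commit above describes, can be shown with a deterministic single-threaded model. This is only a sketch of the idea: `struct shared_slot`, `share_or_alloc()` and the two `*_pair_shares()` helpers are hypothetical, and table identities are plain integers rather than page table pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared location; table_id 0 means nothing is published yet. */
struct shared_slot {
	int table_id;
};

/* Lookup and populate as one step, as if both ran under a single lock. */
static int share_or_alloc(struct shared_slot *slot, int *next_id)
{
	assert(slot != NULL && next_id != NULL);

	if (slot->table_id == 0)
		slot->table_id = (*next_id)++;
	return slot->table_id;
}

/* Racy interleaving: both lookups finish before either allocation. */
static bool racy_pair_shares(void)
{
	int next_id = 1;
	int a_found = 0, b_found = 0;	/* both lookups found nothing */
	int a = a_found ? a_found : next_id++;
	int b = b_found ? b_found : next_id++;

	return a == b;
}

/* Serialized: the second caller sees what the first one published. */
static bool serialized_pair_shares(void)
{
	struct shared_slot slot = { 0 };
	int next_id = 1;
	int a = share_or_alloc(&slot, &next_id);
	int b = share_or_alloc(&slot, &next_id);

	return a == b;
}
```

In the racy ordering the two callers end up with different tables, while the serialized version lets the second caller reuse the first caller's table.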
  4. 19 Aug, 2012 1 commit
    • x32: Use compat shims for {g,s}etsockopt · 515c7af8
      Mike Frysinger committed
      Some of the arguments to {g,s}etsockopt are passed in userland pointers.
      If we try to use the 64bit entry point, we end up sometimes failing.
      
      For example, dhcpcd doesn't run in x32:
      	# dhcpcd eth0
      	dhcpcd[1979]: version 5.5.6 starting
      	dhcpcd[1979]: eth0: broadcasting for a lease
      	dhcpcd[1979]: eth0: open_socket: Invalid argument
      	dhcpcd[1979]: eth0: send_raw_packet: Bad file descriptor
      
      The code in particular is getting back EINVAL when doing:
      	struct sock_fprog pf;
      	setsockopt(s, SOL_SOCKET, SO_ATTACH_FILTER, &pf, sizeof(pf));
      
      Diving into the kernel code, we can see:
      include/linux/filter.h:
      	struct sock_fprog {
      		unsigned short len;
      		struct sock_filter __user *filter;
      	};
      
      net/core/sock.c:
      	case SO_ATTACH_FILTER:
      		ret = -EINVAL;
      		if (optlen == sizeof(struct sock_fprog)) {
      			struct sock_fprog fprog;
      
      			ret = -EFAULT;
      			if (copy_from_user(&fprog, optval, sizeof(fprog)))
      				break;
      
      			ret = sk_attach_filter(&fprog, sk);
      		}
      		break;
      
      arch/x86/syscalls/syscall_64.tbl:
      	54 common setsockopt sys_setsockopt
      	55 common getsockopt sys_getsockopt
      
      So for x64, sizeof(sock_fprog) is 16 bytes.  For x86/x32, it's 8 bytes.
      This comes down to the pointer being 32bit for x32, which means we need
      to do structure size translation.  But since x32 comes in directly to
      sys_setsockopt, it doesn't get translated like x86.
      
      After changing the syscall table and rebuilding glibc with the new kernel
      headers, dhcp runs fine in an x32 userland.
      
      Oddly, it seems like Linus noted the same thing during the initial port,
      but I guess that was missed/lost along the way:
      	https://lkml.org/lkml/2011/8/26/452
      
      [ hpa: tagging for -stable since this is an ABI fix. ]
      
      Bugzilla: https://bugs.gentoo.org/423649
      Reported-by: Mads <mads@ab3.no>
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Link: http://lkml.kernel.org/r/1345320697-15713-1-git-send-email-vapier@gentoo.org
      Cc: H. J. Lu <hjl.tools@gmail.com>
      Cc: <stable@vger.kernel.org> v3.4..v3.5
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
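The size mismatch described in the commit above can be reproduced in userspace. This is a sketch with hypothetical type names (`sock_fprog_native`, `sock_fprog_compat`, `optlen_accepted()`); it assumes the 32-bit layout stores the filter pointer as a 32-bit integer, and the 16-byte figure only holds when built for a 64-bit target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sock_filter_model;	/* opaque; only its pointer size matters */

/* Layout as seen by code built with native pointers. */
struct sock_fprog_native {
	unsigned short len;
	struct sock_filter_model *filter;
};

/* Layout as seen by a 32-bit-pointer ABI: the pointer is a 32-bit value. */
struct sock_fprog_compat {
	unsigned short len;
	uint32_t filter;
};

_Static_assert(sizeof(struct sock_fprog_compat) == 8,
	       "2-byte len, 2 bytes padding, 4-byte pointer");

/* Mirrors the optlen == sizeof(struct sock_fprog) test quoted above. */
static int optlen_accepted(size_t optlen)
{
	return optlen == sizeof(struct sock_fprog_native);
}
```

On a 64-bit build, `optlen_accepted(sizeof(struct sock_fprog_compat))` is false, which corresponds to the EINVAL path in the quoted `SO_ATTACH_FILTER` case.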
  5. 17 Aug, 2012 2 commits
  6. 16 Aug, 2012 1 commit
  7. 15 Aug, 2012 3 commits
  8. 14 Aug, 2012 8 commits
  9. 11 Aug, 2012 1 commit
  10. 09 Aug, 2012 2 commits
  11. 07 Aug, 2012 4 commits
  12. 06 Aug, 2012 1 commit