1. 29 3月, 2012 1 次提交
  2. 22 3月, 2012 3 次提交
  3. 20 3月, 2012 1 次提交
  4. 07 3月, 2012 2 次提交
    • L
      x86: fix typo in recent find_vma_prev purge · 55062d06
      Linus Torvalds 提交于
      It turns out that test-compiling this file on x86-64 doesn't really
      help, because much of it is x86-32-specific.  And so I hadn't noticed
      the slightly over-eager removal of the 'r' from 'addr' variable despite
      thinking I had tested it.
      Signed-off-by: NLinus "oopsie" Torvalds <torvalds@linux-foundation.org>
      55062d06
    • L
      vm: avoid using find_vma_prev() unnecessarily · 097d5910
      Linus Torvalds 提交于
      Several users of "find_vma_prev()" were not in fact interested in the
      previous vma if there was no primary vma to be found either.  And in
      those cases, we're much better off just using the regular "find_vma()",
      and then "prev" can be looked up by just checking vma->vm_prev.
      
      The find_vma_prev() semantics are fairly subtle (see Mikulas' recent
      commit 83cd904d: "mm: fix find_vma_prev"), and the whole "return
      prev by reference" means that it generates worse code too.
      
      Thus this "let's avoid using this inconvenient and clearly too subtle
      interface when we don't really have to" patch.
      
      Cc: Mikulas Patocka <mpatocka@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      097d5910
  5. 04 3月, 2012 1 次提交
  6. 27 1月, 2012 1 次提交
    • P
      bugs, x86: Fix printk levels for panic, softlockups and stack dumps · b0f4c4b3
      Prarit Bhargava 提交于
      rsyslog will display KERN_EMERG messages on a connected
      terminal.  However, these messages are useless/undecipherable
      for a general user.
      
      For example, after a softlockup we get:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
       kernel:Stack:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
       kernel:Call Trace:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ...
       kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89
       d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <e8> ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89
      
      This happens because the printk levels for these messages are
      incorrect. Only an informational message should be displayed on
      a terminal.
      
      I modified the printk levels for various messages in the kernel
      and tested the output by using the drivers/misc/lkdtm.c kernel
      modules (ie, softlockups, panics, hard lockups, etc.) and
      confirmed that the console output was still the same and that
      the output to the terminals was correct.
      
      For example, in the case of a softlockup we now see the much
      more informative:
      
       Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ...
       BUG: soft lockup - CPU4 stuck for 60s!
      
      instead of the above confusing messages.
      
      AFAICT, the messages no longer have to be KERN_EMERG.  In the
      most important case of a panic we set console_verbose().  As for
      the other less severe cases the correct data is output to the
      console and /var/log/messages.
      
      Successfully tested by me using the drivers/misc/lkdtm.c module.
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: dzickus@redhat.com
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1327586134-11926-1-git-send-email-prarit@redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      b0f4c4b3
  7. 17 1月, 2012 1 次提交
  8. 13 1月, 2012 2 次提交
  9. 24 12月, 2011 1 次提交
  10. 09 12月, 2011 2 次提交
    • Y
      thp: add compound tail page _mapcount when mapped · b6999b19
      Youquan Song 提交于
      With the 3.2-rc kernel, IOMMU 2M pages in KVM works.  But when I tried
      to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
      failed to be used.
      
      The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
      page calls gup_huge_pmd.  If compound pages are used and the page is a
      tail page, gup_huge_pmd() increases _mapcount to record tail page are
      mapped while gup_huge_pud does not do that.
      
      So when the mapped page is relesed, it will result in kernel oops
      because the page is not marked mapped.
      
      This patch add tail process for compound page in 1GB huge page which
      keeps the same process as 2M page.
      
      Reproduce like:
      1. Add grub boot option: hugepagesz=1G hugepages=8
      2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
      3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
      	-net none -device pci-assign,host=07:00.1
      
        kernel BUG at mm/swap.c:114!
        invalid opcode: 0000 [#1] SMP
        Call Trace:
          put_page+0x15/0x37
          kvm_release_pfn_clean+0x31/0x36
          kvm_iommu_put_pages+0x94/0xb1
          kvm_iommu_unmap_memslots+0x80/0xb6
          kvm_assign_device+0xba/0x117
          kvm_vm_ioctl_assigned_device+0x301/0xa47
          kvm_vm_ioctl+0x36c/0x3a2
          do_vfs_ioctl+0x49e/0x4e4
          sys_ioctl+0x5a/0x7c
          system_call_fastpath+0x16/0x1b
        RIP  put_compound_page+0xd4/0x168
      Signed-off-by: NYouquan Song <youquan.song@intel.com>
      Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6999b19
    • P
      x86/numa: Add constraints check for nid parameters · 54eed6cb
      Petr Holasek 提交于
      This patch adds constraint checks to the numa_set_distance()
      function.
      
      When the check triggers (this should not happen normally) it
      emits a warning and avoids a store to a negative index in
      numa_distance[] array - i.e. avoids memory corruption.
      
      Negative ids can be passed when the pxm-to-nids mapping is not
      properly filled while parsing the SRAT.
      Signed-off-by: NPetr Holasek <pholasek@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Anton Arapov <anton@redhat.com>
      Link: http://lkml.kernel.org/r/20111208121640.GA2229@dhcp-27-244.brq.redhat.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      54eed6cb
  11. 06 12月, 2011 5 次提交
  12. 05 12月, 2011 1 次提交
  13. 11 11月, 2011 7 次提交
  14. 03 11月, 2011 2 次提交
    • A
      thp: share get_huge_page_tail() · b35a35b5
      Andrea Arcangeli 提交于
      This avoids duplicating the function in every arch gup_fast.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b35a35b5
    • A
      mm: thp: tail page refcounting fix · 70b50f94
      Andrea Arcangeli 提交于
      Michel while working on the working set estimation code, noticed that
      calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
      wasn't safe, if the pfn ended up being a tail page of a transparent
      hugepage under splitting by __split_huge_page_refcount().
      
      He then found the problem could also theoretically materialize with
      page_cache_get_speculative() during the speculative radix tree lookups
      that uses get_page_unless_zero() in SMP if the radix tree page is freed
      and reallocated and get_user_pages is called on it before
      page_cache_get_speculative has a chance to call get_page_unless_zero().
      
      So the best way to fix the problem is to keep page_tail->_count zero at
      all times.  This will guarantee that get_page_unless_zero() can never
      succeed on any tail page.  page_tail->_mapcount is guaranteed zero and
      is unused for all tail pages of a compound page, so we can simply
      account the tail page references there and transfer them to
      tail_page->_count in __split_huge_page_refcount() (in addition to the
      head_page->_mapcount).
      
      While debugging this s/_count/_mapcount/ change I also noticed get_page is
      called by direct-io.c on pages returned by get_user_pages.  That wasn't
      entirely safe because the two atomic_inc in get_page weren't atomic.  As
      opposed to other get_user_page users like secondary-MMU page fault to
      establish the shadow pagetables would never call any superflous get_page
      after get_user_page returns.  It's safer to make get_page universally safe
      for tail pages and to use get_page_foll() within follow_page (inside
      get_user_pages()).  get_page_foll() is safe to do the refcounting for tail
      pages without taking any locks because it is run within PT lock protected
      critical sections (PT lock for pte and page_table_lock for
      pmd_trans_huge).
      
      The standard get_page() as invoked by direct-io instead will now take
      the compound_lock but still only for tail pages.  The direct-io paths
      are usually I/O bound and the compound_lock is per THP so very
      finegrined, so there's no risk of scalability issues with it.  A simple
      direct-io benchmarks with all lockdep prove locking and spinlock
      debugging infrastructure enabled shows identical performance and no
      overhead.  So it's worth it.  Ideally direct-io should stop calling
      get_page() on pages returned by get_user_pages().  The spinlock in
      get_page() is already optimized away for no-THP builds but doing
      get_page() on tail pages returned by GUP is generally a rare operation
      and usually only run in I/O paths.
      
      This new refcounting on page_tail->_mapcount in addition to avoiding new
      RCU critical sections will also allow the working set estimation code to
      work without any further complexity associated to the tail page
      refcounting with THP.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reported-by: NMichel Lespinasse <walken@google.com>
      Reviewed-by: NMichel Lespinasse <walken@google.com>
      Reviewed-by: NMinchan Kim <minchan.kim@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Cc: <stable@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      70b50f94
  15. 24 10月, 2011 1 次提交
    • T
      x86: Fix S4 regression · 8548c84d
      Takashi Iwai 提交于
      Commit 4b239f45 ("x86-64, mm: Put early page table high") causes a S4
      regression since 2.6.39, namely the machine reboots occasionally at S4
      resume.  It doesn't happen always, overall rate is about 1/20.  But,
      like other bugs, once when this happens, it continues to happen.
      
      This patch fixes the problem by essentially reverting the memory
      assignment in the older way.
      Signed-off-by: NTakashi Iwai <tiwai@suse.de>
      Cc: <stable@kernel.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Yinghai Lu <yinghai.lu@oracle.com>
      [ We'll hopefully find the real fix, but that's too late for 3.1 now ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8548c84d
  16. 29 9月, 2011 1 次提交
    • J
      x86-64: Don't apply destructive erratum workaround on unaffected CPUs · e05139f2
      Jan Beulich 提交于
      Erratum 93 applies to AMD K8 CPUs only, and its workaround
      (forcing the upper 32 bits of %rip to all get set under certain
      conditions) is actually getting in the way of analyzing page
      faults occurring during EFI physical mode runtime calls (in
      particular the page table walk shown is completely unrelated to
      the actual fault). This is because typically EFI runtime code
      lives in the space between 2G and 4G, which - modulo the above
      manipulation - is likely to overlap with the kernel or modules
      area.
      
      While even for the other errata workarounds their taking effect
      could be limited to just the affected CPUs, none of them appears
      to be destructive, and they're generally getting called only
      outside of performance critical paths, so they're being left
      untouched.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Link: http://lkml.kernel.org/r/4E835FE30200007800058464@nat28.tlf.novell.comSigned-off-by: NIngo Molnar <mingo@elte.hu>
      e05139f2
  17. 15 9月, 2011 1 次提交
  18. 23 8月, 2011 1 次提交
  19. 16 8月, 2011 2 次提交
  20. 11 8月, 2011 1 次提交
  21. 07 8月, 2011 1 次提交
  22. 06 8月, 2011 1 次提交
    • B
      x86, amd: Avoid cache aliasing penalties on AMD family 15h · dfb09f9b
      Borislav Petkov 提交于
      This patch provides performance tuning for the "Bulldozer" CPU. With its
      shared instruction cache there is a chance of generating an excessive
      number of cache cross-invalidates when running specific workloads on the
      cores of a compute module.
      
      This excessive amount of cross-invalidations can be observed if cache
      lines backed by shared physical memory alias in bits [14:12] of their
      virtual addresses, as those bits are used for the index generation.
      
      This patch addresses the issue by clearing all the bits in the [14:12]
      slice of the file mapping's virtual address at generation time, thus
      forcing those bits the same for all mappings of a single shared library
      across processes and, in doing so, avoids instruction cache aliases.
      
      It also adds the command line option "align_va_addr=(32|64|on|off)" with
      which virtual address alignment can be enabled for 32-bit or 64-bit x86
      individually, or both, or be completely disabled.
      
      This change leaves virtual region address allocation on other families
      and/or vendors unaffected.
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      Link: http://lkml.kernel.org/r/1312550110-24160-2-git-send-email-bp@amd64.orgSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      dfb09f9b
  23. 05 8月, 2011 1 次提交