1. 19 May 2018 (4 commits)
  2. 18 May 2018 (1 commit)
    • kvm: rename KVM_HINTS_DEDICATED to KVM_HINTS_REALTIME · 633711e8
      By Michael S. Tsirkin
      KVM_HINTS_DEDICATED seems to be somewhat confusing:
      
      A guest doesn't really care whether it's the only task running on a host
      CPU, as long as it's not preempted.
      
      And there are more reasons for a guest to be preempted than host CPU
      sharing; for example, with memory overcommit it can get preempted on a
      memory access, post-copy migration can cause preemption, etc.
      
      Let's call it KVM_HINTS_REALTIME, which seems to better
      match what guests expect.
      
      Also, the flag must be set on all vCPUs - current guests assume this.
      Note so in the documentation. (A sketch of the guest-side probe
      follows this entry.)
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      633711e8
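      A minimal sketch of how a guest could probe the renamed hint,
      assuming the flag sits in bit 0 of EDX of the KVM features CPUID
      leaf 0x40000001 (as in this series); guest kernel code would use
      kvm_para_has_hint() instead of raw CPUID:
      
          #include <cpuid.h>      /* __cpuid() macro (GCC/Clang) */
          
          #define KVM_CPUID_SIGNATURE 0x40000000u
          #define KVM_CPUID_FEATURES  0x40000001u
          #define KVM_HINTS_REALTIME  0   /* bit number in EDX (assumed) */
          
          /* Returns nonzero if the hypervisor advertises the hint. */
          static int kvm_hint_realtime(void)
          {
                  unsigned int eax, ebx, ecx, edx;
          
                  /* Verify the "KVMKVMKVM\0\0\0" hypervisor signature. */
                  __cpuid(KVM_CPUID_SIGNATURE, eax, ebx, ecx, edx);
                  if (ebx != 0x4b4d564b || ecx != 0x564b4d56 || edx != 0x4d)
                          return 0;
          
                  __cpuid(KVM_CPUID_FEATURES, eax, ebx, ecx, edx);
                  return !!(edx & (1u << KVM_HINTS_REALTIME));
          }
      
      Since the flag must be set on all vCPUs, checking it once on the
      boot CPU is sufficient.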
  3. 14 May 2018 (1 commit)
  4. 02 May 2018 (3 commits)
  5. 27 April 2018 (2 commits)
  6. 26 April 2018 (1 commit)
  7. 24 April 2018 (2 commits)
  8. 23 April 2018 (1 commit)
  9. 21 April 2018 (1 commit)
  10. 17 April 2018 (5 commits)
    • x86,sched: Allow topologies where NUMA nodes share an LLC · 1340ccfa
      By Alison Schofield
      Intel's Skylake Server CPUs have a different LLC topology than previous
      generations. When in Sub-NUMA-Clustering (SNC) mode, the package is divided
      into two "slices", each containing half the cores, half the LLC, and one
      memory controller and each slice is enumerated to Linux as a NUMA
      node. This is similar to how the cores and LLC were arranged for the
      Cluster-On-Die (CoD) feature.
      
      CoD allowed the same cache line to be present in each half of the LLC.
      But, with SNC, each line is only ever present in *one* slice. This means
      that the portion of the LLC *available* to a CPU depends on the data being
      accessed:
      
          Remote socket: entire package LLC is shared
          Local socket->local slice: data goes into local slice LLC
          Local socket->remote slice: data goes into remote-slice LLC. Slightly
                                      higher latency than local slice LLC.
      
      The biggest implication from this is that a process accessing all
      NUMA-local memory only sees half the LLC capacity.
      
      The CPU describes its cache hierarchy with the CPUID instruction. One of
      the CPUID leaves enumerates the "logical processors sharing this
      cache". This information is used for scheduling decisions so that tasks
      move more freely between CPUs sharing the cache.
      
      But, the CPUID for the SNC configuration discussed above enumerates the LLC
      as being shared by the entire package. This is not 100% precise because the
      entire cache is not usable by all accesses. But, it *is* the way the
      hardware enumerates itself, and this is not likely to change.
      
      The userspace visible impact of all the above is that the sysfs info
      reports the entire LLC as being available to the entire package. As noted
      above, this is not true for local socket accesses. This patch does not
      correct the sysfs info. It is the same, pre and post patch.
      
      The current code emits the following warning:
      
       sched: CPU #3's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency.
      
      The warning comes from the topology_sane() check in smpboot.c because
      the topology does not match the model's expectations, for the reasons
      described above.
      
      To fix this, add a vendor- and model-specific check to never call
      topology_sane() for these systems. Also, just like "Cluster-on-Die",
      disable the "coregroup" sched_domain_topology_level and use NUMA
      information from the SRAT alone. (A paraphrased sketch of the check
      follows this entry.)
      
      This is OK at least on the hardware we are immediately concerned about
      because the LLC sharing happens at both the slice and at the package level,
      which are also NUMA boundaries.
      Signed-off-by: Alison Schofield <alison.schofield@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: brice.goglin@gmail.com
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@linux.intel.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Link: https://lkml.kernel.org/r/20180407002130.GA18984@alison-desk.jf.intel.com
      1340ccfa
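      Paraphrased from the patch, the check added to arch/x86/kernel/smpboot.c
      looks roughly like this (details may differ from the final commit):
      
          static const struct x86_cpu_id snc_cpu[] = {
                  { X86_VENDOR_INTEL, 6, INTEL_FAM6_SKYLAKE_X },
                  {}
          };
          
          static bool match_llc(struct cpuinfo_x86 *c, struct cpuinfo_x86 *o)
          {
                  int cpu1 = c->cpu_index, cpu2 = o->cpu_index;
          
                  /* Do not match if we do not have a valid APICID for cpu: */
                  if (per_cpu(cpu_llc_id, cpu1) == BAD_APICID)
                          return false;
          
                  /* Do not match if the LLC ids differ: */
                  if (per_cpu(cpu_llc_id, cpu1) != per_cpu(cpu_llc_id, cpu2))
                          return false;
          
                  /*
                   * Allow the SNC topology without the warning: returning
                   * false means 'c' and 'o' are not LLC siblings, so the
                   * scheduler falls back to the SRAT/NUMA information.
                   */
                  if (!topology_same_node(c, o) && x86_match_cpu(snc_cpu))
                          return false;
          
                  return topology_sane(c, o, "llc");
          }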
    • x86/acpi: Prevent X2APIC id 0xffffffff from being accounted · 10daf10a
      By Dou Liyang
      RongQing reported that there are some X2APIC ids of 0xffffffff in his
      machine's ACPI MADT table, which makes the number of possible CPUs
      inaccurate.
      
      The reason is that the ACPI X2APIC parser has no sanity check for APIC ID
      0xffffffff, which is an invalid id in all APIC types. See "Intel® 64
      Architecture x2APIC Specification", Chapter 2.4.1.
      
      Add a sanity check to acpi_parse_x2apic() which ignores the invalid id
      (a paraphrased sketch follows this entry).
      Reported-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: len.brown@intel.com
      Cc: rjw@rjwysocki.net
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/20180412014052.25186-1-douly.fnst@cn.fujitsu.com
      10daf10a
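      A paraphrased sketch of the check; the surrounding parser details are
      abbreviated and may differ from the actual code:
      
          static int __init
          acpi_parse_x2apic(struct acpi_subtable_header *header,
                            const unsigned long end)
          {
                  struct acpi_madt_local_x2apic *processor =
                          (struct acpi_madt_local_x2apic *)header;
                  u32 apic_id = processor->local_apic_id;
                  u8 enabled = processor->lapic_flags & ACPI_MADT_ENABLED;
          
                  /*
                   * 0xffffffff is invalid for every APIC type (x2APIC spec,
                   * chapter 2.4.1); skip it instead of accounting a bogus
                   * possible CPU.
                   */
                  if (apic_id == 0xffffffff)
                          return 0;
          
                  acpi_register_lapic(apic_id, processor->uid, enabled);
                  return 0;
          }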
    • x86/tsc: Prevent 32bit truncation in calc_hpet_ref() · d3878e16
      By Xiaoming Gao
      The TSC calibration code uses HPET as reference. The conversion normalizes
      the delta of two HPET timestamps:
      
          hpetref = ((tshpet1 - tshpet2) * HPET_PERIOD) / 1e6
      
      and then divides the normalized delta of the corresponding TSC timestamps
      by the result to calculate the TSC frequency.
      
          tscfreq = ((tstsc1 - tstsc2) * 1e6) / hpetref
      
      This uses do_div(), which takes a u32 as the divisor. That worked so
      far because the HPET frequency was low enough that 'hpetref' never
      exceeded 32 bits.
      
      On Skylake machines the HPET frequency increased, so 'hpetref' can
      exceed 32 bits. do_div() truncates the divisor, which causes the
      calibration to fail.
      
      Use div64_u64() to avoid the problem. (The fixed conversion is
      sketched after this entry.)
      
      [ tglx: Fixes whitespace mangled patch and rewrote changelog ]
      Signed-off-by: Xiaoming Gao <newtongao@tencent.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: peterz@infradead.org
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/38894564-4fc9-b8ec-353f-de702839e44e@gmail.com
      d3878e16
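      The fixed conversion, paraphrased from arch/x86/kernel/tsc.c:
      
          static unsigned long calc_hpet_ref(u64 deltatsc, u64 hpet1, u64 hpet2)
          {
                  u64 tmp;
          
                  if (hpet2 < hpet1)
                          hpet2 += 0x100000000ULL;        /* counter wrapped */
                  hpet2 -= hpet1;
                  tmp = ((u64)hpet2 * hpet_readl(HPET_PERIOD));
                  do_div(tmp, 1000000);   /* constant u32 divisor is fine */
          
                  /*
                   * 'tmp' (the normalized HPET delta) can exceed 32 bits on
                   * Skylake, so do_div(deltatsc, (u32)tmp) would truncate
                   * the divisor. Do a full 64/64 division instead.
                   */
                  deltatsc = div64_u64(deltatsc, tmp);
          
                  return (unsigned long) deltatsc;
          }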
    • x86: Remove pci-nommu.c · ef97837d
      By Christoph Hellwig
      The commit that switched x86 to dma_direct_ops stopped using and building
      this file, but accidentally left it in the tree.  Remove it.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: iommu@lists.infradead.org
      Link: https://lkml.kernel.org/r/20180416124442.13831-1-hch@lst.de
      ef97837d
    • x86/ldt: Fix support_pte_mask filtering in map_ldt_struct() · e6f39e87
      By Joerg Roedel
      The |= operator will let us end up with an invalid PTE. Use
      the correct &= instead. (A paraphrased sketch of the fix follows
      this entry.)
      
      [ The bug was also independently reported by Shuah Khan ]
      
      Fixes: fb43d6cb ('x86/mm: Do not auto-massage page protections')
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Joerg Roedel <jroedel@suse.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e6f39e87
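      A paraphrased sketch of the corrected filtering inside
      map_ldt_struct() (a snippet, not self-contained):
      
          pgprot_t pte_prot = __pgprot(__PAGE_KERNEL_RO & ~_PAGE_GLOBAL);
          
          /*
           * Keep only the bits the CPU supports. The buggy version used
           * '|=', which ORed in every bit of __supported_pte_mask and
           * produced an invalid PTE instead of filtering one.
           */
          pgprot_val(pte_prot) &= __supported_pte_mask;
          pte = pfn_pte(pfn, pte_prot);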
  11. 14 April 2018 (9 commits)
  12. 12 April 2018 (3 commits)
    • x86/mm: Comment _PAGE_GLOBAL mystery · 430d4005
      By Dave Hansen
      I was mystified as to where the _PAGE_GLOBAL in the kernel page tables
      for kernel text came from.  I audited all the places I could find, but
      I missed one: head_64.S.
      
      The page tables that we create in here live for a long time, and they
      also have _PAGE_GLOBAL set, regardless of whether the processor supports
      it or not.  It's harmless, and we got *lucky* that the pageattr code
      accidentally clears it when we wipe it out of __supported_pte_mask and
      then later try to mark kernel text read-only.
      
      Comment some of these properties to make it easier to find and
      understand in the future.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205513.079BB265@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      430d4005
    • x86/mm: Do not auto-massage page protections · fb43d6cb
      By Dave Hansen
      A PTE is constructed from a physical address and a pgprotval_t.
      __PAGE_KERNEL, for instance, is a pgprot_t and must be converted
      into a pgprotval_t before it can be used to create a PTE.  This is
      done implicitly within functions like pfn_pte() by massage_pgprot().
      
      However, this makes it very challenging to set bits (and keep them
      set) if your bit is being filtered out by massage_pgprot().
      
      This moves the bit filtering out of pfn_pte() and friends.  For
      users of PAGE_KERNEL*, filtering will be done automatically inside
      those macros, but users of __PAGE_KERNEL* now need to do their
      own filtering.
      
      Note that we also just move pfn_pte/pmd/pud() over to check_pgprot()
      instead of massage_pgprot().  This way, we still *look* for
      unsupported bits and properly warn about them if we find them.  This
      might happen if an unfiltered __PAGE_KERNEL* value was passed in,
      for instance.  (A paraphrased sketch of check_pgprot() follows this
      entry.)
      
      - printk format warning fix from: Arnd Bergmann <arnd@arndb.de>
      - boot crash fix from:            Tom Lendacky <thomas.lendacky@amd.com>
      - crash bisected by:              Mike Galbraith <efault@gmx.de>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reported-and-fixed-by: Arnd Bergmann <arnd@arndb.de>
      Fixed-by: Tom Lendacky <thomas.lendacky@amd.com>
      Bisected-by: Mike Galbraith <efault@gmx.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205509.77E1D7F6@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      fb43d6cb
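      Paraphrased from the patch: the filtering helper stays, and
      pfn_pte/pmd/pud() now go through a checking wrapper:
      
          static inline pgprotval_t massage_pgprot(pgprot_t pgprot)
          {
                  pgprotval_t protval = pgprot_val(pgprot);
          
                  if (protval & _PAGE_PRESENT)
                          protval &= __supported_pte_mask;
          
                  return protval;
          }
          
          static inline pgprotval_t check_pgprot(pgprot_t pgprot)
          {
                  pgprotval_t massaged_val = massage_pgprot(pgprot);
          
                  /* Still filter, but warn if anything got filtered out: */
                  WARN_ONCE(pgprot_val(pgprot) != massaged_val,
                            "attempted to set unsupported pgprot: %016llx "
                            "bits: %016llx supported: %016llx\n",
                            (u64)pgprot_val(pgprot),
                            (u64)pgprot_val(pgprot) ^ massaged_val,
                            (u64)__supported_pte_mask);
          
                  return massaged_val;
          }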
    • xen, mm: allow deferred page initialization for xen pv domains · 6f84f8d1
      By Pavel Tatashin
      Juergen Gross noticed that commit f7f99100 ("mm: stop zeroing memory
      during allocation in vmemmap") broke XEN PV domains when deferred struct
      page initialization is enabled.
      
      This is because Xen's PagePinned flag gets erased from struct pages
      when they are initialized later in boot.
      
      Juergen fixed this problem by disabling deferred pages on xen pv
      domains.  It is desirable, however, to have this feature available as it
      reduces boot time.  This fix re-enables the feature for PV domains and
      fixes the problem the following way:
      
      The fix is to delay setting the PagePinned flag until struct pages for all
      allocated memory are initialized, i.e. until after free_all_bootmem().
      
      A new x86_init.hyper op, init_after_bootmem(), is called to let Xen know
      that the boot allocator is done, and hence that struct pages for all the
      allocated memory are now initialized.  If deferred page initialization
      is enabled, the rest of the struct pages are going to be initialized
      later in boot once page_alloc_init_late() is called.
      
      xen_after_bootmem() walks the page-table pages and marks them pinned.
      (A compressed sketch of the wiring follows this entry.)
      
      Link: http://lkml.kernel.org/r/20180226160112.24724-2-pasha.tatashin@oracle.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Tested-by: Juergen Gross <jgross@suse.com>
      Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
      Cc: Pavel Tatashin <pasha.tatashin@oracle.com>
      Cc: Alok Kataria <akataria@vmware.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Mathias Krause <minipli@googlemail.com>
      Cc: Jinbum Park <jinb.park7@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jia Zhang <zhang.jia@linux.alibaba.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Stefano Stabellini <sstabellini@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6f84f8d1
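      A compressed sketch of the wiring, paraphrased with unrelated code
      omitted:
      
          /* Xen PV setup code overrides the default no-op hook: */
          x86_init.hyper.init_after_bootmem = xen_after_bootmem;
          
          /* arch/x86/mm/init_64.c, once the boot allocator is done: */
          void __init mem_init(void)
          {
                  free_all_bootmem();
                  /* struct pages for all allocated memory exist now */
                  x86_init.hyper.init_after_bootmem();
          }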
  13. 11 April 2018 (1 commit)
  14. 10 April 2018 (3 commits)
    • x86/apic: Fix signedness bug in APIC ID validity checks · a774635d
      By Li RongQing
      The APIC ID as parsed from ACPI MADT is validity checked with the
      apic->apic_id_valid() callback, which depends on the selected APIC type.
      
      For non-X2APIC types, APIC IDs >= 0xFF are invalid, but values > 0x7FFFFFFF
      are detected as valid. This happens because the 'apicid' argument of the
      apic_id_valid() callback is of type 'int'. So the resulting comparison
      
         apicid < 0xFF
      
      evaluates to true for all unsigned int values > 0x7FFFFFFF which are handed
      to default_apic_id_valid(). As a consequence, invalid APIC IDs in !X2APIC
      mode are considered valid and accounted as possible CPUs.
      
      Change the apicid argument type of the apic_id_valid() callback to u32 so
      the evaluation is unsigned and returns the correct result. (A small
      compilable demo follows this entry.)
      
      [ tglx: Massaged changelog ]
      Signed-off-by: Li RongQing <lirongqing@baidu.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Cc: jgross@suse.com
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: hpa@zytor.com
      Link: https://lkml.kernel.org/r/1523322966-10296-1-git-send-email-lirongqing@baidu.com
      a774635d
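      The trap reproduces in plain C; the function names below are
      hypothetical stand-ins for the apic_id_valid() callback:
      
          #include <stdint.h>
          #include <stdio.h>
          
          /* Before the fix: the callback parameter was a signed int. */
          static int id_valid_buggy(int apicid)
          {
                  return apicid < 0xFF;
          }
          
          /* After the fix: u32 keeps the comparison unsigned. */
          static int id_valid_fixed(uint32_t apicid)
          {
                  return apicid < 0xFF;
          }
          
          int main(void)
          {
                  uint32_t id = 0xFFFFFFFFu;      /* bogus MADT APIC ID */
          
                  /* 0xFFFFFFFF converts to -1, and -1 < 0xFF is true. */
                  printf("buggy: %d\n", id_valid_buggy(id));      /* 1 */
                  printf("fixed: %d\n", id_valid_fixed(id));      /* 0 */
                  return 0;
          }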
    • x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption · d94a155c
      By Kirill A. Shutemov
      Some features (Intel MKTME, AMD SME) reduce the number of effectively
      available physical address bits. cpuinfo_x86::x86_phys_bits is adjusted
      accordingly during the early cpu feature detection.
      
      However, if get_cpu_cap() is called again later, this adjustment is
      overwritten. That happens in setup_pku(), which is called after
      detect_tme().
      
      To address this, extract the address sizes enumeration into a separate
      function, which is called only from early_identify_cpu() and from
      generic_identify().
      
      This makes get_cpu_cap() safe to be called later during the boot process
      without overwriting cpuinfo_x86::x86_phys_bits. (A paraphrased sketch of
      the extracted helper follows this entry.)
      
      [ tglx: Massaged changelog ]
      
      Fixes: cb06d8e3 ("x86/tme: Detect if TME and MKTME is activated by BIOS")
      Reported-by: Kai Huang <kai.huang@linux.intel.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: linux-mm@kvack.org
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Link: https://lkml.kernel.org/r/20180410092704.41106-1-kirill.shutemov@linux.intel.com
      d94a155c
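      A paraphrased sketch of the extracted helper in
      arch/x86/kernel/cpu/common.c (32-bit fallbacks omitted):
      
          static void get_cpu_address_sizes(struct cpuinfo_x86 *c)
          {
                  u32 eax, ebx, ecx, edx;
          
                  if (c->extended_cpuid_level >= 0x80000008) {
                          cpuid(0x80000008, &eax, &ebx, &ecx, &edx);
                          c->x86_virt_bits = (eax >> 8) & 0xff;
                          c->x86_phys_bits = eax & 0xff;
                  }
          }
      
      Because only early_identify_cpu() and generic_identify() call it, a
      later get_cpu_cap() call (e.g. from setup_pku()) no longer rewrites
      the x86_phys_bits value that the TME/SME code reduced.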
    • x86/espfix: Document use of _PAGE_GLOBAL · 6baf4bec
      By Dave Hansen
      The "normal" kernel page table creation mechanisms using
      PAGE_KERNEL_* page protections will never set _PAGE_GLOBAL with PTI.
      The few places in the kernel that always want _PAGE_GLOBAL must
      avoid using PAGE_KERNEL_*.
      
      Document that we want it here and that its use is not accidental.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20180406205507.BCF4D4F0@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6baf4bec
  15. 06 April 2018 (1 commit)
    • headers: untangle kmemleak.h from mm.h · 514c6032
      By Randy Dunlap
      Currently <linux/slab.h> #includes <linux/kmemleak.h> for no obvious
      reason.  It looks like it's only a convenience, so remove kmemleak.h
      from slab.h and add <linux/kmemleak.h> to any users of kmemleak_* that
      don't already #include it.  Also remove <linux/kmemleak.h> from source
      files that do not use it.
      
      This is tested on i386 allmodconfig and x86_64 allmodconfig.  It would
      be good to run it through the 0day bot for other $ARCHes.  I have
      neither the horsepower nor the storage space for the other $ARCHes.
      
      Update: This patch has been extensively build-tested by both the 0day
      bot & kisskb/ozlabs build farms.  Both of them reported 2 build failures
      for which patches are included here (in v2).
      
      [ slab.h is the second most used header file after module.h; kernel.h is
        right there with slab.h. There could be some minor error in the
        counting due to some #includes having comments after them and I didn't
        combine all of those. ]
      
      [akpm@linux-foundation.org: security/keys/big_key.c needs vmalloc.h, per sfr]
      Link: http://lkml.kernel.org/r/e4309f98-3749-93e1-4bb7-d9501a39d015@infradead.org
      Link: http://kisskb.ellerman.id.au/kisskb/head/13396/
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>	[2 build failures]
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>	[2 build failures]
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Cc: Luis R. Rodriguez <mcgrof@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: John Johansen <john.johansen@canonical.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      514c6032
  16. 03 April 2018 (2 commits)