1. 09 Sep 2015 (3 commits)
    • mm: rename alloc_pages_exact_node() to __alloc_pages_node() · 96db800f
      Authored by Vlastimil Babka
      alloc_pages_exact_node() was introduced in commit 6484eb3e ("page
      allocator: do not check NUMA node ID when the caller knows the node is
      valid") as an optimized variant of alloc_pages_node(), that doesn't
      fallback to current node for nid == NUMA_NO_NODE.  Unfortunately the
      name of the function can easily suggest that the allocation is
      restricted to the given node and fails otherwise.  In truth, the node is
      only preferred, unless __GFP_THISNODE is passed among the gfp flags.
      
      The misleading name has led to mistakes in the past; see for example
      commits 5265047a ("mm, thp: really limit transparent hugepage
      allocation to local node") and b360edb4 ("mm, mempolicy:
      migrate_to_node should only migrate to node").
      
      Another issue with the name is that there's a family of
      alloc_pages_exact*() functions where 'exact' means exact size (instead
      of page order), which leads to more confusion.
      
      To prevent further mistakes, this patch effectively renames
      alloc_pages_exact_node() to __alloc_pages_node() to better convey that
      it's an optimized variant of alloc_pages_node() not intended for general
      usage.  Both functions get described in comments.
      
      It has been also considered to really provide a convenience function for
      allocations restricted to a node, but the major opinion seems to be that
      __GFP_THISNODE already provides that functionality and we shouldn't
      duplicate the API needlessly.  The number of users would be small
      anyway.
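      
      As a hedged illustration of this distinction (using the prototype this
      patch introduces; everything beyond __GFP_THISNODE is simplified):
      
      	static struct page *alloc_on_node(int nid, unsigned int order, bool strict)
      	{
      		/* By default 'nid' is only preferred: the allocator may
      		 * fall back to other nodes under memory pressure. */
      		gfp_t gfp = GFP_KERNEL;
      
      		/* Only __GFP_THISNODE actually restricts the allocation
      		 * to 'nid', failing instead of falling back. */
      		if (strict)
      			gfp |= __GFP_THISNODE;
      
      		return __alloc_pages_node(nid, gfp, order);
      	}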
      
      Existing callers of alloc_pages_exact_node() are simply converted to
      call __alloc_pages_node(), with the exception of sba_alloc_coherent()
      which open-codes the check for NUMA_NO_NODE, so it is converted to use
      alloc_pages_node() instead.  This means it no longer performs some
      VM_BUG_ON checks, and since the current check for nid in
      alloc_pages_node() uses a 'nid < 0' comparison (which includes
      NUMA_NO_NODE), it may hide wrong values which would be previously
      exposed.
      
      Both differences will be rectified by the next patch.
      
      To sum up, this patch makes no functional changes, except temporarily
      hiding potentially buggy callers.  Restricting the checks in
      alloc_pages_node() is left for the next patch which can in turn expose
      more existing buggy callers.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Robin Holt <robinmholt@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Gleb Natapov <gleb@kernel.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Cliff Whickman <cpw@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: use generic early mem copy · 5dd2c4bd
      Authored by Mark Salter
      The early_ioremap library now has a generic copy_from_early_mem()
      function.  Use the generic copy function for x86 relocate_initrd().
      
      [akpm@linux-foundation.org: remove MAX_MAP_CHUNK define, per Yinghai Lu]
      Signed-off-by: Mark Salter <msalter@redhat.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mem-hotplug: handle node hole when initializing numa_meminfo. · 95cf82ec
      Authored by Tang Chen
      When parsing SRAT, all memory ranges are added into numa_meminfo.  In
      numa_init(), before entering numa_cleanup_meminfo(), all possible memory
      ranges are in numa_meminfo.  numa_cleanup_meminfo() then removes all
      empty ranges and all ranges above max_pfn.
      
      But, this only works if the nodes are contiguous.  Let's have a look at
      the following example:
      
      We have an SRAT like this:
      SRAT: Node 0 PXM 0 [mem 0x00000000-0x5fffffff]
      SRAT: Node 0 PXM 0 [mem 0x100000000-0x1ffffffffff]
      SRAT: Node 1 PXM 1 [mem 0x20000000000-0x3ffffffffff]
      SRAT: Node 4 PXM 2 [mem 0x40000000000-0x5ffffffffff] hotplug
      SRAT: Node 5 PXM 3 [mem 0x60000000000-0x7ffffffffff] hotplug
      SRAT: Node 2 PXM 4 [mem 0x80000000000-0x9ffffffffff] hotplug
      SRAT: Node 3 PXM 5 [mem 0xa0000000000-0xbffffffffff] hotplug
      SRAT: Node 6 PXM 6 [mem 0xc0000000000-0xdffffffffff] hotplug
      SRAT: Node 7 PXM 7 [mem 0xe0000000000-0xfffffffffff] hotplug
      
      On boot, only node 0,1,2,3 exist.
      
      And the numa_meminfo will look like this:
      numa_meminfo.nr_blks = 9
      1. on node 0: [0, 60000000]
      2. on node 0: [100000000, 20000000000]
      3. on node 1: [20000000000, 40000000000]
      4. on node 4: [40000000000, 60000000000]
      5. on node 5: [60000000000, 80000000000]
      6. on node 2: [80000000000, a0000000000]
      7. on node 3: [a0000000000, a0800000000]
      8. on node 6: [c0000000000, e0000000000]
      9. on node 7: [e0000000000, 100000000000]
      
      And numa_cleanup_meminfo() will merge 1 and 2, and remove 8 and 9
      because their end addresses are over max_pfn, which is a0800000000.
      But 4 and 5 are not removed because their end addresses are less than
      max_pfn.  In fact, though, nodes 4 and 5 don't exist.
      
      In a word, numa_cleanup_meminfo() is not able to handle holes between nodes.
      
      Since memory ranges in node 4 and 5 are in numa_meminfo, in
      numa_register_memblks(), node 4 and 5 will be mistakenly set to online.
      
      If you run lscpu, it will show:
      NUMA node0 CPU(s):     0-14,128-142
      NUMA node1 CPU(s):     15-29,143-157
      NUMA node2 CPU(s):
      NUMA node3 CPU(s):
      NUMA node4 CPU(s):     62-76,190-204
      NUMA node5 CPU(s):     78-92,206-220
      
      In this patch, we use memblock_overlaps_region() to check if ranges in
      numa_meminfo overlap with ranges in memblock.memory.  Since
      memblock.memory contains all memory available at boot time, an overlap
      means the range really exists; if there is no overlap, the range is
      removed from numa_meminfo.
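      
      As a hedged sketch of that check (only memblock_overlaps_region() comes
      from the changelog; the helper name and loop shape are assumptions in
      the style of arch/x86/mm/numa.c):
      
      	/* Drop numa_meminfo ranges that overlap no boot memory,
      	 * i.e. ranges that fall into a hole between nodes. */
      	static void __init numa_drop_nonexistent_ranges(struct numa_meminfo *mi)
      	{
      		int i;
      
      		for (i = 0; i < mi->nr_blks; i++) {
      			struct numa_memblk *mb = &mi->blk[i];
      
      			/* memblock.memory holds all memory present at boot;
      			 * no overlap means the range does not really exist. */
      			if (!memblock_overlaps_region(&memblock.memory,
      						      mb->start,
      						      mb->end - mb->start))
      				numa_remove_memblk_from(i--, mi);
      		}
      	}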
      
      After this patch, lscpu will show:
      NUMA node0 CPU(s):     0-14,128-142
      NUMA node1 CPU(s):     15-29,143-157
      NUMA node4 CPU(s):     62-76,190-204
      NUMA node5 CPU(s):     78-92,206-220
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Vladimir Murzin <vladimir.murzin@arm.com>
      Cc: Fabian Frederick <fabf@skynet.be>
      Cc: Alexander Kuleshov <kuleshovmail@gmail.com>
      Cc: Baoquan He <bhe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 05 Sep 2015 (6 commits)
    • mm: send one IPI per CPU to TLB flush all entries after unmapping pages · 72b252ae
      Authored by Mel Gorman
      An IPI is sent to flush remote TLBs when a page is unmapped that was
      potentially accessed by other CPUs.  There are many circumstances where
      this happens but the obvious one is kswapd reclaiming pages belonging to a
      running process as kswapd and the task are likely running on separate
      CPUs.
      
      On small machines, this is not a significant problem but as machines get
      larger with more cores and more memory, the cost of these IPIs can be
      high.  This patch uses a simple structure that tracks CPUs that
      potentially have TLB entries for pages being unmapped.  When the unmapping
      is complete, the full TLB is flushed on the assumption that a refill cost
      is lower than flushing individual entries.
      
      Architectures wishing to do this must give the following guarantee.
      
              If a clean page is unmapped and not immediately flushed, the
              architecture must guarantee that a write to that linear address
              from a CPU with a cached TLB entry will trap a page fault.
      
      This is essentially what the kernel already depends on but the window is
      much larger with this patch applied and is worth highlighting.  The
      architecture should consider whether the cost of the full TLB flush is
      higher than sending an IPI to flush each individual entry.  An additional
      architecture helper called flush_tlb_local is required.  It's a trivial
      wrapper with some accounting in the x86 case.
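      
      A hedged sketch of the batching structure described above (field and
      function names mirror the changelog's description; the real code in
      mm/rmap.c may differ):
      
      	struct tlbflush_unmap_batch {
      		/* CPUs that may still cache entries for unmapped pages */
      		struct cpumask cpumask;
      		bool flush_required;
      	};
      
      	/* Per unmapped page: record candidate CPUs, send no IPI yet. */
      	static void note_unmap(struct tlbflush_unmap_batch *batch,
      			       struct mm_struct *mm)
      	{
      		cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
      		batch->flush_required = true;
      	}
      
      	/* Once unmapping is complete: one full flush per CPU, assumed
      	 * cheaper than one IPI per individual entry. */
      	static void flush_unmap_batch(struct tlbflush_unmap_batch *batch)
      	{
      		if (!batch->flush_required)
      			return;
      		flush_tlb_others(&batch->cpumask, NULL, 0, TLB_FLUSH_ALL);
      		cpumask_clear(&batch->cpumask);
      		batch->flush_required = false;
      	}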
      
      The impact of this patch depends on the workload as measuring any benefit
      requires both mapped pages co-located on the LRU and memory pressure.  The
      case with the biggest impact is multiple processes reading mapped pages
      taken from the vm-scalability test suite.  The test case uses NR_CPU
      readers of mapped files that consume 10*RAM.
      
      Linear mapped reader on a 4-node machine with 64G RAM and 48 CPUs
      
                                                 4.2.0-rc1          4.2.0-rc1
                                                   vanilla       flushfull-v7
      Ops lru-file-mmap-read-elapsed      159.62 (  0.00%)   120.68 ( 24.40%)
      Ops lru-file-mmap-read-time_range    30.59 (  0.00%)     2.80 ( 90.85%)
      Ops lru-file-mmap-read-time_stddv     6.70 (  0.00%)     0.64 ( 90.38%)
      
                 4.2.0-rc1    4.2.0-rc1
                   vanilla flushfull-v7
      User          581.00       611.43
      System       5804.93      4111.76
      Elapsed       161.03       122.12
      
      This is showing that the readers completed 24.40% faster with 29% less
      system CPU time.  From vmstats, it is known that the vanilla kernel was
      interrupted roughly 900K times per second during the steady phase of the
      test, while the patched kernel was interrupted roughly 180K times per
      second.
      
      The impact is lower on a single socket machine.
      
                                                 4.2.0-rc1          4.2.0-rc1
                                                   vanilla       flushfull-v7
      Ops lru-file-mmap-read-elapsed       25.33 (  0.00%)    20.38 ( 19.54%)
      Ops lru-file-mmap-read-time_range     0.91 (  0.00%)     1.44 (-58.24%)
      Ops lru-file-mmap-read-time_stddv     0.28 (  0.00%)     0.47 (-65.34%)
      
                 4.2.0-rc1    4.2.0-rc1
                   vanilla flushfull-v7
      User           58.09        57.64
      System        111.82        76.56
      Elapsed        27.29        22.55
      
      It's still a noticeable improvement with vmstat showing interrupts went
      from roughly 500K per second to 45K per second.
      
      The patch will have no impact on workloads with no memory pressure or
      with relatively few mapped pages.  It will have an unpredictable impact on the
      workload running on the CPU being flushed as it'll depend on how many TLB
      entries need to be refilled and how long that takes.  Worst case, the TLB
      will be completely cleared of active entries when the target PFNs were not
      resident at all.
      
      [sasha.levin@oracle.com: trace tlb flush after disabling preemption in try_to_unmap_flush]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86, mm: trace when an IPI is about to be sent · 5b74283a
      Authored by Mel Gorman
      When unmapping pages it is necessary to flush the TLB.  If that page was
      accessed by another CPU then an IPI is used to flush the remote CPU.  That
      is a lot of IPIs if kswapd is scanning and unmapping >100K pages per
      second.
      
      There already is a window between when a page is unmapped and when it is
      TLB flushed.  This series increases the window so multiple pages can be
      flushed using a single IPI.  This should be safe or the kernel is hosed
      already.
      
      Patch 1 simply made the rest of the series easier to write as ftrace
              could identify all the senders of TLB flush IPIs.
      
      Patch 2 tracks what CPUs potentially map a PFN and then sends an IPI
              to flush the entire TLB.
      
      Patch 3 tracks when there potentially are writable TLB entries that
              need to be batched differently
      
      Patch 4 increases SWAP_CLUSTER_MAX to further batch flushes
      
      The performance impact is documented in the changelogs but in the optimistic
      case on a 4-socket machine the full series reduces interrupts from 900K
      interrupts/second to 60K interrupts/second.
      
      This patch (of 4):
      
      It is easy to trace when an IPI is received to flush a TLB but harder to
      detect what event sent it.  This patch makes it easy to identify the
      source of IPIs being transmitted for TLB flushes on x86.
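      
      A hedged sketch of where the new trace hook sits (based on this
      changelog; the exact call site in arch/x86/mm/tlb.c may differ):
      
      	void native_flush_tlb_others(const struct cpumask *cpumask,
      				     struct mm_struct *mm,
      				     unsigned long start, unsigned long end)
      	{
      		/* New: record that this CPU is about to send TLB-flush
      		 * IPIs, so ftrace can attribute the interrupts seen on
      		 * remote CPUs to their sender. */
      		count_vm_tlb_event(NR_TLB_REMOTE_FLUSH);
      		trace_tlb_flush(TLB_REMOTE_SEND_IPI, end - start);
      
      		/* ... existing IPI dispatch to the CPUs in @cpumask ... */
      	}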
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userfaultfd: activate syscall · 1380fca0
      Authored by Andrea Arcangeli
      This activates the userfaultfd syscall.
      
      [sfr@canb.auug.org.au: activate syscall fix]
      [akpm@linux-foundation.org: don't enable userfaultfd on powerpc]
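      
      With the syscall wired up, userspace can reach it; a minimal hedged
      example (no glibc wrapper existed at the time, so it is invoked via
      syscall(2)):
      
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <sys/syscall.h>
      	#include <unistd.h>
      
      	int main(void)
      	{
      		int ufd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
      
      		if (ufd < 0) {
      			perror("userfaultfd");
      			return 1;
      		}
      		/* ufd must be configured with the UFFDIO_API and
      		 * UFFDIO_REGISTER ioctls before it is useful. */
      		close(ufd);
      		return 0;
      	}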
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Sanidhya Kashyap <sanidhya.gatech@gmail.com>
      Cc: zhang.zhanghailiang@huawei.com
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andres Lagar-Cavilla <andreslc@google.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: "Huangpeng (Peter)" <peter.huangpeng@huawei.com>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • watchdog: rename watchdog_suspend() and watchdog_resume() · ec6a9066
      Authored by Ulrich Obergfell
      Rename watchdog_suspend() to lockup_detector_suspend() and
      watchdog_resume() to lockup_detector_resume() to avoid confusion with the
      watchdog subsystem and to be consistent with the existing name
      lockup_detector_init().
      
      Also provide comment blocks to explain the watchdog_running and
      watchdog_suspended variables and their relationship.
      Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • watchdog: use suspend/resume interface in fixup_ht_bug() · 999bbe49
      Authored by Ulrich Obergfell
      Remove watchdog_nmi_disable_all() and watchdog_nmi_enable_all() since
      these functions are no longer needed.  If a subsystem has a need to
      deactivate the watchdog temporarily, it should utilize the
      watchdog_suspend() and watchdog_resume() functions.
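      
      A hedged sketch of the intended usage pattern (watchdog_suspend()
      returning non-zero on failure is an assumption from the changelog's
      intent; the pair was renamed to lockup_detector_suspend()/
      lockup_detector_resume() by the commit above):
      
      	static void do_work_with_watchdog_quiesced(void)
      	{
      		if (watchdog_suspend() != 0)
      			return;			/* could not suspend; bail out */
      
      		fixup_ht_bug_work();		/* hypothetical critical section */
      
      		watchdog_resume();
      	}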
      
      [akpm@linux-foundation.org: fix build with CONFIG_LOCKUP_DETECTOR=m]
      Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/watchdog: move NMI function header declarations from watchdog.h to nmi.h · aacfbe6a
      Authored by Guenter Roeck
      The kernel's NMI watchdog has nothing to do with the watchdog subsystem.
      Its header declarations should be in linux/nmi.h, not linux/watchdog.h.
      
      The code provided two sets of dummy functions if HARDLOCKUP_DETECTOR is
      not configured, one in the include file and one in kernel/watchdog.c.
      Remove the dummy functions from kernel/watchdog.c and use those from the
      include file.
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Don Zickus <dzickus@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 28 Aug 2015 (3 commits)
    • x86/irq: Do not dereference irq descriptor before checking it · a47d4576
      Authored by Thomas Gleixner
      Having the IS_ERR_OR_NULL() check after dereferencing the pointer does
      not really work well.
      
      Move the dereference after the check.
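      
      The general shape of the fix, as a hedged illustration:
      
      	static void handle_irq_desc(unsigned int irq)
      	{
      		struct irq_desc *desc = irq_to_desc(irq);
      
      		/* Check first: desc may be NULL or an ERR_PTR value. */
      		if (IS_ERR_OR_NULL(desc))
      			return;
      
      		/* Only now is it safe to dereference the descriptor. */
      		generic_handle_irq_desc(irq, desc);
      	}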
      
      Fixes: a782a7e4 'x86/irq: Store irq descriptor in vector array'
      Reported-and-tested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86/mm/mtrr: Remove kernel internal MTRR interfaces: unexport mtrr_add() and mtrr_del() · 2baa891e
      Authored by Luis R. Rodriguez
      The effort to replace mtrr_add() with the architecture-agnostic
      arch_phys_wc_add() is complete; this ensures that write-combining
      implementations (PAT on x86) are taken advantage of instead of
      MTRR. With the effort done, now hide direct MTRR access from
      drivers.
      
      The legacy user-space /proc/mtrr ABI is not affected.
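      
      For drivers, the agnostic replacement mentioned above looks roughly
      like this (a hedged usage sketch; bar_start/bar_len are placeholders):
      
      	/* Request write-combining for an MMIO range: uses PAT on x86
      	 * when available, falls back to an MTRR on legacy systems. */
      	int wc_cookie = arch_phys_wc_add(bar_start, bar_len);
      
      	/* ... map and use the region ... */
      
      	arch_phys_wc_del(wc_cookie);	/* release on teardown */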
      
      Update the x86 documentation on MTRR to reflect the completion of
      the phase-out of direct access to MTRR, and add a note on
      platform firmware code's use of MTRRs based on the obituary
      discussion of MTRRs on Linux [0].
      
        [0] http://lkml.kernel.org/r/1438991330.3109.196.camel@hp.com
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: <syrjala@sci.fi>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Walls <awalls@md.metrocast.net>
      Cc: Antonino Daplas <adaplas@gmail.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jean-Christophe Plagniol-Villard <plagnioj@jcrosoft.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Suresh Siddha <sbsiddha@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tomi Valkeinen <tomi.valkeinen@ti.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Ville Syrjälä <syrjala@sci.fi>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: airlied@linux.ie
      Cc: benh@kernel.crashing.org
      Cc: bhelgaas@google.com
      Cc: dan.j.williams@intel.com
      Cc: konrad.wilk@oracle.com
      Cc: linux-fbdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-media@vger.kernel.org
      Cc: mst@redhat.com
      Cc: netdev@vger.kernel.org
      Cc: vinod.koul@intel.com
      Cc: xen-devel@lists.xensource.com
      Link: http://lkml.kernel.org/r/1440443613-13696-12-git-send-email-mcgrof@do-not-panic.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm: Drop repeated macro of X86_EFLAGS_AC definition · 7e01ebff
      Authored by Huang Rui
      We only need one definition each of X86_EFLAGS_AC_BIT and X86_EFLAGS_AC.
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Li <tony.li@amd.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Link: http://lkml.kernel.org/r/1440669844-21535-1-git-send-email-ray.huang@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  4. 27 Aug 2015 (1 commit)
    • ACPI, PCI: Penalize legacy IRQ used by ACPI SCI · 5d0ddfeb
      Authored by Jiang Liu
      Nick Meier reported a regression with HyperV that "
        After rebooting the VM, the following messages are logged in syslog
        when trying to load the tulip driver:
          tulip: Linux Tulip drivers version 1.1.15 (Feb 27, 2007)
          tulip: 0000:00:0a.0: PCI INT A: failed to register GSI
          tulip: Cannot enable tulip board #0, aborting
          tulip: probe of 0000:00:0a.0 failed with error -16
        Errors occur in 3.19.0 kernel
        Works in 3.17 kernel.
      "
      
      According to the ACPI dump file posted by Nick at
      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072
      
      The ACPI MADT table includes an interrupt source overridden entry for
      ACPI SCI:
      [236h 0566  1]                Subtable Type : 02 <Interrupt Source Override>
      [237h 0567  1]                       Length : 0A
      [238h 0568  1]                          Bus : 00
      [239h 0569  1]                       Source : 09
      [23Ah 0570  4]                    Interrupt : 00000009
      [23Eh 0574  2]        Flags (decoded below) : 000D
                                         Polarity : 1
                                     Trigger Mode : 3
      
      And in DSDT table, we have _PRT method to define PCI interrupts, which
      eventually goes to:
              Name (PRSA, ResourceTemplate ()
              {
                  IRQ (Level, ActiveLow, Shared, )
                      {3,4,5,7,9,10,11,12,14,15}
              })
              Name (PRSB, ResourceTemplate ()
              {
                  IRQ (Level, ActiveLow, Shared, )
                      {3,4,5,7,9,10,11,12,14,15}
              })
              Name (PRSC, ResourceTemplate ()
              {
                  IRQ (Level, ActiveLow, Shared, )
                      {3,4,5,7,9,10,11,12,14,15}
              })
              Name (PRSD, ResourceTemplate ()
              {
                  IRQ (Level, ActiveLow, Shared, )
                      {3,4,5,7,9,10,11,12,14,15}
              })
      
      According to the MADT and DSDT tables, IRQ 9 may be used for:
       1) ACPI SCI in level, high mode
       2) PCI legacy IRQ in level, low mode
      So there's a conflict in polarity setting for IRQ 9.
      
      Prior to commit cd68f6bd ("x86, irq, acpi: Get rid of special
      handling of GSI for ACPI SCI"), the ACPI SCI was handled specially and
      there was no check for conflicts between the ACPI SCI and PCI legacy
      IRQs.  And it seems that the HyperV hypervisor doesn't make use of the
      polarity configuration in the IOAPIC entry, so it just worked.
      
      Commit cd68f6bd got rid of the special handling of the ACPI SCI,
      and then the pin attribute checking code disclosed the conflict
      between the ACPI SCI and PCI legacy IRQs on HyperV virtual machines,
      and rejected the request to assign IRQ 9 to PCI devices.
      
      So penalize the legacy IRQ used by the ACPI SCI, marking it unusable
      if the ACPI SCI attributes conflict with the PCI IRQ attributes.
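      
      A hedged sketch of the penalty logic (the array and penalty constants
      follow drivers/acpi/pci_link.c conventions; treat the exact body as an
      assumption):
      
      	void acpi_penalize_sci_irq(int irq, int trigger, int polarity)
      	{
      		if (irq >= 0 && irq < ARRAY_SIZE(acpi_irq_penalty)) {
      			if (trigger != ACPI_MADT_TRIGGER_LEVEL ||
      			    polarity != ACPI_MADT_POLARITY_ACTIVE_LOW)
      				/* SCI attributes conflict with the PCI
      				 * defaults (level/low): make this IRQ
      				 * unusable for PCI devices. */
      				acpi_irq_penalty[irq] += PIRQ_PENALTY_ISA_ALWAYS;
      			else
      				acpi_irq_penalty[irq] += PIRQ_PENALTY_PCI_USING;
      		}
      	}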
      
      Please refer to following links for more information:
      https://bugzilla.kernel.org/show_bug.cgi?id=101301
      https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1440072
      
      Fixes: cd68f6bd ("x86, irq, acpi: Get rid of special handling of GSI for ACPI SCI")
      Reported-and-tested-by: Nick Meier <nmeier@microsoft.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: 3.19+ <stable@vger.kernel.org> # 3.19+
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  5. 25 Aug 2015 (3 commits)
  6. 23 Aug 2015 (1 commit)
  7. 22 Aug 2015 (8 commits)
    • x86/apic: Fix fallout from x2apic cleanup · a57e456a
      Authored by Thomas Gleixner
      In the recent x2apic cleanup I got two things really wrong:
      1) The safety check in __disable_x2apic which allows the function to
         be called unconditionally is backwards. The check is there to
         prevent access to the apic MSR in case the machine has no
         apic. Though right now it returns if the machine has an apic and
         therefore the disabling of x2apic is never invoked.
      
      2) x2apic_disable() sets x2apic_mode to 0 after registering the local
         apic. That's wrong, because register_lapic_address() checks x2apic
         mode and therefore takes the wrong code path.
      
      This results in boot failures on machines with x2apic preenabled by
      the BIOS and can also lead to a fatal MSR access on machines without
      an apic.
      
      The solutions are simple:
      1) Correct the sanity check for apic availability
      2) Clear x2apic_mode _before_ calling register_lapic_address()
      
      Fixes: 659006bf 'x86/x2apic: Split enable and setup function'
      Reported-and-tested-by: Javier Monteagudo <javiermon@gmail.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Link: https://bugzilla.redhat.com/show_bug.cgi?id=1224764
      Cc: stable@vger.kernel.org # 4.0+
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
    • x86/kasan, mm: Introduce generic kasan_populate_zero_shadow() · 69786cdb
      Authored by Andrey Ryabinin
      Introduce generic kasan_populate_zero_shadow(shadow_start,
      shadow_end). This function maps kasan_zero_page to the
      [shadow_start, shadow_end] addresses.
      
      This replaces the x86_64-specific populate_zero_shadow() and will
      be used for ARM64 in follow-on patches.
      
      The main changes from original version are:
      
       * Use p?d_populate*() instead of set_p?d()
       * Use memblock allocator directly instead of vmemmap_alloc_block()
       * __pa() instead of __pa_nodebug(). __pa() causes trouble
         if we use it before kasan_early_init(). kasan_populate_zero_shadow()
         will be used later, so we are OK with __pa() here.
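      
      The resulting generic interface, per the description above (a hedged
      sketch; hole_start/hole_end in the call are hypothetical placeholders
      for an unpopulated region):
      
      	/* Map every page of [shadow_start, shadow_end) to the single
      	 * kasan_zero_page (assumed prototype): */
      	void __init kasan_populate_zero_shadow(const void *shadow_start,
      					       const void *shadow_end);
      
      	/* x86_64-style call site sketch: cover the shadow of a hole. */
      	kasan_populate_zero_shadow(kasan_mem_to_shadow(hole_start),
      				   kasan_mem_to_shadow(hole_end));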
      Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexey Klimov <klimov.linux@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: David Keitel <dkeitel@codeaurora.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yury <yury.norov@gmail.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1439444244-26057-3-git-send-email-ryabinin.a.a@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/kasan: Define KASAN_SHADOW_OFFSET per architecture · 920e277e
      Authored by Andrey Ryabinin
      The current definition of KASAN_SHADOW_OFFSET in
      include/linux/kasan.h will not work for the upcoming arm64, so move
      it to the arch header.
      Signed-off-by: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexey Klimov <klimov.linux@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: David Keitel <dkeitel@codeaurora.org>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yury <yury.norov@gmail.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/1439444244-26057-2-git-send-email-ryabinin.a.a@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer · b466bdb6
      Authored by Huang Rui
      MWAITX can enable a timer with a corresponding timer value
      specified in SW P0 clocks. The SW P0 frequency is the same as
      TSC. The timer provides an upper bound on how long the
      instruction waits before exiting.
      
      This way, a delay function in the kernel can leverage the
      MWAITX timer.
      
      When a CPU core executes MWAITX, it will be quiesced in a
      waiting phase, diminishing its power consumption. This way, we
      can save power in comparison to our default TSC-based delays.
      
      A simple test shows that:
      
      	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
      	$ sleep 10000s
      	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
      
      Results:
      
      	* TSC-based default delay:      485115 uWatts average power
      	* MWAITX-based delay:           252738 uWatts average power
      
      Thus, that's about 240 milliWatts less power consumption. The
      test method relies on the support of AMD CPU accumulated power
      algorithm in fam15h_power for which patches are forthcoming.
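      
      A hedged sketch of the resulting delay loop (simplified from what this
      series adds to arch/x86/lib/delay.c; the MWAITX_* constants and the use
      of cpu_tss as the monitored address are assumptions):
      
      	static void delay_mwaitx(unsigned long loops)
      	{
      		u64 start, end, delay, left = loops;
      
      		start = rdtsc_ordered();
      		for (;;) {
      			/* The hardware timer field is bounded; clamp. */
      			delay = min_t(u64, MWAITX_MAX_LOOPS, left);
      
      			/* Arm a monitor, then wait with the timer enabled:
      			 * SW P0 clocks tick at TSC frequency. */
      			__monitorx(this_cpu_ptr(&cpu_tss), 0, 0);
      			__mwaitx(MWAITX_DISABLE_CSTATES, delay,
      				 MWAITX_ECX_TIMER_ENABLE);
      
      			end = rdtsc_ordered();
      			if (left <= end - start)
      				break;
      			left -= end - start;
      			start = end;
      		}
      	}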
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Suggested-by: Borislav Petkov <bp@suse.de>
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      [ Fix delay truncation. ]
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
      Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Li <tony.li@amd.com>
      Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com
      Link: http://lkml.kernel.org/r/1439201994-28067-4-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm: Add MONITORX/MWAITX instruction support · f9675674
      Authored by Huang Rui
      AMD Carrizo processors (Family 15h, Models 60h-6fh) added a new
      feature called MWAITX (MWAIT with extensions) as an extension to
      MONITOR/MWAIT.
      
      This new instruction controls a configurable timer which causes
      the core to exit wait state on timer expiration, in addition to
      "normal" MWAIT condition of reading from a monitored VA.
      
      Compared to MONITOR/MWAIT, there are minor differences in opcode
      and input parameters:
      
      MWAITX ECX[1]: enable timer if set
      MWAITX EBX[31:0]: max wait time expressed in SW P0 clocks ==
      TSC. The software P0 frequency is the same as the TSC frequency.
      
                      MWAIT                           MWAITX
      opcode          0f 01 c9           |            0f 01 fb
      ECX[0]                  value of RFLAGS.IF seen by instruction
      ECX[1]          unused/#GP if set  |            enable timer if set
      ECX[31:2]                     unused/#GP if set
      EAX                           unused (reserve for hint)
      EBX[31:0]       unused             |            max wait time (SW P0 == TSC)
      
                      MONITOR                         MONITORX
      opcode          0f 01 c8           |            0f 01 fa
      EAX                     (logical) address to monitor
      ECX                     #GP if not zero
      
      Max timeout = EBX/(TSC frequency)
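      
      The opcodes in the table translate into inline-asm wrappers along
      these lines (a hedged sketch in the style of asm/mwait.h):
      
      	static inline void __monitorx(const void *eax, unsigned long ecx,
      				      unsigned long edx)
      	{
      		/* "monitorx %eax, %ecx, %edx" -- opcode 0f 01 fa */
      		asm volatile(".byte 0x0f, 0x01, 0xfa;"
      			     :: "a" (eax), "c" (ecx), "d" (edx));
      	}
      
      	static inline void __mwaitx(unsigned long eax, unsigned long ebx,
      				    unsigned long ecx)
      	{
      		/* "mwaitx %eax, %ebx, %ecx" -- opcode 0f 01 fb;
      		 * ebx = max wait time, ecx[1] = enable timer */
      		asm volatile(".byte 0x0f, 0x01, 0xfb;"
      			     :: "a" (eax), "b" (ebx), "c" (ecx));
      	}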
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Dirk Brandewie <dirk.j.brandewie@intel.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Li <tony.li@amd.com>
      Link: http://lkml.kernel.org/r/1439201994-28067-3-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/traps: Weaken context tracking entry assertions · f0a97af8
      Authored by Andy Lutomirski
      We were asserting that we were all the way in CONTEXT_KERNEL
      when exception handlers were called.  While having this be true
      is, I think, a nice goal (or maybe a variant in which we assert
      that we're in CONTEXT_KERNEL or some new IRQ context), we're not
      quite there.
      
      In particular, if an IRQ interrupts the SYSCALL prologue and the
      IRQ handler in turn causes an exception, the exception entry
      will be called in RCU IRQ mode but with CONTEXT_USER.
      
      This is okay (nothing goes wrong), but until we fix up the
      SYSCALL prologue, we need to avoid warning.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/c81faf3916346c0e04346c441392974f49cd7184.1440133286.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu/math-emu: Fix crash in fork() · 827409b2
      Authored by Ingo Molnar
      During later stages of math-emu bootup the following crash triggers:
      
      	 math_emulate: 0060:c100d0a8
      	 Kernel panic - not syncing: Math emulation needed in kernel
      	 CPU: 0 PID: 1511 Comm: login Not tainted 4.2.0-rc7+ #1012
      	 [...]
      	 Call Trace:
      	  [<c181d50d>] dump_stack+0x41/0x52
      	  [<c181c918>] panic+0x77/0x189
      	  [<c1003530>] ? math_error+0x140/0x140
      	  [<c164c2d7>] math_emulate+0xba7/0xbd0
      	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
      	  [<c1109c3c>] ? __alloc_pages_nodemask+0x12c/0x870
      	  [<c136ac20>] ? proc_clear_tty+0x40/0x70
      	  [<c136ac6e>] ? session_clear_tty+0x1e/0x30
      	  [<c1003530>] ? math_error+0x140/0x140
      	  [<c1003575>] do_device_not_available+0x45/0x70
      	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
      	  [<c18258e6>] error_code+0x5a/0x60
      	  [<c1003530>] ? math_error+0x140/0x140
      	  [<c100d0a8>] ? fpu__copy+0x138/0x1c0
      	  [<c100c205>] arch_dup_task_struct+0x25/0x30
      	  [<c1048cea>] copy_process.part.51+0xea/0x1480
      	  [<c115a8e5>] ? dput+0x175/0x200
      	  [<c136af70>] ? no_tty+0x30/0x30
      	  [<c1157242>] ? do_vfs_ioctl+0x322/0x540
      	  [<c104a21a>] _do_fork+0xca/0x340
      	  [<c1057b06>] ? SyS_rt_sigaction+0x66/0x90
      	  [<c104a557>] SyS_clone+0x27/0x30
      	  [<c1824a80>] sysenter_do_call+0x12/0x12
      
      The reason is the incorrect assumption in fpu_copy() that FNSAVE
      can be executed on math-emu kernels as well.
      
      Don't try to copy the registers, the soft state will be copied
      by fork anyway, so the child task inherits the parent task's
      soft math state.
      
      With this fix applied, math-emu kernels boot up fine on modern
      hardware with the 'no387 nofxsr' boot options.
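      
      A hedged sketch of the shape of the fix (the guard predicate is an
      assumption; the real change is in arch/x86/kernel/fpu/core.c):
      
      	static void fpu_copy(struct fpu *dst_fpu, struct fpu *src_fpu)
      	{
      		/* On math-emu kernels FNSAVE would trap right back into
      		 * the emulator from kernel mode. Skip the register copy:
      		 * fork copies the soft state, so the child inherits the
      		 * parent's soft math state anyway. */
      		if (!static_cpu_has(X86_FEATURE_FPU))
      			return;
      
      		/* ... unchanged hardware FPU register copy path ... */
      	}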
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Bobby Powers <bobbypowers@gmail.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/fpu/math-emu: Fix math-emu boot crash · 5fc96038
      Authored by Ingo Molnar
      On a math-emu bootup the following crash occurs:
      
      	Initializing CPU#0
      	------------[ cut here ]------------
      	kernel BUG at arch/x86/kernel/traps.c:779!
      	invalid opcode: 0000 [#1] SMP
      	[...]
      	EIP is at do_device_not_available+0xe/0x70
      	[...]
      	Call Trace:
      	 [<c18238e6>] error_code+0x5a/0x60
      	 [<c1002bd0>] ? math_error+0x140/0x140
      	 [<c100bbd9>] ? fpu__init_cpu+0x59/0xa0
      	 [<c1012322>] cpu_init+0x202/0x330
      	 [<c104509f>] ? __native_set_fixmap+0x1f/0x30
      	 [<c1b56ab0>] trap_init+0x305/0x346
      	 [<c1b548af>] start_kernel+0x1a5/0x35d
      	 [<c1b542b4>] i386_start_kernel+0x82/0x86
      
      The reason is that in the following commit:
      
        b1276c48 ("x86/fpu: Initialize fpregs in fpu__init_cpu_generic()")
      
      I failed to consider math-emu's limitation that it cannot execute the
      FNINIT instruction in kernel mode.
      
      The long term fix might be to allow math-emu to execute (certain) kernel
      mode FPU instructions, but for now apply the safe (albeit somewhat ugly)
      fix: initialize the emulation state explicitly without trapping out to
      the FPU emulator.
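      
      A hedged sketch of that explicit initialization (fpstate_init_soft()
      is the soft-state initializer; the exact guards are assumptions):
      
      	static void fpu__init_cpu_generic(void)
      	{
      		/* ... CR0 setup ... */
      
      		if (static_cpu_has(X86_FEATURE_FPU))
      			asm volatile("fninit");
      	#ifdef CONFIG_MATH_EMULATION
      		else
      			/* No FPU: set up the soft state directly instead
      			 * of executing FNINIT and trapping out to the
      			 * emulator from kernel mode. */
      			fpstate_init_soft(&current->thread.fpu.state.soft);
      	#endif
      	}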
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  8. 21 Aug 2015 (5 commits)
  9. 20 Aug 2015 (1 commit)
    • x86/xen: make CONFIG_XEN depend on CONFIG_X86_LOCAL_APIC · 87ffd2b9
      Authored by David Vrabel
      Since commit feb44f1f (x86/xen:
      Provide a "Xen PV" APIC driver to support >255 VCPUs) Xen guests need
      a full APIC driver and thus should depend on X86_LOCAL_APIC.
      
      This fixes an i386 build failure with !SMP && !CONFIG_X86_UP_APIC by
      disabling Xen support in this configuration.
      
      Users needing Xen support in a non-SMP i386 kernel will need to enable
      CONFIG_X86_UP_APIC.
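      
      The dependency change amounts to something like this in
      arch/x86/xen/Kconfig (hedged; surrounding select/depends lines
      abridged):
      
      	config XEN
      		bool "Xen guest support"
      		depends on PARAVIRT
      		# ...
      		depends on X86_LOCAL_APIC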
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: <stable@vger.kernel.org>
  10. 19 Aug 2015 (1 commit)
  11. 18 Aug 2015 (1 commit)
  12. 17 Aug 2015 (6 commits)
    • crypto: aead - Remove CRYPTO_ALG_AEAD_NEW flag · 5e4b8c1f
      Authored by Herbert Xu
      This patch removes the CRYPTO_ALG_AEAD_NEW flag now that everyone
      has been converted.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
    • x86/smpboot: Remove APIC.wait_for_init_deassert and atomic init_deasserted · 656bba30
      Authored by Len Brown
      Both the per-APIC flag ".wait_for_init_deassert",
      and the global atomic_t "init_deasserted"
      are dead code -- remove them.
      
      For all APIC types, "wait_for_master()"
      prevents an AP from proceeding until the BSP has set
      cpu_callout_mask, making "init_deasserted" {unnecessary}:
      
      	BSP: <de-assert INIT>
      	...
      	BSP: {set init_deasserted}
      	AP: wait_for_master()
      		set cpu_initialized_mask
      		wait for cpu_callout_mask
      	BSP: test cpu_initialized_mask
      	BSP: set cpu_callout_mask
      	AP: test cpu_callout_mask
      	AP: {wait for init_deasserted}
      	...
      	AP: <touch APIC>
      
      Deleting the {dead code} above is necessary to enable
      some parallelism in a future patch.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jan H. Schönherr <jschoenh@amazon.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/de4b3a9bab894735e285870b5296da25ee6a8a5a.1439739165.git.len.brown@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/smpboot: Remove SIPI delays from cpu_up() · a9bcaa02
      Authored by Len Brown
      MPS 1.4 example code shows the following required delays during processor
      on-lining:
      
      	INIT
      	 udelay(10,000)
      	SIPI
      	 udelay(200)
      	SIPI
      	 udelay(200) /* Linux actually implements this as udelay(300) */
      
      Linux skips the udelay(10,000) on modern processors.
      This patch removes the udelay(200) after each SIPI
      on those same processors.
      
      All three legacy delays can be restored by the cmdline
      "cpu_init_udelay=10000".
      
      As measured by analyze_suspend.py, this patch speeds
      processor resume time on my desktop from 2.4ms to 1.8ms, per AP.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jan H. Schönherr <jschoenh@amazon.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/a5dfdbc8fbfdd813784da204aad5677fe459ac37.1439739165.git.len.brown@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/smpboot: Remove udelay(100) when polling cpu_callin_map · 2d99af8e
      Authored by Len Brown
      After the BSP sends INIT/SIPI/SIPI to the AP and sees the AP
      in the cpu_initialized_map, it sets the AP loose via the
      cpu_callout_map, and waits for it via the cpu_callin_map.
      
      The BSP polls the cpu_callin_map with a udelay(100)
      and a schedule() in each iteration.
      
      The udelay(100) adds no value.
      
      For example, on my 4-CPU desktop, the AP finishes
      cpu_callin() in under 70 usec and sets the cpu_callin_mask.
      The BSP, however, doesn't see that setting until over 30 usec
      later, because it was still running its udelay(100)
      when the AP finished.
      
      Deleting the udelay(100) in the cpu_callin_mask polling loop
      saves from 0 to 100 usec per Application Processor.
      Signed-off-by: Len Brown <len.brown@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jan H. Schönherr <jschoenh@amazon.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/0aade12eabeb89a688c929fe80856eaea0544bb7.1439739165.git.len.brown@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/smpboot: Remove udelay(100) when polling cpu_initialized_map · 6e38f1e7
      Authored by Len Brown
      After the BSP sends the APIC INIT/SIPI/SIPI to the AP,
      it waits for the AP to come up and indicate that it is alive
      by setting its own bit in the cpu_initialized_mask.
      
      Linux polls for up to 10 seconds for this to happen.
      Each polling loop has a udelay(100) and a call to schedule().
      
      The udelay(100) adds no value.
      
      For example, on my desktop, the BSP waits for the
      other 3 CPUs to come on line at boot for 305, 404, 405 usec.
      For resume from S3, it waits 317, 404, 405 usec.
      
      But when the udelay(100) is removed, the BSP waits
      305, 310, 306 for boot, and 305, 307, 306 for resume.
      
      So for both boot and resume, removing the udelay(100)
      speeds online by about 100us in 2 of 3 cases.
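      
      A hedged sketch of the wait after the change (simplified from
      arch/x86/kernel/smpboot.c; error handling abridged):
      
      	static int wait_for_ap_init(unsigned int cpu)
      	{
      		unsigned long timeout = jiffies + 10 * HZ;
      
      		while (!cpumask_test_cpu(cpu, cpu_initialized_mask)) {
      			if (time_after(jiffies, timeout))
      				return -ETIMEDOUT;	/* AP never came up */
      			schedule();	/* just yield; the udelay(100) is gone */
      		}
      		return 0;
      	}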
      Signed-off-by: Len Brown <len.brown@intel.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jan H. Schönherr <jschoenh@amazon.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhu Guihua <zhugh.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/33ef746c67d2489cad0a9b1958cf71167232ff2b.1439739165.git.len.brown@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/ldt: Further fix FPU emulation · 12e244f4
      Authored by Andy Lutomirski
      The previous fix confused a selector with a segment prefix.  Fix it.
      
      Compile-tested only.
      
      Cc: stable@vger.kernel.org
      Cc: Juergen Gross <jgross@suse.com>
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Fixes: 4809146b ("x86/ldt: Correct FPU emulation access to LDT")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  13. 15 Aug 2015 (1 commit)