1. 07 Jul 2017: 10 commits
    • powerpc/hugetlb: enable hugetlb migration for ppc64 · f7fb506f
      Authored by Aneesh Kumar K.V
      Link: http://lkml.kernel.org/r/1494926612-23928-10-git-send-email-aneesh.kumar@linux.vnet.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Mike Kravetz <kravetz@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc/mm/hugetlb: remove follow_huge_addr for powerpc · 28c05716
      Authored by Aneesh Kumar K.V
      With generic code now handling hugetlb entries at the pgd level and also
      supporting the hugepage directory format, we can remove the
      powerpc-specific follow_huge_addr implementation.
      
      Link: http://lkml.kernel.org/r/1494926612-23928-9-git-send-email-aneesh.kumar@linux.vnet.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Mike Kravetz <kravetz@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • powerpc/hugetlb: add follow_huge_pd implementation for ppc64 · 50791e6d
      Authored by Aneesh Kumar K.V
      Link: http://lkml.kernel.org/r/1494926612-23928-8-git-send-email-aneesh.kumar@linux.vnet.ibm.com
      Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Mike Kravetz <kravetz@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory_hotplug: replace for_device by want_memblock in arch_add_memory · 3d79a728
      Authored by Michal Hocko
      arch_add_memory gets a for_device argument which controls whether we
      want to create memblocks for the created memory sections.  Simplify the
      logic by stating directly whether we want memblocks, rather than going
      through a pointless negation.  This also makes the API easier to
      understand, because it is clear what we want rather than an opaque
      for_device flag which can mean anything.
      
      This shouldn't introduce any functional change.
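
      A minimal sketch of the resulting interface, assuming the prototype
      described above (the exact signature may differ per architecture):

        /* Hedged sketch: the flag now states the intent directly. */
        int arch_add_memory(int nid, u64 start, u64 size, bool want_memblock);

        /* Regular hotplugged memory asks for a sysfs memory block ... */
        arch_add_memory(nid, start, size, true);
        /* ... while device (ZONE_DEVICE) memory does not. */
        arch_add_memory(nid, start, size, false);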
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-13-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory_hotplug: do not associate hotadded memory to zones until online · f1dd2cd1
      Authored by Michal Hocko
      The current memory hotplug implementation relies on having all the
      struct pages associated with a zone/node during the physical hotplug
      phase (arch_add_memory->__add_pages->__add_section->__add_zone).  In the
      vast majority of cases this means that they are added to ZONE_NORMAL.
      This has been so since 9d99aaa3 ("[PATCH] x86_64: Support memory
      hotadd without sparsemem") and it wasn't a big deal back then because
      movable onlining didn't exist yet.
      
      Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
      onlining 511c2aba ("mm, memory-hotplug: dynamic configure movable
      memory and portion memory") and then things got more complicated.
      Rather than reconsidering the zone association which was no longer
      needed (because the memory hotplug already depended on SPARSEMEM) a
      convoluted semantic of zone shifting has been developed.  Only the
      currently last memblock or the one adjacent to the zone_movable can be
      onlined movable.  This essentially means that the online type changes as
      the new memblocks are added.
      
      Let's simulate memory hot online manually
        $ echo 0x100000000 > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory32/valid_zones
        Normal Movable
      
        $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
      
        $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal
        /sys/devices/system/memory/memory34/valid_zones:Normal Movable
      
        $ echo online_movable > /sys/devices/system/memory/memory34/state
        $ grep . /sys/devices/system/memory/memory3?/valid_zones
        /sys/devices/system/memory/memory32/valid_zones:Normal
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Movable Normal
      
      This is an awkward semantic because a udev event is sent as soon as the
      block is onlined and a udev handler might want to online it based on
      some policy (e.g.  association with a node), but it will inherently race
      with new blocks showing up.
      
      This patch changes the physical online phase to not associate pages with
      any zone at all.  All the pages are just marked reserved and wait for
      the onlining phase to be associated with the zone as per the online
      request.  There are only two requirements:
      
      	- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap
      
      	- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses
      
      The latter is not an inherent requirement and can be changed in the
      future; it preserves the current behavior and keeps the code slightly
      simpler.
      
      This means that the same physical online steps as above will lead to the
      following state:
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Normal Movable
      
        /sys/devices/system/memory/memory32/valid_zones:Normal Movable
        /sys/devices/system/memory/memory33/valid_zones:Normal Movable
        /sys/devices/system/memory/memory34/valid_zones:Movable
      
      Implementation:
      The current move_pfn_range is reimplemented to check the above
      requirements (allow_online_pfn_range) and then update the respective
      zone (move_pfn_range_to_zone) and the pgdat, and link all the pages in
      the pfn range with the zone/node.  __add_pages is updated to not require
      the zone and only initializes sections in the range.  This allowed the
      arch_add_memory code to be simplified (s390 could get rid of quite some
      code).
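
      One of the checks behind the new onlining path is a plain overlap test;
      a minimal sketch of the zone_intersects() helper mentioned in the notes
      below, assuming the usual zone accessors:

        /* Hedged sketch: does [start_pfn, start_pfn + nr_pages) overlap the
         * pfn range already spanned by this zone? */
        static inline bool zone_intersects(struct zone *zone,
                                           unsigned long start_pfn,
                                           unsigned long nr_pages)
        {
                if (zone_is_empty(zone))
                        return false;
                if (start_pfn >= zone_end_pfn(zone) ||
                    start_pfn + nr_pages <= zone->zone_start_pfn)
                        return false;
                return true;
        }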
      
      devm_memremap_pages is the only user of arch_add_memory which relies on
      the zone association, because it hooks into memory hotplug only half
      way.  It uses it to associate the new memory with ZONE_DEVICE but
      doesn't allow it to be {on,off}lined via sysfs.  This means that this
      particular code path has to call move_pfn_range_to_zone explicitly.
      
      The original zone shifting code is kept in place and will be removed in
      the follow up patch for an easier review.
      
      Please note that this patch also changes the original behavior:
      offlining a memory block adjacent to another zone (Normal vs.  Movable)
      used to allow changing its movable type.  This will be handled later.
      
      [richard.weiyang@gmail.com: simplify zone_intersects()]
        Link: http://lkml.kernel.org/r/20170616092335.5177-1-richard.weiyang@gmail.com
      [richard.weiyang@gmail.com: remove duplicate call for set_page_links]
        Link: http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiyang@gmail.com
      [akpm@linux-foundation.org: remove unused local `i']
      Link: http://lkml.kernel.org/r/20170515085827.16474-12-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com> # For s390 bits
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, memory_hotplug: get rid of is_zone_device_section · 1b862aec
      Authored by Michal Hocko
      Device memory hotplug hooks into regular memory hotplug only half way.
      It needs memory sections to track struct pages but there is no
      need/desire to associate those sections with memory blocks and export
      them to the userspace via sysfs because they cannot be onlined anyway.
      
      This is currently expressed by for_device argument to arch_add_memory
      which then makes sure to associate the given memory range with
      ZONE_DEVICE.  register_new_memory then relies on is_zone_device_section
      to distinguish special memory hotplug from the regular one.  While this
      works now, later patches in this series want to move __add_zone outside
      of arch_add_memory path so we have to come up with something else.
      
      Add want_memblock down the __add_pages path and use it to control
      whether the section->memblock association should be done.
      arch_add_memory then just trivially wants memblock for everything but
      for_device hotplug.
      
      remove_memory_section doesn't need is_zone_device_section either.  We
      can simply skip all the memblock specific cleanup if there is no
      memblock for the given section.
      
      This shouldn't introduce any functional change.
      
      Link: http://lkml.kernel.org/r/20170515085827.16474-5-mhocko@kernel.org
      Signed-off-by: Michal Hocko <mhocko@suse.com>
      Tested-by: Dan Williams <dan.j.williams@intel.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Balbir Singh <bsingharora@gmail.com>
      Cc: Daniel Kiper <daniel.kiper@oracle.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Igor Mammedov <imammedo@redhat.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Reza Arbab <arbab@linux.vnet.ibm.com>
      Cc: Tobias Regnery <tobias.regnery@gmail.com>
      Cc: Toshi Kani <toshi.kani@hpe.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: Xishi Qiu <qiuxishi@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, THP, swap: delay splitting THP during swap out · 38d8b4e6
      Authored by Huang Ying
      Patch series "THP swap: Delay splitting THP during swapping out", v11.
      
      This patchset is to optimize the performance of Transparent Huge Page
      (THP) swap.
      
      Recently, the performance of storage devices has improved so fast that
      we cannot saturate the disk bandwidth with a single logical CPU when
      doing page swap out, even on a high-end server machine, because storage
      performance has improved faster than that of a single logical CPU.  And
      it seems that the trend will not change in the near future.  On the
      other hand, THP is becoming more and more popular because of increased
      memory sizes.  So it becomes necessary to optimize THP swap performance.
      
      The advantages of the THP swap support include:
      
       - Batch the swap operations for the THP to reduce lock
         acquiring/releasing, including allocating/freeing the swap space,
         adding/deleting to/from the swap cache, and writing/reading the swap
         space, etc. This will help improve the performance of the THP swap.
      
       - The THP swap space read/write will be 2M sequential IO. It is
         particularly helpful for the swap read, which are usually 4k random
         IO. This will improve the performance of the THP swap too.
      
       - It will help reduce memory fragmentation, especially when THP is
         heavily used by applications. The 2M continuous pages will be
         freed up after the THP is swapped out.
      
       - It will improve the THP utilization on the system with the swap
         turned on. Because the speed for khugepaged to collapse the normal
         pages into the THP is quite slow. After the THP is split during the
         swapping out, it will take quite a long time for the normal pages to
         collapse back into the THP after being swapped in. The high THP
         utilization helps the efficiency of the page based memory management
         too.
      
      There are some concerns regarding THP swap in, mainly because possible
      enlarged read/write IO size (for swap in/out) may put more overhead on
      the storage device.  To deal with that, the THP swap in should be turned
      on only when necessary.  For example, it can be selected via
      "always/never/madvise" logic, to be turned on globally, turned off
      globally, or turned on only for VMA with MADV_HUGEPAGE, etc.
      
      This patchset is the first step for the THP swap support.  The plan is
      to delay splitting THP step by step, finally avoid splitting THP during
      the THP swapping out and swap out/in the THP as a whole.
      
      As the first step, in this patchset, splitting the huge page is delayed
      from almost the first step of swapping out to after allocating the swap
      space for the THP and adding the THP into the swap cache.  This will
      reduce lock acquiring/releasing for the locks used for the swap cache
      management.
      
      With the patchset, the swap out throughput improves 15.5% (from about
      3.73GB/s to about 4.31GB/s) in the vm-scalability swap-w-seq test case
      with 8 processes.  The test is done on a Xeon E5 v3 system.  The swap
      device used is a RAM simulated PMEM (persistent memory) device.  To test
      the sequential swapping out, the test case creates 8 processes, which
      sequentially allocate and write to the anonymous pages until the RAM and
      part of the swap device is used up.
      
      This patch (of 5):
      
      In this patch, splitting the huge page is delayed from almost the first
      step of swapping out to after allocating the swap space for the THP
      (Transparent Huge Page) and adding the THP into the swap cache.  This
      will batch the corresponding operations, thus improving THP swap out
      throughput.
      
      This is the first step for the THP swap optimization.  The plan is to
      delay splitting the THP step by step and avoid splitting the THP
      finally.
      
      In this patch, one swap cluster is used to hold the contents of each THP
      swapped out.  So the size of the swap cluster is changed to that of the
      THP (Transparent Huge Page) on the x86_64 architecture (512 pages).  For
      other architectures which want such THP swap optimization,
      ARCH_USES_THP_SWAP_CLUSTER needs to be selected in the Kconfig file for
      the architecture.  In effect, this will double the swap cluster size on
      x86_64, which may make it harder to find a free cluster when the swap
      space becomes fragmented, and may therefore reduce continuous swap space
      allocation and sequential writes in theory.  The performance test in
      0day shows no regressions caused by this.
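
      A brief, hedged sketch of how the cluster-size selection might look
      (HPAGE_PMD_NR and the Kconfig symbol come from the changelog;
      SWAPFILE_CLUSTER and the exact #ifdef arrangement are assumptions):

        /* A swap cluster covers a whole THP where the option is enabled. */
        #ifdef CONFIG_ARCH_USES_THP_SWAP_CLUSTER
        #define SWAPFILE_CLUSTER        HPAGE_PMD_NR    /* 512 on x86_64 */
        #else
        #define SWAPFILE_CLUSTER        256
        #endif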
      
      In the future of THP swap optimization, some information of the swapped
      out THP (such as compound map count) will be recorded in the
      swap_cluster_info data structure.
      
      The mem cgroup swap accounting functions are enhanced to support
      charging or uncharging a swap cluster backing a THP as a whole.
      
      The swap cluster allocate/free functions are added to allocate/free a
      swap cluster for a THP.  A fairly simple algorithm is used for swap
      cluster allocation: only the first swap device in the priority list is
      tried for the cluster allocation.  The function fails if the attempt is
      not successful, and the caller falls back to allocating a single swap
      slot instead.  This works well enough for normal cases.  If the number
      of free swap clusters differs significantly among multiple swap devices,
      it is possible that some THPs are split earlier than necessary.  For
      example, this could be caused by a big size difference among multiple
      swap devices.
      
      The swap cache functions are enhanced to support adding/deleting a THP
      to/from the swap cache as a set of (HPAGE_PMD_NR) sub-pages.  This may
      be enhanced in the future with a multi-order radix tree.  But because we
      will split the THP soon during swapping out, that optimization doesn't
      make much sense for this first step.
      
      The THP splitting functions are enhanced to support splitting a THP in
      the swap cache during swapping out.  The page lock is held while
      allocating the swap cluster, adding the THP into the swap cache and
      splitting the THP.  So in code paths other than swapping out, if the THP
      needs to be split, PageSwapCache(THP) will always be false.
      
      The swap cluster is only available for SSD, so the THP swap optimization
      in this patchset has no effect for HDD.
      
      [ying.huang@intel.com: fix two issues in THP optimize patch]
        Link: http://lkml.kernel.org/r/87k25ed8zo.fsf@yhuang-dev.intel.com
      [hannes@cmpxchg.org: extensive cleanups and simplifications, reduce code size]
      Link: http://lkml.kernel.org/r/20170515112522.32457-2-ying.huang@intel.com
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org> [for config option]
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> [for changes in huge_memory.c and huge_mm.h]
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Ebru Akagunduz <ebru.akagunduz@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tile: provide default ioremap declaration · 39229200
      Authored by Logan Gunthorpe
      Add a default ioremap function, which was previously only provided when
      both CONFIG_PCI and CONFIG_TILEGX were set.

      I have designs to use ioremap in scatterlist.c, where it would likely
      never be called on this architecture, but it is needed for the code to
      compile.  Thus, if the function is ever hit it returns NULL.
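
      A minimal sketch of such a fallback, assuming the usual ioremap
      prototype (the actual header and guards may differ):

        /* Hedged sketch: stub ioremap for configurations without real
         * mapping support; callers must handle a NULL return. */
        static inline void __iomem *ioremap(resource_size_t offset,
                                            unsigned long size)
        {
                return NULL;
        }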
      
      Link: http://lkml.kernel.org/r/1495726904-27380-1-git-send-email-logang@deltatee.com
      Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
      Signed-off-by: Stephen Bates <sbates@raithlin.com>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mn10300: use generic fb.h · 9cfc5e04
      Authored by Tobias Klauser
      The mn10300 arch uses a verbatim copy of the asm-generic version and
      does not add any implementations of its own to the header, so use
      asm-generic/fb.h instead of duplicating code.
      
      Link: http://lkml.kernel.org/r/20170517083348.1815-1-tklauser@distanz.ch
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Reviewed-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mn10300: remove wrapper header for asm/device.h · dc513164
      Authored by Tobias Klauser
      mn10300's asm/device.h merely includes asm-generic/device.h.  Thus, the
      arch-specific header can be omitted and the generic header used
      directly.
      
      Link: http://lkml.kernel.org/r/20170517124857.26834-1-tklauser@distanz.ch
      Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 06 Jul 2017: 1 commit
  3. 03 Jul 2017: 6 commits
    • parisc: DMA API: return error instead of BUG_ON for dma ops on non dma devs · 33f9e024
      Authored by Thomas Bogendoerfer
      Enabling the parport pc driver on a B2600 (and probably other 64-bit
      PARISC systems) produced the following BUG:
      
      CPU: 0 PID: 1 Comm: swapper Not tainted 4.12.0-rc5-30198-g1132d5e7 #156
      task: 000000009e050000 task.stack: 000000009e04c000
      
           YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
      PSW: 00001000000001101111111100001111 Not tainted
      r00-03  000000ff0806ff0f 000000009e04c990 0000000040871b78 000000009e04cac0
      r04-07  0000000040c14de0 ffffffffffffffff 000000009e07f098 000000009d82d200
      r08-11  000000009d82d210 0000000000000378 0000000000000000 0000000040c345e0
      r12-15  0000000000000005 0000000040c345e0 0000000000000000 0000000040c9d5e0
      r16-19  0000000040c345e0 00000000f00001c4 00000000f00001bc 0000000000000061
      r20-23  000000009e04ce28 0000000000000010 0000000000000010 0000000040b89e40
      r24-27  0000000000000003 0000000000ffffff 000000009d82d210 0000000040c14de0
      r28-31  0000000000000000 000000009e04ca90 000000009e04cb40 0000000000000000
      sr00-03  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
      
      IASQ: 0000000000000000 0000000000000000 IAOQ: 00000000404aece0 00000000404aece4
       IIR: 03ffe01f    ISR: 0000000010340000  IOR: 000001781304cac8
       CPU:        0   CR30: 000000009e04c000 CR31: 00000000e2976de2
       ORIG_R28: 0000000000000200
       IAOQ[0]: sba_dma_supported+0x80/0xd0
       IAOQ[1]: sba_dma_supported+0x84/0xd0
       RP(r2): parport_pc_probe_port+0x178/0x1200
      
      The cause is a call to dma_coerce_mask_and_coherent in
      parport_pc_probe_port, which the PARISC DMA API doesn't handle very
      nicely.  This commit returns DMA_ERROR_CODE for DMA API calls if the
      device isn't capable of DMA transactions.
      
      Cc: <stable@vger.kernel.org> # v3.13+
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Signed-off-by: Helge Deller <deller@gmx.de>
    • ARM64: dts: marvell: armada37xx: Fix timer interrupt specifiers · 88cda007
      Authored by Marc Zyngier
      Contrary to popular belief, PPIs connected to a GICv3 do not have
      an affinity field similar to that of GICv2.  That is consistent
      with the fact that GICv3 is designed to accommodate thousands of
      CPUs, and fitting them as a bitmap in a byte is... difficult.
      
      Fixes: adbc3695 ("arm64: dts: add the Marvell Armada 3700 family and
      a development board")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
    • ARM: owl: smp: Drop bogus holding pen · 18cfd942
      Authored by Andreas Färber
      The S500 SoC can start secondary CPUs without busy-looping for pen_release,
      so simplify the SMP code compared to the LeMaker kernel tree.
      
      Fixes: 172067e0 ("ARM: owl: Implement CPU enable-method for S500")
      Suggested-by: Arnd Bergmann <arnd@arndb.de>
      Cc: David Liu <liuwei@actions-semi.com>
      Signed-off-by: Andreas Färber <afaerber@suse.de>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • ARM: owl: Drop custom machine · eb382745
      Authored by Andreas Färber
      Rely on the fallback to "Generic DT based system".
      This change is visible in /proc/cpuinfo.
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andreas Färber <afaerber@suse.de>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    • parisc: Report SIGSEGV instead of SIGBUS when running out of stack · 24746231
      Authored by Helge Deller
      When a process runs out of stack, the parisc kernel wrongly faults with
      SIGBUS instead of the expected SIGSEGV signal.
      
      This example shows how the kernel faults:
      do_page_fault() command='a.out' type=15 address=0xfaac2000 in libc-2.24.so[f8308000+16c000]
      trap #15: Data TLB miss fault, vm_start = 0xfa2c2000, vm_end = 0xfaac2000
      
      The vma->vm_end value is the first address which does not belong to the
      vma, so adjust the check to include vma->vm_end in the range for which
      to send the SIGSEGV signal.
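
      A hedged illustration of the adjusted classification (the variable
      names here are assumptions, not the exact fault-handler code):

        /* vm_end is exclusive: a fault exactly at vm_end of the stack vma
         * is still a stack overflow and should be reported as SIGSEGV. */
        if (vma && address >= vma->vm_start && address <= vma->vm_end)
                signo = SIGSEGV;
        else
                signo = SIGBUS;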
      
      This patch unbreaks building the debian libsigsegv package.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Helge Deller <deller@gmx.de>
    • parisc: use compat_sys_keyctl() · b0f94efd
      Authored by Eric Biggers
      Architectures with a compat syscall table must put compat_sys_keyctl()
      in it, not sys_keyctl().  The parisc architecture was not doing this;
      fix it.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Eric Biggers <ebiggers@google.com>
      Acked-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Helge Deller <deller@gmx.de>
  4. 02 Jul 2017: 1 commit
  5. 01 Jul 2017: 3 commits
  6. 30 Jun 2017: 19 commits
    • arm64: cpuinfo: constify attribute_group structures. · 70a62ad1
      Authored by Arvind Yadav
      attribute_groups are not supposed to change at runtime. All functions
      working with attribute_groups provided by <linux/sysfs.h> work with const
      attribute_group. So mark the non-const structs as const.
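
      A brief sketch of the pattern (the attribute and group names below are
      illustrative assumptions, not the exact ones touched by this commit):

        static struct attribute *cpuinfo_attrs[] = {
                &dev_attr_midr.attr,    /* assumed attribute name */
                NULL,
        };

        /* The group never changes at runtime, so it can live in rodata. */
        static const struct attribute_group cpuinfo_attr_group = {
                .attrs = cpuinfo_attrs,
        };
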
      Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • objtool, x86: Add several functions and files to the objtool whitelist · c207aee4
      Authored by Josh Poimboeuf
      In preparation for an objtool rewrite which will have broader checks,
      whitelist functions and files which cause problems because they do
      unusual things with the stack.
      
      These whitelists serve as a TODO list for which functions and files
      don't yet have undwarf unwinder coverage.  Eventually most of the
      whitelists can be removed in favor of manual CFI hint annotations or
      objtool improvements.
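
      For reference, a hedged sketch of the usual whitelisting mechanism for
      a single function (the function name is a placeholder; whole files are
      typically excluded via OBJECT_FILES_NON_STANDARD in the Makefile):

        #include <linux/frame.h>

        /* Placeholder function that manipulates the stack in a way the
         * unwinder checks cannot follow. */
        static void some_unusual_function(void)
        {
                /* unusual stack manipulation would live here */
        }
        /* Tell objtool to skip its checks for this function. */
        STACK_FRAME_NON_STANDARD(some_unusual_function);
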
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/7f934a5d707a574bda33ea282e9478e627fb1829.1498659915.git.jpoimboe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Delete a big outdated comment about TLB flushing · 8781fb7e
      Authored by Andy Lutomirski
      The comment describes the old explicit IPI-based flush logic, which
      is long gone.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/55e44997e56086528140c5180f8337dc53fb7ffc.1498751203.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm: Don't reenter flush_tlb_func_common() · bc0d5a89
      Authored by Andy Lutomirski
      It was historically possible to have two concurrent TLB flushes
      targeting the same CPU: one initiated locally and one initiated
      remotely.  This can now cause an OOPS in leave_mm() at
      arch/x86/mm/tlb.c:47:
      
              if (this_cpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
                      BUG();
      
      with this call trace:
       flush_tlb_func_local arch/x86/mm/tlb.c:239 [inline]
       flush_tlb_mm_range+0x26d/0x370 arch/x86/mm/tlb.c:317
      
      Without reentrancy, this OOPS is impossible: leave_mm() is only
      called if we're not in TLBSTATE_OK, but then we're unexpectedly
      in TLBSTATE_OK in leave_mm().
      
      This can be caused by flush_tlb_func_remote() happening between
      the two checks and calling leave_mm(), resulting in two consecutive
      leave_mm() calls on the same CPU with no intervening switch_mm()
      calls.
      
      We never saw this OOPS before because the old leave_mm()
      implementation didn't put us back in TLBSTATE_OK, so the assertion
      didn't fire.
      
      Nadav noticed the reentrancy issue in a different context, but
      neither of us realized that it caused a problem yet.
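
      A minimal sketch of the kind of fix this implies (simplified and
      hedged; the real change differs in detail): disable interrupts around
      the local flush so a remote flush IPI cannot interleave with it.

        /* Keep flush_tlb_func_local() from being re-entered by
         * flush_tlb_func_remote() running from an interrupt. */
        if (cpumask_test_cpu(smp_processor_id(), mm_cpumask(mm))) {
                local_irq_disable();
                flush_tlb_func_local(&info, TLB_LOCAL_MM_SHOOTDOWN);
                local_irq_enable();
        }
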
      Reported-by: Levin, Alexander (Sasha Levin) <alexander.levin@verizon.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: linux-mm@kvack.org
      Fixes: 3d28ebce ("x86/mm: Rework lazy TLB to track the actual loaded mm")
      Link: http://lkml.kernel.org/r/855acf733268d521c9f2e191faee2dcc23a29729.1498751203.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/uaccess: Optimize copy_user_enhanced_fast_string() for short strings · 236222d3
      Authored by Paolo Abeni
      According to the Intel datasheet, the REP MOVSB instruction
      exposes a pretty heavy setup cost (50 ticks), which hurts
      short string copy operations.
      
      This change tries to avoid this cost by calling the explicit
      loop available in the unrolled code for strings shorter
      than 64 bytes.
      
      The 64 bytes cutoff value is arbitrary from the code logic
      point of view - it has been selected based on measurements,
      as the largest value that still ensures a measurable gain.
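
      A hedged C-level model of the dispatch described above (the actual
      change is in the x86-64 assembly copy routines; the helper names mirror
      the kernel's, but the dispatcher itself is illustrative):

        /* For short copies, the unrolled loop beats REP MOVSB because it
         * avoids the instruction's setup cost. */
        static unsigned long copy_user_dispatch(void *to, const void *from,
                                                unsigned int len)
        {
                if (len < 64)
                        return copy_user_generic_unrolled(to, from, len);
                return copy_user_enhanced_fast_string(to, from, len);
        }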
      
      Micro benchmarks of the __copy_from_user() function with
      lengths in the [0-63] range show this performance gain
      (shorter the string, larger the gain):
      
       - in the [55%-4%] range on Intel Xeon(R) CPU E5-2690 v4
       - in the [72%-9%] range on Intel Core i7-4810MQ
      
      Other tested CPUs - namely Intel Atom S1260 and AMD Opteron
      8216 - show no difference, because they do not expose the
      ERMS feature bit.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/4533a1d101fd460f80e21329a34928fad521c1d4.1498744345.git.pabeni@redhat.com
      [ Clarified the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Constify the 'lbr_desc[]' array and make a function static · e91c8d97
      Authored by Colin Ian King
      A few minor clean-ups: constify the lbr_desc[] array and make
      local function lbr_from_signext_quirk_rd() static to fix a sparse warning:
      
        "symbol 'lbr_from_signext_quirk_rd' was not declared. Should it be static?"
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Cc: Dan Carpenter <dan.carpenter@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kernel-janitors@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170629091406.9870-1-colin.king@canonical.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/KASLR: Fix detection 32/64 bit bootloaders for 5-level paging · a24261d7
      Authored by Kirill A. Shutemov
      KASLR uses a hack to detect whether we booted via startup_32() or
      startup_64(): it checks what is loaded into cr3 and compares it to
      _pgtables.  _pgtables is the array of page tables that early code
      allocates page tables from.

      KASLR expects cr3 to point to _pgtables if we booted via startup_32(),
      but that's not true if we booted with 5-level paging enabled.  In this
      case the top-level page table is allocated separately and only the
      first p4d page table is allocated from the array.
      
      Let's modify the check to cover both 4- and 5-level paging cases.
      
      The patch also renames 'level4p' to 'top_level_pgt' as it can now hold a
      page table for the 4th or 5th level, depending on the configuration.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20170628121730.43079-1-kirill.shutemov@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug · 8eabf42a
      Authored by Baoquan He
      Kernel text KASLR is separated into physical address and virtual
      address randomization.  For virtual address randomization, we only
      randomize to get an offset between 16M and KERNEL_IMAGE_SIZE.
      So the initial value of 'virt_addr' should be LOAD_PHYSICAL_ADDR,
      not the original kernel load address 'output'.

      The bug will cause a kernel boot failure if the kernel is loaded at a
      position different from the address, 16M, which is decided at compile
      time.  Kexec/kdump is such a practical case.
      
      To fix it, just assign LOAD_PHYSICAL_ADDR to virt_addr as the initial
      value.
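
      In other words, a hedged sketch of the initialisation described above:

        /* Start the virtual offset from the compile-time load address,
         * not from the (possibly relocated) physical 'output' address. */
        unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
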
      Tested-by: Dave Young <dyoung@redhat.com>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 8391c73c ("x86/KASLR: Randomize virtual address separately")
      Link: http://lkml.kernel.org/r/1498567146-11990-3-git-send-email-bhe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization · b892cb87
      Authored by Baoquan He
      For kernel text KASLR, the virtual address is confined to an area of 1G,
      [0xffffffff80000000, 0xffffffffc0000000).  For the implementation of
      virtual address randomization, we only randomize to get an offset
      between 16M and 1G, then add this offset to the starting address,
      0xffffffff80000000.  Here 16M is the offset decided at the linking
      stage.  So the local variable 'virt_addr', which represents the offset,
      plus the kernel output size cannot exceed KERNEL_IMAGE_SIZE.
      
      Add a debug check for the offset.  If it is out of bounds, print an
      error message and hang.
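
      A hedged sketch of such a check (error() is the decompressor's own
      helper; 'output_size' stands in for the kernel image footprint and is
      an assumption of this sketch):

        /* The randomized virtual offset plus the decompressed kernel must
         * fit inside the KASLR window. */
        if (virt_addr + output_size > KERNEL_IMAGE_SIZE)
                error("Virtual address space for KASLR exceeded, aborting");
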
      Suggested-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Baoquan He <bhe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1498567146-11990-2-git-send-email-bhe@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • MIPS: Avoid accidental raw backtrace · 85423636
      Authored by James Hogan
      Since commit 81a76d71 ("MIPS: Avoid using unwind_stack() with
      usermode") show_backtrace() invokes the raw backtracer when
      cp0_status & ST0_KSU indicates user mode to fix issues on EVA kernels
      where user and kernel address spaces overlap.
      
      However this is used by show_stack(), which creates its own pt_regs on
      the stack and leaves cp0_status uninitialised in most of the code paths.
      This results in non-deterministic use of the raw backtracer depending on
      the previous stack content.
      
      show_stack() deals exclusively with kernel mode stacks anyway, so
      explicitly initialise regs.cp0_status to KSU_KERNEL (i.e. 0) to ensure
      we get a useful backtrace.
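
      A hedged sketch of the described initialisation (the surrounding
      register setup is elided; show_stacktrace() is the usual MIPS helper):

        void show_stack(struct task_struct *task, unsigned long *sp)
        {
                struct pt_regs regs;

                /* show_stack() only ever walks kernel-mode stacks. */
                regs.cp0_status = KSU_KERNEL;
                /* the sp/ra/epc fields are filled in as before */
                show_stacktrace(task, &regs);
        }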
      
      Fixes: 81a76d71 ("MIPS: Avoid using unwind_stack() with usermode")
      Signed-off-by: James Hogan <james.hogan@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: <stable@vger.kernel.org> # 3.15+
      Patchwork: https://patchwork.linux-mips.org/patch/16656/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Perform post-DMA cache flushes on systems with MAARs · cad482c1
      Authored by Paul Burton
      Recent CPUs from Imagination Technologies such as the I6400 or P6600 are
      able to speculatively fetch data from memory into caches. This means
      that if used in a system with non-coherent DMA they require that caches
      be invalidated after a device performs DMA, and before the CPU reads the
      DMA'd data, in order to ensure that stale values weren't speculatively
      prefetched.
      
      Such CPUs also introduced Memory Accessibility Attribute Registers
      (MAARs) in order to control the regions in which they are allowed to
      speculate. Thus we can use the presence of MAARs as a good indication
      that the CPU requires the above cache maintenance. Use the presence of
      MAARs to determine the result of cpu_needs_post_dma_flush() in the
      default case, in order to handle these recent CPUs correctly.
      
      Note that the return type of cpu_needs_post_dma_flush() is changed to
      bool, such that it's clearer what's happening when cpu_has_maar is cast
      to bool for the return value. If this patch were backported to a
      pre-v4.7 kernel then MIPS_CPU_MAAR was 1ull<<34, so when cast to an int
      we would incorrectly return 0. It so happens that MIPS_CPU_MAAR is
      currently 1ull<<30, so when truncated to an int it gives a non-zero value
      anyway, but even so the implicit conversion from long long int to bool
      makes it clearer to understand what will happen than the implicit
      conversion from long long int to int would. The bool return type also
      fits this usage better semantically, so seems like an all-round win.
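
      A hedged sketch of the default described above (the real helper also
      special-cases a few older CPU types; plat_device_is_coherent() is the
      usual MIPS coherence check):

        static inline bool cpu_needs_post_dma_flush(struct device *dev)
        {
                /* Coherent devices never need manual cache maintenance. */
                if (plat_device_is_coherent(dev))
                        return false;
                /* CPUs with MAARs may speculatively fill their caches, so
                 * flush/invalidate after DMA by default. */
                return cpu_has_maar;
        }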
      
      Thanks to Ed for spotting the issue for pre-v4.7 kernels & suggesting
      the return type change.
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Reviewed-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Tested-by: Bryan O'Donoghue <pure.logic@nexus-software.ie>
      Cc: Ed Blake <ed.blake@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/16363/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Fix IRQ tracing & lockdep when rescheduling · d8550860
      Authored by Paul Burton
      When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler
      from arch/mips/kernel/entry.S we disable interrupts. This is true
      regardless of whether we reach work_resched from syscall_exit_work,
      resume_userspace or by looping after calling schedule(). Although we
      disable interrupts in these paths we don't call trace_hardirqs_off()
      before calling into C code which may acquire locks, and we therefore
      leave lockdep with an inconsistent view of whether interrupts are
      disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are
      both enabled.
      
      Without tracing this interrupt state lockdep will print warnings such
      as the following once a task returns from a syscall via
      syscall_exit_partial with TIF_NEED_RESCHED set:
      
      [   49.927678] ------------[ cut here ]------------
      [   49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8
      [   49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      [   49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197
      [   49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4
      [   49.974431]         0000000000000000 0000000000000000 0000000000000000 000000000000004a
      [   49.985300]         ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8
      [   49.996194]         0000000000000001 0000000000000000 0000000000000000 0000000077c8030c
      [   50.007063]         000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88
      [   50.017945]         0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498
      [   50.028827]         0000000000000000 0000000000000001 0000000000000000 0000000000000000
      [   50.039688]         0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc
      [   50.050575]         00000000140084e0 0000000000000000 0000000000000000 0000000000040a00
      [   50.061448]         0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc
      [   50.072327]         ...
      [   50.076087] Call Trace:
      [   50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8
      [   50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190
      [   50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108
      [   50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48
      [   50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8
      [   50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0
      [   50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8
      [   50.129221] [<ffffffff80946a60>] schedule+0x30/0x98
      [   50.135659] [<ffffffff80106278>] work_resched+0x8/0x34
      [   50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]---
      [   50.148405] possible reason: unannotated irqs-off.
      [   50.154600] irq event stamp: 400463
      [   50.159566] hardirqs last  enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8
      [   50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0
      [   50.183897] softirqs last  enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8
      [   50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128
      
      Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off()
      when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking
      schedule() following the work_resched label because:
      
       1) Interrupts are disabled regardless of the path we take to reach
          work_resched() & schedule().
      
       2) Performing the tracing here avoids the need to do it in paths which
          disable interrupts but don't call out to C code before hitting a
          path which uses the RESTORE_SOME macro that will call
          trace_hardirqs_on() or trace_hardirqs_off() as appropriate.
      
      We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling
      syscall_trace_leave() for similar reasons, ensuring that lockdep has a
      consistent view of state after we re-enable interrupts.
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: linux-mips@linux-mips.org
      Cc: stable <stable@vger.kernel.org>
      Patchwork: https://patchwork.linux-mips.org/patch/15385/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: pm-cps: Drop manual cache-line alignment of ready_count · 161c51cc
      Authored by Paul Burton
      We allocate memory for a ready_count variable per-CPU, which is accessed
      via a cached non-coherent TLB mapping to perform synchronisation between
      threads within the core using LL/SC instructions. In order to ensure
      that the variable is contained within its own data cache line we
      allocate 2 lines worth of memory & align the resulting pointer to a line
      boundary. This is however unnecessary, since kmalloc is guaranteed to
      return memory which is at least cache-line aligned (see
      ARCH_DMA_MINALIGN). Stop the redundant manual alignment.
      
      Besides cleaning up the code & avoiding needless work, this has the side
      effect of avoiding an arithmetic error found by Bryan on 64 bit systems
      due to the 32 bit size of the former dlinesz. This led the ready_count
      variable to have its upper 32b cleared erroneously for MIPS64 kernels,
      causing problems when ready_count was later used on MIPS64 via cpuidle.
      Signed-off-by: Paul Burton <paul.burton@imgtec.com>
      Fixes: 3179d37e ("MIPS: pm-cps: add PM state entry code for CPS systems")
      Reported-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Reviewed-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Tested-by: Bryan O'Donoghue <bryan.odonoghue@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Cc: stable <stable@vger.kernel.org> # v3.16+
      Patchwork: https://patchwork.linux-mips.org/patch/15383/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • ARM: 8685/1: ensure memblock-limit is pmd-aligned · 9e25ebfe
      Authored by Doug Berger
      The pmd containing memblock_limit is cleared by prepare_page_table(),
      which creates the opportunity for early_alloc() to allocate unmapped
      memory if memblock_limit is not pmd-aligned, causing a boot-time hang.
      
      Commit 965278dc ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
      attempted to resolve this problem, but there is a path through the
      adjust_lowmem_bounds() routine where if all memory regions start and
      end on pmd-aligned addresses the memblock_limit will be set to
      arm_lowmem_limit.
      
      Since arm_lowmem_limit can be affected by the vmalloc early parameter,
      the value of arm_lowmem_limit may not be pmd-aligned. This commit
      corrects this oversight such that memblock_limit is always rounded
      down to pmd-alignment.
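
      A hedged sketch of the rounding described above, as it would sit at the
      end of the lowmem-bounds adjustment:

        /* Always hand memblock a pmd-aligned limit so early allocations
         * stay within fully mapped sections. */
        memblock_limit = round_down(memblock_limit, PMD_SIZE);
        memblock_set_current_limit(memblock_limit);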
      
      Fixes: 965278dc ("ARM: 8356/1: mm: handle non-pmd-aligned end of RAM")
      Signed-off-by: Doug Berger <opendmb@gmail.com>
      Suggested-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
    • x86/ftrace: Exclude functions in head64.c from function-tracing · bb43dbc5
      Authored by Kirill A. Shutemov
      A recent commit moved most of the early boot logic from startup_64(),
      written in assembly, to __startup_64(), written in C.
      
      Fengguang reported breakage due to the change. It was tracked down to
      CONFIG_FUNCTION_TRACER being enabled.
      
      Tracing this function is not possible because it's invoked from the
      earliest boot stage before the relocation fixups have been done. It is the
      function doing the relocation.
      
      Exclude it from being built with tracer stubs.
      
      Fixes: c88d7150 ("x86/boot/64: Rewrite startup_64() in C")
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: lkp@01.org
      Link: http://lkml.kernel.org/r/20170627115948.17938-1-kirill.shutemov@linux.intel.com
    • perf/x86/intel/uncore: Fix wrong box pointer check · 80c65fdb
      Authored by Kan Liang
      We should not init a NULL box; it will cause a system crash.  The issue
      looks like it was caused by a typo.

      This was not noticed because in practice there is no NULL box.  Also,
      most boxes are enabled by default, so the init code is not critical.
      
      Fixes: fff4b87e ("perf/x86/intel/uncore: Make package handling more robust")
      Signed-off-by: Kan Liang <kan.liang@intel.com>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/20170629190926.2456-1-kan.liang@intel.com
    • arm64: ptrace: Fix incorrect get_user() use in compat_vfp_set() · 5fbd5fc4
      Authored by Dave Martin
      Now that compat_vfp_get() uses the regset API to copy the FPSCR
      value out to userspace, compat_vfp_set() looks inconsistent.  In
      particular, compat_vfp_set() will fail if called with kbuf != NULL
      && ubuf == NULL (which is valid usage according to the regset API).
      
      This patch fixes compat_vfp_set() to use user_regset_copyin(),
      similarly to compat_vfp_get().
      
      This also squashes a sparse warning triggered by the cast that
      drops __user when calling get_user().
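
      A hedged sketch of the copy described above (the mask and size macro
      names follow the arm64 VFP regset code, but the fragment is
      illustrative rather than the exact diff):

        /* Copy the trailing FPSCR word through the regset helper so that
         * kernel-buffer callers (kbuf != NULL, ubuf == NULL) work too. */
        ret = user_regset_copyin(&pos, &count, &kbuf, &ubuf, &fpscr,
                                 VFP_STATE_SIZE - sizeof(compat_ulong_t),
                                 VFP_STATE_SIZE);
        if (!ret) {
                uregs->fpsr = fpscr & VFP_FPSCR_STAT_MASK;
                uregs->fpcr = fpscr & VFP_FPSCR_CTRL_MASK;
        }
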
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: ptrace: Remove redundant overrun check from compat_vfp_set() · 16d38acb
      Authored by Dave Martin
      compat_vfp_set() checks for userspace trying to write an excessive
      amount of data to the regset.  However this check is conspicuous
      for its absence from every other _set() in the arm64 ptrace
      implementation.  In fact, the core ptrace_regset() already clamps
      userspace's iov_len to the regset size before the individual regset
      .{get,set}() methods get called.
      
      This patch removes the redundant check.
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
    • arm64: ptrace: Avoid setting compat FP[SC]R to garbage if get_user fails · 53b1a742
      Authored by Dave Martin
      If get_user() fails when reading the new FPSCR value from userspace
      in compat_vfp_set(), then garbage* will be written to the task's
      FPSR and FPCR registers.
      
      This patch prevents this by checking the return from get_user()
      first.
      
      [*] Actually, zero, due to the behaviour of get_user() on error, but
      that's still not what userspace expects.
      
      Fixes: 478fcb2c ("arm64: Debugging support")
      Signed-off-by: Dave Martin <Dave.Martin@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>