1. 22 February 2016 (3 commits)
    • x86/efi: Map EFI_MEMORY_{XP,RO} memory region bits to EFI page tables · 6d0cc887
      Committed by Sai Praneeth
      Now that we have EFI memory region bits that indicate which regions do
      not need execute permission or read/write permission in the page tables,
      let's use them.
      
      We also check for EFI_NX_PE_DATA and only enforce the restrictive
      mappings if it's present (to allow us to ignore buggy firmware that sets
      bits it didn't mean to and to preserve backwards compatibility).
      
      Instead of assuming that firmware sets the appropriate attributes in a
      memory descriptor, such as EFI_MEMORY_RO for code and EFI_MEMORY_XP
      for data, we must expect firmware that only sets the *type* of a
      memory descriptor to EFI_RUNTIME_SERVICES_CODE or
      EFI_RUNTIME_SERVICES_DATA and leaves the attribute field empty. That
      would lead to improper mappings of EFI runtime regions, so we check
      both the attribute and the type of the memory descriptor when
      updating the mappings; this also matches how Windows handles these
      regions.
      Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Lee, Chun-Yi <jlee@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1455712566-16727-13-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6d0cc887
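      A minimal sketch of the descriptor check this commit describes, under
      the stated EFI_NX_PE_DATA gate; illustrative kernel-style C, not the
      verbatim patch:

          /* Sketch: derive page protections from an EFI memory descriptor,
           * honouring both the attribute bits and the region type. */
          static unsigned long efi_md_pageflags(efi_memory_desc_t *md)
          {
                  unsigned long pf = 0;

                  /* Writable unless marked read-only or a code region. */
                  if (!(md->attribute & EFI_MEMORY_RO) &&
                      md->type != EFI_RUNTIME_SERVICES_CODE)
                          pf |= _PAGE_RW;

                  /* Non-executable if marked XP or a plain data region. */
                  if ((md->attribute & EFI_MEMORY_XP) ||
                      md->type == EFI_RUNTIME_SERVICES_DATA)
                          pf |= _PAGE_NX;

                  return pf;
          }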
    • x86/mm/pat: Don't implicitly allow _PAGE_RW in kernel_map_pages_in_pgd() · 15f003d2
      Committed by Sai Praneeth
      As part of the preparation for the EFI_MEMORY_RO flag added in the UEFI
      2.5 specification, we need the ability to map pages in kernel page
      tables without _PAGE_RW being set.
      
      Modify kernel_map_pages_in_pgd() to require its callers to pass _PAGE_RW
      if the pages need to be mapped read/write. Otherwise, we'll map the
      pages as read-only.
      Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Lee, Chun-Yi <jlee@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1455712566-16727-12-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      15f003d2
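      Illustrative call sites under the new contract (pfn, va and npages
      are hypothetical values; the function is the kernel's
      kernel_map_pages_in_pgd()):

          /* Data region: write access must now be requested explicitly. */
          kernel_map_pages_in_pgd(pgd, pfn, va, npages, _PAGE_NX | _PAGE_RW);

          /* Code region: leaving _PAGE_RW out yields a read-only mapping. */
          kernel_map_pages_in_pgd(pgd, pfn, va, npages, 0);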
    • x86/mm/pat: Use _PAGE_GLOBAL bit for EFI page table mappings · 39763015
      Committed by Sai Praneeth
      Since EFI page tables can be treated as kernel page tables they should
      be global. All the other page mapping functions in pageattr.c set the
      _PAGE_GLOBAL bit and we want to avoid inconsistencies when we map a page
      in the EFI code paths, for example when that page is split in
      __split_large_page(), etc. It also makes it easier to validate that the
      EFI region mappings have the correct attributes because there are fewer
      differences compared with regular kernel mappings.
      Signed-off-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ravi Shankar <ravi.v.shankar@intel.com>
      Cc: Ricardo Neri <ricardo.neri@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1455712566-16727-4-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      39763015
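      Conceptually the change boils down to OR-ing _PAGE_GLOBAL into the
      protection flags used for EFI mappings, as the rest of pageattr.c
      already does (a one-line sketch, not the literal diff):

          /* EFI page tables are kernel page tables: mark the mappings
           * global so their TLB entries survive context switches when
           * CR4.PGE is set, like every other kernel mapping. */
          pf |= _PAGE_GLOBAL;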
2. 19 February 2016 (1 commit)
3. 18 February 2016 (1 commit)
    • x86/mm: Fix vmalloc_fault() to handle large pages properly · f4eafd8b
      Committed by Toshi Kani
      A kernel page fault oops with the call stack below was observed
      when a read syscall was made to a pmem device after a huge amount
      (>512GB) of vmalloc ranges had been allocated by ioremap() on an
      x86_64 system:
      
           BUG: unable to handle kernel paging request at ffff880840000ff8
           IP: vmalloc_fault+0x1be/0x300
           PGD c7f03a067 PUD 0
           Oops: 0000 [#1] SM
           Call Trace:
              __do_page_fault+0x285/0x3e0
              do_page_fault+0x2f/0x80
              ? put_prev_entity+0x35/0x7a0
              page_fault+0x28/0x30
              ? memcpy_erms+0x6/0x10
              ? schedule+0x35/0x80
              ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
              ? schedule_timeout+0x183/0x240
              btt_log_read+0x63/0x140 [nd_btt]
               :
              ? __symbol_put+0x60/0x60
              ? kernel_read+0x50/0x80
              SyS_finit_module+0xb9/0xf0
              entry_SYSCALL_64_fastpath+0x1a/0xa4
      
      Since v4.1, ioremap() supports large page (pud/pmd) mappings in
      x86_64 and PAE.  vmalloc_fault() however assumes that the vmalloc
      range is limited to pte mappings.
      
      vmalloc faults do not normally happen in ioremap'd ranges since
      ioremap() sets up the kernel page tables, which are shared with
      user processes: pgd_ctor() copies the kernel's PGD entries into a
      user process's page tables during fork().  When allocation of the
      vmalloc ranges crosses a 512GB boundary, however, ioremap()
      allocates a new pud table and updates the kernel PGD entry to
      point to it.  If a user process's PGD entry does not have this
      update yet, a read/write syscall to the range causes a vmalloc
      fault, which hits the Oops above because the handler does not
      deal with large pages properly.

      The following changes are made to vmalloc_fault():
      
      64-bit:
      
       - No change for the PGD sync operation as it handles large
         pages already.
       - Add pud_huge() and pmd_huge() to the validation code to
         handle large pages.
       - Change pud_page_vaddr() to pud_pfn() since an ioremap range
         is not directly mapped (while the if-statement still works
         with a bogus addr).
       - Change pmd_page() to pmd_pfn() since an ioremap range is not
         backed by struct page (while the if-statement still works
         with a bogus addr).
      
      32-bit:
       - No change for the sync operation since the index3 PGD entry
         covers the entire vmalloc range, which is always valid.
         (A separate change to sync PGD entry is necessary if this
          memory layout is changed regardless of the page size.)
       - Add pmd_huge() to the validation code to handle large pages.
         This is for completeness since vmalloc_fault() won't happen
         in ioremap'd ranges as its PGD entry is always valid.
      Reported-by: Henning Schild <henning.schild@siemens.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: <stable@vger.kernel.org> # 4.1+
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: linux-mm@kvack.org
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      f4eafd8b
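      A condensed sketch of the new PUD-level validation on 64-bit, using
      the helpers named above (a fragment, assuming the surrounding
      vmalloc_fault() context):

          pud = pud_offset(pgd, address);
          pud_ref = pud_offset(pgd_ref, address);
          if (pud_none(*pud_ref))
                  return -1;

          if (pud_huge(*pud)) {
                  /* Large page: an ioremap range has no lower-level table
                   * and no struct page behind it, so compare PFNs instead
                   * of page-table virtual addresses. */
                  if (!pud_huge(*pud_ref) ||
                      pud_pfn(*pud) != pud_pfn(*pud_ref))
                          BUG();
                  return 0;
          }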
4. 17 February 2016 (4 commits)
    • hpet: Drop stale URLs · 4e7f9df2
      Committed by Michael S. Tsirkin
      Looks like the HPET spec at intel.com got moved.
      It isn't hard to find so drop the link, just mention
      the revision assumed.
      Suggested-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Clemens Ladisch <clemens@ladisch.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/1455145462-3877-1-git-send-email-mst@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      4e7f9df2
    • x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() · a82eee74
      Committed by Toshi Kani
      Data corruption issues were observed in tests which initiated
      a system crash/reset while accessing BTT devices.  This problem
      is reproducible.
      
      The BTT driver calls pmem_rw_bytes() to update data in pmem
      devices.  This interface calls __copy_user_nocache(), which
      uses non-temporal stores so that the stores to pmem are
      persistent.
      
      __copy_user_nocache() uses non-temporal stores only when a request
      size is 8 bytes or larger (and aligned to 8 bytes).  The BTT
      driver updates the BTT map table, whose entries are 4 bytes each.
      Updates to the map table entries therefore remain cached, and are
      not written to pmem after a crash.

      Change __copy_user_nocache() to use a non-temporal store when the
      request size is 4 bytes.  The change extends the current byte-copy
      path for requests smaller than 8 bytes, and does not add any
      overhead to the regular path.
      Reported-and-tested-by: Micah Parrish <micah.parrish@hpe.com>
      Reported-and-tested-by: Brian Boylston <brian.boylston@hpe.com>
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com
      [ Small readability edits. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a82eee74
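      The mechanics are easy to demonstrate in user space on x86-64:
      movnti has a 32-bit form, so an aligned 4-byte store can bypass the
      cache just like the existing 8-byte path (a standalone illustration,
      not the kernel assembly):

          #include <stdint.h>
          #include <stdio.h>

          /* A 4-byte non-temporal store via movnti, the 32-bit form of
           * the instruction the patched copy path uses. */
          static void nt_store32(uint32_t *dst, uint32_t val)
          {
                  asm volatile("movnti %1, %0" : "=m" (*dst) : "r" (val));
          }

          int main(void)
          {
                  uint32_t map_entry = 0;  /* BTT map entries are 4 bytes */

                  nt_store32(&map_entry, 0xdeadbeef);  /* bypasses the cache */
                  asm volatile("sfence" ::: "memory"); /* order the NT store */
                  printf("%#x\n", map_entry);
                  return 0;
          }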
    • x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable · ee9737c9
      Committed by Toshi Kani
      Add comments to __copy_user_nocache() to clarify its procedures
      and alignment requirements.
      
      Also change numeric branch target labels to named local labels.
      
      No code changed:
      
       arch/x86/lib/copy_user_64.o:
      
          text    data     bss     dec     hex filename
          1239       0       0    1239     4d7 copy_user_64.o.before
          1239       0       0    1239     4d7 copy_user_64.o.after
      
       md5:
          58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.before.asm
          58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.after.asm
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: brian.boylston@hpe.com
      Cc: dan.j.williams@intel.com
      Cc: linux-nvdimm@lists.01.org
      Cc: micah.parrish@hpe.com
      Cc: ross.zwisler@linux.intel.com
      Cc: vishal.l.verma@intel.com
      Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com
      [ Small readability edits and added object file comparison. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ee9737c9
    • perf/x86/amd/uncore: Plug reference leak · 8bc9162c
      Committed by Thomas Gleixner
      In the error path of amd_uncore_cpu_up_prepare() the newly allocated uncore
      struct is freed, but the percpu pointer still references it. Set it to NULL.
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1602162302170.19512@nanos
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8bc9162c
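      The shape of the fix, sketched (names as in the AMD uncore driver of
      that era):

          fail:
                  /* Clear the per-CPU slot before freeing so no stale
                   * reference to the freed uncore struct survives. */
                  if (amd_uncore_nb)
                          *per_cpu_ptr(amd_uncore_nb, cpu) = NULL;
                  kfree(uncore_nb);
                  return -ENOMEM;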
5. 12 February 2016 (1 commit)
6. 08 February 2016 (1 commit)
    • x86/mm/numa: Fix 32-bit memblock range truncation bug on 32-bit NUMA kernels · 59fd1214
      Committed by Ingo Molnar
      The following commit:
      
        a0acda91 ("acpi, numa, mem_hotplug: mark all nodes the kernel resides un-hotpluggable")
      
      Introduced numa_clear_kernel_node_hotplug(), a function that is
      executed during early bootup and that also marks all currently
      reserved memblock regions as hot-memory-unswappable.
      
      y14sg1 <y14sg1@comcast.net> reported that when running 32-bit NUMA kernels,
      the grsecurity/PAX kernel patch flagged a size overflow in this function:
      
        PAX: size overflow detected in function x86_numa_init arch/x86/mm/numa.c:691 [...]
      
      ... the reason for the overflow is that memblock_clear_hotplug()
      takes physical addresses as arguments, while the start/end variables
      used by numa_clear_kernel_node_hotplug() are 'unsigned long', which
      is 32 bits on PAE kernels even though those kernels use 64-bit
      physical addresses.
      
      So on 32-bit PAE kernels that have physical memory above the 4GB boundary,
      we truncate a 64-bit physical address range to 32 bits and pass it to
      memblock_clear_hotplug(), which at minimum prevents the original memory-hotplug
      bugfix from working, but might have other side effects as well.
      
      The fix is to use the proper type to handle physical addresses, phys_addr_t.
      Reported-by: y14sg1 <y14sg1@comcast.net>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Brad Spengler <spender@grsecurity.net>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: PaX Team <pageexec@freemail.hu>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      59fd1214
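      The truncation is easy to reproduce in plain C; here uint32_t stands
      in for the 32-bit 'unsigned long' of a PAE kernel (a self-contained
      demo, not kernel code):

          #include <stdint.h>
          #include <stdio.h>

          int main(void)
          {
                  uint64_t phys_end = 0x17fffffffULL;  /* RAM above 4GB */
                  uint32_t truncated = (uint32_t)phys_end;

                  printf("phys_addr_t value:   %#llx\n",
                         (unsigned long long)phys_end);
                  printf("after 32-bit 'long': %#x\n", truncated);
                  return 0;
          }

      Passing the truncated value on to memblock_clear_hotplug() clears
      the wrong range, which is exactly what the phys_addr_t conversion
      prevents.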
7. 06 February 2016 (1 commit)
    • mm, hugetlb: don't require CMA for runtime gigantic pages · 080fe206
      Committed by Vlastimil Babka
      Commit 944d9fec ("hugetlb: add support for gigantic page allocation
      at runtime") has added the runtime gigantic page allocation via
      alloc_contig_range(), making this support available only when CONFIG_CMA
      is enabled.  Because it doesn't depend on MIGRATE_CMA pageblocks and the
      associated infrastructure, it is possible, with a few simple
      adjustments, to require only CONFIG_MEMORY_ISOLATION instead of full
      CONFIG_CMA.
      
      After this patch, alloc_contig_range() and related functions are
      available and used for gigantic pages with just CONFIG_MEMORY_ISOLATION
      enabled.  Note CONFIG_CMA selects CONFIG_MEMORY_ISOLATION.  This allows
      supporting runtime gigantic pages without the CMA-specific checks in
      page allocator fastpaths.
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Luiz Capitulino <lcapitulino@redhat.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      080fe206
8. 05 February 2016 (1 commit)
    • x86: Fix KASAN false positives in thread_saved_pc() · 75edb54a
      Committed by Dmitry Vyukov
      thread_saved_pc() reads the stack of a potentially running task.
      This can cause false KASAN stack-out-of-bounds reports, because
      the running task concurrently poisons and unpoisons its own
      stack.
      
      The same happens in get_wchan(), which was already fixed by using
      READ_ONCE_NOCHECK(). Do the same here.
      
      Example KASAN report triggered by sysrq-t:
      
        BUG: KASAN: out-of-bounds in sched_show_task+0x306/0x3b0 at addr ffff880043c97c18
        Read of size 8 by task syz-executor/23839
        [...]
        page dumped because: kasan: bad access detected
        [...]
        Call Trace:
         [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40
         [<ffffffff813e7a26>] sched_show_task+0x306/0x3b0
         [<ffffffff813e7bf4>] show_state_filter+0x124/0x1a0
         [<ffffffff82d2ca00>] fn_show_state+0x10/0x20
         [<ffffffff82d2cf98>] k_spec+0xa8/0xe0
         [<ffffffff82d3354f>] kbd_event+0xb9f/0x4000
         [<ffffffff843ca8a7>] input_to_handler+0x3a7/0x4b0
         [<ffffffff843d1954>] input_pass_values.part.5+0x554/0x6b0
         [<ffffffff843d29bc>] input_handle_event+0x2ac/0x1070
         [<ffffffff843d3a47>] input_inject_event+0x237/0x280
         [<ffffffff843e8c28>] evdev_write+0x478/0x680
         [<ffffffff817ac653>] __vfs_write+0x113/0x480
         [<ffffffff817ae0e7>] vfs_write+0x167/0x4a0
         [<ffffffff817b13d1>] SyS_write+0x111/0x220
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Acked-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: glider@google.com
      Cc: kasan-dev@googlegroups.com
      Cc: kcc@google.com
      Cc: linux-kernel@vger.kernel.org
      Cc: ryabinin.a.a@gmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      75edb54a
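      The essence of the fix, sketched (the exact expression for the saved
      program counter differs across kernel versions, so treat the stack
      offset here as an assumption):

          /* Read the saved PC from the task's stack without KASAN
           * instrumentation; the owner may be (un)poisoning that stack
           * concurrently. */
          #define thread_saved_pc(t) \
                  READ_ONCE_NOCHECK(*(unsigned long *)((t)->thread.sp - 8))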
9. 03 February 2016 (3 commits)
    • x86/efi: Show actual ending addresses in efi_print_memmap · 1e82b947
      Committed by Robert Elliott
      Adjust efi_print_memmap to print the real end address of each
      range, not 1 byte beyond. This matches other prints like those
      for SRAT and nosave memory.
      
      While investigating grub persistent memory corruption issues, it
      was helpful to make this table match the ending address
      convention used by:
      * the kernel's e820 table prints
      	BIOS-e820: [mem 0x0000001680000000-0x0000001c7fffffff] reserved
      * the kernel's nosave memory prints
      	PM: Registered nosave memory: [mem 0x880000000-0xc7fffffff]
      * the kernel's ACPI System Resource Affinity Table prints
      	SRAT: Node 1 PXM 1 [mem 0x480000000-0x87fffffff]
      * grub's lsmmap and lsefimmap commands
      	reserved  0000001680000000-0000001c7fffffff 00600000     24GiB UC WC WT WB NV
      * the UEFI shell's memmap command
      	Reserved   000000007FC00000-000000007FFFFFFF 0000000000000400 0000000000000001
      
      For example, if you grep all the various logs for c7fffffff, you
      won't find the kernel's line if it uses c80000000.
      
      Also, change the closing ) to ] to match the opening [.
      
      old:
          efi: mem61: [Persistent Memory  |   |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000880000000-0x0000000c80000000) (16384MB)
      
      new:
          efi: mem61: [Persistent Memory  |   |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000880000000-0x0000000c7fffffff] (16384MB)
      Signed-off-by: Robert Elliott <elliott@hpe.com>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Reviewed-by: Laszlo Ersek <lersek@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Leif Lindholm <leif.lindholm@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1454364428-494-12-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      1e82b947
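      A simplified sketch of the resulting print (the real function also
      decodes the attribute bits shown in the examples above):

          u64 size = md->num_pages << EFI_PAGE_SHIFT;

          /* Inclusive end (start + size - 1), closed with ']' to match
           * the opening '['. */
          pr_info("mem%02u: range=[0x%016llx-0x%016llx] (%lluMB)\n",
                  i, md->phys_addr, md->phys_addr + size - 1, size >> 20);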
    • x86/efi/bgrt: Don't ignore the BGRT if the 'valid' bit is 0 · 66dbe99c
      Committed by Môshe van der Sterre
      Unintuitively, the BGRT graphic is apparently meant to be usable
      even if the valid bit is not set. The valid bit only conveys
      uncertainty about the image's validity in relation to the current
      screen state.
      
      Windows 10 actually uses the BGRT image for its boot screen even
      if not 'valid', for example when the user triggered the boot
      menu. Because it is unclear if all firmwares will provide a
      usable graphic in this case, we now look at the BMP magic number
      as an additional check.
      Reviewed-by: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: Môshe van der Sterre <me@moshe.nl>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Môshe van der Sterre <me@moshe.nl>
      Link: http://lkml.kernel.org/r/1454364428-494-10-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      66dbe99c
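      The added sanity check amounts to verifying the BMP signature at the
      start of the image (a sketch; 'image' is a hypothetical pointer to
      the firmware-provided graphic):

          /* A BMP file begins with the two bytes "BM"; anything else is
           * treated as unusable regardless of the 'valid' bit. */
          if (memcmp(image, "BM", 2) != 0) {
                  pr_err("Ignoring BGRT: unexpected BMP magic\n");
                  return;
          }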
    • efi: Add nonblocking option to efi_query_variable_store() · ca0e30dc
      Committed by Ard Biesheuvel
      The function efi_query_variable_store() may be invoked by
      efivar_entry_set_nonblocking(), which itself takes care to only
      call a non-blocking version of the SetVariable() runtime
      wrapper. However, efi_query_variable_store() may call the
      SetVariable() wrapper directly, as well as the wrapper for
      QueryVariableInfo(), both of which could deadlock in the same
      way we are trying to prevent by calling
      efivar_entry_set_nonblocking() in the first place.
      
      So instead, modify efi_query_variable_store() to use the
      non-blocking variants of QueryVariableInfo() (and give up rather
      than free up space if the available space is below
      EFI_MIN_RESERVE) if invoked with the 'nonblocking' argument set
      to true.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/1454364428-494-5-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ca0e30dc
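      In outline, the change looks like this (a hedged sketch; the
      non-blocking QueryVariableInfo() wrapper is the one this commit
      introduces):

          if (nonblocking)
                  status = efi.query_variable_info_nonblocking(attributes,
                                  &storage_size, &remaining_size, &max_size);
          else
                  status = efi.query_variable_info(attributes,
                                  &storage_size, &remaining_size, &max_size);

          if (status != EFI_SUCCESS)
                  return status;

          if (remaining_size - size < EFI_MIN_RESERVE) {
                  if (nonblocking)
                          return EFI_OUT_OF_RESOURCES; /* give up, don't block */
                  /* blocking path: attempt to reclaim space as before */
          }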
10. 29 January 2016 (3 commits)
    • x86/mm/pat: Avoid truncation when converting cpa->numpages to address · 74256377
      Committed by Matt Fleming
      There are a couple of nasty truncation bugs lurking in the pageattr
      code that can be triggered when mapping EFI regions, e.g. when we pass
      a cpa->pgd pointer. Because cpa->numpages is a 32-bit value, shifting
      it left by PAGE_SHIFT will truncate the resultant address to 32 bits.
      
      Viorel-Cătălin managed to trigger this bug on his Dell machine that
      provides a ~5GB EFI region which requires 1236992 pages to be mapped.
      When calling populate_pud() the end of the region gets calculated
      incorrectly in the following buggy expression,
      
        end = start + (cpa->numpages << PAGE_SHIFT);
      
      And only 188416 pages are mapped. Next, populate_pud() gets invoked
      for a second time because of the loop in __change_page_attr_set_clr(),
      only this time no pages get mapped because shifting the remaining
      number of pages (1048576) by PAGE_SHIFT is zero. At which point the
      loop in __change_page_attr_set_clr() spins forever because we fail to
      make forward progress.
      
      Hitting this bug depends very much on the virtual address we pick to
      map the large region at and how many pages we map on the initial run
      through the loop. This explains why this issue was only recently hit
      with the introduction of commit
      
        a5caa209 ("x86/efi: Fix boot crash by mapping EFI memmap
         entries bottom-up at runtime, instead of top-down")
      
      It's interesting to note that safe uses of cpa->numpages do exist in
      the pageattr code. If instead of shifting ->numpages we multiply by
      PAGE_SIZE, no truncation occurs because PAGE_SIZE is a UL value, and
      so the result is unsigned long.
      
      To avoid surprises when users try to convert very large cpa->numpages
      values to addresses, change the data type from 'int' to 'unsigned
      long', thereby making it suitable for shifting by PAGE_SHIFT without
      any type casting.
      
      The alternative would be to make liberal use of casting, but that is
      far more likely to cause problems in the future when someone adds more
      code and fails to cast properly; this bug was difficult enough to
      track down in the first place.
      Reported-and-tested-by: Viorel-Cătălin Răpițeanu <rapiteanu.catalin@gmail.com>
      Acked-by: Borislav Petkov <bp@alien8.de>
      Cc: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=110131
      Link: http://lkml.kernel.org/r/1454067370-10374-1-git-send-email-matt@codeblueprint.co.uk
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      74256377
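      The bug is reproducible in plain C on a 64-bit build, where 'unsigned
      long' is 64 bits but 'int' is 32 (a self-contained demo; the
      in-practice wraparound of the signed shift is what the kernel hit):

          #include <stdio.h>

          #define PAGE_SHIFT 12

          int main(void)
          {
                  int numpages = 1236992;  /* the ~5GB region reported */

                  unsigned long bad  = (unsigned long)(numpages << PAGE_SHIFT);
                  unsigned long good = (unsigned long)numpages << PAGE_SHIFT;

                  printf("int shift:   %#lx\n", bad);  /* 0x2e000000: truncated */
                  printf("ulong shift: %#lx\n", good); /* 0x12e000000: ~5GB */
                  return 0;
          }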
    • perf/x86: De-obfuscate code · 8f04b853
      Committed by Peter Zijlstra
      Get rid of the 'onln' obfuscation.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8f04b853
    • perf/x86: Fix uninitialized value usage · e01d8718
      Committed by Peter Zijlstra
      When calling intel_alt_er() with .idx != EXTRA_REG_RSP_* we will not
      initialize alt_idx and then use this uninitialized value to index an
      array.
      
      When that is not fatal, it can result in an infinite loop in its
      caller __intel_shared_reg_get_constraints(), with IRQs disabled.
      
      Alternative error modes are random memory corruption due to the
      cpuc->shared_regs->regs[] array overrun, which manifest in either
      get_constraints or put_constraints doing weird stuff.
      
      Only took 6 hours of painful debugging to find this. Neither GCC nor
      Smatch warnings flagged this bug.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kan Liang <kan.liang@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: ae3f011f ("perf/x86/intel: Fix SLM MSR_OFFCORE_RSP1 valid_mask")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      e01d8718
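      The fix is essentially a defensive initialization of alt_idx (a
      sketch of the likely shape, inferred from the description above):

          /* Default alt_idx to the incoming idx so the non-RSP case
           * indexes the extra_regs array with a defined value. */
          int alt_idx = idx;

          if (idx == EXTRA_REG_RSP_0)
                  alt_idx = EXTRA_REG_RSP_1;
          else if (idx == EXTRA_REG_RSP_1)
                  alt_idx = EXTRA_REG_RSP_0;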
11. 27 January 2016 (1 commit)
12. 25 January 2016 (1 commit)
13. 23 January 2016 (1 commit)
    • pmem: add wb_cache_pmem() to the PMEM API · 3f4a2670
      Committed by Ross Zwisler
      __arch_wb_cache_pmem() was already an internal implementation detail of
      the x86 PMEM API, but this functionality needs to be exported as part of
      the general PMEM API to handle the fsync/msync case for DAX mmaps.
      
      One thing worth noting is that we really do want this to be part of the
      PMEM API as opposed to a stand-alone function like clflush_cache_range()
      because of ordering restrictions.  By having wb_cache_pmem() as part of
      the PMEM API we can leave it unordered, call it multiple times to write
      back large amounts of memory, and then order the multiple calls with a
      single wmb_pmem().
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jan Kara <jack@suse.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3f4a2670
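      The intended usage pattern described above, sketched (addresses and
      lengths are placeholders):

          /* Multiple unordered write-backs of large ranges... */
          wb_cache_pmem(pmem_addr1, len1);
          wb_cache_pmem(pmem_addr2, len2);

          /* ...then a single ordering point that covers all of them. */
          wmb_pmem();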
14. 22 January 2016 (2 commits)
15. 21 January 2016 (4 commits)
16. 20 January 2016 (2 commits)
17. 19 January 2016 (4 commits)
18. 17 January 2016 (1 commit)
19. 16 January 2016 (5 commits)