1. 27 Jul, 2016 6 commits
  2. 24 Jul, 2016 2 commits
    • A
      x86/mm/cpa: Add missing comment in populate_pdg() · 55920d31
      Committed by Andy Lutomirski
      In commit:
      
        21cbc2822aa1 ("x86/mm/cpa: Unbreak populate_pgd(): stop trying to deallocate failed PUDs")
      
      I intended to add this comment, but I failed at using git.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/242baf8612394f4e31216f96d13c4d2e9b90d1b7.1469293159.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • A
      x86/mm/cpa: Fix populate_pgd(): Stop trying to deallocate failed PUDs · 530dd8d4
      Committed by Andy Lutomirski
      Valdis Kletnieks bisected a boot failure back to this recent commit:
      
        360cb4d1 ("x86/mm/cpa: In populate_pgd(), don't set the PGD entry until it's populated")
      
      I broke the case where a PUD table got allocated -- populate_pud()
      would wander off a pgd_none entry and get lost.  I'm not sure how
      this survived my testing.
      
      Fix the original issue in a much simpler way.  The problem
      was that, if we allocated a PUD table, failed to populate it, and
      freed it, another CPU could potentially keep using the PGD entry we
      installed (either by copying it via vmalloc_fault or by speculatively
      caching it).  There's a straightforward fix: simply leave the
      top-level entry in place if this happens.  This can't waste any
      significant amount of memory -- there are at most 256 entries like
      this systemwide and, as a practical matter, if we hit this failure
      path repeatedly, we're likely to reuse the same page anyway.
      
      For context, this is a reversion with this hunk added in:
      
      	if (ret < 0) {
      +		/*
      +		 * Leave the PUD page in place in case some other CPU or thread
      +		 * already found it, but remove any useless entries we just
      +		 * added to it.
      +		 */
      -		unmap_pgd_range(cpa->pgd, addr,
      +		unmap_pud_range(pgd_entry, addr,
      			        addr + (cpa->numpages << PAGE_SHIFT));
      		return ret;
      	}
      
      This effectively open-codes what the now-deleted unmap_pgd_range()
      function used to do except that unmap_pgd_range() used to try to
      free the page as well.
      Reported-by: Valdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Mike Krinkin <krinkin.m.u@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Link: http://lkml.kernel.org/r/21cbc2822aa18aa812c0215f4231dbf5f65afa7f.1469249789.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 22 Jul, 2016 3 commits
    • A
      x86/boot: Simplify EBDA-vs-BIOS reservation logic · 6a79296c
      Committed by Andy Lutomirski
      Both the intent and the effect of reserve_bios_regions() are simple:
      reserve the range from the apparent BIOS start (suitably filtered)
      through 1MB and, if the EBDA start address is sensible, extend that
      reservation downward to cover the EBDA as well.
      
      The code is overcomplicated, though, and contains head-scratchers
      like:
      
      	if (ebda_start < BIOS_START_MIN)
      		ebda_start = BIOS_START_MAX;
      
      That snippet is trying to say "if ebda_start < BIOS_START_MIN,
      ignore it".
      
      Simplify it: reorder the code so that it makes sense.  This should
      have no functional effect under any circumstances.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Link: http://lkml.kernel.org/r/ef89c0c761be20ead8bd9a3275743e6259b6092a.1469135598.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
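The reordered logic described in the commit above can be sketched in user-space C roughly as follows. This is only an illustration, not the kernel's code: the numeric values of BIOS_START_MIN and BIOS_START_MAX, the exact filtering rule for the apparent BIOS start, and the helper name bios_reserved_range() are assumptions made for the example.

```c
#include <assert.h>

/* Assumed values for illustration; the commit text names these constants
 * but does not state their values. */
#define BIOS_START_MIN 0x20000u   /* anything lower is treated as bogus */
#define BIOS_START_MAX 0x9f000u   /* highest start address we accept */
#define ONE_MB         0x100000u

/* Hypothetical result type: the range that would be reserved. */
struct bios_range {
	unsigned int start;
	unsigned int size;
};

/* Hypothetical helper: compute the reserved range from the two raw values
 * the firmware reports. */
static struct bios_range bios_reserved_range(unsigned int bios_start,
					     unsigned int ebda_start)
{
	struct bios_range r;

	/* Filter the apparent BIOS start: fall back to the maximum if it is
	 * out of the plausible window (assumed filtering rule). */
	if (bios_start < BIOS_START_MIN || bios_start > BIOS_START_MAX)
		bios_start = BIOS_START_MAX;

	/* "If the EBDA start address is sensible, extend the reservation
	 * downward to cover the EBDA as well"; otherwise ignore it. */
	if (ebda_start >= BIOS_START_MIN && ebda_start < bios_start)
		bios_start = ebda_start;

	r.start = bios_start;
	r.size = ONE_MB - bios_start;

	/* The reservation always runs up to the 1MB boundary. */
	assert(r.start + r.size == ONE_MB);
	return r;
}
```

Written this way, an implausible ebda_start is simply skipped, so no code path needs to overwrite it with BIOS_START_MAX just to make a later comparison fail.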
    • A
      x86/boot: Clarify what x86_legacy_features.reserve_bios_regions does · 30f02739
      Committed by Andy Lutomirski
      It doesn't just control probing for the EBDA -- it controls whether we
      detect and reserve the <1MB BIOS regions in general.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Mario Limonciello <mario_limonciello@dell.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Link: http://lkml.kernel.org/r/55bd591115498440d461857a7b64f349a5d911f3.1469135598.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • D
      x86/fpu: Do not BUG_ON() in early FPU code · ec3ed4a2
      Committed by Dave Hansen
      I don't think it is really possible to have a system where CPUID
      enumerates support for XSAVE but which does not have FP/SSE
      (they are "legacy" features and always present).
      
      But, I did manage to hit this case in qemu when I enabled its
      somewhat shaky XSAVE support.  The bummer is that the FPU is set
      up before we parse the command-line or have *any* console support
      including earlyprintk.  That turned what should have been an easy
      thing to debug into a bit more of an odyssey.
      
      So a BUG() here is worthless.  All it does is guarantee that
      if/when we hit this case we have an empty console.  So, remove
      the BUG() and try to limp along by disabling XSAVE and trying to
      continue.  Add a comment on why we are doing this, and also add
      a common "out_disable" path for leaving fpu__init_system_xstate().
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20160720194551.63BB2B58@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
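The "limp along instead of BUG()" pattern from the commit above might look roughly like this in a user-space C sketch. The struct xstate_config type and the init_xstate() function are hypothetical names invented for the example, and the sketch models only the control flow (a shared out_disable exit), not the real fpu__init_system_xstate(). The mask bits assume the architectural convention that FP is state component bit 0 and SSE is bit 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural xstate component bits: x87 FP is bit 0, SSE is bit 1. */
#define XFEATURE_MASK_FP    (1ull << 0)
#define XFEATURE_MASK_SSE   (1ull << 1)
#define XFEATURE_MASK_FPSSE (XFEATURE_MASK_FP | XFEATURE_MASK_SSE)

/* Hypothetical stand-in for the state the real init code sets up. */
struct xstate_config {
	bool xsave_enabled;
	uint64_t features;
};

/* Hypothetical init routine: takes the feature mask that CPUID reported. */
static struct xstate_config init_xstate(uint64_t cpuid_features)
{
	struct xstate_config cfg = {
		.xsave_enabled = true,
		.features = cpuid_features,
	};

	/*
	 * FP and SSE should always be enumerated together with XSAVE.
	 * If they are not, do not halt: at this point in boot there is no
	 * console, so a crash would be invisible. Disable XSAVE and carry on.
	 */
	if ((cpuid_features & XFEATURE_MASK_FPSSE) != XFEATURE_MASK_FPSSE)
		goto out_disable;

	return cfg;

out_disable:
	/* Common exit path for every "give up on XSAVE" case. */
	cfg.xsave_enabled = false;
	cfg.features = 0;
	return cfg;
}
```

Funnelling every failure through one label keeps the fallback state consistent, which matters when the failure cannot be reported at the time it happens.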
  4. 21 Jul, 2016 1 commit
    • I
      x86/boot: Reorganize and clean up the BIOS area reservation code · edce2121
      Committed by Ingo Molnar
      So the reserve_ebda_region() code has accumulated a number of
      problems over the years that make it really difficult to read
      and understand:
      
      - The calculation of 'lowmem' and 'ebda_addr' is an unnecessarily
        interleaved mess of first lowmem, then ebda_addr, then lowmem tweaks...
      
      - 'lowmem' here means 'super low mem' - i.e. 16-bit addressable memory. In other
        parts of the x86 code 'lowmem' means 32-bit addressable memory... This makes it
        super confusing to read.
      
      - It does not help at all that we have various memory range markers, half of which
        are 'start of range', half of which are 'end of range' - but this crucial
        property is not obvious in the naming at all ... gave me a headache trying to
        understand all this.
      
      - Also, the 'ebda_addr' name sucks: it highlights that it's an address (which is
        obvious, all values here are addresses!), while it does not highlight that it's
        the _start_ of the EBDA region ...
      
      - 'BIOS_LOWMEM_KILOBYTES' says a lot of things, except that this is the only value
        that is a pointer to a value, not a memory range address!
      
      - The function name itself is a misnomer: it says 'reserve_ebda_region()' while
        its main purpose is to reserve all the firmware ROM typically between 640K and
        1MB, while the 'EBDA' part is only a small part of that ...
      
      - Likewise, the paravirt quirk flag name 'ebda_search' is misleading as well: this
        too should be about whether to reserve firmware areas in the paravirt case.
      
      - In fact thinking about this as 'end of RAM' is confusing: what this function
        *really* wants to reserve is firmware data and code areas! Once the thinking is
        inverted from a mixed 'ram' and 'reserved firmware area' notion to a pure
        'reserved area' notion everything becomes a lot clearer.
      
      To improve all this rewrite the whole code (without changing the logic):
      
      - Firstly invert the naming from 'lowmem end' to 'BIOS reserved area start'
        and propagate this concept through all the variable names and constants.
      
      	BIOS_RAM_SIZE_KB_PTR		// was: BIOS_LOWMEM_KILOBYTES
      
      	BIOS_START_MIN			// was: INSANE_CUTOFF
      
      	ebda_start			// was: ebda_addr
      	bios_start			// was: lowmem
      
      	BIOS_START_MAX			// was: LOWMEM_CAP
      
      - Then clean up the name of the function itself by renaming it
        to reserve_bios_regions() and renaming the ::ebda_search paravirt
        flag to ::reserve_bios_regions.
      
      - Fix up all the comments (fix typos), harmonize and simplify their
        formulation and remove comments that become unnecessary due to
        the much better naming all around.
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 19 Jul, 2016 1 commit
  6. 15 Jul, 2016 18 commits
  7. 13 Jul, 2016 7 commits
    • D
      x86/mm: Use pte_none() to test for empty PTE · dcb32d99
      Committed by Dave Hansen
      The page table manipulation code seems to have grown a couple of
      sites that are looking for empty PTEs.  Just in case one of these
      entries got a stray bit set, use pte_none() instead of checking
      for a zero pte_val().
      
      The use of pte_same() makes me a bit nervous.  If we were doing a
      pte_same() check against two cleared entries and one of them had
      a stray bit set, it might fail the pte_same() check.  But, I
      don't think we ever _do_ pte_same() for cleared entries.  It is
      almost entirely used for checking for races in fault-in paths.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001915.813703D9@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • D
      x86/mm: Disallow running with 32-bit PTEs to work around erratum · e4a84be6
      Committed by Dave Hansen
      The Intel(R) Xeon Phi(TM) Processor x200 Family (codename: Knights
      Landing) has an erratum where a processor thread setting the Accessed
      or Dirty bits may not do so atomically against its checks for the
      Present bit.  This may cause a thread (which is about to page fault)
      to set A and/or D, even though the Present bit had already been
      atomically cleared.
      
      These bits are truly "stray".  In the case of the Dirty bit, the
      thread associated with the stray set was *not* allowed to write to
      the page.  This means that we do not have to launder the bit(s); we
      can simply ignore them.
      
      If the PTE is used for storing a swap index or a NUMA migration index,
      the A bit could be misinterpreted as part of the swap type.  The stray
      bits being set cause a software-cleared PTE to be interpreted as a
      swap entry.  In some cases (like when the swap index ends up being
      for a non-existent swapfile), the kernel detects the stray value
      and WARN()s about it, but there is no guarantee that the kernel can
      always detect it.
      
      When we have 64-bit PTEs (64-bit mode or 32-bit PAE), we were able
      to move the swap PTE format around to avoid these troublesome bits.
      But, 32-bit non-PAE is tight on bits.  So, disallow it from running
      on this hardware.  I can't imagine anyone wanting to run 32-bit
      non-highmem kernels on this hardware, but disallowing them from
      running entirely is surely the safe thing to do.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001914.D0B50110@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • D
      x86/mm: Ignore A/D bits in pte/pmd/pud_none() · 97e3c602
      Committed by Dave Hansen
      The erratum we are fixing here can lead to stray setting of the
      A and D bits.  That means that a pte that we cleared might
      suddenly have A/D set.  So, stop considering those bits when
      determining if a pte is pte_none().  The same goes for
      pmd_none() and pud_none().  pgd_none() can be skipped
      because it is not affected; we do not use PGD entries for
      anything other than pagetables on affected configurations.
      
      This adds a tiny amount of overhead to all pte_none() checks.
      I doubt we'll be able to measure it anywhere.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001912.5216F89C@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
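As a rough illustration of the idea in the commit above, a "none" check that ignores stray Accessed/Dirty bits could be sketched like this in user-space C. The type and macro names (pte_t, PAGE_ACCESSED, PAGE_DIRTY, KNL_ERRATUM_MASK, make_pte) are stand-ins chosen for the example and are not taken from the kernel source. The bit positions are the architectural x86 ones: A is bit 5 and D is bit 6.

```c
#include <stdbool.h>
#include <stdint.h>

/* x86 page-table flag bits: Accessed is bit 5, Dirty is bit 6. */
#define PAGE_ACCESSED    (1ull << 5)
#define PAGE_DIRTY       (1ull << 6)

/* Bits that the erratum may set on an entry we already cleared. */
#define KNL_ERRATUM_MASK (PAGE_ACCESSED | PAGE_DIRTY)

/* Illustrative stand-in for a 64-bit page table entry. */
typedef struct {
	uint64_t pte;
} pte_t;

static pte_t make_pte(uint64_t val)
{
	pte_t p = { .pte = val };
	return p;
}

/* Old-style check: any set bit makes the entry "not none". */
static bool pte_none_strict(pte_t p)
{
	return p.pte == 0;
}

/* Erratum-tolerant check: mask out A/D before comparing with zero. */
static bool pte_none(pte_t p)
{
	return (p.pte & ~KNL_ERRATUM_MASK) == 0;
}
```

The extra cost is a single AND per check, which is consistent with the "tiny amount of overhead" the commit message mentions.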
    • D
      x86/mm: Move swap offset/type up in PTE to work around erratum · 00839ee3
      Committed by Dave Hansen
      This erratum can result in Accessed/Dirty getting set by the hardware
      when we do not expect them to be (on !Present PTEs).
      
      Instead of trying to fix them up after this happens, we just
      allow the bits to get set and try to ignore them.  We do this by
      shifting the layout of the bits we use for swap offset/type in
      our 64-bit PTEs.
      
      It looks like this:
      
       bitnrs: |     ...            | 11| 10|  9|8|7|6|5| 4| 3|2|1|0|
       names:  |     ...            |SW3|SW2|SW1|G|L|D|A|CD|WT|U|W|P|
       before: |         OFFSET (9-63)          |0|X|X| TYPE(1-5) |0|
        after: | OFFSET (14-63)  |  TYPE (9-13) |0|X|X|X| X| X|X|X|0|
      
      Note that D was already a don't care (X) even before.  We just
      move TYPE up and turn its old spot (which could be hit by the
      A bit) into all don't cares.
      
      We take 5 bits away from the offset, but that still leaves us
      with 50 bits which lets us index into a 62-bit swapfile (4 EiB).
      I think that's probably fine for the moment.  We could
      theoretically reclaim 5 of the bits (1, 2, 3, 4, 7) but it
      doesn't gain us anything.
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: dave.hansen@intel.com
      Cc: linux-mm@kvack.org
      Cc: mhocko@suse.com
      Link: http://lkml.kernel.org/r/20160708001911.9A3FD2B6@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
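The "after" layout from the table in the commit above (TYPE in bits 9-13, OFFSET in bits 14-63, bit 0 clear) can be sketched as a small encode/decode pair. The function and macro names below are illustrative, not the kernel's. The capacity figure assumes 4 KiB pages: 50 offset bits plus 12 page-offset bits give the 62-bit (4 EiB) limit the message quotes.

```c
#include <stdint.h>

/* Layout taken from the "after" row of the table in the commit message. */
#define SWP_TYPE_FIRST_BIT   9
#define SWP_TYPE_BITS        5
#define SWP_OFFSET_FIRST_BIT (SWP_TYPE_FIRST_BIT + SWP_TYPE_BITS) /* 14 */
#define SWP_OFFSET_BITS      (64 - SWP_OFFSET_FIRST_BIT)          /* 50 */
#define SWP_TYPE_MASK        ((1u << SWP_TYPE_BITS) - 1)

/* Hardware bits that may be set spuriously on a non-present entry. */
#define PAGE_ACCESSED (1ull << 5)
#define PAGE_DIRTY    (1ull << 6)

/* Pack a swap type and offset into a non-present PTE value (bit 0 = 0). */
static uint64_t swp_encode(unsigned int type, uint64_t offset)
{
	return ((uint64_t)(type & SWP_TYPE_MASK) << SWP_TYPE_FIRST_BIT) |
	       (offset << SWP_OFFSET_FIRST_BIT);
}

/* Extract the type; bits 0-8 are never looked at. */
static unsigned int swp_type(uint64_t pte)
{
	return (unsigned int)(pte >> SWP_TYPE_FIRST_BIT) & SWP_TYPE_MASK;
}

/* Extract the offset; everything below bit 14 is shifted out. */
static uint64_t swp_offset(uint64_t pte)
{
	return pte >> SWP_OFFSET_FIRST_BIT;
}
```

Because neither decoder reads bits 5 or 6, a swap entry that picks up stray A or D bits still decodes to the same type and offset.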
    • A
      x86/sfi: Enable enumeration of SD devices · 05f310e2
      Committed by Andy Shevchenko
      SFI specification v0.8.2 defines the type of devices that are connected to
      the SD bus. In particular, a WiFi dongle is one such device.
      
      Add a callback to enumerate the devices connected to SD bus.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1468322192-62080-1-git-send-email-andriy.shevchenko@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • A
      x86/pci: Use MRFLD abbreviation for Merrifield · 707a605b
      Committed by Andy Shevchenko
      Everywhere else in the kernel, MRFLD is used as the abbreviation for Intel Merrifield.
      Do the same in the intel_mid_pci.c module.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1468321462-136016-1-git-send-email-andriy.shevchenko@linux.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • A
      kvm arm/arm64: Remove trailing whitespace from headers · f2d3adf4
      Committed by Arnaldo Carvalho de Melo
      Noticed while making a copy of these files in tools/, where those kernel
      files were being accessed directly. We no longer allow that, so that
      changes on the kernel side do not break tooling.
      
      Cc: Christoffer Dall <christoffer.dall@linaro.org>
      Cc: Eric Auger <eric.auger@linaro.org>
      Cc: Marc Zyngier <marc.zyngier@arm.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Wang Nan <wangnan0@huawei.com>
      Link: http://lkml.kernel.org/n/tip-82thftcdhj2j5wt6ir4vuyhk@git.kernel.org
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  8. 12 Jul, 2016 2 commits