1. 02 8月, 2017 3 次提交
    • V
      x86/intel_rdt: Change file names to accommodate RDT monitor code · 05830204
      Vikas Shivappa 提交于
      Because the "perf cqm" and resctrl code were separately added and
      indivdually configurable, there seem to be separate context switch code
      and also things on global .h which are not really needed.
      
      Move only the scheduling specific code and definitions to
      <asm/intel_rdt_sched.h> and the put all the other declarations to a
      local intel_rdt.h.
      
      h/t to Reinette Chatre for pointing out that we should separate the
      public interfaces used by other parts of the kernel from private
      objects shared between the various files comprising RDT.
      
      No functional change.
      Signed-off-by: NVikas Shivappa <vikas.shivappa@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: ravi.v.shankar@intel.com
      Cc: tony.luck@intel.com
      Cc: fenghua.yu@intel.com
      Cc: peterz@infradead.org
      Cc: eranian@google.com
      Cc: vikas.shivappa@intel.com
      Cc: ak@linux.intel.com
      Cc: davidcc@google.com
      Cc: reinette.chatre@intel.com
      Link: http://lkml.kernel.org/r/1501017287-28083-5-git-send-email-vikas.shivappa@linux.intel.com
      05830204
    • V
      x86/intel_rdt: Introduce a common compile option for RDT · f01d7d51
      Vikas Shivappa 提交于
      We currently have a CONFIG_RDT_A which is for RDT(Resource directory
      technology) allocation based resctrl filesystem interface. As a
      preparation to add support for RDT monitoring as well into the same
      resctrl filesystem, change the config option to be CONFIG_RDT which
      would include both RDT allocation and monitoring code.
      
      No functional change.
      Signed-off-by: NVikas Shivappa <vikas.shivappa@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: ravi.v.shankar@intel.com
      Cc: tony.luck@intel.com
      Cc: fenghua.yu@intel.com
      Cc: peterz@infradead.org
      Cc: eranian@google.com
      Cc: vikas.shivappa@intel.com
      Cc: ak@linux.intel.com
      Cc: davidcc@google.com
      Cc: reinette.chatre@intel.com
      Link: http://lkml.kernel.org/r/1501017287-28083-4-git-send-email-vikas.shivappa@linux.intel.com
      f01d7d51
    • V
      x86/perf/cqm: Wipe out perf based cqm · c39a0e2c
      Vikas Shivappa 提交于
      'perf cqm' never worked due to the incompatibility between perf
      infrastructure and cqm hardware support.  The hardware uses RMIDs to
      track the llc occupancy of tasks and these RMIDs are per package. This
      makes monitoring a hierarchy like cgroup along with monitoring of tasks
      separately difficult and several patches sent to lkml to fix them were
      NACKed. Further more, the following issues in the current perf cqm make
      it almost unusable:
      
          1. No support to monitor the same group of tasks for which we do
          allocation using resctrl.
      
          2. It gives random and inaccurate data (mostly 0s) once we run out
          of RMIDs due to issues in Recycling.
      
          3. Recycling results in inaccuracy of data because we cannot
          guarantee that the RMID was stolen from a task when it was not
          pulling data into cache or even when it pulled the least data. Also
          for monitoring llc_occupancy, if we stop using an RMID_x and then
          start using an RMID_y after we reclaim an RMID from an other event,
          we miss accounting all the occupancy that was tagged to RMID_x at a
          later perf_count.
      
          2. Recycling code makes the monitoring code complex including
          scheduling because the event can lose RMID any time. Since MBM
          counters count bandwidth for a period of time by taking snap shot of
          total bytes at two different times, recycling complicates the way we
          count MBM in a hierarchy. Also we need a spin lock while we do the
          processing to account for MBM counter overflow. We also currently
          use a spin lock in scheduling to prevent the RMID from being taken
          away.
      
          4. Lack of support when we run different kind of event like task,
          system-wide and cgroup events together. Data mostly prints 0s. This
          is also because we can have only one RMID tied to a cpu as defined
          by the cqm hardware but a perf can at the same time tie multiple
          events during one sched_in.
      
          5. No support of monitoring a group of tasks. There is partial support
          for cgroup but it does not work once there is a hierarchy of cgroups
          or if we want to monitor a task in a cgroup and the cgroup itself.
      
          6. No support for monitoring tasks for the lifetime without perf
          overhead.
      
          7. It reported the aggregate cache occupancy or memory bandwidth over
          all sockets. But most cloud and VMM based use cases want to know the
          individual per-socket usage.
      Signed-off-by: NVikas Shivappa <vikas.shivappa@linux.intel.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: ravi.v.shankar@intel.com
      Cc: tony.luck@intel.com
      Cc: fenghua.yu@intel.com
      Cc: peterz@infradead.org
      Cc: eranian@google.com
      Cc: vikas.shivappa@intel.com
      Cc: ak@linux.intel.com
      Cc: davidcc@google.com
      Cc: reinette.chatre@intel.com
      Link: http://lkml.kernel.org/r/1501017287-28083-2-git-send-email-vikas.shivappa@linux.intel.com
      c39a0e2c
  2. 27 7月, 2017 1 次提交
  3. 21 7月, 2017 1 次提交
  4. 20 7月, 2017 2 次提交
    • J
      debug: Fix WARN_ON_ONCE() for modules · 325cdacd
      Josh Poimboeuf 提交于
      Mike Galbraith reported a situation where a WARN_ON_ONCE() call in DRM
      code turned into an oops.  As it turns out, WARN_ON_ONCE() seems to be
      completely broken when called from a module.
      
      The bug was introduced with the following commit:
      
        19d43626 ("debug: Add _ONCE() logic to report_bug()")
      
      That commit changed WARN_ON_ONCE() to move its 'once' logic into the bug
      trap handler.  It requires a writable bug table so that the BUGFLAG_DONE
      bit can be written to the flags to indicate the first warning has
      occurred.
      
      The bug table was made writable for vmlinux, which relies on
      vmlinux.lds.S and vmlinux.lds.h for laying out the sections.  However,
      it wasn't made writable for modules, which rely on the ELF section
      header flags.
      Reported-by: NMike Galbraith <efault@gmx.de>
      Tested-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 19d43626 ("debug: Add _ONCE() logic to report_bug()")
      Link: http://lkml.kernel.org/r/a53b04235a65478dd9afc51f5b329fdc65c84364.1500095401.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      325cdacd
    • A
      x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl · 7206f9bf
      Arnd Bergmann 提交于
      The x86 version of insb/insw/insl uses an inline assembly that does
      not have the target buffer listed as an output. This can confuse
      the compiler, leading it to think that a subsequent access of the
      buffer is uninitialized:
      
        drivers/net/wireless/wl3501_cs.c: In function ‘wl3501_mgmt_scan_confirm’:
        drivers/net/wireless/wl3501_cs.c:665:9: error: ‘sig.status’ is used uninitialized in this function [-Werror=uninitialized]
        drivers/net/wireless/wl3501_cs.c:668:12: error: ‘sig.cap_info’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
        drivers/net/sb1000.c: In function 'sb1000_rx':
        drivers/net/sb1000.c:775:9: error: 'st[0]' is used uninitialized in this function [-Werror=uninitialized]
        drivers/net/sb1000.c:776:10: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
        drivers/net/sb1000.c:784:11: error: 'st[1]' may be used uninitialized in this function [-Werror=maybe-uninitialized]
      
      I tried to mark the exact input buffer as an output here, but couldn't
      figure it out. As suggested by Linus, marking all memory as clobbered
      however is good enough too. For the outs operations, I also add the
      memory clobber, to force the input to be written to local variables.
      This is probably already guaranteed by the "asm volatile", but it can't
      hurt to do this for symmetry.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tom Lendacky <thomas.lendacky@amd.com>
      Link: http://lkml.kernel.org/r/20170719125310.2487451-5-arnd@arndb.de
      Link: https://lkml.org/lkml/2017/7/12/605Signed-off-by: NIngo Molnar <mingo@kernel.org>
      7206f9bf
  5. 18 7月, 2017 1 次提交
    • R
      x86/mm, KVM: Fix warning when !CONFIG_PREEMPT_COUNT · 4c07f904
      Roman Kagan 提交于
      A recent commit:
      
        d6e41f11 ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant")
      
      introduced a VM_WARN_ON(!in_atomic()) which generates false positives
      on every VM entry on !CONFIG_PREEMPT_COUNT kernels.
      
      Replace it with a test for preemptible(), which appears to match the
      original intent and works across different CONFIG_PREEMPT* variations.
      Signed-off-by: NRoman Kagan <rkagan@virtuozzo.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: Borislav Petkov <bpetkov@suse.de>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Nadav Amit <namit@vmware.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kvm@vger.kernel.org
      Cc: linux-mm@kvack.org
      Fixes: d6e41f11 ("x86/mm, KVM: Teach KVM's VMX code that CR3 isn't a constant")
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      4c07f904
  6. 14 7月, 2017 5 次提交
  7. 13 7月, 2017 9 次提交
  8. 11 7月, 2017 1 次提交
    • K
      binfmt_elf: use ELF_ET_DYN_BASE only for PIE · eab09532
      Kees Cook 提交于
      The ELF_ET_DYN_BASE position was originally intended to keep loaders
      away from ET_EXEC binaries.  (For example, running "/lib/ld-linux.so.2
      /bin/cat" might cause the subsequent load of /bin/cat into where the
      loader had been loaded.)
      
      With the advent of PIE (ET_DYN binaries with an INTERP Program Header),
      ELF_ET_DYN_BASE continued to be used since the kernel was only looking
      at ET_DYN.  However, since ELF_ET_DYN_BASE is traditionally set at the
      top 1/3rd of the TASK_SIZE, a substantial portion of the address space
      is unused.
      
      For 32-bit tasks when RLIMIT_STACK is set to RLIM_INFINITY, programs are
      loaded above the mmap region.  This means they can be made to collide
      (CVE-2017-1000370) or nearly collide (CVE-2017-1000371) with
      pathological stack regions.
      
      Lowering ELF_ET_DYN_BASE solves both by moving programs below the mmap
      region in all cases, and will now additionally avoid programs falling
      back to the mmap region by enforcing MAP_FIXED for program loads (i.e.
      if it would have collided with the stack, now it will fail to load
      instead of falling back to the mmap region).
      
      To allow for a lower ELF_ET_DYN_BASE, loaders (ET_DYN without INTERP)
      are loaded into the mmap region, leaving space available for either an
      ET_EXEC binary with a fixed location or PIE being loaded into mmap by
      the loader.  Only PIE programs are loaded offset from ELF_ET_DYN_BASE,
      which means architectures can now safely lower their values without risk
      of loaders colliding with their subsequently loaded programs.
      
      For 64-bit, ELF_ET_DYN_BASE is best set to 4GB to allow runtimes to use
      the entire 32-bit address space for 32-bit pointers.
      
      Thanks to PaX Team, Daniel Micay, and Rik van Riel for inspiration and
      suggestions on how to implement this solution.
      
      Fixes: d1fd836d ("mm: split ET_DYN ASLR from mmap ASLR")
      Link: http://lkml.kernel.org/r/20170621173201.GA114489@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Daniel Micay <danielmicay@gmail.com>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Pratyush Anand <panand@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eab09532
  9. 07 7月, 2017 1 次提交
  10. 05 7月, 2017 3 次提交
    • C
      x86/boot/e820: Introduce the bootloader provided e820_table_firmware[] table · 12df216c
      Chen Yu 提交于
      Add the real e820_tabel_firmware[] that will not be modified by the kernel
      or the EFI boot stub under any circumstance.
      
      In addition to that modify the code so that e820_table_firmwarep[] is
      exposed via sysfs to represent the real firmware memory layout,
      rather than exposing the e820_table_kexec[] table.
      
      This fixes a hibernation bug/warning, which uses e820_table_kexec[] to check
      RAM layout consistency across hibernation/resume:
      
        The suspend kernel:
        [    0.000000] e820: update [mem 0x76671018-0x76679457] usable ==> usable
      
        The resume kernel:
        [    0.000000] e820: update [mem 0x7666f018-0x76677457] usable ==> usable
        ...
        [   15.752088] PM: Using 3 thread(s) for decompression.
        [   15.752088] PM: Loading and decompressing image data (471870 pages)...
        [   15.764971] Hibernate inconsistent memory map detected!
        [   15.770833] PM: Image mismatch: architecture specific data
      
      Actually it is safe to restore these pages because E820_TYPE_RAM and
      E820_TYPE_RESERVED_KERN are treated the same during hibernation, so
      the original e820 table provided by the bootloader is used for
      hibernation MD5 fingerprint checking.
      
      The side effect is that, this newly introduced variable might increase the
      kernel size at compile time.
      Suggested-by: NIngo Molnar <mingo@redhat.com>
      Signed-off-by: NChen Yu <yu.c.chen@intel.com>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      12df216c
    • C
      x86/boot/e820: Rename the e820_table_firmware to e820_table_kexec · a09bae0f
      Chen Yu 提交于
      Currently the e820_table_firmware[] table is mainly used by the kexec,
      and it is not what it's supposed to be - despite its name it might be
      modified by the kernel.
      
      So change its name to e820_table_kexec[]. In the next patch we will
      introduce the real e820_table_firmware[] table.
      
      No functional change.
      Signed-off-by: NChen Yu <yu.c.chen@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Xunlei Pang <xlpang@redhat.com>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a09bae0f
    • M
      x86/mm/pat: Don't report PAT on CPUs that don't support it · 99c13b8c
      Mikulas Patocka 提交于
      The pat_enabled() logic is broken on CPUs which do not support PAT and
      where the initialization code fails to call pat_init(). Due to that the
      enabled flag stays true and pat_enabled() returns true wrongfully.
      
      As a consequence the mappings, e.g. for Xorg, are set up with the wrong
      caching mode and the required MTRR setups are omitted.
      
      To cure this the following changes are required:
      
        1) Make pat_enabled() return true only if PAT initialization was
           invoked and successful.
      
        2) Invoke init_cache_modes() unconditionally in setup_arch() and
           remove the extra callsites in pat_disable() and the pat disabled
           code path in pat_init().
      
      Also rename __pat_enabled to pat_disabled to reflect the real purpose of
      this variable.
      
      Fixes: 9cd25aac ("x86/mm/pat: Emulate PAT when it is disabled")
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Bernhard Held <berny156@gmx.de>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: "Luis R. Rodriguez" <mcgrof@suse.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/r/alpine.LRH.2.02.1707041749300.3456@file01.intranet.prod.int.rdu2.redhat.com
      99c13b8c
  11. 04 7月, 2017 1 次提交
  12. 03 7月, 2017 3 次提交
  13. 01 7月, 2017 2 次提交
    • K
      randstruct: opt-out externally exposed function pointer structs · 8acdf505
      Kees Cook 提交于
      Some function pointer structures are used externally to the kernel, like
      the paravirt structures. These should never be randomized, so mark them
      as such, in preparation for enabling randstruct's automatic selection
      of all-function-pointer structures.
      
      These markings are verbatim from Brad Spengler/PaX Team's code in the
      last public patch of grsecurity/PaX based on my understanding of the
      code. Changes or omissions from the original code are mine and don't
      reflect the original grsecurity/PaX code.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      8acdf505
    • K
      randstruct: Mark various structs for randomization · 3859a271
      Kees Cook 提交于
      This marks many critical kernel structures for randomization. These are
      structures that have been targeted in the past in security exploits, or
      contain functions pointers, pointers to function pointer tables, lists,
      workqueues, ref-counters, credentials, permissions, or are otherwise
      sensitive. This initial list was extracted from Brad Spengler/PaX Team's
      code in the last public patch of grsecurity/PaX based on my understanding
      of the code. Changes or omissions from the original code are mine and
      don't reflect the original grsecurity/PaX code.
      
      Left out of this list is task_struct, which requires special handling
      and will be covered in a subsequent patch.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      3859a271
  14. 29 6月, 2017 2 次提交
  15. 28 6月, 2017 4 次提交
  16. 24 6月, 2017 1 次提交