1. 31 10月, 2016 7 次提交
    • T
      x86/intel_rdt: Add schemata file · 60ec2440
      Tony Luck 提交于
      Last of the per resource group files. Also mode 0644. This one shows
      the resources available to the group. Syntax depends on whether the
      "cdp" mount option was given. With code/data prioritization disabled
      it is simply a list of masks for each cache domain. Initial value
      allows access to all of the L3 cache on all domains. E.g. on a 2 socket
      Broadwell:
              L3:0=fffff;1=fffff
      With CDP enabled, separate masks for data and instructions are provided:
              L3DATA:0=fffff;1=fffff
              L3CODE:0=fffff;1=fffff
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477692289-37412-9-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      60ec2440
    • F
      x86/intel_rdt: Add tasks files · e02737d5
      Fenghua Yu 提交于
      The root directory all subdirectories are automatically populated with a
      read/write (mode 0644) file named "tasks". When read it will show all the
      task IDs assigned to the resource group. Tasks can be added (one at a time)
      to a group by writing the task ID to the file.  E.g.
      
      Membership in a resource group is indicated by a new field in the
      task_struct "int closid" which holds the CLOSID for each task. The default
      resource group uses CLOSID=0 which means that all existing tasks when the
      resctrl file system is mounted belong to the default group.
      
      If a group is removed, tasks which are members of that group are moved to
      the default group.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477692289-37412-8-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      e02737d5
    • T
      x86/intel_rdt: Add cpus file · 12e0110c
      Tony Luck 提交于
      Now we populate each directory with a read/write (mode 0644) file
      named "cpus". This is used to over-ride the resources available
      to processes in the default resource group when running on specific
      CPUs.  Each "cpus" file reads as a cpumask showing which CPUs belong
      to this resource group. Initially all online CPUs are assigned to
      the default group. They can be added to other groups by writing a
      cpumask to the "cpus" file in the directory for the resource group
      (which will remove them from the previous group to which they were
      assigned). CPU online/offline operations will delete CPUs that go
      offline from whatever group they are in and add new CPUs to the
      default group.
      
      If there are CPUs assigned to a group when the directory is removed,
      they are returned to the default group.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477692289-37412-7-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      12e0110c
    • F
      x86/intel_rdt: Add mkdir to resctrl file system · 60cf5e10
      Fenghua Yu 提交于
      Resource control groups are represented as directories in the resctrl
      file system. The root directory describes the default resources available
      to tasks that have not been assigned specific resources. Other directories
      can be created at the root level to make new resource groups. It is not
      permitted to make directories within other directories.
      
      Hardware uses a CLOSID (Class of service ID) to determine which resource
      limits are currently in effect. The exact number available is enumerated
      by CPUID leaf 0x10, but on current implementations it is a small number.
      We implement a simple bitmask allocator for CLOSIDs.
      
      Each resource control group uses one CLOSID, which limits the total number
      of directories that can be created.
      
      Resource groups can be removed using rmdir.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477692289-37412-6-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      60cf5e10
    • F
      x86/intel_rdt: Add "info" files to resctrl file system · 4e978d06
      Fenghua Yu 提交于
      For the convenience of applications we make the decoded values of some
      of the CPUID values available in read-only (0444) files.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477692289-37412-5-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      4e978d06
    • F
      x86/intel_rdt: Add basic resctrl filesystem support · 5ff193fb
      Fenghua Yu 提交于
      Use kernfs as basis for our user interface filesystem. This patch
      supports mount/umount, and one mount parameter "cdp" to enable code/data
      prioritization (though all we do at this point is ensure that the system
      can support CDP).  The file system is not populated yet in this patch.
      
      [ tglx: Fixed up a few nits and added cdp handling in case of error ]
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477692289-37412-4-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      5ff193fb
    • T
      x86/intel_rdt: Build structures for each resource based on cache topology · 2264d9c7
      Tony Luck 提交于
      We use the cpu hotplug notifier to catch each cpu in turn and look at
      its cache topology w.r.t each of the resource groups. As we discover
      new resources, we initialize the bitmask array for each to the default
      (full access) value.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477692289-37412-3-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      2264d9c7
  2. 27 10月, 2016 5 次提交
    • F
      x86/intel_rdt: Pick up L3/L2 RDT parameters from CPUID · c1c7c3f9
      Fenghua Yu 提交于
      Define struct rdt_resource to hold all the parameterized values for an RDT
      resource and fill in the CPUID enumerated values from leaf 0x10 if
      available. Hard code them for the MSR detected Haswells.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477142405-32078-9-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      c1c7c3f9
    • F
      x86/intel_rdt: Add Haswell feature discovery · 113c6097
      Fenghua Yu 提交于
      Some Haswell generation CPUs support RDT, but they don't enumerate this via
      CPUID.  Use rdmsr_safe() and wrmsr_safe() to probe the MSRs on cpu model 63
      (INTEL_FAM6_HASWELL_X)
      
      Move the relevant defines into a common header file which is shared between
      RDT/CQM and RDT/Allocation to avoid duplication.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477142405-32078-8-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      113c6097
    • F
      x86/intel_rdt: Add CONFIG, Makefile, and basic initialization · 78e99b4a
      Fenghua Yu 提交于
      Introduce CONFIG_INTEL_RDT_A (default: no, dependent on CPU_SUP_INTEL) to
      control inclusion of Resource Director Technology in the build.
      
      Simple init() routine just checks which features are present. If they are
      pr_info() one line summary for each feature for now.
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477142405-32078-7-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      78e99b4a
    • F
      x86/cpufeature: Add RDT CPUID feature bits · 4ab15864
      Fenghua Yu 提交于
      Check CPUID leaves for all the Resource Director Technology (RDT)
      Cache Allocation Technology (CAT) bits.
      
      Presence of allocation features:
        CPUID.(EAX=7H, ECX=0):EBX[bit 15]	X86_FEATURE_RDT_A
      
      L2 and L3 caches are each separately enabled:
        CPUID.(EAX=10H, ECX=0):EBX[bit 1]	X86_FEATURE_CAT_L3
        CPUID.(EAX=10H, ECX=0):EBX[bit 2]	X86_FEATURE_CAT_L2
      
      L3 cache may support independent control of allocation for
      code and data (CDP = Code/Data Prioritization):
        CPUID.(EAX=10H, ECX=1):ECX[bit 2]	X86_FEATURE_CDP_L3
      
      [ tglx: Fixed up Borislavs comments and moved the feature bits into a gap ]
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Acked-by: N"Borislav Petkov" <bp@suse.de>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477142405-32078-5-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      4ab15864
    • F
      x86/intel_cacheinfo: Enable cache id in cache info · d57e3ab7
      Fenghua Yu 提交于
      Cache id is retrieved from APIC ID and CPUID leaf 4 on x86.
      
      For more details please see the section on "Cache ID Extraction
      Parameters" in "Intel 64 Architecture Processor Topology Enumeration".
      
      Also the documentation of the CPUID instruction in the "Intel 64 and
      IA-32 Architectures Software Developer's Manual"
      Signed-off-by: NFenghua Yu <fenghua.yu@intel.com>
      Cc: "Ravi V Shankar" <ravi.v.shankar@intel.com>
      Cc: "Tony Luck" <tony.luck@intel.com>
      Cc: "David Carrillo-Cisneros" <davidcc@google.com>
      Cc: "Sai Prakhya" <sai.praneeth.prakhya@intel.com>
      Cc: "Peter Zijlstra" <peterz@infradead.org>
      Cc: "Stephane Eranian" <eranian@google.com>
      Cc: "Dave Hansen" <dave.hansen@intel.com>
      Cc: "Shaohua Li" <shli@fb.com>
      Cc: "Nilay Vaish" <nilayvaish@gmail.com>
      Cc: "Vikas Shivappa" <vikas.shivappa@linux.intel.com>
      Cc: "Ingo Molnar" <mingo@elte.hu>
      Cc: "Borislav Petkov" <bp@suse.de>
      Cc: "H. Peter Anvin" <h.peter.anvin@intel.com>
      Link: http://lkml.kernel.org/r/1477142405-32078-4-git-send-email-fenghua.yu@intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      d57e3ab7
  3. 22 10月, 2016 1 次提交
    • V
      x86/boot/smp: Don't try to poke disabled/non-existent APIC · ff856051
      Ville Syrjälä 提交于
      Apparently trying to poke a disabled or non-existent APIC
      leads to a box that doesn't even boot. Let's not do that.
      
      No real clue if this is the right fix, but at least my
      P3 machine boots again.
      Signed-off-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: dyoung@redhat.com
      Cc: kexec@lists.infradead.org
      Cc: stable@vger.kernel.org
      Fixes: 2a51fe08 ("arch/x86: Handle non enumerated CPU after physical hotplug")
      Link: http://lkml.kernel.org/r/1477102684-5092-1-git-send-email-ville.syrjala@linux.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      ff856051
  4. 20 10月, 2016 1 次提交
  5. 19 10月, 2016 3 次提交
  6. 16 10月, 2016 3 次提交
    • D
      x86/e820: Don't merge consecutive E820_PRAM ranges · 23446cb6
      Dan Williams 提交于
      Commit:
      
        917db484 ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
      
      ... fixed up the broken manipulations of max_pfn in the presence of
      E820_PRAM ranges.
      
      However, it also broke the sanitize_e820_map() support for not merging
      E820_PRAM ranges.
      
      Re-introduce the enabling to keep resource boundaries between
      consecutive defined ranges. Otherwise, for example, an environment that
      boots with memmap=2G!8G,2G!10G will end up with a single 4G /dev/pmem0
      device instead of a /dev/pmem0 and /dev/pmem1 device 2G in size.
      Reported-by: NDave Chinner <david@fromorbit.com>
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Zhang Yi <yizhan@redhat.com>
      Cc: linux-nvdimm@lists.01.org
      Fixes: 917db484 ("x86/boot: Fix kdump, cleanup aborted E820_PRAM max_pfn manipulation")
      Link: http://lkml.kernel.org/r/147629530854.10618.10383744751594021268.stgit@dwillia2-desk3.amr.corp.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      23446cb6
    • D
      kprobes: Unpoison stack in jprobe_return() for KASAN · 9f7d416c
      Dmitry Vyukov 提交于
      I observed false KSAN positives in the sctp code, when
      sctp uses jprobe_return() in jsctp_sf_eat_sack().
      
      The stray 0xf4 in shadow memory are stack redzones:
      
      [     ] ==================================================================
      [     ] BUG: KASAN: stack-out-of-bounds in memcmp+0xe9/0x150 at addr ffff88005e48f480
      [     ] Read of size 1 by task syz-executor/18535
      [     ] page:ffffea00017923c0 count:0 mapcount:0 mapping:          (null) index:0x0
      [     ] flags: 0x1fffc0000000000()
      [     ] page dumped because: kasan: bad access detected
      [     ] CPU: 1 PID: 18535 Comm: syz-executor Not tainted 4.8.0+ #28
      [     ] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      [     ]  ffff88005e48f2d0 ffffffff82d2b849 ffffffff0bc91e90 fffffbfff10971e8
      [     ]  ffffed000bc91e90 ffffed000bc91e90 0000000000000001 0000000000000000
      [     ]  ffff88005e48f480 ffff88005e48f350 ffffffff817d3169 ffff88005e48f370
      [     ] Call Trace:
      [     ]  [<ffffffff82d2b849>] dump_stack+0x12e/0x185
      [     ]  [<ffffffff817d3169>] kasan_report+0x489/0x4b0
      [     ]  [<ffffffff817d31a9>] __asan_report_load1_noabort+0x19/0x20
      [     ]  [<ffffffff82d49529>] memcmp+0xe9/0x150
      [     ]  [<ffffffff82df7486>] depot_save_stack+0x176/0x5c0
      [     ]  [<ffffffff817d2031>] save_stack+0xb1/0xd0
      [     ]  [<ffffffff817d27f2>] kasan_slab_free+0x72/0xc0
      [     ]  [<ffffffff817d05b8>] kfree+0xc8/0x2a0
      [     ]  [<ffffffff85b03f19>] skb_free_head+0x79/0xb0
      [     ]  [<ffffffff85b0900a>] skb_release_data+0x37a/0x420
      [     ]  [<ffffffff85b090ff>] skb_release_all+0x4f/0x60
      [     ]  [<ffffffff85b11348>] consume_skb+0x138/0x370
      [     ]  [<ffffffff8676ad7b>] sctp_chunk_put+0xcb/0x180
      [     ]  [<ffffffff8676ae88>] sctp_chunk_free+0x58/0x70
      [     ]  [<ffffffff8677fa5f>] sctp_inq_pop+0x68f/0xef0
      [     ]  [<ffffffff8675ee36>] sctp_assoc_bh_rcv+0xd6/0x4b0
      [     ]  [<ffffffff8677f2c1>] sctp_inq_push+0x131/0x190
      [     ]  [<ffffffff867bad69>] sctp_backlog_rcv+0xe9/0xa20
      [ ... ]
      [     ] Memory state around the buggy address:
      [     ]  ffff88005e48f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]  ffff88005e48f400: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ] >ffff88005e48f480: f4 f4 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]                    ^
      [     ]  ffff88005e48f500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ]  ffff88005e48f580: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      [     ] ==================================================================
      
      KASAN stack instrumentation poisons stack redzones on function entry
      and unpoisons them on function exit. If a function exits abnormally
      (e.g. with a longjmp like jprobe_return()), stack redzones are left
      poisoned. Later this leads to random KASAN false reports.
      
      Unpoison stack redzones in the frames we are going to jump over
      before doing actual longjmp in jprobe_return().
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Reviewed-by: NMark Rutland <mark.rutland@arm.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@kernel.org>
      Cc: kasan-dev@googlegroups.com
      Cc: surovegin@google.com
      Cc: rostedt@goodmis.org
      Link: http://lkml.kernel.org/r/1476454043-101898-1-git-send-email-dvyukov@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      9f7d416c
    • D
      kprobes: Avoid false KASAN reports during stack copy · 9254139a
      Dmitry Vyukov 提交于
      Kprobes save and restore raw stack chunks with memcpy().
      With KASAN these chunks can contain poisoned stack redzones,
      as the result memcpy() interceptor produces false
      stack out-of-bounds reports.
      
      Use __memcpy() instead of memcpy() for stack copying.
      __memcpy() is not instrumented by KASAN and does not lead
      to the false reports.
      
      Currently there is a spew of KASAN reports during boot
      if CONFIG_KPROBES_SANITY_TEST is enabled:
      
      [   ] Kprobe smoke test: started
      [   ] ==================================================================
      [   ] BUG: KASAN: stack-out-of-bounds in setjmp_pre_handler+0x17c/0x280 at addr ffff88085259fba8
      [   ] Read of size 64 by task swapper/0/1
      [   ] page:ffffea00214967c0 count:0 mapcount:0 mapping:          (null) index:0x0
      [   ] flags: 0x2fffff80000000()
      [   ] page dumped because: kasan: bad access detected
      [...]
      Reported-by: NCAI Qian <caiqian@redhat.com>
      Tested-by: NCAI Qian <caiqian@redhat.com>
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: kasan-dev@googlegroups.com
      [ Improved various details. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9254139a
  7. 14 10月, 2016 1 次提交
    • W
      x86/smp: Add irq_enter/exit() in smp_reschedule_interrupt() · 1ec6ec14
      Wanpeng Li 提交于
       ===============================
       [ INFO: suspicious RCU usage. ]
       4.8.0+ #24 Not tainted
       -------------------------------
       ./arch/x86/include/asm/msr-trace.h:47 suspicious rcu_dereference_check() usage!
       
       other info that might help us debug this:
       
       RCU used illegally from idle CPU!
       rcu_scheduler_active = 1, debug_locks = 0
       RCU used illegally from extended quiescent state!
       no locks held by swapper/1/0.
       
        [<ffffffff9d492b95>] do_trace_write_msr+0x135/0x140
        [<ffffffff9d06f860>] native_write_msr+0x20/0x30
        [<ffffffff9d065fad>] native_apic_msr_eoi_write+0x1d/0x30
        [<ffffffff9d05bd1d>] smp_reschedule_interrupt+0x1d/0x30
        [<ffffffff9d8daec6>] reschedule_interrupt+0x96/0xa0
      
      Reschedule interrupt may be called in cpu idle state. This causes lockdep 
      check warning above. 
      
      Add irq_enter/exit() in smp_reschedule_interrupt(), irq_enter() tells the RCU 
      subsystems to end the extended quiescent state, so the following trace call in 
      ack_APIC_irq() works correctly.
      Signed-off-by: NWanpeng Li <wanpeng.li@hotmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Link: http://lkml.kernel.org/r/1476409733-5133-1-git-send-email-wanpeng.li@hotmail.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      1ec6ec14
  8. 12 10月, 2016 3 次提交
  9. 08 10月, 2016 4 次提交
    • T
      x86/apic: Prevent pointless warning messages · df610d67
      Thomas Gleixner 提交于
      Markus reported that he sees new warnings:
      
        APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 4/0x84 ignored.
        APIC: NR_CPUS/possible_cpus limit of 4 reached.  Processor 5/0x85 ignored.
      
      This comes from the recent persistant cpuid - nodeid changes. The code
      which emits the warning has been called prior to these changes only for
      enabled processors. Now it's called for disabled processors as well to get
      the possible cpu accounting correct. So if the kernel is compiled for the
      number of actual available/enabled CPUs and the BIOS reports disabled CPUs
      as well then the above warnings are printed.
      
      That's a pointless exercise as it only makes sense if there are more CPUs
      enabled than the kernel supports.
      
      Nake the warning conditional on enabled processors so we are back to the
      state before these changes.
      
      Fixes: 8f54969d ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping") 
      Reported-and-tested-by: NMarkus Trippelsdorf <markus@trippelsdorf.de>
      Cc: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Cc: Dou Liyang <douly.fnst@cn.fujitsu.com>
      Cc: linux-acpi@vger.kernel.org
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1610071549330.19804@nanosSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      df610d67
    • T
      x86/acpi: Prevent LAPIC id 0xff from being accounted · f3bf1dbe
      Thomas Gleixner 提交于
      Yinghai reported that the recent changes to make the cpuid - nodeid
      relationship permanent causes a cpuid ordering regression on a system which
      has 2apic enabled..
      
      The reason is that the ACPI local APIC parser has no sanity check for
      apicid 0xff, which is an invalid id. So a CPU id for this invalid local
      APIC id is allocated and therefor breaks the cpuid ordering.
      
      Add a sanity check to acpi_parse_lapic() which ignores the invalid id.
      
      Fixes: 8f54969d ("x86/acpi: Introduce persistent storage for cpuid <-> apicid mapping")
      Reported-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Gu Zheng <guz.fnst@cn.fujitsu.com>,
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: douly.fnst@cn.fujitsu.com,
      Cc: zhugh.fnst@cn.fujitsu.com
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Lv Zheng <lv.zheng@intel.com>,
      Cc: robert.moore@intel.com
      Cc: linux-acpi@vger.kernel.org
      Link: https://lkml.kernel.org/r/CAE9FiQVQx6FRXT-RdR7Crz4dg5LeUWHcUSy1KacjR+JgU_vGJg@mail.gmail.com
      f3bf1dbe
    • C
      nmi_backtrace: generate one-line reports for idle cpus · 6727ad9e
      Chris Metcalf 提交于
      When doing an nmi backtrace of many cores, most of which are idle, the
      output is a little overwhelming and very uninformative.  Suppress
      messages for cpus that are idling when they are interrupted and just
      emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
      
      We do this by grouping all the cpuidle code together into a new
      .cpuidle.text section, and then checking the address of the interrupted
      PC to see if it lies within that section.
      
      This commit suitably tags x86 and tile idle routines, and only adds in
      the minimal framework for other architectures.
      
      Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.comSigned-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Tested-by: NPetr Mladek <pmladek@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6727ad9e
    • C
      nmi_backtrace: add more trigger_*_cpu_backtrace() methods · 9a01c3ed
      Chris Metcalf 提交于
      Patch series "improvements to the nmi_backtrace code" v9.
      
      This patch series modifies the trigger_xxx_backtrace() NMI-based remote
      backtracing code to make it more flexible, and makes a few small
      improvements along the way.
      
      The motivation comes from the task isolation code, where there are
      scenarios where we want to be able to diagnose a case where some cpu is
      about to interrupt a task-isolated cpu.  It can be helpful to see both
      where the interrupting cpu is, and also an approximation of where the
      cpu that is being interrupted is.  The nmi_backtrace framework allows us
      to discover the stack of the interrupted cpu.
      
      I've tested that the change works as desired on tile, and build-tested
      x86, arm, mips, and sparc64.  For x86 I confirmed that the generic
      cpuidle stuff as well as the architecture-specific routines are in the
      new cpuidle section.  For arm, mips, and sparc I just build-tested it
      and made sure the generic cpuidle routines were in the new cpuidle
      section, but I didn't attempt to figure out which the platform-specific
      idle routines might be.  That might be more usefully done by someone
      with platform experience in follow-up patches.
      
      This patch (of 4):
      
      Currently you can only request a backtrace of either all cpus, or all
      cpus but yourself.  It can also be helpful to request a remote backtrace
      of a single cpu, and since we want that, the logical extension is to
      support a cpumask as the underlying primitive.
      
      This change modifies the existing lib/nmi_backtrace.c code to take a
      cpumask as its basic primitive, and modifies the linux/nmi.h code to use
      the new "cpumask" method instead.
      
      The existing clients of nmi_backtrace (arm and x86) are converted to
      using the new cpumask approach in this change.
      
      The other users of the backtracing API (sparc64 and mips) are converted
      to use the cpumask approach rather than the all/allbutself approach.
      The mips code ignored the "include_self" boolean but with this change it
      will now also dump a local backtrace if requested.
      
      Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.comSigned-off-by: NChris Metcalf <cmetcalf@mellanox.com>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Reviewed-by: NAaron Tomlin <atomlin@redhat.com>
      Reviewed-by: NPetr Mladek <pmladek@suse.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9a01c3ed
  10. 07 10月, 2016 1 次提交
    • P
      arch/x86: Handle non enumerated CPU after physical hotplug · 2a51fe08
      Prarit Bhargava 提交于
      When a CPU is physically added to a system then the MADT table is not
      updated.
      
      If subsequently a kdump kernel is started on that physically added CPU then
      the ACPI enumeration fails to provide the information for this CPU which is
      now the boot CPU of the kdump kernel.
      
      As a consequence, generic_processor_info() is not invoked for that CPU so
      the number of enumerated processors is 0 and none of the initializations,
      including the logical package id management, are performed.
      
      We have code which relies on the correctness of the logical package map and
      other information which is initialized via generic_processor_info().
      Executing such code will result in undefined behaviour or kernel crashes.
      
      This problem applies only to the kdump kernel because a normal kexec will
      switch to the original boot CPU, which is enumerated in MADT, before
      jumping into the kexec kernel.
      
      The boot code already has a check for num_processors equal 0 in
      prefill_possible_map(). We can use that check as an indicator that the
      enumeration of the boot CPU did not happen and invoke generic_processor_info()
      for it. That initializes the relevant data for the boot CPU and therefore
      prevents subsequent failure.
      
      [ tglx: Refined the code and rewrote the changelog ]
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Fixes: 1f12e32f ("x86/topology: Create logical package id")
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: dyoung@redhat.com
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: kexec@lists.infradead.org
      Link: http://lkml.kernel.org/r/1475514432-27682-1-git-send-email-prarit@redhat.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      2a51fe08
  11. 06 10月, 2016 1 次提交
    • J
      x86/unwind: Fix oprofile module link error · cfee9edd
      Josh Poimboeuf 提交于
      When compiling on x86 with CONFIG_OPROFILE=m and CONFIG_FRAME_POINTER=n,
      the oprofile module fails to link:
      
        ERROR: ftrace_graph_ret_addr" [arch/x86/oprofile/oprofile.ko] undefined!
      
      The problem was introduced when oprofile was converted to use the new
      x86 unwinder.  When frame pointers are disabled, the "guess" unwinder's
      unwind_get_return_address() is an inline function which calls
      ftrace_graph_ret_addr(), which is not exported.
      
      Fix it by converting the "guess" version of unwind_get_return_address()
      to an exported out-of-line function, just like its frame pointer
      counterpart.
      Reported-by: NKarl Beldan <karl.beldan@gmail.com>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: ec2ad9cc ("oprofile/x86: Convert x86_backtrace() to use the new unwinder")
      Link: http://lkml.kernel.org/r/be08d589f6474df78364e081c42777e382af9352.1475731632.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      cfee9edd
  12. 05 10月, 2016 1 次提交
  13. 04 10月, 2016 1 次提交
    • M
      x86/irq: Prevent force migration of irqs which are not in the vector domain · db91aa79
      Mika Westerberg 提交于
      When a CPU is about to be offlined we call fixup_irqs() that resets IRQ
      affinities related to the CPU in question. The same thing is also done when
      the system is suspended to S-states like S3 (mem).
      
      For each IRQ we try to complete any on-going move regardless whether the
      IRQ is actually part of x86_vector_domain. For each IRQ descriptor we fetch
      its chip_data, assume it is of type struct apic_chip_data and manipulate it
      by clearing old_domain mask etc. For irq_chips that are not part of the
      x86_vector_domain, like those created by various GPIO drivers, will find
      their chip_data being changed unexpectly.
      
      Below is an example where GPIO chip owned by pinctrl-sunrisepoint.c gets
      corrupted after resume:
      
        # cat /sys/kernel/debug/gpio
        gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
         gpio-511 (                    |sysfs               ) in  hi
      
        # rtcwake -s10 -mmem
        <10 seconds passes>
      
        # cat /sys/kernel/debug/gpio
        gpiochip0: GPIOs 360-511, parent: platform/INT344B:00, INT344B:00:
         gpio-511 (                    |sysfs               ) in  ?
      
      Note '?' in the output. It means the struct gpio_chip ->get function is
      NULL whereas before suspend it was there.
      
      Fix this by first checking that the IRQ belongs to x86_vector_domain before
      we try to use the chip_data as struct apic_chip_data.
      Reported-and-tested-by: NSakari Ailus <sakari.ailus@linux.intel.com>
      Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Cc: stable@vger.kernel.org # 4.4+
      Link: http://lkml.kernel.org/r/20161003101708.34795-1-mika.westerberg@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      db91aa79
  14. 30 9月, 2016 5 次提交
  15. 27 9月, 2016 1 次提交
  16. 22 9月, 2016 2 次提交