1. 13 11月, 2016 1 次提交
    • L
      x86/efi: Retrieve and assign Apple device properties · 58c5475a
      Lukas Wunner 提交于
      Apple's EFI drivers supply device properties which are needed to support
      Macs optimally. They contain vital information which cannot be obtained
      any other way (e.g. Thunderbolt Device ROM). They're also used to convey
      the current device state so that OS drivers can pick up where EFI
      drivers left (e.g. GPU mode setting).
      
      There's an EFI driver dubbed "AAPL,PathProperties" which implements a
      per-device key/value store. Other EFI drivers populate it using a custom
      protocol. The macOS bootloader /System/Library/CoreServices/boot.efi
      retrieves the properties with the same protocol. The kernel extension
      AppleACPIPlatform.kext subsequently merges them into the I/O Kit
      registry (see ioreg(8)) where they can be queried by other kernel
      extensions and user space.
      
      This commit extends the efistub to retrieve the device properties before
      ExitBootServices is called. It assigns them to devices in an fs_initcall
      so that they can be queried with the API in <linux/property.h>.
      
      Note that the device properties will only be available if the kernel is
      booted with the efistub. Distros should adjust their installers to
      always use the efistub on Macs. grub with the "linux" directive will not
      work unless the functionality of this commit is duplicated in grub.
      (The "linuxefi" directive should work but is not included upstream as of
      this writing.)
      
      The custom protocol has GUID 91BD12FE-F6C3-44FB-A5B7-5122AB303AE0 and
      looks like this:
      
      typedef struct {
      	unsigned long version; /* 0x10000 */
      	efi_status_t (*get) (
      		IN	struct apple_properties_protocol *this,
      		IN	struct efi_dev_path *device,
      		IN	efi_char16_t *property_name,
      		OUT	void *buffer,
      		IN OUT	u32 *buffer_len);
      		/* EFI_SUCCESS, EFI_NOT_FOUND, EFI_BUFFER_TOO_SMALL */
      	efi_status_t (*set) (
      		IN	struct apple_properties_protocol *this,
      		IN	struct efi_dev_path *device,
      		IN	efi_char16_t *property_name,
      		IN	void *property_value,
      		IN	u32 property_value_len);
      		/* allocates copies of property name and value */
      		/* EFI_SUCCESS, EFI_OUT_OF_RESOURCES */
      	efi_status_t (*del) (
      		IN	struct apple_properties_protocol *this,
      		IN	struct efi_dev_path *device,
      		IN	efi_char16_t *property_name);
      		/* EFI_SUCCESS, EFI_NOT_FOUND */
      	efi_status_t (*get_all) (
      		IN	struct apple_properties_protocol *this,
      		OUT	void *buffer,
      		IN OUT	u32 *buffer_len);
      		/* EFI_SUCCESS, EFI_BUFFER_TOO_SMALL */
      } apple_properties_protocol;
      
      Thanks to Pedro Vilaça for this blog post which was helpful in reverse
      engineering Apple's EFI drivers and bootloader:
      https://reverse.put.as/2016/06/25/apple-efi-firmware-passwords-and-the-scbo-myth/
      
      If someone at Apple is reading this, please note there's a memory leak
      in your implementation of the del() function as the property struct is
      freed but the name and value allocations are not.
      
      Neither the macOS bootloader nor Apple's EFI drivers check the protocol
      version, but we do to avoid breakage if it's ever changed. It's been the
      same since at least OS X 10.6 (2009).
      
      The get_all() function conveniently fills a buffer with all properties
      in marshalled form which can be passed to the kernel as a setup_data
      payload. The number of device properties is dynamic and can change
      between a first invocation of get_all() (to determine the buffer size)
      and a second invocation (to retrieve the actual buffer), hence the
      peculiar loop which does not finish until the buffer size settles.
      The macOS bootloader does the same.
      
      The setup_data payload is later on unmarshalled in an fs_initcall. The
      idea is that most buses instantiate devices in "subsys" initcall level
      and drivers are usually bound to these devices in "device" initcall
      level, so we assign the properties in-between, i.e. in "fs" initcall
      level.
      
      This assumes that devices to which properties pertain are instantiated
      from a "subsys" initcall or earlier. That should always be the case
      since on macOS, AppleACPIPlatformExpert::matchEFIDevicePath() only
      supports ACPI and PCI nodes and we've fully scanned those buses during
      "subsys" initcall level.
      
      The second assumption is that properties are only needed from a "device"
      initcall or later. Seems reasonable to me, but should this ever not work
      out, an alternative approach would be to store the property sets e.g. in
      a btree early during boot. Then whenever device_add() is called, an EFI
      Device Path would have to be constructed for the newly added device,
      and looked up in the btree. That way, the property set could be assigned
      to the device immediately on instantiation. And this would also work for
      devices instantiated in a deferred fashion. It seems like this approach
      would be more complicated and require more code. That doesn't seem
      justified without a specific use case.
      
      For comparison, the strategy on macOS is to assign properties to objects
      in the ACPI namespace (AppleACPIPlatformExpert::mergeEFIProperties()).
      That approach is definitely wrong as it fails for devices not present in
      the namespace: The NHI EFI driver supplies properties for attached
      Thunderbolt devices, yet on Macs with Thunderbolt 1 only one device
      level behind the host controller is described in the namespace.
      Consequently macOS cannot assign properties for chained devices. With
      Thunderbolt 2 they started to describe three device levels behind host
      controllers in the namespace but this grossly inflates the SSDT and
      still fails if the user daisy-chained more than three devices.
      
      We copy the property names and values from the setup_data payload to
      swappable virtual memory and afterwards make the payload available to
      the page allocator. This is just for the sake of good housekeeping, it
      wouldn't occupy a meaningful amount of physical memory (4444 bytes on my
      machine). Only the payload is freed, not the setup_data header since
      otherwise we'd break the list linkage and we cannot safely update the
      predecessor's ->next link because there's no locking for the list.
      
      The payload is currently not passed on to kexec'ed kernels, same for PCI
      ROMs retrieved by setup_efi_pci(). This can be added later if there is
      demand by amending setup_efi_state(). The payload can then no longer be
      made available to the page allocator of course.
      
      Tested-by: Lukas Wunner <lukas@wunner.de> [MacBookPro9,1]
      Tested-by: Pierre Moreau <pierre.morrow@free.fr> [MacBookPro11,3]
      Signed-off-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NMatt Fleming <matt@codeblueprint.co.uk>
      Cc: Andreas Noever <andreas.noever@gmail.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Pedro Vilaça <reverser@put.as>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: grub-devel@gnu.org
      Cc: linux-efi@vger.kernel.org
      Link: http://lkml.kernel.org/r/20161112213237.8804-9-matt@codeblueprint.co.ukSigned-off-by: NIngo Molnar <mingo@kernel.org>
      58c5475a
  2. 26 10月, 2016 1 次提交
    • J
      x86/dumpstack: Remove raw stack dump · 0ee1dd9f
      Josh Poimboeuf 提交于
      For mostly historical reasons, the x86 oops dump shows the raw stack
      values:
      
        ...
        [registers]
        Stack:
         ffff880079af7350 ffff880079905400 0000000000000000 ffffc900008f3ae0
         ffffffffa0196610 0000000000000001 00010000ffffffff 0000000087654321
         0000000000000002 0000000000000000 0000000000000000 0000000000000000
        Call Trace:
        ...
      
      This seems to be an artifact from long ago, and probably isn't needed
      anymore.  It generally just adds noise to the dump, and it can be
      actively harmful because it leaks kernel addresses.
      
      Linus says:
      
        "The stack dump actually goes back to forever, and it used to be
         useful back in 1992 or so. But it used to be useful mainly because
         stacks were simpler and we didn't have very good call traces anyway. I
         definitely remember having used them - I just do not remember having
         used them in the last ten+ years.
      
         Of course, it's still true that if you can trigger an oops, you've
         likely already lost the security game, but since the stack dump is so
         useless, let's aim to just remove it and make games like the above
         harder."
      
      This also removes the related 'kstack=' cmdline option and the
      'kstack_depth_to_print' sysctl.
      Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/e83bd50df52d8fe88e94d2566426ae40d813bf8f.1477405374.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0ee1dd9f
  3. 25 10月, 2016 1 次提交
    • B
      x86/cpu: Get rid of the show_msr= boot option · 59c6f278
      Borislav Petkov 提交于
      It is useless as it dumps the MSRs too early BUT(!) we do set MSRs later too.
      Also, it dumps only BSP MSRs as it gets called only for CPU 0.
      
      And the MSR range array would need constant updating anyway, and so on
      and so on...
      
      Oh, and we have msr.ko and msr-tools which are the much better solution
      anyway. So off it goes...
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/20161024173844.23038-4-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      59c6f278
  4. 12 10月, 2016 2 次提交
    • M
      Input: i8042 - skip selftest on ASUS laptops · 930e1924
      Marcos Paulo de Souza 提交于
      On suspend/resume cycle, selftest is executed to reset i8042 controller.
      But when this is done in Asus devices, subsequent calls to detect/init
      functions to elantech driver fails. Skipping selftest fixes this problem.
      
      An easier step to reproduce this problem is adding i8042.reset=1 as a
      kernel parameter. On Asus laptops, it'll make the system to start with the
      touchpad already stuck, since psmouse_probe forcibly calls the selftest
      function.
      
      This patch was inspired by John Hiesey's change[1], but, since this problem
      affects a lot of models of Asus, let's avoid running selftests on them.
      
      All models affected by this problem:
      A455LD
      K401LB
      K501LB
      K501LX
      R409L
      V502LX
      X302LA
      X450LCP
      X450LD
      X455LAB
      X455LDB
      X455LF
      Z450LA
      
      [1]: https://marc.info/?l=linux-input&m=144312209020616&w=2
      
      Fixes: "ETPS/2 Elantech Touchpad dies after resume from suspend"
      (https://bugzilla.kernel.org/show_bug.cgi?id=107971)
      Signed-off-by: NMarcos Paulo de Souza <marcos.souza.org@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
      930e1924
    • N
      lib/bitmap.c: enhance bitmap syntax · 2d13e6ca
      Noam Camus 提交于
      Today there are platforms with many CPUs (up to 4K).  Trying to boot only
      part of the CPUs may result in too long string.
      
      For example lets take NPS platform that is part of arch/arc.  This
      platform have SMP system with 256 cores each with 16 HW threads (SMT
      machine) where HW thread appears as CPU to the kernel.  In this example
      there is total of 4K CPUs.  When one tries to boot only part of the HW
      threads from each core the string representing the map may be long...  For
      example if for sake of performance we decided to boot only first half of
      HW threads of each core the map will look like:
      0-7,16-23,32-39,...,4080-4087
      
      This patch introduce new syntax to accommodate with such use case.  I
      added an optional postfix to a range of CPUs which will choose according
      to given modulo the desired range of reminders i.e.:
      
          <cpus range>:sed_size/group_size
      
      For example, above map can be described in new syntax like this:
      0-4095:8/16
      
      Note that this patch is backward compatible with current syntax.
      
      [akpm@linux-foundation.org: rework documentation]
      Link: http://lkml.kernel.org/r/1473579629-4283-1-git-send-email-noamca@mellanox.comSigned-off-by: NNoam Camus <noamca@mellanox.com>
      Cc: David Decotigny <decot@googlers.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Pan Xinhui <xinhui@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2d13e6ca
  5. 27 9月, 2016 2 次提交
  6. 24 9月, 2016 1 次提交
    • S
      arm64: arch_timer: Work around QorIQ Erratum A-008585 · f6dc1576
      Scott Wood 提交于
      Erratum A-008585 says that the ARM generic timer counter "has the
      potential to contain an erroneous value for a small number of core
      clock cycles every time the timer value changes".  Accesses to TVAL
      (both read and write) are also affected due to the implicit counter
      read.  Accesses to CVAL are not affected.
      
      The workaround is to reread TVAL and count registers until successive
      reads return the same value.  Writes to TVAL are replaced with an
      equivalent write to CVAL.
      
      The workaround is to reread TVAL and count registers until successive reads
      return the same value, and when writing TVAL to retry until counter
      reads before and after the write return the same value.
      
      The workaround is enabled if the fsl,erratum-a008585 property is found in
      the timer node in the device tree.  This can be overridden with the
      clocksource.arm_arch_timer.fsl-a008585 boot parameter, which allows KVM
      users to enable the workaround until a mechanism is implemented to
      automatically communicate this information.
      
      This erratum can be found on LS1043A and LS2080A.
      Acked-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NScott Wood <oss@buserror.net>
      [will: renamed read macro to reflect that it's not usually unstable]
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      f6dc1576
  7. 20 9月, 2016 1 次提交
  8. 13 9月, 2016 1 次提交
  9. 09 9月, 2016 1 次提交
    • D
      x86/pkeys: Default to a restrictive init PKRU · acd547b2
      Dave Hansen 提交于
      PKRU is the register that lets you disallow writes or all access to a given
      protection key.
      
      The XSAVE hardware defines an "init state" of 0 for PKRU: its most
      permissive state, allowing access/writes to everything.  Since we start off
      all new processes with the init state, we start all processes off with the
      most permissive possible PKRU.
      
      This is unfortunate.  If a thread is clone()'d [1] before a program has
      time to set PKRU to a restrictive value, that thread will be able to write
      to all data, no matter what pkey is set on it.  This weakens any integrity
      guarantees that we want pkeys to provide.
      
      To fix this, we define a very restrictive PKRU to override the
      XSAVE-provided value when we create a new FPU context.  We choose a value
      that only allows access to pkey 0, which is as restrictive as we can
      practically make it.
      
      This does not cause any practical problems with applications using
      protection keys because we require them to specify initial permissions for
      each key when it is allocated, which override the restrictive default.
      
      In the end, this ensures that threads which do not know how to manage their
      own pkey rights can not do damage to data which is pkey-protected.
      
      I would have thought this was a pretty contrived scenario, except that I
      heard a bug report from an MPX user who was creating threads in some very
      early code before main().  It may be crazy, but folks evidently _do_ it.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Cc: linux-arch@vger.kernel.org
      Cc: Dave Hansen <dave@sr71.net>
      Cc: mgorman@techsingularity.net
      Cc: arnd@arndb.de
      Cc: linux-api@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: luto@kernel.org
      Cc: akpm@linux-foundation.org
      Cc: torvalds@linux-foundation.org
      Link: http://lkml.kernel.org/r/20160729163021.F3C25D4A@viggo.jf.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      acd547b2
  10. 06 9月, 2016 1 次提交
  11. 05 9月, 2016 1 次提交
  12. 31 8月, 2016 1 次提交
  13. 26 8月, 2016 1 次提交
  14. 19 8月, 2016 1 次提交
    • B
      Update the maximum depth of C-state from 6 to 9 · 22c6bbe4
      baolex.ni 提交于
      Hi Jon,
      
      This patch is an old one, we have corrected some minor issues on the newer one.
      Please only review the newest version from my last mail with this subject
      "[PATCH] ACPI: Update the maximum depth of C-state from 6 to 9".
      And I also attached it to this mail.
      
      Thanks,
      Baole
      
      On 7/11/2016 6:37 AM, Jonathan Corbet wrote:
      > On Mon, 4 Jul 2016 09:55:10 +0800
      > "baolex.ni" <baolex.ni@intel.com> wrote:
      >
      >> Currently, CPUIDLE_STATE_MAX has been defined as 10 in the cpuidle head file,
      >> and max_cstate = CPUIDLE_STATE_MAX – 1, so 9 is the right maximum depth of C-state.
      >> This change is reflected in one place of the kernel-param file,
      >> but not in the other place where I suggest changing.
      >>
      >> Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
      >> Signed-off-by: Baole Ni <baolex.ni@intel.com>
      >
      > So why are there two signoffs on a single-line patch?  Which one of you
      > is the actual author?
      >
      > Thanks,
      >
      > jon
      >
      
      From cf5f8aa6885874f6490b11507d3c0c86fa0a11f4 Mon Sep 17 00:00:00 2001
      From: Chuansheng Liu <chuansheng.liu@intel.com>
      Date: Mon, 4 Jul 2016 08:52:51 +0800
      Subject: [PATCH] Update the maximum depth of C-state from 6 to 9
      MIME-Version: 1.0
      Content-Type: text/plain; charset=UTF-8
      Content-Transfer-Encoding: 8bit
      
      Currently, CPUIDLE_STATE_MAX has been defined as 10 in the cpuidle head file,
      and max_cstate = CPUIDLE_STATE_MAX – 1, so 9 is the right maximum depth of C-state.
      This change is reflected in one place of the kernel-param file,
      but not in the other place where I suggest changing.
      Signed-off-by: NChuansheng Liu <chuansheng.liu@intel.com>
      Signed-off-by: NBaole Ni <baolex.ni@intel.com>
      Signed-off-by: NJonathan Corbet <corbet@lwn.net>
      22c6bbe4
  15. 10 8月, 2016 1 次提交
    • M
      PCI: Update "pci=resource_alignment" documentation · 8b078c60
      Mathias Koehrer 提交于
      Some uio based PCI drivers, e.g., uio_cif, do not work if the assigned PCI
      memory resources are not page aligned.  By using the kernel option
      "pci=resource_alignment=<align>@<bus>:<slot>.<func>" it is possible to
      request page alignment for memory resources of devices.
      
      However, this is cumbersome when using several devices, and the
      bus/slot/func addresses may change if devices are added to or removed from
      the system.
      
      Extend the "pci=resource_alignment" option so we can specify the relevant
      devices via PCI vendor, device, subvendor, and subdevice IDs.  The
      specification of the devices via IDs is indicated by a leading string
      "pci:" as argument to "pci=resource_alignment".
      
      The format of the specification is
        pci:<vendor>:<device>[:<subvendor>:<subdevice>]
      
      Examples:
        pci=resource_alignment=4096@pci:8086:9c22:103c:198f
        pci=resource_alignment=pci:8086:9c22       # defaults to PAGE_SIZE align
      
      [bhelgaas: changelog, use actual vendor/device IDs in examples]
      Signed-off-by: NMathias Koehrer <mathias.koehrer@etas.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      8b078c60
  16. 04 8月, 2016 2 次提交
    • P
      modules: Add kernel parameter to blacklist modules · be7de5f9
      Prarit Bhargava 提交于
      Blacklisting a module in linux has long been a problem.  The current
      procedure is to use rd.blacklist=module_name, however, that doesn't
      cover the case after the initramfs and before a boot prompt (where one
      is supposed to use /etc/modprobe.d/blacklist.conf to blacklist
      runtime loading). Using rd.shell to get an early prompt is hit-or-miss,
      and doesn't cover all situations AFAICT.
      
      This patch adds this functionality of permanently blacklisting a module
      by its name via the kernel parameter module_blacklist=module_name.
      
      [v2]: Rusty, use core_param() instead of __setup() which simplifies
      things.
      
      [v3]: Rusty, undo wreckage from strsep()
      
      [v4]: Rusty, simpler version of blacklisted()
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: linux-doc@vger.kernel.org
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      be7de5f9
    • S
      Documenation: update cgroup's document path · 09c3bcce
      seokhoon.yoon 提交于
      cgroup's document path is changed to "cgroup-v1". update it.
      Signed-off-by: Nseokhoon.yoon <iamyooon@gmail.com>
      Signed-off-by: NJonathan Corbet <corbet@lwn.net>
      09c3bcce
  17. 03 8月, 2016 1 次提交
    • B
      printk: add kernel parameter to control writes to /dev/kmsg · 750afe7b
      Borislav Petkov 提交于
      Add a "printk.devkmsg" kernel command line parameter which controls how
      userspace writes into /dev/kmsg.  It has three options:
      
       * ratelimit - ratelimit logging from userspace.
       * on  - unlimited logging from userspace
       * off - logging from userspace gets ignored
      
      The default setting is to ratelimit the messages written to it.
      
      This changes the kernel default setting of "on" to "ratelimit" and we do
      that because we want to keep userspace spamming /dev/kmsg to sane
      levels.  This is especially moot when a small kernel log buffer wraps
      around and messages get lost.  So the ratelimiting setting should be a
      sane setting where kernel messages should have a bit higher chance of
      survival from all the spamming.
      
      It additionally does not limit logging to /dev/kmsg while the system is
      booting if we haven't disabled it on the command line.
      
      Furthermore, we can control the logging from a lower priority sysctl
      interface - kernel.printk_devkmsg.
      
      That interface will succeed only if printk.devkmsg *hasn't* been
      supplied on the command line.  If it has, then printk.devkmsg is a
      one-time setting which remains for the duration of the system lifetime.
      This "locking" of the setting is to prevent userspace from changing the
      logging on us through sysctl(2).
      
      This patch is based on previous patches from Linus and Steven.
      
      [bp@suse.de: fixes]
        Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic
      Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.deSigned-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Franck Bui <fbui@suse.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      750afe7b
  18. 26 7月, 2016 1 次提交
  19. 17 7月, 2016 1 次提交
  20. 14 7月, 2016 1 次提交
  21. 10 7月, 2016 1 次提交
    • R
      PM / hibernate: Image data protection during restoration · 4c0b6c10
      Rafael J. Wysocki 提交于
      Make it possible to protect all pages holding image data during
      hibernate image restoration by setting them read-only (so as to
      catch attempts to write to those pages after image data have been
      stored in them).
      
      This adds overhead to image restoration code (it may cause large
      page mappings to be split as a result of page flags changes) and
      the errors it protects against should never happen in theory, so
      the feature is only active after passing hibernate=protect_image
      to the command line of the restore kernel.
      
      Also it only is built if CONFIG_DEBUG_RODATA is set.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      4c0b6c10
  22. 09 7月, 2016 1 次提交
  23. 05 7月, 2016 1 次提交
  24. 30 6月, 2016 1 次提交
    • H
      ACPI / APEI: Add Boot Error Record Table (BERT) support · a3e2acc5
      Huang Ying 提交于
      ACPI/APEI is designed to verifiy/report H/W errors, like Corrected
      Error(CE) and Uncorrected Error(UC). It contains four tables: HEST,
      ERST, EINJ and BERT. The first three tables have been merged for
      a long time, but because of lacking BIOS support for BERT, the
      support for BERT is pending until now. Recently on ARM 64 platform
      it is has been supported. So here we come.
      
      Under normal circumstances, when a hardware error occurs, kernel will
      be notified via NMI, MCE or some other method, then kernel will
      process the error condition, report it, and recover it if possible.
      But sometime, the situation is so bad, so that firmware may choose to
      reset directly without notifying Linux kernel.
      
      Linux kernel can use the Boot Error Record Table (BERT) to get the
      un-notified hardware errors that occurred in a previous boot. In this
      patch, the error information is reported via printk.
      
      For more information about BERT, please refer to ACPI Specification
      version 6.0, section 18.3.1:
        http://www.uefi.org/sites/default/files/resources/ACPI_6.0.pdf
      
      The following log is a BERT record after system reboot because of hitting
      a fatal memory error:
      BERT: Error records from previous boot:
      [Hardware Error]: It has been corrected by h/w and requires no further action
      [Hardware Error]: event severity: corrected
      [Hardware Error]:  Error 0, type: recoverable
      [Hardware Error]:   section_type: memory error
      [Hardware Error]:   error_status: 0x0000000000000400
      [Hardware Error]:   physical_address: 0xffffffffffffffff
      [Hardware Error]:   card: 1 module: 2 bank: 3 row: 1 column: 2 bit_position: 5
      [Hardware Error]:   error_type: 2, single-bit ECC
      
      [Tomasz Nowicki: Clear error status at the end of error handling]
      [Tony: Applied some cleanups suggested by Fu Wei]
      [Fu Wei: delete EXPORT_SYMBOL_GPL(bert_disable), improve the code]
      Signed-off-by: NHuang Ying <ying.huang@intel.com>
      Signed-off-by: NTomasz Nowicki <tomasz.nowicki@linaro.org>
      Signed-off-by: NChen, Gong <gong.chen@linux.intel.com>
      Tested-by: NJonathan (Zhixiong) Zhang <zjzhang@codeaurora.org>
      Signed-off-by: NFu Wei <fu.wei@linaro.org>
      Tested-by: NTyler Baicar <tbaicar@codeaurora.org>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      a3e2acc5
  25. 28 6月, 2016 2 次提交
  26. 26 6月, 2016 1 次提交
    • K
      x86/KASLR, x86/power: Remove x86 hibernation restrictions · 65fe935d
      Kees Cook 提交于
      With the following fix:
      
        70595b479ce1 ("x86/power/64: Fix crash whan the hibernation code passes control to the image kernel")
      
      ... there is no longer a problem with hibernation resuming a
      KASLR-booted kernel image, so remove the restriction.
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Linux PM list <linux-pm@vger.kernel.org>
      Cc: Logan Gunthorpe <logang@deltatee.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: linux-doc@vger.kernel.org
      Link: http://lkml.kernel.org/r/20160613221002.GA29719@www.outflux.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      65fe935d
  27. 22 6月, 2016 1 次提交
    • K
      PCI: Extending pci=resource_alignment to specify device/vendor IDs · 644a544f
      Koehrer Mathias (ETAS/ESW5) 提交于
      Some uio-based PCI drivers, e.g., uio_cif do not work if the assigned PCI
      memory resources are not page aligned.
      
      By using the kernel option "pci=resource_alignment" it is possible to force
      single PCI boards to use page alignment for their memory resources.
      However, this is fairly cumbersome if several of these boards are in use
      as the specification of the cards has to be done via PCI bus/slot/function
      number which might change, e.g., by adding another board.
      
      Extend the kernel option "pci=resource_alignment" to allow specification of
      relevant devices via PCI device/vendor (and subdevice/subvendor) IDs.  The
      specification of the devices via device/vendor is indicated by a leading
      string "pci:" as argument to "pci=resource_alignment".  The format of the
      specification is pci:<vendor>:<device>[:<subvendor>:<subdevice>]
      Signed-off-by: NMathias Koehrer <mathias.koehrer@etas.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      644a544f
  28. 14 6月, 2016 1 次提交
    • M
      PCI: Put PCIe ports into D3 during suspend · 9d26d3a8
      Mika Westerberg 提交于
      Currently the Linux PCI core does not touch power state of PCI bridges and
      PCIe ports when system suspend is entered.  Leaving them in D0 consumes
      power unnecessarily and may prevent the CPU from entering deeper C-states.
      
      With recent PCIe hardware we can power down the ports to save power given
      that we take into account few restrictions:
      
        - The PCIe port hardware is recent enough, starting from 2015.
      
        - Devices connected to PCIe ports are effectively in D3cold once the port
          is transitioned to D3 (the config space is not accessible anymore and
          the link may be powered down).
      
        - Devices behind the PCIe port need to be allowed to transition to D3cold
          and back.  There is a way both drivers and userspace can forbid this.
      
        - If the device behind the PCIe port is capable of waking the system it
          needs to be able to do so from D3cold.
      
      This patch adds a new flag to struct pci_device called 'bridge_d3'.  This
      flag is set and cleared by the PCI core whenever there is a change in power
      management state of any of the devices behind the PCIe port.  When system
      later on is suspended we only need to check this flag and if it is true
      transition the port to D3 otherwise we leave it in D0.
      
      Also provide override mechanism via command line parameter
      "pcie_port_pm=[off|force]" that can be used to disable or enable the
      feature regardless of the BIOS manufacturing date.
      Tested-by: NLukas Wunner <lukas@wunner.de>
      Signed-off-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      9d26d3a8
  29. 04 6月, 2016 1 次提交
  30. 20 5月, 2016 1 次提交
  31. 05 5月, 2016 2 次提交
  32. 01 5月, 2016 1 次提交
  33. 28 4月, 2016 2 次提交
    • S
      cpufreq: intel_pstate: Enable PPC enforcement for servers · 2b3ec765
      Srinivas Pandruvada 提交于
      For platforms which are controlled via remove node manager, enable _PPC by
      default. These platforms are mostly categorized as enterprise server or
      performance servers. These platforms needs to go through some
      certifications tests, which tests control via _PPC.
      The relative risk of enabling by default is  low as this is is less likely
      that these systems have broken _PSS table.
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      2b3ec765
    • S
      cpufreq: intel_pstate: Enforce _PPC limits · 9522a2ff
      Srinivas Pandruvada 提交于
      Use ACPI _PPC notification to limit max P state driver will request.
      ACPI _PPC change notification is sent by BIOS to limit max P state
      in several cases:
      - Reduce impact of platform thermal condition
      - When Config TDP feature is used, a changed _PPC is sent to
      follow TDP change
      - Remote node managers in server want to control platform power
      via baseboard management controller (BMC)
      
      This change registers with ACPI processor performance lib so that
      _PPC changes are notified to cpufreq core, which in turns will
      result in call to .setpolicy() callback. Also the way _PSS
      table identifies a turbo frequency is not compatible to max turbo
      frequency in intel_pstate, so the very first entry in _PSS needs
      to be adjusted.
      
      This feature can be turned on by using kernel parameters:
      intel_pstate=support_acpi_ppc
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Minor cleanups ]
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      9522a2ff
  34. 26 4月, 2016 1 次提交