1. 31 Jan, 2013 1 commit
    • efi: Make 'efi_enabled' a function to query EFI facilities · 83e68189
      Matt Fleming committed
      Originally 'efi_enabled' indicated whether a kernel was booted from
      EFI firmware. Over time its semantics have changed, and it now
      indicates whether or not we are booted on an EFI machine with
      bit-native firmware, e.g. 64-bit kernel with 64-bit firmware.
      
      The immediate motivation for this patch is the bug report at,
      
          https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557
      
      which details how running a platform driver on an EFI machine that is
      designed to run under BIOS can cause the machine to become
      bricked. Also, the following report,
      
          https://bugzilla.kernel.org/show_bug.cgi?id=47121
      
      details how running said driver can also cause Machine Check
      Exceptions. Drivers need a new means of detecting whether they're
      running on an EFI machine, as sadly the expression,
      
          if (!efi_enabled)
      
      hasn't been a sufficient condition for quite some time.
      
      Users actually want to query 'efi_enabled' for different reasons -
      what they really want access to is the list of available EFI
      facilities.
      
      For instance, the x86 reboot code needs to know whether it can invoke
      the ResetSystem() function provided by the EFI runtime services, while
      the ACPI OSL code wants to know whether the EFI config tables were
      mapped successfully. There are also checks in some of the platform
      driver code to simply see if they're running on an EFI machine (which
      would make it a bad idea to do BIOS-y things).
      
      This patch is a prereq for the samsung-laptop fix patch.
      
      Cc: David Airlie <airlied@linux.ie>
      Cc: Corentin Chary <corentincj@iksaif.net>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Steve Langasek <steve.langasek@canonical.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
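The idea of querying individual EFI facilities, rather than testing one global flag, can be sketched as a small bitmask API. This is a minimal userspace sketch, not the kernel code from the patch: the facility names, their order, and the helper functions `efi_set_facility()`, `can_reboot_via_efi()` and `bios_driver_may_load()` are assumptions chosen to mirror the use cases the commit message describes.

```c
#include <stdbool.h>

/* Facility identifiers. Names and order are illustrative assumptions. */
enum efi_facility {
    EFI_BOOT,             /* machine was booted from EFI firmware */
    EFI_SYSTEM_TABLES,    /* EFI system table is usable */
    EFI_CONFIG_TABLES,    /* EFI config tables were mapped */
    EFI_RUNTIME_SERVICES, /* runtime services such as ResetSystem() can be called */
    EFI_MEMMAP,           /* EFI memory map is available */
    EFI_64BIT,            /* firmware is 64-bit */
};

/* One bit per facility; starts with nothing available. */
static unsigned long efi_facilities;

/* Hypothetical setter used during early boot as each facility comes up. */
static void efi_set_facility(enum efi_facility f)
{
    efi_facilities |= 1UL << f;
}

/* Query a single facility instead of testing a global "efi_enabled" flag. */
static bool efi_enabled(enum efi_facility f)
{
    return (efi_facilities >> f) & 1UL;
}

/* Reboot code only cares whether runtime services are callable. */
static bool can_reboot_via_efi(void)
{
    return efi_enabled(EFI_RUNTIME_SERVICES);
}

/* A BIOS-only platform driver should refuse to load on any EFI-booted machine. */
static bool bios_driver_may_load(void)
{
    return !efi_enabled(EFI_BOOT);
}
```

With this shape, a machine booted from EFI firmware whose runtime services failed to initialise still reports `EFI_BOOT`, so a BIOS-only driver stays away even though EFI reboot is unavailable.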
  2. 30 Jan, 2013 1 commit
  3. 28 Jan, 2013 6 commits
  4. 25 Jan, 2013 4 commits
    • x86, efi: Set runtime_version to the EFI spec revision · 712ba9e9
      Matt Fleming committed
      efi.runtime_version is erroneously being set to the value of the
      vendor's firmware revision instead of that of the implemented EFI
      specification. We can't deduce which EFI functions are available based
      on the revision of the vendor's firmware since the version scheme is
      likely to be unique to each vendor.
      
      What we really need to know is the revision of the implemented EFI
      specification, which is available in the EFI System Table header.
      
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: stable@vger.kernel.org # 3.7.x
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
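The specification revision in the EFI System Table header packs the major version into the upper 16 bits and the minor version into the lower 16 bits. The sketch below shows how such a value can be decoded; the struct and function names are hypothetical and are not taken from the kernel sources.

```c
#include <stdint.h>

/* Layout of the common EFI table header; the struct name is an assumption. */
struct efi_table_hdr_sketch {
    uint64_t signature;
    uint32_t revision;    /* EFI specification revision, not the vendor's firmware revision */
    uint32_t headersize;
    uint32_t crc32;
    uint32_t reserved;
};

/* Upper 16 bits: major version of the implemented specification. */
static unsigned int efi_spec_major(uint32_t revision)
{
    return revision >> 16;
}

/* Lower 16 bits: minor version of the implemented specification. */
static unsigned int efi_spec_minor(uint32_t revision)
{
    return revision & 0xffff;
}
```

For a header whose `revision` field is `(2 << 16) | 31`, the helpers yield major 2 and minor 31, which is the kind of value a caller can compare against when deciding whether a given runtime function should exist.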
    • x86, efi: fix 32-bit warnings in setup_efi_pci() · bc754790
      Jan Beulich committed
      Fix four similar build warnings on 32-bit (casts between different
      size pointers and integers).
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Stefan Hasko <hasko.stevo@gmail.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
    • x86/msr: Add capabilities check · c903f045
      Alan Cox committed
      At the moment the MSR driver only relies upon file system
      checks. This means that anything as root with any capability set
      can write to MSRs. Historically that wasn't very interesting but
      on modern processors the MSRs are such that writing to them
      provides several ways to execute arbitrary code in kernel space.
      Sample code and documentation on doing this is circulating and
      MSR attacks are used on Windows 64bit rootkits already.
      
      In the Linux case you still need to be able to open the device
      file so the impact is fairly limited and reduces the security of
      some capability and security model based systems down towards
      that of a generic "root owns the box" setup.
      
      Therefore they should require CAP_SYS_RAWIO to prevent an
      elevation of capabilities. The impact of this is fairly minimal
      on most setups because they don't have heavy use of
      capabilities. Those using SELinux, SMACK or AppArmor rules might
      want to consider if their rulesets on the MSR driver could be
      tighter.
      Signed-off-by: Alan Cox <alan@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Horses <stable@kernel.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
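The effect of the check can be illustrated with a small model: passing the file system permission check is no longer enough, and the opener must also hold CAP_SYS_RAWIO. This is a userspace sketch under stated assumptions, not the driver code; `struct task_sketch` and `msr_open_sketch()` are hypothetical names, and a boolean stands in for the kernel's capability lookup.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the calling task's credentials. */
struct task_sketch {
    bool may_open_device_file; /* file system permission check passed */
    bool cap_sys_rawio;        /* task holds CAP_SYS_RAWIO */
};

/*
 * Model of an open handler for the MSR device.
 * Returns 0 on success or a negative errno value on refusal.
 */
static int msr_open_sketch(const struct task_sketch *task)
{
    /* Previously the only gate: can the task open the device file at all? */
    if (!task->may_open_device_file)
        return -EACCES;

    /* Additional gate described by the commit: require CAP_SYS_RAWIO. */
    if (!task->cap_sys_rawio)
        return -EPERM;

    return 0;
}
```

A root process that has dropped CAP_SYS_RAWIO is refused in this model even though it can still open the device file, which is the elevation path the commit message is concerned with.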
    • x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES · 73b664ce
      Maarten Lankhorst committed
      I ran out of free entries when I had CONFIG_DMA_API_DEBUG
      enabled. Some other archs seem to default to 65536, so increase
      this limit for x86 too.
      Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Link: http://lkml.kernel.org/r/50A612AA.7040206@canonical.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 24 Jan, 2013 3 commits
  6. 23 Jan, 2013 1 commit
    • ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL · 9899d11f
      Oleg Nesterov committed
      putreg() assumes that the tracee is not running and pt_regs_access() can
      safely play with its stack.  However a killed tracee can return from
      ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
      that debugger can actually read/modify the kernel stack until the tracee
      does SAVE_REST again.
      
      set_task_blockstep() can race with SIGKILL too and in some sense this
      race is even worse, the very fact the tracee can be woken up breaks the
      logic.
      
      As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
      call, this ensures that nobody can ever wakeup the tracee while the
      debugger looks at it.  Not only this fixes the mentioned problems, we
      can do some cleanups/simplifications in arch_ptrace() paths.
      
      Probably ptrace_unfreeze_traced() needs more callers, for example it
      makes sense to make the tracee killable for oom-killer before
      access_process_vm().
      
      While at it, add the comment into may_ptrace_stop() to explain why
      ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
      Reported-by: Salman Qazi <sqazi@google.com>
      Reported-by: Suleiman Souhlal <suleiman@google.com>
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 20 Jan, 2013 1 commit
    • x86-32: Start out cr0 clean, disable paging before modifying cr3/4 · 021ef050
      H. Peter Anvin committed
      Patch
      
        5a5a51db x86-32: Start out eflags and cr4 clean
      
      ... made x86-32 match x86-64 in that we initialize %eflags and %cr4
      from scratch.  This broke OLPC XO-1.5, because the XO enters the
      kernel with paging enabled, which the kernel doesn't expect.
      
      Since we no longer support 386 (the source of most of the variability
      in %cr0 configuration), we can simply match further x86-64 and
      initialize %cr0 to a fixed value -- the one variable part remaining in
      %cr0 is for FPU control, but all that is handled later on in
      initialization; in particular, configuring %cr0 as if the FPU is
      present until proven otherwise is correct and necessary for the probe
      to work.
      
      To deal with the XO case sanely, explicitly disable paging in %cr0
      before we muck with %cr3, %cr4 or EFER -- those operations are
      inherently unsafe with paging enabled.
      
      NOTE: There is still a lot of 386-related junk in head_32.S which we
      can and should get rid of, however, this is intended as a minimal fix
      whereas the cleanup can be deferred to the next merge window.
      Reported-by: Andres Salomon <dilinger@queued.net>
      Tested-by: Daniel Drake <dsd@laptop.org>
      Link: http://lkml.kernel.org/r/50FA0661.2060400@linux.intel.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  8. 18 Jan, 2013 1 commit
    • efi, x86: Pass a proper identity mapping in efi_call_phys_prelog · b8f2c21d
      Nathan Zimmer committed
      Update efi_call_phys_prelog to install an identity mapping of all available
      memory.  This corrects a bug on very large systems with more than 512 GB, on
      which the BIOS would not be able to access addresses not covered by the mapping.
      
      The result is a crash that looks much like this.
      
      BUG: unable to handle kernel paging request at 000000effd870020
      IP: [<0000000078bce331>] 0x78bce330
      PGD 0
      Oops: 0000 [#1] SMP
      Modules linked in:
      CPU 0
      Pid: 0, comm: swapper/0 Tainted: G        W    3.8.0-rc1-next-20121224-medusa_ntz+ #2 Intel Corp. Stoutland Platform
      RIP: 0010:[<0000000078bce331>]  [<0000000078bce331>] 0x78bce330
      RSP: 0000:ffffffff81601d28  EFLAGS: 00010006
      RAX: 0000000078b80e18 RBX: 0000000000000004 RCX: 0000000000000004
      RDX: 0000000078bcf958 RSI: 0000000000002400 RDI: 8000000000000000
      RBP: 0000000078bcf760 R08: 000000effd870000 R09: 0000000000000000
      R10: 0000000000000000 R11: 00000000000000c3 R12: 0000000000000030
      R13: 000000effd870000 R14: 0000000000000000 R15: ffff88effd870000
      FS:  0000000000000000(0000) GS:ffff88effe400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000effd870020 CR3: 000000000160c000 CR4: 00000000000006b0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81614400)
      Stack:
       0000000078b80d18 0000000000000004 0000000078bced7b ffff880078b81fff
       0000000000000000 0000000000000082 0000000078bce3a8 0000000000002400
       0000000060000202 0000000078b80da0 0000000078bce45d ffffffff8107cb5a
      Call Trace:
       [<ffffffff8107cb5a>] ? on_each_cpu+0x77/0x83
       [<ffffffff8102f4eb>] ? change_page_attr_set_clr+0x32f/0x3ed
       [<ffffffff81035946>] ? efi_call4+0x46/0x80
       [<ffffffff816c5abb>] ? efi_enter_virtual_mode+0x1f5/0x305
       [<ffffffff816aeb24>] ? start_kernel+0x34a/0x3d2
       [<ffffffff816ae5ed>] ? repair_env_string+0x60/0x60
       [<ffffffff816ae2be>] ? x86_64_start_reservations+0xba/0xc1
       [<ffffffff816ae120>] ? early_idt_handlers+0x120/0x120
       [<ffffffff816ae419>] ? x86_64_start_kernel+0x154/0x163
      Code:  Bad RIP value.
      RIP  [<0000000078bce331>] 0x78bce330
       RSP <ffffffff81601d28>
      CR2: 000000effd870020
      ---[ end trace ead828934fef5eab ]---
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Nathan Zimmer <nzimmer@sgi.com>
      Signed-off-by: Robin Holt <holt@sgi.com>
      Signed-off-by: Matt Fleming <matt.fleming@intel.com>
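The 512 GB figure matches the reach of a single top-level page table entry on x86-64 with 4-level paging, where each PGD entry covers 2^39 bytes. The arithmetic sketch below assumes that the old identity mapping covered only the first such entry; the macro and function names are illustrative and are not the kernel's definitions.

```c
#include <stdint.h>

/* With 4-level paging each top-level (PGD) entry maps 2^39 bytes = 512 GB. */
#define SKETCH_PGDIR_SHIFT 39
#define SKETCH_PGDIR_SIZE  (1ULL << SKETCH_PGDIR_SHIFT)
#define SKETCH_PTRS_PER_PGD 512

/* Index of the PGD entry that translates a given address. */
static unsigned int sketch_pgd_index(uint64_t addr)
{
    return (unsigned int)((addr >> SKETCH_PGDIR_SHIFT) & (SKETCH_PTRS_PER_PGD - 1));
}

/* Number of PGD entries an identity mapping of [0, mem_bytes) has to populate. */
static uint64_t sketch_pgd_entries_needed(uint64_t mem_bytes)
{
    return (mem_bytes + SKETCH_PGDIR_SIZE - 1) >> SKETCH_PGDIR_SHIFT;
}
```

The faulting address in the oops above, 0x000000effd870020, falls under PGD index 1, so a mapping that populates only entry 0 cannot translate it; covering memory up to that address takes two entries.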
  9. 17 Jan, 2013 1 commit
    • xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests. · 9174adbe
      Andrew Cooper committed
      This fixes CVE-2013-0190 / XSA-40
      
      There has been an error on the xen_failsafe_callback path for failed
      iret, which causes the stack pointer to be wrong when entering the
      iret_exc error path.  This can result in the kernel crashing.
      
      In the classic kernel case, the relevant code looked a little like:
      
              popl %eax      # Error code from hypervisor
              jz 5f
              addl $16,%esp
              jmp iret_exc   # Hypervisor said iret fault
      5:      addl $16,%esp
                             # Hypervisor said segment selector fault
      
      Here, there are two identical addls on either option of a branch which
      appears to have been optimised by hoisting it above the jz, and
      converting it to an lea, which leaves the flags register unaffected.
      
      In the PVOPS case, the code looks like:
      
              popl_cfi %eax         # Error from the hypervisor
              lea 16(%esp),%esp     # Add $16 before choosing fault path
              CFI_ADJUST_CFA_OFFSET -16
              jz 5f
              addl $16,%esp         # Incorrectly adjust %esp again
              jmp iret_exc
      
      It is possible for unprivileged userspace applications to cause this
      behaviour, for example by loading an LDT code selector, then changing
      the code selector to be not-present.  At this point, there is a race
      condition where it is possible for the hypervisor to return back to
      userspace from an interrupt, fault on its own iret, and inject a
      failsafe_callback into the kernel.
      
      This bug has been present since the introduction of Xen PVOPS support
      in commit 5ead97c8 (xen: Core Xen implementation), in 2.6.23.
      Signed-off-by: Frediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
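The double adjustment can be followed by modelling only the stack pointer arithmetic of the two paths. This is a simulation sketch, not the assembly fix itself: the function names are hypothetical, and a boolean stands in for the condition tested by `jz`.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * PVOPS sequence as quoted in the commit message: the 16-byte adjustment was
 * hoisted above the branch, but the iret-fault path kept its own addl.
 */
static uint32_t esp_after_buggy_path(uint32_t esp, bool iret_fault)
{
    esp += 4;        /* popl_cfi %eax */
    esp += 16;       /* lea 16(%esp),%esp */
    if (iret_fault)
        esp += 16;   /* stray addl $16,%esp before jmp iret_exc */
    return esp;
}

/* Intended behaviour: both paths adjust %esp by 16 exactly once. */
static uint32_t esp_after_intended_path(uint32_t esp, bool iret_fault)
{
    (void)iret_fault; /* the adjustment is the same on either path */
    esp += 4;         /* popl %eax */
    esp += 16;        /* single 16-byte adjustment */
    return esp;
}
```

On the iret-fault path the buggy model ends 16 bytes higher than the intended one, while the segment selector fault path is unaffected, which matches the description of a wrong stack pointer on entry to `iret_exc`.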
  10. 16 Jan, 2013 1 commit
  11. 14 Jan, 2013 2 commits
  12. 12 Jan, 2013 1 commit
    • x86/Sandy Bridge: reserve pages when integrated graphics is present · a9acc536
      Jesse Barnes committed
      SNB graphics devices have a bug that prevents them from accessing certain
      memory ranges, namely anything below 1M and in the pages listed in the
      table.  So reserve those at boot if we detect a SNB gfx device on the
      CPU, to avoid GPU hangs.
      
      Stephane Marchesin had a similar patch to the page allocator awhile
      back, but rather than reserving pages up front, it leaked them at
      allocation time.
      
      [ hpa: made a number of stylistic changes, marked arrays as static
        const, and made less verbose; use "memblock=debug" for full
        verbosity. ]
      Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  13. 10 Jan, 2013 1 commit
    • perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side · a706d965
      David Ahern committed
      This patch is brought to you by the letter 'H'.
      
      Commit 20b279 breaks compatibility with older perf binaries when run with
      precise modifier (:p or :pp) by requiring the exclude_guest attribute to be
      set. Older binaries default exclude_guest to 0 (ie., wanting guest-based
      samples) unless host only profiling is requested (:H modifier). The workaround
      for older binaries is to add H to the modifier list (e.g., -e cycles:ppH -
      toggles exclude_guest to 1). This was deemed unacceptable by Linus:
      
      https://lkml.org/lkml/2012/12/12/570
      
      Between family in town and the fresh snow in Breckenridge there is no time left
      to be working on the proper fix for this over the holidays. In the New Year I
      have more pressing problems to resolve -- like some memory leaks in perf which
      are proving to be elusive -- although the aforementioned snow is probably why
      they are proving to be elusive. Either way I do not have any spare time to work
      on this and from the time I have managed to spend on it the solution is more
      difficult than just moving to a new exclude_guest flag (does not work) or
      flipping the logic to include_guest (which is not as trivial as one would
      think).
      
      So, two options: silently force exclude_guest on as suggested by Gleb which
      means no impact to older perf binaries or revert the original patch which
      caused the breakage.
      
      This patch does the latter -- reverts the original patch that introduced the
      regression. The problem can be revisited in the future as time allows.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Link: http://lkml.kernel.org/r/1356749767-17322-1-git-send-email-dsahern@gmail.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
  14. 08 Jan, 2013 1 commit
  15. 04 Jan, 2013 1 commit
    • X86: drivers: remove __dev* attributes. · a18e3690
      Greg Kroah-Hartman committed
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitconst,
      and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Drake <dsd@laptop.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  16. 27 Dec, 2012 1 commit
  17. 21 Dec, 2012 1 commit
  18. 20 Dec, 2012 7 commits
  19. 19 Dec, 2012 2 commits
  20. 18 Dec, 2012 3 commits
    • Add rcu user eqs exception hooks for async page fault · 9b132fbe
      Li Zhong committed
      This patch adds user eqs exception hooks for async page fault page not
      present code path, to exit the user eqs and re-enter it as necessary.
      
      Async page fault differs from other exceptions in that it may be
      triggered from the idle process, so we still need rcu_irq_enter() and
      rcu_irq_exit() to exit cpu idle eqs when needed, to protect the code
      that needs to use RCU.
      
      As Frederic pointed out it would be safest and simplest to protect the
      whole kvm_async_pf_task_wait(). Otherwise, "we need to check all the
      code there deeply for potential RCU uses and ensure it will never be
      extended later to use RCU.".
      
      However, we'd better re-enter the cpu idle eqs if we get the exception
      in cpu idle eqs, by calling rcu_irq_exit() before native_safe_halt().
      
      So the patch does what Frederic suggested for rcu_irq_*() API usage
      here, except that I moved the rcu_irq_*() pair originally in
      do_async_page_fault() into kvm_async_pf_task_wait().
      
      That's because I think it's better to keep each rcu_irq_*() pair in
      one function ( rcu_irq_exit() after rcu_irq_enter() ), especially here:
      kvm_async_pf_task_wait() has other callers, which might cause
      rcu_irq_exit() to be called without a matching rcu_irq_enter() before it,
      which is illegal if the cpu happens to be in rcu idle state.
      Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
    • xen/vcpu: Fix vcpu restore path. · 9d328a94
      Wei Liu committed
      The runstate of vcpu should be restored for all possible cpus, as well as the
      vcpu info placement.
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Wei Liu <wei.liu2@citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/smp: Use smp_store_boot_cpu_info() to store cpu info for BSP during boot time. · 06d0b5d9
      Konrad Rzeszutek Wilk committed
      Git commit 30106c17
      ("x86, hotplug: Support functions for CPU0 online/offline") alters what
      the call to smp_store_cpu_info() does. For BSP we should use the
      smp_store_boot_cpu_info() and for secondary CPU's the old
      variant of smp_store_cpu_info() should be used. This fixes
      the regression introduced by said commit.
      Reported-and-Tested-by: Sander Eikelenboom <linux@eikelenboom.it>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>