1. 22 Jun, 2014 11 commits
    • x86, irq: Clean up unused IOAPIC interface · 9f354b02
      Jiang Liu committed
      Now we have converted all x86 platforms to use the common irqdomain map
      interface. There's no caller of io_apic_set_pci_routing(),
      setup_IO_APIC_irq_extra() and io_apic_setup_irq_pin_once() any more,
      so kill them.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1402302011-23642-35-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, irq: Introduce two helper functions to support irqdomain map operation · 15a3c7cc
      Jiang Liu committed
      Currently there are multiple entries to program IOAPIC pins, such as
      io_apic_setup_irq_pin_once(), io_apic_set_pci_routing() and
      setup_IO_APIC_irq_extra() etc.
      
      This patch introduces two functions to help consolidate the code to
      program IOAPIC pins. Function mp_set_pin_attr() is used to optionally
      set trigger, polarity and NUMA node property for an IOAPIC pin.
      If mp_set_pin_attr() is not invoked for a pin, the default configuration
      from BIOS will be used.
      
      Function mp_irqdomain_map() is a common implementation of the irqdomain
      map() operation. It figures out the attributes for a pin and then actually
      programs the IOAPIC pin. We hope this will be the only entry point for
      programming IOAPIC pins.
      
      The flow will be:
      1) A caller such as xxx_pci_irq_enable figures out the pin attributes.
      2) Invoke mp_set_pin_attr() to set attributes for a pin. If the pin has
         already been programmed, mp_set_pin_attr() will also detect attribute
         conflicts.
      3) Invoke mp_map_pin_to_irq()
      3.1) If IRQ has already been assigned, return irq_find_mapping()
      3.2) Else irq_create_mapping()
      		->irq_domain_associate()
      			->mp_irqdomain_map()
      				->io_apic_setup_irq_pin()
      
      Thus every pin will be programmed only once by mp_irqdomain_map(), so we
      could kill io_apic_setup_irq_pin_once(), io_apic_set_pci_routing() and
      setup_IO_APIC_irq_extra() etc.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1402302011-23642-30-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
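      The "set attributes, then map once" flow from commit 15a3c7cc can be
      modelled in user space as follows. This is a minimal sketch, not the
      kernel implementation: struct pin_info, set_pin_attr(), map_pin_to_irq(),
      NR_PINS and the starting IRQ number 16 are hypothetical stand-ins for
      mp_set_pin_attr() and mp_map_pin_to_irq().

```c
#include <assert.h>
#include <stdbool.h>

#define NR_PINS 24              /* assumed pin count of one IOAPIC */

/* Hypothetical per-pin bookkeeping; not the kernel's data structure. */
struct pin_info {
    bool attr_set;              /* attributes were recorded for this pin */
    bool programmed;            /* pin has already been mapped to an IRQ */
    int trigger;                /* 0 = edge, 1 = level */
    int polarity;               /* 0 = active high, 1 = active low */
    int irq;                    /* IRQ assigned when the pin was mapped */
};

static struct pin_info pins[NR_PINS];
static int next_irq = 16;       /* assumed first dynamically allocated IRQ */

/* Record attributes; return -1 if they conflict with earlier settings. */
static int set_pin_attr(int pin, int trigger, int polarity)
{
    struct pin_info *p = &pins[pin];

    if (p->attr_set)
        return (p->trigger == trigger && p->polarity == polarity) ? 0 : -1;
    p->trigger = trigger;
    p->polarity = polarity;
    p->attr_set = true;
    return 0;
}

/* Map a pin to an IRQ; the pin is "programmed" only on the first call. */
static int map_pin_to_irq(int pin)
{
    struct pin_info *p = &pins[pin];

    assert(pin >= 0 && pin < NR_PINS);
    if (!p->programmed) {
        p->irq = next_irq++;    /* stands in for irq_create_mapping() */
        p->programmed = true;
    }
    return p->irq;              /* stands in for irq_find_mapping() */
}
```

      Calling map_pin_to_irq() twice for the same pin returns the same IRQ, and
      a second set_pin_attr() call with different attributes reports a conflict.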
    • x86, devicetree, irq: Use common mechanism to support irqdomain · facd8fdb
      Jiang Liu committed
      Now the ioapic driver provides a common interface to create irqdomain,
      so replace the private implementation.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Tony Lindgren <tony@atomide.com>
      Link: http://lkml.kernel.org/r/1402302011-23642-29-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, irq: Enhance mp_register_ioapic() to support irqdomain · 44767bfa
      Jiang Liu committed
      Enhance function mp_register_ioapic() to support irqdomain.
      When registering IOAPIC, caller may provide callbacks and parameters
      for creating irqdomain. The IOAPIC core will create irqdomain later
      if caller has passed in corresponding parameters.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: sfi-devel@simplefirmware.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Tony Lindgren <tony@atomide.com>
      Link: http://lkml.kernel.org/r/1402302011-23642-25-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, irq: Introduce mechanisms to support dynamically allocate IRQ for IOAPIC · d7f3d478
      Jiang Liu committed
      Currently x86 supports identity mapping between GSI (IOAPIC pin) and IRQ
      number, so contiguous IRQs at the low end are statically allocated to IOAPICs
      at boot time. This design makes it hard to support IOAPIC hotplug.
      
      This patch implements basic mechanism to dynamically allocate IRQ on
      demand for IOAPIC pins by using irqdomain framework.
      
      It first adds several fields into struct ioapic to support irqdomain.
      Then it implements an algorithm to dynamically allocate IRQ number
      for IOAPIC pins on demand.
      
      Currently it supports three types of irqdomain:
      1) LEGACY: used to support IOAPICs hosting legacy IRQs and to build an
         identity mapping for legacy IRQs. As a special case, we dynamically
         allocate an IRQ number for an IOAPIC pin that has a GSI number below
         nr_legacy_irqs() but isn't a legacy IRQ. This is for backward
         compatibility and to avoid regressions.
      2) STRICT: build an identity mapping between GSI and IRQ number.
      3) DYNAMIC: dynamically allocate IRQ number for IOAPIC pin on demand.
      
      Legacy (ISA) IRQs are not managed by irqdomain because there may be
      multiple pins sharing the same IRQ number and the current irqdomain only
      supports 1:1 mapping between pins and IRQs.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Link: http://lkml.kernel.org/r/1402302011-23642-24-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
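      The three irqdomain types listed in commit d7f3d478 differ only in how an
      IRQ number is chosen for a pin. The following is a rough user-space sketch
      of that policy. The names enum domain_type and alloc_irq_for_pin() are
      hypothetical, legacy IRQ overrides are ignored, and the identity mapping
      for GSIs at or above the legacy range in a LEGACY domain is an assumption
      that the commit message does not spell out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names mirroring the three types in the commit message. */
enum domain_type { DOMAIN_LEGACY, DOMAIN_STRICT, DOMAIN_DYNAMIC };

/*
 * Pick an IRQ number for the pin with the given GSI.
 * next_dynamic points at the next free dynamically allocated IRQ number.
 */
static int alloc_irq_for_pin(enum domain_type type, int gsi, bool is_legacy_irq,
                             int nr_legacy, int *next_dynamic)
{
    assert(next_dynamic != 0);

    switch (type) {
    case DOMAIN_LEGACY:
        if (is_legacy_irq)
            return gsi;                 /* identity mapping for legacy IRQs */
        if (gsi < nr_legacy)
            return (*next_dynamic)++;   /* special case: low GSI, not legacy */
        return gsi;                     /* assumption: identity otherwise */
    case DOMAIN_STRICT:
        return gsi;                     /* always identity */
    case DOMAIN_DYNAMIC:
        return (*next_dynamic)++;       /* always allocated on demand */
    }
    return -1;
}
```

      With 16 legacy IRQs, a non-legacy pin at GSI 9 in a LEGACY domain gets a
      dynamically allocated number, while a STRICT domain always returns the
      GSI itself.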
    • x86, ACPI, irq: Consolidate algorithm of mapping (ioapic, pin) to IRQ number · 6b9fb708
      Jiang Liu committed
      Currently ACPI and ioapic both implement algorithms to map (ioapic, pin)
      to IRQ number. So consolidate the common part into one place, which is
      also preparing for irqdomain support.
      
      It introduces mp_map_gsi_to_irq(), which will be used to allocate IRQ
      numbers for IOAPIC pins when irqdomain is enabled.
      
      Also rename gsi_to_irq() to map_gsi_to_irq(), later we will introduce
      unmap_gsi_to_irq() when enabling IOAPIC hotplug.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Link: http://lkml.kernel.org/r/1402380812-32446-1-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, irq: Count legacy IRQs by legacy_pic->nr_legacy_irqs instead of NR_IRQS_LEGACY · 95d76acc
      Jiang Liu committed
      Some platforms, such as Intel MID and mshypv, do not support legacy
      interrupt controllers. So count legacy IRQs by legacy_pic->nr_legacy_irqs
      instead of hard-coded NR_IRQS_LEGACY.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: xen-devel@lists.xenproject.org
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Tony Lindgren <tony@atomide.com>
      Acked-by: David Vrabel <david.vrabel@citrix.com>
      Link: http://lkml.kernel.org/r/1402302011-23642-20-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, irq: Introduce some helper utilities to improve readability · 18e48551
      Jiang Liu committed
      It also fixes an off-by-one bug in
      	if ((ioapic_idx > 0) && (irq > NR_IRQS_LEGACY))
      It should be
      	if ((ioapic_idx > 0) && (irq >= NR_IRQS_LEGACY))
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1402302011-23642-17-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
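      The off-by-one fixed in commit 18e48551 shows up only at the boundary
      value. The sketch below compares the two conditions in isolation. The
      helper names are hypothetical, and NR_IRQS_LEGACY is taken to be 16,
      its usual value on x86, so legacy IRQs are numbered 0 to 15 and IRQ 16
      is the first non-legacy one.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_IRQS_LEGACY 16   /* assumed value; legacy IRQs are 0..15 */

/* Original condition: misses irq == NR_IRQS_LEGACY. */
static bool check_before_fix(int ioapic_idx, int irq)
{
    return (ioapic_idx > 0) && (irq > NR_IRQS_LEGACY);
}

/* Fixed condition: irq == NR_IRQS_LEGACY is included. */
static bool check_after_fix(int ioapic_idx, int irq)
{
    assert(irq >= 0);
    return (ioapic_idx > 0) && (irq >= NR_IRQS_LEGACY);
}
```

      The two versions agree for every IRQ except 16, which only the fixed
      condition accepts.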
    • x86, ioapic: Kill unused global variable timer_through_8259 · 4035ed01
      Jiang Liu committed
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Link: http://lkml.kernel.org/r/1402302011-23642-12-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, irq, trivial: Minor improvements of IRQ related code · 3eb2be5f
      Jiang Liu committed
      1) Kill unused MAX_HARDIRQS_PER_CPU.
      2) Improve function prototype declarations.
      3) Simple typo fix, change "gsit" to "gsi".
      4) Use macro VECTOR_UNDEFINED instead of hard-coded -1.
      5) Kill redundant comments.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jiri Kosina <trivial@kernel.org>
      Link: http://lkml.kernel.org/r/1402302011-23642-11-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86, mpparse: Simplify arch/x86/include/asm/mpspec.h · a491cc90
      Jiang Liu committed
      Simplify arch/x86/include/asm/mpspec.h by
      1) Change max_physical_apicid to static as it's only used in apic.c.
      2) Kill declaration of mpc_default_type, it's never defined.
      3) Delete default_acpi_madt_oem_check(), it has already been declared
         in apic.h.
      4) Make default_acpi_madt_oem_check() depend on CONFIG_X86_LOCAL_APIC
         instead of CONFIG_X86_64 to support i386.
      5) Change mp_override_legacy_irq(), mp_config_acpi_legacy_irqs() and
         mp_register_gsi() to static because they are only used in acpi/boot.c.
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Andi Kleen <ak@linux.intel.com>
      Link: http://lkml.kernel.org/r/1402302011-23642-4-git-send-email-jiang.liu@linux.intel.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
  2. 07 Jun, 2014 1 commit
  3. 06 Jun, 2014 1 commit
  4. 05 Jun, 2014 7 commits
    • uprobes/x86: Rename arch_uprobe->def to ->defparam, minor comment updates · 5cdb76d6
      Oleg Nesterov committed
      Purely cosmetic, no changes in .o,
      
      1. As Jim pointed out arch_uprobe->def looks ambiguous, rename it to
         ->defparam.
      
      2. Add the comment into default_post_xol_op() to explain "regs->sp +=".
      
      3. Remove the stale part of the comment in arch_uprobe_analyze_insn().
      Suggested-by: Jim Keniston <jkenisto@us.ibm.com>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
    • sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALL · f6187769
      Fabian Frederick committed
      sys_sgetmask and sys_ssetmask are obsolete system calls no longer
      supported in libc.
      
      This patch replaces the architecture-specific __ARCH_WANT_SYS_SGETMASK with an
      expert-mode configuration option. That option is enabled by default for those
      architectures.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Steven Miao <realmz6@gmail.com>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hwpoison: remove unused global variable in do_machine_check() · 65eb7182
      Chen Yucong committed
      Remove the unused global variable mce_entry and related operations in
      do_machine_check().
      Signed-off-by: Chen Yucong <slaoub@gmail.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: x86 pgtable: require X86_64 for soft-dirty tracker · 2bf01f9f
      Cyrill Gorcunov committed
      Tracking dirty status on 2-level pages requires very ugly macros, and
      considering how old the machines that can only operate without PAE
      mode are, let's drop the soft-dirty tracker from them for code
      simplicity (note that I can't drop all the macros from 2-level pages for now,
      since _PAGE_BIT_PROTNONE and _PAGE_BIT_FILE are still used even without the
      tracker).
      
      Linus proposed completely ripping out soft-dirty support on x86-32 (even
      with PAE), and since we're not planning to support native x86-32
      mode for CRIU, let's do that.
      
      (The soft-dirty tracker is a relatively new feature that is mostly used by
      CRIU, so I don't expect such an API change to cause problems for
      userspace.)
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: x86 pgtable: drop unneeded preprocessor ifdef · 2373eaec
      Cyrill Gorcunov committed
      _PAGE_BIT_FILE (bit 6) is always less than _PAGE_BIT_PROTNONE (bit 8), so
      drop redundant #ifdef.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: enable DMA CMA with swiotlb · 9c5a3621
      Akinobu Mita committed
      The DMA Contiguous Memory Allocator support on x86 is disabled when
      swiotlb config option is enabled.  So DMA CMA is always disabled on
      x86_64 because swiotlb is always enabled.  This patch attempts to support
      DMA CMA with the swiotlb config option enabled.
      
      The contiguous memory allocator on x86 is integrated in the function
      dma_generic_alloc_coherent() which is .alloc callback in nommu_dma_ops
      for dma_alloc_coherent().
      
      x86_swiotlb_alloc_coherent(), which is the .alloc callback in swiotlb_dma_ops,
      first tries to allocate with dma_generic_alloc_coherent() and then
      falls back to swiotlb_alloc_coherent().
      
      The main part of supporting DMA CMA with swiotlb is changing
      x86_swiotlb_free_coherent(), which is the .free callback in swiotlb_dma_ops
      for dma_free_coherent(), so that it can distinguish memory allocated by
      dma_generic_alloc_coherent() from memory allocated by
      swiotlb_alloc_coherent() and release the former with
      dma_generic_free_coherent(), which can handle contiguous memory.  This
      change requires making is_swiotlb_buffer() a global function.
      
      This also requires changing the .free callback in the dma_map_ops for
      amd_gart and sta2x11, because these dma_ops also use
      dma_generic_alloc_coherent().
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Acked-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Don Dutile <ddutile@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
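      The free-path dispatch described in commit 9c5a3621 comes down to an
      address-range test. The sketch below is a user-space model under stated
      assumptions: the pool bounds, the counters and both function names are
      hypothetical, and the range check only stands in for what
      is_swiotlb_buffer() is described as doing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounce-buffer pool bounds: [pool_start, pool_end). */
static uintptr_t pool_start, pool_end;

/* Counters standing in for the two real free routines. */
static int pool_frees, generic_frees;

/* Stand-in for is_swiotlb_buffer(): does the address lie in the pool? */
static bool is_pool_buffer(uintptr_t paddr)
{
    assert(pool_start <= pool_end);
    return paddr >= pool_start && paddr < pool_end;
}

/* Stand-in for the .free callback: pick the matching release path. */
static void free_coherent(uintptr_t paddr)
{
    if (is_pool_buffer(paddr))
        pool_frees++;       /* memory came from the bounce-buffer pool */
    else
        generic_frees++;    /* memory came from the generic allocator */
}
```

      Memory that did not come from the pool is routed to the generic free
      path, which is the one able to release contiguous (CMA) memory.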
    • x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels · c46a7c81
      Mel Gorman committed
      _PAGE_NUMA is currently an alias of _PAGE_PROTNONE to trap NUMA hinting
      faults on x86.  Care is taken such that _PAGE_NUMA is used only in
      situations where the VMA flags distinguish between NUMA hinting faults
      and prot_none faults.  This decision was x86-specific and is conceptually
      difficult, requiring special-casing to distinguish between PROTNONE
      and NUMA ptes based on context.
      
      Fundamentally, we only need the _PAGE_NUMA bit to tell the difference
      between an entry that is really unmapped and a page that is protected
      for NUMA hinting faults as if the PTE is not present then a fault will
      be trapped.
      
      Swap PTEs on x86-64 use the bits after _PAGE_GLOBAL for the offset.
      This patch shrinks the maximum possible swap size and uses the bit to
      uniquely distinguish between NUMA hinting ptes and swap ptes.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Noonan <steven@uplinklabs.net>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. 31 May, 2014 1 commit
    • x86_64: expand kernel stack to 16K · 6538b8ea
      Minchan Kim committed
      While I was testing in-house patches under heavy memory pressure on
      qemu-kvm, the 3.14 kernel crashed randomly. The cause was a kernel stack
      overflow.
      
      When I investigated the problem, the call stack was somewhat deeper
      because it involved reclaim functions, though not the direct reclaim path.
      
      I tried to trim the stack usage of some alloc/reclaim functions and saved
      about a hundred bytes, but the overflow didn't disappear: I hit another
      overflow through a different, deeper call stack on the reclaim/allocator
      path.
      
      Of course, we could sweep every site we have found to reduce stack
      usage, but I'm not sure how long that would save the world (surely,
      lots of developers will add nice features that use the stack again),
      and if we consider other, more complex features in the I/O layer
      and/or reclaim path, it might be better to increase the stack size
      (meanwhile, stack usage on 64-bit machines has doubled compared to 32-bit
      while the stack has stuck at 8K. Hmm, that doesn't seem fair to me, and
      arm64 has already expanded to 16K).
      
      So, my stupid idea is to just expand the stack size and keep an eye
      on the stack consumption of each kernel function via the stacktrace of
      ftrace. For example, we could set a bar such that each function shouldn't
      exceed 200K and emit a warning when some function consumes more at runtime.
      Of course, it could produce false positives, but at least it would give us
      a chance to think it over.
      
      I guess this topic has been discussed several times, so there might be a
      strong reason not to increase the kernel stack size on x86_64 that I don't
      know of, so I'm Ccing the x86_64 maintainers, other MM guys and the virtio
      maintainers.
      
      Here's an example call trace using up the kernel stack:
      
               Depth    Size   Location    (51 entries)
               -----    ----   --------
         0)     7696      16   lookup_address
         1)     7680      16   _lookup_address_cpa.isra.3
         2)     7664      24   __change_page_attr_set_clr
         3)     7640     392   kernel_map_pages
         4)     7248     256   get_page_from_freelist
         5)     6992     352   __alloc_pages_nodemask
         6)     6640       8   alloc_pages_current
         7)     6632     168   new_slab
         8)     6464       8   __slab_alloc
         9)     6456      80   __kmalloc
        10)     6376     376   vring_add_indirect
        11)     6000     144   virtqueue_add_sgs
        12)     5856     288   __virtblk_add_req
        13)     5568      96   virtio_queue_rq
        14)     5472     128   __blk_mq_run_hw_queue
        15)     5344      16   blk_mq_run_hw_queue
        16)     5328      96   blk_mq_insert_requests
        17)     5232     112   blk_mq_flush_plug_list
        18)     5120     112   blk_flush_plug_list
        19)     5008      64   io_schedule_timeout
        20)     4944     128   mempool_alloc
        21)     4816      96   bio_alloc_bioset
        22)     4720      48   get_swap_bio
        23)     4672     160   __swap_writepage
        24)     4512      32   swap_writepage
        25)     4480     320   shrink_page_list
        26)     4160     208   shrink_inactive_list
        27)     3952     304   shrink_lruvec
        28)     3648      80   shrink_zone
        29)     3568     128   do_try_to_free_pages
        30)     3440     208   try_to_free_pages
        31)     3232     352   __alloc_pages_nodemask
        32)     2880       8   alloc_pages_current
        33)     2872     200   __page_cache_alloc
        34)     2672      80   find_or_create_page
        35)     2592      80   ext4_mb_load_buddy
        36)     2512     176   ext4_mb_regular_allocator
        37)     2336     128   ext4_mb_new_blocks
        38)     2208     256   ext4_ext_map_blocks
        39)     1952     160   ext4_map_blocks
        40)     1792     384   ext4_writepages
        41)     1408      16   do_writepages
        42)     1392      96   __writeback_single_inode
        43)     1296     176   writeback_sb_inodes
        44)     1120      80   __writeback_inodes_wb
        45)     1040     160   wb_writeback
        46)      880     208   bdi_writeback_workfn
        47)      672     144   process_one_work
        48)      528     112   worker_thread
        49)      416     240   kthread
        50)      176     176   ret_from_fork
      
      [ Note: the problem is exacerbated by certain gcc versions that seem to
        generate much bigger stack frames due to apparently bad coalescing of
        temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
        35% more stack on the virtio path than 4.8.2 does, for example.
      
        Minchan not only uses such a bad gcc version (4.6.3 in his case), but
        some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
        what causes that kernel_map_pages() frame, for example). But we're
        clearly getting too close.
      
        The VM code also seems to have excessive stack frames partly for the
        same compiler reason, triggered by excessive inlining and lots of
        function arguments.
      
        We need to improve on our stack use, but in the meantime let's do this
        simple stack increase too.  Unlike most earlier reports, there is
        nothing simple that stands out as being really horribly wrong here,
        apart from the fact that the stack frames are just bigger than they
        should need to be.        - Linus ]
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S Tsirkin <mst@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
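      The call trace in commit 6538b8ea peaks at a depth of 7696 bytes, which
      leaves very little room in an 8K stack. The arithmetic can be checked
      with the short sketch below. It assumes 4K pages and a stack size of
      PAGE_SIZE shifted left by an order (1 for 8K, 2 for 16K); the helper
      names are hypothetical, and the calculation ignores any part of the
      stack area that is reserved for other data.

```c
#include <assert.h>

#define PAGE_SIZE 4096UL    /* assumed 4K pages */

/* Stack size for a given order: order 1 gives 8K, order 2 gives 16K. */
static unsigned long thread_size(unsigned int order)
{
    return PAGE_SIZE << order;
}

/* Bytes left between the deepest observed frame and the stack limit. */
static long stack_headroom(unsigned int order, unsigned long peak_depth)
{
    assert(order < 8);
    return (long)thread_size(order) - (long)peak_depth;
}
```

      With the 7696-byte peak from the trace, an 8K stack leaves 496 bytes of
      headroom, while a 16K stack leaves 8688 bytes.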
  6. 28 May, 2014 3 commits
    • PCI: Turn pcibios_penalize_isa_irq() into a weak function · a43ae58c
      Hanjun Guo committed
      pcibios_penalize_isa_irq() is only implemented by x86 now, and legacy ISA
      is not used by some architectures.  Make pcibios_penalize_isa_irq() a
      __weak function to simplify the code.  This removes the need for new
      platforms to add stub implementations of pcibios_penalize_isa_irq().
      
      [bhelgaas: changelog, comments]
      Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      a43ae58c
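
The weak-function pattern described in the commit above can be sketched in user-space C. This is a minimal illustration, not the code from the patch: the `demo_` names are hypothetical, and the kernel's `__weak` is assumed here to correspond to the GCC/Clang `weak` attribute.

```c
#include <assert.h>

/* Hypothetical counter: records how often the default no-op ran. */
static int demo_default_calls;

/*
 * Weak default. The linker uses this definition only when no other
 * object file provides a strong symbol named demo_penalize_isa_irq.
 */
__attribute__((weak)) void demo_penalize_isa_irq(int irq, int active)
{
    (void)irq;
    (void)active;
    demo_default_calls++;   /* platforms without legacy ISA: nothing to do */
}

/* Generic caller: needs no #ifdef and no per-platform stub. */
int demo_assign_irq(int irq)
{
    assert(irq >= 0);       /* precondition of this sketch only */
    demo_penalize_isa_irq(irq, 1);
    return irq;
}

/* Accessor so the effect of the default can be observed. */
int demo_default_call_count(void)
{
    return demo_default_calls;
}
```

A platform that does care about legacy ISA would supply an ordinary (strong) definition of the same function in its own source file, and the linker would pick that one instead of the default.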
    • L
      ACPICA: Clean up redundant definitions already defined elsewhere · 92985ef1
      Lv Zheng committed
      Since mis-order issues have been solved, we can clean up redundant
      definitions that already have defaults in <acpi/platform/acenv.h>.
      
      This patch removes redundant environment definitions for __KERNEL__ surrounded code.
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      92985ef1
    • L
      ACPICA: Linux headers: Add <asm/acenv.h> to remove mis-ordered inclusion of <asm/acpi.h> · 07d83914
      Lv Zheng committed
      There is a mis-ordered inclusion of <asm/acpi.h>.
      
      As we will enforce including <linux/acpi.h> for all Linux ACPI users, we
      can find the inclusion order is as follows:
      
      <linux/acpi.h>
        <acpi/acpi.h>
         <acpi/platform/acenv.h>
          (acenv.h before including aclinux.h)
          <acpi/platform/aclinux.h>
      ...........................................................................
           (aclinux.h before including asm/acpi.h)
           <asm/acpi.h>                             @Redundant@
            (ACPICA specific stuff)
      ...........................................................................
      ...........................................................................
            (Linux ACPI specific stuff) ? - - - - - - - - - - - - +
           (aclinux.h after including asm/acpi.h)   @Invisible@   |
          (acenv.h after including aclinux.h)       @Invisible@   |
         other ACPICA headers                       @Invisible@   |
      ............................................................|..............
        <acpi/acpi_bus.h>                                         |
        <acpi/acpi_drivers.h>                                     |
        <asm/acpi.h> (Excluded)                                   |
         (Linux ACPI specific stuff) ! <- - - - - - - - - - - - - +
      
      NOTE that, in ACPICA, <acpi/platform/acenv.h> plays a role similar to
      the Kconfig generated <generated/autoconf.h> in Linux: it is meant to be
      included before including any ACPICA code.
      
      In the above figure, there is a question mark for "Linux ACPI specific
      stuff" in <asm/acpi.h>, which should be included after all other
      ACPICA header files.  These blocks really need to be moved to the position
      marked with an exclamation mark; otherwise the definitions in the blocks
      marked with "@Invisible@" will be invisible to such architecture specific
      "Linux ACPI specific stuff" header blocks.  This leaves 2 issues:
      1. All environmental definitions in these blocks should have a copy in the
         area marked with "@Redundant@" if they are required by the "Linux ACPI
         specific stuff".
      2. We cannot use any ACPICA defined types in <asm/acpi.h>.
      
      This patch splits architecture specific ACPICA stuff from <asm/acpi.h> to
      fix this issue.
      Signed-off-by: Lv Zheng <lv.zheng@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      07d83914
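
The visibility problem in the figure of the commit above comes down to preprocessor ordering: a block that is processed early cannot see a definition that appears later. The following single-file C sketch imitates that with hypothetical `DEMO_` macros standing in for the real headers; it makes no claim about what the actual ACPICA headers define.

```c
#include <assert.h>

/*
 * Stands in for the "Linux ACPI specific stuff" in <asm/acpi.h> at the
 * position marked "?": it is processed before the default below exists.
 */
#ifdef DEMO_ACENV_DEFAULT
#define DEMO_EARLY_SAW_DEFAULT 1
#else
#define DEMO_EARLY_SAW_DEFAULT 0    /* the later default is invisible here */
#endif

/* Stands in for the part of acenv.h that follows the aclinux.h include. */
#define DEMO_ACENV_DEFAULT 64

/* Stands in for the same block moved to the position marked "!". */
#ifdef DEMO_ACENV_DEFAULT
#define DEMO_LATE_SAW_DEFAULT 1
#else
#define DEMO_LATE_SAW_DEFAULT 0
#endif

int demo_early_saw_default(void) { return DEMO_EARLY_SAW_DEFAULT; }
int demo_late_saw_default(void)  { return DEMO_LATE_SAW_DEFAULT; }

/* Convenience check that states the expected outcome of the sketch. */
void demo_check_visibility(void)
{
    assert(demo_early_saw_default() == 0);
    assert(demo_late_saw_default() == 1);
}
```

Keeping the early block working would require duplicating `DEMO_ACENV_DEFAULT` above it, which corresponds to the "@Redundant@" copies the commit message describes.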
  7. 22 May, 2014 3 commits
    • D
      x86: fix page fault tracing when KVM guest support enabled · 65a7f03f
      Dave Hansen committed
      I noticed on some of my systems that page fault tracing doesn't
      work:
      
      	cd /sys/kernel/debug/tracing
      	echo 1 > events/exceptions/enable
      	cat trace;
      	# nothing shows up
      
      I eventually traced it down to CONFIG_KVM_GUEST.  At least in a
      KVM VM, enabling that option breaks page fault tracing, and
      disabling fixes it.  I tried on some old kernels and this does
      not appear to be a regression: it never worked.
      
      There are two page-fault entry functions today.  One when tracing
      is on and another when it is off.  The KVM code calls do_page_fault()
      directly instead of calling the traced version:
      
      > dotraplinkage void __kprobes
      > do_async_page_fault(struct pt_regs *regs, unsigned long
      > error_code)
      > {
      >         enum ctx_state prev_state;
      >
      >         switch (kvm_read_and_reset_pf_reason()) {
      >         default:
      >                 do_page_fault(regs, error_code);
      >                 break;
      >         case KVM_PV_REASON_PAGE_NOT_PRESENT:
      
      I'm also having problems with the page fault tracing on bare
      metal (same symptom of no trace output).  I'm unsure if it's
      related.
      
      Steven had an alternative to this which has zero overhead when
      tracing is off, whereas this patch includes the standard noops even
      when tracing is disabled.  I'm unconvinced that the extra complexity
      of his approach:
      
      	http://lkml.kernel.org/r/20140508194508.561ed220@gandalf.local.home
      
      is worth it, especially considering that the KVM code is already
      making page fault entry slower here.  This solution is
      dirt-simple.
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: kvm@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      65a7f03f
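
The failure mode described in the commit above (two entry points, with one caller hard-wired to the untraced one) can be modelled in a few lines of user-space C. Every `demo_` name below is hypothetical, and the "checked" variant is only one possible remedy shown for illustration; it is not claimed to be what the patch does.

```c
#include <assert.h>

static int demo_tracing_on;     /* hypothetical "tracing enabled" switch */
static int demo_trace_events;   /* number of trace records emitted */
static int demo_faults_handled; /* number of faults actually handled */

/* Untraced entry point: handles the fault, emits nothing. */
static void demo_do_page_fault(unsigned long address)
{
    (void)address;
    demo_faults_handled++;
}

/* Traced entry point: emits a record, then does the same work. */
static void demo_trace_do_page_fault(unsigned long address)
{
    demo_trace_events++;
    demo_do_page_fault(address);
}

/* Pattern from the commit message: the caller bypasses the traced entry. */
void demo_async_fault_untraced(unsigned long address)
{
    demo_do_page_fault(address);
}

/* One possible remedy: choose the entry point from the tracing switch. */
void demo_async_fault_checked(unsigned long address)
{
    if (demo_tracing_on)
        demo_trace_do_page_fault(address);
    else
        demo_do_page_fault(address);
}

void demo_set_tracing(int on)
{
    assert(on == 0 || on == 1);
    demo_tracing_on = on;
}

int demo_trace_event_count(void) { return demo_trace_events; }
int demo_fault_count(void)       { return demo_faults_handled; }
```

With tracing switched on, a fault routed through `demo_async_fault_untraced()` is handled but leaves no trace record, which matches the "nothing shows up" symptom in the commit message.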
    • P
      KVM: x86: get CPL from SS.DPL · ae9fedc7
      Paolo Bonzini committed
      CS.RPL is not equal to the CPL in the few instructions between
      setting CR0.PE and reloading CS.  And CS.DPL is also not equal
      to the CPL for conforming code segments.
      
      However, SS.DPL *is* always equal to the CPL except for the weird
      case of SYSRET on AMD processors, which sets SS.DPL=SS.RPL from the
      value in the STAR MSR, but forces CPL=3 (Intel instead forces
      SS.DPL=SS.RPL=CPL=3).
      
      So this patch:
      
      - modifies SVM to update the CPL from SS.DPL rather than CS.RPL;
      the above case with SYSRET is not broken further, and the way
      to fix it would be to pass the CPL to userspace and back
      
      - modifies VMX to always return the CPL from SS.DPL (except
      forcing it to 0 if we are emulating real mode via vm86 mode;
      in vm86 mode all DPLs have to be 3, but real mode does allow
      privileged instructions).  It also removes the CPL cache,
      which becomes a duplicate of the SS access rights cache.
      
      This fixes doing KVM_IOCTL_SET_SREGS exactly after setting
      CR0.PE=1 but before CS has been reloaded.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      ae9fedc7
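
To make the distinction in the commit above concrete, the sketch below extracts the RPL from a selector (its low two bits) and the DPL from a descriptor access-rights byte (bits 5 and 6). The helper names are hypothetical, the sample values in the usage note are illustrative only, and nothing here is taken from the KVM sources.

```c
#include <assert.h>
#include <stdint.h>

/* RPL: the low two bits of a segment selector. */
unsigned demo_selector_rpl(uint16_t selector)
{
    return selector & 0x3u;
}

/* DPL: bits 5-6 of the descriptor access-rights byte. */
unsigned demo_access_dpl(uint8_t access)
{
    return (access >> 5) & 0x3u;
}

/* CPL derived as the commit message describes: from SS.DPL. */
unsigned demo_cpl_from_ss(uint8_t ss_access)
{
    unsigned cpl = demo_access_dpl(ss_access);

    assert(cpl <= 3);   /* a two-bit field can only hold 0..3 */
    return cpl;
}
```

For example, a stale CS selector of `0x1003` has RPL 3 even if the CPU is still running at CPL 0, whereas an SS access byte of `0x93` (present, DPL 0, writable data) yields CPL 0 from `demo_cpl_from_ss()`.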
    • P
      KVM: x86: drop set_rflags callback · fb5e336b
      Paolo Bonzini committed
      Not needed anymore now that the CPL is computed directly
      during task switch.
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fb5e336b
  8. 21 May, 2014 3 commits
  9. 16 May, 2014 2 commits
  10. 14 May, 2014 3 commits
  11. 10 May, 2014 2 commits
  12. 09 May, 2014 1 commit
  13. 08 May, 2014 2 commits