1. 26 Jul, 2008 13 commits
    • R
      x86_64: fix ia32 AMD syscall audit fast-path · 024e8ac0
      Roland McGrath committed
      The new code in commit 5cbf1565 has a bug in the version supporting
      the AMD 'syscall' instruction.  It clobbers the user's %ecx register
      value (with the %ebp value).

      This change fixes it.
      Signed-off-by: Roland McGrath <roland@redhat.com>
    • N
      powerpc: Fix boot problem due to AT_BASE_PLATFORM change · fc532f81
      Nathan Lynch committed
      Commit 9115d134 ("powerpc: Enable AT_BASE_PLATFORM aux vector")
      broke boot on 32-bit powerpc systems; we have to use PTRRELOC to
      initialize powerpc_base_platform this early in boot.

      Bug reported by Jon Smirl.
      Signed-off-by: Nathan Lynch <ntl@pobox.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • D
      sparc: Wire up new system calls. · f1373da8
      David S. Miller committed
      This wires up the recently added signalfd4, eventfd2,
      epoll_create1, dup3, pipe2, and inotify_init1 system calls.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      pty: remove unused UNIX98_PTY_COUNT options · 7833351b
      Adrian Bunk committed
      The h8300 and sparc options somehow survived when the code stopped using
      CONFIG_UNIX98_PTY_COUNT.
      Reviewed-by: Robert P. J. Day <rpjday@crashcourse.ca>
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • C
      calgary iommu: use the first kernel's TCE tables in kdump · 95b68dec
      Chandru committed
      The kdump kernel fails to boot with the Calgary IOMMU and the aacraid
      driver on an x366 box.  DMAs started by aacraid in the first kernel
      continue until the driver is loaded in the kdump kernel.  Calgary is
      initialized before aacraid, and creating new TCE tables causes wrong
      DMAs to occur.  Here we try to get the TCE tables of the first kernel
      in the kdump kernel and use them.  While in the kdump kernel we do not
      allocate new TCE tables; instead we read the base address register
      contents of the Calgary IOMMU and use the tables that the registers
      point to.  With these changes the kdump kernel, and hence aacraid, now
      boots normally.
      Signed-off-by: Chandru Siddalingappa <chandru@in.ibm.com>
      Acked-by: Muli Ben-Yehuda <muli@il.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      S390 topology: don't use kthread() for arch_reinit_sched_domains() · 69b895fd
      Oleg Nesterov committed
      Now that it is safe to use get_online_cpus() we can revert
      
      	[S390] cpu topology: Fix possible deadlock.
      	commit: fd781fa2
      
      and call arch_reinit_sched_domains() directly from topology_work_fn().
      Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Gautham R Shenoy <ego@in.ibm.com>
      Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Max Krasnyansky <maxk@qualcomm.com>
      Cc: Paul Jackson <pj@sgi.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      remove unused #include <linux/dirent.h>'s · e8938a62
      Adrian Bunk committed
      Remove some unused #include <linux/dirent.h>'s.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      gpiolib: allow user-selection · 7444a72e
      Michael Buesch committed
      This patch adds functionality to the gpio-lib subsystem to make it
      possible to enable the gpio-lib code even if the architecture code didn't
      request to get it built in.
      
      The architecture code still needs to implement the gpiolib accessor
      functions in its asm/gpio.h file.  This patch adds the implementations for
      x86 and PPC.
      
      With these changes it is possible to run generic GPIO expansion cards on
      every architecture that implements the trivial wrapper functions.  Support
      for more architectures can easily be added.
      Signed-off-by: Michael Buesch <mb@bu3sch.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Samuel Ortiz <sameo@openedhand.com>
      Cc: Kumar Gala <galak@gate.crashing.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Adrian Bunk <bunk@stusta.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      gpio: sysfs interface · d8f388d8
      David Brownell committed
      This adds a simple sysfs interface for GPIOs.
      
          /sys/class/gpio
              /export ... asks the kernel to export a GPIO to userspace
              /unexport ... to return a GPIO to the kernel
              /gpioN ... for each exported GPIO #N
                  /value ... always readable, writes fail for input GPIOs
                  /direction ... r/w as: in, out (default low); write high, low
              /gpiochipN ... for each gpiochip; #N is its first GPIO
                  /base ... (r/o) same as N
                  /label ... (r/o) descriptive, not necessarily unique
                  /ngpio ... (r/o) number of GPIOs; numbered N .. N+(ngpio - 1)
      
      GPIOs claimed by kernel code may be exported by its owner using a new
      gpio_export() call, which should be most useful for driver debugging.
      Such exports may optionally be done without a "direction" attribute.
      
      Userspace may ask to take over a GPIO by writing to a sysfs control file,
      helping to cope with incomplete board support or other "one-off"
      requirements that don't merit full kernel support:
      
        echo 23 > /sys/class/gpio/export
            ... will gpio_request(23, "sysfs") and gpio_export(23);
            use /sys/class/gpio/gpio23/direction to (re)configure it,
            when that GPIO can be used as both input and output.
        echo 23 > /sys/class/gpio/unexport
            ... will gpio_free(23), when it was exported as above
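
      The same export/configure/unexport sequence can be scripted.  What
      follows is a minimal sketch, assuming the directory layout listed
      above (gpioN directories under /sys/class/gpio).  The helper names and
      the "root" parameter are hypothetical; "root" exists only so the
      functions can be pointed at a scratch directory instead of real sysfs.

```python
import os

SYSFS_GPIO = "/sys/class/gpio"  # location given in the commit message


def export_gpio(gpio, root=SYSFS_GPIO):
    """Ask the kernel to export GPIO number `gpio` to userspace."""
    with open(os.path.join(root, "export"), "w") as f:
        f.write(str(gpio))


def unexport_gpio(gpio, root=SYSFS_GPIO):
    """Return GPIO number `gpio` to the kernel."""
    with open(os.path.join(root, "unexport"), "w") as f:
        f.write(str(gpio))


def set_direction(gpio, direction, root=SYSFS_GPIO):
    """Write the direction attribute: in, out (default low), high or low."""
    if direction not in ("in", "out", "high", "low"):
        raise ValueError("unsupported direction: %r" % (direction,))
    with open(os.path.join(root, "gpio%d" % gpio, "direction"), "w") as f:
        f.write(direction)


def read_value(gpio, root=SYSFS_GPIO):
    """Read the value attribute, which is always readable."""
    with open(os.path.join(root, "gpio%d" % gpio, "value")) as f:
        return int(f.read().strip())
```

      On real hardware this needs write permission on the control files, and
      the "direction" attribute may be absent for GPIOs exported without it.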
      
      The extra D-space footprint is a few hundred bytes, except for the sysfs
      resources associated with each exported GPIO.  The additional I-space
      footprint is about two thirds of the current size of gpiolib (!).  Since
      no /dev node creation is involved, no "udev" support is needed.
      
      Related changes:
      
        * This adds a device pointer to "struct gpio_chip".  When GPIO
          providers initialize that, sysfs gpio class devices become children of
          that device instead of being "virtual" devices.
      
        * The (few) gpio_chip providers which have such a device node have
          been updated.
      
        * Some gpio_chip drivers also needed to update their module "owner"
          field ...  for which missing kerneldoc was added.
      
        * Some gpio_chips don't support input GPIOs.  Those GPIOs are now
          flagged appropriately when the chip is registered.
      
      Based on previous patches, and discussion both on and off LKML.
      
      A Documentation/ABI/testing/sysfs-gpio update is ready to submit once this
      merges to mainline.
      
      [akpm@linux-foundation.org: a few maintenance build fixes]
      Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
      Cc: Guennadi Liakhovetski <g.liakhovetski@pengutronix.de>
      Cc: Greg KH <greg@kroah.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • S
      kprobes: improve kretprobe scalability with hashed locking · ef53d9c5
      Srinivasa D S committed
      Currently the list of kretprobe instances is stored in the kretprobe
      object (as used_instances, free_instances) and in the kretprobe hash
      table.  One global kretprobe lock serialises access to these lists.  As
      a result only one kretprobe handler can execute at a time, which hurts
      system performance, particularly on SMP systems and when return probes
      are set on a lot of functions (such as all system calls).

      The solution proposed here uses fine-grained locks that perform better
      on SMP systems than the present kretprobe implementation.
      
      Solution:
      
       1) Instead of having one global lock to protect the kretprobe
          instances present in the kretprobe object and the kretprobe hash
          table, we will have two locks: one for protecting the kretprobe
          hash table and another for the kretprobe object.

       2) We hold the lock present in the kretprobe object while we modify a
          kretprobe instance in the kretprobe object, and we hold the
          per-hash-list lock while modifying kretprobe instances present in
          that hash list.  To prevent deadlock, we never grab a per-hash-list
          lock while holding a kretprobe lock.

       3) We can remove used_instances from struct kretprobe, as we can
          track used kretprobe instances using the kretprobe hash table.
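
      The locking scheme in the three points above can be modelled in a few
      lines.  This is an illustrative sketch only, not the kernel code: the
      class and function names, the table size and the use of Python
      threading locks are all assumptions.  It shows one lock per hash list,
      one lock per kretprobe object, and the ordering rule (the per-probe
      lock may be taken while a hash-list lock is held, never the reverse).

```python
import threading

TABLE_SIZE = 64  # number of hash lists; an arbitrary choice for this sketch


class Instance:
    """One return-probe instance, owned by a Kretprobe."""
    def __init__(self, rp):
        self.rp = rp
        self.task = None


class Kretprobe:
    """Keeps only the free list; used instances live in the hash table."""
    def __init__(self, maxactive):
        self.lock = threading.Lock()
        self.free_instances = [Instance(self) for _ in range(maxactive)]
        self.nmissed = 0


class HashTable:
    """Hash lists of in-flight instances, each with its own lock."""
    def __init__(self):
        self.lists = [[] for _ in range(TABLE_SIZE)]
        self.locks = [threading.Lock() for _ in range(TABLE_SIZE)]

    def bucket(self, task):
        return hash(task) % TABLE_SIZE


def on_entry(rp, table, task):
    """Function entry: take a free instance and hash it by task."""
    with rp.lock:
        if not rp.free_instances:
            rp.nmissed += 1
            return False
        ri = rp.free_instances.pop()
    # rp.lock is released here, so the hash-list lock below is never
    # acquired while a kretprobe lock is held.
    ri.task = task
    b = table.bucket(task)
    with table.locks[b]:
        table.lists[b].append(ri)
    return True


def on_return(table, task):
    """Function return: unhash the newest instance for task and recycle it."""
    b = table.bucket(task)
    with table.locks[b]:
        hlist = table.lists[b]
        for pos in range(len(hlist) - 1, -1, -1):
            if hlist[pos].task == task:
                ri = hlist.pop(pos)
                break
        else:
            return False
        # Allowed order: hash-list lock first, then the kretprobe lock.
        with ri.rp.lock:
            ri.rp.free_instances.append(ri)
    return True
```

      Handlers for tasks that hash to different lists no longer contend on a
      single lock; they only meet briefly on the per-probe free list.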
      
      Time taken for a kernel compilation ("make -j 8") on an 8-way ppc64
      system with return probes set on all system calls:

                cacheline-aligned   non-cacheline-aligned   un-patched
                patch               patch                   kernel
      ================================================================
      real      9m46.784s           9m54.412s               10m2.450s
      user      40m5.715s           40m7.142s               40m4.273s
      sys       2m57.754s           2m58.583s               3m17.430s
      ================================================================

      Time taken for a kernel compilation ("make -j 8") on the same system
      when the kernel is not probed:
      =========================
      real    9m26.389s
      user    40m8.775s
      sys     2m7.283s
      =========================
      Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com>
      Signed-off-by: Jim Keniston <jkenisto@us.ibm.com>
      Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • T
      inflate: refactor inflate malloc code · 2d6ffcca
      Thomas Petazzoni committed
      Inflate requires some dynamic memory allocation very early in the boot
      process and this is provided with a set of four functions:
      malloc/free/gzip_mark/gzip_release.
      
      The old inflate code used a mark/release strategy rather than implement
      free.  This new version instead keeps a count on the number of outstanding
      allocations and when it hits zero, it resets the malloc arena.
      
      This allows removing all the mark and release implementations and unifying
      all the malloc/free implementations.
      
      The architecture-dependent code must define two addresses:
       - free_mem_ptr, the address of the beginning of the area in which
         allocations should be made
       - free_mem_end_ptr, the address of the end of the area in which
         allocations should be made.  If set to 0, no limit is checked on
         allocations; the area just grows as much as needed
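
      The counting strategy described above can be sketched as a small bump
      allocator.  This is a model for illustration, not the kernel's code:
      the Arena class is hypothetical, addresses are plain integers, and the
      4-byte alignment is an assumption of the sketch.

```python
class Arena:
    """Bump allocator that resets once every allocation has been freed."""

    def __init__(self, free_mem_ptr, free_mem_end_ptr=0):
        self.free_mem_ptr = free_mem_ptr          # start of the area
        self.free_mem_end_ptr = free_mem_end_ptr  # 0 means "no limit checked"
        self.malloc_ptr = free_mem_ptr            # next unused address
        self.malloc_count = 0                     # outstanding allocations

    def malloc(self, size):
        if size < 0:
            raise ValueError("negative allocation size")
        start = (self.malloc_ptr + 3) & ~3        # align (assumed 4 bytes)
        end = start + size
        if self.free_mem_end_ptr and end >= self.free_mem_end_ptr:
            raise MemoryError("out of memory")
        self.malloc_ptr = end
        self.malloc_count += 1
        return start

    def free(self, where):
        # Individual blocks are never reused; only the count matters.
        self.malloc_count -= 1
        if self.malloc_count == 0:
            self.malloc_ptr = self.free_mem_ptr   # reset the whole arena
```

      Because memory is reclaimed only when the count reaches zero, this
      suits a decompressor that allocates a few tables, frees them all, and
      repeats, which is the usage pattern the commit message describes.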
      
      The architecture-dependent code can also provide an arch_decomp_wdog()
      function.  This function will be called several times during the
      decompression process, allowing it to notify the watchdog that the
      system is still running.  If an architecture provides such a function,
      it must define ARCH_HAS_DECOMP_WDOG so that the generic inflate code
      calls arch_decomp_wdog().
      
      Work initially done by Matt Mackall, updated to a recent version of the
      kernel and improved by me.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Mikael Starvik <mikael.starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: Paul Mundt <lethal@linux-sh.org>
      Acked-by: Yoshinori Sato <ysato@users.sourceforge.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      introduce HAVE_EFFICIENT_UNALIGNED_ACCESS Kconfig symbol · 58340a07
      Johannes Berg committed
      In many cases, especially in networking, it can be beneficial to know at
      compile time whether the architecture can do unaligned accesses efficiently.
      This patch introduces a new Kconfig symbol
      
      	HAVE_EFFICIENT_UNALIGNED_ACCESS
      
      for that purpose and adds it to the powerpc and x86 architectures.  Also add
      some documentation about alignment and networking, and especially one intended
      use of this symbol.
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: Sam Ravnborg <sam@ravnborg.org>
      Acked-by: Ingo Molnar <mingo@elte.hu> [x86 architecture part]
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • T
      [IA64] Wire up new system calls · 3e4d0cab
      Tony Luck committed
      Six new system calls: signalfd4, eventfd2, epoll_create1,
      dup3, pipe2 and inotify_init1.
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  2. 25 Jul, 2008 27 commits
    • N
      powerpc/pseries: Remove kmalloc call in handling writes to lparcfg · 16c14b46
      Nathan Fontenot committed
      There are only 4 valid name=value pairs for writes to
      /proc/ppc64/lparcfg.  The current code allocates a buffer to copy
      this information in from the user.  Since the longest name=value
      pair will easily fit into a buffer of 64 characters, simply
      put the buffer on the stack instead of allocating it.
      Signed-off-by: Nathan Fotenot <nfont@austin.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • N
      powerpc/pseries: Update arch vector to indicate support for CMO · 8391e42a
      Nathan Fontenot committed
      Update the architecture vector to indicate that Cooperative Memory
      Overcommitment is supported if CONFIG_PPC_SMLPAR is set.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • N
      powerpc/pseries: Verify CMO memory entitlement updates with virtual I/O · 22e1a4dd
      Nathan Fontenot committed
      Verify memory entitlement updates can be handled by vio.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • R
      powerpc/pseries: vio bus support for CMO · a90ab95a
      Robert Jennings committed
      This is a large patch but the normal code path is not affected.  For
      non-pSeries platforms the code is ifdef'ed out and for non-CMO enabled
      pSeries systems this does not affect the normal code path.  Devices that
      do not perform DMA operations do not need modification with this patch.
      The function get_desired_dma was renamed from get_io_entitlement for
      clarity.
      
      Overview
      
      Cooperative Memory Overcommitment (CMO) allows for a set of OS partitions
      to be run with less RAM than the aggregate needs of the group of
      partitions.  The firmware will balance memory between the partitions
      and page in/out memory as needed.  Based on the number and type of IO
      adapters present, each partition is allocated an amount of memory for
      DMA operations and this allocation will be guaranteed to the partition;
      this is referred to as the partition's 'entitlement'.
      
      Partitions running in a CMO environment can only have virtual IO devices
      present.  The VIO bus layer will manage the IO entitlement for the system.
      Accounting, at a system and per-device level, is tracked in the VIO bus
      code and exposed via sysfs.  A set of dma_ops functions are added to
      the bus to allow for this accounting.
      
      Bus initialization
      
      At initialization, the bus will calculate the minimum needs of the system
      based on providing each device present with a standard minimum entitlement
      along with a spare allocation for the bus to handle hotplug events.
      If the minimum needs can not be met the system boot will be halted.
      
      Device changes
      
      The significant changes for devices while running under CMO are that the
      devices must specify how much dedicated IO entitlement they desire and
      must also handle DMA mapping errors that can occur due to constrained
      IO memory.  The virtual IO drivers are modified to silence errors when
      DMA mappings fail for CMO and handle these failures gracefully.
      
      Each device will be guaranteed a minimum entitlement that can always
      be mapped.  Devices will specify how much entitlement they desire and
      the VIO bus will attempt to provide for this.  Devices can change their
      desired entitlement level at any point in time to address particular needs
      (via vio_cmo_set_dev_desired()), not just at device probe time.
      
      VIO bus changes
      
      The system will have a particular entitlement level available from which
      it can provide memory to the devices.  The bus defines two pools of memory
      within this entitlement, the reserve and excess pools.  Each device is
      provided with its own entitlement, no less than a system defined minimum
      entitlement and no greater than what the device has specified as its
      desired entitlement.  The entitlement provided to devices comes from the
      reserve pool.  The reserve pool can also contain a spare allocation as
      large as the system defined minimum entitlement which is used for device
      hotplug events.  Any entitlement not needed to fulfill the needs of the
      reserve pool is placed in the excess pool.  Each device is guaranteed
      that it can map up to its entitled level; additional mappings are possible
      as long as there is unmapped memory in the excess pool.
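
      The two-pool accounting described above can be illustrated with a small
      model.  This is a sketch under stated assumptions, not the VIO bus
      code: the class and method names are hypothetical, every device starts
      at the system minimum entitlement, and balancing is left out.

```python
class Device:
    def __init__(self, entitled):
        self.entitled = entitled    # guaranteed amount (reserve pool share)
        self.allocated = 0          # currently mapped IO memory
        self.allocs_failed = 0      # mapping requests that were refused


class VioCmo:
    """Model of reserve/excess pool accounting for mappings."""

    def __init__(self, system_entitlement, min_entitlement, names):
        self.spare = min_entitlement  # spare allocation kept for hotplug
        self.devices = {n: Device(min_entitlement) for n in names}
        self.reserve_size = min_entitlement * len(names) + self.spare
        if self.reserve_size > system_entitlement:
            # Mirrors "if the minimum needs can not be met the system
            # boot will be halted".
            raise RuntimeError("insufficient system entitlement")
        self.excess_size = system_entitlement - self.reserve_size
        self.excess_used = 0

    def map(self, name, size):
        dev = self.devices[name]
        # Part of the request covered by the device's own entitlement.
        within = max(dev.entitled - dev.allocated, 0)
        from_excess = max(size - within, 0)
        if from_excess > self.excess_size - self.excess_used:
            dev.allocs_failed += 1
            return False
        dev.allocated += size
        self.excess_used += from_excess
        return True

    def unmap(self, name, size):
        dev = self.devices[name]
        if size > dev.allocated:
            raise ValueError("unmapping more than is mapped")
        # Memory above the entitled level was borrowed from the excess pool.
        borrowed = max(dev.allocated - dev.entitled, 0)
        self.excess_used -= min(size, borrowed)
        dev.allocated -= size
```

      A device can always map up to its entitled level; anything beyond that
      succeeds only while the shared excess pool has room, which is where the
      DMA mapping failures mentioned under "Device changes" come from.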
      
      Bus probe
      
      As the system starts, each device is given an entitlement equal only
      to the system defined minimum entitlement.  The reserve pool is equal
      to the sum of these entitlements, plus a spare allocation.  The VIO bus
      also tracks the aggregate desired entitlement of all the devices.  If the
      system desired entitlement is greater than the size of the reserve pool,
      when devices unmap IO memory it will be reserved and a balance operation
      will be scheduled for some time in the future.
      
      Entitlement balancing
      
      The balance function tries to fairly distribute entitlement between the
      devices in the system with the goal of providing each device with its
      desired amount of entitlement.  Devices using more than what would be
      ideal will have their entitled set-point adjusted; this will effectively
      set a goal for lower IO memory usage as future mappings can fail and
      deallocations will trigger a balance operation to distribute the newly
      unmapped memory.  A fair distribution of entitlement can take several
      balance operations to achieve.  Entitlement changes and device DLPAR
      events will alter the state of CMO and will trigger balance operations.
      
      Hotplug events
      
      The VIO bus allows for changes in system entitlement at run-time via
      'vio_cmo_entitlement_update()'.  When devices are added the hotplug
      device event will be preceded by a system entitlement increase and this
      is reversed when devices are removed.
      
      The following changes are made to the VIO bus layer for CMO:
       * add IO memory accounting per device structure.
       * add IO memory entitlement query function to driver structure.
       * during vio bus probe, if CMO is enabled, check that driver has
         memory entitlement query function defined.  Fail if function not defined.
       * fail to register driver if io entitlement function not defined.
       * create set of dma_ops at vio level for CMO that will track allocations
         and return DMA failures once entitlement is reached.  Entitlement will
         be limited by overall system entitlement.  Devices will have a reserved
         quantity of memory that is guaranteed, the rest can be used as available.
       * expose entitlement, current allocation, desired allocation, and the
         allocation error counter for devices to the user through sysfs
       * provide mechanism for changing a device's desired entitlement at run time
         for devices as an exported function and sysfs tunable
       * track any DMA failures for entitled IO memory for each vio device.
       * check entitlement against available system entitlement on device add
       * track entitlement metrics (high water mark, current usage)
       * provide function to reset high water mark
       * provide minimum and desired entitlement numbers at a bus level
       * provide drivers with a minimum guaranteed entitlement
       * balance available entitlement between devices to satisfy their needs
       * handle system entitlement changes and device hotplug
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • R
      powerpc/pseries: iommu enablement for CMO · 6490c490
      Robert Jennings committed
      To support Cooperative Memory Overcommitment (CMO), we need to check
      for failure from some of the tce hcalls.
      
      These changes for the pseries platform affect the powerpc architecture;
      patches for the other affected platforms are included in this patch.
      
      pSeries platform IOMMU code changes:
       * platform TCE functions must handle H_NOT_ENOUGH_RESOURCES errors and
         return an error.
      
      Architecture IOMMU code changes:
       * Calls to ppc_md.tce_build need to check return values and return
         DMA_MAPPING_ERROR for transient errors.
      
      Architecture changes:
       * struct machdep_calls for tce_build*_pSeriesLP functions need to change
         to indicate failure.
       * all other platforms will need updates to iommu functions to match the new
         calling semantics; they will return 0 on success.  The other platforms
         default configs have been built, but no further testing was performed.
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Olof Johansson <olof@lixom.net>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • B
      powerpc/pseries: Add CMO paging statistics · ffa5abbd
      Brian King committed
      With the addition of Cooperative Memory Overcommitment (CMO) support
      for IBM Power Systems, two fields have been added to the VPA to report
      paging statistics.  Add support in lparcfg to report them to userspace.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • B
      powerpc/pseries: Add collaborative memory manager · 84af458b
      Brian King committed
      Adds a collaborative memory manager, which acts as a simple balloon driver
      for System p machines that support cooperative memory overcommitment
      (CMO).

      Adds a platform configuration option for CMO called PPC_SMLPAR.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • B
      powerpc/pseries: Utilities to set firmware page state · 86630a32
      Brian King committed
      Newer versions of firmware support page states, which are used by the
      collaborative memory manager (future patch) to "loan" pages to the
      hypervisor for use by other partitions.
      Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • R
      powerpc/pseries: Enable CMO feature during platform setup · e46de429
      Robert Jennings committed
      For Cooperative Memory Overcommitment (CMO), set the FW_FEATURE_CMO
      flag in powerpc_firmware_features from the rtas ibm,get-system-parameters
      table prior to calling iommu_init_early_pSeries.

      With this, any CMO specific functionality can be controlled by checking:
       firmware_has_feature(FW_FEATURE_CMO)
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • R
      powerpc/pseries: Split retrieval of processor entitlement data into a helper routine · 398778f7
      Robert Jennings committed
      Split the retrieval of processor entitlement data returned in the H_GET_PPP
      hcall into its own helper routine.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • N
      powerpc/pseries: Add memory entitlement capabilities to /proc/ppc64/lparcfg · dfc3403f
      Nathan Fontenot committed
      Update /proc/ppc64/lparcfg to display Cooperative Memory
      Overcommitment statistics as reported by the H_GET_MPP hcall.  This
      also updates the lparcfg interface to allow setting memory entitlement
      and weight.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • N
      powerpc/pseries: Split processor entitlement retrieval and gathering to helper routines · 11529396
      Nathan Fotenot committed
      Split the retrieval and setting of processor entitlement and weight into
      helper routines.  This also removes the printing of the raw values
      returned from h_get_ppp, since the values are already parsed and printed.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • N
      powerpc/pseries: Remove extraneous error reporting for hcall failures in lparcfg · 545500b3
      Nathan Fontenot committed
      Remove the extraneous error reporting used when a hcall made from lparcfg fails.
      Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com>
      Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
      Acked-by: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • S
      powerpc: Fix compile error with binutils 2.15 · 80c60bf9
      Segher Boessenkool committed
      My previous patch to fix compilation with binutils-2.17 causes
      a "file truncated" build error from ld with binutils 2.15 (and
      possibly older), and a warning with 2.16 and 2.17.

      This fixes it.
      Signed-off-by: Segher Boessenkool <segher@kernel.crashing.org>
      Acked-by: Chuck Meade <chuckmeade@mindspring.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • M
      powerpc/cell: Fixed IOMMU mapping uses weak ordering for a pcie endpoint · 7886250e
      Mark Nelson committed
      At the moment the fixed mapping is by default strongly ordered (the
      iommu_fixed=weak boot option must be used to make the fixed mapping weakly
      ordered). If we're on a setup where the southbridge is being used in
      endpoint mode (triblade and CAB boards) the default should be a weakly
      ordered fixed mapping.
      
      This adds a check so that if a node of type pcie-endpoint can be found in
      the device tree the fixed mapping is set to be weak by default (but can be
      overridden using iommu_fixed=strong).
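
      The resulting decision can be summarised in a few lines.  This is a
      sketch of the logic as described, not the kernel code: the function
      name and its inputs (a list of device-tree node types and a kernel
      command line string) are hypothetical.

```python
def fixed_mapping_is_weak(node_types, cmdline=""):
    """Return True if the fixed IOMMU mapping should be weakly ordered."""
    # Default: weak only when a pcie-endpoint node exists in the device tree.
    weak = "pcie-endpoint" in node_types
    # An explicit boot option overrides the default in either direction.
    for arg in cmdline.split():
        if arg == "iommu_fixed=weak":
            weak = True
        elif arg == "iommu_fixed=strong":
            weak = False
    return weak
```

      For example, a board with a pcie-endpoint node gets a weakly ordered
      mapping unless iommu_fixed=strong is passed on the command line.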
      Signed-off-by: Mark Nelson <markn@au1.ibm.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • L
      powerpc: BookE hardware watchpoint support · d6a61bfc
      Luis Machado committed
      This patch implements support for HW based watchpoints via the
      DBSR_DAC (Data Address Compare) facility of the BookE processors.

      It does so by interfacing with the existing DABR breakpoint code
      and adding the necessary bits and pieces for the new bits to
      be properly set or cleared.
      Signed-off-by: Luis Machado <luisgpm@br.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • S
      powerpc: Fallout from sysdev API changes · 00bf6e90
      Stephen Rothwell 提交于
      A struct sysdev_attribute * parameter was added to the show routine by
      commit 4a0b2b4d "sysdev: Pass the
      attribute to the low level sysdev show/store function".
      
      This eliminates a warning:
      
      arch/powerpc/kernel/sysfs.c:538: warning: initialization from incompatible pointer type
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      00bf6e90
    • N
      powerpc: Enable AT_BASE_PLATFORM aux vector · 9115d134
      Nathan Lynch committed
      Stash the first platform string matched by identify_cpu() in
      powerpc_base_platform, and supply that to the ELF loader for the value
      of AT_BASE_PLATFORM.
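
From userspace, the new aux vector entry could be read roughly as follows. This sketch assumes a glibc that provides `getauxval()` in `<sys/auxv.h>`, which is a later userspace addition and not part of this patch; the helper names are made up for illustration. On kernels or architectures that do not supply the entry, the lookup simply yields nothing.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/auxv.h>

/* Return the AT_BASE_PLATFORM string, or NULL if the kernel supplied none. */
const char *base_platform(void)
{
    unsigned long value = getauxval(AT_BASE_PLATFORM);
    return value ? (const char *)value : NULL;
}

/* Print AT_PLATFORM next to AT_BASE_PLATFORM for comparison. */
void print_platforms(void)
{
    unsigned long plat_value = getauxval(AT_PLATFORM);
    const char *plat = plat_value ? (const char *)plat_value : NULL;
    const char *base = base_platform();

    printf("AT_PLATFORM=%s AT_BASE_PLATFORM=%s\n",
           plat ? plat : "(none)", base ? base : "(none)");
}
```
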
      Signed-off-by: Nathan Lynch <ntl@pobox.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      9115d134
    • C
      s390: use virtio_console for KVM on s390 · faeba830
      Christian Borntraeger committed
      This patch enables virtio_console as the default console on kvm for
      s390. We currently use the same notify hack as lguest for early
      console output. I will try to address this for lguest and s390 later.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      faeba830
    • L
      x86/oprofile/nmi_int: add Nehalem to list of ppro cores · 4b9f12a3
      Linus Torvalds committed
      ..otherwise oprofile will fall back on that poor timer interrupt.
      
      Also replace the unreadable chain of if-statements with a "switch()"
      statement instead. It generates better code, and is a lot clearer.
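
The shape of such a rewrite might look like the sketch below. The function name, model numbers, and type strings are illustrative placeholders, not the values from the patch; the point is only that each model maps to one case label and unknown models take the default branch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Map a CPU model number to a profiling type string.
 * All numbers and strings here are hypothetical examples.
 * Returns NULL for unknown models so the caller can fall back
 * to the timer interrupt.
 */
const char *ppro_cpu_type(int cpu_model)
{
    switch (cpu_model) {
    case 0:
    case 1:
    case 2:
        return "ppro";
    case 14:
        return "core";
    case 15:
    case 23:
        return "core_2";
    case 26:
        /* Adding a new core is a single extra case label. */
        return "core_2";
    default:
        return NULL;
    }
}
```
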
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b9f12a3
    • L
      x86-64: Clean up 'save/restore_i387()' usage · b30f3ae5
      Linus Torvalds committed
      Suresh Siddha wants to fix a possible FPU leakage in error conditions,
      but the fact that save/restore_i387() are inlines in a header file makes
      that harder to do than necessary.  So start off with an obvious cleanup.
      
      This just moves the x86-64 version of save/restore_i387() out of the
      header file, and moves it to the only file that it is actually used in:
      arch/x86/kernel/signal_64.c.  So exposing it in a header file was wrong
      to begin with.
      
      [ Side note: I'd like to fix up some of the games we play with the
        32-bit version of these functions too, but that's a separate
        matter.  The 32-bit versions are shared - under different names
        at that! - by both the native x86-32 code and the x86-64 32-bit
        compatibility code ]
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b30f3ae5
    • L
      x86-64: make BUILD_IRQ() also reset section back · 6209ed9d
      Linus Torvalds committed
      Commit 9d25d4db ("x86: BUILD_IRQ say
      .text to avoid .data.percpu") added a ".text" specifier to make sure
      that BUILD_IRQ() builds the irq trampoline in the text segment rather
      than in some random left-over segment that the compiler happened to
      leave the asm in.
      
      However, we should also make sure that we switch back by adding a
      ".previous" at the end, so that there are no subtle issues with
      subsequent compiler-generated code.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6209ed9d
    • D
      rtc-cmos: avoid spurious irqs · 7e2a31da
      David Brownell committed
      This fixes kernel http://bugzilla.kernel.org/show_bug.cgi?id=11112 (bogus
      RTC update IRQs reported) for rtc-cmos, in two ways:
      
        - When HPET is stealing the IRQs, use the first IRQ to grab
          the seconds counter which will be monitored (instead of
          using whatever was previously in that memory);
      
        - In sane IRQ handling modes, scrub out old IRQ status before
          enabling IRQs.
      
      The latter is done by tightening up IRQ handling for rtc-cmos everywhere,
      which also ensures that when HPET is used it is the only thing triggering
      IRQ reports to userspace.  The net effect is a smaller object file.
      
      Also fix a bogus HPET message related to its RTC emulation.
      Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
      Report-by: W Unruh <unruh@physics.ubc.ca>
      Cc: Andrew Victor <avictor.za@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7e2a31da
    • U
      flag parameters add-on: remove epoll_create size param · 9fe5ad9c
      Ulrich Drepper committed
      Remove the size parameter from the new epoll_create syscall and rename the
      syscall itself.  The updated test program follows.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_epoll_create2
      # ifdef __x86_64__
      #  define __NR_epoll_create2 291
      # elif defined __i386__
      #  define __NR_epoll_create2 329
      # else
      #  error "need __NR_epoll_create2"
      # endif
      #endif
      
      #define EPOLL_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd = syscall (__NR_epoll_create2, 0);
        if (fd == -1)
          {
            puts ("epoll_create2(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("epoll_create2(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_epoll_create2, EPOLL_CLOEXEC);
        if (fd == -1)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("epoll_create2(EPOLL_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
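
With a C library that exposes the renamed syscall as `epoll_create1()` in `<sys/epoll.h>` (an assumption about the userspace wrapper, which this commit does not add), the same check can be written without raw syscall numbers. The helper names below are made up for this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Return 1 if close-on-exec is set on fd, 0 if not, -1 on error. */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}

/* Return 0 if both flag values behave as expected, nonzero otherwise. */
int epoll_cloexec_demo(void)
{
    int plain = epoll_create1(0);
    int coe = epoll_create1(EPOLL_CLOEXEC);
    int ok = 0;

    if (plain != -1 && coe != -1)
        ok = has_cloexec(plain) == 0 && has_cloexec(coe) == 1;

    if (plain != -1)
        close(plain);
    if (coe != -1)
        close(coe);
    return ok ? 0 : 1;
}
```
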
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9fe5ad9c
    • U
      flag parameters: inotify_init · 4006553b
      Ulrich Drepper committed
      This patch introduces the new syscall inotify_init1 (note: the 1 stands for
      the one parameter the syscall takes, as opposed to no parameter before).  The
      values accepted for this parameter are function-specific and defined in the
      inotify.h header.  Here the values must match the O_* flags, though.  In this
      patch CLOEXEC support is introduced.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_inotify_init1
      # ifdef __x86_64__
      #  define __NR_inotify_init1 294
      # elif defined __i386__
      #  define __NR_inotify_init1 332
      # else
      #  error "need __NR_inotify_init1"
      # endif
      #endif
      
      #define IN_CLOEXEC O_CLOEXEC
      
      int
      main (void)
      {
        int fd;
        fd = syscall (__NR_inotify_init1, 0);
        if (fd == -1)
          {
            puts ("inotify_init1(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("inotify_init1(0) set close-on-exec");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_inotify_init1, IN_CLOEXEC);
        if (fd == -1)
          {
            puts ("inotify_init1(IN_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("inotify_init1(IN_CLOEXEC) does not set close-on-exec");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
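
The requirement that the inotify flag values match the O_* flags can also be checked directly. The sketch below assumes a C library that provides `inotify_init1()` and `IN_CLOEXEC` in `<sys/inotify.h>`; the wrapper is not part of this patch and `inotify_cloexec_demo()` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Return 0 if IN_CLOEXEC yields a close-on-exec descriptor, nonzero otherwise. */
int inotify_cloexec_demo(void)
{
    /* The commit message says the value must match the O_* flag. */
    assert(IN_CLOEXEC == O_CLOEXEC);

    int fd = inotify_init1(IN_CLOEXEC);
    if (fd == -1)
        return -1;

    int flags = fcntl(fd, F_GETFD);
    close(fd);
    if (flags == -1)
        return -1;

    return (flags & FD_CLOEXEC) ? 0 : 1;
}
```
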
      
      [akpm@linux-foundation.org: add sys_ni stub]
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4006553b
    • U
      flag parameters: pipe · ed8cae8b
      Ulrich Drepper committed
      This patch introduces the new syscall pipe2, which is like pipe but takes an
      additional parameter holding a flag value.  This patch implements the handling
      of O_CLOEXEC for the flag.  I did not add support for the new
      syscall for the architectures which have a special sys_pipe implementation.  I
      think the maintainers of those archs have the chance to go with the unified
      implementation but that's up to them.
      
      The implementation introduces do_pipe_flags.  I did that instead of changing
      all callers of do_pipe because some of the callers are written in assembler.
      I would probably screw up changing the assembly code.  To avoid breaking code
      do_pipe is now a small wrapper around do_pipe_flags.  Once all callers are
      changed over to do_pipe_flags the old do_pipe function can be removed.
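
The wrapper arrangement can be illustrated in userspace as below. This is a stand-in sketch, not the kernel implementation: the two functions borrow the names from the description above, but here they are ordinary C functions built on the C library's `pipe2()` wrapper, which is assumed to be available.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* New entry point: creates the pipe and applies the given flags. */
int do_pipe_flags(int *fd, int flags)
{
    return pipe2(fd, flags);
}

/* Old entry point kept as a thin wrapper so existing callers keep working. */
int do_pipe(int *fd)
{
    return do_pipe_flags(fd, 0);
}
```

Existing callers keep calling `do_pipe()` unchanged, while new code passes flags through `do_pipe_flags()`; once nothing calls the old name, the wrapper can be deleted.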
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_pipe2
      # ifdef __x86_64__
      #  define __NR_pipe2 293
      # elif defined __i386__
      #  define __NR_pipe2 331
      # else
      #  error "need __NR_pipe2"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd[2];
        if (syscall (__NR_pipe2, fd, 0) != 0)
          {
            puts ("pipe2(0) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if (coe & FD_CLOEXEC)
              {
                printf ("pipe2(0) set close-on-exec for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        if (syscall (__NR_pipe2, fd, O_CLOEXEC) != 0)
          {
            puts ("pipe2(O_CLOEXEC) failed");
            return 1;
          }
        for (int i = 0; i < 2; ++i)
          {
            int coe = fcntl (fd[i], F_GETFD);
            if (coe == -1)
              {
                puts ("fcntl failed");
                return 1;
              }
            if ((coe & FD_CLOEXEC) == 0)
              {
                printf ("pipe2(O_CLOEXEC) does not set close-on-exec for fd[%d]\n", i);
                return 1;
              }
          }
        close (fd[0]);
        close (fd[1]);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ed8cae8b
    • U
      flag parameters: dup2 · 336dd1f7
      Ulrich Drepper committed
      This patch adds the new dup3 syscall.  It extends the old dup2 syscall by one
      parameter which is meant to hold a flag value.  Support for the O_CLOEXEC flag
      is added in this patch.
      
      The following test must be adjusted for architectures other than x86 and
      x86-64 and in case the syscall numbers changed.
      
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      #include <fcntl.h>
      #include <stdio.h>
      #include <time.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      
      #ifndef __NR_dup3
      # ifdef __x86_64__
      #  define __NR_dup3 292
      # elif defined __i386__
      #  define __NR_dup3 330
      # else
      #  error "need __NR_dup3"
      # endif
      #endif
      
      int
      main (void)
      {
        int fd = syscall (__NR_dup3, 1, 4, 0);
        if (fd == -1)
          {
            puts ("dup3(0) failed");
            return 1;
          }
        int coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if (coe & FD_CLOEXEC)
          {
            puts ("dup3(0) set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC);
        if (fd == -1)
          {
            puts ("dup3(O_CLOEXEC) failed");
            return 1;
          }
        coe = fcntl (fd, F_GETFD);
        if (coe == -1)
          {
            puts ("fcntl failed");
            return 1;
          }
        if ((coe & FD_CLOEXEC) == 0)
          {
            puts ("dup3(O_CLOEXEC) does not set close-on-exec flag");
            return 1;
          }
        close (fd);
      
        puts ("OK");
      
        return 0;
      }
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
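
For comparison, here is a sketch of the single-call form next to the older two-step form it replaces. It assumes a C library that exposes the syscall as `dup3()` under `_GNU_SOURCE`; the wrapper is not part of this patch and both helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Duplicate oldfd onto newfd with close-on-exec set in a single call. */
int dup_cloexec(int oldfd, int newfd)
{
    return dup3(oldfd, newfd, O_CLOEXEC);
}

/*
 * Older two-step equivalent. Between dup2() and fcntl() the descriptor
 * briefly exists without close-on-exec, so a fork and exec in another
 * thread could inherit it.
 */
int dup_cloexec_two_step(int oldfd, int newfd)
{
    int fd = dup2(oldfd, newfd);
    if (fd == -1)
        return -1;
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```
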
      Signed-off-by: Ulrich Drepper <drepper@redhat.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      336dd1f7