1. 23 9月, 2009 32 次提交
    • A
      spi: prefix modalias with "spi:" · e0626e38
      Anton Vorontsov 提交于
      This makes it consistent with other buses (platform, i2c, vio, ...).  I'm
      not sure why we use the prefixes, but there must be a reason.
      
      This was easy enough to do it, and I did it.
      Signed-off-by: NAnton Vorontsov <avorontsov@ru.mvista.com>
      Cc: David Brownell <dbrownell@users.sourceforge.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Ben Dooks <ben-linux@fluff.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Dmitry Torokhov <dtor@mail.ru>
      Cc: Samuel Ortiz <sameo@openedhand.com>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Acked-by: NMike Frysinger <vapier.adi@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e0626e38
    • A
      spi: add support for device table matching · 75368bf6
      Anton Vorontsov 提交于
      With this patch spi drivers can use standard spi_driver.id_table and
      MODULE_DEVICE_TABLE() mechanisms to bind against the devices.  Just like
      we do with I2C drivers.
      
      This is useful when a single driver supports several variants of devices
      but it is not possible to detect them in run-time (like non-JEDEC chips
      probing in drivers/mtd/devices/m25p80.c), and when platform_data usage is
      overkill.
      
      This patch also makes life a lot easier on OpenFirmware platforms, since
      with OF we extensively use proper device IDs in modaliases.
      Signed-off-by: NAnton Vorontsov <avorontsov@ru.mvista.com>
      Cc: David Brownell <dbrownell@users.sourceforge.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Ben Dooks <ben-linux@fluff.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      75368bf6
    • R
      spi.h: add missing kernel-doc for struct spi_master · b73b2559
      Randy Dunlap 提交于
      Add missing kernel-doc notation in spi.h for struct spi_master:
      
      Warning(include/linux/spi/spi.h:289): No description found for parameter 'mode_bits'
      Warning(include/linux/spi/spi.h:289): No description found for parameter 'flags'
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Acked-by: NDavid Brownell <dbrownell@users.sourceforge.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b73b2559
    • M
      ramfs: move RAMFS_MAGIC to include/linux/magic.h · a7e3108c
      maximilian attems 提交于
      initramfs userspace likes to use this magic number.
      
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Signed-off-by: Nmaximilian attems <max@stro.at>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a7e3108c
    • K
      kcore: register module area in generic way · 81ac3ad9
      KAMEZAWA Hiroyuki 提交于
      Some archs define MODULED_VADDR/MODULES_END which is not in VMALLOC area.
      This is handled only in x86-64.  This patch make it more generic.  And we
      can use vread/vwrite to access the area.  Fix it.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      81ac3ad9
    • K
      kcore: register vmemmap range · 26562c59
      KAMEZAWA Hiroyuki 提交于
      Benjamin Herrenschmidt <benh@kernel.crashing.org> pointed out that vmemmap
      range is not included in KCORE_RAM, KCORE_VMALLOC ....
      
      This adds KCORE_VMEMMAP if SPARSEMEM_VMEMMAP is used.  By this, vmemmap
      can be readable via /proc/kcore
      
      Because it's not vmalloc area, vread/vwrite cannot be used.  But the range
      is static against the memory layout, this patch handles vmemmap area by
      the same scheme with physical memory.
      
      This patch assumes SPARSEMEM_VMEMMAP range is not in VMALLOC range.  It's
      correct now.
      
      [akpm@linux-foundation.org: fix typo]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      26562c59
    • K
      walk system ram range · 908eedc6
      KAMEZAWA Hiroyuki 提交于
      Originally, walk_memory_resource() was introduced to traverse all memory
      of "System RAM" for detecting memory hotplug/unplug range.  For doing so,
      flags of IORESOUCE_MEM|IORESOURCE_BUSY was used and this was enough for
      memory hotplug.
      
      But for using other purpose, /proc/kcore, this may includes some firmware
      area marked as IORESOURCE_BUSY | IORESOUCE_MEM.  This patch makes the
      check strict to find out busy "System RAM".
      
      Note: PPC64 keeps their own walk_memory_resouce(), which walk through
      ppc64's lmb informaton.  Because old kclist_add() is called per lmb, this
      patch makes no difference in behavior, finally.
      
      And this patch removes CONFIG_MEMORY_HOTPLUG check from this function.
      Because pfn_valid() just show "there is memmap or not* and cannot be used
      for "there is physical memory or not", this function is useful in generic
      to scan physical memory range.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Américo Wang <xiyou.wangcong@gmail.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Roland Dreier <rolandd@cisco.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      908eedc6
    • K
      kcore: add kclist types · c30bb2a2
      KAMEZAWA Hiroyuki 提交于
      Presently, kclist_add() only eats start address and size as its arguments.
      Considering to make kclist dynamically reconfigulable, it's necessary to
      know which kclists are for System RAM and which are not.
      
      This patch add kclist types as
        KCORE_RAM
        KCORE_VMALLOC
        KCORE_TEXT
        KCORE_OTHER
      
      This "type" is used in a patch following this for detecting KCORE_RAM.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c30bb2a2
    • K
      kcore: use usual list for kclist · 2ef43ec7
      KAMEZAWA Hiroyuki 提交于
      This patchset is for /proc/kcore.  With this,
      
       - many per-arch hooks are removed.
      
       - /proc/kcore will know really valid physical memory area.
      
       - /proc/kcore will be aware of memory hotplug.
      
       - /proc/kcore will be architecture independent i.e.
         if an arch supports CONFIG_MMU, it can use /proc/kcore.
         (if the arch uses usual memory layout.)
      
      This patch:
      
      /proc/kcore uses its own list handling codes. It's better to use
      generic list codes.
      
      No changes in logic. just clean up.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: WANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2ef43ec7
    • S
      procfs: provide stack information for threads · d899bf7b
      Stefani Seibold 提交于
      A patch to give a better overview of the userland application stack usage,
      especially for embedded linux.
      
      Currently you are only able to dump the main process/thread stack usage
      which is showed in /proc/pid/status by the "VmStk" Value.  But you get no
      information about the consumed stack memory of the the threads.
      
      There is an enhancement in the /proc/<pid>/{task/*,}/*maps and which marks
      the vm mapping where the thread stack pointer reside with "[thread stack
      xxxxxxxx]".  xxxxxxxx is the maximum size of stack.  This is a value
      information, because libpthread doesn't set the start of the stack to the
      top of the mapped area, depending of the pthread usage.
      
      A sample output of /proc/<pid>/task/<tid>/maps looks like:
      
      08048000-08049000 r-xp 00000000 03:00 8312       /opt/z
      08049000-0804a000 rw-p 00001000 03:00 8312       /opt/z
      0804a000-0806b000 rw-p 00000000 00:00 0          [heap]
      a7d12000-a7d13000 ---p 00000000 00:00 0
      a7d13000-a7f13000 rw-p 00000000 00:00 0          [thread stack: 001ff4b4]
      a7f13000-a7f14000 ---p 00000000 00:00 0
      a7f14000-a7f36000 rw-p 00000000 00:00 0
      a7f36000-a8069000 r-xp 00000000 03:00 4222       /lib/libc.so.6
      a8069000-a806b000 r--p 00133000 03:00 4222       /lib/libc.so.6
      a806b000-a806c000 rw-p 00135000 03:00 4222       /lib/libc.so.6
      a806c000-a806f000 rw-p 00000000 00:00 0
      a806f000-a8083000 r-xp 00000000 03:00 14462      /lib/libpthread.so.0
      a8083000-a8084000 r--p 00013000 03:00 14462      /lib/libpthread.so.0
      a8084000-a8085000 rw-p 00014000 03:00 14462      /lib/libpthread.so.0
      a8085000-a8088000 rw-p 00000000 00:00 0
      a8088000-a80a4000 r-xp 00000000 03:00 8317       /lib/ld-linux.so.2
      a80a4000-a80a5000 r--p 0001b000 03:00 8317       /lib/ld-linux.so.2
      a80a5000-a80a6000 rw-p 0001c000 03:00 8317       /lib/ld-linux.so.2
      afaf5000-afb0a000 rw-p 00000000 00:00 0          [stack]
      ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
      
      Also there is a new entry "stack usage" in /proc/<pid>/{task/*,}/status
      which will you give the current stack usage in kb.
      
      A sample output of /proc/self/status looks like:
      
      Name:	cat
      State:	R (running)
      Tgid:	507
      Pid:	507
      .
      .
      .
      CapBnd:	fffffffffffffeff
      voluntary_ctxt_switches:	0
      nonvoluntary_ctxt_switches:	0
      Stack usage:	12 kB
      
      I also fixed stack base address in /proc/<pid>/{task/*,}/stat to the base
      address of the associated thread stack and not the one of the main
      process.  This makes more sense.
      
      [akpm@linux-foundation.org: fs/proc/array.c now needs walk_page_range()]
      Signed-off-by: NStefani Seibold <stefani@seibold.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d899bf7b
    • N
      mmc: make SDIO device/driver struct accessors public · 996ad568
      Nicolas Pitre 提交于
      Especially with the PM framework, those are quite handy to have in driver
      code too.
      Signed-off-by: NNicolas Pitre <nico@marvell.com>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      996ad568
    • O
      sdio: add MMC_QUIRK_LENIENT_FN0 · 7c979ec7
      Ohad Ben-Cohen 提交于
      Normally writes to SDIO function 0 outside the vendor specific CCCR
      registers are prohibited.
      
      To support embedded devices that require writes to SDIO function 0 outside
      this range (e.g.  TI WL127x embedded sdio wifi device),
      MMC_QUIRK_LENIENT_FN0 is introduced.
      
      A card quirks field is added to `struct mmc_card' to support non-standard
      devices (e.g.  embedded sdio devices).
      
      [akpm@linux-foundation.org: code in C, not cpp!]
      Signed-off-by: NOhad Ben-Cohen <ohad@wizery.com>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c979ec7
    • O
      sdio: add CD disable support · 006ebd5d
      Ohad Ben-Cohen 提交于
      Add support to disconnect the pull-up resistor on CD/DAT[3] (pin 1)
      of the card. This may be desired on certain setups of boards,
      controllers and embedded sdio devices which do not need the card's
      pull-up. As a result, card detection is disabled and power is saved.
      
      [akpm@linux-foundation.org: simplify sdio_disable_cd() a bit]
      Signed-off-by: NOhad Ben-Cohen <ohad@wizery.com>
      Acked-by: NMatt Fleming <matt@console-pimps.org>
      Cc: Ian Molton <ian@mnementh.co.uk>
      Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
      Cc: Philip Langdale <philipl@overt.org>
      Cc: Pierre Ossman <pierre@ossman.eu>
      Cc: David Vrabel <david.vrabel@csr.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      006ebd5d
    • A
      mmc: check status after MMC SWITCH command · ef0b27d4
      Adrian Hunter 提交于
      According to the standard, the SWITCH command should be followed by a
      SEND_STATUS command to check for errors.
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NMatt Fleming <matt@console-pimps.org>
      Cc: Ian Molton <ian@mnementh.co.uk>
      Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
      Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
      Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
      Cc: Pierre Ossman <pierre@ossman.eu>
      Cc: Philip Langdale <philipl@overt.org>
      Cc: "Madhusudhan" <madhu.cr@ti.com>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ef0b27d4
    • J
      mmc: add mmc card sleep and awake support · b1ebe384
      Jarkko Lavinen 提交于
      Add support for the new MMC command SLEEP_AWAKE.
      Signed-off-by: NJarkko Lavinen <jarkko.lavinen@nokia.com>
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NMatt Fleming <matt@console-pimps.org>
      Cc: Ian Molton <ian@mnementh.co.uk>
      Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
      Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
      Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
      Cc: Pierre Ossman <pierre@ossman.eu>
      Cc: Philip Langdale <philipl@overt.org>
      Cc: "Madhusudhan" <madhu.cr@ti.com>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b1ebe384
    • A
      mmc: add ability to save power by powering off cards · eae1aeee
      Adrian Hunter 提交于
      Power can be saved by powering off cards that are not in use.  This is
      similar to suspend / resume except it is under the control of the driver,
      and does not require any power management support.  It can only be used
      when the driver can monitor whether the card is removed, otherwise it is
      unsafe.  This is possible because, unlike suspend, the driver still
      receives card detect and / or cover switch interrupts.
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NMatt Fleming <matt@console-pimps.org>
      Cc: Ian Molton <ian@mnementh.co.uk>
      Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
      Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
      Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
      Cc: Pierre Ossman <pierre@ossman.eu>
      Cc: Philip Langdale <philipl@overt.org>
      Cc: "Madhusudhan" <madhu.cr@ti.com>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eae1aeee
    • A
      mmc: add MMC_CAP_NONREMOVABLE host capability · 9feae246
      Adrian Hunter 提交于
      eMMC's are not removable, so unsafe resume is OK always.
      
      To permit this a new host capability MMC_CAP_NONREMOVABLE has been added
      and suspend / resume updated accordingly.
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NMatt Fleming <matt@console-pimps.org>
      Cc: Ian Molton <ian@mnementh.co.uk>
      Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
      Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
      Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
      Cc: Pierre Ossman <pierre@ossman.eu>
      Cc: Philip Langdale <philipl@overt.org>
      Cc: "Madhusudhan" <madhu.cr@ti.com>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9feae246
    • A
      mmc: allow host claim / release nesting · 319a3f14
      Adrian Hunter 提交于
      This change allows the MMC host to be claimed in situations where the host
      may or may not have already been claimed.  Also 'mmc_try_claim_host()' is
      now exported.
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NMatt Fleming <matt@console-pimps.org>
      Cc: Ian Molton <ian@mnementh.co.uk>
      Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
      Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
      Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
      Cc: Pierre Ossman <pierre@ossman.eu>
      Cc: Philip Langdale <philipl@overt.org>
      Cc: "Madhusudhan" <madhu.cr@ti.com>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      319a3f14
    • A
      mmc: add 'enable' and 'disable' methods to mmc host · 8ea926b2
      Adrian Hunter 提交于
      MMC hosts that support power saving can use the 'enable' and 'disable'
      methods to exit and enter power saving states.  An explanation of their
      use is provided in the comments added to include/linux/mmc/host.h.
      Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
      Acked-by: NMatt Fleming <matt@console-pimps.org>
      Cc: Ian Molton <ian@mnementh.co.uk>
      Cc: "Roberto A. Foglietta" <roberto.foglietta@gmail.com>
      Cc: Jarkko Lavinen <jarkko.lavinen@nokia.com>
      Cc: Denis Karpov <ext-denis.2.karpov@nokia.com>
      Cc: Pierre Ossman <pierre@ossman.eu>
      Cc: Philip Langdale <philipl@overt.org>
      Cc: "Madhusudhan" <madhu.cr@ti.com>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8ea926b2
    • M
      asm/sections: add text/data checking functions for arches to override · 00afe029
      Mike Frysinger 提交于
      Some ports (like the Blackfin arch) have a discontiguous memory map which
      means there may be text or data that falls outside of the standard range
      of the start/end text/data symbols.  Creating some helper functions allows
      these non-standard ports to declare these regions without adversely
      affecting anyone else.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Robin Getz <rgetz@blackfin.uclinux.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00afe029
    • J
      getrusage: fill ru_maxrss value · 1f10206c
      Jiri Pirko 提交于
      Make ->ru_maxrss value in struct rusage filled accordingly to rss hiwater
      mark.  This struct is filled as a parameter to getrusage syscall.
      ->ru_maxrss value is set to KBs which is the way it is done in BSD
      systems.  /usr/bin/time (gnu time) application converts ->ru_maxrss to KBs
      which seems to be incorrect behavior.  Maintainer of this util was
      notified by me with the patch which corrects it and cc'ed.
      
      To make this happen we extend struct signal_struct by two fields.  The
      first one is ->maxrss which we use to store rss hiwater of the task.  The
      second one is ->cmaxrss which we use to store highest rss hiwater of all
      task childs.  These values are used in k_getrusage() to actually fill
      ->ru_maxrss.  k_getrusage() uses current rss hiwater value directly if mm
      struct exists.
      
      Note:
      exec() clear mm->hiwater_rss, but doesn't clear sig->maxrss.
      it is intetionally behavior. *BSD getrusage have exec() inheriting.
      
      test programs
      ========================================================
      
      getrusage.c
      ===========
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/types.h>
       #include <sys/time.h>
       #include <sys/resource.h>
       #include <sys/types.h>
       #include <sys/wait.h>
       #include <unistd.h>
       #include <signal.h>
       #include <sys/mman.h>
      
       #include "common.h"
      
       #define err(str) perror(str), exit(1)
      
      int main(int argc, char** argv)
      {
      	int status;
      
      	printf("allocate 100MB\n");
      	consume(100);
      
      	printf("testcase1: fork inherit? \n");
      	printf("  expect: initial.self ~= child.self\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      	} else {
      		show_rusage("fork child");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase2: fork inherit? (cont.) \n");
      	printf("  expect: initial.children ~= 100MB, but child.children = 0\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      	} else {
      		show_rusage("child");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase3: fork + malloc \n");
      	printf("  expect: child.self ~= initial.self + 50MB\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      	} else {
      		printf("allocate +50MB\n");
      		consume(50);
      		show_rusage("fork child");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase4: grandchild maxrss\n");
      	printf("  expect: post_wait.children ~= 300MB\n");
      	show_rusage("initial");
      	if (__fork()) {
      		wait(&status);
      		show_rusage("post_wait");
      	} else {
      		system("./child -n 0 -g 300");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase5: zombie\n");
      	printf("  expect: pre_wait ~= initial, IOW the zombie process is not accounted.\n");
      	printf("          post_wait ~= 400MB, IOW wait() collect child's max_rss. \n");
      	show_rusage("initial");
      	if (__fork()) {
      		sleep(1); /* children become zombie */
      		show_rusage("pre_wait");
      		wait(&status);
      		show_rusage("post_wait");
      	} else {
      		system("./child -n 400");
      		_exit(0);
      	}
      	printf("\n");
      
      	printf("testcase6: SIG_IGN\n");
      	printf("  expect: initial ~= after_zombie (child's 500MB alloc should be ignored).\n");
      	show_rusage("initial");
      	signal(SIGCHLD, SIG_IGN);
      	if (__fork()) {
      		sleep(1); /* children become zombie */
      		show_rusage("after_zombie");
      	} else {
      		system("./child -n 500");
      		_exit(0);
      	}
      	printf("\n");
      	signal(SIGCHLD, SIG_DFL);
      
      	printf("testcase7: exec (without fork) \n");
      	printf("  expect: initial ~= exec \n");
      	show_rusage("initial");
      	execl("./child", "child", "-v", NULL);
      
      	return 0;
      }
      
      child.c
      =======
       #include <sys/types.h>
       #include <unistd.h>
       #include <sys/types.h>
       #include <sys/wait.h>
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/types.h>
       #include <sys/time.h>
       #include <sys/resource.h>
      
       #include "common.h"
      
      int main(int argc, char** argv)
      {
      	int status;
      	int c;
      	long consume_size = 0;
      	long grandchild_consume_size = 0;
      	int show = 0;
      
      	while ((c = getopt(argc, argv, "n:g:v")) != -1) {
      		switch (c) {
      		case 'n':
      			consume_size = atol(optarg);
      			break;
      		case 'v':
      			show = 1;
      			break;
      		case 'g':
      
      			grandchild_consume_size = atol(optarg);
      			break;
      		default:
      			break;
      		}
      	}
      
      	if (show)
      		show_rusage("exec");
      
      	if (consume_size) {
      		printf("child alloc %ldMB\n", consume_size);
      		consume(consume_size);
      	}
      
      	if (grandchild_consume_size) {
      		if (fork()) {
      			wait(&status);
      		} else {
      			printf("grandchild alloc %ldMB\n", grandchild_consume_size);
      			consume(grandchild_consume_size);
      
      			exit(0);
      		}
      	}
      
      	return 0;
      }
      
      common.c
      ========
       #include <stdio.h>
       #include <stdlib.h>
       #include <string.h>
       #include <sys/types.h>
       #include <sys/time.h>
       #include <sys/resource.h>
       #include <sys/types.h>
       #include <sys/wait.h>
       #include <unistd.h>
       #include <signal.h>
       #include <sys/mman.h>
      
       #include "common.h"
       #define err(str) perror(str), exit(1)
      
      void show_rusage(char *prefix)
      {
          	int err, err2;
          	struct rusage rusage_self;
          	struct rusage rusage_children;
      
          	printf("%s: ", prefix);
          	err = getrusage(RUSAGE_SELF, &rusage_self);
          	if (!err)
          		printf("self %ld ", rusage_self.ru_maxrss);
          	err2 = getrusage(RUSAGE_CHILDREN, &rusage_children);
          	if (!err2)
          		printf("children %ld ", rusage_children.ru_maxrss);
      
          	printf("\n");
      }
      
      /* Some buggy OS need this worthless CPU waste. */
      void make_pagefault(void)
      {
      	void *addr;
      	int size = getpagesize();
      	int i;
      
      	for (i=0; i<1000; i++) {
      		addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANON, -1, 0);
      		if (addr == MAP_FAILED)
      			err("make_pagefault");
      		memset(addr, 0, size);
      		munmap(addr, size);
      	}
      }
      
      void consume(int mega)
      {
          	size_t sz = mega * 1024 * 1024;
          	void *ptr;
      
          	ptr = malloc(sz);
          	memset(ptr, 0, sz);
      	make_pagefault();
      }
      
      pid_t __fork(void)
      {
      	pid_t pid;
      
      	pid = fork();
      	make_pagefault();
      
      	return pid;
      }
      
      common.h
      ========
      void show_rusage(char *prefix);
      void make_pagefault(void);
      void consume(int mega);
      pid_t __fork(void);
      
      FreeBSD result (expected result)
      ========================================================
      allocate 100MB
      testcase1: fork inherit?
        expect: initial.self ~= child.self
      initial: self 103492 children 0
      fork child: self 103540 children 0
      
      testcase2: fork inherit? (cont.)
        expect: initial.children ~= 100MB, but child.children = 0
      initial: self 103540 children 103540
      child: self 103564 children 0
      
      testcase3: fork + malloc
        expect: child.self ~= initial.self + 50MB
      initial: self 103564 children 103564
      allocate +50MB
      fork child: self 154860 children 0
      
      testcase4: grandchild maxrss
        expect: post_wait.children ~= 300MB
      initial: self 103564 children 154860
      grandchild alloc 300MB
      post_wait: self 103564 children 308720
      
      testcase5: zombie
        expect: pre_wait ~= initial, IOW the zombie process is not accounted.
                post_wait ~= 400MB, IOW wait() collect child's max_rss.
      initial: self 103564 children 308720
      child alloc 400MB
      pre_wait: self 103564 children 308720
      post_wait: self 103564 children 411312
      
      testcase6: SIG_IGN
        expect: initial ~= after_zombie (child's 500MB alloc should be ignored).
      initial: self 103564 children 411312
      child alloc 500MB
      after_zombie: self 103624 children 411312
      
      testcase7: exec (without fork)
        expect: initial ~= exec
      initial: self 103624 children 411312
      exec: self 103624 children 411312
      
      Linux result (actual test result)
      ========================================================
      allocate 100MB
      testcase1: fork inherit?
        expect: initial.self ~= child.self
      initial: self 102848 children 0
      fork child: self 102572 children 0
      
      testcase2: fork inherit? (cont.)
        expect: initial.children ~= 100MB, but child.children = 0
      initial: self 102876 children 102644
      child: self 102572 children 0
      
      testcase3: fork + malloc
        expect: child.self ~= initial.self + 50MB
      initial: self 102876 children 102644
      allocate +50MB
      fork child: self 153804 children 0
      
      testcase4: grandchild maxrss
        expect: post_wait.children ~= 300MB
      initial: self 102876 children 153864
      grandchild alloc 300MB
      post_wait: self 102876 children 307536
      
      testcase5: zombie
        expect: pre_wait ~= initial, IOW the zombie process is not accounted.
                post_wait ~= 400MB, IOW wait() collect child's max_rss.
      initial: self 102876 children 307536
      child alloc 400MB
      pre_wait: self 102876 children 307536
      post_wait: self 102876 children 410076
      
      testcase6: SIG_IGN
        expect: initial ~= after_zombie (child's 500MB alloc should be ignored).
      initial: self 102876 children 410076
      child alloc 500MB
      after_zombie: self 102880 children 410076
      
      testcase7: exec (without fork)
        expect: initial ~= exec
      initial: self 102880 children 410076
      exec: self 102880 children 410076
      Signed-off-by: NJiri Pirko <jpirko@redhat.com>
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1f10206c
    • A
      kmap_types.h: rename D macro · b28cfd2c
      Andi Kleen 提交于
      I tend to use a 'D' debugging macro a lot during debugging.  When I define
      it before includes I often get conflicts with kmap_types.h's use of 'D'
      too.  It's not very nice when a global include pollutes the name space
      like this.
      
      Rename the kmap_types.h D to KMAP_D.  It is only used temporarily in the
      header so has no effect on anything else.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Reviewed-by: NWANG Cong <xiyou.wangcong@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b28cfd2c
    • R
      Make sure the value in abs() does not get truncated if it is greater than 2^32 · a49c59c0
      Rolf Eike Beer 提交于
      abs() will truncate the input if is it outside the 2^32 range.  Fix that
      by assuming `long' input.
      
      This might generate worse code in the common case.
      Signed-off-by: NRolf Eike Beer <eike-kernel@sf-tec.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a49c59c0
    • D
      anonfd: split interface into file creation and install · 562787a5
      Davide Libenzi 提交于
      Split the anonfd interface into a bare file pointer creation one, and a
      file pointer creation plus install one.
      
      There are cases, like the usage of eventfds inside other kernel
      interfaces, where the file pointer created by anonfd needs to be used
      inside the initialization of other structures.
      
      As it is right now, as soon as anon_inode_getfd() returns, the kenrle can
      race with userspace closing the newly installed file descriptor.
      
      This patch, while keeping the old anon_inode_getfd(), introduces a new
      anon_inode_getfile() (whose services are reused in anon_inode_getfd())
      that allows to split the file creation phase and the fd install one.
      
      Once all the kernel structures are initialized, the code can call the
      proper fd_install().
      
      Gregory manifested the need for something like this inside KVM.
      Signed-off-by: NDavide Libenzi <davidel@xmailserver.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: James Morris <jmorris@namei.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Gregory Haskins <ghaskins@novell.com>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Acked-by: NRoland Dreier <rolandd@cisco.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      562787a5
    • J
      BUILD_BUG_ON(): fix it and a couple of bogus uses of it · 8c87df45
      Jan Beulich 提交于
      gcc permitting variable length arrays makes the current construct used for
      BUILD_BUG_ON() useless, as that doesn't produce any diagnostic if the
      controlling expression isn't really constant.  Instead, this patch makes
      it so that a bit field gets used here.  Consequently, those uses where the
      condition isn't really constant now also need fixing.
      
      Note that in the gfp.h, kmemcheck.h, and virtio_config.h cases
      MAYBE_BUILD_BUG_ON() really just serves documentation purposes - even if
      the expression is compile time constant (__builtin_constant_p() yields
      true), the array is still deemed of variable length by gcc, and hence the
      whole expression doesn't have the intended effect.
      
      [akpm@linux-foundation.org: make arch/sparc/include/asm/vio.h compile]
      [akpm@linux-foundation.org: more nonsensical assertions in tpm.c..]
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Rajiv Andrade <srajiv@linux.vnet.ibm.com>
      Cc: Mimi Zohar <zohar@us.ibm.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8c87df45
    • R
      printk_once(): use bool for boolean flag · 70867453
      Roland Dreier 提交于
      Using the type bool (instead of int) for the __print_once flag in the
      printk_once() macro matches the intent of the code better, and allows the
      compiler to generate smaller code; eg a typical callsite with gcc 4.3.3 on
      i386:
      
      add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-6 (-6)
      function                                     old     new   delta
      static.__print_once                            4       1      -3
      get_cpu_vendor                               146     143      -3
      
      Saving 6 bytes of object size per callsite by slightly improving the
      readability of the source seems like a win to me.
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      70867453
    • S
      proc connector: add event for process becoming session leader · 02b51df1
      Scott James Remnant 提交于
      The act of a process becoming a session leader is a useful signal to a
      supervising init daemon such as Upstart.
      
      While a daemon will normally do this as part of the process of becoming a
      daemon, it is rare for its children to do so.  When the children do, it is
      nearly always a sign that the child should be considered detached from the
      parent and not supervised along with it.
      
      The poster-child example is OpenSSH; the per-login children call setsid()
      so that they may control the pty connected to them.  If the primary daemon
      dies or is restarted, we do not want to consider the per-login children
      and want to respawn the primary daemon without killing the children.
      
      This patch adds a new PROC_SID_EVENT and associated structure to the
      proc_event event_data union, it arranges for this to be emitted when the
      special PIDTYPE_SID pid is set.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NScott James Remnant <scott@ubuntu.com>
      Acked-by: NMatt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
      Acked-by: N"David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      02b51df1
    • J
      seq_file: constify seq_operations · 88e9d34c
      James Morris 提交于
      Make all seq_operations structs const, to help mitigate against
      revectoring user-triggerable function pointers.
      
      This is derived from the grsecurity patch, although generated from scratch
      because it's simpler than extracting the changes from there.
      Signed-off-by: NJames Morris <jmorris@namei.org>
      Acked-by: NSerge Hallyn <serue@us.ibm.com>
      Acked-by: NCasey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      88e9d34c
    • X
      generic-ipi: make struct call_function_data lockless · 54fdade1
      Xiao Guangrong 提交于
      This patch can remove spinlock from struct call_function_data, the
      reasons are below:
      
      1: add a new interface for cpumask named cpumask_test_and_clear_cpu(),
         it can atomically test and clear specific cpu, we can use it instead
         of cpumask_test_cpu() and cpumask_clear_cpu() and no need data->lock
         to protect those in generic_smp_call_function_interrupt().
      
      2: in smp_call_function_many(), after csd_lock() return, the current's
         cfd_data is deleted from call_function list, so it not have race
         between other cpus, then cfs_data is only used in
         smp_call_function_many() that must disable preemption and not from
         a hardware interrupthandler or from a bottom half handler to call,
         only the correspond cpu can use it, so it not have race in current
         cpu, no need cfs_data->lock to protect it.
      
      3: after 1 and 2, cfs_data->lock is only use to protect cfs_data->refs in
         generic_smp_call_function_interrupt(), so we can define cfs_data->refs
         to atomic_t, and no need cfs_data->lock any more.
      Signed-off-by: NXiao Guangrong <xiaoguangrong@cn.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      [akpm@linux-foundation.org: use atomic_dec_return()]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54fdade1
    • N
      Move magic numbers into magic.h · 1fd7317d
      Nick Black 提交于
      Move various magic-number definitions into magic.h.
      Signed-off-by: NNick Black <dank@qemfd.net>
      Acked-by: NPekka Enberg <penberg@cs.helsinki.fi>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1fd7317d
    • D
      printk: add printk_delay to make messages readable for some scenarios · af91322e
      Dave Young 提交于
      When syslog is not possible, at the same time there's no serial/net
      console available, it will be hard to read the printk messages.  For
      example oops/panic/warning messages in shutdown phase.
      
      Add a printk delay feature, we can make each printk message delay some
      milliseconds.
      
      Setting the delay by proc/sysctl interface: /proc/sys/kernel/printk_delay
      
      The value range from 0 - 10000, default value is 0
      
      [akpm@linux-foundation.org: fix a few things]
      Signed-off-by: NDave Young <hidave.darkstar@gmail.com>
      Acked-by: NIngo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      af91322e
    • A
      include/linux/kmemcheck.h: fix a trillion warnings · fa081b00
      Andrew Morton 提交于
      of the form
      
      include/net/inet_sock.h:208: warning: ISO C90 forbids mixed declarations and code
      
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Acked-by: NVegard Nossum <vegard.nossum@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fa081b00
  2. 22 9月, 2009 8 次提交