1. 03 Apr 2009, 6 commits
  2. 01 Apr 2009, 34 commits
    •
      fbdev: uninline lock_fb_info() · 6a7f2829
      Authored by Andrew Morton
      Before:
      
         text    data     bss     dec     hex filename
         3648    2910      32    6590    19be drivers/video/backlight/backlight.o
         3226    2812      32    6070    17b6 drivers/video/backlight/lcd.o
        30990   16688    8480   56158    db5e drivers/video/console/fbcon.o
        15488    8400      24   23912    5d68 drivers/video/fbmem.o
      
      After:
      
         text    data     bss     dec     hex filename
         3537    2870      32    6439    1927 drivers/video/backlight/backlight.o
         3131    2772      32    5935    172f drivers/video/backlight/lcd.o
        30876   16648    8480   56004    dac4 drivers/video/console/fbcon.o
        15506    8400      24   23930    5d7a drivers/video/fbmem.o
      
      Cc: Andrea Righi <righi.andrea@gmail.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Krzysztof Helt <krzysztof.h1@poczta.fm>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a7f2829
    •
      cirrusfb: add accelerator constant · 614c0dc9
      Authored by Krzysztof Helt
      Add an accelerator constant so that almost all Cirrus chips are
      recognized as accelerators by the fbset command.
      Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      614c0dc9
    •
      rtc: convert LEAP_YEAR into an inline · 78d89ef4
      Authored by Andrew Morton
      - The LEAP_YEAR macro is buggy: it references its argument multiple times.
        Fix this by turning it into a C function.

      - Give it a more appropriate name.

      - Move it to rtc.h so that other .c files can use it instead of copying it.
      
      Cc: dann frazier <dannf@hp.com>
      Acked-by: Alessandro Zummo <alessandro.zummo@towertech.it>
      Cc: stephane eranian <eranian@googlemail.com>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: David Brownell <david-b@pacbell.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      78d89ef4
    •
      autofs4: fix kernel includes · 79955898
      Authored by Ian Kent
      autofs_dev-ioctl.h is included by both the kernel module and user space tools
      and it includes two kernel header files.  Compiles work if the kernel headers
      are installed but fail otherwise.
      Signed-off-by: Ian Kent <raven@themaw.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79955898
    •
      spi_mpc83xx: add OF platform driver bindings · 35b4b3c0
      Authored by Anton Vorontsov
      Implement full support for OF SPI bindings.  Now the driver can manage its
      own chip selects without any help from the board files and/or fsl_soc
      constructors.
      
      The "legacy" code is well isolated and could be removed as time goes by.
      Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Kumar Gala <galak@gate.crashing.org>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      35b4b3c0
    •
      spi_mpc83xx: rework chip selects handling · 364fdbc0
      Authored by Anton Vorontsov
      The main purpose of this patch is to pass 'struct spi_device' to the chip
      select handling routines.  This is needed so that we could implement
      full-fledged OpenFirmware support for this driver.
      
      While at it, also:
      - Replace the two {de,activate}_cs routines with a single cs_control().
      - Don't duplicate platform data callbacks in the mpc83xx_spi struct.
      Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
      Cc: David Brownell <david-b@pacbell.net>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Kumar Gala <galak@gate.crashing.org>
      Cc: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      364fdbc0
    •
      epoll keyed wakeups: introduce new *_poll() wakeup macros · c0da3775
      Authored by Davide Libenzi
      Introduce new wakeup macros that allow passing an event mask to the wakeup
      targets.  They exactly mimic their non-_poll() counterparts, with the
      added event-mask passing capability.  I added only the ones currently
      requested, avoiding the _nr() and _all() variants for the moment.
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: William Lee Irwin III <wli@movementarian.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c0da3775
    •
      epoll keyed wakeups: add __wake_up_locked_key() and __wake_up_sync_key() · 4ede816a
      Authored by Davide Libenzi
      This patchset introduces wakeup hints for some of the most popular (from
      epoll POV) devices, so that epoll code can avoid spurious wakeups on its
      waiters.
      
      The problem with epoll is that the callback-based wakeups do not, at the
      moment, carry any information about the events the wakeup is related to.
      So the only choice epoll has (not being able to call f_op->poll() from
      inside the callback) is to add the file* to a ready-list and resolve the
      real events later on, at epoll_wait() (or its own f_op->poll()) time.
      This can cause spurious wakeups, since the wake_up() itself might be for
      an event the caller is not interested in.
      
      The rate of these spurious wakeups can be pretty high when many network
      sockets are being monitored.
      
      By allowing devices to report the events the wakeups refer to (at least
      the two major classes - POLLIN/POLLOUT), we are able to spare useless
      wakeups by proper handling inside the epoll's poll callback.
      
      Epoll will in any case still have to call f_op->poll() on the file*
      later on, since the change needed to deliver the full event set via the
      wakeup is too invasive for the way our f_op->poll() system works (the
      full event set is calculated inside the poll function; there are too
      many of them to even start contemplating that change, and poll/select
      would need changes too).
      
      Epoll is changed in a way that both devices which send event hints, and
      the ones that don't, are correctly handled.  The former will gain some
      efficiency though.
      
      As a general rule, devices should add an event mask, using the
      key-aware wakeup macros, when waking up their poll wait queues.  I
      tested this (together with the epoll poll fix patch Andrew has in -mm)
      and wakeups for the supported devices are correctly filtered.
      
      Test program available here:
      
      http://www.xmailserver.org/epoll_test.c
      
      This patch:
      
      Nothing revolutionary here.  Just using the available "key" that our
      wakeup core already supports.  The __wake_up_locked_key() was a
      no-brainer, since both __wake_up_locked() and __wake_up_locked_key()
      are thin wrappers around __wake_up_common().

      The __wake_up_sync() function had a body, so the choice was between
      borrowing the body for __wake_up_sync_key() and calling it from
      __wake_up_sync(), or making an inline and calling it from both.  I
      chose the former, since on most archs it all resolves to
      "mov $0, REG; jmp ADDR".
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: David Miller <davem@davemloft.net>
      Cc: William Lee Irwin III <wli@movementarian.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4ede816a
    •
      eventfd: improve support for semaphore-like behavior · bcd0b235
      Authored by Davide Libenzi
      People started using eventfd in a semaphore-like way where before they
      were using pipes.
      
      That is, counter-based resource access: a "wait()" returns immediately,
      decrementing the counter by one, if the counter is greater than zero;
      otherwise it waits.  A "post(count)" adds count to the counter,
      releasing the appropriate number of waiters.  In eventfd the "post"
      (write) part is fine, but the "wait" (read) does not dequeue 1, it
      dequeues the whole counter value.
      
      The problem with eventfd is that a read() on the fd returns and wipes
      the whole counter, making its use as a semaphore a bit cumbersome.  You
      can do a read() followed by a write() of COUNTER-1, but IMO it's pretty
      easy and cheap to make this work without extra steps.  This patch
      introduces a new eventfd flag that tells eventfd to dequeue only 1 from
      the counter, allowing simple read/write to make it behave like a
      semaphore.  Simple test here:
      
      http://www.xmailserver.org/eventfd-sem.c
      
      To be back-compatible with earlier kernels, userspace applications should
      probe for the availability of this feature via
      
      #ifdef EFD_SEMAPHORE
      	fd = eventfd2(CNT, EFD_SEMAPHORE);
      	if (fd == -1 && errno == EINVAL)
      		<fallback>
      #else
      	<fallback>
      #endif
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: <linux-api@vger.kernel.org>
      Tested-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      bcd0b235
    •
      introduce pr_cont() macro · 311d0761
      Authored by Cyrill Gorcunov
      We cover all log levels with pr_...() macros except KERN_CONT.  Add it
      for convenience.
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      311d0761
    •
      filesystem freeze: allow SysRq emergency thaw to thaw frozen filesystems · c2d75438
      Authored by Eric Sandeen
      Now that the filesystem freeze operation has been elevated to the VFS, and
      is just an ioctl away, some sort of safety net for unintentionally frozen
      root filesystems may be in order.
      
      The timeout thaw originally proposed did not get merged, but perhaps
      something like this would be useful in emergencies.
      
      For example, freeze /path/to/mountpoint may freeze your root filesystem
      if you forgot that the filesystem there had been unmounted, since the
      path then resolves to the root filesystem.
      
      I chose 'j' as it was the last remaining character other than 'h',
      which is effectively reserved for help (help is printed on any unknown
      character).
      
      I've tested this on a non-root fs with multiple (nested) freezers, as well
      as on a system rendered unresponsive due to a frozen root fs.
      
      [randy.dunlap@oracle.com: emergency thaw only if CONFIG_BLOCK enabled]
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Cc: Takashi Sato <t-sato@yk.jp.nec.com>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2d75438
    •
      loop: add ioctl to resize a loop device · 53d66608
      Authored by J. R. Okajima
      Add the ability to 'resize' the loop device on the fly.
      
      One practical application is a loop file containing an XFS filesystem
      that is already mounted: you can easily enlarge the file (append some
      bytes) and then call ioctl(fd, LOOP_SET_CAPACITY, new).  The loop
      driver will pick up the new size, and you can later run xfs_growfs,
      which lets you use the full capacity of the loop file without
      unmounting.
      
      Test app:
      
      #define _GNU_SOURCE	/* must come before any #include to take effect */
      #include <linux/fs.h>
      #include <linux/loop.h>
      #include <sys/ioctl.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <assert.h>
      #include <errno.h>
      #include <fcntl.h>
      #include <getopt.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      
      char *me;
      
      void usage(FILE *f)
      {
      	fprintf(f, "%s [options] loop_dev [backend_file]\n"
      		"-s, --set new_size_in_bytes\n"
      		"\twhen backend_file is given, "
      		"it will be expanded too while keeping the original contents\n",
      		me);
      }
      
      struct option opts[] = {
      	{
      		.name		= "set",
      		.has_arg	= 1,
      		.flag		= NULL,
      		.val		= 's'
      	},
      	{
      		.name		= "help",
      		.has_arg	= 0,
      		.flag		= NULL,
      		.val		= 'h'
      	},
      	{ NULL, 0, NULL, 0 }	/* getopt_long() needs a zeroed terminator */
      };
      
      void err_size(char *name, __u64 old)
      {
      	fprintf(stderr, "size must be larger than current %s (%llu)\n",
      		name, old);
      }
      
      int main(int argc, char *argv[])
      {
      	int fd, err, c, i, bfd;
      	ssize_t ssz;
      	size_t sz;
      	__u64 old, new, append;
      	char a[BUFSIZ];
      	struct stat st;
      	FILE *out;
      	char *backend, *dev;
      
      	err = EINVAL;
      	out = stderr;
      	me = argv[0];
      	new = 0;
      	while ((c = getopt_long(argc, argv, "s:h", opts, &i)) != -1) {
      		switch (c) {
      		case 's':
      			errno = 0;
      			new = strtoull(optarg, NULL, 0);
      			if (errno) {
      				err = errno;
      				perror(argv[i]);
      				goto out;
      			}
      			break;
      
      		case 'h':
      			err = 0;
      			out = stdout;
      			goto err;
      
      		default:
      			perror(argv[i]);
      			goto err;
      		}
      	}
      
      	if (optind < argc)
      		dev = argv[optind++];
      	else
      		goto err;
      
      	fd = open(dev, O_RDONLY);
      	if (fd < 0) {
      		err = errno;
      		perror(dev);
      		goto out;
      	}
      
      	err = ioctl(fd, BLKGETSIZE64, &old);
      	if (err) {
      		err = errno;
      		perror("ioctl BLKGETSIZE64");
      		goto out;
      	}
      
      	if (!new) {
      		printf("%llu\n", old);
      		goto out;
      	}
      
      	if (new < old) {
      		err = EINVAL;
      		err_size(dev, old);
      		goto out;
      	}
      
      	if (optind < argc) {
      		backend = argv[optind++];
      		bfd = open(backend, O_WRONLY|O_APPEND);
      		if (bfd < 0) {
      			err = errno;
      			perror(backend);
      			goto out;
      		}
      		err = fstat(bfd, &st);
      		if (err) {
      			err = errno;
      			perror(backend);
      			goto out;
      		}
      		if (new < st.st_size) {
      			err = EINVAL;
      			err_size(backend, st.st_size);
      			goto out;
      		}
      		append = new - st.st_size;
      		/* zero the pad buffer so we don't append stack garbage */
      		for (sz = 0; sz < sizeof(a); sz++)
      			a[sz] = 0;
      		sz = sizeof(a);
      		while (append > 0) {
      			if (append < sz)
      				sz = append;
      			ssz = write(bfd, a, sz);
      			if (ssz != (ssize_t)sz) {
      				err = errno;
      				perror(backend);
      				goto out;
      			}
      			append -= sz;
      		}
      		err = fsync(bfd);
      		if (err) {
      			err = errno;
      			perror(backend);
      			goto out;
      		}
      	}
      
      	err = ioctl(fd, LOOP_SET_CAPACITY, new);
      	if (err) {
      		err = errno;
      		perror("ioctl LOOP_SET_CAPACITY");
      	}
      	goto out;
      
       err:
      	usage(out);
       out:
      	return err;
      }
      Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
      Signed-off-by: Tomas Matejicek <tomas@slax.org>
      Cc: <util-linux-ng@vger.kernel.org>
      Cc: Karel Zak <kzak@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: <linux-api@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      53d66608
    •
      pm: rework includes, remove arch ifdefs · a8af7898
      Authored by Magnus Damm
      Make the following header file changes:
      
       - remove arch ifdefs and asm/suspend.h from linux/suspend.h
       - add asm/suspend.h to disk.c (for arch_prepare_suspend())
       - add linux/io.h to swsusp.c (for ioremap())
       - x86 32/64 bit compile fixes
      Signed-off-by: Magnus Damm <damm@igel.co.jp>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      a8af7898
    •
      shmem: writepage directly to swap · 9fab5619
      Authored by Hugh Dickins
      Synopsis: if shmem_writepage calls swap_writepage directly, most shmem
      swap loads benefit, and a catastrophic interaction between SLUB and some
      flash storage is avoided.
      
      shmem_writepage() has always been peculiar in making no attempt to write:
      it has just transferred a shmem page from file cache to swap cache, then
      let that page make its way around the LRU again before being written and
      freed.
      
      The idea was that people use tmpfs because they want those pages to stay
      in RAM; so although we give it an overflow to swap, we should resist
      writing too soon, giving those pages a second chance before they can be
      reclaimed.
      
      That was always questionable, and I've toyed with this patch for years;
      but never had a clear justification to depart from the original design.
      
      It became more questionable in 2.6.28, when the split LRU patches classed
      shmem and tmpfs pages as SwapBacked rather than as file_cache: that in
      itself gives them more resistance to reclaim than normal file pages.  I
      prepared this patch for 2.6.29, but the merge window arrived before I'd
      completed gathering statistics to justify sending it in.
      
      Then while comparing SLQB against SLUB, running SLUB on a laptop I'd
      habitually used with SLAB, I found SLUB to run my tmpfs kbuild swapping
      tests five times slower than SLAB or SLQB - other machines slower too, but
      nowhere near so bad.  Simpler "cp -a" swapping tests showed the same.
      
      slub_max_order=0 brings sanity to all, but heavy swapping is too far from
      normal to justify such a tuning.  The crucial factor on that laptop turns
      out to be that I'm using an SD card for swap.  What happens is this:
      
      By default, SLUB uses order-2 pages for shmem_inode_cache (and many other
      fs inodes), so creating tmpfs files under memory pressure brings lumpy
      reclaim into play.  One subpage of the order is chosen from the bottom of
      the LRU as usual, then the other three picked out from their random
      positions on the LRUs.
      
      In a tmpfs load, many of these pages will be ones which already passed
      through shmem_writepage, so already have swap allocated.  And though their
      offsets on swap were probably allocated sequentially, now that the pages
      are picked off at random, their swap offsets are scattered.
      
      But the flash storage on the SD card is very sensitive to having its
      writes merged: once swap is written at scattered offsets, performance
      falls apart.  Rotating disk seeks increase too, but less disastrously.
      
      So: stop giving shmem/tmpfs pages a second pass around the LRU, write them
      out to swap as soon as their swap has been allocated.
      
      It's surely possible to devise an artificial load which runs faster the
      old way, one whose sizing is such that the tmpfs pages on their second
      pass are the ones that are wanted again, and other pages not.
      
      But I've not yet found such a load: on all machines, under the loads I've
      tried, immediate swap_writepage speeds up shmem swapping: especially when
      using the SLUB allocator (and more effectively than slub_max_order=0), but
      also with the others; and it also reduces the variance between runs.  How
      much faster varies widely: a factor of five is rare, 5% is common.
      
      One load which might have suffered: imagine a swapping shmem load in a
      limited mem_cgroup on a machine with plenty of memory.  Before 2.6.29 the
      swapcache was not charged, and such a load would have run quickest with
      the shmem swapcache never written to swap.  But now swapcache is charged,
      so even this load benefits from shmem_writepage directly to swap.
      
      Apologies for the #ifndef CONFIG_SWAP swap_writepage() stub in swap.h:
      it's silly because that will never get called; but refactoring shmem.c
      sensibly according to CONFIG_SWAP will be a separate task.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9fab5619
    •
      vmscan: fix it to take care of nodemask · 327c0e96
      Authored by KAMEZAWA Hiroyuki
      try_to_free_pages() is used for the direct reclaim of up to
      SWAP_CLUSTER_MAX pages when watermarks are low.  The caller to
      alloc_pages_nodemask() can specify a nodemask of nodes that are allowed to
      be used but this is not passed to try_to_free_pages().  This can lead to
      unnecessary reclaim of pages that are unusable by the caller and, in
      the worst case, to allocation failure because progress was not made
      where it was needed.
      
      This patch passes the nodemask used for alloc_pages_nodemask() to
      try_to_free_pages().
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      327c0e96
    •
      nommu: there is no mlock() for NOMMU, so don't provide the bits · 33925b25
      Authored by David Howells
      The mlock() facility does not exist for NOMMU since all mappings are
      effectively locked anyway, so we don't make the bits available when
      they're not useful.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Enrik Berkhan <Enrik.Berkhan@ge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      33925b25
    •
      mm: introduce debug_kmap_atomic · f4112de6
      Authored by Akinobu Mita
      x86 has debug_kmap_atomic_prot(), an error-checking function for
      kmap_atomic.  It is useful for the other architectures too, although it
      needs CONFIG_TRACE_IRQFLAGS_SUPPORT.
      
      This patch exposes it to the other architectures.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f4112de6
    •
      mm: page_mkwrite change prototype to match fault · c2ec175c
      Authored by Nick Piggin
      Change the page_mkwrite prototype to take a struct vm_fault, and return
      VM_FAULT_xxx flags.  There should be no functional change.
      
      This makes it possible to return much more detailed error information
      to the VM (and can also provide more information, e.g. the
      virtual_address, to the driver, which might be important in some
      special cases).
      
      This is required for a subsequent fix.  And will also make it easier to
      merge page_mkwrite() with fault() in future.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Artem Bityutskiy <dedekind@infradead.org>
      Cc: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2ec175c
    •
      mm: enable hashdist by default on 64bit NUMA · c2fdf3a9
      Authored by Anton Blanchard
      On PowerPC we allocate large boot time hashes on node 0.  This leads to an
      imbalance in the free memory, for example on a 64GB box (4 x 16GB nodes):
      
      Free memory:
      Node 0: 97.03%
      Node 1: 98.54%
      Node 2: 98.42%
      Node 3: 98.53%
      
      If we switch to using vmalloc (like ia64 and x86-64) things are more
      balanced:
      
      Free memory:
      Node 0: 97.53%
      Node 1: 98.35%
      Node 2: 98.33%
      Node 3: 98.33%
      
      For many HPC applications we are limited by the free available memory on
      the smallest node, so even though the same amount of memory is used the
      better balancing helps.
      
      Since all 64bit NUMA capable architectures should have sufficient vmalloc
      space, it makes sense to enable it via CONFIG_64BIT.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c2fdf3a9
    •
      mm: fix proc_dointvec_userhz_jiffies "breakage" · 704503d8
      Authored by Alexey Dobriyan
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=9838
      
      On i386, HZ=1000, jiffies_to_clock_t() converts time in a somewhat strange
      way from the user's point of view:
      
      	# echo 500 >/proc/sys/vm/dirty_writeback_centisecs
      	# cat /proc/sys/vm/dirty_writeback_centisecs
      	499
      
      So, we have 5000 jiffies converted to only 499 clock ticks and reported
      back.
      
      TICK_NSEC = 999848
      ACTHZ = 256039
      
      Keeping the in-kernel variable in the units passed from userspace
      would fix the issue, of course, but this probably isn't right for
      every sysctl.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      704503d8
    •
      generic debug pagealloc · 6a11f75b
      Authored by Akinobu Mita
      CONFIG_DEBUG_PAGEALLOC is now supported by x86, powerpc, sparc64, and
      s390.  This patch implements it for the rest of the architectures by
      filling the pages with poison byte patterns after free_pages() and
      verifying the poison patterns before alloc_pages().
      
      This generic version cannot detect invalid page accesses immediately,
      but an invalid read may cause a bogus dereference through poisoned
      memory, and an invalid write can be detected after a long delay.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6a11f75b
    •
      memdup_user(): introduce · 610a77e0
      Authored by Li Zefan
      I noticed there are many places that do a copy_from_user() immediately
      after a kmalloc():
      
      	dst = kmalloc(len, GFP_KERNEL);
      	if (!dst)
      		return -ENOMEM;
      	if (copy_from_user(dst, src, len)) {
      		kfree(dst);
      		return -EFAULT;
      	}
      
      memdup_user() is a wrapper around the above code.  With this new
      function we don't have to write 'len' twice, which can lead to
      typos/mistakes.  It also produces smaller code and kernel text.
      
      A quick grep shows 250+ places where memdup_user() *may* be used.  I'll
      prepare a patchset to do this conversion.
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Americo Wang <xiyou.wangcong@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      610a77e0
    •
      mm: remove pagevec_swap_free() · d1d74871
      Authored by KOSAKI Motohiro
      pagevec_swap_free() is now unused.
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Rik van Riel <riel@redhat.com>
      Acked-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d1d74871
    •
      vfs: add/use account_page_dirtied() · e3a7cca1
      Authored by Edward Shishkin
      Add a helper function account_page_dirtied().  Use that from two
      callsites.  reiser4 adds a function which adds a third callsite.
      
      Signed-off-by: Edward Shishkin <edward.shishkin@gmail.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e3a7cca1
    •
      mm: introduce for_each_populated_zone() macro · ee99c71c
      Authored by KOSAKI Motohiro
      Impact: cleanup

      In almost all cases, for_each_zone() is used together with
      populated_zone(), because almost no function needs information about
      memoryless nodes.  for_each_populated_zone() therefore helps simplify
      the code.

      This patch has no functional change.
      
      [akpm@linux-foundation.org: small cleanup]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ee99c71c
    • O
      get_mm_hiwater_xxx: trivial, s/define/inline/ · 9de1581e
      Committed by Oleg Nesterov
      Andrew pointed out that the get_mm_hiwater_xxx() macros evaluate their
      "mm" argument twice or thrice; turn them into inline functions.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9de1581e
    • A
      proc tty: remove struct tty_operations::read_proc · 0f043a81
      Committed by Alexey Dobriyan
      struct tty_operations::proc_fops took its place, and there is one less
      create_proc_read_entry() user now!
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0f043a81
    • A
      proc tty: add struct tty_operations::proc_fops · ae149b6b
      Committed by Alexey Dobriyan
      Used for the gradual switch of TTY drivers away from ->read_proc, which
      in turn helps phase out ->read_proc across the whole tree.
      
      As side effect, fix possible race condition when ->data initialized after
      PDE is hooked into proc tree.
      
      ->proc_fops takes precedence over ->read_proc.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      ae149b6b
    • S
      ide: inline SELECT_DRIVE() · fdd88f0a
      Committed by Sergei Shtylyov
      Since SELECT_DRIVE() has boiled down to a mere dev_select() method call, it now
      makes sense to just inline it...
      Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      fdd88f0a
    • S
      ide: turn selectproc() method into dev_select() method (take 5) · abb596b2
      Committed by Sergei Shtylyov
      Turn selectproc() method into dev_select() method by teaching it to write to the
      device register and moving it from 'struct ide_port_ops' to 'struct ide_tp_ops'.
      Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Cc: benh@kernel.crashing.org
      Cc: petkovbb@gmail.com
      [bart: add ->dev_select to at91_ide.c and tx4939.c (__BIG_ENDIAN case)]
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      abb596b2
    • S
      ide: rename IDE_TFLAG_IN_[HOB_]FEATURE · 67625119
      Committed by Sergei Shtylyov
      The feature register has never been readable -- when its location is read, one
      gets the error register value; hence rename IDE_TFLAG_IN_[HOB_]FEATURE into
      IDE_TFLAG_IN_[HOB_]ERROR and introduce the 'hob_error' field into the 'struct
      ide_taskfile' (despite the error register not really depending on the HOB bit).
      Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      67625119
    • S
      ide: turn set_irq() method into write_devctl() method · ecf3a31d
      Committed by Sergei Shtylyov
      Turn set_irq() method with its software reset hack into write_devctl() method
      (for just writing a value into the device control register) at last...
      Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      ecf3a31d
    • B
      ide-floppy: use ide_pio_bytes() · 349d12a1
      Committed by Bartlomiej Zolnierkiewicz
      * Fix ide_init_sg_cmd() setup for non-fs requests.
      
      * Convert ide_pc_intr() to use ide_pio_bytes() for floppy media.
      
      * Remove no longer needed ide_io_buffers() and sg/sg_cnt fields
        from struct ide_atapi_pc.
      
      * Remove partial completions; kill idefloppy_update_buffers(), as a
        result.
      
      * Add some more debugging statements.
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      349d12a1
    • B
      ide: decrease size of ->pc_buf field in struct ide_atapi_pc · 41fa9f86
      Committed by Bartlomiej Zolnierkiewicz
      struct ide_atapi_pc is often allocated on the stack, and its ->pc_buf
      field is 256 bytes.  However, since only ide_floppy_create_read_capacity_cmd()
      and idetape_create_inquiry_cmd() require that much space, allocate buffers
      for these pc-s explicitly and decrease the ->pc_buf size to 64 bytes.
      
      Cc: Borislav Petkov <petkovbb@gmail.com>
      Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      41fa9f86