1. 01 Apr, 2009: 40 commits
    • epoll: clean up ep_modify · e057e15f
      Tony Battersby committed
      ep_modify() doesn't need to set event.data from within the ep->lock
      spinlock as the comment suggests.  The only place event.data is used is
      ep_send_events_proc(), and this is protected by ep->mtx instead of
      ep->lock.  Also update the comment for mutex_lock() at the top of
      ep_scan_ready_list(), which mentions epoll_ctl(EPOLL_CTL_DEL) but not
      epoll_ctl(EPOLL_CTL_MOD).
      
      ep_modify() can also use spin_lock_irq() instead of spin_lock_irqsave().
      Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: remove unnecessary xchg · d1bc90dd
      Tony Battersby committed
      xchg in ep_unregister_pollwait() is unnecessary because it is protected by
      either epmutex or ep->mtx (the same protection as ep_remove()).
      
      If xchg was necessary, it would be insufficient to protect against
      problems: if multiple concurrent calls to ep_unregister_pollwait() were
      possible then a second caller that returns without doing anything because
      nwait == 0 could return before the waitqueues are removed by the first
      caller, which looks like it could lead to problematic races with
      ep_poll_callback().
      
      So remove xchg and add comments about the locking.
      Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: remember the event if epoll_wait returns -EFAULT · d0305882
      Tony Battersby committed
      If epoll_wait returns -EFAULT, the event that was being returned when the
      fault was encountered will be forgotten.  This is not a big deal since
      EFAULT will happen only if a buggy userspace program passes in a bad
      address, in which case what happens later usually doesn't matter.
      However, it is easy to remember the event for later, and this patch makes
      a simple change to do that.
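      A minimal user-space model of the idea, for illustration only: the names
      below (struct item, copy_event(), send_events()) are hypothetical stand-ins
      for the ready list and for the copy of each event to user memory, not the
      kernel's code.  When copying an event faults, the item is pushed back onto
      the ready list instead of being dropped, so a later call can still report it.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical ready-list entry: just the pending event mask. */
      struct item {
      	int events;
      	struct item *next;
      };

      /* Stand-in for copying one event to user memory: "faults" at index fault_at. */
      static bool copy_event(int *dst, int idx, int fault_at, int events)
      {
      	if (idx >= fault_at)
      		return false;
      	dst[idx] = events;
      	return true;
      }

      /*
       * Pops items off *ready and copies their events into out[].  If a copy
       * faults, the item goes back to the head of the ready list so the event
       * is remembered.  Returns the number of events copied, or -1 (standing
       * in for -EFAULT) if the very first copy failed.
       */
      static int send_events(struct item **ready, int *out, int maxevents,
      		       int fault_at)
      {
      	int n = 0;

      	while (*ready != NULL && n < maxevents) {
      		struct item *it = *ready;

      		*ready = it->next;
      		if (!copy_event(out, n, fault_at, it->events)) {
      			it->next = *ready;	/* remember the event for later */
      			*ready = it;
      			return n ? n : -1;
      		}
      		n++;
      	}
      	return n;
      }
      ```

      A fault on the first event yields -1 with the list untouched; a fault on a
      later event returns the partial count and leaves the failed item queued.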
      Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: don't use current in irq context · abff55ce
      Tony Battersby committed
      ep_call_nested() (formerly ep_poll_safewake()) uses "current" (without
      dereferencing it) to detect callback recursion, but it may be called from
      irq context where the use of current is generally discouraged.  It would
      be better to use get_cpu() and put_cpu() to detect the callback recursion.
      Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
      Acked-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: remove debugging code · bb57c3ed
      Davide Libenzi committed
      Remove debugging code from epoll.  There's no need for it to be included
      into mainline code.
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: fix epoll's own poll (update) · 296e236e
      Davide Libenzi committed
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Pavel Pisa <pisa@cmp.felk.cvut.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • epoll: fix epoll's own poll · 5071f97e
      Davide Libenzi committed
      Fix a bug inside the epoll's f_op->poll() code, that returns POLLIN even
      though there are no actual ready monitored fds.  The bug shows up if you
      add an epoll fd inside another fd container (poll, select, epoll).
      
      The problem is that the callback-based wakeups used by epoll do not carry
      (patches will follow to fix this) any information about the events that
      actually happened.  So the callback code, since it can't call the file*
      ->poll() inside the callback, chains the file* into a ready-list.
      
      So, suppose you added an fd with EPOLLOUT only, and some data shows up on
      the fd: the file* mapped by the fd will be added into the ready-list (via
      the wakeup callback).  During normal epoll_wait() use, this condition is
      sorted out at the time we're actually able to call the file*'s
      f_op->poll().
      
      Inside the old epoll's f_op->poll() though, only a quick check
      !list_empty(ready-list) was performed, and this could have led to
      reporting POLLIN even though no ready fds would show up at a following
      epoll_wait().  In order to correctly report the ready status for an epoll
      fd, the ready-list must be checked to see if any really available fd+event
      would be ready in a following epoll_wait().
      
      This operation (calling f_op->poll() from inside f_op->poll()), like
      wakeups, must be handled with care because epoll fds can be added to
      other epoll fds.
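      The symptom can be sketched from user space, assuming a kernel that
      includes this fix: register only EPOLLOUT on the read end of a pipe (a
      pipe's read end does not report EPOLLOUT), make the pipe readable, and ask
      poll(2) whether the epoll fd itself looks readable.  The helper name is
      illustrative and this is a cut-down variant of the idea, not part of the
      patch.  On a kernel without the fix, the stale ready-list entry could make
      POLLIN show up here.

      ```c
      #define _GNU_SOURCE
      #include <assert.h>
      #include <poll.h>
      #include <sys/epoll.h>
      #include <unistd.h>

      /*
       * Returns the poll(2) revents of an epoll fd that watches a readable pipe
       * read end for EPOLLOUT only, or -1 if the setup fails.
       */
      static int nested_epoll_revents(void)
      {
      	int pfds[2], epfd, ret = -1;
      	struct epoll_event evt = { .events = EPOLLOUT };
      	struct pollfd pfd = { .events = POLLIN };

      	if (pipe(pfds) < 0)
      		return -1;
      	epfd = epoll_create(1);
      	if (epfd < 0)
      		goto close_pipe;
      	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfds[0], &evt) < 0)
      		goto close_epoll;
      	/* Data arrives: the read end becomes readable, but never writable. */
      	if (write(pfds[1], "w", 1) != 1)
      		goto close_epoll;
      	pfd.fd = epfd;
      	/* Timeout 0: only sample the epoll fd's current readiness. */
      	if (poll(&pfd, 1, 0) >= 0)
      		ret = pfd.revents;
      close_epoll:
      	close(epfd);
      close_pipe:
      	close(pfds[0]);
      	close(pfds[1]);
      	return ret;
      }
      ```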
      
      Test code:
      
      /*
       *  epoll_test by Davide Libenzi (Simple code to test epoll internals)
       *  Copyright (C) 2008  Davide Libenzi
       *
       *  This program is free software; you can redistribute it and/or modify
       *  it under the terms of the GNU General Public License as published by
       *  the Free Software Foundation; either version 2 of the License, or
       *  (at your option) any later version.
       *
       *  This program is distributed in the hope that it will be useful,
       *  but WITHOUT ANY WARRANTY; without even the implied warranty of
       *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
       *  GNU General Public License for more details.
       *
       *  You should have received a copy of the GNU General Public License
       *  along with this program; if not, write to the Free Software
       *  Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
       *
       *  Davide Libenzi <davidel@xmailserver.org>
       *
       */
      
      #include <sys/types.h>
      #include <unistd.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <errno.h>
      #include <signal.h>
      #include <limits.h>
      #include <poll.h>
      #include <sys/epoll.h>
      #include <sys/wait.h>
      
      #define EPWAIT_TIMEO	(1 * 1000)
      #ifndef POLLRDHUP
      #define POLLRDHUP 0x2000
      #endif
      
      #define EPOLL_MAX_CHAIN	100L
      
      #define EPOLL_TF_LOOP (1 << 0)
      
      struct epoll_test_cfg {
      	long size;
      	long flags;
      };
      
      static int xepoll_create(int n) {
      	int epfd;
      
      	if ((epfd = epoll_create(n)) == -1) {
      		perror("epoll_create");
      		exit(2);
      	}
      
      	return epfd;
      }
      
      static void xepoll_ctl(int epfd, int cmd, int fd, struct epoll_event *evt) {
      	if (epoll_ctl(epfd, cmd, fd, evt) < 0) {
      		perror("epoll_ctl");
      		exit(3);
      	}
      }
      
      static void xpipe(int *fds) {
      	if (pipe(fds)) {
      		perror("pipe");
      		exit(4);
      	}
      }
      
      static pid_t xfork(void) {
      	pid_t pid;
      
      	if ((pid = fork()) == (pid_t) -1) {
      		perror("fork");
      		exit(5);
      	}
      
      	return pid;
      }
      
      static int run_forked_proc(int (*proc)(void *), void *data) {
      	int status;
      	pid_t pid;
      
      	if ((pid = xfork()) == 0)
      		exit((*proc)(data));
      	if (waitpid(pid, &status, 0) != pid) {
      		perror("waitpid");
      		return -1;
      	}
      
      	return WIFEXITED(status) ? WEXITSTATUS(status): -2;
      }
      
      static int check_events(int fd, int timeo) {
      	struct pollfd pfd;
      
      	fprintf(stdout, "Checking events for fd %d\n", fd);
      	memset(&pfd, 0, sizeof(pfd));
      	pfd.fd = fd;
      	pfd.events = POLLIN | POLLOUT;
      	if (poll(&pfd, 1, timeo) < 0) {
      		perror("poll()");
      		return 0;
      	}
      	if (pfd.revents & POLLIN)
      		fprintf(stdout, "\tPOLLIN\n");
      	if (pfd.revents & POLLOUT)
      		fprintf(stdout, "\tPOLLOUT\n");
      	if (pfd.revents & POLLERR)
      		fprintf(stdout, "\tPOLLERR\n");
      	if (pfd.revents & POLLHUP)
      		fprintf(stdout, "\tPOLLHUP\n");
      	if (pfd.revents & POLLRDHUP)
      		fprintf(stdout, "\tPOLLRDHUP\n");
      
      	return pfd.revents;
      }
      
      static int epoll_test_tty(void *data) {
      	int epfd, ifd = fileno(stdin), res;
      	struct epoll_event evt;
      
      	if (check_events(ifd, 0) != POLLOUT) {
      		fprintf(stderr, "Something is cooking on STDIN (%d)\n", ifd);
      		return 1;
      	}
      	epfd = xepoll_create(1);
      	fprintf(stdout, "Created epoll fd (%d)\n", epfd);
      	memset(&evt, 0, sizeof(evt));
      	evt.events = EPOLLIN;
      	xepoll_ctl(epfd, EPOLL_CTL_ADD, ifd, &evt);
      	if (check_events(epfd, 0) & POLLIN) {
      		res = epoll_wait(epfd, &evt, 1, 0);
      		if (res == 0) {
      			fprintf(stderr, "Epoll fd (%d) is ready when it shouldn't!\n",
      				epfd);
      			return 2;
      		}
      	}
      
      	return 0;
      }
      
      static int epoll_wakeup_chain(void *data) {
      	struct epoll_test_cfg *tcfg = data;
      	int i, res, epfd, bfd, nfd, pfds[2];
      	pid_t pid;
      	struct epoll_event evt;
      
      	memset(&evt, 0, sizeof(evt));
      	evt.events = EPOLLIN;
      
      	epfd = bfd = xepoll_create(1);
      
      	for (i = 0; i < tcfg->size; i++) {
      		nfd = xepoll_create(1);
      		xepoll_ctl(bfd, EPOLL_CTL_ADD, nfd, &evt);
      		bfd = nfd;
      	}
      	xpipe(pfds);
      	if (tcfg->flags & EPOLL_TF_LOOP)
      	{
      		xepoll_ctl(bfd, EPOLL_CTL_ADD, epfd, &evt);
      		/*
      		 * If we're testing for a loop, we want the wakeup
      		 * triggered by the child's write to the pipe to produce
      		 * a fake event. So we add the pipe read side with
      		 * EPOLLOUT events. This will trigger an addition to the
      		 * ready-list, but no real events will be there. Then the
      		 * epoll kernel code will proceed to call f_op->poll() of
      		 * the epfd, triggering the loop we want to test.
      		 */
      		evt.events = EPOLLOUT;
      	}
      	xepoll_ctl(bfd, EPOLL_CTL_ADD, pfds[0], &evt);
      
      	/*
      	 * The pipe write must come after the poll(2) call inside
      	 * check_events(). This tests the nested wakeup code in
      	 * fs/eventpoll.c:ep_poll_safewake().
      	 * By having check_events() (hence poll(2)) happen first,
      	 * the poll wait queue is already filled up, and the write(2)
      	 * in the child will trigger the wakeup chain.
      	 */
      	if ((pid = xfork()) == 0) {
      		sleep(1);
      		write(pfds[1], "w", 1);
      		exit(0);
      	}
      
      	res = check_events(epfd, 2000) & POLLIN;
      
      	if (waitpid(pid, NULL, 0) != pid) {
      		perror("waitpid");
      		return -1;
      	}
      
      	return res;
      }
      
      static int epoll_poll_chain(void *data) {
      	struct epoll_test_cfg *tcfg = data;
      	int i, res, epfd, bfd, nfd, pfds[2];
      	pid_t pid;
      	struct epoll_event evt;
      
      	memset(&evt, 0, sizeof(evt));
      	evt.events = EPOLLIN;
      
      	epfd = bfd = xepoll_create(1);
      
      	for (i = 0; i < tcfg->size; i++) {
      		nfd = xepoll_create(1);
      		xepoll_ctl(bfd, EPOLL_CTL_ADD, nfd, &evt);
      		bfd = nfd;
      	}
      	xpipe(pfds);
      	if (tcfg->flags & EPOLL_TF_LOOP)
      	{
      		xepoll_ctl(bfd, EPOLL_CTL_ADD, epfd, &evt);
      		/*
      		 * If we're testing for a loop, we want the wakeup
      		 * triggered by the child's write to the pipe to produce
      		 * a fake event. So we add the pipe read side with
      		 * EPOLLOUT events. This will trigger an addition to the
      		 * ready-list, but no real events will be there. Then the
      		 * epoll kernel code will proceed to call f_op->poll() of
      		 * the epfd, triggering the loop we want to test.
      		 */
      		evt.events = EPOLLOUT;
      	}
      	xepoll_ctl(bfd, EPOLL_CTL_ADD, pfds[0], &evt);
      
      	/*
      	 * The pipe write must come before the poll(2) call inside
      	 * check_events(). This tests the nested f_op->poll calls code in
      	 * fs/eventpoll.c:ep_eventpoll_poll().
      	 * By having the pipe write(2) happen first, we make the kernel
      	 * epoll code load the ready lists, and the following poll(2)
      	 * done inside check_events() will test the nested poll code in
      	 * ep_eventpoll_poll().
      	 */
      	if ((pid = xfork()) == 0) {
      		write(pfds[1], "w", 1);
      		exit(0);
      	}
      	sleep(1);
      	res = check_events(epfd, 1000) & POLLIN;
      
      	if (waitpid(pid, NULL, 0) != pid) {
      		perror("waitpid");
      		return -1;
      	}
      
      	return res;
      }
      
      int main(int ac, char **av) {
      	int error;
      	struct epoll_test_cfg tcfg;
      
      	fprintf(stdout, "\n********** Testing TTY events\n");
      	error = run_forked_proc(epoll_test_tty, NULL);
      	fprintf(stdout, error == 0 ?
      		"********** OK\n": "********** FAIL (%d)\n", error);
      
      	tcfg.size = 3;
      	tcfg.flags = 0;
      	fprintf(stdout, "\n********** Testing short wakeup chain\n");
      	error = run_forked_proc(epoll_wakeup_chain, &tcfg);
      	fprintf(stdout, error == POLLIN ?
      		"********** OK\n": "********** FAIL (%d)\n", error);
      
      	tcfg.size = EPOLL_MAX_CHAIN;
      	tcfg.flags = 0;
      	fprintf(stdout, "\n********** Testing long wakeup chain (HOLD ON)\n");
      	error = run_forked_proc(epoll_wakeup_chain, &tcfg);
      	fprintf(stdout, error == 0 ?
      		"********** OK\n": "********** FAIL (%d)\n", error);
      
      	tcfg.size = 3;
      	tcfg.flags = 0;
      	fprintf(stdout, "\n********** Testing short poll chain\n");
      	error = run_forked_proc(epoll_poll_chain, &tcfg);
      	fprintf(stdout, error == POLLIN ?
      		"********** OK\n": "********** FAIL (%d)\n", error);
      
      	tcfg.size = EPOLL_MAX_CHAIN;
      	tcfg.flags = 0;
      	fprintf(stdout, "\n********** Testing long poll chain (HOLD ON)\n");
      	error = run_forked_proc(epoll_poll_chain, &tcfg);
      	fprintf(stdout, error == 0 ?
      		"********** OK\n": "********** FAIL (%d)\n", error);
      
      	tcfg.size = 3;
      	tcfg.flags = EPOLL_TF_LOOP;
      	fprintf(stdout, "\n********** Testing loopy wakeup chain (HOLD ON)\n");
      	error = run_forked_proc(epoll_wakeup_chain, &tcfg);
      	fprintf(stdout, error == 0 ?
      		"********** OK\n": "********** FAIL (%d)\n", error);
      
      	tcfg.size = 3;
      	tcfg.flags = EPOLL_TF_LOOP;
      	fprintf(stdout, "\n********** Testing loopy poll chain (HOLD ON)\n");
      	error = run_forked_proc(epoll_poll_chain, &tcfg);
      	fprintf(stdout, error == 0 ?
      		"********** OK\n": "********** FAIL (%d)\n", error);
      
      	return 0;
      }
      Signed-off-by: Davide Libenzi <davidel@xmailserver.org>
      Cc: Pavel Pisa <pisa@cmp.felk.cvut.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • drivers/misc/isl29003.c: driver for the ISL29003 ambient light sensor · 3cdbbeeb
      Daniel Mack committed
      Add a driver for Intersil's ISL29003 ambient light sensor device plus some
      documentation.  Inspired by tsl2550.c, a driver for a similar device.
      
      It is put in drivers/misc for now until the industrial I/O framework gets
      merged.
      Signed-off-by: Daniel Mack <daniel@caiaq.de>
      Acked-by: Jonathan Cameron <jic23@cam.ac.uk>
      Cc: Jean Delvare <khali@linux-fr.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hpilo: reduce frequency of IO operations · 891f7d73
      David Altobelli committed
      Change hpilo open and close logic to spin for 10usec between checking device,
      rather than every usec.
      
      Because the loop is coded to take up to 10ms, it seemed prudent to
      increase the interval between device polls, reducing the load on the
      system and leaving more room for other work.
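      A user-space sketch of the trade-off, under stated assumptions: the names
      are hypothetical, and nanosleep() stands in for whatever delay primitive
      the driver actually uses.  With a 10ms budget, checking every 10usec means
      at most about 1,000 device checks instead of about 10,000 at 1usec.

      ```c
      #define _POSIX_C_SOURCE 200809L
      #include <assert.h>
      #include <stdbool.h>
      #include <time.h>

      /*
       * Checks ready(arg) every interval_us microseconds until it succeeds or
       * budget_us microseconds of (requested) waiting have passed.  Returns the
       * number of checks on success, or minus the number of checks on timeout.
       */
      static int poll_until_ready(bool (*ready)(void *), void *arg,
      			    long interval_us, long budget_us)
      {
      	struct timespec ts = { 0, 0 };
      	long waited = 0;
      	int checks = 0;

      	assert(interval_us > 0 && interval_us < 1000000);
      	ts.tv_nsec = interval_us * 1000L;
      	for (;;) {
      		checks++;
      		if (ready(arg))
      			return checks;
      		if (waited >= budget_us)
      			return -checks;
      		nanosleep(&ts, NULL);	/* stand-in for the driver's delay */
      		waited += interval_us;
      	}
      }

      /* Hypothetical device that reports ready on its Nth check. */
      struct fake_dev {
      	int checks_until_ready;
      };

      static bool fake_dev_ready(void *arg)
      {
      	struct fake_dev *dev = arg;

      	return --dev->checks_until_ready <= 0;
      }
      ```

      Raising interval_us lowers the number of device accesses for the same
      budget, at the cost of noticing readiness up to one interval later.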
      Signed-off-by: David Altobelli <david.altobelli@hp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ntfs: remove private wrapper of endian helpers · 63cd8854
      Harvey Harrison committed
      The base versions now handle constant folding and are shorter than these
      private wrappers, so use them directly.
      Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • introduce pr_cont() macro · 311d0761
      Cyrill Gorcunov committed
      The pr_...() macros cover every log level except KERN_CONT.  Add
      pr_cont() for convenience.
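      A user-space mock showing the shape of such macros; printk(), log_buf and
      the "<6>"/"<c>" prefix strings below are stand-ins assumed for
      illustration, not the kernel's implementation.  Each pr_<level>() simply
      prepends its KERN_<LEVEL> string, and KERN_CONT marks a message that
      continues the previous line instead of starting a new one.

      ```c
      #include <assert.h>
      #include <stdarg.h>
      #include <stdio.h>
      #include <string.h>

      /* Mock level prefixes; each is exactly 3 characters here. */
      #define KERN_INFO "<6>"
      #define KERN_CONT "<c>"

      static char log_buf[256];

      /* Mock printk(): a message starts a new line unless it is KERN_CONT. */
      static void printk(const char *fmt, ...)
      {
      	size_t len = strlen(log_buf);
      	va_list ap;

      	assert(strlen(fmt) >= 3);
      	if (strncmp(fmt, KERN_CONT, 3) != 0 && len > 0 &&
      	    len < sizeof(log_buf) - 1)
      		log_buf[len++] = '\n';
      	va_start(ap, fmt);
      	/* fmt + 3 skips the level prefix */
      	vsnprintf(log_buf + len, sizeof(log_buf) - len, fmt + 3, ap);
      	va_end(ap);
      }

      #define pr_info(fmt, ...) printk(KERN_INFO fmt, ##__VA_ARGS__)
      #define pr_cont(fmt, ...) printk(KERN_CONT fmt, ##__VA_ARGS__)
      ```

      For example, pr_info("probing %d", 1) followed by pr_cont(" ... %s", "ok")
      leaves both pieces on one line of the mock log.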
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Harvey Harrison <harvey.harrison@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • init/main.c: fix sparse warnings: context imbalance · acdd052a
      Hannes Eder committed
      Impact: Attribute function 'init_post' with __releases(...).
      
      Fix these sparse warnings:
        init/main.c:805:21: warning: context imbalance in 'init_post' - unexpected unlock
        init/main.c:899:9: warning: context imbalance in 'kernel_init' - wrong count at exit
      Signed-off-by: Hannes Eder <hannes@hanneseder.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bcm47xx: fix GPIO API return codes · e0f7ad5f
      Michael Buesch committed
      The GPIO API is supposed to return 0 or a negative error code,
      but the SSB GPIO functions return the bitmask of the GPIO register.
      Fix this by ignoring the bitmask and always returning 0. The SSB GPIO functions can't fail.
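      A sketch of the fix's shape with a mocked register; ssb_gpio_out_mock()
      and gpio_direction_output_sketch() are hypothetical names, not the SSB or
      bcm47xx functions.  The helper returns a register bitmask, and the
      GPIO-API-style wrapper discards it and reports success with 0.

      ```c
      #include <assert.h>
      #include <stdint.h>

      /* Mocked GPIO output register. */
      static uint32_t gpio_out_reg;

      /* Hypothetical SSB-style helper: returns the register bitmask, never fails. */
      static uint32_t ssb_gpio_out_mock(uint32_t mask, uint32_t value)
      {
      	gpio_out_reg = (gpio_out_reg & ~mask) | (value & mask);
      	return gpio_out_reg;
      }

      /* GPIO-API-style wrapper: 0 on success, negative errno on failure. */
      static int gpio_direction_output_sketch(unsigned int gpio, int value)
      {
      	uint32_t mask;

      	assert(gpio < 32);
      	mask = UINT32_C(1) << gpio;
      	/* The bitmask is not an error code, so it is deliberately ignored. */
      	(void)ssb_gpio_out_mock(mask, value ? mask : 0);
      	return 0;
      }
      ```

      Returning the helper's value directly would hand callers a positive
      bitmask such as 0x8, which code testing for a non-zero error would
      misread as a failure.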
      Signed-off-by: Michael Buesch <mb@bu3sch.de>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Brownell <david-b@pacbell.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • auxdisplay: remove PARPORT dependency · c0aa24ba
      H Hartley Sweeten committed
      Remove PARPORT dependency for Auxiliary Display support.
      
      This is not needed since the dependency for the KS0108 driver is
      PARPORT_PC.
      Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
      Cc: Miguel Ojeda Sandonis <miguel.ojeda.sandonis@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • remove unused include/asm-generic/dma-mapping.h · fcd5e162
      FUJITA Tomonori committed
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • filesystem freeze: allow SysRq emergency thaw to thaw frozen filesystems · c2d75438
      Eric Sandeen committed
      Now that the filesystem freeze operation has been elevated to the VFS, and
      is just an ioctl away, some sort of safety net for unintentionally frozen
      root filesystems may be in order.
      
      The timeout thaw originally proposed did not get merged, but perhaps
      something like this would be useful in emergencies.
      
      For example, freezing /path/to/mountpoint will freeze your root filesystem
      if you forgot that nothing is currently mounted there.
      
      I chose 'j' as the last remaining character other than 'h', which is sort
      of reserved for help (because help is printed for any unknown character).
      
      I've tested this on a non-root fs with multiple (nested) freezers, as well
      as on a system rendered unresponsive due to a frozen root fs.
      
      [randy.dunlap@oracle.com: emergency thaw only if CONFIG_BLOCK enabled]
      Signed-off-by: Eric Sandeen <sandeen@redhat.com>
      Cc: Takashi Sato <t-sato@yk.jp.nec.com>
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/rbtree.c: optimize rb_erase() · 55a63998
      Wolfram Strepp committed
      Four redundant if-conditions in function __rb_erase_color() in
      lib/rbtree.c are removed.
      
      In pseudo-source-code, the structure of the code is as follows:
      
      if ((!A || B) && (!C || D)) {
      	.
      	.
      	.
      } else {
      	if (!C || D) {//if this is true, it implies: (A == true) && (B == false)
      		if (A) {//hence this always evaluates to 'true'...
      			.
      		}
      		.
      		//at this point, C always becomes true, because of:
      		__rb_rotate_right/left();
      		//and:
      		other = parent->rb_right/left;
      	}
      	.
      	.
      	if (C) {//...and this too !
      		.
      	}
      }
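      The first half of the argument is plain boolean logic, so it can be checked
      exhaustively.  This throwaway program (not part of the patch; the function
      name is hypothetical) enumerates all 16 assignments of A, B, C and D and
      asserts that the inner "if (A)" can only be reached with A true and B
      false.  The second redundancy, "if (C)", depends on the rotation and is not
      modeled here.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Returns how many assignments reach the inner "if (!C || D)" branch. */
      static int check_rb_erase_implication(void)
      {
      	int reached = 0;

      	for (int bits = 0; bits < 16; bits++) {
      		bool A = bits & 1, B = bits & 2, C = bits & 4, D = bits & 8;

      		if ((!A || B) && (!C || D))
      			continue;		/* outer if-branch taken */
      		if (!C || D) {			/* inner branch of the else */
      			assert(A && !B);	/* so "if (A)" is always true */
      			reached++;
      		}
      	}
      	return reached;
      }
      ```

      Exactly three assignments reach the inner branch (A true, B false, and
      (C, D) one of (0, 0), (0, 1), (1, 1)), and all of them pass the assertion.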
      Signed-off-by: Wolfram Strepp <wstrepp@gmx.de>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Andrea Arcangeli <andrea@qumranet.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • loop: add ioctl to resize a loop device · 53d66608
      J. R. Okajima committed
      Add the ability to 'resize' the loop device on the fly.
      
      One practical application is a loop file with an XFS filesystem that is
      already mounted: you can easily enlarge the file (append some bytes) and
      then call ioctl(fd, LOOP_SET_CAPACITY, new).  The loop driver will learn
      about the new size, and you can then run xfs_growfs to use the full
      capacity of the loop file without having to unmount it.
      
      Test app:
      
      #define _GNU_SOURCE	/* must come before any #include to take effect */
      #include <linux/fs.h>
      #include <linux/loop.h>
      #include <sys/ioctl.h>
      #include <sys/stat.h>
      #include <sys/types.h>
      #include <assert.h>
      #include <errno.h>
      #include <fcntl.h>
      #include <getopt.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <unistd.h>
      
      char *me;
      
      void usage(FILE *f)
      {
      	fprintf(f, "%s [options] loop_dev [backend_file]\n"
      		"-s, --set new_size_in_bytes\n"
      		"\twhen backend_file is given, "
      		"it will be expanded too while keeping the original contents\n",
      		me);
      }
      
      struct option opts[] = {
      	{
      		.name		= "set",
      		.has_arg	= 1,
      		.flag		= NULL,
      		.val		= 's'
      	},
      	{
      		.name		= "help",
      		.has_arg	= 0,
      		.flag		= NULL,
      		.val		= 'h'
      	},
      	{ 0, 0, 0, 0 }	/* getopt_long() needs a zero-filled terminator */
      };
      
      void err_size(char *name, __u64 old)
      {
      	fprintf(stderr, "size must be larger than current %s (%llu)\n",
      		name, old);
      }
      
      int main(int argc, char *argv[])
      {
      	int fd, err, c, i, bfd;
      	ssize_t ssz;
      	size_t sz;
      	__u64 old, new, append;
      	char a[BUFSIZ] = { 0 };	/* appended bytes are zeros, not stack garbage */
      	struct stat st;
      	FILE *out;
      	char *backend, *dev;
      
      	err = EINVAL;
      	out = stderr;
      	me = argv[0];
      	new = 0;
      	while ((c = getopt_long(argc, argv, "s:h", opts, &i)) != -1) {
      		switch (c) {
      		case 's':
      			errno = 0;
      			new = strtoull(optarg, NULL, 0);
      			if (errno) {
      				err = errno;
      				perror(optarg);
      				goto out;
      			}
      			break;
      
      		case 'h':
      			err = 0;
      			out = stdout;
      			goto err;
      
      		default:
      			/* getopt_long() has already reported the bad option */
      			goto err;
      		}
      	}
      
      	if (optind < argc)
      		dev = argv[optind++];
      	else
      		goto err;
      
      	fd = open(dev, O_RDONLY);
      	if (fd < 0) {
      		err = errno;
      		perror(dev);
      		goto out;
      	}
      
      	err = ioctl(fd, BLKGETSIZE64, &old);
      	if (err) {
      		err = errno;
      		perror("ioctl BLKGETSIZE64");
      		goto out;
      	}
      
      	if (!new) {
      		printf("%llu\n", old);
      		goto out;
      	}
      
      	if (new < old) {
      		err = EINVAL;
      		err_size(dev, old);
      		goto out;
      	}
      
      	if (optind < argc) {
      		backend = argv[optind++];
      		bfd = open(backend, O_WRONLY|O_APPEND);
      		if (bfd < 0) {
      			err = errno;
      			perror(backend);
      			goto out;
      		}
      		err = fstat(bfd, &st);
      		if (err) {
      			err = errno;
      			perror(backend);
      			goto out;
      		}
      		if (new < st.st_size) {
      			err = EINVAL;
      			err_size(backend, st.st_size);
      			goto out;
      		}
      		append = new - st.st_size;
      		sz = sizeof(a);
      		while (append > 0) {
      			if (append < sz)
      				sz = append;
      			ssz = write(bfd, a, sz);
      			if (ssz != sz) {
      				err = errno;
      				perror(backend);
      				goto out;
      			}
      			append -= sz;
      		}
      		err = fsync(bfd);
      		if (err) {
      			err = errno;
      			perror(backend);
      			goto out;
      		}
      	}
      
      	err = ioctl(fd, LOOP_SET_CAPACITY, new);
      	if (err) {
      		err = errno;
      		perror("ioctl LOOP_SET_CAPACITY");
      	}
      	goto out;
      
       err:
      	usage(out);
       out:
      	return err;
      }
      Signed-off-by: J. R. Okajima <hooanon05@yahoo.co.jp>
      Signed-off-by: Tomas Matejicek <tomas@slax.org>
      Cc: <util-linux-ng@vger.kernel.org>
      Cc: Karel Zak <kzak@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: <linux-api@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • uml: remove useless comments · 65bd6a9b
      WANG Cong committed
      These comments are useless now, remove them.
      Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • uml: improve error messages · 5062910a
      WANG Cong committed
      These error messages are from check_sysemu(), not check_ptrace().
      Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • uml: don't use a too long string literal · dc717687
      WANG Cong committed
      uml uses a concatenated string literal to store the contents of .config,
      but the content of .config is variable and can be very long.
      
      Use an array of string literals instead.
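      A sketch of the array approach with made-up config lines; config_lines[]
      and dump_config() are hypothetical names, not the uml code.  One reason a
      single huge literal is a problem: C99 only guarantees support for string
      literals of 4095 characters after concatenation, while each element of an
      array of literals stays short.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>
      #include <string.h>

      /* Made-up example lines: one short literal per line, NULL-terminated. */
      static const char *const config_lines[] = {
      	"CONFIG_UML=y\n",
      	"CONFIG_64BIT=y\n",
      	NULL
      };

      /*
       * Copies as many whole lines as fit into buf (always NUL-terminated when
       * size > 0) and returns the total length of all lines.
       */
      static size_t dump_config(char *buf, size_t size)
      {
      	size_t used = 0, total = 0;
      	bool full = (size == 0);

      	if (size > 0)
      		buf[0] = '\0';
      	for (const char *const *p = config_lines; *p != NULL; p++) {
      		size_t len = strlen(*p);

      		total += len;
      		if (!full && used + len < size) {
      			memcpy(buf + used, *p, len + 1);	/* includes the NUL */
      			used += len;
      		} else {
      			full = true;
      		}
      	}
      	return total;
      }
      ```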
      Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ubd: stop defining MAJOR_NR · 792dd4fc
      Christoph Hellwig committed
      MAJOR_NR hasn't been needed since very early 2.5 kernels.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pm: cleanup includes · bf9ed57d
      Magnus Damm committed
      Remove unused/duplicate cruft from asm/suspend.h:
      
       - x86_32: remove unused acpi code
       - powerpc: remove duplicate prototypes, see linux/suspend.h
      Signed-off-by: Magnus Damm <damm@igel.co.jp>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pm: rework includes, remove arch ifdefs · a8af7898
      Magnus Damm committed
      Make the following header file changes:
      
       - remove arch ifdefs and asm/suspend.h from linux/suspend.h
       - add asm/suspend.h to disk.c (for arch_prepare_suspend())
       - add linux/io.h to swsusp.c (for ioremap())
       - x86 32/64 bit compile fixes
      Signed-off-by: Magnus Damm <damm@igel.co.jp>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: convert u64 to unsigned long long · 5f0e3da6
      Randy Dunlap committed
      Convert alpha architecture to use u64 as unsigned long long.  This is
      being done so that (a) all arches use u64 as unsigned long long and (b)
      printk of a u64 as %ll[ux] will not generate format warnings by gcc.
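      For illustration, assuming a user-space typedef that mirrors the new
      definition: with u64 as unsigned long long, "%llu" is the matching
      conversion on every architecture, so no cast is needed.  Were u64 an
      unsigned long, as alpha previously defined it, gcc's -Wformat would warn
      about the same line.

      ```c
      #include <assert.h>
      #include <stdio.h>
      #include <string.h>

      /* Assumed typedef, matching what the change gives every architecture. */
      typedef unsigned long long u64;

      /* Formats v with %llu; returns the length snprintf() reports. */
      static int format_u64(char *buf, size_t size, u64 v)
      {
      	return snprintf(buf, size, "%llu", v);
      }
      ```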
      
      The only gcc cross-compiler that I have is 4.0.2, which generates errors
      about miscompiling __weak references, so I have commented out that line in
      compiler-gcc4.h so that most of these compile, but more builds and real
      machine testing would be Real Good.
      
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      From: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: xchg/cmpxchg cleanup and fixes · a6209d6d
      Ivan Kokshaysky committed
      - "_local" versions of xchg/cmpxchg functions duplicate code
        of non-local ones (quite a few pages of assembler), except
        memory barriers. We can generate these two variants from a
        single header file using simple macros;
      
      - convert xchg macro back to inline function using always_inline
        attribute;
      
      - use proper argument types for cmpxchg_u8/u16 functions
        to fix a problem with negative arguments.
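Below is a minimal userspace sketch of the first and third points. It uses the GCC/Clang `__atomic` builtins in place of alpha's assembler, and the macro and helper names are hypothetical, not taken from the alpha headers. One macro generates both a fully ordered variant and a `_local` variant. Taking `old` as `unsigned char` avoids the sign-extension mismatch that a `long` parameter invites.

```c
#include <assert.h>

/*
 * Hypothetical generator: the fully ordered and "_local" variants share
 * one body and differ only in memory ordering, instead of duplicating
 * the code by hand.  always_inline mirrors the attribute in the commit.
 */
#define DEFINE_CMPXCHG_U8(suffix, order)				\
	static inline __attribute__((always_inline)) unsigned char	\
	cmpxchg_u8##suffix(unsigned char *m, unsigned char old,		\
			   unsigned char new_val)			\
	{								\
		unsigned char expected = old;				\
		/* On failure, expected receives the current *m. */	\
		__atomic_compare_exchange_n(m, &expected, new_val, 0,	\
					    order, __ATOMIC_RELAXED);	\
		/* Either way, return the previous value of *m. */	\
		return expected;					\
	}

DEFINE_CMPXCHG_U8(, __ATOMIC_SEQ_CST)       /* cmpxchg_u8()       */
DEFINE_CMPXCHG_U8(_local, __ATOMIC_RELAXED) /* cmpxchg_u8_local() */

/*
 * The class of bug that proper argument types avoid: if "old" arrives
 * as a sign-extended long, a byte holding 0xff (read back zero-extended
 * as 255) never compares equal to an old value of (signed char)-1.
 */
static inline int byte_matches_long_old(unsigned char cur, long old)
{
	return (long)cur == old;
}
```

The commit generates its two variants from a single header file using simple macros. This sketch only mimics that idea with compiler builtins. `byte_matches_long_old` is an illustration of the negative-argument problem, not code from the patch.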
      Signed-off-by: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • MAINTAINERS: add the missing linux alpha port mailing list · a9406699
      Committed by Cheng Renquan
      Signed-off-by: Cheng Renquan <crquan@gmail.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • alpha: fix macros · 0b42afd0
      Committed by Roel Kluin
      When this macro isn't called with 'fixup' (e.g. with foo), it
      incorrectly expands to foo->foo.bits.errreg.
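The sketch below shows the pitfall using hypothetical structure and macro names; the real alpha macro is not reproduced here. When a macro parameter shares its name with a structure member, the preprocessor substitutes the argument into both places.

```c
#include <assert.h>

/* Hypothetical types standing in for the alpha error-frame structures. */
struct err_bits {
	unsigned int errreg;
};

struct err_regs {
	struct err_bits bits;
};

struct frame {
	struct err_regs fixup;
};

/*
 * Buggy pattern: the parameter is named like the member "fixup", so the
 * argument replaces BOTH tokens.  BUGGY_ERRREG(foo) expands to
 * ((foo)->foo.bits.errreg), which in this sketch fails to compile; it
 * only works by accident when the argument is literally named "fixup".
 */
#define BUGGY_ERRREG(fixup) ((fixup)->fixup.bits.errreg)

/* Fixed pattern: a parameter name that cannot collide with a member. */
#define FIXED_ERRREG(ptr) ((ptr)->fixup.bits.errreg)
```

With the fixed form, the caller's variable name no longer matters.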
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • shmem: writepage directly to swap · 9fab5619
      Committed by Hugh Dickins
      Synopsis: if shmem_writepage calls swap_writepage directly, most shmem
      swap loads benefit, and a catastrophic interaction between SLUB and some
      flash storage is avoided.
      
      shmem_writepage() has always been peculiar in making no attempt to write:
      it has just transferred a shmem page from file cache to swap cache, then
      let that page make its way around the LRU again before being written and
      freed.
      
      The idea was that people use tmpfs because they want those pages to stay
      in RAM; so although we give it an overflow to swap, we should resist
      writing too soon, giving those pages a second chance before they can be
      reclaimed.
      
      That was always questionable, and I've toyed with this patch for years;
      but never had a clear justification to depart from the original design.
      
      It became more questionable in 2.6.28, when the split LRU patches classed
      shmem and tmpfs pages as SwapBacked rather than as file_cache: that in
      itself gives them more resistance to reclaim than normal file pages.  I
      prepared this patch for 2.6.29, but the merge window arrived before I'd
      completed gathering statistics to justify sending it in.
      
      Then while comparing SLQB against SLUB, running SLUB on a laptop I'd
      habitually used with SLAB, I found SLUB to run my tmpfs kbuild swapping
      tests five times slower than SLAB or SLQB - other machines slower too, but
      nowhere near so bad.  Simpler "cp -a" swapping tests showed the same.
      
      slub_max_order=0 brings sanity to all, but heavy swapping is too far from
      normal to justify such a tuning.  The crucial factor on that laptop turns
      out to be that I'm using an SD card for swap.  What happens is this:
      
      By default, SLUB uses order-2 pages for shmem_inode_cache (and many other
      fs inodes), so creating tmpfs files under memory pressure brings lumpy
      reclaim into play.  One subpage of the order is chosen from the bottom of
      the LRU as usual, then the other three picked out from their random
      positions on the LRUs.
      
      In a tmpfs load, many of these pages will be ones which already passed
      through shmem_writepage, so already have swap allocated.  And though their
      offsets on swap were probably allocated sequentially, now that the pages
      are picked off at random, their swap offsets are scattered.
      
      But the flash storage on the SD card is very sensitive to having its
      writes merged: once swap is written at scattered offsets, performance
      falls apart.  Rotating disk seeks increase too, but less disastrously.
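A toy model, not kernel code, makes the merging argument concrete. It assumes a device can merge a write into the previous one only when the swap offset immediately follows it; the function name `count_write_runs` is hypothetical. Writing offsets in allocation order yields one run. Writing the same pages after they were picked off at random positions yields many short runs.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model: count the separate I/O runs needed to write swap slots in
 * the given order, assuming a write can be merged with the previous one
 * only when its offset is exactly one past the previous offset.
 */
static size_t count_write_runs(const unsigned long *offsets, size_t n)
{
	size_t runs = 0;

	for (size_t i = 0; i < n; i++) {
		/* A new run starts unless this offset extends the last one. */
		if (i == 0 || offsets[i] != offsets[i - 1] + 1)
			runs++;
	}
	return runs;
}
```

For instance, offsets 100 to 107 written in order form a single run. The same eight offsets written in a shuffled order such as 104, 100, 107, 101, 103, 106, 102, 105 need eight separate runs.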
      
      So: stop giving shmem/tmpfs pages a second pass around the LRU, write them
      out to swap as soon as their swap has been allocated.
      
      It's surely possible to devise an artificial load which runs faster the
      old way, one whose sizing is such that the tmpfs pages on their second
      pass are the ones that are wanted again, and other pages not.
      
      But I've not yet found such a load: on all machines, under the loads I've
      tried, immediate swap_writepage speeds up shmem swapping: especially when
      using the SLUB allocator (and more effectively than slub_max_order=0), but
      also with the others; and it also reduces the variance between runs.  How
      much faster varies widely: a factor of five is rare, 5% is common.
      
      One load which might have suffered: imagine a swapping shmem load in a
      limited mem_cgroup on a machine with plenty of memory.  Before 2.6.29 the
      swapcache was not charged, and such a load would have run quickest with
      the shmem swapcache never written to swap.  But now swapcache is charged,
      so even this load benefits from shmem_writepage directly to swap.
      
      Apologies for the #ifndef CONFIG_SWAP swap_writepage() stub in swap.h:
      it's silly because that will never get called; but refactoring shmem.c
      sensibly according to CONFIG_SWAP will be a separate task.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmscan: fix it to take care of nodemask · 327c0e96
      Committed by KAMEZAWA Hiroyuki
      try_to_free_pages() is used for the direct reclaim of up to
      SWAP_CLUSTER_MAX pages when watermarks are low.  The caller of
      alloc_pages_nodemask() can specify a nodemask of nodes that are allowed to
      be used, but this is not passed to try_to_free_pages().  This can lead to
      unnecessary reclaim of pages that are unusable by the caller and, in the
      worst case, to allocation failure because progress was not made where
      it is needed.
      
      This patch passes the nodemask used for alloc_pages_nodemask() to
      try_to_free_pages().
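The following is a simplified userspace model of the fix, not the kernel's zonelist walk. It represents the nodemask as a plain bitmask and tracks a hypothetical per-zone reclaimable count; none of these names come from the kernel. Reclaim walks the zones but skips any zone on a node the caller cannot allocate from, so no effort is spent on unusable pages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified zone: its node and its reclaimable pages. */
struct toy_zone {
	int nid;
	unsigned long reclaimable;
};

/*
 * Reclaim up to "target" pages, but only from zones whose node is set
 * in "allowed" (bit n set means node n is usable by the caller).
 * Returns the number of pages reclaimed.
 */
static unsigned long toy_reclaim(struct toy_zone *zones, size_t nr_zones,
				 unsigned long allowed, unsigned long target)
{
	unsigned long done = 0;

	for (size_t i = 0; i < nr_zones && done < target; i++) {
		struct toy_zone *z = &zones[i];
		unsigned long take;

		/* Skip nodes the caller cannot allocate from anyway. */
		if (!(allowed & (1UL << z->nid)))
			continue;

		take = target - done;
		if (take > z->reclaimable)
			take = z->reclaimable;
		z->reclaimable -= take;
		done += take;
	}
	return done;
}
```

Without the mask, the same loop would happily drain nodes 0 and 2 for a caller that can only use node 1. That is the unnecessary reclaim the commit describes.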
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ramfs-nommu: use generic lru cache · 2678958e
      Committed by Johannes Weiner
      Instead of open-coding the lru-list-add pagevec batching when expanding a
      file mapping from zero, defer to the appropriate page cache function that
      also takes care of adding the page to the lru list.
      
      This is cleaner, saves code and reduces the stack footprint by 16 words
      worth of pagevec.
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.com>
      Cc: MinChan Kim <minchan.kim@gmail.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmscan: print shrink_slab symbol name on negative shrinker objects · 88c3bd70
      Committed by David Rientjes
      When a shrinker has a negative number of objects to delete, the symbol
      name of the shrinker should be printed, not shrink_slab.  This also makes
      the error message slightly more informative.
      
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nommu: make CONFIG_UNEVICTABLE_LRU available when CONFIG_MMU=n · 71aa653c
      Committed by David Howells
      Make CONFIG_UNEVICTABLE_LRU available when CONFIG_MMU=n.  There's no logical
      reason it shouldn't be available, and it can be used for ramfs.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Enrik Berkhan <Enrik.Berkhan@ge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • nommu: there is no mlock() for NOMMU, so don't provide the bits · 33925b25
      Committed by David Howells
      The mlock() facility does not exist for NOMMU since all mappings are
      effectively locked anyway, so we don't make the bits available when
      they're not useful.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Greg Ungerer <gerg@snapgear.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: Enrik Berkhan <Enrik.Berkhan@ge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: use debug_kmap_atomic · 7ca43e75
      Committed by Akinobu Mita
      Use debug_kmap_atomic in kmap_atomic, kmap_atomic_pfn, and
      iomap_atomic_prot_pfn.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: introduce debug_kmap_atomic · f4112de6
      Committed by Akinobu Mita
      x86 has debug_kmap_atomic_prot(), an error-checking function for
      kmap_atomic.  It is useful for other architectures as well, although it
      requires CONFIG_TRACE_IRQFLAGS_SUPPORT.
      
      This patch exposes it to the other architectures.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: page_mkwrite change prototype to match fault: fix sysfs · 851a039c
      Committed by Hugh Dickins
      Fix warnings and return values in sysfs bin_page_mkwrite(), fixing
      fs/sysfs/bin.c: In function `bin_page_mkwrite':
      fs/sysfs/bin.c:250: warning: passing argument 2 of `bb->vm_ops->page_mkwrite' from incompatible pointer type
      fs/sysfs/bin.c: At top level:
      fs/sysfs/bin.c:280: warning: initialization from incompatible pointer type
      
      This expects my "[PATCH next] sysfs: fix some bin_vm_ops errors" to be applied first.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: "Eric W. Biederman" <ebiederm@aristanetworks.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs: fix page_mkwrite error cases in core code and btrfs · 56a76f82
      Committed by Nick Piggin
      page_mkwrite is called with neither the page lock nor the ptl held.  This
      means a page can be concurrently truncated or invalidated out from
      underneath it.  Callers are supposed to prevent truncate races themselves;
      however, previously the only thing they could do if they hit one was to
      raise a SIGBUS.  A SIGBUS is wrong when the page has been invalidated or
      truncated within i_size (e.g. a hole was punched).  Callers may also have
      to perform memory allocations in this path, where, again, a SIGBUS would
      be wrong.
      
      The previous patch ("mm: page_mkwrite change prototype to match fault")
      made it possible to properly specify errors.  Convert the generic buffer.c
      code and btrfs to return sane error values (in the case of page removed
      from pagecache, VM_FAULT_NOPAGE will cause the fault handler to exit
      without doing anything, and the fault will be retried properly).
      
      This fixes core code, and converts btrfs as a template/example.  All other
      filesystems defining their own page_mkwrite should be fixed in a similar
      manner.
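The error policy described above can be sketched in userspace as follows. Everything here is a mock: the `toy_*` types and the flag values are hypothetical and do not match the kernel's VM_FAULT_* definitions. The sketch shows the new style of returning fault flags instead of an errno. A page that was truncated or invalidated is reported as "no page" so the fault is retried. Allocation failure is assumed to be reported as out-of-memory, and only other errors become SIGBUS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mock fault results; values are illustrative, not the kernel's. */
enum toy_fault {
	TOY_FAULT_OK     = 0,
	TOY_FAULT_OOM    = 1 << 0,
	TOY_FAULT_SIGBUS = 1 << 1,
	TOY_FAULT_NOPAGE = 1 << 2, /* nothing done; the fault is retried */
};

/* Mock of the page state a handler checks under the page lock. */
struct toy_page {
	const void *mapping;   /* changes (e.g. to NULL) once removed */
	long long offset;      /* byte offset of the page in the file */
};

/* Mock of the fault context (stands in for struct vm_fault). */
struct toy_fault_ctx {
	struct toy_page *page; /* stands in for vmf->page */
	const void *mapping;   /* the inode's own mapping */
	long long i_size;      /* current file size */
};

/*
 * Hypothetical handler: "prepare_err" is the result of preparing the
 * write (0, -ENOMEM, -ENOSPC, -EIO, ...).
 */
static int toy_page_mkwrite(const struct toy_fault_ctx *vmf, int prepare_err)
{
	const struct toy_page *page = vmf->page;

	/* Truncated or invalidated behind our back: let the VM retry. */
	if (page->mapping != vmf->mapping || page->offset > vmf->i_size)
		return TOY_FAULT_NOPAGE;

	if (prepare_err == -ENOMEM)
		return TOY_FAULT_OOM;
	if (prepare_err != 0)
		return TOY_FAULT_SIGBUS; /* -ENOSPC, -EIO, etc. */
	return TOY_FAULT_OK;
}
```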
      Acked-by: Chris Mason <chris.mason@oracle.com>
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: page_mkwrite change prototype to match fault · c2ec175c
      Committed by Nick Piggin
      Change the page_mkwrite prototype to take a struct vm_fault, and return
      VM_FAULT_xxx flags.  There should be no functional change.
      
      This makes it possible to return much more detailed error information to
      the VM (and can also provide more information, e.g. virtual_address, to
      the driver, which might be important in some special cases).
      
      This is required for a subsequent fix, and will also make it easier to
      merge page_mkwrite() with fault() in the future.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <joel.becker@oracle.com>
      Cc: Artem Bityutskiy <dedekind@infradead.org>
      Cc: Felix Blyakher <felixb@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: enable hashdist by default on 64bit NUMA · c2fdf3a9
      Committed by Anton Blanchard
      On PowerPC we allocate large boot time hashes on node 0.  This leads to an
      imbalance in the free memory, for example on a 64GB box (4 x 16GB nodes):
      
      Free memory:
      Node 0: 97.03%
      Node 1: 98.54%
      Node 2: 98.42%
      Node 3: 98.53%
      
      If we switch to using vmalloc (like ia64 and x86-64) things are more
      balanced:
      
      Free memory:
      Node 0: 97.53%
      Node 1: 98.35%
      Node 2: 98.33%
      Node 3: 98.33%
      
      For many HPC applications we are limited by the free available memory on
      the smallest node, so even though the same amount of memory is used the
      better balancing helps.
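The arithmetic behind this argument can be checked directly from the two listings. This small sketch uses hypothetical helper names and stores the percentages as integers in hundredths of a percent to avoid floating-point rounding. The nodes are equal-sized (16GB each), so summing the percentages compares total free memory.

```c
#include <assert.h>
#include <stddef.h>

/* Free memory per node in hundredths of a percent, from the listings. */
static const int before_vmalloc[] = { 9703, 9854, 9842, 9853 };
static const int after_vmalloc[]  = { 9753, 9835, 9833, 9833 };

/* The node with the least free memory bounds what a job can use. */
static int min_free(const int *v, size_t n)
{
	int m = v[0];

	for (size_t i = 1; i < n; i++)
		if (v[i] < m)
			m = v[i];
	return m;
}

/* Equal-sized nodes, so the sum tracks total free memory. */
static int total_free(const int *v, size_t n)
{
	int s = 0;

	for (size_t i = 0; i < n; i++)
		s += v[i];
	return s;
}
```

The summed percentages differ by only 0.02 points (392.52 before versus 392.54 after), so roughly the same amount of memory is in use. The least-free node, however, improves from 97.03% to 97.53%.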
      
      Since all 64bit NUMA capable architectures should have sufficient vmalloc
      space, it makes sense to enable it via CONFIG_64BIT.
      Signed-off-by: Anton Blanchard <anton@samba.org>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: Ralf Baechle <ralf@linux-mips.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Richard Henderson <rth@twiddle.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>