1. 06 Feb 2008: 40 commits
    • ACPI: battery: add sysfs serial number · 7c2670bb
      Committed by maximilian attems
      egrep serial /proc/acpi/battery/BAT0/info
      serial number:           32090
      
      The serial number can tell you whether you are in imminent danger
      of being set on fire.
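      A minimal C sketch of reading the new attribute from user space. It assumes the battery appears under the power_supply class as /sys/class/power_supply/BAT0/serial_number; that path and the helper name read_sysfs_attr are assumptions for illustration, not taken from this commit.

```c
#include <stdio.h>
#include <string.h>

/* Read a one-line sysfs attribute into buf and strip the trailing newline.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_sysfs_attr(const char *path, char *buf, size_t len)
{
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	if (!fgets(buf, (int)len, f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	buf[strcspn(buf, "\n")] = '\0';	/* sysfs values end with '\n' */
	return 0;
}
```

      Called with the (assumed) sysfs path above, it would fill buf with the same value that /proc/acpi/battery/BAT0/info reports.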
      Signed-off-by: maximilian attems <max@stro.at>
      Acked-by: Alexey Starikovskiy <astarikovskiy@suse.de>
      Signed-off-by: Len Brown <len.brown@intel.com>
    • mac68k: add nubus card definitions and a typo fix · 57dfee7c
      Committed by Finn Thain
      Add some new card definitions and fix a typo (from Eugen Paiuc).
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • leds: add possibility to remove leds classdevs during suspend/resume · fa23f5cc
      Committed by Rafael J. Wysocki
      Make it possible to unregister a led classdev object in a safe way during a
      suspend/resume cycle.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Michael Buesch <mb@bu3sch.de>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • HWRNG: add possibility to remove hwrng devices during suspend/resume · a41e3dc4
      Committed by Rafael J. Wysocki
      Make it possible to unregister a Hardware Random Number Generator
      device object in a safe way during a suspend/resume cycle.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Acked-by: Michael Buesch <mb@bu3sch.de>
      Cc: Michael Buesch <mb@bu3sch.de>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Misc: Add possibility to remove misc devices during suspend/resume · 533354d4
      Committed by Rafael J. Wysocki
      Make it possible to unregister a misc device object in a safe way during a
      suspend/resume cycle.
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: Michael Buesch <mb@bu3sch.de>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Cc: Richard Purdie <rpurdie@rpsys.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • latency.c: use QoS infrastructure · f011e2e2
      Committed by Mark Gross
      Replace latency.c use with pm_qos_params use.
      Signed-off-by: mark gross <mgross@linux.intel.com>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Jaroslav Kysela <perex@suse.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • pm qos infrastructure and interface · d82b3518
      Committed by Mark Gross
      The following patch is a generalization of the latency.c implementation done
      by Arjan last year.  It provides infrastructure for more than one parameter,
      and exposes a user mode interface for processes to register pm_qos
      expectations.
      
      This interface provides a kernel and user mode interface for registering
      performance expectations by drivers, subsystems and user space applications on
      one of the parameters.
      
      Currently we have {cpu_dma_latency, network_latency, network_throughput} as
      the initial set of pm_qos parameters.
      
      The infrastructure exposes multiple misc device nodes, one per implemented
      parameter.  The set of parameters implemented is defined by pm_qos_power_init()
      and pm_qos_params.h.  This is done because making the available parameters
      runtime configurable or changeable from a driver was seen as too easy to
      abuse.
      
      For each parameter a list of performance requirements is maintained along with
      an aggregated target value.  The aggregated target value is updated with
      changes to the requirement list or elements of the list.  Typically the
      aggregated target value is simply the max or min of the requirement values
      held in the parameter list elements.
      
      From kernel mode the use of this interface is simple:
      
      pm_qos_add_requirement(param_id, name, target_value):
      
        Will insert a named element in the list for that identified PM_QOS
        parameter with the target value.  Upon change to this list the new target is
        recomputed and any registered notifiers are called only if the target value
        is now different.
      
      pm_qos_update_requirement(param_id, name, new_target_value):
      
        Will search the list identified by the param_id for the named list element
        and then update its target value, calling the notification tree if the
        aggregated target is changed.
      
      pm_qos_remove_requirement(param_id, name):
      
        Will search the identified list for the named element and remove it, after
        removal it will update the aggregate target and call the notification tree
        if the target was changed as a result of removing the named requirement.
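      To make the aggregation rule concrete, here is a small user-space model of one parameter's requirement list. It is a sketch, not the kernel implementation: struct qos_param and the qos_add/qos_update/qos_remove names are hypothetical, and it assumes a latency-style parameter where the smallest (strictest) request wins. Notifiers are modelled as a counter that is bumped only when the aggregate actually changes.

```c
#include <limits.h>
#include <stdio.h>
#include <string.h>

#define MAX_REQS 16

/* Hypothetical model of one pm_qos parameter: named requests plus an
 * aggregated target. Here the smallest value wins (latency-style). */
struct qos_param {
	char name[MAX_REQS][32];
	int value[MAX_REQS];
	int n;
	int target;		/* aggregated target value */
	int default_value;	/* target when no request is registered */
	int notifications;	/* times the "notifier chain" would have run */
};

/* Recompute the aggregate; "notify" only if it actually changed. */
static void recompute(struct qos_param *p)
{
	int t = p->default_value;

	for (int i = 0; i < p->n; i++)
		if (p->value[i] < t)
			t = p->value[i];
	if (t != p->target) {
		p->target = t;
		p->notifications++;
	}
}

static int find(const struct qos_param *p, const char *name)
{
	for (int i = 0; i < p->n; i++)
		if (strcmp(p->name[i], name) == 0)
			return i;
	return -1;
}

int qos_add(struct qos_param *p, const char *name, int value)
{
	if (p->n == MAX_REQS || find(p, name) >= 0)
		return -1;
	snprintf(p->name[p->n], sizeof p->name[0], "%s", name);
	p->value[p->n++] = value;
	recompute(p);
	return 0;
}

int qos_update(struct qos_param *p, const char *name, int value)
{
	int i = find(p, name);

	if (i < 0)
		return -1;
	p->value[i] = value;
	recompute(p);
	return 0;
}

int qos_remove(struct qos_param *p, const char *name)
{
	int i = find(p, name);

	if (i < 0)
		return -1;
	p->n--;
	if (i != p->n) {	/* move the last entry into the freed slot */
		memcpy(p->name[i], p->name[p->n], sizeof p->name[0]);
		p->value[i] = p->value[p->n];
	}
	recompute(p);
	return 0;
}
```

      Initialising both target and default_value to INT_MAX makes an empty list mean "no constraint"; adding a looser request than the current minimum then leaves the target unchanged and triggers no notification.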
      
      From user mode:
      
        Only processes can register a pm_qos requirement.  To provide for
        automatic cleanup of a process's requirements, the interface requires the
        process to register its parameter requirements in the following way:
      
        To register the default pm_qos target for the specific parameter, the
        process must open one of /dev/[cpu_dma_latency, network_latency,
        network_throughput]
      
        As long as the device node is held open, that process has a registered
        requirement on the parameter.  The name of the requirement is
        "process_<PID>", derived from current->pid within the open system
        call.
      
        To change the requested target value, the process needs to write an s32
        value to the open device node.  This translates to a
        pm_qos_update_requirement call.
      
        To remove the user mode request for a target value simply close the device
        node.
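      A hedged C sketch of the user-mode side: open the device node, write a raw s32, and keep the descriptor open for as long as the requirement should stay in force. The helper name pm_qos_request_open is hypothetical; the device paths are the ones listed above.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Register a pm_qos request from user space: open the parameter's device
 * node and write the requested target as a raw s32. The request stays
 * registered as long as the returned descriptor is open; close() drops it.
 * Returns the descriptor, or -1 on error. */
int pm_qos_request_open(const char *path, int32_t value)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	if (write(fd, &value, sizeof value) != (ssize_t)sizeof value) {
		close(fd);
		return -1;
	}
	return fd;
}
```

      A process wanting a tighter CPU DMA latency might call pm_qos_request_open("/dev/cpu_dma_latency", value), keep the returned descriptor for as long as it needs the guarantee, and close() it (or exit) to drop the request.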
      
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: fix build]
      [akpm@linux-foundation.org: fix build again]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: mark gross <mgross@linux.intel.com>
      Cc: "John W. Linville" <linville@tuxdriver.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Jaroslav Kysela <perex@suse.cz>
      Cc: Takashi Iwai <tiwai@suse.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Venki Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Adam Belay <abelay@novell.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • make kernel_shutdown_prepare() static · 4ef7229f
      Committed by Adrian Bunk
      kernel_shutdown_prepare() can now become static.
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Smack: Simplified Mandatory Access Control Kernel · e114e473
      Committed by Casey Schaufler
      Smack is the Simplified Mandatory Access Control Kernel.
      
      Smack implements mandatory access control (MAC) using labels
      attached to tasks and data containers, including files, SVIPC,
      and other tasks. Smack is a kernel based scheme that requires
      an absolute minimum of application support and a very small
      amount of configuration data.
      
      Smack uses extended attributes and
      provides a set of general mount options, borrowing techniques used
      elsewhere. Smack uses netlabel for CIPSO labeling. Smack provides
      a pseudo-filesystem smackfs that is used for manipulation of
      system Smack attributes.
      
      The patch, patches for ls and sshd, a README, a startup script,
      and x86 binaries for ls and sshd are also available on
      
          http://www.schaufler-ca.com
      
      Development has been done using Fedora Core 7 in a virtual machine
      environment and on an old Sony laptop.
      
      Smack provides mandatory access controls based on the label attached
      to a task and the label attached to the object it is attempting to
      access. Smack labels are deliberately short (1-23 characters) text
      strings. Single character labels using special characters are reserved
      for system use. The only operation applied to Smack labels is equality
      comparison. No wildcards or expressions, regular or otherwise, are
      used. Smack labels are composed of printable characters and may not
      include "/".
      
      A file always gets the Smack label of the task that created it.
      
      Smack defines and uses these labels:
      
          "*" - pronounced "star"
          "_" - pronounced "floor"
          "^" - pronounced "hat"
          "?" - pronounced "huh"
      
      The access rules enforced by Smack are, in order:
      
      1. Any access requested by a task labeled "*" is denied.
      2. A read or execute access requested by a task labeled "^"
         is permitted.
      3. A read or execute access requested on an object labeled "_"
         is permitted.
      4. Any access requested on an object labeled "*" is permitted.
      5. Any access requested by a task on an object with the same
         label is permitted.
      6. Any access requested that is explicitly defined in the loaded
         rule set is permitted.
      7. Any other access is denied.
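      The ordered rules above can be modelled as a single check function. This is an illustrative sketch, not Smack's code: the access-bit names ACC_R/ACC_W/ACC_X, struct smack_rule and smack_allows are hypothetical, and "read or execute" is taken to mean a request containing no bits other than read and execute.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical access bits for this sketch. */
enum { ACC_R = 1, ACC_W = 2, ACC_X = 4 };

/* One explicitly loaded rule: subject label, object label, granted access. */
struct smack_rule {
	const char *subject;
	const char *object;
	int access;
};

/* Apply the seven rules above, in order, to one access request. */
bool smack_allows(const char *subj, const char *obj, int request,
		  const struct smack_rule *rules, size_t nrules)
{
	bool read_or_exec = (request & ~(ACC_R | ACC_X)) == 0;

	if (strcmp(subj, "*") == 0)			/* 1 */
		return false;
	if (strcmp(subj, "^") == 0 && read_or_exec)	/* 2 */
		return true;
	if (strcmp(obj, "_") == 0 && read_or_exec)	/* 3 */
		return true;
	if (strcmp(obj, "*") == 0)			/* 4 */
		return true;
	if (strcmp(subj, obj) == 0)			/* 5 */
		return true;
	for (size_t i = 0; i < nrules; i++)		/* 6 */
		if (strcmp(rules[i].subject, subj) == 0 &&
		    strcmp(rules[i].object, obj) == 0)
			return (request & ~rules[i].access) == 0;
	return false;					/* 7 */
}
```

      Because rule 6 is a lookup on the exact (subject, object) pair, nothing is inferred transitively, which matches the point made below about TS, S and C.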
      
      Rules may be explicitly defined by writing subject,object,access
      triples to /smack/load.
      
      Smack rule sets can be easily defined that describe Bell&LaPadula
      sensitivity, Biba integrity, and a variety of interesting
      configurations. Smack rule sets can be modified on the fly to
      accommodate changes in the operating environment or even the time
      of day.
      
      Some practical use cases:
      
      Hierarchical levels. The less common of the two usual uses
      for MLS systems is to define hierarchical levels, often
      unclassified, confidential, secret, and so on. To set up smack
      to support this, these rules could be defined:
      
         C        Unclass rx
         S        C       rx
         S        Unclass rx
         TS       S       rx
         TS       C       rx
         TS       Unclass rx
      
      A TS process can read S, C, and Unclass data, but cannot write it.
      An S process can read C and Unclass. Note that specifying that
      TS can read S and S can read C does not imply TS can read C, it
      has to be explicitly stated.
      
      Non-hierarchical categories. This is the more common of the
      usual uses for an MLS system. Since the default rule is that a
      subject cannot access an object with a different label, no
      access rules are required to implement compartmentalization.
      
      A case that Bell & LaPadula does not allow but Smack does is
      demonstrated with these Smack access rules:
      
          ESPN    ABC   r
          ABC     ESPN  r
      
      On my portable video device I have two applications, one that
      shows ABC programming and the other ESPN programming. ESPN wants
      to show me sport stories that show up as news, and ABC will
      only provide minimal information about a sports story if ESPN
      is covering it. Each side can look at the other's info, neither
      can change the other. Neither can see what FOX is up to, which
      is just as well all things considered.
      
      Another case that I especially like:
      
          SatData Guard   w
          Guard   Publish w
      
      A program running with the Guard label opens a UDP socket and
      accepts messages sent by a program running with a SatData label.
      The Guard program inspects the message to ensure it is wholesome
      and, if it is, sends it to a program running with the Publish label.
      This program then puts the information passed in an appropriate
      place. Note that the Guard program cannot write to a Publish
      file system object because file system semantics require read as
      well as write.
      
      The four cases (categories, levels, mutual read, guardbox) here
      are all quite real, and problems I've been asked to solve over
      the years. The first two are easy to do with traditional MLS systems,
      while the last two you can't do without invoking privilege, at least
      for a while.
      Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Joshua Brindle <method@manicmethod.com>
      Cc: Paul Moore <paul.moore@hp.com>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: "Ahmed S. Darwish" <darwish.07@gmail.com>
      Cc: Andrew G. Morgan <morgan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • capabilities: introduce per-process capability bounding set · 3b7391de
      Committed by Serge E. Hallyn
      The capability bounding set is a set beyond which capabilities cannot grow.
      Currently cap_bset is per-system.  It can be manipulated through sysctl,
      but only init can add capabilities.  Root can remove capabilities.  By
      default it includes all caps except CAP_SETPCAP.
      
      This patch makes the bounding set per-process when file capabilities are
      enabled.  It is inherited from the parent at fork.  No one can add elements;
      CAP_SETPCAP is required to remove them.
      
      One example use of this is to start a safer container.  For instance, until
      device namespaces or per-container device whitelists are introduced, it is
      best to take CAP_MKNOD away from a container.
      
      The bounding set will not affect pP and pE immediately.  It will only
      affect pP' and pE' after subsequent exec()s.  It also does not affect pI,
      and exec() does not constrain pI'.  So to really start a shell with no way
      of regaining CAP_MKNOD, you would do:
      
      	prctl(PR_CAPBSET_DROP, CAP_MKNOD);
      	cap_t cap = cap_get_proc();
      	cap_value_t caparray[1];
      	caparray[0] = CAP_MKNOD;
      	cap_set_flag(cap, CAP_INHERITABLE, 1, caparray, CAP_DROP);
      	cap_set_proc(cap);
      	cap_free(cap);
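      Why pI has to be cleared as well follows from the exec() transformation rules. The sketch below models them with 64-bit masks, following the formulas documented in capabilities(7) and ignoring the later ambient set and the special treatment of root and setuid binaries; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Process and file capability sets as bit masks (bit n = capability n). */
struct proc_caps { uint64_t pI, pP, pE, bset; };
struct file_caps { uint64_t fI, fP; bool fE; };

/* Simplified exec() transformation (per capabilities(7), without ambient
 * capabilities or the root/setuid special cases):
 *   pI' = pI
 *   pP' = (fP & bset) | (fI & pI)
 *   pE' = fE ? pP' : 0            */
struct proc_caps exec_transform(struct proc_caps old, struct file_caps f)
{
	struct proc_caps nw;

	nw.pI = old.pI;				/* not constrained by bset */
	nw.pP = (f.fP & old.bset) | (f.fI & old.pI);
	nw.pE = f.fE ? nw.pP : 0;
	nw.bset = old.bset;			/* inherited unchanged */
	return nw;
}
```

      With CAP_MKNOD (bit 27) dropped from the bounding set but still present in both pI and the file's inheritable set, pP' regains it through the (fI & pI) term, which is why the snippet above also clears it from pI.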
      
      The following test program will get and set the bounding
      set (but not pI).  For instance
      
      	./bset get
      		(lists capabilities in bset)
      	./bset drop cap_net_raw
      		(starts shell with new bset)
      		(use capset, setuid binary, or binary with
      		file capabilities to try to increase caps)
      
      ************************************************************
      cap_bound.c
      ************************************************************
      #include <sys/prctl.h>
      #include <linux/capability.h>
      #include <sys/types.h>
      #include <unistd.h>
      #include <stdio.h>
      #include <stdlib.h>
      #include <string.h>
      
      #ifndef PR_CAPBSET_READ
      #define PR_CAPBSET_READ 23
      #endif
      
      #ifndef PR_CAPBSET_DROP
      #define PR_CAPBSET_DROP 24
      #endif
      
      int usage(char *me)
      {
      	printf("Usage: %s get\n", me);
      	printf("       %s drop <capability>\n", me);
      	return 1;
      }
      
      #define numcaps 32
      const char *captable[numcaps] = {
      	"cap_chown",
      	"cap_dac_override",
      	"cap_dac_read_search",
      	"cap_fowner",
      	"cap_fsetid",
      	"cap_kill",
      	"cap_setgid",
      	"cap_setuid",
      	"cap_setpcap",
      	"cap_linux_immutable",
      	"cap_net_bind_service",
      	"cap_net_broadcast",
      	"cap_net_admin",
      	"cap_net_raw",
      	"cap_ipc_lock",
      	"cap_ipc_owner",
      	"cap_sys_module",
      	"cap_sys_rawio",
      	"cap_sys_chroot",
      	"cap_sys_ptrace",
      	"cap_sys_pacct",
      	"cap_sys_admin",
      	"cap_sys_boot",
      	"cap_sys_nice",
      	"cap_sys_resource",
      	"cap_sys_time",
      	"cap_sys_tty_config",
      	"cap_mknod",
      	"cap_lease",
      	"cap_audit_write",
      	"cap_audit_control",
      	"cap_setfcap"
      };
      
      int getbcap(void)
      {
      	int comma=0;
      	unsigned long i;
      	int ret;
      
      	printf("i know of %d capabilities\n", numcaps);
      	printf("capability bounding set:");
      	for (i=0; i<numcaps; i++) {
      		ret = prctl(PR_CAPBSET_READ, i);
      		if (ret < 0)
      			perror("prctl");
      		else if (ret==1)
      			printf("%s%s", (comma++) ? ", " : " ", captable[i]);
      	}
      	printf("\n");
      	return 0;
      }
      
      int capdrop(char *str)
      {
      	unsigned long i;
      
      	int found=0;
      	for (i=0; i<numcaps; i++) {
      		if (strcmp(captable[i], str) == 0) {
      			found=1;
      			break;
      		}
      	}
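      	if (!found)
      		return 1;	/* unknown capability name */
      	if (prctl(PR_CAPBSET_DROP, i)) {
      		perror("prctl");
      		return -1;	/* known name, but the drop failed */
      	}
      	return 0;
      }
      
      int main(int argc, char *argv[])
      {
      	int ret;
      
      	if (argc<2)
      		return usage(argv[0]);
      	if (strcmp(argv[1], "get")==0)
      		return getbcap();
      	if (strcmp(argv[1], "drop")!=0 || argc<3)
      		return usage(argv[0]);
      	ret = capdrop(argv[2]);
      	if (ret > 0)
      		printf("unknown capability\n");
      	if (ret)
      		return 1;
      	/* execl() only returns on failure; the NULL sentinel must be a char * */
      	execl("/bin/bash", "/bin/bash", (char *)NULL);
      	perror("execl");
      	return 1;
      }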
      ************************************************************
      
      [serue@us.ibm.com: fix typo]
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: James Morris <jmorris@namei.org>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: "Serge E. Hallyn" <serue@us.ibm.com>
      Tested-by: Jiri Slaby <jirislaby@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Remove unnecessary include from include/linux/capability.h · 46c383cc
      Committed by Andrew Morgan
      KaiGai Kohei observed that this line in the linux header is not needed.
      Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
      Cc: KaiGai Kohei <kaigai@kaigai.gr.jp>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Add 64-bit capability support to the kernel · e338d263
      Committed by Andrew Morgan
      The patch supports legacy (32-bit) capability userspace, and where possible
      translates 32-bit capabilities to/from userspace and the VFS to 64-bit
      kernel space capabilities.  If a capability set cannot be compressed into
      32-bits for consumption by user space, the system call fails, with -ERANGE.
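      The 64-bit/legacy split can be illustrated with plain bit arithmetic. This is a minimal sketch with hypothetical helper names: a 64-bit set is stored as two 32-bit words, and the conversion for a 32-bit-only caller fails with -ERANGE when any capability at bit 32 or above is set, mirroring the behaviour described above.

```c
#include <errno.h>
#include <stdint.h>

/* Split a 64-bit capability set into the two 32-bit words used for storage. */
void cap64_split(uint64_t set, uint32_t word[2])
{
	word[0] = (uint32_t)set;		/* capabilities 0-31 */
	word[1] = (uint32_t)(set >> 32);	/* capabilities 32-63 */
}

uint64_t cap64_join(const uint32_t word[2])
{
	return (uint64_t)word[0] | ((uint64_t)word[1] << 32);
}

/* Legacy 32-bit view: only possible if no capability >= 32 is set. */
int cap64_to_legacy(uint64_t set, uint32_t *out)
{
	if (set >> 32)
		return -ERANGE;
	*out = (uint32_t)set;
	return 0;
}
```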
      
      FWIW, libcap-2.00 supports this change (and earlier capability formats):
      
       http://www.kernel.org/pub/linux/libs/security/linux-privs/kernel-2.6/
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: use get_task_comm()]
      [ezk@cs.sunysb.edu: build fix]
      [akpm@linux-foundation.org: do not initialise statics to 0 or NULL]
      [akpm@linux-foundation.org: unused var]
      [serue@us.ibm.com: export __cap_ symbols]
      Signed-off-by: Andrew G. Morgan <morgan@kernel.org>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Signed-off-by: Erez Zadok <ezk@cs.sunysb.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • revert "capabilities: clean up file capability reading" · 8f6936f4
      Committed by Andrew Morton
      Revert b68680e4 to make way for the next
      patch: "Add 64-bit capability support to the kernel".
      
      We want to keep the vfs_cap_data.data[] structure, using two 'data's for
      64-bit caps (and later three for 96-bit caps), whereas
      b68680e4 had gotten rid of the 'data' struct and
      made its members inline.
      
      The 64-bit caps patch keeps the stack abuse fix at get_file_caps(), which was
      the more important part of that patch.
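      For reference, the two-element data[] layout discussed above can be mirrored in user space roughly as follows. This is modelled on the kernel's struct vfs_cap_data (a magic_etc word followed by little-endian permitted/inheritable pairs); the struct name, the CAP_WORDS_64 constant and the sketch_permitted helper are hypothetical, and <endian.h> assumes glibc.

```c
#define _DEFAULT_SOURCE	/* for le32toh()/htole32() in glibc's <endian.h> */
#include <endian.h>
#include <stdint.h>

#define CAP_WORDS_64 2	/* two 'data' entries cover 64 capabilities */

/* User-space mirror of the xattr layout (names hypothetical). */
struct vfs_cap_data_sketch {
	uint32_t magic_etc;		/* revision and flags, little-endian */
	struct {
		uint32_t permitted;	/* little-endian */
		uint32_t inheritable;	/* little-endian */
	} data[CAP_WORDS_64];
};

/* Reassemble the 64-bit permitted set: data[0] holds capabilities 0-31,
 * data[1] holds 32-63. */
uint64_t sketch_permitted(const struct vfs_cap_data_sketch *d)
{
	uint64_t p = 0;

	for (int i = 0; i < CAP_WORDS_64; i++)
		p |= (uint64_t)le32toh(d->data[i].permitted) << (32 * i);
	return p;
}
```

      Adding a third data[] entry for 96-bit capabilities would only change CAP_WORDS_64 (and the result type), which is the flexibility the revert is meant to preserve.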
      
      [akpm@linux-foundation.org: coding-style fixes]
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Cc: James Morris <jmorris@namei.org>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Andrew Morgan <morgan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • VFS/Security: Rework inode_getsecurity and callers to return resulting buffer · 42492594
      Committed by David P. Quigley
      This patch modifies the interface to inode_getsecurity to have the function
      return a buffer containing the security blob and its length via parameters
      instead of relying on the calling function to give it an appropriately sized
      buffer.
      
      Security blobs obtained with this function should be freed using the
      release_secctx LSM hook.  This alleviates the problem of the caller having to
      guess a length and preallocate a buffer for this function, allowing it to be
      used elsewhere for Labeled NFS.
      
      The patch also removes the unused err parameter.  The conversion is similar to
      the one performed by Al Viro for the security_getprocattr hook.
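      The calling-convention change can be sketched in user-space C. The function names here are hypothetical stand-ins for the LSM hooks: the getter allocates a buffer sized to the blob and reports the length, and the caller hands the buffer back to a matching release function rather than guessing a size up front.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the reworked hook: allocate a buffer sized to
 * the stored security blob, copy it, and report buffer and length through
 * the out-parameters. Returns 0 on success, -1 on allocation failure. */
int sketch_getsecurity(const char *stored, void **buf, size_t *len)
{
	size_t n = strlen(stored);
	char *p = malloc(n ? n : 1);

	if (!p)
		return -1;
	memcpy(p, stored, n);	/* blob is length-delimited, not NUL-terminated */
	*buf = p;
	*len = n;
	return 0;
}

/* Matching release, standing in for the release_secctx hook. */
void sketch_release_secctx(void *buf, size_t len)
{
	(void)len;	/* a real hook may need the length; free() does not */
	free(buf);
}
```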
      Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
      Cc: Stephen Smalley <sds@tycho.nsa.gov>
      Cc: Chris Wright <chrisw@sous-sol.org>
      Acked-by: James Morris <jmorris@namei.org>
      Acked-by: Serge Hallyn <serue@us.ibm.com>
      Cc: Casey Schaufler <casey@schaufler-ca.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • writeback: speed up writeback of big dirty files · 8bc3be27
      Committed by Fengguang Wu
      After dirtying a 100M file, the normal behavior is to start the
      writeback for all data after a 30s delay.  But sometimes the following
      happens instead:
      
      	- after 30s:    ~4M
      	- after 5s:     ~4M
      	- after 5s:     all remaining 92M
      
      Some analysis shows that the internal I/O dispatch queues go like this:
      
      		s_io            s_more_io
      		-------------------------
      	1)	100M,1K         0
      	2)	1K              96M
      	3)	0               96M
      1) initial state with a 100M file and a 1K file
      
      2) 4M written, nr_to_write <= 0, so write more
      
      3) 1K written, nr_to_write > 0, no more writes(BUG)
      
      nr_to_write > 0 in (3) fools the upper layer into thinking that all data has
      been written out.  The big dirty file is actually still sitting in
      s_more_io.  We cannot simply splice s_more_io back to s_io as soon as s_io
      becomes empty, and let the loop in generic_sync_sb_inodes() continue: this
      may starve newly expired inodes in s_dirty.  It is also not an option to
      draw inodes from both s_more_io and s_dirty, and let the loop go on: this
      might lead to livelocks, and might also starve other superblocks at sync
      time (well, kupdate may still starve some superblocks, but that's another bug).
      
      We have to return when a full scan of s_io completes.  So nr_to_write > 0
      does not necessarily mean that "all data are written".  This patch
      introduces a flag writeback_control.more_io to indicate that more io should
      be done.  With it the big dirty file no longer has to wait for the next
      kupdate invocation 5s later.
      
      In sync_sb_inodes() we only set more_io on super_blocks we actually
      visited.  This avoids the interaction between two pdflush daemons.
      
      Also in __sync_single_inode() we don't blindly keep requeuing the io if the
      filesystem cannot progress.  Failing to do so may lead to 100% iowait.
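      The effect of the new flag can be reproduced with a toy simulation of the scenario above. This is a deliberately simplified model, not the kernel's writeback code: sizes are in KB, each call has a 4M budget, struct sb_model and the two functions are hypothetical, and only the s_io/s_more_io interplay is modelled.

```c
#include <stdbool.h>

#define MAX_INODES 8
#define BUDGET_KB 4096L	/* per-call write budget: the ~4M seen above */

/* Remaining dirty KB of each queued inode, oldest first. */
struct queue { long kb[MAX_INODES]; int n; };

/* Hypothetical superblock model with the two queues named above. */
struct sb_model {
	struct queue s_io;	/* inodes being written in this pass */
	struct queue s_more_io;	/* inodes still dirty when the budget ran out */
};

static void push(struct queue *q, long kb)
{
	q->kb[q->n++] = kb;
}

/* One writeback call: spends up to nr_to_write KB, returns the unused
 * budget, and sets *more_io if s_more_io still holds dirty inodes. */
static long writeback_inodes(struct sb_model *sb, long nr_to_write, bool *more_io)
{
	int i = 0;

	if (sb->s_io.n == 0) {		/* refill s_io from s_more_io */
		sb->s_io = sb->s_more_io;
		sb->s_more_io.n = 0;
	}
	while (i < sb->s_io.n && nr_to_write > 0) {
		long w = sb->s_io.kb[i] < nr_to_write ? sb->s_io.kb[i] : nr_to_write;

		sb->s_io.kb[i] -= w;
		nr_to_write -= w;
		if (sb->s_io.kb[i] > 0)	/* budget ran out mid-file */
			push(&sb->s_more_io, sb->s_io.kb[i]);
		i++;
	}
	/* drop the inodes handled in this call from the front of s_io */
	for (int j = i; j < sb->s_io.n; j++)
		sb->s_io.kb[j - i] = sb->s_io.kb[j];
	sb->s_io.n -= i;
	*more_io = sb->s_more_io.n > 0;
	return nr_to_write;
}

/* One kupdate-style round; returns the KB written before it gives up. */
long kupdate_round(struct sb_model *sb, bool honour_more_io)
{
	long written = 0;

	for (;;) {
		bool more_io = false;
		long left = writeback_inodes(sb, BUDGET_KB, &more_io);

		written += BUDGET_KB - left;
		/* old rule: unused budget means "all done"; new rule also checks more_io */
		if (left > 0 && !(honour_more_io && more_io))
			break;
	}
	return written;
}
```

      With the 100M and 1K files from the example, a round that ignores more_io stops after 4097 KB (the first 4M chunk plus the 1K file), while a round that honours it keeps going until both files are clean.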
      Tested-by: Mike Snitzer <snitzer@gmail.com>
      Signed-off-by: Fengguang Wu <wfg@mail.ustc.edu.cn>
      Cc: Michael Rubin <mrubin@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix PageUptodate data race · 0ed361de
      Committed by Nick Piggin
      After running SetPageUptodate, preceding stores to the page contents to
      actually bring it uptodate may not be ordered with the store to set the
      page uptodate.
      
      Therefore, another CPU which checks PageUptodate is true, then reads the
      page contents can get stale data.
      
      Fix this by having an smp_wmb before SetPageUptodate, and smp_rmb after
      PageUptodate.
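      A user-space analogue of this pairing, using C11 fences instead of the kernel primitives: the release fence stands in for smp_wmb() before setting the flag, and the acquire fence for smp_rmb() after observing it. The struct and function names are hypothetical; this illustrates the ordering protocol, not the kernel's page code.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define PAGE_BYTES 4096

/* A "page" whose contents are published through an uptodate flag. */
struct page_sketch {
	unsigned char data[PAGE_BYTES];
	atomic_bool uptodate;
};

/* Writer: fill the page, then publish it. */
static void fill_and_set_uptodate(struct page_sketch *pg, unsigned char byte)
{
	memset(pg->data, byte, PAGE_BYTES);		/* bring contents uptodate */
	atomic_thread_fence(memory_order_release);	/* ~ smp_wmb() */
	atomic_store_explicit(&pg->uptodate, true, memory_order_relaxed);
}

/* Reader: only trust the contents after seeing the flag. */
static bool read_if_uptodate(struct page_sketch *pg, unsigned char *out)
{
	if (!atomic_load_explicit(&pg->uptodate, memory_order_relaxed))
		return false;
	atomic_thread_fence(memory_order_acquire);	/* ~ smp_rmb() */
	*out = pg->data[PAGE_BYTES - 1];
	return true;
}

struct writer_args { struct page_sketch *pg; unsigned char byte; };

static void *writer_thread(void *arg)
{
	struct writer_args *a = arg;

	fill_and_set_uptodate(a->pg, a->byte);
	return NULL;
}

/* Spin until the page is uptodate, then return one byte of its contents. */
static unsigned char wait_and_read(struct page_sketch *pg)
{
	unsigned char b;

	while (!read_if_uptodate(pg, &b))
		;
	return b;
}
```

      Dropping either fence reintroduces the race described above: the reader could see the flag set and still read stale bytes.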
      
      Many places that test PageUptodate, do so with the page locked, and this
      would be enough to ensure memory ordering in those places if
      SetPageUptodate were only called while the page is locked.  Unfortunately
      that is not always the case for some filesystems, but it could be an idea
      for the future.
      
      Also bring the handling of anonymous page uptodateness in line with that of
      file backed page management, by marking anon pages as uptodate when they
      _are_ uptodate, rather than when our implementation requires that they be
      marked as such.  Doing so allows us to get rid of the smp_wmb's in the page
      copying functions, which were especially added for anonymous pages for an
      analogous memory ordering problem.  Both file and anonymous pages are
      handled with the same barriers.
      
      FAQ:
      Q. Why not do this in flush_dcache_page?
      A. Firstly, flush_dcache_page handles only one side (the smp_wmb side) of the
      ordering protocol; we'd still need smp_rmb somewhere. Secondly, hiding away
      memory barriers in a completely unrelated function is nasty; at least in the
      PageUptodate macros, they are located together with (half) the operations
      involved in the ordering. Thirdly, the smp_wmb is only required when first
      bringing the page uptodate, whereas flush_dcache_page should be called each time
      it is written to through the kernel mapping. It is logically the wrong place to
      put it.
      
      Q. Why does this increase my text size / reduce my performance / etc.
      A. Because it is adding the necessary instructions to eliminate the data-race.
      
      Q. Can it be improved?
      A. Yes, e.g. if you were to create a rule that all SetPageUptodate operations
      run under the page lock, we could avoid the smp_rmb places where PageUptodate
      is queried under the page lock. Requires audit of all filesystems and at least
      some would need reworking. That's great you're interested, I'm eagerly awaiting
      your patches.
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/page-writeback: highmem_is_dirtyable option · 195cf453
      Committed by Bron Gondwana
      Add vm.highmem_is_dirtyable toggle
      
      A 32 bit machine with HIGHMEM64 enabled running DCC has an MMAPed file of
      approximately 2GB size which contains a hash format that is written
      randomly by the dbclean process.  On 2.6.16 this process took a few
      minutes.  With lowmem-only accounting of dirty ratios, this takes about 12
      hours of 100% disk IO, all random writes.
      
      Include a toggle in /proc/sys/vm/highmem_is_dirtyable which can be set to 1 to
      add the highmem back to the total available memory count.
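      A back-of-the-envelope model of why this matters. It is a simplification, not the kernel's formula: the dirty threshold is taken as dirty_ratio percent of dirtyable memory, and the memory sizes (which stand in for free plus reclaimable pages) are illustrative numbers, not measurements from the DCC machine. The function name is hypothetical.

```c
#include <stdbool.h>

/* Simplified dirty threshold: dirty_ratio percent of "dirtyable" memory,
 * which counts only lowmem unless highmem_is_dirtyable is set. */
unsigned long dirty_limit_mb(unsigned long lowmem_mb, unsigned long highmem_mb,
			     unsigned int dirty_ratio, bool highmem_is_dirtyable)
{
	unsigned long dirtyable = lowmem_mb;

	if (highmem_is_dirtyable)
		dirtyable += highmem_mb;
	return dirtyable * dirty_ratio / 100;
}
```

      With, say, 896MB of lowmem, 3200MB of highmem and a 10% ratio, the limit in this model rises from 89MB to 409MB once the toggle is set (for example via echo 1 > /proc/sys/vm/highmem_is_dirtyable), giving a large randomly written mapping far more room before writeback throttling starts.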
      
      [akpm@linux-foundation.org: Fix the CONFIG_DETECT_SOFTLOCKUP=y build]
      Signed-off-by: Bron Gondwana <brong@fastmail.fm>
      Cc: Ethan Solomita <solo@google.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: WU Fengguang <wfg@mail.ustc.edu.cn>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Page allocator: get rid of the list of cold pages · 3dfa5721
      Committed by Christoph Lameter
      We have repeatedly discussed whether the cold pages still have a point. There is
      one way to join the two lists: Use a single list and put the cold pages at the
      end and the hot pages at the beginning. That way a single list can serve for
      both types of allocations.
      
      The discussion of the RFC for this and Mel's measurements indicate that
      there may not be too much of a point left to having separate lists for
      hot and cold pages (see http://marc.info/?t=119492914200001&r=1&w=2).
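      The single-list idea can be sketched as a circular doubly linked list with a sentinel: hot frees go to the head and hot allocations take from the head, while cold frees and cold allocations use the tail. The names below are hypothetical and the sketch ignores per-CPU counts and batching.

```c
#include <stddef.h>

/* Hypothetical page descriptor and per-CPU list with a sentinel node. */
struct pcp_page { int id; struct pcp_page *prev, *next; };
struct pcp_list { struct pcp_page head; };

void pcp_init(struct pcp_list *l)
{
	l->head.prev = l->head.next = &l->head;
}

static void insert_between(struct pcp_page *p, struct pcp_page *a,
			   struct pcp_page *b)
{
	p->prev = a;
	p->next = b;
	a->next = p;
	b->prev = p;
}

/* Free a page: hot pages at the head, cold pages at the tail. */
void pcp_free(struct pcp_list *l, struct pcp_page *p, int cold)
{
	if (cold)
		insert_between(p, l->head.prev, &l->head);
	else
		insert_between(p, &l->head, l->head.next);
}

/* Allocate: hot requests take the head, cold requests take the tail. */
struct pcp_page *pcp_alloc(struct pcp_list *l, int cold)
{
	struct pcp_page *p;

	if (l->head.next == &l->head)
		return NULL;			/* list is empty */
	p = cold ? l->head.prev : l->head.next;
	p->prev->next = p->next;
	p->next->prev = p->prev;
	return p;
}
```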
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Martin Bligh <mbligh@mbligh.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Page allocator: clean up pcp draining functions · 9f8f2172
      Committed by Christoph Lameter
      - Add comments explaining how drain_pages() works.
      
      - Eliminate useless functions
      
      - Rename drain_all_local_pages to drain_all_pages(). It drains
        all pages, not only those of the local processor.
      
      - Eliminate useless interrupt off / on sequences. drain_pages()
        disables interrupts on its own. The execution thread is
        pinned to processor by the caller. So there is no need to
        disable interrupts.
      
      - Put drain_all_pages() declaration in gfp.h and remove the
        declarations from suspend.h and from mm/memory_hotplug.c
      
      - Make software suspend call drain_all_pages(). Draining only
        processor-local pages may not be the right approach if
        software suspend wants to support SMP. If it calls drain_all_pages(),
        then we can make drain_pages() static.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Acked-by: Mel Gorman <mel@csn.ul.ie>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Daniel Walker <dwalker@mvista.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
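The difference between draining one processor's cached pages and draining every processor's can be modelled with plain counters. This is a hypothetical sketch, not the kernel code: `struct zone_model` and `NR_CPUS_SKETCH` are invented for illustration, and a sequential loop stands in for the kernel arranging for each CPU to drain its own list.

```c
#define NR_CPUS_SKETCH 4  /* hypothetical CPU count for this model */

/* Hypothetical model: each CPU caches some free pages locally, and
 * draining returns them to a shared free pool. */
struct zone_model {
    int free_pages;                 /* shared free pool */
    int pcp_count[NR_CPUS_SKETCH];  /* per-CPU cached free pages */
};

/* Return one CPU's cached pages to the shared pool.  The real
 * drain_pages() disables interrupts itself; this model has none. */
static void drain_pages(struct zone_model *z, int cpu)
{
    z->free_pages += z->pcp_count[cpu];
    z->pcp_count[cpu] = 0;
}

/* Drain every CPU's cache, not only the caller's. */
static void drain_all_pages(struct zone_model *z)
{
    for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
        drain_pages(z, cpu);
}
```

For suspend on SMP the point is that a drain of only the local CPU would leave pages stranded in the other CPUs' caches.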
    • maps4: move clear_refs code to task_mmu.c · f248dcb3
      Matt Mackall committed
      This puts all the clear_refs code where it belongs and probably lets things
      compile on MMU-less systems as well.
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • maps4: introduce a generic page walker · e6473092
      Matt Mackall committed
      Introduce a general page table walker
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
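The idea of a generic page table walker (one routine that descends the table and calls user-supplied hooks for present entries and for holes) can be illustrated on a toy two-level table. Everything here is hypothetical: the geometry, `struct toy_table`, `struct toy_walk_ops` and `toy_walk_range` are not the kernel's types or signatures, and the real walker handles more levels.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical geometry: 4 directory slots x 4 PTEs, 4 KiB pages, so each
 * directory slot covers 16 KiB and the whole table covers 64 KiB. */
#define TOY_PAGE_SHIFT 12
#define TOY_PTRS_PER_PTE 4
#define TOY_PTRS_PER_DIR 4
#define TOY_DIR_SHIFT (TOY_PAGE_SHIFT + 2)  /* 2 == log2(TOY_PTRS_PER_PTE) */

typedef uint64_t toy_pte_t;                 /* 0 means "not present" */

struct toy_table {
    toy_pte_t *dir[TOY_PTRS_PER_DIR];       /* NULL slot = unmapped hole */
};

/* Hooks; either may be NULL.  A nonzero return stops the walk. */
struct toy_walk_ops {
    int (*pte_entry)(toy_pte_t pte, uintptr_t addr, void *priv);
    int (*pte_hole)(uintptr_t start, uintptr_t end, void *priv);
};

static int toy_walk_range(const struct toy_table *t, uintptr_t start,
                          uintptr_t end, const struct toy_walk_ops *ops,
                          void *priv)
{
    uintptr_t addr = start;

    while (addr < end) {
        size_t di = (addr >> TOY_DIR_SHIFT) % TOY_PTRS_PER_DIR;
        uintptr_t dir_end = ((addr >> TOY_DIR_SHIFT) + 1) << TOY_DIR_SHIFT;
        uintptr_t next = dir_end < end ? dir_end : end;
        int err = 0;

        if (!t->dir[di]) {
            /* Whole directory slot is missing: report it as one hole. */
            if (ops->pte_hole)
                err = ops->pte_hole(addr, next, priv);
        } else {
            for (uintptr_t a = addr; a < next && !err;
                 a += (uintptr_t)1 << TOY_PAGE_SHIFT) {
                size_t pi = (a >> TOY_PAGE_SHIFT) % TOY_PTRS_PER_PTE;

                if (ops->pte_entry)
                    err = ops->pte_entry(t->dir[di][pi], a, priv);
            }
        }
        if (err)
            return err;
        addr = next;
    }
    return 0;
}

/* Example hooks: count present PTEs and unmapped bytes. */
struct toy_walk_stats {
    int present;
    uintptr_t hole_bytes;
};

static int toy_count_pte(toy_pte_t pte, uintptr_t addr, void *priv)
{
    (void)addr;
    if (pte)
        ((struct toy_walk_stats *)priv)->present++;
    return 0;
}

static int toy_count_hole(uintptr_t start, uintptr_t end, void *priv)
{
    ((struct toy_walk_stats *)priv)->hole_bytes += end - start;
    return 0;
}
```

Callers such as a pagemap reader then only supply hooks, instead of each open-coding the table descent.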
    • maps4: move is_swap_pte · 698dd4ba
      Matt Mackall committed
      Move is_swap_pte helper function to swapops.h for use by pagemap code
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • maps4: rework TASK_SIZE macros · 82455257
      Dave Hansen committed
      The following replaces the earlier patches sent.  It should address
      David Rientjes's comments, and has been compile tested on all the
      architectures that it touches, save for parisc.
      
      For the /proc/<pid>/pagemap code[1], we need to be able to query how
      much virtual address space a particular task has.  The trick is
      that we do it through /proc and can't use TASK_SIZE, since it
      references "current" on some arches.  The process opening the
      /proc file might be a 32-bit process opening a 64-bit process's
      pagemap file.
      
      x86_64 already has a TASK_SIZE_OF() macro:
      
      #define TASK_SIZE_OF(child)     ((test_tsk_thread_flag(child, TIF_IA32)) ? IA32_PAGE_OFFSET : TASK_SIZE64)
      
      I'd like to have that for other architectures.  So, add it
      for all the architectures that actually use "current" in
      their TASK_SIZE.  For the others, just add a quick #define
      in sched.h to use plain old TASK_SIZE.
      
      1. http://www.linuxworld.com/news/2007/042407-kernel.html
      
      - MIPS portion from Ralf Baechle <ralf@linux-mips.org>
      
      [akpm@linux-foundation.org: fix mips build]
      Signed-off-by: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
      Signed-off-by: Matt Mackall <mpm@selenic.com>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
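The TASK_SIZE_OF() pattern described above (an per-arch override keyed on the target task, plus a generic fallback to plain TASK_SIZE) can be sketched in standalone C. The limits, `struct toy_task` and `toy_pagemap_limit` are hypothetical; the values only resemble a 64-bit arch with 32-bit compat tasks.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limits for a 64-bit arch with 32-bit compat tasks. */
#define TOY_TASK_SIZE64 UINT64_C(0x7ffffffff000)
#define TOY_TASK_SIZE32 UINT64_C(0xffffe000)

struct toy_task {
    bool is_32bit;      /* stands in for a per-task flag such as TIF_IA32 */
};

/* Arch override: the limit comes from the task passed in, not "current". */
#define TASK_SIZE_OF(tsk) \
    ((tsk)->is_32bit ? TOY_TASK_SIZE32 : TOY_TASK_SIZE64)

/* Generic fallback for arches whose limit is the same for every task;
 * not taken here because the override above is already defined. */
#ifndef TASK_SIZE_OF
#define TASK_SIZE_OF(tsk) TASK_SIZE
#endif

/* A pagemap reader sizes its walk by the *target* task, even when the
 * reader itself is a 32-bit process. */
static uint64_t toy_pagemap_limit(const struct toy_task *target)
{
    return TASK_SIZE_OF(target);
}
```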
    • tmpfs: move swap swizzling into shmem · 73b1262f
      Hugh Dickins committed
      move_to_swap_cache and move_from_swap_cache functions (which swizzle a page
      between tmpfs page cache and swap cache, to avoid page copying) are only used
      by shmem.c; and our subsequent fix for unionfs needs different treatments in
      the two instances of move_from_swap_cache.  Move them from swap_state.c into
      their callsites shmem_writepage, shmem_unuse_inode and shmem_getpage, making
      add_to_swap_cache externally visible.
      
      shmem.c likes to say set_page_dirty where swap_state.c liked to say
      SetPageDirty: respect that diversity, which __set_page_dirty_no_writeback
      makes moot (and implies we should lose that "shift page from clean_pages to
      dirty_pages list" comment: it's on neither).
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swapin needs gfp_mask for loop on tmpfs · 02098fea
      Hugh Dickins committed
      Building in a filesystem on a loop device on a tmpfs file can hang when
      swapping, the loop thread caught in that infamous throttle_vm_writeout.
      
      In theory this is a long standing problem, which I've either never seen in
      practice, or long ago suppressed the recollection, after discounting my load
      and my tmpfs size as unrealistically high.  But now, with the new aops, it has
      become easy to hang on one machine.
      
      Loop used to grab_cache_page before the old prepare_write to tmpfs, which
      seems to have been enough to free up some memory for any swapin needed; but
      the new write_begin lets tmpfs find or allocate the page (much nicer, since
      grab_cache_page missed tmpfs pages in swapcache).
      
      When allocating a fresh page, tmpfs respects loop's mapping_gfp_mask, which
      has __GFP_IO|__GFP_FS stripped off, and throttle_vm_writeout is designed to
      break out when __GFP_IO or __GFP_FS is unset; but when tmpfs swaps in,
      read_swap_cache_async allocates with GFP_HIGHUSER_MOVABLE regardless of the
      mapping_gfp_mask, hence the hang.
      
      So, pass gfp_mask down the line from shmem_getpage to shmem_swapin to
      swapin_readahead to read_swap_cache_async to add_to_swap_cache.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
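Why the hang happens, and why passing gfp_mask down fixes it, can be shown with a few flag bits. The flag values and every `toy_` name below are hypothetical stand-ins for __GFP_IO, __GFP_FS and the throttle check; the sketch only models the rule stated in the commit: a caller missing either flag is let through rather than throttled.

```c
#include <stdbool.h>

/* Hypothetical flag bits modelled on __GFP_IO / __GFP_FS. */
#define TOY_GFP_IO  0x1u
#define TOY_GFP_FS  0x2u
#define TOY_GFP_HIGHUSER_MOVABLE (TOY_GFP_IO | TOY_GFP_FS | 0x4u)

/* throttle_vm_writeout-style rule: only callers allowed both I/O and
 * filesystem work are made to wait; the rest break out at once. */
static bool toy_would_throttle(unsigned int gfp)
{
    return (gfp & (TOY_GFP_IO | TOY_GFP_FS)) == (TOY_GFP_IO | TOY_GFP_FS);
}

/* Before the fix: swapin used a fixed mask, ignoring the caller's. */
static unsigned int toy_swapin_mask_old(unsigned int mapping_gfp)
{
    (void)mapping_gfp;
    return TOY_GFP_HIGHUSER_MOVABLE;
}

/* After the fix: the caller's mask travels down the call chain. */
static unsigned int toy_swapin_mask_new(unsigned int mapping_gfp)
{
    return mapping_gfp;
}
```

With loop's mask (I/O and FS stripped), the old path reintroduces both flags and so gets throttled, while the new path keeps them stripped and breaks out.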
    • swapin_readahead: move and rearrange args · 46017e95
      Hugh Dickins committed
      swapin_readahead has never sat well in mm/memory.c: move it to mm/swap_state.c
      beside its kindred read_swap_cache_async.  Why were its args in a different
      order?  Rearrange them.  And since it was always followed by a
      read_swap_cache_async of the target page, fold that in and return struct
      page*.  Then CONFIG_SWAP=n no longer needs valid_swaphandles and
      read_swap_cache_async stubs.
      Signed-off-by: Hugh Dickins <hugh@veritas.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • VM: allow get_page_unless_zero on compound pages · aec2c3ed
      Christoph Lameter committed
      Both slab defrag and the large blocksize patches need the ability to take
      refcounts on compound pages.  This may be useful in other places as well.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
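get_page_unless_zero() takes a reference only if the count is not already zero, that is, only if the page is not already being freed. The core of that is an "increment unless zero" compare-and-swap loop, sketched here with C11 atomics. `toy_get_unless_zero` is a hypothetical name; this is not the kernel's atomic_inc_not_zero() implementation, and it does not model compound-page handling.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Increment *count unless it is zero; return true if a reference was taken. */
static bool toy_get_unless_zero(atomic_int *count)
{
    int old = atomic_load(count);

    while (old != 0) {
        /* On failure, old is refreshed with the current value; retry. */
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;  /* the object is already on its way to being freed */
}
```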
    • is_vmalloc_addr(): Check if an address is within the vmalloc boundaries · 9e2779fa
      Christoph Lameter committed
      Checking if an address is a vmalloc address is done in a couple of places.
      Define a common version in mm.h and replace the other checks.
      
      Again the include structures suck.  The definition of VMALLOC_START and
      VMALLOC_END is not available in vmalloc.h since highmem.c cannot be included
      there.
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
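The common check this commit centralises is a half-open range test against VMALLOC_START and VMALLOC_END. A user-space sketch follows; the bounds are hypothetical placeholders (the real ones are per-architecture) and `toy_is_vmalloc_addr` is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bounds; real VMALLOC_START/VMALLOC_END are per-arch. */
#define TOY_VMALLOC_START ((uintptr_t)0x10000000u)
#define TOY_VMALLOC_END   ((uintptr_t)0x20000000u)

/* Half-open range: START is inside the area, END is not. */
static inline bool toy_is_vmalloc_addr(const void *x)
{
    uintptr_t addr = (uintptr_t)x;

    return addr >= TOY_VMALLOC_START && addr < TOY_VMALLOC_END;
}
```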
    • vmalloc: add const to void* parameters · b3bdda02
      Christoph Lameter committed
      Make vmalloc functions work the same way as kfree() and friends that
      take a const void * argument.
      
      [akpm@linux-foundation.org: fix consts, coding-style]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Move vmalloc_to_page() to mm/vmalloc. · 48667e7a
      Christoph Lameter committed
      We already have page table manipulation for vmalloc in vmalloc.c. Move the
      vmalloc_to_page() function there as well.
      
      Move the definitions for vmalloc related functions in mm.h to a newly created
      section.  A better place would be vmalloc.h, but mm.h is basic and may depend
      on these functions.  An alternative would be to include vmalloc.h in mm.h
      (as is done for vmstat.h).
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Pagecache zeroing: zero_user_segment, zero_user_segments and zero_user · eebd2aa3
      Christoph Lameter committed
      Simplify page cache zeroing of segments of pages through 3 functions:
      
      zero_user_segments(page, start1, end1, start2, end2)
      
              Zeros two segments of the page. It takes the positions where
              zeroing starts and ends, which avoids length calculations and
              makes the code clearer.
      
      zero_user_segment(page, start, end)
      
              Same for a single segment.
      
      zero_user(page, start, length)
      
              Length variant for the case where we know the length.
      
      We remove the zero_user_page macro. Issues:
      
      1. It's a macro. Inline functions are preferable.
      
      2. The KM_USER0 macro is only defined for HIGHMEM.
      
         Having to treat this special case everywhere makes the
         code needlessly complex. The parameter for zeroing is always
         KM_USER0 except in one single case that we open code.
      
      Avoiding KM_USER0 means a lot of code no longer has to deal
      with the special casing for HIGHMEM. Dealing with
      kmap is only necessary for HIGHMEM configurations. In those
      configurations we use KM_USER0 like we do for a series of other
      functions defined in highmem.h.
      
      Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
      could not be an inline function and had to be a macro. The
      zero_user_* functions introduced here can be inline because that
      constant is not used when these functions are called.
      
      Also extract the flushing of the caches to be outside of the kmap.
      
      [akpm@linux-foundation.org: fix nfs and ntfs build]
      [akpm@linux-foundation.org: fix ntfs build some more]
      Signed-off-by: Christoph Lameter <clameter@sgi.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: <linux-ext4@vger.kernel.org>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: David Chinner <dgc@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
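The three zeroing helpers can be modelled on a plain buffer, mirroring the argument lists given in the commit message. This is a user-space sketch, not the highmem.h code: `struct toy_page` is hypothetical, and the kmap/kunmap and cache-flush steps the kernel needs around the memset calls are omitted. Callers are assumed to keep every end offset within TOY_PAGE_SIZE.

```c
#include <string.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical page: just a byte buffer in this model. */
struct toy_page {
    unsigned char data[TOY_PAGE_SIZE];
};

/* Zero [start1, end1) and [start2, end2); empty ranges are skipped. */
static inline void zero_user_segments(struct toy_page *page,
                                      unsigned start1, unsigned end1,
                                      unsigned start2, unsigned end2)
{
    if (end1 > start1)
        memset(page->data + start1, 0, end1 - start1);
    if (end2 > start2)
        memset(page->data + start2, 0, end2 - start2);
}

/* Single segment: the second range is empty. */
static inline void zero_user_segment(struct toy_page *page,
                                     unsigned start, unsigned end)
{
    zero_user_segments(page, start, end, 0, 0);
}

/* Length variant: convert (start, size) into (start, end). */
static inline void zero_user(struct toy_page *page,
                             unsigned start, unsigned size)
{
    zero_user_segments(page, start, start + size, 0, 0);
}
```

Expressing the other two helpers through zero_user_segments keeps the mapping and flushing logic in one place.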
    • gpiolib: pca9539 i2c gpio expander support · 9e60fdcf
      eric miao committed
      This adds a new-style I2C driver with basic support for the sixteen bit
      PCA9539 GPIO expanders.  These chips have multiple registers, push-pull output
      drivers, and (not supported in this patch) pin change interrupts.
      
      Board-specific code must provide "pca9539_platform_data" with each chip's
      "i2c_board_info".  That provides the GPIO numbers to be used by that chip, and
      callbacks for board-specific setup/teardown logic.
      
      Derived from drivers/i2c/chips/pca9539.c (which has no current known users).
      This is faster and simpler; it uses 16-bit register access, and caches the
      OUTPUT and DIRECTION registers for fast access.
      Signed-off-by: eric miao <eric.miao@marvell.com>
      Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
      Acked-by: Jean Delvare <khali@linux-fr.org>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Philipp Zabel <philipp.zabel@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ben Gardner <bgardner@wabtec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mcp23s08 spi gpio expander support · e58b9e27
      David Brownell committed
      Basic driver for 8-bit SPI based MCP23S08 GPIO expander, without support for
      IRQs or the shared chipselect mechanism.
      Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
      Cc: Jean Delvare <khali@linux-fr.org>
      Cc: Eric Miao <eric.miao@marvell.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Philipp Zabel <philipp.zabel@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ben Gardner <bgardner@wabtec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • gpiolib: pcf857x i2c gpio expander support · 15fae37d
      David Brownell committed
      This is a new-style I2C driver for the most common 8- and 16-bit I2C-based
      "quasi-bidirectional" GPIO expanders: pcf8574 or pcf8575, and several
      compatible models (mostly faster, supporting I2C at up to 1 MHz).
      
      The driver exposes the GPIO signals using the platform-neutral GPIO
      programming interface, so they are easily accessed by other kernel code.  The
      lack of such a flexible kernel API has been a big factor in the proliferation
      of board-specific drivers for these chips...  stuff that rarely makes it
      upstream since it's so ugly.  This driver will let such boards use standard
      calls.
      
      Since it's a new-style driver, these devices must be configured as part of
      board-specific init.  That eliminates the need for error-prone manual
      configuration of module parameters, and makes compatibility with legacy
      drivers (pcf8574.c, pcf8575.c) for these chips easier (there's a clear
      either/or disjunction).
      Signed-off-by: David Brownell <dbrownell@users.sourceforge.net>
      Acked-by: Jean Delvare <khali@linux-fr.org>
      Cc: Eric Miao <eric.miao@marvell.com>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: Philipp Zabel <philipp.zabel@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ben Gardner <bgardner@wabtec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
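"Quasi-bidirectional" means there is no direction register: writing 1 only weakly pulls a pin high, so it can be read as an input (or pulled low externally), while writing 0 drives it low. A hypothetical model of one 8-bit port follows; `struct toy_pcf8574` and the `toy_` functions are invented for illustration, and no I2C traffic is modelled, only the shadow byte a driver would keep.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 8-bit quasi-bidirectional port. */
struct toy_pcf8574 {
    uint8_t out;       /* shadow of the last byte written to the chip */
    uint8_t ext_low;   /* pins an external device is pulling low */
};

static void toy_write(struct toy_pcf8574 *c, uint8_t val)
{
    c->out = val;
}

/* A pin reads low if it is driven low or pulled low externally. */
static uint8_t toy_read(const struct toy_pcf8574 *c)
{
    return c->out & (uint8_t)~c->ext_low;
}

/* "Input" just means writing 1 so the pin is released. */
static void toy_direction_input(struct toy_pcf8574 *c, unsigned offset)
{
    toy_write(c, c->out | (uint8_t)(1u << offset));
}

static void toy_set_output(struct toy_pcf8574 *c, unsigned offset, bool value)
{
    uint8_t bit = (uint8_t)(1u << offset);

    toy_write(c, value ? (c->out | bit) : (c->out & (uint8_t)~bit));
}

static bool toy_get(const struct toy_pcf8574 *c, unsigned offset)
{
    return (toy_read(c) >> offset) & 1u;
}
```

Because every write sends the whole byte, the shadow copy is what keeps changes to one pin from disturbing the others.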
    • iommu sg merging: PCI: add dma segment boundary support · 59fc67de
      FUJITA Tomonori committed
      This adds PCI's accessor for segment_boundary_mask in device_dma_parameters.
      
      The default segment_boundary is set to 0xffffffff, the same as the block
      layer's default value (and the scsi mid layer uses the same value).
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • iommu sg merging: add accessors for segment_boundary_mask in device_dma_parameters() · d22a6966
      FUJITA Tomonori committed
      This adds new accessors for segment_boundary_mask in device_dma_parameters
      structure in the same way I did for max_segment_size.  So we can easily change
      where to place struct device_dma_parameters in the future.
      
      dma_get_seg_boundary returns 0xffffffff if dma_parms in struct device
      isn't set up properly.  0xffffffff is the default value used in the block
      layer and the scsi mid layer.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Greg KH <greg@kroah.com>
      Cc: Jeff Garzik <jeff@garzik.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
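A segment boundary mask means no DMA segment may straddle an address that is a multiple of (mask + 1). The sketch below pairs an accessor with the documented 0xffffffff fallback and the usual "same window" test for a segment. `struct toy_dma_parms`, `struct toy_device`, `toy_get_seg_boundary` and `toy_seg_ok` are hypothetical miniatures, not the kernel's structures or helpers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of struct device / device_dma_parameters. */
struct toy_dma_parms {
    unsigned long segment_boundary_mask;
};

struct toy_device {
    struct toy_dma_parms *dma_parms;   /* may be NULL if never set up */
};

/* Accessor with the fallback described above. */
static unsigned long toy_get_seg_boundary(const struct toy_device *dev)
{
    if (dev->dma_parms)
        return dev->dma_parms->segment_boundary_mask;
    return 0xffffffffUL;
}

/* [addr, addr + len) is fine if its first and last bytes share the same
 * bits above the mask, i.e. lie in one (mask + 1)-sized window.
 * Assumes len > 0. */
static bool toy_seg_ok(const struct toy_device *dev, uint64_t addr,
                       uint64_t len)
{
    uint64_t mask = toy_get_seg_boundary(dev);

    return (addr | mask) == ((addr + len - 1) | mask);
}
```

With a 64K mask (0xffff), a 512-byte segment starting at 0xff00 is rejected because it crosses 0x10000.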
    • iommu sg: add IOMMU helper functions for the free area management · 0291df8c
      FUJITA Tomonori committed
      This adds IOMMU helper functions for the free area management.  These
      functions take care of the LLD's segment boundary limit for IOMMUs.  They
      would be useful for IOMMUs that use a bitmap for the free area management.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Jeff Garzik <jeff@garzik.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
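Bitmap free-area management that honours a segment boundary can be sketched as a first-fit search that skips any candidate range straddling a boundary. This is a simplified, hypothetical model (`toy_area_alloc`, `toy_area_free` and the byte-array bitmap are invented here), not the code of the helpers this commit adds; one bit is one IOMMU page, boundaries are counted in slots, and nr is assumed to be greater than zero.

```c
#include <stdbool.h>
#include <stddef.h>

static bool toy_test_bit(const unsigned char *map, size_t i)
{
    return map[i / 8] & (1u << (i % 8));
}

static void toy_set_bit(unsigned char *map, size_t i)
{
    map[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* First-fit search for nr free slots that stay inside one
 * boundary_slots-sized window.  Returns the start slot, or -1. */
static long toy_area_alloc(unsigned char *map, size_t size, size_t nr,
                           size_t boundary_slots)
{
    for (size_t start = 0; start + nr <= size; start++) {
        bool ok = true;

        /* Skip ranges whose first and last slot lie in different windows. */
        if (start / boundary_slots != (start + nr - 1) / boundary_slots)
            continue;
        for (size_t i = start; i < start + nr; i++) {
            if (toy_test_bit(map, i)) {
                ok = false;
                break;
            }
        }
        if (!ok)
            continue;
        for (size_t i = start; i < start + nr; i++)
            toy_set_bit(map, i);
        return (long)start;
    }
    return -1;
}

/* Release nr slots starting at start. */
static void toy_area_free(unsigned char *map, size_t start, size_t nr)
{
    for (size_t i = start; i < start + nr; i++)
        map[i / 8] &= (unsigned char)~(1u << (i % 8));
}
```

A request larger than one window can never be satisfied, which is exactly the limit a low-level driver's boundary mask imposes.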
    • iommu sg merging: PCI: add device_dma_parameters support · 4d57cdfa
      FUJITA Tomonori committed
      This adds struct device_dma_parameters in struct pci_dev and properly
      sets up a pointer in struct device.
      
      The default max_segment_size is set to 64K, the same as the block layer's
      default value.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Mostly-acked-by: Jeff Garzik <jeff@garzik.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • iommu sg merging: add device_dma_parameters structure · 6b7b6510
      FUJITA Tomonori committed
      IOMMUs merge scatter/gather segments without considering a low level
      driver's restrictions. The problem is that IOMMUs can't access the
      limitations because they are stored in request_queue.
      
      This patchset introduces a new structure, device_dma_parameters,
      including dma information. A pointer to device_dma_parameters is added
      to struct device. The bus specific structures (like pci_dev) include
      device_dma_parameters. Low level drivers can use dma_set_max_seg_size
      to tell IOMMUs about the restrictions.
      
      We can move more dma stuff in struct device (like dma_mask) to struct
      device_dma_parameters later (needs some cleanups before that).
      
      This includes patches for all the IOMMUs that could merge sg (x86_64,
      ppc, IA64, alpha, sparc64, and parisc) though only the ppc patch was
      tested. The patches for other IOMMUs are only compile tested.
      
      This patch:
      
      Add a new structure, device_dma_parameters, including dma information.  A
      pointer to device_dma_parameters is added to struct device.
      
      - there are only max_segment_size and segment_boundary_mask there but we'll
        move more dma stuff in struct device (like dma_mask) to struct
        device_dma_parameters later.  segment_boundary_mask is not supported yet.
      
      - new accessors for the dma parameters are added.  So we can easily change
        where to place struct device_dma_parameters in the future.
      
      - dma_get_max_seg_size returns 64K if dma_parms in struct device isn't set
        up properly.  64K is the default max_segment_size in the block layer.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Acked-by: Jeff Garzik <jeff@garzik.org>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Acked-by: Jens Axboe <jens.axboe@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
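The layout described in this patchset (the bus structure embeds the DMA parameters, struct device holds a pointer to them, and accessors fall back to a 64K default when the pointer is unset) can be sketched with hypothetical miniature types. `toy_get_max_seg_size`, `toy_set_max_seg_size` and `toy_pci_setup` are illustrative names; the -1 error value is an assumption of this sketch, not the kernel's return code.

```c
#include <stddef.h>

/* Hypothetical miniatures of the structures described above. */
struct toy_dma_parms {
    unsigned int max_segment_size;
};

struct toy_device {
    struct toy_dma_parms *dma_parms;   /* NULL until bus code sets it */
};

struct toy_pci_dev {
    struct toy_device dev;
    struct toy_dma_parms dma_parms;    /* storage embedded in the bus struct */
};

/* Fall back to 64K, the block layer's default, when unset. */
static unsigned int toy_get_max_seg_size(const struct toy_device *dev)
{
    return dev->dma_parms ? dev->dma_parms->max_segment_size : 65536;
}

/* Returns 0 on success, -1 if the device has no parameter block. */
static int toy_set_max_seg_size(struct toy_device *dev, unsigned int size)
{
    if (!dev->dma_parms)
        return -1;
    dev->dma_parms->max_segment_size = size;
    return 0;
}

/* Bus setup wires the pointer and applies the default. */
static void toy_pci_setup(struct toy_pci_dev *pdev)
{
    pdev->dev.dma_parms = &pdev->dma_parms;
    toy_set_max_seg_size(&pdev->dev, 65536);
}
```

Going through accessors rather than touching the fields directly is what lets the parameters move later without changing every caller.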
    • 8250.c: support specifying DW APB UARTs in device platform_data · 74a19741
      Will Newton committed
      Allow the private_data field to be specified in platform_data for the
      standard 8250/16550 UART.  This field is used by DW APB type UARTs and
      without this patch it's only possible to set this field when registering
      the port by hand.  If private_data is not set then the driver will
      potentially oops with a NULL pointer dereference.
      Signed-off-by: Will Newton <will.newton@gmail.com>
      Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>