- 16 12月, 2009 19 次提交
-
-
由 KOSAKI Motohiro 提交于
If reclaim fails to make sufficient progress, the priority is raised. Once the priority is higher, kswapd starts waiting on congestion. However, if the zone is below the min watermark then kswapd needs to continue working without delay as there is a danger of an increased rate of GFP_ATOMIC allocation failure. This patch changes the conditions under which kswapd waits on congestion by only going to sleep if the min watermarks are being met. [mel@csn.ul.ie: add stats to track how relevant the logic is] [mel@csn.ul.ie: make kswapd only check its own zones and rename the relevant counters] Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: NMel Gorman <mel@csn.ul.ie> Reviewed-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
After kswapd balances all zones in a pgdat, it goes to sleep. In the event of no IO congestion, kswapd can go to sleep very shortly after the high watermark was reached. If there are a constant stream of allocations from parallel processes, it can mean that kswapd went to sleep too quickly and the high watermark is not being maintained for sufficient length time. This patch makes kswapd go to sleep as a two-stage process. It first tries to sleep for HZ/10. If it is woken up by another process or the high watermark is no longer met, it's considered a premature sleep and kswapd continues work. Otherwise it goes fully to sleep. This adds more counters to distinguish between fast and slow breaches of watermarks. A "fast" premature sleep is one where the low watermark was hit in a very short time after kswapd going to sleep. A "slow" premature sleep indicates that the high watermark was breached after a very short interval. Signed-off-by: NMel Gorman <mel@csn.ul.ie> Cc: Frans Pop <elendil@planet.nl> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lee Schermerhorn 提交于
Seems that page_io.c doesn't really need to know that page_private(page) is the swp_entry 'val'. Rework map_swap_page() to do what its name says and map a page to a page offset in the swap space. The only other caller of map_swap_page() is internal to mm/swapfile.c and it does want to map a swap entry to the 'sector'. So rename map_swap_page() to map_swap_entry(), make it 'static' and and implement map_swap_page() as a wrapper around that. Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Reorder (and comment) the fields of swap_info_struct, to make better use of its cachelines: it's good for swap_duplicate() in particular if unsigned int max and swap_map are near the start. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
While we're fiddling with the swap_map values, let's assign a particular value to shmem/tmpfs swap pages: their swap counts are never incremented, and it helps swapoff's try_to_unuse() a little if it can immediately distinguish those pages from process pages. Since we've no use for SWAP_MAP_BAD | COUNT_CONTINUED, we might as well use that 0xbf value for SWAP_MAP_SHMEM. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Swap is duplicated (reference count incremented by one) whenever the same swap page is inserted into another mm (when forking finds a swap entry in place of a pte, or when reclaim unmaps a pte to insert the swap entry). swap_info_struct's vmalloc'ed swap_map is the array of these reference counts: but what happens when the unsigned short (or unsigned char since the preceding patch) is full? (and its high bit is kept for a cache flag) We then lose track of it, never freeing, leaving it in use until swapoff: at which point we _hope_ that a single pass will have found all instances, assume there are no more, and will lose user data if we're wrong. Swapping of KSM pages has not yet been enabled; but it is implemented, and makes it very easy for a user to overflow the maximum swap count: possible with ordinary process pages, but unlikely, even when pid_max has been raised from PID_MAX_DEFAULT. This patch implements swap count continuations: when the count overflows, a continuation page is allocated and linked to the original vmalloc'ed map page, and this used to hold the continuation counts for that entry and its neighbours. These continuation pages are seldom referenced: the common paths all work on the original swap_map, only referring to a continuation page when the low "digit" of a count is incremented or decremented through SWAP_MAP_MAX. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Halve the vmalloc'ed swap_map array from unsigned shorts to unsigned chars: it's still very unusual to reach a swap count of 126, and the next patch allows it to be extended indefinitely. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Though swap_count() is useful, I'm finding that swap_has_cache() and encode_swapmap() obscure what happens in the swap_map entry, just at those points where I need to understand it. Remove them, and pass more usable "usage" values to scan_swap_map(), swap_entry_free() and __swap_duplicate(), instead of the SWAP_MAP and SWAP_CACHE enum. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Make better use of the space by folding first swap_extent into its swap_info_struct, instead of just the list_head: swap partitions need only that one, and for others it's used as a circular list anyway. [jirislaby@gmail.com: fix crash on double swapon] Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NJiri Slaby <jirislaby@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
The swap_info_struct is only 76 or 104 bytes, but it does seem wrong to reserve an array of about 30 of them in bss, when most people will want only one. Change swap_info[] to an array of pointers. That does need a "type" field in the structure: pack it as a char with next type and short prio (aha, char is unsigned by default on PowerPC). Use the (admittedly peculiar) name "type" throughout for this index. /proc/swaps does not take swap_lock: I wouldn't want it to, but do take care with barriers when adding a new item to the array (never removed). Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
The swap_info_struct is mostly private to mm/swapfile.c, with only one other in-tree user: get_swap_bio(). Adjust its interface to map_swap_page(), so that we can then remove get_swap_info_struct(). But there is a popular user out-of-tree, TuxOnIce: so leave the declaration of swap_info_struct in linux/swap.h. Signed-off-by: NHugh Dickins <hugh.dickins@tiscali.co.uk> Cc: Nigel Cunningham <ncunningham@crca.org.au> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NRik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
Objects passed to NODEMASK_ALLOC() are relatively small in size and are backed by slab caches that are not of large order, traditionally never greater than PAGE_ALLOC_COSTLY_ORDER. Thus, using GFP_KERNEL for these allocations on large machines when CONFIG_NODES_SHIFT > 8 will cause the page allocator to loop endlessly in the allocation attempt, each time invoking both direct reclaim or the oom killer. This is of particular interest when using NODEMASK_ALLOC() from a mempolicy context (either directly in mm/mempolicy.c or the mempolicy constrained hugetlb allocations) since the oom killer always kills current when allocations are constrained by mempolicies. So for all present use cases in the kernel, current would end up being oom killed when direct reclaim fails. That would allow the NODEMASK_ALLOC() to succeed but current would have sacrificed itself upon returning. This patch adds gfp flags to NODEMASK_ALLOC() to pass to kmalloc() on CONFIG_NODES_SHIFT > 8; this parameter is a nop on other configurations. All current use cases either directly from hugetlb code or indirectly via NODEMASK_SCRATCH() union __GFP_NORETRY to avoid direct reclaim and the oom killer when the slab allocator needs to allocate additional pages. The side-effect of this change is that all current use cases of either NODEMASK_ALLOC() or NODEMASK_SCRATCH() need appropriate -ENOMEM handling when the allocation fails (never for CONFIG_NODES_SHIFT <= 8). All current use cases were audited and do have appropriate error handling at this time. Signed-off-by: NDavid Rientjes <rientjes@google.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lee Schermerhorn 提交于
Offload the registration and unregistration of per node hstate sysfs attributes to a worker thread rather than attempt the allocation/attachment or detachment/freeing of the attributes in the context of the memory hotplug handler. I don't know that this is absolutely required, but the registration can sleep in allocations and other mem hot plug handlers do it this way. If it turns out this is NOT required, we can drop this patch. N.B., Only tested build, boot, libhugetlbfs regression. i.e., no memory hotplug testing. Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Reviewed-by: NAndi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
When memory is hot-removed, its node must be cleared in N_HIGH_MEMORY if there are no present pages left. In such a situation, kswapd must also be stopped since it has nothing left to do. Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Rik van Riel <riel@redhat.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lee Schermerhorn 提交于
Add the per huge page size control/query attributes to the per node sysdevs: /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/ nr_hugepages - r/w free_huge_pages - r/o surplus_huge_pages - r/o The patch attempts to re-use/share as much of the existing global hstate attribute initialization and handling, and the "nodes_allowed" constraint processing as possible. Calling set_max_huge_pages() with no node indicates a change to global hstate parameters. In this case, any non-default task mempolicy will be used to generate the nodes_allowed mask. A valid node id indicates an update to that node's hstate parameters, and the count argument specifies the target count for the specified node. From this info, we compute the target global count for the hstate and construct a nodes_allowed node mask contain only the specified node. Setting the node specific nr_hugepages via the per node attribute effectively ignores any task mempolicy or cpuset constraints. With this patch: (me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB ./ ../ free_hugepages nr_hugepages surplus_hugepages Starting from: Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 0 Node 2 HugePages_Free: 0 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 vm.nr_hugepages = 0 Allocate 16 persistent huge pages on node 2: (me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages [Note that this is equivalent to: numactl -m 2 hugeadmin --pool-pages-min 2M:+16 ] Yields: Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 16 Node 2 HugePages_Free: 16 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 vm.nr_hugepages = 16 Global controls work as expected--reduce pool to 8 persistent huge pages: (me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 8 Node 2 HugePages_Free: 8 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Reviewed-by: NAndi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lee Schermerhorn 提交于
Move definition of NUMA_NO_NODE from ia64 and x86_64 arch specific headers to generic header 'linux/numa.h' for use in generic code. NUMA_NO_NODE replaces bare '-1' where it's used in this series to indicate "no node id specified". Ultimately, it can be used to replace the -1 elsewhere where it is used similarly. Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: NDavid Rientjes <rientjes@google.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Reviewed-by: NAndi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lee Schermerhorn 提交于
This patch derives a "nodes_allowed" node mask from the numa mempolicy of the task modifying the number of persistent huge pages to control the allocation, freeing and adjusting of surplus huge pages when the pool page count is modified via the new sysctl or sysfs attribute "nr_hugepages_mempolicy". The nodes_allowed mask is derived as follows: * For "default" [NULL] task mempolicy, a NULL nodemask_t pointer is produced. This will cause the hugetlb subsystem to use node_online_map as the "nodes_allowed". This preserves the behavior before this patch. * For "preferred" mempolicy, including explicit local allocation, a nodemask with the single preferred node will be produced. "local" policy will NOT track any internode migrations of the task adjusting nr_hugepages. * For "bind" and "interleave" policy, the mempolicy's nodemask will be used. * Other than to inform the construction of the nodes_allowed node mask, the actual mempolicy mode is ignored. That is, all modes behave like interleave over the resulting nodes_allowed mask with no "fallback". See the updated documentation [next patch] for more information about the implications of this patch. Examples: Starting with: Node 0 HugePages_Total: 0 Node 1 HugePages_Total: 0 Node 2 HugePages_Total: 0 Node 3 HugePages_Total: 0 Default behavior [with or without this patch] balances persistent hugepage allocation across nodes [with sufficient contiguous memory]: sysctl vm.nr_hugepages[_mempolicy]=32 yields: Node 0 HugePages_Total: 8 Node 1 HugePages_Total: 8 Node 2 HugePages_Total: 8 Node 3 HugePages_Total: 8 Of course, we only have nr_hugepages_mempolicy with the patch, but with default mempolicy, nr_hugepages_mempolicy behaves the same as nr_hugepages. Applying mempolicy--e.g., with numactl [using '-m' a.k.a. '--membind' because it allows multiple nodes to be specified and it's easy to type]--we can allocate huge pages on individual nodes or sets of nodes. So, starting from the condition above, with 8 huge pages per node, add 8 more to node 2 using: numactl -m 2 sysctl vm.nr_hugepages_mempolicy=40 This yields: Node 0 HugePages_Total: 8 Node 1 HugePages_Total: 8 Node 2 HugePages_Total: 16 Node 3 HugePages_Total: 8 The incremental 8 huge pages were restricted to node 2 by the specified mempolicy. Similarly, we can use mempolicy to free persistent huge pages from specified nodes: numactl -m 0,1 sysctl vm.nr_hugepages_mempolicy=32 yields: Node 0 HugePages_Total: 4 Node 1 HugePages_Total: 4 Node 2 HugePages_Total: 16 Node 3 HugePages_Total: 8 The 8 huge pages freed were balanced over nodes 0 and 1. [rientjes@google.com: accomodate reworked NODEMASK_ALLOC] Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Reviewed-by: NAndi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lee Schermerhorn 提交于
Factor init_nodemask_of_node() out of the nodemask_of_node() macro. This will be used to populate the huge pages "nodes_allowed" nodemask for a single node when basing nodes_allowed on a preferred/local mempolicy or when a persistent huge page pool page count is modified via a per node sysfs attribute. Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: NMel Gorman <mel@csn.ul.ie> Reviewed-by: NAndi Kleen <andi@firstfloor.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Rientjes 提交于
This is a series of patches to provide control over the location of the allocation and freeing of persistent huge pages on a NUMA platform. Please consider for merging into mmotm. This series uses two mechanisms to constrain the nodes from which persistent huge pages are allocated: 1) the task NUMA mempolicy of the task modifying a new sysctl "nr_hugepages_mempolicy", based on a suggestion by Mel Gorman; and 2) a subset of the hugepages hstate sysfs attributes have been added [in V4] to each node system device under: /sys/devices/node/node[0-9]*/hugepages The per node attibutes allow direct assignment of a huge page count on a specific node, regardless of the task's mempolicy or cpuset constraints. This patch: NODEMASK_ALLOC(x, m) assumes x is a type of struct, which is unnecessary. It's perfectly reasonable to use this macro to allocate a nodemask_t, which is anonymous, either dynamically or on the stack depending on NODES_SHIFT. Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NLee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Nishanth Aravamudan <nacc@us.ibm.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Adam Litke <agl@us.ibm.com> Cc: Andy Whitcroft <apw@canonical.com> Cc: Eric Whitney <eric.whitney@hp.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 15 12月, 2009 7 次提交
-
-
由 Jean Delvare 提交于
There is no user left of I2C_CLIENT_MODULE_PARM, so we can finally get rid of this ugly macro. Signed-off-by: NJean Delvare <khali@linux-fr.org> Tested-by: NWolfram Sang <w.sang@pengutronix.de>
-
由 Jean Delvare 提交于
These macros simply declare an enum, so drivers might as well declare it themselves. This puts an end to the arbitrary limit of 8 chip types per i2c driver. Signed-off-by: NJean Delvare <khali@linux-fr.org> Tested-by: NWolfram Sang <w.sang@pengutronix.de>
-
由 Jean Delvare 提交于
This macro simply declares an enum, so drivers might as well declare it themselves. Signed-off-by: NJean Delvare <khali@linux-fr.org> Tested-by: NWolfram Sang <w.sang@pengutronix.de>
-
由 Jean Delvare 提交于
Struct i2c_client_address_data only contains one field at this point, which makes its usefulness questionable. Get rid of it and pass simple address lists around instead. Signed-off-by: NJean Delvare <khali@linux-fr.org> Tested-by: NWolfram Sang <w.sang@pengutronix.de>
-
由 Jean Delvare 提交于
The "kind" parameter always has value -1, and nobody is using it any longer, so we can remove it. Signed-off-by: NJean Delvare <khali@linux-fr.org> Tested-by: NWolfram Sang <w.sang@pengutronix.de>
-
由 David Howells 提交于
Global variable declarations must match the definitions in section attributes as the compiler is at liberty to vary the method it uses to access a variable, depending on the section it is in. When building the FRV arch, I now see: drivers/built-in.o: In function `pci_apply_final_quirks': drivers/pci/quirks.c:2606: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o drivers/pci/quirks.c:2623: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o drivers/pci/quirks.c:2630: relocation truncated to fit: R_FRV_GPREL12 against symbol `pci_dfl_cache_line_size' defined in .devinit.data section in drivers/built-in.o because the declaration of pci_dfl_cache_line_size in linux/pci.h does not match the definition in drivers/pci/pci.c. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Howells 提交于
If there is no hardware breakpoint support, modify_user_hw_breakpoint() tries to return a NULL pointer through as an 'int' return value: In file included from kernel/exit.c:53: include/linux/hw_breakpoint.h: In function 'modify_user_hw_breakpoint': include/linux/hw_breakpoint.h:96: warning: return makes integer from pointer without a cast Return 0 instead. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 14 12月, 2009 14 次提交
-
-
由 NeilBrown 提交于
Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 Rajendra Nayak 提交于
This patch adds initial support for creating twl6030 PMIC specific voltage regulators in the twl mfd driver. Board specific regulator configurations will have to be passed from respective board files. Signed-off-by: NRajendra Nayak <rnayak@ti.com> Signed-off-by: NBalaji T K <balajitk@ti.com> Acked-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Reviewed-by: NTony Lindgren <tony@atomide.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Rajendra Nayak 提交于
This patch updates the regulator driver to add support for TWL6030 PMIC specific LDO regulators. SMPS resources are not yet supported for TWL6030 and also .set_mode and .get_status for LDO's are yet to be implemented for TWL6030. Signed-off-by: NRajendra Nayak <rnayak@ti.com> Signed-off-by: NBalaji T K <balajitk@ti.com> Acked-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Reviewed-by: NTony Lindgren <tony@atomide.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Balaji T K 提交于
This patch adds support for phoenix interrupt framework. New iInterrupt status register A, B, C are introduced in Phoenix and are cleared on write. Due to the differences in interrupt handling with respect to TWL4030, twl6030-irq.c is created for TWL6030 PMIC Signed-off-by: NRajendra Nayak <rnayak@ti.com> Signed-off-by: NBalaji T K <balajitk@ti.com> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: NTony Lindgren <tony@atomide.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Balaji T K 提交于
This patch renames function names like twl4030_i2c_write_u8, twl4030_i2c_read_u8 to twl_i2c_write_u8, twl_i2c_read_u8 and also common variable in twl-core.c Signed-off-by: NRajendra Nayak <rnayak@ti.com> Signed-off-by: NBalaji T K <balajitk@ti.com> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@ti.com> Acked-by: NKevin Hilman <khilman@deeprootsystems.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Santosh Shilimkar 提交于
The upcoming TWL6030 is companion chip for OMAP4 like the current TWL4030 for OMAP3. The common modules like RTC, Regulator creates opportunity to re-use the most of the code from twl4030. This patch renames few common drivers twl4030* files to twl* to enable the code re-use. Signed-off-by: NRajendra Nayak <rnayak@ti.com> Signed-off-by: NBalaji T K <balajitk@ti.com> Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@ti.com> Acked-by: NKevin Hilman <khilman@deeprootsystems.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Juha Keski-Saari 提交于
Add all twl4030 regulators to the twl4030 mfd driver and twl4030_platform_data Signed-off-by: NJuha Keski-Saari <ext-juha.1.keski-saari@nokia.com> Acked-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Antonio Ospite 提交于
We used these, in a first version of leds-pcap driver, in order to encode gpio enabling and gpio inversion for a led inside the variable used for the gpio number. In the new leds-pcap driver we rely on gpio_is_valid() to derive if a led is gpio enabled and we have a dedicated flag to tell if the gpio value has to be inverted. Signed-off-by: NAntonio Ospite <ospite@studenti.unina.it> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Uwe Kleine-König 提交于
This fixes several things while still providing the old API: - simplify and fix locking - better error handling - don't ack all irqs making it impossible to detect a reset of the rtc - use a timeout variant to wait for completion of ADC conversion - provide platform-data to regulator subdevice (This allows making struct mc13783 opaque for other drivers after the regulator driver is updated to use its platform_data.) - expose all interrupts - use threaded irq After all users in mainline are converted to the new API, some things (e.g. mc13783-private.h) can go away. Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Mark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Mark Brown 提交于
Replace the wm831x-local IRQ infrastructure with genirq, allowing access to the diagnostic infrastructure of genirq and allowing us to implement interrupt support for the GPIOs. The switchover is done within the wm831x specific IRQ API, further patches will convert the individual drivers to use genirq directly. Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Ilkka Koskinen 提交于
TWL5031 introduces two new interrupts in PIH. Moreover, BCI has changed remarkably and, thus, it's disabled when TWL5031 is in use. Signed-off-by: NIlkka Koskinen <ilkka.koskinen@nokia.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Mark Brown 提交于
This is done as simple code transformation, the semantics of the IRQ API provided by the core are are still very different to those of genirq (mainly with regard to masking). Signed-off-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Ben Dooks 提交于
Add function to allow the configuation fo the VDCDC2 register by external users, to allow changing of the standard and low-power running modes. This is needed, for example, for the Simtec IM2440D20 where we need to use the low-power mode to shutdown the LDO/DCDC that are not needed during suspend (saving substantial power) and the runtime use of the low-power mode to change VCore. Signed-off-by: NBen Dooks <ben@simtec.co.uk> Signed-off-by: NSimtec Linux Team <linux@simtec.co.uk> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-
由 Ilkka Koskinen 提交于
Allows TWL's 32kHz oscillator to go in low-power mode when main battery voltage is running low. Signed-off-by: NIlkka Koskinen <ilkka.koskinen@nokia.com> Signed-off-by: NSamuel Ortiz <sameo@linux.intel.com>
-