- 28 5月, 2010 10 次提交
-
-
FILE_MAPPED per memcg of migrated file cache is not properly updated, because our hook in page_add_file_rmap() can't know to which memcg FILE_MAPPED should be counted. Basically, this patch is for fixing the bug but includes some big changes to fix up other messes. Now, at migrating mapped file, events happen in following sequence. 1. allocate a new page. 2. get memcg of an old page. 3. charge ageinst a new page before migration. But at this point, no changes to new page's page_cgroup, no commit for the charge. (IOW, PCG_USED bit is not set.) 4. page migration replaces radix-tree, old-page and new-page. 5. page migration remaps the new page if the old page was mapped. 6. Here, the new page is unlocked. 7. memcg commits the charge for newpage, Mark the new page's page_cgroup as PCG_USED. Because "commit" happens after page-remap, we can count FILE_MAPPED at "5", because we should avoid to trust page_cgroup->mem_cgroup. if PCG_USED bit is unset. (Note: memcg's LRU removal code does that but LRU-isolation logic is used for helping it. When we overwrite page_cgroup->mem_cgroup, page_cgroup is not on LRU or page_cgroup->mem_cgroup is NULL.) We can lose file_mapped accounting information at 5 because FILE_MAPPED is updated only when mapcount changes 0->1. So we should catch it. BTW, historically, above implemntation comes from migration-failure of anonymous page. Because we charge both of old page and new page with mapcount=0, we can't catch - the page is really freed before remap. - migration fails but it's freed before remap or .....corner cases. New migration sequence with memcg is: 1. allocate a new page. 2. mark PageCgroupMigration to the old page. 3. charge against a new page onto the old page's memcg. (here, new page's pc is marked as PageCgroupUsed.) 4. page migration replaces radix-tree, page table, etc... 5. At remapping, new page's page_cgroup is now makrked as "USED" We can catch 0->1 event and FILE_MAPPED will be properly updated. And we can catch SWAPOUT event after unlock this and freeing this page by unmap() can be caught. 7. Clear PageCgroupMigration of the old page. So, FILE_MAPPED will be correctly updated. Then, for what MIGRATION flag is ? Without it, at migration failure, we may have to charge old page again because it may be fully unmapped. "charge" means that we have to dive into memory reclaim or something complated. So, it's better to avoid charge it again. Before this patch, __commit_charge() was working for both of the old/new page and fixed up all. But this technique has some racy condtion around FILE_MAPPED and SWAPOUT etc... Now, the kernel use MIGRATION flag and don't uncharge old page until the end of migration. I hope this change will make memcg's page migration much simpler. This page migration has caused several troubles. Worth to add a flag for simplification. Reviewed-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Tested-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Reported-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daisuke Nishimura 提交于
This patch adds support for moving charge of file pages, which include normal file, tmpfs file and swaps of tmpfs file. It's enabled by setting bit 1 of <target cgroup>/memory.move_charge_at_immigrate. Unlike the case of anonymous pages, file pages(and swaps) in the range mmapped by the task will be moved even if the task hasn't done page fault, i.e. they might not be the task's "RSS", but other task's "RSS" that maps the same file. And mapcount of the page is ignored(the page can be moved even if page_mapcount(page) > 1). So, conditions that the page/swap should be met to be moved is that it must be in the range mmapped by the target task and it must be charged to the old cgroup. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix warning] Signed-off-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Balbir Singh <balbir@in.ibm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Felipe Balbi 提交于
A few architectures, like OMAP, allow you to set a debouncing time for the gpio before generating the IRQ. Teach gpiolib about that. Mark said: : This would be generally useful for embedded systems, especially where : the interrupt concerned is a wake source. It allows drivers to avoid : spurious interrupts from noisy sources so if the hardware supports it : the driver can avoid having to explicitly wait for the signal to become : stable and software has to cope with fewer events. We've lived without : it for quite some time, though. David said: : I looked at adding debounce support to the generic GPIO calls (and thus : gpiolib) some time back, but decided against it. I forget why at this : time (check list archives) but it wasn't because of lack of utility in : certain contexts. : : One thing to watch out for is just how variable the hardware capabilities : are. Atmel GPIOs have something like a fixed number of 32K clock cycles : for debounce, twl4030 had something odd, OMAPs were more like the Atmel : chips but with a different clock. In some cases debouncing had to be : ganged, not per-GPIO. And so forth. Signed-off-by: NFelipe Balbi <felipe.balbi@nokia.com> Cc: Tony Lindgren <tony@atomide.com> Cc: David Brownell <david-b@pacbell.net> Reviewed-by: NMark Brown <broonie@opensource.wolfsonmicro.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Uwe Kleine-König 提交于
gpiolib doesn't need to modify the names and I assume most initializers use string constants that shouldn't be modified anyhow. [akpm@linux-foundation.org: fix drivers/gpio/cs5535-gpio.c] Signed-off-by: NUwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Kevin Wells <kevin.wells@nxp.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Marc Zyngier 提交于
Most of the GPIO expanders supported by the max732x driver have interrupt generation capability by reporting changes on input pins through an INT# pin. This patch implements the irq_chip functionnality (edge detection only). Signed-off-by: NMarc Zyngier <maz@misterjones.org> Cc: Eric Miao <eric.y.miao@gmail.com> Cc: Jebediah Huang <jebediah.huang@gmail.com> Cc: David Brownell <david-b@pacbell.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Viresh KUMAR 提交于
Add a glue layer to support the sdhci driver on the ST SPEAr platform. Signed-off-by: NViresh Kumar <viresh.kumar@st.com> Cc: <shiraz.hashim@st.com> Cc: Linus Walleij <linus.ml.walleij@gmail.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Grazvydas Ignotas 提交于
SDIO specification allows RAW (Read after Write) operation using IO_RW_DIRECT command (CMD52) by setting the RAW bit. This operation is similar to ordinary read/write commands, except that both write and read are performed using single command/response pair. The Linux SDIO layer already supports this internaly, only external function is missing for drivers to make use, which is added by this patch. This type of command is required to implement proper power save mode support in wl1251 wifi driver. Android has similar patch for G1 in it's tree for the same reason: http://android.git.kernel.org/?p=kernel/common.git;a=commitdiff;h=74a47786f6ecbe6c1cf9fb15efe6a968451deb52Signed-off-by: NGrazvydas Ignotas <notasas@gmail.com> Acked-by: NKalle Valo <kalle.valo@iki.fi> Cc: Dmitry Shmidt <dimitrysh@google.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matt Fleming 提交于
Even though many mmc host drivers pass a pm_message_t argument to mmc_suspend_host() that argument isn't used the by MMC core. As host drivers are converted to dev_pm_ops they'll have to construct pm_message_t's (as they won't be passed by the PM subsystem any more) just to appease the mmc suspend interface. We might as well just delete the unused paramter. Signed-off-by: NMatt Fleming <matt@console-pimps.org> Acked-by: NAnton Vorontsov <cbouatmailru@gmail.com> Acked-by: Michal Miroslaw <mirq-linux@rere.qmqm.pl>ZZ Acked-by: NSascha Sommer <saschasommer@freenet.de> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yusuke Goda 提交于
MMCIF is the MMC Host Interface in SuperH. Signed-off-by: NYusuke Goda <yusuke.goda.sx@renesas.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Magnus Damm <magnus.damm@gmail.com> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Anton Vorontsov 提交于
This includes platform ops, quirks and (de)initialization callbacks. Signed-off-by: NAnton Vorontsov <avorontsov@ru.mvista.com> Cc: Richard Röjfors <richard.rojfors@pelagicore.com> Cc: David Vrabel <david.vrabel@csr.com> Cc: Pierre Ossman <pierre@ossman.eu> Cc: Ben Dooks <ben@simtec.co.uk> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 26 5月, 2010 2 次提交
-
-
由 Linus Torvalds 提交于
This reverts commit b3b77c8c, which was also totally broken (see commit 0d2daf5c that reverted the crc32 version of it). As reported by Stephen Rothwell, it causes problems on big-endian machines: > In file included from fs/jfs/jfs_types.h:33, > from fs/jfs/jfs_incore.h:26, > from fs/jfs/file.c:22: > fs/jfs/endian24.h:36:101: warning: "__LITTLE_ENDIAN" is not defined The kernel has never had that crazy "__BYTE_ORDER == __LITTLE_ENDIAN" model. It's not how we do things, and it isn't how we _should_ do things. So don't go there. Requested-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kay Sievers 提交于
This adds: alias: devname:<name> to some common kernel modules, which will allow the on-demand loading of the kernel module when the device node is accessed. Ideally all these modules would be compiled-in, but distros seems too much in love with their modularization that we need to cover the common cases with this new facility. It will allow us to remove a bunch of pretty useless init scripts and modprobes from init scripts. The static device node aliases will be carried in the module itself. The program depmod will extract this information to a file in the module directory: $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname # Device nodes to trigger on-demand module loading. microcode cpu/microcode c10:184 fuse fuse c10:229 ppp_generic ppp c108:0 tun net/tun c10:200 dm_mod mapper/control c10:235 Udev will pick up the depmod created file on startup and create all the static device nodes which the kernel modules specify, so that these modules get automatically loaded when the device node is accessed: $ /sbin/udevd --debug ... static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184 static_dev_create_from_modules: mknod '/dev/fuse' c10:229 static_dev_create_from_modules: mknod '/dev/ppp' c108:0 static_dev_create_from_modules: mknod '/dev/net/tun' c10:200 static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235 udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666 udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666 A few device nodes are switched to statically allocated numbers, to allow the static nodes to work. This might also useful for systems which still run a plain static /dev, which is completely unsafe to use with any dynamic minor numbers. Note: The devname aliases must be limited to the *common* and *single*instance* device nodes, like the misc devices, and never be used for conceptually limited systems like the loop devices, which should rather get fixed properly and get a control node for losetup to talk to, instead of creating a random number of device nodes in advance, regardless if they are ever used. This facility is to hide the mess distros are creating with too modualized kernels, and just to hide that these modules are not compiled-in, and not to paper-over broken concepts. Thanks! :) Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: David S. Miller <davem@davemloft.net> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Chris Mason <chris.mason@oracle.com> Cc: Alasdair G Kergon <agk@redhat.com> Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk> Cc: Ian Kent <raven@themaw.net> Signed-Off-By: NKay Sievers <kay.sievers@vrfy.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@suse.de>
-
- 25 5月, 2010 28 次提交
-
-
由 Grazvydas Ignotas 提交于
FBIO_WAITFORVSYNC is currently implemented by matroxfb, atyfb, intelfb and more. All of them keep redefining the same FBIO_WAITFORVSYNC macro over and over again, so move it to linux/fb.h and clean up those duplicate defines. Signed-off-by: NGrazvydas Ignotas <notasas@gmail.com> Cc: Ville Syrjala <syrjala@sci.fi> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Maik Broemme <mbroemme@plusserver.de> Cc: Petr Vandrovec <vandrove@vc.cvut.cz> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Krzysztof Helt <krzysztof.h1@poczta.fm> Cc: "Hiremath, Vaibhav" <hvaibhav@ti.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Samu Onkalo 提交于
Content for the 8bit device threaded interrupt handlers. Depending on the interrupt line and chip configuration, either click or wakeup / freefall handler is called. In case of click, BTN_ event is sent via input device. In case of wakeup or freefall, input device ABS_ events are updated immediatelly. It is still possible to configure interrupt line 1 for fast freefall detection and use the second line either for click or threshold based interrupts. Or both lines can be used for click / threshold interrupts. Polled input device can be set to stopped state and still get coordinate updates via input device using interrupt based method. Polled mode and interrupt mode can also be used parallel. BTN_ events are remapped based on existing axis remapping information. Signed-off-by: NSamu Onkalo <samu.p.onkalo@nokia.com> Acked-by: NEric Piel <eric.piel@tremplin-utc.net> Cc: Daniel Mack <daniel@caiaq.de> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Samu Onkalo 提交于
Original lis3 driver didn't provide interrupt handler(s) for click or threshold event handling. This patch adds threaded handlers for one or two interrupt lines for 8 bit device. Actual content for interrupt handling is provided in the separate patch. Signed-off-by: NSamu Onkalo <samu.p.onkalo@nokia.com> Tested-by: NDaniel Mack <daniel@caiaq.de> Acked-by: NEric Piel <eric.piel@tremplin-utc.net> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Samu Onkalo 提交于
8 bit device has two wakeup / free fall units. It was not possible to configure the second unit. This patch introduces configuration entry to the platform data and also corresponding changes to the 8 bit setup function. High pass filters were enabled by default. Patch introduces configuration option for high pass filter cut off frequency and also possibility to disable or enable the filter via platform data. Since the control is a new one and default state was filter enabled, new option is used to disable the filter. This way old platform data is still compatible with the change. Signed-off-by: NSamu Onkalo <samu.p.onkalo@nokia.com> Acked-by: NEric Piel <eric.piel@tremplin-utc.net> Tested-by: NDaniel Mack <daniel@caiaq.de> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andy Shevchenko 提交于
hex_to_bin() is a little method which converts hex digit to its actual value. There are plenty of places where such functionality is needed. [akpm@linux-foundation.org: use tolower(), saving 3 bytes, test the more common case first - it's quicker] [akpm@linux-foundation.org: relocate tolower to make it even faster! (Joe)] Signed-off-by: NAndy Shevchenko <ext-andriy.shevchenko@nokia.com> Cc: Tilman Schmidt <tilman@imap.cc> Cc: Duncan Sands <duncan.sands@free.fr> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@suse.de> Cc: "Richard Russon (FlatCap)" <ldm@flatcap.org> Cc: John W. Linville <linville@tuxdriver.com> Cc: Len Brown <lenb@kernel.org> Cc: Joe Perches <joe@perches.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Florian Ragwitz 提交于
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: NFlorian Ragwitz <rafl@debian.org> Cc: Jason Baron <jbaron@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 OGAWA Hirofumi 提交于
For now, all users of ratelimit_state allocates it statically, so DEFINE_RATELIMIT_STATE() is enough. But, I want to use ratelimit_state for fs, i.e. per super_block to suppress too many error reports. So, this adds ratelimit_state_init() to initialize ratelimite_state which is dynamically allocated, instead of opencoding. Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 OGAWA Hirofumi 提交于
ratelimit_state initialization of printk_ratelimited() seems broken. This fixes it by using DEFINE_RATELIMIT_STATE() to initialize spinlock properly. Signed-off-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Joe Perches <joe@perches.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joe Perches 提交于
The current logging macros are pr_<level>, dev_<level>, netdev_<level>, and netif_<level>. pr_ uses warning, the other use warn. Standardize these logging macros a bit more by adding pr_warn and pr_warn_ratelimited. Right now, there are: $ for level in emerg alert crit err warn warning notice info ; do \ for prefix in pr dev netdev netif ; do \ echo -n "${prefix}_${level}: `git grep -w "${prefix}_${level}" | wc -l` " ; \ done ; \ echo ; \ done pr_emerg: 45 dev_emerg: 4 netdev_emerg: 1 netif_emerg: 4 pr_alert: 24 dev_alert: 36 netdev_alert: 1 netif_alert: 6 pr_crit: 24 dev_crit: 22 netdev_crit: 1 netif_crit: 4 pr_err: 2013 dev_err: 8467 netdev_err: 267 netif_err: 240 pr_warn: 0 dev_warn: 1818 netdev_warn: 126 netif_warn: 23 pr_warning: 773 dev_warning: 0 netdev_warning: 0 netif_warning: 0 pr_notice: 148 dev_notice: 111 netdev_notice: 9 netif_notice: 3 pr_info: 1717 dev_info: 3007 netdev_info: 101 netif_info: 85 Signed-off-by: NJoe Perches <joe@perches.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alexey Dobriyan 提交于
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not USHORT_MAX/SHORT_MAX/SHORT_MIN. - Make SHRT_MIN of type s16, not int, for consistency. [akpm@linux-foundation.org: fix drivers/dma/timb_dma.c] [akpm@linux-foundation.org: fix security/keys/keyring.c] Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com> Acked-by: NWANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joakim Tjernlund 提交于
Linux does not define __BYTE_ORDER in its endian header files which makes some header files bend backwards to get at the current endian. Lets #define __BYTE_ORDER in big_endian.h/litte_endian.h to make it easier for header files that are used in user space too. In userspace the convention is that 1. _both_ __LITTLE_ENDIAN and __BIG_ENDIAN are defined, 2. you have to test for e.g. __BYTE_ORDER == __BIG_ENDIAN. Signed-off-by: NJoakim Tjernlund <Joakim.Tjernlund@transmode.se> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jani Nikula 提交于
Add __must_check to error pointer handlers to have the compiler warn about mistakes like: if (err) ERR_PTR(err); It found two bugs: Mar 12 Nikula Jani [PATCH] enclosure: fix error path - actually return ERR_PTR() on error Mar 12 Nikula Jani [PATCH] sunrpc: fix error path - actually return ERR_PTR() on error Signed-off-by: NJani Nikula <ext-jani.1.nikula@nokia.com> Cc: Phil Carmody <ext-phil.2.carmody@nokia.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chris Metcalf 提交于
This ensures that platforms with lowmem PAs above 32 bits work correctly by avoiding truncating the PA during a left shift. Signed-off-by: NChris Metcalf <cmetcalf@tilera.com> Cc: Barry Song <21cnbao@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Haicheng Li 提交于
Add global mutex zonelists_mutex to fix the possible race: CPU0 CPU1 CPU2 (1) zone->present_pages += online_pages; (2) build_all_zonelists(); (3) alloc_page(); (4) free_page(); (5) build_all_zonelists(); (6) __build_all_zonelists(); (7) zone->pageset = alloc_percpu(); In step (3,4), zone->pageset still points to boot_pageset, so bad things may happen if 2+ nodes are in this state. Even if only 1 node is accessing the boot_pageset, (3) may still consume too much memory to fail the memory allocations in step (7). Besides, atomic operation ensures alloc_percpu() in step (7) will never fail since there is a new fresh memory block added in step(6). [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists] Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Reviewed-by: NAndi Kleen <andi.kleen@intel.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Haicheng Li 提交于
For each new populated zone of hotadded node, need to update its pagesets with dynamically allocated per_cpu_pageset struct for all possible CPUs: 1) Detach zone->pageset from the shared boot_pageset at end of __build_all_zonelists(). 2) Use mutex to protect zone->pageset when it's still shared in onlined_pages() Otherwises, multiple zones of different nodes would share same boot strapping boot_pageset for same CPU, which will finally cause below kernel panic: ------------[ cut here ]------------ kernel BUG at mm/page_alloc.c:1239! invalid opcode: 0000 [#1] SMP ... Call Trace: [<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0 [<ffffffff81162e67>] alloc_pages_current+0x87/0xd0 [<ffffffff81128407>] __page_cache_alloc+0x67/0x70 [<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260 [<ffffffff81132751>] ra_submit+0x21/0x30 [<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0 [<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0 [<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670 [<ffffffff81266cfa>] nfs_file_read+0xca/0x130 [<ffffffff8117b20a>] do_sync_read+0xfa/0x140 [<ffffffff8117bf75>] vfs_read+0xb5/0x1a0 [<ffffffff8117c151>] sys_read+0x51/0x80 [<ffffffff8103c032>] system_call_fastpath+0x16/0x1b RIP [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900 RSP <ffff88000d1e78a8> ---[ end trace 4bda28328b9990db ] [akpm@linux-foundation.org: merge fix] Signed-off-by: NHaicheng Li <haicheng.li@linux.intel.com> Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Reviewed-by: NAndi Kleen <andi.kleen@intel.com> Reviewed-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Marcelo Roberto Jimenez 提交于
Got this while compiling for ARM/SA1100: mm/sparse.c: In function '__section_nr': mm/sparse.c:135: warning: 'root' is used uninitialized in this function This patch follows Russell King's suggestion for a new calculation for NR_SECTION_ROOTS. Thanks also to Sergei Shtylyov for pointing out the existence of the macro DIV_ROUND_UP. Atsushi Nemoto observed: : This fix doesn't just silence the warning - it fixes a real problem. : : Without this fix, mem_section[] might have 0 size so mem_section[0] : will share other variable area. For example, I got: : : c030c700 b __warned.16478 : c030c700 B mem_section : c030c701 b __warned.16483 : : This might cause very strange behavior. Your patch actually fixes it. Signed-off-by: NMarcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br> Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Sergei Shtylyov <sshtylyov@mvista.com> Cc: Russell King <rmk@arm.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Akinobu Mita 提交于
In f4112de6 ("mm: introduce debug_kmap_atomic") I said that debug_kmap_atomic() needs CONFIG_TRACE_IRQFLAGS_SUPPORT. It was wrong. (I thought irqs_disabled() is only available when the architecture has CONFIG_TRACE_IRQFLAGS_SUPPORT) Remove the #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT check to enable kmap_atomic() debugging for the architectures which do not have CONFIG_TRACE_IRQFLAGS_SUPPORT. Reported-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 matt mooney 提交于
Add parenthesis in a define. This doesn't change functionality. checkpatch errors: 1) white space fixes 2) add spaces after comas Signed-off-by: Nmatt mooney <mfm@muteddisk.com> Cc: Dan Carpenter <error27@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 matt mooney 提交于
Fix minor spelling errors in a few comments; no code changes. Signed-off-by: Nmatt mooney <mfm@muteddisk.com> Cc: Dan Carpenter <error27@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 minskey guo 提交于
Enable users to online CPUs even if the CPUs belongs to a numa node which doesn't have onlined local memory. The zonlists(pg_data_t.node_zonelists[]) of a numa node are created either in system boot/init period, or at the time of local memory online. For a numa node without onlined local memory, its zonelists are not initialized at present. As a result, any memory allocation operations executed by CPUs within this node will fail. In fact, an out-of-memory error is triggered when attempt to online CPUs before memory comes to online. This patch tries to create zonelists for such numa nodes, so that the memory allocation for this node can be fallback'ed to other nodes. [akpm@linux-foundation.org: remove unneeded export] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: minskey guo<chaohong.guo@intel.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: Yasunori Goto <y-goto@jp.fujitsu.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
For now, we have global isolation vs. memory control group isolation, do not allow the reclaim entry function to set an arbitrary page isolation callback, we do not need that flexibility. And since we already pass around the group descriptor for the memory control group isolation case, just use it to decide which one of the two isolator functions to use. The decisions can be merged into nearby branches, so no extra cost there. In fact, we save the indirect calls. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
The fragmentation index may indicate that a failure is due to external fragmentation but after a compaction run completes, it is still possible for an allocation to fail. There are two obvious reasons as to why o Page migration cannot move all pages so fragmentation remains o A suitable page may exist but watermarks are not met In the event of compaction followed by an allocation failure, this patch defers further compaction in the zone (1 << compact_defer_shift) times. If the next compaction attempt also fails, compact_defer_shift is increased up to a maximum of 6. If compaction succeeds, the defer counters are reset again. The zone that is deferred is the first zone in the zonelist - i.e. the preferred zone. To defer compaction in the other zones, the information would need to be stored in the zonelist or implemented similar to the zonelist_cache. This would impact the fast-paths and is not justified at this time. Signed-off-by: NMel Gorman <mel@csn.ul.ie> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
mm: compaction: add a tunable that decides when memory should be compacted and when it should be reclaimed The kernel applies some heuristics when deciding if memory should be compacted or reclaimed to satisfy a high-order allocation. One of these is based on the fragmentation. If the index is below 500, memory will not be compacted. This choice is arbitrary and not based on data. To help optimise the system and set a sensible default for this value, this patch adds a sysctl extfrag_threshold. The kernel will only compact memory if the fragmentation index is above the extfrag_threshold. [randy.dunlap@oracle.com: Fix build errors when proc fs is not configured] Signed-off-by: NMel Gorman <mel@csn.ul.ie> Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Ordinarily when a high-order allocation fails, direct reclaim is entered to free pages to satisfy the allocation. With this patch, it is determined if an allocation failed due to external fragmentation instead of low memory and if so, the calling process will compact until a suitable page is freed. Compaction by moving pages in memory is considerably cheaper than paging out to disk and works where there are locked pages or no swap. If compaction fails to free a page of a suitable size, then reclaim will still occur. Direct compaction returns as soon as possible. As each block is compacted, it is checked if a suitable page has been freed and if so, it returns. [akpm@linux-foundation.org: Fix build errors] [aarcange@redhat.com: fix count_vm_event preempt in memory compaction direct reclaim] Signed-off-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NRik van Riel <riel@redhat.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Add a per-node sysfs file called compact. When the file is written to, each zone in that node is compacted. The intention that this would be used by something like a job scheduler in a batch system before a job starts so that the job can allocate the maximum number of hugepages without significant start-up cost. Signed-off-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NRik van Riel <riel@redhat.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NChristoph Lameter <cl@linux-foundation.org> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Add a proc file /proc/sys/vm/compact_memory. When an arbitrary value is written to the file, all zones are compacted. The expected user of such a trigger is a job scheduler that prepares the system before the target application runs. Signed-off-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NRik van Riel <riel@redhat.com> Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Reviewed-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: NChristoph Lameter <cl@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
This patch is the core of a mechanism which compacts memory in a zone by relocating movable pages towards the end of the zone. A single compaction run involves a migration scanner and a free scanner. Both scanners operate on pageblock-sized areas in the zone. The migration scanner starts at the bottom of the zone and searches for all movable pages within each area, isolating them onto a private list called migratelist. The free scanner starts at the top of the zone and searches for suitable areas and consumes the free pages within making them available for the migration scanner. The pages isolated for migration are then migrated to the newly isolated free pages. [aarcange@redhat.com: Fix unsafe optimisation] [mel@csn.ul.ie: do not schedule work on other CPUs for compaction] Signed-off-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NRik van Riel <riel@redhat.com> Reviewed-by: NMinchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Currently, vmscan.c defines the isolation modes for __isolate_lru_page(). Memory compaction needs access to these modes for isolating pages for migration. This patch exports them. Signed-off-by: NMel Gorman <mel@csn.ul.ie> Acked-by: NChristoph Lameter <cl@linux-foundation.org> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan.kim@gmail.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-