- 29 9月, 2012 10 次提交
-
-
由 Huang Shijie 提交于
add onfi_get_async_timing_mode() to get the supportted asynchronous timing mode. add onfi_get_sync_timing_mode() to get the supportted synchronous timing mode. Also add the neccessary macros : the timing modes. Signed-off-by: NHuang Shijie <b32955@freescale.com> Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
由 Huang Shijie 提交于
Add the set-features(0xef)/get-features(0xee) helpers for ONFI nand. Also add the necessary macros. Signed-off-by: NHuang Shijie <b32955@freescale.com> Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
由 Mike Dunn 提交于
In the absence of any formal documentation of the nand interface, I thought this patch to the header file might be helpful. Signed-off-by: NMike Dunn <mikedunn@newsguy.com> Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
由 Rafał Miłecki 提交于
This registers MTD driver for serial flash platform device. Right now it supports reading only, writing still has to be implemented. Artem: minor amendments. Signed-off-by: NRafał Miłecki <zajec5@gmail.com> Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
由 Jeff Westfahl 提交于
Added a NAND device flag for subpage read support. Previously this was hard coded based on large page and soft ECC. Updated base NAND driver to use the new subpage read flag if the NAND is large page and soft ECC. Signed-off-by: NJeff Westfahl <jeff.westfahl@ni.com> Reviewed-by: NBrian Norris <computersforpeace@gmail.com> Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
由 Randy Dunlap 提交于
Fix kernel-doc warning in <linux/mtd/nand.h>: Warning(include/linux/mtd/nand.h:659): No description found for parameter 'read_byte' Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net> Acked-by: NBrian Norris <computersforpeace@gmail.com> Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
由 Huang Shijie 提交于
Just as Artem suggested: "Both UBI and JFFS2 are able to read verify what they wrote already. There are also MTD tests which do this verification. So I think there is no reason to keep this in the NAND layer, let alone wasting RAM in the driver to support this feature. Besides, it does not work for sub-pages and many drivers have it broken. It hurts more than it provides benefits." So kill MTD_NAND_VERIFY_WRITE entirely. Signed-off-by: NHuang Shijie <shijie8@gmail.com> Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
由 Brian Norris 提交于
The NAND_CHIPOPTIONS_MSK has limited utility and is causing real bugs. It silently masks off at least one flag that might be set by the driver (NAND_NO_SUBPAGE_WRITE). This breaks the GPMI NAND driver and possibly others. Really, as long as driver writers exercise a small amount of care with NAND_* options, this mask is not necessary at all; it was only here to prevent certain options from accidentally being set by the driver. But the original thought turns out to be a bad idea occasionally. Thus, kill it. Note, this patch fixes some major gpmi-nand breakage. Signed-off-by: NBrian Norris <computersforpeace@gmail.com> Tested-by: NHuang Shijie <shijie8@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
由 Roland Stigge 提交于
This patch makes the MLC NAND driver independent of the single AMBA DMA engine driver by using the platform data provided dma_filter callback. Signed-off-by: NRoland Stigge <stigge@antcom.de> Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
由 Roland Stigge 提交于
This patch makes the SLC NAND driver independent of the single AMBA DMA engine driver by using the platform data provided dma_filter callback. Signed-off-by: NRoland Stigge <stigge@antcom.de> Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NDavid Woodhouse <David.Woodhouse@intel.com>
-
- 22 8月, 2012 2 次提交
-
-
由 Rafał Miłecki 提交于
Signed-off-by: NRafał Miłecki <zajec5@gmail.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Rafał Miłecki 提交于
Signed-off-by: NRafał Miłecki <zajec5@gmail.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
- 20 8月, 2012 1 次提交
-
-
由 Johannes Berg 提交于
In order to support using a different MAC address for the P2P Device address we must first have a P2P Device abstraction that can be assigned a MAC address. This abstraction will also be useful to support offloading P2P operations to the device, e.g. periodic listen for discoverability. Currently, the driver is responsible for assigning a MAC address to the P2P Device, but this could be changed by allowing a MAC address to be given to the NEW_INTERFACE command. As it has no associated netdev, a P2P Device can only be identified by its wdev identifier but the previous patches allowed using the wdev identifier in various APIs, e.g. remain-on-channel. Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
-
- 11 8月, 2012 2 次提交
-
-
由 Rafał Miłecki 提交于
We can not assume parallel flash is always present, there are boards with *serial* flash and probably some without flash at all. Define some bits by the way. Signed-off-by: NRafał Miłecki <zajec5@gmail.com> Reviewed-by: NHauke Mehrtens <hauke@hauke-m.de> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
由 Rafał Miłecki 提交于
Signed-off-by: NRafał Miłecki <zajec5@gmail.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
- 10 8月, 2012 1 次提交
-
-
由 Kees Cook 提交于
The higher ptrace restriction levels should be blocking even PTRACE_TRACEME requests. The comments in the LSM documentation are misleading about when the checks happen (the parent does not go through security_ptrace_access_check() on a PTRACE_TRACEME call). Signed-off-by: NKees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org # 3.5.x and later Signed-off-by: NJames Morris <james.l.morris@oracle.com>
-
- 09 8月, 2012 3 次提交
-
-
由 Arnd Bergmann 提交于
The EETI touchscreen asserts its IRQ line as soon as it has data in its internal buffers. The line is automatically deasserted once all data has been read via I2C. Hence, the driver has to monitor the GPIO line and cannot simply rely on the interrupt handler reception. In the current implementation of the driver, irq_to_gpio() is used to determine the GPIO number from the i2c_client's IRQ value. As irq_to_gpio() is not available on all platforms, this patch changes this and makes the driver ignore the passed in IRQ. Instead, a GPIO is added to the platform_data struct and gpio_to_irq is used to derive the IRQ from that GPIO. If this fails, bail out. The driver is only able to work in environments where the touchscreen GPIO can be mapped to an IRQ. Without this patch, building raumfeld_defconfig results in: drivers/input/touchscreen/eeti_ts.c: In function 'eeti_ts_irq_active': drivers/input/touchscreen/eeti_ts.c:65:2: error: implicit declaration of function 'irq_to_gpio' [-Werror=implicit-function-declaration] Signed-off-by: NDaniel Mack <zonque@gmail.com> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: stable@vger.kernel.org (v3.2+) Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Sven Neumann <s.neumann@raumfeld.com> Cc: linux-input@vger.kernel.org Cc: Haojian Zhuang <haojian.zhuang@gmail.com>
-
由 Arnd Bergmann 提交于
The irq_to_gpio function was removed from the pxa platform in linux-3.2, and this driver has been broken since. There is actually no in-tree user of this driver that adds this platform device, but the driver can and does get enabled on some platforms. Without this patch, building ezx_defconfig results in: drivers/mfd/ezx-pcap.c: In function 'pcap_isr_work': drivers/mfd/ezx-pcap.c:205:2: error: implicit declaration of function 'irq_to_gpio' [-Werror=implicit-function-declaration] Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NHaojian Zhuang <haojian.zhuang@gmail.com> Cc: stable@vger.kernel.org (v3.2+) Cc: Samuel Ortiz <sameo@linux.intel.com> Cc: Daniel Ribeiro <drwyrm@gmail.com>
-
由 Rafael J. Wysocki 提交于
Revert commit 45226e94 (NMI watchdog: fix for lockup detector breakage on resume) which breaks resume from system suspend on my SH7372 Mackerel board (by causing a NULL pointer dereference to happen) and is generally wrong, because it abuses the CPU hotplug functionality in a shamelessly blatant way. The original issue should be addressed through appropriate syscore resume callback instead. Signed-off-by: NRafael J. Wysocki <rjw@sisk.pl>
-
- 08 8月, 2012 1 次提交
-
-
由 Alex Williamson 提交于
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
-
- 07 8月, 2012 2 次提交
-
-
由 Oliver Hartkopp 提交于
The first idea of the CAN FD implementation started with a new struct canfd_frame to be used for both CAN FD frames and legacy CAN frames. The now mainlined implementation supports both CAN frame types simultaneously and distinguishes them only by their required sizes: CAN_MTU and CANFD_MTU. Only the struct canfd_frame contains a flags element which is needed for the additional CAN FD information. As CAN FD implicitly means that the 'Extened Data Length' mode is enabled the formerly defined CANFD_EDL bit became redundant and also confusing as an unset bit would be an error and would always need to be tested. This patch removes the obsolete CANFD_EDL bit and clarifies the documentation for the use of struct canfd_frame and the CAN FD relevant flags. Signed-off-by: NOliver Hartkopp <socketcan@hartkopp.net> Signed-off-by: NMarc Kleine-Budde <mkl@pengutronix.de>
-
由 Eric Dumazet 提交于
IPv6 needs a cookie in dst_check() call. We need to add rx_dst_cookie and provide a family independent sk_rx_dst_set(sk, skb) method to properly support IPv6 TCP early demux. Signed-off-by: NEric Dumazet <edumazet@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 04 8月, 2012 2 次提交
-
-
由 Artem Bityutskiy 提交于
The pdflush thread is long gone, so this patch removes references to pdflush from vfs comments. Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Artem Bityutskiy 提交于
Finally we can kill the 'sync_supers' kernel thread along with the '->write_super()' superblock operation because all the users are gone. Now every file-system is supposed to self-manage own superblock and its dirty state. The nice thing about killing this thread is that it improves power management. Indeed, 'sync_supers' is a source of monotonic system wake-ups - it woke up every 5 seconds no matter what - even if there were no dirty superblocks and even if there were no file-systems using this service (e.g., btrfs and journalled ext4 do not need it). So it was wasting power most of the time. And because the thread was in the core of the kernel, all systems had to have it. So I am quite happy to make it go away. Interestingly, this thread is a left-over from the pdflush kernel thread which was a self-forking kernel thread responsible for all the write-back in old Linux kernels. It was turned into per-block device BDI threads, and 'sync_supers' was a left-over. Thus, R.I.P, pdflush as well. Signed-off-by: NArtem Bityutskiy <artem.bityutskiy@linux.intel.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 03 8月, 2012 4 次提交
-
-
由 Joerg Roedel 提交于
The 'struct notifier_block' is not used in linux/iommu.h but not declared anywhere. Add a forward declaration for it. Reported-by: NThierry Reding <thierry.reding@avionic-design.de> Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
-
由 Thierry Reding 提交于
The linux/iommu.h header uses types defined in linux/types.h but doesn't include it. Signed-off-by: NThierry Reding <thierry.reding@avionic-design.de> Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
-
由 Thomas Renninger 提交于
Otherwise you could run into: WARN_ON in numa_register_memblks(), because node_possible_map is zero References: https://bugzilla.novell.com/show_bug.cgi?id=757888 On this machine (ProLiant ML570 G3) the SRAT table contains: - No processor affinities - One memory affinity structure (which is set disabled) CC: Per Jessen <per@opensuse.org> CC: Andi Kleen <andi@firstfloor.org> Signed-off-by: NThomas Renninger <trenn@suse.de> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Rafał Miłecki 提交于
Signed-off-by: NRafał Miłecki <zajec5@gmail.com> Signed-off-by: NJohn W. Linville <linville@tuxdriver.com>
-
- 02 8月, 2012 2 次提交
-
-
由 Ben Hutchings 提交于
A peer (or local user) may cause TCP to use a nominal MSS of as little as 88 (actual MSS of 76 with timestamps). Given that we have a sufficiently prodigious local sender and the peer ACKs quickly enough, it is nevertheless possible to grow the window for such a connection to the point that we will try to send just under 64K at once. This results in a single skb that expands to 861 segments. In some drivers with TSO support, such an skb will require hundreds of DMA descriptors; a substantial fraction of a TX ring or even more than a full ring. The TX queue selected for the skb may stall and trigger the TX watchdog repeatedly (since the problem skb will be retried after the TX reset). This particularly affects sfc, for which the issue is designated as CVE-2012-3412. Therefore: 1. Add the field net_device::gso_max_segs holding the device-specific limit. 2. In netif_skb_features(), if the number of segments is too high then mask out GSO features to force fall back to software GSO. Signed-off-by: NBen Hutchings <bhutchings@solarflare.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 J. Bruce Fields 提交于
In commit 3b6e2723 ("locks: prevent side-effects of locks_release_private before file_lock is initialized") we removed the last user of lm_release_private without removing the field itself. Signed-off-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 8月, 2012 10 次提交
-
-
由 Vivek Goyal 提交于
Add a new operation code (BLKPG_RESIZE_PARTITION) to the BLKPG ioctl that allows altering the size of an existing partition, even if it is currently in use. This patch converts hd_struct->nr_sects into sequence counter because One might extend a partition while IO is happening to it and update of nr_sects can be non-atomic on 32bit machines with 64bit sector_t. This can lead to issues like reading inconsistent size of a partition. Sequence counter have been used so that readers don't have to take bdev mutex lock as we call sector_in_part() very frequently. Now all the access to hd_struct->nr_sects should happen using sequence counter read/update helper functions part_nr_sects_read/part_nr_sects_write. There is one exception though, set_capacity()/get_capacity(). I think theoritically race should exist there too but this patch does not modify set_capacity()/get_capacity() due to sheer number of call sites and I am afraid that change might break something. I have left that as a TODO item. We can handle it later if need be. This patch does not introduce any new races as such w.r.t set_capacity()/get_capacity(). v2: Add CONFIG_LBDAF test to UP preempt case as suggested by Phillip. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NPhillip Susi <psusi@ubuntu.com> Signed-off-by: NJens Axboe <axboe@kernel.dk>
-
由 Guennadi Liakhovetski 提交于
The recent shdma driver split has mistakenly removed support for partial DMA transfer size calculation on forced termination. This patch restores it. Signed-off-by: NGuennadi Liakhovetski <g.liakhovetski@gmx.de> Acked-by: NVinod Koul <vinod.koul@linux.intel.com> Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
-
由 Andres Salomon 提交于
The 1.75-based OLPC EC driver already does this; let's do it for all EC drivers. This gives us nice suspend/resume hooks, amongst other things. We want to run the EC's suspend hooks later than other drivers (which may be setting wakeup masks or be running EC commands). We also want to run the EC's resume hooks earlier than other drivers (which may want to run EC commands). Signed-off-by: NAndres Salomon <dilinger@queued.net> Acked-by: NPaul Fox <pgf@laptop.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Andres Salomon 提交于
This provides a new API allows different OLPC architectures to override the EC driver. x86 and ARM OLPC machines use completely different EC backends. The olpc_ec_cmd is synchronous, and waits for the workqueue to send the command to the EC. Multiple callers can run olpc_ec_cmd() at once, and they will by serialized and sleep while only one executes on the EC at a time. We don't provide an unregister function, as that doesn't make sense within the context of OLPC machines - there's only ever 1 EC, it's critical to functionality, and it certainly not hotpluggable. Signed-off-by: NAndres Salomon <dilinger@queued.net> Acked-by: NPaul Fox <pgf@laptop.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Andres Salomon 提交于
The OLPC EC driver has outgrown arch/x86/platform/. It's time to both share common code amongst different architectures, as well as move it out of arch/x86/. The XO-1.75 is ARM-based, and the EC driver shares a lot of code with the x86 code. Signed-off-by: NAndres Salomon <dilinger@queued.net> Acked-by: NPaul Fox <pgf@laptop.org> Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Mel Gorman 提交于
If a process creates a large hugetlbfs mapping that is eligible for page table sharing and forks heavily with children some of whom fault and others which destroy the mapping then it is possible for page tables to get corrupted. Some teardowns of the mapping encounter a "bad pmd" and output a message to the kernel log. The final teardown will trigger a BUG_ON in mm/filemap.c. This was reproduced in 3.4 but is known to have existed for a long time and goes back at least as far as 2.6.37. It was probably was introduced in 2.6.20 by [39dde65c: shared page table for hugetlb page]. The messages look like this; [ ..........] Lots of bad pmd messages followed by this [ 127.164256] mm/memory.c:391: bad pmd ffff880412e04fe8(80000003de4000e7). [ 127.164257] mm/memory.c:391: bad pmd ffff880412e04ff0(80000003de6000e7). [ 127.164258] mm/memory.c:391: bad pmd ffff880412e04ff8(80000003de0000e7). [ 127.186778] ------------[ cut here ]------------ [ 127.186781] kernel BUG at mm/filemap.c:134! [ 127.186782] invalid opcode: 0000 [#1] SMP [ 127.186783] CPU 7 [ 127.186784] Modules linked in: af_packet cpufreq_conservative cpufreq_userspace cpufreq_powersave acpi_cpufreq mperf ext3 jbd dm_mod coretemp crc32c_intel usb_storage ghash_clmulni_intel aesni_intel i2c_i801 r8169 mii uas sr_mod cdrom sg iTCO_wdt iTCO_vendor_support shpchp serio_raw cryptd aes_x86_64 e1000e pci_hotplug dcdbas aes_generic container microcode ext4 mbcache jbd2 crc16 sd_mod crc_t10dif i915 drm_kms_helper drm i2c_algo_bit ehci_hcd ahci libahci usbcore rtc_cmos usb_common button i2c_core intel_agp video intel_gtt fan processor thermal thermal_sys hwmon ata_generic pata_atiixp libata scsi_mod [ 127.186801] [ 127.186802] Pid: 9017, comm: hugetlbfs-test Not tainted 3.4.0-autobuild #53 Dell Inc. OptiPlex 990/06D7TR [ 127.186804] RIP: 0010:[<ffffffff810ed6ce>] [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160 [ 127.186809] RSP: 0000:ffff8804144b5c08 EFLAGS: 00010002 [ 127.186810] RAX: 0000000000000001 RBX: ffffea000a5c9000 RCX: 00000000ffffffc0 [ 127.186811] RDX: 0000000000000000 RSI: 0000000000000009 RDI: ffff88042dfdad00 [ 127.186812] RBP: ffff8804144b5c18 R08: 0000000000000009 R09: 0000000000000003 [ 127.186813] R10: 0000000000000000 R11: 000000000000002d R12: ffff880412ff83d8 [ 127.186814] R13: ffff880412ff83d8 R14: 0000000000000000 R15: ffff880412ff83d8 [ 127.186815] FS: 00007fe18ed2c700(0000) GS:ffff88042dce0000(0000) knlGS:0000000000000000 [ 127.186816] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b [ 127.186817] CR2: 00007fe340000503 CR3: 0000000417a14000 CR4: 00000000000407e0 [ 127.186818] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 127.186819] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 [ 127.186820] Process hugetlbfs-test (pid: 9017, threadinfo ffff8804144b4000, task ffff880417f803c0) [ 127.186821] Stack: [ 127.186822] ffffea000a5c9000 0000000000000000 ffff8804144b5c48 ffffffff810ed83b [ 127.186824] ffff8804144b5c48 000000000000138a 0000000000001387 ffff8804144b5c98 [ 127.186825] ffff8804144b5d48 ffffffff811bc925 ffff8804144b5cb8 0000000000000000 [ 127.186827] Call Trace: [ 127.186829] [<ffffffff810ed83b>] delete_from_page_cache+0x3b/0x80 [ 127.186832] [<ffffffff811bc925>] truncate_hugepages+0x115/0x220 [ 127.186834] [<ffffffff811bca43>] hugetlbfs_evict_inode+0x13/0x30 [ 127.186837] [<ffffffff811655c7>] evict+0xa7/0x1b0 [ 127.186839] [<ffffffff811657a3>] iput_final+0xd3/0x1f0 [ 127.186840] [<ffffffff811658f9>] iput+0x39/0x50 [ 127.186842] [<ffffffff81162708>] d_kill+0xf8/0x130 [ 127.186843] [<ffffffff81162812>] dput+0xd2/0x1a0 [ 127.186845] [<ffffffff8114e2d0>] __fput+0x170/0x230 [ 127.186848] [<ffffffff81236e0e>] ? rb_erase+0xce/0x150 [ 127.186849] [<ffffffff8114e3ad>] fput+0x1d/0x30 [ 127.186851] [<ffffffff81117db7>] remove_vma+0x37/0x80 [ 127.186853] [<ffffffff81119182>] do_munmap+0x2d2/0x360 [ 127.186855] [<ffffffff811cc639>] sys_shmdt+0xc9/0x170 [ 127.186857] [<ffffffff81410a39>] system_call_fastpath+0x16/0x1b [ 127.186858] Code: 0f 1f 44 00 00 48 8b 43 08 48 8b 00 48 8b 40 28 8b b0 40 03 00 00 85 f6 0f 88 df fe ff ff 48 89 df e8 e7 cb 05 00 e9 d2 fe ff ff <0f> 0b 55 83 e2 fd 48 89 e5 48 83 ec 30 48 89 5d d8 4c 89 65 e0 [ 127.186868] RIP [<ffffffff810ed6ce>] __delete_from_page_cache+0x15e/0x160 [ 127.186870] RSP <ffff8804144b5c08> [ 127.186871] ---[ end trace 7cbac5d1db69f426 ]--- The bug is a race and not always easy to reproduce. To reproduce it I was doing the following on a single socket I7-based machine with 16G of RAM. $ hugeadm --pool-pages-max DEFAULT:13G $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmmax $ echo $((18*1048576*1024)) > /proc/sys/kernel/shmall $ for i in `seq 1 9000`; do ./hugetlbfs-test; done On my particular machine, it usually triggers within 10 minutes but enabling debug options can change the timing such that it never hits. Once the bug is triggered, the machine is in trouble and needs to be rebooted. The machine will respond but processes accessing proc like "ps aux" will hang due to the BUG_ON. shutdown will also hang and needs a hard reset or a sysrq-b. The basic problem is a race between page table sharing and teardown. For the most part page table sharing depends on i_mmap_mutex. In some cases, it is also taking the mm->page_table_lock for the PTE updates but with shared page tables, it is the i_mmap_mutex that is more important. Unfortunately it appears to be also insufficient. Consider the following situation Process A Process B --------- --------- hugetlb_fault shmdt LockWrite(mmap_sem) do_munmap unmap_region unmap_vmas unmap_single_vma unmap_hugepage_range Lock(i_mmap_mutex) Lock(mm->page_table_lock) huge_pmd_unshare/unmap tables <--- (1) Unlock(mm->page_table_lock) Unlock(i_mmap_mutex) huge_pte_alloc ... Lock(i_mmap_mutex) ... vma_prio_walk, find svma, spte ... Lock(mm->page_table_lock) ... share spte ... Unlock(mm->page_table_lock) ... Unlock(i_mmap_mutex) ... hugetlb_no_page <--- (2) free_pgtables unlink_file_vma hugetlb_free_pgd_range remove_vma_list In this scenario, it is possible for Process A to share page tables with Process B that is trying to tear them down. The i_mmap_mutex on its own does not prevent Process A walking Process B's page tables. At (1) above, the page tables are not shared yet so it unmaps the PMDs. Process A sets up page table sharing and at (2) faults a new entry. Process B then trips up on it in free_pgtables. This patch fixes the problem by adding a new function __unmap_hugepage_range_final that is only called when the VMA is about to be destroyed. This function clears VM_MAYSHARE during unmap_hugepage_range() under the i_mmap_mutex. This makes the VMA ineligible for sharing and avoids the race. Superficially this looks like it would then be vunerable to truncate and madvise issues but hugetlbfs has its own truncate handlers so does not use unmap_mapping_range() and does not support madvise(DONTNEED). This should be treated as a -stable candidate if it is merged. Test program is as follows. The test case was mostly written by Michal Hocko with a few minor changes to reproduce this bug. ==== CUT HERE ==== static size_t huge_page_size = (2UL << 20); static size_t nr_huge_page_A = 512; static size_t nr_huge_page_B = 5632; unsigned int get_random(unsigned int max) { struct timeval tv; gettimeofday(&tv, NULL); srandom(tv.tv_usec); return random() % max; } static void play(void *addr, size_t size) { unsigned char *start = addr, *end = start + size, *a; start += get_random(size/2); /* we could itterate on huge pages but let's give it more time. */ for (a = start; a < end; a += 4096) *a = 0; } int main(int argc, char **argv) { key_t key = IPC_PRIVATE; size_t sizeA = nr_huge_page_A * huge_page_size; size_t sizeB = nr_huge_page_B * huge_page_size; int shmidA, shmidB; void *addrA = NULL, *addrB = NULL; int nr_children = 300, n = 0; if ((shmidA = shmget(key, sizeA, IPC_CREAT|SHM_HUGETLB|0660)) == -1) { perror("shmget:"); return 1; } if ((addrA = shmat(shmidA, addrA, SHM_R|SHM_W)) == (void *)-1UL) { perror("shmat"); return 1; } if ((shmidB = shmget(key, sizeB, IPC_CREAT|SHM_HUGETLB|0660)) == -1) { perror("shmget:"); return 1; } if ((addrB = shmat(shmidB, addrB, SHM_R|SHM_W)) == (void *)-1UL) { perror("shmat"); return 1; } fork_child: switch(fork()) { case 0: switch (n%3) { case 0: play(addrA, sizeA); break; case 1: play(addrB, sizeB); break; case 2: break; } break; case -1: perror("fork:"); break; default: if (++n < nr_children) goto fork_child; play(addrA, sizeA); break; } shmdt(addrA); shmdt(addrB); do { wait(NULL); } while (--n > 0); shmctl(shmidA, IPC_RMID, NULL); shmctl(shmidB, IPC_RMID, NULL); return 0; } [akpm@linux-foundation.org: name the declaration's args, fix CONFIG_HUGETLBFS=n build] Signed-off-by: NHugh Dickins <hughd@google.com> Reviewed-by: NMichal Hocko <mhocko@suse.cz> Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Minchan Kim 提交于
pg_data_t is zeroed before reaching free_area_init_core(), so remove the now unnecessary initializations. Signed-off-by: NMinchan Kim <minchan@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
Compaction (and page migration in general) can currently be hindered through pages being owned by memory cgroups that are at their limits and unreclaimable. The reason is that the replacement page is being charged against the limit while the page being replaced is also still charged. But this seems unnecessary, given that only one of the two pages will still be in use after migration finishes. This patch changes the memcg migration sequence so that the replacement page is not charged. Whatever page is still in use after successful or failed migration gets to keep the charge of the page that was going to be replaced. The replacement page will still show up temporarily in the rss/cache statistics, this can be fixed in a later patch as it's less urgent. Reported-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Acked-by: NMichal Hocko <mhocko@suse.cz> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Wanpeng Li <liwp.linux@gmail.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol ->connect() method. PF_MEMALLOC should allow the allocation of struct socket and related objects and the early (re)setting of SOCK_MEMALLOC should allow us to receive the packets required for the TCP connection buildup. [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases] [dfeng@redhat.com: Fix handling of multiple swap files] [a.p.zijlstra@chello.nl: Original patch] Signed-off-by: NMel Gorman <mgorman@suse.de> Acked-by: NRik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
The patch "mm: add support for a filesystem to activate swap files and use direct_IO for writing swap pages" added support for using direct_IO to write swap pages but it is insufficient for highmem pages. To support highmem pages, this patch kmaps() the page before calling the direct_IO() handler. As direct_IO deals with virtual addresses an additional helper is necessary for get_kernel_pages() to lookup the struct page for a kmap virtual address. Signed-off-by: NMel Gorman <mgorman@suse.de> Acked-by: NRik van Riel <riel@redhat.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric B Munson <emunson@mgebm.net> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <jmorris@namei.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Christie <michaelc@cs.wisc.edu> Cc: Neil Brown <neilb@suse.de> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: Xiaotian Feng <dfeng@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-