- 06 10月, 2010 2 次提交
-
-
由 Evgeny Kuznetsov 提交于
The "flags" member of "struct wait_queue_t" is used in several places in the kernel code without beeing initialized by init_wait(). "flags" is used in bitwise operations. If "flags" not initialized then unexpected behaviour may take place. Incorrect flags might used later in code. Added initialization of "wait_queue_t.flags" with zero value into "init_wait". Signed-off-by: NEvgeny Kuznetsov <EXT-Eugeny.Kuznetsov@nokia.com> [ The bit we care about does end up being initialized by both prepare_to_wait() and add_to_wait_queue(), so this doesn't seem to cause actual bugs, but is definitely the right thing to do -Linus ] Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Linus Torvalds 提交于
With all the recent module loading cleanups, we've minimized the code that sits under module_mutex, fixing various deadlocks and making it possible to do most of the module loading in parallel. However, that whole conversion totally missed the rather obscure code that adds a new module to the list for BUG() handling. That code was doubly obscure because (a) the code itself lives in lib/bugs.c (for dubious reasons) and (b) it gets called from the architecture-specific "module_finalize()" rather than from generic code. Calling it from arch-specific code makes no sense what-so-ever to begin with, and is now actively wrong since that code isn't protected by the module loading lock any more. So this commit moves the "module_bug_{finalize,cleanup}()" calls away from the arch-specific code, and into the generic code - and in the process protects it with the module_mutex so that the list operations are now safe. Future fixups: - move the module list handling code into kernel/module.c where it belongs. - get rid of 'module_bug_list' and just use the regular list of modules (called 'modules' - imagine that) that we already create and maintain for other reasons. Reported-and-tested-by: NThomas Gleixner <tglx@linutronix.de> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Adrian Bunk <bunk@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 10月, 2010 3 次提交
-
-
由 Chris Wilson 提交于
In order to be fully threadsafe we need to check that the drm_gem_object refcount is still 0 after acquiring the mutex in order to call the free function. Otherwise, we may encounter scenarios like: Thread A: Thread B: drm_gem_close unreference_unlocked kref_put mutex_lock ... i915_gem_evict ... kref_get -> BUG ... i915_gem_unbind ... kref_put ... i915_gem_object_free ... mutex_unlock mutex_lock i915_gem_object_free -> BUG i915_gem_object_unbind kfree mutex_unlock Note that no driver is currently using the free_unlocked vfunc and it is scheduled for removal, hasten that process. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=30454Reported-and-Tested-by: NMagnus Kessler <Magnus.Kessler@gmx.net> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Suresh Siddha 提交于
Avoid TLB flush IPIs for the cores in deeper c-states by voluntary leave_mm() before entering into that state. CPUs tend to flush TLB in those c-states anyways. acpi_idle does this with C3-type states, but it was not caried over when intel_idle was introduced. intel_idle can apply it to C-states in addition to those that ACPI might export as C3... Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Dave Airlie 提交于
There were lots of places being inconsistent since handle count looked like a kref but it really wasn't. Fix this my just making handle count an atomic on the object, and have it increase the normal object kref. Now i915/radeon/nouveau drivers can drop the normal reference on userspace object creation, and have the handle hold it. This patch fixes a memory leak or corruption on unload, because the driver had no way of knowing if a handle had been actually added for this object, and the fbcon object needed to know this to clean itself up properly. Reviewed-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NDave Airlie <airlied@redhat.com>
-
- 29 9月, 2010 1 次提交
-
-
由 Lucas De Marchi 提交于
Signed-off-by: NLen Brown <len.brown@intel.com>
-
- 28 9月, 2010 3 次提交
-
-
由 David S. Miller 提交于
Fixes kernel bugzilla #16603 tcp_sendmsg() truncates iov_len to an 'int' which a 4GB write to write zero bytes, for example. There is also the problem higher up of how verify_iovec() works. It wants to prevent the total length from looking like an error return value. However it does this using 'int', but syscalls return 'long' (and thus signed 64-bit on 64-bit machines). So it could trigger false-positives on 64-bit as written. So fix it to use 'long'. Reported-by: NOlaf Bonorden <bono@onlinehome.de> Reported-by: NDaniel Büse <dbuese@gmx.de> Reported-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Chris Wilson 提交于
Hook the GEM vm open/close ops into the generic drm vm open/close so that the private vma entries are created and destroy appropriately. Fixes the leak of the drm_vma_entries during the lifetime of the filp. Reported-by: NMatt Mackall <mpm@selenic.com> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Acked-by: NJesse Barnes <jbarnes@virtuousgeek.org> Cc: stable@kernel.org Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Ulrich Weber 提交于
as done in ip_route_connect() Signed-off-by: NUlrich Weber <uweber@astaro.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 27 9月, 2010 3 次提交
-
-
由 Neil Horman 提交于
Clean up a missing exit path in the ipv6 module init routines. In addrconf_init we call ipv6_addr_label_init which calls register_pernet_subsys for the ipv6_addr_label_ops structure. But if module loading fails, or if the ipv6 module is removed, there is no corresponding unregister_pernet_subsys call, which leaves a now-bogus address on the pernet_list, leading to oopses in subsequent registrations. This patch cleans up both the failed load path and the unload path. Tested by myself with good results. Signed-off-by: NNeil Horman <nhorman@tuxdriver.com> include/net/addrconf.h | 1 + net/ipv6/addrconf.c | 11 ++++++++--- net/ipv6/addrlabel.c | 5 +++++ 3 files changed, 14 insertions(+), 3 deletions(-) Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tom Herbert 提交于
Reset queue mapping when an skb is reentering the stack via a tunnel. On second pass, the queue mapping from the original device is no longer valid. Signed-off-by: NTom Herbert <therbert@google.com> Acked-by: NEric Dumazet <eric.dumazet@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Dave Airlie 提交于
fixes https://bugzilla.kernel.org/show_bug.cgi?id=19012 cc: stable@kernel.org Signed-off-by: NDave Airlie <airlied@redhat.com>
-
- 23 9月, 2010 9 次提交
-
-
由 Eric Dumazet 提交于
rcu_dereference_bh() doesnt know yet about hard irq being disabled, so lockdep can trigger in netpoll_rx() after commit f0f9deae (netpoll: Disable IRQ around RCU dereference in netpoll_rx) Reported-by: NMiles Lane <miles.lane@gmail.com> Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com> Tested-by: NMiles Lane <miles.lane@gmail.com> Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
-
由 Joerg Roedel 提交于
This patch adds a workaround for an IOMMU BIOS problem to the AMD IOMMU driver. The result of the bug is that the IOMMU does not execute commands anymore when the system comes out of the S3 state resulting in system failure. The bug in the BIOS is that is does not restore certain hardware specific registers correctly. This workaround reads out the contents of these registers at boot time and restores them on resume from S3. The workaround is limited to the specific IOMMU chipset where this problem occurs. Cc: stable@kernel.org Signed-off-by: NJoerg Roedel <joerg.roedel@amd.com>
-
由 FUJITA Tomonori 提交于
This fixes the regression caused by the commit 6fee48cd ("dma-mapping: arm: use generic pci_set_dma_mask and pci_set_consistent_dma_mask"). ARM needs to clip the dma coherent mask for dmabounce devices. This restores the old trick. Note that strictly speaking, the DMA API doesn't allow architectures to do such but I'm not sure it's worth adding the new API to set the dma mask that allows architectures to clip it. Reported-by: NKrzysztof Halasa <khc@pm.waw.pl> Signed-off-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mathieu Lacage 提交于
Add a missing inline keyword for static function in linux/dmaengine.h to avoid duplicate symbol definitions. Signed-off-by: NMathieu Lacage <mathieu.lacage@sophia.inria.fr> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Jason Baron 提交于
Convert the 'dynamic debug' infrastructure to use jump labels. Signed-off-by: NJason Baron <jbaron@redhat.com> LKML-Reference: <b77627358cea3e27d7be4386f45f66219afb8452.1284733808.git.jbaron@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Jason Baron 提交于
Make use of the jump label infrastructure for tracepoints. Signed-off-by: NJason Baron <jbaron@redhat.com> LKML-Reference: <a9ba2056e2c9cf332c3c300b577463ce66ff23a8.1284733808.git.jbaron@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Jason Baron 提交于
Add a jump_label_text_reserved(void *start, void *end), so that other pieces of code that want to modify kernel text, can first verify that jump label has not reserved the instruction. Acked-by: NMasami Hiramatsu <mhiramat@redhat.com> Signed-off-by: NJason Baron <jbaron@redhat.com> LKML-Reference: <06236663a3a7b1c1f13576bb9eccb6d9c17b7bfe.1284733808.git.jbaron@redhat.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Jason Baron 提交于
base patch to implement 'jump labeling'. Based on a new 'asm goto' inline assembly gcc mechanism, we can now branch to labels from an 'asm goto' statment. This allows us to create a 'no-op' fastpath, which can subsequently be patched with a jump to the slowpath code. This is useful for code which might be rarely used, but which we'd like to be able to call, if needed. Tracepoints are the current usecase that these are being implemented for. Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NJason Baron <jbaron@redhat.com> LKML-Reference: <ee8b3595967989fdaf84e698dc7447d315ce972a.1284733808.git.jbaron@redhat.com> [ cleaned up some formating ] Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Ollie Wild 提交于
This patch reduces namespace pollution by moving the "struct net" declaration out of the userspace-facing portion of linux/netlink.h. It has no impact on the kernel. (This came up because we have several C++ applications which use "net" as a namespace name.) Signed-off-by: NOllie Wild <aaw@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 22 9月, 2010 1 次提交
-
-
由 Sage Weil 提交于
The lock structs are currently protected by the BKL, but are accessed by code in fs/locks.c and misc file system and DLM code. These stubs will allow all users to switch to the new interface before the implementation is changed to a spinlock. Acked-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 9月, 2010 2 次提交
-
-
由 Peter Zijlstra 提交于
These are similar to {get,put}_cpu_var() except for dynamically allocated per-cpu memory. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Acked-by: NTejun Heo <tj@kernel.org> LKML-Reference: <20100917093009.252867712@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Thomas Egerer 提交于
The family parameter xfrm_state_find is used to find a state matching a certain policy. This value is set to the template's family (encap_family) right before xfrm_state_find is called. The family parameter is however also used to construct a temporary state in xfrm_state_find itself which is wrong for inter-family scenarios because it produces a selector for the wrong family. Since this selector is included in the xfrm_user_acquire structure, user space programs misinterpret IPv6 addresses as IPv4 and vice versa. This patch splits up the original init_tempsel function into a part that initializes the selector respectively the props and id of the temporary state, to allow for differing ip address families whithin the state. Signed-off-by: NThomas Egerer <thomas.egerer@secunet.com> Signed-off-by: NSteffen Klassert <steffen.klassert@secunet.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 18 9月, 2010 1 次提交
-
-
由 Herbert Xu 提交于
We cannot use rcu_dereference_bh safely in netpoll_rx as we may be called with IRQs disabled. We could however simply disable IRQs as that too causes BH to be disabled and is safe in either case. Thanks to John Linville for discovering this bug and providing a patch. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 17 9月, 2010 3 次提交
-
-
由 Peter Zijlstra 提交于
Revert the timer per cpu-context timers because of unfortunate nohz interaction. Fixing that would have been somewhat ugly, so go back to driving things from the regular tick. Provide a jiffies interval feature for people who want slower rotations. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20100917093009.519845633@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Peter Zijlstra 提交于
Aside from allowing software events into a !software group, allow adding !software events to pure software groups. Once we've moved the software group and attached the first !software event, the group will no longer be a pure software group and hence no longer be eligible for movement, at which point the straight ctx comparison is correct again. Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Robert Richter <robert.richter@amd.com> Cc: Paul Mackerras <paulus@samba.org> LKML-Reference: <20100917093009.410784731@chello.nl> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 Jean Pihet 提交于
This patch adds new generic events for dynamic power management tracing: - clock events class: used for clock enable/disable and for clock rate change, - power_domain events class: used for power domains transitions. The OMAP architecture will be using the new events for PM debugging, however the new events are made generic enough to be used by all platforms. Signed-off-by: NJean Pihet <j-pihet@ti.com> Acked-by: NThomas Renninger <trenn@suse.de> Cc: discuss@lesswatts.org Cc: linux-pm@lists.linux-foundation.org Cc: Thomas Renninger <trenn@suse.de> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Len Brown <len.brown@intel.com> Cc: Kevin Hilman <khilman@deeprootsystems.com> LKML-Reference: <AANLkTinUmbSUUuxUzc8++pcb9gd1CZFdyTQFrveTBXyV@mail.gmail.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
- 16 9月, 2010 1 次提交
-
-
由 Alexey Kuznetsov 提交于
If peer uses tiny MSS (say, 75 bytes) and similarly tiny advertised window, the SWS logic will packetize to half the MSS unnecessarily. This causes problems with some embedded devices. However for large MSS devices we do want to half-MSS packetize otherwise we never get enough packets into the pipe for things like fast retransmit and recovery to work. Be careful also to handle the case where MSS > window, otherwise we'll never send until the probe timer. Reported-by: Nツ Leandro Melo de Sales <leandroal@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 15 9月, 2010 2 次提交
-
-
由 Matt Helsley 提交于
The kernel perf event creation path shouldn't use find_task_by_vpid() because a vpid exists in a specific namespace. find_task_by_vpid() uses current's pid namespace which isn't always the correct namespace to use for the vpid in all the places perf_event_create_kernel_counter() (and thus find_get_context()) is called. The goal is to clean up pid namespace handling and prevent bugs like: https://bugzilla.kernel.org/show_bug.cgi?id=17281 Instead of using pids switch find_get_context() to use task struct pointers directly. The syscall is responsible for resolving the pid to a task struct. This moves the pid namespace resolution into the syscall much like every other syscall that takes pid parameters. Signed-off-by: NMatt Helsley <matthltc@us.ibm.com> Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robin Green <greenrd@greenrd.org> Cc: Prasad <prasad@linux.vnet.ibm.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Will Deacon <will.deacon@arm.com> Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> LKML-Reference: <a134e5e392ab0204961fd1a62c84a222bf5874a9.1284407763.git.matthltc@us.ibm.com> Signed-off-by: NIngo Molnar <mingo@elte.hu>
-
由 H. Peter Anvin 提交于
compat_alloc_user_space() expects the caller to independently call access_ok() to verify the returned area. A missing call could introduce problems on some architectures. This patch incorporates the access_ok() check into compat_alloc_user_space() and also adds a sanity check on the length. The existing compat_alloc_user_space() implementations are renamed arch_compat_alloc_user_space() and are used as part of the implementation of the new global function. This patch assumes NULL will cause __get_user()/__put_user() to either fail or access userspace on all architectures. This should be followed by checking the return value of compat_access_user_space() for NULL in the callers, at which time the access_ok() in the callers can also be removed. Reported-by: NBen Hawkes <hawkes@sota.gen.nz> Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com> Acked-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: NChris Metcalf <cmetcalf@tilera.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Acked-by: NIngo Molnar <mingo@elte.hu> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NTony Luck <tony.luck@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: James Bottomley <jejb@parisc-linux.org> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: <stable@kernel.org>
-
- 14 9月, 2010 1 次提交
-
-
由 Chris Wilson 提交于
v2: Julien Cristau pointed out that @nondestructive results in double-negatives and confusion when trying to interpret the parameter, so use @force instead. Much easier to type as well. ;-) And fix the miscompilation of vmgfx reported by Sedat Dilek. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Signed-off-by: NDave Airlie <airlied@redhat.com>
-
- 13 9月, 2010 3 次提交
-
-
由 Chris Wilson 提交于
Destructive load-detection is very expensive and due to failings elsewhere can trigger system wide stalls of up to 600ms. A simple first step to correcting this is not to invoke such an expensive and destructive load-detection operation automatically. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=29536 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=16265Reported-by: NBruno Prémont <bonbons@linux-vserver.org> Tested-by: NSitsofe Wheeler <sitsofe@yahoo.com> Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Cc: stable@kernel.org Signed-off-by: NDave Airlie <airlied@redhat.com>
-
由 Tejun Heo 提交于
Update copyright notice and add Documentation/workqueue.txt. Randy Dunlap, Dave Chinner: misc fixes. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-By: NFlorian Mickler <florian@mickler.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Randy Dunlap <randy.dunlap@oracle.com> Cc: Dave Chinner <david@fromorbit.com>
-
由 Trond Myklebust 提交于
There is a race between rpc_info_open and rpc_release_client() in that nothing stops a process from opening the file after the clnt->cl_kref goes to zero. Fix this by using atomic_inc_unless_zero()... Reported-by: NJ. Bruce Fields <bfields@redhat.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org
-
- 10 9月, 2010 5 次提交
-
-
由 Gwendal Grignou 提交于
Keep track of the link on the which the current request is in progress. It allows support of links behind port multiplier. Not all libata-sff is PMP compliant. Code for native BMDMA controller does not take in accound PMP. Tested on Marvell 7042 and Sil7526. Signed-off-by: NGwendal Grignou <gwendal@google.com> Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
-
由 Tejun Heo 提交于
For some mysterious reason, certain hardware reacts badly to usual EH actions while the system is going for suspend. As the devices won't be needed until the system is resumed, ask EH to skip usual autopsy and recovery and proceed directly to suspend. Signed-off-by: NTejun Heo <tj@kernel.org> Tested-by: NStephan Diestelhorst <stephan.diestelhorst@amd.com> Cc: stable@kernel.org Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
-
由 Christoph Lameter 提交于
mm: page allocator: calculate a better estimate of NR_FREE_PAGES when memory is low and kswapd is awake Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is cheaper than scanning a number of lists. To avoid synchronization overhead, counter deltas are maintained on a per-cpu basis and drained both periodically and when the delta is above a threshold. On large CPU systems, the difference between the estimated and real value of NR_FREE_PAGES can be very high. If NR_FREE_PAGES is much higher than number of real free page in buddy, the VM can allocate pages below min watermark, at worst reducing the real number of pages to zero. Even if the OOM killer kills some victim for freeing memory, it may not free memory if the exit path requires a new page resulting in livelock. This patch introduces a zone_page_state_snapshot() function (courtesy of Christoph) that takes a slightly more accurate view of an arbitrary vmstat counter. It is used to read NR_FREE_PAGES while kswapd is awake to avoid the watermark being accidentally broken. The estimate is not perfect and may result in cache line bounces but is expected to be lighter than the IPI calls necessary to continually drain the per-cpu counters while kswapd is awake. Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NMel Gorman <mel@csn.ul.ie> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Tests with recent firmware on Intel X25-M 80GB and OCZ Vertex 60GB SSDs show a shift since I last tested in December: in part because of firmware updates, in part because of the necessary move from barriers to awaiting completion at the block layer. While discard at swapon still shows as slightly beneficial on both, discarding 1MB swap cluster when allocating is now disadvanteous: adds 25% overhead on Intel, adds 230% on OCZ (YMMV). Surrender: discard as presently implemented is more hindrance than help for swap; but might prove useful on other devices, or with improvements. So continue to do the discard at swapon, but make discard while swapping conditional on a SWAP_FLAG_DISCARD to sys_swapon() (which has been using only the lower 16 bits of int flags). We can add a --discard or -d to swapon(8), and a "discard" to swap in /etc/fstab: matching the mount option for btrfs, ext4, fat, gfs2, nilfs2. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Nigel Cunningham <nigel@tuxonice.net> Cc: Tejun Heo <tj@kernel.org> Cc: Jens Axboe <jaxboe@fusionio.com> Cc: James Bottomley <James.Bottomley@hansenpartnership.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Hugh Dickins 提交于
Please revert 2.6.36-rc commit d2997b10 "hibernation: freeze swap at hibernation". It complicated matters by adding a second swap allocation path, just for hibernation; without in any way fixing the issue that it was intended to address - page reclaim after fixing the hibernation image might free swap from a page already imaged as swapcache, letting its swap be reallocated to store a different page of the image: resulting in data corruption if the imaged page were freed as clean then swapped back in. Pages freed to si->swap_map were still in danger of being reallocated by the alternative allocation path. I guess it inadvertently fixed slow SSD swap allocation for hibernation, as reported by Nigel Cunningham: by missing out the discards that occur on the usual swap allocation path; but that was unintentional, and needs a separate fix. Signed-off-by: NHugh Dickins <hughd@google.com> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ondrej Zary <linux@rainbow-software.org> Cc: Andrea Gelmini <andrea.gelmini@gmail.com> Cc: Balbir Singh <balbir@in.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Nigel Cunningham <nigel@tuxonice.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-