- 19 10月, 2010 1 次提交
-
-
由 Yasuaki Ishimatsu 提交于
/proc/diskstats would display a strange output as follows. $ cat /proc/diskstats |grep sda 8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089 8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691 ~~~~~~~~~~ 8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390 8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92 8 4 sda4 4 0 8 0 0 0 0 0 0 0 0 8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137 Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE. The detailed root cause is as follows. Assuming that there are two partition, sda1 and sda2. 1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight is 0 and sda2's one is 1. | hd_struct->in_flight --------------------------- sda1 | 0 sda2 | 1 --------------------------- 2. A bio belongs to sda1 is issued and is merged into the request mentioned on step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed from sda2 region to sda1 region. However the two partition's hd_struct->in_flight are not changed. | hd_struct->in_flight --------------------------- sda1 | 0 sda2 | 1 --------------------------- 3. The request is finished and blk_account_io_done() is called. In this case, sda2's hd_struct->in_flight, not a sda1's one, is decremented. | hd_struct->in_flight --------------------------- sda1 | -1 sda2 | 1 --------------------------- The patch fixes the problem by caching the partition lookup inside the request structure, hence making sure that the increment and decrement will always happen on the same partition struct. This also speeds up IO with accounting enabled, since it cuts down on the number of lookups we have to do. When reloading partition tables, quiesce IO to ensure that no request references to the partition struct exists. When it is safe to free the partition table, the IO for that device is restarted again. Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: stable@kernel.org Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 15 10月, 2010 1 次提交
-
-
由 Martin K. Petersen 提交于
Previously we tracked whether the integrity metadata had been remapped using a request flag. This was fine for low-level retries. However, if an I/O was redriven by upper layers we would end up remapping again, causing the retry to fail. Deprecate the REQ_INTEGRITY flag and introduce BIO_MAPPED_INTEGRITY which enables filesystems to notify lower layers that the bio in question has already been remapped. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 14 10月, 2010 1 次提交
-
-
由 Martin K. Petersen 提交于
Physical block size was declared unsigned int to accomodate the maximum size reported by READ CAPACITY(16). Make sure we use the right type in the related functions. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Acked-by: NMike Snitzer <snitzer@redhat.com> Cc: stable@kernel.org Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 25 9月, 2010 1 次提交
-
-
由 Mark Lord 提交于
Ensure that 'sysctl_hung_task_timeout_secs' is defined even when CONFIG_DETECT_HUNG_TASK is not set. This way we can safely reference it without need for ifdefs in the code elsewhere. eg. in block/blk-exec.c Signed-off-by: NMark Lord <mlord@pobox.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 16 9月, 2010 1 次提交
-
-
由 Vivek Goyal 提交于
o Actual implementation of throttling policy in block layer. Currently it implements READ and WRITE bytes per second throttling logic. IOPS throttling comes in later patches. Signed-off-by: NVivek Goyal <vgoyal@redhat.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 15 9月, 2010 2 次提交
-
-
由 Will Drewry 提交于
I'm reposting this patch series as v4 since there have been no additional comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches are against Linus's tree: 2bfc96a1 (2.6.36-rc3). Would this patchset be suitable for inclusion in an mm branch? This changes adds a partition_meta_info struct which itself contains a union of structures that provide partition table specific metadata. This change leaves the union empty. The subsequent patch includes an implementation for CONFIG_EFI_PARTITION-based metadata. Signed-off-by: NWill Drewry <wad@chromium.org> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
由 Namhyung Kim 提交于
Change type of 2nd parameter of blk_rq_aligned() into unsigned long and remove unnecessary casting. Now we can call it with 'uaddr' instead of 'ubuf' in __blk_rq_map_user() so that it can remove following warnings from sparse: block/blk-map.c:57:31: warning: incorrect type in argument 2 (different address spaces) block/blk-map.c:57:31: expected void *addr block/blk-map.c:57:31: got void [noderef] <asn:1>*ubuf However blk_rq_map_kern() needs one more local variable to handle it. Signed-off-by: NNamhyung Kim <namhyung@gmail.com> Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
-
- 11 9月, 2010 2 次提交
-
-
由 Martin K. Petersen 提交于
Some controllers have a hardware limit on the number of protection information scatter-gather list segments they can handle. Introduce a max_integrity_segments limit in the block layer and provide a new scsi_host_template setting that allows HBA drivers to provide a value suitable for the hardware. Add support for honoring the integrity segment limit when merging both bios and requests. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <axboe@carl.home.kernel.dk>
-
由 Martin K. Petersen 提交于
We have several users of min_not_zero, each of them using their own definition. Move the define to kernel.h. Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com> Signed-off-by: NJens Axboe <axboe@carl.home.kernel.dk>
-
- 22 8月, 2010 1 次提交
-
-
由 Arjan van de Ven 提交于
With the introduction of the new unified work queue thread pools, we lost one feature: It's no longer possible to know which worker is causing the CPU to wake out of idle. The result is that PowerTOP now reports a lot of "kworker/a:b" instead of more readable results. This patch adds a pair of tracepoints to the new workqueue code, similar in style to the timer/hrtimer tracepoints. With this pair of tracepoints, the next PowerTOP can correctly report which work item caused the wakeup (and how long it took): Interrupt (43) i915 time 3.51ms wakeups 141 Work ieee80211_iface_work time 0.81ms wakeups 29 Work do_dbs_timer time 0.55ms wakeups 24 Process Xorg time 21.36ms wakeups 4 Timer sched_rt_period_timer time 0.01ms wakeups 1 Signed-off-by: NArjan van de Ven <arjan@linux.intel.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 21 8月, 2010 2 次提交
-
-
由 Linus Torvalds 提交于
It's a really simple list, and several of the users want to go backwards in it to find the previous vma. So rather than have to look up the previous entry with 'find_vma_prev()' or something similar, just make it doubly linked instead. Tested-by: NIan Campbell <ijc@hellion.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrea Righi 提交于
kfifo_skip() is currently broken, due to the missing of the internal helper function. Add it. Signed-off-by: NAndrea Righi <arighi@develer.com> Cc: Greg KH <greg@kroah.com> Acked-by: NStefani Seibold <stefani@seibold.net> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 19 8月, 2010 1 次提交
-
-
由 David Howells 提交于
Fix the declaration of sys_execve() in asm-generic/syscalls.h to have various consts applied to its pointers. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 8月, 2010 11 次提交
-
-
由 Jaroslav Kysela 提交于
With some hardware combinations, the PCM interrupts are acknowledged before the period boundary from the emu10k1 chip. The midlevel PCM code gets confused and the playback stream is interrupted. It seems that the interrupt processing shift by 2 samples is enough to fix this issue. This default value does not harm other, non-affected hardware. More information: Kernel bugzilla bug#16300 [A copmile warning fixed by tiwai] Signed-off-by: NJaroslav Kysela <perex@perex.cz> Cc: <stable@kernel.org> Signed-off-by: NTakashi Iwai <tiwai@suse.de>
-
由 Nick Piggin 提交于
fs: scale files_lock Improve scalability of files_lock by adding per-cpu, per-sb files lists, protected with an lglock. The lglock provides fast access to the per-cpu lists to add and remove files. It also provides a snapshot of all the per-cpu lists (although this is very slow). One difficulty with this approach is that a file can be removed from the list by another CPU. We must track which per-cpu list the file is on with a new variale in the file struct (packed into a hole on 64-bit archs). Scalability could suffer if files are frequently removed from different cpu's list. However loads with frequent removal of files imply short interval between adding and removing the files, and the scheduler attempts to avoid moving processes too far away. Also, even in the case of cross-CPU removal, the hardware has much more opportunity to parallelise cacheline transfers with N cachelines than with 1. A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs degenerates to contending on a single lock, which is no worse than before. When more than one CPU are allocating files, even if they are always freed by different CPUs, there will be more parallelism than the single-lock case. Testing results: On a 2 socket, 8 core opteron, I measure the number of times the lock is taken to remove the file, the number of times it is removed by the same CPU that added it, and the number of times it is removed by the same node that added it. Booting: locks= 25049 cpu-hits= 23174 (92.5%) node-hits= 23945 (95.6%) kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%) dbench 64 locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%) So a file is removed from the same CPU it was added by over 90% of the time. It remains within the same node 95% of the time. Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile. throughput 2.6.34-rc2 24.5 +patch 24.9 us sys idle IO wait (in %) 2.6.34-rc2 51.25 28.25 17.25 3.25 +patch 53.75 18.5 19 8.75 So significantly less CPU time spent in kernel code, higher idle time and slightly higher throughput. Single threaded performance difference was within the noise of microbenchmarks. That is not to say penalty does not exist, the code is larger and more memory accesses required so it will be slightly slower. Cc: linux-kernel@vger.kernel.org Cc: Tim Chen <tim.c.chen@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Nick Piggin 提交于
lglock: introduce special lglock and brlock spin locks This patch introduces "local-global" locks (lglocks). These can be used to: - Provide fast exclusive access to per-CPU data, with exclusive access to another CPU's data allowed but possibly subject to contention, and to provide very slow exclusive access to all per-CPU data. - Or to provide very fast and scalable read serialisation, and to provide very slow exclusive serialisation of data (not necessarily per-CPU data). Brlocks are also implemented as a short-hand notation for the latter use case. Thanks to Paul for local/global naming convention. Cc: linux-kernel@vger.kernel.org Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Nick Piggin 提交于
tty: fix fu_list abuse tty code abuses fu_list, which causes a bug in remount,ro handling. If a tty device node is opened on a filesystem, then the last link to the inode removed, the filesystem will be allowed to be remounted readonly. This is because fs_may_remount_ro does not find the 0 link tty inode on the file sb list (because the tty code incorrectly removed it to use for its own purpose). This can result in a filesystem with errors after it is marked "clean". Taking idea from Christoph's initial patch, allocate a tty private struct at file->private_data and put our required list fields in there, linking file and tty. This makes tty nodes behave the same way as other device nodes and avoid meddling with the vfs, and avoids this bug. The error handling is not trivial in the tty code, so for this bugfix, I take the simple approach of using __GFP_NOFAIL and don't worry about memory errors. This is not a problem because our allocator doesn't fail small allocs as a rule anyway. So proper error handling is left as an exercise for tty hackers. [ Arguably filesystem's device inode would ideally be divorced from the driver's pseudo inode when it is opened, but in practice it's not clear whether that will ever be worth implementing. ] Cc: linux-kernel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Cc: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Nick Piggin 提交于
fs: cleanup files_lock locking Lock tty_files with a new spinlock, tty_files_lock; provide helpers to manipulate the per-sb files list; unexport the files_lock spinlock. Cc: linux-kernel@vger.kernel.org Cc: Christoph Hellwig <hch@infradead.org> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Acked-by: NAndi Kleen <ak@linux.intel.com> Acked-by: NGreg Kroah-Hartman <gregkh@suse.de> Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Nick Piggin 提交于
fs: fs_struct rwlock to spinlock struct fs_struct.lock is an rwlock with the read-side used to protect root and pwd members while taking references to them. Taking a reference to a path typically requires just 2 atomic ops, so the critical section is very small. Parallel read-side operations would have cacheline contention on the lock, the dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a real parallelism increase. Replace it with a spinlock to avoid one or two atomic operations in typical path lookup fastpath. Signed-off-by: NNick Piggin <npiggin@kernel.dk> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
These flags aren't real I/O types, but tell ll_rw_block to always lock the buffer instead of giving up on a failed trylock. Instead add a new write_dirty_buffer helper that implements this semantic and use it from the existing SWRITE* callers. Note that the ll_rw_block code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which this patch fixes. In the ufs code clean up the helper that used to call ll_rw_block to mirror sync_dirty_buffer, which is the function it implements for compound buffers. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Instead of abusing a buffer_head flag just add a variant of sync_dirty_buffer which allows passing the exact type of write flag required. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Ernst Schwab 提交于
Added comments in kernel-doc notation for previously added struct fields. Signed-off-by: NErnst Schwab <eschwab@online.de> Acked-by: NRandy Dunlap <rdunlap@xenotime.net> Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>
-
由 David Howells 提交于
Make do_execve() take a const filename pointer so that kernel_execve() compiles correctly on ARM: arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type This also requires the argv and envp arguments to be consted twice, once for the pointer array and once for the strings the array points to. This is because do_execve() passes a pointer to the filename (now const) to copy_strings_kernel(). A simpler alternative would be to cast the filename pointer in do_execve() when it's passed to copy_strings_kernel(). do_execve() may not change any of the strings it is passed as part of the argv or envp lists as they are some of them in .rodata, so marking these strings as const should be fine. Further kernel_execve() and sys_execve() need to be changed to match. This has been test built on x86_64, frv, arm and mips. Signed-off-by: NDavid Howells <dhowells@redhat.com> Tested-by: NRalf Baechle <ralf@linux-mips.org> Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Russell King 提交于
Fix the clock enable/disable tracking in the AMBA CLCD driver so that the driver doesn't try to disable an already disabled clock, thereby causing the clock (if shared) to become unbalanced. This resolves a problem with CLCD on LPC32xx ARM platforms. Reported-by: NKevin Wells <wellsk40@gmail.com> Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
-
- 15 8月, 2010 4 次提交
-
-
由 Zhang Rui 提交于
Remove deprecated ACPI processor procfs I/F, including: /proc/acpi/processor/CPUX/power /proc/acpi/processor/CPUX/limit /proc/acpi/processor/CPUX/info /proc/acpi/processor/CPUX/throttling still exists, as we don't have sysfs I/F available for now. Signed-off-by: NZhang Rui <rui.zhang@intel.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Zhang Rui 提交于
Introduce module parameter acpi.aml_debug_output. With acpi.aml_debug_output set, we can get AML debug object output (Store (AAA, Debug)), even with CONFIG_ACPI_DEBUG cleared. Together with the runtime custom method mechanism, we can debug AML code problems without rebuilding the kernel. Signed-off-by: NZhang Rui <rui.zhang@intel.com> Signed-off-by: NLen Brown <len.brown@intel.com>
-
由 Sam Ravnborg 提交于
unifdef-y and header-y has same semantic. So there is no need to have both. Drop the unifdef-y variant and sort all lines again Signed-off-by: NSam Ravnborg <sam@ravnborg.org>
-
由 Ira W. Snyder 提交于
Add support for exposing all GPIO pins as analog voltages. Though this is not an ideal use of the chip, some hardware engineers may decide that the LTC4245 meets their design requirements when studying the datasheet. The GPIO pins are sampled in round-robin fashion, meaning that a slow reader will see stale data. A userspace application can detect this, because it will get -EAGAIN when reading from a sysfs file which contains stale data. Users can choose to use this feature on a per-chip basis by using either platform data or the OF device tree (where applicable). Signed-off-by: NIra W. Snyder <iws@ovro.caltech.edu> Signed-off-by: NJean Delvare <khali@linux-fr.org>
-
- 14 8月, 2010 3 次提交
-
-
由 David Howells 提交于
Mark arguments to certain system calls as being const where they should be but aren't. The list includes: (*) The filename arguments of various stat syscalls, execve(), various utimes syscalls and some mount syscalls. (*) The filename arguments of some syscall helpers relating to the above. (*) The buffer argument of various write syscalls. Signed-off-by: NDavid Howells <dhowells@redhat.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arnd Bergmann 提交于
The last user is gone, so we can safely remove this Signed-off-by: NArnd Bergmann <arnd@arndb.de> Cc: John Kacur <jkacur@redhat.com> Cc: Al Viro <viro@ZenIV.linux.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>
-
由 Heiko Carstens 提交于
commit 4565f017 "dma-mapping: unify dma_get_cache_alignment implementations" causes build errors on !HAS_DMA architectures/platforms like s390 and sun3: include/linux/dma-mapping.h:145: error: static declaration of 'dma_get_cache_alignment' follows non-static declaration include/asm-generic/dma-mapping-broken.h:73: error: previous declaration of 'dma_get_cache_alignment' was here Fix this by adding an explicit ifdef. Cc: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 8月, 2010 4 次提交
-
-
由 Chris Metcalf 提交于
Signed-off-by: NChris Metcalf <cmetcalf@tilera.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NArnd Bergmann <arnd@arndb.de> Cc: Eric Paris <eparis@redhat.com>
-
由 Linus Torvalds 提交于
This reverts commit 3bcf3860 (and the accompanying commit c1e5c954 "vfs/fsnotify: fsnotify_close can delay the final work in fput" that was a horribly ugly hack to make it work at all). The 'struct file' approach not only causes that disgusting hack, it somehow breaks pulseaudio, probably due to some other subtlety with f_count handling. Fix up various conflicts due to later fsnotify work. Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Chris Wilson 提交于
The current computation, introduced with f12a15be, of FSEC_PER_SEC using the multiplication of (FSEC_PER_NSEC * NSEC_PER_SEC) is performed only with 32bit integers on small machines, resulting in an overflow and a *very* short intervals being programmed. An interrupt storm follows. Note that we also have to specify FSEC_PER_SEC as being long long to overcome the same limitations. Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk> Signed-off-by: NJohn Stultz <johnstul@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by: NIngo Molnar <mingo@elte.hu> Acked-by: NH. Peter Anvin <hpa@zytor.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 David Howells 提交于
Add a dummy printk function for the maintenance of unused printks through gcc format checking, and also so that side-effect checking is maintained too. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 8月, 2010 5 次提交
-
-
由 Adrian Hunter 提交于
Secure discard is the same as discard except that all copies of the discarded sectors (perhaps created by garbage collection) must also be erased. Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com> Acked-by: NJens Axboe <axboe@kernel.dk> Cc: Kyungmin Park <kmpark@infradead.org> Cc: Madhusudhan Chikkature <madhu.cr@ti.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ben Gardiner <bengardiner@nanometrics.ca> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Adrian Hunter 提交于
SD/MMC cards tend to support an erase operation. In addition, eMMC v4.4 cards can support secure erase, trim and secure trim operations that are all variants of the basic erase command. SD/MMC device attributes "erase_size" and "preferred_erase_size" have been added. "erase_size" is the minimum size, in bytes, of an erase operation. For MMC, "erase_size" is the erase group size reported by the card. Note that "erase_size" does not apply to trim or secure trim operations where the minimum size is always one 512 byte sector. For SD, "erase_size" is 512 if the card is block-addressed, 0 otherwise. SD/MMC cards can erase an arbitrarily large area up to and including the whole card. When erasing a large area it may be desirable to do it in smaller chunks for three reasons: 1. A single erase command will make all other I/O on the card wait. This is not a problem if the whole card is being erased, but erasing one partition will make I/O for another partition on the same card wait for the duration of the erase - which could be a several minutes. 2. To be able to inform the user of erase progress. 3. The erase timeout becomes too large to be very useful. Because the erase timeout contains a margin which is multiplied by the size of the erase area, the value can end up being several minutes for large areas. "erase_size" is not the most efficient unit to erase (especially for SD where it is just one sector), hence "preferred_erase_size" provides a good chunk size for erasing large areas. For MMC, "preferred_erase_size" is the high-capacity erase size if a card specifies one, otherwise it is based on the capacity of the card. For SD, "preferred_erase_size" is the allocation unit size specified by the card. "preferred_erase_size" is in bytes. Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com> Acked-by: NJens Axboe <axboe@kernel.dk> Cc: Kyungmin Park <kmpark@infradead.org> Cc: Madhusudhan Chikkature <madhu.cr@ti.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ben Gardiner <bengardiner@nanometrics.ca> Cc: <linux-mmc@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Jan Kara 提交于
Commit 83ba7b07 ("writeback: simplify the write back thread queue") broke writeback_in_progress() as in that commit we started to remove work items from the list at the moment we start working on them and not at the moment they are finished. Thus if the flusher thread was doing some work but there was no other work queued, writeback_in_progress() returned false. This could in particular cause unnecessary queueing of background writeback from balance_dirty_pages() or writeout work from writeback_sb_if_idle(). This patch fixes the problem by introducing a bit in the bdi state which indicates that the flusher thread is processing some work and uses this bit for writeback_in_progress() test. NOTE: Both callsites of writeback_in_progress() (namely, writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually need a different information than what writeback_in_progress() provides. They would need to know whether *the kind of writeback they are going to submit* is already queued. But this information isn't that simple to provide so let's fix writeback_in_progress() for the time being. Signed-off-by: NJan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Wu Fengguang <fengguang.wu@intel.com> Acked-by: NJens Axboe <jaxboe@fusionio.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Wu Fengguang 提交于
Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(), so that the latter can be avoided when under global dirty background threshold (which is the normal state for most systems). Signed-off-by: NWu Fengguang <fengguang.wu@intel.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Thomas Gleixner 提交于
The ACPI_PREEMPTION_POINT() logic was introduced in commit 8bd108d1 (ACPICA: add preemption point after each opcode parse). The follow up commits abe1dfab, 138d1569, c084ca70 tried to fix the preemption logic back and forth, but nobody noticed that the usage of in_atomic_preempt_off() in that context is wrong. The check which guards the call of cond_resched() is: if (!in_atomic_preempt_off() && !irqs_disabled()) in_atomic_preempt_off() is not intended for general use as the comment above the macro definition clearly says: * Check whether we were atomic before we did preempt_disable(): * (used by the scheduler, *after* releasing the kernel lock) On a CONFIG_PREEMPT=n kernel the usage of in_atomic_preempt_off() works by accident, but with CONFIG_PREEMPT=y it's just broken. The whole purpose of the ACPI_PREEMPTION_POINT() is to reduce the latency on a CONFIG_PREEMPT=n kernel, so make ACPI_PREEMPTION_POINT() depend on CONFIG_PREEMPT=n and remove the in_atomic_preempt_off() check. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16210 [akpm@linux-foundation.org: fix build] Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Len Brown <lenb@kernel.org> Cc: Francois Valenduc <francois.valenduc@tvcablenet.be> Cc: Lin Ming <ming.m.lin@intel.com> Cc: <stable@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-