提交 · 7681bfeeccff5efa9eb29bf09249a3c400b15327 · openeuler / raspberrypi-kernel

19 10月, 2010 1 次提交

block: fix accounting bug on cross partition merges · 7681bfee

由 Yasuaki Ishimatsu 提交于 10月 19, 2010

/proc/diskstats would display a strange output as follows.

$ cat /proc/diskstats |grep sda
   8       0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
   8       1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
                                                ~~~~~~~~~~
   8       2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
   8       3 sda3 54 487 2188 92 0 0 0 0 0 88 92
   8       4 sda4 4 0 8 0 0 0 0 0 0 0 0
   8       5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

Its reason is the wrong way of accounting hd_struct->in_flight. When a bio is
merged into a request belongs to different partition by ELEVATOR_FRONT_MERGE.

The detailed root cause is as follows.

Assuming that there are two partition, sda1 and sda2.

1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
   is 0 and sda2's one is 1.

        | hd_struct->in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

2. A bio belongs to sda1 is issued and is merged into the request mentioned on
   step1 by ELEVATOR_BACK_MERGE. The first sector of the request is changed
   from sda2 region to sda1 region. However the two partition's
   hd_struct->in_flight are not changed.

        | hd_struct->in_flight
   ---------------------------
   sda1 |          0
   sda2 |          1
   ---------------------------

3. The request is finished and blk_account_io_done() is called. In this case,
   sda2's hd_struct->in_flight, not a sda1's one, is decremented.

        | hd_struct->in_flight
   ---------------------------
   sda1 |         -1
   sda2 |          1
   ---------------------------

The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.

When reloading partition tables, quiesce IO to ensure that no
request references to the partition struct exists. When it is safe
to free the partition table, the IO for that device is restarted
again.
Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
Cc: stable@kernel.org
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

7681bfee

15 10月, 2010 1 次提交

block: Make the integrity mapped property a bio flag · 495d2b38

由 Martin K. Petersen 提交于 10月 15, 2010

Previously we tracked whether the integrity metadata had been remapped
using a request flag. This was fine for low-level retries. However, if
an I/O was redriven by upper layers we would end up remapping again,
causing the retry to fail.

Deprecate the REQ_INTEGRITY flag and introduce BIO_MAPPED_INTEGRITY
which enables filesystems to notify lower layers that the bio in
question has already been remapped.
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

495d2b38

14 10月, 2010 1 次提交

block: Ensure physical block size is unsigned int · 892b6f90

由 Martin K. Petersen 提交于 10月 13, 2010

Physical block size was declared unsigned int to accomodate the maximum
size reported by READ CAPACITY(16).  Make sure we use the right type in
the related functions.
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Acked-by: NMike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

892b6f90

25 9月, 2010 1 次提交

Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK · e4ecda1b

由 Mark Lord 提交于 9月 25, 2010

Ensure that 'sysctl_hung_task_timeout_secs' is defined
even when CONFIG_DETECT_HUNG_TASK is not set.
This way we can safely reference it without need for
ifdefs in the code elsewhere.  eg. in block/blk-exec.c
Signed-off-by: NMark Lord <mlord@pobox.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

e4ecda1b

16 9月, 2010 1 次提交

blkio: Core implementation of throttle policy · e43473b7

由 Vivek Goyal 提交于 9月 15, 2010

o Actual implementation of throttling policy in block layer. Currently it
  implements READ and WRITE bytes per second throttling logic. IOPS throttling
  comes in later patches.
Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

e43473b7

15 9月, 2010 2 次提交

block, partition: add partition_meta_info to hd_struct · 6d1d8050

由 Will Drewry 提交于 8月 31, 2010

I'm reposting this patch series as v4 since there have been no additional
comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches
are against Linus's tree: 2bfc96a1
(2.6.36-rc3).

Would this patchset be suitable for inclusion in an mm branch?

This changes adds a partition_meta_info struct which itself contains a
union of structures that provide partition table specific metadata.

This change leaves the union empty. The subsequent patch includes an
implementation for CONFIG_EFI_PARTITION-based metadata.
Signed-off-by: NWill Drewry <wad@chromium.org>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

6d1d8050

block: fix an address space warning in blk-map.c · 14417799

由 Namhyung Kim 提交于 9月 15, 2010

Change type of 2nd parameter of blk_rq_aligned() into unsigned long
and remove unnecessary casting. Now we can call it with 'uaddr'
instead of 'ubuf' in __blk_rq_map_user() so that it can remove
following warnings from sparse:

 block/blk-map.c:57:31: warning: incorrect type in argument 2 (different address spaces)
 block/blk-map.c:57:31:    expected void *addr
 block/blk-map.c:57:31:    got void [noderef] <asn:1>*ubuf

However blk_rq_map_kern() needs one more local variable to handle it.
Signed-off-by: NNamhyung Kim <namhyung@gmail.com>
Signed-off-by: NJens Axboe <jaxboe@fusionio.com>

14417799

11 9月, 2010 2 次提交

block/scsi: Provide a limit on the number of integrity segments · 13f05c8d

由 Martin K. Petersen 提交于 9月 10, 2010

Some controllers have a hardware limit on the number of protection
information scatter-gather list segments they can handle.

Introduce a max_integrity_segments limit in the block layer and provide
a new scsi_host_template setting that allows HBA drivers to provide a
value suitable for the hardware.

Add support for honoring the integrity segment limit when merging both
bios and requests.
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <axboe@carl.home.kernel.dk>

13f05c8d

Consolidate min_not_zero · c8bf1336

由 Martin K. Petersen 提交于 9月 10, 2010

We have several users of min_not_zero, each of them using their own
definition.  Move the define to kernel.h.
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: NJens Axboe <axboe@carl.home.kernel.dk>

c8bf1336

22 8月, 2010 1 次提交

workqueue: Add basic tracepoints to track workqueue execution · e36c886a

由 Arjan van de Ven 提交于 8月 21, 2010

With the introduction of the new unified work queue thread pools,
we lost one feature: It's no longer possible to know which worker
is causing the CPU to wake out of idle. The result is that PowerTOP
now reports a lot of "kworker/a:b" instead of more readable results.

This patch adds a pair of tracepoints to the new workqueue code,
similar in style to the timer/hrtimer tracepoints.

With this pair of tracepoints, the next PowerTOP can correctly
report which work item caused the wakeup (and how long it took):

Interrupt (43) i915 time 3.51ms wakeups 141
Work ieee80211_iface_work time 0.81ms wakeups 29
Work do_dbs_timer time 0.55ms wakeups 24
Process Xorg time 21.36ms wakeups 4
Timer sched_rt_period_timer time 0.01ms wakeups 1
Signed-off-by: NArjan van de Ven <arjan@linux.intel.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e36c886a

21 8月, 2010 2 次提交

mm: make the vma list be doubly linked · 297c5eee

由 Linus Torvalds 提交于 8月 20, 2010

It's a really simple list, and several of the users want to go backwards
in it to find the previous vma.  So rather than have to look up the
previous entry with 'find_vma_prev()' or something similar, just make it
doubly linked instead.
Tested-by: NIan Campbell <ijc@hellion.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

297c5eee

kfifo: implement missing __kfifo_skip_r() · b35de43b

由 Andrea Righi 提交于 8月 19, 2010

kfifo_skip() is currently broken, due to the missing of the internal
helper function.  Add it.
Signed-off-by: NAndrea Righi <arighi@develer.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: NStefani Seibold <stefani@seibold.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b35de43b

19 8月, 2010 1 次提交

Fix the declaration of sys_execve() in asm-generic/syscalls.h · d15ca320

由 David Howells 提交于 8月 18, 2010

Fix the declaration of sys_execve() in asm-generic/syscalls.h to have
various consts applied to its pointers.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d15ca320

18 8月, 2010 11 次提交

ALSA: emu10k1 - delay the PCM interrupts (add pcm_irq_delay parameter) · 56385a12

由 Jaroslav Kysela 提交于 8月 18, 2010

With some hardware combinations, the PCM interrupts are acknowledged
before the period boundary from the emu10k1 chip. The midlevel PCM code
gets confused and the playback stream is interrupted.

It seems that the interrupt processing shift by 2 samples is enough
to fix this issue. This default value does not harm other,
non-affected hardware.

More information: Kernel bugzilla bug#16300

[A copmile warning fixed by tiwai]
Signed-off-by: NJaroslav Kysela <perex@perex.cz>
Cc: <stable@kernel.org>
Signed-off-by: NTakashi Iwai <tiwai@suse.de>

56385a12

fs: scale files_lock · 6416ccb7

由 Nick Piggin 提交于 8月 18, 2010

fs: scale files_lock

Improve scalability of files_lock by adding per-cpu, per-sb files lists,
protected with an lglock. The lglock provides fast access to the per-cpu lists
to add and remove files. It also provides a snapshot of all the per-cpu lists
(although this is very slow).

One difficulty with this approach is that a file can be removed from the list
by another CPU. We must track which per-cpu list the file is on with a new
variale in the file struct (packed into a hole on 64-bit archs). Scalability
could suffer if files are frequently removed from different cpu's list.

However loads with frequent removal of files imply short interval between
adding and removing the files, and the scheduler attempts to avoid moving
processes too far away. Also, even in the case of cross-CPU removal, the
hardware has much more opportunity to parallelise cacheline transfers with N
cachelines than with 1.

A worst-case test of 1 CPU allocating files subsequently being freed by N CPUs
degenerates to contending on a single lock, which is no worse than before. When
more than one CPU are allocating files, even if they are always freed by
different CPUs, there will be more parallelism than the single-lock case.

Testing results:

On a 2 socket, 8 core opteron, I measure the number of times the lock is taken
to remove the file, the number of times it is removed by the same CPU that
added it, and the number of times it is removed by the same node that added it.

Booting:    locks=  25049 cpu-hits=  23174 (92.5%) node-hits=  23945 (95.6%)
kbuild -j16 locks=2281913 cpu-hits=2208126 (96.8%) node-hits=2252674 (98.7%)
dbench 64   locks=4306582 cpu-hits=4287247 (99.6%) node-hits=4299527 (99.8%)

So a file is removed from the same CPU it was added by over 90% of the time.
It remains within the same node 95% of the time.

Tim Chen ran some numbers for a 64 thread Nehalem system performing a compile.

                throughput
2.6.34-rc2      24.5
+patch          24.9

                us      sys     idle    IO wait (in %)
2.6.34-rc2      51.25   28.25   17.25   3.25
+patch          53.75   18.5    19      8.75

So significantly less CPU time spent in kernel code, higher idle time and
slightly higher throughput.

Single threaded performance difference was within the noise of microbenchmarks.
That is not to say penalty does not exist, the code is larger and more memory
accesses required so it will be slightly slower.

Cc: linux-kernel@vger.kernel.org
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

6416ccb7

lglock: introduce special lglock and brlock spin locks · 2dc91abe

由 Nick Piggin 提交于 8月 18, 2010

lglock: introduce special lglock and brlock spin locks

This patch introduces "local-global" locks (lglocks). These can be used to:

- Provide fast exclusive access to per-CPU data, with exclusive access to
  another CPU's data allowed but possibly subject to contention, and to provide
  very slow exclusive access to all per-CPU data.
- Or to provide very fast and scalable read serialisation, and to provide
  very slow exclusive serialisation of data (not necessarily per-CPU data).

Brlocks are also implemented as a short-hand notation for the latter use
case.

Thanks to Paul for local/global naming convention.

Cc: linux-kernel@vger.kernel.org
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2dc91abe

tty: fix fu_list abuse · d996b62a

由 Nick Piggin 提交于 8月 18, 2010

tty: fix fu_list abuse

tty code abuses fu_list, which causes a bug in remount,ro handling.

If a tty device node is opened on a filesystem, then the last link to the inode
removed, the filesystem will be allowed to be remounted readonly. This is
because fs_may_remount_ro does not find the 0 link tty inode on the file sb
list (because the tty code incorrectly removed it to use for its own purpose).
This can result in a filesystem with errors after it is marked "clean".

Taking idea from Christoph's initial patch, allocate a tty private struct
at file->private_data and put our required list fields in there, linking
file and tty. This makes tty nodes behave the same way as other device nodes
and avoid meddling with the vfs, and avoids this bug.

The error handling is not trivial in the tty code, so for this bugfix, I take
the simple approach of using __GFP_NOFAIL and don't worry about memory errors.
This is not a problem because our allocator doesn't fail small allocs as a rule
anyway. So proper error handling is left as an exercise for tty hackers.

[ Arguably filesystem's device inode would ideally be divorced from the
driver's pseudo inode when it is opened, but in practice it's not clear whether
that will ever be worth implementing. ]

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

d996b62a

fs: cleanup files_lock locking · ee2ffa0d

由 Nick Piggin 提交于 8月 18, 2010

fs: cleanup files_lock locking

Lock tty_files with a new spinlock, tty_files_lock; provide helpers to
manipulate the per-sb files list; unexport the files_lock spinlock.

Cc: linux-kernel@vger.kernel.org
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: NAndi Kleen <ak@linux.intel.com>
Acked-by: NGreg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ee2ffa0d

fs: fs_struct rwlock to spinlock · 2a4419b5

由 Nick Piggin 提交于 8月 18, 2010

fs: fs_struct rwlock to spinlock

struct fs_struct.lock is an rwlock with the read-side used to protect root and
pwd members while taking references to them. Taking a reference to a path
typically requires just 2 atomic ops, so the critical section is very small.
Parallel read-side operations would have cacheline contention on the lock, the
dentry, and the vfsmount cachelines, so the rwlock is unlikely to ever give a
real parallelism increase.

Replace it with a spinlock to avoid one or two atomic operations in typical
path lookup fastpath.
Signed-off-by: NNick Piggin <npiggin@kernel.dk>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

2a4419b5

remove SWRITE* I/O types · 9cb569d6

由 Christoph Hellwig 提交于 8月 11, 2010

These flags aren't real I/O types, but tell ll_rw_block to always
lock the buffer instead of giving up on a failed trylock.

Instead add a new write_dirty_buffer helper that implements this semantic
and use it from the existing SWRITE* callers.  Note that the ll_rw_block
code had a bug where it didn't promote WRITE_SYNC_PLUG properly, which
this patch fixes.

In the ufs code clean up the helper that used to call ll_rw_block
to mirror sync_dirty_buffer, which is the function it implements for
compound buffers.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

9cb569d6

kill BH_Ordered flag · 87e99511

由 Christoph Hellwig 提交于 8月 11, 2010

Instead of abusing a buffer_head flag just add a variant of
sync_dirty_buffer which allows passing the exact type of write
flag required.
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

87e99511

spi.h: missing kernel-doc notation, please fix · 5c79a5ae

由 Ernst Schwab 提交于 8月 16, 2010

Added comments in kernel-doc notation for previously added struct fields.
Signed-off-by: NErnst Schwab <eschwab@online.de>
Acked-by: NRandy Dunlap <rdunlap@xenotime.net>
Signed-off-by: NGrant Likely <grant.likely@secretlab.ca>

5c79a5ae

Make do_execve() take a const filename pointer · d7627467

由 David Howells 提交于 8月 17, 2010

Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:

arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to. This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel(). A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().

do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.

Further kernel_execve() and sys_execve() need to be changed to match.

This has been test built on x86_64, frv, arm and mips.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Tested-by: NRalf Baechle <ralf@linux-mips.org>
Acked-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d7627467

VIDEO: amba clcd: don't disable an already disabled clock · 99c796df

由 Russell King 提交于 8月 17, 2010

Fix the clock enable/disable tracking in the AMBA CLCD driver so
that the driver doesn't try to disable an already disabled clock,
thereby causing the clock (if shared) to become unbalanced.

This resolves a problem with CLCD on LPC32xx ARM platforms.
Reported-by: NKevin Wells <wellsk40@gmail.com>
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>

99c796df

15 8月, 2010 4 次提交

ACPI processor: remove deprecated ACPI procfs I/F · d09fe555

由 Zhang Rui 提交于 7月 15, 2010

Remove deprecated ACPI processor procfs I/F, including:
/proc/acpi/processor/CPUX/power
/proc/acpi/processor/CPUX/limit
/proc/acpi/processor/CPUX/info

/proc/acpi/processor/CPUX/throttling still exists,
as we don't have sysfs I/F available for now.
Signed-off-by: NZhang Rui <rui.zhang@intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

d09fe555

ACPI: introduce module parameter acpi.aml_debug_output · c637e486

由 Zhang Rui 提交于 7月 15, 2010

Introduce module parameter acpi.aml_debug_output.

With acpi.aml_debug_output set, we can get AML debug object output
(Store (AAA, Debug)), even with CONFIG_ACPI_DEBUG cleared.

Together with the runtime custom method mechanism,
we can debug AML code problems without rebuilding the kernel.
Signed-off-by: NZhang Rui <rui.zhang@intel.com>
Signed-off-by: NLen Brown <len.brown@intel.com>

c637e486

include: replace unifdef-y with header-y · 60641aa1

由 Sam Ravnborg 提交于 8月 14, 2010

unifdef-y and header-y has same semantic.
So there is no need to have both.

Drop the unifdef-y variant and sort all lines again
Signed-off-by: NSam Ravnborg <sam@ravnborg.org>

60641aa1

hwmon: (ltc4245) Expose all GPIO pins as analog voltages · 5950ec8d

由 Ira W. Snyder 提交于 8月 14, 2010

Add support for exposing all GPIO pins as analog voltages. Though this is
not an ideal use of the chip, some hardware engineers may decide that the
LTC4245 meets their design requirements when studying the datasheet.

The GPIO pins are sampled in round-robin fashion, meaning that a slow
reader will see stale data. A userspace application can detect this,
because it will get -EAGAIN when reading from a sysfs file which contains
stale data.

Users can choose to use this feature on a per-chip basis by using either
platform data or the OF device tree (where applicable).
Signed-off-by: NIra W. Snyder <iws@ovro.caltech.edu>
Signed-off-by: NJean Delvare <khali@linux-fr.org>

5950ec8d

14 8月, 2010 3 次提交

Mark arguments to certain syscalls as being const · c7887325

由 David Howells 提交于 8月 11, 2010

Mark arguments to certain system calls as being const where they should be but
aren't.  The list includes:

 (*) The filename arguments of various stat syscalls, execve(), various utimes
     syscalls and some mount syscalls.

 (*) The filename arguments of some syscall helpers relating to the above.

 (*) The buffer argument of various write syscalls.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c7887325

bkl: Remove locked .ioctl file operation · b19dd42f

由 Arnd Bergmann 提交于 7月 04, 2010

The last user is gone, so we can safely remove this
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: NFrederic Weisbecker <fweisbec@gmail.com>

b19dd42f

dma-mapping: fix build errors on !HAS_DMA architectures · e259f191

由 Heiko Carstens 提交于 8月 13, 2010

commit 4565f017 "dma-mapping: unify
dma_get_cache_alignment implementations" causes build errors on
!HAS_DMA architectures/platforms like s390 and sun3:

include/linux/dma-mapping.h:145: error: static declaration of 'dma_get_cache_alignment' follows non-static declaration
include/asm-generic/dma-mapping-broken.h:73: error: previous declaration of 'dma_get_cache_alignment' was here

Fix this by adding an explicit ifdef.

Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: NFUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e259f191

13 8月, 2010 4 次提交

Add fanotify syscalls to <asm-generic/unistd.h>. · fad9e93e

由 Chris Metcalf 提交于 8月 11, 2010

Signed-off-by: NChris Metcalf <cmetcalf@tilera.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Acked-by: NArnd Bergmann <arnd@arndb.de>
Cc: Eric Paris <eparis@redhat.com>

fad9e93e

Revert "fsnotify: store struct file not struct path" · 2069601b

由 Linus Torvalds 提交于 8月 12, 2010

This reverts commit 3bcf3860 (and the
accompanying commit c1e5c954 "vfs/fsnotify: fsnotify_close can delay
the final work in fput" that was a horribly ugly hack to make it work at
all).

The 'struct file' approach not only causes that disgusting hack, it
somehow breaks pulseaudio, probably due to some other subtlety with
f_count handling.

Fix up various conflicts due to later fsnotify work.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2069601b

x86/hpet: Use the FSEC_PER_SEC constant for femto-second periods · 4936a3b9

由 Chris Wilson 提交于 8月 09, 2010

The current computation, introduced with f12a15be, of FSEC_PER_SEC using
the multiplication of (FSEC_PER_NSEC * NSEC_PER_SEC) is performed only
with 32bit integers on small machines, resulting in an overflow and a
*very* short intervals being programmed.  An interrupt storm follows.

Note that we also have to specify FSEC_PER_SEC as being long long to
overcome the same limitations.
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NJohn Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: NIngo Molnar <mingo@elte.hu>
Acked-by: NH. Peter Anvin <hpa@zytor.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4936a3b9

Add a dummy printk function for the maintenance of unused printks · 12fdff3f

由 David Howells 提交于 8月 12, 2010

Add a dummy printk function for the maintenance of unused printks through gcc
format checking, and also so that side-effect checking is maintained too.
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

12fdff3f

12 8月, 2010 5 次提交

block: add secure discard · 8d57a98c

由 Adrian Hunter 提交于 8月 11, 2010

Secure discard is the same as discard except that all copies of the
discarded sectors (perhaps created by garbage collection) must also be
erased.
Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
Acked-by: NJens Axboe <axboe@kernel.dk>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ben Gardiner <bengardiner@nanometrics.ca>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8d57a98c

mmc: add erase, secure erase, trim and secure trim operations · dfe86cba

由 Adrian Hunter 提交于 8月 11, 2010

SD/MMC cards tend to support an erase operation.  In addition, eMMC v4.4
cards can support secure erase, trim and secure trim operations that are
all variants of the basic erase command.

SD/MMC device attributes "erase_size" and "preferred_erase_size" have been
added.

"erase_size" is the minimum size, in bytes, of an erase operation.  For
MMC, "erase_size" is the erase group size reported by the card.  Note that
"erase_size" does not apply to trim or secure trim operations where the
minimum size is always one 512 byte sector.  For SD, "erase_size" is 512
if the card is block-addressed, 0 otherwise.

SD/MMC cards can erase an arbitrarily large area up to and
including the whole card.  When erasing a large area it may
be desirable to do it in smaller chunks for three reasons:

    1. A single erase command will make all other I/O on the card
       wait.  This is not a problem if the whole card is being erased, but
       erasing one partition will make I/O for another partition on the
       same card wait for the duration of the erase - which could be a
       several minutes.

    2. To be able to inform the user of erase progress.

    3. The erase timeout becomes too large to be very useful.
       Because the erase timeout contains a margin which is multiplied by
       the size of the erase area, the value can end up being several
       minutes for large areas.

"erase_size" is not the most efficient unit to erase (especially for SD
where it is just one sector), hence "preferred_erase_size" provides a good
chunk size for erasing large areas.

For MMC, "preferred_erase_size" is the high-capacity erase size if a card
specifies one, otherwise it is based on the capacity of the card.

For SD, "preferred_erase_size" is the allocation unit size specified by
the card.

"preferred_erase_size" is in bytes.
Signed-off-by: NAdrian Hunter <adrian.hunter@nokia.com>
Acked-by: NJens Axboe <axboe@kernel.dk>
Cc: Kyungmin Park <kmpark@infradead.org>
Cc: Madhusudhan Chikkature <madhu.cr@ti.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ben Gardiner <bengardiner@nanometrics.ca>
Cc: <linux-mmc@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

dfe86cba

mm: fix writeback_in_progress() · 81d73a32

由 Jan Kara 提交于 8月 11, 2010

Commit 83ba7b07 ("writeback: simplify the write back thread queue")
broke writeback_in_progress() as in that commit we started to remove work
items from the list at the moment we start working on them and not at the
moment they are finished.  Thus if the flusher thread was doing some work
but there was no other work queued, writeback_in_progress() returned
false.  This could in particular cause unnecessary queueing of background
writeback from balance_dirty_pages() or writeout work from
writeback_sb_if_idle().

This patch fixes the problem by introducing a bit in the bdi state which
indicates that the flusher thread is processing some work and uses this
bit for writeback_in_progress() test.

NOTE: Both callsites of writeback_in_progress() (namely,
writeback_inodes_sb_if_idle() and balance_dirty_pages()) would actually
need a different information than what writeback_in_progress() provides.
They would need to know whether *the kind of writeback they are going to
submit* is already queued.  But this information isn't that simple to
provide so let's fix writeback_in_progress() for the time being.
Signed-off-by: NJan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Acked-by: NJens Axboe <jaxboe@fusionio.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

81d73a32

writeback: avoid unnecessary calculation of bdi dirty thresholds · 16c4042f

由 Wu Fengguang 提交于 8月 11, 2010

Split get_dirty_limits() into global_dirty_limits()+bdi_dirty_limit(), so
that the latter can be avoided when under global dirty background
threshold (which is the normal state for most systems).
Signed-off-by: NWu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

16c4042f

acpi: fix bogus preemption logic · 0a7992c9

由 Thomas Gleixner 提交于 8月 11, 2010

The ACPI_PREEMPTION_POINT() logic was introduced in commit 8bd108d1
(ACPICA: add preemption point after each opcode parse).  The follow up
commits abe1dfab, 138d1569, c084ca70 tried to fix the preemption logic
back and forth, but nobody noticed that the usage of
in_atomic_preempt_off() in that context is wrong.

The check which guards the call of cond_resched() is:

    if (!in_atomic_preempt_off() && !irqs_disabled())

in_atomic_preempt_off() is not intended for general use as the comment
above the macro definition clearly says:

 * Check whether we were atomic before we did preempt_disable():
 * (used by the scheduler, *after* releasing the kernel lock)

On a CONFIG_PREEMPT=n kernel the usage of in_atomic_preempt_off() works by
accident, but with CONFIG_PREEMPT=y it's just broken.

The whole purpose of the ACPI_PREEMPTION_POINT() is to reduce the latency
on a CONFIG_PREEMPT=n kernel, so make ACPI_PREEMPTION_POINT() depend on
CONFIG_PREEMPT=n and remove the in_atomic_preempt_off() check.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16210

[akpm@linux-foundation.org: fix build]
Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
Cc: Len Brown <lenb@kernel.org>
Cc: Francois Valenduc <francois.valenduc@tvcablenet.be>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0a7992c9