1. 29 April 2010, 5 commits
  2. 27 April 2010, 4 commits
    • block: implement bd_claiming and claiming block · 6b4517a7
      Committed by Tejun Heo
      Currently, device claiming for exclusive open is done after the low-level
      open - disk->fops->open() - has completed successfully.  This means
      that exclusive open attempts while a device is already exclusively
      open will fail only after disk->fops->open() is called.
      
      The cdrom driver issues commands during open(), which means that an
      O_EXCL open attempt can unintentionally inject commands into the
      in-progress command stream of a burn, disturbing the burning process.
      In most cases this doesn't cause problems, because the first command
      issued is TUR, which most devices can process in the middle of burning.
      However, depending on how a device replies to TUR during burning, the
      cdrom driver may end up issuing further commands.
      
      This can't be resolved trivially by moving bd_claim() before the
      actual open(), because then an open attempt which will end up failing
      could interfere with other legitimate O_EXCL open attempts; i.e.,
      unconfirmed open attempts could fail others.
      
      This patch resolves the problem by introducing a claiming block, which
      is started by bd_start_claiming() and terminated either by bd_claim()
      or bd_abort_claiming().  bd_claim() from inside a claiming block is
      guaranteed to succeed, and once a claiming block is started, other
      bd_start_claiming() or bd_claim() attempts block until the current
      claiming block is terminated.
      
      bd_claim() can still be used standalone, although it now always
      synchronizes against claiming blocks, so the existing users will keep
      working without any change.
      
      blkdev_open() and open_bdev_exclusive() are converted to use claiming
      blocks so that exclusive open attempts from these functions don't
      interfere with the existing exclusive open.
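      For illustration, here is a minimal sketch of the intended call
      sequence (a hedged sketch: the semantics follow the description above,
      while exact signatures and error handling in fs/block_dev.c may
      differ):

          struct block_device *whole;
          int ret;

          /* Open a claiming block; other claimers now wait on us. */
          whole = bd_start_claiming(bdev, holder);
          if (IS_ERR(whole))
                  return PTR_ERR(whole);

          /* Low-level open; may fail, e.g. when no medium is present. */
          ret = blkdev_get(bdev, mode);
          if (ret) {
                  /* Unwind so that other claim attempts can proceed. */
                  bd_abort_claiming(whole, holder);
                  return ret;
          }

          /* Guaranteed to succeed from inside the claiming block. */
          bd_claim(bdev, holder);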
      
      This problem was discovered while investigating bko#15403.
      
        https://bugzilla.kernel.org/show_bug.cgi?id=15403
      
      The burning problem itself can be resolved by updating userspace
      probing tools to always open with O_EXCL.
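      For example, a probing tool might open the device like this (a hedged
      userspace sketch; the device path and the probing step are
      placeholders, not taken from any particular tool):

          #include <fcntl.h>
          #include <stdio.h>
          #include <unistd.h>

          int main(void)
          {
                  /* O_EXCL on a block device node requests an exclusive
                   * kernel-level claim, so probing can't run while the
                   * device is held for burning. */
                  int fd = open("/dev/sr0", O_RDONLY | O_NONBLOCK | O_EXCL);

                  if (fd < 0) {
                          perror("open /dev/sr0"); /* likely held exclusively */
                          return 1;
                  }
                  /* ... issue probing ioctls here ... */
                  close(fd);
                  return 0;
          }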
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Matthias-Christian Ott <ott@mirix.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • block: factor out bd_may_claim() · 1a3cbbc5
      Committed by Tejun Heo
      Factor out bd_may_claim() from bd_claim(), add comments, and apply a
      couple of cosmetic edits.  This prepares for further updates to the
      claim path.
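      A hedged sketch of what the factored-out predicate looks like,
      reconstructed from the holder rules bd_claim() implements (field names
      follow struct block_device; treat the details as assumptions):

          /* May (bdev, whole) be exclusively claimed by holder? */
          static bool bd_may_claim(struct block_device *bdev,
                                   struct block_device *whole, void *holder)
          {
                  if (bdev->bd_holder == holder)
                          return true;    /* already claimed by us */
                  if (bdev->bd_holder != NULL)
                          return false;   /* claimed by someone else */
                  if (bdev->bd_contains == bdev)
                          return true;    /* unclaimed whole device */
                  if (whole->bd_holder != NULL)
                          return false;   /* partition of a claimed device */
                  return true;            /* partition of an unclaimed device */
          }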
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blk-cgroup: config options re-arrangement · afc24d49
      Committed by Vivek Goyal
      This patch fixes a few usability and configurability issues.
      
      o All the cgroup-based controller options are configurable from the
        "General Setup/Control Group Support/" menu; blkio is the only
        exception.  Hence, make this option visible in that menu and
        configurable from there, bringing it in line with the rest of the
        cgroup-based controllers.
      
      o Get rid of CONFIG_DEBUG_CFQ_IOSCHED.
      
        This option currently does two things.
      
        - Enables printing of cgroup paths in blktrace.
        - Enables CONFIG_DEBUG_BLK_CGROUP, which in turn exposes additional
          stat files in the cgroup.
      
        If we are using group scheduling, blktrace data is not of much use
        when cgroup information is absent.  To get this data, one currently
        also has to enable CONFIG_DEBUG_CFQ_IOSCHED, which brings in the
        overhead of all the additional debug stat files, which is not desired.
      
        Hence, this patch moves printing of cgroup paths under
        CONFIG_CFQ_GROUP_IOSCHED.
      
        This allows us to get rid of CONFIG_DEBUG_CFQ_IOSCHED completely.
        Now all the debug stat files are controlled only by
        CONFIG_DEBUG_BLK_CGROUP, which can be enabled through the config menu.
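      The resulting Kconfig shape might look roughly as follows (a hedged
      sketch; menu text and dependencies are assumptions based on the
      description above):

          # blkio controller visible under General Setup/Control Group Support
          config BLK_CGROUP
                  bool "Block IO controller"
                  depends on CGROUPS && BLOCK

          # Debug stat files gated only by this option now
          config DEBUG_BLK_CGROUP
                  bool "Enable Block IO controller debugging"
                  depends on BLK_CGROUP

          # cgroup path printing in blktrace moves under this option
          config CFQ_GROUP_IOSCHED
                  bool "CFQ Group Scheduling support"
                  depends on IOSCHED_CFQ && BLK_CGROUP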
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Divyesh Shah <dpshah@google.com>
      Reviewed-by: Gui Jianfeng <guijianfeng@cn.fujitsu.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
    • blkio: Fix another BUG_ON() crash due to cfqq movement across groups · e5ff082e
      Committed by Vivek Goyal
      o Once in a while, I was hitting a BUG_ON() in the blkio code:
        empty_time assumed that upon slice expiry a group can't already be
        marked empty (except during forced dispatch).

        But this assumption is broken if a cfqq can move across groups
        (group_isolation=0) after receiving a request.

        Most likely, we got a request in a cfqq and accounted the rq in one
        group; later, while adding the cfqq to the tree, we moved the queue
        to a different group which was already marked empty, and after the
        dispatch slice we found the group already marked empty and raised
        the alarm.

        This patch does not error out if the group is already marked empty.
        This can introduce some empty_time stat error, but only in the
        group_isolation=0 case, which is better than crashing.  With
        group_isolation=1 we should still get the same stats as before this
        patch.
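        A hedged sketch of the tolerant check that replaces the BUG_ON()
        (identifiers follow blk-cgroup.c conventions; the exact code is
        illustrative):

            void blkiocg_set_start_empty_time(struct blkio_group *blkg)
            {
                    unsigned long flags;
                    struct blkio_group_stats *stats;

                    spin_lock_irqsave(&blkg->stats_lock, flags);
                    stats = &blkg->stats;
                    /* Was: BUG_ON(blkio_blkg_empty(stats));
                     * With group_isolation=0 a cfqq may have moved into a
                     * group already marked empty, so ignore the duplicate
                     * event instead of crashing. */
                    if (blkio_blkg_empty(stats)) {
                            spin_unlock_irqrestore(&blkg->stats_lock, flags);
                            return;
                    }
                    stats->start_empty_time = sched_clock();
                    blkio_mark_blkg_empty(stats);
                    spin_unlock_irqrestore(&blkg->stats_lock, flags);
            }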
      
      [  222.308546] ------------[ cut here ]------------
      [  222.309311] kernel BUG at block/blk-cgroup.c:236!
      [  222.309311] invalid opcode: 0000 [#1] SMP
      [  222.309311] last sysfs file: /sys/devices/virtual/block/dm-3/queue/scheduler
      [  222.309311] CPU 1
      [  222.309311] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      [  222.309311]
      [  222.309311] Pid: 4780, comm: fio Not tainted 2.6.34-rc4-blkio-config #68 0A98h/HP xw8600 Workstation
      [  222.309311] RIP: 0010:[<ffffffff8121ad88>]  [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83
      [  222.309311] RSP: 0018:ffff8800ba6e79f8  EFLAGS: 00010002
      [  222.309311] RAX: 0000000000000082 RBX: ffff8800a13b7990 RCX: ffff8800a13b7808
      [  222.309311] RDX: 0000000000002121 RSI: 0000000000000082 RDI: ffff8800a13b7a30
      [  222.309311] RBP: ffff8800ba6e7a18 R08: 0000000000000000 R09: 0000000000000001
      [  222.309311] R10: 000000000002f8c8 R11: ffff8800ba6e7ad8 R12: ffff8800a13b78ff
      [  222.309311] R13: ffff8800a13b7990 R14: 0000000000000001 R15: ffff8800a13b7808
      [  222.309311] FS:  00007f3beec476f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
      [  222.309311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  222.309311] CR2: 000000000040e7f0 CR3: 00000000a12d5000 CR4: 00000000000006e0
      [  222.309311] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [  222.309311] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [  222.309311] Process fio (pid: 4780, threadinfo ffff8800ba6e6000, task ffff8800b3d6bf00)
      [  222.309311] Stack:
      [  222.309311]  0000000000000001 ffff8800bab17a48 ffff8800bab17a48 ffff8800a13b7800
      [  222.309311] <0> ffff8800ba6e7a68 ffffffff8121da35 ffff880000000001 00ff8800ba5c5698
      [  222.309311] <0> ffff8800ba6e7a68 ffff8800a13b7800 0000000000000000 ffff8800bab17a48
      [  222.309311] Call Trace:
      [  222.309311]  [<ffffffff8121da35>] __cfq_slice_expired+0x2af/0x3ec
      [  222.309311]  [<ffffffff8121fd7b>] cfq_dispatch_requests+0x2c8/0x8e8
      [  222.309311]  [<ffffffff8120f1cd>] ? spin_unlock_irqrestore+0xe/0x10
      [  222.309311]  [<ffffffff8120fb1a>] ? blk_insert_cloned_request+0x70/0x7b
      [  222.309311]  [<ffffffff81210461>] blk_peek_request+0x191/0x1a7
      [  222.309311]  [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
      [  222.309311]  [<ffffffff810ae61f>] ? sync_page_killable+0x0/0x35
      [  222.309311]  [<ffffffff81210fd4>] __generic_unplug_device+0x32/0x37
      [  222.309311]  [<ffffffff81211274>] generic_unplug_device+0x2e/0x3c
      [  222.309311]  [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
      [  222.309311]  [<ffffffff8120ca37>] blk_unplug+0x29/0x2d
      [  222.309311]  [<ffffffff8120ca4d>] blk_backing_dev_unplug+0x12/0x14
      [  222.309311]  [<ffffffff81109a7a>] block_sync_page+0x35/0x39
      [  222.309311]  [<ffffffff810ae616>] sync_page+0x41/0x4a
      [  222.309311]  [<ffffffff810ae62d>] sync_page_killable+0xe/0x35
      [  222.309311]  [<ffffffff8158aa59>] __wait_on_bit_lock+0x46/0x8f
      [  222.309311]  [<ffffffff810ae4f5>] __lock_page_killable+0x66/0x6d
      [  222.309311]  [<ffffffff81056f9c>] ? wake_bit_function+0x0/0x33
      [  222.309311]  [<ffffffff810ae528>] lock_page_killable+0x2c/0x2e
      [  222.309311]  [<ffffffff810afbc5>] generic_file_aio_read+0x361/0x4f0
      [  222.309311]  [<ffffffff810ea044>] do_sync_read+0xcb/0x108
      [  222.309311]  [<ffffffff811e42f7>] ? security_file_permission+0x16/0x18
      [  222.309311]  [<ffffffff810ea6ab>] vfs_read+0xab/0x108
      [  222.309311]  [<ffffffff810ea7c8>] sys_read+0x4a/0x6e
      [  222.309311]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
      [  222.309311] Code: 58 01 00 00 00 48 89 c6 75 0a 48 83 bb 60 01 00 00 00 74 09 48 8d bb a0 00 00 00 eb 35 41 fe cc 74 0d f6 83 c0 01 00 00 04 74 04 <0f> 0b eb fe 48 89 75 e8 e8 be e0 de ff 66 83 8b c0 01 00 00 04
      [  222.309311] RIP  [<ffffffff8121ad88>] blkiocg_set_start_empty_time+0x50/0x83
      [  222.309311]  RSP <ffff8800ba6e79f8>
      [  222.309311] ---[ end trace 32b4f71dffc15712 ]---
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Acked-by: Divyesh Shah <dpshah@google.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  3. 21 April 2010, 1 commit
    • blkio: Fix blkio crash during rq stat update · 7f1dc8a2
      Committed by Vivek Goyal
      blkio + cfq was crashing even when two sequential readers were put in
      two separate cgroups (group_isolation=0).

      The reason is that a cfqq can migrate across groups depending on
      whether it is sync-noidle or not, so it can happen that at request
      insertion time the cfqq belonged to one cfqg, while at request
      dispatch time it belonged to the root group.  In this case per-cgroup
      request stats can go wrong, and it also runs into a BUG_ON().

      This patch stashes a cfq group pointer in the rq (see the sketch
      below) instead of relying on the cfqq->cfqg pointer alone for rq stat
      accounting.
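      A hedged sketch of the stashing approach (the elevator_private3 slot
      and the RQ_CFQG() macro are modeled on cfq-iosched.c conventions and
      should be treated as assumptions):

          /* Per-request record of the group the rq was accounted to. */
          #define RQ_CFQG(rq)  ((struct cfq_group *)(rq)->elevator_private3)

          /* At request setup: stash the current group in the rq
           * (the real code would also take a reference on it). */
          static void cfq_stash_cfqg(struct request *rq, struct cfq_queue *cfqq)
          {
                  rq->elevator_private3 = cfqq->cfqg;
          }

          /* At removal/dispatch: account against the stashed group;
           * cfqq->cfqg may point elsewhere if the queue migrated. */
          static void cfq_account_remove(struct request *rq)
          {
                  blkiocg_update_io_remove_stats(&RQ_CFQG(rq)->blkg,
                                                 rq_data_dir(rq),
                                                 rq_is_sync(rq));
          }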
      
      [   65.163523] ------------[ cut here ]------------
      [   65.164301] kernel BUG at block/blk-cgroup.c:117!
      [   65.164301] invalid opcode: 0000 [#1] SMP
      [   65.164301] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:60:00.1/host9/rport-9:0-0/target9:0:0/9:0:0:2/block/sde/stat
      [   65.164301] CPU 1
      [   65.164301] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      [   65.164301]
      [   65.164301] Pid: 4505, comm: fio Not tainted 2.6.34-rc4-blk-for-35 #34 0A98h/HP xw8600 Workstation
      [   65.164301] RIP: 0010:[<ffffffff8121924f>]  [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
      [   65.164301] RSP: 0018:ffff8800ba5a79e8  EFLAGS: 00010046
      [   65.164301] RAX: 0000000000000096 RBX: ffff8800bb268d60 RCX: 0000000000000000
      [   65.164301] RDX: ffff8800bb268eb8 RSI: 0000000000000000 RDI: ffff8800bb268e00
      [   65.164301] RBP: ffff8800ba5a7a08 R08: 0000000000000064 R09: 0000000000000001
      [   65.164301] R10: 0000000000079640 R11: ffff8800a0bd5bf0 R12: ffff8800bab4af01
      [   65.164301] R13: ffff8800bab4af00 R14: ffff8800bb1d8928 R15: 0000000000000000
      [   65.164301] FS:  00007f18f75056f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
      [   65.164301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   65.164301] CR2: 000000000040e7f0 CR3: 00000000ba52b000 CR4: 00000000000006e0
      [   65.164301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   65.164301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   65.164301] Process fio (pid: 4505, threadinfo ffff8800ba5a6000, task ffff8800ba45ae80)
      [   65.164301] Stack:
      [   65.164301]  ffff8800ba5a7a08 ffff8800ba722540 ffff8800bab4af68 ffff8800bab4af68
      [   65.164301] <0> ffff8800ba5a7a38 ffffffff8121d814 ffff8800ba722540 ffff8800bab4af68
      [   65.164301] <0> ffff8800ba722540 ffff8800a08f6800 ffff8800ba5a7a68 ffffffff8121d8ca
      [   65.164301] Call Trace:
      [   65.164301]  [<ffffffff8121d814>] cfq_remove_request+0xe4/0x116
      [   65.164301]  [<ffffffff8121d8ca>] cfq_dispatch_insert+0x84/0xe1
      [   65.164301]  [<ffffffff8121e833>] cfq_dispatch_requests+0x767/0x8e8
      [   65.164301]  [<ffffffff8120e524>] ? submit_bio+0xc3/0xcc
      [   65.164301]  [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
      [   65.164301]  [<ffffffff8120ea8d>] blk_peek_request+0x191/0x1a7
      [   65.164301]  [<ffffffffa000109c>] ? dm_get_live_table+0x44/0x4f [dm_mod]
      [   65.164301]  [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
      [   65.164301]  [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
      [   65.164301]  [<ffffffff8120f600>] __generic_unplug_device+0x32/0x37
      [   65.164301]  [<ffffffff8120f8a0>] generic_unplug_device+0x2e/0x3c
      [   65.164301]  [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
      [   65.164301]  [<ffffffff8120b063>] blk_unplug+0x29/0x2d
      [   65.164301]  [<ffffffff8120b079>] blk_backing_dev_unplug+0x12/0x14
      [   65.164301]  [<ffffffff81108a82>] block_sync_page+0x35/0x39
      [   65.164301]  [<ffffffff810ad64e>] sync_page+0x41/0x4a
      [   65.164301]  [<ffffffff810ad665>] sync_page_killable+0xe/0x35
      [   65.164301]  [<ffffffff81589027>] __wait_on_bit_lock+0x46/0x8f
      [   65.164301]  [<ffffffff810ad52d>] __lock_page_killable+0x66/0x6d
      [   65.164301]  [<ffffffff81055fd4>] ? wake_bit_function+0x0/0x33
      [   65.164301]  [<ffffffff810ad560>] lock_page_killable+0x2c/0x2e
      [   65.164301]  [<ffffffff810aebfd>] generic_file_aio_read+0x361/0x4f0
      [   65.164301]  [<ffffffff810e906c>] do_sync_read+0xcb/0x108
      [   65.164301]  [<ffffffff811e32a3>] ? security_file_permission+0x16/0x18
      [   65.164301]  [<ffffffff810e96d3>] vfs_read+0xab/0x108
      [   65.164301]  [<ffffffff810e97f0>] sys_read+0x4a/0x6e
      [   65.164301]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
      [   65.164301] Code: 00 74 1c 48 8b 8b 60 01 00 00 48 85 c9 75 04 0f 0b eb fe 48 ff c9 48 89 8b 60 01 00 00 eb 1a 48 8b 8b 58 01 00 00 48 85 c9 75 04 <0f> 0b eb fe 48 ff c9 48 89 8b 58 01 00 00 45 84 e4 74 16 48 8b
      [   65.164301] RIP  [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
      [   65.164301]  RSP <ffff8800ba5a79e8>
      [   65.164301] ---[ end trace 1b2b828753032e68 ]---
      Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  4. 16 April 2010, 1 commit
  5. 15 April 2010, 2 commits
  6. 14 April 2010, 3 commits
  7. 13 April 2010, 24 commits