1. 29 4月, 2010 2 次提交
  2. 28 4月, 2010 1 次提交
  3. 27 4月, 2010 1 次提交
    • T
      block: implement bd_claiming and claiming block · 6b4517a7
      Tejun Heo 提交于
      Currently, device claiming for exclusive open is done after low level
      open - disk->fops->open() - has completed successfully.  This means
      that exclusive open attempts while a device is already exclusively
      open will fail only after disk->fops->open() is called.
      
      cdrom driver issues commands during open() which means that O_EXCL
      open attempt can unintentionally inject commands to in-progress
      command stream for burning thus disturbing burning process.  In most
      cases, this doesn't cause problems because the first command to be
      issued is TUR which most devices can process in the middle of burning.
      However, depending on how a device replies to TUR during burning,
      cdrom driver may end up issuing further commands.
      
      This can't be resolved trivially by moving bd_claim() before doing
      actual open() because that means an open attempt which will end up
      failing could interfere other legit O_EXCL open attempts.
      ie. unconfirmed open attempts can fail others.
      
      This patch resolves the problem by introducing claiming block which is
      started by bd_start_claiming() and terminated either by bd_claim() or
      bd_abort_claiming().  bd_claim() from inside a claiming block is
      guaranteed to succeed and once a claiming block is started, other
      bd_start_claiming() or bd_claim() attempts block till the current
      claiming block is terminated.
      
      bd_claim() can still be used standalone although now it always
      synchronizes against claiming blocks, so the existing users will keep
      working without any change.
      
      blkdev_open() and open_bdev_exclusive() are converted to use claiming
      blocks so that exclusive open attempts from these functions don't
      interfere with the existing exclusive open.
      
      This problem was discovered while investigating bko#15403.
      
        https://bugzilla.kernel.org/show_bug.cgi?id=15403
      
      The burning problem itself can be resolved by updating userspace
      probing tools to always open w/ O_EXCL.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NMatthias-Christian Ott <ott@mirix.org>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      6b4517a7
  4. 25 4月, 2010 2 次提交
  5. 24 4月, 2010 1 次提交
  6. 22 4月, 2010 4 次提交
  7. 21 4月, 2010 1 次提交
    • V
      blkio: Fix blkio crash during rq stat update · 7f1dc8a2
      Vivek Goyal 提交于
      blkio + cfq was crashing even when two sequential readers were put in two
      separate cgroups (group_isolation=0).
      
      The reason being that cfqq can migrate across groups based on its being
      sync-noidle or not, it can happen that at request insertion time, cfqq
      belonged to one cfqg and at request dispatch time, it belonged to root
      group. In this case request stats per cgroup can go wrong and it also runs
      into BUG_ON().
      
      This patch implements rq stashing away a cfq group pointer and not relying
      on cfqq->cfqg pointer alone for rq stat accounting.
      
      [   65.163523] ------------[ cut here ]------------
      [   65.164301] kernel BUG at block/blk-cgroup.c:117!
      [   65.164301] invalid opcode: 0000 [#1] SMP
      [   65.164301] last sysfs file: /sys/devices/pci0000:00/0000:00:05.0/0000:60:00.1/host9/rport-9:0-0/target9:0:0/9:0:0:2/block/sde/stat
      [   65.164301] CPU 1
      [   65.164301] Modules linked in: dm_round_robin dm_multipath qla2xxx scsi_transport_fc dm_zero dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
      [   65.164301]
      [   65.164301] Pid: 4505, comm: fio Not tainted 2.6.34-rc4-blk-for-35 #34 0A98h/HP xw8600 Workstation
      [   65.164301] RIP: 0010:[<ffffffff8121924f>]  [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
      [   65.164301] RSP: 0018:ffff8800ba5a79e8  EFLAGS: 00010046
      [   65.164301] RAX: 0000000000000096 RBX: ffff8800bb268d60 RCX: 0000000000000000
      [   65.164301] RDX: ffff8800bb268eb8 RSI: 0000000000000000 RDI: ffff8800bb268e00
      [   65.164301] RBP: ffff8800ba5a7a08 R08: 0000000000000064 R09: 0000000000000001
      [   65.164301] R10: 0000000000079640 R11: ffff8800a0bd5bf0 R12: ffff8800bab4af01
      [   65.164301] R13: ffff8800bab4af00 R14: ffff8800bb1d8928 R15: 0000000000000000
      [   65.164301] FS:  00007f18f75056f0(0000) GS:ffff880001e40000(0000) knlGS:0000000000000000
      [   65.164301] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   65.164301] CR2: 000000000040e7f0 CR3: 00000000ba52b000 CR4: 00000000000006e0
      [   65.164301] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [   65.164301] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      [   65.164301] Process fio (pid: 4505, threadinfo ffff8800ba5a6000, task ffff8800ba45ae80)
      [   65.164301] Stack:
      [   65.164301]  ffff8800ba5a7a08 ffff8800ba722540 ffff8800bab4af68 ffff8800bab4af68
      [   65.164301] <0> ffff8800ba5a7a38 ffffffff8121d814 ffff8800ba722540 ffff8800bab4af68
      [   65.164301] <0> ffff8800ba722540 ffff8800a08f6800 ffff8800ba5a7a68 ffffffff8121d8ca
      [   65.164301] Call Trace:
      [   65.164301]  [<ffffffff8121d814>] cfq_remove_request+0xe4/0x116
      [   65.164301]  [<ffffffff8121d8ca>] cfq_dispatch_insert+0x84/0xe1
      [   65.164301]  [<ffffffff8121e833>] cfq_dispatch_requests+0x767/0x8e8
      [   65.164301]  [<ffffffff8120e524>] ? submit_bio+0xc3/0xcc
      [   65.164301]  [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
      [   65.164301]  [<ffffffff8120ea8d>] blk_peek_request+0x191/0x1a7
      [   65.164301]  [<ffffffffa000109c>] ? dm_get_live_table+0x44/0x4f [dm_mod]
      [   65.164301]  [<ffffffffa0002799>] dm_request_fn+0x38/0x14c [dm_mod]
      [   65.164301]  [<ffffffff810ad657>] ? sync_page_killable+0x0/0x35
      [   65.164301]  [<ffffffff8120f600>] __generic_unplug_device+0x32/0x37
      [   65.164301]  [<ffffffff8120f8a0>] generic_unplug_device+0x2e/0x3c
      [   65.164301]  [<ffffffffa00011a6>] dm_unplug_all+0x42/0x5b [dm_mod]
      [   65.164301]  [<ffffffff8120b063>] blk_unplug+0x29/0x2d
      [   65.164301]  [<ffffffff8120b079>] blk_backing_dev_unplug+0x12/0x14
      [   65.164301]  [<ffffffff81108a82>] block_sync_page+0x35/0x39
      [   65.164301]  [<ffffffff810ad64e>] sync_page+0x41/0x4a
      [   65.164301]  [<ffffffff810ad665>] sync_page_killable+0xe/0x35
      [   65.164301]  [<ffffffff81589027>] __wait_on_bit_lock+0x46/0x8f
      [   65.164301]  [<ffffffff810ad52d>] __lock_page_killable+0x66/0x6d
      [   65.164301]  [<ffffffff81055fd4>] ? wake_bit_function+0x0/0x33
      [   65.164301]  [<ffffffff810ad560>] lock_page_killable+0x2c/0x2e
      [   65.164301]  [<ffffffff810aebfd>] generic_file_aio_read+0x361/0x4f0
      [   65.164301]  [<ffffffff810e906c>] do_sync_read+0xcb/0x108
      [   65.164301]  [<ffffffff811e32a3>] ? security_file_permission+0x16/0x18
      [   65.164301]  [<ffffffff810e96d3>] vfs_read+0xab/0x108
      [   65.164301]  [<ffffffff810e97f0>] sys_read+0x4a/0x6e
      [   65.164301]  [<ffffffff81002b5b>] system_call_fastpath+0x16/0x1b
      [   65.164301] Code: 00 74 1c 48 8b 8b 60 01 00 00 48 85 c9 75 04 0f 0b eb fe 48 ff c9 48 89 8b 60 01 00 00 eb 1a 48 8b 8b 58 01 00 00 48 85 c9 75 04 <0f> 0b eb fe 48 ff c9 48 89 8b 58 01 00 00 45 84 e4 74 16 48 8b
      [   65.164301] RIP  [<ffffffff8121924f>] blkiocg_update_io_remove_stats+0x5b/0xaf
      [   65.164301]  RSP <ffff8800ba5a79e8>
      [   65.164301] ---[ end trace 1b2b828753032e68 ]---
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      7f1dc8a2
  8. 20 4月, 2010 2 次提交
  9. 19 4月, 2010 2 次提交
  10. 16 4月, 2010 1 次提交
  11. 15 4月, 2010 1 次提交
  12. 14 4月, 2010 2 次提交
    • D
      rcu: Better explain the condition parameter of rcu_dereference_check() · c08c68dd
      David Howells 提交于
      Better explain the condition parameter of
      rcu_dereference_check() that describes the conditions under
      which the dereference is permitted to take place (and
      incorporate Yong Zhang's suggestion).  This condition is only
      checked under lockdep proving.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: eric.dumazet@gmail.com
      LKML-Reference: <1270852752-25278-2-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      c08c68dd
    • P
      rcu: Add rcu_access_pointer and rcu_dereference_protected · b62730ba
      Paul E. McKenney 提交于
      This patch adds variants of rcu_dereference() that handle
      situations where the RCU-protected data structure cannot change,
      perhaps due to our holding the update-side lock, or where the
      RCU-protected pointer is only to be fetched, not dereferenced.
      These are needed due to some performance concerns with using
      rcu_dereference() where it is not required, aside from the need
      for lockdep/sparse checking.
      
      The new rcu_access_pointer() primitive is for the case where the
      pointer is be fetch and not dereferenced.  This primitive may be
      used without protection, RCU or otherwise, due to the fact that
      it uses ACCESS_ONCE().
      
      The new rcu_dereference_protected() primitive is for the case
      where updates are prevented, for example, due to holding the
      update-side lock.  This primitive does neither ACCESS_ONCE() nor
      smp_read_barrier_depends(), so can only be used when updates are
      somehow prevented.
      Suggested-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: laijs@cn.fujitsu.com
      Cc: dipankar@in.ibm.com
      Cc: mathieu.desnoyers@polymtl.ca
      Cc: josh@joshtriplett.org
      Cc: dvhltc@us.ibm.com
      Cc: niv@us.ibm.com
      Cc: peterz@infradead.org
      Cc: rostedt@goodmis.org
      Cc: Valdis.Kletnieks@vt.edu
      Cc: dhowells@redhat.com
      Cc: eric.dumazet@gmail.com
      LKML-Reference: <1270852752-25278-1-git-send-email-paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b62730ba
  13. 12 4月, 2010 1 次提交
    • T
      NFSv4: fix delegated locking · 0df5dd4a
      Trond Myklebust 提交于
      Arnaud Giersch reports that NFSv4 locking is broken when we hold a
      delegation since commit 8e469ebd (NFSv4:
      Don't allow posix locking against servers that don't support it).
      
      According to Arnaud, the lock succeeds the first time he opens the file
      (since we cannot do a delegated open) but then fails after we start using
      delegated opens.
      
      The following patch fixes it by ensuring that locking behaviour is
      governed by a per-filesystem capability flag that is initially set, but
      gets cleared if the server ever returns an OPEN without the
      NFS4_OPEN_RESULT_LOCKTYPE_POSIX flag being set.
      Reported-by: NArnaud Giersch <arnaud.giersch@iut-bm.univ-fcomte.fr>
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Cc: stable@kernel.org
      0df5dd4a
  14. 10 4月, 2010 4 次提交
  15. 09 4月, 2010 3 次提交
    • D
      blkio: Add io_merged stat · 812d4026
      Divyesh Shah 提交于
      This includes both the number of bios merged into requests belonging to this
      cgroup as well as the number of requests merged together.
      In the past, we've observed different merging behavior across upstream kernels,
      some by design some actual bugs. This stat helps a lot in debugging such
      problems when applications report decreased throughput with a new kernel
      version.
      
      This needed adding an extra elevator function to capture bios being merged as I
      did not want to pollute elevator code with blkiocg knowledge and hence needed
      the accounting invocation to come from CFQ.
      
      Signed-off-by: Divyesh Shah<dpshah@google.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      812d4026
    • D
      blkio: Changes to IO controller additional stats patches · 84c124da
      Divyesh Shah 提交于
      that include some minor fixes and addresses all comments.
      
      Changelog: (most based on Vivek Goyal's comments)
      o renamed blkiocg_reset_write to blkiocg_reset_stats
      o more clarification in the documentation on io_service_time and io_wait_time
      o Initialize blkg->stats_lock
      o rename io_add_stat to blkio_add_stat and declare it static
      o use bool for direction and sync
      o derive direction and sync info from existing rq methods
      o use 12 for major:minor string length
      o define io_service_time better to cover the NCQ case
      o add a separate reset_stats interface
      o make the indexed stats a 2d array to simplify macro and function pointer code
      o blkio.time now exports in jiffies as before
      o Added stats description in patch description and
        Documentation/cgroup/blkio-controller.txt
      o Prefix all stats functions with blkio and make them static as applicable
      o replace IO_TYPE_MAX with IO_TYPE_TOTAL
      o Moved #define constant to top of blk-cgroup.c
      o Pass dev_t around instead of char *
      o Add note to documentation file about resetting stats
      o use BLK_CGROUP_MODULE in addition to BLK_CGROUP config option in #ifdef
        statements
      o Avoid struct request specific knowledge in blk-cgroup. blk-cgroup.h now has
        rq_direction() and rq_sync() functions which are used by CFQ and when using
        io-controller at a higher level, bio_* functions can be added.
      
      Signed-off-by: Divyesh Shah<dpshah@google.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      84c124da
    • M
      libata: Fix accesses at LBA28 boundary (old bug, but nasty) (v2) · 45c4d015
      Mark Lord 提交于
      Most drives from Seagate, Hitachi, and possibly other brands,
      do not allow LBA28 access to sector number 0x0fffffff (2^28 - 1).
      So instead use LBA48 for such accesses.
      
      This bug could bite a lot of systems, especially when the user has
      taken care to align partitions to 4KB boundaries. On misaligned systems,
      it is less likely to be encountered, since a 4KB read would end at
      0x10000000 rather than at 0x0fffffff.
      Signed-off-by: NMark Lord <mlord@pobox.com>
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      45c4d015
  16. 08 4月, 2010 1 次提交
  17. 07 4月, 2010 7 次提交
  18. 06 4月, 2010 4 次提交
    • T
      libata: unlock HPA if device shrunk · 445d211b
      Tejun Heo 提交于
      Some BIOSes don't configure HPA during boot but do so while resuming.
      This causes harddrives to shrink during resume making libata detach
      and reattach them.  This can be worked around by unlocking HPA if old
      size equals native size.
      
      Add ATA_DFLAG_UNLOCK_HPA so that HPA unlocking can be controlled
      per-device and update ata_dev_revalidate() such that it sets
      ATA_DFLAG_UNLOCK_HPA and fails with -EIO when the above condition is
      detected.
      
      This patch fixes the following bug.
      
        https://bugzilla.kernel.org/show_bug.cgi?id=15396Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NOleksandr Yermolenko <yaa.bta@gmail.com>
      Signed-off-by: NJeff Garzik <jgarzik@redhat.com>
      445d211b
    • M
      laptop-mode: Make flushes per-device · 31373d09
      Matthew Garrett 提交于
      One of the features of laptop-mode is that it forces a writeout of dirty
      pages if something else triggers a physical read or write from a device.
      The current implementation flushes pages on all devices, rather than only
      the one that triggered the flush. This patch alters the behaviour so that
      only the recently accessed block device is flushed, preventing other
      disks being spun up for no terribly good reason.
      Signed-off-by: NMatthew Garrett <mjg@redhat.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      31373d09
    • H
      Input: matrix_keypad - allow platform to disable key autorepeat · 9d32c305
      H Hartley Sweeten 提交于
      In an embedded system the matrix_keypad driver might be used to
      interface with an external control panel and not an actual keyboard.
      On the control panel some of the keys could be used to turn on/off
      various functions.  If key autorepeat is enabled this causes the
      function to quickly toggle between the on and off states and makes
      operation difficult.
      
      Add an option in the platform-specific data to disable the key
      autorepeat.
      Signed-off-by: NH Hartley Sweeten <hsweeten@visionengravers.com>
      Signed-off-by: NDmitry Torokhov <dtor@mail.ru>
      9d32c305
    • N
      Fix up possibly racy module refcounting · 5fbfb18d
      Nick Piggin 提交于
      Module refcounting is implemented with a per-cpu counter for speed.
      However there is a race when tallying the counter where a reference may
      be taken by one CPU and released by another.  Reference count summation
      may then see the decrement without having seen the previous increment,
      leading to lower than expected count.  A module which never has its
      actual reference drop below 1 may return a reference count of 0 due to
      this race.
      
      Module removal generally runs under stop_machine, which prevents this
      race causing bugs due to removal of in-use modules.  However there are
      other real bugs in module.c code and driver code (module_refcount is
      exported) where the callers do not run under stop_machine.
      
      Fix this by maintaining running per-cpu counters for the number of
      module refcount increments and the number of refcount decrements.  The
      increments are tallied after the decrements, so any decrement seen will
      always have its corresponding increment counted.  The final refcount is
      the difference of the total increments and decrements, preventing a
      low-refcount from being returned.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Acked-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5fbfb18d