1. 09 2月, 2012 1 次提交
  2. 08 2月, 2012 2 次提交
    • T
      block: don't call elevator callbacks for plug merges · 07c2bd37
      Tejun Heo 提交于
      Plug merge calls two elevator callbacks outside queue lock -
      elevator_allow_merge_fn() and elevator_bio_merged_fn().  Although
      attempt_plug_merge() suggests that elevator is guaranteed to be there
      through the existing request on the plug list, nothing prevents plug
      merge from calling into dying or initializing elevator.
      
      For regular merges, bypass ensures elvpriv count to reach zero, which
      in turn prevents merges as all !ELVPRIV requests get REQ_SOFTBARRIER
      from forced back insertion.  Plug merge doesn't check ELVPRIV, and, as
      the requests haven't gone through elevator insertion yet, it doesn't
      have SOFTBARRIER set allowing merges on a bypassed queue.
      
      This, for example, leads to the following crash during elevator
      switch.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
       IP: [<ffffffff813b34e9>] cfq_allow_merge+0x49/0xa0
       PGD 112cbc067 PUD 115d5c067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
       CPU 1
       Modules linked in: deadline_iosched
      
       Pid: 819, comm: dd Not tainted 3.3.0-rc2-work+ #76 Bochs Bochs
       RIP: 0010:[<ffffffff813b34e9>]  [<ffffffff813b34e9>] cfq_allow_merge+0x49/0xa0
       RSP: 0018:ffff8801143a38f8  EFLAGS: 00010297
       RAX: 0000000000000000 RBX: ffff88011817ce28 RCX: ffff880116eb6cc0
       RDX: 0000000000000000 RSI: ffff880118056e20 RDI: ffff8801199512f8
       RBP: ffff8801143a3908 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000000 R12: ffff880118195708
       R13: ffff880118052aa0 R14: ffff8801143a3d50 R15: ffff880118195708
       FS:  00007f19f82cb700(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000008 CR3: 0000000112c6a000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process dd (pid: 819, threadinfo ffff8801143a2000, task ffff880116eb6cc0)
       Stack:
        ffff88011817ce28 ffff880118195708 ffff8801143a3928 ffffffff81391bba
        ffff88011817ce28 ffff880118195708 ffff8801143a3948 ffffffff81391bf1
        ffff88011817ce28 0000000000000000 ffff8801143a39a8 ffffffff81398e3e
       Call Trace:
        [<ffffffff81391bba>] elv_rq_merge_ok+0x4a/0x60
        [<ffffffff81391bf1>] elv_try_merge+0x21/0x40
        [<ffffffff81398e3e>] blk_queue_bio+0x8e/0x390
        [<ffffffff81396a5a>] generic_make_request+0xca/0x100
        [<ffffffff81396b04>] submit_bio+0x74/0x100
        [<ffffffff811d45c2>] __blockdev_direct_IO+0x1ce2/0x3450
        [<ffffffff811d0dc7>] blkdev_direct_IO+0x57/0x60
        [<ffffffff811460b5>] generic_file_aio_read+0x6d5/0x760
        [<ffffffff811986b2>] do_sync_read+0xe2/0x120
        [<ffffffff81199345>] vfs_read+0xc5/0x180
        [<ffffffff81199501>] sys_read+0x51/0x90
        [<ffffffff81aeac12>] system_call_fastpath+0x16/0x1b
      
      There are multiple ways to fix this including making plug merge check
      ELVPRIV; however,
      
      * Calling into elevator outside queue lock is confusing and
        error-prone.
      
      * Requests on plug list aren't known to the elevator.  They aren't on
        the elevator yet, so there's no elevator specific state to update.
      
      * Given the nature of plug merges - collecting bio's for the same
        purpose from the same issuer - elevator specific restrictions aren't
        applicable.
      
      So, simply don't call into elevator methods from plug merge by moving
      elv_bio_merged() from bio_attempt_*_merge() to blk_queue_bio(), and
      using blk_try_merge() in attempt_plug_merge().
      
      This is based on Jens' patch to skip elevator_allow_merge_fn() from
      plug merge.
      
      Note that this makes per-cgroup merged stats skip plug merging.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      LKML-Reference: <4F16F3CA.90904@kernel.dk>
      Original-patch-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      07c2bd37
    • T
      block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions · 050c8ea8
      Tejun Heo 提交于
      blk_rq_merge_ok() is the elevator-neutral part of merge eligibility
      test.  blk_try_merge() determines merge direction and expects the
      caller to have tested elv_rq_merge_ok() previously.
      
      elv_rq_merge_ok() now wraps blk_rq_merge_ok() and then calls
      elv_iosched_allow_merge().  elv_try_merge() is removed and the two
      callers are updated to call elv_rq_merge_ok() explicitly followed by
      blk_try_merge().  While at it, make rq_merge_ok() functions return
      bool.
      
      This is to prepare for plug merge update and doesn't introduce any
      behavior change.
      
      This is based on Jens' patch to skip elevator_allow_merge_fn() from
      plug merge.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      LKML-Reference: <4F16F3CA.90904@kernel.dk>
      Original-patch-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      050c8ea8
  3. 07 2月, 2012 1 次提交
    • T
      block: strip out locking optimization in put_io_context() · 11a3122f
      Tejun Heo 提交于
      put_io_context() performed a complex trylock dancing to avoid
      deferring ioc release to workqueue.  It was also broken on UP because
      trylock was always assumed to succeed which resulted in unbalanced
      preemption count.
      
      While there are ways to fix the UP breakage, even the most
      pathological microbench (forced ioc allocation and tight fork/exit
      loop) fails to show any appreciable performance benefit of the
      optimization.  Strip it out.  If there turns out to be workloads which
      are affected by this change, simpler optimization from the discussion
      thread can be applied later.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      LKML-Reference: <1328514611.21268.66.camel@sli10-conroe>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      11a3122f
  4. 18 1月, 2012 19 次提交
  5. 17 1月, 2012 3 次提交
  6. 16 1月, 2012 1 次提交
    • P
      netfilter: revert user-space expectation helper support · 9bf04646
      Pablo Neira Ayuso 提交于
      This patch partially reverts:
      3d058d7b netfilter: rework user-space expectation helper support
      that was applied during the 3.2 development cycle.
      
      After this patch, the tree remains just like before patch bc01befd,
      that initially added the preliminary infrastructure.
      
      I decided to partially revert this patch because the approach
      that I proposed to resolve this problem is broken in NAT setups.
      Moreover, a new infrastructure will be submitted for the 3.3.x
      development cycle that resolve the existing issues while
      providing a neat solution.
      
      Since nobody has been seriously using this infrastructure in
      user-space, the removal of this feature should affect any know
      FOSS project (to my knowledge).
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      9bf04646
  7. 15 1月, 2012 2 次提交
    • P
      block: fail SCSI passthrough ioctls on partition devices · 0bfc96cb
      Paolo Bonzini 提交于
      Linux allows executing the SG_IO ioctl on a partition or LVM volume, and
      will pass the command to the underlying block device.  This is
      well-known, but it is also a large security problem when (via Unix
      permissions, ACLs, SELinux or a combination thereof) a program or user
      needs to be granted access only to part of the disk.
      
      This patch lets partitions forward a small set of harmless ioctls;
      others are logged with printk so that we can see which ioctls are
      actually sent.  In my tests only CDROM_GET_CAPABILITY actually occurred.
      Of course it was being sent to a (partition on a) hard disk, so it would
      have failed with ENOTTY and the patch isn't changing anything in
      practice.  Still, I'm treating it specially to avoid spamming the logs.
      
      In principle, this restriction should include programs running with
      CAP_SYS_RAWIO.  If for example I let a program access /dev/sda2 and
      /dev/sdb, it still should not be able to read/write outside the
      boundaries of /dev/sda2 independent of the capabilities.  However, for
      now programs with CAP_SYS_RAWIO will still be allowed to send the
      ioctls.  Their actions will still be logged.
      
      This patch does not affect the non-libata IDE driver.  That driver
      however already tests for bd != bd->bd_contains before issuing some
      ioctl; it could be restricted further to forbid these ioctls even for
      programs running with CAP_SYS_ADMIN/CAP_SYS_RAWIO.
      
      Cc: linux-scsi@vger.kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: James Bottomley <JBottomley@parallels.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      [ Make it also print the command name when warning - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0bfc96cb
    • P
      block: add and use scsi_blk_cmd_ioctl · 577ebb37
      Paolo Bonzini 提交于
      Introduce a wrapper around scsi_cmd_ioctl that takes a block device.
      
      The function will then be enhanced to detect partition block devices
      and, in that case, subject the ioctls to whitelisting.
      
      Cc: linux-scsi@vger.kernel.org
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: James Bottomley <JBottomley@parallels.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      577ebb37
  8. 14 1月, 2012 1 次提交
  9. 13 1月, 2012 10 次提交