1. 07 9月, 2009 1 次提交
    • R
      IB/mad: Fix possible lock-lock-timer deadlock · 6b2eef8f
      Roland Dreier 提交于
      Lockdep reported a possible deadlock with cm_id_priv->lock,
      mad_agent_priv->lock and mad_agent_priv->timed_work.timer; this
      happens because the mad module does
      
      	cancel_delayed_work(&mad_agent_priv->timed_work);
      
      while holding mad_agent_priv->lock.  cancel_delayed_work() internally
      does del_timer_sync(&mad_agent_priv->timed_work.timer).
      
      This can turn into a deadlock because mad_agent_priv->lock is taken
      inside cm_id_priv->lock, so we can get the following set of contexts
      that deadlock each other:
      
       A: holding cm_id_priv->lock, waiting for mad_agent_priv->lock
       B: holding mad_agent_priv->lock, waiting for del_timer_sync()
       C: interrupt during mad_agent_priv->timed_work.timer that takes
          cm_id_priv->lock
      
      Fix this by using the new __cancel_delayed_work() interface (which
      internally does del_timer() instead of del_timer_sync()) in all the
      places where we are holding a lock.
      
      Addresses: http://bugzilla.kernel.org/show_bug.cgi?id=13757Reported-by: NBart Van Assche <bart.vanassche@gmail.com>
      Signed-off-by: NRoland Dreier <rolandd@cisco.com>
      6b2eef8f
  2. 06 9月, 2009 28 次提交
  3. 05 9月, 2009 11 次提交
    • S
      firewire: sbp2: fix freeing of unallocated memory · baed6b82
      Stefan Richter 提交于
      If a target writes invalid status (typically status of a command that
      already timed out), firewire-sbp2 attempts to put away an ORB that
      doesn't exist.  https://bugzilla.redhat.com/show_bug.cgi?id=519772Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      baed6b82
    • S
      firewire: ohci: fix Ricoh R5C832, video reception · 4fe0badd
      Stefan Richter 提交于
      In dual-buffer DMA mode, no video frames are ever received from R5C832
      by libdc1394.  Fallback to packet-per-buffer DMA works reliably.
      http://thread.gmane.org/gmane.linux.kernel.firewire.devel/13393/focus=13476Reported-by: NJonathan Cameron <jic23@cam.ac.uk>
      Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      4fe0badd
    • S
      firewire: ohci: fix Agere FW643 and multiple cameras · fc383796
      Stefan Richter 提交于
      An Agere FW643 OHCI 1.1 card works fine for video reception from one
      camera but fails early if receiving from two cameras.  After a short
      while, no IR IRQ events occur and the context control register does not
      react anymore.  This happens regardless whether both IR DMA contexts are
      dual-buffer or one is dual-buffer and the other packet-per-buffer.
      
      This can be worked around by disabling dual buffer DMA mode entirely.
      http://sourceforge.net/mailarchive/message.php?msg_name=4A7C0594.2020208%40gmail.com
      (Reported by Samuel Audet.)
      
      In another report (by Jonathan Cameron), an FW643 works OK with two
      cameras in dual buffer mode.  Whether this is due to different chip
      revisions or different usage patterns (different video formats) is not
      yet clear.  However, as far as the current capabilities of
      firewire-core's isochronous I/O interface are concerned, simply
      switching off dual-buffer on non-working and working FW643s alike is not
      a problem in practice.  We only need to revisit this issue if we are
      going to enhance the interface, e.g. so that applications can explicitly
      choose modes.
      Reported-by: NSamuel Audet <samuel.audet@gmail.com>
      Reported-by: NJonathan Cameron <jic23@cam.ac.uk>
      Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      fc383796
    • S
      firewire: core: fix crash in iso resource management · 1821bc19
      Stefan Richter 提交于
      This fixes a regression due to post 2.6.30 commit "firewire: core: do
      not DMA-map stack addresses" 6fdc0370.
      
      As David Moore noted, a previously correct sizeof() expression became
      wrong since the commit changed its argument from an array to a pointer.
      This resulted in an oops in ohci_cancel_packet in the shared workqueue
      thread's context when an isochronous resource was to be freed.
      Reported-by: NJonathan Cameron <jic23@cam.ac.uk>
      Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      1821bc19
    • S
      ocfs2: ocfs2_write_begin_nolock() should handle len=0 · 8379e7c4
      Sunil Mushran 提交于
      Bug introduced by mainline commit e7432675
      The bug causes ocfs2_write_begin_nolock() to oops when len=0.
      Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com>
      Cc: stable@kernel.org
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      8379e7c4
    • M
      dm snapshot: fix on disk chunk size validation · ae0b7448
      Mikulas Patocka 提交于
      Fix some problems seen in the chunk size processing when activating a
      pre-existing snapshot.
      
      For a new snapshot, the chunk size can either be supplied by the creator
      or a default value can be used.  For an existing snapshot, the
      chunk size in the snapshot header on disk should always be used.
      
      If someone attempts to load an existing snapshot and has the 'default
      chunk size' option set, the kernel uses its default value even when it
      is incorrect for the snapshot being loaded.  This patch ensures the
      correct on-disk value is always used.
      
      Secondly, when the code does use the chunk size stored on the disk it is
      prudent to revalidate it, so the code can exit cleanly if it got
      corrupted as happened in
      https://bugzilla.redhat.com/show_bug.cgi?id=461506 .
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      ae0b7448
    • M
      dm exception store: split set_chunk_size · 2defcc3f
      Mikulas Patocka 提交于
      Break the function set_chunk_size to two functions in preparation for
      the fix in the following patch.
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      2defcc3f
    • M
      dm snapshot: fix header corruption race on invalidation · 61578dcd
      Mikulas Patocka 提交于
      If a persistent snapshot fills up, a race can corrupt the on-disk header
      which causes a crash on any future attempt to activate the snapshot
      (typically while booting).  This patch fixes the race.
      
      When the snapshot overflows, __invalidate_snapshot is called, which calls
      snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that
      calls write_header. write_header constructs the new header in the "area"
      location.
      
      Concurrently, an existing kcopyd job may finish, call copy_callback
      and commit_exception method, that goes to persistent_commit_exception.
      persistent_commit_exception doesn't do locking, relying on the fact that
      callbacks are single-threaded, but it can race with snapshot invalidation and
      overwrite the header that is just being written while the snapshot is being
      invalidated.
      
      The result of this race is a corrupted header being written that can
      lead to a crash on further reactivation (if chunk_size is zero in the
      corrupted header).
      
      The fix is to use separate memory areas for each.
      
      See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      61578dcd
    • M
      dm snapshot: refactor zero_disk_area to use chunk_io · 02d2fd31
      Mikulas Patocka 提交于
      Refactor chunk_io to prepare for the fix in the following patch.
      
      Pass an area pointer to chunk_io and simplify zero_disk_area to use
      chunk_io.  No functional change.
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      02d2fd31
    • J
      dm log: userspace add luid to distinguish between concurrent log instances · 7ec23d50
      Jonathan Brassow 提交于
      Device-mapper userspace logs (like the clustered log) are
      identified by a universally unique identifier (UUID).  This
      identifier is used to associate requests from the kernel to
      a specific log in userspace.  The UUID must be unique everywhere,
      since multiple machines may use this identifier when communicating
      about a particular log, as is the case for cluster logs.
      
      Sometimes, device-mapper/LVM may re-use a UUID.  This is the
      case during pvmoves, when moving from one segment of an LV
      to another, or when resizing a mirror, etc.  In these cases,
      a new log is created with the same UUID and loaded in the
      "inactive" slot.  When a device-mapper "resume" is issued,
      the "live" table is deactivated and the new "inactive" table
      becomes "live".  (The "inactive" table can also be removed
      via a device-mapper 'clear' command.)
      
      The above two issues were colliding.  More than one log was being
      created with the same UUID, and there was no way to distinguish
      between them.  So, sometimes the wrong log would be swapped
      out during the exchange.
      
      The solution is to create a locally unique identifier,
      'luid', to go along with the UUID.  This new identifier is used
      to determine exactly which log is being referenced by the kernel
      when the log exchange is made.  The identifier is not
      universally safe, but it does not need to be, since
      create/destroy/suspend/resume operations are bound to a specific
      machine; and these are the operations that make up the exchange.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      7ec23d50
    • J
      dm raid1: do not allow log_failure variable to unset after being set · d2b69864
      Jonathan Brassow 提交于
      This patch fixes a bug which was triggering a case where the primary leg
      could not be changed on failure even when the mirror was in-sync.
      
      The case involves the failure of the primary device along with
      the transient failure of the log device.  The problem is that
      bios can be put on the 'failures' list (due to log failure)
      before 'fail_mirror' is called due to the primary device failure.
      Normally, this is fine, but if the log device failure is transient,
      a subsequent iteration of the work thread, 'do_mirror', will
      reset 'log_failure'.  The 'do_failures' function then resets
      the 'in_sync' variable when processing bios on the failures list.
      The 'in_sync' variable is what is used to determine if the
      primary device can be switched in the event of a failure.  Since
      this has been reset, the primary device is incorrectly assumed
      to be not switchable.
      
      The case has been seen in the cluster mirror context, where one
      machine realizes the log device is dead before the other machines.
      As the responsibilities of the server migrate from one node to
      another (because the mirror is being reconfigured due to the failure),
      the new server may think for a moment that the log device is fine -
      thus resetting the 'log_failure' variable.
      
      In any case, it is inappropiate for us to reset the 'log_failure'
      variable.  The above bug simply illustrates that it can actually
      hurt us.
      
      Cc: stable@kernel.org
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      d2b69864