1. 11 9月, 2009 2 次提交
    • T
      scsi,block: update SCSI to handle mixed merge failures · da6c5c72
      Tejun Heo 提交于
      Update scsi_io_completion() such that it only fails requests till the
      next error boundary and retry the leftover.  This enables block layer
      to merge requests with different failfast settings and still behave
      correctly on errors.  Allow merge of requests of different failfast
      settings.
      
      As SCSI is currently the only subsystem which follows failfast status,
      there's no need to worry about other block drivers for now.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Niel Lambrechts <niel.lambrechts@gmail.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      da6c5c72
    • G
      md: Fix "strchr" [drivers/md/dm-log-userspace.ko] undefined! · 0d03d59d
      Geert Uytterhoeven 提交于
      Commit b8313b6d ("dm log: remove incorrect
      field from userspace table output") added a call to strstr() with a
      single-character "needle" string parameter.
      
      Unfortunately some versions of gcc replace such calls to strstr() by calls
      to strchr() behind our back.  This causes linking errors if strchr() is
      defined as an inline function in <asm/string.h> (e.g. on m68k):
      
      | WARNING: "strchr" [drivers/md/dm-log-userspace.ko] undefined!
      
      Avoid this by explicitly calling strchr() instead.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0d03d59d
  2. 09 9月, 2009 3 次提交
    • E
      aoe: allocate unused request_queue for sysfs · 7135a71b
      Ed Cashin 提交于
      Andy Whitcroft reported an oops in aoe triggered by use of an
      incorrectly initialised request_queue object:
      
        [ 2645.959090] kobject '<NULL>' (ffff880059ca22c0): tried to add
      		an uninitialized object, something is seriously wrong.
        [ 2645.959104] Pid: 6, comm: events/0 Not tainted 2.6.31-5-generic #24-Ubuntu
        [ 2645.959107] Call Trace:
        [ 2645.959139] [<ffffffff8126ca2f>] kobject_add+0x5f/0x70
        [ 2645.959151] [<ffffffff8125b4ab>] blk_register_queue+0x8b/0xf0
        [ 2645.959155] [<ffffffff8126043f>] add_disk+0x8f/0x160
        [ 2645.959161] [<ffffffffa01673c4>] aoeblk_gdalloc+0x164/0x1c0 [aoe]
      
      The request queue of an aoe device is not used but can be allocated in
      code that does not sleep.
      
      Bruno bisected this regression down to
      
        cd43e26f
      
        block: Expose stacked device queues in sysfs
      
      "This seems to generate /sys/block/$device/queue and its contents for
       everyone who is using queues, not just for those queues that have a
       non-NULL queue->request_fn."
      
      Addresses http://bugs.launchpad.net/bugs/410198
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13942
      
      Note that embedding a queue inside another object has always been
      an illegal construct, since the queues are reference counted and
      must persist until the last reference is dropped. So aoe was
      always buggy in this respect (Jens).
      Signed-off-by: NEd Cashin <ecashin@coraid.com>
      Cc: Andy Whitcroft <apw@canonical.com>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Bruno Premont <bonbons@linux-vserver.org>
      Cc: Martin K. Petersen <martin.petersen@oracle.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NJens Axboe <jens.axboe@oracle.com>
      7135a71b
    • L
      i915: disable interrupts before tearing down GEM state · e6890f6f
      Linus Torvalds 提交于
      Reinette Chatre reports a frozen system (with blinking keyboard LEDs)
      when switching from graphics mode to the text console, or when
      suspending (which does the same thing). With netconsole, the oops
      turned out to be
      
      	BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
      	IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
      
      and it's due to the i915_gem.c code doing drm_irq_uninstall() after
      having done i915_gem_idle(). And the i915_gem_idle() path will do
      
        i915_gem_idle() ->
          i915_gem_cleanup_ringbuffer() ->
            i915_gem_cleanup_hws() ->
              dev_priv->hw_status_page = NULL;
      
      but if an i915 interrupt comes in after this stage, it may want to
      access that hw_status_page, and gets the above NULL pointer dereference.
      
      And since the NULL pointer dereference happens from within an interrupt,
      and with the screen still in graphics mode, the common end result is
      simply a silently hung machine.
      
      Fix it by simply uninstalling the irq handler before idling rather than
      after. Fixes
      
          http://bugzilla.kernel.org/show_bug.cgi?id=13819Reported-and-tested-by: NReinette Chatre <reinette.chatre@intel.com>
      Acked-by: NJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e6890f6f
    • Z
      drm/i915: fix mask bits setting · 7c8460db
      Zhenyu Wang 提交于
      eDP is exclusive connector too, and add missing crtc_mask
      setting for TV.
      
      This fixes
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=14139Signed-off-by: NZhenyu Wang <zhenyuw@linux.intel.com>
      Reported-and-tested-by: NCarlos R. Mafra <crmafra2@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7c8460db
  3. 07 9月, 2009 1 次提交
  4. 06 9月, 2009 3 次提交
    • D
      gianfar: Fix build. · d9d8e041
      David S. Miller 提交于
      Reported by Michael Guntsche <mike@it-loops.com>
      
      --------------------
      Commit
      38bddf04 gianfar: gfar_remove needs to call unregister_netdev()
      
      breaks the build of the gianfar driver because "dev" is undefined in
      this function. To quickly test rc9 I changed this to priv->ndev but I do
      not know if this is the correct one.
      --------------------
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d9d8e041
    • L
      pty: don't limit the writes to 'pty_space()' inside 'pty_write()' · ac89a917
      Linus Torvalds 提交于
      The whole write-room thing is something that is up to the _caller_ to
      worry about, not the pty layer itself.  The total buffer space will
      still be limited by the buffering routines themselves, so there is no
      advantage or need in having pty_write() artificially limit the size
      somehow.
      
      And what happened was that the caller (the n_tty line discipline, in
      this case) may have verified that there is room for 2 bytes to be
      written (for NL -> CRNL expansion), and it used to then do those writes
      as two single-byte writes.  And if the first byte written (CR) then
      caused a new tty buffer to be allocated, pty_space() may have returned
      zero when trying to write the second byte (LF), and then incorrectly
      failed the write - leading to a lost newline character.
      
      This should finally fix
      
      	http://bugzilla.kernel.org/show_bug.cgi?id=14015Reported-by: NMikael Pettersson <mikpe@it.uu.se>
      Acked-by: NAlan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac89a917
    • L
      n_tty: do O_ONLCR translation as a single write · 37f81fa1
      Linus Torvalds 提交于
      When translating CR to CRNL in the n_tty line discipline, we did it as
      two tty_put_char() calls.  Which works, but is stupid, and has caused
      problems before too with bad interactions with the write_room() logic.
      The generic USB serial driver had that problem, for example.
      
      Now the pty layer had similar issues after being moved to the generic
      tty buffering code (in commit d945cb9c:
      "pty: Rework the pty layer to use the normal buffering logic").
      
      So stop doing the silly separate two writes, and do it as a single write
      instead.  That's what the n_tty layer already does for the space
      expansion of tabs (XTABS), and it means that we'll now always have just
      a single write for the CRNL to match the single 'tty_write_room()' test,
      which hopefully means that the next time somebody screws up buffering,
      it won't cause weeks of debugging.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      37f81fa1
  5. 05 9月, 2009 17 次提交
    • S
      firewire: sbp2: fix freeing of unallocated memory · baed6b82
      Stefan Richter 提交于
      If a target writes invalid status (typically status of a command that
      already timed out), firewire-sbp2 attempts to put away an ORB that
      doesn't exist.  https://bugzilla.redhat.com/show_bug.cgi?id=519772Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      baed6b82
    • S
      firewire: ohci: fix Ricoh R5C832, video reception · 4fe0badd
      Stefan Richter 提交于
      In dual-buffer DMA mode, no video frames are ever received from R5C832
      by libdc1394.  Fallback to packet-per-buffer DMA works reliably.
      http://thread.gmane.org/gmane.linux.kernel.firewire.devel/13393/focus=13476Reported-by: NJonathan Cameron <jic23@cam.ac.uk>
      Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      4fe0badd
    • S
      firewire: ohci: fix Agere FW643 and multiple cameras · fc383796
      Stefan Richter 提交于
      An Agere FW643 OHCI 1.1 card works fine for video reception from one
      camera but fails early if receiving from two cameras.  After a short
      while, no IR IRQ events occur and the context control register does not
      react anymore.  This happens regardless whether both IR DMA contexts are
      dual-buffer or one is dual-buffer and the other packet-per-buffer.
      
      This can be worked around by disabling dual buffer DMA mode entirely.
      http://sourceforge.net/mailarchive/message.php?msg_name=4A7C0594.2020208%40gmail.com
      (Reported by Samuel Audet.)
      
      In another report (by Jonathan Cameron), an FW643 works OK with two
      cameras in dual buffer mode.  Whether this is due to different chip
      revisions or different usage patterns (different video formats) is not
      yet clear.  However, as far as the current capabilities of
      firewire-core's isochronous I/O interface are concerned, simply
      switching off dual-buffer on non-working and working FW643s alike is not
      a problem in practice.  We only need to revisit this issue if we are
      going to enhance the interface, e.g. so that applications can explicitly
      choose modes.
      Reported-by: NSamuel Audet <samuel.audet@gmail.com>
      Reported-by: NJonathan Cameron <jic23@cam.ac.uk>
      Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      fc383796
    • S
      firewire: core: fix crash in iso resource management · 1821bc19
      Stefan Richter 提交于
      This fixes a regression due to post 2.6.30 commit "firewire: core: do
      not DMA-map stack addresses" 6fdc0370.
      
      As David Moore noted, a previously correct sizeof() expression became
      wrong since the commit changed its argument from an array to a pointer.
      This resulted in an oops in ohci_cancel_packet in the shared workqueue
      thread's context when an isochronous resource was to be freed.
      Reported-by: NJonathan Cameron <jic23@cam.ac.uk>
      Signed-off-by: NStefan Richter <stefanr@s5r6.in-berlin.de>
      1821bc19
    • M
      dm snapshot: fix on disk chunk size validation · ae0b7448
      Mikulas Patocka 提交于
      Fix some problems seen in the chunk size processing when activating a
      pre-existing snapshot.
      
      For a new snapshot, the chunk size can either be supplied by the creator
      or a default value can be used.  For an existing snapshot, the
      chunk size in the snapshot header on disk should always be used.
      
      If someone attempts to load an existing snapshot and has the 'default
      chunk size' option set, the kernel uses its default value even when it
      is incorrect for the snapshot being loaded.  This patch ensures the
      correct on-disk value is always used.
      
      Secondly, when the code does use the chunk size stored on the disk it is
      prudent to revalidate it, so the code can exit cleanly if it got
      corrupted as happened in
      https://bugzilla.redhat.com/show_bug.cgi?id=461506 .
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      ae0b7448
    • M
      dm exception store: split set_chunk_size · 2defcc3f
      Mikulas Patocka 提交于
      Break the function set_chunk_size to two functions in preparation for
      the fix in the following patch.
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      2defcc3f
    • M
      dm snapshot: fix header corruption race on invalidation · 61578dcd
      Mikulas Patocka 提交于
      If a persistent snapshot fills up, a race can corrupt the on-disk header
      which causes a crash on any future attempt to activate the snapshot
      (typically while booting).  This patch fixes the race.
      
      When the snapshot overflows, __invalidate_snapshot is called, which calls
      snapshot store method drop_snapshot. It goes to persistent_drop_snapshot that
      calls write_header. write_header constructs the new header in the "area"
      location.
      
      Concurrently, an existing kcopyd job may finish, call copy_callback
      and commit_exception method, that goes to persistent_commit_exception.
      persistent_commit_exception doesn't do locking, relying on the fact that
      callbacks are single-threaded, but it can race with snapshot invalidation and
      overwrite the header that is just being written while the snapshot is being
      invalidated.
      
      The result of this race is a corrupted header being written that can
      lead to a crash on further reactivation (if chunk_size is zero in the
      corrupted header).
      
      The fix is to use separate memory areas for each.
      
      See the bug: https://bugzilla.redhat.com/show_bug.cgi?id=461506
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      61578dcd
    • M
      dm snapshot: refactor zero_disk_area to use chunk_io · 02d2fd31
      Mikulas Patocka 提交于
      Refactor chunk_io to prepare for the fix in the following patch.
      
      Pass an area pointer to chunk_io and simplify zero_disk_area to use
      chunk_io.  No functional change.
      
      Cc: stable@kernel.org
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      02d2fd31
    • J
      dm log: userspace add luid to distinguish between concurrent log instances · 7ec23d50
      Jonathan Brassow 提交于
      Device-mapper userspace logs (like the clustered log) are
      identified by a universally unique identifier (UUID).  This
      identifier is used to associate requests from the kernel to
      a specific log in userspace.  The UUID must be unique everywhere,
      since multiple machines may use this identifier when communicating
      about a particular log, as is the case for cluster logs.
      
      Sometimes, device-mapper/LVM may re-use a UUID.  This is the
      case during pvmoves, when moving from one segment of an LV
      to another, or when resizing a mirror, etc.  In these cases,
      a new log is created with the same UUID and loaded in the
      "inactive" slot.  When a device-mapper "resume" is issued,
      the "live" table is deactivated and the new "inactive" table
      becomes "live".  (The "inactive" table can also be removed
      via a device-mapper 'clear' command.)
      
      The above two issues were colliding.  More than one log was being
      created with the same UUID, and there was no way to distinguish
      between them.  So, sometimes the wrong log would be swapped
      out during the exchange.
      
      The solution is to create a locally unique identifier,
      'luid', to go along with the UUID.  This new identifier is used
      to determine exactly which log is being referenced by the kernel
      when the log exchange is made.  The identifier is not
      universally safe, but it does not need to be, since
      create/destroy/suspend/resume operations are bound to a specific
      machine; and these are the operations that make up the exchange.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      7ec23d50
    • J
      dm raid1: do not allow log_failure variable to unset after being set · d2b69864
      Jonathan Brassow 提交于
      This patch fixes a bug which was triggering a case where the primary leg
      could not be changed on failure even when the mirror was in-sync.
      
      The case involves the failure of the primary device along with
      the transient failure of the log device.  The problem is that
      bios can be put on the 'failures' list (due to log failure)
      before 'fail_mirror' is called due to the primary device failure.
      Normally, this is fine, but if the log device failure is transient,
      a subsequent iteration of the work thread, 'do_mirror', will
      reset 'log_failure'.  The 'do_failures' function then resets
      the 'in_sync' variable when processing bios on the failures list.
      The 'in_sync' variable is what is used to determine if the
      primary device can be switched in the event of a failure.  Since
      this has been reset, the primary device is incorrectly assumed
      to be not switchable.
      
      The case has been seen in the cluster mirror context, where one
      machine realizes the log device is dead before the other machines.
      As the responsibilities of the server migrate from one node to
      another (because the mirror is being reconfigured due to the failure),
      the new server may think for a moment that the log device is fine -
      thus resetting the 'log_failure' variable.
      
      In any case, it is inappropiate for us to reset the 'log_failure'
      variable.  The above bug simply illustrates that it can actually
      hurt us.
      
      Cc: stable@kernel.org
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      d2b69864
    • J
      dm log: remove incorrect field from userspace table output · b8313b6d
      Jonathan Brassow 提交于
      The output of 'dmsetup table' includes an internal field that should not
      be there.  This patch removes it.  To make the fix simpler, we first
      reorder a constructor argument
      
      The 'device size' argument is generated internally.  Currently it is
      placed as the last space-separated word of the constructor string.
      However, we need to use a version of the string without this word, so we
      move it to the beginning instead so it is trivial to skip past it.
      
      We keep a copy of the arguments passed to userspace for creating a log,
      just in case we need to resend them.  These are the same arguments that
      are desired in the STATUSTYPE_TABLE request, except for one.  When
      creating the userspace log, the userspace daemon must know the size of
      the mirror, so that is added to the arguments given in the constructor
      table.  We were printing this extra argument out as well, which is a
      mistake.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      b8313b6d
    • J
      dm log: fix userspace status output · 4142a969
      Jonathan Brassow 提交于
      Fix 'dmsetup table' output.
      
      There is a missing ' ' at the end of the string causing two
      words to run together.
      Signed-off-by: NJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      4142a969
    • M
      dm stripe: expose correct io hints · 40bea431
      Mike Snitzer 提交于
      Set sensible I/O hints for striped DM devices in the topology
      infrastructure added for 2.6.31 for userspace tools to
      obtain via sysfs.
      
      Add .io_hints to 'struct target_type' to allow the I/O hints portion
      (io_min and io_opt) of the 'struct queue_limits' to be set by each
      target and implement this for dm-stripe.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      40bea431
    • M
      dm table: add more context to terse warning messages · a963a956
      Mike Snitzer 提交于
      A couple of recent warning messages make it difficult for the reader to
      determine exactly what is wrong.  This patch adds more information to
      those messages.
      
      The messages were added by these commits:
        5dea271b ("dm table: pass correct dev area size
      to device_area_is_valid")
        ea9df47c ("dm table: fix blk_stack_limits arg
      to use bytes not sectors")
      
      The patch also corrects references to logical_block_size in printk format
      strings from %hu to %u.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a963a956
    • M
      dm table: fix queue_limit checking device iterator · f6a1ed10
      Mikulas Patocka 提交于
      The logic to check for valid device areas is inverted relative to proper
      use with iterate_devices.
      
      The iterate_devices method calls its callback for every underlying
      device in the target.  If any callback returns non-zero, iterate_devices
      exits immediately.  But the callback device_area_is_valid() returns 0 on
      error and 1 on success.  The overall effect without is that an error is
      issued only if every device is invalid.
      
      This patch renames device_area_is_valid to device_area_is_invalid and
      inverts the logic so that one invalid device is sufficient to raise
      an error.
      Signed-off-by: NMikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      f6a1ed10
    • M
      dm snapshot: implement iterate devices · 8811f46c
      Mike Snitzer 提交于
      Implement the .iterate_devices for the origin and snapshot targets.
      dm-snapshot's lack of .iterate_devices resulted in the inability to
      properly establish queue_limits for both targets.
      
      With 4K sector drives: an unfortunate side-effect of not establishing
      proper limits in either targets' DM device was that IO to the devices
      would fail even though both had been created without error.
      
      Commit af4874e0 ("dm target:s introduce
      iterate devices fn") in 2.6.31-rc1 should have implemented .iterate_devices
      for dm-snap.c's origin and snapshot targets.
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      8811f46c
    • K
      dm multipath: fix oops when request based io fails when no paths · a77e28c7
      Kiyoshi Ueda 提交于
      The patch posted at http://marc.info/?l=dm-devel&m=124539787228784&w=2
      which was merged into cec47e3d ("dm:
      prepare for request based option") introduced a regression in
      request-based dm.
      
      If map_request() calls dm_kill_unmapped_request() to complete a cloned
      bio without dispatching it, clone->bio is still set when
      dm_end_request() is called and the BUG_ON(clone->bio) is incorrect.
      
      The patch fixes this bug by freeing bio in dm_end_request() if the clone
      has bio.  I've redone my tests to cover all I/O paths and confirmed
      there's no other regression.
      
      Here is the oops I hit in request-based dm when I do I/O to a multipath
      device which doesn't have any active path nor queue_if_no_path setting:
      
      ------------[ cut here ]------------
      kernel BUG at /root/2.6.31-rc4.rqdm/drivers/md/dm.c:828!
      invalid opcode: 0000 [#1] SMP
      last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
      CPU 1
      Modules linked in: autofs4 sunrpc cpufreq_ondemand acpi_cpufreq dm_mirror dm_region_hash dm_log dm_service_time dm_multipath scsi_dh dm_mod video output sbs sbshc battery ac sg sr_mod e1000e button cdrom serio_raw rtc_cmos rtc_core rtc_lib piix lpfc scsi_transport_fc ata_piix libata megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]
      Pid: 7, comm: ksoftirqd/1 Not tainted 2.6.31-rc4.rqdm #1 Express5800/120Lj [N8100-1417]
      RIP: 0010:[<ffffffffa023629d>]  [<ffffffffa023629d>] dm_softirq_done+0xbd/0x100 [dm_mod]
      RSP: 0018:ffff8800280a1f08  EFLAGS: 00010282
      RAX: ffffffffa02544e0 RBX: ffff8802aa1111d0 RCX: ffff8802aa1111e0
      RDX: ffff8802ab913e70 RSI: 0000000000000000 RDI: ffff8802ab913e70
      RBP: ffff8800280a1f28 R08: ffffc90005457040 R09: 0000000000000000
      R10: 0000000000000001 R11: 0000000000000000 R12: 00000000fffffffb
      R13: ffff8802ab913e88 R14: ffff8802ab9c1438 R15: 0000000000000100
      FS:  0000000000000000(0000) GS:ffff88002809e000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
      CR2: 0000003d54a98640 CR3: 000000029f0a1000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process ksoftirqd/1 (pid: 7, threadinfo ffff8802ae50e000, task ffff8802ae4f8040)
      Stack:
       ffff8800280a1f38 0000000000000020 ffffffff814f30a0 0000000000000004
      <0> ffff8800280a1f58 ffffffff8116b245 ffff8800280a1f38 ffff8800280a1f38
      <0> ffff8800280a1f58 0000000000000001 ffff8800280a1fa8 ffffffff810477bc
      Call Trace:
       <IRQ>
       [<ffffffff8116b245>] blk_done_softirq+0x75/0x90
       [<ffffffff810477bc>] __do_softirq+0xcc/0x210
       [<ffffffff81047170>] ? ksoftirqd+0x0/0x110
       [<ffffffff8100ce7c>] call_softirq+0x1c/0x50
       <EOI>
       [<ffffffff8100e785>] do_softirq+0x65/0xa0
       [<ffffffff81047170>] ? ksoftirqd+0x0/0x110
       [<ffffffff810471e0>] ksoftirqd+0x70/0x110
       [<ffffffff81059559>] kthread+0x99/0xb0
       [<ffffffff8100cd7a>] child_rip+0xa/0x20
       [<ffffffff8100c73c>] ? restore_args+0x0/0x30
       [<ffffffff810594c0>] ? kthread+0x0/0xb0
       [<ffffffff8100cd70>] ? child_rip+0x0/0x20
      Code: 44 89 e6 48 89 df e8 23 fb f2 e0 be 01 00 00 00 4c 89 f7 e8 f6 fd ff ff 5b 41 5c 41 5d 41 5e c9 c3 4c 89 ef e8 85 fe ff ff eb ed <0f> 0b eb fe 41 8b 85 dc 00 00 00 48 83 bb 10 01 00 00 00 89 83
      RIP  [<ffffffffa023629d>] dm_softirq_done+0xbd/0x100 [dm_mod]
       RSP <ffff8800280a1f08>
      ---[ end trace 16af0a1d8542da55 ]---
      Signed-off-by: NKiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: NJun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
      a77e28c7
  6. 04 9月, 2009 1 次提交
  7. 03 9月, 2009 6 次提交
  8. 02 9月, 2009 1 次提交
    • D
      [CPUFREQ] Re-enable cpufreq suspend and resume code · ce6c3997
      Dominik Brodowski 提交于
      Commit 4bc5d341 is broken and causes regressions:
      
      (1) cpufreq_driver->resume() and ->suspend() were only called on
      __powerpc__, but you could set them on all architectures. In fact,
      ->resume() was defined and used before the PPC-related commit
      42d4dc3f complained about in 4bc5d341.
      
      (2) Therfore, the resume functions in acpi_cpufreq and speedstep-smi
      would never be called.
      
      (3) This means speedstep-smi would be unusuable after suspend or resume.
      
      The _real_ problem was calling cpufreq_driver->get() with interrupts
      off, but it re-enabling interrupts on some platforms. Why is ->get()
      necessary?
      
      Some systems like to change the CPU frequency behind our
      back, especially during BIOS-intensive operations like suspend or
      resume. If such systems also use a CPU frequency-dependant timing loop,
      delays might be off by large factors. Therefore, we need to ascertain
      as soon as possible that the CPU frequency is indeed at the speed we
      think it is. You can do this two ways: either setting it anew, or trying
      to get it. The latter is what was done, the former also has the same IRQ
      issue.
      
      So, let's try something different: defer the checking to after interrupts
      are re-enabled, by calling cpufreq_update_policy() (via schedule_work()).
      Timings may be off until this later stage, so let's watch out for
      resume regressions caused by the deferred handling of frequency changes
      behind the kernel's back.
      Signed-off-by: NDominik Brodowski <linux@dominikbrodowski.net>
      Signed-off-by: NDave Jones <davej@redhat.com>
      ce6c3997
  9. 01 9月, 2009 1 次提交
    • B
      ata_piix: parallel scanning on PATA needs an extra locking · 60c3be38
      Bartlomiej Zolnierkiewicz 提交于
      Commit log for commit 517d3cc1
      ("[libata] ata_piix: Enable parallel scan") says:
      
          This patch turns on parallel scanning for the ata_piix driver.
          This driver is used on most netbooks (no AHCI for cheap storage it seems).
          The scan is the dominating time factor in the kernel boot for these
          devices; with this flag it gets cut in half for the device I used
          for testing (eeepc).
          Alan took a look at the driver source and concluded that it ought to be safe
          to do for this driver.  Alan has also checked with the hardware team.
      
      and it is all true but once we put all things together additional
      constraints for PATA controllers show up (some hardware registers
      have per-host not per-port atomicity) and we risk misprogramming
      the controller.
      
      I used the following test to check whether the issue is real:
      
        @@ -736,8 +736,20 @@ static void piix_set_piomode(struct ata_
         			(timings[pio][1] << 8);
         	}
         	pci_write_config_word(dev, master_port, master_data);
        -	if (is_slave)
        +	if (is_slave) {
        +		if (ap->port_no == 0) {
        +			u8 tmp = slave_data;
        +
        +			while (slave_data == tmp) {
        +				pci_read_config_byte(dev, slave_port, &tmp);
        +				msleep(50);
        +			}
        +
        +			dev_printk(KERN_ERR, &dev->dev, "PATA parallel scan "
        +				   "race detected\n");
        +		}
         		pci_write_config_byte(dev, slave_port, slave_data);
        +	}
      
         	/* Ensure the UDMA bit is off - it will be turned back on if
         	   UDMA is selected */
      
      and it indeed triggered the error message.
      
      Lets fix all such races by adding an extra locking to ->set_piomode
      and ->set_dmamode methods for PATA controllers.
      
      [ Alan: would be better to take the host lock in libata-core for these
        cases so that we fix all the adapters in one swoop.  "Looks fine as a
        temproary quickfix tho" ]
      
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Acked-by: NAlan Cox <alan@linux.intel.com>
      Cc: Jeff Garzik <jgarzik@redhat.com>
      Signed-off-by: NBartlomiej Zolnierkiewicz <bzolnier@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      60c3be38
  10. 31 8月, 2009 5 次提交