1. 03 Nov, 2013 1 commit
  2. 01 Nov, 2013 1 commit
  3. 24 Oct, 2013 10 commits
    • raid5: avoid finding "discard" stripe · d47648fc
      Shaohua Li committed
      SCSI discard will damage the discard stripe's bio settings, e.g., some fields
      are changed. If the stripe is reused very soon, we have wrong bio settings. We
      remove the discard stripe from the hash list, so next time the stripe will be
      fully initialized.
      
      Suitable for backport to 3.7+.
      
      Cc: <stable@vger.kernel.org> (3.7+)
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • raid5: set bio bi_vcnt 0 for discard request · 37c61ff3
      Shaohua Li committed
      The SCSI layer adds a new payload for a discard request. If two bios are merged
      into one, the second bio has bi_vcnt 1, which is set in raid5. This confuses
      SCSI and causes an oops.
      
      Suitable for backport to 3.7+
      
      Cc: stable@vger.kernel.org (v3.7+)
      Reported-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
      Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
    • md: avoid deadlock when md_set_badblocks. · 905b0297
      Bian Yu committed
      When a hard disk operation hits errors, md_set_badblocks is called after
      scsi_restart_operations, which has already disabled IRQs. But md_set_badblocks
      calls write_sequnlock_irq, which enables IRQs, so a softirq can preempt the
      current thread, and that may cause a deadlock. I think this situation should
      use write_seqlock_irqsave/write_sequnlock_irqrestore instead.
      
      I met the situation and the call trace is below:
      [  638.919974] BUG: spinlock recursion on CPU#0, scsi_eh_13/1010
      [  638.921923]  lock: 0xffff8800d4d51fc8, .magic: dead4ead, .owner: scsi_eh_13/1010, .owner_cpu: 0
      [  638.923890] CPU: 0 PID: 1010 Comm: scsi_eh_13 Not tainted 3.12.0-rc5+ #37
      [  638.925844] Hardware name: To be filled by O.E.M. To be filled by O.E.M./MAHOBAY, BIOS 4.6.5 03/05/2013
      [  638.927816]  ffff880037ad4640 ffff880118c03d50 ffffffff8172ff85 0000000000000007
      [  638.929829]  ffff8800d4d51fc8 ffff880118c03d70 ffffffff81730030 ffff8800d4d51fc8
      [  638.931848]  ffffffff81a72eb0 ffff880118c03d90 ffffffff81730056 ffff8800d4d51fc8
      [  638.933884] Call Trace:
      [  638.935867]  <IRQ>  [<ffffffff8172ff85>] dump_stack+0x55/0x76
      [  638.937878]  [<ffffffff81730030>] spin_dump+0x8a/0x8f
      [  638.939861]  [<ffffffff81730056>] spin_bug+0x21/0x26
      [  638.941836]  [<ffffffff81336de4>] do_raw_spin_lock+0xa4/0xc0
      [  638.943801]  [<ffffffff8173f036>] _raw_spin_lock+0x66/0x80
      [  638.945747]  [<ffffffff814a73ed>] ? scsi_device_unbusy+0x9d/0xd0
      [  638.947672]  [<ffffffff8173fb1b>] ? _raw_spin_unlock+0x2b/0x50
      [  638.949595]  [<ffffffff814a73ed>] scsi_device_unbusy+0x9d/0xd0
      [  638.951504]  [<ffffffff8149ec47>] scsi_finish_command+0x37/0xe0
      [  638.953388]  [<ffffffff814a75e8>] scsi_softirq_done+0xa8/0x140
      [  638.955248]  [<ffffffff8130e32b>] blk_done_softirq+0x7b/0x90
      [  638.957116]  [<ffffffff8104fddd>] __do_softirq+0xfd/0x330
      [  638.958987]  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
      [  638.960861]  [<ffffffff8174a5cc>] call_softirq+0x1c/0x30
      [  638.962724]  [<ffffffff81004c7d>] do_softirq+0x8d/0xc0
      [  638.964565]  [<ffffffff8105024e>] irq_exit+0x10e/0x150
      [  638.966390]  [<ffffffff8174ad4a>] smp_apic_timer_interrupt+0x4a/0x60
      [  638.968223]  [<ffffffff817499af>] apic_timer_interrupt+0x6f/0x80
      [  638.970079]  <EOI>  [<ffffffff810b964f>] ? __lock_release+0x6f/0x100
      [  638.971899]  [<ffffffff8173fa6a>] ? _raw_spin_unlock_irq+0x3a/0x50
      [  638.973691]  [<ffffffff8173fa60>] ? _raw_spin_unlock_irq+0x30/0x50
      [  638.975475]  [<ffffffff81562393>] md_set_badblocks+0x1f3/0x4a0
      [  638.977243]  [<ffffffff81566e07>] rdev_set_badblocks+0x27/0x80
      [  638.978988]  [<ffffffffa00d97bb>] raid5_end_read_request+0x36b/0x4e0 [raid456]
      [  638.980723]  [<ffffffff811b5a1d>] bio_endio+0x1d/0x40
      [  638.982463]  [<ffffffff81304ff3>] req_bio_endio.isra.65+0x83/0xa0
      [  638.984214]  [<ffffffff81306b9f>] blk_update_request+0x7f/0x350
      [  638.985967]  [<ffffffff81306ea1>] blk_update_bidi_request+0x31/0x90
      [  638.987710]  [<ffffffff813085e0>] __blk_end_bidi_request+0x20/0x50
      [  638.989439]  [<ffffffff8130862f>] __blk_end_request_all+0x1f/0x30
      [  638.991149]  [<ffffffff81308746>] blk_peek_request+0x106/0x250
      [  638.992861]  [<ffffffff814a62a9>] ? scsi_kill_request.isra.32+0xe9/0x130
      [  638.994561]  [<ffffffff814a633a>] scsi_request_fn+0x4a/0x3d0
      [  638.996251]  [<ffffffff813040a7>] __blk_run_queue+0x37/0x50
      [  638.997900]  [<ffffffff813045af>] blk_run_queue+0x2f/0x50
      [  638.999553]  [<ffffffff814a5750>] scsi_run_queue+0xe0/0x1c0
      [  639.001185]  [<ffffffff814a7721>] scsi_run_host_queues+0x21/0x40
      [  639.002798]  [<ffffffff814a2e87>] scsi_restart_operations+0x177/0x200
      [  639.004391]  [<ffffffff814a4fe9>] scsi_error_handler+0xc9/0xe0
      [  639.005996]  [<ffffffff814a4f20>] ? scsi_unjam_host+0xd0/0xd0
      [  639.007600]  [<ffffffff81072f6b>] kthread+0xdb/0xe0
      [  639.009205]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
      [  639.010821]  [<ffffffff81748cac>] ret_from_fork+0x7c/0xb0
      [  639.012437]  [<ffffffff81072e90>] ? flush_kthread_worker+0x170/0x170
      
      This bug was introduced in commit 2e8ac303
      (the first time rdev_set_badblocks was called from interrupt context),
      so this patch is appropriate for 3.5 and subsequent kernels.
      
      Cc: <stable@vger.kernel.org> (3.5+)
      Signed-off-by: Bian Yu <bianyu@kedacom.com>
      Reviewed-by: Jianpeng Ma <majianpeng@gmail.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
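The difference between the `_irq` and `_irqsave`/`_irqrestore` lock variants described in commit 905b0297 can be illustrated with a small user-space model. This is only a sketch under stated assumptions: it is not kernel code, every `model_*` name and the `irqs_enabled` flag are hypothetical stand-ins, and the real primitives operate on the CPU interrupt flag rather than a boolean.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for the local CPU interrupt-enable flag (hypothetical). */
static bool irqs_enabled = true;

static void model_irq_disable(void) { irqs_enabled = false; }
static void model_irq_enable(void)  { irqs_enabled = true; }

/* Models the *_irq variants: unlock unconditionally re-enables interrupts. */
static void model_lock_irq(void)   { model_irq_disable(); }
static void model_unlock_irq(void) { model_irq_enable(); }

/* Models the *_irqsave / *_irqrestore variants: unlock restores the
 * interrupt state that was in effect when the lock was taken. */
static bool model_lock_irqsave(void)
{
    bool flags = irqs_enabled;
    model_irq_disable();
    return flags;
}

static void model_unlock_irqrestore(bool flags) { irqs_enabled = flags; }

/* The caller has already disabled interrupts, as in the reported call path.
 * Returns the interrupt state the caller sees after the critical section. */
static bool state_after_irq_variant(void)
{
    irqs_enabled = true;
    model_irq_disable();          /* caller runs with interrupts off */
    model_lock_irq();
    assert(!irqs_enabled);        /* critical section runs with interrupts off */
    model_unlock_irq();
    return irqs_enabled;          /* interrupts are now on, behind the caller's back */
}

static bool state_after_irqsave_variant(void)
{
    irqs_enabled = true;
    model_irq_disable();          /* caller runs with interrupts off */
    bool flags = model_lock_irqsave();
    assert(!irqs_enabled);        /* critical section runs with interrupts off */
    model_unlock_irqrestore(flags);
    return irqs_enabled;          /* still off, as the caller expects */
}
```

In this model, the `_irq` variant leaves interrupts enabled for a caller that had disabled them, which is the window in which the softirq in the trace above could run and retake a lock the thread already holds. The save/restore variant leaves the caller's state unchanged.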
    • md: Fix skipping recovery for read-only arrays. · 61e4947c
      Lukasz Dorau committed
      Since:
              commit 7ceb17e8
              md: Allow devices to be re-added to a read-only array.
      
      spares are activated on a read-only array. For the raid1 and raid10
      personalities, this causes not-in-sync devices to be marked in-sync
      without checking whether recovery has finished.

      If a read-only array is degraded and one of its devices is not in-sync
      (because the array has been only partially recovered), recovery will be skipped.

      This patch adds a check that recovery has finished before marking a device
      in-sync for the raid1 and raid10 personalities. For the raid5 personality,
      such a condition is already present (at raid5.c:6029).
      
      Bug was introduced in 3.10 and causes data corruption.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Pawel Baldysiak <pawel.baldysiak@intel.com>
      Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
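The check described in commit 61e4947c (only mark a device in-sync once its recovery has finished) can be sketched in user-space C. This is a simplified model, not the patch itself: `struct model_rdev`, `MODEL_MAX_SECTOR`, and `model_spare_active` are hypothetical names, and using a sentinel offset to mean "recovery complete" is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel meaning "recovery has covered the whole device". */
#define MODEL_MAX_SECTOR UINT64_MAX

/* Simplified stand-in for a member device of an array. */
struct model_rdev {
    uint64_t recovery_offset;  /* how far recovery has progressed */
    bool faulty;
    bool in_sync;
};

/* Marks devices in-sync only if they are healthy AND fully recovered.
 * Returns the number of devices newly marked in-sync. */
static int model_spare_active(struct model_rdev *devs, size_t n)
{
    int activated = 0;

    for (size_t i = 0; i < n; i++) {
        struct model_rdev *d = &devs[i];

        if (d->recovery_offset == MODEL_MAX_SECTOR &&  /* the added check */
            !d->faulty && !d->in_sync) {
            d->in_sync = true;
            activated++;
        }
    }
    return activated;
}

/* Small demonstration with one device in each relevant state. */
static int model_demo(void)
{
    struct model_rdev devs[] = {
        { MODEL_MAX_SECTOR, false, false },  /* fully recovered spare */
        { 4096,             false, false },  /* only partially recovered */
        { MODEL_MAX_SECTOR, true,  false },  /* faulty device */
    };
    int activated = model_spare_active(devs, 3);

    assert(!devs[1].in_sync);  /* partial recovery must not count as in-sync */
    return activated;
}
```

Without the recovery-offset condition, the partially recovered device in this model would also be marked in-sync, which corresponds to the skipped recovery the commit message describes.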
    • EDAC, GHES: Update ghes error record info · 56507694
      Chen, Gong committed
      The latest UEFI spec (currently 2.4) adds some new fields for
      memory error reporting. Add these new fields to the
      ghes_edac interface.
      Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • ACPI, APEI, CPER: Cleanup CPER memory error output format · f6edea77
      Chen, Gong committed
      Memory error reporting is much too verbose.  Most users do not care about
      the DIMM internal bank/row/column information. Downgrade the fine details
      to "pr_debug" status so that those few who do care can get them if they
      really want to.  The detailed information will later be provided by the
      perf/trace interface.
      Since things are still a bit scary, and users are sometimes overly
      nervous, provide a reassuring message that corrected errors do not
      generally require any further action.
      Suggested-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
      Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • ACPI, APEI, CPER: Enhance memory reporting capability · fbeef85f
      Chen, Gong committed
      After a H/W error happens in FFM-enabled mode, a lot of information
      is shown, but the new fields added by UEFI 2.4 (e.g. DIMM location) still
      need to be added.
      
      Original-author: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
      Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • ACPI, APEI, CPER: Add UEFI 2.4 support for memory error · 147de147
      Chen, Gong committed
      In the latest UEFI spec (currently 2.4), the memory error definition
      for CPER (UEFI 2.4 Appendix N Common Platform Error Record)
      adds some new fields. These fields help people trace a
      memory error to an actual DIMM location.
      
      Original-author: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • DMI: Parse memory device (type 17) in SMBIOS · dd6dad42
      Chen, Gong committed
      This patch adds a new interface to decode memory device (type 17)
      to help error reporting on DIMMs.
      
      Original-author: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
      Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
      Acked-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
    • ACPI, x86: Extended error log driver for x86 platform · 4b3db708
      Chen, Gong committed
      This H/W error log driver (a.k.a. eMCA driver) is implemented based on
      http://www.intel.com/content/www/us/en/architecture-and-technology/enhanced-mca-logging-xeon-paper.html
      
      After errors are captured, more detailed platform-specific information
      can be obtained via this new enhanced H/W error log driver. Most notably, we
      can track memory errors back to the DIMM slot silk screen label.
      Signed-off-by: Chen, Gong <gong.chen@linux.intel.com>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
  4. 23 Oct, 2013 6 commits
  5. 22 Oct, 2013 17 commits
  6. 21 Oct, 2013 2 commits
  7. 19 Oct, 2013 3 commits