1. 25 7月, 2013 2 次提交
    • N
      md/raid5: fix interaction of 'replace' and 'recovery'. · f94c0b66
      NeilBrown 提交于
      If a device in a RAID4/5/6 is being replaced while another is being
      recovered, then the writes to the replacement device currently don't
      happen, resulting in corruption when the replacement completes and the
      new drive takes over.
      
      This is because the replacement writes are only triggered when
      's.replacing' is set and not when the similar 's.sync' is set (which
      is the case during resync and recovery - it means all devices need to
      be read).
      
      So schedule those writes when s.replacing is set as well.
      
      In this case we cannot use "STRIPE_INSYNC" to record that the
      replacement has happened as that is needed for recording that any
      parity calculation is complete.  So introduce STRIPE_REPLACED to
      record if the replacement has happened.
      
      For safety we should also check that STRIPE_COMPUTE_RUN is not set.
      This has a similar effect to the "s.locked == 0" test.  The latter
      ensure that now IO has been flagged but not started.  The former
      checks if any parity calculation has been flagged by not started.
      We must wait for both of these to complete before triggering the
      'replace'.
      
      Add a similar test to the subsequent check for "are we finished yet".
      This possibly isn't needed (is subsumed in the STRIPE_INSYNC test),
      but it makes it more obvious that the REPLACE will happen before we
      think we are finished.
      
      Finally if a NeedReplace device is not UPTODATE then that is an
      error.  We really must trigger a warning.
      
      This bug was introduced in commit 9a3e1101
      (md/raid5:  detect and handle replacements during recovery.)
      which introduced replacement for raid5.
      That was in 3.3-rc3, so any stable kernel since then would benefit
      from this fix.
      
      Cc: stable@vger.kernel.org (3.3+)
      Reported-by: Nqindehua <13691222965@163.com>
      Tested-by: Nqindehua <qindehua@163.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      f94c0b66
    • N
      md/raid10: remove use-after-free bug. · 0eb25bb0
      NeilBrown 提交于
      We always need to be careful when calling generic_make_request, as it
      can start a chain of events which might free something that we are
      using.
      
      Here is one place I wasn't careful enough.  If the wbio2 is not in
      use, then it might get freed at the first generic_make_request call.
      So perform all necessary tests first.
      
      This bug was introduced in 3.3-rc3 (24afd80d) and can cause an
      oops, so fix is suitable for any -stable since then.
      
      Cc: stable@vger.kernel.org (3.3+)
      Signed-off-by: NNeilBrown <neilb@suse.de>
      0eb25bb0
  2. 18 7月, 2013 3 次提交
    • N
      md/raid1: fix bio handling problems in process_checks() · 30bc9b53
      NeilBrown 提交于
      Recent change to use bio_copy_data() in raid1 when repairing
      an array is faulty.
      
      The underlying may have changed the bio in various ways using
      bio_advance and these need to be undone not just for the 'sbio' which
      is being copied to, but also the 'pbio' (primary) which is being
      copied from.
      
      So perform the reset on all bios that were read from and do it early.
      
      This also ensure that the sbio->bi_io_vec[j].bv_len passed to
      memcmp is correct.
      
      This fixes a crash during a 'check' of a RAID1 array.  The crash was
      introduced in 3.10 so this is suitable for 3.10-stable.
      
      Cc: stable@vger.kernel.org (3.10)
      Reported-by: NJoe Lawrence <joe.lawrence@stratus.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      30bc9b53
    • N
      md: Remove recent change which allows devices to skip recovery. · 5024c298
      NeilBrown 提交于
      commit 7ceb17e8
          md: Allow devices to be re-added to a read-only array.
      
      allowed a bit more than just that.  It also allows devices to be added
      to a read-write array and to end up skipping recovery.
      
      This patch removes the offending piece of code pending a rewrite for a
      subsequent release.
      
      More specifically:
       If the array has a bitmap, then the device will still need a bitmap
       based resync ('saved_raid_disk' is set under different conditions
       is a bitmap is present).
       If the array doesn't have a bitmap, then this is correct as long as
       nothing has been written to the array since the metadata was checked
       by ->validate_super.  However there is no locking to ensure that there
       was no write.
      
      Bug was introduced in 3.10 and causes data corruption so
      patch is suitable for 3.10-stable.
      
      Cc: stable@vger.kernel.org (3.10)
      Reported-by: NJoe Lawrence <joe.lawrence@stratus.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      5024c298
    • N
      md/raid10: fix two problems with RAID10 resync. · 7bb23c49
      NeilBrown 提交于
      1/ When an different between blocks is found, data is copied from
         one bio to the other.  However bv_len is used as the length to
         copy and this could be zero.  So use r10_bio->sectors to calculate
         length instead.
         Using bv_len was probably always a bit dubious, but the introduction
         of bio_advance made it much more likely to be a problem.
      
      2/ When preparing some blocks for sync, we don't set BIO_UPTODATE
         except on bios that we schedule for a read.  This ensures that
         missing/failed devices don't confuse the loop at the top of
         sync_request write.
         Commit 8be185f2 "raid10: Use bio_reset()"
         removed a loop which set BIO_UPTDATE on all appropriate bios.
         So we need to re-add that flag.
      
      These bugs were introduced in 3.10, so this patch is suitable for
      3.10-stable, and can remove a potential for data corruption.
      
      Cc: stable@vger.kernel.org (3.10)
      Reported-by: NBrassow Jonathan <jbrassow@redhat.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      7bb23c49
  3. 15 7月, 2013 1 次提交
  4. 13 7月, 2013 4 次提交
    • N
      via-rhine: fix dma mapping errors · 9b4fe5fb
      Neil Horman 提交于
      this bug:
      https://bugzilla.redhat.com/show_bug.cgi?id=951695
      
      Reported a dma debug backtrace:
      
      WARNING: at lib/dma-debug.c:937 check_unmap+0x47d/0x930()
      Hardware name: To Be Filled By O.E.M.
      via-rhine 0000:00:12.0: DMA-API: device driver failed to check map error[device
      address=0x0000000075a837b2] [size=90 bytes] [mapped as single]
      Modules linked in: ip6_tables gspca_spca561 gspca_main videodev media
      snd_hda_codec_realtek snd_hda_intel i2c_viapro snd_hda_codec snd_hwdep snd_seq
      ppdev mperf via_rhine coretemp snd_pcm mii microcode snd_page_alloc snd_timer
      snd_mpu401 snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore parport_pc
      parport shpchp ata_generic pata_acpi radeon i2c_algo_bit drm_kms_helper ttm drm
      pata_via sata_via i2c_core uinput
      Pid: 295, comm: systemd-journal Not tainted 3.9.0-0.rc6.git2.1.fc20.x86_64 #1
      Call Trace:
       <IRQ>  [<ffffffff81068dd0>] warn_slowpath_common+0x70/0xa0
       [<ffffffff81068e4c>] warn_slowpath_fmt+0x4c/0x50
       [<ffffffff8137ec6d>] check_unmap+0x47d/0x930
       [<ffffffff810ace9f>] ? local_clock+0x5f/0x70
       [<ffffffff8137f17f>] debug_dma_unmap_page+0x5f/0x70
       [<ffffffffa0225edc>] ? rhine_ack_events.isra.14+0x3c/0x50 [via_rhine]
       [<ffffffffa02275f8>] rhine_napipoll+0x1d8/0xd80 [via_rhine]
       [<ffffffff815d3d51>] ? net_rx_action+0xa1/0x380
       [<ffffffff815d3e22>] net_rx_action+0x172/0x380
       [<ffffffff8107345f>] __do_softirq+0xff/0x400
       [<ffffffff81073925>] irq_exit+0xb5/0xc0
       [<ffffffff81724cd6>] do_IRQ+0x56/0xc0
       [<ffffffff81719ff2>] common_interrupt+0x72/0x72
       <EOI>  [<ffffffff8170ff57>] ? __slab_alloc+0x4c2/0x526
       [<ffffffff811992e0>] ? mmap_region+0x2b0/0x5a0
       [<ffffffff810d5807>] ? __lock_is_held+0x57/0x80
       [<ffffffff811992e0>] ? mmap_region+0x2b0/0x5a0
       [<ffffffff811bf1bf>] kmem_cache_alloc+0x2df/0x360
       [<ffffffff811992e0>] mmap_region+0x2b0/0x5a0
       [<ffffffff811998e6>] do_mmap_pgoff+0x316/0x3d0
       [<ffffffff81183ca0>] vm_mmap_pgoff+0x90/0xc0
       [<ffffffff81197d6c>] sys_mmap_pgoff+0x4c/0x190
       [<ffffffff81367d7e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
       [<ffffffff8101eb42>] sys_mmap+0x22/0x30
       [<ffffffff81722fd9>] system_call_fastpath+0x16/0x1b
      
      Usual problem with the usual fix, add the appropriate calls to dma_mapping_error
      where appropriate
      
      Untested, as I don't have hardware, but its pretty straightforward
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: Roger Luethi <rl@hellgate.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b4fe5fb
    • N
      atl1e: fix dma mapping warnings · 352900b5
      Neil Horman 提交于
      Recently had this backtrace reported:
      WARNING: at lib/dma-debug.c:937 check_unmap+0x47d/0x930()
      Hardware name: System Product Name
      ATL1E 0000:02:00.0: DMA-API: device driver failed to check map error[device
      address=0x00000000cbfd1000] [size=90 bytes] [mapped as single]
      Modules linked in: xt_conntrack nf_conntrack ebtable_filter ebtables
      ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek iTCO_wdt
      iTCO_vendor_support snd_hda_intel acpi_cpufreq mperf coretemp btrfs zlib_deflate
      snd_hda_codec snd_hwdep microcode raid6_pq libcrc32c snd_seq usblp serio_raw xor
      snd_seq_device joydev snd_pcm snd_page_alloc snd_timer snd lpc_ich i2c_i801
      soundcore mfd_core atl1e asus_atk0110 ata_generic pata_acpi radeon i2c_algo_bit
      drm_kms_helper ttm drm i2c_core pata_marvell uinput
      Pid: 314, comm: systemd-journal Not tainted 3.9.0-0.rc6.git2.3.fc19.x86_64 #1
      Call Trace:
       <IRQ>  [<ffffffff81069106>] warn_slowpath_common+0x66/0x80
       [<ffffffff8106916c>] warn_slowpath_fmt+0x4c/0x50
       [<ffffffff8138151d>] check_unmap+0x47d/0x930
       [<ffffffff810ad048>] ? sched_clock_cpu+0xa8/0x100
       [<ffffffff81381a2f>] debug_dma_unmap_page+0x5f/0x70
       [<ffffffff8137ce30>] ? unmap_single+0x20/0x30
       [<ffffffffa01569a1>] atl1e_intr+0x3a1/0x5b0 [atl1e]
       [<ffffffff810d53fd>] ? trace_hardirqs_off+0xd/0x10
       [<ffffffff81119636>] handle_irq_event_percpu+0x56/0x390
       [<ffffffff811199ad>] handle_irq_event+0x3d/0x60
       [<ffffffff8111cb6a>] handle_fasteoi_irq+0x5a/0x100
       [<ffffffff8101c36f>] handle_irq+0xbf/0x150
       [<ffffffff811dcb2f>] ? file_sb_list_del+0x3f/0x50
       [<ffffffff81073b10>] ? irq_enter+0x50/0xa0
       [<ffffffff8172738d>] do_IRQ+0x4d/0xc0
       [<ffffffff811dcb2f>] ? file_sb_list_del+0x3f/0x50
       [<ffffffff8171c6b2>] common_interrupt+0x72/0x72
       <EOI>  [<ffffffff810db5b2>] ? lock_release+0xc2/0x310
       [<ffffffff8109ea04>] lg_local_unlock_cpu+0x24/0x50
       [<ffffffff811dcb2f>] file_sb_list_del+0x3f/0x50
       [<ffffffff811dcb6d>] fput+0x2d/0xc0
       [<ffffffff811d8ea1>] filp_close+0x61/0x90
       [<ffffffff811fae4d>] __close_fd+0x8d/0x150
       [<ffffffff811d8ef0>] sys_close+0x20/0x50
       [<ffffffff81725699>] system_call_fastpath+0x16/0x1b
      
      The usual straighforward failure to check for dma_mapping_error after a map
      operation is completed.
      
      This patch should fix it, the reporter wandered off after filing this bz:
      https://bugzilla.redhat.com/show_bug.cgi?id=954170
      
      and I don't have hardware to test, but the fix is pretty straightforward, so I
      figured I'd post it for review.
      Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
      CC: Jay Cliburn <jcliburn@gmail.com>
      CC: Chris Snook <chris.snook@gmail.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      352900b5
    • H
      usb/net/r815x: fix cast to restricted __le32 · e7638524
      hayeswang 提交于
      >> drivers/net/usb/r815x.c:38:16: sparse: cast to restricted __le32
      >> drivers/net/usb/r815x.c:67:15: sparse: cast to restricted __le32
      >> drivers/net/usb/r815x.c:69:13: sparse: incorrect type in assignment (different base types)
         drivers/net/usb/r815x.c:69:13:    expected unsigned int [unsigned] [addressable] [assigned] [usertype] tmp
         drivers/net/usb/r815x.c:69:13:    got restricted __le32 [usertype] <noident>
      Signed-off-by: NHayes Wang <hayeswang@realtek.com>
      Spotted-by: Nkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7638524
    • H
      usb/net/r8152: fix integer overflow in expression · 3ff25e3c
      hayeswang 提交于
      config: make ARCH=avr32 allyesconfig
      drivers/net/usb/r8152.c: In function 'rtl8152_start_xmit':
      drivers/net/usb/r8152.c:956: warning: integer overflow in expression
      
         955	memset(tx_desc, 0, sizeof(*tx_desc));
       > 956	tx_desc->opts1 = cpu_to_le32((len & TX_LEN_MASK) | TX_FS | TX_LS);
         957	tp->tx_skb = skb;
      Signed-off-by: NHayes Wang <hayeswang@realtek.com>
      Spotted-by: Nkbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ff25e3c
  5. 12 7月, 2013 30 次提交