1. 31 Oct 2015, 3 commits
  2. 29 Oct 2015, 2 commits
  3. 28 Oct 2015, 5 commits
    • net/mlx4: Copy/set only sizeof struct mlx4_eqe bytes · c02b0501
      Authored by Carol L Soto
      When doing memcpy/memset of EQEs, we should use sizeof struct
      mlx4_eqe as the base size and not caps.eqe_size which could be bigger.
      
      If caps.eqe_size is bigger than the struct mlx4_eqe then we corrupt
      data in the master context.
      
      When using a 64 byte stride, the memcpy copied over 63 bytes to the
      slave_eq structure.  This resulted in copying over the entire eqe of
      interest, including its ownership bit -- and also 31 bytes of garbage
      into the next WQE in the slave EQ -- which did NOT include the ownership
      bit (and therefore had no impact).
      
      However, once the stride is increased to 128, we are overwriting the
      ownership bits of *three* eqes in the slave_eq struct.  This results
      in an incorrect ownership bit for those eqes, which causes the eq to
      seem to be full.  The issue therefore surfaced only once 128-byte EQEs
      started being used in SRIOV and on architectures that have 128/256-byte
      cache-lines (such as PPC) - e.g. after commit 77507aa2
      "net/mlx4_core: Enable CQE/EQE stride support".
      
      Fixes: 08ff3235 ('mlx4: 64-byte CQE/EQE support')
      Signed-off-by: Carol L Soto <clsoto@linux.vnet.ibm.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4_en: Explicitly set no vlan tags in WQE ctrl segment when no vlan is present · 092bf0fc
      Authored by Jack Morgenstein
      We do not set the ins_vlan field to zero when no vlan id is present in the packet.
      
      Since WQEs in the TX ring are not zeroed out between uses, this oversight
      could result in having vlan flags present in the WQE ctrl segment when no
      vlan is present.
      
      Fixes: e38af4fa ('net/mlx4_en: Add support for hardware accelerated 802.1ad vlan')
      Reported-by: Gideon Naim <gideonn@mellanox.com>
      Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • vhost: fix performance on LE hosts · e407f39a
      Authored by Michael S. Tsirkin
      commit 2751c988 ("vhost: cross-endian
      support for legacy devices") introduced a minor regression: even with
      cross-endian disabled, and even on an LE host, vhost_is_little_endian
      checks the is_le flag, so there's always a branch.
      
      To fix, simply check virtio_legacy_is_little_endian first.
      
      Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Greg Kurz <gkurz@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • amd-xgbe: Fix race between access of desc and desc index · 20986ed8
      Authored by Lendacky, Thomas
      During Tx cleanup it's still possible for the descriptor data to be
      read ahead of the descriptor index. A memory barrier is required between
      the read of the descriptor index and the start of the Tx cleanup loop.
      This allows a change to a lighter-weight barrier in the Tx transmit
      routine just before updating the current descriptor index.
      
      Since the memory barrier does result in extra overhead on arm64, keep
      the previous change to not chase the current descriptor value. This
      prevents the execution of the barrier for each loop performed.
      Suggested-by: Alexander Duyck <alexander.duyck@gmail.com>
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • forcedeth: fix unilateral interrupt disabling in netpoll path · 0b7c8743
      Authored by Neil Horman
      Forcedeth currently uses disable_irq_lockdep and enable_irq_lockdep, which in
      some configurations simply calls local_irq_disable.  This causes errant warnings
      in the netpoll path as in netpoll_send_skb_on_dev, where we disable irqs using
      local_irq_save, leading to the following warning:
      
      WARNING: at net/core/netpoll.c:352 netpoll_send_skb_on_dev+0x243/0x250() (Not
      tainted)
      Hardware name:
      netpoll_send_skb_on_dev(): eth0 enabled interrupts in poll
      (nv_start_xmit_optimized+0x0/0x860 [forcedeth])
      Modules linked in: netconsole(+) configfs ipv6 iptable_filter ip_tables ppdev
      parport_pc parport sg microcode serio_raw edac_core edac_mce_amd k8temp
      snd_hda_codec_realtek snd_hda_codec_generic forcedeth snd_hda_intel
      snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore
      snd_page_alloc i2c_nforce2 i2c_core shpchp ext4 jbd2 mbcache sr_mod cdrom sd_mod
      crc_t10dif pata_amd ata_generic pata_acpi sata_nv dm_mirror dm_region_hash
      dm_log dm_mod [last unloaded: scsi_wait_scan]
      Pid: 1940, comm: modprobe Not tainted 2.6.32-573.7.1.el6.x86_64.debug #1
      Call Trace:
       [<ffffffff8107bbc1>] ? warn_slowpath_common+0x91/0xe0
       [<ffffffff8107bcc6>] ? warn_slowpath_fmt+0x46/0x60
       [<ffffffffa00fe5b0>] ? nv_start_xmit_optimized+0x0/0x860 [forcedeth]
       [<ffffffff814b3593>] ? netpoll_send_skb_on_dev+0x243/0x250
       [<ffffffff814b37c9>] ? netpoll_send_udp+0x229/0x270
       [<ffffffffa02e3299>] ? write_msg+0x39/0x110 [netconsole]
       [<ffffffffa02e331b>] ? write_msg+0xbb/0x110 [netconsole]
       [<ffffffff8107bd55>] ? __call_console_drivers+0x75/0x90
       [<ffffffff8107bdba>] ? _call_console_drivers+0x4a/0x80
       [<ffffffff8107c445>] ? release_console_sem+0xe5/0x250
       [<ffffffff8107d200>] ? register_console+0x190/0x3e0
       [<ffffffffa02e71a6>] ? init_netconsole+0x1a6/0x216 [netconsole]
       [<ffffffffa02e7000>] ? init_netconsole+0x0/0x216 [netconsole]
       [<ffffffff810020d0>] ? do_one_initcall+0xc0/0x280
       [<ffffffff810d4933>] ? sys_init_module+0xe3/0x260
       [<ffffffff8100b0d2>] ? system_call_fastpath+0x16/0x1b
      ---[ end trace f349c7af88e6a6d5 ]---
      console [netcon0] enabled
      netconsole: network logging started
      
      Fix it by modifying the forcedeth code to use
      disable_irq_nosync_lockdep_irqsave instead, which saves and restores
      irq state properly.  This also saves us a little code in the process.
      
      Tested by the reporter, with successful results.
      
      Patch applies to the head of the net tree.
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      CC: "David S. Miller" <davem@davemloft.net>
      Reported-by: Vasily Averin <vvs@sw.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 27 Oct 2015, 9 commits
  5. 26 Oct 2015, 7 commits
  6. 24 Oct 2015, 9 commits
    • md/raid10: fix the 'new' raid10 layout to work correctly. · 8bce6d35
      Authored by NeilBrown
      In Linux 3.9 we introduced a new 'far' layout for RAID10 which was
      supposed to rotate the replicas differently and so provide better
      resilience.  In particular it could survive more combinations of 2
      drive failures.
      
      Unfortunately, due to a coding error, this sometimes did what was
      wanted, sometimes improved less than we hoped, and sometimes - in very
      unlikely circumstances - put multiple replicas on the same device so
      the redundancy was harmed.
      
      No public user-space tool has created arrays using this layout so it
      is very unlikely that zero-redundancy arrays actually exist.  Probably
      no arrays using any form of the new layout exist.  But we cannot be
      certain.
      
      So use another bit in the 'layout' number and introduce a bug-fixed
      version of the layout.
      Also when assembling an array, if it has a zero-redundancy layout,
      give a warning.
      Reported-by: Heinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md/raid10: don't clear bitmap bit when bad-block-list write fails. · c340702c
      Authored by NeilBrown
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK to clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either that data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe.
      i.e. move the call to close_write() until just before
      calling bio_endio(), and recheck the 'is array degraded'
      status before making that call.
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 95af587e ("md/raid10: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R10BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-by: Nate Dailey <nate.dailey@stratus.com>
      Fixes: bd870a16 ("md/raid10:  Handle write errors by updating badblock log.")
      Signed-off-by: NeilBrown <neilb@suse.com>
    • md/raid1: don't clear bitmap bit when bad-block-list write fails. · bd8688a1
      Authored by NeilBrown
      When a write fails and a bad-block-list is present, we can
      update the bad-block-list instead of writing the data.  If
      this succeeds then it is OK to clear the relevant bitmap-bit as
      no further 'sync' of the block is needed.
      
      However if writing the bad-block-list fails then we need to
      treat the write as failed and particularly must not clear
      the bitmap bit.  Otherwise the device can be re-added (after
      any hardware connection issues are resolved) and because the
      relevant bit in the bitmap is clear, that block will not be
      resynced.  This leads to data corruption.
      
      We already delay the final bio_endio() on the write until
      the bad-block-list is written so that when the write
      returns: either that data is safe, the bad-block record is
      safe, or the fact that the device is faulty is safe.
      However we *don't* delay the clearing of the bitmap, so the
      bitmap bit can be recorded as cleared before we know if the
      bad-block-list was written safely.
      
      So: delay that until the write really is safe.
      i.e. move the call to close_write() until just before
      calling bio_endio(), and recheck the 'is array degraded'
      status before making that call.
      
      This bug goes back to v3.1 when bad-block-lists were
      introduced, though it only affects arrays created with
      mdadm-3.3 or later as only those have bad-block lists.
      
      Backports will require at least
      Commit: 55ce74d4 ("md/raid1: ensure device failure recorded before write request returns.")
      as well.  I'll send that to 'stable' separately.
      
      Note that of the two tests of R1BIO_WriteError that this
      patch adds, the first is certain to fail and the second is
      certain to succeed.  However doing it this way makes the
      patch more obviously correct.  I will tidy the code up in a
      future merge window.
      Reported-and-tested-by: Nate Dailey <nate.dailey@stratus.com>
      Cc: Jes Sorensen <Jes.Sorensen@redhat.com>
      Fixes: cd5ff9a1 ("md/raid1:  Handle write errors by updating badblock log.")
      Signed-off-by: NeilBrown <neilb@suse.com>
    • i2c: pnx: fix runtime warnings caused by enabling unprepared clock · 5dd32eae
      Authored by Vladimir Zapolskiy
      The driver can not be used on a platform with common clock framework
      until clk_prepare/clk_unprepare calls are added, otherwise clk_enable
      calls will fail and a WARN is generated.
      Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
    • dm cache: the CLEAN_SHUTDOWN flag was not being set · 3201ac45
      Authored by Joe Thornber
      If the CLEAN_SHUTDOWN flag is not set when a cache is loaded then all cache
      blocks are marked as dirty and a full writeback occurs.
      
      __commit_transaction() is responsible for setting/clearing
      CLEAN_SHUTDOWN (based on the flags_mutator that is passed in).
      
      Fix this issue, of the cache's on-disk flags being wrong, by making sure
      __commit_transaction() does not reset the flags after the mutator has
      altered the flags in preparation for them being serialized to disk.
      
      before:
      
      sb_flags = mutator(le32_to_cpu(disk_super->flags));
      disk_super->flags = cpu_to_le32(sb_flags);
      disk_super->flags = cpu_to_le32(cmd->flags);
      
      after:
      
      disk_super->flags = cpu_to_le32(cmd->flags);
      sb_flags = mutator(le32_to_cpu(disk_super->flags));
      disk_super->flags = cpu_to_le32(sb_flags);
      Reported-by: Bogdan Vasiliev <bogdan.vasiliev@gmail.com>
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • dm btree: fix leak of bufio-backed block in btree_split_beneath error path · 4dcb8b57
      Authored by Mike Snitzer
      btree_split_beneath()'s error path had an outstanding FIXME that speaks
      directly to the potential for _not_ cleaning up a previously allocated
      bufio-backed block.
      
      Fix this by releasing the previously allocated bufio block using
      unlock_block().
      Reported-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Joe Thornber <thornber@redhat.com>
      Cc: stable@vger.kernel.org
    • dm btree remove: fix a bug when rebalancing nodes after removal · 2871c69e
      Authored by Joe Thornber
      Commit 4c7e3093 ("dm btree remove: fix bug in redistribute3") wasn't
      a complete fix for redistribute3().
      
      The redistribute3 function takes 3 btree nodes and shares out the entries
      evenly between them.  If the three nodes in total contained
      (MAX_ENTRIES * 3) - 1 entries between them then this was erroneously getting
      rebalanced as (MAX_ENTRIES - 1) on the left and right, and (MAX_ENTRIES + 1) in
      the center.
      
      Fix this issue by being more careful about calculating the target number
      of entries for the left and right nodes.
      
      Unit tested in userspace using this program:
      https://github.com/jthornber/redistribute3-test/blob/master/redistribute3_t.c
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
    • rbd: prevent kernel stack blow up on rbd map · 6d69bb53
      Authored by Ilya Dryomov
      Mapping an image with a long parent chain (e.g. image foo, whose parent
      is bar, whose parent is baz, etc) currently leads to a kernel stack
      overflow, due to the following recursion in the reply path:
      
        rbd_osd_req_callback()
          rbd_obj_request_complete()
            rbd_img_obj_callback()
              rbd_img_parent_read_callback()
                rbd_obj_request_complete()
                  ...
      
      Limit the parent chain to 16 images, which is ~5K worth of stack.  When
      the above recursion is eliminated, this limit can be lifted.
      
      Fixes: http://tracker.ceph.com/issues/12538
      
      Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Josh Durgin <jdurgin@redhat.com>
    • rbd: don't leak parent_spec in rbd_dev_probe_parent() · 1f2c6651
      Authored by Ilya Dryomov
      Currently we leak parent_spec and trigger a "parent reference
      underflow" warning if rbd_dev_create() in rbd_dev_probe_parent() fails.
      The problem is we take the !parent out_err branch and that only drops
      refcounts; parent_spec that would've been freed had we called
      rbd_dev_unparent() remains and triggers rbd_warn() in
      rbd_dev_parent_put() - at that point we have parent_spec != NULL and
      parent_ref == 0, so counter ends up being -1 after the decrement.
      
      Redo rbd_dev_probe_parent() to fix this.
      
      Cc: stable@vger.kernel.org # 3.10+, needs backporting for < 4.2
      Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
      Reviewed-by: Alex Elder <elder@linaro.org>
  7. 23 Oct 2015, 5 commits