Commits · e4648b014e03baee45d5f5146c1219b19e4e5f2f · openanolis / cloud-kernel

31 May, 2015 1 commit

iser-target: Fix error path in isert_create_pi_ctx() · b2feda4f

Authored by Roland Dreier on May 29, 2015

We don't assign pi_ctx to desc->pi_ctx until we're certain to succeed
in the function.  That means the cleanup path should use the local
pi_ctx variable, not desc->pi_ctx.

This was detected by Coverity (CID 1260062).
Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>

b2feda4f
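
The cleanup pattern described in this commit can be illustrated with a minimal user-space sketch. The types `struct pi_ctx` and `struct fr_desc`, the helper `alloc_resource()`, and the `fail_sig_mr` parameter are hypothetical stand-ins rather than the driver's actual code. The sketch only shows why the error labels must free through the local pointer: `desc->pi_ctx` is not assigned until the function is certain to succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's structures. */
struct pi_ctx {
    void *prot_mr;
    void *sig_mr;
};

struct fr_desc {
    struct pi_ctx *pi_ctx;   /* published only on success */
};

/* Hypothetical allocator that can be forced to fail for demonstration. */
static void *alloc_resource(int should_fail)
{
    return should_fail ? NULL : malloc(16);
}

static int create_pi_ctx(struct fr_desc *desc, int fail_sig_mr)
{
    struct pi_ctx *pi_ctx;
    int ret;

    assert(desc != NULL);

    pi_ctx = calloc(1, sizeof(*pi_ctx));
    if (!pi_ctx)
        return -ENOMEM;

    pi_ctx->prot_mr = alloc_resource(0);
    if (!pi_ctx->prot_mr) {
        ret = -ENOMEM;
        goto err_pi_ctx;
    }

    pi_ctx->sig_mr = alloc_resource(fail_sig_mr);
    if (!pi_ctx->sig_mr) {
        ret = -ENOMEM;
        goto err_prot_mr;
    }

    /* Only now is the context visible through the descriptor. */
    desc->pi_ctx = pi_ctx;
    return 0;

err_prot_mr:
    /* Use the local pointer: desc->pi_ctx has not been assigned yet. */
    free(pi_ctx->prot_mr);
err_pi_ctx:
    free(pi_ctx);
    return ret;
}
```

Calling `create_pi_ctx(&desc, 1)` on a zero-initialized descriptor returns `-ENOMEM` and leaves `desc.pi_ctx` as `NULL`, so an error path that dereferenced `desc->pi_ctx` would have followed a null pointer.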

18 Apr, 2015 1 commit

IB/ipoib: Fix ndo_get_iflink · 2c153959

Authored by Erez Shitrit on Apr 16, 2015

Currently, the iflink of the parent interface is always accessed, even
when the interface has no parent, and hence we crash there.

Handle the interface types properly: for a child interface, return
the ifindex of the parent; for a parent interface, return its own ifindex.

For child devices, make sure to set the parent pointer prior to
invoking register_netdevice(); this allows the new ndo to be called
by the stack immediately after the child device is registered.

Fixes: 5aa7add8 ('infiniband/ipoib: implement ndo_get_iflink')
Reported-by: Honggang Li <honli@redhat.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Honggang Li <honli@redhat.com>
Reviewed-By: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2c153959
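
A minimal sketch of the parent/child rule from this commit, assuming a simplified `struct netdev` with only an `ifindex` and a `parent` pointer. Both the struct and the helpers `get_iflink()` and `register_child()` are hypothetical and do not reproduce the IPoIB driver's code. The second helper models the ordering requirement: the parent pointer is set before the point where the stack may query the link index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified device: a parent has parent == NULL. */
struct netdev {
    int ifindex;
    struct netdev *parent;
};

/* Child: report the parent's ifindex. Parent: report its own. */
static int get_iflink(const struct netdev *dev)
{
    assert(dev != NULL);
    if (dev->parent == NULL)
        return dev->ifindex;
    return dev->parent->ifindex;
}

/*
 * Set the parent pointer before "registering" the child. The call to
 * get_iflink() stands in for the stack invoking the callback right
 * after registration.
 */
static int register_child(struct netdev *child, struct netdev *parent,
                          int ifindex)
{
    child->parent = parent;
    child->ifindex = ifindex;
    return get_iflink(child);
}
```

With a parent whose `ifindex` is 5, `register_child(&child, &parent, 9)` returns 5, while `get_iflink(&parent)` returns the parent's own index without touching a null parent pointer.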

17 Apr, 2015 1 commit

cxgb4: drop __GFP_NOFAIL allocation · f72f116a

Authored by Michal Hocko on Apr 14, 2015

set_filter_wr requests a __GFP_NOFAIL allocation although it can obviously
return ENOMEM without any problems (t4_l2t_set_switching does that
already).  So the non-failing requirement is too strong without any
obvious reason.  Drop __GFP_NOFAIL and reorganize the code to make the
failure paths easier.

The same applies to _c4iw_write_mem_dma_aligned, which uses __GFP_NOFAIL
and then checks the return value and returns -ENOMEM on failure.  This
doesn't make any sense whatsoever.  Either the allocation cannot fail or
it can.

del_filter_wr seems to be safe as well because the filter entry is not
marked as pending and the return value is propagated up the stack up to
c4iw_destroy_listen.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dave Chinner <david@fromorbit.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Hariprasad S <hariprasad@chelsio.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

f72f116a

16 Apr, 2015 37 commits

IB/iser: Rewrite bounce buffer code path · ba943fb2

Authored by Sagi Grimberg on Apr 14, 2015

In some rare cases, IO operations may not be aligned to page
boundaries. This prevents iser from performing fast memory
registration. To overcome that, iser uses a bounce
buffer to carry the transaction. We basically allocate a buffer
the size of the transaction and perform a copy.

The buffer allocation using kmalloc is too restrictive since it
requires higher order (atomic) allocations for large transactions
(which may result in memory exhaustion fairly fast for some workloads).
We rewrite the bounce buffer code path to allocate scattered pages
and perform a copy between the transaction sg and the bounce sg.
Reported-by: Alex Lyakas <alex@zadarastorage.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ba943fb2
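
The idea of copying between a transaction scatterlist and a bounce scatterlist made of separately allocated pieces can be sketched in user space. This is an illustration under stated assumptions, not the iser implementation: `struct seg` and `copy_segs()` are hypothetical names, and plain byte buffers stand in for pages. The loop walks both lists at once and copies the largest run that fits in the current source and destination segments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scatter segment: a buffer and its length in bytes. */
struct seg {
    unsigned char *buf;
    size_t len;
};

/*
 * Copy from a source segment list into a destination segment list whose
 * segment boundaries need not match. Returns the number of bytes copied.
 */
static size_t copy_segs(struct seg *dst, size_t ndst,
                        const struct seg *src, size_t nsrc)
{
    size_t di = 0, si = 0;      /* current segment indexes */
    size_t doff = 0, soff = 0;  /* offsets inside current segments */
    size_t total = 0;

    assert(dst != NULL && src != NULL);

    while (di < ndst && si < nsrc) {
        size_t dleft = dst[di].len - doff;
        size_t sleft = src[si].len - soff;
        size_t n = dleft < sleft ? dleft : sleft;

        memcpy(dst[di].buf + doff, src[si].buf + soff, n);
        total += n;
        doff += n;
        soff += n;

        /* Advance whichever side has been fully consumed. */
        if (doff == dst[di].len) {
            di++;
            doff = 0;
        }
        if (soff == src[si].len) {
            si++;
            soff = 0;
        }
    }
    return total;
}
```

Because no single contiguous buffer of the full transaction size is needed, each destination segment can be small (one page in the scenario the commit describes), which is the property the commit message is after.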

IB/iser: Bump version to 1.6 · 4fcd1470

Authored by Sagi Grimberg on Apr 14, 2015

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

4fcd1470

IB/iser: Remove code duplication for a single DMA entry · ad1e5672

Authored by Sagi Grimberg on Apr 14, 2015

In singleton scatterlists, DMA memory registration code
is taken both for Fastreg and FMR code paths. Move it to
a function.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ad1e5672

IB/iser: Pass struct iser_mem_reg to iser_fast_reg_mr and iser_reg_sig_mr · 6ef8bb83

Authored by Sagi Grimberg on Apr 14, 2015

Instead of passing ib_sge as output variable, we pass the mem_reg
pointer to have the routines fill the rkey as well. This reduces
code duplication and extra assignments. This is a preparation step
to unify some registration logics together. Also, pass iser_fast_reg_mr
the fastreg descriptor directly.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

6ef8bb83

IB/iser: Modify struct iser_mem_reg members · 90a6684c

Authored by Sagi Grimberg on Apr 14, 2015

No need to keep lkey, va, len variables, we can keep
them as struct ib_sge. This will help when we change the
memory registration logic.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

90a6684c

IB/iser: Make fastreg pool cache friendly · 8b95aa2c

Authored by Sagi Grimberg on Apr 14, 2015

Memory regions are resources that are saved
in the device caches. Increase the probability for
a cache hit by adding the MRU descriptor to pool
head.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

8b95aa2c
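
One way to picture "adding the MRU descriptor to pool head" is a pool that behaves as a stack: the descriptor returned most recently is the next one handed out. The sketch below uses a hypothetical singly linked `struct desc_pool` with `pool_get()` and `pool_put()` helpers; it is not the driver's list handling, only an illustration of the reuse order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor and pool, linked through 'next'. */
struct desc {
    int id;
    struct desc *next;
};

struct desc_pool {
    struct desc *head;
};

/* Take the descriptor at the head of the pool, or NULL if empty. */
static struct desc *pool_get(struct desc_pool *pool)
{
    struct desc *d;

    assert(pool != NULL);
    d = pool->head;
    if (d != NULL) {
        pool->head = d->next;
        d->next = NULL;
    }
    return d;
}

/* Return a descriptor to the head, so the most recently used one is reused first. */
static void pool_put(struct desc_pool *pool, struct desc *d)
{
    assert(pool != NULL && d != NULL);
    d->next = pool->head;
    pool->head = d;
}
```

With three descriptors in the pool, a get/put/get sequence hands out the same descriptor twice, which is the behavior that raises the chance of its memory region still being in the device cache. Appending to the tail instead would cycle through every descriptor before reusing one.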

IB/iser: Move PI context alloc/free to routines · 4dec2a27

Authored by Sagi Grimberg on Apr 14, 2015

Make iser_[create|destroy]_fastreg_desc shorter, more
readable and easily extendable.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

4dec2a27

IB/iser: Move fastreg descriptor pool get/put to helper functions · bd8b944e

Authored by Sagi Grimberg on Apr 14, 2015

Instead of open-coding connection fastreg pool get/put,
we introduce iser_reg_desc[get|put] helpers.

We aren't setting these static as this will be a per-device
routine later on. Also, cleanup iser_unreg_rdma_mem_fastreg
a bit.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

bd8b944e

IB/iser: Merge build page-vec into register page-vec · f0e35c27

Authored by Sagi Grimberg on Apr 14, 2015

No need to keep these two separate. Keep it in a single routine
like in the fastreg case. This will also make iser_reg_page_vec
closer to iser_fast_reg_mr arguments. This is a preparation
step for registration flow refactor.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

f0e35c27

IB/iser: Get rid of struct iser_rdma_regd · b130eded

Authored by Sagi Grimberg on Apr 14, 2015

This struct's members other than struct iser_mem_reg are unused,
so remove it altogether.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

b130eded

IB/iser: Remove redundant assignments in iser_reg_page_vec · 6847fdeb

Authored by Sagi Grimberg on Apr 14, 2015

Buffer length was assigned twice, and no reason to set va to
io_addr and then add the offset, just set va to io_addr + offset.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

6847fdeb

IB/iser: Move memory reg/dereg routines to iser_memory.c · d03e61d0

Authored by Sagi Grimberg on Apr 14, 2015

These are memory registration/de-registration methods, so let's
move them to their natural location. While we're at it,
make the iser_reg_page_vec routine static.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d03e61d0

IB/iser: Don't pass ib_device to fall_to_bounce_buff routine · 56408325

Authored by Sagi Grimberg on Apr 14, 2015

No need to pass that, we can take it from the task.
In a later stage, this function will be invoked
according to a device capability.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

56408325

IB/iser: Remove a redundant struct iser_data_buf · e3784bd1

Authored by Sagi Grimberg on Apr 14, 2015

No need to keep two iser_data_buf structures just in case we use
mem copy. We can avoid that just by adding a pointer to the original
sg. So keep only two iser_data_buf per command (data and protection)
and pass the relevant data_buf to bounce buffer routine.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

e3784bd1

IB/iser: Remove redundant cmd_data_len calculation · ecc3993a

Authored by Sagi Grimberg on Apr 14, 2015

This code was added before we had protection data length
calculation (in iser_send_command), so we needed to calc
the sg data length from the sg itself. This is not needed
anymore.

This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Adir Lev <adirl@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ecc3993a

IB/iser: Fix wrong calculation of protection buffer length · a065fe6a

Authored by Sagi Grimberg on Apr 14, 2015

This length miscalculation may cause silent data corruption
in the DIX case and cause the device to reference an unmapped area.

Fixes: d77e6535 ('libiscsi, iser: Adjust data_length to include protection information')
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a065fe6a

IB/iser: Handle fastreg/local_inv completion errors · 30bf1d58

Authored by Sagi Grimberg on Apr 14, 2015

Fast registration and local invalidate work requests can
also fail. We should call error completion handler for them.
Reported-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

30bf1d58

IB/iser: Fix unload during ep_poll wrong dereference · c4de4663

Authored by Sagi Grimberg on Apr 14, 2015

In case the user unloaded ib_iser while ep_connect is in
progress, we need to destroy the endpoint although ep_disconnect
wasn't invoked (we detect this by the iser conn state != DOWN).
However, if we got a REJECTED/UNREACHABLE CM event we move the
connection state to DOWN which will prevent us from destroying
the endpoint in the module unload stage. Fix this by setting the
connection state to TERMINATING in iser_conn_error so we can still
destroy the endpoint at unload stage.
Reported-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

c4de4663

ib_srpt: convert printk's to pr_* functions · 9f5d32af

Authored by Doug Ledford on Oct 20, 2014

The driver already defined the pr_format, it just hadn't
been converted to use pr_info, pr_warn, and pr_err instead
of the equivalent printks.  Convert so that messages from
the driver are now properly tagged with their driver name
and can be more easily debugged.

In addition, a number of these printk's were not newline
terminated, so fix that at the same time.
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>

9f5d32af

IB/srp: Use P_Key cache for P_Key lookups · 56b5390c

Authored by Bart Van Assche on Jul 09, 2014

This change slightly reduces the time needed to log in.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: David Dillow <dave@thedillows.org>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

56b5390c

infiniband/mlx4: check for mapping error · cc47d369

Authored by Sebastian Ott on Mar 16, 2015

Since ib_dma_map_single can fail, use ib_dma_mapping_error to check
for errors.
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Doug Ledford <dledford@redhat.com>

cc47d369

ib_uverbs: Fix pages leak when using XRC SRQs · a233c4b5

Authored by Sébastien Dugué on Apr 09, 2015

Hello,

  When an application using XRCs abruptly terminates, the mmaped pages
of the CQ buffers are leaked.

  This comes from the fact that when resources are released in
ib_uverbs_cleanup_ucontext(), we fail to release the CQs because their
refcount is not 0.

  When creating an XRC SRQ, we increment the associated CQ refcount.
This refcount is only decremented when the SRQ is released.

  Therefore we need to release the SRQs prior to the CQs to make sure
that all references to the CQs are gone before trying to release these.
Signed-off-by: Sebastien Dugue <sebastien.dugue@bull.net>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a233c4b5
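
The release-order dependency described in this commit can be modeled with a small reference-counting sketch. The types `struct cq` and `struct srq` and the helpers below are hypothetical simplifications, not the uverbs code. They only show that a CQ with a nonzero refcount cannot be released, and that releasing the SRQ first drops the reference.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical completion queue with a reference count. */
struct cq {
    int refcnt;
    int released;
};

/* Hypothetical shared receive queue holding a reference on a CQ. */
struct srq {
    struct cq *cq;
    int released;
};

/* Creating the SRQ takes a reference on the associated CQ. */
static void srq_attach(struct srq *srq, struct cq *cq)
{
    assert(srq != NULL && cq != NULL);
    srq->cq = cq;
    srq->released = 0;
    cq->refcnt++;
}

/* Releasing the SRQ is the only place the CQ reference is dropped. */
static void srq_release(struct srq *srq)
{
    assert(srq->cq->refcnt > 0);
    srq->cq->refcnt--;
    srq->released = 1;
}

/* A CQ that is still referenced cannot be released. */
static int cq_release(struct cq *cq)
{
    if (cq->refcnt != 0)
        return -EBUSY;
    cq->released = 1;
    return 0;
}
```

Releasing in the order CQ then SRQ leaves the CQ behind with `-EBUSY`, which models the leak; releasing the SRQ first lets the CQ release succeed.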

IB/mlx4: Fix WQE LSO segment calculation · ca9b590c

Authored by Erez Shitrit on Apr 02, 2015

The current code subtracts the size of the packet headers from the mss
(which is the gso_size from the kernel skb).

It shouldn't do that because the mss that comes from the stack
(e.g. IPoIB) includes only the tcp payload without the headers.

The result is an indication to the HW that each packet the HW sends
is smaller than it could be, so too many packets are sent
for big messages.

An easy way to demonstrate one more aspect of the problem is to
configure the ipoib mtu to be less than 2*hlen (2*56) and then
run an app sending big TCP messages. This tells the HW to send packets
with a giant length (a negative value which under unsigned arithmetic
becomes a huge positive one) and the QP moves to SQE state.

Fixes: b832be1e ('IB/mlx4: Add IPoIB LSO support')
Reported-by: Matthew Finlay <matt@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

ca9b590c
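
The arithmetic behind both symptoms can be checked with a short sketch. The function names and the sample values (an mss of 2000 or 40 bytes, a 64000-byte message) are illustrative assumptions. Only the header length of 56 comes from the commit message, and the code does not reproduce the mlx4 WQE-building logic.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy variant: subtracts the header length from an mss that
 * already excludes the headers. */
static uint32_t seg_size_buggy(uint32_t mss, uint32_t hlen)
{
    return mss - hlen;   /* wraps around when mss < hlen */
}

/* Fixed variant: the mss from the stack is used as is. */
static uint32_t seg_size_fixed(uint32_t mss, uint32_t hlen)
{
    (void)hlen;
    return mss;
}

/* Number of packets needed to carry 'payload' bytes. */
static uint32_t num_segments(uint32_t payload, uint32_t seg)
{
    assert(seg > 0);
    return (payload + seg - 1) / seg;
}
```

For a 64000-byte message with an mss of 2000, the fixed size gives 32 packets while the buggy size of 1944 gives 33. With an mss of 40, smaller than the 56-byte header length, `seg_size_buggy(40, 56)` wraps to 4294967280, the "giant" length mentioned above.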

IB/ipoib: Remove IPOIB_MCAST_RUN bit · 0e5544d9

Authored by Erez Shitrit on Apr 02, 2015

After Doug Ledford's changes there is no need for that bit; its
semantics become a subset of the IPOIB_FLAG_OPER_UP bit.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

0e5544d9

IB/ipoib: Save only IPOIB_MAX_PATH_REC_QUEUE skb's · 1e85b806

Authored by Erez Shitrit on Apr 02, 2015

Whenever there is no path->ah to the destination, keep only a defined
number of skb's. Otherwise there are cases where the driver can keep
an unbounded list of skb's.

For example, when a device wants to send unicast arp to the destination
and for some reason the SM doesn't respond, the driver currently keeps
all the skb's. If that unicast arp traffic stops, all these skb's
are kept by the path object until the interface goes down.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

1e85b806
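
A bounded pending queue of this kind can be sketched as follows. The limit value, `struct path_queue`, and `path_queue_add()` are hypothetical. The sketch stores integers instead of skb's and only shows the "queue up to a limit, otherwise drop and count" behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical limit chosen for the example. */
#define MAX_PATH_REC_QUEUE 3

/* Hypothetical per-path queue of packets waiting for path resolution. */
struct path_queue {
    int pkts[MAX_PATH_REC_QUEUE];
    size_t len;
    unsigned long dropped;
};

/*
 * Queue the packet if there is room; otherwise drop it and count the drop.
 * Returns 1 if queued, 0 if dropped.
 */
static int path_queue_add(struct path_queue *q, int pkt)
{
    assert(q != NULL);
    if (q->len < MAX_PATH_REC_QUEUE) {
        q->pkts[q->len++] = pkt;
        return 1;
    }
    q->dropped++;
    return 0;
}
```

Feeding ten packets into an empty queue leaves three queued and seven counted as dropped, so the memory held per unresolved path stays bounded however long the SM stays silent.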

IB/ipoib: Handle QP in SQE state · 2c010730

Authored by Erez Shitrit on Apr 02, 2015

As the result of a completion error, the QP can be moved to SQE state by
the hardware. Since it's not the Error state, there are no flushes
and hence the driver doesn't know about that.

The fix creates a task that, after a completion with an error which is not a
flush, checks the QP state and, if it is in SQE state, moves it back to RTS.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

2c010730

IB/ipoib: Update broadcast record values after each successful join request · 3fd0605c

Authored by Erez Shitrit on Apr 02, 2015

Update the cached broadcast record in the priv object after every new
join of this broadcast domain group.

These values are needed for the port configuration (MTU size) and as the
initial parameters of all new multicast (non-broadcast) join requests.

For example, the SM starts with a 2K MTU for the whole fabric, and then
restarts (or hands over to a new SM) with a new port configuration of 4K MTU.
Without using the new values, the driver will keep its old configuration
of 2K and will not apply the new configuration of 4K.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

3fd0605c

IB/ipoib: Use one linear skb in RX flow · a44878d1

Authored by Erez Shitrit on Apr 02, 2015

The current code in the RX flow uses two sg entries for each incoming
packet: the first for the IB headers and the second for the rest
of the data. That causes two dma map/unmap operations, two allocations,
and a few more actions in the data path.

Use only one linear skb for each incoming packet, holding the data (IB
headers and payload). That reduces the packet processing in the
data path (only one skb, no frags, the first frag was not used anyway,
fewer memory allocations) and the dma handling (only one dma map/unmap
per incoming packet instead of two).

After commit 73d3fe6d ("gro: fix aggregation for skb using frag_list") from
Eric Dumazet, we will get full aggregation for large packets.

When running bandwidth tests before and after the change (over the card's numa node),
using "netperf -H 1.1.1.3 -T -t TCP_STREAM", the results are ~12Gbs before
and ~16Gbs after on my setup (Mellanox's ConnectX3).
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

a44878d1

IB/ipoib: drop mcast_mutex usage · 1c0453d6