Commits · d3fbff306c215946cdbcf9ace4d0b78e9f72b5c4 · openeuler / Kernel

14 Mar, 2017 · 4 commits

rds: ib: unmap the scatter/gather list when error · 569f41d1

Committed by Zhu Yanjun on Mar 13, 2017

When an error occurs after the scatter/gather list has been mapped to
DMA addresses, the list should be unmapped before returning.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

569f41d1
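The following is a minimal userspace sketch of the unwind rule this commit applies. It is not the rds_ib_map_fmr() code itself. The hypothetical helpers fake_dma_map_sg() and fake_dma_unmap_sg() only count mapped entries and stand in for the real ib_dma_map_sg() and ib_dma_unmap_sg(), so a missing unmap shows up as a leak:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for ib_dma_map_sg()/ib_dma_unmap_sg(): they only
 * track how many entries are currently mapped. */
static int mapped_entries;

static int fake_dma_map_sg(int nents)
{
	mapped_entries += nents;
	return nents;	/* number of DMA segments produced */
}

static void fake_dma_unmap_sg(int nents)
{
	mapped_entries -= nents;
}

/* Map a scatter/gather list, then validate it. Any failure after the
 * mapping succeeded must undo the mapping before returning. */
static int map_and_validate(int nents, bool aligned)
{
	int sg_dma_len;

	assert(nents > 0);
	sg_dma_len = fake_dma_map_sg(nents);
	if (sg_dma_len == 0)
		return -ENOMEM;

	if (!aligned) {
		/* The error path must release what was mapped above. */
		fake_dma_unmap_sg(nents);
		return -EINVAL;
	}
	return 0;	/* success: the caller now owns the mapping */
}
```

Returning -EINVAL for the validation failure is an assumption made for this sketch.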

rds: ib: add the static type to the function · edd08f96

Committed by Zhu Yanjun on Mar 13, 2017

The function rds_ib_map_fmr is used only in ib_fmr.c, so mark it
static to limit its scope to that file.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

edd08f96

rds: ib: remove redundant ib_dealloc_fmr · ea69c883

Committed by Zhu Yanjun on Mar 13, 2017

The call to ib_dealloc_fmr is never executed, so remove it.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ea69c883

rds: ib: drop unnecessary rdma_reject · b418c527

Committed by Zhu Yanjun on Mar 13, 2017

When rdma_accept fails, it already calls rdma_reject internally, so
there is no need to call rdma_reject again.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b418c527

10 Mar, 2017 · 2 commits

net: Work around lockdep limitation in sockets that use sockets · cdfbabfb

Committed by David Howells on Mar 09, 2017

Lockdep issues a circular dependency warning when AFS issues an operation
through AF_RXRPC from a context in which the VFS/VM holds the mmap_sem.

The theory lockdep comes up with is as follows:

 (1) If the pagefault handler decides it needs to read pages from AFS, it
     calls AFS with mmap_sem held and AFS begins an AF_RXRPC call, but
     creating a call requires the socket lock:

	mmap_sem must be taken before sk_lock-AF_RXRPC

 (2) afs_open_socket() opens an AF_RXRPC socket and binds it.  rxrpc_bind()
     binds the underlying UDP socket whilst holding its socket lock.
     inet_bind() takes its own socket lock:

	sk_lock-AF_RXRPC must be taken before sk_lock-AF_INET

 (3) Reading from a TCP socket into a userspace buffer might cause a fault
     and thus cause the kernel to take the mmap_sem, but the TCP socket is
     locked whilst doing this:

	sk_lock-AF_INET must be taken before mmap_sem

However, lockdep's theory is wrong in this instance because it deals only
with lock classes and not individual locks.  The AF_INET lock in (2) isn't
really equivalent to the AF_INET lock in (3) as the former deals with a
socket entirely internal to the kernel that never sees userspace.  This is
a limitation in the design of lockdep.

Fix the general case by:

 (1) Double up all the locking keys used in sockets so that one set are
     used if the socket is created by userspace and the other set is used
     if the socket is created by the kernel.

 (2) Store the kern parameter passed to sk_alloc() in a variable in the
     sock struct (sk_kern_sock).  This informs sock_lock_init(),
     sock_init_data() and sk_clone_lock() as to the lock keys to be used.

     Note that the child created by sk_clone_lock() inherits the parent's
     kern setting.

 (3) Add a 'kern' parameter to ->accept() that is analogous to the one
     passed in to ->create() that distinguishes whether kernel_accept() or
     sys_accept4() was the caller and can be passed to sk_alloc().

     Note that a lot of accept functions merely dequeue an already
     allocated socket.  I haven't touched these as the new socket already
     exists before we get the parameter.

     Note also that there are a couple of places where I've made the accepted
     socket unconditionally kernel-based:

	irda_accept()
	rds_tcp_accept_one()
	tcp_accept_from_sock()

     because they follow a sock_create_kern() and accept off of that.

Whilst creating this, I noticed that lustre and ocfs don't create sockets
through sock_create_kern() and thus they aren't marked as for-kernel,
though they appear to be internal.  I wonder if these should do that so
that they use the new set of lock keys.

Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cdfbabfb
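The lockdep theory above can be reproduced with a tiny class-level dependency graph. This is a hypothetical userspace sketch, not lockdep. The enum names are invented. Splitting the AF_INET class into a user class and a kernel class mirrors the idea of choosing lock keys from sk_kern_sock:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes for the scenario in the commit message. */
enum lock_class {
	MMAP_SEM,
	SK_LOCK_RXRPC,
	SK_LOCK_INET_USER,	/* AF_INET socket created by userspace */
	SK_LOCK_INET_KERN,	/* AF_INET socket created by the kernel */
	NR_CLASSES
};

/* edge[a][b] means "a was held while b was taken". */
static bool edge[NR_CLASSES][NR_CLASSES];

/* Depth-first search: can 'to' be reached from 'from'? */
static bool reaches(int from, int to, bool *seen)
{
	assert(from >= 0 && from < NR_CLASSES);
	if (from == to)
		return true;
	if (seen[from])
		return false;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (edge[from][next] && reaches(next, to, seen))
			return true;
	return false;
}

/* Record dependencies (1)-(3). With split_keys false, the kernel socket in
 * (2) shares the user AF_INET class, as before the fix. Returns true if a
 * class-only checker would report a cycle. */
static bool would_report_cycle(bool split_keys)
{
	int inet_in_bind = split_keys ? SK_LOCK_INET_KERN : SK_LOCK_INET_USER;

	for (int a = 0; a < NR_CLASSES; a++)
		for (int b = 0; b < NR_CLASSES; b++)
			edge[a][b] = false;

	edge[MMAP_SEM][SK_LOCK_RXRPC] = true;		/* (1) */
	edge[SK_LOCK_RXRPC][inet_in_bind] = true;	/* (2) */
	edge[SK_LOCK_INET_USER][MMAP_SEM] = true;	/* (3) */

	/* A cycle exists if some class can reach itself via a successor. */
	for (int c = 0; c < NR_CLASSES; c++) {
		for (int next = 0; next < NR_CLASSES; next++) {
			bool seen[NR_CLASSES] = { false };

			if (edge[c][next] && reaches(next, c, seen))
				return true;
		}
	}
	return false;
}
```

With a single AF_INET class, the three edges close a loop. With separate keys for kernel sockets, edge (2) points at a class that never leads back to mmap_sem.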

rds: ib: add error handle · 3b12f73a

Committed by Zhu Yanjun on Mar 07, 2017

rds_ib_setup_qp is missing error handling: when an error occurs, the
resources allocated so far can leak. Add the missing error handling.

Cc: Joe Jin <joe.jin@oracle.com>
Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
Reviewed-by: Guanglei Li <guanglei.li@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b12f73a
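Below is a minimal sketch of the goto-based unwind style that this kind of error handling usually follows in the kernel. The struct qp_resources type, setup_qp() and the fail_at test knob are hypothetical, and malloc() stands in for the real CQ and QP allocation calls:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical resources standing in for what a function like
 * rds_ib_setup_qp() allocates. */
struct qp_resources {
	void *send_cq;
	void *recv_cq;
	void *qp;
};

/* fail_at selects which allocation is forced to fail (0 = none). */
static int setup_qp(struct qp_resources *r, int fail_at)
{
	int ret = -ENOMEM;

	assert(r != NULL);
	r->send_cq = (fail_at == 1) ? NULL : malloc(64);
	if (!r->send_cq)
		goto out;

	r->recv_cq = (fail_at == 2) ? NULL : malloc(64);
	if (!r->recv_cq)
		goto free_send_cq;

	r->qp = (fail_at == 3) ? NULL : malloc(64);
	if (!r->qp)
		goto free_recv_cq;

	return 0;

	/* Unwind in the reverse order of allocation. */
free_recv_cq:
	free(r->recv_cq);
	r->recv_cq = NULL;
free_send_cq:
	free(r->send_cq);
	r->send_cq = NULL;
out:
	return ret;
}
```

Each label releases exactly what was set up before the failing step, so adding a new resource only needs one new label.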

08 Mar, 2017 · 3 commits

rds: tcp: Sequence teardown of listen and acceptor sockets to avoid races · b21dd450

Committed by Sowmini Varadhan on Mar 04, 2017

Commit a93d01f5 ("RDS: TCP: avoid bad page reference in
rds_tcp_listen_data_ready") added the function
rds_tcp_listen_sock_def_readable() to handle the case when a
partially set-up acceptor socket drops into rds_tcp_listen_data_ready().
However, if the listen socket (rtn->rds_tcp_listen_sock) is itself going
through a tear-down via rds_tcp_listen_stop(), the (*ready)() callback
will be NULL and we would hit a panic of the form
  BUG: unable to handle kernel NULL pointer dereference at   (null)
  IP:           (null)
   :
  ? rds_tcp_listen_data_ready+0x59/0xb0 [rds_tcp]
  tcp_data_queue+0x39d/0x5b0
  tcp_rcv_established+0x2e5/0x660
  tcp_v4_do_rcv+0x122/0x220
  tcp_v4_rcv+0x8b7/0x980
    :
In the above case, it is not fatal to encounter a NULL value for
ready; we should just drop the packet and let the flush of the
acceptor thread finish gracefully.

In general, the tear-down sequence for the listen() and accept() sockets
that is ensured by this commit is:
     rtn->rds_tcp_listen_sock = NULL; /* prevent any new accepts */
     In rds_tcp_listen_stop():
         serialize with, and prevent, further callbacks using lock_sock()
         flush rds_wq
         flush acceptor workq
         sock_release(listen socket)

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b21dd450
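Here is a hedged userspace model of the NULL check described above. The struct sock_model type and its helpers are hypothetical, and any locking that the real data-ready path performs is omitted:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified listen socket whose saved data-ready
 * callback is cleared during teardown. */
struct sock_model {
	void (*saved_ready)(struct sock_model *sk);
	int delivered;
	int dropped;
};

static void default_ready(struct sock_model *sk)
{
	sk->delivered++;
}

static void listen_data_ready(struct sock_model *sk)
{
	void (*ready)(struct sock_model *);

	assert(sk != NULL);
	ready = sk->saved_ready;
	/* During teardown the callback may already be gone: drop the packet
	 * instead of calling through a NULL pointer. */
	if (!ready) {
		sk->dropped++;
		return;
	}
	ready(sk);
}
```

Reading the pointer once into a local keeps the check and the call consistent with each other within this function.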

rds: tcp: Reorder initialization sequence in rds_tcp_init to avoid races · 16c09b1c

Committed by Sowmini Varadhan on Mar 04, 2017

The initialization in rds_tcp_init must be ordered so that resources
are set up and destroyed in the correct sequence relative to both the
data path and the netns create/destroy path. Specifically:

- we must call register_pernet_subsys and get the rds_tcp_netid
  before calling register_netdevice_notifier, otherwise we risk
  the sequence
    1. register_netdevice_notifier sets up the netdev notifier callback
    2. rds_tcp_dev_event -> rds_tcp_kill_sock uses netid 0, and finds
       the wrong rtn, resulting in a panic of the form:

  BUG: unable to handle kernel NULL pointer dereference at 000000000000000d
  IP: rds_tcp_kill_sock+0x3a/0x1d0 [rds_tcp]
         :

- the rds_tcp_incoming_slab kmem_cache must be initialized before the
  datapath starts up. The latter can happen any time after the
  pernet_subsys registration of rds_tcp_net_ops, whose ->init
  function sets up the listen socket. If the rds_tcp_incoming_slab has
  not been set up at that time, a panic of the form below may be
  encountered:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000014
  IP: kmem_cache_alloc+0x90/0x1c0
     :
  rds_tcp_data_recv+0x1e7/0x370 [rds_tcp]
  tcp_read_sock+0x96/0x1c0
  rds_tcp_recv_path+0x65/0x80 [rds_tcp]
     :

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16c09b1c
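The ordering constraints above can be expressed as a small state model. This sketch assumes that each step reduces to a flag. The names and the WOULD_PANIC marker are hypothetical. The real steps are kmem_cache_create(), register_pernet_subsys() and register_netdevice_notifier():

```c
#include <assert.h>
#include <stdbool.h>

#define WOULD_PANIC (-1)	/* hypothetical marker for "this would crash" */

static bool slab_ready;		/* rds_tcp_incoming_slab created   */
static unsigned int netid;	/* assigned by pernet registration */
static bool notifier_active;	/* netdev notifier registered      */

/* The pernet ->init sets up the listen socket, after which the datapath
 * may allocate from the slab at any time. */
static int pernet_register(void)
{
	if (!slab_ready)
		return WOULD_PANIC;	/* allocation from a missing cache */
	netid = 1;
	return 0;
}

/* A netdev event may arrive as soon as the notifier is registered, and
 * its handler looks up per-netns state by netid. */
static int notifier_register(void)
{
	notifier_active = true;
	if (netid == 0)
		return WOULD_PANIC;	/* lookup with netid 0 finds the wrong rtn */
	return 0;
}

static int init_in_order(void)
{
	int ret;

	slab_ready = true;		/* 1. create the incoming slab     */
	ret = pernet_register();	/* 2. register the pernet subsys   */
	if (ret)
		return ret;
	ret = notifier_register();	/* 3. register the netdev notifier */
	assert(ret == 0 && notifier_active);
	return ret;
}
```

Running either later step before its prerequisite returns WOULD_PANIC, which corresponds to the two oopses quoted in the commit message.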

rds: tcp: Take explicit refcounts on struct net · 8edc3aff

Committed by Sowmini Varadhan on Mar 04, 2017

It is incorrect for the rds_connection to piggyback on the
sock_net() refcount for the netns because this gives rise to
a chicken-and-egg problem during rds_conn_destroy. Instead, explicitly
take a reference on the net, and keep the netns alive until the
connection tear-down is complete.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8edc3aff
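The following is a minimal model of holding an explicit reference for the lifetime of a connection. The kernel uses get_net() and put_net() on struct net. Here, net_model, conn_model and the helpers are hypothetical stand-ins that use a plain counter:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct net_model {
	int refcount;
	bool *freed_flag;	/* lets a caller observe when the net is freed */
};

struct conn_model {
	struct net_model *net;	/* reference owned by the connection */
};

static struct net_model *net_get(struct net_model *net)
{
	net->refcount++;
	return net;
}

static void net_put(struct net_model *net)
{
	assert(net->refcount > 0);
	if (--net->refcount == 0) {
		*net->freed_flag = true;
		free(net);
	}
}

static struct conn_model conn_create(struct net_model *net)
{
	/* Take an explicit reference instead of relying on the socket's. */
	struct conn_model c = { net_get(net) };

	return c;
}

static void conn_destroy(struct conn_model *c)
{
	/* ...connection teardown would run here... */
	net_put(c->net);	/* the net may be freed only after this */
	c->net = NULL;
}
```

Even if the creator drops its own reference first, the net outlives the connection until conn_destroy() releases the last reference.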

04 Mar, 2017 · 1 commit

rds: remove unnecessary returned value check · a8d63a53

Committed by Zhu Yanjun on Mar 03, 2017

The function rds_trans_register always returns 0. As such, it is not
necessary to check the returned value.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8d63a53

02 Mar, 2017 · 1 commit

rds: ib: add the static type to the variables · 4f7bfb39

Committed by Zhu Yanjun on Feb 28, 2017

The variables rds_ib_mr_1m_pool_size and rds_ib_mr_8k_pool_size
are used only in ib.c, so mark them static to limit their scope
to that file.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f7bfb39

25 Feb, 2017 · 2 commits

rds: fix memory leak error · 3b5923f0

Committed by Zhu Yanjun on Feb 24, 2017

When the function register_netdevice_notifier fails, the memory
allocated by kmem_cache_create should be freed by the function
kmem_cache_destroy.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3b5923f0

RDS: IB: fix ifnullfree.cocci warnings · 77cc7aee

Committed by Wu Fengguang on Feb 23, 2017

net/rds/ib.c:115:2-7: WARNING: NULL check before freeing functions like kfree, debugfs_remove, debugfs_remove_recursive or usb_free_urb is not needed. Maybe consider reorganizing relevant code to avoid passing NULL values.

NULL check before some freeing functions is not needed.

Based on checkpatch warning
"kfree(NULL) is safe this check is probably not required"
and kfreeaddr.cocci by Julia Lawall.

Generated by: scripts/coccinelle/free/ifnullfree.cocci

Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77cc7aee
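The same rule holds in userspace C, where the standard defines free(NULL) as doing nothing. A small sketch with a hypothetical struct stats_buf:

```c
#include <assert.h>
#include <stdlib.h>

struct stats_buf {		/* hypothetical owner of an optional buffer */
	char *data;
};

/* Redundant form: if (buf->data) free(buf->data); */
static void stats_buf_release(struct stats_buf *buf)
{
	assert(buf != NULL);
	free(buf->data);	/* free(NULL) is a no-op, like kfree(NULL) */
	buf->data = NULL;
}
```

Resetting the pointer afterwards makes a second release harmless for the same reason.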

18 Feb, 2017 · 1 commit

rds:Remove unnecessary ib_ring unalloc · d2c58294

Committed by Zhu Yanjun on Feb 17, 2017

In rds_ib_xmit_atomic, when allocating the ib_ring fails there is
nothing to unallocate, so the call to unallocate it is unnecessary.

Cc: Joe Jin <joe.jin@oracle.com>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d2c58294

25 Jan, 2017 · 3 commits

RDS: net: Switch from dma_device to dev.parent · 5f68dcaf

Committed by Bart Van Assche on Jan 20, 2017

Prepare for removal of ib_device.dma_device.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

5f68dcaf

IB/core: Change the type of an ib_dma_alloc_coherent() argument · d43dbacf

Committed by Bart Van Assche on Jan 20, 2017

Change the type of the dma_handle argument from u64 * to dma_addr_t *.
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

d43dbacf

RDS: IB: Remove an unused structure member · 69324c20

Committed by Bart Van Assche on Jan 20, 2017

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: rds-devel@oss.oracle.com
Signed-off-by: Doug Ledford <dledford@redhat.com>

69324c20

07 Jan, 2017 · 1 commit

RDS: validate the requested traces user input against max supported · 780e9829

Committed by santosh.shilimkar@oracle.com on Jan 06, 2017

A value larger than the supported maximum can lead to an array
read/write overflow.

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

780e9829
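Here is a sketch of the bounds check. It assumes a hypothetical MAX_TRACES constant and request layout. The error code returned here (-EINVAL) is also an assumption and is not necessarily what the commit uses:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define MAX_TRACES 3	/* hypothetical supported maximum */

/* Hypothetical layout of the user-supplied request. */
struct trace_request {
	uint8_t nr_traces;
	uint8_t pos[MAX_TRACES];
};

struct sock_traces {
	uint8_t nr;
	uint8_t pos[MAX_TRACES];
};

static int set_traces(struct sock_traces *st, const struct trace_request *req)
{
	assert(st != NULL && req != NULL);
	/* Validate before the count sizes the copy; without this check,
	 * memcpy() would read and write past both arrays. */
	if (req->nr_traces > MAX_TRACES)
		return -EINVAL;
	st->nr = req->nr_traces;
	memcpy(st->pos, req->pos, req->nr_traces);
	return 0;
}
```

A rejected request leaves the previously stored traces untouched.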

03 Jan, 2017 · 17 commits

RDS: add receive message trace used by application · 3289025a

Committed by Santosh Shilimkar on Jul 04, 2016

Socket option to tap the receive path latency at various stages,
in nanoseconds. It can be enabled on selected sockets using the
SO_RDS_MSG_RXPATH_LATENCY socket option. RDS returns the data to the
application with RDS_CMSG_RXPATH_LATENCY in a defined format. There
is room to add more trace points in the future without changing the
interface.

Reviewed-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

3289025a

RDS: make message size limit compliant with spec · f9fb69ad

Committed by Avinash Repaka on Feb 29, 2016

RDS supports a maximum message size of 1M, but the code doesn't check
this in all cases. This patch enforces the limit for RDMA and non-RDMA
messages and for the RDS MR size, irrespective of the underlying
transport.

Signed-off-by: Avinash Repaka <avinash.repaka@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

f9fb69ad
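Below is a sketch of a single size check applied to every path. It assumes that "1M" means 1 << 20 bytes. It also uses a hypothetical MAX_MSG_SIZE macro and -EMSGSIZE as the error. All three are assumptions and are not quoted from the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_MSG_SIZE ((size_t)1 << 20)	/* hypothetical name for the 1M limit */

/* Apply the same cap to the plain payload, the RDMA payload and the MR
 * size, independent of which transport carries the message. */
static int check_msg_size(size_t payload_len, size_t rdma_len, size_t mr_len)
{
	if (payload_len > MAX_MSG_SIZE || rdma_len > MAX_MSG_SIZE ||
	    mr_len > MAX_MSG_SIZE)
		return -EMSGSIZE;
	return 0;
}
```

Keeping the check transport-independent means TCP and IB cannot disagree about what is acceptable.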

RDS: add stat for socket recv memory usage · 192a798f

Committed by Venkat Venkatsubra on Jul 09, 2016

Tracks the receive-side memory added to and removed from sockets.

Signed-off-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

192a798f

RDS: IB: fix panic due to handlers running post teardown · cf657269

Committed by Santosh Shilimkar on Sep 29, 2016

The shutdown code's reaping loop takes care of emptying the CQs
before they are destroyed, and once the tasklets are killed, the
handlers are not expected to run.

However, because of issues in the core tasklet code, a tasklet handler
can still run even after tasklet_kill(). Since the RDS IB shutdown code
already reaps the CQs before freeing the CQ/QP resources, the handlers
have nothing left to do after shutdown.

On the other hand, any handler that runs after teardown and tries to
access the already freed QP/CQ resources causes problems. This patch
fixes the race by making sure that the handlers return without taking
any action after teardown.

Reviewed-by: Wengang <wen.gang.wang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

cf657269

RDS: RDMA: Fix the composite message user notification · 941f8d55

Committed by Santosh Shilimkar on Feb 18, 2016

When an application sends an RDS RDMA composite message, consisting of
an RDMA transfer followed by a non-RDMA payload, it expects to be
notified *only* when the full message has been delivered. RDS RDMA
notification doesn't behave this way, though.

Thanks to Venkat for debugging and root-causing the issue, in which
only the first part of the message (the RDMA transfer) was delivered
successfully but delivery of the remaining payload failed. In that
case, the application should not be notified with a false positive
of message delivery success.

Fix this case by making sure the user gets notified only after
the full message has been delivered.

Reviewed-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

941f8d55
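The following is a hypothetical state model of the intended behaviour. The application hears about a composite message once, and success is never reported until both parts have completed. The types and complete_part() are invented for illustration and do not reflect the RDS IB completion code:

```c
#include <assert.h>
#include <stdbool.h>

enum part_status { PART_PENDING, PART_OK, PART_FAILED };

/* Hypothetical composite message: an RDMA part plus a non-RDMA payload. */
struct composite_msg {
	enum part_status rdma;
	enum part_status payload;
	bool notified;
	bool success;	/* status reported to the application */
};

/* Called as each part completes; notifies the application at most once. */
static void complete_part(struct composite_msg *m, bool is_rdma, bool ok)
{
	enum part_status st = ok ? PART_OK : PART_FAILED;

	if (is_rdma)
		m->rdma = st;
	else
		m->payload = st;

	if (m->notified)
		return;
	if (m->rdma == PART_FAILED || m->payload == PART_FAILED) {
		m->notified = true;
		m->success = false;
	} else if (m->rdma == PART_OK && m->payload == PART_OK) {
		m->notified = true;
		m->success = true;
	}
	/* RDMA done but payload still pending: report nothing yet. */
}
```

Reporting a failure as soon as either part fails is a choice made for this sketch. The property the commit asks for is only that success is never reported while the payload is still outstanding.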

RDS: IB: Add vector spreading for cqs · be2f76ea

Committed by Santosh Shilimkar on Jul 04, 2016

Based on the available device vectors, allocate CQs accordingly to
get a better spread of completion vectors, which helps performance
a great deal.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

be2f76ea
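Here is a sketch of least-loaded vector selection, one straightforward way to spread CQs across completion vectors. NR_VECTORS, vector_load and both helpers are hypothetical and are not taken from the patch:

```c
#include <assert.h>

#define NR_VECTORS 4	/* hypothetical number of device completion vectors */

static int vector_load[NR_VECTORS];	/* CQs currently bound to each vector */

/* Pick the completion vector with the fewest CQs and account for it. */
static int get_unused_vector(void)
{
	int best = 0;

	for (int i = 1; i < NR_VECTORS; i++)
		if (vector_load[i] < vector_load[best])
			best = i;
	vector_load[best]++;
	return best;
}

/* Release a vector when its CQ is destroyed. */
static void put_vector(int index)
{
	assert(index >= 0 && index < NR_VECTORS && vector_load[index] > 0);
	vector_load[index]--;
}
```

Ties go to the lowest index, so CQs fill the vectors round-robin until one is released.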

RDS: IB: add few useful cache stasts · 09b2b8f5

Committed by Santosh Shilimkar on Jul 09, 2016

Tracks the ib receive cache total, incoming and frag allocations.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

09b2b8f5

RDS: IB: track and log active side endpoint in connection · 581d53c9

Committed by Santosh Shilimkar on Jul 09, 2016

It is useful to know the active and passive endpoints of an
RDS IB connection.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

581d53c9

RDS: RDMA: silence the use_once mr log flood · c536a068

Committed by Santosh Shilimkar on Jul 03, 2016

In the absence of extension headers, this message keeps flooding
the console. Even without use_once the MRs can be cleaned up, so
this is not really an error case; make it a debug message instead.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

c536a068

RDS: IB: split the mr registration and invalidation path · 56012459

Committed by Santosh Shilimkar on Mar 08, 2016

MR invalidation in RDS is done in a background thread, not in the
data path like registration. Break the dependency between the two,
which helps remove a performance bottleneck.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

56012459

RDS: RDMA: return appropriate error on rdma map failures · 584a8279

Committed by Santosh Shilimkar on Jul 04, 2016

The first message to a remote node should prompt a new connection,
even if it is an RDMA operation. For an RDMA operation, the MR
mapping can fail because the connection is not yet up.

Since connection establishment is asynchronous, make sure that a
map failure caused by an unavailable connection reaches the user
with an appropriate error code. Before returning to the user,
trigger the connection so that it is ready for the next retry.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

584a8279

RDS: RDMA: start rdma listening after init · 8d5d8a5f

Committed by Qing Huang on Jul 04, 2016

This prevents RDS from handling incoming RDMA packets before RDS
completes initializing its recv/send components.

Signed-off-by: Qing Huang <qing.huang@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

8d5d8a5f

RDS: RDMA: fix the ib_map_mr_sg_zbva() argument · 3e56c2f8

Committed by Santosh Shilimkar on Dec 04, 2016

Fixes warning: Using plain integer as NULL pointer

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

3e56c2f8

RDS: IB: make the transport retry count smallest · fab8688d

Committed by Santosh Shilimkar on Jul 04, 2016

Transport retries are of little use since they indicate packet loss
in the fabric, so it is better to fail over fast than to retry for
longer.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

fab8688d

RDS: IB: include faddr in connection log · ff3f19a2

Committed by Santosh Shilimkar on Mar 14, 2016

Also use pr_* for it.

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

ff3f19a2

RDS: mark few internal functions static to make sparse build happy · bb789763

Committed by Santosh Shilimkar on Dec 04, 2016

Fixes below warnings:
warning: symbol 'rds_send_probe' was not declared. Should it be static?
warning: symbol 'rds_send_ping' was not declared. Should it be static?
warning: symbol 'rds_tcp_accept_one_path' was not declared. Should it be static?
warning: symbol 'rds_walk_conn_path_info' was not declared. Should it be static?

Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

bb789763

RDS: log the address on bind failure · f69b22e6

Committed by Santosh Shilimkar on Nov 04, 2015

It's useful to know the IP address when RDS fails to bind a
connection. Thus, adding it to the error message.

Orabug: 21894138
Reviewed-by: Wei Lin Guay <wei.lin.guay@oracle.com>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>

f69b22e6

27 Dec, 2016 · 1 commit

rds: remove dead code · be6e4d66

Committed by Al Viro on Nov 15, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

be6e4d66

21 Dec, 2016 · 1 commit

RDS: use rb_entry() · a763f78c

Committed by Geliang Tang on Dec 20, 2016

To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.

Signed-off-by: Geliang Tang <geliangtang@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a763f78c
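In the kernel, rb_entry() is defined in terms of container_of(), so this change is purely about readability. Below is a userspace sketch with simplified macros (the kernel's container_of() also adds type checking). The structs are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace copies of the kernel macros. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define rb_entry(ptr, type, member) container_of(ptr, type, member)

struct rb_node_model {		/* stand-in for struct rb_node */
	struct rb_node_model *left, *right;
};

struct mr_model {		/* hypothetical element stored in an rbtree */
	unsigned long key;
	struct rb_node_model node;	/* embedded tree linkage */
};
```

Given a pointer to the embedded node, rb_entry() recovers the enclosing element. The name also tells the reader that the pointer came from a tree.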

15 Dec, 2016 · 1 commit

rds_rdma: log the connection reject message · 39384f04

Committed by Steve Wise on Oct 26, 2016

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

39384f04

03 Dec, 2016 · 1 commit

RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net · 721c7443

Committed by Sowmini Varadhan on Dec 01, 2016

If some error is encountered in rds_tcp_init_net, make sure to
unregister_netdevice_notifier(), else we could trigger a panic
later on, when the modprobe from a netns fails.

Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

721c7443

18 Nov, 2016 · 1 commit

netns: make struct pernet_operations::id unsigned int · c7d03a00

Committed by Alexey Dobriyan on Nov 17, 2016

Make struct pernet_operations::id unsigned.

There are 2 reasons to do so:

1)
This field is really an index into a zero-based array and
thus is an unsigned entity. Using a negative value is an out-of-bounds
access by definition.

2)
On x86_64, unsigned 32-bit data which are mixed with pointers
via array indexing or offsets added or subtracted to pointers
are preferred to signed 32-bit data.

"int" being used as an array index needs to be sign-extended
to 64-bit before being used.

	void f(long *p, int i)
	{
		g(p[i]);
	}

  roughly translates to

	movsx	rsi, esi
	mov	rdi, [rsi+...]
	call 	g

MOVSX is a 3-byte instruction which isn't necessary if the variable is
unsigned, because x86_64 zero-extends by default.

Now, there is net_generic() function which, you guessed it right, uses
"int" as an array index:

	static inline void *net_generic(const struct net *net, int id)
	{
		...
		ptr = ng->ptr[id - 1];
		...
	}

And this function is used a lot, so those sign extensions add up.

Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
messing with code generation):

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)

Unfortunately some functions actually grow bigger.
This is a seemingly random artefact of code generation with the register
allocator being used differently. gcc decides that some variable
needs to live in the new r8+ registers and every access now requires a REX
prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
used, which is longer than [r8].

However, overall balance is in negative direction:

	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
	function                                     old     new   delta
	nfsd4_lock                                  3886    3959     +73
	tipc_link_build_proto_msg                   1096    1140     +44
	mac80211_hwsim_new_radio                    2776    2808     +32
	tipc_mon_rcv                                1032    1058     +26
	svcauth_gss_legacy_init                     1413    1429     +16
	tipc_bcbase_select_primary                   379     392     +13
	nfsd4_exchange_id                           1247    1260     +13
	nfsd4_setclientid_confirm                    782     793     +11
		...
	put_client_renew_locked                      494     480     -14
	ip_set_sockfn_get                            730     716     -14
	geneve_sock_add                              829     813     -16
	nfsd4_sequence_done                          721     703     -18
	nlmclnt_lookup_host                          708     686     -22
	nfsd4_lockt                                 1085    1063     -22
	nfs_get_client                              1077    1050     -27
	tcf_bpf_init                                1106    1076     -30
	nfsd4_encode_fattr                          5997    5930     -67
	Total: Before=154856051, After=154854321, chg -0.00%

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c7d03a00
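The sign-extension argument can be checked in plain C. Widening an int must preserve negative values, which is why the compiler emits MOVSX. Widening an unsigned int never needs it. The lookup() helper is hypothetical and only mirrors the 1-based ptr[id - 1] indexing quoted above:

```c
#include <assert.h>

/* Widening a 32-bit value to 64 bits, as happens when it is used as an
 * array index on x86_64. */
static long widen_signed(int i)
{
	return i;	/* sign extension: -1 stays -1 */
}

static unsigned long widen_unsigned(unsigned int i)
{
	return i;	/* zero extension: the upper bits are always zero */
}

/* Hypothetical lookup in the style of net_generic(), with an unsigned id. */
static void *lookup(void *const *ptr, unsigned int nr, unsigned int id)
{
	assert(id >= 1 && id <= nr);	/* ids are 1-based */
	return ptr[id - 1];
}
```

The same 32-bit pattern 0xFFFFFFFF widens to -1 as an int but to 4294967295 as an unsigned int. Only the signed case needs extra work from the compiler.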

openeuler / Kernel: synced successfully nearly 2 years ago
