Commits · 08d2cc3b26554cae21f279b520ae5c2a3b2be421 · openanolis / cloud-kernel

19 Mar, 2015 · 21 commits

inet: request sock should init IPv6/IPv4 addresses · 08d2cc3b

Authored by Eric Dumazet on Mar 18, 2015

In order to be able to use sk_ehashfn() for request socks,
we need to initialize their IPv6/IPv4 addresses.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

08d2cc3b

inet: get rid of last __inet_hash_connect() argument · b4d6444e

Authored by Eric Dumazet on Mar 18, 2015

We now always call __inet_hash_nolisten(), so there is no need to
pass it as an argument.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b4d6444e

ipv6: get rid of __inet6_hash() · 77a6a471

Authored by Eric Dumazet on Mar 18, 2015

We can now use inet_hash() and __inet_hash() instead of private
functions.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

77a6a471

inet: add IPv6 support to sk_ehashfn() · d1e559d0

Authored by Eric Dumazet on Mar 18, 2015

Intent is to converge IPv4 & IPv6 inet_hash functions to
factorize code.

IPv4 sockets initialize sk_rcv_saddr and sk_v6_daddr
in this patch, thanks to new sk_daddr_set() and sk_rcv_saddr_set()
helpers.

__inet6_hash can now use sk_ehashfn() instead of a private
inet6_sk_ehashfn() and will simply use __inet_hash() in a
following patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d1e559d0
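The convergence described in this commit can be illustrated with a small user-space sketch. It is not the kernel code: the struct, the `toy_*` helpers and the FNV-style mixing function are all hypothetical, and the sketch assumes that the new setters store an IPv4 address both in the IPv4 field and, in IPv4-mapped form, in the IPv6 field. Under that assumption one hash function can serve both families.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for a socket; every name here is hypothetical. */
struct toy_sock {
    bool     is_ipv6;
    uint32_t daddr, rcv_saddr;                /* IPv4 view */
    uint8_t  v6_daddr[16], v6_rcv_saddr[16];  /* IPv6 view */
    uint16_t num, dport;                      /* local / remote port */
};

/* Write v4 as an IPv4-mapped IPv6 address (::ffff:a.b.c.d). */
static void toy_v4mapped(uint8_t out[16], uint32_t v4)
{
    memset(out, 0, 16);
    out[10] = 0xff;
    out[11] = 0xff;
    out[12] = (uint8_t)(v4 >> 24);
    out[13] = (uint8_t)(v4 >> 16);
    out[14] = (uint8_t)(v4 >> 8);
    out[15] = (uint8_t)v4;
}

static bool toy_is_v4mapped(const uint8_t a[16])
{
    static const uint8_t prefix[12] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xff, 0xff};
    return memcmp(a, prefix, sizeof(prefix)) == 0;
}

/* Setters keep the IPv4 and IPv6 views in sync (assumed behaviour). */
static void toy_daddr_set(struct toy_sock *sk, uint32_t v4)
{
    sk->daddr = v4;
    toy_v4mapped(sk->v6_daddr, v4);
}

static void toy_rcv_saddr_set(struct toy_sock *sk, uint32_t v4)
{
    sk->rcv_saddr = v4;
    toy_v4mapped(sk->v6_rcv_saddr, v4);
}

/* FNV-1a style byte mixer; a placeholder, not the kernel's hash. */
static uint32_t toy_mix(uint32_t h, const void *data, size_t len)
{
    const uint8_t *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* One hash function for both families: native IPv6 sockets hash the
 * 128-bit addresses, everything else hashes the IPv4 view. */
static uint32_t toy_ehashfn(const struct toy_sock *sk)
{
    uint32_t h = 2166136261u;

    if (sk->is_ipv6 && !toy_is_v4mapped(sk->v6_daddr)) {
        h = toy_mix(h, sk->v6_rcv_saddr, 16);
        h = toy_mix(h, sk->v6_daddr, 16);
    } else {
        h = toy_mix(h, &sk->rcv_saddr, sizeof(sk->rcv_saddr));
        h = toy_mix(h, &sk->daddr, sizeof(sk->daddr));
    }
    h = toy_mix(h, &sk->num, sizeof(sk->num));
    return toy_mix(h, &sk->dport, sizeof(sk->dport));
}
```

In this toy model an IPv4 socket and an IPv6 socket carrying the same IPv4-mapped 4-tuple produce the same hash value, which is the property a shared lookup table would need.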

net: introduce sk_ehashfn() helper · 5b441f76

Authored by Eric Dumazet on Mar 18, 2015

Goal is to unify IPv4/IPv6 inet_hash handling, and use common helpers
for all kinds of sockets (full sockets, timewait and request sockets).

inet_sk_ehashfn() becomes sk_ehashfn() but still only copes with IPv4.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5b441f76

netns: constify net_hash_mix() and various callers · 6eada011

Authored by Eric Dumazet on Mar 18, 2015

const qualifiers ease code review by making clear
which objects are not written in a function.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

6eada011

Merge branch 'txq_max_rate' · 8f6320de

Authored by David S. Miller on Mar 18, 2015

Or Gerlitz says:

====================
Add max rate TXQ attribute

Add the ability to set a max-rate limitation for TX queues.
The attribute name is maxrate and the units are Mbs, to make
it similar to the existing max-rate limitation knobs (ETS and
SRIOV ndo calls).

changes from V2:
  - added Documentation (thanks Florian and Tom)
  - rebased to latest net-next to comply with the swdev ndo removal
  - addressed more feedback from Dave on the comments style

changes from V1:
  - addressed feedback from Dave

changes from V0:
  - addressed feedback from Sergei

John Fastabend (1):
  net: Add max rate tx queue attribute
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

8f6320de

net/mlx4_en: Add tx queue maxrate support · c10e4fc6

Authored by Or Gerlitz on Mar 18, 2015

Add ndo_set_tx_maxrate support.

To support a per tx queue maxrate limit, we use the update-qp firmware
command to do run-time rate setting for the qp that serves this tx ring.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Ido Shamay <idos@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c10e4fc6

net/mlx4_core: Add basic support for QP max-rate limiting · fc31e256

Authored by Or Gerlitz on Mar 18, 2015

Add the low-level device commands and definitions used for QP max-rate limiting.

This is done through the following elements:

  - read rate-limit device caps in QUERY_DEV_CAP: number of different
    rates and the min/max rates in Kbs/Mbs/Gbs units

  - enhance the QP context struct to contain rate limit units and value

  - allow run-time rate-limit setting for QPs through the
    update-qp firmware command

  - QP rate-limiting is disallowed for VFs

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fc31e256

net: Add max rate tx queue attribute · 822b3b2e

Authored by John Fastabend on Mar 18, 2015

This adds a tx_maxrate attribute to the tx queue sysfs entry allowing
for max-rate limiting. Along with DCB-ETS and BQL this provides another
knob to tune queue performance. The limit units are Mbps.

By default it is disabled. To disable the rate limitation after it
has been set for a queue, it should be set to zero.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

822b3b2e
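As a rough illustration of how the attribute might be driven from user space, the sketch below builds the sysfs path of a queue's tx_maxrate file and writes a limit in Mbps to it. The path layout `/sys/class/net/<dev>/queues/tx-<n>/tx_maxrate` is assumed from the usual per-queue sysfs layout, and the helper names and the device name used in the usage note are hypothetical. Writing normally requires root and a driver that implements the limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Build the assumed sysfs path for one TX queue's tx_maxrate file.
 * Returns 0 on success, -1 if the buffer is too small. */
static int tx_maxrate_path(char *buf, size_t len,
                           const char *dev, unsigned int queue)
{
    int n = snprintf(buf, len,
                     "/sys/class/net/%s/queues/tx-%u/tx_maxrate",
                     dev, queue);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a limit in Mbps; per the commit message, 0 disables the limit.
 * Returns 0 on success, -1 on any failure (no such device, no
 * permission, driver without support, ...). */
static int set_tx_maxrate(const char *dev, unsigned int queue,
                          unsigned long mbps)
{
    char path[256];
    FILE *f;
    int ok;

    if (tx_maxrate_path(path, sizeof(path), dev, queue) != 0)
        return -1;

    f = fopen(path, "w");
    if (!f)
        return -1;

    ok = fprintf(f, "%lu\n", mbps) > 0;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, `set_tx_maxrate("eth0", 0, 1000)` would request a 1000 Mbps cap on queue 0 of a device named eth0, and `set_tx_maxrate("eth0", 0, 0)` would lift it again.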

Merge branch 'rhashtable_remove_shift' · b65885d2

Authored by David S. Miller on Mar 18, 2015

Herbert Xu says:

====================
rhashtable: Kill redundant shift parameter

I was trying to squeeze bucket_table->rehash in by downsizing
bucket_table->size, only to find that my spot had been taken
over by bucket_table->shift.  These patches kill shift and make
me feel better :)

v2 corrects the typo in the test_rhashtable changelog and also
notes the min_shift parameter in the tipc patch changelog.
====================

Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

b65885d2

rhashtable: Remove max_shift and min_shift · e2e21c1c

Authored by Herbert Xu on Mar 18, 2015

Now that nobody uses max_shift and min_shift, we can safely remove
them.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

e2e21c1c

test_rhashtable: Use rhashtable max_size instead of max_shift · 4f509df4

Authored by Herbert Xu on Mar 18, 2015

This patch converts test_rhashtable to use rhashtable max_size
instead of the obsolete max_shift.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

4f509df4

tipc: Use rhashtable max/min_size instead of max/min_shift · 446c89ac

Authored by Herbert Xu on Mar 18, 2015

This patch converts tipc to use rhashtable max/min_size instead of
the obsolete max/min_shift.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

446c89ac

netlink: Use rhashtable max_size instead of max_shift · b06eee59

Authored by Herbert Xu on Mar 18, 2015

This patch converts netlink to use rhashtable max_size instead
of the obsolete max_shift.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

b06eee59

rhashtable: Introduce max_size/min_size · c2e213cf

Authored by Herbert Xu on Mar 18, 2015

This patch adds the parameters max_size and min_size which are
meant to replace max_shift and min_shift.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2e213cf

rhashtable: Remove shift from bucket_table · 6aebd940

Authored by Herbert Xu on Mar 18, 2015

Keeping both size and shift is silly.  We only need one.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

6aebd940
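Why one of the two fields is redundant can be seen in a short sketch. It assumes the table size is always a power of two, in which case the shift is just the base-2 logarithm of the size and can be recomputed on demand. The struct and the `toy_*` helpers are hypothetical and only mirror the idea, not the rhashtable implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy bucket table that stores only the size (assumed power of two). */
struct toy_bucket_table {
    size_t size;
};

static int toy_is_pow2(size_t n)
{
    return n != 0 && (n & (n - 1)) == 0;
}

/* Bucket selection needs only the size: mask off the low bits. */
static size_t toy_bucket_index(const struct toy_bucket_table *t, uint32_t hash)
{
    assert(toy_is_pow2(t->size));
    return hash & (t->size - 1);
}

/* The shift can be derived from the size whenever it is needed. */
static unsigned int toy_shift(const struct toy_bucket_table *t)
{
    unsigned int shift = 0;

    for (size_t n = t->size; n > 1; n >>= 1)
        shift++;
    return shift;
}

/* Converting an old shift-style limit into a size-style limit. */
static size_t toy_size_from_shift(unsigned int shift)
{
    return (size_t)1 << shift;
}
```

Under the same assumption, a caller that used to pass a shift limit of 16 would pass a size limit of `toy_size_from_shift(16)`, that is 65536 buckets.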

Merge branch 'xgene-next' · a61bfa65

Authored by David S. Miller on Mar 18, 2015

Keyur Chudgar says:

====================
drivers: net: xgene: Add second SGMII based 1G interface

This patch adds support for a second SGMII based 1G interface.
====================

Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a61bfa65

drivers: net: xgene: Add second SGMII based 1G interface · ca626454

Authored by Keyur Chudgar on Mar 17, 2015

- Added resource initialization based on port-id field
- Enabled second SGMII 1G interface

Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ca626454

dtb: xgene: Add second SGMII based 1G interface node · 2d33394e

Authored by Keyur Chudgar on Mar 17, 2015

- Added new SGMII node for port 1
- Added port-id field

Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d33394e

Documentation: dtb: Add port-id field for APM X-Gene ethernet · c2bc6e11

Authored by Keyur Chudgar on Mar 17, 2015

Signed-off-by: Keyur Chudgar <kchudgar@apm.com>
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c2bc6e11

18 Mar, 2015 · 19 commits

Merge branch 'tipc_netns_leak' · e7a9eee5

Authored by David S. Miller on Mar 17, 2015

Ying Xue says:

====================
tipc: fix netns refcnt leak

This series aims to eliminate the netns refcount leak. While fixing
it, two additional problems were found, so all known issues
associated with the netns refcnt leak are resolved at the same time
in this patchset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

e7a9eee5

tipc: withdraw tipc topology server name when namespace is deleted · 2b9bb7f3

Authored by Ying Xue on Mar 18, 2015

The TIPC topology server is a per namespace service associated with the
tipc name {1, 1}. When a namespace is deleted, that name must be withdrawn
before we call sk_release_kernel, because the kernel socket release is
done in init_net and trying to withdraw a TIPC name published in another
namespace will fail with an error such as:

[  170.093264] Unable to remove local publication
[  170.093264] (type=1, lower=1, ref=2184244004, key=2184244005)

We fix this by breaking the association between the topology server name
and socket before calling sk_release_kernel.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2b9bb7f3

tipc: fix a potential deadlock when nametable is purged · 8460504b

Authored by Ying Xue on Mar 18, 2015

[   28.531768] =============================================
[   28.532322] [ INFO: possible recursive locking detected ]
[   28.532322] 3.19.0+ #194 Not tainted
[   28.532322] ---------------------------------------------
[   28.532322] insmod/583 is trying to acquire lock:
[   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
[   28.532322]
[   28.532322] but task is already holding lock:
[   28.532322]  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
[   28.532322]
[   28.532322] other info that might help us debug this:
[   28.532322]  Possible unsafe locking scenario:
[   28.532322]
[   28.532322]        CPU0
[   28.532322]        ----
[   28.532322]   lock(&(&nseq->lock)->rlock);
[   28.532322]   lock(&(&nseq->lock)->rlock);
[   28.532322]
[   28.532322]  *** DEADLOCK ***
[   28.532322]
[   28.532322]  May be due to missing lock nesting notation
[   28.532322]
[   28.532322] 3 locks held by insmod/583:
[   28.532322]  #0:  (net_mutex){+.+.+.}, at: [<ffffffff8163e30f>] register_pernet_subsys+0x1f/0x50
[   28.532322]  #1:  (&(&tn->nametbl_lock)->rlock){+.....}, at: [<ffffffffa000e091>] tipc_nametbl_stop+0xb1/0x1f0 [tipc]
[   28.532322]  #2:  (&(&nseq->lock)->rlock){+.....}, at: [<ffffffffa000e0dc>] tipc_nametbl_stop+0xfc/0x1f0 [tipc]
[   28.532322]
[   28.532322] stack backtrace:
[   28.532322] CPU: 1 PID: 583 Comm: insmod Not tainted 3.19.0+ #194
[   28.532322] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   28.532322]  ffffffff82394460 ffff8800144cb928 ffffffff81792f3e 0000000000000007
[   28.532322]  ffffffff82394460 ffff8800144cba28 ffffffff810a8080 ffff8800144cb998
[   28.532322]  ffffffff810a4df3 ffff880013e9cb10 ffffffff82b0d330 ffff880013e9cb38
[   28.532322] Call Trace:
[   28.532322]  [<ffffffff81792f3e>] dump_stack+0x4c/0x65
[   28.532322]  [<ffffffff810a8080>] __lock_acquire+0x740/0x1ca0
[   28.532322]  [<ffffffff810a4df3>] ? __bfs+0x23/0x270
[   28.532322]  [<ffffffff810a7506>] ? check_irq_usage+0x96/0xe0
[   28.532322]  [<ffffffff810a8a73>] ? __lock_acquire+0x1133/0x1ca0
[   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
[   28.532322]  [<ffffffff810a9c0c>] lock_acquire+0x9c/0x140
[   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
[   28.532322]  [<ffffffff8179c41f>] _raw_spin_lock_bh+0x3f/0x50
[   28.532322]  [<ffffffffa000d219>] ? tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
[   28.532322]  [<ffffffffa000d219>] tipc_nametbl_remove_publ+0x49/0x2e0 [tipc]
[   28.532322]  [<ffffffffa000e11e>] tipc_nametbl_stop+0x13e/0x1f0 [tipc]
[   28.532322]  [<ffffffffa000dfe5>] ? tipc_nametbl_stop+0x5/0x1f0 [tipc]
[   28.532322]  [<ffffffffa0004bab>] tipc_init_net+0x13b/0x150 [tipc]
[   28.532322]  [<ffffffffa0004a75>] ? tipc_init_net+0x5/0x150 [tipc]
[   28.532322]  [<ffffffff8163dece>] ops_init+0x4e/0x150
[   28.532322]  [<ffffffff810aa66d>] ? trace_hardirqs_on+0xd/0x10
[   28.532322]  [<ffffffff8163e1d3>] register_pernet_operations+0xf3/0x190
[   28.532322]  [<ffffffff8163e31e>] register_pernet_subsys+0x2e/0x50
[   28.532322]  [<ffffffffa002406a>] tipc_init+0x6a/0x1000 [tipc]
[   28.532322]  [<ffffffffa0024000>] ? 0xffffffffa0024000
[   28.532322]  [<ffffffff810002d9>] do_one_initcall+0x89/0x1c0
[   28.532322]  [<ffffffff811b7cb0>] ? kmem_cache_alloc_trace+0x50/0x1b0
[   28.532322]  [<ffffffff810e725b>] ? do_init_module+0x2b/0x200
[   28.532322]  [<ffffffff810e7294>] do_init_module+0x64/0x200
[   28.532322]  [<ffffffff810e9353>] load_module+0x12f3/0x18e0
[   28.532322]  [<ffffffff810e5890>] ? show_initstate+0x50/0x50
[   28.532322]  [<ffffffff810e9a19>] SyS_init_module+0xd9/0x110
[   28.532322]  [<ffffffff8179f3b3>] sysenter_dispatch+0x7/0x1f

Before tipc_purge_publications() calls tipc_nametbl_remove_publ() to
remove a publication with a name sequence, the name sequence's lock
is held. However, when tipc_nametbl_remove_publ() calls
tipc_nameseq_remove_publ() to remove the publication, it first looks
up the name sequence instance for the publication, and then takes
the lock of the name sequence it found. As that lock may already be
held by tipc_purge_publications(), a deadlock occurs, as the scenario
above demonstrates. Since tipc_nameseq_remove_publ() doesn't grab the
name sequence's lock, the deadlock can be avoided if it is invoked
directly by tipc_purge_publications().

Fixes: 97ede29e ("tipc: convert name table read-write lock to RCU")
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8460504b
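The locking pattern behind this fix can be reduced to a toy model. The sketch below is not TIPC code: the lock is a plain flag whose acquire call reports failure where a real spinlock would spin forever, and all `toy_*` names are hypothetical. It contrasts a purge routine that calls the locking wrapper while already holding the lock with one that calls the lock-free helper directly.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: acquire fails where a spinlock would hang. */
struct toy_lock {
    bool held;
};

static bool toy_lock_acquire(struct toy_lock *l)
{
    if (l->held)
        return false;   /* a real spinlock would deadlock here */
    l->held = true;
    return true;
}

static void toy_lock_release(struct toy_lock *l)
{
    l->held = false;
}

struct toy_nameseq {
    struct toy_lock lock;
    int publications;
};

/* Lock-free helper: the caller must already hold seq->lock. */
static void toy_seq_remove_publ(struct toy_nameseq *seq)
{
    assert(seq->lock.held);
    seq->publications--;
}

/* Locking wrapper: takes seq->lock itself before removing. */
static bool toy_tbl_remove_publ(struct toy_nameseq *seq)
{
    if (!toy_lock_acquire(&seq->lock))
        return false;
    toy_seq_remove_publ(seq);
    toy_lock_release(&seq->lock);
    return true;
}

/* Buggy purge: holds the lock, then calls the locking wrapper. */
static bool toy_purge_buggy(struct toy_nameseq *seq)
{
    bool ok;

    if (!toy_lock_acquire(&seq->lock))
        return false;
    ok = toy_tbl_remove_publ(seq);   /* second acquire fails */
    toy_lock_release(&seq->lock);
    return ok;
}

/* Fixed purge: holds the lock and calls the lock-free helper. */
static bool toy_purge_fixed(struct toy_nameseq *seq)
{
    if (!toy_lock_acquire(&seq->lock))
        return false;
    toy_seq_remove_publ(seq);
    toy_lock_release(&seq->lock);
    return true;
}
```

In the toy model the buggy variant removes nothing and reports failure, while the fixed variant removes one publication and leaves the lock released.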

tipc: fix netns refcnt leak · 76100a8a

Authored by Ying Xue on Mar 18, 2015

When the TIPC module is loaded, we launch a topology server in kernel
space, which in its turn is creating TIPC sockets for communication
with topology server users. Because both the socket's creator and
provider reside in the same module, it is necessary that the TIPC
module's reference count remains zero after the server is started and
the socket created; otherwise it becomes impossible to perform "rmmod"
even on an idle module.

Currently, we achieve this by defining a separate "tipc_proto_kern"
protocol struct, that is used only for kernel space socket allocations.
This structure has the "owner" field set to NULL, which prevents the
module reference count from being bumped when sk_alloc() for local
sockets is called. Furthermore, we have defined three kernel-specific
functions, tipc_sock_create_local(), tipc_sock_release_local() and
tipc_sock_accept_local(), to avoid the module counter being modified
when module local sockets are created or deleted. This has worked well
until we introduced name space support.

However, after name space support was introduced, we have observed that
a reference count leak occurs, because the netns counter is not
decremented in tipc_sock_delete_local().

This commit remedies this problem. But instead of just modifying
tipc_sock_delete_local(), we eliminate the whole parallel socket
handling infrastructure, and start using the regular sk_create_kern(),
kernel_accept() and sk_release_kernel() calls. Since those functions
manipulate the module counter, we must now compensate for that by
explicitly decrementing the counter after module local sockets are
created, and increment it just before calling sk_release_kernel().

Fixes: a62fbcce ("tipc: make subscriber server support net namespace")
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Reviewed-by: Jon Maloy <jon.maloy@ericson.com>
Reviewed-by: Erik Hugne <erik.hugne@ericsson.com>
Reported-by: Cong Wang <cwang@twopensource.com>
Tested-by: Erik Hugne <erik.hugne@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

76100a8a
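The compensation step described in the last paragraph of this commit message can be modelled with a single counter. The sketch below is a deliberately simplified toy, not the TIPC code: it tracks only a module reference count, assumes the generic create and release calls each change that count by one, and leaves the netns reference out entirely. All `toy_*` names are hypothetical.

```c
#include <assert.h>

/* Toy module whose reference count must be 0 for "rmmod" to work. */
struct toy_module {
    int refcnt;
};

/* Stand-in for the generic create path: takes a module reference. */
static void toy_sock_create_kern(struct toy_module *m)
{
    m->refcnt++;
}

/* Stand-in for the generic release path: drops a module reference. */
static void toy_sock_release_kernel(struct toy_module *m)
{
    m->refcnt--;
}

/* Module-local socket creation: undo the reference the generic
 * call took, so an idle module can still be unloaded. */
static void toy_server_open(struct toy_module *m)
{
    toy_sock_create_kern(m);
    m->refcnt--;
}

/* Module-local socket teardown: re-take the reference first, since
 * the generic release call is going to drop it. */
static void toy_server_close(struct toy_module *m)
{
    m->refcnt++;
    toy_sock_release_kernel(m);
}

/* "rmmod" is only possible while nothing holds a reference. */
static int toy_can_unload(const struct toy_module *m)
{
    return m->refcnt == 0;
}
```

With this pairing the count is zero both while the server socket exists and after it is released, so the toy module stays unloadable throughout.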

Merge branch 'listener_refactor_part_12' · 52841430

Authored by David S. Miller on Mar 17, 2015

Eric Dumazet says:

====================
inet: tcp listener refactoring, part 12

By adding a pointer back to the listener, we are preparing synack rtx
handling to no longer be governed by the listener keepalive timer,
as this is the most problematic source of contention on the listener
spinlock. Note that TCP FastOpen had such a pointer anyway, so we
make it generic.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

52841430

inet: fix request sock refcounting · 0470c8ca