openeuler / Kernel

03 Oct, 2015 18 commits

switchdev: push object ID back to object structure · 9e8f4a54

Committed by Jiri Pirko on Oct 01, 2015

Suggested-by: Scott Feldman <sfeldma@gmail.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: bring back switchdev_obj and use it as a generic object param · 648b4a99

Committed by Jiri Pirko on Oct 01, 2015

Replace "void *obj" with a generic structure. Introduce a couple of
helpers along with it.
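
The pattern described here can be sketched in userspace C as follows. This is a minimal illustration under stated assumptions, not the kernel's definitions: the type names (`struct obj`, `struct obj_port_vlan`), the fields, and the helper `to_port_vlan()` are all hypothetical stand-ins. It shows a generic structure embedded in a specific one, plus a `container_of`-style helper to get back to the specific type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the real types live in include/net/switchdev.h. */
enum obj_id { OBJ_ID_PORT_VLAN = 1, OBJ_ID_PORT_FDB = 2 };

struct obj {                    /* generic part, passed instead of void * */
	enum obj_id id;
};

struct obj_port_vlan {          /* specific object embedding the generic one */
	struct obj obj;
	unsigned short vid_begin;
	unsigned short vid_end;
};

/* Recover the enclosing structure from a pointer to its embedded member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static struct obj_port_vlan *to_port_vlan(struct obj *o)
{
	assert(o->id == OBJ_ID_PORT_VLAN);  /* the id says which cast is valid */
	return container_of(o, struct obj_port_vlan, obj);
}

/* A consumer that only receives the generic pointer. */
static int vlan_count(struct obj *o)
{
	struct obj_port_vlan *vlan = to_port_vlan(o);

	return vlan->vid_end - vlan->vid_begin + 1;
}
```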
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: rename switchdev_obj_fdb to switchdev_obj_port_fdb · 52ba57cf

Committed by Jiri Pirko on Oct 01, 2015

Make the struct name in sync with object id name.
Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: rename switchdev_obj_vlan to switchdev_obj_port_vlan · 8f24f309

Committed by Jiri Pirko on Oct 01, 2015

Make the struct name in sync with object id name.
Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: rename SWITCHDEV_ATTR_* enum values to SWITCHDEV_ATTR_ID_* · 1f868398

Committed by Jiri Pirko on Oct 01, 2015

To be aligned with obj.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: rename SWITCHDEV_OBJ_* enum values to SWITCHDEV_OBJ_ID_* · 57d80838

Committed by Jiri Pirko on Oct 01, 2015

Suggested-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: remove max_qlen_log · ef547f2a

Committed by Eric Dumazet on Oct 02, 2015

This control variable was set at the first listen(fd, backlog)
call, but was not updated if the application later tried to increase or
decrease the backlog. It made sense back when the listener had a
non-resizable hash table.

Also rounding to powers of two was not very friendly.
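
To see why rounding to a power of two is unfriendly, here is a small sketch. It is a simplified illustration, not the kernel's actual computation (which is assumed to have applied further clamping); the function names are hypothetical. It contrasts a limit frozen as a log2 value with a direct comparison against the requested backlog.

```c
#include <assert.h>

/* Smallest n such that (1u << n) >= v, for v >= 1. */
static unsigned int ceil_log2(unsigned int v)
{
	unsigned int n = 0;

	assert(v >= 1);
	while (n < 31 && (1u << n) < v)
		n++;
	return n;
}

/* Old style (sketch): the limit is a log2 value fixed at the first listen(). */
static int queue_full_log2(unsigned int qlen, unsigned int max_qlen_log)
{
	return (qlen >> max_qlen_log) != 0;
}

/* New style (sketch): compare against the current backlog directly. */
static int queue_full_backlog(unsigned int qlen, unsigned int backlog)
{
	return qlen >= backlog;
}
```

With a requested backlog of 1000, the log2 form only reports "full" at 1024 entries, while the direct comparison honors the requested value.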
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: remove struct listen_sock · 10cbc8f1

Committed by Eric Dumazet on Oct 02, 2015

It is enough to check listener sk_state, no need for an extra
condition.

max_qlen_log can be moved into struct request_sock_queue

We can remove syn_wait_lock and the alignment it enforced.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: attach SYNACK messages to request sockets instead of listener · ca6fb065

Committed by Eric Dumazet on Oct 02, 2015

If a listen backlog is very big (to avoid syncookies), then
the listener sk->sk_wmem_alloc is the main source of false
sharing, as we need to touch it twice per SYNACK re-transmit
and TX completion.

(One SYN packet takes listener lock once, but up to 6 SYNACK
are generated)

By attaching the skb to the request socket, we remove this
source of contention.

Tested:

 listen(fd, 10485760); // single listener (no SO_REUSEPORT)
 16 RX/TX queue NIC
 Sustain a SYNFLOOD attack of ~320,000 SYN per second,
 Sending ~1,400,000 SYNACK per second.
 Perf profiles now show listener spinlock being next bottleneck.

    20.29%  [kernel]  [k] queued_spin_lock_slowpath
    10.06%  [kernel]  [k] __inet_lookup_established
     5.12%  [kernel]  [k] reqsk_timer_handler
     3.22%  [kernel]  [k] get_next_timer_interrupt
     3.00%  [kernel]  [k] tcp_make_synack
     2.77%  [kernel]  [k] ipt_do_table
     2.70%  [kernel]  [k] run_timer_softirq
     2.50%  [kernel]  [k] ip_finish_output
     2.04%  [kernel]  [k] cascade
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: remove obsolete inet6 functions · 1b33bc3e

Committed by Eric Dumazet on Oct 02, 2015

inet6_csk_search_req() and inet6_csk_reqsk_queue_hash_add()
no longer exist.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: shrink struct listen_sock · 81b496b3

Committed by Eric Dumazet on Oct 02, 2015

We no longer use hash_rnd, nr_table_entries and syn_table[]

For a listener with a backlog of 10 million sockets, this
saves 80 MBytes of vmalloced memory.
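
The 80 MBytes figure is consistent with one pointer-sized slot per backlog entry. The sketch below works through that arithmetic; the assumption that each syn_table[] slot is a single 8-byte pointer on a 64-bit machine is mine, and `table_bytes()` is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes used by a table of `entries` slots of `slot_size` bytes each. */
static uint64_t table_bytes(uint64_t entries, uint64_t slot_size)
{
	assert(slot_size > 0);
	return entries * slot_size;
}

/* Assumed: 10,000,000 slots of 8 bytes each, i.e. 80,000,000 bytes. */
static uint64_t syn_table_bytes_for_10m(void)
{
	return table_bytes(10000000ULL, 8);
}
```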
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: install syn_recv requests into ehash table · 079096f1

Committed by Eric Dumazet on Oct 02, 2015

In this patch, we insert request sockets into TCP/DCCP
regular ehash table (where ESTABLISHED and TIMEWAIT sockets
are) instead of using the per listener hash table.

ACK packets find SYN_RECV pseudo sockets without having
to find and lock the listener.

In nominal conditions, this halves pressure on listener lock.

Note that this will allow for SO_REUSEPORT refinements,
so that we can select a listener using cpu/numa affinities instead
of the prior 'consistent hash', since only SYN packets will
apply this selection logic.

We will shrink listen_sock in the following patch to ease
code review.
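
As a rough illustration of the idea, the toy table below keeps pending requests in the same 4-tuple hash as established sockets, so a lookup for an incoming ACK can return the request directly. Everything here (type names, the hash function, the table size) is a hypothetical simplification and not the kernel's ehash implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sock_state { ST_ESTABLISHED, ST_TIME_WAIT, ST_NEW_SYN_RECV };

struct flow { uint32_t saddr, daddr; uint16_t sport, dport; };

struct mini_sock {
	struct flow key;
	enum sock_state state;
	struct mini_sock *next;   /* chain within one bucket */
};

#define EHASH_SIZE 64u
struct ehash { struct mini_sock *bucket[EHASH_SIZE]; };

static unsigned int flow_hash(const struct flow *f)
{
	uint32_t h = f->saddr ^ f->daddr ^ (((uint32_t)f->sport << 16) | f->dport);

	h ^= h >> 16;
	return h % EHASH_SIZE;
}

static int flow_eq(const struct flow *a, const struct flow *b)
{
	return a->saddr == b->saddr && a->daddr == b->daddr &&
	       a->sport == b->sport && a->dport == b->dport;
}

/* Requests and full sockets share one table. */
static void ehash_insert(struct ehash *t, struct mini_sock *sk)
{
	unsigned int b = flow_hash(&sk->key);

	sk->next = t->bucket[b];
	t->bucket[b] = sk;
}

/* An ACK's 4-tuple finds the pending request without touching a listener. */
static struct mini_sock *ehash_lookup(const struct ehash *t, const struct flow *f)
{
	for (struct mini_sock *sk = t->bucket[flow_hash(f)]; sk; sk = sk->next)
		if (flow_eq(&sk->key, f))
			return sk;
	return NULL;
}
```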
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ying Cai <ycai@google.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: remove inet_csk_reqsk_queue_added() timeout argument · 2feda341

Committed by Eric Dumazet on Oct 02, 2015

This is no longer used.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: get_openreq[46]() changes · aa3a0c8c

Committed by Eric Dumazet on Oct 02, 2015

When request sockets are no longer in a per listener hash table
but on regular TCP ehash, we need to access listener uid
through req->rsk_listener

get_openreq6() also gets a const for its request socket argument.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: init sk_prot and call sk_node_init() in reqsk_alloc() · b267cdd1

Committed by Eric Dumazet on Oct 02, 2015

We plan to use generic functions to insert request sockets
into ehash table.

sk_prot needs to be set (to retrieve sk_prot->h.hashinfo)
sk_node needs to be cleared.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move synflood_warned into struct request_sock_queue · 8d2675f1

Committed by Eric Dumazet on Oct 02, 2015

The long-term plan is to remove struct listen_sock once its hash
table is no longer there.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: move qlen/young out of struct listen_sock · aac065c5

Committed by Eric Dumazet on Oct 02, 2015

qlen_inc & young_inc were protected by listener lock,
while qlen_dec & young_dec were atomic fields.

Everything needs to be atomic for upcoming lockless listener.

Also move qlen/young in request_sock_queue as we'll get rid
of struct listen_sock eventually.
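
A minimal userspace sketch of the "everything atomic" approach, using C11 atomics in place of the kernel's atomic_t. The structure and function names are hypothetical, and the reading of "young" as requests not yet retransmitted is my assumption.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical, simplified counterpart of the queue counters. */
struct req_queue {
	atomic_int qlen;    /* pending requests */
	atomic_int young;   /* assumed: requests with no SYNACK retransmit yet */
};

static void req_queue_init(struct req_queue *q)
{
	atomic_init(&q->qlen, 0);
	atomic_init(&q->young, 0);
}

/* Increments no longer rely on a listener lock. */
static void req_queue_added(struct req_queue *q)
{
	atomic_fetch_add(&q->young, 1);
	atomic_fetch_add(&q->qlen, 1);
}

static void req_queue_removed(struct req_queue *q, int was_young)
{
	assert(atomic_load(&q->qlen) > 0);
	if (was_young)
		atomic_fetch_sub(&q->young, 1);
	atomic_fetch_sub(&q->qlen, 1);
}

static int req_queue_len(struct req_queue *q)
{
	return atomic_load(&q->qlen);
}
```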
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add a spinlock to protect struct request_sock_queue · fff1f300

Committed by Eric Dumazet on Oct 02, 2015

struct request_sock_queue fields are currently protected
by the listener 'lock' (not a real spinlock)

We need to add a private spinlock instead, so that softirq handlers
creating children do not have to worry about the backlog notion
that the listener 'lock' carries.
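
The idea of a queue guarded by its own lock, independent of its owner's lock, can be sketched in userspace with a C11 `atomic_flag` spinlock. This is an illustration only: the names are hypothetical and the kernel uses its own spinlock primitives, not this busy-wait loop.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node { struct node *next; int value; };

struct locked_queue {
	atomic_flag lock;           /* private lock, separate from any owner lock */
	struct node *head, *tail;
};

static void lq_init(struct locked_queue *q)
{
	atomic_flag_clear(&q->lock);
	q->head = q->tail = NULL;
}

static void lq_lock(struct locked_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;                   /* spin until the flag was previously clear */
}

static void lq_unlock(struct locked_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Append at the tail under the private lock. */
static void lq_push(struct locked_queue *q, struct node *n)
{
	assert(n != NULL);
	n->next = NULL;
	lq_lock(q);
	if (q->tail)
		q->tail->next = n;
	else
		q->head = n;
	q->tail = n;
	lq_unlock(q);
}

/* Remove from the head under the private lock; NULL when empty. */
static struct node *lq_pop(struct locked_queue *q)
{
	struct node *n;

	lq_lock(q);
	n = q->head;
	if (n) {
		q->head = n->next;
		if (!q->head)
			q->tail = NULL;
	}
	lq_unlock(q);
	return n;
}
```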
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

02 Oct, 2015 3 commits

memcg: remove pcp_counter_lock · ef510194

Committed by Greg Thelen on Oct 01, 2015

Commit 733a572e ("memcg: make mem_cgroup_read_{stat|event}() iterate
possible cpus instead of online") removed the last use of the per memcg
pcp_counter_lock but forgot to remove the variable.

Kill the vestigial variable.
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

memcg: fix dirty page migration · 0610c25d

Committed by Greg Thelen on Oct 01, 2015

The problem starts with a file backed dirty page which is charged to a
memcg.  Then page migration is used to move oldpage to newpage.

Migration:
 - copies the oldpage's data to newpage
 - clears oldpage.PG_dirty
 - sets newpage.PG_dirty
 - uncharges oldpage from memcg
 - charges newpage to memcg

Clearing oldpage.PG_dirty decrements the charged memcg's dirty page
count.

However, because newpage is not yet charged, setting newpage.PG_dirty
does not increment the memcg's dirty page count.  After migration
completes newpage.PG_dirty is eventually cleared, often in
account_page_cleaned().  At this time newpage is charged to a memcg so
the memcg's dirty page count is decremented which causes underflow
because the count was not previously incremented by migration.  This
underflow causes balance_dirty_pages() to see a very large unsigned
number of dirty memcg pages which leads to aggressive throttling of
buffered writes by processes in non root memcg.

This issue:
 - can harm performance of non root memcg buffered writes.
 - can report too small (even negative) values in
   memory.stat[(total_)dirty] counters of all memcg, including the root.
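
The underflow described above can be reproduced in a toy model. The structures and functions below are hypothetical simplifications, not kernel code. `migrate_buggy()` follows the ordering in the Migration list, and `migrate_fixed()` shows one ordering that avoids the problem in this model, without claiming to be what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct memcg { unsigned long nr_dirty; };
struct page  { struct memcg *memcg; bool dirty; };

/* Dirty accounting only touches the counter when the page is charged. */
static void set_dirty(struct page *p)
{
	if (!p->dirty) {
		p->dirty = true;
		if (p->memcg)
			p->memcg->nr_dirty++;
	}
}

static void clear_dirty(struct page *p)
{
	if (p->dirty) {
		p->dirty = false;
		if (p->memcg)
			p->memcg->nr_dirty--;   /* wraps around if it was 0 */
	}
}

/* Ordering from the list above: dirty state moves before the charge does. */
static void migrate_buggy(struct page *oldp, struct page *newp)
{
	struct memcg *cg = oldp->memcg;
	bool was_dirty = oldp->dirty;

	assert(newp->memcg == NULL);
	clear_dirty(oldp);              /* counter decremented */
	if (was_dirty)
		set_dirty(newp);        /* newp not charged yet: no increment */
	oldp->memcg = NULL;             /* uncharge oldpage */
	newp->memcg = cg;               /* charge newpage */
}

/* In this toy model, charging newpage first keeps the counter balanced. */
static void migrate_fixed(struct page *oldp, struct page *newp)
{
	struct memcg *cg = oldp->memcg;
	bool was_dirty = oldp->dirty;

	assert(newp->memcg == NULL);
	newp->memcg = cg;
	clear_dirty(oldp);
	if (was_dirty)
		set_dirty(newp);
	oldp->memcg = NULL;
}
```

After `migrate_buggy()`, cleaning the new page decrements a counter that is already zero, which yields the very large unsigned value that the commit message blames for the throttling.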

To avoid polluting migrate.c with #ifdef CONFIG_MEMCG checks, introduce
page_memcg() and set_page_memcg() helpers.

Test:
    0) setup and enter limited memcg
    mkdir /sys/fs/cgroup/test
    echo 1G > /sys/fs/cgroup/test/memory.limit_in_bytes
    echo $$ > /sys/fs/cgroup/test/cgroup.procs

    1) buffered writes baseline
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    2) buffered writes with compaction antagonist to induce migration
    yes 1 > /proc/sys/vm/compact_memory &
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    kill %
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

    3) buffered writes without antagonist, should match baseline
    rm -rf /data/tmp/foo
    dd if=/dev/zero of=/data/tmp/foo bs=1M count=1k
    sync
    grep ^dirty /sys/fs/cgroup/test/memory.stat

                       (speed, dirty residue)
             unpatched                       patched
    1) 841 MB/s 0 dirty pages          886 MB/s 0 dirty pages
    2) 611 MB/s -33427456 dirty pages  793 MB/s 0 dirty pages
    3) 114 MB/s -33427456 dirty pages  891 MB/s 0 dirty pages

    Notice that unpatched baseline performance (1) fell after
    migration (3): 841 -> 114 MB/s.  In the patched kernel, post
    migration performance matches baseline.

Fixes: c4843a75 ("memcg: add per cgroup dirty page accounting")
Signed-off-by: Greg Thelen <gthelen@google.com>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org> [4.2+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

userfaultfd: remove kernel header include from uapi header · 9ff42d10

Committed by Andre Przywara on Oct 01, 2015

As include/uapi/linux/userfaultfd.h is a user visible header file, it
should not include kernel-exclusive header files.

So trying to build the userfaultfd test program from the selftests
directory fails, since it contains a reference to linux/compiler.h.  As
it turns out, that header is not really needed there, so we can simply
remove it to fix that issue.
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

30 Sep, 2015 19 commits

net: Add support for filtering neigh dump by master device · 21fdd092

Committed by David Ahern on Sep 29, 2015

Add support for filtering neighbor dumps by master device by adding
the NDA_MASTER attribute to the dump request. A new netlink flag,
NLM_F_DUMP_FILTERED, is added to indicate the kernel supports the
request and output is filtered as requested.
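
On the userspace side, a dump consumer could check the flag on each reply header, as in the sketch below. It assumes a Linux build environment with the uapi header `linux/netlink.h`; the fallback define is for older headers, and `dump_was_filtered()` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/netlink.h>

#ifndef NLM_F_DUMP_FILTERED
#define NLM_F_DUMP_FILTERED 0x20    /* fallback for older uapi headers */
#endif

/*
 * Returns nonzero when the kernel marked this dump message as filtered.
 * If the flag is absent, the kernel may have ignored the filter and the
 * caller would have to filter the entries itself.
 */
static int dump_was_filtered(const struct nlmsghdr *nlh)
{
	assert(nlh != NULL);
	return (nlh->nlmsg_flags & NLM_F_DUMP_FILTERED) != 0;
}
```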
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: switchdev: extract struct switchdev_obj_* · 44bbcf5c

Committed by Vivien Didelot on Sep 29, 2015

Now that switchdev and its drivers directly use specific switchdev_obj_*
structures, move them out of the switchdev_obj union and get rid of this
outer structure.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: switchdev: abstract object in add/del ops · ab069002

Committed by Vivien Didelot on Sep 29, 2015

Similar to the notifier_call callback of a notifier_block, change the
function signature of switchdev add and del operations to:

    int switchdev_port_obj_add/del(struct net_device *dev,
                                   enum switchdev_obj_id id, void *obj);

This allows the caller to pass a specific switchdev_obj_* structure
instead of the generic switchdev_obj one.

Driver implementations of these operations, and switchdev itself, have
been changed accordingly.
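
A driver-side handler with this shape might dispatch on the id and cast the opaque pointer, as sketched below. The types (`struct port`, `struct obj_vlan`, `struct obj_fdb`), the enum values, and the bookkeeping are hypothetical stand-ins used only to show the calling convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum obj_id { OBJ_PORT_VLAN, OBJ_IPV4_FIB, OBJ_PORT_FDB };

struct obj_vlan { unsigned short vid_begin, vid_end; };
struct obj_fdb  { unsigned char addr[6]; unsigned short vid; };

struct port { int nvlans; int nfdb; };   /* toy per-port state */

/* The id tells the handler which specific structure obj points to. */
static int port_obj_add(struct port *p, enum obj_id id, void *obj)
{
	switch (id) {
	case OBJ_PORT_VLAN: {
		const struct obj_vlan *vlan = obj;

		assert(vlan != NULL);
		p->nvlans += vlan->vid_end - vlan->vid_begin + 1;
		return 0;
	}
	case OBJ_PORT_FDB: {
		const struct obj_fdb *fdb = obj;

		assert(fdb != NULL);
		p->nfdb++;
		return 0;
	}
	default:
		return -EOPNOTSUPP;   /* object type not handled by this port */
	}
}
```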
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: switchdev: pass callback to dump operation · 25f07adc

Committed by Vivien Didelot on Sep 29, 2015

Similar to the notifier_call callback of a notifier_block, change the
function signature of switchdev dump operation to:

    int switchdev_port_obj_dump(struct net_device *dev,
                                enum switchdev_obj_id id, void *obj,
                                int (*cb)(void *obj));

This allows the caller to pass and expect back a specific
switchdev_obj_* structure instead of the generic switchdev_obj one.

Driver implementations of the dump operation can now expect this specific
structure and call the callback with it. Drivers have been changed
accordingly.
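
The callback convention can be sketched as follows, assuming a toy driver that walks an array of VLAN ids. All names are hypothetical. Because the callback receives only the object pointer, the caller in this sketch embeds the object in a larger structure of its own and recovers it with `offsetof`.

```c
#include <assert.h>
#include <stddef.h>

struct vlan_obj { unsigned short vid; };

/* Driver side: fill obj for each entry and hand it to the callback. */
static int port_vlan_dump(const unsigned short *vids, size_t n,
			  struct vlan_obj *obj, int (*cb)(void *obj))
{
	assert(cb != NULL);
	for (size_t i = 0; i < n; i++) {
		int err;

		obj->vid = vids[i];
		err = cb(obj);
		if (err)
			return err;     /* stop on the first callback error */
	}
	return 0;
}

/* Caller side: a superset struct carries the state the callback needs. */
struct vlan_dump {
	struct vlan_obj obj;            /* this member is what the driver sees */
	unsigned int seen;
	unsigned int max_vid;
};

static int vlan_dump_cb(void *obj)
{
	struct vlan_dump *dump =
		(struct vlan_dump *)((char *)obj - offsetof(struct vlan_dump, obj));

	dump->seen++;
	if (dump->obj.vid > dump->max_vid)
		dump->max_vid = dump->obj.vid;
	return 0;
}
```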
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: switchdev: remove dev from switchdev_obj cb · 03d5fb18

Committed by Vivien Didelot on Sep 29, 2015

The net_device associated with a dump operation does not have to be passed
to the callback. switchdev stores it in a superset struct, if needed.

Also some drivers (such as DSA drivers) may not have easy access to it.

This will simplify pushing the callback function down to the drivers.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Move netif_index_is_l3_master to l3mdev.h · 9478d12d

Committed by David Ahern on Sep 29, 2015

Change CONFIG dependency to CONFIG_NET_L3_MASTER_DEV as well.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove vrf header file · ec539514

Committed by David Ahern on Sep 29, 2015

Move remaining structs to VRF driver and delete the vrf header file.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Remove the now unused vrf_ptr · 93a7e7e8

Committed by David Ahern on Sep 29, 2015

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Replace calls to vrf_dev_get_rth · 8e1ed705

Committed by David Ahern on Sep 29, 2015

Replace calls to vrf_dev_get_rth with l3mdev_get_rtable.
The check on the flow flags is handled in the l3mdev operation.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Replace vrf_dev_table and friends · 3236b004

Committed by David Ahern on Sep 29, 2015

Replace calls to vrf_dev_table and friends with l3mdev_fib_table
and kin.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Replace vrf_master_ifindex{, _rcu} with l3mdev equivalents · 385add90

Committed by David Ahern on Sep 29, 2015

Replace calls to vrf_master_ifindex_rcu and vrf_master_ifindex with either
l3mdev_master_ifindex_rcu or l3mdev_master_ifindex.

The pattern:
    oif = vrf_master_ifindex(dev) ? : dev->ifindex;
is replaced with
    oif = l3mdev_fib_oif(dev);

And remove the now unused vrf macros.
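
The replaced pattern uses the GNU `a ? : b` shorthand, which yields `a` when it is nonzero and `b` otherwise. Below is a portable userspace sketch of a helper with that behavior. `struct net_dev` and both functions are hypothetical simplifications, not the kernel's `l3mdev_fib_oif()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified device: an index plus an optional master. */
struct net_dev {
	int ifindex;
	const struct net_dev *master;
};

/* Index of the master device, or 0 when the device is not enslaved. */
static int master_ifindex(const struct net_dev *dev)
{
	assert(dev != NULL);
	return dev->master ? dev->master->ifindex : 0;
}

/* Portable spelling of: oif = master_ifindex(dev) ? : dev->ifindex; */
static int fib_oif(const struct net_dev *dev)
{
	int master = master_ifindex(dev);

	return master ? master : dev->ifindex;
}
```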
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Introduce L3 Master device abstraction · 1b69c6d0

Committed by David Ahern on Sep 29, 2015

L3 master devices allow users of the abstraction to influence FIB lookups
for enslaved devices. Current API provides a means for the master device
to return a specific FIB table for an enslaved device, to return an
rtable/custom dst and influence the OIF used for fib lookups.
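
One way to picture such an abstraction is an operations table attached to the master device, consulted on behalf of enslaved devices. The sketch below covers only the FIB-table hook. The structure layout, the names, and the fallback to table 254 (the main table id on Linux) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

struct dev;

/* Hypothetical ops table; only the FIB-table hook is modeled. */
struct l3_master_ops {
	unsigned int (*fib_table)(const struct dev *master);
};

struct dev {
	int ifindex;
	const struct dev *master;          /* NULL when not enslaved */
	const struct l3_master_ops *ops;   /* set only on master devices */
	unsigned int table_id;             /* used by the demo master below */
};

#define FALLBACK_TABLE 254u                /* assumed default: main table */

static unsigned int demo_fib_table(const struct dev *master)
{
	return master->table_id;
}

static const struct l3_master_ops demo_ops = { .fib_table = demo_fib_table };

/* Ask the master (or the device itself, if it is one) which table to use. */
static unsigned int dev_fib_table(const struct dev *dev)
{
	const struct dev *m;

	assert(dev != NULL);
	m = dev->ops ? dev : dev->master;
	if (m && m->ops && m->ops->fib_table)
		return m->ops->fib_table(m);
	return FALLBACK_TABLE;
}
```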
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER · 007979ea

Committed by David Ahern on Sep 29, 2015

Rename IFF_VRF_MASTER to IFF_L3MDEV_MASTER and update the name of the
netif_is_vrf and netif_index_is_vrf macros.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: prepare fastopen code for upcoming listener changes · 0536fcc0

Committed by Eric Dumazet on Sep 29, 2015

While auditing TCP stack for upcoming 'lockless' listener changes,
I found I had to change fastopen_init_queue() to properly init the object
before publishing it.

Otherwise another CPU could try to lock the spinlock before it gets
properly initialized.

Instead of adding appropriate barriers, just remove the dynamic memory
allocation:
- The structure is 28 bytes on 64-bit arches. Using an additional 8 bytes
  to hold a pointer seems overkill.
- Two listeners can share the same cache line, and performance would suffer.

If we really want to save a few bytes, we would instead dynamically allocate
the whole struct request_sock_queue in the future.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: constify tcp_syn_flood_action() socket argument · 2985aaac

Committed by Eric Dumazet on Sep 29, 2015

tcp_syn_flood_action() will soon be called with unlocked socket.
In order to avoid SYN flood warning being emitted multiple times,
use xchg().
Extend max_qlen_log and synflood_warned fields in struct listen_sock
to u32
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: constify tcp_v{4|6}_route_req() sock argument · f964629e

Committed by Eric Dumazet on Sep 29, 2015

These functions do not change the listener socket.
Goal is to make sure tcp_conn_request() is not messing with
listener in a racy way.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: cookie_init_sequence() cleanups · 3f684b4b

Committed by Eric Dumazet on Sep 29, 2015

Some common IPv4/IPv6 code can be factorized.
Also constify cookie_init_sequence() socket argument.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: constify syn_recv_sock() method sock argument · 0c27171e

Committed by Eric Dumazet on Sep 29, 2015

We'll soon no longer hold the listener socket lock; these
functions do not modify the socket in any way.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: constify tcp_create_openreq_child() socket argument · c28c6f04

Committed by Eric Dumazet on Sep 29, 2015

This method does not touch the listener socket.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>