提交 · dd6fd213b05e7a1f590b470500343dd97c3a32c1 · openeuler / Kernel

01 12月, 2016 5 次提交

svcrdma: Remove DMA map accounting · dd6fd213

由 Chuck Lever 提交于 11月 29, 2016

Clean up: sc_dma_used is not required for correct operation. It is
simply a debugging tool to report when svcrdma has leaked DMA maps.

However, manipulating an atomic has a measurable CPU cost, and DMA
map accounting specific to svcrdma will be meaningless once svcrdma
is converted to use the new generic r/w API.

A similar kind of debug accounting can be done simply by enabling
the IOMMU or by using CONFIG_DMA_API_DEBUG, CONFIG_IOMMU_DEBUG, and
CONFIG_IOMMU_LEAK.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

dd6fd213

svcrdma: Remove BH-disabled spin locking in svc_rdma_send() · e4eb42ce

由 Chuck Lever 提交于 11月 29, 2016

svcrdma's current SQ accounting algorithm takes sc_lock and disables
bottom-halves while posting all RDMA Read, Write, and Send WRs.

This is relatively heavyweight serialization. And note that Write and
Send are already fully serialized by the xpt_mutex.

Using a single atomic_t should be all that is necessary to guarantee
that ib_post_send() is called only when there is enough space on the
send queue. This is what the other RDMA-enabled storage targets do.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

e4eb42ce

svcrdma: Renovate sendto chunk list parsing · 5fdca653

由 Chuck Lever 提交于 11月 29, 2016

The current sendto code appears to support clients that provide only
one of a Read list, a Write list, or a Reply chunk. My reading of
that code is that it doesn't support the following cases:

 - Read list + Write list
 - Read list + Reply chunk
 - Write list + Reply chunk
 - Read list + Write list + Reply chunk

The protocol allows more than one Read or Write chunk in those
lists. Some clients do send a Read list and Reply chunk
simultaneously. NFSv4 WRITE uses a Read list for the data payload,
and a Reply chunk because the GETATTR result in the reply can
contain a large object like an ACL.

Generalize one of the sendto code paths needed to support all of
the above cases, and attempt to ensure that only one pass is done
through the RPC Call's transport header to gather chunk list
information for building the reply.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

5fdca653

svcauth_gss: Close connection when dropping an incoming message · 4d712ef1

由 Chuck Lever 提交于 11月 29, 2016

S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message
whose sequence number lies outside the current window is dropped.
The rationale is:

  The reason for discarding requests silently is that the server
  is unable to determine if the duplicate or out of range request
  was due to a sequencing problem in the client, network, or the
  operating system, or due to some quirk in routing, or a replay
  attack by an intruder.  Discarding the request allows the client
  to recover after timing out, if indeed the duplication was
  unintentional or well intended.

However, clients may rely on the server dropping the connection to
indicate that a retransmit is needed. Without a connection reset, a
client can wait forever without retransmitting, and the workload
just stops dead. I've reproduced this behavior by running xfstests
generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i.

To address this issue, have the server close the connection when it
silently discards an incoming message due to a GSS sequence number
problem.

There are a few other places where the server will never reply.
Change those spots in a similar fashion.
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

4d712ef1

svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm · 1b9f700b

由 Chuck Lever 提交于 11月 29, 2016

Logic copied from xs_setup_bc_tcp().

Fixes: 39a9beab ('rpc: share one xps between all backchannels')
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

1b9f700b

02 11月, 2016 2 次提交

sunrpc: GFP_KERNEL should be GFP_NOFS in crypto code · 56094edd

由 J. Bruce Fields 提交于 10月 26, 2016

Writes may depend on the auth_gss crypto code, so we shouldn't be
allocating with GFP_KERNEL there.

This still leaves some crypto_alloc_* calls which end up doing
GFP_KERNEL allocations in the crypto code.  Those could probably done at
crypto import time.
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

56094edd

svcrdma: backchannel cannot share a page for send and rcv buffers · 8d42629b

由 Chuck Lever 提交于 10月 28, 2016

The underlying transport releases the page pointed to by rq_buffer
during xprt_rdma_bc_send_request. When the backchannel reply arrives,
rq_rbuffer then points to freed memory.

Fixes: 68778945 ('SUNRPC: Separate buffer pointers for RPC ...')
Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

8d42629b

29 10月, 2016 1 次提交

sunrpc: fix some missing rq_rbuffer assignments · 18e601d6

由 Jeff Layton 提交于 10月 24, 2016

We've been seeing some crashes in testing that look like this:

BUG: unable to handle kernel NULL pointer dereference at           (null)
IP: [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
PGD 212ca2067 PUD 212ca3067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ppdev parport_pc i2c_piix4 sg parport i2c_core virtio_balloon pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi virtio_scsi 8139too ata_piix libata 8139cp mii virtio_pci floppy virtio_ring serio_raw virtio
CPU: 1 PID: 1540 Comm: nfsd Not tainted 4.9.0-rc1 #39
Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
task: ffff88020d7ed200 task.stack: ffff880211838000
RIP: 0010:[<ffffffff8135ce99>]  [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
RSP: 0018:ffff88021183bdd0  EFLAGS: 00010206
RAX: 0000000000000000 RBX: ffff88020d7fa000 RCX: 000000f400000000
RDX: 0000000000000014 RSI: ffff880212927020 RDI: 0000000000000000
RBP: ffff88021183be30 R08: 01000000ef896996 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880211704ca8
R13: ffff88021473f000 R14: 00000000ef896996 R15: ffff880211704800
FS:  0000000000000000(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 0000000212ca1000 CR4: 00000000000006e0
Stack:
 ffffffffa01ea087 ffffffff63400001 ffff880215145e00 ffff880211bacd00
 ffff88021473f2b8 0000000000000004 00000000d0679d67 ffff880211bacd00
 ffff88020d7fa000 ffff88021473f000 0000000000000000 ffff88020d7faa30
Call Trace:
 [<ffffffffa01ea087>] ? svc_tcp_recvfrom+0x5a7/0x790 [sunrpc]
 [<ffffffffa01f84d8>] svc_recv+0xad8/0xbd0 [sunrpc]
 [<ffffffffa0262d5e>] nfsd+0xde/0x160 [nfsd]
 [<ffffffffa0262c80>] ? nfsd_destroy+0x60/0x60 [nfsd]
 [<ffffffff810a9418>] kthread+0xd8/0xf0
 [<ffffffff816dbdbf>] ret_from_fork+0x1f/0x40
 [<ffffffff810a9340>] ? kthread_park+0x60/0x60
Code: 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 <4c> 89 07 4c 89 4f 08 4c 89 57 10 4c 89 5f 18 48 8d 7f 20 73 d4
RIP  [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
 RSP <ffff88021183bdd0>
CR2: 0000000000000000

Both Bruce and Eryu ran a bisect here and found that the problematic
patch was 68778945 (SUNRPC: Separate buffer pointers for RPC Call and
Reply messages).

That patch changed rpc_xdr_encode to use a new rq_rbuffer pointer to
set up the receive buffer, but didn't change all of the necessary
codepaths to set it properly. In particular the backchannel setup was
missing.

We need to set rq_rbuffer whenever rq_buffer is set. Ensure that it is.
Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
Tested-by: NChuck Lever <chuck.lever@oracle.com>
Reported-by: NEryu Guan <guaneryu@gmail.com>
Tested-by: NEryu Guan <guaneryu@gmail.com>
Fixes: 68778945 "SUNRPC: Separate buffer pointers..."
Reported-by: NJ. Bruce Fields <bfields@fieldses.org>
Signed-off-by: NJeff Layton <jlayton@redhat.com>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

18e601d6

27 10月, 2016 1 次提交

sunrpc: don't pass on-stack memory to sg_set_buf · 2876a344

由 J. Bruce Fields 提交于 10月 18, 2016

As of ac4e97ab "scatterlist: sg_set_buf() argument must be in linear
mapping", sg_set_buf hits a BUG when make_checksum_v2->xdr_process_buf,
among other callers, passes it memory on the stack.

We only need a scatterlist to pass this to the crypto code, and it seems
like overkill to require kmalloc'd memory just to encrypt a few bytes,
but for now this seems the best fix.

Many of these callers are in the NFS write paths, so we allocate with
GFP_NOFS.  It might be possible to do without allocations here entirely,
but that would probably be a bigger project.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>

2876a344

19 10月, 2016 1 次提交

mm: replace get_user_pages_unlocked() write/force parameters with gup_flags · c164154f

由 Lorenzo Stoakes 提交于 10月 13, 2016

This removes the 'write' and 'force' use from get_user_pages_unlocked()
and replaces them with 'gup_flags' to make the use of FOLL_FORCE
explicit in callers as use of this flag can result in surprising
behaviour (and hence bugs) within the mm subsystem.
Signed-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
Reviewed-by: NJan Kara <jack@suse.cz>
Acked-by: NMichal Hocko <mhocko@suse.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c164154f

14 10月, 2016 4 次提交

net: bridge: add the multicast_flood flag attribute to brport_attrs · 4eb6753c

由 Nikolay Aleksandrov 提交于 10月 13, 2016

When I added the multicast flood control flag, I also added an attribute
for it for sysfs similar to other flags, but I forgot to add it to
brport_attrs.

Fixes: b6cb5ac8 ("net: bridge: add per-port multicast flood flag")
Signed-off-by: NNikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4eb6753c

net: rtnl: info leak in rtnl_fill_vfinfo() · 775f4f05

由 Dan Carpenter 提交于 10月 13, 2016

The "vf_vlan_info" struct ends with a 2 byte struct hole so we have to
memset it to ensure that no stack information is revealed to user space.

Fixes: 79aab093 ('net: Update API for VF vlan protocol 802.1ad support')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

775f4f05

tipc: info leak in __tipc_nl_add_udp_addr() · 73076162

由 Dan Carpenter 提交于 10月 13, 2016

We should clear out the padding and unused struct members so that we
don't expose stack information to userspace.

Fixes: fdb3accc ('tipc: add the ability to get UDP options via netlink')
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73076162

net: ipv4: Do not drop to make_route if oif is l3mdev · 6104e112

由 David Ahern 提交于 10月 12, 2016

Commit e0d56fdd was a bit aggressive removing l3mdev calls in
the IPv4 stack. If the fib_lookup fails we do not want to drop to
make_route if the oif is an l3mdev device.

Also reverts 19664c6a ("net: l3mdev: Remove netif_index_is_l3_master")
which removed netif_index_is_l3_master.

Fixes: e0d56fdd ("net: l3mdev: remove redundant calls")
Signed-off-by: NDavid Ahern <dsa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6104e112

13 10月, 2016 6 次提交

ipv6: tcp: restore IP6CB for pktoptions skbs · 8ce48623

由 Eric Dumazet 提交于 10月 12, 2016

Baozeng Ding reported following KASAN splat :

BUG: KASAN: use-after-free in ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 at addr ffff880029c84ec8
Read of size 1 by task poc/25548
Call Trace:
 [<ffffffff82cf43c9>] dump_stack+0x12e/0x185 /lib/dump_stack.c:15
 [<     inline     >] print_address_description /mm/kasan/report.c:204
 [<ffffffff817ced3b>] kasan_report_error+0x48b/0x4b0 /mm/kasan/report.c:283
 [<     inline     >] kasan_report /mm/kasan/report.c:303
 [<ffffffff817ced9e>] __asan_report_load1_noabort+0x3e/0x40 /mm/kasan/report.c:321
 [<ffffffff85c71da1>] ip6_datagram_recv_specific_ctl+0x13f1/0x15c0 /net/ipv6/datagram.c:687
 [<ffffffff85c734c3>] ip6_datagram_recv_ctl+0x33/0x40
 [<ffffffff85c0b07c>] do_ipv6_getsockopt.isra.4+0xaec/0x2150
 [<ffffffff85c0c7f6>] ipv6_getsockopt+0x116/0x230
 [<ffffffff859b5a12>] tcp_getsockopt+0x82/0xd0 /net/ipv4/tcp.c:3035
 [<ffffffff855fb385>] sock_common_getsockopt+0x95/0xd0 /net/core/sock.c:2647
 [<     inline     >] SYSC_getsockopt /net/socket.c:1776
 [<ffffffff855f8ba2>] SyS_getsockopt+0x142/0x230 /net/socket.c:1758
 [<ffffffff8685cdc5>] entry_SYSCALL_64_fastpath+0x23/0xc6
Memory state around the buggy address:
 ffff880029c84d80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff880029c84e00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
> ffff880029c84e80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
                                              ^
 ffff880029c84f00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
 ffff880029c84f80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff

He also provided a syzkaller reproducer.

Issue is that ip6_datagram_recv_specific_ctl() expects to find IP6CB
data that was moved at a different place in tcp_v6_rcv()

This patch moves tcp_v6_restore_cb() up and calls it from
tcp_v6_do_rcv() when np->pktoptions is set.

Fixes: 971f10ec ("tcp: better TCP_SKB_CB layout to reduce cache line misses")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NBaozeng Ding <sploving1@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8ce48623

net_sched: reorder pernet ops and act ops registrations · ab102b80

由 WANG Cong 提交于 10月 11, 2016

Krister reported a kernel NULL pointer dereference after
tcf_action_init_1() invokes a_o->init(), it is a race condition
where one thread calling tcf_register_action() to initialize
the netns data after putting act ops in the global list and
the other thread searching the list and then calling
a_o->init(net, ...).

Fix this by moving the pernet ops registration before making
the action ops visible. This is fine because: a) we don't
rely on act_base in pernet ops->init(), b) in the worst case we
have a fully initialized netns but ops is still not ready so
new actions still can't be created.
Reported-by: NKrister Johansen <kjlx@templeofstupid.com>
Tested-by: NKrister Johansen <kjlx@templeofstupid.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ab102b80

openvswitch: add NETIF_F_HW_VLAN_STAG_TX to internal dev · 3145c037

由 Jiri Benc 提交于 10月 10, 2016

The internal device does support 802.1AD offloading since 018c1dda
("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink
attributes").
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Acked-by: NEric Garver <e@erig.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3145c037

openvswitch: fix vlan subtraction from packet length · 72ec108d

由 Jiri Benc 提交于 10月 10, 2016

When the packet has its vlan tag in skb->vlan_tci, the length of the VLAN
header is not counted in skb->len. It doesn't make sense to subtract it.

Fixes: 018c1dda ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Acked-by: NEric Garver <e@erig.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

72ec108d

openvswitch: vlan: remove wrong likely statement · 20ecf1e4

由 Jiri Benc 提交于 10月 10, 2016

This code is called whenever flow key is being extracted from the packet.
The packet may be as likely vlan tagged as not.

Fixes: 018c1dda ("openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes")
Signed-off-by: NJiri Benc <jbenc@redhat.com>
Acked-by: NPravin B Shelar <pshelar@ovn.org>
Acked-by: NEric Garver <e@erig.me>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

20ecf1e4

net_sched: do not broadcast RTM_GETTFILTER result · fa59b27c

由 Eric Dumazet 提交于 10月 09, 2016

There are two ways to get tc filters from kernel to user space.

1) Full dump (tc_dump_tfilter())
2) RTM_GETTFILTER to get one precise filter, reducing overhead.

The second operation is unfortunately broadcasting its result,
polluting "tc monitor" users.

This patch makes sure only the requester gets the result, using
netlink_unicast() instead of rtnetlink_send()

Jamal cooked an iproute2 patch to implement "tc filter get" operation,
but other user space libraries already use RTM_GETTFILTER when a single
filter is queried, instead of dumping all filters.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa59b27c

12 10月, 2016 2 次提交

strparser: Propagate correct error code in strp_recv() · 6d3a4c40

由 Geert Uytterhoeven 提交于 10月 06, 2016

With m68k-linux-gnu-gcc-4.1:

    net/strparser/strparser.c: In function ‘strp_recv’:
    net/strparser/strparser.c:98: warning: ‘err’ may be used uninitialized in this function

Pass "len" (which is an error code when negative) instead of the
uninitialized "err" variable to fix this.

Fixes: 43a0c675 ("strparser: Stream parser for messages")
Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6d3a4c40

treewide: remove redundant #include <linux/kconfig.h> · 97139d4a

由 Masahiro Yamada 提交于 10月 11, 2016

Kernel source files need not include <linux/kconfig.h> explicitly
because the top Makefile forces to include it with:

  -include $(srctree)/include/linux/kconfig.h

This commit removes explicit includes except the following:

  * arch/s390/include/asm/facilities_src.h
  * tools/testing/radix-tree/linux/kernel.h

These two are used for host programs.

Link: http://lkml.kernel.org/r/1473656164-11929-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

97139d4a

11 10月, 2016 2 次提交

netfilter: Fix slab corruption. · bd3769bf

由 Linus Torvalds 提交于 10月 10, 2016

Use the correct pattern for singly linked list insertion and
deletion.  We can also calculate the list head outside of the
mutex.

Fixes: e3b37f11 ("netfilter: replace list_head with single linked list")
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: NAaron Conole <aconole@bytheb.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

net/netfilter/core.c | 108 ++++++++++++++++-----------------------------------
 1 file changed, 33 insertions(+), 75 deletions(-)

bd3769bf

latent_entropy: Mark functions with __latent_entropy · 0766f788

由 Emese Revfy 提交于 6月 20, 2016

The __latent_entropy gcc attribute can be used only on functions and
variables.  If it is on a function then the plugin will instrument it for
gathering control-flow entropy. If the attribute is on a variable then
the plugin will initialize it with random contents.  The variable must
be an integer, an integer array type or a structure with integer fields.

These specific functions have been selected because they are init
functions (to help gather boot-time entropy), are called at unpredictable
times, or they have variable loops, each of which provide some level of
latent entropy.
Signed-off-by: NEmese Revfy <re.emese@gmail.com>
[kees: expanded commit message]
Signed-off-by: NKees Cook <keescook@chromium.org>

0766f788

08 10月, 2016 4 次提交

ipv6 addrconf: disallow rtr_solicits < -1 · cb4a4c69

由 Maciej Żenczykowski 提交于 10月 07, 2016

This disallows setting /proc/sys/net/ipv6/conf/*/router_solicitations
to values below -1.

-1 continues to mean an unlimited number of retransmits.

Note: this depends on 'ipv6 addrconf: remove addrconf_sysctl_hop_limit()'
Signed-off-by: NMaciej Żenczykowski <maze@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cb4a4c69

vfs: Remove {get,set,remove}xattr inode operations · fd50ecad

由 Andreas Gruenbacher 提交于 9月 29, 2016

These inode operations are no longer used; remove them.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

fd50ecad

cred: simpler, 1D supplementary groups · 81243eac

由 Alexey Dobriyan 提交于 10月 07, 2016

Current supplementary groups code can massively overallocate memory and
is implemented in a way so that access to individual gid is done via 2D
array.

If number of gids is <= 32, memory allocation is more or less tolerable
(140/148 bytes).  But if it is not, code allocates full page (!)
regardless and, what's even more fun, doesn't reuse small 32-entry
array.

2D array means dependent shifts, loads and LEAs without possibility to
optimize them (gid is never known at compile time).

All of the above is unnecessary.  Switch to the usual
trailing-zero-len-array scheme.  Memory is allocated with
kmalloc/vmalloc() and only as much as needed.  Accesses become simpler
(LEA 8(gi,idx,4) or even without displacement).

Maximum number of gids is 65536 which translates to 256KB+8 bytes.  I
think kernel can handle such allocation.

On my usual desktop system with whole 9 (nine) aux groups, struct
group_info shrinks from 148 bytes to 44 bytes, yay!

Nice side effects:

 - "gi->gid[i]" is shorter than "GROUP_AT(gi, i)", less typing,

 - fix little mess in net/ipv4/ping.c
   should have been using GROUP_AT macro but this point becomes moot,

 - aux group allocation is persistent and should be accounted as such.

Link: http://lkml.kernel.org/r/20160817201927.GA2096@p183.telecom.bySigned-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
Cc: Vasily Kulikov <segoon@openwall.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

81243eac

mm: memcontrol: consolidate cgroup socket tracking · 2d758073

由 Johannes Weiner 提交于 10月 07, 2016

The cgroup core and the memory controller need to track socket ownership
for different purposes, but the tracking sites being entirely different
is kind of ugly.

Be a better citizen and rename the memory controller callbacks to match
the cgroup core callbacks, then move them to the same place.

[akpm@linux-foundation.org: coding-style fixes]
Link: http://lkml.kernel.org/r/20160914194846.11153-3-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NTejun Heo <tj@kernel.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Vladimir Davydov <vdavydov@virtuozzo.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d758073

07 10月, 2016 4 次提交

sockfs: Get rid of getxattr iop · bba0bd31

由 Andreas Gruenbacher 提交于 9月 29, 2016

If we allow pseudo-filesystems created with mount_pseudo to have xattr
handlers, we can replace sockfs_getxattr with a sockfs_xattr_get handler
to use the xattr handler name parsing.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

bba0bd31

sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names · 971df15b

由 Andreas Gruenbacher 提交于 9月 29, 2016

The standard return value for unsupported attribute names is
-EOPNOTSUPP, as opposed to undefined but supported attributes
(-ENODATA).

Also, fail for attribute names like "system.sockprotonameXXX" and
simplify the code a bit.
Signed-off-by: NAndreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

971df15b

netlink: do not enter direct reclaim from netlink_dump() · d35c99ff

由 Eric Dumazet 提交于 10月 06, 2016

Since linux-3.15, netlink_dump() can use up to 16384 bytes skb
allocations.

Due to struct skb_shared_info ~320 bytes overhead, we end up using
order-3 (on x86) page allocations, that might trigger direct reclaim and
add stress.

The intent was really to attempt a large allocation but immediately
fallback to a smaller one (order-1 on x86) in case of memory stress.

On recent kernels (linux-4.4), we can remove __GFP_DIRECT_RECLAIM to
meet the goal. Old kernels would need to remove __GFP_WAIT

While we are at it, since we do an order-3 allocation, allow to use
all the allocated bytes instead of 16384 to reduce syscalls during
large dumps.

iproute2 already uses 32KB recvmsg() buffer sizes.

Alexei provided an initial patch downsizing to SKB_WITH_OVERHEAD(16384)

Fixes: 9063e21f ("netlink: autosize skb lengthes")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NAlexei Starovoitov <ast@kernel.org>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: NGreg Rose <grose@lightfleet.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d35c99ff

packet: call fanout_release, while UNREGISTERING a netdev · 66644982

由 Anoob Soman 提交于 10月 05, 2016

If a socket has FANOUT sockopt set, a new proto_hook is registered
as part of fanout_add(). When processing a NETDEV_UNREGISTER event in
af_packet, __fanout_unlink is called for all sockets, but prot_hook which was
registered as part of fanout_add is not removed. Call fanout_release, on a
NETDEV_UNREGISTER, which removes prot_hook and removes fanout from the
fanout_list.

This fixes BUG_ON(!list_empty(&dev->ptype_specific)) in netdev_run_todo()
Signed-off-by: NAnoob Soman <anoob.soman@citrix.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66644982

06 10月, 2016 8 次提交

Bluetooth: Refactor append name and appearance · 1b422066

由 Michał Narajowski 提交于 10月 05, 2016

Use eir_append_data to remove code duplication.
Signed-off-by: NMichał Narajowski <michal.narajowski@codecoup.pl>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

1b422066

Bluetooth: Add appearance to default scan rsp data · 7ddb30c7

由 Michał Narajowski 提交于 10月 05, 2016

Add appearance value to beginning of scan rsp data for
default advertising instance if the value is not 0.
Signed-off-by: NMichał Narajowski <michal.narajowski@codecoup.pl>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

7ddb30c7

Bluetooth: Fix local name in scan rsp · cecbf3e9

由 Michał Narajowski 提交于 10月 05, 2016

Use complete name if it fits. If not and there is short name
check if it fits. If not then use shortened name as prefix
of complete name.
Signed-off-by: NMichał Narajowski <michal.narajowski@codecoup.pl>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

cecbf3e9

rxrpc: Don't request an ACK on the last DATA packet of a call's Tx phase · bf7d620a