Commits · 59331c215daf600a650e281b6e8ef3e1ed1174c2 · openeuler / Kernel

16 Dec, 2016 · 3 commits

Revert "af_unix: fix hard linked sockets on overlay" · beef5121

Committed by Miklos Szeredi on Dec 16, 2016

This reverts commit eb0a4a47.

Since commit 51f7e52d ("ovl: share inode for hard link") there's no
need to call d_real_inode() to check two overlay inodes for equality.

Side effect of this revert is that it's no longer possible to connect one
socket on overlayfs to one on the underlying layer (something which didn't
make sense anyway).

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>

Makefile: drop -D__CHECK_ENDIAN__ from cflags · 6bdf1e0e

Committed by Michael S. Tsirkin on Dec 15, 2016

That's the default now, no need for makefiles to set it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Kalle Valo <kvalo@codeaurora.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>

linux: drop __bitwise__ everywhere · 9efeccac

Committed by Michael S. Tsirkin on Dec 11, 2016

__bitwise__ used to mean "yes, please enable sparse checks
unconditionally", but now that we dropped __CHECK_ENDIAN__
__bitwise is exactly the same.
There aren't many users; replace it with __bitwise everywhere.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Stefan Schmidt <stefan@osg.samsung.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Acked-by: Lee Duncan <lduncan@suse.com>
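As background, here is a minimal sketch of what `__bitwise` does. It is modeled on the kernel's compiler headers but not copied from them. When sparse runs, `__CHECKER__` is defined, and the attribute turns the typedef into a restricted type that cannot be mixed with plain integers without a cast. A normal compiler sees an ordinary integer. The `demo_le32` type and both helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Under sparse (__CHECKER__ defined) the typedef becomes a restricted type;
 * under gcc/clang the annotation expands to nothing. */
#ifdef __CHECKER__
#define __bitwise __attribute__((bitwise))
#else
#define __bitwise
#endif

/* Hypothetical little-endian 32-bit type, in the style of __le32. */
typedef uint32_t __bitwise demo_le32;

static_assert(sizeof(demo_le32) == 4, "demo_le32 must be 32 bits wide");

/* Hypothetical helper: decode byte by byte so it works on any host endianness. */
static uint32_t demo_le32_to_cpu(demo_le32 v)
{
    uint8_t b[4];
    memcpy(b, &v, sizeof(b));
    return (uint32_t)b[0] | (uint32_t)b[1] << 8 |
           (uint32_t)b[2] << 16 | (uint32_t)b[3] << 24;
}

/* Hypothetical inverse: build the little-endian in-memory representation. */
static demo_le32 demo_cpu_to_le32(uint32_t x)
{
    uint8_t b[4] = { (uint8_t)x, (uint8_t)(x >> 8),
                     (uint8_t)(x >> 16), (uint8_t)(x >> 24) };
    demo_le32 v;
    memcpy(&v, b, sizeof(v));
    return v;
}
```

With sparse, assigning a plain `uint32_t` directly to a `demo_le32` would be reported as a type mismatch. gcc compiles the same code without complaint.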

15 Dec, 2016 · 8 commits

vsock/virtio: fix src/dst cid format · f83f12d6

Committed by Michael S. Tsirkin on Dec 06, 2016

These fields are 64-bit, so using le32_to_cpu and friends
on them will not do the right thing.
Fix this up.

Cc: stable@vger.kernel.org
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
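This minimal userspace sketch shows why a 32-bit conversion is wrong for a 64-bit field. It is not the virtio code. The helpers `get_le32()`, `get_le64()` and `put_le64()` are hypothetical stand-ins for `le32_to_cpu()`/`le64_to_cpu()` applied to raw header bytes. Decoding only four bytes of an eight-byte CID silently drops the upper half.

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from raw bytes. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Decode a 64-bit little-endian value: low word first, then high word. */
static uint64_t get_le64(const uint8_t *p)
{
    return (uint64_t)get_le32(p) | (uint64_t)get_le32(p + 4) << 32;
}

/* Encode a 64-bit value as little-endian bytes, as a packet header would carry it. */
static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}
```

For a CID of `0x100000003`, `get_le64()` returns the full value, while `get_le32()` on the same bytes yields only `3`.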

vsock/virtio: mark an internal function static · 819483d8

Committed by Michael S. Tsirkin on Dec 06, 2016

virtio_transport_alloc_pkt is only used locally, make it static.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

vsock/virtio: add a missing __le annotation · 6c7efafd

Committed by Michael S. Tsirkin on Dec 06, 2016

The guest CID is read from config space, so it is in little-endian
format and is treated as such; annotate it accordingly.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

rxrpc: abstract away knowledge of IDR internals · 44430612

Committed by Matthew Wilcox on Dec 14, 2016

Add idr_get_cursor() / idr_set_cursor() APIs, and remove the reference
to IDR_SIZE.

Link: http://lkml.kernel.org/r/1480369871-5271-65-git-send-email-mawilcox@linuxonhyperv.com
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
Reviewed-by: David Howells <dhowells@redhat.com>
Tested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

libceph: remove now unused finish_request() wrapper · 45ee2c1d

Committed by Ilya Dryomov on Dec 02, 2016

Kill the wrapper and rename __finish_request() to finish_request().

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: always signal completion when done · c297eb42

Committed by Ilya Dryomov on Dec 02, 2016

r_safe_completion is currently, and has always been, signaled only if
on-disk ack was requested. It's there for fsync and syncfs, which wait
for in-flight writes to flush - all data write requests set ONDISK.

However, the pool perm check code introduced in 4.2 sends a write
request with only ACK set. An unfortunately timed syncfs can then hang
forever: r_safe_completion won't be signaled because only an unsafe
reply was requested.

We could patch ceph_osdc_sync() to skip !ONDISK write requests, but
that is somewhat incomplete and yet another special case. Instead,
rename this completion to r_done_completion and always signal it when
the OSD client is done with the request, whether unsafe, safe, or
error. This is a bit cleaner and helps with the cancellation code.

Reported-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
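To make the hang concrete, here is a minimal single-threaded model of the old and new policies. It assumes a hypothetical request struct in which a `done` flag stands in for the kernel's completion. Under the old rule, an ACK-only write never signals, so a syncfs-style waiter that checks every in-flight write stays blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified request; "done" models r_*_completion. */
struct demo_osd_request {
    bool ondisk_requested;   /* ONDISK flag set on the write */
    bool done;               /* completion has been signaled */
};

/* Old rule: only requests that asked for an on-disk ack ever signal. */
static void demo_finish_old(struct demo_osd_request *req)
{
    if (req->ondisk_requested)
        req->done = true;
}

/* New rule: signal whenever the client is done with the request,
 * whether the reply was unsafe, safe or an error. */
static void demo_finish_new(struct demo_osd_request *req)
{
    req->done = true;
}

/* A syncfs-style waiter needs every in-flight write to have signaled;
 * returns true if it would still be blocked. */
static bool demo_sync_would_block(const struct demo_osd_request *reqs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!reqs[i].done)
            return true;
    return false;
}
```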

netns: avoid disabling irq for netns id · fba143c6

Committed by Paul Moore on Nov 29, 2016

Bring back commit bc51dddf ("netns: avoid disabling irq for netns
id") now that we've fixed some audit multicast issues that caused
problems with the original attempt.  Additional information, and history,
can be found in the links below:

 * https://github.com/linux-audit/audit-kernel/issues/22
 * https://github.com/linux-audit/audit-kernel/issues/23

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

rds_rdma: log the connection reject message · 39384f04

Committed by Steve Wise on Oct 26, 2016

Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>

13 Dec, 2016 · 16 commits

crush: include mapper.h in mapper.c · f6c0d1a3

Committed by Tobias Klauser on Oct 28, 2016

Include linux/crush/mapper.h in crush/mapper.c to get the prototypes of
crush_find_rule and crush_do_rule which are defined there. This fixes
the following GCC warnings when building with 'W=1':

  net/ceph/crush/mapper.c:40:5: warning: no previous prototype for ‘crush_find_rule’ [-Wmissing-prototypes]
  net/ceph/crush/mapper.c:793:5: warning: no previous prototype for ‘crush_do_rule’ [-Wmissing-prototypes]

Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
[idryomov@gmail.com: corresponding !__KERNEL__ include]
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

libceph: no need to drop con->mutex for ->get_authorizer() · b3bbd3f2

Committed by Ilya Dryomov on Dec 02, 2016

->get_authorizer(), ->verify_authorizer_reply(), ->sign_message() and
->check_message_signature() shouldn't be doing anything with or on the
connection (like closing it or sending messages).

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: drop len argument of *verify_authorizer_reply() · 0dde5848

Committed by Ilya Dryomov on Dec 02, 2016

The length of the reply is protocol-dependent - for cephx it's
ceph_x_authorize_reply.  Nothing sensible can be passed from the
messenger layer anyway.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: verify authorize reply on connect · 5c056fdc

Committed by Ilya Dryomov on Dec 02, 2016

After sending an authorizer (ceph_x_authorize_a + ceph_x_authorize_b),
the client gets back a ceph_x_authorize_reply, which it is supposed to
verify to ensure the authenticity and protect against replay attacks.
The code for doing this is there (ceph_x_verify_authorizer_reply(),
ceph_auth_verify_authorizer_reply() + plumbing), but it is never
invoked by the messenger.
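As we understand cephx, the anti-replay part of this check works as follows. The client puts a nonce in the authorizer, and the server replies, encrypted with the session key, with that nonce incremented by one. Below is a minimal sketch of just the comparison. It assumes a simplified reply layout, leaves out decryption entirely, and uses hypothetical `demo_` names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical decrypted reply, loosely modeled on ceph_x_authorize_reply. */
struct demo_authorize_reply {
    uint8_t struct_v;
    uint64_t nonce_plus_one;
};

/* Accept the reply only if it echoes our nonce + 1; a reply replayed
 * from another session carries a different value. */
static int demo_verify_authorizer_reply(uint64_t sent_nonce,
                                        const struct demo_authorize_reply *reply)
{
    if (reply->nonce_plus_one != sent_nonce + 1)
        return -EPERM;
    return 0;
}
```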

AFAICT this goes back to 2009, when ceph authentication protocols
support was added to the kernel client in 4e7a5dcd ("ceph:
negotiate authentication protocol; implement AUTH_NONE protocol").

The second param of ceph_connection_operations::verify_authorizer_reply
is unused all the way down.  Pass 0 to facilitate backporting, and kill
it in the next commit.

Cc: stable@vger.kernel.org
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: no need for GFP_NOFS in ceph_monc_init() · 5418d0a2

Committed by Ilya Dryomov on Dec 02, 2016

It's called during initial setup, when everything should be allocated
with GFP_KERNEL.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: stop allocating a new cipher on every crypto request · 7af3ea18

Committed by Ilya Dryomov on Dec 02, 2016

This is useless and more importantly not allowed on the writeback path,
because crypto_alloc_skcipher() allocates memory with GFP_KERNEL, which
can recurse back into the filesystem:

    kworker/9:3     D ffff92303f318180     0 20732      2 0x00000080
    Workqueue: ceph-msgr ceph_con_workfn [libceph]
     ffff923035dd4480 ffff923038f8a0c0 0000000000000001 000000009eb27318
     ffff92269eb28000 ffff92269eb27338 ffff923036b145ac ffff923035dd4480
     00000000ffffffff ffff923036b145b0 ffffffff951eb4e1 ffff923036b145a8
    Call Trace:
     [<ffffffff951eb4e1>] ? schedule+0x31/0x80
     [<ffffffff951eb77a>] ? schedule_preempt_disabled+0xa/0x10
     [<ffffffff951ed1f4>] ? __mutex_lock_slowpath+0xb4/0x130
     [<ffffffff951ed28b>] ? mutex_lock+0x1b/0x30
     [<ffffffffc0a974b3>] ? xfs_reclaim_inodes_ag+0x233/0x2d0 [xfs]
     [<ffffffff94d92ba5>] ? move_active_pages_to_lru+0x125/0x270
     [<ffffffff94f2b985>] ? radix_tree_gang_lookup_tag+0xc5/0x1c0
     [<ffffffff94dad0f3>] ? __list_lru_walk_one.isra.3+0x33/0x120
     [<ffffffffc0a98331>] ? xfs_reclaim_inodes_nr+0x31/0x40 [xfs]
     [<ffffffff94e05bfe>] ? super_cache_scan+0x17e/0x190
     [<ffffffff94d919f3>] ? shrink_slab.part.38+0x1e3/0x3d0
     [<ffffffff94d9616a>] ? shrink_node+0x10a/0x320
     [<ffffffff94d96474>] ? do_try_to_free_pages+0xf4/0x350
     [<ffffffff94d967ba>] ? try_to_free_pages+0xea/0x1b0
     [<ffffffff94d863bd>] ? __alloc_pages_nodemask+0x61d/0xe60
     [<ffffffff94ddf42d>] ? cache_grow_begin+0x9d/0x560
     [<ffffffff94ddfb88>] ? fallback_alloc+0x148/0x1c0
     [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
     [<ffffffff94de09db>] ? __kmalloc+0x1eb/0x580
     [<ffffffffc09fe2db>] ? crush_choose_firstn+0x3eb/0x470 [libceph]
     [<ffffffff94ed84e7>] ? __crypto_alloc_tfm+0x37/0x130
     [<ffffffff94ed9c19>] ? crypto_spawn_tfm+0x39/0x60
     [<ffffffffc08b30a3>] ? crypto_cbc_init_tfm+0x23/0x40 [cbc]
     [<ffffffff94ed857c>] ? __crypto_alloc_tfm+0xcc/0x130
     [<ffffffff94edcc23>] ? crypto_skcipher_init_tfm+0x113/0x180
     [<ffffffff94ed7cc3>] ? crypto_create_tfm+0x43/0xb0
     [<ffffffff94ed83b0>] ? crypto_larval_lookup+0x150/0x150
     [<ffffffff94ed7da2>] ? crypto_alloc_tfm+0x72/0x120
     [<ffffffffc0a01dd7>] ? ceph_aes_encrypt2+0x67/0x400 [libceph]
     [<ffffffffc09fd264>] ? ceph_pg_to_up_acting_osds+0x84/0x5b0 [libceph]
     [<ffffffff950d40a0>] ? release_sock+0x40/0x90
     [<ffffffff95139f94>] ? tcp_recvmsg+0x4b4/0xae0
     [<ffffffffc0a02714>] ? ceph_encrypt2+0x54/0xc0 [libceph]
     [<ffffffffc0a02b4d>] ? ceph_x_encrypt+0x5d/0x90 [libceph]
     [<ffffffffc0a02bdf>] ? calcu_signature+0x5f/0x90 [libceph]
     [<ffffffffc0a02ef5>] ? ceph_x_sign_message+0x35/0x50 [libceph]
     [<ffffffffc09e948c>] ? prepare_write_message_footer+0x5c/0xa0 [libceph]
     [<ffffffffc09ecd18>] ? ceph_con_workfn+0x2258/0x2dd0 [libceph]
     [<ffffffffc09e9903>] ? queue_con_delay+0x33/0xd0 [libceph]
     [<ffffffffc09f68ed>] ? __submit_request+0x20d/0x2f0 [libceph]
     [<ffffffffc09f6ef8>] ? ceph_osdc_start_request+0x28/0x30 [libceph]
     [<ffffffffc0b52603>] ? rbd_queue_workfn+0x2f3/0x350 [rbd]
     [<ffffffff94c94ec0>] ? process_one_work+0x160/0x410
     [<ffffffff94c951bd>] ? worker_thread+0x4d/0x480
     [<ffffffff94c95170>] ? process_one_work+0x410/0x410
     [<ffffffff94c9af8d>] ? kthread+0xcd/0xf0
     [<ffffffff951efb2f>] ? ret_from_fork+0x1f/0x40
     [<ffffffff94c9aec0>] ? kthread_create_on_node+0x190/0x190

Allocating the cipher along with the key fixes the issue - as long as the
key doesn't change, a single cipher context can be used concurrently in
multiple requests.
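The shape of the fix can be sketched as follows. The sketch uses hypothetical `demo_` types in place of the kernel's key and skcipher objects, and a plain XOR placeholder in place of AES. The cipher context is created once, in the key's init path. The per-request path only reads it, which is why one context can serve concurrent requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

static int demo_cipher_allocs;   /* counts "expensive" cipher allocations */

/* Hypothetical cipher context; a real one would be a crypto_skcipher. */
struct demo_cipher {
    uint8_t key[16];
};

static struct demo_cipher *demo_alloc_cipher(const uint8_t *key, size_t len)
{
    struct demo_cipher *c = malloc(sizeof(*c));
    if (!c)
        return NULL;
    memset(c->key, 0, sizeof(c->key));
    memcpy(c->key, key, len < sizeof(c->key) ? len : sizeof(c->key));
    demo_cipher_allocs++;
    return c;
}

/* Hypothetical key object: the cipher is set up together with the key. */
struct demo_crypto_key {
    struct demo_cipher *tfm;
};

static int demo_key_init(struct demo_crypto_key *k, const uint8_t *key, size_t len)
{
    k->tfm = demo_alloc_cipher(key, len);
    return k->tfm ? 0 : -1;
}

static void demo_key_destroy(struct demo_crypto_key *k)
{
    free(k->tfm);
    k->tfm = NULL;
}

/* Per-request path: no allocation, the shared context is only read.
 * The XOR transform is a placeholder, NOT AES, and provides no security. */
static void demo_encrypt(const struct demo_crypto_key *k, uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= k->tfm->key[i % sizeof(k->tfm->key)];
}
```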

We still can't take that GFP_KERNEL allocation though.  Both
ceph_crypto_key_clone() and ceph_crypto_key_decode() are called from
GFP_NOFS context, so resort to memalloc_noio_{save,restore}() here.

Reported-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
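memalloc_noio_save()/memalloc_noio_restore() behave like a scope. While the scope is active, the task's allocations are treated as if I/O-backed reclaim were forbidden. The sketch below covers only the nesting logic. It is modeled on how the kernel helpers save and restore a single per-task flag bit. The flag value, the thread-local flag word and the `demo_` names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins for current->flags and PF_MEMALLOC_NOIO. */
#define DEMO_PF_MEMALLOC_NOIO 0x00080000u
static _Thread_local unsigned int demo_task_flags;

/* Enter a NOIO scope: remember whether the bit was already set, then set it. */
static unsigned int demo_memalloc_noio_save(void)
{
    unsigned int old = demo_task_flags & DEMO_PF_MEMALLOC_NOIO;
    demo_task_flags |= DEMO_PF_MEMALLOC_NOIO;
    return old;
}

/* Leave the scope: restore only the saved bit, so nested scopes unwind correctly. */
static void demo_memalloc_noio_restore(unsigned int old)
{
    demo_task_flags = (demo_task_flags & ~DEMO_PF_MEMALLOC_NOIO) | old;
}

/* An allocator would consult this before allowing I/O during reclaim. */
static int demo_noio_active(void)
{
    return (demo_task_flags & DEMO_PF_MEMALLOC_NOIO) != 0;
}
```

Returning the previous bit, instead of clearing it unconditionally on restore, keeps an outer NOIO scope in effect when an inner one ends.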

libceph: uninline ceph_crypto_key_destroy() · 6db2304a

Committed by Ilya Dryomov on Dec 02, 2016

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: remove now unused ceph_*{en,de}crypt*() functions · 2b1e1a7c

Committed by Ilya Dryomov on Dec 02, 2016

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: switch ceph_x_decrypt() to ceph_crypt() · e15fd0a1

Committed by Ilya Dryomov on Dec 02, 2016

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: switch ceph_x_encrypt() to ceph_crypt() · d03857c6

Committed by Ilya Dryomov on Dec 02, 2016

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: tweak calcu_signature() a little · 4eb4517c

Committed by Ilya Dryomov on Dec 02, 2016

- replace an ad-hoc array with a struct
- rename to calc_signature() for consistency

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: rename and align ceph_x_authorizer::reply_buf · 7882a26d

Committed by Ilya Dryomov on Dec 02, 2016

It's going to be used as a temporary buffer for in-place en/decryption
with ceph_crypt() instead of on-stack buffers, so rename to enc_buf.
Ensure alignment to avoid GFP_ATOMIC allocations in the crypto stack.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: introduce ceph_crypt() for in-place en/decryption · a45f795c

Committed by Ilya Dryomov on Dec 02, 2016

Starting with 4.9, kernel stacks may be vmalloced and therefore not
guaranteed to be physically contiguous; the new CONFIG_VMAP_STACK
option is enabled by default on x86. This makes it invalid to use
on-stack buffers with the crypto scatterlist API, as sg_set_buf()
expects a logical address and won't work with vmalloced addresses.

There isn't a different (e.g. kvec-based) crypto API we could switch
net/ceph/crypto.c to and the current scatterlist.h API isn't getting
updated to accommodate this use case. Allocating a new header and
padding for each operation is a non-starter, so do the en/decryption
in-place on a single pre-assembled (header + data + padding) heap
buffer. This is explicitly supported by the crypto API:

"... the caller may provide the same scatter/gather list for the
plaintext and cipher text. After the completion of the cipher
operation, the plaintext data is replaced with the ciphertext data
in case of an encryption and vice versa for a decryption."

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
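Here is a minimal sketch of in-place encryption over one pre-assembled heap buffer (header + data + padding). Plaintext and ciphertext share the same memory, as the quoted crypto API text allows. It assumes PKCS#7-style padding, where each pad byte holds the pad length, which is our understanding of what the AES path uses. The transform is an XOR placeholder, not AES, and the `demo_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLOCK 16   /* AES block size */

/* Placeholder transform, NOT AES: shows only that input and output share one buffer. */
static void demo_xor_inplace(uint8_t *buf, size_t len, uint8_t k)
{
    for (size_t i = 0; i < len; i++)
        buf[i] ^= k;
}

/* Pad and encrypt in place.  buf holds in_len bytes of plaintext and must have
 * room for up to DEMO_BLOCK extra bytes; a full block of padding is added when
 * in_len is already aligned.  Returns the ciphertext length. */
static size_t demo_encrypt_inplace(uint8_t *buf, size_t buf_size, size_t in_len, uint8_t k)
{
    size_t pad = DEMO_BLOCK - (in_len % DEMO_BLOCK);

    assert(in_len + pad <= buf_size);
    memset(buf + in_len, (int)pad, pad);
    demo_xor_inplace(buf, in_len + pad, k);
    return in_len + pad;
}

/* Decrypt in place and strip the padding; returns plaintext length or -1. */
static long demo_decrypt_inplace(uint8_t *buf, size_t len, uint8_t k)
{
    if (len == 0 || len % DEMO_BLOCK)
        return -1;
    demo_xor_inplace(buf, len, k);
    uint8_t pad = buf[len - 1];
    if (pad == 0 || pad > DEMO_BLOCK)
        return -1;
    return (long)(len - pad);
}
```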

libceph: introduce ceph_x_encrypt_offset() · 55d9cc83

Committed by Ilya Dryomov on Dec 02, 2016

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: old_key in process_one_ticket() is redundant · 462e6504

Committed by Ilya Dryomov on Dec 02, 2016

Since commit 0a990e70 ("ceph: clean up service ticket decoding"),
th->session_key isn't assigned until everything is decoded.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

libceph: ceph_x_encrypt_buflen() takes in_len · 36721ece

Committed by Ilya Dryomov on Dec 02, 2016

Pass what's going to be encrypted - that's msg_b, not ticket_blob.
ceph_x_encrypt_buflen() returns the upper bound, so this doesn't change
the maxlen calculation, but makes it a bit clearer.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>

11 Dec, 2016 · 8 commits

netfilter: nft_counter: rework atomic dump and reset · d84701ec

Committed by Pablo Neira on Dec 11, 2016

Dump and reset doesn't work unless cmpxchg64() is used both from packet
and control plane paths. This approach is going to be slow though.
Instead, use a percpu seqcount to fetch counters consistently, then
subtract bytes and packets in case a reset was requested.

The CPU running the reset code is guaranteed to own its stats
exclusively; we have to turn the counters into signed 64-bit values, though,
so that stats updated on reset don't go wrong on underflow.
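The read side and the reset can be sketched in userspace C11. The sketch assumes hypothetical `demo_` names, a fixed CPU count, and the seqlock pattern commonly used with C11 atomics: relaxed data accesses bracketed by acquire/release operations on the sequence word. It is a model of the idea, not the nft_counter code. The reset subtracts the fetched totals from the resetting CPU's own slot, so that slot can go negative, which is why the counters are signed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define DEMO_NR_CPUS 4

/* Hypothetical per-CPU slot; signed so a reset may drive it below zero. */
struct demo_counter {
    atomic_uint seq;           /* even: stable, odd: update in progress */
    _Atomic int64_t bytes;
    _Atomic int64_t packets;
};

static struct demo_counter demo_pcpu[DEMO_NR_CPUS];

/* Writer: only the owning CPU updates its slot (seqcount write section). */
static void demo_add(int cpu, int64_t bytes, int64_t packets)
{
    struct demo_counter *c = &demo_pcpu[cpu];
    unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);

    atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->bytes,
        atomic_load_explicit(&c->bytes, memory_order_relaxed) + bytes,
        memory_order_relaxed);
    atomic_store_explicit(&c->packets,
        atomic_load_explicit(&c->packets, memory_order_relaxed) + packets,
        memory_order_relaxed);
    atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

/* Reader: retry each slot until it was read with an even, unchanged sequence. */
static void demo_fetch(int64_t *bytes, int64_t *packets)
{
    *bytes = 0;
    *packets = 0;
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++) {
        struct demo_counter *c = &demo_pcpu[cpu];
        unsigned int s1, s2;
        int64_t b, p;

        do {
            s1 = atomic_load_explicit(&c->seq, memory_order_acquire);
            b = atomic_load_explicit(&c->bytes, memory_order_relaxed);
            p = atomic_load_explicit(&c->packets, memory_order_relaxed);
            atomic_thread_fence(memory_order_acquire);
            s2 = atomic_load_explicit(&c->seq, memory_order_relaxed);
        } while ((s1 & 1) || s1 != s2);
        *bytes += b;
        *packets += p;
    }
}

/* Dump and reset: fetch consistent totals, then subtract them from the
 * slot of the CPU doing the reset, which it owns exclusively. */
static void demo_dump_and_reset(int this_cpu, int64_t *bytes, int64_t *packets)
{
    demo_fetch(bytes, packets);
    demo_add(this_cpu, -*bytes, -*packets);
}
```

After a reset, the sum across all CPUs counts only the traffic that arrived since the fetch, even though one slot now holds a negative value.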

This patch is based on original sketch from Eric Dumazet.

Fixes: 43da04a5 ("netfilter: nf_tables: atomic dump and reset for stateful objects")
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l2tp: ppp: change PPPOL2TP_MSG_* => L2TP_MSG_* · fba40c63

Committed by Asbjørn Sloth Tønnesen on Dec 11, 2016

Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: l2tp: export debug flags to UAPI · 41c43fbe

Committed by Asbjørn Sloth Tønnesen on Dec 11, 2016

Move the L2TP_MSG_* definitions to UAPI, as they are part of
the netlink API.

Signed-off-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: shorten ageing time on topology change · 34d8acd8

Committed by Vivien Didelot on Dec 10, 2016

802.1D [1] specifies that the bridges must use a short value to age out
dynamic entries in the Filtering Database for a period, once a topology
change has been communicated by the root bridge.

Add a bridge_ageing_time member in the net_bridge structure to store the
bridge ageing time value configured by the user (ioctl/netlink/sysfs).

If we are using in-kernel STP, shorten the ageing time value to twice
the forward delay used by the topology when the topology change flag is
set. When the flag is cleared, restore the configured ageing time.
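Here is a minimal sketch of the rule described above. It assumes a hypothetical, heavily simplified subset of the bridge state, with times in arbitrary clock ticks and `demo_` names. It is not the net_bridge code. The user-configured value is kept separately, so clearing the flag can restore it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified subset of the bridge state (times in ticks). */
struct demo_bridge {
    unsigned long bridge_ageing_time;  /* value configured by the user */
    unsigned long ageing_time;         /* value currently in effect */
    unsigned long forward_delay;
    bool kernel_stp;                   /* in-kernel STP in use */
    bool topology_change;
};

/* Set or clear the topology change flag; with in-kernel STP, age out entries
 * at twice the forward delay while it is set and restore the configured
 * ageing time once it is cleared. */
static void demo_set_topology_change(struct demo_bridge *br, bool val)
{
    if (br->kernel_stp && br->topology_change != val) {
        if (val)
            br->ageing_time = 2 * br->forward_delay;
        else
            br->ageing_time = br->bridge_ageing_time;
    }
    br->topology_change = val;
}
```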

[1] "8.3.5 Notifying topology changes",
    http://profesores.elo.utfsm.cl/~agv/elo309/doc/802.1D-1998.pdf

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: add helper to set topology change · 8384b5f5

Committed by Vivien Didelot on Dec 10, 2016

Add a __br_set_topology_change helper to set the topology change value.

This can be later extended to add actions when the topology change flag
is set or cleared.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: add helper to offload ageing time · 82dd4332

Committed by Vivien Didelot on Dec 10, 2016

The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME switchdev attr is actually set
when initializing a bridge port, and when configuring the bridge ageing
time from ioctl/netlink/sysfs.

Add a __set_ageing_time helper to offload the ageing time to physical
switches, and add the SWITCHDEV_F_DEFER flag since it can be called
under bridge lock.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: socket: removed an unnecessary newline · fa1bd57a

Committed by Amit Kushwaha on Dec 10, 2016

This patch removes a newline which was added
in the socket.c file in net-next.

Signed-off-by: Amit Kushwaha <kushwaha.a@samsung.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netlink: use blocking notifier · efa172f4

Committed by WANG Cong on Dec 09, 2016

netlink_chain is called in ->release(), which is apparently
a process context, so we don't have to use an atomic notifier
here.

Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10 Dec, 2016 · 5 commits

SUNRPC: fix refcounting problems with auth_gss messages. · 1cded9d2

Committed by NeilBrown on Dec 05, 2016

There are two problems with refcounting of auth_gss messages.

First, the reference on the pipe->pipe list (taken by a call
to rpc_queue_upcall()) is not counted.  It seems to be
assumed that a message in pipe->pipe will always also be in
pipe->in_downcall, where it is correctly reference counted.

However, there is no guarantee of this.  I have a report of a
NULL dereference in rpc_pipe_read() which suggests a msg
that has been freed is still on the pipe->pipe list.

One way I imagine this might happen is:
- message is queued for uid=U and auth->service=S1
- rpc.gssd reads this message and starts processing.
  This removes the message from pipe->pipe
- message is queued for uid=U and auth->service=S2
- rpc.gssd replies to the first message. gss_pipe_downcall()
  calls __gss_find_upcall(pipe, U, NULL) and it finds the
  *second* message, as new messages are placed at the head
  of ->in_downcall, and the service type is not checked.
- This second message is removed from ->in_downcall and freed
  by gss_release_msg() (even though it is still on pipe->pipe)
- rpc.gssd tries to read another message, and dereferences a pointer
  to this message that has just been freed.

I fix this by incrementing the reference count before calling
rpc_queue_upcall(), and decrementing it if that fails, or normally in
gss_pipe_destroy_msg().

It seems strange that the reply doesn't target the message more
precisely, but I don't know all the details.  In any case, I think the
reference counting irregularity became a measurable bug when the
extra arg was added to __gss_find_upcall(), hence the Fixes: line
below.

The second problem is that if rpc_queue_upcall() fails, the new
message is not freed. gss_alloc_msg() set the ->count to 1,
gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
then the pointer is discarded so the memory never gets freed.

Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
Cc: stable@vger.kernel.org
Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
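The reference-count bookkeeping described above can be modeled in a few lines. The model assumes a plain `int` count and hypothetical `demo_` helpers, not the real gss_upcall_msg code. Each list that holds the message owns one reference, and so does the caller. A failed queueing attempt gives back the pipe reference at once, and the caller must then drop both the hash-list reference and its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static int demo_frees;   /* counts messages actually freed */

/* Hypothetical upcall message with a plain reference count. */
struct demo_msg {
    int count;
};

static struct demo_msg *demo_alloc_msg(void)
{
    struct demo_msg *m = malloc(sizeof(*m));
    if (m)
        m->count = 1;            /* reference owned by the caller */
    return m;
}

static void demo_get(struct demo_msg *m)
{
    m->count++;
}

static void demo_put(struct demo_msg *m)
{
    if (--m->count == 0) {
        free(m);
        demo_frees++;
    }
}

/* Fixed scheme: take a reference for the pipe list before queueing and drop
 * it if queueing fails; on success it is dropped later by the pipe's
 * destroy callback. */
static int demo_queue_upcall(struct demo_msg *m, bool pipe_ok)
{
    demo_get(m);
    if (!pipe_ok) {
        demo_put(m);
        return -1;
    }
    return 0;
}
```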

net: skb_condense() can also deal with empty skbs · 3174fed9

Committed by Eric Dumazet on Dec 09, 2016

It seems attackers can also send UDP packets with no payload at all.

skb_condense() can still be a win in this case.

It will be possible to replace the custom code in tcp_add_backlog()
to get full benefit from skb_condense().

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: udp_rmem_release() should touch sk_rmem_alloc later · 02ab0d13

Committed by Eric Dumazet on Dec 08, 2016

In flood situations, keeping sk_rmem_alloc at a high value
prevents producers from touching the socket.

It makes sense to lower sk_rmem_alloc only at the end
of udp_rmem_release() after the thread draining receive
queue in udp_recvmsg() finished the writes to sk_forward_alloc.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: add batching to udp_rmem_release() · 6b229cf7

Committed by Eric Dumazet on Dec 08, 2016

If udp_recvmsg() constantly releases sk_rmem_alloc
for every read packet, it gives opportunity for
producers to immediately grab spinlocks and desperately
try adding another packet, causing false sharing.

We can add a simple heuristic to give the signal
by batches of ~25 % of the queue capacity.
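The batching heuristic can be sketched as follows. The sketch assumes hypothetical, simplified socket fields and `demo_` names, and is modeled loosely on the idea rather than on udp_rmem_release() itself. Freed memory accumulates in a deficit and is returned to the shared counter only once it reaches about a quarter of the receive buffer, or when the queue has drained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified socket accounting. */
struct demo_sock {
    int rcvbuf;       /* receive buffer capacity (like sk_rcvbuf) */
    int rmem_alloc;   /* memory charged to the queue (like sk_rmem_alloc) */
    int deficit;      /* freed by the reader but not yet released */
};

/* Called after each datagram is consumed: release in batches of ~25 % of
 * rcvbuf, or immediately once the receive queue is empty. */
static void demo_rmem_release(struct demo_sock *sk, int size, bool queue_empty)
{
    sk->deficit += size;
    if (sk->deficit < sk->rcvbuf / 4 && !queue_empty)
        return;
    sk->rmem_alloc -= sk->deficit;
    sk->deficit = 0;
}
```

With a 4000-byte buffer, releases of 500 and 400 bytes leave the shared counter untouched. A further 200 bytes pushes the deficit past 1000 and releases all 1100 bytes at once.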

This patch considerably increases performance under
flood by about 50 %, since the thread draining the queue
is no longer slowed by false sharing.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: copy skb->truesize in the first cache line · c84d9490

Committed by Eric Dumazet on Dec 08, 2016

In the UDP RX handler, we currently clear skb->dev before the skb
is added to the receive queue, because the device pointer is no longer
available once we exit from the RCU section.

Since this first cache line is always hot, let's reuse this space
to store skb->truesize and thus avoid a cache line miss at
udp_recvmsg()/udp_skb_destructor time while receive queue
spinlock is held.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openeuler / Kernel: last synced successfully nearly 2 years ago
