Commits · ceba1832b1b2da0149c51de62a847c00bca1677a · openeuler / Kernel

28 Jun, 2016 10 commits

calipso: Set the calipso socket label to match the secattr. · ceba1832

Committed by Huw Davies on Jun 27, 2016

CALIPSO is a hop-by-hop IPv6 option.  A lot of this patch is based on
the equivalent CIPSO code.  The main difference is due to manipulating
the options in the hop-by-hop header.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

ceba1832

netlabel: Move bitmap manipulation functions to the NetLabel core. · 3faa8f98

Committed by Huw Davies on Jun 27, 2016

This is to allow the CALIPSO labelling engine to use these.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

3faa8f98

ipv6: Add ipv6_renew_options_kern() that accepts a kernel mem pointer. · e67ae213

Committed by Huw Davies on Jun 27, 2016

The functionality is equivalent to ipv6_renew_options() except
that the newopt pointer is in kernel, not user, memory.

The kernel memory implementation will be used by the CALIPSO network
labelling engine, which needs to be able to set IPv6 hop-by-hop
options.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

e67ae213

netlabel: Add support for removing a CALIPSO DOI. · d7cce015

Committed by Huw Davies on Jun 27, 2016

Remove a specified DOI through the NLBL_CALIPSO_C_REMOVE command.
It requires the attribute:
 NLBL_CALIPSO_A_DOI.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

d7cce015

netlabel: Add support for creating a CALIPSO protocol domain mapping. · dc7de73f

Committed by Huw Davies on Jun 27, 2016

This extends the NLBL_MGMT_C_ADD and NLBL_MGMT_C_ADDDEF commands
to accept CALIPSO protocol DOIs.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

dc7de73f

netlabel: Add support for enumerating the CALIPSO DOI list. · e1ce69df

Committed by Huw Davies on Jun 27, 2016

Enumerate the DOI list through the NLBL_CALIPSO_C_LISTALL command.
It takes no attributes.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

e1ce69df

netlabel: Add support for querying a CALIPSO DOI. · a5e34490

Committed by Huw Davies on Jun 27, 2016

Query a specified DOI through the NLBL_CALIPSO_C_LIST command.
It requires the attribute:
 NLBL_CALIPSO_A_DOI.

The reply will contain:
 NLBL_CALIPSO_A_MTYPE
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

a5e34490

netlabel: Initial support for the CALIPSO netlink protocol. · cb72d382

Committed by Huw Davies on Jun 27, 2016

CALIPSO is a packet labelling protocol for IPv6 which is very similar
to CIPSO.  It is specified in RFC 5570.  Much of the code is based on
the current CIPSO code.

This adds support for adding passthrough-type CALIPSO DOIs through the
NLBL_CALIPSO_C_ADD command.  It requires attributes:

 NLBL_CALIPSO_A_TYPE which must be CALIPSO_MAP_PASS.
 NLBL_CALIPSO_A_DOI.

In passthrough mode the CALIPSO engine will map MLS secattr levels
and categories directly to the packet label.

At this stage, the major difference between this and the CIPSO
code is that IPv6 may be compiled as a module.  To allow for
this the CALIPSO functions are registered at module init time.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

cb72d382

netlabel: Add an address family to domain hash entries. · 8f18e675

Committed by Huw Davies on Jun 27, 2016

The reason is to allow different labelling protocols for
different address families with the same domain.

This requires the addition of an address family attribute
in the netlink communication protocol.  It is used in several
messages:

NLBL_MGMT_C_ADD and NLBL_MGMT_C_ADDDEF take it as an optional
attribute for the unlabelled protocol.  It may be one of AF_INET,
AF_INET6 or AF_UNSPEC (to specify both address families).  If it
is missing, it defaults to AF_UNSPEC.

NLBL_MGMT_C_LISTALL and NLBL_MGMT_C_LISTDEF return it as part of
the enumeration of each item.  Additionally, it may be sent to
LISTDEF to specify which address family to return.
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

8f18e675

netlabel: Mark rcu pointers with __rcu. · 96a8f7f8

Committed by Huw Davies on Jun 27, 2016

This fixes sparse errors of the form:
  incompatible types in comparison expression (different address spaces)
Signed-off-by: Huw Davies <huw@codeweavers.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

96a8f7f8

09 Jun, 2016 1 commit

netlabel: handle sparse category maps in netlbl_catmap_getlong() · 50b8629a

Committed by Paul Moore on Jun 09, 2016

In cases where the category bitmap is sparse enough that gaps exist
between netlbl_lsm_catmap structs, callers to netlbl_catmap_getlong()
could find themselves prematurely ending their search through the
category bitmap. Further, the methods used to calculate the 'idx'
and 'off' values were incorrect for bitmaps this large. This patch
changes the netlbl_catmap_getlong() behavior so that it always skips
over gaps and calculates the index and offset values correctly.
Signed-off-by: Paul Moore <paul@paul-moore.com>

50b8629a

07 Jun, 2016 2 commits

iucv: properly clone LSM attributes to newly created child sockets · 02f06918

Committed by Paul Moore on Jun 07, 2016

Much like we had to do for AF_BLUETOOTH and AF_ALG, make sure we
properly clone the parent socket's LSM attributes to newly created
child sockets.
Signed-off-by: Paul Moore <paul@paul-moore.com>

02f06918

netlabel: add address family checks to netlbl_{sock,req}_delattr() · 0e0e3677

Committed by Paul Moore on Jun 06, 2016

It seems risky to always rely on the caller to ensure the socket's
address family is correct before passing it to the NetLabel kAPI,
especially since we see at least one LSM which didn't. Add address
family checks to the *_delattr() functions to help prevent future
problems.

Cc: <stable@vger.kernel.org>
Reported-by: Maninder Singh <maninder1.s@samsung.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>

0e0e3677

12 Apr, 2016 1 commit

KEYS: Add a facility to restrict new links into a keyring · 5ac7eace

Committed by David Howells on Apr 06, 2016

Add a facility whereby proposed new links to be added to a keyring can be
vetted, permitting them to be rejected if necessary.  This can be used to
block public keys from which the signature cannot be verified or for which
the signature verification fails.  It could also be used to provide
blacklisting.

This affects operations like add_key(), KEYCTL_LINK and KEYCTL_INSTANTIATE.

To this end:

 (1) A function pointer is added to the key struct that, if set, points to
     the vetting function.  This is called as:

	int (*restrict_link)(struct key *keyring,
			     const struct key_type *key_type,
			     unsigned long key_flags,
			     const union key_payload *key_payload),

     where 'keyring' will be the keyring being added to, key_type and
     key_payload will describe the key being added and key_flags[*] can be
     AND'ed with KEY_FLAG_TRUSTED.

     [*] This parameter will be removed in a later patch when
     	 KEY_FLAG_TRUSTED is removed.

     The function should return 0 to allow the link to take place or an
     error (typically -ENOKEY, -ENOPKG or -EKEYREJECTED) to reject the
     link.

     The pointer should not be set directly, but rather should be set
     through keyring_alloc().

     Note that if called during add_key(), preparse is called before this
     method, but a key isn't actually allocated until after this function
     is called.

 (2) KEY_ALLOC_BYPASS_RESTRICTION is added.  This can be passed to
     key_create_or_update() or key_instantiate_and_link() to bypass the
     restriction check.

 (3) KEY_FLAG_TRUSTED_ONLY is removed.  The entire contents of a keyring
     with this restriction emplaced can be considered 'trustworthy' by
     virtue of being in the keyring when that keyring is consulted.

 (4) key_alloc() and keyring_alloc() take an extra argument that will be
     used to set restrict_link in the new key.  This ensures that the
     pointer is set before the key is published, thus preventing a window
     of unrestrictedness.  Normally this argument will be NULL.

 (5) As a temporary affair, keyring_restrict_trusted_only() is added.  It
     should be passed to keyring_alloc() as the extra argument instead of
     setting KEY_FLAG_TRUSTED_ONLY on a keyring.  This will be replaced in
     a later patch with functions that look in the appropriate places for
     authoritative keys.
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Mimi Zohar <zohar@linux.vnet.ibm.com>

5ac7eace
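
The restrict_link hook described in the KEYS commit above can be pictured with a small user-space sketch. Everything here is a hypothetical stand-in: the struct definitions are stubs, the demo_* names do not exist in the kernel, and the flag bit is illustrative only. It shows the general shape of a vetting callback that returns 0 to allow a link or a negative error to reject it, and of a caller that may bypass the check.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EKEYREJECTED
#define EKEYREJECTED 129	/* Linux value, defined here only for portability */
#endif

/* Hypothetical stand-ins for the kernel types; only the shape matters. */
struct key;
struct key_type { const char *name; };
union key_payload { void *data[4]; };

typedef int (*demo_restrict_link_fn)(struct key *keyring,
				     const struct key_type *key_type,
				     unsigned long key_flags,
				     const union key_payload *key_payload);

struct key {
	demo_restrict_link_fn restrict_link;	/* NULL: no restriction */
};

#define DEMO_KEY_FLAG_TRUSTED (1UL << 0)	/* illustrative bit, not the kernel value */

/* Vetting function: accept only keys already flagged as trusted. */
static int demo_restrict_trusted_only(struct key *keyring,
				      const struct key_type *key_type,
				      unsigned long key_flags,
				      const union key_payload *key_payload)
{
	(void)keyring;
	(void)key_type;
	(void)key_payload;
	return (key_flags & DEMO_KEY_FLAG_TRUSTED) ? 0 : -EKEYREJECTED;
}

/* Check that a link operation might run before the link is made. */
static int demo_check_link(struct key *keyring, const struct key_type *type,
			   unsigned long flags,
			   const union key_payload *payload,
			   int bypass_restriction)
{
	if (bypass_restriction || !keyring->restrict_link)
		return 0;
	return keyring->restrict_link(keyring, type, flags, payload);
}
```

The bypass_restriction argument plays the role that KEY_ALLOC_BYPASS_RESTRICTION plays in the commit message; how the kernel actually threads that flag through is not modelled here.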

06 Apr, 2016 1 commit

netlabel: fix a problem with netlbl_secattr_catmap_setrng() · 341e0cb5

Committed by Janak Desai on Mar 28, 2016

We try to be clever and set large chunks of the bitmap at once, when
possible; unfortunately we weren't very clever when we wrote the code
and messed up the if-conditional.  Fix this bug and restore proper
operation.
Signed-off-by: Janak Desai <Janak.Desai@gtri.gatech.edu>
Signed-off-by: Paul Moore <paul@paul-moore.com>

341e0cb5

02 Apr, 2016 1 commit

tun, bpf: fix suspicious RCU usage in tun_{attach, detach}_filter · 5a5abb1f

Committed by Daniel Borkmann on Mar 31, 2016

Sasha Levin reported a suspicious rcu_dereference_protected() warning
found while fuzzing with trinity that is similar to this one:

  [   52.765684] net/core/filter.c:2262 suspicious rcu_dereference_protected() usage!
  [   52.765688] other info that might help us debug this:
  [   52.765695] rcu_scheduler_active = 1, debug_locks = 1
  [   52.765701] 1 lock held by a.out/1525:
  [   52.765704]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff816a64b7>] rtnl_lock+0x17/0x20
  [   52.765721] stack backtrace:
  [   52.765728] CPU: 1 PID: 1525 Comm: a.out Not tainted 4.5.0+ #264
  [...]
  [   52.765768] Call Trace:
  [   52.765775]  [<ffffffff813e488d>] dump_stack+0x85/0xc8
  [   52.765784]  [<ffffffff810f2fa5>] lockdep_rcu_suspicious+0xd5/0x110
  [   52.765792]  [<ffffffff816afdc2>] sk_detach_filter+0x82/0x90
  [   52.765801]  [<ffffffffa0883425>] tun_detach_filter+0x35/0x90 [tun]
  [   52.765810]  [<ffffffffa0884ed4>] __tun_chr_ioctl+0x354/0x1130 [tun]
  [   52.765818]  [<ffffffff8136fed0>] ? selinux_file_ioctl+0x130/0x210
  [   52.765827]  [<ffffffffa0885ce3>] tun_chr_ioctl+0x13/0x20 [tun]
  [   52.765834]  [<ffffffff81260ea6>] do_vfs_ioctl+0x96/0x690
  [   52.765843]  [<ffffffff81364af3>] ? security_file_ioctl+0x43/0x60
  [   52.765850]  [<ffffffff81261519>] SyS_ioctl+0x79/0x90
  [   52.765858]  [<ffffffff81003ba2>] do_syscall_64+0x62/0x140
  [   52.765866]  [<ffffffff817d563f>] entry_SYSCALL64_slow_path+0x25/0x25

Same can be triggered with PROVE_RCU (+ PROVE_RCU_REPEATEDLY) enabled
from tun_attach_filter() when user space calls ioctl(tun_fd, TUN{ATTACH,
DETACH}FILTER, ...) for adding/removing a BPF filter on tap devices.

Since the fix in f91ff5b9 ("net: sk_{detach|attach}_filter() rcu
fixes") sk_attach_filter()/sk_detach_filter() now dereferences the
filter with rcu_dereference_protected(), checking whether socket lock
is held in control path.

Since its introduction in 99405162 ("tun: socket filter support"),
tap filters are managed under RTNL lock from __tun_chr_ioctl(). Thus the
sock_owned_by_user(sk) doesn't apply in this specific case and therefore
triggers the false positive.

Extend the BPF API with __sk_attach_filter()/__sk_detach_filter() pair
that is used by tap filters and pass in lockdep_rtnl_is_held() for the
rcu_dereference_protected() checks instead.
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

5a5abb1f

01 Apr, 2016 1 commit

rtnl: fix msg size calculation in if_nlmsg_size() · c57c7a95

Committed by Nicolas Dichtel on Mar 31, 2016

Size of the attribute IFLA_PHYS_PORT_NAME was missing.

Fixes: db24a904 ("net: add support for phys_port_name")
CC: David Ahern <dsahern@gmail.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c57c7a95

31 Mar, 2016 5 commits

bpf: make padding in bpf_tunnel_key explicit · c0e760c9

Committed by Daniel Borkmann on Mar 30, 2016

Make the 2 byte padding in struct bpf_tunnel_key between tunnel_ttl
and tunnel_label members explicit. No issue has been observed, and
gcc/llvm does padding for the old struct already, where tunnel_label
was not yet present, so the current code works, but since it's part
of uapi, make sure we don't introduce holes in structs.

Therefore, add tunnel_ext that we can use generically in future
(f.e. to flag OAM messages for backends, etc). Also add the offset
to the compat tests to be sure should some compilers not pad the
tail of the old version of bpf_tunnel_key.

Fixes: 4018ab18 ("bpf: support flow label for bpf_skb_{set, get}_tunnel_key")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0e760c9
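
To illustrate the padding point in the bpf_tunnel_key commit above, here is a minimal sketch with a simplified, hypothetical struct (the real bpf_tunnel_key has more members). It assumes a common ABI where a 32-bit integer is 4-byte aligned, so the compiler leaves a 2-byte hole after two 1-byte members; naming that hole keeps offsets and size unchanged.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical layout: only the tail of the struct is modelled. */
struct demo_tunnel_key_implicit {
	uint32_t tunnel_id;
	uint8_t  tunnel_tos;
	uint8_t  tunnel_ttl;
	/* the compiler inserts 2 bytes of padding here */
	uint32_t tunnel_label;
};

struct demo_tunnel_key_explicit {
	uint32_t tunnel_id;
	uint8_t  tunnel_tos;
	uint8_t  tunnel_ttl;
	uint16_t tunnel_ext;	/* names the former hole */
	uint32_t tunnel_label;
};

/* Compile-time checks that both layouts agree (assumes 4-byte aligned uint32_t). */
_Static_assert(offsetof(struct demo_tunnel_key_explicit, tunnel_ext) == 6,
	       "tunnel_ext should sit exactly where the hole was");
_Static_assert(offsetof(struct demo_tunnel_key_explicit, tunnel_label) ==
	       offsetof(struct demo_tunnel_key_implicit, tunnel_label),
	       "tunnel_label must not move");
_Static_assert(sizeof(struct demo_tunnel_key_explicit) ==
	       sizeof(struct demo_tunnel_key_implicit),
	       "struct size must not change");
```

Checks of this kind are one way to catch a compiler that lays out the old and new versions differently, which is the concern the commit message raises about the compat tests.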

ipv6: udp: fix UDP_MIB_IGNOREDMULTI updates · 2d421226

Committed by Eric Dumazet on Mar 29, 2016

IPv6 counters updates use a different macro than IPv4.

Fixes: 36cbb245 ("udp: Increment UDP_MIB_IGNOREDMULTI for arriving unmatched multicasts")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2d421226

gro: Allow tunnel stacking in the case of FOU/GUE · c3483384

Committed by Alexander Duyck on Mar 29, 2016

This patch should fix the issues seen with a recent fix to prevent
tunnel-in-tunnel frames from being generated with GRO. The fix itself is
correct for now as long as we do not add any devices that support
NETIF_F_GSO_GRE_CSUM. When such a device is added it could have the
potential to mess things up due to the fact that the outer transport header
points to the outer UDP header and not the GRE header as would be expected.

Fixes: fac8e0f5 ("tunnels: Don't apply GRO to multiple layers of encapsulation.")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c3483384

sctp: really allow using GFP_KERNEL on sctp_packet_transmit · 28fd3498

Committed by Marcelo Ricardo Leitner on Mar 29, 2016

Somehow my patch for commit cea8768f ("sctp: allow
sctp_transmit_packet and others to use gfp") missed two important
chunks, which are now added.

Fixes: cea8768f ("sctp: allow sctp_transmit_packet and others to use gfp")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-By: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28fd3498

bridge: Allow set bridge ageing time when switchdev disabled · 5e263f71

Committed by Haishuang Yan on Mar 29, 2016

When NET_SWITCHDEV=n, switchdev_port_attr_set will return -EOPNOTSUPP,
we should ignore this error code and continue to set the ageing time.

Fixes: c62987bb ("bridge: push bridge setting ageing_time down to switchdev")
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Acked-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e263f71
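
The error handling described in the bridge commit above can be sketched as follows. The demo_* names and the callback-based offload hook are assumptions made for illustration; the kernel code calls switchdev_port_attr_set() directly. The idea shown is to treat -EOPNOTSUPP from the offload layer as "nothing to offload" and still apply the software setting, while propagating any other error.

```c
#include <errno.h>

/* Hypothetical bridge state; the real struct net_bridge is far larger. */
struct demo_bridge {
	unsigned long ageing_time;
};

/* Stand-in for the offload call when switchdev support is compiled out. */
static int demo_hw_unsupported(unsigned long t)
{
	(void)t;
	return -EOPNOTSUPP;
}

/* Stand-in for an offload call that fails for a real reason. */
static int demo_hw_error(unsigned long t)
{
	(void)t;
	return -EIO;
}

/* Apply the ageing time, ignoring only the "not supported" offload error. */
static int demo_set_ageing_time(struct demo_bridge *br, unsigned long t,
				int (*hw_set)(unsigned long))
{
	int err = hw_set(t);

	if (err && err != -EOPNOTSUPP)
		return err;

	br->ageing_time = t;
	return 0;
}
```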

28 Mar, 2016 11 commits

netfilter: ipv4: fix NULL dereference · 29421198

Committed by Liping Zhang on Mar 26, 2016

Commit fa50d974 ("ipv4: Namespaceify ip_default_ttl sysctl knob")
uses sock_net(skb->sk) to get the net namespace, but we can't assume
that sk_buff->sk always exists, so when it is NULL, an oops will happen.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Reviewed-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

29421198

netfilter: x_tables: enforce nul-terminated table name from getsockopt GET_ENTRIES · b301f253

Committed by Pablo Neira Ayuso on Mar 24, 2016

Make sure the table name passed via getsockopt GET_ENTRIES is
nul-terminated in ebtables and all the x_tables variants and their
respective compat code. Uncovered by KASAN.
Reported-by: Baozeng Ding <sploving1@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

b301f253
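
A minimal sketch of the nul-termination fix described in the GET_ENTRIES commit above. The struct, the buffer length and the function name are hypothetical; the point is only that a fixed-size name copied from user space is forcibly terminated before any string function touches it.

```c
#include <string.h>

#define DEMO_TABLE_MAXNAMELEN 32	/* illustrative length */

/* Hypothetical request header as copied in from user space. */
struct demo_get_entries {
	char name[DEMO_TABLE_MAXNAMELEN];
	unsigned int size;
};

/* Force termination so later lookups cannot read past the buffer. */
static void demo_terminate_name(struct demo_get_entries *get)
{
	get->name[sizeof(get->name) - 1] = '\0';
}
```

Overwriting the last byte unconditionally is cheaper than scanning for a terminator and is safe even when user space fills the whole buffer.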

netfilter: nfnetlink_queue: honor NFQA_CFG_F_FAIL_OPEN when netlink unicast fails · 93140113

Committed by Pablo Neira Ayuso on Mar 23, 2016

When netlink unicast fails to deliver the message to userspace, we
should also check if the NFQA_CFG_F_FAIL_OPEN flag is set so we reinject
the packet back to the stack.

I think the user expects no packet drops when this flag is set due to
queueing to userspace errors, no matter if related to the internal queue
or when sending the netlink message to userspace.

The userspace application will still get the ENOBUFS error via recvmsg()
so the user still knows that, with the current configuration that is in
place, the userspace application is not consuming the messages at the
pace that the kernel needs.
Reported-by: "Yigal Reiss (yreiss)" <yreiss@cisco.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: "Yigal Reiss (yreiss)" <yreiss@cisco.com>

93140113
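
The fail-open behaviour described in the nfnetlink_queue commit above amounts to one decision on the send-failure path. The sketch below uses hypothetical demo_* names and an illustrative flag value; it is not the kernel's code path, only the decision it describes.

```c
/* Illustrative flag bit; the real NFQA_CFG_F_FAIL_OPEN value is not assumed. */
#define DEMO_F_FAIL_OPEN (1u << 0)

enum demo_verdict {
	DEMO_DROP,	/* packet is discarded */
	DEMO_REINJECT	/* packet continues through the stack */
};

/* Decide what happens to a packet whose netlink unicast to user space failed. */
static enum demo_verdict demo_on_send_failure(unsigned int queue_flags)
{
	return (queue_flags & DEMO_F_FAIL_OPEN) ? DEMO_REINJECT : DEMO_DROP;
}
```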

netfilter: x_tables: fix unconditional helper · 54d83fc7

Committed by Florian Westphal on Mar 22, 2016

Ben Hawkes says:

 In the mark_source_chains function (net/ipv4/netfilter/ip_tables.c) it
 is possible for a user-supplied ipt_entry structure to have a large
 next_offset field. This field is not bounds checked prior to writing a
 counter value at the supplied offset.

Problem is that mark_source_chains should not have been called --
the rule doesn't have a next entry, so it's supposed to return
an absolute verdict of either ACCEPT or DROP.

However, the function unconditional() doesn't work as the name implies.
It only checks that the rule is using wildcard address matching.

However, an unconditional rule must also not be using any matches
(no -m args).

The underflow validator only checked the addresses, therefore
passing the 'unconditional absolute verdict' test, while
mark_source_chains also tested for presence of matches, and thus
proceeded to the next (non-existent) rule.

Unify this so that all the callers have same idea of 'unconditional rule'.
Reported-by: Ben Hawkes <hawkes@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

54d83fc7

netfilter: x_tables: make sure e->next_offset covers remaining blob size · 6e94e0cf

Committed by Florian Westphal on Mar 22, 2016

Otherwise this function may read data beyond the ruleset blob.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6e94e0cf
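
The kind of bounds check the next_offset commit above refers to can be sketched in user space. The entry layout and all demo_* names are hypothetical and much simpler than struct ipt_entry. The sketch rejects an entry whose header does not fit in the remaining blob, whose next_offset is smaller than the header, or whose next_offset points past the end of the blob.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, minimal rule header. */
struct demo_entry {
	uint16_t target_offset;
	uint16_t next_offset;	/* distance from this entry to the next one */
};

/* Validate the entry at byte offset 'off' inside a blob of 'blob_size' bytes. */
static int demo_check_entry(const unsigned char *blob, size_t blob_size,
			    size_t off)
{
	struct demo_entry e;

	/* The header itself must lie inside the blob. */
	if (off > blob_size || blob_size - off < sizeof(e))
		return -EINVAL;

	memcpy(&e, blob + off, sizeof(e));

	/* next_offset must cover the header and stay within what remains. */
	if (e.next_offset < sizeof(e) || e.next_offset > blob_size - off)
		return -EINVAL;

	return 0;
}
```

Comparing sizes rather than forming out-of-range pointers keeps the check itself free of undefined behaviour.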

netfilter: x_tables: validate e->target_offset early · bdf533de

Committed by Florian Westphal on Mar 22, 2016

We should check that e->target_offset is sane before
mark_source_chains gets called since it will fetch the target entry
for loop detection.
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

bdf533de

openvswitch: call only into reachable nf-nat code · 99b7248e

Committed by Arnd Bergmann on Mar 18, 2016

The openvswitch code has gained support for calling into the
nf-nat-ipv4/ipv6 modules, however those can be loadable modules
in a configuration in which openvswitch is built-in, leading
to link errors:

net/built-in.o: In function `__ovs_ct_lookup':
:(.text+0x2cc2c8): undefined reference to `nf_nat_icmp_reply_translation'
:(.text+0x2cc66c): undefined reference to `nf_nat_icmpv6_reply_translation'

The dependency on (!NF_NAT || NF_NAT) prevents similar issues,
but NF_NAT is set to 'y' if any of the symbols selecting
it are built-in, but the link error happens when any of them
are modular.

A second issue is that even if CONFIG_NF_NAT_IPV6 is built-in,
CONFIG_NF_NAT_IPV4 might be completely disabled. This is unlikely
to be useful in practice, but the driver currently only handles
IPv6 being optional.

This patch improves the Kconfig dependency so that openvswitch
cannot be built-in if either of the two other symbols are set
to 'm', and it replaces the incorrect #ifdef in ovs_ct_nat_execute()
with two "if (IS_ENABLED())" checks that should catch all corner
cases and also make the code more readable.

The same #ifdef exists in ovs_ct_nat_to_attr(), where it does not
cause a link error, but for consistency I'm changing it the same
way.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 05752523 ("openvswitch: Interface with NAT.")
Acked-by: Joe Stringer <joe@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

99b7248e

openvswitch: Fix checking for new expected connections. · 5745b0be

Committed by Jarno Rajahalme on Mar 21, 2016

OVS should call into CT NAT for packets of new expected connections only
when the conntrack state is persisted with the 'commit' option to the
OVS CT action. The test for this condition is doubly wrong, as the CT
status field is ANDed with the bit number (IPS_EXPECTED_BIT) rather
than the mask (IPS_EXPECTED), and due to the wrong assumption that the
expected bit would apply only for the first (i.e., 'new') packet of a
connection, while in fact the expected bit remains on for the lifetime of
an expected connection. The 'ctinfo' value IP_CT_RELATED derived from
the ct status can be used instead, as it is only ever applicable to
the 'new' packets of the expected connection.

Fixes: 05752523 ('openvswitch: Interface with NAT.')
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

5745b0be
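
The first half of the bug described in the openvswitch commit above, ANDing a status word with a bit number instead of a bit mask, can be shown in a few lines. The DEMO_* constants mirror the usual "bit number plus derived mask" pattern and are named differently from the kernel's to make clear this is a sketch. Note that the actual fix, per the commit message, switches to the ctinfo value IP_CT_RELATED rather than to the mask.

```c
/* Bit number and the mask derived from it. */
enum {
	DEMO_EXPECTED_BIT = 0,
	DEMO_EXPECTED     = 1 << DEMO_EXPECTED_BIT
};

/* Buggy: tests against the bit number, which is 0 here, so it never matches. */
static int demo_buggy_is_expected(unsigned long status)
{
	return (status & DEMO_EXPECTED_BIT) != 0;
}

/* Mask-based test: reports whether the expected bit is set. */
static int demo_mask_is_expected(unsigned long status)
{
	return (status & DEMO_EXPECTED) != 0;
}
```

Because the bit number is 0, the buggy test is false for every status value, which is why the condition in the original code could not behave as intended.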

netfilter: ipset: fix race condition in ipset save, swap and delete · 596cf3fe

Committed by Vishwanath Pai on Mar 16, 2016

This fix adds a new reference counter (ref_netlink) for the struct ip_set.
The other reference counter (ref) can be swapped out by ip_set_swap and we
need a separate counter to keep track of references for netlink events
like dump. Using the same ref counter for dump causes a race condition
which can be demonstrated by the following script:

ipset create hash_ip1 hash:ip family inet hashsize 1024 maxelem 500000 \
counters
ipset create hash_ip2 hash:ip family inet hashsize 300000 maxelem 500000 \
counters
ipset create hash_ip3 hash:ip family inet hashsize 1024 maxelem 500000 \
counters

ipset save &

ipset swap hash_ip3 hash_ip2
ipset destroy hash_ip3 /* will crash the machine */

Swap will exchange the values of ref so destroy will see ref = 0 instead of
ref = 1. With this fix in place swap will not succeed because ipset save
still has ref_netlink on the set (ip_set_swap doesn't swap ref_netlink).

Both delete and swap will error out if ref_netlink != 0 on the set.

Note: The changes to the *_head functions are because previously we
would increment ref whenever we called these functions; we don't do
that anymore.
Reviewed-by: Joshua Hunt <johunt@akamai.com>
Signed-off-by: Vishwanath Pai <vpai@akamai.com>
Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

596cf3fe
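
The two-counter scheme described in the ipset commit above can be modelled roughly as follows. This is a simplified sketch with hypothetical demo_* names, no locking, and -EBUSY as an assumed error code; the real struct ip_set and its error values differ. It shows swap exchanging only ref, and both swap and destroy refusing to proceed while a netlink dump still holds ref_netlink.

```c
#include <errno.h>

/* Hypothetical, stripped-down set with the two reference counters. */
struct demo_set {
	unsigned int ref;		/* references from rules; exchanged by swap */
	unsigned int ref_netlink;	/* references from netlink dumps; never swapped */
};

/* Swap exchanges ref only, and is refused while a dump is in progress. */
static int demo_swap(struct demo_set *a, struct demo_set *b)
{
	unsigned int tmp;

	if (a->ref_netlink || b->ref_netlink)
		return -EBUSY;

	tmp = a->ref;
	a->ref = b->ref;
	b->ref = tmp;
	return 0;
}

/* Destroy is refused while either counter is non-zero. */
static int demo_destroy(const struct demo_set *s)
{
	if (s->ref || s->ref_netlink)
		return -EBUSY;
	return 0;
}
```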

openvswitch: Use proper buffer size in nla_memcpy · ac71b46e

Committed by Haishuang Yan on Mar 28, 2016

For the input parameter count, it's better to use the size
of the destination buffer, as nla_memcpy would take into
account the length of the source netlink attribute when
data is copied from an attribute.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ac71b46e
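
The reasoning in the nla_memcpy commit above can be illustrated with a simplified model of such a helper. The demo_attr type and demo_attr_memcpy() are hypothetical and much simpler than struct nlattr and nla_memcpy(). The sketch shows that when the helper already clamps the copy to the attribute's length, passing the destination size as count bounds the copy on both sides.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute: a length plus a pointer to its payload. */
struct demo_attr {
	int len;
	const unsigned char *data;
};

/* Copy at most 'count' bytes, and never more than the attribute holds. */
static int demo_attr_memcpy(void *dest, const struct demo_attr *src, int count)
{
	int n = count < src->len ? count : src->len;

	memcpy(dest, src->data, (size_t)n);
	return n;	/* number of bytes actually copied */
}
```

Calling it as demo_attr_memcpy(buf, &attr, sizeof(buf)) cannot overflow buf even if the attribute is longer than expected, whereas passing the attribute's own length as count would leave the destination unprotected.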

Fix returned tc and hoplimit values for route with IPv6 encapsulation · 995096a0

Committed by Quentin Armitage on Mar 27, 2016

For a route with IPv6 encapsulation, the traffic class and hop limit
values are interchanged when returned to userspace by the kernel.
For example, see below.

># ip route add 192.168.0.1 dev eth0.2 encap ip6 dst 0x50 tc 0x50 hoplimit 100 table 1000
># ip route show table 1000
192.168.0.1  encap ip6 id 0 src :: dst fe83::1 hoplimit 80 tc 100 dev eth0.2  scope link
Signed-off-by: Quentin Armitage <quentin@armitage.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

995096a0

26 Mar, 2016 7 commits

libceph: use KMEM_CACHE macro · 5ee61e95

Committed by Geliang Tang on Mar 13, 2016

Use KMEM_CACHE() instead of kmem_cache_create() to simplify the code.
Signed-off-by: Geliang Tang <geliangtang@163.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

5ee61e95

libceph: use sizeof_footer() more · 89f08173

Committed by Ilya Dryomov on Feb 20, 2016

Don't open-code sizeof_footer() in read_partial_message() and
ceph_msg_revoke().  Also, after switching to sizeof_footer(), it's now
possible to use con_out_kvec_add() in prepare_write_message_footer().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Alex Elder <elder@linaro.org>

89f08173

libceph: add helper that duplicates last extent operation · 2c63f49a

Committed by Yan, Zheng on Jan 07, 2016

This helper duplicates the last extent operation in an OSD request,
then adjusts the new extent operation's offset and length. The helper
is for scattered page writeback, which adds nonconsecutive dirty
pages to a single OSD request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

2c63f49a

libceph: enable large, variable-sized OSD requests · 3f1af42a

Committed by Ilya Dryomov on Feb 09, 2016

Turn r_ops into a flexible array member to enable large, consisting of
up to 16 ops, OSD requests.  The use case is scattered writeback in
cephfs and, as far as the kernel client is concerned, 16 is just a made
up number.

r_ops had size 3 for copyup+hint+write, but copyup is really a special
case - it can only happen once.  ceph_osd_request_cache is therefore
stuffed with num_ops=2 requests, anything bigger than that is allocated
with kmalloc().  req_mempool is backed by ceph_osd_request_cache, which
means either num_ops=1 or num_ops=2 for use_mempool=true - all existing
users (ceph_writepages_start(), ceph_osdc_writepages()) are fine with
that.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

3f1af42a
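
The flexible array member approach mentioned in the libceph commit above looks roughly like this in plain C. The demo_* types are hypothetical and use calloc() rather than the kernel's slab caches and kmalloc(); the limit of 16 is taken from the commit message. The sketch shows a request whose ops array is sized at allocation time instead of being fixed in the struct.

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAX_OPS 16	/* upper bound quoted in the commit message */

/* Hypothetical per-operation record. */
struct demo_op {
	unsigned int opcode;
};

/* Request with a flexible array member as its last field. */
struct demo_request {
	unsigned int num_ops;
	struct demo_op ops[];	/* num_ops elements follow the header */
};

/* Allocate a zeroed request with room for exactly num_ops operations. */
static struct demo_request *demo_request_alloc(unsigned int num_ops)
{
	struct demo_request *req;

	if (num_ops == 0 || num_ops > DEMO_MAX_OPS)
		return NULL;

	req = calloc(1, sizeof(*req) + num_ops * sizeof(req->ops[0]));
	if (req)
		req->num_ops = num_ops;
	return req;
}
```

The caller releases the request with free(). Small requests pay only for the ops they use, which is the motivation the commit gives for serving the common one- and two-op cases from a dedicated cache.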

libceph: osdc->req_mempool should be backed by a slab pool · 9e767adb

Committed by Ilya Dryomov on Feb 09, 2016

ceph_osd_request_cache was introduced a long time ago.  Also, osd_req
is about to get a flexible array member, which ceph_osd_request_cache
is going to be aware of.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

9e767adb

libceph: make r_request msg_size calculation clearer · ae458f5a

Committed by Ilya Dryomov on Feb 11, 2016

Although msg_size is calculated correctly, the terms are grouped in
a misleading way - snaps appears to not have room for a u32 length.
Move calculation closer to its use and regroup terms.

No functional change.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

ae458f5a

libceph: move r_reply_op_{len,result} into struct ceph_osd_req_op · 7665d85b

Committed by Yan, Zheng on Jan 07, 2016

This avoids defining a large array of r_reply_op_{len,result} in
struct ceph_osd_request.
Signed-off-by: Yan, Zheng <zyan@redhat.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

7665d85b
