Commits · 7a28f59bf9fb220cdf56ac6ab539fc4a0ae59414 · openeuler / raspberrypi-kernel

26 May, 2016 24 commits

libceph: allocate ceph_osd with GFP_NOFAIL · 7a28f59b

Authored by Ilya Dryomov on Apr 28, 2016

create_osd() is called way too deep in the stack to be able to error
out in a sane way; a failing create_osd() just messes everything up.
The current req_notarget list solution is broken - the list is never
traversed as it's not entirely clear when to do it, I guess.

If we were to start traversing it at regular intervals and retrying
each request, we wouldn't be far off from what __GFP_NOFAIL is doing,
so allocate OSD sessions with __GFP_NOFAIL, at least until we come up
with a better fix.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

7a28f59b

libceph: osd_init() and osd_cleanup() · 0247a0cf

Authored by Ilya Dryomov on Apr 28, 2016

These are going to be used by the homeless OSD sessions code.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0247a0cf

libceph: handle_one_map() · 42c1b124

Authored by Ilya Dryomov on Apr 28, 2016

Separate osdmap handling from decoding and iterating over a bag of maps
in a fresh MOSDMap message.  This sets up the scene for the updated OSD
client.

Of particular importance here is the addition of pi->was_full, which
can be used to answer "did this pool go full -> not-full in this map?".
This is the key bit for supporting pool quotas.

We won't be able to downgrade map_sem for much longer, so drop
downgrade_write().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

42c1b124
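The full -> not-full question above can be sketched in userspace C. This is only an illustration: `struct pool_info`, its `full` field and `pool_cleared_full()` are hypothetical names chosen for the example; only `was_full` is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-pool info; only was_full is named in the commit. */
struct pool_info {
    bool was_full;  /* full flag recorded before the new map was applied */
    bool full;      /* full flag in the map being handled (assumed field) */
};

/* True if the pool went full -> not-full in this map. */
static bool pool_cleared_full(const struct pool_info *pi)
{
    return pi->was_full && !pi->full;
}
```

A caller would record `was_full` before applying a new map and test the transition afterwards, for example to resume requests that were paused because a pool quota had been hit.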

libceph: allocate dummy osdmap in ceph_osdc_init() · e5253a7b

Authored by Ilya Dryomov on Apr 28, 2016

This leads to simpler osdmap handling code, particularly when dealing
with pi->was_full, which is introduced in a later commit.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

e5253a7b

libceph: schedule tick from ceph_osdc_init() · fbca9635

Authored by Ilya Dryomov on Apr 28, 2016

Both homeless OSD sessions and watch/notify v2, introduced in later
commits, require periodic ticks which don't depend on ->num_requests.
Schedule the initial tick from ceph_osdc_init() and reschedule from
handle_timeout() unconditionally.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

fbca9635

libceph: move schedule_delayed_work() in ceph_osdc_init() · b37ee1b9

Authored by Ilya Dryomov on Apr 28, 2016

ceph_osdc_stop() isn't called if ceph_osdc_init() fails, so we end up
with handle_osds_timeout() running on invalid memory if any one of the
allocations fails.  Call schedule_delayed_work() after everything is
set up, just before returning.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

b37ee1b9

libceph: redo callbacks and factor out MOSDOpReply decoding · fe5da05e

Authored by Ilya Dryomov on Apr 28, 2016

If you specify ACK | ONDISK and set ->r_unsafe_callback, both
->r_callback and ->r_unsafe_callback(true) are called on ack.  This is
very confusing.  Redo this so that only one of them is called:

    ->r_unsafe_callback(true), on ack
    ->r_unsafe_callback(false), on commit

or

    ->r_callback, on ack|commit

Decode everything in decode_MOSDOpReply() to reduce clutter.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

fe5da05e
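The "only one of them is called" rule can be sketched as a small dispatcher. This is a userspace illustration, not the libceph code: the trimmed-down `struct request`, `enum reply_kind` and `dispatch_reply()` are hypothetical, and the sketch handles one reply at a time without modelling whether the request asked for ack, commit or both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct request;
typedef void (*callback_t)(struct request *req);
typedef void (*unsafe_callback_t)(struct request *req, bool unsafe);

/* Hypothetical request; only the two callback names come from the commit. */
struct request {
    callback_t r_callback;
    unsafe_callback_t r_unsafe_callback;
    int plain_calls;         /* bookkeeping for the example only */
    int unsafe_true_calls;
    int unsafe_false_calls;
};

enum reply_kind { REPLY_ACK, REPLY_COMMIT };

/* Invoke exactly one callback for a reply the request is waiting on. */
static void dispatch_reply(struct request *req, enum reply_kind kind)
{
    if (req->r_unsafe_callback)
        req->r_unsafe_callback(req, kind == REPLY_ACK); /* true on ack, false on commit */
    else if (req->r_callback)
        req->r_callback(req);
}

/* Example callbacks that just count how often they run. */
static void count_plain(struct request *req)
{
    req->plain_calls++;
}

static void count_unsafe(struct request *req, bool unsafe)
{
    if (unsafe)
        req->unsafe_true_calls++;
    else
        req->unsafe_false_calls++;
}
```

With both callbacks set, an ack followed by a commit runs the unsafe callback twice (once with true, once with false) and never touches `r_callback`.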

libceph: drop msg argument from ceph_osdc_callback_t · 85e084fe

Authored by Ilya Dryomov on Apr 28, 2016

finish_read(), its only user, uses it to get to hdr.data_len, which is
what ->r_result is set to on success. This gains us the ability to
safely call callbacks from contexts other than reply, e.g. map check.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

85e084fe

libceph: switch to calc_target(), part 2 · bb873b53

Authored by Ilya Dryomov on May 26, 2016

The crux of this is getting rid of ceph_osdc_build_request(), so that
MOSDOp can be encoded not before but after calc_target() calculates the
actual target. Encoding now happens within ceph_osdc_start_request().

Also nuked is the accompanying bunch of pointers into the encoded
buffer that was used to update fields on each send - instead, the
entire front is re-encoded. If we want to support target->name_len !=
base->name_len in the future, there is no other way, because oid is
surrounded by other fields in the encoded buffer.

Encoding OSD ops and adding data items to the request message were
mixed together in osd_req_encode_op(). While we want to re-encode OSD
ops, we don't want to add duplicate data items to the message when
resending, so all calls to ceph_osdc_msg_data_add() are factored out
into a new setup_request_data().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

bb873b53

libceph: switch to calc_target(), part 1 · a66dd383

Authored by Ilya Dryomov on Apr 28, 2016

Replace __calc_request_pg() and most of __map_request() with
calc_target() and start using req->r_t.

ceph_osdc_build_request() however still encodes base_oid, because it's
called before calc_target() is and target_oid is empty at that point in
time; a printf in osdc_show() also shows base_oid.  This is fixed in
"libceph: switch to calc_target(), part 2".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

a66dd383

libceph: introduce ceph_osd_request_target, calc_target() · 63244fa1

Authored by Ilya Dryomov on Apr 28, 2016

Introduce ceph_osd_request_target, containing all mapping-related
fields of ceph_osd_request, and calc_target() for calculating mappings
and populating it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

63244fa1

libceph: pi->min_size, pi->last_force_request_resend · 04812acf

Authored by Ilya Dryomov on Apr 28, 2016

Add and decode pi->min_size and pi->last_force_request_resend.  These
are going to be used by calc_target().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

04812acf

libceph: make pgid_cmp() global · f984cb76

Authored by Ilya Dryomov on Apr 28, 2016

calc_target() code is going to need to know how to compare PGs.  Take
lhs and rhs pgid by const * while at it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

f984cb76

libceph: rename ceph_calc_pg_primary() · f81f1633

Authored by Ilya Dryomov on Apr 28, 2016

Rename ceph_calc_pg_primary() to ceph_pg_to_acting_primary() to
emphasise that it returns the acting primary.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

f81f1633

libceph: ceph_osds, ceph_pg_to_up_acting_osds() · 6f3bfd45

Authored by Ilya Dryomov on Apr 28, 2016

Knowing just the acting set isn't enough; we need to be able to record
the up set as well to detect interval changes.  This means returning
(up[], up_len, up_primary, acting[], acting_len, acting_primary) and
passing it around.  Introduce and switch to ceph_osds to help with that.

Rename ceph_calc_pg_acting() to ceph_pg_to_up_acting_osds() and return
both up and acting sets from it.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

6f3bfd45

libceph: rename ceph_oloc_oid_to_pg() · d9591f5e

Authored by Ilya Dryomov on Apr 28, 2016

Rename ceph_oloc_oid_to_pg() to ceph_object_locator_to_pg().  Emphasise
that the returned PG is raw and return -ENOENT instead of -EIO if the
pool doesn't exist.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

d9591f5e

libceph: DEFINE_RB_FUNCS macro · fcd00b68

Authored by Ilya Dryomov on Apr 28, 2016

Given

    struct foo {
        u64 id;
        struct rb_node bar_node;
    };

generate insert_bar(), erase_bar() and lookup_bar() functions with

    DEFINE_RB_FUNCS(bar, struct foo, id, bar_node)

The key is assumed to be an integer (u64, int, etc), compared with
< and >.  nodefld has to be initialized with RB_CLEAR_NODE().

Start using it for MDS, MON and OSD requests and OSD sessions.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

fcd00b68
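To make the macro idea concrete, here is a userspace sketch of generating typed insert and lookup helpers from one macro. It is not the kernel macro: `DEFINE_BST_FUNCS`, `struct bst_node` and `node_entry()` are hypothetical stand-ins, the tree is a plain unbalanced binary search tree rather than a red-black tree, erase is omitted, and a duplicate key makes insert return false (an assumption of this sketch).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rb_node: a plain, unbalanced BST link. */
struct bst_node {
    struct bst_node *left, *right;
};

/* Recover the containing object from a pointer to its embedded node. */
#define node_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/*
 * Generate insert_<name>() and lookup_<name>() for objects of `type`,
 * keyed by the integer field `keyfld` and linked through `nodefld`.
 * Keys are compared with < and > only.
 */
#define DEFINE_BST_FUNCS(name, type, keyfld, nodefld)                  \
static bool insert_##name(struct bst_node **root, type *t)            \
{                                                                      \
    struct bst_node **n = root;                                        \
    while (*n) {                                                       \
        type *cur = node_entry(*n, type, nodefld);                     \
        if (t->keyfld < cur->keyfld)                                   \
            n = &(*n)->left;                                           \
        else if (t->keyfld > cur->keyfld)                              \
            n = &(*n)->right;                                          \
        else                                                           \
            return false; /* duplicate key */                          \
    }                                                                  \
    t->nodefld.left = t->nodefld.right = NULL;                         \
    *n = &t->nodefld;                                                  \
    return true;                                                       \
}                                                                      \
static type *lookup_##name(struct bst_node *root, uint64_t key)       \
{                                                                      \
    struct bst_node *n = root;                                         \
    while (n) {                                                        \
        type *cur = node_entry(n, type, nodefld);                      \
        if (key < cur->keyfld)                                         \
            n = n->left;                                               \
        else if (key > cur->keyfld)                                    \
            n = n->right;                                              \
        else                                                           \
            return cur;                                                \
    }                                                                  \
    return NULL;                                                       \
}

struct foo {
    uint64_t id;
    struct bst_node bar_node;
};

/* Expands to insert_bar() and lookup_bar() operating on struct foo. */
DEFINE_BST_FUNCS(bar, struct foo, id, bar_node)
```

The benefit is the same as described in the commit message: one definition replaces several hand-written, nearly identical tree walkers that differ only in struct type and field names.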

libceph: open-code remove_{all,old}_osds() · 42a2c09f

Authored by Ilya Dryomov on Apr 28, 2016

They are called only once, from ceph_osdc_stop() and
handle_osds_timeout() respectively.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

42a2c09f

libceph: nuke unused fields and functions · 0c0a8de1

Authored by Ilya Dryomov on Apr 28, 2016

Either unused or useless:

    osdmap->mkfs_epoch
    osd->o_marked_for_keepalive
    monc->num_generic_requests
    osdc->map_waiters
    osdc->last_requested_map
    osdc->timeout_tid

    osd_req_op_cls_response_data()

    osdmap_apply_incremental() @msgr arg
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

0c0a8de1

libceph: variable-sized ceph_object_id · d30291b9

Authored by Ilya Dryomov on Apr 29, 2016

Currently ceph_object_id can hold object names of up to 100
(CEPH_MAX_OID_NAME_LEN) characters.  This is enough for all use cases
except one: long rbd image names.

- a format 1 header is named "<imgname>.rbd"
- an object that points to a format 2 header is named "rbd_id.<imgname>"

We operate on these potentially long-named objects during rbd map, and,
for format 1 images, during header refresh.  (A format 2 header name is
a small system-generated string.)

Lift this 100 character limit by making ceph_object_id be able to point
to an externally-allocated string.  Apart from being able to work with
almost arbitrarily-long named objects, this allows us to reduce the
size of ceph_object_id from >100 bytes to 64 bytes.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

d30291b9
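The inline-or-external layout described above can be sketched as follows. This is a userspace approximation under stated assumptions: `struct object_id`, the `oid_*()` helpers and the inline capacity of 52 bytes are illustrative choices (52 makes the struct 64 bytes on a typical LP64 target), not a copy of the libceph definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define OID_INLINE_LEN 52  /* assumed inline capacity */

struct object_id {
    char *name;                        /* points at inline_name or a heap buffer */
    char inline_name[OID_INLINE_LEN];  /* storage for short names */
    int name_len;
};

static void oid_init(struct object_id *oid)
{
    oid->name = oid->inline_name;
    oid->inline_name[0] = '\0';
    oid->name_len = 0;
}

static bool oid_is_external(const struct object_id *oid)
{
    return oid->name != oid->inline_name;
}

/* Free an external buffer, if any, and reset to the empty inline state. */
static void oid_destroy(struct object_id *oid)
{
    if (oid_is_external(oid))
        free(oid->name);
    oid_init(oid);
}

/*
 * Copy src into oid, using the inline buffer when it fits and a heap
 * buffer otherwise.  src must not alias oid->name.  Returns 0 or -1.
 */
static int oid_set(struct object_id *oid, const char *src)
{
    size_t len = strlen(src);
    char *buf = oid->inline_name;

    if (len + 1 > sizeof(oid->inline_name)) {
        buf = malloc(len + 1);
        if (!buf)
            return -1;
    }
    oid_destroy(oid);            /* drop any previous external buffer */
    memcpy(buf, src, len + 1);   /* includes the terminating NUL */
    oid->name = buf;
    oid->name_len = (int)len;
    return 0;
}
```

Short names, which are the common case, cost no allocation, while a long "rbd_id.<imgname>" style name falls back to a heap buffer that `oid_destroy()` releases.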

libceph: change how osd_op_reply message size is calculated · 711da55d

Authored by Ilya Dryomov on Apr 27, 2016

For a message pool message, preallocate a page, just like we do for
osd_op.  For a normal message, take ceph_object_id into account and
don't bother subtracting CEPH_OSD_SLAB_OPS ceph_osd_ops.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

711da55d

libceph: move message allocation out of ceph_osdc_alloc_request() · 13d1ad16

Authored by Ilya Dryomov on Apr 27, 2016

The size of ->r_request and ->r_reply messages depends on the size of
the object name (ceph_object_id), while the size of ceph_osd_request is
fixed.  Move message allocation into a separate function that would
have to be called after ceph_object_id and ceph_object_locator (which
is also going to become variable in size with RADOS namespaces) have
been filled in:

    req = ceph_osdc_alloc_request(...);
    <fill in req->r_base_oid>
    <fill in req->r_base_oloc>
    ceph_osdc_alloc_messages(req);
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

13d1ad16

libceph: grab snapc in ceph_osdc_alloc_request() · 84127282

Authored by Ilya Dryomov on Apr 26, 2016

ceph_osdc_build_request() is going away.  Grab snapc and initialize
->r_snapid in ceph_osdc_alloc_request().
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

84127282

libceph: make ceph_osdc_put_request() accept NULL · 3ed97d63

Authored by Ilya Dryomov on Apr 26, 2016

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>

3ed97d63
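The commit has no message body, so as a rough illustration of what a NULL-tolerant put function looks like, here is a userspace sketch. `struct request`, `request_alloc()` and `request_put()` are hypothetical, and the reference count is a plain int rather than the atomic kref a kernel object would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object; a kernel object would use an atomic kref. */
struct request {
    int refcount;
};

static struct request *request_alloc(void)
{
    struct request *req = malloc(sizeof(*req));

    if (req)
        req->refcount = 1;
    return req;
}

/* Drop one reference; a NULL pointer is silently ignored, like free(NULL). */
static void request_put(struct request *req)
{
    if (!req)
        return;
    if (--req->refcount == 0)
        free(req);
}
```

Accepting NULL lets error paths call the put function unconditionally instead of guarding every call site with a NULL check.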

15 May, 2016 2 commits

net/route: enforce hoplimit max value · 626abd59

Authored by Paolo Abeni on May 13, 2016

Currently, when creating or updating a route, no check is performed
on the hoplimit value in either the ipv4 or the ipv6 code.

The caller can, e.g., set hoplimit to 256, and when such a route is
used, packets will be sent with hoplimit/ttl equal to 0.

This commit adds checks for the RTAX_HOPLIMIT value in both the ipv4
and ipv6 route code, substituting any value greater than 255 with 255.

This is consistent with what is currently done for ADVMSS and MTU
in the ipv4 code.
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

626abd59
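The arithmetic behind the 256 -> 0 case can be shown in a few lines of C. This is a sketch assuming the configured value eventually lands in an 8-bit TTL/hop-limit header field; both helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Without a check, storing the value in an 8-bit field keeps only the low byte. */
static uint8_t hoplimit_unchecked(uint32_t val)
{
    return (uint8_t)val;
}

/* With the check, any value greater than 255 is replaced by 255. */
static uint8_t hoplimit_clamped(uint32_t val)
{
    if (val > 255)
        val = 255;
    return (uint8_t)val;
}
```

Here `hoplimit_unchecked(256)` yields 0, which is the failure described in the commit message, while `hoplimit_clamped(256)` yields 255 and in-range values pass through unchanged.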

nf_conntrack: avoid kernel pointer value leak in slab name · 31b0b385

Authored by Linus Torvalds on May 14, 2016

The slab name ends up being visible in the directory structure under
/sys, and even if you don't have access rights to the file you can see
the filenames.

Just use a 64-bit counter instead of the pointer to the 'net' structure
to generate a unique name.

This code will go away in 4.7 when the conntrack code moves to a single
kmemcache, but this is the backportable simple solution to avoiding
leaking kernel pointers to user space.

Fixes: 5b3501fa ("netfilter: nf_conntrack: per netns nf_conntrack_cachep")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

31b0b385
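The idea of deriving a unique name from a counter instead of an address can be sketched in userspace C. The `make_slab_name()` helper and the "nf_conntrack_" prefix are assumptions made for the example, and the counter here is a plain static variable, whereas kernel code would need an atomic one.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Monotonic id source; not thread-safe in this sketch. */
static uint64_t next_id;

/* Build a unique name from a counter, so no pointer value is exposed. */
static void make_slab_name(char *buf, size_t len)
{
    snprintf(buf, len, "nf_conntrack_%" PRIu64, ++next_id);
}
```

Each call produces a distinct name, which is all the uniqueness requirement needs, and the name carries no information about kernel memory layout.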

12 May, 2016 2 commits

gre: do not keep the GRE header around in collect metadata mode · e271c7b4

Authored by Jiri Benc on May 11, 2016

For ipgre interface in collect metadata mode, it doesn't make sense for the
interface to be of ARPHRD_IPGRE type. The outer header of received packets
is not needed, as all the information from it is present in metadata_dst. We
already don't set ipgre_header_ops for collect metadata interfaces, which is
the only consumer of mac_header pointing to the outer IP header.

Just set the interface type to ARPHRD_NONE in collect metadata mode for
ipgre (not gretap, that still correctly stays ARPHRD_ETHER) and reset
mac_header.

Fixes: a64b04d8 ("gre: do not assign header_ops in collect metadata mode")
Fixes: 2e15ea39 ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e271c7b4

openvswitch: Fix cached ct with helper. · 16ec3d4f

Authored by Joe Stringer on May 11, 2016

When using conntrack helpers from OVS, a common configuration is to
perform a lookup without specifying a helper, then go through a
firewalling policy, only to decide to attach a helper afterwards.

In this case, the initial lookup will cause a ct entry to be attached to
the skb, then the later commit with helper should attach the helper and
confirm the connection. However, the helper attachment has been missing.
If the user has enabled automatic helper attachment, then this issue
will be masked as it will be applied in init_conntrack(). It is also
masked if the action is executed from ovs_packet_cmd_execute() as that
will construct a fresh skb.

This patch fixes the issue by making an explicit call to try to assign
the helper if there is a discrepancy between the action's helper and the
current skb->nfct.

Fixes: cae3a262 ("openvswitch: Allow attaching helpers to ct action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

16ec3d4f

11 May, 2016 7 commits

net sched: ife action fix late binding · 4e8c8615

Authored by Jamal Hadi Salim on May 10, 2016

The process below was broken and is fixed with this patch.

    # add an ife action and give it an instance id of 1
    sudo tc actions add action ife encode \
        type 0xDEAD allow mark dst 02:15:15:15:15:15 index 1

    # create a filter which binds to ife action id 1
    sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 \
        match ip dst 17.0.0.1/32 flowid 1:11 action ife index 1

Message before fix was:
    RTNETLINK answers: Invalid argument
    We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e8c8615

net sched: skbedit action fix late binding · 5e1567ae

Authored by Jamal Hadi Salim on May 10, 2016

The process below was broken and is fixed with this patch.

    # add a skbedit action and give it an instance id of 1
    sudo tc actions add action skbedit mark 10 index 1
    # create a filter which binds to skbedit action id 1
    sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 \
        match ip dst 17.0.0.1/32 flowid 1:10 action skbedit index 1

Message before fix was:
    RTNETLINK answers: Invalid argument
    We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e1567ae

net sched: simple action fix late binding · 0e5538ab

Authored by Jamal Hadi Salim on May 10, 2016

The process below was broken and is fixed with this patch.

    # add a simple action and give it an instance id of 1
    sudo tc actions add action simple sdata "foobar" index 1
    # create a filter which binds to simple action id 1
    sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 \
        match ip dst 17.0.0.1/32 flowid 1:10 action simple index 1

Message before fix was:
    RTNETLINK answers: Invalid argument
    We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0e5538ab

net sched: mirred action fix late binding · 87dfbdc6

Authored by Jamal Hadi Salim on May 10, 2016

The process below was broken and is fixed with this patch.

    # add a mirred action and give it an instance id of 1
    sudo tc actions add action mirred egress mirror dev $MDEV index 1
    # create a filter which binds to mirred action id 1
    sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 \
        match ip dst 17.0.0.1/32 flowid 1:10 action mirred index 1

Message before bug fix was:
    RTNETLINK answers: Invalid argument
    We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

87dfbdc6

net sched: ipt action fix late binding · a57f19d3

Authored by Jamal Hadi Salim on May 10, 2016

Late ipt action binding was broken and is fixed with this patch.

    # add an ipt action and give it an instance id of 1
    sudo tc actions add action ipt -j mark --set-mark 2 index 1
    # create a filter which binds to ipt action id 1
    sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 \
        match ip dst 17.0.0.1/32 flowid 1:10 action ipt index 1

Message before bug fix was:
    RTNETLINK answers: Invalid argument
    We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a57f19d3

net sched: vlan action fix late binding · 5026c9b1

Authored by Jamal Hadi Salim on May 10, 2016

Late vlan action binding was broken and is fixed with this patch.

    # add a vlan action to pop and give it an instance id of 1
    sudo tc actions add action vlan pop index 1
    # create a filter which binds to vlan action id 1
    sudo tc filter add dev $DEV parent ffff: protocol ip prio 1 u32 \
        match ip dst 17.0.0.1/32 flowid 1:1 action vlan index 1

Message before bug fix was:
    RTNETLINK answers: Invalid argument
    We have an error talking to the kernel
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5026c9b1

tcp: refresh skb timestamp at retransmit time · 10a81980

Authored by Eric Dumazet on May 09, 2016

In the very unlikely case that __tcp_retransmit_skb() cannot use the
cloning done in tcp_transmit_skb(), we need to refresh skb_mstamp before
doing the copy and transmit; otherwise TCP TS val will be an exact copy
of the original transmit.

Fixes: 7faee5c0 ("tcp: remove TCP_SKB_CB(skb)->when")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

10a81980

10 May, 2016 1 commit

net: fix a kernel infoleak in x25 module · 79e48650

Authored by Kangjie Lu on May 08, 2016

Stack object "dte_facilities" is allocated in x25_rx_call_request()
and is supposed to be initialized in x25_negotiate_facilities().
However, 5 fields (8 bytes in total) are not initialized. The object
is then copied to userland via copy_to_user(), so an infoleak occurs.
Signed-off-by: Kangjie Lu <kjlu@gatech.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>

79e48650
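The usual remedy for this class of leak is to clear the whole object before any field is filled in. The sketch below illustrates that pattern in userspace C; `struct facilities_sketch`, `negotiate()` and `prepare_reply()` are hypothetical and do not mirror the x25 structures or the exact change made by the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure; field names and sizes are for illustration only. */
struct facilities_sketch {
    unsigned char calling_len;
    unsigned char called_len;
    unsigned char calling_ae[8];
    unsigned char called_ae[8];
};

/* Stand-in for a negotiation step that fills in only some of the fields. */
static void negotiate(struct facilities_sketch *f)
{
    f->calling_len = 4;
}

/* Clear every byte first, so fields that negotiate() skips hold no stale data. */
static void prepare_reply(struct facilities_sketch *out)
{
    memset(out, 0, sizeof(*out));
    negotiate(out);
}
```

After `prepare_reply()`, copying the whole structure to another trust domain exposes only zeros in the fields the negotiation step never touched.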

07 May, 2016 4 commits

udp_offload: Set encapsulation before inner completes. · 229740c6

Authored by Jarno Rajahalme on May 03, 2016

UDP tunnel segmentation code relies on the inner offsets being set for
a UDP tunnel GSO packet, but the inner *_complete() functions will
set the inner offsets only if 'encapsulation' is set before calling
them. Currently, udp_gro_complete() sets 'encapsulation' only after
the inner *_complete() functions are done. This leaves the inner
offsets with invalid values after udp_gro_complete() returns, which
in turn makes it impossible to properly segment the packet in case
it needs to be forwarded. The user would see this either as
invalid packets being sent or as packet loss.

This patch fixes this by setting skb's 'encapsulation' in
udp_gro_complete() before calling into the inner complete functions,
and by making each possible UDP tunnel gro_complete() callback set the
inner_mac_header to the beginning of the tunnel payload.
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Reviewed-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

229740c6

udp_tunnel: Remove redundant udp_tunnel_gro_complete(). · 43b8448c

Authored by Jarno Rajahalme on May 03, 2016

The setting of the UDP tunnel GSO type is already performed by
udp[46]_gro_complete().
Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

43b8448c

net: ipv6: tcp reset, icmp need to consider L3 domain · 1d2f7b2d

Authored by David Ahern on May 04, 2016

Responses for packets to unused ports are getting lost with L3 domains.

IPv4 has ip_send_unicast_reply for sending TCP responses which accounts
for L3 domains; update the IPv6 counterpart tcp_v6_send_response.
For icmp the L3 master check needs to be moved up in icmp6_send
to properly respond to UDP packets to a port with no listener.

Fixes: ca254490 ("net: Add VRF support to IPv6 stack")
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1d2f7b2d

bridge: fix igmp / mld query parsing · 856ce5d0

Authored by Linus Lüssing on May 04, 2016

With the newly introduced helper functions the skb pulling is hidden
in the checksumming function - and undone before returning to the
caller.

The IGMP and MLD query parsing functions in the bridge still
assumed that the skb is pointing to the beginning of the IGMP/MLD
message while it is now kept at the beginning of the IPv4/6 header.

If there is a querier somewhere else, then this either causes
multicast snooping to stay disabled even though it could be
enabled or, if we have the querier enabled too, can create
unnecessary IGMP / MLD query messages on the link.

Fix this by taking the offset between the IP and IGMP/MLD headers
into account, too.

Fixes: 9afd85c9 ("net: Export IGMP/MLD message validation code")
Reported-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

856ce5d0