提交 · 83ad357dee467f63574de35752bc40033deab30e · openeuler / Kernel

16 6月, 2017 2 次提交

skbuff: make skb_put_zero() return void · 83ad357d

由 Johannes Berg 提交于 6月 14, 2017

It's nicer to return void, since then there's no need to
cast to any structures. Currently none of the users have
a cast, but a number of future conversions do.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

83ad357d

tls: kernel TLS support · 3c4d7559

由 Dave Watson 提交于 6月 14, 2017

Software implementation of transport layer security, implemented using ULP
infrastructure.  tcp proto_ops are replaced with tls equivalents of sendmsg and
sendpage.

Only symmetric crypto is done in the kernel, keys are passed by setsockopt
after the handshake is complete.  All control messages are supported via CMSG
data - the actual symmetric encryption is the same, just the message type needs
to be passed separately.

For user API, please see Documentation patch.

Pieces that can be shared between hw and sw implementation
are in tls_main.c
Signed-off-by: NBoris Pismenny <borisp@mellanox.com>
Signed-off-by: NIlya Lesokhin <ilyal@mellanox.com>
Signed-off-by: NAviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: NDave Watson <davejwatson@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3c4d7559

15 6月, 2017 2 次提交

net: update undefined ->ndo_change_mtu() comment · db46a0e1

由 Magnus Damm 提交于 6月 14, 2017

Update ->ndo_change_mtu() callback comment to remove text
about returning error in case of undefined callback. This
change makes the comment match the existing code behavior.
Signed-off-by: NMagnus Damm <damm+renesas@opensource.se>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db46a0e1

bpf: permits narrower load from bpf program context fields · 31fd8581

由 Yonghong Song 提交于 6月 13, 2017

Currently, verifier will reject a program if it contains an
narrower load from the bpf context structure. For example,
        __u8 h = __sk_buff->hash, or
        __u16 p = __sk_buff->protocol
        __u32 sample_period = bpf_perf_event_data->sample_period
which are narrower loads of 4-byte or 8-byte field.

This patch solves the issue by:
  . Introduce a new parameter ctx_field_size to carry the
    field size of narrower load from prog type
    specific *__is_valid_access validator back to verifier.
  . The non-zero ctx_field_size for a memory access indicates
    (1). underlying prog type specific convert_ctx_accesses
         supporting non-whole-field access
    (2). the current insn is a narrower or whole field access.
  . In verifier, for such loads where load memory size is
    less than ctx_field_size, verifier transforms it
    to a full field load followed by proper masking.
  . Currently, __sk_buff and bpf_perf_event_data->sample_period
    are supporting narrowing loads.
  . Narrower stores are still not allowed as typical ctx stores
    are just normal stores.

Because of this change, some tests in verifier will fail and
these tests are removed. As a bonus, rename some out of bound
__sk_buff->cb access to proper field name and remove two
redundant "skb cb oob" tests.
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NYonghong Song <yhs@fb.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

31fd8581

14 6月, 2017 3 次提交

of_mdio: move of_mdio_parse_addr to header file · 4798a714

由 Jon Mason 提交于 6月 13, 2017

The of_mdio_parse_addr() helper function is useful to other code, but
the module dependency chain causes issues.  To work around this, we can
move of_mdio_parse_addr() to be an inline function in the header file.
This gets rid of the dependencies and still allows for the reuse of
code.
Reported-by: NLiviu Dudau <liviu@dudau.co.uk>
Signed-off-by: NJon Mason <jon.mason@broadcom.com>
Fixes: 342fa196 ("mdio: mux: make child bus walking more permissive and errors more verbose")
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4798a714

net: phy: Make phy_ethtool_ksettings_get return void · 5514174f

由 yuval.shaia@oracle.com 提交于 6月 13, 2017

Make return value void since function never return meaningfull value
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Acked-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5514174f

mdio_bus: handle only single PHY reset GPIO · d396e84c

由 Sergei Shtylyov 提交于 6月 12, 2017

Commit 4c5e7a2c ("dt-bindings: mdio: Clarify binding document")
declared that a MDIO reset GPIO property should have only a single GPIO
reference/specifier, however the supporting code was left intact, still
burdening the kernel with now apparently useless loops -- get rid of them.
Signed-off-by: NSergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d396e84c

13 6月, 2017 2 次提交

cfg80211: support 4-way handshake offloading for 802.1X · 3a00df57

由 Avraham Stern 提交于 6月 09, 2017

Add API for setting the PMK to the driver. For FT support, allow
setting also the PMK-R0 Name.

This can be used by drivers that support 4-Way handshake offload
while IEEE802.1X authentication is managed by upper layers.
Signed-off-by: NAvraham Stern <avraham.stern@intel.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
[arend.vanspriel@broadcom.com: add WANT_1X_4WAY_HS attribute]
Signed-off-by: NArend van Spriel <arend.vanspriel@broadcom.com>
[reword NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_1X docs a bit to
say that the device may require it]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

3a00df57

cfg80211: support 4-way handshake offloading for WPA/WPA2-PSK · 91b5ab62

由 Eliad Peller 提交于 6月 09, 2017

Let drivers advertise support for station-mode 4-way handshake
offloading with a new NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_PSK flag.

Extend use of NL80211_ATTR_PMK attribute indicating it might be passed
as part of NL80211_CMD_CONNECT command, and contain the PSK (which is
the PMK, hence the name.)

The driver/device is assumed to handle the 4-way handshake by
itself in this case (including key derivations, etc.), instead
of relying on the supplicant.

This patch is somewhat based on this one (by Vladimir Kondratiev):
https://patchwork.kernel.org/patch/1309561/.
Signed-off-by: NVladimir Kondratiev <qca_vkondrat@qca.qualcomm.com>
Signed-off-by: NEliad Peller <eliadx.peller@intel.com>
Signed-off-by: NLuca Coelho <luciano.coelho@intel.com>
[arend.vanspriel@broadcom.com rebase dealing with existing ATTR_PMK]
Signed-off-by: NArend van Spriel <arend.vanspriel@broadcom.com>
[reword NL80211_EXT_FEATURE_4WAY_HANDSHAKE_STA_PSK docs to indicate
that this offload might be required]
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>

91b5ab62

12 6月, 2017 3 次提交

udp: avoid a cache miss on dequeue · 0a463c78

由 Paolo Abeni 提交于 6月 12, 2017

Since UDP no more uses sk->destructor, we can clear completely
the skb head state before enqueuing. Amend and use
skb_release_head_state() for that.

All head states share a single cacheline, which is not
normally used/accesses on dequeue. We can avoid entirely accessing
such cacheline implementing and using in the UDP code a specialized
skb free helper which ignores the skb head state.

This saves a cacheline miss at skb deallocation time.

v1 -> v2:
  replaced secpath_reset() with skb_release_head_state()
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a463c78

net: factor out a helper to decrement the skb refcount · 3889a803

由 Paolo Abeni 提交于 6月 12, 2017

The same code is replicated in 3 different places; move it to a
common helper.
Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3889a803

compiler, clang: properly override 'inline' for clang · 6d53cefb

由 Linus Torvalds 提交于 6月 11, 2017

Commit abb2ea7d ("compiler, clang: suppress warning for unused
static inline functions") just caused more warnings due to re-defining
the 'inline' macro.

So undef it before re-defining it, and also add the 'notrace' attribute
like the gcc version that this is overriding does.

Maybe this makes clang happier.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6d53cefb

10 6月, 2017 5 次提交

qed*: LL2 callback operations · 0518c12f

由 Michal Kalderon 提交于 6月 09, 2017

LL2 today is interrupt driven - when tx/rx completion arrives [or any
other indication], qed needs to operate on the connection and pass
the information to the protocol-driver [or internal qed consumer].
Since we have several flavors of ll2 employeed by the driver,
each handler needs to do an if-else to determine the right functionality
to use based on the connection type.

In order to make things more scalable [given that we're going to add
additional types of ll2 flavors] move the infrastrucutre into using
a callback-based approach - the callbacks would be provided as part
of the connection's initialization parameters.
Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0518c12f

qed: Cleaner seperation of LL2 inputs · 13c54771

由 Mintz, Yuval 提交于 6月 09, 2017

A LL2 connection [qed_ll2_info] has a sub-structure of type qed_ll2_conn
that contain various inputs for ll2 acquisition, but the connection also
utilizes a couple of other inputs.

Restructure the input structure to include all the inputs and refactor
the code necessary to populate those.
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13c54771

qed: Revise ll2 Rx completion · 68be910c

由 Mintz, Yuval 提交于 6月 09, 2017

This introduces qed_ll2_comp_rx_data as a public struct
and moves handling of Rx packets in LL2 into using it.
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68be910c

qed: LL2 to use packed information for tx · 7c7973b2

由 Mintz, Yuval 提交于 6月 09, 2017

First step in revising the LL2 interface, this declares
qed_ll2_tx_pkt_info as part of the ll2 interface, and uses it for
transmission instead of receiving lots of parameters.
Signed-off-by: NYuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c7973b2

Ipvlan should return an error when an address is already in use. · 3ad7d246

由 Krister Johansen 提交于 6月 08, 2017

The ipvlan code already knows how to detect when a duplicate address is
about to be assigned to an ipvlan device. However, that failure is not
propogated outward and leads to a silent failure.

Introduce a validation step at ip address creation time and allow device
drivers to register to validate the incoming ip addresses. The ipvlan
code is the first consumer. If it detects an address in use, we can
return an error to the user before beginning to commit the new ifa in
the networking code.

This can be especially useful if it is necessary to provision many
ipvlans in containers. The provisioning software (or operator) can use
this to detect situations where an ip address is unexpectedly in use.
Signed-off-by: NKrister Johansen <kjlx@templeofstupid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3ad7d246

09 6月, 2017 1 次提交

KEYS: sanitize key structs before freeing · 0620fddb

由 Eric Biggers 提交于 6月 08, 2017

While a 'struct key' itself normally does not contain sensitive
information, Documentation/security/keys.txt actually encourages this:

"Having a payload is not required; and the payload can, in fact,
just be a value stored in the struct key itself."

In case someone has taken this advice, or will take this advice in the
future, zero the key structure before freeing it. We might as well, and
as a bonus this could make it a bit more difficult for an adversary to
determine which keys have recently been in use.

This is safe because the key_jar cache does not use a constructor.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NJames Morris <james.l.morris@oracle.com>

0620fddb

08 6月, 2017 8 次提交

srcu: Allow use of Classic SRCU from both process and interrupt context · 1123a604

由 Paolo Bonzini 提交于 5月 31, 2017

Linu Cherian reported a WARN in cleanup_srcu_struct() when shutting
down a guest running iperf on a VFIO assigned device.  This happens
because irqfd_wakeup() calls srcu_read_lock(&kvm->irq_srcu) in interrupt
context, while a worker thread does the same inside kvm_set_irq().  If the
interrupt happens while the worker thread is executing __srcu_read_lock(),
updates to the Classic SRCU ->lock_count[] field or the Tree SRCU
->srcu_lock_count[] field can be lost.

The docs say you are not supposed to call srcu_read_lock() and
srcu_read_unlock() from irq context, but KVM interrupt injection happens
from (host) interrupt context and it would be nice if SRCU supported the
use case.  KVM is using SRCU here not really for the "sleepable" part,
but rather due to its IPI-free fast detection of grace periods.  It is
therefore not desirable to switch back to RCU, which would effectively
revert commit 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING",
2014-01-16).

However, the docs are overly conservative.  You can have an SRCU instance
only has users in irq context, and you can mix process and irq context
as long as process context users disable interrupts.  In addition,
__srcu_read_unlock() actually uses this_cpu_dec() on both Tree SRCU and
Classic SRCU.  For those two implementations, only srcu_read_lock()
is unsafe.

When Classic SRCU's __srcu_read_unlock() was changed to use this_cpu_dec(),
in commit 5a41344a ("srcu: Simplify __srcu_read_unlock() via
this_cpu_dec()", 2012-11-29), __srcu_read_lock() did two increments.
Therefore it kept __this_cpu_inc(), with preempt_disable/enable in
the caller.  Tree SRCU however only does one increment, so on most
architectures it is more efficient for __srcu_read_lock() to use
this_cpu_inc(), and any performance differences appear to be down in
the noise.

Cc: stable@vger.kernel.org
Fixes: 719d93cd ("kvm/irqchip: Speed up KVM_SET_GSI_ROUTING")
Reported-by: NLinu Cherian <linuc.decode@gmail.com>
Suggested-by: NLinu Cherian <linuc.decode@gmail.com>
Cc: kvm@vger.kernel.org
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>

1123a604

net: ipv6: Release route when device is unregistering · 8397ed36

由 David Ahern 提交于 6月 07, 2017

Roopa reported attempts to delete a bond device that is referenced in a
multipath route is hanging:

$ ifdown bond2    # ifupdown2 command that deletes virtual devices
unregister_netdevice: waiting for bond2 to become free. Usage count = 2

Steps to reproduce:
    echo 1 > /proc/sys/net/ipv6/conf/all/ignore_routes_with_linkdown
    ip link add dev bond12 type bond
    ip link add dev bond13 type bond
    ip addr add 2001:db8:2::0/64 dev bond12
    ip addr add 2001:db8:3::0/64 dev bond13
    ip route add 2001:db8:33::0/64 nexthop via 2001:db8:2::2 nexthop via 2001:db8:3::2
    ip link del dev bond12
    ip link del dev bond13

The root cause is the recent change to keep routes on a linkdown. Update
the check to detect when the device is unregistering and release the
route for that case.

Fixes: a1a22c12 ("net: ipv6: Keep nexthop of multipath route on admin down")
Reported-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8397ed36

net: propagate tc filter chain index down the ndo_setup_tc call · a5fcf8a6

由 Jiri Pirko 提交于 6月 06, 2017

We need to push the chain index down to the drivers, so they have the
information to which chain the rule belongs. For now, no driver supports
multichain offload, so only chain 0 is supported. This is needed to
prevent chain squashes during offload for now. Later this will be used
to implement multichain offload.
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5fcf8a6

net/mlx5e: Add support for reading connector type from PTYS · 5b4793f8

由 Eran Ben Elisha 提交于 2月 13, 2017

Read port connector type from the firmware instead of caching it in the
driver metadata.
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

5b4793f8

net/mlx5: Update flow table commands layout · 0c90e9c6

由 Maor Gottlieb 提交于 3月 12, 2017

Update struct mlx5_ifc_create(modify)_flow_table_bits according to
the last device specification.
Signed-off-by: NMaor Gottlieb <maorg@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>

0c90e9c6

net: Fix inconsistent teardown and release of private netdev state. · cf124db5

由 David S. Miller 提交于 5月 08, 2017

Network devices can allocate reasources and private memory using
netdev_ops->ndo_init().  However, the release of these resources
can occur in one of two different places.

Either netdev_ops->ndo_uninit() or netdev->destructor().

The decision of which operation frees the resources depends upon
whether it is necessary for all netdev refs to be released before it
is safe to perform the freeing.

netdev_ops->ndo_uninit() presumably can occur right after the
NETDEV_UNREGISTER notifier completes and the unicast and multicast
address lists are flushed.

netdev->destructor(), on the other hand, does not run until the
netdev references all go away.

Further complicating the situation is that netdev->destructor()
almost universally does also a free_netdev().

This creates a problem for the logic in register_netdevice().
Because all callers of register_netdevice() manage the freeing
of the netdev, and invoke free_netdev(dev) if register_netdevice()
fails.

If netdev_ops->ndo_init() succeeds, but something else fails inside
of register_netdevice(), it does call ndo_ops->ndo_uninit().  But
it is not able to invoke netdev->destructor().

This is because netdev->destructor() will do a free_netdev() and
then the caller of register_netdevice() will do the same.

However, this means that the resources that would normally be released
by netdev->destructor() will not be.

Over the years drivers have added local hacks to deal with this, by
invoking their destructor parts by hand when register_netdevice()
fails.

Many drivers do not try to deal with this, and instead we have leaks.

Let's close this hole by formalizing the distinction between what
private things need to be freed up by netdev->destructor() and whether
the driver needs unregister_netdevice() to perform the free_netdev().

netdev->priv_destructor() performs all actions to free up the private
resources that used to be freed by netdev->destructor(), except for
free_netdev().

netdev->needs_free_netdev is a boolean that indicates whether
free_netdev() should be done at the end of unregister_netdevice().

Now, register_netdevice() can sanely release all resources after
ndo_ops->ndo_init() succeeds, by invoking both ndo_ops->ndo_uninit()
and netdev->priv_destructor().

And at the end of unregister_netdevice(), we invoke
netdev->priv_destructor() and optionally call free_netdev().
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cf124db5

rxrpc: Provide a cmsg to specify the amount of Tx data for a call · e754eba6

由 David Howells 提交于 6月 07, 2017

Provide a control message that can be specified on the first sendmsg() of a
client call or the first sendmsg() of a service response to indicate the
total length of the data to be transmitted for that call.

Currently, because the length of the payload of an encrypted DATA packet is
encrypted in front of the data, the packet cannot be encrypted until we
know how much data it will hold.

By specifying the length at the beginning of the transmit phase, each DATA
packet length can be set before we start loading data from userspace (where
several sendmsg() calls may contribute to a particular packet).

An error will be returned if too little or too much data is presented in
the Tx phase.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

e754eba6

D
rxrpc: Provide a getsockopt call to query what cmsgs types are supported · 515559ca
由 David Howells 提交于 6月 07, 2017
```
Provide a getsockopt() call that can query what cmsg types are supported by
AF_RXRPC.
```
515559ca

07 6月, 2017 11 次提交

net: phy: add XAUI and 10GBASE-KR PHY connection types · c125ca09

由 Russell King 提交于 6月 05, 2017

XAUI allows XGMII to reach an extended distance by using a XGXS layer at
each end of the MAC to PHY link, operating over four Serdes lanes.

10GBASE-KR is a single lane Serdes backplane ethernet connection method
with autonegotiation on the link.  Some PHYs use this to connect to the
ethernet interface at 10G speeds, switching to other connection types
when utilising slower speeds.

10GBASE-KR is also used for XFI and SFI to connect to XFP and SFP fiber
modules.
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c125ca09

net: phy: hook up clause 45 autonegotiation restart · 002ba705

由 Russell King 提交于 6月 05, 2017

genphy_restart_aneg() can only restart autonegotiation on clause 22
PHYs.  Add a phy_restart_aneg() function which selects between the
clause 22 and clause 45 restart functionality depending on the PHY
type and whether the Clause 45 PHY supports the Clause 22 register set.
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

002ba705

net: phy: add 802.3 clause 45 support to phylib · 5acde34a

由 Russell King 提交于 6月 05, 2017

Add generic helpers for 802.3 clause 45 PHYs for >= 10Gbps support.
Reviewed-by: NAndrew Lunn <andrew@lunn.ch>
Signed-off-by: NRussell King <rmk+kernel@armlinux.org.uk>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5acde34a

Revert "ACPI / sleep: Ignore spurious SCI wakeups from suspend-to-idle" · f3b7eaae

由 Rafael J. Wysocki 提交于 6月 07, 2017

Revert commit eed4d47e (ACPI / sleep: Ignore spurious SCI wakeups
from suspend-to-idle) as it turned out to be premature and triggered
a number of different issues on various systems.

That includes, but is not limited to, premature suspend-to-RAM aborts
on Dell XPS 13 (9343) reported by Dominik.

The issue the commit in question attempted to address is real and
will need to be taken care of going forward, but evidently more work
is needed for this purpose.
Reported-by: NDominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

f3b7eaae

compiler, clang: suppress warning for unused static inline functions · abb2ea7d

由 David Rientjes 提交于 6月 06, 2017

GCC explicitly does not warn for unused static inline functions for
-Wunused-function.  The manual states:

	Warn whenever a static function is declared but not defined or
	a non-inline static function is unused.

Clang does warn for static inline functions that are unused.

It turns out that suppressing the warnings avoids potentially complex
#ifdef directives, which also reduces LOC.

Suppress the warning for clang.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

abb2ea7d

bpf: Add BPF_OBJ_GET_INFO_BY_FD · 1e270976

由 Martin KaFai Lau 提交于 6月 05, 2017

A single BPF_OBJ_GET_INFO_BY_FD cmd is used to obtain the info
for both bpf_prog and bpf_map.  The kernel can figure out the
fd is associated with a bpf_prog or bpf_map.

The suggested struct bpf_prog_info and struct bpf_map_info are
not meant to be a complete list and it is not the goal of this patch.
New fields can be added in the future patch.

The focus of this patch is to create the interface,
BPF_OBJ_GET_INFO_BY_FD cmd for exposing the bpf_prog's and
bpf_map's info.

The obj's info, which will be extended (and get bigger) over time, is
separated from the bpf_attr to avoid bloating the bpf_attr.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1e270976

bpf: Add jited_len to struct bpf_prog · 783d28dd

由 Martin KaFai Lau 提交于 6月 05, 2017

Add jited_len to struct bpf_prog.  It will be
useful for the struct bpf_prog_info which will
be added in the later patch.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

783d28dd

bpf: Introduce bpf_map ID · f3f1c054

由 Martin KaFai Lau 提交于 6月 05, 2017

This patch generates an unique ID for each created bpf_map.
The approach is similar to the earlier patch for bpf_prog ID.

It is worth to note that the bpf_map's ID and bpf_prog's ID
are in two independent ID spaces and both have the same valid range:
[1, INT_MAX).
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3f1c054

bpf: Introduce bpf_prog ID · dc4bb0e2

由 Martin KaFai Lau 提交于 6月 05, 2017

This patch generates an unique ID for each BPF_PROG_LOAD-ed prog.
It is worth to note that each BPF_PROG_LOAD-ed prog will have
a different ID even they have the same bpf instructions.

The ID is generated by the existing idr_alloc_cyclic().
The ID is ranged from [1, INT_MAX).  It is allocated in cyclic manner,
so an ID will get reused every 2 billion BPF_PROG_LOAD.

The bpf_prog_alloc_id() is done after bpf_prog_select_runtime()
because the jit process may have allocated a new prog.  Hence,
we need to ensure the value of pointer 'prog' will not be changed
any more before storing the prog to the prog_idr.

After bpf_prog_select_runtime(), the prog is read-only.  Hence,
the id is stored in 'struct bpf_prog_aux'.
Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
Acked-by: NAlexei Starovoitov <ast@fb.com>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

dc4bb0e2

net: phy: Delete unused function phy_ethtool_gset · f8fe9975

由 yuval.shaia@oracle.com 提交于 6月 05, 2017

It's unused, so remove it.
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8fe9975

elevator: fix truncation of icq_cache_name · 9bd2bbc0

由 Eric Biggers 提交于 6月 02, 2017

gcc 7.1 reports the following warning:

    block/elevator.c: In function ‘elv_register’:
    block/elevator.c:898:5: warning: ‘snprintf’ output may be truncated before the last format character [-Wformat-truncation=]
         "%s_io_cq", e->elevator_name);
         ^~~~~~~~~~
    block/elevator.c:897:3: note: ‘snprintf’ output between 7 and 22 bytes into a destination of size 21
       snprintf(e->icq_cache_name, sizeof(e->icq_cache_name),
       ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         "%s_io_cq", e->elevator_name);
         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The bug is that the name of the icq_cache is 6 characters longer than
the elevator name, but only ELV_NAME_MAX + 5 characters were reserved
for it --- so in the case of a maximum-length elevator name, the 'q'
character in "_io_cq" would be truncated by snprintf().  Fix it by
reserving ELV_NAME_MAX + 6 characters instead.
Signed-off-by: NEric Biggers <ebiggers@google.com>
Reviewed-by: NBart Van Assche <Bart.VanAssche@sandisk.com>
Signed-off-by: NJens Axboe <axboe@fb.com>

9bd2bbc0

05 6月, 2017 3 次提交

net/{mii, smsc}: Make mii_ethtool_get_link_ksettings and smc_netdev_get_ecmd return void · 82c01a84

由 yuval.shaia@oracle.com 提交于 6月 04, 2017

Make return value void since functions never returns meaningfull value.
Signed-off-by: NYuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

82c01a84

rxrpc: Add service upgrade support for client connections · 4e255721

由 David Howells 提交于 6月 05, 2017

Make it possible for a client to use AuriStor's service upgrade facility.

The client does this by adding an RXRPC_UPGRADE_SERVICE control message to
the first sendmsg() of a call.  This takes no parameters.

When recvmsg() starts returning data from the call, the service ID field in
the returned msg_name will reflect the result of the upgrade attempt.  If
the upgrade was ignored, srx_service will match what was set in the
sendmsg(); if the upgrade happened the srx_service will be altered to
indicate the service the server upgraded to.

Note that:

 (1) The choice of upgrade service is up to the server

 (2) Further client calls to the same server that would share a connection
     are blocked if an upgrade probe is in progress.

 (3) This should only be used to probe the service.  Clients should then
     use the returned service ID in all subsequent communications with that
     server (and not set the upgrade).  Note that the kernel will not
     retain this information should the connection expire from its cache.

 (4) If a server that supports upgrading is replaced by one that doesn't,
     whilst a connection is live, and if the replacement is running, say,
     OpenAFS 1.6.4 or older or an older IBM AFS, then the replacement
     server will not respond to packets sent to the upgraded connection.

     At this point, calls will time out and the server must be reprobed.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

4e255721

rxrpc: Implement service upgrade · 4722974d

由 David Howells 提交于 6月 05, 2017

Implement AuriStor's service upgrade facility.  There are three problems
that this is meant to deal with:

 (1) Various of the standard AFS RPC calls have IPv4 addresses in their
     requests and/or replies - but there's no room for including IPv6
     addresses.

 (2) Definition of IPv6-specific RPC operations in the standard operation
     sets has not yet been achieved.

 (3) One could envision the creation a new service on the same port that as
     the original service.  The new service could implement improved
     operations - and the client could try this first, falling back to the
     original service if it's not there.

     Unfortunately, certain servers ignore packets addressed to a service
     they don't implement and don't respond in any way - not even with an
     ABORT.  This means that the client must then wait for the call timeout
     to occur.

What service upgrade does is to see if the connection is marked as being
'upgradeable' and if so, change the service ID in the server and thus the
request and reply formats.  Note that the upgrade isn't mandatory - a
server that supports only the original call set will ignore the upgrade
request.

In the protocol, the procedure is then as follows:

 (1) To request an upgrade, the first DATA packet in a new connection must
     have the userStatus set to 1 (this is normally 0).  The userStatus
     value is normally ignored by the server.

 (2) If the server doesn't support upgrading, the reply packets will
     contain the same service ID as for the first request packet.

 (3) If the server does support upgrading, all future reply packets on that
     connection will contain the new service ID and the new service ID will
     be applied to *all* further calls on that connection as well.

 (4) The RPC op used to probe the upgrade must take the same request data
     as the shadow call in the upgrade set (but may return a different
     reply).  GetCapability RPC ops were added to all standard sets for
     just this purpose.  Ops where the request formats differ cannot be
     used for probing.

 (5) The client must wait for completion of the probe before sending any
     further RPC ops to the same destination.  It should then use the
     service ID that recvmsg() reported back in all future calls.

 (6) The shadow service must have call definitions for all the operation
     IDs defined by the original service.


To support service upgrading, a server should:

 (1) Call bind() twice on its AF_RXRPC socket before calling listen().
     Each bind() should supply a different service ID, but the transport
     addresses must be the same.  This allows the server to receive
     requests with either service ID.

 (2) Enable automatic upgrading by calling setsockopt(), specifying
     RXRPC_UPGRADEABLE_SERVICE and passing in a two-member array of
     unsigned shorts as the argument:

	unsigned short optval[2];

     This specifies a pair of service IDs.  They must be different and must
     match the service IDs bound to the socket.  Member 0 is the service ID
     to upgrade from and member 1 is the service ID to upgrade to.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

4722974d

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功