提交 · 2da897e51d7f23f872224218b9a4314503b0d360 · openeuler / Kernel

25 2月, 2016 1 次提交

bpf: fix csum setting for bpf_set_tunnel_key · 2da897e5

由 Daniel Borkmann 提交于 2月 23, 2016

The fix in 35e2d115 ("tunnels: Allow IPv6 UDP checksums to be correctly
controlled.") changed behavior for bpf_set_tunnel_key() when in use with
IPv6 and thus uncovered a bug that TUNNEL_CSUM needed to be set but wasn't.
As a result, the stack dropped ingress vxlan IPv6 packets, that have been
sent via eBPF through collect meta data mode due to checksum now being zero.

Since after LCO, we enable IPv4 checksum by default, so make that analogous
and only provide a flag BPF_F_ZERO_CSUM_TX for the user to turn it off in
IPv4 case.

Fixes: 35e2d115 ("tunnels: Allow IPv6 UDP checksums to be correctly controlled.")
Fixes: c6c33454 ("bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2da897e5

31 1月, 2016 1 次提交

block: revert runtime dax control of the raw block device · 9f4736fe

由 Dan Williams 提交于 1月 28, 2016

Dynamically enabling DAX requires that the page cache first be flushed
and invalidated.  This must occur atomically with the change of DAX mode
otherwise we confuse the fsync/msync tracking and violate data
durability guarantees.  Eliminate the possibilty of DAX-disabled to
DAX-enabled transitions for now and revisit this for the next cycle.

Cc: Jan Kara <jack@suse.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

9f4736fe

21 1月, 2016 1 次提交

epoll: add EPOLLEXCLUSIVE flag · df0108c5

由 Jason Baron 提交于 1月 20, 2016

Currently, epoll file descriptors or epfds (the fd returned from
epoll_create[1]()) that are added to a shared wakeup source are always
added in a non-exclusive manner.  This means that when we have multiple
epfds attached to a shared fd source they are all woken up.  This creates
thundering herd type behavior.

Introduce a new 'EPOLLEXCLUSIVE' flag that can be passed as part of the
'event' argument during an epoll_ctl() EPOLL_CTL_ADD operation.  This new
flag allows for exclusive wakeups when there are multiple epfds attached
to a shared fd event source.

The implementation walks the list of exclusive waiters, and queues an
event to each epfd, until it finds the first waiter that has threads
blocked on it via epoll_wait().  The idea is to search for threads which
are idle and ready to process the wakeup events.  Thus, we queue an event
to at least 1 epfd, but may still potentially queue an event to all epfds
that are attached to the shared fd source.

Performance testing was done by Madars Vitolins using a modified version
of Enduro/X.  The use of the 'EPOLLEXCLUSIVE' flag reduce the length of
this particular workload from 860s down to 24s.

Sample epoll_clt text:

EPOLLEXCLUSIVE

  Sets an exclusive wakeup mode for the epfd file descriptor that is
  being attached to the target file descriptor, fd.  Thus, when an event
  occurs and multiple epfd file descriptors are attached to the same
  target file using EPOLLEXCLUSIVE, one or more epfds will receive an
  event with epoll_wait(2).  The default in this scenario (when
  EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
  EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
Signed-off-by: NJason Baron <jbaron@akamai.com>
Tested-by: NMadars Vitolins <m@silodev.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

df0108c5

14 1月, 2016 1 次提交

uapi: update install list after nvme.h rename · a9cf8284

由 Mike Frysinger 提交于 1月 10, 2016

Commit 9d99a8dd ("nvme: move hardware structures out of the uapi
version of nvme.h") renamed nvme.h to nvme_ioctl.h, but the uapi list
still refers to nvme.h.  People trying to install the headers hit a
failure as the header no longer exists.

Cc: stable@vger.kernel.org
Signed-off-by: NMike Frysinger <vapier@gentoo.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

a9cf8284

12 1月, 2016 5 次提交

lightnvm: introduce factory reset · 8b4970c4

由 Matias Bjørling 提交于 1月 12, 2016

Now that a device can be managed using the system blocks, a method to
reset the device is necessary as well. This patch introduces logic to
reset the device easily to factory state and exposes it through an
ioctl.

The ioctl takes the following flags:

  NVM_FACTORY_ERASE_ONLY_USER
      By default all blocks, except host-reserved blocks are erased upon
      factory reset. Instead of this, only erase host-reserved blocks.
  NVM_FACTORY_RESET_HOST_BLKS
      Mark host-reserved blocks to be erased and set their type to free.
  NVM_FACTORY_RESET_GRWN_BBLKS
      Mark "grown bad blocks" to be erased and set their type to free.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

8b4970c4

lightnvm: introduce ioctl to initialize device · 55696154

由 Matias Bjørling 提交于 1月 12, 2016

Based on the previous patch, we now introduce an ioctl to initialize the
device using nvm_init_sysblock and create the necessary system blocks.
The user may specify the media manager that they wish to instantiate on
top. Default from user-space will be "gennvm".
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

55696154

lightnvm: core on-disk initialization · e3eb3799

由 Matias Bjørling 提交于 1月 12, 2016

An Open-Channel SSD shall be initialized before use. To initialize, we
define an on-disk format, that keeps a small set of metadata to bring up
the media manager on top of the device.

The initial step is introduced to allow a user to format the disks for a
given media manager. During format, a system block is stored on one to
three separate luns on the device. Each lun has the system block
duplicated. During initialization, the system block can be retrieved and
the appropriate media manager can initialized.

The on-disk format currently covers (struct nvm_system_block):

 - Magic value "NVMS".
 - Monotonic increasing sequence number.
 - The physical block erase count.
 - Version of the system block format.
 - Media manager type.
 - Media manager superblock physical address.

The interface provides three functions to manage the system block:

 int nvm_init_sysblock(struct nvm_dev *, struct nvm_sb_info *)
 int nvm_get_sysblock(struct nvm *dev, struct nvm_sb_info *)
 int nvm_update_sysblock(struct nvm *dev, struct nvm_sb_info *)

Each implement a part of the logic to manage the system block. The
initialization creates the first system blocks and mark them on the
device. Get retrieves the latest system block by scanning all pages in
the associated system blocks. The update sysblock writes new metadata
and allocates new block if necessary.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

e3eb3799

bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key · c6c33454

由 Daniel Borkmann 提交于 1月 11, 2016

After IPv6 support has recently been added to metadata dst and related
encaps, add support for populating/reading it from an eBPF program.

Commit d3aa45ce ("bpf: add helpers to access tunnel metadata") started
with initial IPv4-only support back then (due to IPv6 metadata support
not being available yet).

To stay compatible with older programs, we need to test for the passed
structure size. Also TOS and TTL support from the ip_tunnel_info key has
been added. Tested with vxlan devs in collect meta data mode with IPv4,
IPv6 and in compat mode over different network namespaces.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6c33454

bpf: export helper function flags and reject invalid ones · 781c53bc

由 Daniel Borkmann 提交于 1月 11, 2016

Export flags used by eBPF helper functions through UAPI, so they can be
used by programs (instead of them redefining all flags each time or just
using the hard-coded values). It also gives a better overview what flags
are used where and we can further get rid of the extra macros defined in
filter.c. Moreover, reject invalid flags.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

781c53bc

11 1月, 2016 16 次提交

[media] Postpone the addition of MEDIA_IOC_G_TOPOLOGY · be0270ec

由 Mauro Carvalho Chehab 提交于 12月 28, 2015

There are a few discussions left with regards to this ioctl:

1) the name of the new structs will contain _v2_ on it?
2) what's the best alternative to avoid compat32 issues?

Due to that, let's postpone the addition of this new ioctl to
the next Kernel version, to give people more time to discuss it.
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

be0270ec

[media] uapi/media.h: Use u32 for the number of graph objects · 7c9d6731

由 Mauro Carvalho Chehab 提交于 12月 17, 2015

While we need to keep a u64 alignment to avoid compat32 issues,
having the number of entities/pads/links/interfaces represented
by an u64 is incoherent with the ID number, with is an u32.

In order to make it coherent, change those quantities to u32.
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

7c9d6731

[media] media.h: let be clear that tuners need to use connectors · b3109d66

由 Mauro Carvalho Chehab 提交于 12月 11, 2015

The V4L2 core won't be adding connectors to the tuners and other
entities that need them. Let it be clear.
Suggested-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

b3109d66

[media] media-device: Use u64 ints for pointers · b3b7a9f1

由 Mauro Carvalho Chehab 提交于 12月 11, 2015

By using u64 integers and pointers, we can get rid of compat32
logic. So, let's do it!
Suggested-by: NArnd Bergmann <arnd@arndb.de>
Suggested-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

b3b7a9f1

[media] media-device.h: Let clearer that entity function must be initialized · d1b9da2d

由 Mauro Carvalho Chehab 提交于 12月 11, 2015

Improve the documentation to let it clear that the entity function
must be initialized.
Suggested-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

d1b9da2d

[media] uapi/media.h: Rename entities types to functions · 4ca72efa

由 Mauro Carvalho Chehab 提交于 12月 10, 2015

Rename the userspace types from MEDIA_ENT_T_ to MEDIA_ENT_F_
and add the backward compatibility bits.

The changes at the .c files was generated by the following
coccinelle script:

@@
@@
-MEDIA_ENT_T_UNKNOWN
+MEDIA_ENT_F_UNKNOWN
@@
@@
-MEDIA_ENT_T_DVB_BASE
+MEDIA_ENT_F_DVB_BASE
@@
@@
-MEDIA_ENT_T_V4L2_BASE
+MEDIA_ENT_F_V4L2_BASE
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_BASE
+MEDIA_ENT_F_V4L2_SUBDEV_BASE
@@
@@
-MEDIA_ENT_T_CONNECTOR_BASE
+MEDIA_ENT_F_CONNECTOR_BASE
@@
@@
-MEDIA_ENT_T_V4L2_VIDEO
+MEDIA_ENT_F_IO_V4L
@@
@@
-MEDIA_ENT_T_V4L2_VBI
+MEDIA_ENT_F_IO_VBI
@@
@@
-MEDIA_ENT_T_V4L2_SWRADIO
+MEDIA_ENT_F_IO_SWRADIO
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_UNKNOWN
+MEDIA_ENT_F_V4L2_SUBDEV_UNKNOWN
@@
@@
-MEDIA_ENT_T_CONN_RF
+MEDIA_ENT_F_CONN_RF
@@
@@
-MEDIA_ENT_T_CONN_SVIDEO
+MEDIA_ENT_F_CONN_SVIDEO
@@
@@
-MEDIA_ENT_T_CONN_COMPOSITE
+MEDIA_ENT_F_CONN_COMPOSITE
@@
@@
-MEDIA_ENT_T_CONN_TEST
+MEDIA_ENT_F_CONN_TEST
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_SENSOR
+MEDIA_ENT_F_CAM_SENSOR
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_FLASH
+MEDIA_ENT_F_FLASH
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_LENS
+MEDIA_ENT_F_LENS
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_DECODER
+MEDIA_ENT_F_ATV_DECODER
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_TUNER
+MEDIA_ENT_F_TUNER
@@
@@
-MEDIA_ENT_T_DVB_DEMOD
+MEDIA_ENT_F_DTV_DEMOD
@@
@@
-MEDIA_ENT_T_DVB_DEMUX
+MEDIA_ENT_F_TS_DEMUX
@@
@@
-MEDIA_ENT_T_DVB_TSOUT
+MEDIA_ENT_F_IO_DTV
@@
@@
-MEDIA_ENT_T_DVB_CA
+MEDIA_ENT_F_DTV_CA
@@
@@
-MEDIA_ENT_T_DVB_NET_DECAP
+MEDIA_ENT_F_DTV_NET_DECAP
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

4ca72efa

[media] media-device: export the entity function via new ioctl · d87cdb88

由 Mauro Carvalho Chehab 提交于 9月 06, 2015

Now that entities have a main function, expose it via
MEDIA_IOC_G_TOPOLOGY ioctl.

Please notice that some entities may have secundary functions.
Such use case will be addressed later, when we add support for the
Media Controller properties.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

d87cdb88

[media] media.h: create connector entities for hybrid TV devices · 49a11518

由 Mauro Carvalho Chehab 提交于 8月 31, 2015

Add entities to represent the connectors that exists inside a
hybrid TV device.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

49a11518

[media] uapi/media.h: Add MEDIA_IOC_G_TOPOLOGY ioctl · c398bb64

由 Mauro Carvalho Chehab 提交于 8月 23, 2015

Add a new ioctl that will report the entire topology on
one go.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

c398bb64

[media] media.h: don't use legacy entity macros at Kernel · 4376679a

由 Mauro Carvalho Chehab 提交于 8月 29, 2015

Put the legacy MEDIA_ENT_* macros under a #ifndef __KERNEL__,
in order to be sure that none of those old symbols are used
inside the Kernel.
Acked-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

4376679a

[media] media controller: get rid of entity subtype on Kernel · 687b4209

由 Mauro Carvalho Chehab 提交于 5月 07, 2015

Don't use anymore the type/subtype entity data/macros
inside the Kernel.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Acked-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

687b4209

[media] v4l2-subdev: use MEDIA_ENT_T_UNKNOWN for new subdevs · b50bde4e

由 Mauro Carvalho Chehab 提交于 5月 07, 2015

Instead of abusing MEDIA_ENT_T_V4L2_SUBDEV, initialize
new subdev entities as MEDIA_ENT_T_UNKNOWN.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

b50bde4e

[media] media: add macros to check if subdev or V4L2 DMA · fa17b46a

由 Mauro Carvalho Chehab 提交于 8月 21, 2015

As we'll be removing entity subtypes from the Kernel, we need
to provide a way for drivers and core to check if a given
entity is represented by a V4L2 subdev or if it is an V4L2
I/O entity (typically with DMA).

Drivers that create entities that don't belong to any defined subdev
category should use MEDIA_ENT_T_V4L2_SUBDEV_UNKNOWN.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

fa17b46a

[media] uapi/media.h: Fix entity namespace · 32fdc0e1

由 Mauro Carvalho Chehab 提交于 8月 21, 2015

Now that interfaces got created, we need to fix the entity
namespace.

So, let's create a consistent new namespace and add backward
compatibility macros to keep the old namespace preserved.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

32fdc0e1

[media] uapi/media.h: Declare interface types for V4L2 and DVB · a1d2510e

由 Mauro Carvalho Chehab 提交于 8月 20, 2015

Declare the interface types that will be used by the new
G_TOPOLOGY ioctl that will be defined later on.

For now, we need those types, as they'll be used on the
internal structs associated with the new media_interface
graph object defined on the next patch.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: NJavier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

a1d2510e

net, sched: add clsact qdisc · 1f211a1b

由 Daniel Borkmann 提交于 1月 07, 2016

This work adds a generalization of the ingress qdisc as a qdisc holding
only classifiers. The clsact qdisc works on ingress, but also on egress.
In both cases, it's execution happens without taking the qdisc lock, and
the main difference for the egress part compared to prior version of [1]
is that this can be applied with _any_ underlying real egress qdisc (also
classless ones).

Besides solving the use-case of [1], that is, allowing for more programmability
on assigning skb->priority for the mqprio case that is supported by most
popular 10G+ NICs, it also opens up a lot more flexibility for other tc
applications. The main work on classification can already be done at clsact
egress time if the use-case allows and state stored for later retrieval
f.e. again in skb->priority with major/minors (which is checked by most
classful qdiscs before consulting tc_classify()) and/or in other skb fields
like skb->tc_index for some light-weight post-processing to get to the
eventual classid in case of a classful qdisc. Another use case is that
the clsact egress part allows to have a central egress counterpart to
the ingress classifiers, so that classifiers can easily share state (e.g.
in cls_bpf via eBPF maps) for ingress and egress.

Currently, default setups like mq + pfifo_fast would require for this to
use, for example, prio qdisc instead (to get a tc_classify() run) and to
duplicate the egress classifier for each queue. With clsact, it allows
for leaving the setup as is, it can additionally assign skb->priority to
put the skb in one of pfifo_fast's bands and it can share state with maps.
Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
w/o the need to perform a skb_dst_force() to hold on to it any longer. In
lwt case, we can also use this facility to setup dst metadata via cls_bpf
(bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
that (case of IFF_NO_QUEUE devices, for example).

The realization can be done without any changes to the scheduler core
framework. All it takes is that we have two a-priori defined minors/child
classes, where we can mux between ingress and egress classifier list
(dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
dev->_tx to avoid extra cacheline miss for moderate loads). The egress
part is a bit similar modelled to handle_ing() and patched to a noop in
case the functionality is not used. Both handlers are now called
sch_handle_ingress() and sch_handle_egress(), code sharing among the two
doesn't seem practical as there are various minor differences in both
paths, so that making them conditional in a single handler would rather
slow things down.

Full compatibility to ingress qdisc is provided as well. Since both
piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
per netdevice, and thus ingress qdisc specific behaviour can be retained
for user space. This means, either a user does 'tc qdisc add dev foo ingress'
and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
alternative, where both, ingress and egress classifier can be configured
as in the below example. ingress qdisc supports attaching classifier to any
minor number whereas clsact has two fixed minors for muxing between the
lists, therefore to not break user space setups, they are better done as
two separate qdiscs.

I decided to extend the sch_ingress module with clsact functionality so
that commonly used code can be reused, the module is being aliased with
sch_clsact so that it can be auto-loaded properly. Alternative would have been
to add a flag when initializing ingress to alter its behaviour plus aliasing
to a different name (as it's more than just ingress). However, the first would
end up, based on the flag, choosing the new/old behaviour by calling different
function implementations to handle each anyway, the latter would require to
register ingress qdisc once again under different alias. So, this really begs
to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
by its own that share callbacks used by both.

Example, adding qdisc:

# tc qdisc add dev foo clsact
# tc qdisc show dev foo
qdisc mq 0: root
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc clsact ffff: parent ffff:fff1

Adding filters (deleting, etc works analogous by specifying ingress/egress):

# tc filter add dev foo ingress bpf da obj bar.o sec ingress
# tc filter add dev foo egress bpf da obj bar.o sec egress
# tc filter show dev foo ingress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
# tc filter show dev foo egress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
show an empty list for clsact. Either using the parent names (ingress/egress)
or specifying the full major/minor will then show the related filter lists.

Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.

[1] http://patchwork.ozlabs.org/patch/512949/Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f211a1b

09 1月, 2016 2 次提交

block: enable dax for raw block devices · 5a023cdb

由 Dan Williams 提交于 11月 30, 2015

If an application wants exclusive access to all of the persistent memory
provided by an NVDIMM namespace it can use this raw-block-dax facility
to forgo establishing a filesystem.  This capability is targeted
primarily to hypervisors wanting to provision persistent memory for
guests.  It can be disabled / enabled dynamically via the new BLKDAXSET
ioctl.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Reviewed-by: NJan Kara <jack@suse.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

5a023cdb

fs: clean up the flags definition in uapi/linux/fs.h · 68ce7bfc

由 Theodore Ts'o 提交于 1月 08, 2016

Add an explanation for the flags used by FS_IOC_[GS]ETFLAGS and remind
people that changes should be revised by linux-fsdevel and linux-api.

Add flags that are used on-disk for ext4, and remove FS_DIRECTIO_FL
since it was used only by gfs2 and support was removed in 2008 in
commit c9f6a6bb ("The ability to mark files for direct i/o access
when opened normally is both unused and pointless, so this patch
removes support for that feature.")  Now we have _two_ remaining flags
left.  But since we want to discourage people from assigning new
flags, that's OK.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

68ce7bfc

08 1月, 2016 2 次提交

netfilter: nft_ct: add byte/packet counter support · 48f66c90

由 Florian Westphal 提交于 1月 07, 2016

If the accounting extension isn't present, we'll return a counter
value of 0.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

48f66c90

netfilter: nf_tables: Add new attributes into nft_set to store user data. · e6d8ecac

由 Carlos Falgueras García 提交于 1月 05, 2016

User data is stored at after 'nft_set_ops' private data into 'data[]'
flexible array. The field 'udata' points to user data and 'udlen' stores
its length.

Add new flag NFTA_SET_USERDATA.
Signed-off-by: NCarlos Falgueras García <carlosfg@riseup.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

e6d8ecac

07 1月, 2016 1 次提交

KVM: s390: implement the RI support of guest · c6e5f166

由 Fan Zhang 提交于 1月 07, 2016

This patch adds runtime instrumentation support for KVM guest. We need to
setup a save area for the runtime instrumentation-controls control block(RICCB)
and implement the necessary interfaces to live migrate the guest settings.

We setup the sie control block in a way, that the runtime
instrumentation instructions of a guest are handled by hardware.

We also add a capability KVM_CAP_S390_RI to make this feature opt-in as
it needs migration support.
Signed-off-by: NFan Zhang <zhangfan@linux.vnet.ibm.com>
Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

c6e5f166

06 1月, 2016 2 次提交

drivers: md: use ktime_get_real_seconds() · 9ebc6ef1

由 Deepa Dinamani 提交于 12月 21, 2015

get_seconds() API is not y2038 safe on 32 bit systems and the API
is deprecated. Replace it with calls to ktime_get_real_seconds()
API instead. Change mddev structure types to time64_t accordingly.

32 bit signed timestamps will overflow in the year 2038.

Change the user interface mdu_array_info_s structure timestamps:
ctime and utime values used in ioctls GET_ARRAY_INFO and
SET_ARRAY_INFO to unsigned int. This will extend the field to last
until the year 2106.
The long term plan is to get rid of ctime and utime values in
this structure as this information can be read from the on-disk
meta data directly.

Clamp the tim64_t timestamps to positive values with a max of U32_MAX
when returning from GET_ARRAY_INFO ioctl to accommodate above changes
in the data type of timestamps to unsigned int.

v0.90 on disk meta data uses u32 for maintaining time stamps.
So this will also last until year 2106.
Assumption is that the usage of v0.90 will be deprecated by
year 2106.

Timestamp fields in the on disk meta data for v1.0 version already
use 64 bit data types. Remove the truncation of the bits while
writing to or reading from these from the disk.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NNeilBrown <neilb@suse.com>

9ebc6ef1

include/uapi/linux/sockios.h: mark SIOCRTMSG unused · 2fbf5758

由 xypron.glpk@gmx.de 提交于 1月 05, 2016

IOCTL SIOCRTMSG does nothing but return EINVAL.

So comment it as unused.

SIOCRTMSG is only used in:
* net/ipv4/af_inet.c
* include/uapi/linux/sockios.h

inet_ioctl calls ip_rt_ioctl.
ip_rt_ioctl only handles SIOCADDRT and SIOCDELRT and returns -EINVAL
otherwise.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2fbf5758

05 1月, 2016 1 次提交

netfilter: nf_tables: add forward expression to the netdev family · 39e6dea2

由 Pablo Neira Ayuso 提交于 11月 25, 2015

You can use this to forward packets from ingress to the egress path of
the specified interface. This provides a fast path to bounce packets
from one interface to another specific destination interface.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

39e6dea2

04 1月, 2016 3 次提交

xfs: introduce per-inode DAX enablement · 58f88ca2

由 Dave Chinner 提交于 1月 04, 2016

Rather than just being able to turn DAX on and off via a mount
option, some applications may only want to enable DAX for certain
performance critical files in a filesystem.

This patch introduces a new inode flag to enable DAX in the v3 inode
di_flags2 field. It adds support for setting and clearing flags in
the di_flags2 field via the XFS_IOC_FSSETXATTR ioctl, and sets the
S_DAX inode flag appropriately when it is seen.

When this flag is set on a directory, it acts as an "inherit flag".
That is, inodes created in the directory will automatically inherit
the on-disk inode DAX flag, enabling administrators to set up
directory heirarchies that automatically use DAX. Setting this flag
on an empty root directory will make the entire filesystem use DAX
by default.
Signed-off-by: NDave Chinner <dchinner@redhat.com>

58f88ca2

fs: XFS_IOC_FS[SG]SETXATTR to FS_IOC_FS[SG]ETXATTR promotion · 334e580a

由 Dave Chinner 提交于 1月 04, 2016

Hoist the ioctl definitions for the XFS_IOC_FS[SG]SETXATTR API from
fs/xfs/libxfs/xfs_fs.h to include/uapi/linux/fs.h so that the ioctls
can be used by all filesystems, not just XFS. This enables
(initially) ext4 to use the ioctl to set project IDs on inodes.

Based-on-patch-from: Li Xi <lixi@ddn.com>
Signed-off-by: NDave Chinner <dchinner@redhat.com>

334e580a

netfilter: nft_limit: allow to invert matching criteria · c7862a5f

由 Pablo Neira Ayuso 提交于 12月 28, 2015

This patch allows you to invert the ratelimit matching criteria, so you
can match packets over the ratelimit. This is required to support what
hashlimit does.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

c7862a5f

01 1月, 2016 1 次提交

vfs: hoist the btrfs deduplication ioctl to the vfs · 54dbc151

由 Darrick J. Wong 提交于 12月 19, 2015

Hoist the btrfs EXTENT_SAME ioctl up to the VFS and make the name
more systematic (FIDEDUPERANGE).
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

54dbc151

31 12月, 2015 1 次提交

ethtool: Add phy statistics · f3a40945

由 Andrew Lunn 提交于 12月 30, 2015

Ethernet PHYs can maintain statistics, for example errors while idle
and receive errors. Add an ethtool mechanism to retrieve these
statistics, using the same model as MAC statistics.
Signed-off-by: NAndrew Lunn <andrew@lunn.ch>
Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3a40945

22 12月, 2015 2 次提交

vfio: Include No-IOMMU mode · 03a76b60

由 Alex Williamson 提交于 12月 21, 2015

There is really no way to safely give a user full access to a DMA
capable device without an IOMMU to protect the host system. There is
also no way to provide DMA translation, for use cases such as device
assignment to virtual machines. However, there are still those users
that want userspace drivers even under those conditions. The UIO
driver exists for this use case, but does not provide the degree of
device access and programming that VFIO has. In an effort to avoid
code duplication, this introduces a No-IOMMU mode for VFIO.

This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
the "enable_unsafe_noiommu_mode" option on the vfio driver. This
should make it very clear that this mode is not safe. Additionally,
CAP_SYS_RAWIO privileges are necessary to work with groups and
containers using this mode. Groups making use of this support are
named /dev/vfio/noiommu-$GROUP and can only make use of the special
VFIO_NOIOMMU_IOMMU for the container. Use of this mode, specifically
binding a device without a native IOMMU group to a VFIO bus driver
will taint the kernel and should therefore not be considered
supported. This patch includes no-iommu support for the vfio-pci bus
driver only.
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>

03a76b60

vfio: Add explicit alignments in vfio_iommu_spapr_tce_create · 77d6bd47

由 Alexey Kardashevskiy 提交于 12月 18, 2015

The vfio_iommu_spapr_tce_create struct has 4x32bit and 2x64bit fields
which should have resulted in sizeof(fio_iommu_spapr_tce_create) equal
to 32 bytes. However due to the gcc's default alignment, the actual
size of this struct is 40 bytes.

This fills gaps with __resv1/2 fields.

This should not cause any change in behavior.
Signed-off-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: NDavid Gibson <david@gibson.dropbear.id.au>
Signed-off-by: NAlex Williamson <alex.williamson@redhat.com>

77d6bd47

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功