提交 · a2d818030135c293f878fbb772cf40e7a14c5acc · openanolis / cloud-kernel

09 9月, 2017 3 次提交

drivers/pps: aesthetic tweaks to PPS-related content · a2d81803

由 Robert P. J. Day 提交于 9月 08, 2017

Collection of aesthetic adjustments to various PPS-related files,
directories and Documentation, some quite minor just for the sake of
consistency, including:

 * Updated example of pps device tree node (courtesy Rodolfo G.)
 * "PPS-API" -> "PPS API"
 * "pps_source_info_s" -> "pps_source_info"
 * "ktimer driver" -> "pps-ktimer driver"
 * "ppstest /dev/pps0" -> "ppstest /dev/pps1" to match example
 * Add missing PPS-related entries to MAINTAINERS file
 * Other trivialities

Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708261048220.8106@localhost.localdomainSigned-off-by: NRobert P. J. Day <rpjday@crashcourse.ca>
Acked-by: NRodolfo Giometti <giometti@enneenne.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a2d81803

autofs: remove unused AUTOFS_IOC_EXPIRE_DIRECT/INDIRECT · 1f28c5d0

由 Tomohiro Kusumi 提交于 9月 08, 2017

These are not used by either kernel or userspace, although
AUTOFS_IOC_EXPIRE_DIRECT once seems to have been used by userspace in
around 2006-2008, which was technically just an alias of the existing
ioctl AUTOFS_IOC_EXPIRE_MULTI.

ioctls for autofs are already complicated enough that they could be
removed unless these are staying here to be able to compile userspace code
of certain period of time from a decade ago.

Edit: raven@themaw.net
Yes, this is indeed very old and anything that still uses must
be updated becuase it will be using broken functionality.
End edit: raven@themaw.net

Link: http://lkml.kernel.org/r/150285067347.4670.11494624644273072003.stgit@pluto.themaw.netSigned-off-by: NTomohiro Kusumi <tkusumi@tuxera.com>
Signed-off-by: NIan Kent <raven@themaw.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1f28c5d0

autofs: make dev ioctl version and ismountpoint user accessible · 3dd8f7c3

由 Ian Kent 提交于 9月 08, 2017

Some of the autofs miscellaneous device ioctls need to be accessable to
user space applications without CAP_SYS_ADMIN to get information about
autofs mounts.

Link: http://lkml.kernel.org/r/150216642517.11652.2338933266137331637.stgit@pluto.themaw.netSigned-off-by: NIan Kent <raven@themaw.net>
Cc: Colin Walters <walters@redhat.com>
Cc: Ondrej Holy <oholy@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3dd8f7c3

07 9月, 2017 8 次提交

mm,fork: introduce MADV_WIPEONFORK · d2cd9ede

由 Rik van Riel 提交于 9月 06, 2017

Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
in the child process after fork.  This differs from MADV_DONTFORK in one
important way.

If a child process accesses memory that was MADV_WIPEONFORK, it will get
zeroes.  The address ranges are still valid, they are just empty.

If a child process accesses memory that was MADV_DONTFORK, it will get a
segmentation fault, since those address ranges are no longer valid in
the child after fork.

Since MADV_DONTFORK also seems to be used to allow very large programs
to fork in systems with strict memory overcommit restrictions, changing
the semantics of MADV_DONTFORK might break existing programs.

MADV_WIPEONFORK only works on private, anonymous VMAs.

The use case is libraries that store or cache information, and want to
know that they need to regenerate it in the child process after fork.

Examples of this would be:
 - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
   check, which is too slow without a PID cache)
 - PKCS#11 API reinitialization check (mandated by specification)
 - glibc's upcoming PRNG (reseed after fork)
 - OpenSSL PRNG (reseed after fork)

The security benefits of a forking server having a re-inialized PRNG in
every child process are pretty obvious.  However, due to libraries
having all kinds of internal state, and programs getting compiled with
many different versions of each library, it is unreasonable to expect
calling programs to re-initialize everything manually after fork.

A further complication is the proliferation of clone flags, programs
bypassing glibc's functions to call clone directly, and programs calling
unshare, causing the glibc pthread_atfork hook to not get called.

It would be better to have the kernel take care of this automatically.

The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
MADV_WIPEONFORK.

This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2

[akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.comSigned-off-by: NRik van Riel <riel@redhat.com>
Reported-by: NFlorian Weimer <fweimer@redhat.com>
Reported-by: NColm MacCártaigh <colm@allcosts.net>
Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Drewry <wad@chromium.org>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d2cd9ede

mm/shmem: add hugetlbfs support to memfd_create() · 749df87b

由 Mike Kravetz 提交于 9月 06, 2017

This patch came out of discussions in this e-mail thread:
http://lkml.kernel.org/r/1499357846-7481-1-git-send-email-mike.kravetz%40oracle.com

The Oracle JVM team is developing a new garbage collection model. This
new model requires multiple mappings of the same anonymous memory. One
straight forward way to accomplish this is with memfd_create. They can
use the returned fd to create multiple mappings of the same memory.

The JVM today has an option to use (static hugetlb) huge pages. If this
option is specified, they would like to use the same garbage collection
model requiring multiple mappings to the same memory. Using hugetlbfs,
it is possible to explicitly mount a filesystem and specify file paths
in order to get an fd that can be used for multiple mappings. However,
this introduces additional system admin work and coordination.

Ideally they would like to get a hugetlbfs fd without requiring explicit
mounting of a filesystem. Today, mmap and shmget can make use of
hugetlbfs without explicitly mounting a filesystem. The patch adds this
functionality to memfd_create.

Add a new flag MFD_HUGETLB to memfd_create() that will specify the file
to be created resides in the hugetlbfs filesystem. This is the generic
hugetlbfs filesystem not associated with any specific mount point. As
with other system calls that request hugetlbfs backed pages, there is
the ability to encode huge page size in the flag arguments.

hugetlbfs does not support sealing operations, therefore specifying
MFD_ALLOW_SEALING with MFD_HUGETLB will result in EINVAL.

Of course, the memfd_man page would need updating if this type of
functionality moves forward.

Link: http://lkml.kernel.org/r/1502149672-7759-2-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

749df87b

userfaultfd: provide pid in userfault msg - add feat union · a36985d3

由 Andrea Arcangeli 提交于 9月 06, 2017

No ABI change, but this will make it more explicit to software that ptid
is only available if requested by passing UFFD_FEATURE_THREAD_ID to
UFFDIO_API.  The fact it's a union will also self document it shouldn't
be taken for granted there's a tpid there.

Link: http://lkml.kernel.org/r/20170802165145.22628-7-aarcange@redhat.comSigned-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Alexey Perevalov <a.perevalov@samsung.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a36985d3

userfaultfd: provide pid in userfault msg · 9d4ac934

由 Alexey Perevalov 提交于 9月 06, 2017

It could be useful for calculating downtime during postcopy live
migration per vCPU. Side observer or application itself will be
informed about proper task's sleep during userfaultfd processing.

Process's thread id is being provided when user requeste it by setting
UFFD_FEATURE_THREAD_ID bit into uffdio_api.features.

Link: http://lkml.kernel.org/r/20170802165145.22628-6-aarcange@redhat.comSigned-off-by: NAlexey Perevalov <a.perevalov@samsung.com>
Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Maxime Coquelin <maxime.coquelin@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9d4ac934

mm: userfaultfd: add feature to request for a signal delivery · 2d6d6f5a

由 Prakash Sangappa 提交于 9月 06, 2017

In some cases, userfaultfd mechanism should just deliver a SIGBUS signal
to the faulting process, instead of the page-fault event. Dealing with
page-fault event using a monitor thread can be an overhead in these
cases. For example applications like the database could use the
signaling mechanism for robustness purpose.

Database uses hugetlbfs for performance reason. Files on hugetlbfs
filesystem are created and huge pages allocated using fallocate() API.
Pages are deallocated/freed using fallocate() hole punching support.
These files are mmapped and accessed by many processes as shared memory.
The database keeps track of which offsets in the hugetlbfs file have
pages allocated.

Any access to mapped address over holes in the file, which can occur due
to bugs in the application, is considered invalid and expect the process
to simply receive a SIGBUS. However, currently when a hole in the file
is accessed via the mapped address, kernel/mm attempts to automatically
allocate a page at page fault time, resulting in implicitly filling the
hole in the file. This may not be the desired behavior for applications
like the database that want to explicitly manage page allocations of
hugetlbfs files.

Using userfaultfd mechanism with this support to get a signal, database
application can prevent pages from being allocated implicitly when
processes access mapped address over holes in the file.

This patch adds UFFD_FEATURE_SIGBUS feature to userfaultfd mechnism to
request for a SIGBUS signal.

See following for previous discussion about the database requirement
leading to this proposal as suggested by Andrea.

http://www.spinics.net/lists/linux-mm/msg129224.html

Link: http://lkml.kernel.org/r/1501552446-748335-2-git-send-email-prakash.sangappa@oracle.comSigned-off-by: NPrakash Sangappa <prakash.sangappa@oracle.com>
Reviewed-by: NMike Rapoport <rppt@linux.vnet.ibm.com>
Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d6d6f5a

mm: shm: use new hugetlb size encoding definitions · 4da243ac

由 Mike Kravetz 提交于 9月 06, 2017

Use the common definitions from hugetlb_encode.h header file for
encoding hugetlb size definitions in shmget system call flags.

In addition, move these definitions from the internal (kernel) to user
(uapi) header file.

Link: http://lkml.kernel.org/r/1501527386-10736-4-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
Suggested-by: NMatthew Wilcox <willy@infradead.org>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4da243ac

mm: arch: consolidate mmap hugetlb size encodings · aafd4562

由 Mike Kravetz 提交于 9月 06, 2017

A non-default huge page size can be encoded in the flags argument of the
mmap system call.  The definitions for these encodings are in arch
specific header files.  However, all architectures use the same values.

Consolidate all the definitions in the primary user header file
(uapi/linux/mman.h).  Include definitions for all known huge page sizes.
Use the generic encoding definitions in hugetlb_encode.h as the basis
for these definitions.

Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aafd4562

mm: hugetlb: define system call hugetlb size encodings in single file · e652f694

由 Mike Kravetz 提交于 9月 06, 2017

Patch series "Consolidate system call hugetlb page size encodings".

These patches are the result of discussions in
https://lkml.org/lkml/2017/3/8/548.  The following changes are made in the
patch set:

1) Put all the log2 encoded huge page size definitions in a common
   header file.  The idea is have a set of definitions that can be use as
   the basis for system call specific definitions such as MAP_HUGE_* and
   SHM_HUGE_*.

2) Remove MAP_HUGE_* definitions in arch specific files.  All these
   definitions are the same.  Consolidate all definitions in the primary
   user header file (uapi/linux/mman.h).

3) Remove SHM_HUGE_* definitions intended for user space from kernel
   header file, and add to user (uapi/linux/shm.h) header file.  Add
   definitions for all known huge page size encodings as in mmap.

This patch (of 3):

If hugetlb pages are requested in mmap or shmget system calls, a huge
page size other than default can be requested.  This is accomplished by
encoding the log2 of the huge page size in the upper bits of the flag
argument.  asm-generic and arch specific headers all define the same
values for these encodings.

Put common definitions in a single header file.  The primary uapi header
files for mmap and shm will use these definitions as a basis for
definitions specific to those system calls.

Link: http://lkml.kernel.org/r/1501527386-10736-2-git-send-email-mike.kravetz@oracle.comSigned-off-by: NMike Kravetz <mike.kravetz@oracle.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e652f694

05 9月, 2017 16 次提交

media: dvb headers: make checkpatch happier · e4faa09b

由 Mauro Carvalho Chehab 提交于 9月 05, 2017

Adjust dvb ca.h, dmx.h and frontend.h in order to make
checkpatch happier. Now, it only complains about the typedefs,
and those are there just to provide backward userspace
compatibility.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

e4faa09b

media: ca.h: document ca_msg and the corresponding ioctls · 7e6854a9

由 Mauro Carvalho Chehab 提交于 9月 04, 2017

Usually, CA messages are sent/received via reading/writing at
the CA device node. However, two drivers (dst_ca and firedtv-ci)
also implement it via ioctls.

Apparently, on both cases, the net result is the same.

Anyway, let's document it.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

7e6854a9

media: ca docs: document CA_SET_DESCR ioctl and structs · bd9049ed

由 Mauro Carvalho Chehab 提交于 9月 03, 2017

The av7110 driver uses CA_SET_DESCR to store the descrambler
control words at the CA descrambler slots.

Document it.

Thanks-to: Honza Petrouš <jpetrous@gmail.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

bd9049ed

media: net.h: add kernel-doc and use it at Documentation/ · 56d51b65

由 Mauro Carvalho Chehab 提交于 9月 01, 2017

As we did with frontend.h, ca.h and dmx.h, move the struct
definition to net.h.

That should help to keep it updated, as more stuff gets
added there.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

56d51b65

media: frontend.h: Avoid the term DVB when doesn't refer to a delivery system · 5176d6ee

由 Mauro Carvalho Chehab 提交于 9月 01, 2017

The DVB term can either refer to the subsystem or to a delivery
system. Avoid it in the first case at the kernel-doc markups.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

5176d6ee

media: ca.h: document most CA data types · fed7c4fe

由 Mauro Carvalho Chehab 提交于 9月 01, 2017

For most of the stuff there, documenting is easy, as the
header file contains information.

Yet, I was unable to document two data structs:
	ca_msg and ca_descr

As those two structs are used by a few drivers, keep them.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

fed7c4fe

media: ca.h: get rid of CA_SET_PID · 833ff5e7

由 Mauro Carvalho Chehab 提交于 9月 01, 2017

This ioctl seems to be some attempt to support a feature
at the bt8xx dst_ca driver. Yet, as said there, it
"needs more work". Right now, the code there is just
a boilerplate.

At the end of the day, no driver uses this ioctl, nor it is
documented anywhere (except for "needs more work").

So, get rid of it.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

833ff5e7

media: dmx.h: add kernel-doc markups and use it at Documentation/ · bb98e6d2

由 Mauro Carvalho Chehab 提交于 8月 31, 2017

The demux documentation is pretty poor nowadays: most of the
structs and enums aren't documented at all.

Add proper kernel-doc markups for them and use it.

Now, the demux API data structures are fully documented :-)
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

bb98e6d2

media: dmx.h: get rid of DMX_SET_SOURCE · 13adefbe

由 Mauro Carvalho Chehab 提交于 8月 31, 2017

No driver uses this ioctl, nor it is documented anywhere.

So, get rid of it.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

13adefbe

media: dmx.h: get rid of DMX_GET_CAPS · 286fe1ca

由 Mauro Carvalho Chehab 提交于 8月 31, 2017

There's no driver currently using it; it is also not
documented about what it would be supposed to do.

So, get rid of it.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

286fe1ca

media: dmx.h: get rid of unused DMX_KERNEL_CLIENT · 791edca5

由 Mauro Carvalho Chehab 提交于 8月 31, 2017

There's a flag defined for Digital TV demux that is not used
anywhere, called DMX_KERNEL_CLIENT. Get rid of it.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

791edca5

media: dvb frontend docs: use kernel-doc documentation · 9d5e27cb

由 Mauro Carvalho Chehab 提交于 8月 30, 2017

Now that frontend.h contains most documentation for the frontend,
remove the duplicated information from Documentation/ and use the
kernel-doc auto-generated one instead.

That should simplify maintainership of DVB frontend uAPI, as most
of the documentation will stick with the header file.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

9d5e27cb

media: dvb/frontend.h: document the uAPI file · 8220ead8

由 Mauro Carvalho Chehab 提交于 8月 30, 2017

Most of the stuff at the Digital TV frontend header file
are documented only at the Documentation. However, a few
kernel-doc markups are there, several of them with parsing
issues.

Add the missing documentation, copying definitions from the
Documentation when it applies, fixing some bugs.

Please notice that DVBv3 stuff that were deprecated weren't
commented by purpose. Instead, they were clearly tagged as
such.

This patch prepares to move part of the documentation from
Documentation/ to kernel-doc comments.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

8220ead8

media: dvb/frontend.h: move out a private internal structure · f35afa4f

由 Mauro Carvalho Chehab 提交于 8月 30, 2017

struct dtv_cmds_h is just an ancillary struct used by the
dvb_frontend.c to internally store frontend commands.

It doesn't belong to the userspace header, nor it is used anywhere,
except inside the DVB core. So, remove it from the header.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

f35afa4f

media: dmx.h: split typedefs from structs · 3256b36e

由 Mauro Carvalho Chehab 提交于 9月 01, 2017

Using typedefs inside the Kernel is against CodingStyle, and
there's no good usage here.

Just like we did at frontend.h, at commit 0df289a2
("[media] dvb: Get rid of typedev usage for enums"), let's keep
those typedefs only to provide userspace backward compatibility.

No functional changes.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

3256b36e

media: ca.h: split typedefs from structs · c93022a7

由 Mauro Carvalho Chehab 提交于 9月 01, 2017

Using typedefs inside the Kernel is against CodingStyle, and
there's no good usage here.

Just like we did at frontend.h, at commit 0df289a2 ("[media] dvb:
Get rid of typedev usage for enums"), let's keep those typedefs only
to provide userspace backward compatibility.

No functional changes.
Signed-off-by: NMauro Carvalho Chehab <mchehab@s-opensource.com>

c93022a7

04 9月, 2017 3 次提交

netlink: add NLM_F_NONREC flag for deletion requests · 2335ba70

由 Pablo Neira Ayuso 提交于 9月 03, 2017

In the last NFWS in Faro, Portugal, we discussed that netlink is lacking
the semantics to request non recursive deletions, ie. do not delete an
object iff it has child objects that hang from this parent object that
the user requests to be deleted.

We need this new flag to solve a problem for the iptables-compat
backward compatibility utility, that runs iptables commands using the
existing nf_tables netlink interface. Specifically, custom chains in
iptables cannot be deleted if there are rules in it, however, nf_tables
allows to remove any chain that is populated with content. To sort out
this asymmetry, iptables-compat userspace sets this new NLM_F_NONREC
flag to obtain the same semantics that iptables provides.

This new flag should only be used for deletion requests. Note this new
flag value overlaps with the existing:

* NLM_F_ROOT for get requests.
* NLM_F_REPLACE for new requests.

However, those flags should not ever be used in deletion requests.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

2335ba70

netfilter: nft_limit: add stateful object type · a6912055

由 Pablo M. Bermudo Garay 提交于 8月 23, 2017

Register a new limit stateful object type into the stateful object
infrastructure.
Signed-off-by: NPablo M. Bermudo Garay <pablombg@gmail.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

a6912055

netfilter: xt_hashlimit: add rate match mode · bea74641

由 Vishwanath Pai 提交于 8月 18, 2017

This patch adds a new feature to hashlimit that allows matching on the
current packet/byte rate without rate limiting. This can be enabled
with a new flag --hashlimit-rate-match. The match returns true if the
current rate of packets is above/below the user specified value.

The main difference between the existing algorithm and the new one is
that the existing algorithm rate-limits the flow whereas the new
algorithm does not. Instead it *classifies* the flow based on whether
it is above or below a certain rate. I will demonstrate this with an
example below. Let us assume this rule:

iptables -A INPUT -m hashlimit --hashlimit-above 10/s -j new_chain

If the packet rate is 15/s, the existing algorithm would ACCEPT 10
packets every second and send 5 packets to "new_chain".

But with the new algorithm, as long as the rate of 15/s is sustained,
all packets will continue to match and every packet is sent to new_chain.

This new functionality will let us classify different flows based on
their current rate, so that further decisions can be made on them based on
what the current rate is.

This is how the new algorithm works:
We divide time into intervals of 1 (sec/min/hour) as specified by
the user. We keep track of the number of packets/bytes processed in the
current interval. After each interval we reset the counter to 0.

When we receive a packet for match, we look at the packet rate
during the current interval and the previous interval to make a
decision:

if [ prev_rate < user and cur_rate < user ]
        return Below
else
        return Above

Where cur_rate is the number of packets/bytes seen in the current
interval, prev is the number of packets/bytes seen in the previous
interval and 'user' is the rate specified by the user.

We also provide flexibility to the user for choosing the time
interval using the option --hashilmit-interval. For example the user can
keep a low rate like x/hour but still keep the interval as small as 1
second.

To preserve backwards compatibility we have to add this feature in a new
revision, so I've created revision 3 for hashlimit. The two new options
we add are:

--hashlimit-rate-match
--hashlimit-rate-interval

I have updated the help text to add these new options. Also added a few
tests for the new options.
Suggested-by: NIgor Lubashev <ilubashe@akamai.com>
Reviewed-by: NJosh Hunt <johunt@akamai.com>
Signed-off-by: NVishwanath Pai <vpai@akamai.com>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

bea74641

02 9月, 2017 2 次提交

tcp_diag: report TCP MD5 signing keys and addresses · c03fa9bc

由 Ivan Delalande 提交于 8月 31, 2017

Report TCP MD5 (RFC2385) signing keys, addresses and address prefixes to
processes with CAP_NET_ADMIN requesting INET_DIAG_INFO. Currently it is
not possible to retrieve these from the kernel once they have been
configured on sockets.
Signed-off-by: NIvan Delalande <colona@arista.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c03fa9bc

fsmap: fix documentation of FMR_OF_LAST · d897246d

由 Darrick J. Wong 提交于 8月 31, 2017

The FMR_OF_LAST flag is set on the last fsmap record being returned for
the dataset requested, contrary to what the header file says.  Fix the
docs to reflect the behavior of all fsmap implementations.
Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: NDave Chinner <dchinner@redhat.com>

d897246d

01 9月, 2017 4 次提交

ANDROID: binder: Add BINDER_GET_NODE_DEBUG_INFO ioctl · abcc6153

由 Colin Cross 提交于 8月 31, 2017

The BINDER_GET_NODE_DEBUG_INFO ioctl will return debug info on
a node.  Each successive call reusing the previous return value
will return the next node.  The data will be used by
libmemunreachable to mark the pointers with kernel references
as reachable.
Signed-off-by: NColin Cross <ccross@android.com>
Signed-off-by: NMartijn Coenen <maco@android.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

abcc6153

bpf: Add mark and priority to sock options that can be set · 482dca93

由 David Ahern 提交于 8月 31, 2017

Add socket mark and priority to fields that can be set by
ebpf program when a socket is created.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

482dca93

devlink: Add IPv6 header for dpipe · 1797f5b3

由 Arkadi Sharshevsky 提交于 8月 31, 2017

This will be used by the IPv6 host table which will be introduced in the
following patches. The fields in the header are added per-use. This header
is global and can be reused by many drivers.
Signed-off-by: NArkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: NJiri Pirko <jiri@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1797f5b3

annotate RWF_... flags · ddef7ed2

由 Christoph Hellwig 提交于 7月 06, 2017

[AV: added missing annotations in syscalls.h/compat.h]
Signed-off-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>

ddef7ed2

31 8月, 2017 4 次提交

IB/core: Add completion queue (cq) object actions · 9ee79fce

由 Matan Barak 提交于 8月 03, 2017

Adding CQ ioctl actions:
1. create_cq
2. destroy_cq

This requires adding the following:
1. A specification describing the method
	a. Handler
	b. Attributes specification
		Each attribute is one of the following:
		a. PTR_IN - input data
			    Note: This could be encoded inlined for
				  data < 64bit
		b. PTR_OUT - response data
		c. IDR - idr based object
		d. FD - fd based object
                Blobs attributes (clauses a and b) contain their type,
	        while objects specifications (clauses c and d)
                contains the expected object type (for example, the
                given id should be UVERBS_TYPE_PD) and the required
                access (READ, WRITE, NEW or DESTROY). If a NEW is
                required, the new object's id will be assigned to this
                attribute. All attributes could get UA_FLAGS
                attribute. Currently we support stating that an
		attribute is mandatory or that the specification size
                corresponds to a lower bound (and that this attribute
		could be extended).
		We currently add both default attributes and the two
		generic UHW_IN and UHW_OUT driver specific attributes.
2. Handler
   A handler gets a uverbs_attr_bundle. The handler developer uses
   uverbs_attr_get to fetch an attribute of a given id.
   Each of these attribute groups correspond to the specification
   group defined in the action (clauses 1.b and 1.c respectively).
   The indices of these arrays corresponds to the attribute ids
   declared in the specifications (clause 2).

   The handler is quite simple. It assumes the infrastructure fetched
   all objects and locked, created or destroyed them as required by
   the specification. Pointer (or blob) attributes were validated to
   match their required sizes. After the handler finished, the
   infrastructure commits or rollbacks the objects.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

9ee79fce

IB/core: Add legacy driver's user-data · d70724f1

由 Matan Barak 提交于 8月 03, 2017

In this phase, we don't want to change all the drivers to use
flexible driver's specific attributes. Therefore, we add two default
attributes: UHW_IN and UHW_OUT. These attributes are optional in some
methods and they encode the driver specific command data. We add
a function that extract this data and creates the legacy udata over
it.

Driver's data should start from UVERBS_UDATA_DRIVER_DATA_FLAG. This
turns on the first bit of the namespace, indicating this attribute
belongs to the driver's namespace.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

d70724f1

IB/core: Export ioctl enum types to user-space · 64b19e13

由 Matan Barak 提交于 8月 03, 2017

Add a new ib_user_ioctl_verbs.h which exports all required ABI
enums and structs to the user-space.
Export the default types to user-space through this file.
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

64b19e13

IB/core: Add new ioctl interface · fac9658c

由 Matan Barak 提交于 8月 03, 2017

In this ioctl interface, processing the command starts from
properties of the command and fetching the appropriate user objects
before calling the handler.

Parsing and validation is done according to a specifier declared by
the driver's code. In the driver, all supported objects are declared.
These objects are separated to different object namepsaces. Dividing
objects to namespaces is done at initialization by using the higher
bits of the object ids. This initialization can mix objects declared
in different places to one parsing tree using in this ioctl interface.

For each object we list all supported methods. Similarly to objects,
methods are separated to method namespaces too. Namespacing is done
similarly to the objects case. This could be used in order to add
methods to an existing object.

Each method has a specific handler, which could be either a default
handler or a driver specific handler.
Along with the handler, a bunch of attributes are specified as well.
Similarly to objects and method, attributes are namespaced and hashed
by their ids at initialization too. All supported attributes are
subject to automatic fetching and validation. These attributes include
the command, response and the method's related objects' ids.

When these entities (objects, methods and attributes) are used, the
high bits of the entities ids are used in order to calculate the hash
bucket index. Then, these high bits are masked out in order to have a
zero based index. Since we use these high bits for both bucketing and
namespacing, we get a compact representation and O(1) array access.
This is mandatory for efficient dispatching.

Each attribute has a type (PTR_IN, PTR_OUT, IDR and FD) and a length.
Attributes could be validated through some attributes, like:
(*) Minimum size / Exact size
(*) Fops for FD
(*) Object type for IDR

If an IDR/fd attribute is specified, the kernel also states the object
type and the required access (NEW, WRITE, READ or DESTROY).
All uobject/fd management is done automatically by the infrastructure,
meaning - the infrastructure will fail concurrent commands that at
least one of them requires concurrent access (WRITE/DESTROY),
synchronize actions with device removals (dissociate context events)
and take care of reference counting (increase/decrease) for concurrent
actions invocation. The reference counts on the actual kernel objects
shall be handled by the handlers.

 objects
+--------+
|        |
|        |   methods                                                                +--------+
|        |   ns         method      method_spec                           +-----+   |len     |
+--------+  +------+[d]+-------+   +----------------+[d]+------------+    |attr1+-> |type    |
| object +> |method+-> | spec  +-> +  attr_buckets  +-> |default_chain+--> +-----+   |idr_type|
+--------+  +------+   |handler|   |                |   +------------+    |attr2|   |access  |
|        |  |      |   +-------+   +----------------+   |driver chain|    +-----+   +--------+
|        |  |      |                                    +------------+
|        |  +------+
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
|        |
+--------+

[d] = Hash ids to groups using the high order bits

The right types table is also chosen by using the high bits from
the ids. Currently we have either default or driver specific groups.

Once validation and object fetching (or creation) completed, we call
the handler:
int (*handler)(struct ib_device *ib_dev, struct ib_uverbs_file *ufile,
               struct uverbs_attr_bundle *ctx);

ctx bundles attributes of different namespaces. Each element there
is an array of attributes which corresponds to one namespaces of
attributes. For example, in the usually used case:

 ctx                               core
+----------------------------+     +------------+
| core:                      +---> | valid      |
+----------------------------+     | cmd_attr   |
| driver:                    |     +------------+
|----------------------------+--+  | valid      |
                                |  | cmd_attr   |
                                |  +------------+
                                |  | valid      |
                                |  | obj_attr   |
                                |  +------------+
                                |
                                |  drivers
                                |  +------------+
                                +> | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | cmd_attr   |
                                   +------------+
                                   | valid      |
                                   | obj_attr   |
                                   +------------+
Signed-off-by: NMatan Barak <matanb@mellanox.com>
Reviewed-by: NYishai Hadas <yishaih@mellanox.com>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

fac9658c

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功