提交 · d4f323672aa63713b7ca26da418f66cc30d3a41a · openanolis / cloud-kernel

06 3月, 2016 1 次提交

nfit, libnvdimm: clear poison command support · d4f32367

由 Dan Williams 提交于 3月 03, 2016

Add the boiler-plate for a 'clear error' command based on section
9.20.7.6 "Function Index 4 - Clear Uncorrectable Error" from the ACPI
6.1 specification, and add a reference implementation in nfit_test.
Reviewed-by: NVishal Verma <vishal.l.verma@intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

d4f32367

24 2月, 2016 1 次提交

nfit: update address range scrub commands to the acpi 6.1 format · 4577b066

由 Dan Williams 提交于 2月 17, 2016

The original format of these commands from the "NVDIMM DSM Interface
Example" [1] are superseded by the ACPI 6.1 definition of the "NVDIMM Root
Device _DSMs" [2].

[1]: http://pmem.io/documents/NVDIMM_DSM_Interface_Example.pdf
[2]: http://www.uefi.org/sites/default/files/resources/ACPI_6_1.pdf
     "9.20.7 NVDIMM Root Device _DSMs"

Changes include:
1/ New 'restart' fields in ars_status, unfortunately these are
   implemented in the middle of the existing definition so this change
   is not backwards compatible.  The expectation is that shipping
   platforms will only ever support the ACPI 6.1 definition.

2/ New status values for ars_start ('busy') and ars_status ('overflow').

Cc: Vishal Verma <vishal.l.verma@intel.com>
Cc: Linda Knippers <linda.knippers@hpe.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

4577b066

31 1月, 2016 1 次提交

block: revert runtime dax control of the raw block device · 9f4736fe

由 Dan Williams 提交于 1月 28, 2016

Dynamically enabling DAX requires that the page cache first be flushed
and invalidated.  This must occur atomically with the change of DAX mode
otherwise we confuse the fsync/msync tracking and violate data
durability guarantees.  Eliminate the possibilty of DAX-disabled to
DAX-enabled transitions for now and revisit this for the next cycle.

Cc: Jan Kara <jack@suse.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

9f4736fe

27 1月, 2016 1 次提交

drm/etnaviv: add further minor features and varyings count · 602eb489

由 Russell King 提交于 1月 24, 2016

Export further minor feature bitmasks and the varyings count from
the GPU specifications registers to userspace.
Acked-by: NChristian Gmeiner <christian.gmeiner@gmail.com>
Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: NLucas Stach <l.stach@pengutronix.de>

602eb489

21 1月, 2016 1 次提交

epoll: add EPOLLEXCLUSIVE flag · df0108c5

由 Jason Baron 提交于 1月 20, 2016

Currently, epoll file descriptors or epfds (the fd returned from
epoll_create[1]()) that are added to a shared wakeup source are always
added in a non-exclusive manner.  This means that when we have multiple
epfds attached to a shared fd source they are all woken up.  This creates
thundering herd type behavior.

Introduce a new 'EPOLLEXCLUSIVE' flag that can be passed as part of the
'event' argument during an epoll_ctl() EPOLL_CTL_ADD operation.  This new
flag allows for exclusive wakeups when there are multiple epfds attached
to a shared fd event source.

The implementation walks the list of exclusive waiters, and queues an
event to each epfd, until it finds the first waiter that has threads
blocked on it via epoll_wait().  The idea is to search for threads which
are idle and ready to process the wakeup events.  Thus, we queue an event
to at least 1 epfd, but may still potentially queue an event to all epfds
that are attached to the shared fd source.

Performance testing was done by Madars Vitolins using a modified version
of Enduro/X.  The use of the 'EPOLLEXCLUSIVE' flag reduce the length of
this particular workload from 860s down to 24s.

Sample epoll_clt text:

EPOLLEXCLUSIVE

  Sets an exclusive wakeup mode for the epfd file descriptor that is
  being attached to the target file descriptor, fd.  Thus, when an event
  occurs and multiple epfd file descriptors are attached to the same
  target file using EPOLLEXCLUSIVE, one or more epfds will receive an
  event with epoll_wait(2).  The default in this scenario (when
  EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
  EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.
Signed-off-by: NJason Baron <jbaron@akamai.com>
Tested-by: NMadars Vitolins <m@silodev.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Al Viro <viro@ftp.linux.org.uk>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Eric Wong <normalperson@yhbt.net>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

df0108c5

16 1月, 2016 2 次提交

arch/*/include/uapi/asm/mman.h: : let MADV_FREE have same value for all architectures · 21f55b01

由 Chen Gang 提交于 1月 15, 2016

For uapi, need try to let all macros have same value, and MADV_FREE is
added into main branch recently, so need redefine MADV_FREE for it.

At present, '8' can be shared with all architectures, so redefine it to
'8'.

[sudipm.mukherjee@gmail.com: correct uniform value of MADV_FREE]
Signed-off-by: NChen Gang <gang.chen.5i5j@gmail.com>
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NHugh Dickins <hughd@google.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Chris Zankel <chris@zankel.net>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@kernel.org>
Cc: <yalin.wang2010@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: Jason Evans <je@fb.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttil <mika.penttila@nextfour.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Shaohua Li <shli@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NSudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

21f55b01

mm: support madvise(MADV_FREE) · 854e9ed0

由 Minchan Kim 提交于 1月 15, 2016

Linux doesn't have an ability to free pages lazy while other OS already
have been supported that named by madvise(MADV_FREE).

The gain is clear that kernel can discard freed pages rather than
swapping out or OOM if memory pressure happens.

Without memory pressure, freed pages would be reused by userspace
without another additional overhead(ex, page fault + allocation +
zeroing).

Jason Evans said:

: Facebook has been using MAP_UNINITIALIZED
: (https://lkml.org/lkml/2012/1/18/308) in some of its applications for
: several years, but there are operational costs to maintaining this
: out-of-tree in our kernel and in jemalloc, and we are anxious to retire it
: in favor of MADV_FREE.  When we first enabled MAP_UNINITIALIZED it
: increased throughput for much of our workload by ~5%, and although the
: benefit has decreased using newer hardware and kernels, there is still
: enough benefit that we cannot reasonably retire it without a replacement.
:
: Aside from Facebook operations, there are numerous broadly used
: applications that would benefit from MADV_FREE.  The ones that immediately
: come to mind are redis, varnish, and MariaDB.  I don't have much insight
: into Android internals and development process, but I would hope to see
: MADV_FREE support eventually end up there as well to benefit applications
: linked with the integrated jemalloc.
:
: jemalloc will use MADV_FREE once it becomes available in the Linux kernel.
: In fact, jemalloc already uses MADV_FREE or equivalent everywhere it's
: available: *BSD, OS X, Windows, and Solaris -- every platform except Linux
: (and AIX, but I'm not sure it even compiles on AIX).  The lack of
: MADV_FREE on Linux forced me down a long series of increasingly
: sophisticated heuristics for madvise() volume reduction, and even so this
: remains a common performance issue for people using jemalloc on Linux.
: Please integrate MADV_FREE; many people will benefit substantially.

How it works:

When madvise syscall is called, VM clears dirty bit of ptes of the
range.  If memory pressure happens, VM checks dirty bit of page table
and if it found still "clean", it means it's a "lazyfree pages" so VM
could discard the page instead of swapping out.  Once there was store
operation for the page before VM peek a page to reclaim, dirty bit is
set so VM can swap out the page instead of discarding.

One thing we should notice is that basically, MADV_FREE relies on dirty
bit in page table entry to decide whether VM allows to discard the page
or not.  IOW, if page table entry includes marked dirty bit, VM
shouldn't discard the page.

However, as a example, if swap-in by read fault happens, page table
entry doesn't have dirty bit so MADV_FREE could discard the page
wrongly.

For avoiding the problem, MADV_FREE did more checks with PageDirty and
PageSwapCache.  It worked out because swapped-in page lives on swap
cache and since it is evicted from the swap cache, the page has PG_dirty
flag.  So both page flags check effectively prevent wrong discarding by
MADV_FREE.

However, a problem in above logic is that swapped-in page has PG_dirty
still after they are removed from swap cache so VM cannot consider the
page as freeable any more even if madvise_free is called in future.

Look at below example for detail.

    ptr = malloc();
    memset(ptr);
    ..
    ..
    .. heavy memory pressure so all of pages are swapped out
    ..
    ..
    var = *ptr; -> a page swapped-in and could be removed from
                   swapcache. Then, page table doesn't mark
                   dirty bit and page descriptor includes PG_dirty
    ..
    ..
    madvise_free(ptr); -> It doesn't clear PG_dirty of the page.
    ..
    ..
    ..
    .. heavy memory pressure again.
    .. In this time, VM cannot discard the page because the page
    .. has *PG_dirty*

To solve the problem, this patch clears PG_dirty if only the page is
owned exclusively by current process when madvise is called because
PG_dirty represents ptes's dirtiness in several processes so we could
clear it only if we own it exclusively.

Firstly, heavy users would be general allocators(ex, jemalloc, tcmalloc
and hope glibc supports it) and jemalloc/tcmalloc already have supported
the feature for other OS(ex, FreeBSD)

  barrios@blaptop:~/benchmark/ebizzy$ lscpu
  Architecture:          x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
  CPU(s):                12
  On-line CPU(s) list:   0-11
  Thread(s) per core:    1
  Core(s) per socket:    1
  Socket(s):             12
  NUMA node(s):          1
  Vendor ID:             GenuineIntel
  CPU family:            6
  Model:                 2
  Stepping:              3
  CPU MHz:               3200.185
  BogoMIPS:              6400.53
  Virtualization:        VT-x
  Hypervisor vendor:     KVM
  Virtualization type:   full
  L1d cache:             32K
  L1i cache:             32K
  L2 cache:              4096K
  NUMA node0 CPU(s):     0-11
  ebizzy benchmark(./ebizzy -S 10 -n 512)

  Higher avg is better.

   vanilla-jemalloc             MADV_free-jemalloc

  1 thread
  records: 10                   records: 10
  avg:   2961.90                avg:  12069.70
  std:     71.96(2.43%)         std:    186.68(1.55%)
  max:   3070.00                max:  12385.00
  min:   2796.00                min:  11746.00

  2 thread
  records: 10                   records: 10
  avg:   5020.00                avg:  17827.00
  std:    264.87(5.28%)         std:    358.52(2.01%)
  max:   5244.00                max:  18760.00
  min:   4251.00                min:  17382.00

  4 thread
  records: 10                   records: 10
  avg:   8988.80                avg:  27930.80
  std:   1175.33(13.08%)        std:   3317.33(11.88%)
  max:   9508.00                max:  30879.00
  min:   5477.00                min:  21024.00

  8 thread
  records: 10                   records: 10
  avg:  13036.50                avg:  33739.40
  std:    170.67(1.31%)         std:   5146.22(15.25%)
  max:  13371.00                max:  40572.00
  min:  12785.00                min:  24088.00

  16 thread
  records: 10                   records: 10
  avg:  11092.40                avg:  31424.20
  std:    710.60(6.41%)         std:   3763.89(11.98%)
  max:  12446.00                max:  36635.00
  min:   9949.00                min:  25669.00

  32 thread
  records: 10                   records: 10
  avg:  11067.00                avg:  34495.80
  std:    971.06(8.77%)         std:   2721.36(7.89%)
  max:  12010.00                max:  38598.00
  min:   9002.00                min:  30636.00

In summary, MADV_FREE is about much faster than MADV_DONTNEED.

This patch (of 12):

Add core MADV_FREE implementation.

[akpm@linux-foundation.org: small cleanups]
Signed-off-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NHugh Dickins <hughd@google.com>
Cc: Mika Penttil <mika.penttila@nextfour.com>
Cc: Michael Kerrisk <mtk.manpages@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Jason Evans <je@fb.com>
Cc: Daniel Micay <danielmicay@gmail.com>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: Shaohua Li <shli@kernel.org>
Cc: <yalin.wang2010@gmail.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
Cc: "Shaohua Li" <shli@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen Gang <gang.chen.5i5j@gmail.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Darrick J. Wong <darrick.wong@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Helge Deller <deller@gmx.de>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Roland Dreier <roland@kernel.org>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Shaohua Li <shli@kernel.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

854e9ed0

14 1月, 2016 1 次提交

uapi: update install list after nvme.h rename · a9cf8284

由 Mike Frysinger 提交于 1月 10, 2016

Commit 9d99a8dd ("nvme: move hardware structures out of the uapi
version of nvme.h") renamed nvme.h to nvme_ioctl.h, but the uapi list
still refers to nvme.h.  People trying to install the headers hit a
failure as the header no longer exists.

Cc: stable@vger.kernel.org
Signed-off-by: NMike Frysinger <vapier@gentoo.org>
Reviewed-by: NChristoph Hellwig <hch@lst.de>
Signed-off-by: NJens Axboe <axboe@fb.com>

a9cf8284

12 1月, 2016 5 次提交

lightnvm: introduce factory reset · 8b4970c4

由 Matias Bjørling 提交于 1月 12, 2016

Now that a device can be managed using the system blocks, a method to
reset the device is necessary as well. This patch introduces logic to
reset the device easily to factory state and exposes it through an
ioctl.

The ioctl takes the following flags:

  NVM_FACTORY_ERASE_ONLY_USER
      By default all blocks, except host-reserved blocks are erased upon
      factory reset. Instead of this, only erase host-reserved blocks.
  NVM_FACTORY_RESET_HOST_BLKS
      Mark host-reserved blocks to be erased and set their type to free.
  NVM_FACTORY_RESET_GRWN_BBLKS
      Mark "grown bad blocks" to be erased and set their type to free.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

8b4970c4

lightnvm: introduce ioctl to initialize device · 55696154

由 Matias Bjørling 提交于 1月 12, 2016

Based on the previous patch, we now introduce an ioctl to initialize the
device using nvm_init_sysblock and create the necessary system blocks.
The user may specify the media manager that they wish to instantiate on
top. Default from user-space will be "gennvm".
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

55696154

lightnvm: core on-disk initialization · e3eb3799

由 Matias Bjørling 提交于 1月 12, 2016

An Open-Channel SSD shall be initialized before use. To initialize, we
define an on-disk format, that keeps a small set of metadata to bring up
the media manager on top of the device.

The initial step is introduced to allow a user to format the disks for a
given media manager. During format, a system block is stored on one to
three separate luns on the device. Each lun has the system block
duplicated. During initialization, the system block can be retrieved and
the appropriate media manager can initialized.

The on-disk format currently covers (struct nvm_system_block):

 - Magic value "NVMS".
 - Monotonic increasing sequence number.
 - The physical block erase count.
 - Version of the system block format.
 - Media manager type.
 - Media manager superblock physical address.

The interface provides three functions to manage the system block:

 int nvm_init_sysblock(struct nvm_dev *, struct nvm_sb_info *)
 int nvm_get_sysblock(struct nvm *dev, struct nvm_sb_info *)
 int nvm_update_sysblock(struct nvm *dev, struct nvm_sb_info *)

Each implement a part of the logic to manage the system block. The
initialization creates the first system blocks and mark them on the
device. Get retrieves the latest system block by scanning all pages in
the associated system blocks. The update sysblock writes new metadata
and allocates new block if necessary.
Signed-off-by: NMatias Bjørling <m@bjorling.me>
Signed-off-by: NJens Axboe <axboe@fb.com>

e3eb3799

bpf: support ipv6 for bpf_skb_{set,get}_tunnel_key · c6c33454

由 Daniel Borkmann 提交于 1月 11, 2016

After IPv6 support has recently been added to metadata dst and related
encaps, add support for populating/reading it from an eBPF program.

Commit d3aa45ce ("bpf: add helpers to access tunnel metadata") started
with initial IPv4-only support back then (due to IPv6 metadata support
not being available yet).

To stay compatible with older programs, we need to test for the passed
structure size. Also TOS and TTL support from the ip_tunnel_info key has
been added. Tested with vxlan devs in collect meta data mode with IPv4,
IPv6 and in compat mode over different network namespaces.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6c33454

bpf: export helper function flags and reject invalid ones · 781c53bc

由 Daniel Borkmann 提交于 1月 11, 2016

Export flags used by eBPF helper functions through UAPI, so they can be
used by programs (instead of them redefining all flags each time or just
using the hard-coded values). It also gives a better overview what flags
are used where and we can further get rid of the extra macros defined in
filter.c. Moreover, reject invalid flags.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

781c53bc

11 1月, 2016 16 次提交

[media] Postpone the addition of MEDIA_IOC_G_TOPOLOGY · be0270ec

由 Mauro Carvalho Chehab 提交于 12月 28, 2015

There are a few discussions left with regards to this ioctl:

1) the name of the new structs will contain _v2_ on it?
2) what's the best alternative to avoid compat32 issues?

Due to that, let's postpone the addition of this new ioctl to
the next Kernel version, to give people more time to discuss it.
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

be0270ec

[media] uapi/media.h: Use u32 for the number of graph objects · 7c9d6731

由 Mauro Carvalho Chehab 提交于 12月 17, 2015

While we need to keep a u64 alignment to avoid compat32 issues,
having the number of entities/pads/links/interfaces represented
by an u64 is incoherent with the ID number, with is an u32.

In order to make it coherent, change those quantities to u32.
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

7c9d6731

[media] media.h: let be clear that tuners need to use connectors · b3109d66

由 Mauro Carvalho Chehab 提交于 12月 11, 2015

The V4L2 core won't be adding connectors to the tuners and other
entities that need them. Let it be clear.
Suggested-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

b3109d66

[media] media-device: Use u64 ints for pointers · b3b7a9f1

由 Mauro Carvalho Chehab 提交于 12月 11, 2015

By using u64 integers and pointers, we can get rid of compat32
logic. So, let's do it!
Suggested-by: NArnd Bergmann <arnd@arndb.de>
Suggested-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

b3b7a9f1

[media] media-device.h: Let clearer that entity function must be initialized · d1b9da2d

由 Mauro Carvalho Chehab 提交于 12月 11, 2015

Improve the documentation to let it clear that the entity function
must be initialized.
Suggested-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

d1b9da2d

[media] uapi/media.h: Rename entities types to functions · 4ca72efa

由 Mauro Carvalho Chehab 提交于 12月 10, 2015

Rename the userspace types from MEDIA_ENT_T_ to MEDIA_ENT_F_
and add the backward compatibility bits.

The changes at the .c files was generated by the following
coccinelle script:

@@
@@
-MEDIA_ENT_T_UNKNOWN
+MEDIA_ENT_F_UNKNOWN
@@
@@
-MEDIA_ENT_T_DVB_BASE
+MEDIA_ENT_F_DVB_BASE
@@
@@
-MEDIA_ENT_T_V4L2_BASE
+MEDIA_ENT_F_V4L2_BASE
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_BASE
+MEDIA_ENT_F_V4L2_SUBDEV_BASE
@@
@@
-MEDIA_ENT_T_CONNECTOR_BASE
+MEDIA_ENT_F_CONNECTOR_BASE
@@
@@
-MEDIA_ENT_T_V4L2_VIDEO
+MEDIA_ENT_F_IO_V4L
@@
@@
-MEDIA_ENT_T_V4L2_VBI
+MEDIA_ENT_F_IO_VBI
@@
@@
-MEDIA_ENT_T_V4L2_SWRADIO
+MEDIA_ENT_F_IO_SWRADIO
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_UNKNOWN
+MEDIA_ENT_F_V4L2_SUBDEV_UNKNOWN
@@
@@
-MEDIA_ENT_T_CONN_RF
+MEDIA_ENT_F_CONN_RF
@@
@@
-MEDIA_ENT_T_CONN_SVIDEO
+MEDIA_ENT_F_CONN_SVIDEO
@@
@@
-MEDIA_ENT_T_CONN_COMPOSITE
+MEDIA_ENT_F_CONN_COMPOSITE
@@
@@
-MEDIA_ENT_T_CONN_TEST
+MEDIA_ENT_F_CONN_TEST
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_SENSOR
+MEDIA_ENT_F_CAM_SENSOR
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_FLASH
+MEDIA_ENT_F_FLASH
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_LENS
+MEDIA_ENT_F_LENS
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_DECODER
+MEDIA_ENT_F_ATV_DECODER
@@
@@
-MEDIA_ENT_T_V4L2_SUBDEV_TUNER
+MEDIA_ENT_F_TUNER
@@
@@
-MEDIA_ENT_T_DVB_DEMOD
+MEDIA_ENT_F_DTV_DEMOD
@@
@@
-MEDIA_ENT_T_DVB_DEMUX
+MEDIA_ENT_F_TS_DEMUX
@@
@@
-MEDIA_ENT_T_DVB_TSOUT
+MEDIA_ENT_F_IO_DTV
@@
@@
-MEDIA_ENT_T_DVB_CA
+MEDIA_ENT_F_DTV_CA
@@
@@
-MEDIA_ENT_T_DVB_NET_DECAP
+MEDIA_ENT_F_DTV_NET_DECAP
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

4ca72efa

[media] media-device: export the entity function via new ioctl · d87cdb88

由 Mauro Carvalho Chehab 提交于 9月 06, 2015

Now that entities have a main function, expose it via
MEDIA_IOC_G_TOPOLOGY ioctl.

Please notice that some entities may have secundary functions.
Such use case will be addressed later, when we add support for the
Media Controller properties.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

d87cdb88

[media] media.h: create connector entities for hybrid TV devices · 49a11518

由 Mauro Carvalho Chehab 提交于 8月 31, 2015

Add entities to represent the connectors that exists inside a
hybrid TV device.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

49a11518

[media] uapi/media.h: Add MEDIA_IOC_G_TOPOLOGY ioctl · c398bb64

由 Mauro Carvalho Chehab 提交于 8月 23, 2015

Add a new ioctl that will report the entire topology on
one go.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

c398bb64

[media] media.h: don't use legacy entity macros at Kernel · 4376679a

由 Mauro Carvalho Chehab 提交于 8月 29, 2015

Put the legacy MEDIA_ENT_* macros under a #ifndef __KERNEL__,
in order to be sure that none of those old symbols are used
inside the Kernel.
Acked-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

4376679a

[media] media controller: get rid of entity subtype on Kernel · 687b4209

由 Mauro Carvalho Chehab 提交于 5月 07, 2015

Don't use anymore the type/subtype entity data/macros
inside the Kernel.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Acked-by: NLaurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

687b4209

[media] v4l2-subdev: use MEDIA_ENT_T_UNKNOWN for new subdevs · b50bde4e

由 Mauro Carvalho Chehab 提交于 5月 07, 2015

Instead of abusing MEDIA_ENT_T_V4L2_SUBDEV, initialize
new subdev entities as MEDIA_ENT_T_UNKNOWN.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

b50bde4e

[media] media: add macros to check if subdev or V4L2 DMA · fa17b46a

由 Mauro Carvalho Chehab 提交于 8月 21, 2015

As we'll be removing entity subtypes from the Kernel, we need
to provide a way for drivers and core to check if a given
entity is represented by a V4L2 subdev or if it is an V4L2
I/O entity (typically with DMA).

Drivers that create entities that don't belong to any defined subdev
category should use MEDIA_ENT_T_V4L2_SUBDEV_UNKNOWN.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

fa17b46a

[media] uapi/media.h: Fix entity namespace · 32fdc0e1

由 Mauro Carvalho Chehab 提交于 8月 21, 2015

Now that interfaces got created, we need to fix the entity
namespace.

So, let's create a consistent new namespace and add backward
compatibility macros to keep the old namespace preserved.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

32fdc0e1

[media] uapi/media.h: Declare interface types for V4L2 and DVB · a1d2510e

由 Mauro Carvalho Chehab 提交于 8月 20, 2015

Declare the interface types that will be used by the new
G_TOPOLOGY ioctl that will be defined later on.

For now, we need those types, as they'll be used on the
internal structs associated with the new media_interface
graph object defined on the next patch.
Acked-by: NHans Verkuil <hans.verkuil@cisco.com>
Reviewed-by: NJavier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: NMauro Carvalho Chehab <mchehab@osg.samsung.com>

a1d2510e

net, sched: add clsact qdisc · 1f211a1b

由 Daniel Borkmann 提交于 1月 07, 2016

This work adds a generalization of the ingress qdisc as a qdisc holding
only classifiers. The clsact qdisc works on ingress, but also on egress.
In both cases, it's execution happens without taking the qdisc lock, and
the main difference for the egress part compared to prior version of [1]
is that this can be applied with _any_ underlying real egress qdisc (also
classless ones).

Besides solving the use-case of [1], that is, allowing for more programmability
on assigning skb->priority for the mqprio case that is supported by most
popular 10G+ NICs, it also opens up a lot more flexibility for other tc
applications. The main work on classification can already be done at clsact
egress time if the use-case allows and state stored for later retrieval
f.e. again in skb->priority with major/minors (which is checked by most
classful qdiscs before consulting tc_classify()) and/or in other skb fields
like skb->tc_index for some light-weight post-processing to get to the
eventual classid in case of a classful qdisc. Another use case is that
the clsact egress part allows to have a central egress counterpart to
the ingress classifiers, so that classifiers can easily share state (e.g.
in cls_bpf via eBPF maps) for ingress and egress.

Currently, default setups like mq + pfifo_fast would require for this to
use, for example, prio qdisc instead (to get a tc_classify() run) and to
duplicate the egress classifier for each queue. With clsact, it allows
for leaving the setup as is, it can additionally assign skb->priority to
put the skb in one of pfifo_fast's bands and it can share state with maps.
Moreover, we can access the skb's dst entry (f.e. to retrieve tclassid)
w/o the need to perform a skb_dst_force() to hold on to it any longer. In
lwt case, we can also use this facility to setup dst metadata via cls_bpf
(bpf_skb_set_tunnel_key()) without needing a real egress qdisc just for
that (case of IFF_NO_QUEUE devices, for example).

The realization can be done without any changes to the scheduler core
framework. All it takes is that we have two a-priori defined minors/child
classes, where we can mux between ingress and egress classifier list
(dev->ingress_cl_list and dev->egress_cl_list, latter stored close to
dev->_tx to avoid extra cacheline miss for moderate loads). The egress
part is a bit similar modelled to handle_ing() and patched to a noop in
case the functionality is not used. Both handlers are now called
sch_handle_ingress() and sch_handle_egress(), code sharing among the two
doesn't seem practical as there are various minor differences in both
paths, so that making them conditional in a single handler would rather
slow things down.

Full compatibility to ingress qdisc is provided as well. Since both
piggyback on TC_H_CLSACT, only one of them (ingress/clsact) can exist
per netdevice, and thus ingress qdisc specific behaviour can be retained
for user space. This means, either a user does 'tc qdisc add dev foo ingress'
and configures ingress qdisc as usual, or the 'tc qdisc add dev foo clsact'
alternative, where both, ingress and egress classifier can be configured
as in the below example. ingress qdisc supports attaching classifier to any
minor number whereas clsact has two fixed minors for muxing between the
lists, therefore to not break user space setups, they are better done as
two separate qdiscs.

I decided to extend the sch_ingress module with clsact functionality so
that commonly used code can be reused, the module is being aliased with
sch_clsact so that it can be auto-loaded properly. Alternative would have been
to add a flag when initializing ingress to alter its behaviour plus aliasing
to a different name (as it's more than just ingress). However, the first would
end up, based on the flag, choosing the new/old behaviour by calling different
function implementations to handle each anyway, the latter would require to
register ingress qdisc once again under different alias. So, this really begs
to provide a minimal, cleaner approach to have Qdisc_ops and Qdisc_class_ops
by its own that share callbacks used by both.

Example, adding qdisc:

# tc qdisc add dev foo clsact
# tc qdisc show dev foo
qdisc mq 0: root
qdisc pfifo_fast 0: parent :1 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :2 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :3 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc pfifo_fast 0: parent :4 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1
qdisc clsact ffff: parent ffff:fff1

Adding filters (deleting, etc works analogous by specifying ingress/egress):

# tc filter add dev foo ingress bpf da obj bar.o sec ingress
# tc filter add dev foo egress bpf da obj bar.o sec egress
# tc filter show dev foo ingress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[ingress] direct-action
# tc filter show dev foo egress
filter protocol all pref 49152 bpf
filter protocol all pref 49152 bpf handle 0x1 bar.o:[egress] direct-action

A 'tc filter show dev foo' or 'tc filter show dev foo parent ffff:' will
show an empty list for clsact. Either using the parent names (ingress/egress)
or specifying the full major/minor will then show the related filter lists.

Prior work on a mqprio prequeue() facility [1] was done mainly by John Fastabend.

[1] http://patchwork.ozlabs.org/patch/512949/Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1f211a1b

09 1月, 2016 2 次提交

block: enable dax for raw block devices · 5a023cdb

由 Dan Williams 提交于 11月 30, 2015

If an application wants exclusive access to all of the persistent memory
provided by an NVDIMM namespace it can use this raw-block-dax facility
to forgo establishing a filesystem.  This capability is targeted
primarily to hypervisors wanting to provision persistent memory for
guests.  It can be disabled / enabled dynamically via the new BLKDAXSET
ioctl.

Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Dave Chinner <david@fromorbit.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Nkbuild test robot <fengguang.wu@intel.com>
Reviewed-by: NJan Kara <jack@suse.com>
Signed-off-by: NDan Williams <dan.j.williams@intel.com>

5a023cdb

fs: clean up the flags definition in uapi/linux/fs.h · 68ce7bfc

由 Theodore Ts'o 提交于 1月 08, 2016

Add an explanation for the flags used by FS_IOC_[GS]ETFLAGS and remind
people that changes should be revised by linux-fsdevel and linux-api.

Add flags that are used on-disk for ext4, and remove FS_DIRECTIO_FL
since it was used only by gfs2 and support was removed in 2008 in
commit c9f6a6bb ("The ability to mark files for direct i/o access
when opened normally is both unused and pointless, so this patch
removes support for that feature.")  Now we have _two_ remaining flags
left.  But since we want to discourage people from assigning new
flags, that's OK.
Signed-off-by: NTheodore Ts'o <tytso@mit.edu>

68ce7bfc

08 1月, 2016 2 次提交

netfilter: nft_ct: add byte/packet counter support · 48f66c90

由 Florian Westphal 提交于 1月 07, 2016

If the accounting extension isn't present, we'll return a counter
value of 0.
Signed-off-by: NFlorian Westphal <fw@strlen.de>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

48f66c90

netfilter: nf_tables: Add new attributes into nft_set to store user data. · e6d8ecac

由 Carlos Falgueras García 提交于 1月 05, 2016

User data is stored at after 'nft_set_ops' private data into 'data[]'
flexible array. The field 'udata' points to user data and 'udlen' stores
its length.

Add new flag NFTA_SET_USERDATA.
Signed-off-by: NCarlos Falgueras García <carlosfg@riseup.net>
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

e6d8ecac

07 1月, 2016 3 次提交

KVM: s390: implement the RI support of guest · c6e5f166

由 Fan Zhang 提交于 1月 07, 2016

This patch adds runtime instrumentation support for KVM guest. We need to
setup a save area for the runtime instrumentation-controls control block(RICCB)
and implement the necessary interfaces to live migrate the guest settings.

We setup the sie control block in a way, that the runtime
instrumentation instructions of a guest are handled by hardware.

We also add a capability KVM_CAP_S390_RI to make this feature opt-in as
it needs migration support.
Signed-off-by: NFan Zhang <zhangfan@linux.vnet.ibm.com>
Acked-by: NCornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>

c6e5f166

xen/gntdev: add ioctl for grant copy · a4cdb556

由 David Vrabel 提交于 12月 02, 2014

Add IOCTL_GNTDEV_GRANT_COPY to allow applications to copy between user
space buffers and grant references.

This interface is similar to the GNTTABOP_copy hypercall ABI except
the local buffers are provided using a virtual address (instead of a
GFN and offset).  To avoid userspace from having to page align its
buffers the driver will use two or more ops if required.

If the ioctl returns 0, the application must check the status of each
segment with the segments status field.  If the ioctl returns a -ve
error code (EINVAL or EFAULT), the status of individual ops is
undefined.
Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
Reviewed-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>

a4cdb556

cxlflash: Fix to avoid virtual LUN failover failure · d5e26bb1

由 Matthew R. Ochs 提交于 12月 14, 2015

Applications which use virtual LUN's that are backed by a physical LUN
over both adapter ports may experience an I/O failure in the event of a
link loss (e.g. cable pull).

Virtual LUNs may be accessed through one or both ports of the adapter.
This access is encoded in the translation entries that comprise the
virtual LUN and used by the AFU for load-balancing I/O and handling
failover scenarios. In a link loss scenario, even though the AFU is able
to maintain connectivity to the LUN, it is up to the application to
retry the failed I/O. When applications are unaware of the virtual LUN's
underlying topology, they are unable to make a sound decision of when to
retry an I/O and therefore are forced to make their reaction to a failed
I/O absolute. The result is either a failure to retry I/O or increased
latency for scenarios where a retry is pointless.

To remedy this scenario, provide feedback back to the application on
virtual LUN creation as to which ports the LUN may be accessed. LUN's
spanning both ports are candidates for a retry in a presence of an I/O
failure.
Signed-off-by: NMatthew R. Ochs <mrochs@linux.vnet.ibm.com>
Acked-by: NManoj Kumar <manoj@linux.vnet.ibm.com>
Reviewed-by: NUma Krishnan <ukrishn@linux.vnet.ibm.com>
Signed-off-by: NMartin K. Petersen <martin.petersen@oracle.com>

d5e26bb1

06 1月, 2016 2 次提交

drivers: md: use ktime_get_real_seconds() · 9ebc6ef1

由 Deepa Dinamani 提交于 12月 21, 2015

get_seconds() API is not y2038 safe on 32 bit systems and the API
is deprecated. Replace it with calls to ktime_get_real_seconds()
API instead. Change mddev structure types to time64_t accordingly.

32 bit signed timestamps will overflow in the year 2038.

Change the user interface mdu_array_info_s structure timestamps:
ctime and utime values used in ioctls GET_ARRAY_INFO and
SET_ARRAY_INFO to unsigned int. This will extend the field to last
until the year 2106.
The long term plan is to get rid of ctime and utime values in
this structure as this information can be read from the on-disk
meta data directly.

Clamp the tim64_t timestamps to positive values with a max of U32_MAX
when returning from GET_ARRAY_INFO ioctl to accommodate above changes
in the data type of timestamps to unsigned int.

v0.90 on disk meta data uses u32 for maintaining time stamps.
So this will also last until year 2106.
Assumption is that the usage of v0.90 will be deprecated by
year 2106.

Timestamp fields in the on disk meta data for v1.0 version already
use 64 bit data types. Remove the truncation of the bits while
writing to or reading from these from the disk.
Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
Reviewed-by: NArnd Bergmann <arnd@arndb.de>
Signed-off-by: NNeilBrown <neilb@suse.com>

9ebc6ef1

include/uapi/linux/sockios.h: mark SIOCRTMSG unused · 2fbf5758

由 xypron.glpk@gmx.de 提交于 1月 05, 2016

IOCTL SIOCRTMSG does nothing but return EINVAL.

So comment it as unused.

SIOCRTMSG is only used in:
* net/ipv4/af_inet.c
* include/uapi/linux/sockios.h

inet_ioctl calls ip_rt_ioctl.
ip_rt_ioctl only handles SIOCADDRT and SIOCDELRT and returns -EINVAL
otherwise.
Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2fbf5758

05 1月, 2016 2 次提交

soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF · 538950a1

由 Craig Gallek 提交于 1月 04, 2016

Expose socket options for setting a classic or extended BPF program
for use when selecting sockets in an SO_REUSEPORT group.  These options
can be used on the first socket to belong to a group before bind or
on any socket in the group after bind.

This change includes refactoring of the existing sk_filter code to
allow reuse of the existing BPF filter validation checks.
Signed-off-by: NCraig Gallek <kraig@google.com>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

538950a1

netfilter: nf_tables: add forward expression to the netdev family · 39e6dea2

由 Pablo Neira Ayuso 提交于 11月 25, 2015

You can use this to forward packets from ingress to the egress path of
the specified interface. This provides a fast path to bounce packets
from one interface to another specific destination interface.
Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>

39e6dea2

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功