1. 26 Nov, 2015 1 commit
    • kvm/x86: Hyper-V synthetic interrupt controller · 5c919412
      Andrey Smetanin committed
      SynIC (synthetic interrupt controller) is a lapic extension,
      which is controlled via MSRs and maintains for each vCPU
       - 16 synthetic interrupt "lines" (SINT's); each can be configured to
         trigger a specific interrupt vector optionally with auto-EOI
         semantics
       - a message page in the guest memory with 16 256-byte per-SINT message
         slots
       - an event flag page in the guest memory with 16 2048-bit per-SINT
         event flag areas
      
      The host triggers a SINT whenever it delivers a new message to the
      corresponding slot or flips an event flag bit in the corresponding area.
      The guest informs the host that it can try delivering a message by
      explicitly asserting EOI in lapic or writing to End-Of-Message (EOM)
      MSR.
      
      The userspace (qemu) triggers interrupts and receives EOM notifications
      via irqfd with resampler; for that, a GSI is allocated for each
      configured SINT, and irq_routing api is extended to support GSI-SINT
      mapping.
      
      Changes v4:
      * added activation of SynIC by vcpu KVM_ENABLE_CAP
      * added per SynIC active flag
      * added deactivation of APICv upon SynIC activation
      
      Changes v3:
      * added KVM_CAP_HYPERV_SYNIC and KVM_IRQ_ROUTING_HV_SINT notes into
      docs
      
      Changes v2:
      * do not use posted interrupts for Hyper-V SynIC AutoEOI vectors
      * add Hyper-V SynIC vectors into EOI exit bitmap
      * Hyper-V SynIC SINT MSR write logic simplified
      Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Denis V. Lunev <den@openvz.org>
      CC: Gleb Natapov <gleb@kernel.org>
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: Roman Kagan <rkagan@virtuozzo.com>
      CC: Denis V. Lunev <den@openvz.org>
      CC: qemu-devel@nongnu.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
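As a rough illustration of the per-vCPU layout described in the SynIC commit above (16 SINTs, a message page of 16 256-byte slots, an event flag page of 16 2048-bit areas), the sketch below computes where a given SINT's slot and flag bit would sit inside each page. The constant and function names are made up for this example and are not the kernel's identifiers; the bit-within-byte ordering is also an assumption.

```python
# Illustrative arithmetic only. Sizes come from the commit text;
# all names here are hypothetical, not kernel identifiers.
SINT_COUNT = 16          # synthetic interrupt lines per vCPU
MSG_SLOT_BYTES = 256     # one message slot per SINT
EVENT_FLAG_BITS = 2048   # one event flag area per SINT

MESSAGE_PAGE_BYTES = SINT_COUNT * MSG_SLOT_BYTES
EVENT_PAGE_BYTES = SINT_COUNT * EVENT_FLAG_BITS // 8


def message_slot_offset(sint: int) -> int:
    """Byte offset of a SINT's message slot inside the message page."""
    if not 0 <= sint < SINT_COUNT:
        raise ValueError("SINT index out of range")
    return sint * MSG_SLOT_BYTES


def event_flag_position(sint: int, flag: int) -> tuple:
    """(byte offset, bit within byte) of one flag inside the event flag page.

    Assumes flags are numbered from the lowest bit of the lowest byte.
    """
    if not 0 <= sint < SINT_COUNT:
        raise ValueError("SINT index out of range")
    if not 0 <= flag < EVENT_FLAG_BITS:
        raise ValueError("flag index out of range")
    bit_index = sint * EVENT_FLAG_BITS + flag
    return bit_index // 8, bit_index % 8
```

With these sizes, both the message page and the event flag page come to 4096 bytes each.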
  2. 06 Nov, 2015 2 commits
    • mm: mlock: add mlock flags to enable VM_LOCKONFAULT usage · b0f205c2
      Eric B Munson committed
      The previous patch introduced a flag that specified pages in a VMA should
      be placed on the unevictable LRU, but they should not be made present when
      the area is created.  This patch adds the ability to set this state via
      the new mlock system calls.
      
      We add MLOCK_ONFAULT for mlock2 and MCL_ONFAULT for mlockall.
      MLOCK_ONFAULT will set the VM_LOCKONFAULT modifier for VM_LOCKED.
      MCL_ONFAULT should be used as a modifier to the two other mlockall flags.
      When used with MCL_CURRENT, all current mappings will be marked with
      VM_LOCKED | VM_LOCKONFAULT.  When used with MCL_FUTURE, the mm->def_flags
      will be marked with VM_LOCKED | VM_LOCKONFAULT.  When used with both
      MCL_CURRENT and MCL_FUTURE, all current mappings and mm->def_flags will be
      marked with VM_LOCKED | VM_LOCKONFAULT.
      
      Prior to this patch, mlockall() unconditionally cleared the
      mm->def_flags any time it was called without MCL_FUTURE.  This behavior is
      maintained after adding MCL_ONFAULT.  If a call to mlockall(MCL_FUTURE) is
      followed by mlockall(MCL_CURRENT), the mm->def_flags will be cleared and
      new VMAs will be unlocked.  This remains true with or without MCL_ONFAULT
      in either mlockall() invocation.
      
      munlock() will unconditionally clear both VMA flags.  munlockall()
      unconditionally clears both VMA flags on all VMAs and in the mm->def_flags
      field.
      Signed-off-by: Eric B Munson <emunson@akamai.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
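The flag rules in the commit message above can be hard to hold in one's head, so here is a toy model of them. It is not kernel code: the `MM` class and helper names are hypothetical, the bit values are arbitrary placeholders rather than the kernel's definitions, and the model tracks only the two lock bits.

```python
# Toy model of the mlockall()/munlockall() flag handling described above.
# Bit values are placeholders chosen for readability, not kernel constants.
MCL_CURRENT, MCL_FUTURE, MCL_ONFAULT = 1, 2, 4
VM_LOCKED, VM_LOCKONFAULT = 0x1, 0x2


class MM:
    """Hypothetical stand-in for an mm: default flags plus per-VMA lock bits."""

    def __init__(self, n_vmas):
        self.def_flags = 0
        self.vma_flags = [0] * n_vmas


def lock_bits(flags):
    """VM_LOCKED, with VM_LOCKONFAULT added when MCL_ONFAULT is requested."""
    bits = VM_LOCKED
    if flags & MCL_ONFAULT:
        bits |= VM_LOCKONFAULT
    return bits


def mlockall(mm, flags):
    # def_flags is cleared whenever MCL_FUTURE is absent,
    # with or without MCL_ONFAULT.
    mm.def_flags = lock_bits(flags) if flags & MCL_FUTURE else 0
    if flags & MCL_CURRENT:
        mm.vma_flags = [lock_bits(flags)] * len(mm.vma_flags)


def munlockall(mm):
    # Both lock bits are dropped everywhere, including def_flags.
    mm.def_flags = 0
    mm.vma_flags = [0] * len(mm.vma_flags)
```

In this model, calling `mlockall(mm, MCL_FUTURE | MCL_ONFAULT)` and then `mlockall(mm, MCL_CURRENT)` leaves `def_flags` at zero, which mirrors the sequence the commit message calls out.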
    • mm: mlock: add new mlock system call · a8ca5d0e
      Eric B Munson committed
      With the refactored mlock code, introduce a new system call for mlock.
      The new call will allow the user to specify what lock states are being
      added.  mlock2 is trivial at the moment, but a follow-on patch will add a
      new mlock state making it useful.
      Signed-off-by: Eric B Munson <emunson@akamai.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Guenter Roeck <linux@roeck-us.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 05 Nov, 2015 2 commits
    • drm: Use userspace compatible type in fourcc_mod_code macro · 6172180c
      Tvrtko Ursulin committed
      __u64 should be used instead of u64.
      
      Feature originally added in:
      
      commit e3eb3250
      Author: Rob Clark <robdclark@gmail.com>
      Date:   Thu Feb 5 14:41:52 2015 +0000
      
          drm: add support for tiled/compressed/etc modifier in addfb2
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Rob Clark <robdclark@gmail.com>
      Cc: Daniel Stone <daniels@collabora.com>
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: dri-devel@lists.freedesktop.org
      Cc: stable@vger.kernel.org # v4.1+
      Fixes: e3eb3250 ("drm: add support for tiled/compressed/etc modifier in addfb2")
      Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Link: http://patchwork.freedesktop.org/patch/msgid/1442999431-28568-1-git-send-email-tvrtko.ursulin@linux.intel.com
      Signed-off-by: Jani Nikula <jani.nikula@intel.com>
    • vfio: Include No-IOMMU mode · 033291ec
      Alex Williamson committed
      There is really no way to safely give a user full access to a DMA
      capable device without an IOMMU to protect the host system.  There is
      also no way to provide DMA translation, for use cases such as device
      assignment to virtual machines.  However, there are still those users
      that want userspace drivers even under those conditions.  The UIO
      driver exists for this use case, but does not provide the degree of
      device access and programming that VFIO has.  In an effort to avoid
      code duplication, this introduces a No-IOMMU mode for VFIO.
      
      This mode requires building VFIO with CONFIG_VFIO_NOIOMMU and enabling
      the "enable_unsafe_noiommu_mode" option on the vfio driver.  This
      should make it very clear that this mode is not safe.  Additionally,
      CAP_SYS_RAWIO privileges are necessary to work with groups and
      containers using this mode.  Groups making use of this support are
      named /dev/vfio/noiommu-$GROUP and can only make use of the special
      VFIO_NOIOMMU_IOMMU for the container.  Use of this mode, specifically
      binding a device without a native IOMMU group to a VFIO bus driver,
      will taint the kernel and should therefore not be considered
      supported.  This patch includes no-iommu support for the vfio-pci bus
      driver only.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
  4. 03 Nov, 2015 2 commits
    • drm/nouveau: remove unnecessary usage of object handles · fcf3f91c
      Ben Skeggs committed
      No longer required in a lot of cases, as objects are identified over NVIF
      via an alternate mechanism since the rework.
      Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
    • bpf: add support for persistent maps/progs · b2197755
      Daniel Borkmann committed
      This work adds support for "persistent" eBPF maps/programs. The term
      "persistent" is to be understood that maps/programs have a facility
      that lets them survive process termination. This is desired by various
      eBPF subsystem users.
      
      Just to name one example: tc classifier/action. Whenever tc parses
      the ELF object, extracts and loads maps/progs into the kernel, these
      file descriptors will be out of reach after the tc instance exits.
      So a subsequent tc invocation won't be able to access/relocate on this
      resource, and therefore maps cannot easily be shared, f.e. between the
      ingress and egress networking data path.
      
      The current workaround is that Unix domain sockets (UDS) need to be
      instrumented in order to pass the created eBPF map/program file
      descriptors to a third party management daemon through UDS' socket
      passing facility. This makes it a bit complicated to deploy shared
      eBPF maps or programs (programs f.e. for tail calls) among various
      processes.
      
      We've been brainstorming on how we could tackle this issue and various
      approaches have been tried out so far, which are described further in
      the reference below.
      
      The architecture we eventually ended up with is a minimal file system
      that can hold map/prog objects. The file system is a per mount namespace
      singleton, and the default mount point is /sys/fs/bpf/. Any subsequent
      mounts within a given namespace will point to the same instance. The
      file system allows for creating a user-defined directory structure.
      The objects for maps/progs are created/fetched through bpf(2) with
      two new commands (BPF_OBJ_PIN/BPF_OBJ_GET). I.e. a bpf file descriptor
      along with a pathname is being passed to bpf(2) that in turn creates
      (we call it eBPF object pinning) the file system nodes. Only the pathname
      is being passed to bpf(2) for getting a new BPF file descriptor to an
      existing node. The user can use that to access maps and progs later on,
      through bpf(2). Removal of file system nodes is being managed through
      normal VFS functions such as unlink(2), etc. The file system code is
      kept to a very minimum and can be further extended later on.
      
      The next step I'm working on is to add dump eBPF map/prog commands
      to bpf(2), so that a specification from a given file descriptor can
      be retrieved. This can be used by things like CRIU but also applications
      can inspect the meta data after calling BPF_OBJ_GET.
      
      Big thanks also to Alexei and Hannes who significantly contributed
      in the design discussion that eventually let us end up with this
      architecture here.
      
      Reference: https://lkml.org/lkml/2015/10/15/925
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
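To make the pin/get description above more concrete, here is a minimal sketch of how the arguments for the two new bpf(2) commands might be laid out from userspace. It only builds the request and does not issue the system call. The class and function names are hypothetical; the two-field layout (a 64-bit user pointer to the pathname followed by a 32-bit fd) and the command numbers 6 and 7 are stated from memory of the kernel's uapi header and should be checked against your own headers.

```python
import ctypes

# Assumed command numbers for the two new bpf(2) commands; verify locally.
BPF_OBJ_PIN = 6
BPF_OBJ_GET = 7


class BpfObjAttr(ctypes.Structure):
    """Hypothetical mirror of the attribute fields used by both commands."""
    _fields_ = [
        ("pathname", ctypes.c_uint64),  # user pointer to a NUL-terminated path
        ("bpf_fd", ctypes.c_uint32),    # fd to pin; left at 0 for BPF_OBJ_GET
    ]


def build_pin_request(path: bytes, fd: int):
    """Pinning passes both a file descriptor and a pathname."""
    buf = ctypes.create_string_buffer(path)  # owns the bytes, appends NUL
    attr = BpfObjAttr(pathname=ctypes.addressof(buf), bpf_fd=fd)
    return BPF_OBJ_PIN, attr, buf            # caller must keep buf alive


def build_get_request(path: bytes):
    """Getting passes only the pathname; the call would return a new fd."""
    buf = ctypes.create_string_buffer(path)
    attr = BpfObjAttr(pathname=ctypes.addressof(buf), bpf_fd=0)
    return BPF_OBJ_GET, attr, buf
```

The path buffer is returned alongside the attribute structure because the structure holds only a raw address; dropping the buffer early would leave that address dangling.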
  5. 01 Nov, 2015 2 commits
  6. 30 Oct, 2015 3 commits
  7. 29 Oct, 2015 1 commit
    • lightnvm: Support for Open-Channel SSDs · cd9e9808
      Matias Bjørling committed
      Open-channel SSDs are devices that share responsibilities with the host
      in order to implement and maintain features that typical SSDs keep
      strictly in firmware. These include (i) the Flash Translation Layer
      (FTL), (ii) bad block management, and (iii) hardware units such as the
      flash controller, the interface controller, and large amounts of flash
      chips. In this way, Open-channel SSDs expose direct access to their
      physical flash storage, while keeping a subset of the internal features
      of SSDs.

      LightNVM is a specification that gives support to Open-channel SSDs.
      LightNVM allows the host to manage data placement, garbage collection,
      and parallelism. Device specific responsibilities such as bad block
      management, FTL extensions to support atomic IOs, or metadata
      persistence are still handled by the device.
      
      The implementation of LightNVM consists of two parts: core and
      (multiple) targets. The core implements functionality shared across
      targets. This includes initialization, teardown and statistics. The targets
      implement the interface that exposes physical flash to user-space
      applications. Examples of such targets include key-value store,
      object-store, as well as traditional block devices, which can be
      application-specific.
      
      Contributions in this patch from:
      
        Javier Gonzalez <jg@lightnvm.io>
        Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
        Jesper Madsen <jmad@itu.dk>
      Signed-off-by: Matias Bjørling <m@bjorling.me>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  8. 28 Oct, 2015 2 commits
    • seccomp, ptrace: add support for dumping seccomp filters · f8e529ed
      Tycho Andersen committed
      This patch adds support for dumping a process' (classic BPF) seccomp
      filters via ptrace.
      
      PTRACE_SECCOMP_GET_FILTER allows the tracer to dump the user's classic BPF
      seccomp filters. addr should be an integer which represents the ith seccomp
      filter (0 is the most recently installed filter). data should be a struct
      sock_filter * with enough room for the ith filter, or NULL, in which case
      the filter is not saved. The return value for this command is the number of
      BPF instructions the program represents, or negative in the case of errors.
      Command specific errors are ENOENT, which indicates that there is no ith
      filter in this seccomp tree, and EMEDIUMTYPE, which indicates that the ith
      filter was not installed as a classic BPF filter.
      
      A caveat with this approach is that there is no way to get explicitly at
      the hierarchy of seccomp filters, and users need to memcmp() filters to
      decide which are inherited. This means that a task which installs two of
      the same filter can potentially confuse users of this interface.
      
      v2: * make save_orig const
          * check that the orig_prog exists (not necessary right now, but when
             grows eBPF support it will be)
          * s/n/filter_off and make it an unsigned long to match ptrace
          * count "down" the tree instead of "up" when passing a filter offset
      
      v3: * don't take the current task's lock for inspecting its seccomp mode
          * use a 0x42** constant for the ptrace command value
      
      v4: * don't copy to userspace while holding spinlocks
      
      v5: * add another condition to WARN_ON
      
      v6: * rebase on net-next
      Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      CC: Will Drewry <wad@chromium.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      CC: Andy Lutomirski <luto@amacapital.net>
      CC: Pavel Emelyanov <xemul@parallels.com>
      CC: Serge E. Hallyn <serge.hallyn@ubuntu.com>
      CC: Alexei Starovoitov <ast@kernel.org>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
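The calling convention described for PTRACE_SECCOMP_GET_FILTER above suggests a two-step pattern: ask for the instruction count with no buffer, then call again with a buffer of that size. The sketch below models that pattern with a pure-Python stand-in; `seccomp_get_filter` and `dump_filter` are hypothetical helpers, not a ptrace binding, and a filter that was not installed as classic BPF is represented here by `None`.

```python
import errno

# EMEDIUMTYPE is Linux-specific; fall back to its Linux value if absent.
EMEDIUMTYPE = getattr(errno, "EMEDIUMTYPE", 124)


def seccomp_get_filter(filters, index, buf=None):
    """Toy stand-in for the ptrace request (hypothetical, not a real binding).

    filters: installed filters, oldest first; each entry is a list of
             instructions, or None if it was not installed as classic BPF.
    index:   0 selects the most recently installed filter.
    buf:     optional list to fill; None plays the role of a NULL data pointer.
    Returns the number of instructions in the selected filter.
    """
    if index < 0 or index >= len(filters):
        raise OSError(errno.ENOENT, "no filter at this index")
    prog = filters[len(filters) - 1 - index]
    if prog is None:
        raise OSError(EMEDIUMTYPE, "filter is not classic BPF")
    if buf is not None:
        buf[:] = prog
    return len(prog)


def dump_filter(filters, index):
    """Size query first, then the actual copy into a buffer of that size."""
    count = seccomp_get_filter(filters, index)
    buf = [None] * count
    seccomp_get_filter(filters, index, buf)
    return buf
```

Because the interface returns flat instruction lists without parent links, a caller that wants to know which filters are inherited still has to compare dumps, as the commit message notes.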
    • Input: add userio module · 5523662e
      Stephen Chandler Paul committed
      Debugging input devices, specifically laptop touchpads, can be tricky
      without having the physical device handy. Here we try to remedy that
      with userio. This module allows an application to connect to a character
      device provided by the kernel, and emulate any serio device. In
      combination with userspace programs that can record PS/2 devices and
      replay them through the /dev/userio device, this allows developers to
      debug driver issues on the PS/2 level with devices simply by requesting
      a recording from the user experiencing the issue without having to have
      the physical hardware in front of them.
      Signed-off-by: Stephen Chandler Paul <cpaul@redhat.com>
      Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
  9. 27 Oct, 2015 6 commits
  10. 26 Oct, 2015 1 commit
    • mmc: block: Add new ioctl to send multi commands · a5f5774c
      Jon Hunter committed
      Certain eMMC devices allow vendor specific device information to be read
      via a sequence of vendor commands. These vendor commands must be issued
      in sequence and in an atomic fashion. One way to support this would be to
      add an ioctl function for sending a sequence of commands to the device
      atomically as proposed here. These multi commands are a simple array of
      the existing mmc_ioc_cmd structure.
      
      The structure passed via the ioctl uses a __u64 type to specify the number
      of commands (so that the structure is aligned on a 64-bit boundary) and a
      zero length array as a header for the list of commands to be issued. The
      maximum number of commands that can be sent is determined by
      MMC_IOC_MAX_CMDS (which defaults to 255 and should be more than
      sufficient).
      
      This is based upon work by Seshagiri Holi <sholi@nvidia.com>.
      Signed-off-by: Seshagiri Holi <sholi@nvidia.com>
      Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
      Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
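The structure described above (a __u64 command count followed by an array of command structures) can be sketched as a simple byte-packing helper. This is only an illustration: `pack_multi_cmd` is a hypothetical name, each command is treated as an opaque, already-packed blob rather than a real mmc_ioc_cmd, and native byte order is assumed for the count.

```python
import struct

MMC_IOC_MAX_CMDS = 255  # limit quoted in the commit message


def pack_multi_cmd(cmds):
    """Prefix a list of pre-packed command blobs with a 64-bit count.

    cmds: list of bytes objects, one per command, all of the same size.
    """
    if len(cmds) > MMC_IOC_MAX_CMDS:
        raise ValueError("too many commands for one multi-command request")
    if len({len(c) for c in cmds}) > 1:
        raise ValueError("all command blobs must have the same size")
    header = struct.pack("=Q", len(cmds))  # 8-byte count keeps 64-bit alignment
    return header + b"".join(cmds)
```

Using an 8-byte count means the first command starts at offset 8, which is the alignment property the commit message gives as the reason for choosing __u64.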
  11. 24 Oct, 2015 5 commits
    • raid5: add basic stripe log · f6bed0ef
      Shaohua Li committed
      This introduces a simple log for raid5. Data/parity writes to the raid
      array are first written to the log, then written to the raid array disks.
      If a crash happens, we can recover data from the log. This can speed up
      raid resync and fix the write hole issue.
      
      The log structure is pretty simple. Data/meta data is stored in block
      unit, which is 4k generally. It has only one type of meta data block.
      The meta data block can track 3 types of data, stripe data, stripe
      parity and flush block. MD superblock will point to the last valid
      meta data block. Each meta data block has checksum/seq number, so
      recovery can scan the log correctly. We store a checksum of stripe
      data/parity to the metadata block, so meta data and stripe data/parity
      can be written to the log disk together. Otherwise, the meta data write
      must wait until the stripe data/parity write is finished.
      
      For stripe data, meta data block will record stripe data sector and
      size. Currently the size is always 4k. This meta data record can be made
      simpler if we just fix write hole (eg, we can record data of a stripe's
      different disks together), but this format can be extended to support
      caching in the future, which must record data address/size.
      
      For stripe parity, meta data block will record stripe sector. Its
      size should be 4k (for raid5) or 8k (for raid6). We always store p
      parity first. This format should work for caching too.
      
      A flush block indicates that a stripe is on the raid array disks. Fixing
      the write hole doesn't need this type of meta data; it's for the caching
      extension.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
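The recovery idea in the commit above (each meta data block carries a checksum and a sequence number, so a scan can tell where the valid log ends) can be sketched as follows. The block layout here is invented for the example and is not the kernel's on-disk format; `zlib.crc32` is used only as a convenient stand-in checksum, and `make_meta` and `scan_log` are hypothetical helpers.

```python
import struct
import zlib


def make_meta(seq: int, payload: bytes) -> bytes:
    """Build a toy meta block: 4-byte checksum, 8-byte sequence, payload."""
    body = struct.pack("<Q", seq) + payload
    return struct.pack("<I", zlib.crc32(body)) + body


def scan_log(blocks, start_seq: int):
    """Return sequence numbers of valid blocks, stopping at the first bad one.

    A block is accepted only if its checksum matches its contents and its
    sequence number is the next one expected.
    """
    valid = []
    expected = start_seq
    for blk in blocks:
        if len(blk) < 12:
            break
        (csum,) = struct.unpack_from("<I", blk, 0)
        body = blk[4:]
        (seq,) = struct.unpack_from("<Q", body, 0)
        if csum != zlib.crc32(body) or seq != expected:
            break
        valid.append(seq)
        expected += 1
    return valid
```

Stopping at the first mismatch reflects the assumption that anything after a torn or stale block was never fully committed to the log.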
    • md: override md superblock recovery_offset for journal device · 3069aa8d
      Shaohua Li committed
      Journal device stores data in a log structure. We need to record the log
      start. Here we override md superblock recovery_offset for this purpose.
      This field of a journal device is meaningless otherwise.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • MD: add a new disk role to present write journal device · bac624f3
      Song Liu committed
      The next patches will use a disk as a raid5/6 journal. We need a new disk
      role to represent the journal device and add MD_FEATURE_JOURNAL to
      feature_map for backward compatibility.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
    • MD: replace special disk roles with macros · c4d4c91b
      Song Liu committed
      Add the following two macros for special roles: spare and faulty
      
      MD_DISK_ROLE_SPARE	0xffff
      MD_DISK_ROLE_FAULTY	0xfffe
      
      Add MD_DISK_ROLE_MAX	0xff00 as the maximal possible regular role,
      and minimal value of special role.
      Signed-off-by: Song Liu <songliubraving@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
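As a small illustration of how the three macro values above partition the 16-bit role space, the sketch below classifies a role number. `describe_role` is a hypothetical helper, not a kernel function, and treating 0xff00 itself as the first non-regular value is an assumption, since the commit message describes it as both the maximal regular role and the minimal special one.

```python
# Values quoted in the commit message.
MD_DISK_ROLE_SPARE = 0xFFFF
MD_DISK_ROLE_FAULTY = 0xFFFE
MD_DISK_ROLE_MAX = 0xFF00


def describe_role(role: int) -> str:
    """Classify a 16-bit disk role (hypothetical helper)."""
    if not 0 <= role <= 0xFFFF:
        raise ValueError("role must fit in 16 bits")
    if role == MD_DISK_ROLE_SPARE:
        return "spare"
    if role == MD_DISK_ROLE_FAULTY:
        return "faulty"
    if role < MD_DISK_ROLE_MAX:   # assumption: boundary value is not regular
        return "regular"
    return "reserved"
```

Named constants make comparisons such as `role == MD_DISK_ROLE_SPARE` self-explanatory where the bare values 0xffff and 0xfffe were not.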
    • i2c-dev: Fix typo in ioctl name reference · c57d3e7a
      Jean Delvare committed
      The ioctl is named I2C_RDWR for "I2C read/write". But references to it
      were misspelled "rdrw". Fix them.
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
  12. 23 Oct, 2015 3 commits
  13. 22 Oct, 2015 6 commits
  14. 21 Oct, 2015 4 commits