1. 30 4月, 2014 2 次提交
  2. 22 4月, 2014 1 次提交
  3. 18 4月, 2014 2 次提交
    • M
      KVM: VMX: speed up wildcard MMIO EVENTFD · 68c3b4d1
      Michael S. Tsirkin 提交于
      With KVM, MMIO is much slower than PIO, due to the need to
      do page walk and emulation. But with EPT, it does not have to be: we
      know the address from the VMCS so if the address is unique, we can look
      up the eventfd directly, bypassing emulation.
      
      Unfortunately, this only works if userspace does not need to match on
      access length and data.  The implementation adds a separate FAST_MMIO
      bus internally. This serves two purposes:
          - minimize overhead for old userspace that does not use eventfd with lengtth = 0
          - minimize disruption in other code (since we don't know the length,
            devices on the MMIO bus only get a valid address in write, this
            way we don't need to touch all devices to teach them to handle
            an invalid length)
      
      At the moment, this optimization only has effect for EPT on x86.
      
      It will be possible to speed up MMIO for NPT and MMU using the same
      idea in the future.
      
      With this patch applied, on VMX MMIO EVENTFD is essentially as fast as PIO.
      I was unable to detect any measureable slowdown to non-eventfd MMIO.
      
      Making MMIO faster is important for the upcoming virtio 1.0 which
      includes an MMIO signalling capability.
      
      The idea was suggested by Peter Anvin.  Lots of thanks to Gleb for
      pre-review and suggestions.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      68c3b4d1
    • M
      KVM: support any-length wildcard ioeventfd · f848a5a8
      Michael S. Tsirkin 提交于
      It is sometimes benefitial to ignore IO size, and only match on address.
      In hindsight this would have been a better default than matching length
      when KVM_IOEVENTFD_FLAG_DATAMATCH is not set, In particular, this kind
      of access can be optimized on VMX: there no need to do page lookups.
      This can currently be done with many ioeventfds but in a suboptimal way.
      
      However we can't change kernel/userspace ABI without risk of breaking
      some applications.
      Use len = 0 to mean "ignore length for matching" in a more optimal way.
      Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      f848a5a8
  4. 11 4月, 2014 1 次提交
    • K
      NVMe: Retry failed commands with non-fatal errors · edd10d33
      Keith Busch 提交于
      For commands returned with failed status, queue these for resubmission
      and continue retrying them until success or for a limited amount of
      time. The final timeout was arbitrarily chosen so requests can't be
      retried indefinitely.
      
      Since these are requeued on the nvmeq that submitted the command, the
      callbacks have to take an nvmeq instead of an nvme_dev as a parameter
      so that we can use the locked queue to append the iod to retry later.
      
      The nvme_iod conviently can be used to track how long we've been trying
      to successfully complete an iod request. The nvme_iod also provides the
      nvme prp dma mappings, so I had to move a few things around so we can
      keep those mappings.
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      [fixed checkpatch issue with long line]
      Signed-off-by: NMatthew Wilcox <matthew.r.wilcox@intel.com>
      edd10d33
  5. 08 4月, 2014 1 次提交
    • A
      mm, thp: add VM_INIT_DEF_MASK and PRCTL_THP_DISABLE · a0715cc2
      Alex Thorlton 提交于
      Add VM_INIT_DEF_MASK, to allow us to set the default flags for VMs.  It
      also adds a prctl control which allows us to set the THP disable bit in
      mm->def_flags so that VMs will pick up the setting as they are created.
      Signed-off-by: NAlex Thorlton <athorlton@sgi.com>
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a0715cc2
  6. 06 4月, 2014 1 次提交
  7. 04 4月, 2014 1 次提交
  8. 03 4月, 2014 1 次提交
  9. 02 4月, 2014 2 次提交
    • P
      fuse: Turn writeback cache on · 4d99ff8f
      Pavel Emelyanov 提交于
      Introduce a bit kernel and userspace exchange between each-other on
      the init stage and turn writeback on if the userspace want this and
      mount option 'allow_wbcache' is present (controlled by fusermount).
      
      Also add each writable file into per-inode write list and call the
      generic_file_aio_write to make use of the Linux page cache engine.
      Signed-off-by: NMaxim Patlasov <MPatlasov@parallels.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      4d99ff8f
    • P
      HID: uhid: Add UHID_CREATE2 + UHID_INPUT2 · 4522643a
      Petri Gynther 提交于
      UHID_CREATE2:
      HID report descriptor data (rd_data) is an array in struct uhid_create2_req,
      instead of a pointer. Enables use from languages that don't support pointers,
      e.g. Python.
      
      UHID_INPUT2:
      Data array is the last field of struct uhid_input2_req. Enables userspace to
      write only the required bytes to kernel (ev.type + ev.u.input2.size + the part
      of the data array that matters), instead of the entire struct uhid_input2_req.
      
      Note:
      UHID_CREATE2 increases the total size of struct uhid_event slightly, thus
      increasing the size of messages that are queued for userspace. However, this
      won't affect the userspace processing of these events.
      
      [Jiri Kosina <jkosina@suse.cz>: adjust to hid_get_raw_report() and
      				hid_output_raw_report() API changes]
      Signed-off-by: NPetri Gynther <pgynther@google.com>
      Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      4522643a
  10. 01 4月, 2014 4 次提交
    • M
      vfs: add cross-rename · da1ce067
      Miklos Szeredi 提交于
      If flags contain RENAME_EXCHANGE then exchange source and destination files.
      There's no restriction on the type of the files; e.g. a directory can be
      exchanged with a symlink.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
      da1ce067
    • M
      vfs: add RENAME_NOREPLACE flag · 0a7c3937
      Miklos Szeredi 提交于
      If this flag is specified and the target of the rename exists then the
      rename syscall fails with EEXIST.
      
      The VFS does the existence checking, so it is trivial to enable for most
      local filesystems.  This patch only enables it in ext4.
      
      For network filesystems the VFS check is not enough as there may be a race
      between a remote create and the rename, so these filesystems need to handle
      this flag in their ->rename() implementations to ensure atomicity.
      
      Andy writes about why this is useful:
      
      "The trivial answer: to eliminate the race condition from 'mv -i'.
      
      Another answer: there's a common pattern to atomically create a file
      with contents: open a temporary file, write to it, optionally fsync
      it, close it, then link(2) it to the final name, then unlink the
      temporary file.
      
      The reason to use link(2) is because it won't silently clobber the destination.
      
      This is annoying:
       - It requires an extra system call that shouldn't be necessary.
       - It doesn't work on (IMO sensible) filesystems that don't support
      hard links (e.g. vfat).
       - It's not atomic -- there's an intermediate state where both files exist.
       - It's ugly.
      
      The new rename flag will make this totally sensible.
      
      To be fair, on new enough kernels, you can also use O_TMPFILE and
      linkat to achieve the same thing even more cleanly."
      
      Suggested-by: Andy Lutomirski <luto@amacapital.net> 
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Reviewed-by: NJ. Bruce Fields <bfields@redhat.com>
      0a7c3937
    • D
      net-sysfs: expose number of carrier on/off changes · 2d3b479d
      david decotigny 提交于
      This allows to monitor carrier on/off transitions and detect link
      flapping issues:
       - new /sys/class/net/X/carrier_changes
       - new rtnetlink IFLA_CARRIER_CHANGES (getlink)
      
      Tested:
        - grep . /sys/class/net/*/carrier_changes
          + ip link set dev X down/up
          + plug/unplug cable
        - updated iproute2: prints IFLA_CARRIER_CHANGES
        - iproute2 20121211-2 (debian): unchanged behavior
      Signed-off-by: NDavid Decotigny <decot@googlers.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d3b479d
    • F
      net: export NET_ADDR_* values to user-space API · 339e0223
      Florian Fainelli 提交于
      NET_ADDR_* values are exported in the
      /sys/class/net/<iface>/addr_assign_type sysfs attributes, and as such
      constitutes an user-space ABI. Move the NET_ADDR_* definitions from
      include/linux/netdevice.h to include/uapi/linux/netdevice.h
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      339e0223
  11. 29 3月, 2014 1 次提交
  12. 25 3月, 2014 1 次提交
  13. 22 3月, 2014 1 次提交
  14. 21 3月, 2014 3 次提交
  15. 20 3月, 2014 3 次提交
    • A
      audit: Add generic compat syscall support · 4b588411
      AKASHI Takahiro 提交于
      lib/audit.c provides a generic function for auditing system calls.
      This patch extends it for compat syscall support on bi-architectures
      (32/64-bit) by adding lib/compat_audit.c.
      What is required to support this feature are:
       * add asm/unistd32.h for compat system call names
       * select CONFIG_AUDIT_ARCH_COMPAT_GENERIC
      Signed-off-by: NAKASHI Takahiro <takahiro.akashi@linaro.org>
      Acked-by: NRichard Guy Briggs <rgb@redhat.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      4b588411
    • W
      audit: Audit proc/<pid>/cmdline aka proctitle · 3f1c8250
      William Roberts 提交于
      During an audit event, cache and print the value of the process's
      proctitle value (proc/<pid>/cmdline). This is useful in situations
      where processes are started via fork'd virtual machines where the
      comm field is incorrect. Often times, setting the comm field still
      is insufficient as the comm width is not very wide and most
      virtual machine "package names" do not fit. Also, during execution,
      many threads have their comm field set as well. By tying it back to
      the global cmdline value for the process, audit records will be more
      complete in systems with these properties. An example of where this
      is useful and applicable is in the realm of Android. With Android,
      their is no fork/exec for VM instances. The bare, preloaded Dalvik
      VM listens for a fork and specialize request. When this request comes
      in, the VM forks, and the loads the specific application (specializing).
      This was done to take advantage of COW and to not require a load of
      basic packages by the VM on very app spawn. When this spawn occurs,
      the package name is set via setproctitle() and shows up in procfs.
      Many of these package names are longer then 16 bytes, the historical
      width of task->comm. Having the cmdline in the audit records will
      couple the application back to the record directly. Also, on my
      Debian development box, some audit records were more useful then
      what was printed under comm.
      
      The cached proctitle is tied to the life-cycle of the audit_context
      structure and is built on demand.
      
      Proctitle is controllable by userspace, and thus should not be trusted.
      It is meant as an aid to assist in debugging. The proctitle event is
      emitted during syscall audits, and can be filtered with auditctl.
      
      Example:
      type=AVC msg=audit(1391217013.924:386): avc:  denied  { getattr } for  pid=1971 comm="mkdir" name="/" dev="selinuxfs" ino=1 scontext=system_u:system_r:consolekit_t:s0-s0:c0.c255 tcontext=system_u:object_r:security_t:s0 tclass=filesystem
      type=SYSCALL msg=audit(1391217013.924:386): arch=c000003e syscall=137 success=yes exit=0 a0=7f019dfc8bd7 a1=7fffa6aed2c0 a2=fffffffffff4bd25 a3=7fffa6aed050 items=0 ppid=1967 pid=1971 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=(none) ses=4294967295 comm="mkdir" exe="/bin/mkdir" subj=system_u:system_r:consolekit_t:s0-s0:c0.c255 key=(null)
      type=UNKNOWN[1327] msg=audit(1391217013.924:386):  proctitle=6D6B646972002D70002F7661722F72756E2F636F6E736F6C65
      
      Acked-by: Steve Grubb <sgrubb@redhat.com> (wrt record formating)
      Signed-off-by: NWilliam Roberts <wroberts@tresys.com>
      Signed-off-by: NEric Paris <eparis@redhat.com>
      3f1c8250
    • P
      isdn: capi: fix "CAPI_VERSION" comment · 2509671d
      Paul Bolle 提交于
      Signed-off-by: NPaul Bolle <pebolle@tiscali.nl>
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      2509671d
  16. 15 3月, 2014 1 次提交
  17. 13 3月, 2014 7 次提交
  18. 11 3月, 2014 2 次提交
  19. 08 3月, 2014 1 次提交
  20. 07 3月, 2014 3 次提交
  21. 06 3月, 2014 1 次提交
    • J
      netfilter: ipset: add forceadd kernel support for hash set types · 07cf8f5a
      Josh Hunt 提交于
      Adds a new property for hash set types, where if a set is created
      with the 'forceadd' option and the set becomes full the next addition
      to the set may succeed and evict a random entry from the set.
      
      To keep overhead low eviction is done very simply. It checks to see
      which bucket the new entry would be added. If the bucket's pos value
      is non-zero (meaning there's at least one entry in the bucket) it
      replaces the first entry in the bucket. If pos is zero, then it continues
      down the normal add process.
      
      This property is useful if you have a set for 'ban' lists where it may
      not matter if you release some entries from the set early.
      Signed-off-by: NJosh Hunt <johunt@akamai.com>
      Signed-off-by: NJozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      07cf8f5a