1. 15 Jan, 2010 1 commit
    • vhost_net: a kernel-level virtio server · 3a4d5c94
      Michael S. Tsirkin committed
      What it is: vhost net is a character device that can be used to reduce
      the number of system calls involved in virtio networking.
      Existing virtio net code is used in the guest without modification.
      
      There's similarity with vringfd, with some differences and reduced scope
      - uses eventfd for signalling
      - structures can be moved around in memory at any time (good for
        migration, bug work-arounds in userspace)
      - write logging is supported (good for migration)
      - support memory table and not just an offset (needed for kvm)
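The first bullet above refers to eventfd. As a rough userspace illustration of that primitive only (this is not vhost code, and the helper name `eventfd_kick_and_drain` is made up for this sketch), each write adds to a 64-bit counter and a single read drains it:

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: signal an eventfd `kicks` times, then drain it once.
 * Returns the counter value observed by the reader, or 0 if nothing was
 * pending or a call failed.
 */
static uint64_t eventfd_kick_and_drain(unsigned kicks)
{
    uint64_t one = 1, seen = 0;
    /* Non-blocking, so a read on an empty counter fails instead of hanging. */
    int fd = eventfd(0, EFD_NONBLOCK);

    if (fd < 0)
        return 0;

    for (unsigned i = 0; i < kicks; i++) {
        /* Each write adds its 8-byte value to the kernel-side counter. */
        if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
            close(fd);
            return 0;
        }
    }

    /* One read returns the accumulated count and resets it to zero. */
    if (read(fd, &seen, sizeof(seen)) != (ssize_t)sizeof(seen))
        seen = 0;

    close(fd);
    return seen;
}
```

Several signals collapse into one wakeup for the reader, which is part of why an eventfd is a cheap notification channel.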
      
      common virtio related code has been put in a separate file vhost.c and
      can be made into a separate module if/when more backends appear.  I used
      Rusty's lguest.c as the source for developing this part : this supplied
      me with witty comments I wouldn't be able to write myself.
      
      What it is not: vhost net is not a bus, and not a generic new system
      call. No assumptions are made on how guest performs hypercalls.
      Userspace hypervisors are supported as well as kvm.
      
      How it works: Basically, we connect virtio frontend (configured by
      userspace) to a backend. The backend could be a network device, or a tap
      device.  Backend is also configured by userspace, including vlan/mac
      etc.
      
      Status: This works for me, and I haven't seen any crashes.
      Compared to userspace, people reported improved latency (as I save up to
      4 system calls per packet), as well as better bandwidth and CPU
      utilization.
      
      Features that I plan to look at in the future:
      - mergeable buffers
      - zero copy
      - scalability tuning: figure out the best threading model to use
      
      Note on RCU usage (this is also documented in vhost.h, near
      private_pointer which is the value protected by this variant of RCU):
      what is happening is that the rcu_dereference() is being used in a
      workqueue item.  The role of rcu_read_lock() is taken on by the start of
      execution of the workqueue item, of rcu_read_unlock() by the end of
      execution of the workqueue item, and of synchronize_rcu() by
      flush_workqueue()/flush_work(). In the future we might need to apply
      some gcc attribute or sparse annotation to the function passed to
      INIT_WORK(). Paul's ack below is for this RCU usage.
      
      (Includes fixes by Alan Cox <alan@linux.intel.com>,
      David L Stevens <dlstevens@us.ibm.com>,
      Chris Wright <chrisw@redhat.com>)
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 12 Dec, 2009 2 commits
  3. 30 Oct, 2009 1 commit
    • define convenient securebits masks for prctl users (v2) · 5975c725
      Serge E. Hallyn committed
      Hi James, would you mind taking the following into
      security-testing?
      
      The securebits are used by passing them to prctl with the
      PR_{S,G}ET_SECUREBITS commands.  But the defines must be
      shifted to be used in prctl, which begs to be confused and
      misused by userspace.  So define some more convenient
      values for userspace to specify.  This way userspace does
      
      	prctl(PR_SET_SECUREBITS, SECBIT_NOROOT);
      
      instead of
      
      	prctl(PR_SET_SECUREBITS, 1 << SECURE_NOROOT);
      
      (Thanks to Michael for the idea)
      
      This patch also adds include/linux/securebits to the installed headers.
      Then perhaps it can be included by glibc's sys/prctl.h.
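A small userspace sketch of what the convenience masks buy, assuming the installed `linux/securebits.h` provides both the `SECURE_*` bit numbers and the `SECBIT_*` masks as this patch describes. The helper names are invented for illustration:

```c
#include <linux/securebits.h>
#include <sys/prctl.h>

/* Returns 1 if the convenience mask equals the hand-shifted bit number. */
static int noroot_mask_matches_shift(void)
{
    return SECBIT_NOROOT == (1 << SECURE_NOROOT);
}

/*
 * Reads the calling thread's securebits and reports whether NOROOT is set.
 * Returns 1 or 0, or -1 if prctl() fails.
 */
static int noroot_is_set(void)
{
    int bits = prctl(PR_GET_SECUREBITS, 0, 0, 0, 0);

    if (bits < 0)
        return -1;
    return (bits & SECBIT_NOROOT) != 0;
}
```

Reading the bits needs no privilege. Setting them with PR_SET_SECUREBITS normally requires CAP_SETPCAP, so the sketch only reads.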
      
      Changelog:
      	Oct 29: Stephen Rothwell points out that issecure can
      		be under __KERNEL__.
      	Oct 14: (Suggestions by Michael Kerrisk):
      		1. spell out SETUID in SECBIT_NO_SETUID*
      		2. SECBIT_X_LOCKED does not imply SECBIT_X
      		3. add definitions for keepcaps
      	Oct 14: As suggested by Michael Kerrisk, don't
      		use SB_* as that convention is already in
      		use.  Use SECBIT_ prefix instead.
      Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
      Acked-by: Andrew G. Morgan <morgan@kernel.org>
      Acked-by: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Cc: James Morris <jmorris@namei.org>
      Signed-off-by: James Morris <jmorris@namei.org>
  4. 22 Oct, 2009 1 commit
    • virtio: let header files include virtio_ids.h · e95646c3
      Christian Borntraeger committed
      Rusty,
      
      commit 3ca4f5ca
          virtio: add virtio IDs file
      moved all device IDs into a single file. While the change itself is
      a very good one, it can break userspace applications. For example
      if a userspace tool wanted to get the ID of virtio_net it used to
      include virtio_net.h. This no longer works, since virtio_net.h
      does not include virtio_ids.h.
      This patch moves all "#include <linux/virtio_ids.h>" from the C
      files into the header files, making the header files compatible with
      the old ones.
      
      In addition, this patch exports virtio_ids.h to userspace.
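The userspace case described above can be sketched as follows, assuming the installed `linux/virtio_net.h` pulls in `linux/virtio_ids.h` as this patch intends. The function name is hypothetical:

```c
/* Only the device header is included; the ID is expected to come with it. */
#include <linux/virtio_net.h>

/* Returns the virtio device ID for network devices. */
static int virtio_net_device_id(void)
{
    return VIRTIO_ID_NET;
}
```

If the header did not include `virtio_ids.h`, this would fail to compile with an undeclared `VIRTIO_ID_NET`, which is the breakage the commit message describes.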
      
      CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
  5. 06 Oct, 2009 1 commit
  6. 10 Sep, 2009 1 commit
  7. 22 Jun, 2009 1 commit
    • dm raid1: add userspace log · f5db4af4
      Jonathan Brassow committed
      This patch contains a device-mapper mirror log module that forwards
      requests to userspace for processing.
      
      The structures used for communication between kernel and userspace are
      located in include/linux/dm-log-userspace.h.  Due to the frequency,
      diversity, and 2-way communication nature of the exchanges between
      kernel and userspace, 'connector' was chosen as the interface for
      communication.
      
      The first log implementations written in userspace - "clustered-disk"
      and "clustered-core" - support clustered shared storage.   A userspace
      daemon (in the LVM2 source code repository) uses openAIS/corosync to
      process requests in an ordered fashion with the rest of the nodes in the
      cluster so as to prevent log state corruption.  Other implementations
      with no association to LVM or openAIS/corosync, are certainly possible.
      
      (Imagine if two machines are writing to the same region of a mirror.
      They would both mark the region dirty, but you need a cluster-aware
      entity that can handle properly marking the region clean when they are
      done.  Otherwise, you might clear the region when the first machine is
      done, not the second.)
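The parenthetical scenario can be made concrete with a toy model. This is purely an illustrative sketch, not the dm-log-userspace protocol or the LVM2 daemon's logic: the structure, the function names, and the one-bit-per-node representation (at most 32 nodes) are all assumptions made for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-region state: one bit per node with writes in flight. */
struct region_state {
    uint32_t dirty_nodes;
};

/* A node announces it is writing to the region (node must be below 32). */
static void region_mark(struct region_state *r, unsigned node)
{
    r->dirty_nodes |= UINT32_C(1) << node;
}

/*
 * A node announces it is done. Returns true only when no other node still
 * has the region dirty, i.e. when it is safe to record the region as clean.
 */
static bool region_clear(struct region_state *r, unsigned node)
{
    r->dirty_nodes &= ~(UINT32_C(1) << node);
    return r->dirty_nodes == 0;
}
```

With two writers, the first `region_clear()` returns false and only the second returns true, which is the ordering a cluster-unaware log would get wrong.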
      Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
      Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  8. 19 Jun, 2009 1 commit
    • LinuxPPS: core support · eae9d2ba
      Rodolfo Giometti committed
      This patch adds the kernel side of the PPS support currently named
      "LinuxPPS".
      
      PPS means "pulse per second" and a PPS source is just a device which
      provides a high precision signal each second so that an application can
      use it to adjust system clock time.
      
      Common use is the combination of the NTPD as userland program with a GPS
      receiver as PPS source to obtain a wallclock-time with sub-millisecond
      synchronisation to UTC.
      
      To achieve this, userland programs should use the PPS API
      specification (RFC 2783 - Pulse-Per-Second API for UNIX-like Operating
      Systems, Version 1.0), which this patch implements in part.  It
      provides a set of char devices, one per PPS source, which can be used to
      get the time signal.  The RFC's functions can be implemented by accessing
      these char devices.
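A minimal sketch of fetching one event from such a char device, assuming the `PPS_FETCH` ioctl and `struct pps_fdata` from the installed `linux/pps.h`. The helper name is invented, and the device path (for example `/dev/pps0`) depends on the system:

```c
#include <fcntl.h>
#include <linux/pps.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_sec seconds for the next event on
 * the PPS char device at `path` and store the assert timestamp in *out.
 * Returns 0 on success, -1 if the device cannot be opened or the fetch fails.
 */
static int pps_fetch_assert(const char *path, long timeout_sec,
                            struct pps_ktime *out)
{
    struct pps_fdata data;
    int fd, rc;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* Zeroed flags mean the timeout field is valid and should be honoured. */
    memset(&data, 0, sizeof(data));
    data.timeout.sec = timeout_sec;

    rc = ioctl(fd, PPS_FETCH, &data);
    close(fd);
    if (rc < 0)
        return -1;

    *out = data.info.assert_tu;
    return 0;
}
```

An RFC 2783 function such as `time_pps_fetch()` could be layered on a call like this in a userspace library.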
      Signed-off-by: Rodolfo Giometti <giometti@linux.it>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Greg KH <greg@kroah.com>
      Cc: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Kay Sievers <kay.sievers@vrfy.org>
      Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 12 Jun, 2009 1 commit
  10. 04 Jun, 2009 1 commit
    • rfkill: rewrite · 19d337df
      Johannes Berg committed
      This patch completely rewrites the rfkill core to address
      the following deficiencies:
      
       * all rfkill drivers need to implement polling where necessary
         rather than having one central implementation
      
       * updating the rfkill state cannot be done from arbitrary
         contexts, forcing drivers to use schedule_work and requiring
         lots of code
      
       * rfkill drivers need to keep track of soft/hard blocked
         internally -- the core should do this
      
       * the rfkill API has many unexpected quirks, for example being
         asymmetric wrt. alloc/free and register/unregister
      
       * rfkill can call back into a driver from within a function the
         driver called -- this is prone to deadlocks and generally
         should be avoided
      
       * rfkill-input is pointlessly a separate module
      
       * drivers need to #ifdef rfkill functions (unless they want to
         depend on or select RFKILL) -- rfkill should provide inlines
         that do nothing if it isn't compiled in
      
       * the rfkill structure is not opaque -- drivers need to initialise
         it correctly (lots of sanity checking code required) -- instead
         force drivers to pass the right variables to rfkill_alloc()
      
       * the documentation is hard to read because it always assumes the
         reader is completely clueless and contains way TOO MANY CAPS
      
       * the rfkill code needlessly uses a lot of locks and atomic
         operations in locked sections
      
       * fix LED trigger to actually change the LED when the radio state
         changes -- this wasn't done before
      Tested-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
      Signed-off-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br> [thinkpad]
      Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
      Signed-off-by: John W. Linville <linville@tuxdriver.com>
  11. 09 May, 2009 1 commit
  12. 03 Apr, 2009 1 commit
  13. 30 Mar, 2009 2 commits
  14. 14 Mar, 2009 1 commit
  15. 01 Mar, 2009 1 commit
  16. 03 Feb, 2009 1 commit
    • net: Fix userland breakage wrt. linux/if_tunnel.h · 0afd4a21
      David S. Miller committed
      Reported by Andrew Walrond <andrew@walrond.org>
      
      Changeset c19e654d
      ("gre: Add netlink interface") added an include
      of linux/ip.h to linux/if_tunnel.h
      
      We can't really let that get exposed to userspace
      because this conflicts with types defined in netinet/ip.h
      which userland is almost certainly going to have included
      either explicitly or implicitly.
      
      So guard this include with a __KERNEL__ ifdef.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 30 Jan, 2009 1 commit
  18. 08 Jan, 2009 1 commit
  19. 07 Jan, 2009 1 commit
  20. 06 Jan, 2009 2 commits
  21. 12 Dec, 2008 1 commit
  22. 20 Oct, 2008 1 commit
  23. 11 Oct, 2008 1 commit
  24. 09 Oct, 2008 1 commit
    • include blktrace_api.h in headers_install · c0ddffa8
      Sven Schuetz committed
      This header file is of interest for user space programming, i.e.
      for tools that process blktrace data.
      
      We would like to use it for a tool on-top of blktrace which processes
      data provided by blktrace. For this purpose, it would be helpful
      if the blktrace API would make it to /usr/include/linux.
      
      The git tree for the blktrace tools comes with its own copy of this header
      file. I didn't manage to replace that copy with the file generated
      by the patch below yet. A few more cleanups would be needed.
      For example, the blktrace ioctl numbers, which are currently defined in
      usr/include/fs.h, might need to be moved. Should be feasible, though.
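As an illustration of what a tool built on the exported header might do, the sketch below validates a trace record's magic field. It assumes `struct blk_io_trace`, `BLK_IO_TRACE_MAGIC`, and `BLK_IO_TRACE_VERSION` from the installed `linux/blktrace_api.h`, and that the record is already in host byte order. The helper names are made up:

```c
#include <linux/blktrace_api.h>

/* The upper 24 bits of `magic` carry the signature, the low 8 the version. */
static int blk_trace_magic_ok(const struct blk_io_trace *t)
{
    return (t->magic & 0xffffff00u) == BLK_IO_TRACE_MAGIC;
}

/* Extracts the format version stored in the low byte of `magic`. */
static unsigned blk_trace_version(const struct blk_io_trace *t)
{
    return t->magic & 0xffu;
}
```

A reader would typically run such a check on each record before trusting the remaining fields.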
      Signed-off-by: Sven Schuetz <sven@linux.vnet.ibm.com>
      Signed-off-by: Martin Peschke <mp3@de.ibm.com>
      Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
  25. 30 Sep, 2008 1 commit
  26. 05 Sep, 2008 1 commit
  27. 30 Aug, 2008 1 commit
    • net: Unbreak userspace usage of linux/mroute.h · 7c19a3d2
      David S. Miller committed
      Nothing in linux/pim.h should be exported to userspace.
      
      This should fix the XORP build failure reported by
      Jose Calhariz, the Debian package maintainer.
      
      Nothing originally in linux/mroute.h was exported to userspace
      ever, but some of this stuff started to be when it was moved into
      this new linux/pim.h, and that was wrong.  If we didn't provide these
      definitions for 10 years we can reasonably expect that applications
      defined this stuff locally or used GLIBC headers providing the
      protocol definitions.  And as such the only result of this can
      be conflict and userland build breakage.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  28. 17 Aug, 2008 1 commit
  29. 12 Aug, 2008 1 commit
  30. 01 Aug, 2008 1 commit
    • ipvs: Move userspace definitions to include/linux/ip_vs.h · bc4768eb
      Julius Volz committed
      Current versions of ipvsadm include "/usr/src/linux/include/net/ip_vs.h"
      directly. This file also contains kernel-only definitions. Normally, public
      definitions should live in include/linux, so this patch moves the
      definitions shared with userspace to a new file, "include/linux/ip_vs.h".
      
      This also removes the unused NFC_IPVS_PROPERTY bitmask, which was once
      used to point into skb->nfcache.
      
      To make old ipvsadms still compile with this, the old header file includes
      the new one.
      
      Thanks to Dave Miller and Horms for noting/adding the missing Kbuild entry
      for the new header file.
      Signed-off-by: Julius Volz <juliusv@google.com>
      Acked-by: Simon Horman <horms@verge.net.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  31. 26 Jul, 2008 2 commits
  32. 28 Jun, 2008 1 commit
  33. 17 Jun, 2008 1 commit
  34. 20 May, 2008 2 commits
  35. 02 May, 2008 1 commit
    • virtio: export more headers to userspace · 81473132
      Christian Borntraeger committed
      Rusty,
      
      is there a reason why we don't export the virtio headers for
      9p, balloon, console, pci, and virtio_ring? kvm uses make sync,
      but I think it is still useful to have these headers exported
      as they might be useful for other userspace tools.
      
      I don't export virtio.h, because it does not seem to have useful
      information for userspace and it requires scatterlist.h which is
      also not exported. See also my other mail about your "virtio:
      change config to guest endian." patch.
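One way a userspace tool might use an exported header, sketched under the assumption that the installed `linux/virtio_ring.h` provides the `vring_size()` inline helper and `struct vring_desc`. The wrapper name is hypothetical:

```c
#include <stdint.h>
#include <linux/virtio_ring.h>

/*
 * Returns how many bytes a split virtqueue with `num` descriptors occupies
 * when the used ring is aligned to `align` bytes (align: a power of two).
 */
static unsigned ring_bytes(unsigned num, unsigned long align)
{
    return vring_size(num, align);
}
```

A hypervisor or test tool could call `ring_bytes(256, 4096)` to size the memory it allocates for a queue, without copying the layout arithmetic out of the kernel tree.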
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>