1. 28 Aug, 2013 1 commit
  2. 20 Jul, 2013 1 commit
  3. 13 Jul, 2013 1 commit
    • Safer ABI for O_TMPFILE · bb458c64
      Al Viro committed
      [suggested by Rasmus Villemoes] make O_DIRECTORY | O_RDWR part of O_TMPFILE;
      that will fail on old kernels in a lot more cases than what I came up with.
      And make sure O_CREAT doesn't get there...
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
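A minimal sketch of how userspace might rely on this ABI: try `O_TMPFILE` first and fall back to `mkstemp()` when the open is rejected, which is the failure mode the commit wants old kernels to hit reliably. The helper name `open_unnamed_tmpfile` and the fallback strategy are assumptions for illustration, not part of the commit. `O_RDWR` is passed explicitly so the sketch does not depend on what the macro itself includes.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_TMPFILE in glibc's <fcntl.h> */
#endif
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return an fd for an unnamed read-write temporary
 * file inside `dir`, or -1 with errno set.
 */
static int open_unnamed_tmpfile(const char *dir)
{
    char path[PATH_MAX];
    int fd;

#ifdef O_TMPFILE
    fd = open(dir, O_TMPFILE | O_RDWR, 0600);
    if (fd >= 0)
        return fd;
    /* Rejected (old kernel or unsupporting filesystem): fall through. */
#endif
    if (snprintf(path, sizeof(path), "%s/tmpXXXXXX", dir) >= (int)sizeof(path)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    fd = mkstemp(path);
    if (fd >= 0)
        unlink(path);          /* keep the file nameless, like O_TMPFILE */
    return fd;
}
```

The fallback briefly exposes a name in the directory, so it is weaker than `O_TMPFILE`; it is only there to keep the sketch usable where the flag is rejected.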
  4. 11 Jul, 2013 2 commits
    • net: rename busy poll socket op and globals · 64b0dc51
      Eliezer Tamir committed
      Rename LL_SO to BUSY_POLL_SO
      Rename sysctl_net_ll_{read,poll} to sysctl_busy_{read,poll}
      Fix up users of these variables.
      Fix documentation for sysctl.
      
      A patch for the socket.7 man page will follow separately,
      because of limitations of my mail setup.
      Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dm: optimize use SRCU and RCU · 83d5e5b0
      Mikulas Patocka committed
      This patch removes "io_lock" and "map_lock" in struct mapped_device and
      "holders" in struct dm_table and replaces these mechanisms with
      sleepable-rcu.
      
      Previously, the code would call "dm_get_live_table" and "dm_table_put" to
      get and release table. Now, the code is changed to call "dm_get_live_table"
      and "dm_put_live_table". dm_get_live_table locks sleepable-rcu and
      dm_put_live_table unlocks it.
      
      dm_get_live_table_fast/dm_put_live_table_fast can be used instead of
      dm_get_live_table/dm_put_live_table. These *_fast functions use
      non-sleepable RCU, so the caller must not block between them.
      
      If the code changes active or inactive dm table, it must call
      dm_sync_table before destroying the old table.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  5. 10 Jul, 2013 1 commit
    • fatfs: add FAT_IOCTL_GET_VOLUME_ID · 6e5b93ee
      Mike Lockwood committed
      This patch, originally from the Android kernel, adds the vfat ioctl command
      FAT_IOCTL_GET_VOLUME_ID. With this command we can get the vfat volume ID
      using the following code:
      
      	ioctl(fd, FAT_IOCTL_GET_VOLUME_ID, &volume_ID)
      
      This patch is a modified version of the patch by Mike Lockwood, with
      changes from Dmitry Pervushin, who noticed the original patch makes some
      volume IDs ambiguous with error returns: for example, if the volume ID is
      0xFFFFFDAD, which matches -ENOIOCTLCMD, we get "FFFFFFFF" in user
      space.
      
      So add a parameter to the ioctl to get the correct volume ID.
      
      Android uses the vfat volume ID to tell SD cards apart: when a new SD
      card is inserted into the device, Android can scan the media on it and
      show the new contents.
      Signed-off-by: Bintian Wang <bintian.wang@linaro.org>
      Cc: dmitry pervushin <dpervushin@gmail.com>
      Cc: Mike Lockwood <lockwood@android.com>
      Cc: Colin Cross <ccross@android.com>
      Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Sean McNeil <sean@mcneil.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
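The ambiguity described above can be modelled in plain C. This is only a sketch: `wrapper_return` imitates the usual Linux convention that a raw return value in the range -4095..-1 is reported to userspace as -1 with `errno` set, and the two `get_volume_id_*` functions are hypothetical stand-ins for the old and new ioctl styles, not kernel code.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095   /* raw returns -4095..-1 are treated as errors */

/* Hypothetical model of a libc syscall wrapper. */
static long wrapper_return(long kernel_ret)
{
    if (kernel_ret < 0 && kernel_ret >= -MAX_ERRNO) {
        errno = (int)-kernel_ret;
        return -1;
    }
    return kernel_ret;
}

/* Old style: the ID travels in the (signed 32-bit) return value. */
static long get_volume_id_by_return(uint32_t id)
{
    return wrapper_return((long)(int32_t)id);
}

/* New style: the ID travels through a pointer; the return is status only. */
static long get_volume_id_by_pointer(uint32_t id, uint32_t *out)
{
    *out = id;
    return wrapper_return(0);
}
```

With the old style, an ID such as 0xFFFFFDAD is negative as a 32-bit integer and is swallowed as an error; with the out-parameter it comes back intact.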
  6. 09 Jul, 2013 2 commits
  7. 04 Jul, 2013 1 commit
    • ptrace: add ability to get/set signal-blocked mask · 29000cae
      Andrey Vagin committed
      crtools uses parasite code for dumping processes.  The parasite code is
      injected into a process with the help of PTRACE_SEIZE.
      
      Currently crtools blocks signals from the parasite code.  If a process has
      pending signals, crtools waits while the process handles these signals.
      
      This method is not suitable for stopped tasks.  A stopped task can have a
      few pending signals. When we try to execute the parasite code, we
      need to drop SIGSTOP, but all other signals must remain pending, because the
      state of processes must not be changed during checkpointing.
      
      This patch adds two ptrace commands to set/get signal-blocked mask.
      
      I think gdb can use these commands too.
      
      [akpm@linux-foundation.org: be consistent with brace layout]
      Signed-off-by: Andrey Vagin <avagin@openvz.org>
      Reviewed-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Roland McGrath <roland@redhat.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 02 Jul, 2013 1 commit
  9. 01 Jul, 2013 2 commits
  10. 29 Jun, 2013 2 commits
  11. 28 Jun, 2013 2 commits
    • ALSA: Replace the magic number 44 with const · 975cc02a
      Takashi Iwai committed
      The char arrays with size 44 are for the name string of
      snd_ctl_elem_id.  Define the constant and replace the raw numbers with
      it for clarity.
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • drm: add hotspot support for cursors. · 4c813d4d
      Dave Airlie committed
      So it looks like for virtual hw cursors on QXL we need to inform
      the "hw" device what the cursor hotspot parameters are. This
      makes sense if you think the host has to draw the cursor and interpret
      clicks from it. However the current modesetting interface doesn't support
      passing the hotspot information from userspace.
      
      This implements a new cursor ioctl that takes the hotspot info as well.
      Userspace can try calling the new interface, and if it gets -ENOSYS it means
      it's on an older kernel and can just fall back.
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Dave Airlie <airlied@redhat.com>
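The try-then-fall-back pattern described in this commit could look roughly like the sketch below. The `cursor_ops` callbacks and the stub functions are hypothetical stand-ins for the new and legacy cursor ioctls, so the logic can be read and run without a DRM device; each is assumed to return 0 or a negative errno.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical callbacks standing in for the two cursor ioctls. */
struct cursor_ops {
    int (*set_cursor_hotspot)(uint32_t w, uint32_t h,
                              int32_t hot_x, int32_t hot_y);
    int (*set_cursor_legacy)(uint32_t w, uint32_t h);
};

/* Prefer the hotspot-aware call; fall back only on -ENOSYS. */
static int set_cursor(const struct cursor_ops *ops, uint32_t w, uint32_t h,
                      int32_t hot_x, int32_t hot_y)
{
    int ret = ops->set_cursor_hotspot(w, h, hot_x, hot_y);

    if (ret == -ENOSYS)
        ret = ops->set_cursor_legacy(w, h);   /* hotspot info is dropped */
    return ret;
}

/* Demonstration stubs: an "old kernel", a "new kernel", a legacy path. */
static int legacy_calls;

static int stub_hotspot_enosys(uint32_t w, uint32_t h, int32_t x, int32_t y)
{
    (void)w; (void)h; (void)x; (void)y;
    return -ENOSYS;
}

static int stub_hotspot_ok(uint32_t w, uint32_t h, int32_t x, int32_t y)
{
    (void)w; (void)h; (void)x; (void)y;
    return 0;
}

static int stub_legacy_ok(uint32_t w, uint32_t h)
{
    (void)w; (void)h;
    legacy_calls++;
    return 0;
}
```

Other errors from the new call are passed through unchanged, since only -ENOSYS signals that the interface is missing.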
  12. 27 Jun, 2013 2 commits
  13. 26 Jun, 2013 3 commits
    • ipvs: SH fallback and L4 hashing · eba3b5a7
      Alexander Frolkin committed
      By default the SH scheduler rejects connections that are hashed onto a
      realserver of weight 0.  This patch adds a flag to make SH choose a
      different realserver in this case, instead of rejecting the connection.
      
      The patch also adds a flag to make SH include the source port (TCP, UDP,
      SCTP) in the hash as well as the source address.  This basically allows
      for deterministic round-robin load balancing (i.e., where any director
      in a cluster of directors with identical config will send the same
      packet the same way).
      
      The flags are service flags (IP_VS_SVC_F_SCHED*) so that these options
      can be set per service.  They are set using a new option to ipvsadm.
      Signed-off-by: Alexander Frolkin <avf@eldamar.org.uk>
      Acked-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
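The two behaviours described above (optionally hashing the source port, and optionally falling back to another realserver when the hashed one has weight 0) can be sketched as follows. This is a simplified illustration, not the kernel's scheduler: the table size, the `server` struct, the multiplicative hash and the "next slot" fallback order are all assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define SH_TAB_BITS 3                       /* hypothetical: 8 slots */
#define SH_TAB_SIZE (1u << SH_TAB_BITS)

struct server {
    int id;
    int weight;                             /* 0 means "do not use" */
};

/* Hash the source address, optionally mixed with the source port. */
static unsigned sh_hash(uint32_t saddr, uint16_t sport, int use_port)
{
    uint32_t key = saddr + (use_port ? sport : 0u);

    return (uint32_t)(key * 2654435761u) >> (32 - SH_TAB_BITS);
}

/*
 * Pick a server. Without `fallback`, a weight-0 slot rejects the
 * connection (NULL). With it, walk the table for the next usable slot.
 */
static const struct server *sh_select(const struct server *tab,
                                      uint32_t saddr, uint16_t sport,
                                      int use_port, int fallback)
{
    unsigned start = sh_hash(saddr, sport, use_port);
    unsigned i;

    for (i = 0; i < SH_TAB_SIZE; i++) {
        const struct server *s = &tab[(start + i) % SH_TAB_SIZE];

        if (s->weight > 0)
            return s;
        if (!fallback)
            return NULL;
    }
    return NULL;                            /* every server has weight 0 */
}
```

Because the choice depends only on the packet and the table, any director holding an identical table makes the same decision, which is the determinism the commit message relies on.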
    • net: poll/select low latency socket support · 2d48d67f
      Eliezer Tamir committed
      select/poll busy-poll support.
      
      Split sysctl value into two separate ones, one for read and one for poll.
      updated Documentation/sysctl/net.txt
      
      Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
      sk_poll_ll if possible. sock_poll sets this flag in its return value
      to indicate to select/poll when a socket that can busy poll is found.
      
      When poll/select have nothing to report, call the low-level
      sock_poll again until we are out of time or we find something.
      
      Once the system call finds something, it stops setting POLL_LL, so it can
      return the result to the user ASAP.
      Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • linux/const.h: Add _BITUL() and _BITULL() · 2fc016c5
      H. Peter Anvin committed
      Add macros for single bit definitions of a specific type.  These are
      similar to the BIT() macro that already exists, but with a few
      exceptions:
      
      1. The namespace is such that they can be used in uapi definitions.
      2. The type is set with the _AC() macro to allow it to be used in
         assembly.
      3. The type is explicitly specified to be UL or ULL.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-nbca8p7cg6jyjoit7klh3o91@git.kernel.org
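To illustrate the three points above, here is a sketch of how such macros are typically built. The `DEMO_` names are local stand-ins chosen to avoid clashing with system headers; the shape follows the description in the commit message (a suffix pasted on by an `_AC()`-style helper, which degrades to the bare number under `__ASSEMBLY__`), and the two flag names are purely hypothetical.

```c
#ifdef __ASSEMBLY__
/* Assemblers do not understand UL/ULL suffixes: emit the bare number. */
#define DEMO_AC(X, Y)        X
#else
/* In C, paste the suffix onto the constant: DEMO_AC(1, UL) -> (1UL). */
#define DEMO_AC_PASTE(X, Y)  (X##Y)
#define DEMO_AC(X, Y)        DEMO_AC_PASTE(X, Y)
#endif

/* Single-bit constants with an explicitly chosen type. */
#define DEMO_BITUL(x)        (DEMO_AC(1, UL) << (x))
#define DEMO_BITULL(x)       (DEMO_AC(1, ULL) << (x))

/* Hypothetical flag definitions as they might appear in a uapi header. */
#define DEMO_FLAG_READY      DEMO_BITUL(0)
#define DEMO_FLAG_WIDE       DEMO_BITULL(40)
```

The explicit `ULL` matters for bits above 31: shifting a plain `int` 1 by 40 would be undefined, whereas `DEMO_BITULL(40)` is well defined on every target.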
  14. 25 Jun, 2013 1 commit
  15. 22 Jun, 2013 1 commit
  16. 21 Jun, 2013 11 commits
    • vfio: hugepage support for vfio_iommu_type1 · 166fd7d9
      Alex Williamson committed
      We currently send all mappings to the iommu in PAGE_SIZE chunks,
      which prevents the iommu from enabling support for larger page sizes.
      We still need to pin pages, which means we step through them in
      PAGE_SIZE chunks, but we can batch up contiguous physical memory
      chunks to allow the iommu the opportunity to use larger pages.  The
      approach here is a bit different than the one currently used for
      legacy KVM device assignment.  Rather than looking at the vma page
      size and using that as the maximum size to pass to the iommu, we
      instead simply look at whether the next page is physically
      contiguous.  This means we might ask the iommu to map a 4MB region,
      while legacy KVM might limit itself to a maximum of 2MB.
      
      Splitting our mapping path also allows us to be smarter about locked
      memory because we can more easily unwind if the user attempts to
      exceed the limit.  Therefore, rather than assuming that a mapping
      will result in locked memory, we test each page as it is pinned to
      determine whether it locks RAM vs an mmap'd MMIO region.  This should
      result in better locking granularity and less locked page fudge
      factors in userspace.
      
      The unmap path uses the same algorithm as legacy KVM.  We don't want
      to track the pfn for each mapping ourselves, but we need the pfn in
      order to unpin pages.  We therefore ask the iommu for the iova to
      physical address translation, ask it to unpin a page, and see how many
      pages were actually unpinned.  iommus supporting large pages will
      often return something bigger than a page here, which we know will be
      physically contiguous and we can unpin a batch of pfns.  iommus not
      supporting large mappings won't see an improvement in batching here as
      they only unmap a page at a time.
      
      With this change, we also make a clarification to the API for mapping
      and unmapping DMA.  We can only guarantee unmaps at the same
      granularity as used for the original mapping.  In other words,
      unmapping a subregion of a previous mapping is not guaranteed and may
      result in a larger or smaller unmapping than requested.  The size
      field in the unmapping structure is updated to reflect this.
      Previously this was unmodified on mapping, always returning the
      requested unmap size.  This is now updated to return the actual unmap
      size on success, allowing userspace to appropriately track mappings.
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
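The batching idea in the first paragraph (step through pages one at a time, but hand the iommu each physically contiguous run as a single request) can be sketched in a few lines. The function names and the flat array of page frame numbers are assumptions for illustration; the real driver pins pages as it goes rather than reading a prepared array.

```c
#include <stddef.h>
#include <stdint.h>

/* Length of the physically contiguous run starting at pfns[start]. */
static size_t contiguous_run(const uint64_t *pfns, size_t n, size_t start)
{
    size_t len = 1;

    if (start >= n)
        return 0;
    while (start + len < n && pfns[start + len] == pfns[start] + len)
        len++;
    return len;
}

/* Number of map requests issued when each run is mapped in one call. */
static size_t count_map_calls(const uint64_t *pfns, size_t n)
{
    size_t calls = 0;
    size_t i = 0;

    while (i < n) {
        i += contiguous_run(pfns, n, i);
        calls++;
    }
    return calls;
}
```

For the frames 100, 101, 102, 200, 201, 300 this issues three requests instead of six, and the first of them is large enough for the iommu to consider a bigger page size if its alignment allows.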
    • [media] v4l2-core: remove support for obsolete VIDIOC_DBG_G_CHIP_IDENT · b71c9980
      Hans Verkuil committed
      This has been replaced by the new and much better VIDIOC_DBG_G_CHIP_INFO.
      Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
    • RDMA/ucma: Allow user space to specify AF_IB when joining multicast · 5bc2b7b3
      Sean Hefty committed
      Allow user space applications to join multicast groups using MGIDs
      directly.  MGIDs may be passed using AF_IB addresses.  Since the
      current multicast join command only supports addresses as large as
      sockaddr_in6, define a new structure for joining addresses specified
      using sockaddr_ib.
      
      Since AF_IB allows the user to specify the qkey when resolving a
      remote UD QP address, when joining the multicast group use the qkey
      value, if one has been assigned.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/ucma: Allow user space to pass AF_IB into resolve · 209cf2a7
      Sean Hefty committed
      Allow user space applications to call resolve_addr using AF_IB.  To
      support sockaddr_ib, we need to define a new structure capable of
      handling the larger address size.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/ucma: Allow user space to bind to AF_IB · eebe4c3a
      Sean Hefty committed
      Support user space binding to addresses using AF_IB.  Since
      sockaddr_ib is larger than sockaddr_in6, we need to define a larger
      structure when binding using AF_IB.  This time we use sockaddr_storage
      to cover future cases.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/ucma: Name changes to indicate only IP addresses supported · 05ad9457
      Sean Hefty committed
      Several commands into the RDMA CM from user space are restricted to
      supporting addresses which fit into a sockaddr_in6 structure: bind
      address, resolve address, and join multicast.
      
      With the addition of AF_IB, we need to support addresses which are
      larger than sockaddr_in6.  This will be done by adding new commands
      that exchange address information using sockaddr_storage.  However, to
      support existing applications, we maintain the current commands and
      structures, but rename them to indicate that they only support IPv4
      and v6 addresses.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/ucma: Add ability to query GID addresses · edaa7a55
      Sean Hefty committed
      Part of address resolution is mapping IP addresses to IB GIDs.  With
      the changes to support querying larger addresses and more path records,
      also provide a way to query IB GIDs after resolution completes.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/ucma: Support querying when IB paths are not reversible · ac53b264
      Sean Hefty committed
      The current query_route call can return up to two path records.  The
      assumption being that one is the primary path, with optional support
      for an alternate path.  In both cases, the paths are assumed to be
      reversible and are used to send CM MADs.
      
      With the ability to manually set IB path data, the rdma cm can
      eventually be capable of using up to 6 paths per connection:
      
      	forward primary, reverse primary,
      	forward alternate, reverse alternate,
      	reversible primary path for CM MADs
      	reversible alternate path for CM MADs.
      
      (It is unclear at this time if IB routing will complicate this.)  In
      order to handle more flexible routing topologies, add a new command to
      report any number of paths.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/ucma: Support querying for AF_IB addresses · ee7aed45
      Sean Hefty committed
      The sockaddr structure for AF_IB is larger than sockaddr_in6.  The
      rdma cm user space ABI uses the latter to exchange address information
      between user space and the kernel.
      
      To support querying for larger addresses, define a new query command
      that exchanges data using sockaddr_storage, rather than sockaddr_in6.
      Unlike the existing query_route command, the new command only returns
      address information.  Route (i.e. path record) data is separated.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • RDMA/cma: Set qkey for AF_IB · 5c438135
      Sean Hefty committed
      Allow the user to specify the qkey when using AF_IB.  The qkey is
      added to struct rdma_ucm_conn_param in place of a reserved field, but
      for backwards compatibility, is only accessed if the associated
      rdma_cm_id is using AF_IB.
      Signed-off-by: Sean Hefty <sean.hefty@intel.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • netfilter: xt_socket: add XT_SOCKET_NOWILDCARD flag · 681f130f
      Eric Dumazet committed
      xt_socket module can be a nice replacement to conntrack module
      in some cases (SYN filtering for example)
      
      But it lacks the ability to match the 3rd packet of TCP
      handshake (ACK coming from the client).
      
      Add a XT_SOCKET_NOWILDCARD flag to disable the wildcard mechanism.
      
      The wildcard is the legacy socket match behavior, that ignores
      LISTEN sockets bound to INADDR_ANY (or ipv6 equivalent)
      
      iptables -I INPUT -p tcp --syn -j SYN_CHAIN
      iptables -I INPUT -m socket --nowildcard -j ACCEPT
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Patrick McHardy <kaber@trash.net>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  17. 20 Jun, 2013 5 commits
    • powerpc/vfio: Implement IOMMU driver for VFIO · 5ffd229c
      Alexey Kardashevskiy committed
      VFIO implements platform independent stuff such as
      a PCI driver, BAR access (via read/write on a file descriptor
      or direct mapping when possible) and IRQ signaling.
      
      The platform dependent part includes IOMMU initialization
      and handling.  This implements an IOMMU driver for VFIO
      which does mapping/unmapping pages for the guest IO and
      provides information about DMA window (required by a POWER
      guest).
      
      Cc: David Gibson <david@gibson.dropbear.id.au>
      Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Acked-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • tcp: introduce a per-route knob for quick ack · bcefe17c
      Cong Wang committed
      In previous discussions, I tried to find some reasonable heuristics
      for delayed ACK, however this seems not possible, according to Eric:
      
      	"ACKS might also be delayed because of bidirectional
      	traffic, and is more controlled by the application
      	response time. TCP stack can not easily estimate it."
      
      	"ACK can be incredibly useful to recover from losses in
      	a short time.
      
      	The vast majority of TCP sessions are small lived, and we
      	send one ACK per received segment anyway at beginning or
      	retransmits to let the sender smoothly increase its cwnd,
      	so an auto-tuning facility wont help them that much."
      
      and according to David:
      
      	"ACKs are the only information we have to detect loss.
      
      	And, for the same reasons that TCP VEGAS is fundamentally
      	broken, we cannot measure the pipe or some other
      	receiver-side-visible piece of information to determine
      	when it's "safe" to stretch ACK.
      
      	And even if it's "safe", we should not do it so that losses are
      	accurately detected and we don't spuriously retransmit.
      
      	The only way to know when the bandwidth increases is to
      	"test" it, by sending more and more packets until drops happen.
      	That's why all successful congestion control algorithms must
      	operate on explicitly tested pieces of information.
      
      	Similarly, it's not really possible to universally know if
      	it's safe to stretch ACK or not."
      
      It still makes sense to enable or disable quick ack mode like
      what TCP_QUICKACK does.
      
      This is similar to the TCP_QUICKACK option, but for people who can't
      modify the source code and still want to control
      TCP delayed ACK behavior. As David suggested, this should belong
      to per-path scope, since different paths may want different
      behaviors.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Rick Jones <rick.jones2@hp.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Thomas Graf <tgraf@suug.ch>
      CC: David Laight <David.Laight@ACULAB.COM>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
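For comparison with the per-route knob, this is roughly what the per-socket alternative looks like for an application that can change its own source: a minimal sketch using the `TCP_QUICKACK` socket option from `<netinet/tcp.h>`. The helper name `set_quickack` is an assumption for illustration.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Enable (on != 0) or disable quick-ACK mode on a TCP socket.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_quickack(int fd, int on)
{
    int val = on ? 1 : 0;

    return setsockopt(fd, IPPROTO_TCP, TCP_QUICKACK, &val, sizeof(val));
}
```

As documented in tcp(7), the socket flag is not permanent, so an application that depends on it may need to set it again during the connection. A route-level setting avoids both that and the need to touch the application.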
    • netlink: export netlink_diag.h header · 2bd470fc
      stephen hemminger committed
      The netlink_diag.h is in include/uapi/linux but not in the Kbuild necessary
      to cause it to be exported by make headers_install.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Add gre tunnel support. · aa310701
      Pravin B Shelar committed
      Add gre vport implementation.  Most of the gre protocol processing
      is pushed to the gre module. It makes use of the gre demultiplexer,
      so it can co-exist with linux device based gre tunnels.
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Jesse Gross <jesse@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Add tunneling interface. · 7d5437c7
      Pravin B Shelar committed
      Add ovs tunnel interface for set tunnel action for userspace.
      Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Jesse Gross <jesse@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 19 Jun, 2013 1 commit