1. 27 Sep 2013, 2 commits
  2. 18 Sep 2013, 1 commit
  3. 11 Sep 2013, 1 commit
    • fs: bump inode and dentry counters to long · 3942c07c
      Authored by Glauber Costa
      This series reworks our current object cache shrinking infrastructure in
      two main ways:
      
       * Noticing that a lot of users copy and paste their own version of LRU
         lists for objects, we put some effort into providing a generic version.
         It is modeled after the filesystem users: dentries, inodes, and xfs
         (for various tasks), but we expect that other users could benefit in
         the near future with little or no modification.  Let us know if you
         have any issues.
      
       * The underlying list_lru being proposed automatically and
         transparently keeps the elements in per-node lists, and is able to
         manipulate the node lists individually.  Given this infrastructure, we
         are able to modify the up-to-now hammer called shrink_slab to proceed
         with node-reclaim instead of always searching memory from all over like
         it has been doing.
      
      Per-node lru lists are also expected to lead to less contention in the lru
      locks on multi-node scans, since we are now no longer fighting for a
      global lock.  The locks usually disappear from the profilers with this
      change.
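
      To make this concrete, here is the rough shape of the list_lru API the
      series introduces (signatures paraphrased from the patches; exact details
      vary across kernel versions):

      struct list_lru;

      enum lru_status { LRU_REMOVED, LRU_ROTATE, LRU_SKIP, LRU_RETRY };

      typedef enum lru_status (*list_lru_walk_cb)(struct list_head *item,
                                                  spinlock_t *lock, void *cb_arg);

      bool list_lru_add(struct list_lru *lru, struct list_head *item);
      bool list_lru_del(struct list_lru *lru, struct list_head *item);
      unsigned long list_lru_count_node(struct list_lru *lru, int nid);
      unsigned long list_lru_walk_node(struct list_lru *lru, int nid,
                                       list_lru_walk_cb isolate, void *cb_arg,
                                       unsigned long *nr_to_walk);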
      
      Although we have no official benchmarks for this version - be our guest to
      independently evaluate this - earlier versions of this series were
      performance tested (details at
      http://permalink.gmane.org/gmane.linux.kernel.mm/100537) yielding no
      visible performance regressions while yielding a better qualitative
      behavior in NUMA machines.
      
      With this infrastructure in place, we can use the list_lru entry point to
      provide memcg isolation and per-memcg targeted reclaim.  Historically,
      those two pieces of work have been posted together.  This version presents
      only the infrastructure work, deferring the memcg work for a later time,
      so we can focus on getting this part tested.  You can see more about the
      history of such work at http://lwn.net/Articles/552769/
      
      Dave Chinner (18):
        dcache: convert dentry_stat.nr_unused to per-cpu counters
        dentry: move to per-sb LRU locks
        dcache: remove dentries from LRU before putting on dispose list
        mm: new shrinker API
        shrinker: convert superblock shrinkers to new API
        list: add a new LRU list type
        inode: convert inode lru list to generic lru list code.
        dcache: convert to use new lru list infrastructure
        list_lru: per-node list infrastructure
        shrinker: add node awareness
        fs: convert inode and dentry shrinking to be node aware
        xfs: convert buftarg LRU to generic code
        xfs: rework buffer dispose list tracking
        xfs: convert dquot cache lru to list_lru
        fs: convert fs shrinkers to new scan/count API
        drivers: convert shrinkers to new count/scan API
        shrinker: convert remaining shrinkers to count/scan API
        shrinker: Kill old ->shrink API.
      
      Glauber Costa (7):
        fs: bump inode and dentry counters to long
        super: fix calculation of shrinkable objects for small numbers
        list_lru: per-node API
        vmscan: per-node deferred work
        i915: bail out earlier when shrinker cannot acquire mutex
        hugepage: convert huge zero page shrinker to new shrinker API
        list_lru: dynamically adjust node arrays
      
      This patch:
      
      There are situations in very large machines in which we can have a large
      quantity of dirty inodes, unused dentries, etc.  This is particularly true
      when unmounting a filesystem, since every live object will eventually be
      discarded.
      
      Dave Chinner reported a problem with this while experimenting with the
      shrinker revamp patchset.  So we believe it is time for a change.  This
      patch just converts the counters from int to long.  Machines where it
      matters should have 64-bit longs anyway.
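
      For reference, a sketch of what the change amounts to for the dcache
      stats (field names as in include/linux/dcache.h; this is a sketch of the
      result, not the full diff):

      struct dentry_stat_t {
              long nr_dentry;
              long nr_unused;
              long age_limit;   /* age in seconds */
              long want_pages;  /* pages requested by system */
              long dummy[2];    /* all of these were previously int */
      };
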
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: Dave Chinner <dchinner@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  4. 09 Sep 2013, 3 commits
  5. 08 Sep 2013, 2 commits
    • Input: evdev - add EVIOCREVOKE ioctl · c7dc6573
      Authored by David Herrmann
      If we have multiple sessions on a system, we normally don't want
      background sessions to read input events.  Otherwise, a background session
      could capture passwords and other input entered by the user in the
      foreground session.  This is a real-world problem, as the recent XMir
      development showed:
        http://mjg59.dreamwidth.org/27327.html
      
      We currently rely on sessions to release input devices when being
      deactivated. This relies on trust across sessions. But that's not given on
      usual systems. We therefore need a way to control which processes have
      access to input devices.
      
      With VTs the kernel simply routed them through the active /dev/ttyX. This
      is not possible with evdev devices, though. Moreover, we want to avoid
      routing input-devices through some dispatcher-daemon in userspace (which
      would add some latency).
      
      This patch introduces EVIOCREVOKE.  If called on an evdev fd, it revokes
      device access irrecoverably for that *single* open-file.  Hence, once you
      call EVIOCREVOKE on any dup()ed fd, all fds for that open-file become
      useless (though, unlike after close(), they remain valid).  This allows us
      to pass fds directly to session processes from a trusted source.  The
      source keeps a dup()ed fd and revokes access once the session process is
      no longer active.
      Compared to the EVIOCMUTE proposal, we can avoid the CAP_SYS_ADMIN
      restriction now, as there is no way to revive the fd again.  Hence, a user
      is free to call EVIOCREVOKE themselves to kill the fd.
      
      Additionally, this ioctl allows multi-layer access-control (again compared
      to EVIOCMUTE which was limited to one layer via CAP_SYS_ADMIN). A middle
      layer can simply request a new open-file from the layer above and pass it
      to the layer below. Now each layer can call EVIOCREVOKE on the fds to
      revoke access for all layers below, at the expense of one fd per layer.
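
      A minimal sketch of the intended flow (EVIOCREVOKE as defined in
      <linux/input.h>; the helper and device path here are hypothetical):

      #include <fcntl.h>
      #include <linux/input.h>
      #include <sys/ioctl.h>
      #include <unistd.h>

      int serve_session(void)
      {
              int fd = open("/dev/input/event0", O_RDWR); /* one open-file per session */
              if (fd < 0)
                      return -1;
              int session_fd = dup(fd);  /* shares the open-file; hand to the session */
              /* ... pass session_fd to the session process, e.g. via SCM_RIGHTS ... */
              int zero = 0;              /* the kernel requires the argument to be 0 */
              ioctl(fd, EVIOCREVOKE, &zero); /* revokes fd, session_fd and every dup */
              close(session_fd);
              close(fd);
              return 0;
      }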
      
      There's already ongoing experimental user-space work which demonstrates
      how it can be used:
        http://lists.freedesktop.org/archives/systemd-devel/2013-August/012897.html
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
    • Revert "Input: introduce BTN/ABS bits for drums and guitars" · b04c99e3
      Authored by Linus Torvalds
      This reverts commits 61e00655, 73f8645d and 8e22ecb6:
        "Input: introduce BTN/ABS bits for drums and guitars"
        "HID: wiimote: add support for Guitar-Hero drums"
        "HID: wiimote: add support for Guitar-Hero guitars"
      
      The extra new ABS_xx values resulted in ABS_MAX no longer being a
      power-of-two, which broke the comparison logic.  It also caused the
      ioctl numbers to overflow into the next byte, causing problems there.
      
      We'll try again for 3.13.
      Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Acked-by: David Herrmann <dh.herrmann@gmail.com>
      Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Benjamin Tissoires <benjamin.tissoires@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 06 Sep 2013, 1 commit
    • dm: add statistics support · fd2ed4d2
      Authored by Mikulas Patocka
      Support the collection of I/O statistics on user-defined regions of
      a DM device.  If no regions are defined, no statistics are collected, so
      there is no performance impact.  Only bio-based DM devices are
      currently supported.
      
      Each user-defined region specifies a starting sector, length and step.
      Individual statistics will be collected for each step-sized area within
      the range specified.
      
      The I/O statistics counters for each step-sized area of a region are
      in the same format as /sys/block/*/stat or /proc/diskstats but extra
      counters (12 and 13) are provided: total time spent reading and
      writing in milliseconds.  All these counters may be accessed by sending
      the @stats_print message to the appropriate DM device via dmsetup.
      
      The creation of DM statistics will allocate memory via kmalloc or
      fallback to using vmalloc space.  At most, 1/4 of the overall system
      memory may be allocated by DM statistics.  The admin can see how much
      memory is used by reading
      /sys/module/dm_mod/parameters/stats_current_allocated_bytes
      
      See Documentation/device-mapper/statistics.txt for more details.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
  7. 05 Sep 2013, 2 commits
    • vfio-pci: PCI hot reset interface · 8b27ee60
      Authored by Alex Williamson
      The current VFIO_DEVICE_RESET interface only maps to PCI use cases
      where we can isolate the reset to the individual PCI function.  This
      means the device must support FLR (PCIe or AF), PM reset on D3hot->D0
      transition, device specific reset, or be a singleton device on a bus
      for a secondary bus reset.  FLR does not have widespread support,
      PM reset is not very reliable, and bus topology is dictated by the
      system and device design.  We need to provide a means for a user to
      induce a bus reset in cases where the existing mechanisms are not
      available or not reliable.
      
      This device specific extension to VFIO provides the user with this
      ability.  Two new ioctls are introduced:
       - VFIO_DEVICE_PCI_GET_HOT_RESET_INFO
       - VFIO_DEVICE_PCI_HOT_RESET
      
      The first provides the user with information about the extent of
      devices affected by a hot reset.  This is essentially a list of
      devices and the IOMMU groups they belong to.  The user may then
      initiate a hot reset by calling the second ioctl.  We must be
      careful that the user has ownership of all the affected devices
      found via the first ioctl, so the second ioctl takes a list of file
      descriptors for the VFIO groups affected by the reset.  Each group
      must have IOMMU protection established for the ioctl to succeed.
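
      A hedged sketch of the two-step flow (struct layouts as in <linux/vfio.h>;
      error handling trimmed, and the helper name is made up for illustration):

      #include <linux/vfio.h>
      #include <stdlib.h>
      #include <string.h>
      #include <sys/ioctl.h>

      int pci_hot_reset(int device_fd, const int *group_fds, int ngroups)
      {
              struct vfio_pci_hot_reset_info probe = { .argsz = sizeof(probe) };

              /* Step 1: learn the extent of devices affected by the reset. */
              ioctl(device_fd, VFIO_DEVICE_PCI_GET_HOT_RESET_INFO, &probe);

              /* Step 2: prove ownership by passing the file descriptors of
               * all affected VFIO groups, then trigger the bus reset. */
              size_t sz = sizeof(struct vfio_pci_hot_reset) + ngroups * sizeof(__s32);
              struct vfio_pci_hot_reset *reset = calloc(1, sz);
              reset->argsz = sz;
              reset->count = ngroups;
              memcpy(reset->group_fds, group_fds, ngroups * sizeof(__s32));

              int ret = ioctl(device_fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
              free(reset);
              return ret;
      }
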
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
    • net: sync some IP headers with glibc · cfd280c9
      Authored by Carlos O'Donell
      Solution:
      =========
      
      - Synchronize linux's `include/uapi/linux/in6.h'
        with glibc's `inet/netinet/in.h'.
      - Synchronize glibc's `inet/netinet/in.h with linux's
        `include/uapi/linux/in6.h'.
      - Allow including the headers in either order.
      - First header included defines the structures and macros.
      
      Details:
      ========
      
      The kernel promises not to break the UAPI ABI so I don't
      see why we can't just have the two userspace headers
      coordinate?
      
      If you include the kernel headers first you get those,
      and if you include the glibc headers first you get those,
      and the following patch arranges a coordination and
      synchronization between the two.
      
      Let's handle `include/uapi/linux/in6.h' from linux,
      and `inet/netinet/in.h' from glibc and ensure they compile
      in any order and preserve the required ABI.
      
      These two patches pass the following compile tests:
      
      cat >> test1.c <<EOF
      #include <linux/in6.h>
      #include <netinet/in.h>
      int main (void) {
        return 0;
      }
      EOF
      gcc -c test1.c

      cat >> test2.c <<EOF
      #include <netinet/in.h>
      #include <linux/in6.h>
      int main (void) {
        return 0;
      }
      EOF
      gcc -c test2.c
      
      One wrinkle is that the kernel has a different name for one of
      the members in ipv6_mreq. In the kernel patch we create a macro
      to cover the uses of the old name, and while that's not entirely
      clean it's one of the best solutions (aside from an anonymous
      union which has other issues).
      
      I've reviewed the code and it looks to me like the ABI is
      assured and everything matches on both sides.
      
      Notes:
      - You want netinet/in.h to include bits/in.h as early as possible,
        but it needs in_addr so define in_addr early.
      - You want bits/in.h included as early as possible so you can use
        the linux specific code to define __USE_KERNEL_DEFS based on
        the _UAPI_* macro definition and use those to cull in.h.
      - glibc was missing IPPROTO_MH, added here.
      
      Compile tested and inspected.
      Reported-by: Thomas Backlund <tmb@mageia.org>
      Cc: Thomas Backlund <tmb@mageia.org>
      Cc: libc-alpha@sourceware.org
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Cc: David S. Miller <davem@davemloft.net>
      Tested-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: Carlos O'Donell <carlos@redhat.com>
      Signed-off-by: Cong Wang <amwang@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 04 Sep 2013, 5 commits
  9. 03 Sep 2013, 1 commit
  10. 02 Sep 2013, 3 commits
  11. 01 Sep 2013, 3 commits
  12. 31 Aug 2013, 1 commit
    • drm/radeon/si: Add support for CP DMA to CS checker for compute v2 · e5b9e750
      Authored by Tom Stellard
      Also add a new RADEON_INFO query to check that CP DMA packets are
      supported on the compute ring.
      
      CP DMA has been supported since the 3.8 kernel, but due to an oversight
      we forgot to teach the CS checker that the CP DMA packet was legal for
      the compute ring on Southern Islands GPUs.
      
      This patch fixes a bug where the radeon driver would incorrectly reject a
      legal CP DMA packet from user space.  I would like to have the patch
      backported to stable so that we don't have to require Mesa users to run a
      bleeding-edge kernel in order to take advantage of this feature, which is
      already present in the stable kernels (3.8 and newer).
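
      A sketch of probing the query from user space; the query name below is an
      assumption taken from the corresponding kernel/libdrm headers, so treat it
      as illustrative:

      #include <stdint.h>
      #include <sys/ioctl.h>
      #include <drm/radeon_drm.h>

      int cp_dma_compute_supported(int drm_fd)
      {
              uint32_t supported = 0;
              struct drm_radeon_info info = {
                      .request = RADEON_INFO_SI_CP_DMA_COMPUTE, /* assumed name */
                      .value = (uintptr_t)&supported,
              };
              if (ioctl(drm_fd, DRM_IOCTL_RADEON_INFO, &info) < 0)
                      return 0; /* older kernel: query unknown */
              return supported != 0;
      }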
      
      v2:
        - Don't bump kms version, so this patch can be backported to stable
          kernels.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Tom Stellard <thomas.stellard@amd.com>
      Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
  13. 30 Aug 2013, 8 commits
    • pkt_sched: fq: Fair Queue packet scheduler · afe4fd06
      Authored by Eric Dumazet
      - Uses perfect flow match (not stochastic hash like SFQ/FQ_codel)
      - Uses the new_flow/old_flow separation from FQ_codel
      - New flows get an initial credit allowing IW10 without added delay.
      - Special FIFO queue for high prio packets (no need for PRIO + FQ)
      - Uses a hash table of RB trees to locate the flows at enqueue() time
      - Smart on demand gc (at enqueue() time, RB tree lookup evicts old
        unused flows)
      - Dynamic memory allocations.
      - Designed to allow millions of concurrent flows per Qdisc.
      - Small memory footprint : ~8K per Qdisc, and 104 bytes per flow.
      - Single high resolution timer for throttled flows (if any).
      - One RB tree to link throttled flows.
      - Ability to have a max rate per flow. We might add a socket option
        to add per socket limitation.
      
      Attempts have been made to add TCP pacing to the TCP stack, but this
      seems to add complex code to an already complex stack.

      TCP pacing is welcome for flows having idle times, as the cwnd permits
      the TCP stack to queue a possibly large number of packets.

      This removes the 'slow start after idle' choice, which badly hits large
      BDP flows and applications delivering chunks of data such as video
      streams.
      
      Nicely spaced packets:
      Here the interface is 10Gbit, but the flow bottleneck is ~20Mbit
      
      cwnd is big, yet FQ avoids the typical bursts generated by TCP
      (as in netperf TCP_RR -- -r 100000,100000)
      
      15:01:23.545279 IP A > B: . 78193:81089(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.545394 IP B > A: . ack 81089 win 3668 <nop,nop,timestamp 11597985 1115>
      15:01:23.546488 IP A > B: . 81089:83985(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.546565 IP B > A: . ack 83985 win 3668 <nop,nop,timestamp 11597986 1115>
      15:01:23.547713 IP A > B: . 83985:86881(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.547778 IP B > A: . ack 86881 win 3668 <nop,nop,timestamp 11597987 1115>
      15:01:23.548911 IP A > B: . 86881:89777(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.548949 IP B > A: . ack 89777 win 3668 <nop,nop,timestamp 11597988 1115>
      15:01:23.550116 IP A > B: . 89777:92673(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.550182 IP B > A: . ack 92673 win 3668 <nop,nop,timestamp 11597989 1115>
      15:01:23.551333 IP A > B: . 92673:95569(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.551406 IP B > A: . ack 95569 win 3668 <nop,nop,timestamp 11597991 1115>
      15:01:23.552539 IP A > B: . 95569:98465(2896) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.552576 IP B > A: . ack 98465 win 3668 <nop,nop,timestamp 11597992 1115>
      15:01:23.553756 IP A > B: . 98465:99913(1448) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.554138 IP A > B: P 99913:100001(88) ack 65248 win 3125 <nop,nop,timestamp 1115 11597805>
      15:01:23.554204 IP B > A: . ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.554234 IP B > A: . 65248:68144(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.555620 IP B > A: . 68144:71040(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.557005 IP B > A: . 71040:73936(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.558390 IP B > A: . 73936:76832(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.559773 IP B > A: . 76832:79728(2896) ack 100001 win 3668 <nop,nop,timestamp 11597993 1115>
      15:01:23.561158 IP B > A: . 79728:82624(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.562543 IP B > A: . 82624:85520(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.563928 IP B > A: . 85520:88416(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.565313 IP B > A: . 88416:91312(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.566698 IP B > A: . 91312:94208(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.568083 IP B > A: . 94208:97104(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.569467 IP B > A: . 97104:100000(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.570852 IP B > A: . 100000:102896(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.572237 IP B > A: . 102896:105792(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.573639 IP B > A: . 105792:108688(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.575024 IP B > A: . 108688:111584(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.576408 IP B > A: . 111584:114480(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      15:01:23.577793 IP B > A: . 114480:117376(2896) ack 100001 win 3668 <nop,nop,timestamp 11597994 1115>
      
      TCP timestamps show that most packets from B were queued in the same ms
      timeframe (TSval 1159799{3,4}), but FQ managed to send them right
      in time to avoid a big burst.
      
      In slow start or steady state, very few packets are throttled [1]
      
      FQ gets a bunch of tunables:
      
        limit : max number of packets on whole Qdisc (default 10000)
      
        flow_limit : max number of packets per flow (default 100)
      
        quantum : the credit per RR round (default is 2 MTU)
      
        initial_quantum : initial credit for new flows (default is 10 MTU)
      
        maxrate : max per flow rate (default : unlimited)
      
        buckets : number of RB trees (default : 1024) in hash table.
                     (consumes 8 bytes per bucket)
      
        [no]pacing : disable/enable pacing (default is enable)
      
      All of them can be changed on a live qdisc.
      
      $ tc qd add dev eth0 root fq help
      Usage: ... fq [ limit PACKETS ] [ flow_limit PACKETS ]
                    [ quantum BYTES ] [ initial_quantum BYTES ]
                    [ maxrate RATE  ] [ buckets NUMBER ]
                    [ [no]pacing ]
      
      $ tc -s -d qd
      qdisc fq 8002: dev eth0 root refcnt 32 limit 10000p flow_limit 100p buckets 256 quantum 3028 initial_quantum 15140
       Sent 216532416 bytes 148395 pkt (dropped 0, overlimits 0 requeues 14)
       backlog 0b 0p requeues 14
        511 flows, 511 inactive, 0 throttled
        110 gc, 0 highprio, 0 retrans, 1143 throttled, 0 flows_plimit
      
      [1] Except if the initial srtt is overestimated, as when using
      cached srtt from tcp metrics.  We'll provide a fix for this issue.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • drm: Advertise async page flip ability through GETCAP ioctl · 62f2104f
      Authored by Keith Packard
      Let applications know whether the kernel supports asynchronous page
      flipping.
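
      A minimal sketch of how an application can query this (raw ioctl shown;
      real code would typically use libdrm's drmGetCap):

      #include <drm/drm.h>
      #include <sys/ioctl.h>

      int async_flip_supported(int drm_fd)
      {
              struct drm_get_cap cap = { .capability = DRM_CAP_ASYNC_PAGE_FLIP };

              if (ioctl(drm_fd, DRM_IOCTL_GET_CAP, &cap) < 0)
                      return 0; /* kernel predates the capability */
              return cap.value != 0;
      }
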
      Signed-off-by: Keith Packard <keithp@keithp.com>
      Signed-off-by: Dave Airlie <airlied@gmail.com>
    • drm: Add DRM_MODE_PAGE_FLIP_ASYNC flag definition · 9bba0c42
      Authored by Keith Packard
      This requests that the driver perform the page flip as soon as
      possible, not necessarily waiting for vblank.
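
      A sketch of passing the flag (raw ioctl for brevity; the crtc/fb ids are
      placeholders, and a driver may reject async flips, so the sketch retries
      with a vblank-synced flip):

      #include <drm/drm.h>
      #include <sys/ioctl.h>

      int flip_asap(int drm_fd, unsigned int crtc_id, unsigned int fb_id)
      {
              struct drm_mode_crtc_page_flip flip = {
                      .crtc_id = crtc_id,
                      .fb_id   = fb_id,
                      .flags   = DRM_MODE_PAGE_FLIP_ASYNC, /* don't wait for vblank */
              };
              int ret = ioctl(drm_fd, DRM_IOCTL_MODE_PAGE_FLIP, &flip);

              if (ret < 0) {
                      flip.flags = 0; /* fall back to a synced flip */
                      ret = ioctl(drm_fd, DRM_IOCTL_MODE_PAGE_FLIP, &flip);
              }
              return ret;
      }
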
      Signed-off-by: Keith Packard <keithp@keithp.com>
      Signed-off-by: Dave Airlie <airlied@gmail.com>
    • can: gw: add a per rule limitation of frame hops · 391ac128
      Authored by Oliver Hartkopp
      Usually, received CAN frames can be processed/routed up to 'max_hops'
      times (a limit given at module load time of the can-gw module).
      Introduce a new configuration option to reduce the number of possible hops
      for a specific gateway rule to a value smaller than max_hops.
      Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
    • net: packet: add randomized fanout scheduler · 5df0ddfb
      Authored by Daniel Borkmann
      We currently allow for different fanout scheduling policies in pf_packet,
      such as scheduling by skb's rxhash, round-robin, by cpu, and rollover.
      This patch also allows for a random, equidistributed selection of the
      socket from the fanout process group.
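
      A short sketch of joining a fanout group with the new policy
      (PACKET_FANOUT_RND from <linux/if_packet.h>; the group id is arbitrary):

      #include <linux/if_packet.h>
      #include <sys/socket.h>

      int join_random_fanout(int packet_fd, unsigned short group_id)
      {
              /* Low 16 bits: group id; high 16 bits: scheduling policy. */
              int fanout = group_id | (PACKET_FANOUT_RND << 16);

              return setsockopt(packet_fd, SOL_PACKET, PACKET_FANOUT,
                                &fanout, sizeof(fanout));
      }
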
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: drop fragmented ndisc packets by default (RFC 6980) · b800c3b9
      Authored by Hannes Frederic Sowa
      This patch implements RFC6980: Drop fragmented ndisc packets by
      default. If a fragmented ndisc packet is received the user is informed
      that it is possible to disable the check.
      
      Cc: Fernando Gont <fernando@gont.com.ar>
      Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • perf: make events stream always parsable · ff3d527c
      Authored by Adrian Hunter
      The event stream is not always parsable because the format of a sample
      is dependent on the sample_type of the selected event.  When there is
      more than one selected event and the sample_types are not the same then
      parsing becomes problematic.  A sample can be matched to its selected
      event using the ID that is allocated when the event is opened.
      Unfortunately, to get the ID from the sample means first parsing it.
      
      This patch adds a new sample format bit PERF_SAMPLE_IDENTIFIER that puts
      the ID at a fixed position so that the ID can be retrieved without
      parsing the sample.  For sample events, that is the first position
      immediately after the header.  For non-sample events, that is the last
      position.
      
      In this respect parsing samples requires that the sample_type and ID
      values are recorded.  For example, perf tools records struct
      perf_event_attr and the IDs within the perf.data file.  Those must be
      read first before it is possible to parse samples found later in the
      perf.data file.
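
      A sketch of requesting the fixed-position ID when opening an event
      (assumes headers with this series applied; the helper is illustrative):

      #include <linux/perf_event.h>
      #include <string.h>
      #include <sys/syscall.h>
      #include <unistd.h>

      int open_counter(void)
      {
              struct perf_event_attr attr;

              memset(&attr, 0, sizeof(attr));
              attr.size = sizeof(attr);
              attr.type = PERF_TYPE_HARDWARE;
              attr.config = PERF_COUNT_HW_CPU_CYCLES;
              /* The ID now sits at a fixed position in every record, so a
               * sample can be matched to its event before full parsing. */
              attr.sample_type = PERF_SAMPLE_IDENTIFIER | PERF_SAMPLE_IP;

              return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
      }
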
      Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
      Tested-by: Stephane Eranian <eranian@google.com>
      Acked-by: Peter Zijlstra <peterz@infradead.org>
      Cc: David Ahern <dsahern@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1377591794-30553-6-git-send-email-adrian.hunter@intel.com
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
    • Input: add SYN_MAX and SYN_CNT constants · 52764fed
      Authored by David Herrmann
      SYN_* events are special and not enabled via set_bit() for devices.  Hence,
      they haven't really been needed yet.  However, user-space can still make
      good use of them for int->string debugging helpers and the like.
      
      Also, I haven't seen any reason not to define these, so here they are.
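
      A sketch of the kind of int->string helper the message alludes to
      (SYN_CNT bounds the table; values from <linux/input.h>):

      #include <linux/input.h>

      static const char *syn_names[SYN_CNT] = {
              [SYN_REPORT]    = "SYN_REPORT",
              [SYN_CONFIG]    = "SYN_CONFIG",
              [SYN_MT_REPORT] = "SYN_MT_REPORT",
              [SYN_DROPPED]   = "SYN_DROPPED",
      };
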
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Acked-by: Peter Hutterer <peter.hutterer@who-t.net>
      Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
  14. 29 Aug 2013, 7 commits
    • Omnikey Cardman 4000: pull in ioctl.h in user header · aaaafb7f
      Authored by Mike Frysinger
      This file uses the ioctl helpers (_IOR/_IOW/etc...), so include ioctl.h
      for the definitions.
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Cc: Harald Welte <laforge@gnumonks.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • PCI: Add offsets of PCIe capability registers · bd6fb762
      Authored by Bjorn Helgaas
      These offsets are not used, and in some cases are completely reserved
      even in the spec, but I'm adding them for completeness just to match
      the diagrams in the spec, e.g., PCIe spec r3.0, sec 7.8.
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: Tidy bitmasks and spacing of PCIe capability definitions · c0b4b381
      Authored by Bjorn Helgaas
      The convention of showing bits in a mask of the full register width, e.g.,
      "0x00000007" instead of "0x07" for a field in a 32-bit register, is common
      but not universal in this file.  This patch makes it consistently used at
      least for the PCIe capability.
      
      Whitespace and zero-extension changes only; no functional change.
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: Remove obsolete comment reference to pci_pcie_cap2() · 1b121c24
      Authored by Bjorn Helgaas
      pci_pcie_cap2() was replaced by pcie_capability_read_word() and similar
      functions, so update the comment.
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • PCI: Clarify PCI_EXP_TYPE_PCI_BRIDGE comment · fbf501c3
      Authored by Bjorn Helgaas
      The PCI_EXP_TYPE_PCI_BRIDGE is a *PCIe* function that is a bridge to
      PCI/PCI-X.  See PCIe spec r3.0, sec 7.8.2.
      Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
    • IB/core: Export ib_create/destroy_flow through uverbs · 436f2ad0
      Authored by Hadar Hen Zion
      Implement ib_uverbs_create_flow() and ib_uverbs_destroy_flow() to
      support flow steering for user space applications.
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>
    • IB/core: Infrastructure for extensible uverbs commands · 400dbc96
      Authored by Igor Ivanov
      Add infrastructure to support extended uverbs capabilities in a
      forward/backward-compatible manner.  Uverbs command opcodes which are
      based on the verbs extensions approach should be greater than or equal to
      IB_USER_VERBS_CMD_THRESHOLD.  They have a new header format and are
      processed a bit differently.
      
      Whenever a specific IB_USER_VERBS_CMD_XXX is extended, which practically
      means it needs additional arguments, we will be able to add them without
      creating a completely new IB_USER_VERBS_CMD_YYY command or bumping the
      uverbs ABI version.
      
      This patch by itself doesn't provide the whole scheme, which also depends
      on adding a comp_mask field to each extended uverbs command struct.
      
      The new header framework allows for future extension of the CMD arguments
      (ib_uverbs_cmd_hdr.in_words, ib_uverbs_cmd_hdr.out_words) for an existing
      command (that is, a command that supports the new uverbs command header
      format suggested in this patch) without bumping the ABI version, while
      maintaining backward and forward compatibility with new and old libibverbs
      versions.
      
      In a uverbs command we pass both the uverbs arguments and the provider
      arguments.  We split ib_uverbs_cmd_hdr.in_words into in_words, which now
      carries only the uverbs input argument struct size, and
      ib_uverbs_cmd_hdr.provider_in_words, which carries the provider input
      argument size.  The same goes for the response (the uverbs CMD output
      argument).
      
      For example, take the create_cq call and the mlx4_ib provider:
      
      The uverbs layer gets libibverbs's struct ibv_create_cq (named struct
      ib_uverbs_create_cq in the kernel), mlx4_ib gets libmlx4's struct
      mlx4_create_cq (which includes struct ibv_create_cq and is named struct
      mlx4_ib_create_cq in the kernel), and in_words = sizeof(mlx4_create_cq)/4.
      
      Thus ib_uverbs_cmd_hdr.in_words carries both the uverbs and the mlx4_ib
      input argument sizes, where uverbs assumes it knows the size of its own
      input argument, struct ibv_create_cq.
      
      Now, if we wish to add a variable to struct ibv_create_cq, we can add a
      comp_mask field to the struct, which is basically a bit field indicating
      which fields exist in the struct (as done for the libibverbs API
      extension).  But we also need a way to tell the total size of the struct
      rather than assume it is predefined (since we may get different struct
      sizes from different user libibverbs versions), so that we know at which
      point the provider input argument (struct mlx4_create_cq) begins.  The
      same goes for extending the provider struct mlx4_create_cq.  Thus we split
      ib_uverbs_cmd_hdr.in_words into in_words, which now carries only the
      uverbs input argument struct size, and ib_uverbs_cmd_hdr.provider_in_words,
      which carries the provider (mlx4_ib) input argument size.
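
      A sketch of the extended header described above, paraphrased from the
      patch (field names may differ slightly from the final upstream code):

      #include <linux/types.h>

      struct ib_uverbs_cmd_hdr_ex {
              __u32 command;
              __u16 in_words;           /* uverbs input struct size, 32-bit words */
              __u16 out_words;          /* uverbs response struct size */
              __u16 provider_in_words;  /* provider (e.g. mlx4_ib) input size */
              __u16 provider_out_words; /* provider response size */
              __u32 cmd_hdr_reserved;
      };
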
      Signed-off-by: Igor Ivanov <Igor.Ivanov@itseez.com>
      Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: Roland Dreier <roland@purestorage.com>