1. 29 Nov, 2013: 2 commits
    • genetlink/pmcraid: use proper genetlink multicast API · 5e53e689
      Johannes Berg committed
      The pmcraid driver is abusing the genetlink API and is using its
      family ID as the multicast group ID, which is invalid and may
      belong to somebody else (and likely will).
      
      Make it use the correct API, but since this may already be used
      as-is by userspace, reserve a family ID for this code and also
      reserve that group ID to not break userspace assumptions.
      
      My previous patch broke event delivery in the driver as I missed
      that it wasn't using the right API and forgot to update it later
      in my series.
      
      While changing this, I noticed that the genetlink code could use
      the static group ID instead of a strcmp(), so also do that for
      the VFS_DQUOT family.
      
      Cc: Anil Ravindranath <anil_ravindranath@pmc-sierra.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • diag: warn about missing first netlink attribute · 31e20bad
      Nicolas Dichtel committed
      The first netlink attribute (value 0) must always be defined as none/unspec.
      This is correctly done in inet_diag.h, but other diag interfaces are wrong.
      
      Because we cannot change an existing API, I add a comment to point out the
      mistake and to avoid propagating it to a new diag API in the future.
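To illustrate the convention the comment documents, here is a minimal sketch of a diag attribute enum. The `FOO_DIAG_*` names and `foo_diag_attr_valid()` are hypothetical and are not taken from any kernel header. Value 0 is reserved as NONE/UNSPEC and never carries data. As in inet_diag.h, the `_MAX` constant is derived from a trailing sentinel.

```c
#include <stdbool.h>

/* Hypothetical diag attribute set: value 0 is reserved (none/unspec). */
enum {
	FOO_DIAG_NONE,   /* 0: must always be first, never carries data */
	FOO_DIAG_INFO,   /* 1: first real attribute */
	FOO_DIAG_STATS,  /* 2 */
	__FOO_DIAG_MAX,  /* sentinel, not a valid attribute */
};
#define FOO_DIAG_MAX (__FOO_DIAG_MAX - 1)

/* Accept only real attribute types: 0 is reserved and anything above
 * FOO_DIAG_MAX is unknown. */
bool foo_diag_attr_valid(int type)
{
	return type > FOO_DIAG_NONE && type <= FOO_DIAG_MAX;
}
```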
      
      CC: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 20 Nov, 2013: 1 commit
  3. 16 Nov, 2013: 2 commits
    • pkt_sched: fq: fix pacing for small frames · f52ed899
      Eric Dumazet committed
      For performance reasons, sch_fq tries hard not to set up timers for every
      sent packet, using a quantum-based heuristic: a delay is set up only if
      the flow has exhausted its credit.
      
      The problem is that application-limited flows can refill their credit
      for every queued packet, and thus evade pacing.

      This problem can also be triggered when TCP flows use small MSS values,
      as TSO auto sizing builds packets that are smaller than the default fq
      quantum (3028 bytes).
      
      This patch adds a 40 ms delay to guard flow credit refill.
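A simplified sketch of the guard follows. It assumes a per-flow credit counter and a timestamp recording when the flow went idle. The struct and function names are hypothetical, and this is not the sch_fq code itself. An idle flow that receives a new packet has its credit topped up to one quantum only if it has been idle for more than 40 ms.

```c
#include <stdint.h>

#define FQ_QUANTUM         3028u                /* default quantum from the text */
#define FQ_REFILL_DELAY_NS (40ull * 1000000ull) /* 40 ms refill guard */

/* Hypothetical, simplified per-flow state. */
struct fq_flow_sketch {
	int32_t  credit;  /* bytes the flow may send before pacing applies */
	uint64_t age_ns;  /* time at which the flow last became idle */
};

/* Called when an idle flow receives a packet. Without the delay check,
 * an application-limited flow would get a fresh quantum on every packet
 * and so would never be paced. */
void fq_flow_wakeup(struct fq_flow_sketch *f, uint64_t now_ns)
{
	if (now_ns - f->age_ns > FQ_REFILL_DELAY_NS &&
	    f->credit < (int32_t)FQ_QUANTUM)
		f->credit = (int32_t)FQ_QUANTUM;
}

/* Charge a dequeued packet against the credit (it may go negative). */
void fq_flow_send(struct fq_flow_sketch *f, uint32_t len)
{
	f->credit -= (int32_t)len;
}
```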
      
      Fixes: afe4fd06 ("pkt_sched: fq: Fair Queue packet scheduler")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Maciej Żenczykowski <maze@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Yuchung Cheng <ycheng@google.com>
      Cc: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • pkt_sched: fq: warn users using defrate · 65c5189a
      Eric Dumazet committed
      Commit 7eec4174 ("pkt_sched: fq: fix non TCP flows pacing")
      obsoleted TCA_FQ_FLOW_DEFAULT_RATE without notifying users.
      
      Suggested by David Miller
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 12 Nov, 2013: 2 commits
    • Btrfs: add tests for find_lock_delalloc_range · 294e30fe
      Josef Bacik committed
      So both Liu and I made huge messes of find_lock_delalloc_range trying to fix
      stuff, me first by fixing extent size, then him by fixing something I broke and
      then me again telling him to fix it a different way.  So this is obviously a
      candidate for some testing.  This patch adds a pseudo fs so we can allocate fake
      inodes for tests that need an inode or pages.  Then it adds a bunch of tests to
      make sure find_lock_delalloc_range is acting the way it is supposed to.  With
      this patch and all of our previous patches to find_lock_delalloc_range I am sure
      it is working as expected now.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • random32: move rnd_state to linux/random.h · 38e9efcd
      Daniel Borkmann committed
      struct rnd_state got mistakenly pulled into the uapi header. It is not
      used anywhere and also does not belong there!
      
      Commit 5960164f ("lib/random32: export pseudo-random number
      generator for modules"), the last commit on rnd_state before it
      got moved to uapi, says:
      
        This patch moves the definition of struct rnd_state and the inline
        __seed() function to linux/random.h.  It renames the static __random32()
        function to prandom32() and exports it for use in modules.
      
      Hence, the structure was moved from lib/random32.c to linux/random.h
      so that it can be used within modules (FCoE-related code in this
      case), but not from user space. However, it seems to have been
      mistakenly moved to uapi header through the uapi script. Since no-one
      should make use of it from the linux headers, move the structure back
      to the kernel for internal use, so that it can be modified on demand.
      
      Joint work with Hannes Frederic Sowa.
      
      Cc: Joe Eykholt <jeykholt@cisco.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 11 Nov, 2013: 1 commit
  6. 10 Nov, 2013: 2 commits
  7. 08 Nov, 2013: 1 commit
  8. 07 Nov, 2013: 1 commit
  9. 06 Nov, 2013: 1 commit
    • ipv4: introduce new IP_MTU_DISCOVER mode IP_PMTUDISC_INTERFACE · 482fc609
      Hannes Frederic Sowa committed
      Sockets marked with IP_PMTUDISC_INTERFACE won't do path MTU discovery:
      they won't accept and install new path MTU information, and they will
      always use the interface MTU for outgoing packets. It is guaranteed
      that the packet is not fragmented locally, but we won't set the DF flag
      on the outgoing frames.
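From userspace, the mode is selected with the existing IP_MTU_DISCOVER socket option. Below is a minimal sketch that assumes a Linux kernel supporting this mode. The fallback value 4 is the one defined in linux/in.h, for C library headers that predate the constant. The helper name is hypothetical.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_PMTUDISC_INTERFACE
#define IP_PMTUDISC_INTERFACE 4 /* value from linux/in.h, for older libc headers */
#endif

/* Hypothetical helper: open a UDP socket (e.g. for a DNS server) that
 * always uses the interface MTU and ignores path MTU updates.
 * Returns the fd, or -1 with errno set. */
int open_udp_interface_mtu(void)
{
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0)
		return -1;

	int mode = IP_PMTUDISC_INTERFACE;
	if (setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode)) < 0) {
		int saved = errno; /* close() may overwrite errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

A kernel without this mode is expected to reject the setsockopt() call, so callers should treat a -1 return as "feature unavailable".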
      
      Florian Weimer had the idea to use this flag to ensure DNS servers
      never generate outgoing fragments. They may well be fragmented on the
      path, but the server never stores or uses path MTU values, which could
      well be forged in an attack.
      
      (The root of the problem with path MTU discovery is that there is
      no reliable way to authenticate ICMP Fragmentation Needed But DF Set
      messages because they are sent from intermediate routers with their
      source addresses, and the ICMP payload will not always contain sufficient
      information to identify a flow.)
      
      Recent research in the DNS community showed that it is possible to
      implement an attack where DNS cache poisoning is feasible by spoofing
      fragments. This work was done by Amir Herzberg and Haya Shulman:
      <https://sites.google.com/site/hayashulman/files/fragmentation-poisoning.pdf>
      
      This issue was previously discussed among the DNS community, e.g.
      <http://www.ietf.org/mail-archive/web/dnsext/current/msg01204.html>,
      without leading to fixes.
      
      This patch depends on the patch "ipv4: fix DO and PROBE pmtu mode
      regarding local fragmentation with UFO/CORK" for the enforcement of the
      non-fragmentable checks. If users other than ip_append_page/data should
      use these semantics too, we have to add a new flag to IPCB(skb)->flags to
      suppress local fragmentation and check for this in ip_finish_output.
      
      Many thanks to Florian Weimer for the idea and feedback while implementing
      this patch.
      
      Cc: David S. Miller <davem@davemloft.net>
      Suggested-by: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  10. 04 Nov, 2013: 1 commit
    • net/hsr: Add support for the High-availability Seamless Redundancy protocol (HSRv0) · f421436a
      Arvid Brodin committed
      High-availability Seamless Redundancy ("HSR") provides instant failover
      redundancy for Ethernet networks. It requires a special network topology where
      all nodes are connected in a ring (each node having two physical network
      interfaces). It is suited for applications that demand high availability and
      very short reaction time.
      
      HSR acts on the Ethernet layer, using a registered Ethernet protocol type to
      send special HSR frames in both directions over the ring. The driver creates
      virtual network interfaces that can be used just like any ordinary Linux
      network interface, for IP/TCP/UDP traffic etc. All nodes in the network ring
      must be HSR capable.
      
      This code is a "best effort" to comply with the HSR standard as described in
      IEC 62439-3:2010 (HSRv0).
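Every frame travels around the ring in both directions, so a receiving node normally gets two copies and should pass only the first one up the stack. The sketch below shows that general idea using a per-source 16-bit sequence number with wrap-around comparison. The names are hypothetical, and this is not the driver's actual node-table logic.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-source state kept by a receiving node. */
struct hsr_src_sketch {
	bool     seen;     /* has any frame from this source arrived yet? */
	uint16_t last_seq; /* newest sequence number delivered so far */
};

/* True if a is "after" b in 16-bit serial arithmetic (handles wrap). */
static bool seq_after(uint16_t a, uint16_t b)
{
	return (int16_t)(uint16_t)(a - b) > 0;
}

/* Returns true if the frame should be delivered, false if it is the
 * duplicate that came around the other direction of the ring (or an
 * older frame). */
bool hsr_accept_frame(struct hsr_src_sketch *src, uint16_t seq)
{
	if (!src->seen || seq_after(seq, src->last_seq)) {
		src->seen = true;
		src->last_seq = seq;
		return true;
	}
	return false;
}
```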
      Signed-off-by: Arvid Brodin <arvid.brodin@xdin.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. 02 Nov, 2013: 1 commit
    • openvswitch: TCP flags matching support. · 5eb26b15
      Jarno Rajahalme committed
          tcp_flags=flags/mask
              Bitwise match on TCP flags. The flags and mask are 16-bit
              numbers written in decimal or in hexadecimal prefixed by 0x.
              Each 1-bit in mask requires that the corresponding bit in
              flags must match. Each 0-bit in mask causes the corresponding
              bit to be ignored.

              The TCP protocol currently defines 9 flag bits, and an
              additional 3 bits are reserved (must be transmitted as zero);
              see RFCs 793, 3168, and 3540. The flag bits are, numbering
              from the least significant bit:
      
              0: FIN No more data from sender.
      
              1: SYN Synchronize sequence numbers.
      
              2: RST Reset the connection.
      
              3: PSH Push function.
      
              4: ACK Acknowledgement field significant.
      
              5: URG Urgent pointer field significant.
      
              6: ECE ECN Echo.
      
              7: CWR Congestion Window Reduced.
      
              8: NS  Nonce Sum.
      
              9-11:  Reserved.
      
              12-15: Not matchable, must be zero.
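Below is a minimal sketch of the masked comparison described above, using the bit numbering from the list. The `TF_*` names and both helpers are hypothetical, not Open vSwitch identifiers. For example, `tcp_flags=0x02/0x12` requires SYN to be set and ACK to be clear, which selects an initial SYN.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag bits numbered as in the list above (bit 0 = FIN). */
enum {
	TF_FIN = 1u << 0,
	TF_SYN = 1u << 1,
	TF_RST = 1u << 2,
	TF_PSH = 1u << 3,
	TF_ACK = 1u << 4,
	TF_URG = 1u << 5,
	TF_ECE = 1u << 6,
	TF_CWR = 1u << 7,
	TF_NS  = 1u << 8,
};

/* A 1-bit in mask means the packet's bit must equal the bit in flags;
 * a 0-bit in mask means the bit is ignored. */
bool tcp_flags_match(uint16_t pkt_flags, uint16_t flags, uint16_t mask)
{
	return ((pkt_flags ^ flags) & mask) == 0;
}

/* Bits 12-15 are not matchable and must be zero in flags and mask. */
bool tcp_flags_mask_valid(uint16_t flags, uint16_t mask)
{
	return ((flags | mask) & 0xf000u) == 0;
}
```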
      Signed-off-by: Jarno Rajahalme <jrajahalme@nicira.com>
      Signed-off-by: Jesse Gross <jesse@nicira.com>
  12. 31 Oct, 2013: 2 commits
    • kvm: Add VFIO device · ec53500f
      Alex Williamson committed
      So far we've succeeded at making KVM and VFIO mostly unaware of each
      other, but areas are cropping up where a connection beyond eventfds
      and irqfds needs to be made.  This patch introduces a KVM-VFIO device
      that is meant to be a gateway for such interaction.  The user creates
      the device and can add and remove VFIO groups to it via file
      descriptors.  When a group is added, KVM verifies the group is valid
      and gets a reference to it via the VFIO external user interface.
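Below is a minimal userspace sketch of the flow described above. It assumes a VM file descriptor obtained from KVM_CREATE_VM and an already opened VFIO group file descriptor. It uses the KVM_CREATE_DEVICE and KVM_SET_DEVICE_ATTR ioctls with the KVM_DEV_TYPE_VFIO device type and the KVM_DEV_VFIO_GROUP / KVM_DEV_VFIO_GROUP_ADD attribute from linux/kvm.h. The helper name is hypothetical and error handling is minimal.

```c
#include <linux/kvm.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: create the KVM-VFIO device on a VM and add one
 * VFIO group to it. Returns the device fd, or -1 on error. */
int kvm_vfio_add_group(int vm_fd, int32_t group_fd)
{
	struct kvm_create_device cd = { .type = KVM_DEV_TYPE_VFIO };

	if (ioctl(vm_fd, KVM_CREATE_DEVICE, &cd) < 0)
		return -1;

	/* For GROUP_ADD, addr points to an int32_t VFIO group fd. */
	struct kvm_device_attr attr = {
		.group = KVM_DEV_VFIO_GROUP,
		.attr  = KVM_DEV_VFIO_GROUP_ADD,
		.addr  = (uint64_t)(uintptr_t)&group_fd,
	};
	if (ioctl((int)cd.fd, KVM_SET_DEVICE_ATTR, &attr) < 0) {
		close((int)cd.fd);
		return -1;
	}
	return (int)cd.fd;
}
```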
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: Add KVM_GET_EMULATED_CPUID · 9c15bb1d
      Borislav Petkov committed
      Add a KVM ioctl which reports the system functionality that KVM emulates.
      The format used is that of CPUID, and we return the corresponding CPUID
      bits set for the functionality we do emulate.

      Make sure ->padding is passed in clean from userspace so that we
      can use it for something in the future, after the ioctl gets cast in
      stone.
      
      s/kvm_dev_ioctl_get_supported_cpuid/kvm_dev_ioctl_get_cpuid/ while at
      it.
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  13. 30 Oct, 2013: 1 commit
    • net: sched: cls_bpf: add BPF-based classifier · 7d1d65cb
      Daniel Borkmann committed
      This work contains a lightweight BPF-based traffic classifier that can
      serve as a flexible alternative to ematch-based tree classification,
      especially now that the BPF filter engine can also be JITed in the
      kernel. Naturally, tc actions and policies are supported as well with
      cls_bpf. Multiple BPF programs/filters can be attached to a class, or
      they can just as well be written as a single BPF program; it is really
      up to the user how to run/optimize the code, e.g. also for inversion of
      verdicts etc. A BPF program's return/exit codes are interpreted as
      follows:
      
           0: No match
          -1: Select classid given in "tc filter ..." command
        else: flowid, overwrite the default one
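Below is a hypothetical helper that applies these three rules to a program's return value. The names are invented for illustration, and it does not reproduce the kernel's cls_bpf code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Map a BPF return code to a class id following the three cases above.
 * Returns false when the program reports no match. */
bool cls_bpf_pick_class(int32_t ret, uint32_t filter_classid,
			uint32_t *classid)
{
	if (ret == 0)
		return false;              /* 0: no match */
	if (ret == -1)
		*classid = filter_classid; /* -1: classid from "tc filter ..." */
	else
		*classid = (uint32_t)ret;  /* else: returned value is the flowid */
	return true;
}
```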
      
      As a minimal usage example with iproute2, we use a 3-band prio root qdisc
      on a router with an sfq leaf on each band, and assign ssh and icmp
      bpf-based filters to band 1, http traffic to band 2, and the rest to
      band 3. For the first two bands we load the bytecode from a file; for
      the third we load it inline as an example:
      
      echo 1 > /proc/sys/net/core/bpf_jit_enable
      
      tc qdisc del dev em1 root
      tc qdisc add dev em1 root handle 1: prio bands 3 priomap 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
      
      tc qdisc add dev em1 parent 1:1 sfq perturb 16
      tc qdisc add dev em1 parent 1:2 sfq perturb 16
      tc qdisc add dev em1 parent 1:3 sfq perturb 16
      
      tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/ssh.bpf flowid 1:1
      tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/icmp.bpf flowid 1:1
      tc filter add dev em1 parent 1: bpf run bytecode-file /etc/tc/http.bpf flowid 1:2
      tc filter add dev em1 parent 1: bpf run bytecode "`bpfc -f tc -i misc.ops`" flowid 1:3
      
      BPF programs can be easily created and passed to tc, either as inline
      'bytecode' or 'bytecode-file'. There are a couple of front-ends that can
      compile opcodes, for example:
      
      1) People familiar with tcpdump-like filters:
      
         tcpdump -iem1 -ddd port 22 | tr '\n' ',' > /etc/tc/ssh.bpf
      
      2) People that want to low-level program their filters or use BPF
         extensions that lack support by libpcap's compiler:
      
         bpfc -f tc -i ssh.ops > /etc/tc/ssh.bpf
      
         ssh.ops example code:
         ldh [12]
         jne #0x800, drop
         ldb [23]
         jneq #6, drop
         ldh [20]
         jset #0x1fff, drop
         ldxb 4 * ([14] & 0xf)
         ldh [%x + 14]
         jeq #0x16, pass
         ldh [%x + 16]
         jne #0x16, drop
         pass: ret #-1
         drop: ret #0
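For readers less familiar with BPF assembly, the ssh.ops program above corresponds roughly to the C check below on a raw Ethernet frame. This is an illustrative translation and the function name is hypothetical. As in classic BPF, a load past the end of the packet makes the program return 0 (drop).

```c
#include <stddef.h>
#include <stdint.h>

/* Big-endian 16-bit load, like BPF's "ldh". */
static uint16_t ld16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns -1 (pass) for unfragmented IPv4/TCP with source or destination
 * port 22, otherwise 0 (drop). */
int ssh_filter(const uint8_t *pkt, size_t len)
{
	if (len < 24)
		return 0;
	if (ld16(pkt + 12) != 0x0800)       /* ldh [12]; jne #0x800, drop */
		return 0;
	if (pkt[23] != 6)                   /* ldb [23]; jneq #6, drop */
		return 0;
	if (ld16(pkt + 20) & 0x1fff)        /* ldh [20]; jset #0x1fff, drop */
		return 0;
	size_t x = 4u * (pkt[14] & 0x0f);   /* ldxb 4 * ([14] & 0xf) */
	if (len < 14 + x + 2)
		return 0;
	if (ld16(pkt + 14 + x) == 0x16)     /* source port 22: pass */
		return -1;
	if (len < 16 + x + 2)
		return 0;
	if (ld16(pkt + 16 + x) == 0x16)     /* destination port 22: pass */
		return -1;
	return 0;
}
```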
      
      Loading bytecode into tc was chosen because the reverse operation,
      tc filter list dev em1, can then show the exact commands again.
      Possible follow-up work could also include a small expression compiler
      for iproute2. Tested with the help of bmon. This idea came up during
      the Netfilter Workshop 2013 in Copenhagen. Also thanks to feedback from
      Eric Dumazet!
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  14. 29 Oct, 2013: 3 commits
    • perf: Fix perf ring buffer memory ordering · bf378d34
      Peter Zijlstra committed
      The PPC64 people noticed a missing memory barrier and crufty old
      comments in the perf ring buffer code. So update all the comments and
      add the missing barrier.
      
      When the architecture implements local_t using atomic_long_t there
      will be double barriers issued; but short of introducing more
      conditional barrier primitives this is the best we can do.
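The producer/consumer ordering at stake can be illustrated with a single-producer/single-consumer ring written with portable C11 atomics. This is an analogy, not the perf implementation: perf uses explicit kernel memory barriers rather than C11 acquire/release, and all names below are hypothetical. The producer must publish data before the new head, and the consumer must finish reading data before handing the slot back through the tail.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RB_SIZE 8u /* power of two */

/* Hypothetical SPSC ring; head/tail are free-running counters. */
struct ring_sketch {
	_Atomic uint64_t head; /* written by the producer only */
	_Atomic uint64_t tail; /* written by the consumer only */
	int data[RB_SIZE];
};

bool rb_produce(struct ring_sketch *rb, int v)
{
	uint64_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
	/* Acquire: the consumer's reads of a slot happen before we reuse it. */
	uint64_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
	if (head - tail == RB_SIZE)
		return false; /* full */
	rb->data[head % RB_SIZE] = v;
	/* Release: the data write is visible before the new head. */
	atomic_store_explicit(&rb->head, head + 1, memory_order_release);
	return true;
}

bool rb_consume(struct ring_sketch *rb, int *out)
{
	uint64_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
	/* Acquire: read head before reading the data it covers. */
	uint64_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
	if (head == tail)
		return false; /* empty */
	*out = rb->data[tail % RB_SIZE];
	/* Release: finish reading the slot before handing it back. */
	atomic_store_explicit(&rb->tail, tail + 1, memory_order_release);
	return true;
}
```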
      Reported-by: Victor Kaplansky <victork@il.ibm.com>
      Tested-by: Victor Kaplansky <victork@il.ibm.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: michael@ellerman.id.au
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: anton@samba.org
      Cc: benh@kernel.crashing.org
      Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lan
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • NFS: stop using NFS_MOUNT_SECFLAVOUR server flag · 5837f6df
      Weston Andros Adamson committed
      Since the parsed sec= flavor is now stored in nfs_server->auth_info,
      we no longer need an nfs_server flag to determine if a sec= option was
      used.
      
      This flag has not been completely removed because it is still needed for
      the (old but still supported) non-text parsed mount options ABI
      compatibility.
      Signed-off-by: Weston Andros Adamson <dros@netapp.com>
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    • [media] v4l: ti-vpe: Add VPE mem to mem driver · 45719127
      Archit Taneja committed
      VPE is a block which consists of a single memory to memory path which
      can perform chrominance up/down sampling, de-interlacing, scaling, and
      color space conversion of raster or tiled YUV420 coplanar, YUV422
      coplanar or YUV422 interleaved video formats.
      
      We create a mem2mem driver based primarily on the mem2mem-testdev
      example. The de-interlacer, scaler and color space converter are all
      bypassed for now to keep the driver simple. Chroma up/down sampler
      blocks are implemented, so conversion between different YUV formats is
      possible.
      
      Each mem2mem context allocates a buffer for VPE MMR values, which it
      uses when it gets access to the VPE HW via the mem2mem queue. It also
      allocates a VPDMA descriptor list to which configuration and data
      descriptors are added.
      
      Based on the information received via v4l2 ioctls for the source and
      destination queues, the driver configures the values for the MMRs and
      stores them in the buffer. There are also some VPDMA parameters, like
      frame start and line mode, which need to be configured; these are
      configured by direct register writes via the VPDMA helper functions.
      
      The driver's device_run() mem2mem op adds each descriptor based on
      how the source and destination queues are set up for the given ctx.
      Once the list is prepared, it is submitted to VPDMA; when parsed by
      VPDMA, these descriptors upload the MMR registers and start DMA of
      video buffers on the various input and output clients/ports.
      
      When the list is parsed completely (and the DMAs on all the output
      ports are done), an interrupt is generated, which we use to notify that
      the source and destination buffers are done. The rest of the driver is
      quite similar to other mem2mem drivers; we use the multiplane v4l2
      ioctls as the HW supports coplanar formats.
      Signed-off-by: Archit Taneja <archit@ti.com>
      Acked-by: Hans Verkuil <hans.verkuil@cisco.com>
      Signed-off-by: Kamil Debski <k.debski@samsung.com>
      Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
  15. 28 Oct, 2013: 1 commit
  16. 23 Oct, 2013: 1 commit
  17. 20 Oct, 2013: 2 commits
  18. 18 Oct, 2013: 1 commit
  19. 15 Oct, 2013: 6 commits
    • ipvs: fix the IPVS_CMD_ATTR_MAX definition · 120c9794
      Julian Anastasov committed
      It was wrong (too big), but the problem is harmless.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: Simon Horman <horms@verge.net.au>
    • netfilter: nfnetlink: add batch support and use it from nf_tables · 0628b123
      Pablo Neira Ayuso committed
      This patch adds batch support to nfnetlink. Basically, it adds
      two new control messages:
      
      * NFNL_MSG_BATCH_BEGIN, that indicates the beginning of a batch,
        the nfgenmsg->res_id indicates the nfnetlink subsystem ID.
      
      * NFNL_MSG_BATCH_END, that results in the invocation of the
        ss->commit callback function. If not specified or an error
        occurred in the batch, the ss->abort function is invoked
        instead.
      
      The end message represents the commit operation in nftables; the
      lack of an end message results in an abort. This patch also adds the
      .call_batch function, which is only called from the batch receive
      path.
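Below is a simplified model of the batch semantics described above. It assumes an array of message kinds and a per-message success flag. All names are hypothetical. Real nfnetlink processes netlink messages and calls the subsystem's commit/abort callbacks, whereas this model only returns which of the two would run.

```c
#include <stddef.h>

enum batch_msg { MSG_BATCH_BEGIN, MSG_RULE_OP, MSG_BATCH_END };
enum batch_result { BATCH_COMMIT, BATCH_ABORT };

/* ops_ok[i] tells whether the operation in msgs[i] succeeded. Commit only
 * if the batch starts with BEGIN, every operation succeeds, and an END
 * message arrives; otherwise abort. */
enum batch_result process_batch(const enum batch_msg *msgs,
				const int *ops_ok, size_t n)
{
	if (n == 0 || msgs[0] != MSG_BATCH_BEGIN)
		return BATCH_ABORT;
	for (size_t i = 1; i < n; i++) {
		if (msgs[i] == MSG_BATCH_END)
			return BATCH_COMMIT; /* would invoke ss->commit */
		if (!ops_ok[i])
			return BATCH_ABORT;  /* error: would invoke ss->abort */
	}
	return BATCH_ABORT;                  /* no END message: abort */
}
```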
      
      This patch adds atomic rule updates and dumps based on
      bitmask generations. This allows a set of rule-set updates to be
      committed atomically and incrementally without altering the internal
      state of existing nf_tables expressions/matches/targets.
      
      The idea consists of using a generation cursor of 1 bit and
      a bitmask of 2 bits per rule. Assuming the gencursor is 0,
      then the genmask (expressed as a bitmask) can be interpreted
      as:
      
      00 active in the present, will be active in the next generation.
      01 inactive in the present, will be active in the next generation.
      10 active in the present, will be deleted in the next generation.
       ^
       gencursor
      
      Once you invoke the transition to the next generation, the global
      gencursor is updated:
      
      00 active in the present, will be active in the next generation.
      01 active in the present, needs to zero its future, it becomes 00.
      10 inactive in the present, delete now.
      ^
      gencursor
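The genmask rules in the two tables above can be expressed as a few bit operations. Below is a minimal sketch, assuming that bit g of the mask means "inactive in generation g"; the tables print the mask as bit1 bit0. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical rule carrying only its 2-bit generation mask. */
struct rule_sketch { uint8_t genmask; };

static unsigned gen_next(unsigned cursor) { return cursor ^ 1u; }

bool rule_active_now(const struct rule_sketch *r, unsigned cursor)
{
	return (r->genmask & (1u << cursor)) == 0;
}

bool rule_active_next(const struct rule_sketch *r, unsigned cursor)
{
	return (r->genmask & (1u << gen_next(cursor))) == 0;
}

/* New rule: inactive now, active next (01 when the cursor is 0). */
void rule_add(struct rule_sketch *r, unsigned cursor)
{
	r->genmask = (uint8_t)(1u << cursor);
}

/* Deleted rule: active now, inactive next (10 when the cursor is 0). */
void rule_del(struct rule_sketch *r, unsigned cursor)
{
	r->genmask = (uint8_t)(1u << gen_next(cursor));
}

/* After the cursor flips: returns true if the rule must be freed,
 * otherwise clears its mask (01 becomes 00). */
bool rule_commit(struct rule_sketch *r, unsigned new_cursor)
{
	if (!rule_active_now(r, new_cursor))
		return true;
	r->genmask = 0;
	return false;
}
```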
      
      If a dump is in progress and nf_tables enters a new generation,
      the dump will stop and return -EBUSY to let userspace know that
      it has to retry. In order to invalidate dumps, a global
      genctr counter is increased every time nf_tables enters a new
      generation.
      
      This new operation can be used from the user-space utility
      that controls the firewall, e.g.:
      
      nft -f file
      
      The rule updates contained in `file' will be applied atomically.
      
      cat file
      -----
      add filter INPUT ip saddr 1.1.1.1 counter accept #1
      del filter INPUT ip daddr 2.2.2.2 counter drop   #2
      -EOF-
      
      Note that rule 1 will be inactive until the transition to the
      next generation, and rule 2 will be evicted in the next generation.
      
      There is a penalty during the rule update due to branch
      misprediction in the packet matching framework, but that should be
      quickly resolved once the iteration over the commit list that
      contains rules requiring updates is finished.
      
      Event notification happens once the rule-set update has been
      committed. So we skip notifications in case the rule-set update
      is aborted, which can happen when the rule-set is tested to check
      that it applies correctly.
      
      This patch squashed the following patches from Pablo:
      
      * nf_tables: atomic rule updates and dumps
      * nf_tables: get rid of per rule list_head for commits
      * nf_tables: use per netns commit list
      * nfnetlink: add batch support and use it from nf_tables
      * nf_tables: all rule updates are transactional
      * nf_tables: attach replacement rule after stale one
      * nf_tables: do not allow deletion/replacement of stale rules
      * nf_tables: remove unused NFTA_RULE_FLAGS
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add insert operation · 5e948466
      Eric Leblond committed
      This patch adds a new rule attribute NFTA_RULE_POSITION which is
      used to store the position of a rule relative to the others.
      By providing the create command and specifying the position, the
      rule is inserted after the rule with the handle equal to the
      provided position.
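Below is a minimal sketch of insert-by-position on a singly linked list. It assumes each rule carries a unique handle. The types and names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical rule list node with a unique handle. */
struct nrule {
	uint64_t handle;
	struct nrule *next;
};

/* Insert r right after the rule whose handle equals position.
 * Returns 0 on success, -1 if no rule has that handle. */
int rule_insert_after(struct nrule *head, uint64_t position, struct nrule *r)
{
	for (struct nrule *it = head; it != NULL; it = it->next) {
		if (it->handle == position) {
			r->next = it->next;
			it->next = r;
			return 0;
		}
	}
	return -1;
}
```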
      
      Regarding notification, the position attribute specifies the
      handle of the previous rule to make sure we don't point to any
      stale rule in notifications coming from the commit path.
      
      This patch includes the following fix from Pablo:
      
      * nf_tables: fix rule deletion event reporting
      Signed-off-by: Eric Leblond <eric@regit.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: Add support for IPv6 NAT · eb31628e
      Tomasz Bursztyka committed
      This patch generalizes the NAT expression to support both IPv4 and IPv6
      using the existing IPv4/IPv6 NAT infrastructure. This also adds the
      NAT chain type for IPv6.
      
      This patch collapses the following patches that were posted to the
      netfilter-devel mailing list, from Tomasz:
      
      * nf_tables: Change NFTA_NAT_ attributes to better semantic significance
      * nf_tables: Split IPv4 NAT into NAT expression and IPv4 NAT chain
      * nf_tables: Add support for IPv6 NAT expression
      * nf_tables: Add support for IPv6 NAT chain
      * nf_tables: Fix up build issue on IPv6 NAT support
      
      And, from Pablo Neira Ayuso:
      
      * fix missing dependencies in nft_chain_nat
      Signed-off-by: Tomasz Bursztyka <tomasz.bursztyka@linux.intel.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add support for dormant tables · 9ddf6323
      Pablo Neira Ayuso committed
      This patch allows you to temporarily disable an entire table.
      You can change the state of a dormant table via NFT_MSG_NEWTABLE
      messages. Using this operation you can wake up a table, so its
      chains are registered.
      
      This provides atomicity at chain level. Thus, the rule-set of one
      chain is applied at once, avoiding any possible intermediate state
      in every chain. Still, the chains that belong to a table are
      registered consecutively. This also allows you to have inactive
      tables in the kernel.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add compatibility layer for x_tables · 0ca743a5
      Pablo Neira Ayuso committed
      This patch adds the x_tables compatibility layer. This allows you
      to use existing x_tables matches and targets from nf_tables.
      
      This compatibility layer allows us to use existing matches/targets
      for features that are still missing in nf_tables. We can progressively
      replace them with native nf_tables extensions. It also provides the
      userspace compatibility software that allows you to express the
      rule-set using the iptables syntax but using the nf_tables kernel
      components.
      
      In order to get this compatibility layer working, I've done the
      following things:
      
      * add NFNL_SUBSYS_NFT_COMPAT: this new nfnetlink subsystem is used
        to query the x_tables match/target revision, so we don't need to
        use the native x_table getsockopt interface.

      * emulate xt structures: this required extending the struct nft_pktinfo
        to include the fragment offset, which is already obtained from
        ip[6]_tables and that is used by some matches/targets.
      
      * add support for default policy to base chains, required to emulate
        x_tables.
      
      * add NFTA_CHAIN_USE attribute to obtain the number of references to
        chains, required by x_tables emulation.
      
      * add chain packet/byte counters using per-cpu.
      
      * support 32-64 bits compat.
      
      For historical reasons, this patch includes the following patches
      that were posted in the netfilter-devel mailing list.
      
      From Pablo Neira Ayuso:
      * nf_tables: add default policy to base chains
      * netfilter: nf_tables: add NFTA_CHAIN_USE attribute
      * nf_tables: nft_compat: private data of target and matches in contiguous area
      * nf_tables: validate hooks for compat match/target
      * nf_tables: nft_compat: release cached matches/targets
      * nf_tables: x_tables support as a compile time option
      * nf_tables: fix alias for xtables over nftables module
      * nf_tables: add packet and byte counters per chain
      * nf_tables: fix per-chain counter stats if no counters are passed
      * nf_tables: don't bump chain stats
      * nf_tables: add protocol and flags for xtables over nf_tables
      * nf_tables: add ip[6]t_entry emulation
      * nf_tables: move specific layer 3 compat code to nf_tables_ipv[4|6]
      * nf_tables: support 32bits-64bits x_tables compat
      * nf_tables: fix compilation if CONFIG_COMPAT is disabled
      
      From Patrick McHardy:
      * nf_tables: move policy to struct nft_base_chain
      * nf_tables: send notifications for base chain policy changes
      
      From Alexander Primak:
      * nf_tables: remove the duplicate NF_INET_LOCAL_OUT
      
      From Nicolas Dichtel:
      * nf_tables: fix compilation when nf-netlink is a module
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  20. 14 Oct, 2013: 3 commits
    • netfilter: nf_tables: convert built-in tables/chains to chain types · 9370761c
      Pablo Neira Ayuso committed
      This patch converts built-in tables/chains to chain types that
      allow you to deploy customized table and chain configurations from
      userspace.
      
      After this patch, you have to specify the chain type when
      creating a new chain:
      
       add chain ip filter output { type filter hook input priority 0; }
                                    ^^^^ ------
      
      The existing chain types after this patch are: filter, route and
      nat. Note that tables are just containers of chains with no specific
      semantics, which is a significant change with regard to iptables.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nf_tables: add netlink set API · 20a69341
      Patrick McHardy committed
      This patch adds the new netlink API for maintaining nf_tables sets
      independently of the ruleset. The API supports the following operations:
      
      - creation of sets
      - deletion of sets
      - querying of specific sets
      - dumping of all sets
      
      - addition of set elements
      - removal of set elements
      - dumping of all set elements
      
      Sets are identified by name, each table defines an individual namespace.
      The name of a set may be allocated automatically, this is mostly useful
      in combination with the NFT_SET_ANONYMOUS flag, which destroys a set
      automatically once the last reference has been released.
      
      Sets can be marked constant, meaning they're not allowed to change while
      linked to a rule. This allows lockless operation for set
      types that would otherwise require locking.
      
      Additionally, if the implementation supports it, sets can (as before) be
      used as maps, associating a data value with each key (or range), by
      specifying the NFT_SET_MAP flag and can be used for interval queries by
      specifying the NFT_SET_INTERVAL flag.
      
      Set elements are added and removed incrementally. All element operations
      support batching, reducing netlink message and set lookup overhead.
      
      The old "set" and "hash" expressions are replaced by a generic "lookup"
      expression, which binds to the specified set. Userspace is not aware
      of the actual set implementation used by the kernel anymore, all
      configuration options are generic.
      
      Currently the implementation selection logic is largely missing and the
      kernel will simply use the first registered implementation supporting the
      requested operation. Eventually, the plan is to have userspace supply a
      description of the data characteristics and select the implementation
      based on expected performance and memory use.
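The interim selection rule described above (take the first registered implementation that supports everything requested) amounts to a linear scan. In this sketch, the feature bits `F_MAP`/`F_INTERVAL`, the `set_impl` type and the implementation names are hypothetical placeholders.

```c
#include <assert.h>
#include <stddef.h>

#define F_MAP      0x1u  /* hypothetical: implementation can store per-key data */
#define F_INTERVAL 0x2u  /* hypothetical: implementation supports range keys */

struct set_impl {
    const char *name;
    unsigned int features;
};

/* Return the first registered implementation offering all wanted features. */
static const struct set_impl *select_impl(const struct set_impl *impls, size_t n,
                                          unsigned int wanted)
{
    for (size_t i = 0; i < n; i++)
        if ((impls[i].features & wanted) == wanted)
            return &impls[i];
    return NULL; /* no registered implementation can satisfy the request */
}
```

Note that registration order alone decides the outcome here, which is exactly the limitation the commit message points out.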
      
      This patch includes the new 'lookup' expression, which looks up
      matching elements in a set.
      
      This patch also includes kernel-doc descriptions for the set API, as
      well as the following fixes:
      
      From Patrick McHardy:
      * netfilter: nf_tables: fix set element data type in dumps
      * netfilter: nf_tables: fix indentation of struct nft_set_elem comments
      * netfilter: nf_tables: fix oops in nft_validate_data_load()
      * netfilter: nf_tables: fix oops while listing sets of built-in tables
      * netfilter: nf_tables: destroy anonymous sets immediately if binding fails
      * netfilter: nf_tables: propagate context to set iter callback
      * netfilter: nf_tables: add loop detection
      
      From Pablo Neira Ayuso:
      * netfilter: nf_tables: allow to dump all existing sets
      * netfilter: nf_tables: fix wrong type for flags variable in newelem
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • P
      netfilter: add nftables · 96518518
      Patrick McHardy committed
      This patch adds nftables which is the intended successor of iptables.
      This packet filtering framework reuses the existing netfilter hooks,
      the connection tracking system, the NAT subsystem, the transparent
      proxying engine, the logging infrastructure and the userspace packet
      queueing facilities.
      
      In a nutshell, nftables provides a pseudo-state machine with 4
      general-purpose registers of 128 bits and 1 special-purpose register
      to store verdicts. This pseudo-machine comes with an extensible
      instruction set, a.k.a. "expressions" in the nftables jargon. The
      expressions included in this patch provide the basic functionality:
      
      * bitwise: to perform bitwise operations.
      * byteorder: to convert between host and network byte order.
      * cmp: to compare data with the content of the registers.
      * counter: to enable counters on rules.
      * ct: to store conntrack keys into registers.
      * exthdr: to match IPv6 extension headers.
      * immediate: to load data into registers.
      * limit: to limit matching based on packet rate.
      * log: to log packets.
      * meta: to match meta information that usually comes with the skbuff.
      * nat: to perform Network Address Translation.
      * payload: to fetch data from the packet payload and store it into
        registers.
      * reject (IPv4 only): to explicitly close a connection, e.g. with a TCP RST.
      
      Using this instruction set, the userspace utility 'nft' can translate
      rules expressed in a human-readable text representation (using a
      new syntax, inspired by tcpdump) into nftables bytecode.
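To illustrate the register-machine model, here is a toy C evaluator with four 128-bit data registers and a separate verdict register. It is only a sketch under simplifying assumptions: three hypothetical opcodes stand in for the payload, cmp and immediate expressions, a failed comparison simply ends the rule without a verdict, and none of the names (`struct expr`, `eval_rule()`, `V_DROP`) are the kernel's.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum verdict { V_CONTINUE, V_ACCEPT, V_DROP };

struct regs {
    enum verdict verdict;   /* special-purpose verdict register */
    uint8_t data[4][16];    /* four general-purpose 128-bit registers */
};

enum op { OP_PAYLOAD, OP_CMP_EQ, OP_IMMEDIATE };

/* One "expression" (instruction); only the fields its opcode needs are used. */
struct expr {
    enum op op;
    unsigned int reg;       /* data register index, 0..3 */
    unsigned int offset;    /* OP_PAYLOAD: byte offset into the packet */
    unsigned int len;       /* bytes loaded or compared, at most 16 */
    uint8_t value[16];      /* OP_CMP_EQ: operand to compare against */
    enum verdict verdict;   /* OP_IMMEDIATE: verdict to load */
};

/* Run one rule's expressions in order; a failed cmp means "rule does not match". */
static enum verdict eval_rule(const struct expr *rule, size_t n,
                              const uint8_t *pkt, size_t pktlen)
{
    struct regs r = { .verdict = V_CONTINUE };

    for (size_t i = 0; i < n; i++) {
        const struct expr *e = &rule[i];

        assert(e->reg < 4 && e->len <= 16);
        switch (e->op) {
        case OP_PAYLOAD:
            if ((size_t)e->offset + e->len > pktlen)
                return V_CONTINUE;  /* truncated packet: no match */
            memcpy(r.data[e->reg], pkt + e->offset, e->len);
            break;
        case OP_CMP_EQ:
            if (memcmp(r.data[e->reg], e->value, e->len) != 0)
                return V_CONTINUE;  /* mismatch: fall through to the next rule */
            break;
        case OP_IMMEDIATE:
            r.verdict = e->verdict;
            break;
        }
    }
    return r.verdict;
}
```

For example, a rule of three expressions (load the IPv4 protocol byte at offset 9 into register 0, compare it with 6, load DROP into the verdict register) would drop TCP packets and leave all others to the next rule.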
      
      nftables also inherits the table, chain and rule objects from
      iptables, but in a more configurable way, and it also includes the
      original datatype-agnostic set infrastructure with mapping support.
      This set infrastructure is enhanced in the follow-up patch (netfilter:
      nf_tables: add netlink set API).
      
      This patch includes the following components:
      
      * the netlink API: net/netfilter/nf_tables_api.c and
        include/uapi/netfilter/nf_tables.h
      * the packet filter core: net/netfilter/nf_tables_core.c
      * the expressions (described above): net/netfilter/nft_*.c
      * the filter tables: arp, IPv4, IPv6 and bridge:
        net/ipv4/netfilter/nf_tables_ipv4.c
        net/ipv6/netfilter/nf_tables_ipv6.c
        net/ipv4/netfilter/nf_tables_arp.c
        net/bridge/netfilter/nf_tables_bridge.c
      * the NAT table (IPv4 only):
        net/ipv4/netfilter/nf_table_nat_ipv4.c
      * the route table (similar to mangle):
        net/ipv4/netfilter/nf_table_route_ipv4.c
        net/ipv6/netfilter/nf_table_route_ipv6.c
      * internal definitions under:
        include/net/netfilter/nf_tables.h
        include/net/netfilter/nf_tables_core.h
      * It also includes a skeleton expression:
        net/netfilter/nft_expr_template.c
        and the preliminary implementation of the meta target
        net/netfilter/nft_meta_target.c
      
      It also changes struct nf_hook_ops by adding a new pointer that stores
      private data for the hook; this is used to store the rule list of each
      chain.
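The benefit of a private-data pointer is that a single generic hook function can serve many registrations, each finding its own chain through its own ops structure. A minimal sketch, with hypothetical `sketch_hook_ops`, `sketch_chain` and `count_rules_hook()` types standing in for the real kernel ones:

```c
#include <assert.h>
#include <stddef.h>

struct sketch_rule {
    int id;
    struct sketch_rule *next;
};

struct sketch_chain {
    struct sketch_rule *rules;  /* the per-chain rule list */
};

/* Hypothetical hook descriptor: priv points at the chain this hook serves. */
struct sketch_hook_ops {
    unsigned int (*hook)(const struct sketch_hook_ops *ops, const void *pkt);
    void *priv;
};

/* One shared hook function; it only learns which chain to walk from ops->priv. */
static unsigned int count_rules_hook(const struct sketch_hook_ops *ops, const void *pkt)
{
    const struct sketch_chain *chain = ops->priv;
    unsigned int n = 0;

    (void)pkt; /* a real hook would evaluate each rule against the packet */
    for (const struct sketch_rule *r = chain->rules; r != NULL; r = r->next)
        n++;
    return n;
}
```

Registering the same function twice with different `priv` values lets, say, an input chain and an output chain each see only their own rules.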
      
      This patch is based on the patch from Patrick McHardy, plus the merged,
      accumulated cleanups, fixes and small enhancements to the nftables
      code that have been made since 2009, which are:
      
      From Patrick McHardy:
      * nf_tables: adjust netlink handler function signatures
      * nf_tables: only retry table lookup after successful table module load
      * nf_tables: fix event notification echo and avoid unnecessary messages
      * nft_ct: add l3proto support
      * nf_tables: pass expression context to nft_validate_data_load()
      * nf_tables: remove redundant definition
      * nft_ct: fix maxattr initialization
      * nf_tables: fix invalid event type in nf_tables_getrule()
      * nf_tables: simplify nft_data_init() usage
      * nf_tables: build in more core modules
      * nf_tables: fix double lookup expression unregistration
      * nf_tables: move expression initialization to nf_tables_core.c
      * nf_tables: build in payload module
      * nf_tables: use NFPROTO constants
      * nf_tables: rename pid variables to portid
      * nf_tables: save 48 bits per rule
      * nf_tables: introduce chain rename
      * nf_tables: check for duplicate names on chain rename
      * nf_tables: remove ability to specify handles for new rules
      * nf_tables: return error for rule change request
      * nf_tables: return error for NLM_F_REPLACE without rule handle
      * nf_tables: include NLM_F_APPEND/NLM_F_REPLACE flags in rule notification
      * nf_tables: fix NLM_F_MULTI usage in netlink notifications
      * nf_tables: include NLM_F_APPEND in rule dumps
      
      From Pablo Neira Ayuso:
      * nf_tables: fix stack overflow in nf_tables_newrule
      * nf_tables: nft_ct: fix compilation warning
      * nf_tables: nft_ct: fix crash with invalid packets
      * nft_log: group and qthreshold are 2^16
      * nf_tables: nft_meta: fix socket uid,gid handling
      * nft_counter: allow to restore counters
      * nf_tables: fix module autoload
      * nf_tables: allow to remove all rules placed in one chain
      * nf_tables: use 64-bits rule handle instead of 16-bits
      * nf_tables: fix chain after rule deletion
      * nf_tables: improve deletion performance
      * nf_tables: add missing code in route chain type
      * nf_tables: raise maximum number of expressions from 12 to 128
      * nf_tables: don't delete table if in use
      * nf_tables: fix basechain release
      
      From Tomasz Bursztyka:
      * nf_tables: Add support for changing users chain's name
      * nf_tables: Change chain's name to be fixed sized
      * nf_tables: Add support for replacing a rule by another one
      * nf_tables: Update uapi nftables netlink header documentation
      
      From Florian Westphal:
      * nft_log: group is u16, snaplen u32
      
      From Phil Oester:
      * nf_tables: operational limit match
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  21. 11 Oct 2013, 1 commit
  22. 09 Oct 2013, 1 commit
  23. 08 Oct 2013, 1 commit
  24. 06 Oct 2013, 1 commit
    • D
      misc: mic: Enable OSPM suspend and resume support. · af190494
      Dasaratharaman Chandramouli committed
      This patch enables support for OSPM suspend and resume in the MIC
      driver. During a host suspend event, the driver performs an
      orderly shutdown of the cards if they are online. Upon resume, any
      cards that were online before suspend are rebooted.
      The driver performs an orderly shutdown of the card primarily to
      ensure that applications on the card are terminated and mounted
      devices are safely unmounted before the card is powered down in
      the event of an OSPM suspend.
      
      The driver makes use of the MIC daemon to accomplish OSPM suspend
      and resume. The driver registers a PM notifier per MIC device.
      The devices get notified synchronously during the PM_SUSPEND_PREPARE
      and PM_POST_SUSPEND phases.
      
      During the PM_SUSPEND_PREPARE phase, the driver performs one of the
      following three tasks:
      1) If the card is 'offline', the driver sets the card to a
         'suspended' state and returns.
      2) If the card is 'online', the driver initiates card shutdown by
         setting the card state to 'suspending'. This notifies the MIC
         daemon, which invokes shutdown and sets the card state to
         'suspended'. The driver returns after the shutdown is complete.
      3) If the card is already being shut down, possibly by a host user
         space application, the driver sets the card state to 'suspended'
         and returns after the shutdown is complete.
      
      During the PM_POST_SUSPEND phase, the driver simply notifies the
      daemon and returns. The daemon then boots the cards that were online
      before the suspend phase.
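The three PM_SUSPEND_PREPARE cases and the resume decision can be summarized as a small state function. This is a hypothetical sketch: the state names, `suspend_prepare()` and `should_boot_after_resume()` are illustrative rather than the driver's identifiers, and the actual waiting for the daemon's shutdown is left to the caller.

```c
#include <assert.h>

/* Hypothetical card states; the real driver's names may differ. */
enum card_state {
    CARD_OFFLINE,
    CARD_ONLINE,
    CARD_SHUTTING_DOWN,  /* shutdown already started, e.g. by a host application */
    CARD_SUSPENDING,     /* tells the daemon to perform an orderly shutdown */
    CARD_SUSPENDED
};

enum pm_action { PM_RETURN_NOW, PM_WAIT_FOR_SHUTDOWN };

struct card {
    enum card_state state;
    int was_online;      /* remembered so the daemon reboots it after resume */
};

/* PM_SUSPEND_PREPARE: returns whether the driver must wait for shutdown to finish. */
static enum pm_action suspend_prepare(struct card *c)
{
    switch (c->state) {
    case CARD_OFFLINE:                /* case 1: nothing to shut down */
        c->state = CARD_SUSPENDED;
        return PM_RETURN_NOW;
    case CARD_ONLINE:                 /* case 2: daemon shuts down, then sets SUSPENDED */
        c->was_online = 1;
        c->state = CARD_SUSPENDING;
        return PM_WAIT_FOR_SHUTDOWN;
    case CARD_SHUTTING_DOWN:          /* case 3: mark suspended, wait for the shutdown */
        c->state = CARD_SUSPENDED;
        return PM_WAIT_FOR_SHUTDOWN;
    default:
        return PM_RETURN_NOW;
    }
}

/* PM_POST_SUSPEND: the daemon boots only cards that were online before suspend. */
static int should_boot_after_resume(const struct card *c)
{
    return c->was_online;
}
```

Keeping the waiting outside the state function mirrors the split described above: the driver only changes state, and the daemon does the actual shutdown and reboot work.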
      Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
      Signed-off-by: Nikhil Rao <nikhil.rao@intel.com>
      Signed-off-by: Harshavardhan R Kharche <harshavardhan.r.kharche@intel.com>
      Signed-off-by: Sudeep Dutt <sudeep.dutt@intel.com>
      Signed-off-by: Dasaratharaman Chandramouli <dasaratharaman.chandramouli@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  25. 04 10月, 2013 1 次提交
    • A
      perf: Add generic transaction flags · fdfbbd07
      Andi Kleen committed
      Add a generic qualifier for transaction events, as a new sample
      type that returns a flag word. This is particularly useful
      for qualifying aborts: it distinguishes aborts that happen
      due to asynchronous events (like conflicts caused by another
      CPU) from aborts caused by instructions within the transaction.
      
      The tuning strategies are very different for those cases,
      so it's important to distinguish them easily and early.
      
      Since it's inconvenient and inflexible to filter for this
      in the kernel, we report all the events and leave
      post-processing to user space.
      
      The flags are based on the Intel TSX events, but should be fairly
      generic and mostly applicable to other HTM architectures too. In addition
      to various flag words, there's also reserved space to report a
      program-supplied abort code. For TSX this is used to distinguish specific
      classes of aborts, like a lock-busy abort when doing lock elision.
      
      Flags:
      
      Elision and generic transactions           (ELISION vs TRANSACTION)
        (HLE vs RTM on TSX; IBM etc. would likely only use TRANSACTION)
      Aborts caused by current thread vs others  (SYNC vs ASYNC)
      Retryable transaction                      (RETRY)
      Conflicts with other threads               (CONFLICT)
      Transaction write capacity overflow        (CAPACITY-WRITE)
      Transaction read capacity overflow         (CAPACITY-READ)
      
      Transactions that are explicitly aborted can also return an abort code.
      This can be used to signal specific events to the profiler. A common
      case is an abort on lock busy in an RTM eliding library (code 0xff).
      To handle this case, we include the TSX abort code.
      
      Common examples of aborts in TSX are:
      
      - Data conflict with another thread on memory read.
                                            Flags: TRANSACTION|ASYNC|CONFLICT
      - Executing a WRMSR in a transaction. Flags: TRANSACTION|SYNC
      - HLE transaction in user space is too large.
                                            Flags: ELISION|SYNC|CAPACITY-WRITE
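A user-space consumer could decode the flag word roughly as follows. The bit values below are assumed to mirror the `PERF_TXN_*` definitions in `<linux/perf_event.h>` (flags in the low bits, abort code in bits 32 to 63); they are redefined locally under hypothetical `TXN_*` names, so check them against your installed headers before relying on them.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the PERF_TXN_* layout in <linux/perf_event.h>; verify locally. */
#define TXN_ELISION        (1ULL << 0)
#define TXN_TRANSACTION    (1ULL << 1)
#define TXN_SYNC           (1ULL << 2)
#define TXN_ASYNC          (1ULL << 3)
#define TXN_RETRY          (1ULL << 4)
#define TXN_CONFLICT       (1ULL << 5)
#define TXN_CAPACITY_WRITE (1ULL << 6)
#define TXN_CAPACITY_READ  (1ULL << 7)
#define TXN_ABORT_SHIFT    32
#define TXN_ABORT_MASK     (0xffffffffULL << TXN_ABORT_SHIFT)

/* Pack flag bits and a program-supplied abort code into one 64-bit word. */
static uint64_t txn_make(uint64_t flags, uint32_t abort_code)
{
    return flags | ((uint64_t)abort_code << TXN_ABORT_SHIFT);
}

/* Extract the abort code stored in the upper 32 bits. */
static uint32_t txn_abort_code(uint64_t txn)
{
    return (uint32_t)((txn & TXN_ABORT_MASK) >> TXN_ABORT_SHIFT);
}

/* Post-processing example: was this abort a conflict caused by another thread? */
static int txn_is_async_conflict(uint64_t txn)
{
    const uint64_t want = TXN_ASYNC | TXN_CONFLICT;

    return (txn & want) == want;
}
```

For instance, the read-conflict example above encodes as `TRANSACTION|ASYNC|CONFLICT` with a zero abort code, and `txn_is_async_conflict()` returns true for it, while the WRMSR abort (`TRANSACTION|SYNC`) is classified as self-inflicted.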
      
      The only flag that is somewhat TSX specific is ELISION.
      
      This adds the perf core glue needed for reporting the new flag word out.
      
      v2: Add MEM/MISC
      v3: Move transaction to the end
      v4: Separate capacity-read/write and remove misc
      v5: Remove _SAMPLE. Move abort flags to 32bit. Rename
          transaction to txn
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1379688044-14173-2-git-send-email-andi@firstfloor.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>