1. 07 Jan, 2014 7 commits
  2. 06 Jan, 2014 1 commit
  3. 05 Jan, 2014 6 commits
  4. 04 Jan, 2014 14 commits
    • J
      pci_regs.h: Add PCI bus link speed and width defines · 55fdbfe7
      Jeff Kirsher committed
      Add missing PCI bus link speed 8.0 GT/s and bus link widths of
      x1, x2, x4 and x8.
      
      CC: <linux-kernel@vger.kernel.org>
      CC: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      55fdbfe7
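To illustrate how link speed and width values of this kind are typically consumed, here is a minimal sketch that decodes a PCIe Link Status register value. It assumes the field layout from the PCIe specification (current link speed in bits 3:0, negotiated link width in bits 9:4). The `EX_`-prefixed macro and function names are local to this sketch and are not claimed to be the names added by the commit.

```c
#include <stdint.h>

/* Link Status register fields (layout per the PCIe specification).
 * All EX_ names are hypothetical and exist only in this sketch. */
#define EX_LNKSTA_CLS        0x000f  /* Current Link Speed, bits 3:0 */
#define EX_LNKSTA_CLS_2_5GB  0x0001
#define EX_LNKSTA_CLS_5_0GB  0x0002
#define EX_LNKSTA_CLS_8_0GB  0x0003
#define EX_LNKSTA_NLW        0x03f0  /* Negotiated Link Width, bits 9:4 */
#define EX_LNKSTA_NLW_SHIFT  4
#define EX_LNKSTA_NLW_X1     0x0010
#define EX_LNKSTA_NLW_X2     0x0020
#define EX_LNKSTA_NLW_X4     0x0040
#define EX_LNKSTA_NLW_X8     0x0080

/* Return the link speed in tenths of GT/s (25, 50, 80), or 0 if the
 * encoding is not one this sketch knows about. */
static inline unsigned int ex_link_speed_tenths(uint16_t lnksta)
{
    switch (lnksta & EX_LNKSTA_CLS) {
    case EX_LNKSTA_CLS_2_5GB: return 25;
    case EX_LNKSTA_CLS_5_0GB: return 50;
    case EX_LNKSTA_CLS_8_0GB: return 80;
    default:                  return 0;
    }
}

/* Return the negotiated number of lanes (1, 2, 4, 8, ...). */
static inline unsigned int ex_link_width(uint16_t lnksta)
{
    return (lnksta & EX_LNKSTA_NLW) >> EX_LNKSTA_NLW_SHIFT;
}
```

A driver-style caller would read the 16-bit register from the device's PCI Express capability and pass it to both helpers; named width constants such as `EX_LNKSTA_NLW_X8` let it compare the masked field directly instead of shifting.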
    • S
      bonding: add ad_info attribute netlink support · 4ee7ac75
      sfeldma@cumulusnetworks.com committed
      Add nested IFLA_BOND_AD_INFO for bonding 802.3ad info.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ee7ac75
    • S
      bonding: add ad_select attribute netlink support · ec029fac
      sfeldma@cumulusnetworks.com committed
      Add IFLA_BOND_AD_SELECT to allow get/set of bonding parameter
      ad_select via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ec029fac
    • S
      bonding: add lacp_rate attribute netlink support · 998e40bb
      sfeldma@cumulusnetworks.com committed
      Add IFLA_BOND_AD_LACP_RATE to allow get/set of bonding parameter
      lacp_rate via netlink.
      Signed-off-by: Scott Feldman <sfeldma@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      998e40bb
    • S
      llc: make lock static · 5e419e68
      stephen hemminger committed
      The llc_sap_list_lock does not need to be global; it is only
      acquired in core.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      5e419e68
    • S
      socket: cleanups · 8f09898b
      stephen hemminger committed
      Namespace-related cleanups:
      
       * make cred_to_ucred static
       * remove unused sock_rmalloc function
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8f09898b
    • T
      ipv4: Use percpu Cache route in IP tunnels · 9a4aa9af
      Tom Herbert committed
      A percpu route cache eliminates sharing of the dst refcnt between CPUs.
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9a4aa9af
    • T
      ipv4: Cache dst in tunnels · 7d442fab
      Tom Herbert committed
      Avoid doing a route lookup on every packet being tunneled.
      
      In ip_tunnel.c, cache the route returned from ip_route_output if
      the tunnel is "connected", meaning that all the routing parameters
      for a packet are taken from the tunnel parms. Specifically, the
      tunnel is not an NBMA tunnel and the tos comes from the tunnel
      parms (not the inner packet).
      Signed-off-by: Tom Herbert <therbert@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7d442fab
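The "connected" condition described in this commit can be sketched as a small predicate. This is only an illustration under stated assumptions: the struct, its field names, and both functions are hypothetical simplifications, not the definitions in ip_tunnel.c, and cache invalidation is not modeled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified tunnel state for this sketch only. */
struct ex_tunnel {
    uint32_t    daddr;        /* configured remote address; 0 means NBMA */
    bool        tos_inherit;  /* true if TOS is copied from the inner packet */
    const void *cached_route; /* opaque cached route, NULL if none */
};

/* A tunnel is "connected" when every routing input comes from the tunnel
 * parameters: it has a fixed remote endpoint (not NBMA) and its TOS does
 * not depend on the inner packet. */
static inline bool ex_tunnel_is_connected(const struct ex_tunnel *t)
{
    if (t->daddr == 0)
        return false;   /* NBMA: destination varies per packet */
    if (t->tos_inherit)
        return false;   /* TOS varies per packet */
    return true;
}

/* Return the cached route if it may be reused for this tunnel, else NULL
 * (the caller would then fall back to a full route lookup). */
static inline const void *ex_tunnel_cached_route(const struct ex_tunnel *t)
{
    return ex_tunnel_is_connected(t) ? t->cached_route : NULL;
}
```

The point of the predicate is that a cached route is only safe to reuse when nothing about the lookup depends on the packet being sent.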
    • D
      netfilter: x_tables: lightweight process control group matching · 82a37132
      Daniel Borkmann committed
      It would be useful e.g. in a server or desktop environment to have
      a facility in the notion of fine-grained "per application" or "per
      application group" firewall policies. Probably, users in the mobile,
      embedded area (e.g. Android based) with different security policy
      requirements for application groups could have great benefit from
      that as well. For example, with a little bit of configuration effort,
      an admin could whitelist well-known applications, and thus block
      otherwise unwanted "hard-to-track" applications like [1] from a
      user's machine. Blocking is just one example; netfilter allows
      many other scenarios/policies, e.g. fine-grained settings for
      where applications are allowed to connect/send traffic to,
      application traffic marking/conntracking, application-specific
      packet mangling, and so on.
      
      Implementation of PID-based matching would not be appropriate
      as they frequently change, and child tracking would make that
      even more complex and ugly. Cgroups would be a perfect candidate
      for accomplishing that as they associate a set of tasks with a
      set of parameters for one or more subsystems, in our case the
      netfilter subsystem, which, of course, can be combined with other
      cgroup subsystems into something more complex if needed.
      
      As mentioned, to overcome this constraint, such processes could
      be placed into one or multiple cgroups where different fine-grained
      rules can be defined depending on the application scenario, while
      e.g. everything else that is not part of that could be dropped (or
      vice versa), thus making life harder for unwanted processes to
      communicate to the outside world. So, we make use of cgroups here
      to track jobs and limit their resources in terms of iptables
      policies; in other words, limiting, tracking, etc what they are
      allowed to communicate.
      
      In our case we match outgoing traffic based on the local socket
      it originated from. Also, one doesn't even need to have
      a priori knowledge of the application internals regarding their
      particular use of ports or protocols. Matching is *extremely*
      lightweight as we just test for the sk_classid marker of sockets,
      originating from net_cls. net_cls and netfilter do not contradict
      each other; in fact, each construct can live as standalone or they
      can be used in combination with each other, which is perfectly fine,
      plus it serves Tejun's requirement to not introduce a new cgroups
      subsystem. The result is a very minimal and efficient module that
      adds nothing except netfilter code.
      
      One possible, minimal usage example (many other iptables options
      can be applied obviously):
      
       1) Configuring cgroups if not already done, e.g.:
      
        mkdir /sys/fs/cgroup/net_cls
        mount -t cgroup -o net_cls net_cls /sys/fs/cgroup/net_cls
        mkdir /sys/fs/cgroup/net_cls/0
        echo 1 > /sys/fs/cgroup/net_cls/0/net_cls.classid
        (resp. a real flow handle id for tc)
      
       2) Configuring netfilter (iptables-nftables), e.g.:
      
        iptables -A OUTPUT -m cgroup ! --cgroup 1 -j DROP
      
       3) Running applications, e.g.:
      
        ping 208.67.222.222  <pid:1799>
        echo 1799 > /sys/fs/cgroup/net_cls/0/tasks
        64 bytes from 208.67.222.222: icmp_seq=44 ttl=49 time=11.9 ms
        [...]
        ping 208.67.220.220  <pid:1804>
        ping: sendmsg: Operation not permitted
        [...]
        echo 1804 > /sys/fs/cgroup/net_cls/0/tasks
        64 bytes from 208.67.220.220: icmp_seq=89 ttl=56 time=19.0 ms
        [...]
      
      Of course, real-world deployments would make use of cgroups user
      space toolsuite, or own custom policy daemons dynamically moving
      applications from/to various cgroups.
      
        [1] http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-biondi/bh-eu-06-biondi-up.pdf
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: cgroups@vger.kernel.org
      Acked-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      82a37132
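The matching step described in this commit (testing the socket's sk_classid marker) can be sketched as follows. This is a simplified illustration, not the module's code: the two structs and the function name are hypothetical, and the choice to never match packets without a local socket is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the socket and the rule; only the fields
 * this sketch needs are present. */
struct ex_sock {
    uint32_t sk_classid;  /* classid inherited from the task's net_cls cgroup */
};

struct ex_cgroup_rule {
    uint32_t id;      /* classid given with --cgroup */
    bool     invert;  /* true when the rule was written as "! --cgroup" */
};

/* Return true if the packet's originating socket matches the rule.
 * Assumption: a packet with no local socket never matches. */
static inline bool ex_cgroup_match(const struct ex_sock *sk,
                                   const struct ex_cgroup_rule *rule)
{
    if (sk == NULL)
        return false;
    /* One integer comparison, flipped when the rule is inverted. */
    return (rule->id == sk->sk_classid) != rule->invert;
}
```

With the rule from the usage example (`! --cgroup 1 -j DROP`), a socket whose classid is 1 does not match and is therefore not dropped, while a socket outside that cgroup matches and is dropped.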
    • D
      net: netprio: rename config to be more consistent with cgroup configs · 86f8515f
      Daniel Borkmann committed
      While we're at it, having introduced CGROUP_NET_CLASSID, let's also
      make NETPRIO_CGROUP more consistent with the rest of cgroups and
      rename it to CONFIG_CGROUP_NET_PRIO so that for networking, we now
      have CONFIG_CGROUP_NET_{PRIO,CLASSID}. This not only makes the CONFIG
      option consistent among networking cgroups, but also among cgroups
      CONFIG conventions in general as the vast majority has a prefix of
      CONFIG_CGROUP_<SUBSYS>.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: cgroups@vger.kernel.org
      Acked-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      86f8515f
    • D
      net: net_cls: move cgroupfs classid handling into core · fe1217c4
      Daniel Borkmann committed
      Zefan Li requested [1] to perform the following cleanup/refactoring:
      
      - Split cgroupfs classid handling into net core to better express a
        possible more generic use.
      
      - Disable module support for cgroupfs bits, as the majority of other
        cgroupfs subsystems do not have that and it seems not to be wanted
        from the cgroup side. Zefan might want to follow up for netprio
        later on.
      
      - With this, code that previously handled the case of being
        compiled as a module can be removed.
      
      cgroupfs bits are being placed under net/core/netclassid_cgroup.c, so
      that we are consistent with {netclassid,netprio}_cgroup naming that is
      under net/core/ as suggested by Zefan.
      
      No change in functionality, but only code refactoring that is being
      done here.
      
       [1] http://patchwork.ozlabs.org/patch/304825/
      Suggested-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: cgroups@vger.kernel.org
      Acked-by: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      fe1217c4
    • S
      netfilter: nf_conntrack: remove dead code · dcd93ed4
      stephen hemminger committed
      The following code is not used in current upstream code.
      Some of it seems to be old hooks, some might be used by an
      out-of-tree module (which I don't care about breaking), and
      need_ipv4_conntrack was used by old NAT code but is no longer
      called.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      dcd93ed4
    • S
      netfilter: ipset: remove unused code · 02eca9d2
      stephen hemminger committed
      Function never used in current upstream code.
      Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      02eca9d2
    • D
      netfilter: nf_nat: add full port randomization support · 34ce3240
      Daniel Borkmann committed
      We currently use prandom_u32() for allocation of ports in tcp bind(0)
      and udp code. In case of plain SNAT we try to keep the ports as is
      or increment on collision.
      
      SNAT --random mode does use per-destination incrementing port
      allocation. A recent paper [1] pointed out that this mode of
      port allocation makes it possible for an off-path attacker to find
      the randomly allocated ports through a timing side-channel in a
      socket overloading attack.
      
      So, NF_NAT_RANGE_PROTO_RANDOM actually weakens the port randomization
      in regard to the attack described in this paper. As we need to keep
      compatibility, add another flag called NF_NAT_RANGE_PROTO_RANDOM_FULLY
      that would replace the NF_NAT_RANGE_PROTO_RANDOM hash-based port
      selection algorithm with a simple prandom_u32() in order to mitigate
      this attack vector. Note that the lfsr113's internal state is
      periodically reseeded by the kernel through a local secure entropy
      source.
      
      More details can be found in [1], the basic idea is to send bursts
      of packets to a socket to overflow its receive queue and measure
      the latency to detect a possible retransmit when the port is found.
      Because ports to a given destination and port are allocated
      incrementally, further allocations can be predicted. This information
      could then be used by an attacker for e.g. cache-poisoning, NS
      pinning, and degradation of service attacks against DNS servers [1]:
      
        The best defense against the poisoning attacks is to properly
        deploy and validate DNSSEC; DNSSEC provides security not only
        against off-path attacker but even against MitM attacker. We hope
        that our results will help motivate administrators to adopt DNSSEC.
        However, full DNSSEC deployment may take significant time, and
        until that happens, we recommend short-term, non-cryptographic
        defenses. We recommend to support full port randomisation,
        according to practices recommended in [2], and to avoid
        per-destination sequential port allocation, which we show may be
        vulnerable to derandomisation attacks.
      
      Joint work between Hannes Frederic Sowa and Daniel Borkmann.
      
       [1] https://sites.google.com/site/hayashulman/files/NIC-derandomisation.pdf
       [2] http://arxiv.org/pdf/1205.5190v1.pdf
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      34ce3240
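The difference between the two allocation modes discussed in this commit can be sketched in user-space C. This is a toy model under loud assumptions: `rand()` stands in for the kernel's prandom_u32(), `ex_toy_hash()` is a made-up mixing function standing in for the keyed per-destination hash, and all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

enum ex_mode {
    EX_HASH_PER_DEST,  /* models the per-destination, incrementing mode */
    EX_RANDOM_FULLY    /* models the fully random mode */
};

/* Toy stand-in for a keyed hash over the destination (NOT the kernel's). */
static inline uint32_t ex_toy_hash(uint32_t daddr, uint16_t dport,
                                   uint32_t secret)
{
    uint32_t h = daddr ^ ((uint32_t)dport << 16) ^ secret;
    h ^= h >> 16;
    h *= 0x45d9f3bU;
    h ^= h >> 16;
    return h;
}

/* Pick a source port in [min, max] (requires min <= max). Returns the
 * port, or -1 if every port in the range is reported as in use.
 * in_use may be NULL, meaning no port is taken. */
static inline int ex_pick_port(enum ex_mode mode, uint16_t min, uint16_t max,
                               uint32_t daddr, uint16_t dport, uint32_t secret,
                               bool (*in_use)(uint16_t port))
{
    uint32_t range = (uint32_t)max - min + 1;
    uint32_t off;

    if (mode == EX_RANDOM_FULLY)
        off = (uint32_t)rand();  /* fresh random start for every allocation */
    else
        off = ex_toy_hash(daddr, dport, secret);  /* same start per destination */

    /* On collision, step to the next port; in the hash mode this is what
     * makes successive allocations to one destination predictable. */
    for (uint32_t i = 0; i < range; i++, off++) {
        uint16_t port = (uint16_t)(min + off % range);
        if (in_use == NULL || !in_use(port))
            return port;
    }
    return -1;
}
```

In the hash-based mode the starting point is fixed for a given destination, so an observer who learns one port can guess the next; in the fully random mode each allocation starts from an unrelated offset.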
  5. 03 Jan, 2014 2 commits
    • W
      ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC · 7a7ffbab
      Wei-Chun Chao committed
      VM to VM GSO traffic is broken if it goes through VXLAN or GRE
      tunnel and the physical NIC on the host supports hardware VXLAN/GRE
      GSO offload (e.g. bnx2x and next-gen mlx4).
      
      Two issues -
      (VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with
      SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header
      integrity check fails in udp4_ufo_fragment if inner protocol is
      TCP. Also gso_segs is calculated incorrectly using skb->len that
      includes tunnel header. Fix: robust check should only be applied
      to the inner packet.
      
      (VXLAN & GRE) Once GSO header integrity check passes, NULL segs
      is returned and the original skb is sent to hardware. However the
      tunnel header is already pulled. Fix: tunnel header needs to be
      restored so that hardware can perform GSO properly on the original
      packet.
      Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a7ffbab
    • V
      sctp: Remove outqueue empty state · 619a60ee
      Vlad Yasevich committed
      The SCTP outqueue structure maintains a list of data chunks
      that are pending transmission, the list of chunks that
      are pending retransmission, and the length of data in
      flight.  It also tries to keep the empty state so that
      it can perform the shutdown sequence or notify the user.
      
      The problem is that the empty state is inconsistently
      tracked.  It is possible to completely drain the queue
      without sending anything when using PR-SCTP.  In this
      case, the empty state will not be correctly set, as
      reported by Jamal Hadi Salim <jhs@mojatatu.com>.  This
      can cause an association to be permanently stuck in the
      SHUTDOWN_PENDING state.
      
      Additionally, SCTP is incredibly inefficient when setting
      the empty state.  Even though all the data is available
      in the outqueue structure, we ignore it and walk a list
      of transports.
      
      In the end, we can completely remove the extra empty
      state and figure out if the queue is empty by looking
      at 3 things:  length of pending data, length of in-flight
      data, and existence of retransmit data.  All of these
      are already in the structure.
      Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      619a60ee
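The three-part emptiness check described in this commit can be sketched as a derived predicate instead of a stored flag. The struct and its field names are hypothetical simplifications for this sketch, and the retransmit list is modeled as a simple count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified outqueue for this sketch only. */
struct ex_outq {
    size_t pending_bytes;     /* data queued but not yet transmitted */
    size_t in_flight_bytes;   /* data sent and not yet acknowledged */
    size_t retransmit_chunks; /* chunks waiting to be retransmitted */
};

/* The queue is empty exactly when nothing is pending, nothing is in
 * flight, and nothing awaits retransmission. Because the answer is
 * computed from the counters on demand, it cannot go stale the way a
 * separately maintained "empty" flag can. */
static inline bool ex_outq_is_empty(const struct ex_outq *q)
{
    return q->pending_bytes == 0 &&
           q->in_flight_bytes == 0 &&
           q->retransmit_chunks == 0;
}
```

Deriving the state this way also covers the case where the queue is drained without anything being sent, since the counters reach zero regardless of how the data left the queue.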
  6. 02 Jan, 2014 8 commits
  7. 01 Jan, 2014 2 commits
    • D
      vlan: Fix header ops passthru when doing TX VLAN offload. · 2205369a
      David S. Miller committed
      When the vlan code detects that the real device can do TX VLAN offloads
      in hardware, it tries to arrange for the real device's header_ops to
      be invoked directly.
      
      But it does so illegally, by simply hooking the real device's
      header_ops up to the VLAN device.
      
      This doesn't work because we will end up invoking a set of header_ops
      routines which expect a device type which matches the real device, but
      will see a VLAN device instead.
      
      Fix this by providing a pass-thru set of header_ops which will arrange
      to pass the proper real device instead.
      
      To facilitate this, add a dev_rebuild_header().  There are
      implementations which provide a ->cache and ->create but not a
      ->rebuild (e.g. PLIP).  So we need a helper function just like
      dev_hard_header() to avoid crashes.
      
      Use this helper in the one existing place where the
      header_ops->rebuild was being invoked, the neighbour code.
      
      With lots of help from Florian Westphal.
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2205369a
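The two ideas in this commit, a helper that tolerates a missing ->rebuild and a pass-through op that hands the real device to the real device's ops, can be sketched in user-space C. Every type and function below is a hypothetical stand-in for illustration, and returning 0 when no rebuild op exists is an assumption of this sketch.

```c
#include <stddef.h>

struct ex_device;

/* Hypothetical packet: only tracks which device it is being sent on. */
struct ex_packet {
    struct ex_device *dev;
};

/* Hypothetical header ops; either pointer may be NULL. */
struct ex_header_ops {
    int (*create)(struct ex_packet *pkt);
    int (*rebuild)(struct ex_packet *pkt);
};

struct ex_device {
    const struct ex_header_ops *header_ops;
    struct ex_device *real_dev;  /* set on a VLAN device, NULL otherwise */
};

/* Guarded helper: call ->rebuild only if the device provides one, so a
 * device with ->create but no ->rebuild does not cause a NULL call. */
static inline int ex_dev_rebuild_header(struct ex_packet *pkt)
{
    const struct ex_device *dev = pkt->dev;

    if (dev->header_ops == NULL || dev->header_ops->rebuild == NULL)
        return 0;
    return dev->header_ops->rebuild(pkt);
}

/* Pass-through op for the VLAN device: point the packet at the real
 * device first, so the real device's ops see the device type they
 * expect, then go through the guarded helper. */
static inline int ex_vlan_passthru_rebuild(struct ex_packet *pkt)
{
    pkt->dev = pkt->dev->real_dev;
    return ex_dev_rebuild_header(pkt);
}
```

The contrast with the broken approach is that the VLAN device gets its own small ops table whose entries forward to the real device, instead of borrowing the real device's table and invoking it with the wrong device.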
    • Z
      net, rps: fix build failure when CONFIG_RPS isn't set · c9d8ca04
      Zhi Yong Wu committed
      In file included from net/socket.c:99:0:
      include/net/sock.h: In function ‘sock_rps_record_flow’:
      include/net/sock.h:849:30: error: ‘const struct sock’ has no member named ‘sk_rxhash’
      include/net/sock.h: In function ‘sock_rps_reset_flow’:
      include/net/sock.h:854:29: error: ‘const struct sock’ has no member named ‘sk_rxhash’
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Signed-off-by: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c9d8ca04
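The class of build failure shown in this commit arises when a struct member exists only under a config option but is accessed unconditionally. A minimal sketch of the usual remedy follows, assuming a hypothetical `EX_CONFIG_RPS` macro and a cut-down struct; the commit's actual change is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical, cut-down socket. The hash field exists only when the
 * (hypothetical) EX_CONFIG_RPS option is defined at build time. */
struct ex_sock {
    int      sk_state;
#ifdef EX_CONFIG_RPS
    uint32_t sk_rxhash;
#endif
};

/* Every access to the conditional member sits under the same guard as
 * its declaration, so the function compiles in both configurations. */
static inline void ex_sock_rps_reset_flow(struct ex_sock *sk)
{
#ifdef EX_CONFIG_RPS
    sk->sk_rxhash = 0;
#else
    (void)sk;  /* nothing to reset when the option is disabled */
#endif
}
```

Building once with `-DEX_CONFIG_RPS` and once without exercises both branches, which is the kind of check that catches this error before it reaches a build robot.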