1. 02 7月, 2016 15 次提交
    • S
      RDS: TCP: Hooks to set up a single connection path · b04e8554
      Sowmini Varadhan 提交于
      This patch adds ->conn_path_connect callbacks in the rds_transport
      that are used to set up a single connection path.
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b04e8554
    • S
      RDS: TCP: make receive path use the rds_conn_path · 2da43c4a
      Sowmini Varadhan 提交于
      The ->sk_user_data contains a pointer to the rds_conn_path
      for the socket. Use this consistently in the rds_tcp_data_ready
      callbacks to get the rds_conn_path for rds_recv_incoming.
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2da43c4a
    • S
      RDS: TCP: make ->sk_user_data point to a rds_conn_path · ea3b1ea5
      Sowmini Varadhan 提交于
      The socket callbacks should all operate on a struct rds_conn_path,
      in preparation for a MP capable RDS-TCP.
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ea3b1ea5
    • S
      RDS: TCP: Refactor connection destruction to handle multiple paths · afb4164d
      Sowmini Varadhan 提交于
      A single rds_connection may have multiple rds_conn_paths that have
      to be carefully and correctly destroyed, for both rmmod and
      netns-delete cases.
      
      For both cases, we extract a single rds_tcp_connection for
      each conn into a temporary list, and then invoke rds_conn_destroy()
      which iteratively dismantles every path in the rds_connection.
      
      For the netns deletion case, we additionally have to make sure
      that we do not leave a socket in TIME_WAIT state, as this will
      hold up the netns deletion. Thus we call rds_tcp_conn_paths_destroy()
      to reset state quickly.
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      afb4164d
    • S
      RDS: TCP: Make rds_tcp_connection track the rds_conn_path · 02105b2c
      Sowmini Varadhan 提交于
      The struct rds_tcp_connection is the transport-specific private
      data structure that tracks TCP information per rds_conn_path.
      Modify this structure to have a back-pointer to the rds_conn_path
      for which it is the ->cp_transport_data.
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      02105b2c
    • S
      RDS: TCP: Remove dead logic around c_passive in rds-tcp · 26e4e6bb
      Sowmini Varadhan 提交于
      The c_passive bit is only intended for the IB transport and will
      never be encountered in rds-tcp, so remove the dead logic that
      predicates on this bit.
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26e4e6bb
    • S
      RDS: Rework path specific indirections · 226f7a7d
      Sowmini Varadhan 提交于
      Refactor code to avoid separate indirections for single-path
      and multipath transports. All transports (both single and mp-capable)
      will get a pointer to the rds_conn_path, and can trivially derive
      the rds_connection from the ->cp_conn.
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      226f7a7d
    • D
      Merge branch 'bpf-cgroup2' · dc9a2002
      David S. Miller 提交于
      Martin KaFai Lau says:
      
      ====================
      cgroup: bpf: cgroup2 membership test on skb
      
      This series is to implement a bpf-way to
      check the cgroup2 membership of a skb (sk_buff).
      
      It is similar to the feature added in netfilter:
      c38c4597 ("netfilter: implement xt_cgroup cgroup2 path match")
      
      The current target is the tc-like usage.
      
      v3:
      - Remove WARN_ON_ONCE(!rcu_read_lock_held())
      - Stop BPF_MAP_TYPE_CGROUP_ARRAY usage in patch 2/4
      - Avoid mounting bpf fs manually in patch 4/4
      
      - Thanks for Daniel's review and the above suggestions
      
      - Check CONFIG_SOCK_CGROUP_DATA instead of CONFIG_CGROUPS.  Thanks to
        the kbuild bot's report.
        Patch 2/4 only needs CONFIG_CGROUPS while patch 3/4 needs
        CONFIG_SOCK_CGROUP_DATA.  Since a single bpf cgrp2 array alone is
        not useful for now, CONFIG_SOCK_CGROUP_DATA is also used in
        patch 2/4.  We can fine tune it later if we find other use cases
        for the cgrp2 array.
      - Return EAGAIN instead of ENOENT if the cgrp2 array entry is
        NULL.  It is to distinguish these two cases: 1) the userland has
        not populated this array entry yet. or 2) not finding cgrp2 from the skb.
      
      - Be-lated thanks to Alexei and Tejun on reviewing v1 and giving advice on
        this work.
      
      v2:
      - Fix two return cases in cgroup_get_from_fd()
      - Fix compilation errors when CONFIG_CGROUPS is not used:
        - arraymap.c: avoid registering BPF_MAP_TYPE_CGROUP_ARRAY
        - filter.c: tc_cls_act_func_proto() returns NULL on BPF_FUNC_skb_in_cgroup
      - Add comments to BPF_FUNC_skb_in_cgroup and cgroup_get_from_fd()
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dc9a2002
    • M
      cgroup: bpf: Add an example to do cgroup checking in BPF · a3f74617
      Martin KaFai Lau 提交于
      test_cgrp2_array_pin.c:
      A userland program that creates a bpf_map (BPF_MAP_TYPE_GROUP_ARRAY),
      pouplates/updates it with a cgroup2's backed fd and pins it to a
      bpf-fs's file.  The pinned file can be loaded by tc and then used
      by the bpf prog later.  This program can also update an existing pinned
      array and it could be useful for debugging/testing purpose.
      
      test_cgrp2_tc_kern.c:
      A bpf prog which should be loaded by tc.  It is to demonstrate
      the usage of bpf_skb_in_cgroup.
      
      test_cgrp2_tc.sh:
      A script that glues the test_cgrp2_array_pin.c and
      test_cgrp2_tc_kern.c together.  The idea is like:
      1. Load the test_cgrp2_tc_kern.o by tc
      2. Use test_cgrp2_array_pin.c to populate a BPF_MAP_TYPE_CGROUP_ARRAY
         with a cgroup fd
      3. Do a 'ping -6 ff02::1%ve' to ensure the packet has been
         dropped because of a match on the cgroup
      
      Most of the lines in test_cgrp2_tc.sh is the boilerplate
      to setup the cgroup/bpf-fs/net-devices/netns...etc.  It is
      not bulletproof on errors but should work well enough and
      give enough debug info if things did not go well.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a3f74617
    • M
      cgroup: bpf: Add bpf_skb_in_cgroup_proto · 4a482f34
      Martin KaFai Lau 提交于
      Adds a bpf helper, bpf_skb_in_cgroup, to decide if a skb->sk
      belongs to a descendant of a cgroup2.  It is similar to the
      feature added in netfilter:
      commit c38c4597 ("netfilter: implement xt_cgroup cgroup2 path match")
      
      The user is expected to populate a BPF_MAP_TYPE_CGROUP_ARRAY
      which will be used by the bpf_skb_in_cgroup.
      
      Modifications to the bpf verifier is to ensure BPF_MAP_TYPE_CGROUP_ARRAY
      and bpf_skb_in_cgroup() are always used together.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4a482f34
    • M
      cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY · 4ed8ec52
      Martin KaFai Lau 提交于
      Add a BPF_MAP_TYPE_CGROUP_ARRAY and its bpf_map_ops's implementations.
      To update an element, the caller is expected to obtain a cgroup2 backed
      fd by open(cgroup2_dir) and then update the array with that fd.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4ed8ec52
    • M
      cgroup: Add cgroup_get_from_fd · 1f3fe7eb
      Martin KaFai Lau 提交于
      Add a helper function to get a cgroup2 from a fd.  It will be
      stored in a bpf array (BPF_MAP_TYPE_CGROUP_ARRAY) which will
      be introduced in the later patch.
      Signed-off-by: NMartin KaFai Lau <kafai@fb.com>
      Cc: Alexei Starovoitov <ast@fb.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Tejun Heo <tj@kernel.org>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1f3fe7eb
    • D
      Merge branch 'bpf-robustify' · 6bd3847b
      David S. Miller 提交于
      Daniel Borkmann says:
      
      ====================
      Further robustify putting BPF progs
      
      This series addresses a potential issue reported to us by Jann Horn
      with regards to putting progs. First patch moves progs generally under
      RCU destruction and second patch refactors getting of progs to simplify
      code a bit. For details, please see individual patches. Note, we think
      that addressing this one in net-next should be sufficient.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6bd3847b
    • D
      bpf: refactor bpf_prog_get and type check into helper · 113214be
      Daniel Borkmann 提交于
      Since bpf_prog_get() and program type check is used in a couple of places,
      refactor this into a small helper function that we can make use of. Since
      the non RO prog->aux part is not used in performance critical paths and a
      program destruction via RCU is rather very unlikley when doing the put, we
      shouldn't have an issue just doing the bpf_prog_get() + prog->type != type
      check, but actually not taking the ref at all (due to being in fdget() /
      fdput() section of the bpf fd) is even cleaner and makes the diff smaller
      as well, so just go for that. Callsites are changed to make use of the new
      helper where possible.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      113214be
    • D
      bpf: generally move prog destruction to RCU deferral · 1aacde3d
      Daniel Borkmann 提交于
      Jann Horn reported following analysis that could potentially result
      in a very hard to trigger (if not impossible) UAF race, to quote his
      event timeline:
      
       - Set up a process with threads T1, T2 and T3
       - Let T1 set up a socket filter F1 that invokes another filter F2
         through a BPF map [tail call]
       - Let T1 trigger the socket filter via a unix domain socket write,
         don't wait for completion
       - Let T2 call PERF_EVENT_IOC_SET_BPF with F2, don't wait for completion
       - Now T2 should be behind bpf_prog_get(), but before bpf_prog_put()
       - Let T3 close the file descriptor for F2, dropping the reference
         count of F2 to 2
       - At this point, T1 should have looked up F2 from the map, but not
         finished executing it
       - Let T3 remove F2 from the BPF map, dropping the reference count of
         F2 to 1
       - Now T2 should call bpf_prog_put() (wrong BPF program type), dropping
         the reference count of F2 to 0 and scheduling bpf_prog_free_deferred()
         via schedule_work()
       - At this point, the BPF program could be freed
       - BPF execution is still running in a freed BPF program
      
      While at PERF_EVENT_IOC_SET_BPF time it's only guaranteed that the perf
      event fd we're doing the syscall on doesn't disappear from underneath us
      for whole syscall time, it may not be the case for the bpf fd used as
      an argument only after we did the put. It needs to be a valid fd pointing
      to a BPF program at the time of the call to make the bpf_prog_get() and
      while T2 gets preempted, F2 must have dropped reference to 1 on the other
      CPU. The fput() from the close() in T3 should also add additionally delay
      to the reference drop via exit_task_work() when bpf_prog_release() gets
      called as well as scheduling bpf_prog_free_deferred().
      
      That said, it makes nevertheless sense to move the BPF prog destruction
      generally after RCU grace period to guarantee that such scenario above,
      but also others as recently fixed in ceb56070 ("bpf, perf: delay release
      of BPF prog after grace period") with regards to tail calls won't happen.
      Integrating bpf_prog_free_deferred() directly into the RCU callback is
      not allowed since the invocation might happen from either softirq or
      process context, so we're not permitted to block. Reviewing all bpf_prog_put()
      invocations from eBPF side (note, cBPF -> eBPF progs don't use this for
      their destruction) with call_rcu() look good to me.
      
      Since we don't know whether at the time of attaching the program, we're
      already part of a tail call map, we need to use RCU variant. However, due
      to this, there won't be severely more stress on the RCU callback queue:
      situations with above bpf_prog_get() and bpf_prog_put() combo in practice
      normally won't lead to releases, but even if they would, enough effort/
      cycles have to be put into loading a BPF program into the kernel already.
      Reported-by: NJann Horn <jannh@google.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Acked-by: NAlexei Starovoitov <ast@kernel.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1aacde3d
  2. 01 7月, 2016 21 次提交
  3. 30 6月, 2016 4 次提交
    • D
      Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · 435c556c
      David S. Miller 提交于
      Jeff Kirsher says:
      
      ====================
      Intel Wired LAN Driver Updates 2016-06-29
      
      This series contains updates and fixes to e1000e, igb, ixgbe and fm10k.  A
      true smorgasbord of changes.
      
      Jake cleans up some obscurity by not using the BIT() macro on bitshift
      operation and also fixed the calculated index when looping through the
      indir array.  Fixes the issue with igb's workqueue item for overflow
      check from causing a surprise remove event.  The ptp_flags variable is
      added to simplify the work of writing several complex MAC type checks
      in the PTP code while fixing the workqueue.
      
      Alex Duyck fixes the receive buffers alignment which should not be L1
      cache aligned, but to 512 bytes instead.
      
      Denys Vlasenko prevents a division by zero which was reported under
      VMWare for e1000e.
      
      Amritha fixes an issue where filters in a child hash table must be
      cleared from the hardware before delete the filter links in ixgbe.
      
      Bhaktipriya Shridhar simply replaces the deprecated create_workqueue()
      with alloc_workqueue() for fm10k.
      
      Tony corrects ixgbe ethtool reporting to show x550 supports hardware
      timestamping of all packets.
      
      Emil fixes an issue where MAC-VLANs on the VF fail to pass traffic due
      to spoofed packets.
      
      Andrew Lunn increases performance on some systems where syncing a buffer
      for DMA is expensive.  So rather than sync the whole 2K receive buffer,
      only synchronize the length of the frame.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      435c556c
    • D
      Merge branch 'nfp-next' · c435e6e0
      David S. Miller 提交于
      Jakub Kicinski says:
      
      ====================
      nfp: few code improvements
      
      Three small patches for net-next.  First and second patches
      improve the code quality by spelling things correctly and
      removing unused parameters.  Third patch hooks-in standard
      kernel implementation of .get_link() in ethtool ops.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c435e6e0
    • J
      nfp: implement ethtool .get_link() callback · 2370def2
      Jakub Kicinski 提交于
      Point the ethtool .get_link() callback to the standard
      ethtool_op_get_link() implementation.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2370def2
    • J
      nfp: remove unused parameter from nfp_net_write_mac_addr() · f642963b
      Jakub Kicinski 提交于
      nfp_net_write_mac_addr() always writes to the BAR the current
      device address taken from netdev struct.  The address given
      as parameter is actually ignored.  Since all callers pass
      netdev->dev_addr simply remove the parameter.
      
      While at it improve the function's kdoc a bit.
      Signed-off-by: NJakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f642963b
新手
引导
客服 返回
顶部