1. 03 8月, 2014 1 次提交
    • A
      net: filter: split 'struct sk_filter' into socket and bpf parts · 7ae457c1
      Alexei Starovoitov 提交于
      clean up names related to socket filtering and bpf in the following way:
      - everything that deals with sockets keeps 'sk_*' prefix
      - everything that is pure BPF is changed to 'bpf_*' prefix
      
      split 'struct sk_filter' into
      struct sk_filter {
      	atomic_t        refcnt;
      	struct rcu_head rcu;
      	struct bpf_prog *prog;
      };
      and
      struct bpf_prog {
              u32                     jited:1,
                                      len:31;
              struct sock_fprog_kern  *orig_prog;
              unsigned int            (*bpf_func)(const struct sk_buff *skb,
                                                  const struct bpf_insn *filter);
              union {
                      struct sock_filter      insns[0];
                      struct bpf_insn         insnsi[0];
                      struct work_struct      work;
              };
      };
      so that 'struct bpf_prog' can be used independent of sockets and cleans up
      'unattached' bpf use cases
      
      split SK_RUN_FILTER macro into:
          SK_RUN_FILTER to be used with 'struct sk_filter *' and
          BPF_PROG_RUN to be used with 'struct bpf_prog *'
      
      __sk_filter_release(struct sk_filter *) gains
      __bpf_prog_release(struct bpf_prog *) helper function
      
      also perform related renames for the functions that work
      with 'struct bpf_prog *', since they're on the same lines:
      
      sk_filter_size -> bpf_prog_size
      sk_filter_select_runtime -> bpf_prog_select_runtime
      sk_filter_free -> bpf_prog_free
      sk_unattached_filter_create -> bpf_prog_create
      sk_unattached_filter_destroy -> bpf_prog_destroy
      sk_store_orig_filter -> bpf_prog_store_orig_filter
      sk_release_orig_filter -> bpf_release_orig_filter
      __sk_migrate_filter -> bpf_migrate_filter
      __sk_prepare_filter -> bpf_prepare_filter
      
      API for attaching classic BPF to a socket stays the same:
      sk_attach_filter(prog, struct sock *)/sk_detach_filter(struct sock *)
      and SK_RUN_FILTER(struct sk_filter *, ctx) to execute a program
      which is used by sockets, tun, af_packet
      
      API for 'unattached' BPF programs becomes:
      bpf_prog_create(struct bpf_prog **)/bpf_prog_destroy(struct bpf_prog *)
      and BPF_PROG_RUN(struct bpf_prog *, ctx) to execute a program
      which is used by isdn, ppp, team, seccomp, ptp, xt_bpf, cls_bpf, test_bpf
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ae457c1
  2. 25 7月, 2014 1 次提交
  3. 21 7月, 2014 2 次提交
    • C
      net_sched: avoid generating same handle for u32 filters · 7801db8a
      Cong Wang 提交于
      When kernel generates a handle for a u32 filter, it tries to start
      from the max in the bucket. So when we have a filter with the max (fff)
      handle, it will cause kernel always generates the same handle for new
      filters. This can be shown by the following command:
      
      	tc qdisc add dev eth0 ingress
      	tc filter add dev eth0 parent ffff: protocol ip pref 770 handle 800::fff u32 match ip protocol 1 0xff
      	tc filter add dev eth0 parent ffff: protocol ip pref 770 u32 match ip protocol 1 0xff
      	...
      
      we will get some u32 filters with same handle:
      
       # tc filter show dev eth0 parent ffff:
      filter protocol ip pref 770 u32
      filter protocol ip pref 770 u32 fh 800: ht divisor 1
      filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
        match 00010000/00ff0000 at 8
      filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
        match 00010000/00ff0000 at 8
      filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
        match 00010000/00ff0000 at 8
      filter protocol ip pref 770 u32 fh 800::fff order 4095 key ht 800 bkt 0
        match 00010000/00ff0000 at 8
      
      handles should be unique. This patch fixes it by looking up a bitmap,
      so that can guarantee the handle is as unique as possible. For compatibility,
      we still start from 0x800.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7801db8a
    • C
      net_sched: hold tcf_lock in netdevice notifier · 224e923c
      Cong Wang 提交于
      We modify mirred action (m->tcfm_dev) in netdev event, we need to
      prevent on-going mirred actions from reading freed m->tcfm_dev.
      So we need to acquire this spin lock.
      
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      224e923c
  4. 18 7月, 2014 1 次提交
  5. 16 7月, 2014 1 次提交
    • T
      net: set name_assign_type in alloc_netdev() · c835a677
      Tom Gundersen 提交于
      Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
      all users to pass NET_NAME_UNKNOWN.
      
      Coccinelle patch:
      
      @@
      expression sizeof_priv, name, setup, txqs, rxqs, count;
      @@
      
      (
      -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
      +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
      |
      -alloc_netdev_mq(sizeof_priv, name, setup, count)
      +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
      |
      -alloc_netdev(sizeof_priv, name, setup)
      +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
      )
      
      v9: move comments here from the wrong commit
      Signed-off-by: NTom Gundersen <teg@jklm.no>
      Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c835a677
  6. 02 7月, 2014 1 次提交
  7. 22 6月, 2014 1 次提交
  8. 12 6月, 2014 1 次提交
    • F
      net_sched: drr: warn when qdisc is not work conserving · 6e765a00
      Florian Westphal 提交于
      The DRR scheduler requires that items on the active list are work
      conserving, i.e. do not hold on to skbs for throttling purposes, etc.
      Attaching e.g. tbf renders DRR useless because all other classes on the
      active list are delayed as well.
      
      So, warn users that this configuration won't work as expected; we
      already do this in couple of other qdiscs, see e.g.
      
      commit b00355db
      ('pkt_sched: sch_hfsc: sch_htb: Add non-work-conserving warning handler')
      
      The 'const' change is needed to avoid compiler warning ("discards 'const'
      qualifier from pointer target type").
      
      tested with:
      drr_hier() {
              parent=$1
              classes=$2
              for i in  $(seq 1 $classes); do
                      classid=$parent$(printf %x $i)
                      tc class add dev eth0 parent $parent classid $classid drr
      		tc qdisc add dev eth0 parent $classid tbf rate 64kbit burst 256kbit limit 64kbit
              done
      }
      tc qdisc add dev eth0 root handle 1: drr
      drr_hier 1: 32
      tc filter add dev eth0 protocol all pref 1 parent 1: handle 1 flow hash keys dst perturb 1 divisor 32
      Signed-off-by: NFlorian Westphal <fw@strlen.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e765a00
  9. 05 6月, 2014 1 次提交
  10. 24 5月, 2014 1 次提交
    • D
      net: filter: let unattached filters use sock_fprog_kern · b1fcd35c
      Daniel Borkmann 提交于
      The sk_unattached_filter_create() API is used by BPF filters that
      are not directly attached or related to sockets, and are used in
      team, ptp, xt_bpf, cls_bpf, etc. As such all users do their own
      internal managment of obtaining filter blocks and thus already
      have them in kernel memory and set up before calling into
      sk_unattached_filter_create(). As a result, due to __user annotation
      in sock_fprog, sparse triggers false positives (incorrect type in
      assignment [different address space]) when filters are set up before
      passing them to sk_unattached_filter_create(). Therefore, let
      sk_unattached_filter_create() API use sock_fprog_kern to overcome
      this issue.
      Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1fcd35c
  11. 22 5月, 2014 1 次提交
    • C
      net_sched: fix an oops in tcindex filter · bf63ac73
      Cong Wang 提交于
      Kelly reported the following crash:
      
              IP: [<ffffffff817a993d>] tcf_action_exec+0x46/0x90
              PGD 3009067 PUD 300c067 PMD 11ff30067 PTE 800000011634b060
              Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
              CPU: 1 PID: 639 Comm: dhclient Not tainted 3.15.0-rc4+ #342
              Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
              task: ffff8801169ecd00 ti: ffff8800d21b8000 task.ti: ffff8800d21b8000
              RIP: 0010:[<ffffffff817a993d>]  [<ffffffff817a993d>] tcf_action_exec+0x46/0x90
              RSP: 0018:ffff8800d21b9b90  EFLAGS: 00010283
              RAX: 00000000ffffffff RBX: ffff88011634b8e8 RCX: ffff8800cf7133d8
              RDX: ffff88011634b900 RSI: ffff8800cf7133e0 RDI: ffff8800d210f840
              RBP: ffff8800d21b9bb0 R08: ffffffff8287bf60 R09: 0000000000000001
              R10: ffff8800d2b22b24 R11: 0000000000000001 R12: ffff8800d210f840
              R13: ffff8800d21b9c50 R14: ffff8800cf7133e0 R15: ffff8800cad433d8
              FS:  00007f49723e1840(0000) GS:ffff88011a800000(0000) knlGS:0000000000000000
              CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
              CR2: ffff88011634b8f0 CR3: 00000000ce469000 CR4: 00000000000006e0
              Stack:
               ffff8800d2170188 ffff8800d210f840 ffff8800d2171b90 0000000000000000
               ffff8800d21b9be8 ffffffff817c55bb ffff8800d21b9c50 ffff8800d2171b90
               ffff8800d210f840 ffff8800d21b0300 ffff8800d21b9c50 ffff8800d21b9c18
              Call Trace:
               [<ffffffff817c55bb>] tcindex_classify+0x88/0x9b
               [<ffffffff817a7f7d>] tc_classify_compat+0x3e/0x7b
               [<ffffffff817a7fdf>] tc_classify+0x25/0x9f
               [<ffffffff817b0e68>] htb_enqueue+0x55/0x27a
               [<ffffffff817b6c2e>] dsmark_enqueue+0x165/0x1a4
               [<ffffffff81775642>] __dev_queue_xmit+0x35e/0x536
               [<ffffffff8177582a>] dev_queue_xmit+0x10/0x12
               [<ffffffff818f8ecd>] packet_sendmsg+0xb26/0xb9a
               [<ffffffff810b1507>] ? __lock_acquire+0x3ae/0xdf3
               [<ffffffff8175cf08>] __sock_sendmsg_nosec+0x25/0x27
               [<ffffffff8175d916>] sock_aio_write+0xd0/0xe7
               [<ffffffff8117d6b8>] do_sync_write+0x59/0x78
               [<ffffffff8117d84d>] vfs_write+0xb5/0x10a
               [<ffffffff8117d96a>] SyS_write+0x49/0x7f
               [<ffffffff8198e212>] system_call_fastpath+0x16/0x1b
      
      This is because we memcpy struct tcindex_filter_result which contains
      struct tcf_exts, obviously struct list_head can not be simply copied.
      This is a regression introduced by commit 33be6271
      (net_sched: act: use standard struct list_head).
      
      It's not very easy to fix it as the code is a mess:
      
             if (old_r)
                     memcpy(&cr, r, sizeof(cr));
             else {
                     memset(&cr, 0, sizeof(cr));
                     tcf_exts_init(&cr.exts, TCA_TCINDEX_ACT, TCA_TCINDEX_POLICE);
             }
             ...
             tcf_exts_change(tp, &cr.exts, &e);
             ...
             memcpy(r, &cr, sizeof(cr));
      
      the above code should equal to:
      
              tcindex_filter_result_init(&cr);
              if (old_r)
                     cr.res = r->res;
              ...
              if (old_r)
                     tcf_exts_change(tp, &r->exts, &e);
              else
                     tcf_exts_change(tp, &cr.exts, &e);
              ...
              r->res = cr.res;
      
      after this change, since there is no need to copy struct tcf_exts.
      
      And it also fixes other places zero'ing struct's contains struct tcf_exts.
      
      Fixes: commit 33be6271 (net_sched: act: use standard struct list_head)
      Reported-by: NKelly Anderson <kelly@xilka.com>
      Tested-by: NKelly Anderson <kelly@xilka.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bf63ac73
  12. 13 5月, 2014 1 次提交
  13. 05 5月, 2014 1 次提交
  14. 03 5月, 2014 1 次提交
  15. 28 4月, 2014 2 次提交
  16. 25 4月, 2014 1 次提交
  17. 01 4月, 2014 1 次提交
  18. 19 3月, 2014 1 次提交
    • E
      net: sched: use no more than one page in struct fw_head · d37d8ac1
      Eric Dumazet 提交于
      In commit b4e9b520 ("[NET_SCHED]: Add mask support to fwmark
      classifier") Patrick added an u32 field in fw_head, making it slightly
      bigger than one page.
      
      Lets use 256 slots to make fw_hash() more straight forward, and move
      @mask to the beginning of the structure as we often use a small number
      of skb->mark. @mask and first hash buckets share the same cache line.
      
      This brings back the memory usage to less than 4000 bytes, and permits
      John to add a rcu_head at the end of the structure later without any
      worry.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Acked-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d37d8ac1
  19. 14 3月, 2014 1 次提交
  20. 12 3月, 2014 2 次提交
  21. 11 3月, 2014 2 次提交
  22. 09 3月, 2014 1 次提交
  23. 07 3月, 2014 1 次提交
  24. 04 3月, 2014 1 次提交
  25. 28 2月, 2014 1 次提交
  26. 18 2月, 2014 1 次提交
  27. 14 2月, 2014 4 次提交
  28. 13 2月, 2014 5 次提交
  29. 27 1月, 2014 1 次提交