1. 19 3月, 2015 5 次提交
  2. 18 3月, 2015 5 次提交
    • D
      act_bpf: allow non-default TC_ACT opcodes as BPF exec outcome · ced585c8
      Daniel Borkmann 提交于
      Revisiting commit d23b8ad8 ("tc: add BPF based action") with regards
      to eBPF support, I was thinking that it might be better to improve
      return semantics from a BPF program invoked through BPF_PROG_RUN().
      
      Currently, in case filter_res is 0, we overwrite the default action
      opcode with TC_ACT_SHOT. A default action opcode configured through tc's
      m_bpf can be: TC_ACT_RECLASSIFY, TC_ACT_PIPE, TC_ACT_SHOT, TC_ACT_UNSPEC,
      TC_ACT_OK.
      
      In cls_bpf, we have the possibility to overwrite the default class
      associated with the classifier in case filter_res is _not_ 0xffffffff
      (-1).
      
      That allows us to fold multiple [e]BPF programs into a single one, where
      they would otherwise need to be defined as a separate classifier with
      its own classid, needlessly redoing parsing work, etc.
      
      Similarly, we could do better in act_bpf: Since above TC_ACT* opcodes
      are exported to UAPI anyway, we reuse them for return-code-to-tc-opcode
      mapping, where we would allow above possibilities. Thus, like in cls_bpf,
      a filter_res of 0xffffffff (-1) means that the configured _default_ action
      is used. Any unkown return code from the BPF program would fail in
      tcf_bpf() with TC_ACT_UNSPEC.
      
      Should we one day want to make use of TC_ACT_STOLEN or TC_ACT_QUEUED,
      which both have the same semantics, we have the option to either use
      that as a default action (filter_res of 0xffffffff) or non-default BPF
      return code.
      
      All that will allow us to transparently use tcf_bpf() for both BPF
      flavours.
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Acked-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ced585c8
    • R
      Revert "smc91x: retrieve IRQ and trigger flags in a modern way" · 8d7d9cca
      Robert Jarzmik 提交于
      The commit breaks the legacy platforms, ie. these not using device-tree,
      and setting up the interrupt resources with a flag to activate edge
      detection. The issue was found on the zylonite platform.
      
      The reason is that zylonite uses platform resources to pass the interrupt number
      and the irq flags (here IORESOURCE_IRQ_HIGHEDGE). It expects the driver to
      request the irq with these flags, which in turn setups the irq as high edge
      triggered.
      
      After the patch, this was supposed to be taken care of with :
        irq_resflags = irqd_get_trigger_type(irq_get_irq_data(ndev->irq));
      
      But irq_resflags is 0 for legacy platforms, while for example in
      arch/arm/mach-pxa/zylonite.c, in struct resource smc91x_resources[] the
      irq flag is specified. This breaks zylonite because the interrupt is not
      setup as triggered, and hardware doesn't provide interrupts.
      Signed-off-by: NRobert Jarzmik <robert.jarzmik@free.fr>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8d7d9cca
    • E
      inet: Clean up inet_csk_wait_for_connect() vs. might_sleep() · cb7cf8a3
      Eric Dumazet 提交于
      I got the following trace with current net-next kernel :
      
      [14723.885290] WARNING: CPU: 26 PID: 22658 at kernel/sched/core.c:7285 __might_sleep+0x89/0xa0()
      [14723.885325] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff810e8734>] prepare_to_wait_exclusive+0x34/0xa0
      [14723.885355] CPU: 26 PID: 22658 Comm: netserver Not tainted 4.0.0-dbg-DEV #1379
      [14723.885359]  ffffffff81a223a8 ffff881fae9e7ca8 ffffffff81650b5d 0000000000000001
      [14723.885364]  ffff881fae9e7cf8 ffff881fae9e7ce8 ffffffff810a72e7 0000000000000000
      [14723.885367]  ffffffff81a57620 000000000000093a 0000000000000000 ffff881fae9e7e64
      [14723.885371] Call Trace:
      [14723.885377]  [<ffffffff81650b5d>] dump_stack+0x4c/0x65
      [14723.885382]  [<ffffffff810a72e7>] warn_slowpath_common+0x97/0xe0
      [14723.885386]  [<ffffffff810a73e6>] warn_slowpath_fmt+0x46/0x50
      [14723.885390]  [<ffffffff810f4c5d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
      [14723.885393]  [<ffffffff810e8734>] ? prepare_to_wait_exclusive+0x34/0xa0
      [14723.885396]  [<ffffffff810e8734>] ? prepare_to_wait_exclusive+0x34/0xa0
      [14723.885399]  [<ffffffff810ccdc9>] __might_sleep+0x89/0xa0
      [14723.885403]  [<ffffffff81581846>] lock_sock_nested+0x36/0xb0
      [14723.885406]  [<ffffffff815829a3>] ? release_sock+0x173/0x1c0
      [14723.885411]  [<ffffffff815ea1f7>] inet_csk_accept+0x157/0x2a0
      [14723.885415]  [<ffffffff810e8900>] ? abort_exclusive_wait+0xc0/0xc0
      [14723.885419]  [<ffffffff8161b96d>] inet_accept+0x2d/0x150
      [14723.885424]  [<ffffffff8157db6f>] SYSC_accept4+0xff/0x210
      [14723.885428]  [<ffffffff8165a451>] ? retint_swapgs+0xe/0x44
      [14723.885431]  [<ffffffff810f4c5d>] ? trace_hardirqs_on_caller+0x10d/0x1d0
      [14723.885437]  [<ffffffff81369c0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [14723.885441]  [<ffffffff8157ef40>] SyS_accept+0x10/0x20
      [14723.885444]  [<ffffffff81659872>] system_call_fastpath+0x12/0x17
      [14723.885447] ---[ end trace ff74cd83355b1873 ]---
      
      In commit 26cabd31
      Peter added a sched_annotate_sleep() in sk_wait_event()
      
      Is the following patch needed as well ?
      
      Alternative would be to use sk_wait_event() from inet_csk_wait_for_connect()
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb7cf8a3
    • N
      ip6_tunnel: fix error code when tunnel exists · 37355565
      Nicolas Dichtel 提交于
      After commit 2b0bb01b, the kernel returns -ENOBUFS when user tries to add
      an existing tunnel with ioctl API:
      $ ip -6 tunnel add ip6tnl1 mode ip6ip6 dev eth1
      add tunnel "ip6tnl0" failed: No buffer space available
      
      It's confusing, the right error is EEXIST.
      
      This patch also change a bit the code returned:
       - ENOBUFS -> ENOMEM
       - ENOENT -> ENODEV
      
      Fixes: 2b0bb01b ("ip6_tunnel: Return an error when adding an existing tunnel.")
      CC: Steffen Klassert <steffen.klassert@secunet.com>
      Reported-by: NPierre Cheynier <me@pierre-cheynier.net>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      37355565
    • N
      netdevice.h: fix ndo_bridge_* comments · ad41faa8
      Nicolas Dichtel 提交于
      The argument 'flags' was missing in ndo_bridge_setlink().
      ndo_bridge_dellink() was missing.
      
      Fixes: 407af329 ("bridge: Add netlink interface to configure vlans on bridge ports")
      Fixes: add511b3 ("bridge: add flags argument to ndo_bridge_setlink and ndo_bridge_dellink")
      CC: Vlad Yasevich <vyasevic@redhat.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad41faa8
  3. 17 3月, 2015 3 次提交
    • M
      bnx2x: fix encapsulation features on 57710/57711 · a8e0c246
      Michal Schmidt 提交于
      E1x chips (57710, 57711(E)) have no support for encapsulation
      offload. bnx2x incorrectly advertises the support as available.
      
      Setting of those features is conditional on "!CHIP_IS_E1x(bp)", but
      the bp struct is not initialized yet at this point and consequently
      any chip passes the check.
      The check must use the "chip_is_e1x" local variable instead to work
      correctly.
      Signed-off-by: NMichal Schmidt <mschmidt@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8e0c246
    • D
      Merge tag 'mac80211-for-davem-2015-03-16' of... · 48b810d9
      David S. Miller 提交于
      Merge tag 'mac80211-for-davem-2015-03-16' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
      
      Johannes Berg says:
      
      ====================
      Here are a few fixes that I'd like to still get in:
       * disable U-APSD for better interoperability, from Michal Kazior
       * drop unencrypted frames in mesh forwarding, from Bob Copeland
       * treat non-QoS/WMM HT stations as non-HT, to fix confusion when
         they connect and then get QoS packets anyway due to HT
       * fix counting interfaces for combination checks, otherwise the
         interface combinations aren't properly enforced (from Andrei)
       * fix pure ECSA by reacting to the IE change
       * ignore erroneous (E)CSA to the current channel which sometimes
         happens due to AP/GO bugs
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      48b810d9
    • D
      Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · ca00942a
      David S. Miller 提交于
      Steffen Klassert says:
      
      ====================
      pull request (net): ipsec 2015-03-16
      
      1) Fix the network header offset in _decode_session6
         when multiple IPv6 extension headers are present.
         From Hajime Tazaki.
      
      2) Fix an interfamily tunnel crash. We set outer mode
         protocol too early and may dispatch to the wrong
         address family. Move the setting of the outer mode
         protocol behind the last accessing of the inner mode
         to fix the crash.
      
      3) Most callers of xfrm_lookup() expect that dst_orig
         is released on error. But xfrm_lookup_route() may
         need dst_orig to handle certain error cases. So
         introduce a flag that tells what should be done in
         case of error. From Huaibin Wang.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca00942a
  4. 16 3月, 2015 7 次提交
  5. 15 3月, 2015 2 次提交
  6. 14 3月, 2015 6 次提交
  7. 13 3月, 2015 1 次提交
    • J
      virtio-net: correctly delete napi hash · ab3971b1
      Jason Wang 提交于
      We don't delete napi from hash list during module exit. This will
      cause the following panic when doing module load and unload:
      
      BUG: unable to handle kernel paging request at 0000004e00000075
      IP: [<ffffffff816bd01b>] napi_hash_add+0x6b/0xf0
      PGD 3c5d5067 PUD 0
      Oops: 0000 [#1] SMP
      ...
      Call Trace:
      [<ffffffffa0a5bfb7>] init_vqs+0x107/0x490 [virtio_net]
      [<ffffffffa0a5c9f2>] virtnet_probe+0x562/0x791815639d880be [virtio_net]
      [<ffffffff8139e667>] virtio_dev_probe+0x137/0x200
      [<ffffffff814c7f2a>] driver_probe_device+0x7a/0x250
      [<ffffffff814c81d3>] __driver_attach+0x93/0xa0
      [<ffffffff814c8140>] ? __device_attach+0x40/0x40
      [<ffffffff814c6053>] bus_for_each_dev+0x63/0xa0
      [<ffffffff814c7a79>] driver_attach+0x19/0x20
      [<ffffffff814c76f0>] bus_add_driver+0x170/0x220
      [<ffffffffa0a60000>] ? 0xffffffffa0a60000
      [<ffffffff814c894f>] driver_register+0x5f/0xf0
      [<ffffffff8139e41b>] register_virtio_driver+0x1b/0x30
      [<ffffffffa0a60010>] virtio_net_driver_init+0x10/0x12 [virtio_net]
      
      This patch fixes this by doing this in virtnet_free_queues(). And also
      don't delete napi in virtnet_freeze() since it will call
      virtnet_free_queues() which has already did this.
      
      Fixes 91815639 ("virtio-net: rx busy polling support")
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ab3971b1
  8. 12 3月, 2015 8 次提交
    • A
      rds: avoid potential stack overflow · f862e07c
      Arnd Bergmann 提交于
      The rds_iw_update_cm_id function stores a large 'struct rds_sock' object
      on the stack in order to pass a pair of addresses. This happens to just
      fit withint the 1024 byte stack size warning limit on x86, but just
      exceed that limit on ARM, which gives us this warning:
      
      net/rds/iw_rdma.c:200:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      
      As the use of this large variable is basically bogus, we can rearrange
      the code to not do that. Instead of passing an rds socket into
      rds_iw_get_device, we now just pass the two addresses that we have
      available in rds_iw_update_cm_id, and we change rds_iw_get_mr accordingly,
      to create two address structures on the stack there.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f862e07c
    • W
      sock: fix possible NULL sk dereference in __skb_tstamp_tx · 3a8dd971
      Willem de Bruijn 提交于
      Test that sk != NULL before reading sk->sk_tsflags.
      
      Fixes: 49ca0d8b ("net-timestamp: no-payload option")
      Reported-by: NOne Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3a8dd971
    • E
      xps: must clear sender_cpu before forwarding · c29390c6
      Eric Dumazet 提交于
      John reported that my previous commit added a regression
      on his router.
      
      This is because sender_cpu & napi_id share a common location,
      so get_xps_queue() can see garbage and perform an out of bound access.
      
      We need to make sure sender_cpu is cleared before doing the transmit,
      otherwise any NIC busy poll enabled (skb_mark_napi_id()) can trigger
      this bug.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NJohn <jw@nuclearfallout.net>
      Bisected-by: NJohn <jw@nuclearfallout.net>
      Fixes: 2bd82484 ("xps: fix xps for stacked devices")
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c29390c6
    • D
      xen-netback: notify immediately after pushing Tx response. · c8a4d299
      David Vrabel 提交于
      This fixes a performance regression introduced by
      7fbb9d84 (xen-netback: release pending
      index before pushing Tx responses)
      
      Moving the notify outside of the spin locks means it can be delayed a
      long time (if the dealloc thread is descheduled or there is an
      interrupt or softirq).
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      Reviewed-by: NZoltan Kiss <zoltan.kiss@linaro.org>
      Acked-by: NWei Liu <wei.liu2@citrix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c8a4d299
    • A
      net: sysctl_net_core: check SNDBUF and RCVBUF for min length · b1cb59cf
      Alexey Kodanev 提交于
      sysctl has sysctl.net.core.rmem_*/wmem_* parameters which can be
      set to incorrect values. Given that 'struct sk_buff' allocates from
      rcvbuf, incorrectly set buffer length could result to memory
      allocation failures. For example, set them as follows:
      
          # sysctl net.core.rmem_default=64
            net.core.wmem_default = 64
          # sysctl net.core.wmem_default=64
            net.core.wmem_default = 64
          # ping localhost -s 1024 -i 0 > /dev/null
      
      This could result to the following failure:
      
      skbuff: skb_over_panic: text:ffffffff81628db4 len:-32 put:-32
      head:ffff88003a1cc200 data:ffff88003a1cc200 tail:0xffffffe0 end:0xc0 dev:<NULL>
      kernel BUG at net/core/skbuff.c:102!
      invalid opcode: 0000 [#1] SMP
      ...
      task: ffff88003b7f5550 ti: ffff88003ae88000 task.ti: ffff88003ae88000
      RIP: 0010:[<ffffffff8155fbd1>]  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
      RSP: 0018:ffff88003ae8bc68  EFLAGS: 00010296
      RAX: 000000000000008d RBX: 00000000ffffffe0 RCX: 0000000000000000
      RDX: ffff88003fdcf598 RSI: ffff88003fdcd9c8 RDI: ffff88003fdcd9c8
      RBP: ffff88003ae8bc88 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000001 R11: 00000000000002b2 R12: 0000000000000000
      R13: 0000000000000000 R14: ffff88003d3f7300 R15: ffff88000012a900
      FS:  00007fa0e2b4a840(0000) GS:ffff88003fc00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000d0f7e0 CR3: 000000003b8fb000 CR4: 00000000000006f0
      Stack:
       ffff88003a1cc200 00000000ffffffe0 00000000000000c0 ffffffff818cab1d
       ffff88003ae8bd68 ffffffff81628db4 ffff88003ae8bd48 ffff88003b7f5550
       ffff880031a09408 ffff88003b7f5550 ffff88000012aa48 ffff88000012ab00
      Call Trace:
       [<ffffffff81628db4>] unix_stream_sendmsg+0x2c4/0x470
       [<ffffffff81556f56>] sock_write_iter+0x146/0x160
       [<ffffffff811d9612>] new_sync_write+0x92/0xd0
       [<ffffffff811d9cd6>] vfs_write+0xd6/0x180
       [<ffffffff811da499>] SyS_write+0x59/0xd0
       [<ffffffff81651532>] system_call_fastpath+0x12/0x17
      Code: 00 00 48 89 44 24 10 8b 87 c8 00 00 00 48 89 44 24 08 48 8b 87 d8 00
            00 00 48 c7 c7 30 db 91 81 48 89 04 24 31 c0 e8 4f a8 0e 00 <0f> 0b
            eb fe 66 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 48 83
      RIP  [<ffffffff8155fbd1>] skb_put+0xa1/0xb0
      RSP <ffff88003ae8bc68>
      Kernel panic - not syncing: Fatal exception
      
      Moreover, the possible minimum is 1, so we can get another kernel panic:
      ...
      BUG: unable to handle kernel paging request at ffff88013caee5c0
      IP: [<ffffffff815604cf>] __alloc_skb+0x12f/0x1f0
      ...
      Signed-off-by: NAlexey Kodanev <alexey.kodanev@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1cb59cf
    • N
      tcp: restore 1.5x per RTT limit to CUBIC cwnd growth in congestion avoidance · d578e18c
      Neal Cardwell 提交于
      Commit 814d488c ("tcp: fix the timid additive increase on stretch
      ACKs") fixed a bug where tcp_cong_avoid_ai() would either credit a
      connection with an increase of snd_cwnd_cnt, or increase snd_cwnd, but
      not both, resulting in cwnd increasing by 1 packet on at most every
      alternate invocation of tcp_cong_avoid_ai().
      
      Although the commit correctly implemented the CUBIC algorithm, which
      can increase cwnd by as much as 1 packet per 1 packet ACKed (2x per
      RTT), in practice that could be too aggressive: in tests on network
      paths with small buffers, YouTube server retransmission rates nearly
      doubled.
      
      This commit restores CUBIC to a maximum cwnd growth rate of 1 packet
      per 2 packets ACKed (1.5x per RTT). In YouTube tests this restored
      retransmit rates to low levels.
      
      Testing: This patch has been tested in datacenter netperf transfers
      and live youtube.com and google.com servers.
      
      Fixes: 9cd981dc ("tcp: fix stretch ACK bugs in CUBIC")
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d578e18c
    • N
      tcp: fix tcp_cong_avoid_ai() credit accumulation bug with decreases in w · 9949afa4
      Neal Cardwell 提交于
      The recent change to tcp_cong_avoid_ai() to handle stretch ACKs
      introduced a bug where snd_cwnd_cnt could accumulate a very large
      value while w was large, and then if w was reduced snd_cwnd could be
      incremented by a large delta, leading to a large burst and high packet
      loss. This was tickled when CUBIC's bictcp_update() sets "ca->cnt =
      100 * cwnd".
      
      This bug crept in while preparing the upstream version of
      814d488c.
      
      Testing: This patch has been tested in datacenter netperf transfers
      and live youtube.com and google.com servers.
      
      Fixes: 814d488c ("tcp: fix the timid additive increase on stretch ACKs")
      Signed-off-by: NNeal Cardwell <ncardwell@google.com>
      Signed-off-by: NYuchung Cheng <ycheng@google.com>
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9949afa4
    • C
      MAINTAINERS: Update my email address · 366c1bd1
      chas williams - CONTRACTOR 提交于
      Changed to my private email address.
      Signed-off-by: NChas Williams -- CONTRACTOR <chas@cmf.nrl.navy.mil>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      366c1bd1
  9. 11 3月, 2015 3 次提交