1. 05 Apr, 2018 1 commit
  2. 01 Apr, 2018 10 commits
    • inet: frags: get rid of ipfrag_skb_cb/FRAG_CB · bf663371
      Eric Dumazet committed
      ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately
      this integer is currently in a different cache line than skb->next,
      meaning that we use two cache lines per skb when finding the insertion point.
      
      By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields
      in a single cache line and save precious memory bandwidth.
      
      Note that after the fast path added by Changli Gao in commit
      d6bebca9 ("fragment: add fast path for in-order fragments")
      this change won't help the fast path, since we still need
      to access prev->len (2nd cache line), but will show great
      benefits when slow path is entered, since we perform
      a linear scan of a potentially long list.
      
      Also, note that this potentially long list is an attack vector;
      we might eventually consider using an rb-tree there as well.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
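The cache-line argument in this commit can be illustrated with a small user-space sketch. The structs below are toy stand-ins, not the real `struct sk_buff` layout; the 64-byte line size, the padding array, and every `toy_` name are assumptions made for illustration. The sketch also assumes the struct starts on a cache-line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64 /* assumed cache-line size in bytes */

/* Before: the fragment offset sits behind other fields, like skb->cb[]. */
struct toy_skb_before {
	struct toy_skb_before *next;
	struct toy_skb_before *prev;
	void *dev;
	char other[64];   /* stand-in for the fields between dev and cb[] */
	int frag_offset;  /* stand-in for the offset kept in cb[] */
};

/* After: the offset aliases dev, right next to the list pointers. */
struct toy_skb_after {
	struct toy_skb_after *next;
	struct toy_skb_after *prev;
	union {
		void *dev;
		int ip_defrag_offset;
	};
	char other[64];
};

static size_t toy_line_of(size_t byte_offset)
{
	return byte_offset / TOY_CACHE_LINE;
}

/* Cache lines touched when a scan reads both 'next' and the offset. */
static int toy_lines_before(void)
{
	size_t a = toy_line_of(offsetof(struct toy_skb_before, next));
	size_t b = toy_line_of(offsetof(struct toy_skb_before, frag_offset));
	return a == b ? 1 : 2;
}

static int toy_lines_after(void)
{
	size_t a = toy_line_of(offsetof(struct toy_skb_after, next));
	size_t b = toy_line_of(offsetof(struct toy_skb_after, ip_defrag_offset));
	return a == b ? 1 : 2;
}
```

In this toy layout, `toy_lines_before()` reports two cache lines per element and `toy_lines_after()` reports one, which is the saving the commit describes for the linear scan in the slow path.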
    • inet: frags: do not clone skb in ip_expire() · 1eec5d56
      Eric Dumazet committed
      An skb_clone() was added in commit ec4fbd64 ("inet: frag: release
      spinlock before calling icmp_send()")
      
      While fixing the bug at that time, it also added a very high cost
      for DDOS frags, as the ICMP rate limit is applied after this
      expensive operation (skb_clone() + consume_skb(), implying memory
      allocations, copy, and freeing)
      
      We can use skb_get(head) here; all we want is to make sure the skb won't
      be freed by another cpu.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
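The idea of taking a reference instead of cloning can be sketched in user space as follows. This is a toy reference-counted buffer, not the kernel's `sk_buff` API: `toy_get()` only plays the role that `skb_get()` plays in the commit, and all `toy_` names are hypothetical. The kernel uses atomic reference counts; a plain `int` is used here for brevity.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_buf {
	int refcnt;           /* plain int here; the kernel uses an atomic count */
	size_t len;
	unsigned char *data;
};

static struct toy_buf *toy_alloc(const void *src, size_t len)
{
	struct toy_buf *b = malloc(sizeof(*b));

	if (!b)
		return NULL;
	b->data = malloc(len ? len : 1);
	if (!b->data) {
		free(b);
		return NULL;
	}
	memcpy(b->data, src, len);
	b->len = len;
	b->refcnt = 1;
	return b;
}

/* Take an extra reference: no allocation, no copy. */
static struct toy_buf *toy_get(struct toy_buf *b)
{
	b->refcnt++;
	return b;
}

/* Drop one reference; returns 1 if this call freed the buffer. */
static int toy_put(struct toy_buf *b)
{
	if (--b->refcnt > 0)
		return 0;
	free(b->data);
	free(b);
	return 1;
}
```

A caller that needs the buffer to outlive another owner's release would call `toy_get()` first; the other owner's `toy_put()` then returns 0 and the data stays valid until the caller's own `toy_put()`.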
    • inet: frags: break the 2GB limit for frags storage · 3e67f106
      Eric Dumazet committed
      Some users are willing to provision huge amounts of memory to be able
      to perform reassembly reasonably well under pressure.
      
      Current memory tracking is using one atomic_t and integers.
      
      Switch to atomic_long_t so that 64bit arches can use more than 2GB,
      without any cost for 32bit arches.
      
      Note that this patch avoids an overflow error, if high_thresh was set
      to ~2GB, since this test in inet_frag_alloc() was never true :
      
      if (... || frag_mem_limit(nf) > nf->high_thresh)
      
      Tested:
      
      $ echo 16000000000 >/proc/sys/net/ipv4/ipfrag_high_thresh
      
      <frag DDOS>
      
      $ grep FRAG /proc/net/sockstat
      FRAG: inuse 14705885 memory 16000002880
      
      $ nstat -n ; sleep 1 ; nstat | grep Reas
      IpReasmReqds                    3317150            0.0
      IpReasmFails                    3317112            0.0
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
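One way to read the overflow remark above: a 32-bit signed counter can never hold a value larger than `INT_MAX`, so a "usage greater than threshold" test cannot fire once the threshold itself is at the top of the range. The sketch below shows that with plain integers; it is not the kernel code, the function names are hypothetical, and `int64_t` stands in for `atomic_long_t` on a 64-bit architecture.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Old-style check: usage and threshold are both plain int. */
static int toy_over_limit_int(int mem, int high_thresh)
{
	return mem > high_thresh;
}

/* New-style check with a 64-bit counter (stand-in for atomic_long_t). */
static int toy_over_limit_long(int64_t mem, int64_t high_thresh)
{
	return mem > high_thresh;
}
```

With `high_thresh` at `INT_MAX`, `toy_over_limit_int()` returns 0 even for the largest representable usage, while the 64-bit variant can represent both a usage just past `INT_MAX` and the 16000000000-byte threshold used in the test run above.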
    • inet: frags: remove inet_frag_maybe_warn_overflow() · 2d44ed22
      Eric Dumazet committed
      This function is obsolete after the addition of rhashtable to inet defrag.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frags: get rid of inet_frag_evicting() · 399d1404
      Eric Dumazet committed
      This refactors ip_expire() since one indentation level is removed.
      
      Note: in the future, we should try hard to avoid the skb_clone(),
      since this is a serious performance cost.
      Under DDOS, the ICMP message won't be sent because of rate limits.
      
      The fact that ip6_expire_frag_queue() does not use skb_clone() is
      disturbing too. Presumably IPv6 has the same
      issue as the one we fixed in commit ec4fbd64
      ("inet: frag: release spinlock before calling icmp_send()")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frags: remove some helpers · 6befe4a7
      Eric Dumazet committed
      Remove sum_frag_mem_limit(), ip_frag_mem() & ip6_frag_mem()
      
      Also since we use rhashtable we can bring back the number of fragments
      in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
      removed in commit 434d3054 ("inet: frag: don't account number
      of fragment queues")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frags: use rhashtables for reassembly units · 648700f7
      Eric Dumazet committed
      Some applications still rely on IP fragmentation, and to be fair, the Linux
      reassembly unit does not work under any serious load.
      
      It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
      
      A work queue is supposed to garbage-collect items when the host is under memory
      pressure, and to do a hash rebuild, changing the seed used in hash computations.
      
      This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
      occurring every 5 seconds if host is under fire.
      
      Then there is the problem of sharing this hash table for all netns.
      
      It is time to switch to rhashtables, and allocate one of them per netns
      to speed up netns dismantle, since this is a critical metric these days.
      
      Lookup is now using RCU. A followup patch will even remove
      the refcount hold/release left from prior implementation and save
      a couple of atomic operations.
      
      Before this patch, 16 cpus (16 RX queue NIC) could not handle more
      than 1 Mpps frags DDOS.
      
      After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
      of storage for the fragments (exact number depends on frags being evicted
      after timeout)
      
      $ grep FRAG /proc/net/sockstat
      FRAG: inuse 1966916 memory 2140004608
      
      A followup patch will change the limits for 64bit arches.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: Jesper Dangaard Brouer <brouer@redhat.com>
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@osg.samsung.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frags: refactor ipfrag_init() · 483a6e4f
      Eric Dumazet committed
      We need to call inet_frags_init() before register_pernet_subsys(),
      as a prerequisite for the following patch ("inet: frags: use rhashtables for reassembly units")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • inet: frags: add a pointer to struct netns_frags · 093ba729
      Eric Dumazet committed
      In order to simplify the API, add a pointer to struct inet_frags.
      This will allow us to make things less complex.
      
      These functions no longer have a struct inet_frags parameter :
      
      inet_frag_destroy(struct inet_frag_queue *q  /*, struct inet_frags *f */)
      inet_frag_put(struct inet_frag_queue *q /*, struct inet_frags *f */)
      inet_frag_kill(struct inet_frag_queue *q /*, struct inet_frags *f */)
      inet_frags_exit_net(struct netns_frags *nf /*, struct inet_frags *f */)
      ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
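The API simplification listed above follows a common pattern: store a back-pointer once so that callers no longer pass the parent object to every function. A minimal user-space sketch of that pattern, using hypothetical `toy_` types and field names rather than the real kernel structures, might look like this:

```c
#include <assert.h>

/* Stand-in for the protocol-wide descriptor (struct inet_frags in the commit). */
struct toy_frags {
	int destroyed;   /* counts destroyed queues, for illustration only */
};

/* Stand-in for the per-netns state; it now carries a pointer to its parent. */
struct toy_netns_frags {
	struct toy_frags *f;
};

/* Stand-in for one reassembly queue, which knows its per-netns state. */
struct toy_queue {
	struct toy_netns_frags *net;
};

/* The parent is reached through the stored pointers, not a second argument. */
static void toy_queue_destroy(struct toy_queue *q)
{
	q->net->f->destroyed++;
}
```

The trade-off is one extra pointer per per-netns structure in exchange for shorter signatures and no risk of a caller passing a mismatched parent.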
    • inet: frags: change inet_frags_init_net() return value · 787bea77
      Eric Dumazet committed
      We will soon initialize one rhashtable per struct netns_frags
      in inet_frags_init_net().
      
      This patch changes the return value to eventually propagate an
      error.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 28 Mar, 2018 1 commit
  4. 13 Feb, 2018 1 commit
    • net: Convert pernet_subsys, registered from inet_init() · f84c6821
      Kirill Tkhai committed
      arp_net_ops just adds/removes /proc entry.
      
      devinet_ops allocates and frees duplicate of init_net tables
      and (un)registers sysctl entries.
      
      fib_net_ops allocates and frees pernet tables, creates/destroys
      netlink socket and (un)initializes /proc entries. Foreign
      pernet_operations do not touch them.
      
      ip_rt_proc_ops only modifies pernet /proc entries.
      
      xfrm_net_ops creates/destroys /proc entries, allocates/frees
      pernet statistics, hashes and tables, and (un)initializes
      sysctl files. These are not touched by foreign pernet_operations.
      
      xfrm4_net_ops allocates/frees private pernet memory, and
      configures sysctls.
      
      sysctl_route_ops creates/destroys sysctls.
      
      rt_genid_ops only initializes fields of just allocated net.
      
      ipv4_inetpeer_ops allocates/frees net private memory.
      
      igmp_net_ops just creates/destroys /proc files and a socket
      that no one else is interested in.
      
      tcp_sk_ops seems to be safe, because tcp_sk_init() does not
      depend on any other pernet_operations modifications. Iteration
      over hash table in inet_twsk_purge() is made under RCU lock,
      and it's safe to iterate the table this way. Removing from
      the table happens from inet_twsk_deschedule_put(), but this
      function is safe without any external locks, as it's synchronized
      inside itself. There are many examples of its use in different
      contexts. So, it's safe to leave tcp_sk_exit_batch() unlocked.
      
      tcp_net_metrics_ops is synchronized on tcp_metrics_lock and safe.
      
      udplite4_net_ops only creates/destroys pernet /proc file.
      
      icmp_sk_ops creates percpu sockets, not touched by foreign
      pernet_operations.
      
      ipmr_net_ops creates/destroys pernet fib tables, (un)registers
      fib rules and /proc files. This seem to be safe to execute
      in parallel with foreign pernet_operations.
      
      af_inet_ops just sets up default parameters of newly created net.
      
      ipv4_mib_ops creates and destroys pernet percpu statistics.
      
      raw_net_ops, tcp4_net_ops, udp4_net_ops, ping_v4_net_ops
      and ip_proc_ops only create/destroy pernet /proc files.
      
      ip4_frags_ops creates and destroys sysctl file.
      
      So, it's safe to make the pernet_operations async.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 02 Nov, 2017 1 commit
    • License cleanup: add SPDX GPL-2.0 license identifier to files with no license · b2441318
      Greg Kroah-Hartman committed
      Many source files in the tree are missing licensing information, which
      makes it harder for compliance tools to determine the correct license.
      
      By default all files without license information are under the default
      license of the kernel, which is GPL version 2.
      
      Update the files which contain no license information with the 'GPL-2.0'
      SPDX license identifier.  The SPDX identifier is a legally binding
      shorthand, which can be used instead of the full boiler plate text.
      
      This patch is based on work done by Thomas Gleixner and Kate Stewart and
      Philippe Ombredanne.
      
      How this work was done:
      
      Patches were generated and checked against linux-4.14-rc6 for a subset of
      the use cases:
       - file had no licensing information in it,
       - file was a */uapi/* one with no licensing information in it,
       - file was a */uapi/* one with existing licensing information,
      
      Further patches will be generated in subsequent months to fix up cases
      where non-standard license headers were used, and references to license
      had to be inferred by heuristics based on keywords.
      
      The analysis to determine which SPDX License Identifier to be applied to
      a file was done in a spreadsheet of side-by-side results from the
      output of two independent scanners (ScanCode & Windriver) producing SPDX
      tag:value files created by Philippe Ombredanne.  Philippe prepared the
      base worksheet, and did an initial spot review of a few 1000 files.
      
      The 4.13 kernel was the starting point of the analysis with 60,537 files
      assessed.  Kate Stewart did a file by file comparison of the scanner
      results in the spreadsheet to determine which SPDX license identifier(s)
      to be applied to the file. She confirmed any determination that was not
      immediately clear with lawyers working with the Linux Foundation.
      
      Criteria used to select files for SPDX license identifier tagging was:
       - Files considered eligible had to be source code files.
       - Make and config files were included as candidates if they contained >5
         lines of source
       - File already had some variant of a license header in it (even if <5
         lines).
      
      All documentation files were explicitly excluded.
      
      The following heuristics were used to determine which SPDX license
      identifiers to apply.
      
       - when both scanners couldn't find any license traces, file was
         considered to have no license information in it, and the top level
         COPYING file license applied.
      
         For non */uapi/* files that summary was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0                                              11139
      
         and resulted in the first patch in this series.
      
         If that file was a */uapi/* path one, it was "GPL-2.0 WITH
         Linux-syscall-note" otherwise it was "GPL-2.0".  Results of that was:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|-------
         GPL-2.0 WITH Linux-syscall-note                        930
      
         and resulted in the second patch in this series.
      
       - if a file had some form of licensing information in it, and was one
         of the */uapi/* ones, it was denoted with the Linux-syscall-note if
         any GPL family license was found in the file or had no licensing in
         it (per prior point).  Results summary:
      
         SPDX license identifier                            # files
         ---------------------------------------------------|------
         GPL-2.0 WITH Linux-syscall-note                       270
         GPL-2.0+ WITH Linux-syscall-note                      169
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)    21
         ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)    17
         LGPL-2.1+ WITH Linux-syscall-note                      15
         GPL-1.0+ WITH Linux-syscall-note                       14
         ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)    5
         LGPL-2.0+ WITH Linux-syscall-note                       4
         LGPL-2.1 WITH Linux-syscall-note                        3
         ((GPL-2.0 WITH Linux-syscall-note) OR MIT)              3
         ((GPL-2.0 WITH Linux-syscall-note) AND MIT)             1
      
         and that resulted in the third patch in this series.
      
       - when the two scanners agreed on the detected license(s), that became
         the concluded license(s).
      
       - when there was disagreement between the two scanners (one detected a
         license but the other didn't, or they both detected different
         licenses) a manual inspection of the file occurred.
      
       - In most cases a manual inspection of the information in the file
         resulted in a clear resolution of the license that should apply (and
         which scanner probably needed to revisit its heuristics).
      
       - When it was not immediately clear, the license identifier was
         confirmed with lawyers working with the Linux Foundation.
      
       - If there was any question as to the appropriate license identifier,
         the file was flagged for further research and to be revisited later
         in time.
      
      In total, over 70 hours of logged manual review was done on the
      spreadsheet to determine the SPDX license identifiers to apply to the
      source files by Kate, Philippe, Thomas and, in some cases, confirmation
      by lawyers working with the Linux Foundation.
      
      Kate also obtained a third independent scan of the 4.13 code base from
      FOSSology, and compared selected files where the other two scanners
      disagreed against that SPDX file, to see if there were new insights.  The
      Windriver scanner is based on an older version of FOSSology in part, so
      they are related.
      
      Thomas did random spot checks in about 500 files from the spreadsheets
      for the uapi headers and agreed with SPDX license identifier in the
      files he inspected. For the non-uapi files Thomas did random spot checks
      in about 15000 files.
      
      In the initial set of patches against 4.14-rc6, 3 files were found to have
      copy/paste license identifier errors, and have been fixed to reflect the
      correct identifier.
      
      Additionally Philippe spent 10 hours this week doing a detailed manual
      inspection and review of the 12,461 patched files from the initial patch
      version early this week with:
       - a full scancode scan run, collecting the matched texts, detected
         license ids and scores
       - reviewing anything where there was a license detected (about 500+
         files) to ensure that the applied SPDX license was correct
       - reviewing anything where there was no detection but the patch license
         was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied
         SPDX license was correct
      
      This produced a worksheet with 20 files needing minor correction.  This
      worksheet was then exported into 3 different .csv files for the
      different types of files to be modified.
      
      These .csv files were then reviewed by Greg.  Thomas wrote a script to
      parse the csv files and add the proper SPDX tag to the file, in the
      format that the file expected.  This script was further refined by Greg
      based on the output to detect more types of files automatically and to
      distinguish between header and source .c files (which need different
      comment types.)  Finally Greg ran the script using the .csv files to
      generate the patches.
      Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
      Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  6. 18 Oct, 2017 1 commit
    • inet: frags: Convert timers to use timer_setup() · 78802011
      Kees Cook committed
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: Alexander Aring <alex.aring@gmail.com>
      Cc: Stefan Schmidt <stefan@osg.samsung.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: Pablo Neira Ayuso <pablo@netfilter.org>
      Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Cc: Florian Westphal <fw@strlen.de>
      Cc: linux-wpan@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: netfilter-devel@vger.kernel.org
      Cc: coreteam@netfilter.org
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Acked-by: Stefan Schmidt <stefan@osg.samsung.com> # for ieee802154
      Signed-off-by: David S. Miller <davem@davemloft.net>
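The conversion described above relies on recovering the enclosing object from the timer pointer handed to the callback. The user-space sketch below imitates that idea with a `container_of`-style macro; it is not the kernel timer API, and every `toy_` name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy timer: the callback receives the timer itself, not an opaque argument. */
struct toy_timer {
	void (*fn)(struct toy_timer *t);
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Toy reassembly queue with an embedded timer. */
struct toy_frag_queue {
	int expired;
	struct toy_timer timer;
};

static void toy_timer_setup(struct toy_timer *t, void (*fn)(struct toy_timer *))
{
	t->fn = fn;
}

/* Expiry callback: maps the timer back to its queue, then marks it expired. */
static void toy_expire(struct toy_timer *t)
{
	struct toy_frag_queue *q =
		toy_container_of(t, struct toy_frag_queue, timer);

	q->expired = 1;
}

/* Simulates the timer core invoking the callback. */
static void toy_timer_fire(struct toy_timer *t)
{
	t->fn(t);
}
```

Passing the timer pointer keeps the callback signature uniform across all users, at the cost of requiring the timer to be embedded in the object it belongs to.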
  7. 04 Sep, 2017 1 commit
  8. 01 Jul, 2017 1 commit
  9. 23 Mar, 2017 1 commit
    • inet: frag: release spinlock before calling icmp_send() · ec4fbd64
      Eric Dumazet committed
      Dmitry reported a lockdep splat [1] (false positive) that we can fix
      by releasing the spinlock before calling icmp_send() from ip_expire().
      
      This is a false positive because sending an ICMP message cannot
      possibly re-enter the IP frag engine.
      
      [1]
      [ INFO: possible circular locking dependency detected ]
      4.10.0+ #29 Not tainted
      -------------------------------------------------------
      modprobe/12392 is trying to acquire lock:
       (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] spin_lock
      include/linux/spinlock.h:299 [inline]
       (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>] __netif_tx_lock
      include/linux/netdevice.h:3486 [inline]
       (_xmit_ETHER#2){+.-...}, at: [<ffffffff837a8182>]
      sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
      
      but task is already holding lock:
       (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
      include/linux/spinlock.h:299 [inline]
       (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
      ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201
      
      which lock already depends on the new lock.
      
      the existing dependency chain (in reverse order) is:
      
      -> #1 (&(&q->lock)->rlock){+.-...}:
             validate_chain kernel/locking/lockdep.c:2267 [inline]
             __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
             lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
             __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
             _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
             spin_lock include/linux/spinlock.h:299 [inline]
             ip_defrag+0x3a2/0x4130 net/ipv4/ip_fragment.c:669
             ip_check_defrag+0x4e3/0x8b0 net/ipv4/ip_fragment.c:713
             packet_rcv_fanout+0x282/0x800 net/packet/af_packet.c:1459
             deliver_skb net/core/dev.c:1834 [inline]
             dev_queue_xmit_nit+0x294/0xa90 net/core/dev.c:1890
             xmit_one net/core/dev.c:2903 [inline]
             dev_hard_start_xmit+0x16b/0xab0 net/core/dev.c:2923
             sch_direct_xmit+0x31f/0x6d0 net/sched/sch_generic.c:182
             __dev_xmit_skb net/core/dev.c:3092 [inline]
             __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
             dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
             neigh_resolve_output+0x6b9/0xb10 net/core/neighbour.c:1308
             neigh_output include/net/neighbour.h:478 [inline]
             ip_finish_output2+0x8b8/0x15a0 net/ipv4/ip_output.c:228
             ip_do_fragment+0x1d93/0x2720 net/ipv4/ip_output.c:672
             ip_fragment.constprop.54+0x145/0x200 net/ipv4/ip_output.c:545
             ip_finish_output+0x82d/0xe10 net/ipv4/ip_output.c:314
             NF_HOOK_COND include/linux/netfilter.h:246 [inline]
             ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
             dst_output include/net/dst.h:486 [inline]
             ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
             ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
             ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
             raw_sendmsg+0x26de/0x3a00 net/ipv4/raw.c:655
             inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:761
             sock_sendmsg_nosec net/socket.c:633 [inline]
             sock_sendmsg+0xca/0x110 net/socket.c:643
             ___sys_sendmsg+0x4a3/0x9f0 net/socket.c:1985
             __sys_sendmmsg+0x25c/0x750 net/socket.c:2075
             SYSC_sendmmsg net/socket.c:2106 [inline]
             SyS_sendmmsg+0x35/0x60 net/socket.c:2101
             do_syscall_64+0x2e8/0x930 arch/x86/entry/common.c:281
             return_from_SYSCALL_64+0x0/0x7a
      
      -> #0 (_xmit_ETHER#2){+.-...}:
             check_prev_add kernel/locking/lockdep.c:1830 [inline]
             check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
             validate_chain kernel/locking/lockdep.c:2267 [inline]
             __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
             lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
             __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
             _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
             spin_lock include/linux/spinlock.h:299 [inline]
             __netif_tx_lock include/linux/netdevice.h:3486 [inline]
             sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
             __dev_xmit_skb net/core/dev.c:3092 [inline]
             __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
             dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
             neigh_hh_output include/net/neighbour.h:468 [inline]
             neigh_output include/net/neighbour.h:476 [inline]
             ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
             ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
             NF_HOOK_COND include/linux/netfilter.h:246 [inline]
             ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
             dst_output include/net/dst.h:486 [inline]
             ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
             ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
             ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
             icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
             icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
             ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
             call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
             expire_timers kernel/time/timer.c:1307 [inline]
             __run_timers+0x960/0xcf0 kernel/time/timer.c:1601
             run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
             __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
             invoke_softirq kernel/softirq.c:364 [inline]
             irq_exit+0x1cc/0x200 kernel/softirq.c:405
             exiting_irq arch/x86/include/asm/apic.h:657 [inline]
             smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
             apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
             __read_once_size include/linux/compiler.h:254 [inline]
             atomic_read arch/x86/include/asm/atomic.h:26 [inline]
             rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
             __rcu_is_watching kernel/rcu/tree.c:1133 [inline]
             rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
             rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
             radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
             filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
             do_fault_around mm/memory.c:3231 [inline]
             do_read_fault mm/memory.c:3265 [inline]
             do_fault+0xbd5/0x2080 mm/memory.c:3370
             handle_pte_fault mm/memory.c:3600 [inline]
             __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
             handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
             __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
             do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
             page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011
      
      other info that might help us debug this:
      
       Possible unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(&(&q->lock)->rlock);
                                     lock(_xmit_ETHER#2);
                                     lock(&(&q->lock)->rlock);
        lock(_xmit_ETHER#2);
      
       *** DEADLOCK ***
      
      10 locks held by modprobe/12392:
       #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff81329758>]
      __do_page_fault+0x2b8/0xb60 arch/x86/mm/fault.c:1336
       #1:  (rcu_read_lock){......}, at: [<ffffffff8188cab6>]
      filemap_map_pages+0x1e6/0x1570 mm/filemap.c:2324
       #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
      spin_lock include/linux/spinlock.h:299 [inline]
       #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
      pte_alloc_one_map mm/memory.c:2944 [inline]
       #2:  (&(ptlock_ptr(page))->rlock#2){+.+...}, at: [<ffffffff81984a78>]
      alloc_set_pte+0x13b8/0x1b90 mm/memory.c:3072
       #3:  (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
      lockdep_copy_map include/linux/lockdep.h:175 [inline]
       #3:  (((&q->timer))){+.-...}, at: [<ffffffff81627e72>]
      call_timer_fn+0x1c2/0x820 kernel/time/timer.c:1258
       #4:  (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>] spin_lock
      include/linux/spinlock.h:299 [inline]
       #4:  (&(&q->lock)->rlock){+.-...}, at: [<ffffffff8389a4d1>]
      ip_expire+0x51/0x6c0 net/ipv4/ip_fragment.c:201
       #5:  (rcu_read_lock){......}, at: [<ffffffff8389a633>]
      ip_expire+0x1b3/0x6c0 net/ipv4/ip_fragment.c:216
       #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] spin_trylock
      include/linux/spinlock.h:309 [inline]
       #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>] icmp_xmit_lock
      net/ipv4/icmp.c:219 [inline]
       #6:  (slock-AF_INET){+.-...}, at: [<ffffffff839b3313>]
      icmp_send+0x803/0x1c80 net/ipv4/icmp.c:681
       #7:  (rcu_read_lock_bh){......}, at: [<ffffffff838ab9a1>]
      ip_finish_output2+0x2c1/0x15a0 net/ipv4/ip_output.c:198
       #8:  (rcu_read_lock_bh){......}, at: [<ffffffff836d1dee>]
      __dev_queue_xmit+0x23e/0x1e60 net/core/dev.c:3324
       #9:  (dev->qdisc_running_key ?: &qdisc_running_key){+.....}, at:
      [<ffffffff836d3a27>] dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
      
      stack backtrace:
      CPU: 0 PID: 12392 Comm: modprobe Not tainted 4.10.0+ #29
      Hardware name: Google Google Compute Engine/Google Compute Engine,
      BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:16 [inline]
       dump_stack+0x2ee/0x3ef lib/dump_stack.c:52
       print_circular_bug+0x307/0x3b0 kernel/locking/lockdep.c:1204
       check_prev_add kernel/locking/lockdep.c:1830 [inline]
       check_prevs_add+0xa8f/0x19f0 kernel/locking/lockdep.c:1940
       validate_chain kernel/locking/lockdep.c:2267 [inline]
       __lock_acquire+0x2149/0x3430 kernel/locking/lockdep.c:3340
       lock_acquire+0x2a1/0x630 kernel/locking/lockdep.c:3755
       __raw_spin_lock include/linux/spinlock_api_smp.h:142 [inline]
       _raw_spin_lock+0x33/0x50 kernel/locking/spinlock.c:151
       spin_lock include/linux/spinlock.h:299 [inline]
       __netif_tx_lock include/linux/netdevice.h:3486 [inline]
       sch_direct_xmit+0x282/0x6d0 net/sched/sch_generic.c:180
       __dev_xmit_skb net/core/dev.c:3092 [inline]
       __dev_queue_xmit+0x13e5/0x1e60 net/core/dev.c:3358
       dev_queue_xmit+0x17/0x20 net/core/dev.c:3423
       neigh_hh_output include/net/neighbour.h:468 [inline]
       neigh_output include/net/neighbour.h:476 [inline]
       ip_finish_output2+0xf6c/0x15a0 net/ipv4/ip_output.c:228
       ip_finish_output+0xa29/0xe10 net/ipv4/ip_output.c:316
       NF_HOOK_COND include/linux/netfilter.h:246 [inline]
       ip_output+0x1f0/0x7a0 net/ipv4/ip_output.c:404
       dst_output include/net/dst.h:486 [inline]
       ip_local_out+0x95/0x170 net/ipv4/ip_output.c:124
       ip_send_skb+0x3c/0xc0 net/ipv4/ip_output.c:1492
       ip_push_pending_frames+0x64/0x80 net/ipv4/ip_output.c:1512
       icmp_push_reply+0x372/0x4d0 net/ipv4/icmp.c:394
       icmp_send+0x156c/0x1c80 net/ipv4/icmp.c:754
       ip_expire+0x40e/0x6c0 net/ipv4/ip_fragment.c:239
       call_timer_fn+0x241/0x820 kernel/time/timer.c:1268
       expire_timers kernel/time/timer.c:1307 [inline]
       __run_timers+0x960/0xcf0 kernel/time/timer.c:1601
       run_timer_softirq+0x21/0x80 kernel/time/timer.c:1614
       __do_softirq+0x31f/0xbe7 kernel/softirq.c:284
       invoke_softirq kernel/softirq.c:364 [inline]
       irq_exit+0x1cc/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:657 [inline]
       smp_apic_timer_interrupt+0x76/0xa0 arch/x86/kernel/apic/apic.c:962
       apic_timer_interrupt+0x93/0xa0 arch/x86/entry/entry_64.S:707
      RIP: 0010:__read_once_size include/linux/compiler.h:254 [inline]
      RIP: 0010:atomic_read arch/x86/include/asm/atomic.h:26 [inline]
      RIP: 0010:rcu_dynticks_curr_cpu_in_eqs kernel/rcu/tree.c:350 [inline]
      RIP: 0010:__rcu_is_watching kernel/rcu/tree.c:1133 [inline]
      RIP: 0010:rcu_is_watching+0x83/0x110 kernel/rcu/tree.c:1147
      RSP: 0000:ffff8801c391f120 EFLAGS: 00000a03 ORIG_RAX: ffffffffffffff10
      RAX: dffffc0000000000 RBX: ffff8801c391f148 RCX: 0000000000000000
      RDX: 0000000000000000 RSI: 000055edd4374000 RDI: ffff8801dbe1ae0c
      RBP: ffff8801c391f1a0 R08: 0000000000000002 R09: 0000000000000000
      R10: dffffc0000000000 R11: 0000000000000002 R12: 1ffff10038723e25
      R13: ffff8801dbe1ae00 R14: ffff8801c391f680 R15: dffffc0000000000
       </IRQ>
       rcu_read_lock_held+0x87/0xc0 kernel/rcu/update.c:293
       radix_tree_deref_slot include/linux/radix-tree.h:238 [inline]
       filemap_map_pages+0x6d4/0x1570 mm/filemap.c:2335
       do_fault_around mm/memory.c:3231 [inline]
       do_read_fault mm/memory.c:3265 [inline]
       do_fault+0xbd5/0x2080 mm/memory.c:3370
       handle_pte_fault mm/memory.c:3600 [inline]
       __handle_mm_fault+0x1062/0x2cb0 mm/memory.c:3714
       handle_mm_fault+0x1e2/0x480 mm/memory.c:3751
       __do_page_fault+0x4f6/0xb60 arch/x86/mm/fault.c:1397
       do_page_fault+0x54/0x70 arch/x86/mm/fault.c:1460
       page_fault+0x28/0x30 arch/x86/entry/entry_64.S:1011
      RIP: 0033:0x7f83172f2786
      RSP: 002b:00007fffe859ae80 EFLAGS: 00010293
      RAX: 000055edd4373040 RBX: 00007f83175111c8 RCX: 000055edd4373238
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007f8317510970
      RBP: 00007fffe859afd0 R08: 0000000000000009 R09: 0000000000000000
      R10: 0000000000000064 R11: 0000000000000000 R12: 000055edd4373040
      R13: 0000000000000000 R14: 00007fffe859afe8 R15: 0000000000000000
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ec4fbd64
  10. 28 Apr, 2016 1 commit
  11. 17 Feb, 2016 2 commits
  12. 29 Jan, 2016 1 commit
    • J
      inet: frag: Always orphan skbs inside ip_defrag() · 8282f274
      Committed by Joe Stringer
      Later parts of the stack (including fragmentation) expect that there is
      never a socket attached to frag in a frag_list, however this invariant
      was not enforced on all defrag paths. This could lead to the
      BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
      end of this commit message.
      
      While the call could be added to openvswitch to fix this particular
      error, the head and tail of the frags list are already orphaned
      indirectly inside ip_defrag(), so it seems like the remaining fragments
      should all be orphaned in all circumstances.
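
      The invariant described above can be illustrated with a small userspace
      model. This is only a sketch: sock_model, skb_model, and the three
      helper functions are hypothetical stand-ins invented for illustration,
      not the kernel's struct sock, struct sk_buff, or the actual patch. It
      shows every fragment being orphaned as it is queued, so that no entry
      on the fragment list still points at a socket.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; not the kernel's struct sock / sk_buff. */
struct sock_model {
    int users; /* number of buffers that still point at this socket */
};

struct skb_model {
    struct sock_model *sk;  /* owning socket, or NULL once orphaned */
    struct skb_model *next; /* next fragment in the queue */
};

/* Drop the buffer's socket ownership, loosely modelled on skb_orphan(). */
static void skb_model_orphan(struct skb_model *skb)
{
    if (skb->sk != NULL) {
        skb->sk->users--;
        skb->sk = NULL;
    }
}

/* Queue a fragment for reassembly, orphaning it unconditionally first. */
static void defrag_model_queue(struct skb_model **head, struct skb_model *skb)
{
    skb_model_orphan(skb);
    skb->next = *head;
    *head = skb;
}

/* The invariant later stages rely on: no queued fragment owns a socket. */
static bool frag_list_is_orphaned(const struct skb_model *head)
{
    for (; head != NULL; head = head->next)
        if (head->sk != NULL)
            return false;
    return true;
}
```

      In this model, orphaning at the single entry point means no caller
      (openvswitch or otherwise) has to remember to do it before queuing.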
      
      kernel BUG at net/ipv4/ip_output.c:586!
      [...]
      Call Trace:
       <IRQ>
       [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
       [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
       [<ffffffff81667830>] ? dst_discard_out+0x20/0x20
       [<ffffffff81667810>] ? dst_ifdown+0x80/0x80
       [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
       [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
       [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
       [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
       [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
       [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
       [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
       [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
       [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
       [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
       [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
       [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
       [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
       [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
       [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
       [<ffffffff81661060>] dev_queue_xmit+0x10/0x20
       [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
       [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
       [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
       [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
       [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
       [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
       [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
       [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
       [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
       [<ffffffff816ab4c0>] ip_output+0x70/0x110
       [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
       [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
       [<ffffffff816abf89>] ip_send_skb+0x19/0x40
       [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
       [<ffffffff816df21a>] icmp_push_reply+0xea/0x120
       [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
       [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
       [<ffffffff816dfa06>] icmp_echo+0x36/0x70
       [<ffffffff816e0d11>] icmp_rcv+0x271/0x450
       [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
       [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
       [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
       [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
       [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
       [<ffffffff816a5453>] ip_rcv+0x283/0x3e0
       [<ffffffff810b6302>] ? match_held_lock+0x192/0x200
       [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
       [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
       [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
       [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
       [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
       [<ffffffff8165e678>] process_backlog+0x78/0x230
       [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
       [<ffffffff8165e355>] net_rx_action+0x155/0x400
       [<ffffffff8106b48c>] __do_softirq+0xcc/0x420
       [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
       [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
       <EOI>
       [<ffffffff8106b88e>] do_softirq+0x4e/0x60
       [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
       [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
       [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
       [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
       [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
       [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
       [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
       [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
       [<ffffffff816ab4c0>] ip_output+0x70/0x110
       [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
       [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
       [<ffffffff816abf89>] ip_send_skb+0x19/0x40
       [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
       [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
       [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
       [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
       [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
       [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
       [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
       [<ffffffff8163e398>] sock_sendmsg+0x38/0x50
       [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
       [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
       [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
       [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
       [<ffffffff81204886>] ? __fget_light+0x66/0x90
       [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
       [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
       [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
      Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
       66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
      RIP  [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
       RSP <ffff88006d603170>
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8282f274
  13. 06 Jan, 2016 1 commit
  14. 03 Nov, 2015 1 commit
  15. 13 Oct, 2015 1 commit
  16. 30 Sep, 2015 1 commit
  17. 29 Aug, 2015 1 commit
  18. 14 Aug, 2015 1 commit
  19. 27 Jul, 2015 2 commits
  20. 22 Jul, 2015 1 commit
  21. 12 Jul, 2015 1 commit
    • F
      Revert "ipv4: use skb coalescing in defragmentation" · 14fe22e3
      Committed by Florian Westphal
      This reverts commit 3cc49492.
      
      There is nothing wrong with coalescing during defragmentation, it
      reduces truesize overhead and simplifies things for the receiving
      socket (no fraglist walk needed).
      
      However, it also destroys geometry of the original fragments.
      While that doesn't cause any breakage (we make sure to not exceed largest
      original size) ip_do_fragment contains a 'fastpath' that takes advantage
      of a present frag list and results in fragments that (in most cases)
      match what was received.
      
      In case it's needed, the coalescing could be done later, when we're sure
      the skb is not forwarded.  But discussion during NFWS resulted in
      'let's just remove this for now'.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      14fe22e3
  22. 28 May, 2015 1 commit
    • F
      ip_fragment: don't forward defragmented DF packet · d6b915e2
      Committed by Florian Westphal
      We currently always send fragments without DF bit set.
      
      Thus, given the following setup:
      
      mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
         A           R1              R2         B
      
      Where R1 and R2 run linux with netfilter defragmentation/conntrack
      enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1
      will respond with icmp too big error if one of these fragments exceeded
      1400 bytes.
      
      However, if R1 receives fragment sizes 1200 and 100, it would
      forward the reassembled packet without refragmenting, i.e.
      R2 will send an icmp error in response to a packet that was never sent,
      citing mtu that the original sender never exceeded.
      
      The other minor issue is that a refragmentation on R1 will conceal the
      MTU of R2-B since refragmentation does not set DF bit on the fragments.
      
      This modifies ip_fragment so that we track largest fragment size seen
      both for DF and non-DF packets, and set frag_max_size to the largest
      value.
      
      If the DF fragment size is larger than or equal to the non-DF one, we
      will consider the packet a path MTU probe:
      We set DF bit on the reassembled skb and also tag it with a new IPCB flag
      to force refragmentation even if skb fits outdev mtu.
      
      We will also set DF bit on each fragment in this case.
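
      The size-tracking rule above can be sketched as a small userspace
      model. The names here (frag_queue_model, reasm_result, frag_model_add,
      frag_model_reasm) are hypothetical and chosen for illustration only;
      the real ip_fragment code works on skbs and IPCB flags and may be
      structured differently. The sketch records the largest DF and non-DF
      fragment sizes, reports the larger of the two as frag_max_size, and
      treats the packet as a path MTU probe when the largest DF fragment is
      at least as large as the largest non-DF one.

```c
#include <stdbool.h>

/* Hypothetical per-queue state; not the kernel's struct ipq. */
struct frag_queue_model {
    unsigned int max_df_size;    /* largest fragment seen with DF set */
    unsigned int max_nondf_size; /* largest fragment seen without DF */
};

/* Hypothetical summary of what reassembly would decide. */
struct reasm_result {
    unsigned int frag_max_size; /* largest fragment size seen overall */
    bool pmtu_probe;            /* set DF and force refragmentation */
};

/* Record one received fragment of the given size. */
static void frag_model_add(struct frag_queue_model *q,
                           unsigned int size, bool df)
{
    if (df) {
        if (size > q->max_df_size)
            q->max_df_size = size;
    } else if (size > q->max_nondf_size) {
        q->max_nondf_size = size;
    }
}

/* Decide frag_max_size and whether to treat the packet as a PMTU probe. */
static struct reasm_result frag_model_reasm(const struct frag_queue_model *q)
{
    struct reasm_result r;

    r.frag_max_size = q->max_df_size > q->max_nondf_size
                          ? q->max_df_size
                          : q->max_nondf_size;
    /* Assumption: a queue with no DF fragment at all is never a probe. */
    r.pmtu_probe = q->max_df_size > 0 &&
                   q->max_df_size >= q->max_nondf_size;
    return r;
}
```

      With the DF fragments of 1200 and 100 bytes from the example above,
      this model yields frag_max_size 1200 and marks the packet as a probe,
      so refragmentation would be forced rather than forwarding the
      reassembled packet whole.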
      
      Joint work with Hannes Frederic Sowa.
      Reported-by: Jesse Gross <jesse@nicira.com>
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d6b915e2
  23. 19 May, 2015 2 commits
  24. 04 Apr, 2015 2 commits
  25. 06 Mar, 2015 1 commit
  26. 21 Feb, 2015 1 commit
  27. 12 Nov, 2014 1 commit
    • J
      net: Convert LIMIT_NETDEBUG to net_dbg_ratelimited · ba7a46f1
      Committed by Joe Perches
      Use the more common dynamic_debug capable net_dbg_ratelimited
      and remove the LIMIT_NETDEBUG macro.
      
      All messages are still ratelimited.
      
      Some KERN_<LEVEL> uses are changed to KERN_DEBUG.
      
      This may have some negative impact on messages that were
      emitted at KERN_INFO and that are now not enabled at all unless
      DEBUG is defined or dynamic_debug is enabled.  Even so,
      these messages are now _not_ emitted by default.
      
      This also eliminates the use of the net_msg_warn sysctl
      "/proc/sys/net/core/warnings".  For backward compatibility,
      the sysctl is not removed, but it has no function.  The extern
      declaration of net_msg_warn is removed from sock.h and made
      static in net/core/sysctl_net_core.c
      
      Miscellanea:
      
      o Update the sysctl documentation
      o Remove the embedded uses of pr_fmt
      o Coalesce format fragments
      o Realign arguments
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      ba7a46f1