1. 03 Aug, 2014 12 commits
  2. 01 Aug, 2014 7 commits
  3. 31 Jul, 2014 4 commits
    • net: filter: don't release unattached filter through call_rcu() · 34c5bd66
      Pablo Neira committed
      sk_unattached_filter_destroy() does not always need to release the
      filter object via rcu. Since this filter is never attached to the
      socket, the caller should be responsible for releasing the filter
      in a safe way, which may not necessarily imply rcu.
      
      This is a short summary of clients of this function:
      
      1) xt_bpf.c and cls_bpf.c use the bpf matchers from rules, these rules
         are removed from the packet path before the filter is released. Thus,
         the framework makes sure the filter is safely removed.
      
      2) In the ppp driver, the ppp_lock ensures serialization between the
         xmit and filter attachment/detachment path. This doesn't use rcu
         so deferred release via rcu makes no sense.
      
      3) In the isdn/ppp driver, it is called from isdn_ppp_release()
         and isdn_ppp_ioctl(). This driver uses mutex and spinlocks, no rcu.
         Thus, deferred rcu makes no sense to me either, the deferred releases
         may be just masking the effects of wrong locking strategy, which
         should be fixed in the driver itself.
      
      4) In the team driver, this is the only place where the rcu
         synchronization with unattached filter is used. Therefore, this
         patch introduces synchronize_rcu() which is called from the
         genetlink path to make sure the filter doesn't go away while packets
         are still walking over it. I think we can revisit this once struct
         bpf_prog (that only wraps specific bpf code bits) is in place, then
         add some specific struct rcu_head in the scope of the team driver if
         Jiri thinks this is needed.
      
      Deferred rcu release for unattached filters was originally introduced
      in 302d6637 ("filter: Allow to create sk-unattached filters").
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Remove unlikely() for WARN_ON() conditions · 80019d31
      Thomas Graf committed
      No need for the unlikely(): WARN_ON() and BUG_ON() internally use
      unlikely() on the condition.
      Signed-off-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
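The redundancy removed by this commit can be illustrated in userspace. The sketch below is an assumption-laden approximation: `MY_WARN_ON()` is a hypothetical, simplified stand-in for the kernel macro (it prints to stderr instead of dumping a stack trace), and it relies on the GCC/Clang statement-expression extension and `__builtin_expect`.

```c
#include <stdio.h>

/* Simplified stand-in for the kernel's unlikely() branch hint. */
#define unlikely(x) __builtin_expect(!!(x), 0)

/*
 * Hypothetical, simplified WARN_ON(): the branch hint is already
 * applied inside the macro, both on the test and on the result.
 */
#define MY_WARN_ON(cond) ({                                          \
	int __ret_warn_on = !!(cond);                                \
	if (unlikely(__ret_warn_on))                                 \
		fprintf(stderr, "WARNING at %s:%d\n",                \
			__FILE__, __LINE__);                         \
	unlikely(__ret_warn_on);                                     \
})

/* Before the cleanup: the outer unlikely() adds nothing. */
static int check_before(int len)
{
	if (unlikely(MY_WARN_ON(len < 0)))
		return -1;
	return 0;
}

/* After the cleanup: same behaviour, one hint fewer to read. */
static int check_after(int len)
{
	if (MY_WARN_ON(len < 0))
		return -1;
	return 0;
}
```

Both helpers return the same value for every input; only the amount of annotation at the call site differs.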
    • dcbnl : Fix misleading dcb_app->priority explanation · 16eecd9b
      Anish Bhatt committed
      The current explanation of dcb_app->priority is wrong. It says priority is
      expected to be a 3-bit unsigned integer, which is only true when working with
      DCBx-IEEE. Use of dcb_app->priority by DCBx-CEE expects it to be an 802.1p user
      priority bitmap. Updated accordingly.
      
      This affects the cxgb4 driver, but I will post those changes as part of a
      larger changeset shortly.
      
      Fixes: 3e29027a ("dcbnl: add support for ieee8021Qaz attributes")
      Signed-off-by: Anish Bhatt <anish@chelsio.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netfilter: nfnetlink_acct: dump unmodified nfacct flags · d24675cb
      Alexey Perevalov committed
      NFNL_MSG_ACCT_GET_CTRZERO modifies the dumped flags. In this case the
      client sees the unmodified (uncleared) counter value together with a
      cleared overquota state, so the end user learns nothing about the
      overquota state unless subscribed to the overquota report.
      Signed-off-by: Alexey Perevalov <a.perevalov@samsung.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  4. 30 Jul, 2014 11 commits
    • ipv4: clean up cast warning in do_ip_getsockopt · c54a5e02
      Karoly Kemeny committed
      Sparse warns because of an implicit pointer cast.
      
      v2: subject line correction, space between "void" and "*"
      Signed-off-by: Karoly Kemeny <karoly.kemeny@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: remove duplicated include from socket.c · ad025a56
      Wei Yongjun committed
      Remove duplicated include.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/udp_offload: Use IS_ERR_OR_NULL · 27446442
      Himangi Saraogi committed
      This patch introduces the use of the macro IS_ERR_OR_NULL in place of
      tests for NULL and IS_ERR.
      
      The following Coccinelle semantic patch was used for making the change:
      
      @@
      expression e;
      @@
      
      - e == NULL || IS_ERR(e)
      + IS_ERR_OR_NULL(e)
       || ...
      Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
      Acked-by: Julia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
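What the semantic patch rewrites can be sketched in userspace. The helpers below are approximations of the kernel's error-pointer helpers, written from memory for illustration only; `old_check()` and `new_check()` are hypothetical names for the two forms of the test.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace approximations of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

/* Error pointers live in the last MAX_ERRNO bytes of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Form matched by the "-" line of the semantic patch. */
static int old_check(const void *e)
{
	return e == NULL || IS_ERR(e);
}

/* Form produced by the "+" line. */
static int new_check(const void *e)
{
	return IS_ERR_OR_NULL(e);
}
```

The two checks agree for valid pointers, NULL, and encoded error values, which is why the transformation is purely cosmetic.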
    • openvswitch: Use IS_ERR_OR_NULL · d0e992aa
      Himangi Saraogi committed
      This patch introduces the use of the macro IS_ERR_OR_NULL in place of
      tests for NULL and IS_ERR.
      
      The following Coccinelle semantic patch was used for making the change:
      
      @@
      expression e;
      @@
      
      - e == NULL || IS_ERR(e)
      + IS_ERR_OR_NULL(e)
       || ...
      Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
      Acked-by: Julia Lawall <julia.lawall@lip6.fr>
      Acked-by: Pravin B Shelar <pshelar@nicira.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: Use IS_ERR_OR_NULL · 5a8dbf03
      Himangi Saraogi committed
      This patch introduces the use of the macro IS_ERR_OR_NULL in place of
      tests for NULL and IS_ERR.
      
      The following Coccinelle semantic patch was used for making the change:
      
      @@
      expression e;
      @@
      
      - e == NULL || IS_ERR(e)
      + IS_ERR_OR_NULL(e)
       || ...
      Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
      Acked-by: Julia Lawall <julia.lawall@lip6.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sendmsg: fix NULL pointer dereference · 40eea803
      Andrey Ryabinin committed
      Sasha's report:
      	> While fuzzing with trinity inside a KVM tools guest running the latest -next
      	> kernel with the KASAN patchset, I've stumbled on the following spew:
      	>
      	> [ 4448.949424] ==================================================================
      	> [ 4448.951737] AddressSanitizer: user-memory-access on address 0
      	> [ 4448.952988] Read of size 2 by thread T19638:
      	> [ 4448.954510] CPU: 28 PID: 19638 Comm: trinity-c76 Not tainted 3.16.0-rc4-next-20140711-sasha-00046-g07d3099-dirty #813
      	> [ 4448.956823]  ffff88046d86ca40 0000000000000000 ffff880082f37e78 ffff880082f37a40
      	> [ 4448.958233]  ffffffffb6e47068 ffff880082f37a68 ffff880082f37a58 ffffffffb242708d
      	> [ 4448.959552]  0000000000000000 ffff880082f37a88 ffffffffb24255b1 0000000000000000
      	> [ 4448.961266] Call Trace:
      	> [ 4448.963158] dump_stack (lib/dump_stack.c:52)
      	> [ 4448.964244] kasan_report_user_access (mm/kasan/report.c:184)
      	> [ 4448.965507] __asan_load2 (mm/kasan/kasan.c:352)
      	> [ 4448.966482] ? netlink_sendmsg (net/netlink/af_netlink.c:2339)
      	> [ 4448.967541] netlink_sendmsg (net/netlink/af_netlink.c:2339)
      	> [ 4448.968537] ? get_parent_ip (kernel/sched/core.c:2555)
      	> [ 4448.970103] sock_sendmsg (net/socket.c:654)
      	> [ 4448.971584] ? might_fault (mm/memory.c:3741)
      	> [ 4448.972526] ? might_fault (./arch/x86/include/asm/current.h:14 mm/memory.c:3740)
      	> [ 4448.973596] ? verify_iovec (net/core/iovec.c:64)
      	> [ 4448.974522] ___sys_sendmsg (net/socket.c:2096)
      	> [ 4448.975797] ? put_lock_stats.isra.13 (./arch/x86/include/asm/preempt.h:98 kernel/locking/lockdep.c:254)
      	> [ 4448.977030] ? lock_release_holdtime (kernel/locking/lockdep.c:273)
      	> [ 4448.978197] ? lock_release_non_nested (kernel/locking/lockdep.c:3434 (discriminator 1))
      	> [ 4448.979346] ? check_chain_key (kernel/locking/lockdep.c:2188)
      	> [ 4448.980535] __sys_sendmmsg (net/socket.c:2181)
      	> [ 4448.981592] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
      	> [ 4448.982773] ? trace_hardirqs_on (kernel/locking/lockdep.c:2607)
      	> [ 4448.984458] ? syscall_trace_enter (arch/x86/kernel/ptrace.c:1500 (discriminator 2))
      	> [ 4448.985621] ? trace_hardirqs_on_caller (kernel/locking/lockdep.c:2600)
      	> [ 4448.986754] SyS_sendmmsg (net/socket.c:2201)
      	> [ 4448.987708] tracesys (arch/x86/kernel/entry_64.S:542)
      	> [ 4448.988929] ==================================================================
      
      This report means that we've reached netlink_sendmsg() with msg->msg_name == NULL and msg->msg_namelen > 0.
      
      The usual "Unable to handle kernel NULL pointer dereference" message did not follow this report,
      which gave me a clue that address 0 is mapped and contains a valid socket address structure.
      
      This bug was introduced in f3d33426
      (net: rework recvmsg handler msg_name and msg_namelen logic).
      Commit message states that:
      	"Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
      	 non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
      	 affect sendto as it would bail out earlier while trying to copy-in the
      	 address."
      But in fact this affects sendto when address 0 is mapped and contains a
      socket address structure. In that case the address copy-in will succeed, and
      the verify_iovec() function will exit successfully with msg->msg_namelen > 0
      and msg->msg_name == NULL.
      
      This patch fixes it by setting msg_namelen to 0 if msg_name == NULL.
      
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: <stable@vger.kernel.org>
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andrey Ryabinin <a.ryabinin@samsung.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
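The invariant restored by this fix (a NULL name never travels with a non-zero length) can be sketched as follows. This is a minimal illustration, not the kernel code: `struct msg_sketch` is a hypothetical, trimmed-down stand-in for `struct msghdr`, and both helper names are invented for the example.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for the kernel's struct msghdr. */
struct msg_sketch {
	void *msg_name;     /* optional destination address */
	int   msg_namelen;  /* size of that address in bytes */
};

/* Never leave a non-zero length paired with a NULL name. */
static void sanitize_msg_name(struct msg_sketch *m)
{
	if (m->msg_name == NULL)
		m->msg_namelen = 0;
}

/*
 * A consumer in the style of netlink_sendmsg() can then rely on the
 * length alone before it dereferences the name.
 */
static int has_destination(const struct msg_sketch *m)
{
	return m->msg_namelen > 0;
}
```

With the length forced to 0, a consumer that tests only `msg_namelen` no longer reads through a NULL `msg_name`, even when address 0 happens to be mapped.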
    • vlan: fail early when creating netdev named config · 9c5ff24f
      WANG Cong committed
      Similarly, vlan will create /proc/net/vlan/<dev>, so when we
      create a dev with the name "config", it will conflict with
      /proc/net/vlan/config.
      Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: fail early when creating netdev named all or default · a317a2f1
      WANG Cong committed
      We create a proc dir for each network device, which causes
      conflicts when a device is named "all" or "default".
      
      Rather than emitting an ugly kernel warning, we could just
      fail earlier by checking the device name.
      Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
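The early name check described in this commit could look roughly like the sketch below. The function names and the choice of `-EINVAL` as the error code are assumptions made for illustration; the reserved names "all" and "default" come from the commit message.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: "all" and "default" are already used as
 * per-protocol configuration directories, so a device may not
 * reuse them as its name.
 */
static int dev_name_is_allowed_sketch(const char *name)
{
	if (strcmp(name, "default") == 0)
		return 0;
	if (strcmp(name, "all") == 0)
		return 0;
	return 1;
}

/* Fail early, before any proc entry is created (error code assumed). */
static int register_dev_sketch(const char *name)
{
	if (!dev_name_is_allowed_sketch(name))
		return -EINVAL;
	/* ... creation of the per-device proc dir would go here ... */
	return 0;
}
```

Rejecting the name up front turns a late kernel warning into an ordinary error return that the caller can report.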
    • ipv4: fail early when creating netdev named all or default · 20e61da7
      WANG Cong committed
      We create a proc dir for each network device, which causes
      conflicts when a device is named "all" or "default".
      
      Rather than emitting an ugly kernel warning, we could just
      fail earlier by checking the device name.
      Reported-by: Stephane Chazelas <stephane.chazelas@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: remove deprecated syststamp timestamp · 4d276eb6
      Willem de Bruijn committed
      The SO_TIMESTAMPING API defines three types of timestamps: software,
      hardware in raw format (hwtstamp) and hardware converted to system
      format (syststamp). The last has been deprecated in favor of combining
      hwtstamp with a PTP clock driver. There are no active users in the
      kernel.
      
      The option was device driver dependent. If set, but without hardware
      support, the correct behavior is to return zero in the relevant field
      in the SCM_TIMESTAMPING ancillary message. Without device drivers
      implementing the option, this field is effectively always zero.
      
      Remove the internal plumbing to dissuade new drivers from implementing
      the feature. Keep the SOF_TIMESTAMPING_SYS_HARDWARE flag, however, to
      avoid breaking existing applications that request the timestamp.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • packet: remove deprecated syststamp timestamp · 68a360e8
      Willem de Bruijn committed
      No device driver will ever return an skb_shared_info structure with
      syststamp non-zero, so remove the branch that tests for this and
      optionally marks the packet timestamp as TP_STATUS_TS_SYS_HARDWARE.
      
      Do not remove the definition TP_STATUS_TS_SYS_HARDWARE, as processes
      may refer to it.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 29 Jul, 2014 3 commits
    • ip: make IP identifiers less predictable · 04ca6973
      Eric Dumazet committed
      In "Counting Packets Sent Between Arbitrary Internet Hosts", Jeffrey and
      Jedidiah describe ways of exploiting Linux IP identifier generation to
      infer whether two machines are exchanging packets.
      
      With commit 73f156a6 ("inetpeer: get rid of ip_id_count"), we
      changed IP id generation, but this does not really prevent this
      side-channel technique.
      
      This patch adds a random amount of perturbation so that IP identifiers
      for a given destination [1] are no longer monotonically increasing after
      an idle period.
      
      Note that prandom_u32_max(1) returns 0, so if the generator is used at most
      once per jiffy, this patch inserts no hole in the ID suite and does not
      increase collision probability.
      
      This is jiffies based, so in the worst case (HZ=1000), the id can
      rollover after ~65 seconds of idle time, which should be fine.
      
      We also change the hash used in __ip_select_ident() to not only hash
      on daddr, but also saddr and protocol, so that ICMP probes can not be
      used to infer information for other protocols.
      
      For IPv6, add saddr into the hash as well, but not nexthdr.
      
      If I ping the patched target, we can see that IDs are now hard to predict.
      
      21:57:11.008086 IP (...)
          A > target: ICMP echo request, seq 1, length 64
      21:57:11.010752 IP (... id 2081 ...)
          target > A: ICMP echo reply, seq 1, length 64
      
      21:57:12.013133 IP (...)
          A > target: ICMP echo request, seq 2, length 64
      21:57:12.015737 IP (... id 3039 ...)
          target > A: ICMP echo reply, seq 2, length 64
      
      21:57:13.016580 IP (...)
          A > target: ICMP echo request, seq 3, length 64
      21:57:13.019251 IP (... id 3437 ...)
          target > A: ICMP echo reply, seq 3, length 64
      
      [1] TCP sessions use a per flow ID generator not changed by this patch.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Jeffrey Knockel <jeffk@cs.unm.edu>
      Reported-by: Jedidiah R. Crandall <crandall@cs.unm.edu>
      Cc: Willy Tarreau <w@1wt.eu>
      Cc: Hannes Frederic Sowa <hannes@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
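The perturbation scheme described in this commit can be sketched as a single-threaded userspace model. Everything here is illustrative: the struct, the function names, and the injected random source are assumptions, and the real code has to update the bucket atomically. `demo_rnd_below()` is a deterministic stand-in for a generator that returns a value in [0, n), so that the arithmetic can be followed by hand.

```c
#include <stdint.h>

/* Hypothetical per-hash bucket: next ID plus the tick of its last use. */
struct ident_bucket {
	uint32_t id;     /* running identifier counter */
	uint32_t stamp;  /* tick (jiffy) of the last reservation */
};

/* Must return a value in [0, n); for n == 1 that is always 0. */
typedef uint32_t (*rnd_below_fn)(uint32_t n);

/* Deterministic stand-in: always picks the largest allowed jump. */
static uint32_t demo_rnd_below(uint32_t n)
{
	return n - 1;
}

/*
 * Reserve `segs` consecutive IDs. After an idle period, first skip a
 * random number of IDs bounded by the number of idle ticks.
 */
static uint32_t reserve_ids(struct ident_bucket *b, uint32_t now,
			    uint32_t segs, rnd_below_fn rnd_below)
{
	uint32_t delta = 0;

	if (b->stamp != now) {
		delta = rnd_below(now - b->stamp);
		b->stamp = now;
	}
	b->id += segs + delta;
	return b->id - segs;  /* first ID of the reserved range */
}
```

In this model a bucket used on consecutive ticks skips nothing, matching the note about prandom_u32_max(1), while a bucket idle for k ticks skips up to k - 1 identifiers. With a 16-bit ID field and HZ=1000, 65536 ticks of idle time correspond to the roughly 65 seconds mentioned in the message.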
    • tipc: make tipc_buf_append() more robust · 13e9b997
      Jon Paul Maloy committed
      As per comment from David Miller, we try to make the buffer reassembly
      function more resilient to user errors than it is today.
      
      - We check that the "*buf" parameter always is set, since this is
        mandatory input.
      
      - We ensure that *buf->next always is set to NULL before linking in
        the buffer, instead of relying on the caller to have done this.
      
      - We ensure that the "tail" pointer in the head buffer's control
        block is initialized to NULL when the first fragment arrives.
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • neighbour : fix ndm_type type error issue · 545469f7
      Jun Zhao committed
      ndm_type is the L3 address type; in neighbour proxy and vxlan it is RTN_UNICAST.
      NDA_DST is a netlink TLV type, hence it is not the right value in this context.
      Signed-off-by: Jun Zhao <mypopydev@gmail.com>
      Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 28 Jul, 2014 3 commits
    • inet: frag: set limits and make init_net's high_thresh limit global · 1bab4c75
      Nikolay Aleksandrov committed
      This patch makes init_net's high_thresh limit the maximum for all
      namespaces, thus introducing a global memory limit threshold equal to the
      sum of the individual high_thresh limits, which are capped.
      It also introduces some sane minimums for low_thresh, as it shouldn't be
      able to drop below 0 (or > high_thresh in the unsigned case), and
      overall low_thresh should never be above high_thresh, so we enforce the
      following relations for a namespace:
      init_net:
       high_thresh = max(not capped), min(init_net low_thresh)
       low_thresh = max(init_net high_thresh), min(0)
      
      all other namespaces:
       high_thresh = max(init_net high_thresh), min(namespace's low_thresh)
       low_thresh = max(namespace's high_thresh), min(0)
      
      The major issue with having low_thresh > high_thresh is that we'll
      schedule eviction but never evict anything and thus rely only on the
      timers.
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
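The min/max relations listed in this commit can be expressed as two small validation helpers. This is only a sketch of the bounds: the struct, the function names, and the `-EINVAL` return value are assumptions, and the actual patch wires the limits into the sysctl tables rather than into functions like these.

```c
#include <errno.h>

/* Hypothetical per-namespace fragment memory limits. */
struct frag_limits {
	long high_thresh;
	long low_thresh;
};

/*
 * high_thresh may not drop below the namespace's own low_thresh, and
 * outside init_net it may not exceed init_net's high_thresh.
 */
static int set_high_thresh(struct frag_limits *ns,
			   const struct frag_limits *init_ns, long val)
{
	if (val < ns->low_thresh)
		return -EINVAL;
	if (ns != init_ns && val > init_ns->high_thresh)
		return -EINVAL;
	ns->high_thresh = val;
	return 0;
}

/* low_thresh must stay within [0, high_thresh] of the same namespace. */
static int set_low_thresh(struct frag_limits *ns, long val)
{
	if (val < 0 || val > ns->high_thresh)
		return -EINVAL;
	ns->low_thresh = val;
	return 0;
}
```

Under these bounds low_thresh can never end up above high_thresh, which is the state the message warns about: eviction gets scheduled but never evicts anything.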
    • inet: frag: use seqlock for hash rebuild · ab1c724f
      Florian Westphal committed
      Rehash is a rare operation, so don't force readers to take
      the read-side rwlock.
      
      Instead, we only have to detect the (rare) case where
      the secret was altered while we are trying to insert
      a new inetfrag queue into the table.
      
      If it was changed, drop the bucket lock and recompute
      the hash to get the 'new' chain bucket that we have to
      insert into.
      
      Joint work with Nikolay Aleksandrov.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
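The retry pattern described in this commit can be modelled in a single thread. The sketch below is an illustration under stated assumptions: all names are hypothetical, a plain counter stands in for the kernel seqlock (no memory barriers, no check for an in-progress writer), the bucket lock is only indicated by comments, and the `race` hook exists purely so that a rebuild can be simulated between the hash computation and the check.

```c
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64

/* Hypothetical table state: a sequence counter guarding the secret. */
struct frag_table {
	unsigned int seq;  /* bumped on every rebuild */
	uint32_t secret;   /* hash secret, replaced by a rebuild */
};

static unsigned int bucket_of(uint32_t key, uint32_t secret)
{
	return ((key ^ secret) * 2654435761u) % NBUCKETS;
}

/* Writer side: change the secret and advance the sequence counter. */
static void rebuild(struct frag_table *t, uint32_t new_secret)
{
	t->seq++;
	t->secret = new_secret;
	t->seq++;
}

/* Demo hook simulating a rebuild that races with an insert. */
static void demo_race(struct frag_table *t)
{
	rebuild(t, 0xdeadbeefu);
}

/* Reader side: pick the chain to insert into, retrying after a rebuild. */
static unsigned int pick_bucket(struct frag_table *t, uint32_t key,
				void (*race)(struct frag_table *))
{
	unsigned int seq, b;

	do {
		seq = t->seq;
		b = bucket_of(key, t->secret);
		/* the bucket lock for chain b would be taken here */
		if (race) {
			race(t);
			race = NULL;
		}
		/* on a sequence mismatch: drop the lock and recompute */
	} while (seq != t->seq);

	return b;
}
```

In the common case the loop body runs once and the reader takes no table-wide lock; only an insert that overlaps a rebuild pays for a second hash computation.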
    • inet: frag: remove periodic secret rebuild timer · e3a57d18
      Florian Westphal committed
      Merge the functionality into the eviction workqueue.
      
      Instead of rebuilding every n seconds, take advantage of the upper
      hash chain length limit.
      
      If we hit it, mark table for rebuild and schedule workqueue.
      To prevent frequent rebuilds when we're completely overloaded,
      don't rebuild more than once every 5 seconds.
      
      The ipfrag_secret_interval sysctl is now obsolete and has been marked as
      deprecated. It can still be changed, so scripts won't break, but it
      has no effect. A comment is left above each unused secret_timer
      variable to avoid confusion.
      
      Joint work with Nikolay Aleksandrov.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
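The rebuild policy described in this commit (mark the table when a chain grows too long, then rebuild at most once every 5 seconds) can be sketched as below. The names, the use of seconds as the time unit, and the chain-length limit of 128 are assumptions made for the example; only the 5-second interval comes from the message.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CHAIN_LEN        128  /* assumed chain length limit */
#define MIN_REBUILD_INTERVAL 5    /* seconds, from the commit message */

/* Hypothetical rebuild bookkeeping for one fragment hash table. */
struct rebuild_state {
	bool     rebuild;       /* set when a chain exceeded the limit */
	uint64_t last_rebuild;  /* time of the last rebuild, in seconds */
};

/* Insert path: flag the table when a chain grows too long. */
static void note_chain_len(struct rebuild_state *s, unsigned int len)
{
	if (len > MAX_CHAIN_LEN)
		s->rebuild = true;  /* the worker would be scheduled here */
}

/* Worker path: rebuild only if flagged and not done too recently. */
static bool maybe_rebuild(struct rebuild_state *s, uint64_t now)
{
	if (!s->rebuild)
		return false;
	if (now - s->last_rebuild < MIN_REBUILD_INTERVAL)
		return false;
	/* ... pick a new secret and rehash all queues here ... */
	s->rebuild = false;
	s->last_rebuild = now;
	return true;
}
```

A request that arrives too soon after the previous rebuild stays pending and is honoured on a later worker run, so an overloaded table is not rehashed continuously.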