1. 17 Jan, 2018 6 commits
  2. 16 Jan, 2018 5 commits
  3. 15 Jan, 2018 1 commit
  4. 13 Jan, 2018 1 commit
  5. 11 Jan, 2018 14 commits
  6. 10 Jan, 2018 13 commits
    • J
      net: free RX queue structures · 82aaff2f
      Jakub Kicinski committed
      Looks like commit e817f856 ("xdp: generic XDP handling of
      xdp_rxq_info") replaced kvfree(dev->_rx) in free_netdev() with
      a call to netif_free_rx_queues() which doesn't actually free
      the rings?
      
      While at it remove the unnecessary temporary variable.
      
      Fixes: e817f856 ("xdp: generic XDP handling of xdp_rxq_info")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      82aaff2f
    • J
      net: use the right variant of kfree · 141b52a9
      Jakub Kicinski committed
      kvzalloc'ed memory should be kvfree'd.
      
      Fixes: e817f856 ("xdp: generic XDP handling of xdp_rxq_info")
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reviewed-by: Simon Horman <simon.horman@netronome.com>
      Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      141b52a9
    • A
      bpf: introduce BPF_JIT_ALWAYS_ON config · 290af866
      Alexei Starovoitov committed
      The BPF interpreter has been used as part of the Spectre variant 2 attack (CVE-2017-5715).
      
      A quote from the Google Project Zero blog:
      "At this point, it would normally be necessary to locate gadgets in
      the host kernel code that can be used to actually leak data by reading
      from an attacker-controlled location, shifting and masking the result
      appropriately and then using the result of that as offset to an
      attacker-controlled address for a load. But piecing gadgets together
      and figuring out which ones work in a speculation context seems annoying.
      So instead, we decided to use the eBPF interpreter, which is built into
      the host kernel - while there is no legitimate way to invoke it from inside
      a VM, the presence of the code in the host kernel's text section is sufficient
      to make it usable for the attack, just like with ordinary ROP gadgets."
      
      To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON config
      option, which removes the interpreter from the kernel in favor of JIT-only mode.
      So far eBPF JIT is supported by:
      x64, arm64, arm32, sparc64, s390, powerpc64, mips64
      
      The start of a JITed program is randomized and the code page is marked read-only.
      In addition, "constant blinding" can be turned on with net.core.bpf_jit_harden.
      
      v2->v3:
      - move __bpf_prog_ret0 under ifdef (Daniel)
      
      v1->v2:
      - fix init order, test_bpf and cBPF (Daniel's feedback)
      - fix offloaded bpf (Jakub's feedback)
      - add 'return 0' dummy in case something can invoke prog->bpf_func
      - retarget bpf tree. For bpf-next the patch would need one extra hunk.
        It will be sent when the trees are merged back to net-next
      
      Considered doing:
        int bpf_jit_enable __read_mostly = BPF_EBPF_JIT_DEFAULT;
      but it seems better to land the patch as-is and in bpf-next remove
      bpf_jit_enable global variable from all JITs, consolidate in one place
      and remove this jit_init() function.
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      290af866
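A minimal user-space sketch of how the knobs named in commit 290af866 could be inspected. The helper names `read_sysctl_int` and `report_bpf_jit_knobs` are hypothetical; the paths are the usual procfs view of the `net.core.bpf_jit_enable` and `net.core.bpf_jit_harden` sysctls. A file that is missing or not readable by the caller is reported rather than treated as an error.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer sysctl through procfs.
 * Returns 0 and stores the value in *out on success, -1 otherwise
 * (in which case *out is left untouched). */
static int read_sysctl_int(const char *path, int *out)
{
    FILE *f = fopen(path, "r");
    int value;
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%d", &value) == 1);
    fclose(f);
    if (!ok)
        return -1;
    *out = value;
    return 0;
}

/* Print the two JIT-related knobs mentioned in the commit message. */
static void report_bpf_jit_knobs(void)
{
    const char *paths[] = {
        "/proc/sys/net/core/bpf_jit_enable",
        "/proc/sys/net/core/bpf_jit_harden",
    };

    for (size_t i = 0; i < sizeof(paths) / sizeof(paths[0]); i++) {
        int v;

        if (read_sysctl_int(paths[i], &v) == 0)
            printf("%s = %d\n", paths[i], v);
        else
            printf("%s: not readable\n", paths[i]);
    }
}
```

Calling `report_bpf_jit_knobs()` from a small `main()` is enough to try it; the values printed depend entirely on the running kernel and its configuration.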
    • J
      tipc: improve poll() for group member socket · eb929a91
      Jon Maloy committed
      The current criterion for returning POLLOUT from a group member socket is
      too simplistic. It basically returns POLLOUT as soon as the group has
      external destinations, something obviously leading to a lot of spinning
      during destination congestion situations. At the same time, the internal
      congestion handling is unnecessarily complex.
      
      We now change this as follows.
      
      - We introduce an 'open' flag in struct tipc_group. This flag is used
        only to help poll() get the setting of POLLOUT right, and *not* for
        congestion handling as such. This means that a user can choose to
        ignore an EAGAIN for a destination and go on sending messages to
        other destinations in the group if they want to.
      
      - The flag is set to false every time we return EAGAIN on a send call.
      
      - The flag is set to true every time any member, i.e., not necessarily
        the member that caused EAGAIN, is removed from the small_win list.
      
      - We remove the group member 'usr_pending' flag. The size of the send
        window and presence in the 'small_win' list are sufficient criteria
        for recognizing congestion.
      
      This solution seems to be a reasonable compromise between 'anycast',
      which is normally not waiting for POLLOUT for a specific destination,
      and the other three send modes, which are.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      eb929a91
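The 'open' flag rules listed in commit eb929a91 can be sketched as a small user-space model. This is not the kernel code: `struct group_model` and all function names are hypothetical, the flag is assumed to start out true, and a congested send is assumed to put exactly one member on the small_win list.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct tipc_group; only what poll() needs. */
struct group_model {
    bool open;           /* consulted by poll() when deciding on POLLOUT */
    int  small_win_cnt;  /* members currently on the small_win list */
};

static void group_model_init(struct group_model *g)
{
    g->open = true;      /* assumption: a fresh group starts writable */
    g->small_win_cnt = 0;
}

/* A send call hits a congested destination: EAGAIN is returned and the
 * flag is cleared, as described in the commit message. */
static int group_model_send_congested(struct group_model *g)
{
    g->small_win_cnt++;
    g->open = false;
    return -EAGAIN;
}

/* Any member, not necessarily the one that caused EAGAIN, is removed
 * from the small_win list: the flag is set again. */
static void group_model_small_win_remove(struct group_model *g)
{
    if (g->small_win_cnt > 0) {
        g->small_win_cnt--;
        g->open = true;
    }
}

/* What poll() would consult for POLLOUT in this model. */
static bool group_model_writable(const struct group_model *g)
{
    return g->open;
}
```

Because the flag only steers poll(), a caller in this model remains free to ignore the EAGAIN and keep sending to other destinations.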
    • J
      tipc: improve groupcast scope handling · 232d07b7
      Jon Maloy committed
      When a member joins a group, it also indicates a binding scope. This
      makes it possible to create both node local groups, invisible to other
      nodes, as well as cluster global groups, visible everywhere.
      
      To prevent different members from ending up with permanently
      differing views of group size and membership, we must inhibit locally
      and globally bound members from joining the same group.
      
      We do this by using the binding scope as an additional separator between
      groups. I.e., a member must ignore all membership events from sockets
      using a different scope than itself, and all lookups for message
      destinations must require an exact match between the message's lookup
      scope and the potential target's binding scope.
      
      Apart from making it possible to create local groups using the same
      identity on different nodes, a side effect of this is that it now also
      becomes possible to create a cluster global group with the same identity
      across the same nodes, without interfering with the local groups.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      232d07b7
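The exact-match rule from commit 232d07b7 can be illustrated with a simplified lookup. The types and names below (`enum scope_model`, `struct member_model`, `count_destinations`) are hypothetical stand-ins, not TIPC identifiers; the point is only that the binding scope acts as part of the group identity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two binding scopes the commit discusses. */
enum scope_model { SCOPE_MODEL_CLUSTER, SCOPE_MODEL_NODE };

struct member_model {
    uint32_t         group_id;
    enum scope_model scope;    /* scope indicated when the member joined */
};

/* A member is a valid destination only if both the group identity and
 * the scope match exactly. */
static bool member_matches(const struct member_model *m,
                           uint32_t group_id, enum scope_model lookup_scope)
{
    return m->group_id == group_id && m->scope == lookup_scope;
}

/* Count the destinations a message with the given lookup scope can reach. */
static size_t count_destinations(const struct member_model *tbl, size_t n,
                                 uint32_t group_id, enum scope_model scope)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (member_matches(&tbl[i], group_id, scope))
            hits++;
    return hits;
}
```

With this rule a node-local group and a cluster-global group can share the same identity without seeing each other's members.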
    • J
      tipc: add option to suppress PUBLISH events for pre-existing publications · 8348500f
      Jon Maloy committed
      Currently, when a user subscribes to binding table publications,
      they receive a PUBLISH event for every matching item that already
      exists in the binding table.
      
      However, a group socket making a subscription doesn't need this initial
      status update from the binding table, because it has already scanned it
      during the join operation. Worse, the multiplicatory effect of issuing
      mutual events for dozens or hundreds of group members within a short time
      frame puts a heavy load on the topology server, with the end result that
      scale out operations on a big group tend to take much longer than needed.
      
      We now add a new filter option, TIPC_SUB_NO_STATUS, for topology server
      subscriptions, so that this initial avalanche of events is suppressed.
      This change, along with the previous commit, significantly improves the
      range and speed of group scale out operations.
      
      We keep the new option internal for the tipc driver, at least for now.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8348500f
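A rough counting model of the effect described in commit 8348500f. It assumes members join one at a time and that each new subscription would otherwise trigger one initial PUBLISH event per member already in the binding table; the real event pattern may differ. `SUB_NO_STATUS_MODEL` is a hypothetical bit chosen for illustration and is not claimed to be the value of TIPC_SUB_NO_STATUS.

```c
#include <stdint.h>

/* Hypothetical filter bit standing in for TIPC_SUB_NO_STATUS. */
#define SUB_NO_STATUS_MODEL 0x80u

/* Total number of initial status events issued while n_members join
 * sequentially, each with a subscription using the given filter. */
static unsigned long initial_events_total(unsigned int n_members,
                                          uint32_t filter)
{
    unsigned long total = 0;

    for (unsigned int k = 0; k < n_members; k++) {
        if (filter & SUB_NO_STATUS_MODEL)
            continue;        /* initial status update suppressed */
        total += k;          /* k members were already published */
    }
    return total;
}
```

Under these assumptions the unfiltered total grows quadratically with group size, which is the kind of avalanche the new option is meant to avoid.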
    • J
      tipc: send out join messages as soon as new member is discovered · d12d2e12
      Jon Maloy committed
      When a socket is joining a group, we look up in the binding table to
      find if there are already other members of the group present. This is
      used for being able to return EAGAIN instead of EHOSTUNREACH if the
      user proceeds directly to a send attempt.
      
      However, the information in the binding table can be used to directly
      set the created member in state MBR_PUBLISHED and send a JOIN message
      to the peer, instead of waiting for a topology PUBLISH event to do this.
      When there are many members in a group, the propagation time for such
      events can be significant, and we can save time during the join
      operation if we use the initial lookup result fully.
      
      In this commit, we eliminate the member state MBR_DISCOVERED, which has
      been the result of the initial lookup, and instead go directly to
      MBR_PUBLISHED, which initiates the setup.
      
      After this change, the tipc_member FSM looks as follows:
      
           +-----------+
      ---->| PUBLISHED |-----------------------------------------------+
      PUB- +-----------+                                LEAVE/WITHDRAW |
      LISH       |JOIN                                                 |
                 |     +-------------------------------------------+   |
                 |     |                            LEAVE/WITHDRAW |   |
                 |     |                +------------+             |   |
                 |     |   +----------->|  PENDING   |---------+   |   |
                 |     |   |msg/maxactv +-+---+------+  LEAVE/ |   |   |
                 |     |   |              |   |       WITHDRAW |   |   |
                 |     |   |   +----------+   |                |   |   |
                 |     |   |   |revert/maxactv|                |   |   |
                 |     |   |   V              V                V   V   V
                 |   +----------+  msg  +------------+       +-----------+
                 +-->|  JOINED  |------>|   ACTIVE   |------>|  LEAVING  |--->
                 |   +----------+       +-----+------+ LEAVE/+-----------+DOWN
                 |        A   A               |      WITHDRAW A   A    A   EVT
                 |        |   |               |RECLAIM        |   |    |
                 |        |   |REMIT          V               |   |    |
                 |        |   |== adv   +------------+        |   |    |
                 |        |   +---------| RECLAIMING |--------+   |    |
                 |        |             +-----+------+  LEAVE/    |    |
                 |        |                   |REMIT   WITHDRAW   |    |
                 |        |                   |< adv              |    |
                 |        |msg/               V            LEAVE/ |    |
                 |        |adv==ADV_IDLE+------------+   WITHDRAW |    |
                 |        +-------------|  REMITTED  |------------+    |
                 |                      +------------+                 |
                 |PUBLISH                                              |
      JOIN +-----------+                                LEAVE/WITHDRAW |
      ---->|  JOINING  |-----------------------------------------------+
           +-----------+
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      d12d2e12
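The FSM diagram in commit d12d2e12 can be read as a transition function. The sketch below is one reading of the diagram, not the kernel implementation: the `ST_`/`EV_` names are hypothetical, the advertisement conditions (such as adv==ADV_IDLE) are not modeled, the unlabeled PENDING to ACTIVE edge is left out, and the "maxactv" condition is reduced to a single boolean.

```c
#include <stdbool.h>

/* Hypothetical names for the states shown in the diagram. */
enum mbr_state_model {
    ST_PUBLISHED, ST_JOINING, ST_JOINED, ST_PENDING,
    ST_ACTIVE, ST_RECLAIMING, ST_REMITTED, ST_LEAVING
};

/* Hypothetical names for the edge labels in the diagram. */
enum mbr_event_model {
    EV_JOIN, EV_PUBLISH, EV_MSG, EV_REVERT, EV_RECLAIM,
    EV_REMIT_FULL,        /* "REMIT == adv" */
    EV_REMIT_PARTIAL,     /* "REMIT < adv" */
    EV_LEAVE_OR_WITHDRAW
};

/* Return the next state; events with no edge leave the state unchanged. */
static enum mbr_state_model
next_state(enum mbr_state_model s, enum mbr_event_model e, bool at_max_active)
{
    /* In the diagram every state has a LEAVE/WITHDRAW edge to LEAVING. */
    if (e == EV_LEAVE_OR_WITHDRAW)
        return ST_LEAVING;

    switch (s) {
    case ST_PUBLISHED:
        return e == EV_JOIN ? ST_JOINED : s;
    case ST_JOINING:
        return e == EV_PUBLISH ? ST_JOINED : s;
    case ST_JOINED:
        if (e == EV_MSG)
            return at_max_active ? ST_PENDING : ST_ACTIVE;
        return s;
    case ST_PENDING:
        return e == EV_REVERT ? ST_JOINED : s;
    case ST_ACTIVE:
        return e == EV_RECLAIM ? ST_RECLAIMING : s;
    case ST_RECLAIMING:
        if (e == EV_REMIT_FULL)
            return ST_JOINED;
        if (e == EV_REMIT_PARTIAL)
            return ST_REMITTED;
        return s;
    case ST_REMITTED:
        return e == EV_MSG ? ST_JOINED : s;
    default:
        return s;         /* LEAVING is left only via the DOWN event */
    }
}
```

A table-driven encoding would work equally well; the switch form is used here because it maps one case to one box in the diagram.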
    • J
      tipc: simplify group LEAVE sequence · c2b22bcf
      Jon Maloy committed
      After the changes in the previous commit the group LEAVE sequence
      can be simplified.
      
      We now let the arrival of a LEAVE message unconditionally issue a group
      DOWN event to the user. When a topology WITHDRAW event is received, the
      member, if it is still there, is set to state LEAVING, but we only issue a
      group DOWN event when the link to the peer node is gone, so that no
      LEAVE message is to be expected.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      c2b22bcf
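The simplified LEAVE sequence from commit c2b22bcf can be sketched as two handlers that report whether a group DOWN event should be issued. `struct leave_model` and both function names are hypothetical, and the behavior for a member that is no longer present is an assumption.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a group member. */
struct leave_model {
    bool present;   /* member still exists in the group */
    bool leaving;   /* member has been set to state LEAVING */
};

/* LEAVE message arrived: a DOWN event is issued unconditionally. */
static bool on_leave_msg(struct leave_model *m)
{
    m->leaving = true;
    return true;
}

/* Topology WITHDRAW event arrived: the member, if still there, goes to
 * LEAVING, but DOWN is issued only when the link to the peer node is
 * gone, since then no LEAVE message can be expected. */
static bool on_withdraw_evt(struct leave_model *m, bool link_to_peer_up)
{
    if (!m->present)
        return false;   /* assumption: nothing to report */
    m->leaving = true;
    return !link_to_peer_up;
}
```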
    • J
      tipc: create group member event messages when they are needed · 7ad32bcb
      Jon Maloy committed
      In the current implementation, a group socket receiving topology
      events about other members just converts the topology event message
      into a group event message and stores it until it reaches the right
      state to issue it to the user. This complicates the code unnecessarily,
      and becomes impractical because the coming commits will need to
      create and issue membership events independently.
      
      In this commit, we change this so that we just notice the type and
      origin of the incoming topology event, and then drop the buffer. Only
      when it is time to actually send a group event to the user do we
      explicitly create a new message and send it upwards.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7ad32bcb
    • J
      tipc: adjustment to group member FSM · 0233493a
      Jon Maloy committed
      Analysis reveals that the member state MBR_QUARANTINED is in reality
      unnecessary, and can be replaced by the state MBR_JOINING at all
      occurrences.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0233493a
    • J
      tipc: let group member stay in JOINED mode if unable to reclaim · 4ea5dab5
      Jon Maloy committed
      We handle a corner case in the function tipc_group_update_rcv_win().
      During extreme pressure it might happen that a message receiver has all
      its active senders in RECLAIMING or REMITTED mode, meaning that there
      is nobody to reclaim advertisements from if an additional sender tries
      to go active.
      
      Currently we just set the new sender to ACTIVE anyway, hence at least
      theoretically opening up for a receiver queue overflow by exceeding the
      MAX_ACTIVE limit. The correct solution to this is to instead add the
      member to the pending queue, while letting the oldest member in that
      queue revert to JOINED state.
      
      In this commit we refactor the code for handling message arrival from
      a JOINED member, both to make it more comprehensible and to cover the
      case described above.
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4ea5dab5
    • J
      tipc: a couple of cleanups · 8d5dee21
      Jon Maloy committed
      - We remove the 'reclaiming' member list in struct tipc_group, since
        it doesn't serve any purpose.
      
      - We simplify the GRP_REMIT_MSG branch of tipc_group_protocol_rcv().
      Acked-by: Ying Xue <ying.xue@windriver.com>
      Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      8d5dee21
    • W
      ipv6: remove null_entry before adding default route · 4512c43e
      Wei Wang committed
      In the current code, when creating a new fib6 table, tb6_root.leaf gets
      initialized to net->ipv6.ip6_null_entry.
      If a default route is being added with rt->rt6i_metric = 0xffffffff,
      fib6_add() will add this route after net->ipv6.ip6_null_entry. As
      null_entry is shared, this could cause problems.
      
      In order to fix it, set fn->leaf to NULL before calling
      fib6_add_rt2node() when trying to add the first default route.
      And reset fn->leaf to null_entry when adding fails or when deleting the
      last default route.
      
      syzkaller reported the following issue which is fixed by this commit:
      
      WARNING: suspicious RCU usage
      4.15.0-rc5+ #171 Not tainted
      -----------------------------
      net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage!
      
      other info that might help us debug this:
      
      rcu_scheduler_active = 2, debug_locks = 1
      4 locks held by swapper/0/0:
       #0:  ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] lockdep_copy_map include/linux/lockdep.h:178 [inline]
       #0:  ((&net->ipv6.ip6_fib_timer)){+.-.}, at: [<00000000d43f631b>] call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310
       #1:  (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] spin_lock_bh include/linux/spinlock.h:315 [inline]
       #1:  (&(&net->ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<000000002ff9d65c>] fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007
       #2:  (rcu_read_lock){....}, at: [<0000000091db762d>] __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560
       #3:  (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] spin_lock_bh include/linux/spinlock.h:315 [inline]
       #3:  (&(&tb->tb6_lock)->rlock){+.-.}, at: [<000000009e503581>] __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948
      
      stack backtrace:
      CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       <IRQ>
       __dump_stack lib/dump_stack.c:17 [inline]
       dump_stack+0x194/0x257 lib/dump_stack.c:53
       lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585
       fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701
       fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892
       fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815
       fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863
       fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933
       __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949
       fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline]
       fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016
       fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033
       call_timer_fn+0x228/0x820 kernel/time/timer.c:1320
       expire_timers kernel/time/timer.c:1357 [inline]
       __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660
       run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686
       __do_softirq+0x2d7/0xb85 kernel/softirq.c:285
       invoke_softirq kernel/softirq.c:365 [inline]
       irq_exit+0x1cc/0x200 kernel/softirq.c:405
       exiting_irq arch/x86/include/asm/apic.h:540 [inline]
       smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052
       apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904
       </IRQ>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
      Signed-off-by: Wei Wang <weiwan@google.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4512c43e
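The fix described in commit 4512c43e can be modeled in user space as follows. This is a simplified sketch, not the fib6 code: `struct rt_model`, `struct node_model`, `null_entry_model` and both functions are hypothetical, routes are pushed at the head of the list rather than sorted by metric, and a `simulate_failure` flag stands in for an insertion that fails.

```c
#include <stdbool.h>
#include <stddef.h>

struct rt_model   { struct rt_model *next; };
struct node_model { struct rt_model *leaf; };

/* Stand-in for the shared null entry; nothing may be chained onto it. */
static struct rt_model null_entry_model;

/* Add a default route: detach the shared null entry first, and put it
 * back if the insertion fails. */
static int add_default_route(struct node_model *fn, struct rt_model *rt,
                             bool simulate_failure)
{
    bool was_null_entry = (fn->leaf == &null_entry_model);

    if (was_null_entry)
        fn->leaf = NULL;

    if (simulate_failure) {
        if (was_null_entry)
            fn->leaf = &null_entry_model;
        return -1;
    }

    rt->next = fn->leaf;   /* simplified: head insertion, no metric order */
    fn->leaf = rt;
    return 0;
}

/* Delete a route; when the last one goes away, restore the null entry. */
static void del_default_route(struct node_model *fn, struct rt_model *rt)
{
    struct rt_model **pp = &fn->leaf;

    while (*pp && *pp != rt)
        pp = &(*pp)->next;
    if (*pp)
        *pp = rt->next;
    if (!fn->leaf)
        fn->leaf = &null_entry_model;
}
```

The invariant the sketch tries to show is that the shared entry is either the sole leaf or not in the list at all, so no route ever ends up linked after it.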