1. 23 5月, 2015 6 次提交
    • N
    • M
      ipv4: fill in table id when replacing a route · d4e64c29
      Michal Kubeček 提交于
      When replacing an IPv4 route, tb_id member of the new fib_alias
      structure is not set in the replace code path so that the new route is
      ignored.
      
      Fixes: 0ddcf43d ("ipv4: FIB Local/MAIN table collapse")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NAlexander Duyck <alexander.h.duyck@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4e64c29
    • B
      cdc_ncm: Fix tx_bytes statistics · 44f6731d
      Bjørn Mork 提交于
      The tx_curr_frame_payload field is u32. When we try to calculate a
      small negative delta based on it, we end up with a positive integer
      close to 2^32 instead.  So the tx_bytes pointer increases by about
      2^32 for every transmitted frame.
      
      Fix by calculating the delta as a signed long.
      
      Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Reported-by: NFlorian Bruhin <me@the-compiler.org>
      Fixes: 7a1e890e ("usbnet: Fix tx_bytes statistic running backward in cdc_ncm")
      Signed-off-by: NBjørn Mork <bjorn@mork.no>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44f6731d
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 572152ad
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      Netfilter fixes for net
      
      The following patchset contain Netfilter fixes for your net tree, they are:
      
      1) Fix a race in nfnetlink_log and nfnetlink_queue that can lead to a crash.
         This problem is due to wrong order in the per-net registration and netlink
         socket events. Patch from Francesco Ruggeri.
      
      2) Make sure that counters that userspace pass us are higher than 0 in all the
         x_tables frontends. Discovered via Trinity, patch from Dave Jones.
      
      3) Revert a patch for br_netfilter to rely on the conntrack status bits. This
         breaks stateless IPv6 NAT transformations. Patch from Florian Westphal.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      572152ad
    • E
      ipv4: Avoid crashing in ip_error · 381c759d
      Eric W. Biederman 提交于
      ip_error does not check if in_dev is NULL before dereferencing it.
      
      IThe following sequence of calls is possible:
      CPU A                          CPU B
      ip_rcv_finish
          ip_route_input_noref()
              ip_route_input_slow()
                                     inetdev_destroy()
          dst_input()
      
      With the result that a network device can be destroyed while processing
      an input packet.
      
      A crash was triggered with only unicast packets in flight, and
      forwarding enabled on the only network device.   The error condition
      was created by the removal of the network device.
      
      As such it is likely the that error code was -EHOSTUNREACH, and the
      action taken by ip_error (if in_dev had been accessible) would have
      been to not increment any counters and to have tried and likely failed
      to send an icmp error as the network device is going away.
      
      Therefore handle this weird case by just dropping the packet if
      !in_dev.  It will result in dropping the packet sooner, and will not
      result in an actual change of behavior.
      
      Fixes: 251da413 ("ipv4: Cache ip_error() routes even when not forwarding.")
      Reported-by: NVittorio Gambaletta <linuxbugs@vittgam.net>
      Tested-by: NVittorio Gambaletta <linuxbugs@vittgam.net>
      Signed-off-by: NVittorio Gambaletta <linuxbugs@vittgam.net>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      381c759d
    • E
      tcp: fix a potential deadlock in tcp_get_info() · d654976c
      Eric Dumazet 提交于
      Taking socket spinlock in tcp_get_info() can deadlock, as
      inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
      while packet processing can use the reverse locking order.
      
      We could avoid this locking for TCP_LISTEN states, but lockdep would
      certainly get confused as all TCP sockets share same lockdep classes.
      
      [  523.722504] ======================================================
      [  523.728706] [ INFO: possible circular locking dependency detected ]
      [  523.734990] 4.1.0-dbg-DEV #1676 Not tainted
      [  523.739202] -------------------------------------------------------
      [  523.745474] ss/18032 is trying to acquire lock:
      [  523.750002]  (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
      [  523.758129]
      [  523.758129] but task is already holding lock:
      [  523.763968]  (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
      [  523.774661]
      [  523.774661] which lock already depends on the new lock.
      [  523.774661]
      [  523.782850]
      [  523.782850] the existing dependency chain (in reverse order) is:
      [  523.790326]
      -> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
      [  523.796599]        [<ffffffff811126bb>] lock_acquire+0xbb/0x270
      [  523.802565]        [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
      [  523.808628]        [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
      [  523.815273]        [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
      [  523.822067]        [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
      [  523.828199]        [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
      [  523.834331]        [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
      [  523.840202]        [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
      [  523.847214]        [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
      [  523.853440]        [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
      [  523.859624]        [<ffffffff81659db7>] ip_rcv+0x307/0x420
      
      Lets use u64_sync infrastructure instead. As a bonus, 64bit
      arches get optimized, as these are nop for them.
      
      Fixes: 0df48c26 ("tcp: add tcpi_bytes_acked to tcp_info")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d654976c
  2. 22 5月, 2015 1 次提交
    • D
      net: sched: fix call_rcu() race on classifier module unloads · c78e1746
      Daniel Borkmann 提交于
      Vijay reported that a loop as simple as ...
      
        while true; do
          tc qdisc add dev foo root handle 1: prio
          tc filter add dev foo parent 1: u32 match u32 0 0  flowid 1
          tc qdisc del dev foo root
          rmmod cls_u32
        done
      
      ... will panic the kernel. Moreover, he bisected the change
      apparently introducing it to 78fd1d0a ("netlink: Re-add
      locking to netlink_lookup() and seq walker").
      
      The removal of synchronize_net() from the netlink socket
      triggering the qdisc to be removed, seems to have uncovered
      an RCU resp. module reference count race from the tc API.
      Given that RCU conversion was done after e341694e ("netlink:
      Convert netlink_lookup() to use RCU protected hash table")
      which added the synchronize_net() originally, occasion of
      hitting the bug was less likely (not impossible though):
      
      When qdiscs that i) support attaching classifiers and,
      ii) have at least one of them attached, get deleted, they
      invoke tcf_destroy_chain(), and thus call into ->destroy()
      handler from a classifier module.
      
      After RCU conversion, all classifier that have an internal
      prio list, unlink them and initiate freeing via call_rcu()
      deferral.
      
      Meanhile, tcf_destroy() releases already reference to the
      tp->ops->owner module before the queued RCU callback handler
      has been invoked.
      
      Subsequent rmmod on the classifier module is then not prevented
      since all module references are already dropped.
      
      By the time, the kernel invokes the RCU callback handler from
      the module, that function address is then invalid.
      
      One way to fix it would be to add an rcu_barrier() to
      unregister_tcf_proto_ops() to wait for all pending call_rcu()s
      to complete.
      
      synchronize_rcu() is not appropriate as under heavy RCU
      callback load, registered call_rcu()s could be deferred
      longer than a grace period. In case we don't have any pending
      call_rcu()s, the barrier is allowed to return immediately.
      
      Since we came here via unregister_tcf_proto_ops(), there
      are no users of a given classifier anymore. Further nested
      call_rcu()s pointing into the module space are not being
      done anywhere.
      
      Only cls_bpf_delete_prog() may schedule a work item, to
      unlock pages eventually, but that is not in the range/context
      of cls_bpf anymore.
      
      Fixes: 25d8c0d5 ("net: rcu-ify tcf_proto")
      Fixes: 9888faef ("net: sched: cls_basic use RCU")
      Reported-by: NVijay Subramanian <subramanian.vijay@gmail.com>
      Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Tested-by: NVijay Subramanian <subramanian.vijay@gmail.com>
      Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c78e1746
  3. 21 5月, 2015 4 次提交
    • T
      net: phy: Make sure phy_start() always re-enables the phy interrupts · c15e10e7
      Tim Beale 提交于
      This is an alternative way of fixing:
       commit db9683fb ("net: phy: Make sure PHY_RESUMING state change
                            is always processed")
      
      When the PHY state transitions from PHY_HALTED to PHY_RESUMING, there are
      two things we need to do:
      1). Re-enable interrupts (and power up the physical link, if powered down)
      2). Update the PHY state and net-device based on the link status.
      
      There's no strict reason why #1 has to be done from within the main
      phy_state_machine() function. There is a risk that other changes to the
      PHY (e.g. setting speed/duplex, which calls phy_start_aneg()) could cause
      a subsequent state transition before phy_state_machine() has processed
      the PHY_RESUMING state change. This would leave the PHY with interrupts
      disabled and/or still in the BMCR_PDOWN/low-power mode.
      
      Moving enabling the interrupts and phy_resume() into phy_start() will
      guarantee this work always gets done. As the PHY is already in the HALTED
      state and interrupts are disabled, it shouldn't conflict with any work
      being done in phy_state_machine(). The downside of this change is that if
      the PHY_RESUMING state is ever entered from anywhere else, it'll also have
      to repeat this work.
      Signed-off-by: NTim Beale <tim.beale@alliedtelesis.co.nz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c15e10e7
    • D
      Merge branch 'ipv6_ecmp_fixes' · 7764b9dd
      David S. Miller 提交于
      Michal Kubecek says:
      
      ====================
      IPv6 ECMP route add/replace fixes
      
      (1) When adding a nexthop of a multipath route fails (e.g. because of a
      conflict with an existing route), we are supposed to delete nexthops
      already added. However, currently we try to also delete all nexthops we
      haven't even tried to add yet so that a "ip route add" command can
      actually remove pre-existing routes if it fails.
      
      (2) Attempt to replace a multipath route results in a broken siblings
      linked list. Following commands (like "ip route del") can then either
      follow a link into freed memory or end in an infinite loop (if the slab
      object has been reused).
      
      v2: fix an omission in first patch
      
      v3: change the semantics of replace operation to better match IPv4
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7764b9dd
    • M
      ipv6: fix ECMP route replacement · 27596472
      Michal Kubeček 提交于
      When replacing an IPv6 multipath route with "ip route replace", i.e.
      NLM_F_CREATE | NLM_F_REPLACE, fib6_add_rt2node() replaces only first
      matching route without fixing its siblings, resulting in corrupted
      siblings linked list; removing one of the siblings can then end in an
      infinite loop.
      
      IPv6 ECMP implementation is a bit different from IPv4 so that route
      replacement cannot work in exactly the same way. This should be a
      reasonable approximation:
      
      1. If the new route is ECMP-able and there is a matching ECMP-able one
      already, replace it and all its siblings (if any).
      
      2. If the new route is ECMP-able and no matching ECMP-able route exists,
      replace first matching non-ECMP-able (if any) or just add the new one.
      
      3. If the new route is not ECMP-able, replace first matching
      non-ECMP-able route (if any) or add the new route.
      
      We also need to remove the NLM_F_REPLACE flag after replacing old
      route(s) by first nexthop of an ECMP route so that each subsequent
      nexthop does not replace previous one.
      
      Fixes: 51ebd318 ("ipv6: add support of equal cost multipath (ECMP)")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27596472
    • M
      ipv6: do not delete previously existing ECMP routes if add fails · 35f1b4e9
      Michal Kubeček 提交于
      If adding a nexthop of an IPv6 multipath route fails, comment in
      ip6_route_multipath() says we are going to delete all nexthops already
      added. However, current implementation deletes even the routes it
      hasn't even tried to add yet. For example, running
      
        ip route add 1234:5678::/64 \
            nexthop via fe80::aa dev dummy1 \
            nexthop via fe80::bb dev dummy1 \
            nexthop via fe80::cc dev dummy1
      
      twice results in removing all routes first command added.
      
      Limit the second (delete) run to nexthops that succeeded in the first
      (add) run.
      
      Fixes: 51ebd318 ("ipv6: add support of equal cost multipath (ECMP)")
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      35f1b4e9
  4. 20 5月, 2015 7 次提交
  5. 19 5月, 2015 3 次提交
  6. 18 5月, 2015 2 次提交
  7. 17 5月, 2015 4 次提交
    • H
      rhashtable: Add cap on number of elements in hash table · 07ee0722
      Herbert Xu 提交于
      We currently have no limit on the number of elements in a hash table.
      This is a problem because some users (tipc) set a ceiling on the
      maximum table size and when that is reached the hash table may
      degenerate.  Others may encounter OOM when growing and if we allow
      insertions when that happens the hash table perofrmance may also
      suffer.
      
      This patch adds a new paramater insecure_max_entries which becomes
      the cap on the table.  If unset it defaults to max_size * 2.  If
      it is also zero it means that there is no cap on the number of
      elements in the table.  However, the table will grow whenever the
      utilisation hits 100% and if that growth fails, you will get ENOMEM
      on insertion.
      
      As allowing oversubscription is potentially dangerous, the name
      contains the word insecure.
      
      Note that the cap is not a hard limit.  This is done for performance
      reasons as enforcing a hard limit will result in use of atomic ops
      that are heavier than the ones we currently use.
      
      The reasoning is that we're only guarding against a gross over-
      subscription of the table, rather than a small breach of the limit.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      07ee0722
    • T
      net: phy: Make sure PHY_RESUMING state change is always processed · db9683fb
      Tim Beale 提交于
      If phy_start_aneg() was called while the phydev is in the PHY_RESUMING
      state, then its state would immediately transition to PHY_AN (or
      PHY_FORCING). This meant the phy_state_machine() never processed the
      PHY_RESUMING state change, which meant interrupts weren't enabled for the
      PHY. If the PHY used low-power mode (i.e. using BMCR_PDOWN), then the
      physical link wouldn't get powered up again.
      
      There seems no point for phy_start_aneg() to make the PHY_RESUMING -->
      PHY_AN transition, as the state machine will do this anyway. I'm not sure
      about the case where autoneg is disabled, as my patch will change
      behaviour so that the PHY goes to PHY_NOLINK instead of PHY_FORCING. An
      alternative solution would be to move the phy_config_interrupt() and
      phy_resume() work out of the state machine and into phy_start().
      
      The background behind this: we're running linux v3.16.7 and from user-space
      we want to enable the eth port (i.e. do a SIOCSIFFLAGS ioctl with the
      IFF_UP flag) and immediately afterward set the interface's speed/duplex.
      Enabling the interface calls .ndo_open() then phy_start() and the PHY
      transitions PHY_HALTED --> PHY_RESUMING. Setting the speed/duplex ends up
      calling phy_ethtool_sset(), which calls phy_start_aneg() (meanwhile the
      phy_state_machine() hasn't processed the PHY_RESUMING state change yet).
      Signed-off-by: NTim Beale <tim.beale@alliedtelesis.co.nz>
      Reviewed-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      db9683fb
    • H
      netlink: Reset portid after netlink_insert failure · c0bb07df
      Herbert Xu 提交于
      The commit c5adde94 ("netlink:
      eliminate nl_sk_hash_lock") breaks the autobind retry mechanism
      because it doesn't reset portid after a failed netlink_insert.
      
      This means that should autobind fail the first time around, then
      the socket will be stuck in limbo as it can never be bound again
      since it already has a non-zero portid.
      
      Fixes: c5adde94 ("netlink: eliminate nl_sk_hash_lock")
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c0bb07df
    • D
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf · 1d605701
      David S. Miller 提交于
      Pablo Neira Ayuso says:
      
      ====================
      The following patchset contains Netfilter fixes for your net tree, they are:
      
      1) Fix a leak in IPVS, the sysctl table is not released accordingly when
         destroying a netns, patch from Tommi Rantala.
      
      2) Fix a build error when TPROXY and socket are built-in but IPv6 defrag is
         compiled as module, from Florian Westphal.
      
      3) Fix TCP tracket wrt. RFC5961 challenge ACK when in LAST_ACK state, patch
         from Jesper Dangaard Brouer.
      
      4) Fix a bogus WARN_ON() in nf_tables when deleting a set element that stores
         a map, from Mirek Kratochvil.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d605701
  8. 16 5月, 2015 6 次提交
  9. 15 5月, 2015 4 次提交
  10. 14 5月, 2015 3 次提交
    • W
      Bluetooth: Fix remote name event return directly. · 177d0506
      Wesley Kuo 提交于
      This patch fixes hci_remote_name_evt dose not resolve name during
      discovery status is RESOLVING. Before simultaneous dual mode scan enabled,
      hci_check_pending_name will set discovery status to STOPPED eventually.
      Signed-off-by: NWesley Kuo <wesley.kuo@intel.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      177d0506
    • V
      vlan: Correctly propagate promisc|allmulti flags in notifier. · be346ffa
      Vlad Yasevich 提交于
      Currently vlan notifier handler will try to update all vlans
      for a device when that device comes up.  A problem occurs,
      however, when the vlan device was set to promiscuous, but not
      by the user (ex: a bridge).  In that case, dev->gflags are
      not updated.  What results is that the lower device ends
      up with an extra promiscuity count.  Here are the
      backtraces that prove this:
      [62852.052179]  [<ffffffff814fe248>] __dev_set_promiscuity+0x38/0x1e0
      [62852.052186]  [<ffffffff8160bcbb>] ? _raw_spin_unlock_bh+0x1b/0x40
      [62852.052188]  [<ffffffff814fe4be>] ? dev_set_rx_mode+0x2e/0x40
      [62852.052190]  [<ffffffff814fe694>] dev_set_promiscuity+0x24/0x50
      [62852.052194]  [<ffffffffa0324795>] vlan_dev_open+0xd5/0x1f0 [8021q]
      [62852.052196]  [<ffffffff814fe58f>] __dev_open+0xbf/0x140
      [62852.052198]  [<ffffffff814fe88d>] __dev_change_flags+0x9d/0x170
      [62852.052200]  [<ffffffff814fe989>] dev_change_flags+0x29/0x60
      
      The above comes from the setting the vlan device to IFF_UP state.
      
      [62852.053569]  [<ffffffff814fe248>] __dev_set_promiscuity+0x38/0x1e0
      [62852.053571]  [<ffffffffa032459b>] ? vlan_dev_set_rx_mode+0x2b/0x30
      [8021q]
      [62852.053573]  [<ffffffff814fe8d5>] __dev_change_flags+0xe5/0x170
      [62852.053645]  [<ffffffff814fe989>] dev_change_flags+0x29/0x60
      [62852.053647]  [<ffffffffa032334a>] vlan_device_event+0x18a/0x690
      [8021q]
      [62852.053649]  [<ffffffff8161036c>] notifier_call_chain+0x4c/0x70
      [62852.053651]  [<ffffffff8109d456>] raw_notifier_call_chain+0x16/0x20
      [62852.053653]  [<ffffffff814f744d>] call_netdevice_notifiers+0x2d/0x60
      [62852.053654]  [<ffffffff814fe1a3>] __dev_notify_flags+0x33/0xa0
      [62852.053656]  [<ffffffff814fe9b2>] dev_change_flags+0x52/0x60
      [62852.053657]  [<ffffffff8150cd57>] do_setlink+0x397/0xa40
      
      And this one comes from the notification code.  What we end
      up with is a vlan with promiscuity count of 1 and and a physical
      device with a promiscuity count of 2.  They should both have
      a count 1.
      
      To resolve this issue, vlan code can use dev_get_flags() api
      which correctly masks promiscuity and allmulti flags.
      Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      be346ffa
    • D
      Bluetooth: ath3k: add support of 04ca:300f AR3012 device · ec0810d2
      Dmitry Tunin 提交于
      BugLink: https://bugs.launchpad.net/bugs/1449730
      
      T:  Bus=01 Lev=01 Prnt=01 Port=04 Cnt=02 Dev#=  3 Spd=12  MxCh= 0
      D:  Ver= 1.10 Cls=e0(wlcon) Sub=01 Prot=01 MxPS=64 #Cfgs=  1
      P:  Vendor=04ca ProdID=300f Rev=00.01
      C:  #Ifs= 2 Cfg#= 1 Atr=e0 MxPwr=100mA
      I:  If#= 0 Alt= 0 #EPs= 3 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      I:  If#= 1 Alt= 0 #EPs= 2 Cls=e0(wlcon) Sub=01 Prot=01 Driver=btusb
      Signed-off-by: NDmitry Tunin <hanipouspilot@gmail.com>
      Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>
      ec0810d2