1. 13 Mar, 2015 3 commits
    • ebpf: verifier: check that call reg with ARG_ANYTHING is initialized · 80f1d68c
      Daniel Borkmann committed
      I noticed that a helper function with argument type ARG_ANYTHING does
      not need to have an initialized value (register).
      
      This can worst case lead to unintended stack memory leakage in future
      helper functions if they are not carefully designed, or unintended
      application behaviour in case the application developer was not careful
      enough to match a correct helper function signature in the API.
      
      The underlying issue is that ARG_ANYTHING should actually be split
      into two different semantics:
      
        1) ARG_DONTCARE for function arguments that the helper function
           does not care about (in other words: the default for unused
           function arguments), and
      
        2) ARG_ANYTHING that is an argument actually being used by a
           helper function and *guaranteed* to be an initialized register.
      
      The current risk is low: ARG_ANYTHING is only used for the 'flags'
      argument (r4) in bpf_map_update_elem() that internally does strict
      checking.
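
The proposed split can be illustrated with a minimal sketch. This is not the kernel's verifier code: `reg_state`, `check_func_arg()` and the -1 return value are simplified stand-ins, assumed here only to show that an ARG_DONTCARE slot accepts any register while an ARG_ANYTHING slot rejects an uninitialized one.

```c
/* Hypothetical, simplified model of a verifier argument check. */
enum reg_state { REG_NOT_INIT, REG_INITIALIZED };
enum arg_type  { ARG_DONTCARE, ARG_ANYTHING };

/*
 * Returns 0 if the register may be passed for this argument type,
 * -1 (standing in for an error such as -EACCES) otherwise.
 */
static int check_func_arg(enum reg_state reg, enum arg_type arg)
{
	if (arg == ARG_DONTCARE)
		return 0;	/* the helper never looks at this argument */

	if (reg == REG_NOT_INIT)
		return -1;	/* ARG_ANYTHING is read, so it must be initialized */

	return 0;
}
```

With this rule, a program that calls a helper without setting a register used as ARG_ANYTHING is rejected at load time instead of passing stale data to the helper.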
      
      Fixes: 17a52670 ("bpf: verifier (add verifier core)")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      80f1d68c
    • net: Introduce possible_net_t · 0c5c9fb5
      Eric W. Biederman committed
      Having to say
      > #ifdef CONFIG_NET_NS
      > 	struct net *net;
      > #endif
      
      in structures is a little bit wordy and a little bit error prone.
      
      Instead it is possible to say:
      > typedef struct {
      > #ifdef CONFIG_NET_NS
      >       struct net *net;
      > #endif
      > } possible_net_t;
      
      And then in a header say:
      
      > 	possible_net_t net;
      
      Which is cleaner and easier to use and easier to test, as the
      possible_net_t is always there no matter what the compile options.
      
      Further this allows read_pnet and write_pnet to be functions in all
      cases which is better at catching typos.
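
A minimal userspace sketch of the accessor pattern described above. The `struct net` stand-in, the `init_net` fallback and the local `#define CONFIG_NET_NS` are assumptions made so the fragment compiles on its own; the kernel's actual definitions may differ in detail.

```c
/* In the kernel this comes from the build configuration; defined here
 * only so the sketch models a namespace-enabled build. */
#define CONFIG_NET_NS 1

/* Stand-in for the kernel's struct net. */
struct net { int id; };

/* Stand-in for the initial namespace used when namespaces are disabled. */
struct net init_net = { 0 };

typedef struct {
#ifdef CONFIG_NET_NS
	struct net *net;
#endif
} possible_net_t;

/* Always a real function, so a typo in the argument is a compile error
 * in every configuration. */
static inline void write_pnet(possible_net_t *pnet, struct net *net)
{
#ifdef CONFIG_NET_NS
	pnet->net = net;
#else
	(void)pnet;
	(void)net;
#endif
}

static inline struct net *read_pnet(const possible_net_t *pnet)
{
#ifdef CONFIG_NET_NS
	return pnet->net;
#else
	(void)pnet;
	return &init_net;
#endif
}
```

Because the member is always declared as `possible_net_t net;`, the structures that embed it need no `#ifdef` of their own.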
      
      This change adds possible_net_t, updates the definitions of read_pnet
      and write_pnet, updates optional struct net * variables that
      write_pnet is used on to have the type possible_net_t, and finally fixes
      up the b0rked users of read_pnet and write_pnet.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0c5c9fb5
    • net: Kill hold_net release_net · efd7ef1c
      Eric W. Biederman committed
      hold_net and release_net were an idea that turned out to be useless.
      The code has been disabled since 2008.  Kill the code; it is long past due.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      efd7ef1c
  2. 12 Mar, 2015 3 commits
    • net: add real socket cookies · 33cf7c90
      Eric Dumazet committed
      A long standing problem in netlink socket dumps is the use
      of kernel socket addresses as cookies.
      
      1) It is a security concern.
      
      2) Sockets can be reused quite quickly, so there is
         no guarantee a cookie is used once and identifies
         a flow.
      
      3) request sock, establish sock, and timewait socks
         for a given flow have different cookies.
      
      Part of our effort to bring better TCP statistics requires
      switching to a different allocator.
      
      In this patch, I chose to use a per network namespace 64bit generator,
      and to use it only in the case a socket needs to be dumped to netlink.
      (This might be refined later if needed)
      
      Note that I tried to carry cookies from request sock, to establish sock,
      then timewait sockets.
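
The lazy allocation described above can be sketched in userspace C11 as follows. The names `netns`, `sock` and `sock_cookie_get()` are hypothetical, and the compare-and-swap loop is one plausible way to assign a cookie only on first use; it is not taken from the patch itself.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins for the per-namespace generator and the socket. */
struct netns { atomic_uint_fast64_t cookie_gen; };
struct sock  { atomic_uint_fast64_t cookie; };	/* 0 means "not assigned yet" */

/*
 * Return the socket's cookie, drawing a fresh value from the namespace
 * generator the first time the socket is dumped.
 */
static uint64_t sock_cookie_get(struct netns *ns, struct sock *sk)
{
	for (;;) {
		uint64_t cur = atomic_load(&sk->cookie);

		if (cur)
			return cur;	/* already assigned, stays stable */

		uint64_t fresh = atomic_fetch_add(&ns->cookie_gen, 1) + 1;
		uint_fast64_t expected = 0;

		/* Only the first caller installs its value; everyone then
		 * loops and reads back whichever value won. */
		atomic_compare_exchange_strong(&sk->cookie, &expected, fresh);
	}
}
```

Unlike a kernel address, a value produced this way reveals nothing about memory layout and is not handed out again when the socket's memory is reused.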
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Eric Salo <salo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33cf7c90
    • of: mdio: export of_mdio_parse_addr · 33d67377
      Florian Fainelli committed
      Export of_mdio_parse_addr() which allows parsing a given Ethernet PHY
      node MDIO address, verify it is within the allowed range, and return
      its value. This is going to be useful for the DSA code which needs to
      deal with multiple layers of MDIO buses.
      Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
      Acked-by: Rob Herring <robh@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      33d67377
    • rhashtable: Move hash_rnd into bucket_table · 988dfbd7
      Herbert Xu committed
      Currently hash_rnd is a parameter that users can set.  However,
      no existing users set this parameter.  It is also something that
      people are unlikely to want to set directly since it's just a
      random number.
      
      In preparation for allowing the reseeding/rehashing of rhashtable,
      this patch moves hash_rnd into bucket_table so that it's now an
      internal state rather than a parameter.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Acked-by: Thomas Graf <tgraf@suug.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      988dfbd7
  3. 11 Mar, 2015 2 commits
  4. 10 Mar, 2015 4 commits
  5. 09 Mar, 2015 2 commits
  6. 07 Mar, 2015 3 commits
  7. 06 Mar, 2015 3 commits
    • netdevice: add IPv4 fib add/del ops · 4586f1bb
      Scott Feldman committed
      Add two new ndo ops for IPv4 fib offload support, add and del.  Add uses
      modify semantics if the fib entry is already offloaded.  Drivers implementing the new
      ndo ops will return err<0 if programming the device fails, for example if the device's
      tables are full.
      Signed-off-by: Scott Feldman <sfeldma@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      4586f1bb
    • cpuidle / sleep: Use broadcast timer for states that stop local timer · ef2b22ac
      Rafael J. Wysocki committed
      Commit 38106313 (PM / sleep: Re-implement suspend-to-idle handling)
      overlooked the fact that entering some sufficiently deep idle states
      by CPUs may cause their local timers to stop and in those cases it
      is necessary to switch over to a broadcast timer prior to entering
      the idle state.  If the cpuidle driver in use does not provide
      the new ->enter_freeze callback for any of the idle states, that
      problem affects suspend-to-idle too, but it is not taken into account
      after the changes made by commit 38106313.
      
      Fix that by changing the definition of cpuidle_enter_freeze() and
      re-arranging of the code in cpuidle_idle_call(), so the former does
      not call cpuidle_enter() any more and the fallback case is handled
      by cpuidle_idle_call() directly.
      
      Fixes: 38106313 (PM / sleep: Re-implement suspend-to-idle handling)
      Reported-and-tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      ef2b22ac
    • bridge: Extend Proxy ARP design to allow optional rules for Wi-Fi · 842a9ae0
      Jouni Malinen committed
      This extends the design in commit 95850116 ("bridge: Add support for
      IEEE 802.11 Proxy ARP") with an optional set of rules that are needed to
      meet the IEEE 802.11 and Hotspot 2.0 requirements for ProxyARP. The
      previously added BR_PROXYARP behavior is left as-is and a new
      BR_PROXYARP_WIFI alternative is added so that this behavior can be
      configured from user space when required.
      
      In addition, this enables proxyarp functionality for unicast ARP
      requests for both BR_PROXYARP and BR_PROXYARP_WIFI since it is possible
      to use unicast as well as broadcast for these frames.
      
      The key differences in functionality:
      
      BR_PROXYARP:
      - uses the flag on the bridge port on which the request frame was
        received to determine whether to reply
      - block bridge port flooding completely on ports that enable proxy ARP
      
      BR_PROXYARP_WIFI:
      - uses the flag on the bridge port to which the target device of the
        request belongs
      - block bridge port flooding selectively based on whether the proxyarp
        functionality replied
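
The two behaviours listed above can be summarized in a small decision sketch. The flag values, `struct port` and both function names are hypothetical and serve only to contrast the modes; this is not the bridge code.

```c
#include <stdbool.h>

/* Illustrative flag values, not the kernel's. */
#define BR_PROXYARP		(1u << 0)
#define BR_PROXYARP_WIFI	(1u << 1)

/* Hypothetical stand-in for a bridge port. */
struct port { unsigned int flags; };

/*
 * Decide whether to answer an ARP request on behalf of the target.
 * BR_PROXYARP is checked on the port the request arrived on,
 * BR_PROXYARP_WIFI on the port the target device sits behind.
 */
static bool proxyarp_should_reply(const struct port *in,
				  const struct port *target)
{
	return (in->flags & BR_PROXYARP) ||
	       (target->flags & BR_PROXYARP_WIFI);
}

/*
 * Decide whether a frame may still be flooded out of a port.
 * BR_PROXYARP blocks flooding always, BR_PROXYARP_WIFI only when
 * the proxy has already replied to the request.
 */
static bool proxyarp_allow_flood(const struct port *out, bool replied)
{
	if (out->flags & BR_PROXYARP)
		return false;
	if ((out->flags & BR_PROXYARP_WIFI) && replied)
		return false;
	return true;
}
```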
      Signed-off-by: Jouni Malinen <jouni@codeaurora.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      842a9ae0
  8. 05 Mar, 2015 4 commits
    • workqueue: fix hang involving racing cancel[_delayed]_work_sync()'s for PREEMPT_NONE · 8603e1b3
      Tejun Heo committed
      cancel[_delayed]_work_sync() are implemented using
      __cancel_work_timer() which grabs the PENDING bit using
      try_to_grab_pending() and then flushes the work item with PENDING set
      to prevent the on-going execution of the work item from requeueing
      itself.
      
      try_to_grab_pending() can always grab PENDING bit without blocking
      except when someone else is doing the above flushing during
      cancelation.  In that case, try_to_grab_pending() returns -ENOENT.  In
      this case, __cancel_work_timer() currently invokes flush_work().  The
      assumption is that the completion of the work item is what the other
      canceling task would be waiting for too and thus waiting for the same
      condition and retrying should allow forward progress without excessive
      busy looping.
      
      Unfortunately, this doesn't work if preemption is disabled or the
      latter task has real time priority.  Let's say task A just got woken
      up from flush_work() by the completion of the target work item.  If,
      before task A starts executing, task B gets scheduled and invokes
      __cancel_work_timer() on the same work item, its try_to_grab_pending()
      will return -ENOENT as the work item is still being canceled by task A
      and flush_work() will also immediately return false as the work item
      is no longer executing.  This puts task B in a busy loop possibly
      preventing task A from executing and clearing the canceling state on
      the work item leading to a hang.
      
      task A			task B			worker
      
      						executing work
      __cancel_work_timer()
        try_to_grab_pending()
        set work CANCELING
        flush_work()
          block for work completion
      						completion, wakes up A
      			__cancel_work_timer()
      			while (forever) {
      			  try_to_grab_pending()
      			    -ENOENT as work is being canceled
      			  flush_work()
      			    false as work is no longer executing
      			}
      
      This patch removes the possible hang by updating __cancel_work_timer()
      to explicitly wait for clearing of CANCELING rather than invoking
      flush_work() after try_to_grab_pending() fails with -ENOENT.
      
      Link: http://lkml.kernel.org/g/20150206171156.GA8942@axis.com
      
      v3: bit_waitqueue() can't be used for work items defined in vmalloc
          area.  Switched to custom wake function which matches the target
          work item and exclusive wait and wakeup.
      
      v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if
          the target bit waitqueue has wait_bit_queue's on it.  Use
          DEFINE_WAIT_BIT() and __wake_up_bit() instead.  Reported by Tomeu
          Vizoso.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Rabin Vincent <rabin.vincent@axis.com>
      Cc: Tomeu Vizoso <tomeu.vizoso@gmail.com>
      Cc: stable@vger.kernel.org
      Tested-by: Jesper Nilsson <jesper.nilsson@axis.com>
      Tested-by: Rabin Vincent <rabin.vincent@axis.com>
      8603e1b3
    • bcma: move internal function declarations to private header · 0a4e699a
      Rafał Miłecki committed
      These functions are not exported nor used anywhere, so there is no
      reason to put them in public headers.
      Also drop unused bcma_chipco_(suspend|resume).
      Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      0a4e699a
    • bcma: make bcma_host_pci_(up|down) calls safe for every config · c32ec2a1
      Rafał Miłecki committed
      We were providing declarations but actual code was compiled only with
      CONFIG_BCMA_HOST_PCI set. This could result in:
      ERROR: "bcma_host_pci_down" [drivers/net/wireless/brcm80211/brcmsmac/brcmsmac.ko] undefined!
      ERROR: "bcma_host_pci_up" [drivers/net/wireless/brcm80211/brcmsmac/brcmsmac.ko] undefined!
      ERROR: "bcma_host_pci_down" [drivers/net/wireless/b43/b43.ko] undefined!
      ERROR: "bcma_host_pci_up" [drivers/net/wireless/b43/b43.ko] undefined!
      Reported-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Rafał Miłecki <zajec5@gmail.com>
      Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
      c32ec2a1
    • genirq / PM: Add flag for shared NO_SUSPEND interrupt lines · 17f48034
      Rafael J. Wysocki committed
      It currently is required that all users of NO_SUSPEND interrupt
      lines pass the IRQF_NO_SUSPEND flag when requesting the IRQ or the
      WARN_ON_ONCE() in irq_pm_install_action() will trigger.  That is
      done to warn about situations in which unprepared interrupt handlers
      may be run unnecessarily for suspended devices and may attempt to
      access those devices by mistake.  However, it may cause drivers
      that have no technical reasons for using IRQF_NO_SUSPEND to set
      that flag just because they happen to share the interrupt line
      with something like a timer.
      
      Moreover, the generic handling of wakeup interrupts introduced by
      commit 9ce7a258 (genirq: Simplify wakeup mechanism) only works
      for IRQs without any NO_SUSPEND users, so the drivers of wakeup
      devices needing to use shared NO_SUSPEND interrupt lines for
      signaling system wakeup generally have to detect wakeup in their
      interrupt handlers.  Thus if they happen to share an interrupt line
      with a NO_SUSPEND user, they also need to request that their
      interrupt handlers be run after suspend_device_irqs().
      
      In both cases the reason for using IRQF_NO_SUSPEND is not because
      the driver in question has a genuine need to run its interrupt
      handler after suspend_device_irqs(), but because it happens to
      share the line with some other NO_SUSPEND user.  Otherwise, the
      driver would do without IRQF_NO_SUSPEND just fine.
      
      To make it possible to specify that condition explicitly, introduce
      a new IRQ action handler flag for shared IRQs, IRQF_COND_SUSPEND,
      that, when set, will indicate to the IRQ core that the interrupt
      user is generally fine with suspending the IRQ, but it also can
      tolerate handler invocations after suspend_device_irqs() and, in
      particular, it is capable of detecting system wakeup and triggering
      it as appropriate from its interrupt handler.
      
      That will allow us to work around a problem with a shared timer
      interrupt line on at91 platforms.
      
      Link: http://marc.info/?l=linux-kernel&m=142252777602084&w=2
      Link: http://marc.info/?t=142252775300011&r=1&w=2
      Link: https://lkml.org/lkml/2014/12/15/552
      Reported-by: Boris Brezillon <boris.brezillon@free-electrons.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      17f48034
  9. 04 Mar, 2015 2 commits
    • mpls: Basic routing support · 0189197f
      Eric W. Biederman committed
      This change adds a new Kconfig option MPLS_ROUTING.
      
      The core of this change is the code to look at an mpls packet received
      from another machine.  Look that packet up in a routing table and
      forward the packet on.
      
      Support of MPLS over ATM is not considered or attempted here.  This
      implementation follows RFC3032 and implements the MPLS shim header that
      can pass over essentially any network.
      
      What RFC3031 refers to as the Incoming Label Map (ILM) I call
      net->mpls.platform_label[].  What RFC3031 refers to as the Next Hop
      Label Forwarding Entry (NHLFE) I call mpls_route.  Though calling it the
      label forwarding information base (lfib) might also be valid.
      
      Further the implementation forwards packets as described in RFC3032.
      There is no need and given the original motivation for MPLS a strong
      disincentive to have a flexible label forwarding path.  In essence
      the logic is the topmost label is read, looked up, removed, and
      replaced by 0 or more new labels and then sent out the specified
      interface to its next hop.
      
      Quite a few optional features are not implemented here.  Among them
      are generation of ICMP errors when the TTL is exceeded or the packet
      is larger than the next hop MTU (those conditions are detected and the
      packets are dropped instead of generating an icmp error).  The traffic
      class field is always set to 0.  The implementation focuses on IP over
      MPLS and does not handle egress of other kinds of protocols.
      
      Instead of implementing coordination with the neighbour table and
      sorting out how to input next hops in a different address family (for
      which there is value), I was lazy and implemented a next hop mac
      address instead.  The code is simpler and there are flavors of MPLS
      such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
      appropriate so a next hop by mac address would need to be implemented
      at some point.
      
      Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.
      
      Decoding the mpls header must be done by first byteswapping a 32bit big
      endian word into the local cpu endian and then bit shifting to extract
      the pieces.  There is no C bit-field that can represent a wire format
      mpls header on a little endian machine as the low bits of the 20bit
      label wind up in the wrong half of the third byte.  Therefore internally
      everything is dealt with in cpu native byte order except when writing
      to and reading from a packet.
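
The decode order described above (byteswap first, then shift) looks roughly like this in portable C. The field layout is the RFC3032 label stack entry (20-bit label, 3-bit traffic class, 1-bit bottom-of-stack, 8-bit TTL); `struct mpls_shim` and `mpls_shim_decode()` are illustrative names, not identifiers from the patch.

```c
#include <arpa/inet.h>	/* ntohl() */
#include <stdint.h>
#include <string.h>

/* Decoded label stack entry, held in cpu native byte order. */
struct mpls_shim {
	uint32_t label;	/* 20 bits */
	uint8_t  tc;	/* 3 bits, traffic class */
	uint8_t  bos;	/* 1 bit, bottom of stack */
	uint8_t  ttl;	/* 8 bits */
};

static struct mpls_shim mpls_shim_decode(const uint8_t wire[4])
{
	struct mpls_shim out;
	uint32_t be, entry;

	/* Read the 4 wire bytes as one big endian word, then convert it to
	 * cpu byte order before touching any field. */
	memcpy(&be, wire, sizeof(be));
	entry = ntohl(be);

	out.label = (entry & 0xFFFFF000u) >> 12;
	out.tc    = (uint8_t)((entry & 0x00000E00u) >> 9);
	out.bos   = (uint8_t)((entry & 0x00000100u) >> 8);
	out.ttl   = (uint8_t)(entry & 0x000000FFu);
	return out;
}
```

Shifting after the swap gives the same result on little and big endian hosts, which a bit-field overlay on the raw bytes would not.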
      
      For management simplicity if a label is configured to forward out
      an interface that is down the packet is dropped early.  Similarly
      if a network interface is removed rt_dev is updated to NULL
      (so no reference is preserved) and any packets for that label
      are dropped.  Keeping the label entries in the kernel allows
      the kernel label table to function as the definitive source
      of which labels are allocated and which are not.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0189197f
    • NFS: Fix a regression in the read() syscall · 874f9463
      Trond Myklebust committed
      When invalidating the page cache for a regular file, we want to first
      sync all dirty data to disk and then call invalidate_inode_pages2().
      The latter relies on nfs_launder_page() and nfs_release_page() to deal
      respectively with dirty pages, and unstable written pages.
      
      When commit 95905446 ("NFS: avoid deadlocks with loop-back mounted
      NFS filesystems.") changed the behaviour of nfs_release_page(), it
      made it possible for invalidate_inode_pages2() to fail with an EBUSY.
      Unfortunately, that error is then propagated back to read().
      
      Let's therefore work around the problem for now by protecting the call
      to sync the data and invalidate_inode_pages2() so that they are atomic
      w.r.t. the addition of new writes.
      Later on, we can revisit whether or not we still need nfs_launder_page()
      and nfs_release_page().
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
      874f9463
  10. 03 Mar, 2015 5 commits
  11. 02 Mar, 2015 9 commits