1. 11 Jan 2014, 1 commit
    • net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Jason Wang committed
      Currently, the tx queue was selected implicitly in ndo_dfwd_start_xmit(). This
      will cause several issues:
      
      - NETIF_F_LLTX was removed for macvlan, so txq locking was done for the macvlan
        device instead of the lower device, which misses the necessary txq
        synchronization for the lower device, such as txq stopping or freezing
        required by the dev watchdog or the control path.
      - dev_hard_start_xmit() was called with a NULL txq, which bypasses the net
        device watchdog.
      - dev_hard_start_xmit() does not check txq everywhere, which will lead to a
        crash when tso is disabled for the lower device.
      
      Fix this by explicitly introducing a new parameter for .ndo_select_queue() that
      is used just for selecting queues in the case of l2 forwarding offload.
      netdev_pick_tx() was also extended to accept this parameter, and
      dev_queue_xmit_accel() was used to do the l2 forwarding transmission.
      
      With these fixes, NETIF_F_LLTX can be preserved for macvlan and there's no need
      to check txq against NULL in dev_hard_start_xmit(). There's also no need to keep
      a dedicated ndo_dfwd_start_xmit(); we can just reuse the code of
      dev_queue_xmit() to do the transmission.
      
      This is also required for future macvtap l2 forwarding support, since it
      provides a necessary synchronization method.
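      
      A userspace-style sketch of the idea, assuming nothing about the real
      interfaces beyond what is described above: the select-queue hook receives an
      opaque accel cookie and returns an explicit queue index, and the common
      transmit path then takes that queue's lock on the lower device (all names
      and types are illustrative):
      
        #include <pthread.h>
        #include <stdio.h>
        
        #define NUM_TXQ 4
        
        struct lower_dev {
                pthread_mutex_t txq_lock[NUM_TXQ];
                /* mirrors the extended select-queue hook described above */
                unsigned int (*select_queue)(struct lower_dev *dev,
                                             const void *pkt, void *accel_priv);
        };
        
        static unsigned int pick_txq(struct lower_dev *dev,
                                     const void *pkt, void *accel_priv)
        {
                (void)dev; (void)accel_priv;    /* a real driver could use these */
                return (unsigned int)((unsigned long)pkt % NUM_TXQ);
        }
        
        /* common transmit path: the queue is chosen explicitly, so stopped or
         * frozen lower-device txq state can be honoured in one place */
        static void xmit_accel(struct lower_dev *dev, const void *pkt,
                               void *accel_priv)
        {
                unsigned int q = dev->select_queue(dev, pkt, accel_priv);
        
                pthread_mutex_lock(&dev->txq_lock[q]);
                printf("transmitting on txq %u\n", q);
                pthread_mutex_unlock(&dev->txq_lock[q]);
        }
        
        int main(void)
        {
                struct lower_dev dev = { .select_queue = pick_txq };
        
                for (int i = 0; i < NUM_TXQ; i++)
                        pthread_mutex_init(&dev.txq_lock[i], NULL);
        
                int pkt = 42;
                xmit_accel(&dev, &pkt, /* accel_priv */ &dev);
                return 0;
        }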
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: John Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      f663dd9a
  2. 03 Jan 2014, 2 commits
    • ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC · 7a7ffbab
      Wei-Chun Chao committed
      VM to VM GSO traffic is broken if it goes through VXLAN or GRE
      tunnel and the physical NIC on the host supports hardware VXLAN/GRE
      GSO offload (e.g. bnx2x and next-gen mlx4).
      
      Two issues -
      (VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with
      SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header
      integrity check fails in udp4_ufo_fragment if inner protocol is
      TCP. Also, gso_segs is calculated incorrectly using skb->len, which
      includes the tunnel header. Fix: the robust check should only be applied
      to the inner packet.
      
      (VXLAN & GRE) Once GSO header integrity check passes, NULL segs
      is returned and the original skb is sent to hardware. However the
      tunnel header is already pulled. Fix: tunnel header needs to be
      restored so that hardware can perform GSO properly on the original
      packet.
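      
      A userspace analogue of the second fix (none of these names are the kernel's
      skb API): a validation step that pulls the outer tunnel header must push it
      back before handing on the original, unsegmented packet:
      
        #include <stdio.h>
        #include <stddef.h>
        
        struct pkt {
                size_t data;            /* offset of the current header */
                size_t len;             /* bytes from 'data' to the packet end */
        };
        
        static void pull(struct pkt *p, size_t n) { p->data += n; p->len -= n; }
        static void push(struct pkt *p, size_t n) { p->data -= n; p->len += n; }
        
        /* returns 1 if the packet may be passed through unsegmented */
        static int validate_inner(struct pkt *p, size_t tnl_hlen)
        {
                pull(p, tnl_hlen);              /* inspect the inner packet */
                int pass_through = 1;           /* pretend the checks succeeded */
        
                if (pass_through)
                        push(p, tnl_hlen);      /* restore the outer header */
                return pass_through;
        }
        
        int main(void)
        {
                struct pkt p = { .data = 0, .len = 100 };
        
                if (validate_inner(&p, 8))
                        printf("pass-through, header offset restored to %zu\n",
                               p.data);
                return 0;
        }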
      Signed-off-by: Wei-Chun Chao <weichunc@plumgrid.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7a7ffbab
    • sctp: Remove outqueue empty state · 619a60ee
      Vlad Yasevich committed
      The SCTP outqueue structure maintains the list of data chunks
      that are pending transmission, the list of chunks that
      are pending retransmission, and the length of data in
      flight.  It also tries to track an empty state so that
      it can perform the shutdown sequence or notify the user.
      
      The problem is that the empty state is inconsistently
      tracked.  It is possible to completely drain the queue
      without sending anything when using PR-SCTP.  In this
      case, the empty state will not be correctly set, as
      reported by Jamal Hadi Salim <jhs@mojatatu.com>.  This
      can cause an association to be permanently stuck in the
      SHUTDOWN_PENDING state.
      
      Additionally, SCTP is incredibly inefficient when setting
      the empty state.  Even though all the data is available
      in the outqueue structure, we ignore it and walk a list
      of transports.
      
      In the end, we can completely remove the extra empty
      state and figure out if the queue is empty by looking
      at 3 things: the length of pending data, the length of
      in-flight data, and the existence of retransmit data.  All
      of these are already in the structure.
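      
      A minimal sketch of that emptiness test, using an illustrative structure
      rather than SCTP's real one:
      
        #include <stdbool.h>
        #include <stddef.h>
        
        struct chunk_list { struct chunk_list *next; };
        
        struct outq {
                size_t out_qlen;                /* bytes queued for transmission */
                size_t outstanding_bytes;       /* bytes in flight */
                struct chunk_list *retransmit;  /* chunks awaiting retransmission */
        };
        
        /* empty exactly when nothing is pending, nothing is in flight,
         * and nothing awaits retransmission */
        static inline bool outq_is_empty(const struct outq *q)
        {
                return q->out_qlen == 0 &&
                       q->outstanding_bytes == 0 &&
                       q->retransmit == NULL;
        }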
      Reported-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
      Acked-by: Neil Horman <nhorman@tuxdriver.com>
      Tested-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      619a60ee
  3. 02 Jan 2014, 1 commit
    • net: llc: fix order of evaluation in llc_conn_ac_inc_vr_by_1 · 7e030963
      Daniel Borkmann committed
      Function llc_conn_ac_inc_vr_by_1() evaluates via macro
      PDU_GET_NEXT_Vr() into ...
      
        llc_sk(sk)->vR = ++llc_sk(sk)->vR & 0xffffffffffffff7f
      
      ... but the order in which the side effects take place is
      undefined because there is no intervening sequence point.
      
      As llc_sk(sk)->vR is written by the assignment (its left-hand
      side) and also written by the pre-increment in
      ++llc_sk(sk)->vR & 0xffffffffffffff7f, this might possibly
      yield undefined behavior.
      
      The final value of llc_sk(sk)->vR is ambiguous, because,
      depending on the order of expression evaluation, the
      increment may occur before, after, or interleaved with
      the assignment. In C, evaluating such an expression yields
      undefined behavior.
      
      Since we're doing the increment via the PDU_GET_NEXT_Vr() macro
      and the only place it is being used is
      llc_conn_ac_inc_vr_by_1(), rewrite the expression into
      ((vR + 1) & CONST) so that vR is incremented by 1 with a
      follow-up optimized modulo, fixing this.
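      
      The before/after pattern, reduced to a standalone example (the 0x7f mask is
      illustrative; the kernel uses its own constant):
      
        #include <stdio.h>
        
        unsigned char vR;
        
        int main(void)
        {
                /* Broken form: vR is modified both by the pre-increment and
                 * by the assignment, with no intervening sequence point, so
                 * the result is undefined:
                 *
                 *      vR = ++vR & 0x7f;
                 *
                 * Fixed form: only the assignment writes vR.
                 */
                vR = (vR + 1) & 0x7f;
        
                printf("vR = %d\n", vR);
                return 0;
        }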
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      7e030963
  4. 01 Jan 2014, 1 commit
    • vlan: Fix header ops passthru when doing TX VLAN offload. · 2205369a
      David S. Miller committed
      When the vlan code detects that the real device can do TX VLAN offloads
      in hardware, it tries to arrange for the real device's header_ops to
      be invoked directly.
      
      But it does so illegally, by simply hooking the real device's
      header_ops up to the VLAN device.
      
      This doesn't work because we will end up invoking a set of header_ops
      routines which expect a device type which matches the real device, but
      will see a VLAN device instead.
      
      Fix this by providing a pass-thru set of header_ops which will arrange
      to pass the proper real device instead.
      
      To facilitate this, add a dev_rebuild_header().  There are
      implementations which provide a ->cache and ->create but not a
      ->rebuild (e.g. PLIP).  So we need a helper function just like
      dev_hard_header() to avoid crashes.
      
      Use this helper in the one existing place where the
      header_ops->rebuild was being invoked, the neighbour code.
      
      With lots of help from Florian Westphal.
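      
      A userspace-style sketch of the pass-thru pattern (types and names are
      illustrative, not the kernel's): wrapper ops substitute the real device
      before delegating, so the real routines never see the VLAN device:
      
        #include <stdio.h>
        
        struct device;
        
        struct header_ops {
                int (*create)(struct device *dev, const char *daddr);
        };
        
        struct device {
                const char *name;
                const struct header_ops *ops;
                struct device *real_dev;        /* set only for the VLAN device */
        };
        
        static int eth_create(struct device *dev, const char *daddr)
        {
                /* a real driver's routine expects 'dev' to be the real device */
                printf("building header on %s for %s\n", dev->name, daddr);
                return 0;
        }
        
        static const struct header_ops real_ops = { .create = eth_create };
        
        /* wrapper: substitute the real device before delegating */
        static int passthru_create(struct device *dev, const char *daddr)
        {
                return dev->real_dev->ops->create(dev->real_dev, daddr);
        }
        
        static const struct header_ops passthru_ops = { .create = passthru_create };
        
        int main(void)
        {
                struct device eth0 = { "eth0", &real_ops, NULL };
                struct device vlan = { "eth0.100", &passthru_ops, &eth0 };
        
                vlan.ops->create(&vlan, "peer");  /* eth_create still sees eth0 */
                return 0;
        }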
      Signed-off-by: David S. Miller <davem@davemloft.net>
      2205369a
  5. 31 Dec 2013, 1 commit
    • ACPIPHP / radeon / nouveau: Fix VGA switcheroo problem related to hotplug · f244d8b6
      Rafael J. Wysocki committed
      The changes in the ACPI-based PCI hotplug (ACPIPHP) subsystem made
      during the 3.12 development cycle uncovered a problem with VGA
      switcheroo: on some systems, when the device-specific method
      (ATPX in the radeon case, _DSM in the nouveau case) is used to turn
      off the discrete graphics, the BIOS generates ACPI hotplug events for
      that device, and those events cause ACPIPHP to attempt to remove the
      device from the system (they are events for a device that was present
      previously and is not present any more, so that's what should be done
      according to the spec).  The system then stops functioning correctly.
      
      Since the hotplug events in question were simply silently ignored
      previously, the least intrusive way to address that problem is to
      make ACPIPHP ignore them again.  For this purpose, introduce a new
      ACPI device flag, no_hotplug, and modify ACPIPHP to ignore hotplug
      events for PCI devices whose ACPI companions have that flag set.
      Next, make the radeon and nouveau switcheroo detection code set the
      no_hotplug flag for the discrete graphics' ACPI companion.
      
      Fixes: bbd34fcd (ACPI / hotplug / PCI: Register all devices under the given bridge)
      References: https://bugzilla.kernel.org/show_bug.cgi?id=61891
      References: https://bugzilla.kernel.org/show_bug.cgi?id=64891
      Reported-and-tested-by: Mike Lothian <mike@fireburn.co.uk>
      Reported-and-tested-by: <madcatx@atlas.cz>
      Reported-and-tested-by: Joaquín Aramendía <samsagax@gmail.com>
      Cc: Alex Deucher <alexdeucher@gmail.com>
      Cc: Dave Airlie <airlied@linux.ie>
      Cc: Takashi Iwai <tiwai@suse.de>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
      f244d8b6
  6. 28 Dec 2013, 1 commit
  7. 25 Dec 2013, 1 commit
  8. 23 Dec 2013, 2 commits
  9. 22 Dec 2013, 1 commit
    • aio/migratepages: make aio migrate pages sane · 8e321fef
      Benjamin LaHaise committed
      The arbitrary restriction on page counts offered by the core
      migrate_page_move_mapping() code results in rather suspicious looking
      fiddling with page reference counts in the aio_migratepage() operation.
      To fix this, make migrate_page_move_mapping() take an extra_count parameter
      that allows aio to tell the code about its own reference count on the page
      being migrated.
      
      While cleaning up aio_migratepage(), make it validate that the old page
      being passed in is actually what aio_migratepage() expects to prevent
      misbehaviour in the case of races.
      Signed-off-by: Benjamin LaHaise <bcrl@kvack.org>
      8e321fef
  10. 21 Dec 2013, 3 commits
  11. 19 Dec 2013, 6 commits
  12. 18 Dec 2013, 1 commit
  13. 17 Dec 2013, 3 commits
  14. 16 Dec 2013, 1 commit
  15. 13 Dec 2013, 7 commits
  16. 12 Dec 2013, 2 commits
  17. 11 Dec 2013, 6 commits
    • sched/fair: Rework sched_fair time accounting · 9dbdb155
      Peter Zijlstra committed
      Christian suffers from a bad BIOS that wrecks his i5's TSC sync. This
      results in him occasionally seeing time going backwards - which
      crashes the scheduler ...
      
      Most of our time accounting can actually handle that, except the most
      common one: the tick time update of sched_fair.
      
      There is a further problem with that code: previously we assumed that,
      because we get a tick every TICK_NSEC, our time delta could never
      exceed 32 bits, so the math was simpler.
      
      However, ever since Frederic managed to get NO_HZ_FULL merged, this is
      no longer the case, since now a task can indeed run for a long time
      without getting a tick. It only takes ~4.2 seconds to overflow
      our u32 in nanoseconds.
      
      This means we not only need to better deal with time going backwards;
      but also means we need to be able to deal with large deltas.
      
      This patch reworks the entire code and uses mul_u64_u32_shr() as
      proposed by Andy a long while ago.
      
      We express our virtual time scale factor as a u32 multiplier and shift
      right, and the 32-bit mul_u64_u32_shr() implementation reduces to a
      single 32x32->64 multiply if the time delta is still short (the common
      case).
      
      For 64bit a 64x64->128 multiply can be used if ARCH_SUPPORTS_INT128.
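      
      A hedged sketch of the scheme, with illustrative constants: the weight ratio
      is pre-encoded as a u32 multiplier plus a shift, so scaling a 64-bit
      nanosecond delta is a single widening multiply and shift (the 32x32 fast
      path is elided here):
      
        #include <stdint.h>
        #include <stdio.h>
        
        /* requires a compiler providing __int128 (e.g. GCC/Clang on 64-bit) */
        static uint64_t mul_u64_u32_shr_sketch(uint64_t a, uint32_t mul,
                                               unsigned int shift)
        {
                return (uint64_t)(((unsigned __int128)a * mul) >> shift);
        }
        
        int main(void)
        {
                /* scale a 10ms delta by a weight ratio of ~1/3, encoded as
                 * multiplier 0x55555555 with shift 32 */
                uint64_t scaled = mul_u64_u32_shr_sketch(10000000ULL,
                                                         0x55555555u, 32);
        
                printf("scaled delta: %llu ns\n", (unsigned long long)scaled);
                return 0;
        }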
      Reported-and-Tested-by: Christian Engelmayer <cengelma@gmx.at>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: fweisbec@gmail.com
      Cc: Paul Turner <pjt@google.com>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131118172706.GI3866@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9dbdb155
    • math64: Add mul_u64_u32_shr() · be5e610c
      Peter Zijlstra committed
      Introduce mul_u64_u32_shr() as proposed by Andy a while back; it
      allows using 64x64->128 muls on 64-bit archs with a recent GCC
      that defines __SIZEOF_INT128__ and __int128.
      
      (This new method will be used by the scheduler.)
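      
      A standalone sketch consistent with the description above (userspace integer
      types, not the kernel's): with __int128 it is one widening multiply;
      otherwise the 64-bit operand is split and two 32x32->64 multiplies are
      combined, which is exact for shift <= 32:
      
        #include <stdint.h>
        #include <stdio.h>
        
        #ifdef __SIZEOF_INT128__
        static inline uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul,
                                               unsigned int shift)
        {
                return (uint64_t)(((unsigned __int128)a * mul) >> shift);
        }
        #else
        /* assumes shift <= 32 and that the result fits in 64 bits */
        static inline uint64_t mul_u64_u32_shr(uint64_t a, uint32_t mul,
                                               unsigned int shift)
        {
                uint32_t ah = (uint32_t)(a >> 32), al = (uint32_t)a;
                uint64_t ret = ((uint64_t)al * mul) >> shift;
        
                if (ah)
                        ret += ((uint64_t)ah * mul) << (32 - shift);
                return ret;
        }
        #endif
        
        int main(void)
        {
                /* (3 * 2^32) * 5 >> 32 == 15 */
                printf("%llu\n",
                       (unsigned long long)mul_u64_u32_shr(3ULL << 32, 5, 32));
                return 0;
        }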
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: fweisbec@gmail.com
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-hxjoeuzmrcaumR0uZwjpe2pv@git.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      be5e610c
    • sched: Remove PREEMPT_NEED_RESCHED from generic code · ba1f14fb
      Peter Zijlstra committed
      While hunting a preemption issue with Alexander, Ben noticed that the
      current generic PREEMPT_NEED_RESCHED stuff is horribly broken for
      load-store architectures.
      
      We currently rely on the IPI to fold TIF_NEED_RESCHED into
      PREEMPT_NEED_RESCHED, but when this IPI lands while we already have
      a load for the preempt-count but before the store, the store will erase
      the PREEMPT_NEED_RESCHED change.
      
      The current preempt-count only works on load-store archs because
      interrupts are assumed to be completely balanced wrt their preempt_count
      fiddling; the previous preempt_count load will match the preempt_count
      state after the interrupt and therefore nothing gets lost.
      
      This patch removes the PREEMPT_NEED_RESCHED usage from generic code and
      pushes it into x86 arch code; the generic code goes back to relying on
      TIF_NEED_RESCHED.
      
      Boot tested on x86_64 and compile tested on ppc64.
      Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Reported-and-Tested-by: Alexander Graf <agraf@suse.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131128132641.GP10022@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      ba1f14fb
    • sctp: properly latch and use autoclose value from sock to association · 9f70f46b
      Neil Horman committed
      Currently, sctp associations latch a socket's autoclose value to an association
      at association init time, subject to capping constraints from the max_autoclose
      sysctl value.  This leads to an odd situation where an application may set a
      socket-level autoclose timeout, but sctp will silently limit the autoclose
      timeout to something less than that.
      
      Fix this by modifying the autoclose setsockopt function to check the limit, cap
      it and warn the user via syslog that the timeout is capped.  This will allow
      getsockopt to return valid autoclose timeout values that reflect what subsequent
      associations actually use.
      
      While we're at it, also eliminate the assoc->autoclose variable; it duplicates
      what's in the timeout array, which leads to multiple sources for the same
      information that may differ (as the former isn't subject to any capping).  This
      gives us the timeout information in a canonical place and saves some space in
      the association structure as well.
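      
      A minimal sketch of the setsockopt-time capping described above; names, the
      warning text and the limit value are illustrative, not SCTP's:
      
        #include <stdint.h>
        #include <stdio.h>
        
        static uint32_t max_autoclose = 4 * 60 * 60;    /* illustrative cap, s */
        
        struct sctp_sock_sketch {
                uint32_t autoclose;     /* seconds; the canonical copy lives here */
        };
        
        static int set_autoclose(struct sctp_sock_sketch *sp, uint32_t optval)
        {
                if (optval > max_autoclose) {
                        fprintf(stderr, "autoclose %u capped to %u\n",
                                (unsigned)optval, (unsigned)max_autoclose);
                        optval = max_autoclose;
                }
                sp->autoclose = optval;         /* getsockopt now reports this */
                return 0;
        }
        
        int main(void)
        {
                struct sctp_sock_sketch sp = { 0 };
        
                set_autoclose(&sp, 7 * 24 * 60 * 60);   /* request one week */
                printf("effective autoclose: %u s\n", (unsigned)sp.autoclose);
                return 0;
        }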
      Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
      Acked-by: Vlad Yasevich <vyasevich@gmail.com>
      CC: Wang Weidong <wangweidong1@huawei.com>
      CC: David Miller <davem@davemloft.net>
      CC: Vlad Yasevich <vyasevich@gmail.com>
      CC: netdev@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
      9f70f46b
    • net: unix: allow set_peek_off to fail · 12663bfc
      Sasha Levin committed
      unix_dgram_recvmsg() will hold the readlock of the socket until recv
      is complete.
      
      At the same time, we may try to setsockopt(SO_PEEK_OFF), which will hang until
      unix_dgram_recvmsg() completes (which can take a while) without allowing us
      to break out of it, triggering a hung task spew.
      
      Instead, allow set_peek_off to fail, this way userspace will not hang.
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Acked-by: Pavel Emelyanov <xemul@parallels.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      12663bfc
    • x86, build, icc: Remove uninitialized_var() from compiler-intel.h · 503cf95c
      H. Peter Anvin committed
      When compiling with icc, <linux/compiler-gcc.h> ends up included
      because the icc environment defines __GNUC__.  Thus, we neither need
      nor want to have this macro defined in both compiler-gcc.h and
      compiler-intel.h, and the fact that they are inconsistent just makes
      the compiler spew warnings.
      Reported-by: Sunil K. Pandey <sunil.k.pandey@intel.com>
      Cc: Kevin B. Smith <kevin.b.smith@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-0mbwou1zt7pafij09b897lg3@git.kernel.org
      Cc: <stable@vger.kernel.org>
      503cf95c