1. 11 1月, 2014 1 次提交
    • J
      net: core: explicitly select a txq before doing l2 forwarding · f663dd9a
      Jason Wang 提交于
      Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
      will cause several issues:
      
      - NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
        instead of lower device which misses the necessary txq synchronization for
        lower device such as txq stopping or frozen required by dev watchdog or
        control path.
      - dev_hard_start_xmit() was called with NULL txq which bypasses the net device
        watchdog.
      - dev_hard_start_xmit() does not check txq everywhere which will lead a crash
        when tso is disabled for lower device.
      
      Fix this by explicitly introducing a new param for .ndo_select_queue() for just
      selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
      extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
      forwarding transmission.
      
      With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
      to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
      a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
      dev_queue_xmit() to do the transmission.
      
      In the future, it was also required for macvtap l2 forwarding support since it
      provides a necessary synchronization method.
      
      Cc: John Fastabend <john.r.fastabend@intel.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: e1000-devel@lists.sourceforge.net
      Signed-off-by: NJason Wang <jasowang@redhat.com>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f663dd9a
  2. 03 1月, 2014 1 次提交
    • W
      ipv4: fix tunneled VM traffic over hw VXLAN/GRE GSO NIC · 7a7ffbab
      Wei-Chun Chao 提交于
      VM to VM GSO traffic is broken if it goes through VXLAN or GRE
      tunnel and the physical NIC on the host supports hardware VXLAN/GRE
      GSO offload (e.g. bnx2x and next-gen mlx4).
      
      Two issues -
      (VXLAN) VM traffic has SKB_GSO_DODGY and SKB_GSO_UDP_TUNNEL with
      SKB_GSO_TCP/UDP set depending on the inner protocol. GSO header
      integrity check fails in udp4_ufo_fragment if inner protocol is
      TCP. Also gso_segs is calculated incorrectly using skb->len that
      includes tunnel header. Fix: robust check should only be applied
      to the inner packet.
      
      (VXLAN & GRE) Once GSO header integrity check passes, NULL segs
      is returned and the original skb is sent to hardware. However the
      tunnel header is already pulled. Fix: tunnel header needs to be
      restored so that hardware can perform GSO properly on the original
      packet.
      Signed-off-by: NWei-Chun Chao <weichunc@plumgrid.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7a7ffbab
  3. 01 1月, 2014 1 次提交
    • D
      vlan: Fix header ops passthru when doing TX VLAN offload. · 2205369a
      David S. Miller 提交于
      When the vlan code detects that the real device can do TX VLAN offloads
      in hardware, it tries to arrange for the real device's header_ops to
      be invoked directly.
      
      But it does so illegally, by simply hooking the real device's
      header_ops up to the VLAN device.
      
      This doesn't work because we will end up invoking a set of header_ops
      routines which expect a device type which matches the real device, but
      will see a VLAN device instead.
      
      Fix this by providing a pass-thru set of header_ops which will arrange
      to pass the proper real device instead.
      
      To facilitate this add a dev_rebuild_header().  There are
      implementations which provide a ->cache and ->create but not a
      ->rebuild (f.e. PLIP).  So we need a helper function just like
      dev_hard_header() to avoid crashes.
      
      Use this helper in the one existing place where the
      header_ops->rebuild was being invoked, the neighbour code.
      
      With lots of help from Florian Westphal.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2205369a
  4. 28 12月, 2013 1 次提交
  5. 25 12月, 2013 1 次提交
  6. 22 12月, 2013 1 次提交
    • B
      aio/migratepages: make aio migrate pages sane · 8e321fef
      Benjamin LaHaise 提交于
      The arbitrary restriction on page counts offered by the core
      migrate_page_move_mapping() code results in rather suspicious looking
      fiddling with page reference counts in the aio_migratepage() operation.
      To fix this, make migrate_page_move_mapping() take an extra_count parameter
      that allows aio to tell the code about its own reference count on the page
      being migrated.
      
      While cleaning up aio_migratepage(), make it validate that the old page
      being passed in is actually what aio_migratepage() expects to prevent
      misbehaviour in the case of races.
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      8e321fef
  7. 21 12月, 2013 2 次提交
  8. 19 12月, 2013 5 次提交
  9. 18 12月, 2013 1 次提交
  10. 17 12月, 2013 1 次提交
  11. 13 12月, 2013 5 次提交
  12. 12 12月, 2013 1 次提交
  13. 11 12月, 2013 5 次提交
  14. 08 12月, 2013 2 次提交
  15. 06 12月, 2013 2 次提交
    • P
      xen-netback: fix fragment detection in checksum setup · 1431fb31
      Paul Durrant 提交于
      The code to detect fragments in checksum_setup() was missing for IPv4 and
      too eager for IPv6. (It transpires that Windows seems to send IPv6 packets
      with a fragment header even if they are not a fragment - i.e. offset is zero,
      and M bit is not set).
      
      This patch also incorporates a fix to callers of maybe_pull_tail() where
      skb->network_header was being erroneously added to the length argument.
      Signed-off-by: NPaul Durrant <paul.durrant@citrix.com>
      Signed-off-by: NZoltan Kiss <zoltan.kiss@citrix.com>
      Cc: Wei Liu <wei.liu2@citrix.com>
      Cc: Ian Campbell <ian.campbell@citrix.com>
      Cc: David Vrabel <david.vrabel@citrix.com>
      cc: David Miller <davem@davemloft.net>
      Acked-by: NWei Liu <wei.liu2@citrix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1431fb31
    • T
      percpu: fix spurious sparse warnings from DEFINE_PER_CPU() · b1a0fbfd
      Tejun Heo 提交于
      When CONFIG_DEBUG_FORCE_WEAK_PER_CPU or CONFIG_ARCH_NEEDS_WEAK_PER_CPU
      is set, DEFINE_PER_CPU() explodes into cryptic series of definitions
      to still allow using "static" for percpu variables while keeping all
      per-cpu symbols unique in the kernel image which is required for weak
      symbols.  This ultimately converts the actual symbol to global whether
      DEFINE_PER_CPU() is prefixed with static or not.
      
      Unfortunately, the macro forgot to add explicit extern declartion of
      the actual symbol ending up defining global symbol without preceding
      declaration for static definitions which naturally don't have matching
      DECLARE_PER_CPU().  The only ill effect is triggering of the following
      warnings.
      
       fs/inode.c:74:8: warning: symbol 'nr_inodes' was not declared. Should it be static?
       fs/inode.c:75:8: warning: symbol 'nr_unused' was not declared. Should it be static?
      
      Fix it by adding extern declaration in the DEFINE_PER_CPU() macro.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NWanlong Gao <gaowanlong@cn.fujitsu.com>
      Tested-by: NWanlong Gao <gaowanlong@cn.fujitsu.com>
      b1a0fbfd
  16. 03 12月, 2013 7 次提交
    • A
      gpiolib: add missing declarations · c9a9972b
      Alexandre Courbot 提交于
      Add declaration of 'struct of_phandle_args' to avoid the following
      warning:
      
        In file included from arch/arm/mach-tegra/board-paz00.c:21:0:
        include/linux/gpio/driver.h:102:17: warning: 'struct of_phandle_args' declared inside parameter list
        include/linux/gpio/driver.h:102:17: warning: its scope is only this definition or declaration, which is probably not what you want
      
      Also proactively add other definitions/includes that could be missing
      in other contexts.
      Signed-off-by: NAlexandre Courbot <acourbot@nvidia.com>
      Reported-by: NStephen Warren <swarren@wwwdotorg.org>
      Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      c9a9972b
    • T
      usb: wusbcore: fix deadlock in wusbhc_gtk_rekey · 471e42ad
      Thomas Pugliese 提交于
      When multiple wireless USB devices are connected and one of the devices
      disconnects, the host will distribute a new group key to the remaining
      devicese using wusbhc_gtk_rekey.  wusbhc_gtk_rekey takes the
      wusbhc->mutex and holds it while it submits a URB to set the new key.
      This causes a deadlock in wa_urb_enqueue when it calls a device lookup
      helper function that takes the same lock.
      
      This patch changes wusbhc_gtk_rekey to submit a work item to set the GTK
      so that the URB is submitted without holding wusbhc->mutex.
      Signed-off-by: NThomas Pugliese <thomas.pugliese@gmail.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      471e42ad
    • D
      Revert "net: Handle CHECKSUM_COMPLETE more adequately in pskb_trim_rcsum()." · 7ce5a27f
      David S. Miller 提交于
      This reverts commit 018c5bba.
      
      It causes regressions for people using chips driven by the sungem
      driver.  Suspicion is that the skb->csum value isn't being adjusted
      properly.
      
      The change also has a bug in that if __pskb_trim() fails, we'll leave
      a corruped skb->csum value in there.  We would really need to revert
      it to it's original value in that case.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7ce5a27f
    • S
      iio: hid-sensors: Fix power and report state · 751d17e2
      Srinivas Pandruvada 提交于
      In the original HID sensor hub firmwares all Named array enums were
      to 0-based. But the most recent hub implemented as 1-based,
      because of the implementation by one of the major OS vendor.
      Using logical minimum for the field as the base of enum. So we add
      logical minimum to the selector values before setting those fields.
      Some sensor hub FWs already changed logical minimum from 0 to 1
      to reflect this and hope every other vendor will follow.
      There is no easy way to add a common HID quirk for NAry elements,
      even if the standard specifies these field as NAry, the collection
      used to describe selectors is still just "logical".
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: NJonathan Cameron <jic23@kernel.org>
      751d17e2
    • S
      HID: hid-sensor-hub: Add logical min and max · 9f740ffa
      Srinivas Pandruvada 提交于
      Exporting logical minimum and maximum of HID fields as part of the
      hid sensor attribute info. This can be used for range checking and
      to calculate enumeration base for NAry fields of HID sensor hub.
      Signed-off-by: NSrinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      Signed-off-by: NJonathan Cameron <jic23@kernel.org>
      9f740ffa
    • R
      PCI / tg3: Give up chip reset and carrier loss handling if PCI device is not present · 8496e85c
      Rafael J. Wysocki 提交于
      Modify tg3_chip_reset() and tg3_close() to check if the PCI network
      adapter device is accessible at all in order to skip poking it or
      trying to handle a carrier loss in vain when that's not the case.
      Introduce a special PCI helper function pci_device_is_present()
      for this purpose.
      
      Of course, this uncovers the lack of the appropriate RTNL locking
      in tg3_suspend() and tg3_resume(), so add that locking in there
      too.
      
      These changes prevent tg3 from burning a CPU at 100% load level for
      solid several seconds after the Thunderbolt link is disconnected from
      a Matrox DS1 docking station.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NMichael Chan <mchan@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8496e85c
    • D
      usb: xhci: Link TRB must not occur within a USB payload burst · 35773dac
      David Laight 提交于
      Section 4.11.7.1 of rev 1.0 of the xhci specification states that a link TRB
      can only occur at a boundary between underlying USB frames (512 bytes for
      high speed devices).
      
      If this isn't done the USB frames aren't formatted correctly and, for example,
      the USB3 ethernet ax88179_178a card will stop sending (while still receiving)
      when running a netperf tcp transmit test with (say) and 8k buffer.
      
      This should be a candidate for stable, the ax88179_178a driver defaults to
      gso and tso enabled so it passes a lot of fragmented skb to the USB stack.
      
      Notes from Sarah:
      
      Discussion: http://marc.info/?l=linux-usb&m=138384509604981&w=2
      
      This patch fixes a long-standing xHCI driver bug that was revealed by a
      change in 3.12 in the usb-net driver.  Commit
      638c5115 "USBNET: support DMA SG" added
      support to use bulk endpoint scatter-gather (urb->sg).  Only the USB
      ethernet drivers trigger this bug, because the mass storage driver sends
      sg list entries in page-sized chunks.
      
      This patch only fixes the issue for bulk endpoint scatter-gather.  The
      problem will still occur for periodic endpoints, because hosts will
      interpret no-op transfers as a request to skip a service interval, which
      is not what we want.
      
      Luckily, the USB core isn't set up for scatter-gather on isochronous
      endpoints, and no USB drivers use scatter-gather for interrupt
      endpoints.  Document this known limitation so that developers won't try
      to use urb->sg for interrupt endpoints until this issue is fixed.  The
      more comprehensive fix would be to allow link TRBs in the middle of the
      endpoint ring and revert this patch, but that fix would touch too much
      code to be allowed in for stable.
      
      This patch should be backported to kernels as old as 3.12, that contain
      the commit 638c5115 "USBNET: support DMA
      SG".  Without this patch, the USB network device gets wedged, and stops
      sending packets.  Mark Lord confirms this patch fixes the regression:
      
      http://marc.info/?l=linux-netdev&m=138487107625966&w=2Signed-off-by: NDavid Laight <david.laight@aculab.com>
      Signed-off-by: NSarah Sharp <sarah.a.sharp@linux.intel.com>
      Tested-by: NMark Lord <mlord@pobox.com>
      Cc: stable@vger.kernel.org
      35773dac
  17. 02 12月, 2013 2 次提交
    • E
      security: shmem: implement kernel private shmem inodes · c7277090
      Eric Paris 提交于
      We have a problem where the big_key key storage implementation uses a
      shmem backed inode to hold the key contents.  Because of this detail of
      implementation LSM checks are being done between processes trying to
      read the keys and the tmpfs backed inode.  The LSM checks are already
      being handled on the key interface level and should not be enforced at
      the inode level (since the inode is an implementation detail, not a
      part of the security model)
      
      This patch implements a new function shmem_kernel_file_setup() which
      returns the equivalent to shmem_file_setup() only the underlying inode
      has S_PRIVATE set.  This means that all LSM checks for the inode in
      question are skipped.  It should only be used for kernel internal
      operations where the inode is not exposed to userspace without proper
      LSM checking.  It is possible that some other users of
      shmem_file_setup() should use the new interface, but this has not been
      explored.
      
      Reproducing this bug is a little bit difficult.  The steps I used on
      Fedora are:
      
       (1) Turn off selinux enforcing:
      
      	setenforce 0
      
       (2) Create a huge key
      
      	k=`dd if=/dev/zero bs=8192 count=1 | keyctl padd big_key test-key @s`
      
       (3) Access the key in another context:
      
      	runcon system_u:system_r:httpd_t:s0-s0:c0.c1023 keyctl print $k >/dev/null
      
       (4) Examine the audit logs:
      
      	ausearch -m AVC -i --subject httpd_t | audit2allow
      
      If the last command's output includes a line that looks like:
      
      	allow httpd_t user_tmpfs_t:file { open read };
      
      There was an inode check between httpd and the tmpfs filesystem.  With
      this patch no such denial will be seen.  (NOTE! you should clear your
      audit log if you have tested for this previously)
      
      (Please return you box to enforcing)
      Signed-off-by: NEric Paris <eparis@redhat.com>
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      cc: Hugh Dickins <hughd@google.com>
      cc: linux-mm@kvack.org
      c7277090
    • D
      KEYS: Fix multiple key add into associative array · 23fd78d7
      David Howells 提交于
      If sufficient keys (or keyrings) are added into a keyring such that a node in
      the associative array's tree overflows (each node has a capacity N, currently
      16) and such that all N+1 keys have the same index key segment for that level
      of the tree (the level'th nibble of the index key), then assoc_array_insert()
      calls ops->diff_objects() to indicate at which bit position the two index keys
      vary.
      
      However, __key_link_begin() passes a NULL object to assoc_array_insert() with
      the intention of supplying the correct pointer later before we commit the
      change.  This means that keyring_diff_objects() is given a NULL pointer as one
      of its arguments which it does not expect.  This results in an oops like the
      attached.
      
      With the previous patch to fix the keyring hash function, this can be forced
      much more easily by creating a keyring and only adding keyrings to it.  Add any
      other sort of key and a different insertion path is taken - all 16+1 objects
      must want to cluster in the same node slot.
      
      This can be tested by:
      
      	r=`keyctl newring sandbox @s`
      	for ((i=0; i<=16; i++)); do keyctl newring ring$i $r; done
      
      This should work fine, but oopses when the 17th keyring is added.
      
      Since ops->diff_objects() is always called with the first pointer pointing to
      the object to be inserted (ie. the NULL pointer), we can fix the problem by
      changing the to-be-inserted object pointer to point to the index key passed
      into assoc_array_insert() instead.
      
      Whilst we're at it, we also switch the arguments so that they are the same as
      for ->compare_object().
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000088
      IP: [<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
      ...
      RIP: 0010:[<ffffffff81191ee4>] hash_key_type_and_desc+0x18/0xb0
      ...
      Call Trace:
       [<ffffffff81191f9d>] keyring_diff_objects+0x21/0xd2
       [<ffffffff811f09ef>] assoc_array_insert+0x3b6/0x908
       [<ffffffff811929a7>] __key_link_begin+0x78/0xe5
       [<ffffffff81191a2e>] key_create_or_update+0x17d/0x36a
       [<ffffffff81192e0a>] SyS_add_key+0x123/0x183
       [<ffffffff81400ddb>] tracesys+0xdd/0xe2
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Tested-by: NStephen Gallagher <sgallagh@redhat.com>
      23fd78d7
  18. 29 11月, 2013 1 次提交
    • S
      efivars, efi-pstore: Hold off deletion of sysfs entry until the scan is completed · e0d59733
      Seiji Aguchi 提交于
      Currently, when mounting pstore file system, a read callback of
      efi_pstore driver runs mutiple times as below.
      
      - In the first read callback, scan efivar_sysfs_list from head and pass
        a kmsg buffer of a entry to an upper pstore layer.
      - In the second read callback, rescan efivar_sysfs_list from the entry
        and pass another kmsg buffer to it.
      - Repeat the scan and pass until the end of efivar_sysfs_list.
      
      In this process, an entry is read across the multiple read function
      calls. To avoid race between the read and erasion, the whole process
      above is protected by a spinlock, holding in open() and releasing in
      close().
      
      At the same time, kmemdup() is called to pass the buffer to pstore
      filesystem during it. And then, it causes a following lockdep warning.
      
      To make the dynamic memory allocation runnable without taking spinlock,
      holding off a deletion of sysfs entry if it happens while scanning it
      via efi_pstore, and deleting it after the scan is completed.
      
      To implement it, this patch introduces two flags, scanning and deleting,
      to efivar_entry.
      
      On the code basis, it seems that all the scanning and deleting logic is
      not needed because __efivars->lock are not dropped when reading from the
      EFI variable store.
      
      But, the scanning and deleting logic is still needed because an
      efi-pstore and a pstore filesystem works as follows.
      
      In case an entry(A) is found, the pointer is saved to psi->data.  And
      efi_pstore_read() passes the entry(A) to a pstore filesystem by
      releasing  __efivars->lock.
      
      And then, the pstore filesystem calls efi_pstore_read() again and the
      same entry(A), which is saved to psi->data, is used for resuming to scan
      a sysfs-list.
      
      So, to protect the entry(A), the logic is needed.
      
      [    1.143710] ------------[ cut here ]------------
      [    1.144058] WARNING: CPU: 1 PID: 1 at kernel/lockdep.c:2740 lockdep_trace_alloc+0x104/0x110()
      [    1.144058] DEBUG_LOCKS_WARN_ON(irqs_disabled_flags(flags))
      [    1.144058] Modules linked in:
      [    1.144058] CPU: 1 PID: 1 Comm: systemd Not tainted 3.11.0-rc5 #2
      [    1.144058]  0000000000000009 ffff8800797e9ae0 ffffffff816614a5 ffff8800797e9b28
      [    1.144058]  ffff8800797e9b18 ffffffff8105510d 0000000000000080 0000000000000046
      [    1.144058]  00000000000000d0 00000000000003af ffffffff81ccd0c0 ffff8800797e9b78
      [    1.144058] Call Trace:
      [    1.144058]  [<ffffffff816614a5>] dump_stack+0x54/0x74
      [    1.144058]  [<ffffffff8105510d>] warn_slowpath_common+0x7d/0xa0
      [    1.144058]  [<ffffffff8105517c>] warn_slowpath_fmt+0x4c/0x50
      [    1.144058]  [<ffffffff8131290f>] ? vsscanf+0x57f/0x7b0
      [    1.144058]  [<ffffffff810bbd74>] lockdep_trace_alloc+0x104/0x110
      [    1.144058]  [<ffffffff81192da0>] __kmalloc_track_caller+0x50/0x280
      [    1.144058]  [<ffffffff815147bb>] ? efi_pstore_read_func.part.1+0x12b/0x170
      [    1.144058]  [<ffffffff8115b260>] kmemdup+0x20/0x50
      [    1.144058]  [<ffffffff815147bb>] efi_pstore_read_func.part.1+0x12b/0x170
      [    1.144058]  [<ffffffff81514800>] ? efi_pstore_read_func.part.1+0x170/0x170
      [    1.144058]  [<ffffffff815148b4>] efi_pstore_read_func+0xb4/0xe0
      [    1.144058]  [<ffffffff81512b7b>] __efivar_entry_iter+0xfb/0x120
      [    1.144058]  [<ffffffff8151428f>] efi_pstore_read+0x3f/0x50
      [    1.144058]  [<ffffffff8128d7ba>] pstore_get_records+0x9a/0x150
      [    1.158207]  [<ffffffff812af25c>] ? selinux_d_instantiate+0x1c/0x20
      [    1.158207]  [<ffffffff8128ce30>] ? parse_options+0x80/0x80
      [    1.158207]  [<ffffffff8128ced5>] pstore_fill_super+0xa5/0xc0
      [    1.158207]  [<ffffffff811ae7d2>] mount_single+0xa2/0xd0
      [    1.158207]  [<ffffffff8128ccf8>] pstore_mount+0x18/0x20
      [    1.158207]  [<ffffffff811ae8b9>] mount_fs+0x39/0x1b0
      [    1.158207]  [<ffffffff81160550>] ? __alloc_percpu+0x10/0x20
      [    1.158207]  [<ffffffff811c9493>] vfs_kern_mount+0x63/0xf0
      [    1.158207]  [<ffffffff811cbb0e>] do_mount+0x23e/0xa20
      [    1.158207]  [<ffffffff8115b51b>] ? strndup_user+0x4b/0xf0
      [    1.158207]  [<ffffffff811cc373>] SyS_mount+0x83/0xc0
      [    1.158207]  [<ffffffff81673cc2>] system_call_fastpath+0x16/0x1b
      [    1.158207] ---[ end trace 61981bc62de9f6f4 ]---
      Signed-off-by: NSeiji Aguchi <seiji.aguchi@hds.com>
      Tested-by: NMadper Xie <cxie@redhat.com>
      Cc: stable@kernel.org
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      e0d59733