1. 06 Sep, 2013 4 commits
    • fscache: Netfs function for cleanup post readpages · 5a6f282a
      Milosz Tanski committed
      Currently the fscache code expects the netfs to call fscache_readpages_or_alloc
      inside the aops readpages callback.  It marks all the pages in the list
      provided by readahead with PG_private_2.  In the cases that the netfs fails to
      read all the pages (which is legal) it ends up returning to the readahead and
      triggering a BUG.  This happens because the page list still contains marked
      pages.
      
      This patch implements a simple fscache_readpages_cancel function that the netfs
      should call before returning from readpages.  It will revoke the pages from the
      underlying cache backend and unmark them.
      
      The problem was originally worked out in the Ceph devel tree, but it also
      occurs in CIFS.  It appears that NFS, AFS and 9P are okay as read_cache_pages()
      will clean up the unprocessed pages in the case of an error.
      
      This can be used to address the following oops:
      
      [12410647.597278] BUG: Bad page state in process petabucket  pfn:3d504e
      [12410647.597292] page:ffffea000f541380 count:0 mapcount:0 mapping:          (null) index:0x0
      [12410647.597298] page flags: 0x200000000001000(private_2)
      
      ...
      
      [12410647.597334] Call Trace:
      [12410647.597345]  [<ffffffff815523f2>] dump_stack+0x19/0x1b
      [12410647.597356]  [<ffffffff8111def7>] bad_page+0xc7/0x120
      [12410647.597359]  [<ffffffff8111e49e>] free_pages_prepare+0x10e/0x120
      [12410647.597361]  [<ffffffff8111fc80>] free_hot_cold_page+0x40/0x170
      [12410647.597363]  [<ffffffff81123507>] __put_single_page+0x27/0x30
      [12410647.597365]  [<ffffffff81123df5>] put_page+0x25/0x40
      [12410647.597376]  [<ffffffffa02bdcf9>] ceph_readpages+0x2e9/0x6e0 [ceph]
      [12410647.597379]  [<ffffffff81122a8f>] __do_page_cache_readahead+0x1af/0x260
      [12410647.597382]  [<ffffffff81122ea1>] ra_submit+0x21/0x30
      [12410647.597384]  [<ffffffff81118f64>] filemap_fault+0x254/0x490
      [12410647.597387]  [<ffffffff8113a74f>] __do_fault+0x6f/0x4e0
      [12410647.597391]  [<ffffffff810125bd>] ? __switch_to+0x16d/0x4a0
      [12410647.597395]  [<ffffffff810865ba>] ? finish_task_switch+0x5a/0xc0
      [12410647.597398]  [<ffffffff8113d856>] handle_pte_fault+0xf6/0x930
      [12410647.597401]  [<ffffffff81008c33>] ? pte_mfn_to_pfn+0x93/0x110
      [12410647.597403]  [<ffffffff81008cce>] ? xen_pmd_val+0xe/0x10
      [12410647.597405]  [<ffffffff81005469>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
      [12410647.597407]  [<ffffffff8113f361>] handle_mm_fault+0x251/0x370
      [12410647.597411]  [<ffffffff812b0ac4>] ? call_rwsem_down_read_failed+0x14/0x30
      [12410647.597414]  [<ffffffff8155bffa>] __do_page_fault+0x1aa/0x550
      [12410647.597418]  [<ffffffff8108011d>] ? up_write+0x1d/0x20
      [12410647.597422]  [<ffffffff8113141c>] ? vm_mmap_pgoff+0xbc/0xe0
      [12410647.597425]  [<ffffffff81143bb8>] ? SyS_mmap_pgoff+0xd8/0x240
      [12410647.597427]  [<ffffffff8155c3ae>] do_page_fault+0xe/0x10
      [12410647.597431]  [<ffffffff81558818>] page_fault+0x28/0x30
      Signed-off-by: Milosz Tanski <milosz@adfin.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
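The cleanup step described in the commit above can be modelled in userspace. This is a minimal sketch, not the kernel implementation: `struct sketch_page`, `sketch_readpages_cancel()` and `sketch_page_is_freeable()` are hypothetical names, and the `private_2` field only stands in for the PG_private_2 page flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for struct page; private_2 models the PG_private_2 mark. */
struct sketch_page {
    bool private_2;
    struct sketch_page *next;   /* next page still on the readahead list */
};

/*
 * Models what a cancel helper has to do for pages the netfs never read:
 * clear the cache mark so the caller can free them safely.
 * Returns how many pages were unmarked.
 */
static size_t sketch_readpages_cancel(struct sketch_page *pages)
{
    size_t unmarked = 0;

    for (struct sketch_page *p = pages; p != NULL; p = p->next) {
        if (p->private_2) {
            p->private_2 = false;
            unmarked++;
        }
    }
    return unmarked;
}

/* A page may only be freed once it no longer carries the mark. */
static bool sketch_page_is_freeable(const struct sketch_page *p)
{
    assert(p != NULL);
    return !p->private_2;
}
```

In this model, freeing a page while `sketch_page_is_freeable()` is false corresponds to the "Bad page state" report in the trace above.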
    • FS-Cache: Fix heading in documentation · 696f69b6
      David Howells committed
      Fix a heading in the documentation to make it consistent with the contents
      list.
      Signed-off-by: David Howells <dhowells@redhat.com>
    • CacheFiles: Implement interface to check cache consistency · 5002d7be
      David Howells committed
      Implement the FS-Cache interface to check the consistency of a cache object in
      CacheFiles.
      
      Original-author: Hongyi Jia <jiayisuse@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hongyi Jia <jiayisuse@gmail.com>
      cc: Milosz Tanski <milosz@adfin.com>
    • FS-Cache: Add interface to check consistency of a cached object · da9803bc
      David Howells committed
      Extend the fscache netfs API so that the netfs can ask whether a cache
      object is up to date with respect to its corresponding netfs object:
      
      	int fscache_check_consistency(struct fscache_cookie *cookie)
      
      This will call back to the netfs to check whether the auxiliary data associated
      with a cookie is correct.  It returns 0 if it is and -ESTALE if it isn't; it
      may also return -ENOMEM and -ERESTARTSYS.
      
      The backends now have to implement a mandatory operation pointer:
      
      	int (*check_consistency)(struct fscache_object *object)
      
      that corresponds to the above API call.  FS-Cache takes care of pinning the
      object and the cookie in memory and managing this call with respect to the
      object state.
      
      Original-author: Hongyi Jia <jiayisuse@gmail.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Hongyi Jia <jiayisuse@gmail.com>
      cc: Milosz Tanski <milosz@adfin.com>
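The return codes listed in the commit above suggest a three-way decision on the caller's side. The sketch below only maps a result to an action. The enum, the function name, and the choice of what to do for each code are assumptions for illustration, not something the commit prescribes.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical actions a netfs might take after a consistency check. */
enum sketch_action {
    SKETCH_KEEP_CACHE,   /* cache agrees with the netfs object */
    SKETCH_INVALIDATE,   /* cache is stale, discard it */
    SKETCH_TRY_AGAIN     /* transient failure, decide later */
};

/* Map a result of the form "0 or negative errno" to an action. */
static enum sketch_action sketch_on_consistency_result(int ret)
{
    assert(ret <= 0);

    switch (ret) {
    case 0:
        return SKETCH_KEEP_CACHE;
    case -ESTALE:
        return SKETCH_INVALIDATE;
    default:
        /* e.g. -ENOMEM or an interrupted call: nothing was learned */
        return SKETCH_TRY_AGAIN;
    }
}
```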
  2. 03 Sep, 2013 4 commits
  3. 31 Aug, 2013 15 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · a8787645
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) There was a simplification in the ipv6 ndisc packet sending
          attempted here, which avoided using memory accounting on the
          per-netns ndisc socket for sending NDISC packets.  It did fix some
          important issues, but it causes regressions so it gets reverted here
          too.  Specifically, the problem with this change is that the IPV6
          output path really depends upon there being a valid skb->sk
          attached.
      
          The reason we want to do this change in some form when we figure out
          how to do it right, is that if a device goes down the ndisc_sk
          socket send queue will fill up and block NDISC packets that we want
          to send to other devices too.  That's really bad behavior.
      
          Hopefully Thomas can come up with a better version of this change.
      
       2) Fix a severe TCP performance regression by reverting a change made
          to dev_pick_tx() quite some time ago.  From Eric Dumazet.
      
       3) TIPC returns wrongly signed error codes, fix from Erik Hugne.
      
       4) Fix OOPS when doing IPSEC over ipv4 tunnels due to orphaning the
          skb->sk too early.  Fix from Li Hongjun.
      
       5) RAW ipv4 sockets can use the wrong routing key during lookup, from
          Chris Clark.
      
       6) Similar to #1 revert an older change that tried to use plain
          alloc_skb() for SYN/ACK TCP packets, this broke the netfilter owner
          mark which needs to see the skb->sk for such frames.  From Phil
          Oester.
      
       7) BNX2x driver bug fixes from Ariel Elior and Yuval Mintz,
          specifically in the handling of virtual functions.
      
        8) IPSEC path error propagation to sockets is not done properly when
          we have v4 in v6, and v6 in v4 type rules.  Fix from Hannes Frederic
          Sowa.
      
       9) Fix missing channel context release in mac80211, from Johannes Berg.
      
      10) Fix network namespace handling wrt. SCM_RIGHTS, from Andy
          Lutomirski.
      
      11) Fix usage of bogus NAPI weight in jme, netxen, and ps3_gelic
          drivers.  From Michal Schmidt.
      
      12) Hopefully a complete and correct fix for the genetlink dump locking
          and module reference counting.  From Pravin B Shelar.
      
      13) sk_busy_loop() must do a cpu_relax(), from Eliezer Tamir.
      
      14) Fix handling of timestamp offset when restoring a snapshotted TCP
          socket.  From Andrew Vagin.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
        net: fec: fix time stamping logic after napi conversion
        net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay
        mISDN: return -EINVAL on error in dsp_control_req()
        net: revert 8728c544 ("net: dev_pick_tx() fix")
        Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages"
        ipv4 tunnels: fix an oops when using ipip/sit with IPsec
        tipc: set sk_err correctly when connection fails
        tcp: tcp_make_synack() should use sock_wmalloc
        bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones
        ipv6: Don't depend on per socket memory for neighbour discovery messages
        ipv4: sendto/hdrincl: don't use destination address found in header
        tcp: don't apply tsoffset if rcv_tsecr is zero
        tcp: initialize rcv_tstamp for restored sockets
        net: xilinx: fix memleak
        net: usb: Add HP hs2434 device to ZLP exception table
        net: add cpu_relax to busy poll loop
        net: stmmac: fixed the pbl setting with DT
        genl: Hold reference on correct module while netlink-dump.
        genl: Fix genl dumpit() locking.
        xfrm: Fix potential null pointer dereference in xdst_queue_output
        ...
    • MAINTAINERS: change my DT related maintainer address · de80963e
      Ian Campbell committed
      Filtering capabilities on my work email are pretty much non-existent and this
      has turned out to be something of a firehose...
      
      Cc: Stephen Warren <swarren@wwwdotorg.org>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Acked-by: Pawel Moll <pawel.moll@arm.com>
      Acked-by: Mark Rutland <mark.rutland@arm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · 936dbcc3
      Linus Torvalds committed
      Pull sound fixes from Takashi Iwai:
       "This contains two Oops fixes (opti9xx and HD-audio) and a simple fixup
        for an Acer laptop.  All marked as stable patches"
      
      * tag 'sound-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: opti9xx: Fix conflicting driver object name
        ALSA: hda - Fix NULL dereference with CONFIG_SND_DYNAMIC_MINORS=n
        ALSA: hda - Add inverted digital mic fixup for Acer Aspire One
    • Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · d9eda0fa
      Linus Torvalds committed
      Pull ARM SoC fixes from Olof Johansson:
       "Two straggling fixes that I had missed as they were posted a couple of
        weeks ago, causing problems with interrupts (breaking them completely)
        on the CSR SiRF platforms"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        arm: prima2: drop nr_irqs in mach as we moved to linear irqdomain
        irqchip: sirf: move from legacy mode to linear irqdomain
    • Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 418a95bc
      Linus Torvalds committed
      Pull drm fixes from Dave Airlie:
       "Since we are getting to the pointy end, one i915 black screen on some
        machines, and one vmwgfx stop userspace ability to nuke the VM,
      
        There might be one or two ati or nouveau fixes trickle in before
        final, but I think this should pretty much be it"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/vmwgfx: Split GMR2_REMAP commands if they are to large
        drm/i915: ivb: fix edp voltage swing reg val
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 155e3a35
      Linus Torvalds committed
      Pull input layer updates from Dmitry Torokhov:
       "Just a couple of new IDs in Wacom and xpad drivers, i8042 is now
        disabled on ARC, and data checks in Elantech driver that were overly
        relaxed by the previous patch are now tightened"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: i8042 - disable the driver on ARC platforms
        Input: xpad - add signature for Razer Onza Classic Edition
        Input: elantech - fix packet check for v3 and v4 hardware
        Input: wacom - add support for 0x300 and 0x301
    • net: fec: fix time stamping logic after napi conversion · 0affdf34
      Richard Cochran committed
      Commit dc975382 "net: fec: add napi support to improve proformance"
      converted the fec driver to the napi model. However, that commit
      forgot to remove the call to skb_defer_rx_timestamp which is only
      needed in non-napi drivers.
      
      (The function napi_gro_receive eventually calls netif_receive_skb,
      which in turn calls skb_defer_rx_timestamp.)
      
      This patch should also be applied to the 3.9 and 3.10 kernels.
      Signed-off-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bridge: convert MLDv2 Query MRC into msecs_to_jiffies for max_delay · 2d98c29b
      Daniel Borkmann committed
      While looking into MLDv1/v2 code, I noticed that bridging code does
      not convert its max delay into jiffies for MLDv2 messages as we do
      in the core IPv6 multicast code.
      
      RFC3810, 5.1.3. Maximum Response Code says:
      
        The Maximum Response Code field specifies the maximum time allowed
        before sending a responding Report. The actual time allowed, called
        the Maximum Response Delay, is represented in units of milliseconds,
        and is derived from the Maximum Response Code as follows: [...]
      
      As we update timers that work with jiffies, we need to convert it.
      Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
      Cc: Linus Lüssing <linus.luessing@web.de>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
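The conversion this commit describes has two steps: decode the Maximum Response Code into milliseconds as RFC 3810 section 5.1.3 specifies, then turn milliseconds into timer ticks. The sketch below is a userspace illustration. `sketch_msecs_to_jiffies()` is a hypothetical stand-in for the kernel helper named in the commit title, with the tick rate passed in as a parameter and rounding up assumed.

```c
#include <assert.h>
#include <stdint.h>

/* RFC 3810, 5.1.3: decode a Maximum Response Code into milliseconds. */
static uint32_t mldv2_mrc_to_msecs(uint16_t mrc)
{
    if (mrc < 32768)
        return mrc;                        /* small codes are the delay itself */

    uint32_t expo = (mrc >> 12) & 0x7u;    /* 3-bit exponent */
    uint32_t mant = mrc & 0x0fffu;         /* 12-bit mantissa */

    return (mant | 0x1000u) << (expo + 3);
}

/*
 * Convert milliseconds to ticks at a given tick rate (ticks per second),
 * rounding up so that a non-zero delay never becomes zero ticks.
 */
static uint32_t sketch_msecs_to_jiffies(uint32_t msecs, uint32_t hz)
{
    assert(hz > 0);
    return (uint32_t)(((uint64_t)msecs * hz + 999u) / 1000u);
}
```

Using the decoded millisecond value directly as a tick count would only be correct at a tick rate of 1000 per second, which is why the conversion matters for timers that count jiffies.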
    • mISDN: return -EINVAL on error in dsp_control_req() · 0d63c27d
      Dan Carpenter committed
      If skb->len is too short then we should return an error.  Otherwise we
      read beyond the end of skb->data for several bytes.
      Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
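The pattern behind this fix is a length check that returns early instead of falling through to the read. A minimal sketch, assuming a hypothetical message layout in which a 4-byte command sits at the start of the buffer. `sketch_control_req()` is illustrative and is not the driver function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Parse the command word from a control message.
 * Returns 0 on success, or -EINVAL if the buffer is too short to hold it.
 */
static int sketch_control_req(const uint8_t *data, size_t len, uint32_t *cmd_out)
{
    assert(data != NULL && cmd_out != NULL);

    if (len < sizeof(uint32_t))
        return -EINVAL;   /* reading the command would run past the buffer */

    memcpy(cmd_out, data, sizeof(uint32_t));
    return 0;
}
```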
    • net: revert 8728c544 ("net: dev_pick_tx() fix") · 702821f4
      Eric Dumazet committed
      commit 8728c544 ("net: dev_pick_tx() fix") and commit
      b6fe83e9 ("bonding: refine IFF_XMIT_DST_RELEASE capability")
      are quite incompatible: Queue selection is disabled because skb
      dst was dropped before entering bonding device.
      
      This causes major performance regression, mainly because TCP packets
      for a given flow can be sent to multiple queues.
      
      This is particularly visible when using the new FQ packet scheduler
      with MQ + FQ setup on the slaves.
      
      We can safely revert the first commit now that 416186fb
      ("net: Split core bits of netdev_pick_tx into __netdev_pick_tx")
      properly caps the queue_index.
      Reported-by: Xi Wang <xii@google.com>
      Diagnosed-by: Xi Wang <xii@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Alexander Duyck <alexander.h.duyck@intel.com>
      Cc: Denys Fedorysychenko <nuclearcat@nuclearcat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Revert "ipv6: Don't depend on per socket memory for neighbour discovery messages" · 25ad6117
      David S. Miller committed
      This reverts commit 1f324e38.
      
      It seems to cause regressions, and in particular the output path
      really depends upon there being a socket attached to skb->sk for
      checks such as sk_mc_loop(skb->sk) for example.  See ip6_output_finish2().
      Reported-by: Stephen Warren <swarren@wwwdotorg.org>
      Reported-by: Fabio Estevam <festevam@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4 tunnels: fix an oops when using ipip/sit with IPsec · 737e828b
      Li Hongjun committed
      Since commit 3d7b46cd (ip_tunnel: push generic protocol handling to
      ip_tunnel module.), an Oops is triggered when an xfrm policy is configured on
      an IPv4 over IPv4 tunnel.
      
      xfrm4_policy_check() calls __xfrm_policy_check2(), which uses skb_dst(skb). But
      this field is NULL because iptunnel_pull_header() calls skb_dst_drop(skb).
      Signed-off-by: Li Hongjun <hongjun.li@6wind.com>
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tipc: set sk_err correctly when connection fails · 2c8d8518
      Erik Hugne committed
      Should a connect fail, if the publication/server is unavailable or
      due to some other error, a positive value will be returned and errno
      is never set. If the application code checks for an explicit zero
      return from connect (success) or a negative return (failure), it
      will not catch the error and subsequent send() calls will fail as
      shown from the strace snippet below.
      
      socket(0x1e /* PF_??? */, SOCK_SEQPACKET, 0) = 3
      connect(3, {sa_family=0x1e /* AF_??? */, sa_data="\2\1\322\4\0\0\322\4\0\0\0\0\0\0"}, 16) = 111
      sendto(3, "test", 4, 0, NULL, 0)        = -1 EPIPE (Broken pipe)
      
      The reason for this behaviour is that TIPC wrongly inverts error
      codes set in sk_err.
      Signed-off-by: Erik Hugne <erik.hugne@ericsson.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
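The sign problem described above can be reproduced in a small model. This sketch assumes the reporting path negates the stored error before returning it, which is what the positive `connect()` result in the strace output suggests. `struct sketch_sock`, `sketch_sock_error()` and `app_sees_failure()` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stand-in for a socket that records a pending error code. */
struct sketch_sock {
    int sk_err;
};

/* Report and clear the pending error, negating it on the way out. */
static int sketch_sock_error(struct sketch_sock *sk)
{
    assert(sk != NULL);

    int err = sk->sk_err;

    sk->sk_err = 0;
    return -err;
}

/* The usual application-side check: only negative results count as failure. */
static int app_sees_failure(int ret)
{
    return ret < 0;
}
```

Storing `-ECONNREFUSED` in `sk_err` makes `sketch_sock_error()` return a positive value that `app_sees_failure()` misses, while storing `ECONNREFUSED` yields the negative result callers expect.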
    • tcp: tcp_make_synack() should use sock_wmalloc · eb8895de
      Phil Oester committed
      In commit 90ba9b19 (tcp: tcp_make_synack() can use alloc_skb()), Eric changed
      the call to sock_wmalloc in tcp_make_synack to alloc_skb.  In doing so,
      the netfilter owner match lost its ability to block the SYNACK packet on
      outbound listening sockets.  Revert the change, restoring the owner match
      functionality.
      
      This closes netfilter bugzilla #847.
      Signed-off-by: Phil Oester <kernel@linuxace.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • bridge: separate querier and query timer into IGMP/IPv4 and MLD/IPv6 ones · cc0fdd80
      Linus Lüssing committed
      Currently we would still potentially suffer multicast packet loss if there
      is just either an IGMP or an MLD querier: For the former case, we would
      possibly drop IPv6 multicast packets, for the latter IPv4 ones. This is
      because we are currently assuming that if either an IGMP or MLD querier
      is present that the other one is present, too.
      
      This patch makes the behaviour and fix added in
      "bridge: disable snooping if there is no querier" (b00589af)
      to also work if there is either just an IGMP or an MLD querier on the
      link: It refines the deactivation of the snooping to be protocol
      specific by using separate timers for the snooped IGMP and MLD queries
      as well as separate timers for our internal IGMP and MLD queriers.
      Signed-off-by: Linus Lüssing <linus.luessing@web.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 30 Aug, 2013 14 commits
  5. 29 Aug, 2013 3 commits
    • cgroup: fix rmdir EBUSY regression in 3.11 · bb78a92f
      Hugh Dickins committed
      On 3.11-rc we are seeing cgroup directories left behind when they should
      have been removed.  Here's a trivial reproducer:
      
      cd /sys/fs/cgroup/memory
      mkdir parent parent/child; rmdir parent/child parent
      rmdir: failed to remove `parent': Device or resource busy
      
      It's because cgroup_destroy_locked() (step 1 of destruction) leaves
      cgroup on parent's children list, letting cgroup_offline_fn() (step 2 of
      destruction) remove it; but step 2 is run by work queue, which may not
      yet have removed the children when parent destruction checks the list.
      
      Fix that by checking through a non-empty list of children: if every one
      of them has already been marked CGRP_DEAD, then it's safe to proceed:
      those children are invisible to userspace, and should not obstruct rmdir.
      
      (I didn't see any reason to keep the cgrp->children checks under the
      unrelated css_set_lock, so moved them out.)
      
      tj: Flattened nested ifs a bit and updated comment so that it's
          correct on both for-3.11-fixes and for-3.12.
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
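The check this commit describes, walking a non-empty child list and treating it as empty when every child is already dead, can be sketched as follows. `struct sketch_cgroup` and `sketch_has_live_children()` are hypothetical. The `dead` field stands in for the CGRP_DEAD flag, and the singly linked sibling list is a simplification of the kernel's list handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified cgroup node for illustration only. */
struct sketch_cgroup {
    bool dead;                        /* stands in for the CGRP_DEAD flag */
    struct sketch_cgroup *sibling;    /* next child of the same parent */
    struct sketch_cgroup *children;   /* first child, or NULL */
};

/* rmdir should be blocked only by children that are still live. */
static bool sketch_has_live_children(const struct sketch_cgroup *cgrp)
{
    assert(cgrp != NULL);

    for (const struct sketch_cgroup *c = cgrp->children; c != NULL; c = c->sibling) {
        if (!c->dead)
            return true;
    }
    return false;
}
```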
    • workqueue: cond_resched() after processing each work item · b22ce278
      Tejun Heo committed
      If !PREEMPT, a kworker running work items back to back can hog CPU.
      This becomes dangerous when a self-requeueing work item which is
      waiting for something to happen races against stop_machine.  Such
      self-requeueing work item would requeue itself indefinitely hogging
      the kworker and CPU it's running on while stop_machine would wait for
      that CPU to enter stop_machine while preventing anything else from
      happening on all other CPUs.  The two would deadlock.
      
      Jamie Liu reports that this deadlock scenario exists around
      scsi_requeue_run_queue() and libata port multiplier support, where one
      port may exclude command processing from other ports.  With the right
      timing, scsi_requeue_run_queue() can end up requeueing itself trying
      to execute an IO which is asked to be retried while another device has
      an exclusive access, which in turn can't make forward progress due to
      stop_machine.
      
      Fix it by invoking cond_resched() after executing each work item.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jamie Liu <jamieliu@google.com>
      References: http://thread.gmane.org/gmane.linux.kernel/1552567
      Cc: stable@vger.kernel.org
      --
       kernel/workqueue.c |    9 +++++++++
       1 file changed, 9 insertions(+)
    • Merge branch 'akpm' (patches from Andrew Morton) · c95389b4
      Linus Torvalds committed
      Merge fixes from Andrew Morton:
       "Five fixes.
      
        err, make that six.  let me try again"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers
        memcg: check that kmem_cache has memcg_params before accessing it
        drivers/base/memory.c: fix show_mem_removable() to handle missing sections
        IPC: bugfix for msgrcv with msgtyp < 0
        Omnikey Cardman 4000: pull in ioctl.h in user header
        timer_list: correct the iterator for timer_list