1. 30 7月, 2012 8 次提交
    • A
      virtio-blk: Use block layer provided spinlock · 2c95a329
      Asias He 提交于
      Block layer will allocate a spinlock for the queue if the driver does
      not provide one in blk_init_queue().
      
      The reason to use the internal spinlock is that blk_cleanup_queue() will
      switch to use the internal spinlock in the cleanup code path.
      
              if (q->queue_lock != &q->__queue_lock)
                      q->queue_lock = &q->__queue_lock;
      
      However, processes which are in D state might have taken the driver
      provided spinlock, when the processes wake up, they would release the
      block provided spinlock.
      
      =====================================
      [ BUG: bad unlock balance detected! ]
      3.4.0-rc7+ #238 Not tainted
      -------------------------------------
      fio/3587 is trying to release lock (&(&q->__queue_lock)->rlock) at:
      [<ffffffff813274d2>] blk_queue_bio+0x2a2/0x380
      but there are no more locks to release!
      
      other info that might help us debug this:
      1 lock held by fio/3587:
       #0:  (&(&vblk->lock)->rlock){......}, at:
      [<ffffffff8132661a>] get_request_wait+0x19a/0x250
      
      Other drivers use block layer provided spinlock as well, e.g. SCSI.
      
      Switching to the block layer provided spinlock saves a bit of memory and
      does not increase lock contention. Performance test shows no real
      difference is observed before and after this patch.
      
      Changes in v2: Improve commit log as Michael suggested.
      
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NAsias He <asias@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      2c95a329
    • A
      virtio-blk: Reset device after blk_cleanup_queue() · 483001c7
      Asias He 提交于
      blk_cleanup_queue() will call blk_drian_queue() to drain all the
      requests before queue DEAD marking. If we reset the device before
      blk_cleanup_queue() the drain would fail.
      
      1) if the queue is stopped in do_virtblk_request() because device is
      full, the q->request_fn() will not be called.
      
      blk_drain_queue() {
         while(true) {
            ...
            if (!list_empty(&q->queue_head))
              __blk_run_queue(q) {
      	    if (queue is not stoped)
      		q->request_fn()
      	}
            ...
         }
      }
      
      Do no reset the device before blk_cleanup_queue() gives the chance to
      start the queue in interrupt handler blk_done().
      
      2) In commit b79d866c, We abort requests
      dispatched to driver before blk_cleanup_queue(). There is a race if
      requests are dispatched to driver after the abort and before the queue
      DEAD mark. To fix this, instead of aborting the requests explicitly, we
      can just reset the device after after blk_cleanup_queue so that the
      device can complete all the requests before queue DEAD marking in the
      drain process.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NAsias He <asias@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      483001c7
    • A
      virtio-blk: Call del_gendisk() before disable guest kick · 02e2b124
      Asias He 提交于
      del_gendisk() might not return due to failing to remove the
      /sys/block/vda/serial sysfs entry when another thread (udev) is
      trying to read it.
      
      virtblk_remove()
        vdev->config->reset() : guest will not kick us through interrupt
          del_gendisk()
            device_del()
              kobject_del(): got stuck, sysfs entry ref count non zero
      
      sysfs_open_file(): user space process read /sys/block/vda/serial
         sysfs_get_active() : got sysfs entry ref count
            dev_attr_show()
              virtblk_serial_show()
                 blk_execute_rq() : got stuck, interrupt is disabled
                                    request cannot be finished
      
      This patch fixes it by calling del_gendisk() before we disable guest's
      interrupt so that the request sent in virtblk_serial_show() will be
      finished and del_gendisk() will success.
      
      This fixes another race in hot-unplug process.
      
      It is save to call del_gendisk(vblk->disk) before
      flush_work(&vblk->config_work) which might access vblk->disk, because
      vblk->disk is not freed until put_disk(vblk->disk).
      
      Cc: virtualization@lists.linux-foundation.org
      Cc: kvm@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: NAsias He <asias@redhat.com>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      02e2b124
    • A
      virtio: rng: s3/s4 support · 0bc1a2ef
      Amit Shah 提交于
      Unregister from the hwrng interface and remove the vq before entering
      the S3 or S4 states.  Add the vq and re-register with hwrng on restore.
      Signed-off-by: NAmit Shah <amit.shah@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      0bc1a2ef
    • A
      virtio: rng: split out common code in probe / remove for s3/s4 ops · 178d855e
      Amit Shah 提交于
      The freeze/restore s3/s4 operations will use code that's common to the
      probe and remove routines.  Put the common code in separate funcitons.
      Signed-off-by: NAmit Shah <amit.shah@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      178d855e
    • A
      virtio: rng: don't wait on host when module is going away · 4476987a
      Amit Shah 提交于
      No use waiting for input from host when the module is being removed.
      We're going to remove the vq in the next step anyway, so just perform
      any other steps for cleanup (currently none).
      Signed-off-by: NAmit Shah <amit.shah@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      4476987a
    • A
      virtio: rng: allow tasks to be killed that are waiting for rng input · cc8744e1
      Amit Shah 提交于
      Use wait_for_completion_killable() instead of wait_for_completion() when
      waiting for the host to send us entropy.  Without this,
      
        # cat /dev/hwrng
        ^C
      
      just hangs.
      Signed-off-by: NAmit Shah <amit.shah@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      cc8744e1
    • A
      virtio ids: fix comment for virtio-rng · ddcc2869
      Amit Shah 提交于
      It's virtio-rng, not virtio-ring.
      Signed-off-by: NAmit Shah <amit.shah@redhat.com>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      ddcc2869
  2. 28 7月, 2012 13 次提交
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · f7da9cdf
      Linus Torvalds 提交于
      Pull networking fixes from David Miller:
       "Several bug fixes, some to new features appearing in this merge
        window, some that have been around for a while.
      
        I have a short list of known problems that need to be sorted out, but
        all of them can be solved easily during the run up to 3.6-final.
      
        I'll be offline until Sunday afternoon, but nothing need hold up
        3.6-rc1 and the close of the merge window, networking wise, at this
        point.
      
        1) Fix interface check in ipv4 TCP early demux, from Eric Dumazet.
      
        2) Fix a long standing bug in TCP DMA to userspace offload that can
           hang applications using MSG_TRUNC, from Jiri Kosina.
      
        3) Don't allow TCP_USER_TIMEOUT to be negative, from Hangbin Liu.
      
        4) Don't use GFP_KERNEL under spinlock in kaweth driver, from Dan
           Carpenter"
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        tcp: perform DMA to userspace only if there is a task waiting for it
        Revert "openvswitch: potential NULL deref in sample()"
        ipv4: fix TCP early demux
        net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling
        USB: kaweth.c: use GFP_ATOMIC under spin_lock
        tcp: Add TCP_USER_TIMEOUT negative value check
        bcma: add missing iounmap on error path
        bcma: fix regression in interrupt assignment on mips
        mac80211_hwsim: fix possible race condition in usage of info->control.sta & control.vif
      f7da9cdf
    • L
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 173f8654
      Linus Torvalds 提交于
      Pull ext4 updates from Ted Ts'o:
       "The usual collection of bug fixes and optimizations.  Perhaps of
        greatest note is a speed up for parallel, non-allocating DIO writes,
        since we no longer take the i_mutex lock in that case.
      
        For bug fixes, we fix an incorrect overhead calculation which caused
        slightly incorrect results for df(1) and statfs(2).  We also fixed
        bugs in the metadata checksum feature."
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (23 commits)
        ext4: undo ext4_calc_metadata_amount if we fail to claim space
        ext4: don't let i_reserved_meta_blocks go negative
        ext4: fix hole punch failure when depth is greater than 0
        ext4: remove unnecessary argument from __ext4_handle_dirty_metadata()
        ext4: weed out ext4_write_super
        ext4: remove unnecessary superblock dirtying
        ext4: convert last user of ext4_mark_super_dirty() to ext4_handle_dirty_super()
        ext4: remove useless marking of superblock dirty
        ext4: fix ext4 mismerge back in January
        ext4: remove dynamic array size in ext4_chksum()
        ext4: remove unused variable in ext4_update_super()
        ext4: make quota as first class supported feature
        ext4: don't take the i_mutex lock when doing DIO overwrites
        ext4: add a new nolock flag in ext4_map_blocks
        ext4: split ext4_file_write into buffered IO and direct IO
        ext4: remove an unused statement in ext4_mb_get_buddy_page_lock()
        ext4: fix out-of-date comments in extents.c
        ext4: use s_csum_seed instead of i_csum_seed for xattr block
        ext4: use proper csum calculation in ext4_rename
        ext4: fix overhead calculation used by ext4_statfs()
        ...
      173f8654
    • L
      Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm · cea8f46c
      Linus Torvalds 提交于
      Pull ARM updates from Russell King:
       "First ARM push of this merge window, post me coming back from holiday.
        This is what has been in linux-next for the last few weeks.  Not much
        to say which isn't described by the commit summaries."
      
      * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (32 commits)
        ARM: 7463/1: topology: Update cpu_power according to DT information
        ARM: 7462/1: topology: factorize the update of sibling masks
        ARM: 7461/1: topology: Add arch_scale_freq_power function
        ARM: 7456/1: ptrace: provide separate functions for tracing syscall {entry,exit}
        ARM: 7455/1: audit: move syscall auditing until after ptrace SIGTRAP handling
        ARM: 7454/1: entry: don't bother with syscall tracing on ret_from_fork path
        ARM: 7453/1: audit: only allow syscall auditing for pure EABI userspace
        ARM: 7452/1: delay: allow timer-based delay implementation to be selected
        ARM: 7451/1: arch timer: implement read_current_timer and get_cycles
        ARM: 7450/1: dcache: select DCACHE_WORD_ACCESS for little-endian ARMv6+ CPUs
        ARM: 7449/1: use generic strnlen_user and strncpy_from_user functions
        ARM: 7448/1: perf: remove arm_perf_pmu_ids global enumeration
        ARM: 7447/1: rwlocks: remove unused branch labels from trylock routines
        ARM: 7446/1: spinlock: use ticket algorithm for ARMv6+ locking implementation
        ARM: 7445/1: mm: update CONTEXTIDR register to contain PID of current process
        ARM: 7444/1: kernel: add arch-timer C3STOP feature
        ARM: 7460/1: remove asm/locks.h
        ARM: 7439/1: head.S: simplify initial page table mapping
        ARM: 7437/1: zImage: Allow DTB command line concatenation with ATAG_CMDLINE
        ARM: 7436/1: Do not map the vectors page as write-through on UP systems
        ...
      cea8f46c
    • R
    • D
      Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · 7b9b04fb
      David S. Miller 提交于
      John W. Linville says:
      
      ====================
      These fixes are intended for the 3.6 stream.
      
      Hauke Mehrtens provides a pair of bcma fixes, one to fix a build
      regression on mips and another to correct a pair of missing iounmap
      calls.
      
      Thomas Huehn offers a mac80211_hwsim fix to avoid a possible
      use-after-free bug.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7b9b04fb
    • J
      tcp: perform DMA to userspace only if there is a task waiting for it · 59ea33a6
      Jiri Kosina 提交于
      Back in 2006, commit 1a2449a8 ("[I/OAT]: TCP recv offload to I/OAT")
      added support for receive offloading to IOAT dma engine if available.
      
      The code in tcp_rcv_established() tries to perform early DMA copy if
      applicable. It however does so without checking whether the userspace
      task is actually expecting the data in the buffer.
      
      This is not a problem under normal circumstances, but there is a corner
      case where this doesn't work -- and that's when MSG_TRUNC flag to
      recvmsg() is used.
      
      If the IOAT dma engine is not used, the code properly checks whether
      there is a valid ucopy.task and the socket is owned by userspace, but
      misses the check in the dmaengine case.
      
      This problem can be observed in real trivially -- for example 'tbench' is a
      good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing
      IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they
      have been already early-copied in tcp_rcv_established() using dma engine.
      
      This patch introduces the same check we are performing in the simple
      iovec copy case to the IOAT case as well. It fixes the indefinite
      recvmsg(MSG_TRUNC) hangs.
      Signed-off-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      59ea33a6
    • J
      Revert "openvswitch: potential NULL deref in sample()" · 60810307
      Jesse Gross 提交于
      This reverts commit 5b3e7e6c.
      
      The problem that the original commit was attempting to fix can
      never happen in practice because validation is done one a per-flow
      basis rather than a per-packet basis.  Adding additional checks at
      runtime is unnecessary and inconsistent with the rest of the code.
      
      CC: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NJesse Gross <jesse@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      60810307
    • E
      ipv4: fix TCP early demux · 505fbcf0
      Eric Dumazet 提交于
      commit 92101b3b (ipv4: Prepare for change of rt->rt_iif encoding.)
      invalidated TCP early demux, because rx_dst_ifindex is not properly
      initialized and checked.
      
      Also remove the use of inet_iif(skb) in favor or skb->skb_iif
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      505fbcf0
    • J
      net: fix rtnetlink IFF_PROMISC and IFF_ALLMULTI handling · b1beb681
      Jiri Benc 提交于
      When device flags are set using rtnetlink, IFF_PROMISC and IFF_ALLMULTI
      flags are handled specially. Function dev_change_flags sets IFF_PROMISC and
      IFF_ALLMULTI bits in dev->gflags according to the passed value but
      do_setlink passes a result of rtnl_dev_combine_flags which takes those bits
      from dev->flags.
      
      This can be easily trigerred by doing:
      
      tcpdump -i eth0 &
      ip l s up eth0
      
      ip sets IFF_UP flag in ifi_flags and ifi_change, which is combined with
      IFF_PROMISC by rtnl_dev_combine_flags, causing __dev_change_flags to set
      IFF_PROMISC in gflags.
      Reported-by: NMax Matveev <makc@redhat.com>
      Signed-off-by: NJiri Benc <jbenc@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1beb681
    • D
      USB: kaweth.c: use GFP_ATOMIC under spin_lock · e4c7f259
      Dan Carpenter 提交于
      The problem is that we call this with a spin lock held.  The call tree
      is:
      	kaweth_start_xmit() holds kaweth->device_lock.
      	-> kaweth_async_set_rx_mode()
      	   -> kaweth_control()
      	      -> kaweth_internal_control_msg()
      
      The kaweth_internal_control_msg() function is only called from
      kaweth_control() which used GFP_ATOMIC for its allocations.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e4c7f259
    • H
      tcp: Add TCP_USER_TIMEOUT negative value check · 42493570
      Hangbin Liu 提交于
      TCP_USER_TIMEOUT is a TCP level socket option that takes an unsigned int. But
      patch "tcp: Add TCP_USER_TIMEOUT socket option"(dca43c75) didn't check the negative
      values. If a user assign -1 to it, the socket will set successfully and wait
      for 4294967295 miliseconds. This patch add a negative value check to avoid
      this issue.
      Signed-off-by: NHangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      42493570
    • L
      Merge tag 'tty-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty · c1e7179a
      Linus Torvalds 提交于
      Pull TTY/Serial patches from Greg Kroah-Hartman:
       "Here's the "tiny" set of patches for 3.6-rc1 for the tty layer and
        serial drivers.  They were cherry-picked from the tty-next branch of
        the tty git tree, as they are small and "obvious" fixes.  The larger
        changes, as mentioned before, will be saved for the 3.7-rc1 merge
        window.
      
        All of these changes have been in the linux-next releases for quite a
        while.
      
        Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"
      
      * tag 'tty-3.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
        pch_uart: Fix parity setting issue
        pch_uart: Fix rx error interrupt setting issue
        pch_uart: Fix missing break for 16 byte fifo
        tty ldisc: Close/Reopen race prevention should check the proper flag
        pch_uart: Add eg20t_port lock field, avoid recursive spinlocks
        vt: fix race in vt_waitactive()
        serial/of-serial: Add LPC3220 standard UART compatible string
        serial/8250: Add LPC3220 standard UART type
        serial_core: Update buffer overrun statistics.
        serial: samsung: Fixed wrong comparison for baudclk_rate
      c1e7179a
    • L
      Merge branch 'kmap_atomic' of git://github.com/congwang/linux · 84eda280
      Linus Torvalds 提交于
      Pull final kmap_atomic cleanups from Cong Wang:
       "This should be the final round of cleanup, as the definitions of enum
        km_type finally get removed from the whole tree.  The patches have
        been in linux-next for a long time."
      
      * 'kmap_atomic' of git://github.com/congwang/linux:
        pipe: remove KM_USER0 from comments
        vmalloc: remove KM_USER0 from comments
        feature-removal-schedule.txt: remove kmap_atomic(page, km_type)
        tile: remove km_type definitions
        um: remove km_type definitions
        asm-generic: remove km_type definitions
        avr32: remove km_type definitions
        frv: remove km_type definitions
        powerpc: remove km_type definitions
        arm: remove km_type definitions
        highmem: remove the deprecated form of kmap_atomic
        tile: remove usage of enum km_type
        frv: remove the second parameter of kmap_atomic_primary()
        jbd2: remove the second argument of kmap_atomic
      84eda280
  3. 27 7月, 2012 19 次提交