1. 22 Jun, 2013 1 commit
  2. 18 Jun, 2013 2 commits
    • xen/blkback: Check for insane amounts of request on the ring (v6). · 8e3f8755
      Konrad Rzeszutek Wilk committed
      Check that the ring does not have an insane amount of requests
      (more than there could fit on the ring).
      
      If we detect this case we will stop processing the requests
      and wait until the XenBus disconnects the ring.
      
      The existing check, RING_REQUEST_CONS_OVERFLOW, compares how many
      responses we have created in the past (rsp_prod_pvt) with the number of
      requests consumed (req_cons) and tests whether the difference is greater
      than or equal to the size of the ring. It does not catch this case.
      
      What the condition does check is whether there is a need to process
      more requests, as we may still have a backlog of responses to finish.
      Note that neither of those values (rsp_prod_pvt and req_cons) is exposed
      on the shared ring.
      
      To understand this problem, a mini crash course in ring protocol
      response/request updates is in order.
      
      Four fields track the ring entries: req_prod and rsp_prod, plus
      req_event and rsp_event. We are only concerned with the first two,
      which set the stage for this bug.
      
      The req_prod is a value incremented by the frontend for each request put
      on the ring. Conversely the rsp_prod is a value incremented by the backend
      for each response put on the ring (rsp_prod gets set by rsp_prod_pvt when
      pushing the responses on the ring). Both values can wrap, and they are
      reduced modulo the size of the ring (32 in the block case) when used as
      ring indexes. See RING_GET_REQUEST and RING_GET_RESPONSE for more details.
      
      The culprit here is that if the difference between the
      req_prod and req_cons is greater than the ring size we have a problem.
      Fortunately for us, the '__do_block_io_op' loop:
      
      	rc = blk_rings->common.req_cons;
      	rp = blk_rings->common.sring->req_prod;
      
      	while (rc != rp) {
      
      		..
      		blk_rings->common.req_cons = ++rc; /* before make_response() */
      
      	}
      
      will loop up to the point when rc == rp. The macro inside of the
      loop (RING_GET_REQUEST) is smart and indexes modulo the ring size.
      If the frontend has provided a bogus req_prod value we will loop
      until 'rc == rp', which means we could be processing already
      processed requests (or responses) many times over.
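
As an illustration that is not part of the original commit, the effect of a bogus req_prod on the loop can be reproduced in user space. The sketch below assumes a 32-entry ring and 32-bit free-running indexes; `count_slot_visits` and its `visits` array are hypothetical names used only here, and the masking stands in for what RING_GET_REQUEST does when it picks a slot.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 32u  /* block ring size quoted in the commit message */

/*
 * Run the "while (rc != rp)" loop for the given indexes and record how
 * often each ring slot is touched. Returns the number of iterations.
 * Hypothetical helper for illustration only; visits[] needs RING_SIZE entries.
 */
static uint32_t count_slot_visits(uint32_t req_cons, uint32_t req_prod,
                                  uint32_t visits[RING_SIZE])
{
    uint32_t rc = req_cons;
    uint32_t iterations = 0;

    /* Masking with RING_SIZE - 1 only works for power-of-two ring sizes. */
    assert((RING_SIZE & (RING_SIZE - 1)) == 0);

    for (uint32_t i = 0; i < RING_SIZE; i++)
        visits[i] = 0;

    while (rc != req_prod) {
        /* Free-running index reduced to a slot number. */
        visits[rc & (RING_SIZE - 1)]++;
        rc++;  /* unsigned arithmetic, so wraparound is well defined */
        iterations++;
    }
    return iterations;
}
```

With a sane distance of 32 between the two indexes every slot is touched once. With the distance of 31415 used in the example further down, every slot is touched hundreds of times, which is the repeated processing described above.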
      
      The reason RING_REQUEST_CONS_OVERFLOW is not helping here is
      that it only tracks how many responses we have internally produced
      and whether we should process more. The astute reader will
      notice that the macro RING_REQUEST_CONS_OVERFLOW takes two
      arguments - more on this later.
      
      For example, if we were to enter this function with these values:
      
             	blk_rings->common.sring->req_prod =  X+31415 (X is the value from
      		the last time __do_block_io_op was called).
              blk_rings->common.req_cons = X
              blk_rings->common.rsp_prod_pvt = X
      
      The RING_REQUEST_CONS_OVERFLOW(&blk_rings->common, blk_rings->common.req_cons)
      is doing:
      
      	req_cons - rsp_prod_pvt >= 32
      
      Which is,
      	X - X >= 32 or 0 >= 32
      
      And that is false, so we continue on looping (this bug).
      
      If we re-use said macro RING_REQUEST_CONS_OVERFLOW and pass in rp
      (sring->req_prod) instead of rc, then this macro can do the check:
      
           req_prod - rsp_prod_pvt >= 32
      
      Which is,
             X + 31415 - X >= 32 , or 31415 >= 32
      
      which is true, so we can error out and break out of the function.
      
      Unfortunately the difference between rsp_prod_pvt and req_prod can
      legitimately be 32 (which would error out in the macro). This condition
      exists when the backend is lagging behind with the responses and still has
      not finished responding to all of them (so make_response has not been
      called), and rsp_prod_pvt + 32 == req_cons. As a result we cannot use
      said macro.
      
      Hence we introduce a new macro called RING_REQUEST_PROD_OVERFLOW which
      does a simple check of:
      
          req_prod - rsp_prod_pvt > RING_SIZE
      
      And with the X values from above:
      
         X + 31415 - X > 32
      
      Returns true. Also note that if the ring is full (which is where
      RING_REQUEST_CONS_OVERFLOW triggered), we would not hit the
      same condition:
      
         X + 32 - X > 32
      
      Which is false.
      
      Let's use that macro.
      Note that in v5 of this patchset the macro was different - we used an
      earlier version.
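
The arithmetic above can be checked with a small user-space sketch. It is not the kernel code: `cons_overflow` and `prod_overflow` are simplified stand-ins for RING_REQUEST_CONS_OVERFLOW and RING_REQUEST_PROD_OVERFLOW, taking plain 32-bit indexes instead of a ring structure, and the ring size is fixed at 32 as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 32u  /* assumption: block ring size from the text above */

/* Stand-in for RING_REQUEST_CONS_OVERFLOW: difference >= ring size. */
static bool cons_overflow(uint32_t idx, uint32_t rsp_prod_pvt)
{
    return (uint32_t)(idx - rsp_prod_pvt) >= RING_SIZE;
}

/* Stand-in for RING_REQUEST_PROD_OVERFLOW: difference > ring size. */
static bool prod_overflow(uint32_t req_prod, uint32_t rsp_prod_pvt)
{
    return (uint32_t)(req_prod - rsp_prod_pvt) > RING_SIZE;
}

/* Replays the cases from the commit message for an arbitrary X. */
static void check_commit_examples(uint32_t x)
{
    assert(!cons_overflow(x, x));          /* 0 >= 32 is false: bug unnoticed */
    assert(cons_overflow(x + 31415u, x));  /* passing rp instead: 31415 >= 32 */
    assert(cons_overflow(x + 32u, x));     /* but a full ring also trips it   */
    assert(prod_overflow(x + 31415u, x));  /* new check: 31415 > 32           */
    assert(!prod_overflow(x + 32u, x));    /* new check accepts a full ring   */
}
```

Because the subtraction is done on unsigned 32-bit values, the result is the same whether or not X is close to the wraparound point, for example `check_commit_examples(0)` and `check_commit_examples(0xFFFFFFF0u)` both pass.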
      
      Cc: stable@vger.kernel.org
      [v1: Move the check outside the loop]
      [v2: Add a pr_warn as suggested by David]
      [v3: Use RING_REQUEST_CONS_OVERFLOW as suggested by Jan]
      [v4: Move wake_up after kthread_stop as suggested by Jan]
      [v5: Use RING_REQUEST_PROD_OVERFLOW instead]
      [v6: Use RING_REQUEST_PROD_OVERFLOW - Jan's version]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Reviewed-by: Jan Beulich <jbeulich@suse.com>
    • xen/io/ring.h: new macro to detect whether there are too many requests on the ring · 8d925690
      Jan Beulich committed
      Backends may need to protect themselves against an insane number of
      produced requests stored by a frontend, in case they iterate over
      requests until reaching the req_prod value. There can't be more
      requests on the ring than the difference between produced requests
      and produced (but possibly not yet published) responses.
      
      This is a more strict alternative to a patch previously posted by
      Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  3. 08 Jun, 2013 2 commits
    • xen/blkback: Check device permissions before allowing OP_DISCARD · 604c499c
      Konrad Rzeszutek Wilk committed
      Before issuing the DISCARD operation we need to make sure that the
      device is not read-only and that the request does not extend past the
      number of sectors the device has.
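
The two conditions can be pictured with the following user-space sketch. It is only an illustration of the kind of validation described, not the blkback code: `struct vdev` and `discard_allowed` are hypothetical names, and the overflow test on the sector range is an extra precaution assumed here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a backend virtual block device. */
struct vdev {
    bool     readonly;    /* exported read-only to the guest */
    uint64_t nr_sectors;  /* device size in sectors */
};

/* Returns true only if a discard of [start, start + count) may be issued. */
static bool discard_allowed(const struct vdev *dev, uint64_t start,
                            uint64_t count)
{
    uint64_t end;

    assert(dev != NULL);

    if (dev->readonly)
        return false;           /* a discard modifies the device */

    end = start + count;
    if (end < start)
        return false;           /* sector range wrapped around */

    return end <= dev->nr_sectors;  /* must stay inside the device */
}
```

A request that fails this test would be rejected with an error response instead of being passed to the underlying device.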
      
      This fixes CVE-2013-2140.
      
      Cc: stable@vger.kernel.org
      Acked-by: Jan Beulich <JBeulich@suse.com>
      Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
      [v1: Made it pr_warn instead of pr_debug]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • xen/blkback: Use physical sector size for setup · 7c4d7d71
      Stefan Bader committed
      Currently xen-blkback passes the logical sector size over xenbus and
      xen-blkfront sets up the paravirt disk with that logical block size.
      But newer drives usually have the logical sector size set to 512 for
      compatibility reasons and would show the actual sector size only in
      physical sector size.
      This results in the device being partitioned and accessed in dom0 with
      the correct sector size, but the guest thinks 512 bytes is the correct
      block size. And that results in poor performance.
      
      To fix this, blkback is modified to also pass physical-sector-size
      over xenbus, and blkfront to use both values to set up the paravirt
      disk. I did not just change the passed in sector-size because I am
      not sure having a bigger logical sector size than the physical one
      is valid (and that would happen if a newer dom0 kernel hits an older
      domU kernel). Also this way a domU set up before should still be
      accessible (just some tools might detect the unaligned setup).
      
      [v2: Make xenbus write failure non-fatal]
      [v3: Use xenbus_scanf instead of xenbus_gather]
      [v4: Rebased against segment changes]
      Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  4. 05 Jun, 2013 2 commits
  5. 08 May, 2013 1 commit
  6. 07 May, 2013 1 commit
  7. 19 Apr, 2013 1 commit
    • xen-block: implement indirect descriptors · 402b27f9
      Roger Pau Monne committed
      Indirect descriptors introduce a new block operation
      (BLKIF_OP_INDIRECT) that passes grant references instead of segments
      in the request. These grant references refer to frames filled with
      arrays of blkif_request_segment_aligned, so we can send more segments
      in a request.
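
To give a feel for the numbers, the sketch below works out how many indirect frames a request needs. It rests on assumptions not stated in the commit message: 4 KiB pages, and an 8-byte segment descriptor whose layout (`struct segment_aligned`) is a guess at blkif_request_segment_aligned. The helper name `indirect_grefs_needed` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumption: 4 KiB pages */

/* Assumed layout of one aligned segment descriptor (8 bytes). */
struct segment_aligned {
    uint32_t gref;        /* grant reference of the data page */
    uint8_t  first_sect;  /* first sector used inside that page */
    uint8_t  last_sect;   /* last sector used inside that page */
    uint16_t pad;         /* padding up to 8 bytes */
};

_Static_assert(sizeof(struct segment_aligned) == 8,
               "sketch assumes 8-byte segment descriptors");

/* Number of segment descriptors that fit in one indirect frame. */
#define SEGS_PER_INDIRECT_FRAME \
    (PAGE_SIZE_BYTES / sizeof(struct segment_aligned))

/* Indirect frames (and therefore grant references) needed, rounded up. */
static unsigned int indirect_grefs_needed(unsigned int nr_segments)
{
    assert(nr_segments <= 1u << 20);  /* keep the sketch far from overflow */
    return (unsigned int)((nr_segments + SEGS_PER_INDIRECT_FRAME - 1) /
                          SEGS_PER_INDIRECT_FRAME);
}
```

Under these assumptions one frame holds 512 descriptors, so a single grant reference in the request can describe far more segments than fit directly in a ring slot.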
      
      The proposed implementation sets the maximum number of indirect grefs
      (frames filled with blkif_request_segment_aligned) to 256 in the
      backend and 32 in the frontend. The value in the frontend has been
      chosen experimentally, and the backend value has been set to a sane
      value that allows expanding the maximum number of indirect descriptors
      in the frontend if needed.
      
      The migration code has changed from the previous implementation, in
      which we simply remapped the segments on the shared ring. Now the
      maximum number of segments allowed in a request can change depending
      on the backend, so we have to requeue all the requests in the ring and
      in the queue and split the bios in them if they are bigger than the
      new maximum number of segments.
      
      [v2: Fixed minor comments by Konrad.
      [v1: Added padding to make the indirect request 64bit aligned.
       Added some BUGs, comments; fixed number of indirect pages in
       blkif_get_x86_{32/64}_req. Added description about the indirect operation
       in blkif.h]
      Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
      [v3: Fixed spaces and tabs mix ups]
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  8. 18 Apr, 2013 6 commits
  9. 15 Apr, 2013 10 commits
  10. 14 Apr, 2013 3 commits
    • watchdog: Revert the AT91RM9200_WATCHDOG dependency · 09549cd0
      Nicolas Ferre committed
      Compiling the at91rm9200_wdt.c driver without at91rm9200
      support was leading to several errors:
      
      drivers/built-in.o: In function `at91_wdt_close':
      at91_adc.c:(.text+0xc9fe4): undefined reference to `at91_st_base'
      drivers/built-in.o: In function `at91_wdt_write':
      at91_adc.c:(.text+0xca004): undefined reference to `at91_st_base'
      drivers/built-in.o: In function `at91wdt_shutdown':
      at91_adc.c:(.text+0xca01c): undefined reference to `at91_st_base'
      drivers/built-in.o: In function `at91wdt_suspend':
      at91_adc.c:(.text+0xca038): undefined reference to `at91_st_base'
      drivers/built-in.o: In function `at91_wdt_open':
      at91_adc.c:(.text+0xca0cc): undefined reference to `at91_st_base'
      drivers/built-in.o:at91_adc.c:(.text+0xca2c8): more undefined references to
      `at91_st_base' follow
      
      So, reverting the modification of the "depends" Kconfig line
      introduced by patch a6a1bcd3 (watchdog: at91rm9200: add DT support)
      seems to be the right solution.
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Acked-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
    • vfs: Revert spurious fix to spinning prevention in prune_icache_sb · 5b55d708
      Suleiman Souhlal committed
      Revert commit 62a3ddef ("vfs: fix spinning prevention in prune_icache_sb").
      
      This commit doesn't look right: since we are looking at the tail of the
      list (sb->s_inode_lru.prev) if we want to skip an inode, we should put
      it back at the head of the list instead of the tail, otherwise we will
      keep spinning on it.
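
The argument can be replayed with a toy model. The sketch below is not the VFS code: it uses a four-element array as the LRU list (index 0 is the head, the last index is the tail), treats every item as one the scanner decides to skip, and counts how many distinct items a scan gets to look at. The function name `distinct_items_examined` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NITEMS 4

/*
 * Scan from the tail 'budget' times. Each examined item is skipped and put
 * back either at the head or at the tail. Returns how many different items
 * were examined.
 */
static int distinct_items_examined(bool requeue_at_head, int budget)
{
    int  order[NITEMS] = {0, 1, 2, 3};  /* order[NITEMS - 1] is the tail */
    bool seen[NITEMS]  = {false};
    int  distinct = 0;

    assert(budget >= 0);

    while (budget-- > 0) {
        int item = order[NITEMS - 1];   /* always look at the tail */

        if (!seen[item]) {
            seen[item] = true;
            distinct++;
        }

        if (requeue_at_head) {
            /* Shift everything one step toward the tail, item becomes head. */
            memmove(&order[1], &order[0], (NITEMS - 1) * sizeof(order[0]));
            order[0] = item;
        }
        /* Otherwise the item goes back to the tail: the list is unchanged. */
    }
    return distinct;
}
```

Requeueing at the tail means the next iteration sees the same item again, so the scan makes no progress; requeueing at the head lets the scan move on to the next item.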
      
      Discovered when investigating why prune_icache_sb came top in perf
      reports of a swapping load.
      Signed-off-by: Suleiman Souhlal <suleiman@google.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org # v3.2+
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kobject: fix kset_find_obj() race with concurrent last kobject_put() · a49b7e82
      Linus Torvalds committed
      Anatol Pomozov identified a race condition that hits module unloading
      and re-loading.  To quote Anatol:
      
       "This is a race condition that exists between kset_find_obj() and
        kobject_put().  kset_find_obj() might return kobject that has refcount
        equal to 0 if this kobject is being freed by kobject_put() in another
        thread.
      
        Here is timeline for the crash in case if kset_find_obj() searches for
        an object that nobody holds and other thread is doing kobject_put() on
        the same kobject:
      
          THREAD A (calls kset_find_obj())     THREAD B (calls kobject_put())
          spin_lock()
                                               atomic_dec_return(kobj->kref), counter gets zero here
                                               ... starts kobject cleanup ....
                                               spin_lock() // WAIT thread A in kobj_kset_leave()
          iterate over kset->list
          atomic_inc(kobj->kref) (counter becomes 1)
          spin_unlock()
                                               spin_lock() // taken
                                               // it does not know that thread A increased counter so it
                                               remove obj from list
                                               spin_unlock()
                                               vfree(module) // frees module object with containing kobj
      
          // kobj points to freed memory area!!
          kobject_put(kobj) // OOPS!!!!
      
        The race above happens because module.c tries to use kset_find_obj()
        when somebody unloads module.  The module.c code was introduced in
        commit 6494a93d"
      
      Anatol supplied a patch specific for module.c that worked around the
      problem by simply not using kset_find_obj() at all, but rather than make
      a local band-aid, this just fixes kset_find_obj() to be thread-safe
      using the proper model of refusing to get a new reference if the
      refcount has already dropped to zero.
      
      See examples of this proper refcount handling not only in the kref
      documentation, but in various other equivalent uses of this pattern by
      grepping for atomic_inc_not_zero().
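
For readers unfamiliar with the pattern, here is a user-space sketch of "take a reference only if the count is not already zero", written with C11 atomics. It is not the kernel's kref or atomic_inc_not_zero() implementation; `struct refobj` and `ref_get_unless_zero` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal refcounted object, not the kernel's struct kref. */
struct refobj {
    atomic_int refcount;
};

/*
 * Try to take a new reference. Fails (returns false) if the count has
 * already reached zero, because the object is then being torn down.
 */
static bool ref_get_unless_zero(struct refobj *obj)
{
    int old;

    assert(obj != NULL);
    old = atomic_load(&obj->refcount);

    while (old != 0) {
        /* On failure the compare-exchange reloads 'old' and we retry. */
        if (atomic_compare_exchange_weak(&obj->refcount, &old, old + 1))
            return true;
    }
    return false;
}
```

In the timeline quoted above, thread A would call such a helper instead of an unconditional increment, see the zero count, and skip the dying object rather than resurrecting it.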
      
      [ Side note: the module race does indicate that module loading and
        unloading is not properly serialized wrt sysfs information using the
        module mutex.  That may require further thought, but this is the
        correct fix at the kobject layer regardless. ]
      Reported-analyzed-and-tested-by: Anatol Pomozov <anatol.pomozov@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 13 Apr, 2013 7 commits
    • Btrfs: make sure nbytes are right after log replay · 4bc4bee4
      Josef Bacik committed
      While trying to track down a tree log replay bug I noticed that fsck was always
      complaining about nbytes not being right for our fsynced file.  That is because
      the new fsync stuff doesn't wait for ordered extents to complete, so the inode's
      nbytes is not necessarily updated properly when we log it.  So to fix this we
      need to set nbytes to whatever it is on the inode that is on disk, so when we
      replay the extents we can just add the bytes that are being added as we replay
      the extent.  This makes it work for the case that we have the wrong nbytes or
      the case that we logged everything and nbytes is actually correct.  With this
      I'm no longer getting nbytes errors out of btrfsck.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
      Signed-off-by: Chris Mason <chris.mason@fusionio.com>
    • x86-32: Fix possible incomplete TLB invalidate with PAE pagetables · 1de14c3c
      Dave Hansen committed
      This patch attempts to fix:
      
      	https://bugzilla.kernel.org/show_bug.cgi?id=56461
      
      The symptom is a crash and messages like this:
      
      	chrome: Corrupted page table at address 34a03000
      	*pdpt = 0000000000000000 *pde = 0000000000000000
      	Bad pagetable: 000f [#1] PREEMPT SMP
      
      Ingo guesses this got introduced by commit 611ae8e3 ("x86/tlb:
      enable tlb flush range support for x86") since that code started to free
      unused pagetables.
      
      On x86-32 PAE kernels, that new code has the potential to free an entire
      PMD page and will clear one of the four page-directory-pointer-table
      (aka pgd_t) entries.
      
      The hardware aggressively "caches" these top-level entries and invlpg
      does not actually affect the CPU's copy.  If we clear one we *HAVE* to
      do a full TLB flush, otherwise we might continue using a freed pmd page.
      (note, we do this properly on the population side in pud_populate()).
      
      This patch tracks whenever we clear one of these entries in the 'struct
      mmu_gather', and ensures that we follow up with a full tlb flush.
      
      BTW, I disassembled and checked that:
      
      	if (tlb->fullmm == 0)
      and
      	if (!tlb->fullmm && !tlb->need_flush_all)
      
      generate essentially the same code, so there should be zero impact there
      to the !PAE case.
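
A stripped-down model of the decision described above is sketched here. It is not the x86 code: `struct gather_sketch` keeps only the two flags mentioned in the message, and the helper names and the `flush_kind` enum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of 'struct mmu_gather' to the two flags used here. */
struct gather_sketch {
    bool fullmm;          /* the whole address space is being torn down */
    bool need_flush_all;  /* a top-level (PDPT) entry was cleared */
};

enum flush_kind { FLUSH_RANGE, FLUSH_EVERYTHING };

/* Called wherever a top-level entry is cleared while gathering. */
static void note_top_level_entry_cleared(struct gather_sketch *tlb)
{
    assert(tlb != NULL);
    tlb->need_flush_all = true;
}

/* Same condition as in the message: range flush only if neither flag is set. */
static enum flush_kind choose_flush(const struct gather_sketch *tlb)
{
    assert(tlb != NULL);
    if (!tlb->fullmm && !tlb->need_flush_all)
        return FLUSH_RANGE;
    return FLUSH_EVERYTHING;
}
```

When the extra flag is never set, which is the !PAE case in this model, the condition behaves exactly like the old `fullmm == 0` test.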
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Artem S Tashkinov <t.artem@mailcity.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending · bf81710c
      Linus Torvalds committed
      Pull SCSI target fixes from Nicholas Bellinger:
       "Here are remaining target-pending items for v3.9-rc7 code.
      
        The tcm_vhost patches are more than I'd usually include in a -rc7
        pull, but are changes required for v3.9 to work correctly with the
        pending vhost-scsi-pci QEMU upstream series merge.  (Paolo CC'ed)
      
        Plus Asias's conversion to use vhost_virtqueue->private_data + RCU for
        managing vhost-scsi endpoints has gotten a lot of review + testing over
        the past weeks, and MST has ACKed the full series.
      
        Also, there is a target patch to fix a long-standing bug within
        control CDB handling with Standby/Offline/Transition ALUA port access
        states, that had been incorrectly rejecting the control CDBs required
        for LUN scan to work during these port group states.  CC'ing to
        stable."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
        target: Fix incorrect fallthrough of ALUA Standby/Offline/Transition CDBs
        tcm_vhost: Send bad target to guest when cmd fails
        tcm_vhost: Add vhost_scsi_send_bad_target() helper
        tcm_vhost: Fix tv_cmd leak in vhost_scsi_handle_vq
        tcm_vhost: Remove double check of response
        tcm_vhost: Initialize vq->last_used_idx when set endpoint
        tcm_vhost: Use vq->private_data to indicate if the endpoint is setup
        tcm_vhost: Use ACCESS_ONCE for vs->vs_tpg[target] access
    • Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi · 90f340e2
      Linus Torvalds committed
      Pull SCSI fixes from James Bottomley:
       "This is a set of ten bug fixes (and two consisting of copyright year
        update and version number change) pretty much all of which involve
        either a crash or a hang except the removal of the random sleep from
        the qla2xxx driver (which is a coding error so bad, we want it gone
        before anyone has a chance to copy it)."
      
      * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
        [SCSI] lpfc: fix potential NULL pointer dereference in lpfc_sli4_rq_put()
        [SCSI] libsas: fix handling vacant phy in sas_set_ex_phy()
        [SCSI] ibmvscsi: Fix slave_configure deadlock
        [SCSI] qla2xxx: Update the driver version to 8.04.00.13-k.
        [SCSI] qla2xxx: Remove debug code that msleeps for random duration.
        [SCSI] qla2xxx: Update copyright dates information in LICENSE.qla2xxx file.
        [SCSI] qla2xxx: Fix crash during firmware dump procedure.
        [SCSI] Revert "qla2xxx: Add setting of driver version string for vendor application."
        [SCSI] ipr: dlpar failed when adding an adapter back
        [SCSI] ipr: fix addition of abort command to HRRQ free queue
        [SCSI] st: Take additional queue ref in st_probe
        [SCSI] libsas: use right function to alloc smp response
        [SCSI] ipr: ipr_test_msi() fails when running with msi-x enabled adapter
    • Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · 0b1fd266
      Linus Torvalds committed
      Pull CIFS fix from Steve French:
       "Fixes a regression in cifs in which a password which begins with a
        comma is parsed incorrectly as a blank password"
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        cifs: Allow passwords which begin with a delimitor
    • ftrace: Move ftrace_filter_lseek out of CONFIG_DYNAMIC_FTRACE section · 7f49ef69
      Steven Rostedt (Red Hat) committed
      As ftrace_filter_lseek is now used with ftrace_pid_fops, it needs to
      be moved out of the #ifdef CONFIG_DYNAMIC_FTRACE section as the
      ftrace_pid_fops is defined when DYNAMIC_FTRACE is not.
      
      Cc: stable@vger.kernel.org
      Cc: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
    • tracing: Fix possible NULL pointer dereferences · 6a76f8c0
      Namhyung Kim committed
      Currently set_ftrace_pid and set_graph_function files use seq_lseek
      for their fops.  However, seq_open() is called only for FMODE_READ in
      fops->open(), so if a user opens one of those files for writing and
      then tries to seek, the kernel sees a NULL seq_file and panics.
      
      It can be easily reproduced with the following commands:
      
        $ cd /sys/kernel/debug/tracing
        $ echo 1234 | sudo tee -a set_ftrace_pid
      
      In this example, GNU coreutils' tee opens the file with fopen(, "a")
      and then the fopen() internally calls lseek().
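
The shape of the problem and of a guard against it can be sketched in user space. None of the names below are kernel APIs: `struct open_file`, `struct seq_state`, `MODE_READ` and `guarded_lseek` are hypothetical stand-ins, and the fixed position returned on the write-only path is an arbitrary placeholder chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MODE_READ 0x1u  /* hypothetical stand-in for FMODE_READ */

/* Per-open iterator state; only allocated when the file is opened for read. */
struct seq_state {
    long pos;
};

struct open_file {
    unsigned int      mode;   /* open mode flags */
    struct seq_state *seq;    /* NULL for write-only opens */
    long              f_pos;  /* current file position */
};

/* Seek helper that never touches the iterator state on write-only opens. */
static long guarded_lseek(struct open_file *f, long offset)
{
    assert(f != NULL);

    if (f->mode & MODE_READ) {
        assert(f->seq != NULL);  /* open-for-read always allocates it */
        f->seq->pos = offset;
        f->f_pos = offset;
        return offset;
    }

    /* Write-only: no iterator exists, so just report a fixed position. */
    f->f_pos = 1;
    return 1;
}
```

An unguarded helper would dereference `f->seq` unconditionally, which is the NULL pointer dereference triggered by the `tee -a` command above.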
      
      Link: http://lkml.kernel.org/r/1365663302-2170-1-git-send-email-namhyung@kernel.org
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Namhyung Kim <namhyung.kim@lge.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Namhyung Kim <namhyung@kernel.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
  12. 12 Apr, 2013 4 commits