1. 06 Feb, 2015 12 commits
    • md: make reconfig_mutex optional for writes to md sysfs files. · 6791875e
      NeilBrown committed
      Rather than using mddev_lock() to take the reconfig_mutex
      when writing to any md sysfs file, we only take mddev_lock()
      in the particular _store() functions that require it.
      Admittedly this is most, but it isn't all.
      
      This also allows us to remove special-case handling for new_dev_store
      (in md_attr_store).
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move mddev_lock and related to md.h · 5c47daf6
      NeilBrown committed
      The one which is not inline (mddev_unlock) gets EXPORTed.
      
      This makes the locking available to personality modules so that it
      doesn't have to be imposed upon them.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: use mddev->lock to protect updates to resync_{min,max}. · 23da422b
      NeilBrown committed
      There are interdependencies between these two sysfs attributes
      and whether a resync is currently running.
      
      Rather than depending on reconfig_mutex to ensure no races when
      testing these interdependencies are met, use the spinlock.
      This will allow the mutex to be removed from protecting this
      code in a subsequent patch.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: minor cleanup in safe_delay_store. · 1b30e66f
      NeilBrown committed
      There isn't really much room for races with ->safemode_delay.
      But as I am trying to clean up any racy code and will soon
      be removing reconfig_mutex protection from most _store()
      functions:
       - only set mddev->safemode_delay once, to ensure no code
         can see an intermediate value
       - use safemode_timer to call md_safemode_timeout() rather than
         calling it directly, to ensure it never races with itself.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: move GET_BITMAP_FILE ioctl out from mddev_lock. · 4af1a041
      NeilBrown committed
      It makes more sense to report bitmap_info->file, rather than
      bitmap->file (the latter is only available once the array is
      active).
      
      With that change, use mddev->lock to protect bitmap_info being
      set to NULL, and we can call get_bitmap_file() without taking
      the mutex.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: tidy up set_bitmap_file · 1e594bb2
      NeilBrown committed
      1/ delay setting mddev->bitmap_info.file until 'f' looks
         usable, so we don't have to unset it.
      2/ Don't allow bitmap file to be set if bitmap_info.file
         is already set.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: remove unnecessary 'buf' from get_bitmap_file. · f4ad3d38
      NeilBrown committed
      'buf' is only used because d_path fills from the end of the
      buffer instead of from the start.
      We don't need a separate buf to handle that, we just need to use
      memmove() to move the string to the start.
      Signed-off-by: NeilBrown <neilb@suse.de>
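A minimal userspace sketch of the memmove() approach described in f4ad3d38. The helper `fill_from_end()` is a hypothetical stand-in for d_path(), which (per the commit message) builds the path at the end of the buffer; `path_to_front()` is also an illustrative name, not a function from md.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for d_path(): writes `path` (with its NUL) at the
 * END of buf and returns a pointer to where the string starts, or NULL
 * if it does not fit. */
static char *fill_from_end(const char *path, char *buf, size_t buflen)
{
    size_t len = strlen(path) + 1;   /* include the terminating NUL */

    if (len > buflen)
        return NULL;
    return memcpy(buf + buflen - len, path, len);
}

/* Leave the path at the start of buf without using a second buffer. */
static int path_to_front(const char *path, char *buf, size_t buflen)
{
    char *p = fill_from_end(path, buf, buflen);

    if (!p)
        return -1;
    /* Source and destination may overlap, hence memmove(), not memcpy(). */
    memmove(buf, p, strlen(p) + 1);
    return 0;
}
```

Calling `path_to_front("/var/lib/bitmap", buf, sizeof(buf))` with a 20-byte `buf` exercises the overlapping case: the string is first written at offset 4 and then shifted to offset 0.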
    • md: remove mddev_lock from rdev_attr_show() · 758bfc8a
      NeilBrown committed
      No rdev attributes need locking for 'show', though
      state_show() might benefit from ensuring it sees a
      consistent set of flags.
      
      None even use rdev->mddev, so testing for it isn't really
      needed and it certainly doesn't need to be held constant.
      
      So improve state_show() and remove the locking.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: remove mddev_lock() from md_attr_show() · b7b17c9b
      NeilBrown committed
      Most attributes can be read safely without any locking.
      A race might lead to a slightly out-dated value, but nothing wrong.
      
      We already have locking in some places where needed.
      All that remains is can_clear_show(), behind_writes_used_show()
      and action_show() which are easily fixed.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: use ->lock to protect accessing raid5 sysfs attributes. · 7b1485ba
      NeilBrown committed
      It is important that mddev->private isn't freed while
      a sysfs attribute function is accessing it.
      
      So use mddev->lock to protect the setting of ->private to NULL, and
      take that lock when checking ->private for NULL and de-referencing it
      in the sysfs access functions.
      
      This only applies to the read ('show') side of access.  Write
      access will be handled separately.
      Signed-off-by: NeilBrown <neilb@suse.de>
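The read-side pattern in 7b1485ba can be modelled in userspace roughly as follows. This is a sketch under assumptions: a C11 `atomic_flag` spinlock stands in for mddev->lock, and `mddev_model`, `conf_model`, `stripe_cache_size_show()` and `detach_private()` are hypothetical names chosen for illustration, not the kernel's definitions.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical payload behind ->private. */
struct conf_model {
    int stripe_cache_size;
};

struct mddev_model {
    atomic_flag lock;                 /* stands in for the mddev->lock spinlock */
    struct conf_model *private_data;  /* stands in for mddev->private */
};

static void model_lock(struct mddev_model *m)
{
    while (atomic_flag_test_and_set_explicit(&m->lock, memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

static void model_unlock(struct mddev_model *m)
{
    atomic_flag_clear_explicit(&m->lock, memory_order_release);
}

/* 'show' side: the NULL check and the dereference happen under one lock hold.
 * Returns -1 when no private data is attached. */
static int stripe_cache_size_show(struct mddev_model *m)
{
    int ret = -1;

    model_lock(m);
    if (m->private_data)
        ret = m->private_data->stripe_cache_size;
    model_unlock(m);
    return ret;
}

/* Teardown side: clear the pointer under the lock and hand the old value
 * back, so the caller can free it after no reader can still see it. */
static struct conf_model *detach_private(struct mddev_model *m)
{
    struct conf_model *old;

    model_lock(m);
    old = m->private_data;
    m->private_data = NULL;
    model_unlock(m);
    return old;
}
```

Because a reader either sees the pointer while holding the lock or sees NULL, the teardown path can safely free the returned object once `detach_private()` has returned.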
    • md: remove need for mddev_lock() in md_seq_show() · f97fcad3
      NeilBrown committed
      The only access in md_seq_show that could suffer from races
      not protected by ->lock is walking the rdev list.
      This can receive sufficient protection from 'rcu'.
      
      So use rdev_for_each_rcu() and get rid of mddev_lock().
      
      Now reading /proc/mdstat will never block in md_seq_show.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/bitmap: protect clearing of ->bitmap by mddev->lock · 978a7a47
      NeilBrown committed
      This makes it safe to inspect the struct while holding only
      the spinlock.
      Signed-off-by: NeilBrown <neilb@suse.de>
  2. 04 Feb, 2015 14 commits
    • md: protect ->pers changes with mddev->lock · 36d091f4
      NeilBrown committed
      ->pers is already protected by ->reconfig_mutex, and
      cannot possibly change when there are threads running or
      outstanding IO.
      
      However there are some places where we access ->pers
      not in a thread or IO context, and where ->reconfig_mutex
      is unnecessarily heavy-weight:  level_show and md_seq_show().
      
      So protect all changes, and those accesses, with ->lock.
      This is a step toward taking those accesses out from under
      reconfig_mutex.
      
      [Fixed missing "mddev->pers" -> "pers" conversion, thanks to
       Dan Carpenter <dan.carpenter@oracle.com>]
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: level_store: group all important changes into one place. · db721d32
      NeilBrown committed
      Gather all the changes that can happen atomically and might
      be relevant to other code into one place.  This will
      make it easier to refine the locking.
      
      Note that this puts quite a few things between mddev_detach()
      and ->free().  Enabling this was the point of some recent patches.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: rename ->stop to ->free · afa0f557
      NeilBrown committed
      Now that the ->stop function only frees the private data,
      rename it accordingly.
      
      Also pass in the private pointer as an arg rather than using
      mddev->private.  This flexibility will be useful in level_store().
      
      Finally, don't clear ->private.  It doesn't make sense to clear
      it, seeing that it isn't what we free, and it is no longer necessary
      to clear ->private (it was some time ago, before ->to_remove was
      introduced).
      
      Setting ->to_remove in ->free() is a bit of a wart, but not a
      big problem at the moment.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: split detach operation out from ->stop. · 5aa61f42
      NeilBrown committed
      Each md personality has a 'stop' operation which does two
      things:
       1/ it finalizes some aspects of the array to ensure nothing
          is accessing the ->private data
       2/ it frees the ->private data.
      
      All the steps in '1' can apply to all arrays and so can be
      performed in common code.
      
      This is useful as in the case where we change the personality which
      manages an array (in level_store()), it would be helpful to do
      step 1 early, and step 2 later.
      
      So split the 'step 1' functionality out into a new mddev_detach().
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/linear: remove rcu protections in favour of suspend/resume · 3be260cc
      NeilBrown committed
      The use of 'rcu' to protect accesses to ->private_data so that
      the ->private_data could be updated predates the introduction
      of mddev_suspend/mddev_resume.
      These are a cleaner mechanism for providing stability while
      swapping in a new ->private data - it is used by level_store()
      to support changing of raid levels.
      
      So get rid of the RCU stuff and just use mddev_suspend, mddev_resume.
      
      As these functions call ->quiesce(), we add an empty function for
      linear just like for raid0.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: make merge_bvec_fn more robust in face of personality changes. · 64590f45
      NeilBrown committed
      There is no locking around calls to merge_bvec_fn(), so
      it is possible that calls which coincide with a level (or personality)
      change could go wrong.
      
      So create a central dispatch point for these functions and use
      rcu_read_lock().
      If the array is suspended, reject any merge that can be rejected.
      If not, we know it is safe to call the function.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: make ->congested robust against personality changes. · 5c675f83
      NeilBrown committed
      There is currently no locking around calls to the 'congested'
      bdi function.  If called at an awkward time while an array is
      being converted from one level (or personality) to another, there
      is a tiny chance of running code in an unreferenced module etc.
      
      So add a 'congested' function to the md_personality operations
      structure, and call it with appropriate locking from a central
      'mddev_congested'.
      
      When the array personality is changing the array will be 'suspended'
      so no IO is processed.
      If mddev_congested detects this, it simply reports that the
      array is congested, which is a safe guess.
      As mddev_suspend calls synchronize_rcu(), mddev_congested can
      avoid races by including the whole call inside an rcu_read_lock()
      region.
      This requires that the congested functions for all subordinate devices
      can be run under rcu_read_lock().  Fortunately this is the case.
      Signed-off-by: NeilBrown <neilb@suse.de>
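The central dispatch described in 5c675f83 might look roughly like this single-threaded model. It is a sketch only: `mddev_model`, `personality_model`, `mddev_congested_model()` and `never_congested()` are hypothetical names, and the RCU read-side critical section that the commit relies on is reduced to a comment, since userspace C has no direct equivalent.

```c
#include <stdbool.h>
#include <stddef.h>

struct mddev_model;

struct personality_model {
    /* Hypothetical per-personality hook; nonzero means "congested". */
    int (*congested)(struct mddev_model *mddev, int bits);
};

struct mddev_model {
    bool suspended;                        /* set while the personality changes */
    const struct personality_model *pers;  /* may be NULL if no personality */
};

/* Central dispatch point. In the kernel the body would sit between
 * rcu_read_lock() and rcu_read_unlock(), so that a concurrent suspend,
 * which calls synchronize_rcu(), waits for it to finish. */
static int mddev_congested_model(struct mddev_model *mddev, int bits)
{
    int ret = 0;

    if (mddev->suspended)
        ret = 1;   /* safe guess: no IO is processed while suspended */
    else if (mddev->pers && mddev->pers->congested)
        ret = mddev->pers->congested(mddev, bits);
    return ret;
}

/* Example personality hook that never reports congestion. */
static int never_congested(struct mddev_model *mddev, int bits)
{
    (void)mddev;
    (void)bits;
    return 0;
}
```

The point of the design is that the personality hook is only reached when the array is not suspended, so a level change in progress never runs code from a personality that is being swapped out.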
    • md: rename mddev->write_lock to mddev->lock · 85572d7c
      NeilBrown committed
      This lock is used for (slightly) more than helping with writing
      superblocks, and it will soon be extended further.  So the
      name is inappropriate.
      
      Also, the _irq variant hasn't been needed since 2.6.37 as it is
      never taken from interrupt or bh context.
      
      So:
        -rename write_lock to lock
        -document what it protects
        -remove _irq ... except in md_flush_request() as there
           is no wait_event_lock() (with no _irq).  This can be
           cleaned up after appropriate changes to wait.h.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: need_this_block: tidy/fix last condition. · ea664c82
      NeilBrown committed
      That last condition is unclear and over cautious.
      
      There are two related issues here.
      
      If a partial write is destined for a missing device, then
      either RMW or RCW can work.  We must read all the available
      blocks.  Only then can the missing blocks be calculated, and
      then the parity update performed.
      
      If RMW is not an option, then there is a complication even
      without partial writes.  If we would need to read a missing
      device to perform the reconstruction, then we must first read every
      block so the missing device data can be computed.
      This is the case for RAID6 (which currently does not support
      RMW) and for times when we don't trust the parity (after a crash)
      and so are in the process of resyncing it.
      
      So make these two cases clearer and separate, and perform
      the relevant tests more thoroughly.
      Signed-off-by: NeilBrown <neilb@suse.de>
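For readers unfamiliar with the RMW and RCW terms used in ea664c82, here is a small sketch of single-parity (RAID5-style XOR) arithmetic, with one byte standing in for each block. The function names are hypothetical and this is not the raid5.c code; RAID6's second syndrome is not modelled.

```c
#include <stddef.h>
#include <stdint.h>

/* RCW (reconstruct-write): parity is the XOR of every data block, so all
 * data blocks that are not being overwritten must be read first. */
static uint8_t parity_rcw(const uint8_t *data, size_t ndata)
{
    uint8_t p = 0;

    for (size_t i = 0; i < ndata; i++)
        p ^= data[i];
    return p;
}

/* RMW (read-modify-write): only the old parity and the old contents of
 * the block being written are needed. */
static uint8_t parity_rmw(uint8_t old_parity, uint8_t old_data, uint8_t new_data)
{
    return old_parity ^ old_data ^ new_data;
}

/* Recompute the block of a missing device from the parity and all the
 * surviving data blocks, which is why they must all be read. */
static uint8_t reconstruct_missing(const uint8_t *data, size_t ndata,
                                   size_t missing, uint8_t parity)
{
    uint8_t v = parity;

    for (size_t i = 0; i < ndata; i++)
        if (i != missing)
            v ^= data[i];
    return v;
}
```

Both update strategies yield the same parity; they differ only in which blocks have to be read beforehand, which is the decision need_this_block has to make.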
    • md/raid5: need_this_block: start simplifying the last two conditions. · a9d56950
      NeilBrown committed
      Both the last two cases are only relevant if something has failed and
      something needs to be written (but not over-written), and if it is OK
      to pre-read blocks at this point.  So factor out those tests and
      explain them.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: separate out the easy conditions in need_this_block. · a79cfe12
      NeilBrown committed
      Some of the conditions in need_this_block have very straightforward
      motivation.  Separate those out and document them.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md/raid5: separate large if clause out of fetch_block(). · 2c58f06e
      NeilBrown committed
      fetch_block() has a very large and hard to read 'if' condition.
      
      Separate it into its own function so that it can be
      made more readable.
      Signed-off-by: NeilBrown <neilb@suse.de>
    • md: do_release_stripe(): No need to call md_wakeup_thread() twice · ad3ab8b6
      Jes Sorensen committed
      67f45548 introduced a call to
      md_wakeup_thread() when adding to the delayed_list. However the md
      thread is woken up unconditionally just below.
      
      Remove the unnecessary wakeup call.
      Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
      Signed-off-by: NeilBrown <neilb@suse.de>
    • x86/raid6: correctly check for assembler capabilities · 75aaf4c3
      Jan Beulich committed
      Just like for AVX2 (which simply needs an #if -> #ifdef conversion),
      SSSE3 assembler support should be checked for before using it.
      Signed-off-by: Jan Beulich <jbeulich@suse.com>
      Cc: Jim Kukunas <james.t.kukunas@linux.intel.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: NeilBrown <neilb@suse.de>
  3. 02 Feb, 2015 2 commits
  4. 28 Jan, 2015 5 commits
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 59343cd7
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Don't OOPS on socket AIO, from Christoph Hellwig.
      
       2) Scheduled scans should be aborted upon RFKILL, from Emmanuel
          Grumbach.
      
       3) Fix sleep in atomic context in kvaser_usb, from Ahmed S Darwish.
      
       4) Fix RCU locking across copy_to_user() in bpf code, from Alexei
          Starovoitov.
      
       5) Lots of crash, memory leak, short TX packet et al bug fixes in
          sh_eth from Ben Hutchings.
      
       6) Fix memory corruption in SCTP wrt. INIT collisions, from Daniel
          Borkmann.
      
       7) Fix return value logic for poll handlers in netxen, enic, and bnx2x.
          From Eric Dumazet and Govindarajulu Varadarajan.
      
       8) Header length calculation fix in mac80211 from Fred Chou.
      
       9) mv643xx_eth doesn't handle highmem correctly in non-TSO code paths.
          From Ezequiel Garcia.
      
      10) udp_diag has bogus logic in its hash chain skipping, copy same fix
          tcp diag used.  From Herbert Xu.
      
      11) amd-xgbe programs wrong rx flow control register, from Thomas
          Lendacky.
      
      12) Fix race leading to use after free in ping receive path, from Subash
          Abhinov Kasiviswanathan.
      
      13) Cache redirect routes otherwise we can get a heavy backlog of rcu
          jobs liberating DST_NOCACHE entries.  From Hannes Frederic Sowa.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
        net: don't OOPS on socket aio
        stmmac: prevent probe drivers to crash kernel
        bnx2x: fix napi poll return value for repoll
        ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too
        sh_eth: Fix DMA-API usage for RX buffers
        sh_eth: Check for DMA mapping errors on transmit
        sh_eth: Ensure DMA engines are stopped before freeing buffers
        sh_eth: Remove RX overflow log messages
        ping: Fix race in free in receive path
        udp_diag: Fix socket skipping within chain
        can: kvaser_usb: Fix state handling upon BUS_ERROR events
        can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT
        can: kvaser_usb: Send correct context to URB completion
        can: kvaser_usb: Do not sleep in atomic context
        ipv4: try to cache dst_entries which would cause a redirect
        samples: bpf: relax test_maps check
        bpf: rcu lock must not be held when calling copy_to_user()
        net: sctp: fix slab corruption from use after free on INIT collisions
        net: mv643xx_eth: Fix highmem support in non-TSO egress path
        sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers
        ...
    • net: don't OOPS on socket aio · 06539d30
      Christoph Hellwig committed
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • stmmac: prevent probe drivers to crash kernel · 9afec6ef
      Andy Shevchenko committed
      In the case when alloc_netdev fails we return NULL to a caller. But there is no
      check for NULL in the probe drivers. This patch changes NULL to an error
      pointer. The function description is amended to reflect what we may get
      returned.
      Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
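The "error pointer instead of NULL" convention that 9afec6ef switches to can be illustrated in userspace as below. The helpers `err_ptr()`, `is_err()` and `ptr_err()` are simplified re-implementations modelled on the kernel's ERR_PTR(), IS_ERR() and PTR_ERR(); `probe_core_model()` and `priv_model` are hypothetical and do not reproduce the stmmac code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value in a pointer. */
static void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer lies in the top MODEL_MAX_ERRNO addresses,
 * which are assumed never to be valid object addresses. */
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

struct priv_model {
    int id;
};

/* Hypothetical probe helper: reports allocation failure as an error
 * pointer, so callers test is_err() rather than comparing with NULL. */
static struct priv_model *probe_core_model(int simulate_alloc_failure)
{
    struct priv_model *priv;

    priv = simulate_alloc_failure ? NULL : malloc(sizeof(*priv));
    if (!priv)
        return err_ptr(-ENOMEM);
    priv->id = 1;
    return priv;
}
```

A caller would check `is_err(priv)` and propagate `ptr_err(priv)`; the benefit over NULL is that the reason for the failure travels with the return value.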
    • Merge tag 'powerpc-3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux · 7da323bb
      Linus Torvalds committed
      Pull powerpc fixes from Michael Ellerman:
       "Two powerpc fixes"
      
      * tag 'powerpc-3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux:
        powerpc/powernv: Restore LPCR with LPCR_PECE1 cleared
        powerpc/xmon: Fix another endiannes issue in RTAS call from xmon
    • Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux · 41592e2f
      Linus Torvalds committed
      Pull one more module fix from Rusty Russell:
       "SCSI was using module_refcount() to figure out when the module was
        unloading: this broke with new atomic refcounting.  The code is still
        suspicious, but this solves the WARN_ON()"
      
      * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
        scsi: always increment reference count
  5. 27 Jan, 2015 7 commits
    • bnx2x: fix napi poll return value for repoll · 24e579c8
      Govindarajulu Varadarajan committed
      With the commit d75b1ade ("net: less interrupt masking in NAPI") napi
      repoll is done only when work_done == budget. When in busy_poll we return 0
      in napi_poll. We should return budget.
      Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
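The return-value rule in 24e579c8 can be sketched with a toy poll function. This is an assumption-laden model, not the bnx2x driver: `ring_model` and `poll_model()` are hypothetical, and the only behaviour taken from the commit message is that a repoll happens only when the returned work equals the budget.

```c
#include <stdbool.h>

/* Hypothetical receive ring: `pending` packets wait to be processed and
 * `busy_polling` is set while another context owns the ring. */
struct ring_model {
    int pending;
    bool busy_polling;
};

/* Toy NAPI-style poll callback. Returning `budget` asks to be polled
 * again; returning less signals that the work is finished. */
static int poll_model(struct ring_model *ring, int budget)
{
    int work_done = 0;

    /* The ring cannot be serviced right now. Returning 0 here would
     * look like "all done", so report the full budget to get a repoll. */
    if (ring->busy_polling)
        return budget;

    while (work_done < budget && ring->pending > 0) {
        ring->pending--;
        work_done++;
    }
    return work_done;
}
```

With `busy_polling` set the function leaves `pending` untouched and returns the full budget; once the flag is cleared it drains the ring and returns the number of packets actually handled.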
    • Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec · bf693f7b
      David S. Miller committed
      Steffen Klassert says:
      
      ====================
      ipsec 2015-01-26
      
      Just two small fixes for _decode_session6() where we
      might decode to wrong header information in some rare
      situations.
      
      Please pull or let me know if there are problems.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too · 6e9e16e6
      Hannes Frederic Sowa committed
      Lubomir Rintel reported that during replacing a route the interface
      reference counter isn't correctly decremented.
      
      To quote bug <https://bugzilla.kernel.org/show_bug.cgi?id=91941>:
      | [root@rhel7-5 lkundrak]# sh -x lal
      | + ip link add dev0 type dummy
      | + ip link set dev0 up
      | + ip link add dev1 type dummy
      | + ip link set dev1 up
      | + ip addr add 2001:db8:8086::2/64 dev dev0
      | + ip route add 2001:db8:8086::/48 dev dev0 proto static metric 20
      | + ip route add 2001:db8:8088::/48 dev dev1 proto static metric 10
      | + ip route replace 2001:db8:8086::/48 dev dev1 proto static metric 20
      | + ip link del dev0 type dummy
      | Message from syslogd@rhel7-5 at Jan 23 10:54:41 ...
      |  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
      |
      | Message from syslogd@rhel7-5 at Jan 23 10:54:51 ...
      |  kernel:unregister_netdevice: waiting for dev0 to become free. Usage count = 2
      
      During replacement of a rt6_info we must walk all parent nodes and check
      whether the to-be-replaced rt6_info got propagated. If so, replace it with
      a live one.
      
      Fixes: 4a287eba ("IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag")
      Reported-by: Lubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Tested-by: Lubomir Rintel <lkundrak@v3.sk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'sh_eth' · 22577609
      David S. Miller committed
      Ben Hutchings says:
      
      ====================
      Fixes for sh_eth #3
      
      I'm continuing review and testing of Ethernet support on the R-Car H2
      chip.  This series fixes the last of the more serious issues I've found.
      
      These are not tested on any of the other supported chips.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sh_eth: Fix DMA-API usage for RX buffers · 52b9fa36
      Ben Hutchings committed
      - Use the return value of dma_map_single(), rather than calling
        virt_to_page() separately
      - Check for mapping failure
      - Call dma_unmap_single() rather than dma_sync_single_for_cpu()
      Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sh_eth: Check for DMA mapping errors on transmit · aa3933b8
      Ben Hutchings committed
      dma_map_single() may fail if an IOMMU or swiotlb is in use, so
      we need to check for this.
      Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • sh_eth: Ensure DMA engines are stopped before freeing buffers · 740c7f31
      Ben Hutchings committed
      Currently we try to clear EDRRR and EDTRR and immediately continue to
      free buffers.  This is unsafe because:
      
      - In general, register writes are not serialised with DMA, so we still
        have to wait for DMA to complete somehow
      - The R8A7790 (R-Car H2) manual states that the TX running flag cannot
        be cleared by writing to EDTRR
      - The same manual states that clearing the RX running flag only stops
        RX DMA at the next packet boundary
      
      I applied this patch to the driver to detect DMA writes to freed
      buffers:
      
      > --- a/drivers/net/ethernet/renesas/sh_eth.c
      > +++ b/drivers/net/ethernet/renesas/sh_eth.c
      > @@ -1098,7 +1098,14 @@ static void sh_eth_ring_free(struct net_device *ndev)
      >  	/* Free Rx skb ringbuffer */
      >  	if (mdp->rx_skbuff) {
      >  		for (i = 0; i < mdp->num_rx_ring; i++)
      > +			memcpy(mdp->rx_skbuff[i]->data,
      > +			       "Hello, world", 12);
      > +		msleep(100);
      > +		for (i = 0; i < mdp->num_rx_ring; i++) {
      > +			WARN_ON(memcmp(mdp->rx_skbuff[i]->data,
      > +				       "Hello, world", 12));
      >  			dev_kfree_skb(mdp->rx_skbuff[i]);
      > +		}
      >  	}
      >  	kfree(mdp->rx_skbuff);
      >  	mdp->rx_skbuff = NULL;
      
      then ran the loop:
      
          while ethtool -G eth0 rx 128 ; ethtool -G eth0 rx 64; do echo -n .; done
      
      and 'ping -f' toward the sh_eth port from another machine.  The
      warning fired several times a minute.
      
      To fix these issues:
      
      - Deactivate all TX descriptors rather than writing to EDTRR
      - As there seems to be no way of telling when RX DMA is stopped,
        perform a soft reset to ensure that both DMA engines are stopped
      - To reduce the possibility of the reset truncating a transmitted
        frame, disable egress and wait a reasonable time to reach a
        packet boundary before resetting
      - Update statistics before resetting
      
      (The 'reasonable time' does not allow for CS/CD in half-duplex
      mode, but half-duplex no longer seems reasonable!)
      Signed-off-by: Ben Hutchings <ben.hutchings@codethink.co.uk>
      Signed-off-by: David S. Miller <davem@davemloft.net>